A ZIP API in the browser?


#8

I don’t see why it needs to be either. It can be specified on its own, then if ES wants to make it a language requirement it can just reference it. There’s no reason that specifying it on its own makes it browser-only.


#9

It might make sense as a part of the JS standard library:
http://wiki.ecmascript.org/doku.php?id=harmony:modules_standard

For example in the Python standard library: https://docs.python.org/3/library/archiving.html


#10

Quick question: what do you want to use for binary data? “Raw” encoded strings where each character is a char code between 0 and 255, Uint8Array, or something else?


#11

ArrayBuffer is the general plan.


#12
// Normal deflate as a simple sync function
var deflated = zlib.deflate(data);
var inflated = zlib.inflate(deflated);

// Variants for the other two common encodings
zlib.deflateRaw(data);
zlib.inflateRaw(data);
zlib.gzip(data);
zlib.gunzip(data);

// Streaming Interface
var deflater = zlib.deflateStream();
var out = deflater.write(chunk);
out = deflater.write(chunk);
out = deflater.flush();

// When you know how many bytes to send
var inflater = zlib.inflateStream();
var out = inflater.write(chunk);
out = inflater.write(chunk);
out = inflater.flush();

// When you don't know where the deflate stream ends
var inflater = zlib.inflateStream(onEnd);
var out = inflater.write(chunk);
out = inflater.write(chunk);
function onEnd(extra) {
  // extra is the extra bytes from the last chunk that don't belong.
  // It could be zero-byte if you fed in the exact amount
}
// And then variants for the other two formats
// Or there could be an options hash in all these interfaces if that was
// preferred over multiple named functions.

#13

The exact stream interface isn’t important. What matters to me is the onEnd in the last example. I need this in js-git, where I have a stream of unknown length and an unknown number of the bytes in that stream are deflate data. I don’t know how many bytes to hand off to inflate; the only way to know is to perform the actual inflate, which knows internally when its state machine reaches the end state.

For performance, it might make more sense if the stream didn’t constantly flush output data. Instead something like:

var inflater = zlib.inflateStream();
inflater.write(chunk);
inflater.write(chunk);
var out = inflater.flush();
inflater.write(chunk);
out = inflater.end();

Here flush gives you the data that’s ready to be emitted. End is a flush plus a check that tells the parser there will be no more data; it throws if more data is expected, so end also acts as a validator.


#14

For your information, Deflate is the name of the main compression scheme used within ZIP archives. Raw Deflate is the bare-minimum compressed data stream (no header of any kind, no checksum); usually it is used with a Zlib wrapper (a small header and a checksum). GZIP defines the file format associated with .gz files (it adds a GZIP header, a file name, a time stamp and the original file size); the sole compression method used by GZIP is Deflate. All of these are defined by RFC 1951 (Deflate), RFC 1950 (Zlib) and RFC 1952 (GZIP).

But Deflate is an aging compression method, and over time newer ones (producing smaller compressed data, not really faster) have been added to ZIP archives: Deflate64, BZIP2, PPMd and even LZMA. http://en.wikipedia.org/wiki/Zip_(file_format)#Compression_methods Supporting only Deflate would thus only cover a subset of ZIP, and the implementation would appear to be stuck in the ’90s.


#15

While I really like the idea of this, I think that with newer methods coming into play we run into the same problems that video, audio, WebRTC, pictures and all this stuff will have a few years from now. As an API, I think it should provide the basic functions that all these methods use: low-level building blocks like Huffman coding, dictionary search and the like. I think that with the ongoing optimizations we can do much, but we also need ways to control things like memory allocation and types so that these kinds of algorithms run as fast as possible.


#16

That is a great point. Hmm.

This is largely solved already by asm.js and asm.js-like techniques, i.e. using a fixed-size arraybuffer and manipulating the memory there manually instead of using JS’s native memory semantics, and using “type annotations” like x | 0 for integers.
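As a toy illustration of that style (not real asm.js, just the same idioms): a fixed-size ArrayBuffer acting as a heap, with `| 0` coercions so the engine can keep values as 32-bit integers:

```javascript
// asm.js-style memory management: one preallocated ArrayBuffer serves as
// the heap, and all "pointers" are just integer offsets into it.
const heap = new ArrayBuffer(64 * 1024);
const HEAPU8 = new Uint8Array(heap);

function sum(ptr, len) {
  ptr = ptr | 0; // coerce arguments to int32, asm.js-style
  len = len | 0;
  let total = 0;
  for (let i = 0; (i | 0) < (len | 0); i = (i + 1) | 0) {
    // every intermediate stays an int32 thanks to the | 0 annotations
    total = (total + HEAPU8[(ptr + i) | 0]) | 0;
  }
  return total | 0;
}

// Usage: write 4 bytes at offset 100 and sum them.
HEAPU8.set([1, 2, 3, 4], 100);
console.log(sum(100, 4)); // 10
```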


#17

Ah! It was driving me nuts that I couldn’t log in. Anyhow, that’s fixed. There is a Cordova plugin for this: http://plugins.cordova.io/#/package/org.chromium.zip

I’ve used it. It’s helpful for grabbing assets progressively (say, level 2 in a game).


#18

How about this API? http://stuk.github.io/jszip/documentation/api_jszip.html


#19

I think zip functionality must be an ECMAScript feature, as @mathias mentioned.


#20

I think it would be interesting to have some metrics on the time it takes to do some basic unzip operations on different systems, in JS and natively. It would also be great to have a reference implementation and see how it runs on top of the FTL JIT or asm.js.


#21

I think this topic needs further consideration.

While it is true that improvements on DEFLATE have been made in the decades since its introduction, it is also true that these algorithms are slower than DEFLATE, often by a great deal, for comparatively marginal gains. As the simplest of the handful of efficient compression fundamentals discovered so far, it will be with us for a long time to come.

Comparative efficiency arguments are superfluous, however, in the face of the breadth of DEFLATE usage by clients. PNG will not be superseded in our lifetimes, certainly. All the W3C member orgs make extensive use of DEFLATE in a variety of capacities in their software, from browsers to operating systems. While asm.js implementations of DEFLATE do exist, the fact remains that they are not native, and when dealing with compression, speed matters. It’s just another reason to rely on insecure plugin technology instead of scripting for routine tasks. asm.js DEFLATE implementations are also inefficient, in that the deflation scripts must be included in every web page that uses them. Is this not the purpose of the browser as a scripting runtime, to prevent the needless duplication of code?

Client-side DEFLATE is worth having. It will reduce server loads by allowing compression and decompression on the client side. In the age of fast phones, DEFLATE isn’t the CPU hog it used to be. It will make compression a tool of the casual developer. Today, those who try to implement compression in their web apps face criticism for not using native-code solutions. This not only discourages the use of compression on the client side, but discourages the development of web apps generally.


#22

Zip.js has an alternative asm.js-compiled compression engine, and it is designed to run compression tasks in a web worker asynchronously. Assuming that asm.js is near enough to native performance, what benefit does building this into the browser have? As far as I can tell it is just that it might be a little faster, but the cost is a major standardisation effort and a lot of work by browser vendors. Why not just contribute to the framework to add the features you want?


#23

You do realize that it’s already in the browser, and that it’s simply not been exposed to JavaScript? For people experienced with the browser’s code, it wouldn’t take more than 15 minutes to expose it.


#24

Yes, but standards are nowhere near that straightforward. Any feature any browser adds must effectively be supported forever to avoid breaking websites. It must also be fully specced, tested and interoperable with all other browsers. Future changes can be difficult to make without also breaking websites. Using a framework avoids a lot of these problems, and seems like an especially good choice if having it built in to the browser only has a minor benefit.


#25

I don’t know if I would call not having to download several KB of redundant (asm.)JS/WebAssembly of varying quality per-site to implement something already implemented more optimally by the UA a “minor benefit”. I think exposing this functionality to JS, as a low-level algorithm with a well-defined interface, makes sense. The “specs are hard, therefore we should leave everything up to pages” argument doesn’t hold water with me.


#26

I am shocked by the fact that so many people are ready to make the specification even more complex because of something that could be done in 50 lines of code.

Maybe we should have Math.multiplyByTwoPointFive(x), which would multiply the input by 2.5? Sure, you could write x *= 2.5;, but that is just too hard; there should be a built-in function for it.

Multiplication by 2.5 is certainly used more often than the Deflate/Inflate algorithms, so we should add it first.


#27

Please avoid using dismissive language. If you have a concrete example where standard inflate/deflate can be performed in 50 lines of JS in a way that is as fast as the native implementation, that would be great input for this thread.

