A ZIP API in the browser?


Deflate and inflate are the hard parts to implement. I’m currently using the pako library from nodeca on GitHub. It’s fast, smallish, and written in easy-to-consume CommonJS format.

If there were such an API in the browser, I would want just the basics. I would make it synchronous, but make it available to web workers so that it could be offloaded along with the other CPU-intensive work that usually surrounds it.

For JS-Git I need non-buffered deflate and inflate. I don’t need raw deflate or gzip-style deflate, but I’m sure others would like these extra formats. I also need a form of inflate where I can feed it bytes a chunk at a time and the parser tells me when the end of the deflate stream has been reached and gives me back the extra bytes. I don’t need streaming output, but others might. We should probably have streaming inflate and deflate for completeness and symmetry.

If it is helpful I could draft up a concrete API for discussion.


A use case I’ve explored is localStorage. I’ve found that using existing JS zip libs on images stored as dataURLs is unrealistically slow. So it’s a +1 from me :wink:
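For the localStorage case, the round trip is short wherever a native Deflate binding exists. A sketch assuming Node.js (where `zlib` and `Buffer` are built in, unlike in the browser); `packForStorage` and `unpackFromStorage` are hypothetical helpers. The compressed bytes are base64-encoded because localStorage only stores strings.

```javascript
const zlib = require('zlib');

// Hypothetical helper: deflate a string and base64-encode the bytes,
// since localStorage can only hold strings.
function packForStorage(text) {
  return zlib.deflateSync(Buffer.from(text, 'utf8')).toString('base64');
}

// Hypothetical helper: base64 -> bytes -> inflate -> UTF-8 string.
function unpackFromStorage(stored) {
  return zlib.inflateSync(Buffer.from(stored, 'base64')).toString('utf8');
}

// In a browser the packed value would go to localStorage.setItem(key, packed).
const sample = 'data:image/png;base64,' + 'iVBORw0KGgo'.repeat(50);
const packed = packForStorage(sample);
console.log(sample.length, '->', packed.length);
```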



Why should this API only be available in the browser, though? I understand browsers currently have things like deflate compression built-in while stand-alone ECMAScript engines don’t, but if feasible, I’d prefer to add these features to ECMAScript so that they’re available in Node.js etc. as well.


In the past the feedback on this has been “it can be done in script, even if optimisations are needed we should see what comes up” and “this would need streams”.

I think we’re at the point where we ought to move ahead with what we have. +1 to get the ball rolling.


@mathias: I don’t think this is in any way a language feature. It might be an API that multiple environments implement, like setTimeout, but it wouldn’t be part of the language or VM.

@creationix given the positive feedback here, I think at least a first-pass draft of a concrete API would be very helpful. Your experience as someone with a concrete use case would be invaluable :). And your tendency to start with just the basics sounds good too.


Why not? Can you elaborate on this? setTimeout is an interesting example; ideally that should just be part of ECMAScript too.

Every feature that gets standardized is an opportunity to increase interoperability between ECMAScript engines in browsers and non-browser JS engines. It would be a shame to dismiss this opportunity right from the start.

Putting APIs in ECMAScript

I don’t see why it needs to be either. It can be specified on its own, then if ES wants to make it a language requirement it can just reference it. There’s no reason that specifying it on its own makes it browser-only.


It might make sense as a part of the JS standard library:

For example in the Python standard library: https://docs.python.org/3/library/archiving.html


Quick question: what do you want to use for binary data? “Raw” binary strings where each character is a char code between 0 and 255, a Uint8Array, or something else?
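For context on the two options, converting between a binary string and a Uint8Array (which is a view over an ArrayBuffer) is a short loop. A minimal sketch; the helper names `binaryStringToBytes` and `bytesToBinaryString` are made up for illustration:

```javascript
// Convert a "binary string" (one char code 0-255 per byte) to a Uint8Array.
function binaryStringToBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code > 0xff) throw new RangeError(`char code ${code} at index ${i} is not a byte`);
    bytes[i] = code;
  }
  return bytes; // bytes.buffer is the underlying ArrayBuffer
}

// Convert bytes back to a binary string, in slices so that
// String.fromCharCode.apply never receives too many arguments at once.
function bytesToBinaryString(bytes) {
  const CHUNK = 0x8000;
  let out = '';
  for (let i = 0; i < bytes.length; i += CHUNK) {
    out += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK));
  }
  return out;
}
```

Strings cost two bytes per character in most engines and need validation on the way in, which is one argument for typed arrays.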


ArrayBuffer is the general plan.

// Normal deflate as a simple sync function
var deflated = zlib.deflate(data);
var inflated = zlib.inflate(deflated);

// Variants for the other two common encodings (raw deflate and gzip)

// Streaming interface
var deflater = zlib.deflateStream();
var out = deflater.write(chunk);
out = deflater.write(chunk);
out = deflater.flush();

// When you know how many bytes to send
var inflater = zlib.inflateStream();
var out = inflater.write(chunk);
out = inflater.write(chunk);
out = inflater.flush();

// When you don't know where the deflate stream ends
var inflater = zlib.inflateStream(onEnd);
var out = inflater.write(chunk);
out = inflater.write(chunk);
function onEnd(extra) {
  // extra is the bytes from the last chunk that don't belong to the
  // deflate stream. It could be zero-length if you fed in the exact amount.
}

// And then variants for the other two formats.
// Or there could be an options hash in all these interfaces if that were
// preferred over multiple named functions.
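The “options hash” alternative could look like this. A sketch only, backed by Node.js’s zlib module: `deflate`, `inflate` and the `format` option are hypothetical names, not an existing browser API, and results are returned as Uint8Array views to match the ArrayBuffer plan.

```javascript
const zlib = require('zlib');

// Map each hypothetical `format` value to Node's matching sync function.
const DEFLATE = new Map([
  ['zlib', zlib.deflateSync],    // RFC 1950 wrapper
  ['raw', zlib.deflateRawSync],  // bare RFC 1951 data
  ['gzip', zlib.gzipSync],       // RFC 1952 wrapper
]);
const INFLATE = new Map([
  ['zlib', zlib.inflateSync],
  ['raw', zlib.inflateRawSync],
  ['gzip', zlib.gunzipSync],
]);

function pick(table, format) {
  const fn = table.get(format);
  if (!fn) throw new TypeError(`unknown format: ${format}`);
  return fn;
}

// One function per direction; the format is an option, defaulting to zlib.
function deflate(data, { format = 'zlib' } = {}) {
  return new Uint8Array(pick(DEFLATE, format)(data));
}

function inflate(data, { format = 'zlib' } = {}) {
  return new Uint8Array(pick(INFLATE, format)(data));
}
```

Separate named functions would map one-to-one onto the same tables; the options hash keeps the surface at two functions and makes a new format a table entry rather than a new name.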


The exact stream interface isn’t important. What matters to me is the onEnd in the last example. I need this in js-git, where I have a stream of unknown length in which an unknown number of bytes are deflate data. I don’t know how many bytes to hand off to inflate. The only way to know is to perform the actual inflate, which knows internally when its state machine reaches the end state.
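Node.js’s zlib module can already approximate this for a whole buffer: with the `info: true` option, `inflateSync` also returns the engine, whose `bytesWritten` reports how many input bytes zlib actually consumed, and zlib stops at the end of the Deflate stream. A minimal sketch, assuming the whole buffer is available up front (unlike the chunk-at-a-time onEnd interface); `inflateWithRemainder` is a hypothetical name:

```javascript
const zlib = require('zlib');

// Inflate a zlib stream that may be followed by unrelated bytes and
// report where the deflate data ended.
function inflateWithRemainder(input) {
  const { buffer, engine } = zlib.inflateSync(input, { info: true });
  const consumed = engine.bytesWritten; // input bytes zlib actually read
  return {
    output: buffer,
    consumed,
    extra: input.subarray(consumed), // bytes that don't belong to the stream
  };
}
```

A chunked version would need the same “bytes consumed” signal from a persistent inflate state, which is what the onEnd callback asks for.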

For performance, it might make more sense if the stream didn’t constantly flush output data. Instead, something like:

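var inflater = zlib.inflateStream();
inflater.write(chunk); // only buffers input, returns nothing
inflater.write(chunk);
var out = inflater.flush(); // output that is ready so far
out = inflater.end();       // remaining output; throws if the stream is incomplete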

Here flush() gives you the data that’s ready to be emitted. end() is a flush plus a check that tells the parser there will be no more data: it throws if more data is expected, so end() also acts as a validator.
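These write/flush/end semantics can be sketched on top of Node.js’s zlib. Assumptions: `BufferedInflater` is a hypothetical class, and flush() simply re-inflates everything buffered so far with `finishFlush: Z_SYNC_FLUSH`, which makes zlib return partial output for truncated input instead of throwing. That is quadratic and only meant to show the behaviour; a real implementation would keep the inflate state between calls.

```javascript
const zlib = require('zlib');

class BufferedInflater {
  constructor() {
    this.chunks = [];
    this.emitted = 0; // bytes of output already handed back
  }

  // Only buffers input; no output is produced here.
  write(chunk) {
    this.chunks.push(Buffer.from(chunk));
  }

  // Returns whatever output is decodable so far and not yet returned.
  flush() {
    const input = Buffer.concat(this.chunks);
    if (input.length === 0) return new Uint8Array(0);
    const out = zlib.inflateSync(input, { finishFlush: zlib.constants.Z_SYNC_FLUSH });
    return this.take(out);
  }

  // Default finishFlush (Z_FINISH) throws if the stream is incomplete,
  // so end() doubles as a validator.
  end() {
    return this.take(zlib.inflateSync(Buffer.concat(this.chunks)));
  }

  take(out) {
    const fresh = out.subarray(this.emitted);
    this.emitted = out.length;
    return new Uint8Array(fresh);
  }
}
```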


For your information, Deflate is the name of the main compression scheme used within ZIP archives. Raw Deflate is the bare minimum compressed data stream (no header of any kind, no checksum); it is usually used with a Zlib wrapper (small header and Adler-32 checksum). GZIP defines the file format associated with .gz files (it adds a GZIP header with a time stamp and an optional file name, plus a trailer with a CRC-32 and the original file size); the sole compression method used by GZIP is Deflate. These are defined by RFC 1951 (Deflate), RFC 1950 (Zlib) and RFC 1952 (GZIP).

But Deflate is an aging compression method. Over time, newer ones (producing smaller compressed data, though not really faster) have been added to ZIP archives: Deflate64, BZIP2, PPMD and even LZMA. http://en.wikipedia.org/wiki/Zip_(file_format)#Compression_methods Supporting only Deflate would thus cover only a subset of ZIP, and the implementation would appear to be stuck in the 90s.
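To make the wrapper differences concrete, Node.js’s built-in zlib module exposes all three formats. A sketch comparing them on the same input: with Node’s defaults the Deflate body is the same, so the Zlib wrapper costs 6 bytes (2-byte header plus Adler-32) and the GZIP wrapper 18 bytes (10-byte header, since Node writes no file name, plus the CRC-32 and size trailer).

```javascript
const zlib = require('zlib');

const input = Buffer.from('Deflate, zlib and gzip share one compressed body. '.repeat(10));

const raw = zlib.deflateRawSync(input);  // RFC 1951: bare Deflate data
const wrapped = zlib.deflateSync(input); // RFC 1950: 2-byte header + Adler-32
const gz = zlib.gzipSync(input);         // RFC 1952: 10-byte header + CRC-32 + size

console.log('raw ', raw.length);
console.log('zlib', wrapped.length, 'first byte 0x' + wrapped[0].toString(16));
console.log('gzip', gz.length, 'magic', gz[0].toString(16), gz[1].toString(16));
// The gzip trailer ends with the uncompressed size modulo 2^32, little-endian.
console.log('size', gz.readUInt32LE(gz.length - 4));
```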


While I really like the idea of this, I think that with newer methods coming into play we run into the same problems that video, audio, WebRTC, images and all this stuff will have a few years from now. As an API, I think it should provide the basic building blocks that all these methods use: low-level operations such as Huffman coding, dictionary search and the like. With the ongoing optimizations we can do a lot, but we also need ways to control things like memory allocation or types so that this kind of algorithm runs as fast as possible.


That is a great point. Hmm.

This is largely solved already by asm.js and asm.js-like techniques, i.e. using a fixed-size ArrayBuffer and manipulating memory there manually instead of relying on JS’s native memory semantics, and using “type annotations” like x | 0 for integers.
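To illustrate the style on a piece of zlib itself, here is Adler-32 (the checksum RFC 1950 appends to zlib streams) working over a typed-array view of a fixed-size ArrayBuffer, with `| 0` and `>>> 0` keeping values as 32-bit integers. It is ordinary JavaScript written in that style, not a validated asm.js module; the function name and the 64 KiB heap size are arbitrary choices for the sketch.

```javascript
const MOD_ADLER = 65521; // largest prime below 2^16

// Adler-32 over heap[start, end), asm.js-like integer annotations.
function adler32(heap, start, end) {
  start = start | 0;
  end = end | 0;
  let a = 1;
  let b = 0;
  for (let i = start; (i | 0) < (end | 0); i = (i + 1) | 0) {
    a = ((a + heap[i]) | 0) % MOD_ADLER;
    b = ((b + a) | 0) % MOD_ADLER;
  }
  return ((b << 16) | a) >>> 0; // unsigned 32-bit result
}

// Fixed-size "heap", as an asm.js module would use.
const heap = new Uint8Array(new ArrayBuffer(1 << 16));
const text = 'Wikipedia';
for (let i = 0; i < text.length; i++) heap[i] = text.charCodeAt(i);
console.log(adler32(heap, 0, text.length).toString(16)); // 11e60398
```

zlib’s own C implementation defers the modulo over runs of up to 5552 bytes; this sketch reduces on every byte to keep it readable.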


Ah! It was driving me nuts that I couldn’t log in. Anyhow, that’s fixed. There is a Cordova plugin for this: http://plugins.cordova.io/#/package/org.chromium.zip

I’ve used it. It’s helpful for grabbing assets progressively (say, level 2 in a game).


How about this API? http://stuk.github.io/jszip/documentation/api_jszip.html


I think zip functionality must be an ECMAScript feature, as @mathias mentioned.


I think it would be interesting to have some metrics about the time it takes to do some basic unzip operations on different systems, in JS and natively. It would also be great to have a reference implementation and see how it runs on top of the FTL JIT or asm.js.
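For the metrics idea, a small timing harness is enough to start. A sketch assuming Node.js; `medianMs` is a hypothetical helper, and only Node’s native zlib binding is measured here. A pure-JS inflate such as pako’s could be passed in the same way for the JS side of the comparison.

```javascript
const zlib = require('zlib');

// Median wall-clock time in milliseconds of fn() over `runs` runs, after a
// few warm-up calls so a JIT has a chance to optimise JS implementations.
function medianMs(fn, runs = 15, warmup = 3) {
  for (let i = 0; i < warmup; i++) fn();
  const times = [];
  for (let i = 0; i < runs; i++) {
    const t0 = process.hrtime.bigint();
    fn();
    times.push(Number(process.hrtime.bigint() - t0) / 1e6);
  }
  times.sort((x, y) => x - y);
  return times[Math.floor(times.length / 2)];
}

// Native baseline: inflate 1 MiB of highly repetitive data.
const payload = zlib.deflateSync(Buffer.alloc(1 << 20, 'abc'));
console.log('native inflate:', medianMs(() => zlib.inflateSync(payload)).toFixed(2), 'ms');
```

Repetitive input is a best case for Deflate; real measurements would want representative files (PNG data, git objects, text).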


I think this topic needs further consideration.

While it is true that improvements on DEFLATE have been made in the decades since its introduction, it is also true that these algorithms are slower than DEFLATE, often by a great deal, for comparatively marginal gains. As the simplest among the handful of efficient compression algorithms discovered so far, it will be with us for a long time to come.

Comparative efficiency arguments are superfluous, however, in the face of the breadth of DEFLATE usage by clients. PNG will not be superseded in our lifetimes, certainly. All the W3C member organizations make extensive use of DEFLATE in a variety of capacities in their software, from browsers to operating systems. While asm.js implementations of DEFLATE do exist, the fact remains that they are not native, and when dealing with compression, speed matters. It’s just another reason to rely on insecure plugin technology instead of scripting for routine tasks. asm.js DEFLATE implementations are also inefficient in that the compression scripts must be included in every web page that uses them. Is this not the purpose of the browser as a scripting runtime, to prevent the needless duplication of code?

Client-side DEFLATE is worth having. It will reduce server loads by allowing compression and decompression on the client side. In the age of fast phones, DEFLATE isn’t the CPU hog it used to be. It will make compression a tool of the casual developer. Today, those who try to implement compression in their web apps face criticism for not using native code solutions. This not only discourages the use of compression on the client side, but discourages the development of web apps generally.