Interest and use cases for transcoding streams


#1

The idea of exposing to script the transcoders that browsers already support (as part of the network stack, etc.) has come up frequently:

… and has been discussed elsewhere too, such as in this Twitter thread, with some implementers and standards folks weighing in.

Now that we have some precedent with TextDecoderStream/TextEncoderStream, it’s worth collecting the set of additional transcoding streams for which there has been demand, and looking at whether there’s a generic API that could provide them, or whether some require separate APIs that nonetheless share commonalities we can identify.
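
For reference, here’s roughly what the existing TextDecoderStream precedent looks like from script. This is a minimal sketch assuming a runtime with the WHATWG Streams and Encoding globals (modern browsers, Node 18+); `decodeToString` and `bytesFrom` are illustrative helper names, not part of any API.

```typescript
// Decode a stream of UTF-8 bytes into one string using the existing
// TextDecoderStream transform.
async function decodeToString(bytes: ReadableStream<Uint8Array>): Promise<string> {
  // TextDecoderStream is stateful: a multi-byte character split across
  // two chunks is held back until its remaining bytes arrive.
  const reader = bytes.pipeThrough(new TextDecoderStream()).getReader();
  let text = "";
  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    text += result.value;
  }
  return text;
}

// Build a readable byte stream from hard-coded chunks, for demonstration.
function bytesFrom(chunks: number[][]): ReadableStream<Uint8Array> {
  return new ReadableStream<Uint8Array>({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(new Uint8Array(chunk));
      controller.close();
    },
  });
}
```

For example, `decodeToString(bytesFrom([[0x68, 0xc3], [0xa9]]))` resolves to `"hé"`, even though the two UTF-8 bytes of "é" (0xC3 0xA9) arrive in different chunks. Any new transcoding stream would presumably plug into `pipeThrough()` the same way.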

Share your use cases here!


#2

This one isn’t easy, but: Image decoding.

A streaming decoder would yield:

  • A portion of an image (e.g., the first n rows).
  • A scan of an image, for formats like JPEG that perform multiple scans.
  • A frame of an image, for GIF & APNG.
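
One way to model “bytes in, image pieces out” is a TransformStream<Uint8Array, ImageDecodeChunk> whose output chunks are a tagged union with one variant per kind of partial result listed above. No browser exposes such a decoder; `ImageDecodeChunk`, its field names, and the `trackProgress` consumer below are all hypothetical, sketched only to show what a script might receive.

```typescript
// HYPOTHETICAL chunk shapes for a streaming image decoder; no such API exists.
type ImageDecodeChunk =
  | { kind: "rows"; firstRow: number; rowCount: number; pixels: Uint8ClampedArray }
  | { kind: "scan"; scanIndex: number; pixels: Uint8ClampedArray }
  | { kind: "frame"; frameIndex: number; durationMs: number; pixels: Uint8ClampedArray };

interface DecodeProgress {
  rowsDecoded: number; // rows available so far from "rows" chunks
  scansSeen: number; // progressive passes (e.g. JPEG scans)
  framesSeen: number; // animation frames (e.g. GIF, APNG)
}

// Fold one decoded chunk into the progress summary.
function applyChunk(p: DecodeProgress, c: ImageDecodeChunk): DecodeProgress {
  switch (c.kind) {
    case "rows":
      return { ...p, rowsDecoded: Math.max(p.rowsDecoded, c.firstRow + c.rowCount) };
    case "scan":
      return { ...p, scansSeen: p.scansSeen + 1 };
    case "frame":
      return { ...p, framesSeen: p.framesSeen + 1 };
  }
}

// Consume the readable side of such a (hypothetical) decoder.
async function trackProgress(chunks: ReadableStream<ImageDecodeChunk>): Promise<DecodeProgress> {
  const reader = chunks.getReader();
  let progress: DecodeProgress = { rowsDecoded: 0, scansSeen: 0, framesSeen: 0 };
  for (;;) {
    const result = await reader.read();
    if (result.done) return progress;
    progress = applyChunk(progress, result.value);
  }
}
```

The awkward part is visible in the type: the three variants mean different things (a strip, a coarser full image, a whole new image), so a consumer has to branch on every chunk.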

#3

There’s also https://w3c.github.io/webcrypto/#dfn-SubtleCrypto-method-digest. Maybe the method could just accept a stream? That said, I’m not sure whether it’s better for a function to accept a readable stream or to provide a writable endpoint. In the latter case, you’d have to figure out how to expose the result.
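
To make the two shapes concrete, here’s a sketch of both on top of today’s `crypto.subtle.digest()`, assuming a runtime with the Web Crypto global (browsers, Node 19+). `digestStream` and `createDigestSink` are hypothetical names. Since `digest()` is one-shot, this sketch buffers the whole input; a native version would hash incrementally.

```typescript
// Hex-encode a digest for display and comparison.
function toHex(buf: ArrayBuffer): string {
  return Array.from(new Uint8Array(buf), (b) => b.toString(16).padStart(2, "0")).join("");
}

// Shape 1 (hypothetical): a function that accepts a readable stream.
async function digestStream(algorithm: string, input: ReadableStream<Uint8Array>): Promise<ArrayBuffer> {
  const reader = input.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    chunks.push(result.value);
    total += result.value.length;
  }
  // crypto.subtle.digest() is one-shot, so concatenate everything first.
  const all = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    all.set(chunk, offset);
    offset += chunk.length;
  }
  return crypto.subtle.digest(algorithm, all);
}

// Shape 2 (hypothetical): a writable endpoint, with the result exposed as a
// promise next to it. An identity TransformStream connects the two shapes.
function createDigestSink(algorithm: string): {
  writable: WritableStream<Uint8Array>;
  digest: Promise<ArrayBuffer>;
} {
  const { readable, writable } = new TransformStream<Uint8Array, Uint8Array>();
  return { writable, digest: digestStream(algorithm, readable) };
}
```

With the second shape, a caller writes chunks (or uses `pipeTo()` into `writable`) and then awaits `digest`; a promise sitting next to the writable is one possible answer to “how to expose the result”.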


#4

Encrypting/decrypting (AES-GCM, for example) might be a better example of stream-in, stream-out. Hashes just take a stream in and produce a single result.

The image example makes me think that image decoding is a poor fit for this. Maybe you could choose between bytes-in-region-out, bytes-in-scan-out, or bytes-in-frame-out, but that is tricky.

How much value is there in a unified API as opposed to having functions of the form stream<byte> encrypt(stream<byte> in)?
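
For comparison, the function-style signature can be sketched in TypeScript as below. Web Crypto has no streaming AES-GCM, so the hypothetical `toyEncrypt` uses a repeating-key XOR purely as a stand-in for a stateful cipher; it provides no security. The point is that the key position is cipher state that must survive chunk boundaries, and in this shape it lives in a closure.

```typescript
// HYPOTHETICAL function-style API: stream<byte> encrypt(stream<byte> in).
// TOY ONLY: repeating-key XOR stands in for a real cipher such as AES-GCM
// and provides no security.
function toyEncrypt(input: ReadableStream<Uint8Array>, key: Uint8Array): ReadableStream<Uint8Array> {
  const reader = input.getReader();
  let pos = 0; // cipher state: held in this closure, carried across chunks
  return new ReadableStream<Uint8Array>({
    // Pull one input chunk per request, so the consumer's pace drives reads.
    async pull(controller) {
      const result = await reader.read();
      if (result.done) {
        controller.close();
        return;
      }
      const out = new Uint8Array(result.value.length);
      for (let i = 0; i < out.length; i++) {
        out[i] = result.value[i] ^ key[pos % key.length];
        pos++;
      }
      controller.enqueue(out);
    },
    cancel(reason) {
      // Propagate cancellation back to the source stream.
      return reader.cancel(reason);
    },
  });
}
```

With key `[0xff, 0x00]`, input chunks `[1, 2, 3]` and `[4, 5]` come out as `[0xfe, 2, 0xfc]` and `[4, 0xfa]`: the second chunk starts at key position 3, not 0.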


#5

The transform stream API is better for web developers because it is an established pattern that they already expect to work. It would be unfortunate to use one pattern for text encoding and another for byte encryption. It also makes clear to web developers that such a transformation is stateful, whereas encrypt() would hold the state implicitly in a closure.
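
Here’s a sketch of the transform stream pattern for a stateful byte transformation, again using a repeating-key XOR only as a toy stand-in for a real cipher (it is not encryption). The cipher state is an explicit field on a transformer object, and the result plugs into `pipeThrough()` just like TextDecoderStream. `ToyXorTransformer` and `toyXorStream` are hypothetical names.

```typescript
// TOY ONLY: repeating-key XOR as a stand-in for a stateful cipher; not secure.
// The key position is explicit, per-instance state on the transformer.
class ToyXorTransformer implements Transformer<Uint8Array, Uint8Array> {
  private pos = 0;
  constructor(private readonly key: Uint8Array) {}

  // Called once per input chunk; TransformStream invokes it with `this`
  // set to the transformer object.
  transform(chunk: Uint8Array, controller: TransformStreamDefaultController<Uint8Array>): void {
    const out = new Uint8Array(chunk.length);
    for (let i = 0; i < chunk.length; i++) {
      out[i] = chunk[i] ^ this.key[this.pos % this.key.length];
      this.pos++;
    }
    controller.enqueue(out);
  }
}

// HYPOTHETICAL factory: each call returns a fresh stream with fresh state.
function toyXorStream(key: Uint8Array): TransformStream<Uint8Array, Uint8Array> {
  return new TransformStream<Uint8Array, Uint8Array>(new ToyXorTransformer(key));
}
```

Usage is `source.pipeThrough(toyXorStream(key))`. Queueing between the writable and readable sides is handled by TransformStream itself rather than by the transformer.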

The transform stream API is better for spec authors and implementers because it automatically takes care of backpressure for you. Having to implement proper queue-size tracking yourself is somewhat nightmarish.
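
To show what “automatically takes care of backpressure” looks like from script, here’s a small sketch using only the standard TransformStream API (browsers, Node 18+); `observeBackpressure` is an illustrative name. With the default queuing strategies, the writer’s desiredSize starts at 1, drops to 0 once a chunk is written while nothing is reading the other side, and the queued chunk is only transformed when a read pulls it through.

```typescript
// Observe the queue bookkeeping that TransformStream performs on its own.
async function observeBackpressure(): Promise<{
  before: number | null;
  afterWrite: number | null;
  output: string;
}> {
  const upper = new TransformStream<string, string>({
    transform(chunk, controller) {
      controller.enqueue(chunk.toUpperCase());
    },
  });
  const writer = upper.writable.getWriter();
  const reader = upper.readable.getReader();

  const before = writer.desiredSize; // default writable highWaterMark is 1
  void writer.write("abc"); // queued: nothing is reading yet
  const afterWrite = writer.desiredSize; // queue is full, writer.ready is pending

  const result = await reader.read(); // reading releases the backpressure
  await writer.close();
  return { before, afterWrite, output: result.done ? "" : result.value };
}
```

A well-behaved producer would `await writer.ready` before each write; none of that bookkeeping has to be written by whoever defines the transform.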