[Proposal] ImageDecoder API extension for WebCodecs

Dale_Curtis · 2020-04-22

Introduction

Today <img> elements don’t provide access to any frames beyond the first. They also provide no control over which frame is displayed in an animation. As we look to provide audio and video codecs through WebCodecs we should consider similar interfaces for images as well.

We propose a new ImageDecoder API to provide web authors access to an ImageBitmap of each frame given an arbitrary byte array as input. The returned ImageBitmaps can be used for drawing to canvas or WebGL (as well as any other future ImageBitmap use cases). Since the API is not bound to the DOM it may also be used in workers.

Example 1: Animated GIF Renderer

// This example renders an animated image to a canvas via ReadableStream.

let canvas = document.createElement('canvas');
let canvasContext = canvas.getContext('2d');
let imageDecoder = null;
let imageIndex = 0;

function renderImage(imageFrame) {
  canvasContext.drawImage(imageFrame.image, 0, 0);
  if (imageDecoder.frameCount == 1)
    return;

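  // Advance to the next frame, wrapping back to frame 0 after the last one.
  imageIndex = (imageIndex + 1) % imageDecoder.frameCount;

  // Decode the next frame ahead of display so it's ready in time.
  // setTimeout expects milliseconds; dividing by 1000 assumes that
  // imageFrame.duration is reported in microseconds.
  imageDecoder.decode(imageIndex).then(nextImageFrame => setTimeout(
      _ => { renderImage(nextImageFrame); }, imageFrame.duration / 1000.0));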
}

function decodeImage(imageByteStream) {
  imageDecoder = new ImageDecoder({data: imageByteStream, options: {}});
  console.log('imageDecoder.frameCount = ' + imageDecoder.frameCount);
  console.log('imageDecoder.type = ' + imageDecoder.type);
  console.log('imageDecoder.repetitionCount = ' + imageDecoder.repetitionCount);
  imageDecoder.decode(imageIndex).then(renderImage);
}

fetch("animated.gif").then(response => decodeImage(response.body));

Output:

imageDecoder.frameCount = 20
imageDecoder.type = "image/gif"
imageDecoder.repetitionCount = 0

[Image: test-gif]
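
The frame-advance step in renderImage can be pulled out into two small pure helpers, which makes the wraparound and the unit conversion easier to check in isolation. This is only a sketch: the names nextFrameIndex and frameDelayMs are hypothetical and not part of the proposed API, and it assumes (as the division by 1000 in the example implies) that frame durations are reported in microseconds.

```javascript
// Hypothetical helpers for illustration; not part of the proposed API.

// Returns the index of the frame to show after `currentIndex`,
// wrapping back to 0 once the last frame has been shown.
function nextFrameIndex(currentIndex, frameCount) {
  if (frameCount <= 0) throw new RangeError('frameCount must be positive');
  return (currentIndex + 1) % frameCount;
}

// Converts a frame duration into a setTimeout() delay in milliseconds.
// Assumption: the duration is given in microseconds.
function frameDelayMs(durationMicroseconds) {
  return durationMicroseconds / 1000;
}

console.log(nextFrameIndex(0, 20));   // 1
console.log(nextFrameIndex(19, 20));  // 0, the animation loops
console.log(frameDelayMs(100000));    // 100
```

With a modulo the last frame wraps to frame 0 rather than frame 1, so no frame is skipped when the animation loops.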

Example 2: Inverted MJPEG Renderer

// This example renders a multipart/x-mixed-replace MJPEG stream to canvas.

let canvas = document.createElement('canvas');
let canvasContext = canvas.getContext('2d');

function decodeImage(imageArrayBufferChunk) {
  // JPEG decoders don't have the concept of multiple frames, so we need a new
  // ImageDecoder instance for each frame.
  let imageDecoder = new ImageDecoder(
      {data: imageArrayBufferChunk, options: {imageOrientation: "flipY"}});
  console.log('imageDecoder.frameCount = ' + imageDecoder.frameCount);
  console.log('imageDecoder.type = ' + imageDecoder.type);
  console.log('imageDecoder.repetitionCount = ' + imageDecoder.repetitionCount);
  // Each chunk holds a single JPEG, so always decode frame 0.
  imageDecoder.decode(0).then(
      imageFrame => canvasContext.drawImage(imageFrame.image, 0, 0));
}

fetch("https://mjpeg_server/mjpeg_stream").then(response => {
  const contentType = response.headers.get("Content-Type");
  // headers.get() returns null when the header is absent.
  if (!contentType || !contentType.startsWith("multipart"))
    return;

  let boundary = contentType.split("=").pop();

  // See https://github.com/whatwg/fetch/issues/1021#issuecomment-614920327
  let parser = new MultipartParser(boundary);
  parser.onChunk = arrayBufferChunk => decodeImage(arrayBufferChunk);

  let reader = response.body.getReader();
  reader.read().then(function getNextImageChunk({done, value}) {
    if (done) return;
    parser.addBinaryData(value);
    return reader.read().then(getNextImageChunk);
  });
});

Output:

imageDecoder.frameCount = 1
imageDecoder.type = "image/jpeg"
imageDecoder.repetitionCount = 0
...

[Image: flipped-gif]
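
Example 2 relies on a MultipartParser with an onChunk callback and an addBinaryData() method, which is not defined in this post. The following is a minimal sketch of what such a parser might look like, not the implementation from the linked fetch issue. It assumes that each part is terminated by CRLF before the next "--boundary" delimiter, that the boundary parameter is unquoted, and that the delimiter bytes never occur inside the JPEG data. The helper indexOfBytes is also hypothetical.

```javascript
// Hypothetical sketch of the parser used in Example 2.
const CRLF_CRLF = Uint8Array.of(13, 10, 13, 10);

// Returns the first index >= `from` at which `needle` occurs, or -1.
function indexOfBytes(haystack, needle, from) {
  outer:
  for (let i = from; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

class MultipartParser {
  constructor(boundary) {
    this.delimiter = new TextEncoder().encode('--' + boundary);
    this.buffer = new Uint8Array(0);
    this.onChunk = null;  // Called with an ArrayBuffer per complete part.
  }

  addBinaryData(bytes) {
    // Append the new bytes to whatever is left over from earlier calls.
    const merged = new Uint8Array(this.buffer.length + bytes.length);
    merged.set(this.buffer, 0);
    merged.set(bytes, this.buffer.length);
    this.buffer = merged;

    // Emit every part that is bracketed by two delimiters.
    for (;;) {
      const start = indexOfBytes(this.buffer, this.delimiter, 0);
      if (start < 0) return;
      const end = indexOfBytes(
          this.buffer, this.delimiter, start + this.delimiter.length);
      if (end < 0) return;  // Part is incomplete; wait for more data.

      const part = this.buffer.subarray(start + this.delimiter.length, end);
      const headerEnd = indexOfBytes(part, CRLF_CRLF, 0);
      if (headerEnd >= 0 && this.onChunk) {
        // The body runs from the blank line after the part headers up to
        // the CRLF that precedes the next delimiter.
        const body = part.slice(headerEnd + 4, part.length - 2);
        this.onChunk(body.buffer);
      }
      // Drop the consumed part so the buffer does not grow without bound.
      this.buffer = this.buffer.slice(end);
    }
  }
}

// Usage: feed a two-part stream in two arbitrary pieces.
const parser = new MultipartParser('frame');
const received = [];
parser.onChunk = chunk => received.push(new TextDecoder().decode(chunk));
const bytes = new TextEncoder().encode(
    '--frame\r\nContent-Type: image/jpeg\r\n\r\nAAAA\r\n' +
    '--frame\r\nContent-Type: image/jpeg\r\n\r\nBBBB\r\n--frame--\r\n');
parser.addBinaryData(bytes.subarray(0, 30));
parser.addBinaryData(bytes.subarray(30));
console.log(received);  // The two part bodies: AAAA and BBBB
```

A production parser would more likely honour the Content-Length header of each part instead of scanning for the delimiter, since arbitrary binary data can contain the boundary bytes.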

Open Questions / Notes / Links

image/svg support is not currently possible in Chrome since it’s bound to the DOM.
Using a ReadableStream may over time accumulate enough data to cause OOM.
Should we allow mime sniffing at all? It’s discouraged these days, but <img> has historically depended on it.
Is there more EXIF information that we’d want to expose?
Should we allow decode() to take a “completeFramesOnly” flag which defaults to true?
- This would allow partial decodes to be returned for more savvy users.
Should we take a fetch() Request object as input so that the API can allow display with tainting of image data that would be blocked by CORS?
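
To make the mime sniffing question above concrete, here is a minimal sketch of signature-based sniffing for three common formats. It is not how ImageDecoder or any browser implements sniffing; the helper name sniffImageType and the SIGNATURES table are hypothetical, and only the well-known leading byte signatures of GIF, JPEG and PNG are checked.

```javascript
// Hypothetical helper for illustration; not part of the proposed API.
const SIGNATURES = [
  {type: 'image/gif',  bytes: [0x47, 0x49, 0x46, 0x38, 0x37, 0x61]},  // "GIF87a"
  {type: 'image/gif',  bytes: [0x47, 0x49, 0x46, 0x38, 0x39, 0x61]},  // "GIF89a"
  {type: 'image/jpeg', bytes: [0xFF, 0xD8, 0xFF]},
  {type: 'image/png',  bytes: [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]},
];

// Returns a MIME type guessed from the leading bytes of `data`
// (an ArrayBuffer or Uint8Array), or null if no signature matches.
function sniffImageType(data) {
  const view = data instanceof ArrayBuffer ? new Uint8Array(data) : data;
  for (const {type, bytes} of SIGNATURES) {
    if (view.length >= bytes.length &&
        bytes.every((byte, i) => view[i] === byte)) {
      return type;
    }
  }
  return null;
}

console.log(sniffImageType(Uint8Array.of(0xFF, 0xD8, 0xFF, 0xE0)));  // image/jpeg
console.log(sniffImageType(Uint8Array.of(0x00, 0x01, 0x02)));        // null
```

If sniffing were disallowed, callers would instead have to pass the type explicitly, for example from the Content-Type header of the fetch() response.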

Considered alternatives

Providing image decoders through the VideoDecoder API.

The VideoDecoder API being designed for WebCodecs is intended for transforming demuxed encoded data chunks into decoded frames. This is problematic for image formats, since their containers and encoded data are generally tightly coupled. E.g., you don’t generally have a gif demuxer and a gif decoder, just a decoder.

If we allow VideoDecoder users to enqueue raw image blobs we’ll have to output all contained frames at once. Without external knowledge of frame locations within the blob, users will have to decode batches of unknown size or decode everything at once. I.e., there is no piece-wise decoding of an arbitrarily long image sequence and users need to cache all decoded outputs. This feels bad from a utility and resource usage perspective.

The proposed ImageDecoder API, by contrast, allows users to provide as much or as little data as they want. Images are not decoded until needed. Users don’t need to cache their decoded output since they have random access to arbitrary frames.

Other minor cumbersome details:

Image containers may define image specific fields like repetition count.
Image containers typically have complicated ICC profiles which need to be applied.

Hang the API off Image/Picture elements

This is precluded by our goal of having the API work outside the DOM.

AshleyScirra · 2020-04-23

This would be great for image editing/animation software. There’s no built-in way to extract each frame of animated images like GIF and APNG, and this would let us easily do things like import a GIF as an image sequence.

Dale_Curtis · 2020-05-14

This is now available in Chrome Canary behind the --enable-blink-features=WebCodecs flag. Please try it out and let us know your thoughts.

Crissov · 2020-05-18

Re Providing image decoders through the VideoDecoder API, HEIF is basically a container, ISOBMFF, that is also used for videos (i.e. MP4) coupled with a video codec, e.g. HEVC, AVC or AV1, to encode still images and various kinds of image sequences – does the dismissal still apply?

Dale_Curtis · 2020-05-18

Yes. If we lived only in a world where HEIF/AVIF existed, then it’d be logical to require folks to demux their own packets. However, we expect significant usage of current generation image formats for quite some time.