[Proposal] OffscreenVideo

As a developer working on a video editor that runs in the browser, I got very excited when I saw the OffscreenCanvas API. I believe I have identified a way to improve on it, with a potential solution described below.

Problem:

A very common pattern when manipulating video in an application is to run a requestAnimationFrame loop that extracts the presented frame of an HTMLVideoElement, manipulates its pixels, and then draws the result on a canvas. The release of the OffscreenCanvas API allows WebGL graphics to be rendered off the main thread. However, there is currently no reliable way to render a video on an OffscreenCanvas.

We currently have to run a requestAnimationFrame loop on the main thread that creates an ImageBitmap, sends it to a worker, and then renders it on the OffscreenCanvas. This defeats the purpose: if any UI work is performed on the main thread, it can delay the sending of the frame.
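For reference, the main-thread half of that workaround typically looks something like the following sketch (the function and message shape are illustrative, not from any spec):

```javascript
// Main thread: pump frames from a <video> to a worker that owns an OffscreenCanvas.
// If the main thread is busy with UI work, the rAF callback fires late and the
// worker receives stale frames -- exactly the problem described above.
function startFramePump(video, worker) {
  let stopped = false;
  async function pump() {
    if (stopped || video.ended) return;
    // Capture the currently presented frame; the bitmap is transferred,
    // not copied, so the worker can draw it without an extra copy.
    const bitmap = await createImageBitmap(video);
    worker.postMessage({ bitmap }, [bitmap]);
    requestAnimationFrame(pump);
  }
  requestAnimationFrame(pump);
  return () => { stopped = true; }; // call to stop pumping
}
```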

Proposed solution:

An OffscreenVideo would have the same capabilities as an HTMLVideoElement but would be controllable from either the main thread or a worker. My initial thought would be to follow the pattern that OffscreenCanvas introduced and apply it to video.

const video = document.createElement("video");
const offscreenVideo = video.transferControlOffscreen();

The OffscreenVideo interface would implement all of the playback-related functions.
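To make the shape concrete, here is a sketch of how such an object might be transferred to and driven from a worker; OffscreenVideo, transferControlOffscreen(), and the worker-side API are all hypothetical and only illustrate the proposed pattern:

```js
// Main thread (hypothetical API, mirroring OffscreenCanvas):
const video = document.createElement("video");
const offscreenVideo = video.transferControlOffscreen();
const worker = new Worker("render-worker.js");
worker.postMessage({ offscreenVideo }, [offscreenVideo]);

// render-worker.js (hypothetical API):
self.onmessage = ({ data: { offscreenVideo } }) => {
  offscreenVideo.src = "movie.webm";
  // Playback is controlled entirely off the main thread; frames could be
  // drawn directly onto an OffscreenCanvas in this same worker.
  offscreenVideo.play();
};
```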

An alternative approach would be to piggyback on the work being done to let the Media Source Extensions API run in a worker: https://github.com/w3c/media-source/issues/175. A MediaSource could implement a mechanism to get an ImageBitmap for a given time. This looks a bit less realistic to me, as all of this logic currently lives only inside the HTMLVideoElement.
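For illustration, worker-side usage under that alternative might look like the following; getImageBitmap(time) is hypothetical and not part of MSE:

```js
// Inside a worker, assuming MSE becomes available there (w3c/media-source#175)
// plus a hypothetical per-time frame accessor:
const mediaSource = new MediaSource();
// ... attach SourceBuffers and append media segments as usual ...
const bitmap = await mediaSource.getImageBitmap(2.5); // hypothetical: frame at t = 2.5s
ctx.drawImage(bitmap, 0, 0); // ctx from an OffscreenCanvas owned by this worker
```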

I am looking forward to hearing your thoughts about this.


A very similar concept is described at OfflineMediaContext. In particular see https://github.com/whatwg/html/issues/2981.

My initial thought would be to follow the pattern that the OffscreenCanvas brought and apply that to the video.

const video = document.createElement("video");
const offscreenVideo = video.transferControlOffscreen();

I would suggest that a document not be required at all. Technically, it is already possible to create and play an HTML <video> element within a Module script, then import the MediaStream or ReadableStream into the main document (https://plnkr.co/edit/vQjbBo?preview&p=preview, https://plnkr.co/edit/Axkb8s?preview), though each approach still requires a document.

It should be possible to create an object in Worker, SharedWorker, ServiceWorker, Worklet scopes which is capable of accessing and using the media decoders and web media player implementation shipped with the browser in any context, without a document being involved at all.

Note, requestAnimationFrame is not the only means of getting image frames from a video. A ReadableStream, alone or paired with a WritableStream where implemented, can be used to get all of the frames of a video and, if necessary, adjust the number of frames captured.
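As a rough sketch of that idea, frames can be modelled as chunks in a ReadableStream and the captured count adjusted while reading; the plain frame objects here are stand-ins for ImageBitmaps:

```javascript
// Read frames from a ReadableStream and keep every Nth one,
// illustrating how the number of captured frames can be adjusted.
async function decimateFrames(frameStream, keepEvery) {
  const kept = [];
  const reader = frameStream.getReader();
  for (let i = 0; ; i++) {
    const { value, done } = await reader.read();
    if (done) break;
    if (i % keepEvery === 0) kept.push(value);
  }
  return kept;
}

// A toy source producing 10 numbered "frames".
const source = new ReadableStream({
  start(controller) {
    for (let i = 0; i < 10; i++) controller.enqueue({ frame: i });
    controller.close();
  }
});

decimateFrames(source, 3).then(kept => console.log(kept.length)); // 4 (frames 0, 3, 6, 9)
```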

If the videos are generated in the browser using, e.g., MediaRecorder with the H264 or AVC1 codec, the file size will always be the same over multiple runs of the same code. For the VP8 and VP9 codecs the file size will not always be the same, meaning that attempts to determine the exact number of frames encoded in a given video give variable results depending on the codec used https://plnkr.co/edit/0GnT1d?p=info.

A related concept is to create a generic method, similar to the Web Audio API's OfflineAudioContext.startRendering(), for video files and buffers. The video would not be played back; rather, ImageBitmaps, ImageData, or data URIs would be extracted from the file “as fast as possible” without rendering the video to any monitor or output device https://github.com/w3c/css-houdini-drafts/issues/905. If modelled on AudioWorkletNode, currentTime and currentFrame parameters could be set as defaults on the constructor, avoiding the need to mathematically calculate which frame the code is reading (given a finite stream or media file). Once all of the frames are extracted, the frame rate can be calculated; it would then be possible to play back the extracted frames at an HTML <video> element; <canvas>; the Web Animations API; potentially MediaSource, given the raw frames proposal; other standard Web APIs; or code unrelated to and outside the constraints of any formalized Web specification, e.g. https://plnkr.co/edit/Inb676.
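Once every frame of a finite media file has been extracted, the frame-rate arithmetic mentioned above is simple; a minimal sketch:

```javascript
// Given the total number of extracted frames and the media duration,
// compute the average frame rate and the per-frame display duration.
function frameTiming(frameCount, durationSeconds) {
  const frameRate = frameCount / durationSeconds;                // frames per second
  const frameDurationMs = (durationSeconds * 1000) / frameCount; // ms to show each frame
  return { frameRate, frameDurationMs };
}

console.log(frameTiming(150, 5)); // 30 fps, ~33.3 ms per frame
```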

There’s the case of EME to consider. There are videos that can’t be rendered offscreen.

It is currently possible to render a video on an OffscreenCanvas in a Worker frame by frame.

The individual frames can then be posted to the main thread and drawn onto a <canvas>, where the captured MediaStream of the <canvas> can be set as the srcObject of a <video> element. Audio, represented as Float32Arrays, can also be streamed from the Worker thread to the main thread.

HTML

<!DOCTYPE html>
<html>

  <head>
    <title>Stream video frames from Worker to main thread</title>
  </head>

  <body>
    <video autoplay muted controls></video><br>
    <code></code>
    <script>
      const video = document.querySelector("video");
      const canvas = document.createElement("canvas");
      const ctx = canvas.getContext("2d");
      const code = document.querySelector("code");
      ctx.globalCompositeOperation = "copy";
      const canvasStream = canvas.captureStream(0);
      const [canvasTrack] = canvasStream.getVideoTracks();
      const mediaStream = [canvasStream, canvasTrack].find(({requestFrame:rf}) => rf);
      const worker = new Worker("worker.js");
      const readStream = async e => {
        if (e.data === "stream done") {
          console.log(e.data);
          // canvasTrack.stop();
          // canvasTrack.enabled = false;
          worker.removeEventListener("message", readStream);
          return;
        }
        const {imageBitmap, width, height} = e.data;
        canvas.width = width;
        canvas.height = height;
        ctx.drawImage(imageBitmap, 0, 0);
        mediaStream.requestFrame();
        imageBitmap.close();
      }
      video.srcObject = canvasStream;
      video.ontimeupdate = e => code.textContent = video.currentTime;
      worker.addEventListener("message", readStream);
    </script>
  </body>

</html>

Worker

(async() => {
  const url = "https://gist.githubusercontent.com/guest271314/895e9961e914ad39a3365a42ec6a945c/raw/97b4d51ae42e17bdda41f16708700e3ebf1d6de4/frames.json";
  const frames = await (await fetch(url)).json();
  console.log(frames);

  const rs = new ReadableStream({
    async pull(controller) {
      for (const frame of frames) {
        const [{
          duration, frameRate, width, height
        }] = frame;
        const framesLength = frame.length - 1; // frame[0] holds the metadata entry
        const frameDuration = Math.ceil((duration * 1000) / framesLength);
        for (let i = 1; i <= framesLength; i++) {
          const osc = new OffscreenCanvas(width, height);
          const osctx = osc.getContext("2d");
          const blob = await (await fetch(frame[i])).blob();
          const bmp = await createImageBitmap(blob);
          osctx.drawImage(bmp, 0, 0);
          const imageData = osctx.getImageData(0, 0, width, height);
          // manipulate pixels here
          const imageBitmap = await createImageBitmap(imageData);
          controller.enqueue({imageBitmap, frameDuration});
        }
      }
      controller.close();
    }
  });
  
  const reader = rs.getReader();
  const processStream = async({value, done}) => {
    if (done) {
      await reader.closed;
      return "stream done";
    }
    const {imageBitmap, frameDuration} = value;
    const {width, height} = imageBitmap;
    postMessage({imageBitmap, width, height}, [imageBitmap]);
    await new Promise(resolve => setTimeout(resolve, frameDuration));
    return processStream(await reader.read());
  }
  const done = await processStream(await reader.read());
  postMessage(done);

})();

plnkr https://plnkr.co/edit/gCjYSt?p=preview