WebCodecs Proposal

WebCodecs: an API that allows web applications to encode and decode audio and video

Many Web APIs use media codecs internally to support APIs for particular uses:

  • HTMLMediaElement and Media Source Extensions
  • WebAudio (decodeAudioData)
  • MediaRecorder
  • WebRTC

But there’s no general way to flexibly configure and use these media codecs. Because of this, many web applications have resorted to implementing media codecs in JavaScript or WebAssembly, despite the disadvantages:

  • Increased bandwidth to download codecs already in the browser.
  • Reduced performance
  • Reduced power efficiency

It’s great for:

  • Live streaming
  • Cloud gaming
  • Media file editing and transcoding

See the explainer for more info.
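To make the shape of the idea concrete, here is a rough sketch of what configuring and driving a video decoder directly might look like. The names here (VideoDecoder, EncodedVideoChunk, the configure/decode methods, renderFrame, encodedBytes) are illustrative assumptions, not a finalized design:

```js
// Hypothetical sketch only; the class and method names are assumptions,
// not the API defined by this proposal.
const decoder = new VideoDecoder({
  output: (frame) => {
    // Hand each decoded frame to a canvas, a WebGL texture, a MediaStream, etc.
    renderFrame(frame); // renderFrame is an app-defined function
  },
  error: (e) => console.error(e),
});

// Configure for a specific codec; the input must already be demuxed.
decoder.configure({ codec: 'vp8' });

// Feed demuxed, encoded chunks to the decoder.
decoder.decode(new EncodedVideoChunk({
  type: 'key',
  timestamp: 0,
  data: encodedBytes, // Uint8Array of raw VP8 bytes, obtained elsewhere
}));
```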

This sounds great! Currently we ship a WebAssembly encoder for WebM Opus in our web app, even though Chrome has one built-in, just so we can get faster-than-realtime encoding.

It’s not quite clear to me from the explainer how the container formats work, though. It looks like we can get a stream from an Opus encoder, but how would the encoded packets be arranged into a WebM container? Would that still be left to the web app to solve? It’s a common case for transcoding, and the browser has readers and writers for the container formats too, so it would be nice if that could be covered as well.

Does this proposal include the ability to decode any video that an HTMLVideoElement can decode?

We can already use decodeAudioData() of AudioContext and startRendering() of OfflineAudioContext to get audio data. What I have been trying to achieve is getting video data; e.g., a decodeVideoData() that yields an array of ImageData or ImageBitmap objects faster than real time.
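For reference, the existing audio path is straightforward; a minimal sketch (the file URL is just a placeholder):

```js
// Existing APIs: decode a whole audio file faster than real time.
const response = await fetch('audio.ogg'); // placeholder URL
const arrayBuffer = await response.arrayBuffer();

// decodeAudioData() is available on any BaseAudioContext; an
// OfflineAudioContext avoids touching an audio output device.
const ctx = new OfflineAudioContext(2, 44100, 44100);
const audioBuffer = await ctx.decodeAudioData(arrayBuffer);

// Raw PCM samples for the first channel.
const samples = audioBuffer.getChannelData(0);
```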

The scope of this explainer excludes container formats. That would be left to JS/WASM. The idea here is to do what the JS/WASM cannot do (as efficiently), but let it control everything from there. However, if there is enough interest, I suppose one could also propose/design some kind of WebMediaContainers API that goes well with this one.
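Roughly, the split could look like the sketch below, with a hypothetical JS/WASM muxer handling the WebM side. AudioEncoder and EncodedAudioChunk-style names are illustrative, and WebMMuxer stands in for an app-side library, none of which this proposal would provide:

```js
// Illustrative sketch only: the browser encodes, the app muxes.
// AudioEncoder is a hypothetical codec API; WebMMuxer is a hypothetical
// JS/WASM library chosen by the app.
const muxer = new WebMMuxer();

const encoder = new AudioEncoder({
  output: (chunk) => muxer.addOpusChunk(chunk), // containerization stays app-side
  error: (e) => console.error(e),
});
encoder.configure({ codec: 'opus', sampleRate: 48000, numberOfChannels: 2 });

// ...feed raw audio via encoder.encode(...), then:
// await encoder.flush();
// const webmBlob = muxer.finalize();
```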

The idea here is to expose all of the codecs that the browser currently has underneath the implementation of HTMLMediaElement, so yes, it should be able to decode any video. However (as mentioned in a previous comment), the media going into the decoder is not containerized. So, if you want to decode VP8 inside of an MP4, you need to parse the MP4 and pass in the raw VP8 rather than passing in the MP4.
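For instance, a sketch of that VP8-in-MP4 case, where MP4Demuxer stands in for whatever JS/WASM demuxer the app uses, and the decoder/chunk names are again illustrative assumptions:

```js
// Illustrative sketch: the app (or a WASM library) parses the MP4 and
// passes only the raw VP8 payloads to the decoder. MP4Demuxer is a
// hypothetical app-side demuxer; 'decoder' is a configured video decoder
// as in the earlier sketch.
const demuxer = new MP4Demuxer(mp4ArrayBuffer);
for (const sample of demuxer.videoSamples()) {
  decoder.decode(new EncodedVideoChunk({
    type: sample.isKeyframe ? 'key' : 'delta',
    timestamp: sample.timestamp,
    data: sample.data, // raw VP8 bytes, MP4 framing already stripped
  }));
}
```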

It’s true that decodeAudioData is already there, but it has flaws pointed out in the explainer.

The idea of this proposal is to basically give you “decodeVideoData”, but also “encodeVideoData” and “encodeAudioData”, and all with a better (but lower-level) API.

Filed an issue about the language currently in the Explainer that describes MediaRecorder as being able to record multiple tracks, which is not presently possible.

Also filed a PR to include an additional use case: merging multiple input media files into a single output stream or file.

Can “Decoded and encoding images” be clarified in the Explainer?

Does the proposal intend to provide a means to get the encoded images from any container that HTMLVideoElement is currently capable of decoding?

The idea of this proposal is to basically give you “decodeVideoData”, but also “encodeVideoData” and “encodeAudioData”, and all with a better (but lower-level) API.

Ok. That should be close enough, if not exactly, to what I have been trying to achieve at https://github.com/guest271314/MediaFragmentRecorder: piping input through MediaRecorder to get various, potentially dissimilar input containers and codecs into WebM, e.g., const merged = await decodeVideoData(["video1.webm#t=5,10", "video2.mp4#t=10,15", "video3.ogv#t=0,5"], {codecs: "openh264"}); similar to the output of using mkvmerge (though mkvmerge also sets cues and duration, which MediaRecorder, at least in Chromium, does not).

I really like this proposal – it would be very useful for several of my plans for Wikipedia’s video support, including in-browser transcoding on upload and realtime composition and transitions in a video editor… as long as it’s possible to manipulate and synthesize the data in a DecodedVideoFrame.

Currently it looks like there’s no way specified to manipulate one other than to pass it into stuff for playback or recording, and no way to create one except through decoding a compressed frame.

Ideally, I’d be able to get at the pixels in a decoded frame so I can do something custom with them (recode them manually, or combine with another decoded frame or a generated image to create a transition or visual effect) and then send that on.

Would you consider adding pixel-data getters and a constructor for DecodedVideoFrame, or would another way of doing these be preferable? Thanks!

[edit: It occurs to me that some kind of composition of this proposal with something like [Proposal] Allow Media Source Extensions to support demuxed and raw frames may be a happy union on that front. :slight_smile: ]

Re: composition with MSE. Yes, I had that thought as well. If we end up with standardized definitions for an EncodedPacket and a DecodedFrame, we could add append methods for arrays of those to MediaSource SourceBuffer objects.

It’s possible that has a longer standardization process, since it requires new APIs versus an extension to the byte stream registry. It’s also unclear exactly how a wasm decoder would be able to directly write into a JS object. Possibly the object can use ArrayBufferViews that point into wasm memory.
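For what it’s worth, the ArrayBufferView idea amounts to a typed-array view over the module’s linear memory; a minimal sketch, where the export names are assumptions about what a wasm decoder might expose:

```js
// Minimal sketch: zero-copy view over wasm linear memory.
// decodeNextFrame() and frameLength() are assumed exports of a
// hypothetical wasm decoder; 'memory' is its WebAssembly.Memory.
const memory = wasmInstance.exports.memory;
const framePtr = wasmInstance.exports.decodeNextFrame(); // byte offset of the frame
const frameLen = wasmInstance.exports.frameLength();

// This view aliases the decoder's output buffer directly; no copy is made.
const pixels = new Uint8Array(memory.buffer, framePtr, frameLen);
```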

(Let’s keep subsequent discussion of that on the other proposal. Thanks!)

It should be possible to get to raw audio through WebAudio. It should also be possible to go through a canvas to get to the raw pixel data, but that’s rather hacky. We have been discussing better ways to get access to raw pixel data, but it’s complicated to do it in a way that’s easy and fast.
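For completeness, the canvas route looks roughly like this, assuming videoElement is an HTMLVideoElement paused on the frame of interest:

```js
// The "hacky" canvas path: draw a frame into a 2D canvas and read it back.
const canvas = document.createElement('canvas');
canvas.width = videoElement.videoWidth;
canvas.height = videoElement.videoHeight;

const ctx = canvas.getContext('2d');
ctx.drawImage(videoElement, 0, 0);

// RGBA bytes for the whole frame; the readback and format conversion are
// part of what makes this path slow.
const { data: pixels } = ctx.getImageData(0, 0, canvas.width, canvas.height);
```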