A partial archive of discourse.wicg.io as of Saturday February 24, 2024.

Media timed events API for MPEG DASH MPD and emsg events

chrisn
2018-10-15

There is a need in the media industry for an API to support metadata events synchronized to audio or video media, specifically for both out-of-band event streams and in-band discrete events (e.g., MPD-carried events and emsg events in MPEG DASH: http://standards.iso.org/ittf/PubliclyAvailableStandards/c065274_ISO_IEC_23009-1_2014.zip).

These media timed events can be used to support use cases such as ad insertion or presentation of supplemental content alongside the audio or video.

On resource constrained devices such as smart TVs and streaming sticks, parsing media segments to extract event information leads to a significant performance penalty, which can have an impact on UI rendering updates if this is done on the UI thread. There can also be an impact on the battery life of mobile devices. Given that the media segments will be parsed anyway by the user agent, parsing in JavaScript is an expensive overhead that could be avoided.
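To illustrate the kind of work a web application has to do today, here is a minimal sketch of parsing a single emsg box in script. It is not taken from any player library. The names `EmsgBox`, `readCString`, and `parseEmsg` are hypothetical, and the sketch assumes the box starts at byte 0 of the buffer, uses a 32-bit box size, and carries ASCII strings (the format allows UTF-8).

```typescript
// Fields of an 'emsg' box. Property names are this sketch's own choice.
interface EmsgBox {
  version: number;
  schemeIdUri: string;
  value: string;
  timescale: number;
  presentationTimeDelta?: number; // present in version 0 boxes
  presentationTime?: number;      // present in version 1 boxes
  eventDuration: number;
  id: number;
  messageData: Uint8Array;
}

// Reads a null-terminated string starting at `offset`.
// Assumption: ASCII content only, to keep the sketch dependency-free.
function readCString(bytes: Uint8Array, offset: number): { text: string; next: number } {
  let end = offset;
  while (end < bytes.length && bytes[end] !== 0) end++;
  let text = "";
  for (let i = offset; i < end; i++) text += String.fromCharCode(bytes[i]);
  return { text, next: end + 1 }; // skip the terminating zero byte
}

// Parses one box at the start of `bytes`. Returns null if it is not an
// 'emsg' box or uses a version this sketch does not handle.
function parseEmsg(bytes: Uint8Array): EmsgBox | null {
  if (bytes.length < 12) return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const size = view.getUint32(0);
  const type = String.fromCharCode(bytes[4], bytes[5], bytes[6], bytes[7]);
  if (type !== "emsg") return null;
  const version = view.getUint8(8);
  const end = Math.min(size, bytes.length);
  let pos = 12; // size(4) + type(4) + version(1) + flags(3)

  if (version === 0) {
    // Version 0: strings first, then four 32-bit fields.
    const scheme = readCString(bytes, pos);
    const val = readCString(bytes, scheme.next);
    pos = val.next;
    const box: EmsgBox = {
      version,
      schemeIdUri: scheme.text,
      value: val.text,
      timescale: view.getUint32(pos),
      presentationTimeDelta: view.getUint32(pos + 4),
      eventDuration: view.getUint32(pos + 8),
      id: view.getUint32(pos + 12),
      messageData: bytes.subarray(pos + 16, end),
    };
    return box;
  }
  if (version === 1) {
    // Version 1: numeric fields first (64-bit presentation time), then strings.
    const timescale = view.getUint32(pos);
    // Combined into a Number; loses precision above 2^53.
    const presentationTime = view.getUint32(pos + 4) * 4294967296 + view.getUint32(pos + 8);
    const eventDuration = view.getUint32(pos + 12);
    const id = view.getUint32(pos + 16);
    const scheme = readCString(bytes, pos + 20);
    const val = readCString(bytes, scheme.next);
    return {
      version, schemeIdUri: scheme.text, value: val.text, timescale,
      presentationTime, eventDuration, id,
      messageData: bytes.subarray(val.next, end),
    };
  }
  return null;
}
```

A real application would also have to locate the emsg boxes inside each fetched segment before appending it to the media buffer, which is the per-segment overhead described above.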

The DataCue API has been previously discussed as a means to deliver in-band event data to Web applications, but this is not implemented in mainstream browser engines and is therefore not reflected in the WHATWG living HTML specification. There is previous discussion here, and an earlier liaison statement from HbbTV here. Whether DataCue should be taken up again or another API should be developed, we believe there is a recognized need for extending the existing HTML specification to assist web applications in being able to properly process and render media-timed events.

The Media & Entertainment Interest Group has a draft use case and requirements document here. Please note that this document describes a broader range of topics, and we only propose to work on MPEG DASH MPD and emsg events at WICG at this stage.

We ask for your support in progressing this, and look forward to working with you.

eric_carlson
2018-10-17

The DataCue API has been previously discussed as a means to deliver in-band event data to Web applications, but this is not implemented in mainstream browser engines and is therefore not reflected in the WHATWG living HTML specification.

WebKit supports DataCue.

The original interface was extended with two attributes to support non-text metadata, type and value:

interface DataCue : TextTrackCue {
    attribute ArrayBuffer data; // Always empty

    // Proposed extensions.
    attribute any value;
    readonly attribute DOMString type;
};

https://trac.webkit.org/browser/webkit/trunk/Source/WebCore/html/track/DataCue.idl

type: A string identifying the type of metadata:

"com.apple.quicktime.udta" - QuickTime User Data
"com.apple.quicktime.mdta" - QuickTime Metadata
"com.apple.itunes" - iTunes metadata
"org.mp4ra" - MPEG-4 metadata
"org.id3" - ID3 metadata

value: An object with the metadata item key, data, and optionally a locale:

value = {
    key: String,
    data: String | Number | Array | ArrayBuffer | Object,
    locale: String  // optional
}
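As a rough sketch of how an application might consume cues of this shape, the following models the type and value attributes as plain TypeScript interfaces and summarizes a cue. The names `DataCueValue`, `DataCueLike`, `describeCue`, and `cuesOfType` are hypothetical and are not part of WebKit or any specification. The browser wiring is shown only in comments, since it depends on a user agent that exposes DataCue.

```typescript
// Shape of the cue value described above; `locale` is optional.
interface DataCueValue {
  key: string;
  data: string | number | unknown[] | ArrayBuffer | object;
  locale?: string;
}

// Structural stand-in for a DataCue with the proposed extensions.
interface DataCueLike {
  startTime: number;
  endTime: number;
  type: string; // e.g. "org.id3"
  value: DataCueValue;
}

// Produces a one-line summary of a cue, handling each kind of `data`.
function describeCue(cue: DataCueLike): string {
  const data = cue.value.data;
  let summary: string;
  if (data instanceof ArrayBuffer) {
    summary = `${data.byteLength} bytes`;
  } else if (typeof data === "string" || typeof data === "number") {
    summary = String(data);
  } else {
    summary = JSON.stringify(data);
  }
  return `[${cue.type}] ${cue.value.key} = ${summary} @ ${cue.startTime}s`;
}

// Returns only the cues whose metadata type matches `type`.
function cuesOfType(cues: ArrayLike<DataCueLike>, type: string): DataCueLike[] {
  const out: DataCueLike[] = [];
  for (let i = 0; i < cues.length; i++) {
    if (cues[i].type === type) out.push(cues[i]);
  }
  return out;
}

// In a browser that exposes DataCue, wiring might look like this
// (assumption: `track` is a metadata TextTrack of the media element):
//
//   track.mode = "hidden";
//   track.addEventListener("cuechange", () => {
//     const active = track.activeCues;
//     if (active) cuesOfType(active, "org.id3").forEach((c) => console.log(describeCue(c)));
//   });
```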

This simple WebKit layout test loads various types of ID3 metadata from an HLS stream.

For more information, see this session from WWDC 2014.

chrisn
2018-12-07

At the WICG meeting at TPAC in Lyon we were approved to start an incubation. Is there anything more needed from me in order to set up a repository? I suggest we use the name ‘data-cue’. Thanks.

marcosc
2018-12-13

I’ve requested a Mozilla position on this proposal: https://github.com/mozilla/standards-positions/issues/122

chrisn
2018-12-17

An initial draft explainer is here: https://github.com/chrisn/datacue/blob/master/explainer.md

chrisn
2019-03-12

I have updated the explainer, PTAL: https://github.com/chrisn/datacue/blob/master/explainer.md

guest271314
2019-10-06

The Explainer contains this statement about VTTCue():

It also does not directly support in-band timed metadata.

VTTCue() can be created at any point, correct?

chrisn
2020-02-14

@guest271314 Sorry for the late reply. Yes, VTTCue can be created at any point. As this is currently the only cue type with a constructor, it’s the only way for web apps to create cues at the moment.
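A minimal sketch of that workaround: carrying application data in a text cue by serializing it to JSON. The helper names `addJsonCue` and `readJsonCue`, and the `TextCueLike`, `TextCueConstructor`, and `CueSink` types, are hypothetical. The cue constructor is passed in as a parameter so the sketch does not depend on a browser environment; in a browser it would be `VTTCue`.

```typescript
// Minimal structural view of a text cue (VTTCue has these properties).
interface TextCueLike {
  startTime: number;
  endTime: number;
  text: string;
}

// Any constructor with the VTTCue(startTime, endTime, text) signature.
type TextCueConstructor = new (startTime: number, endTime: number, text: string) => TextCueLike;

// Anything that accepts cues, such as a TextTrack.
interface CueSink {
  addCue(cue: TextCueLike): void;
}

// Creates a cue whose text is the JSON-serialized payload and adds it to the track.
function addJsonCue(
  CueCtor: TextCueConstructor,
  track: CueSink,
  startTime: number,
  endTime: number,
  payload: unknown
): TextCueLike {
  const cue = new CueCtor(startTime, endTime, JSON.stringify(payload));
  track.addCue(cue);
  return cue;
}

// Recovers the payload from a cue created by addJsonCue.
function readJsonCue(cue: TextCueLike): unknown {
  return JSON.parse(cue.text);
}

// Browser usage might look like this (assumption: `video` is an HTMLVideoElement):
//
//   const track = video.addTextTrack("metadata");
//   addJsonCue(VTTCue, track, 10, 15, { kind: "ad-break", id: 7 });
//   track.addEventListener("cuechange", () => { /* readJsonCue on active cues */ });
```

Serializing to text works for application-generated events, but it does not help with in-band events, which is the gap the quoted Explainer sentence points to.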