This is a proposal to transfer the mst-content-hint repo to the WICG. It is currently available in a readable format here.
Multiple MediaStreamTrack consumers, such as the MediaRecorder and PeerConnection APIs, do not have strictly defined behaviors for how to process incoming content. The best way to process it depends on the type of content that’s being encoded, processed, or transmitted.
For audio: Applying noise suppression improves speech intelligibility but removes drum snares from music. Packet loss concealment hides lost packets by smearing or time-stretching samples. This disrupts rhythm in music and might not be the best way to conceal packet loss there.
For video: Highly detailed content (webpages with a lot of text, line art, etc.) cannot be downscaled or highly quantized without losing intelligibility. For fluid content (movies/games), degrading by dropping resolution/detail is acceptable, so fluid content can preserve a higher framerate without losing intelligibility.
For RTCRtpSender there’s degradationPreference for framerate/resolution, but maximum quantization levels, for instance, are not defined and differ for detailed content. As far as I’m aware, the audio side has no standard knobs/buttons for turning audio processing on or off either. A content hint would also be useful for PeerConnection and MediaRecorder (without having to propose the same knobs there).
Blink currently treats tab capture/desktop capture as “screenshare” to preserve the detailed content of websites/presentations, and treats USB video as non-screenshare (webcam). This assumption is not always correct: it results in a bad experience when screencasting videos/games, and it mistakes an HDMI capture card for a webcam rather than a monitor input.
When the application can make a reasonable guess (a game live-streaming service likely selects “fluid”, an audio workstation likely selects “music”), or when it offers an option in its UI, underlying track consumers can handle the content in a better way. When the application has no idea, it can leave the value unset as “”, and content will be treated the same way as it is today.
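As a rough sketch of how an application might pick a hint: the snippet below assumes the hint is exposed as a writable `contentHint` string on the track, and uses only the “fluid”, “music” and “” values mentioned above. The `HintableTrack` interface, the `AppProfile` names, and the `guessHint`/`applyHint` helpers are hypothetical and exist only for illustration; they are not part of the proposal.

```typescript
// Minimal structural stand-in for a MediaStreamTrack, so the sketch runs
// outside a browser. Assumption: the hint is a writable string attribute.
interface HintableTrack {
  kind: "audio" | "video";
  contentHint: string;
}

// Hypothetical application profiles, used only for this illustration.
type AppProfile = "game-streaming" | "audio-workstation" | "unknown";

// Make a reasonable guess from what the application knows about itself.
// Returns "" when there is no good guess, which keeps today's behavior.
function guessHint(kind: "audio" | "video", profile: AppProfile): string {
  if (profile === "game-streaming" && kind === "video") return "fluid";
  if (profile === "audio-workstation" && kind === "audio") return "music";
  return "";
}

// A choice made by the user in the application's UI wins over the guess.
function applyHint(
  track: HintableTrack,
  profile: AppProfile,
  userChoice?: string
): void {
  track.contentHint =
    userChoice !== undefined ? userChoice : guessHint(track.kind, profile);
}
```

In a browser, a real track obtained from a capture API would take the place of the `HintableTrack` object, assuming the attribute ships under this name.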
Instead of adding buttons/knobs for all of these features (and future features that are applicable to speech but not music) across all track consumers, we propose a simpler hint that can guide implementations. This hint is also significantly simpler to understand for a web developer without a video encoding background than video-encoder parameters such as max quantization.
This API would not replace adding knobs/buttons to interfaces to turn specific features on or off; it would help guide implementations where this behavior is not defined, or for values that are not overridden by the user. It should not override behavior specified on the track consumer, e.g. setting a content hint should not invalidate standards compliance in consumers.
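The precedence described above could be sketched as follows, using degradationPreference as the example knob. This is an illustration under assumptions, not how any implementation behaves: the “detailed” hint string, the mapping from hint to preference, and the use of "balanced" as a placeholder for the implementation's existing default are all assumed, and `ConsumerSettings` and `effectiveDegradation` are hypothetical names.

```typescript
// The degradation preference values used by WebRTC senders.
type DegradationPreference =
  | "maintain-framerate"
  | "maintain-resolution"
  | "balanced";

// Hypothetical view of what the application explicitly set on the consumer.
interface ConsumerSettings {
  degradationPreference?: DegradationPreference;
}

// An explicit consumer setting always wins. The hint only fills the gap
// when nothing was specified. Assumed mapping: fluid content keeps
// framerate, detailed content keeps resolution.
function effectiveDegradation(
  settings: ConsumerSettings,
  contentHint: string
): DegradationPreference {
  if (settings.degradationPreference !== undefined) {
    return settings.degradationPreference;
  }
  switch (contentHint) {
    case "fluid":
      return "maintain-framerate";
    case "detailed":
      return "maintain-resolution";
    default:
      // Unset or unknown hint: placeholder for the existing default.
      return "balanced";
  }
}
```

The same shape would apply to audio processing knobs: the hint supplies a default only where the consumer has no explicit setting.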