Hint attribute in WebRTC to influence underlying audio/video buffering


In WebRTC it is currently up to the User Agent to decide how much or little audio/video received on the network to buffer before playout. More buffering increases the likelihood of smooth playout but increases the playout delay. The User Agent makes the decision based on network conditions, internal bandwidth estimation, congestion control mechanisms, etc. This is an implementation-specific heuristic that has been fine-tuned with video conferencing use cases in mind. However, different applications may have different preferences for the tradeoff between latency and smoothness. The JavaScript application has no means to influence this decision today.

Objective

We want to provide a means for JavaScript applications to express a preference for how quickly received audio or video is rendered. Rendering as fast as possible benefits applications that focus on real-time interaction. For others, additional buffering may provide a smoother experience when network issues occur.

Use cases

Cloud gaming is a good example of an application that requires a very high level of interactivity. At 60 FPS each frame lasts about 16.7 milliseconds, and we would like to give the User Agent a hint to render media as fast as possible without hurting the user experience.

On the other hand, for live streaming an additional 500 to 1500 milliseconds of delay probably won’t hurt the live experience, but it would allow smoother playback: when small network issues happen, there is still some buffered data to play.
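The two use cases above could be captured as presets that an application maps to a hint value. The following is only a minimal sketch: the names PLAYOUT_DELAY_PRESETS, playoutDelayHintFor and frameIntervalMs are hypothetical, the preset values are illustrative rather than recommendations, and it assumes that a hint of 0 is read as "as little buffering as possible".

```javascript
// Hypothetical presets, in seconds. The values are illustrative only.
const PLAYOUT_DELAY_PRESETS = {
  // Assumption: 0 asks for as little buffering as possible.
  'cloud-gaming': 0,
  // Falls inside the 0.5 to 1.5 second range discussed for live streaming.
  'live-streaming': 1.0,
};

// Returns the hint for a known use case, or null ("no application
// preference") when the use case has no preset.
function playoutDelayHintFor(useCase) {
  return Object.prototype.hasOwnProperty.call(PLAYOUT_DELAY_PRESETS, useCase)
    ? PLAYOUT_DELAY_PRESETS[useCase]
    : null;
}

// Duration of a single frame in milliseconds at the given frame rate.
function frameIntervalMs(fps) {
  return 1000 / fps;
}
```

Returning null for unknown use cases keeps the default User Agent heuristic in place unless the application has an explicit preference.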

API Surface

partial interface RTCRtpReceiver {
  attribute double? playoutDelayHint;
};

playoutDelayHint is measured in seconds. A null value corresponds to the current default User Agent behavior, in other words “no application preference”. More details can be found in this spec where we collect WebRTC extensions.
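Because the attribute is an extension, an application may want to check for it before use and be able to return to the default behavior. The sketch below illustrates that idea under stated assumptions: setPlayoutDelayHint is a hypothetical helper, and rejecting negative or non-finite values up front is a choice made for this example rather than behavior taken from the proposal.

```javascript
// Sets the hint (in seconds) on every receiver that exposes the
// playoutDelayHint attribute and returns how many receivers were updated.
// Passing null restores the default User Agent behavior.
function setPlayoutDelayHint(receivers, seconds) {
  // Validation added for this sketch only (an assumption, not spec text).
  if (seconds !== null && !(Number.isFinite(seconds) && seconds >= 0)) {
    throw new TypeError('hint must be null or a non-negative number of seconds');
  }
  let updated = 0;
  for (const receiver of receivers) {
    // Feature detection: skip receivers without the attribute.
    if ('playoutDelayHint' in receiver) {
      receiver.playoutDelayHint = seconds;
      updated += 1;
    }
  }
  return updated;
}
```

With a real connection this might be called as setPlayoutDelayHint(pc.getReceivers(), 0.5), and later as setPlayoutDelayHint(pc.getReceivers(), null) to drop the preference again.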

We have experimented with it in Chrome for half a year, and additional buffering indeed shows measurable benefits in applications that don’t require an “as fast as possible” level of interactivity.

Example Usage

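// Here |pc| represents a peer connection
// with remote audio and video streams attached.
const pc = new RTCPeerConnection();
// ... set up the connection with remote audio and video.
// The order of getReceivers() depends on how the transceivers were
// created, so select the receivers by track kind.
const receivers = pc.getReceivers();
const audioReceiver = receivers.find((r) => r.track.kind === 'audio');
const videoReceiver = receivers.find((r) => r.track.kind === 'video');
// Add an additional 500 milliseconds of buffering.
audioReceiver.playoutDelayHint = 0.5;
videoReceiver.playoutDelayHint = 0.5;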

Links