Background
Low-latency streaming is becoming increasingly common in today’s MSE video players. The prevailing mechanism for low-latency streaming, chunked transfer encoding (CTE), involves clients receiving media segments in a series of small chunks rather than as whole segments. Chunk sizes vary with connection speed, but on a good connection I’ve seen them around 16kb. At that size, a 1mb media segment arrives in roughly 63 chunks, and roughly 63 chunks will eventually be appended to the SourceBuffer (this depends on player implementation, but let’s assume this for now).
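For context, here’s a minimal sketch of how a low-latency client might receive those chunks through the Fetch API’s ReadableStream interface (segmentUrl and onChunk are hypothetical names for illustration, not real player APIs):

// Sketch: reading a CTE segment chunk-by-chunk via fetch().
async function streamSegment(segmentUrl: string, onChunk: (chunk: Uint8Array) => void) {
  const response = await fetch(segmentUrl);
  const reader = response.body!.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    onChunk(value); // each read typically yields one small transfer chunk
  }
}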
Problem
Appending to an MSE SourceBuffer is not instantaneous - it’s an asynchronous operation that carries scheduling overhead. According to @Matt_Wolenetz, this scheduling overhead is the same for any append of up to 128kb:
During the async parsing, Chrome’s MSE implementation chunks the appended bytes by 128KB, so if there is a tiny append, it will incur similar scheduling costs as a 128KB-minus-1-byte append, but a 128KB-plus-1-byte append will incur one additional potential point of contention with other main thread work (essentially, think of it as a setTimeout(processNext128KBChunk, 0)).
My bench testing (2018 MBP) showed about 30ms for each append of up to 128kb. In a non-low-latency player, this overhead is incurred only once per segment - but in a low-latency player, it can be incurred dozens of times. And because MSE appends occur on the main thread (both the append and the subsequent updateend event), each append is an opportunity to be blocked**. This can lead to extra buffer-empty events, and hotter devices due to more JS running.
** Won’t be as significant once MSE in workers lands
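For reference, here’s a rough sketch of how a per-append measurement like that could be taken - timing the gap between appendBuffer() and the matching updateend event (sb is assumed to be an existing SourceBuffer):

// Rough micro-benchmark sketch: time from appendBuffer() to updateend.
function timeAppend(sb: SourceBuffer, data: ArrayBuffer): Promise<number> {
  return new Promise((resolve) => {
    const start = performance.now();
    sb.addEventListener('updateend', () => {
      resolve(performance.now() - start);
    }, { once: true });
    sb.appendBuffer(data);
  });
}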
Solution
My proposed solution is to add an overload to SourceBuffer.appendBuffer() which would accept an ArrayBuffer[] argument (in addition to the existing ArrayBuffer arg). The expected UA behavior is to consume the array members in batches of up to 128kb, minimizing the scheduling overhead. While a SourceBuffer is updating, clients would be able to batch all incoming data and execute a single append on updateend.
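In TypeScript terms, the proposed shape might look something like this (a sketch of the idea, not spec text):

// Sketch: the proposed overload alongside the existing signature.
interface SourceBuffer {
  appendBuffer(data: BufferSource): void;  // existing
  appendBuffer(data: ArrayBuffer[]): void; // proposed: UA consumes members in batches of up to 128kb
}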
Why batching in JS is not a good idea
It’s fair to say that clients are more than capable of batching chunks themselves (in fact, this is the existing behavior of Hls.js). However, this forces the client to frequently allocate TypedArrays (in order to merge the batched chunks into one), as sketched below. Eventually this leads to GC events which freeze the thread and block further appends. Done at an inopportune time (e.g. during a downswitch), this can cause a rebuffer. Clients could try to re-use TypedArrays in order to reduce allocations, but this becomes infeasible when passing data to/from a worker.
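For illustration, here’s roughly what that JS-side merge step looks like - note the fresh Uint8Array allocated on every batch (the function name is mine, not Hls.js’s):

// Sketch of JS-side batching: each merge costs a new allocation.
function mergeChunks(chunks: Uint8Array[]): Uint8Array {
  let total = 0;
  for (const c of chunks) total += c.byteLength;
  const merged = new Uint8Array(total); // fresh allocation -> GC pressure
  let offset = 0;
  for (const c of chunks) {
    merged.set(c, offset);
    offset += c.byteLength;
  }
  return merged;
}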
Very Simple Example
// sb is an existing SourceBuffer; onUpdateEnd is registered on its 'updateend' event.
const batch: ArrayBuffer[] = [];

function appendChunk(chunk: ArrayBuffer) {
  batch.push(chunk);
  if (sb.updating) {
    // SourceBuffer is busy; the chunk stays batched until updateend.
    return;
  }
  flushBatch();
}

function onUpdateEnd() {
  if (batch.length) {
    flushBatch();
  }
}

function flushBatch() {
  // Proposed overload: drain the batch and append it in a single call.
  sb.appendBuffer(batch.splice(0));
}