[Proposal] Array overload for SourceBuffer.appendBuffer

Background

Low-latency streaming is becoming increasingly common in today’s MSE video players. The prevailing mechanism for low-latency streaming, chunked transfer encoding (CTE), involves clients receiving media segments in a series of small chunks as opposed to whole segments. Chunks can range in size depending on connection speed, but on a good connection I’ve seen them at around 16kb. A 1mb media segment will therefore be received in roughly 64 chunks, and roughly that many chunks will eventually be appended to the SourceBuffer (this depends on player implementation, but let’s assume it for now).

Problem

Appending to an MSE SourceBuffer is not instantaneous - it’s an asynchronous operation with scheduling overhead. According to @Matt_Wolenetz, this scheduling overhead is the same for appends up to 128kb:

During the async parsing, Chrome’s MSE implementation chunks the appended bytes by 128KB, so if there is a tiny append, it will incur similar scheduling costs as a 128KB-minus-1-byte append, but a 128KB-plus-1-byte append will incur one additional potential point of contention with other main thread work (essentially, think of it as a setTimeout(processNext128KBChunk, 0)).

My bench testing (2018 MBP) showed about 30ms for each append of up to 128kb. In a non-low-latency player, this overhead is incurred only once per segment - but for low-latency, it can be incurred dozens of times (on the order of 64 × 30ms ≈ 2s of scheduling overhead for the 1mb segment above). And because MSE appends occur on the main thread (both the append and the subsequent updateend), each append has an opportunity to be blocked**. This can lead to extra buffer-empty events, and hotter devices due to more JS running.

** Won’t be as significant once MSE in workers lands

Solution

My proposed solution is to add an overload to SourceBuffer.appendBuffer() which would allow an ArrayBuffer[] argument (in addition to the existing ArrayBuffer arg). The expected UA behavior is to consume the array members in batches of up to 128kb, minimizing the scheduling overhead. While a SourceBuffer is updating, clients would be able to batch all incoming data and execute a single append on updateend.
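In TypeScript-style terms (a sketch only, not spec IDL), the shape would be roughly:

    // Illustrative shape of the proposal: the existing single-buffer form
    // plus an array form that the UA can batch internally.
    interface SourceBuffer {
        appendBuffer(data: ArrayBuffer): void;   // existing
        appendBuffer(data: ArrayBuffer[]): void; // proposed overload
    }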

Why batching in JS is not a good idea

It’s fair to say that clients are more than capable of batching chunks themselves (in fact, this is the existing behavior of Hls.js). However, this causes the client to frequently allocate TypedArrays (for the purpose of merging TypedArrays). Eventually this will lead to GC events which freeze the thread and block further appends. Done at an inopportune time (e.g. during a downswitch), this can cause a rebuffer. Clients could try to re-use TypedArrays in order to reduce allocations, but this becomes infeasible when passing data to/from a worker.
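For illustration, a sketch of the kind of merge a client has to do today (not Hls.js’s actual code): every flush allocates a fresh buffer and copies each pending chunk into it.

    // Allocates a new Uint8Array sized to hold all pending chunks and copies
    // each one in; the discarded buffers become work for the garbage collector.
    function mergeChunks(chunks: Uint8Array[]): Uint8Array {
        const total = chunks.reduce((sum, c) => sum + c.byteLength, 0);
        const merged = new Uint8Array(total);
        let offset = 0;
        for (const chunk of chunks) {
            merged.set(chunk, offset);
            offset += chunk.byteLength;
        }
        return merged;
    }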

Very Simple Example

// Assumes the proposed appendBuffer(ArrayBuffer[]) overload described above.
let batch: ArrayBuffer[] = [];

function appendChunk(chunk: ArrayBuffer) {
    batch.push(chunk);
    if (sb.updating) {
        return;
    }
    sb.appendBuffer(batch);
    batch = [];
}

function onUpdateEnd() {
    if (batch.length) {
        sb.appendBuffer(batch);
        batch = [];
    }
}

I’m not entirely following the gist of the proposed solution. It is already possible to pass an ArrayBuffer to appendBuffer().

Is the requirement to pass multiple discrete buffers to appendBuffer() at the same time?

Yes that’s the plan - will add an example. Also added a background section.

The code in the example is already possible today. If the input is discrete ArrayBuffers with metadata set at each instance (an initialization segment), the pattern in the example can be used right now. If the requirement is to pass multiple parts of a single media file stored as one ArrayBuffer, subarray() on a TypedArray view can be used to pass each part of the single buffer to appendBuffer(); in pertinent part:

    // Assumed context (not shown in the excerpt): ab is a Uint8Array over the
    // full media, trackOffsets lists segment byte lengths, start = 0, index = 0.
    const sourceOpen = e => {
      sourceBuffer = mediaSource.addSourceBuffer(mimeCodec);
      sourceBuffer.mode = "sequence";
      const init = ab.subarray(start, trackOffsets[index]);
      start = trackOffsets[index];
      ++index;
      sourceBuffer.appendBuffer(init);
    }
    const handleWaiting = e => {
      if (start < ab.byteLength) {
        console.log(start, index);
        const chunk = ab.subarray(start, start + trackOffsets[index]);
        sourceBuffer.appendBuffer(chunk);
        start += trackOffsets[index];
        ++index;
      } else {
        try {
          video.removeEventListener("waiting", handleWaiting);
          mediaSource.endOfStream();
        } catch (e) {
          console.error(e.stack);
          console.trace();
        }
      }
    }

Your example assumes that you know the size of ab beforehand, which you do not. This introduces a burden on the client to re-size ab in the case that it is not large enough to append the current chunk, which involves extra allocations. Furthermore, how does your system know that ab should free memory if it has been allocated more space than it needs?

Yes, that is the base case for the code in the example. The code was composed while attempting to work around a bug. Any values relevant to start and offset can be adjusted. The input can be an “infinite” stream. The pattern is dynamic.

This introduces a burden on the client to re-size ab in the case that it is not large enough to append the current chunk, which involves extra allocations.

An ArrayBuffer cannot be resized.

Furthermore, how does your system know that ab should free memory if it has been allocated more space than it needs?

Are you referring to SourceBuffer storage in the browser?

What is the input and the expected result?

The input can be an “infinite” stream. The pattern is dynamic.

But you still need to initially allocate ab to be large enough to handle all inputs, while also keeping it small enough that you’re not wasting memory. This does not seem trivial.

An ArrayBuffer cannot be resized.

What I mean by resize is transferring the current contents to a new, larger array.
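A minimal sketch of that (grow is a hypothetical helper, not anything in the code above):

    // "Resizing" here means allocating a larger array and copying the old
    // contents over; the previous buffer is left for the garbage collector.
    function grow(current: Uint8Array, neededBytes: number): Uint8Array {
        const next = new Uint8Array(Math.max(current.byteLength * 2, neededBytes));
        next.set(current);
        return next;
    }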

Are you referring to SourceBuffer storage in the browser?

No, I’m referring to the ArrayBuffer allocated in JS.

Maybe having appendBuffer accept a ReadableStream is a better idea? Seems like it would be more work to implement, though.
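Purely hypothetical, since no such signature exists today, but roughly:

    // Hypothetical usage if appendBuffer() accepted a ReadableStream; the UA
    // would pull and batch the chunks internally instead of the client.
    const response = await fetch(segmentUrl); // segmentUrl is a placeholder
    sb.appendBuffer(response.body);           // not valid with today's API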

Is the input in this case known or unknown?

Two input options

  • Known number of discrete parts
  • Unknown number of discrete parts

For one of the two input options (a known number of discrete parts), the rate of expected output, expected total duration, expected total memory usage, and the static hardware, software, API, data size and type limitations can be calculated with regard to the requirement of streaming media “seamlessly” on the given device.

The common denominator for either input option is that there MUST be a 0: currentTime 0 | byte 0 | frame 0 | unknown data 0 | read 0 | write 0. From that mathematical certainty of 0, the span from when the first discrete part is input to produce output until the last part of the input is output can be profiled for memory usage - at that specific device.

In general, it is not possible to determine the maximum input and total memory usage for every device, network configuration, and implementation of a specified API without testing the same code on each device, network, and implementation, at least until the code and device break.

An ArrayBuffer cannot be resized.

What I mean by resize is transferring the current contents to a new and larger array

Either the input is a single ArrayBuffer or multiple ArrayBuffers. The ArrayBuffer(s) can each be either an initialization segment or a part of the media following an initialization segment.

Why does the current content need to be transferred to a new array?

Does the application have control of input?

Are you referring to SourceBuffer storage in the browser?

No, I’m referring to the ArrayBuffer allocated in JS.

It should be possible to test the limitations of the target devices then write code which outputs the expected result.

Maybe having appendBuffer accept a ReadableStream is a better idea? Seems like it would be more work to implement, though.

It is possible to use a ReadableStream or an async generator and iterator (https://next.plnkr.co/edit/KXdaXG?preview) to input media to appendBuffer(). FWIW there are several example approaches to streaming media using MediaSource and MediaStream at https://github.com/guest271314/MediaFragmentRecorder/issues/8. The code in the previous comment can be used to create a subarray of the input having the exact byte range to match the device and implementation (Chrome, Firefox, et al.) configuration.
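A minimal sketch of that pattern (assuming a fetch()-backed stream and an already-open SourceBuffer sb; not the exact code from the linked examples):

    // Pull one chunk at a time from the response body and wait for
    // 'updateend' before issuing the next append.
    async function streamInto(sb: SourceBuffer, url: string) {
        const body = (await fetch(url)).body;
        if (!body) return;
        const reader = body.getReader();
        while (true) {
            const { done, value } = await reader.read();
            if (done || !value) break;
            sb.appendBuffer(value);
            await new Promise<void>(resolve =>
                sb.addEventListener("updateend", () => resolve(), { once: true }));
        }
    }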

From experimenting with MediaSource, the waiting event of HTMLMediaElement (see the handleWaiting function in the code at the previous comment, e.g., video.onwaiting = handleWaiting) is a means to programmatically append correctly sized buffers to the SourceBuffer at the necessary time during playback.

An example of using ReadableStream with MediaSource: http://next.plnkr.co/edit/fIm1Qp?p=preview&preview

I think we’re talking past each other - my point is that streaming many small appends to MSE is inefficient, and that the problem is best solved by the UA rather than the client. Your example is using a ReadableStream, but it is still doing multiple small writes in a cycle with the updating flag, so it does not solve the stated problem.