Request For Comments: switching/resizing heaps in asm.js

luke · 2014-09-19

We’ve had several feature requests for adding the ability to resize the heap of an asm.js module. Right now, the asm.js rules force you to pick a fixed heap size ahead of time and this is undesirable when the module needs to manipulate a dynamic amount of data and run reliably on 32-bit devices. We have a proposal to extend asm.js to allow changing heaps and we’d appreciate any feedback. This has not been implemented in Firefox yet, so we’re definitely open to changes.

The proposal is to allow a special heap-changing function in asm.js that allows the caller to pass in a new ArrayBuffer that replaces the current heap encapsulated in the asm.js module’s closure state. This heap-changing function would have to take a very specific form so that AOT compilation cannot possibly be invalidated by pathological callers. The specific form proposed is:

function asmModule(stdlib, foreign, buffer) {
    "use asm";
    ...
    var Int32Array = stdlib.Int32Array;
    ...
    var byteLength = stdlib.byteLength;
    ...
    function changeHeap(newBuffer)
    {
        if (byteLength(newBuffer) &    0xffffff ||
            byteLength(newBuffer) <=   0xffffff || 
            byteLength(newBuffer) >  0x80000000)
            return false;
        heap32 = new Int32Array(newBuffer);
        ...
        buffer = newBuffer;
        return true;
    }
    ...

In particular, the byteLength and changeHeap definitions must be exactly the same as the above code, modulo:

comments, whitespace, ASI, empty statements and unnecessary parens/curlies (as implied by Section 4)
the literals in the condition must be asm.js integer literals (so base 16, base 10, and exponential forms are acceptable)
all names except the stdlib ‘byteLength’ property name (more on this in a second)
the numeric literals: the first literal must include bits 0xffffff; the second literal must be >= 0xffffff; the third literal must be <= 0x10000000.

Notes:

Additionally, the changeHeap function must be the first function in the asm.js module (if it is present at all). The byteLength import can occur one or more times anywhere in the global imports section.
All views in the global scope must be replaced (in the order they were imported); it is not possible to simultaneously have views into two separate heaps.

So what’s the deal with ‘byteLength’? We need the ability to

verify that ‘newBuffer’ is a real ArrayBuffer,
get its byteLength.

The ArrayBuffer.prototype.byteLength accessor property’s getter achieves both these effects (it is a non-generic getter, so it throws if ‘this’ isn’t an ArrayBuffer) but we can’t just call ‘newBuffer.byteLength’ since … JS. Instead, we require the caller to pass in the result of:

Function.prototype.call.bind(Object.getOwnPropertyDescriptor(ArrayBuffer.prototype, 'byteLength').get)

which is validated at link-time. If this validates, we have a foolproof way to solve problems (1) and (2) above. Note: on Safari and IE, ArrayBuffer.prototype.byteLength is currently a data property. For these, a polyfill would simply be:

function byteLength(buf) { return buf.byteLength }

Should these JS engines want to take advantage of asm.js for AOT, though, they’d need to implement byteLength according to 24.1.4.1 in the current ES draft.

So why do we need to check those three byteLength conditions?

A non-multiple-of-16mb size may inhibit bounds-checking optimizations on different platforms (see this topic).
The <= 0xffffff check ensures two things. First, since it is possible to have a zero-byte-length buffer, it prevents shrinking the heap below the link-time minimum heap size. Second, since the length can be any integer greater than 0xffffff, this gives a guarantee to AOT compilation that bounds checks can be removed for constant heap accesses below that length.
The > 0x10000000 check ensures that an asm.js module always has a full view of the heap, even if the engine allows ArrayBuffers larger than 2gb.

In particular, the min/max length requirements in change-heap establish a minimum-heap requirement for the module:

Constant heap accesses must be below this limit to validate.
Link-time validation fails if the initial ArrayBuffer is below the min or above the max.

In addition to the validation-time and link-time requirements listed above, there is one more validation-time requirement: non-builtin function calls are not allowed in any nested expression of a MemberExpression (see ValidateHeapAccess). The reason for this is that the order-of-evaluation rules of JS require that, in both of:

HEAP32[f()] = 0
HEAP32[i>>2] = f()

HEAP32 is evaluated before f(). Thus, if calling f() ultimately calls changeHeap, HEAP32 must refer to the original heap. Cutting out function calls rules out this corner case. This limitation applies to FFI calls, internal (normal) calls, and function-pointer calls, but not builtin calls (since none of them can reenter). Note: to avoid invalidating existing asm.js, this validation rule would only apply if a changeHeap function was defined.

Lastly, how would changeHeap be used in practice?

If an asm.js heap needs to grow (e.g., malloc() runs out of buffer): allocate a new ArrayBuffer, copy over the contents of the old ArrayBuffer, and changeHeap to the new ArrayBuffer. This can be done from an FFI called by the asm.js malloc() implementation.
If TC39 accepts the ArrayBuffer.transfer proposal, ArrayBuffer.transfer could be used instead of copying. Until ArrayBuffer.transfer is ubiquitous, applications would be advised to feature-test for ArrayBuffer.transfer and fall back on copying.
changeHeap can be called with any ArrayBuffer, not just resized-from-previous heaps. This allows things like passing heaps around between workers (via postMessage-transfer) and without having to re-link an asm.js module each time.

ben · 2014-09-23

Looks good to me! Having a “bound” object opens a whole new world of possibilities…

A few questions:

Why the “bound” object is in the stdlib namespace? Not trying to bikeshed, but the fact that this bound value isn’t a classic property of a global makes me think it would be in the FFI namespace. In the meanwhile, as it is something that relates to properties of a global (in this case, Array.prototype.byteLength), that also makes sense. I don’t feel strongly about it, would just like to understand the rationale.
IIUC, one would need to replace all views on the buffer first, and then the buffer at the end, is that right? We should make it clear that two heaps can’t coexist in the closure state.
IIUC, the only way to call the changeHeap function is from a FFI, so this function needs to be exported if it’s defined. Regarding forbiding calls in heap indexes, could we restrict this to “not calling FFI functions or functions that call FFI”? (such an analysis seems doable, even in parallel compilation modes) Or is it just too complicated and we can just use instead

 HEAP32[x] = 0```

luke · 2014-09-23

Answering your questions:

You’re right, ‘bound’ is a bit arbitrary. I definitely wanted something on the stdlib object because stdlib imports are the only ones that imply link-time validation rules. Perhaps stdlib.byteLength?
Yes; I meant to imply that with the …, but you’re right we should be explicit about that.
It seems simpler to specify “no calls” since this is a simple, local syntactic property (and, for Emscripten to take advantage of this, it would have to do the same analysis and then what happens if we come to slightly different results…)

abchatra · 2014-09-23

Thanks Luke for the detailed information.

Overall it looks good for me. Though I am not sure I like byteLength on stdlib. This is bit harder to understand as it is entirely different from current set of methods on stdlib. I don’t have any better alternative for the same (extra parameter to module may be an option?).

I have few questions:

• How do you plan to do the link time validation for stdlib.byteLength? Is the following right?

    - Check for target function of bound object to be Function.prototype.call.
    - Check for bound this to be ArrayBuffer.prototype.getLength.

• Is stdlib.bound.byteLength typo in your post? Is this same as stdlib.byteLength?

• I am good with removing calls altogether from inside the heap access. This allows generating heap access code much simpler.

• Link time validation requires heap size to be 2^n or 2^24 * n. How does this check correlate with that? (byteLength(newBuffer) & 0xffffff || byteLength(newBuffer) < 0xffff)

• Also I believe you avoided bounds check for constant index access on heap by doing link time validation. By allowing resizing to lower sized heap that can’t be done anymore. Is that right?

• What is atm in “IE atm”?

luke · 2014-09-23

You’re right that ‘stdlib.byteLength’ is awkward. I feel like it should hang off stdlib somewhere, since stdlib implies link-time validation constraints and foreign is a free-for-all. For a while I had stdlib.bound.byteLength, but that also seemed arbitrary (Benjamin commented above). The important thing is that the stdlib argument doesn’t have to be the actual window object; it can be any mocked-up JS object (and that is actually what Emscripten does by default) so there should never be a hard name conflict. What is your preference?

Answering questions:

That is exactly what I was thinking.
Yes, that is a typo from my first iteration which named the import ‘stdlib.bound.byteLength’, where bound was any ordinary JS object with data property ‘byteLength’.
I would have done that, except it would cause all existing large asm.js apps to stop validating. We did that a few times in the early days, but I was hoping to not do it again. Any strong feelings on this?
It is a simplification of those rules: 2^24*n means the low 24 bits are clear, which is what the mask in the first disjunct tests. Thus, we lose the ability to resize to sizes 2^n below 16mb, but avoid a more complicated pattern. My assumption is that it’s not a big loss to require that the smallest heap you can resize to is 16mb. But perhaps it’s more regular to keep the changeHeap test more in sync with the link-time validation rules? What do you think? As for the second disjunct, the goal was to provide a reliable minimum heap length (so that bounds checks at constant heap indices below this limit could have their bounds checks removed AOT). (This constant can be anything; Emscripten would pick the smallest limit that included all constant heap accesses.) But, given that byteLength can’t be zero and the first disjunct thus requires at least 16mb, it seems like this disjunct should be optional (and, when present, not have a lower bound). Sound good?
This is what the byteLength(newBuffer) < C disjunct allows: any constant heap access below C can have bounds-checks removed and be safe from changeHeap.
Oops, I was unclear: “atm” was short for “at the moment” and I meant to write “Safari and IE, at the moment, …”. Clarified now.

abchatra · 2014-09-24

Thanks Luke. I am good with stdlib.byteLength. Seems logical for link time validation. Overall proposal sounds good.

FremyCompany · 2014-09-26

Looks interesting to me too, though it is unfortunate that the IE and Safari have a different byteLength implementation.

Also, instead of disallowing all function calls on the lefthand side of an assignement, wouldn’t it be possible to use tainting (aka check which functions call “changeHeap” and which functions call these functions… until the set of such functions is stable)? This would look much better to me.

luke · 2014-09-26

I imagine the variance on byteLength is temporary; typed arrays were only recently imported into the ECMAScript spec and I assume things were more ambiguous in the Khronos spec. Also, Emscripten should be able to easily polyfill over this problem in the interim.

As to the second point: that would require an interprocedural analysis (which would have to be conservative in case of, e.g., function-pointer calls) which would make compilation and the specification more complex (you’d need to exactly what the analysis did). There is likely zero real-world size increase due to the lost expression-inlining ability and, with modern SSA-based compiler backends, no performance loss from the separated assignment.

FremyCompany · 2014-09-26

Yes, you’re right it would make things more complex, and since most ASM is autogenerated anyway, it doesn’t annoy to much to simply remove function calls from them.

luke · 2014-09-26

[Updated] I just learned that ArrayBuffer byteLength can be zero. Thus, we need both the mask and <= byteLength checks.

luke · 2014-09-26

A few more tweaks made to the proposal found while implementing that generally simplify things:

I think it is simpler to treat byteLength as an ordinary global import (with a magic type like fround) which means it can be imported anywhere, possibly multiple times. (The change heap function must still be before all other functions to provide AOT bounds-check-elimination.)
Heap views have to be replaced in the order they are declared in the change-heap function.
For the same reason HEAP32[(f()|0)>>2] is disallowed, so is “HEAP32[i>>2] = (f()|0)”.

Let me know if there are any problems with any of these changes (or the previous minimum-heap-length change).

azakai · 2014-09-27

I don’t understand what the last tweak means?

luke · 2014-09-27

Sorry, I rushed that a bit. Updated previous comment.

luke · 2014-10-09

An initial implementation of exactly what is in the proposal above landed in Firefox Nightly and should ship with Firefox 35. (Wow. Such AST matching.) Feel free to play with it and post feedback here.

IIUC (correct me if I’m wrong Alon), the tentative plan is for Emscripten to allow resizable heaps by default (so as to avoid requiring users to specify the heap size up-front) once this optimization is in Firefox release. Comments welcome on this timing, though.

Alon, do you think you could post a link to a public demo that uses change-heap so we could try it out?

azakai · 2014-10-09

Well, we would encourage it more when browsers are ready I suppose, but not sure we would want it on by default? It does inhibit some amount of optimizations. (My worry is that if it’s on by default, almost no one that doesn’t need a resizable heap will disable it and get those opts; whereas if it’s off by default, people hit the memory error so they will know they need to enable it.) On the other hand, the cost is about 0.5% in code size. Perhaps not a big deal, I guess.

In any case, whether this is on or off by default in emscripten would end up being whatever the emscripten community prefers, we’ll ask on the mailing list.

What kind of demo do you want? Just a random demo that happens to be built this way, or something that actually grows the heap over time?

luke · 2014-10-09

It doesn’t disable anything on the browser side. A 0.5% code size increase seems reasonable (is this compressed?). Also, while flagrant heap-too-small bugs are easy to find/fix, I expect (1) that leaves rare spike cases that end up shipping (2) it forces devs to claim more address space than they normally need (leading to higher incidence of virtual-address-space-OOM on 32-bit). Asking the list seems like a good idea, though.

Ideally something that does grow, if it’s not too much trouble. It’d be nice to see it in action

luke · 2014-10-13

I realized that, if engines start allowing >2gb ArrayBuffers in the future (none appear to, atm, but the spec indicates ArrayBuffer lengths can be up to 2^53-1), it would be really nice to have a maximum-heap-length requirement as well. This avoids the unpleasant situation where (due to the signed right-shift in heap access) asm.js cannot access the full range of an ArrayBuffer which in turn poses problems for the use of memory protection tricks to avoid bounds checks.

Thus, the ‘if’ in the change-heap function would have a third disjunct:

|| byteLength(buffer) > 0x80000000

In the future, if we want to allow asm.js to access a full [0, 4gb) range, we could allow a bigger max-length constant which, if used, would require all heap accesses to use unsigned right shift.

azakai · 2014-10-15

I uploaded a demo of bullet that uses memory growth at

http://kripken.github.io/ammo.js/examples/growable/ammo.html

Pick a large number of cubes and look in the web console, for 1500 for example it should log out multiple memory growth events.

This doesn’t contain the last change. It sounds fine to me though.

luke · 2014-10-15

Thanks Alon! I’ve heard no other objections, so I planning to land the max-length change soon.

luke · 2014-10-21

This feature (along with the max-length tweak) is now in both FF Nightly (36) and Aurora (35). ArrayBuffer.transfer landed in FF Nightly (and will stay #ifdef’d to Nightly until it makes progress in TC39) which allows for comparison of heap resize with and without AB.transfer (the demo feature-tests and prints timing/feature-test results to console.log). Anecdotally: on my machine, with AB.transfer, change-heap usually takes 4-5ms (regardless of size), whereas copying takes anywhere between 10 and 100ms.