A way to use the DOM in a Web Worker

AshleyScirra · 2017-12-18

I’ve written an experimental library that uses Proxy objects to enable accessing the DOM APIs in a Web Worker:

It’s very much experimental, but works surprisingly well as a proof-of-concept. It seems with a few browser additions, this could start to approach a practical way to use DOM APIs in a worker.

If we had WeakRefs, this library could more or less handle it on its own. But if browsers used this approach in a built-in manner, it seems there would be a range of advantages:

better performance
better error-checking (perhaps using existing IDL)
solving the memory management problem
API improvements, such as removing the need for a special get() function (e.g. await document.title rather than await get(via.document.title))
better supporting user-gesture limited APIs from a worker
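
To illustrate the get() point in the list above, here is a rough sketch of why a library-level approach needs such a helper: a property read on a Proxy has to return synchronously, so it can only hand back another placeholder, and the real value must be fetched asynchronously. This is not Via.js's actual code. `PATH`, `makePathProxy` and `fakeMainThread` are hypothetical names, and a resolved promise stands in for the postMessage round trip.

```javascript
// Hypothetical sketch: only get() and via are names taken from the post.
const PATH = Symbol("path");

// Each property read returns a new placeholder that remembers the path so far.
function makePathProxy(path) {
  return new Proxy({}, {
    get(target, prop) {
      if (prop === PATH) return path;
      return makePathProxy([...path, prop]);
    },
  });
}

// Stand-in for the real main-thread globals, so the sketch runs without a DOM.
const fakeMainThread = { document: { title: "Example page" } };

// Resolves the recorded path to a real value. A real implementation would
// post the path to the main thread and wait for the reply message.
function get(proxy) {
  const path = proxy[PATH];
  return Promise.resolve().then(() =>
    path.reduce((obj, key) => obj[key], fakeMainThread)
  );
}

const via = makePathProxy([]);
get(via.document.title).then((title) => console.log(title)); // logs "Example page"
```

A built-in version could presumably make the placeholder itself awaitable, which is what would allow await document.title without the wrapper.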

On the other hand it potentially duplicates APIs which are available directly from a worker, e.g. IndexedDB. Any thoughts on the practicality of this approach?

AshleyScirra · 2017-12-19

WeakRefs could solve the memory leak problem, but I understand that they’re contentious because they make GC observable. However, I think it should be possible to solve this without making GC observable.

The fundamental problem is demonstrated by the following code:

// ON THE WORKER:

// This records the call and returns a placeholder
// Proxy which is assigned object id 1
const placeholderDiv = via.document.createElement("div");

// This is then sent to the main thread as a command similar to:
// 1. call "document.createElement" with argument "div" and assign
//    the return value object id 1

// Any subsequent calls then refer to the object id, e.g.:
placeholderDiv.textContent = "foo";
// results in a command like:
// 2. assign object id 1 property "textContent" to "foo"

// ON THE MAIN THREAD:

// Upon receiving the first command the main thread does the real call:
const realDiv = document.createElement("div");

// Then assigns the intended object ID:
idMap.set(1, realDiv);

// The map is used to look up future commands, e.g. to run command 2 we
// need to start by looking up object id 1, similar to this:
// (named differently here only to avoid redeclaring realDiv above)
const lookedUpDiv = idMap.get(1);
lookedUpDiv.textContent = "foo";

// However, now we have a permanent strong reference to the div, so
// it will never be collected. We can't use a WeakMap here since
// the key is not an object. We don't know when to delete the entry,
// since GC is not observable and we don't know when the placeholder
// Proxy on the worker will be collected.
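
The two halves above can be put together into a small runnable sketch. This is an illustration under assumptions, not the library's real implementation: `commandQueue`, `makePlaceholder`, `createElementViaProxy`, `runCommand` and `fakeDocument` are hypothetical names, both "sides" run in one thread, and a plain loop stands in for postMessage.

```javascript
// "Worker" side: records commands instead of touching the DOM.
let nextId = 1;
const commandQueue = [];

// The Proxy target only carries the object id; every property write
// is turned into a "set" command that refers to that id.
function makePlaceholder(id) {
  return new Proxy({ id }, {
    set(target, prop, value) {
      commandQueue.push({ type: "set", objectId: target.id, prop, value });
      return true;
    },
  });
}

function createElementViaProxy(tagName) {
  const id = nextId++;
  commandQueue.push({
    type: "call",
    path: ["document", "createElement"],
    args: [tagName],
    returnId: id,
  });
  return makePlaceholder(id);
}

// "Main thread" side: runs commands against real objects.
// fakeDocument stands in for window.document so this runs without a DOM.
const fakeDocument = {
  createElement(tagName) {
    return { tagName, textContent: "" };
  },
};
const roots = { document: fakeDocument };
const idMap = new Map();

function runCommand(cmd) {
  if (cmd.type === "call") {
    // Walk the path to the receiver, then call the final method on it.
    const methodName = cmd.path[cmd.path.length - 1];
    let receiver = roots[cmd.path[0]];
    for (const key of cmd.path.slice(1, -1)) receiver = receiver[key];
    idMap.set(cmd.returnId, receiver[methodName](...cmd.args));
  } else if (cmd.type === "set") {
    idMap.get(cmd.objectId)[cmd.prop] = cmd.value;
  }
}

let placeholderDiv = createElementViaProxy("div");
placeholderDiv.textContent = "foo";
for (const cmd of commandQueue) runCommand(cmd); // stands in for postMessage

placeholderDiv = null; // the placeholder is now unreachable on the "worker"...
console.log(idMap.size); // ...but the map still holds the real object: logs 1
```

Nothing in this sketch ever removes an entry from `idMap`, which is exactly the leak described in the comments above.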

To solve this, there could be a special WeakKey object. This is like a reduced WeakRef that only serves to be used as a key in a WeakMap. If a WeakKey can then be posted between a Worker and the main thread, this should solve the problem by using it in place of the object ID:

// ON THE WORKER:

// This records the call and returns a placeholder
// Proxy which is assigned its own WeakKey
const placeholderDiv = via.document.createElement("div");

// internally, this will do something like:
// placeholderProxy._key = new WeakKey(placeholderProxy)

// This is then sent to the main thread as a command similar to:
// 1. call "document.createElement" with argument "div" and
//    here is a WeakKey representing the return value

// Any subsequent calls then refer to the WeakKey, e.g.:
placeholderDiv.textContent = "foo";
// results in a command like:
// 2. assign this WeakKey property "textContent" to "foo"

// ON THE MAIN THREAD:

// Upon receiving the first command the main thread does the real call:
const realDiv = document.createElement("div");

// Then assigns the intended object by its WeakKey:
weakMap.set(weakKey, realDiv);

// The map is used to look up future commands, e.g. to run command 2 we
// need to start by looking up the same WeakKey, similar to this:
// (named differently here only to avoid redeclaring realDiv above)
const lookedUpDiv = weakMap.get(weakKey);
lookedUpDiv.textContent = "foo";

This WeakKey approach then behaves how we want:

The main thread can still look up real objects from messages sent from the worker.
If a placeholder Proxy is collected on the worker, then there are no more references to its WeakKey. This allows the entry in the weak map on the main thread to be collected.
GC is not observable.

I guess the downside is that this is pretty specific to this library: I’m not sure there are any use cases for it outside of Via.js. It also looks like it involves cross-context garbage collection, which may be tricky for implementors, but I don’t know much about that.

Anyone have thoughts on this idea?

matthewp · 2018-01-02

No opinion on the DOM in a Web Worker idea, but I do like WeakKey a lot. I’ve experienced the same issue with trying to sync state between the main thread and a worker.

AshleyScirra · 2018-01-02

Interesting - can you elaborate on the specific use case?

AshleyScirra · 2018-01-22

I wrote a blog post that outlines the current problem with memory management and a few options for dealing with it: https://www.scirra.com/blog/ashley/38/why-javascript-needs-cross-heap-collection

bcherny · 2018-03-25

@AshleyScirra This is a really neat idea! I was just thinking about doing something similar last night and found this when searching around to see if anyone’s done this already. If it’s any validation, while sketching out how to build this I ran into exactly the same limitation. Consider creating a Stage 0 proposal?

AshleyScirra · 2018-03-25

WeakRefs are now at stage 2 apparently: https://github.com/tc39/proposal-weakrefs

This would solve the memory management problem too. I think it’s worth seeing where that goes first.
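
As a rough sketch of how that could look: the proposal was later standardized (ES2021) with WeakRef and FinalizationRegistry, and the registry's cleanup callback is the part that helps here. The worker registers each placeholder, and when a placeholder is collected it tells the main thread to drop the matching id. The names `idToRealObject`, `onMainThreadMessage`, `postToMainThread` and `trackPlaceholder` are hypothetical, and a direct function call stands in for postMessage. Note that cleanup callbacks are not guaranteed to run promptly, or at all.

```javascript
// "Main thread" side: the id map from the earlier example, plus a
// handler that deletes an entry when the worker says it is done with it.
const idToRealObject = new Map();

function onMainThreadMessage(msg) {
  if (msg.type === "release") {
    idToRealObject.delete(msg.objectId);
  }
}

// Stand-in for postMessage so the sketch runs in a single thread.
const postToMainThread = onMainThreadMessage;

// "Worker" side: the callback receives the held value (the object id)
// some time after the registered placeholder has been garbage collected.
const releaseRegistry = new FinalizationRegistry((objectId) => {
  postToMainThread({ type: "release", objectId });
});

function trackPlaceholder(placeholder, objectId) {
  releaseRegistry.register(placeholder, objectId);
  return placeholder;
}

// Usage: the real object is stored under id 1, and the placeholder is tracked.
idToRealObject.set(1, { tagName: "div" });
let placeholder = trackPlaceholder({ id: 1 }, 1);

// Once the last reference is dropped, the engine may eventually collect the
// placeholder and run the callback, which releases id 1 on the main thread.
placeholder = null;
```

This does make GC observable, which is the trade-off discussed earlier in the thread.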

bsmith-cycorp · 2018-04-24

I’ve been doing a lot of thinking about how great it would be if React’s core processes could be moved to a web worker to maintain page responsiveness when they take a long time. There are complex reasons why a piece of React couldn’t easily be moved to web workers, but then I thought, why not put the entire app in a worker and just send DOM updates to the main thread? Why does JS run in the same thread that handles reflow and scrolling in the first place?

Really hope this gets some traction. Great idea.

Freddy · 2018-05-08

+1

React and other frameworks have a very high overhead cost. A native ability inside the browser would be a much more efficient option.