[Proposal] Tasklets


#21

Just to confirm:

// tasklet.js
let i = 0;

export function increment() {
  return ++i;
}

// main thread
const t = await tasklet.addModule('tasklet.js');
await t.increment(); // 1
await t.increment(); // is this always going to be 2? (assuming no other calls to increment)

const t2 = await tasklet.addModule('tasklet.js');
await t2.increment(); // 1 or 3?

#22

In my mental model:

const t = await tasklet.addModule('tasklet.js');
await t.increment(); // 1
await t.increment(); // 2

const t2 = await tasklet.addModule('tasklet.js');
await t2.increment(); // 1 

#23

Ok, so a TaskletGlobalScope per whatever-addModule-resolves-to. Although multiple globals can run in the same thread.

It’s probably too early for bikeshedding, but the “module” naming seems wrong. With modules, two ‘imports’ of the same module name will return the same instance, but that doesn’t seem to be what’s proposed here.
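
To make the two readings concrete, here is a minimal sketch that simulates both behaviours in plain JavaScript. Nothing here is the proposed API: `instantiateTaskletModule`, `perCallTasklet` and `cachedTasklet` are hypothetical stand-ins, and the "tasklet" runs in-process rather than on another thread.

```javascript
// Hypothetical stand-in for evaluating tasklet.js: each call creates fresh module state.
function instantiateTaskletModule() {
  let i = 0;
  return { increment: async () => ++i };
}

// Reading 1 (the mental model in post #22): every addModule() call gets its own scope.
const perCallTasklet = {
  async addModule(_url) {
    return instantiateTaskletModule();
  },
};

// Reading 2 (ES-module-like): instances are cached by URL, so state is shared.
const moduleCache = new Map();
const cachedTasklet = {
  async addModule(url) {
    if (!moduleCache.has(url)) moduleCache.set(url, instantiateTaskletModule());
    return moduleCache.get(url);
  },
};

// Replays the sequence from post #21 and returns what t2.increment() yields.
async function demo(tasklet) {
  const t = await tasklet.addModule('tasklet.js');
  await t.increment();
  await t.increment();
  const t2 = await tasklet.addModule('tasklet.js');
  return t2.increment();
}
```

With these stand-ins, `demo(perCallTasklet)` resolves to 1 and `demo(cachedTasklet)` resolves to 3, which is exactly the "1 or 3?" question above.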


#25

I’m very much in favor of this and I like the API. My only feedback is regarding constructors being synchronous, and the example given of the error thrown in the constructor not being thrown in the main thread.

If I understand correctly, this is how it would work with the current proposal.

//tasklet.js
export class ThingDoer {
  constructor({id,target}) {
    if(!id) { throw new Error('Thing doers need an ID') }
    if(!target) { throw new Error('Thing doers need a target') }
  }

  doThing() {
    if(!canDoThing) { throw new Error('Cannot do thing') }
    // do the thing
  }
}

// main thread
const module = await tasklet.addModule('tasklet.js')
const ThingDoer = module.ThingDoer
let doer;
try {
  doer = new ThingDoer()
}
catch(constructorError) {
  // will never happen
}
try {
  const result = await doer.doThing()
}
catch(resultError) {
  // this is either a constructor error or the error thrown by the function
}
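
For anyone wondering why the first catch block can never run, here is a rough sketch of how I imagine such a proxy class behaving. It is only an in-process simulation: `makeProxyClass` and `RemoteThingDoer` are hypothetical names, and a promise stands in for the hop to the other thread.

```javascript
// Hypothetical stand-in for a class that lives inside the tasklet.
class RemoteThingDoer {
  constructor({ id, target } = {}) {
    if (!id) throw new Error('Thing doers need an ID');
    if (!target) throw new Error('Thing doers need a target');
  }
  doThing() {
    return 'done';
  }
}

// Hypothetical main-thread proxy class: `new` returns immediately and the real
// construction happens asynchronously (simulated here with a promise).
function makeProxyClass(RemoteClass) {
  return class {
    constructor(...args) {
      const instance = Promise.resolve().then(() => new RemoteClass(...args));
      // Mark the rejection as handled; the error resurfaces on the first method call.
      instance.catch(() => {});
      return new Proxy(this, {
        get(_target, method) {
          // Do not pretend to be a thenable, and ignore symbol lookups.
          if (typeof method !== 'string' || method === 'then') return undefined;
          return async (...callArgs) => (await instance)[method](...callArgs);
        },
      });
    }
  };
}

// Usage sketch: the constructor error only shows up when a method is awaited.
async function example() {
  const ThingDoer = makeProxyClass(RemoteThingDoer);
  const doer = new ThingDoer(); // does not throw
  try {
    await doer.doThing();
  } catch (error) {
    return error.message; // message of the constructor error
  }
}
```

Because the proxy has to return synchronously, the only place left for the constructor error to surface is the first awaited method call.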

I think it would be better if it were possible to catch errors thrown by the constructor somehow. Maybe something like this:

const doer = new ThingDoer()
try {
  await tasklet.whenConstructed(doer)
}
catch(error) {
  // one of the two types of errors that could be thrown by the constructor
}

#26

So, if I understand you correctly – you want a whenConstructed() so you know when the constructor failed. But what would whenConstructed() do? You could actually implement the behavior you are proposing already:

//tasklet.js
export class ThingDoer {
  constructor({id,target}) {
    if(!id) { throw new Error('Thing doers need an ID') }
    if(!target) { throw new Error('Thing doers need a target') }
  }
  whenConstructed() { /* literally does nothing */ }
  doThing() { /* ... */ }
}

That feels pretty awkward. I can see how it might be important to know whether it’s the constructor that failed or the actual method you invoked. What do you think about this? (@iank, what do you think?)

const module = await tasklet.addModule('tasklet.js')
try {
  const thingDoer = new module.ThingDoer();
  await thingDoer.doThing()
}
catch(error) {
  // error.constructorError === true
}

#27

I don’t just want to know that an error was thrown by the constructor; I want to know which error was thrown. Having error.constructorError is better, but in my example an error can be thrown for either an invalid id or an invalid target, and a developer might want to handle those two cases differently.

Also, when the error is thrown can be important.

Consider this example, written so it could work both as a tasklet and as a regular module:

// tasklet.js
export class Adder {
  constructor(number) {
    if(typeof number != 'number' || isNaN(number)) { throw new Error('Invalid number') }
    this.number = number
  }
  async addOne() {
    return this.number + 1
  }
}

We don’t expect addOne to throw the error because it’s a very simple function, and validation of what’s passed to the object is done in the constructor, so we probably won’t use a try/catch when we’re calling addOne.

// as a module
import {Adder} from './tasklet.js' // Adder is a named export, not the default export
// or as a tasklet
const {Adder} = await tasklet.addModule('tasklet.js')

try {
  const adder = new Adder(parseInt(document.getElementById('number-input').value))
  // If the constructor throws an error when loaded as a module, the stuff after this will not happen
  // If it's a tasklet, the next lines will run.
  const statusElement = document.getElementById('status')
  statusElement.classList.add('has-result')
  // Even if the input has "foobar" instead of a number, the `has-result` class has already been added and the next line will fail
  const result = await adder.addOne()
  statusElement.innerHTML = `Result: ${result}`
}
catch(error) {
  // If loaded as a module, this happens right after the constructor throws the error
  // If loaded as a tasklet, this happens after the `has-result` class has been added.
  alert(error.message)
}

I can imagine having weird problems in situations like that, for example when migrating an existing module to a tasklet. It could take a lot of time to find bugs like that and refactor. Since constructor errors aren’t thrown until after instance functions are called, we have to expect that any instance function can throw an error, which I think is probably not good.

I threw out whenConstructed just to have some kind of promise-based thing that could give the error thrown by the constructor. I just borrowed the idea from customElements.whenDefined() because it’s kind of similar to what I want, in that it’s a promise-based thing that waits for something - in CE’s case, some JS to load; in tasklet’s case, something on another thread.

I thought about it and something like this would make it so we can get the error in basically the same place without changing much code by wrapping constructors in a promise-based thing.

// ... load the tasklet
try {
  const adder = await tasklet.constructObject(Adder, parseInt(document.getElementById('number-input').value))
  // if the constructor throws an error, it will throw here as a module OR as a tasklet
  const statusElement = document.getElementById('status')
  statusElement.classList.add('has-result')
  const result = await adder.addOne()
  statusElement.innerHTML = `Result: ${result}`
}
catch(error) {
  alert(error.message)
}

async tasklet.constructObject(someClass,[arguments]) just resolves with the constructed object, or throws the error thrown by the constructor. I think it’s also easier to polyfill than making constructors throw errors asynchronously. What do you think?
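
A minimal sketch of what I have in mind, written as a standalone function so it runs anywhere. `constructObject` is only my suggestion, not an existing API, and in a real tasklet it would have to wait for the other thread to report back instead of constructing in-process as it does here.

```javascript
// Hypothetical helper: resolves with the constructed object, or rejects with
// whatever the constructor threw.
async function constructObject(SomeClass, ...args) {
  // A throw inside an async function becomes a rejection of the returned promise.
  return new SomeClass(...args);
}

// Same shape as the Adder class from the example above.
class Adder {
  constructor(number) {
    if (typeof number !== 'number' || Number.isNaN(number)) {
      throw new Error('Invalid number');
    }
    this.number = number;
  }
  async addOne() {
    return this.number + 1;
  }
}

// Usage sketch: the constructor error is caught at the construction site.
async function addOneTo(rawValue) {
  try {
    const adder = await constructObject(Adder, parseInt(rawValue, 10));
    return await adder.addOne();
  } catch (error) {
    return error.message;
  }
}
```

With this shape, a bad input is rejected before any of the later lines in the try block get a chance to run, whether the class is used as a plain module or (assuming the platform implemented something equivalent) as a tasklet.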


#28

I am not convinced this proposal solves a real problem.

We build a large PWA (~250k LOC) and make extensive use of Web Workers for asynchronous processing. We have a small framework which does the following:

  • creates a “job worker” per navigator.hardwareConcurrency (i.e. one worker per hardware thread)
  • wraps the postMessage() bridge in a simple call, e.g. JobScheduler.Do("expensiveJob"), returning a promise

Creating a new task to be run asynchronously - even a small one - simply involves adding an extra worker import script with a function to do the work, and then calling Do("jobName") from the main thread. Then you get a promise that resolves when the work is done.
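
To give a rough idea of the shape, here is a heavily simplified sketch, not our actual framework. The `{ id, name, args }` / `{ id, result, error }` message format is an assumption made for illustration, and unlike our real implementation this version keeps the queue on the main thread.

```javascript
// Simplified sketch of a fixed-size worker pool with a promise-returning Do().
// `createWorker` is any factory returning an object with postMessage()/onmessage,
// e.g. () => new Worker('job-worker.js') in a browser (file name is hypothetical).
class JobScheduler {
  constructor(createWorker, size) {
    this.idle = [];           // workers with nothing to do
    this.queue = [];          // jobs waiting for a free worker
    this.pending = new Map(); // job id -> { resolve, reject }
    this.nextId = 0;
    for (let n = 0; n < size; n++) {
      const worker = createWorker();
      worker.onmessage = (event) => this.finish(worker, event.data);
      this.idle.push(worker);
    }
  }

  // Queue a named job and return a promise for its result.
  Do(name, ...args) {
    return new Promise((resolve, reject) => {
      this.queue.push({ id: this.nextId++, name, args, resolve, reject });
      this.dispatch();
    });
  }

  // Hand queued jobs to idle workers until one of the two lists runs out.
  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { id, name, args, resolve, reject } = this.queue.shift();
      this.pending.set(id, { resolve, reject });
      worker.postMessage({ id, name, args });
    }
  }

  // Settle the matching promise, then put the worker back to work.
  finish(worker, { id, result, error }) {
    const { resolve, reject } = this.pending.get(id);
    this.pending.delete(id);
    if (error !== undefined) reject(new Error(error));
    else resolve(result);
    this.idle.push(worker);
    this.dispatch();
  }
}
```

In a browser the pool would be created with something like `new JobScheduler(() => new Worker('job-worker.js'), navigator.hardwareConcurrency)`, and callers only ever see `Do()`.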

This appears to invalidate the starting assumptions of this API:

  • postMessage() is not used by any callers, so its “clunkiness” doesn’t matter
  • a fixed number of workers are used, regardless of how many types of jobs are defined, so the per-worker overhead doesn’t matter

We’ve also done a lot of work to intelligently schedule jobs between workers, hold the queue off the main thread so workers can pick up new jobs even when the main thread is busy, etc. Providing enough work is submitted, we can comfortably max out all CPU cores on a device and achieve the highest possible throughput for processing that work, which is useful for certain batch jobs like importing a folder of audio files.

It’s not obvious to me that tasklets add anything useful beyond this, so it seems that what developers really need is a good framework, not a new API surface.


#29

Yes, this API is first and foremost about developer convenience (and opening up opportunities for further performance optimizations). The proposed API could be implemented in user space right now.

However: While postMessage can be abstracted, by default it is not. It’s up to the developer to either roll their own RPC protocol (like you did) or pull in some 3rd party library like those in @iank’s post.

As a result, WebWorkers are there but scarcely used. That is one of the problems this API is trying to solve: lower the barrier of entry to off-main-thread work so that architectural patterns like those native platforms use can be adopted on the web. As a result, I hope to see reduced jank on the main thread in general.

Again, the main reason why this is proposed as a platform API is to allow browsers to spin up only one thread for tasklets of different sites (something that is not possible with WebWorkers). Additionally I could see fast-paths being added for certain types that get passed around from Tasklets to other threads.


#30

That is a good point.

I have to admit I am hesitant to add a special function for constructing objects as it also strays away from normal JS, but I do see the necessity. I am wondering if that needs to be baked in the API.

I have turned your question/suggestion into an issue on the repo, as I don’t have a good answer myself right now.


#31

Generally I am against any spec that can be implemented in user space. Why not focus on something that isn’t already possible, rather than something that can be done already, even if it takes a small framework?

On what basis do you think tasklets will actually change this? If the real problem is something else - e.g. the fact async code is just harder to reason about in general - then tasklets won’t necessarily fix that problem, either. Another way of approaching that same issue would be for Google to develop an official web worker framework that does it, then use Google’s considerable developer evangelism resources to promote it. Specs aren’t the way to solve every problem!

Maybe it’s not even a problem - what if most websites just don’t need to do a lot of background processing?

Is this really out of the question for web workers? Why can’t web workers from two different origins share a thread, particularly if one is from a page in the background?


#33

I think the web’s capabilities are pretty vast already. My feeling is that most “missing” APIs are either niche or contain a really hard problem (and/or are a security nightmare). One of the biggest factors that limits adoption of the web currently is that the web is being perceived as slow and clunky. We still see a lot of main thread jank.

If we take your suggestion of extending WebWorkers to allow multiple cross-origin WebWorkers to share an OS thread, it becomes easy for one worker to starve out all the others by accident by using APIs like Synchronous XHR. So I think we should disallow them – which already makes them different from the “old” WebWorkers. Starvation is still possible, but less likely to happen by accident. Also, WebWorkers still only have postMessage() as a way to communicate, so developers would need to pull in a library to have a straightforward way to invoke functions and get back the result of said function. That’s a major barrier to using them. I only have anecdotal evidence to support this claim, but from experience a lot of developers don’t bother setting up a generic infrastructure to communicate with their WebWorkers. Additionally, things like a WebWorker being an EventTarget are not possible.

What I am trying to say: Tasklets are effectively WebWorkers with the added capability of sharing an OS thread. But the proposal also includes the (necessary) measures to cover the consequences of that change. Additionally, there’s a convenience layer (i.e. proxy classes) so that off-thread work becomes a no-brainer.

I don’t believe that to be the case with async/await.


#34

This is a really interesting proposal. The API is very intuitive, and I like the goal of increasing worker adoption.

I’m pretty sure this is possible today with WebWorkers? In fact Edge’s implementation used to use a thread pool (limited to the number of available processors), although we dropped it recently in favor of “one worker, one thread” for the sake of simplicity and reliability.

If the proposed spec doesn’t offer anything that can’t already be implemented in user space, then the primary value is in increasing adoption. One potential issue I see with that is that, from my POV, the main thing limiting WebWorker adoption is not the awkward messaging API or perf overhead of multiple workers – both of which can be solved with a minimal library on top of a single worker (e.g. promise-worker is 2.5kB min+gz, most of which is the promise polyfill).

Instead the main limiting factor IMO is needing to have a separate script file. This makes it clunky to use bundlers, since you have to manually specify loaders/transforms and architect your code around the worker/UI thread boundary, which leads people to heroic but clumsy workarounds like operative and catiline that convert JS strings to blob URIs or data URIs.

It also makes it impossible to ship a library that uses workers under the hood, e.g. I can’t write a solve-pi-to-n-digits library which runs its heavy computation transparently on a background thread when the user imports/require()s it. Either my entire library needs to be able to run in a worker thread (and the user must set that up themselves), or none of it can.

Another blocker I’ve seen to WebWorker adoption is the lack of DOM APIs. E.g. you rely on some-cool-library which seems like it shouldn’t require the DOM, but then it does for some reason – assuming window as the global, using an anchor tag to parse a URL, etc. Or you’re a framework/platform trying to offer a generic API, and you can’t assume that your users will write code that’s agnostic to the UI thread vs worker thread, so you just run everything on the UI thread.

I don’t see this proposal as addressing these two issues. Although I admit that trying to solve the “separate file” limitation comes with its own problems (double-parsing the JS, how to handle closures, etc.), as does the DOM issue. (Fake DOM APIs in workers? :stuck_out_tongue:) I’m not ragging on this proposal, but I think it’s worth asking how much it will really boost worker adoption and if there isn’t some simpler solution that can help patch up the nastiest parts of the WebWorker API (e.g. add a promise-based message API to WebWorkers?).


#35

Just thought I’d throw this out there: the Tasklet proposal is eerily similar to a thing I wrote for Node.js about a year ago.


#36

We [at webpack] heavily agree here. Solving this issue would be beneficial. Why not include APIs that make it easier to offload some of a bundler’s module-registering capabilities to a thread? Or, let’s say, webpack’s code-splitting runtime could register inside a worklet, so that when that code is “dynamically” requested, it is available instantly across threads instead of via XHR/preload.


#37

100% agree with @nolanlawson and @TheLarkInn. I don’t understand why we need a separate file to create a thread. Libraries like https://github.com/developit/workerize prove that the API itself is a problem here.


#38

@coderitual, @nolanlawson, @TheLarkInn - we agree - I think. We’d like to have something that you can write inline in a script file, e.g.

(The above is just one way of doing this; there may be others.) I’m excited to see libraries like workerize, greenlet, and cloonyjs, which have similar API surfaces, pick up in popularity.


#39

I’d point out you already can create workers from a string:

const sourceStr = `...`;
const blob = new Blob([sourceStr], { type: "application/javascript" });
const blobUrl = URL.createObjectURL(blob);
const worker = new Worker(blobUrl);
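
Building on that, here is a rough sketch of the function-to-worker trick in its simplest form. The helper names are made up for illustration, and the approach has the usual caveat that the function is serialized with `toString()`, so anything it closes over is lost.

```javascript
// Hypothetical helper: build worker source that calls `fn` with the arguments
// array received in each message and posts back the (awaited) return value.
function workerSourceFromFunction(fn) {
  return (
    `const fn = ${fn.toString()};\n` +
    `self.onmessage = async (event) => self.postMessage(await fn(...event.data));`
  );
}

// Browser-only: wrap the generated source in a blob URL, as in the snippet above.
function workerFromFunction(fn) {
  const blob = new Blob([workerSourceFromFunction(fn)], { type: 'application/javascript' });
  return new Worker(URL.createObjectURL(blob));
}

// Usage sketch (browser):
//   const worker = workerFromFunction((a, b) => a + b);
//   worker.onmessage = (event) => console.log(event.data);
//   worker.postMessage([2, 3]);
```

This only works for functions whose `toString()` output is a valid expression (plain functions and arrow functions, not method shorthand), which is part of why it feels like a workaround rather than a solution.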

#40

@domenic and @surma have created a proposal for the JavaScript language itself, temporarily named “blöcks”, which would have similar applications to tasklets, and which may have a greater chance of adoption by the vendors than tasklets.

The explainer includes a section addressing tasklets:

Tasklets were a WICG proposal that tried to enable off-thread computing with a strong focus on ergonomics. They involved creating a module file, which was loaded into the “tasklet” (mini-worker), and exposed an interface via its exports that became a bunch of async function proxies on the other side.

Tasklets got considerable pushback from Microsoft and Mozilla. The main concerns were:

  • No clear reason why tasklets were different from workers or weren’t using workers
  • Tasklets still required the off-thread code to live in a separate file

In general we think the ergonomics improvements of tasklets help when you have a clear separation between classes or services that can live in another thread, and communicate back and forth via well-defined interfaces. However, they don’t make it simple enough to move smaller chunks of work off the main thread, in the way we see as prevalent in other languages.