A partial archive of discourse.wicg.io as of Saturday February 24, 2024.

[Proposal] Support Cloudflare’s HTMLRewriter API in Workers

bahrus
2021-12-28

Cloudflare Workers based their API on the Service Worker API.

Cloudflare found it useful, however, to extend that API with an HTMLRewriter API.

It is possible to implement the API in a service worker today, but doing so is costly and limited in scope (no streaming support, for example).

This is a proposal to incorporate that API (more or less) into the platform so it can be used from workers (as well as from outside workers, I guess), including streaming support.
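For readers who have not used it, here is a rough sketch of what a handler for this API looks like. It assumes the shape Cloudflare documents for Workers (a handler object with an `element` callback, registered via `.on(selector, handlers)`); the `RewritableElement` interface and the `upgradeLinks` name are made up for this example, and nothing here is a proposed standard shape.

```typescript
// Structural type covering only the two element methods this sketch uses.
// (Illustrative; the real element object in Cloudflare Workers has more members.)
interface RewritableElement {
  getAttribute(name: string): string | null;
  setAttribute(name: string, value: string): unknown;
}

const INSECURE_PREFIX = "http://";

// Hypothetical handler: rewrites http: links to https: as elements stream past.
const upgradeLinks = {
  element(el: RewritableElement): void {
    const href = el.getAttribute("href");
    if (href !== null && href.slice(0, INSECURE_PREFIX.length) === INSECURE_PREFIX) {
      el.setAttribute("href", "https://" + href.slice(INSECURE_PREFIX.length));
    }
  },
};

// In a Cloudflare Worker, where HTMLRewriter is a global, the wiring would be roughly:
//   const res = await fetch(request);
//   return new HTMLRewriter().on("a[href]", upgradeLinks).transform(res);
```

The handler only ever sees one element at a time, which is what lets the rewriter work on a stream without building a full document in memory.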

yoavweiss
2021-12-28

/cc @Jake_Archibald

bahrus
2022-01-03

Just to add a little more justification:

The need has been stated eloquently by others. For example, see “We need DOM APIs in Workers” (Modern Web Development with Chrome) by Paul Kinlan.

Cloudflare has a strong interest in supporting the most helpful API it can provide within the tight constraints it must impose on memory usage and CPU utilization. This is what they came up with. I think the same findings would likely hold in a browser setting, especially on a resource-limited mobile phone.

I’ve faced scenarios where I wanted to implement functionality (such as link previews) in workers, but abandoned it due to the lack of a parser.

Another use case: integrating data held in IndexedDB or other worker-accessible storage into a server-rendered stream.

Given that we are encouraged to move processing off the main thread, the lack of any way to parse HTML in a worker has been a significant barrier for me.

From my explorations of the HTMLRewriter API, I believe it provides sufficient hooks to implement a DOM parser of sorts (perhaps without CSS query capabilities) in a fairly straightforward way and with a fairly small download footprint. The cost would be the extra memory a full tree requires, which is exactly what Cloudflare steers clear of. If such a DOM parser is really needed, it could be built on top of the HTMLRewriter down the road, once a good implementation has been proven out.
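To make the "DOM parser of sorts" idea a bit more concrete, here is a minimal sketch of a tree builder driven by start-tag, text, and end-tag events. Everything in it (`TreeNode`, `TreeBuilder`, the method names) is hypothetical. It assumes the caller forwards events from something like Cloudflare's documented `element` / `onEndTag` / `text` hooks, and that the caller knows whether an element has an end tag.

```typescript
// Hypothetical node shape for this sketch; not part of any real API.
interface TreeNode {
  tag: string;
  attrs: { [name: string]: string };
  children: Array<TreeNode | string>;
}

class TreeBuilder {
  readonly root: TreeNode = { tag: "#root", attrs: {}, children: [] };
  private stack: TreeNode[] = [this.root];
  private pendingText = "";

  // Call when a start tag is seen. Pass noEndTag = true for void elements.
  open(tag: string, attrs: ReadonlyArray<[string, string]>, noEndTag = false): void {
    this.flushText();
    const node: TreeNode = { tag, attrs: {}, children: [] };
    for (let i = 0; i < attrs.length; i++) {
      node.attrs[attrs[i][0]] = attrs[i][1];
    }
    this.stack[this.stack.length - 1].children.push(node);
    if (!noEndTag) this.stack.push(node);
  }

  // Call per text chunk; chunks are joined until the text node is complete.
  text(chunk: string, lastInTextNode: boolean): void {
    this.pendingText += chunk;
    if (lastInTextNode) this.flushText();
  }

  // Call when the end tag of the most recently opened element is seen.
  close(): void {
    this.flushText();
    if (this.stack.length > 1) this.stack.pop();
  }

  private flushText(): void {
    if (this.pendingText !== "") {
      this.stack[this.stack.length - 1].children.push(this.pendingText);
      this.pendingText = "";
    }
  }
}
```

This keeps the whole tree in memory, which is the memory cost mentioned above; it has no CSS query support and does none of the error recovery a real HTML parser performs.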

bahrus
2023-04-11

Update: there is now a custom WASM-based solution that supports streaming. I haven’t tried it yet.