Diff chunks in HTML file format

simevidas · 2022-12-20

When an HTML document is very large, and it’s updated often, browsers have to re-download the entire document every time. This is very inefficient. Imagine fixing a typo in a 1000-page document. Browsers are re-downloading a file that is 99.999% the same. What a waste of time and resources.

What if the HTML file format supported diffs that would be prepended to the file? So if an HTML document is edited, a diff is generated (think Git) and added to the beginning of the file.

The web server could then stream only the necessary diffs based on the request’s last-visited time, and the browser would reconstruct the HTML document from the received diffs and the original, cached file.
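
The reconstruction step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Edit` and `Patch` shapes, the integer version numbers, and the function names are all hypothetical and are not part of any HTML or HTTP specification. Each patch is assumed to carry non-overlapping edits expressed as offsets into the version it applies to.

```typescript
// Hypothetical shapes: one edit replaces the half-open range [start, end)
// of the OLD text with `text`.
interface Edit { start: number; end: number; text: string; }

// Hypothetical: a patch moves the document from one version to the next.
interface Patch { fromVersion: number; toVersion: number; edits: Edit[]; }

// Apply non-overlapping edits, all expressed in old-document offsets.
function applyEdits(oldText: string, edits: Edit[]): string {
  const sorted = edits.slice().sort((a, b) => a.start - b.start);
  let result = "";
  let cursor = 0;
  for (const e of sorted) {
    if (e.start < cursor || e.end < e.start || e.end > oldText.length) {
      throw new Error("invalid or overlapping edit");
    }
    // Copy the unchanged text before the edit, then the replacement.
    result += oldText.slice(cursor, e.start) + e.text;
    cursor = e.end;
  }
  return result + oldText.slice(cursor);
}

// Rebuild the latest document from the cached copy plus the received patches.
function reconstruct(
  cached: string,
  cachedVersion: number,
  patches: Patch[]
): { html: string; version: number } {
  let html = cached;
  let version = cachedVersion;
  for (const p of patches) {
    if (p.fromVersion !== version) {
      throw new Error("patch chain does not match the cached version");
    }
    html = applyEdits(html, p.edits);
    version = p.toVersion;
  }
  return { html, version };
}
```

If the patch chain does not start at the cached version, the sketch throws, which is the point where a real client would fall back to downloading the full document.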

To achieve this today, web developers have to use a service worker. I would love it if this were built into the HTML format itself.

suns · 2022-12-20

HTTP already has the means to retrieve only part of a resource's content.

The diff proposal assumes that the browser (or a resource-management library) knows the document's history and the update itself. While some information about the latest update can be retrieved with an OPTIONS request, the browser has to hold enough information to request a diff against a specific version: in addition to the old content, the injection range in the old content and the replacement range in the new content.
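
To make the "injection range in old, replacement range in new" idea concrete, here is a minimal sketch that derives both ranges for the simplest case, a single contiguous change, by trimming the common prefix and suffix. The `RangeDiff` shape and the function names are assumptions made for illustration; a real diff algorithm would produce several ranges rather than one.

```typescript
// Hypothetical shape; all ranges are half-open [start, end).
interface RangeDiff {
  oldStart: number; // injection range in the old content
  oldEnd: number;
  newStart: number; // replacement range in the new content
  newEnd: number;
  replacement: string; // the text of the replacement range
}

// Describe the change between two texts as one replaced range,
// found by skipping the longest common prefix and suffix.
function singleRangeDiff(oldText: string, newText: string): RangeDiff {
  const shorter = Math.min(oldText.length, newText.length);
  let prefix = 0;
  while (prefix < shorter && oldText.charAt(prefix) === newText.charAt(prefix)) {
    prefix++;
  }
  // The suffix must not overlap the prefix in the shorter text.
  const maxSuffix = shorter - prefix;
  let suffix = 0;
  while (
    suffix < maxSuffix &&
    oldText.charAt(oldText.length - 1 - suffix) ===
      newText.charAt(newText.length - 1 - suffix)
  ) {
    suffix++;
  }
  return {
    oldStart: prefix,
    oldEnd: oldText.length - suffix,
    newStart: prefix,
    newEnd: newText.length - suffix,
    replacement: newText.slice(prefix, newText.length - suffix),
  };
}

// Rebuild the new text from the old text plus the range diff.
function applyRangeDiff(oldText: string, d: RangeDiff): string {
  return oldText.slice(0, d.oldStart) + d.replacement + oldText.slice(d.oldEnd);
}
```

For a one-character typo fix, the injection range is empty or a single character wide and the replacement is equally small, which is all the client would need in addition to its cached copy.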

This “patch”-type information can be returned by the content owner (CDN, server, package manager, etc.) in response to a special request, which is outside the scope of HTML itself. Rather, this is a proposal in HTTP's scope.

What is in the HTML group's scope is the pattern of a “patch” as a transformation layer and as part of semver support. Here the “patch” is a compatibility polyfill that allows versioned patch content to be applied over an existing version. This pattern applies both to JS modules (the usual case in the current stack) and to declarative modules (part of Declarative Web Application, not yet a proposal). When a module is deployed, its descriptor, along with other info (version, dependencies, etc.), could hold a set of version-compatibility polyfills together with the associated patches those polyfills could use instead of loading the full content. It is assumed that old version + polyfill (patch) yields content that is identical to, or compatible with, the full content of the new version.

I would be hesitant to support string-level patching; I would rather see a version-recovery step in the transformation pipeline. For that, the “patch” would look like a transformation script: XSLT or a DeclarativeCustomElement template, for example.

mnot · 2022-12-21

See also:

This turns out to be hard because of deployment complexity, and because it has security implications (as all data compression techniques do).

suns · 2022-12-21

@mnot

This turns out to be hard…

Git has proven otherwise. Updating a large repository incrementally is more efficient than fetching the latest version in full. Of course, there is a balance, and the most efficient approach would be computed by the loader (the browser). The proposal would enable such balanced loading.

Anonymous2292900 · 2022-12-22

Hey there!

This topic interests me a lot. I use an editor called hope.js, which has the feature you are all talking about for HTML. hope.js is inspired by an idea by Ted Nelson (one of the forerunners of hypertext). Please see this: source-code or live on

I would be happy if this idea became a proposal, in part because there is already a library that implements this feature; only the service worker that uses the library would be missing.

With hope.js, you can map HTML markup onto the text using start and end character offsets (it is similar to Git, if you like; please see this example):

<textarea class="halfsize" id="markup">
0-43:h3
45-139:p
116-126:a href="http://en.wikipedia.org/wiki/Ted_Nelson"
1185-1191:em
</textarea>
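
As an illustration of how range lines like the ones above could be turned back into HTML, here is a small sketch. It is not the hope.js API: the `Span` shape, the function names, and the assumption that each range is a half-open character range `[start, end)` are mine, based only on the `start-end:tag attributes` format shown in the example. It also assumes the ranges are properly nested and that the text needs no HTML escaping.

```typescript
// Hypothetical shape for one "start-end:tag attributes" line.
interface Span { start: number; end: number; tag: string; attrs: string; }

// Parse range lines; lines that do not match the format are skipped.
function parseSpans(markup: string): Span[] {
  const spans: Span[] = [];
  for (const line of markup.split("\n")) {
    const m = /^(\d+)-(\d+):(\S+)(?:\s+(.*))?$/.exec(line.trim());
    if (!m) continue;
    spans.push({
      start: Number(m[1]),
      end: Number(m[2]),
      tag: m[3] || "",
      attrs: m[4] || "",
    });
  }
  return spans;
}

// Wrap the plain text in tags according to the (properly nested) spans.
function renderSpans(text: string, spans: Span[]): string {
  const inserts: { pos: number; close: boolean; len: number; html: string }[] = [];
  for (const s of spans) {
    const len = s.end - s.start;
    const open = "<" + s.tag + (s.attrs ? " " + s.attrs : "") + ">";
    inserts.push({ pos: s.start, close: false, len: len, html: open });
    inserts.push({ pos: s.end, close: true, len: len, html: "</" + s.tag + ">" });
  }
  inserts.sort((a, b) =>
    a.pos - b.pos ||
    // At the same offset, closing tags come before opening tags.
    (a.close === b.close ? 0 : a.close ? -1 : 1) ||
    // Inner (shorter) spans close first; outer (longer) spans open first.
    (a.close ? a.len - b.len : b.len - a.len)
  );
  let out = "";
  let cursor = 0;
  for (const ins of inserts) {
    out += text.slice(cursor, ins.pos) + ins.html;
    cursor = ins.pos;
  }
  return out + text.slice(cursor);
}
```

Because the markup lives outside the text, a change to the markup can be transmitted as a few short range lines without resending the text itself.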

Now it’s just a matter of creating a service worker with the hope.js library and transmitting the difference in the html changes.

It is not so complex to do, because there is a library that implements the feature. For the data security part, it may be possible to use md5 or sha1 hashes of the strings that are modified (I'm not sure what I'm talking about, but it's something I just thought of).

mnot · 2022-12-22

With git, you have a backing store that’s a ‘source of truth’ for all revisions of the document. Historically, that’s been uncommon on the general web.

So an important question here is whether this is a common enough pattern to make it part of the web (whether in HTML, as suggested, or in HTTP, where it might be better integrated), or whether it should remain something that you can do locally with JavaScript (etc.).

Personally, I’m interested in the idea of an updated delta-encoding spec for HTTP; this has come up a few times, and I suspect some – especially API folks – would benefit from it. We already have PATCH; one could look at this as doing PATCH in the other direction.

However, that doesn’t mean browsers will implement it; they have a pretty high bar for what they work on, and limited resources. That’s going to be the case whether it’s done in HTTP or HTML, however.

Anonymous2292900 · 2022-12-23

Hi mnot.

Network protocols such as IPFS, Hypercore, and BitTorrent have content addressing; perhaps the proposal here would allow content addressing with HTML. Maybe this is more common today, or maybe not. (There are sites on IPFS, for example, things working with Hypercore, and Linux ISOs being shared over torrents. Somehow it's all on the web, just not in HTML.)

There is an interesting proposal to transform HTML into hypertext: extending-html-as-a-hypermedia. Maybe this combination makes sense: Diff chunks in HTML file format + extending-html-as-a-hypermedia and Alternative to WebPackages to guarantee integrity (Document Hash)

(input) with diff and htmx and md5:

<div hx-get="/clicked" hx-trigger="click" diff="3f5d70219e8a2828c12572e2a48ac1d6">Click Me</div>

(input) with diff and htmx:

<div hx-get="/clicked" hx-trigger="click" diff="45-139:p">Click Me</div>

(output) with diff and htmx and md5:

<div hx-get="/clicked" hx-trigger="click"><p>Click Me again</p></div>

(output) with diff and htmx:

<div hx-get="/clicked" hx-trigger="click"><p>Click Me again</p></div>

Can I make a GET or POST call here in hx-get to capture the modified information from the diff?

suns · 2022-12-23

While partial resource delivery is on the HTTP protocol side, the ability to decide whether resource retrieval should use a chunk or a whole GET resides on the browser side.

The browser has sufficient information about the properties of a cached resource: its checksum, deployment date, etc. The browser can check whether the resource has changed and, during this OPTIONS call, can provide its latest available state. The server side would then indicate whether a patch for the cached version exists. This flow fits into the server-client communication layer, i.e. HTTP.
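
The negotiation described above can be sketched as a single server-side decision. Everything here is hypothetical: the `ResourceState`, `PatchRecord`, and `Decision` shapes and the `decide` function are illustrative names, not an existing HTTP mechanism. The sketch assumes the client reports the checksum of its cached copy, and that the server only offers a patch when one leads directly from that cached version to the latest one and is smaller than the full content.

```typescript
// Hypothetical: a stored patch from one known version to another.
interface PatchRecord { fromChecksum: string; toChecksum: string; body: string; }

// Hypothetical: what the server knows about a resource.
interface ResourceState { checksum: string; full: string; patches: PatchRecord[]; }

type Decision =
  | { kind: "not-modified" }
  | { kind: "patch"; body: string; toChecksum: string }
  | { kind: "full"; body: string; checksum: string };

// Decide what to send, given the checksum the client reports for its cache
// (null when the client has nothing cached).
function decide(server: ResourceState, cachedChecksum: string | null): Decision {
  if (cachedChecksum === server.checksum) {
    return { kind: "not-modified" };
  }
  if (cachedChecksum !== null) {
    for (const p of server.patches) {
      const matches =
        p.fromChecksum === cachedChecksum && p.toChecksum === server.checksum;
      // Only send the patch when it is actually cheaper than the full body.
      if (matches && p.body.length < server.full.length) {
        return { kind: "patch", body: p.body, toChecksum: p.toChecksum };
      }
    }
  }
  return { kind: "full", body: server.full, checksum: server.checksum };
}
```

Nothing in this decision depends on the resource being HTML; the same flow would apply to any cached resource with a checksum.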

If you want this proposal to be implemented, it has to go to the HTTP group first.

Note that the resource is abstract; it is not just HTML. It can be JS, an image, a web archive, or whatever.

Anonymous2292900 · 2023-01-16

Hey there!

I would like to contribute some relevant links to this proposal; there are things like hope.js, which I mentioned earlier.

I recently discovered these two implementations of git on the web: webdiff and isomorphic-git.

So please see this (webdiff):

Would something like isomorphic-git, hope.js, or webdiff be relevant to what you are all discussing here?