Proposal: a `Cache-Path` header to encourage proper caching of HTML history-based SPAs

isiahmeadows · 2019-04-04

With modern HTML5 history navigation, you typically need server support to render the same page for a variety of routes. But browsers aren’t going to know this is the same across multiple URLs - given a web app at https://example.com/app/, if you first visit /app and then a few days later open a new tab to /app/1/view, the browser will see the second URL isn’t in the cache, despite it requiring basically the same set of files.

For this reason, I suggest there should be a Cache-Path: /path/to/parent header, usable for all cacheable responses as well as 304s. This would tell the client to cache the response as if the request had been made to the given URL, and to read from the cache similarly. The value may include a query string, which is considered part of the cache key. If the header is missing, the default value is just the request path itself. This is meant to encourage correct caching behavior for SPAs, one that also ends up taking a lot less cache space. It would also allow better caching for other requests where query strings are not used to control server actions and responses, like removing referrer strings and other SEO junk from the cache path so browsers don’t cache https://example.com/?_ref=foo differently from https://example.com/.
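To make the idea concrete, here is a rough sketch of how a client might derive the cache key once it has seen a Cache-Path value on a response. This is only an illustration under assumptions: the function name cache_key is made up, the header is treated as a path (plus optional query string) on the same origin as the request, and "the request path itself" is read as including the request's own query string. None of this is specified anywhere.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def cache_key(request_url: str, cache_path: Optional[str] = None) -> str:
    """Return the URL a client would store (and look up) the response under.

    Hypothetical helper: `cache_path` stands for the value of the proposed
    Cache-Path header, or None when the response did not send one.
    """
    request = urlsplit(request_url)
    if cache_path is None:
        # No header: fall back to the request's own path and query string
        # (assumption: this matches today's default behavior).
        path, query = request.path, request.query
    else:
        # Header present: the key uses the header's path and query string,
        # but keeps the scheme and host of the original request.
        target = urlsplit(cache_path)
        path, query = target.path, target.query
    # Fragments are never sent to the server, so they are dropped from the key.
    return urlunsplit((request.scheme, request.netloc, path, query, ""))
```

With this, both https://example.com/app/1/view and https://example.com/app/ would map to the same key if the server answers each with Cache-Path: /app/, and https://example.com/?_ref=foo would collapse to https://example.com/ with Cache-Path: /.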

sollyucko · 2019-04-04

I’m probably missing something, but can’t ETags be used for this? You could probably use the canonical URL as the ETag.

isiahmeadows · 2019-04-04

Not quite. ETags are about identity, but this is about names - those two are subtly different.

Now that I’m thinking about it better, the Cache-Path: ... here should really just be a Cache-As: ..., with a space-separated list of paths, so it’s more flexible and the browser knows a bit better to just ignore everything. Each path is either a /path/to/whatever or prefix /path/to/whatever, with the former being absolute and the latter being a path prefix. This would tell clients much better what paths a resource represents, and they could potentially read a URL they’ve never even seen before straight from cache.
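A minimal sketch of how a client might parse and match a Cache-As value, assuming the grammar is exactly what is described above: whitespace-separated tokens, where the keyword prefix marks the next token as a path prefix and any other token is an absolute path. The names parse_cache_as and matches are hypothetical, and the plain string-prefix comparison is just one possible reading.

```python
from typing import List, Tuple

# Each entry is (kind, path), where kind is "exact" or "prefix".
Entry = Tuple[str, str]


def parse_cache_as(value: str) -> List[Entry]:
    """Parse a hypothetical Cache-As header value into a list of entries."""
    tokens = value.split()
    entries: List[Entry] = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "prefix":
            # The keyword must be followed by the path it applies to.
            if i + 1 >= len(tokens):
                raise ValueError("'prefix' must be followed by a path")
            entries.append(("prefix", tokens[i + 1]))
            i += 2
        else:
            entries.append(("exact", tokens[i]))
            i += 1
    return entries


def matches(entries: List[Entry], request_path: str) -> bool:
    """Return True if the cached response may be reused for request_path."""
    for kind, path in entries:
        if kind == "exact" and request_path == path:
            return True
        # Assumption: a prefix entry is compared as a raw string prefix.
        if kind == "prefix" and request_path.startswith(path):
            return True
    return False
```

For example, a response carrying Cache-As: /app prefix /app/ would be reusable for /app and for /app/1/view, but not for /about. Because the comparison is a raw string prefix, prefix /app would also match /apple, so a real design would probably want to match on whole path segments.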

sollyucko · 2019-04-04

True. But wouldn’t we want to read it from cache iff it has identical content? And couldn’t a weak ETag be used if computing the content is expensive? Also, with prefix, what would happen with multiple matches? Use the latest?

sollyucko · 2019-04-04

Actually, I think I see what you’re saying. Is the goal to also save the page’s state?

isiahmeadows · 2019-04-04

If by that you mean returning to that view as if it were a traditional, separate web page, yes. The key here is that the server itself doesn’t always care about certain portions of the URL - only the client. It’s not unlike using the hash as routing, but in this case, it ends up sent to the server anyways. This is just so the server can tell the client it should be cached for multiple URLs and not just one.

sollyucko · 2019-04-04

I guess the goal is to avoid extra round-trips, which ETag would need, which is reasonable.