[Proposal] Migrate some high-entropy HTTP request headers to Client Hints

mikewest · 2018-11-05

A Problem

HTTP request headers expose quite a bit of information about users by default, even over plaintext. It would be better if this information were a) opt-in, b) locked to secure transport, and c) delegated by the first-party to specific third-parties. Client Hints, conveniently enough, provides exactly this infrastructure.

Some Proposals

User-Agent and Accept-Language headers seem like particularly low-hanging fruit:

https://github.com/mikewest/ua-client-hints suggests that we split User-Agent into UA, UA-Platform, UA-Arch, and UA-Model Client Hints.
https://github.com/mikewest/lang-client-hint suggests that we turn Accept-Language into a Lang Client Hint.

triblondon · 2018-11-06

Generally I think this idea is reasonable. Three issues:

1. Not granular enough?

I’d say if we are trying to make the data more granular, why not finish the job and make the values singular, i.e. UA-Brand: Chrome and UA-Version: 189 rather than UA: Chrome 189.

2. Intentionally incorrect tokens

Yes to UA-Engine of course (and UA-Engine-Version), otherwise chaos will continue to reign in embedded-browser-land. But nevertheless we should expect copious lying. The danger of the proposed approach is that whereas with the User-Agent string, we could mostly treat it as a unique key into a table of UA builds containing the genuine metadata, this mechanism will likely produce an increase in the number of cases where two or more different browsers submit exactly the same UA-related metadata and are therefore indistinguishable (unless we take the craziness up a notch and try to distinguish them based on variations in what they do with the Accept header or how they format Accept-Language or something).

What may be the only way to prevent this is to send a header with a verifiable build signature that can’t be replicated by other browsers. I don’t know how you’d do this but cleverer people might.

3. The CH opt-in

User agents will attach the UA header to every secure outgoing request by default, with a value that includes only the major version (e.g. "Chrome 69"). Servers can opt into receiving more detailed version information in the UA header, along with the other available Client Hints, by delivering an Accept-CH header or Accept-CH-Lifetime header in the usual way.

So, I had thought that the weirdness (IMHO!) of CH being opt-in but not subject to user consent was a lost battle. This suggests that the debate is still worth having, because the idea that UA is a client hint but somehow escapes the opt-in and gets sent anyway, but all others require the Accept-CH thing betrays how arbitrary this design decision is. There’s no privacy benefit, there’s no security/fingerprinting benefit, other new request headers like Sec-Metadata have not followed this pattern, there’s significant developer toil, and an incentive to bad practices.

I’ve previously suggested that this design encourages developers to start any user session with a redirect to capture full CH data before producing a page. The more CHs we add, the stronger that incentive becomes.

PS. As usual, absurdly well-written explainer which I can understand even before I’ve had any coffee.

mikewest · 2018-11-06

Thanks! In the spirit of your first point, I’m going to split your three issues into somewhat more than three things to talk about.

UA-Brand and UA-Version

We could certainly split UA into UA-Version and UA-Brand. That would be consistent with splitting the rest of the detail out into distinct headers. That said, I see “Chrome 41” and “Chrome 73” as being actually different browsers in a deep way. It’s not clear to me that either the branding or version information would be meaningful on its own.

Yes to UA-Engine

Advertising the engine for browsers whose brand is otherwise ambiguous is probably a good idea. I’m not sure I prefer it to advertising the variant as part of the branding (e.g. the status quo’s CriOS as opposed to Chrome). Do you have a reason for preferring the separate header?

One might be that every iOS browser will use the same engine for reasons, so advertising WKWebView or something as the underlying engine might set expectations appropriately. One response to that is that Chrome’s WKWebView has all sorts of things injected into it to enable APIs that WebKit doesn’t (yet?) support, so that Chrome’s WebView is actually quite distinct from Edge’s, which is yet again distinct from Firefox’s. It seems reasonable to me to reflect that in the branding.

Copious lying.

1. I tend to believe the folks who have told me that it’s hopelessly naive to expect browser vendors not to lie about their nature if developers gate access to features based on user agent information advertised in the request. In my mind, the question is how to support these white lies without destroying the usefulness of the mechanism in the process.
2. Why is lying bad? In particular, the explainer’s GREASE-like suggestion to turn UA into a set with intentional lies is pretty appealing. It forces developers to parse the whole UA string, and gives smaller players the ability to talk about themselves in one part of the string, while advertising themselves as someone else for purposes of compatibility.
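
To make the GREASE-like idea concrete, here is a minimal sketch of what "parse the whole UA string" could look like on the server. The wire format (a comma-separated list of "Brand Major" entries), the brand names, and the helper names are all assumptions made for illustration; the thread does not pin down a syntax.

```python
from typing import Dict, Optional, Tuple

# Hypothetical: the brands this particular site has compatibility data for.
KNOWN_BRANDS = ("Chrome", "Firefox", "Safari")


def parse_ua_set(value: str) -> Dict[str, int]:
    """Parse an assumed 'Brand Major, Brand Major, ...' value into {brand: major}."""
    brands: Dict[str, int] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last space so multi-word brand names survive.
        brand, _, version = entry.rpartition(" ")
        if brand and version.isdigit():
            brands[brand] = int(version)
    return brands


def pick_brand(value: str) -> Optional[Tuple[str, int]]:
    """Return the first brand the site recognises, wherever it sits in the set."""
    brands = parse_ua_set(value)
    for known in KNOWN_BRANDS:
        if known in brands:
            return known, brands[known]
    return None
```

Because unknown or deliberately bogus entries are skipped rather than rejected, a smaller browser can name itself in one entry and still be matched through a compatibility entry elsewhere in the set.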

If there was a magic wand I could wave over the internet that made #1 above untrue, I would (perhaps developers will be overtaken by an aesthetic desire for clean UA string? And will start doing pure feature detection rather than relying on unique keys into a table of UA builds containing the genuine metadata?). I think it’s unlikely, given historical precedent.

because the idea that UA is a client hint but somehow escapes the opt-in and gets sent anyway

A pedantic nit: it partially escapes the opt-in, insofar as the major version is exposed by default, and the minor version only after Accept-CH: UA. But the point stands.

There’s no privacy benefit, there’s no security/fingerprinting benefit [to the opt-in]

I disagree!

As the explainer notes, browsers can make decisions about when to respect the site’s request. For example, browsers that categorize the world into “Boo!” and “Yay!” via some heuristic or other can deny additional detail to the former and grant it to the latter.
As the explainer doesn’t note, but ought to, the public nature of the opt-in makes data collection visible to researchers, regulators, and users who care. Rather than broadcasting entropy to every site indiscriminately, we’ll get some collective insight into the entities who actually want to collect it, and the entities with whom they choose to share that detail.

new request headers like Sec-Metadata have not followed this pattern

The guy who wrote that spec probably had good reasons for not following the pattern. He might, for instance, have thought that the requests which are interesting from a security perspective are all third-party (e.g. evil.com poking at a CSRF vulnerability in bank.com, or executing cross-site search attacks against bing.com), and that attackers would be sincerely unlikely to delegate helpful metadata collection to their victims via Feature-Policy.

I’ve previously suggested that this design encourages developers to start any user session with a redirect to capture full CH data before producing a page.

That’s a thing that could happen. It has minor perf impacts. But since we’d ideally redirect through the apex of a registrable domain anyway to set Strict-Transport-Security headers with includeSubDomains, it’s not clear that it’s actually not best practice?

Thanks again for your feedback!

triblondon · 2018-11-06

I see “Chrome 41” and “Chrome 73” as being actually different browsers in a deep way. It’s not clear to me that either the branding or version information would be meaningful on its own.

polyfill.io makes an assumption, for the purposes of not having a ludicrously large compat table, that once a browser ships a feature, support generally continues to exist in subsequent versions. That’s a pattern that’s generally true. And there are lots of other tools and technologies that segment browsers in terms such as “Chrome >= 50” or whatever.
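
A rough sketch of that "version floor" style of compat table, to show why brand and version are useful as separate values. The feature names, brands, and version numbers below are made up for illustration and are not taken from polyfill.io.

```python
# Hypothetical data: first major version of each brand assumed to support a feature.
FIRST_SUPPORTED = {
    "feature-a": {"Chrome": 50, "Firefox": 44},
    "feature-b": {"Chrome": 66},
}


def assumed_supported(feature: str, brand: str, major: int) -> bool:
    """Apply the 'once shipped, stays shipped' assumption: brand >= floor."""
    floor = FIRST_SUPPORTED.get(feature, {}).get(brand)
    # Unknown features or brands are treated as unsupported.
    return floor is not None and major >= floor
```

The table only needs one row per feature and brand, which is the saving being described; it is also exactly the assumption that breaks if a later version removes the feature.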

Do you have a reason for preferring the separate [UA-Engine] header?

“Normal” developers probably don’t want to spend the time maintaining a lookup table of all possible embedded browser names. I get that the multiple-token approach you also suggested might work, but it will depend a lot on a consistent approach to reporting. This might be resolved through more detailed definition of what the tokens should represent.

it’s not clear that it’s [redirect on first request] actually not best practice?

Interesting! I haven’t seen anyone try to defend that pattern yet. HSTS explicitly includes a mechanism to preload the directive so that the redirect on first ever visit is avoided, and goes to some effort to do this - with browser implementors having to upload opt-ins when they encounter HSTS headers with preload, maintain a gigantic preload list, distribute it to all active installs of their browser… that’s a lot of effort to avoid one redirect. I appreciate that there’s a security reason for this as well as a perf one, but if it succeeds in removing the need for that first redirect in most cases, it seems a shame that we’re building up the case for reintroducing it.

How’s Origin-Policy doing? Is there a solution there?

mikewest · 2018-11-06

polyfill.io makes an assumption, for the purposes of not having a ludicrously large compat table, that once a browser ships a feature, support generally continues to exist in subsequent versions.

I’ve personally made this assumption false on a number of occasions. Sorry, <isindex>. I intend to continue making it false in the future (I’m looking at you, document.domain).

That’s a pattern that’s generally true. And there are lots of other tools and technologies that segment browsers in terms such as “Chrome >= 50” or whatever.

I don’t think this answers my underlying claim that “Browser X” is different than “Browser Y”. It might be the case that “Browser 10” is the same as “Browser 13” along some axis that you care about. It might also be that “Browser 13” dropped support for your favourite element, or changed its MIME type handling in ways you find interesting.

HSTS explicitly includes a mechanism to preload the directive so that the redirect on first ever visit is avoided, and goes to some effort to do this - with browser implementors having to upload opt-ins when they encounter HSTS headers with preload, maintain a gigantic preload list, distribute it to all active installs of their browser… that’s a lot of effort to avoid one redirect.

The reason that the HSTS preload list exists is not the perf impact of one redirect, but the window of opportunity that redirect creates. That is, if https://exciting.site is preloaded, then it’s literally impossible for an attacker to cause a user to visit the site via plaintext, which they could intercept, modify, etc. So when you visit your boring relative’s house, you can be sure that you’re actually seeing https://exciting.site/'s login page, even though that relative is far too uncool to have visited it before.

In the absence of the preload list, network attackers can grab the initial request you make (because literally no one ever types https:// in the address bar, and browsers still default to http:// (though I expect that to change)).

I’d also note that for HSTS, we accept it on basically any subresource response, so it’s quite possible to request an image from your apex (which I think is what Dropbox was doing, when last I checked). Once https://github.com/WICG/feature-policy/issues/129 lands, that won’t be the case for Client Hints. So maybe HSTS is a bad example.

How’s Origin-Policy doing? Is there a solution there?

Spec is woefully out of date. Chrome has a partial implementation behind a flag. I hear positive rumblings from Mozilla. It’s a solution of sorts, but doesn’t address the redirect issue you’re raising, as you only get the policy after touching the origin once to learn that you need to go get it.

mikewest · 2018-11-29

I sketched these headers out in a little more detail in https://tools.ietf.org/html/draft-west-ua-client-hints and https://tools.ietf.org/html/draft-west-lang-client-hint. Feedback would be most welcome.

triblondon · 2018-12-05

I do love a nice IETF draft. So annoying when people make webpages more than 80 characters wide.

The Lang CH not being sent on first request really brings this whole opt-in issue to a head for me. @mikewest you’ve argued in this thread that an initial redirect bounce to collect CH data is OK. I’ve heard others (possibly @igrigorik?) say that CH was not designed for adapting HTML page responses, which is why having it kick in in time for subresource loading is sufficient.

So, categorically, language prefs are main-resource territory, and I can’t exactly render a page in one language and then switch suddenly into another language on a second navigation! How do you propose to solve this? If the answer is that we continue to use Accept-Language if CH is not available, then since changing the language experience unprompted is jarring, I don’t see how you’d ever justify switching to a different data source (unless it would give you the same result, and then what’s the point).

This leads us back to performing a redirect before rendering any content. Smart developers would presumably do this only for browsers, since they would assume that there might be ranking penalties from search engines if they exhibit a redirect response before rendering a page. It also probably means an additional cookie to prevent a redirect loop for browsers that don’t support CH.

I don’t have a clever answer to this problem, and in QUICland, perhaps a redirect isn’t such a terrible thing. But it would be nice to make this pattern easier to implement, perhaps by making it possible to detect CH support on first request?

mikewest · 2018-12-05

If we agree that we should reduce the passive fingerprinting surface we expose to the web by default, I’d suggest that we ought to prefer a route that sends as little data as possible until the site explicitly asks for more. That seems like a pretty good fit for the Client Hints infrastructure, regardless of its original intentions. I recognize that that makes main-frame content negotiation marginally more difficult in exactly the way you’re suggesting here. Given the number of sites with infrastructure complex enough to serve multi-language content (and the fact that my anecdata are full of IP-based language selection rather than Accept-Language-based selection…), an extra redirect seems like a fine tradeoff.

I’d also note that for many of these sites, a redirect happens even in the status quo in order to segregate content based on URL as opposed to figuring out the correct incantation of Vary to make things work correctly (both domain-based segregation, as in google.de and de.wikipedia.org, or path-based a la https://www.lufthansa.com/de/de/homepage).

Perhaps it wasn’t clear, but my intent with this new header is to deprecate and remove Accept-Language entirely.

This is a good point I hadn’t considered. I don’t think it requires a cookie, however, as the redirect endpoint can be distinct from the original URL (e.g. append ?asked-for-language=1 or something similarly visible in the subsequent request).
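
One possible shape for that redirect, sketched under assumptions: the handler signature, the fallback language, and the choice to send Accept-CH-Lifetime alongside Accept-CH on the redirect are all hypothetical, and whether a browser would honor an opt-in delivered on a redirect response is not settled in this thread. Only the Lang header name and the asked-for-language parameter come from the discussion.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

MARKER = "asked-for-language"  # query parameter name from the example above
DEFAULT_LANG = "en"            # assumption: the site's fallback language


def handle(url, headers):
    """Return (status, response_headers, language) for a navigation request."""
    lang = headers.get("Lang")  # hint name as proposed in this thread
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    already_asked = any(key == MARKER for key, _ in query)

    if lang is not None:
        return 200, {}, lang
    if already_asked:
        # Asked once and still no hint: serve the fallback instead of looping.
        return 200, {}, DEFAULT_LANG

    # First contact: ask for the hint and bounce to a visibly marked URL.
    query.append((MARKER, "1"))
    location = urlunsplit(parts._replace(query=urlencode(query)))
    response = {
        "Accept-CH": "Lang",
        "Accept-CH-Lifetime": "86400",  # assumed value, in seconds
        "Location": location,
    }
    return 302, response, None
```

The marker in the URL plays the role the cookie would have played: a client that does not support the hint arrives at the marked URL without a Lang header and gets the fallback rather than another redirect.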

The spec doc should probably include some reasonable pattern to make that sort of thing clear to developers.

(Also, I’d like to get rid of cookies. I might have mentioned that?)

igrigorik · 2018-12-05

To the contrary, our recent work on Accept-CH-Lifetime is specifically designed to enable delivery of hints on navigation responses.

That’s exactly how we arrived at the current design. The constraints are that we don’t want to incur the overhead of sending every hint on every request (performance), and want to constrain passive fingerprinting by requiring that the origin explicitly requests the data it needs. As a result, if the user has never visited an origin before, their first navigation does not carry any hints; however, the response to that navigation can request the list of hints and set a policy for how long that preference should be persisted. With that in place, hints are delivered on all subsequent navigations and subresource requests, with extra bits for 3P delegation.
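
The flow described above can be modelled roughly as follows. This is a toy sketch, not how any browser implements it: the class and method names are hypothetical, timestamps are passed in explicitly, and it deliberately ignores third-party delegation, the UA major version that the proposal sends by default, and the case of Accept-CH without a lifetime.

```python
class HintStore:
    """Toy model of a browser remembering per-origin Client Hints opt-ins."""

    def __init__(self):
        # origin -> (frozenset of requested hint names, expiry timestamp)
        self._optins = {}

    def request_headers(self, origin, now, available):
        """Return the hints to attach to a request to `origin` at time `now`."""
        hints, expiry = self._optins.get(origin, (frozenset(), 0.0))
        if now >= expiry:
            return {}  # never visited, or the opt-in has lapsed
        return {name: value for name, value in available.items() if name in hints}

    def on_response(self, origin, now, response_headers):
        """Record an opt-in when a response carries both headers."""
        accept = response_headers.get("Accept-CH")
        lifetime = response_headers.get("Accept-CH-Lifetime")
        if accept is None or lifetime is None:
            return
        hints = frozenset(h.strip() for h in accept.split(",") if h.strip())
        self._optins[origin] = (hints, now + float(lifetime))
```

In this model the first navigation to an origin always goes out bare, and only hints the origin named are sent afterwards, which is the progressive-enhancement behavior being described.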

p.s. effectively, this is the same progressive enhancement mechanics as ServiceWorker…

triblondon · 2018-12-06

Apologies for misrepresenting you.

All my concerns here relate to a first navigation, before we’ve seen an Accept-CH-Lifetime response header. So if this means a new best practice of serving a redirect in this situation, you’re writing off an RTT… which surprises me given the extreme focus this community has had on reducing RTTs in recent years.

I get that some of this concern might relate to advanced use cases / developers only, and many people do do language selection based on GeoIP (we even facilitate some of that kind of pattern usage at Fastly, though we don’t encourage it!).

I’m not trying to flog a dead horse, I understand that there’s no majority for sending the data without opt-in. So that leaves me with two points:

1. There is potential developer confusion over best practice as it relates to this first exchange, and it would be great if participants in this process that have access to devrel teams use them to help communicate the new normal.
2. Where there’s an incentive for developers to exhibit different behaviour depending on the type of user agent making the request, it is an anti-pattern to leave them with no option but to lean on the UA header for that, so it would be nice if it were possible to detect support for client hints.

The service worker thing is a great point. More and more it feels like the first request for a page is an ‘install’ request, and that the actual content is likely to follow in a subsequent exchange, after various dependencies have been resolved - even for conventional ‘document’-like content such as news stories. If the web is shifting to that model then the design for CH makes a lot more sense to me.

yoavweiss · 2018-12-06

FWIW, I would love to figure out a solution that enables Client-Hints for the very-first-page-load case, without exposing users to passive fingerprinting. At the same time, I think such a solution, if we’d come up with one(*), can be additive to the current CH infrastructure.

(*) I have a couple of ideas, which everyone hates: a) a DNS flag which indicates server opt-in b) a “preload list” (similar to HSTS preload). I think we need to continue to explore that space, but don’t think it’s a blocker. Opinions welcome!

yoavweiss · 2019-01-31

Due to interest in that proposal here, at a TPAC breakout session as well as on a related HTTPWG thread, the repos have now been adopted as part of the WICG org, and live at https://github.com/WICG/lang-client-hint and https://github.com/WICG/ua-client-hints. Let’s kill some entropy!

marcosc · 2019-02-01

For record keeping purposes, Mozilla folks weighed in a bit in this thread:

mikewest · 2019-02-01

I’m hoping for more feedback from David in particular on the TAG review (https://github.com/w3ctag/design-reviews/issues/320). I believe it’s scheduled for discussion on Tuesday.