[Proposal] Migrate some high-entropy HTTP request headers to Client Hints

client-hints

#1

A Problem

HTTP request headers expose quite a bit of information about users by default, even over plaintext. It would be better if this information were a) opt-in, b) locked to secure transport, and c) delegated by the first party to specific third parties. Client Hints, conveniently enough, provides exactly this infrastructure.

Some Proposals

User-Agent and Accept-Language headers seem like particularly low-hanging fruit:


#2

Generally I think this idea is reasonable. Three issues:

1. Not granular enough?

I’d say that if we are trying to make the data more granular, why not finish the job and make the values singular, i.e. UA-Brand: Chrome and UA-Version: 189 rather than UA: Chrome 189.
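For what it’s worth, here’s a minimal TypeScript sketch of the difference from a server’s point of view, assuming hypothetical UA / UA-Brand / UA-Version header names (none of these are shipped standards): singular values can be read directly, while a combined value has to be picked apart first.

```ts
// Sketch only: UA, UA-Brand and UA-Version are the proposal's hypothetical
// hint names, not shipped headers. Node lowercases incoming header names.
import { IncomingMessage } from "http";

// Combined form: the server has to split "Chrome 189" itself.
function brandFromCombined(req: IncomingMessage): string | undefined {
  const ua = req.headers["ua"] as string | undefined; // e.g. "Chrome 189"
  return ua?.split(" ")[0];
}

// Singular form: the value arrives ready to use.
function brandFromSingular(req: IncomingMessage): string | undefined {
  return req.headers["ua-brand"] as string | undefined; // e.g. "Chrome"
}
```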

2. Intentionally incorrect tokens

Yes to UA-Engine, of course (and UA-Engine-Version); otherwise chaos will continue to reign in embedded-browser-land. Nevertheless, we should expect copious lying. The danger of the proposed approach is that, whereas the User-Agent string could mostly be treated as a unique key into a table of UA builds containing the genuine metadata, this mechanism will likely increase the number of cases where two or more different browsers submit exactly the same UA-related metadata and are therefore indistinguishable (unless we take the craziness up a notch and try to tell them apart by variations in how they handle the Accept header or format Accept-Language).

Perhaps the only way to prevent this would be to send a header carrying a verifiable build signature that can’t be replicated by other browsers. I don’t know how you’d do that, but cleverer people might.

3. The CH opt-in

User agents will attach the UA header to every secure outgoing request by default, with a value that includes only the major version (e.g. “Chrome 69”). Servers can opt into receiving more detailed version information in the UA header, along with the other available Client Hints, by delivering an Accept-CH header or Accept-CH-Lifetime header in the usual way.
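To make the quoted mechanism concrete, here is a hedged Node/TypeScript sketch of the server side of that opt-in. The token names (UA, Accept-CH, Accept-CH-Lifetime) are taken from the quoted text; everything else is illustrative, and real Client Hints are restricted to secure transport, so plain HTTP here is only for the sketch.

```ts
import { createServer } from "http";

createServer((req, res) => {
  // Opt in: ask the browser to send fuller UA detail on subsequent requests
  // to this origin, and to remember that preference for a day.
  res.setHeader("Accept-CH", "UA");
  res.setHeader("Accept-CH-Lifetime", "86400");

  // By default only something like "Chrome 69" arrives; after the opt-in,
  // the UA hint should carry more detailed version information.
  const ua = req.headers["ua"];
  res.end(`UA hint received: ${ua ?? "(none)"}\n`);
}).listen(8080);
```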

So, I had thought that the weirdness (IMHO!) of CH being opt-in but not subject to user consent was a lost battle. This suggests that the debate is still worth having: the idea that UA is a client hint yet somehow escapes the opt-in and gets sent anyway, while all the others require the Accept-CH dance, betrays how arbitrary this design decision is. There’s no privacy benefit, no security/fingerprinting benefit, other new request headers like Sec-Metadata haven’t followed this pattern, there’s significant developer toil, and it creates an incentive for bad practices.

I’ve previously suggested that this design encourages developers to start any user session with a redirect to capture full CH data before producing a page. The more CHs we add, the stronger that incentive becomes.
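To make that incentive concrete, here’s a sketch of the pattern being described (not an endorsement). It uses the pre-existing DPR and Viewport-Width hints as stand-ins, since any hint that only arrives after an opt-in creates the same temptation: if the hints aren’t there yet, opt in and redirect back to the same URL so the retried request carries them.

```ts
import { createServer } from "http";

createServer((req, res) => {
  const dpr = req.headers["dpr"];

  if (!dpr) {
    // No hints yet: opt in and bounce the browser back to the same URL, so
    // the second request arrives carrying the hints before we render anything.
    res.statusCode = 307;
    res.setHeader("Accept-CH", "DPR, Viewport-Width");
    res.setHeader("Location", req.url ?? "/");
    res.end();
    return;
  }

  res.end(`Rendering page for a device with DPR ${dpr}\n`);
}).listen(8080);
```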

PS. As usual, absurdly well-written explainer which I can understand even before I’ve had any coffee.


#3

Thanks! In the spirit of your first point, I’m going to split your three issues into somewhat more than three things to talk about. 🙂

UA-Brand and UA-Version

We could certainly split UA into UA-Version and UA-Brand. That would be consistent with splitting the rest of the detail out into distinct headers. That said, I see “Chrome 41” and “Chrome 73” as being actually different browsers in a deep way. It’s not clear to me that either the branding or version information would be meaningful on its own.

Yes to UA-Engine

Advertising the engine for browsers whose brand is otherwise ambiguous is probably a good idea. I’m not sure I prefer it to advertising the variant as part of the branding (e.g. the status quo’s CriOS as opposed to Chrome). Do you have a reason for preferring the separate header?

One might be that every iOS browser will use the same engine for reasons, so advertising WKWebView or something as the underlying engine might set expectations appropriately. One response to that is that Chrome’s WKWebView has all sorts of things injected into it to enable APIs that WebKit doesn’t (yet?) support, so that Chrome’s WebView is actually quite distinct from Edge’s, which is yet again distinct from Firefox’s. It seems reasonable to me to reflect that in the branding.

Copious lying.

  1. I tend to believe the folks who have told me that it’s hopelessly naive to expect browser vendors not to lie about their nature if developers gate access to features based on user agent information advertised in the request. In my mind, the question is how to support these white lies without destroying the usefulness of the mechanism in the process.

  2. Why is lying bad? In particular, the explainer’s GREASE-like suggestion to turn UA into a set with intentional lies is pretty appealing. It forces developers to parse the whole UA string, and gives smaller players the ability to talk about themselves in one part of the string, while advertising themselves as someone else for purposes of compatibility.
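As an illustration of what “parse the whole set” might mean in practice, here’s a sketch that assumes the hint arrives as a comma-separated list of brand tokens, some of them intentional noise; the explainer’s actual GREASE-ish format may well differ.

```ts
// Hypothetical input: "Chrome 73, TotallyRealBrowser 99". Format is assumed.
function parseBrandSet(header: string): Map<string, string> {
  const brands = new Map<string, string>();
  for (const entry of header.split(",")) {
    const [brand, version] = entry.trim().split(" ");
    if (brand) brands.set(brand, version ?? "");
  }
  return brands;
}

// Consumers have to ask "is Chrome in the set?" rather than keying off one
// exact string -- which is what leaves room for smaller players to list
// themselves alongside a compatibility brand.
const brands = parseBrandSet("Chrome 73, TotallyRealBrowser 99");
console.log(brands.has("Chrome"), brands.get("TotallyRealBrowser"));
```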

If there were a magic wand I could wave over the internet to make #1 above untrue, I would (perhaps developers will be overtaken by an aesthetic desire for a clean UA string, and will start doing pure feature detection rather than relying on unique keys into a table of UA builds containing the genuine metadata?). I think it’s unlikely, given historical precedent.
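For contrast, the kind of pure feature detection being wished for here, as a tiny client-side sketch (the specific API checked is just an example):

```ts
// Gate on the feature itself rather than on who the browser claims to be;
// no table of UA builds involved.
function supportsIntersectionObserver(): boolean {
  return typeof IntersectionObserver !== "undefined";
}

if (supportsIntersectionObserver()) {
  // use the real API
} else {
  // load a polyfill or degrade gracefully
}
```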

because the idea that UA is a client hint but somehow escapes the opt-in and gets sent anyway

A pedantic nit: it partially escapes the opt-in, insofar as the major version is exposed by default, and the minor version only after Accept-CH: UA. But the point stands.

There’s no privacy benefit, there’s no security/fingerprinting benefit [to the opt-in]

I disagree!

  1. As the explainer notes, browsers can make decisions about when to respect the site’s request. For example, browsers that categorize the world into “Boo!” and “Yay!” via some heuristic or other can deny additional detail to the former and grant it to the latter.

  2. As the explainer doesn’t note, but ought to, the public nature of the opt-in makes data collection visible to researchers, regulators, and users who care. Rather than broadcasting entropy to every site indiscriminately, we’ll get some collective insight into the entities who actually want to collect it, and the entities with whom they choose to share that detail.

new request headers like Sec-Metadata have not followed this pattern

The guy who wrote that spec probably had good reasons for not following the pattern. He might, for instance, have thought that the requests which are interesting from a security perspective are all third-party (e.g. evil.com poking at a CSRF vulnerability in bank.com, or executing cross-site search attacks against bing.com), and that attackers would be sincerely unlikely to delegate helpful metadata collection to their victims via Feature-Policy.

I’ve previously suggested that this design encourages developers to start any user session with a redirect to capture full CH data before producing a page.

That’s a thing that could happen, and it has minor perf impacts. But since we’d ideally redirect through the apex of a registrable domain anyway to set Strict-Transport-Security headers with includeSubDomains, it’s not clear that it’s actually not best practice?
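For reference, a sketch of that apex redirect, with example.com as a placeholder host and typical (not prescribed) directive values; HSTS is only honoured over HTTPS, which the sketch glosses over.

```ts
import { createServer } from "http";

// The registrable-domain apex answers any request by setting HSTS for the
// whole domain (includeSubDomains) and bouncing the user to the real host.
createServer((req, res) => {
  res.statusCode = 301;
  res.setHeader(
    "Strict-Transport-Security",
    "max-age=31536000; includeSubDomains" // one year; values are illustrative
  );
  res.setHeader("Location", `https://www.example.com${req.url ?? "/"}`);
  res.end();
}).listen(8080);
```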

Thanks again for your feedback!


#4

I see “Chrome 41” and “Chrome 73” as being actually different browsers in a deep way. It’s not clear to me that either the branding or version information would be meaningful on its own.

polyfill.io makes an assumption, for the purposes of not having a ludicrously large compat table, that once a browser ships a feature, support generally continues to exist in subsequent versions. That’s a pattern that’s generally true. And there are lots of other tools and technologies that segment browsers in terms such as “Chrome >= 50” or whatever.
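Roughly, the segmentation being described looks like this, under the “shipped once, supported thereafter” assumption (browser names and version floors are illustrative):

```ts
// Hypothetical minimum-version table in the "Chrome >= 50" style: if the
// browser meets the floor, assume the feature shipped and stayed shipped.
const minVersionForFeature: Record<string, number> = {
  Chrome: 50,
  Firefox: 55,
};

function assumeSupported(brand: string, majorVersion: number): boolean {
  const floor = minVersionForFeature[brand];
  return floor !== undefined && majorVersion >= floor;
}

console.log(assumeSupported("Chrome", 73)); // true
console.log(assumeSupported("Chrome", 41)); // false
```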

Do you have a reason for preferring the separate [UA-Engine] header?

“Normal” developers probably don’t want to spend the time maintaining a lookup table of all possible embedded browser names. I get that the multiple-token approach you also suggested might work, but it will depend a lot on a consistent approach to reporting. This might be resolved through more detailed definition of what the tokens should represent.
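To illustrate the toil in question, here’s a hedged sketch contrasting the brand-to-engine lookup table a developer maintains today with simply reading a hypothetical UA-Engine header (all names illustrative):

```ts
// Without an engine hint: map every embedded-browser brand you know about to
// its engine, and keep the table current forever. Entries are illustrative.
const engineByBrand: Record<string, string> = {
  Chrome: "Blink",
  CriOS: "WebKit",
  "Samsung Internet": "Blink",
  // ...and every other embedder you have ever heard of
};

function engineFromBrand(brand: string): string | undefined {
  return engineByBrand[brand];
}

// With a hypothetical UA-Engine hint: just read it.
function engineFromHint(
  headers: Record<string, string | undefined>
): string | undefined {
  return headers["ua-engine"];
}
```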

it’s not clear that it’s [redirect on first request] actually not best practice?

Interesting! I haven’t seen anyone try to defend that pattern yet. HSTS explicitly includes a mechanism to preload the directive so that the redirect on the first-ever visit is avoided, and goes to some effort to do so: sites opt in via the preload directive, and browser implementors maintain a gigantic preload list and distribute it to all active installs of their browser… that’s a lot of effort to avoid one redirect. I appreciate that there’s a security reason for this as well as a perf one, but if it succeeds in removing the need for that first redirect in most cases, it seems a shame that we’re building up the case for reintroducing it.

How’s Origin-Policy doing? Is there a solution there?


#5

polyfill.io makes an assumption, for the purposes of not having a ludicrously large compat table, that once a browser ships a feature, support generally continues to exist in subsequent versions.

I’ve personally made this assumption false on a number of occasions. Sorry, <isindex>. I intend to continue making it false in the future (I’m looking at you, document.domain).

That’s a pattern that’s generally true. And there are lots of other tools and technologies that segment browsers in terms such as “Chrome >= 50” or whatever.

I don’t think this answers my underlying claim that “Browser X” is different than “Browser Y”. It might be the case that “Browser 10” is the same as “Browser 13” along some axis that you care about. It might also be that “Browser 13” dropped support for your favourite element, or changed its MIME type handling in ways you find interesting.

HSTS explicitly includes a mechanism to preload the directive so that the redirect on first ever visit is avoided, and goes to some effort to do this - with browser implementors having to upload opt-ins when they encounter HSTS headers with preload, maintain a gigantic preload list, distribute it to all active installs of their browser… that’s a lot of effort to avoid one redirect.

The reason that the HSTS preload list exists is not the perf impact of one redirect, but the window of opportunity that redirect creates. That is, if https://exciting.site is preloaded, then it’s literally impossible for an attacker to cause a user to visit the site via plaintext, which they could intercept, modify, etc. So when you visit your boring relative’s house, you can be sure that you’re actually seeing https://exciting.site/'s login page, even though that relative is far too uncool to have visited it before.

In the absence of the preload list, network attackers can grab the initial request you make (because literally no one ever types https:// in the address bar, and browsers still default to http:// (though I expect that to change)).

I’d also note that for HSTS, we accept it on basically any subresource response, so it’s quite possible to request an image from your apex (which I think is what Dropbox was doing, when last I checked). Once https://github.com/WICG/feature-policy/issues/129 lands, that won’t be the case for Client Hints. So maybe HSTS is a bad example.

How’s Origin-Policy doing? Is there a solution there?

Spec is woefully out of date. Chrome has a partial implementation behind a flag. I hear positive rumblings from Mozilla. It’s a solution of sorts, but doesn’t address the redirect issue you’re raising, as you only get the policy after touching the origin once to learn that you need to go get it.