Extending localization beyond language



Continuing the discussion from Communicating your home location:

Browsers, or at least Chrome, already have a mechanism for selecting a language for a site: the “Translate this page” popup. I propose that browsers start using the Accept-Language header (or possibly a new header, if Accept-Language has been devalued by UAs sending it without regard for user preference) to drive an interface like this. That interface could have new triggers for being exposed, such as an “internationalizable” attribute on <html> alongside lang, or a <meta name="localizations"> tag whose content attribute lists the available localizations/languages.
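To make the <meta name="localizations"> idea a bit more concrete, here is a minimal Python sketch of how a UA-side tool might read such a tag. The tag does not exist in any spec today, and the comma-separated content format, the class name LocalizationsParser, and the function available_localizations are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class LocalizationsParser(HTMLParser):
    """Collects the list from a hypothetical <meta name="localizations"> tag."""

    def __init__(self):
        super().__init__()
        self.localizations = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "localizations":
            return
        content = attributes.get("content") or ""
        # Assumed format: a comma-separated list of language tags.
        self.localizations.extend(
            item.strip() for item in content.split(",") if item.strip()
        )


def available_localizations(html):
    """Return the localizations a page advertises, or [] if it advertises none."""
    parser = LocalizationsParser()
    parser.feed(html)
    return parser.localizations


page = (
    '<html lang="en"><head>'
    '<meta name="localizations" content="en-US, de-DE, es-MX">'
    "</head></html>"
)
print(available_localizations(page))
```

A page without the tag yields an empty list, which a browser could treat as "no site-provided localizations, offer automated translation only".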

A new header could also address the fact that internationalization/localization is more than just language: https://en.wikipedia.org/wiki/Internationalization_and_localization notes things like currency and units of measurement, which, when considered at all in current Web contexts, are usually conflated with language (and/or, as mentioned in the linked thread, IP block).

Some additional thoughts on a Locale header:

  • It could provide granularity for each kind of locale category that is currently configurable in native locale settings, though it should also have an overall locale (that can be easily parsed out) for sites that can’t handle the developer burden of more granular localization.
  • If a header is present that addresses these non-language localization concerns, that could be used as a kind of semaphore that the Accept-Language header can be trusted (if browsers are currently misrepresenting it).
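As a rough illustration of "an overall locale that can be easily parsed out, plus per-category granularity", here is a Python sketch of one possible header shape. The syntax (an overall locale first, then semicolon-separated category=locale pairs), the category names, and the function names are all hypothetical; nothing here is an existing header format.

```python
def parse_accept_locale(value):
    """Parse a hypothetical header value such as
    'de-DE; currency=en-US; measurement=en-GB'.

    Returns (overall_locale, {category: locale}).
    """
    parts = [part.strip() for part in value.split(";") if part.strip()]
    if not parts or "=" in parts[0]:
        raise ValueError("the header must start with an overall locale")
    overall = parts[0]
    categories = {}
    for part in parts[1:]:
        name, separator, locale = part.partition("=")
        if not separator or not name.strip() or not locale.strip():
            raise ValueError(f"malformed category entry: {part!r}")
        categories[name.strip().lower()] = locale.strip()
    return overall, categories


def locale_for(category, overall, categories):
    """Use the category-specific locale if one was sent, else the overall one."""
    return categories.get(category.lower(), overall)


overall, categories = parse_accept_locale("de-DE; currency=en-US; measurement=en-GB")
print(overall)
print(locale_for("currency", overall, categories))
print(locale_for("time", overall, categories))
```

A site that can't take on granular localization would read only the first segment and ignore the rest, which is the low-burden path described above.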

There’s also the matter of time zone / UTC offset, which can be exposed from the host OS (and should be, since that’s generally the only place the user will adjust it). It should be treated as separate from these other concerns, since when people travel, their time zone will change but their language generally won’t. This has been talked about here before, but discussion of a request-level header exposed to the server didn’t really continue past the initial stages, despite it being requested and something implementations could distinctly benefit from (this StackOverflow answer, asking how to accomplish such a thing, has over 360 upvotes at time of writing).
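For the server side of this, a minimal Python sketch of consuming a UTC offset from a request header might look like the following. It assumes the header carries a plain "+HH:MM" / "-HH:MM" offset; that value format and the function names are assumptions for illustration, not part of any existing header.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed value format: a signed offset such as "+02:00" or "-05:00".
_OFFSET = re.compile(r"^([+-])(\d{2}):(\d{2})$")


def parse_user_timezone(value):
    """Turn a hypothetical offset header value into a fixed-offset tzinfo."""
    match = _OFFSET.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized offset: {value!r}")
    sign, hours, minutes = match.groups()
    hours, minutes = int(hours), int(minutes)
    if hours > 14 or minutes > 59:
        raise ValueError(f"offset out of range: {value!r}")
    delta = timedelta(hours=hours, minutes=minutes)
    return timezone(-delta if sign == "-" else delta)


def to_user_local(moment_utc, header_value):
    """Convert an aware UTC datetime to the offset the request announced."""
    return moment_utc.astimezone(parse_user_timezone(header_value))


noon_utc = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(to_user_local(noon_utc, "-05:00").isoformat())
```

A bare offset says nothing about daylight saving rules, so it is only good for "what time is it for this user right now"; scheduling future events would need a named zone instead, which is one of the things worth bikeshedding.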

Of course, all this comes together to be a vector for fingerprinting (e.g. tracking the one anonymous user on your site who speaks German on the East Coast of Mexico), but when the experience this can enable is so useful, it’s not worth holding back over that concern, especially when there are other ways to counter fingerprinting.

TLDR: What I’m proposing:

  • Browsers expose the localization settings of the user via headers, including Accept-Language as they currently do, in addition to two new headers like Accept-Locale and User-Timezone.
  • Sites expose the localizations they provide as part of their returned HTML. Sites provide content based on either stored user preferences or the headers on the request; if the preferences on the request do not match the preferences for the site (and the site provides the preferences offered in the request), the site should ask the user if they wish to change their user-localization preferences to match the request. (Sites may also choose to only use user preferences as a fallback, or allow users to override their global browser-specified preferences.)
  • Browsers store the user’s localization preferences (where the defaults are determined from a mechanism like a setting provided by the host OS environment), as they currently do (partially) for mechanisms like Translation popups.
  • When the page’s stated/detected language does not match the user’s localization settings, browsers present a popup suggesting translation (including localization), as Chrome does now. If they offer automated translation services (as Chrome does), they may present their automated options, with some kind of signifier to distinguish which translations would be provided by the site.
  • To accommodate sites which unwittingly provide sub-par localizations, browsers may also surface mechanisms to request an alternative localization for the current site, and optionally translate from that alternative “source” language using their automated translation facilities.
  • When the page’s stated/detected language does match the user’s localization settings (which may not be accurate, for whatever reason), but sites expose the localizations they provide, browsers expose an icon for their Translation interface (as Chrome does now for pages where the Translation dialog is presented and dismissed), which opens the Translation / Localization dialogue described in the last item.
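The site-side behavior in the second item above can be sketched roughly as follows, in Python. The Accept-Language parsing follows the usual comma-separated, q-weighted form but is simplified (it ignores the "*" wildcard), and the matching rule, the function names, and the (serve, prompt) return shape are assumptions for illustration rather than a proposed algorithm.

```python
def parse_accept_language(value):
    """Return language tags ordered by descending q-value (simplified)."""
    ranked = []
    for index, item in enumerate(value.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip()
        if not tag:
            continue
        quality = 1.0
        params = params.strip()
        if params.lower().startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            # The index keeps the original order among equal q-values.
            ranked.append((-quality, index, tag))
    return [tag for _, _, tag in sorted(ranked)]


def match_localization(preferred, available):
    """Pick the first preferred tag the site offers, by exact or primary-subtag match."""
    by_lower = {tag.lower(): tag for tag in available}
    for tag in preferred:
        lowered = tag.lower()
        if lowered in by_lower:
            return by_lower[lowered]
        primary = lowered.split("-")[0]
        for candidate_lower, candidate in by_lower.items():
            if candidate_lower.split("-")[0] == primary:
                return candidate
    return None


def negotiate(stored, header, available, default):
    """Return (localization_to_serve, should_prompt_user).

    The stored per-site preference wins; the user is prompted only when the
    request asks for a different localization that the site actually provides.
    """
    requested = match_localization(parse_accept_language(header), available)
    serve = stored or requested or default
    prompt = bool(stored and requested and stored != requested)
    return serve, prompt


site_offers = ["en-US", "de-DE"]
print(negotiate(None, "de-DE, en;q=0.8", site_offers, "en-US"))
print(negotiate("en-US", "de-DE, en;q=0.8", site_offers, "en-US"))
```

Here the stored preference takes priority over the request; a site that prefers to treat stored preferences only as a fallback, as the parenthetical allows, would flip the order in negotiate.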

What I’m looking for feedback on here is:

  • What do browser vendors think of exposing better i18n/l10n preferences to the user as part of the browser interface?
  • What would be a good structure for HTTP and/or HTML extensions to solve the most pressing use cases for this spec? (Mostly in terms of structural / syntax bikeshedding.)