[Proposal] First-Party Sets

mikewest · 2019-01-30

(The following is a quick summary of the explainer at https://github.com/mikewest/first-party-sets. If you’re interested in more details, I’d suggest skimming that document.)

A Problem

One pattern that most browsers have agreed upon is a categorization of requests and documents into “first-party” and “third-party” buckets, giving users the option to regulate cross-context access to persistent state.

A number of browser features depend upon this distinction to some extent. Cookie controls are the most prominent example, followed by narrower features like credential sharing schemes (Shared Web Credentials and Smart Lock for Passwords, for example), process selection, etc. Note that the latter two examples rely on proprietary heuristics in some cases.

The first-/third-party distinction breaks down to an extent in practice, as a single entity will often host its assets and services across domains that aren’t known a priori to be related. Consider https://apple.com/ and https://icloud.com/ , https://google.com/ and https://youtube.com/ , or https://amazon.com/ and https://amazon.de/ . These origins all represent distinct registrable domains, and are generally considered “third-party” to each other, though they’re controlled by the same entity, and explicitly share state information with each other in order to support features like single sign-on.

A Proposal

One way of approaching this problem would be to run with an approach similar to what some native platforms have already shipped: JSON files hosted at well-known locations on various origins that wish to assert their shared first-partyness. We could allow https://a.example/ , https://b.example/ , and https://c.example/ to declare themselves as a first-party set by hosting a JSON file at /.well-known/first-party-set containing a first-party-set member which holds the set of origins being asserted:

{
  ...,
  "first-party-set": {
    "origins": [ "https://a.example/", "https://b.example/", "https://c.example/" ]
  },
  ...
}

https://a.example/ informs the browser about this file by delivering an X-Bikeshed-This-Origin-Asserts-A-First-Party-Set: ?T header. Upon receipt, the browser pulls down this file, as well as the files for any other origin asserted to be in the set. If all the origins assert the same set, huzzah! If not, it’s not a set.
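The "everyone must assert the same set" check can be sketched roughly as follows. This is only an illustration under assumptions: `fetch_manifest` is a hypothetical stand-in for fetching and parsing an origin's /.well-known/first-party-set file, and trailing slashes are stripped so that origins compare equal. Nothing here is meant as the definitive algorithm.

```python
from typing import Callable, FrozenSet, Optional

Manifest = dict


def asserted_origins(manifest: Manifest) -> FrozenSet[str]:
    """Return the origins listed under the "first-party-set" member."""
    member = manifest.get("first-party-set", {})
    return frozenset(origin.rstrip("/") for origin in member.get("origins", []))


def resolve_first_party_set(
    origin: str,
    fetch_manifest: Callable[[str], Optional[Manifest]],  # hypothetical fetcher
) -> Optional[FrozenSet[str]]:
    """Return the set only if every listed origin asserts exactly the same set."""
    start = origin.rstrip("/")
    manifest = fetch_manifest(start)
    if manifest is None:
        return None
    claimed = asserted_origins(manifest)
    if start not in claimed:
        return None  # an origin cannot assert a set it is not part of
    for member in claimed:
        other = fetch_manifest(member)
        if other is None or asserted_origins(other) != claimed:
            return None  # one disagreement and it's not a set
    return claimed


# In-memory stand-ins for the files each origin would host.
SET = {"first-party-set": {"origins": [
    "https://a.example/", "https://b.example/", "https://c.example/"]}}
agreeing = {
    "https://a.example": SET,
    "https://b.example": SET,
    "https://c.example": SET,
}
disagreeing = dict(agreeing)
disagreeing["https://c.example"] = {
    "first-party-set": {"origins": ["https://c.example/"]}}
```

Calling `resolve_first_party_set("https://a.example/", agreeing.get)` yields the three origins, while the same call with `disagreeing.get` yields `None`, since https://c.example/ does not assert the same set.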

What does being a first-party set enable?

I have three things in mind:

The “block third-party cookies and site data” behavior in browsers could respect this notion of first-partyness to avoid breaking same-entity interactions. Likewise, browsers can enhance their cookie control mechanisms with this additional metadata. For example, “Forget this site” can shift towards “Forget this entity”, wiping data for an entire set of first-parties at once.
Browsers’ credential sharing behavior for sites which are affiliated could substitute this proposal for the vendor-specific solutions which exist today.
Browsers may use first-party sets as one additional input into heuristics around their process models while they ramp up to strict origin isolation.
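To make the cookie-control item concrete, here is a rough sketch of how a browser might consult accepted sets when deciding whether a request is third-party, and which sites "Forget this entity" would cover. The function names are hypothetical, and sites are assumed to already be reduced to comparable strings (no Public Suffix List handling is shown).

```python
from typing import FrozenSet, Iterable


def set_for(site: str, sets: Iterable[FrozenSet[str]]) -> FrozenSet[str]:
    """Return the first-party set containing `site`, or a singleton set."""
    for candidate in sets:
        if site in candidate:
            return candidate
    return frozenset({site})


def is_third_party(
    request_site: str, top_level_site: str, sets: Iterable[FrozenSet[str]]
) -> bool:
    """A request is third-party unless it falls inside the top-level site's set."""
    return request_site not in set_for(top_level_site, sets)


def sites_to_forget(site: str, sets: Iterable[FrozenSet[str]]) -> FrozenSet[str]:
    """"Forget this entity": every site whose data is wiped along with `site`."""
    return set_for(site, sets)


known_sets = [frozenset({"a.example", "b.example", "c.example"})]
```

With `known_sets` as above, a request from b.example under a.example is treated as first-party, a request from an unlisted site is still third-party, and forgetting b.example wipes all three members.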

WDYT?

krgovind · 2020-05-28

Bumping this thread back up because the proposal has undergone significant revision and has recently been the subject of discussion in various forums, including PrivacyCG and WebAdvBG.

Our primary motivation for this proposal remains to define a privacy boundary that allows browsers to eliminate cross-site tracking that currently relies on mechanisms such as third-party cookies and fingerprinting. Tracking policies and privacy models from various browser vendors - Chromium, Edge, Mozilla, WebKit - scope access to user identity to some notion of first-party, which we refer to as a privacy boundary.

Although the top-level document’s registrable domain can act as a natural privacy boundary, it is clear that multi-domain sites are a reality, which compels us to define a better alternative. For example, Firefox ships an entity list to group together domains belonging to the same organization.

Organizations generally prefer maintaining distinct domain names to manage branding, or to allow for future business sales/acquisitions. Additionally, choosing the registrable domain as the privacy boundary may compel organizations to move all their web properties to a single parent domain. The parent domain that a property is hosted on may then change with business ownership, and users would be trained to make security decisions based on the subdomain component of URLs, which could make them more susceptible to phishing attacks.

First-Party Sets serves as a web platform mechanism that allows site operators to assert a list of domains as being associated with the same entity. This then allows us to define a top-level document’s First-Party Set as the privacy boundary. Browsers may choose to not impose cross-domain communication restrictions across members of a given First-Party Set (such as is done in practice with disconnect.me’s extension, Firefox ETP’s use of the entity list, and Edge Tracking Protection’s similar exception for same-party domains). However, it is important to apply a set of countervailing pressures:

Preventing abuse by unrelated websites forming a First-Party Set - This is achieved by requiring every organization to submit their list for acceptance based on conformance with a published UA policy.
Making site associations visible to the user - This is achieved by making First-Party Sets discoverable via various browser UI surfaces.
Discouraging formation of arbitrarily large sets by imposing storage and entropy limits - Browser storage limits and entropy limits, such as the proposed Privacy Budget, that are currently applied per-domain are instead applied per First-Party Set.
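As an illustration of the third pressure, here is a minimal sketch of a storage limit that is accounted per set rather than per domain. The class name, the `set_of` mapping from domain to a set identifier, and the byte limit are all assumptions made for the example, not part of the proposal.

```python
from typing import Dict


class SetScopedQuota:
    """Tracks storage usage per First-Party Set instead of per domain."""

    def __init__(self, limit_bytes: int, set_of: Dict[str, str]) -> None:
        self.limit_bytes = limit_bytes
        self.set_of = set_of  # hypothetical mapping: member domain -> set id
        self.used: Dict[str, int] = {}

    def _key(self, domain: str) -> str:
        # Domains outside any set are accounted on their own.
        return self.set_of.get(domain, domain)

    def try_store(self, domain: str, nbytes: int) -> bool:
        """Record the write and return True only if the set stays within its limit."""
        key = self._key(domain)
        used = self.used.get(key, 0)
        if used + nbytes > self.limit_bytes:
            return False
        self.used[key] = used + nbytes
        return True


quota = SetScopedQuota(
    limit_bytes=100,
    set_of={"a.example": "a.example", "b.example": "a.example"},
)
```

Because a.example and b.example draw from one shared allowance here, adding more domains to a set does not buy more storage, which is the disincentive described above.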

The primary goal of this proposal is to standardize First-Party Sets as the web’s privacy boundary.

In order to ensure privacy, security, and platform predictability, it may also be valuable to standardize the framework for:

Assertion - The notion of letting sites opt in to First-Party Sets by designating an “owner domain” and defining “member domains”, as well as the format for the .well-known manifest file.
Acceptance - Agreement on what constitutes the “UA policy” and verification process. The proposal currently allows maximum flexibility for individual UAs to define their own policy and process, but it may be desirable to achieve rough consensus.
Delivery - The current proposal allows for both (a) static lists shipped with the browser; or (b) signed assertions served by the site, to help overcome scalability issues with a static list.

We would also like to invite the community’s input on our alternative design of defining sets using origins instead of registrable domains, which can help mitigate some of the problems with web platform features’ reliance on the Public Suffix List.

pbannist · 2020-05-28

While I believe that the First-Party sets proposal is valuable and serves a material purpose, it contains significant bias that entrenches major organizations at the expense of smaller ones. And it does this in a way that:

Significantly relies on an opaque and arbitrary construct (the “organization”) to create the appearance (but not reality) of more privacy for users, via pre-existing UA policies.
Implies that small organizations are more likely to take advantage of users and their privacy.

As I pointed out in my issue on the github proposal (https://github.com/krgovind/first-party-sets/issues/14), large organizations can own many disparate entities that are not obvious to the user. From a user’s perspective, there is no obvious way for them to discern that “geico.com” and “dairyqueen.com” are owned by the same organization (Berkshire Hathaway, in this case), and have grouped together to provide web services. Just as there is no way for them to discern that two small companies (foo.com and bar.com) have partnered together to create a single First-Party set across their domains to provide similar services.

The addition of browser UI surfaces that allow the user to discover that a given domain is part of a First-Party set is a great way to create transparency for the user. Additionally, Privacy Budget and browser storage limits prevent their generalized misuse. However, these elements do not address the underlying point that the proposal is biased against small organizations. Another issue on the github proposal (https://github.com/krgovind/first-party-sets/issues/13) speaks to how First-Party sets (as authored) will drive the consolidation of ownership of websites and remove diverse and unique voices from the open web.

My primary issue with this proposal is the reliance on browser UA policies, which all rely on “organizations” to be the deciding factor about whether a given First-Party Set is valid or not. This issue appears in the often-repeated phrase “unrelated websites”. The definition of what is related or not related is completely arbitrary to the user. “Unrelated websites” is a meaningless term, as there is clearly no relation between geico.com and dairyqueen.com - but that is a completely valid First-Party set. The implication that a large organization can be “trusted” to create a First-Party set, while many small organizations cannot group together, exposes the underlying bias in these UA policies.

I believe that browsers should modify their UA policies to allow for small organizations to group together and form First-Party sets (or similar entities) as long as they adhere to all other relevant policies. Alternatively, the authors should remove the requirement of using the browser UA policy from this proposal.

Without addressing this major concern, I do not believe that this proposal represents the needs of the many stakeholders of a thriving web, and there are ways to solve for this issue that do not negatively impact user privacy concerns in any way.

mallory · 2020-05-28

I agree with @pbannist about the problem: this favors big entities over small ones. However, I come to a different conclusion, in that I think a privacy-concerned user would want dairyqueen.com and geico.com to be considered separate entities.

In short: I think the solution is to be less permissive, not more, to comport with user expectations of what “first party” means.

-Mallory

lassey · 2020-05-29

In https://lists.webkit.org/pipermail/webkit-dev/2020-May/031222.html @othermaciej noted that, while there are still issues to work out, he would support moving this to a CG such as WICG to incubate.

othermaciej · 2020-06-01

This is an interesting technology, though in my opinion incomplete as-is because it has not solved the hard problems.

I feel like this is a better fit for Privacy CG than WICG, since the right privacy experts from browsers, search engines, and the adtech ecosystem are there.

jwrosewell · 2020-06-02

The discussion in the W3C Improving Web Advertising Business Group this week, specifically in relation to First Party sets, once again raises the issue of governance of the various proposals that have been put forth. Like Turtledove/Sparrow, the proposals around first party sets imply (in fact, they require) a governance structure. Specifically, the group discussed that in some cases independent domains should be allowed to federate browser data, while in other cases this would not be allowed. This means a decisioning structure needs to be put in place to provide basic rules for what federation(s) would be allowed, and to potentially adjudicate requests and violations.

This same requirement is central to the debate over Turtledove and Sparrow, where the main discussion is around what entities have access to end user content consumption data and are responsible for creating the cohorts and populating the reporting structures.

In both cases, it seems implied that the only “governance” is the browsers themselves, and that this governance will be opaque (not necessarily published, without clearly visible procedures).

This proposal needs an explicit understanding of what governance structures are being proposed. There need to be success criteria for the application of these policies. These criteria should benefit all stakeholders including browser vendors who would avoid any appearance of collusion that could otherwise be viewed as stifling competition. The W3C Improving Web Advertising Business Group has developed a draft of such success criteria.

There are important questions to address in finalizing these success criteria for evaluating first-party sets and other similar proposals aimed at improving web advertising. The non-exhaustive list below highlights some of the issues that deserve greater attention:

What safeguards are in place to ensure that browser decisions are not unilateral and are consistent with the agreed norms for content owners, marketers who fund them and other stakeholders?
Should for-profit companies govern certain content consumption data collection and processing activities? If so, what is the minimum number required for a competitive open market, and should there be limits on the number of these governing authorities?
How should cross-publisher data sharing permissions be granted, administered and audited?
Which risks to people will these changes reduce or eliminate?
Should people be given the right to overrule default settings to further restrict or more broadly allow the collection and processing of their content consumption data?

melanierichards · 2020-06-02

Speaking on behalf of the Microsoft Edge team, we believe that First-Party Sets could be useful in helping unblock valid intra-organizational use cases while maintaining the right privacy promises. We’re supportive of exploring this idea further. Agreed that as a community we’ll need to continue workshopping mitigations against abuse while striking the right balance between organizational cohesion vs. sets that can be reasoned about by most users. We’re hopeful that we can collectively come up with solutions to these considerations, and are interested in continued discussion on First-Party Sets!

cwilso · 2020-06-03

Great. We transferred this to WICG for the time being; krgovind was going to file a Privacy CG proposal issue to enable discussion there, and if that CG wants to pick it up as a work item we can transfer it over there.

pbannist · 2020-06-05

I’m glad this was moved to the WICG and I support that it stay there. FPS has many facets to it, some of which are related to privacy, but others are related to other considerations. Being discussed in the Privacy Group would make privacy the overriding priority, rather than a broader set of considerations (inclusive of privacy) discussed around the proposal and its potential implementation.