Overview
I’m working to design browser infrastructure that makes it possible for an advertiser to show ads to a selected group of people (a) without the advertiser learning what sites any individual person visits, and (b) without site owners learning what advertisers are using this technique to show ads to any particular person on their site. This capability would bridge a gap between many sites’ source of revenue and the web’s privacy needs in a post-3p-cookies world.
I sketched one solution in the TURTLEDOVE explainer earlier this year. We’ve discussed this proposal in the W3C’s Improving Web Advertising Business Group, where it has elicited an array of suggestions, extensions, and proposed alternate approaches. We’d like to bring it to WICG as a venue to include a broader range of participants and reach consensus on a range of fundamental design choices.
Problem
A large fraction of the web is supported by revenue from advertising. Some online advertising is based on showing someone an ad while they are visiting one site, but based on information about the person’s interests gathered while they were visiting other sites. Historically this has worked by the advertiser or their agent recognizing a specific person as they browse across web sites, a core privacy concern with today’s web.
The privacy concern has led to browsers dropping or planning to drop support for 3rd-party cookies and attempting to block other means of cross-site tracking. This substantially hurts websites’ revenue (per a Google study and other academic and industry analyses) and decreases people’s satisfaction with the ads they see (per the Google study’s “Additional Reflections” section).
By offering a new way to compose web pages that prevents information sharing, we should be able to offer the privacy, economic, and user experience wins all at the same time.
Note that ads may also be selected based on the content of the surrounding page and site, or on prior “first-party” activity of the person while visiting the same site where they see the ad. This incubation is not concerned with those types of ad selection.
Solution space
At a high level, the envisioned novel flow for showing an ad to a group of people involves a series of steps: creating an audience, bidding to show that audience some ad, running an auction among ads (including ones from other sources), rendering the winning ad, and after-the-fact reporting and accounting.
To show the ad without leaking information, each of these steps will require some new browser mechanism with privacy guarantees. There are substantial open questions about possible design choices for each step.
Creating an Audience
An “audience” or “interest group” is the group of people to whom an advertiser will want to show some type of ad, and in this proposal the advertiser (or their agent) still makes the decisions on putting people into groups. In this proposal, though, the browser is responsible for keeping track of a list of which interest groups it has joined, rather than the advertiser maintaining a list of the people in each group. To prevent leaking information, there is no web-visible way to query a browser’s group memberships or to join memberships with other user information — the only thing you can do with a group is target ads at it.
Major open questions include:
- What techniques can be used to add people to interest groups. Since the results are not web-visible, it should be feasible to allow some cross-site influence here without enabling cross-site tracking.
- In TURTLEDOVE I started with a simple API with no special cross-site capability.
- There is an issue discussing many viable ways to expand this.
- There is a specific proposal by Ben Savage (Facebook).
- There is a specific proposal by Basile Leparmentier and colleagues (Criteo).
- What minimum size an interest group ought to have. We might want to require a minimum size threshold to help address two different threats (issue): a server learning too much information about a targeted individual, and a person seeing an ad that knows too much about them.
Targeting and Bidding
When a person in some interest group visits some web page, we want to offer a way for an ad targeting that interest group to participate in the ensuing ad selection process. This process can include applying filtering rules set by the web site owner (who cares what ads appear on their pages) and by the advertiser (who cares what pages their ads appear on). Then each candidate ad needs to produce a bid, which will compete against other ads’ bids for the same opportunity to appear.
Both filtering and bidding are inherently difficult because they require combining information about the interest group with information about the web page it will appear on — two pieces of information which we need to keep any party from combining, to meet our privacy goals.
The key open question here is how to safely sandbox this targeting and bidding process so that no information leaks out of it. We’d like to reach consensus on one of several possible approaches:
- TURTLEDOVE proposed fetching the interest-group-targeted ad in advance of the actual visit to the page, and bidding via pre-fetched in-browser javascript which receives secure signals from its server but is not allowed to store any data or talk to the network. This avoids leaking the sensitive combination of information to any server, but we’ve heard that the pre-fetched in-browser bidding presents many challenges (e.g. ML model size, real-time adjustment of bidding, control of proprietary logic).
- Criteo’s SPARROW proposal approaches the problem by introducing an independently-run server that is trusted by browsers to process but not log or leak any privacy-sensitive information, and trusted by ad tech companies to execute their proprietary filtering and bidding logic. This would solve many of the challenges of in-browser bidding. We would need to explore whether or not it is possible to meet the trust requirements and provide sufficient privacy guarantees.
- Perhaps there is an intermediate approach that allows the advantages of servers but doesn’t require as much trust in a single entity. The Secure Multi-Party Computation infrastructure proposed for the Aggregation Service may provide inspiration here.
Running an Ad Auction
The decision of which ad should appear in some location on some web page generally involves one or more ad networks running an auction or a sequence of auctions. We need to figure out how to let them do so in a way that can involve consideration of both ads targeted through the new interest-group mechanism being developed here and ads coming from other sources.
The key question remains where the auctions take place. Auctions could run in the same place as the bidding described above — with the ad network’s business logic running either in the browser or on a mutually-trusted server. Alternatively, auctions could run across platforms, for example with the winning bid from a browser-side auction feeding into a subsequent server-side auction. (This would surely require some additional privacy gymnastics).
The output of an auction includes both the winning ad and its price. These must both be handled carefully since they could be vectors for leaking interest group membership information. This probably involves some opaque object — either something like an opaque fetch response or a novel on-device opaque computation result — which can enter later auctions and can be used for rendering and reporting (see below), but cannot be inspected directly.
Rendering an Ad
Once the winning ad has been selected, it must be rendered in the browser. Per our privacy goals, we don’t want the surrounding web site to be able to learn what interest groups its visitors are in, even when one of those interest groups leads to the ad on the page.
This calls for a new type of web page composition, in which content from different sources can appear on the same page but be unable to leak information to each other even if both sources would be willing to do so. Both direct communication (like postMessage) and side-channels (like correlated network activity) are privacy risks.
This new composition model is a separable problem, with multiple plausible solutions for our needs and with potential applications other than ads. We will publish an independent explainer on this topic.
Reporting and Accounting
In addition to a winning ad appearing on the screen, the other outcome of this process is a bunch of logging. This is crucial for at least three reasons: (1) winning bids cause money to change hands, (2) the results of auctions are part of a feedback loop that can affect the bidding in future auctions, and (3) logs are part of after-the-fact compliance auditing and fraud prevention.
In TURTLEDOVE I assumed availability of an Aggregated Reporting API; the current design state is in the Multi-Browser Aggregation Service Explainer (in WICG now). The on-device bidding code, the auction code, and the rendered ad could all use this API to get aggregate statistics on outcomes.
If bidding and auctions run instead on a trusted server, then the details of how that server shares outcome information with the auction participants are important: event-level information would still compromise our privacy goals if it becomes possible for ad networks to associate specific web page visits with specific interest groups after the fact (issue). So some appropriately private aggregated reporting is still needed, even if a trusted server makes its implementation easier.
Key open questions here include:
- What design, implementation, and testing decisions lead to a system robust enough that the various parties in the ads ecosystem trust it to handle data that directly affects revenue.
- How close to real time we can provide privacy-preserving insight into time-sensitive aggregate questions like win rate and remaining budget.
- What provisions we can make to support auditing use cases, which today might include tracking down discrepancies in individual events.
Whew
Thank you for reading, and sorry that even this high-level overview is so long.
We would welcome opinions on how to move this work forward. At this point, the work spans documents in two personal Github repos and several WICG and Web Advertising BG repos. I think the best path would be to move TURTLEDOVE into WICG, encourage the SPARROW authors to move it to WICG also, and work together to combine the two into a consensus design. But I’m open to other modes of work as well.