Advertising to Interest Groups without tracking

michaelkleber · 2020-06-22

Overview

I’m working to design browser infrastructure that makes it possible for an advertiser to show ads to a selected group of people (a) without the advertiser learning what sites any individual person visits, and (b) without site owners learning what advertisers are using this technique to show ads to any particular person on their site. This capability would bridge a gap between many sites’ source of revenue and the web’s privacy needs in a post-3p-cookies world.

I sketched one solution in the TURTLEDOVE explainer earlier this year. We’ve discussed this proposal in the W3C’s Improving Web Advertising Business Group, where it has elicited an array of suggestions, extensions, and proposed alternate approaches. We’d like to bring it to WICG as a venue to include a broader range of participants and reach consensus on a range of fundamental design choices.

Problem

A large fraction of the web is supported by revenue from advertising. Some online advertising shows someone an ad while they are visiting one site, chosen using information about the person’s interests gathered while they were visiting other sites. Historically this has worked by the advertiser or their agent recognizing a specific person as they browse across web sites, a core privacy concern with today’s web.

The privacy concern has led to browsers dropping or planning to drop support for 3rd-party cookies and attempting to block other means of cross-site tracking. This substantially hurts websites’ revenue (per a Google study and other academic and industry analyses) and decreases people’s satisfaction with the ads they see (per the Google study’s “Additional Reflections” section).

By offering a new way to compose web pages that prevents information sharing, we should be able to offer the privacy, economic, and user experience wins all at the same time.

Note that ads may also be selected based on the content of the surrounding page and site, or on prior “first-party” activity of the person while visiting the same site where they see the ad. This incubation is not concerned with those types of ad selection.

Solution space

At a high level, the envisioned novel flow for showing an ad to a group of people involves a series of steps: creating an audience, bidding to show that audience some ad, running an auction among ads (including ones from other sources), rendering the winning ad, and after-the-fact reporting and accounting.

To show the ad without leaking information, each of these steps will require some new browser mechanism with privacy guarantees. There are substantial open questions about possible design choices for each step.

Creating an Audience

An “audience” or “interest group” is the group of people to whom an advertiser will want to show some type of ad. In this proposal the advertiser (or their agent) still decides which people to put into which groups, but the browser is responsible for keeping track of the list of interest groups it has joined, rather than the advertiser maintaining a list of the people in each group. To prevent leaking information, there is no web-visible way to query a browser’s group memberships or to join memberships with other user information: the only thing you can do with a group is target ads at it.

Major open questions include:

- What techniques can be used to add people to interest groups. Since the results are not web-visible, it should be feasible to allow some cross-site influence here without enabling cross-site tracking.
  - In TURTLEDOVE I started with a simple API with no special cross-site capability.
  - There is an issue discussing many viable ways to expand this.
  - There is a specific proposal by Ben Savage (Facebook).
  - There is a specific proposal by Basile Leparmentier and colleagues (Criteo).
- What minimum size an interest group ought to have. We might want to require a minimum size threshold to help address two different threats (issue): a server learning too much information about a targeted individual, and a person seeing an ad that knows too much about them.
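
To make the “browser keeps the list” idea concrete, here is a minimal sketch in Python rather than in any real browser API. The class name `BrowserInterestGroups`, its methods, the `group_sizes` lookup, and the threshold value are all assumptions made for illustration. None of them come from the TURTLEDOVE explainer, and how a browser would learn a group’s size is itself part of the open question above.

```python
from typing import Dict, List, Set, Tuple

GroupKey = Tuple[str, str]  # (owner, group name); a hypothetical identifier shape


class BrowserInterestGroups:
    """Hypothetical browser-side membership store (illustration only)."""

    def __init__(self) -> None:
        self._joined: Set[GroupKey] = set()

    def join(self, owner: str, name: str) -> None:
        # Page-callable, but write-only: nothing about existing
        # memberships is returned to the caller.
        self._joined.add((owner, name))

    def leave(self, owner: str, name: str) -> None:
        self._joined.discard((owner, name))

    def eligible_for_targeting(
        self, group_sizes: Dict[GroupKey, int], min_size: int
    ) -> List[GroupKey]:
        # Browser-internal only. Groups below the assumed size threshold
        # are skipped, so a tiny group cannot single someone out.
        return sorted(
            g for g in self._joined if group_sizes.get(g, 0) >= min_size
        )
```

The point of the shape is that `join` and `leave` return nothing, while the only reader of the membership set is a method that page script would never be able to call.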

Targeting and Bidding

When a person in some interest group visits some web page, we want to offer a way for an ad targeting that interest group to participate in the ensuing ad selection process. This process can include applying filtering rules set by the web site owner (who cares what ads appear on their pages) and by the advertiser (who cares what pages their ads appear on). Then each candidate ad needs to produce a bid, which will compete against other ads’ bids for the same opportunity to appear.

Both filtering and bidding are inherently difficult because they require combining information about the interest group with information about the web page the ad will appear on. These are exactly the two pieces of information that our privacy goals require us to keep any party from combining.

The key open question here is how to safely sandbox this targeting and bidding process so that no information leaks out of it. We’d like to reach consensus on one of several possible approaches:

- TURTLEDOVE proposed fetching the interest-group-targeted ad in advance of the actual visit to the page, and bidding via pre-fetched in-browser javascript which receives secure signals from its server but is not allowed to store any data or talk to the network. This avoids leaking the sensitive combination of information to any server, but we’ve heard that the pre-fetched in-browser bidding presents many challenges (e.g. ML model size, real-time adjustment of bidding, control of proprietary logic).
- Criteo’s SPARROW proposal approaches the problem by introducing an independently-run server that is trusted by browsers to process but not log or leak any privacy-sensitive information, and trusted by ad tech companies to execute their proprietary filtering and bidding logic. This would solve many of the challenges of in-browser bidding. We would need to explore whether or not it is possible to meet the trust requirements and provide sufficient privacy guarantees.
- Perhaps there is an intermediate approach that allows the advantages of servers but doesn’t require as much trust in a single entity. The Secure Multi-Party Computation infrastructure proposed for the Aggregation Service may provide inspiration here.
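
Whichever place the logic runs, the filtering-then-bidding step can be pictured as a pure function of the two kinds of signals. The following is only a sketch in Python under assumed names: `generate_bid`, the `base_bid` and `boost:` keys, and the two block lists are hypothetical and not taken from TURTLEDOVE or SPARROW.

```python
from typing import Mapping, Optional, Sequence


def generate_bid(
    group_signals: Mapping[str, float],
    page_topics: Sequence[str],
    advertiser_blocked_topics: Sequence[str],
    publisher_blocked_advertisers: Sequence[str],
    advertiser: str,
) -> Optional[float]:
    """Return a bid, or None if either side's filtering rules exclude the ad.

    A pure function: it reads only its arguments and touches neither the
    network nor any storage, which is the property a sandbox would enforce.
    """
    if advertiser in publisher_blocked_advertisers:
        return None  # the site owner does not want this advertiser
    if any(topic in advertiser_blocked_topics for topic in page_topics):
        return None  # the advertiser does not want this kind of page
    base = group_signals.get("base_bid", 0.0)
    # Assumed per-topic multipliers, fetched ahead of time along with the ad.
    boost = max(
        (group_signals.get("boost:" + t, 1.0) for t in page_topics),
        default=1.0,
    )
    return base * boost
```

For example, with `{"base_bid": 2.0, "boost:running": 1.5}` and a page about `running`, this sketch bids `3.0`. The hard part, which the code does not show, is guaranteeing that the inputs and the result cannot leave the sandbox.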

Running an Ad Auction

The decision of which ad should appear in some location on some web page generally involves one or more ad networks running an auction or a sequence of auctions. We need to figure out how to let them do so in a way that can involve consideration of both ads targeted through the new interest-group mechanism being developed here and ads coming from other sources.

The key question remains where the auctions take place. Auctions could run in the same place as the bidding described above, with the ad network’s business logic running either in the browser or on a mutually-trusted server. Alternatively, auctions could run across platforms, for example with the winning bid from a browser-side auction feeding into a subsequent server-side auction. (This would surely require some additional privacy gymnastics.)

The output of an auction includes both the winning ad and its price. These must both be handled carefully since they could be vectors for leaking interest group membership information. This probably involves some opaque object — either something like an opaque fetch response or a novel on-device opaque computation result — which can enter later auctions and can be used for rendering and reporting (see below), but cannot be inspected directly.

Rendering an Ad

Once the winning ad has been selected, it must be rendered in the browser. Per our privacy goals, we don’t want the surrounding web site to be able to learn what interest groups its visitors are in, even when one of those interest groups leads to the ad on the page.

This calls for a new type of web page composition, in which content from different sources can appear on the same page but be unable to leak information to each other even if both sources would be willing to do so. Both direct communication (like postMessage) and side-channels (like correlated network activity) are privacy risks.

This new composition model is a separable problem, with multiple plausible solutions for our needs and with potential applications other than ads. We will publish an independent explainer on this topic.

Reporting and Accounting

In addition to a winning ad appearing on the screen, the other outcome of this process is a bunch of logging. This is crucial for at least three reasons: (1) winning bids cause money to change hands, (2) the results of auctions are part of a feedback loop that can affect the bidding in future auctions, and (3) logs are part of after-the-fact compliance auditing and fraud prevention.

In TURTLEDOVE I assumed availability of an Aggregated Reporting API; the current design state is in the Multi-Browser Aggregation Service Explainer (in WICG now). The on-device bidding code, the auction code, and the rendered ad could all use this API to get aggregate statistics on outcomes.

If bidding and auctions run instead on a trusted server, then the details of how that server shares outcome information with the auction participants are important: event-level information would still compromise our privacy goals if it becomes possible for ad networks to associate specific web page visits with specific interest groups after the fact (issue). So some appropriately private aggregated reporting is still needed, even if a trusted server makes its implementation easier.
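
A small Python sketch of the difference between event-level and aggregated output follows. It is not the Aggregation Service design: the function name `aggregate_report`, the `(campaign, page)` event shape, and the use of a simple minimum-count cutoff in place of a real privacy mechanism are all assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def aggregate_report(
    events: Iterable[Tuple[str, str]], min_count: int
) -> Dict[str, int]:
    """Count wins per campaign, dropping the page and any small bucket.

    Each event is an assumed (campaign, page) pair that only the browser
    or trusted server would ever see in this joined form.
    """
    counts = Counter(campaign for campaign, _page in events)
    return {c: n for c, n in counts.items() if n >= min_count}
```

The ad network would receive only per-campaign totals, so it could not associate a specific page visit with a specific interest group after the fact.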

Key open questions here include:

- What design, implementation, and testing decisions lead to a system robust enough that the various parties in the ads ecosystem trust it to handle data that directly affects revenue.
- How close to real time we can provide privacy-preserving insight into time-sensitive aggregate questions like win rate and remaining budget.
- What provisions we can make to support auditing use cases, which today might include tracking down discrepancies in individual events.

Whew

Thank you for reading, and sorry that even this high-level overview is so long.

We would welcome opinions on how to move this work forward. At this point, the work spans documents in two personal GitHub repos and several WICG and Web Advertising BG repos. I think the best path would be to move TURTLEDOVE into WICG, encourage the SPARROW authors to move it to WICG also, and work together to combine the two into a consensus design. But I’m open to other modes of work as well.

Kris_Chapman · 2020-06-23

My $0.02 is that audience targeting is about more than just showing an ad to people who all share the same characteristic in the hope that they’ll be interested in the ad. It’s often used as a mechanism for user preferences, frequency capping across browsers/devices, managing messaging across marketing channels, etc. I’d argue that these types of activities are directly beneficial for users, not just advertisers & publishers, and that is where my concern about having limits for the number of users in a segment stems from. If a user indicates an advertising preference, I don’t think they should have to wait until X number of people share the same preference before their preference can go into effect.

That said, I do appreciate that the smaller the segment size, the bigger the data privacy concern is. That’s why I like the idea of a trusted third-party who doesn’t have a stake in the outcome. I think of this third-party as a sort of bank or credit card company: an entity set up to protect a consumer’s interests while also facilitating transactions.

jwrosewell · 2020-06-23

What are the criteria we are going to use to assess these proposals against one another and confirm that what we end up with is indeed “improved”?

To attempt to answer my own question: there is now a draft set of success criteria for improved web advertising, available from the W3C Improving Web Advertising Business Group.

https://github.com/w3c/web-advertising/blob/master/success-criteria.md

Understanding where the document falls short of various stakeholder groups’ interests would be very useful feedback. Perhaps @michaelkleber could comment.

Christopher_Cornett · 2020-06-23

100% agree. I am also concerned that the current designs are going to be very restrictive on web analytics leveraged to provide personalized experiences to users. I imagine everyone remembers the days when you called a bank for one product and then had to be transferred / re-authenticated to discuss another product. It seems like we are heading back to the digital equivalent of this disassociated experience.

jwrosewell · 2020-06-23

It does appear as if we’re taking a step backwards by removing trust choices for people. In no other industry do we require someone to understand the entire supply chain of a vendor. Imagine purchasing a car and having to receive a full list of all the suppliers to the automobile manufacturer. People trust the vendor AND their supply chain.

michaelkleber · 2020-06-23

Hi Kris: Note that the major open questions in the “Creating an Audience” section included both “What techniques can be used to add people to interest groups” and “What minimum size an interest group ought to have”.

I think these cover what you said you want to weigh in on, right? Of course the questions listed here aren’t the only things we need to resolve during incubation, but it’s a good sign that we’re on the right track.

michaelkleber · 2020-06-23

Hi James: I’m not sure what you mean by confirm that what we end up with is indeed “improved”. This is about adding a new capability (“Advertising to interest groups without tracking”), and the question here is whether we should incubate a design — whether it is a valuable thing which browsers should implement and the web ecosystem will use.

Regarding “taking a step backwards by removing trust choices for people” and “full list of all the suppliers”, I’m not sure how it relates to this proposal. I also dislike the UX of consent boxes with large lists of vendors, but I don’t think anything in this proposal is related to that problem.

Kris_Chapman · 2020-06-23

Yup, sorry Michael. I wasn’t saying it was missed. I was just weighing in on why I think it’s important.

jwrosewell · 2020-06-23

@michaelkleber The justification for this proposal is to address privacy concerns and increase people’s satisfaction with the ads they see. Correct?

If so, how do we know whether these outcomes are achieved by these proposals? Are there any drawbacks or problems generated by the proposal not directly related to the justifications? How do the proposals compare to one another when measured against these outcomes?

If we don’t know how we’ll address these questions, incubation is premature.

Does this make sense now?

My second comment related to a “step backwards” is a general observation about the current situation of consent, trust and verification.

michaelkleber · 2020-06-23

Hi James: The justification for this proposal is to (1) enable a bunch of ad ecosystem use cases (2) in a way that browsers will be happy to ship.

Many use cases described in the Web Adv BG’s “Advertising Use Cases” doc include notes on how they can be accomplished with TURTLEDOVE, SPARROW, or other variants in this idea space like Facebook’s PETREL.

Browsers have already published privacy and anti-tracking stances, which is why we think it’s valuable to design a way to meet those use cases that doesn’t enable tracking.

jwrosewell · 2020-06-23

@michaelkleber - I’m struggling to understand how the “way that browsers will be happy to ship” is a justification. It is a constraint.

As I understand this proposal it is part of an initiative called Privacy Sandbox. Correct?

The blog describing the objectives of Privacy Sandbox states “We are looking to build a more trustworthy and sustainable web together, and to do that we need your continued engagement. We encourage you to give feedback…”

I have responded. I’ve joined the W3C and I’m leaning in, taking an active role. If we’re to build a more trustworthy and sustainable web, which meets all the values and goals of the W3C, we need to define what success looks like. That includes questioning and validating the anti-tracking stances of browsers which might work against the goals of an open web for all.

So the justification of this proposal is singularly “enabling a bunch of ad ecosystem use cases”. We need a method of establishing how proposals are measured against one another and the impacts they have on all the stakeholders. How do you propose this pre-requisite is achieved?

yoavweiss · 2020-06-23

That is strictly not a pre-requisite to incubation.

jyasskin · 2020-06-23

FWIW, we’re trying to nail down the browser-side constraints on solutions in this space in the PING Target Privacy Threat Model. That threat model isn’t final, so if you find things you don’t like, please file issues and PRs.

jwrosewell · 2020-06-24

@yoavweiss - as a new participant I’m in no position to dispute your knowledge of the bureaucracy.

Like many other new participants, I’ve joined because the W3C and Google sought wider stakeholder engagement and dialogue. As an engaged new participant, I can observe that the merits of this proposal are widely disputed, particularly when considering the needs of publishers and marketers. See the following evidence:

Boston University (see CafeMedia analysis of 4%)
Facebook
Google

There are wider ramifications for societies and people. I hope the UK’s CMA report into digital advertising due to be published on the 3rd July will provide some insight into these matters.

I also observe many new participants and organisations do not have the time or numbers of people to follow all the different groups, documents and dependencies between them. Only the very largest participants who have the budgets to dedicate many of the brightest and the best people are able to do so. As a consequence the bureaucracy favours the largest participants.

Until these “tussles” are resolved there is no justification to progress this proposal.

yoavweiss · 2020-06-24

I don’t understand if this is evidence that personalized ads are unimportant or super important, as the different links point to studies with extremely different conclusions.

Since you’re a new participant, let me try to explain how this Community Group works. This CG’s role is not to block new proposals or require them to somehow justify their existence to all participants.

The CG’s role is to help participants incubate new proposals, help them establish the use-cases they are trying to solve, find other interested parties, and then design a solution that can solve those use cases. As long as there are multiple parties that are interested in solving a use-case, that’s enough “justification” to create a repo and allow people to work.

marcosc · 2020-06-24

People trust the vendor AND their supply chain.

Um, I’m pretty sure they don’t. That’s why we have complete end-to-end checks to see if the whole supply chain is ethical, and companies built around that.

marcosc · 2020-06-24

This seems like a reasonable proposal, @michaelkleber. I’d be supportive of seeing you explore it further and of seeing what would need to be changed in or added to browsers to help something like this happen.

jwrosewell · 2020-06-24

People who trust a publisher to provide them free content in return for viewing advertising do. That is the value exchange.

There are laws created to sanction bad actors who perform bad acts.

Is there a specification or implementation that provides this level of verification across the supply chain?

jwrosewell · 2020-06-24

Precisely. There is a debate to be had and it is happening at the W3C Improving Web Advertising Business Group already.

Creating new groups which duplicate existing work and groups is a classic case of “the bureaucracy is expanding to meet the needs of the expanding bureaucracy”. Only people from organisations with the resources and time, such as browser vendors, are realistically able to participate. This then skews the outcome towards that narrow set of stakeholders to the detriment of all others.

Resolving the existing known issues associated with the proposal must precede any further work which will skew the outcome.

marcosc · 2020-06-24

People who trust a publisher to provide them free content in return for viewing advertising do. That is the value exchange.

Users who learn that their private information is being sold and that they are being tracked are horrified by this. This is not a value exchange: it’s an abuse of people’s trust and consent without their knowledge, which is why browsers are cracking down on these practices.

There are laws created to sanction bad actors who perform bad acts.

The laws are insufficient due to corruption of the political process by special interests (see the US, for instance). This is why browsers and other folks are having to intervene.

Is there a specification or implementation that provides this level of verification across the supply chain?

Yes. That is why most browsers are open source and allow users to modify preferences to protect themselves. See also Mozilla’s mission statement, and the tracking protection work done by browser vendors.