Building websites that cannot read, exfiltrate, or store the data they operate on

Summary

Redact is a project that aims to use only existing, well-known browser components to build “zero-knowledge” websites: sites that let users interact with their data but cannot themselves see that data. The project’s aim is to provably guarantee a site’s inability to abuse or sell user data, while maintaining the site’s ability to provide rich interactions between users and their data. This is achieved in a way that is fully backwards-compatible with “web 1.0”: HTML elements only, and absolutely no JS or client-side crypto. Furthermore, unlike existing data privacy initiatives, which rely on trust and on a promise to abide by a user’s preferences, this technology provides technical guarantees: the user does not have to trust the website, and the website does not have to assume the liability of securely storing the data.

The technology works as follows:

  1. A user installs the Redact client on their local device as a native app (this could be integrated into the browser in the future)

  2. The client opens a port on the local device, and begins listening for requests

  3. When the user visits a Redact-enabled website, the website serves a standard HTML page, but wherever user data would typically be placed, these individual pieces of data are represented as iframe elements pointing to the Redact client

    a. An example of this would be an iframe pointing to localhost:8080/.firstName.

  4. The client responds with a series of pages that block CSRF attacks, and finally serves the data (as a string, integer, boolean, or multimedia type) inside the secured iframe element
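
The website's side of step 3 can be sketched as below. This is a minimal illustration, not code from the Redact project: the function names are hypothetical, and the http scheme and port 8080 are assumptions based on the localhost:8080/.firstName example above.

```python
from html import escape

# Assumed address of the local Redact client (port taken from the example above).
CLIENT_ORIGIN = "http://localhost:8080"


def redacted_field(path: str) -> str:
    """Return an iframe tag that asks the local client to render one field.

    The website only ever emits the *path* of the data, never its value.
    """
    src = f"{CLIENT_ORIGIN}/{path}"
    return f'<iframe src="{escape(src, quote=True)}"></iframe>'


def render_profile() -> str:
    """Hypothetical page fragment: static HTML plus one redacted field."""
    return "<p>First name: " + redacted_field(".firstName") + "</p>"
```

The page served to the browser contains only a reference to the field; the value itself is filled in by the local client when the browser loads the iframe.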

Thanks to CORS, CSP, and the iframe sandboxing features long-stabilized and available in any modern browser, the data displayed in this iframe is completely unavailable for reading or exfiltration by the parent website. Additionally, thanks to the endpoint’s CSRF protections, the data endpoint cannot be hit directly to exfiltrate data via the XMLHttpRequest API. However, this only covers displaying user data.
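
One common way to build this kind of protection is a single-use token: the first response embeds a freshly generated token that the parent page cannot read, and the data is served only to a follow-up request carrying that token. The sketch below illustrates the idea only. The TokenGate class and its method names are hypothetical, and this is an assumption about how such a check could work, not a description of the Redact client's actual protocol.

```python
import secrets


class TokenGate:
    """Hypothetical single-use token gate (illustrative, not Redact's code)."""

    def __init__(self) -> None:
        # Maps each outstanding token to the data path it was issued for.
        self._pending: dict[str, str] = {}

    def issue(self, path: str) -> str:
        """Create an unguessable token bound to one data path."""
        token = secrets.token_urlsafe(32)
        self._pending[token] = path
        return token

    def redeem(self, path: str, token: str) -> bool:
        """Return True only for a known token issued for this exact path.

        pop() removes the token, so it cannot be used twice, even when
        the path does not match.
        """
        return self._pending.pop(token, None) == path
```

A request that arrives without a valid token, for example one issued directly by a script on the parent page, would be refused under this scheme.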

Redact also supports full CRUD operations on “redacted” data. When the edit=true query parameter is included in the iframe source, the Redact client responds to the request with an input field appropriate to the requested data’s type. The user can then enter a different value for the requested field from within the secure iframe. We are currently exploring ways to provide visual feedback that a field has been “redacted” and can be safely interacted with, similar to TLS’ green-lock icon.

Redact also includes features for smooth integration of redacted fields with the styling and UI of the parent website. A css=... query parameter allows for including arbitrary styling which will be applied to the returned iframe contents.
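
The edit and css parameters described above could be assembled into an iframe source as follows. This is a minimal sketch: the field_url helper is hypothetical, the base address is assumed from the earlier localhost:8080 example, and passing raw CSS declarations as the css value is an assumption about the format.

```python
from typing import Optional
from urllib.parse import urlencode


def field_url(
    path: str,
    *,
    edit: bool = False,
    css: Optional[str] = None,
    base: str = "http://localhost:8080",  # assumed client address
) -> str:
    """Build the iframe source for one redacted field."""
    params = {}
    if edit:
        params["edit"] = "true"   # ask the client for an input field
    if css is not None:
        params["css"] = css       # styling applied inside the iframe
    query = urlencode(params)     # percent-encodes the CSS safely
    return f"{base}/{path}" + (f"?{query}" if query else "")
```

For example, field_url(".firstName", edit=True, css="color: red;") yields a URL whose query string carries both parameters, with the CSS percent-encoded.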

Redact includes a federated storage component so that non-technical users can store their data securely and reuse it across multiple websites. The local client does not store a user’s data locally. When data is submitted to the client via an editable iframe, it is encrypted using a well-known symmetric encryption algorithm provided by libsodium. The encrypted blob is sent, along with the unencrypted key under which it can later be retrieved, to a third-party storage provider, who need not be trusted. Similarly, when data is requested by a website via the iframe API, it is fetched from storage, decrypted locally, and then served via the secure iframe.
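
The data flow in this paragraph can be sketched as below. The BlobStore class and the save_field and load_field helpers are hypothetical names used for illustration, and the cipher is deliberately left as a parameter so that the sketch does not guess at how Redact invokes libsodium.

```python
from typing import Callable, Dict

# Any function that maps bytes to bytes (encryption or decryption).
Cipher = Callable[[bytes], bytes]


class BlobStore:
    """Stand-in for the untrusted provider: it sees only keys and ciphertext."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, key: str, blob: bytes) -> None:
        self.blobs[key] = blob

    def get(self, key: str) -> bytes:
        return self.blobs[key]


def save_field(store: BlobStore, key: str, value: str, encrypt: Cipher) -> None:
    """Encrypt locally, then send only the ciphertext and its lookup key."""
    store.put(key, encrypt(value.encode("utf-8")))


def load_field(store: BlobStore, key: str, decrypt: Cipher) -> str:
    """Fetch the ciphertext by key and decrypt it locally."""
    return decrypt(store.get(key)).decode("utf-8")
```

With the PyNaCl bindings to libsodium, for instance, the encrypt and decrypt methods of a nacl.secret.SecretBox instance would fit the Cipher shape used here; the secret key stays on the user's device in either case.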

Redact’s final component allows a website to provide a customized, “logged in” view to each of its users, using redacted data. Websites can optionally provide a (same-origin) relay_url=... query parameter in the editable iframe source. When the user submits data to the field, the relay_url endpoint is notified that the user has done so and receives the key of the submitted data along with some highly restricted metadata, such as submission time. This allows a website to keep a database of references/paths to user data associated with user sessions, and then populate the page with those data fields when the same user visits the page again. This also eliminates the need to ever have another login screen.
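
On the website's side, the bookkeeping behind a relay_url endpoint might look like the sketch below. The RelayIndex class, its method names, and the way a session is identified are all assumptions made for illustration; the source only states that the endpoint receives the data key and restricted metadata such as submission time.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class RelayIndex:
    """Hypothetical store of data references per user session.

    It holds only keys (paths) and timestamps, never the data itself.
    """

    def __init__(self) -> None:
        self._refs: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def on_relay(self, session_id: str, key: str, submitted_at: str) -> None:
        """Record one notification received at the relay_url endpoint."""
        self._refs[session_id].append((key, submitted_at))

    def keys_for(self, session_id: str) -> List[str]:
        """Return the data keys to render as iframes for a returning user."""
        # .get() avoids creating an empty entry for unknown sessions.
        return [key for key, _ in self._refs.get(session_id, [])]
```

When the same session returns, the site can emit one iframe per stored key and let the local client fill in the values.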

Redact essentially allows a website to orchestrate and templatize a user’s interactions with their data, without ever seeing that underlying data. Imagine a social media site capable of knowing that a million users shared a million posts amongst each other, without ever knowing the contents of those posts, or in effect any concrete data about its users. Redact combines this with a federated way for users to store their own data, shifting the web’s paradigm of data ownership from platform-based to user-based. As a side effect, it also eliminates one of the web’s trickiest subjects: login screens.

Motivation and Use Cases

The motivation for this technology arose out of data privacy concerns and the opaque abuse of user data for the purposes of behavior profiling and prediction. When data privacy and security were thrust into the public eye in 2018 with the Cambridge Analytica scandal, the primary question the developers of Redact asked themselves was, “How do we allow for the rich user experience of the web, without the providers of those experiences being capable of reading the contents of user interactions?”

Use cases:

  1. A private, user-focused social media website. In such a website, users who have added each other as friends would be capable of reading each other’s posts, and the website would be capable of hosting and organizing those interactions, but the contents of the posts would be hidden from the website owner. A VERY rough/bare-bones implementation of this is available at redact-feed-ui.dev.pauwelslabs.com. This is the first publicly available Redacted website.

  2. A secure and private EHR or telehealth portal. One of the primary issues in migrating health data to the web is the health portal provider’s ability (or inability) to properly secure that data. With the availability of a technology like Redact, the developer of a health portal could focus on building an innovative UI connecting patients to health providers without the overhead of having to build secure storage for user data. Additionally, by storing their own data, users/patients can transport that data across different portal providers.

  3. Small, self-contained redacted modules. Rather than immediately switching to entire redacted websites, things like end-to-end encrypted chat modules could be packaged and provided to a website as a chunk of HTML and JS, where only that module is redacted.

Compatibility Risk

A redacted website depends on a locally installed client that can respond to secure iframe requests. If such a client were integrated into a web browser, it would lower the barrier to entry for accessing Redact websites, but it would also mean that if the client were later removed, previously working websites would break completely.

Redact as a technology and protocol could co-exist with existing websites, but it would be incompatible with how data is currently handled on the web.

Links to implementations and demos

Redact has been fully implemented end to end, complete with client and storage implementations, and a working redacted website showcasing the majority of its basic features. All code is provided under the GPLv2 license.

Client: pauwels-labs/redact-client (GitHub). Receives incoming requests from the browser and serves up decrypted contents in a secured iframe in response.

Storage: pauwels-labs/redact-store (GitHub). Provides a universal encrypted data storage interface for Redact.

Crypto abstractions on top of libsodium: pauwels-labs/redact-crypto (GitHub). Contains all cryptographic abstractions used across redact codebases.

Some very basic docs and installation instructions are available here: https://docs.redact.ws

A redact-enabled website is available here: https://redact-feed-ui.dev.pauwelslabs.com

Concerns and Mitigations

Ultimate privacy is not always a good thing. Moderation helps ensure communities abide not only by a country’s laws, but also by a wider moral and ethical standard as defined by the owners of a website. An often-cited example is preventing the proliferation of CSAM, or child sexual-abuse material, on platforms which use end-to-end encryption to guarantee the privacy of a user’s data. Detecting and fact-checking misinformation has also become an important aspect of being a responsible platform on the internet.

There are a couple of potential ways to mitigate these issues. Similar to Apple/WhatsApp’s proposals to package CSAM-detecting algorithms in client-side code for end-to-end encrypted messaging systems, such algorithms could be implemented, reviewed, and approved by the community and included within the Redact client. If the client is packaged as part of a browser, websites could additionally specify that certain algorithms must be available to scan user data and report offending behavior to the website owner, either with or without the original material.

We are open to other alternatives and technologies that could be securely implemented at the client level to detect and report illegal material.