Intent to Migrate: Reporting and Network Error Logging

dcreager · 2018-06-28

We would like to migrate the Reporting and Network Error Logging (NEL) proposals from WICG into the Web Performance Working Group.

Working group decision to adopt

Added to proposed working group recharter

Proposal

Reporting: [draft spec] [GitHub repo]

Network Error Logging: [draft spec] [GitHub repo]

Summary

Reporting: Provides a best-effort report delivery system that executes out-of-band with website activity

Network Error Logging: Defines a mechanism enabling web applications to declare a reporting policy that can be used by the user agent to report network errors for a given origin

Motivation and Use Cases

Network Error Logging

Accurately measuring the performance characteristics of web applications is important in helping site developers understand how to improve them. The worst-case scenario is a failure to load the application, or a particular resource, due to a network error. To address such failures, the developer needs the user agent's help to identify when, where, and why they occur.

Today, application developers do not have real-time web application availability data from their end users. For example, if the user fails to load the page due to a network error (a failed DNS lookup, a connection timeout, a reset connection, and so on), the site developer is unable to detect and address the issue. Note that these kinds of network errors cannot be detected purely server-side, since the client may never have successfully established a connection with the server.

Existing methods (such as synthetic monitoring) provide a partial solution by placing monitoring nodes in predetermined geographic locations, but require additional infrastructure investments, and cannot provide truly global and near real-time availability data for real end users.

Reporting

Instead of defining the report delivery logic directly in NEL, this logic has been factored out into a separate specification, Reporting. Reporting can be used by other specs that need out-of-band report delivery, without having to duplicate the report delivery logic in all of them. For example, Content Security Policy (CSP) is planning to adopt Reporting as a new delivery mechanism.

Compatibility Risk

The primary compatibility risk is in the format of the Report-To and NEL HTTP response headers, and in the payload format of the reports sent back to Reporting collectors. Ideally, any future changes to these formats would be backwards compatible (e.g., adding new fields and allowed values). Any changes that require removing or renaming fields, or changing the interpretation of existing values, will have an impact on already deployed implementations.
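To make the backwards-compatibility point concrete, here is a minimal sketch of a collector that reads only the report fields it knows and ignores the rest, so that newly added fields do not break it. The header values, the collector URL, the `future_field` name, and the `parse_report` helper are all hypothetical; the field names follow one reading of the draft specs and may not match the final formats.

```python
import json

# Example header values as a server might send them. The group name, the
# collector URL, and the max_age values are made up for illustration.
REPORT_TO = (
    '{"group": "network-errors", "max_age": 2592000, '
    '"endpoints": [{"url": "https://collector.example/upload"}]}'
)
NEL = '{"report_to": "network-errors", "max_age": 2592000}'

# Report body fields this hypothetical collector understands today.
KNOWN_BODY_FIELDS = {"type", "phase", "status_code", "elapsed_time"}


def parse_report(raw: str) -> dict:
    """Extract the known fields from one report and ignore unknown ones."""
    report = json.loads(raw)
    body = report.get("body", {})
    return {
        "url": report.get("url"),
        "known": {k: v for k, v in body.items() if k in KNOWN_BODY_FIELDS},
        "ignored": sorted(k for k in body if k not in KNOWN_BODY_FIELDS),
    }
```

A collector written this way keeps working when a field is added, but it cannot protect itself against a field being renamed or reinterpreted, which is why those changes carry the real risk.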

Ongoing technical constraints

User agents will have to hook into their network stack to generate NEL reports about requests to an origin that has provided a NEL policy. Exactly how this is done will be specific to each user agent implementation.

User agents will have to maintain a cache of the Reporting and NEL policies that they have received. The specifications recommend, but do not require, that this cache persist across restarts.
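As a rough illustration of the policy cache, the sketch below keeps one policy per origin in memory and drops it once it expires. The `PolicyCache` and `CachedPolicy` names are hypothetical, the `max_age` expiry and the "`max_age` of 0 removes the policy" behavior are assumptions based on the draft specs, and persistence across restarts is left out entirely.

```python
import time
from dataclasses import dataclass


@dataclass
class CachedPolicy:
    report_to: str      # name of the reporting group to deliver to
    expires_at: float   # absolute expiry time, in seconds


class PolicyCache:
    """In-memory cache of one policy per origin (hypothetical sketch)."""

    def __init__(self, clock=time.time):
        self._clock = clock          # injectable so expiry can be tested
        self._policies = {}

    def store(self, origin: str, report_to: str, max_age: int) -> None:
        # Assumption: a non-positive max_age removes any existing policy.
        if max_age <= 0:
            self._policies.pop(origin, None)
            return
        self._policies[origin] = CachedPolicy(report_to, self._clock() + max_age)

    def lookup(self, origin: str):
        policy = self._policies.get(origin)
        if policy is None:
            return None
        if self._clock() >= policy.expires_at:
            # Expired policies are evicted lazily, on lookup.
            del self._policies[origin]
            return None
        return policy
```

A real user agent would also need to bound the size of this cache and decide whether to write it to disk, which is the part the specifications leave optional.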

Link to implementation experience and demos

NEL is based on Google’s experience with the Domain Reliability feature of Chrome, which Google has been using to generate an opt-in client-side reliability signal for traffic to its properties. Google has found this signal to be very useful for detecting network outages that affect its users’ ability to reach its services. NEL is an effort to standardize Domain Reliability so that other web operators can take advantage of this signal.

Data

Reporting and NEL are new features. They have been implemented in Chrome, and we are awaiting final approval to ship them in Chrome stable.

Google will start to use the new NEL standard alongside the existing Domain Reliability mechanism. We anticipate that many other web operators will activate this feature to collect client-side reliability information about their services.

Security and Privacy

NEL is intended to generate data equivalent to server-side request logs (e.g., an Apache access.log). This has two important security and privacy ramifications:

NEL reports about an origin should only be visible to the operators of the server(s) that receive traffic for that origin.

To support this, NEL policies are delivered in HTTP response headers, and can only be used for traffic delivered via HTTPS. This ensures that we only act on NEL policies that are received from the legitimate owner of the server in question.

This also means that a web page cannot use NEL to collect information about outbound cross-origin requests that it makes. NEL is used to monitor inbound requests to an origin, and not outbound requests from an origin.

NEL reports should only contain information that would be visible to the server receiving a request to the origin.

As an example, the IP address of the DNS resolver that the client used is not present in a NEL report, because server-side monitoring would not be able to see that information, either.

The specifications also require user agents to allow users to opt out of collecting and reporting this information.
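The HTTPS restriction and the user opt-out described above amount to a gate that runs before a policy is ever stored. A minimal sketch, assuming a hypothetical `should_accept_nel_policy` helper and a simple scheme check (a real user agent's check may be stricter than this):

```python
from urllib.parse import urlsplit


def should_accept_nel_policy(response_url: str, user_opted_out: bool) -> bool:
    """Decide whether a NEL header on this response may be acted on."""
    # Users who have opted out never have policies registered.
    if user_opted_out:
        return False
    # Only policies delivered over HTTPS are trusted to come from the
    # legitimate operator of the origin.
    return urlsplit(response_url).scheme == "https"
```

Because the check looks only at the response that carried the header, a page has no way to register a policy on behalf of some other origin that it sends requests to.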

Accessibility

This feature has no direct UI or UX impact, and therefore no accessibility concerns. The data generated by this feature will be imported into existing monitoring and observability platforms; all accessibility concerns about how to present this data are the responsibility of those other platforms.

Internationalization

This feature has no direct UI or UX impact, and therefore no internationalization concerns. The data generated by this feature will be imported into existing monitoring and observability platforms; all internationalization concerns about how to present this data are the responsibility of those other platforms.

yoavweiss · 2018-06-28

Thanks for filing an intent. LGTM!