[Proposal] HTML Ping for navigation


#1

In AMP or Web Packaging, the origin server can’t gather access logs because the request doesn’t reach the origin server.

Currently this is solved by analytics libraries using a 1px image, sendBeacon, etc. (like Google Analytics).

On the other hand, the <a ping> attribute sends a ping to a server when a link is clicked. The benefits of adding the ping attribute to the HTML spec as a declarative attribute were:

  • opt-outable
  • no extra library
  • works fine if JS is disabled (by feature policy or AMP policy)
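
For reference, here is a rough sketch of how a server could recognise an existing <a ping> request and turn it into an access-log entry. It relies on what the HTML spec's hyperlink auditing section describes (a POST with body `PING`, content type `text/ping`, and a `Ping-To` header, plus `Ping-From` in some cases). The function name, the `AccessLogEntry` shape, and the lower-cased header map are my own assumptions for illustration, not part of any spec.

```typescript
// Hypothetical log entry shape, for illustration only.
interface AccessLogEntry {
  to: string;          // value of Ping-To: the link target
  from: string | null; // value of Ping-From, which browsers may omit
}

// Returns a log entry if the request looks like a hyperlink-auditing ping,
// otherwise null. Header names are assumed to be lower-cased already
// (as Node's http module does); that is an assumption of this sketch.
function parseLinkPing(
  method: string,
  headers: Record<string, string | undefined>,
  body: string
): AccessLogEntry | null {
  const contentType = (headers["content-type"] ?? "")
    .split(";")[0]
    .trim()
    .toLowerCase();
  if (method.toUpperCase() !== "POST" || contentType !== "text/ping") {
    return null;
  }
  if (body.trim() !== "PING") {
    return null;
  }
  const to = headers["ping-to"];
  if (!to) {
    return null;
  }
  return { to, from: headers["ping-from"] ?? null };
}
```

A page-level ping as proposed here could presumably reuse the same request shape, but that is not decided anywhere in this thread.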

I think it would be good to use the <a ping> approach when a user navigates to a page, to gather access logs when content is served from another server such as an AMP cache or a Web Packaging CDN.

For example:

<meta name="ping" ping="https://example.com/ping">
  • opt-outable
    • the user can configure the browser, as with Do Not Track
  • content providers can gather access logs for their content
    • even if it’s served from another server via signing or AMP
  • no extra script

A polyfill can be built on the Beacon API (navigator.sendBeacon) at the load/DOMContentLoaded event, as current analytics scripts do.
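
A minimal sketch of such a polyfill, under a few assumptions: the `<meta name="ping" ping="...">` element is the one proposed above (not an existing standard), the attribute is treated as a space-separated URL list like <a ping>, and the body `PING` simply mirrors what <a ping> sends. The `MetaLike`, `DocLike`, and `Beacon` types and both function names are made up so the logic can run without a real DOM.

```typescript
// Minimal structural types so the logic can be exercised outside a browser.
interface MetaLike {
  getAttribute(name: string): string | null;
}
interface DocLike {
  querySelectorAll(selector: string): ArrayLike<MetaLike>;
}
type Beacon = (url: string, data: string) => boolean;

// Collect every endpoint declared via the proposed <meta name="ping"> element.
// Assumption: the attribute holds a space-separated list, as with <a ping>.
function collectPingUrls(doc: DocLike): string[] {
  const metas = doc.querySelectorAll('meta[name="ping"]');
  const urls: string[] = [];
  for (let i = 0; i < metas.length; i++) {
    const value = metas[i].getAttribute("ping");
    if (value) {
      for (const url of value.split(/\s+/)) {
        if (url.length > 0) urls.push(url);
      }
    }
  }
  return urls;
}

// Send one beacon per endpoint and return how many the browser accepted.
function sendPagePings(doc: DocLike, beacon: Beacon): number {
  let queued = 0;
  for (const url of collectPingUrls(doc)) {
    if (beacon(url, "PING")) queued++;
  }
  return queued;
}

// Browser wiring: only runs where a DOM and navigator.sendBeacon exist.
const host = globalThis as any;
if (host.document && host.navigator && typeof host.navigator.sendBeacon === "function") {
  const run = () =>
    sendPagePings(host.document, (url, data) => host.navigator.sendBeacon(url, data));
  if (host.document.readyState === "loading") {
    host.document.addEventListener("DOMContentLoaded", run);
  } else {
    run();
  }
}
```

Note that sendBeacon with a string body uses a text/plain content type, so this is not byte-for-byte what <a ping> sends; a native implementation would not have that limitation.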


#2

I love this idea - it’d simplify analytics tremendously, while making tracking opt-outs trivial for browsers to implement. Here are some of the major wins I see:

  1. It’s the kind of tracking I would find optimal - it’s 100% user-controllable, and users could select which tracking sources they are okay with (one might be okay with Google Analytics, but not necessarily some questionable third party they’ve never heard of), even without the website’s permission. They wouldn’t even have to trust what the website says to know what trackers are available, provided the site developers use that API, and it’d offer the more technically inclined the ability to inspect what data is actually being sent about their activities. It’d also allow browsers to make decisions more easily to blacklist trackers with poor (and often illegal) data handling practices. Let’s just say even if a site isn’t complying with GDPR, you don’t necessarily need them to tell you what’s actually going on.
  2. It’s possible for browsers to optimize and streamline the data gathering, so it takes far less memory and CPU resources, especially for the common case of page load and link tracking. (This is one of the two main issues I have with trackers - the other being unnecessary invasiveness.) They could even take advantage of HTTP/2 by putting things like tracker methods in headers, to benefit from guaranteed compression, and by keeping tracking down to a single connection per source. They could also push this off the main thread onto a low-priority worker thread or task, ensuring it doesn’t interfere with what’s important, without having to delay sending all the data until the user leaves the page. (Having no extra script is nice, but it’s insignificant compared to this.)
  3. It’s possible to still do analytics even when scripting is disabled or just not supported (AMP isn’t the only scenario, although it’s the most popular one). This means the main reason ads use JS at all is eliminated, and ads can revert back to HTML/CSS only (allowing sandboxing), thwarting most of the malvertisements that exist today without affecting legitimate ones. (Ad distributors would still want JS to check that ads are being served properly, but they don’t need to trust third party trackers’ JS otherwise.)
  4. Much of the analytics data duplicates various HTTP headers and HTML data that browsers already have to interpret, like site referrers, screen resolution, viewport size, document encoding, etc. Of course, some escape hatches will be required for proper initialization (the initial fetch should be delayed until DOM ready, so settings can be set).
  5. Error reporting could become a lot more efficient. If you hooked into the HTML error reporting system, you could tell the browser to report errors to a particular location as a certain type of ping, rather than going through the complicated mess of an entire script load. You could even send async stack traces by default. It’s not like they don’t already have 90% of the client-side stuff implemented for this just to comply with the spec, so the remaining 10% is pretty much cake. Plus, all this could be queued up off thread.

I could see that, as web sites migrate from the existing patchwork to this kind of system, transparency would increase tremendously, and independent tracker scripts themselves would become increasingly suspect in general, especially if they continue to require a script by default.

I do have a few requests:

  1. It would probably be better as a <link rel="ping" as="name" href="..."> instead - it makes more sense as this than as a meta element, since the ping endpoint is conceptually an external resource. It’s also possible you may want to notify different trackers about different events, and this would become especially relevant for
  2. This should be capable of notifying on initial page view, too, as well as on element hover (by option). 99% of analytics boils down to either page load, element click, or element hover, so this is absolutely required.
  3. This should include enough info with the pings to link one page to another, with the ability to control how sites can view these transitions. This could be done by two things: requesting an ID that’s per-document (not per-window), per-name from the URL, used exclusively for such pings, and with page transitions, sending instead a combined ping with both the old and new ID rather than a single link ping + a subsequent request for a new ID. The reason the ID is requested from the ping server is so it knows to set up a “session” as necessary, to help make session forgery harder to accomplish.
  4. There absolutely must be a DOM API for this, for manually sending pings and controlling what data is sent with them. It can (and should) require direct user intervention to be used, but for dynamic web pages, not all pings can really be traced to a direct, clear route that maps one-to-one with the URL. It should also allow sending more than one kind of ping, sending a ping with data, setting default options for the ping (with some browser defaults for referrers, etc.), awaiting the processing of a ping, and resetting the session (request a new ID without implying a new visit). Note that these can only error if the alias is wrong, and that browsers should make efforts to thwart detection of disabled tracking.
  5. There needs to be a means to control cross-domain tracking. This could be done through extending CORS pretty trivially with a request header, and browsers could also control this.

The first three could be done as part of the initial ping sequence, but the fourth would mainly exist to replace existing programmatic APIs, and the fifth would be asserting web security and extending it to tracker protection at a much finer grained level than what exists today.
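
To make request 3 a bit more concrete, here is a rough sketch of the "combined ping" idea: a plain page view carries one ID, while a page transition carries both the old and the new ID in a single message. Everything here is hypothetical (the `PingMessage` shapes, the field names, and the `pingForDocument` function), and it assumes the IDs have already been issued by the ping server.

```typescript
// Hypothetical message shapes; nothing here is specified anywhere.
type PingMessage =
  | { kind: "view"; name: string; id: string }
  | { kind: "transition"; name: string; fromId: string; toId: string };

// Decide what to send to one named tracker when a document is shown.
// `currentId` is the per-document, per-name ID issued for this document.
// `previousId` is the ID of the document the user came from, or null if
// there is none (or the browser chose not to expose it).
function pingForDocument(
  name: string,
  currentId: string,
  previousId: string | null
): PingMessage {
  if (previousId !== null) {
    // One combined ping instead of a link ping plus a separate ID request.
    return { kind: "transition", name, fromId: previousId, toId: currentId };
  }
  return { kind: "view", name, id: currentId };
}
```

Passing `null` for `previousId` is also where a browser could enforce user control over whether sites can see transitions at all.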


BTW, as you already pointed out, there is the send beacon API, but it’s kind of ridiculous to require a script to be loaded every time just to do basic analytics and data gathering that could be almost trivially handled and optimized by browsers. (Browsers know better than scripts when downtime actually exists.)

One other thing: GA in terms of this would become slightly different: you’d need to add two lines instead of one, and their script would become a module you’d want to bundle instead for 99% of cases where you actually need their plugin and multiple-tracker system.


#3

Could this not make use of https://www.w3.org/TR/reporting/?