Google has been promoting AMP pages since their introduction. AMP has been marketed heavily, and developers are encouraged to build AMP pages. This comes with caveats, however: the URL shown in the browser is not the page’s real URL, and the internet becomes more centralized.
Google presented the community with “WebPackages” to fix the problem. However, we believe it is not the right solution.
The reason Google hosts AMP pages itself is to guarantee that they follow the AMP standards and that the website does not serve different code depending on whether the user agent is Googlebot or a real user.
Document Hash offers the integrity guarantees of WebPackages, but lets the author’s own server distribute the document while still giving integrity guarantees to the referrer. The Document Hash is the SHA-256 hash of the DOM elements of the document.
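As a rough illustration, a Document Hash could be computed as below. This is only a sketch: the text does not define how the DOM is serialized before hashing, so the canonical form here (tags, attributes sorted by name, trimmed text) is an assumption, and `document_hash` is a hypothetical name.

```python
import hashlib
from html.parser import HTMLParser


class _DomSerializer(HTMLParser):
    """Collects a simple canonical form of a document's elements and text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Assumption: attribute order should not affect the hash, so sort by name.
        rendered = "".join(
            f' {name}="{value or ""}"'
            for name, value in sorted(attrs, key=lambda pair: pair[0])
        )
        self.parts.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Assumption: surrounding whitespace is not significant.
        text = data.strip()
        if text:
            self.parts.append(text)


def document_hash(html: str) -> str:
    """Return 'SHA256:<hex digest>' for the canonical form of the document."""
    serializer = _DomSerializer()
    serializer.feed(html)
    serializer.close()
    canonical = "".join(serializer.parts)
    return "SHA256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this canonical form, two documents that differ only in attribute order or surrounding whitespace produce the same hash, while any change to the elements or text produces a different one.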
High-level overview of how it works:
A SHA-256 hash attribute is added to the <a> element.
Example: <a hash="SHA256:d04b98f48e8f8bcc15c6ae5ac050801cd6dcfd428fb5f9e65c4e16e7807340fa">
This means that the page opened when the <a> element is clicked is guaranteed to have the same document hash. Once the user follows another link from that document, the hash restriction is lifted.
Every hash attribute starts with the name of the hashing algorithm.
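A browser would first need to split the attribute into its algorithm name and digest. The following is a minimal sketch of that step; `parse_hash_attribute`, `HashSpec`, and the choice to reject unknown algorithms are assumptions for illustration, not part of the proposal.

```python
import re
from typing import NamedTuple


class HashSpec(NamedTuple):
    algorithm: str
    digest: str


# Hex digest length per supported algorithm (only SHA256 appears in the text).
_EXPECTED_HEX_LENGTH = {"SHA256": 64}


def parse_hash_attribute(value: str) -> HashSpec:
    """Parse a value such as 'SHA256:<64 hex characters>'."""
    algorithm, separator, digest = value.strip().partition(":")
    if not separator:
        raise ValueError("hash attribute must look like ALGORITHM:DIGEST")
    expected_length = _EXPECTED_HEX_LENGTH.get(algorithm)
    if expected_length is None:
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    if not re.fullmatch(r"[0-9a-fA-F]{%d}" % expected_length, digest):
        raise ValueError("digest is not a hex string of the expected length")
    # Normalize case so later comparisons are straightforward.
    return HashSpec(algorithm, digest.lower())
```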
How should a hash mismatch be handled? A fail attribute can be added to the <a> element.
The fail attribute takes one of three values:
- return
- ignore
- inform
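The three values could be parsed roughly as follows. This is a sketch under assumptions: `parse_fail_attribute` and `FailPolicy` are hypothetical names, and the `inform:<url>` shape is taken from the third example in this post rather than from any formal grammar.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class FailPolicy(NamedTuple):
    action: str                      # "return", "ignore", or "inform"
    report_url: Optional[str] = None  # only set for "inform"


def parse_fail_attribute(value: str) -> FailPolicy:
    """Parse 'return', 'ignore', or 'inform:<url>'."""
    action, separator, rest = value.strip().partition(":")
    if action in ("return", "ignore") and not separator:
        return FailPolicy(action)
    if action == "inform" and separator:
        parts = urlsplit(rest)
        # Assumption: the report endpoint must be an absolute http(s) URL.
        if parts.scheme in ("http", "https") and parts.netloc:
            return FailPolicy("inform", rest)
    raise ValueError(f"unrecognized fail attribute: {value!r}")
```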
Example 1 and behavior:
<a href="http://example.com" hash="SHA256:d04b98f48e8f8bcc15c6ae5ac050801cd6dcfd428fb5f9e65c4e16e7807340fa" fail="return" >
What happens when example.com serves a different document:
- The user clicks the link on google.com
- The browser attempts to download the example.com document
- The hashes do not match, so the hash check fails
- The user is automatically redirected back to google.com
Example 2 and behavior:
<a href="http://example.com" hash="SHA256:d04b98f48e8f8bcc15c6ae5ac050801cd6dcfd428fb5f9e65c4e16e7807340fa" fail="ignore" >
What happens when example.com serves a different document:
- The user clicks the link on google.com
- The browser attempts to download the example.com document
- The hashes do not match, so the hash check fails
- The user stays on example.com and nothing else happens
Example 3 and behavior:
<a href="http://example.com" hash="SHA256:d04b98f48e8f8bcc15c6ae5ac050801cd6dcfd428fb5f9e65c4e16e7807340fa" fail="inform:https://google.com/fail?=blablabla" >
What happens when example.com serves a different document:
- The user clicks the link on google.com
- The browser attempts to download the example.com document
- The hashes do not match, so the hash check fails
- The user stays on example.com and the browser informs google.com that the hash did not match
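The three behaviors above can be summarized as one small decision function. This is a sketch, not a browser implementation: `resolve_navigation` is a hypothetical name, and hashing the raw response bytes is an assumed stand-in for the real Document Hash over DOM elements.

```python
import hashlib
from typing import Optional, Tuple


def resolve_navigation(
    referrer_url: str,
    target_url: str,
    expected_hash: str,
    served_document: bytes,
    fail: str,
) -> Tuple[str, Optional[str]]:
    """Return (url the user ends up on, url to report the mismatch to or None)."""
    # Stand-in for the Document Hash: SHA-256 over the served bytes.
    actual_hash = "SHA256:" + hashlib.sha256(served_document).hexdigest()
    if actual_hash.lower() == expected_hash.lower():
        return target_url, None          # hashes match, navigate normally
    if fail == "return":
        return referrer_url, None        # send the user back to the referrer
    if fail == "ignore":
        return target_url, None          # stay on the target, do nothing
    if fail.startswith("inform:"):
        return target_url, fail[len("inform:"):]  # stay, and report the mismatch
    raise ValueError(f"unrecognized fail attribute: {fail!r}")
```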
What about canvas and iframe elements?
Nothing.
Document Hash means that the DOM nodes and the document’s content are hashed. However, the contents of canvas and iframe elements are not part of the hash, so they can be dynamic even when the parent document’s hash stays the same.
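One way to picture this exclusion is a hash that covers the canvas and iframe elements themselves but skips whatever sits inside them. The sketch below works on static HTML, so "content" is approximated as nested markup and text; that approximation, the serialization format, and the name `hash_excluding_embedded` are all assumptions.

```python
import hashlib
from html.parser import HTMLParser

EXCLUDED_CONTENT = {"canvas", "iframe"}


class _SkippingSerializer(HTMLParser):
    """Serializes a document but ignores anything nested in canvas/iframe."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside an excluded element

    def handle_starttag(self, tag, attrs):
        if self.skip_depth == 0:
            rendered = "".join(
                f' {name}="{value or ""}"'
                for name, value in sorted(attrs, key=lambda pair: pair[0])
            )
            self.parts.append(f"<{tag}{rendered}>")
        if tag in EXCLUDED_CONTENT:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in EXCLUDED_CONTENT and self.skip_depth > 0:
            self.skip_depth -= 1
        if self.skip_depth == 0:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def hash_excluding_embedded(html: str) -> str:
    serializer = _SkippingSerializer()
    serializer.feed(html)
    serializer.close()
    canonical = "".join(serializer.parts)
    return "SHA256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

In this sketch the elements' own attributes (for example an iframe's src) still count toward the hash, since they belong to the parent document.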
What about external scripts?
Nothing.
Google can require external script elements to carry the integrity attribute, which is part of the DOM and therefore part of the document hash.
An unexpected benefit if external scripts carry a hash attribute: cache-based tracking would be reduced.
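A referrer could check this requirement before linking to a page. The following sketch lists external scripts that lack an integrity attribute; `scripts_missing_integrity` is a hypothetical helper, and it only inspects static markup, not scripts injected at runtime.

```python
from html.parser import HTMLParser
from typing import List


class _ScriptAudit(HTMLParser):
    """Records the src of every external script without an integrity attribute."""

    def __init__(self):
        super().__init__()
        self.missing: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        # Inline scripts have no src; their text is already part of the document.
        if src and not attributes.get("integrity"):
            self.missing.append(src)


def scripts_missing_integrity(html: str) -> List[str]:
    audit = _ScriptAudit()
    audit.feed(html)
    audit.close()
    return audit.missing
```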