“Profiled HTML (e.g. AMP + friends)”

chaals · 2017-01-27

TL;DR: A standard for profiled Web Content would be helpful for a lot of things.

Detail… There is a long history of profiling HTML. In some cases the profiles are quite “niche”, in others they are pretty general.

Many of these have been “mobile-oriented” - WML, iMode HTML, cHTML, XHTML Basic, AMP, and various others, where performance is often an important motive.

Other use cases for restricted content include web-based email services and other user-generated content such as blog posts or comments, where security is generally the major concern.

There was also the effort to standardise a modularisation of HTML / XHTML. That seems to have been a learning experience rather than a big long-term success, although it is clearly along the same general lines as the current crop.

Rendering efficiently in “mainstream” services - which now include most of the above scenarios - is part of the motivation for a current group, including AMP, Facebook’s Instant Articles, Telegram’s Telegraph, and so on.

In many respects these seem to be a modern and more effective approach to modularisation - taking advantage of better standards and the “extensible web” to actually work.

A standard way to work with profiles should:

improve performance and security
give third-party developers confidence to build tools
help authors to build content that works on multiple platforms
reduce the amount of work required for accessibility
help decentralise innovation

The goal of a standard would be to let different forks be built from an agreed baseline, with a framework for incorporating innovations that lets implementations develop independently, but work interoperably.

It’s also, obviously, critical to have something that is actually useful. Maybe one reason the old modularisation effort failed was that it didn’t offer much to content authors or tool developers (including browsers), so there wasn’t an obvious motive for adoption except a top-down directive from some industry, which didn’t really happen.

AMP seems to be sufficiently mature, well-known enough, and appropriately licensed to be a basis for further development. It also seems to be a practical starting point, with a path for extending it in various directions, and people who have been working with it enough to think about it.

But it is an implementation, not a standard itself, and there are a couple of problems in making another implementation of it.

As an example, the requirement for sites to declare a CORS policy makes it very difficult to effectively create a new implementation, since by default it will be blocked from the ecosystem. This is a pretty serious limitation for something to become a general standard.
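A minimal sketch of why this bites a newcomer, assuming a publisher endpoint that only approves request origins found on a fixed allowlist. The origins and the function name are hypothetical, and real AMP CORS handling involves more checks than this.

```python
from typing import Optional

# Hypothetical allowlist a publisher might hard-code for the caches it knows.
ALLOWED_ORIGINS = frozenset({
    "https://publisher.example",
    "https://established-cache.example",
})


def cors_allow_origin(request_origin: str,
                      allowed=ALLOWED_ORIGINS) -> Optional[str]:
    """Return the Access-Control-Allow-Origin value, or None to omit the header."""
    # Echo the origin back only when the publisher already knows it.
    return request_origin if request_origin in allowed else None


# A known cache is approved; a new implementation is not, until every
# publisher updates its own list.
print(cors_allow_origin("https://established-cache.example"))
print(cors_allow_origin("https://new-cache.example"))  # prints None
```

Under these assumptions, a new cache only works once each publisher has added its origin by hand, which is the default-blocked situation described above.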

To move from a piece of code to a standard that lets people make interoperable code, we need to document how to make extensions, work out the difference between things that are fundamental and need coordination, and things that can be done in implementations without any real impact on other implementations, …

…and of course people who are interested enough in the idea to work on it.

KirbyZhou · 2017-05-04

I’m an architect from SOGOU.COM. We think a new standard like AMP would be very useful for our search application. It can help us find high-quality content more easily, and help users access things much faster.

axbing · 2017-05-09

I’m Chen Binghui from UC Browser. First, I agree with Chaals: profiled HTML is very useful to informal websites and search engines, and a standard for profiled HTML would be helpful. But in the implementations of AMP and MIP (developed by Baidu), Google and Baidu use their own CDNs to cache third parties’ assets, and this kind of mechanism is important to AMP and MIP. I think this is a rather strange thing to make into a standard.

chaals · 2017-05-09

There are several pieces that are important to make work across the web, and we should think about how to standardise them.

The cache is an important part of the ecosystem, but I’m not sure that we need to standardise the CDN system.

On the other hand, one of the problems that smart people like Andrew Betts and Jeremy Keith have explained is what URL is shown as the source of the content. One possible solution is for an AMP cache to tell the browser that it is serving content as an AMP cache, and where the original content came from, and for the browser to do some of the work of ensuring the user has sensible information.

Another issue is enabling different providers to develop their components. In order to make the ecosystem work for both caches and content providers, there needs to be some way of adding new caches seamlessly, without content providers having to know in advance which search engine has a caching service, and content providers need to be able to choose the components that best suit their needs.

chaals · 2017-05-29

See also Andrew Betts’ alternative suggestion …

tayqassqan · 2017-06-07

I’m an engineer from the MIP team at Baidu. First, we are in favor of AMP becoming part of the web standard. Before that, there are some problems to be solved. MIP (Mobile Instant Page) is a similar standard to AMP. We are actively promoting the integration of AMP and MIP. Looking ahead, we think AMP needs to address the following questions:

A friendlier web amp-viewer implementation: the web amp-viewer is currently based on an iframe, which easily leads to a lot of compatibility issues.
An open CDN cache specification: we hope that more companies and organizations will be able to join the AMP Project and provide developers with the same CDN cache standard and access.
Moving some AMP features into browsers: we hope that some AMP capabilities can be provided by the browser, such as amp-url to site-url mapping.

Solving the above problems is the main direction that we will continue to explore with the AMP team and the other companies that join AMP.
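To illustrate the amp-url to site-url mapping mentioned above, here is a rough sketch. It assumes a cache URL path of the form /c/s/<host>/<path>, where "c" marks a document and an optional "s" marks an https origin, which is the commonly documented AMP cache layout. The function name and the example hosts are hypothetical, and a browser-level mapping would need to handle more cases than this.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def cache_url_to_site_url(cache_url: str) -> Optional[str]:
    """Map an assumed /c/[s/]<host>/<path> cache URL back to the site URL."""
    parts = urlsplit(cache_url)
    segments = parts.path.split("/")  # leading "/" yields an empty first item
    if len(segments) < 3 or segments[1] != "c":
        return None  # not a document URL in the assumed layout
    rest = segments[2:]
    scheme = "http"
    if rest[0] == "s":  # optional marker for an https origin
        scheme = "https"
        rest = rest[1:]
    if not rest or not rest[0]:
        return None  # no origin host present
    host = rest[0]
    path = "/" + "/".join(rest[1:])
    # Passing the query string through unchanged is a simplification.
    return urlunsplit((scheme, host, path, parts.query, ""))


print(cache_url_to_site_url(
    "https://cache.example/c/s/publisher.example/news/article.html"))
# prints https://publisher.example/news/article.html
```

If browsers performed a mapping like this themselves, they could show the publisher's URL rather than the cache's, whichever cache served the page.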

chaals · 2017-10-10

Note that there was a workshop held in China recently by some companies trying to work out how to do this.

triblondon · 2017-10-16

Hey @chaals. When you say ‘profiled HTML’ are we talking about web subsetting, essentially? Most of the technologies you list are either subsets or more limited alternatives to HTML. Subsets can be tempting for a number of reasons: not just perf and sec, but also to reduce implementation complexity or dependency requirements on cheap devices.

On the TAG, we’ve looked at this and found that subsets cause far more problems than they solve. We wrote a finding on the evergreen web which advocated that web content should be able to use the entire platform, and that the whole platform should evolve as one.

Regarding Google AMP, the motivation of that project is twofold: to provide a means for developers to create fast-loading pages, and to allow search engines to pre-render results in a safe and privacy-respecting way. While AMP achieves these objectives, its integration in Google search incentivises a use that is contrary to the fundamentals of web architecture, and prompted the TAG finding on distributed content. An alternative which would achieve the same objectives in a more interoperable and fair way would be:

Don’t prefer or distinguish content based on a format subset. Instead, set a specific benchmark, e.g. we want to see content with a speedindex of <2000. Authors can use any technique within the web platform to achieve that.
Don’t advertise content at non-canonical URLs. Link directly to the canonical source. Promotable iframes might make that a smoother experience.
Don’t proxy content via a non-canonical URL in order to hide the delivery of the content from the author. Web packaging seems the ultimate solution to this, though it may be some way off.
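As a rough sketch of the first point, eligibility could depend only on a measured metric and never on the format. The Page type, its field names, and the sample values are hypothetical; the 2000 threshold is the example figure given above, and how the speed index gets measured is left open.

```python
from dataclasses import dataclass

SPEED_INDEX_LIMIT = 2000  # example benchmark from the list above


@dataclass
class Page:
    url: str
    speed_index: float  # assumed to come from some external measurement
    is_amp: bool        # format flag, deliberately ignored below


def eligible(page: Page, limit: float = SPEED_INDEX_LIMIT) -> bool:
    """Decide on measured performance alone, whatever the page format."""
    return page.speed_index < limit


fast_plain = Page("https://a.example/story", 1500, is_amp=False)
slow_amp = Page("https://b.example/amp/story", 2600, is_amp=True)
print(eligible(fast_plain), eligible(slow_amp))  # prints True False
```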