Extending HTML As a Hypermedia

Problem

HTML has advanced in many ways over the last two decades, but it hasn’t grown much as a hypermedia. In particular, anchors and forms are still the primary way to drive network activity via user interactions in pure HTML-based applications.

Due to these limitations, javascript has come to the fore as the premier technology for building advanced web applications. Unfortunately, as a consequence, much of the original power and simplicity of the web is being lost. In particular, Hypermedia is no longer The Engine of Application State. Rather, application state is often maintained in in-memory javascript stores and the DOM is reactively updated based on that state.

A Potential Way Forward

I have been working on an open source javascript library for about a decade now that tries to address the shortcomings of HTML while still remaining firmly within the original, REST-ful model of the web. It began as intercooler.js and last summer was renamed to htmx, after I broke the dependency on jQuery.

The core idea is to use extended HTML attributes to drive server interactions, and allow for the replacement of elements within the DOM, rather than whole-page refreshes, based on HTML content returned from the server. This very much satisfies the REST constraints, and extends HTML as a hypertext (hence htmx).

htmx demonstrates that, with a few declarative HTML attributes, you can express many common UX patterns that would normally be done with javascript.

The most significant attributes are:

  • hx-get, etc. - specifies the URL to issue a request to
  • hx-trigger - specifies the event that triggers a request
  • hx-target - specifies the element to update in the DOM
  • hx-swap - specifies the manner in which the returned HTML content should be swapped into the DOM
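
As a sketch of how these attributes combine (adapted loosely from the htmx documentation; the endpoint and ids here are my own invention):

```html
<!-- Typing in the input issues a GET to /search after a short delay
     and swaps the returned HTML into the #results div. -->
<input type="text" name="q"
       hx-get="/search"
       hx-trigger="keyup changed delay:500ms"
       hx-target="#results"
       hx-swap="innerHTML">
<div id="results"></div>
```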

Here is an example that demonstrates the “Live Search” pattern, using only a few attributes:

https://htmx.org/examples/active-search/

Moving HTML Forward as a Hypertext

It seems to me that much of the functionality of htmx could, and perhaps should, simply be part of normal HTML. There is no reason a hypermedia should have to replace the entire document on navigation, and there is no reason why only clicks and submits should be able to drive network interactions.

The particulars of the htmx attributes don’t matter nearly as much as the general idea of revisiting HTML qua hypermedia and thinking about how we can drive it, and therefore hypermedia-based application development, forward.

10 Likes

Some concrete web platform features that would further this vision, listed here in order of far-fetchedness:

  • Allow arbitrary HTTP methods for forms, not just GET and POST
  • Allow specifying events other than “click” for activating links and buttons
    • perhaps specify a different href for different events
  • Allow any element, not just iframes, to be a target
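
To make that concrete, a hypothetical extension of existing form and anchor syntax might look something like this (none of it is valid HTML today; the attribute names are purely illustrative):

```html
<!-- Bullet 1: arbitrary HTTP methods on forms -->
<form method="DELETE" action="/account">
  <button>Delete my account</button>
</form>

<!-- Bullets 2 and 3: a non-click trigger, targeting an arbitrary
     element (here a div by id) rather than an iframe -->
<a href="/preview" on="mouseenter" target="#preview-pane">Hover to preview</a>
<div id="preview-pane"></div>
```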
6 Likes

I like this concept a lot. My only issue is with grammar :nerd_face: The phrase “HTML as a hypermedia” hurts my brain, like fingernails on a chalkboard, because “media” is a plural noun. How about “HTML as hypermedia”? (“a hypermedium” would work, except I’ve never seen anyone use that word before…)

2 Likes

I really like the idea of HTMX and the other similar libraries, like behavior.js, jQuery Unobtrusive Ajax, unpoly, and Hotwire (Turbo).

I wanted to make it simpler to work with, so I built my own addition to this family of libraries (not as full-featured as the others) called HTMF, with forms at its heart, so it can work without JS but be progressively enhanced with JS. One major change from the way HTMX (and native HTML targets) works is that it relies on the returned HTML snippets to determine where the new snippets should be added, based on their id attributes. I found this to be simpler and easier to work with.
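
If it helps clarify, the id-matching swap described here could be sketched roughly like this (this is not HTMF’s actual code; the function name is mine):

```javascript
// Parse an HTML response and, for each top-level element that has an id,
// replace the element with the same id in the current document.
function swapById(responseHtml) {
  const doc = new DOMParser().parseFromString(responseHtml, "text/html");
  // Copy the collection first: replaceWith() moves nodes out of `doc`,
  // which would otherwise mutate the live children list mid-iteration.
  for (const incoming of [...doc.body.children]) {
    if (!incoming.id) continue;
    const existing = document.getElementById(incoming.id);
    if (existing) existing.replaceWith(incoming);
  }
}
```

The appeal of this design is that the server response itself declares where it goes, so no target attributes are needed on the requesting side.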

I’ve only been programming like this for a short period of time, but I really like this pattern, as it is much simpler. So it would be a great addition to the HTML spec to have something native to the browser, rather than having to roll our own libraries.

One thing I would also like to experiment with is not having to pass the state up and down, but instead using HTML to keep the state in the browser.

2 Likes

We all have our gripes with the trifecta of the Web Platform. The path to addressing what we can is not to throw a hundred darts at a dartboard. That will only lead to mass injury and property damage. It also just won’t be taken seriously.

When asking for changes, try to keep the scope as minimal as possible. Show a use-case where the web doesn’t work, or is overly convoluted, and find a way to iterate on that. This kind of proposal is just too vague and open-ended to be resolvable.

It is way easier to get the community to rally behind an isolated solution to a problem. That in turn fires up user agent vendors to implement those ideas. When the attempt is made to “solve everything in the browser” no one has the attention span for it. That is why user-land code exists. It provides the space to prove these ideas (as you are trying with htmx) and get wide community adoption. Then, as jQuery did to some degree before, the best and most useful ideas can be absorbed into the standards.

Further, there is no context as to how this aids end users, only marginal gains in the developer experience. While valuable in its own right, developer experience is generally outweighed by end user needs.


From one example of HTMX:

<div hx-get="/clicked" hx-trigger="click">Click Me</div> from the hx-trigger doc page linked to.

I’d just like to ask what is wrong with <a href="/clicked">Click me</a>?

Let’s stop re-inventing the wheel in ways that are worse for users. This htmx stuff may be beneficial in some areas, but clearly can also introduce dramatically bad accessibility and usability.

The circumstances around each alternative may not be so generally trivialized. Hence why JavaScript exists to supplement the natural DOM capabilities.

Portals is one attempt to try and solve this type of problem. Perhaps you should support it, help test and refine where possible, and champion it in the community?

This can be said by any library/framework developer. Everyone thinks their method of abstraction is so great everyone should just have it.

Was it ever? HTML is by definition stateless. The content is sent to the browser, engines convert it into the DOM, then trash the HTML content. As far as I am aware state has never truly existed in markup. If this understanding is incorrect, please show where it does happen.

This is a whole separate discussion on its own. Good concise topic. Probably best to make a thread specifically for it.

Ok, this is at least a small piece as well. I’d immediately question why it is needed at all since click also is triggered by keyboard events for accessibility. Regardless, this also merits its own isolated discussion.

Another good concise discussion point. I don’t know what you mean by being a target of something. That needs some elaboration. Probably make a thread discussing this in particular to and show an example of why it is needed and how it is useful.

Thank you, this was intended to spark a broad discussion on html as hypermedia, I will try to create more specific topics if there is enough interest in the idea.

Nothing at all, that’s just a conceptual introduction to the htmx library. I would refer to the live search example as a more serious example of what could be accomplished within a hypertext that has more expressive power, and that does add value for the end user (as google adopting this UX pattern for search demonstrates)

But this is different: this is my library… :slight_smile: htmx is different than most other javascript libraries in that it attempts to extend HTML purely as hypermedia, and thus stays within the original REST-ful network model, rather than requiring scripting and the increasingly popular client-server model that has taken over.

I refer you to section 5.1.5 of Roy Fielding’s dissertation on the web architecture.

REST is defined by four interface constraints: identification of resources; manipulation of resources through representations; self-descriptive messages; and, hypermedia as the engine of application state.

Hypermedia as The Engine of Application State (HATEOAS) is a core component of the Uniform Interface feature of REST. The more state is managed outside of HTML on the client side, the further we move away from this architectural style.

By “target” he means having an HTML HTTP response replace an element (i.e. target an element) within the DOM, rather than the entire DOM. Again, I think the live search example shows how this can be beneficial for end users by providing a better user experience when compared to a full-page refresh, while still staying within the original REST-ful model of the web.

Thank you for your feedback, I will consider it and try to come up with some more specific suggestions as time allows.

:beers:

1 Like

Google (and many other orgs) can adopt this right now with no new technology. So what for the end user is superior to what we have now? That’s what needs to be answered.

This does not answer the question of where HTML holds state. HTML is a language to purely transfer a text document structure from a server to a client. It is stateless (although a server may inject pieces of state into the content.) The technologies around it, such as HTTP (i.e. REST is applied on top of which you linked to), manage state. HTML itself however does not hold any in a browser.

I do have an idea in my mind that’s somewhat related to this thread because it could be a possible primitive that allows someone to implement something like HTMX on top of it.

What I think is missing in the set of web APIs is basically a way to attach some JS code to a group of elements. By that I mean writing some code that runs automatically for every element of a specific type that a) is present in the DOM or b) will be present in the DOM at any time in the future.

The idea is very similar to WebComponents, but unlike those, my primitive would work on existing non-custom elements. As an example: this idea would allow me to write a function that gets executed for every element with a class of read-more attached to it (e.g. <div class="read-more">).

Currently, I can select all instances of those elements by using document.getElementsByClassName() and run a script on them, but this does not run my code for instances of those elements that get added to the DOM later on. The solution to that would be to leverage the MutationObserver, but its API is very low-level, so it is difficult to implement this on top of it. (I know that because I’ve tried to use it to turn my idea into a library.)

Where is this needed? Potentially on every web site that gets rendered a) on the server (by CMSs or server frameworks) or b) statically (plain HTML, static site generators).

Why do I think this is needed? It basically allows for progressive enhancement, i.e. deliver basic functionality that does not require JavaScript (users are still able to read content or interact with a web app using server-side actions, forms etc.), then add more fancy features using JavaScript if possible. (I actually don’t know if WebComponents are well suited for this.)

What are the use-cases? This approach could be used to write basically all UI-related code of a website (image sliders, slide-in navigations, lightbox galleries, …). One could attach a script to input fields to do something like masking or custom validation. You can think of these as native, jQuery-plug-in-like constructs, but more robust, because they initialize automatically. And all you need to do to enable or disable the code behind a feature is to add or remove a class on an element. Those features could also very easily be shared with others as libraries. Using this API, library authors don’t have to take care of automatic initialization individually, which also reduces the size of such libraries and the work library users have to do.

Actually, I even got inspired to this by other technology, mainly game engines like Unity and Godot. There you can place nodes/entities (which would represent HTML) on your canvas, then write and attach scripts/controllers (which would represent JS) to some of your nodes.

A minimal API could look like this:

// Register some logic for any element with the class `read-more`.
// I don't know what I should call them aside from "controllers".
registerController('read-more', {
  onMount (element) {
    // Runs when an element `.read-more` gets attached to the DOM.
    // `element` is a reference to the current element.
  },
  onUnmount (element) {
    // Runs when an element `.read-more` gets removed from the DOM.
    // Here you can clean up resources.
  }
})

The first parameter could also be a css selector. That way developers would be more flexible.

It might also be useful to observe changes on any of the element’s attributes. But I don’t know if this should be part of this API. Something like Element.prototype.observeAttribute() could be handy here.
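
To show the proposed API is implementable today, here is a rough MutationObserver-based sketch of registerController() (names as proposed above; this is a minimal sketch, not a complete or efficient implementation):

```javascript
// Registered controllers: { selector, onMount, onUnmount }.
const controllers = [];

function registerController(selector, { onMount, onUnmount }) {
  controllers.push({ selector, onMount, onUnmount });
  // Run onMount for matching elements already in the DOM.
  document.querySelectorAll(selector).forEach(el => onMount?.(el));
}

// Watch the whole document for elements being added or removed.
const observer = new MutationObserver(mutations => {
  for (const { addedNodes, removedNodes } of mutations) {
    for (const { selector, onMount, onUnmount } of controllers) {
      for (const node of addedNodes) {
        if (node.nodeType !== Node.ELEMENT_NODE) continue;
        if (node.matches(selector)) onMount?.(node);
        // An added subtree may contain matching descendants too.
        node.querySelectorAll(selector).forEach(el => onMount?.(el));
      }
      for (const node of removedNodes) {
        if (node.nodeType !== Node.ELEMENT_NODE) continue;
        if (node.matches(selector)) onUnmount?.(node);
        node.querySelectorAll(selector).forEach(el => onUnmount?.(el));
      }
    }
  }
});
observer.observe(document.documentElement, { childList: true, subtree: true });
```

As noted in the thread, doing this correctly (e.g. not double-mounting, handling moves) gets fiddly fast, which is exactly the argument for a native primitive.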

Differences to WebComponents would be:

  • Logic applies to existing elements, which could also be more complex ones like form, input, iframe, img or even WebComponents, and you can access every element directly from within the API’s callbacks.
  • You can attach more than one piece of logic to one element. You need to add one custom element to the DOM per piece of logic with WebComponents, which makes the HTML more complex.

Are there any thoughts on my idea?