Exposing a user's input modality


But if you read the piece you’ll note:

Browsers have been experimenting with variations of this since at least IE7. Implementations vary a bit, and browsers are still trying to strike the right balances, but overall the idea is consistent and works well: A couple of billion mostly unaware folks using Web browsers for the last 8+ years have proven it out.

In other words – the web actually does work this way “out of the box”, and it works that way because of years and years of user feedback and testing. I’m not sure how you could make the statement that this is encouraging apps, users, or anyone else to live in one mindset; this is about responding to the fact that there isn’t one. The issue is that: a) “out of the box” styles frequently don’t work with designs. A native blue outline works fine on a white background, but maybe your site has things that rest on a blue background, where it doesn’t. The Web today gives you levers and switches to tweak this, but all of the lever options are nuclear - they break the “out of the box” behavior. b) There’s no good plan for things that don’t already come in the box - if you make a new component, or, in my brightly envisioned future, you import and use non-standard but competing ‘slang’, there’s no good, reasonably simple way to get them to play nice.

The proof here is that lots and lots of pretty web-smart folks have tried (and often failed) to solve this problem independently; have a look at the twitters and articles like Marcy Sutton’s Button Focus Hell. These are not new authors who are missing a simple thing about the Web; this problem is frequently evident in things built by the frameworks and libraries that form the foundations of the Web today.


I’m suggesting simply “lowest practical level” - definitely not that we need a new primitive, but rather that we explain an appropriately higher-level thing atop existing primitives. Pseudo-classes are, generally speaking, state machines wrapped around paired events (:focus lasts from the focus until the blur, for example). Media Queries are similar but generally more tied to an algorithm by which you enter and leave the state. It feels to me like the latter is more what we’re doing, and it is also more flexible (you can, for example, use matchMedia and mediaQueryListener to provide further enhancements this way) and actually simpler in many respects, both implementation-wise and use-wise. Implementation-wise, if you want to abstract up a simple layer and create a pseudo-class (let’s call it :keyfocus), we need new events, and all kinds of stuff about bubbling and cancelling and observer perf and so on might get more complicated. Use-wise, note that if this works (assuming it is defining the algorithm as I said), it becomes possible to write global rules, which is often way, way better than trying to somehow match the complexities of what browsers do on each element. If we come up with a new element which is an interactive control, or even if your site has just never used a particular interactive element that exists today, and then someone uses it in your app where you’re targeting focus ring branding based on instructions for specific elements, this new one will look different. Things like branding are what rules-based systems like CSS are actually really good at dealing with in forward-compatible, generic ways. If it were a pseudo-class, though, what would it look like to add a global rule, and what would that mean for bubbling and so on?
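To make the contrast a bit more concrete, here is a minimal sketch of the "media query" shape: a single global state that some algorithm enters and leaves, with change listeners, loosely in the spirit of matchMedia. The names `Modality`, `ModalityState`, `matches`, `addListener` and `set` are hypothetical and chosen for illustration; nothing here is taken from the proposal or the prollyfill.

```typescript
// Hypothetical sketch: modality modelled as one global, observable state,
// rather than as per-element state derived from paired events.
type Modality = "keyboard" | "other";
type ModalityListener = (current: Modality) => void;

class ModalityState {
  private current: Modality = "other";
  private listeners: ModalityListener[] = [];

  // Loosely analogous to checking whether a media query currently matches.
  matches(query: Modality): boolean {
    return this.current === query;
  }

  // Loosely analogous to listening for a media query changing.
  addListener(listener: ModalityListener): void {
    this.listeners.push(listener);
  }

  // Called by whatever algorithm decides the modality.
  // Listeners are notified only when the value actually changes.
  set(next: Modality): void {
    if (next === this.current) {
      return;
    }
    this.current = next;
    for (const listener of this.listeners) {
      listener(next);
    }
  }
}
```

Because the state is global, a single rule or listener covers every element, including ones the author never anticipated, which is the point being made about global rules above.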


I’m not sure I follow - which events do you mean, and how?

Gotcha. Obviously if the modality isn’t being used it’s not a problem, but definitely worth thinking about the mixed modality case. I think we can solve this particular case already in the current framework by allowing you to specify a mixed (as opposed to context-dependent) modality on the containing element, just as you specify a keyboard modality on a custom text input.

Yep, exactly.

Agreed. Hopefully we can get a bigger win from having the UA style sheet only apply the focus ring for keyboard modality, so people are somewhat less incentivised to roll their own focus styles.

In the sense that it could equally be used to refer to things like orientation, device etc?

Excellent question :smile: My instinct is no, but that’s definitely something we need to figure out.


This all makes total sense to me. Personally, I’d prefer the name input-modality/input-mode, since as mentioned, modality is slightly vague.

@tabatkins: Thoughts about adding something like that to the MQ spec?

With my Blink hat on: @aboxhall, if you’re interested in playing around with an implementation in Blink, I’d be happy to assist with design and reviews of the MQ parts.


I am more than fine with input-modality - tbh we’ve used that in discussion, in the title of the post, and I think even in the text itself at one point. Honestly, I think we were just trying to save characters in actually expressing it, but it doesn’t matter to me if we think it’s worth the extra 6 chars to be clear about it.


My only concern with input-modality is that it sounds to me like it’s input in the sense of <input> (i.e. data input), whereas the concept we’re shooting for is more like “interaction”. I do agree that modality alone is vague, though.


One concern I have with the current proposal is that it appears to rely on the premise that the focus ring will be prevented by default, as per all your examples using the primary :focus { outline: none; }.

Unless you go the expensive route of polyfilling the modality MQ, that would seem to lack backward compatibility with older browsers, which only understand :focus { outline: none; }?

I am thinking that perhaps a ‘non-keyboard’ input modality approach might be better, at least at first, so that it doesn’t further promote the idea of preventing the focus ring globally by default.

Should design flaws, implementation issues or whatnot arise, it somehow feels like a backwards solution, in that allowing a global :focus { outline: none; } is one of the problems we are trying to avoid…


Yeah, didn’t think of that aspect. We could further bikeshed this later on :smile:


Good point. I wonder if we should enable both keyboard and pointer modalities.

@briankardell and @aboxhall: Do you have any other use cases in mind other than the one outlined in the article? (which AFAICT either keyboard or pointer can address, and pointer seems to be better in terms of progressive enhancement)


One important issue, as @patrick_h_lauke mentioned, is that a touch explore or sequential focus doesn’t really fall in the pointer or keyboard context. This problem is further exacerbated by the fact that none of the assistive technologies follow the same touchscreen event sequence at the moment. It’s nearly impossible to deal with because of that.

On “touch explore”, some browsers like Firefox fire :hover but not :focus, some fire :focus, some do neither… And on top of that, they create their own fake outline outside of CSS presentation scope. Honestly, as a dev, I sometimes don’t know where my accessibility responsibility stands, and whether I am supposed to show a focus ring or not. Without a predictable environment and a properly defined sequence of events, with guarantees that it won’t change, it’s a crapshoot. And the prospect of a double focus ring looks terrible and redundant.

Just a quick late night brain dump, but perhaps what we need in addition here is a way to create a bridge of understanding between the forced outlines that AT or the device environment applies and our CSS :focus styling.

In other words, we could have pseudo-class and/or MQ modalities such as assisted-focus or hover-intent, with semantics like:

button:focus:not(:assisted-focus){} /* I am handling the focus ring */


button:focus:assisted-focus { outline: none; } /* no double focus outlines */

and a less input-centric modality approach like:

@media (modality: hover-intent) {

This could perhaps offer better tooling to deal with both touch simulated :hover(s) and :focus.


(note, that was actually @hexalys, not @yoavweiss, it’s just discourse quote weirdness I can’t seem to fix)


  1. I don’t feel like it’s actually “expensive”; our polyfill is all of 2.5k before minification or gzip… It’s all of 69 lines and like 10 of those are whitespace :wink: If that’s expensive, it’s a pretty high bar for calling something inexpensive. The CSS WG is also putting together custom MQs, which would make it smaller still if that beats us to being in widespread deployment.
  2. Currently if you don’t include the prollyfill, it works as it always has - if you do, it’s specifically because you want to handle things better. Note the prollyfill does a :not() on keyboard here, not just a blind rule, just like in the article. If natively implemented, I would expect UA sheets to add the appropriate MQs to do the right things by default (they already do this, it’s just not via MQ), and then the point seems to be moot.
  3. Any author styles can already do (and frequently do) :focus { outline: none; } - at least now they can use an MQ to say “keyboard” or “not keyboard” and old browsers would ignore them.

In other words, I don’t really think this is an issue - it feels like a red herring.


I agree @patrick_h_lauke had interesting thoughts about whether we could abstract it away from being keyboard, I’m hoping he shares them here. I’m not entirely sure that I agree or disagree - in fact - I kind of wish we had something like attribute selector partial matching here so we could go from less to more specific or something. I do think there is value in treating similar things similarly at some level, and I also think there is value in exposing as much as we can so that the community can actually help figure out how to adapt as new things arise.

Specifically with regard to assisted-focus and hover-intent I definitely don’t understand their meaning/proposal… Explain?


I agree on point 1. Sorry if I wasn’t specific enough. What I meant by “expensive” was a true polyfill (along the lines of an Ajax call re-parsing the CSS). I wasn’t applying that argument to the prollyfill. Perhaps because I am not particularly in love with the approach and the use of a custom non-valid attribute…

I guess I am just arguing in favor of a principle that leaves the default focus alone by default, and addresses the cases where I need to remove the focus ring on a case-by-case basis. Because once you do that for modality: keyboard, it becomes part of your fundamental CSS approach for everything. I don’t really like that. But it’s not a major concern. I can always come up with an alternative prollyfill reversing that approach.

PS: I’ll definitely explain the assisted-focus and hover-intent suggestion in depth when I get a chance, in a few days or within a week’s time. I feel I am on a good track with the idea, but it’s going to take a very long description to explain why it’s needed, what it solves and how it would work. I need to sit on it for a bit, and think about half a dozen use cases to see if that concept can hold up.


We did this with lower fidelity specifically for a few reasons: 1) we’re not far enough along to seriously assume that what we have here could make the jump from prollyfill (speculative) to polyfill without feedback and changes 2) We can illustrate the usefulness in a practical way with a very small amount of code and allow developers to actually use it in production apps - this is really important if we want participation and feedback. 3) upcoming developments of custom mq’s would make high fidelity of whatever design is ultimately settled on similarly small.

With regard to the non-valid attribute, would it help if we made it a data- attribute? I believe we actually did this at one point in the evolution, I’m not entirely sure why we didn’t at least dasherize it, which would at least serve the same purpose and virtually guarantee it was safe. That’s perfectly valid feedback for the prollyfill I think, though you and I appear to be some of the only ones actually worried about this - it’s a pretty common thing actually. If people think that is an issue we should change, I’m happy to adapt it - also, it’s a github project so you can open issues or send simple technically minded pulls about the prollyfill itself.


I’ve done a similar “experiment” called focus-source for pretty much the reasons outlined in the article - by which I’m agreeing that there is a real need for this.

If I understood this correctly, the modality (or input-modality or interaction-modality) MediaQuery is supposed to change immediately whenever I change my mode of input (say, type something with my keyboard “keyboard”, then click something with my mouse “pointer”). That means that a focus-style would not be “sticky” and that may lead to confusing UI.

In my focus-source experiment I identified 3 modalities (I called it “interaction-type”): keyboard, pointer and script. The reason for “pointer” was - as has been pointed out by hexalys - that :focus needs to remain the default style for backward compatibility reasons. My applications usually focus a container/wrapper to act as a sequential focus navigation starting point, so I added “script” to allow the CSS to identify “focus set by application”.
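As a rough illustration of how the three interaction types could be told apart, here is a minimal sketch. It assumes that a focus change counts as "script" whenever no keyboard or pointer input happened shortly before it. The names `InteractionType`, `InputRecord` and `classifyFocusSource`, and the 50 ms window, are hypothetical and are not taken from the focus-source code itself.

```typescript
// Hypothetical sketch: attribute a focus change to the input that caused it.
type InteractionType = "keyboard" | "pointer" | "script";

interface InputRecord {
  kind: "keyboard" | "pointer";
  time: number; // timestamp in milliseconds
}

// Assumption: focus that does not closely follow a user input event
// was set by the application, so it is reported as "script".
function classifyFocusSource(
  lastInput: InputRecord | null,
  focusTime: number,
  windowMs: number = 50 // arbitrary illustrative threshold
): InteractionType {
  if (lastInput === null) {
    return "script";
  }
  const elapsed = focusTime - lastInput.time;
  if (elapsed < 0 || elapsed > windowMs) {
    return "script";
  }
  return lastInput.kind;
}
```

A real implementation would have to deal with cases this sketch ignores, such as focus moved by script inside a keyboard or pointer event handler.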

Hexalys hinted at assisted-focus to provide the information if some non-document-accessible styling was being applied by an AT. I think that could/should be what :-moz-focusring is about.

I don’t like the MediaQuery idea very much. I’d have preferred a simple pseudo-class :keyboard-focus (that does not change while an element has focus). But the MQ would play nicely with :focus-within, where a simple pseudo-class would not.
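The difference between a "sticky" pseudo-class and a live media query can be sketched as follows. This is a minimal model under stated assumptions, not either proposal's actual behavior: `StickyFocusTracker` and its methods are hypothetical names, and the model tracks only a single focused element.

```typescript
// Hypothetical sketch: compare a live modality check with a sticky one
// that is fixed at the moment an element receives focus.
type InputKind = "keyboard" | "pointer";

class StickyFocusTracker {
  private lastInput: InputKind = "pointer";
  private focusedWith: InputKind | null = null;

  // Record the most recent kind of user input.
  input(kind: InputKind): void {
    this.lastInput = kind;
  }

  // An element gains focus; remember which input kind was current.
  focus(): void {
    this.focusedWith = this.lastInput;
  }

  // The element loses focus.
  blur(): void {
    this.focusedWith = null;
  }

  // Live: follows the current modality, so it can flip while focused.
  isLiveKeyboardFocus(): boolean {
    return this.focusedWith !== null && this.lastInput === "keyboard";
  }

  // Sticky: decided at focus time and unchanged until blur.
  isStickyKeyboardFocus(): boolean {
    return this.focusedWith === "keyboard";
  }
}
```

In this model, tabbing to an element and then moving to the mouse leaves the sticky check true while the live check turns false, which is the kind of change the "confusing UI" concern above is about.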

I’m not very fond of the idea to define “keyboard-relevant input elements” (as the polyfill shows) - considering custom elements. The supports-modality="keyboard" attribute is technically not necessary, as anyone could achieve the same effect with current CSS functionality. Also using an attribute like that is limiting to single values - what happens when this proposal is extended to support voice input? Do we want to foster multi-value attributes like supports-modality="keyboard voice"?

I had trouble following the proposal and discussion because I was centered on altering :focus styles, rather than “generically dealing with input modes”. I think the former - influencing focus styles - is a problem with a simple enough solution. Dealing with “interaction modality” in terms of, say, an FPS (First Person Shooter), is quite a different beast. Is there only one mode, or can keyboard and pointer be used simultaneously? Is the mode constantly switching back and forth between keyboard and pointer if I use the mouse to aim, but the space key to shoot? Is this even relevant to 95% of web sites/applications? I have all sorts of problems wrapping my head around this.


I actually think the focus ring is the special case in a primarily pointer-driven UI - however, as others have pointed out, it’s dangerous to assume that it only applies to keyboards, since other devices (e.g. a D-pad) also need a concept of focus.


Have another look at the proposal and maybe the prollyfill - play with the demo: Modality is determined algorithmically - currently the proposal only spells out the one for keyboard (and, logically, “not keyboard”), but effectively it is based on what just happened and what you are very, very likely to do next because of where it happened. For example, even if you click on an <input>, the modality becomes keyboard, because the only way to interact with it is by sending keys to it.
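A minimal sketch of that idea, assuming the decision depends only on the kind of event and on whether its target can only be operated by typing. The names `InteractionEvent`, `targetExpectsKeys` and `nextModality` are hypothetical; the prollyfill's real algorithm may differ in its details.

```typescript
// Hypothetical sketch: decide the modality from what just happened
// and from where it happened.
type Modality = "keyboard" | "other";

interface InteractionEvent {
  kind: "keydown" | "pointerdown";
  // True when the target can only be operated by typing,
  // for example a text <input>.
  targetExpectsKeys: boolean;
}

function nextModality(event: InteractionEvent): Modality {
  // Key presses always imply keyboard modality.
  if (event.kind === "keydown") {
    return "keyboard";
  }
  // A pointer press on a typing-only control still implies keyboard
  // modality, because the next interaction will be typing.
  return event.targetExpectsKeys ? "keyboard" : "other";
}
```

Under this sketch, clicking a button yields "other", while clicking a text field yields "keyboard", matching the <input> example above.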


Perhaps the modalities should be “pointer” and “focus” (with the contrast being that “pointer” modality does not use an element-centric “focus” model)? Or perhaps “focus: element” and “focus: ambient” (the latter referring to pointer-like interaction models where focus is not strongly tracked)?

I think this is actually a good point: this topic should primarily be about focus modality, since that’s the actual use case it’s trying to speak to - bigger ideas like general input modality are a confusion of concerns, and liable to gear-up problems in short-sighted assumptions. (Example: look at what happened when the iPhone came out, and every site that saw a “mobile” UA assumed “mobile” meant “low capability”, so the iPhone had to hide its mobile status as much as possible, outright ignoring styles that had been defined for “mobile” modalities.)

Also, this opens the door (at least in terms of discussion) for more granular notions of focus for input modalities that are better at gauging intent, like eye/hand tracking.

Additional use cases for InputDeviceCapabilities

Also, one reason leading me to the conclusion that the focus ring isn’t the special case is consistency, because we can’t technically reproduce the default focus ring for sure. Even if the color is faithfully reproduced via -webkit-focus-ring-color or with browser-specific rules, that browser or user default may change.

A custom focus ring, possibly styled differently on every site, especially for non-input elements, sounds like a very inconsistent and possibly annoying experience for “keyboard”-only users. But I am open to being shown otherwise by an accessibility user study or more data on this…


I think this is an interesting direction, although potentially focus isn’t the best name since it’d be easy to confuse it with the existing meaning of :focus (imagine a discussion talking about the :focus pseudo-selector in the context of the focus modality - it all gets a little Who’s On First).

We actually discussed the notion of simply not matching :focus unless we gauged (via the mechanisms discussed in the article) that it was likely to be useful; however, we felt that this would unfortunately potentially break things. (This would honestly still be my ideal world scenario, I think.)