Exposing a user's input modality


Would make sense. Currently, outline: none is often (undesirably, but quite widely) used to stop the outline appearing on clicks, which also affects keyboard navigation and makes the latter much less usable.


This is a great idea, and we absolutely need a solution to this. I often find myself calling .blur() after a click to avoid “sticky” buttons, and I know this is harmful to a11y, but the alternative is so complicated.

This could also be a pseudo-class such as :tab-focus, which would be a shorter way to represent key-focused elements (using tab to be consistent with tabindex).

Talking to Brian, the reason for using a media query is for JS events when modality changes, although I think the media query is misleading. If I stop typing and put my hand on the mouse, my modality is technically “mouse”, but there’s no way to detect that. If I play a FPS, what would the mouse do? If I hold command and click, what’s the modality?

In terms of events, what does the media query switching give you over keydown, mousedown & touchstart?
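For comparison, the event-only approach this question alludes to might look roughly like the sketch below. It is only an illustration: `modalityFor`, `trackModality` and the `data-modality` attribute are made-up names, not part of the proposal or the prollyfill.

```javascript
// Hypothetical mapping from the three event types named above to a modality label.
const EVENT_TO_MODALITY = {
  keydown: 'keyboard',
  mousedown: 'mouse',
  touchstart: 'touch',
};

// Return the modality implied by an event type, or keep the current one
// when the event type is not one we track.
function modalityFor(eventType, current) {
  return Object.prototype.hasOwnProperty.call(EVENT_TO_MODALITY, eventType)
    ? EVENT_TO_MODALITY[eventType]
    : current;
}

// Browser wiring. Assumption: the result is mirrored onto <html> as a
// data-modality attribute so that CSS can select on it.
function trackModality(doc) {
  let current = 'unknown';
  Object.keys(EVENT_TO_MODALITY).forEach((type) => {
    doc.addEventListener(type, (event) => {
      current = modalityFor(event.type, current);
      doc.documentElement.setAttribute('data-modality', current);
    }, true); // capture phase: runs before handlers on descendant elements
  });
}

if (typeof document !== 'undefined') {
  trackModality(document);
}
```

Every page (or library) would have to carry something like this itself, which is the part a media query would presumably take over.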


It’s true I like the idea of a MQ here because the focus ring is not the only case where I think this is useful as a concept, but the same could be achieved with a real event that the pseudo-class was associated with (just like :focus). What I’m trying to drive at here is: what is the lowest practical level at which we can define this before we start getting too crazy (extensible-web-wise)?

You’ll note though that our proposal is actually quite limited on purpose - technically it only proposes a mechanism, with keyboard as the one modality that can be defined now - in part because we have like a decade’s worth of data on this bit… Over time, if we have more, there’s room for it to grow. In your example above, if you stop typing and move the mouse you’re not (in spec terms) interacting (interactive elements get focus), so by the definition we’ve provided (in the prollyfill) your modality would remain keyboard until you cause a focus or blur by interacting.
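One possible reading of that rule, sketched as a tiny state holder. This is an assumption about the behaviour described above, not the prollyfill's actual code, and every name in it is illustrative.

```javascript
// Sketch of "modality only changes once an interaction moves focus".
function createModalityState(initial) {
  let committed = initial; // what styles would see
  let pending = initial;   // most recent input device observed

  return {
    // Call on keydown / mousedown / touchstart with 'keyboard', 'mouse' or 'touch'.
    noteInput(device) {
      pending = device;
    },
    // Call on focus or blur: only now does the visible modality change.
    noteFocusChange() {
      committed = pending;
      return committed;
    },
    get modality() {
      return committed;
    },
  };
}
```

Simply moving the mouse never reaches `noteInput` here, and even a mousedown only takes effect once it causes a focus or blur, so the modality stays "keyboard" in the hand-on-the-mouse case.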


To expand on Brian’s response:

  • As Brian said, having your hand on the mouse is not using the mouse, and it doesn’t seem confusing to delay changing the modality until you are. Going in the other direction, we found it pretty usable to need to move focus once using the keyboard in order for it to show up.

  • I can’t imagine a FPS is going to have many focusable elements, though you could use this to do something like show different targeting mechanisms depending on which modality was being used, I guess? (The tricky thing would be detecting a “mixed” modality where you’re using something like WASD + mouse, but I think that’s a question for later on down the line.)

  • Good question on cmd+click - I don’t think we consider modifier keys to be keyboard interaction per se, so I’d expect that to be “mouse”.
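The cmd+click point could be expressed as a small filter on keydown events. This is a sketch under the assumption that only the four common modifier keys are excluded; `countsAsKeyboardInput` is a hypothetical helper name.

```javascript
// KeyboardEvent.key values for the common modifier keys.
const MODIFIER_KEYS = new Set(['Shift', 'Control', 'Alt', 'Meta']);

// True when a keydown should count as keyboard interaction, i.e. it is not
// just a modifier being held down (for example Cmd before a click).
function countsAsKeyboardInput(event) {
  return !MODIFIER_KEYS.has(event.key);
}
```

With a filter like this, holding Cmd and clicking would leave the modality to be decided by the click alone.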


For interaction combining mouse and keyboard, note that we have the css :hover pseudo class at our disposal, to remove the focus ring when the mouse hovers on the document.

I was recently trying to find a method preventing focus outlines on Android Phone without affecting AT or keyboard users. I have come up with this slightly crazy, yet functional approach, using any-hover/any-pointer MQ classes:

html:hover:not(.hover):not(.fine).touch.coarse a:focus.ma, a:active, a:hover {
  outline: 0; /* Normalize readability on direct focus and hover, or Android touch only */
}

The above means that the context is a touch enabled device (verified by feature detect), with a coarse input (likely touch), no fine inputs (no mouse) and no any-hover: hover (meaning no inputs that hover directly whether coarse or fine). I inject those classes Modernizr style at the document level with an any-hover/any-pointer MQs feature detect. The .ma class being a specific instance of navigation sub-menu toggles which should definitely not have a focus ring in a traditional touch or mouse only context.
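The class injection could be done along these lines. This is a rough sketch only: the exact detects used are not shown in the post, so `inputClasses` and the `ontouchstart` check are assumptions. The class names are the ones from the selector above.

```javascript
// Compute the class names from a media-query matcher.
// `matches` is any function (query) => boolean; in a browser it wraps
// window.matchMedia. `hasTouch` is the result of a separate touch detect.
function inputClasses(matches, hasTouch) {
  const classes = [];
  if (hasTouch) classes.push('touch');
  if (matches('(any-hover: hover)')) classes.push('hover');
  if (matches('(any-pointer: fine)')) classes.push('fine');
  if (matches('(any-pointer: coarse)')) classes.push('coarse');
  return classes;
}

if (typeof window !== 'undefined' && typeof window.matchMedia === 'function') {
  const detected = inputClasses(
    (query) => window.matchMedia(query).matches,
    'ontouchstart' in window // assumption: one common touch feature detect
  );
  document.documentElement.classList.add(...detected);
}
```

Keeping the matcher as a parameter means the class logic can be checked without a real device of each kind.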

The only issue there is that I can’t seem to find a way to cancel the focus ring on the first touch, only on subsequent ones, because :hover hasn’t yet fired on the document from any previous touches (note: this is speaking of Android; iOS doesn’t show default outlines anyway, and has its own black outlines when using VoiceOver, so it’s of no concern there).

But mobile aside, the simple html:hover technique can work reliably, and address that concern for desktop. From what I can tell, assistive technology traditionally never fires hover. So unless proven the contrary, any inference on html:hover seems like a reliable one.

For normal anchor links, what I do just to be safe is apply my own custom outline in the html:hover state, but fall back to the browser’s default focus ring if not, with Necolas’ normalize rule taking care of any direct hover or active state on those.

So technically, the case mentioned by Jake can be adjusted with the html:hover pseudo method, either as part of the modality prollyfill, or excluded to be handled externally…


While I understand how exposing this property only on interaction would help the problem of sites catering their UI only to a certain kind of input they detect the presence of (and, to be clear, when I say that, I’m talking about sites that sniff for touch to toggle display-on-hover behaviors, as Discourse itself is guilty of), I’m still not convinced that enabling this manner of modal presentation is a good idea. Discoverability suffers enough without apps encouraging users to live in the mindset that there’s only one way to interact with them.

That said, I could settle for letting this be a debate left up to content authors, rather than trying (and failing) to force the all-interfaces-must-be-on-at-all-times opinion onto them.


We already have a low-level way to achieve this using events, so I think the aim here should be to make something easier to avoid people turning to hacky alternatives that harm a11y (outline: none and el.blur()).

Yeah, that’s what I meant by FPS (sorry, that wasn’t clear at all), the interaction is a combination of key & mouse. I guess this would cause modality thrashing, but if that isn’t a performance problem, it’s not a problem.

From reading the explainer, I guess hitting any key would switch to keyboard modality? That seems sensible.

I’m coming around to the idea. One of the problems with :tab-focus is it’s based on how that element received focus, whereas modality can alter the appearance without an additional focus taking place (eg a click in a focused element, pressing a key with no interaction bound).

button { background: white; }
button:focus { outline: none; background: blue; }

@media not (modality: keyboard) {
  button:focus { background: white; }
}

…would be the backwards compatible way to use this. It’s much lengthier than a pseudo-class, hopefully that wouldn’t discourage usage.

Modality as a term feels vaguer than what it’s describing here, but I don’t have a better idea.

If a mousedown, touchstart or keydown listener prevents default, does modality switch to that input device? Or does modality only switch on click and keypress?


But if you read the piece you’ll note:

Browsers have been experimenting with variations of this since at least IE7. Implementations vary a bit, and browsers are still trying to strike the right balances, but overall the idea is consistent and works well: A couple of billion mostly unaware folks using Web browsers for the last 8+ years have proven it out.

In other words, the web actually does work this way “out of the box”, and it works that way because of years and years of user feedback and testing. I’m not sure how you could say that this encourages apps or users to live in one mindset; this is about responding to the fact that there isn’t one. The issue is that: a) “out of the box” styles frequently don’t work with designs. If the native outline is blue, that works fine on a white background, but maybe your site has things that rest on a blue background, and there it doesn’t. The Web today gives you levers and switches to tweak this, but all of the lever options are nuclear: they break the “out of the box” behavior. b) There’s no good plan for things that don’t already come in the box. If you make a new component, or (in my brightly envisioned future) you import and use non-standard but competing ‘slang’, there’s no good or reasonably simple way to get them to play nice.

The proof here is that lots and lots of pretty web smart folks have tried (and often failed) to solve this problem independently, have a look at the twitters and articles like Marcy Sutton’s Button Focus Hell. These are not new authors who are missing a simple thing about the Web, this problem is frequently evident in things built by frameworks and libraries that build the foundations of the Web today.


I’m suggesting simply “lowest practical level”: definitely not that we need a new primitive, but rather that we explain an appropriately higher-level thing atop existing primitives. Pseudo-classes are, generally speaking, state machines wrapped around paired events (:focus lasts from the focus until the blur, for example). Media Queries are similar but generally more tied to an algorithm by which you enter and leave the state. It feels to me like the latter is more what we’re doing, and it is also more flexible (you can, for example, use matchMedia and a media query listener to provide further enhancements this way) and actually simpler in many respects, both implementation- and use-wise. Implementation-wise, for example, if you want to abstract up a simple layer and create a pseudo-class (let’s call it :keyfocus), we need new events, and all kinds of stuff about bubbling and cancelling and observer perf and so on might get more complicated. Use-wise, note that if this works (assuming it defines the algorithm as I said), it becomes possible to write global rules, which is often way, way better than trying to somehow match the complexities of what browsers do on each element. If we come up with a new element that is an interactive control, or even if your site has just never used a particular interactive element that exists today, and then someone uses it in your app where you’re targeting focus ring branding based on element-specific rules, this new one will look different. Things like branding are what rules-based systems like CSS are actually really good at dealing with in forward-compatible, generic ways. If it were a pseudo-class, though, what would it look like to add a global rule, and what would that mean for bubbling and so on?
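To make the matchMedia point concrete, here is a hedged sketch of listening for a query to change. `watchQuery` and the `keyboard-user` class are hypothetical names, and `(modality: keyboard)` is the proposed feature, not something browsers implement, so today the query should simply never match.

```javascript
// Subscribe to a media query and report its state. `matchMediaFn` is
// window.matchMedia in a browser; it is a parameter so the sketch can be
// exercised with a stub.
function watchQuery(matchMediaFn, query, onChange) {
  const mql = matchMediaFn(query);
  const handler = (event) => onChange(event.matches);
  if (typeof mql.addEventListener === 'function') {
    mql.addEventListener('change', handler);
  } else {
    mql.addListener(handler); // older MediaQueryList API
  }
  onChange(mql.matches); // report the initial state
  return mql;
}

if (typeof window !== 'undefined' && typeof window.matchMedia === 'function') {
  // Assumption: the proposed, unimplemented `modality` media feature.
  watchQuery(window.matchMedia.bind(window), '(modality: keyboard)', (isKeyboard) => {
    document.documentElement.classList.toggle('keyboard-user', isKeyboard);
  });
}
```

A pseudo-class would offer no equivalent single place for script to hook into the state change.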


I’m not sure I follow - which events do you mean, and how?

Gotcha. Obviously if the modality isn’t being used it’s not a problem, but definitely worth thinking about the mixed modality case. I think we can solve this particular case already in the current framework by allowing you to specify a mixed (as opposed to context-dependent) modality on the containing element, just as you specify a keyboard modality on a custom text input.

Yep, exactly.

Agreed. Hopefully we can get a bigger win from having the UA style sheet only apply the focus ring for keyboard modality, so people are somewhat less incentivised to roll their own focus styles.

In the sense that it could equally be used to refer to things like orientation, device etc?

Excellent question :smile: My instinct is no, but that’s definitely something we need to figure out.


This all makes total sense to me. Personally, I’d prefer the name input-modality/input-mode, since as mentioned, modality is slightly vague.

@tabatkins: Thoughts about adding something like that to the MQ spec?

With my Blink hat on: @aboxhall, if you’re interested in playing around with an implementation in Blink, I’d be happy to assist with design and reviews of the MQ parts.


I am more than fine with input-modality - tbh we’ve used that in discussion, in the title of the post, and I think even in the text itself at one point. Honestly, I think we were just trying to save characters in actually expressing it, but it doesn’t matter to me if we think it’s worth the extra 6 chars to be clear about it.


My only concern with input-modality is that it sounds to me like it’s input in the sense of <input> (i.e. data input), whereas the concept we’re shooting for is more like “interaction”. I do agree that modality alone is vague, though.


One concern I have with the current proposal is that it appears to rely on the premise that the focus ring will be prevented by default, as in all your examples that start from a primary :focus { outline: none; }.

Unless you go the expensive route of polyfilling the modality MQ, that would seem to lack backward compatibility with older browsers, which only understand :focus { outline: none; }?

I am thinking that perhaps a ‘non-keyboard’ input modality approach might be better. At least at first. So that it doesn’t further promote preventing the focus ring globally by default, as an idea.

Should design flaws, implementation issues or whatnot arise, it somehow feels like a backwards solution, in that allowing a global :focus { outline: none; } is one of the problems we are trying to avoid…


Yeah, didn’t think of that aspect. We could further bikeshed this later on :smile:


Good point. I wonder if we should enable both keyboard and pointer modalities.

@briankardell and @aboxhall: Do you have any other use cases in mind other than the one outlined in the article? (which AFAICT either keyboard or pointer can address, and pointer seems to be better in terms of progressive enhancement)


One important issue, as @patrick_h_lauke mentioned, is that a touch explore or sequential focus doesn’t really fall into the pointer or keyboard context. This problem is further exacerbated by the fact that none of the assistive technologies follow the same touchscreen event sequence at the moment, which makes it nearly impossible to deal with.

On “touch explore”, some browsers like Firefox fire :hover but not :focus, some fire :focus, some do neither… And on top of that, they create their own fake outline outside of the CSS presentation scope. Honestly, as a dev, I sometimes don’t know where my accessibility responsibility stands, and whether I am supposed to show a focus ring or not. Without a predictable environment and a properly defined sequence of events, with guarantees that it won’t change, it’s a crapshoot. And the prospect of a double focus ring looks terrible and redundant.

Just a quick late-night brain dump, but perhaps what we need in addition here is a way to create a bridge of understanding between the forced outlines that AT or the device environment applies and our CSS :focus styling.

In other words, if we could have pseudo-classes and/or MQ modalities such as assisted-focus or hover-intent, we could have semantics like:

button:focus:not(:assisted-focus){} /* I am handling the focus ring */


button:focus:assisted-focus { outline: none; } /* no double focus outlines */

and a less input-centric modality approach like:

@media (modality: hover-intent) { /* hover-dependent styles */ }

This could perhaps offer better tooling to deal with both touch simulated :hover(s) and :focus.


(note, that was actually @hexalys, not @yoavweiss, it’s just discourse quote weirdness I can’t seem to fix)


  1. I don’t feel like it’s actually “expensive”: our polyfill is all of 2.5k before minification or gzip… It’s all of 69 lines and like 10 of those are whitespace :wink: If that’s expensive, it’s a pretty high bar for calling something inexpensive. The CSS WG is also putting together custom MQs, which would make it smaller still if that beats us to being in widespread deployment.
  2. Currently if you don’t include the prollyfill, it works as it always has; if you do, it’s specifically because you want to handle things better. Note the prollyfill does a :not() on keyboard here, not just a blind rule, just like in the article. If natively implemented, I would expect UA sheets to add the appropriate MQs to do the right things by default (they already do this, it’s just not via MQ), and then the point seems to be moot.
  3. Any author styles can already do (and frequently do) :focus { outline: none; }. At least now they can use an MQ to say “keyboard” or “not keyboard”, and old browsers would ignore them.

In other words, I don’t really think this is an issue - it feels like a red herring.


I agree @patrick_h_lauke had interesting thoughts about whether we could abstract it away from being keyboard, I’m hoping he shares them here. I’m not entirely sure that I agree or disagree - in fact - I kind of wish we had something like attribute selector partial matching here so we could go from less to more specific or something. I do think there is value in treating similar things similarly at some level, and I also think there is value in exposing as much as we can so that the community can actually help figure out how to adapt as new things arise.

Specifically with regard to assisted-focus and hover-intent I definitely don’t understand their meaning/proposal… Explain?


I agree on point 1. Sorry if I wasn’t specific enough. What I mean by “expensive” would be a true polyfill (along the lines of an ajax call re-parsing CSS). I wasn’t applying that argument to the prollyfill. Perhaps because I am not particularly in love with the approach and the use of a custom non-valid attribute…

I guess I am just arguing in favor of a principle that leaves the default focus alone by default, and addresses the cases where I need to remove the focus ring on a case-by-case basis. Because once you do that for modality: keyboard, it becomes part of your fundamental CSS approach for everything. I don’t really like that. But it’s not a major concern. I can always come up with an alternative prollyfill reversing that approach.

PS: I’ll definitely explain the assisted-focus and hover-intent suggestion in depth when I get a chance, in a few days or within a week’s time. I feel I am on a good track with the idea, but it’s going to take a very long description to explain why it’s needed, what it solves and how it would work. I need to sit on it for a bit, and think about half a dozen use cases to see if that concept can hold up.