A lot of APIs - including some major ones like audio playback - are limited to a “user gesture” (i.e. a synchronous call within a user input event). For legitimate web apps that want to do something like play audio at the first opportunity (e.g. for a game title screen), there is currently an undocumented set of magic events that count as a user gesture.
As far as I can tell, the events that currently count as a user gesture include:
click, keydown, touchend, pointerup, gamepadconnected (?)
And - bizarrely - “poll every gamepad regularly and check for any button more than 75% pressed”, according to a recent Chrome commit.
I am fairly sure there are more, but I can’t work out which they are. Additionally the list has changed over time, such as how mobile browsers used to allow audio playback in touchstart, but changed it to touchend. New APIs could potentially add more kinds of user gesture events as well.
This makes it unnecessarily complicated for legitimate web apps to use user-gesture limited APIs at the first opportunity. Additionally it has a high risk of breakage if any of the events are changed, such as when touchstart was removed in favor of touchend - or even with subtleties like tweaking the percentage gamepad buttons must be pressed in order to qualify as pressed for a user gesture.
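To make the gamepad heuristic concrete, here is a rough sketch of the check a page would have to replicate. It is not taken from the Chrome commit: the 0.75 threshold just mirrors the "more than 75% pressed" figure above, and `anyButtonPastThreshold`, `PadLike` and `ButtonLike` are hypothetical names chosen so the snippet does not depend on DOM typings.

```typescript
// Minimal structural types (hypothetical) standing in for Gamepad / GamepadButton.
interface ButtonLike { readonly value: number }
interface PadLike { readonly buttons: readonly ButtonLike[] }

// Assumption: mirrors the "more than 75% pressed" figure quoted in the post.
const PRESS_THRESHOLD = 0.75;

// Returns true if any button on any connected pad is pressed past the threshold.
// Empty slots (null) are skipped, since gamepad lists can contain them.
function anyButtonPastThreshold(
  pads: readonly (PadLike | null)[],
  threshold: number = PRESS_THRESHOLD,
): boolean {
  return pads.some(
    (pad) => pad !== null && pad.buttons.some((button) => button.value > threshold),
  );
}
```

In a page this would presumably be called with the result of `navigator.getGamepads()` on every animation frame, which is exactly the kind of fragile polling the post is complaining about: if the browser's threshold changes, the page's copy silently goes stale.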
If browsers fired a “usergesture” event at the same time as any of these other magic events that qualify as user gestures, it would simplify web app development, and future-proof usage against further changes to which events are the magic user-gesture-qualified ones. This does not make it easier to build abusive content; such content will just listen to the full list of magic events. It only makes it easier to build legitimate, future-proofed web apps.
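For comparison, this is roughly what apps have to do today in the absence of such an event. It is only a sketch: the event list is the guessed one from above (gamepadconnected is left out because of the question mark), and `onFirstUserGesture` and `ListenerTarget` are hypothetical names, not any browser API.

```typescript
// Hypothetical minimal interface; a DOM window or document satisfies it structurally.
interface ListenerTarget {
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

// Assumption: the guessed list from the post, which may be incomplete or go stale.
const GESTURE_EVENTS: readonly string[] = ["click", "keydown", "touchend", "pointerup"];

// Calls `callback` synchronously inside the first matching event, then
// removes every listener it registered. Returns a function that cancels early.
function onFirstUserGesture(
  target: ListenerTarget,
  callback: (type: string) => void,
  events: readonly string[] = GESTURE_EVENTS,
): () => void {
  const handlers: Array<[string, () => void]> = [];
  const cleanup = (): void => {
    handlers.forEach(([type, handler]) => target.removeEventListener(type, handler));
    handlers.length = 0;
  };
  events.forEach((type) => {
    const handler = (): void => {
      cleanup();
      callback(type);
    };
    handlers.push([type, handler]);
    target.addEventListener(type, handler);
  });
  return cleanup;
}
```

A title screen would pass `window` as the target and start its audio in the callback. With a real "usergesture" event, the whole list would collapse to a single listener that the browser keeps up to date.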
Just noting that the issue around this is well-known https://github.com/whatwg/html/issues/1903 … an independent event might be a good idea tho, and might make things a bit more future compatible (without needing to stretch the definition of various things … like a “touch” being a “click”, etc.).
Yes, the lack of a proper spec on user gestures and the behavior gap among major browsers are frustrating. In Chromium, we are currently exploring a new design to hopefully fix the interop problem; see the GitHub issue for details.
Here are our thoughts on a dedicated event for user gesture:
We considered the idea before and found that the core challenge remains the same: defining the “raw events” that would trigger this new event. Without a consistent set of raw events across browsers, the new event won’t magically fix the interop problem. We believe the web should settle on a consistent set of those raw events first. We have made little progress so far, mainly because we are focusing on the underlying design now (see the design doc above).
In our new design, we would need to fire the new event to every ancestor frame of the activated frame. This could be misleading because only a single activation consumption attempt (e.g. the first window.open() call) would be successful.
The Gamepad API is inherently polling-based (vs event-driven) on the device side AFAIK, so there is perhaps no way to fire a usergesture event without polling from the browser.
On a related note: we are planning to expose the user activation state of every frame as an attribute of the corresponding window object. Hopefully this would address some of the use cases people have in mind.
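As a rough illustration of how a page might consume such state, here is a sketch. The `isActive` / `hasBeenActive` shape is borrowed from the `UserActivation` object that browsers later exposed as `navigator.userActivation`, not from the window attribute described in this post; `hasActivation` and `ActivationState` are hypothetical names, and which kind of activation a particular API requires is an assumption left to the caller.

```typescript
// Hypothetical structural type; modeled on the later UserActivation interface.
interface ActivationState {
  readonly isActive: boolean;      // transient: a gesture happened very recently
  readonly hasBeenActive: boolean; // sticky: a gesture happened at some point
}

type ActivationKind = "transient" | "sticky";

// Returns whether the frame currently has the requested kind of activation.
// A missing state (e.g. a browser without the attribute) is treated as "no".
function hasActivation(
  state: ActivationState | undefined,
  kind: ActivationKind,
): boolean {
  if (state === undefined) {
    return false;
  }
  return kind === "transient" ? state.isActive : state.hasBeenActive;
}
```

This only answers "can I do it right now?". It does not tell a page when the answer changes, which is the gap the event (or promise) discussion is about.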
I agree that it’s important we make it easy for developers to tell when a page is considered to have user activation.
Mustaq, on the gamepad discussion, are you saying that if a user presses a button, this is only treated as a user gesture in Chromium today if the page happens to also be polling for button state? I.e. doing the polling has the side effect of enabling gamepad-generated user gestures? If so, I agree that’s problematic. Still, if that’s actually our behavior already (and it is already web-visible), then maybe we should expose that by firing an event only in those cases?
Maybe if we limited the design to an event which signals THE FIRST time a given frame has gotten activation, that would address a number of the use cases (not for consumption cases like pop-up as you point out) while being relatively simple and low risk from a performance perspective? This wouldn’t prevent us from potentially doing something more for the consumption cases in the future.
Firing an event at the very first activation of a frame sounds reasonable if it really helps some use case.
I still don’t see a good solution that “fits” the Gamepad API because, as you mentioned, it’s problematic that a frame’s activation state would still be a side-effect of polling.
One possible way to avoid the perceived “side-effect” is that after a gamepad is connected, the browser can possibly poll at a slow rate (say once every few seconds) until either the page has polled the same device, or any frame in the page has seen activation (whichever comes first). This looks hacky but I guess it is acceptable because otherwise sites with background-intro-music would be encouraged to poll the device at an arbitrary rate which is clearly worse.
FWIW, W3C TAG guidelines suggest using a promise instead for “one-time events”. This has the nice effect that it’s easy for authors to wait for this even if their script runs after activation, which is hard with events.
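A promise-shaped version could be approximated in script today, which may help show the difference. This is a sketch only: it still depends on the guessed event list from earlier in the thread, and `firstActivation` and `ListenerTarget` are hypothetical names rather than any proposed API.

```typescript
// Hypothetical minimal interface; a DOM window or document satisfies it structurally.
interface ListenerTarget {
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

// Resolves once, with the type of the first gesture-like event seen on `target`.
// Assumption: the default list is the guessed one from the first post.
function firstActivation(
  target: ListenerTarget,
  events: readonly string[] = ["click", "keydown", "touchend", "pointerup"],
): Promise<string> {
  return new Promise<string>((resolve) => {
    const handlers: Array<[string, () => void]> = [];
    events.forEach((type) => {
      const handler = (): void => {
        // Remove every listener so the promise settles exactly once.
        handlers.forEach(([t, h]) => target.removeEventListener(t, h));
        resolve(type);
      };
      handlers.push([type, handler]);
      target.addEventListener(type, handler);
    });
  });
}
```

If the promise is created early and shared, a script that loads later can still call `.then()` on it and run immediately, which is the advantage over an event. The catch is that the callback then runs in a microtask rather than synchronously inside the input event, so whether it still counts as being "within" the gesture would depend on how the browser tracks activation.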