Several browser APIs only work during a user gesture, i.e. while handling a user input event like “click” or “touchend”. For example, off the top of my head:
- window.open usually requires a user gesture, otherwise the popup is blocked
- HTMLMediaElement.play requires a user gesture on mobile if it has an audio track
- in Safari, AudioContext remains blocked until a user gesture
- copying to the clipboard can only be done in a user gesture
There are probably a couple others I forgot.
This imposes a pretty tight restriction: you must call the given API synchronously within the event handler, or it is not allowed. This meshes badly with the fact that modern web APIs are increasingly async-only. For example, you cannot read a blob synchronously; you must use the asynchronous FileReader API. This is generally a good idea for performance, but it becomes quite a problem if you need to do something async before making one of the restricted API calls.
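A toy model (not browser code) makes the timing problem concrete. A hypothetical inGesture flag stands in for the browser's internal gesture state and is only true while the simulated click handler runs synchronously. A resolved promise stands in for any async step such as FileReader or fetch. Its callback necessarily runs after the handler has returned, so it never sees the flag set.

```javascript
// Toy model of the strict rule: restricted APIs only work while the gesture
// handler is running synchronously. inGesture and simulateClick are hypothetical.
let inGesture = false;
const log = [];

function simulateClick(handler) {
  inGesture = true; // browser marks the start of a trusted gesture
  try {
    handler();
  } finally {
    inGesture = false; // cleared as soon as the handler returns
  }
}

simulateClick(() => {
  log.push(`sync part: inGesture=${inGesture}`);
  // Stand-in for any async step (FileReader, fetch, ...): runs after the handler returns.
  Promise.resolve().then(() => {
    log.push(`async part: inGesture=${inGesture}`);
  });
});

// Once the microtask has run, log is:
// ["sync part: inGesture=true", "async part: inGesture=false"]
```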
For example, the following use cases cannot use the restricted APIs, even though they began with a user gesture:
- user clicks a button, app processes some data asynchronously, then opens a popup window to display the result: popup window blocked
- user clicks a button, app makes a network request, upon success/failure plays a sound: playback blocked
- user presses Ctrl+C, app prepares some data to copy asynchronously, then copies to clipboard: copy blocked
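Here is a sketch of the second use case written against today's real APIs. HTMLMediaElement.play() returns a promise, and that promise rejects with a NotAllowedError when playback is not allowed.

A few details are illustrative assumptions:

- the function name playAfterRequest;
- the injected fetchFn parameter, which lets the logic run without a browser;
- the "/api/save" URL.

```javascript
// Use case 2: play a sound once a network request settles. play() now runs in
// a later task, outside the click handler, so under the rules described above
// its promise may reject with a "NotAllowedError".
function playAfterRequest(url, audio, fetchFn = fetch) {
  return fetchFn(url)
    // Play whether the request succeeds or fails (fetch only rejects on network errors).
    .then(() => "success", () => "failure")
    .then(outcome =>
      audio.play().then(
        () => `${outcome}: played`,
        err => {
          // A gesture-gated play() rejects with NotAllowedError when blocked.
          if (err && err.name === "NotAllowedError") return `${outcome}: playback blocked`;
          throw err; // anything else is a genuine failure
        }
      )
    );
}

// Browser wiring (button, sound and "/api/save" are hypothetical):
// button.addEventListener("click", () => playAfterRequest("/api/save", sound));
```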
In every case, the call would have succeeded if it had been made immediately. For example, a sound can play the moment the user presses a button that starts a network request, but not when the request completes, even if that is mere milliseconds later. Some cases are almost impossible to work around: if a blob must be converted to text before being copied, it cannot be copied to the clipboard at all.
IMO this is unnecessarily restrictive, and there is no reason to block such API calls if they come shortly after a user gesture. I can think of two ways to resolve this:
- browsers relax the restrictions, perhaps based on a timer, e.g. API calls are allowed up to 5 seconds after a user gesture. However, this kind of hard limit will still cause the same problem for anything that takes longer.
- add a new API.
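The first option can be modelled as a gate that remembers when the last trusted gesture happened. TimedGestureGate and its fake clock are hypothetical and for illustration only. The 5-second window is the example figure from the list above. The model shows the stated drawback directly: anything that finishes after the window is blocked again.

```javascript
// Toy model of option 1: restricted calls are allowed for a fixed window after
// the most recent trusted gesture. TimedGestureGate is hypothetical, not a browser API.
class TimedGestureGate {
  constructor(windowMs = 5000, now = () => Date.now()) {
    this.windowMs = windowMs;
    this.now = now;
    this.lastGestureAt = null; // no gesture seen yet
  }

  // Called by the (simulated) browser when a trusted gesture occurs.
  recordGesture() {
    this.lastGestureAt = this.now();
  }

  // Would a restricted API call be allowed right now?
  isAllowed() {
    return this.lastGestureAt !== null &&
      this.now() - this.lastGestureAt <= this.windowMs;
  }
}

// With a fake clock: a 200 ms request is fine, an 8 s one is not.
let fakeTime = 0;
const gate = new TimedGestureGate(5000, () => fakeTime);
gate.recordGesture();
fakeTime = 200;
console.log(gate.isAllowed()); // true
fakeTime = 8000;
console.log(gate.isAllowed()); // false
```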
For the new API, user gesture events could provide a token that can be redeemed later to call restricted APIs. E.g.:
mybutton.addEventListener("click", e =>
{
    // get a token that proves we were in a user gesture
    let token = e.postpone();
    doSomethingAsync().then(sound =>
    {
        // outside the user gesture, but using the token allows
        // restricted APIs until returning to the event loop
        token.use();
        sound.play();
    });
});
This works as follows:
- user-gesture events like “click” provide a postpone() method which returns a token. This token proves that we previously had a real user gesture. (Obviously synthetic events will not provide this method, or will not return a valid token.)
- the token can conveniently be preserved until later via closures.
- when some work completes, the token can be used to unblock restricted APIs via its use() method. Tokens can only be used once. The effect only lasts until execution returns from JS. (Is that called a “tick”? Basically the effect is temporary and is only intended to allow immediately subsequent calls to work.)
- immediately after use(), restricted APIs like playing audio are allowed to work. They are blocked again as soon as execution returns to the browser. The browser may also impose restrictions on use(), e.g. it must be used within 30 seconds.
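A toy model in plain JavaScript helps pin down the proposed semantics. It is not a polyfill, since a page cannot grant itself gesture status. GestureModel, runTask() and restrictedCall() are hypothetical: they stand in for the browser's event loop and for APIs like play() or window.open(). postpone() and use() follow the proposal above, including one-time use and the example 30-second limit.

```javascript
// Toy model of the proposed semantics. Nothing here is a real browser API:
// postpone(), use(), runTask() and the 30 s limit are hypothetical.
class GestureModel {
  constructor(now = () => Date.now(), maxTokenAgeMs = 30000) {
    this.now = now;
    this.maxTokenAgeMs = maxTokenAgeMs;
    this.unlocked = false; // true while restricted APIs may be called
  }

  // One task of the event loop: any unlock ends when control returns to the browser.
  runTask(fn) {
    try {
      fn();
    } finally {
      this.unlocked = false;
    }
  }

  // A trusted "click": restricted APIs work inside the handler, and the
  // event offers postpone() to mint a token for later.
  dispatchGesture(handler) {
    this.runTask(() => {
      this.unlocked = true;
      handler({ postpone: () => this.createToken() });
    });
  }

  createToken() {
    const issuedAt = this.now();
    let spent = false;
    return {
      use: () => {
        if (spent) throw new Error("token already used");
        if (this.now() - issuedAt > this.maxTokenAgeMs) throw new Error("token expired");
        spent = true;
        this.unlocked = true; // lasts only until the current task returns
      },
    };
  }

  // Stand-in for play(), window.open(), clipboard writes, etc.
  restrictedCall(name) {
    if (!this.unlocked) throw new Error(`${name} blocked: no user gesture`);
    return `${name} allowed`;
  }
}

// The click -> async work -> play() flow from the example above.
let clock = 0;
const model = new GestureModel(() => clock);
let token;
model.dispatchGesture(e => { token = e.postpone(); });
clock = 1500; // async work finishes 1.5 s later, in a new task
model.runTask(() => {
  token.use();
  console.log(model.restrictedCall("play")); // prints: play allowed
});
```

Modelling "until execution returns to the browser" as the end of runTask() keeps the unlock scoped to one task, which matches the intent that only immediately subsequent calls benefit.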
This approach doesn’t require changing lots of APIs, and allows web apps to circumvent some of the more annoying restrictions for legitimate use cases.
I don’t think this helps anyone who wants to abuse these APIs. For example, if a page wants to open a popup window or annoy a user with a loud sound, it will simply do it in the original user gesture. As far as I can tell, blocking these APIs shortly after the user gesture only impedes legitimate web apps.
Sometimes there are crazy workarounds to these restrictions. One is to open a popup window and leave it empty until the async data is processed, then update the window. Another is to immediately play a sound that begins with silence, then seek to the start of the real sound when you want it to be heard. This shows that, in theory, this API would not make much possible that was not possible before. However, it would make it a lot more convenient to use certain web APIs for legitimate purposes.
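The two workarounds can be sketched as follows. window.open(), document.body, play() and currentTime are real APIs. The rest is illustrative:

- the function names;
- the injected openWindow parameter, which lets the logic run without a browser;
- the assumption that the audio file starts with enough silence to cover the async work.

```javascript
// 1. Open an empty popup inside the gesture, then fill it in when the result is ready.
function openPopupThenFill(computeText, openWindow = (url, target) => window.open(url, target)) {
  const popup = openWindow("", "_blank"); // synchronous: still inside the gesture
  return computeText().then(text => {
    if (popup) popup.document.body.textContent = text; // popup is null if it was blocked
    return popup;
  });
}

// 2. Start a sound whose file begins with silence inside the gesture, then
//    seek to the audible part later. soundStartSec marks where that part begins.
function primeSilentLeadIn(audio, soundStartSec) {
  const started = audio.play(); // allowed: still inside the gesture
  return {
    started,
    playNow() {
      audio.currentTime = soundStartSec; // jump past the silence
    },
  };
}

// Hypothetical wiring (button, audioEl and the 10 s lead-in are assumptions):
// button.addEventListener("click", () => {
//   const sound = primeSilentLeadIn(audioEl, 10);
//   doSomethingAsync().then(() => sound.playNow());
// });
```

The second trick only works while the silent lead-in lasts longer than the async work. Otherwise the real sound starts on its own before playNow() is called.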