Many APIs - including some major ones like audio playback - are gated behind a “user gesture” (i.e. a synchronous call within a user input event handler). For legitimate web apps that want to do something like play audio at the first opportunity (e.g. for a game title screen), there is currently an undocumented set of magic events that count as a user gesture.
Currently, as far as I can tell, the list of events that count as a user gesture includes:
click, keydown, touchend, pointerup, gamepadconnected (?)
And - bizarrely - “poll every gamepad regularly and check for any button more than 75% pressed”, according to a recent Chrome commit.
I am fairly sure there are more, but I can’t work out which they are. The list has also changed over time - for example, mobile browsers used to allow audio playback in touchstart, but later changed it to touchend. New APIs could potentially add more kinds of user gesture events as well.
This makes it unnecessarily complicated for legitimate web apps to use user-gesture-limited APIs at the first opportunity. It also carries a high risk of breakage whenever the events change, as when touchstart was dropped in favor of touchend - or even with subtleties like tweaking the percentage a gamepad button must be pressed in order to qualify as pressed for a user gesture.
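To illustrate the workaround apps need today, here is a minimal sketch: attach a handler for every event believed to qualify as a user gesture, run the unlock logic on the first one, and then detach everything. The event list and the helper name are assumptions for this example - the whole point of this post is that the real list is undocumented and may change.

```javascript
// Best-guess list of events that currently count as a user gesture.
// This is exactly the fragile part: browsers can change it at any time.
const GESTURE_EVENTS = ["click", "keydown", "touchend", "pointerup"];

// Invoke `callback` on the first qualifying gesture fired on `target`,
// then remove all the listeners so it only runs once.
function onFirstUserGesture(target, callback) {
  const handler = (event) => {
    for (const name of GESTURE_EVENTS) {
      target.removeEventListener(name, handler);
    }
    callback(event);
  };
  for (const name of GESTURE_EVENTS) {
    target.addEventListener(name, handler);
  }
}

// In a page, the unlock call must happen synchronously inside the
// handler so the browser still treats it as part of the user gesture:
// onFirstUserGesture(window, () => audioElement.play());
```

Note that the gesture-gated call (e.g. play()) has to run synchronously inside the handler; deferring it to a later task loses the user-gesture status.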
If browsers fired a “usergesture” event at the same time as any of these other magic events that qualify as user gestures, it would simplify web app development, and future-proof usage against further changes to which events are the magic user-gesture-qualified ones. This does not make it easier to build abusive content; such content will just listen to the full list of magic events. It only makes it easier to build legitimate, future-proofed web apps.