Ok, that is really interesting. It actually works very well even when I switch the screen off and on (though only if I've first hit the 'play' button in the iOS native remote controls interface to request persistent playback).
The iOS implementation and what is proposed in [HTMLMEDIAFOCUS] are actually very similar. iOS applies media focus logic to all HTMLMediaElement objects implicitly. If another HTMLMediaElement object begins playing out, the currently playing HTMLMediaElement object is automatically paused.
That approach comes at the expense of allowing more than one element to play out at any given time, since only one object can hold media focus and be subject to remote control events at a time. There may be cases, however, where web app 'pings' (e.g. 'You received a new incoming email!') should not pause the media and take over the remote controls; hence the opt-in proposal in [HTMLMEDIAFOCUS].
Perhaps we want to enforce this iOS behavior everywhere (i.e. on desktop browsers too). Then we don't really need any new API surface. Right now in desktop browsers two or more media elements can play out at the same time, causing a number of issues in directing remote control key events to the correct place. That's why we proposed the opt-in 'remotecontrols' HTMLMediaElement content attribute: it lets web pages adopt the exact behavior that is currently enforced on iOS devices.
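To make the arbitration concrete, here is a minimal sketch of the logic described above, written as plain JavaScript rather than against any shipped API. The 'remotecontrols' attribute name is from the proposal; the MediaFocusManager class, the onPlay hook, and the plain-object stand-ins for media elements are illustrative assumptions only.

```javascript
// Hypothetical sketch of media-focus arbitration (not a shipped API).
// Elements that opt in (per the proposed 'remotecontrols' attribute)
// compete for focus; short "pings" that do not opt in play out without
// pausing the focused element or taking over the remote controls.
class MediaFocusManager {
  constructor() {
    this.focused = null; // element currently holding media focus
  }
  // Called whenever an element starts playing out.
  onPlay(el) {
    if (!el.remoteControls) return; // a ping: no focus change, nothing paused
    if (this.focused && this.focused !== el) {
      this.focused.paused = true;   // iOS-style: pause the previous holder
    }
    this.focused = el;              // new holder receives remote control events
  }
}

// Stand-in for an HTMLMediaElement, just enough state for the sketch.
function mediaElement(remoteControls) {
  return { remoteControls, paused: false };
}
```

With this model, a notification sound plays without interrupting a podcast, while starting a second opted-in video pauses the first, which is exactly the single-focus behavior iOS enforces implicitly.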
In [HTMLMEDIAFOCUS] we propose firing appropriate events toward the focused HTMLMediaElement object when 'Previous' and 'Next' buttons are pressed in remote control interfaces. What those events would be called is negotiable (e.g. 'next'/'previous', 'seekToEnd'/'seekToStart', etc.).
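As a sketch of how a page might consume those events, the snippet below dispatches placeholder 'previous'/'next' events at the focused element. The event names and the dispatchRemoteKey helper are assumptions; as noted above, the proposal deliberately leaves naming open. A bare EventTarget stands in for the media element for brevity.

```javascript
// Hypothetical: the user agent delivers a remote-control button press as
// a DOM event on whichever element currently holds media focus.
function dispatchRemoteKey(focusedElement, button) {
  // button is 'previous' or 'next', as pressed in the remote UI;
  // these event names are placeholders, not part of any spec.
  focusedElement.dispatchEvent(new Event(button));
}

// A page would listen on its media element (a plain EventTarget here):
const el = new EventTarget();
let lastSeen = null;
el.addEventListener('next', () => { lastSeen = 'next'; });
el.addEventListener('previous', () => { lastSeen = 'previous'; });
```

The point of routing through a single focused element is that the page never has to guess which of its media elements a remote key press was meant for.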
iOS keeps a document alive if one of its child HTMLMediaElement objects currently has media focus (and hence access to remote control events). Both the iOS approach and [HTMLMEDIAFOCUS] assume that only one media element plays out at any given time, so the overhead here would be keeping only that one active document alive.