This proposal aims to standardize some of the basic patterns in interactive video, though it is relevant to any animation that needs to be pausable and seekable. It defines a new class, SyntheticMediaElement, which acts as the source of truth about “what time it is”, so that, for example, a THREE.js scene can be kept in sync with a scrubber bar. It also defines a new abstract interface, MediaElement, which both SyntheticMediaElement and the existing HTMLMediaElement extend. The goal is to preempt the spread of framework-specific plugins for interactive video.
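To make the idea concrete, here is a rough TypeScript sketch of what such a shared interface and a synthetic clock might look like. It is not the proposed API: the names MediaElementLike, SketchPlayback, tick, and syncScene are hypothetical, and the member list is only an assumed subset of what HTMLMediaElement already exposes (currentTime, duration, paused, playbackRate, play, pause, and the usual media events). The host is assumed to drive the clock by calling tick with a millisecond timestamp, for instance from requestAnimationFrame.

```typescript
type Listener = () => void;

// Hypothetical subset of HTMLMediaElement members; the real interface may differ.
interface MediaElementLike {
  currentTime: number;
  readonly duration: number;
  readonly paused: boolean;
  playbackRate: number;
  play(): Promise<void>;
  pause(): void;
  addEventListener(type: string, listener: Listener): void;
  removeEventListener(type: string, listener: Listener): void;
}

// Hypothetical stand-in for a synthetic media element: a clock with no media behind it.
class SketchPlayback implements MediaElementLike {
  playbackRate = 1;
  private time = 0; // current position, in seconds
  private isPaused = true;
  private lastTick: number | null = null; // timestamp (ms) of the previous tick
  private listeners: Record<string, Listener[]> = {};

  constructor(readonly duration: number) {}

  get paused(): boolean {
    return this.isPaused;
  }

  get currentTime(): number {
    return this.time;
  }

  // Seeking clamps to [0, duration] and notifies listeners.
  set currentTime(t: number) {
    this.time = Math.min(Math.max(t, 0), this.duration);
    this.emit("seeked");
    this.emit("timeupdate");
  }

  play(): Promise<void> {
    if (this.isPaused) {
      this.isPaused = false;
      this.lastTick = null; // do not count time spent paused
      this.emit("play");
    }
    return Promise.resolve();
  }

  pause(): void {
    if (!this.isPaused) {
      this.isPaused = true;
      this.emit("pause");
    }
  }

  // Assumption: the host calls this regularly, e.g. from requestAnimationFrame.
  tick(nowMs: number): void {
    if (this.isPaused) return;
    if (this.lastTick !== null) {
      const delta = ((nowMs - this.lastTick) / 1000) * this.playbackRate;
      this.time = Math.min(this.time + delta, this.duration);
      this.emit("timeupdate");
      if (this.time >= this.duration) {
        this.isPaused = true;
        this.emit("pause");
        this.emit("ended");
      }
    }
    this.lastTick = nowMs;
  }

  addEventListener(type: string, listener: Listener): void {
    (this.listeners[type] = this.listeners[type] ?? []).push(listener);
  }

  removeEventListener(type: string, listener: Listener): void {
    this.listeners[type] = (this.listeners[type] ?? []).filter((l) => l !== listener);
  }

  private emit(type: string): void {
    (this.listeners[type] ?? []).slice().forEach((l) => l());
  }
}

// Consumer code depends only on the shared interface, not on a framework.
// Returns a function that stops the synchronization.
function syncScene(media: MediaElementLike, render: (t: number) => void): () => void {
  const onTime = () => render(media.currentTime);
  media.addEventListener("timeupdate", onTime);
  return () => media.removeEventListener("timeupdate", onTime);
}
```

The point of the sketch is the last function: something like syncScene only needs the shared members, so the intent is that the same plugin code could be handed either a synthetic clock or a real video element.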
I just posted a detailed writeup of this proposal that covers the motivation, the abstractions that can be built on top of this interface, and some partial implementations. As a proof of concept, see my CodeMirror recording plugins, which are compatible across three animation frameworks (Liqvid, Remotion, and GSAP). You can add comments on the GitHub issue.
(Note on names: the idea was that MediaElement = HTMLMediaElement − HTML, but I suppose Element should be subtracted as well. On the other hand, Media is too generic. Perhaps MediaElement → Playable and SyntheticMediaElement → Playback?)