Proposal for a Picture in Picture API


We have been working on a proposal to expose a Picture in Picture API. The repository is and the explainer can be found at

This effort started because more and more operating systems offer Picture in Picture APIs (Windows, macOS, iOS, Android) and more and more browsers support the feature (Opera, Safari, Firefox, Samsung). Safari already has a proprietary API for this feature. We believe we should standardise a Web API for it in order to get interoperable behaviour and bring this feature to the Web.


Picture in Picture is a terrible API name, suggesting it has something to do with <img>s inside <img>s.

Also, I do not understand the benefit of this API over a popup with open().


Thanks for the thoughtful feedback, Daniel. Picture in Picture is an industry-standard name. A few links that might be of interest:

As mentioned above, operating systems have specific APIs for Picture in Picture, and such a window comes with very different limitations, behaviours and user experience. Re-using open() for it might not work very well. That said, it remains an option.


Panels on Chrome OS seem to provide the same functionality. Would popups with always-on-top floating functionality be sufficient?


At the moment, panels and PIP would be used differently: the PIP API as currently designed only works with videos, and the native APIs have restrictions that are not all compatible with a fully featured window.


Is there a good reason for that? Only working with video seems needlessly restrictive.


Your question makes sense, as I haven’t listed the reasons in the explainer. One is that there is already a proprietary API on the Web (Safari), and it is video only; the MVP aims to be compatible with it. Another reason, still related to Apple, is that the iOS and macOS APIs are video only. As said in the explainer, allowing arbitrary elements to go into PIP is something we should at least consider for the future.

In general, PIP is mostly oriented towards media content. Mozilla, Opera and Samsung Browser all have a browser feature that only works on videos. Allowing arbitrary content has security implications, as the window sits on top of all others and has minimal chrome. All the APIs I looked at restrict how the user can interact with a PIP window, and allowing arbitrary content would require developers to know and understand these restrictions. Because these system APIs are still new and, again, because the use case for arbitrary content is small, it seemed best to focus on the main use case and leave the door open for more sophisticated ones.


Is the proposal to only support default controls?


I assume you mean default controls in the PIP window? At the moment, yes. In fact, I believe we should use the Media Session actions for the controls to avoid breaking websites (instead of adding a full range of controls that websites might not support). Allowing custom controls would require allowing arbitrary content in PIP, which might work poorly or not at all on some platforms today. As said above, we want the API to be able to evolve in this direction if the ecosystem does.
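A minimal sketch of what relying on Media Session actions could look like from the site's side. It assumes, as suggested above, that the browser builds its PiP controls from the registered actions; the MediaSessionLike and SitePlayer types, the registerMediaActions helper and the 10-second seek step are hypothetical, chosen so the snippet runs outside a browser (in a page you would pass navigator.mediaSession).

```typescript
// Structural stand-in for navigator.mediaSession so the sketch runs without
// the DOM lib. In a browser, pass navigator.mediaSession instead.
type MediaAction = "play" | "pause" | "seekbackward" | "seekforward";

interface MediaSessionLike {
  setActionHandler(action: MediaAction, handler: (() => void) | null): void;
}

// Hypothetical interface for whatever the site's own player exposes.
interface SitePlayer {
  play(): void;
  pause(): void;
  seekBy?(seconds: number): void; // optional: not every site supports seeking
}

// Register handlers only for actions this player really implements and clear
// the rest, so a browser that derives PiP controls from Media Session actions
// (an assumption) never shows a button the site cannot honour.
function registerMediaActions(session: MediaSessionLike, player: SitePlayer): MediaAction[] {
  const registered: MediaAction[] = [];
  const set = (action: MediaAction, handler: (() => void) | null): void => {
    session.setActionHandler(action, handler);
    if (handler) registered.push(action);
  };

  set("play", () => player.play());
  set("pause", () => player.pause());

  const seekBy = player.seekBy;
  // Hypothetical 10-second step; null clears any previously set handler.
  set("seekbackward", seekBy ? () => seekBy.call(player, -10) : null);
  set("seekforward", seekBy ? () => seekBy.call(player, 10) : null);

  return registered; // the actions the site actually supports
}
```

The design choice is to under-promise: a site that cannot seek simply never registers seek handlers, which matches the concern about not exposing controls that websites might not support.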



If it is only compatible with video, does it need a Web API? Can’t the browser or an extension handle putting the video into the mini display?


There are many use cases described in the explainer, but I’d say the main one is that websites that implement their own media controls will want to integrate this feature into them.
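As an illustration of that integration, here is a sketch of how a custom control bar might decide whether to show an "enter" or "exit" PiP button. It assumes the names the WICG specification later adopted (document.pictureInPictureEnabled, document.pictureInPictureElement and the video's disablePictureInPicture attribute), which are not defined in this thread; pipButtonState and the structural types are hypothetical.

```typescript
// Structural stand-ins for the parts of HTMLVideoElement and Document this
// sketch reads. In a page, pass the real <video> element and document.
interface VideoLike {
  disablePictureInPicture: boolean;
}

interface PipDocumentLike {
  pictureInPictureEnabled: boolean;
  pictureInPictureElement: unknown;
}

type PipButtonState = "hidden" | "enter" | "exit";

// Decide how the site's own control bar should render its PiP button.
function pipButtonState(doc: PipDocumentLike, video: VideoLike): PipButtonState {
  // Hide the button when the browser reports PiP as unavailable (in a browser
  // without the API the property is undefined, which is also falsy) or when
  // the page opted this particular video out.
  if (!doc.pictureInPictureEnabled || video.disablePictureInPicture) {
    return "hidden";
  }
  // If this very video is already floating, the button should bring it back.
  return doc.pictureInPictureElement === video ? "exit" : "enter";
}
```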


There are some things in the explainer about what a developer might want to do with an API, but it isn’t clear what the underlying use cases are, so it isn’t clear how these things matter.

The first use case I can come up with that actually explains to me why this might be useful is sign-language interpretation. There, you have a limit of one PiP, and there is a viable body of externally available content that a browser could reasonably find (perhaps via an extension), as well as signed tracks provided by the video websites themselves.

Do you know of other examples that actually explain what people will do with this?

Somewhere in my mind is the use of PiP for advertising: a user could simply deny PiP as a way of blocking ads, and a site could respond by using the main video as collateral, not playing it unless PiP access is granted, to convince the user to just leave PiP on. I can see this working for an advertising / video provider, but it doesn’t seem like a great addition to the Web Platform.

What do people actually do with the Safari API?


Existing websites such as and already use the Safari Presentation Mode Web API to add a custom Picture-in-Picture button to their video controls (see Vimeo screenshot below). This is one reason I believe we want a Web API for this that works across all browsers.
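For sites that already ship a button on top of Safari's API, a cross-browser version might feature-detect both entry points. This sketch assumes Safari's webkitSupportsPresentationMode() and webkitSetPresentationMode() methods alongside the proposed standard requestPictureInPicture(); the detectPipBackend and enterPip helpers are hypothetical, and the structural MaybePipVideo type only exists so the snippet runs outside a browser.

```typescript
// Structural view of the PiP entry points a video element may expose. Every
// member is optional because a given browser may ship one, both or neither.
interface MaybePipVideo {
  requestPictureInPicture?: () => Promise<unknown>;
  webkitSupportsPresentationMode?: (mode: string) => boolean;
  webkitSetPresentationMode?: (mode: string) => void;
}

type PipBackend = "standard" | "webkit" | "none";

// Prefer the standard method; fall back to Safari's Presentation Mode API.
function detectPipBackend(video: MaybePipVideo): PipBackend {
  if (typeof video.requestPictureInPicture === "function") return "standard";
  if (
    typeof video.webkitSupportsPresentationMode === "function" &&
    typeof video.webkitSetPresentationMode === "function" &&
    video.webkitSupportsPresentationMode("picture-in-picture")
  ) {
    return "webkit";
  }
  return "none";
}

// Enter PiP through whichever backend is available. Meant to be called from a
// click handler, since the standard method requires a user gesture.
function enterPip(video: MaybePipVideo): PipBackend {
  const backend = detectPipBackend(video);
  if (backend === "standard") {
    // Swallow rejections here (e.g. no user gesture); a real site would report them.
    void video.requestPictureInPicture!().catch(() => {});
  } else if (backend === "webkit") {
    video.webkitSetPresentationMode!("picture-in-picture");
  }
  return backend;
}
```

Returning "none" lets the control bar hide its button entirely rather than show one that does nothing.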

Regarding your concerns about advertising, websites can already use the Page Visibility API and the Intersection Observer API to detect whether a person is watching a video or not.
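Roughly, such detection could combine both signals as below. The isLikelyWatched helper and the 0.5 on-screen threshold are arbitrary illustrations; in a page, the inputs would come from document.visibilityState and the intersectionRatio of an IntersectionObserverEntry observed on the video.

```typescript
// Snapshot of the two signals. In a browser (not executed here):
//   visibilityState   <- document.visibilityState (Page Visibility API)
//   intersectionRatio <- entry.intersectionRatio from an IntersectionObserver
//                        observing the <video>, e.g. with threshold [0, 0.5, 1]
interface ViewingSignals {
  visibilityState: "visible" | "hidden";
  intersectionRatio: number; // 0..1, fraction of the video inside the viewport
}

// Heuristic: the video counts as "likely watched" only if the tab is visible
// and at least minRatio of the player is on screen. The 0.5 default is an
// arbitrary, illustrative threshold.
function isLikelyWatched(signals: ViewingSignals, minRatio = 0.5): boolean {
  return signals.visibilityState === "visible" && signals.intersectionRatio >= minRatio;
}
```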

That said, you’re right that ads could also be played in PiP. To prevent abuse, though, video.requestPictureInPicture() will require a user gesture, meaning the user has to interact with the page before a video can pop up.
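In practice, the gesture requirement means a site should call requestPictureInPicture() directly inside an input handler rather than from a timer or after a slow network request. The sketch below wires it to a click listener and surfaces a rejected promise; wirePipButton and the structural types are hypothetical stand-ins for a real <button> and <video>.

```typescript
// Structural stand-ins so the sketch runs outside a browser; in a page, pass
// a real <button> and <video> element.
interface ClickTarget {
  addEventListener(type: "click", listener: () => void): void;
}

interface PipVideo {
  requestPictureInPicture(): Promise<unknown>;
}

// Call requestPictureInPicture() synchronously inside the click listener, while
// the user gesture is still active. Calling it without a gesture is expected
// to make the returned promise reject, which is passed to onError.
function wirePipButton(button: ClickTarget, video: PipVideo, onError: (e: unknown) => void): void {
  button.addEventListener("click", () => {
    video.requestPictureInPicture().catch(onError);
  });
}
```

Nothing happens until the user clicks, which is exactly the property the gesture requirement is meant to guarantee.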


FYI, this proposal has moved to the WICG organization. The Picture in Picture API is now available at