Your question makes sense, as I haven't listed the reasons in the explainer. One is that there is already a proprietary API on the Web (Safari), and it is video-only; the MVP is meant to be compatible with it. As the explainer says, allowing arbitrary elements to enter PIP is something we should at least consider for the future. Another reason, also related to Apple, is that the iOS and macOS APIs are video-only.
In general, PIP is mostly oriented toward media content. Mozilla and Opera have a browser feature that only works on videos, and so does Samsung Browser. Allowing arbitrary content has security implications, as the window sits on top of all others and has minimal chrome. All the APIs I looked at restrict how the user can interact with a PIP window, so allowing arbitrary content would require developers to know and understand these restrictions. Because these system APIs are still new and, again, because the use case is very small, it sounded like we should focus on the main use case and leave the door open for more sophisticated ones.