Whether dialog's magic sizing and positioning can be done by CSS is a good question for @tabatkins and others. My impression is that dialog positioning is stateful: the HTML spec says (vaguely) to do it when what Blink would call a layout object is created for the dialog. CSS tends not to be stateful in that way.
I think inert should not be expected to explain dialog in general, but it should work consistently with how dialog inertness works. I'm fine with that being layered: dialog built on toplayer + ::backdrop, and toplayer + ::backdrop affecting inertness. But the reality of the HTML spec today is that dialog interacts directly with inert.
I think the explainer would be clearer if it separated two things. One is the concept of inertness, which is a property of nodes and affects things like "how does this appear to accessibility tech?" In this sense of "inert", I think the statements the explainer makes about dialog, like inertness being an implementation prerequisite, make sense.
Then there's a second concept: the inert attribute, and how it sets the inertness property of a subtree. In this sense, a lot of the statements about inert polyfilling dialog seem dubious to me. For example, I don't think the inert attribute is useful for polyfilling dialog. The DOM structure constraints the inert attribute puts you under (the dialog cannot sit inside any subtree marked inert, so content has to be moved or rewrapped) will "break" CSS selectors, and you will need to polyfill CSS to undo that breakage.
I also think you can't polyfill dialog without toplayer/blockingElements. Without toplayer, the polyfill-dialog will have to be a child of body to be rendered, and with the inert attribute, I'm not sure there's a way to make text node children of body inert without also making the polyfill-dialog inert. Could shadow DOM do it, with a slot being inert?
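To make the text-node concern concrete, here is a minimal TypeScript sketch. It uses a deliberately simplified tree model, not the real DOM: `TreeNode`, `el`, `text`, `effectiveInertness`, `inertSiblings` and `buildPage` are all hypothetical names invented for illustration. The only assumptions are that the inert attribute can be set on elements (not text nodes) and that every descendant of an element carrying it is inert. It does not model toplayer or shadow DOM slots.

```typescript
// Simplified, hypothetical model of a document tree (not the DOM API).
// Assumption: only elements can carry the inert attribute, and it is
// inherited by every descendant.
type TreeNode =
  | { kind: "text"; data: string }
  | { kind: "element"; tag: string; inertAttr: boolean; children: TreeNode[] };

function el(tag: string, children: TreeNode[] = [], inertAttr = false): TreeNode {
  return { kind: "element", tag, inertAttr, children };
}

function text(data: string): TreeNode {
  return { kind: "text", data };
}

// A node is inert if it, or any ancestor element, carries the inert attribute.
function effectiveInertness(
  node: TreeNode,
  inheritedInert = false,
  out: Map<TreeNode, boolean> = new Map(),
): Map<TreeNode, boolean> {
  const inert = inheritedInert || (node.kind === "element" && node.inertAttr);
  out.set(node, inert);
  if (node.kind === "element") {
    for (const child of node.children) effectiveInertness(child, inert, out);
  }
  return out;
}

// Hypothetical polyfill strategy A: mark every element sibling of the dialog
// inert. Text nodes cannot carry attributes, so they are skipped.
function inertSiblings(body: TreeNode, dialog: TreeNode): void {
  if (body.kind !== "element") return;
  for (const child of body.children) {
    if (child !== dialog && child.kind === "element") child.inertAttr = true;
  }
}

// A body with a bare text node, a <main>, and the polyfilled dialog.
function buildPage(bodyInert = false) {
  const strayText = text("text directly inside body");
  const main = el("main", [text("page content")]);
  const dialog = el("polyfill-dialog", [text("dialog content")]);
  const body = el("body", [strayText, main, dialog], bodyInert);
  return { body, strayText, main, dialog };
}

// Strategy A: element siblings become inert, but the bare text node does not.
const a = buildPage();
inertSiblings(a.body, a.dialog);
const stateA = effectiveInertness(a.body);
console.log(stateA.get(a.main), stateA.get(a.strayText), stateA.get(a.dialog)); // true false false

// Strategy B: set inert on body itself. The text node is now inert, but so is the dialog.
const b = buildPage(true);
const stateB = effectiveInertness(b.body);
console.log(stateB.get(b.main), stateB.get(b.strayText), stateB.get(b.dialog)); // true true true
```

Under these assumptions neither strategy produces "everything except the dialog is inert": in this model a text node child of body can only become inert through an inert ancestor, and body is also the dialog's ancestor.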
I have the same kind of questions about the explainer's claim that inert is useful for polyfilling blocking elements. The impression I get is that inert is sufficient to polyfill dialog and toplayer, but I don't think that is intended.
My $0.02: I think I could understand the explainer more readily if it separated the inert bit on content from the inert attribute, and focused more on the positive use case of offscreen, hidden and non-interactive content. That dialog and blocking elements may also "set" the inert bit is an important technical detail to consider.