Safe Tag for Limiting Functionality

HyperSpec · 2015-12-04

After conducting some research, it seems there is currently no tag, nor any reasonable method, for limiting the functionality of the HyperText Markup Language.

In some scenarios, it may be desirable to limit how one section of markup may behave. For example, for user input, it may be that links and images should be allowed, but not script execution or the application of Cascading Style Sheets.

Even if one were to create a server-side method for filtering this type of content (allowing, for example, only anchor and image tags), there is no guarantee that the method will keep generated content safe as the specifications change, which presents a security risk.

yoavweiss · 2015-12-04

It would be useful if you could state a clear and concrete use case: What are you trying to limit? Why?

HyperSpec · 2015-12-04

The scenario I am working on at the moment is limiting the functionality available to a certain code block that is generated from user-submitted content. Specifically, I would like to allow anchor and image tags, but not the application of styles or the execution of scripts.

I believe allowing scripts would put users of the site at risk. For instance, a script could redirect users to a look-alike site that collects information under the pretext that they are submitting it to a trusted site, or it could interfere with existing scripts, et cetera.

yoavweiss · 2015-12-04

Have you looked into Content Security Policy?

HyperSpec · 2015-12-04

I believe that is simply for the loading of external resources. For example, I would like to allow:

<a href="https://focalfuse.com">Focal Fuse LLC</a>

but not (assuming jQuery is used):

background-color:#000000;

<script> $('a').attr('href','https://focalfuse.com'); </script>

While still allowing the execution of inline scripts (which have been dynamically inserted on the client side).
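For readers comparing the options: a Content Security Policy can restrict inline scripts and inline styles as well as external resources, although it applies to the whole page rather than to one section of markup. The following is only a minimal sketch of such a policy. `buildPolicy` is a hypothetical helper, and the nonce is assumed to be generated freshly for every response by a cryptographically secure source elsewhere.

```javascript
// Hypothetical helper: builds a Content-Security-Policy header value.
// The nonce is passed in; it must be unpredictable and unique per response.
function buildPolicy(nonce) {
  // Accept only base64 / base64url text so the value cannot break the header.
  if (!/^[A-Za-z0-9+\/_-]+={0,2}$/.test(nonce)) {
    throw new Error("nonce must be base64 or base64url text");
  }
  return [
    "default-src 'self'",
    `script-src 'nonce-${nonce}'`, // only scripts carrying this nonce may run
    "style-src 'self'",            // no inline <style> or style="" attributes
    "img-src *",                   // images may load from any network origin
  ].join("; ");
}

console.log(buildPolicy("abc123"));
```

Under a policy like this, a `<script>` block that arrives inside user-submitted markup has no nonce and is refused, while the page's own scripts that carry the nonce keep running.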

tabatkins · 2015-12-04

If you’re including user-generated code, and you want to limit its ability to attack the rest of the page, you want <iframe sandbox>. You can control the sandbox more specifically by specifying values in the sandbox attribute, but by default it’ll lock things down nice and safe.
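As a rough illustration of the above, the sketch below builds the markup for a sandboxed iframe. `sandboxedFrameTag` and the `/user-content/42` URL are made up for the example, and the token list is only a subset of the values the HTML specification defines for the `sandbox` attribute.

```javascript
// A subset of the sandbox tokens defined by the HTML specification.
const KNOWN_TOKENS = new Set([
  "allow-forms",
  "allow-popups",
  "allow-same-origin",
  "allow-scripts",
  "allow-top-navigation",
]);

// Hypothetical helper: returns markup for a sandboxed iframe.
// An empty token list gives sandbox="", which applies every restriction.
function sandboxedFrameTag(src, tokens = []) {
  for (const token of tokens) {
    if (!KNOWN_TOKENS.has(token)) {
      throw new Error("unknown sandbox token: " + token);
    }
  }
  // Escape the two characters that matter inside a double-quoted attribute.
  const safeSrc = src.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  return `<iframe sandbox="${tokens.join(" ")}" src="${safeSrc}"></iframe>`;
}

console.log(sandboxedFrameTag("/user-content/42"));
console.log(sandboxedFrameTag("/user-content/42", ["allow-forms"]));
```

Each token that is added relaxes one restriction, so starting from the empty value and adding only what is needed keeps the default as locked down as possible.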

AshleyScirra · 2015-12-04

Actually there is one potentially huge use case for this: user-submitted posts with HTML formatting. Loads of blogs, forums, and comment sections support some kind of de facto standard BBCode for basic markup, such as [b]bold[/b] and [i]italic[/i], or Markdown like **bold** *italic*. These force web developers to re-invent subsets of HTML for security reasons. Using unfiltered HTML itself is incredibly dangerous due to users being able to post arbitrary <script> tags, and even filtering HTML is incredibly difficult given the number of places JavaScript can be included in HTML combined with encoding variations, such as <img src=j&#X41vascript:alert('my js')>.
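To make the encoding problem concrete, here is a small sketch of why a naive check misses the `j&#X41vascript:` example. The function names are invented for the illustration, and this is not a sanitizer: it handles only numeric character references and ignores the many other ways script can appear in markup.

```javascript
// Naive check: looks only at the raw attribute text.
function naiveIsScriptUrl(raw) {
  return raw.startsWith("javascript:");
}

// Decodes numeric character references such as &#X41 or &#65;
// The trailing semicolon is treated as optional, as in the example above.
function decodeNumericRefs(text) {
  return text.replace(/&#(?:x([0-9a-f]+)|([0-9]+));?/gi, (match, hex, dec) => {
    const code = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Out-of-range values become the replacement character instead of throwing.
    return code > 0x10ffff ? "\uFFFD" : String.fromCodePoint(code);
  });
}

// Slightly better check: decode first, then compare case-insensitively.
function isScriptUrl(raw) {
  return decodeNumericRefs(raw).trim().toLowerCase().startsWith("javascript:");
}

const payload = "j&#X41vascript:alert('my js')";
console.log(naiveIsScriptUrl(payload));  // false: the raw text does not match
console.log(decodeNumericRefs(payload)); // jAvascript:alert('my js')
console.log(isScriptUrl(payload));       // true
```

Even the second check covers just one encoding trick, which is the point being made: every additional variation needs its own handling, and missing any one of them leaves a hole.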

Something like a <sandbox> tag could solve this. An iframe is probably overkill for including user-submitted content, but a tag could provide similar protections for the content inside of it. Like with the iframe sandbox attribute, by default markup inside <sandbox> would:

not allow script execution in any way
not allow forms
not allow plugins

The restrictions should probably go beyond iframe sandboxing to also:

not allow custom CSS styles
block potentially dangerous tags like <iframe>
block potentially annoying tags like <video>, <audio>
block tags that become useless under the above restrictions, like <canvas>, <dialog>, form controls

Like iframe sandboxing the restrictions could be customised with attributes, such as <sandbox allow="video"> in case a forum wants to allow videos to be included in posts.

Ideally the end result is web developers can ultimately paste user-submitted HTML between <sandbox> and </sandbox> and still have a secure website, without having to use a custom markup engine.

HyperSpec · 2015-12-04

I asked this question earlier and was told no such tag existed.

It may not be as fine-tuned as I was looking for, but I will see what I’m able to do with it. Thank you.

AshleyScirra · 2015-12-04

I wasn’t describing an existing tag; I was suggesting a new <sandbox> tag based on the restrictions of iframe sandboxes. It’s basically a renaming of your proposed <safe> tag with a bit more description of how it works and its use cases.

HyperSpec · 2015-12-04

I just checked. I do not believe the iframe method would work in my case. I am attempting something like the following (example with jQuery):

<sandbox>
<div id="container"></div>
</sandbox>
<script>
$('#container').append('Text');
</script>

Tigt · 2015-12-04

You can mimic that functionality using a data: URI as the <iframe>'s src, if it helps. srcdoc is a nicer way of handling it, but the browser support isn’t there yet.
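A minimal sketch of the data: URI approach, assuming the content is assembled as a string before the frame is created (a sandboxed frame without `allow-same-origin` cannot be reached into by the outer page's script afterwards). `htmlToDataUri` is a hypothetical helper name.

```javascript
// Hypothetical helper: turns an HTML string into a data: URI.
// encodeURIComponent percent-encodes ", &, <, > and spaces, so the result
// is also safe to place inside a double-quoted src attribute.
function htmlToDataUri(html) {
  return "data:text/html;charset=utf-8," + encodeURIComponent(html);
}

// Build the content first, then embed it.
const content = '<div id="container">Text</div>';
const uri = htmlToDataUri(content);
const frame = `<iframe sandbox src="${uri}"></iframe>`;

console.log(uri);
console.log(frame);
```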

tabatkins · 2015-12-04

This was indeed one of the leading use-cases for sandboxed iframes, yes. (The other was limiting the damage an injected ad can do.)

Been discussed to death, doesn’t work, unfortunately. A quick rundown of some of the reasons:

There are use-cases for running script or CSS inside, just preventing it from attacking the rest of the page. This means you need some strong boundary separating the stuff inside from the stuff outside. Separate documents are the existing and well-proven way to do that, and iframes are the existing way to embed separate documents in each other - it just makes sense to reuse the existing stuff, rather than reinventing something new and subtly different.
This is a security primitive. When designing a good security primitive intended to be used by non-experts, you need to make it as hard to screw up as possible. Having a tag that contains the hostile code directly means you need to somehow defend against the hostile code containing a </sandbox> tag, closing the sandbox early and escaping into the outer page. Only secure way to do that is to have some unpredictable token appearing at the start and end, but that means: (a) inventing some way to put a token into the end-tag, which doesn’t exist yet, and (b) relying on non-expert authors to generate unpredictable tokens, which we know for a fact doesn’t work. (People generate weak tokens, or reuse tokens, constantly.) Alternately you can try to depend on authors always correctly finding and escaping </sandbox> tags, which we also know for a fact doesn’t work (correctly escaping content is already something people screw up all the time, and that’s trivial “just escape all the <'"& characters”, not non-trivial escaping of HTML syntax).
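The early-close problem can be sketched with plain strings. `<sandbox>` is the hypothetical tag proposed in this thread, not something browsers implement, and `naiveWrap` stands in for a template that pastes user content between the tags without escaping.

```javascript
// Pastes user content between the hypothetical tags, with no escaping.
function naiveWrap(userHtml) {
  return "<sandbox>" + userHtml + "</sandbox>";
}

// Hostile comment: closes the sandbox itself, then adds a script.
const hostileComment = "hi</sandbox><script>alert('escaped')</script><sandbox>";
const page = naiveWrap(hostileComment);

// A parser would end the sandbox at the FIRST closing tag it sees.
const closeTag = "</sandbox>";
const firstClose = page.indexOf(closeTag);
const insideSandbox = page.slice("<sandbox>".length, firstClose);
const outsideSandbox = page.slice(firstClose + closeTag.length);

console.log(insideSandbox);  // hi
console.log(outsideSandbox); // <script>alert('escaped')</script><sandbox></sandbox>
```

Everything after the first closing tag, including the script, ends up in the outer page, which is why the content cannot simply sit inline between a start and end tag.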

iframes deal with this by, at minimum, requiring you to put the content into a data: url, which needs url escaping to work a lot of the time, and at worst requires just attribute escaping (whatever quoting character you use). HTML defined the .srcdoc property to make this even easier, by letting you put in raw HTML with no escaping needed whatsoever.
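When the frame is written out as markup rather than set from script, the content sits in an attribute value and needs only the attribute escaping mentioned above. A minimal sketch, assuming a double-quoted `srcdoc` attribute; the helper names are invented for the example.

```javascript
// In a double-quoted attribute only & and " need escaping.
// & must be replaced first so the entities added for " are not re-escaped.
function escapeForDoubleQuotedAttribute(value) {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Hypothetical helper: embeds untrusted HTML in a sandboxed iframe.
function srcdocFrame(userHtml) {
  return `<iframe sandbox srcdoc="${escapeForDoubleQuotedAttribute(userHtml)}"></iframe>`;
}

// Hostile input that tries to close the attribute and the iframe early.
const hostileDoc = '"></iframe><script>alert(1)</script>';
console.log(srcdocFrame(hostileDoc));
// <iframe sandbox srcdoc="&quot;></iframe><script>alert(1)</script>"></iframe>
```

The hostile quote becomes `&quot;`, so the attribute value never ends early and the whole string stays inside the frame's document. Assigning the string to the frame's `srcdoc` property from script avoids even this step.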

Most of these are available to sandboxed iframes, via the sandboxing options.