A partial archive of discourse.wicg.io as of Saturday February 24, 2024.

Add password restriction attributes to `<input type="password">`

isiahmeadows
2020-09-05

When you’re signing up and changing your password, many places have specific restrictions on what characters you’re allowed to use:

  • Alphabet (many places limit special characters)
  • Must contain X uppercase/lowercase/numeric/special characters (usually 1, and some places only require 3 out of those 4 groups)
  • Must not contain specific phrases (like your username/email or the site name)
  • Must be at least N characters and at most M characters

These are typically validated client-side using JS logic, but password managers need this information as well to know what passwords they can generate for that site.

We could in theory accommodate literally all of that with pattern, using the general template ^(?=.*(REQUIRED_TYPES))(?!.*(BANNED_SUBSTRINGS))[ALPHABET]{MIN,MAX}$:

  • The alphabet is trivially encoded: alphanumeric-only passwords, for example, are encoded as easily as ^[A-Za-z0-9]+$. Minimum and maximum length are just as trivially encoded, with {MIN,MAX} in place of the +.
  • Things like “needs 1 character in at least 3 of these groups: uppercase characters, lowercase characters, numeric characters, special characters” could be done by inserting (?=.*([A-Z].*([a-z].*[0-9\W-]|[0-9].*[\W-])|[a-z].*[0-9].*[\W-])) right after the initial ^. That fragment only accepts the groups in one fixed order (uppercase, then lowercase, then digits, then specials), though, so you need to either enumerate all the possible orderings or factor them more or less algorithmically.
  • Things like “must not contain specific phrases” could be done by inserting (?!.*(foo|bar)) right after the initial ^.
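The enumeration approach from the list above can be automated. Below is a minimal TypeScript sketch; the policy shape (`alphabet`, `requiredClasses`, `requiredCount`, `banned`, `min`, `max`) and the helpers `escapeRegExp`, `orderedTuples`, and `buildPasswordPattern` are all hypothetical names chosen for illustration. Rather than factoring the alternation by hand, it emits every ordering of k distinct classes as a flat list, so 3 of 4 groups yields 4 × 3 × 2 = 24 alternatives.

```typescript
// Hypothetical helper (JavaScript has no built-in): escape regexp syntax characters.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Every ordered k-tuple of distinct items: k = 2 of [a, b, c] gives ab, ac, ba, bc, ca, cb.
function orderedTuples<T>(items: T[], k: number): T[][] {
  if (k === 0) return [[]];
  const out: T[][] = [];
  items.forEach((item, i) => {
    const rest = [...items.slice(0, i), ...items.slice(i + 1)];
    for (const tail of orderedTuples(rest, k - 1)) out.push([item, ...tail]);
  });
  return out;
}

// Hypothetical policy shape mirroring the template's placeholders.
interface Policy {
  alphabet: string;          // ALPHABET: character-class body, e.g. "A-Za-z0-9"
  requiredClasses: string[]; // REQUIRED_TYPES: full classes, e.g. ["[A-Z]", "[a-z]"]
  requiredCount: number;     // how many of requiredClasses must appear
  banned: string[];          // BANNED_SUBSTRINGS, as literal text
  min: number;               // MIN
  max: number;               // MAX
}

function buildPasswordPattern(p: Policy): string {
  if (p.requiredCount > p.requiredClasses.length) throw new Error("requiredCount too large");
  let source = "^";
  if (p.requiredCount > 0) {
    // Each alternative is one ordering, e.g. [A-Z].*[a-z].*[0-9]
    const alternatives = orderedTuples(p.requiredClasses, p.requiredCount).map((t) => t.join(".*"));
    source += `(?=.*(?:${alternatives.join("|")}))`;
  }
  if (p.banned.length > 0) {
    // Reject the password if any banned literal appears anywhere in it.
    source += `(?!.*(?:${p.banned.map(escapeRegExp).join("|")}))`;
  }
  return `${source}[${p.alphabet}]{${p.min},${p.max}}$`;
}
```

The flat list grows as n!/(n-k)!, which is why factoring it by hand, as in the example below, pays off once more groups are involved.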

To show how this all works together, here’s some example password requirements:

  • Must have at least one character from 3 of these groups:
    • Uppercase letters
    • Lowercase letters
    • Digits
    • Symbols: +_%@!$*~
  • Must be between 8 and 16 characters in length
  • Must not match or contain your username
<input type="password"
    pattern="^(?=.*([A-Z].*([a-z].*[\d+_%@!$*~-]|\d.*[a-z+_%@!$*~-]|[+_%@!$*~-].*[a-z\d])|[a-z].*([A-Z].*[\d+_%@!$*~-]|\d.*[A-Z+_%@!$*~-]|[+_%@!$*~-].*[A-Z\d])|\d.*([A-Z].*[a-z+_%@!$*~-]|[a-z].*[A-Z+_%@!$*~-]|[+_%@!$*~-].*[A-Za-z])|[+_%@!$*~-].*([A-Z].*[a-z\d]|[a-z].*[A-Z\d]|\d.*[A-Za-z])))(?!.*{{escapeRegExp(username)}})[\w+_%@!$*~]+$"
    minlength="8"
    maxlength="16"
>

But the problem is that pattern gets unwieldy in a hurry, as you can see (the regexp above is almost 300 characters, excluding the username), so developers are unlikely to want to make broad use of it. Also, password generators need an explicit alphabet, since parsing pattern to recover the underlying set of possibilities is straight-up intractable for them. (Regular expression equivalence alone is PSPACE-complete, and therefore NP-hard, and real-world regexps add fairly high constant factors on top of that.)
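A quick way to sanity-check a pattern this long is to compile it outside the page. The TypeScript sketch below uses the example pattern with the opening parenthesis after the leading `[A-Z].*` that the top-level alternation needs. `escapeRegExp` is a hypothetical helper (JavaScript has no built-in for it), and compiling with the `u` flag is only an approximation of how a browser treats `pattern`, not an exact replica.

```typescript
// Hypothetical helper; JavaScript has no built-in escapeRegExp.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// The example policy's pattern, with the username interpolated where the
// post has {{escapeRegExp(username)}}. String.raw keeps the backslashes intact.
function examplePattern(username: string): RegExp {
  const source = String.raw`^(?=.*([A-Z].*([a-z].*[\d+_%@!$*~-]|\d.*[a-z+_%@!$*~-]|[+_%@!$*~-].*[a-z\d])|[a-z].*([A-Z].*[\d+_%@!$*~-]|\d.*[A-Z+_%@!$*~-]|[+_%@!$*~-].*[A-Z\d])|\d.*([A-Z].*[a-z+_%@!$*~-]|[a-z].*[A-Z+_%@!$*~-]|[+_%@!$*~-].*[A-Za-z])|[+_%@!$*~-].*([A-Z].*[a-z\d]|[a-z].*[A-Z\d]|\d.*[A-Za-z])))(?!.*${escapeRegExp(username)})[\w+_%@!$*~]+$`;
  return new RegExp(source, "u");
}
```

Each top-level branch fixes which group appears first, and the nested group then accepts any second and third group in any order, so passwords like "1abcdefA" (digit, lowercase, uppercase) are accepted as well as "Abcdefg1". Length is still left to minlength and maxlength.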

So I propose we should add a few more attributes to <input type="password">, to help both reduce difficulty of adoption and enable password managers to better understand the constraints of the system they’re working with:

  • alphabet - The allowed alphabet as a single character class body, if only a subset of characters is allowed. (I’ve not once encountered a site that used a blacklist here, only a whitelist.)
  • requiredclasses - A space-separated list of required regexp character classes, brackets included. (These are intentionally bracketed character classes, since a required group could itself contain a space, and since things like \d, \W, and \p{Ll} would be incredibly useful for simplifying their values. They are not general regexps, because password managers still need to parse them out and they need to be easy to reason about.)
  • requiredclasscount - The number of the above classes required to be matched. (Default is the number of classes specified.)
  • disallowedwords - A space-separated list of banned words.

We already have minlength and maxlength, so I’m obviously not going to propose either of those.
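To make the proposed semantics concrete, here is a TypeScript sketch of the check a browser (or a password manager) might run for these attributes. None of this is a real API: the `PasswordRules` shape, the `splitClasses` and `checkPassword` helpers, the violation names, and the choice to compare disallowed words case-insensitively are all assumptions for illustration. The one subtle part is splitting requiredclasses, since a space inside brackets belongs to the class.

```typescript
// Hypothetical shape of the proposed attributes (numeric ones already parsed).
interface PasswordRules {
  minlength?: number;
  maxlength?: number;
  alphabet?: string;           // character-class body, e.g. "A-Za-z0-9"
  requiredclasses?: string;    // bracketed classes, e.g. "[A-Z] [a-z] [0-9]"
  requiredclasscount?: number; // defaults to the number of classes
  disallowedwords?: string;    // space-separated
}

// Split on spaces outside brackets, so "[ ]" survives as one class; escapes are kept intact.
function splitClasses(attr: string): string[] {
  const out: string[] = [];
  let current = "";
  let inClass = false;
  for (let i = 0; i < attr.length; i++) {
    const ch = attr[i];
    if (ch === "\\" && i + 1 < attr.length) {
      current += ch + attr[++i]; // e.g. \] or \p stays with its class
      continue;
    }
    if (ch === "[") inClass = true;
    else if (ch === "]") inClass = false;
    if (ch === " " && !inClass) {
      if (current) out.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  if (current) out.push(current);
  return out;
}

// Returns the names of violated rules; an empty array means the password passes.
function checkPassword(password: string, rules: PasswordRules): string[] {
  const problems: string[] = [];
  // Lengths are counted in UTF-16 code units, like String.prototype.length.
  if (rules.minlength !== undefined && password.length < rules.minlength) problems.push("tooShort");
  if (rules.maxlength !== undefined && password.length > rules.maxlength) problems.push("tooLong");
  if (rules.alphabet !== undefined && !new RegExp(`^[${rules.alphabet}]*$`, "u").test(password)) {
    problems.push("badCharacter");
  }
  const classes = splitClasses(rules.requiredclasses ?? "");
  const needed = rules.requiredclasscount ?? classes.length;
  const matched = classes.filter((cls) => new RegExp(cls, "u").test(password)).length;
  if (matched < needed) problems.push("missingClasses");
  // Assumption: disallowed words are matched case-insensitively.
  const lower = password.toLowerCase();
  const banned = (rules.disallowedwords ?? "").split(/\s+/).filter((w) => w.length > 0);
  if (banned.some((w) => lower.includes(w.toLowerCase()))) problems.push("disallowedWord");
  return problems;
}
```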

To show how that could simplify things, here’s how that same password policy could be encoded:

<input type="password"
    minlength="8"
    maxlength="16"
    alphabet="A-Za-z0-9+_%@!$*~-"
    requiredclasses="[A-Z] [a-z] [0-9] [+_%@!$*~-]"
    requiredclasscount="3"
    disallowedwords="{{username}}"
>

Note: HTML minifiers should be able to optimize these patterns to something smaller, like the alphabet above to alphabet="\w+%@!$*~-" and [0-9] to [\d] in requiredclasses.
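Whether such a rewrite is safe can be checked mechanically, because under the `u` flag (without `i`) \w is exactly [A-Za-z0-9_] and \d is exactly [0-9]. The TypeScript helper below, `sameClass`, is hypothetical: it compares two class bodies over every Unicode scalar value, which takes a moment but is exhaustive.

```typescript
// Hypothetical check: do two character-class bodies accept exactly the same code points?
function sameClass(bodyA: string, bodyB: string): boolean {
  const a = new RegExp(`^[${bodyA}]$`, "u");
  const b = new RegExp(`^[${bodyB}]$`, "u");
  for (let cp = 0; cp <= 0x10ffff; cp++) {
    if (cp >= 0xd800 && cp <= 0xdfff) continue; // surrogates are not scalar values
    const ch = String.fromCodePoint(cp);
    if (a.test(ch) !== b.test(ch)) return false;
  }
  return true;
}

console.log(sameClass("A-Za-z0-9+_%@!$*~-", String.raw`\w+%@!$*~-`));
```

The underscore is what makes the first rewrite work: \w already contains it, so it can be dropped from the explicit list.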

The ultimate goal is to encourage developers to specify password restrictions for a few reasons:

  1. It’s easier for developers to check, as they can just offload the work to the browser.
  2. It’s easier for password managers to hook into it and know exactly what to generate, without the user even needing to configure it manually.
  3. When password managers are seamless, users are far more likely to use them, and they end up much more secure by default as a result.
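On the password-manager side, an explicit alphabet turns generation into simple sampling. The sketch below is TypeScript on Node and uses the real `randomInt` from `node:crypto`; everything else (`expandClass`, `generatePassword`, and the restriction to printable ASCII candidates) is a hypothetical illustration. It picks requiredclasscount distinct classes, draws one character from each, fills the rest from the alphabet, then shuffles.

```typescript
import { randomInt } from "node:crypto";

// Hypothetical helper: list the printable ASCII characters (0x21-0x7E) a character class admits.
function expandClass(classSource: string): string[] {
  const re = new RegExp(`^${classSource}$`, "u");
  const chars: string[] = [];
  for (let cp = 0x21; cp <= 0x7e; cp++) {
    const ch = String.fromCharCode(cp);
    if (re.test(ch)) chars.push(ch);
  }
  return chars;
}

// Hypothetical generator for the proposed attributes (disallowedwords not handled).
function generatePassword(
  alphabetBody: string,
  requiredClasses: string[],
  requiredCount: number,
  length: number,
): string {
  const alphabet = expandClass(`[${alphabetBody}]`);
  // Keep only the part of each required class that the alphabet actually allows.
  const pools = requiredClasses
    .map((cls) => expandClass(cls).filter((ch) => alphabet.includes(ch)))
    .filter((pool) => pool.length > 0);
  if (alphabet.length === 0 || pools.length < requiredCount || length < requiredCount) {
    throw new Error("rules cannot be satisfied");
  }
  // Move requiredCount randomly chosen pools to the front (partial Fisher-Yates).
  for (let i = 0; i < requiredCount; i++) {
    const j = i + randomInt(pools.length - i);
    [pools[i], pools[j]] = [pools[j], pools[i]];
  }
  // One guaranteed character per chosen class, then fill from the whole alphabet.
  const chars = pools.slice(0, requiredCount).map((pool) => pool[randomInt(pool.length)]);
  while (chars.length < length) chars.push(alphabet[randomInt(alphabet.length)]);
  // Full Fisher-Yates shuffle so the guaranteed characters are not always first.
  for (let i = chars.length - 1; i > 0; i--) {
    const j = randomInt(i + 1);
    [chars[i], chars[j]] = [chars[j], chars[i]];
  }
  return chars.join("");
}
```

It does not check disallowedwords; a caller could simply regenerate until none appear. The output is also not perfectly uniform over all valid passwords, since the guaranteed characters slightly favor the required classes.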
liamquin
2020-09-05

> When you’re signing up and changing your password, many places have specific restrictions on what characters you’re allowed to use:
>
>   • Alphabet (many places limit special characters)
>   • Must contain X uppercase/lowercase/numeric/special characters (usually 1, and some places only require 3 out of those 4 groups)
>   • Must not contain specific phrases (like your username/email or the site name)
>   • Must be at least N characters and at most M characters

The trouble is, these rules considerably facilitate brute-force attacks (by reducing the size of the search space) and so are not necessarily a good idea. On the other hand if developers are already doing this:

<input type="password"
    pattern="^(?=.*([A-Z].*([a-z].*[\d+_%@!$*~-]|\d.*[a-z+_%@!$*~-]|[+_%@!$*~-].*[a-z\d])|[a-z].*([A-Z].*[\d+_%@!$*~-]|\d.*[A-Z+_%@!$*~-]|[+_%@!$*~-].*[A-Z\d])|\d.*([A-Z].*[a-z+_%@!$*~-]|[a-z].*[A-Z+_%@!$*~-]|[+_%@!$*~-].*[A-Za-z])|[+_%@!$*~-].*([A-Z].*[a-z\d]|[a-z].*[A-Z\d]|\d.*[A-Za-z])))(?!.*{{escapeRegExp(username)}})[\w+_%@!$*~]+$"
    minlength="8"
    maxlength="16"
>

then maybe it makes things no worse? For sure this is worth improving.
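How much a rule like “3 of 4 groups” shrinks the search space can be computed directly with inclusion-exclusion over the groups. This TypeScript sketch assumes the example policy's group sizes (26 uppercase, 26 lowercase, 10 digits, and the 8 symbols +_%@!$*~); the function names are hypothetical. It uses plain numbers, so counts are exact only while they stay below 2^53 (up to length 8 here); beyond that they are approximate, which is still fine for comparing bits.

```typescript
// Group sizes for the example policy (assumed): A-Z, a-z, 0-9, and +_%@!$*~
const SIZES = [26, 26, 10, 8];

function intPow(base: number, exp: number): number {
  let result = 1;
  for (let i = 0; i < exp; i++) result *= base;
  return result;
}

function popcount(x: number): number {
  let c = 0;
  for (; x; x >>= 1) c += x & 1;
  return c;
}

// Number of length-n strings over all groups that use at least k distinct groups.
function countAtLeastKClasses(sizes: number[], n: number, k: number): number {
  const m = sizes.length;
  // onlyFrom[T] = strings drawn only from the groups in bitmask T = (total size of T)^n
  const onlyFrom: number[] = [];
  for (let t = 0; t < 1 << m; t++) {
    let size = 0;
    for (let i = 0; i < m; i++) if (t & (1 << i)) size += sizes[i];
    onlyFrom.push(intPow(size, n));
  }
  let total = 0;
  for (let t = 0; t < 1 << m; t++) {
    if (popcount(t) < k) continue;
    // Strings using exactly the groups in T: alternate signs over every subset u of T.
    let exact = 0;
    for (let u = t; ; u = (u - 1) & t) {
      const sign = (popcount(t) - popcount(u)) % 2 === 0 ? 1 : -1;
      exact += sign * onlyFrom[u];
      if (u === 0) break;
    }
    total += exact;
  }
  return total;
}

// Bits of search space with no group rule (k = 0) versus the 3-of-4 rule.
for (let n = 8; n <= 16; n++) {
  const all = countAtLeastKClasses(SIZES, n, 0);
  const rule = countAtLeastKClasses(SIZES, n, 3);
  console.log(n, Math.log2(all).toFixed(2), Math.log2(rule).toFixed(2));
}
```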

<input type="password"
    minlength="8"
    maxlength="16"
    alphabet="A-Za-z0-9+_%@!$*~-"
    requiredclasses="[A-Z] [a-z] [0-9] [+_%@!$*~-]"
    requiredclasscount="3"
    disallowedwords="{{username}}"
>

Disallowing Greek letters, accented Latin letters, and Chinese and Japanese characters excludes a lot of people. The site in question might have all its content in Latin or English (say), and it’s only an example, but the same approach gets much, much harder for (say) Japanese developers.

Maybe specifying the input in terms of Unicode classes would be better? And what is an alphabet for Chinese or Japanese? Maybe characters-allowed?

isiahmeadows
2020-09-06

The information is usually already public and available to more focused attackers, and there are (very incomplete) databases out there that already catalog password spaces for some sites. I’m not sure what benefit hiding it provides other than security through very mild obscurity. It’s also worth mentioning that brute force is the last option, not the first; hackers almost always go through numerous levels:

  1. They try the most common stuff first (like “password”, “letmein”, “abc123”, etc.)
  2. They then try somewhat more common things that aren’t just static (like site name and username parts).
  3. Only after trying the obvious things like the above do they go for a leaked plaintext or MD5-hashed (trivially cracked) database, and then, they check the obvious things like reused passwords first.
  4. If that fails, they then try to find other leaked databases and try the above plus substitutions like “4” for “a”, “$” for “s”, case differences, etc., using the fact they don’t need to issue remote requests to accelerate the process.
  5. Then they attempt basic statistical attacks and easy brute-force attacks, taking advantage of common templates like 4-8 letters + 2 numbers, 4-8 letters + 4 numbers, word substitutions with a dictionary, common character substitutions like the above, and so on that significantly narrow the key space.
  6. Then they attempt to exploit partial rainbow tables and such if they exist.
  7. Then they attempt to brute force it using any and all known weaknesses in the hash function to reduce it as much as they can.

For context: bots operated by hackers usually stop at step 1. Hackers only interested in low-hanging fruit mostly stop at step 2 unless they have easy access to such a database, and most hackers with such access stop at step 3 or 4 unless they’re specifically targeting someone. Unless you’re a very high-value target in the hacker’s mind, they’re unlikely to go beyond step 5. Even if they’re left with step 7, they’re more likely to bring a rubber hose than to attempt it, unless the key space is known to be very small (like 8-10 characters or less) and they don’t expect your password to change in the time it would take to crack it.

I was using it as an example, but I took it from a business’s password policy, the only actual change being mandating certain character classes. (And yes, the business in question only does business in English.) My actual proposal here specifies it in terms of JS regexp character classes, so yes, you could use requiredclasses="[\p{Lu}] [\p{Ll}] [\p{Lo}] [\p{Lm}] [\p{N}] [\p{M}\p{S}\p{Zs}\p{P}]" requiredclasscount="3" for a more internationally friendly version.
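Those classes can be exercised directly. In JavaScript regexps, lowercase \p{...} matches characters that have a Unicode property, while uppercase \P{...} matches characters that lack it, so the lowercase form is the one that means “contains an uppercase letter” and so on. The TypeScript sketch below counts how many of the six classes a password touches; `UNICODE_CLASSES` and `classesMatched` are hypothetical names, and each class is compiled with the `u` flag, which Unicode property escapes require.

```typescript
// The six classes, written with lowercase \p (matches characters that HAVE the property).
const UNICODE_CLASSES: string[] = [
  String.raw`[\p{Lu}]`,                // uppercase letters
  String.raw`[\p{Ll}]`,                // lowercase letters
  String.raw`[\p{Lo}]`,                // other letters (e.g. kana, CJK ideographs)
  String.raw`[\p{Lm}]`,                // modifier letters (e.g. the katakana long-vowel mark)
  String.raw`[\p{N}]`,                 // numbers
  String.raw`[\p{M}\p{S}\p{Zs}\p{P}]`, // marks, symbols, space separators, punctuation
];

// Hypothetical helper: how many of the given classes occur somewhere in the password.
function classesMatched(password: string, classes: string[]): number {
  return classes.filter((cls) => new RegExp(cls, "u").test(password)).length;
}
```

With uppercase \P{Lu} instead, the first class would match any character that is not an uppercase letter, so almost every password would satisfy it.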

As for the term “alphabet”, I meant that in the mathematical sense, not in the linguistic sense, as in practice we’re viewing it as a set of distinct symbols, not a set of distinct graphemes corresponding to individual phonemes. (Also, informally, most people say “Japanese alphabet”, not “Japanese syllabary”, so risk of confusion in that area is likely also negligible.)

oliverdunk
2020-09-16

Hi @isiahmeadows! I work at 1Password, although I came across this post in my own time.

Are you familiar with some of the work Apple is doing? The passwordrules attribute can be added to input fields in Safari, and these will be used when generating passwords. There’s some information here, as well as in the release post.

There’s a repo in GitHub, also maintained by Apple, called Password Manager Resources. This is an open place where the rules for particular sites can be shared, and we use data from here in 1Password X, our browser extension.

Finally, there’s a proposal in the whatwg repo and some interesting discussion there.

Glad to see so much happening in this space at the moment. It’s very exciting.

isiahmeadows
2020-09-30

I was not aware of that work by Apple, but it’s very interesting. I’ll definitely check it out; that’s pretty much what I was proposing here.

> There’s a repo in GitHub, also maintained by Apple, called Password Manager Resources. This is an open place where the rules for particular sites can be shared, and we use data from here in 1Password X, our browser extension.

I have seen that repo before and it was actually one of the inspirations for my write-up here. :slight_smile: