When you sign up or change your password, many sites place specific restrictions on what your password can contain:
- Alphabet (many places limit special characters)
- Must contain X uppercase/lowercase/numeric/special characters (usually 1, and some places only require 3 out of those 4 groups)
- Must not contain specific phrases (like your username/email or the site name)
- Must be at least N characters and at most M characters
These are typically validated client-side using JS logic, but password managers need this information as well to know what passwords they can generate for that site.
We could in theory accommodate literally all of that with `pattern`, using the general template `^(?=((REQUIRED_TYPES).)+)(?!.*BANNED_SUBSTRINGS)[ALPHABET]{MIN,MAX}$`:
- The alphabet is trivially encoded: an alphanumeric-only policy is as simple as `^[A-Za-z0-9]+$`. Minimum and maximum length are just as trivially encoded.
- Things like “needs 1 character in at least 3 of these groups: uppercase characters, lowercase characters, numeric characters, special characters” could be done by inserting `(?=.*([A-Z].*([a-z].*[0-9\W-]|[0-9].*[\W-])|[a-z].*[0-9].*[\W-]))` at the beginning of the regexp, after the initial `^`. That version only matches when the groups appear in that order, though. To cover every ordering, you need to either enumerate all the possible combinations or generate and optimize it more or less algorithmically.
- Things like “must not contain specific phrases” could be done by inserting `(?!.*(?:foo|bar))` at the beginning of the regexp, after the initial `^`.
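The “generate it algorithmically” route can be sketched roughly as follows. This is only an illustration, not a proposed API: `combinations`, `buildPattern`, and the shape of the policy object are hypothetical names invented for the example. Instead of enumerating orderings, it emits one lookahead per class for every combination of the required size, and it assumes the banned substrings have already been regexp-escaped.

```javascript
// Hypothetical helper: every k-element combination of `items`.
function combinations(items, k) {
  if (k === 0) return [[]];
  if (items.length < k) return [];
  const [first, ...rest] = items;
  return [
    ...combinations(rest, k - 1).map((combo) => [first, ...combo]),
    ...combinations(rest, k),
  ];
}

// Builds a `pattern` string from a policy object (shape assumed for this sketch).
// `classes` holds character class bodies such as "A-Z";
// `banned` holds substrings that are already regexp-escaped.
function buildPattern({ alphabet, classes, required, banned, min, max }) {
  // One alternative per combination; each alternative is a run of
  // independent lookaheads, so the order of characters does not matter.
  const anyCombo = combinations(classes, required)
    .map((combo) => combo.map((body) => `(?=.*[${body}])`).join(""))
    .join("|");
  const bannedCheck = banned.length ? `(?!.*(?:${banned.join("|")}))` : "";
  return `^(?:${anyCombo})${bannedCheck}[${alphabet}]{${min},${max}}$`;
}

const examplePattern = buildPattern({
  alphabet: "A-Za-z0-9+_%@!$*~-",
  classes: ["A-Z", "a-z", "0-9", "+_%@!$*~-"],
  required: 3,
  banned: ["alice"],
  min: 8,
  max: 16,
});
```

Even for this small policy the generated string is long, and the number of alternatives grows combinatorially with the number of classes.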
To show how this all works together, here are some example password requirements:
- Must have at least one character from 3 of these groups:
  - Uppercase letters
  - Lowercase letters
  - Digits
  - Symbols: +_%@!$*~-
- Must be between 8 and 16 characters in length
- Must not match or contain your username
<input type="password"
pattern="^(?=.*([A-Z].*([a-z].*[\d+_%@!$*~-]|\d.*[a-z+_%@!$*~-]|[+_%@!$*~-].*[a-z\d])|[a-z].*([A-Z].*[\d+_%@!$*~-]|\d.*[A-Z+_%@!$*~-]|[+_%@!$*~-].*[A-Z\d])|\d.*([A-Z].*[a-z+_%@!$*~-]|[a-z].*[A-Z+_%@!$*~-]|[+_%@!$*~-].*[A-Za-z])|[+_%@!$*~-].*([A-Z].*[a-z\d]|[a-z].*[A-Z\d]|\d.*[A-Za-z])))(?!.*{{escapeRegExp(username)}})[\w+_%@!$*~-]+$"
minlength="8"
maxlength="16"
>
But the problem is, `pattern` gets unwieldy in a hurry, as you can see (the regexp above is roughly 300 characters excluding the username), so developers are unlikely to want to make broad use of it. Also, password generators need an explicit alphabet, as parsing `pattern` to get the underlying list of possibilities is straight-up intractable for them. (Even deciding whether two regular expressions are equivalent is PSPACE-complete, and therefore NP-hard, and real-world regexps add fairly high constant factors on top of that.)
So I propose we add a few more attributes to `<input type="password">`, to both reduce the difficulty of adoption and enable password managers to better understand the constraints of the system they’re working with:
- `alphabet` - The allowed alphabet as a single character class body, if only a subset of characters is allowed. (I’ve not once encountered a site that operated on a blacklist here, only a whitelist.)
- `requiredclasses` - A space-separated list of required regexp character class bodies enclosed in brackets. (These are intentionally character classes, as spaces could be in a required group and as things like `\d`, `\W`, and `\p{Ll}` would be incredibly useful to simplify their values. They are not general regexps, as password managers still need to parse them and they need to be easy to reason about.)
- `requiredclasscount` - The number of the above classes that must be matched. (Default is the number of classes specified.)
- `disallowedwords` - A space-separated list of banned words.
We already have `minlength` and `maxlength`, so I’m obviously not going to propose either of those.
To show how that could simplify things, here’s how that same password policy could be encoded:
<input type="password"
minlength="8"
maxlength="16"
alphabet="A-Za-z0-9+_%@!$*~-"
requiredclasses="[A-Z] [a-z] [0-9] [+_%@!$*~-]"
requiredclasscount="3"
disallowedwords="{{username}}"
>
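To make the intended semantics concrete, here is a rough sketch of how a browser or a script might check a value against these attributes. The attribute names are the ones proposed above; the function `checkPassword`, its argument shape, and the choice of case-sensitive substring matching for `disallowedwords` are assumptions made purely for this example.

```javascript
// Sketch only: checks `password` against the proposed attributes.
// `attrs` mirrors the attribute values as they would appear in the markup.
function checkPassword(password, attrs) {
  const {
    minlength = 0,
    maxlength = Infinity,
    alphabet = "",
    requiredclasses = "",
    requiredclasscount,
    disallowedwords = "",
  } = attrs;

  if (password.length < minlength || password.length > maxlength) return false;

  // `alphabet` is a class body, so it is wrapped in brackets here.
  if (alphabet && !new RegExp(`^[${alphabet}]*$`, "u").test(password)) return false;

  // Pull out each bracketed class; a plain split on spaces would break
  // classes that themselves contain a space.
  const classes = requiredclasses.match(/\[(?:\\.|[^\]\\])*\]/g) ?? [];
  const matched = classes.filter((cls) => new RegExp(cls, "u").test(password)).length;
  // Default: every listed class is required.
  if (matched < (requiredclasscount ?? classes.length)) return false;

  // Assumption: banned words are matched as case-sensitive substrings.
  const words = disallowedwords.split(/\s+/).filter(Boolean);
  return !words.some((word) => password.includes(word));
}

const exampleAttrs = {
  minlength: 8,
  maxlength: 16,
  alphabet: "A-Za-z0-9+_%@!$*~-",
  requiredclasses: "[A-Z] [a-z] [0-9] [+_%@!$*~-]",
  requiredclasscount: 3,
  disallowedwords: "alice",
};
```

Each check is a short, independent step, which is the main point of splitting the policy across attributes instead of packing it into one regexp.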
Note: HTML minifiers should be able to optimize these patterns to something smaller, like the `alphabet` above to `alphabet="\w+%@!$*~-"` and `[0-9]` to `[\d]` in `requiredclasses`.
The ultimate goal is to encourage developers to specify password restrictions for a few reasons:
- It’s easier for them to check, as they can just offload the work to the browser.
- It’s easier for password managers to hook into it and know what exactly to generate, without the user even needing to configure it manually.
- When password managers are seamless, users are far more likely to use them, and so they end up much more secure by default as a result.
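As an illustration of the password manager side, a generator could work from the declared alphabet and classes roughly like this. It is a sketch under several assumptions: `expandClass` and `generatePassword` are hypothetical names, the class bodies are assumed to contain only literal characters and `x-y` ranges (escapes such as `\d` are not handled), `requiredclasses` is assumed to be already split into class bodies, and `length` is assumed to be at least the number of classes. It runs on Node.js and uses `randomInt` from `node:crypto` for randomness.

```javascript
const { randomInt } = require("node:crypto");

// Expands a simple class body such as "A-Za-z0-9_-" into its characters.
// Assumption: literals and x-y ranges only; a trailing "-" is a literal.
function expandClass(body) {
  const chars = [];
  for (let i = 0; i < body.length; i++) {
    if (body[i + 1] === "-" && i + 2 < body.length) {
      const end = body.charCodeAt(i + 2);
      for (let c = body.charCodeAt(i); c <= end; c++) chars.push(String.fromCharCode(c));
      i += 2; // skip the "-" and the range end
    } else {
      chars.push(body[i]);
    }
  }
  return [...new Set(chars)];
}

function pick(chars) {
  return chars[randomInt(chars.length)];
}

function generatePassword({ alphabet, requiredclasses, length, disallowedwords = [] }) {
  const pool = expandClass(alphabet);
  let password;
  do {
    // One character from every class (restricted to the alphabet), which
    // satisfies any requiredclasscount up to the number of classes...
    const chars = requiredclasses.map((body) =>
      pick(expandClass(body).filter((ch) => pool.includes(ch)))
    );
    // ...then fill the rest from the whole alphabet.
    while (chars.length < length) chars.push(pick(pool));
    // Fisher-Yates shuffle so the required characters are not always first.
    for (let i = chars.length - 1; i > 0; i--) {
      const j = randomInt(i + 1);
      [chars[i], chars[j]] = [chars[j], chars[i]];
    }
    password = chars.join("");
    // Retry in the unlikely case a banned word shows up by chance.
  } while (disallowedwords.some((word) => word && password.includes(word)));
  return password;
}

const generated = generatePassword({
  alphabet: "A-Za-z0-9+_%@!$*~-",
  requiredclasses: ["A-Z", "a-z", "0-9", "+_%@!$*~-"],
  length: 16,
  disallowedwords: ["alice"],
});
```

None of this requires parsing a general regexp, which is the property the explicit attributes are meant to provide.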