Our approach in this paper is therefore twofold. First, we demonstrate
that the use of Shannon’s entropy as defined in NIST SP800-63 is
not an effective metric for password security. Second, we attempt
to gauge the security provided by conventional password creation
rules. We accomplish both of these tasks by performing standard
password cracking attacks against multiple sets of real life
passwords. These passwords, which are described in more
detail in Section 4 and Appendices 1 and 2, were all obtained from
publicly disclosed hacking attacks, in which an attacker
collected the passwords, either through a phishing attack or by
compromising a website, and for whatever reason posted the
password lists online. These lists in some cases can be quite large,
as in the RockYou set [6] which contained over 32 million
passwords. Admittedly, these datasets can be problematic, since
none of them represents corporate logins. A counter-argument can
easily be made that people on average choose stronger passwords
for more sensitive sites. That being said, these datasets still
represent a significant number of user password creation strategies
and can be applied to evaluate the expected success rate of
different types of attacks. We hope this focus on real passwords
and real attack methodologies can provide a better understanding
of the effectiveness of different password creation policies.
The remainder of this paper is structured as follows: Section 2
details some of the previous work done in this area. Section 3
covers the NIST SP800-63 model of password entropy. Section 4
illustrates why the NIST notion of password entropy does not
provide an accurate view of password security. Section 4 further
goes on to demonstrate the effectiveness of password cracking
strategies against traditional password creation rules. Finally,
Section 5 discusses password creation policies that might be more
applicable when defending against online attacks.
2. Previous Work
There have been several previous attempts to measure password
security by analyzing real life passwords. One of the first papers
to take this approach was written in 1978 by R. Morris and K.
Thompson [19]. They found that in a group of around 3,000 users,
one third of the passwords were vulnerable to a dictionary attack
containing 250,000 words. When combined with a limited brute
force attack, they estimated over 86% of the passwords could be
cracked. Since then several other studies have found similar
results. In [20], A. Narayanan and V. Shmatikov ran experiments
against 142 real user passwords and were able to break 67.6% of
them using a Markov-based brute force attack. In [21], Yan,
Blackwell, and Anderson found that, when testing a group of 300
student passwords, 32% of the control group was cracked via a
limited dictionary-based attack. In [22], Wu collected over 25
thousand Kerberos v4 tickets and attempted to crack the
corresponding user passwords. In that experiment, only 8.1% of
the passwords were cracked over a two-week period due to the
computational complexity of making a password guess. Perhaps
the largest previous study on password security was done by
Stone-Gross et al. when their team temporarily took over the Torpig
botnet [23]. During the ten-day period they had control of the
botnet, their group collected over 297 thousand unique
username/password pairs from 52 thousand infected computers.
To test the strength of the plaintext passwords collected, they
hashed 173 thousand unique passwords with the MD5 hashing
algorithm and then proceeded to use the popular password
cracking tool John the Ripper to try to crack the hashes using an
offline attack. During the course of a 75-minute cracking session,
the team managed to break over 40% of the passwords. What’s
more, they found that 28% of users re-used the same password
across multiple sites. This closely matches an earlier study by
Sophos [24], where 33% of users polled admitted to using the
same password for all of their internet logins. If this holds true,
passwords gathered from low-value targets, such as
social networking websites, might successfully be used by an
attacker against higher-value targets such as webmail and bank
accounts. It also means that the results of studying these “low
value” passwords may provide us insight into the effectiveness of
password creation policies for higher value sites.
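The offline attack described above, in which plaintext passwords are hashed with MD5 and then attacked with a wordlist, can be illustrated with a minimal Python sketch. This is not the procedure used in [23] or the behavior of John the Ripper; the passwords, the wordlist, and the function names `md5_hex` and `dictionary_attack` are hypothetical and chosen only for illustration.

```python
import hashlib


def md5_hex(password: str) -> str:
    """Return the unsalted MD5 digest of a password as a hex string."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def dictionary_attack(target_hashes, wordlist):
    """Return {hash: plaintext} for every target hash matched by a guess."""
    remaining = set(target_hashes)
    cracked = {}
    for guess in wordlist:
        digest = md5_hex(guess)
        if digest in remaining:
            cracked[digest] = guess
            remaining.discard(digest)
    return cracked


# Hypothetical data: three "leaked" passwords and a tiny wordlist.
passwords = ["password", "monkey", "xK9#qLw2"]
targets = [md5_hex(p) for p in passwords]
wordlist = ["123456", "password", "monkey", "dragon"]

cracked = dictionary_attack(targets, wordlist)
print(f"cracked {len(cracked)} of {len(targets)} hashes")
```

Because the hashes are unsalted, each guess is hashed once and compared against every target at the same time, which is part of why offline attacks of this kind can be fast.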
That being said, none of the above studies focused specifically on
the security that password creation policies actually provide, such
as the effect password length has on password strength. There has
been some research into how effective the notion of Shannon
entropy is for measuring password strength (and, by extension, the
recommendations put forward by NIST SP800-63). The most notable
papers covering the subject have been [7, 8], but those studies
focused exclusively on the theoretical underpinnings of trying to
convert the Shannon entropy to the guessing entropy of a system,
and did not verify their findings using real user passwords. In the
pessimistically titled paper “Password Exhaustion: Predicting the
End of Password Usefulness” [25], Clair et al. attempted to
evaluate the search spaces produced by different password
creation policies along with their resistance to attack. They found
that certain password policies might actually weaken systems
against brute force attacks due to the reduction in key space. They
then collected 3,500 student passwords and attempted to crack
them using a 20-node cluster of computers. This resulted in their
team breaking 34% of the passwords in five days, with the vast
majority of the cracked passwords (almost 90%) falling to brute
force attacks. Unfortunately, their tests did not attempt to measure
the security provided (or reduced) by the application of different
password creation policies beyond
their resistance to brute force attacks. Therefore, we feel that the
results and strategies detailed in this paper are fairly novel as we
attempt to gauge the security of password creation policies by
examining real user passwords and their resistance to
dictionary-based attacks.
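The observation that a creation policy can shrink the key space can be made concrete with a small calculation. The sketch below assumes a hypothetical policy, not one taken from [25]: passwords of a fixed length drawn from lowercase letters and digits, with at least one digit required. The function names are illustrative only. Requiring a digit removes every all-letter password from the space an attacker must search.

```python
from itertools import product


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of strings of the given length over the alphabet."""
    return alphabet_size ** length


def keyspace_with_required_digit(letters: int, digits: int, length: int) -> int:
    """Strings over letters+digits containing at least one digit."""
    return keyspace(letters + digits, length) - keyspace(letters, length)


def brute_force_count(letters: str, digits: str, length: int) -> int:
    """Cross-check by enumeration; only practical for tiny alphabets."""
    return sum(
        1
        for candidate in product(letters + digits, repeat=length)
        if any(ch in digits for ch in candidate)
    )


# Assumed policy: 8 characters, 26 lowercase letters, 10 digits.
total = keyspace(36, 8)
allowed = keyspace_with_required_digit(26, 10, 8)
print(f"unrestricted key space: {total:,}")
print(f"with a required digit:  {allowed:,}")
print(f"fraction removed:       {1 - allowed / total:.1%}")
```

This counts only the theoretical search space. It says nothing about how users actually distribute their choices within that space, which is the question the dictionary-based experiments in this paper address.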
One other paper that bears mentioning is a survey of password
creation and storage policies among several popular websites by J.
Bonneau and S. Preibusch [26]. There are too many interesting
findings from that paper to list here, and it is highly recommended
reading to help put the results detailed later in this paper into
context with how password policies are currently implemented.
For example, a vast majority of the websites Bonneau and
Preibusch examined, including sites such as eBay, Amazon.com,
and WordPress, did not support rate limiting the number of
guesses allowed to an attacker.
3. Password Entropy per NIST SP800-63
As mentioned previously, the password recommendations
provided in the NIST document are based on the idea of
information entropy. The notion of entropy detailed in
Equation #1 can be extended by noting that the joint entropy
of several random variables can be modeled as:
$$H(x, y) = H(x) + H(y \mid x) \qquad (2)$$
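As a numerical illustration, the standard chain rule for joint entropy, H(x, y) = H(x) + H(y | x), can be checked on a small example. The sketch below is not taken from the NIST document; the joint distribution, the symbols x and y, and the function names are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of a {outcome: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal_x(joint):
    """Marginal distribution of x from a {(x, y): probability} mapping."""
    marginal = defaultdict(float)
    for (x, _), p in joint.items():
        marginal[x] += p
    return dict(marginal)


def conditional_entropy(joint):
    """H(y | x): the entropy of y given each x, weighted by P(x)."""
    total = 0.0
    for x, p_x in marginal_x(joint).items():
        cond = {y: p / p_x for (xi, y), p in joint.items() if xi == x}
        total += p_x * entropy(cond)
    return total


# Hypothetical joint distribution over two password characters.
joint = {("a", "1"): 0.5, ("a", "2"): 0.25, ("b", "1"): 0.125, ("b", "2"): 0.125}

h_joint = entropy(joint)
h_x = entropy(marginal_x(joint))
h_y_given_x = conditional_entropy(joint)
print(f"H(x, y)         = {h_joint:.4f} bits")
print(f"H(x) + H(y | x) = {h_x + h_y_given_x:.4f} bits")
```

The two printed values agree, which is what the chain rule states: the uncertainty of the pair is the uncertainty of the first variable plus whatever uncertainty remains in the second once the first is known.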
The NIST document attempts to define these random
variables by specifying how they are created through the use of
common password creation policies. These random variables can
be viewed as representing an unknown value that an attacker