This post is part of the $100 Answers Series.
Context: We have a web application that requires users to login with their own password. Assume that the application is storing passwords in a custom database table--the solution isn't dependent on database functionality, but databases are common and I like to build my examples on something concrete. We'll also assume that the user's password is sent to the application over SSL/TLS and that the threat model for the application considers that acceptable.
- Require your passwords to be as long and as complex as your users will tolerate and still use efficiently
- Get the user's raw password
- Create a salt value by generating a 128-bit random number
- Concatenate the salt and raw password and feed them into SHA-256:
hashedPassword = SHA256( salt + rawPassword )
- Encode hashedPassword so that it is text and not binary
- Store hashedPassword in the password column in the database
- Store salt in the salt column in the database
- Require the user to change the password every two years
The rationale behind these steps is described in the rest of this post.
The procedures outlined above will give us decent protection in most cases. If a library doesn't have SHA-256, we'll go for SHA-1. If SHA-1 isn't available, consult a local security expert to determine which of the available algorithms is suitable. However, be prepared to swap out the hashing function. MD5's weaknesses are such that it's hard to recommend new uses, and SHA-1 is showing signs of weakening against the onslaught of hash research. It is best to be prepared and build your code to support swapping in a new hash function in the future.
Will this protect me from brute force attacks?
Yes. There are two kinds of brute force attacks: online and offline. An online brute force attack involves simply trying guess the password across the network. The attacker (or more likely, the attacker's automated script) guesses a password and tries to login. If he fails, he guesses another password and tries again. And repeats. And repeats. Until he succeeds or gives up.
An offline attack requires the attacker to already have the hashed password (perhaps he stole it from a backup tape). The attacker then guesses a password, hashes it, and then compares the generated hash to the hash he already has. If they match, he has found the password. Otherwise he continues guessing and hashing until he gives up.
The primary defense against brute force attacks is to require suitably long and complex passwords, say, twelve or more characters that include letters (upper and lower case), numbers, and special characters. An interesting consideration in this regard is that if you require all passwords to include both upper and lower case characters then you make the attacker's job somewhat easier because he now knows that he doesn't have to test all of the cases where the password is just lower case letters, just uppercase letters, just numbers, or otherwise doesn't include both upper and lower case letters. This isn't a reason not to require mixed case, it's just good to know what you're doing.
Throttling is an additional layer of defense against online brute force attacks. Throttling means that attempts to login after failed logins are made to take longer or are simply denied for some period of time. For instance, after an account has had 5 consecutive failed login attempts, subsequent attempts trigger a timer so that the success or failure isn't reported back to the user until after 5 seconds.
What about stretching? Will that slow down offline attacks?
Yes, but I don't recommend it for most web applications. Spend your extra CPU cycles doing other useful things. Like logging, monitoring, and filtering inputs and outputs.
Stretching a password generally means that we recursively hash the password a bunch of times. The exact number of recursions is chosen so that it won't inconvenience a legitimate user yet will seriously hinder a brute force attacker. When an attacker tries an offline attack he'll have to repeat all of the recursive calls for each guess. This slows the offline attacks. Of course our web application must also consume extra CPU cycles for the recursive hashing, but in many cases this hit is not too significant. I recommend bringing in your performance experts to examine the potential impact of a stretching routine.
All of this is fine, and it will make the attacker's life harder. Still, I don't think it's worth the effort. Let's take a look at the abuse case. Offline brute forcing SHA-256 is not all that practical today. But it may be practical in 10 years. The attack is this: our adversary steals the hashed passwords today. He sits on them for a while and then in 10 years pulls them out and cracks them.
Now for the tricky part. Is our web application still around? Are there still active accounts from 10 years ago that have not changed their password in the intervening time? The likelihood of all of these things happening seems small enough that I don't lose sleep over it. I wouldn't expect a physical safe to resist 10 years of safecracking effort.
Still, let's say the data is sensitive enough that we are concerned about this attack. So we recursively hash the password to the point that we think it'll still be secure in 10 years. Good for us. In this case, though, our intrepid attacker simply waits another 5 or 10 years and cracks it with the technology that is current then.
There is no way to prevent this type of attack. At some point we have to say the security is good enough and stop chasing decreasing returns. I happen to think that it's not even worth starting down the stretching path with web application passwords. Hopefully, in 10 years time we won't even be using passwords, or at least they'll be augmented by something else.
However the concern is legitimate and luckily there is a simple solution: have the users change their passwords. Make the password change annual or every 5 years or every decade. As long as they change the password, the attack described above is thwarted.
The final step we should take to thwart brute force attacks is to introduce tripwire accounts. These are accounts that are not tied to any user. No one should ever attempt to login to them, and any attempt to do so is either a mistake or an attack. We set up a system so that login attempts to a tripwire account are immediately reported. Any successful login is immediately escalated. Be monitoring a set of these accounts with password of varying strength, we can get a feel for the types of attacks launched against the application and its users.
Will this protect me from the dreaded RAINBOW TABLES?
Yes. I've seen some confusion between offline attacks that use rainbow tables and attacks that use precomputed hashes based on dictionaries. So let's make sure that we're all on the same page. A textbook attack against hashed passwords simply hashes each item in a list of all possible passwords. Then the attacker compares a hashed password to his gigantic list of hashes and finds the password. It's that simple. Except...
The list of hashes can be really big. Gigantic is a serious understatement. For instance, if a system required passwords of up to 8 case-sensitive alphanumeric characters then the hash table to crack those passwords contains just over 220 trillion (220*10^12) items. If we were using SHA-1, which produces hashes that are 20 bytes long, we'd need over 4*10^15 bytes of storage. That's over 4000 terabytes. Did I mention that the list of precomputed hashes is really big?
Since storing that much data is beyond the reach of most attackers (today...), they instead hash lists of likely passwords--such lists include names, words, dates, simple concatenations, and so forth. These lists are called dictionaries, and they are pretty effective. This is why avoiding common words, dates, etc. is a good practice.
Rainbow tables are another way of optimizing attacks using precomputed hashes. I encourage you to go read Wikipedia on rainbow tables, but the basic idea is to create a huge database of the starting and ending points of hash chains (a chain is a sequence of hashes where the current hash value is used to generate the next hash value). The attacker then creates a new chain of hashes based on the password hash that he's attacking and compares the values in that chain to the hashes in the rainbow table. If he finds a match, then he can (in many cases--but it's not guaranteed) recompute the hash chain from the rainbow table to find the actual password.
Salts, as described in the main answer, are the primary defense against these sorts of precomputation attacks. Using a salt means that the precomputed hashes have to be recomputed to incorporate the salt. This is possible, but generally not feasible because the entire 4000 terabytes of data have to be recomputed or the entire set of hash chains have to be regenerated. And this has to be done for each salt value you use.
Does the salt really need to be different for each password?
That's the best way to do it. A salt that is the same for every password is still better than no salt at all (see the discussion above), but unless we have specific requirements that absolutely prevent a unique salt for each password we should generate a random, unique salt for each password. By the way, in cryptography, a salt that is random and unique for each password is a type of nonce, a number used once.