By now, I'd think most people are probably aware of the recent Ashley Madison hack. There's also been a lot of talk about the usefulness of the leaked bcrypt hashes. Almost every professional password cracker I have spoken with seems to agree that cracking these hashes would be pointless. Recently, however, Dean Pierce wrote a blog post which made some waves, chronicling his exploits with these hashes on a $1500 rig. I'll give him kudos for performing his research and sharing it with the community. I've had some interesting conversations around that research this week, and thought maybe it was time to do an experiment of my own.

I really didn't want to spend an enormous amount of time or effort, so I spent about an hour setting everything up, and then just let it run.

A lot of people have been talking about (and praising) his use of a bitcoin mining rig to attack these hashes, but IMO that's just plain wrong to do. Bcrypt is not what we consider a "GPU-friendly" algorithm. Can you crack hashes with GPUs? Sure- but you will be wasting plenty of power and time doing so, as I will demonstrate here. At the moment, your best choices for bcrypt are still CPU and FPGA. If you care about cost, CPU is your friend.

So let's start our own experiment: How long would it take us to pass Dean's 4,007 mark? How many can we crack in the same timeframe as Dean (5 days)? Can we do this cheaper? And what does it all mean?

The Hardware

I decided to setup three different boxes for this experiment. The first will be my standard password cracking rig. It's an open air rig with four R9 290X AMD GPUs. This is pretty close to what Dean used, but they have a little bit more memory. The cost was probably around $1600 to build.

My second box is a dual quad-core Xeon HP server (DL360 g6). Since CPUs are our friends, this box (giving us 16 useable cores for hashcat with hyper-threading) should be pretty friendly. It's a little old, but I expect it should give us some interesting stats. Doing a quick check on ebay, I can find similar DL360s for around $300 (some of them less).

The last box I decided to borrow a 48-core Xeon server. More cores is always better, right? This guy is probably closer to the cost of the GPU rig, which is not really unreasonable for someone to purchase.

It's also worth noting, I am using Debian for all my boxes.

Using GPUs

So now we begin... I noticed two issues with Dean's tests. First, the OS he used (PIMP) which is supposed to abstract things like drivers for you may not be the best choice here. The requirements for cracking passwords vs bitcoin mining are actually fairly different, on both the hardware and software sides. You can use pretty much any driver you want for bitcoin mining, but oclHashcat is very sensitive to particular bugs you will find in the more recent AMD Catalyst drivers. If you are running anything newer than 14.9 (especially with oclHashcat 1.36) you are going to incur some performance penalties. This is probably why he had to use the --force flag in his examples. Generally you don't want to have to do that with oclHashcat. Additionally, the latest version of oclHashcat is actually 1.37, which also has some small performance boosts, so that is what I am going to use here.

Now, the Ashley Madison dump isn't the largest hash list I've seen, so I wanted to experiment with loading in the entire list on 290X cards as opposed to Dean's 290 cards. Fortunately, I can successfully load all 36 million in a single job. Using the rockyou list (as he did), I got the following speed stat:

Speed.GPU.#*...: 210 H/s

It's a little faster than what Dean received, but not by any margin we really care about. A pity, but not unexpected (it would probably run a little faster using the same sample size he used.) I figured I would just let this run and see how well it did as is.

On to 16 Cores

With the first box happily cracking away, I setup the next job on my 16-core Xeon machine. For this, I turned to hashcat 0.50, which is the current CPU release available to the public.

Although my first job was close to what Dean Pierce did in his blog article, I wanted to optimize the CPU jobs a little. Again, didn't want to spend a lot of time on this, so I kept it simple. I would split up the hashes in separate, sorted and uniqued lists, each containing 1 million entries. I took the second list of 1 million as my target (mostly so I could avoid cracking dupes from the GPU).

Why would I do this? Well, it all comes down to performance. We need to spend X amount of time working on a hash before moving on to the next one. Hashcat will run through all iterations of your dictionary against a hash, and then proceed. Thus, we have two bottlenecks here:

  • the size of your dictionary
  • the size of your hash list

What we want to do is cycle through as many of the hashes as possible with the most relevant passwords we can find. A shorter dictionary could mean that we get through more of the bcrypt list (with faster algorithms, shorter dictionaries could actually lead to a loss of performance due to an inability to gain full acceleration, but that's not really a concern in this instance).

Now that I have the modified hash list, I need a better dictionary. For simplicity, I chose the first 250 lines of the top 500 worst passwords. Let's see what we can get...

Fun with 48 Cores!

Of course, the final box is setup with hashcat 0.50 just like the 16-core box. I used the third list of 1 million hashes and the same 250 line dictionary.

And then We Wait...

I waited just 24 hours and collected the results of my pots.

  • GPU 898 cracks
  • 16-core 1199 cracks
  • 48-core 3336 cracks

That came out to 5,433 cracks total (all unique hashes). So it took just one day, and we were able to pass the 4,007 mark (albeit not on a single machine).

Sadly, the GPU rig looks like it was doing only marginally better than Dean's rig. After 5 days, at it's current rate, it would probably only have around 4,500 cracks. It was pretty obvious that any additional stats from this box were unimportant, but I let it run anyway.

After 5 days, here were the updated totals:

  • GPU 4686 cracks
  • 16-core 5159 cracks
  • 48-core 15824 cracks

In total, that's 25,669 unique hashes cracked for 48 hours, and among those, only 399 unique plaintext passwords. At this point, we can end the experiment and try to figure out if we've actually proved anything.

What does all of this mean?

Honestly.... not much. It does demonstrate that GPUs are not the way to attack bcrypt, and that you can easily buy a lot more effective hardware for about the same price Dean spent on his rig.

Dean did make a comment about how linear these cracks were discovered. Like his own results, cracks found over time were quite linear for the duration of the experiment. Some may ask, why would that be? It's actually pretty simple. The duration of the experiment is too small for the complexity of the algorithm. As Dean theorized, the number of cracks found per time unit should decline once the weak password threshold is reached. However, it then becomes important to understand what that threshold is. If one were to attack a sample of say 100 md5 leaks from the past year, you would find that maybe 60% + of these passwords can be found with dictionary attacks using weak passwords (this data was gained from a random selection of lists I tested against). If we were to use that threshold here, it would take centuries before you would reach the point where that graph declines. The exact same list done in md5s would be noticeable over a period of minutes.

Ultimately, should we be impressed? Not really. I think the point of password cracking is to learn from the information in the leaks so we can improve security. With the Ashley Madison data, you don't actually learn anything. We use these lists with extremely common and weak passwords, so of course anything we find is going to be weak! And really, there's no introspection into the quality of the entire dump. As arstechnica pointed out, Dean's data represents only 0.0668% of his 6 million hash pool. Which is only 0.0111% for the entire 36 million hash dump. This is hardly enough of a sample for scientific analysis. We really can't learn anything useful in a sufficient period of time, and there are plenty of other dumps out there which are both easier to attack and will cultivate a much greater pool of information (especially for the longer and more complex password combinations.)

As has already been mentioned in a ton of other places, all of this really illustrates that cracking bcrypt with a cost of 12 is just pointless in the common era. There are a ton of dumps out there where companies are still using md5, sha1, and similar algorithms. I think a short-term goal should be to try and convince those companies to update their password storage mechanisms (working on better security practices and mechanisms globally is going to take a lot longer.) Regardless of whether or not you agree with the security practices of Ashley Madison, they definitely did their password storage correctly.

Kudos to Ashley Madison for that.
And thanks to Dean Pierce for his initial research.

Some Data

For anyone interested, here are the top 50 cracks.

$ sort -nr hashcat.pot | head -n 50 | awk '{ print $2, "("$1")" }' | column -t
123456 (8568)
12345 (2650)
password (1086)
qwerty (876)
12345678 (726)
1234567 (511)
fuckme (375)
abc123 (344)
pussy (278)
111111 (275)
11111 (246)
696969 (241)
654321 (203)
baseball (184)
football (165)
superman (159)
michael (158)
harley (139)
jackson (121)
asshole (118)
mustang (112)
letmein (108)
monkey (106)
dragon (99)
soccer (98)
horny (95)
aaaaaa (93)
shadow (91)
iloveyou (90)
hockey (86)
daniel (84)
7777777 (84)
andrew (83)
jordan (81)
jennifer (81)
buster (81)
charlie (78)
trustno1 (76)
master (76)
george (76)
lucky (75)
bulldog (73)
tigers (71)
bailey (71)
sunshine (70)
hunter (69)
ranger (68)
whatever (67)
lover (67)
money (66)