Attacking Google Authenticator

A story about RFC 6238

DISCLAIMER: I hate to keep mentioning it, but please don't abuse this knowledge. There are enough malicious actors out there already. This article was written to try and help our defenders of the realm to be more effective.

Security models always seem to be great until they are not. The curious part is how many people seem to beg the question more often than not. It's easy to just blindly accept something as true without challenging it (such as the safety of a given model). As a community, we should support challenging perceptions and sharing knowledge with others.

At DefCon 26 I was drinking with some friends, discussing hashing and encryption, when the topic of RFC 6238 came up. This RFC defines an open standard for Time-based One-Time Passwords (TOTP) to be used as a factor in authentication schemes. For the unfamiliar, it's pretty popular and widely supported. Many common multi-factor authentication mechanisms rely on implementing this spec (for example Google Authenticator, Duo Mobile, Authy, pyotp for python, and others.) But all of these services are only secure if the secret is confidential. Some people were convinced the TOTP implementations are flawed, and others were not. So can we break this security model by generating the same TOTP codes as a particular target?

After a bit of research and testing, I submitted attack mode 18100 to Hashcat.

NOTE: I think the process for submitting any PR to the Hashcat project is fascinating. Mine have been worthwhile exercises in good coding practices and collaborative development. I would recommend others contribute to this project for the experience as well.

How does this work?

NOTE: I think the theories behind attacks are important, but if you don't care for the details just skip ahead to "Simulating an attack" in the next section.

The RFC itself is fairly straight forward. It builds on a previous RFC for OTP, but uses a timestamp to create a derived counter.

We start with our secret. It's just a base32 encoded value. This really could be anything, but it seems that most implementations have standardized on defaulting to encoding a random 10 byte value. (This leaves us with 16 bytes of encoded text.)

We then take the HMAC of our counter, using our encoded secret as the key. Let's call the product of this H.

An offset is next calculated from the end bits of H, and we extract four bytes of the hash from that offset and convert them to an unsigned integer. Using our integer, we take the modulo of 10^length (where length is the number of digits you want in your resulting code; up to 10), and we now have our TOTP code.

Now, if we can reliably determine which secret generates the OTP code in question at a particular point in time we will have a practical attack. This means that we need at least two pieces of information:

A TOTP code
The rough timestamp at which this code was generated

The caveat here is that the reduction on the HMAC hash results in a large number of collisions for a single point in time. Between two points in time, the collisions seem to be unique, but we should have a duplicate for the real secret which created our codes.

Simulating an attack

Let's now see if we can simulate an attack within a controlled keyspace. In order to keep our OpenCL kernels fast, the base32 encoding of our secret is only performed by the CPU once a candidate has been matched. This means that all of the candidates we try will be raw, but the output of a match will be base32 encoded.

Don't try to attack the base32 value directly. This won't work!

For this test, we will take our raw secret (hashcat) and base32 encode it to produce NBQXG2DDMF2A====. This represents our target secret which we hope to recover. In the wild, it would be sitting on some remote resource responsible for authentication, and unknown to us.

Next, we need two TOTP codes derived from this secret. It's a lot easier than it seems to obtain these codes. They do not need to be consecutive, we just need two of them. Something as simple as a photo (or video) of someone's TOTP client would be sufficient (a picture has the code which was produced, and the EXIF data should provide you with a timestamp.)

For this experiment, since this is just a test lab, we will use python's pyotp package to produce our two TOTP codes. We are going to use two random timestamps to generate these codes, using the pyotp.TOTP.at() method. (The timestamps can be any values, as long as they are different.)

$ python
Python 2.7.10 (default, Aug 17 2018, 17:41:52)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyotp
>>> totp = pyotp.TOTP("NBQXG2DDMF2A====")
>>> totp.at(1263384780)
u'833060'
>>> totp.at(1528848780)
u'549115'

Now that we have our tokens and our timestamps, we need to place them in a file using the format $TOKEN:$TIMESTAMP. Let's call the file test.hash for this simulation.

$ cat test.hash
833060:1263384780
549115:1528848780

The attack itself will be done in two parts: * First, we run Hashcat against the test codes to produce all of the secret candidates in our keyspace. To keep this simulation manageable, we will only use the topology ?l?l?l?l?l?l?l. * Second, we will check our candidates for duplicates to find our genuine secret.

The attack

This particular hash mode is not like your average modes in Hashcat. By default its definition tells Hashcat to keep guessing candidates until the available keyspace is exhausted. As a result, Hashcat's "recovered" stat will always remain 0 despite your pot file filling up. This is expected, so don't worry!

Let's see an example in action...

./hashcat -m17300 -a3 -o totp.potfile test.hash ?l?l?l?l?l?l?l
hashcat (v4.2.1-58-g1b980cf0+) starting...

OpenCL Platform #1: Apple
=========================
* Device #1: Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz, skipped.
* Device #2: Intel(R) Iris(TM) Graphics 550, 384/1536 MB allocatable, 48MCU

Hashes: 2 digests; 2 unique digests, 2 unique salts
Bitmaps: 16 bits, 65536 entries, 0x0000ffff mask, 262144 bytes, 5/13 rotates

Applicable optimizers:
* Zero-Byte
* Not-Iterated
* Brute-Force

Minimum password length supported by kernel: 0
Maximum password length supported by kernel: 256

Watchdog: Hardware monitoring interface not found on your system.
Watchdog: Temperature abort trigger disabled.

[s]tatus [p]ause [b]ypass [c]heckpoint [q]uit =>

Session..........: hashcat
Status...........: Exhausted
Hash.Type........: TOTP (HMAC-SHA1)
Hash.Target......: test.hash
Time.Started.....: Wed Oct  3 21:37:27 2018 (1 hour, 23 mins)
Time.Estimated...: Wed Oct  3 23:01:15 2018 (0 secs)
Guess.Mask.......: ?l?l?l?l?l?l?l [7]
Guess.Queue......: 1/1 (100.00%)
Speed.#2.........:  7218.0 kH/s (2.43ms) @ Accel:4 Loops:2 Thr:256 Vec:1
Recovered........: 0/2 (0.00%) Digests, 0/2 (0.00%) Salts
Progress.........: 16063620352/16063620352 (100.00%)
Rejected.........: 0/16063620352 (0.00%)
Restore.Point....: 456976/456976 (100.00%)
Restore.Sub.#2...: Salt:1 Amplifier:17574-17576 Iteration:0-2
Candidates.#2....: uqxtuaq -> xqxqxqg

Post-hashcat cleanup

Once the attack is finished, you should have a pool of candidates in your pot file. Most of them should be unique, but your genuine secret will be duplicated across two entries. We just need a tiny bit of shell work to reveal which one that was. The following snippet should do this for you.

$ cut -d: -f3 totp.potfile | sort | uniq -c | sort -nr | head
  2 NBQXG2DDMF2A====
  1 PJZXS23RPFYA====
  1 PJZXO5TDNZTA====
  1 PJZXM5DMOBTQ====
  1 PJZXM2LTMV2A====
  1 PJZXI5DWNZWA====
  1 PJZXG2DUNZTQ====
  1 PJZXA4LQOBUQ====
  1 PJZXA4LPN55A====
  1 PJZWYYTBONYA====

Huzzah! Our duplicated secret appears on top!

NOTE: If you see more than one entry with a count higher than 1, something went wrong. You should re-evaluate your steps.

Now, we can take this secret and plug it back into pyotp (or your client of choice) to generate valid tokens!

Practical exploitation

In the wild, this is a bit more nuanced. We aren't usually going to know much about the target secret (like its topology), so we may need to search through a larger keyspace. This could take quite some time. Of course, the more GPUs you have, the faster this will go.

This also assumes that your TOTP implementation is using a few defaults: 30 second steps, HMAC-SHA1 hashing, and 10-byte secrets. You may be surprised at how many implementations use the defaults, or worse yet make the defaults immutable (I'm looking at you Google Authenticator!)

For clarity, the RFC technically defines a secret length at 16 bytes minimum, but many implementations are calculating this from the base32 encoded secret. This means, our 16 byte secret is really a 10 byte secret when decoded. Again, it doesn't have to be this way. But even if the secret is 10 bytes, this attack may not be practical for most (though it isn't out of the realm of possibilities for some actors.) A truly random 10 byte keyspace is incredibly large (although "true randomness" deserves its own discussion.) Any insight you can gain to help reduce the keyspace is definitely going to help here.

If you are responsible for maintaining a TOTP system, I would recommend setting a stronger algorithm (use HMAC-SHA512 if possible) and longer secrets (24+ bytes should last a while into the future.)

Never take your world for granted. Test everything. Hack the planet.