I just wanted to let you guys know about some interesting work Atom has been doing recently with Hashcat, more specifically their implementation of Markov chains.
The Markov model is a mathematical system that has had numerous uses and variations since it's inception over a hundred years ago. Most notable, in terms of computer science, is probably its use in voice recognition systems and telephony networks.
A Markov chain, in this case, is a statistical analysis of a list of words and is already being used by password cracking tools such as JTR (John The Ripper). This analysis is stored in a table that is then used to calculate the probability of the placement of characters in a quasi brute-force attack. The result of which is (usually) a more efficient way of cracking passwords. So instead of guessing every possible combination of characters incrementally, it uses a statistical model where the most common characters are used first. 'C' followed by 'a' or 'e' for example, or 'q' followed by 'u'.
Hashcat comes with a utilities pack which includes various different tools for creating and streamlining password cracking dictionaries. One such tool is the 'hcstatgen' tool which will take a list of words and generate a 'stats' file. This file can then be used to generate a kind of wordlist/brute-force hybrid based on a list of previously cracked passwords you have from an engagement (or anywhere really). It's really interesting to see the types of words and phrases it generates.
To demonstrate its use, I've chosen the leaked LinkedIn password hashes of recent fame. I hadn't spent any time trying to crack these hashes, however, it seemed like the perfect scenario to demonstrate this cracking vector.
A colleague of mine at Spiderlabs had previously run a casual crack of the list using various wordlists and rulesets. This essentially provided me with a list of 3,157,060 plain-text/cracked passwords out of the 6,458,020 hashes that were leaked. It should be noted that due to artifacts remaining from the original hackers' cracking of the hashes, these hashes were split into two types. One is a standard SHA1 hash and the other is padded with 0's, it is thought that the padded hashes were cracked by the hacker before being released. For simplicity I will be demonstrating the attack using only the standard raw SHA1 hashes. I therefore had a list of 2,181,750 un-cracked hashes that should be considered some of the more complex passwords of the bunch.
The first step is to analyze the cracked passwords using 'hcstatgen':
The above commands are pretty self-explanatory, it requires an output file which will contain the stats and then an input file.
The next step is to process this using the 'statsprocessor' tool which can be downloaded from here. The tool takes as arguments a minimum/maximum password length to generate, a threshold parameter and then optional mask flags. These masks are usually used to tell Hashcat what characters to use in each position when performing brute-force attacks, I won't go into further detail on this but a full guide can be found here. These masks have been ported into the 'statsprocessor' tool so that you don't lose that functionality, but I'm going to leave it as the default which is all standard ASCII characters.
As you can see this outputs an interesting wordlist based on an analysis of the previously provided passwords. It was interesting to see some of the phrases it created and cracked during the whole process, below are some examples found:
506522cf9f1404456b4d2bc80e4c2446b2967149:bigcowpies
849ac3678110bb3e0477b0ac8c03ba5679fbe042:alanandangela
82d74bcae0f0d5817ba2c2178307de8b383096ef:shiralinked13
a65bbf725607a5711ef4b9bd4c9c83ee281943e5:missingindore
02bfc9ca3da8766863023ed4a8e6761f806966d9:onemoreonce
It's also good for slang, variations of English words, as well as other Latin based languages:
ded2dbef30fc964b4e30630eb05167789c3bb56a:Minga@2006
cb1a1a2bf8996dcf71569beeb5ef24302fa54980:R@sha1212
fba100fa91208fa35194429332e2dc39a7e2aee9:linkediners1!
At the moment it requires piping into Hashcat-Plus, and it doesn't fully utilise the GPU but it's being fully integrated in the next version. The latest version of Hashcat-Lite, released recently, now supports this but was not appropriate for this testing (as it doesn't support multiple hashes as input).
The arguments specified for Hashcat-Plus are simply -m 100 (SHA1 mode) and the path to the list of hashes.
By now some of you may be thinking "So what? JTR does the same thing?" Correct, well almost. While JTR allows you to all of the above, where Hashcat differs is by calculating the probabilities on a per position basis. Which means that while JTR and Hashcat both calculate the probability of the next character, Hashcat also considers the position of the character within the plain-text (password). So 's' may most likely be followed by 't' under normal circumstances, but if 's' is the 8th character in the text there may be a higher probability of it being followed by '1'.
I've done some dirty comparisons of the two implementations and also some brute force equivalents. JTR uses a 'level' setting as opposed to the 'threshold' used in Hashcat, so it was tricky to keep things fair. It also doesn't support GPU cracking for the SHA1 algorithm yet as far as I can see, so the comparisons have been made by finding level/threshold settings that produce roughly the same amount of guesses. I created a custom 'stats' file for JTR using the same list of cracked hashes, then ran both tools against the remaining 2,181,750 hashes. Below are some examples (I've limited the command flags to only the relevant settings):
------------------------------------------
./sp64.bin --pw-min 9 --pw-max 9 --threshold 8 ./linkedin.hcstat
134,217,728 GUESSES
66 CRACKED
./john --markov=195:0:0:9-9 -fo=raw-sha1
147,197,574 GUESSES
0 CRACKED
------------------------------------------
------------------------------------------
./sp64.bin --pw-min 9 --pw-max 9 --threshold 12 ./linkedin.hcstat
5,159,780,352 GUESSES
1,467 CRACKED
./john --markov=230:0:0:9-9 -fo=raw-sha1
5,052,709,164 GUESSES
488 CRACKED
------------------------------------------
------------------------------------------
./sp64.bin --pw-min 9 --pw-max 9 --threshold 16 ./linkedin.hcstat
68,719,476,736 GUESSES
8,455 CRACKED
./john --markov=258:0:0:9-9 -fo=raw-sha1
71,150,160,679 GUESSES
12,592 CRACKED
------------------------------------------
------------------------------------------
Pure brute force attacks don't come anywhere near these results, with the keyspace reduced to only lowercase/digits and far more guesses allowed the following results were produced with the password length set to 9:
133,615,845,376 GUESSES
1,694 CRACKED
What we begin to see is that the implementation used in Hashcat is much better at cracking the passwords with a limited keyspace, however, the inverse is true when the thresholds are increased. The good news for Hashcat users is that it also has a '--markov-classic' option. Where I've seen the best results for the per-position Markov attack has been with longer password lengths, where the thresholds must be kept to a minimum to reduce cracking times. Here are some further tests that also suggest this:
------------------------------------------
------------------------------------------
./sp64.bin --pw-min 10 --pw-max 10 --threshold 8 ./linkedin.hcstat
1,073,741,824 GUESSES
342 CRACKED
./john --markov=214:0:0:10-10 -fo:raw-sha1
1,016,134,573 GUESSES
1 CRACKED
------------------------------------------
------------------------------------------
./sp64.bin --pw-min 10 --pw-max 10 --threshold 14 ./linkedin.hcstat
289,254,654,976 GUESSES
8335 CRACKED
./john /root/dan/wordlists/linkedin.hash --markov=271:0:0:10-10 -fo:raw-sha1
303,155,912,574 GUESSES
16,036 CRACKED
------------------------------------------
------------------------------------------
./sp64.bin --pw-min 13 --pw-max 13 --threshold 6 ./linkedin.hcstat
13,060,694,016 GUESSES
62 CRACKED
./john /root/dan/wordlists/linkedin.hash --markov=239:0:0:13-13 -fo:raw-sha1
12,933,957,455 GUESSES
0 CRACKED
------------------------------------------
------------------------------------------
All cracking was attempted against the original list of 2,181,750. No hashes were removed by previous attempts so there's a lot of overlapping but this is only for testing purposes. Also the number of guesses used gives a slight bias to JTR in most of the above tests, but I've tried to keep it as fair as possible.
There are some obvious limitations to using it with hashes gained from the current engagement, in that it will begin by cracking the passwords that are similar to the ones already cracked. Which are likely to be the less complicated passwords, but with the correct balance of threshold it will eventually cover the entire key space. The most efficient method in my opinion, is to keep separate stat files for each situation. So a leaked database dump such as LinkedIn would provide a good reference for cracking hashes extracted from a websites database. For cracking OS hashes, which usually have some complexity requirements, it would be useful to have a stats file derived from a more complex list. A feature to incorporate minimum requirements into this method would also be useful, so that each guess must contain at least 3 out of 4 character types (uppercase, lowercase, numeric and symbol) for example. It does support the addition of masks, but this seems a bit superfluous considering it can be managed by the threshold function instead.
This further emphasizes the need for a certain level of entropy in a password for it to be considered sufficiently complex. Which essentially means that using uppercase, lowercase, digits and symbols in a password does not automatically make it secure. Because although the standard ASCII table has 95 printable characters this does not equate to an even distribution of probability, people are more likely to use common characters and frequencies which potentially reduces the key space.