In this article I am going to share some bash commands and regular expressions that I find useful in password cracking. Most of the time, we find hashes to crack on paste-sharing websites (the most popular being Pastebin). Isolating the hashes by hand can be a time-consuming process; for that reason we are going to use regular expressions to make our lives easier!

Extract md5 hashes

# egrep -oE '(^|[^a-fA-F0-9])[a-fA-F0-9]{32}([^a-fA-F0-9]|$)' *.txt | egrep -o '[a-fA-F0-9]{32}' > md5-hashes.txt

An alternative with sed (prints at most one hash per line)

# sed -rn 's/(^|.*[^a-fA-F0-9])([a-fA-F0-9]{32})([^a-fA-F0-9].*|$)/\2/p' *.txt > md5-hashes.txt

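Note: The above regexes can be used for SHA1, SHA256 and other unsalted hashes represented in hex. The only thing you have to do is change the '{32}' to the corresponding length for your desired hash type (40 for SHA1, 64 for SHA256). Also note that when grep is given more than one file it prefixes every match with the file name; add -h to any of the grep commands in this article to suppress that prefix.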
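The note above can be turned into a small helper that takes the hash length as a parameter. This is only a minimal sketch, assuming GNU grep; the function name extract_hex_hashes and the sample lines are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every standalone hex string of exactly N digits
# found on standard input (32 = MD5, 40 = SHA1, 64 = SHA256).
extract_hex_hashes() {
    local len="$1"
    # First pass keeps the surrounding non-hex delimiters so that longer hex
    # strings are not cut into N-digit pieces; second pass strips them.
    grep -oE "(^|[^a-fA-F0-9])[a-fA-F0-9]{${len}}([^a-fA-F0-9]|\$)" |
        grep -oE "[a-fA-F0-9]{${len}}"
}

# Sample input: one MD5 (32 digits) and one SHA1 (40 digits).
sample='user1:5f4dcc3b5aa765d61d8327deb882cf99
user2:5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8'

printf '%s\n' "$sample" | extract_hex_hashes 32   # prints only the MD5
printf '%s\n' "$sample" | extract_hex_hashes 40   # prints only the SHA1
```

To run it over files instead of the sample, something like cat *.txt | extract_hex_hashes 40 > sha1-hashes.txt should work.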

Extract valid MySQL-Old hashes

# grep -e "[0-7][0-9a-f]\{7\}[0-7][0-9a-f]\{7\}" *.txt > mysql-old-hashes.txt

Extract blowfish hashes

# grep -oE '\$2[aby]\$[0-9]{2}\$[./A-Za-z0-9]{53}' *.txt > blowfish-hashes.txt

Extract Joomla hashes

# egrep -o "([0-9a-zA-Z]{32}):(\w{16,32})" *.txt > joomla.txt

Extract VBulletin hashes

# egrep -o "([0-9a-zA-Z]{32}):(\S{3,32})" *.txt > vbulletin.txt

Extract phpBB3-MD5

# egrep -o '\$H\$\S{31}' *.txt > phpBB3-md5.txt

Extract Wordpress-MD5

# egrep -o '\$P\$\S{31}' *.txt > wordpress-md5.txt

Extract Drupal 7

# egrep -o '\$S\$\S{52}' *.txt > drupal-7.txt

Extract old Unix-md5

# egrep -o '\$1\$[./a-zA-Z0-9]{1,8}\$[./a-zA-Z0-9]{22}' *.txt > md5-unix-old.txt

Extract md5-apr1

# egrep -o '\$apr1\$[./a-zA-Z0-9]{1,8}\$[./a-zA-Z0-9]{22}' *.txt > md5-apr1.txt

Extract sha512crypt, SHA512(Unix)

# egrep -o '\$6\$[./a-zA-Z0-9]{1,16}\$[./a-zA-Z0-9]{86}' *.txt > sha512crypt.txt

Extract e-mails from text files

# grep -E -o '\b[a-zA-Z0-9.#?$*_-]+@[a-zA-Z0-9.#?$*_-]+\.[a-zA-Z0-9.-]+\b' *.txt > e-mails.txt

Extract HTTP URLs from text files

# grep -shoP 'http.*?[" >]' *.txt > http-urls.txt

For extracting HTTPS, FTP and other URL formats use

# grep -E -o '(((https|ftp|gopher)|mailto)[.:][^ >"\t]*|www\.[-a-z0-9.]+)[^ .,;\t>">\):]' *.txt > urls.txt

Note: if grep returns "Binary file (standard input) matches", use one of the following approaches:

# cat *.log | tr '[\000-\011\013-\037\177-\377]' '.' | grep -E "Your_Regex"

OR

# cat -v *.log | egrep -o "Your_Regex"
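Another option worth trying, assuming GNU grep, is the -a (--text) switch, which makes grep treat binary input as text. The sketch below is illustrative only; the function name grep_binary and the sample bytes are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run an extended regex over input that may contain
# binary bytes. -a forces text mode, -o prints only the matching part.
grep_binary() {
    grep -a -oE "$1"
}

# Sample input with an embedded NUL byte, which would normally trigger
# the "Binary file ... matches" message.
printf 'junk\0 hash=5f4dcc3b5aa765d61d8327deb882cf99\n' |
    grep_binary '[a-fA-F0-9]{32}'
```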

Extract Floating point numbers

# grep -E -o "^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$" *.txt > floats.txt

Extract credit card data

Visa # grep -E -o "4[0-9]{3}[ -]?[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}" *.txt > visa.txt

MasterCard # grep -E -o "5[0-9]{3}[ -]?[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}" *.txt > mastercard.txt

American Express # grep -E -o "\b3[47][0-9]{13}\b" *.txt > american-express.txt

Diners Club # grep -E -o "\b3(0[0-5]|[68][0-9])[0-9]{11}\b" *.txt > diners.txt

Discover # grep -E -o "6011[ -]?[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}" *.txt > discover.txt

JCB # grep -E -o "\b(2131|1800|35[0-9]{3})[0-9]{11}\b" *.txt > jcb.txt

AMEX (with optional separators) # grep -E -o "3[47][0-9]{2}[ -]?[0-9]{6}[ -]?[0-9]{5}" *.txt > amex.txt

Extract Social Security Number (SSN)

# grep -E -o "[0-9]{3}[ -]?[0-9]{2}[ -]?[0-9]{4}" *.txt > ssn.txt

Extract Indiana Driver License Number

# grep -E -o "[0-9]{4}[ -]?[0-9]{2}[ -]?[0-9]{4}" *.txt > indiana-dln.txt

Extract US Passport Cards

# grep -E -o "C0[0-9]{7}" *.txt > us-pass-card.txt

Extract US Passport Number

# grep -E -o "[23][0-9]{8}" *.txt > us-pass-num.txt

Extract US Phone Numbers

# grep -Po '\d{3}[\s\-_]?\d{3}[\s\-_]?\d{4}' *.txt > us-phones.txt

Extract ISBN Numbers

# grep -a -oP '\bISBN(?:-1[03])?:? (?=[0-9X]{10}$|(?=(?:[0-9]+[- ]){3})[- 0-9X]{13}$|97[89][0-9]{10}$|(?=(?:[0-9]+[- ]){4})[- 0-9]{17}$)(?:97[89][- ]?)?[0-9]{1,5}[- ]?[0-9]+[- ]?[0-9]+[- ]?[0-9X]\b' *.txt > isbn.txt


WordList Manipulation

Remove all space characters with sed

# sed -i 's/ //g' file.txt

Remove lines that are empty or contain only whitespace

# egrep -v '^[[:space:]]*$' file.txt

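Remove a trailing space from each line with sed

# sed -i 's/ $//' file.txt

Note: to remove the last character of each line regardless of what it is, use # sed -i 's/.$//' file.txt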

Sorting Wordlists by Length

# awk '{print length, $0}' rockyou.txt | sort -n | cut -d " " -f2- > rockyou_length-list.txt
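The pipeline above works by prefixing each line with its length, sorting numerically on that prefix, and then cutting the prefix off again. Here is a small sketch of the same steps on a few made-up words; the function name sort_by_length is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the decorate / sort / undecorate pipeline.
sort_by_length() {
    awk '{ print length($0), $0 }' |   # prefix each line with its length
        sort -n |                      # numeric sort on that prefix
        cut -d ' ' -f2-                # drop the prefix again
}

# Made-up sample words, shortest should come out first.
printf '%s\n' 'password' 'abc' 'letmein' 'qwerty123' | sort_by_length
```

Because cut keeps everything from the second field onwards, candidate passwords that contain spaces should survive intact.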

Convert uppercase to lowercase and the opposite

# tr 'A-Z' 'a-z' < file.txt > lower-case.txt
# tr 'a-z' 'A-Z' < file.txt > upper-case.txt

Remove blank lines with sed

# sed -i '/^$/d' List.txt

Remove a given character with sed

# sed -i "s/'//g" file.txt

Delete a string with sed

# echo 'This is a foo test' | sed -e 's/\<foo\>//g'

Replace characters with tr

# tr '@' '#' < emails.txt OR # sed 's/@/#/g' file.txt

Print specific columns with awk

# awk -F "," '{print $3}' infile.csv > outfile.csv OR # cut -d "," -f 3 infile.csv > outfile.csv

Note: if you want to isolate all columns after column 3 use # cut -d "," -f 3- infile.csv > outfile.csv
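As a worked example of the column commands above, the sketch below runs them on a made-up user,email,hash,year dump. The helper name user_hash_pairs and the data are hypothetical, and plain comma-separated fields with no quoted commas are assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "user,email,hash,..." rows into user:hash pairs.
# Assumes simple CSV without quoted or escaped commas.
user_hash_pairs() {
    awk -F ',' '{ print $1 ":" $3 }'
}

csv='alice,alice@example.com,5f4dcc3b5aa765d61d8327deb882cf99,2014
bob,bob@example.com,e10adc3949ba59abbe56e057f20f883e,2015'

printf '%s\n' "$csv" | cut -d ',' -f 3     # third column only
printf '%s\n' "$csv" | cut -d ',' -f 3-    # third column onwards
printf '%s\n' "$csv" | user_hash_pairs     # columns 1 and 3 joined by ":"
```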

Generate Random Passwords with urandom

# tr -dc 'a-zA-Z0-9._!@#$%^&*()' < /dev/urandom | fold -w 8 | head -n 500000 > wordlist.txt
# tr -dc 'a-zA-Z0-9-_!@#$%^&*()_+{}|:<>?=' < /dev/urandom | fold -w 12 | head -n 4
# base64 /dev/urandom | tr -dc '[:alnum:]\n' | cut -c1-10 | head -2
# tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 10 | head -n 4
# tr -dc 'a-zA-Z0-9-_!@#$%^&*()_+{}|:<>?=' < /dev/urandom | fold -w 12 | head -n 4 | grep -i '[!@#$%^&*()_+{}|:<>?=]'
# tr -dc '[:print:]' < /dev/urandom | fold -w 10| head -n 10
# tr -cd '[:alnum:]' < /dev/urandom | fold -w30 | head -n2
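All of the one-liners above follow the same pattern: filter /dev/urandom down to an allowed character set with tr, cut the stream into fixed-width lines with fold, and stop after a number of lines with head. A minimal sketch that exposes those three knobs as parameters (the function name gen_passwords and its defaults are assumptions, not part of any tool):

```shell
#!/usr/bin/env bash
# Hypothetical helper: gen_passwords LENGTH COUNT [CHARSET]
gen_passwords() {
    local length="$1" count="$2" charset="${3:-a-zA-Z0-9}"
    # LC_ALL=C makes tr treat /dev/urandom as raw bytes rather than
    # (possibly invalid) multibyte characters.
    LC_ALL=C tr -dc "$charset" < /dev/urandom |
        fold -w "$length" |    # one candidate per line
        head -n "$count"       # stop after COUNT candidates
}

gen_passwords 12 5            # five 12-character alphanumeric strings
gen_passwords 8 3 'a-f0-9'    # three 8-character lowercase hex strings
```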

Remove Parentheses with tr

# tr -d '()' < in_file > out_file

Generate wordlists from your file-names

# ls -A | sed 's/regexp/&\n/g'

Process text files when cat is unable to handle strange characters

# sed 's/\([[:alnum:]]*\)[[:space:]]*(.)\(\..*\)/\1\2/' *.txt

Generate length based wordlists with awk

# awk 'length == 10' file.txt > 10-length.txt
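The same awk test extends naturally to a length range, which can be handy when a password policy is known. A small sketch, with a hypothetical function name and made-up sample words:

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only lines whose length is between MIN and MAX.
filter_by_length() {
    local min="$1" max="$2"
    awk -v min="$min" -v max="$max" 'length($0) >= min && length($0) <= max'
}

# Keeps "letmein" (7) and "password" (8), drops the other two.
printf '%s\n' 'abc' 'letmein' 'password' 'qwerty123' | filter_by_length 7 8
```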

Merge two text files line by line

# paste -d' ' file1.txt file2.txt > new-file.txt

Faster sorting

# export LC_ALL='C' && sort --parallel=<number_of_cpu_cores> -S <amount_of_memory>G -u file.txt > new-file.txt
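The idea here is to let sort use several cores and a large memory buffer, and to compare raw bytes (LC_ALL=C) instead of using locale-aware collation. A sketch of that idea as a function, assuming GNU sort (--parallel and -S are GNU options) and nproc from coreutils; the function name and the 64M buffer in the demo are arbitrary choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fast_sort_unique MEMORY [FILE...]
# Sorts and de-duplicates using every core reported by nproc.
fast_sort_unique() {
    local mem="$1" cores
    shift
    cores="$(nproc 2>/dev/null || echo 1)"
    # LC_ALL=C compares raw bytes, which is cheaper than locale collation.
    LC_ALL=C sort --parallel="$cores" -S "$mem" -u "$@"
}

# Reads standard input when no file is given.
printf '%s\n' 'password' 'abc' 'password' 'letmein' | fast_sort_unique 64M
```

For a real wordlist, a call such as fast_sort_unique 4G file.txt > new-file.txt would be the equivalent, with the memory size adjusted to the machine.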

Mac to unix

# tr '\015' '\012' < in_file > out_file

Dos to Unix

# dos2unix file.txt

Unix to Dos

# unix2dos file.txt

Remove from one file what is in another file

# grep -F -v -f file1.txt -w file2.txt > file3.txt
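The command above drops every line of file2.txt that contains one of the lines of file1.txt as a whole word. For wordlists, comparing whole lines may be closer to what you want; the sketch below swaps -w for -x to do that. The function name and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of file $2 that do not occur as
# complete lines in file $1 (-F fixed strings, -x whole-line match).
remove_known() {
    grep -F -v -x -f "$1" "$2"
}

workdir="$(mktemp -d)"
printf '%s\n' 'password' '123456' > "$workdir/cracked.txt"
printf '%s\n' 'password' 'password1' '123456' 'letmein' > "$workdir/wordlist.txt"

# Prints the candidates that are not already in cracked.txt.
remove_known "$workdir/cracked.txt" "$workdir/wordlist.txt"
rm -rf "$workdir"
```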

Isolate specific line numbers with sed

# sed -n '1,100p' test.file > file.out
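On very large wordlists it can help to make sed quit as soon as the last wanted line has been printed, instead of reading the rest of the file. A minimal sketch of that variant, with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print lines START..END of standard input and stop
# reading as soon as line END has been printed.
line_range() {
    local start="$1" end="$2"
    sed -n "${start},${end}p;${end}q"
}

# Prints lines 3 to 5 of a 100-line input.
seq 1 100 | line_range 3 5
```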

Create Wordlists from PDF files

# pdftotext file.pdf file.txt

Find the line number of a string inside a file

# awk '{ print NR, $0 }' file.txt | grep "string-to-grep"
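grep can also number the matching lines itself with -n, which avoids the awk step. A small sketch that prints only the line numbers; the function name is hypothetical, and -F is used on the assumption that the search term is a fixed string rather than a regex.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the line numbers on which a fixed string occurs.
# -n prefixes each match with its line number, -F disables regex parsing.
line_numbers_of() {
    grep -n -F -- "$1" | cut -d ':' -f 1
}

# "letmein" occurs on lines 2 and 4 of this made-up input.
printf '%s\n' 'admin' 'letmein' 'root' 'letmein123' | line_numbers_of 'letmein'
```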


Faster filtering with the silver searcher

https://github.com/ggreer/the_silver_searcher

For faster searching, the grep regular expressions above can be used with the ag command. The following timings compare ack-grep, egrep and ag on the same e-mail regex as a proof of concept of its speed:

# time ack-grep -o "\b[a-zA-Z0-9.#?$*_-]+@[a-zA-Z0-9.#?$*_-]+\.[a-zA-Z0-9.-]+\b" *.txt > /dev/null 
real    1m2.447s
user    1m2.297s
sys 0m0.645s

# time egrep -o "\b[a-zA-Z0-9.#?$*_-]+@[a-zA-Z0-9.#?$*_-]+\.[a-zA-Z0-9.-]+\b" *.txt > /dev/null 
real    0m30.484s
user    0m30.292s
sys 0m0.310s

# time ag -o "\b[a-zA-Z0-9.#?$*_-]+@[a-zA-Z0-9.#?$*_-]+\.[a-zA-Z0-9.-]+\b" *.txt > /dev/null 
real    0m4.908s
user    0m4.820s
sys 0m0.277s

Useful Use of Cat

Contrary to what many veteran Unix users may believe, this happens to be one of the rare occasions where using cat can actually make your searches faster. The Silver Searcher is (at the time of this writing) not quite as efficient as cat when it comes to reading from file handles. Therefore, you can pipe the output of cat into ag for a noticeable real-time gain (about 1.6x in the run below):

$ time ag -o '(^|[^a-fA-F0-9])[a-fA-F0-9]{32}([^a-fA-F0-9]|\$)' *.txt | ag -o '[a-fA-F0-9]{32}' > /dev/null

real    0m10.851s 
user    0m13.069s
sys 0m0.092s

$ time cat *.txt | ag -o '(^|[^a-fA-F0-9])[a-fA-F0-9]{32}([^a-fA-F0-9]|\$)' | ag -o '[a-fA-F0-9]{32}' > /dev/null

real    0m6.689s
user    0m7.881s 
sys 0m0.424s