Securing your private data with the Enterprise Cryptographic Filesystem

These days, it isn't hard to see why privacy is a huge concern. Between black-hat hackers, state surveillance, and big-corporation data mining (to name just a few threats), your personal information is often at great risk. There are quite a few ways to secure your information, and some work better than others under certain circumstances. Although the many "best techniques" could be argued at length, the point of this article is simply to familiarize the reader with one straightforward approach: the Enterprise Cryptographic Filesystem (eCryptfs).

eCryptfs is a POSIX-compliant, filesystem-level encryption layer that has been part of the mainline Linux kernel since 2.6.19. However, many distributions still require you to install the userland tools before you can start using it; on Debian or Fedora, this is pretty simple. Unlike full-disk encryption, filesystem-level encryption sits on top of your existing filesystem, working seamlessly with the EXT family, XFS, Btrfs, and even network shares like NFS and SMB.

The advantage of filesystem-level encryption over full-disk encryption is lower overhead and better performance. Since it can be applied selectively to specific folders, you don't have to waste time and resources encrypting data that you know is unimportant or public.

Preparing the userland tools

Usually, it's as simple as installing the ecryptfs-utils package of your distribution to prepare your system. On a Debian-based system, you can type: # apt-get install ecryptfs-utils

On a RedHat-based system, use yum instead: # yum install ecryptfs-utils

Creating an encrypted folder

Though there are many ways to implement this, I will show you how to manually set up a simple encrypted folder. It's a lot easier than you might think. First, let's create the folder to encrypt: $ mkdir ~/Private

Once the folder is created, we can mount it with the eCryptfs filesystem type. Given no options, eCryptfs will prompt you for the various settings to use. You can create config files in your ~/.ecryptfs/ directory to store these settings if you don't want to be prompted for them on each mount, or you can simply pass the options to eCryptfs when mounting. Here, we will do the latter: $ sudo mount -t ecryptfs ~/Private ~/Private -o key=passphrase,ecryptfs_cipher=aes,ecryptfs_key_bytes=32,ecryptfs_passthrough=n,ecryptfs_enable_filename_crypto=y

Summarizing the options here:

  • We are going to choose a passphrase rather than RSA keys to protect our data
  • We will use AES data encryption to protect the data (this is highly recommended. Don't use DES, for example)
  • We will set the key size to 32 bytes
  • We will disable passthrough (this will stop unencrypted files from being used inside the mount. It's generally considered safer to keep this disabled)
  • Finally, we will encrypt filenames on the private volume

At this point, eCryptfs will create a filename encryption key (fnek), which is used for encrypting and decrypting your file names. If you are prompted to "Add signature to cache", simply answer "yes"; this stores a signature hash of the key so that the key can be verified when it is regenerated from your passphrase. From then on, any mismatch in the fnek signature will be reported at mount time, which can suggest either:

  • you typed your encryption password incorrectly
  • or there has been corruption in the encrypted store

Once the volume is mounted, your fnek_sig should be displayed, and you can use the following to mount it in the future: $ sudo mount -t ecryptfs ~/Private ~/Private -o key=passphrase,ecryptfs_cipher=aes,ecryptfs_key_bytes=32,ecryptfs_passthrough=n,ecryptfs_enable_filename_crypto=y,ecryptfs_fnek_sig=xxxxxxxxxxxxxxxx

(Just remember to replace xxxxxxxxxxxxxxxx with your actual fnek_sig hash.)

You can now place the files and folders that you want to keep protected inside ~/Private/. Once you are done, unmount the volume to keep it safe from prying eyes: $ sudo umount ~/Private

And we're done! To make your mount commands shorter, you can store the relevant entry in /etc/fstab, after which you can use the shorthand: $ sudo mount ~/Private
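For reference, such an fstab entry might look something like the line below. This is only a sketch: adjust the paths for your own user, make sure the options mirror the ones you actually mount with (including your ecryptfs_fnek_sig if you enabled filename encryption), and note that noauto,user lets you mount on demand rather than being prompted for the passphrase at boot:

```
/home/user/Private /home/user/Private ecryptfs noauto,user,key=passphrase,ecryptfs_cipher=aes,ecryptfs_key_bytes=32,ecryptfs_passthrough=n,ecryptfs_enable_filename_crypto=y 0 0
```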

The automated method

If you are less inclined to set up encrypted folders by hand, you can use the ecryptfs-setup-private utility to automate the whole process for you: $ ecryptfs-setup-private

You will be prompted for your login passphrase (your login password) and a mount passphrase (which will be used to create the fnek). Using this method, two new folders will be created for you:

  • ~/.Private stores your encrypted data
  • ~/Private displays the unencrypted data that you can access and manipulate

Once you log out, the ~/Private directory is automatically unmounted, and its contents remain encrypted in the ~/.Private directory. eCryptfs takes care of setting up the proper automation so that your folder is available when you log in (using PAM to access the fnek).

Considering how easy these utilities are to use and manage, there is no longer any excuse not to protect your sensitive data with encryption.

A cheat-sheet for password crackers

In this article I am going to share some Bash scripting commands and regular expressions that I find useful in password cracking. Most of the time, we find hashes to crack on shared-paste websites (the most popular of them being Pastebin). Isolating the hashes by hand can be a time-consuming process, so we are going to use regular expressions to make our lives easier!

Extract md5 hashes

# egrep -oE '(^|[^a-fA-F0-9])[a-fA-F0-9]{32}([^a-fA-F0-9]|$)' *.txt | egrep -o '[a-fA-F0-9]{32}' > md5-hashes.txt

An alternative could be with sed

# sed -rn 's/.*[^a-fA-F0-9]([a-fA-F0-9]{32})[^a-fA-F0-9].*/\1/p' *.txt > md5-hashes

Note: The above regexes can be used for SHA-1, SHA-256, and other unsalted hashes represented in hex. The only thing you have to change is the '{32}' to the corresponding length for your desired hash type.
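For example, here is the same two-stage extraction adapted for SHA-1 (40 hex characters), run against a throwaway sample file; the username and digest below are made up purely for illustration:

```shell
# Build a tiny sample file (hypothetical data)
printf 'user1:da39a3ee5e6b4b0d3255bfef95601890afd80709\nnoise line\n' > sample.txt

# Same pattern as above with {32} changed to {40} for SHA-1
egrep -oE '(^|[^a-fA-F0-9])[a-fA-F0-9]{40}([^a-fA-F0-9]|$)' sample.txt | egrep -o '[a-fA-F0-9]{40}'
# prints da39a3ee5e6b4b0d3255bfef95601890afd80709
```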

Extract valid MySQL-Old hashes

# grep -e "[0-7][0-9a-f]\{7\}[0-7][0-9a-f]\{7\}" *.txt > mysql-old-hashes.txt

Extract blowfish hashes

# egrep -o '\$2a\$[0-9]{2}\$[./0-9A-Za-z]{53}' *.txt > blowfish-hashes.txt

Extract Joomla hashes

# egrep -o "([0-9a-zA-Z]{32}):(\w{16,32})" *.txt > joomla.txt

Extract VBulletin hashes

# egrep -o "([0-9a-zA-Z]{32}):(\S{3,32})" *.txt > vbulletin.txt

Extract phpBB3-MD5

# egrep -o '\$H\$\S{31}' *.txt > phpBB3-md5.txt

Extract Wordpress-MD5

# egrep -o '\$P\$\S{31}' *.txt > wordpress-md5.txt

Extract Drupal 7

# egrep -o '\$S\$\S{52}' *.txt > drupal-7.txt

Extract old Unix-md5

# egrep -o '\$1\$\w{8}\S{22}' *.txt > md5-unix-old.txt

Extract md5-apr1

# egrep -o '\$apr1\$\w{8}\S{22}' *.txt > md5-apr1.txt

Extract sha512crypt, SHA512(Unix)

# egrep -o '\$6\$\w{8}\S{86}' *.txt > sha512crypt.txt

Extract e-mails from text files

# grep -E -o "\b[a-zA-Z0-9.#?$*_-]+@[a-zA-Z0-9.#?$*_-]+\.[a-zA-Z0-9.-]+\b" *.txt > e-mails.txt

Extract HTTP URLs from text files

# grep -shoP 'http.*?[" >]' *.txt > http-urls.txt

To extract HTTPS, FTP, and other URL formats, use # grep -E '(((https|ftp|gopher)|mailto)[.:][^ >"\t]*|www\.[-a-z0-9.]+)[^ .,;\t>">\):]' *.txt > urls.txt

Note: if grep returns "Binary file (standard input) matches", use one of the following approaches: # cat *.log | tr '[\000-\011\013-\037\177-\377]' '.' | grep -E "Your_Regex" OR # cat -v *.log | egrep -o "Your_Regex"

Extract Floating point numbers

# grep -E -o "^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$" *.txt > floats.txt

Extract credit card data

Visa # grep -E -o "4[0-9]{3}[ -]?[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}" *.txt > visa.txt

MasterCard # grep -E -o "5[0-9]{3}[ -]?[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}" *.txt > mastercard.txt

American Express # grep -E -o "\b3[47][0-9]{13}\b" *.txt > american-express.txt

Diners Club # grep -P -o "\b3(?:0[0-5]|[68][0-9])[0-9]{11}\b" *.txt > diners.txt

Discover # grep -E -o "6011[ -]?[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}" *.txt > discover.txt

JCB # grep -P -o "\b(?:2131|1800|35\d{3})\d{11}\b" *.txt > jcb.txt

AMEX # grep -E -o "3[47][0-9]{2}[ -]?[0-9]{6}[ -]?[0-9]{5}" *.txt > amex.txt

Extract Social Security Number (SSN)

# grep -E -o "[0-9]{3}[ -]?[0-9]{2}[ -]?[0-9]{4}" *.txt > ssn.txt

Extract Indiana Driver License Number

# grep -E -o "[0-9]{4}[ -]?[0-9]{2}[ -]?[0-9]{4}" *.txt > indiana-dln.txt

Extract US Passport Cards

# grep -E -o "C0[0-9]{7}" *.txt > us-pass-card.txt

Extract US Passport Number

# grep -E -o "[23][0-9]{8}" *.txt > us-pass-num.txt

Extract US Phone Numbers

# grep -Po '\d{3}[\s\-_]?\d{3}[\s\-_]?\d{4}' *.txt > us-phones.txt

Extract ISBN Numbers

# grep -a -o -P "\bISBN(?:-1[03])?:? (?=[0-9X]{10}$|(?=(?:[0-9]+[- ]){3})[- 0-9X]{13}$|97[89][0-9]{10}$|(?=(?:[0-9]+[- ]){4})[- 0-9]{17}$)(?:97[89][- ]?)?[0-9]{1,5}[- ]?[0-9]+[- ]?[0-9]+[- ]?[0-9X]\b" *.txt > isbn.txt

WordList Manipulation

Remove space characters with sed

# sed -i 's/ //g' file.txt (removes all spaces) OR # egrep -v "^[[:space:]]*$" file.txt (filters out whitespace-only lines)

Remove the last character of each line with sed

# sed -i s/.$// file.txt

Sorting Wordlists by Length

# awk '{print length, $0}' rockyou.txt | sort -n | cut -d " " -f2- > rockyou_length-list.txt

Convert uppercase to lowercase and the opposite

# tr 'A-Z' 'a-z' < file.txt > lower-case.txt
# tr 'a-z' 'A-Z' < file.txt > upper-case.txt

Remove blank lines with sed

# sed -i '/^$/d' List.txt

Remove defined character with sed

# sed -i "s/'//" file.txt

Delete a string with sed

# echo 'This is a foo test' | sed -e 's/\<foo\>//g'

Replace characters with tr

# tr '@' '#' < emails.txt OR # sed 's/@/#/g' file.txt

Print specific columns with awk

# awk -F "," '{print $3}' infile.csv > outfile.csv OR # cut -d "," -f 3 infile.csv > outfile.csv

Note: if you want to isolate all columns after column 3 use # cut -d "," -f 3- infile.csv > outfile.csv

Generate Random Passwords with urandom

# tr -dc 'a-zA-Z0-9._!@#$%^&*()' < /dev/urandom | fold -w 8 | head -n 500000 > wordlist.txt
# tr -dc 'a-zA-Z0-9-_!@#$%^&*()_+{}|:<>?=' < /dev/urandom | fold -w 12 | head -n 4
# base64 /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 10 | head -2
# tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 10 | head -n 4
# tr -dc 'a-zA-Z0-9-_!@#$%^&*()_+{}|:<>?=' < /dev/urandom | fold -w 12 | head -n 4 | grep -i '[!@#$%^&*()_+{}|:<>?=]'
# tr -dc '[:print:]' < /dev/urandom | fold -w 10| head -n 10
# tr -cd '[:alnum:]' < /dev/urandom | fold -w30 | head -n2

Remove Parenthesis with tr

# tr -d '()' < in_file > out_file

Generate wordlists from your file-names

# ls -A | sed 's/regexp/&\n/g'

Process text files when cat is unable to handle strange characters

# sed 's/\([[:alnum:]]*\)[[:space:]]*(.)\(\..*\)/\1\2/' *.txt

Generate length based wordlists with awk

# awk 'length == 10' file.txt > 10-length.txt

Merge two different txt files

# paste -d' ' file1.txt file2.txt > new-file.txt

Faster sorting

# alias sort='sort --parallel=<number_of_cpu_cores> -S <amount_of_memory>G' && export LC_ALL='C' && sort -u file.txt > new-file.txt

Mac to unix

# tr '\015' '\012' < in_file > out_file

Dos to Unix

# dos2unix file.txt

Unix to Dos

# unix2dos file.txt

Remove from one file what is in another file

# grep -F -v -w -f file1.txt file2.txt > file3.txt

Isolate specific line numbers with sed

# sed -n '1,100p' test.file > file.out

Create Wordlists from PDF files

# pdftotext file.pdf file.txt

Find the line number of a string inside a file

# awk '{ print NR, $0 }' file.txt | grep "string-to-grep"

Faster filtering with the silver searcher

For faster searching, use all of the above grep regular expressions with the ag command (the Silver Searcher). The following is a proof of concept of its speed:

# time ack-grep -o "\b[a-zA-Z0-9.#?$*_-]+@[a-zA-Z0-9.#?$*_-]+\.[a-zA-Z0-9.-]+\b" *.txt > /dev/null 
real    1m2.447s
user    1m2.297s
sys 0m0.645s

# time egrep -o "\b[a-zA-Z0-9.#?$*_-]+@[a-zA-Z0-9.#?$*_-]+\.[a-zA-Z0-9.-]+\b" *.txt > /dev/null 
real    0m30.484s
user    0m30.292s
sys 0m0.310s

# time ag -o "\b[a-zA-Z0-9.#?$*_-]+@[a-zA-Z0-9.#?$*_-]+\.[a-zA-Z0-9.-]+\b" *.txt > /dev/null 
real    0m4.908s
user    0m4.820s
sys 0m0.277s

Useful Use of Cat

Contrary to what many veteran Unix users may believe, this happens to be one of the rare occasions where using cat can actually make your searches faster. The Silver Searcher is (at the time of this writing) not quite as efficient as cat when it comes to reading from file handles. Therefore, piping output from cat into ag can show nearly a 2x real-time performance gain:

$ time ag -o '(^|[^a-fA-F0-9])[a-fA-F0-9]{32}([^a-fA-F0-9]|$)' *.txt | ag -o '[a-fA-F0-9]{32}' > /dev/null

real    0m10.851s 
user    0m13.069s
sys 0m0.092s

$ time cat *.txt | ag -o '(^|[^a-fA-F0-9])[a-fA-F0-9]{32}([^a-fA-F0-9]|$)' | ag -o '[a-fA-F0-9]{32}' > /dev/null

real    0m6.689s
user    0m7.881s 
sys 0m0.424s  

A lesson in legacy routing using virtual network interfaces

There comes a point in every successful and growing infrastructure where the implementation needs of the network outgrow its original design. Fair enough; there's only so much one can account for (especially in this industry), and managing a network so it can flex and change with business needs is what separates the men from the boys (so to speak).

Our DC infrastructure hit this point fairly recently. We have a number of db servers split out into shards. Each shard is designed with a master/slave replication topology, keeping one of the slaves as an analysis machine. The analysis machines were originally meant to perform some very limited and specific work, but of course this expanded and evolved as time progressed. Eventually we needed to upgrade the analysis hardware, but since there was only one analysis machine in each shard, it would be hard to keep them down for the length of time needed to do our work.

The decision was made to move a copy of the analysis database onto a db slave, and migrate the analysis IP to the slave as a virtual IP. In theory, this would be simple. Machines from various VLANs would continue to connect to the "analysis machines" by the same IP addresses they always have, but they would now be temporarily served by the slaves.

Creating the virtual aliases was easy. We could use ifconfig to add an additional interface like so:

# ifconfig bond0:1 netmask

The trouble now came with the routes. We added several routes to the machine in our usual manner.

route add -net gw dev bond0
route add -net gw dev bond0
route add -net gw dev bond0
route add -net gw dev bond0

But when anyone tried to connect from these subnets, their connections would time out.

As a test, I tried connecting to the primary IP over telnet:

$ telnet 3306

Immediately I was presented with a MySQL banner. Yet doing the same for the virtual IP would simply time out. We verified the routes, just to make sure everything seemed appropriate.

$ /sbin/route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref   Use Iface
                *                               U     0      0     0   bond0
                                                UG    0      0     0   bond0
                                                UG    0      0     0   bond0
                                                UG    0      0     0   bond0
                                                UG    0      0     0   bond0
default                                         UG    0      0     0   bond0

Then we took packet captures on the network to see where our traffic was going. Packets were indeed moving into the db slaves, but none returned. This smelled an awful lot like a routing issue.

As a simple step, we removed our routes, and added them back using the virtual designation as a test.

route add -net gw dev bond0:1
route add -net gw dev bond0:1
route add -net gw dev bond0:1
route add -net gw dev bond0:1

The result here was peculiar. Examining the routing table again, we received the same output as before:

$ /sbin/route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref   Use Iface
                *                               U     0      0     0   bond0
                                                UG    0      0     0   bond0
                                                UG    0      0     0   bond0
                                                UG    0      0     0   bond0
                                                UG    0      0     0   bond0
default                                         UG    0      0     0   bond0

We tried testing connectivity anyway, but unfortunately the result was the same. We next tried removing the routes and adding them with the iproute2 tools. Again, this was unsuccessful. The primary interface continued to work, and the virtual interface acted as a virtual black hole. There was a bit of head-scratching next, and quite a bit of reading, but we eventually got to the bottom of it.

You see, there are a few nasty elements at play here. Back in the golden days of Unix, multiple IPs per interface weren't even a thing, so when the net-tools suite was written (ifconfig, route, netstat, etc.), support for them didn't exist. Once they became desirable, the idea of virtual interfaces was monkey-patched on top of the existing infrastructure. This was problematic in its own right, but was compounded by later efforts to actually correct it. The iproute2 suite was eventually written to work more integrally with a completely redesigned network subsystem in the Linux kernel. This subsystem is far more flexible and advanced than the original model, and as a result much of the administration and support for advanced networking features became easier and more reliable. To facilitate this migration, the old net-tools have been deprecated; in the meantime, they have been mostly patched to map behind the scenes onto the new network subsystem. Sadly, this leaves a bunch of legacy components in a sorry state. The new networking subsystem has no real concept of virtual interfaces, so they don't map properly. As a result, trying to use the iproute2 utils with VIPs can have unexpected results (often in the form of virtual interfaces falling back to the primary interface definition: i.e., bond0:1 becomes bond0).

This whole debacle goes one step further and shows broken behaviour in the original net-tools. As it turns out, if you explicitly bind a route to a particular virtual interface, it will always default to the primary interface. However, not defining an interface allows all interfaces (even virtual ones) to use the route. The maddening part is that if you don't specify an interface, the routing table still shows the route bound to the primary interface. So viewing your routing table will give you identical output whether you have explicitly bound the route or not.

The final solution here was to simply remove all of our routes and redefine them without explicit binds:

$ sudo route add -net gw
$ sudo route add -net gw
$ sudo route add -net gw
$ sudo route add -net gw
$ /sbin/route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref   Use Iface
                *                               U     0      0     0   bond0
                                                UG    0      0     0   bond0
                                                UG    0      0     0   bond0
                                                UG    0      0     0   bond0
                                                UG    0      0     0   bond0
default                                         UG    0      0     0   bond0

From this point, everything immediately started working. Looking at the routing tables before and after the fix, you wouldn't even notice the difference.

The moral of the story? The heyday of net-tools and VIPs has passed. It's time to start using iproute2 and multiple-address interfaces the way the developers intended.
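To sketch what that looks like with iproute2 (using made-up addresses from the 192.0.2.0/24 and 198.51.100.0/24 documentation ranges; substitute your own), a secondary address becomes a first-class attribute of the interface, and routes bind to the device or gateway rather than to an alias:

```
# Add a second IP to bond0; the label is optional and kept only for
# compatibility with tools that still expect the bond0:1 naming
ip addr add 192.0.2.10/24 dev bond0 label bond0:1

# Routes are defined against the gateway (and optionally the device), never an alias
ip route add 198.51.100.0/24 via 192.0.2.1

# Verify both
ip addr show bond0
ip route show
```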

NOTE: Addresses and routes have been altered to protect the original infrastructure, but still accurately reflect the original behaviour for each example.

Suspending and resuming processes

While performing regular administrative duties on any server, you will often find yourself needing to suspend and resume processes. Luckily, nearly all shells on modern systems have some sort of job control built in to address this need.

A job is really just another term for a process group. This will commonly be just a single application, but is not exclusive to that idea. Any process group which is not receiving terminal input is considered a background job, while a process group that does receive terminal input is a foreground job. In order for a shell to control these jobs, various IPC calls (termed signals) can be sent to the job which can be interpreted by various signal handlers inside the program.

Conveniently, most programs don't need to define any signal handlers. There is a default set that gets included into each program, and can be generically used.

The most common signals for job control are SIGTSTP, SIGSTOP, SIGCONT, and SIGINT.

There are two main ways we can send these signals:

  • We can use keyboard shortcuts to instruct the shell to send a particular signal
  • We can use the kill command to send a signal to a particular process ID

Traditionally, the suspend character has been mapped to Ctrl+Z (which sends SIGTSTP), temporarily suspending your process. Similarly, the interrupt character can be signaled by pressing Ctrl+C (SIGINT) on your keyboard.

A lesser-known secret is that we can send any signal we want to an application using the kill command; it doesn't have to simply terminate a program. Let's say we have a process with an ID of 1337, and we want to suspend it. We can easily do the following: $ kill -SIGSTOP 1337

Likewise, once we are ready to resume the process, we can use kill to send it the SIGCONT signal: $ kill -SIGCONT 1337
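Putting the two together, here is a quick demonstration you can run in any shell, using sleep as a stand-in workload (your PID will be whatever the shell reports, not the 1337 from the example). While suspended, ps reports the process in the stopped state, "T":

```shell
# Start a disposable background job to practice on
sleep 30 &
pid=$!

# Suspend it; a stopped process shows "T" in the ps STAT column
kill -SIGSTOP "$pid"
ps -o pid=,stat= -p "$pid"

# Resume it, then terminate it to clean up
kill -SIGCONT "$pid"
kill "$pid"
```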

And there you have it! The man pages cover a wide range of additional signals that can be used for various purposes, but mastering even just these simple signal techniques will make a significant difference to your administrative abilities.

Adding timestamps to Bash history

The Bash history feature is an invaluable tool that allows users to recall commands previously entered into their shell with relative ease. This makes it easy to repeat commands and keep track of what was done on a system. By default, however, a user is unable to see when those commands were actually entered. When auditing a system, this information can be useful, for example when trying to determine how and when a file went missing from the filesystem. Since Bash version 3, however, you have been able to enable time-stamping of entries for later review.

Applicable versions of Bash provide the HISTTIMEFORMAT environment variable for this purpose. Although it is empty by default, you can simply assign it a time format string to enable time-stamping.

The easiest example is as follows:

$ export HISTTIMEFORMAT="%F %T "
Any new commands entered after this variable is set will be stamped, and the stamps will be accessible when you use the history command.

$ ls
file1 file2 file3
$ date
Mon Oct 20 11:54:37 EDT 2014
$ cal
    October 2014    
Su Mo Tu We Th Fr Sa
          1  2  3  4
 5  6  7  8  9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30 31
$ history
 1001 2014-10-20 11:54:08 export HISTTIMEFORMAT="%F %T "
 1002 2014-10-20 11:54:28 ls
 1003 2014-10-20 11:54:37 date
 1004 2014-10-20 11:54:49 cal
 1005 2014-10-20 11:55:29 history

To make this persistent, you can easily add the export command to your .bashrc file. You can also consult the man pages for more information on how to craft custom time-format strings: man 3 strftime
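For example (an arbitrary format, shown only to illustrate strftime sequences), you can preview a format string with date(1) before committing it to HISTTIMEFORMAT:

```shell
# Preview the stamp that "%a %T " would produce, using date(1)
date +"%a %T"

# Then apply it to history time-stamping for the current session
export HISTTIMEFORMAT="%a %T "
```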

It's worth noting that once you enable time-stamping, any existing entries in your .bash_history file will automatically be stamped with the current time. This is important because it will not be possible to recover the timestamps of commands entered before you enabled time-stamping. Make sure to enable time-stamping before using your shell if you want this information.

A Foundation for Buffer Overflow Attacks

Buffer overflows are among the oldest and most important attacks against computer technology. These attacks are commonly associated with low-level languages (like C and C++), but are not exclusive to them. Despite the importance of understanding this type of attack, a large number of technical people still don't fully understand it. Hopefully, this article will give you some basic insight into how buffer overflows work and why they are useful/dangerous. This guide will attempt to give you a very basic understanding of the concepts behind these attacks, but please bear in mind: due to a variety of protection mechanisms built into modern systems, it is actually much more difficult to exploit modern systems using these attacks than it used to be. (If you are reading this to play The Hacker's Sandbox, then everything here is still applicable.)

So what is a buffer overflow?

Buffer overflows are attacks that change the logical flow of an application in ways its designers never intended. When information needs to be stored in memory, it will land either on the stack or (for long-term and dynamic data) on the heap. (This article will deal specifically with stack overflows, though similar concepts apply to the heap as well.) The stack is a region of memory that gets created for every thread your application is running. It works on a Last-In, First-Out (LIFO) model, where data is said to be either pushed onto or popped off of the stack. When an application wants to store data in a buffer, it allocates memory on the stack to be filled for that purpose; the data can later be manipulated or moved to the heap as needed. The danger comes in when the application tries to write more data to the stack than has been allocated for the buffer. In that case, the application can overwrite other important locations in memory, corrupting the program or changing its logical flow.

Examining the stack

To understand this a little better, let's take a look at an abstraction of the stack. For a single function call, you can visualize the relevant parts of the stack frame like this, with the buffer sitting closest to the top:

   [ local buffer ]
   [ saved frame pointer (SFP) ]
   [ RETURN_ADDRESS ]
Imagine that you have three cards that you want to put down in the buffer: Card A, Card B, and Card C. You can push them down one at a time, first Card A, then Card B, last Card C. You will now have a buffer on the stack that looks like this:

  • Card C
  • Card B
  • Card A

If you wanted to then access Card B, you would first have to pop Card C off of the stack. This is the basis for memory management on the stack, and it is crucial for understanding buffer overflows. We can take this a step further and see how a real-world function would interact with the stack.

I am going to use the mmap() system function exposed by the Linux kernel. Looking at the man pages, you can see the function looks like this:

void *mmap(
 void *addr,
 size_t length,
 int prot,
 int flags,
 int fd,
 off_t offset
);
If we were to call this function using the classic 32-bit C calling convention, we would push each argument onto the stack in reverse order and then make the call to mmap(). (mmap()'s return value is a pointer, but under this convention it is handed back in a register, so nothing extra is pushed for it.) Abstractly, it would look something like this:

PUSH off_t offset
PUSH int fd
PUSH int flags
PUSH int prot
PUSH size_t length
PUSH void *addr
CALL mmap

Once finished, our actual stack will look like this:

void *addr
size_t length
int prot
int flags
int fd
off_t offset

Don't worry about the actual data here; the important part is understanding how the stack works so that you can understand how to exploit it. This is all well and good, but how does it actually help you control the flow of a program? Well, in addition to holding the buffer, the stack also holds a frame pointer and the address to return to after a function has finished executing. Let's take another look at that stack:

   [ local buffer ]               <- our input is copied here
   [ saved frame pointer (SFP) ]
   [ RETURN_ADDRESS ]             <- overwriting this redirects execution
Did you notice it? The buffer is placed on the stack before the return address. If we keep pushing data into the buffer until it overflows into the RETURN_ADDRESS, we change where the program thinks it should jump back to.

A little bit of math

Ok, so now we know the theory behind overflowing buffers, but how do we know how much data we need to actually exploit this? The truth is, it depends on a few factors, such as your architecture. Computers of different architectures follow this same stack model, but their memory allocation won't always be the same. Every machine architecture has a minimum unit of storage it allocates. Think of it in terms of blocks: a system can only reserve whole blocks at a time. If your program were to request only half a block worth of data, the system would still need to reserve a full block to satisfy that request.

On a 32 bit system, this block is going to be 32 bits. On a 64 bit system, each block will be 64 bits.

So if we were to request 1 byte of data (8 bits), on a 32 bit system, we would end up reserving 4 bytes (32 bits), while on a 64 bit system, we would end up reserving 8 bytes (64 bits).

In order to actually change the RETURN_ADDRESS, first we would need to fill our complete buffer (all of the space reserved for us), then we would need to overflow the saved frame pointer (usually 1 block of space), and finally we would be able to overwrite the RETURN_ADDRESS (also 1 block in size).

To complicate matters more, modern compilers will often pad buffers depending on various factors (such as data type and size), which will affect the reserved memory size. Often, the easiest way to determine how large your buffer really is would be to open your application in a debugger and actually look at the assembly. If you are playing The Hacker's Sandbox, assume that there is no padding from the compiler.

Some practical examples

The following is a small C application (overflow.c) which will have no measures in place to protect it from buffer overflows.

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char buf[10];

    /* Unsafe: strcpy() performs no bounds checking on the 10-byte buffer */
    strcpy(buf, argv[1]);

    return 0;
}

void the_shell()
{
    /* Never called by main(); spawns a shell if execution is redirected here */
    puts("Shell!");
    system("/bin/sh");
}
This program is pretty simple. It takes the first argument to the program and places it into a buffer that is 10 bytes long. Notice that there is also a function called the_shell(), which never actually gets called; the code inside it, however, will spawn a shell running as the owner of the program. (The truth is, this would be a pretty useless program in real life, but it suits our demonstration very well.)

We could easily try to compile this into a program called overflow. For the purposes of this demonstration, we will be compiling on a 32 bit architecture:

$ gcc -o overflow overflow.c

Note: This compile would be incomplete on a modern machine. Modern compilers, like gcc, would have multiple mechanisms in place, like a stack guard and NX memory regions, which you would need to disable and/or bypass in order to successfully exploit your binaries.

Now, we can try playing around with this for a bit. If we pass an argument of 10 characters (say 0000000000), we will notice the program does nothing and simply exits. However, if we pass a much longer argument of, say, 30 characters, it will crash. The application crashes because key pieces of data become corrupt (like our return address, which likely now points to a nonexistent region of memory), and the logical flow of the program cannot continue. So how do we send just enough data to overflow the stack and change the flow of execution without crashing the program? Let's try to fill the stack just enough to trick it into returning to our hidden function the_shell(). Conceptually, we will need to fill the stack so it looks like this on our 32 bit system:

[garbage data] 12 bytes [garbage data] 4 bytes [address of the_shell] 4 bytes

Remember, when the buffer requests 10 bytes, the smallest number of 32-bit blocks (4 bytes each) that can satisfy it is 3. Thus, 4 bytes x 3 = 12 bytes.

First, we would fire up our binary in a debugger. We'll use gdb for this example; just launch it against our executable to get an interactive debugging shell:

$ gdb overflow

We know from our source example that we want to try to find the location of the_shell() to fire off our attack. We can simply use gdb's disassembly command to dump out that function, and see what address the beginning of the function has been mapped to.

(gdb) disas the_shell
Dump of assembler code for function the_shell:
   0x080484a4 <+0>: push %ebp
   0x080484a5 <+1>: mov %esp,%ebp
   0x080484a7 <+3>: sub $0x18,%esp
   0x080484aa <+6>: movl $0x8048560,(%esp)
   0x080484b1 <+13>:    call 0x8048350 <puts@plt>
   0x080484b6 <+18>:    movl $0x8048567,(%esp)
   0x080484bd <+25>:    call 0x8048360 <system@plt>
   0x080484c2 <+30>:    leave  
   0x080484c3 <+31>:    ret    
End of assembler dump.

Here, we can see that the entry point for the_shell is 0x080484a4. We're almost there! Before we can execute our attack, we need to place the address on the stack in the correct byte order. Remember, x86 is a little-endian architecture: multi-byte values are stored in memory with the least significant byte first. As a result, the bytes of our address will look reversed in a dump from the way we would normally write them. Therefore, we are going to need to input the bytes (one at a time) in reverse order: 0xa4 0x84 0x04 0x08

Putting this all together

Finally, we're ready to launch the attack. We'll need to send garbage data to overflow the buffer (12 bytes for the buffer + 4 bytes for the SFP), and then our payload. We can easily just use 0's for our garbage data. Unfortunately, 0x08 and 0x04 are not printable characters, so we will need to find some way to inject these bytes into our program. A lot of hackers like to use a C, Perl, or Python program to do this. Personally, I just tend to use the echo command inside my Bash shell (but feel free to use whatever suits you best). The -e flag for echo interprets escape sequences in strings, and -n suppresses the trailing newline.

$ ./overflow 0000000000000000$(echo -ne "\xa4\x84\x04\x08")

Success! We've been able to alter the flow of execution in our binary to run the hidden shell code.

Again, if you're trying this on a modern system, there are a few more safeguards that need to be taken into consideration before this will work. For example, you would probably need to clear the NX protection on your binaries (you can use execstack on Linux systems to do this). But if you are playing The Hacker's Sandbox, you won't need to worry about any of that.
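As a sketch, on a typical GNU/Linux system the relevant protections can be switched off like this (the GCC/binutils flags and the execstack tool are standard, but availability varies by distribution, and the sysctl step requires root):

```shell
# Rebuild with the stack canary removed and the stack marked executable
# (-m32 matches the 32-bit walkthrough and requires multilib support)
gcc -m32 -fno-stack-protector -z execstack -o overflow overflow.c

# Or flip the executable-stack flag on an already-built binary
execstack -s overflow

# Temporarily disable ASLR system-wide (resets on reboot)
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
```

None of this is necessary inside a sandbox built for practicing these techniques.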

For additional information on running buffer overflows against modern systems, I would recommend reading Smashing the Stack in 2011.

Firefox 31+ and sec_error_ca_cert_invalid

Some Background

In early 2014, the Mozilla Foundation released a new library for certificate verification called mozilla::pkix. This library was built to be more robust and maintainable than its predecessors, using techniques such as building multiple trust chains during the verification process. With this new library came stricter enforcement of some requirements in Mozilla's CA Certificate Policy. While most users may not notice any difference, a subset of users has definitely been affected. Specifically, using self-signed certificates can lead to failed verification of the certificate chain. In such instances, the connection is refused entirely, and you will be prompted with the following error:

An error occurred during a connection to

Issuer certificate is invalid.

(Error code: sec_error_ca_cert_invalid)

It is still currently possible to get around this by disabling support for the new mozilla::pkix library. Be aware, however, that doing so may leave you at slightly higher risk of malicious connections going uncaught by verification. Use this method at your own risk.

Disabling mozilla::pkix

  • In your Firefox browser, type about:config into the address bar, and hit enter.
  • Search for security.use_mozillapkix_verification and set it to false (you can double-click on it to do so)

Now you should be able to reload the page you were trying to connect to and receive your familiar prompt about the unsafe connection. Simply accept the exception to continue on your way.

Dynamic Linker Voodoo: aka How to LD_PRELOAD

The dynamic linker is one of the most important yet widely overlooked components of a modern operating system. Its job is to load and link executable code from shared libraries into executables at run time. There are many intricate details to how the dynamic linker does what it does, but one of the more interesting is the use of the LD_LIBRARY_PATH and LD_PRELOAD environment variables. When defined, these variables let you override the default behaviour of the dynamic linker to load specific code into your executables. For example, using LD_PRELOAD, you can override the C standard library functions in libc with your own versions (think printf, puts, getc, etc.)

Let's see this in action! We'll start by making a simple program to test the (now deprecated) gets() function. Here, we will create a file called test.c and put the following contents inside it:

#include <stdio.h>

int main (void)
{
  char str[128];
  printf ("Testing gets()...\n");
  gets (str);
  return 0;
}
Note that this code is not safe and should not be used for production, but it makes a simple test scenario.

Next, we can compile the source with gcc. (Since gets() is deprecated, we're going to throw in the -w flag to suppress warning messages. We don't really care for this example.)

$ gcc -w -o test test.c

Finally, we can run the program and examine its output:

$ ./test 
Testing gets()...

Success! When the executable is run, it links the gets() code from libc into memory and executes that code when we call gets(). Now let's see how we can override libc's implementation with our own. First, we'll write a new version of gets() that we want to run instead. Make a file called mygets.c and enter the following:

#include <stdio.h>

char *gets( char *str )
{
  printf("Error: Stop using deprecated functions!\n");
  return "";
}
Once finished, we can compile this into our own shared object library:

$ gcc -w -fPIC -shared -o mygets.so mygets.c

Finally, let's run the test executable again, but this time we will call it with LD_PRELOAD to load our custom shared library before dynamically linking libc:

$ LD_PRELOAD=./mygets.so ./test 
Testing gets()...
Error: Stop using deprecated functions!

As you can see, our custom code now runs where we were once being prompted for input. Of course, we could write any code we want to go in here; the only limit is whatever we can think up. This technique can be extremely useful for advanced debugging or for replacing specific parts of a shared library used by your program. You can even take this a step further to create hooks that call the original overridden functions.

To illustrate this, let's modify our shared library one more time:

#define _GNU_SOURCE

#include <stdio.h>
#include <dlfcn.h>

char *gets( char *str )
{
  printf("Error: Stop using deprecated functions!\n");
  char *(*original_gets)( char *str );
  original_gets = dlsym(RTLD_NEXT, "gets");
  return (*original_gets)(str);
}
We've done a few things here. We now reference the dlfcn header, and we use dlsym() to look up the original gets() function. Notice that we use the RTLD_NEXT pseudo-handle with dlsym(). In order to use this handle, we must define the _GNU_SOURCE feature test macro (otherwise RTLD_NEXT will not be defined). This finds the next occurrence of gets() in the search order after the current library and allows us to assign it to original_gets, which we then call to hand control back to the real libc implementation.

We can compile our library again to test out the new code (this time linking the dl lib):

$ gcc -w -fPIC -shared -o mygets.so mygets.c -ldl

Using this method, we can run our test executable again:

$ LD_PRELOAD=./mygets.so ./test 
Testing gets()...
Error: Stop using deprecated functions!

At this point, you should notice the custom code we provided for gets(), followed by the prompt from the original libc function. Hopefully this dispels a little bit of the voodoo and gives you another valuable tool to stash in your belt.

Google C++ Style Guide

I am a really huge fan of nicely formatted and presented documentation; sometimes I can get a little too OCD about it. If you haven't reviewed any of Google's coding style guides, you should: they contain a lot of useful information, and their views are worth considering. Lately I've decided to try to conform to their C++ guide a bit, but perusing their doc was just ugly for offline viewing. It may not matter to most people, but I made a slightly cleaner version of the 3.274 revision of the guide in PDF format.

Download Google's C++ Style Guide in PDF here

Creating RSA public keys from private keys

Generating a public RSA key is a pretty simple task. More often than not, keys will be generated in pairs, but sometimes you will be given an RSA private key and will need to create your own public key from it (for example, when using Amazon EC2). Fortunately, you only need to remember a quick one-liner to do so:

$ ssh-keygen -y -f mykey.pem > mykey.pub

The -y flag tells ssh-keygen to create a public key based off of the private key passed to the -f flag. It's really that simple. By default, this will simply print to stdout, so redirecting the output to a file (as in the example here) is the quickest way to save your new key.
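To convince yourself the derived key is correct, you can generate a throwaway key pair and compare the derived public key against the one ssh-keygen wrote itself (the file names here are just examples):

```shell
# Generate a throwaway RSA key pair with no passphrase
ssh-keygen -q -t rsa -b 2048 -N "" -f mykey

# Derive the public key from the private key alone
ssh-keygen -y -f mykey > derived.pub

# Both public keys should carry the same key material;
# the trailing comment field may differ, so compare the first two fields
awk '{print $1, $2}' mykey.pub
awk '{print $1, $2}' derived.pub
```

The two printed lines should be identical, confirming that the public key is fully recoverable from the private key.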