CS 240: Dark Buffer Arts

Smash the stack to understand calling conventions and security concerns.

Assign: Monday, 7 November
Checkpoint: aim to complete at least two stages by 11:59pm Monday, 14 November
Due: 11:59pm Thursday, 17 November
Starter Code: fork wellesleycs240 / cs240-buffer (just once!), keep the "cs240-buffer" name, and add bpw as admin. (Need help?)
Submit:
- Commit and push your final revision and double-check that you submitted the up-to-date version. (Need help?)
- Do not submit a paper copy.
Relevant Reference:
Collaboration: Individual code assignment policy, as defined by the syllabus.

Overview

Silly version: Impressed by your recent quest for the Sourceror’s Code, an infamous after-market magical artifact enhancement shop has called you in as a consultant. Your task is to augment the powers of an unassuming pink umbrella on behalf of a client who wishes to use it for daily tasks for which it was never intended.

Serious version: This assignment helps you develop a detailed understanding of the call stack organization by deploying a series of buffer overrun attacks on a vulnerable executable file called umbrella.

Ethics: In this assignment, you will gain firsthand experience with exploits of a common type of security vulnerability in operating systems and network servers. Our purpose is to help you learn about the runtime operation of programs and to understand the nature and impact of this form of security weakness so that you can avoid it when you write system code. We do not condone the use of these or any other form of attack to gain unauthorized access to any system resources. There are criminal statutes governing such activities.

Contents

Overview
Setup
Tasks
Grading
The umbrella
Tools for Crafting Exploits
Running and Testing Exploits
Exploits

Setup

Do this assignment in one of the CS 240 computing environments. The executables for this assignment were compiled specifically for the CS Linux machines and the wx appliance. As usual, fork wellesleycs240 / cs240-buffer and clone your Bitbucket repository to your machine.

Files: Your working copy should provide:

descriptions.txt: file for English descriptions of your exploits
exploit1.hex: file for Exploit 1
exploit2.hex: file for Exploit 2
exploit3.hex: file for Exploit 3
exploit4.hex: file for Exploit 4
hex2raw: utility to convert human-readable exploit descriptions written in hexadecimal to raw bytes
id2cookie: utility to convert user ID to unique “cookie” value
Makefile: recipes to test your exploits
umbrella: executable you will attack
umbrella.c: important parts of C code used to compile umbrella

Create your cookie: Most attacks in this assignment will require you to make a unique¹ 8-byte “cookie” value show up in places where it ordinarily would not. This value will also determine the exact behavior of your executable. To create your personalized cookie, run make cookie and enter your Bitbucket ID. This will print your cookie in hex and record your Bitbucket ID and cookie in the files id.txt and cookie.txt.

Tasks

You must craft exploit strings that accomplish four increasingly sophisticated buffer overrun attacks when provided as input to the vulnerable umbrella executable. Each exploit is described below.

Submit two parts for each exploit:

Exploit string (input): Write your exploit string in hex2raw input format in each of the files exploit1.hex through exploit4.hex.
Description: In the separate descriptions.txt file, write a succinct paragraph describing in English how the exploit works. Your description should demonstrate that you understand:
- What existing stack memory contents are overwritten by what parts of your exploit string when it is copied into memory.
- What instructions execute, using what data, to accomplish the attack.
- How these relate to the calling conventions and stack discipline.
As with the x86 rune descriptions, show that you understand the relation between the specifics of your exploit and the higher-level context for how and why it works.
- Do not give an exhaustive instruction-by-instruction account of the exploit’s execution or simply translate the code to English. (e.g., “Next, it adds 24 to %rsp, then it pops the value from the top of the stack and stores it in %rbx, then it returns…”)
- Do focus on the instructions that involve stack or exploit data and accomplish key steps in the exploit. (e.g., “Then, the acme function loads the contents of the widget variable from the stack, which the exploit string has overwritten with the magic number 42, causing the following computation to produce the result 34 instead of the expected 12.”)
- For each exploit, focus on what is new or different from the last exploit. Do not re-explain the basics that carry over from previous exploits.

Grading

The assignment is graded from a maximum of 100 points:

80 points for exploits (20 points each). Run make test to check all of your exploits.
20 points for descriptions. We will grade descriptions of one or two exploits (chosen randomly), by the criteria above.

The `umbrella`

The umbrella executable requires a user ID argument on the command line and reads a string from standard input once it starts up. The user ID customizes stack layout and verifies a unique “cookie” value that your attacks must provide.

Usage

To run the umbrella executable:

$ ./umbrella -u your_bitbucket_username
Type string:

Alternatively, since your username was saved in a file when you made your cookie earlier, you can also use a subshell to pass the contents of this file as an argument:

$ ./umbrella -u $(cat id.txt)
Type string:

Input Vulnerability

The umbrella executable reads a string from standard input with the function getbuf():

unsigned long long getbuf() {
  char buf[36];
  // ...
  unsigned long long val = (unsigned long long)Gets(buf);
  // ...
  return val % 40;
}

The full version of this function contains more code for an optional additional challenge. The part shown here is sufficient for the required parts of this assignment. The key feature to note is that getbuf() calls the function Gets(), passing the address of its local array buf, which is allocated on the stack with space for 36 chars.

The function char* Gets(char* buf) is similar to the standard C library function char* gets(char* buf). It reads a string from standard input, terminated by a newline character ('\n'), and stores the characters of the string, followed by a null terminator ('\0') starting at the memory address given by its argument, buf. It returns its argument.

Neither Gets() nor gets() has any way to determine whether there is enough space at the destination to store the entire string. Instead, they simply copy the entire string, assuming the destination is large enough and thus possibly over-running the bounds of the storage allocated at the destination.
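For contrast, here is a minimal sketch of a reader that does know the size of its destination. The name read_line_bounded is hypothetical and is not part of umbrella; the sketch relies only on the standard fgets(), which never stores more than the capacity it is given.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical bounded alternative to Gets(): stores at most cap - 1
 * characters plus a null terminator, so it cannot write past buf.
 * Returns the number of characters stored, or -1 on end of input. */
long read_line_bounded(char *buf, size_t cap, FILE *in) {
    if (cap == 0 || fgets(buf, (int)cap, in) == NULL) {
        return -1;
    }
    size_t len = strcspn(buf, "\n");  /* index of the newline, or full length */
    buf[len] = '\0';                  /* drop the newline if one was read */
    return (long)len;
}
```

With a 36-byte buffer, a call such as read_line_bounded(buf, sizeof buf, stdin) keeps at most 35 characters of the line and leaves the rest unread, instead of writing past the end of buf.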

If the input string read by getbuf() is less than 36 characters long, it is clear that getbuf() will return some value less than 0x28 (that’s 40₁₀), as shown by the following execution example:

$ ./umbrella -u your_bitbucket_username
Type string: Acromantula!
Dud: getbuf returned 0x20

The value returned might differ for you, since it is derived from the address of buf on the stack, which may vary between systems. Running the umbrella under gdb will also yield different values than it does outside gdb.

Typically, an error occurs if we type a longer string:

$ ./umbrella -u your_bitbucket_username
Type string: This string is too long and it starts overwriting things.
Ouch!: You caused a segmentation fault!

As the error message indicates, over-running the buffer typically causes the program state (e.g., the return addresses and other data structures that were stored on the stack) to be corrupted, leading to a memory access error. Your task is to be more clever with the strings you feed umbrella so that it does more interesting things. These are called exploit strings.

Tools for Crafting Exploits

Constructing exploits involves tricky tasks like writing untypeable characters and determining the byte encoding of x86 instructions. Use the techniques below to simplify your job.

Formatting Exploit Strings with `hex2raw`

Each ASCII character in a string is represented by one byte. For example, 'A' is represented by the byte value 0x41. While your exploits will be delivered under the guise of strings, they will embed sequences of bytes encoding addresses, numbers, or other non-character data. It is hard enough to map each desired byte value in your exploit back to a character by hand, but often the specific bytes required do not even correspond to any typeable or printable ASCII characters, making it “difficult” to type your exploit string on a keyboard or view it on the screen. Do not try to encode your exploit by hand!

We have provided a tool called hex2raw to encode exploit strings:

The input to hex2raw is a human-readable text description of a byte sequence where each byte is written as a pair of hexadecimal digits. Successive bytes may be separated by spaces.
The output of hex2raw is a raw byte sequence, where each byte has the hexadecimal value described by the corresponding pair of characters in the input.

Suppose we want the sequence of bytes whose values are the hexadecimal numbers 0x41, 0x42, 0x43, 0x1b. These same values, when interpreted with the ASCII encoding, mean the characters 'A', 'B', 'C', followed by the ASCII “escape” (ESC) character, which is not treated as an ordinary printable character when typed on the keyboard or printed to the terminal. Given the input 41 42 43 1b, the hex2raw utility will output the desired 4-byte sequence.
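To make the translation concrete, here is a rough sketch of the conversion in C. It is an illustration only, not the source of the provided hex2raw, which may accept more input syntax than this; the names hex_digit_value and hex_to_bytes are made up for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Illustrative only: converts text such as "41 42 43 1b" into raw bytes.
 * Returns the number of bytes written to out, or -1 if a byte description
 * is malformed or more than cap bytes would be needed. */
long hex_to_bytes(const char *text, unsigned char *out, size_t cap) {
    size_t n = 0;
    while (*text != '\0') {
        if (isspace((unsigned char)*text)) {
            text++;  /* whitespace between bytes is skipped */
            continue;
        }
        int hi = hex_digit_value(text[0]);
        int lo = (hi < 0) ? -1 : hex_digit_value(text[1]);
        if (hi < 0 || lo < 0 || n >= cap) return -1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        text += 2;
    }
    return (long)n;
}
```

Calling hex_to_bytes("41 42 43 1b", out, sizeof out) fills out with the four byte values 0x41, 0x42, 0x43, 0x1b and returns 4.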

To run hex2raw, type the series of hexadecimal byte value descriptions you want in a file (e.g., exploit1.hex for Exploit 1). Following our example, we could save the string 41 42 43 1b into the file exploit1.hex using Emacs. Then run:

$ ./hex2raw < exploit1.hex > exploit1.bytes

The input redirection symbol < instructs the command-line shell to use the contents of exploit1.hex as standard input to hex2raw, instead of looking for input from the keyboard. The output redirection symbol > instructs the command-line shell to store the standard (printed) output of hex2raw into a file called exploit1.bytes. Input and output redirection (< and >) are general features of the command-line shell that can be used independently and with any executable command.

Once the exploit string byte sequence is stored into the file exploit1.bytes, run umbrella with the contents of the file exploit1.bytes as input:

$ ./umbrella -u your_bitbucket_username < exploit1.bytes

Naturally, as with compiled source code, if you update your exploit string specification in exploit1.hex, you must run hex2raw again to translate the new version to a byte sequence in exploit1.bytes to use this new exploit with the umbrella.

Warning: do not use 0A

Your exploit string must not contain byte value 0x0A (0A in hex2raw input) at any intermediate position, since this is the ASCII code for newline ('\n'). When Gets() encounters this byte, it will assume you intended to terminate the string input. hex2raw will warn you if it encounters this byte value.
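If you want to check a byte sequence yourself, a small scan like the following sketch is enough. The function name first_newline_index is hypothetical; any index it reports before the final byte marks a place where Gets() would stop reading early.

```c
#include <stddef.h>

/* Returns the index of the first 0x0a byte in bytes[0..n-1], or -1 if none. */
long first_newline_index(const unsigned char *bytes, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (bytes[i] == 0x0a) {
            return (long)i;
        }
    }
    return -1;
}
```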

Byte-Encoding Instructions

You may wish to come back and read this section later after looking at the exploits. When including instructions as part of an exploit payload, you must use the instruction encoding as machine code, the byte sequence used to encode an instruction like pushq %rax for the machine. This is not the byte sequence representing the string "pushq %rax".

Use gcc as an assembler and objdump as a disassembler to generate the byte codes for instruction sequences. Suppose we write a file example.s containing the following assembly code:

# Example of hand-generated assembly code
movq $0x1234abcd,%rax    # Move 0x1234abcd to %rax
pushq $0x401080          # Push 0x401080 on to the stack
retq                     # Return

The code can contain a mixture of instructions and data. Anything to the right of a # character is a comment.

We can now assemble and disassemble this file, saving the disassembler’s description of the binary object code:

$ gcc -c example.s
$ objdump -d example.o > example.d

The generated file example.d contains the following lines:

   0:	48 c7 c0 cd ab 34 12 	mov    $0x1234abcd,%rax
   7:	68 80 10 40 00       	pushq  $0x401080
   c:	c3                   	retq

Each line shows a single instruction. The number on the left indicates the starting address (starting with 0), while the hex digits after the : character indicate the byte codes for the instruction, in memory order from left to right. Thus, we can see that the instruction pushq $0x401080 has a hex-formatted byte code of 68 80 10 40 00.

If we read the 4 bytes starting at address 8 (just after the 68 opcode byte at address 7), we get: 80 10 40 00. This shows the bytes of the 4-byte value 0x00401080 in little endian order, with the byte at the lowest address shown on the left and the byte at the highest address shown on the right.
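The same byte ordering applies to any address or number you embed in an exploit string. As a small sketch (the helper name encode_little_endian is invented for this example), the following C function produces the bytes of a value in the order they appear in memory:

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the low `width` bytes of value to out, least significant byte
 * first. At most 8 bytes are written, since value holds only 8 bytes. */
void encode_little_endian(uint64_t value, unsigned char *out, size_t width) {
    for (size_t i = 0; i < width && i < sizeof value; i++) {
        out[i] = (unsigned char)(value >> (8 * i));
    }
}
```

For the value 0x00401080 and a width of 4, out receives 0x80, 0x10, 0x40, 0x00, matching the bytes in the disassembly above.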

Finally, we can read the byte sequence for our code: 48 c7 c0 cd ab 34 12 68 80 10 40 00 c3

Running and Testing Exploits

Test all exploits (used for grading):

Save your exploits in hex2raw input format in the proper files.
Run make test to translate and test each exploit and generate a summary.

Run an individual exploit:

Write the exploit string in hex2raw input format in, e.g., the file exploit1.hex.

Translate it to raw bytes with hex2raw:

 $ ./hex2raw < exploit1.hex > exploit1.bytes

Run it directly (possible for Exploits 1 and 2):

 $ ./umbrella -u your_bitbucket_username < exploit1.bytes

or under gdb (required for Exploits 3 and 4):

 $ gdb ./umbrella
 [... gdb startup output ...]
 (gdb) run -u your_bitbucket_username < exploit1.bytes

GDB Scripts

When using gdb, you may find it useful to save a series of gdb commands to a text file (e.g., commands.txt) and then use the -x commands.txt flag, which runs each line of the file as a command in gdb. This saves the trouble of retyping the commands every time you run gdb. You can read more about the -x flag in gdb’s man page.

Exploits

Save your buffer overrun exploit strings in hex2raw input format in the files exploit1.hex, exploit2.hex, exploit3.hex, exploit4.hex.

Exploit 1: Candle

The function getbuf() is called within umbrella by a function test():

void test() {
  volatile unsigned long long val;
  volatile unsigned long long local = 0xdeadbeef;
  char* variable_length;
  entry_check(3);  /* Make sure entered this function properly */
  val = getbuf();
  if (val <= 40) {
    variable_length = alloca(val);
  }
  entry_check(3);
  /* Check for corrupted stack */
  if (local != 0xdeadbeef) {
    printf("Sabotaged!: the stack has been corrupted\n");
  } else if (val == cookie) {
    printf("Boom!: getbuf returned 0x%llx\n", val);
    if (local != 0xdeadbeef) {
      printf("Sabotaged!: the stack has been corrupted\n");
    }
    if (val != cookie) {
      printf("Sabotaged!: control flow has been disrupted\n");
    }
    validate(3);
  } else {
    printf("Dud: getbuf returned 0x%llx\n", val);
  }
}

When getbuf() executes its return statement, the program ordinarily resumes execution within function test(). Within the file umbrella, there is a function smoke():

void smoke() {
    entry_check(0); /* Make sure entered this function properly */
    printf("Smoke!: You called smoke()\n");
    validate(0);
    exit(0);
}

Your task is to get umbrella to execute the code for smoke() when getbuf() executes its return statement, rather than returning to test(). You can do this by supplying an exploit string that overwrites the stored return pointer in the stack frame for getbuf() with the address of the first instruction in smoke. Note that your exploit string may also corrupt other parts of the stack state, but this will not cause a problem, because smoke() causes the program to exit directly.

Advice

All the information you need to devise this exploit string can be determined by examining a disassembled version of umbrella.
Be careful about byte ordering (i.e., endianness).
You might want to use gdb to step the program through the last few instructions of getbuf() to make sure it is doing the right thing.
The placement of buf within the stack frame for getbuf() depends on which version of gcc was used to compile umbrella. You will need to pad the beginning of your exploit string with the proper number of bytes to overwrite the return pointer. The values of these bytes can be arbitrary.
Don’t forget to use hex2raw to simplify your job.

Exploit 2: Soda Fountain

Within the umbrella there is also a function fizz():

void fizz(int arg1, char arg2, long arg3,
    char* arg4, short arg5, short arg6, unsigned long long val) {
  entry_check(1);  /* Make sure entered this function properly */
  if (val == cookie) {
	printf("Fizz!: You called fizz(0x%llx)\n", val);
	validate(1);
  } else {
	printf("Misfire: You called fizz(0x%llx)\n", val);
  }
  exit(0);
}

Similar to Exploit 1, your task is to get umbrella to execute the code for fizz() rather than returning to test. In this case, however, you must make it appear to fizz as if you have passed your cookie as its argument. You can do this by encoding your cookie in the appropriate place within your exploit string.

Advice

Recall that the first six arguments are passed in registers and additional arguments are passed on the stack. Your exploit code needs to write to the appropriate place within the stack. This explains our somewhat contrived fizz parameters.
You can use gdb to get the information you need to construct your exploit string. Set a breakpoint within getbuf() and run to this breakpoint. Determine key features such as the address of val and the location of the buffer.
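As a reminder of the convention mentioned in the advice above, this sketch maps an argument's position to where its value travels. It assumes umbrella follows the standard System V x86-64 calling convention used on Linux and that every argument is an integer or pointer, as is the case for fizz(). The function integer_arg_location is made up for illustration.

```c
#include <stddef.h>

/* Where the System V x86-64 convention places the n-th integer or pointer
 * argument (n counts from 1). Arguments past the sixth go on the stack.
 * Returns NULL for a position less than 1. */
const char *integer_arg_location(int n) {
    static const char *const regs[] = {
        "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
    };
    if (n < 1) {
        return NULL;
    }
    if (n <= 6) {
        return regs[n - 1];
    }
    return "stack";
}
```

Here integer_arg_location(7) reports "stack", which is why val, the seventh parameter of fizz(), is within reach of data written by a buffer overrun.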

Exploit 3: Door Buster

A much more sophisticated form of buffer attack involves supplying a string that encodes actual machine instructions. The exploit string then overwrites the return pointer with the starting address of these instructions. When the attacked function (in this case getbuf) executes its ret instruction, the program will start executing the instructions on the stack rather than returning to its caller. With this form of attack, you can get the program to do almost anything. The code you place on the stack is called the exploit code. This style of attack is tricky, though, because you must get machine code onto the stack and set the return pointer to the start of this code.

You will need to run umbrella under gdb for this exploit to succeed. (Modern systems use memory protection mechanisms to prevent execution of memory locations in the stack and guard against exactly this type of attack. Since gdb works a little differently than normal program execution, it allows the exploit to succeed.)

Within the file umbrella there is a function bang():

unsigned global_value = 0;

void bang(unsigned long long val) {
  entry_check(2); /* Make sure entered this function properly */
  if (global_value == cookie)  {
    printf("Bang!: You set global_value to 0x%llx\n", global_value);
    validate(2);
  } else  {
    printf("Misfire: global_value = 0x%llx\n", global_value);
  }
  exit(0);
}

Similar to Exploits 1 and 2, your task is to get umbrella to execute the code for bang() rather than returning to test(). Before this, however, you must set global variable global_value to your cookie. Your exploit code should set global_value, push the address of bang() on the stack, and then execute a ret instruction to cause a jump to the code for bang().

Advice:

You will need to run umbrella under gdb for this exploit to succeed.
Determining the byte encoding of instruction sequences by hand is tedious and prone to errors. You can let tools do all of the work by writing an assembly code file containing the instructions and data you want to put on the stack. Assemble this file with gcc and disassemble it with objdump. This will allow you to see the byte sequence to include in your exploit. (A brief example of how to do this is included in the Byte-Encoding Instructions section above.)
Keep in mind that your exploit string depends on your machine, your compiler, and even your cookie.
Watch your use of address modes when writing assembly code. Note that movq $0x4, %rax moves the value 0x0000000000000004 into register %rax; whereas movq 0x4, %rax moves the value in memory at address 0x0000000000000004 into %rax, which is not likely your intent. (Also, because that memory location is usually undefined, the second instruction will cause a segmentation fault!)
Do not attempt to use either a jmp or a call instruction to jump to the code for bang(). These instructions use PC-relative addressing, which is tricky to set up correctly in this attack. Instead, push an address on the stack and use the ret instruction.

Exploit 4: Whizbang Basic Blaze Box

You will need to run umbrella under gdb for this exploit to succeed.

Our preceding attacks have all caused the program to jump to the code for some other function, which then causes the program to exit. As a result, it was acceptable to use exploit strings that corrupt the stack, overwriting the saved value of register %rbp and the return pointer.

The most sophisticated form of buffer overrun attack causes the program to execute some exploit code that patches up the stack and makes the program return to the original calling function (test() in this case). The calling function is oblivious to the attack. This style of attack is tricky, though, since you must: (1) get machine code onto the stack, (2) set the return pointer to the start of this code, and (3) undo the corruption made to the stack state.

Your job is to supply an exploit string that will cause getbuf() to return your cookie back to test(), rather than the small value (less than 40) that it ordinarily returns. You can see in the code for test() that this will cause the program to go Boom!. Your exploit code should set your cookie as the return value, restore any corrupted state, push the correct return location on the stack, and execute a ret instruction to really return to test().

Advice:

You will need to run umbrella under gdb for this exploit to succeed.
In order to reach the return address slot on the stack, your exploit string must also cover all the items saved on the stack between the buf array and the return address slot. So far, the code we have attempted to run with the exploit has not depended on this data, but a “normal”-looking return to test() may depend on it. Consider carefully what is stored here on the stack during getbuf, its original source, where it is stored after getbuf completes, and how test may use it. Use gdb to inspect the disassembled code of getbuf and test. Determine how you can organize your exploit to avoid disturbing stack data in getbuf on which test later relies.
Let tools such as gcc and objdump do all of the work of generating a byte encoding of the instructions.
Keep in mind that your exploit string depends on your cookie, your machine, and your compiler.

Reflect on what you have accomplished.

You caused a program to execute arbitrary machine code of your own design simply by choosing a particular input. You have done so in a sufficiently stealthy way that the program did not realize that anything was amiss. Surely this is a significant security problem!

Mayhem (optional extra credit)

execve is a system call that replaces the currently running program with another program, which inherits all the open file descriptors. What are the limitations of the exploits you have performed so far? How could calling execve allow you to circumvent them? If you have time, try writing an additional exploit (mayhem.hex) that uses execve and another program to print a message. Talk to Ben if you’re curious about more.

¹ There is a high probability that your cookie is unique.
