• Checkpoint: Aim to complete a couple phases by 11:00pm Thursday 5 November.
  • Due: 11:00pm Monday 9 November
  • Starter code: fork wellesleycs240 / cs240-buffer and add bpw as admin
  • Submissions: Commit and push your final version.
  • Relevant Reading: CSAPP Chapter 3 (especially 3.12)
  • Collaboration: For this assignment, you may discuss and develop reverse-engineering strategies together with other students. but you may not “drive” another student’s reverse engineering process or allow someone else “drive” yours. For example, you should not let someone else type or dictate a series of gdb commands to you or vice versa. You may discuss stack layouts, for example.

Overview

Impressed with your skills in defusing binary whizbangs, an infamous after-market magical artifact enhancement shop has called you in for an interview. To get the job, you must use some shadowy techniques to cause an unassuming pink umbrella to act like a whizbang. These techniques can easily cause our unassuming umbrella to misfire if you apply them incorrectly, but the job pays well if you get it!

Translation: This assignment helps you develop a detailed understanding of the call stack organization on a 32-bit x86 processor. It involves applying a series of buffer overflow attacks on an executable file called umbrella.

In this assignment, you will gain firsthand experience with one of the methods commonly used to exploit security weaknesses in operating systems and network servers. Our purpose is to help you learn about the runtime operation of programs and to understand the nature and impact of this form of security weakness so that you can avoid it when you write system code. We do not condone the use of these or any other form of attack to gain unauthorized access to any system resources. There are criminal statutes governing such activities.

Contents

Suggested Practice

CSAPP Practice Problems 3.30, 3.31, 3.33, and others nearby are good review of the stack discipline.

Instructions

After you fork wellesleycs240 / cs240-buffer, you should find the following files in your working copy:

  • makecookie: generates a “cookie” based on some string (which will be your username)
  • umbrella: executable you will attack
  • umbrella.c: important parts of C code used to compile umbrella
  • sendstring: utility to help convert between string formats.
  • Makefile: helps prepare exploits for submission

All of these programs are compiled to run on the CS Linux machines. The rest of the instructions assume that you will be performing your work there.

You will save your buffer overflow exploits for different levels in these files:

  • smoke.txt - Level 0 exploit
  • fizz.txt - Level 1 exploit
  • bang.txt - Level 2 exploit
  • boom.txt - Level 3 exploit
  • id.txt - contains your username

You are also required to keep a “journal” describing how you reverse-engineered the executable, how you constructed your exploit, and why it works. (Or, if it’s not quite working, how it is supposed to work.) Follow the pattern of your whizbang descriptions for this. We are looking for explanations that show you understand the big picture of how and why your exploit works, not just what it does. Save this in either of these files:

  • journal.txt or journal.pdf

Grading is roughly:

  • Level 0: 20 points
  • Level 1: 30 points
  • Level 2: 35 points
  • Level 3: 15 points

where the grade for each level depends on both your exploit and your descriptions.

Be sure to read this document carefully before beginning your work.

Aside: Line Endings

This is a non-issue if you work solely on Linux/Unix/Mac OS X machines.

Linux (and all flavors of UNIX in general, including Mac OS X) use a different line ending from Windows and old Mac OS (pre-Mac OS X) in text files. The reason for this difference is historical: early printers need more time to move the print head back to the beginning of the next line that to print a single character, so someone introduced the idea of separate line feed \n and carriage return \r characters. Windows and HTTP use the \r\n pairs, classic Mac OS used \r, and Linux/Unix/Mac OS X use \n. In this assignment it is important that your lines end with line feed (\n), not any of the alternative line endings. This should not be an issue if working on the CS Linux machines.

For the purposes of this assignment, a cookie is a string of four bytes (or 8 hexadecimal digits) that is (with high probability) unique to you. You can generate your cookie with the makecookie program giving your Bitbucket username as the argument:

$ ./makecookie wendyw
0x5e57e632

(Of course, you should replace wendyw with your own username. While you are doing this, you might as well prepare the first file you need to turn in: id.txt

$ echo your_bitbucket_username > id.txt

This will generate a text file containing your username followed by a single newline.

In most of the attacks in this assignment, your objective will be to make your cookie show up in places where it ordinarily would not.

How to use an umbrella

The umbrella program reads a string from standard input with the function getbuf():

unsigned getbuf() {
  char buf[36];
  volatile char* variable_length;
  int i;
  unsigned val = (unsigned)Gets(buf);
  variable_length = alloca((val % 40) < 36 ? 36 : val % 40);
  for(i = 0; i < 36; i++) {
	variable_length[i] = buf[i];
  }
  return val % 40;
}

Do not worry about variable_length, val, and alloca() for now. All you need to know is that getbuf() calls the function Gets() and returns some arbitrary value.

The function Gets() is similar to the standard C library function gets()—it reads a string from standard input (terminated by \n) and stores it (followed by a null terminator, \0) at the specified destination. In the above code, the destination is an array buf with space sufficient for 36 characters.

Neither Gets() nor gets() has any way to determine whether there is enough space at the destination to store the entire string. Instead, they simply copy the entire string, possibly overrunning the bounds of the storage allocated at the destination.

If the string typed by the user to getbuf() is less than 36 characters long, it is clear that getbuf() will return some value less than 0x28, as shown by the following execution example:

$ ./umbrella
Type string: howdy doody
Dud: getbuf returned 0x20

It’s possible that the value returned might differ for you, since the returned value is derived from the location on the stack that Gets() is writing to. The returned value will also be different depending on whether you run the umbrella inside gdb or run it outside of gdb for the same reason.

Typically, an error occurs if we type a longer string:

$ ./umbrella
Type string: This string is too long and it starts overwriting things.
Ouch!: You caused a segmentation fault!

As the error message indicates, overrunning the buffer typically causes the program state (e.g., the return addresses and other data structured that were stored on the stack) to be corrupted, leading to a memory access error. Your task is to be more clever with the strings you feed umbrella so that it does more interesting things. These are called exploit strings.

umbrella must be run with the -u your_username flag, which operates the umbrella for the indicated username. (We will feed umbrella your username with the -u flag when grading your solutions.) umbrella determines the cookie you will be using based on this flag value, just as the program makecookie does. Some of the key stack addresses you will need to use depend on your cookie.

Formatting Exploit Strings

Your exploit strings will typically contain byte values that do not correspond to the ASCII values for printing characters. The program sendstring can help you generate these raw strings. sendstring takes as input a hex-formatted string and prints the raw string to standard output. It expects input on standard in unless it is given the -f file option, and then it reads from file. In a hex-formatted string, each byte value is represented by two hex digits. Byte values are separated by spaces. For example, the string "012345" could be entered in hex format as 30 31 32 33 34 35. (The ASCII code for decimal digit N is 0x3N. Run man ascii for a full table.) Non-hex digit characters are ignored, including the blanks in the example shown.

If you generate a hex-formatted exploit string in a file named exploit.txt, you can send it to umbrella through a couple of pipes (see tools if you are unfamiliar with pipes – they take the output of one program and direct it as input to another program):

$ cat exploit.txt | ./sendstring | ./umbrella -u your_username

Or you can store the raw bytes in a file and use I/O redirection to supply it to umbrella:

$ ./sendstring -f exploit.txt > exploit.bytes
$ ./umbrella -u your_username < exploit.bytes

With the above method, when running umbrella from within gdb, you can pass in the exploit string as follows:

$ gdb ./umbrella
(gdb) run -u your_username < exploit.bytes

One important point: your exploit string must not contain byte value 0x0A at any intermediate position, since these are the ASCII codes for newline ('\n'). When Gets() encounters this byte, it will assume you intended to terminate the string input. sendstring will warn you if it encounters this byte value.

When using gdb, you may find it useful to save a series of gdb commands to a text file and then use the -x commands.txt flag. This saves you the trouble of retyping the commands every time you run gdb. You can read more about the -x flag in gdb’s man page.

Generating Byte Codes

You may wish to come back and read this section later after looking at the problems.

Using gcc as an assembler and objdump as a disassembler makes it convenient to generate the byte codes for instruction sequences. For example, suppose we write a file example.s containing the following assembly code:

# Example of hand-generated assembly code
movl $0x1234abcd,%eax    # Move 0x1234abcd to %eax
pushl $0x401080          # Push 0x401080 on to the stack
ret                      # Return

The code can contain a mixture of instructions and data. Anything to the right of a # character is a comment.

We can now assemble and disassemble this file:

$ gcc -m32 -c example.s
$ objdump -d example.o > example.d

The generated file example.d contains the following lines:

   0:	b8 cd ab 34 12       	mov    $0x1234abcd,%eax
   5:	68 80 10 40 00       	push   $0x401080
   a:	c3                   	ret

Each line shows a single instruction. The number on the left indicates the starting address (starting with 0), while the hex digits after the : character indicate the byte codes for the instruction. Thus, we can see that the instruction pushl $0x401080 has a hex-formatted byte code of 68 80 10 40 00.

If we read off the 4 bytes starting at address 6 we get: 80 10 40 00. This is a byte-reversed version of the data word 0x00401080. This byte reversal represents the proper way to supply the bytes as a string, since a little-endian machine lists the least significant byte first.

Finally, we can read off the byte sequence for our code:

b8 cd ab 34 12 68 80 10 40 00 c3

The Exploits

There are three functions that you must exploit for this assignment. The exploits increase in difficulty. There is a fourth function you can exploit for extra pizzaz if you are having fun.

Correction/clarification [Tues. 31 March]: There are four functions to exploit for this assignment. The exploits increase in difficulty. There is an addition (“Mayhem”) to the last function that you can exploit for extra pizzaz if you are having fun. Keep in mind that the grading relies on both exploits and your documentation, so describe your approach to all stages, especially any you cannot get working.

Level 0: Candle

The function getbuf() is called within umbrella by a function test():

void test() {
  unsigned val;
  volatile unsigned local = 0xdeadbeef;
  char* variable_length;
  entry_check(3);  /* Make sure entered this function properly */
  val = getbuf();
  if (val <= 40) {
	variable_length = alloca(val);
  }
  entry_check(3);
  /* Check for corrupted stack */
  if (local != 0xdeadbeef) {
	printf("Sabotaged!: the stack has been corrupted\n");
  } else if (val == cookie) {
	printf("Boom!: getbuf returned 0x%x\n", val);
	if (local != 0xdeadbeef) {
	  printf("Sabotaged!: the stack has been corrupted\n");
	}
	validate(3);
  } else {
	printf("Dud: getbuf returned 0x%x\n", val);
  }
}

When getbuf() executes its return statement, the program ordinarily resumes execution within function test(). Within the file umbrella, there is a function smoke():

void smoke() {
    entry_check(0); /* Make sure entered this function properly */
    printf("Smoke!: You called smoke()\n");
    validate(0);
    exit(0);
}

Your task is to get umbrella to execute the code for smoke() when getbuf() executes its return statement, rather than returning to test(). You can do this by supplying an exploit string that overwrites the stored return pointer in the stack frame for getbuf() with the address of the first instruction in smoke. Note that your exploit string may also corrupt other parts of the stack state, but this will not cause a problem, because smoke() causes the program to exit directly.

Advice

  • All the information you need to devise your exploit string for this level can be determined by examining a disassembled version of umbrella.
  • Be careful about byte ordering (i.e., endianness).
  • You might want to use gdb to step the program through the last few instructions of getbuf() to make sure it is doing the right thing.
  • The placement of buf within the stack frame for getbuf() depends on which version of gcc was used to compile umbrella. You will need to pad the beginning of your exploit string with the proper number of bytes to overwrite the return pointer. The values of these bytes can be arbitrary.
  • Check the line endings of smoke.txt with hexdump -C smoke.txt.

Level 1: Sparkler

Within the umbrella there is also a function fizz():

void fizz(unsigned val) {
  entry_check(1);  /* Make sure entered this function properly */
  if (val == cookie) {
	printf("Fizz!: You called fizz(0x%x)\n", val);
	validate(1);
  } else {
	printf("Misfire: You called fizz(0x%x)\n", val);
  }
  exit(0);
}

Similar to Level 0, your task is to get umbrella to execute the code for fizz() rather than returning to test. In this case, however, you must make it appear to fizz as if you have passed your cookie as its argument. You can do this by encoding your cookie in the appropriate place within your exploit string.

Advice

  • You can use gdb to get the information you need to construct your exploit string. Set a breakpoint within getbuf() and run to this breakpoint. Determine parameters such as the address of val and the location of the buffer.

Level 2: Firecracker

A much more sophisticated form of buffer attack involves supplying a string that encodes actual machine instructions. The exploit string then overwrites the return pointer with the starting address of these instructions. When the calling function (in this case getbuf) executes its ret instruction, the program will start executing the instructions on the stack rather than returning. With this form of attack, you can get the program to do almost anything. The code you place on the stack is called the exploit code. This style of attack is tricky, though, because you must get machine code onto the stack and set the return pointer to the start of this code.

For Level 2, you will need to run your exploit within gdb for it to succeed. (Modern systems use memory protection mechanisms to prevent execution of memory locations in the stack and guard against exactly this type of attack. Since gdb works a little differently than normal program execution, it allows the exploit to succeed.)

Within the file umbrella there is a function bang():

unsigned global_value = 0;

void bang(unsigned val) {
  entry_check(2);  /* Make sure entered this function properly */
  if (global_value == cookie) {
	printf("Bang!: You set global_value to 0x%x\n", global_value);
	validate(2);
  } else {
	printf("Misfire: global_value = 0x%x\n", global_value);
  }
  exit(0);
}

Similar to Levels 0 and 1, your task is to get umbrella to execute the code for bang() rather than returning to test(). Before this, however, you must set global variable global_value to your cookie. Your exploit code should set global_value, push the address of bang() on the stack, and then execute a ret instruction to cause a jump to the code for bang().

Advice:

  • Determining the byte encoding of instruction sequences by hand is tedious and prone to errors. You can let tools do all of the work by writing an assembly code file containing the instructions and data you want to put on the stack. Assemble this file with gcc and disassemble it with objdump. This will allow you to see the byte sequence to include in your exploit. (A brief example of how to do this is included in the Generating Byte Codes section above.)
  • Keep in mind that your exploit string depends on your machine, your compiler, and even your cookie. Make sure your exploit string works on the CS Linux machines, and make sure you include your Bitbucket username on the command line to umbrella.
  • Watch your use of address modes when writing assembly code. Note that movl $0x4, %eax moves the value 0x00000004 into register %eax; whereas movl 0x4, %eax moves the value at memory location 0x00000004 into %eax, which is not likely your intnent. (Also, because that memory location is usually undefined, the second instruction will cause a segmentation fault!)

  • Do not attempt to use either a jmp or a call instruction to jump to the code for bang(). These instructions use PC-relative addressing, which is very tricky to set up correctly in this attack. Instead, push an address on the stack and use the ret instruction.

Level 3: Whizbang

For level 3, you will need to run your umbrella exploit within gdb for it to succeed.

Our preceding attacks have all caused the program to jump to the code for some other function, which then causes the program to exit. As a result, it was acceptable to use exploit strings that corrupt the stack, overwriting the saved value of register %ebp and the return pointer.

The most sophisticated form of buffer overflow attack causes the program to execute some exploit code that patches up the stack and makes the program return to the original calling function (test() in this case). The calling function is oblivious to the attack. This style of attack is tricky, though, since you must: (1) get machine code onto the stack, (2) set the return pointer to the start of this code, and (3) undo the corruptions made to the stack state.

Your job for this level is to supply an exploit string that will cause getbuf() to return your cookie back to test(), rather than the value 1. You can see in the code for test() that this will cause the program to go Boom!. Your exploit code should set your cookie as the return value, restore any corrupted state, push the correct return location on the stack, and execute a ret instruction to really return to test().

Advice:

  • In order to overwrite the return pointer, you must also overwrite the saved value of %ebp. However, it is important that this value is correctly restored before you return to test(). You can do this by either (1) making sure that your exploit string contains the correct value of the saved %ebp in the correct position, so that it never gets corrupted, or (2) restore the correct value as part of your exploit code. You’ll see that the code for test() has some explicit tests to check for a corrupted stack.

  • The NOP instruction is useful when constructing this style of buffer overflow exploit, used in a pattern called a “NOP sled”.

  • You can use gdb to get the information you need to construct your exploit string. Set a breakpoint within getbuf() and run to this breakpoint. Determine parameters such as the saved return address and the saved value of %ebp.

  • Again, let tools such as gcc and objdump do all of the work of generating a byte encoding of the instructions.

  • Keep in mind that your exploit string depends on your machine, your compiler, and even your cookie. Again, again make sure your exploit string works on the CS Linux machines, and make sure you include your Bitbucket username on the command line to umbrella.

Reflect on what you have accomplished. You caused a program to execute machine code of your own design. You have done so in a sufficiently stealthy way that the program did not realize that anything was amiss.

Mayhem

execve is system call that replaces the currently running program with another program inheriting all the open file descriptors. What are the limitations the exploits you have preformed so far? How could calling execve allow you to circumvent this limitation? If you have time, try writing an additional exploit that uses execve and another program to print a message.

Submitting Your Work

Make sure the following files are committed (hg commit) and pushed (hg push) so they appear in your Bitbucket repository for this assignment:

  • smoke.txt - Level 0 exploit
  • fizz.txt - Level 1 exploit
  • bang.txt - Level 2 exploit
  • boom.txt - Level 3 exploit
  • id.txt - contains your username
  • journal.txt or journal.pdf - log of the reverse engineering discoveries and techniques that enabled your exploits

The first four files correspond to the different exploits. Each of these files should contain only the hex-formatted exploit string (i.e., input to sendstring, not the actual byte output of sendstring). The id.txt file should contain your CS username followed by an empty line. We will run your exploits. Additional text or different naming will cause them to fail.

Before submitting your exploits, you can check them by placing them in the same directory as umbrella and running make test. This will generate a summary of your exploits and whether they succeed. The Makefile looks for the four exploit files above and sends the contents of each to umbrella. Update: you may have a stale version of the accompanying Makefile.