Smash the stack to understand calling conventions and security concerns.

Overview [1]

Silly version: Impressed by your recent quest for the Sourceror’s Code, an infamous after-market magical artifact enhancement shop has called you in as a consultant. Your task is to augment the powers of an unassuming pink umbrella on behalf of a client who wishes to use it for daily tasks for which it was never intended.

Serious version: This assignment helps you develop a detailed understanding of the call stack organization by deploying a series of buffer overrun attacks on a vulnerable executable file called umbrella.

Ethics: In this assignment, you will gain firsthand experience with exploits of a common type of security vulnerability in operating systems and network servers. Our purpose is to help you learn about the runtime operation of programs and to understand the nature and impact of this form of security weakness so that you can avoid it when you write system code. We do not condone the use of these or any other form of attack to gain unauthorized access to any system resources. There are criminal statutes governing such activities.

Goals

  • To understand the procedure call abstraction and the details of its implementation with the stack discipline.
  • To understand the far-reaching impacts of system design choices, especially through security implications of the call stack in a language that does not enforce memory safety.
  • To understand the principles of buffer overrun vulnerabilities through practice exploits in a controlled environment.
  • To scare yourself a bit when realizing that the same kind of vulnerability you exploited probably exists somewhere in the software powering your healthcare, transportation, utilities, and more.

Setup

Get your repository with cs240 start buffer --solo.

Your Dark Buffer Arts Palette contains the following files:

  • descriptions.txt: file for English descriptions of your exploits
  • exploit1.hex, exploit2.hex, exploit3.hex, exploit4.hex: files for Exploits 1-4
  • hex2raw: utility to convert human-readable exploit descriptions written in hexadecimal to raw bytes
  • id2cookie: utility to convert user ID to unique “cookie” value
  • Makefile: recipes to test your exploits
  • umbrella: executable you will attack
  • umbrella.c: important parts of C code used to compile umbrella

Create your cookie: Most attacks in this assignment will require you to make a unique 8-byte “cookie” value show up in places where it ordinarily would not. This value will also determine the exact behavior of your executable. To create your personalized cookie, run make cookie. This will print your cookie in hex and store your CS username and cookie in the files id.txt and cookie.txt, respectively.

Tasks

You must craft exploit strings that accomplish four increasingly sophisticated buffer overrun attacks when provided as input to the vulnerable umbrella executable. Each exploit is described below.

Submit two parts for each exploit:

  1. Exploit string (input): Write your exploit string in hex2raw input format in each of the files exploit1.hex through exploit4.hex.
  2. Description questions: In the separate descriptions.txt file, answer the questions given with each exploit succinctly to describe how your exploit works.
    • Many of these questions request only a couple words or an instruction listing from your disassembled code.
    • Prose answers should focus on the general meaning of code and data rather than specific numbers or addresses (e.g., “return address”, not “0x4067c5”).

    You may find it helpful to use these questions to help guide your exploit development.

Grading considers both the effectiveness of your exploits and your descriptions of how they work.

The remainder of this document describes:

  1. Preparatory exercises.
  2. The executable you will attack.
  3. Tools and techniques to use while constructing an exploit. (Skim, then return when working on Exploit 1.)
  4. How to run and test your exploits. (Skim, then return when working on Exploit 1.)
  5. The requirements and questions for each exploit.
  6. The grading criteria.

Preparatory Exercises

As you read this document, complete these exercises to familiarize yourself with stack frame layout, details of vulnerable functions, and tools for constructing exploits.

Preparation is your ticket for assistance.

  1. Make sure you completed the setup above, including making your “cookie.”
  2. As you read about the umbrella executable, disassemble it to find getbuf.
  3. Draw the call stack frame for a call to getbuf right before it calls Gets, using the conventions from class and lab. Label the positions and sizes of as many parts of the frame as you can recover.
  4. On your call stack drawing, simulate the call from getbuf to Gets with a sample input string of your choosing by following the C code in umbrella.c, showing any updates to the call stack bounds or content. Do not bother simulating Gets at the x86 level – take its functionality at face value as documented (or use the C code).
  5. Remember these later to save time:
    • Which exploits will run alone without GDB? Which exploits work only under GDB?
    • What is the purpose of hex2raw?
    • What are the steps for running your exploit? For testing all exploits?

The umbrella Executable

The umbrella executable requires a user ID argument on the command line and reads a string from standard input once it starts up. The user ID customizes stack layout and verifies a unique “cookie” value that your attacks must provide.

Usage

To run the umbrella executable:

$ ./umbrella -u your_cs_username
Type string: 

Alternatively, since your username was saved in a file when you made your cookie earlier, you can also use a subshell to pass the contents of this file as an argument:

$ ./umbrella -u $(cat id.txt)
Type string: 

Input Vulnerability

The umbrella executable reads a string from standard input with the function getbuf():

unsigned long long getbuf() {
  char buf[36];
  // ...
  unsigned long long val = (unsigned long long)Gets(buf);
  // ...
  return val % 40;
}

The full version of this function contains more code for an optional additional challenge. The part shown here is sufficient for the required parts of this assignment. The key feature to note is that getbuf() calls the function Gets(), passing the address of its local array buf, which is allocated on the stack with space for 36 chars.

The function char* Gets(char* buf) is similar to the standard C library function char* gets(char* buf). It reads a string from standard input, terminated by a newline character ('\n'), and stores the characters of the string, followed by a null terminator ('\0'), starting at the memory address given by its argument, buf. It returns its argument.

Neither Gets() nor gets() has any way to determine whether there is enough space at the destination to store the entire string. Instead, they simply copy the entire string, assuming the destination is large enough and thus possibly over-running the bounds of the storage allocated at the destination.
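To make the missing bounds check concrete, here is a minimal sketch that contrasts an unbounded line copy with a bounded one. Both functions are hypothetical stand-ins that read from an in-memory string rather than standard input; they are not the code in umbrella.c, and the names copy_line_unbounded and copy_line_bounded are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an unbounded reader in the style of Gets():
 * copies characters up to a newline (or end of input) and appends '\0'.
 * Nothing limits how far past dst it writes. Returns bytes written. */
size_t copy_line_unbounded(char *dst, const char *src) {
    size_t n = 0;
    while (src[n] != '\n' && src[n] != '\0') {
        dst[n] = src[n];
        n++;
    }
    dst[n] = '\0';
    return n + 1;
}

/* Bounded variant: writes at most cap bytes, terminator included,
 * truncating longer lines. Returns bytes written. */
size_t copy_line_bounded(char *dst, size_t cap, const char *src) {
    assert(cap > 0); /* need room for at least the terminator */
    size_t n = 0;
    while (n + 1 < cap && src[n] != '\n' && src[n] != '\0') {
        dst[n] = src[n];
        n++;
    }
    dst[n] = '\0';
    return n + 1;
}
```

The only difference is the n + 1 < cap test. Without it, the number of bytes written is chosen entirely by whoever supplies the input.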

If the input string read by getbuf() is less than 36 characters long, getbuf() will return some value less than 0x28 (that is, 40 in decimal), as shown by the following execution example:

$ ./umbrella -u your_cs_username
Type string: Acromantula!
Dud: getbuf returned 0x20

The value returned might differ for you, since it is derived from the address of buf on the stack, which may vary between systems. Running the umbrella under gdb will also yield different values than it does outside gdb.
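As a small illustration of why the result stays below 0x28 yet varies with the stack layout, the sketch below mirrors the return val % 40 line from the excerpt above. The function name address_mod_40 is hypothetical, and the sketch assumes val is simply the address that Gets() returned.

```c
#include <stdint.h>

/* Mirrors "return val % 40" from the getbuf() excerpt, where val is
 * assumed to be the address of buf converted to an integer. */
unsigned long long address_mod_40(const void *addr) {
    return (unsigned long long)(uintptr_t)addr % 40;
}
```

Because the remainder depends on the address itself, moving buf (for example, by running under gdb) can change the printed value while keeping it below 40.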

Typically, an error occurs if we type a longer string:

$ ./umbrella -u your_cs_username
Type string: This string is too long and it starts overwriting things.
Ouch!: You caused a segmentation fault!

As the error message indicates, over-running the buffer typically causes the program state (e.g., the return addresses and other data structures that were stored on the stack) to be corrupted, leading to a memory access error. Your task is to be more clever with the strings you feed umbrella so that it does more interesting things. These are called exploit strings.

Tools for Crafting Exploits

Constructing exploits involves tricky tasks like writing untypeable characters and determining the byte encoding of x86 instructions. Use the techniques below to simplify your job.

Formatting Exploit Strings with hex2raw

Each ASCII character in a string is represented by one byte. For example, 'A' is represented by the byte value 0x41. While your exploits will be delivered under the guise of strings, they will embed sequences of bytes encoding addresses, numbers, or other non-character data. It is hard enough to map each desired byte value in your exploit back to a character by hand, but often the specific bytes required do not even correspond to any typeable or printable ASCII characters, making it “difficult” to type your exploit string on a keyboard or view it on the screen. Do not try to encode your exploit by hand!

We have provided a tool called hex2raw to encode exploit strings:

  • The input to hex2raw is a human-readable text description of a byte sequence where each byte is written as a pair of hexadecimal digits. Successive bytes may be separated by spaces.
  • The output of hex2raw is a raw byte sequence, where each byte has the hexadecimal value described by the corresponding pair of characters in the input.

Suppose we want the sequence of bytes whose values are the hexadecimal numbers 0x41, 0x42, 0x43, 0x1b. These same values, when interpreted with the ASCII encoding, mean the characters 'A', 'B', 'C', followed by the ASCII “escape” (ESC) character, which is not treated as a string character when typed on the keyboard or printed to the terminal output. Given the input 41 42 43 1b, the hex2raw utility will output the desired 4-byte sequence.
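The translation itself can be sketched in a few lines of C. This is only an illustration of the idea, not the implementation of the provided hex2raw tool; the names hex_digit and decode_hex_pairs are hypothetical, and the sketch assumes the input contains nothing but hex pairs and whitespace.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(int c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode whitespace-separated hex pairs from text into out.
 * Returns the number of bytes written, or -1 if the input is
 * malformed or out cannot hold the result. */
long decode_hex_pairs(const char *text, unsigned char *out, size_t cap) {
    size_t n = 0;
    while (*text != '\0') {
        if (isspace((unsigned char)*text)) {
            text++;
            continue;
        }
        int hi = hex_digit((unsigned char)text[0]);
        int lo = (hi < 0) ? -1 : hex_digit((unsigned char)text[1]);
        if (hi < 0 || lo < 0 || n >= cap) return -1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        text += 2;
    }
    return (long)n;
}
```

Calling decode_hex_pairs("41 42 43 1b", out, 4) fills out with the four bytes 0x41, 0x42, 0x43, 0x1b, matching the example above.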

To run hex2raw, type the series of hexadecimal byte value descriptions you want in a file (e.g., exploit1.hex for Exploit 1). Following our example, we could save the string 41 42 43 1b into the file exploit1.hex using Emacs. Then run:

$ ./hex2raw < exploit1.hex > exploit1.bytes

The shell’s input redirection symbol < instructs the command-line shell to use the contents of exploit1.hex as standard input to hex2raw, instead of looking for input from the keyboard. The shell’s output redirection symbol > instructs the command-line shell to store the standard (printed) output of hex2raw into a file called exploit1.bytes. Input and output redirection (< and >) are general features of the command-line shell that can be used independently and with any executable command.

Once the exploit string byte sequence is stored into the file exploit1.bytes, run umbrella with the contents of the file exploit1.bytes as input:

$ ./umbrella -u your_cs_username < exploit1.bytes

Naturally, as with compiled source code, if you update your exploit string specification in exploit1.hex, you must run hex2raw again to translate the new version to a byte sequence in exploit1.bytes to use this new exploit with the umbrella.

Warning: do not use 0A

Your exploit string must not contain byte value 0x0A (0A in hex2raw input) at any intermediate position, since this is the ASCII code for newline ('\n'). When Gets() encounters this byte, it will assume you intended to terminate the string input. hex2raw will warn you if it encounters this byte value.
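If you want to check a byte sequence yourself, a scan like the following is enough. This is a hypothetical helper written for illustration (the name find_newline_byte is not part of the provided tools); hex2raw already performs its own check.

```c
#include <stddef.h>

/* Returns the index of the first 0x0a byte in payload[0..len),
 * or -1 if the payload contains none. */
long find_newline_byte(const unsigned char *payload, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (payload[i] == 0x0a) return (long)i;
    }
    return -1;
}
```

Keep in mind that the offending byte can hide inside an address or an instruction encoding, not only in padding.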

Byte-Encoding Instructions

You may wish to come back and read this section later after looking at the exploits. When including instructions as part of an exploit payload, you must use the instruction encoding as machine code, the byte sequence used to encode an instruction like pushq %rax for the machine. This is not the byte sequence representing the string "pushq %rax".

Use gcc as an assembler and objdump as a disassembler to generate the byte codes for instruction sequences. Suppose we write a file example.s containing the following assembly code:

# Example of hand-generated assembly code
movq $0x1234abcd,%rax    # Move 0x1234abcd to %rax
pushq $0x401080          # Push 0x401080 on to the stack
retq                     # Return

The code can contain a mixture of instructions and data. Anything to the right of a # character is a comment.

We can now assemble and disassemble this file, saving the disassembler’s description of the binary object code:

$ gcc -c example.s
$ objdump -d example.o > example.d

The generated file example.d contains the following lines:

   0:	48 c7 c0 cd ab 34 12 	mov    $0x1234abcd,%rax
   7:	68 80 10 40 00       	pushq  $0x401080
   c:	c3                   	retq

Each line shows a single instruction. The number on the left indicates the starting address (starting with 0), while the hex digits after the : character indicate the byte codes for the instruction, in memory order from left to right. Thus, we can see that the instruction pushq $0x401080 has a hex-formatted byte code of 68 80 10 40 00.

If we read the 4 bytes starting at address 8 (just after the 68 opcode byte at address 7), we get: 80 10 40 00. This shows the bytes of the 4-byte value 0x00401080 in little endian order, with the byte at the lowest address shown on the left and the byte at the highest address shown on the right.

Finally, we can read the byte sequence for our code: 48 c7 c0 cd ab 34 12 68 80 10 40 00 c3
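The little endian ordering seen in the pushq operand applies to any multi-byte value you place in an exploit string. As a sketch, the hypothetical helper below (put_little_endian is an invented name) writes the bytes of a value in memory order, least significant byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the low `width` bytes of value at out[0..width),
 * least significant byte first (little endian). */
void put_little_endian(unsigned char *out, uint64_t value, size_t width) {
    assert(width <= 8); /* a uint64_t has only 8 bytes to give */
    for (size_t i = 0; i < width; i++) {
        out[i] = (unsigned char)(value >> (8 * i));
    }
}
```

With value 0x00401080 and width 4, out receives 80 10 40 00, the same order objdump printed for the operand of pushq.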

Running and Testing Exploits

Test all exploits (used for grading):

  1. Save your exploits in hex2raw input format in the proper files.
  2. Run make test to translate and test each exploit and generate a summary.

Run an individual exploit:

  1. Write the exploit string in hex2raw input format in, e.g., the file exploit1.hex.
  2. Translate it to raw bytes with hex2raw:

     $ ./hex2raw < exploit1.hex > exploit1.bytes
    
  3. Run it directly (possible for Exploits 1 and 2):

     $ ./umbrella -u your_cs_username < exploit1.bytes
    

    or under gdb (required for Exploits 3 and 4):

     $ gdb ./umbrella
     [... gdb startup output ...]
     (gdb) run -u your_cs_username < exploit1.bytes
    

GDB Scripts

When using gdb, you may find it useful to save a series of gdb commands to a text file (e.g., commands.txt) and then use the -x commands.txt flag, which runs each line of the file as a command in gdb. This saves the trouble of retyping the commands every time you run gdb. You can read more about the -x flag in gdb’s man page.

Exploits

Save your buffer overrun exploit strings in hex2raw input format in the files exploit1.hex, exploit2.hex, exploit3.hex, exploit4.hex.

Exploit 1: Candle

The function getbuf() is called within umbrella by a function test():

void test() {
  volatile unsigned long long val;
  volatile unsigned long long local = 0xdeadbeef;
  char* variable_length;
  entry_check(3);  /* Make sure entered this function properly */
  val = getbuf();
  if (val <= 40) {
    variable_length = alloca(val);
  }
  entry_check(3);
  /* Check for corrupted stack */
  if (local != 0xdeadbeef) {
    printf("Sabotaged!: the stack has been corrupted\n");
  } else if (val == cookie) {
    printf("Boom!: getbuf returned 0x%llx\n", val);
    if (local != 0xdeadbeef) {
      printf("Sabotaged!: the stack has been corrupted\n");
    }
    if (val != cookie) {
      printf("Sabotaged!: control flow has been disrupted\n");
    }
    validate(3);
  } else {
    printf("Dud: getbuf returned 0x%llx\n", val);
  }
}

When getbuf() executes its return statement, the program ordinarily resumes execution within function test(). Within the file umbrella, there is a function smoke():

void smoke() {
    entry_check(0); /* Make sure entered this function properly */
    printf("Smoke!: You called smoke()\n");
    validate(0);
    exit(0);
}

Your task is to get umbrella to execute the code for smoke() when getbuf() executes its return statement, rather than returning to test(). You can do this by supplying an exploit string that overwrites the stored return pointer in the stack frame for getbuf() with the address of the first instruction in smoke. Note that your exploit string may also corrupt other parts of the stack state, but this will not cause a problem, because smoke() causes the program to exit directly.

Advice

  • All the information you need to devise this exploit string can be determined by examining a disassembled version of umbrella.
  • Be careful about byte ordering (i.e., endianness).
  • You might want to use gdb to step the program through the last few instructions of getbuf() to make sure it is doing the right thing.
  • The placement of buf within the stack frame for getbuf() depends on which version of gcc was used to compile umbrella. You must pad the beginning of your exploit string with the proper number of bytes to overwrite the return pointer. The values of these bytes can be arbitrary.
  • Don’t forget to use hex2raw to simplify your job.
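The overall shape of such an exploit string (filler bytes up to the return address slot, then the new address) can be sketched as follows. Everything specific here is an assumption: build_overflow is a hypothetical helper, and pad_len and target are placeholders whose real values you must determine from your own disassembly of umbrella.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill out with pad_len filler bytes followed by an 8-byte
 * little-endian address. Returns the total length, or 0 if out is
 * too small. pad_len and target are placeholders, not real values. */
size_t build_overflow(unsigned char *out, size_t cap,
                      size_t pad_len, uint64_t target) {
    if (pad_len > cap || cap - pad_len < 8) return 0;
    memset(out, 0x41, pad_len); /* arbitrary filler; anything but 0x0a */
    for (size_t i = 0; i < 8; i++) {
        out[pad_len + i] = (unsigned char)(target >> (8 * i));
    }
    return pad_len + 8;
}
```

In practice you will write the same layout by hand in hex2raw input format; the sketch only makes the two-part structure explicit.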

Description Questions for Exploit 1

Answer these questions succinctly in descriptions.txt.

  1. What is the instruction inst (give its address and assembly) in the umbrella executable that executes differently under your exploit than it does under normal (overflow-free) execution of umbrella and causes the computer to execute a different next instruction than usual?
  2. What part of your exploit string (described as a byte offset from the start of the string) causes instruction inst to behave differently than normal? Why? (Write a sentence or two.)
  3. What instruction (address and assembly representation) executes next after instruction inst?

Exploit 2: Soda Fountain

Within the umbrella there is also a function fizz():

void fizz(int arg1, char arg2, long arg3,
    char* arg4, short arg5, short arg6, unsigned long long val) {
  entry_check(1);  /* Make sure entered this function properly */
  if (val == cookie) {
	printf("Fizz!: You called fizz(0x%llx)\n", val);
	validate(1);
  } else {
	printf("Misfire: You called fizz(0x%llx)\n", val);
  }
  exit(0);
}

Similar to Exploit 1, your task is to get umbrella to execute the code for fizz() rather than returning to test. In this case, however, you must make it appear to fizz as if you have passed your cookie as its argument. You can do this by encoding your cookie in the appropriate place within your exploit string.

Advice

  • Recall that the first six arguments are passed in registers and additional arguments are passed on the stack. Your exploit code needs to write to the appropriate place within the stack. This explains our somewhat contrived fizz parameters.
  • You can use gdb to get the information you need to construct your exploit string. Set a breakpoint within getbuf() and run to this breakpoint. Determine key features such as the address of val and the location of the buffer.
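As a reminder of the convention behind the first piece of advice, the sketch below maps a zero-based argument position to its location under the System V AMD64 calling convention for integer and pointer arguments. The function name integer_arg_location is hypothetical, and the sketch ignores floating-point and large struct arguments.

```c
#include <stddef.h>

/* Location of the integer or pointer argument at zero-based
 * position index under the System V AMD64 calling convention. */
const char *integer_arg_location(size_t index) {
    static const char *const regs[] = {
        "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
    };
    return index < 6 ? regs[index] : "stack";
}
```

fizz takes seven parameters, so its last one, val, is at position 6 and is passed on the stack.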

Description Questions for Exploit 2

Answer these questions succinctly in descriptions.txt.

  1. Describe how the beginning of the exploited execution of fizz differs from the beginning of the execution of fizz by a normal procedure call:
    • What instruction (assembly, no address) executes right before fizz?
    • How does that instruction modify the call stack? (Write a phrase or sentence. Give a general description, not a number.)
  2. What instruction (address and assembly) in fizz finds the value of the val argument? Where does it find val relative to the top of the call stack? (Give a byte offset.)
  3. Describe how your exploit causes this instruction to find your cookie. (Write a sentence or two.)

Exploit 3: Door Knocker

A much more sophisticated form of buffer attack involves supplying a string that encodes actual machine instructions. The exploit string then overwrites the return pointer with the starting address of these instructions. When the calling function (in this case getbuf) executes its ret instruction, the program will start executing the instructions on the stack rather than returning. With this form of attack, you can get the program to do almost anything. The code you place on the stack is called the exploit code. This style of attack is tricky, though, because you must get machine code onto the stack and set the return pointer to the start of this code.

You must run umbrella under gdb for this exploit to succeed. (Modern systems use memory protection mechanisms to prevent execution of memory locations in the stack and guard against exactly this type of attack. Since gdb works a little differently than normal program execution, it allows the exploit to succeed.)

Within the file umbrella there is a function bang():

unsigned global_value = 0;

void bang(unsigned long long val) {
  entry_check(2); /* Make sure entered this function properly */
  if (global_value == cookie)  {
    printf("Bang!: You set global_value to 0x%llx\n", global_value);
    validate(2);
  } else  {
    printf("Misfire: global_value = 0x%llx\n", global_value);
  }
  exit(0);
}

Similar to Exploits 1 and 2, your task is to get umbrella to execute the code for bang() rather than returning to test(). Before this, however, you must set global variable global_value to your cookie. Your exploit code should set global_value, push the address of bang() on the stack, and then execute a ret instruction to cause a jump to the code for bang().

Advice

  • You must run umbrella under gdb for this exploit to succeed.
  • Determining the byte encoding of instruction sequences by hand is tedious and prone to errors. You can let tools do all of the work by writing an assembly code file containing the instructions and data you want to put on the stack. Assemble this file with gcc and disassemble it with objdump. This will allow you to see the byte sequence to include in your exploit. (A brief example of how to do this is included in the Byte-Encoding Instructions section above.)
  • Keep in mind that your exploit string depends on your computer, your compiler, and even your cookie.
  • Watch your use of address modes when writing assembly code. Note that movq $0x4, %rax moves the value 0x0000000000000004 into register %rax; whereas movq 0x4, %rax moves the value in memory at address 0x0000000000000004 into %rax, which is not likely your intent. (Also, because that memory location is usually undefined, the second instruction will cause a segmentation fault!)
  • Do not attempt to use either a jmp or a call instruction to jump to the code for bang(). These instructions use PC-relative addressing, which is tricky to set up correctly in this attack. Instead, push an address on the stack and use the ret instruction.

Description Questions for Exploit 3

Answer these questions succinctly in descriptions.txt.

  1. Starting from the ret instruction in getbuf, list the series of instructions (address and assembly) that executes until the first instruction in bang.
  2. Describe how this code sequence changes memory contents, register contents, and program counter. (Write a couple/few sentences or annotate your listing above.)

Exploit 4: Whizbang Basic Blaze Box

You must run umbrella under gdb for this exploit to succeed.

Our preceding attacks have all caused the program to jump to the code for some other function, which then causes the program to exit. As a result, it was acceptable to use exploit strings that corrupt the stack, overwriting the saved value of register %rbp and the return pointer.

The most sophisticated form of buffer overrun attack causes the program to execute some exploit code that patches up the stack and makes the program return to the original calling function (test() in this case). The calling function is oblivious to the attack. This style of attack is tricky, though, since you must: (1) get machine code onto the stack, (2) set the return pointer to the start of this code, and (3) undo the corruption made to the stack state.

Your job is to supply an exploit string that will cause getbuf() to return your cookie back to test(), rather than its usual value (something less than 40). You can see in the code for test() that this will cause the program to go Boom!. Your exploit code should set your cookie as the return value, restore any corrupted state, push the correct return location on the stack, and execute a ret instruction to really return to test().

Advice

  • You must run umbrella under gdb for this exploit to succeed.
  • In order to overwrite the return address slot on the stack, your exploit string must also cover all the items saved on the stack between the buf array and the return address slot. So far, the code we have attempted to run with the exploit has not depended on this data, but a “normal”-looking return to test() may depend on it. Consider carefully what is stored here on the stack during getbuf, its original source, where it is stored after getbuf completes, and how test may use it. Use gdb to inspect the disassembled code of getbuf and test. Determine how you can organize your exploit to avoid disturbing stack data in getbuf on which test later relies.
  • Let tools such as gcc and objdump do all of the work of generating a byte encoding of the instructions.
  • Keep in mind that your exploit string depends on your cookie, your computer, and your compiler.

Description Questions for Exploit 4

Answer these questions succinctly in descriptions.txt.

  1. What instruction (address and assembly) in test inspects the return value of getbuf?
  2. Under your exploit, what instruction (address and assembly) stores this return value to “trick” test? Does this instruction execute before or after the ret in getbuf?
  3. What other instruction inst (address and assembly) in test depends on data from your exploit string?
  4. Under normal (overflow-free) execution, what instruction (address and assembly) stores the data that this instruction expects to find there?
  5. What is the meaning of that location on the stack? (Write a sentence or two to describe in terms of general procedure conventions.)
  6. Why might test crash if your exploit does not heed the importance of this stack location? (Write a sentence or two.)
  7. How does your exploit avoid this crash? (Write a sentence or two.)

Reflect on what you have accomplished.

You caused a program to execute arbitrary machine code of your own design simply by choosing a particular input. You have done so in a sufficiently stealthy way that the program did not realize that anything was amiss. Surely this is a significant problem!

Mayhem (optional challenge)

execve is a system call that replaces the currently running program with another program, which inherits all the open file descriptors. What are the limitations of the exploits you have performed so far? How could calling execve allow you to circumvent these limitations? If you have time, try writing an additional exploit (mayhem.hex) that uses execve and another program to print a message. Talk to the instructors if you’re curious about more.
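For reference, this is what a call to execve looks like from ordinary C code. It is a sketch under stated assumptions, not an exploit: the helper name run_echo is hypothetical, the path /bin/echo is assumed to exist on your system, and the fork is there only so the calling program survives to report the result.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, replace the child with /bin/echo printing msg, and
 * return the child's exit status (or -1 on failure). */
int run_echo(const char *msg) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        char *const argv[] = {"echo", (char *)msg, NULL};
        char *const envp[] = {NULL};
        execve("/bin/echo", argv, envp);
        _exit(127); /* reached only if execve itself fails */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that nothing after a successful execve runs in the child: the old program image is gone. An exploit would have to arrange the same three arguments (path, argument vector, environment) without the help of a compiler.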

Submission

Submit: The course staff will collect your work directly from your hosted repository as of the deadline. To submit your work:

  1. Make sure you have committed your latest changes.

    $ git add ...
    $ git commit ...
  2. Run the command cs240 sign to sign your work and respond to any assignment survey questions.

    $ cs240 sign
  3. Push your signature and your latest local commits to the hosted repository.

    $ git push

Confirm: After pushing, all local changes have been submitted if the output of git status shows both:

  • Your branch is up to date with 'origin/master', meaning all local commits have been pushed
  • nothing to commit, meaning all local changes have been committed

Resubmit: If you realize you need to change something later, just repeat this process.

Grading

The assignment is graded from a maximum of 100 points:

  • Working exploits (80 points): run make test to check all of your exploits.
    • Exploit 1: 25 points
    • Exploit 2: 25 points
    • Exploit 3: 15 points
    • Exploit 4: 15 points
  • Descriptions (20 points):
    • We may grade a subset of your answers.

[1] This document is an alternative (s/ia32 pyrotechnics/x86-64 incoherent magic references/g) description for the old-style CSAPP Buffer Lab, which is available in ia32 form on the CSAPP website.