Assignment: Buffer

Overview [1]

Boring version: This assignment helps you develop a detailed understanding of the call stack organization by deploying a series of buffer overrun attacks on a vulnerable executable file called laptop.bin.

Silly version: Impressed by your recent reverse engineering adventure, an anonymous Wellesley alum contacts you for assistance subverting the laptop of an evil mastermind bent on, you know, something evil. Your task is to exploit vulnerabilities in the laptop’s software with C’s catch-fire semantics by providing carefully crafted inputs that will cause buffer overflows and lead to the self-destruction of the laptop (and its evil whatchyamacallits) in increasingly alarming ways. (Do not worry, neither your computer nor ours will explode as a result of this assignment.)

Ethics: In this assignment, you will gain firsthand experience with exploits of a common type of security vulnerability in operating systems and network servers. Our purpose is to help you learn about the runtime operation of programs and to understand the nature and impact of this form of security weakness so that you can avoid it when you write system code. We do not condone the use of these or any other form of attack to gain unauthorized access to any system resources. There are criminal statutes governing such activities.

Goals

  • To understand the procedure call abstraction and the details of its implementation with the stack discipline.
  • To understand the far-reaching impacts of system design choices, especially the security implications of the call stack in a language that does not enforce memory safety (e.g., bounds checking).
  • To understand the principles of buffer overrun vulnerabilities through practice exploits in a controlled environment.
  • To scare yourself a bit when realizing that the same kind of vulnerability you exploited probably exists somewhere in the software powering your healthcare, transportation, utilities, and more.

Time Reports

According to self-reported times on this assignment from a recent semester:

  • 25% of students spent <= 8 hours.
  • 50% of students spent <= 10 hours.
  • 75% of students spent <= 15 hours.

Setup

Get your repository with cs240 start buffer --solo.

Your starter repository contains the following files:

  • responses.txt: file for English descriptions of your exploits
  • exploit1.hex, exploit2.hex, exploit3.hex, exploit4.hex: files for Exploits 1-4
  • hex2raw.bin: utility to convert human-readable exploit descriptions written in hexadecimal to raw bytes
  • id2cookie.bin: utility to convert user ID to unique “cookie” value
  • Makefile, test.gdb: recipes to test your exploits
  • laptop.bin: executable you will attack
  • laptop.c: C code used to compile laptop.bin. This file is there to help you understand laptop.bin, but should not be modified or re-compiled.

Create your cookie: Most attacks in this assignment will require you to make a unique 8-byte “cookie” value show up in places where it ordinarily would not. This value will also determine the exact behavior of your executable. To create your personalized cookie, run make cookie. This will print your cookie in hex and store your CS username and cookie in the files id.txt and cookie.txt, respectively.

Tasks

You must craft exploit strings that accomplish four increasingly sophisticated buffer overrun attacks when provided as input to the vulnerable laptop.bin executable. Each exploit is described below.

Submit two parts for each exploit:

  1. Exploit string (input): Write your exploit string in hex2raw input format in each of the files exploit1.hex through exploit4.hex.
  2. Description questions: In the separate responses.txt file, answer the questions given with each exploit succinctly to describe how your exploit works.
    • Many of these questions request only a couple words or an instruction listing from your disassembled code.
    • Prose answers should focus on the general meaning of code and data rather than specific numbers or addresses (e.g., “return address”, not “0x4067c5”).

    You may find it helpful to use these questions to help guide your exploit development.

Grading considers both the effectiveness of your exploits and your descriptions of how they work.

The remainder of this document describes:

  1. Preparatory exercises.
  2. The executable you will attack.
  3. Tools and techniques to use while constructing an exploit. (Skim, then return when working on Exploit 1.)
  4. How to run and test your exploits. (Skim, then return when working on Exploit 1.)
  5. The requirements and questions for each exploit.
  6. The grading criteria.

Preparatory Exercises

As you read this document, complete these exercises to familiarize yourself with stack frame layout, details of vulnerable functions, and tools for constructing exploits.

Preparation is your ticket for assistance.

  1. Make sure you completed the setup above, including making your “cookie.”
  2. As you read about the laptop.bin executable, disassemble it to find getbuf.
  3. Draw the call stack frame for a call to getbuf right before it calls Gets, using the conventions from class and lab. Label the positions and sizes of as many parts of the frame as you can recover.
  4. On your call stack drawing, simulate a call to getbuf with a sample input string of your choosing by following the C code in laptop.c, showing any updates to the call stack bounds or contents. Do not bother simulating Gets at the x86 level – take its functionality at face value as documented (or use the C code).
  5. Remember these later to save time:
    • Which exploits will run alone without GDB? Which exploits work only under GDB?
    • What is the purpose of hex2raw.bin?
    • What are the steps for running your exploit? For testing all exploits?

The laptop.bin Executable

The laptop.bin executable requires a user ID argument on the command line and reads a string from standard input once it starts up. The user ID customizes stack layout and verifies a unique “cookie” value that your attacks must provide.

Usage

To run the laptop.bin executable:

$ ./laptop.bin -u your_cs_username
Type string: 

Alternatively, since your username was saved in a file when you made your cookie earlier, you can also use a subshell to pass the contents of this file as an argument:

$ ./laptop.bin -u $(cat id.txt)
Type string: 

Input Vulnerability

The laptop.bin executable reads a string from standard input with the function getbuf():

unsigned long long getbuf() {
  char buf[36];
  // ...
  unsigned long long val = (unsigned long long)Gets(buf);
  // ...
  return val % 40;
}

The full version of this function contains more code for an optional additional challenge. The part shown here is sufficient for the required parts of this assignment. The key feature to note is that getbuf() calls the function Gets(), passing the address of its local array buf, which is allocated on the stack with space for 36 chars.

The function char* Gets(char* buf) is similar to the standard C library function char* gets(char* buf). It reads a string from standard input, terminated by a newline character ('\n'), and stores the characters of the string, followed by a null terminator ('\0') starting at the memory address given by its argument, buf. It returns its argument.

Neither Gets() nor gets() has any way to determine whether there is enough space at the destination to store the entire string. Instead, they simply copy the entire string, assuming the destination is large enough and thus possibly over-running the bounds of the storage allocated at the destination.
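To make the missing bounds check concrete, here is a minimal sketch in C. The helper names gets_write_size and read_line_bounded are hypothetical and do not appear in laptop.c. The first models how many bytes a Gets()-style copy would store for a given input line; the second shows a bounded alternative that never writes past the destination's capacity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from laptop.c): number of bytes a Gets()-style
 * copy would store for the line at the start of input, i.e. every character
 * before the first '\n' (or the end of the input) plus one '\0'. */
size_t gets_write_size(const char *input) {
  size_t n = 0;
  while (input[n] != '\0' && input[n] != '\n') {
    n++;
  }
  return n + 1;
}

/* Hypothetical bounded alternative: stores at most cap - 1 characters of the
 * line in dst and always null-terminates. Returns the number of characters
 * that did not fit (0 means the whole line was stored). */
size_t read_line_bounded(char *dst, size_t cap, const char *input) {
  assert(dst != NULL && cap > 0);
  size_t i = 0;
  size_t stored = 0;
  while (input[i] != '\0' && input[i] != '\n') {
    if (stored < cap - 1) {
      dst[stored++] = input[i];
    }
    i++;
  }
  dst[stored] = '\0';
  return i - stored;
}
```

Comparing gets_write_size(line) against sizeof(buf) is exactly the check that Gets() has no way to perform: it receives only the destination address, not the destination's size.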

If the input string read by getbuf() is less than 36 characters long, it is clear that getbuf() will return some value less than 0x28 (that’s 40 in decimal), as shown by the following execution example:

$ ./laptop.bin -u your_cs_username
Type string: Acromantula!
Dud: getbuf returned 0x20

The value returned might differ for you, since it is derived from the address of buf on the stack, which may vary between systems. Running laptop.bin under gdb will also yield different values than it does outside gdb.

Typically, an error occurs if we type a longer string:

$ ./laptop.bin -u your_cs_username
Type string: This string is too long and it starts overwriting things.
Ouch!: You caused a segmentation fault!

As the error message indicates, over-running the buffer typically causes the program state (e.g., the return addresses and other data structures that were stored on the stack) to be corrupted, leading to a memory access error. Your task is to be more clever with the strings you feed laptop.bin so that it does more interesting things. These are called exploit strings.

Disassembly

As in the previous assignment, use gdb or objdump to disassemble the laptop.bin executable whenever you need to inspect its contents. You do not need to start, run, or single-step into the code you want to inspect. For example, to disassemble getbuf, start GDB with gdb ./laptop.bin then type disas getbuf at the GDB prompt.

You may encounter the leaveq instruction in the laptop.bin executable. This instruction is a historical artifact tied to how the %rbp register was used before x86-64 (when it was %ebp). The leaveq instruction is equivalent to the following pair of instructions in order:

mov %rbp, %rsp
popq %rbp
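The two-step expansion above can be traced with a toy model in C. This is only an illustrative sketch: the ToyMachine type and toy_leave function are hypothetical names, the "stack" is a small array of 8-byte words, and addresses are byte offsets into that array rather than real machine addresses.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Toy model: two registers and a small stack of 8-byte words.
 * Addresses are byte offsets into stack[] and must be multiples of 8. */
typedef struct {
  uint64_t rsp;
  uint64_t rbp;
  uint64_t stack[STACK_WORDS];
} ToyMachine;

/* Models leaveq as its two-instruction expansion. */
void toy_leave(ToyMachine *m) {
  m->rsp = m->rbp;                 /* mov %rbp, %rsp */
  assert(m->rsp % 8 == 0 && m->rsp / 8 < STACK_WORDS);
  m->rbp = m->stack[m->rsp / 8];   /* popq %rbp: load the saved %rbp ... */
  m->rsp += 8;                     /* ... then move %rsp up by one word  */
}
```

After toy_leave, rsp refers to the word just above the saved %rbp slot, which in a real frame is where the return address sits, ready for ret.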

Some laptop.c Details

unsigned long long

Some variables in laptop.c are declared as unsigned long long. What does this mean, and how many bytes are required for values with this declaration?

The Wikipedia article on C Data Types is helpful in this regard. The size of C data types can vary between implementations: although a long on the CS Linux server is a signed integer type whose size is 8 bytes, C only requires it to be at least 4 bytes. The long long type is required to have a size of at least 8 bytes on any machine. The code in this assignment is adapted from CSAPP, whose code is designed to work correctly on as many architectures as possible, so it uses long long rather than just long for 8-byte integer values.
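These size guarantees can be checked directly on any machine. The following is a small sketch, assuming a C11 compiler; the function names are hypothetical and are not part of laptop.c.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Checked at compile time: these minimum widths hold on every
 * conforming C implementation. */
_Static_assert(sizeof(unsigned long long) * CHAR_BIT >= 64,
               "unsigned long long must have at least 64 bits");
_Static_assert(sizeof(long) * CHAR_BIT >= 32,
               "long must have at least 32 bits");

/* Sizes in bytes on the machine where this code is compiled. */
size_t ull_size_bytes(void) { return sizeof(unsigned long long); }
size_t long_size_bytes(void) { return sizeof(long); }

/* Runtime version of the same checks, e.g. for use at program startup. */
void check_integer_sizes(void) {
  assert(ull_size_bytes() * CHAR_BIT >= 64);
  assert(long_size_bytes() * CHAR_BIT >= 32);
}
```

Printing ull_size_bytes() and long_size_bytes() on the machine you are using shows which sizes that particular implementation chose.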

volatile

Some variables in laptop.c are declared as being volatile. This means that the compiler must assume that the variable value must be fetched from memory at every reference, disregarding optimizations that might otherwise be performed.

entry_check, validate, and alloca

Several functions in laptop.c use the functions entry_check and validate, and a few use alloca. These functions are used to make sure that the exploits are working properly. You do not have to understand what these functions do as part of this assignment.

Tools for Crafting Exploits

Constructing exploits involves tricky tasks like writing untypeable characters and determining the byte encoding of x86 instructions. Use the techniques below to simplify your job.

Formatting Exploit Strings with hex2raw.bin

Each ASCII character in a string is represented by one byte. For example, 'A' is represented by the byte value 0x41 (written in hexadecimal). While your exploits will be delivered under the guise of strings, they will embed sequences of bytes encoding addresses, numbers, or other non-character data. It is hard enough to map each desired byte value in your exploit back to a character by hand, but often, the specific bytes required do not even correspond to any typeable or printable ASCII characters, making it “difficult” to type your exploit string on a keyboard or view it on the screen. Do not try to encode your exploit by hand!

We have provided a tool called hex2raw.bin to encode exploit strings:

  • The input to hex2raw.bin is a human-readable text description of a byte sequence where each byte is written as a pair of hexadecimal digits. Successive bytes may be separated by spaces.
  • The output of hex2raw.bin is a raw byte sequence, where each byte has the hexadecimal value described by the corresponding pair of characters in the input.

Suppose we want the sequence of bytes whose values are the hexadecimal numbers 0x41, 0x42, 0x43, 0x1b. These same values, when interpreted with the ASCII encoding, mean the characters 'A', 'B', 'C', followed by the ASCII “escape” (ESC) character, which is not treated as a string character when typed on the keyboard or printed to the terminal output. Given the input 41 42 43 1b, the hex2raw.bin utility will output the desired 4-byte sequence.
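The conversion itself is simple enough to sketch in C. The code below is a hypothetical illustration of the idea only; it is not the source of hex2raw.bin, and the function names are made up for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit_value(int c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Hypothetical sketch (NOT the hex2raw.bin source): convert text such as
 * "41 42 43 1b" into raw bytes. Returns the number of bytes written to out,
 * or -1 on a malformed digit pair or if out (capacity cap) is too small. */
long hex_pairs_to_bytes(const char *text, unsigned char *out, size_t cap) {
  assert(text != NULL && out != NULL);
  size_t n = 0;
  while (*text != '\0') {
    if (isspace((unsigned char)*text)) {
      text++;  /* whitespace between byte descriptions is skipped */
      continue;
    }
    int hi = hex_digit_value((unsigned char)text[0]);
    int lo = (hi < 0) ? -1 : hex_digit_value((unsigned char)text[1]);
    if (hi < 0 || lo < 0 || n >= cap) {
      return -1;
    }
    out[n++] = (unsigned char)(hi * 16 + lo);
    text += 2;
  }
  return (long)n;
}
```

Given the text 41 42 43 1b, this sketch produces the four bytes 0x41, 0x42, 0x43, 0x1b, matching the example above.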

To run hex2raw.bin, type the series of hexadecimal byte value descriptions you want in a file (e.g., exploit1.hex for Exploit 1). Following our example, we could save the string 41 42 43 1b into the file exploit1.hex using Emacs. Then run:

$ ./hex2raw.bin < exploit1.hex > exploit1.bytes

The shell’s input redirection symbol < instructs the command-line shell to use the contents of exploit1.hex as standard input to hex2raw.bin, instead of looking for input from the keyboard. The shell’s output redirection symbol > instructs the command-line shell to store the standard (printed) output of hex2raw.bin into a file called exploit1.bytes. Input and output redirection (< and >) are general features of the command-line shell that can be used independently and with any executable command.

Once the exploit string byte sequence is stored into the file exploit1.bytes, run laptop.bin with the contents of the file exploit1.bytes as input:

$ ./laptop.bin -u your_cs_username < exploit1.bytes

Naturally, as with compiled source code, if you update your exploit string specification in exploit1.hex, you must run hex2raw.bin again to translate the new version to a byte sequence in exploit1.bytes before using the new exploit with laptop.bin.

Warning: do not use 0A

Your exploit string must not contain byte value 0x0A (0A in hex2raw.bin input) at any intermediate position, since this is the ASCII code for newline ('\n'). When Gets() encounters this byte, it will assume you intended to terminate the string input. hex2raw.bin will warn you if it encounters this byte value.
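A check for this condition might look like the following C sketch. The function name is hypothetical, and this is not how hex2raw.bin is necessarily implemented; it only illustrates what "0x0A at an intermediate position" means for a raw byte sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check (not the hex2raw.bin implementation): return the index
 * of the first 0x0A byte that appears before the final position of the
 * payload, or -1 if there is none. A 0x0A in the last position is allowed,
 * since it simply terminates the input line. */
long first_intermediate_newline(const unsigned char *bytes, size_t len) {
  assert(bytes != NULL || len == 0);
  for (size_t i = 0; i + 1 < len; i++) {
    if (bytes[i] == 0x0A) {
      return (long)i;
    }
  }
  return -1;
}
```

If an address or instruction encoding you need happens to contain 0x0A, everything after that byte would never reach the buffer, so the payload has to be rearranged to avoid it.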

Running and Testing Exploits

Test all exploits (used for grading):

  1. Save your exploits in hex2raw input format in the proper files.
  2. Run make test to translate and test each exploit and generate a summary.

Run an individual exploit:

  1. Write the exploit string in hex2raw input format in, e.g., the file exploit1.hex.
  2. Translate it to raw bytes with hex2raw.bin:

     $ ./hex2raw.bin < exploit1.hex > exploit1.bytes
    
  3. Run it directly (possible for Exploits 1 and 2):

     $ ./laptop.bin -u your_cs_username < exploit1.bytes
    

    or under gdb (required for Exploits 3 and 4):

     $ gdb ./laptop.bin
     [... gdb startup output ...]
     (gdb) run -u your_cs_username < exploit1.bytes
    

GDB Scripts

When using gdb, you may find it useful to save a series of gdb commands to a text file (e.g., commands.txt) and then use the -x commands.txt flag, which runs each line of the file as a command in gdb. This saves the trouble of retyping the commands every time you run gdb. You can read more about the -x flag in gdb’s man page.

Exploits

Save your buffer overrun exploit strings in hex2raw input format in the files exploit1.hex, exploit2.hex, exploit3.hex, exploit4.hex.

Exploit 1: Smoke

The function getbuf() is called within laptop.bin by a function test():

void test() {
  volatile unsigned long long val;
  volatile unsigned long long local = 0xabcadabc;
  char* variable_length;
  entry_check(3); /* Check for entering this function properly */
  val = getbuf();
  if (val <= 40) {
    variable_length = alloca(val);
  }
  entry_check(3);
  /* Check for corrupted stack */
  if (local != 0xabcadabc) {
    printf("Sabotaged!: the stack has been corrupted\n");
  } else if (val == cookie) {
    printf("Boom!: getbuf returned 0x%llx\n", val);
    if (local != 0xabcadabc) {
      printf("Sabotaged!: the stack has been corrupted\n");
    }
    if (val != cookie) {
      printf("Sabotaged!: control flow has been disrupted\n");
    }
    validate(3);
  } else {
    printf("Dud: getbuf returned 0x%llx\n", val);
  }
}

When getbuf() executes its return statement, the program ordinarily resumes execution within function test(). Within the file laptop.bin, there is a function smoke():

void smoke() {
    entry_check(0); /* Make sure entered this function properly */
    printf("Smoke!: You called smoke()\n");
    validate(0);
    exit(0);
}

Your task is to get laptop.bin to execute the code for smoke() when getbuf() executes its return statement, rather than returning to test(). You can do this by supplying an exploit string that overwrites the stored return pointer in the stack frame for getbuf() with the address of the first instruction in smoke. Note that your exploit string may also corrupt other parts of the stack state, but this will not cause a problem, because smoke() causes the program to exit directly.

Answer the corresponding questions in responses.txt. Reading and answering these questions may help you solve this exploit.

Advice

  • All the information you need to devise this exploit string can be determined by examining a disassembled version of laptop.bin.
  • Be careful about byte ordering (i.e., endianness).
  • You might want to use gdb to step the program through the last few instructions of getbuf() to make sure it is doing the right thing.
  • The placement of buf within the stack frame for getbuf() depends on which version of gcc was used to compile laptop.bin. You must pad the beginning of your exploit string with the proper number of bytes to overwrite the return pointer. The values of these bytes can be arbitrary.
  • Don’t forget to use hex2raw.bin to simplify your job.
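As a reminder of what the byte-ordering advice means in practice, the sketch below stores a 64-bit value in little-endian order, the order x86-64 uses in memory. The function name put_le64 is hypothetical, and the value 0x401080 used in the note that follows is only the illustrative address from this document's assembly example, not the address of smoke() in your executable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 64-bit value into out[0..7] least significant byte first, the
 * order in which x86-64 keeps multi-byte values in memory. */
void put_le64(unsigned char *out, uint64_t value) {
  assert(out != NULL);
  for (size_t i = 0; i < 8; i++) {
    out[i] = (unsigned char)(value >> (8 * i));
  }
}
```

For the illustrative value 0x401080, the bytes in memory order are 80 10 40 00 00 00 00 00, which is also the order in which they would be written in hex2raw input format.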

Exploit 2: Fizz

Within the laptop.bin there is also a function fizz():

void fizz(int arg1, char arg2, long arg3,
    char* arg4, short arg5, short arg6, unsigned long long val) {
  entry_check(1);  /* Make sure entered this function properly */
  if (val == cookie) {
	printf("Fizz!: You called fizz(0x%llx)\n", val);
	validate(1);
  } else {
	printf("Misfire: You called fizz(0x%llx)\n", val);
  }
  exit(0);
}

Similar to Exploit 1, your task is to get laptop.bin to execute the code for fizz() rather than returning to test. In this case, however, you must make it appear to fizz as if you have passed your cookie as its argument. You can do this by encoding your cookie in the appropriate place within your exploit string.

Answer the corresponding questions in responses.txt.

Advice

  • Recall that the first six arguments are passed in registers and additional arguments are passed on the stack. Your exploit string needs to write to the appropriate place within the stack. This explains our somewhat contrived fizz parameters.
  • You can use gdb to get the information you need to construct your exploit string. Set a breakpoint within getbuf() and run to this breakpoint. Determine key features such as the address of val and the location of the buffer.

Exploit 3: Bang

A much more sophisticated form of buffer attack involves supplying a string that encodes actual machine instructions. The exploit string then overwrites the return pointer with the starting address of these instructions. When the function whose frame was overwritten (in this case getbuf) executes its ret instruction, the program will start executing the instructions on the stack rather than returning to its caller. With this form of attack, you can get the program to do almost anything. The code you place on the stack is called the exploit code. This style of attack is tricky, though, because you must get machine code onto the stack and set the return pointer to the start of this code.

You must run laptop.bin under gdb for this exploit to succeed. (Modern systems use memory protection mechanisms to prevent execution of memory locations in the stack and guard against exactly this type of attack. Since gdb works a little differently than normal program execution, it allows the exploit to succeed.)

Within the file laptop.bin there is a function bang():

unsigned global_value = 0;

void bang(unsigned long long val) {
  entry_check(2); /* Make sure entered this function properly */
  if (global_value == cookie)  {
    printf("Bang!: You set global_value to 0x%llx\n", global_value);
    validate(2);
  } else  {
    printf("Misfire: global_value = 0x%llx\n", global_value);
  }
  exit(0);
}

Similar to Exploits 1 and 2, your task is to get laptop.bin to execute the code for bang() rather than returning to test(). Before this, however, you must set global variable global_value to your cookie. Your exploit code should set global_value, push the address of bang() on the stack, and then execute a ret instruction to cause a jump to the code for bang().

Answer the corresponding questions in responses.txt.

Byte-Encoding Instructions for Exploit Code

When including instructions as part of an exploit payload, you must use the instruction encoding as machine code, the byte sequence used to encode an instruction like pushq %rax for the machine. This is not the byte sequence representing the string "pushq %rax".

Use gcc as an assembler and objdump as a disassembler to generate the byte codes for instruction sequences. Suppose we write a file example.s containing the following assembly code:

# Example of hand-generated assembly code
movq $0x1234abcd,%rax    # Move 0x1234abcd to %rax
pushq $0x401080          # Push 0x401080 on to the stack
retq                     # Return

The code can contain a mixture of instructions and data. Anything to the right of a # character is a comment.

We can now assemble and disassemble this file, saving the disassembler’s description of the binary object code:

$ gcc -c example.s
$ objdump -d example.o > example.d

The generated file example.d contains the following lines:

   0:	48 c7 c0 cd ab 34 12 	mov    $0x1234abcd,%rax
   7:	68 80 10 40 00       	pushq  $0x401080
   c:	c3                   	retq

Each line shows a single instruction. The number on the left indicates the starting address (starting with 0), while the hex digits after the : character indicate the byte codes for the instruction, shown as individual bytes in memory order from left to right. (Do not flip them.) Thus, we can see that the instruction pushq $0x401080 has a hex-formatted byte code of 68 80 10 40 00 that could be entered into an exploit string. The entire byte sequence to encode the above instructions would be: 48 c7 c0 cd ab 34 12 68 80 10 40 00 c3.

Advice

  • You must run laptop.bin under gdb for this exploit to succeed.
  • Determining the byte encoding of instruction sequences by hand is tedious and prone to errors. You should let tools do all of the work by:

    1. Writing an assembly code file exploit3Assembly.s containing the instructions and data you want to put on the stack.
    2. Assembling this file with gcc -c exploit3Assembly.s to create the binary file exploit3Assembly.o (as shown in the Byte-Encoding Instructions section above).
    3. Disassembling the binary file with objdump -d exploit3Assembly.o > exploit3Assembly.d (as shown in the Byte-Encoding Instructions section above). exploit3Assembly.d will allow you to see the byte sequence to include in your exploit.
  • Keep in mind that your exploit string depends on your computer, your compiler, and even your cookie.
  • Watch your use of address modes when writing assembly code. Note that movq $0x4, %rax copies the literal value 0x0000000000000004 into register %rax; whereas movq 0x4, %rax copies the contents of memory at address 0x0000000000000004 into %rax. If you forget a $ character, your code is likely to cause a segmentation fault, because the literal number that you mistakenly wrote as memory address is most likely not a legal memory address.
  • Due to restrictions on the total size of instruction encodings, x86 does not support all combinations of operands. For example, it is not possible to write a movq instruction with a large literal source operand and an absolute memory address. If you get errors from the assembler (such as Error: operand size mismatch), try breaking down instructions into multiple steps, storing intermediate values in registers.
  • Do not attempt to use either a jmp or a call instruction to jump to the code for bang(). These instructions use PC-relative addressing, which is tricky to set up correctly in this attack. Instead, push an address on the stack and use the ret instruction.

Exploit 4: Boom

You must run laptop.bin under gdb for this exploit to succeed.

Our preceding attacks have all caused the program to jump to the code for some other function, which then causes the program to exit. As a result, it was acceptable to use exploit strings that corrupt the stack, overwriting the saved value of register %rbp and the return pointer.

The most sophisticated form of buffer overrun attack causes the program to execute some exploit code that patches up the stack and makes the program return to the original calling function (test() in this case). The calling function is oblivious to the attack. This style of attack is tricky, though, since you must: (1) get machine code onto the stack, (2) set the return pointer to the start of this code, and (3) undo the corruption made to the stack state.

Your job is to supply an exploit string that will cause getbuf() to return your cookie back to test(), rather than the small value (less than 40) it would ordinarily return. You can see in the code for test() that this will cause the program to go Boom!. Your exploit code should set your cookie as the return value, restore any corrupted state, push the correct return location on the stack, and execute a ret instruction to really return to test().

Advice

  • You must run laptop.bin under gdb for this exploit to succeed.
  • The leaveq instruction (a historical artifact tied to how the %ebp register was used before x86-64) is equivalent to the following pair of instructions in order:

    mov %rbp, %rsp
    popq %rbp
    
  • In order to overwrite the return address slot on the stack, your exploit string must also cover all the items saved on the stack between the buf array and the return address slot. So far, the code we have attempted to run with the exploit has not depended on this data, but a “normal”-looking return to test() may depend on it. Consider carefully what is stored here on the stack during getbuf, its original source, where it is stored after getbuf completes, and how test may use it. Use gdb to inspect the disassembled code of getbuf and test. Determine how you can organize your exploit to avoid disturbing stack data in getbuf on which test later relies.
  • As in Exploit 3, let tools such as gcc and objdump do all of the work of byte-encoding the instructions. Use them to create a file exploit4Assembly.d that will show you the instruction bytes to use in your exploit.
  • Keep in mind that your exploit string depends on your cookie, your computer, and your compiler.

[Independent] Description Questions for Exploit 4

The questions for Exploit 4 are [Independent].

Recognize what you have accomplished.

You caused a program to execute arbitrary machine code of your own design simply by choosing a particular input. You have done so in a sufficiently stealthy way that the program did not realize that anything was amiss. Surely this is a significant problem!

Submission

Submit: The course staff will collect your work directly from your hosted repository. To submit your work:

  1. Test your source code files one last time. Make sure that, at a minimum, submitted source code is free of syntax errors and any other static errors (such as static type errors or name/scope errors). In other words: the code does not need to complete the correct computation when invoked, but it must be a valid program. We will not grade files that do not pass this bar.

  2. Make sure you have committed your latest changes. (Replace FILES with the files you changed and MESSAGE with your commit message.)

    $ git add FILES
    $ git commit -m "MESSAGE"
    
  3. Run the command cs240 sign to sign your work and respond to any assignment survey questions.

    $ cs240 sign
    

    (If this encounters an error, instead execute cs240.s24 sign.)

  4. Push your signature and your latest local commits to the hosted repository.

    $ git push
    

Confirm: All local changes have been submitted if the output of git status shows both:

  • Your branch is up to date with 'origin/main', meaning all local commits have been pushed
  • nothing to commit, meaning all local changes have been committed

Resubmit: If you realize you need to change something later, just repeat this process.

Grading

The assignment is graded out of a maximum of 100 points:

  • Working exploits (75 points): run make test to check all of your exploits.
    • Exploit 1: 20 points
    • Exploit 2: 25 points
    • Exploit 3: 15 points
    • Exploit 4: 15 points
  • Questions (25 points):
    • Your answers for Exploit 4 and one or more other exploits in responses.txt will be graded.

[1] This document is an alternative (s/ia32/x86-64/g) description for the old-style CSAPP Buffer Lab, which is available in ia32 form on the CSAPP website.