CS 240 Lab 10

Buffer Launch

This lab will introduce you to buffer overflow, a common type of security vulnerability. You can design an input to a program that takes advantage of (exploits) how the stack is used to call functions. This can cause the program to execute differently from how it was intended to work.

Setup

Open a terminal on cs.wellesley.edu (e.g., using VSCode), and start the buffer assignment using:

cd cs240-repos
cs240 start buffer --solo

This assignment requires you to have a unique 8-byte “cookie.” We have provided a Makefile which has a recipe to create this cookie.

Go to your buffer directory and open the Makefile. A makefile can define variables, but mostly consists of “rules.” A rule starts with a name followed by a colon, possibly with other names after the colon. Below that line, it has one or more indented lines of code composing the body of the rule. Each of these rules does the following:

  1. Defines a unique name for the product of the rule (this is the name part before the colon). This is the file that should be produced by running the rule body as shell commands. Within the rule body, writing $@ will refer to the rule product.
  2. Defines which other files should already exist before the rule can be run (dependencies). These are listed after the colon on the first line of the rule. The dependency listed first can be referred to in the rule body as $<.
  3. Gives the series of shell commands to execute in order to produce the rule product, assuming the dependencies are all up-to-date. These commands should include one that will produce the rule product file as output.

What make does is this: you give it a rule name, and it checks the file modification times of that rule’s dependencies. If the rule product already exists, and all of the dependency files were last modified earlier than the rule product file, make does nothing, because that indicates that the rule product file is already up-to-date. On the other hand, if the rule product is missing or at least one of the dependencies is newer than the rule product, it will run the rule body to update the product file. It also applies this process recursively: when a dependency of one rule is the product of another rule, it checks the freshness of the dependencies of the second rule, and so on.

Looking at this Makefile, answer the following questions:

  1. Which rule doesn’t have any dependencies (give the rule product name)?

    Correct answer: id.txt

  2. Run the cs240 id command yourself in the shell, without the > $@ part that’s in the Makefile. Given that > redirects output into a file, explain what id.txt should contain:

    Example answer: It should contain your username.

    Now run ls in the shell and see whether id.txt exists (it probably doesn’t). Then run make id.txt in the shell to produce it and check whether you were right.

    Note: normally make prints out each command it executes. An @ at the start of a command line in a rule body suppresses this for that command.

  3. The rule that produces cookie.txt has one dependency. What is it?

    Correct answer: id.txt

  4. If we run make cookie.txt now, will make need to run the rule for id.txt? Why or why not?

    Example answer: It won’t. Since we just ran make id.txt, the rule dependency for make cookie.txt is already up-to-date, so make won’t bother to re-run that rule.

  5. Which executable file will the cookie.txt rule run to produce the cookie file (note that > in the shell redirects output into a file)?

    Correct answer: id2cookie.bin

  6. In addition to creating cookie.txt, what else does this rule do?

    Example answer: It commits that file into your git repository.

  7. The cookie rule just prints out your id and cookie. It’s marked as .PHONY at the bottom of the file because it doesn’t actually produce a file named cookie. What would happen if you ran make cookie without first running make cookie.txt? Why?

    Example answer: Since cookie.txt is listed as a dependency of the cookie rule, make will first check that it is up-to-date. If it doesn’t exist, make will run the cookie.txt rule to create it (first running the id.txt rule if that file is also missing or not up-to-date). So just running make cookie can produce id.txt and cookie.txt and will still work to print out the cookie. Make is pretty sweet :)

Now run make cookie to create and display your cookie. As we just discussed, this will print your cookie in hex and store your CS username and cookie in the files id.txt and cookie.txt, respectively.

Contents of Repository

List the contents of the repository (use ls). You should see the following files (along with a few others):

  • laptop.c: you are given important parts of the C code that you can read to help you understand how the compiled version works. This is similar to how you were given the main.c code for the x86 assignment to help you understand how adventure.bin worked.

  • laptop.bin: compiled/executable version of laptop.c. You will run laptop.bin using a file containing input to the program. This is similar to how you ran adventure.bin using inputs.txt in the x86 assignment.

    You will design the input to the program to take advantage of how the stack is used, getting the program to do something different from what was originally intended (this is called an exploit).

  • exploit1.hex, exploit2.hex, exploit3.hex, exploit4.hex: input files you will write for each of the 4 exploits you will design. These correspond to inputs.txt for the x86 assignment, except that you use a separate file for each exploit.

    Each exploit file must contain the correct number of bytes with the correct hexadecimal value for each byte to represent your exploit.

  • hex2raw.bin: tool/executable file you are given which helps you prepare to run your exploits. This executable file will take the input you design for an exploit (for example, exploit1.hex) and turn it into the proper raw byte form (exploit1.bytes) needed to run the program.

  • responses.txt: file for English descriptions of your exploits. This is similar to descriptions.txt in the x86 assignment.

Investigate getbuf

Exercise 1:

In the Terminal, open the laptop executable:

gdb ./laptop.bin

Examine the getbuf function using the list command:

(gdb) list getbuf

Look for the following three key statements (you may have to hit “enter” to list more lines if they don’t all show up at once):

unsigned long long getbuf() {
    char buf[36];
 
    // other statements
 
    unsigned long long val = (unsigned long long)Gets(buf);
 
    // other statements
  
    return val % 40;
}

The function getbuf() uses buf, an array of 36 bytes, to calculate a value.

Where will the memory for the buf array be allocated?

Example answer: Since buf is a local variable, space for the 36 bytes is allocated on the stack.

Use disas getbuf to confirm your answer: which instruction allocates space for buf (along with other local variables of getbuf)?

Correct answer: sub $0x30,%rsp

Which register actually holds the buf pointer when Gets is called?

Correct answer: %rdi

The bytes to fill buf are obtained by calling the function Gets(), which has the signature:

char* Gets(char* buf) 

where buf is a pointer to an array of bytes. Gets reads a string from standard input, terminated by a newline character ('\n'), and that string supplies the bytes.

The bytes from the input are stored starting at buf, which is an address in the stack.

Gets() is similar to a C system function called gets(), which is not safe to use because it does not check whether the string it accepts is longer than the array!

What do you think might happen if the string accepted IS longer than the array? Why is this “not safe?”

Example answer:

The extra bytes will be written over stuff beyond the buffer. Since the buffer is on the stack, these bytes will overwrite important other stack information, like the return address of the current function, which could lead to an attacker gaining control of the system.

Fill in the stack diagram and registers below for a call to getbuf right before it calls Gets, using the conventions from class and lab (use disas getbuf to see the assembly code; you only need to look at the first few lines). First, fill in as much as you can without using gdb, then use gdb to fill in the rest. Use the following assumptions:

  1. Assume the return address is 0x4017e3.
  2. Assume the value of %rbp before the call to getbuf was 0x7fffffffb210.
  3. Assume the value of %rsp before the call to getbuf was 0x7fffffffb1f0.
  4. For memory slots with known values, fill in hex values starting with 0x.
  5. For memory slots that are allocated but which haven’t been initialized yet, write ‘X’.
  6. For memory slots that are below the stack pointer, write ‘-’.
  7. When running it with gdb to print out actual values, use run -u pmwh to get matching stack addresses. When actually doing the assignment, use your own username instead.

Hint: to fill in these values, simulate the effects of each instruction, starting with the call getbuf instruction that jumps into getbuf (which isn’t shown as part of disas getbuf) and ending with the final instruction before the call. Consult the lab slides for reminders about how call, ret, push, and pop modify the stack.

Hint 2: The call getbuf instruction is the instruction that will determine the value for the first address listed below.

Address          Value (correct answer)   Comments (example answers)
0x7fffffffb1e8   0x4017e3                 Return address for getbuf
0x7fffffffb1e0   0x7fffffffb210           Pushed %rbp value.
0x7fffffffb1d8   X                        16x3 = 48 bytes allocated by sub.
0x7fffffffb1d0   X
0x7fffffffb1c8   X
0x7fffffffb1c0   X
0x7fffffffb1b8   X
0x7fffffffb1b0   X
0x7fffffffb1a8   -                        Beyond the current stack pointer.
0x7fffffffb1a0   -

Register         Value (correct answer)   Comments
%rsp             0x7fffffffb1b0           The stack pointer
%rbp             0x7fffffffb1e0           The stack “base” pointer for the base of our stack frame.
%rdi             0x7fffffffb1b0           Copy of %rsp as first argument to Gets; this is buf from the C code.

Excluding the 8 bytes for the return address (which is part of main’s stack frame), how many total bytes are allocated in getbuf’s stack frame?

Correct answer: 56 Explanation: The push instruction allocates 8 and the sub allocates an additional 0x30 = 3x16 = 48 bytes.

If you haven’t already, set a breakpoint at the instruction that calls Gets, and also set one on the instruction following the call.

(gdb) break *getbuf+12
(gdb) break *getbuf+17

Run the program, using your own username as the argument:

(gdb) run -u your_username

It should stop at your first breakpoint. Which of these commands will display the stack frame plus the return address?

info reg rsp
x /gx $rsp
x /s $rsp
x /8gx $rsp

Run that command and inspect the results. Make sure that the output agrees with your table above.

Remember that you have to read the values in the opposite order from your diagram, since gdb displays memory from lowest to highest address.

Now, continue execution so that you hit the next breakpoint, after Gets has been called. On the way, the program will pause to ask for input. Enter the following 40-byte string:

0123456789012345678901234567890123456789

Now that we’re at the second breakpoint, display the stack frame again using the same x command. What is the first 8-byte value that you see, at the bottom of the stack (start with 0x)?

Correct answer: 0x3736353433323130

Explain where that specific value came from:

Example answer: We typed in digits starting with 01234567, and each digit is represented by 1 byte. The ASCII codes for the digits 0-9 are 0x30-0x39 in hexadecimal, and we interpret multi-byte values as little-endian, so we get the value shown above by concatenating the ASCII codes for the first 8 digits we typed.

The buf variable is only 36 bytes, and the assembly code for getbuf uses some of the allocated stack space beyond those 36 bytes to store the variable_length variable. Since we have input more than 36 bytes and Gets does not check for that, part of that variable’s memory may be overwritten. Why is that not an issue?

We haven’t initialized variable_length yet and it will get reset when we do.
The volatile keyword protects variable_length from being overwritten.
40 input characters isn’t enough to overwrite variable_length, since there’s a gap between the two variables.

That string was 40 bytes long. What will happen if the string is 48 bytes long?

Example answer: That’s just long enough to fill up the extra 8 bytes of zeroes, and so the final NUL terminator will overwrite the lowest-order byte (the last two hex digits) of the old %rbp. If we’re lucky it’s not actually being used by anyone and we’ll be fine. If not, other stuff may start to go wrong as the corrupted %rbp value results in other code writing values into the wrong memory locations.

What will happen if the string is 56 characters long?

Example answer: In this case we’d overwrite the old %rbp value and corrupt the last byte of the return address. We’d almost certainly get a segmentation fault or other error when attempting to return from getbuf, but literally anything could happen, since bytes that are probably not aligned with real instructions will be reinterpreted as instructions to run.

Now quit out of gdb.

Formatting an Exploit String with hex2raw.bin

Basically, your job on the assignment is to design strings of various lengths which represent bytes being stored to the stack when they are accepted as input. Some of the bytes are simply “filler,” to fill up the buffer so that it overflows into other values stored on the stack. The filler bytes can be any value.

However, some bytes must be set to particular values so that the stack is corrupted in a way which will cause the program to do something it was not designed to do.

You will create the string representing the bytes in a file, using the format described in the assignment: Each byte in the byte sequence is written as a pair of hexadecimal digits. Successive bytes may be separated by spaces, but all bytes must be on the same line.

Avoid the use of 0A, which is the newline character and will cause Gets to stop reading input.

Exercise 2:

Use your editor to create a file named exploit0.hex. You will enter a string which represents the numbers you want stored to memory, and then create the raw bytes using hex2raw.bin (this way you can directly enter your desired bytes, without having to do a reverse lookup in an ASCII table for each byte to figure out what to type in).

In exploit0.hex, enter the following 40 values (these values are chosen to be easy to recognize when we display memory in gdb):

31 31 31 31 31 31 31 f1 32 32 32 32 32 32 32 f2 33 33 33 33 33 33 33 f3 34 34 34 34 34 34 34 f4 35 35 35 35 35 35 35 f5

Note: You must specify a pair of digits in the .hex file to represent each byte, e.g., 01 02 03, etc.

Create the raw bytes in the .bytes file from the .hex file by entering at the command line:

./hex2raw.bin < exploit0.hex > exploit0.bytes

(Note that we’re using the < and > shell operators we saw earlier to read input from a file and redirect output to a file.)

Start the debugger again, and set a breakpoint at the instruction after the Gets call in getbuf:

(gdb) break *getbuf+17

What gdb command (using your_username as the username and < to read input from a file) will run the program using the raw .bytes file you just created as input?

Example answer:

run -u your_username < ./exploit0.bytes

Run that command, and gdb should stop at the breakpoint you set up. Examine the stack:

(gdb) x /8gx $rsp

Display the results, and explain what you see as compared to the last time when we entered digits manually:

Example answer:

This time we see what we wrote in the .hex file directly reflected in memory, since hex2raw.bin has done the reverse ASCII lookup for us for each pair of hex digits we entered.

Overwrite the return address

Assume that you want to replace the return address for the getbuf call on the stack with the address 0x0000000000400e45 (NOTE: this is not the correct value to use for any of the exploits).

Exercise 3:

  1. How many bytes of input would you need in total?

    Example answer: 64 bytes to overwrite the entire return address, which will also corrupt one byte beyond it in the stack. 63 bytes is probably enough since the high-order bytes of most return addresses you’d want to use (including text-segment and stack addresses) will be 0. In this specific case, we only need to overwrite the 3 least-significant bytes of the return address since the rest are identical to what’s already there (and zeroes so the NUL won’t be noticed), so 59 bytes is enough.

  2. Modify your exploit0.hex string to accomplish this and display the contents of your file below:

    Example answer:

    31313131313131f1 32323232323232f2 33333333333333f3 34343434343434f4 35353535353535f5 36363636363636f6 37373737373737f7 450e4000000000

    Note: this answer puts a space after every 8 bytes to make it easier to count out the bytes. It also uses exactly 63 bytes so that it overwrites the entire 8-byte return address when you count the trailing NUL byte that gets added automatically. Your answer can mostly contain ANY bytes you want (other than 0a), as long as there are the right number of them before the 45, 0e and 40.

Now create the raw bytes in the .bytes file from the .hex file by entering at the command line:

./hex2raw.bin < exploit0.hex > exploit0.bytes

Note that you can run terminal commands without having to quit out of gdb and set up your breakpoints again by typing ! within gdb followed by the terminal command. You could also just open up a second terminal to run commands alongside gdb.

If you need to, start gdb again and set up the breakpoint after the Gets call in getbuf.

Run the program using the raw bytes file and examine the stack:

(gdb) run -u your_username < ./exploit0.bytes
...
(gdb) x /8gx $rsp

Confirm that you have successfully overwritten the return address by entering the last 8-byte hex value displayed by that command:

Correct answer: 0x0000000000400e45

Run disas 0x400e45 to display the code that’s at the return address we just set up. Based on the result you see, what do you think will happen when we continue to execute the program in gdb?

Example answer: gdb just says “No function contains the specified address.” Since the address isn’t code, it’s most likely going to be in a different segment which will not be marked as executable, and so we’ll get a segmentation fault when we continue.

Use c to continue in gdb and confirm your prediction.

The usage function defined in laptop.c offers a way to exit the program cleanly, since it ends up calling exit rather than simply returning. Use disas to figure out what return address we should use if we want to jump straight to the exit call within the usage function when we’re done with getbuf (write down the address starting with 0x but omitting leading 0s):

Correct answer: 0x4014bf Explanation: Note that 0x4014bd is also a reasonable choice as that would set the argument to exit to 0 first.

Edit your exploit0.hex to use this return address value, run hex2raw.bin to create a new exploit0.bytes, and run the program. Confirm that it exits normally if you continue past your breakpoint.

NOTE: Although this lab contains essential details for Buffer, you should also read the assignment carefully, since it contains additional information not included in the lab exercises.

You are ready to begin working on exploit 1 on the assignment. Use remaining lab time to start working on the assignment.