Dark Buffer Arts?|?d??|?d?Segmentation fault: 11
Smash the stack to understand calling conventions and security concerns.
- Due: 11:00pm Monday, 11 April
- Starter Code: fork wellesleycs240 / cs240-buffer (just once!) and add bpw as admin. (Need help?)
- Submit:
- Commit and push your final revision and double-check that you submitted the up-to-date version. (Need help?)
- Do not submit a paper copy.
- Relevant Reference:
- Collaboration: Individual code assignment policy, as defined by the syllabus.
Overview
Silly version: Impressed with your skills in defusing binary whizbangs, an infamous after-market magical artifact enhancement shop has called you in for an interview. To get the job, you must use some shadowy techniques to cause an unassuming pink umbrella to put on a colorful fireworks display.
Translation: This assignment helps you develop a detailed understanding of the call stack organization on a 32-bit x86 processor. It involves applying a series of buffer overflow attacks to an executable file called umbrella.
Ethics: In this assignment, you will gain firsthand experience with one of the methods commonly used to exploit security weaknesses in operating systems and network servers. Our purpose is to help you learn about the runtime operation of programs and to understand the nature and impact of this form of security weakness so that you can avoid it when you write system code. We do not condone the use of these or any other form of attack to gain unauthorized access to any system resources. There are criminal statutes governing such activities.
Suggested Practice: CSAPP Practice Problems 3.30, 3.31, 3.33, and others nearby are good review of the stack discipline.
Instructions
The executables for this assignment were compiled specifically for the CS Linux machines and the wx appliance. Do this assignment in one of the CS 240 computing environments.
As usual, fork wellesleycs240 / cs240-buffer and clone your Bitbucket repository to your machine. You should find the following files in your working copy:
- makecookie: generates a “cookie” based on some string (which will be your username)
- umbrella: executable you will attack
- umbrella.c: important parts of C code used to compile umbrella
- sendstring: utility to help convert human-readable exploit descriptions to raw exploit strings
- Makefile: recipes to test your exploits
Exploits: Your main job is to craft 4 separate exploit strings that accomplish increasingly sophisticated buffer overrun exploits (described below) on the vulnerable umbrella executable. Save your buffer overflow exploits for the different levels in hex sendstring format in these files:
- smoke.txt: Level 0 exploit
- fizz.txt: Level 1 exploit
- bang.txt: Level 2 exploit
- boom.txt: Level 3 exploit
Again, store your exploits in human-readable hex sendstring format, not the fully byte-encoded format.
Additionally, store your Bitbucket username in the file id.txt so the testing harness can find it automatically.
To test your exploits, make sure these files are set up properly, then run make test. This will test each exploit level and generate a summary.
Descriptions: In addition to constructing each exploit, you must give a clear and concise English explanation of the exploit. Your explanation should demonstrate that you understand:
- What existing stack memory contents are overwritten by what parts of your exploit string when it is copied into memory.
- What instructions execute, using what exploit data, to achieve the exploit goal.
- How these relate to the calling conventions and stack discipline.
As with the whizbang descriptions, show that you understand the relation between the specifics of your exploit and the higher-level context for how and why it works.
- Do not give an exhaustive instruction-by-instruction account of the exploit’s execution or simply translate the code to English. (e.g., “Next, it copies the value from %ebp into %esp, then it pops a value off the stack into %ebp, then it returns…”)
- Focus on the instructions that involve stack or exploit data and accomplish key steps in the exploit. (e.g., “Then, the acme function loads the contents of the widget variable from the stack, which the exploit string has overwritten with the magic number 42, causing the following computation to produce the result 34 instead of the expected 12.”)
- For each phase, focus on what is new or different from the last phase. Do not bother re-explaining the basics that carry over from previous phases.
Save your descriptions as plain text in descriptions.txt or, if you prefer to use another format, feel free to do so, then convert it to PDF format as descriptions.pdf and hg add this file to your repository.
Grading will weight your exploit and your description roughly equally. All four levels have equal weight.
Be sure to read this document carefully before beginning your work.
The umbrella
The umbrella program must be run with the -u your_bitbucket_username flag, which operates the umbrella for the indicated username. (We will feed umbrella your username with the -u flag when grading your solutions.)
In most of the attacks in this assignment, your objective will be to make a unique, personalized 4-byte “cookie” value show up in places where it ordinarily would not. The proper cookie value is derived from the username you provide with -u. It also affects the stack layout.
You can generate your cookie with the makecookie program, giving your Bitbucket username as the argument:
$ ./makecookie wendyw
0x5e57e632
(Of course, you should replace wendyw with your own username.) While you are doing this, you might as well prepare the first file you need to turn in, id.txt:
$ echo your_bitbucket_username > id.txt
(Of course, you should replace your_bitbucket_username with your Bitbucket username.) This will generate a text file containing your username followed by a single newline.
How it works
The umbrella program reads a string from standard input with the function getbuf():
unsigned getbuf() {
    char buf[36];
    // ...
    unsigned val = (unsigned)Gets(buf);
    // ...
    return val % 40;
}
The full version of this function contains more code for an optional additional challenge, but you can reason about this version for the requirements of this assignment. The key feature to note is that getbuf() calls the function Gets(), passing the address of its local array buf, which is allocated on the stack with space for 36 chars.
The function Gets() is similar to the standard C library function gets(): it reads a string from standard input (terminated by \n) and stores it (followed by a null terminator, \0) at the specified destination. It returns its argument, an address.
Neither Gets() nor gets() has any way to determine whether there is enough space at the destination to store the entire string. Instead, they simply copy the entire string, assuming the destination is large enough and thus possibly over-running the bounds of the storage allocated at the destination.
If the string typed by the user and read by getbuf() is less than 36 characters long, getbuf() will return some value less than 0x28 (decimal 40, because it returns val % 40), as shown by the following execution example:
$ ./umbrella -u your_bitbucket_username
Type string: Merlin's beard!
Dud: getbuf returned 0x20
The value returned might differ for you, since it is derived from the address of buf on the stack, which may vary between systems. Running the umbrella under gdb will also yield different values than it does outside gdb.
Typically, an error occurs if we type a longer string:
$ ./umbrella -u your_bitbucket_username
Type string: This string is too long and it starts overwriting things.
Ouch!: You caused a segmentation fault!
As the error message indicates, over-running the buffer typically causes the program state (e.g., the return addresses and other data structures that were stored on the stack) to be corrupted, leading to a memory access error. Your task is to be more clever with the strings you feed umbrella so that it does more interesting things. These are called exploit strings.
Tools for Crafting Exploits
Constructing exploits involves tricky tasks like writing untypeable characters and determining the byte encoding of x86 instructions. Use the techniques below to simplify your job.
Formatting Exploit Strings
Remember that each ASCII character is represented by a byte. For example, 'A' is represented by the byte value 0x41, written in hexadecimal. Embedding addresses, numbers, or other non-character data in your exploit string means finding the sequence of characters whose ASCII encodings happen to match the byte values you wish to generate. This is annoying and hard to begin with, but it gets worse when you need a byte value that corresponds to no ASCII character you can type on the keyboard. Don’t try to encode your exploit by hand!
To simplify this task, we have provided a tool called sendstring that reads in a human-readable text description of a byte sequence and produces its encoding as bytes. Essentially, this allows you to skip the step of figuring out what character to type to generate the byte value you want.
Suppose we want the byte sequence 0x41, 0x42, 0x43, 0x1b, where each desired byte has been shown in hexadecimal notation. To get this sequence of bytes by typing characters, we would need characters 'A', 'B', 'C', followed by the ASCII “escape” (ESC) character, which is treated as something other than a normal character when typed on the keyboard, making it hard to type as string input! sendstring will take the string input "41 42 43 1b" and produce the byte sequence we desire.
To run sendstring, type the series of hexadecimal byte value descriptions you want in a file (e.g., smoke.txt for Level 0), and run:
$ ./sendstring < smoke.txt > smoke.bytes
Here the < instructs the shell to provide the contents of smoke.txt as standard input to sendstring and the > to store all of sendstring’s standard (printed) output into a new file called smoke.bytes. This feature is called input (<) and output (>) redirection; each can be used independently of the other.
Now, you can run your umbrella, reading standard input from the contents of the file smoke.bytes instead of from the keyboard:
$ ./umbrella -u your_bitbucket_username < smoke.bytes
Alternatively, if you are not running the umbrella under gdb, you can do this process all at once, skipping the intermediate smoke.bytes file by using a pipe to attach the output of sendstring directly to the input of umbrella:
$ ./sendstring < smoke.txt | ./umbrella -u your_bitbucket_username
0A
Your exploit string must not contain byte value 0x0A (0A in sendstring input) at any intermediate position, since this is the ASCII code for newline ('\n'). When Gets() encounters this byte, it will assume you intended to terminate the string input. sendstring will warn you if it encounters this byte value.
Testing Exploits
To test your exploits, make sure your human-readable hex sendstring
-format exploits are stored in the proper files and then run make test
. This will test each exploit level and generate a summary.
Saving GDB commands
When using gdb, you may find it useful to save a series of gdb commands to a text file and then use the -x commands.txt flag. This saves you the trouble of retyping the commands every time you run gdb. You can read more about the -x flag in gdb’s man page.
Generating Byte Codes
You may wish to come back and read this section later, after looking at the exploits. When including instructions as part of an exploit, you must include the instruction encoding: the actual series of bytes that encodes an instruction like pushl %eax, not the ASCII bytes of the assembly-language text "pushl %eax".
Using gcc as an assembler and objdump as a disassembler makes it convenient to generate the byte codes for instruction sequences. For example, suppose we write a file example.s containing the following assembly code:
# Example of hand-generated assembly code
movl $0x1234abcd,%eax # Move 0x1234abcd to %eax
pushl $0x401080 # Push 0x401080 on to the stack
ret # Return
The code can contain a mixture of instructions and data. Anything to the right of a # character is a comment.
We can now assemble and disassemble this file:
$ gcc -m32 -c example.s
$ objdump -d example.o > example.d
The generated file example.d contains the following lines:
0: b8 cd ab 34 12 mov $0x1234abcd,%eax
5: 68 80 10 40 00 push $0x401080
a: c3 ret
Each line shows a single instruction. The number on the left indicates the starting address (starting with 0), while the hex digits after the : character indicate the byte codes for the instruction. Thus, we can see that the instruction pushl $0x401080 has a hex-formatted byte code of 68 80 10 40 00.
If we read off the 4 bytes starting at address 6 we get: 80 10 40 00. This is a byte-reversed version of the data word 0x00401080. This byte reversal represents the proper way to supply the bytes as a string, since a little-endian machine lists the least significant byte first.
Finally, we can read off the byte sequence for our code:
b8 cd ab 34 12 68 80 10 40 00 c3
Exploits
There are four functions to exploit for this assignment. The exploits increase in difficulty. There is an addition to the last function that you can exploit for extra pizzazz if you are having fun. Keep in mind that the grading relies on both your exploits and your documentation, so describe your approach to all stages, even if your exploit is not yet working.
Level 0: Candle
The function getbuf() is called within umbrella by a function test():
void test() {
    unsigned val;
    volatile unsigned local = 0xdeadbeef;
    char* variable_length;
    entry_check(3); /* Make sure entered this function properly */
    val = getbuf();
    if (val <= 40) {
        variable_length = alloca(val);
    }
    entry_check(3);
    /* Check for corrupted stack */
    if (local != 0xdeadbeef) {
        printf("Sabotaged!: the stack has been corrupted\n");
    } else if (val == cookie) {
        printf("Boom!: getbuf returned 0x%x\n", val);
        if (local != 0xdeadbeef) {
            printf("Sabotaged!: the stack has been corrupted\n");
        }
        validate(3);
    } else {
        printf("Dud: getbuf returned 0x%x\n", val);
    }
}
When getbuf() executes its return statement, the program ordinarily resumes execution within function test(). Within the file umbrella, there is a function smoke():
void smoke() {
    entry_check(0); /* Make sure entered this function properly */
    printf("Smoke!: You called smoke()\n");
    validate(0);
    exit(0);
}
Your task is to get umbrella to execute the code for smoke() when getbuf() executes its return statement, rather than returning to test(). You can do this by supplying an exploit string that overwrites the stored return pointer in the stack frame for getbuf() with the address of the first instruction in smoke(). Note that your exploit string may also corrupt other parts of the stack state, but this will not cause a problem, because smoke() causes the program to exit directly.
Advice
- All the information you need to devise your exploit string for this level can be determined by examining a disassembled version of umbrella.
- Be careful about byte ordering (i.e., endianness).
- You might want to use gdb to step the program through the last few instructions of getbuf() to make sure it is doing the right thing.
- The placement of buf within the stack frame for getbuf() depends on which version of gcc was used to compile umbrella. You will need to pad the beginning of your exploit string with the proper number of bytes to overwrite the return pointer. The values of these bytes can be arbitrary.
- Don’t forget to use sendstring to simplify your job.
Level 1: Sparkler
Within the umbrella there is also a function fizz():
void fizz(unsigned val) {
    entry_check(1); /* Make sure entered this function properly */
    if (val == cookie) {
        printf("Fizz!: You called fizz(0x%x)\n", val);
        validate(1);
    } else {
        printf("Misfire: You called fizz(0x%x)\n", val);
    }
    exit(0);
}
Similar to Level 0, your task is to get umbrella to execute the code for fizz() rather than returning to test(). In this case, however, you must make it appear to fizz() as if you have passed your cookie as its argument. You can do this by encoding your cookie in the appropriate place within your exploit string.
Advice
- You can use gdb to get the information you need to construct your exploit string. Set a breakpoint within getbuf() and run to this breakpoint. Determine key features such as the address of val and the location of the buffer.
Level 2: Firecracker
A much more sophisticated form of buffer attack involves supplying a string that encodes actual machine instructions. The exploit string then overwrites the return pointer with the starting address of these instructions. When the function whose frame was overwritten (in this case getbuf()) executes its ret instruction, the program will start executing the instructions on the stack rather than returning to its caller. With this form of attack, you can get the program to do almost anything. The code you place on the stack is called the exploit code. This style of attack is tricky, though, because you must get machine code onto the stack and set the return pointer to the start of this code.
For Level 2, you will need to run your exploit within gdb for it to succeed. (Modern systems use memory protection mechanisms to prevent execution of memory locations in the stack and guard against exactly this type of attack. Since gdb works a little differently than normal program execution, it allows the exploit to succeed.)
Within the file umbrella there is a function bang():
unsigned global_value = 0;
void bang(unsigned val) {
    entry_check(2); /* Make sure entered this function properly */
    if (global_value == cookie) {
        printf("Bang!: You set global_value to 0x%x\n", global_value);
        validate(2);
    } else {
        printf("Misfire: global_value = 0x%x\n", global_value);
    }
    exit(0);
}
Similar to Levels 0 and 1, your task is to get umbrella to execute the code for bang() rather than returning to test(). Before this, however, you must set global variable global_value to your cookie. Your exploit code should set global_value, push the address of bang() on the stack, and then execute a ret instruction to cause a jump to the code for bang().
Advice:
- Determining the byte encoding of instruction sequences by hand is tedious and prone to errors. You can let tools do all of the work by writing an assembly code file containing the instructions and data you want to put on the stack. Assemble this file with gcc and disassemble it with objdump. This will allow you to see the byte sequence to include in your exploit. (A brief example of how to do this is included in the Generating Byte Codes section above.)
- Keep in mind that your exploit string depends on your machine, your compiler, and even your cookie. Make sure your exploit string works on the CS Linux machines, and make sure you include your Bitbucket username on the command line to umbrella.
- Watch your use of address modes when writing assembly code. Note that movl $0x4, %eax moves the value 0x00000004 into register %eax, whereas movl 0x4, %eax moves the value at memory location 0x00000004 into %eax, which is not likely your intent. (Also, because that memory location is usually unmapped, the second instruction will cause a segmentation fault!)
- Do not attempt to use either a jmp or a call instruction to jump to the code for bang(). These instructions use PC-relative addressing, which is very tricky to set up correctly in this attack. Instead, push an address on the stack and use the ret instruction.
Level 3: Whizbang
For Level 3, you will need to run your umbrella exploit within gdb for it to succeed.
Our preceding attacks have all caused the program to jump to the code for some other function, which then causes the program to exit. As a result, it was acceptable to use exploit strings that corrupt the stack, overwriting the saved value of register %ebp and the return pointer.
The most sophisticated form of buffer overflow attack causes the program to execute some exploit code that patches up the stack and makes the program return to the original calling function (test() in this case). The calling function is oblivious to the attack. This style of attack is tricky, though, since you must: (1) get machine code onto the stack, (2) set the return pointer to the start of this code, and (3) undo the corruption made to the stack state.
Your job for this level is to supply an exploit string that will cause getbuf() to return your cookie back to test(), rather than the small value (less than 0x28) it ordinarily returns. You can see in the code for test() that this will cause the program to go Boom!. Your exploit code should set your cookie as the return value, restore any corrupted state, push the correct return location on the stack, and execute a ret instruction to really return to test().
Advice:
- In order to overwrite the return pointer, you must also overwrite the saved value of %ebp. However, it is important that this value is correctly restored before you return to test(). You can do this by either (1) making sure that your exploit string contains the correct value of the saved %ebp in the correct position, so that it never gets corrupted, or (2) restoring the correct value as part of your exploit code. You’ll see that the code for test() has some explicit tests to check for a corrupted stack.
- You do not need it for this exploit, but the NOP (no operation) instruction is useful when constructing this style of buffer overflow exploit, used in a pattern called a “NOP sled”.
- You can use gdb to get the information you need to construct your exploit string. Set a breakpoint within getbuf() and run to this breakpoint. Determine parameters such as the saved return address and the saved value of %ebp.
- Let tools such as gcc and objdump do all of the work of generating a byte encoding of the instructions.
- Keep in mind that your exploit string depends on your machine, your compiler, and even your cookie. Again, make sure your exploit string works on the CS Linux machines, and make sure you include your Bitbucket username on the command line to umbrella.
Reflect on what you have accomplished. You caused a program to execute machine code of your own design. You have done so in a sufficiently stealthy way that the program did not realize that anything was amiss.
Mayhem (optional extra exploration)
execve is a system call that replaces the currently running program with another program, inheriting all the open file descriptors. What are the limitations of the exploits you have performed so far? How could calling execve allow you to circumvent these limitations? If you have time, try writing an additional exploit that uses execve and another program to print a message.