Dark Buffer Arts
Smash the stack to understand calling conventions and security concerns.
- Assign: Monday, 7 November
- Checkpoint: aim to complete at least two stages by 11:59pm Monday, 14 November
- Due: 11:59pm Thursday, 17 November
- Starter Code: fork wellesleycs240 / cs240-buffer (just once!), keep the "cs240-buffer" name, and add bpw as admin. (Need help?)
- Submit:
- Commit and push your final revision and double-check that you submitted the up-to-date version. (Need help?)
- Do not submit a paper copy.
- Relevant Reference:
- Collaboration: Individual code assignment policy, as defined by the syllabus.
Overview
Silly version: Impressed by your recent quest for the Sourceror’s Code, an infamous after-market magical artifact enhancement shop has called you in as a consultant. Your task is to augment the powers of an unassuming pink umbrella on behalf of a client who wishes to use it for daily tasks for which it was never intended.
Serious version: This assignment helps you develop a detailed understanding of call stack organization by deploying a series of buffer overrun attacks on a vulnerable executable file called umbrella.
Ethics: In this assignment, you will gain firsthand experience with exploits of a common type of security vulnerability in operating systems and network servers. Our purpose is to help you learn about the runtime operation of programs and to understand the nature and impact of this form of security weakness so that you can avoid it when you write system code. We do not condone the use of these or any other form of attack to gain unauthorized access to any system resources. There are criminal statutes governing such activities.
Contents
- Overview
- Setup
- Tasks
- Grading
- The umbrella
- Tools for Crafting Exploits
- Running and Testing Exploits
- Exploits
Setup
Do this assignment in one of the CS 240 computing environments. The executables for this assignment were compiled specifically for the CS Linux machines and the wx appliance. As usual, fork wellesleycs240 / cs240-buffer and clone your Bitbucket repository to your machine.
Files: Your working copy should provide:
- descriptions.txt: file for English descriptions of your exploits
- exploit1.hex: file for Exploit 1
- exploit2.hex: file for Exploit 2
- exploit3.hex: file for Exploit 3
- exploit4.hex: file for Exploit 4
- hex2raw: utility to convert human-readable exploit descriptions written in hexadecimal to raw bytes
- id2cookie: utility to convert user ID to unique “cookie” value
- Makefile: recipes to test your exploits
- umbrella: executable you will attack
- umbrella.c: important parts of C code used to compile umbrella
Create your cookie: Most attacks in this assignment will require you to make a unique 8-byte “cookie” value show up in places where it ordinarily would not. This value will also determine the exact behavior of your executable. To create your personalized cookie, run make cookie and enter your Bitbucket ID. This will print your cookie in hex and record your Bitbucket ID and cookie in the files id.txt and cookie.txt.
Tasks
You must craft exploit strings that accomplish four increasingly sophisticated buffer overrun attacks when provided as input to the vulnerable umbrella executable.
Each exploit is described below.
Submit two parts for each exploit:
- Exploit string (input): Write your exploit string in hex2raw input format in each of the files exploit1.hex through exploit4.hex.
- Description: In the separate descriptions.txt file, write a succinct paragraph describing in English how the exploit works. Your description should demonstrate that you understand:
  - What existing stack memory contents are overwritten by what parts of your exploit string when it is copied into memory.
  - What instructions execute, using what data, to accomplish the attack.
  - How these relate to the calling conventions and stack discipline.
  As with the x86 rune descriptions, show that you understand the relation between the specifics of your exploit and the higher-level context for how and why it works.
  - Do not give an exhaustive instruction-by-instruction account of the exploit’s execution or simply translate the code to English. (e.g., “Next, it adds 24 to %rsp, then it pops the value from the top of the stack and stores it in %rbx, then it returns…”)
  - Do focus on the instructions that involve stack or exploit data and accomplish key steps in the exploit. (e.g., “Then, the acme function loads the contents of the widget variable from the stack, which the exploit string has overwritten with the magic number 42, causing the following computation to produce the result 34 instead of the expected 12.”)
  - For each exploit, focus on what is new or different from the last exploit. Do not re-explain the basics that carry over from previous exploits.
Grading
The assignment is graded from a maximum of 100 points:
- 80 points for exploits (20 points each). Run make test to check all of your exploits.
- 20 points for descriptions. We will grade descriptions of one or two exploits (chosen randomly), by the criteria above.
The umbrella
The umbrella executable requires a user ID argument on the command line and reads a string from standard input once it starts up. The user ID customizes stack layout and verifies a unique “cookie” value that your attacks must provide.
Usage
To run the umbrella executable:
$ ./umbrella -u your_bitbucket_username
Type string:
Alternatively, since your username was saved in a file when you made your cookie earlier, you can also use a subshell to pass the contents of this file as an argument:
$ ./umbrella -u $(cat id.txt)
Type string:
Input Vulnerability
The umbrella executable reads a string from standard input with the function getbuf():
unsigned long long getbuf() {
    char buf[36];
    // ...
    unsigned long long val = (unsigned long long)Gets(buf);
    // ...
    return val % 40;
}
The full version of this function contains more code for an optional additional challenge. The part shown here is sufficient for the required parts of this assignment. The key feature to note is that getbuf() calls the function Gets(), passing the address of its local array buf, which is allocated on the stack with space for 36 chars.
The function char* Gets(char* buf) is similar to the standard C library function char* gets(char* buf). It reads a string from standard input, terminated by a newline character ('\n'), and stores the characters of the string, followed by a null terminator ('\0'), starting at the memory address given by its argument, buf. It returns its argument.
Neither Gets() nor gets() has any way to determine whether there is enough space at the destination to store the entire string. Instead, they simply copy the entire string, assuming the destination is large enough and thus possibly over-running the bounds of the storage allocated at the destination.
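For contrast, a bounded read takes the destination size as a parameter and never writes past it. The sketch below is only an illustration built on the standard fgets(); the helper name read_line_bounded is hypothetical and is not part of umbrella or its source.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of umbrella): read one line into buf,
 * writing at most size bytes including the null terminator.
 * Returns buf, or NULL on end of input or error. */
char *read_line_bounded(char *buf, size_t size, FILE *in) {
    if (size == 0 || fgets(buf, (int)size, in) == NULL) {
        return NULL;
    }
    /* Drop the trailing newline if the whole line fit in the buffer. */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

With a 36-byte buffer, a longer input line is cut off after 35 characters instead of spilling into neighboring stack data. This is the kind of check that Gets() lacks.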
If the input string read by getbuf() is less than 36 characters long, getbuf() will return some value less than 0x28 (that is, 40 in decimal), because it returns val % 40, as shown by the following execution example:
$ ./umbrella -u your_bitbucket_username
Type string: Acromantula!
Dud: getbuf returned 0x20
The value returned might differ for you, since it is derived from the address of buf on the stack, which may vary between systems. Running umbrella under gdb will also yield different values than it does outside gdb.
Typically, an error occurs if we type a longer string:
$ ./umbrella -u your_bitbucket_username
Type string: This string is too long and it starts overwriting things.
Ouch!: You caused a segmentation fault!
As the error message indicates, over-running the buffer typically causes the program state (e.g., the return addresses and other data structures that were stored on the stack) to be corrupted, leading to a memory access error. Your task is to be more clever with the strings you feed umbrella so that it does more interesting things. These are called exploit strings.
Tools for Crafting Exploits
Constructing exploits involves tricky tasks like writing untypeable characters and determining the byte encoding of x86 instructions. Use the techniques below to simplify your job.
Formatting Exploit Strings with hex2raw
Each ASCII character in a string is represented by one byte. For example, 'A' is represented by the byte value also described by the hexadecimal number 0x41. While your exploits will be delivered under the guise of strings, they will embed sequences of bytes encoding addresses, numbers, or other non-character data. It is hard enough to map each desired byte value in your exploit back to a character by hand, but often, the specific bytes required do not even correspond to any typeable or printable ASCII characters, making it “difficult” to type your exploit string on a keyboard or view it on the screen. Do not try to encode your exploit by hand!
We have provided a tool called hex2raw to encode exploit strings:
- The input to hex2raw is a human-readable text description of a byte sequence where each byte is written as a pair of hexadecimal digits. Successive bytes may be separated by spaces.
- The output of hex2raw is a raw byte sequence, where each byte has the hexadecimal value described by the corresponding pair of characters in the input.
Suppose we want the sequence of bytes whose values are the hexadecimal numbers 0x41, 0x42, 0x43, 0x1b. These same values, when interpreted with the ASCII encoding, mean the characters 'A', 'B', 'C', followed by the ASCII “escape” (ESC) character, which is not treated as a string character when typed on the keyboard or printed to the terminal output. Given the input 41 42 43 1b, the hex2raw utility will output the desired 4-byte sequence.
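To make the hex-pair-to-byte mapping concrete, here is a minimal sketch of such a decoder. It is not the source of the provided hex2raw, whose exact behavior may differ; the names hex_digit_value and hex_to_raw are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical sketch of hex-pair decoding (the real hex2raw may differ).
 * Decodes whitespace-separated hex pairs from text into out, which holds
 * up to cap bytes. Returns the number of bytes written, or -1 if the
 * input is malformed or does not fit. */
long hex_to_raw(const char *text, unsigned char *out, size_t cap) {
    size_t n = 0;
    while (*text != '\0') {
        if (isspace((unsigned char)*text)) {
            text++;
            continue;
        }
        int hi = hex_digit_value(text[0]);
        /* Only look at the second digit if the first one was valid. */
        int lo = (hi >= 0) ? hex_digit_value(text[1]) : -1;
        if (hi < 0 || lo < 0 || n >= cap) {
            return -1;
        }
        out[n++] = (unsigned char)(hi * 16 + lo);
        text += 2;
    }
    return (long)n;
}
```

Given the text 41 42 43 1b, this sketch produces the four bytes 0x41, 0x42, 0x43, 0x1b, matching the example above.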
To run hex2raw, type the series of hexadecimal byte value descriptions you want in a file (e.g., exploit1.hex for Exploit 1). Following our example, we could save the string 41 42 43 1b into the file exploit1.hex using Emacs. Then run:
$ ./hex2raw < exploit1.hex > exploit1.bytes
The input redirection symbol < instructs the command-line shell to use the contents of exploit1.hex as standard input to hex2raw, instead of looking for input from the keyboard. The output redirection symbol > instructs the command-line shell to store the standard (printed) output of hex2raw into a file called exploit1.bytes. Input and output redirection (< and >) are general features of the command-line shell that can be used independently and with any executable command.
Once the exploit string byte sequence is stored into the file exploit1.bytes, run umbrella with the contents of the file exploit1.bytes as input:
$ ./umbrella -u your_bitbucket_username < exploit1.bytes
Naturally, as with compiled source code, if you update your exploit string specification in exploit1.hex, you must run hex2raw again to translate the new version to a byte sequence in exploit1.bytes to use this new exploit with umbrella.
0A
Your exploit string must not contain byte value 0x0A (0A in hex2raw input) at any intermediate position, since this is the ASCII code for newline ('\n'). When Gets() encounters this byte, it will assume you intended to terminate the string input. hex2raw will warn you if it encounters this byte value.
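The same check is easy to run yourself on a raw byte sequence. The following is a small sketch of that idea; the function name first_intermediate_newline is hypothetical, and this is not how hex2raw itself is implemented.

```c
#include <stddef.h>

/* Hypothetical check: return the index of the first 0x0A byte that
 * appears before the final position of bytes[0..len-1], or -1 if there
 * is none. A newline in the last position is allowed, since it simply
 * ends the input line. */
long first_intermediate_newline(const unsigned char *bytes, size_t len) {
    for (size_t i = 0; i + 1 < len; i++) {
        if (bytes[i] == 0x0A) {
            return (long)i;
        }
    }
    return -1;
}
```

A result other than -1 means that Gets() would stop reading at that index, so everything after it in the exploit string would never reach the stack.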
Byte-Encoding Instructions
You may wish to come back and read this section later after looking at the exploits. When including instructions as part of an exploit payload, you must use the instruction encoding as machine code, the byte sequence used to encode an instruction like pushq %rax for the machine. This is not the byte sequence representing the string "pushq %rax".
Use gcc as an assembler and objdump as a disassembler to generate the byte codes for instruction sequences. Suppose we write a file example.s containing the following assembly code:
# Example of hand-generated assembly code
movq $0x1234abcd,%rax # Move 0x1234abcd to %rax
pushq $0x401080 # Push 0x401080 on to the stack
retq # Return
The code can contain a mixture of instructions and data. Anything to the right of a # character is a comment.
We can now assemble and disassemble this file, saving the disassembler’s description of the binary object code:
$ gcc -c example.s
$ objdump -d example.o > example.d
The generated file example.d contains the following lines:
0: 48 c7 c0 cd ab 34 12 mov $0x1234abcd,%rax
7: 68 80 10 40 00 pushq $0x401080
c: c3 retq
Each line shows a single instruction. The number on the left indicates the starting address (starting with 0), while the hex digits after the : character indicate the byte codes for the instruction, in memory order from left to right. Thus, we can see that the instruction pushq $0x401080 has a hex-formatted byte code of 68 80 10 40 00.
If we read the 4 bytes starting at address 8 (just after the one-byte pushq opcode 68 at address 7), we get: 80 10 40 00. This shows the bytes of the 4-byte value 0x00401080 in little-endian order, with the byte at the lowest address shown on the left and the byte at the highest address shown on the right.
Finally, we can read the byte sequence for our code: 48 c7 c0 cd ab 34 12 68 80 10 40 00 c3
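The little-endian layout described above can be reproduced with a few shifts. This is a minimal sketch with a hypothetical function name, encode_le; it only illustrates the byte order and is not one of the provided tools.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store the low `width` bytes of value into out in
 * little-endian order (least significant byte at the lowest address).
 * At most 8 bytes are written, since value is 64 bits wide. */
void encode_le(uint64_t value, unsigned char *out, size_t width) {
    for (size_t i = 0; i < width && i < 8; i++) {
        out[i] = (unsigned char)(value >> (8 * i));
    }
}
```

Encoding 0x401080 with a width of 4 gives the bytes 80 10 40 00, the same order that appears in the pushq instruction above. With a width of 8, the four additional high-order bytes are all 00.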
Running and Testing Exploits
Test all exploits (used for grading):
- Save your exploits in hex2raw input format in the proper files.
- Run make test to translate and test each exploit and generate a summary.
Run an individual exploit:
- Write the exploit string in hex2raw input format in, e.g., the file exploit1.hex.
- Translate it to raw bytes with hex2raw:
  $ ./hex2raw < exploit1.hex > exploit1.bytes
- Run it directly (possible for Exploits 1 and 2):
  $ ./umbrella -u your_bitbucket_username < exploit1.bytes
  or under gdb (required for Exploits 3 and 4):
  $ gdb ./umbrella
  [... gdb startup output ...]
  (gdb) run -u your_bitbucket_username < exploit1.bytes
GDB Scripts
When using gdb, you may find it useful to save a series of gdb commands to a text file (e.g., commands.txt) and then use the -x commands.txt flag, which runs each line of the file as a command in gdb. This saves the trouble of retyping the commands every time you run gdb. You can read more about the -x flag in gdb’s man page.
Exploits
Save your buffer overrun exploit strings in hex2raw input format in the files exploit1.hex, exploit2.hex, exploit3.hex, exploit4.hex.
Exploit 1: Candle
The function getbuf() is called within umbrella by a function test():
void test() {
    volatile unsigned long long val;
    volatile unsigned long long local = 0xdeadbeef;
    char* variable_length;
    entry_check(3); /* Make sure entered this function properly */
    val = getbuf();
    if (val <= 40) {
        variable_length = alloca(val);
    }
    entry_check(3);
    /* Check for corrupted stack */
    if (local != 0xdeadbeef) {
        printf("Sabotaged!: the stack has been corrupted\n");
    } else if (val == cookie) {
        printf("Boom!: getbuf returned 0x%llx\n", val);
        if (local != 0xdeadbeef) {
            printf("Sabotaged!: the stack has been corrupted\n");
        }
        if (val != cookie) {
            printf("Sabotaged!: control flow has been disrupted\n");
        }
        validate(3);
    } else {
        printf("Dud: getbuf returned 0x%llx\n", val);
    }
}
When getbuf() executes its return statement, the program ordinarily resumes execution within function test(). Within the file umbrella, there is a function smoke():
void smoke() {
    entry_check(0); /* Make sure entered this function properly */
    printf("Smoke!: You called smoke()\n");
    validate(0);
    exit(0);
}
Your task is to get umbrella to execute the code for smoke() when getbuf() executes its return statement, rather than returning to test(). You can do this by supplying an exploit string that overwrites the stored return pointer in the stack frame for getbuf() with the address of the first instruction in smoke(). Note that your exploit string may also corrupt other parts of the stack state, but this will not cause a problem, because smoke() causes the program to exit directly.
Advice
- All the information you need to devise this exploit string can be determined by examining a disassembled version of umbrella.
- Be careful about byte ordering (i.e., endianness).
- You might want to use gdb to step the program through the last few instructions of getbuf() to make sure it is doing the right thing.
- The placement of buf within the stack frame for getbuf() depends on which version of gcc was used to compile umbrella. You will need to pad the beginning of your exploit string with the proper number of bytes to overwrite the return pointer. The values of these bytes can be arbitrary.
- Don’t forget to use hex2raw to simplify your job.
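The general shape of such an exploit string (filler bytes, then an address in little-endian order) can be sketched as follows. The function name build_return_overwrite is hypothetical, and the padding length and target address are parameters because the real values depend on your copy of umbrella; you must still find them yourself from the disassembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: write pad_len filler bytes followed by an 8-byte
 * little-endian address into out, which holds up to cap bytes.
 * Returns the total number of bytes written, or 0 if out is too small. */
size_t build_return_overwrite(unsigned char *out, size_t cap,
                              size_t pad_len, uint64_t target) {
    if (pad_len > cap || cap - pad_len < 8) {
        return 0;
    }
    /* The filler value is arbitrary; 0x41 ('A') is used here because it
     * is not the newline byte 0x0A. */
    memset(out, 0x41, pad_len);
    for (size_t i = 0; i < 8; i++) {
        out[pad_len + i] = (unsigned char)(target >> (8 * i));
    }
    return pad_len + 8;
}
```

This sketch does not check whether the address itself contains a 0x0A byte, which would end the input early. In practice you would write the equivalent bytes by hand in hex2raw input format rather than generate them with a program.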
Exploit 2: Soda Fountain
Within umbrella there is also a function fizz():
void fizz(int arg1, char arg2, long arg3,
          char* arg4, short arg5, short arg6, unsigned long long val) {
    entry_check(1); /* Make sure entered this function properly */
    if (val == cookie) {
        printf("Fizz!: You called fizz(0x%llx)\n", val);
        validate(1);
    } else {
        printf("Misfire: You called fizz(0x%llx)\n", val);
    }
    exit(0);
}
Similar to Exploit 1, your task is to get umbrella to execute the code for fizz() rather than returning to test(). In this case, however, you must make it appear to fizz() as if you have passed your cookie as its argument. You can do this by encoding your cookie in the appropriate place within your exploit string.
Advice
- Recall that the first six arguments are passed in registers and additional arguments are passed on the stack. Your exploit code needs to write to the appropriate place within the stack. This explains our somewhat contrived fizz parameters.
- You can use gdb to get the information you need to construct your exploit string. Set a breakpoint within getbuf() and run to this breakpoint. Determine key features such as the address of val and the location of the buffer.
Exploit 3: Door Buster
A much more sophisticated form of buffer attack involves supplying a string that encodes actual machine instructions. The exploit string then overwrites the return pointer with the starting address of these instructions. When the calling function (in this case getbuf) executes its ret instruction, the program will start executing the instructions on the stack rather than returning. With this form of attack, you can get the program to do almost anything. The code you place on the stack is called the exploit code. This style of attack is tricky, though, because you must get machine code onto the stack and set the return pointer to the start of this code.
You will need to run umbrella under gdb for this exploit to succeed. (Modern systems use memory protection mechanisms to prevent execution of memory locations in the stack and guard against exactly this type of attack. Since gdb works a little differently than normal program execution, it allows the exploit to succeed.)
Within the file umbrella there is a function bang():
unsigned global_value = 0;
void bang(unsigned long long val) {
    entry_check(2); /* Make sure entered this function properly */
    if (global_value == cookie) {
        printf("Bang!: You set global_value to 0x%llx\n", global_value);
        validate(2);
    } else {
        printf("Misfire: global_value = 0x%llx\n", global_value);
    }
    exit(0);
}
Similar to Exploits 1 and 2, your task is to get umbrella to execute the code for bang() rather than returning to test(). Before this, however, you must set global variable global_value to your cookie. Your exploit code should set global_value, push the address of bang() on the stack, and then execute a ret instruction to cause a jump to the code for bang().
Advice:
- You will need to run umbrella under gdb for this exploit to succeed.
- Determining the byte encoding of instruction sequences by hand is tedious and prone to errors. You can let tools do all of the work by writing an assembly code file containing the instructions and data you want to put on the stack. Assemble this file with gcc and disassemble it with objdump. This will allow you to see the byte sequence to include in your exploit. (A brief example of how to do this is included in the Byte-Encoding Instructions section above.)
- Keep in mind that your exploit string depends on your machine, your compiler, and even your cookie.
- Watch your use of address modes when writing assembly code. Note that movq $0x4, %rax moves the value 0x0000000000000004 into register %rax, whereas movq 0x4, %rax moves the value in memory at address 0x0000000000000004 into %rax, which is not likely your intent. (Also, because that memory location is usually undefined, the second instruction will cause a segmentation fault!)
- Do not attempt to use either a jmp or a call instruction to jump to the code for bang(). These instructions use PC-relative addressing, which is tricky to set up correctly in this attack. Instead, push an address on the stack and use the ret instruction.
Exploit 4: Whizbang Basic Blaze Box
You will need to run umbrella under gdb for this exploit to succeed.
Our preceding attacks have all caused the program to jump to the code for some other function, which then causes the program to exit. As a result, it was acceptable to use exploit strings that corrupt the stack, overwriting the saved value of register %rbp and the return pointer.
The most sophisticated form of buffer overrun attack causes the program to execute some exploit code that patches up the stack and makes the program return to the original calling function (test() in this case). The calling function is oblivious to the attack. This style of attack is tricky, though, since you must: (1) get machine code onto the stack, (2) set the return pointer to the start of this code, and (3) undo the corruption made to the stack state.
Your job is to supply an exploit string that will cause getbuf() to return your cookie back to test(), rather than the small value (less than 0x28) it ordinarily returns. You can see in the code for test() that this will cause the program to go Boom!. Your exploit code should set your cookie as the return value, restore any corrupted state, push the correct return location on the stack, and execute a ret instruction to really return to test().
Advice:
- You will need to run umbrella under gdb for this exploit to succeed.
- In order to reach the return address slot on the stack, your exploit string must also cover all the items saved on the stack between the buf array and the return address slot. So far, the code we have attempted to run with the exploit has not depended on this data, but a “normal”-looking return to test() may depend on it. Consider carefully what is stored here on the stack during getbuf, its original source, where it is stored after getbuf completes, and how test may use it. Use gdb to inspect the disassembled code of getbuf and test. Determine how you can organize your exploit to avoid disturbing stack data in getbuf on which test later relies.
- Let tools such as gcc and objdump do all of the work of generating a byte encoding of the instructions.
- Keep in mind that your exploit string depends on your cookie, your machine, and your compiler.
You caused a program to execute arbitrary machine code of your own design simply by choosing a particular input. You have done so in a sufficiently stealthy way that the program did not realize that anything was amiss. Surely this is a significant security problem!
Mayhem (optional extra credit)
execve is a system call that replaces the currently running program with another program, which inherits all the open file descriptors. What are the limitations of the exploits you have performed so far? How could calling execve allow you to circumvent these limitations? If you have time, try writing an additional exploit (mayhem.hex) that uses execve and another program to print a message.
Talk to Ben if you’re curious about more.