Dark Buffer Arts
Smash the stack to understand calling conventions and security concerns.
- Assign: Monday, 3 April
- Checkpoint: aim to complete at least two stages by 11:59pm Thursday, 6 April
- Due: 11:59pm Monday, 10 April
- Collaboration: Individual code assignment policy, as defined by the syllabus and honor code.
- Starter Code:
hg clone ssh://hg@bitbucket.org/cs240codetub/cs240-buffer-owner
- Submit: Commit and push your final revision and double-check.
- Relevant Reference:
Contents
- Overview
- Setup
- Tasks
- Preparatory Exercises
- The umbrella Executable
- Tools for Crafting Exploits
- Running and Testing Exploits
- Exploits
- Grading
Overview
Silly version: Impressed by your recent quest for the Sourceror’s Code, an infamous after-market magical artifact enhancement shop has called you in as a consultant. Your task is to augment the powers of an unassuming pink umbrella on behalf of a client who wishes to use it for daily tasks for which it was never intended.
Serious version: This assignment helps you develop a detailed understanding of the call stack organization by deploying a series of buffer overrun attacks on a vulnerable executable file called umbrella.
Ethics: In this assignment, you will gain firsthand experience with exploits of a common type of security vulnerability in operating systems and network servers. Our purpose is to help you learn about the runtime operation of programs and to understand the nature and impact of this form of security weakness so that you can avoid it when you write system code. We do not condone the use of these or any other form of attack to gain unauthorized access to any system resources. There are criminal statutes governing such activities.
Setup
Do this assignment in one of the CS 240 computing environments. The executables for this assignment were compiled specifically for the CS Linux machines and the wx appliance.
Your Dark Buffer Arts Palette contains the following files:
- descriptions.txt: file for English descriptions of your exploits
- exploit1.hex, exploit2.hex, exploit3.hex, exploit4.hex: files for Exploits 1-4
- hex2raw: utility to convert human-readable exploit descriptions written in hexadecimal to raw bytes
- id2cookie: utility to convert user ID to unique “cookie” value
- Makefile: recipes to test your exploits
- umbrella: executable you will attack
- umbrella.c: important parts of C code used to compile umbrella
Create your cookie: Most attacks in this assignment will require you to make a unique 8-byte “cookie” value show up in places where it ordinarily would not. This value will also determine the exact behavior of your executable. To create your personalized cookie, run rm id.txt and make cookie, then enter your Bitbucket ID. This will print your cookie in hex and record your Bitbucket ID and cookie in the files id.txt and cookie.txt.
Spring 2017: automatic pre-generation of cookies failed. Please use these steps:
rm id.txt
make cookie
- When prompted, type in your Bitbucket ID and hit enter.
The hex value displayed is your cookie.
Tasks
You must craft exploit strings that accomplish four increasingly sophisticated buffer overrun attacks when provided as input to the vulnerable umbrella executable. Each exploit is described below.
Submit two parts for each exploit:
- Exploit string (input): Write your exploit string in hex2raw input format in each of the files exploit1.hex through exploit4.hex.
- Description questions: In the separate descriptions.txt file, answer the questions given with each exploit succinctly to describe how your exploit works.
  - Many of these questions request only a couple of words or an instruction listing from your disassembled code.
  - Prose answers should focus on the general meaning of code and data rather than specific numbers or addresses (e.g., “return address”, not “0x4067c5”).
You may find it helpful to use these questions to guide your exploit development.
The remainder of this document describes:
- Ungraded preparatory exercises to complete as a prerequisite for assistance from course staff.
- The executable you will attack
- Tools and techniques to use while constructing an exploit. (Skim, then return when working on Exploit 1.)
- How to run and test your exploits. (Skim, then return when working on Exploit 1.)
- The requirements and questions for each exploit.
- The grading policy.
Preparatory Exercises
Complete these exercises before asking questions on the exploits. You may ask questions about these exercises before completing them.
- Make sure you completed the setup above, including making your “cookie.”
- As you read about the umbrella executable, disassemble it to find getbuf.
- Draw the call stack frame for a call to getbuf right before it calls Gets, using the conventions from class and lab. Label the positions and sizes of as many parts of the frame as you can recover.
- On your call stack drawing, simulate the call to Gets from getbuf with a sample input string of your choosing by following the C code in umbrella.c, showing any updates to the call stack bounds or content. Do not bother simulating Gets at the x86 level; take its functionality at face value as documented (or use the C code).
- Remember these later to save time:
  - Which exploits will run alone without GDB? Which exploits work only under GDB?
  - What is the purpose of hex2raw?
  - What are the steps for running your exploit? For testing all exploits?
The umbrella Executable
The umbrella executable requires a user ID argument on the command line and reads a string from standard input once it starts up. The user ID customizes stack layout and verifies a unique “cookie” value that your attacks must provide.
Usage
To run the umbrella executable:
$ ./umbrella -u your_bitbucket_username
Type string:
Alternatively, since your username was saved in a file when you made your cookie earlier, you can also use a subshell to pass the contents of this file as an argument:
$ ./umbrella -u $(cat id.txt)
Type string:
Input Vulnerability
The umbrella executable reads a string from standard input with the function getbuf():
unsigned long long getbuf() {
char buf[36];
// ...
unsigned long long val = (unsigned long long)Gets(buf);
// ...
return val % 40;
}
The full version of this function contains more code for an optional additional challenge. The part shown here is sufficient for the required parts of this assignment. The key feature to note is that getbuf() calls the function Gets(), passing the address of its local array buf, which is allocated on the stack with space for 36 chars.
The function char* Gets(char* buf) is similar to the standard C library function char* gets(char* buf). It reads a string from standard input, terminated by a newline character ('\n'), and stores the characters of the string, followed by a null terminator ('\0'), starting at the memory address given by its argument, buf. It returns its argument.
Neither Gets() nor gets() has any way to determine whether there is enough space at the destination to store the entire string. Instead, they simply copy the entire string, assuming the destination is large enough and thus possibly over-running the bounds of the storage allocated at the destination.
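To make the missing bounds check concrete, here is a minimal C sketch of the copying loop that Gets() performs. It rests on two simplifying assumptions: it reads from a string instead of standard input, and the names gets_like and bounded_gets_like are hypothetical (they are not part of umbrella or umbrella.c). The bounded variant shows the size check that Gets() lacks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Gets(): copies characters from src (instead of
 * standard input) into buf until a newline or the end of src, then appends
 * '\0'. Like Gets(), it never checks how large buf is. */
char *gets_like(const char *src, char *buf) {
    size_t i = 0;
    while (src[i] != '\0' && src[i] != '\n') {
        buf[i] = src[i];   /* no bounds check: may write past the end of buf */
        i++;
    }
    buf[i] = '\0';
    return buf;            /* returns its argument, like Gets() */
}

/* Bounded variant for comparison: writes at most size bytes, including the
 * terminator, and silently drops characters that do not fit. */
char *bounded_gets_like(const char *src, char *buf, size_t size) {
    assert(size > 0);
    size_t i = 0;
    while (i + 1 < size && src[i] != '\0' && src[i] != '\n') {
        buf[i] = src[i];
        i++;
    }
    buf[i] = '\0';
    return buf;
}
```

For example, bounded_gets_like("Acromantula!", buf, 5) stores only "Acro", whereas gets_like would copy all 12 characters plus the terminator regardless of how small buf is.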
If the input string read by getbuf() is less than 36 characters long, it is clear that getbuf() will return some value less than 0x28 (that’s 40 in decimal), as shown by the following execution example:
$ ./umbrella -u your_bitbucket_username
Type string: Acromantula!
Dud: getbuf returned 0x20
The value returned might differ for you, since it is derived from the address of buf on the stack, which may vary between systems.
Running umbrella under gdb will also yield different values than it does outside gdb.
Typically, an error occurs if we type a longer string:
$ ./umbrella -u your_bitbucket_username
Type string: This string is too long and it starts overwriting things.
Ouch!: You caused a segmentation fault!
As the error message indicates, over-running the buffer typically causes the program state (e.g., the return addresses and other data structures that were stored on the stack) to be corrupted, leading to a memory access error. Your task is to be more clever with the strings you feed umbrella so that it does more interesting things. These are called exploit strings.
Tools for Crafting Exploits
Constructing exploits involves tricky tasks like writing untypeable characters and determining the byte encoding of x86 instructions. Use the techniques below to simplify your job.
Formatting Exploit Strings with hex2raw
Each ASCII character in a string is represented by one byte. For example, 'A' is represented by the byte value 0x41 (hexadecimal). While your exploits will be delivered under the guise of strings, they will embed sequences of bytes encoding addresses, numbers, or other non-character data. It is hard enough to map each desired byte value in your exploit back to a character by hand, but often the specific bytes required do not even correspond to any typeable or printable ASCII characters, making it “difficult” to type your exploit string on a keyboard or view it on the screen. Do not try to encode your exploit by hand!
We have provided a tool called hex2raw to encode exploit strings:
- The input to hex2raw is a human-readable text description of a byte sequence where each byte is written as a pair of hexadecimal digits. Successive bytes may be separated by spaces.
- The output of hex2raw is a raw byte sequence, where each byte has the hexadecimal value described by the corresponding pair of characters in the input.
Suppose we want the sequence of bytes whose values are the hexadecimal numbers 0x41, 0x42, 0x43, 0x1b. These same values, when interpreted with the ASCII encoding, mean the characters 'A', 'B', 'C', followed by the ASCII “escape” (ESC) character, which is not treated as a string character when typed on the keyboard or printed to the terminal output. Given the input 41 42 43 1b, the hex2raw utility will output the desired 4-byte sequence.
To run hex2raw, type the series of hexadecimal byte value descriptions you want in a file (e.g., exploit1.hex for Exploit 1). Following our example, we could save the string 41 42 43 1b into the file exploit1.hex using Emacs. Then run:
$ ./hex2raw < exploit1.hex > exploit1.bytes
The input redirection symbol < instructs the command-line shell to use the contents of exploit1.hex as standard input to hex2raw, instead of looking for input from the keyboard. The output redirection symbol > instructs the command-line shell to store the standard (printed) output of hex2raw into a file called exploit1.bytes. Input and output redirection (< and >) are general features of the command-line shell that can be used independently and with any executable command.
Once the exploit string byte sequence is stored into the file exploit1.bytes, run umbrella with the contents of the file exploit1.bytes as input:
$ ./umbrella -u your_bitbucket_username < exploit1.bytes
Naturally, as with compiled source code, if you update your exploit string specification in exploit1.hex, you must run hex2raw again to translate the new version to a byte sequence in exploit1.bytes before using the new exploit with umbrella.
0A
Your exploit string must not contain byte value 0x0A (0A in hex2raw input) at any intermediate position, since this is the ASCII code for newline ('\n'). When Gets() encounters this byte, it will assume you intended to terminate the string input. hex2raw will warn you if it encounters this byte value.
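The following C sketch approximates what a hex2raw-style decoder does: it turns whitespace-separated hex pairs into raw bytes and warns about a 0x0A byte anywhere except the last position. It is an illustrative reimplementation under that assumption, not the source of the provided hex2raw, whose exact parsing rules and warning text may differ; hex_to_raw is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Value of one hexadecimal digit, or -1 if c is not one. */
static int hex_digit(int c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decodes whitespace-separated pairs of hex digits from text into out
 * (capacity cap). Returns the number of bytes written, or -1 if a pair is
 * malformed or out is too small. Warns on stderr about any 0x0a (newline)
 * byte before the last position, since Gets() would stop reading there. */
long hex_to_raw(const char *text, unsigned char *out, size_t cap) {
    assert(text != NULL && out != NULL);
    size_t n = 0;
    const char *p = text;
    while (*p != '\0') {
        if (isspace((unsigned char)*p)) { p++; continue; }
        int hi = hex_digit((unsigned char)p[0]);
        int lo = hex_digit((unsigned char)p[1]);  /* p[1] is at worst the '\0' */
        if (hi < 0 || lo < 0 || n == cap) return -1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        p += 2;
    }
    for (size_t i = 0; i + 1 < n; i++)
        if (out[i] == 0x0a)
            fprintf(stderr, "warning: byte %zu is 0x0a (newline)\n", i);
    return (long)n;
}
```

Given the text 41 42 43 1b, hex_to_raw writes the four bytes 0x41 0x42 0x43 0x1b, matching the hex2raw example above.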
Byte-Encoding Instructions
You may wish to come back and read this section later after looking at the exploits. When including instructions as part of an exploit payload, you must use the instruction encoding as machine code: the byte sequence used to encode an instruction like pushq %rax for the machine. This is not the byte sequence representing the string "pushq %rax".
Use gcc as an assembler and objdump as a disassembler to generate the byte codes for instruction sequences. Suppose we write a file example.s containing the following assembly code:
# Example of hand-generated assembly code
movq $0x1234abcd,%rax # Move 0x1234abcd to %rax
pushq $0x401080 # Push 0x401080 onto the stack
retq # Return
The code can contain a mixture of instructions and data. Anything to the right of a # character is a comment.
We can now assemble and disassemble this file, saving the disassembler’s description of the binary object code:
$ gcc -c example.s
$ objdump -d example.o > example.d
The generated file example.d contains the following lines:
0: 48 c7 c0 cd ab 34 12 mov $0x1234abcd,%rax
7: 68 80 10 40 00 pushq $0x401080
c: c3 retq
Each line shows a single instruction. The number on the left indicates the starting address (starting with 0), while the hex digits after the : character indicate the byte codes for the instruction, in memory order from left to right. Thus, we can see that the instruction pushq $0x401080 has a hex-formatted byte code of 68 80 10 40 00.
If we read the 4 bytes starting at address 8 we get: 80 10 40 00. This shows the bytes of the 4-byte value 0x00401080 in little-endian order, with the byte at the lowest address shown on the left and the byte at the highest address shown on the right.
Finally, we can read the byte sequence for our code: 48 c7 c0 cd ab 34 12 68 80 10 40 00 c3
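Because every address on x86-64 occupies 8 bytes, an address written into an exploit string must supply all 8 bytes in little-endian order, including the high-order zero bytes. This minimal C sketch (le64_to_hex is a hypothetical helper, not a provided tool) formats a 64-bit value as hex2raw input. For 0x401080 it produces the same 80 10 40 00 seen above, followed by four 00 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Writes the 8 bytes of value in little-endian order (least significant
 * byte first) as hex2raw-style text, e.g. "80 10 40 00 00 00 00 00".
 * out must hold at least 24 chars: 8 pairs, 7 spaces, and '\0'. */
void le64_to_hex(uint64_t value, char *out, size_t cap) {
    assert(cap >= 24);
    for (int i = 0; i < 8; i++) {
        unsigned byte = (unsigned)((value >> (8 * i)) & 0xff);
        /* Each byte takes 3 chars: two hex digits plus a separator. */
        snprintf(out + 3 * i, cap - 3 * i, i < 7 ? "%02x " : "%02x", byte);
    }
}
```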
Running and Testing Exploits
Test all exploits (used for grading):
- Save your exploits in hex2raw input format in the proper files.
- Run make test to translate and test each exploit and generate a summary.
Run an individual exploit:
- Write the exploit string in hex2raw input format in, e.g., the file exploit1.hex.
- Translate it to raw bytes with hex2raw:
$ ./hex2raw < exploit1.hex > exploit1.bytes
- Run it directly (possible for Exploits 1 and 2):
$ ./umbrella -u your_bitbucket_username < exploit1.bytes
or under gdb (required for Exploits 3 and 4):
$ gdb ./umbrella
[... gdb startup output ...]
(gdb) run -u your_bitbucket_username < exploit1.bytes
GDB Scripts
When using gdb, you may find it useful to save a series of gdb commands to a text file (e.g., commands.txt) and then use the -x commands.txt flag, which runs each line of the file as a command in gdb. This saves the trouble of retyping the commands every time you run gdb. You can read more about the -x flag in gdb’s man page.
Exploits
Save your buffer overrun exploit strings in hex2raw input format in the files exploit1.hex, exploit2.hex, exploit3.hex, exploit4.hex.
Exploit 1: Candle
The function getbuf() is called within umbrella by a function test():
void test() {
volatile unsigned long long val;
volatile unsigned long long local = 0xdeadbeef;
char* variable_length;
entry_check(3); /* Make sure entered this function properly */
val = getbuf();
if (val <= 40) {
variable_length = alloca(val);
}
entry_check(3);
/* Check for corrupted stack */
if (local != 0xdeadbeef) {
printf("Sabotaged!: the stack has been corrupted\n");
} else if (val == cookie) {
printf("Boom!: getbuf returned 0x%llx\n", val);
if (local != 0xdeadbeef) {
printf("Sabotaged!: the stack has been corrupted\n");
}
if (val != cookie) {
printf("Sabotaged!: control flow has been disrupted\n");
}
validate(3);
} else {
printf("Dud: getbuf returned 0x%llx\n", val);
}
}
When getbuf() executes its return statement, the program ordinarily resumes execution within function test(). Within the file umbrella, there is a function smoke():
void smoke() {
entry_check(0); /* Make sure entered this function properly */
printf("Smoke!: You called smoke()\n");
validate(0);
exit(0);
}
Your task is to get umbrella to execute the code for smoke() when getbuf() executes its return statement, rather than returning to test(). You can do this by supplying an exploit string that overwrites the stored return pointer in the stack frame for getbuf() with the address of the first instruction in smoke. Note that your exploit string may also corrupt other parts of the stack state, but this will not cause a problem, because smoke() causes the program to exit directly.
Advice
- All the information you need to devise this exploit string can be determined by examining a disassembled version of umbrella.
- Be careful about byte ordering (i.e., endianness).
- You might want to use gdb to step the program through the last few instructions of getbuf() to make sure it is doing the right thing.
- The placement of buf within the stack frame for getbuf() depends on which version of gcc was used to compile umbrella. You must pad the beginning of your exploit string with the proper number of bytes to overwrite the return pointer. The values of these bytes can be arbitrary (other than 0x0A).
- Don’t forget to use hex2raw to simplify your job.
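The advice above amounts to a simple layout: some number of filler bytes, then the 8-byte address of smoke in little-endian order. The C sketch below builds that layout as hex2raw input. The padding length and target address are parameters you must determine from your own disassembly; the values used in the example (4 bytes of padding, and 0x401080 from the earlier objdump example) are placeholders, not the answer for umbrella, and build_overwrite_hex is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Builds hex2raw text for a "padding, then return address" exploit:
 * pad_len filler bytes followed by the 8 bytes of target, least
 * significant byte first. Returns the number of chars written
 * (excluding '\0'), or -1 if out is too small. */
int build_overwrite_hex(size_t pad_len, unsigned filler, uint64_t target,
                        char *out, size_t cap) {
    assert((filler & 0xff) != 0x0a);   /* a newline byte would end Gets() early */
    size_t total = pad_len + 8, pos = 0;
    if (cap == 0) return -1;
    out[0] = '\0';
    for (size_t i = 0; i < total; i++) {
        unsigned byte = (i < pad_len)
            ? (filler & 0xff)                                      /* padding */
            : (unsigned)((target >> (8 * (i - pad_len))) & 0xff);  /* address, low byte first */
        int w = snprintf(out + pos, cap - pos, i + 1 < total ? "%02x " : "%02x", byte);
        if (w < 0 || (size_t)w >= cap - pos) return -1;            /* out too small */
        pos += (size_t)w;
    }
    return (int)pos;
}
```

For instance, build_overwrite_hex(4, 0x41, 0x401080, s, sizeof s) yields 41 41 41 41 80 10 40 00 00 00 00 00, which you could save in a .hex file and pass through hex2raw as usual.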
Description Questions for Exploit 1
Answer these questions succinctly in descriptions.txt.
- What is the instruction inst (give its address and assembly) in the umbrella executable that executes differently under your exploit than it does under normal (overflow-free) execution of umbrella and causes the computer to execute a different next instruction than usual?
- What part of your exploit string (described as a byte offset from the start of the string) causes instruction inst to behave differently than normal? Why? (Write a sentence or two.)
- What instruction (address and assembly representation) executes next after instruction inst?
Exploit 2: Soda Fountain
Within umbrella there is also a function fizz():
void fizz(int arg1, char arg2, long arg3,
char* arg4, short arg5, short arg6, unsigned long long val) {
entry_check(1); /* Make sure entered this function properly */
if (val == cookie) {
printf("Fizz!: You called fizz(0x%llx)\n", val);
validate(1);
} else {
printf("Misfire: You called fizz(0x%llx)\n", val);
}
exit(0);
}
Similar to Exploit 1, your task is to get umbrella to execute the code for fizz() rather than returning to test. In this case, however, you must make it appear to fizz as if you have passed your cookie as its val argument. You can do this by encoding your cookie in the appropriate place within your exploit string.
Advice
- Recall that the first six arguments are passed in registers and additional arguments are passed on the stack. Your exploit string needs to write to the appropriate place within the stack. This explains our somewhat contrived fizz parameters.
- You can use gdb to get the information you need to construct your exploit string. Set a breakpoint within getbuf() and run to this breakpoint. Determine key features such as the address of val and the location of the buffer.
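Since the exploit string is copied toward higher addresses, each 8-byte word that follows the overwritten return address lands in the next stack slot up, toward the caller’s frame. The sketch below generalizes the padding-plus-address layout to a run of padding followed by any sequence of 8-byte little-endian words, which is one way to place values such as your cookie above the return address. Which word belongs in which slot is exactly what you must work out with gdb; build_words_hex is a hypothetical name, and 0x1122334455667788 in the example is a stand-in, not a real cookie.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Appends one byte as two hex digits (plus a space unless it is the last). */
static int emit_byte(char *out, size_t cap, size_t *pos, unsigned byte, int last) {
    int w = snprintf(out + *pos, cap - *pos, last ? "%02x" : "%02x ", byte & 0xff);
    if (w < 0 || (size_t)w >= cap - *pos) return -1;   /* out too small */
    *pos += (size_t)w;
    return 0;
}

/* hex2raw text for: pad_len filler bytes, then words[0], words[1], ...,
 * each as 8 little-endian bytes. Later words end up at higher stack
 * addresses. Returns chars written, or -1 if out is too small. */
int build_words_hex(size_t pad_len, unsigned filler,
                    const uint64_t *words, size_t nwords,
                    char *out, size_t cap) {
    assert((filler & 0xff) != 0x0a);
    size_t total = pad_len + 8 * nwords, pos = 0;
    if (cap == 0) return -1;
    out[0] = '\0';
    for (size_t i = 0; i < total; i++) {
        unsigned byte;
        if (i < pad_len) {
            byte = filler;
        } else {
            size_t k = i - pad_len;   /* offset into the word sequence */
            byte = (unsigned)(words[k / 8] >> (8 * (k % 8)));
        }
        if (emit_byte(out, cap, &pos, byte, i + 1 == total) != 0) return -1;
    }
    return (int)pos;
}
```

With words {0x401080, 0x1122334455667788} and 2 filler bytes of 0x41, the output is 41 41 80 10 40 00 00 00 00 00 88 77 66 55 44 33 22 11.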
Description Questions for Exploit 2
Answer these questions succinctly in descriptions.txt.
- Describe how the beginning of the exploited execution of fizz differs from the beginning of the execution of fizz by a normal procedure call:
  - What instruction (assembly, no address) executes right before fizz?
  - How does that instruction modify the call stack? (Write a phrase or sentence. Give a general description, not a number.)
- What instruction (address and assembly) in fizz finds the value of the val argument? Where does it find val relative to the top of the call stack? (Give a byte offset.)
- Describe how your exploit causes this instruction to find your cookie. (Write a sentence or two.)
Exploit 3: Door Knocker
A much more sophisticated form of buffer attack involves supplying a string that encodes actual machine instructions. The exploit string then overwrites the return pointer with the starting address of these instructions. When the called function (in this case getbuf) executes its ret instruction, the program will start executing the instructions on the stack rather than returning. With this form of attack, you can get the program to do almost anything. The code you place on the stack is called the exploit code. This style of attack is tricky, though, because you must get machine code onto the stack and set the return pointer to the start of this code.
You must run umbrella under gdb for this exploit to succeed. (Modern systems use memory protection mechanisms to prevent execution of memory locations in the stack and guard against exactly this type of attack. Since gdb works a little differently than normal program execution, it allows the exploit to succeed.)
Within the file umbrella there is a function bang():
unsigned global_value = 0;
void bang(unsigned long long val) {
entry_check(2); /* Make sure entered this function properly */
if (global_value == cookie) {
printf("Bang!: You set global_value to 0x%llx\n", global_value);
validate(2);
} else {
printf("Misfire: global_value = 0x%llx\n", global_value);
}
exit(0);
}
Similar to Exploits 1 and 2, your task is to get umbrella to execute the code for bang() rather than returning to test(). Before this, however, you must set global variable global_value to your cookie. Your exploit code should set global_value, push the address of bang() on the stack, and then execute a ret instruction to cause a jump to the code for bang().
Advice:
- You must run umbrella under gdb for this exploit to succeed.
- Determining the byte encoding of instruction sequences by hand is tedious and prone to errors. You can let tools do all of the work by writing an assembly code file containing the instructions and data you want to put on the stack. Assemble this file with gcc and disassemble it with objdump. This will allow you to see the byte sequence to include in your exploit. (A brief example of how to do this is included in the Byte-Encoding Instructions section above.)
- Keep in mind that your exploit string depends on your machine, your compiler, and even your cookie.
- Watch your use of addressing modes when writing assembly code. Note that movq $0x4, %rax moves the value 0x0000000000000004 into register %rax, whereas movq 0x4, %rax moves the value in memory at address 0x0000000000000004 into %rax, which is not likely your intent. (Also, because that memory location is usually not mapped, the second instruction will cause a segmentation fault!)
- Do not attempt to use either a jmp or a call instruction to jump to the code for bang(). These instructions use PC-relative addressing, which is tricky to set up correctly in this attack. Instead, push an address on the stack and use the ret instruction.
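For this exploit the layout changes: the injected machine code comes first, followed by filler up to the return-address slot, and then the address where that code sits on the stack (read it from gdb, since stack addresses differ outside gdb). A minimal C sketch of that layout follows, assuming the code is placed at the very start of buf. ret_offset (the distance from buf to the saved return address) and code_addr are inputs you must find yourself. The code bytes in the test are the example.s bytes from the byte-encoding section, not a working Exploit 3 payload; 0x7fffffffe000 is a made-up stack address; and build_injection_hex is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* hex2raw text for: [code bytes][filler up to ret_offset][code_addr, 8 bytes
 * little-endian]. Assumes the code starts at offset 0 of buf, so code_addr
 * should be the address of buf itself. Returns chars written, or -1 if the
 * code does not fit before the return-address slot or out is too small. */
int build_injection_hex(const unsigned char *code, size_t code_len,
                        size_t ret_offset, unsigned filler,
                        uint64_t code_addr, char *out, size_t cap) {
    assert((filler & 0xff) != 0x0a);
    if (code_len > ret_offset || cap == 0) return -1;
    size_t total = ret_offset + 8, pos = 0;
    out[0] = '\0';
    for (size_t i = 0; i < total; i++) {
        unsigned byte;
        if (i < code_len)
            byte = code[i];                  /* injected machine code */
        else if (i < ret_offset)
            byte = filler & 0xff;            /* padding up to the return slot */
        else                                 /* new return address, low byte first */
            byte = (unsigned)(code_addr >> (8 * (i - ret_offset))) & 0xff;
        int w = snprintf(out + pos, cap - pos, i + 1 < total ? "%02x " : "%02x", byte);
        if (w < 0 || (size_t)w >= cap - pos) return -1;
        pos += (size_t)w;
    }
    return (int)pos;
}
```

One design note: because the returned-to address is simply the start of buf, any bytes between the end of the code and the return slot are never executed, so their values only need to avoid 0x0A.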
Description Questions for Exploit 3
Answer these questions succinctly in descriptions.txt.
- Starting from the ret instruction in getbuf, list the series of instructions (address and assembly) that executes until the first instruction in bang.
- Describe how this code sequence changes memory contents, register contents, and the program counter. (Write a few sentences or annotate your listing above.)
Exploit 4: Whizbang Basic Blaze Box
You must run umbrella under gdb for this exploit to succeed.
Our preceding attacks have all caused the program to jump to the code for some other function, which then causes the program to exit. As a result, it was acceptable to use exploit strings that corrupt the stack, overwriting the saved value of register %rbp and the return pointer.
The most sophisticated form of buffer overrun attack causes the program to execute some exploit code that patches up the stack and makes the program return to the original calling function (test() in this case). The calling function is oblivious to the attack. This style of attack is tricky, though, since you must: (1) get machine code onto the stack, (2) set the return pointer to the start of this code, and (3) undo the corruption made to the stack state.
Your job is to supply an exploit string that will cause getbuf() to return your cookie back to test(), rather than its usual value (less than 40). You can see in the code for test() that this will cause the program to go “Boom!”. Your exploit code should set your cookie as the return value, restore any corrupted state, push the correct return location on the stack, and execute a ret instruction to really return to test().
Advice:
- You must run umbrella under gdb for this exploit to succeed.
- In order to overwrite the return address slot on the stack, your exploit string must also cover all the items saved on the stack between the buf array and the return address slot. So far, the code we have attempted to run with the exploit has not depended on this data, but a “normal”-looking return to test() may depend on it. Consider carefully what is stored here on the stack during getbuf, its original source, where it is stored after getbuf completes, and how test may use it. Use gdb to inspect the disassembled code of getbuf and test. Determine how you can organize your exploit to avoid disturbing stack data in getbuf on which test later relies.
- Let tools such as gcc and objdump do all of the work of generating a byte encoding of the instructions.
- Keep in mind that your exploit string depends on your cookie, your machine, and your compiler.
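One way to avoid disturbing data that test relies on is to have your exploit string write back exactly the bytes that were there originally. The C sketch below uses two hypothetical helpers. put_le64 stores a saved 8-byte value at a given offset in a raw exploit buffer. region_preserved checks that a region matches a snapshot of the original stack bytes, such as a dump taken with gdb’s x command before the overflow. The offset 8 and value 0x7fffffffe0a0 in the test are invented; the real slot and value come from your own gdb session.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stores value at exploit[off] .. exploit[off + 7], least significant byte first. */
void put_le64(unsigned char *exploit, size_t off, uint64_t value) {
    assert(exploit != NULL);
    for (int i = 0; i < 8; i++)
        exploit[off + i] = (unsigned char)(value >> (8 * i));
}

/* Returns 1 if bytes [off, off + len) of exploit equal the same bytes of
 * orig (a snapshot of the stack before the overflow), else 0. */
int region_preserved(const unsigned char *orig, const unsigned char *exploit,
                     size_t off, size_t len) {
    return memcmp(orig + off, exploit + off, len) == 0;
}
```

A check like this only confirms that your string reproduces the snapshot. Whether a given slot actually needs preserving is the question the advice above asks you to answer from the disassembly.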
Description Questions for Exploit 4
Answer these questions succinctly in descriptions.txt.
- What instruction (address and assembly) in test inspects the return value of getbuf?
- Under your exploit, what instruction (address and assembly) stores this return value to “trick” test? Does this instruction execute before or after the ret in getbuf?
- What other instruction inst (address and assembly) in test depends on data from your exploit string?
  - Under normal (overflow-free) execution, what instruction (address and assembly) stores the data that this instruction expects to find there?
  - What is the meaning of that location on the stack? (Write a sentence or two to describe in terms of general procedure conventions.)
  - Why might test crash if your exploit does not heed the importance of this stack location? (Write a sentence or two.)
  - How does your exploit avoid this crash? (Write a sentence or two.)
You caused a program to execute arbitrary machine code of your own design simply by choosing a particular input. You have done so in a sufficiently stealthy way that the program did not realize that anything was amiss. Surely this is a significant security problem!
Mayhem (optional extra credit)
execve is a system call that replaces the currently running program with another program, which inherits all the open file descriptors. What are the limitations of the exploits you have performed so far? How could calling execve allow you to circumvent these limitations? If you have time, try writing an additional exploit (mayhem.hex) that uses execve and another program to print a message.
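For reference, here is how an ordinary C program uses execve. A successful call never returns, because the program image is replaced, while open file descriptors (other than those marked close-on-exec) carry over. This sketch wraps execve in fork and waitpid only so that the calling program survives to report the child’s exit status. An exploit would instead make the system call from injected code. run_via_execve is a hypothetical name, and the sketch assumes a POSIX system with /bin/sh.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* current environment, passed through to the new program */

/* Runs path with argv in a child process via execve and returns the child's
 * exit status, or -1 on failure. In the child, a successful execve never
 * returns: its program image is replaced, but stdin/stdout/stderr stay open. */
int run_via_execve(const char *path, char *const argv[]) {
    assert(path != NULL && argv != NULL);
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execve(path, argv, environ);
        perror("execve");   /* reached only if execve failed */
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, running /bin/sh with the arguments -c "exit 3" returns 3. If /bin/sh is replaced by a program that prints a message, the message appears on the terminal because the child inherited the caller’s standard output.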
Talk to the instructors if you’re curious about more.
Grading
The assignment is graded out of a maximum of 100 points:
- Working exploits (80 points): run make test to check all of your exploits.
  - Exploit 1: 25 points
  - Exploit 2: 25 points
  - Exploit 3: 15 points
  - Exploit 4: 15 points
- Descriptions (20 points):
- We may grade a subset of your answers.