Buffer
Assignment: Buffer
- Assign: Tuesday 11 April
- Due: Tuesday 18 April
- Policy: Individual graded synthesis assignment
- Code: cs240 start buffer --solo (if this doesn't work, use cs240.s23 start buffer --solo)
- Submit: git commit and git push your completed code.
- Reference:
Contents
- Assignment: Buffer
- Overview
- Setup
- Tasks
- Preparatory Exercises
- The laptop.bin Executable
- Tools for Crafting Exploits
- Running and Testing Exploits
- Exploits
- Submission
- Grading
- Extra Fun Mayhem
Overview [1]
Boring version: This assignment helps you develop a detailed
understanding of the call stack organization by deploying a series
of buffer overrun attacks on a vulnerable executable file called
laptop.bin.
Silly version: Impressed by your recent reverse engineering
adventure, an anonymous Wellesley alum contacts you for assistance
subverting the laptop of an evil mastermind bent on, you know,
something evil. Your task is to exploit vulnerabilities in the
laptop’s software with C’s catch-fire semantics by providing
carefully crafted inputs that will cause buffer overflows and lead to
the self-destruction of the laptop (and its evil whatchyamacallits)
in increasingly alarming ways. (Do not worry, neither your computer
nor ours will explode as a result of this assignment.)
Ethics: In this assignment, you will gain firsthand experience with exploits of a common type of security vulnerability in operating systems and network servers. Our purpose is to help you learn about the runtime operation of programs and to understand the nature and impact of this form of security weakness so that you can avoid it when you write system code. We do not condone the use of these or any other form of attack to gain unauthorized access to any system resources. There are criminal statutes governing such activities.
Goals
- To understand the procedure call abstraction and the details of its implementation with the stack discipline.
- To understand the far-reaching impacts of system design choices, especially through security implications of the call stack in a language that does not enforce memory safety.
- To understand the principles of buffer overrun vulnerabilities through practice exploits in a controlled environment.
- To scare yourself a bit when realizing that the same kind of vulnerability you exploited probably exists somewhere in the software powering your healthcare, transportation, utilities, and more.
Time Reports
According to self-reported times on this assignment from Fall 2018:
- 25% of students spent <= 7 hours.
- 50% of students spent <= 10 hours.
- 75% of students spent <= 10 hours.
Setup
Get your repository with cs240 start buffer --solo.
Your starter repository contains the following files:
- responses.txt: file for English descriptions of your exploits
- exploit1.hex, exploit2.hex, exploit3.hex, exploit4.hex: files for Exploits 1-4
- hex2raw.bin: utility to convert human-readable exploit descriptions written in hexadecimal to raw bytes
- id2cookie.bin: utility to convert user ID to unique “cookie” value
- Makefile, test.gdb: recipes to test your exploits
- laptop.bin: executable you will attack
- laptop.c: important parts of C code used to compile laptop.bin
Create your cookie: Most attacks in this assignment will require
you to make a unique 8-byte “cookie”
value show up in places where it ordinarily would not. This value
will also determine the exact behavior of your executable. To
create your personalized cookie, run make cookie. This will print
your cookie in hex and store your CS username and cookie in the files
id.txt and cookie.txt, respectively.
Tasks
You must craft exploit strings that accomplish four increasingly
sophisticated buffer overrun attacks when provided as input to the
vulnerable laptop.bin executable.
Each exploit is described below.
Submit two parts for each exploit:
- Exploit string (input): Write your exploit string in hex2raw input format in each of the files exploit1.hex through exploit4.hex.
- Description questions: In the separate responses.txt file, answer the questions given with each exploit succinctly to describe how your exploit works.
  - Many of these questions request only a couple of words or an instruction listing from your disassembled code.
  - Prose answers should focus on the general meaning of code and data rather than specific numbers or addresses (e.g., “return address”, not “0x4067c5”).
You may find it helpful to use these questions to help guide your exploit development.
Grading considers both the effectiveness of your exploits and your descriptions of how they work.
The remainder of this document describes:
- Preparatory exercises.
- The executable you will attack.
- Tools and techniques to use while constructing an exploit. (Skim, then return when working on Exploit 1.)
- How to run and test your exploits. (Skim, then return when working on Exploit 1.)
- The requirements and questions for each exploit.
- The grading criteria.
Preparatory Exercises
As you read this document, complete these exercises to familiarize yourself with stack frame layout, details of vulnerable functions, and tools for constructing exploits.
Preparation is your ticket for assistance.
- You must complete the preparatory exercises (and show evidence) before asking questions about code or debugging on the main assignment.
- You may ask questions on preparatory exercises at any time.
- Make sure you completed the setup above, including baking your “cookie.”
- As you read about the laptop.bin executable, disassemble it to find getbuf.
- Draw the call stack frame for a call to getbuf right before it calls Gets, using the conventions from class and lab. Label the positions and sizes of as many parts of the frame as you can recover.
- On your call stack drawing, simulate a call to getbuf with a sample input string of your choosing by following the C code in laptop.c and showing any updates to the call stack bounds or content. Do not bother simulating Gets at the x86 level; take its functionality at face value as documented (or use the C code).
- Remember these later to save time:
  - Which exploits will run alone without GDB? Which exploits work only under GDB?
  - What is the purpose of hex2raw.bin?
  - What are the steps for running your exploit? For testing all exploits?
The laptop.bin Executable
The laptop.bin executable requires a user ID argument on the command
line and reads a string from standard input once it starts up. The
user ID customizes stack layout and verifies a unique “cookie” value
that your attacks must provide.
Usage
To run the laptop.bin executable:
$ ./laptop.bin -u your_cs_username
Type string:
Alternatively, since your username was saved in a file when you made your cookie earlier, you can also use a subshell to pass the contents of this file as an argument:
$ ./laptop.bin -u $(cat id.txt)
Type string:
Input Vulnerability
The laptop.bin executable reads a string from standard input with the
function getbuf():
unsigned long long getbuf() {
    char buf[36];
    // ...
    unsigned long long val = (unsigned long long)Gets(buf);
    // ...
    return val % 40;
}
The full version of this function contains more code for an optional
additional challenge. The part shown here is sufficient for the
required parts of this assignment. The key feature to note is that
getbuf() calls the function Gets(), passing the address of its
local array buf, which is allocated on the stack with space
for 36 chars.
The function char* Gets(char* buf) is similar to the standard C library function
char* gets(char* buf). It reads a string from standard input, terminated by a
newline character ('\n'), and stores the characters of the string,
followed by a null terminator ('\0'), starting at the memory address
given by its argument, buf. It returns its argument.
Neither Gets() nor gets() has any way to determine whether there
is enough space at the destination to store the entire
string. Instead, they simply copy the entire string, assuming the
destination is large enough and thus possibly over-running the bounds
of the storage allocated at the destination.
If the input string read by getbuf() is less than 36 characters
long, it is clear that getbuf() will return some value less than
0x28 (that is, 40 in decimal), as shown by the following execution
example:
$ ./laptop.bin -u your_cs_username
Type string: Acromantula!
Dud: getbuf returned 0x20
The value returned might differ for you, since it is derived from the
address of buf on the stack, which may vary between systems.
Running laptop.bin under gdb will also yield different values
than it does outside gdb.
Typically, an error occurs if we type a longer string:
$ ./laptop.bin -u your_cs_username
Type string: This string is too long and it starts overwriting things.
Ouch!: You caused a segmentation fault!
As the error message indicates, over-running the buffer typically
causes the program state (e.g., the return addresses and other data structures that were stored on the stack) to be corrupted, leading to a memory access error. Your task is to be more clever with the strings you feed laptop.bin so that it does more interesting things. These are called exploit strings.
Disassembly
As in the previous assignment, use gdb or objdump to
disassemble the laptop.bin executable whenever you need to inspect its
contents. You do not need to start, run, or single-step into the
code you want to inspect. For example, to disassemble getbuf, start
GDB with gdb ./laptop.bin then type disas getbuf at the GDB prompt.
You may encounter the leaveq instruction in the laptop.bin
executable. This instruction is a historical artifact tied to how the
%rbp register was used before x86-64 (when it was %ebp). The
leaveq instruction is equivalent to the following pair of
instructions in order:
mov %rbp, %rsp
popq %rbp
Some laptop.c Details
unsigned long long
Some variables in laptop.c are declared as unsigned long long. What does this mean,
and how many bytes are required for values with this declaration?
The Wikipedia article on C Data Types is
helpful in this regard. It turns out that the size of C data types is not fixed and
can vary between implementations. For example, although a long on tempest is
a signed integer type whose size is 8 bytes, C only requires it to be at least 4 bytes.
Integer types involving long long are required to have a size of at least 8 bytes,
and on tempest its size is 8 bytes (just like long). The code in this assignment
is adapted from CSAPP, whose code is designed to work correctly on as many architectures
as possible, which is why it uses long long rather than just long for integer
values that should be 8 bytes.
volatile
Some variables in laptop.c are declared as being volatile. This means that the
compiler must assume that the variable value must be fetched from memory at every
reference, disregarding optimizations that might otherwise be performed.
entry_check, validate, and alloca
Several functions in laptop.c use the functions entry_check and validate,
and a few use alloca. These functions are used to make sure that the
exploits are working properly. You do not have to understand what these
functions do as part of this assignment.
Tools for Crafting Exploits
Constructing exploits involves tricky tasks like writing untypeable characters and determining the byte encoding of x86 instructions. Use the techniques below to simplify your job.
Formatting Exploit Strings with hex2raw.bin
Each ASCII character in a string is represented by one byte. For example, 'A' is
represented by the byte value 0x41. While your exploits will be delivered under the guise
of strings, they will embed sequences of bytes encoding addresses,
numbers, or other non-character data. It is hard enough to map each
desired byte value in your exploit back to a character by hand, but
often, the specific bytes required do not even correspond to any
typeable or printable ASCII characters, making it “difficult” to type
your exploit string on a keyboard or view it on the screen. Do not
try to encode your exploit by hand!
We have provided a tool called hex2raw.bin to encode exploit strings:
- The input to hex2raw.bin is a human-readable text description of a byte sequence where each byte is written as a pair of hexadecimal digits. Successive bytes may be separated by spaces.
- The output of hex2raw.bin is a raw byte sequence, where each byte has the hexadecimal value described by the corresponding pair of characters in the input.
Suppose we want the sequence of bytes whose values are the hexadecimal
numbers 0x41, 0x42, 0x43, 0x1b. These same values, when interpreted
with the ASCII encoding, mean the characters 'A', 'B', 'C',
followed by the ASCII “escape” (ESC) character, which is not treated as a
string character when typed on the keyboard or printed to the terminal
output. Given the input 41 42 43 1b, the hex2raw.bin utility will
output the desired 4-byte sequence.
To run hex2raw.bin, type the series of hexadecimal byte value
descriptions you want in a file (e.g., exploit1.hex for Exploit 1).
Following our example, we could save the string 41 42 43 1b into the
file exploit1.hex using Emacs. Then run:
$ ./hex2raw.bin < exploit1.hex > exploit1.bytes
The shell’s input redirection symbol < instructs the command-line shell to
use the contents of exploit1.hex as standard input to hex2raw.bin,
instead of looking for input from the keyboard. The shell’s output
redirection symbol > instructs the command-line shell to store the
standard (printed) output of hex2raw.bin into a file called
exploit1.bytes. Input and output redirection (< and >) are
general features of the command-line shell that can be used
independently and with any executable command.
Once the exploit string byte sequence is stored into the file
exploit1.bytes, run laptop.bin with the contents of the file
exploit1.bytes as input:
$ ./laptop.bin -u your_cs_username < exploit1.bytes
Naturally, as with compiled source code, if you update your
exploit string specification in exploit1.hex, you must run hex2raw.bin
again to translate the new version to a byte sequence in
exploit1.bytes before you can use the new exploit with laptop.bin.
Warning: do not use 0A
Your exploit string must not contain byte value 0x0A
(0A in hex2raw.bin input) at any intermediate position,
since this is the ASCII code for newline ('\n'). When Gets()
encounters this byte, it will assume you intended to terminate
the string input. hex2raw.bin will warn you if it encounters this
byte value.
Running and Testing Exploits
Test all exploits (used for grading):
- Save your exploits in hex2raw input format in the proper files.
- Run make test to translate and test each exploit and generate a summary.
Run an individual exploit:
- Write the exploit string in hex2raw input format in, e.g., the file exploit1.hex.
- Translate it to raw bytes with hex2raw.bin:
$ ./hex2raw.bin < exploit1.hex > exploit1.bytes
- Run it directly (possible for Exploits 1 and 2):
$ ./laptop.bin -u your_cs_username < exploit1.bytes
or under gdb (required for Exploits 3 and 4):
$ gdb ./laptop.bin
[... gdb startup output ...]
(gdb) run -u your_cs_username < exploit1.bytes
GDB Scripts
When using gdb, you may find it useful to save a series of gdb
commands to a text file (e.g., commands.txt) and then use the -x
commands.txt flag, which runs each line of the file as a command in
gdb. This saves the trouble of retyping the commands every time you
run gdb. You can read more about the -x flag in gdb’s man page.
Exploits
Save your buffer overrun exploit strings in hex2raw input format in the files
exploit1.hex, exploit2.hex, exploit3.hex, exploit4.hex.
Exploit 1: Smoke
The function getbuf() is called within laptop.bin by a function test():
void test() {
    volatile unsigned long long val;
    volatile unsigned long long local = 0xdeadbeef;
    char* variable_length;
    entry_check(3); /* Make sure entered this function properly */
    val = getbuf();
    if (val <= 40) {
        variable_length = alloca(val);
    }
    entry_check(3);
    /* Check for corrupted stack */
    if (local != 0xdeadbeef) {
        printf("Sabotaged!: the stack has been corrupted\n");
    } else if (val == cookie) {
        printf("Boom!: getbuf returned 0x%llx\n", val);
        if (local != 0xdeadbeef) {
            printf("Sabotaged!: the stack has been corrupted\n");
        }
        if (val != cookie) {
            printf("Sabotaged!: control flow has been disrupted\n");
        }
        validate(3);
    } else {
        printf("Dud: getbuf returned 0x%llx\n", val);
    }
}
When getbuf() executes its return statement, the program ordinarily resumes execution within function test(). Within the file laptop.bin, there is a
function smoke():
void smoke() {
    entry_check(0); /* Make sure entered this function properly */
    printf("Smoke!: You called smoke()\n");
    validate(0);
    exit(0);
}
Your task is to get laptop.bin to execute the code
for smoke() when getbuf() executes its
return statement, rather than returning to test(). You
can do this by supplying an exploit string that overwrites the stored
return pointer in the stack frame for getbuf() with the
address of the first instruction in smoke. Note that
your exploit string may also corrupt other parts of the stack state,
but this will not cause a problem, because smoke() causes
the program to exit directly.
Advice
- All the information you need to devise this exploit string can be determined by examining a disassembled version of laptop.bin.
- Be careful about byte ordering (i.e., endianness).
- You might want to use gdb to step the program through the last few instructions of getbuf() to make sure it is doing the right thing.
- The placement of buf within the stack frame for getbuf() depends on which version of gcc was used to compile laptop.bin. You must pad the beginning of your exploit string with the proper number of bytes to overwrite the return pointer. The values of these bytes can be arbitrary.
- Don’t forget to use hex2raw.bin to simplify your job.
Description Questions for Exploit 1
Answer these questions in responses.txt.
(a) When the first instruction of the getbuf function is about to be executed, the %rsp register contains a stack address we’ll refer to as getbuf_entry_rsp. What is the exact value on the stack at the address in getbuf_entry_rsp and what is the purpose of this value?
(b) Executing the first instruction in the getbuf function causes a value to be pushed onto the stack. Briefly explain at a high level the purpose of this value in the normal (unexploited) execution of getbuf.
(c) Within the getbuf function, there is a call to the Gets function, whose single argument is a pointer to the stack. Call this stack pointer Gets_arg. What is the purpose of Gets_arg? I.e., what does the Gets function do with Gets_arg?
(d) What is the number of bytes in the difference between the stack addresses getbuf_entry_rsp and Gets_arg?
(e) During a successfully exploited execution of laptop.bin, one crucial control-flow instruction in getbuf is affected by your exploit string data in a way that causes it to choose a different next instruction to execute compared to normal execution (in the absence of buffer overflow), and allows the attack to begin executing different code than usual. What is the instruction address and assembly code for that crucial control-flow instruction in getbuf?
(f) What part of your exploit string (described as a range of byte offsets from the start of the string) causes the instruction from (e) to behave differently than normal? Why? (Write a sentence or two.)
(g) What instruction executes next after the instruction in (e) in a normal execution? Give the instruction’s address and assembly code.
(h) What instruction executes next after the instruction in (e) in your exploited execution? Give the instruction’s address and assembly code.
Exploit 2: Fizz
Within laptop.bin there is also a function fizz():
void fizz(int arg1, char arg2, long arg3,
          char* arg4, short arg5, short arg6, unsigned long long val) {
    entry_check(1); /* Make sure entered this function properly */
    if (val == cookie) {
        printf("Fizz!: You called fizz(0x%llx)\n", val);
        validate(1);
    } else {
        printf("Misfire: You called fizz(0x%llx)\n", val);
    }
    exit(0);
}
Similar to Exploit 1, your task is to get laptop.bin to
execute the code for fizz() rather than returning
to test. In this case, however, you must make it appear
to fizz as if you have passed your cookie as its
argument. You can do this by encoding your cookie in the appropriate
place within your exploit string.
Advice
- Recall that the first six arguments are passed in registers and additional arguments are passed on the stack. Your exploit code needs to write to the appropriate place within the stack. This explains our somewhat contrived fizz parameters.
- You can use gdb to get the information you need to construct your exploit string. Set a breakpoint within getbuf() and run to this breakpoint. Determine key features such as the address of val and the location of the buffer.
Description Questions for Exploit 2
Answer these questions in responses.txt.
(a) When the first instruction of the fizz function is about to be executed, the %rsp register contains a stack address we’ll refer to as fizz_entry_rsp. What is the purpose of the value that fizz expects to be at this address?
(b) When the first instruction of the fizz function is about to be executed, what is the purpose of the value that fizz expects to be stored at the stack address that is fizz_entry_rsp + 8?
(c) The retq instruction in getbuf uses one word of your exploit string as a return address. Describe how each subsequent word of the exploit string is interpreted by fizz, including how it finds your cookie as val, and why each of these words must be at its position to allow fizz to make this interpretation.
(d) What instruction in fizz finds the value of its val argument? Give the instruction’s address and assembly code.
(e) Where does the instruction from (d) find val relative to the top of the call stack when the instruction is executed? (Give a byte offset.)
Exploit 3: Bang
A much more sophisticated form of buffer attack involves supplying
a string that encodes actual machine instructions. The exploit string
then overwrites the return pointer with the starting address of these
instructions. When the calling function (in this
case getbuf) executes its ret instruction,
the program will start executing the instructions on the stack rather
than returning. With this form of attack, you can get the program to
do almost anything. The code you place on the stack is called
the exploit code. This style of attack is tricky, though,
because you must get machine code onto the stack and set the return
pointer to the start of this code.
You must run laptop.bin under gdb for this exploit to
succeed. (Modern systems use memory protection mechanisms to prevent
execution of memory locations in the stack and guard against exactly
this type of attack. Since gdb works a little differently than
normal program execution, it allows the exploit to succeed.)
Within the file laptop.bin there is a function bang():
unsigned global_value = 0;
void bang(unsigned long long val) {
    entry_check(2); /* Make sure entered this function properly */
    if (global_value == cookie) {
        printf("Bang!: You set global_value to 0x%llx\n", global_value);
        validate(2);
    } else {
        printf("Misfire: global_value = 0x%llx\n", global_value);
    }
    exit(0);
}
Similar to Exploits 1 and 2, your task is to get laptop.bin to execute
the code for bang() rather than returning to test(). Before this,
however, you must set global variable global_value to your
cookie. Your exploit code should set global_value, push the address
of bang() on the stack, and then execute a ret instruction to
cause a jump to the code for bang().
Byte-Encoding Instructions for Exploit Code
When including instructions as part of an exploit
payload, you must use the instruction encoding as machine code, the byte sequence
used to encode an instruction like pushq %rax for the machine. This
is not the byte sequence representing the string "pushq %rax".
Use gcc as an assembler and objdump as a disassembler to generate
the byte codes for instruction sequences. Suppose we
write a file example.s containing the following assembly code:
# Example of hand-generated assembly code
movq $0x1234abcd,%rax # Move 0x1234abcd to %rax
pushq $0x401080 # Push 0x401080 on to the stack
retq # Return
The code can contain a mixture of instructions and data. Anything
to the right of a # character is a comment.
We can now assemble and disassemble this file, saving the disassembler’s description of the binary object code:
$ gcc -c example.s
$ objdump -d example.o > example.d
The generated file example.d contains the following lines:
0: 48 c7 c0 cd ab 34 12 mov $0x1234abcd,%rax
7: 68 80 10 40 00 pushq $0x401080
c: c3 retq
Each line shows a single instruction. The number on the left indicates
the starting address (starting with 0), while the hex digits after the
: character indicate the byte codes for the instruction, shown as
individual bytes in memory order from left to right. (Do not
flip them.) Thus, we can see that the instruction pushq $0x401080
has a hex-formatted byte code of 68 80 10 40 00 that could be
entered into an exploit string. The entire byte sequence to encode
the above instructions would be: 48 c7 c0 cd ab 34 12 68 80 10 40 00 c3.
Advice
- You must run laptop.bin under gdb for this exploit to succeed.
- Determining the byte encoding of instruction sequences by hand is tedious and prone to errors. You should let tools do all of the work by:
  - Writing an assembly code file exploit3Assembly.s containing the instructions and data you want to put on the stack.
  - Assembling this file with gcc -c exploit3Assembly.s to create the binary file exploit3Assembly.o (as shown in the Byte-Encoding Instructions section above).
  - Disassembling the binary file with objdump -d exploit3Assembly.o > exploit3Assembly.d (as shown in the Byte-Encoding Instructions section above). exploit3Assembly.d will allow you to see the byte sequence to include in your exploit.
- Keep in mind that your exploit string depends on your computer, your compiler, and even your cookie.
- Watch your use of address modes when writing assembly code. Note that movq $0x4, %rax copies the literal value 0x0000000000000004 into register %rax; whereas movq 0x4, %rax copies the contents of memory at address 0x0000000000000004 into %rax. If you forget a $ character, your code is likely to cause a segmentation fault, because the literal number that you mistakenly wrote as a memory address is most likely not a legal memory address.
- Due to restrictions on the total size of instruction encodings, x86 does not support all combinations of operands. For example, it is not possible to write a movq instruction with a large literal source operand and an absolute memory address. If you get errors from the assembler, try breaking down instructions into multiple steps, storing intermediate values in registers.
- Do not attempt to use either a jmp or a call instruction to jump to the code for bang(). These instructions use PC-relative addressing, which is tricky to set up correctly in this attack. Instead, push an address on the stack and use the ret instruction.
Description Questions for Exploit 3
Answer these questions in responses.txt.
(a) Show the contents of your exploit3Assembly.d file.
(b) Starting with (including) the retq instruction in getbuf, list the sequence of instructions that the computer executes under your exploit up through (including) the first instruction in bang. For each instruction, list the instruction address and its assembly code.
(c) Describe how the instruction sequence in (b) changes memory contents, register contents, and program counter (i.e., %rip). Either write a few sentences to explain or annotate your instruction listing in part (b).
Exploit 4: Boom
You must run laptop.bin under gdb for this exploit to
succeed.
Our preceding attacks have all caused the program to jump to the
code for some other function, which then causes the program to
exit. As a result, it was acceptable to use exploit strings that
corrupt the stack, overwriting the saved value of
register %rbp and the return pointer.
The most sophisticated form of buffer overrun attack causes the
program to execute some exploit code that patches up the stack and
makes the program return to the original calling function
(test() in this case). The calling function is oblivious
to the attack. This style of attack is tricky, though, since you must:
(1) get machine code onto the stack, (2) set the return pointer to the
start of this code, and (3) undo the corruption made to the stack
state.
Your job is to supply an exploit string that will
cause getbuf() to return your cookie back
to test(), rather than the small value (less than 0x28) that it
normally returns. You can see in the
code for test() that this will cause the program to go
Boom! Your exploit code should set your cookie as the
return value, restore any corrupted state, push the correct return
location on the stack, and execute a ret instruction to
really return to test().
Advice
- You must run laptop.bin under gdb for this exploit to succeed.
- The leaveq instruction (a historical artifact tied to how the %ebp register was used before x86-64) is equivalent to the following pair of instructions in order:
mov %rbp, %rsp
popq %rbp
- In order to overwrite the return address slot on the stack, your exploit string must also cover all the items saved on the stack between the buf array and the return address slot. So far, the code we have attempted to run with the exploit has not depended on this data, but a “normal”-looking return to test() may depend on it. Consider carefully what is stored here on the stack during getbuf, its original source, where it is stored after getbuf completes, and how test may use it. Use gdb to inspect the disassembled code of getbuf and test. Determine how you can organize your exploit to avoid disturbing stack data in getbuf on which test later relies.
- As in Exploit 3, let tools such as gcc and objdump do all of the work of byte-encoding the instructions. Use them to create a file exploit4Assembly.d that will show you the instruction bytes to use in your exploit.
- Keep in mind that your exploit string depends on your cookie, your computer, and your compiler.
[Independent] Description Questions for Exploit 4
Answer these questions in responses.txt.
These description questions must be answered independently without
assistance from others.
- Show the contents of your exploit4Assembly.d file.
Besides the primary behavior of providing a “modified” return value, a successful exploit for this part must also take care to “cover its tracks” and avoid corruption that could cause segmentation faults or other unexpected behavior in the remainder of the computation in test and its callers after the exploit causes this “modified” return from getbuf.
Explain in detail how your exploit works and avoids the potential corruption by answering the following questions:
- Which register or memory location would hold a corrupted value after execution of your exploit code if your exploit did not “cover its tracks?”
- How is the potential corruption related to instructions that manage the saving and restoring of the caller’s function call frame?
- What instruction(s) (along with its address) in getbuf would place a correct value in that register or memory location under normal execution, but would instead place the corrupt value there under an exploit that does not “cover its tracks?”
- Which instruction (along with its address) in test could first raise a segmentation fault (or lead to other data corruption) when using the corrupt value from that register or memory location?
- How does your exploit “cover its tracks” to avoid this corruption?
Answering the above questions effectively will require a complete understanding of how your exploit works and how it interacts with existing code.
Recognize what you have accomplished.
You caused a program to execute arbitrary machine code of your own design simply by choosing a particular input. You have done so in a sufficiently stealthy way that the program did not realize that anything was amiss. Surely this is a significant problem!
Submission
Submit: The course staff will collect your work directly from your hosted repository. To submit your work:
- Test your source code files one last time. Make sure that, at a minimum, submitted source code is free of syntax errors and any other static errors (such as static type errors or name/scope errors). In other words: the code does not need to complete the correct computation when invoked, but it must be a valid program. We will not grade files that do not pass this bar.
- Make sure you have committed your latest changes. (Replace FILES with the files you changed and MESSAGE with your commit message.)
$ git add FILES
$ git commit -m "MESSAGE"
- Run the command cs240 sign to sign your work and respond to any assignment survey questions.
$ cs240 sign
(If this encounters an error, instead execute cs240.s23 sign.)
- Push your signature and your latest local commits to the hosted repository.
$ git push
Confirm: All local changes have been submitted if the output of git status shows both:
- Your branch is up to date with 'origin/main', meaning all local commits have been pushed
- nothing to commit, meaning all local changes have been committed
Resubmit: If you realize you need to change something later, just repeat this process.
Grading
The assignment is graded from a maximum of 100 points:
- Working exploits (75 points): run make test to check all of your exploits.
  - Exploit 1: 20 points
  - Exploit 2: 25 points
  - Exploit 3: 15 points
  - Exploit 4: 15 points
- Questions (25 points):
  - Your answers for all Exploits will be graded.
Extra Fun Mayhem
This is an optional fun challenge. Try it after finishing the required parts of the assignment if you want to see the full power of buffer exploits.
execve is a system call that replaces the currently running program
with another program inheriting all the open file descriptors. What
are the limitations of the exploits you have performed so far? How
could calling execve allow you to circumvent this limitation? If you
have time, try writing an additional exploit (mayhem.hex) that uses
execve and another program to print a message. This will require
more independence and resourcefulness than the main exploits.
[1] This document is an alternative (s/ia32 pyrotechnics/x86-64 incoherent magic references/g) description for the old-style CSAPP Buffer Lab, which is available in ia32 form on the CSAPP website.