Ancient x86 Runes
- Assign: Monday, 28 October
- Due: 11:59pm Friday, 8 November
- Policy: Pair graded synthesis assignment
- Partner search: Find a partner here.
-
Code:
cs240 start x86
- Submit:
git commit
,cs240 sign
, andgit push
your completed code. - Reference:
- ISA: x86, assembly
- ISA: control flow
- ISA: procedures and stacks
P. and N. FlamelG. M. Hopper,Assembly and Applications of the Sourceror's CodeThe Education of a Computer. Proceedings of the ACM,15921952.- x86 resources
- debugging resources
Contents
Overview1
Silly version: Overhearing a hushed conversation in Prof. McCodegal’s office, you learn that the powerful, yet volatile, Sourceror’s Code lies hidden away in the depths of the Science Castle, guarded by six phases of magical protection. It holds the key to eternal coding enlightenment (and also the key to finishing your homework). While wandering the Science Castle corridors, you find an unassuming door you believe hides the Code. Meanwhile, the tutors who shall not be named are launching their own quest to steal away the Sourceror’s Code for the questionable machinations of their shadowy organization. You must retrieve the Sourceror’s Code before it is too late!
Fortunately, the designers of the six magical protections have left behind their recipes in ancient x86 runes. You must decipher the runes to determine how the protections work and discover the magical incantations that will disarm them. But beware, these protections are not to be taken lightly – speaking the wrong incantation will trip an alarm, ensnare you in a magical trap, and cause your runes to self-destruct! By disarming all six protections, you will recover the Sourceror’s Code and save the day.
Serious version: We have given you a compiled x86 binary executable without the C source code that created it. You must find inputs that satisfy six computational phases of the program to avoid tripping the alarm (and deleting your executable2). To do this, you will disassemble the executable and work with the x86 code to reconstruct what the program computes with each input. You will submit the six inputs plus your descriptions of what each phase computes (in English or C source code). You receive points for each correct solution and clear description.
Goals
- To understand data storage and control flow at the instruction set architecture (ISA) level of abstraction.
- To practice principled reasoning about program execution at the ISA level of abstraction, using detailed execution models.
- To extract and reason about rigorous models of program structure, such as control-flow graphs, from programs at the ISA and C levels of abstraction.
- To understand and communicate how ISA-level operations and program structure relate to C-level operations and program structure through translation and representation.
- To practice using tools such as debuggers and disassemblers to inspect binary executables.
- To leap or shout in a moment of enlightenment about the high-level meaning hidden in series of low-level instructions.
Advice
- Start early. This assignment will be as challenging (and fun) as Bit Transfiguration, but in a different way. We will do the first phase together in lab.
- You cannot do this assignment in one sitting.3 Allocate time to work on the assignment, get stuck, go away, and come back repeatedly.
- Share strategies, patterns, and ideas with classmates.
- Take plenty of breaks for puns, laughing, sprinting around the SCI L037, and other wheezes. Have fun!
Time Reports
According to self-reported times on this assignment from Fall 2018:
- 25% of students spent <= 7 hours.
- 50% of students spent <= 10 hours.
- 75% of students spent <= 14 hours.
Pairs tended to take less time than individuals.
Setup
Get your repository with cs240 start x86
.
Your x86 Runes Adventure Kit contains the following files:
descriptions.txt
: File in which you describe the x86 codeincantations.txt
: File in which you write your six inputsmain.c
: C source file with the executable’s main functionsample
: A sample binary executable that we will use together only in labselect
: A one-time-use script that selects your customrunes
executable for your team.
In lab you will use the sample
executable to complete Phase 1. For
your main assignment, each team receives a different
executable. Once you have settled on a team for the main assignment,
set up your team’s unique executable by running the command:
$ ./select
This will produce the following file:
runes
: Your team’s unique executable that you will use for the main assignment
You must use a CS 240 computing environment for CS 240 code assignments.
Tasks
Your executable has six phases of protection, each requiring an incantation (input) to disarm. You must use various principled tools and techniques to reverse engineer your executable before attempting to disarm it. Entering an incorrect incantation (input) at any phase trips an alarm and may cause your executable to self-destruct2, so guessing randomly or by brute force is inadvisable.
A lab activity helps you
learn basic reverse engineering tools and techniques and
disarm the first phase of a sample executable. Later phases get
progressively harder to disarm, but the expertise you gain with each
phase should offset the increased difficulty. If you are stumped,
inspect the *starred*
words in each new message printed
by the executable for a hint about what to consider in the next
phase.
Extra Fun: Unsubtantiated rumors suggest that the designers left behind yet another cryptic challenge and reward, something hidden among the six protections, something that only the most intrepid adventurers might find, disarm, and explain!
Submit two parts for each phase:
- Incantation (input): Write your disarming incantations for each
of the six phases on separate lines in
incantations.txt
, in order. - Description: In the separate
descriptions.txt
file, write a succinct paragraph or two for each phase, describing:- What the phase is computing with your input at a high level of abstraction.
- Some key features of the x86 code that correspond to the high-level computation.
Describe at a high level as if you are summarizing whatever C code compiled to this assembly/machine code. (Feel free to write C to describe what is computed.) Do mention a couple assembly details that were particular aha! moments or red flags that alerted you to this high-level structure, but do not give a line-by-line run-down of the assembly code.
Keep notes along the way so you do not need to repeat the reverse engineering process to remember how it worked!
As you work through phases, the computations get more interesting but your reverse engineering skills and experience grow as well. In later phases, tutors and instructors will scale back the level of assistance to match. Note that the final phase is an Independent problem.
Grading considers both the effectiveness of your incantations and your descriptions of the mystery code.
The remainder of this document describes:
- Prepatory (lab) exercises to learn tool and techniques for reverse engineering.
- Usage of the executable you will explore.
- Tools and techniques to use in reverse engineering the executable.
- The grading criteria.
Preparatory Exercises
Complete the associated lab assignment and activity. These will help you learn basic reverse engineering tools and techniques and disarm the first phase of a sample executable very similar to your own.
Usage
The executable ignores blank input lines. If you run your executable with a command line argument, for example:
$ ./runes incantations.txt
Then it will read input lines from the file incantations.txt
until
it reaches EOF
(end of file), when it will switch over to read from
stdin
(standard input from the terminal). In a moment of weakness,
the magical designers added this feature so you do not have to retype
the solutions to phases you have already disarmed. The source code of
the main
function of the executable is also available in main.c
,
but there is no source code for other functions.
Text encoding matters.
Make sure to add a new line (return) at the end of each phase input in
incantations.txt
, otherwise you may trip the alarm. emacs
will add
the trailing newline automatically.
Do not edit or copy inputs from rich text applications, which sometimes use alternative text encodings of similar-looking characters or line endings.
Tools and Techniques
There are many ways to determine how to disarm your executable.
-
You can run the executable under a debugger to examine its behavior dynamically, inspect its actions and state as it runs instruction by instruction, experiment with inputs, and generalize from this information to understand what it does.
-
You can examine the structure of the executable statically without ever running it, build a detailed mental model of its control flow and computations, and abstract this model to understand exactly what it does.
The most effective approach is often to combine elements of both static and dynamic techniques and develop your own new reasoning tools.
One approach that will not be useful is brute force. You (or a program) could try every possible input, but the number of possibilities may be prohibitively large. Even if you get extremely lucky, you will not learn much. Furthermore, you likely will be unable to give a good explanation of how the executable works, an important component for your grade.
Dynamic Inspection with gdb
“Dynamic” means “as the program runs” and refers to analysis of the actions and state of the running executable.
To avoid accidentally triggering the protections, becoming ensnared, and temporarily losing your executable2, you must learn how to use a debugger to single-step through the assembly code, inspect state of register and memory, and use breakpoints. We will learn these techniques in lab activities.
Our key tool is gdb
, the GNU Debugger. To run the executable using
your incantations.txt
under gdb
, invoke gdb
on the executable
alone, then when using the run
command inside gdb (or abbreviated to
r
), give the command-line arguments there:
$ gdb runes
... startup messages from gdb...
(gdb) run incantations.txt
Hint: break trip_alarm
will save you time!
Documentation and quick reference for gdb
:
- CS 240 GDB reference sheet (pdf, txt)
- GDB Quick Reference card
- GDB manual
- For documentation from the command line, type
help
at thegdb
command prompt, or typeman gdb
, orinfo gdb
in the shell. Some people also like to rungdb
under gdb-mode in Emacs.
Static Inspection with objdump
and strings
“Static” means “before the program runs” and refers to analysis of the structure of the code alone without executing it.
objdump -t runes
prints the executable’s symbol table, including
the names and addresses of all functions and global variables
defined or used in the executable. You may find the names themselves
to be revealing on their own.
objdump -d runes
disassembles the code in the full executable
(or an individual function). Reading the code for functions in
question can be highly helpful in building a mental model of the
function’s behavior.
- Note that calls to system functions defined outside executable often
appear with cryptic names like
__isoc99_sscanf@plt
, including@plt
, or appearing simply as odd offsets from defined functions. Trying to understand the contents of these functions is best done by reading their documentation, assuming you can recover a name, rather than attempting to find inspect their code withobjdump
orgdb
. In this case we have a function from the ISO C99 standard, linked via the PLT.4 (See the pattern?) Try looking upsscanf(3)
in the manual pages or a web search.
strings -t x runes
prints a summary of all string constants (and
their locations) defined in the executable.
- When interpreting data (dynamically or statically), think of all the
possibilities for what a given byte (or string of bytes) may
represent, even if you are expecting one in particular. Consulting
the ASCII tables can be quite helpful in some cases:
ascii(7)
.
x86 Resources
CSAPP Chapter 3 is an excellent resource for understanding x86-64 assembly and machine code. There are several online references as well:
- CSAPP Chapter 3 (draft)
- CS 240 x86-64 exam reference sheet
- exhaustive x86 manuals from Intel and AMD
- x86 Assembly WikiBook
- CSAPP textbook resources
Note that there are two common x86 assembly code syntaxes: Intel and AT&T (we use the latter). CSAPP (p. 177) and Wikipedia describe the differences (including switched operand order).
Submission
Submit: The course staff will collect your work directly from your hosted repository as of the deadline. To submit your work:
-
Make sure you have committed your latest changes.
$ git add ... $ git commit ...
-
Run the command
cs240 sign
to sign your work and respond to any assignment survey questions.$ cs240 sign
-
Push your signature and your latest local commits to the hosted repository.
$ git push
Confirm: After pushing, all local changes have been submitted if the output of
git status
shows both:
Your branch is up to date with 'origin/master'
, meaning all local commits have been pushednothing to commit
, meaning all local changes have been committed
Resubmit: If you realize you need to change something later, just repeat this process.
Grading
The assignment is graded from a maximum of 100 points.
You may receive up to 75 points for disarming phases:
- Phase 1: 15 points
- Phase 2: 15 points
- Phase 3: 15 points
- Phase 4: 15 points
- Phase 5: 10 points
- Phase 6 [Independent Problem]: 5 points
You may receive up to 25 points for descriptions:
- We will grade one or two descriptions from phases 2-5 (chosen randomly from the phases you have completed), by the criteria above.
This point allocation scheme is designed to emphasize that while completing the entire 6 phases is an excellent result, completing just 4-5 phases is also a very solid result. Make the choice that’s right for you.
-
This document is an alternative (s/Austin Powers/Harry Potter/g) description for the CSAPP Binary Bomb Lab, which is available on the CSAPP website. ↩
-
If you did this assignment all in one sitting, come chat. We need to find you a challenge. ↩
-
If you are curious about what’s going on, look ahead to linking and loading, CSAPP Chapter 7. ↩