Assignment: x86

Contents

Overview1

Boring version: We have given you a compiled x86 binary executable without the C source code that created it. You must find inputs that satisfy six computational phases of the program to avoid tripping an alarm that causes your executable to delete itself2. To do this, you will disassemble the executable and work with the x86 code to reconstruct what the program computes with each input. You will submit your inputs plus your descriptions of what each phase computes (in English or C source code). You receive points for correct inputs and clear descriptions.

Silly version: You have set off on a daring adventure.bin around a Wellesley campus filled with preposterous obstacles, with only a cryptic adventure.bin guidebook (executable) to help find your way. By decoding the mysterious adventure.bin guidebook itself, you will reveal the secret course of action (input) that allows you to circumvent each obstacle and eventually return safely to your room. But beware, any misstep along the way could knock you off course and cause your treasured adventure.bin guidebook to self-destruct, leaving you stranded in a remote corner of campus! It’s not an adventure if everything goes according to plan, but a wise adventurer is prepared for this possibility. You will need to learn the tools and techniques that allow you to decipher your guidebook (disassemble and interpret x86 code), set safeguards (with GDB), or recover if lost (with Git).

Goals

  • To understand data storage and control flow at the instruction set architecture (ISA) level of abstraction.
  • To practice principled reasoning about program execution at the ISA level of abstraction, using detailed execution models.
  • To extract and reason about rigorous models of program structure, such as control-flow graphs, from programs at the ISA and C levels of abstraction.
  • To understand and communicate how ISA-level operations and program structure relate to C-level operations and program structure through translation and representation.
  • To practice using tools such as debuggers and disassemblers to inspect binary executables.
  • To leap or shout in a moment of enlightenment about the high-level meaning hidden in series of low-level instructions.

Advice

Find a partner!

It will help a lot in this assignment to solve the phase puzzles together with a partner. Talking about solution strategies with a partner will tend to lead to solving the phases more quickly, and it’s also likely to be more enjoyable. Find a partner using the pair stream)

  • Start early. This assignment will be as challenging (and fun) as Bits, but in a different, more methodically approachable way. We have done a version of the first phase together in lab.
  • You cannot do this assignment in one sitting.3 Allocate time to work on the assignment, get stuck, go away, and come back repeatedly.
  • Share strategies, patterns, and ideas with classmates.
  • Take plenty of breaks for puns, laughing, or real-life campus adventures. Have fun!

Time Reports

According to self-reported times on this assignment from a previous offering:

  • 25% of students spent <= 10 hours.
  • 50% of students spent <= 12 hours.
  • 75% of students spent <= 15 hours.

Setup

Moving from Lab to Assignment

In lab, you used the sample.bin executable to find the strategy for finding the input to Phase 1.

For your main assignment, each team receives a different version of an executable named adventure.bin. Once you have settled on a team for the main assignment, even if you are working on your own or continuing to work with the same team from lab, you must complete the following steps to begin the assignment:

  1. Use cs240 start x86 to get a fresh repository. (If this fails for any reason, instead use cs240.s24 start x86.) If another member of your team has already created it, you will be able to choose the existing repository from the list. Otherwise, enter your team’s usernames to create it.

  2. cd into your fresh local copy and set up your team’s unique executable by running the command:

    $ ./choose.py
    

    This will produce the following file:

    • adventure.bin: Your team’s unique executable that you will use for the main assignment
  3. Use cp -a adventure.bin saved.bin to create a copy of adventure.bin named saved.bin that will come in handy later.

Your starter repository initially contains the following files:

  • choose.py: A one-time-use script that chooses your team’s own custom adventure.bin executable.
  • descriptions.txt: File in which you describe the x86 code
  • inputs.txt: File in which you write your inputs to each of the six phases. (The input to each phase goes on a separate line.)
  • main.c: C source file with the executable’s main function
  • sample.bin: A sample binary executable that we will use together only in lab

Once you run choose.py as described above, your repository will also contain the binary file adventure.bin that your team will use in this assignment. The adventure.bin file differs for each team, so that each team will have it’s own different adventure, solving different puzzles along the way.

You must use a CS 240 GNU/Linux computing environment for CS 240 code assignments.

Tasks

Your executable has 6 phases of protective obstacles, each requiring a certain input to avoid. You must use various principled tools and techniques to reverse engineer your executable before attempting to navigate it. Entering an incorrect input at any phase trips an alarm, knocks you off course, and may cause your executable to self-destruct2, so guessing randomly or by brute force is inadvisable.

Lecture timing

The material for phase 3 will be covered on 3/11. The material for phase 4-6 will be covered on 3/14. Feel free to look ahead to the corresponding videos if you would like a head start on later phases.

A lab activity helped you learn basic reverse engineering tools and techniques and navigate the first phase of a sample executable. For this assignment, later phases get progressively harder to navigate, but the expertise you gain with each phase should offset the increased difficulty. If you are stumped, inspect the *starred* words in each new message printed by the executable for a hint about what to consider in the next phase.

Extra Fun: Unsubtantiated rumors suggest that yet another cryptic challenge lies hidden among the obstacles, something that only the most intrepid adventurers might find, navigate, and explain! We definitely do not know anything about that but we do know that there is no secret grading.

Submit two parts for each phase:

  1. Input: Write your navigation inputs for each of the six phases on separate lines in inputs.txt, in order.
  2. Description: In the separate descriptions.txt file, write a succinct paragraph or two for each phase, describing:
    • What the phase is computing with your input at a high level of abstraction.
    • Some key features of the x86 code that correspond to the high-level computation.

    Describe at a high level as if you are summarizing whatever C code compiled to this assembly/machine code. Feel free to write C to describe what is computed. Do mention a couple assembly details that were particular aha! moments or red flags that alerted you to this high-level structure, but do not give a line-by-line run-down of the assembly code.

    Keep notes along the way so you do not need to repeat the reverse engineering process to remember how it worked!

As you work through phases, the computations get more interesting but your reverse engineering skills and experience grow as well. Grading considers both the effectiveness of your inputs and your descriptions of the mystery code. Note which phases are [Independent]. On pair assignments, you should continue to work with your partner on Independent parts. You must be Independent with respect to everyone else outside your team.

The remainder of this document describes:

  1. Preparatory (lab) exercises.
  2. Usage of the executable you will explore.
  3. Tools and techniques to use in reverse engineering the executable.
  4. The grading criteria.

Preparatory Exercises

Be sure you have completed the associated lab exercises. These will help you learn basic reverse engineering tools and techniques and navigate the first phase of a sample executable very similar to your own.

Checkpoint

Complete phase 1 and 2 by your lab section on Wednesday, March 13. Be sure to commit and push your inputs.txt with these two inputs/lines by this deadline.

While you do not need to have completed the full corresponding written descriptions for these two phases by this point, you should be able to show Prof. Mawhorter some notes on how you solved the two phases.

This checkpoint is worth 3% of your overall x86 grade, with partial credit available if you have not completed phase 2 but can show progress and notes on what you have tried so far.

Usage

Solving the Phases

  1. Record the inputs to each phase in the file inputs.txt (one phase input per line).

    • Calling ./adventure.bin inputs.txt will use the lines in inputs.txt as you inputs to the phases so that you don’t have to keep typing them in by hand.

    • In order to use gdb on adventure.bin:

      • Start gdb via gdb adventure.bin
      • In order to run the program within gdb, use run inputs.txt
  2. adventure.bin has a different input for Phase 1 than sample.bin, though you can use the same strategy you used in sample.bin to find the Phase 1 input.

  3. If you enter the wrong input for a Phase, your adventure.bin file will be deleted after you see a message similar to the following:

    !!!! WOMP WOMP !!!!
    
    This executable will self-destruct in 3 seconds...
    This executable will self-destruct in 2 seconds...
    This executable will self-destruct in 1 seconds...
    This executable will self-destruct in 0 seconds...
    
    **** POOF! ****
    

    You can restore your deleted adventure.bin by executing either of the following two commands:

    • git checkout HEAD -- adventure.bin

    • cp -a saved.bin adventure.bin (This assumes that you previously made a copy of adventure.bin named saved.bin via cp -a adventure.bin saved.bin)

The executable ignores blank input lines. If you run your executable with a command line argument, for example:

$ ./adventure.bin inputs.txt

Then it will read input lines from the file inputs.txt until it reaches EOF (end of file), when it will switch over to read from stdin (standard input from the terminal). In a moment of weakness, the magical designers added this feature so you do not have to retype the solutions to phases you have already navigated. The source code of the main function of the executable is also available in main.c, but there is no source code for other functions.

Text encoding matters.

Make sure to add a new line (return) at the end of each phase input in inputs.txt, including the last, otherwise you may trip the alarm.

Do not edit or copy inputs from rich text applications, which sometimes use alternative text encodings of similar-looking characters or line endings.

Tools and Techniques

There are many ways to determine how to navigate your executable.

  • You can run the executable under a debugger to examine its behavior dynamically, inspect its actions and state as it runs instruction by instruction, experiment with inputs, and generalize from this information to understand what it does.

  • You can examine the structure of the executable statically without ever running it, build a detailed mental model of its control flow and computations, and abstract this model to understand exactly what it does.

The most effective approach is often to combine elements of both static and dynamic techniques and develop your own new reasoning tools.

One approach that will not be useful is brute force. You (or a program) could try every possible input, but the number of possibilities may be prohibitively large. Even if you get extremely lucky, you will not learn much. Furthermore, you likely will be unable to give a good explanation of how the executable works, an important component for your grade.

Dynamic Inspection with gdb

“Dynamic” means “as the program runs” and refers to analysis of the actions and state of the running executable.

To avoid accidentally triggering the protections, becoming ensnared, and temporarily losing your executable2, you must learn how to use a debugger to single-step through the assembly code, inspect state of register and memory, and use breakpoints. We have introduced these techniques in lab activities.

Our key tool is gdb, the GNU Debugger. The CS 240 GDB reference sheet will be especially useful for this assignment.

To run the executable using your inputs.txt under gdb, invoke gdb on the executable alone, then when using the run command inside gdb (or abbreviated to r), give the command-line arguments there:

$ gdb adventure.bin
... startup messages from gdb...
(gdb) run inputs.txt

Hint: break trip_alarm will save you time!

Documentation and quick reference for gdb:

Static Inspection with objdump and strings

“Static” means “before the program runs” and refers to analysis of the structure of the code alone without executing it.

objdump -t adventure.bin prints the executable’s symbol table, including the names and addresses of all functions and global variables defined or used in the executable. You may find the names themselves to be revealing on their own.

objdump -d adventure.bin disassembles the code in the full executable (or an individual function). Reading the code for functions in question can be highly helpful in building a mental model of the function’s behavior.

  • Note that calls to system functions defined outside executable often appear with cryptic names like __isoc99_sscanf@plt, including @plt, or appearing simply as odd offsets from defined functions. Trying to understand the contents of these functions is best done by reading their documentation, assuming you can recover a name, rather than attempting to find inspect their code with objdump or gdb. In this case we have a function from the ISO C99 standard, linked via the PLT.4 (See the pattern?) Try looking up sscanf(3) in the manual pages or a web search.

strings -t x adventure.bin prints a summary of all string constants (and their locations) defined in the executable.

  • When interpreting data (dynamically or statically), think of all the possibilities for what a given byte (or string of bytes) may represent, even if you are expecting one in particular. Consulting the ASCII tables can be quite helpful in some cases: ascii(7).

x86 Resources

CSAPP Chapter 3 is an excellent resource for understanding x86-64 assembly and machine code. There are several online references as well:

Note that there are two common x86 assembly code syntaxes: Intel and AT&T (we use the latter). CSAPP (p. 177) and Wikipedia describe the differences (including switched operand order).

Submission

Submit: The course staff will collect your work directly from your hosted repository. To submit your work:

  1. Test your source code files one last time. Make sure that, at a minimum, submitted source code is free of syntax errors and any other static errors (such as static type errors or name/scope errors). In other words: the code does not need to complete the correct computation when invoked, but it must be a valid program. We will not grade files that do not pass this bar.

  2. Make sure you have committed your latest changes. (Replace FILES with the files you changed and MESSAGE with your commit message.)

    $ git add FILES
    $ git commit -m "MESSAGE"
    
  3. Run the command cs240 sign to sign your work and respond to any assignment survey questions.

    $ cs240 sign
    

    (If this encounters an error, instead execute cs240.s24 sign.)

  4. Push your signature and your latest local commits to the hosted repository.

    $ git push
    

Confirm: All local changes have been submitted if the output of git status shows both:

  • Your branch is up to date with 'origin/main', meaning all local commits have been pushed
  • nothing to commit, meaning all local changes have been committed

Resubmit: If you realize you need to change something later, just repeat this process.

Grading

The assignment is graded from a maximum of 100 points.

You may receive up to 80 points for navigating phases, based directly on the same results you see while working:

  • Phase 1: 5 points (Note: the Phase 1 input for adventure.bin is different than the Phase 1 input for sample.bin.)
  • Phase 2: 20 points
  • Phase 3: 20 points
  • Phase 4: 20 points
  • Phase 5 [Independent]: 10 points
  • Phase 6 [Independent]: 5 points
    • On pair assignments, you should continue to work with your partner on Independent parts. You must be Independent with respect to everyone else outside your team.

You may receive up to 20 points for descriptions, based on reading by graders:

  • 15 points: One or more descriptions from your completed phases 2-4 will be graded.
  • 5 points: One description from your completed [Independent] phases 5-6 will be graded.
  • Even if you do not solve a phase, writing what you know about it in the description can get some credit.

This point allocation scheme is designed to emphasize that, while completing the entire 6 phases is an excellent result, completing 4 or 5 phases is also a solid result. Make the choice that’s right for you.

  1. This document is an alternative (s/Austin Powers/Wellesley Adventure/g) description for the CSAPP Binary Bomb Lab, which is available on the CSAPP website

  2. Thanks to version control, it will be recoverable.  2 3

  3. If you did this assignment all in one sitting, come chat. We need to find you a challenge.