code Reverse engineer compiled x86 runes to recover the Sourceror’s Code.

Overview

Silly version: Overhearing a hushed conversation in Prof. McCodegal’s office, you learn that the powerful, yet volatile, Sourceror’s Code lies hidden away in the depths of the Science Castle, guarded by six phases of magical protection. It holds the key to eternal coding enlightenment (and also the key to finishing your homework). While wandering the Science Castle corridors, you find an unassuming door you believe hides the Code. Meanwhile, the tutors who shall not be named are launching their own quest to steal away the Sourceror’s Code for the questionable machinations of their shadowy organization. You must retrieve the Sourceror’s Code before it is too late!

Fortunately, the designers of the six magical protections have left behind their recipes in ancient x86 runes. You must decipher the runes to determine how the protections work and discover the magical incantations that will disarm them. But beware, these protections are not to be taken lightly – speaking the wrong incantation will trip an alarm and ensnare you in a magical trap. By disarming all six protections, you will recover the Sourceror’s Code and save the day.

Serious version: We have given you a compiled x86 binary executable without the C source code that created it. You must find inputs that satisfy six computational phases of the program to avoid tripping the alarm (and deducting points). To do this, you will disassemble the executable and work with the x86 code to reconstruct what the program computes with each input. You will submit the six inputs plus your descriptions of what each phase computes (in English or C source code). You receive points for each correct solution, but you may lose points every time you trip the alarm (which notifies us).

Tips for Success
  • Start early. This assignment will be as challenging (and fun) as Bit Transfiguration, but in a different way. We will do the first phase together in lab.
  • You cannot do this assignment in one sitting.1 Allocate time to work on it repeatedly with other things between.
  • Use the tools and techniques introduced by lab and this document.
  • Do not worry if you get stumped temporarily. Go do something else and come back refreshed.
  • Share reverse engineering strategies, patterns, and ideas with each other.
  • Take plenty of breaks for puns, laughing, sprinting around the microfocus, and other wheezes. Have fun!

Contents

Setup

Do this assignment in one of the CS 240 computing environments. The runes in this assignment require an x86-64 processor running Linux. (Most insurance policies do not cover loss of time or eyebrows due to use of other environments.) In fact, it appears as though the magical protections may ensnare you (and deduct points) immediately when run in other environments.

Get your x86 Runes Adventure Kit via the usual process (see the top of the page). Your kit contains the following files:

  • descriptions.txt: File in which you describe the x86 code
  • incantations.txt: File in which you write your six inputs
  • sample: A sample binary executable we will use together only in lab
  • executables: A directory containing one binary executable per student, named by Bitbucket ID
  • main.c: C source file with the executable’s main function

Select your unique executable. There is one for each student.

$ hg mv executables/your_bitbucket_id runes
$ hg rm executables
$ hg commit -m 'Selected my runes'

Tasks

Your executable has six phases of protection, each requiring an incantation (input) to disarm. You must use various principled tools and techniques to reverse engineer your executable before attempting to disarm it. Entering an incorrect incantation (input) at any phase trips an alarm, notifies the instructors, and deducts a tiny portion of your grade, so guessing randomly or by brute force is inadvisable.

A lab activity helps you learn basic reverse engineering tools and techniques and disarm the first phase of a sample executable. Later phases get progressively harder to disarm, but the expertise you gain with each phase should offset the increased difficulty. If you are stumped, note that *hints* about each phase appear in the messages that the executable prints.

Unsubtantiated rumors suggest there may be a secret seventh protection hidden among the six that can be disarmed for extra fun, credit, and reward.

Submit two parts for each phase:

  1. Incantation (input): Write your disarming incantations for each of the six phases on separate lines in incantations.txt, in order.
  2. Description: In the separate descriptions.txt file, write a succinct paragraph or two for each phase, describing:
    • What the phase is computing with your input at a high level of abstraction.
    • Some key features of the x86 code that correspond to the high-level computation.

    Describe at a high level as if you are summarizing whatever C code compiled to this assembly/machine code. (Feel free to write C to describe what is computed.) Do mention a couple assembly details that were particular aha! moments or red flags that alerted you to this high-level structure, but do not give a line-by-line run-down of the assembly code.

    Keep notes along the way so you do not need to repeat the reverse engineering process to remember how it worked!

Grading

The assignment is graded from a maximum of 100 points.

You may receive up to 85 points for disarming phases:

  • Phase 1: 15 points
  • Phase 2: 15 points
  • Phase 3: 15 points
  • Phase 4: 15 points
  • Phase 5: 15 points
  • Phase 6: 10 points

But you may lose one point every time the protections catch you, for a total of at most 10 points deducted. (Use breakpoints to avoid this!)

You may receive up to 15 points for descriptions:

  • We will grade descriptions of one or two phases (chosen randomly), by the criteria above.

If a secret phase exists and you disarm it, you will receive a number of extra points that will become non-secret only if you disarm the hypothetical secret phase…

Usage

The executable ignores blank input lines. If you run your executable with a command line argument, for example:

$ ./runes incantations.txt

Then it will read input lines from the file incantations.txt until it reaches EOF (end of file), when it will switch over to read from stdin (standard input from the terminal). In a moment of weakness, the magical designers added this feature so you do not have to retype the solutions to phases you have already disarmed.

Text Encoding Matters

Make sure to add a new line (return) at the end of each phase input in incantations.txt, otherwise you may trip the alarm. emacs will do this automatically.

Do not edit or copy inputs from rich text applications, which sometimes use alternative text encodings of similar-looking characters or line endings.

To avoid accidentally triggering the protections, becoming ensnared, and losing points, you must learn how to single-step through the assembly code and how to set breakpoints. You will also need to learn how to inspect the state of both registers and memory. We will learn these techniques in lab. One of the nice side effects of this assignment is that you will get very good at using a debugger. This is a crucial skill that will pay big dividends in the future.

To run the executable using your incantations.txt under gdb, invoke gdb on the executable alone, then when using the run command inside gdb (or abbreviated to r), give the command-line arguments there:

$ gdb runes
... startup messages from gdb...
(gdb) run incantations.txt

x86 Resources

There are a number of online resources that will help you understand any assembly instructions you may encounter while examining the executable. In particular, the programming manuals for x86 processors distributed by Intel and AMD both describe the same ISA, but sometimes one may be easier to understand than the other.

Note that there are two common x86 assembly code syntaxes: Intel and AT&T (we use the latter). CSAPP (p. 177) and Wikipedia describe the differences (including switched operand order).

Tools and Techniques

There are many ways of figuring out how to disarm your executable. You can examine it in great detail without ever running the program and determine exactly what it does. This is a useful dskill, but it not always easy. You can also run the executable under a debugger, watch what it does step by step, perform disassembly along the way, and use this information to understand its behavior and disarm it. This is usually the most effective method.

Please do not use brute force! You could try every possible key to find the right one (or write a program to do so), but the number of possibilities is so large that you will not be able to try them all in time.

There are many tools designed to help figure out both how programs work and what is wrong when they do not work. You find the following tools (and hints on how to use them) useful in analyzing your whizbang.

gdb: The GNU debugger is a command line debugger tool available on virtually every platform. You can trace through a program line by line, examine memory and registers, look at both the source code and assembly code (we are not giving you the source code for most of your whizbang), set breakpoints, set memory watch points, and write scripts. We have used GDB to examine the execution of C programs. Here, without C source or debugging symbols, you will examine only the compiled x86 program.

Here are some tips for using gdb:

  • To keep the protections from catching you (and losing points) every time you type in a wrong input, you will want to learn how to set breakpoints. (We will do this in lab. In short, break trip_alarm will save you many points.)
  • Here is a one-page gdb reference sheet. The tools page has more links.
  • For other documentation, type help at the gdb command prompt, or type man gdb, or info gdb in the shell. Some people also like to run gdb under gdb-mode in emacs.

objdump -t runes: This will print out the executable’s symbol table. The symbol table includes the names of all functions and global variables in the executable, the names of all the functions the executable calls, and their addresses. You may learn something by looking at the function names!

objdump -d runes: Use this to disassemble all of the code in the executable. You can also just look at individual functions. Reading the assembler code can tell you how the executable works. Although objdump -d gives you a lot of information, it does not tell you the whole story. Calls to system-level functions may look cryptic. For example, a call to sscanf might appear as:

8048c36: e8 99 fc ff ff call 80488d4 <_init+0x1a0>

To determine that the call was to sscanf, you would need to disassemble within gdb.2

strings -t x runes: This utility will display the printable strings in your executable and their offset within the executable.

Looking for a particular tool? How about documentation? Do not forget, the commands apropos and man are your friends. In particular, man ascii is very useful. If you get stumped, come to office hours or feel free to post questions to the course’s Google group.

  1. If you can do this assignment all in one sitting, come chat. We need to find you a challenge.