Assignment 10: SMT 2

In this assignment, you will:

Practice programmatically creating SMT expressions to model actual implementations (rather than designs).
Use a parsing library to analyze the structure of C programs.
Implement a (vastly simplified) symbolic execution engine for C programs!

This is a pair assignment—you may choose to either complete the assignment alone or with one other CS340 classmate. On Gradescope, you will be able to form a group with maximum size 2.

Setup

Stencil code

As before, this assignment uses the z3 SMT solver as a library in Python (Python 3, specifically). If needed, see the instructions from class to install the z3 library on your local machine.

In addition, the stencil code uses the pycparser Python library for constructing representations of C programs.

Install with:

pip3 install pycparser

Most of the code for interacting with this library is already implemented for you in the stencil code.

This assignment is based on a stencil file and provided C test programs, which you can download here.

In your directory, also create a README.{md, txt}.

Symbolic execution

Simplifying assumptions

In the basic requirements for assignment, you will implement a simplified version of symbolic execution that works for C programs with integers (int), assignments (int x = ...), conditional statements (if (...) { ...} else ...), and arithmetic expressions on integers (a + b) and booleans (a && b).

You implementation only needs to handle the features of C used by the test programs excluding extra-credit-bin-search.c, which is extra credit.

The list of features you do not need to support for the basic requirements includes (but is not limited to): loops, early return statements, arbitrary function calls, etc.

Goal

Our goal for this simple symbolic execution engine is to statically identify two types of potential bugs:

Problematic exit(...) statements (recall: in C, which exit codes indicate a problem?)
Division by zero. Unlike many other languages (Python, Java), division by zero does not crash your program, but is instead undefined behavior. This in essence means the compiler can produce code that does anything in the case where a division by 0 is attempted, which could cause serious bugs!

Simplified algorithm for symbolic execution

Because we have substantially reduced the type of control flow possible in a program, we can simplify how we handle path conditions and symbolic variables.

Rather than merging symbolic constraints after control flow rejoins (for example, after an if-statement), the stencil code is set up to recursively pass a single path condition and a single mutable symbolic variable map.

At any given program point, the path condition will constrain the entire combined constraint to get to that point.

Core simplifications

For assignment statements, the assignment should use the path condition at that point to conditionally update the assignment. This is less efficient than other strategies to merge assignments, but works for our simple examples. For example, if the path condition is $P$ and there is an assignment $x = y$ with symbolic values $x : M, y : N$ ; after the assignment the symbolic values should be $x : I f (P, N, M), y : N$ .
In the basic implementation, only if-statements should modify the path condition; and should do so recursively. For example, if the path condition is $P$ and there is an if statement:
```
if (c) { 
   block1 
} else { 
   block2 
}
```
Then for the recursive call for block1, the path condition should be $P \land C$ for (where $C$ is the symbolic representation of c); in the recursive call for block2 the path condition should be $P \land \neg C$ .

Data structures

The specific data structures you use for the path conditions and the symbolic constraints are up to you. However, both should contain in some way Z3 SMT expressions.

Checking potential bugs

At each point where there could be a potential bug (problematic exit or divide by zero) your engine should invoke an SMT solver to check if the bug is actually feasible for some concrete inputs.

Output format

For each potential bug, your symbolic execution engine should print either:

Ok, if it is unreachable; or,
Problem! plus the corresponding model if the bug is possible on some input.

For example, a working solution would print the following on a modified example from our last lecture:

$ python3 symex.py test-programs/3-shift2.c
Checking function f at test-programs/3-shift2.c:1:6
Checking exit with error
    Ok. Exit with error not possible in this case.
Checking exit with error
    Problem! Exit with error is possible.
    [y = 8388480, x = 8388608]

I will be manually checking these outputs, so feel free to format the model differently. For example, you may want to print the hex or binary version of values to make it easier to check if they do indeed represent a possible bug.

Suggested workflow

The stencil code is set up with the basic shape of a symbolic execution engine, but it has many marked TODO blocks that you will implement. In most cases, the code to access relevant fields of the parser API is already in place.

You can run the stencil code with:

python3 symex.py <filename>

I suggest trying to run your engine on each test file in numbered order, starting with the example from our first SMT lecture:

python3 symex.py test-programs/1-max.c

Initially, this will report just:

$ python3 symex.py test-programs/1-max.c
Checking function check_bounds at test-programs/1-max.c:4:6

Once you have implemented the TODO statements to support the features in 1-max.c, you should see (your model may vary):

Checking function check_bounds at test-programs/1-max.c:4:6
Checking exit with error
    Problem! Exit with error is possible.
    [b = 2724236881, a = 2151743455]

Adding concrete calls to `README`

Written Q1-9: In your README, for each test program 1-9, give examples of a concrete function calls that would cause each potential bug as defined in this assignment.

For example, for max(..., ...); with ... replaced with concrete values. You can test your calls (via running a C compilers).

If no bugs are possible, write "No possible bugs".

Extra credit

For up to 20 points of extra credit, you can extend your engine to handle more interesting feature of C.

One useful feature to implement is assert statements, which allows you to run your engine on 10-extra-credit-bin-search.c. An assert statement needs to modify the path condition for every statement that occurs after it; to implement this, either your path condition data structure needs to be mutable, or you need to modify visit_statement to return the path condition as part of its recursion.

Other ideas of features to support (varying in complexity):

The ternary operator (?).
Simple for statements with static bounds.
Early return statements.
Non-int data types (long, char, etc). I would suggest sticking to integral (non-float) types.

Grading

In your README, also report:
Written Q10: Collaborators other than your partner?
Written Q11: Roughly how long did this assignment take you (in hours)?

(Optional) Written Q12: If you implemented any extra credit, describe what features here.

This assignment is worth 100 points, broken down by:

80 points: Correct verdict on potential bugs
- 10 points per test programs 2-9.
10 points: Style and design.
10 points: README.

Submission

Submit the following files on Gradescope: symex.py and README.{txt, md}.