Symbolic execution

Unlike most of the modeling we've done so far on system design, symbolic execution is a strategy for reasoning about actual implementations of systems.

The core idea of symbolic execution is to run programs on "symbolic" inputs rather than concrete values. At each operation and program branch, we update the symbolic state to formulas that represent possible concrete values along that program path.

The examples of SMT that we have shown so far (and some that you are working on for current your homework) are essentially hard-coded versions of symbolic execution. A symbolic execution engine talks in program source code and produces the symbolic output, which can (often) tell us what program paths and what outcomes are possible.

Under-the-hood, most state-of-the-art symbolic execution engines use SMT solvers to reason about possible program paths.

Benefits of symbolic execution:

  1. Provide a concrete input on which the program fails to meet some specification.
  2. Analysis is done automatically, via the underlying solver.

Downsides:

  1. Often, in the literature, symbolic execution is unsound: it's used even in cases where not all program paths are explored.

What is a case where symbolic execution might not explore a program path completely?

Think, then click!

If a program has an loop that is not statically bounded(for example, a server always waiting for connections), then it wouldn't be possible for symbolic execution to reason about every possible path, since the number of paths is not statically knowable.


Symbolic execution by example

Consider the following program:

void f (int x, int y) { if (x > y) { int tmp = x; x = y; y = tmp; if (x - y > 0) { exit(-1); } } int z = x << 7; int w = x & 0xFF; if (!w) { exit(-1); } }

In last week's lecture, we loosely went over the idea that we can model nested if statements with and.

Full symbolic execution formalizes this process. At each program point, we track two data structures: (1) a mapping of variables to symbolic values, and (2) the path conditions to get to that program point.

If we treat points in the above program as nodes in a graph, we can draw the path conditions as edges and the symbolic values as annotations on each node:

symex

Notable points:

  1. We start with a path condition of just True.
  2. Most normal statements, like assignments (var = ...), do not modify the path condition.
  3. If statements do update the path condition, for both the if- and else- branches. The condition is added to the if- brand with an And(cond, ...) and added to the else branch with an And((Not(cond)), ...).
  4. Assignments cause us to update our symbolic variable mapping. In class, the strategy we used was to look up whatever variables are in the right hand side of the assignment in the symbolic map, and combine them (with constants as applicable) to build a new symbolic value for the variable.

At each point that could indicate a potential error (here, the exit(1) statements), a symbolic execution engine can use an SMT solver to check whether the path to get to that point is feasible. If these is some assignment of inputs such that the path is reachable, then we have an issue! If the SMT solver returns UNSAT, then the potential bug is unreachable, as desired.

:star: In class, I incorrectly said the first exit(1) was unreachable, and a student aptly pointed out that it is reachable due to overflow! This will become apparent in the next assignment. :sunglasses:

Is it possible to hit the error case on line 14?

Think, then click!

Yes, if the value of x at line 10 is odd. This is the case if the original value of x is odd and the if-statement is not taken, or if the original value of y is odd and the if-statement is taken.


Final project proposal

We spent the remainder of class talking about the final project and the final project proposal; you can find that information here.