This week, we'll go entirely beyond the realm of Alloy and look to other tools commonly used to model computer systems.

Solving beyond booleans

As we saw last week, SAT is an incredibly broadly applicable format in which to represent problems. However, the task of encoding a specific problem into SAT can be very difficult, depending on the domain.

Consider the example of addition in Alloy: we discussed last week that Alloy needs to model add on integers with n bits as an n-bit adder. For more complex int arithmetic, Alloy also needs to encode multiplication, etc.

An Satisfiability Modulo Theories (SMT) solver is a satisfiability solver that can handle domain-specific concepts (or "theories") beyond pure boolean logic.

The name "modulo theories" alludes to these solvers still answering the question of satisfiability, but with the addition of (for example) the theory of "bitvectors", which model machine integers (vectors of bits).

The main theories supported by SMT solvers that we'll talk about today:

  1. Bitvectors (machine integers)
  2. Real numbers
  3. Integers

And some more we'll see some of later:
4. Uninterpreted functions
6. Floating pointers
7. Arrays
8. Strings
9. Data types

Difference: design vs implementation

A core difference from Alloy is whether we are modeling based on a separate design specification, or based on the implementation itself.

To use Alloy, an engineer needs to write a new model of their design in the Alloy Analyzer. This is an example of design-level modeling.

SMT solvers use a more flexibly, lower-level format. You can think of this like an assembly code for modeling tools: it's more flexible in how different tools build on top of it.

Many SMT tools work on the implementation of a system itself, rather than a separate design specification (though you could make any choice as a modeler).

Example

For example of an application, consider this program in C, which tries to check the bounds of two variables and their sum:

#define MAX 16
void check_bounds(int a, int b) {
  if (a < MAX) {
    if (b < MAX) {
       if (a + b > 2*MAX) {
          printf("Error!\n");
          exit(1);
       } 
    }
  } 
}

Does this program ever crash with an error? If so, say when.

Think, then click!

Yes, this could hit the error!

C's int type uses a two's complement signed representation. If both a and b are negative numbers, their sum can overflow to a position number that's greater than 32.


We can use an SMT solver to find a specific instance of this bug for us.

The Z3 solver

The most popular SMT solver is one put out by Microsoft, called the Z3 solver. Note that SMT solvers are also often "theorem provers"; more on this later. z3 won an award in 2015 for "developing a software system that has had a lasting influence, reflected in contributions to concepts, in commercial acceptance, or both".

z3 can be used as a library within Python (and Rust, but we'll stick to Python for today).

If you want to follow along in class today, install z3 with:

pip3 install z3-solver

And update to the latest with:

pip install z3-solver --upgrade

You can then use Z3 within a Python file:

from z3 import * 

x = Int('x')
y = Int('y')
solve(x > 2, y < 10, x + 2*y == 7)

Continuing example: bitvector bounds

We can model our previous example in Z3 by using the theory of bitvectors.

First, we can model this directly in the SMT-LIB format. This is a low-level format (you can think of it almost like an assembly code for verification and modeling tools). It uses a syntax with lots of parenthesis, like many functional programming languages (e.g., Lisp). We'll only briefly show this low-level SMT-LIB format; after this, we'll instead work with SMT via libraries in Python or Rust.

;; Create two new variables, that are 32-bit integers
(declare-const a (_ BitVec 32))
(declare-const b (_ BitVec 32))

;; Model if-statement 1:
;; Add a constraint that a < 16 (in hex)
;; "bvslt" is for "bitvector signed less than"
(assert (bvslt a #x00000010))

;; Model if-statement 2
;; Add a constraint that b < 16 (in hex)
(assert (bvslt b #x00000010))

;; Model if-statement 3:
;; Add a constraint that the sum of a+b < 32 (in hex)
(assert (bvsgt (bvadd a b) #x00000020))

;; Check if our model has any satisfying instances
(check-sat)

;; Print one such instance (see the note on terminology below)
(get-model)

If we save this file as ex.smt, we can run this as:

z3 ex.smt

Note, you may need to install z3 as a separate utility (e.g., brew install z3) to run on this file; however, this is not necessary for the assignments in this class where we will use z3 as a library in Python or Rust.

Terminology note: In Alloy, we used the word model to describe the definitions and constraints you use to model a system. This is generally what the software-engineering community means by the word. The logic community, on the other hand, uses model to mean the same thing that we call an instance in Alloy or "assignment" in SAT latest homework: the assignment of values to variables that either satisfies or does not satisfy a set of constraints. Z3's "get model" uses the word "model" like a logician, not a software engineer, and it means the concrete assignment.

Python's z3 library

To make it easier to use z3 without writing code in the low-level format, the remaining examples will use Python's library for z3.

We can model our example now as this:

from z3 import *

# Create a new solver to add constraints to
s = Solver()

# Declare the 32-bit variables
a = BitVec('a', 32)
b = BitVec('b', 32)

# Add constraints. This type, the "<" operator is overloaded to also work for 
# bitvectors, without needing to use the special named version. It means the
# same thing; the library handles the conversion based on the declared types.
s.add(a < 0x00000010)
s.add(b < 0x00000010)
s.add(a + b > 0x00000020)
s.add(a == 0xffffffff)

# Similar to a check/run in Alloy
print(s.check())

# Similar to pressing the "see instance" button in Alloy
print(s.model())

# To print the value of b in hex
print(hex(int(str(s.model().evaluate(b)))))

Eager vs lazy

SMT solvers can be "eager" or "lazy". An "eager" approach immediately translates all constraints to booleans, then asks a basic SAT solver to find the answer. The approach Alloy uses of modeling one boolean variable per bit to handle integers is called "bit-blasting", and is an example eager approach.

Most state-of-the-art general purpose SMT solvers are instead lazy. They instead use specialized domain-specific algorithms and combine those with a purely-SAT core, lazily converting between the two only when needed.

Different numeric types

z3 solvers use several different types to represent numbers. Let's go over a different between integers (mathematical, unbounded ones) and reals.

We'll do so with a question: is there an x such that x squared is in between 4 and 9?

For integers, we expect no! Let's see this in SMT:

# Goal: find an x where x^2 falls in between two square numbers, e.g,
# x^2 greater than 4 but less than 9
def demo_ints():
    s = Solver()
    x = Int('x')

    # Note: power doesn't work in this library, use *
    s.add(x*x > 4)
    s.add(x*x < 9)

    result = s.check() # returns either SAT or UNSAT

    # We can only print the instance if the result is SAT
    if result == sat:
        print(s.model())
    else: 
        print(result)

    # Integers: we expect no answer, get UNSAT result

When we run this, we get unsat, as we expected, since no such integer exists.

What if we change the example to use Real instead of Int?

def demo_reals():
    s = Solver()
    
    # Now: Real, not Int
    x = Real('x')
    s.add(x*x > 4)
    s.add(x*x < 9)

    result = s.check() 
    if result == sat:
        print(s.model())
    else: 
        print(result)
    # Now, we expect a result!

When we run this new version, we get:

[x = 5/2]

When we allow non-whole numbers, an x that meets these constraints does exist! Many such x values exist, we may get a different one if we run z3 multiple times or on multiple different computers.

:star: Student question: what's the different between BitVec and Int?

We can use z3 to show us the difference.

Consider these constraints: is there an x such that x + 1 > x?

If we use the Int type:


def bitvector_diff():
    s = Solver()
    x = Int('x')

    s.add((x + 1) < x)

    result = s.check() 
    if result == sat:
        print(s.model())
    else: 
        print(result)

We get unsat: there is no such x, as we expect.

But if we use the bitvector BitVec type:


def bitvector_diff():
    s = Solver()
    x = BitVec('x', 4)

    s.add((x + 1) < x)

    result = s.check()
    if result == sat:
        print(s.model())
    else: 
        print(result)

We get SAT! [x = 7]. 7 in 4 bits is 0111. When we add 1, we get 1111, which is -1!

In short: bitvectors model overflow, like actual machine integers that have a fixed number of bits.

Engineers using z3 have to decide: do they want their model to actually find overflow bugs? Then they should use the theory of bitvectors. Otherwise (or if they need other mathematical integer properties), they should use the theory of integers.

Showing equivalence with SMT

A frequent modeling task is to show the equivalence of two representations. For example: are two hardware circuit designs equivalent? Does the unoptimized and optimized version of an algorithm produce the same result?

The most direct way to format an equivalence question as a human would be this:

For all inputs, are form A and form B equivalent?

For all is a quantifier, like all in Alloy. Specifically, "for all" is a universal quantifier, while "there exists" is an existential quantifier.

It turns out that it's often easier for modeling systems to reason about existential quantifiers than universal ones: the solver just needs to find one instance to answer the question.

How can we formulate the equivalence question using "there exists" instead of "for all"?

Think, then click!

As we saw earlier in CS340, we can convert this by thinking about what it means for a counterexample to exist.

In general, the statement "for all x, P(x)" where P is some property asks if something is true for all x. For example, "for all Wellesley students, the student is in Workday."

We can instead formulate this question by asking if a counterexample exists (if a counterexample exists, the original statement is false; if a counterexample does not exist, then the original statement is true). "for all x, P(x)" = "Not (exists y, not P(y))".


In terms of our equivalence question and SMT solvers, we can see if A and B are equivalent by asking: is there some world for which A and B are not equal? If the counterexample exists (the SMT solver returns SAT), then equivalence is false. If no counterexample exists (the SMT solver returns UNSAT), then equivalence is true (the outcome we usually want). More on this next time!

Example: implies definition

We saw earlier in CS340 that logical implies can be expressed with not and or. Let's use z3 to show these two forms are equivalent:

# Example: the definition of "P IMPLIES Q"
# represent with "AND, OR, NOT"
        
def demo_implies_equiv():
    s = Solver()
    p = Bool('p')
    q = Bool('q')

    # For all p, q, does "p => q" mean the same as "(not p) or q"
    # Ask the same question: answer is answerable SAT/UNSAT 
    s.add(Not(Implies(p, q) == Or(Not(p), q)))
    print(s.check())
    # print(s.model())
    # UNSAT => no counterexample => equivalent (good)
    # SAT => FOUND counterexample => NOT equivalent (bad!!!!)

Example: mixing quantifiers

In some cases, we need to combine (implicit) "there exists" quantifiers with explicit "for all" quantifiers.

Consider factoring integer polynomials:

def demo_int_factoring():
   # (x + ROOT1)(x + ROOT2) = x^2 - 4

    s = Solver()

    x = Int('x') # Use int because we just are doing math, not worrying about overflow
    r1 = Int('r1')
    r2 = Int('r2')

    rhs = (x*x) - 4
    lhs = (x + r1) * (x + r2)

    # If we try this, it works for *some* x, not *all* x
    # s.add(rhs == lhs)

    # print(s.check())
    # print(s.model())

    # What we want to say here: *there exists* some r1 and r2 such that *for all* x, 
    # (x - r1)(x + r2) = x^2 - 4 

    # For this mix of "there exists" and "for all", it's useful to have an explicit quantifier
    # in the SMT code itself

    # ForAll is an actual operator we can use in SMT

    s.add(ForAll(x, lhs == rhs))
    print(s.check())
    print(s.model())

We can repeat this with real numbers to find solutions in more cases (or no solutions in some cases):

def demo_real_factoring():
    s = Solver()

    x = Real('x') # Use real to not restrict to whole numbers
    r1 = Real('r1')
    r2 = Real('r2')

    lhs = (x + r1) * (x + r2)

    #  Find = x^2 - 198x - 39484

    rhs = (x*x) - 198*x - 39484

    s.add(ForAll(x, lhs == rhs))
    print(s.check())
    print(s.model())

#  demo_real_factoring()

def demo_real_factoring_unsat():
   # (x + ROOT1)(x + ROOT2) = x^2 - 4

    s = Solver()

    x = Real('x') # Use real to not restrict to whole numbers
    r1 = Real('r1')
    r2 = Real('r2')

    lhs = (x + r1) * (x + r2)

    #  Find = x^2 -2x + 5 | (b^2 - 4ac) 
    rhs = (x*x) - 2*x +5

    s.add(ForAll(x, lhs == rhs))
    print(s.check())
    # print(s.model())

demo_real_factoring_unsat()

Theory of uninterpreted functions

SMT also supports uninterpreted functions, which are treated as arbitrary mathematical functions (they pass the veritcal line test) unless further constrained.

def demo_functions():

    s = Solver()

    x = Int('x')
    y = Int('y')

    f = Function('f', IntSort(), IntSort())

    # s.add(f(x) == y)
    s.add(x != y)
    s.add(f(x) == y)
    s.add(f(f(x)) == x)
    
    print(s.check())
    print(s.model())

# demo_functions()

Often, real systems will use uninterpreted functions to model or approximate functions that are not directly available in pure SMT:

def demo_functions_model_function():

    s = Solver()

    x = Int('x')
    y = Int('y')

    square = Function('square', IntSort(), IntSort())
    
    # Full model
    # s.add(ForAll(x, square(x) = x * x))

    # Approximate model
    s.add(ForAll(x, square(x) >= 0))

    print(s.check())
    print(s.model())

# demo_functions_model_function()