Assignment 8: SAT Solver

Learning outcomes

In this assignment, you will:

This is a language-agnostic programming assignment.

Setup

Programming language

You may use any programming language to complete this assignment, with instructor permission. Permission is granted by default for two programming languages: Python and Rust.

You can run this assignment locally, on your own machine, but it will be tested in a UNIX-based environment by the Gradescope autograder. You should not need to make any changes for a Linux or Mac based solution, but if you develop on Windows, I suggest testing your solution on the CS Linux system.

Minimal stencil files

The autograder runs your solver by running a run.sh script (so that your solver can be implemented in either Python or Rust). The run.sh script assumes a certain filename format.

When you are ready to start your implementation, start your solver with the run.sh script and minimal stencil files for the language of your choice found here.

The SAT problem

As we have discussed in class, Alloy (and many other tools) use underlying solvers to analyze models of useful systems. One of the most foundational types of solver is a boolean satisfiability solver, also known as a SAT solver.

The boolean satisfiability problem was the first problem shown to be NP-complete, and it has enormous significance across both the theoretical computer science and practical applications.

In short, the boolean satisfiability problem is this:

The SAT problem: For a given boolean formula, does there exist an assignment of true and false to the variables such that the formula evaluates to true?

Boolean formulas are composed of a finite number of named variables combined with and, or, and not. (More complicated logical operators, such as implies, can be reduced to these 3 core operators, as we saw earlier in CS340).

Conjunctive Normal Form (CNF)

For both algorithmic reasons and to facilitate reuse across solvers, most SAT solver take in boolean formulas in the specific format we covered in class: conjunctive normal form, or CNF.

In CNF, a formula is made up of a list of clauses of literals.

  1. A literal is a boolean variable (e.g., x or it's negation not x).
  2. A clause is a list of literals combined with or (e.g., (x or y or not z)). Mathematically, this is called a disjunction.
  3. The top-level formula in CNF is a list of clauses combined with and (e.g., (x) and (x or y or not z)). Mathematically, this is called a conjunction.

SAT competition format

There is an annual competition to compare SAT solvers, which specifies an even more specific CNF to make it easier to parse inputs and compare outputs.

Input

In addition to rules 1-3 above, the SAT competition format (sections 4.1 and 5.2) also specifies how a formula is encoded as lines of text in a file. Specifically, the format your SAT solver should expect for its input is as follows. Your solver should take in a single filename as its argument. The autograder will only test against example files that meet this format; you do not need to do extensive error checking and can assume the following:

  1. Variables are integers. A positive number \(i\) indicates the literal is the positive form of the variable, a negative number \(-i\) indicates the literal is negated (not i ). For example, a 7 in a clause refers to the 7th variable. A -9 in a clause refers to the negation of the 9th variable.
  2. The file starts with 0 or more comment lines that begin with c followed by a space. Your solver should ignore these lines.
  3. The first non-comment line should have the form p cnf <numvar> <numclauses>, where <numvar> is the number of variables and <numclauses> is the number of clauses.
  4. After the first line, each subsequent line represents a clause. Each clause is a sequence of distinct non-null numbers. Positive numbers indicate the corresponding variable. Negative numbers indicates the negation of the corresponding variable. A clause will not contain the opposite literals i and −i for a given variable. Each literal is separates by a space, and the clause ends with a 0 to indicates the end of the clause. The 0 technically allows clauses to span multiple lines, but the autograder will only test with each clause on exactly one line.

Input CNF example

p cnf 4 3
-1 0
1 3 -2 0
-1 4 0

p cnf 4 3 indicates that there are 4 variables and 3 clauses.

We can read this as describing the formula:

Or, equivalently using variables a, b, c, d:

(!a) and (a or c or !b) and (!a or d)

Or:

\[ (\neg a) \wedge (a \vee c \vee \neg b) \wedge (\neg a \vee d) \]

You can use this online CNF SAT solver to quickly check if an input is correctly formatted.

Output

The SAT competition format specifies the following for the output (written to standard out):

  1. The first line is the solution line, starting with an s. It is either:
    • s SATISFIABLE
      • This indicates that the formula is satisfiable.
    • s UNSATISFIABLE
      • This indicates that the formula is unsatisfiable, e.g., no assignment that satisfies the formula exists.
  2. If the formula is satisfiable, the following lines indicates the value mapping. Each starts with a v, then a space, then a space-separated list of literals, with positive numbers indicating a true assignment and negative numbers indicating a false assignment. The final value line is terminated with a 0.

Output example

A correct solution could output the following for the input example. Note the exact literals chosen may differ, since there can be multiple valid assignments for a given problem.

s SATISFIABLE
v -1 -2 3 -4 0

Your task: Solver

Your task for this assignment is to implement a simplified version of the classic DPLL algorithm for SAT solving.

The two cores idea you will implement are:

  1. Unit propagation (finding literals on their own and using them to simplify other clauses).
  2. Backtracking search (choosing a variable and guessing true or false; trying the other option if the first fails).

Unit propagation

As we discussed in class, unit propagation captures the same idea we as humans likely use when trying to find a solution to a SAT problem.

If some clause has only a single literal, then we know that the final assignment must assign that variable the value of the variable. The intuition is that if we have x and <anything>, we know x must be assigned \(true\); if we have !x and <anything>, we know x must be assigned \(false\).

The propagation component is that we can use this knowledge of a single variable's value to simplify other clauses.

As we saw in class, there are two cases:

  1. Case 1: Any clause that contains the literal, with the same sign, is now also satisfied (because we have committed to that literal in our mapping). We can remove the entire clause (or otherwise mark it as satisfied, depending on how you structure your solution).
  2. Case 2: Any cause that contains the literal, with the opposite sign, will not have this literal be the reason it is satisfied (because we have committed to the opposite assignment of the literal in our mapping). We can remove the negated literal from the clause (potentially creating an empty clause, depending on how you structure your solution).

It is up to you how exact to model changes to the boolean formula. You can either construct a new boolean formula as the result of unit propagation, or mutate the existing formula, as long as you are using a data structure design that enables the previous formula to still be recoverable during the backtracking search.

Some possible pseudocode:

unit_propagate(formula, mapping):
  for each unit clause {+/-x} in formula:
    add {+/-x} to mapping
    // Case 1
    // Same sign only!
    remove all clauses containing +/-x
    // Case 2
    // Flipped sign only!
    remove all instances of -/+x in every clause
  return formula

Note that this step may remove variables that are not needed to make the entire formula true. It's up to you whether your solution tracks all variables at the start, or tracks the potentially-removed variables here.

The DPLL implements backtracking search via a recursive algorithm.

In each recursive call, DPLL first propagates all unit clauses. After that, one of the following must be true:

  1. No clauses remain. Because DPLL removes satisfied clauses, this means the search has succeeded with a successful SAT result.
  2. The empty clause exists in the list of clauses. Because DPLL removes unhelpful literals to a clause, this means it derived a contradiction, which means this path is UNSAT.
  3. Neither of the above cases hold, which means DPLL must make a guess; then potentially backtrack if the guess was not successful.

Full algorithm

Possible pseudocode for the recursive backtracking is included below. You do not need to follow this exact pseudocode, but your submission should include both unit propagation and backtracking search.

You are free to look up other pseudocode for DPLL online. However, for academic integrity, please do not reference other full implementations of DPLL.

You are free to use libraries to aid in your implementation, but please do not use any libraries designed for SAT solving (since that is the point of this assignment!).

// Inputs: formula (list of clauses), assignments (mapping)
dpll(formula, assignments):
  while formula contains unit clause:
    // Unit propagate potentially updates assignments and the formula
    formula = unit_propagate(formula, assignments) 
        
  if no clauses remain:
    // Since we remove satisfied clauses, this means we are done and have a SAT 
    // result! :)
    return SAT, assignments
        
  if formula has empty clause:
    // Since we remove unhelpful literals to a clause, this means we must have
    // hit a contradiction, so this path in the search is UNSAT :(
    return UNSAT

  // You can use any strategy you'd like to pick a variable to split on
  x = pick_variable(formula)
      
  // Guess that the variable is true
  // Note: use a copy of assignments because we may need to backtrack and
  // forget the assignments used during this solve attempt
  asgn_copy = assignments + {x : true}
  true_result = dpll(formula + [x], asgn_copy)
      
  // If our guess worked out, then we are done! :)
  if true_result is SAT
    return true_result

  // If our guess did not work out (true_result is UNSAT), try making the 
  // opposite guess for the same variable
  // Guess that the variable is false
    asgn_copy = assignments + {x : false}
  false_result = dpll(formula + [!x], asgn_copy)
  // There is no other possible value for x, so return whatever we have,
  // regardless of whether it is SAT or UNSAT
  return false_result
}


// Be sure to also pick arbritrary values for variables not needed in the final
// assignments.

Extra credit: pure elimination [Independent]

The full DPLL algorithm implements an additional strategy, pure elimination.

A literal is "pure" if, whenever it appears in a clause, it always has the same sign. In that case, DPLL commits to that sign in the mapping then simplifies the formula based on that decision.

You can choose to implement pure elimination for up to 10 extra credit points. If you do so, document so and describe your addition in your README.

Examples (:star: workshop)

Here are a few examples that walk through this simplified DPLL algorithm. We'll walk through these in groups together in workshop.

Example 1

If we had the following input:

p cnf 5 4
-1 0
1 3 -2 0
1 5 0
4 -5 0

First, draw a representation of the list of clauses and the initial mapping for this formula. Once your group has your answer, check it below.

Expand once you have your own version of `formula` and `assignments` for this example

One possible way to draw the formula (you will likely have a more complicated data structure for this in your actual implementation!). We start with an empty assignment mapping.

Formula:
[
  [-1],
  [1, 3, -2]
  [1, 5]
  [4, -5]
]

Assignment:
{}

Next, draw new version of the formula and the mapping (don't erase your old version) after each round of unit propagation and each recursive call to the algorithm. Check your group's answers with Alexa, then expand to see one possible solution below.

Expand once you've checked your version with Alexa.

The first clause is a unit clause, so we commit to 1 being \(false\) and propagates that knowledge.

Formula:
[
  [3, -2]
  [5]
  [4, -5]
]

Assignment:
{
  1 : false,
}

This created another unit clause for 5, so we propagate again:

Formula:
[
  [3, -2]
  [4]
]

Assignment:
{
  1 : false,
  5: true,
}

This created another unit clause for 5, so we propagate again:

Formula:
[
  [3, -2]
]

Assignment:
{
  1 : false,
  5: true,
  4: true,
}

Now, there are no more unit literals, but we have neither an empty list of clauses nor have we reached the empty clause (a contradiction). So, we make a guess: let's say 3 is \(true\). Here is one way to model that:

Formula:
[
  [3]
  [3, -2]
]

Assignment:
{
  1 : false,
  5: true,
  4: true,
  3: true,
}

Which gets simplified in the next call to:

Formula:
[
]

Assignment:
{
  1 : false,
  5: true,
  4: true,
  3: true,
}

We now hit the success case of an empty list of clauses!
Since 2 has no assignment, we can pick any assignment for it, and return SAT with our assignment.

We should print:

s SATISFIABLE
v -1 2 3 4 5 0

We can sanity check that this makes sense by mentally checking that for each clause line in the input, at least one of its literals has the same sign in the solution as it does in that clause.


Example 2

Repeat the above exercise with the following input:

p cnf 2 4
1 2 0
-1 2 0
1 -2 0
-1 -2 0
Expand once you've checked your version with Alexa.

We would start with:

Formula:
[
  [1, 2],
  [-1, 2]
  [1, -2]
  [-1, -2]
]

Assignment:
{}

We have no unit clauses, so we might guess 1 to be true:

Formula:
[
  [1]
  [1, 2],
  [-1, 2]
  [1, -2]
  [-1, -2]
]

Assignment:
{
  1 : true,
}

Which after unit prop results in first:

Formula:
[
  [2]
  [-2]
]

Assignment:
{
  1 : true,
}

Then this intermediate step:

Formula:
[
  [2]
  [-2]
]

Assignment:
{
  1 : true,
  2 : true,
}

Which results in:

Formula:
[
  []
]

Assignment:
{
  1 : true,
  2 : true,
}

We have an empty clause! Note that this is different from having no clauses; the empty clause is unsatisfiable. Hence our guess was wrong. We now need to backtrack and assign 1 to be false.

However, we also hit the empty clause in the false guess, which means the entire formula is truly unsatisfiable. We should then print:

s UNSATISFIABLE

Design

In your README, document the following:

Written Q1: Describe the purpose and design of your solver, in your own words, as if this were a publicly accessible project. (You do not need to document the input/output format beyond stating that it is the SAT competition format).
Written Q2: What data structures did you use to represent the formula and the assignment mapping? Did you mutate each, or return new copies for unit propagation and each recursive call?
Written Q3: What strategy does your solver take for choosing which variable and assignment to guess?

Testing

Create at least 5 new expected SAT formulas and 5 new expected UNSAT formulas to test your solver. These should be in addition to the examples given here.

You can add tests either directly as your own formula representation, as strings in the specified format, or as new files in the specified format.

Implement automatic testing such that your tests are all run with a single command line call (this can be accomplished by either adding a new executable file that calls functions from your main file, or by using a unit testing library, or by adding a script that passes each file into your solver).

Answer the following in your README.

Written Q4: Give the command line call to execute that runs your tests. Describe a few examples of what different tests are checking for.

Extra credit: property based testing

For up to 20 points of extra credit, implement a randomized test generator for your solver. You can either use your own generator, or use a PBT library such as Hypothesis.

Note that you will likely not receive extra credit if your baseline solver does not pass the correctness tests.

Grading

In your README, also report:
Written Q5: Collaborators you discussed or debugged with?
Written Q6: Roughly how long did this assignment take you (in hours)?
(Optional) Written Q7: If you chose to implement either extra credit option, describe your additions here.

This assignment is worth 100 points, broken down by:

60 points: Solver correctness

20 points: Design of your solver

10 points: Solver test suite

10 points: README

Submission

Submit the following files on Gradescope: run.sh, README.{txt, md}, and the file(s) that implement your solution.

Attribution

This assignment is based on a previous version of the SAT assignment from Tim Nelson's Logic for Systems at Brown University.