Today, we’ll wrap up the preliminaries on mathematical logic and introduce a new concept, property-based testing.

Preliminaries: Logic

Propositions and predicates

Proposition

A proposition is a statement that can be true or false.

Wellesley College is in MA and Wellesley College is in CT are propositions.

Give me some chocolate. is a sentence but not a proposition.

Predicate

A predicate is a statement with a variable in it, that is true or false depending on the value of the variable.

x>1 is a predicate. We might write this as P(x)=x>1.

If we have a variable in predicates that are used in an implication P(x)Q(x), we read this as “for all x,P(x)Q(x)”. The full statement is then a proposition. The proposition is false if there exists any x such that P(x) is true but Q(x) is false.

Operations

Basic operations

As a review, the basic logic operators are:

not, ¬

and,

or,

xor,

Implication

Recall the truth table for logical implication:

A B    A => B
-------------
T T    T
T F    F
F T    T
F F    T

Not that P implies Q is the same as not P or Q (you can work out the truth table to convince yourself).

In math:

PQ is equivalent to ¬PQ.

Exercise: Show that PQ is equivalent to ¬Q?.

(Think, then click!)

Solution:
PQ¬Q¬P.

This can be obtained by starting with this definition of implication:

¬PQ

Then double-negating Q:

¬P¬¬Q

Flipping the order of the or:

¬¬Q¬P

And adding parenthesis to get back the shape of an implication:

¬(¬Q)¬P


This is called the contrapositive.

The contrapositive of:
Square implies rectangle.
is
Not rectangle implies not square.

These are equivalent (and true).

Equivalence/bidirectional implication/iff

And the truth table for equivalence, or bidirectional implication:

A B    A <=> B
-------------
T T    T
T F    F
F T    F
F F    T

We often pronounce this as “if and only if” (iff). In many mathematical proofs of equivalences, you prove the “forward” direction and then prove the “backward” direction to show that two propositions are equivalent.

Quantifiers

Quantifiers help us turn predicates into propositions that have a true or false value.

Universal:

“For all”, “for every”.

In math notation: .

Existential:

“For some”, “there exists”, “there is a”.

In math notation: .

Relationship between quantifiers

x.P(x)¬x.¬(P(x))

This has a close relationship with counterexamples, which will be very important in CS340!

“For all inputs, the program produces the right output.”

is equivalent to

“There does not exist an input where the program produces the wrong output.”

We can also negate both sides:

¬(x.P(x))x.¬(P(x))

“It’s not the case that for all inputs, the program produces the right output.”

is equivalent to

“There exists some input where the program produces the wrong output.”

Property-Based Testing (PBT)

Now that we have the preliminaries down, we’ll get back down to business thinking about the correctness of computer systems.

Traditional test cases

In designing software, we write some code, and then maybe we write some tests to see if the code works.

Classically, tests are input and output pairs. Maybe we have multi-step tests that check multiple system outputs.

What are some strengths of testing software via traditional testing?

(Think, then click!)

What are some weaknesses of testing software via traditional testing?

(Think, then click!)

We’ll now dig into an example that highlights some of the weaknesses in traditional unit testing.

Example program: sorting records

Let’s think about a program that sort records (lightweight objects). Each record has a name and an ID.

[
    {name: "Alice", id: 1},
    {name: "Wendy", id: 2},
]

When defining a sort on this list, what do we need to specify?

(Think, then click!)

What are we sorting by, name or id or both? In what order?


Let’s say we just want to sort by names. Call our function sort_records.

Your coworker implements sort_records and puts up a pull request with these tests:

# test 1:
assert(sort_records([{ name: "Alice", id: 1}, { name: "Wendy", id: 2 }]) 
  == [{ name: "Alice", id: 1}, { name: "Wendy", id: 2 }])

# test 2:
assert(sort_records([]) == [])

What’s wrong with the set of these two tests?

(Think, then click!)

They only cover two not very interesting cases:

  1. An already-sorted list.
  2. The empty list.

A good test suite should check for lists that are out-of-order.

# test 3:
assert(sort_records([{ name: "Wendy", id: 2 }, { name: "Alice", id: 1}]) 
  == [{ name: "Alice", id: 1}, { name: "Wendy", id: 2 }])

What else should we test for?

(Think, then click!)

To start with: we should also test for other names, longer names, duplicate names—but that’s just the start!


Under-specification/non-determinism

Consider a list with duplicate names:

[
    {name: "Alice", id: 1},
    {name: "Wendy", id: 2},
    {name: "Alice", id: 5},
]

In our specification as described, we only care that the names are sorted.

So both of these outputs are potentially valid:

    {name: "Alice", id: 1},
    {name: "Alice", id: 5},
    {name: "Wendy", id: 2},
    {name: "Alice", id: 5},
    {name: "Alice", id: 1},
    {name: "Wendy", id: 2},

We might prefer the first output, which is stable. But maybe we don’t care!

Why might we not care?

(Think, then click!)

This means we can’t just write a test case with concrete input-output pairs!

In general, we might care about the properties an output has based on the input to the program, rather than the specific output.

Let’s think about types

What is the high-level type signature of sort_records?

(Think, then click!)
sort_records : List<Record> -> List<Record>

What is the high-level type signature of our testing module?

(Think, then click!)
<one test> : (List<Record>, List<Record>) -> boolean

<many tests> : List<(List<Record>, List<Record>)> -> boolean

We can use these types to guide us to an interesting solution. We need our testing modules to consider inputs and outputs considered by the program under tests, but we do not need to ourselves write the inputs and outputs!

Property-based testing: broken down

The key insights of property-based testing are:

  1. We don’t need to test each concrete input against a provided concrete output (we can instead see what output our program under test produces, then check properties of that output).
  2. We can automatically (semi-randomly) generate concrete inputs, then use the previous process to check the properties of their output.

Checking properties

What properties should be true?

(Think, then click!)

Property 1. The output is sorted with respect to keys.


The good news is: checking if a list is sorted has a faster O() runtime than sorting a list.

Why? (Think, then click!)

Sorting a list with n elements is nlog(n), because we need to compare elements.

We can check if a list is sorted with a single linear scan, so it is O(n). We just need to check that adjacent values in the list never are out-of-order.


Is testing for is_sorted enough?

Consider:

[
    {name: "Alice", id: 1},
    {name: "Wendy", id: 2},
    {name: "Alice", id: 5},
]
-> 
[
    {name: "Alice", id: 1},
    {name: "Alice", id: 5},
]

We also need to check something about elements not disappearing!

Property 2. All records in the input are in the output.

Is this enough?

Consider:

[
    {name: "Alice", id: 1},
    {name: "Wendy", id: 2},
    {name: "Alice", id: 5},
]
-> 
[
    {name: "Alice", id: 1},
    {name: "Alice", id: 5},
    {name: "Wendy", id: 2},
    {name: "Wendy", id: 5},
]

So we also need:

Property 3. All records in the output are in the input.

Property 2 and Property 3 could be combined to say: the input and output values are the same set.

We could now write a function that takes in a concrete input to sort_records, runs sort_records on that input, then checks that our 3 properties are met.

This approach is now resilient to harmless, specified non-determinism!

We’ll call this system “oracle”: it tells you if your answer is correct (up to some definition of correctness).

That’s one half of this solution, and a decent half: maybe you write the oracle and feed it a ton of inputs that you yourself think of. But ideally, we also want to improve the test-generation side.

Generating tests

Our complete goal is to both generate tests and check their validity with our oracle.

generate_input: void -> List<Record>

The core idea is that we can use randomness to escape the issue of not happening to think of the right edge case.

We can break down the problem into pieces:

Have we proven anything?

No! We still aren’t covering the complete input space.

To be continued next lecture!