Today, we’ll wrap up the preliminaries on mathematical logic and introduce a new concept, property-based testing.

Preliminaries: more on relations

Cartesian Product

The cartesian product of sets $A$ and $B$ is defined as:

$A \times B = {(a, b) | a \in A, b \in B}$

That is, the cartesian product is all possible combinations of the elements in set $A$ and set $B$ .

Exercise: if $A$ has size $M$ (which we can write as $| A | = M$ ) and $B$ has size $N$ , what is the size of $A \times B$ ?

(Click for answer)

$A \times B$ has size $M \cdot N$ .

We can have a cartesian product on a single set:

$S \times S = S^{2} = {(x, y) | x, y \in S}$

which is the same as:

$S \times S = S^{2} = {(x, y) | x \in S, y \in S}$

Relation definition, again

Now that we have cartesian products, we can more formally define a relation.

A relation $R$ on set $A \times B$ is any subset $R \subseteq A \times B$ .

Relation properties

For a relation $R$ on $A \times A$ , we can define useful properties:

Reflexive

$R$ is reflexive if for all $a \in A, (a, a) \in R$ .

“equal to” is reflexive.

Symmetric

$R$ is symmetric if for all $a, b \in A, (a, b) \in R \Rightarrow (b, a) \in R$ .

“classmate in CS340” is symmetric.

“student of” is not symmetric.

Transitive

$R$ is transitive if for all $a, b, c \in A, (a, b) \in R \land (b, c) \in R \Rightarrow (a, c) \in R$ .

“lives in the same city” is transitive.

Functions

Mathematical functions can be defined as a special kind of relation, where each value $a$ in $(a, b) \in R$ has one distinct value $b$ .

Said more formally:

A function $F$ on sets $A$ and $B$ is a subset $F \in A \times B$ such that:

For all $a \in A$ , these exists a $b \in B$ such that $(a, b) \in F$ , and
$(a, b) \in F \land (a, c) \in F \Rightarrow b = c$ .

The first condition says that every element in $A$ maps to something, and the second condition says that it maps to only one distinct thing.

programmer’s view: Python dictionaries are a close mental model for functions.

In class, we drew out the 2D plane to relate this to the vertical line test for determining if a line in the plane is a function (the 2D plane is a visual representation of $R^{2}$ ). A unit circle is not a function, because it does not pass the vertical line test, but it is a relation on $R \times R = R^{2}$ .

If $C$ is the unit circle:

$(0, 1) \in C$
$(0, - 1) \in C$
$C$ is a relation, $C \in R^{2}$ .

$C$ is not a function because $1 \neq - 1$ (visually, $C$ fails the vertical line test at x=0).

Propositions and predicates

Proposition

A proposition is a statement that can be true or false.

Wellesley College is in MA and Wellesley College is in CT are propositions.

Give me some chocolate. is a sentence but not a proposition.

Predicate

A predicate is a statement with a variable in it, that is true or false depending on the value of the variable.

$x > 1$ is a predicate. We might write this as $P (x) = x > 1$ .

If we have a variable in predicates that are used in an implication $P (x) \Rightarrow Q (x)$ , we read this as “for all $x, P (x) \Rightarrow Q (x)$ ”. The full statement is then a proposition. The proposition is false if there exists any $x$ such that $P (x)$ is $t r u e$ but $Q (x)$ is $f a l s e$ .

Operations

Basic operations

As a review, the basic logic operators are:

not, $\neg$

and, $\land$

or, $\lor$

xor, $\oplus$

Implication $\Rightarrow$

Recall the truth table for logical implication:

A B    A => B
-------------
T T    T
T F    F
F T    T
F F    T

Not that P implies Q is the same as not P or Q (you can work out the truth table to convince yourself).

In math:

$P \Rightarrow Q$ is equivalent to $\neg P \lor Q$ .

Exercise: Show that $P \Rightarrow Q$ is equivalent to $\neg Q \Rightarrow ?$ .

(Think, then click!)

Solution:
$P \Rightarrow Q \Leftrightarrow \neg Q \Rightarrow \neg P$ .

This can be obtained by starting with this definition of implication:

$\neg P \lor Q$

Then double-negating Q:

$\neg P \lor \neg \neg Q$

Flipping the order of the or:

$\neg \neg Q \lor \neg P$

And adding parenthesis to get back the shape of an implication:

$\neg (\neg Q) \lor \neg P$

This is called the contrapositive.

The contrapositive of:
Square implies rectangle.
is
Not rectangle implies not square.

These are equivalent (and $t r u e$ ).

Equivalence/bidirectional implication/iff

And the truth table for equivalence, or bidirectional implication:

A B    A <=> B
-------------
T T    T
T F    F
F T    F
F F    T

We often pronounce this as “if and only if” (iff). In many mathematical proofs of equivalences, you prove the “forward” direction and then prove the “backward” direction to show that two propositions are equivalent.

Quantifiers

Quantifiers help us turn predicates into propositions that have a $t r u e$ or $f a l s e$ value.

Universal:

“For all”, “for every”.

In math notation: $\forall$ .

Existential:

“For some”, “there exists”, “there is a”.

In math notation: $\exists$ .

Relationship between quantifiers

$\forall x . P (x) \Leftrightarrow \neg \exists x . \neg (P (x))$

This has a close relationship with counterexamples, which will be very important in CS340!

“For all inputs, the program produces the right output.”

is equivalent to

“There does not exist an input where the program produces the wrong output.”

We can also negate both sides:

$\neg (\forall x . P (x)) \Leftrightarrow \exists x . \neg (P (x))$

“It’s not the case that for all inputs, the program produces the right output.”

is equivalent to

“There exists some input where the program produces the wrong output.”

Property-Based Testing (PBT)

Now that we have the preliminaries down, we’ll get back down to business thinking about the correctness of computer systems.

Traditional test cases

In designing software, we write some code, and then maybe we write some tests to see if the code works.

Classically, tests are input and output pairs. Maybe we have multi-step tests that check multiple system outputs.

What are some strengths of testing software via traditional testing?

(Think, then click!)

(Relatively) easy to write.
Automated (maybe).
Usually pretty fast.
Can check for specific devious edge cases—the empty list, null data, etc.
Helps you detect if your code breaks from a later change.

What are some weaknesses of testing software via traditional testing?

(Think, then click!)

You likely will forget some edge cases, and maybe some are important.
They can be tedious to write.
There may be infinite possible inputs, so you can’t test them all.

We’ll now dig into an example that highlights some of the weaknesses in traditional unit testing.

Example program: sorting records

Let’s think about a program that sort records (lightweight objects). Each record has a name and an ID.

[
    {name: "Alice", id: 1},
    {name: "Wendy", id: 2},
]

When defining a sort on this list, what do we need to specify?

(Think, then click!)

What are we sorting by, name or id or both? In what order?

Let’s say we just want to sort by names. Call our function sort_records.

Your coworker implements sort_records and puts up a pull request with these tests:

# test 1:
assert(sort_records([{ name: "Alice", id: 1}, { name: "Wendy", id: 2 }])
  == [{ name: "Alice", id: 1}, { name: "Wendy", id: 2 }])

# test 2:
assert(sort_records([]) == [])

What’s wrong with the set of these two tests?

(Think, then click!)

They only cover two not very interesting cases:

An already-sorted list.
The empty list.

A good test suite should check for lists that are out-of-order.

# test 3:
assert(sort_records([{ name: "Wendy", id: 2 }, { name: "Alice", id: 1}])
  == [{ name: "Alice", id: 1}, { name: "Wendy", id: 2 }])

What else should we test for?

(Think, then click!)

To start with: we should also test for other names, longer names, duplicate names—but that’s just the start!

Under-specification/non-determinism

Consider a list with duplicate names:

[
    {name: "Alice", id: 1},
    {name: "Wendy", id: 2},
    {name: "Alice", id: 5},
]

In our specification as described, we only care that the names are sorted.

So both of these outputs are potentially valid:

    {name: "Alice", id: 1},
    {name: "Alice", id: 5},
    {name: "Wendy", id: 2},

    {name: "Alice", id: 5},
    {name: "Alice", id: 1},
    {name: "Wendy", id: 2},

We might prefer the first output, which is stable. But maybe we don’t care!

Why might we not care?

(Think, then click!)

We can think of testing as just testing the interface, not the implementation.
Maybe the other version is faster. We don’t want the tests to start failing because someone made a harmless change to the code.

This means we can’t just write a test case with concrete input-output pairs!

In general, we might care about the properties an output has based on the input to the program, rather than the specific output.

Let’s think about types

What is the high-level type signature of sort_records?

(Think, then click!)

sort_records : List<Record> -> List<Record>

What is the high-level type signature of our testing module?

(Think, then click!)

<one test> : (List<Record>, List<Record>) -> boolean

<many tests> : List<(List<Record>, List<Record>)> -> boolean

We can use these types to guide us to an interesting solution. We need our testing modules to consider inputs and outputs considered by the program under tests, but we do not need to ourselves write the inputs and outputs!

Property-based testing: broken down

The key insights of property-based testing are:

We don’t need to test each concrete input against a provided concrete output (we can instead see what output our program under test produces, then check properties of that output).
We can automatically (semi-randomly) generate concrete inputs, then use the previous process to check the properties of their output.

Checking properties

What properties should be true?

(Think, then click!)

Property 1. The output is sorted with respect to keys.

The good news is: checking if a list is sorted has a faster $O (\dots)$ runtime than sorting a list.

Why? (Think, then click!)

Sorting a list with $n$ elements is $n l o g (n)$ , because we need to compare elements.

We can check if a list is sorted with a single linear scan, so it is $O (n)$ . We just need to check that adjacent values in the list never are out-of-order.

Is testing for is_sorted enough?

Consider:

[
    {name: "Alice", id: 1},
    {name: "Wendy", id: 2},
    {name: "Alice", id: 5},
]
->
[
    {name: "Alice", id: 1},
    {name: "Alice", id: 5},
]

We also need to check something about elements not disappearing!

Property 2. All records in the input are in the output.

Is this enough?

Consider:

[
    {name: "Alice", id: 1},
    {name: "Wendy", id: 2},
    {name: "Alice", id: 5},
]
->
[
    {name: "Alice", id: 1},
    {name: "Alice", id: 5},
    {name: "Wendy", id: 2},
    {name: "Wendy", id: 5},
]

So we also need:

Property 3. All records in the output are in the input.

Property 2 and Property 3 could be combined to say: the input and output values are the same set.

We could now write a function that takes in a concrete input to sort_records, runs sort_records on that input, then checks that our 3 properties are met.

This approach is now resilient to harmless, specified non-determinism!

We’ll call this system “oracle”: it tells you if your answer is correct (up to some definition of correctness).

That’s one half of this solution, and a decent half: maybe you write the oracle and feed it a ton of inputs that you yourself think of. But ideally, we also want to improve the test-generation side.

Generating tests

Our complete goal is to both generate tests and check their validity with our oracle.

generate_input: void -> List<Record>

The core idea is that we can use randomness to escape the issue of not happening to think of the right edge case.

We can break down the problem into pieces:

Generate random strings with letters, numbers, non-alphanumeric characters
Generate records consisting of these random strings
Generate lists of varying lengths consisting of these random records

Have we proven anything?

No! We still aren’t covering the complete input space.

To be continued next lecture!

Preliminaries: more on relations

Cartesian Product

Relation definition, again

Relation properties

Reflexive

Symmetric

Transitive

Functions

Propositions and predicates

Proposition

Predicate

Operations

Basic operations

Implication ⇒

Equivalence/bidirectional implication/iff

Quantifiers

Universal:

Existential:

Relationship between quantifiers

Property-Based Testing (PBT)

Traditional test cases

Example program: sorting records

Under-specification/non-determinism

Let’s think about types

Property-based testing: broken down

Checking properties

Generating tests

Have we proven anything?

Implication $\Rightarrow$