Today, we’ll wrap up the preliminaries on mathematical logic and introduce a new concept, property-based testing.
Preliminaries: Logic
Propositions and predicates
Proposition
A proposition is a statement that can be true or false.
Wellesley College is in MA
and Wellesley College is in CT
are propositions.
Give me some chocolate.
is a sentence but not a proposition.
Predicate
A predicate is a statement with a variable in it, that is true or false depending on the value of the variable.
is a predicate. We might write this as .
If we have a variable in predicates that are used in an implication , we read this as “for all ”. The full statement is then a proposition. The proposition is false if there exists any such that is but is .
Operations
Basic operations
As a review, the basic logic operators are:
not
,
and
,
or
,
xor
,
Implication
Recall the truth table for logical implication:
A B A => B
-------------
T T T
T F F
F T T
F F T
Not that P implies Q
is the same as not P or Q
(you can work out the truth table to convince yourself).
In math:
Exercise: Show that is equivalent to .
(Think, then click!)
Solution:
.
This can be obtained by starting with this definition of implication:
Then double-negating Q
:
Flipping the order of the or
:
And adding parenthesis to get back the shape of an implication:
This is called the contrapositive.
The contrapositive of:
Square implies rectangle.
is
Not rectangle implies not square.
These are equivalent (and ).
Equivalence/bidirectional implication/iff
And the truth table for equivalence, or bidirectional implication:
A B A <=> B
-------------
T T T
T F F
F T F
F F T
We often pronounce this as “if and only if” (iff). In many mathematical proofs of equivalences, you prove the “forward” direction and then prove the “backward” direction to show that two propositions are equivalent.
Quantifiers
Quantifiers help us turn predicates into propositions that have a or value.
Universal:
“For all”, “for every”.
In math notation: .
Existential:
“For some”, “there exists”, “there is a”.
In math notation: .
Relationship between quantifiers
This has a close relationship with counterexamples, which will be very important in CS340!
“For all inputs, the program produces the right output.”
is equivalent to
“There does not exist an input where the program produces the wrong output.”
We can also negate both sides:
“It’s not the case that for all inputs, the program produces the right output.”
is equivalent to
“There exists some input where the program produces the wrong output.”
Property-Based Testing (PBT)
Now that we have the preliminaries down, we’ll get back down to business thinking about the correctness of computer systems.
Traditional test cases
In designing software, we write some code, and then maybe we write some tests to see if the code works.
Classically, tests are input and output pairs. Maybe we have multi-step tests that check multiple system outputs.
What are some strengths of testing software via traditional testing?
(Think, then click!)
- (Relatively) easy to write.
- Automated (maybe).
- Usually pretty fast.
- Can check for specific devious edge cases—the empty list, null data, etc.
- Helps you detect if your code breaks from a later change.
What are some weaknesses of testing software via traditional testing?
(Think, then click!)
- You likely will forget some edge cases, and maybe some are important.
- They can be tedious to write.
- There may be infinite possible inputs, so you can’t test them all.
We’ll now dig into an example that highlights some of the weaknesses in traditional unit testing.
Example program: sorting records
Let’s think about a program that sort records (lightweight objects). Each record has a name and an ID.
[
{name: "Alice", id: 1},
{name: "Wendy", id: 2},
]
When defining a sort on this list, what do we need to specify?
(Think, then click!)
What are we sorting by, name or id or both? In what order?
Let’s say we just want to sort by names. Call our function sort_records
.
Your coworker implements sort_records
and puts up a pull request with these tests:
assert(sort_records([{ name: "Alice", id: 1}, { name: "Wendy", id: 2 }])
== [{ name: "Alice", id: 1}, { name: "Wendy", id: 2 }])
assert(sort_records([]) == [])
What’s wrong with the set of these two tests?
(Think, then click!)
They only cover two not very interesting cases:
- An already-sorted list.
- The empty list.
A good test suite should check for lists that are out-of-order.
assert(sort_records([{ name: "Wendy", id: 2 }, { name: "Alice", id: 1}])
== [{ name: "Alice", id: 1}, { name: "Wendy", id: 2 }])
What else should we test for?
(Think, then click!)
To start with: we should also test for other names, longer names, duplicate names—but that’s just the start!
Under-specification/non-determinism
Consider a list with duplicate names:
[
{name: "Alice", id: 1},
{name: "Wendy", id: 2},
{name: "Alice", id: 5},
]
In our specification as described, we only care that the names are sorted.
So both of these outputs are potentially valid:
{name: "Alice", id: 1},
{name: "Alice", id: 5},
{name: "Wendy", id: 2},
{name: "Alice", id: 5},
{name: "Alice", id: 1},
{name: "Wendy", id: 2},
We might prefer the first output, which is stable. But maybe we don’t care!
Why might we not care?
(Think, then click!)
- We can think of testing as just testing the interface, not the implementation.
- Maybe the other version is faster. We don’t want the tests to start failing because someone made a harmless change to the code.
This means we can’t just write a test case with concrete input-output pairs!
In general, we might care about the properties an output has based on the input to the program, rather than the specific output.
Let’s think about types
What is the high-level type signature of sort_records
?
(Think, then click!)
sort_records : List<Record> -> List<Record>
What is the high-level type signature of our testing module?
(Think, then click!)
<one test> : (List<Record>, List<Record>) -> boolean
<many tests> : List<(List<Record>, List<Record>)> -> boolean
We can use these types to guide us to an interesting solution. We need our testing modules to consider inputs and outputs considered by the program under tests, but we do not need to ourselves write the inputs and outputs!
Property-based testing: broken down
The key insights of property-based testing are:
- We don’t need to test each concrete input against a provided concrete output (we can instead see what output our program under test produces, then check properties of that output).
- We can automatically (semi-randomly) generate concrete inputs, then use the previous process to check the properties of their output.
Checking properties
What properties should be true?
(Think, then click!)
Property 1. The output is sorted with respect to keys.
The good news is: checking if a list is sorted has a faster runtime than sorting a list.
Why? (Think, then click!)
Sorting a list with elements is , because we need to compare elements.
We can check if a list is sorted with a single linear scan, so it is . We just need to check that adjacent values in the list never are out-of-order.
Is testing for is_sorted
enough?
Consider:
[
{name: "Alice", id: 1},
{name: "Wendy", id: 2},
{name: "Alice", id: 5},
]
->
[
{name: "Alice", id: 1},
{name: "Alice", id: 5},
]
We also need to check something about elements not disappearing!
Property 2. All records in the input are in the output.
Is this enough?
Consider:
[
{name: "Alice", id: 1},
{name: "Wendy", id: 2},
{name: "Alice", id: 5},
]
->
[
{name: "Alice", id: 1},
{name: "Alice", id: 5},
{name: "Wendy", id: 2},
{name: "Wendy", id: 5},
]
So we also need:
Property 3. All records in the output are in the input.
Property 2 and Property 3 could be combined to say: the input and output values are the same set.
We could now write a function that takes in a concrete input to sort_records
, runs sort_records
on that input, then checks that our 3 properties are met.
This approach is now resilient to harmless, specified non-determinism!
We’ll call this system “oracle”: it tells you if your answer is correct (up to some definition of correctness).
That’s one half of this solution, and a decent half: maybe you write the oracle and feed it a ton of inputs that you yourself think of. But ideally, we also want to improve the test-generation side.
Generating tests
Our complete goal is to both generate tests and check their validity with our oracle.
generate_input: void -> List<Record>
The core idea is that we can use randomness to escape the issue of not happening to think of the right edge case.
We can break down the problem into pieces:
- Generate random strings with letters, numbers, non-alphanumeric characters
- Generate records consisting of these random strings
- Generate lists of varying lengths consisting of these random records
Have we proven anything?
No! We still aren’t covering the complete input space.
To be continued next lecture!
Today, we’ll wrap up the preliminaries on mathematical logic and introduce a new concept, property-based testing.
Preliminaries: Logic
Propositions and predicates
Proposition
A proposition is a statement that can be true or false.
Wellesley College is in MA
andWellesley College is in CT
are propositions.Give me some chocolate.
is a sentence but not a proposition.Predicate
A predicate is a statement with a variable in it, that is true or false depending on the value of the variable.
If we have a variable in predicates that are used in an implication , we read this as “for all ”. The full statement is then a proposition. The proposition is false if there exists any such that is but is .
Operations
Basic operations
As a review, the basic logic operators are:
not
,and
,or
,xor
,Implication
Recall the truth table for logical implication:
Not that
P implies Q
is the same asnot P or Q
(you can work out the truth table to convince yourself).In math:
Exercise: Show that is equivalent to .
(Think, then click!)
Solution:
.
This can be obtained by starting with this definition of implication:
Then double-negating
Q
:Flipping the order of the
or
:And adding parenthesis to get back the shape of an implication:
This is called the contrapositive.
The contrapositive of:
Square implies rectangle.
is
Not rectangle implies not square.
These are equivalent (and ).
Equivalence/bidirectional implication/iff
And the truth table for equivalence, or bidirectional implication:
We often pronounce this as “if and only if” (iff). In many mathematical proofs of equivalences, you prove the “forward” direction and then prove the “backward” direction to show that two propositions are equivalent.
Quantifiers
Quantifiers help us turn predicates into propositions that have a or value.
Universal:
“For all”, “for every”.
In math notation: .
Existential:
“For some”, “there exists”, “there is a”.
In math notation: .
Relationship between quantifiers
This has a close relationship with counterexamples, which will be very important in CS340!
“For all inputs, the program produces the right output.”
is equivalent to
“There does not exist an input where the program produces the wrong output.”
We can also negate both sides:
“It’s not the case that for all inputs, the program produces the right output.”
is equivalent to
“There exists some input where the program produces the wrong output.”
Property-Based Testing (PBT)
Now that we have the preliminaries down, we’ll get back down to business thinking about the correctness of computer systems.
Traditional test cases
In designing software, we write some code, and then maybe we write some tests to see if the code works.
Classically, tests are input and output pairs. Maybe we have multi-step tests that check multiple system outputs.
What are some strengths of testing software via traditional testing?
(Think, then click!)
What are some weaknesses of testing software via traditional testing?
(Think, then click!)
We’ll now dig into an example that highlights some of the weaknesses in traditional unit testing.
Example program: sorting records
Let’s think about a program that sort records (lightweight objects). Each record has a name and an ID.
[ {name: "Alice", id: 1}, {name: "Wendy", id: 2}, ]
When defining a sort on this list, what do we need to specify?
(Think, then click!)
What are we sorting by, name or id or both? In what order?
Let’s say we just want to sort by names. Call our function
sort_records
.Your coworker implements
sort_records
and puts up a pull request with these tests:# test 1: assert(sort_records([{ name: "Alice", id: 1}, { name: "Wendy", id: 2 }]) == [{ name: "Alice", id: 1}, { name: "Wendy", id: 2 }]) # test 2: assert(sort_records([]) == [])
What’s wrong with the set of these two tests?
(Think, then click!)
They only cover two not very interesting cases:
A good test suite should check for lists that are out-of-order.
# test 3: assert(sort_records([{ name: "Wendy", id: 2 }, { name: "Alice", id: 1}]) == [{ name: "Alice", id: 1}, { name: "Wendy", id: 2 }])
What else should we test for?
(Think, then click!)
To start with: we should also test for other names, longer names, duplicate names—but that’s just the start!
Under-specification/non-determinism
Consider a list with duplicate names:
[ {name: "Alice", id: 1}, {name: "Wendy", id: 2}, {name: "Alice", id: 5}, ]
In our specification as described, we only care that the names are sorted.
So both of these outputs are potentially valid:
{name: "Alice", id: 1}, {name: "Alice", id: 5}, {name: "Wendy", id: 2},
{name: "Alice", id: 5}, {name: "Alice", id: 1}, {name: "Wendy", id: 2},
We might prefer the first output, which is stable. But maybe we don’t care!
Why might we not care?
(Think, then click!)
This means we can’t just write a test case with concrete input-output pairs!
In general, we might care about the properties an output has based on the input to the program, rather than the specific output.
Let’s think about types
What is the high-level type signature of
sort_records
?(Think, then click!)
What is the high-level type signature of our testing module?
(Think, then click!)
We can use these types to guide us to an interesting solution. We need our testing modules to consider inputs and outputs considered by the program under tests, but we do not need to ourselves write the inputs and outputs!
Property-based testing: broken down
The key insights of property-based testing are:
Checking properties
What properties should be true?
(Think, then click!)
Property 1. The output is sorted with respect to keys.
The good news is: checking if a list is sorted has a faster runtime than sorting a list.
Why? (Think, then click!)
Sorting a list with elements is , because we need to compare elements.
We can check if a list is sorted with a single linear scan, so it is . We just need to check that adjacent values in the list never are out-of-order.
Is testing for
is_sorted
enough?Consider:
[ {name: "Alice", id: 1}, {name: "Wendy", id: 2}, {name: "Alice", id: 5}, ] -> [ {name: "Alice", id: 1}, {name: "Alice", id: 5}, ]
We also need to check something about elements not disappearing!
Property 2. All records in the input are in the output.
Is this enough?
Consider:
[ {name: "Alice", id: 1}, {name: "Wendy", id: 2}, {name: "Alice", id: 5}, ] -> [ {name: "Alice", id: 1}, {name: "Alice", id: 5}, {name: "Wendy", id: 2}, {name: "Wendy", id: 5}, ]
So we also need:
Property 3. All records in the output are in the input.
Property 2 and Property 3 could be combined to say: the input and output values are the same set.
We could now write a function that takes in a concrete input to
sort_records
, runssort_records
on that input, then checks that our 3 properties are met.This approach is now resilient to harmless, specified non-determinism!
We’ll call this system “oracle”: it tells you if your answer is correct (up to some definition of correctness).
That’s one half of this solution, and a decent half: maybe you write the oracle and feed it a ton of inputs that you yourself think of. But ideally, we also want to improve the test-generation side.
Generating tests
Our complete goal is to both generate tests and check their validity with our oracle.
The core idea is that we can use randomness to escape the issue of not happening to think of the right edge case.
We can break down the problem into pieces:
Have we proven anything?
No! We still aren’t covering the complete input space.
To be continued next lecture!