Notes: introduction & specifications

Welcome!

Welcome to CS340: Modeling for Computer Systems!

Agenda for today:

Introductions
Collaborative discussion of high-level course goal.
Course expectations, learning outcomes.
Some mysteries!

Professor introduction

I’m Alexa VanHattum—you can call me “Alexa”, “Prof. Alexa”, or “Prof VanHattum”.

This is my first year as an Assistant Professor at Wellesley. My research focus is the intersection of programming languages & computer systems, with a focus on applying lightweight formal methods (e.g., this course) to compilers for systems languages.

Before Wellesley, a version of this course (taught at Brown University by Tim Nelson) convinced me to pursue a PhD in the area. I spent time as a software engineer for Apple health, then completed grad school at Cornell.

Student introductions

(Names, pronouns, year, favorite programming project or programming language).

This is a small elective, and we want to all get to know eachother!

Correctness exercise

Let’s think about correctness in the context of programs you’ve written before.

Questions:

How many people have worked on programming projects for a class? [Everyone]
How many people have worked on programming projects not for a class? In what context? [Almost everyone]
How many people have worked on programming projects that have had users other than yourself (it’s ok if not! My answer to this question was no until my senior year of college!)? [Some folks]

Now I want you to step back, and think about how you knew what to program in each case.

In groups, discuss the following. Fill in your 2x4 sheet of paper: {class project, real users}
Write your logins in the top right.

A. How did you know what code to write? Did you know it fully ahead of time, or did you change what you were implementing as you went?
B. How did you know you were done programming?
C. How did you know if your design and code were correct?
D. If the code was incorrect, how would you find that out afterward? What were the implications of it being incorrect?

(Group discussion of these 8 cases). Main points: with course projects, you were given a specification of what to program in the form of instructions from your professor. You knew your program was done because you either (1) passsed some tests or (2) ran out of time. You knew your code was correct because of how many tests it passed, manually testing, or the grade you got. The implication was a potentially bad grade.

With external users and software out in the world, a complete specification might never exist. You may know what to program from clients, or your own ideas, or just be experimenting. You might be done when you reach a deadline, or when you pass code review. For code correctness, you have to write your own tests in this case. Maybe you interact with Quality Assurance (QA) or other forms of additional manual testing. If code is incorrect: maybe it’s caught in QA. Maybe your users don’t use your software how you expect, or complain about it online or in your bug tracker. In extreme cases, maybe the code is used in medical software, or avionics, or autonomous driving, and someone’s life is at risk.

Spectrum of correctness

We can think of the correctness of our program as existing along a wide spectrum.

On the far left, near 0% certainty of correctness, we have code that we wrote without every running.

On the far right, near (but typically not at) 100% certainty of correctness, we have implementations that have been proven (with machine-checked proofs) to meet a formal specification. Even in that case, we ultimately have to trust some layers of the proof infrastructure, so there is still a vanishingly small chance that there are correctness issues with our program.

0%                                                                            approaching 100%
<------------------------------------------------------------------------------------->
Never run the code!
  Make sure it compiles
     Make sure the code runs one input, manually
        Manually check several cases
            Write a few tests
                 Write a LOT of tests 
                                         | automatically 
                                         generating tests
                                             

                                                                        Write a formal specification
                                                                        Formally prove that your code 
                                                                        implements your specification. 
                                                                        "interactive theorem proving"
                                                                        lives here

                                -----------------------------------
                                            This course!
                                Sometimes called "lightwight formal methods".
                                we'll to the nice spot in the middle by
                                using MODELS of our system, instead of
                                proving the actual implementation of
                                every component against a formal 
                                specification.

Expectations

We went over the course website and syllabus, focusing on:

Assignments
Collaboration policy
Expectations around programming maturity, independence
Workshop sessions
Final project
Readings
Zulip, calendly, office hours
No exams

Mysteries

We considered some mysterious computer programs.

Mystery 1

// What does this function do? How do you know?
int mystery(int x) {
    if (x % 2 == 0) {
        return x;
    } else {
        return x - 1;
    }
}

Mystery 2

/*

Recall the heap layout from Malloc:                              
              HEAP        
     +--------------------+   
     |                    |   
     +--------------------+  
A  B | header   (32|0b11) |
L  L |   --------------   |  
L  O | payload            |
O  C | ...                |
C  K |                    |
     +--------------------+  
     |   UNKNOWN BLOCK    |
     |        ...         |
     +--------------------+  
F  B | header  (32|0b?0)  |
R  L |   --------------   |   
E  O | ... padding        |
E  C |              ....  |
   K |   --------------   |  
     | footer        (32) |
     +--------------------+
     |   ... blocks ...   |
     +--------------------+  


Question: given that we don't know the allocation status of  UNKNOWN BLOCK, 
is it possible for us to know if the following is true?
    "There is some free block whose predecessor is allocated"
*/

Mystery 3

Let’s model a basic public-key cryptography scheme!

Can someone explain the basic idea of public key cryptography?

For this to work to let two people communicate, they need a way to know they are speaking to the right person. We don’t need to fully understand the details for this example, but the important thread to follow is that each person can “prove” they are who they say they are by decoding messages that were encoded with their public key.

Alice                                     Wendy
Alice picks random "nonce",
initiates communication
   ----- {NonceA, I'm Alice}PublicW ---->

                                    Wendy sees that Alice 
                                    wants to communicate, sends 
                                    back old nonce (proving Wendy 
                                    can descrypt a hidden message 
                                    for Wendy) and a new nonce

   <------- {NonceA, NonceW}PublicA -----

Alice sends back the second 
nonce to prove Alice can 
descrypt a hidden message 
for Alice
   ----- {NonceW}PublicW ----------->

Now, do both parties know they are communicating with the
right person?

This scenario (a simplification of the Needham–Schroeder protocol)
was “proven” correct for over a decade, but it has a bug!

Man-in-the-Middle Attack

If Eve can trick Alice into starting a communication:

Alice                           Eve                     Alice's Bank
  -- {NonceA, I'm Alice}PublicE --->
                                ---- {NonceA, I'm Alice}PublicB ------>
                                <------ {NonceA, NonceB}PublicA -------
  <--- {NonceA, NonceB}PublicA ----
  ---- {NonceB}PublicE ----------- >
                                --------------- {NonceB}PublicB ------>

Now Alice is sending her bank info to Eve!

Mystery 4

We briefly touched on the Chord distributed systems protocol, and how a bug was found in it using Alloy, one of the tools we’ll learn in this class. More on this next time!