• Due: Mon., Dec. 12
  • Notes:
    • Although formally “due” on Mon., Dec. 12, this (and all your other as-yet-incomplete CS251 work) must actually be completed by the end of finals period
    • This pset does not have any solo problems.
    • This pset has 100 total points.
    • The problems needn’t be done in order. Feel free to jump around.
  • Submission:
    • In your yourFullName CS251 Fall 2016 Folder, create a Google Doc named yourFullName CS251 PS8.
    • At the top of your yourFullName CS251 PS8 doc, include your name, problem set number, date of submission, and an approximation of how long each problem part took.
    • For all parts of all problems, include all answers (including SML code) in your PS8 google doc. Format code using a fixed-width font, like Consolas or Courier New. You can use a small font size if that helps.
    • For Problems 1 and 2, include your interpretation/compilation derivations. It is easiest to write these via nested bullets, as shown in both the pset description and the lecture notes.
    • In Problems 3 through 5, you will modify the following code files from starter files populating your ~wx/cs251/sml/ps8 directory on your wx Virtual Machine:

      • Problem 3: IntexArgChecker.sml.sml
      • Problem 4: BindexToPostFix.sml
      • Problem 5: Simprex.sml and SimprexEnvInterp.sml

      Drop a copy of your entire ps8 directory into your ~/cs251/drop/ps08 drop folder on cs.wellesley.edu.

    • For Problems 3 through 5 you should include all modified functions from the four files listed above in your Google Doc (so that I can comment on them). In the Google Doc, you need only include the functions you modified, not all functions.

Starting this Problem Set

Problems 3 through 5 involve starter files in the ~wx/cs251/sml/ps8 directory in your wx Virtual Machine.

To create this directory, execute the following two commands in a wx VM shell:

cd ~/cs251/sml
git pull origin master

1. Deriving Implementations (10 points)

As we discussed in the Metaprogramming lecture, there are two basic reasoning rules involving interpreters, translators (a.k.a. compilers), and program implementation:

The Interpreter Rule (I)

The Translator Rule (T)

In practice, we often elide the word “machine” for interpreter machines and translator machines. For example, we will refer to an “L interpreter machine” as an “L interpreter”, and an “S-to-T translator machine” as an “S-to-T translator” or “S-to-T compiler”. We will also often elide the word “program”; e.g., we will refer to a “P-in-L program” as “P-in-L”. So an “L interpreter” means an “L interpreter machine” while a “S-in-L interpreter” means an “S-in-L interpreter program”. Similar remarks hold for translators (also called compilers): an “S-to-T translator” means an “S-to-T translator machine”, while a “S-to-T-in-L translator” means an “S-to-T-in-L translator program”.

Example

For example, we considered the following elements in class:

  • a 251-web-page-in-HTML program (i.e., a 251 web page written in HTML)
  • an HTML-interpreter-in-C program (i.e., a web browser written in C)
  • a C-to-x86-compiler-in-x86 program (i.e., a C-to-x86 compiler written in x86)
  • a x86 computer (i.e., an x86 interpreter machine)

They can be used to build a 251 web page display machine as follows:

Feel free to abbreviate by dropping the words “program” and “machine”. “… in …” denotes a program, while “… compiler” and “… translator” denote translator machines and “… interpreter” and “… computer” denote interpreter machines.

Writing Derivations

Rather that displaying derivations using horizontal lines, it is recommended that you use nested bullets to indicate the same structure as the inference rules (though drawn “upside down”). For example, the derivation above could also be written:

  • 251 web page machine (I)
    • 251-web-page-in-HTML program
    • HTML interpreter machine (I)
      • HTML-interpreter-in-x86 program (T)
        • HTML-interpreter-in-C program
        • C-to-x86 compiler machine (I)
          • C-to-x86-compiler-in-x86 program
          • x86 computer
      • x86 computer

Questions

  1. (5 points) Suppose you are given the following:

    • A P-in-Java program
    • a Java-to-C-compiler-in-Racket program (i.e., a Java-to-C compiler written in Racket);
    • a Racket-interpreter-in-x86 program (i.e., a Racket interpreter written in x86 machine code);
    • a C-to-x86-compiler-in-x86 program (i.e., a C-to-x86 compiler written in x86 code);
    • an x86 computer (i.e., an x86 interpreter machine).

    Using the two reasoning rules above, construct a “proof” that demonstrates how to execute the P-in-Java program on the computer. That is, show how to construct a P machine.

  2. (5 points) Suppose you are given:

    Show how to construct a GraphViz machine.

2. Bootstrapping (20 points)

In his Turing Award lecture-turned-short-paper Reflections on Trusting Trust, Ken Thompson describes how a compiler can harbor a so-called “Trojan horse” that can make compiled programs insecure. Read this paper carefully and do the following tasks to demonstrate your understanding of the paper:

  1. (8 points) Stage II of the paper describes how a C compiler can be “taught” to recognize '\v' as the vertical tab character. Using the same kinds of components and processes used in Problem 1, we can summarize the content of Stage II by carefully listing the components involved and describing (by constructing a proof) how a C compiler that “understands” the code in Figure 2.2 can be generated. (Note that the labels for Figures 2.1 and 2.2 are accidentally swapped in the paper.) In Stage II, two languages are considered:

    • the usual C programming language
    • C+\v – an extension to C in which '\v' is treated as the vertical tab character (which has ASCII value 11).

    Suppose we are given the following:

    • a C-to-binary compiler (Here, “binary” is the machine code for some machine. This compiler is just a “black box”; we don’t care where it comes from);
    • a C+\v-to-binary-compiler-in-C (Figure 2.3 in Thompson’s paper);
    • a C+\v-to-binary-compiler-in-C+\v (what should be Figure 2.2 in Thompson’s paper);
    • a machine that executes the given type of binary code.

    Construct a proof showing how to use the C+\v-to-binary-compiler-in-C+\v source code to create a C+\v-to-binary-compiler-in-binary program.

  2. (12 points) Stage III of the paper describes how to generate a compiler binary harboring a “Trojan horse”. Using the same kinds of components and processes used in Problem 1, we can summarize the content of Stage III by carefully listing the components involved and describing (by constructing a proof) how the Trojaned compiler can be generated. In particular, assume we have the parts for this stage:

    • a C-to-binary-compiler (again, just a “black box” we are given);
    • a C-to-binary-compiler-in-C without Trojan Horses (Figure 3.1 in Thompson’s paper);
    • a C-to-binary-compilerTH-in-C with two Trojan Horses (Figure 3.3 in Thompson’s paper);
    • a login-in-C program with no Trojan Horse;
    • a binary-based computer;

    The subscript TH indicates that a program contains a Trojan Horse. A C-to-binary compilerTH has the “feature” that it can insert Trojan Horses when compiling source code that is an untrojaned login program or an untrojaned compiler program. That is, if P is a login or compiler program, it is as if there is a new rule:

    The Trojan Horse Rule (TH)

    Using these parts along with the two usual rules (I) and (T) and the new rule (TH), show how to derive a Trojaned login program loginTH-in-binary that is the result of compiling login-in-C with a C compiler that is the result of compiling C-to-binary-compiler-in-C. The interesting thing about this derivation is that loginTH-in-binary contains a Trojan horse even though it is compiled using a C compiler whose source code (C-to-binary-compiler-in-C) contains no code for inserting Trojan horses!

3. Static Argument Checking in Intex (15 points)

As described in the lecture slides on Intex, it is possible to statically determine (i.e., without running the program) whether an Intex program contains a argument reference with an out-of-bound index. In this problem you will flesh out skeleton implementations of the two approaches for static argument checking sketched in the slides. The skeleton implementations are in the file ps8/IntexArgChecker.sml.

  1. Top-down checker (7 points): In the top-down approach, the checkTopDown function on a program passes the number of arguments to the checkExpTopDown function on expressions, which recursively passes it down the subexpressions of the abstract syntax tree. When an Arg expression is reached, the index is examined to determine a boolean that indicates whether it is in bounds. The boolean on all subexpressions are combined in the upward phase of recursion to determine a boolean for the whole tree. Complete the following skeleton to flesh out this approach.

    (* val checkTopDown: pgm -> bool *)
    fun checkTopDown (Intex(numargs,body)) =
      checkExpTopDown numargs body
    
    (* val checkExpTopDown : int -> Intex.exp -> bool *)
    and checkExpTopDown numargs (Int i) = raise Unimplemented
      | checkExpTopDown numargs (Arg index) = raise Unimplemented
      | checkExpTopDown numargs (BinApp(_,exp1,exp2)) = raise Unimplemented

    Uncomment the definition of topDownChecks to test your implementation.

  2. Bottom-up checker (6 points): In the bottom-up approach, the checkExpBottomUp function returns a pair (min, max) of the minimum and maximum argument indices that appear in an expression. If an expression contains no argument indices, it should return the (valOf Int.maxInt, valOf Int.minInt), which is a pair of SML’s (1) maximum integer and (2) minumum integer. The funtions Int.min and Int.max are useful for combining such pairs. (Note that Int.maxInt is the identity for Int.min, and Int.minInt is the identity for Int.max.)

    The checkBottomUp function on programs returns true if all argument references are positive and less than or equal to the programs number of arguments, and false otherwise. Complete the following skeleton to flesh out this approach.

    (* checkBottomUp: pgm -> bool *)
    fun checkBottomUp (Intex(numargs,body)) =
      let val (min,max) = checkExpBottomUp body
       in 0 < min andalso max <= numargs
      end
       
    (* val checkExpBottomUp : Intex.exp -> int * int *)
    and checkExpBottomUp (Int i) = (valOf Int.maxInt, valOf Int.minInt)
      | checkExpBottomUp (Arg index) = raise Unimplemented
      | checkExpBottomUp (BinApp(_,exp1,exp2)) = raise Unimplemented

    Uncomment the definition of bottomUpChecks to test your implementation.

  3. Static vs dynamic checking (2 points): A static argument index checker flags programs that might dynamically raise an index-out-of-bounds error, but does not guarantee they will dynamically raise such an error. Give an example of an Intex program for which checkTopDown and checkBottomUp returns false but does not raise an argument-out-of-bounds error when run on particular arguments. Hint: such a program can raise a different error.

4. Bindex To PostFix (20 points)

Background

In lecture, we studied the following intexToPostFix function that automatically translates Intex programs to equivalent PostFix programs:

fun intexToPostFix (Intex.Intex(numargs, exp)) =
  PostFix.PostFix(numargs, expToCmds exp 0)
    (* 0 is a depth argument that statically tracks 
       how many values are on stack above the arguments *)

and expToCmds (Intex.Int i) depth = [PostFix.Int i]
  | expToCmds (Intex.Arg index) depth = [PostFix.Int (index + depth), PostFix.Nget]
   (* specified argument is on stack at index + depth *)
  | expToCmds (Intex.BinApp(binop,exp1,exp2)) depth =
    (expToCmds exp1 depth) (* 1st operand is at same depth as whole binapp *)
    @ (expToCmds exp2 (depth + 1)) (* for 2nd operand, add 1 to depth 
                                      to account for 1st operand *)
    @ [PostFix.Arithop (binopToArithop binop)]

and binopToArithop Intex.Add = PostFix.Add
  | binopToArithop Intex.Sub = PostFix.Sub
  | binopToArithop Intex.Mul = PostFix.Mul
  | binopToArithop Intex.Div = PostFix.Div
  | binopToArithop Intex.Rem = PostFix.Rem

For example, given the Intex program intexP2 expressed as an Intex.pgm datatype instance

val intexP2 = Intex(4, BinApp(Mul,
                              BinApp(Sub, Arg 1, Arg 2),
                              BinApp(Div, Arg 3, Arg 4)))

the intexToPostFix translation is as follows:

- intexToPostFix(intexP2);
val it =
  PostFix
    (4,
     [Int 1,Nget,Int 3,Nget,Arithop Sub,Int 4,Nget,Int 6,Nget,Arithop Div,
      Arithop Mul]) : PostFix.pgm

With the helper function translateString shown below, the input and output of translation can be expressed as strings in s-expression format:

fun translateString intexPgmString =
  PostFix.pgmToString (intexToPostFix (Intex.stringToPgm intexPgmString))
- IntexToPostFix.translateString("(intex 4 (* (- ($ 1) ($ 2)) (/ ($ 3) ($ 4))))");
val it = "(postfix 4 1 nget 3 nget sub 4 nget 6 nget div mul)" : string

There are two key aspects to the translation from Intex to PostFix:

  1. The expToCmds function translates every Intex expression to a sequence of PostFix commands that, when executed, will push the integer value of that expression onto the stack. In the case of translating a binary application expression, these sequence is composed by appending the sequences for the two operands followed by the appropriate arithmetic operator command.

  2. The trickiest part of expToCommands is knowing how many integers are on the stack above the program arguments, so that references to Intex arguments by index can be appropriately translated. This is handled by a depth argument to expToCommands that keeps track of this number. Note that the first operand of a binary application has the same depth as the whole binary application expression, but the second operand has a depth one greater than the first to account for the integer value of the first operand pushed onto the stack.

Your Task

In this problem, you will implement a similar translator from Bindex programs to PostFix programs.

The file ps8/BindexToPostFix.sml contains a skeleton of the Bindex to PostFix translator in which all Bindex expressions translate to a command sequence consisting of the single integer command 42:

fun makeArgEnv args = Env.make args (Utils.range 1 ((length args) + 1))
(* returned env associates each arg name with its 1-based index *)

fun envPushAll env = Env.map (fn index => index + 1) env
(* add one to the index for each name in env *)

fun envPush name env = Env.bind name 1 (envPushAll env)

fun bindexToPostFix (Bindex.Bindex(args, body)) =
  PostFix.PostFix(length args, expToCmds body (makeArgEnv args))

(* In expToCmds, env statically tracks the depth of each named variable
   value on the stack *)
and expToCmds (Bindex.Int i) env = [PostFix.Int 42] (* replace this stub *)
  | expToCmds (Bindex.Var name) env = [PostFix.Int 42] (* replace this stub *)
  | expToCmds (Bindex.BinApp(binop, rand1, rand2)) env = [PostFix.Int 42] (* replace this stub *)
  | expToCmds (Bindex.Bind(name, defn, body)) env = [PostFix.Int 42] (* replace this stub *)

and binopToArithop Bindex.Add = PostFix.Add
  | binopToArithop Bindex.Sub = PostFix.Sub
  | binopToArithop Bindex.Mul = PostFix.Mul
  | binopToArithop Bindex.Div = PostFix.Div
  | binopToArithop Bindex.Rem = PostFix.Rem

When loaded, the BindexToPostFix.sml file performs numerous tests of the translator. Initially, all of these fail. Your task is to redefine the clauses of expToCmds so that all the test cases succeed.

Notes:

  • The expToCmds function in the Bindex to PostFix translator uses an environment to track the depth of each variable name in the program. Recall that an environment (defined by the Env structure in utils/Env.sml) maps names to values. In this case, it maps names in a Bindex program to the current stack depth of the value associated with that name. Carefully study the helper functions makeArgEnv, envPushAll, and envPush to understand what they do; all are helpful in the contex of this problem.

  • As in the Intex to PostFix translator, in the Bindex to PostFix translator, each Bindex expression should translate to a sequence of PostFix commands that, when executed, pushes exactly one value — the integer value of that expression — onto the stack.

  • For simplicity, you may assume there are no unbound variables in the Bindex program you are translating. With this assumption, you may use the Option.valOf function to extract the value v from a SOME(v) value.

  • The BindexToPostFix structure contains a testTranslator function that takes the name of a file containing a Bindex program and a list of argument lists and (1) displays the Bindex program as an s-expression string, (2) displays the PostFix program resulting from the translation as an s-expression string and (3) tests the behavior of both programs on each of the argument lists in the list of argument lists. If the behaviors match, it prints an approprate message; if they don’t match, it reports an an error with the two different result values. For instance, in the initial skelton, the behavioral tests fail on the program in ../bindex/avg.bdx:

    Testing Bindex program file ../bindex/avg.bdx
    
    Bindex program input to translation:
    (bindex (a b) (/ (+ a b) 2))
    
    PostFix program output of translation:
    (postfix 2 42)
    
    Testing args [3,7]: *** ERROR IN TRANSLATION ***
      Bindex result: 5
      PostFix result: 42
    
    Testing args [15,5]: *** ERROR IN TRANSLATION ***
      Bindex result: 10
      PostFix result: 42

    However, when the translator is working, the following will be displayed:

    Testing Bindex program file ../bindex/avg.bdx
    
    Bindex program input to translation:
    (bindex (a b) (/ (+ a b) 2))
    
    PostFix program output of translation:
    (postfix 2 1 nget 3 nget add 2 div)
    
    Testing args [3,7]: both programs return 5
    
    Testing args [15,5]: both programs return 10
  • You have succeeded when all of the test cases succeed when you load the .sml file.

5. Simprex (35 points)

Background

Rae Q. Cerf of Toy-Languages-я-Us likes the sigma construct from the Bindex lecture, but she wants something more general. In addition to expressing sums, she would also like to express numeric functions like factorial and exponentiation that are easily definable via simple recursion. The functions that Rae wants to define all have the following form:

f(n) = z, if n  0
f(n) = c(n, f(n-1)), if n > 0

Here, z is an integer that defines the value of f for any non-positive integer value and and c is a binary combining function that combines n and the value of f(n1) for any positive n. Expanding the definition yields:

f(n) = c(n, c(n-1, c(n2, ... c(2, c(1, z)))))

For example, to define the factorial function, Rae uses:

z_fact = 1
c_fact(i, a) = i*a

To define the exponentation function bn, Rae uses:

z_expt = 1
c_expt(i, a) = b*a

In this case, c_expt ignores its first argument, but the fact that c_expt is called n times is important.

As another example, Rae defines the sum of the squares of the integers between 1 and n using

z_sos = 0
c_sos(i, a) = (i*i) + a

The examples considered so far don’t distinguish right-to-left vs. left-to-right evaluation, A simple example of that is

z_sub = 0
c_sub(i, a) = i-a

For example, when n is 4, the result is the value of 4 - (3 - (2 - (1 - 0))), which is 2, and not the value of 1 - (2 - (3 - (4 - 0))), which is -2.

Another example that distinguishes right-to-left vs. left-to-right evaluation is using Horner’s method to calculate the value of the polynomial x^(n-1) + 2*(x^(n-2)) + 3*(x^(n-3)) + ... + (n-1)*x + n:

z_horner = 0
c_horner(i, a) = i + x*a

For example, when n is 4 and x is 5, the result is (4 + 5*(3 + 5*(2 + 5*(1 + 5*0)))) = 194, which is the value of 5^3 + 2*5^2 + 3*5^1 + 4 = (125 + 50 + 15 + 4) = 194.

Rae designs an extension to Bindex named Simprex that adds a new simprec construct for expressing her simple recursions:

(simprec E_zero (Id_num Id_ans E_combine ) E_arg)

This simprec expression evaluates E_arg to the integer value n_arg and E_zero to the integer value n_zero. If n_arg ≤ 0, it returns n_zero. Otherwise, it returns the value

c(n_arg, c(n_arg1, c(n_arg2, ... c(2, c(1, nzero)))))

where c is the binary combining function specified by (Id_num Id_ans E_combine). This denotes a two-argument function whose two formal parameters are named Id_num and Id_ans and whose body is E_combine. The Id_num parameter ranges over the numbers from n_arg down to 1, while the Id_ans parameter ranges over the answers built up by c starting at n_zero. The scope of Id_num and Id_ans includes only E_combine; it does not include E_zero or E_arg.

Rae’s simprec expression is closely related to the notion of primitive recursive functions defined in the theory of computation.

Here are some sample Simprex programs:

;; Program that calculuates the factorial of n
(simprex (n) (simprec 1 (i a (* i a)) n))

;; Exponentiation program raising base b to power p 
(simprex (b p) (simprec 1 (i ans (* b ans)) p))

;; Program summing the squares of the numbers from 1 to hi
(simprex (hi) (simprec 0 (i sumSoFar (+ (* i i) sumSoFar)) hi))

;; Program distingishing right-to-left and left-to-right evaluation via subtraction
(simprex (n) (simprec 0 (i ans (- i ans)) n))

;; Calculating x^(n-1) + 2*(x^(n-2)) + 3*(x^(n-3)) + ... + (n-1)*x + n via Horner's method
(simprex (n x) (simprec 0 (i a (+ i (* x a)) n)))

Your Tasks

After completing her design, Rae is called away to work on another language design problem. Toy-Languages-я-Us is impressed with your CS251 background, and has hired you to implement the Simprex language, starting with a version of the Sigmex implementation. (Sigmex is the name of the language that results from extending Bindex with the sigma construct from lecture.) Your first week on the job, you are asked to complete the following tasks that Rae has specified in a memo she has written about finishing her project.

  1. (15 points) Rae’s memo contains the following Simprex test programs. Give the results of running each of the programs on the argument 3. Show your work so that you may get partial credit if your answer is incorrect.

    ;; program 1 (2 points)
    (simprex (a) (simprec 0 (b c (+ 2 c)) a))
    
    ;; program 2 (3 points)
    (simprex (x) (simprec 0 (n sum (+ n (* x sum))) 4))
    
    ;; program 3 (4 points)
    (simprex (y) (simprec 0 (a b (+ b (sigma c 1 a (* a c)))) y))
    
    ;; program 4 (6 points)
    (simprex (n) 
      (simprec (simprec (* n (- n 3)) (q r r) (* n n))
               (c d (+ d (simprec 0 (x sum (+ sum (- (* 2 x) 1))) c))) 
               (simprec -5 (a b (+ 1 b)) (* n n))))
  2. (20 points)

    Rae has created a skeleton implementation of Simprex by modifying the files for the Sigmex (= Bindex + sigma) implementation to contain stubs for the simprec construct. Her modified files, which are named Simprex.sml and SimprexEnvInterp.sml, can be found in the ps8 folder.

    Finish Rae’s implementation of the Simprex language by completing the following four tasks, which Rae has listed in her memo:

    1. (2 points) Extend the definition of sexpToExp in Simprex.sml to correctly parse simprec expressions.

    2. (2 points) Extend the definition of expToSexp in Simprex.sml to correctly unparse simprec expressions.

    3. (5 points) Extend the definition of freeVarsExp in Simprex.sml to correctly determine the free variables of a simprec expression.

    4. (11 points) Extend the definition of eval in SimprexEnvInterp.sml to correctly evaluate simprec expressions using the environment model.

    Notes:

    • This problem is similar the the Sigmex (= Bindex + sigma) language extension we implemented in lecture. You may wish to study the solution files bindex/SigmexSolns.sml and bindex/SigmexEnvInterp.sml as part of doing this problem.

    • In Simprec.sml, the exp type is defined to be:

      and exp = Int of int (* integer literal with value *)
              | Var of var (* variable reference *)
              | BinApp of binop * exp * exp (* binary operator application with rator, rands *)
              | Bind of var * exp * exp (* bind name to value of defn in body *)
              | Sigma of var * exp * exp * exp (* E_lo, E_hi, E_body *)
              | Simprec of exp * var * var * exp * exp 
                           (* zeroExp * numVar * ansVar * combExp * argExp *)

      The s-expression notation (simprec E_zero (I_num I_ans E_combine) E_arg) is represented in SML as

      Simprec (<exp for E-zero>, <string for I_num>, <string for I_ans>, 
               <exp for E_combine>, <exp for E_arg>)

      For example, the expression (simprec 1 (x a (* x a)) n) is represented in SML as Simprec(Int 1, "x", "a", BinApp(Mul, Var "x", Var "a"), Var "n")

    • You can test your modifications of sexpToExp, expToExp, and freeVarsExp via use "Simprex.sml" after you have first uncommented the test cases at the end of Simprex.sml. This will display the results of various test cases for freeVarsExp before displaying the Simprex structure. It will also test sexpToExp and expToSexp. If you uncomment the test cases before you have correctly modified sexpToExp and expToSexp, any attempt to use "Simprec.sml" or use "SimprecEnvInterp.sml" will fail with a SyntaxError.

    • You can test your modifications of eval via use "SimprexEnvInterp.sml" after you have first uncommented the test cases at the end of SimprexEnvInterp.sml. This will display the results of various test cases for eval before displaying the SimprexEnvInterp structure.

Extra Credit: Self-printing Program (15 points)

In his Turing-lecture-turned-short-paper Reflections on Trusting Trust, Ken Thompson notes that it is an interesting puzzle to write a program in a language whose effect is to print out a copy of the program. We will call this a self-printing program. Write a self-printing program in any general-purpose programming language you choose. Your program should be self-contained — it should not read anything from the file system.

Important Note: There are lots of answers on the Internet, but you will get zero points if you just copy one of those. (I can tell!) Do this completely one your own, without looking anything up online.