PS8: Open to Interpretation
- Due: Mon., Dec. 12
- Notes:
- Although formally “due” on Mon., Dec. 12, this (and all your other as-yet-incomplete CS251 work) must actually be completed by the end of finals period
- This pset does not have any solo problems.
- This pset has 100 total points.
- The problems needn’t be done in order. Feel free to jump around.
- Submission:
- In your yourFullName CS251 Fall 2016 Folder, create a Google Doc named yourFullName CS251 PS8.
- At the top of your yourFullName CS251 PS8 doc, include your name, problem set number, date of submission, and an approximation of how long each problem part took.
- For all parts of all problems, include all answers (including SML code) in your PS8 google doc. Format code using a fixed-width font, like Consolas or Courier New. You can use a small font size if that helps.
- For Problems 1 and 2, include your interpretation/compilation derivations. It is easiest to write these via nested bullets, as shown in both the pset description and the lecture notes.
-
In Problems 3 through 5, you will modify the following code files from starter files populating your
~wx/cs251/sml/ps8directory on your wx Virtual Machine:- Problem 3:
IntexArgChecker.sml.sml - Problem 4:
BindexToPostFix.sml - Problem 5:
Simprex.smlandSimprexEnvInterp.sml
Drop a copy of your entire
ps8directory into your~/cs251/drop/ps08drop folder oncs.wellesley.edu. - Problem 3:
- For Problems 3 through 5 you should include all modified functions from the four files listed above in your Google Doc (so that I can comment on them). In the Google Doc, you need only include the functions you modified, not all functions.
Starting this Problem Set
Problems 3 through 5 involve starter files in the ~wx/cs251/sml/ps8 directory in your wx Virtual Machine.
To create this directory, execute the following two commands in a wx VM shell:
cd ~/cs251/sml
git pull origin master1. Deriving Implementations (10 points)
As we discussed in the Metaprogramming lecture, there are two basic reasoning rules involving interpreters, translators (a.k.a. compilers), and program implementation:
The Interpreter Rule (I)
The Translator Rule (T)
In practice, we often elide the word “machine” for interpreter machines and translator machines. For example, we will refer to an “L interpreter machine” as an “L interpreter”, and an “S-to-T translator machine” as an “S-to-T translator” or “S-to-T compiler”. We will also often elide the word “program”; e.g., we will refer to a “P-in-L program” as “P-in-L”. So an “L interpreter” means an “L interpreter machine” while a “S-in-L interpreter” means an “S-in-L interpreter program”. Similar remarks hold for translators (also called compilers): an “S-to-T translator” means an “S-to-T translator machine”, while a “S-to-T-in-L translator” means an “S-to-T-in-L translator program”.
Example
For example, we considered the following elements in class:
- a 251-web-page-in-HTML program (i.e., a 251 web page written in HTML)
- an HTML-interpreter-in-C program (i.e., a web browser written in C)
- a C-to-x86-compiler-in-x86 program (i.e., a C-to-x86 compiler written in x86)
- a x86 computer (i.e., an x86 interpreter machine)
They can be used to build a 251 web page display machine as follows:
Feel free to abbreviate by dropping the words “program” and “machine”. “… in …” denotes a program, while “… compiler” and “… translator” denote translator machines and “… interpreter” and “… computer” denote interpreter machines.
Writing Derivations
Rather that displaying derivations using horizontal lines, it is recommended that you use nested bullets to indicate the same structure as the inference rules (though drawn “upside down”). For example, the derivation above could also be written:
- 251 web page machine (I)
- 251-web-page-in-HTML program
- HTML interpreter machine (I)
- HTML-interpreter-in-x86 program (T)
- HTML-interpreter-in-C program
- C-to-x86 compiler machine (I)
- C-to-x86-compiler-in-x86 program
- x86 computer
- x86 computer
- HTML-interpreter-in-x86 program (T)
Questions
-
(5 points) Suppose you are given the following:
- A P-in-Java program
- a Java-to-C-compiler-in-Racket program (i.e., a Java-to-C compiler written in Racket);
- a Racket-interpreter-in-x86 program (i.e., a Racket interpreter written in x86 machine code);
- a C-to-x86-compiler-in-x86 program (i.e., a C-to-x86 compiler written in x86 code);
- an x86 computer (i.e., an x86 interpreter machine).
Using the two reasoning rules above, construct a “proof” that demonstrates how to execute the P-in-Java program on the computer. That is, show how to construct a P machine.
-
(5 points) Suppose you are given:
- a GraphViz-in-C program
- a C-to-LLVM-compiler-in-x86 program
- an LLVM-to-JavaScript-compiler-in-x86 program
- a JavaScript-interpreter-in-ARM program
- an x86 computer (i.e., an x86 interpreter machine, such as your laptop)
- an ARM computer (i.e., an ARM interpreter machine, such as a smartphone)
Show how to construct a GraphViz machine.
2. Bootstrapping (20 points)
In his Turing Award lecture-turned-short-paper Reflections on Trusting Trust, Ken Thompson describes how a compiler can harbor a so-called “Trojan horse” that can make compiled programs insecure. Read this paper carefully and do the following tasks to demonstrate your understanding of the paper:
-
(8 points) Stage II of the paper describes how a C compiler can be “taught” to recognize
'\v'as the vertical tab character. Using the same kinds of components and processes used in Problem 1, we can summarize the content of Stage II by carefully listing the components involved and describing (by constructing a proof) how a C compiler that “understands” the code in Figure 2.2 can be generated. (Note that the labels for Figures 2.1 and 2.2 are accidentally swapped in the paper.) In Stage II, two languages are considered:- the usual C programming language
- C+\v – an extension to C in which
'\v'is treated as the vertical tab character (which has ASCII value 11).
Suppose we are given the following:
- a C-to-binary compiler (Here, “binary” is the machine code for some machine. This compiler is just a “black box”; we don’t care where it comes from);
- a C+\v-to-binary-compiler-in-C (Figure 2.3 in Thompson’s paper);
- a C+\v-to-binary-compiler-in-C+\v (what should be Figure 2.2 in Thompson’s paper);
- a machine that executes the given type of binary code.
Construct a proof showing how to use the C+\v-to-binary-compiler-in-C+\v source code to create a C+\v-to-binary-compiler-in-binary program.
-
(12 points) Stage III of the paper describes how to generate a compiler binary harboring a “Trojan horse”. Using the same kinds of components and processes used in Problem 1, we can summarize the content of Stage III by carefully listing the components involved and describing (by constructing a proof) how the Trojaned compiler can be generated. In particular, assume we have the parts for this stage:
- a C-to-binary-compiler (again, just a “black box” we are given);
- a C-to-binary-compiler-in-C without Trojan Horses (Figure 3.1 in Thompson’s paper);
- a C-to-binary-compilerTH-in-C with two Trojan Horses (Figure 3.3 in Thompson’s paper);
- a login-in-C program with no Trojan Horse;
- a binary-based computer;
The subscript TH indicates that a program contains a Trojan Horse. A C-to-binary compilerTH has the “feature” that it can insert Trojan Horses when compiling source code that is an untrojaned login program or an untrojaned compiler program. That is, if P is a login or compiler program, it is as if there is a new rule:
The Trojan Horse Rule (TH)
Using these parts along with the two usual rules (I) and (T) and the new rule (TH), show how to derive a Trojaned login program loginTH-in-binary that is the result of compiling login-in-C with a C compiler that is the result of compiling C-to-binary-compiler-in-C. The interesting thing about this derivation is that loginTH-in-binary contains a Trojan horse even though it is compiled using a C compiler whose source code (C-to-binary-compiler-in-C) contains no code for inserting Trojan horses!
3. Static Argument Checking in Intex (15 points)
As described in the lecture slides on Intex, it is possible to statically determine (i.e., without running the program) whether an Intex program contains a argument reference with an out-of-bound index. In this problem you will flesh out skeleton implementations of the two approaches for static argument checking sketched in the slides. The skeleton implementations are in the file ps8/IntexArgChecker.sml.
-
Top-down checker (7 points): In the top-down approach, the
checkTopDownfunction on a program passes the number of arguments to thecheckExpTopDownfunction on expressions, which recursively passes it down the subexpressions of the abstract syntax tree. When anArgexpression is reached, the index is examined to determine a boolean that indicates whether it is in bounds. The boolean on all subexpressions are combined in the upward phase of recursion to determine a boolean for the whole tree. Complete the following skeleton to flesh out this approach.(* val checkTopDown: pgm -> bool *) fun checkTopDown (Intex(numargs,body)) = checkExpTopDown numargs body (* val checkExpTopDown : int -> Intex.exp -> bool *) and checkExpTopDown numargs (Int i) = raise Unimplemented | checkExpTopDown numargs (Arg index) = raise Unimplemented | checkExpTopDown numargs (BinApp(_,exp1,exp2)) = raise UnimplementedUncomment the definition of
topDownChecksto test your implementation. -
Bottom-up checker (6 points): In the bottom-up approach, the
checkExpBottomUpfunction returns a pair(min, max)of the minimum and maximum argument indices that appear in an expression. If an expression contains no argument indices, it should return the(valOf Int.maxInt, valOf Int.minInt), which is a pair of SML’s (1) maximum integer and (2) minumum integer. The funtionsInt.minandInt.maxare useful for combining such pairs. (Note thatInt.maxIntis the identity forInt.min, andInt.minIntis the identity forInt.max.)The
checkBottomUpfunction on programs returnstrueif all argument references are positive and less than or equal to the programs number of arguments, andfalseotherwise. Complete the following skeleton to flesh out this approach.(* checkBottomUp: pgm -> bool *) fun checkBottomUp (Intex(numargs,body)) = let val (min,max) = checkExpBottomUp body in 0 < min andalso max <= numargs end (* val checkExpBottomUp : Intex.exp -> int * int *) and checkExpBottomUp (Int i) = (valOf Int.maxInt, valOf Int.minInt) | checkExpBottomUp (Arg index) = raise Unimplemented | checkExpBottomUp (BinApp(_,exp1,exp2)) = raise UnimplementedUncomment the definition of
bottomUpChecksto test your implementation. -
Static vs dynamic checking (2 points): A static argument index checker flags programs that might dynamically raise an index-out-of-bounds error, but does not guarantee they will dynamically raise such an error. Give an example of an Intex program for which
checkTopDownandcheckBottomUpreturnsfalsebut does not raise an argument-out-of-bounds error when run on particular arguments. Hint: such a program can raise a different error.
4. Bindex To PostFix (20 points)
Background
In lecture, we studied the following intexToPostFix function that automatically translates Intex programs to equivalent PostFix programs:
fun intexToPostFix (Intex.Intex(numargs, exp)) =
PostFix.PostFix(numargs, expToCmds exp 0)
(* 0 is a depth argument that statically tracks
how many values are on stack above the arguments *)
and expToCmds (Intex.Int i) depth = [PostFix.Int i]
| expToCmds (Intex.Arg index) depth = [PostFix.Int (index + depth), PostFix.Nget]
(* specified argument is on stack at index + depth *)
| expToCmds (Intex.BinApp(binop,exp1,exp2)) depth =
(expToCmds exp1 depth) (* 1st operand is at same depth as whole binapp *)
@ (expToCmds exp2 (depth + 1)) (* for 2nd operand, add 1 to depth
to account for 1st operand *)
@ [PostFix.Arithop (binopToArithop binop)]
and binopToArithop Intex.Add = PostFix.Add
| binopToArithop Intex.Sub = PostFix.Sub
| binopToArithop Intex.Mul = PostFix.Mul
| binopToArithop Intex.Div = PostFix.Div
| binopToArithop Intex.Rem = PostFix.RemFor example, given the Intex program intexP2 expressed as an Intex.pgm datatype instance
val intexP2 = Intex(4, BinApp(Mul,
BinApp(Sub, Arg 1, Arg 2),
BinApp(Div, Arg 3, Arg 4)))the intexToPostFix translation is as follows:
- intexToPostFix(intexP2);
val it =
PostFix
(4,
[Int 1,Nget,Int 3,Nget,Arithop Sub,Int 4,Nget,Int 6,Nget,Arithop Div,
Arithop Mul]) : PostFix.pgmWith the helper function translateString shown below, the input and output of translation can be expressed as strings in s-expression format:
fun translateString intexPgmString =
PostFix.pgmToString (intexToPostFix (Intex.stringToPgm intexPgmString))- IntexToPostFix.translateString("(intex 4 (* (- ($ 1) ($ 2)) (/ ($ 3) ($ 4))))");
val it = "(postfix 4 1 nget 3 nget sub 4 nget 6 nget div mul)" : stringThere are two key aspects to the translation from Intex to PostFix:
-
The
expToCmdsfunction translates every Intex expression to a sequence of PostFix commands that, when executed, will push the integer value of that expression onto the stack. In the case of translating a binary application expression, these sequence is composed by appending the sequences for the two operands followed by the appropriate arithmetic operator command. -
The trickiest part of
expToCommandsis knowing how many integers are on the stack above the program arguments, so that references to Intex arguments by index can be appropriately translated. This is handled by adepthargument toexpToCommandsthat keeps track of this number. Note that the first operand of a binary application has the same depth as the whole binary application expression, but the second operand has a depth one greater than the first to account for the integer value of the first operand pushed onto the stack.
Your Task
In this problem, you will implement a similar translator from Bindex programs to PostFix programs.
The file ps8/BindexToPostFix.sml contains a skeleton of the Bindex to PostFix translator in which all Bindex expressions translate to a command sequence consisting of the single integer command 42:
fun makeArgEnv args = Env.make args (Utils.range 1 ((length args) + 1))
(* returned env associates each arg name with its 1-based index *)
fun envPushAll env = Env.map (fn index => index + 1) env
(* add one to the index for each name in env *)
fun envPush name env = Env.bind name 1 (envPushAll env)
fun bindexToPostFix (Bindex.Bindex(args, body)) =
PostFix.PostFix(length args, expToCmds body (makeArgEnv args))
(* In expToCmds, env statically tracks the depth of each named variable
value on the stack *)
and expToCmds (Bindex.Int i) env = [PostFix.Int 42] (* replace this stub *)
| expToCmds (Bindex.Var name) env = [PostFix.Int 42] (* replace this stub *)
| expToCmds (Bindex.BinApp(binop, rand1, rand2)) env = [PostFix.Int 42] (* replace this stub *)
| expToCmds (Bindex.Bind(name, defn, body)) env = [PostFix.Int 42] (* replace this stub *)
and binopToArithop Bindex.Add = PostFix.Add
| binopToArithop Bindex.Sub = PostFix.Sub
| binopToArithop Bindex.Mul = PostFix.Mul
| binopToArithop Bindex.Div = PostFix.Div
| binopToArithop Bindex.Rem = PostFix.RemWhen loaded, the BindexToPostFix.sml file performs numerous tests of the translator. Initially, all of these fail. Your task is to redefine the clauses of expToCmds so that all the test cases succeed.
Notes:
-
The
expToCmdsfunction in the Bindex to PostFix translator uses an environment to track the depth of each variable name in the program. Recall that an environment (defined by theEnvstructure inutils/Env.sml) maps names to values. In this case, it maps names in a Bindex program to the current stack depth of the value associated with that name. Carefully study the helper functionsmakeArgEnv,envPushAll, andenvPushto understand what they do; all are helpful in the contex of this problem. -
As in the Intex to PostFix translator, in the Bindex to PostFix translator, each Bindex expression should translate to a sequence of PostFix commands that, when executed, pushes exactly one value — the integer value of that expression — onto the stack.
-
For simplicity, you may assume there are no unbound variables in the Bindex program you are translating. With this assumption, you may use the
Option.valOffunction to extract the valuevfrom aSOME(v)value. -
The
BindexToPostFixstructure contains atestTranslatorfunction that takes the name of a file containing a Bindex program and a list of argument lists and (1) displays the Bindex program as an s-expression string, (2) displays the PostFix program resulting from the translation as an s-expression string and (3) tests the behavior of both programs on each of the argument lists in the list of argument lists. If the behaviors match, it prints an approprate message; if they don’t match, it reports an an error with the two different result values. For instance, in the initial skelton, the behavioral tests fail on the program in../bindex/avg.bdx:Testing Bindex program file ../bindex/avg.bdx Bindex program input to translation: (bindex (a b) (/ (+ a b) 2)) PostFix program output of translation: (postfix 2 42) Testing args [3,7]: *** ERROR IN TRANSLATION *** Bindex result: 5 PostFix result: 42 Testing args [15,5]: *** ERROR IN TRANSLATION *** Bindex result: 10 PostFix result: 42However, when the translator is working, the following will be displayed:
Testing Bindex program file ../bindex/avg.bdx Bindex program input to translation: (bindex (a b) (/ (+ a b) 2)) PostFix program output of translation: (postfix 2 1 nget 3 nget add 2 div) Testing args [3,7]: both programs return 5 Testing args [15,5]: both programs return 10 -
You have succeeded when all of the test cases succeed when you load the
.smlfile.
5. Simprex (35 points)
Background
Rae Q. Cerf of Toy-Languages-я-Us likes the sigma construct from the Bindex lecture, but she wants something more general. In addition to expressing sums, she would also like to express numeric functions like factorial and exponentiation that are easily definable via simple recursion. The functions that Rae wants to define all have the following form:
f(n) = z, if n ≤ 0
f(n) = c(n, f(n-1)), if n > 0Here, z is an integer that defines the value of f for any non-positive integer value and and c is a binary combining function that combines n and the value of f(n−1) for any positive n. Expanding the definition yields:
f(n) = c(n, c(n-1, c(n−2, ... c(2, c(1, z)))))For example, to define the factorial function, Rae uses:
z_fact = 1
c_fact(i, a) = i*aTo define the exponentation function bn, Rae uses:
z_expt = 1
c_expt(i, a) = b*aIn this case, c_expt ignores its first argument, but the fact that c_expt is called n times is important.
As another example, Rae defines the sum of the squares of the integers between 1 and n using
z_sos = 0
c_sos(i, a) = (i*i) + aThe examples considered so far don’t distinguish right-to-left vs. left-to-right evaluation, A simple example of that is
z_sub = 0
c_sub(i, a) = i-aFor example, when n is 4, the result is the value of 4 - (3 - (2 - (1 - 0))), which is 2, and not the value of 1 - (2 - (3 - (4 - 0))), which is -2.
Another example that distinguishes right-to-left vs. left-to-right evaluation is using Horner’s method to calculate the value of the polynomial x^(n-1) + 2*(x^(n-2)) + 3*(x^(n-3)) + ... + (n-1)*x + n:
z_horner = 0
c_horner(i, a) = i + x*aFor example, when n is 4 and x is 5, the result is (4 + 5*(3 + 5*(2 + 5*(1 + 5*0)))) = 194, which is the value of 5^3 + 2*5^2 + 3*5^1 + 4 = (125 + 50 + 15 + 4) = 194.
Rae designs an extension to Bindex named Simprex that adds a new simprec construct for expressing her simple recursions:
(simprec E_zero (Id_num Id_ans E_combine ) E_arg)This simprec expression evaluates E_arg to the integer value n_arg and E_zero to the integer value n_zero. If n_arg ≤ 0, it returns n_zero. Otherwise, it returns the value
c(n_arg, c(n_arg−1, c(n_arg−2, ... c(2, c(1, nzero)))))where c is the binary combining function specified by (Id_num Id_ans E_combine). This denotes a two-argument function whose two formal parameters are named Id_num and Id_ans and whose body is E_combine. The Id_num parameter ranges over the numbers from n_arg down to 1, while the Id_ans parameter ranges over the answers built up by c starting at n_zero. The scope of Id_num and Id_ans includes only E_combine; it does not include E_zero or E_arg.
Rae’s simprec expression is closely related to the notion of primitive recursive functions defined in the theory of computation.
Here are some sample Simprex programs:
;; Program that calculuates the factorial of n
(simprex (n) (simprec 1 (i a (* i a)) n))
;; Exponentiation program raising base b to power p
(simprex (b p) (simprec 1 (i ans (* b ans)) p))
;; Program summing the squares of the numbers from 1 to hi
(simprex (hi) (simprec 0 (i sumSoFar (+ (* i i) sumSoFar)) hi))
;; Program distingishing right-to-left and left-to-right evaluation via subtraction
(simprex (n) (simprec 0 (i ans (- i ans)) n))
;; Calculating x^(n-1) + 2*(x^(n-2)) + 3*(x^(n-3)) + ... + (n-1)*x + n via Horner's method
(simprex (n x) (simprec 0 (i a (+ i (* x a)) n)))Your Tasks
After completing her design, Rae is called away to work on another language design problem. Toy-Languages-я-Us is impressed with your CS251 background, and has hired you to implement the Simprex language, starting with a version of the Sigmex implementation. (Sigmex is the name of the language that results from extending Bindex with the sigma construct from lecture.) Your first week on the job, you are asked to complete the following tasks that Rae has specified in a memo she has written about finishing her project.
-
(15 points) Rae’s memo contains the following Simprex test programs. Give the results of running each of the programs on the argument 3. Show your work so that you may get partial credit if your answer is incorrect.
;; program 1 (2 points) (simprex (a) (simprec 0 (b c (+ 2 c)) a)) ;; program 2 (3 points) (simprex (x) (simprec 0 (n sum (+ n (* x sum))) 4)) ;; program 3 (4 points) (simprex (y) (simprec 0 (a b (+ b (sigma c 1 a (* a c)))) y)) ;; program 4 (6 points) (simprex (n) (simprec (simprec (* n (- n 3)) (q r r) (* n n)) (c d (+ d (simprec 0 (x sum (+ sum (- (* 2 x) 1))) c))) (simprec -5 (a b (+ 1 b)) (* n n)))) -
(20 points)
Rae has created a skeleton implementation of Simprex by modifying the files for the Sigmex (= Bindex +
sigma) implementation to contain stubs for thesimprecconstruct. Her modified files, which are namedSimprex.smlandSimprexEnvInterp.sml, can be found in theps8folder.Finish Rae’s implementation of the Simprex language by completing the following four tasks, which Rae has listed in her memo:
-
(2 points) Extend the definition of
sexpToExpinSimprex.smlto correctly parsesimprecexpressions. -
(2 points) Extend the definition of
expToSexpinSimprex.smlto correctly unparsesimprecexpressions. -
(5 points) Extend the definition of
freeVarsExpinSimprex.smlto correctly determine the free variables of asimprecexpression. -
(11 points) Extend the definition of
evalinSimprexEnvInterp.smlto correctly evaluatesimprecexpressions using the environment model.
Notes:
-
This problem is similar the the Sigmex (= Bindex +
sigma) language extension we implemented in lecture. You may wish to study the solution filesbindex/SigmexSolns.smlandbindex/SigmexEnvInterp.smlas part of doing this problem. -
In
Simprec.sml, theexptype is defined to be:and exp = Int of int (* integer literal with value *) | Var of var (* variable reference *) | BinApp of binop * exp * exp (* binary operator application with rator, rands *) | Bind of var * exp * exp (* bind name to value of defn in body *) | Sigma of var * exp * exp * exp (* E_lo, E_hi, E_body *) | Simprec of exp * var * var * exp * exp (* zeroExp * numVar * ansVar * combExp * argExp *)The s-expression notation
(simprec E_zero (I_num I_ans E_combine) E_arg)is represented in SML asSimprec (<exp for E-zero>, <string for I_num>, <string for I_ans>, <exp for E_combine>, <exp for E_arg>)For example, the expression
(simprec 1 (x a (* x a)) n)is represented in SML asSimprec(Int 1, "x", "a", BinApp(Mul, Var "x", Var "a"), Var "n") -
You can test your modifications of
sexpToExp,expToExp, andfreeVarsExpviause "Simprex.sml"after you have first uncommented the test cases at the end ofSimprex.sml. This will display the results of various test cases forfreeVarsExpbefore displaying theSimprexstructure. It will also testsexpToExpandexpToSexp. If you uncomment the test cases before you have correctly modifiedsexpToExpandexpToSexp, any attempt touse "Simprec.sml"oruse "SimprecEnvInterp.sml"will fail with aSyntaxError. -
You can test your modifications of
evalviause "SimprexEnvInterp.sml"after you have first uncommented the test cases at the end ofSimprexEnvInterp.sml. This will display the results of various test cases forevalbefore displaying theSimprexEnvInterpstructure.
-
Extra Credit: Self-printing Program (15 points)
In his Turing-lecture-turned-short-paper Reflections on Trusting Trust, Ken Thompson notes that it is an interesting puzzle to write a program in a language whose effect is to print out a copy of the program. We will call this a self-printing program. Write a self-printing program in any general-purpose programming language you choose. Your program should be self-contained — it should not read anything from the file system.
Important Note: There are lots of answers on the Internet, but you will get zero points if you just copy one of those. (I can tell!) Do this completely one your own, without looking anything up online.