PS8: Open to Interpretation
 Due: Mon., Dec. 12
 Notes:
 Although formally “due” on Mon., Dec. 12, this (and all your other asyetincomplete CS251 work) must actually be completed by the end of finals period
 This pset does not have any solo problems.
 This pset has 100 total points.
 The problems needn’t be done in order. Feel free to jump around.
 Submission:
 In your yourFullName CS251 Fall 2016 Folder, create a Google Doc named yourFullName CS251 PS8.
 At the top of your yourFullName CS251 PS8 doc, include your name, problem set number, date of submission, and an approximation of how long each problem part took.
 For all parts of all problems, include all answers (including SML code) in your PS8 google doc. Format code using a fixedwidth font, like Consolas or Courier New. You can use a small font size if that helps.
 For Problems 1 and 2, include your interpretation/compilation derivations. It is easiest to write these via nested bullets, as shown in both the pset description and the lecture notes.

In Problems 3 through 5, you will modify the following code files from starter files populating your
~wx/cs251/sml/ps8
directory on your wx Virtual Machine: Problem 3:
IntexArgChecker.sml.sml
 Problem 4:
BindexToPostFix.sml
 Problem 5:
Simprex.sml
andSimprexEnvInterp.sml
Drop a copy of your entire
ps8
directory into your~/cs251/drop/ps08
drop folder oncs.wellesley.edu
.  Problem 3:
 For Problems 3 through 5 you should include all modified functions from the four files listed above in your Google Doc (so that I can comment on them). In the Google Doc, you need only include the functions you modified, not all functions.
Starting this Problem Set
Problems 3 through 5 involve starter files in the ~wx/cs251/sml/ps8
directory in your wx Virtual Machine.
To create this directory, execute the following two commands in a wx VM shell:
cd ~/cs251/sml
git pull origin master
1. Deriving Implementations (10 points)
As we discussed in the Metaprogramming lecture, there are two basic reasoning rules involving interpreters, translators (a.k.a. compilers), and program implementation:
The Interpreter Rule (I)
The Translator Rule (T)
In practice, we often elide the word “machine” for interpreter machines and translator machines. For example, we will refer to an “L interpreter machine” as an “L interpreter”, and an “StoT translator machine” as an “StoT translator” or “StoT compiler”. We will also often elide the word “program”; e.g., we will refer to a “PinL program” as “PinL”. So an “L interpreter” means an “L interpreter machine” while a “SinL interpreter” means an “SinL interpreter program”. Similar remarks hold for translators (also called compilers): an “StoT translator” means an “StoT translator machine”, while a “StoTinL translator” means an “StoTinL translator program”.
Example
For example, we considered the following elements in class:
 a 251webpageinHTML program (i.e., a 251 web page written in HTML)
 an HTMLinterpreterinC program (i.e., a web browser written in C)
 a Ctox86compilerinx86 program (i.e., a Ctox86 compiler written in x86)
 a x86 computer (i.e., an x86 interpreter machine)
They can be used to build a 251 web page display machine as follows:
Feel free to abbreviate by dropping the words “program” and “machine”. “… in …” denotes a program, while “… compiler” and “… translator” denote translator machines and “… interpreter” and “… computer” denote interpreter machines.
Writing Derivations
Rather that displaying derivations using horizontal lines, it is recommended that you use nested bullets to indicate the same structure as the inference rules (though drawn “upside down”). For example, the derivation above could also be written:
 251 web page machine (I)
 251webpageinHTML program
 HTML interpreter machine (I)
 HTMLinterpreterinx86 program (T)
 HTMLinterpreterinC program
 Ctox86 compiler machine (I)
 Ctox86compilerinx86 program
 x86 computer
 x86 computer
 HTMLinterpreterinx86 program (T)
Questions

(5 points) Suppose you are given the following:
 A PinJava program
 a JavatoCcompilerinRacket program (i.e., a JavatoC compiler written in Racket);
 a Racketinterpreterinx86 program (i.e., a Racket interpreter written in x86 machine code);
 a Ctox86compilerinx86 program (i.e., a Ctox86 compiler written in x86 code);
 an x86 computer (i.e., an x86 interpreter machine).
Using the two reasoning rules above, construct a “proof” that demonstrates how to execute the PinJava program on the computer. That is, show how to construct a P machine.

(5 points) Suppose you are given:
 a GraphVizinC program
 a CtoLLVMcompilerinx86 program
 an LLVMtoJavaScriptcompilerinx86 program
 a JavaScriptinterpreterinARM program
 an x86 computer (i.e., an x86 interpreter machine, such as your laptop)
 an ARM computer (i.e., an ARM interpreter machine, such as a smartphone)
Show how to construct a GraphViz machine.
2. Bootstrapping (20 points)
In his Turing Award lectureturnedshortpaper Reflections on Trusting Trust, Ken Thompson describes how a compiler can harbor a socalled “Trojan horse” that can make compiled programs insecure. Read this paper carefully and do the following tasks to demonstrate your understanding of the paper:

(8 points) Stage II of the paper describes how a C compiler can be “taught” to recognize
'\v'
as the vertical tab character. Using the same kinds of components and processes used in Problem 1, we can summarize the content of Stage II by carefully listing the components involved and describing (by constructing a proof) how a C compiler that “understands” the code in Figure 2.2 can be generated. (Note that the labels for Figures 2.1 and 2.2 are accidentally swapped in the paper.) In Stage II, two languages are considered: the usual C programming language
 C+\v – an extension to C in which
'\v'
is treated as the vertical tab character (which has ASCII value 11).
Suppose we are given the following:
 a Ctobinary compiler (Here, “binary” is the machine code for some machine. This compiler is just a “black box”; we don’t care where it comes from);
 a C+\vtobinarycompilerinC (Figure 2.3 in Thompson’s paper);
 a C+\vtobinarycompilerinC+\v (what should be Figure 2.2 in Thompson’s paper);
 a machine that executes the given type of binary code.
Construct a proof showing how to use the C+\vtobinarycompilerinC+\v source code to create a C+\vtobinarycompilerinbinary program.

(12 points) Stage III of the paper describes how to generate a compiler binary harboring a “Trojan horse”. Using the same kinds of components and processes used in Problem 1, we can summarize the content of Stage III by carefully listing the components involved and describing (by constructing a proof) how the Trojaned compiler can be generated. In particular, assume we have the parts for this stage:
 a Ctobinarycompiler (again, just a “black box” we are given);
 a CtobinarycompilerinC without Trojan Horses (Figure 3.1 in Thompson’s paper);
 a Ctobinarycompiler_{TH}inC with two Trojan Horses (Figure 3.3 in Thompson’s paper);
 a logininC program with no Trojan Horse;
 a binarybased computer;
The subscript _{TH} indicates that a program contains a Trojan Horse. A Ctobinary compiler_{TH} has the “feature” that it can insert Trojan Horses when compiling source code that is an untrojaned login program or an untrojaned compiler program. That is, if P is a login or compiler program, it is as if there is a new rule:
The Trojan Horse Rule (TH)
Using these parts along with the two usual rules (I) and (T) and the new rule (TH), show how to derive a Trojaned login program login_{TH}inbinary that is the result of compiling logininC with a C compiler that is the result of compiling CtobinarycompilerinC. The interesting thing about this derivation is that login_{TH}inbinary contains a Trojan horse even though it is compiled using a C compiler whose source code (CtobinarycompilerinC) contains no code for inserting Trojan horses!
3. Static Argument Checking in Intex (15 points)
As described in the lecture slides on Intex, it is possible to statically determine (i.e., without running the program) whether an Intex program contains a argument reference with an outofbound index. In this problem you will flesh out skeleton implementations of the two approaches for static argument checking sketched in the slides. The skeleton implementations are in the file ps8/IntexArgChecker.sml
.

Topdown checker (7 points): In the topdown approach, the
checkTopDown
function on a program passes the number of arguments to thecheckExpTopDown
function on expressions, which recursively passes it down the subexpressions of the abstract syntax tree. When anArg
expression is reached, the index is examined to determine a boolean that indicates whether it is in bounds. The boolean on all subexpressions are combined in the upward phase of recursion to determine a boolean for the whole tree. Complete the following skeleton to flesh out this approach.(* val checkTopDown: pgm > bool *) fun checkTopDown (Intex(numargs,body)) = checkExpTopDown numargs body (* val checkExpTopDown : int > Intex.exp > bool *) and checkExpTopDown numargs (Int i) = raise Unimplemented  checkExpTopDown numargs (Arg index) = raise Unimplemented  checkExpTopDown numargs (BinApp(_,exp1,exp2)) = raise Unimplemented
Uncomment the definition of
topDownChecks
to test your implementation. 
Bottomup checker (6 points): In the bottomup approach, the
checkExpBottomUp
function returns a pair(min, max)
of the minimum and maximum argument indices that appear in an expression. If an expression contains no argument indices, it should return the(valOf Int.maxInt, valOf Int.minInt)
, which is a pair of SML’s (1) maximum integer and (2) minumum integer. The funtionsInt.min
andInt.max
are useful for combining such pairs. (Note thatInt.maxInt
is the identity forInt.min
, andInt.minInt
is the identity forInt.max
.)The
checkBottomUp
function on programs returnstrue
if all argument references are positive and less than or equal to the programs number of arguments, andfalse
otherwise. Complete the following skeleton to flesh out this approach.(* checkBottomUp: pgm > bool *) fun checkBottomUp (Intex(numargs,body)) = let val (min,max) = checkExpBottomUp body in 0 < min andalso max <= numargs end (* val checkExpBottomUp : Intex.exp > int * int *) and checkExpBottomUp (Int i) = (valOf Int.maxInt, valOf Int.minInt)  checkExpBottomUp (Arg index) = raise Unimplemented  checkExpBottomUp (BinApp(_,exp1,exp2)) = raise Unimplemented
Uncomment the definition of
bottomUpChecks
to test your implementation. 
Static vs dynamic checking (2 points): A static argument index checker flags programs that might dynamically raise an indexoutofbounds error, but does not guarantee they will dynamically raise such an error. Give an example of an Intex program for which
checkTopDown
andcheckBottomUp
returnsfalse
but does not raise an argumentoutofbounds error when run on particular arguments. Hint: such a program can raise a different error.
4. Bindex To PostFix (20 points)
Background
In lecture, we studied the following intexToPostFix
function that automatically translates Intex programs to equivalent PostFix programs:
fun intexToPostFix (Intex.Intex(numargs, exp)) =
PostFix.PostFix(numargs, expToCmds exp 0)
(* 0 is a depth argument that statically tracks
how many values are on stack above the arguments *)
and expToCmds (Intex.Int i) depth = [PostFix.Int i]
 expToCmds (Intex.Arg index) depth = [PostFix.Int (index + depth), PostFix.Nget]
(* specified argument is on stack at index + depth *)
 expToCmds (Intex.BinApp(binop,exp1,exp2)) depth =
(expToCmds exp1 depth) (* 1st operand is at same depth as whole binapp *)
@ (expToCmds exp2 (depth + 1)) (* for 2nd operand, add 1 to depth
to account for 1st operand *)
@ [PostFix.Arithop (binopToArithop binop)]
and binopToArithop Intex.Add = PostFix.Add
 binopToArithop Intex.Sub = PostFix.Sub
 binopToArithop Intex.Mul = PostFix.Mul
 binopToArithop Intex.Div = PostFix.Div
 binopToArithop Intex.Rem = PostFix.Rem
For example, given the Intex program intexP2
expressed as an Intex.pgm
datatype instance
val intexP2 = Intex(4, BinApp(Mul,
BinApp(Sub, Arg 1, Arg 2),
BinApp(Div, Arg 3, Arg 4)))
the intexToPostFix
translation is as follows:
 intexToPostFix(intexP2);
val it =
PostFix
(4,
[Int 1,Nget,Int 3,Nget,Arithop Sub,Int 4,Nget,Int 6,Nget,Arithop Div,
Arithop Mul]) : PostFix.pgm
With the helper function translateString
shown below, the input and output of translation can be expressed as strings in sexpression format:
fun translateString intexPgmString =
PostFix.pgmToString (intexToPostFix (Intex.stringToPgm intexPgmString))
 IntexToPostFix.translateString("(intex 4 (* ( ($ 1) ($ 2)) (/ ($ 3) ($ 4))))");
val it = "(postfix 4 1 nget 3 nget sub 4 nget 6 nget div mul)" : string
There are two key aspects to the translation from Intex to PostFix:

The
expToCmds
function translates every Intex expression to a sequence of PostFix commands that, when executed, will push the integer value of that expression onto the stack. In the case of translating a binary application expression, these sequence is composed by appending the sequences for the two operands followed by the appropriate arithmetic operator command. 
The trickiest part of
expToCommands
is knowing how many integers are on the stack above the program arguments, so that references to Intex arguments by index can be appropriately translated. This is handled by adepth
argument toexpToCommands
that keeps track of this number. Note that the first operand of a binary application has the same depth as the whole binary application expression, but the second operand has a depth one greater than the first to account for the integer value of the first operand pushed onto the stack.
Your Task
In this problem, you will implement a similar translator from Bindex programs to PostFix programs.
The file ps8/BindexToPostFix.sml
contains a skeleton of the Bindex to PostFix translator in which all Bindex expressions translate to a command sequence consisting of the single integer command 42
:
fun makeArgEnv args = Env.make args (Utils.range 1 ((length args) + 1))
(* returned env associates each arg name with its 1based index *)
fun envPushAll env = Env.map (fn index => index + 1) env
(* add one to the index for each name in env *)
fun envPush name env = Env.bind name 1 (envPushAll env)
fun bindexToPostFix (Bindex.Bindex(args, body)) =
PostFix.PostFix(length args, expToCmds body (makeArgEnv args))
(* In expToCmds, env statically tracks the depth of each named variable
value on the stack *)
and expToCmds (Bindex.Int i) env = [PostFix.Int 42] (* replace this stub *)
 expToCmds (Bindex.Var name) env = [PostFix.Int 42] (* replace this stub *)
 expToCmds (Bindex.BinApp(binop, rand1, rand2)) env = [PostFix.Int 42] (* replace this stub *)
 expToCmds (Bindex.Bind(name, defn, body)) env = [PostFix.Int 42] (* replace this stub *)
and binopToArithop Bindex.Add = PostFix.Add
 binopToArithop Bindex.Sub = PostFix.Sub
 binopToArithop Bindex.Mul = PostFix.Mul
 binopToArithop Bindex.Div = PostFix.Div
 binopToArithop Bindex.Rem = PostFix.Rem
When loaded, the BindexToPostFix.sml
file performs numerous tests of the translator. Initially, all of these fail. Your task is to redefine the clauses of expToCmds
so that all the test cases succeed.
Notes:

The
expToCmds
function in the Bindex to PostFix translator uses an environment to track the depth of each variable name in the program. Recall that an environment (defined by theEnv
structure inutils/Env.sml
) maps names to values. In this case, it maps names in a Bindex program to the current stack depth of the value associated with that name. Carefully study the helper functionsmakeArgEnv
,envPushAll
, andenvPush
to understand what they do; all are helpful in the contex of this problem. 
As in the Intex to PostFix translator, in the Bindex to PostFix translator, each Bindex expression should translate to a sequence of PostFix commands that, when executed, pushes exactly one value — the integer value of that expression — onto the stack.

For simplicity, you may assume there are no unbound variables in the Bindex program you are translating. With this assumption, you may use the
Option.valOf
function to extract the valuev
from aSOME(v)
value. 
The
BindexToPostFix
structure contains atestTranslator
function that takes the name of a file containing a Bindex program and a list of argument lists and (1) displays the Bindex program as an sexpression string, (2) displays the PostFix program resulting from the translation as an sexpression string and (3) tests the behavior of both programs on each of the argument lists in the list of argument lists. If the behaviors match, it prints an approprate message; if they don’t match, it reports an an error with the two different result values. For instance, in the initial skelton, the behavioral tests fail on the program in../bindex/avg.bdx
:Testing Bindex program file ../bindex/avg.bdx Bindex program input to translation: (bindex (a b) (/ (+ a b) 2)) PostFix program output of translation: (postfix 2 42) Testing args [3,7]: *** ERROR IN TRANSLATION *** Bindex result: 5 PostFix result: 42 Testing args [15,5]: *** ERROR IN TRANSLATION *** Bindex result: 10 PostFix result: 42
However, when the translator is working, the following will be displayed:
Testing Bindex program file ../bindex/avg.bdx Bindex program input to translation: (bindex (a b) (/ (+ a b) 2)) PostFix program output of translation: (postfix 2 1 nget 3 nget add 2 div) Testing args [3,7]: both programs return 5 Testing args [15,5]: both programs return 10

You have succeeded when all of the test cases succeed when you load the
.sml
file.
5. Simprex (35 points)
Background
Rae Q. Cerf of ToyLanguagesяUs likes the sigma
construct from the Bindex lecture, but she wants something more general. In addition to expressing sums, she would also like to express numeric functions like factorial and exponentiation that are easily definable via simple recursion. The functions that Rae wants to define all have the following form:
f(n) = z, if n ≤ 0
f(n) = c(n, f(n1)), if n > 0
Here, z
is an integer that defines the value of f
for any nonpositive integer value and and c
is a binary combining function that combines n
and the value of f(n−1)
for any positive n
. Expanding the definition yields:
f(n) = c(n, c(n1, c(n−2, ... c(2, c(1, z)))))
For example, to define the factorial function, Rae uses:
z_fact = 1
c_fact(i, a) = i*a
To define the exponentation function b^{n}, Rae uses:
z_expt = 1
c_expt(i, a) = b*a
In this case, c_expt
ignores its first argument, but the fact that c_expt
is called n
times is important.
As another example, Rae defines the sum of the squares of the integers between 1 and n using
z_sos = 0
c_sos(i, a) = (i*i) + a
The examples considered so far don’t distinguish righttoleft vs. lefttoright evaluation, A simple example of that is
z_sub = 0
c_sub(i, a) = ia
For example, when n
is 4, the result is the value of 4  (3  (2  (1  0))), which is 2, and not the value of 1  (2  (3  (4  0))), which is 2.
Another example that distinguishes righttoleft vs. lefttoright evaluation is using Horner’s method to calculate the value of the polynomial x^(n1) + 2*(x^(n2)) + 3*(x^(n3)) + ... + (n1)*x + n
:
z_horner = 0
c_horner(i, a) = i + x*a
For example, when n
is 4 and x
is 5, the result is (4 + 5*(3 + 5*(2 + 5*(1 + 5*0))))
= 194, which is the value of 5^3 + 2*5^2 + 3*5^1 + 4
= (125 + 50 + 15 + 4) = 194.
Rae designs an extension to Bindex named Simprex that adds a new simprec
construct for expressing her simple recursions:
(simprec E_zero (Id_num Id_ans E_combine ) E_arg)
This simprec
expression evaluates E_arg
to the integer value n_arg
and E_zero
to the integer value n_zero
. If n_arg
≤ 0, it returns n_zero
. Otherwise, it returns the value
c(n_arg, c(n_arg−1, c(n_arg−2, ... c(2, c(1, nzero)))))
where c
is the binary combining function specified by (Id_num Id_ans E_combine)
. This denotes a twoargument function whose two formal parameters are named Id_num
and Id_ans
and whose body is E_combine
. The Id_num
parameter ranges over the numbers from n_arg
down to 1, while the Id_ans
parameter ranges over the answers built up by c
starting at n_zero
. The scope of Id_num
and Id_ans
includes only E_combine
; it does not include E_zero
or E_arg
.
Rae’s simprec
expression is closely related to the notion of primitive recursive functions defined in the theory of computation.
Here are some sample Simprex programs:
;; Program that calculuates the factorial of n
(simprex (n) (simprec 1 (i a (* i a)) n))
;; Exponentiation program raising base b to power p
(simprex (b p) (simprec 1 (i ans (* b ans)) p))
;; Program summing the squares of the numbers from 1 to hi
(simprex (hi) (simprec 0 (i sumSoFar (+ (* i i) sumSoFar)) hi))
;; Program distingishing righttoleft and lefttoright evaluation via subtraction
(simprex (n) (simprec 0 (i ans ( i ans)) n))
;; Calculating x^(n1) + 2*(x^(n2)) + 3*(x^(n3)) + ... + (n1)*x + n via Horner's method
(simprex (n x) (simprec 0 (i a (+ i (* x a)) n)))
Your Tasks
After completing her design, Rae is called away to work on another language design problem. ToyLanguagesяUs is impressed with your CS251 background, and has hired you to implement the Simprex language, starting with a version of the Sigmex implementation. (Sigmex is the name of the language that results from extending Bindex with the sigma
construct from lecture.) Your first week on the job, you are asked to complete the following tasks that Rae has specified in a memo she has written about finishing her project.

(15 points) Rae’s memo contains the following Simprex test programs. Give the results of running each of the programs on the argument 3. Show your work so that you may get partial credit if your answer is incorrect.
;; program 1 (2 points) (simprex (a) (simprec 0 (b c (+ 2 c)) a)) ;; program 2 (3 points) (simprex (x) (simprec 0 (n sum (+ n (* x sum))) 4)) ;; program 3 (4 points) (simprex (y) (simprec 0 (a b (+ b (sigma c 1 a (* a c)))) y)) ;; program 4 (6 points) (simprex (n) (simprec (simprec (* n ( n 3)) (q r r) (* n n)) (c d (+ d (simprec 0 (x sum (+ sum ( (* 2 x) 1))) c))) (simprec 5 (a b (+ 1 b)) (* n n))))

(20 points)
Rae has created a skeleton implementation of Simprex by modifying the files for the Sigmex (= Bindex +
sigma
) implementation to contain stubs for thesimprec
construct. Her modified files, which are namedSimprex.sml
andSimprexEnvInterp.sml
, can be found in theps8
folder.Finish Rae’s implementation of the Simprex language by completing the following four tasks, which Rae has listed in her memo:

(2 points) Extend the definition of
sexpToExp
inSimprex.sml
to correctly parsesimprec
expressions. 
(2 points) Extend the definition of
expToSexp
inSimprex.sml
to correctly unparsesimprec
expressions. 
(5 points) Extend the definition of
freeVarsExp
inSimprex.sml
to correctly determine the free variables of asimprec
expression. 
(11 points) Extend the definition of
eval
inSimprexEnvInterp.sml
to correctly evaluatesimprec
expressions using the environment model.
Notes:

This problem is similar the the Sigmex (= Bindex +
sigma
) language extension we implemented in lecture. You may wish to study the solution filesbindex/SigmexSolns.sml
andbindex/SigmexEnvInterp.sml
as part of doing this problem. 
In
Simprec.sml
, theexp
type is defined to be:and exp = Int of int (* integer literal with value *)  Var of var (* variable reference *)  BinApp of binop * exp * exp (* binary operator application with rator, rands *)  Bind of var * exp * exp (* bind name to value of defn in body *)  Sigma of var * exp * exp * exp (* E_lo, E_hi, E_body *)  Simprec of exp * var * var * exp * exp (* zeroExp * numVar * ansVar * combExp * argExp *)
The sexpression notation
(simprec E_zero (I_num I_ans E_combine) E_arg)
is represented in SML asSimprec (<exp for Ezero>, <string for I_num>, <string for I_ans>, <exp for E_combine>, <exp for E_arg>)
For example, the expression
(simprec 1 (x a (* x a)) n)
is represented in SML asSimprec(Int 1, "x", "a", BinApp(Mul, Var "x", Var "a"), Var "n")

You can test your modifications of
sexpToExp
,expToExp
, andfreeVarsExp
viause "Simprex.sml"
after you have first uncommented the test cases at the end ofSimprex.sml
. This will display the results of various test cases forfreeVarsExp
before displaying theSimprex
structure. It will also testsexpToExp
andexpToSexp
. If you uncomment the test cases before you have correctly modifiedsexpToExp
andexpToSexp
, any attempt touse "Simprec.sml"
oruse "SimprecEnvInterp.sml"
will fail with aSyntaxError
. 
You can test your modifications of
eval
viause "SimprexEnvInterp.sml"
after you have first uncommented the test cases at the end ofSimprexEnvInterp.sml
. This will display the results of various test cases foreval
before displaying theSimprexEnvInterp
structure.

Extra Credit: Selfprinting Program (15 points)
In his Turinglectureturnedshortpaper Reflections on Trusting Trust, Ken Thompson notes that it is an interesting puzzle to write a program in a language whose effect is to print out a copy of the program. We will call this a selfprinting program. Write a selfprinting program in any generalpurpose programming language you choose. Your program should be selfcontained — it should not read anything from the file system.
Important Note: There are lots of answers on the Internet, but you will get zero points if you just copy one of those. (I can tell!) Do this completely one your own, without looking anything up online.