CS 251: PS6: I Never Metalanguage I Didn't Like

Changes from the original PS6 description are highlighed in magenta.
Dueish: Mon Mar 29 (followed by a 2-day grace period)
Notes:
- This is the last regular pset, but there will be two more Solo Assignments, E & F.
- This pset includes numerous extra credit problems, some of which cover material from the last few lectures that is not covered by the regular problems on this pset.
- As with all regular psets, you can work on this problem in a two-person team or as an individual, but partnering is strongly recommended. Unless you get a special dispensation from Lyn, you need to work with a different partner than you did in your previous two partnerships. Use this Pset Partner Doc to find a partner and record your partnership.
- This pset has 110 total points.
- Much of the material on this pset has not yet been covered in class. Here are some notes to help you plan when you can start the problems:
  - You can do Problem 1 now based on the Lec 24 material on s-expressions
  - You will be able to do Problem 2 after the Mon Mar 22 Lec 26 on a PosfFix interpreter in Racket
  - You will be able to do Problem 3 after the Wed Mar 24 Lec 28 on metaprogramming derivations
  - You will be able to do Problem 5a after the Thu Mar 25 Lec 29 on an Intex interpreter in SML
  - You will be able to do Problems 4 and 5b after the Fri Mar 26 Lec 30 on compiling Intex to PostFix

Times (in hours) from some previous semesters:

The following times do not yet include the times for Problems 3, 4, and 6

Times	Problem 1	Problem 2	Problem 5
average time (hours)	0.5	2.5	1.9
median time (hours)	0.5	2.4	1.8
25% took more than	0.7	3.0	2.0
10% took more than	0.8	3.0	3.0

How to Start PS6
- Follow the instructions in the GDoc CS251-S21-T3 Pset Submission Instructions for creating your PS5 Google Doc. Put all written answers and a copy of all code into this PS5 GDoc. If you are working with a partner, only one of you needs to create this document, but you should link it from both of your List Docs.
Submission:
- For all parts of all problems, include all answers (including Racket and SML code and metaprogramming derivations)) in your PS6 google doc. Format Racket and SML code using a fixed-width font, like Consolas or Courier New. You can use a small font size if that helps.
- For Problem 1:
  - Be sure that your deep-reverse definition in yourAccountName-ps6-deep-reverse.rkt also appears in your PS6 GDoc.
  - Drop a copy of your yourAccountName-ps6-deep-reverse.rkt in your ~/cs251/drop/ps06 drop folder on cs.wellesley.edu.
- For Problem 2:
  - Include the modified parts of yourAccountName-ps6-postfix.rkt in your PS6 GDoc. (You only need to include the modified parts, not the entire contents of the PostFix interpreter!)
  - Drop a copy of your yourAccountName-ps6-postfix.rkt in your ~/cs251/drop/ps06 drop folder on cs.wellesley.edu.
- For Problem 3, include your metaprogramming derivations. It is easiest to write these top-down via nested bullets, as shown in both the Problem 3 description and the lecture notes.
- For Problem 4:
  - Include the definition of eval and any helper functions in your PS6 GDoc.
  - Drop a copy of your ~/cs251/sml/ps6/yourAccountName-Condex.sml file in your ~/cs251/drop/ps06 drop folder on cs.wellesley.edu by executing the following (replacing all three occurrences of gdome by your cs server username):
```
scp -r ~/cs251/sml/ps6/gdome-Condex.sml gdome@cs.wellesley.edu:/students/gdome/cs251/drop/ps06
```
- For Problem 5:
  - For parts a, b, and d, include the drawing of a single AST for P_Intex (from part a) that is annotated with depth information (from part b) and stack information (from part d).
  - For part c, include the commented PostFix program P_PostFix that is the result of translating P_Intex to PostFix.
- For Problem 6:
  - For Part a, include the annotated PostFix program that should result from compiling divRemTest.cdx, along with the sequence of all intermediate stacks when executing the PostFix program on the two given argument sequences.
  - For Part b:
    - Include the definition of expToCmds and any helper functions in your PS6 GDoc.
    - Drop a copy of your ~/cs251/sml/ps6/yourAccountName-CondexToPostFix.sml file in your ~/cs251/drop/ps06 drop folder on cs.wellesley.edu by executing the following (replacing all three occurrences of gdome by your cs server username):
      scp -r ~/cs251/sml/ps6/gdome-CondexToPostFix.sml gdome@cs.wellesley.edu:/students/gdome/cs251/drop/ps06

1. Deep Reverse (6 points)

We saw in Lec 24 that tree recursion on trees represented as s-expressions can be expressed rather elegantly. For example:

(define (atom? x)
  (or (number? x) (boolean? x) (string? x) (symbol? x)))

(define (sexp-num-atoms sexp)
  (if (atom? sexp)
      1
      (foldr + 0 (map sexp-num-atoms sexp))))

> (sexp-num-atoms '((a (b c) d) e (((f) g h) i j k)))
11
> (sexp-num-atoms '(a b c d))
4
> (sexp-num-atoms 'a)
1
> (sexp-num-atoms '())
0

(define (sexp-atoms sexp)
  (if (atom? sexp)
      (list sexp)
      (foldr append null (map sexp-atoms sexp))))

> (sexp-atoms '((a (b c) d) e (((f) g h) i j k)))
'(a b c d e f g h i j k)
> (sexp-atoms '(a b c d))
'(a b c d)
> (sexp-atoms 'a)
'(a)
> (sexp-atoms '())
'()

In this problem, you will define a function (deep-reverse sexp) that returns a new s-expression in which the order of the children at every node of the s-expression tree sexp is reversed.

> (deep-reverse '((a (b c) d) e (((f) g h) i j k)))
'((k j i (h g (f))) e (d (c b) a))
> (deep-reverse '(a b c d))
'(d c b a)
> (deep-reverse 'a)
'a
> (deep-reverse '())
'()

Notes:

Begin with this starter file ps6-deep-reverse-starter.rkt, which you should rename to yourAccountName-ps6-deep-reverse.rkt. Add your definition of deep-reverse to this file.
Your definition should have form similar to the definitions for sexp-num-atoms and sexp-atoms, but you’ll want to use something other than foldr.
You are not allowed to use Racket’s reverse function in your definition.

2. Extending PostFix (32 points)

Lec 26 on Mon Mar 22 covers several different versions of an interpreter for the PostFix language implemented in Racket. This problem involves starting with the following version of the interpreter:

ps6-postfix-starter.rkt

This is a slightly modified version of the file postfix-transform-fancy.rkt studied in lecture.

Begin by making a copy of ps6-postfix-starter.rkt named yourAccountName-ps6-postfix.rkt and load this into Dr. Racket. Near the bottom of this file is the following PostFix program named sos (for “sum of squares”). Racket’s semi-colon comments have been used to explain the commands in the program:

;; Sum-of-squares program 
(define sos
  '(postfix 2 ; let's call the arguments a and b, from top down
            1 nget ; stack is now (a a b)
            mul ; stack is now (a*a b)
            swap ; stack now (b a*a)
            1 nget ; stack now (b b a*a)
            mul ; stack now (b*b a*a)
            add ; stack now (a*a+b*b); top (only) value is returned by program 
            ))

Let’s run the program on the arguments 5 and 12:

> (postfix-run sos '(5 12))
About to execute commands (1 nget mul swap 1 nget mul add) on stack (5 12)
  after executing 1, stack is (1 5 12)
  after executing nget, stack is (5 5 12)
  after executing mul, stack is (25 12)
  after executing swap, stack is (12 25)
  after executing 1, stack is (1 12 25)
  after executing nget, stack is (12 12 25)
  after executing mul, stack is (144 25)
  after executing add, stack is (169)
169

Note that the stack that results from executing each command on the previous stack is displayed, line by line. This behavior is controlled by the display-steps? flag at the top of the program:

(define display-steps? #t)

If we set the flag to #f, the intermediate stack display will be turned off:

(define display-steps? #f)

> (postfix-run sos '(5 12))
169

Turn the display-steps? flag on when it’s helpful to understand or debug a PostFix program.

(10 points)

Consider the following Racket function g that is defined near the bottom of the PostFix interpreter file:

(define (g a b c)
  (- c
     (if (= 0 (remainder a 2))
         (quotient b (- a c))
         (* (+ b c) a))))

In this part, your goal is to flesh out the definition of a three-argument PostFix program pfg that performs the same calculation as g on three arguments:

(define pfg
   '(postfix 3 
       ;; Flesh out and comment the commands in this PostFix program 
     ))

Here are some examples with a correctly defined pfg:

> (apply g '(10 2 8))
7

> (postfix-run pfg '(10 2 8))
7

> (apply g '(-7 2 3))
38

> (postfix-run pfg '(-7 2 3))
38

> (apply g '(5 4 5))
-40

> (postfix-run pfg '(5 4 5))
-40

Notes:

You must comment command chunks in your pfg program to explain how they affect the stack. Use a commenting style similar to that in sos.

You have been provided with a testing function (test-pfg) that will test your pfg function on 5 sets of arguments:

;; Tests on an incorrect definition of pfg: 
> (test-pfg)
Testing pfg on (10 2 8): ***different results***
  g: 7
  pfg: -7
Testing pfg on (11 2 8): ***different results***
  g: -102
  pfg: 102
Testing pfg on (-6 3 8): ***different results***
  g: 8
  pfg: -8
Testing pfg on (-7 2 3): ***different results***
  g: 38
  pfg: -38
Testing pfg on (5 4 5): ***different results***
  g: -40
  pfg: 40

;; Tests on a correct definition of pfg: 
> (test-pfg)
Testing pfg on (10 2 8):  same result for g and pfg = 7
Testing pfg on (11 2 8):  same result for g and pfg = -102
Testing pfg on (-6 3 8):  same result for g and pfg = 8
Testing pfg on (-7 2 3):  same result for g and pfg = 38
Testing pfg on (5 4 5):  same result for g and pfg = -40

(7 points) Extend PostFix by adding the following two commands:

exp: given a stack i_base i_expt ... (where i_base and i_expt are integers), replaces i_base and i_expt by the result of raising i_base to the power of i_expt (or 0 if i_expt is negative).
chs: given a stack i_n i_k ... (where i_k and i_n are nonnegative integers and i_k <= i_n), replaces i_n and i_k by the value of i_n choose i_k. (See the definition of “choose” notation here). If one or both of i_n and i_k are invalid arguments, displays an error message (sees examples below).

For example:

> (postfix-run '(postfix 0 2 3 exp) '())
8

> (postfix-run '(postfix 0 3 2 exp) '())
9

> (postfix-run '(postfix 0 5 3 exp) '())
125

> (postfix-run '(postfix 0 3 5 exp) '())
243

> (postfix-run '(postfix 0 0 5 exp) '())
0

> (postfix-run '(postfix 0 5 0 exp) '())
1

> (postfix-run '(postfix 0 5 -1 exp) '())
0

> (postfix-run '(postfix 0 3 -5 exp) '())
0

> (postfix-run '(postfix 0 6 0 chs) '())
1

> (postfix-run '(postfix 0 6 1 chs) '())
6

> (postfix-run '(postfix 0 6 2 chs) '())
15

> (postfix-run '(postfix 0 6 3 chs) '())
20

> (postfix-run '(postfix 0 6 4 chs) '())
15

> (postfix-run '(postfix 0 6 5 chs) '())
6

> (postfix-run '(postfix 0 6 6 chs) '())
1

> (postfix-run '(postfix 0 6 7 chs) '())
ERROR: invalid operands for chs (6 7)

> (postfix-run '(postfix 0 6 -2 chs) '())
ERROR: invalid operands for chs (6 -2)

> (postfix-run '(postfix 0 -6 3 chs) '())
ERROR: invalid operands for chs (-6 3)

Notes:

Do not modify postfix-exec-command for this part. Instead, just add two bindings to the list postfix-arithops.
The Racket built-in expt function is helpful.
Use a factorial function (we’ve defined many in class!) to implement chs.

(4 points) Extend PostFix by adding the following three commands:

le is like lt, but checks “less than or equal to” rather than “less than”
ge is like gt, but checks “greater than or equal to” rather than “greater than”
and expects two integers at the top of the stack. It replaces them by 0 if either is 0; otherwise it replaces them by 1.

For example:

> (postfix-run '(postfix 0 4 5 le) '())
1

> (postfix-run '(postfix 0 5 5 le) '())
1

> (postfix-run '(postfix 0 5 4 le) '())
0

> (postfix-run '(postfix 0 4 5 ge) '())
0

> (postfix-run '(postfix 0 4 4 ge) '())
1

> (postfix-run '(postfix 0 5 4 ge) '())
1

> (postfix-run '(postfix 0 0 0 and) '())
0

> (postfix-run '(postfix 0 0 1 and) '())
0

> (postfix-run '(postfix 0 1 0 and) '())
0

> (postfix-run '(postfix 0 0 17 and) '())
0
 
> (postfix-run '(postfix 0 17 0 and) '())
0

> (postfix-run '(postfix 0 1 1 and) '())
1

> (postfix-run '(postfix 0 1 17 and) '())
1

> (postfix-run '(postfix 0 17 17 and) '())
1
 
> (postfix-run '(postfix 0 17 23 and) '())
1

Notes:

Do not modify postfix-exec-command for this part. Instead, just add three bindings to the list postfix-relops.

The testing function (test-5c) will test all of le, ge, and and in the context of the single PostFix program test-sorted:

(define test-sorted '(postfix 3 ; let's call the arguments a, b, and c, from top down
                         2 nget le ; is a <= b?
                         3 nget 3 nget ge ; is c >= b?
                         and ; is a <= b and c >= b?
                         ))

> (test-5c) ; Uses the test-sorted program in its definition 
Trying test-sorted on (4 5 6): works as expected; result = 1
Trying test-sorted on (4 5 5): works as expected; result = 1
Trying test-sorted on (4 4 5): works as expected; result = 1
Trying test-sorted on (4 4 4): works as expected; result = 1
Trying test-sorted on (4 6 5): works as expected; result = 0
Trying test-sorted on (5 6 4): works as expected; result = 0
Trying test-sorted on (5 4 6): works as expected; result = 0
Trying test-sorted on (6 5 4): works as expected; result = 0
Trying test-sorted on (6 4 5): works as expected; result = 0
Trying test-sorted on (5 5 4): works as expected; result = 0
Trying test-sorted on (5 4 4): works as expected; result = 0

A very common issue students encounter with attempted solutions to and is that it always returns 1 in the test cases for and, even though the code has cases that return 0 or 1. This issue is rooted in not understanding how the Racket function associated with the relop symbol is proceessed. Here are some questions to ask yourself in order to debug this issues should you encounter it:
- Study the functions associated with the other relop symbols. What kind of value do they return? Why?
- What does the function postfix-relop->racket-binop do in the PostFix interpreter implementation?
A lesson here is that you need to truly understand the code you are modifying before modifying it!

(2 points) Extend PostFix with a dup command that duplicates the top element of the stack (which can be either an integer or executable sequence). For example:

(define sos-dup '(postfix 2 dup mul swap dup mul add))

> (postfix-run sos-dup '(3 4))
About to execute commands (dup mul swap dup mul add) on stack (3 4)
  after executing dup, stack is (3 3 4)
  after executing mul, stack is (9 4)
  after executing swap, stack is (4 9)
  after executing dup, stack is (4 4 9)
  after executing mul, stack is (16 9)
  after executing add, stack is (25)
25

(define cmd-dup '(postfix 1 (dup dup mul add swap) dup 3 nget swap exec exec pop))

> (postfix-run cmd-dup '(4))
About to execute commands ((dup dup mul add swap) dup 3 nget swap exec exec pop) on stack (4)
  after executing (dup dup mul add swap), stack is ((dup dup mul add swap) 4)
  after executing dup, stack is ((dup dup mul add swap) (dup dup mul add swap) 4)
  after executing 3, stack is (3 (dup dup mul add swap) (dup dup mul add swap) 4)
  after executing nget, stack is (4 (dup dup mul add swap) (dup dup mul add swap) 4)
  after executing swap, stack is ((dup dup mul add swap) 4 (dup dup mul add swap) 4)
About to execute commands (dup dup mul add swap) on stack (4 (dup dup mul add swap) 4)
  after executing dup, stack is (4 4 (dup dup mul add swap) 4)
  after executing dup, stack is (4 4 4 (dup dup mul add swap) 4)
  after executing mul, stack is (16 4 (dup dup mul add swap) 4)
  after executing add, stack is (20 (dup dup mul add swap) 4)
  after executing swap, stack is ((dup dup mul add swap) 20 4)
  after executing exec, stack is ((dup dup mul add swap) 20 4)
About to execute commands (dup dup mul add swap) on stack (20 4)
  after executing dup, stack is (20 20 4)
  after executing dup, stack is (20 20 20 4)
  after executing mul, stack is (400 20 4)
  after executing add, stack is (420 4)
  after executing swap, stack is (4 420)
  after executing exec, stack is (4 420)
  after executing pop, stack is (420)
420

> (postfix-run '(postfix 0 dup) '())
ERROR: dup requires a nonempty stack ()

Notes:

Implement dup by adding a cond clause to postfix-exec-command.
Test your dup implementation using the above test cases. Your dup implementation should ensure there is at least one value on the stack, and give an appropriate error message when there isn’t.

(9 points) In this part you will extend PostFix with a rot command that behaves as follows.

The top stack value should be a positive integer rotlen that we’ll call the rotation length. Assume there are n values v₁, …, v_n below rotlen on the stack, where n is greater than or equal to rotlen. Let’s refer to the first rotlen of these n values as the prefix of the n values, and any remaining values as the suffix of the n values.
The result of performing the rot command is to ``rotate’’ the prefix by moving v₁ to the end of the prefix. That is, the value v₁ is moved to index rotlen in the prefix, and each of the values v₂ through v_rotlen is shifted down one index (i.e., moved one position to the left) so that they now occupy the indices 1 through rotlen-1. So after rot is performed, the elements of the prefix are v₂ … v_rotlen v₁. For example, if rotlen is 2, then rot acts like swap, and if rotlen is 1, then rot leaves the stack unchanged.
The values in the suffix are unchanged by rot.
When rotlen is 0, a negative integer, or a noninteger, the rot command signals an error.

Here are examples involving rot:

(define rot-test '(postfix 6 4 rot 3 rot 2 rot))

> (postfix-run rot-test '(8 7 6 5 9 10))
About to execute commands (4 rot 3 rot 2 rot) on stack (8 7 6 5 9 10)
  after executing 4, stack is (4 8 7 6 5 9 10)
  after executing rot, stack is (7 6 5 8 9 10)
  after executing 3, stack is (3 7 6 5 8 9 10)
  after executing rot, stack is (6 5 7 8 9 10)
  after executing 2, stack is (2 6 5 7 8 9 10)
  after executing rot, stack is (5 6 7 8 9 10)
5

> (postfix-run '(postfix 3 (mul add) rot) '(5 6 7))
About to execute commands ((mul add) rot) on stack (5 6 7)
  after executing (mul add), stack is ((mul add) 5 6 7)
ERROR: rot length must be a positive integer but is (mul add)

> (postfix-run '(postfix 3 -1 rot) '(5 6 7))
About to execute commands (-1 rot) on stack (5 6 7)
  after executing -1, stack is (-1 5 6 7)
ERROR: rot length must be a positive integer but is -1

> (postfix-run '(postfix 3 4 rot) '(5 6 7))
About to execute commands (4 rot) on stack (5 6 7)
  after executing 4, stack is (4 5 6 7)
ERROR: not enough stack values for rot (4 5 6 7)

> (postfix-run '(postfix 0 rot) '())
About to execute commands (rot) on stack ()
ERROR: rot requires a nonempty stack but is ()

Notes:

Implement rot by adding a cond clause to postfix-exec-command.

Racket supplies a list-tail function that is very helpful for implementing rot:

> (list-tail '(10 20 30 40 50) 0)
'(10 20 30 40 50)

> (list-tail '(10 20 30 40 50) 1)
'(20 30 40 50)

> (list-tail '(10 20 30 40 50) 2)
'(30 40 50)

> (list-tail '(10 20 30 40 50) 3)
'(40 50)

> (list-tail '(10 20 30 40 50) 4)
'(50)

> (list-tail '(10 20 30 40 50) 5)
'()

Racket does not provide a similar list-head function, but I have provided it in the ps6-postfix-starter.rkt file. It works this way:

> (list-head '(10 20 30 40 50) 0)
'()

> (list-head '(10 20 30 40 50) 1)
'(10)

> (list-head '(10 20 30 40 50) 2)
'(10 20)

> (list-head '(10 20 30 40 50) 3)
'(10 20 30)

>  (list-head '(10 20 30 40 50) 4)
'(10 20 30 40)

>  (list-head '(10 20 30 40 50) 5)
'(10 20 30 40 50)

Test your rot implementation using the above test cases. Your rot implementation should give appropriate error messages in various situations, like those in the test cases.

3. Metaprogramming Derivations (14 points)

This problem involves the notion of proof-like derivations that show how programming languages are implemented in terms of interpreters and compilers. These derivations are covered in Lec 28 on Wed Mar 24.

Deriving Implementations

As presented in slides 14 to 19 of the Lec 28 Metaprogramming slides there are two basic reasoning rules involving interpreters, translators (a.k.a. compilers), and program implementation:

The Interpreter Rule (I)

$\frac{P\text{-in-}L\text{ program} \qquad L\text{ interpreter machine}}{P\text{ machine}} (I)$

The Translator Rule (T)

$\frac{P\text{-in-}S\text{ program} \qquad S\text{-to-}T\text{ translator machine}}{P\text{-in-}T\text{ program}} (T)$

In practice, we often elide the word “machine” for interpreter machines and translator machines. For example, we will refer to an “L interpreter machine” as an “L interpreter”, and an “S-to-T translator machine” as an “S-to-T translator” or “S-to-T compiler”. We will also often elide the word “program”; e.g., we will refer to a “P-in-L program” as “P-in-L”. So an “L interpreter” means an “L interpreter machine” while a “S-in-L interpreter” means an “S-in-L interpreter program”. Similar remarks hold for translators (also called compilers): an “S-to-T translator” means an “S-to-T translator machine”, while a “S-to-T-in-L translator” means an “S-to-T-in-L translator program”.

Example

The Metaprogramming slides consider an example with the following elements:

a 251-web-page-in-HTML program (i.e., a 251 web page written in HTML)
an HTML-interpreter-in-C program (i.e., a web browser written in C)
a C-to-x86-compiler-in-x86 program (i.e., a C-to-x86 compiler written in x86)
a x86 computer (i.e., an x86 interpreter machine)

They can be used to build a 251 web page display machine as follows:

$\cfrac{ \text{251-web-page-in-HTML program} \qquad \cfrac{ \cfrac{ \text{HTML-interpreter-in-C program} \qquad \cfrac{ \text{C-to-x86-compiler-in-x86 program} \qquad \text{x86 computer} }{ \text{C-to-x86 compiler machine} } (I) }{ \text{HTML-interpreter-in-x86 program} } (T) \qquad \text{x86 computer} }{ \text{HTML interpreter machine} } (I) }{ \text{251 web page machine} } (I)$

Feel free to abbreviate by dropping the words “program” and “machine”. “… in …” denotes a program, while “… compiler” and “… translator” denote translator machines and “… interpreter” and “… computer” denote interpreter machines.

$\cfrac{ \text{251 web page in HTML} \qquad \cfrac{ \cfrac{ \text{HTML interpreter in C} \qquad \cfrac{ \text{C-to-x86 compiler in x86} \qquad \text{x86 computer} }{ \text{C-to-x86 compiler} } (I) }{ \text{HTML interpreter in x86} } (T) \qquad \text{x86 computer} }{ \text{HTML interpreter} } (I) }{ \text{251 web page machine} } (I)$

Writing Derivations

Rather that displaying derivations using horizontal lines, it is recommended that you use nested bullets to indicate the same structure as the inference rules (though drawn “upside down”). For example, the derivation above could also be written:

251 web page machine (I)

251-web-page-in-HTML program
HTML interpreter machine (I)

HTML-interpreter-in-x86 program (T)

HTML-interpreter-in-C program
C-to-x86 compiler machine (I)

C-to-x86-compiler-in-x86 program
x86 computer

x86 computer

Your Tasks

(7 points) Suppose you are given the following:

A P-in-Java program
a Java-to-C-compiler-in-Racket program (i.e., a Java-to-C compiler written in Racket);
a Racket-interpreter-in-x86 program (i.e., a Racket interpreter written in x86 machine code);
a C-to-x86-compiler-in-x86 program (i.e., a C-to-x86 compiler written in x86 code);
an x86 computer (i.e., an x86 interpreter machine).

Using the two reasoning rules above, construct a “proof” that demonstrates how to execute the P-in-Java program on the computer. That is, show how to construct a P machine.

(7 points) Suppose you are given:

a GraphViz-in-C program
a C-to-LLVM-compiler-in-x86 program
an LLVM-to-JavaScript-compiler-in-x86 program
a JavaScript-interpreter-in-ARM program
an x86 computer (i.e., an x86 interpreter machine, such as your laptop)
an ARM computer (i.e., an ARM interpreter machine, such as a smartphone)

Show how to construct a GraphViz machine.

Notes:

In the above two parts, it is not possible to derive a “C interpreter”, a “Java interpreter”, or an “LLVM interpreter”, so if you need or use such a part in your derivation, you’ve done something wrong.
As shown in the above examples, you should use an (I) or (T) to indicate a conclusion that results from the Interpreter or Translator rule, respectively. Double check that the conclusion is actually justified by the rule from two nested bullets below the conclusion.

4. Interpreting Condex, an Extension to Intex (15 points)

Starting this Problem

This problem involves files in the ~/cs251/sml/ps6 directory in your csenv Virtual Machine. To create this directory, execute these Linux commands in a shell on your csenv appliance:

cd ~/cs251/sml
git pull origin master

If you encounter any issues when executing these commands, you can troubleshoot by following these git instructions. If these instructions don’t help, please contact Lyn.

Begin this problem by renaming the starter file Condex.sml in ~/cs251/sml/ps6 to yourAccountName-Condex.sml</code.</span>

It is important to rename this files at the very beginning, because renaming the files protects you from git issues if I need to modify the starter files.

REMINDER: ALWAYS MAKE A BACKUP OF YOUR .sml FILES AT THE END OF EACH DAY TO AVOID POTENTIAL LOSS OF WORK.

The Condex Language

Both Problems 4 and 6 involve the Condex language, which is an extension to Intex. As in Intex, all values in Condex are integers. But Condex includes the follow extra features:

Condex includes the relational operators <, =, and >. Since all Condex values are integers, these operators return 1 in the true case and 0 in the false case. This is similar to relational operators in PostFix.
Condex includes the conditional expression (ifnz E_test E_then E_else) where ifnz stands for if-non-zero and is pronounced iffenz. The meaning of the ifnz expression is as follows:

First, the E_test expression is evaluated to an integer value V_test.
- If V_test is non-zero, then the E_then expression is evaluated, and its value is returned as the value of the ifnz expression;
- If V_test is zero, then the E_else expression is evaluated, and its value is returned as the value of the ifnz expression;
- If an error is encountered during any of these evaluations, evaluating the ifnz expression gives the same error.
As with Racket’s if expresssion, Condex’s ifnz evaluates at most two subexpressions. E_test is always evaluated, and if that evaluation does not give an error, exactly one of E_then or E_else is evaluated.
Condex includes the short-circuit logical operations (and E_1 E2) and (or E_1 E2), which are similar to their Racket counterparts except they treat 0 as the only falsey value and all other integers as truthy. In particular:
- If E_1 evaluates to 0, then:
  - (and E_1 E2) evaluates to 0 without evaluating E_2
  - (or E_1 E2) evaluates to the value of E_2
- If E_1 evaluates to a nonzero value V_1, then:
  - (and E_1 E2) evaluates to the value of E_2
  - (or E_1 E2) evaluates the value of V_1 without evaluating E_2 (where E_1 is only evaluated to V_1 only once)

Condex programs are like Intex programs, except they begin with the symbol condex rather than intex. Your ~/cs251/sml/ps6 directory contains several sample Condex programs (with extension .cdx) that we will now discuss:

The program in the file abs.cdx calculates the absolute value of its single argument:
```
(condex 1 (ifnz (< $1 0) (- 0 $1) $1))
```
The program simpleIf.cdx uses ifnz in a simple conditional test:
```
(condex 2 (ifnz (< $1 $2) (+ $1 $2) (* $1 $2)))
```
For example, running this program on the arguments 3 and 4 should return 7, but calling it on 4 and 3 should return 12.
The program min3.cdx returns the smallest of its three arguments:
```
(condex 3 (ifnz (< $1 $2)
                (ifnz (< $1 $3) $1 $3)
                (ifnz (< $2 $3) $2 $3)))
```
For example, running this program on three arguments 4, 5, and 6 (in any order) should return 4.
The program sumRelations.cdx doesn’t contain any occurrences of ifnz, but uses the fact that relational operations return 0 or 1 to sum the value of three such operations:
```
(condex 4 (+ (< $1 $2) (+ (= $1 $3) (> $1 $4))))
```
For example, running this program on the four arguments 5, 6, 5, 4 returns 3, because (< 5 6), (= 5 5), and (> 5 4) all return 1. But running it on the four arguments 5, 3, 7, 4 returns 1, because (< 5 3) and (= 5 7) return 0 and only (> 5 4) returns 1.
Similar to Racket’s if, ifnz should evaluate at most one of the two branch expressions. In order to test this, we need an example in which branch expressions can generate errors in some cases. For this purpose, the divRemTest.cdx program below has one branch with a division and another with a remainder that can generate errors when their second operands are zero:
```
(condex 3 (ifnz (> $1 $2) (/ $2 $3) (% $2 $1)))
```
- running this program on the arguments 0, -8, and 2 should return -4 from the then branch (but not execute the else branch expression (% -8 0), which would encounter a remainder-by-zero error).
- running this program on the arguments 5, 8, and 0 should return 3 from the else branch (but not execute the then branch expression (/ 8 0), which would encounter a divide-by-zero error).
However, running it on the arguments 3, 2, 0 would encounter a divide-by-zero error in the then branch, and the arguments 0, 4, 5 would enounter a remainder-by-zero error in the else branch.

This divRemTest.cdx program will be a particularly relevant example when we consider compiling Condex to PostFix in Problem 6.
Problem 3a of this PS6, considers another Racket function, g, for translation into PostFix
```
(define (g a b c)
  (- c
     (if (= 0 (remainder a 2))
         (quotient b (- a c))
         (* (+ b c) a))))
```
The Condex program prob3a-g.cdx expresses this function in Condex:
```
(condex 3 (- $3 (ifnz (= 0 (% $1 2))
                      (/ $2 (- $1 $3))
                      (* (+ $2 $3) $1))))
```
Some relevant test cases from Problem 3a are that the three arguments 10, 2, and 8 yield the result 7, and the argments 5, 3, 5 yield the result -40. (See the comments in the file for more test cases.)

The nestedIfs.cdx program is a complex program that uses ifnz expressions as the test, then, and else subexpressions of an outer ifnz expression.

(condex 3 (ifnz (ifnz (< $1 $2) (< $1 $3) (> $1 $3))
          (ifnz (= $2 $3) (* $2 $2) (% $2 $3))
          (ifnz (> $2 $3) (/ $2 $3) (- $3 $2))))

See the comments of this file for several test cases.

The andTest.cdx and orTest.cdx programs test the simple behavior of Condex’s and and or operations:

(condex 2 (and $1 $2)) ; This is andTest.cdx
(condex 2 (or $1 $2)) ; This is orTest.cdx

The following table shows the results of these programs on various inputs. Note that, as in Racket, values other than false (i.e., 0) and true (i.e., 1) can be returned by and and or.

$1	$2	result of andTest.cdx	result of orTest.cdx
0	0	0	0
0	1	0	1
1	0	0	1
1	1	1	1
0	2	0	2
3	0	0	3
4	5	5	4

The shortCircuitAndTest.cdx and shortCircuitOrTest.cdx programs test the short-circuit behavior of Condex’s and and or operations:

(condex 2 (and $1 (/ 17 $2))) ; This is shortCircuitAndTest.cdx
(condex 2 (or $1 (% 42 $2))); This is shortCircuitOrTest.cdx

The following tables shows the results of these programs on various inputs.

$1	$2	result of shortCircuitAndTest.cdx	Note
0	2	0
0	0	0	Illustrates short-circuit behavior
4	5	5	Returns result that’s not 0 or 1
6	0	Error: Tried to divide 17 by 0	Triggers divide-by-zero error

$1	$2	result of shortCircuitOrTest.cdx	Note
0	9	6	Returns result that’s not 0 or 1
4	5	4	Returns result that’s not 0 or 1
7	0	5	Illustrates short-circuit behavior
0	0	Error: Tried to remainder 42 by 0	Triggers remainder-by-zero error

In the rest of this problem, your goal will be to flesh out the skeleton a Condex interpreter implemented in SML. The above programs will be used in test cases that your programs behave as expected.

SML Sum-of-product and S-expression Syntax for Condex

In the ps6 directory on your csenv appliance, your file yourAccountName-Condex.sml contains the definition of the SML representation for Condex syntax. This syntax is is very similar to that for the Intex interpreter we studied in class (and which can be found in ~/cs251/sml/Intex.sml).

Here are the datatypes and helper functions used to represent Condex program trees:

datatype pgm = Condex of int * exp
     and exp = Int of int
             | Arg of int
             | ArithApp of arithop * exp * exp 
             | RelApp of relop * exp * exp (* New in Condex: relational operations *)
             | And of exp * exp (* New in Condex: short-circuit conjunction *)
             | Or of exp * exp (* New in Condex: short-circuit disjunction *)
             | Ifnz of exp * exp * exp (* New in Condex: conditional with E_test, E_then, E_else *)
    and arithop = Add | Sub | Mul | Div | Rem 
    and relop = Lt | Eq | Gt (* <, =, > *)

and arithopToFun Add = op+
  | arithopToFun Mul = op*
  | arithopToFun Sub = op-
  | arithopToFun Div =
    (fn(x,y) => if y = 0 then
                  raise EvalError ("Tried to divide " ^ (Int.toString x) ^ " by 0")
                else
                  x div y)
  | arithopToFun Rem =
    (fn(x,y) => if y = 0 then
                  raise EvalError ("Tried to remainder " ^ (Int.toString x) ^ " by 0")
                else
                  x mod y)

(* New in Condex interpreter *)         
and relopToFun Lt = op<
  | relopToFun Eq = (fn(a,b) => a = b)
  | relopToFun Gt = op>

(* New in Condex interpreter *)         
and boolToInt false = 0
  | boolToInt true = 1

Here are some comparisons with Intex worth noting:

Programs are constructed via the Condex (not Intex) constructor;
Int, Arg, ArithApp, and arithop are unchanged from Intex, as are the `arithop
The RelApp constructor and the relop type are new in Condex; they represent relational operations (<, =, >).
The Ifnz constructor is new in Condex and represents the ifnz construct.
The And and Or constructors are new in Condex and represent the and and or constructs, respectively
Condexs arithopToFun helper function is exactly the same as in Intex.
The new helper function relopToFun defines the meaning of the relop operators.
The new helper function boolToInt is also handy in this problem.

Additionally, yourAccountName-Condex.sml contains functions for parsing s-expression strings into, and unparsing s-expression strings from, the sum-of-product notation. For example:

- val ifnzExp = stringToExp "(ifnz (< $1 6) (and (< $1 0) (> $2 10)) (or (= $1 10) (> $1 $2)))";
val ifnzExp =
  Ifnz
    (RelApp (Lt,Arg 1,Int 6),
     And (RelApp (Lt,Arg 1,Int 0),RelApp (Gt,Arg 2,Int 10)),
     Or (RelApp (Eq,Arg 1,Int 10),RelApp (Gt,Arg 1,Arg 2))) : exp

- expToString ifnzExp;
val it =
  "(ifnz (< ($ 1) 6)\n      (and (< ($ 1) 0) (> ($ 2) 10))\n      (or (= ($ 1) 10) (> ($ 1) ($ 2)))\n      )"
  : string

- print(expToString ifnzExp);
(ifnz (< ($ 1) 6)
      (and (< ($ 1) 0) (> ($ 2) 10))
      (or (= ($ 1) 10) (> ($ 1) ($ 2)))
      )val it = () : unit

- val simpleIfPgm = stringToPgm "(condex 2 (ifnz (< $1 $2) (+ $1 $2) (* $1 $2)))";
val simpleIfPgm =
  Condex
    (2,
     Ifnz
       (RelApp (Lt,Arg 1,Arg 2),ArithApp (Add,Arg 1,Arg 2),
        ArithApp (Mul,Arg 1,Arg 2))) : pgm

- pgmToString simpleIfPgm;
val it = "(condex 2 (ifnz (< ($ 1) ($ 2)) (+ ($ 1) ($ 2)) ( * ($ 1) ($ 2))))"
  : string

The above examples assume you have opened the Condex structure via open Condex, and have also executed:

Control.Print.printDepth := 100
Control.Print.stringDepth := 1000;

Your Task

yourAccountName-Condex.sml also contains the skeleton of a Condex interpreter eval function similar to the one defined for Intex. All you need to do is to replace the stubs for the RelApp, And, Or and Ifnz clauses of eval so that they return the correct integer result as specified by the semantics of Condex.

Notes:

You may define helper functions(s) if you want.
The easiest way to test your eval function is to evaluate the function call testInterp(), which will run the evaluator on numerous test cases involving the .cdx program files described above.
- If you see a result like "Passed all 61 test cases", it means your eval function has passed all the test cases. (You can scroll up to see the details.)
- If you see a result that begins with something like "Failed 4 of 61 test cases ...", you should scroll up searching for the string ***ERROR, which will give details on the test program and arguments causing the failure, along with the expected and actual results.

The easiest way to test your own examples is to launch a Condex Read/Eval/Print loop via repl(). For example:

- repl();

condex> (< 3 4)
1

condex> (> 3 4)
0

condex> (ifnz (< 1 2) (+ 3 4) ( * 5 6))
7

condex> (ifnz (> 1 2) (+ 3 4) ( * 5 6))
30

condex> (and 4 5)
5

condex> (or 4 5)
4 

condex> (or 7 (% 8 0))
7

condex> (#args 8 5 9)

condex> (ifnz (< $1 $2) (+ $2 $3) (- $2 $3))
~4

condex> (#run (condex 3 (ifnz (< $1 $2) $2 $3)) 4 5 6)
5

condex> (#run (condex 3 (ifnz (< $1 $2) $2 $3)) 4 3 6)
6

condex> (#run min3.cdx 42 17 57)
17

condex> (#quit)
Moriturus te saluto!
val it = () : unit

You can call testInterp() and repl() without the Condex. qualifier, but if you want to access values from the Condex module, you will either need to (1) explicitly qualify them with Condex. or (2) open the Condex structure via open Condex.

5. Static Stack Depth in the Intex-to-PostFix Compiler (21 points)

The intexToPostFix function discussed in Lec 30 automatically translates an Intex program to a PostFix program that has the same meaning. (See the fleshed out version of this function in these solution slides or the file /cs251/sml/intex/IntexToPostFix.sml,)

The challenging part of this translation is determining the correct index to use for an Nget command when translating an Arg node in an Intext AST. The key to solving this technical problem is to provide the expToCmds helper function with a depth argument that statically tracks the number of values that have been pushed on the PostFix stack above the argument values at any point of the computation.

The purpose of this problem is for you to study this idea in the context of a particular Intex program so that you better understand it.

(3 points) Draw the abstract syntax tree for the following Intex program, which we shall call P_Intex.
```
(intex 4 (/ (+ $1 
               (* $2
                  (- $3 $4)))
            (% (- (* $4 $3)
                  $2)
               $1)))
```
For drawing the Intex AST, use the conventions illustrated in slides 3 and 4 of the the Lec 29 Intex slides.

You may draw the tree by hand or use a drawing program, whichever is easier for you. This will be a wide diagram, so you’ll want to draw it in landscape mode rather than portrait mode. To save space, do not label the edges (i.e., no numargs, body, rator, rand1, rand2, index, or value labels should be used).

Since you will be annotating this tree with extra information in parts b and d, leave enough room to add some annotations to the right of each node. Here is an example of an annotated node:
(4 points) In your AST for P_Intex from part a, annotate each expression node with the depth associated with that node when it is encountered by the expToCmds function within the intexToPostFix function. As shown above, you should write depth:d to the right of each expression node, where d is the depth of the node.
(6 points) Write down the PostFix program that is the result of translating the Intex program P_Intex from part a using the intexToPostFix function. We shall call this program P_PostFix. (You should do this translation by hand, not by running the function in SML.) Important: You translated PostFix code must be exactly the code that would be generated by the intexToPostFix function, not just a PostFix program that happens to have the same behavior. Why? Because the purpose of this problem is to understand how the intexToPostFix function (and, in particular, its depth argument) work.

Use comments to explain the commands in the translated program. Here is an example of commenting the PostFix code that is generated by translating the Intex program example (intex 4 (* (- $1 $2) (/ $3 $4))) in Lec 30.

(postfix 4 ; stack starts with ($1 $2 $3 $4)
   1 nget ; stack now ($1 $1 $2 $3 $4)
   3 nget ; stack now ($2 $1 $1 $2 $3 $4)
          ; index 3 needed to copy $2 because extra $1 on stack
   sub ; stack now ($1-$2 $1 $2 $3 $4)
   4 nget ; stack now ($3 $1-$2 $1 $2 $3 $4)
          ; index 4 needed to copy $3 because extra val on stack
   6 nget ; stack now ($4 $3 $1-$2 $1 $2 $3 $4)
          ; index 6 needed to copy $4 because 2 extra vals on stack
   div ; stack now ($3/$4 $1-$2 $1 $2 $3 $4)
   mul ; stack now (($1*$2)*($3/$4) $1 $2 $3 $4)

(8 points) Consider running P_Intex on the four arguments [4, 9, 8, 2]. In your AST for P_Intex from part a, annotate each expression node with the list of integers that corresponds to the PostFix stack that will be encountered just before the last command translated for that node by expToCmds is executed in the PostFix program P_PostFix. As shown above, you should write the stack as an SML-style list to the right of the node below the depth.

For example, below is the result of annotating the Intex AST for (intex 4 (* (- $1 $2) (/ $3 $4))) with depth and inital and final stack information as described below.

Some things to observe:
- Only Intex expression nodes (ArithApp and Arg nodes in this example) have annotations. The arithop nodes Sub Div, and Mul are not annotated, nor are the Arg index values or the top-level Intex program node.
- Every expression node has three annotations:
  1. The depth of the node in the Intex–to-PostFix compilation process. This is the sum of the number of right operands of Arithop nodes in which the expression appears.
  2. The initial stack (is) right before executing the PostFix commands compiled for the expression in an example where the program is run on the four arguments [15, 8, 30, 6]. These annotations correspond to visiting the expression nodes in a depth-first left-to-right pre-order walk over the tree. For the sample tree, this order is: (1) the ArithApp node with Mul, (2) the ArithApp node with Sub, (3) the Arg 1 node, (4) the Arg 2 node, (5) the ArithApp node with Div, (6)) the Arg 3 node, (7) the Arg 4 node,
  3. The final stack (fs) right after executing all of the PostFix commands compiled for the expression in an example where the program is run on the four arguments [15, 8, 30, 6]. These annotations correspond to visiting the expression nodes in a depth-first left-to-right post-order walk over the tree. For the sample tree, this order is: (1) the Arg 1 node, (2) the Arg 2 node, (3) the ArithApp node with Sub, (4)) the Arg 3 node, (5) the Arg 4 node, (2) the ArithApp node with Div, (5) the ArithApp node with Mul
- At each Arg node in the program, the corresponding PostFix nget stack index of that argument in the stack annotating the node is the Arg index value plus the depth annotating the node. In particular:
  - The Arg node with index 1 has depth 0, and so translates to 1 nget.
  - The Arg node with index 2 has depth 1, and so translates to 3 nget.
  - The Arg node with index 3 has depth 1, and so translates to 4 nget.
  - The Arg node with index 4 has depth 2, and so translates to 6 nget.
- It is important to observe the key invariant guaranteed by the Intex-to-PostFix compiler at every expression node in the Intex AST: The final stack is the result of pushing exactly one value on the initial stack: the result of evaluating the Intex expression.

6. Compiling Condex to PostFix (22 points)

In this problem you will extend the IntexToPostFix compiler from Lec 30 to compile the Condex language from Problem 4 to PostFix.

Starting this Problem

Rename the starter file CondexToPostFix.sml in ~/cs251/sml/ps6 to yourAccountName-CondexToPostFix.sml.

In your renamed file yourAccountName-CondexToPostFix.sml, edit this line near the top of the file

use "Condex.sml"; (* semi-colon required! *)

to be

use "yourAccountName-Condex.sml"; (* semi-colon required! *)

It is important to rename this files at the very beginning, because renaming the files protects you from git issues if I need to modify the starter files.

Problem 6a: Translating an Example Condex Program to PostFix By Hand (7 points)

The following is the divRemTest.cdx program discussed in Problem 4.

(condex 3 (ifnz (> $1 $2) (/ $2 $3) (% $2 $1)))

It is imperative that you understand how this Condex program should be translated into PostFix before you attempt to flesh out the RelApp and Ifnz cases of expToCmds in the next subproblem.

For this part, write the exact PostFix program that should be generated by the the condexToPosFix function when given this Condex program. You should annotate the commands with explanatory comments as described in Problem 5c.

You should also show the sequence of all intermediate stacks for executing your PostFix program on the argument lists from the following examples:

condex> (#run divRemTest.cdx 6 17 0)
5

condex> (#run divRemTest.cdx 0 -18 3)
~6

Of course, your PostFix program should give the same result as the Condex interpreter.

Problem 6b: Fleshing out the CondexToPostFix Skeleton (15 points)

You should complete Problems 4 and 6b before starting this subproblem.

The file yourAccountName-CondexToPostFix.sml contains the skeleton of a Condex to Postfix compiler that is very similar to the one studied in Lec 30 for Intex.
The core function in the compiler is expToCmds, which has the type

val expToCmds: Condex.exp -> int -> PostFix.cmd list *)

It takes a Condex expression and a depth argument that tracks the number of values that have been pushed above the program arguments on the stack when expToCmds is called (this is used as an offset for the index when compiling a program argument). expToCmds returns a list of PostFix commands.

The Int, Arg, and ArithApp cases of expToCmds are already correctly implemented. What you need to do is flesh out the cases for RelApp, Ifnz, And, and Or.

Notes:

Flesh out the RelApp and Ifnz clauses of expToCmds so that they implement a generalized version of your strategy from Problem 6a.
Do not start to flesh out clauses for Condex’s And and Or constructs until after you have succesfully implemented the Ifnz construct, since understanding the compilation of Ifnz is essential to understaning the compilation of And and Or.
You will need to develop a strategy for correctly compiling Condex’s And and Or expressions to PostFix:
- It is rather challenging to translate these constructs into PostFix in a way that preserves their short-circuit behavior, so you will get most of the credit for translating these constructs if they have the right behavior except for short-circuiting.
- When compiling Condex’s Or construct to PostFix, try to avoid duplicating the calculation of the first operand expression. (You will still receive substantial credit if you duplicate this calculation.)
- Hints for handling short-circuting correctly:
  - Review how Racket desugars itsand and or constructs into if expressions, and have your CondexToPostFix translation do something similar.
  - Translate the test programs shortCircuitAndTest.cdx and shortCircuitOrTest.cdx by hand into PostFix to better understand how short-circuting should be handled in the translation.
You should explicitly use qualified names beginning with Condex. and PostFix. to distinguish function values from these two structures.
To test your expToCmds, evaluate the function call testComp(). This will do the following for the Condex .cdx programs in the ps6 directory:
- Compile the Condex program into a PostFix program using the condexToPostFix function, which depends on your changes to the expToCmds function.
- For various arguments lists, run both the original Condex program (using your Condex interpreter from Problem 4) and the compiled PostFix program (using the SML PostFix interpreter we studied in class) on the argument lists and test that the outputs are the same.
- If you see a result like "Passed all 61 test cases", it means your compiler has passed all the test cases. (You can scroll up to see the details.)
- If you see a result that begins with something like "Failed 4 of 37 test cases ...", you should scroll up searching for the string ***ERROR, which will give details on the test program and arguments causing the failure, along with the different results from the Condex and PostFix interpreters. The testing output also displays the PostFix program that results from the compilation process for each Condex program. You should carefully study the PostFix programs output from the compiler as part of debugging.

Extra Credit 1: PostFix Iterations (10 points)

This problem is optional. You should only attempt it after completing all the other problems.

(3 points) Study and test the following mystery PostFix program of one argument, which is provided near the end of ps6-postfix-starter.rkt. Describe the function that it calculates on its one argument, and give a brief, high-level description of how it calculates that function.
```
;; What function does this compute? 
(define mystery
  '(postfix 1
            1
            (swap dup 0 eq 
              (swap)
              (swap 2 nget mul swap 1 sub swap 3 vget exec)
              sel exec)
            3 rot 3 vget exec))
```
Note: vget is a command that is like nget, except that it can return any value (including an executable sequence), not just an integer. It has been included in your ps6-postfix-starter.rkt file for use in this extra credit problem.
(7 points) Write a PostFix program that takes a single argument (assumed to be a nonnegative integer n) and iteratively calculates the nth Fibonacci number. Full credit will be given only if you use constant stack space; i.e., calculating the Fibonacci of a larger number should not end up with a final stack that has more elements than for a smaller number.

Extra Credit 2: Bootstrapping (14 points)

This problem is usually a second part of Problem 3 Metaprogramming Derivations on this Problem Set 6. However, because it is challenging, I have decided to make this part an optional extra credit problem rather than a required part.

In his Turing Award lecture-turned-short-paper Reflections on Trusting Trust, Ken Thompson describes how a compiler can harbor a so-called “Trojan horse” that can make compiled programs insecure. Read this paper carefully and do the following tasks to demonstrate your understanding of the paper:

(6 points) Stage II of the paper describes how a C compiler can be “taught” to recognize '\v' as the vertical tab character. Using the same kinds of components and processes used in Problem 1, we can summarize the content of Stage II by carefully listing the components involved and describing (by constructing a proof) how a C compiler that “understands” the code in Figure 2.2 can be generated. (Note that the labels for Figures 2.1 and 2.2 are accidentally swapped in the paper.) In Stage II, two languages are considered:

the usual C programming language, which does not include the special character '\v';
C+\v = an extension to C in which '\v' is treated as the vertical tab character (which has ASCII value 11).

Suppose we are given the following:

a C-to-binary compiler machine (Here, “binary” is the machine code for some machine. This compiler is just a “black box”; we don’t care where it comes from);
a C+\v-to-binary-compiler-in-C program (Figure 2.3 in Thompson’s paper);
a C+\v-to-binary-compiler-in-C+\v program (what should be Figure 2.2 in Thompson’s paper);
a machine that executes the given type of binary code (i.e., a binary interpreter machine)

Construct a metalanguage derivation proof showing how to use the C+\v-to-binary-compiler-in-C+\v source code to create a C+\v-to-binary-compiler-in-binary program.

(8 points) Stage III of the paper describes how to generate a compiler binary harboring a “Trojan horse”. Using the same kinds of components and processes used in Problem 1, we can summarize the content of Stage III by carefully listing the components involved and describing (by constructing a proof) how the Trojaned compiler can be generated. In particular, assume we have the parts for this stage:

a C-to-binary-compiler machine (again, just a “black box” we are given);
a C-to-binary-compiler-in-C program without Trojan Horses (Figure 3.1 in Thompson’s paper);
a C-to-binary-compiler_TH-in-C program with two Trojan Horses (Figure 3.3 in Thompson’s paper);
a login-in-C program with no Trojan Horse;
a binary-based computer (i.e., a binary interpreter machine)

The subscript _TH indicates that a program contains a Trojan Horse. A C-to-binary compiler_TH has the “feature” that it can insert Trojan Horses when compiling source code that is an untrojaned login program or an untrojaned compiler program. That is, if P is a login or compiler program, it is as if there is a new rule:

The Trojan Horse Rule (TH)

$\cfrac{ P\text{-in-C program} \qquad \text{C-to-binary-compiler}_{TH}\ \text{machine} }{ P_{TH}\text{-in-binary} } (TH)$

Using these parts along with the two usual rules (I) and (T) and the new rule (TH), show how to derive a Trojaned login program login_TH-in-binary that is the result of compiling the uninfected login-in-C program with a C compiler that is itself the result of compiling the uninfected C-to-binary-compiler-in-C program. The interesting thing about this derivation is that login_TH-in-binary is infected with a Trojan horse even though it is compiled using a C compiler whose source code (C-to-binary-compiler-in-C program) contains no code for inserting Trojan horses!

Notes:

Although only 3 pages long, the Thompson paper is very dense and challenging. Don’t be surprised if it requires multiple readings to ``get’’ the ideas, which are profound.
Double-check your derivations to make sure that they make sense and actually describe what’s being said In Thompson’s Paper.

Extra Credit Problem 3: Self-printing Program (12 points)

In his Turing-lecture-turned-short-paper Reflections on Trusting Trust, Ken Thompson notes that it is an interesting puzzle to write a program in a language whose effect is to print out a copy of the program. We will call this a self-printing program. Write a self-printing program in any general-purpose programming language you choose.

Notes

Your program should be self-contained — it should not read anything from the file system.
All output must be characters printed by a command like print. In Racket, for instance, the program/result cant just be a quoted Racket list.
IMPORTANT: There are lots of answers on the Internet, but you will get zero points if you just copy one of those. (I can tell!) Write this program from scratch completely on your own, without looking anything up online.