PS6: I Never Metalanguage I Didn't Like
-
Changes from the original PS6 description are highlighed in magenta.
-
Dueish: Mon Mar 29 (followed by a 2-day grace period)
- Notes:
- This is the last regular pset, but there will be two more Solo Assignments, E & F.
- This pset includes numerous extra credit problems, some of which cover material from the last few lectures that is not covered by the regular problems on this pset.
- As with all regular psets, you can work on this problem in a two-person team or as an individual, but partnering is strongly recommended. Unless you get a special dispensation from Lyn, you need to work with a different partner than you did in your previous two partnerships. Use this Pset Partner Doc to find a partner and record your partnership.
- This pset has 110 total points.
- Much of the material on this pset has not yet been covered in class. Here are some notes
to help you plan when you can start the problems:
- You can do Problem 1 now based on the Lec 24 material on s-expressions
- You will be able to do Problem 2 after the Mon Mar 22 Lec 26 on a PosfFix interpreter in Racket
- You will be able to do Problem 3 after the Wed Mar 24 Lec 28 on metaprogramming derivations
- You will be able to do Problem 5a after the Thu Mar 25 Lec 29 on an Intex interpreter in SML
- You will be able to do Problems 4 and 5b after the Fri Mar 26 Lec 30 on compiling Intex to PostFix
-
Times (in hours) from some previous semesters:
The following times do not yet include the times for Problems 3, 4, and 6
Times Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 Problem 6 Total average time (hours) 0.5 2.5 1.9 median time (hours) 0.5 2.4 1.8 25% took more than 0.7 3.0 2.0 10% took more than 0.8 3.0 3.0 -
How to Start PS6
- Follow the instructions in the GDoc CS251-S21-T3 Pset Submission Instructions for creating your PS5 Google Doc. Put all written answers and a copy of all code into this PS5 GDoc. If you are working with a partner, only one of you needs to create this document, but you should link it from both of your List Docs.
-
Submission:
- For all parts of all problems, include all answers (including Racket and SML code and metaprogramming derivations)) in your PS6 google doc. Format Racket and SML code using a fixed-width font, like Consolas or Courier New. You can use a small font size if that helps.
- For Problem 1:
- Be sure that your
deep-reversedefinition inyourAccountName-ps6-deep-reverse.rktalso appears in your PS6 GDoc. - Drop a copy of your
yourAccountName-ps6-deep-reverse.rktin your~/cs251/drop/ps06drop folder oncs.wellesley.edu.
- Be sure that your
- For Problem 2:
- Include the modified parts of
yourAccountName-ps6-postfix.rktin your PS6 GDoc. (You only need to include the modified parts, not the entire contents of the PostFix interpreter!) - Drop a copy of your
yourAccountName-ps6-postfix.rktin your~/cs251/drop/ps06drop folder oncs.wellesley.edu.
- Include the modified parts of
- For Problem 3, include your metaprogramming derivations. It is easiest to write these top-down via nested bullets, as shown in both the Problem 3 description and the lecture notes.
- For Problem 4:
- Include the definition of
evaland any helper functions in your PS6 GDoc. -
Drop a copy of your
~/cs251/sml/ps6/yourAccountName-Condex.smlfile in your~/cs251/drop/ps06drop folder oncs.wellesley.eduby executing the following (replacing all three occurrences ofgdomeby your cs server username):scp -r ~/cs251/sml/ps6/gdome-Condex.sml gdome@cs.wellesley.edu:/students/gdome/cs251/drop/ps06
- Include the definition of
- For Problem 5:
- For parts a, b, and d, include the drawing of a single AST for
P_Intex(from part a) that is annotated with depth information (from part b) and stack information (from part d). - For part c, include the commented PostFix program
P_PostFixthat is the result of translatingP_Intexto PostFix.
- For parts a, b, and d, include the drawing of a single AST for
- For Problem 6:
-
For Part a, include the annotated PostFix program that should result from compiling
divRemTest.cdx, along with the sequence of all intermediate stacks when executing the PostFix program on the two given argument sequences. -
For Part b:
- Include the definition of
expToCmdsand any helper functions in your PS6 GDoc. -
Drop a copy of your
~/cs251/sml/ps6/yourAccountName-CondexToPostFix.smlfile in your~/cs251/drop/ps06drop folder oncs.wellesley.eduby executing the following (replacing all three occurrences ofgdomeby your cs server username):scp -r ~/cs251/sml/ps6/gdome-CondexToPostFix.sml gdome@cs.wellesley.edu:/students/gdome/cs251/drop/ps06
- Include the definition of
-
1. Deep Reverse (6 points)
We saw in Lec 24 that tree recursion on trees represented as s-expressions can be expressed rather elegantly. For example:
(define (atom? x)
(or (number? x) (boolean? x) (string? x) (symbol? x)))
(define (sexp-num-atoms sexp)
(if (atom? sexp)
1
(foldr + 0 (map sexp-num-atoms sexp))))
> (sexp-num-atoms '((a (b c) d) e (((f) g h) i j k)))
11
> (sexp-num-atoms '(a b c d))
4
> (sexp-num-atoms 'a)
1
> (sexp-num-atoms '())
0
(define (sexp-atoms sexp)
(if (atom? sexp)
(list sexp)
(foldr append null (map sexp-atoms sexp))))
> (sexp-atoms '((a (b c) d) e (((f) g h) i j k)))
'(a b c d e f g h i j k)
> (sexp-atoms '(a b c d))
'(a b c d)
> (sexp-atoms 'a)
'(a)
> (sexp-atoms '())
'()In this problem, you will define a function (deep-reverse sexp) that returns a new s-expression in which the order of the children at every node of the s-expression tree sexp is reversed.
> (deep-reverse '((a (b c) d) e (((f) g h) i j k)))
'((k j i (h g (f))) e (d (c b) a))
> (deep-reverse '(a b c d))
'(d c b a)
> (deep-reverse 'a)
'a
> (deep-reverse '())
'()Notes:
-
Begin with this starter file ps6-deep-reverse-starter.rkt, which you should rename to
yourAccountName-ps6-deep-reverse.rkt. Add your definition ofdeep-reverseto this file. -
Your definition should have form similar to the definitions for
sexp-num-atomsandsexp-atoms, but you’ll want to use something other thanfoldr. -
You are not allowed to use Racket’s
reversefunction in your definition.
2. Extending PostFix (32 points)
Lec 26 on Mon Mar 22 covers several different versions of an interpreter for the PostFix language implemented in Racket. This problem involves starting with the following version of the interpreter:
This is a slightly modified version of the file postfix-transform-fancy.rkt studied in lecture.
Begin by making a copy of ps6-postfix-starter.rkt named yourAccountName-ps6-postfix.rkt and load this into Dr. Racket. Near the bottom of this file is the following PostFix program named sos (for “sum of squares”). Racket’s semi-colon comments have been used to explain the commands in the program:
;; Sum-of-squares program
(define sos
'(postfix 2 ; let's call the arguments a and b, from top down
1 nget ; stack is now (a a b)
mul ; stack is now (a*a b)
swap ; stack now (b a*a)
1 nget ; stack now (b b a*a)
mul ; stack now (b*b a*a)
add ; stack now (a*a+b*b); top (only) value is returned by program
))Let’s run the program on the arguments 5 and 12:
> (postfix-run sos '(5 12))
About to execute commands (1 nget mul swap 1 nget mul add) on stack (5 12)
after executing 1, stack is (1 5 12)
after executing nget, stack is (5 5 12)
after executing mul, stack is (25 12)
after executing swap, stack is (12 25)
after executing 1, stack is (1 12 25)
after executing nget, stack is (12 12 25)
after executing mul, stack is (144 25)
after executing add, stack is (169)
169Note that the stack that results from executing each command on the previous stack is displayed, line by line. This behavior is controlled by the display-steps? flag at the top of the program:
(define display-steps? #t)If we set the flag to #f, the intermediate stack display will be turned off:
(define display-steps? #f)> (postfix-run sos '(5 12))
169Turn the display-steps? flag on when it’s helpful to understand or debug a PostFix program.
-
(10 points)
Consider the following Racket function
gthat is defined near the bottom of the PostFix interpreter file:(define (g a b c) (- c (if (= 0 (remainder a 2)) (quotient b (- a c)) (* (+ b c) a))))In this part, your goal is to flesh out the definition of a three-argument PostFix program
pfgthat performs the same calculation asgon three arguments:(define pfg '(postfix 3 ;; Flesh out and comment the commands in this PostFix program ))Here are some examples with a correctly defined
pfg:> (apply g '(10 2 8)) 7 > (postfix-run pfg '(10 2 8)) 7 > (apply g '(-7 2 3)) 38 > (postfix-run pfg '(-7 2 3)) 38 > (apply g '(5 4 5)) -40 > (postfix-run pfg '(5 4 5)) -40Notes:
-
You must comment command chunks in your
pfgprogram to explain how they affect the stack. Use a commenting style similar to that insos. -
You have been provided with a testing function
(test-pfg)that will test yourpfgfunction on 5 sets of arguments:;; Tests on an incorrect definition of pfg: > (test-pfg) Testing pfg on (10 2 8): ***different results*** g: 7 pfg: -7 Testing pfg on (11 2 8): ***different results*** g: -102 pfg: 102 Testing pfg on (-6 3 8): ***different results*** g: 8 pfg: -8 Testing pfg on (-7 2 3): ***different results*** g: 38 pfg: -38 Testing pfg on (5 4 5): ***different results*** g: -40 pfg: 40 ;; Tests on a correct definition of pfg: > (test-pfg) Testing pfg on (10 2 8): same result for g and pfg = 7 Testing pfg on (11 2 8): same result for g and pfg = -102 Testing pfg on (-6 3 8): same result for g and pfg = 8 Testing pfg on (-7 2 3): same result for g and pfg = 38 Testing pfg on (5 4 5): same result for g and pfg = -40
-
-
(7 points) Extend PostFix by adding the following two commands:
-
exp: given a stacki_base i_expt ...(wherei_baseandi_exptare integers), replacesi_baseandi_exptby the result of raisingi_baseto the power ofi_expt(or 0 ifi_exptis negative). -
chs: given a stacki_n i_k ...(wherei_kandi_nare nonnegative integers andi_k<=i_n), replacesi_nandi_kby the value ofi_nchoosei_k. (See the definition of “choose” notation here). If one or both ofi_nandi_kare invalid arguments, displays an error message (sees examples below).
For example:
> (postfix-run '(postfix 0 2 3 exp) '()) 8 > (postfix-run '(postfix 0 3 2 exp) '()) 9 > (postfix-run '(postfix 0 5 3 exp) '()) 125 > (postfix-run '(postfix 0 3 5 exp) '()) 243 > (postfix-run '(postfix 0 0 5 exp) '()) 0 > (postfix-run '(postfix 0 5 0 exp) '()) 1 > (postfix-run '(postfix 0 5 -1 exp) '()) 0 > (postfix-run '(postfix 0 3 -5 exp) '()) 0 > (postfix-run '(postfix 0 6 0 chs) '()) 1 > (postfix-run '(postfix 0 6 1 chs) '()) 6 > (postfix-run '(postfix 0 6 2 chs) '()) 15 > (postfix-run '(postfix 0 6 3 chs) '()) 20 > (postfix-run '(postfix 0 6 4 chs) '()) 15 > (postfix-run '(postfix 0 6 5 chs) '()) 6 > (postfix-run '(postfix 0 6 6 chs) '()) 1 > (postfix-run '(postfix 0 6 7 chs) '()) ERROR: invalid operands for chs (6 7) > (postfix-run '(postfix 0 6 -2 chs) '()) ERROR: invalid operands for chs (6 -2) > (postfix-run '(postfix 0 -6 3 chs) '()) ERROR: invalid operands for chs (-6 3)Notes:
-
Do not modify
postfix-exec-commandfor this part. Instead, just add two bindings to the listpostfix-arithops. -
The Racket built-in
exptfunction is helpful. -
Use a factorial function (we’ve defined many in class!) to implement
chs.
-
-
(4 points) Extend PostFix by adding the following three commands:
leis likelt, but checks “less than or equal to” rather than “less than”geis likegt, but checks “greater than or equal to” rather than “greater than”andexpects two integers at the top of the stack. It replaces them by 0 if either is 0; otherwise it replaces them by 1.
For example:
> (postfix-run '(postfix 0 4 5 le) '()) 1 > (postfix-run '(postfix 0 5 5 le) '()) 1 > (postfix-run '(postfix 0 5 4 le) '()) 0 > (postfix-run '(postfix 0 4 5 ge) '()) 0 > (postfix-run '(postfix 0 4 4 ge) '()) 1 > (postfix-run '(postfix 0 5 4 ge) '()) 1 > (postfix-run '(postfix 0 0 0 and) '()) 0 > (postfix-run '(postfix 0 0 1 and) '()) 0 > (postfix-run '(postfix 0 1 0 and) '()) 0 > (postfix-run '(postfix 0 0 17 and) '()) 0 > (postfix-run '(postfix 0 17 0 and) '()) 0 > (postfix-run '(postfix 0 1 1 and) '()) 1 > (postfix-run '(postfix 0 1 17 and) '()) 1 > (postfix-run '(postfix 0 17 17 and) '()) 1 > (postfix-run '(postfix 0 17 23 and) '()) 1Notes:
-
Do not modify
postfix-exec-commandfor this part. Instead, just add three bindings to the listpostfix-relops. -
The testing function
(test-5c)will test all ofle,ge, andandin the context of the single PostFix programtest-sorted:(define test-sorted '(postfix 3 ; let's call the arguments a, b, and c, from top down 2 nget le ; is a <= b? 3 nget 3 nget ge ; is c >= b? and ; is a <= b and c >= b? )) > (test-5c) ; Uses the test-sorted program in its definition Trying test-sorted on (4 5 6): works as expected; result = 1 Trying test-sorted on (4 5 5): works as expected; result = 1 Trying test-sorted on (4 4 5): works as expected; result = 1 Trying test-sorted on (4 4 4): works as expected; result = 1 Trying test-sorted on (4 6 5): works as expected; result = 0 Trying test-sorted on (5 6 4): works as expected; result = 0 Trying test-sorted on (5 4 6): works as expected; result = 0 Trying test-sorted on (6 5 4): works as expected; result = 0 Trying test-sorted on (6 4 5): works as expected; result = 0 Trying test-sorted on (5 5 4): works as expected; result = 0 Trying test-sorted on (5 4 4): works as expected; result = 0 -
A very common issue students encounter with attempted solutions to
andis that it always returns1in the test cases forand, even though the code has cases that return0or1. This issue is rooted in not understanding how the Racket function associated with the relop symbol is proceessed. Here are some questions to ask yourself in order to debug this issues should you encounter it:-
Study the functions associated with the other relop symbols. What kind of value do they return? Why?
-
What does the function
postfix-relop->racket-binopdo in the PostFix interpreter implementation?
A lesson here is that you need to truly understand the code you are modifying before modifying it!
-
-
(2 points) Extend PostFix with a
dupcommand that duplicates the top element of the stack (which can be either an integer or executable sequence). For example:(define sos-dup '(postfix 2 dup mul swap dup mul add)) > (postfix-run sos-dup '(3 4)) About to execute commands (dup mul swap dup mul add) on stack (3 4) after executing dup, stack is (3 3 4) after executing mul, stack is (9 4) after executing swap, stack is (4 9) after executing dup, stack is (4 4 9) after executing mul, stack is (16 9) after executing add, stack is (25) 25 (define cmd-dup '(postfix 1 (dup dup mul add swap) dup 3 nget swap exec exec pop)) > (postfix-run cmd-dup '(4)) About to execute commands ((dup dup mul add swap) dup 3 nget swap exec exec pop) on stack (4) after executing (dup dup mul add swap), stack is ((dup dup mul add swap) 4) after executing dup, stack is ((dup dup mul add swap) (dup dup mul add swap) 4) after executing 3, stack is (3 (dup dup mul add swap) (dup dup mul add swap) 4) after executing nget, stack is (4 (dup dup mul add swap) (dup dup mul add swap) 4) after executing swap, stack is ((dup dup mul add swap) 4 (dup dup mul add swap) 4) About to execute commands (dup dup mul add swap) on stack (4 (dup dup mul add swap) 4) after executing dup, stack is (4 4 (dup dup mul add swap) 4) after executing dup, stack is (4 4 4 (dup dup mul add swap) 4) after executing mul, stack is (16 4 (dup dup mul add swap) 4) after executing add, stack is (20 (dup dup mul add swap) 4) after executing swap, stack is ((dup dup mul add swap) 20 4) after executing exec, stack is ((dup dup mul add swap) 20 4) About to execute commands (dup dup mul add swap) on stack (20 4) after executing dup, stack is (20 20 4) after executing dup, stack is (20 20 20 4) after executing mul, stack is (400 20 4) after executing add, stack is (420 4) after executing swap, stack is (4 420) after executing exec, stack is (4 420) after executing pop, stack is (420) 420 > (postfix-run '(postfix 0 dup) '()) ERROR: dup requires a nonempty stack ()Notes:
-
Implement
dupby adding acondclause topostfix-exec-command. -
Test your
dupimplementation using the above test cases. Yourdupimplementation should ensure there is at least one value on the stack, and give an appropriate error message when there isn’t.
-
-
(9 points) In this part you will extend PostFix with a
rotcommand that behaves as follows.-
The top stack value should be a positive integer rotlen that we’ll call the rotation length. Assume there are n values v1, …, vn below rotlen on the stack, where n is greater than or equal to rotlen. Let’s refer to the first rotlen of these n values as the prefix of the n values, and any remaining values as the suffix of the n values.
-
The result of performing the
rotcommand is to ``rotate’’ the prefix by moving v1 to the end of the prefix. That is, the value v1 is moved to index rotlen in the prefix, and each of the values v2 through vrotlen is shifted down one index (i.e., moved one position to the left) so that they now occupy the indices 1 through rotlen-1. So afterrotis performed, the elements of the prefix are v2 … vrotlen v1. For example, if rotlen is 2, thenrotacts likeswap, and if rotlen is 1, thenrotleaves the stack unchanged. -
The values in the suffix are unchanged by
rot. -
When rotlen is 0, a negative integer, or a noninteger, the
rotcommand signals an error.
Here are examples involving
rot:(define rot-test '(postfix 6 4 rot 3 rot 2 rot)) > (postfix-run rot-test '(8 7 6 5 9 10)) About to execute commands (4 rot 3 rot 2 rot) on stack (8 7 6 5 9 10) after executing 4, stack is (4 8 7 6 5 9 10) after executing rot, stack is (7 6 5 8 9 10) after executing 3, stack is (3 7 6 5 8 9 10) after executing rot, stack is (6 5 7 8 9 10) after executing 2, stack is (2 6 5 7 8 9 10) after executing rot, stack is (5 6 7 8 9 10) 5 > (postfix-run '(postfix 3 (mul add) rot) '(5 6 7)) About to execute commands ((mul add) rot) on stack (5 6 7) after executing (mul add), stack is ((mul add) 5 6 7) ERROR: rot length must be a positive integer but is (mul add) > (postfix-run '(postfix 3 -1 rot) '(5 6 7)) About to execute commands (-1 rot) on stack (5 6 7) after executing -1, stack is (-1 5 6 7) ERROR: rot length must be a positive integer but is -1 > (postfix-run '(postfix 3 4 rot) '(5 6 7)) About to execute commands (4 rot) on stack (5 6 7) after executing 4, stack is (4 5 6 7) ERROR: not enough stack values for rot (4 5 6 7) > (postfix-run '(postfix 0 rot) '()) About to execute commands (rot) on stack () ERROR: rot requires a nonempty stack but is ()Notes:
-
Implement
rotby adding acondclause topostfix-exec-command. -
Racket supplies a
list-tailfunction that is very helpful for implementingrot:> (list-tail '(10 20 30 40 50) 0) '(10 20 30 40 50) > (list-tail '(10 20 30 40 50) 1) '(20 30 40 50) > (list-tail '(10 20 30 40 50) 2) '(30 40 50) > (list-tail '(10 20 30 40 50) 3) '(40 50) > (list-tail '(10 20 30 40 50) 4) '(50) > (list-tail '(10 20 30 40 50) 5) '()Racket does not provide a similar
list-headfunction, but I have provided it in theps6-postfix-starter.rktfile. It works this way:> (list-head '(10 20 30 40 50) 0) '() > (list-head '(10 20 30 40 50) 1) '(10) > (list-head '(10 20 30 40 50) 2) '(10 20) > (list-head '(10 20 30 40 50) 3) '(10 20 30) > (list-head '(10 20 30 40 50) 4) '(10 20 30 40) > (list-head '(10 20 30 40 50) 5) '(10 20 30 40 50) -
Test your
rotimplementation using the above test cases. Yourrotimplementation should give appropriate error messages in various situations, like those in the test cases.
-
3. Metaprogramming Derivations (14 points)
This problem involves the notion of proof-like derivations that show how programming languages are implemented in terms of interpreters and compilers. These derivations are covered in Lec 28 on Wed Mar 24.
Deriving Implementations
As presented in slides 14 to 19 of the Lec 28 Metaprogramming slides there are two basic reasoning rules involving interpreters, translators (a.k.a. compilers), and program implementation:
The Interpreter Rule (I)
The Translator Rule (T)
In practice, we often elide the word “machine” for interpreter machines and translator machines. For example, we will refer to an “L interpreter machine” as an “L interpreter”, and an “S-to-T translator machine” as an “S-to-T translator” or “S-to-T compiler”. We will also often elide the word “program”; e.g., we will refer to a “P-in-L program” as “P-in-L”. So an “L interpreter” means an “L interpreter machine” while a “S-in-L interpreter” means an “S-in-L interpreter program”. Similar remarks hold for translators (also called compilers): an “S-to-T translator” means an “S-to-T translator machine”, while a “S-to-T-in-L translator” means an “S-to-T-in-L translator program”.
Example
The Metaprogramming slides consider an example with the following elements:
- a 251-web-page-in-HTML program (i.e., a 251 web page written in HTML)
- an HTML-interpreter-in-C program (i.e., a web browser written in C)
- a C-to-x86-compiler-in-x86 program (i.e., a C-to-x86 compiler written in x86)
- a x86 computer (i.e., an x86 interpreter machine)
They can be used to build a 251 web page display machine as follows:
Feel free to abbreviate by dropping the words “program” and “machine”. “… in …” denotes a program, while “… compiler” and “… translator” denote translator machines and “… interpreter” and “… computer” denote interpreter machines.
Writing Derivations
Rather that displaying derivations using horizontal lines, it is recommended that you use nested bullets to indicate the same structure as the inference rules (though drawn “upside down”). For example, the derivation above could also be written:
- 251-web-page-in-HTML program
- HTML interpreter machine (I)
- HTML-interpreter-in-x86 program (T)
- HTML-interpreter-in-C program
- C-to-x86 compiler machine (I)
- C-to-x86-compiler-in-x86 program
- x86 computer
- x86 computer
Your Tasks
- (7 points) Suppose you are given the following:
- A P-in-Java program
- a Java-to-C-compiler-in-Racket program (i.e., a Java-to-C compiler written in Racket);
- a Racket-interpreter-in-x86 program (i.e., a Racket interpreter written in x86 machine code);
- a C-to-x86-compiler-in-x86 program (i.e., a C-to-x86 compiler written in x86 code);
- an x86 computer (i.e., an x86 interpreter machine).
Using the two reasoning rules above, construct a “proof” that demonstrates how to execute the P-in-Java program on the computer. That is, show how to construct a P machine.
- (7 points) Suppose you are given:
- a GraphViz-in-C program
- a C-to-LLVM-compiler-in-x86 program
- an LLVM-to-JavaScript-compiler-in-x86 program
- a JavaScript-interpreter-in-ARM program
- an x86 computer (i.e., an x86 interpreter machine, such as your laptop)
- an ARM computer (i.e., an ARM interpreter machine, such as a smartphone)
Show how to construct a GraphViz machine.
Notes:
-
In the above two parts, it is not possible to derive a “C interpreter”, a “Java interpreter”, or an “LLVM interpreter”, so if you need or use such a part in your derivation, you’ve done something wrong.
-
As shown in the above examples, you should use an (I) or (T) to indicate a conclusion that results from the Interpreter or Translator rule, respectively. Double check that the conclusion is actually justified by the rule from two nested bullets below the conclusion.
4. Interpreting Condex, an Extension to Intex (15 points)
Starting this Problem
This problem involves files in the ~/cs251/sml/ps6 directory in your csenv Virtual Machine.
To create this directory, execute these Linux commands in a shell on your csenv appliance:
cd ~/cs251/sml
git pull origin masterIf you encounter any issues when executing these commands, you can troubleshoot by following these git instructions. If these instructions don’t help, please contact Lyn.
Begin this problem by renaming the starter file Condex.sml in
~/cs251/sml/ps6 to yourAccountName-Condex.sml</code.</span>
It is important to rename this files at the very beginning, because renaming the files protects you from git issues if I need to modify the starter files.
REMINDER: ALWAYS MAKE A BACKUP OF YOUR .sml FILES AT THE END OF EACH DAY TO AVOID POTENTIAL LOSS OF WORK.
The Condex Language
Both Problems 4 and 6 involve the Condex language, which is an extension to Intex. As in Intex, all values in Condex are integers. But Condex includes the follow extra features:
-
Condex includes the relational operators
<,=, and>. Since all Condex values are integers, these operators return 1 in the true case and 0 in the false case. This is similar to relational operators in PostFix. -
Condex includes the conditional expression
(ifnz E_test E_then E_else)whereifnzstands forif-non-zeroand is pronouncediffenz. The meaning of the ifnz expression is as follows:First, the
E_testexpression is evaluated to an integer valueV_test.-
If
V_testis non-zero, then theE_thenexpression is evaluated, and its value is returned as the value of theifnzexpression; -
If
V_testis zero, then theE_elseexpression is evaluated, and its value is returned as the value of theifnzexpression; -
If an error is encountered during any of these evaluations, evaluating the
ifnzexpression gives the same error.
As with Racket’s
ifexpresssion, Condex’sifnzevaluates at most two subexpressions.E_testis always evaluated, and if that evaluation does not give an error, exactly one ofE_thenorE_elseis evaluated. -
-
Condex includes the short-circuit logical operations
(and E_1 E2)and(or E_1 E2), which are similar to their Racket counterparts except they treat0as the only falsey value and all other integers as truthy. In particular:-
If
E_1evaluates to0, then:(and E_1 E2)evaluates to0without evaluatingE_2(or E_1 E2)evaluates to the value ofE_2
-
If
E_1evaluates to a nonzero valueV_1, then:(and E_1 E2)evaluates to the value ofE_2(or E_1 E2)evaluates the value ofV_1without evaluatingE_2(whereE_1is only evaluated toV_1only once)
-
Condex programs are like Intex programs, except they begin with the
symbol condex rather than intex. Your ~/cs251/sml/ps6 directory
contains several sample Condex programs (with extension .cdx) that
we will now discuss:
-
The program in the file
abs.cdxcalculates the absolute value of its single argument:(condex 1 (ifnz (< $1 0) (- 0 $1) $1)) -
The program
simpleIf.cdxusesifnzin a simple conditional test:(condex 2 (ifnz (< $1 $2) (+ $1 $2) (* $1 $2)))For example, running this program on the arguments 3 and 4 should return 7, but calling it on 4 and 3 should return 12.
-
The program
min3.cdxreturns the smallest of its three arguments:(condex 3 (ifnz (< $1 $2) (ifnz (< $1 $3) $1 $3) (ifnz (< $2 $3) $2 $3)))For example, running this program on three arguments 4, 5, and 6 (in any order) should return 4.
-
The program
sumRelations.cdxdoesn’t contain any occurrences ofifnz, but uses the fact that relational operations return 0 or 1 to sum the value of three such operations:(condex 4 (+ (< $1 $2) (+ (= $1 $3) (> $1 $4))))For example, running this program on the four arguments 5, 6, 5, 4 returns 3, because
(< 5 6),(= 5 5), and(> 5 4)all return 1. But running it on the four arguments 5, 3, 7, 4 returns 1, because(< 5 3)and(= 5 7)return 0 and only(> 5 4)returns 1. -
Similar to Racket’s
if,ifnzshould evaluate at most one of the two branch expressions. In order to test this, we need an example in which branch expressions can generate errors in some cases. For this purpose, thedivRemTest.cdxprogram below has one branch with a division and another with a remainder that can generate errors when their second operands are zero:(condex 3 (ifnz (> $1 $2) (/ $2 $3) (% $2 $1)))-
running this program on the arguments 0, -8, and 2 should return -4 from the then branch (but not execute the else branch expression
(% -8 0), which would encounter a remainder-by-zero error). -
running this program on the arguments 5, 8, and 0 should return 3 from the else branch (but not execute the then branch expression
(/ 8 0), which would encounter a divide-by-zero error).
However, running it on the arguments 3, 2, 0 would encounter a divide-by-zero error in the then branch, and the arguments 0, 4, 5 would enounter a remainder-by-zero error in the else branch.
This
divRemTest.cdxprogram will be a particularly relevant example when we consider compiling Condex to PostFix in Problem 6. -
-
Problem 3a of this PS6, considers another Racket function,
g, for translation into PostFix(define (g a b c) (- c (if (= 0 (remainder a 2)) (quotient b (- a c)) (* (+ b c) a))))The Condex program prob3a-g.cdx expresses this function in Condex:
(condex 3 (- $3 (ifnz (= 0 (% $1 2)) (/ $2 (- $1 $3)) (* (+ $2 $3) $1))))Some relevant test cases from Problem 3a are that the three arguments 10, 2, and 8 yield the result 7, and the argments 5, 3, 5 yield the result -40. (See the comments in the file for more test cases.)
-
The
nestedIfs.cdxprogram is a complex program that usesifnzexpressions as the test, then, and else subexpressions of an outerifnzexpression.(condex 3 (ifnz (ifnz (< $1 $2) (< $1 $3) (> $1 $3)) (ifnz (= $2 $3) (* $2 $2) (% $2 $3)) (ifnz (> $2 $3) (/ $2 $3) (- $3 $2))))See the comments of this file for several test cases.
-
The
andTest.cdxandorTest.cdxprograms test the simple behavior of Condex’sandandoroperations:(condex 2 (and $1 $2)) ; This is andTest.cdx (condex 2 (or $1 $2)) ; This is orTest.cdxThe following table shows the results of these programs on various inputs. Note that, as in Racket, values other than false (i.e., 0) and true (i.e., 1) can be returned by
andandor.$1 $2 result of andTest.cdx result of orTest.cdx 0 0 0 0 0 1 0 1 1 0 0 1 1 1 1 1 0 2 0 2 3 0 0 3 4 5 5 4 -
The
shortCircuitAndTest.cdxandshortCircuitOrTest.cdxprograms test the short-circuit behavior of Condex’sandandoroperations:(condex 2 (and $1 (/ 17 $2))) ; This is shortCircuitAndTest.cdx (condex 2 (or $1 (% 42 $2))); This is shortCircuitOrTest.cdxThe following tables shows the results of these programs on various inputs.
$1 $2 result of shortCircuitAndTest.cdx Note 0 2 0 0 0 0 Illustrates short-circuit behavior 4 5 5 Returns result that’s not 0 or 1 6 0 Error: Tried to divide 17 by 0 Triggers divide-by-zero error $1 $2 result of shortCircuitOrTest.cdx Note 0 9 6 Returns result that’s not 0 or 1 4 5 4 Returns result that’s not 0 or 1 7 0 5 Illustrates short-circuit behavior 0 0 Error: Tried to remainder 42 by 0 Triggers remainder-by-zero error
In the rest of this problem, your goal will be to flesh out the skeleton a Condex interpreter implemented in SML. The above programs will be used in test cases that your programs behave as expected.
SML Sum-of-product and S-expression Syntax for Condex
In the ps6 directory on your csenv appliance, your file
yourAccountName-Condex.sml contains the definition of the
SML representation for Condex syntax. This syntax is
is very similar to that for the Intex interpreter
we studied in class (and which can be found in ~/cs251/sml/Intex.sml).
Here are the datatypes and helper functions used to represent Condex program trees:
datatype pgm = Condex of int * exp
and exp = Int of int
| Arg of int
| ArithApp of arithop * exp * exp
| RelApp of relop * exp * exp (* New in Condex: relational operations *)
| And of exp * exp (* New in Condex: short-circuit conjunction *)
| Or of exp * exp (* New in Condex: short-circuit disjunction *)
| Ifnz of exp * exp * exp (* New in Condex: conditional with E_test, E_then, E_else *)
and arithop = Add | Sub | Mul | Div | Rem
and relop = Lt | Eq | Gt (* <, =, > *)
and arithopToFun Add = op+
| arithopToFun Mul = op*
| arithopToFun Sub = op-
| arithopToFun Div =
(fn(x,y) => if y = 0 then
raise EvalError ("Tried to divide " ^ (Int.toString x) ^ " by 0")
else
x div y)
| arithopToFun Rem =
(fn(x,y) => if y = 0 then
raise EvalError ("Tried to remainder " ^ (Int.toString x) ^ " by 0")
else
x mod y)
(* New in Condex interpreter *)
and relopToFun Lt = op<
| relopToFun Eq = (fn(a,b) => a = b)
| relopToFun Gt = op>
(* New in Condex interpreter *)
and boolToInt false = 0
| boolToInt true = 1Here are some comparisons with Intex worth noting:
-
Programs are constructed via the
Condex(notIntex) constructor; -
Int,Arg,ArithApp, andarithopare unchanged fromIntex, as are the `arithop -
The
RelAppconstructor and thereloptype are new in Condex; they represent relational operations (<,=,>). -
The
Ifnzconstructor is new in Condex and represents theifnzconstruct. -
The
AndandOrconstructors are new in Condex and represent theandandorconstructs, respectively -
Condexs
arithopToFunhelper function is exactly the same as inIntex. -
The new helper function
relopToFundefines the meaning of therelopoperators. -
The new helper function
boolToIntis also handy in this problem.
Additionally, yourAccountName-Condex.sml contains functions for
parsing s-expression strings into, and unparsing s-expression strings from,
the sum-of-product notation. For example:
- val ifnzExp = stringToExp "(ifnz (< $1 6) (and (< $1 0) (> $2 10)) (or (= $1 10) (> $1 $2)))";
val ifnzExp =
Ifnz
(RelApp (Lt,Arg 1,Int 6),
And (RelApp (Lt,Arg 1,Int 0),RelApp (Gt,Arg 2,Int 10)),
Or (RelApp (Eq,Arg 1,Int 10),RelApp (Gt,Arg 1,Arg 2))) : exp
- expToString ifnzExp;
val it =
"(ifnz (< ($ 1) 6)\n (and (< ($ 1) 0) (> ($ 2) 10))\n (or (= ($ 1) 10) (> ($ 1) ($ 2)))\n )"
: string
- print(expToString ifnzExp);
(ifnz (< ($ 1) 6)
(and (< ($ 1) 0) (> ($ 2) 10))
(or (= ($ 1) 10) (> ($ 1) ($ 2)))
)val it = () : unit
- val simpleIfPgm = stringToPgm "(condex 2 (ifnz (< $1 $2) (+ $1 $2) (* $1 $2)))";
val simpleIfPgm =
Condex
(2,
Ifnz
(RelApp (Lt,Arg 1,Arg 2),ArithApp (Add,Arg 1,Arg 2),
ArithApp (Mul,Arg 1,Arg 2))) : pgm
- pgmToString simpleIfPgm;
val it = "(condex 2 (ifnz (< ($ 1) ($ 2)) (+ ($ 1) ($ 2)) ( * ($ 1) ($ 2))))"
: stringThe above examples assume you have opened the Condex structure via open Condex,
and have also executed:
Control.Print.printDepth := 100Control.Print.stringDepth := 1000;
Your Task
yourAccountName-Condex.sml also contains the
skeleton of a Condex interpreter eval function similar to the one defined for
Intex. All you need to do is to replace the stubs for the
RelApp, And, Or and Ifnz clauses of eval so that they return the correct
integer result as specified by the semantics of Condex.
Notes:
-
You may define helper functions(s) if you want.
-
The easiest way to test your
evalfunction is to evaluate the function calltestInterp(), which will run the evaluator on numerous test cases involving the.cdxprogram files described above.-
If you see a result like
"Passed all 61 test cases", it means yourevalfunction has passed all the test cases. (You can scroll up to see the details.) -
If you see a result that begins with something like
"Failed 4 of 61 test cases ...", you should scroll up searching for the string***ERROR, which will give details on the test program and arguments causing the failure, along with the expected and actual results.
-
-
The easiest way to test your own examples is to launch a Condex Read/Eval/Print loop via
repl(). For example:- repl(); condex> (< 3 4) 1 condex> (> 3 4) 0 condex> (ifnz (< 1 2) (+ 3 4) ( * 5 6)) 7 condex> (ifnz (> 1 2) (+ 3 4) ( * 5 6)) 30 condex> (and 4 5) 5 condex> (or 4 5) 4 condex> (or 7 (% 8 0)) 7 condex> (#args 8 5 9) condex> (ifnz (< $1 $2) (+ $2 $3) (- $2 $3)) ~4 condex> (#run (condex 3 (ifnz (< $1 $2) $2 $3)) 4 5 6) 5 condex> (#run (condex 3 (ifnz (< $1 $2) $2 $3)) 4 3 6) 6 condex> (#run min3.cdx 42 17 57) 17 condex> (#quit) Moriturus te saluto! val it = () : unit- You can call
testInterp()andrepl()without theCondex.qualifier, but if you want to access values from the Condex module, you will either need to (1) explicitly qualify them withCondex.or (2) open theCondexstructure viaopen Condex.
- You can call
5. Static Stack Depth in the Intex-to-PostFix Compiler (21 points)
The intexToPostFix function discussed in Lec 30 automatically translates an Intex program to a
PostFix program that has the same meaning.
(See the fleshed out version of this function in these solution
slides
or the file /cs251/sml/intex/IntexToPostFix.sml,)
The challenging part of this translation is determining the correct index to use for an Nget
command when translating an Arg node in an Intext AST. The key to solving this technical problem
is to provide the expToCmds helper function with a depth argument that statically tracks the
number of values that have been pushed on the PostFix stack above the argument values at any point
of the computation.
The purpose of this problem is for you to study this idea in the context of a particular Intex program so that you better understand it.
-
(3 points) Draw the abstract syntax tree for the following Intex program, which we shall call
P_Intex.(intex 4 (/ (+ $1 (* $2 (- $3 $4))) (% (- (* $4 $3) $2) $1)))For drawing the Intex AST, use the conventions illustrated in slides 3 and 4 of the the Lec 29 Intex slides.
You may draw the tree by hand or use a drawing program, whichever is easier for you. This will be a wide diagram, so you’ll want to draw it in landscape mode rather than portrait mode. To save space, do not label the edges (i.e., no
numargs,body,rator,rand1,rand2,index, orvaluelabels should be used).Since you will be annotating this tree with extra information in parts b and d, leave enough room to add some annotations to the right of each node. Here is an example of an annotated node:

-
(4 points) In your AST for
P_Intexfrom part a, annotate each expression node with thedepthassociated with that node when it is encountered by theexpToCmdsfunction within theintexToPostFixfunction. As shown above, you should write depth:d to the right of each expression node, where d is the depth of the node. -
(6 points) Write down the PostFix program that is the result of translating the Intex program
P_Intexfrom part a using theintexToPostFixfunction. We shall call this programP_PostFix. (You should do this translation by hand, not by running the function in SML.) Important: You translated PostFix code must be exactly the code that would be generated by theintexToPostFixfunction, not just a PostFix program that happens to have the same behavior. Why? Because the purpose of this problem is to understand how theintexToPostFixfunction (and, in particular, itsdepthargument) work.
Use comments to explain the commands in the translated program. Here is an example of commenting
the PostFix code that is generated by translating the Intex program example
(intex 4 (* (- $1 $2) (/ $3 $4))) in Lec 30.
(postfix 4 ; stack starts with ($1 $2 $3 $4)
1 nget ; stack now ($1 $1 $2 $3 $4)
3 nget ; stack now ($2 $1 $1 $2 $3 $4)
; index 3 needed to copy $2 because extra $1 on stack
sub ; stack now ($1-$2 $1 $2 $3 $4)
4 nget ; stack now ($3 $1-$2 $1 $2 $3 $4)
; index 4 needed to copy $3 because extra val on stack
6 nget ; stack now ($4 $3 $1-$2 $1 $2 $3 $4)
; index 6 needed to copy $4 because 2 extra vals on stack
div ; stack now ($3/$4 $1-$2 $1 $2 $3 $4)
mul ; stack now (($1*$2)*($3/$4) $1 $2 $3 $4)-
(8 points) Consider running
P_Intexon the four arguments[4, 9, 8, 2]. In your AST forP_Intexfrom part a, annotate each expression node with the list of integers that corresponds to the PostFix stack that will be encountered just before the last command translated for that node byexpToCmdsis executed in the PostFix programP_PostFix. As shown above, you should write the stack as an SML-style list to the right of the node below the depth.For example, below is the result of annotating the Intex AST for
(intex 4 (* (- $1 $2) (/ $3 $4)))with depth and inital and final stack information as described below.
Some things to observe:
-
Only Intex expression nodes (
ArithAppandArgnodes in this example) have annotations. ThearithopnodesSubDiv, andMulare not annotated, nor are theArgindex values or the top-levelIntexprogram node. -
Every expression node has three annotations:
-
The depth of the node in the Intex–to-PostFix compilation process. This is the sum of the number of right operands of
Arithopnodes in which the expression appears. -
The initial stack (is) right before executing the PostFix commands compiled for the expression in an example where the program is run on the four arguments [15, 8, 30, 6]. These annotations correspond to visiting the expression nodes in a depth-first left-to-right pre-order walk over the tree. For the sample tree, this order is: (1) the
ArithAppnode withMul, (2) theArithAppnode withSub, (3) theArg 1node, (4) theArg 2node, (5) theArithAppnode withDiv, (6)) theArg 3node, (7) theArg 4node, -
The final stack (fs) right after executing all of the PostFix commands compiled for the expression in an example where the program is run on the four arguments [15, 8, 30, 6]. These annotations correspond to visiting the expression nodes in a depth-first left-to-right post-order walk over the tree. For the sample tree, this order is: (1) the
Arg 1node, (2) theArg 2node, (3) theArithAppnode withSub, (4)) theArg 3node, (5) theArg 4node, (2) theArithAppnode withDiv, (5) theArithAppnode withMul
-
-
At each
Argnode in the program, the corresponding PostFixngetstack index of that argument in the stack annotating the node is theArgindex value plus thedepthannotating the node. In particular:-
The
Argnode with index 1 has depth 0, and so translates to1 nget. -
The
Argnode with index 2 has depth 1, and so translates to3 nget. -
The
Argnode with index 3 has depth 1, and so translates to4 nget. -
The
Argnode with index 4 has depth 2, and so translates to6 nget.
-
-
It is important to observe the key invariant guaranteed by the Intex-to-PostFix compiler at every expression node in the Intex AST: The final stack is the result of pushing exactly one value on the initial stack: the result of evaluating the Intex expression.
-
6. Compiling Condex to PostFix (22 points)
In this problem you will extend the IntexToPostFix compiler from Lec 30
to compile the Condex language from Problem 4 to PostFix.
Starting this Problem
-
Rename the starter file
CondexToPostFix.smlin~/cs251/sml/ps6toyourAccountName-CondexToPostFix.sml. -
In your renamed file
yourAccountName-CondexToPostFix.sml, edit this line near the top of the fileuse "Condex.sml"; (* semi-colon required! *)to be
use "yourAccountName-Condex.sml"; (* semi-colon required! *)
It is important to rename this files at the very beginning, because renaming the files protects you from git issues if I need to modify the starter files.
Problem 6a: Translating an Example Condex Program to PostFix By Hand (7 points)
The following is the divRemTest.cdx program discussed in Problem 4.
(condex 3 (ifnz (> $1 $2) (/ $2 $3) (% $2 $1)))It is imperative that you understand how this Condex program should be
translated into PostFix before you attempt to flesh out the
RelApp and Ifnz cases of expToCmds in the next subproblem.
For this part, write the exact PostFix program that should be
generated by the the condexToPosFix function when given this
Condex program. You should annotate the commands with explanatory
comments as described in Problem 5c.
You should also show the sequence of all intermediate stacks for executing your PostFix program on the argument lists from the following examples:
condex> (#run divRemTest.cdx 6 17 0)
5
condex> (#run divRemTest.cdx 0 -18 3)
~6Of course, your PostFix program should give the same result as the Condex interpreter.
Problem 6b: Fleshing out the CondexToPostFix Skeleton (15 points)
You should complete Problems 4 and 6b before starting this subproblem.
The file yourAccountName-CondexToPostFix.sml
contains the skeleton of a Condex to Postfix compiler that is
very similar to the one studied in Lec 30 for Intex.
The core function in the compiler is expToCmds, which has the type
val expToCmds: Condex.exp -> int -> PostFix.cmd list *)It takes a Condex expression and a depth argument that tracks the
number of values that have been pushed above the program arguments
on the stack when expToCmds is called (this is used as an offset
for the index when compiling a program argument). expToCmds returns
a list of PostFix commands.
The Int, Arg, and ArithApp cases of expToCmds are already correctly implemented.
What you need to do is flesh out the cases for RelApp, Ifnz, And, and Or.
Notes:
-
Flesh out the
RelAppandIfnzclauses ofexpToCmdsso that they implement a generalized version of your strategy from Problem 6a. -
Do not start to flesh out clauses for Condex’s
AndandOrconstructs until after you have succesfully implemented theIfnzconstruct, since understanding the compilation ofIfnzis essential to understaning the compilation ofAndandOr. -
You will need to develop a strategy for correctly compiling Condex’s
AndandOrexpressions to PostFix:-
It is rather challenging to translate these constructs into PostFix in a way that preserves their short-circuit behavior, so you will get most of the credit for translating these constructs if they have the right behavior except for short-circuiting.
-
When compiling Condex’s
Orconstruct to PostFix, try to avoid duplicating the calculation of the first operand expression. (You will still receive substantial credit if you duplicate this calculation.) -
Hints for handling short-circuting correctly:
-
Review how Racket desugars its
andandorconstructs intoifexpressions, and have your CondexToPostFix translation do something similar. -
Translate the test programs
shortCircuitAndTest.cdxandshortCircuitOrTest.cdxby hand into PostFix to better understand how short-circuting should be handled in the translation.
-
-
-
You should explicitly use qualified names beginning with
Condex.andPostFix.to distinguish function values from these two structures. -
To test your
expToCmds, evaluate the function calltestComp(). This will do the following for the Condex.cdxprograms in theps6directory:-
Compile the Condex program into a PostFix program using the
condexToPostFixfunction, which depends on your changes to theexpToCmdsfunction. -
For various arguments lists, run both the original Condex program (using your Condex interpreter from Problem 4) and the compiled PostFix program (using the SML PostFix interpreter we studied in class) on the argument lists and test that the outputs are the same.
-
If you see a result like
"Passed all 61 test cases", it means your compiler has passed all the test cases. (You can scroll up to see the details.) -
If you see a result that begins with something like
"Failed 4 of 37 test cases ...", you should scroll up searching for the string***ERROR, which will give details on the test program and arguments causing the failure, along with the different results from the Condex and PostFix interpreters. The testing output also displays the PostFix program that results from the compilation process for each Condex program. You should carefully study the PostFix programs output from the compiler as part of debugging.
-
Extra Credit 1: PostFix Iterations (10 points)
This problem is optional. You should only attempt it after completing all the other problems.
-
(3 points) Study and test the following
mysteryPostFix program of one argument, which is provided near the end ofps6-postfix-starter.rkt. Describe the function that it calculates on its one argument, and give a brief, high-level description of how it calculates that function.;; What function does this compute? (define mystery '(postfix 1 1 (swap dup 0 eq (swap) (swap 2 nget mul swap 1 sub swap 3 vget exec) sel exec) 3 rot 3 vget exec))Note:
vgetis a command that is likenget, except that it can return any value (including an executable sequence), not just an integer. It has been included in yourps6-postfix-starter.rktfile for use in this extra credit problem. -
(7 points) Write a PostFix program that takes a single argument (assumed to be a nonnegative integer n) and iteratively calculates the nth Fibonacci number. Full credit will be given only if you use constant stack space; i.e., calculating the Fibonacci of a larger number should not end up with a final stack that has more elements than for a smaller number.
Extra Credit 2: Bootstrapping (14 points)
This problem is usually a second part of Problem 3 Metaprogramming Derivations on this Problem Set 6. However, because it is challenging, I have decided to make this part an optional extra credit problem rather than a required part.
In his Turing Award lecture-turned-short-paper Reflections on Trusting Trust, Ken Thompson describes how a compiler can harbor a so-called “Trojan horse” that can make compiled programs insecure. Read this paper carefully and do the following tasks to demonstrate your understanding of the paper:
- (6 points) Stage II of the paper describes how a C compiler can be “taught” to recognize
'\v'as the vertical tab character. Using the same kinds of components and processes used in Problem 1, we can summarize the content of Stage II by carefully listing the components involved and describing (by constructing a proof) how a C compiler that “understands” the code in Figure 2.2 can be generated. (Note that the labels for Figures 2.1 and 2.2 are accidentally swapped in the paper.) In Stage II, two languages are considered:
-
the usual C programming language, which does not include the special character
'\v'; -
C+\v = an extension to C in which
'\v'is treated as the vertical tab character (which has ASCII value 11).
Suppose we are given the following:
-
a C-to-binary compiler machine (Here, “binary” is the machine code for some machine. This compiler is just a “black box”; we don’t care where it comes from);
-
a C+\v-to-binary-compiler-in-C program (Figure 2.3 in Thompson’s paper);
-
a C+\v-to-binary-compiler-in-C+\v program (what should be Figure 2.2 in Thompson’s paper);
-
a machine that executes the given type of binary code (i.e., a binary interpreter machine)
Construct a metalanguage derivation proof showing how to use the C+\v-to-binary-compiler-in-C+\v source code to create a C+\v-to-binary-compiler-in-binary program.
- (8 points) Stage III of the paper describes how to generate a compiler binary harboring a “Trojan horse”. Using the same kinds of components and processes used in Problem 1, we can summarize the content of Stage III by carefully listing the components involved and describing (by constructing a proof) how the Trojaned compiler can be generated. In particular, assume we have the parts for this stage:
-
a C-to-binary-compiler machine (again, just a “black box” we are given);
-
a C-to-binary-compiler-in-C program without Trojan Horses (Figure 3.1 in Thompson’s paper);
-
a C-to-binary-compilerTH-in-C program with two Trojan Horses (Figure 3.3 in Thompson’s paper);
-
a login-in-C program with no Trojan Horse;
-
a binary-based computer (i.e., a binary interpreter machine)
The subscript TH indicates that a program contains a Trojan Horse. A C-to-binary compilerTH has the “feature” that it can insert Trojan Horses when compiling source code that is an untrojaned login program or an untrojaned compiler program. That is, if P is a login or compiler program, it is as if there is a new rule:
The Trojan Horse Rule (TH)
Using these parts along with the two usual rules (I) and (T) and the new rule (TH), show how to derive a Trojaned login program loginTH-in-binary that is the result of compiling the uninfected login-in-C program with a C compiler that is itself the result of compiling the uninfected C-to-binary-compiler-in-C program. The interesting thing about this derivation is that loginTH-in-binary is infected with a Trojan horse even though it is compiled using a C compiler whose source code (C-to-binary-compiler-in-C program) contains no code for inserting Trojan horses!
Notes:
-
Although only 3 pages long, the Thompson paper is very dense and challenging. Don’t be surprised if it requires multiple readings to ``get’’ the ideas, which are profound.
-
Double-check your derivations to make sure that they make sense and actually describe what’s being said In Thompson’s Paper.
Extra Credit Problem 3: Self-printing Program (12 points)
In his Turing-lecture-turned-short-paper Reflections on Trusting Trust, Ken Thompson notes that it is an interesting puzzle to write a program in a language whose effect is to print out a copy of the program. We will call this a self-printing program. Write a self-printing program in any general-purpose programming language you choose.
Notes
-
Your program should be self-contained — it should not read anything from the file system.
-
All output must be characters printed by a command like
print. In Racket, for instance, the program/result cant just be a quoted Racket list. -
IMPORTANT: There are lots of answers on the Internet, but you will get zero points if you just copy one of those. (I can tell!) Write this program from scratch completely on your own, without looking anything up online.