PS6: I Never Metalanguage I Didn't Like
-
Changes from the original PS6 description are highlighed in magenta.
-
Dueish: Mon Mar 29 (followed by a 2-day grace period)
- Notes:
- This is the last regular pset, but there will be two more Solo Assignments, E & F.
- This pset includes numerous extra credit problems, some of which cover material from the last few lectures that is not covered by the regular problems on this pset.
- As with all regular psets, you can work on this problem in a two-person team or as an individual, but partnering is strongly recommended. Unless you get a special dispensation from Lyn, you need to work with a different partner than you did in your previous two partnerships. Use this Pset Partner Doc to find a partner and record your partnership.
- This pset has 110 total points.
- Much of the material on this pset has not yet been covered in class. Here are some notes
to help you plan when you can start the problems:
- You can do Problem 1 now based on the Lec 24 material on s-expressions
- You will be able to do Problem 2 after the Mon Mar 22 Lec 26 on a PosfFix interpreter in Racket
- You will be able to do Problem 3 after the Wed Mar 24 Lec 28 on metaprogramming derivations
- You will be able to do Problem 5a after the Thu Mar 25 Lec 29 on an Intex interpreter in SML
- You will be able to do Problems 4 and 5b after the Fri Mar 26 Lec 30 on compiling Intex to PostFix
-
Times (in hours) from some previous semesters:
The following times do not yet include the times for Problems 3, 4, and 6
Times Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 Problem 6 Total average time (hours) 0.5 2.5 1.9 median time (hours) 0.5 2.4 1.8 25% took more than 0.7 3.0 2.0 10% took more than 0.8 3.0 3.0 -
How to Start PS6
- Follow the instructions in the GDoc CS251-S21-T3 Pset Submission Instructions for creating your PS5 Google Doc. Put all written answers and a copy of all code into this PS5 GDoc. If you are working with a partner, only one of you needs to create this document, but you should link it from both of your List Docs.
-
Submission:
- For all parts of all problems, include all answers (including Racket and SML code and metaprogramming derivations)) in your PS6 google doc. Format Racket and SML code using a fixed-width font, like Consolas or Courier New. You can use a small font size if that helps.
- For Problem 1:
- Be sure that your
deep-reverse
definition inyourAccountName-ps6-deep-reverse.rkt
also appears in your PS6 GDoc. - Drop a copy of your
yourAccountName-ps6-deep-reverse.rkt
in your~/cs251/drop/ps06
drop folder oncs.wellesley.edu
.
- Be sure that your
- For Problem 2:
- Include the modified parts of
yourAccountName-ps6-postfix.rkt
in your PS6 GDoc. (You only need to include the modified parts, not the entire contents of the PostFix interpreter!) - Drop a copy of your
yourAccountName-ps6-postfix.rkt
in your~/cs251/drop/ps06
drop folder oncs.wellesley.edu
.
- Include the modified parts of
- For Problem 3, include your metaprogramming derivations. It is easiest to write these top-down via nested bullets, as shown in both the Problem 3 description and the lecture notes.
- For Problem 4:
- Include the definition of
eval
and any helper functions in your PS6 GDoc. -
Drop a copy of your
~/cs251/sml/ps6/yourAccountName-Condex.sml
file in your~/cs251/drop/ps06
drop folder oncs.wellesley.edu
by executing the following (replacing all three occurrences ofgdome
by your cs server username):scp -r ~/cs251/sml/ps6/gdome-Condex.sml gdome@cs.wellesley.edu:/students/gdome/cs251/drop/ps06
- Include the definition of
- For Problem 5:
- For parts a, b, and d, include the drawing of a single AST for
P_Intex
(from part a) that is annotated with depth information (from part b) and stack information (from part d). - For part c, include the commented PostFix program
P_PostFix
that is the result of translatingP_Intex
to PostFix.
- For parts a, b, and d, include the drawing of a single AST for
- For Problem 6:
-
For Part a, include the annotated PostFix program that should result from compiling
divRemTest.cdx
, along with the sequence of all intermediate stacks when executing the PostFix program on the two given argument sequences. -
For Part b:
- Include the definition of
expToCmds
and any helper functions in your PS6 GDoc. -
Drop a copy of your
~/cs251/sml/ps6/yourAccountName-CondexToPostFix.sml
file in your~/cs251/drop/ps06
drop folder oncs.wellesley.edu
by executing the following (replacing all three occurrences ofgdome
by your cs server username):scp -r ~/cs251/sml/ps6/gdome-CondexToPostFix.sml gdome@cs.wellesley.edu:/students/gdome/cs251/drop/ps06
- Include the definition of
-
1. Deep Reverse (6 points)
We saw in Lec 24 that tree recursion on trees represented as s-expressions can be expressed rather elegantly. For example:
(define (atom? x)
(or (number? x) (boolean? x) (string? x) (symbol? x)))
(define (sexp-num-atoms sexp)
(if (atom? sexp)
1
(foldr + 0 (map sexp-num-atoms sexp))))
> (sexp-num-atoms '((a (b c) d) e (((f) g h) i j k)))
11
> (sexp-num-atoms '(a b c d))
4
> (sexp-num-atoms 'a)
1
> (sexp-num-atoms '())
0
(define (sexp-atoms sexp)
(if (atom? sexp)
(list sexp)
(foldr append null (map sexp-atoms sexp))))
> (sexp-atoms '((a (b c) d) e (((f) g h) i j k)))
'(a b c d e f g h i j k)
> (sexp-atoms '(a b c d))
'(a b c d)
> (sexp-atoms 'a)
'(a)
> (sexp-atoms '())
'()
In this problem, you will define a function (deep-reverse sexp)
that returns a new s-expression in which the order of the children at every node of the s-expression tree sexp
is reversed.
> (deep-reverse '((a (b c) d) e (((f) g h) i j k)))
'((k j i (h g (f))) e (d (c b) a))
> (deep-reverse '(a b c d))
'(d c b a)
> (deep-reverse 'a)
'a
> (deep-reverse '())
'()
Notes:
-
Begin with this starter file ps6-deep-reverse-starter.rkt, which you should rename to
yourAccountName-ps6-deep-reverse.rkt
. Add your definition ofdeep-reverse
to this file. -
Your definition should have form similar to the definitions for
sexp-num-atoms
andsexp-atoms
, but you’ll want to use something other thanfoldr
. -
You are not allowed to use Racket’s
reverse
function in your definition.
2. Extending PostFix (32 points)
Lec 26 on Mon Mar 22 covers several different versions of an interpreter for the PostFix language implemented in Racket. This problem involves starting with the following version of the interpreter:
This is a slightly modified version of the file postfix-transform-fancy.rkt studied in lecture.
Begin by making a copy of ps6-postfix-starter.rkt
named yourAccountName-ps6-postfix.rkt
and load this into Dr. Racket. Near the bottom of this file is the following PostFix program named sos
(for “sum of squares”). Racket’s semi-colon comments have been used to explain the commands in the program:
;; Sum-of-squares program
(define sos
'(postfix 2 ; let's call the arguments a and b, from top down
1 nget ; stack is now (a a b)
mul ; stack is now (a*a b)
swap ; stack now (b a*a)
1 nget ; stack now (b b a*a)
mul ; stack now (b*b a*a)
add ; stack now (a*a+b*b); top (only) value is returned by program
))
Let’s run the program on the arguments 5 and 12:
> (postfix-run sos '(5 12))
About to execute commands (1 nget mul swap 1 nget mul add) on stack (5 12)
after executing 1, stack is (1 5 12)
after executing nget, stack is (5 5 12)
after executing mul, stack is (25 12)
after executing swap, stack is (12 25)
after executing 1, stack is (1 12 25)
after executing nget, stack is (12 12 25)
after executing mul, stack is (144 25)
after executing add, stack is (169)
169
Note that the stack that results from executing each command on the previous stack is displayed, line by line. This behavior is controlled by the display-steps?
flag at the top of the program:
(define display-steps? #t)
If we set the flag to #f
, the intermediate stack display will be turned off:
(define display-steps? #f)
> (postfix-run sos '(5 12))
169
Turn the display-steps?
flag on when it’s helpful to understand or debug a PostFix program.
-
(10 points)
Consider the following Racket function
g
that is defined near the bottom of the PostFix interpreter file:(define (g a b c) (- c (if (= 0 (remainder a 2)) (quotient b (- a c)) (* (+ b c) a))))
In this part, your goal is to flesh out the definition of a three-argument PostFix program
pfg
that performs the same calculation asg
on three arguments:(define pfg '(postfix 3 ;; Flesh out and comment the commands in this PostFix program ))
Here are some examples with a correctly defined
pfg
:> (apply g '(10 2 8)) 7 > (postfix-run pfg '(10 2 8)) 7 > (apply g '(-7 2 3)) 38 > (postfix-run pfg '(-7 2 3)) 38 > (apply g '(5 4 5)) -40 > (postfix-run pfg '(5 4 5)) -40
Notes:
-
You must comment command chunks in your
pfg
program to explain how they affect the stack. Use a commenting style similar to that insos
. -
You have been provided with a testing function
(test-pfg)
that will test yourpfg
function on 5 sets of arguments:;; Tests on an incorrect definition of pfg: > (test-pfg) Testing pfg on (10 2 8): ***different results*** g: 7 pfg: -7 Testing pfg on (11 2 8): ***different results*** g: -102 pfg: 102 Testing pfg on (-6 3 8): ***different results*** g: 8 pfg: -8 Testing pfg on (-7 2 3): ***different results*** g: 38 pfg: -38 Testing pfg on (5 4 5): ***different results*** g: -40 pfg: 40 ;; Tests on a correct definition of pfg: > (test-pfg) Testing pfg on (10 2 8): same result for g and pfg = 7 Testing pfg on (11 2 8): same result for g and pfg = -102 Testing pfg on (-6 3 8): same result for g and pfg = 8 Testing pfg on (-7 2 3): same result for g and pfg = 38 Testing pfg on (5 4 5): same result for g and pfg = -40
-
-
(7 points) Extend PostFix by adding the following two commands:
-
exp
: given a stacki_base i_expt ...
(wherei_base
andi_expt
are integers), replacesi_base
andi_expt
by the result of raisingi_base
to the power ofi_expt
(or 0 ifi_expt
is negative). -
chs
: given a stacki_n i_k ...
(wherei_k
andi_n
are nonnegative integers andi_k
<=i_n
), replacesi_n
andi_k
by the value ofi_n
choosei_k
. (See the definition of “choose” notation here). If one or both ofi_n
andi_k
are invalid arguments, displays an error message (sees examples below).
For example:
> (postfix-run '(postfix 0 2 3 exp) '()) 8 > (postfix-run '(postfix 0 3 2 exp) '()) 9 > (postfix-run '(postfix 0 5 3 exp) '()) 125 > (postfix-run '(postfix 0 3 5 exp) '()) 243 > (postfix-run '(postfix 0 0 5 exp) '()) 0 > (postfix-run '(postfix 0 5 0 exp) '()) 1 > (postfix-run '(postfix 0 5 -1 exp) '()) 0 > (postfix-run '(postfix 0 3 -5 exp) '()) 0 > (postfix-run '(postfix 0 6 0 chs) '()) 1 > (postfix-run '(postfix 0 6 1 chs) '()) 6 > (postfix-run '(postfix 0 6 2 chs) '()) 15 > (postfix-run '(postfix 0 6 3 chs) '()) 20 > (postfix-run '(postfix 0 6 4 chs) '()) 15 > (postfix-run '(postfix 0 6 5 chs) '()) 6 > (postfix-run '(postfix 0 6 6 chs) '()) 1 > (postfix-run '(postfix 0 6 7 chs) '()) ERROR: invalid operands for chs (6 7) > (postfix-run '(postfix 0 6 -2 chs) '()) ERROR: invalid operands for chs (6 -2) > (postfix-run '(postfix 0 -6 3 chs) '()) ERROR: invalid operands for chs (-6 3)
Notes:
-
Do not modify
postfix-exec-command
for this part. Instead, just add two bindings to the listpostfix-arithops
. -
The Racket built-in
expt
function is helpful. -
Use a factorial function (we’ve defined many in class!) to implement
chs
.
-
-
(4 points) Extend PostFix by adding the following three commands:
le
is likelt
, but checks “less than or equal to” rather than “less than”ge
is likegt
, but checks “greater than or equal to” rather than “greater than”and
expects two integers at the top of the stack. It replaces them by 0 if either is 0; otherwise it replaces them by 1.
For example:
> (postfix-run '(postfix 0 4 5 le) '()) 1 > (postfix-run '(postfix 0 5 5 le) '()) 1 > (postfix-run '(postfix 0 5 4 le) '()) 0 > (postfix-run '(postfix 0 4 5 ge) '()) 0 > (postfix-run '(postfix 0 4 4 ge) '()) 1 > (postfix-run '(postfix 0 5 4 ge) '()) 1 > (postfix-run '(postfix 0 0 0 and) '()) 0 > (postfix-run '(postfix 0 0 1 and) '()) 0 > (postfix-run '(postfix 0 1 0 and) '()) 0 > (postfix-run '(postfix 0 0 17 and) '()) 0 > (postfix-run '(postfix 0 17 0 and) '()) 0 > (postfix-run '(postfix 0 1 1 and) '()) 1 > (postfix-run '(postfix 0 1 17 and) '()) 1 > (postfix-run '(postfix 0 17 17 and) '()) 1 > (postfix-run '(postfix 0 17 23 and) '()) 1
Notes:
-
Do not modify
postfix-exec-command
for this part. Instead, just add three bindings to the listpostfix-relops
. -
The testing function
(test-5c)
will test all ofle
,ge
, andand
in the context of the single PostFix programtest-sorted
:(define test-sorted '(postfix 3 ; let's call the arguments a, b, and c, from top down 2 nget le ; is a <= b? 3 nget 3 nget ge ; is c >= b? and ; is a <= b and c >= b? )) > (test-5c) ; Uses the test-sorted program in its definition Trying test-sorted on (4 5 6): works as expected; result = 1 Trying test-sorted on (4 5 5): works as expected; result = 1 Trying test-sorted on (4 4 5): works as expected; result = 1 Trying test-sorted on (4 4 4): works as expected; result = 1 Trying test-sorted on (4 6 5): works as expected; result = 0 Trying test-sorted on (5 6 4): works as expected; result = 0 Trying test-sorted on (5 4 6): works as expected; result = 0 Trying test-sorted on (6 5 4): works as expected; result = 0 Trying test-sorted on (6 4 5): works as expected; result = 0 Trying test-sorted on (5 5 4): works as expected; result = 0 Trying test-sorted on (5 4 4): works as expected; result = 0
-
A very common issue students encounter with attempted solutions to
and
is that it always returns1
in the test cases forand
, even though the code has cases that return0
or1
. This issue is rooted in not understanding how the Racket function associated with the relop symbol is proceessed. Here are some questions to ask yourself in order to debug this issues should you encounter it:-
Study the functions associated with the other relop symbols. What kind of value do they return? Why?
-
What does the function
postfix-relop->racket-binop
do in the PostFix interpreter implementation?
A lesson here is that you need to truly understand the code you are modifying before modifying it!
-
-
(2 points) Extend PostFix with a
dup
command that duplicates the top element of the stack (which can be either an integer or executable sequence). For example:(define sos-dup '(postfix 2 dup mul swap dup mul add)) > (postfix-run sos-dup '(3 4)) About to execute commands (dup mul swap dup mul add) on stack (3 4) after executing dup, stack is (3 3 4) after executing mul, stack is (9 4) after executing swap, stack is (4 9) after executing dup, stack is (4 4 9) after executing mul, stack is (16 9) after executing add, stack is (25) 25 (define cmd-dup '(postfix 1 (dup dup mul add swap) dup 3 nget swap exec exec pop)) > (postfix-run cmd-dup '(4)) About to execute commands ((dup dup mul add swap) dup 3 nget swap exec exec pop) on stack (4) after executing (dup dup mul add swap), stack is ((dup dup mul add swap) 4) after executing dup, stack is ((dup dup mul add swap) (dup dup mul add swap) 4) after executing 3, stack is (3 (dup dup mul add swap) (dup dup mul add swap) 4) after executing nget, stack is (4 (dup dup mul add swap) (dup dup mul add swap) 4) after executing swap, stack is ((dup dup mul add swap) 4 (dup dup mul add swap) 4) About to execute commands (dup dup mul add swap) on stack (4 (dup dup mul add swap) 4) after executing dup, stack is (4 4 (dup dup mul add swap) 4) after executing dup, stack is (4 4 4 (dup dup mul add swap) 4) after executing mul, stack is (16 4 (dup dup mul add swap) 4) after executing add, stack is (20 (dup dup mul add swap) 4) after executing swap, stack is ((dup dup mul add swap) 20 4) after executing exec, stack is ((dup dup mul add swap) 20 4) About to execute commands (dup dup mul add swap) on stack (20 4) after executing dup, stack is (20 20 4) after executing dup, stack is (20 20 20 4) after executing mul, stack is (400 20 4) after executing add, stack is (420 4) after executing swap, stack is (4 420) after executing exec, stack is (4 420) after executing pop, stack is (420) 420 > (postfix-run '(postfix 0 dup) '()) ERROR: dup requires a nonempty stack ()
Notes:
-
Implement
dup
by adding acond
clause topostfix-exec-command
. -
Test your
dup
implementation using the above test cases. Yourdup
implementation should ensure there is at least one value on the stack, and give an appropriate error message when there isn’t.
-
-
(9 points) In this part you will extend PostFix with a
rot
command that behaves as follows.-
The top stack value should be a positive integer rotlen that we’ll call the rotation length. Assume there are n values v1, …, vn below rotlen on the stack, where n is greater than or equal to rotlen. Let’s refer to the first rotlen of these n values as the prefix of the n values, and any remaining values as the suffix of the n values.
-
The result of performing the
rot
command is to ``rotate’’ the prefix by moving v1 to the end of the prefix. That is, the value v1 is moved to index rotlen in the prefix, and each of the values v2 through vrotlen is shifted down one index (i.e., moved one position to the left) so that they now occupy the indices 1 through rotlen-1. So afterrot
is performed, the elements of the prefix are v2 … vrotlen v1. For example, if rotlen is 2, thenrot
acts likeswap
, and if rotlen is 1, thenrot
leaves the stack unchanged. -
The values in the suffix are unchanged by
rot
. -
When rotlen is 0, a negative integer, or a noninteger, the
rot
command signals an error.
Here are examples involving
rot
:(define rot-test '(postfix 6 4 rot 3 rot 2 rot)) > (postfix-run rot-test '(8 7 6 5 9 10)) About to execute commands (4 rot 3 rot 2 rot) on stack (8 7 6 5 9 10) after executing 4, stack is (4 8 7 6 5 9 10) after executing rot, stack is (7 6 5 8 9 10) after executing 3, stack is (3 7 6 5 8 9 10) after executing rot, stack is (6 5 7 8 9 10) after executing 2, stack is (2 6 5 7 8 9 10) after executing rot, stack is (5 6 7 8 9 10) 5 > (postfix-run '(postfix 3 (mul add) rot) '(5 6 7)) About to execute commands ((mul add) rot) on stack (5 6 7) after executing (mul add), stack is ((mul add) 5 6 7) ERROR: rot length must be a positive integer but is (mul add) > (postfix-run '(postfix 3 -1 rot) '(5 6 7)) About to execute commands (-1 rot) on stack (5 6 7) after executing -1, stack is (-1 5 6 7) ERROR: rot length must be a positive integer but is -1 > (postfix-run '(postfix 3 4 rot) '(5 6 7)) About to execute commands (4 rot) on stack (5 6 7) after executing 4, stack is (4 5 6 7) ERROR: not enough stack values for rot (4 5 6 7) > (postfix-run '(postfix 0 rot) '()) About to execute commands (rot) on stack () ERROR: rot requires a nonempty stack but is ()
Notes:
-
Implement
rot
by adding acond
clause topostfix-exec-command
. -
Racket supplies a
list-tail
function that is very helpful for implementingrot
:> (list-tail '(10 20 30 40 50) 0) '(10 20 30 40 50) > (list-tail '(10 20 30 40 50) 1) '(20 30 40 50) > (list-tail '(10 20 30 40 50) 2) '(30 40 50) > (list-tail '(10 20 30 40 50) 3) '(40 50) > (list-tail '(10 20 30 40 50) 4) '(50) > (list-tail '(10 20 30 40 50) 5) '()
Racket does not provide a similar
list-head
function, but I have provided it in theps6-postfix-starter.rkt
file. It works this way:> (list-head '(10 20 30 40 50) 0) '() > (list-head '(10 20 30 40 50) 1) '(10) > (list-head '(10 20 30 40 50) 2) '(10 20) > (list-head '(10 20 30 40 50) 3) '(10 20 30) > (list-head '(10 20 30 40 50) 4) '(10 20 30 40) > (list-head '(10 20 30 40 50) 5) '(10 20 30 40 50)
-
Test your
rot
implementation using the above test cases. Yourrot
implementation should give appropriate error messages in various situations, like those in the test cases.
-
3. Metaprogramming Derivations (14 points)
This problem involves the notion of proof-like derivations that show how programming languages are implemented in terms of interpreters and compilers. These derivations are covered in Lec 28 on Wed Mar 24.
Deriving Implementations
As presented in slides 14 to 19 of the Lec 28 Metaprogramming slides there are two basic reasoning rules involving interpreters, translators (a.k.a. compilers), and program implementation:
The Interpreter Rule (I)
The Translator Rule (T)
In practice, we often elide the word “machine” for interpreter machines and translator machines. For example, we will refer to an “L interpreter machine” as an “L interpreter”, and an “S-to-T translator machine” as an “S-to-T translator” or “S-to-T compiler”. We will also often elide the word “program”; e.g., we will refer to a “P-in-L program” as “P-in-L”. So an “L interpreter” means an “L interpreter machine” while a “S-in-L interpreter” means an “S-in-L interpreter program”. Similar remarks hold for translators (also called compilers): an “S-to-T translator” means an “S-to-T translator machine”, while a “S-to-T-in-L translator” means an “S-to-T-in-L translator program”.
Example
The Metaprogramming slides consider an example with the following elements:
- a 251-web-page-in-HTML program (i.e., a 251 web page written in HTML)
- an HTML-interpreter-in-C program (i.e., a web browser written in C)
- a C-to-x86-compiler-in-x86 program (i.e., a C-to-x86 compiler written in x86)
- a x86 computer (i.e., an x86 interpreter machine)
They can be used to build a 251 web page display machine as follows:
Feel free to abbreviate by dropping the words “program” and “machine”. “… in …” denotes a program, while “… compiler” and “… translator” denote translator machines and “… interpreter” and “… computer” denote interpreter machines.
Writing Derivations
Rather that displaying derivations using horizontal lines, it is recommended that you use nested bullets to indicate the same structure as the inference rules (though drawn “upside down”). For example, the derivation above could also be written:
- 251-web-page-in-HTML program
- HTML interpreter machine (I)
- HTML-interpreter-in-x86 program (T)
- HTML-interpreter-in-C program
- C-to-x86 compiler machine (I)
- C-to-x86-compiler-in-x86 program
- x86 computer
- x86 computer
Your Tasks
- (7 points) Suppose you are given the following:
- A P-in-Java program
- a Java-to-C-compiler-in-Racket program (i.e., a Java-to-C compiler written in Racket);
- a Racket-interpreter-in-x86 program (i.e., a Racket interpreter written in x86 machine code);
- a C-to-x86-compiler-in-x86 program (i.e., a C-to-x86 compiler written in x86 code);
- an x86 computer (i.e., an x86 interpreter machine).
Using the two reasoning rules above, construct a “proof” that demonstrates how to execute the P-in-Java program on the computer. That is, show how to construct a P machine.
- (7 points) Suppose you are given:
- a GraphViz-in-C program
- a C-to-LLVM-compiler-in-x86 program
- an LLVM-to-JavaScript-compiler-in-x86 program
- a JavaScript-interpreter-in-ARM program
- an x86 computer (i.e., an x86 interpreter machine, such as your laptop)
- an ARM computer (i.e., an ARM interpreter machine, such as a smartphone)
Show how to construct a GraphViz machine.
Notes:
-
In the above two parts, it is not possible to derive a “C interpreter”, a “Java interpreter”, or an “LLVM interpreter”, so if you need or use such a part in your derivation, you’ve done something wrong.
-
As shown in the above examples, you should use an (I) or (T) to indicate a conclusion that results from the Interpreter or Translator rule, respectively. Double check that the conclusion is actually justified by the rule from two nested bullets below the conclusion.
4. Interpreting Condex, an Extension to Intex (15 points)
Starting this Problem
This problem involves files in the ~/cs251/sml/ps6
directory in your csenv
Virtual Machine.
To create this directory, execute these Linux commands in a shell on your csenv appliance:
cd ~/cs251/sml
git pull origin master
If you encounter any issues when executing these commands, you can troubleshoot by following these git instructions. If these instructions don’t help, please contact Lyn.
Begin this problem by renaming the starter file Condex.sml
in
~/cs251/sml/ps6
to yourAccountName-Condex.sml</code.</span>
It is important to rename this files at the very beginning, because renaming the files protects you from git issues if I need to modify the starter files.
REMINDER: ALWAYS MAKE A BACKUP OF YOUR .sml FILES AT THE END OF EACH DAY TO AVOID POTENTIAL LOSS OF WORK.
The Condex Language
Both Problems 4 and 6 involve the Condex language, which is an extension to Intex. As in Intex, all values in Condex are integers. But Condex includes the follow extra features:
-
Condex includes the relational operators
<
,=
, and>
. Since all Condex values are integers, these operators return 1 in the true case and 0 in the false case. This is similar to relational operators in PostFix. -
Condex includes the conditional expression
(ifnz E_test E_then E_else)
whereifnz
stands forif-non-zero
and is pronouncediffenz
. The meaning of the ifnz expression is as follows:First, the
E_test
expression is evaluated to an integer valueV_test
.-
If
V_test
is non-zero, then theE_then
expression is evaluated, and its value is returned as the value of theifnz
expression; -
If
V_test
is zero, then theE_else
expression is evaluated, and its value is returned as the value of theifnz
expression; -
If an error is encountered during any of these evaluations, evaluating the
ifnz
expression gives the same error.
As with Racket’s
if
expresssion, Condex’sifnz
evaluates at most two subexpressions.E_test
is always evaluated, and if that evaluation does not give an error, exactly one ofE_then
orE_else
is evaluated. -
-
Condex includes the short-circuit logical operations
(and E_1 E2)
and(or E_1 E2)
, which are similar to their Racket counterparts except they treat0
as the only falsey value and all other integers as truthy. In particular:-
If
E_1
evaluates to0
, then:(and E_1 E2)
evaluates to0
without evaluatingE_2
(or E_1 E2)
evaluates to the value ofE_2
-
If
E_1
evaluates to a nonzero valueV_1
, then:(and E_1 E2)
evaluates to the value ofE_2
(or E_1 E2)
evaluates the value ofV_1
without evaluatingE_2
(whereE_1
is only evaluated toV_1
only once)
-
Condex programs are like Intex programs, except they begin with the
symbol condex
rather than intex
. Your ~/cs251/sml/ps6
directory
contains several sample Condex programs (with extension .cdx
) that
we will now discuss:
-
The program in the file
abs.cdx
calculates the absolute value of its single argument:(condex 1 (ifnz (< $1 0) (- 0 $1) $1))
-
The program
simpleIf.cdx
usesifnz
in a simple conditional test:(condex 2 (ifnz (< $1 $2) (+ $1 $2) (* $1 $2)))
For example, running this program on the arguments 3 and 4 should return 7, but calling it on 4 and 3 should return 12.
-
The program
min3.cdx
returns the smallest of its three arguments:(condex 3 (ifnz (< $1 $2) (ifnz (< $1 $3) $1 $3) (ifnz (< $2 $3) $2 $3)))
For example, running this program on three arguments 4, 5, and 6 (in any order) should return 4.
-
The program
sumRelations.cdx
doesn’t contain any occurrences ofifnz
, but uses the fact that relational operations return 0 or 1 to sum the value of three such operations:(condex 4 (+ (< $1 $2) (+ (= $1 $3) (> $1 $4))))
For example, running this program on the four arguments 5, 6, 5, 4 returns 3, because
(< 5 6)
,(= 5 5)
, and(> 5 4)
all return 1. But running it on the four arguments 5, 3, 7, 4 returns 1, because(< 5 3)
and(= 5 7)
return 0 and only(> 5 4)
returns 1. -
Similar to Racket’s
if
,ifnz
should evaluate at most one of the two branch expressions. In order to test this, we need an example in which branch expressions can generate errors in some cases. For this purpose, thedivRemTest.cdx
program below has one branch with a division and another with a remainder that can generate errors when their second operands are zero:(condex 3 (ifnz (> $1 $2) (/ $2 $3) (% $2 $1)))
-
running this program on the arguments 0, -8, and 2 should return -4 from the then branch (but not execute the else branch expression
(% -8 0)
, which would encounter a remainder-by-zero error). -
running this program on the arguments 5, 8, and 0 should return 3 from the else branch (but not execute the then branch expression
(/ 8 0)
, which would encounter a divide-by-zero error).
However, running it on the arguments 3, 2, 0 would encounter a divide-by-zero error in the then branch, and the arguments 0, 4, 5 would enounter a remainder-by-zero error in the else branch.
This
divRemTest.cdx
program will be a particularly relevant example when we consider compiling Condex to PostFix in Problem 6. -
-
Problem 3a of this PS6, considers another Racket function,
g
, for translation into PostFix(define (g a b c) (- c (if (= 0 (remainder a 2)) (quotient b (- a c)) (* (+ b c) a))))
The Condex program prob3a-g.cdx expresses this function in Condex:
(condex 3 (- $3 (ifnz (= 0 (% $1 2)) (/ $2 (- $1 $3)) (* (+ $2 $3) $1))))
Some relevant test cases from Problem 3a are that the three arguments 10, 2, and 8 yield the result 7, and the argments 5, 3, 5 yield the result -40. (See the comments in the file for more test cases.)
-
The
nestedIfs.cdx
program is a complex program that usesifnz
expressions as the test, then, and else subexpressions of an outerifnz
expression.(condex 3 (ifnz (ifnz (< $1 $2) (< $1 $3) (> $1 $3)) (ifnz (= $2 $3) (* $2 $2) (% $2 $3)) (ifnz (> $2 $3) (/ $2 $3) (- $3 $2))))
See the comments of this file for several test cases.
-
The
andTest.cdx
andorTest.cdx
programs test the simple behavior of Condex’sand
andor
operations:(condex 2 (and $1 $2)) ; This is andTest.cdx (condex 2 (or $1 $2)) ; This is orTest.cdx
The following table shows the results of these programs on various inputs. Note that, as in Racket, values other than false (i.e., 0) and true (i.e., 1) can be returned by
and
andor
.$1 $2 result of andTest.cdx result of orTest.cdx 0 0 0 0 0 1 0 1 1 0 0 1 1 1 1 1 0 2 0 2 3 0 0 3 4 5 5 4 -
The
shortCircuitAndTest.cdx
andshortCircuitOrTest.cdx
programs test the short-circuit behavior of Condex’sand
andor
operations:(condex 2 (and $1 (/ 17 $2))) ; This is shortCircuitAndTest.cdx (condex 2 (or $1 (% 42 $2))); This is shortCircuitOrTest.cdx
The following tables shows the results of these programs on various inputs.
$1 $2 result of shortCircuitAndTest.cdx Note 0 2 0 0 0 0 Illustrates short-circuit behavior 4 5 5 Returns result that’s not 0 or 1 6 0 Error: Tried to divide 17 by 0 Triggers divide-by-zero error $1 $2 result of shortCircuitOrTest.cdx Note 0 9 6 Returns result that’s not 0 or 1 4 5 4 Returns result that’s not 0 or 1 7 0 5 Illustrates short-circuit behavior 0 0 Error: Tried to remainder 42 by 0 Triggers remainder-by-zero error
In the rest of this problem, your goal will be to flesh out the skeleton a Condex interpreter implemented in SML. The above programs will be used in test cases that your programs behave as expected.
SML Sum-of-product and S-expression Syntax for Condex
In the ps6
directory on your csenv appliance, your file
yourAccountName-Condex.sml
contains the definition of the
SML representation for Condex syntax. This syntax is
is very similar to that for the Intex interpreter
we studied in class (and which can be found in ~/cs251/sml/Intex.sml
).
Here are the datatypes and helper functions used to represent Condex program trees:
datatype pgm = Condex of int * exp
and exp = Int of int
| Arg of int
| ArithApp of arithop * exp * exp
| RelApp of relop * exp * exp (* New in Condex: relational operations *)
| And of exp * exp (* New in Condex: short-circuit conjunction *)
| Or of exp * exp (* New in Condex: short-circuit disjunction *)
| Ifnz of exp * exp * exp (* New in Condex: conditional with E_test, E_then, E_else *)
and arithop = Add | Sub | Mul | Div | Rem
and relop = Lt | Eq | Gt (* <, =, > *)
and arithopToFun Add = op+
| arithopToFun Mul = op*
| arithopToFun Sub = op-
| arithopToFun Div =
(fn(x,y) => if y = 0 then
raise EvalError ("Tried to divide " ^ (Int.toString x) ^ " by 0")
else
x div y)
| arithopToFun Rem =
(fn(x,y) => if y = 0 then
raise EvalError ("Tried to remainder " ^ (Int.toString x) ^ " by 0")
else
x mod y)
(* New in Condex interpreter *)
and relopToFun Lt = op<
| relopToFun Eq = (fn(a,b) => a = b)
| relopToFun Gt = op>
(* New in Condex interpreter *)
and boolToInt false = 0
| boolToInt true = 1
Here are some comparisons with Intex worth noting:
-
Programs are constructed via the
Condex
(notIntex
) constructor; -
Int
,Arg
,ArithApp
, andarithop
are unchanged fromIntex
, as are the `arithop -
The
RelApp
constructor and therelop
type are new in Condex; they represent relational operations (<
,=
,>
). -
The
Ifnz
constructor is new in Condex and represents theifnz
construct. -
The
And
andOr
constructors are new in Condex and represent theand
andor
constructs, respectively -
Condexs
arithopToFun
helper function is exactly the same as inIntex
. -
The new helper function
relopToFun
defines the meaning of therelop
operators. -
The new helper function
boolToInt
is also handy in this problem.
Additionally, yourAccountName-Condex.sml
contains functions for
parsing s-expression strings into, and unparsing s-expression strings from,
the sum-of-product notation. For example:
- val ifnzExp = stringToExp "(ifnz (< $1 6) (and (< $1 0) (> $2 10)) (or (= $1 10) (> $1 $2)))";
val ifnzExp =
Ifnz
(RelApp (Lt,Arg 1,Int 6),
And (RelApp (Lt,Arg 1,Int 0),RelApp (Gt,Arg 2,Int 10)),
Or (RelApp (Eq,Arg 1,Int 10),RelApp (Gt,Arg 1,Arg 2))) : exp
- expToString ifnzExp;
val it =
"(ifnz (< ($ 1) 6)\n (and (< ($ 1) 0) (> ($ 2) 10))\n (or (= ($ 1) 10) (> ($ 1) ($ 2)))\n )"
: string
- print(expToString ifnzExp);
(ifnz (< ($ 1) 6)
(and (< ($ 1) 0) (> ($ 2) 10))
(or (= ($ 1) 10) (> ($ 1) ($ 2)))
)val it = () : unit
- val simpleIfPgm = stringToPgm "(condex 2 (ifnz (< $1 $2) (+ $1 $2) (* $1 $2)))";
val simpleIfPgm =
Condex
(2,
Ifnz
(RelApp (Lt,Arg 1,Arg 2),ArithApp (Add,Arg 1,Arg 2),
ArithApp (Mul,Arg 1,Arg 2))) : pgm
- pgmToString simpleIfPgm;
val it = "(condex 2 (ifnz (< ($ 1) ($ 2)) (+ ($ 1) ($ 2)) ( * ($ 1) ($ 2))))"
: string
The above examples assume you have opened the Condex
structure via open Condex
,
and have also executed:
Control.Print.printDepth := 100
Control.Print.stringDepth := 1000;
Your Task
yourAccountName-Condex.sml
also contains the
skeleton of a Condex interpreter eval
function similar to the one defined for
Intex. All you need to do is to replace the stubs for the
RelApp
, And
, Or
and Ifnz
clauses of eval
so that they return the correct
integer result as specified by the semantics of Condex.
Notes:
-
You may define helper functions(s) if you want.
-
The easiest way to test your
eval
function is to evaluate the function calltestInterp()
, which will run the evaluator on numerous test cases involving the.cdx
program files described above.-
If you see a result like
"Passed all 61 test cases"
, it means youreval
function has passed all the test cases. (You can scroll up to see the details.) -
If you see a result that begins with something like
"Failed 4 of 61 test cases ..."
, you should scroll up searching for the string***ERROR
, which will give details on the test program and arguments causing the failure, along with the expected and actual results.
-
-
The easiest way to test your own examples is to launch a Condex Read/Eval/Print loop via
repl()
. For example:- repl(); condex> (< 3 4) 1 condex> (> 3 4) 0 condex> (ifnz (< 1 2) (+ 3 4) ( * 5 6)) 7 condex> (ifnz (> 1 2) (+ 3 4) ( * 5 6)) 30 condex> (and 4 5) 5 condex> (or 4 5) 4 condex> (or 7 (% 8 0)) 7 condex> (#args 8 5 9) condex> (ifnz (< $1 $2) (+ $2 $3) (- $2 $3)) ~4 condex> (#run (condex 3 (ifnz (< $1 $2) $2 $3)) 4 5 6) 5 condex> (#run (condex 3 (ifnz (< $1 $2) $2 $3)) 4 3 6) 6 condex> (#run min3.cdx 42 17 57) 17 condex> (#quit) Moriturus te saluto! val it = () : unit
- You can call
testInterp()
andrepl()
without theCondex.
qualifier, but if you want to access values from the Condex module, you will either need to (1) explicitly qualify them withCondex.
or (2) open theCondex
structure viaopen Condex
.
- You can call
5. Static Stack Depth in the Intex-to-PostFix Compiler (21 points)
The intexToPostFix
function discussed in Lec 30 automatically translates an Intex program to a
PostFix program that has the same meaning.
(See the fleshed out version of this function in these solution
slides
or the file /cs251/sml/intex/IntexToPostFix.sml
,)
The challenging part of this translation is determining the correct index to use for an Nget
command when translating an Arg
node in an Intext AST. The key to solving this technical problem
is to provide the expToCmds
helper function with a depth
argument that statically tracks the
number of values that have been pushed on the PostFix stack above the argument values at any point
of the computation.
The purpose of this problem is for you to study this idea in the context of a particular Intex program so that you better understand it.
-
(3 points) Draw the abstract syntax tree for the following Intex program, which we shall call
P_Intex
.(intex 4 (/ (+ $1 (* $2 (- $3 $4))) (% (- (* $4 $3) $2) $1)))
For drawing the Intex AST, use the conventions illustrated in slides 3 and 4 of the the Lec 29 Intex slides.
You may draw the tree by hand or use a drawing program, whichever is easier for you. This will be a wide diagram, so you’ll want to draw it in landscape mode rather than portrait mode. To save space, do not label the edges (i.e., no
numargs
,body
,rator
,rand1
,rand2
,index
, orvalue
labels should be used).Since you will be annotating this tree with extra information in parts b and d, leave enough room to add some annotations to the right of each node. Here is an example of an annotated node:
-
(4 points) In your AST for
P_Intex
from part a, annotate each expression node with thedepth
associated with that node when it is encountered by theexpToCmds
function within theintexToPostFix
function. As shown above, you should write depth:d to the right of each expression node, where d is the depth of the node. -
(6 points) Write down the PostFix program that is the result of translating the Intex program
P_Intex
from part a using theintexToPostFix
function. We shall call this programP_PostFix
. (You should do this translation by hand, not by running the function in SML.) Important: You translated PostFix code must be exactly the code that would be generated by theintexToPostFix
function, not just a PostFix program that happens to have the same behavior. Why? Because the purpose of this problem is to understand how theintexToPostFix
function (and, in particular, itsdepth
argument) work.
Use comments to explain the commands in the translated program. Here is an example of commenting
the PostFix code that is generated by translating the Intex program example
(intex 4 (* (- $1 $2) (/ $3 $4)))
in Lec 30.
(postfix 4 ; stack starts with ($1 $2 $3 $4)
1 nget ; stack now ($1 $1 $2 $3 $4)
3 nget ; stack now ($2 $1 $1 $2 $3 $4)
; index 3 needed to copy $2 because extra $1 on stack
sub ; stack now ($1-$2 $1 $2 $3 $4)
4 nget ; stack now ($3 $1-$2 $1 $2 $3 $4)
; index 4 needed to copy $3 because extra val on stack
6 nget ; stack now ($4 $3 $1-$2 $1 $2 $3 $4)
; index 6 needed to copy $4 because 2 extra vals on stack
div ; stack now ($3/$4 $1-$2 $1 $2 $3 $4)
mul ; stack now (($1*$2)*($3/$4) $1 $2 $3 $4)
-
(8 points) Consider running
P_Intex
on the four arguments[4, 9, 8, 2]
. In your AST forP_Intex
from part a, annotate each expression node with the list of integers that corresponds to the PostFix stack that will be encountered just before the last command translated for that node byexpToCmds
is executed in the PostFix programP_PostFix
. As shown above, you should write the stack as an SML-style list to the right of the node below the depth.For example, below is the result of annotating the Intex AST for
(intex 4 (* (- $1 $2) (/ $3 $4)))
with depth and inital and final stack information as described below.Some things to observe:
-
Only Intex expression nodes (
ArithApp
andArg
nodes in this example) have annotations. Thearithop
nodesSub
Div
, andMul
are not annotated, nor are theArg
index values or the top-levelIntex
program node. -
Every expression node has three annotations:
-
The depth of the node in the Intex–to-PostFix compilation process. This is the sum of the number of right operands of
Arithop
nodes in which the expression appears. -
The initial stack (is) right before executing the PostFix commands compiled for the expression in an example where the program is run on the four arguments [15, 8, 30, 6]. These annotations correspond to visiting the expression nodes in a depth-first left-to-right pre-order walk over the tree. For the sample tree, this order is: (1) the
ArithApp
node withMul
, (2) theArithApp
node withSub
, (3) theArg 1
node, (4) theArg 2
node, (5) theArithApp
node withDiv
, (6)) theArg 3
node, (7) theArg 4
node, -
The final stack (fs) right after executing all of the PostFix commands compiled for the expression in an example where the program is run on the four arguments [15, 8, 30, 6]. These annotations correspond to visiting the expression nodes in a depth-first left-to-right post-order walk over the tree. For the sample tree, this order is: (1) the
Arg 1
node, (2) theArg 2
node, (3) theArithApp
node withSub
, (4)) theArg 3
node, (5) theArg 4
node, (2) theArithApp
node withDiv
, (5) theArithApp
node withMul
-
-
At each
Arg
node in the program, the corresponding PostFixnget
stack index of that argument in the stack annotating the node is theArg
index value plus thedepth
annotating the node. In particular:-
The
Arg
node with index 1 has depth 0, and so translates to1 nget
. -
The
Arg
node with index 2 has depth 1, and so translates to3 nget
. -
The
Arg
node with index 3 has depth 1, and so translates to4 nget
. -
The
Arg
node with index 4 has depth 2, and so translates to6 nget
.
-
-
It is important to observe the key invariant guaranteed by the Intex-to-PostFix compiler at every expression node in the Intex AST: The final stack is the result of pushing exactly one value on the initial stack: the result of evaluating the Intex expression.
-
6. Compiling Condex to PostFix (22 points)
In this problem you will extend the IntexToPostFix
compiler from Lec 30
to compile the Condex language from Problem 4 to PostFix.
Starting this Problem
-
Rename the starter file
CondexToPostFix.sml
in~/cs251/sml/ps6
toyourAccountName-CondexToPostFix.sml
. -
In your renamed file
yourAccountName-CondexToPostFix.sml
, edit this line near the top of the fileuse "Condex.sml"; (* semi-colon required! *)
to be
use "yourAccountName-Condex.sml"; (* semi-colon required! *)
It is important to rename this files at the very beginning, because renaming the files protects you from git issues if I need to modify the starter files.
Problem 6a: Translating an Example Condex Program to PostFix By Hand (7 points)
The following is the divRemTest.cdx
program discussed in Problem 4.
(condex 3 (ifnz (> $1 $2) (/ $2 $3) (% $2 $1)))
It is imperative that you understand how this Condex program should be
translated into PostFix before you attempt to flesh out the
RelApp
and Ifnz
cases of expToCmds
in the next subproblem.
For this part, write the exact PostFix program that should be
generated by the the condexToPosFix
function when given this
Condex program. You should annotate the commands with explanatory
comments as described in Problem 5c.
You should also show the sequence of all intermediate stacks for executing your PostFix program on the argument lists from the following examples:
condex> (#run divRemTest.cdx 6 17 0)
5
condex> (#run divRemTest.cdx 0 -18 3)
~6
Of course, your PostFix program should give the same result as the Condex interpreter.
Problem 6b: Fleshing out the CondexToPostFix Skeleton (15 points)
You should complete Problems 4 and 6b before starting this subproblem.
The file yourAccountName-CondexToPostFix.sml
contains the skeleton of a Condex to Postfix compiler that is
very similar to the one studied in Lec 30 for Intex.
The core function in the compiler is expToCmds
, which has the type
val expToCmds: Condex.exp -> int -> PostFix.cmd list *)
It takes a Condex expression and a depth argument that tracks the
number of values that have been pushed above the program arguments
on the stack when expToCmds
is called (this is used as an offset
for the index when compiling a program argument). expToCmds
returns
a list of PostFix commands.
The Int
, Arg
, and ArithApp
cases of expToCmds
are already correctly implemented.
What you need to do is flesh out the cases for RelApp
, Ifnz
, And
, and Or
.
Notes:
-
Flesh out the
RelApp
andIfnz
clauses ofexpToCmds
so that they implement a generalized version of your strategy from Problem 6a. -
Do not start to flesh out clauses for Condex’s
And
andOr
constructs until after you have succesfully implemented theIfnz
construct, since understanding the compilation ofIfnz
is essential to understaning the compilation ofAnd
andOr
. -
You will need to develop a strategy for correctly compiling Condex’s
And
andOr
expressions to PostFix:-
It is rather challenging to translate these constructs into PostFix in a way that preserves their short-circuit behavior, so you will get most of the credit for translating these constructs if they have the right behavior except for short-circuiting.
-
When compiling Condex’s
Or
construct to PostFix, try to avoid duplicating the calculation of the first operand expression. (You will still receive substantial credit if you duplicate this calculation.) -
Hints for handling short-circuting correctly:
-
Review how Racket desugars its
and
andor
constructs intoif
expressions, and have your CondexToPostFix translation do something similar. -
Translate the test programs
shortCircuitAndTest.cdx
andshortCircuitOrTest.cdx
by hand into PostFix to better understand how short-circuting should be handled in the translation.
-
-
-
You should explicitly use qualified names beginning with
Condex.
andPostFix.
to distinguish function values from these two structures. -
To test your
expToCmds
, evaluate the function calltestComp()
. This will do the following for the Condex.cdx
programs in theps6
directory:-
Compile the Condex program into a PostFix program using the
condexToPostFix
function, which depends on your changes to theexpToCmds
function. -
For various arguments lists, run both the original Condex program (using your Condex interpreter from Problem 4) and the compiled PostFix program (using the SML PostFix interpreter we studied in class) on the argument lists and test that the outputs are the same.
-
If you see a result like
"Passed all 61 test cases"
, it means your compiler has passed all the test cases. (You can scroll up to see the details.) -
If you see a result that begins with something like
"Failed 4 of 37 test cases ..."
, you should scroll up searching for the string***ERROR
, which will give details on the test program and arguments causing the failure, along with the different results from the Condex and PostFix interpreters. The testing output also displays the PostFix program that results from the compilation process for each Condex program. You should carefully study the PostFix programs output from the compiler as part of debugging.
-
Extra Credit 1: PostFix Iterations (10 points)
This problem is optional. You should only attempt it after completing all the other problems.
-
(3 points) Study and test the following
mystery
PostFix program of one argument, which is provided near the end ofps6-postfix-starter.rkt
. Describe the function that it calculates on its one argument, and give a brief, high-level description of how it calculates that function.;; What function does this compute? (define mystery '(postfix 1 1 (swap dup 0 eq (swap) (swap 2 nget mul swap 1 sub swap 3 vget exec) sel exec) 3 rot 3 vget exec))
Note:
vget
is a command that is likenget
, except that it can return any value (including an executable sequence), not just an integer. It has been included in yourps6-postfix-starter.rkt
file for use in this extra credit problem. -
(7 points) Write a PostFix program that takes a single argument (assumed to be a nonnegative integer n) and iteratively calculates the nth Fibonacci number. Full credit will be given only if you use constant stack space; i.e., calculating the Fibonacci of a larger number should not end up with a final stack that has more elements than for a smaller number.
Extra Credit 2: Bootstrapping (14 points)
This problem is usually a second part of Problem 3 Metaprogramming Derivations on this Problem Set 6. However, because it is challenging, I have decided to make this part an optional extra credit problem rather than a required part.
In his Turing Award lecture-turned-short-paper Reflections on Trusting Trust, Ken Thompson describes how a compiler can harbor a so-called “Trojan horse” that can make compiled programs insecure. Read this paper carefully and do the following tasks to demonstrate your understanding of the paper:
- (6 points) Stage II of the paper describes how a C compiler can be “taught” to recognize
'\v'
as the vertical tab character. Using the same kinds of components and processes used in Problem 1, we can summarize the content of Stage II by carefully listing the components involved and describing (by constructing a proof) how a C compiler that “understands” the code in Figure 2.2 can be generated. (Note that the labels for Figures 2.1 and 2.2 are accidentally swapped in the paper.) In Stage II, two languages are considered:
-
the usual C programming language, which does not include the special character
'\v'
; -
C+\v = an extension to C in which
'\v'
is treated as the vertical tab character (which has ASCII value 11).
Suppose we are given the following:
-
a C-to-binary compiler machine (Here, “binary” is the machine code for some machine. This compiler is just a “black box”; we don’t care where it comes from);
-
a C+\v-to-binary-compiler-in-C program (Figure 2.3 in Thompson’s paper);
-
a C+\v-to-binary-compiler-in-C+\v program (what should be Figure 2.2 in Thompson’s paper);
-
a machine that executes the given type of binary code (i.e., a binary interpreter machine)
Construct a metalanguage derivation proof showing how to use the C+\v-to-binary-compiler-in-C+\v source code to create a C+\v-to-binary-compiler-in-binary program.
- (8 points) Stage III of the paper describes how to generate a compiler binary harboring a “Trojan horse”. Using the same kinds of components and processes used in Problem 1, we can summarize the content of Stage III by carefully listing the components involved and describing (by constructing a proof) how the Trojaned compiler can be generated. In particular, assume we have the parts for this stage:
-
a C-to-binary-compiler machine (again, just a “black box” we are given);
-
a C-to-binary-compiler-in-C program without Trojan Horses (Figure 3.1 in Thompson’s paper);
-
a C-to-binary-compilerTH-in-C program with two Trojan Horses (Figure 3.3 in Thompson’s paper);
-
a login-in-C program with no Trojan Horse;
-
a binary-based computer (i.e., a binary interpreter machine)
The subscript TH indicates that a program contains a Trojan Horse. A C-to-binary compilerTH has the “feature” that it can insert Trojan Horses when compiling source code that is an untrojaned login program or an untrojaned compiler program. That is, if P is a login or compiler program, it is as if there is a new rule:
The Trojan Horse Rule (TH)
Using these parts along with the two usual rules (I) and (T) and the new rule (TH), show how to derive a Trojaned login program loginTH-in-binary that is the result of compiling the uninfected login-in-C program with a C compiler that is itself the result of compiling the uninfected C-to-binary-compiler-in-C program. The interesting thing about this derivation is that loginTH-in-binary is infected with a Trojan horse even though it is compiled using a C compiler whose source code (C-to-binary-compiler-in-C program) contains no code for inserting Trojan horses!
Notes:
-
Although only 3 pages long, the Thompson paper is very dense and challenging. Don’t be surprised if it requires multiple readings to ``get’’ the ideas, which are profound.
-
Double-check your derivations to make sure that they make sense and actually describe what’s being said In Thompson’s Paper.
Extra Credit Problem 3: Self-printing Program (12 points)
In his Turing-lecture-turned-short-paper Reflections on Trusting Trust, Ken Thompson notes that it is an interesting puzzle to write a program in a language whose effect is to print out a copy of the program. We will call this a self-printing program. Write a self-printing program in any general-purpose programming language you choose.
Notes
-
Your program should be self-contained — it should not read anything from the file system.
-
All output must be characters printed by a command like
print
. In Racket, for instance, the program/result cant just be a quoted Racket list. -
IMPORTANT: There are lots of answers on the Internet, but you will get zero points if you just copy one of those. (I can tell!) Write this program from scratch completely on your own, without looking anything up online.