CS 251: Metaprogramming

Due: 11:00pm Thursday, 1 October
Starter files:
- fork wellesleycs251 / cs251-meta and add bpw as admin
- Text answers in answers.txt or answers.pdf (hg add the latter)
- Programming answers in datacrunch.rkt and optimization.rkt
Submission:
- Complete the honor statement for the invidual problem. (see below)
- Estimate the time you spent on each section in time.txt.
- Commit and push your completed work to your Bitbucket fork.
Relevant reading:
- Racket
- Thompson, Reflections on Trusting Trust, Turing Award lecture reprinted in CACM, August 1984.
- Dean and Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, OSDI 2004.
Tools:
- Mercurial
- DrRacket
Collaboration:
- Problems 1, 2, 3, and 5 fall under the standard collaboration policy. You may discuss ideas with each other but your writing and code should be your own.
- Problem 4 is an individual problem: You may not collaborate with other students in any way on Problem 4. Feel free to speak with your instructor or tutors about the problem, but otherwise, treat it as you would an exam. When you have completed the problem, write the honor statement in its file to certify that you have followed this policy.

0. Lexical Scope and Environments (15 points)
1. Deriving Implementations (10 points)
2. Bootstrapping (20 points)
3. Functional Idioms in the Large (10 points)
4. Higher-Order Data-Crunching [Individual Problem] (30 points)
5. Program Optimization and Partial Evaluation (15 points)
Optional

0. Lexical Scope and Environments (15 points)

Draw lexical scope contours on the code and draw the environments produced by evaluating the following sequence of Racket bindings as we did in class.

How to submit this part:

Submit a paper copy in class on Friday October 2; OR
Add and commit PDF of your drawing in your repository.

(define f2
    (lambda (g2)
        (let ([x2 3])
            (g2 2))))
(define x2 4)
(define h2
    (lambda (y2)
        (+ x2 y2)))
(define z2 (f2 h2))

1. Deriving Implementations (10 points)

As we discussed briefly in lecture this week, there are two basic reasoning rules involving interpreters, translators (a.k.a. compilers), and program implementation:

The Interpreter Rule (I)

$\frac{P\text{-in-}L\text{ program} \qquad L\text{ interpreter machine}}{P\text{ machine}} (I)$

The Translator Rule (T)

$\frac{P\text{-in-}L_1\text{ program} \qquad L_1\text{-to-}L_2\text{ translator machine}}{P\text{-in-}L_2\text{ program}} (T)$

Example

For example, we briefly considered the following elements in class (broken list on slides is now fixed):

a 251-web-page-in-HTML program (i.e., a 251 web page written in HTML)
an HTML-interpreter-in-C program (i.e., a web browser written in C)
a C-to-x86-compiler-in-x86 program (i.e., a C-to-x86 compiler written in x86)
a x86 computer (i.e., an x86 interpreter machine)

They can be used to build a 251 web page display machine as follows:

$\cfrac{ \text{251-web-page-in-HTML program} \qquad \cfrac{ \cfrac{ \text{HTML-interpreter-in-C program} \qquad \cfrac{ \text{C-to-x86-compiler-in-x86 program} \qquad \text{x86 computer} }{ \text{C-to-x86 compiler machine} } (I) }{ \text{HTML-interpreter-in-x86 program} } (T) \qquad \text{x86 computer} }{ \text{HTML interpreter machine} } (I) }{ \text{251 web page machine} } (I)$

Feel free to abbreviate by dropping the words “program” and “machine”. “… in …” denotes a program, while “… compiler” and “… translator” denote translator machines and “… interpreter” and “… computer” denote interpreter machines.

$\cfrac{ \text{251 web page in HTML} \qquad \cfrac{ \cfrac{ \text{HTML interpreter in C} \qquad \cfrac{ \text{C-to-x86 compiler in x86} \qquad \text{x86 computer} }{ \text{C-to-x86 compiler} } (I) }{ \text{HTML interpreter in x86} } (T) \qquad \text{x86 computer} }{ \text{HTML interpreter} } (I) }{ \text{251 web page machine} } (I)$

Writing Derivations

$\LaTeX$ produces nice derivations with these rules. The mathpartir package is nice, but requires downloading and including an extra file. If you \usepackage{amsmath}, the \cfrac{...}{...} command will also do the job in math mode.

If you do not speak $\LaTeX$ , feel free to use nested bullets to indicate the same structure as the inference rules (though drawn “upside down”). For example, the derivation above could also be written:

251 web page machine (I)
- 251-web-page-in-HTML program
- HTML interpreter machine (I)
  - HTML-interpreter-in-x86 program (T)
    - HTML-interpreter-in-C program
    - C-to-x86 compiler machine (I)
      - C-to-x86-compiler-in-x86 program
      - x86 computer
  - x86 computer

Questions

John McCarthy was the principal designer of the Lisp language. Steve Russell wrote the first implementations on the IBM 704 computer and made some key observations that simplified the implementation of the language. Time-travel back a few decades and suppose you are given the following:
- a Lisp-interpreter-in-Lisp program (i.e., the eval function);
- a Lisp-to-IBM-704-compiler-in-Steve-Russell program (i.e., a Lisp-to-IBM-704 compiler written as instructions for Steve Russell);
- Steve Russell (i.e., a Steve Russell interpreter machine);
- an IBM 704 computer (i.e. an IBM 704 interpreter machine).
Show how to generate a Lisp-interpreter-in-IBM-704 program. Use the rules to show its derivation. (The correct derivation accurately describes how the first Lisp implementation was created.)
After they helped to produce the Lisp-interpreter-in-IBM-704 program, but before you finished your first Lisp program, John McCarthy moved to Stanford and Steve Russell moved on to video game development. They took the Lisp-interpreter-in-Lisp program and the Steve Russell interpreter machine with them. Fortunately, the IBM 704 computer was too big to move and you kept a copy of the Lisp-interpreter-in-IBM-704 program. Can you still run Lisp programs? How can you build a Lisp machine?¹
Suppose you are given:
- a GraphViz-in-C program
- a C-to-LLVM-compiler-in-x86 program
- an LLVM-to-JavaScript-compiler-in-x86 program
- a JavaScript-interpreter-in-ARM program
- an x86 computer (i.e., an x86 interpreter machine, such as your laptop)
- an ARM computer (i.e., an ARM interpreter machine, such as a smartphone)
Show how to construct a GraphViz machine.

2. Bootstrapping (20 points)

In his Turing Award lecture-turned-short-paper Reflections on Trusting Trust, Ken Thompson describes how a compiler can harbor a so-called “Trojan horse” that can make compiled programs insecure. Read this paper carefully and do the following tasks to demonstrate your understanding of the paper:

Stage II of the paper describes how a C compiler can be “taught” to recognize '\v' as the vertical tab character. Using the same kinds of components and processes used in Problem 1, we can summarize the content of Stage II by carefully listing the components involved and describing (by constructing a proof) how a C compiler that “understands” the code in Figure 2.2 can be generated. (Note that the labels for Figures 2.1 and 2.2 are accidentally swapped in the paper.) In Stage II, two languages are considered:
- the usual C programming language
- C+\v – an extension to C in which '\v' is treated as the vertical tab character (which has ASCII value 11).
Suppose we are given the following:
- a C-to-binary compiler (Here, “binary” is the machine code for some machine. This compiler is just a “black box”; we don’t care where it comes from);
- a C+\v-to-binary-compiler-in-C (Figure 2.3 in Thompson’s paper);
- a C+\v-to-binary-compiler-in-C+\v (what should be Figure 2.2 in Thompson’s paper);
- a machine that executes the given type of binary code.
[Update 29 September 2015: fix formatting] Construct a proof showing how to use the C+\v-to-binary-compiler-in-C+\v source code to create a C+\v-to-binary-compiler-in-binary program.
Stage III of the paper describes how to generate a compiler binary harboring a “Trojan horse”. Using the same kinds of components and processes used in Problem 1, we can summarize the content of Stage III by carefully listing the components involved and describing (by constructing a proof) how the Trojaned compiler can be generated. In particular, assume we have the parts for this stage:
- a C-to-binary-compiler (again, just a “black box” we are given);
- a C-to-binary-compiler-in-C without Trojan Horses (Figure 3.1 in Thompson’s paper);
- a C-to-binary-compiler_TH-in-C with two Trojan Horses (Figure 3.3 in Thompson’s paper);
- a login-in-C program with no Trojan Horse;
- a binary-based computer;
The subscript _TH indicates that a program contains a Trojan Horse. A C-to-binary compiler_TH has the “feature” that it can insert Trojan Horses when compiling source code that is an untrojaned login program or an untrojaned compiler program. That is, if P is a login or compiler program, it is as if there is a new rule:

The Trojan Horse Rule (TH)

Using these parts along with the two usual rules (I) and (T) and the new rule (TH), show how to derive a Trojaned login program login_TH-in-binary that is the result of compiling login-in-C. with a C compiler that is the result of compiling C-to-binary-compiler-in-C. The interesting thing about this derivation is that login_TH-in-binary contains a Trojan horse even though it is compiled using a C compiler whose source code (C-to-binary-compiler-in-C) contains no code for inserting Trojan horses!

3. Functional Idioms in the Large (10 points)

Read MapReduce: Simplified Data Processing on Large Clusters, by Dean and Ghemawat. The MapReduce system is a good example of how an idiom developed with a particular language or style of language can be applied even far outside that language. Consider that (a) this paper was published in 2004, when the system had already been in use for some time, and (b) the scale of Google’s workloads has probably increased exponentially since.

For the following questions, write a paragraph or two to answer.

The map and reduce functions used in MapReduce have names that are similar to familiar higher-order Racket functions. (fold is also sometimes called reduce.) Discuss similarities and differences between MapReduce’s programming model and the programming idioms we have used with familiar higher-order functions in Racket. What familiar Racket functions are similar to the primitives of the MapReduce framework? Is there an exact match?
Our in-class discussion considered pros and cons of immutability vs. mutable data. How do Google-scale design constraints affect the (im)mutability discussion? Discuss how this choice is particularly important for at least 2 major design problems that MapReduce aims to solve.

4. Higher-Order Data-Crunching [Individual Problem] (30 points)

This problem is an invidivual problem. You may not collaborate with others in any way to prepare your solution. When you are done, add the following statement to your datacrunch.rkt file, filling the blank with your name:

I, ____, certify that the solutions in this file are solely my own work and that I have not discussed this problem with anyone other than my instructor or tutor.

Please do not use DrRacket’s “box comments” in your code. They interact poorly with our grading infrastructure.

Write your answers to this part in datacrunch.rkt.

Functions in this part will use a corpus of documents. A corpus (for the purposes of this assignment) is a list of documents, where each document is a pair (cons cell) of a name and the contents (i.e., list of words of the document) of the document. Names and words may be any simple values, such as symbols, strings, or numbers.

A corpus may also be considered to be:

an association list from document name to document contents; or
a list of lists where the first element of each list is the document name and the remaining elements are the document words.

Example (note I have chosen notation carefully, but DrRacket may display this structure differently – consider why…):

(define corpus
  '((hw  . (hello world))
    (hw2 . (hi world))
    (wc  . (world champion of the hi fi world))))

Evaluate corpus in DrRacket. How is it displayed? Why does DrRacket display it differently than the above? Why are both displays accurate and equivalent? Recall that '(A . B) denotes a cons cell holding 'A in the car and 'B in the cdr.
Write a function count that takes a corpus of docs and a word and returns the total number of occurrences of word in the corpus. Use higher-order functions to your advantage. Minimize the number of intermediate data structures (lists / cons cells) that your solution produces. An ideal solution never creates a fresh cons cell.
```
 > (count corpus 'hi)
 2
 > (count corpus 'world)
 4
 > (count corpus 'hello)
 1
 > (count corpus 'hooligans)
 0
 > (count null 'hooligans)
 0
 > (count corpus 'hw)
 0
```
Write a function grep that takes a corpus of docs and a word and returns an association list mapping document name to the number of occurrences of word in that document, ONLY for documents in docs with at least 1 occurrence of word. Use higher-order functions to your advantage. Minimize the number of intermediate data structures (lists / cons cells) that your solution produces that are not part of the final result. Order of the result list does not matter.
```
 > (grep corpus 'hi)
 '((wc . 1)
   (hw2 . 1))
 > (grep corpus 'world)
 '((wc . 2)
   (hw2 . 1)
   (hw . 1))
 > (grep corpus 'hooligans)
 '()
 > (grep null 'world)
 '()
```
For the purposes of this assignment, a “pile” is either:
- an non-null atom (a number, boolean, symbol, etc.); or
- a list (empty or non-empty) of piles.
Piles never contain cons cells that are not lists. Piles may be arbitrarily deep. Piles typically contain only one type of atom. The following are piles:
```
 (define pile1 (list 1))
 (define pile2 2)
 (define pile3 null)
 (define pile4 (list 2 3 4))
 (define pile5 (list (list 1 2 3 4) (list 2 (list 4 5) (list 9 2 1) 3) (list 2 8)))
```
A pile may be “flattened” into a single list with elements in their original pile order by applying the built in function:
```
 > (define flat-pile5 (flatten pile5)) ; this yields:
 '(1 2 3 4 2 4 5 9 2 1 3 2 8)
```
Write a function fold-pile that takes:
- a folding function f;
- an initial value init; and
- a pile pile;
and returns a result equivalent to the result of (foldl f init (flatten pile)). Do not call flatten or any equivalent function. Minimize the number of intermediate data structures (lists / cons cells) that your solution produces that are not part of the final result.
```
 > (fold-pile + 0 pile1)
 1
 > (fold-pile + 0 pile2)
 2
 > (fold-pile + 0 pile3)
 0
 > (fold-pile + 0 pile4)
 9
 > (fold-pile + 0 pile5)
 46
```
The sample solution adds 3 lines below the define line. Only one of those lines is longer than the define line.

5. Program Optimization and Partial Evaluation (15 points)

This problem explores a form of metaprogramming with programs as data in Racket.

Please do not use DrRacket’s “box comments” in your code. They interact poorly with our grading infrastructure.

When translating a program P in a source language to an equivalent program in a target language, a compiler may choose any equivalent program. Most compilers try to choose a more efficient equivalent program, applying optimizations to code. The amount of optimization a compiler can do is limited by the halting problem (its job is to translate the program, not run it), but it can still reason in limited ways about what the program will do when run.

Constant folding is a classic compiler optimization. Constant folding recognizes when simple operations (e.g., arithmetic) are applied to values that will clearly be constant across all possible runs of the program. For example, constant folding can replace the Racket expression (+ 2 3) with the clearly equivalent (and more efficient) expression 5. Importantly, it can replace this expression if this is the whole program or if it is just a subexpression appearing in a larger program. For example, it can also replace (cons (fib (+ 4 3) (cons (+ 4 5) null))) with (cons (fib 7) (cons 9 null)). Constant folding evaluates simple program subexpressions that have all-constant arguments and replaces them in the program with their evaluated forms. More generally, the job of constant-folding (and some related optimizations) is to do as much of the program’s work as possible once before run-time, so that the program can later avoid doing that work every time it is run.

Dead-code elimination is another classic optimization that removes code that clearly will never be evaluated in any execution. For example, dead-code elimination could replace this Java code:

if (true) {
    System.out.println("Hello!");
} else {
    System.out.println("Impossible!");
}

with this equivalent Java code:

System.out.println("Hello!");

Working together, simple compiler optimizations like these can replace the Racket expression (if #f (f x) 3) with the expression 3 even if the compiler cannot determine what f or x will be. While several interdependent optimizations including constant-folding and dead-code elimination have their own names, they work together very well and, in expression oriented languages, their boundaries are blurrier than in statement-oriented languages. We will refer to our optimizer as a constant-folder, although it will use some techniques sometimes associated with other named optimizations and a broader technique known as partial evaluation.

For example, we will write optimizations that can determine that the expression (cons #f (- 8 (+ (if (< 3 5) 2 (fib 87)) 4))) is equivalent to the expression (cons #f 2).

Racket Constant Folder

Writing a Racket constant folder in Racket is simple, since programs are so easily represented as data. A constant-folding function takes a quoted Racket expression (a program as data) as an argument and returns another quote Racket expression. The resulting quoted expression must be functionally equivalent to the original argument expression (i.e., it will compute the same result as the original when evaluated), but it need not be identical in structure to the original.

Quoted Expressions

Recall that quoted expressions in Racket can represent programs. The expression:

'(cons a (- 5 (+ (if (< 3 5) 2 (fib 87)) 4)))

is a list containing the symbols 'cons and 'a and the list of:

the symbol '-, the number 5 and the list of:
- the symbol '+ and the list of
  - the symbol 'if and so on…

Variable names and special forms are represented by symbols in this representation. Each expression is a list of its parts, including, e.g. the 'if symbol and all of the subexpressions of the if expression. If a quoted expression represents a valid Racket expression, it can be evaluated as such with the built-in eval function.

Warning About Metaprogramming

In the programming you begin shortly, you will be writing a Racket program that manipulates other Racket programs. Racket’s support for quoting is simultaneously beautiful and rife with opportunites for confusion for beginners. Racket programs as data look nearly identical to Racket programs as programs. This eases the mental mapping between code and its representation, but it will also force you (in a good way) to think very carefully about when you are manipulating data, when you are evaluating code, and what happens now vs. later.

A Constant-Folding Function

You will write a constant-folding function that reduces arithmetic operations on constant operands, following this recursive pattern. It will manipulate quoted expressions as data that represent programs.

A constant-folded version of the quoted expression e is created as follows:

If e is a value expression (e.g., a number), then return e as is.
Otherwise:
- Constant-fold all of the subexpressions of e.
- If all of the subexpressions of e can be constant-folded to values and e is a kind of expression whose value result can be determined in constant time (or at least finite time!) given constant operands, then reduce e to yield a value expression and return this value expression in place of the original expression e.
- Otherwise, return a new expression of the same kind as e, but with e’s subexpressions replaced by their constant-folded versions. (Note that these constant-folded subexpressions are not necessarily any different from their original forms.)

We will accomplish this in a few steps.

Starting Point

You are given these functions:

constant? takes a quoted expression and returns #t if the expression its argument represents is a value expression and #f if the expression its argument represents would require evaluation to produce a value. constant? does not evaluate the expression, it only inspects its representation.
constant-fold takes a quoted expression as an argument and returns an equivalent (and possibly simpler) quoted expression as a result. Currently, constant-fold supports constant folding only for quoted expressions of the form '(+ ...). You will extend this function to fold more kinds of expressions.

In the following tasks:

Assume all variable-argument expressions (such as +) take exactly two arguments.
Assume that arguments to constant-fold never bind variables that shadow the names of built-in functions. (For example, we do not need to handle the quoted expression '(let ([+ (lambda (x y) (* x x))]) (+ 2 3)) correctly. (It is equivalent to the expression 4, but it is OK if our constant-folder decides it is equivalent to the expression 5.) This assumption makes your job easier – there’s nothing special you need to do as a result.
Assume that arguments to constant-fold never contain nested quoting.
Your code for each addition will be an extra case in the cond expression.
[Update 29 September 2015: clarify] Given a quoted expression representing a syntactically valid Racket expression, your constant-folder is not allowed to raise an error (even if the expression would raise an error when evaluated). Furthermore, if the given expression would raise an error if evaluated, the folded expression should raise that same error if evaluated. Your constant folder should preserve not just non-error behavior, but also error behavior.
As you add cases, carefully consider how each is similar to and different from earlier cases. Use the evaluation rules we developed to guide your reasoning about how to replace expressions.

Questions and Coding

Write your text answers as comments and your code answers as code in optimization.rkt.

Is a constant folder a program translator? Is a constant folder a program interpreter? Explain briefly.
Read the existing definition of constant-fold carefully. Why is it useful to constant-fold the subexpressions of e even if e itself cannot be constant-folded? Given an example quoted expression where this behavior yields a simpler quoted expression.
Add constant folding for - (subtraction) expressions, < (less-than) expressions, and / (division) expressions. (Careful on the last one!)
Add constant folding for and, or, and if expressions.

Testing

We have provided a test-cf function that takes a quoted expression and tests whether constant-fold returns a functionally-equivalent expression as a result. It does so by usig the built in eval function to evaluate the original expression and the new version to see if they give the same result. Naturally, there are some halting-problem-related and other pitfalls with this approach to testin, so caveat emptor. Furthermore, this function does nothing to test if you succeeded at optimizing the expression, only if you generated something equivalent.

We have not provided explicit test expressions (beyond the examples above), but we suggest that you develop several of your own. This will help think about exactly when an optimization can be made.

Optional

Nothing beyond here is required. It is suggested for further exploration if you are interested.

Challenge Problems

Implement constant folding for all cases (including multi-argument) of the expressions + - * < > <= >= = eq? and or null?. Our implementation is 3 lines and includes use of all? (from the previous assignment) and the built-in function member, plus very careful use of the built-in eval function, which fully evaluates a quoted expression. (Remember, constant-folding cannot evaluate arbitrary expressions.)
Add optimizations to reduce car, cdr, and cons? expressions where possible. For example, '(car (cons x (+ 8 3))) can be replaced by the equivalent 'x, even though '(cons x (+ 8 3)) is not a value expression. Consider carefully: Is it valid to replace '(cdr (cons (f 8) 2)) with 2? What about replacing '(car (cons (f 8) 2)) with '(f 8)?
Add support for basic reasoning about bindings and environments, e.g., reduce '(let ([x (+ 3 5)]) (+ x x)) to '16, or reduce '(let ([x (+ 3 5)]) (+ x y)) to '(+ 8 y), but think carefully about the limits you may encounter. You will need to add another parameter to carry environment information. What will you store in this analysis environment as opposed to the dynamic evaluation environment used at run-time? Try to eliminate your analysis’s dependence on our earlier assumption that common builtin functions are not shadowed.
Can you even reduce some function calls? Remember, your constant folder/partial evaluator must always terminate (and never raise an error). How do you know what function calls are safe to reduce? Can you do a perfect job? Can you do an OK-but-safe (conservative) job?

Extra Reading

If you are curious about other related software, languages, and idioms see: Hadoop, Sawzall, Pig Latin, Spark, and many more…

So who uses Lisp/Scheme/Racket? Paul Graham (now at the Y Combinator startup incubator, which is named for a fascinating function we’ll study later) has an interesting essay on the use of Lisp to build what became the Yahoo! Store in the late 1990s and another note on ITA using Lisp around 2000. (Since acquired by Google.) These are 1-2 decades old now, so many things have likely changed since, but they are still interesting accounts. The Clojure language, another Lisp derivative created in the past few years, has seen some use in industry. It runs on the JVM (same infrastructure used to run Java programs) and has extra support for concurrent programming. Emacs, the widely-used text editor, is extremely programmable, all in its own dialect of Lisp. Plenty more examples, running out of time at the moment…

Starting in the 1970s, Lisp machines (computers built just to run Lisp) showed up on the market and gave rise to a lot of interesting technical innovations and policitical happenings that gave rise to the version of Emacs that has become popular as part of the GNU project, as well as a major part of the broader free software movement.

A couple decades later, actual physical Lisp Machines became all the rage. We mean “Lisp machine” in the sense given by the rules above, not specifically these hardware Lisp Machines. ↩

Contents