CS 251: PS9: Open to Interpretation

Dueish: Tue. Dec. 12
Notes:
- This is the last fully required pset, which covers material through the lecture on Fri. Dec 08. It will be followed by a PS10 that has one required solo problem (also due Tue Dec 12) and a some completely optional, extra credit problems.
- Although formally “due” on Tue., Dec. 12, this (and all your other as-yet-incomplete CS251 work) should actually be completed by the end of finals period. (If you need longer than that, consult with Lyn.)
- This pset has 140 points total: 28 for one solo problem and 112 for five regular problems.
- The problems needn’t be done in order. Feel free to jump around.
Submission:
- In your yourFullName CS251 Fall 2017 Folder, create a Google Doc named yourFullName CS251 PS9.
- At the top of your yourFullName CS251 PS9 doc, include your name, problem set number, date of submission, and an approximation of how long each problem part took.
- For all parts of all problems, include all answers (including SML code) in your PS9 google doc. Format code using a fixed-width font, like Consolas or Courier New. You can use a small font size if that helps.
- In Problem 1–6, you will modify the following code files from starter files populating your ~wx/cs251/sml/ps9 directory on your wx Virtual Machine:
  - Problem 1: LazySequence.sml
  - Problem 2: IntexArgChecker.sml
  - Problem 3: BindexToPostFix.sml
  - Problem 4: Simprex.sml and SimprexEnvInterp.sml
  - Problem 5: ValexPS9.sml and ValexEnvInterpPS9.sml
  - Problem 6: ValexPS9.sml and ValexEnvInterpPS9.sml
  Drop a copy of your entire ps9 directory into your ~/cs251/drop/ps09 drop folder on cs.wellesley.edu.
- For Problems 1–6, you should include all modified functions from the files listed above in your Google Doc (so that I can comment on them). In the Google Doc, you need only include the functions you modified, not all functions.
- Don’t forget to include the following written (noncode) answers in your solutions:
  - The requested Intex program in Problem 2c
  - The result of hand translating the Bindex program in Problem 3a into PostFix (use s-expression syntax).
  - The results of evaluating the sample Simprex programs in Problem 4a.

Starting this Problem Set

Problems 1–6 all involve starter files in the ~wx/cs251/sml/ps9 directory in your wx Virtual Machine.

To create this directory, execute the following two commands in a wx VM shell:

cd ~/cs251/sml
git add -A
git commit -m "my local changes"
git pull origin master --rebase

Several students have encountered problems when executing this steps. If you have problems, please contact lyn.

1. Solo Problem: Lazy Sequences (28 points)

This is a solo problem. This means you must complete it entirely on your own without help from any other person and without consulting resources other than course materials or online documentation. You may ask Lyn for clarification, but not for help.

Background

This is a module/ADT problem that shares aspects of the FunSet and OperationTreeSet problems from PS8. It involves a data structure implementation involving multiple tree nodes, where some of the tree nodes have functional components.

The problem involves a list-like data structure called a sequence that has the following signature:

signature SEQUENCE = sig
    (* The type of a sequence *)
    type ''a t 

    (* An empty sequence *)
    val empty : ''a t

    (* Create a sequence of (hi - lo) values fcn(lo), fcn(lo+1), ..., fcn(hi-1) *)
    val segment: int -> int -> (int -> ''a) -> ''a t

    (* Convert a lengnth-n list into a sequence of n values *)                          
    val fromList: ''a list -> ''a t

    (* Concatenate two sequences: given a length-m sequence s and length-n sequence t, 
       returns a single length-m+n sequence that has all values of s followed by all 
       values of t *)
    val concat : ''a t -> ''a t -> ''a t

    (* Return the length of a sequence = number of values in it *)                              
    val length : ''a t -> int

    (* Return the nth value of a sequence (0-indexed). 
       Raise an IndexOutOfBounds exception for an out-of-bounds index. *)                                
    val get : int -> ''a t -> ''a

    (* Return a sequence that results from applying f to each elt *)
    val map : (''a -> ''b) -> ''a t -> ''b t

    (* Return a list with all elements in the sequence. The ith element of the resulting
       list should be the ith element of the given sequence. *)
    val toList : ''a t -> ''a list

end

It would be straightforward to implement a sequence as a list. But this implementation can have issues.

As a concrete example, consider the following inefficient ways to increment and double a number:

(* A slow incrementing function *)
fun linearIncrement n =
  let fun loop num ans =
  if num = 0 then ans else loop (num-1) (ans+1)
  in loop n 1
 end

(* A very slow doubling function *)
fun quadraticDouble n =
  let fun loop num ans =
  if num = 0 then ans else loop (num-1) (linearIncrement ans)
  in loop n n
 end

For example, quadraticDouble 10000 returns fairly quickly, but quadraticDouble 100000 takes many seconds.

Now suppose we want to create the sequence

val reallyBigSeq = segment ~200000000 200000000 quadraticDouble

There are two big problems if we try to represent this as a list. First, the list would need to contain 400 million elements. Second, calculating each entry in the list (especially for large inputs) would take a very long time. So the list implementation is bad in terms of both space and time.

An alternative implementation is not to caclulate and store each element when the sequence is created, but to calculate each element on the fly only when the get operation is called to return it. This is the basis of the LazySequence structure that you will implement in this problem. It is based on the following datatype:

datatype ''a t = Segment of int * int * (int -> ''a) (* lo, hi, fcn *)
       | ThunkList of (unit -> ''a) list
       | Concat of ''a t * ''a t

In programming languages, the term “lazy” means “not computed until necessary”. In this context, an element of a sequence will not be calculated until it needs to be known.

The `Segment` constructor

In the case of reallyBigSeq, the sequence can be represented via the sum-of-product value

Segment(~200000000,200000000,quadraticDouble)

This simply keeps track of the information for constructing the sequence. Now if we wish to get the value of this sequence at index 20000017, we can apply quadraticDouble to (200000017 + ~200000000) = 17 and return the value quadraticDouble(17). This way, we only “pay” for calculations done at the indices on which we actually perform get. This leads to the following implementation of the segment function:

fun segment lo hi fcn = Segment(lo,hi,fcn)

Thunks

Lazy data structures often employ something called a thunk, which is simply a function of zero arguments. Wrapping a calculation in a function of zero arguments is a way to delay it until later ratger than performing it now. For example, consider testThunk:

val testThunk = fn () => quadraticDouble 100000

Even though calculating quadraticDouble 100000 takes many seconds, defining testThunk is immediate, because no calculation is performed until testThunk is called, as in testThunk(). This is called “dethunking”, which can be defined as follows:

fun dethunk thunk = thunk()

So testThunk() can equivalently be written dethunk(testThunk)

Note that every SML function necessarily takes exactly one argument, so the notion that thunks take “zero” arguments is loose terminology. In fact a thunk takes the “unit value”, written (), which is the only value of type unit and can be viewed as a tuple with zero components.

The upside of a thunk is that the delayed computation is never performed if the thunk is never dethunked. The downside is that the calculation is performed every time the thunk is dethunked, so computations can be repeated unnecessarily.

Note that the type of testThunk is unit -> int, where unit specifies that the () value must be supplied to the thunk in order to extract its value. In general, the type of a thunk for a value of type ''a is unit -> ''a, so we use this as the definition of a type abbreviation named thunkTy:

type ''a thunkTy = unit -> ''a

The `ThunkList` constructor

Whereas the Segment constructor supports “regular” lazy sequences whose values are determined by some function over a range of integers, the ThunkList constructor is intended to support more “irregular” sequences in which there is not a simple correspondence between sequence values and a range of integers. For example, consider these sequences:

val numSeq = fromList [300005, 20074, 4000017, 1002]
val quadSeq = map quadraticDouble numSeq

Suppose we want all the applications of quadraticDouble to be delayed until the sequence elements are extracted. E.g., get 1 quadSeq should calculate quadraticDouble 20074 when the get is performed, but not before that. If the elements at indices 0 and 2 are never accessed via get, then the time-intensive calculations quadraticDouble 300005 and quadraticDouble 4000017 are never performed.

The ThunkList constructor takes a list of thunks:

ThunkList of ''a thunkTy list

Using ThunkList, the quadSeq sequence can be represented as something like:

ThunkList[ fn () => quadraticDouble 300005,      
           fn () => quadraticDouble 20074,      
           fn () => quadraticDouble 4000017, 
           fn () => quadraticDouble 1002 ]

Based on this idea, we can use ThunkList to represent all “irregular” sequences constructed from lists via fromList:

fun fromList xs = ThunkList (List.map (fn x => (fn () => x)) xs)

This means that in numSeq, all the numbers will be thunked, and so will be equivalent to:

ThunkList[ fn () => 300005,      
           fn () => 20074,      
           fn () => 4000017, 
           fn () => 1002 ]]

The `Concat` constructor

The final constructor in the ''a t sum-of-product sequence datatype is the Concat constructor:

Concat of ''a t * ''a t

This is simply used to glue together two sequences:

fun concat seq1 seq2 = Concat(seq1,seq2)

So in the end, any instance of the ''a t sum-of-product sequence datatype is a binary tree of Concat nodes whose leaves are either Segment nodes or ThunkList nodes.

Your task

The starter file for this problem is LazySequence.sml in the ~wx/cs251/sml/ps9 directory in your wx Virtual Machine.

LazySequence.sml contains the SEQUENCE signature from above and the following partial implementation of the LazySequence structure:

  structure LazySequence :> SEQUENCE = struct

     type ''a thunkTy = unit -> ''a

     datatype ''a t = Segment of int * int * (int -> ''a) (* lo, hi, fcn *)
                 | ThunkList of ''a thunkTy list
                      | Concat of ''a t * ''a t

     fun segment lo hi fcn = Segment(lo,hi,fcn)
     fun fromList xs = ThunkList (List.map (fn x => (fn () => x)) xs)
     fun concat seq1 seq2 = Concat(seq1,seq2)
     fun dethunk thunk = thunk()                            

     val empty = ThunkList []

     fun length seq = 0 (* replace this stub *)

     fun get n seq = raise Unimplemented (* replace this stub *)

     fun map f seq = empty (* replace this stub *)

     fun toList seq = [] (* replace this stub *)

  end

The segment, fromList, concat functions and empty value have already been defined for you. Your task is to implement the remaining four functions: length, get, map, and toList. Pay attention to the following notes in your implementation.

Notes:

Be careful to use the explicitly qualified List.map and List.length functions when operating on a list, but unqualified map and length operators when opreating on a sequence. Any attempt to use the unqualified map and length on a list will result in a type error.
Do not use the toList function in the implementations of length, get, and map. Similarly, the get function should not be used in the implementation of toList. However, you may use the length function in the implementation of get.
The length function should always return an integer and should never raise an exception.
The get function should raise an IndexOutOfBounds exception for a sequence s when called on a index outside the range 0 and length(s) - 1. In the implementation of the get function, full credit will be awarded only if the out-of-bounds length check is peformed only one (not separately for each of the three constructors). You may use the length function in the implementation of get.
The map function should always return a sequence and should never raise an exception, even if the mapped function might raise an error on one of the values. For example, in
```
val mapSeq2 = map (fn n => 20 div n) (fromList [3, 0, 5, 2, 4])
```
no exception should be raised when defining mapSeq2 or evaluating get 0 mapSeq2, get 2 mapSeq2, get 3 mapSeq2, or get 4 mapSeq2. But get 1 mapSeq2 should raise a divide-by-zero exception.
The map function should preserve the “shape” of its input sum-of-product constructor tree. So it should return a Concat output for a Concat input, a Segment output for a Segment input, and a ThunkList output for a ThunkList input.
In the implementation of the map function, it will be necessary to combine two function values into one. What is the most natural way to do this? (Hint: it’s something we’ve studied before!)
If calculating one or more values of a sequence s raises an exception, toList s should raise the exception associated with the value having the lowest index.

The LazySequence structure is followed by numerous testing functions and test cases. (The expected resuls of all the test cases appear in a comment at the end of the file LazySequence.sml.) Here we illustrate the four functions that are used for testing get in the context of these two example sequences:

val mapSeq1 = map (fn n => 20 div n) (fromList [3, 5, 2, 4])
val mapSeq2 = map (fn n => 20 div n) (fromList [3, 0, 5, 2, 4])

(testGet seq) returns a list of (index, value-at-index) pairs for each valid index in seq. If calculating at least one of the values raises an exception, testGet raises the exception encountered for the lowest index at which an exception is raised by get.

- testGet mapSeq1;
val it = [(0,6),(1,4),(2,10),(3,5)] : (int * int) list

- testGet mapSeq2;

uncaught exception Div [divide by zero]
  raised at: /tmp/emacs-region28242nLJ:129.31-129.34
             /tmp/emacs-region28242nLJ:104.29

(testGetRange seq lo hi) returns a list of (index, value-at-index) pairs for each index in seq in the range lo (inclusive) to hi (exclusive). If calculating at least one of the values raises an exception, testGet raises the exception encountered for the lowest index at which an exception is raised by get.

- testGetRange mapSeq1 1 3;
val it = [(1,4),(2,10)] : (int * int) list

- testGetRange mapSeq1 3 6;

uncaught exception IndexOutOfBounds
  raised at: /tmp/emacs-region28242nLJ:68.16-68.34
             /tmp/emacs-region28242nLJ:108.29

- testGetRange mapSeq2 2 5;
val it = [(2,4),(3,10),(4,5)] : (int * int) list

- testGetRange mapSeq2 0 3;

uncaught exception Div [divide by zero]
  raised at: /tmp/emacs-region28242nLJ:129.31-129.34
            /tmp/emacs-region28242nLJ:108.29

(testGetHandleExceptions eltToString seq) returns a list of (index, string-for-value-or-exception-at-index) pairs for each valid index in seq. The eltToString function must be supplied to turn each sequence value into a string. If an exception is encountered, the exception string is used in place of a value at that index.
```
- testGetHandleException Int.toString mapSeq1;
val it = [(0,"6"),(1,"4"),(2,"10"),(3,"5")] : (int * string) list

- testGetHandleException Int.toString mapSeq2;
val it = [(0,"6"),(1,"Error: Div -- divide by zero"),(2,"4"),(3,"10"),(4,"5")] : (int * string) list
```

(testGetRangeHandleExceptions eltToString seq lo hi) returns a list of (index, string-for-value-or-exception-at-index) pairs for each index in seq in the range lo (inclusive) to hi (exclusive). The eltToString function must be supplied to turn each sequence value into a string. If an exception is encountered, the exception string is used in place of a value at that index.

- testGetRangeHandleException Int.toString mapSeq1 ~1 6;
val it =
  [(~1,"Error: IndexOutOfBounds -- ~1"),(0,"6"),(1,"4"),(2,"10"),(3,"5"),
   (4,"Error: IndexOutOfBounds -- 4"),(5,"Error: IndexOutOfBounds -- 5")]
  : (int * string) list

- testGetRangeHandleException Int.toString mapSeq2 ~1 6;
val it =
  [(~1,"Error: IndexOutOfBounds -- ~1"),(0,"6"),
   (1,"Error: Div -- divide by zero"),(2,"4"),(3,"10"),(4,"5"),
   (5,"Error: IndexOutOfBounds -- 5")] : (int * string) list

Here is a FAQ associated with this problem:

Q1: What is the structure of a lazy sequence?

A1: Just as an OperationTreeSet is a tree made out of six kinds of nodes, a LazySequence is a tree made out of the three kinds of nodes in this datatype:
```
datatype ''a t = Segment of int * int * (int -> ''a) (* lo, hi, fcn *)
               | ThunkList of (unit -> ''a) list
               | Concat of ''a t * ''a t
```
The Segment, ThunkList, and Concat constructors represent three completely different ways to represent a lazy sequence.
- A Segment represents a length-n lazy sequence as a single unary (single-argument) function that is applied to every index in a range with n indices.
- A ThunkList represents a length-n lazy sequence as a list of n nullary (zero-argument) functions (known as thunks).
- A Concat represents a length-(m+n) lazy sequence as the combination of a length-m lazy sequence and a length-n lazy sequence.
Q2: Can lazy sequences have infinite length?

A2: No. Based on the answer to Q1, all lazy sequences have a finite number of elements.

Q3: Why do I get the error data constructor Segment used without argument in pattern?

A3: Be sure to wrap pattens in parens. E.g., write (Segment(lo,hi,func)), not Segment(lo,hi,func)

Q4: When I send LazySequences.sml to the *sml* buffer, I don’t get an interpreter prompt in the *sml* buffer. Why?

A4: Look for an infinite loop in your code that is causing it to hang.

Q5: I get an IndexOutOfBounds exception when testing my code that prevents me from seeing the results of test cases. What can I do?

A5: Try commenting out the test cases involving plain testGet (as opposed to testGetHandleException and testGetRangeHandleException). If that stops the uncaught IndexOutOfBounds exception, it indicates a bug in your get function.

2. Static Argument Checking in Intex (12 points)

As described in the lecture slides on Intex, it is possible to statically determine (i.e., without running the program) whether an Intex program contains a argument reference with an out-of-bound index. In this problem you will flesh out skeleton implementations of the two approaches for static argument checking sketched in the slides. The skeleton implementations are in the file ps9/IntexArgChecker.sml.

Top-down checker (5 points): In the top-down approach, the checkTopDown function on a program passes the number of arguments to the checkExpTopDown function on expressions, which recursively passes it down the subexpressions of the abstract syntax tree. When an Arg expression is reached, the index is examined to determine a boolean that indicates whether it is in bounds. The boolean on all subexpressions are combined in the upward phase of recursion to determine a boolean for the whole tree. Complete the following skeleton to flesh out this approach.
```
(* val checkTopDown: pgm -> bool *)
fun checkTopDown (Intex(numargs,body)) =
  checkExpTopDown numargs body

(* val checkExpTopDown : int -> Intex.exp -> bool *)
and checkExpTopDown numargs (Int i) = raise Unimplemented
  | checkExpTopDown numargs (Arg index) = raise Unimplemented
  | checkExpTopDown numargs (BinApp(_,exp1,exp2)) = raise Unimplemented
```
Uncomment the definition of topDownChecks to test your implementation.
Bottom-up checker (5 points): In the bottom-up approach, the checkExpBottomUp function returns a pair (min, max) of the minimum and maximum argument indices that appear in an expression. If an expression contains no argument indices, it should return the (valOf Int.maxInt, valOf Int.minInt), which is a pair of SML’s (1) maximum integer and (2) minumum integer. The functions Int.min and Int.max are useful for combining such pairs. (Note that Int.maxInt is the identity for Int.min, and Int.minInt is the identity for Int.max.)

The checkBottomUp function on programs returns true if all argument references are positive and less than or equal to the programs number of arguments, and false otherwise. Complete the following skeleton to flesh out this approach.
```
(* checkBottomUp: pgm -> bool *)
fun checkBottomUp (Intex(numargs,body)) =
  let val (min,max) = checkExpBottomUp body
   in 0 < min andalso max <= numargs
  end
   
(* val checkExpBottomUp : Intex.exp -> int * int *)
and checkExpBottomUp (Int i) = (valOf Int.maxInt, valOf Int.minInt)
  | checkExpBottomUp (Arg index) = raise Unimplemented
  | checkExpBottomUp (BinApp(_,exp1,exp2)) = raise Unimplemented
```
Notes
- Uncomment the definition of bottomUpChecks to test your implementation.
- Do not use Int.min/Int.max to find the minimum-so-far and maximum-so-far elements in the tuple returned by checkExpBottomUp. Instead use pattern matching! Why? Because they return the wrong value for the tuple returned for an integer. (Thiabout this!)
Static vs dynamic checking (2 points): A static argument index checker flags programs that might dynamically raise an index-out-of-bounds error, but does not guarantee they will dynamically raise such an error. Give an example of an Intex program for which checkTopDown and checkBottomUp returns false but does not raise an argument-out-of-bounds error when run on particular values for the correct number of arguments. Hint: such a program can raise a different error; what other kinds of expression errors are there?

3. Bindex To PostFix (20 points)

Background

In lecture, we studied the following intexToPostFix function that automatically translates Intex programs to equivalent PostFix programs:

fun intexToPostFix (Intex.Intex(numargs, exp)) =
  PostFix.PostFix(numargs, expToCmds exp 0)
    (* 0 is a depth argument that statically tracks 
       how many values are on stack above the arguments *)

and expToCmds (Intex.Int i) depth = [PostFix.Int i]
  | expToCmds (Intex.Arg index) depth = [PostFix.Int (index + depth), PostFix.Nget]
   (* specified argument is on stack at index + depth *)
  | expToCmds (Intex.BinApp(binop,exp1,exp2)) depth =
    (expToCmds exp1 depth) (* 1st operand is at same depth as whole binapp *)
    @ (expToCmds exp2 (depth + 1)) (* for 2nd operand, add 1 to depth 
                                      to account for 1st operand *)
    @ [PostFix.Arithop (binopToArithop binop)]

and binopToArithop Intex.Add = PostFix.Add
  | binopToArithop Intex.Sub = PostFix.Sub
  | binopToArithop Intex.Mul = PostFix.Mul
  | binopToArithop Intex.Div = PostFix.Div
  | binopToArithop Intex.Rem = PostFix.Rem

For example, given the Intex program intexP2 expressed as an Intex.pgm datatype instance

val intexP2 = Intex(4, BinApp(Mul,
                              BinApp(Sub, Arg 1, Arg 2),
                              BinApp(Div, Arg 3, Arg 4)))

the intexToPostFix translation is as follows:

- intexToPostFix(intexP2);
val it =
  PostFix
    (4,
     [Int 1,Nget,Int 3,Nget,Arithop Sub,Int 4,Nget,Int 6,Nget,Arithop Div,
      Arithop Mul]) : PostFix.pgm

With the helper function translateString shown below, the input and output of translation can be expressed as strings in s-expression format:

fun translateString intexPgmString =
  PostFix.pgmToString (intexToPostFix (Intex.stringToPgm intexPgmString))

- IntexToPostFix.translateString("(intex 4 (* (- ($ 1) ($ 2)) (/ ($ 3) ($ 4))))");
val it = "(postfix 4 1 nget 3 nget sub 4 nget 6 nget div mul)" : string

There are two key aspects to the translation from Intex to PostFix:

The expToCmds function translates every Intex expression to a sequence of PostFix commands that, when executed, will push the integer value of that expression onto the stack. In the case of translating a binary application expression, this sequence is composed by appending the sequences for the two operands followed by the appropriate arithmetic operator command.
The trickiest part of expToCommands is knowing how many integers are on the stack above the program arguments, so that references to Intex arguments by index can be appropriately translated. This is handled by a depth argument to expToCommands that keeps track of this number. Note that the first operand of a binary application has the same depth as the whole binary application expression, but the second operand has a depth one greater than the first to account for the integer value of the first operand pushed onto the stack.

Your Task

In this problem, you will think about and implement a similar translator from Bindex programs to PostFix programs.

Hand translating a sample Bindex expression (6 points) To get a better sense for the issues involved, give the PostFix program (in s-expression notation) that results from translating by hand the following Bindex program:
```
(bindex (a b)
  (- (bind c (+ a b)
       (* b c))
     a))
```
Run your resulting program in PostFix to verify that it works as expected. In particular, running the PostFix program that results from hand translating the above Bindex program on the arguments 1 6 should return 41. If it doesn’t, there’s a bug in your translation strategy.

Fleshing out the translator (14 points): The file ps9/BindexToPostFix.sml contains a skeleton of the Bindex to PostFix translator in which all Bindex expressions translate to a command sequence consisting of the single integer command 42:

fun makeArgEnv args = Env.make args (Utils.range 1 ((length args) + 1))
(* returned env associates each arg name with its 1-based index *)

fun envPushAll env = Env.map (fn index => index + 1) env
(* add one to the index for each name in env *)

fun envPush name env = Env.bind name 1 (envPushAll env)

fun bindexToPostFix (Bindex.Bindex(args, body)) =
  PostFix.PostFix(length args, expToCmds body (makeArgEnv args))

(* In expToCmds, env statically tracks the depth of each named variable
   value on the stack *)
and expToCmds (Bindex.Int i) env = [PostFix.Int 42] (* replace this stub *)
  | expToCmds (Bindex.Var name) env = [PostFix.Int 42] (* replace this stub *)
  | expToCmds (Bindex.BinApp(binop, rand1, rand2)) env = [PostFix.Int 42] (* replace this stub *)
  | expToCmds (Bindex.Bind(name, defn, body)) env = [PostFix.Int 42] (* replace this stub *)

and binopToArithop Bindex.Add = PostFix.Add
  | binopToArithop Bindex.Sub = PostFix.Sub
  | binopToArithop Bindex.Mul = PostFix.Mul
  | binopToArithop Bindex.Div = PostFix.Div
  | binopToArithop Bindex.Rem = PostFix.Rem

When loaded, the BindexToPostFix.sml file performs numerous tests of the translator. Initially, all of these fail. Your task is to redefine the clauses of expToCmds so that all the test cases succeed.

Notes:

The expToCmds function in the Bindex to PostFix translator uses an environment to track the depth of each variable name in the program. Recall that an environment (defined by the Env structure in utils/Env.sml) maps names to values. In this case, it maps names in a Bindex program to the current stack depth of the value associated with that name. Carefully study the helper functions makeArgEnv, envPushAll, and envPush to understand what they do; all are helpful in the contex of this problem.
As in the Intex to PostFix translator, in the Bindex to PostFix translator, each Bindex expression should translate to a sequence of PostFix commands that, when executed, pushes exactly one value — the integer value of that expression — onto the stack. More formally, translating a Bindex expression to PostFix should obey the following invariant:

A Bindex expression E should translate to a sequence of commands, that, when executed on a stack S, should result in a stack v :: S, where v is the result of evaluating E relative to an environment represented by S.
For simplicity, you may assume there are no unbound variables in the Bindex program you are translating. With this assumption, you may use the Option.valOf function to extract the value v from a SOME(v) value.

The BindexToPostFix structure contains a testTranslator function that takes the name of a file containing a Bindex program and a list of argument lists and (1) displays the Bindex program as an s-expression string, (2) displays the PostFix program resulting from the translation as an s-expression string and (3) tests the behavior of both programs on each of the argument lists in the list of argument lists. If the behaviors match, it prints an approprate message; if they don’t match, it reports an an error with the two different result values. For instance, in the initial skelton, the behavioral tests fail on the program in ../bindex/avg.bdx:

Testing Bindex program file ../bindex/avg.bdx

Bindex program input to translation:
(bindex (a b) (/ (+ a b) 2))

PostFix program output of translation:
(postfix 2 42)

Testing args [3,7]: *** ERROR IN TRANSLATION ***
  Bindex result: 5
  PostFix result: 42

Testing args [15,5]: *** ERROR IN TRANSLATION ***
  Bindex result: 10
  PostFix result: 42

However, when the translator is working, the following will be displayed:

Testing Bindex program file ../bindex/avg.bdx

Bindex program input to translation:
(bindex (a b) (/ (+ a b) 2))

PostFix program output of translation:
(postfix 2 1 nget 3 nget add 2 div)

Testing args [3,7]: both programs return 5

Testing args [15,5]: both programs return 10

You have succeeded when all of the test cases succeed when you load the .sml file.

4. Simprex (35 points)

Background

Rae Q. Cerf of Toy-Languages-я-Us likes the sigma construct from the Bindex lecture, but she wants something more general. In addition to expressing sums, she would also like to express numeric functions like factorial and exponentiation that are easily definable via simple recursion. The functions that Rae wants to define all have the following form:

f(n) = z, if n ≤ 0
f(n) = c(n, f(n-1)), if n > 0

Here, z is an integer that defines the value of f for any non-positive integer value and and c is a binary combining function that combines n and the value of f(n−1) for any positive n. Expanding the definition yields:

f(n) = c(n, c(n-1, c(n−2, ... c(2, c(1, z)))))

For example, to define the factorial function, Rae uses:

z_fact = 1
c_fact(i, a) = i*a

To define the exponentation function bⁿ, Rae uses:

z_expt = 1
c_expt(i, a) = b*a

In this case, c_expt ignores its first argument, but the fact that c_expt is called n times is important.

As another example, Rae defines the sum of the squares of the integers between 1 and n using

z_sos = 0
c_sos(i, a) = (i*i) + a

The examples considered so far don’t distinguish right-to-left vs. left-to-right evaluation, A simple example of that is

z_sub = 0
c_sub(i, a) = i-a

For example, when n is 4, the result is the value of 4 - (3 - (2 - (1 - 0))), which is 2, and not the value of 1 - (2 - (3 - (4 - 0))), which is -2.

Another example that distinguishes right-to-left vs. left-to-right evaluation is using Horner’s method to calculate the value of the polynomial x^(n-1) + 2*(x^(n-2)) + 3*(x^(n-3)) + ... + (n-1)*x + n:

z_horner = 0
c_horner(i, a) = i + x*a

For example, when n is 4 and x is 5, the result is (4 + 5*(3 + 5*(2 + 5*(1 + 5*0)))) = 194, which is the value of 5^3 + 2*5^2 + 3*5^1 + 4 = (125 + 50 + 15 + 4) = 194.

Rae designs an extension to Bindex named Simprex that adds a new simprec construct for expressing her simple recursions:

(simprec E_zero (Id_num Id_ans E_combine ) E_arg)

This simprec expression evaluates E_arg to the integer value n_arg and E_zero to the integer value n_zero. If n_arg ≤ 0, it returns n_zero. Otherwise, it returns the value

c(n_arg, c(n_arg−1, c(n_arg−2, ... c(2, c(1, nzero)))))

where c is the binary combining function specified by (Id_num Id_ans E_combine). This denotes a two-argument function whose two formal parameters are named Id_num and Id_ans and whose body is E_combine. The Id_num parameter ranges over the numbers from n_arg down to 1, while the Id_ans parameter ranges over the answers built up by c starting at n_zero. The scope of Id_num and Id_ans includes only E_combine; it does not include E_zero or E_arg.

Rae’s simprec expression is closely related to the notion of primitive recursive functions defined in the theory of computation.

Here are some sample Simprex programs:

;; Program that calculuates the factorial of n
(simprex (n) (simprec 1 (i a (* i a)) n))

;; Exponentiation program raising base b to power p 
(simprex (b p) (simprec 1 (i ans (* b ans)) p))

;; Program summing the squares of the numbers from 1 to hi
(simprex (hi) (simprec 0 (i sumSoFar (+ (* i i) sumSoFar)) hi))

;; Program distingishing right-to-left and left-to-right evaluation via subtraction
(simprex (n) (simprec 0 (i ans (- i ans)) n))

;; Calculating x^(n-1) + 2*(x^(n-2)) + 3*(x^(n-3)) + ... + (n-1)*x + n via Horner's method
(simprex (n x) (simprec 0 (i a (+ i (* x a)) n)))

Your Tasks

After completing her design, Rae is called away to work on another language design problem. Toy-Languages-я-Us is impressed with your CS251 background, and has hired you to implement the Simprex language, starting with a version of the Sigmex implementation. (Sigmex is the name of the language that results from extending Bindex with the sigma construct from lecture.) Your first week on the job, you are asked to complete the following tasks that Rae has specified in a memo she has written about finishing her project.

(15 points) Rae’s memo contains the following Simprex test programs. Give the results of running each of the programs on the argument 3. Show your work so that you may get partial credit if your answer is incorrect.

;; program 1 (2 points)
(simprex (a) (simprec 0 (b c (+ 2 c)) a))

;; program 2 (3 points)
(simprex (x) (simprec 0 (n sum (+ n (* x sum))) 4))

;; program 3 (4 points)
(simprex (y) (simprec 0 (a b (+ b (sigma c 1 a (* a c)))) y))

;; program 4 (6 points)
(simprex (n) 
  (simprec (simprec (* n (- n 3)) (q r r) (* n n))
           (c d (+ d (simprec 0 (x sum (+ sum (- (* 2 x) 1))) c))) 
           (simprec -5 (a b (+ 1 b)) (* n n))))

(20 points)

Rae has created a skeleton implementation of Simprex by modifying the files for the Sigmex (= Bindex + sigma) implementation to contain stubs for the simprec construct. Her modified files, which are named Simprex.sml and SimprexEnvInterp.sml, can be found in the ps9 folder.

Finish Rae’s implementation of the Simprex language by completing the following four tasks, which Rae has listed in her memo:
1. (2 points) Extend the definition of sexpToExp in Simprex.sml to correctly parse simprec expressions.
2. (2 points) Extend the definition of expToSexp in Simprex.sml to correctly unparse simprec expressions.
3. (5 points) Extend the definition of freeVarsExp in Simprex.sml to correctly determine the free variables of a simprec expression.
4. (11 points) Extend the definition of eval in SimprexEnvInterp.sml to correctly evaluate simprec expressions using the environment model.
Notes:
- This problem is similar the the Sigmex (= Bindex + sigma) language extension we implemented in lecture. You may wish to study the solution files bindex/SigmexSolns.sml and bindex/SigmexEnvInterp.sml as part of doing this problem.
- In Simprec.sml, the exp type is defined to be:
```
and exp = Int of int (* integer literal with value *)
        | Var of var (* variable reference *)
        | BinApp of binop * exp * exp (* binary operator application with rator, rands *)
        | Bind of var * exp * exp (* bind name to value of defn in body *)
        | Sigma of var * exp * exp * exp (* E_lo, E_hi, E_body *)
        | Simprec of exp * var * var * exp * exp 
                     (* zeroExp * numVar * ansVar * combExp * argExp *)
```
  The s-expression notation (simprec E_zero (I_num I_ans E_combine) E_arg) is represented in SML as
```
Simprec (<exp for E-zero>, <string for I_num>, <string for I_ans>, 
         <exp for E_combine>, <exp for E_arg>)
```
  For example, the expression (simprec 1 (x a (* x a)) n) is represented in SML as Simprec(Int 1, "x", "a", BinApp(Mul, Var "x", Var "a"), Var "n")
- You can test your modifications of sexpToExp, expToExp, and freeVarsExp via use "Simprex.sml" after you have first uncommented the test cases at the end of Simprex.sml. This will display the results of various test cases for freeVarsExp before displaying the Simprex structure. It will also test sexpToExp and expToSexp. If you uncomment the test cases before you have correctly modified sexpToExp and expToSexp, any attempt to use "Simprec.sml" or use "SimprecEnvInterp.sml" will fail with a SyntaxError.
- You can test your modifications of eval via use "SimprexEnvInterp.sml" after you have first uncommented the test cases at the end of SimprexEnvInterp.sml. This will display the results of various test cases for eval before displaying the SimprexEnvInterp structure.

5. Extending Valex with New Primitive Operators (15 points)

The Valex language implementation is designed to make it easy to add new primitive operators to the language. In this problem, you are asked to extend Valex with some new primitive operators.

The files ValexPS9.sml and ValexEnvInterpPS9.sml in the ps9 folder contain a copy of the Valex language implementation we study in class.

In this problem, you will add each of the following four primitive operators to ValexPS9.sml:

(2 points) (abs n): Returns the absolute value of the integer n. E.g.
```
valex> (abs -17)
17
 
valex> (abs 42) 
42
```
(3 points) (sqrt n): If n is a non-negative integer, returns the integer square root n. The integer square root of a non-negative integer is the largest integer i such that i² ≤ n. Signals an error if n is negative. E.g.
```
valex> (sqrt 25) 
5

valex> (sqrt 35) 
5

valex> (sqrt 36) 
6

valex> (sqrt 37) 
6
```
Hint: The Real and Math structures contain helpful functions for this problem.
(4 points) (range lo hi): Assume lo and hi are integers. If lo < hi, returns a list of integers from lo (inclusive) to hi (exclusive). Otherwise, returns the empty list. For example:
```
valex> (range 1 20)
(list 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19)

valex> (range 3 8) 
(list 3 4 5 6 7)

valex> (range 8 3)
#e
```
Hint: The Utils.range function is helpful here.

(6 points) (rot n xs) Assume n is an integer and xs is a list. If xs is empty, returns the empty list. If n is a nonnegative integer, returns a list that results from moving the first n elements of the list to the end of the list. If n is greater than the length L of the list, uses (n mod L) instead. If n is a negative integer, raises an error. For example:

valex> (rot 2 (explode "abcdefg"))
(list 'c' 'd' 'e' 'f' 'g' 'a' 'b')

valex> (rot 6 (explode "abcdefg"))
(list 'g' 'a' 'b' 'c' 'd' 'e' 'f')

valex> (rot 0 (explode "abcdefg"))
(list 'a' 'b' 'c' 'd' 'e' 'f' 'g')

valex> (rot 17 (explode "abcdefg")) 
(list 'd' 'e' 'f' 'g' 'a' 'b' 'c') (* same as rot 3 *)

valex> (rot 17 (list))
#e

valex> (rot -3 (explode "abcdefg"))
Eval Error: negative rotation ~3 in rot

NOTES:

All four primitives above can be added to Valex by adding new Primop entries to the primops list in ps9/ValexPS9.sml right below the spot with this SML comment:
```
(* Put your new primops from PS9 here *)
```
Study the other primitives to see how primitives are declared.
Be careful to change the Valex implementation in the ps9 directory, not the one in the valex directory.
You should only change ValexPS9.sml and not ValexEnvInterpPS9.sml
To test your implementation after editing ValexPS9.sml, first load ValexEnvInterpPS9.sml (not ValexPS9.sml) into a *sml* buffer, and then test your new primitives interactively in a Valex read-eval-print loop (REPL) launched by invoking ValexEnvInterp.repl().

6. Desugaring `classify` (30 points)

The `classify` construct

You are a summer programming intern at Sweetshop Coding, Inc. Your supervisor, Dexter Rose, has been studying the syntactic sugar for Valex and is very impressed by the cond construct. He decides that it would be neat to extend Valex with a related classify construct that classifies an integer relative to a collection of ranges. For instance, using his construct, Dexter can write the following grade classification program:

; Classify Program 1
(valex (grade) 
  (classify grade
    ((90 100) 'A') 
    ((80 89) 'B')
    ((70 79) 'C')
    ((60 69) 'D') 
    (otherwise 'F')))

This program takes an integer grade value and returns a character indicating which range the grade falls in.

In general, the classify construct has the following form:

(classify E_disc
   ((Elo_1 Ehi_1) Ebody_1)
   ...
   ((Elo_n Ehi_n) Ebody_n)
   (otherwise E_default))

The evaluation of classify should proceed as follows.

First the discriminant expression E_disc should be evaluated to the value V_disc, which is expected to be an integer. Then V_disc should be matched against each of the clauses ((Elo_i Ehi_i) Ebody_i) from top to bottom until one matches. The value matches a clause if it lies in the range between Vlo_i and Vhi_i, inclusive, where Vlo_i is the integer value of Elo_i, and Vhi_i is the integer value of Ehi_i. When the first matching clause is found, the value of the associated expression Ebody_i is returned. If none of the clauses matches V_disc , the value of the default expression E_default is returned.

Here are a few more examples of the classify construct:

; Classify program 2 
(valex (a b c d)
  (classify (* c d)
    ((a (- (/ (+ a b) 2) 1)) (* a c)) 
    (((+ (/ (+ a b) 2) 1) b) (* b d)) 
    (otherwise (- d c))))

Program 2 emphasizes that any of the subexpressions of classify may be arbitrary expressions that require evaluation. In particular, the upper and lower bound expressions need not be integer literals. For instance, here are some examples of the resulting value returned by Program 2 for some sample inputs:

a	b	c	d	result
10	20	3	4	30
10	20	3	6	120
10	20	3	5	2

; Classify program 3
(valex (a)
  (classify a
    ((0 9) a)
    (((/ 20 a) 20) (+ a 1)) 
    (otherwise (/ 100 (- a 5)))))

Program 3 emphasizes that (1) ranges may overlap (in which case the first matching range is chosen) and (2) expressions in clauses after the matching one are not evaluated. For instance, here are here are some examples of the resulting value returned by Program 3 for some sample inputs:

a	result
0	0
5	5
10	11
20	21
25	5
30	4

Your Tasks

Dexter has asked you to implement the classify construct in Valex as syntactic sugar. You will do this in three phases.

(9 points) In your PS9 Google Doc, write down how you would like each of the three example programs above to be desugared into equivalent Valex programs that do not have classify. Your programs may include other sugar constructs, such as &&.
(9 points) In your PS9 Google Doc, write down incremental desugaring rules that desugar classify into other Valex constructs. These rules should be expressed as rewrite rules on s-expressions. For example, here are the desugaring rules for Valex’s sugared constructs except quote:
```
(&& E_rand1 E_rand2) ↝ (if E_rand1 E_rand2 #f) 
(|| E_rand1 E_rand2) ↝ (if E_rand1 #t E_rand2) 

(cond (else E_default)) ↝ E_default
(cond (E_test E_then) ...) ↝ (if E_test E_then (cond ...))

(list) ↝ #e 
(list E_head ...) ↝ (prep E_head (list ...))

(bindseq () E_body) ↝ E_body
(bindseq ((Id E_defn) ...) E_body) ↝ (bind Id E_defn (bindseq (...) E_body))

(bindpar ((Id_1 E_defn_1) ... (Id_n E_defn_n)) E_body)
  ↝ (bind Id_list (* fresh variable name *)
          (list E_defn_1 ... E_defn_n) (* eval defns in parallel *)
          (bindseq ((Id_1 (nth 1 Id_list))
                    ...
                    (Id_n (nth n Id_list)))
             E_body))
```
Note that ... means zero or more occurrences of the preceding kind of syntactic entity.

NOTES:
- Your desugaring rules may rewrite an s-expression into an s-expression that includes other sugar constructs, such as &&.
- Your desugaring rules should be written in such a way that E_disc is evaulated only once. To guarantee this, you will need to name the value of E_disc with a “fresh” variable (one that does not appear elsewhere in the program).
- Your desugaring rules should be written in such a way that the Elo and Ehi expressions in each clause are evaluated at most once.
- You should treat differently the cases where E_disc is an identifier and when it is not an identifier.
- Your rules should be designed to give the same output for the three sample programs as in part a.

(12 points) In this part, you will implement and test your desugaring rules from the previous part. You will modify the very same ps9/ValexPS9.sml from Problem 5 in this part. You should make your changes to the desugarRules function in ps9/ValexPS9.sml.

NOTES:

Be careful to change the Valex implementation in the ps9 directory, not the one in the valex directory.
Use Utils.fresh to create a fresh variable.

To test your implementation after editing ValexPS9.sml, first load ValexEnvInterpPS9.sml (not ValexPS9.sml) into a *sml* buffer, and then pay attention to the following:

In the ps9 directory, the files classify1.vlx, classify2.vlx, and classify3.vlx contain the three example classify programs from above. The file sort3.vlx contains a test file that not using classify that returns a sorted list of its 3 arguments.

Your implementation should give the same output for the three sample programs as in part a. For seeing the output of the desugaring of your classify construct for examples, use one of the following approaches:

To desugar a file, use Valex.desugarFile to print the result of desugaring the program in the file. For example, supposed sort3.vlx contains this program:

(valex (a b c)
  (cond ((&& (<= a b) (<= b c)) (list a b c))
        ((&& (<= a c) (<= c b)) (list a c b))
        ((&& (<= b a) (<= a c)) (list b a c))
        ((&& (<= b c) (<= c a)) (list c b a))
        ((&& (<= c a) (<= a b)) (list c a b))
        (else (list c b a))))

Then here is the result of using Valex.desugarFile on this program:

- Valex.desugarFile "sort3.vlx";
(valex (a b c)
       (if (if (<= a b) (<= b c) #f)
           (prep a (prep b (prep c #e)))
           (if (if (<= a c) (<= c b) #f)
               (prep a (prep c (prep b #e)))
               (if (if (<= b a) (<= a c) #f)
                   (prep b (prep a (prep c #e)))
                   (if (if (<= b c) (<= c a) #f)
                       (prep c (prep b (prep a #e)))
                       (if (if (<= c a) (<= a b) #f)
                           (prep c (prep a (prep b #e)))
                           (prep c (prep b (prep a #e)))
                           )
                       )
                   )
               )
           )
       )
val it = () : unit

To desugar an expression, one approach is to invoke the Valex.desugarString function on a string representing the expression you want to desugar. For example:

- Valex.desugarString "(&& (|| a b) (|| c d))" 
(if (if a #t b) (if c #t d) #f)
- : unit = ()

- Valex.desugarString "(list 1 2 3)"
(prep 1 (prep 2 (prep 3 #e)))
- : unit = ()

To desugar an expression, another approach is to use ValexEnvInterp.repl() to launch a read-eval-print loop and use the #desugar directive to desugar an expression. For example:

- ValexEnvInterp.repl();

valex> (#desugar (&& (|| a b) (|| c d))) 
(if (if a #t b) (if c #t d) #f)

valex> (#desugar (list 1 2 3)) 
(prep 1 (prep 2 (prep 3 #e)))

There are several ways to test the evaluation of your desugared classify construct:
1. To run a program, one approach is to invoke ValexEnvInterp.runFile on the filename and a list of integer arguments. For example:
```
- ValexEnvInterp.runFile "sort3.vlx" [23, 42, 17];
val it = List [Int 17,Int 23,Int 42] : Valex.value
```
2. To run a program, another approach is to use ValexEnvInterp.repl() to launch a read-eval-print loop and use the #run directive to run a fileon arguments. For example:
```
- ValexEnvInterp.repl();

valex> (#run sort3.vlx 23 42 17)
(list 17 23 42)
```
3. To evaluate an expression, use ValexEnvInterp.repl() to launch a read-eval-print loop and enter the expression. To evaluate an expression with free variables, use the #args directive to bind the variables. For example:
```
- ValexEnvInterp.repl();

valex> (#args (a 2) (b 3))

valex> (bindpar ((a (+ a b)) (b (* a b))) (list a b))
(list 5 6)

valex> (bindseq ((a (+ a b)) (b (* a b))) (list a b))
(list 5 15)
```

Starting this Problem Set

1. Solo Problem: Lazy Sequences (28 points)

Background

The Segment constructor

Thunks

The ThunkList constructor

The Concat constructor

Your task

Notes:

2. Static Argument Checking in Intex (12 points)

3. Bindex To PostFix (20 points)

Background

Your Task

4. Simprex (35 points)

Background

Your Tasks

5. Extending Valex with New Primitive Operators (15 points)

6. Desugaring classify (30 points)

The classify construct

Your Tasks

The `Segment` constructor

The `ThunkList` constructor

The `Concat` constructor

6. Desugaring `classify` (30 points)

The `classify` construct