• This is the last required solo assignment. There will be a Solo Assignment F, but it will be completely optional.

  • Dueish: Fri Dec 11. But working with a partner on PS6 has a higher priority than this solo assignment.

  • Notes:

    • This is a solo assignment. You must complete all parts of this assignment entirely on your own without help from any other person and without consulting any resources other than course materials or SML documentation. You may ask Lyn for clarification, but not for help.

    • You can do all problems on this assignment based on material you learned for PS5. Be sure to study the solutions to PS5 as part of doing this assignment.

    • This assignment has one solo problem worth 30 solo points.

  • How to Start Solo Assignment E

    Follow the instructions in the GDoc CS251-F20-T2 Pset Submission Instructions for creating your SOLO-E Google Doc. Put all written answers and a copy of all code into this SOLO-E GDoc.

  • Submission:

    • Include all code that you write in yourAccountName-LazySequence.sml in your SOLO-E GDoc (so that I can comment on it).
    • Please format your SML function definitions so that they’re easy to read. Format all derivations and code using a fixed-width font (Courier New or Consolas). You use a small font size if that helps.
    • Drop a copy of your ~/cs251/sml/solo-e/yourAccountName-LazySequence.sml file in your in your~/cs251/drop/solo-e drop folder oncs.wellesley.edu by executing the following (replacing **all three occurrences ofgdome` by your cs server username):

      ~~~
      scp -r ~/cs251/sml/solo-e/gdome-LazySequence.sml gdome@cs.wellesley.edu:/students/gdome/cs251/drop/solo-e
      ~~~

1. Lazy Sequences (30 points)

This is a solo problem. This means you must complete it entirely on your own without help from any other person and without consulting resources other than course materials or online documentation. You may ask Lyn for clarification, but not for help.

Starting this Problem

This problem involves files in the ~/cs251/sml/solo-e directory in your csenv Virtual Machine.

To create this directory, execute this Linux commands in a shell on your csenv appliance:

cd ~/cs251/sml
git pull origin master

If you encounter any issues when executing these commands, you can troubleshoot by following these git instructions. If these instructions don’t help, please contact Lyn.

Begin this problem by renaming the starter file LazySequence.sml in ~/cs251/sml/solo-e to yourAccountName-LazySequence.sml. Put all your definitions for this problem in the renamed starter file.

REMINDER: ALWAYS MAKE A BACKUP OF YOUR .sml FILES AT THE END OF EACH DAY TO AVOID POTENTIAL LOSS OF WORK.

Background

This is a module/ADT problem that shares aspects of the FunSet and OperationTreeSet problems from PS5. It involves a data structure implementation involving multiple tree nodes, where some of the tree nodes have functional components.

The problem involves a list-like data structure called a sequence that has the following signature:

signature SEQUENCE = sig
    (* The type of a sequence *)
    type ''a t 

    (* An empty sequence *)
    val empty : ''a t

    (* Create a sequence of (hi - lo) values fcn(lo), fcn(lo+1), ..., fcn(hi-1) *)
    val segment: int -> int -> (int -> ''a) -> ''a t

    (* Convert a length-n list into a sequence of n values *)                          
    val fromList: ''a list -> ''a t

    (* Concatenate two sequences: given a length-m sequence s and length-n sequence t, 
       returns a single length-m+n sequence that has all values of s followed by all 
       values of t *)
    val concat : ''a t -> ''a t -> ''a t

    (* Return the length of a sequence = number of values in it *)                              
    val length : ''a t -> int

    (* Return the nth value of a sequence (0-indexed). 
       Raise an IndexOutOfBounds exception for an out-of-bounds index. *)                                
    val get : int -> ''a t -> ''a

    (* Return a sequence that results from applying f to each elt *)
    val map : (''a -> ''b) -> ''a t -> ''b t

    (* Return a list with all elements in the sequence. The ith element of the resulting
       list should be the ith element of the given sequence. *)
    val toList : ''a t -> ''a list

end

It would be straightforward to implement a sequence as a list. But this implementation can have issues.

As a concrete example, consider the following inefficient ways to increment and double a number:

(* A slow incrementing function *)
fun linearIncrement n =
  let fun loop num ans =
  if num = 0 then ans else loop (num-1) (ans+1)
  in loop n 1
 end

(* A very slow doubling function *)
fun quadraticDouble n =
  let fun loop num ans =
  if num = 0 then ans else loop (num-1) (linearIncrement ans)
  in loop n n
 end

For example, quadraticDouble 10000 returns fairly quickly, but quadraticDouble 100000 takes many seconds.

Now suppose we want to create the sequence

val reallyBigSeq = segment ~200000000 200000000 quadraticDouble

There are two big problems if we try to represent this as a list. First, the list would need to contain 400 million elements. Second, calculating each entry in the list (especially for large inputs) would take a very long time. So the list implementation is bad in terms of both space and time.

An alternative implementation is not to calculate and store each element when the sequence is created, but to calculate each element on the fly only when the get operation is called to return it. This is the basis of the LazySequence structure that you will implement in this problem. It is based on the following datatype:

datatype ''a t = Segment of int * int * (int -> ''a) (* lo, hi, fcn *)
       | ThunkList of (unit -> ''a) list
       | Concat of ''a t * ''a t

In programming languages, the term “lazy” means “not computed until necessary”. In this context, an element of a sequence will not be calculated until it needs to be known.

The Segment constructor

In the case of reallyBigSeq, the sequence can be represented via the sum-of-product value

Segment(~200000000,200000000,quadraticDouble)

This simply keeps track of the information for constructing the sequence. Now if we wish to get the value of this sequence at index 20000017, we can apply quadraticDouble to (200000017 + ~200000000) = 17 and return the value quadraticDouble(17). This way, we only “pay” for calculations done at the indices on which we actually perform get. This leads to the following implementation of the segment function:

fun segment lo hi fcn = Segment(lo,hi,fcn)

Thunks

Lazy data structures often employ something called a thunk, which is simply a function of zero arguments. Wrapping a calculation in a function of zero arguments is a way to delay it until later ratger than performing it now. For example, consider testThunk:

val testThunk = fn () => quadraticDouble 100000

Even though calculating quadraticDouble 100000 takes many seconds, defining testThunk is immediate, because no calculation is performed until testThunk is called, as in testThunk(). This is called “dethunking”, which can be defined as follows:

fun dethunk thunk = thunk()

So testThunk() can equivalently be written dethunk(testThunk)

Note that every SML function necessarily takes exactly one argument, so the notion that thunks take “zero” arguments is loose terminology. In fact a thunk takes the “unit value”, written (), which is the only value of type unit and can be viewed as a tuple with zero components.

The upside of a thunk is that the delayed computation is never performed if the thunk is never dethunked. The downside is that the calculation is performed every time the thunk is dethunked, so computations can be repeated unnecessarily.

Note that the type of testThunk is unit -> int, where unit specifies that the () value must be supplied to the thunk in order to extract its value. In general, the type of a thunk for a value of type ''a is unit -> ''a, so we use this as the definition of a type abbreviation named thunkTy:

type ''a thunkTy = unit -> ''a

The ThunkList constructor

Whereas the Segment constructor supports “regular” lazy sequences whose values are determined by some function over a range of integers, the ThunkList constructor is intended to support more “irregular” sequences in which there is not a simple correspondence between sequence values and a range of integers. For example, consider these sequences:

val numSeq = fromList [300005, 20074, 4000017, 1002]
val quadSeq = map quadraticDouble numSeq

Suppose we want all the applications of quadraticDouble to be delayed until the sequence elements are extracted. E.g., get 1 quadSeq should calculate quadraticDouble 20074 when the get is performed, but not before that. If the elements at indices 0 and 2 are never accessed via get, then the time-intensive calculations quadraticDouble 300005 and quadraticDouble 4000017 are never performed.

The ThunkList constructor takes a list of thunks:

ThunkList of ''a thunkTy list

Using ThunkList, the quadSeq sequence can be represented as something like:

ThunkList[ fn () => quadraticDouble 300005,      
           fn () => quadraticDouble 20074,      
           fn () => quadraticDouble 4000017, 
           fn () => quadraticDouble 1002 ]

Based on this idea, we can use ThunkList to represent all “irregular” sequences constructed from lists via fromList:

fun fromList xs = ThunkList (List.map (fn x => (fn () => x)) xs)

This means that in numSeq, all the numbers will be thunked, and so will be equivalent to:

ThunkList[ fn () => 300005,      
           fn () => 20074,      
           fn () => 4000017, 
           fn () => 1002 ]]

The Concat constructor

The final constructor in the ''a t sum-of-product sequence datatype is the Concat constructor:

Concat of ''a t * ''a t

This is simply used to glue together two sequences:

fun concat seq1 seq2 = Concat(seq1,seq2)

Summary

In the end, any instance of the ''a t sum-of-product sequence datatype is a binary tree of Concat nodes whose leaves are either Segment nodes or ThunkList nodes.

In this way, an instance of the sequence datatype is a tree in the same way that instances of OperationTreeSet were trees in PS5 Problem 4.

Because both the Segment and ThunkList constructors involve functions that delay computation until a result is needed, they are similar to the FunSet implementation of sets in PS5 Problem 3.

Your task

yourAccountName-LazySequence.sml contains the SEQUENCE signature from above and the following partial implementation of the LazySequence structure:

  structure LazySequence :> SEQUENCE = struct

     type ''a thunkTy = unit -> ''a

     datatype ''a t = Segment of int * int * (int -> ''a) (* lo, hi, fcn *)
                 | ThunkList of ''a thunkTy list
                 | Concat of ''a t * ''a t

     fun segment lo hi fcn = Segment(lo,hi,fcn)
     fun fromList xs = ThunkList (List.map (fn x => (fn () => x)) xs)
     fun concat seq1 seq2 = Concat(seq1,seq2)
     fun dethunk thunk = thunk()                            

     val empty = ThunkList []

     fun length seq = 0 (* replace this stub *)

     fun get n seq = raise Unimplemented (* replace this stub *)

     fun map f seq = empty (* replace this stub *)

     fun toList seq = [] (* replace this stub *)

  end

The segment, fromList, concat functions and empty value have already been defined for you. Your task is to implement the remaining four functions: length, get, map, and toList. Pay attention to the following notes in your implementation.

Notes:

  • Be careful to use the explicitly qualified List.map and List.length functions when operating on a list, but unqualified map and length operators when operating on a sequence. Any attempt to use the unqualified map and length on a list will result in a type error (because SML will assume they are the sequence operators and not the list operators).

  • Do not use the toList function in the implementations of length, get, and map. Similarly, the get function should not be used in the implementation of toList. However, you may use the length function in the implementation of get.

  • The length function should always return an integer and should never raise an exception.

  • The get function should raise an IndexOutOfBounds exception for a sequence s when called on a index outside the range 0 and length(s) - 1. To raise this exception for an out-of-bounds index n, use raise (IndexOutOfBounds n). In the implementation of the get function, full credit will be awarded only if the out-of-bounds length check is performed only one (not separately for each of the three constructors). You may use the length function in the implementation of get.

  • The map function should always return a sequence and should never raise an exception, even if the mapped function might raise an error on one of the values. For example, in

    val mapSeq2 = map (fn n => 20 div n) (fromList [3, 0, 5, 2, 4])

    no exception should be raised when defining mapSeq2 or evaluating get 0 mapSeq2, get 2 mapSeq2, get 3 mapSeq2, or get 4 mapSeq2. But get 1 mapSeq2 should raise a divide-by-zero exception.

  • The map function should preserve the “shape” of its input sum-of-product constructor tree. So it should return a Concat output for a Concat input, a Segment output for a Segment input, and a ThunkList output for a ThunkList input.

  • In the implementation of the map function, it will be necessary to combine two function values into one. What is the most natural way to do this? (Hint: it’s something we’ve studied before!)

  • If calculating one or more values of a sequence s raises an exception, toList s should raise the exception associated with the value having the lowest index.

  • To test your sequence implementation in yourAccountName-LazySequence.sml, follow these steps:

    • In the file LazySequenceTest.sml, change the line use "LazySequence.sml"; to use "yourAccountName-LazySequence.sml";

    • Run the file LazySequenceTest.sml.

      • If running LazySequenceTest.sml reports a syntax or type error, there are syntax or type errors in your LazySequence.sml program that need to be fixed before you can test it.

      • Running LazySequenceTest.sml should print the results of running 153 test cases. If some of the test cases don’t pass, scan back for the string FAIL to see expected and actual results for the test cases that failed.

      • If the printing of test cases hangs for a long time without completing, it means that you are not being lazy enough in your implementation, and the test case on which it hangs is taking much longer than it should.

  • Here is a FAQ associated with this problem:

    Q1: What is the structure of a lazy sequence?

    A1: Just as an OperationTreeSet is a tree made out of six kinds of nodes, a LazySequence is a tree made out of the three kinds of nodes in this datatype:

    datatype ''a t = Segment of int * int * (int -> ''a) (* lo, hi, fcn *)
                   | ThunkList of (unit -> ''a) list
                   | Concat of ''a t * ''a t

    The Segment, ThunkList, and Concat constructors represent three completely different ways to represent a lazy sequence. * A Segment represents a length-n lazy sequence as a single unary (single-argument) function that is applied to every index in a range with n indices.

    • A ThunkList represents a length-n lazy sequence as a list of n nullary (zero-argument) functions (known as thunks).

    • A Concat represents a length-(m+n) lazy sequence as the combination of a length-m lazy sequence and a length-n lazy sequence.

    Q2: Can lazy sequences have infinite length?

    A2: No. Based on the answer to Q1, all lazy sequences have a finite number of elements.

    Q3: Why do I get the error data constructor Segment used without argument in pattern?

    A3: Be sure to wrap pattens in parens. E.g., write (Segment(lo,hi,func)), not Segment(lo,hi,func)

    Q4: When I send LazySequence.sml to the *sml* buffer, I don’t get an interpreter prompt in the *sml* buffer. Why?

    A4: Look for an infinite loop in your code that is causing it to hang.

    Q5: When I run LazySequenceTest.sml, it gets stuck and doesn’t print out a summary of how many test cases have passed/failed. Why?

    A5: This likely means that your implementation is not being sufficiently lazy, and that the tester is taking a long time to complete calculations it should not be computing.