Solo Assignment E
-
This is the last required solo assignment. There will be a Solo Assignment F, but it will be completely optional.
-
Dueish: Fri Dec 11. But working with a partner on PS6 has a higher priority than this solo assignment.
-
Notes:
-
This is a solo assignment. You must complete all parts of this assignment entirely on your own without help from any other person and without consulting any resources other than course materials or SML documentation. You may ask Lyn for clarification, but not for help.
-
You can do all problems on this assignment based on material you learned for PS5. Be sure to study the solutions to PS5 as part of doing this assignment.
-
This assignment has one solo problem worth 30 solo points.
-
-
How to Start Solo Assignment E
Follow the instructions in the GDoc CS251-F20-T2 Pset Submission Instructions for creating your SOLO-E Google Doc. Put all written answers and a copy of all code into this SOLO-E GDoc.
-
Submission:
- Include all code that you write in
yourAccountName-LazySequence.sml
in your SOLO-E GDoc (so that I can comment on it). - Please format your SML function definitions so that they’re easy to read. Format all derivations and code using a fixed-width font (Courier New or Consolas). You use a small font size if that helps.
-
Drop a copy of your
~/cs251/sml/solo-e/yourAccountName-LazySequence.sml file in your in your
~/cs251/drop/solo-edrop folder on
cs.wellesley.eduby executing the following (replacing **all three occurrences of
gdome` by your cs server username):~~~ scp -r ~/cs251/sml/solo-e/gdome-LazySequence.sml gdome@cs.wellesley.edu:/students/gdome/cs251/drop/solo-e ~~~
- Include all code that you write in
1. Lazy Sequences (30 points)
This is a solo problem. This means you must complete it entirely on your own without help from any other person and without consulting resources other than course materials or online documentation. You may ask Lyn for clarification, but not for help.
Starting this Problem
This problem involves files in the ~/cs251/sml/solo-e
directory in your csenv
Virtual Machine.
To create this directory, execute this Linux commands in a shell on your csenv appliance:
cd ~/cs251/sml
git pull origin master
If you encounter any issues when executing these commands, you can troubleshoot by following these git instructions. If these instructions don’t help, please contact Lyn.
Begin this problem by renaming the starter file LazySequence.sml
in ~/cs251/sml/solo-e
to yourAccountName-LazySequence.sml
. Put all your definitions for this problem in the renamed starter file.
REMINDER: ALWAYS MAKE A BACKUP OF YOUR .sml FILES AT THE END OF EACH DAY TO AVOID POTENTIAL LOSS OF WORK.
Background
This is a module/ADT problem that shares aspects of the FunSet
and OperationTreeSet
problems from PS5. It involves a data structure implementation involving multiple tree nodes, where some of the tree nodes have functional components.
The problem involves a list-like data structure called a sequence that has the following signature:
signature SEQUENCE = sig
(* The type of a sequence *)
type ''a t
(* An empty sequence *)
val empty : ''a t
(* Create a sequence of (hi - lo) values fcn(lo), fcn(lo+1), ..., fcn(hi-1) *)
val segment: int -> int -> (int -> ''a) -> ''a t
(* Convert a length-n list into a sequence of n values *)
val fromList: ''a list -> ''a t
(* Concatenate two sequences: given a length-m sequence s and length-n sequence t,
returns a single length-m+n sequence that has all values of s followed by all
values of t *)
val concat : ''a t -> ''a t -> ''a t
(* Return the length of a sequence = number of values in it *)
val length : ''a t -> int
(* Return the nth value of a sequence (0-indexed).
Raise an IndexOutOfBounds exception for an out-of-bounds index. *)
val get : int -> ''a t -> ''a
(* Return a sequence that results from applying f to each elt *)
val map : (''a -> ''b) -> ''a t -> ''b t
(* Return a list with all elements in the sequence. The ith element of the resulting
list should be the ith element of the given sequence. *)
val toList : ''a t -> ''a list
end
It would be straightforward to implement a sequence as a list. But this implementation can have issues.
As a concrete example, consider the following inefficient ways to increment and double a number:
(* A slow incrementing function *)
fun linearIncrement n =
let fun loop num ans =
if num = 0 then ans else loop (num-1) (ans+1)
in loop n 1
end
(* A very slow doubling function *)
fun quadraticDouble n =
let fun loop num ans =
if num = 0 then ans else loop (num-1) (linearIncrement ans)
in loop n n
end
For example, quadraticDouble 10000
returns fairly quickly, but quadraticDouble 100000
takes many seconds.
Now suppose we want to create the sequence
val reallyBigSeq = segment ~200000000 200000000 quadraticDouble
There are two big problems if we try to represent this as a list. First, the list would need to contain 400 million elements. Second, calculating each entry in the list (especially for large inputs) would take a very long time. So the list implementation is bad in terms of both space and time.
An alternative implementation is not to calculate and store each element when the sequence is created, but to calculate each element on the fly only when the get
operation is called to return it. This is the basis of the LazySequence
structure that you will implement in this problem. It is based on the following datatype:
datatype ''a t = Segment of int * int * (int -> ''a) (* lo, hi, fcn *)
| ThunkList of (unit -> ''a) list
| Concat of ''a t * ''a t
In programming languages, the term “lazy” means “not computed until necessary”. In this context, an element of a sequence will not be calculated until it needs to be known.
The Segment
constructor
In the case of reallyBigSeq
, the sequence can be represented via the sum-of-product value
Segment(~200000000,200000000,quadraticDouble)
This simply keeps track of the information for constructing the sequence. Now if we wish to get the value of this sequence at index 20000017, we can apply quadraticDouble
to (200000017 + ~200000000) = 17
and return the value quadraticDouble(17)
. This way, we only “pay” for calculations done at the indices on which we actually perform get
. This leads to the following implementation of the segment
function:
fun segment lo hi fcn = Segment(lo,hi,fcn)
Thunks
Lazy data structures often employ something called a thunk
, which is simply a function of zero arguments. Wrapping a calculation in a function of zero arguments is a way to delay it until later ratger than performing it now. For example, consider testThunk
:
val testThunk = fn () => quadraticDouble 100000
Even though calculating quadraticDouble 100000
takes many seconds, defining testThunk
is immediate, because no calculation is performed until testThunk
is called, as in testThunk()
. This is called “dethunking”, which can be defined as follows:
fun dethunk thunk = thunk()
So testThunk()
can equivalently be written dethunk(testThunk)
Note that every SML function necessarily takes exactly one argument, so the notion that thunks take “zero” arguments is loose terminology. In fact a thunk takes the “unit value”, written ()
, which is the only value of type unit
and can be viewed as a tuple with zero components.
The upside of a thunk is that the delayed computation is never performed if the thunk is never dethunked. The downside is that the calculation is performed every time the thunk is dethunked, so computations can be repeated unnecessarily.
Note that the type of testThunk
is unit -> int
, where unit
specifies that the ()
value must be supplied to the thunk in order to extract its value. In general, the type of a thunk for a value of type ''a
is unit -> ''a
, so we use this as the definition of a type abbreviation named thunkTy
:
type ''a thunkTy = unit -> ''a
The ThunkList
constructor
Whereas the Segment
constructor supports “regular” lazy sequences whose values are determined by some function over a range of integers, the ThunkList
constructor is intended to support more “irregular” sequences in which there is not a simple correspondence between sequence values and a range of integers. For example, consider these sequences:
val numSeq = fromList [300005, 20074, 4000017, 1002]
val quadSeq = map quadraticDouble numSeq
Suppose we want all the applications of quadraticDouble
to be delayed until the sequence elements are extracted. E.g., get 1 quadSeq
should calculate quadraticDouble 20074
when the get
is performed, but not before that. If the elements at indices 0 and 2 are never accessed via get
, then the time-intensive calculations quadraticDouble 300005
and quadraticDouble 4000017
are never performed.
The ThunkList
constructor takes a list of thunks:
ThunkList of ''a thunkTy list
Using ThunkList
, the quadSeq
sequence can be represented as something like:
ThunkList[ fn () => quadraticDouble 300005,
fn () => quadraticDouble 20074,
fn () => quadraticDouble 4000017,
fn () => quadraticDouble 1002 ]
Based on this idea, we can use ThunkList
to represent all “irregular” sequences constructed from lists via fromList
:
fun fromList xs = ThunkList (List.map (fn x => (fn () => x)) xs)
This means that in numSeq
, all the numbers will be thunked, and so will be equivalent to:
ThunkList[ fn () => 300005,
fn () => 20074,
fn () => 4000017,
fn () => 1002 ]]
The Concat
constructor
The final constructor in the ''a t
sum-of-product sequence datatype is the Concat
constructor:
Concat of ''a t * ''a t
This is simply used to glue together two sequences:
fun concat seq1 seq2 = Concat(seq1,seq2)
Summary
In the end, any instance of the ''a t
sum-of-product sequence datatype is a binary tree of Concat
nodes whose leaves are either Segment
nodes or ThunkList
nodes.
In this way, an instance of the sequence
datatype is a tree in the same way that instances of OperationTreeSet
were trees in PS5 Problem 4.
Because both the Segment
and ThunkList
constructors involve functions that delay computation until a result is needed, they are similar to the FunSet
implementation of sets in PS5 Problem 3.
Your task
yourAccountName-LazySequence.sml
contains the SEQUENCE
signature from above and the following partial implementation of the LazySequence
structure:
structure LazySequence :> SEQUENCE = struct
type ''a thunkTy = unit -> ''a
datatype ''a t = Segment of int * int * (int -> ''a) (* lo, hi, fcn *)
| ThunkList of ''a thunkTy list
| Concat of ''a t * ''a t
fun segment lo hi fcn = Segment(lo,hi,fcn)
fun fromList xs = ThunkList (List.map (fn x => (fn () => x)) xs)
fun concat seq1 seq2 = Concat(seq1,seq2)
fun dethunk thunk = thunk()
val empty = ThunkList []
fun length seq = 0 (* replace this stub *)
fun get n seq = raise Unimplemented (* replace this stub *)
fun map f seq = empty (* replace this stub *)
fun toList seq = [] (* replace this stub *)
end
The segment
, fromList
, concat
functions and empty
value have already been defined for you. Your task is to implement the remaining four functions: length
, get
, map
, and toList
. Pay attention to the following notes in your implementation.
Notes:
-
Be careful to use the explicitly qualified
List.map
andList.length
functions when operating on a list, but unqualifiedmap
andlength
operators when operating on a sequence. Any attempt to use the unqualifiedmap
andlength
on a list will result in a type error (because SML will assume they are the sequence operators and not the list operators). -
Do not use the
toList
function in the implementations oflength
,get
, andmap
. Similarly, theget
function should not be used in the implementation oftoList
. However, you may use thelength
function in the implementation ofget
. -
The
length
function should always return an integer and should never raise an exception. -
The
get
function should raise anIndexOutOfBounds
exception for a sequences
when called on a index outside the range 0 andlength(s) - 1
. To raise this exception for an out-of-bounds indexn
, useraise (IndexOutOfBounds n)
. In the implementation of theget
function, full credit will be awarded only if the out-of-bounds length check is performed only one (not separately for each of the three constructors). You may use thelength
function in the implementation ofget
. -
The
map
function should always return a sequence and should never raise an exception, even if the mapped function might raise an error on one of the values. For example, inval mapSeq2 = map (fn n => 20 div n) (fromList [3, 0, 5, 2, 4])
no exception should be raised when defining
mapSeq2
or evaluatingget 0 mapSeq2
,get 2 mapSeq2
,get 3 mapSeq2
, orget 4 mapSeq2
. Butget 1 mapSeq2
should raise a divide-by-zero exception. -
The
map
function should preserve the “shape” of its input sum-of-product constructor tree. So it should return aConcat
output for aConcat
input, aSegment
output for aSegment
input, and aThunkList
output for aThunkList
input. -
In the implementation of the
map
function, it will be necessary to combine two function values into one. What is the most natural way to do this? (Hint: it’s something we’ve studied before!) -
If calculating one or more values of a sequence
s
raises an exception,toList s
should raise the exception associated with the value having the lowest index. -
To test your sequence implementation in
yourAccountName-LazySequence.sml
, follow these steps:-
In the file
LazySequenceTest.sml
, change the lineuse "LazySequence.sml";
touse "yourAccountName-LazySequence.sml";
-
Run the file
LazySequenceTest.sml
.-
If running
LazySequenceTest.sml
reports a syntax or type error, there are syntax or type errors in yourLazySequence.sml
program that need to be fixed before you can test it. -
Running
LazySequenceTest.sml
should print the results of running 153 test cases. If some of the test cases don’t pass, scan back for the stringFAIL
to see expected and actual results for the test cases that failed. -
If the printing of test cases hangs for a long time without completing, it means that you are not being lazy enough in your implementation, and the test case on which it hangs is taking much longer than it should.
-
-
-
Here is a FAQ associated with this problem:
Q1: What is the structure of a lazy sequence?
A1: Just as an
OperationTreeSet
is a tree made out of six kinds of nodes, aLazySequence
is a tree made out of the three kinds of nodes in this datatype:datatype ''a t = Segment of int * int * (int -> ''a) (* lo, hi, fcn *) | ThunkList of (unit -> ''a) list | Concat of ''a t * ''a t
The
Segment
,ThunkList
, andConcat
constructors represent three completely different ways to represent a lazy sequence. * ASegment
represents a length-n lazy sequence as a single unary (single-argument) function that is applied to every index in a range with n indices.-
A
ThunkList
represents a length-n lazy sequence as a list of n nullary (zero-argument) functions (known as thunks). -
A
Concat
represents a length-(m+n) lazy sequence as the combination of a length-m lazy sequence and a length-n lazy sequence.
Q2: Can lazy sequences have infinite length?
A2: No. Based on the answer to Q1, all lazy sequences have a finite number of elements.
Q3: Why do I get the error
data constructor Segment used without argument in pattern
?A3: Be sure to wrap pattens in parens. E.g., write
(Segment(lo,hi,func))
, notSegment(lo,hi,func)
Q4: When I send
LazySequence.sml
to the*sml*
buffer, I don’t get an interpreter prompt in the*sml*
buffer. Why?A4: Look for an infinite loop in your code that is causing it to hang.
Q5: When I run
LazySequenceTest.sml
, it gets stuck and doesn’t print out a summary of how many test cases have passed/failed. Why?A5: This likely means that your implementation is not being sufficiently lazy, and that the tester is taking a long time to complete calculations it should not be computing.
-