Join the Fold
- Assign: Tuesday, 18 February
- Due: 11:59pm Tuesday, 25 February
- Policy: Pair graded synthesis assignment
- Partner search: Find a partner here.
-
Code:
cs251 start fold
- Submit:
git commit
,cs251 sign
, andgit push
your completed code. - Reference:
Contents
Time Info
In Fall 2019, students self-reported spending a median of 6.0 hours on this assignment.
Compared to that version of the assignment, the current version adds the mapreduce
-related programming tasks.
Tasks
1. Lexical Scope and Closures (20 points)
-
Draw lexical scope contours and the environments produced by evaluating the following sequence of Racket bindings as we did in class. (Note this may happen after the assignment is posted.)
(define f (lambda (g) (let ([x 3]) (g 2)))) (define x 4) (define h (lambda (y) (+ x y))) (define z (f h))
Choose one of these options to submit this part:
git add
andgit commit
a PDF file named exactlyenv.pdf
containing your drawings in your repository; OR- Upload photos of the lexical scope contours and the environment diagram to the CS 251 course on GradeScope. Log in with your Wellesley email address. (See Gradescope help and how to scan / upload to Gradescope with a smartphone camera.)
2. Functional Idioms in the Large (10 points)
Read MapReduce: Simplified Data Processing on Large Clusters, by Dean and Ghemawat. The MapReduce system is a good example of how an idiom developed with a particular language or style of language can be applied even far outside that language. Consider that (a) this paper was published in 2004, when the system had already been in use for some time, (b) the scale of workloads at major software services companies has probably increased exponentially since, and (c) the MapReduce system itself is no longer used, but many of its lessons live on in successor systems.
For the following questions, write a paragraph or so in the file mapreduce.txt
.
-
The map and reduce functions used in MapReduce have names that are similar to familiar higher-order Racket functions. (
fold
is also sometimes calledreduce
.) Discuss similarities and differences between MapReduce’s programming model and the programming idioms we have used with familiar higher-order functions in Racket. What familiar Racket functions are similar to the primitives of the MapReduce framework? Is there an exact match? It may help to consider themapreduce
function you are asked to implement in the programming section below. -
In class (possibly after the release of this assignment), we consider pros and cons of immutability vs. mutability in programming languages. How do internet-scale design constraints affect the (im)mutability discussion? Discuss how this choice is important for at least 2 design problems that MapReduce aims to solve.
3. Higher-Order Data-Crunching (50 points)
This part explores several functions that use or provide higher-order
operations over lists and related structures. Many of the functions
will have similarities. Write your answers to this part in
datacrunch.rkt
.
-
A corpus (for the purposes of this assignment) is a list of documents, where each document is a pair (cons cell) of a name and the contents (i.e., list of words of the document) of the document. Names and words may be any simple values, such as symbols, strings, or numbers.
A corpus may also be considered to be:
- an association list from document name to document contents; or
- a list of lists where the first element of each list is the document name and the remaining elements are the document words.
The following example corpus is written with a careful choice of quoted notation that helps illustrate the form as we will interpret it. As we have seen before, Racket may give a different display notation. Consider why.
(define corpus '((hw . (hello world)) (hw2 . (hi world)) (wc . (world champion of the hi fi world))))
Evaluate
corpus
in DrRacket. How is it displayed? Why does DrRacket display it differently than the above? Why are both displays accurate and equivalent? Recall that'(A . B)
denotes a cons cell holding'A
in thecar
and'B
in thecdr
. You do not need to submit an answer for this part. -
Write a function
count
that takes a corpus ofdocs
and aword
and returns the total number of occurrences of word in the corpus. Use higher-order functions to your advantage. Minimize the number of intermediate data structures (lists / cons cells) that your solution produces. An ideal solution never creates a fresh cons cell.> (count corpus 'hi) 2 > (count corpus 'world) 4 > (count corpus 'hello) 1 > (count corpus 'hooligans) 0 > (count null 'hooligans) 0 > (count corpus 'hw) 0
-
Write a function
grep
that takes a corpusdocs
and aword
and returns an association list mapping document name to the number of occurrences ofword
in that document. ONLY documents indocs
with at least 1 occurrence ofword
should appear as keys in the association list. Use higher-order functions to your advantage. Minimize the number of intermediate data structures (lists / cons cells) that your solution produces that are not part of the final result. Order does not matter within the result.> (grep corpus 'hi) '((wc . 1) (hw2 . 1)) > (grep corpus 'world) '((wc . 2) (hw2 . 1) (hw . 1)) > (grep corpus 'hooligans) '() > (grep null 'world) '()
-
For the purposes of this task, a pile is either:
- a non-
null
atom (a number, boolean, symbol, etc.); or - a list (empty or non-empty) of piles.
A pile:
- never contains cons cells that are not lists;
- may be arbitrarily deep; and
- typically contains only one type of atom.
The following example binds
pile1
throughpile5
to piles:(define pile1 (list 1)) (define pile2 2) (define pile3 null) (define pile4 (list 2 3 4)) (define pile5 (list (list 1 2 3 4) (list 2 (list 4 5) (list 9 2 1) 3) (list 2 8 7)))
A pile may be flattened into a single-dimensional list with elements in their original pile order by applying the built in Racket function
flatten
:> (flatten pile5) '(1 2 3 4 2 4 5 9 2 1 3 2 8 7)
Your task is to write a function
foldl-pile
that takes these parameters:- a combiner function
combine
; - an initial accumulator value
init
; and - a pile
pile
;
and returns a result equivalent to the result of
(foldl combine init (flatten pile))
Implementation rules:
- You may not invoke
flatten
or any equivalent function as part of yourfoldl-pile
implementation. - You must minimize the number of intermediate data structures
(lists / cons cells) that your solution produces and discards as
by-products. The best implementations of
foldl-pile
do not create any cons cells that are not part of the final result value offoldl-pile
.
Examples:
> (foldl-pile + 0 pile1) 1 > (foldl-pile + 0 pile2) 2 > (foldl-pile + 0 pile3) 0 > (foldl-pile + 0 pile4) 9 > (foldl-pile + 0 pile5) 53 > (foldl-pile cons null pile5) '(7 8 2 3 1 2 9 5 4 2 4 3 2 1) > (foldl-pile - 1 pile4) 2 > (foldl-pile - 1 pile5) 6
Hints:
- The sample solution adds 3 lines below the define line.
- Only one of those 3 lines is longer than the define line.
- a non-
-
Write a matching function
foldr-pile
that acts just likefoldl-pile
and is subject to the same constraints, except thatfoldr-pile
folds from the right instead from the left, just asfoldr
complementsfoldl
.> (foldr-pile + 0 pile1) 1 > (foldr-pile + 0 pile2) 2 > (foldr-pile + 0 pile3) 0 > (foldr-pile + 0 pile4) 9 > (foldr-pile + 0 pile5) 53 > (foldr-pile cons null pile5) '(1 2 3 4 2 4 5 9 2 1 3 2 8 7) > (foldr-pile - 1 pile4) 2 ; there was a typo previously > (foldr-pile - 1 pile5) -4
-
Write a function
mapreduce
that takes three arguments:- a mapping function,
mapper
, that takes one element and returns one mapping result; - a reducing function,
reducer
, that takes one mapping result and one accumulator and returns one accumulator; - an initial accumulator,
init
; and - a list of elements,
elems
The
mapreduce
function applies themapper
function to each element ofelems
and uses thereducer
function to combine these results of themapper
. One possible implementation is:(define (mapreduce mapper reducer init elems) (foldl reducer init (map mapper elems)))
This implementation creates an intermediate list (the result of
map
) that is used only forfold
ing and then discarded. Your task is to implement a version that creates zero temporary cons cells.While the
reducer
step in this style of computation often has no ordering requirements (it is commutative), we will enforce left-to-right ordering of reductions for the purposes of this task to simplify testing. In other words, your implementation should produce results identical to those produced by the abovemap
/foldl
implementation. To demonstrate that ordering, some examples below use non-commutative (order matters) reducers, likecons
andappend
.> (mapreduce (lambda (x) (* x x)) cons null (list 1 2 3 4)) '(16 9 4 1) > (mapreduce length + 0 corpus) 14 > (mapreduce cadr cons null corpus) '(world hi hello) > (mapreduce (lambda (assoc) (filter (lambda (x) (not (equal? x 'world))) (cdr assoc))) append null corpus) '(champion of the hi fi hi hello)
- a mapping function,
-
Write a function
grep-mapreduce
that acts just likegrep
, but is implemented viamapreduce
. The top-level expression in the body ofgrep-mapreduce
must be a call tomapreduce
. Any number of inner helper function definitions may precede this call tomapreduce
, but neithergrep-mapreduce
nor any helper function you write for it may use explicit recursion. Helper functions may usemapreduc
or other higher-order functions. -
Write a function
wordfreqs
that takes a corpus and returns a word-frequency association list that counts how often each word appears across all documents in the corpus. Order within the result does not matter. Each word from the corpus must appear as the key of exactly one association in the result list. There is no ordering/sorting requirement. Only document contents should be counted; document titles should not be counted. The implementation must use a call tomapreduce
as its top-level body expression. Any number of inner helper function definitions may precede this call tomapreduce
, but neithergrep-mapreduce
nor any helper function you write for it may use explicit recursion. Helper functions may usemapreduce
or other higher-order functions.> (wordfreqs corpus) ; one legitimate result '((world . 4) (hello . 1) (champion . 1) (hi . 2) (fi . 1) (of . 1) (the . 1)) ; another legitimate result '((hello . 1) (hi . 2) (fi . 1) (champion . 1) (of . 1) (world . 4) (the . 1))
There are many potential ways to approach this. Aim for a solution that is easy to express, even if it is not the most efficient.
-
[Optional extra, not graded] Can
foldl-pile
be implemented withmapreduce
? How many distinct implementations can you come up with? Do they use both the mapping and reducing aspects?
Submission
Submit: The course staff will collect your work directly from your hosted repository. To submit your work:
-
Test your source code files one last time. Make sure that, at a minimum, submitted source code is free of syntax errors and any other static errors (such as static type errors or name/scope errors). In other words: the code does not need to complete the correct computation when invoked, but it must be a valid program. We will not grade files that do not pass this bar.
-
Make sure you have committed your latest changes.
$ git add ... $ git commit ...
-
Run the command
cs251 sign
to sign your work and respond to any assignment survey questions.$ cs251 sign
-
Push your signature and your latest local commits to the hosted repository.
$ git push
Confirm: All local changes have been submitted if the output of
git status
shows both:
Your branch is up to date with 'origin/master'
, meaning all local commits have been pushednothing to commit
, meaning all local changes have been committed
Resubmit: If you realize you need to change something later, just repeat this process.