Join the Fold
- Assign: Tuesday, 18 February
- Due: 11:59pm Tuesday, 25 February
- Policy: Pair graded synthesis assignment
- Partner search: Find a partner here.
-
Code:
cs251 start fold - Submit:
git commit,cs251 sign, andgit pushyour completed code. - Reference:
Contents
Time Info
In Fall 2019, students self-reported spending a median of 6.0 hours on this assignment.
Compared to that version of the assignment, the current version adds the mapreduce-related programming tasks.
Tasks
1. Lexical Scope and Closures (20 points)
-
Draw lexical scope contours and the environments produced by evaluating the following sequence of Racket bindings as we did in class. (Note this may happen after the assignment is posted.)
(define f (lambda (g) (let ([x 3]) (g 2)))) (define x 4) (define h (lambda (y) (+ x y))) (define z (f h))Choose one of these options to submit this part:
git addandgit commita PDF file named exactlyenv.pdfcontaining your drawings in your repository; OR- Upload photos of the lexical scope contours and the environment diagram to the CS 251 course on GradeScope. Log in with your Wellesley email address. (See Gradescope help and how to scan / upload to Gradescope with a smartphone camera.)
2. Functional Idioms in the Large (10 points)
Read MapReduce: Simplified Data Processing on Large Clusters, by Dean and Ghemawat. The MapReduce system is a good example of how an idiom developed with a particular language or style of language can be applied even far outside that language. Consider that (a) this paper was published in 2004, when the system had already been in use for some time, (b) the scale of workloads at major software services companies has probably increased exponentially since, and (c) the MapReduce system itself is no longer used, but many of its lessons live on in successor systems.
For the following questions, write a paragraph or so in the file mapreduce.txt.
-
The map and reduce functions used in MapReduce have names that are similar to familiar higher-order Racket functions. (
foldis also sometimes calledreduce.) Discuss similarities and differences between MapReduce’s programming model and the programming idioms we have used with familiar higher-order functions in Racket. What familiar Racket functions are similar to the primitives of the MapReduce framework? Is there an exact match? It may help to consider themapreducefunction you are asked to implement in the programming section below. -
In class (possibly after the release of this assignment), we consider pros and cons of immutability vs. mutability in programming languages. How do internet-scale design constraints affect the (im)mutability discussion? Discuss how this choice is important for at least 2 design problems that MapReduce aims to solve.
3. Higher-Order Data-Crunching (50 points)
This part explores several functions that use or provide higher-order
operations over lists and related structures. Many of the functions
will have similarities. Write your answers to this part in
datacrunch.rkt.
-
A corpus (for the purposes of this assignment) is a list of documents, where each document is a pair (cons cell) of a name and the contents (i.e., list of words of the document) of the document. Names and words may be any simple values, such as symbols, strings, or numbers.
A corpus may also be considered to be:
- an association list from document name to document contents; or
- a list of lists where the first element of each list is the document name and the remaining elements are the document words.
The following example corpus is written with a careful choice of quoted notation that helps illustrate the form as we will interpret it. As we have seen before, Racket may give a different display notation. Consider why.
(define corpus '((hw . (hello world)) (hw2 . (hi world)) (wc . (world champion of the hi fi world))))Evaluate
corpusin DrRacket. How is it displayed? Why does DrRacket display it differently than the above? Why are both displays accurate and equivalent? Recall that'(A . B)denotes a cons cell holding'Ain thecarand'Bin thecdr. You do not need to submit an answer for this part. -
Write a function
countthat takes a corpus ofdocsand awordand returns the total number of occurrences of word in the corpus. Use higher-order functions to your advantage. Minimize the number of intermediate data structures (lists / cons cells) that your solution produces. An ideal solution never creates a fresh cons cell.> (count corpus 'hi) 2 > (count corpus 'world) 4 > (count corpus 'hello) 1 > (count corpus 'hooligans) 0 > (count null 'hooligans) 0 > (count corpus 'hw) 0 -
Write a function
grepthat takes a corpusdocsand awordand returns an association list mapping document name to the number of occurrences ofwordin that document. ONLY documents indocswith at least 1 occurrence ofwordshould appear as keys in the association list. Use higher-order functions to your advantage. Minimize the number of intermediate data structures (lists / cons cells) that your solution produces that are not part of the final result. Order does not matter within the result.> (grep corpus 'hi) '((wc . 1) (hw2 . 1)) > (grep corpus 'world) '((wc . 2) (hw2 . 1) (hw . 1)) > (grep corpus 'hooligans) '() > (grep null 'world) '() -
For the purposes of this task, a pile is either:
- a non-
nullatom (a number, boolean, symbol, etc.); or - a list (empty or non-empty) of piles.
A pile:
- never contains cons cells that are not lists;
- may be arbitrarily deep; and
- typically contains only one type of atom.
The following example binds
pile1throughpile5to piles:(define pile1 (list 1)) (define pile2 2) (define pile3 null) (define pile4 (list 2 3 4)) (define pile5 (list (list 1 2 3 4) (list 2 (list 4 5) (list 9 2 1) 3) (list 2 8 7)))A pile may be flattened into a single-dimensional list with elements in their original pile order by applying the built in Racket function
flatten:> (flatten pile5) '(1 2 3 4 2 4 5 9 2 1 3 2 8 7)Your task is to write a function
foldl-pilethat takes these parameters:- a combiner function
combine; - an initial accumulator value
init; and - a pile
pile;
and returns a result equivalent to the result of
(foldl combine init (flatten pile))Implementation rules:
- You may not invoke
flattenor any equivalent function as part of yourfoldl-pileimplementation. - You must minimize the number of intermediate data structures
(lists / cons cells) that your solution produces and discards as
by-products. The best implementations of
foldl-piledo not create any cons cells that are not part of the final result value offoldl-pile.
Examples:
> (foldl-pile + 0 pile1) 1 > (foldl-pile + 0 pile2) 2 > (foldl-pile + 0 pile3) 0 > (foldl-pile + 0 pile4) 9 > (foldl-pile + 0 pile5) 53 > (foldl-pile cons null pile5) '(7 8 2 3 1 2 9 5 4 2 4 3 2 1) > (foldl-pile - 1 pile4) 2 > (foldl-pile - 1 pile5) 6Hints:
- The sample solution adds 3 lines below the define line.
- Only one of those 3 lines is longer than the define line.
- a non-
-
Write a matching function
foldr-pilethat acts just likefoldl-pileand is subject to the same constraints, except thatfoldr-pilefolds from the right instead from the left, just asfoldrcomplementsfoldl.> (foldr-pile + 0 pile1) 1 > (foldr-pile + 0 pile2) 2 > (foldr-pile + 0 pile3) 0 > (foldr-pile + 0 pile4) 9 > (foldr-pile + 0 pile5) 53 > (foldr-pile cons null pile5) '(1 2 3 4 2 4 5 9 2 1 3 2 8 7) > (foldr-pile - 1 pile4) 2 ; there was a typo previously > (foldr-pile - 1 pile5) -4 -
Write a function
mapreducethat takes three arguments:- a mapping function,
mapper, that takes one element and returns one mapping result; - a reducing function,
reducer, that takes one mapping result and one accumulator and returns one accumulator; - an initial accumulator,
init; and - a list of elements,
elems
The
mapreducefunction applies themapperfunction to each element ofelemsand uses thereducerfunction to combine these results of themapper. One possible implementation is:(define (mapreduce mapper reducer init elems) (foldl reducer init (map mapper elems)))This implementation creates an intermediate list (the result of
map) that is used only forfolding and then discarded. Your task is to implement a version that creates zero temporary cons cells.While the
reducerstep in this style of computation often has no ordering requirements (it is commutative), we will enforce left-to-right ordering of reductions for the purposes of this task to simplify testing. In other words, your implementation should produce results identical to those produced by the abovemap/foldlimplementation. To demonstrate that ordering, some examples below use non-commutative (order matters) reducers, likeconsandappend.> (mapreduce (lambda (x) (* x x)) cons null (list 1 2 3 4)) '(16 9 4 1) > (mapreduce length + 0 corpus) 14 > (mapreduce cadr cons null corpus) '(world hi hello) > (mapreduce (lambda (assoc) (filter (lambda (x) (not (equal? x 'world))) (cdr assoc))) append null corpus) '(champion of the hi fi hi hello) - a mapping function,
-
Write a function
grep-mapreducethat acts just likegrep, but is implemented viamapreduce. The top-level expression in the body ofgrep-mapreducemust be a call tomapreduce. Any number of inner helper function definitions may precede this call tomapreduce, but neithergrep-mapreducenor any helper function you write for it may use explicit recursion. Helper functions may usemapreducor other higher-order functions. -
Write a function
wordfreqsthat takes a corpus and returns a word-frequency association list that counts how often each word appears across all documents in the corpus. Order within the result does not matter. Each word from the corpus must appear as the key of exactly one association in the result list. There is no ordering/sorting requirement. Only document contents should be counted; document titles should not be counted. The implementation must use a call tomapreduceas its top-level body expression. Any number of inner helper function definitions may precede this call tomapreduce, but neithergrep-mapreducenor any helper function you write for it may use explicit recursion. Helper functions may usemapreduceor other higher-order functions.> (wordfreqs corpus) ; one legitimate result '((world . 4) (hello . 1) (champion . 1) (hi . 2) (fi . 1) (of . 1) (the . 1)) ; another legitimate result '((hello . 1) (hi . 2) (fi . 1) (champion . 1) (of . 1) (world . 4) (the . 1))There are many potential ways to approach this. Aim for a solution that is easy to express, even if it is not the most efficient.
-
[Optional extra, not graded] Can
foldl-pilebe implemented withmapreduce? How many distinct implementations can you come up with? Do they use both the mapping and reducing aspects?
Submission
Submit: The course staff will collect your work directly from your hosted repository. To submit your work:
-
Test your source code files one last time. Make sure that, at a minimum, submitted source code is free of syntax errors and any other static errors (such as static type errors or name/scope errors). In other words: the code does not need to complete the correct computation when invoked, but it must be a valid program. We will not grade files that do not pass this bar.
-
Make sure you have committed your latest changes.
$ git add ... $ git commit ... -
Run the command
cs251 signto sign your work and respond to any assignment survey questions.$ cs251 sign -
Push your signature and your latest local commits to the hosted repository.
$ git push
Confirm: All local changes have been submitted if the output of
git status shows both:
Your branch is up to date with 'origin/master', meaning all local commits have been pushednothing to commit, meaning all local changes have been committed
Resubmit: If you realize you need to change something later, just repeat this process.