Random Sentences and Sieves
- Due: 11:00pm Monday 7 December
- Files:
- fork wellesleycs251 / cs251-scala and add bpw as admin
- Time estimates in
time.txt
- Submission: commit and push your completed work to your Bitbucket fork.
- Relevant reading:
- Scala language reference
- J. P. Rosen, What Orientation Should Ada Objects Take?
- (old) Scala Actors Tutorial (ignore new Akka actors)
- Notes on Parallelism and Concurrency in Java sections 2, 3, 6-10. (Section 10 covers producer-consumer pattern)
- Tools:
- Collaboration: All problems fall under the group collaboration policy. You may work in pairs or alone. You may discuss ideas with other groups, but your writing and code must be the work of your group members. If you work with a partner, you must work together as a pair on all parts of the problem rather than working on separate pieces.
Overview
This assignment uses small programming exercises to contrast different design approaches in object-oriented programming and in parallel and concurrent programming.
- The first exercise is mostly an in-class activity, to be finished with the assignment if needed.
- The second exercise will familiarize you furthher with Scala and explore more design trade-offs.
- The third (to be published after Thanksgiving break) will provide a taste of concurrent programming.
Contents
- Expression as Objects (mostly in class)
- Random Sentence Generator
- OPTIONAL but Recommended Reading
- Parallel Sieves
See reference material and tool documentation for Scala and its tools.
The tools are installed on the CS Linux machines. If you are using the wx appliance, be sure to update its configuration to include Scala tools by running the wx update
command while logged in as the wx
user.
Expression as Objects1 (mostly in class)
As discussed in class, idiomatic code in “functional” programming languages and object-oriented programming languages tends toward certain patterns of organization. Functional programming tends to organize code around behavior of operations and is thus easier to extend with new behaviors. Object-oriented programming tends to organization code around behavior of kinds of data and is thus easier to extend with new data variants. In this part, we explore a design pattern (a common idiom) for writing object-oriented code that seeks to emulate the operation-centered organization of functional programming while using the object-oriented semantic tool: dynamic dispatch.
Consider an object-oriented implementation of a small calculator expression language much like the various such languages we have worked with in ML. An object oriented approach would typically use a class hierarchy to organize expressions and the behavior of operations over expressions:
- An abstract base class
Expr
defines the set of operations each expression should support. - Concrete subclasses of
Expr
each define behavior for one kind of expression (Number
,Plus
,Times
, etc.).
The hierarchy is likely clear, but the organization of operations such as eval
, toString
, etc., is more subtle. The obvious way of implementing operations is by adding a method to each class for each operation. The Expression hierarchy would then look like:
abstract class Expr {
def toString() : String
def eval() : Int
}
class Number(n: Int) extends Expr {
override def toString() : String = { ... }
override def eval() : Int = { ... }
}
class Plus(left: Expr, right: Expr) extends Expr {
override def toString() : String = { ... }
override def eval() : Int = left.eval() + right.eval()
}
Another way of implementing expression classes and operations uses a
pattern called the Visitor Design Pattern to emulate the functional-style organization where all code for one operation is collected together. In this pattern, each
operation is represented by a Visitor
class. Each Visitor class
has a visitNumber(e: Number)
method, a visitPlus(e: Plus)
method, and so on – one for each expression class in the hierarchy. Each expression class in the
expression hierarchy has an accept
method which accepts a
Visitor
as an argument and “allows the Visitor to visit the object
and perform its operation.” The expression class does not need to
know what operation the visitor is performing.
If you write a Visitor class ToString
to construct a string
representation of an expression tree, it would be used as follows:
val expTree = new Plus(new Plus(new Number(3), new Number(4)), new Number(2))
val printer = new ToString()
val stringRep = expTree.accept(printer)
println(stringRep)
The first line defines an expression, the second defines an instance
of your ToString
class, and the third passes your visitor object
to the accept
method of the expression object.
The expression class hierarchy using the Visitor Design Pattern has
the following form, with an accept
method in each class and
possibly other methods. Since different kinds of visitors return
different types of values, the accept method is parameterized by the
type that the visitor computes for each expression tree. In Scala, type parameters appear inside square brackets.
abstract class Expr {
def accept[T](v: Visitor[T]): T
}
class Number(n: Int) extends Expr {
override def accept[T](v: Visitor[T]): T = v.visitNumber(n)
}
class Plus(left: Expr, right: Expr) extends Expr {
override def accept[T](v: Visitor[T]): T = {
v.visitPlus(left.accept(v), right.accept(v))
}
}
The associated Visitor
abstract class, naming the methods that
must be included in each visitor, and the ToString
visitor, have
this form:
abstract class Visitor[T] {
def visitNumber(n: Int): T
def visitPlus(left: T, right: T): T
}
class ToString extends Visitor[String] {
override def visitNumber(n: Int): String = {
"" + n
}
override def visitPlus(left: String, right: String): String = {
"(" + left + " + " + right + ")"
}
}
Tasks
-
Starting with the call to
expTree.accept(printer)
and ending beforeprintln
what is the sequence of method calls that occurs while building the string represenation,stringRep
of the expression treeexpTree
below?val expTree = new Plus(new Plus(new Number(3), new Number(4)), new Number(2)) val printer = new ToString() val stringRep = expTree.accept(printer) println(stringRep)
-
The code for these classes is in
Visitors.scala
in the repository for this assignment. Compile withfsc Visitors.scala
orscalac Visitors.scala
and run withscala Visitors
. Add the following classes to that file. You will need to change some of the existing classes to accomodate them.- An
Eval
visitor class that evaluates an expression tree. The visit methods should return anInt
. Subtract
andTimes
classes to represent subtraction and product expressions.- A
Compile
visitor class that returns a sequence of stack-language instructions to evaluate the expression. You may use the following stack instructions:Push(n)
,Add
,Mult
,Sub
,Div
,Swap
. The visit methods can simply return aString
containing the sequence of instructions. For example, compiling the expression tree for 3*(1-2) should return the string"Push(3) Push(1) Push(2) Sub Mult"
. The instruction sequence should just leave the result of computing the expression on the top of the stack. Hint: use a post-order traversal of the expression tree.
- An
-
Suppose there are n subclasses of
Expr
, and m operations we need to implement for expressions. How many classes would have to be added or changed to add each of the following things using the Visitor Design Pattern? Using the original approach described above? Do not implement them, just answer these questsion:- A new class to represent division expressions.
- A new operation to graphically draw the expression parse tree.
-
Consider the trade-offs of these two approaches.
- Under what circumstances would you recommend using the standard design?
- Under what circumstances would you recommend using the Visitor Design Pattern?
- Do you prefer either of these organizations to the organization that comes naturally with pattern-matching in ML?
Random Sentence Generator1
The goals of this problem are to:
- write more Scala code,
- implement a simple class hierarchy following the Composite Design Pattern to contrast it with a functional style, and
- write a fairly entertaining program.
The Random Sentence Generator creates random sentences from a grammar. Here are a few examples of the output for generating entertaining homework extension requests. (This, and rest of the provided grammars, are classics dating back a number of years.[^steve I have made a couple minor location-specific tweaks in porting the assigment to our course.)
-
Honesty: I need an extension because my professor assigned more 251 homework than anybody could finish in 4 weeks, even with the help of 7 lake monsters working at the speed of light.
-
Plead innocence: I need an extension because it was just too nice outside and then I didn’t know I was in this class.
-
Wear down the Professor’s patience: I need an extension because I used up all my paper and then my dorm burned down and then I didn’t know I was in this class and then I lost my pet iguana and then my karma wasn’t good last week and on top of that my dog ate my notes and as if that wasn’t enough I had to finish my doctoral thesis this week and then I had to do laundry and on top of that my karma wasn’t good last week and on top of that I just didn’t feel like working and then I skied into a tree and then I fell through the ice on Lake Waban and as if that wasn’t enough I thought I already graduated and as if that wasn’t enough I lost my dreams and in addition I spent all weekend hung-over and then I had to go to an intramural monster truck meet this week and on top of that all my pencils broke.
Grammars
The program reads in grammars written in a form illustrated by this simple grammar file to generate poems:
<start> = The <object> <verb> tonight
;
<object> =
waves
| big yellow flowers
| slugs
;
<verb> =
sigh <adverb>
| portend like <object>
| hum <adverb>
;
<adverb> =
warily
| grumpily
;
The strings in brackets (<>
) are the non-terminals. Each
non-terminal definition is followed by a sequence of productions,
separated by |
characters, and with a ;
at the end. Each
production consists of a sequence of white-space separated terminals
and non-terminals. A production may be empty so that a non-terminal
can expand to nothing. There will always be whitespace surrounding
the |
, =
, and ;
characters to make parsing easy.
Here are two possible poems generated by generating derivations for this grammar:
- The big yellow flowers sigh warily tonight
- The slugs portend like waves tonight
Your program will create a data structure to represent a grammar it
reads in and then produce random derivations from it. Derivations
will always begin with the non-terminal <start>
.
To expand a non-terminal, simply choose one of its productions from
the grammar at random and then recursively expand each word in the
production. For example:
<start>
-> The <object> <verb> tonight
-> The big yellow flowers <verb> tonight
-> The big yellow flowers sigh <adverb> tonight
-> The big yellow flowers sigh warily tonight
Organization
A grammar consists of terminals, non-terminals, productions, and definitions. These four items have one thing in common: despite occupying structurally different roles in the definition of a grammar, these items can all be expanded into a random derivation for that part of a grammar. Thus, we will create classes organized in the following class hierarchy to store a grammar:
The abstract class GrammarElement
provides the general interface
to all pieces of a grammar. It is defined as follows:
abstract class GrammarElement {
/**
* Expand the grammar element as part of a random
* derivation. Use grammar to look up the definitions
* of any non-terminals encountered during expansion.
*/
def expand(grammar: Grammar): String
/**
* Return a string representation of this grammar element.
* This is useful for debugging. (Even though we inherit a
* default version of toString() from the Object superclass,
* I include it as an abstract method here to ensure that
* all subclasses provide their own implmementaiton.)
*/
def toString(): String
}
The Grammar
object passed into expand
is used to look up the
definitions for non-terminals during the expansion process, as described
next.
The Grammar
Class
A Grammar
object maps non-terminal names to their definitions. At
a minimum, your Grammar
class should implement the following:
class Grammar {
// add a new non-terminal, with the given definition
def +=(nt : String, defn : Definition)
// look up a non-terminal, and return the definition, or null
// if not def exists.
def apply(nt : String) : Definition
// Expand the start symbol for the grammar.
def expand() : String
// return a String representation of this object.
override def toString() : String
}
Subclasses
The four subclasses of GrammarElement
represent the different
pieces of the grammar and describe how each part is expanded:
-
Terminal
: A terminal just stores a terminal string (likeslugs
), and a terminalexpand
s to itself. -
NonTerminal
: A non-terminal stores a non-terminal string (like<start>
). When a non-terminalexpand
s, it looks up the definition for its string and recursively expands that definition. -
Production
: A production stores a list ofGrammarElement
s. Toexpand
, a production simply expands each one of these elements. -
Definition
: A definition stores a vector ofProduction
s. A definition is expanded by picking a randomProduction
from its vector and expanding thatProduction
.
This design is an example of the Composite Design Pattern.
Tasks
Compile with fsc RandomSentenceGenerator.scala
or scalac RandomSentenceGenerator.scala
, then run with scala RandomSentenceGenerator < Poem.g
.
You will need to use Scala’s generic library classes. In particular,
you will probably want to use both List
s and Map
s from the standard
Scala packages. The full documentation for the Scala standard library is here.
-
Begin by implementing the four subclasses of
GrammarElement
. Do not writeexpand
yet, but complete the rest of the classes so that you can create and calltoString()
on them. -
To save time, I have provided code to parse the file format and create a grammar using the contructors of your classes.
You will need to complete the definition of
Grammar
at this point for parsing to work. -
Once the grammar can be properly created and printed, implement the
expand
methods for yourGrammarElement
s. Scala provides a random number generator that can be used as follows:val number = Random.nextInt(N) // number is in range [0,N-1].
-
Change
RandomSentenceGenerator
to create and print three random derivations after printing the grammar. -
Write at least one additional grammar. It can be as simple or as complicated as you like. Go wild.
-
Reflect briefly on the organization of code for this task. If expanding derivations was not the primary use of our grammar representations, would you choose the same representations? If you were building a random sentence generator in ML (or in Scala using a more functional approach), how might it differ? Just write a few sentences.
Hints
A few details about producing derivations:
-
The grammar will always contain a
<start>
non-terminal to begin the expansion. It will not necessarily be the first definition in the file, but it will always be defined eventually. I have provided some error checking in the parsing code, but you may assume that the grammar files are otherwise syntactically correct. -
The one error condition you should catch reasonably is the case where a non-terminal is used but not defined. It is fine to catch this when expanding the grammar and encountering the undefined non-terminal rather than attempting to check the consistency of the entire grammar while reading it. The starter code contains a
RandomSentenceGenerator.fail(msg: String): Unit
method that you can call to report an error and stop.
-
When generating the output, just print the terminals as you expand. Each terminal should be preceded by a space when printed, except the terminals that begin with punctuation like periods, comma, dashes, etc. You can use the
Character.isLetterOrDigit
method to check whether a character is punctuation mark. This rule about leading spaces is just a rough heuristic, because some punctuation (quotes for example) might look better with spaces. Don’t worry about the minor details— we’re looking for something simple that is right most of the time and it’s okay if is little off for some cases.
OPTIONAL but Recommended Reading
This reading is optional, but recommended. It gives a great high-level view of the contrast between two major styles of program organization (and the language features that support them). There are interesting connotations for reliability and extensibility.
What Orientation Should Ada Objects Take?
Read What Orientation Should Ada Objects Take?
J. P. Rosen. Communications of the ACM, November 1992, Vol. 35, No. 11.
This article discusses the Ada programming language, a programming language with a strong static type system and module system similar in some respects to the module system of ML. Ada was originally developed for high-consequence coding projects in the U.S. Department of Defense. No previous understanding of Ada is necessary to read this article. It mentions a couple Ada features, but you can get the core ideas from the article alone.
Reflect on Rosen’s argument about the tensions between composition (with true abstract data types) and classification (via inheritance and encapsulation). Do you see one as more important or useful than the other? Does your answer change depending on the kind of software you consider?
Parallel Sieves1
To get additional starter code for this part, run hg pull ssh://hg@bitbucket.org/wellesleycs251/cs251-scala
. If you have committed any changes in your repository, run hg merge; hg commit
, otherwise, run hg update
.
The ML function, primesto n
, given below, can be used to calculate
the list of all primes less than or equal to n
by using the
classic Sieve of Eratosthenes.
(*
* Sieve of Eratosthenes: Remove all multiples of first element from list,
* then repeat sieving with the tail of the list. If start with list [2..n]
* then will find all primes from 2 up to and including n.
*)
fun sieve [] = []
| sieve (x::xs) =
let fun filter p [] = []
| filter p (y::ys) = if (y mod p) = 0
then filter p ys
else y::(filter p ys)
in
x::(sieve (filter x xs))
end
(* Returns list of integers from i to j *)
fun fromto i j = if j < i then [] else i::(fromto (i+1) j)
(* Return list of primes from 2 to n *)
fun primesto n = sieve (fromto 2 n)
Pipeline Parallelism
Notice that each time through the sieve we first filter all of the multiples of the first element from the tail of the list, and then perform the sieve on the reduced tail. In ML, one must wait until the entire tail has been fitered before you can start the sieve on the resulting list. However, one could use parallelism to have one process start sieving the result before the entire tail had been completely filtered by the original process.
This pattern is called pipeline parallelism: many stages of work are done in parallel. Individual pieces of data move through the pipeline stages in sequence. Pipeline parallelism is used in many applications, from image processing to the internal hardware implementation of computer processors. We will build two pipeline-parallel implementations of the Sieve of Eratosthenes, using two different concurrency primitives to build the pipeline.
Thread Sieves and Bounded Buffers (mostly in class)
Our first implementation using pipeline parallelism will use
the Java Buffer
class.
The main program should begin by creating a Buffer
object, with
perhaps 5 slots. It should then successively insert the numbers from
2 to n
into the Buffer
, followed by -1 to signal that there
are no more input numbers.
After the creation of the Buffer
object, but before starting to
put the numbers into it, the program should create a Sieve
object
(using the Sieve
class described below) and pass it the Buffer
object as a parameter to Sieve
’s constructor. The Sieve
object
should then begin running in a separate thread while the main program
inserts the numbers in the buffer.
After the Sieve
object has been constructed and the Buffer
object has been stored in an instance variable,
in
, its run
method should get the first item from in
:
- If that number is negative then the
run
method should terminate. - Otherwise, it should print out the number (using
System.out.println
) and then create a newBuffer
object,out
. A newSieve
should be created withBuffer
out
and started running in a new thread. Meanwhile the currentSieve
object should start filtering the elements from thein
buffer. That is, therun
method should successively grab numbers from thein
buffer. If the number is divisible by the first number that was obtained fromin
, it is discarded. Otherwise, it is added to theout
buffer. This reading and filtering continues until a negative number is read. When the negative number is read, that number is put into theout
buffer and then therun
method terminates.
In this way, the program will eventually have created a total of p +
1 Sieve
objects all running in separate threads, where p is
the number of primes between 2 and n
. The instances of Sieve
will be working in a pipeline, using the buffers to pass numbers from
one Sieve
object to the next.
In the threads
directory, edit Sieve.java
to implement your solution. Compile with javac Sieve.java
. Run with java Sieve
. Use the provided Buffer
class. Each of the buffers used should be able to hold at most 5 items. Your program should print all of the primes less than 100, in order.
Actor Sieves
Rewrite the Sieve of Eratosthenes program from above in Scala
using Actors. Several actor examples are provided in the examples
directory. Note, the Scala library recently introduced a new version of actors (akka.actor). You will likely see warning about deprecation for this version (scala.actors) – ignore them. The Java sieve program created a new sieve thread for every prime
number discovered. This time, create a new Sieve
actor for each prime Int
.
Your Sieve
actor should keep track of the prime it
was created with, and it should have a slot that can hold another
“follower” Sieve
actor that will handle the next prime
found. (The mailbox of this other Actor will play the role of the
bounded Buffer
from the previous problem in that it will
hold the numbers being passed along from one actor to the next.)
Each Sieve
actor should be able to handle messages of the form
Num(n)
and Stop
. Use Scala’s case classes to emulate an ML datatype for messages.
The operation of each Sieve
actor, once started, is
similar to the previous problem. If it receives a message of the
form Num(n)
then it checks if it is divisible by its
prime. If so, it is discarded. If not, then if the follower
Actor has not yet been created, create it with the number. If
not, then send the number to the follower Sieve
actor.
The Stop
message should cause each Sieve
actor to print its prime, pass on the message to its follower (if any), and exit.
Edit Sieve.scala
to implement your solution. Compile with fsc Sieve.scala
or scalac Sieve.scala
. Run with scala SieveMain
. Your program should print all of the primes less than 100, in order.