CS 251: Random Sentences and Sieves

Due: 11:00pm Monday 7 December
Files:
- fork wellesleycs251 / cs251-scala and add bpw as admin
- Time estimates in time.txt
Submission: commit and push your completed work to your Bitbucket fork.
Relevant reading:
- Scala language reference
- J. P. Rosen, What Orientation Should Ada Objects Take?
- (old) Scala Actors Tutorial (ignore new Akka actors)
- Notes on Parallelism and Concurrency in Java sections 2, 3, 6-10. (Section 10 covers producer-consumer pattern)
Tools:
- Mercurial
- Scala
Collaboration: All problems fall under the group collaboration policy. You may work in pairs or alone. You may discuss ideas with other groups, but your writing and code must be the work of your group members. If you work with a partner, you must work together as a pair on all parts of the problem rather than working on separate pieces.

Overview

This assignment uses small programming exercises to contrast different design approaches in object-oriented programming and in parallel and concurrent programming.

The first exercise is mostly an in-class activity, to be finished with the assignment if needed.
The second exercise will familiarize you furthher with Scala and explore more design trade-offs.
The third (to be published after Thanksgiving break) will provide a taste of concurrent programming.

Expression as Objects (mostly in class)
Random Sentence Generator
OPTIONAL but Recommended Reading
Parallel Sieves

Using Scala

See reference material and tool documentation for Scala and its tools.

The tools are installed on the CS Linux machines. If you are using the wx appliance, be sure to update its configuration to include Scala tools by running the wx update command while logged in as the wx user.

Expression as Objects¹ (mostly in class)

As discussed in class, idiomatic code in “functional” programming languages and object-oriented programming languages tends toward certain patterns of organization. Functional programming tends to organize code around behavior of operations and is thus easier to extend with new behaviors. Object-oriented programming tends to organization code around behavior of kinds of data and is thus easier to extend with new data variants. In this part, we explore a design pattern (a common idiom) for writing object-oriented code that seeks to emulate the operation-centered organization of functional programming while using the object-oriented semantic tool: dynamic dispatch.

Consider an object-oriented implementation of a small calculator expression language much like the various such languages we have worked with in ML. An object oriented approach would typically use a class hierarchy to organize expressions and the behavior of operations over expressions:

An abstract base class Expr defines the set of operations each expression should support.
Concrete subclasses of Expr each define behavior for one kind of expression (Number, Plus, Times, etc.).

The hierarchy is likely clear, but the organization of operations such as eval, toString, etc., is more subtle. The obvious way of implementing operations is by adding a method to each class for each operation. The Expression hierarchy would then look like:

abstract class Expr {
    def toString() : String
    def eval() : Int
}

class Number(n: Int) extends Expr {
    override def toString() : String = { ... }
    override def eval() : Int = { ... }
}

class Plus(left: Expr, right: Expr) extends Expr {
    override def toString() : String = { ... }
    override def eval() : Int = left.eval() + right.eval()
}

Another way of implementing expression classes and operations uses a pattern called the Visitor Design Pattern to emulate the functional-style organization where all code for one operation is collected together. In this pattern, each operation is represented by a Visitor class. Each Visitor class has a visitNumber(e: Number) method, a visitPlus(e: Plus) method, and so on – one for each expression class in the hierarchy. Each expression class in the expression hierarchy has an accept method which accepts a Visitor as an argument and “allows the Visitor to visit the object and perform its operation.” The expression class does not need to know what operation the visitor is performing.

If you write a Visitor class ToString to construct a string representation of an expression tree, it would be used as follows:

val expTree = new Plus(new Plus(new Number(3), new Number(4)), new Number(2))
val printer = new ToString()
val stringRep = expTree.accept(printer)

println(stringRep)

The first line defines an expression, the second defines an instance of your ToString class, and the third passes your visitor object to the accept method of the expression object.

The expression class hierarchy using the Visitor Design Pattern has the following form, with an accept method in each class and possibly other methods. Since different kinds of visitors return different types of values, the accept method is parameterized by the type that the visitor computes for each expression tree. In Scala, type parameters appear inside square brackets.

abstract class Expr {
    def accept[T](v: Visitor[T]): T
}

class Number(n: Int) extends Expr {
    override def accept[T](v: Visitor[T]): T = v.visitNumber(n)
}

class Plus(left: Expr, right: Expr) extends Expr {
    override def accept[T](v: Visitor[T]): T = {
        v.visitPlus(left.accept(v), right.accept(v))
    }
}

The associated Visitor abstract class, naming the methods that must be included in each visitor, and the ToString visitor, have this form:

abstract class Visitor[T] {
    def visitNumber(n: Int): T
    def visitPlus(left: T, right: T): T
}

class ToString extends Visitor[String] {
    override def visitNumber(n: Int): String = {
        "" + n
    }
    override def visitPlus(left: String, right: String): String = {
        "(" + left + " + " + right + ")"
    }
}

Tasks

Starting with the call to expTree.accept(printer) and ending before println what is the sequence of method calls that occurs while building the string represenation, stringRep of the expression tree expTree below?
```
 val expTree = new Plus(new Plus(new Number(3), new Number(4)), new Number(2))
 val printer = new ToString()
 val stringRep = expTree.accept(printer)
    
 println(stringRep)
```
The code for these classes is in Visitors.scala in the repository for this assignment. Compile with fsc Visitors.scala or scalac Visitors.scala and run with scala Visitors. Add the following classes to that file. You will need to change some of the existing classes to accomodate them.
- An Eval visitor class that evaluates an expression tree. The visit methods should return an Int.
- Subtract and Times classes to represent subtraction and product expressions.
- A Compile visitor class that returns a sequence of stack-language instructions to evaluate the expression. You may use the following stack instructions: Push(n), Add, Mult, Sub, Div, Swap. The visit methods can simply return a String containing the sequence of instructions. For example, compiling the expression tree for 3*(1-2) should return the string "Push(3) Push(1) Push(2) Sub Mult". The instruction sequence should just leave the result of computing the expression on the top of the stack. Hint: use a post-order traversal of the expression tree.
Suppose there are n subclasses of Expr, and m operations we need to implement for expressions. How many classes would have to be added or changed to add each of the following things using the Visitor Design Pattern? Using the original approach described above? Do not implement them, just answer these questsion:
- A new class to represent division expressions.
- A new operation to graphically draw the expression parse tree.
Consider the trade-offs of these two approaches.
- Under what circumstances would you recommend using the standard design?
- Under what circumstances would you recommend using the Visitor Design Pattern?
- Do you prefer either of these organizations to the organization that comes naturally with pattern-matching in ML?

Random Sentence Generator¹

The goals of this problem are to:

write more Scala code,
implement a simple class hierarchy following the Composite Design Pattern to contrast it with a functional style, and
write a fairly entertaining program.

The Random Sentence Generator creates random sentences from a grammar. Here are a few examples of the output for generating entertaining homework extension requests. (This, and rest of the provided grammars, are classics dating back a number of years.[^steve I have made a couple minor location-specific tweaks in porting the assigment to our course.)

Honesty: I need an extension because my professor assigned more 251 homework than anybody could finish in 4 weeks, even with the help of 7 lake monsters working at the speed of light.
Plead innocence: I need an extension because it was just too nice outside and then I didn’t know I was in this class.
Wear down the Professor’s patience: I need an extension because I used up all my paper and then my dorm burned down and then I didn’t know I was in this class and then I lost my pet iguana and then my karma wasn’t good last week and on top of that my dog ate my notes and as if that wasn’t enough I had to finish my doctoral thesis this week and then I had to do laundry and on top of that my karma wasn’t good last week and on top of that I just didn’t feel like working and then I skied into a tree and then I fell through the ice on Lake Waban and as if that wasn’t enough I thought I already graduated and as if that wasn’t enough I lost my dreams and in addition I spent all weekend hung-over and then I had to go to an intramural monster truck meet this week and on top of that all my pencils broke.

Grammars

The program reads in grammars written in a form illustrated by this simple grammar file to generate poems:

    <start> = The <object> <verb> tonight	
    ;
    
    <object> =
      waves
    | big yellow flowers	
    | slugs
    ;
    
    <verb> =
      sigh <adverb>	
    | portend like <object>
    | hum <adverb>
    ;
    
    <adverb> =
      warily	
    | grumpily
    ;

The strings in brackets (<>) are the non-terminals. Each non-terminal definition is followed by a sequence of productions, separated by | characters, and with a ; at the end. Each production consists of a sequence of white-space separated terminals and non-terminals. A production may be empty so that a non-terminal can expand to nothing. There will always be whitespace surrounding the |, =, and ; characters to make parsing easy.

Here are two possible poems generated by generating derivations for this grammar:

The big yellow flowers sigh warily tonight
The slugs portend like waves tonight

Your program will create a data structure to represent a grammar it reads in and then produce random derivations from it. Derivations will always begin with the non-terminal <start>. To expand a non-terminal, simply choose one of its productions from the grammar at random and then recursively expand each word in the production. For example:

   <start>
-> The <object> <verb> tonight 
-> The big yellow flowers <verb> tonight            
-> The big yellow flowers sigh <adverb> tonight     
-> The big yellow flowers sigh warily tonight

Organization

A grammar consists of terminals, non-terminals, productions, and definitions. These four items have one thing in common: despite occupying structurally different roles in the definition of a grammar, these items can all be expanded into a random derivation for that part of a grammar. Thus, we will create classes organized in the following class hierarchy to store a grammar:

Class Hierarchy

The abstract class GrammarElement provides the general interface to all pieces of a grammar. It is defined as follows:

abstract class GrammarElement {

  /**
   * Expand the grammar element as part of a random 
   * derivation.  Use grammar to look up the definitions
   * of any non-terminals encountered during expansion.
   */
  def expand(grammar: Grammar): String

  /**
   * Return a string representation of this grammar element.
   * This is useful for debugging.  (Even though we inherit a
   * default version of toString() from the Object superclass, 
   * I include it as an abstract method here to ensure that 
   * all subclasses provide their own implmementaiton.)
   */
  def toString(): String
}

The Grammar object passed into expand is used to look up the definitions for non-terminals during the expansion process, as described next.

The `Grammar` Class

A Grammar object maps non-terminal names to their definitions. At a minimum, your Grammar class should implement the following:

class Grammar {

  // add a new non-terminal, with the given definition
  def +=(nt : String, defn : Definition)

  // look up a non-terminal, and return the definition, or null
  // if not def exists.
  def apply(nt : String) : Definition

  // Expand the start symbol for the grammar.
  def expand() : String

  // return a String representation of this object.
  override def toString() : String
}

Subclasses

The four subclasses of GrammarElement represent the different pieces of the grammar and describe how each part is expanded:

Terminal: A terminal just stores a terminal string (like slugs), and a terminal expands to itself.
NonTerminal: A non-terminal stores a non-terminal string (like <start>). When a non-terminal expands, it looks up the definition for its string and recursively expands that definition.
Production: A production stores a list of GrammarElements. To expand, a production simply expands each one of these elements.
Definition: A definition stores a vector of Productions. A definition is expanded by picking a random Production from its vector and expanding that Production.

This design is an example of the Composite Design Pattern.

Tasks

Compile with fsc RandomSentenceGenerator.scala or scalac RandomSentenceGenerator.scala, then run with scala RandomSentenceGenerator < Poem.g.

You will need to use Scala’s generic library classes. In particular, you will probably want to use both Lists and Maps from the standard Scala packages. The full documentation for the Scala standard library is here.

Begin by implementing the four subclasses of GrammarElement. Do not write expand yet, but complete the rest of the classes so that you can create and call toString() on them.
To save time, I have provided code to parse the file format and create a grammar using the contructors of your classes.

You will need to complete the definition of Grammar at this point for parsing to work.
Once the grammar can be properly created and printed, implement the expand methods for your GrammarElements. Scala provides a random number generator that can be used as follows:
```
 val number = Random.nextInt(N)  // number is in range [0,N-1].
```
Change RandomSentenceGenerator to create and print three random derivations after printing the grammar.
Write at least one additional grammar. It can be as simple or as complicated as you like. Go wild.
Reflect briefly on the organization of code for this task. If expanding derivations was not the primary use of our grammar representations, would you choose the same representations? If you were building a random sentence generator in ML (or in Scala using a more functional approach), how might it differ? Just write a few sentences.

Hints

A few details about producing derivations:

The grammar will always contain a <start> non-terminal to begin the expansion. It will not necessarily be the first definition in the file, but it will always be defined eventually. I have provided some error checking in the parsing code, but you may assume that the grammar files are otherwise syntactically correct.
The one error condition you should catch reasonably is the case where a non-terminal is used but not defined. It is fine to catch this when expanding the grammar and encountering the undefined non-terminal rather than attempting to check the consistency of the entire grammar while reading it. The starter code contains a
```
  RandomSentenceGenerator.fail(msg: String): Unit
```
method that you can call to report an error and stop.
When generating the output, just print the terminals as you expand. Each terminal should be preceded by a space when printed, except the terminals that begin with punctuation like periods, comma, dashes, etc. You can use the Character.isLetterOrDigit method to check whether a character is punctuation mark. This rule about leading spaces is just a rough heuristic, because some punctuation (quotes for example) might look better with spaces. Don’t worry about the minor details— we’re looking for something simple that is right most of the time and it’s okay if is little off for some cases.

OPTIONAL but Recommended Reading

This reading is optional, but recommended. It gives a great high-level view of the contrast between two major styles of program organization (and the language features that support them). There are interesting connotations for reliability and extensibility.

What Orientation Should Ada Objects Take?

Read What Orientation Should Ada Objects Take?
J. P. Rosen. Communications of the ACM, November 1992, Vol. 35, No. 11.

This article discusses the Ada programming language, a programming language with a strong static type system and module system similar in some respects to the module system of ML. Ada was originally developed for high-consequence coding projects in the U.S. Department of Defense. No previous understanding of Ada is necessary to read this article. It mentions a couple Ada features, but you can get the core ideas from the article alone.

Reflect on Rosen’s argument about the tensions between composition (with true abstract data types) and classification (via inheritance and encapsulation). Do you see one as more important or useful than the other? Does your answer change depending on the kind of software you consider?

Parallel Sieves¹

To get additional starter code for this part, run hg pull ssh://hg@bitbucket.org/wellesleycs251/cs251-scala. If you have committed any changes in your repository, run hg merge; hg commit, otherwise, run hg update.

The ML function, primesto n, given below, can be used to calculate the list of all primes less than or equal to n by using the classic Sieve of Eratosthenes.

(* 
 * Sieve of Eratosthenes: Remove all multiples of first element from list,
 * then repeat sieving with the tail of the list. If start with list [2..n] 
 * then will find all primes from 2 up to and including n. 
 *) 
fun sieve [] = [] 
  | sieve (x::xs) = 
      let fun filter p [] = [] 
            | filter p (y::ys) = if (y mod p) = 0
                                 then filter p ys 
                                 else y::(filter p ys)
      in 
          x::(sieve (filter x xs)) 
      end

(* Returns list of integers from i to j *) 
fun fromto i j = if j < i then [] else i::(fromto (i+1) j)

(* Return list of primes from 2 to n *) 
fun primesto n = sieve (fromto 2 n)

Pipeline Parallelism

Notice that each time through the sieve we first filter all of the multiples of the first element from the tail of the list, and then perform the sieve on the reduced tail. In ML, one must wait until the entire tail has been fitered before you can start the sieve on the resulting list. However, one could use parallelism to have one process start sieving the result before the entire tail had been completely filtered by the original process.

This pattern is called pipeline parallelism: many stages of work are done in parallel. Individual pieces of data move through the pipeline stages in sequence. Pipeline parallelism is used in many applications, from image processing to the internal hardware implementation of computer processors. We will build two pipeline-parallel implementations of the Sieve of Eratosthenes, using two different concurrency primitives to build the pipeline.

Thread Sieves and Bounded Buffers (mostly in class)

Our first implementation using pipeline parallelism will use the Java Buffer class.

The main program should begin by creating a Buffer object, with perhaps 5 slots. It should then successively insert the numbers from 2 to n into the Buffer, followed by -1 to signal that there are no more input numbers.

After the creation of the Buffer object, but before starting to put the numbers into it, the program should create a Sieve object (using the Sieve class described below) and pass it the Buffer object as a parameter to Sieve’s constructor. The Sieve object should then begin running in a separate thread while the main program inserts the numbers in the buffer.

After the Sieve object has been constructed and the Buffer object has been stored in an instance variable, in, its run method should get the first item from in:

If that number is negative then the run method should terminate.
Otherwise, it should print out the number (using System.out.println) and then create a new Buffer object, out. A new Sieve should be created with Buffer out and started running in a new thread. Meanwhile the current Sieve object should start filtering the elements from the in buffer. That is, the run method should successively grab numbers from the in buffer. If the number is divisible by the first number that was obtained from in, it is discarded. Otherwise, it is added to the out buffer. This reading and filtering continues until a negative number is read. When the negative number is read, that number is put into the out buffer and then the run method terminates.

In this way, the program will eventually have created a total of p + 1 Sieve objects all running in separate threads, where p is the number of primes between 2 and n. The instances of Sieve will be working in a pipeline, using the buffers to pass numbers from one Sieve object to the next.

In the threads directory, edit Sieve.java to implement your solution. Compile with javac Sieve.java. Run with java Sieve. Use the provided Buffer class. Each of the buffers used should be able to hold at most 5 items. Your program should print all of the primes less than 100, in order.

Actor Sieves

Rewrite the Sieve of Eratosthenes program from above in Scala using Actors. Several actor examples are provided in the examples directory. Note, the Scala library recently introduced a new version of actors (akka.actor). You will likely see warning about deprecation for this version (scala.actors) – ignore them. The Java sieve program created a new sieve thread for every prime number discovered. This time, create a new Sieve actor for each prime Int.

Your Sieve actor should keep track of the prime it was created with, and it should have a slot that can hold another “follower” Sieve actor that will handle the next prime found. (The mailbox of this other Actor will play the role of the bounded Buffer from the previous problem in that it will hold the numbers being passed along from one actor to the next.) Each Sieve actor should be able to handle messages of the form Num(n) and Stop. Use Scala’s case classes to emulate an ML datatype for messages.

The operation of each Sieve actor, once started, is similar to the previous problem. If it receives a message of the form Num(n) then it checks if it is divisible by its prime. If so, it is discarded. If not, then if the follower Actor has not yet been created, create it with the number. If not, then send the number to the follower Sieve actor. The Stop message should cause each Sieve actor to print its prime, pass on the message to its follower (if any), and exit.

Edit Sieve.scala to implement your solution. Compile with fsc Sieve.scala or scalac Sieve.scala. Run with scala SieveMain. Your program should print all of the primes less than 100, in order.

Due to Steve Freund and Williams CS 334, with some adaptations by Ben Wood. ↩ ↩² ↩³

Overview

Contents

Expression as Objects1 (mostly in class)

Tasks

Random Sentence Generator1

Grammars

Organization

The Grammar Class

Subclasses

Tasks

Hints

OPTIONAL but Recommended Reading

What Orientation Should Ada Objects Take?

Parallel Sieves1

Pipeline Parallelism

Thread Sieves and Bounded Buffers (mostly in class)

Actor Sieves

Expression as Objects¹ (mostly in class)

Random Sentence Generator¹

The `Grammar` Class

Parallel Sieves¹