Escape to a Parallel Universe
- Assign: Friday, 17 April
- Due: Friday, 1 May
- Policy: Pair graded synthesis assignment
- Partner search: Find a partner here.
-
Code:
cs251 start parcon
- Submit:
git commit
,cs251 sign
, andgit push
your completed code. - Reference:
Contents
Time Info
In Fall 2019, students self-reported spending a median of 7.8 hours on this assignment.
Tasks
1. Backus to the Future (20 points)
This task is about John Backus’s 1977 Turing Award Lecture: Can
Programming be Liberated from the von Neumann Style? A Functional
Style and its Algebra of Programs. Backus
led development of the procedural
Fortran language in the
1950s. This lecture/paper demonstrates some shifts in thinking in the
intervening 20 years. Read sections 1–7, 9-10, and 15 of this paper
and answer the following questions. Write concise responses, aiming
for just a couple informative phrases or sentences. Write your answers
in the file backus.txt
.
-
One of the reasons this paper is well-known is that in it Backus coined the term “von Neumann bottleneck”. Describe what this term means and why it is relevant to the paper.
-
Many programming languages have at least two syntactic categories: expressions and statements. Backus claims that expressions are good but statements are bad. Explain his claim.
-
In Sections 6, 7, and 9 of the paper, Backus discusses three problems/defects with von Neumann languages. Summarize them.
-
What are applicative languages and how do they address the three problems/defects mentioned by Backus for von Neumann languages?
-
Backus wrote this paper long before the development of Java and Python. Based on his paper, how do you think he would evaluate these two languages?
-
Read (at least skim) section 11, which introduces the details of the FP language. Backus uses many notations that may be unfamiliar to you. For example:
-
McCarthy’s notation of
cond
expressions, from earlier in the course: $p_1 \to e_1; \ldots; p_n \to e_n; e_{n+1}$ matches(cond (p1 e1) ... (pn en) (#t en+1))
in Racket. -
$\langle e_1, \ldots, e_n \rangle$ denotes the sequence of the $n$ values of the expressions $e_1$ through $e_n$. $\phi$ denotes the empty sequence. Because FP is dynamically typed, such sequences can represent both tuples and lists.
-
The symbol $\bot$ (“bottom”) denotes an undefined value: the value of an expression whose evaluation either doesn’t terminate or terminates with an error.
-
If $f$ is a function and $x$ is an object (atom or sequence of objects), then $f : x$ denotes the result of applying $f$ to argument $x$.
-
$[f_1, \ldots, f_n]$ is a functional form denoting a sequence of $n$ functions, $f_1$ through $f_n$. Evaluation of $[f_1, \ldots, f_n] : x$, the application of this form to a single object $x$, is equivalent to evaluation of $⟨f_1 : x, \ldots, f_n : x⟩$, applying each of the $n$ functions to a single object $x$ and gathering the results as an $n$-element sequence.
Read enough of section 11 to answer the remaining questions.
-
- What is one feature of FP that is familiar from Racket or SML but not Java or C? Identify and explain briefly.
- What is one feature of FP that expresses a computation that could be a natural candidate for data parallelism (even if its evaluation is not explicitly parallelized in Backus’s description)? Identify and explain briefly.
- What is one feature of FP that expresses a computation that would be a natural candidates for task parallelism (even if its evaluation is not explicitly parallelized in Backus’s description)? Identify and explain briefly.
- Does FP provide any feature analogous to explicit futures, tasks, or threads? Explain briefly.
- Describe how the form $[f_1, \ldots, f_n] : x$ relates to the
familiar
map
operation from Racket and ML by writing an equivalent Racket or ML call tomap
, assuming the function sequence, $[f_1, \ldots, f_n]$, is available as a list of functions bound to the variablefs
and the object, $x$, is available as a value bound to the variablex
.
2. Parallel Calculator (15 points)
In this task you will parallelize a tiny calculator
implementation. The file calc.sml
defines an interpreter for a small
calculator expression language.
(* Calculator expression language. *)
datatype calc_expr = CalcInt of int
| CalcAdd of calc_expr * calc_expr
| CalcSub of calc_expr * calc_expr
| CalcMul of calc_expr * calc_expr
| CalcDiv of calc_expr * calc_expr
| CalcBind of string * calc_expr * calc_expr
| CalcVar of string
The key interpreter functions are eval
and its helper try
:
(* Evaluate calculator expression e in environment env, returning SOME
answer if the expression is well-formed or NONE otherwise. *)
fun eval e env =
let
(* Try to evaluate e1 and e2, apply the operation f to their
results, and return SOME of this value. If either e1 or e2
results in NONE, return NONE instead. *)
fun try (f, e1, e2) =
case (eval e1 env, eval e2 env) of
(SOME x, SOME y) => SOME (f (x, y))
| _ => NONE
in
(
case e of
CalcInt i => SOME i
| CalcAdd (e1, e2) => try (fn (x,y) => x + y, e1, e2)
| CalcSub (e1, e2) => try (fn (x,y) => x - y, e1, e2)
| CalcMul (e1, e2) => try (fn (x,y) => x * y, e1, e2)
| CalcDiv (e1, e2) => try (fn (x,y) => x div y, e1, e2)
| CalcBind (x, e, body) => (case eval e env of
SOME y => eval body ((x, y) :: env)
| NONE => NONE)
| CalcVar x => lookup x env
) handle _ => NONE
end
For example, evaluating the following calculator expression should
yield SOME 2
:
(* A sample calculator expression. *)
val calc_prog = CalcBind ("x", CalcAdd (CalcInt 21, CalcInt 2),
CalcDiv (CalcAdd (CalcInt 4,
CalcMul (CalcInt 5, CalcVar "x")),
CalcSub (CalcMul (CalcInt 17, CalcInt 3),
CalcDiv (CalcInt 34, CalcVar "x"))))
(* Should evaluate to (SOME 2) *)
val calc_result = eval calc_prog []
When parallelizing, stick with pval
. Do not try to use pcase
.
-
Parallelize the calculator interpreter using
pval
bindings by taking the following steps:- Think. What can be parallelized with use of
pval
bindings? Where? How will you transform the code? (Although you may find a potential use forpcase
, do not usepcase
.) - Edit
calc.sml
(the SML program) to introduceval
bindings where you can later parallelize in the Manticore version. - Copy your edited
calc.sml
tocalc.pml
. - Edit
calc.pml
to convert the relevant SMLval
bindings to Manticorepval
bindings. To get syntax highlighting for calc.pml in Emacs, useM-x sml-mode
. See the Emacs Basics document if you don’t understand that command.
Doing the pre-edits in the
calc.sml
SML version is optional but recommended, because:- The SML compiler’s error messages are more familiar than those of the Manticore compiler.
- You can test
calc.sml
via the usual SML tools to confirm that your initial transformed calculator still works like the original and use it as comparison for the parallelized version.
Unlike SML/NJ, the Manticore implementation uses an ahead-of-time compiler, called
pmlc
. (P is for parallel.) To compilecalc.pml
, use this command:$ pmlc -o calc calc.pml
It will create an executable named
calc
that you can run with:$ ./calc
Note: Unlike the SML/NJ interpreter:
- The executable is a compiled version of the program.
- There is no REPL.
- The executable displays only what is explicitly printed by the compiled program. It does not display any extra info (such as bindings, types, etc.)
Thus, you must use explicit printing for any tests you perform.
- Think. What can be parallelized with use of
-
Does your parallelization exploit data parallelism? Task parallelism? Both? Neither? Explain briefly in a comment at the end of
calc.pml
. -
Consider evaluating the provided sample
calc_prog
calculator expression with your parallelized calculator implementation,eval
. Assume there is initially one task in which the call toeval
occurs, and that every evaluation of apval
binding creates a new task. Answer and explain briefly in a comment at the end ofcalc.pml
for each of the following questions:- Total tasks: How many tasks will your
eval
implementation run in total forcalc_prog
? - Max pending tasks: Define a pending task to be a task
that has started but not yet completed with a result or an
error. What is the maximum number of parallel tasks that could
be pending at a single point in time during
eval calc_prog
? - Max active tasks: Define an active task to be a task that
has started, has not yet completed, and is doing computational
work other than waiting for the result of another task. What
is the maximum number of parallel tasks that could be active
at a single point in time during
eval calc_prog
?
- Total tasks: How many tasks will your
-
Optional challenge question: There is at least one opportunity for parallelization that you cannot exploit via
pval
bindings orpcase
. What is it? Why is it not possible withpval
? How could you parallelize with another feature? Explain briefly in a comment a the end ofcalc.pml
. Hint: think about bindings in the calculator language.
3. Concurrent Bank Accounts (20 points)
This task deals with representations of bank accounts in Concurrent ML. You will break (and explain the breakage of) an unsafe implementation of a bank account and then implement a safe version.
The file bank.sml
contains starter code for this task. The starter
code includes two completed ADT implementations and one that you will
complete.
Cell
The Cell
structure implements a mutable storage cell ADT using
Concurrent ML and no actual mutation. A cell is represented by:
- A “server” thread runs forever and is responsible for:
- knowing the current logical contents of the cell
- responding to requests from client threads to:
- get the current contents of the cell
- put a new value as the new context of the cell
- A pair of channels:
- A request channel is used by client threads to send a get or put request to the “server” thread.
- A reply channel is used by the server thread to send the current balance in response to a client’s get request.
UnsafeAccount
The UnsafeAccount
structure implements a mutable bank account ADT
where the account balance is stored by a Cell.t
. The account
provides three operations:
getBalance
: get the current balancedeposit
: mutate the account balance to deposit (add) the given amountwithdraw
: mutate the account balance to withdraw (subtract) the given amount
Notice that withdraw
is extraneous; it can be modeled by a deposit
of a negative ammount.
The deposit
operation is implemented by getting the current balance
from the underlying Cell.t
with Cell.get
, adding to that balance,
and puting the new balance back in the Cell.t
with Cell.put
. This
implementation works fine when only one client thread is interacting
with an UnsafeAccount.t
. You will show how to break this
implementation with operations from multiple threads.
SafeAccount
The SafeAccount
structure is a second implementation of a mutable
bank account ADT, with same ACCOUNT
signature as
UnsafeAccount
. You will complete this structure with a thread-safe
implementation of the bank account.
Questions and Programming Tasks
-
Write comments in the
Cell
andUnsafeAccount
structures to document how they work. Your comments will not be graded; they are for your own reference in the rest of the task. Understanding these implementations is a major part of what you need to accomplish. -
Why does the
Cell
use two distinct channels for requests vs. replies? Explain briefly how the implementation would break if requests and replies where sent on the same channel instead of on two different channels. Write your answer in a comment below theCell
definition inbank.sml
.Hint: it is enough to think about the interaction of just 1 client thread with the
Cell.t
’s server thread. -
Unfortunately, the
UnsafeAccount
implementation is not thread-safe. When multiple threads concurrently applydeposit
orwithdraw
operations to the sameUnsafeAccount.t
, the final account balance may not always accurately reflect the expected cumulative result of these operations.The key issue is that the
deposit
andwithdraw
operations are not atomic. Clients expect that adeposit
occurs as a single atomic event that—all at once—reads the current balance, adds to it, and updates it. However, theUnsafeAccount
implementsdeposit
/withdraw
using distinctget
andput
operations on the underlyingCell.t
. If multiple client threads are interacting with the same account, the underlyingCell
operations by one client threads might happen in between the underlyingCell
of another client thread.Your first account task is to break the
UnsafeAccount
. The functionmyUnsafeTest
inbank.sml
calls the generaltest
function defined above it to run a stress test on an account. This stress test creates a new account and appliesdeposit
/withdraw
operations to it in multiple threads.Run the stress test like this in an SML REPL:
- use "bank.sml"; - run myUnsafeTest;
Read the printed output together with the test code. The output shows:
- the sequence of
Get
andPut
operations that occurred on theUnsafeAccount
’s underlyingCell.t
- the expected and actual final balance in the account after all operations have completed
If the expected and actual balance match, you may need to adjust the
test
parameters to increase the number of operations until you can cause the actual balance to diverge.Feel free to add printing in
deposit
to help understand the output.What to submit:
-
In a comment below the definition of
myUnsafeTest
inbank.sml
, copy-paste the printed output of your account-breaking stress test. -
Label each
Get
orPut
line with the thread (the main testing thread,thread1
, orthread2
) that most likely requested it. If multiple labelings might be correct, choose one of them, so that eachGet
/Put
is labeled by a single thread. -
In the same comment, briefly explain what happened. Which underlying
Get
/Put
events interacted in what problematic ways to violate the atomicity ofdeposit
/withdraw
operations and cause the final balance to be incorrect?
Hint: drawing time diagrams will likely help your thinking! You do not need to submit them, but if you wish to use them in your explanation, they are easily represented as text using vertical alignment to show relative timing and horizontal columns for threads. For example:
Thread 1 Thread 2 account operation foo: eventA account operation bar: eventC eventD eventB
- the sequence of
-
Complete the skeleton of the
SafeAccount
with a thread-safe implementation of the bank account ADT using Concurrent ML. Instead of implementing it by using an underlyingCell
, implement theSafeAccount
from scratch in the same style as theCell
itself to support atomicdeposit
andwithdraw
operations.Like a
Cell.t
, aSafeAccount.t
should consist of a pair of request/reply channels tended by a “server” thread that maintains the account balance and responds to requests. Instead of low-levelGet
/Put
requests the requests should handle full account operations on the level ofDeposit
.If you are uncertain about how to approach this, follow these steps to transform a
Cell
implementation into aSafeAccount
implementation.- Copy the
Cell
implementation. - Rename the operations.
- Change the associated logic of those operations.
Stress-test your implementation with
run mySafeTest
. This runs the same style of multi-thread stress test you used to breakUnsafeAccount
. Adjust the parameters to try larger numbers ofdeposit
/withdraw
operations. Your implementation should always with with the expected final balance.Note: there are many other interesting ways to approach implementing a
SafeAccount
. You are welcome to explore anything (read more in the concurrency material, especially Concurrent Programming in ML [local access]), but please try this simple implementation first. - Copy the
-
In a comment below the definition of
mySafeTest
inbank.sml
, briefly explain why yourSafeAccount
avoids the problems that you demonstrated in yourUnsafeAccount
-breaking test.
Submission
Submit: The course staff will collect your work directly from your hosted repository. To submit your work:
-
Test your source code files one last time. Make sure that, at a minimum, submitted source code is free of syntax errors and any other static errors (such as static type errors or name/scope errors). In other words: the code does not need to complete the correct computation when invoked, but it must be a valid program. We will not grade files that do not pass this bar.
-
Make sure you have committed your latest changes.
$ git add ... $ git commit ...
-
Run the command
cs251 sign
to sign your work and respond to any assignment survey questions.$ cs251 sign
-
Push your signature and your latest local commits to the hosted repository.
$ git push
Confirm: All local changes have been submitted if the output of
git status
shows both:
Your branch is up to date with 'origin/master'
, meaning all local commits have been pushednothing to commit
, meaning all local changes have been committed
Resubmit: If you realize you need to change something later, just repeat this process.