Concurrency
Assignment: Concurrency
- Assign: Monday 18 November
- Due: Monday 25 November
- Policy: Pair graded synthesis assignment
- Partner form: Find a partner here.
-
Code:
cs240 start concurrency
(if this doesn't work, usecs240.f24 start concurrency
) - Submit:
git commit
andgit push
your completed code. - Reference:
Contents
- Assignment: Concurrency
- Overview
- Setup
- Compiling and running your shell
- Tasks
- Preparatory Exercises
- Part 1: Basic builtin commands and foreground commands
- Part 2: Background Jobs
- Testing background (and foreground) commands: Example correct interaction
- Submission
- Grading
- License
Overview
It’s time to build a shell! You have used the
bash
shell since
the first CS 240 lab (if not earlier). You now know enough to build a
small shell. Your shell, which should probably acquire a punny name
containing “sh” by the time you are done building it, will provide
basic functions familiar from bash
.1
A shell is a command language interpreter and a prolific manager of processes. You already implemented a parser for simple shell commands in the Pointers assignment. In this assignment you, will implement a few standard process management facilities that are standard shell features.
Goals
- To understand the process model and apply specific system calls for
process management in the operating system (
fork
,execvp
,waitpid
, etc.). - To reason about and manage basic concurrency at the process level.
- To develop debugging skills for more complex environments, including operating system interaction and multiple processes.
- To think critically about how to implement a specification.
- To read documentation to find and use relevant library functions.
- To increase proficiency with C and basic systems programming.
- To build a tool that you have been using (at least) all semester, then ruminate on circularity as you use your own shell… to work on your own shell.
Advice: figuring it out
This assignment gives a specification describing the required behavior of your shell. It is your job to figure out how to implement this specification using the following key tools:
- Review process concepts.
- Read the specifications carefully and think critically about how best to implement them.
- Read the implementation suggestions carefully.
- Search and consult documentation to learn about relevant library functions and system calls.
- Experiment with features in
bash
and try out implementation ideas.
Time Reports
According to self-reported times on Parts 1 and 2 of this assignment from a previous offering:
- 25% of students spent <= 5 hours.
- 50% of students spent <= 6 hours.
- 75% of students spent <= 8 hours.
Setup
Use a CS 240 computing environment.
The shell you build might work on similar (macOS, Linux)
platforms with your own implementation of command_parse
(no guarantees!),
but it must work on our platforms for grading.
Get your repository with cs240 start concurrency
.
Your starter repository contains the following files:
Makefile
: recipe for building the shell.soln_command.a
,soln_command.h
: provided command-parsing library (or use yours from Pointers).joblist.c
,joblist.h
: provided data structure for managing background jobs.shell.c
: you will implement the main shell logic in this file.terminal.c
,terminal.h
: provided library for simple terminal control (needed for extra fun extensions only).
Compiling and running your shell
Tasks
Implement the shell in parts. Complete and test each part before reading or starting the next.
- Required support for basic built-in commands and foreground jobs.
- Required support for basic background jobs.
If you would like to implement optional extensions (for your own learning, not for any extra credit), see Part 3 (job management) and Part 4 (extensions) from a previous offering here.
Grading considers design, documentation, style, memory safety, and correctness.
Preparatory Exercises
-
Complete the lab activities on processes and
bash
before starting this assignment. -
Read Part 1.
-
What is the difference between a built-in command and an executable command?
-
Before starting implementation of each part of the shell, try out the specified feature in
bash
(the shell you use in the course computing environments). Keep in mind thatbash
is a much more complex shell than what you will implement, but the basic behavior of the features described below is the same.
Part 1: Basic builtin commands and foreground commands
In this part, you implement support for basic builtin shell commands and support for running arbitrary executable programs in the foreground. Each feature is described with two parts:
- The declarative specification describes the required behavior that you must implement: “The shell does X.”
- The imperative implementation guide recommends code structure and relevant library functions or system calls: “Call Y with Z. Documentation: …”
Test out any of these features in bash
to understand expected behavior.
Shell Loop and Command Dispatch
Specification
At the top-level the shell is a loop that repeatedly:
- Prints a prompt (e.g.,
->
). - Reads and parses a command entered at the prompt.
- Runs that command.
The stencil code is set up to have this basic structure, though it is not a memory-safe implementation (you will need to insert calls that free data in order to eliminate memory issues).
The shell accepts two types of commands:
- Built-in commands run features of the shell itself. They
change or inspect the state of the shell without creating any new
processes.
- Example: the
exit
command exits the shell.
- Example: the
- Executable commands run executable programs from the filesystem
in a separate process. Any command that is not built-in is assumed
to be the name or path of an executable to run. All executable
commands are run in a single general way.
- Example: the
ls
executable lists the contents of the current working directory.
- Example: the
Implementation
A top-level shell loop is provided in main
. It
prints a prompt for the user and accept user input for a command line.
The provided loop (and shell) exits if the user types the Control-D
key combination at a prompt, sending the special EOF
(end-of-file or
end-of-input) character. For each command entered, the main shell loop
calls shell_run_command
, which is the starting point for your
implementation. Currently shell_run_command
simply echoes every
command.
Replace the body of shell_run_command
to determine whether the
command is a built or executable command and run it appropriately.
The provided code includes stubs for helper functions including
shell_run_builtin
and shell_run_executable
that may help you
organize and complete this task.
Much of the actual implementation in both Parts 1 and 2 of this assignment will require calling library functions in C. Our implementation advice will point out which functions are necessary, but it is up to you to read the manual pages and documentation to determine exactly how to use each function. This is to help you practice reading language documentation in general (which is how you will program outside of classes).
This document refers to manual pages for the standard libraries and
GNU/Linux user commands. This documentation is available by running
the man
command or online and
contains several sections. 1 is executable commands, 2 is system
calls, 3 is library functions. We use standard notation for
referencing manual pages. execvp(3)
indicates a page about
execvp
in section 3: run man 3 execvp
or look
online.
Manual pages have several key pieces. Typically they list a Synopsis of function signature or command usage and provide a short Description of its behavior, followed by details and options. Pages for system calls or library functions include a section called Return Value. Pay close attention: most of these functions may return special error values that your code should handle.
Built-in Commands
Specification
The shell accepts three built-in commands (more later).
exit
exits the shell process.cd
changes the working directory of the shell.- If a path is given as an argument, then
cd
uses the directory described by that path:cd cs240
- If no argument is given, then
cd
uses the directory path given by theHOME
environment variable. - If more than one argument is given, then
cd
reports an error to the user (but does not cause the shell to exit).
- If a path is given as an argument, then
- ` ` (the empty command) does nothing: on the user just entering a newline, the shell does not report an error or exit, it simply does nothing and waits for the next command.
- (Optional) You may optionally implement
help
, which displays a message listing the usage of all built-in commands that your shell supports. (Yes, this list.) You will add more inPart 2)
Implementation
Implement functionality to handle the cd
builtin shell command and the
empty command directly
in C code in the current process. You may find it helpful to use and
extend the shell_run_builtin
function, which should act as follows:
- If the command is built in, do the command and return true.
- If the command is not built in, return false.
Do not fork or exec to run builtin commands – they are features of
the shell itself. Feel free to use shell_run_builtin
and/or create
and call other helper functions. We have provided one case:
shell_builtin_exit
.
System call documentation: You will need to use the following C library functions:
strcmp(3)
(mind the return value),
chdir(2)
, getenv(3)
.
Note that getenv(3)
takes in the environment variable name
without a leading $
. For example, char *v = getenv("VAR");
stores a pointer to the
value of the environment variable VAR
in a local variable named v
.
Testing builtins
Fully test cd
later.
Wait to fully test cd
after you implement executable commands, by
using the executable command pwd
to print the working directory
after cd
-ing. Do not add C code to print the current working
directory in the shell. It will cost you a lot of time and confusion at
this point.
For now, test cd
just by ensuring that the entering the following commands
do not cause your shell to crash, report an error, or exit:
-> cd
-> cd ..
->
And check that passing in more than 2 arguments does cause your shell to report an error (but not crash or exit):
-> cd a b
Error: cd: too many arguments
->
At this point, also test that you can enter an empty command without an error, crash, or exit:
->
->
Come up with your own additional cases to try against your shell at this point.
Executable Commands
Specification
All non-built-in commands are executable commands that name executable programs on the filesystem to run in a separate process.
- The first word of the command line indicates the executable to run.
- If the executable is given by name (e.g.,
ls
) rather than by absolute or relative path (e.g.,/bin/ls
or./shell.bin
), the shell finds the executable according to thePATH
environment variable.
- If the executable is given by name (e.g.,
- The shell passes the entire command array as arguments to the executable.
- For foreground jobs (this part), the shell waits to print the next command prompt after the command terminates.
Example: given command ls .
, the shell executes the
/bin/ls
command in a new process, using the argument array containing
"ls"
followed by the argument .
. When the ls
process has
terminated, the shell prints a new prompt.
-> ls .
Makefile command.h joblist.h terminal.c
command.c joblist.c shell.c terminal.h
->
Implementation
Implement functionality to handle all executable commands:
- Fork a new child process to execute the requested executable
command.
- In the parent process (the shell), wait for the child process to terminate, then return.
- In the child process, execute the requested executable with the entire command array as arguments.
- Do error-checking for all system calls in the function that calls them. If the system call failed, print an error and exit the shell process with an error exit code.
You may find it useful to extend the provided shell_run_executable
stub for this functionality. For now, the bodies of your functions should
ignore the jobs
and foreground
parameters, you will use in Part 2.
System call documentation: You will need to use the following C library functions:
exit(3)
, fork(2)
, waitpid(2)
, execvp(3)
.
- Note that you do not need to handle the command-specific logic, such as special casing on the current directory
with
.
, since the executables themselves implement that logic for you. You just need to execute the new process correctly. - Use
execvp
instead of the standardexec
, since it will automatically finds the executable usingPATH
. - Pay special attention to the form of arguments these functions expect and how errors manifest when calling these functions.
Review our discussion of the fork-exec-wait model from class (especially the shell loop and process management examples), plus the lab on processes. The textbook covers the same material, but it sometimes dives into details too quickly in this section.
Reference
Error-Checking
Remember to check the return values of functions like fork
, execvp
,
and waitpid
to see if they succeeded or result in errors. There are
no exceptions in C, so you must check for error return values
explicitly. If one of these system calls results in an error, it sets
a special global variable (errno
) to indicate why. You can print a
summary easily with the perror
function. It takes a string, prints
that string, and then prints a description of the error that occurred.
In general, a system call failure means you should print the error and
exit the current process immediately. For example:
pid_t pid = fork();
if (pid < 0) {
// Fork system call failed.
perror("fork failed");
exit(-1);
} ...
Pay special attention to failures of the exec
family of functions,
especially if it seems like your shell requires you to type the exit
command multiple times before finally exiting.
Memory Management
The calls to command_parse()
allocate memory. You must be sure to free this memory when the code is done with it – think carefully about this. Try to balance timely freeing with centralized freeing: free this memory as soon as possible after the time it becomes obviously unneeded, but try to minimize the number of points in the code where freeing takes place. Avoid sprinkling command_free()
calls around the program. In part 2, some of the job list calls include freeing for you.
Use valgrind ./shell.bin
to check for memory errors. Valgrind will report memory accesses that fall outside the bounds of legal memory locations (local variables on the stack and allocated heap memory). It also prints a summary of memory leaks (allocated but unfreed memory) at program exit. Note that when your process forks
, Valgrind goes with it – your process forks into two processes running under Valgrind. So if the child exits in error (e.g., trying to execute a nonexistent executable file), you may see an extra Valgrind exit message. Once you have successfully exec
‘d, the new program will not be running under Valgrind.
Your shell should run without valgrind errors when you have finished each Part.
The initial starter shell (which simply echoes every command and does
nothing more) has memory leaks. Your final shell should have no memory
leaks (as measured by valgrind
).
Testing foreground commands
First, check that ls
works as expected:
-> ls .
joblist.c Makefile shell.c soln_command.h terminal.h
joblist.h shell.bin soln_command.a terminal.c
Then, check that a call that takes some time to run, like sleep 5
, does not return immediately:
-> sleep 5
It should return and you should see a new command prompt after 5 seconds.
You should also check that errors from child processes are reported. For example, if you run ls
with a file that does not exist, you should see:
-> ls foo
ls: cannot access 'foo': No such file or directory
->
You now have access to the system’s pwd
call (print working directory), so you can more fully test that cd
actually changes the current working directory:
-> pwd
/home/avh/cs240-repos/concurrency
-> cd
-> pwd
/home/avh
-> cd cs240-repos
-> pwd
/home/avh/cs240-repos
-> cd ../..
-> pwd
/home
Part 2: Background Jobs
In this part, you add support for background jobs and a built-in command to list them.
Background Jobs
A job is a record of a process started by the shell.2
Specification
An executable command ending with &
runs as a background command.
Built-in commands are unaffected by &
. The shell executes each
background command in a new process like a foreground command, but
unlike a foreground command:
- The shell prints a summary of the background job, but does not wait for the child process to terminate before printing the next command prompt.
- The shell remembers the child process and reaps the child process after it terminates.
To support these requirements for background jobs:
- The shell saves each new background job in a list of pending jobs.
- Before each command prompt (e.g., at the end of the previous loop iteration), the shell reaps, reports, and removes any background jobs whose processes have terminated.
Difference from bash
.
In most real-world shells, background commands will report their completion immediately upon their process’s termination, even if the user does not enter any additional commands. This is implemented via signals, which are covered in the optional components of this assignment, but not necessary for the required components.
Your shell need only report completed background jobs as the user enters new commands, not immediately upon their termination.
Example: the following shell interactions demonstrate background jobs.
-> sleep 30 &
[1]+ 4709 Running sleep 30 &
-> sleep 55 &
[2]+ 4828 Running sleep 55 &
-> echo she sells C shells by the C store
she sells C shells by the C store
… wait at least 30 seconds since starting sleep 30
…
-> echo world
world
[1] 4709 Done sleep 30 &
-> cd ..
-> man sleep
...
… wait at least 55 seconds since starting sleep 55
…
-> echo hello
hello
[2] 4828 Done sleep 55 &
->
Test-drive features in bash
.
Before working on implementation, try out background jobs in bash
to
get a feel for how they work. A useful program for testing foreground
and background jobs is sleep
. It takes one argument, a number, and
it sleeps (does nothing) for that many seconds. (man sleep
)
Implement one feature at a time, in order, testing and committing each as you go.
Implementation
Immediately after reading this implementation guide, read the
documentation of the provided JobList
data structure
and associated functions in joblist.h
.
- Use the single
JobList
instance that the stencil code creates before the main shell loop begins. - The
JobList*
parameters are for passing around a reference to this list. (Avoid global variables!) - Save each new job in a list of jobs. (See
job_save
.)- For background jobs:
- You should also report the job to the user (See
job_print
.) - The background job should stay on the jobs list until the child process terminates.
- You should also report the job to the user (See
- For foreground jobs.
- You should remove foreground jobs from the job list immediately upon
completion. (See
job_delete
.) - You should not report the creation of foreground jobs.
- You should remove foreground jobs from the job list immediately upon
completion. (See
- For background jobs:
- Call a function at the end of the shell loop body to reap, report,
and remove any background jobs whose processes have terminated.
- See
job_iter
,job_set_status
,job_print
, andjob_delete
for theJobList
data structure. - Use
waitpid(2)
to check if a process has terminated. Look up how to use theWNOHANG
option to allowwaitpid
to return even if the target process has not terminated. Use the return value to determine whether or not the child process has terminated.
- See
System call documentation: You will need to use the following C library functions: waitpid(2)
. See WNOHANG
. An example of the syntax for using WNOHANG
(your other arguments will likely differ):
waitpid(pid, &s, WNOHANG)
Job Lists
Use the provided data structures and functions declared and
documented in joblist.h
to save, track, and report jobs. There are
two kinds of C structs
comprising the job list data structure:
- A
JobList
structure stores a list of jobs. (You need exactly oneJobList
for the shell.) - A
Job
structure stores information about a single job (an executable command that has been started by the shell):- job ID (
jid
field) – the shell’s short, memorable ID for this job; - process ID (
pid
field) – the operating system’s process ID for the process in which the job’s command was executed (pid
≠jid
); - command array (
command
field) – the command array used to start the job; - and job status (
status
field)JOB_STATUS_FOREGROUND
– process is running as a foreground jobJOB_STATUS_BACKGROUND
– process is running as a background jobJOB_STATUS_STOPPED
– process is stopped (only used in the optional extensions)JOB_STATUS_DONE
– process has been reaped and the job is about to be deleted (transient state used only to affect job printing)
- job ID (
The shell job status is separate from the OS process status.
A job is the shell’s data structure record about a process it
started, including its last known job status (JobStatus
), but it
is distinct from the process itself. The OS knows neither that this
job data structure exists nor what it means; updating it is entirely
the responsibility of the shell. Nothing here is automatic or
implicit.
The operating system keeps its own internal information about the
current state of each process. To detect changes in the status of a
job’s process and effect changes in job status in response, the
shell must explicitly request process status info from the OS with
waitpid
and update the job status according to the results.
Both structures also track extra information used in managing linked lists, terminal interaction, keyboard interrupt signals, etc, for the optional components of this project. For Parts 1 & 2, ignore any fields not listed above.
Do not create instances of these structures yourself. Do not read
or manipulate fields of JobList
structs directly. You may read
fields of Job
structs listed above (use the ->
syntax), but any
other operations on JobList
or Job
structs (including changes to
job status) should be done via the provided functions.
Job list basics:
joblist_create
allocates and initializes a newJobList
.joblist_empty
returns 1 if the given list is empty and 0 otherwise.joblist_free
frees a (presumed empty)JobList
.
Managing individual jobs:
job_save
creates a new backgroundJob
with the given process ID, command, and status, assigns it the next available job ID, and saves it at the end of the list. The command array is saved by reference (not copied) as part of the job data structure.- Save a single command array in at most one job.
- Do not free a command array after passing it to
job_save
.job_delete
will handle that freeing the command array for you.
job_get
gets theJob
for a given job ID. You will likely not need this for the required components of this assignment.job_print
prints a specificJob
.job_set_status
sets aJob
’s status.job_delete
removes aJob
from the list and frees it and its command array (by callingcommand_free
).job_iter
iterates over aJobList
, calling the given function (passed via function pointer) on eachJob
in the list.- A function pointer (K&R 5.11, p. 118) holds the address of a function, allowing functions to be passed as arguments to other functions. Any state used in the passed function must be passed explicitly as an argument.
job_iter
allows you to pass a function (just give its name as an argument) for what to do with eachJob
, but it does not provide a way to build up any result by iterating over the list (because you should not need to do this!).-
Example: prints each job in the list that has an even job ID.
void printer(JobList* jobs, Job* job) { if ((job->jid % 2) == 0) { job_print(jobs, job); } } ... // somewhere in a function ... // where jobs is a JobList* ... job_iter(jobs, printer);
jobs
command
Specification
In this part, add a new builtin jobs
command that reports all background jobs, along with their job ID, process ID, execution
status, and command.
-> jobs
[3] 5502 Running evince bash-book.pdf &
[4] 5504 Running sleep 20 &
Implementation
- Reap completed jobs as part of the
jobs
command. - Share code between two tasks:
- Job reaping/reporting done after each command is run.
- Explicit job reporting with
jobs
.
- See the provided job list code, especially
job_iter
,job_set_status
,job_print
,job_delete
.
Testing background (and foreground) commands: Example correct interaction
You should test your shell extensively (by entering series of commands: interspersing foreground and background commands as well as builtin commands).
Here is an example, correct interaction; run under valgrind
to demonstrate that there are no memory leaks.
First, make and run the executable under valgrind
, with the --leak-check=full
option to show full details:
[avh@cs concurrency] make
make: 'shell.bin' is up to date.
[avh@cs concurrency] valgrind --leak-check=full ./shell.bin
==1081948== Memcheck, a memory error detector
==1081948== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==1081948== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==1081948== Command: ./shell.bin
==1081948==
Then, run a series of builtin cd
commands and executable pwd
commands to show that we can successfully change directories, including to the home directory via no explicit argument to cd
:
-> pwd
/home/avh/cs240-repos/concurrency
-> cd
-> pwd
/home/avh
-> cd cs240-repos/concurrency
-> pwd
/home/avh/cs240-repos/concurrency
Then, show creating two background commands to run sleep
. Each command prints a summary (via a call to job_print
).
-> sleep 10 &
[1]+ 1082027 Running sleep 10 &
-> sleep 20 &
[2]+ 1082028 Running sleep 20 &
When we run Part 2’s builtin jobs
, we see a report of the two running background jobs (again, implemented via calls to job_print
):
-> jobs
[1] 1082027 Running sleep 10 &
[2]+ 1082028 Running sleep 20 &
We can also still run foreground commands:
-> echo hello
hello
After 10 seconds, if we enter an empty command line, we expect to see the first job finish and a corresponding report (via another call to job_print
):
->
[1] 1082027 Done sleep 10 &
Before the 20 seconds are up, we expect empty command lines to do nothing, until 20 total seconds have passed:
->
->
->
->
->
[2] 1082028 Done sleep 20 &
As that point, if we run jobs
again, the list should be empty:
-> jobs
We can then enter CMD-D
to end the session (also be sure to test the built-in) exit
.
Because we have freed all allocated memory blocks, we see this report from valgrind
with no leaks:
-> ==1081948==
==1081948== HEAP SUMMARY:
==1081948== in use at exit: 0 bytes in 0 blocks
==1081948== total heap usage: 39 allocs, 39 frees, 2,082 bytes allocated
==1081948==
==1081948== All heap blocks were freed -- no leaks are possible
==1081948==
==1081948== For lists of detected and suppressed errors, rerun with: -s
==1081948== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Submission
Submit: The course staff will collect your work directly from your hosted repository. To submit your work:
-
Test your source code files one last time. Make sure that, at a minimum, submitted source code is free of syntax errors and any other static errors (such as static type errors or name/scope errors). In other words: the code does not need to complete the correct computation when invoked, but it must be a valid program. We will not grade files that do not pass this bar.
-
Make sure you have committed your latest changes. (Replace
FILES
with the files you changed andMESSAGE
with your commit message.)$ git add FILES $ git commit -m "MESSAGE"
-
Run the command
cs240 sign
to sign your work and respond to any assignment survey questions.$ cs240 sign
(If this encounters an error, instead execute
cs240.f24 sign
.) -
Push your signature and your latest local commits to the hosted repository.
$ git push
Confirm: All local changes have been submitted if the output of
git status
shows both:
Your branch is up to date with 'origin/main'
, meaning all local commits have been pushednothing to commit
, meaning all local changes have been committed
Resubmit: If you realize you need to change something later, just repeat this process.
Grading
Your grade is computed from 50 points, as follows:
- Design, documentation, style (10 points):
- Review notes in the Pointers sample solutions for tips on designing clean, readable C code.
- Document your code clearly with comments.
- Safety (10 points):
- Your shell code must be free of compiler warnings, memory errors,
and memory leaks (run
valgrind
!)
- Your shell code must be free of compiler warnings, memory errors,
and memory leaks (run
- Correctness and completeness (30 points):
- Your shell must implement the features described in the specification.
- Part 1: 15 points
- Built-in commands
- Foreground jobs
- Part 2: 15 points
- Background jobs
jobs
command
- Part 1: 15 points
- Perform extensive testing to increase your confidence in your shell’s correctness.
- Your shell must implement the features described in the specification.
License
Concurrency
by Benjamin P. Wood at Wellesley College is licensed under a
Creative Commons Attribution-ShareAlike 4.0 International License.
Source code and evaluation infrastructure for Concurrency are available upon request for reuse by instructors in other courses or institutions.