Shell
Assignment: Shell
- Policy: Pair practice
- Partner search: Find a partner here.
-
Code:
cs240 start
(if this doesn't work, usecs240.s23 start
- Submit:
git commit
andgit push
your completed code. - Reference:
Contents
- Assignment: Shell
- Overview
- Setup
- Tasks
- Preparatory Exercises
- Part 1: Basics and Foreground
- Part 2: Background Jobs
- Part 3: Job Management
- Part 4: Open-Ended Extensions
- Submission
- License
Overview
It’s time to build a shell! You have used the
bash
shell since
the first CS 240 lab (if not earlier). You now know enough to build a
small shell. Your shell, which should probably acquire a punny name
containing “sh” by the time you are done building it, will provide
basic functions familiar from bash
.1
A shell is a command language interpreter and a prolific manager of processes. You implemented a parser for simple shell commands in the Pointers assignment. In this assignment you will implement standard process management facilities that are standard shell features.
Goals
- To understand the process model and apply specific system calls for
process management in the operating system (
fork
,execvp
,waitpid
, etc.). - To reason about and manage basic concurrency at the process level.
- To develop debugging skills for more complex environments including operating system interaction and multiple processes.
- To think critically about how to implement a specification.
- To read documentation to find relevant library functions.
- To increase proficiency with C and basic systems programming.
- To build a tool that you have been using (at least) all semester, then ruminate on circularity as you use your own shell… to work on your own shell.
Advice: figuring it out
This assignment gives a specification describing the required behavior of your shell. It is your job to figure out how to implement this specification using the following key tools:
- Review process concepts.
- Read the specifications carefully and think critically about how best to implement them.
- Read the implementation suggestions carefully.
- Search and consult documentation to learn about relevant library functions and system calls.
- Experiment with features in
bash
and try out implementation ideas.
Time Reports
According to self-reported times on Parts 1 and 2 of this assignment from a previous offering:
- 25% of students spent <= 5 hours.
- 50% of students spent <= 6 hours.
- 75% of students spent <= 8 hours.
Setup
Use a CS 240 computing environment. The shell you build might work on similar (UNIX, macOS, Cygwin) platforms (no guarantees!), but it must work on our platforms for grading.
Get your repository with cs240 start shell
.
Your starter repository contains the following files:
Makefile
: recipe for building the shellcommand.c
,command.h
: provided command-parsing library (or use yours from Pointers)joblist.c
,joblist.h
: provided data structure for managing background jobsshell.c
: you will implement the main shell logic in this fileterminal.c
,terminal.h
: provided library for simple terminal control
Compile the shell with make
to produce an executable called
shell.bin
. Run the shell with: ./shell.bin
. Exit the shell by typing
Control-d (or exit
, once you have implemented it). Remove all
compiled files (including the executable, if you want to recompile
from scratch) with make clean
.
Tasks
Implement the shell in parts. Complete and test each part before reading or starting the next.
- Required support for basic built-in commands and foreground jobs.
- Required support for basic background jobs.
- Extra Fun Optional support for job management.
- Extra Fun Optional open-ended extensions.
Grading considers design, documentation, style, memory safety, and correctness.
Preparatory Exercises
-
Complete the lab activities on processes and
bash
before starting this assignment. -
Read Part 1.
-
What is the difference between a built-in command and an executable command?
-
Before starting implementation of each part of the shell, try out the specified feature in
bash
(the shell you use in the course computing environments). Keep in mind thatbash
is a much more complex shell than what you will implement, but the basic behavior of the features described below is the same.
Part 1: Basics and Foreground
In this part, you implement support for basic built-in shell commands and support for running arbitrary executable programs in the foreground. Each feature is described with two parts:
- The declarative specification describes the required behavior that you must implement: “The shell does X.”
- The imperative implementation guide recommends code structure and relevant library functions or system calls: “Call Y with Z. Documentation: …”
Test out any of these features in bash
to understand expected behavior.
Shell Loop and Command Dispatch
Specification
At the top-level the shell is a loop that repeatedly:
- Prints a prompt (e.g.,
->
). - Reads and parses a command entered at the prompt.
- Runs that command.
The shell accepts two types of commands:
- Built-in commands run features of the shell itself. They
change or inspect the state of the shell without creating any new
processes.
- Example: the
exit
command exits the shell.
- Example: the
- Executable commands run executable programs from the filesystem
in a separate process. Any command that is not built-in is assumed
to be the name or path of an executable to run. All executable
commands are run in a single general way.
- Example: the
ls
executable lists the contents of the current working directory.
- Example: the
Implementation
A top-level shell loop is provided in main
. It uses the GNU
readline
library to
print a prompt for the user and accept user input for a command line.
The provided loop (and shell) exits if the user types the Control-D
key combination at a prompt, sending the special EOF
(end-of-file or
end-of-input) character. For each command entered, the main shell loop
calls shell_run_command
, which is the starting point for your
implementation. Currently shell_run_command
simply echoes every
command.
Replace the body of shell_run_command
to determine whether the
command is a built or executable command and run it appropriately.
The provided code includes stubs for helper functions including
shell_run_builtin
and shell_run_executable
that may help you
organize and complete this task.
This document refers to manual pages for the standard libraries and
GNU/Linux user commands. This documentation is available by running
the man
command or online and
contains several sections. 1 is executable commands, 2 is system
calls, 3 is library functions. We use standard notation for
referencing manual pages. execvp(3)
indicates a page about
execvp
in section 3: run man 3 execvp
or look
online.
Manual pages have several key pieces. Typically they list a Synopsis of function signature or command usage and provide a short Description of its behavior, followed by details and options. Pages for system calls or library functions include a section called Return Value. Pay close attention: most of these functions may return special error values that your code should handle.
Built-in Commands
Specification
The shell accepts three built-in commands (more later).
exit
exits the shell process.help
displays a message listing the usage of all built-in commands that your shell supports. (Yes, this list.) You will add more in Part 2)cd
changes the working directory of the shell.- If a path is given as an argument, then
cd
uses the directory described by that path:cd cs240
- If no argument is given, then
cd
uses the directory path given by theHOME
environment variable.
- If a path is given as an argument, then
Implementation
Implement functionality to handle all built-in shell commands directly
in C code in the current process. You may find it helpful to use and
extend the shell_run_builtin
function, which should act as follows:
- If the command is built in, do the command and return true.
- If the command is not built in, return false.
Do not fork or exec to run builtin commands – they are features of
the shell itself. Feel free to use shell_run_builtin
and/or create
and call other helper functions. We have provided one case:
shell_builtin_exit
.
Documentation: strcmp(3)
(mind the return value), exit(3)
,
chdir(2)
, getenv(3)
.
Test cd
later.
Wait to test cd
after you implement executable commands, by
using the executable command pwd
to print the working directory
after cd
-ing. Do not add C code to print the current working
directory in the shell. It will cost you a lot of time and confusion at
this point.
Executable Commands
Specification
All non-built-in commands are executable commands that name executable programs on the filesystem to run in a separate process.
- The first word of the command line indicates the executable to run.
- If the executable is given by name (e.g.,
ls
) rather than by absolute or relative path (e.g.,/bin/ls
or./shell.bin
), the shell finds the executable according to thePATH
environment variable.
- If the executable is given by name (e.g.,
- The shell passes the entire command array as arguments to the executable.
- For foreground jobs (this part), the shell waits to print the next command prompt after the command terminates.
Example: given command ls cs240-shell
, the shell executes the
/bin/ls
command in a new process, using the argument array containing
"ls"
followed by "cs240-shell"
. When the ls
process has
terminated, the shell prints a new prompt.
-> ls cs240-shell
Makefile command.h joblist.h terminal.c
command.c joblist.c shell.c terminal.h
->
Implementation
Implement functionality to handle all executable commands:
- Fork a new child process to execute the requested executable
command.
- In the parent process (the shell), wait for the child process to terminate, then return.
- In the child process, execute the requested executable with the entire command array as arguments.
- Do error-checking for all system calls in the function that calls them. If the system call failed, print an error and exit the shell process with an error exit code.
You may find it useful to extend the provided shell_run_executable
stub for this functionality. For now, ignore the jobs
and
foreground
parameters, which will be used in Part 2.
Documentation: exit(3)
, fork(2)
,
waitpid(2)
, execvp(3)
.
- Choose the
exec
variant that automatically finds the executable usingPATH
. - Pay special attention to the form of arguments these functions expect and how errors manifest when calling these functions.
Review our discussion of the fork-exec-wait model from class (especially the shell loop and process management examples), plus the lab on processes. The textbook covers the same material, but it sometimes dives into details too quickly in this section.
Reference
Error-Checking
Remember to check the return values of functions like fork
, exec
,
and waitpid
to see if they succeeded or result in errors. There are
no exceptions in C, so you must check for error return values
explicitly. If one of these system calls results in an error, it sets
a special global variable (errno
) to indicate why. You can print a
summary easily with the perror
function. It takes a string, prints
that string, and then prints a description of the error that occurred.
In general, a system call failure means you should print the error and
exit the current process immediately. For example:
pid_t pid = fork();
if (pid < 0) {
// Fork system call failed.
perror("fork");
exit(-1);
} ...
Pay special attention to failures of the exec
family of functions,
especially if it seems like your shell requires you to type the exit
command multiple times before finally exiting.
Memory Management
Both readline()
and command_parse()
allocate memory. You must be sure to free this memory when the code is done with it – think carefully about this. Try to balance timely freeing with centralized freeing: free this memory as soon as possible after the time it becomes obviously unneeded, but try to minimize the number of points in the code where freeing takes place. Avoid sprinkling command_free()
calls around the program.
Use valgrind ./shell.bin
to check for memory errors. Valgrind will report memory accesses that fall outside the bounds of legal memory locations (local variables on the stack and allocated heap memory). It also prints a summary of memory leaks (allocated but unfreed memory) at program exit. Note that when your process forks
, Valgrind goes with it – your process forks into two processes running under Valgrind. So if the child exits in error (e.g., trying to execute a nonexistent executable file), you may see an extra Valgrind exit message. Once you have successfully exec
‘d, the new program will not be running under Valgrind.
Your shell should run without valgrind errors when you have finished each Part (with one exception). The one error that is OK is where some readline internals and Valgrind don’t see eye-to-eye. It will look something like this.
==1202== Syscall param sendmsg(msg.msg_name) points to uninitialised byte(s)
==1202== at 0x362AAE9880: __sendmsg_nocancel (in /lib64/libc-2.12.so)
==1202== by 0x362C215BF7: readline (in /lib64/libreadline.so.6.0)
...
The initial starter shell (which simply echoes every command and does
nothing more) has memory leaks. Your final shell should have no memory
leaks (as measured by valgrind
).
Part 2: Background Jobs
In this part, you add support for background jobs and a built-in command to list them.
Background Jobs
A job is a record of a process started by the shell.2
Specification
An executable command ending with &
runs as a background command.
Built-in commands are unaffected by &
. The shell executes each
background command in a new process like a foreground command, but
unlike a foreground command:
- The shell prints a summary of the background job, but does not wait for the child process to terminate before printing the next command prompt.
- The shell remembers the child process and reaps the child process after it terminates.
To support these requirements for background jobs:
- The shell saves each new background job in a list of pending jobs.
- Before each command prompt, the shell reaps, reports, and removes any background jobs whose processes have terminated.
Example: the following shell interactions demonstrate background jobs.
-> sleep 30 &
[1]+ 4709 Running sleep 30 &
-> sleep 55 &
[2]+ 4828 Running sleep 55 &
-> echo she sells C shells by the C store
she sells C shells by the C store
… wait at least 30 seconds since starting sleep 30
…
-> echo world
world
[1] 4709 Done sleep 30 &
-> cd ..
-> man sleep
...
… wait at least 55 seconds since starting sleep 55
…
-> echo accio
accio
[2] 4828 Done sleep 55 &
->
Test-drive features in bash
.
Before working on implementation, try out background jobs in bash
to
get a feel for how they work. A useful program for testing foreground
and background jobs is sleep
. It takes one argument, a number, and
it sleeps (does nothing) for that many seconds. (man sleep
)
Implement one feature at a time, in order, testing and committing each as you go.
Implementation
Immediately after reading this implementation guide, read the
documentation of the provided JobList
data structure
and associated functions in joblist.h
.
- Use a single
JobList
instance, created before the main shell loop begins. - The
JobList*
parameters are for passing around a reference to this list. (Avoid global variables!) - Save each new background job in a list of jobs. (See
job_save
.) - Save every job, even foreground jobs. (Helps with Part 3.)
- Remove foreground jobs from the job list immediately upon
completion. (See
job_delete
.) - Report creation of background jobs only, not foreground jobs.
- Remove foreground jobs from the job list immediately upon
completion. (See
- Call a function at the end of the shell loop body to reap, report,
and remove any background jobs whose processes have terminated.
- See
job_iter
,job_set_status
,job_print
, andjob_delete
for theJobList
data structure. - Use
waitpid(2)
to check if a process has terminated. Look up how to use theWNOHANG
option and the return value ofwaitpid
to allowwaitpid
to return even if the target process has not terminated.
- See
System call documentation: waitpid(2)
. See WNOHANG
.
Job Lists
Use the provided data structures and functions declared and
documented in joblist.h
to save, track, and report jobs. There are
two kinds of C structs
comprising the job list data structure:
- A
JobList
structure stores a list of jobs. (You need exactly oneJobList
for the shell.) - A
Job
structure stores information about a single job (an executable command that has been started by the shell):- job ID (
jid
field) – the shell’s short, memorable ID for this job; - process ID (
pid
field) – the operating system’s process ID for the process in which the job’s command was executed (pid
≠jid
); - command array (
command
field) – the command array used to start the job; - and job status (
status
field)JOB_STATUS_FOREGROUND
– process is running as a foreground jobJOB_STATUS_BACKGROUND
– process is running as a background jobJOB_STATUS_STOPPED
– process is stopped (paused, see Part 3)JOB_STATUS_DONE
– process has been reaped and the job is about to be deleted (transient state used only to affect job printing)
- job ID (
The shell job status is separate from the OS process status.
A job is the shell’s data structure record about a process it
started, including its last known job status (JobStatus
), but it
is distinct from the process itself. The OS knows neither that this
job data structure exists nor what it means; updating it is entirely
the responsibility of the shell. Nothing here is automatic or
implicit.
The operating system keeps its own internal information about the
current state of each process. To detect changes in the status of a
job’s process and effect changes in job status in response, the
shell must explicitly request process status info from the OS with
waitpid
and update the job status according to the results.
Both structures also track extra information used in managing linked lists, terminal interaction, keyboard interrupt signals, etc. (See Part 3.) Ignore any fields not listed above.
Do not create instances of these structures yourself. Do not read
or manipulate fields of JobList
structs directly. You may read
fields of Job
structs listed above (use the ->
syntax), but any
other operations on JobList
or Job
structs (including changes to
job status) should be done via the provided functions.
Job list basics:
joblist_create
allocates and initializes a newJobList
.joblist_empty
returns 1 if the given list is empty and 0 otherwise.joblist_free
frees a (presumed empty)JobList
.
Managing individual jobs:
job_save
creates a new backgroundJob
with the given process ID, command, and status, assigns it the next available job ID, and saves it at the end of the list. If the job is not a foreground job, it also sets thisJob
as the “current”Job
in the list. (More about “current” in Part 3.) The command array is saved by reference (not copied) as part of the job data structure.- Save a single command array in at most one job.
- Do not free a command array after passing it to
job_save
.job_delete
will handle that.
job_get
gets theJob
for a given job ID.job_get_current
gets the “current”Job
in the list, if any.job_print
prints a specificJob
.job_set_status
sets aJob
’s status and sets the “current”Job
in the list accordingly.job_delete
removes aJob
from the list and frees it and its command array (by callingcommand_free
).job_iter
iterates over aJobList
, calling the given function (passed via function pointer) on eachJob
in the list.- A function pointer (K&R 5.11, p. 118) holds the address of a function, allowing functions to be passed as arguments to other functions. Note to those who have taken 251: function pointers are not closures. Any state used in the passed function must be passed explicitly as an argument.
job_iter
allows you to pass a function (just give its name as an argument) for what to do with eachJob
, but it does not provide a way to build up any result by iterating over the list (because you should not need to do this!).-
Example: prints each job in the list that has an even job ID.
void printer(JobList* jobs, Job* job) { if ((job->jid % 2) == 0) { job_print(jobs, job); } } ... // somewhere in a function ... // where jobs is a JobList* ... job_iter(jobs, printer);
jobs
command
Specification
Built-in command jobs
reports all background jobs (and stopped
jobs in Part 3), along with their job ID, process ID, execution
status, and command.
-> jobs
[1] 5498 Stopped emacs shell.c
[3] 5502 Running evince bash-book.pdf &
[4] 5504 Running sleep 20 &
Implementation
- Reap completed jobs as part of the
jobs
command. - Share code between two tasks:
- Job reaping/reporting done before command prompts.
- Explicit job reporting with
jobs
.
- See the provided job list code, especially
job_iter
,job_set_status
,job_print
,job_delete
.
Part 3: Job Management
This part requires a surface understanding of signals, which we likely did not cover in class. You implement built-in commands and other features to manage background, stopped, and foreground jobs, and support for keyboard interrupts.
Test-drive features in bash
. Work incrementally.
Before working on implementation, try out these features in bash
to
get a feel for how they work. A useful program for testing foreground
and background jobs is sleep
. It takes one argument, a number, and
it sleeps (does nothing) for that many seconds. (man sleep
)
Implement one feature at a time, in order, testing and committing each as you go.
Making implementation mistakes in this part often result in “locking yourself out.” See Getting Out of Trouble if your shell becomes unresponsive.
Keyboard Interrupts
Control-C is used to send the SIGINT
signal, which terminates a
process by default. Control-Z sends the SIGTSTP
signal, which
causes a process to stop (a.k.a. pause) until explicitly resumed
(continued).
Specification
In the shell, Control-C and Control-Z signals affect foreground commands only. They never cause the shell (or any background commands) to terminate or stop.
- If a foreground command is running:
- Control-C aborts the foreground command. The shell return to a command prompt.
- Control-Z stops (pauses) the foreground command. The shell reports that the command has been stopped, keeping it in the jobs list, and returns to a command prompt.
- If no foreground command is running:
- Control-C has no effect.
- Control-Z has no effect.
Implementation
Instead of writing a few complicated signal handlers yourself (this is error-prone), you will use provided code to “attach” the terminal (and the associated delivery of keyboard signals) to the current foreground process. Additionally, you will change how the shell waits for foreground jobs so that it can also notice when a job has been stopped via signals.
Terminals and Signals
The terminal (a.k.a. TTY) or terminal emulator (the window where you type to interact with the command line) is always attached to a particular foreground process or progress group. Keyboard input (including signals sent with Control-C and Control-Z) is sent to this process. The functions in terminal.h
wrap up the necessary steps to manage which process receives these signals.
-
term_shell_init
sets up the shell’s interaction with the terminal and sets the shell process to ignore several signals, includingSIGINT
andSIGTSTP
. -
term_child_init
sets up a child process’s signal handlers and terminal interaction. The child process should call this function after being forked and beforeexec
-ing the requested executable. -
term_give
passes foreground status in the terminal from the shell to the new child process. The parent process should call this function after forking. The parent’s call toterm_give
and the child’s call toterm_child_init
cooperate to avoid timing problems. Both are necessary. -
term_take
reclaims foreground status for the shell from the previous foreground process. The shell should call this function whenever it continues after waiting for a foreground job to finish or stop.
Updated Waiting
With signals attached to the foreground job, the shell’s existing
behavior of waiting for the foreground process to finish in works as
expected with minor modifications. Your existing waiting code should
call waitpid
such that it returns if the foreground process
terminated or if it stopped.
- See the
WUNTRACED
option and use the macros described inwaitpid(2)
to determine what happened in the foreground process to allowwaitpid
to return. - If the foreground process terminated by exiting normally or due to a signal, the shell should continue to the next command prompt as usual. Normal termination and termination by Control-C can be treated equivalently.
- If the foreground process was stopped instead of terminating, it was due to Control-Z. In this case, the shell should report the stopped job and retain it in the job list.
fg
and bg
Specification
Built-in command fg
brings a running background job to the foreground or resumes a stopped job by running it in the foreground, in addition to updating its status in the job list. Think about how this moving a job to the foreground is similar to starting a brand new job in the foreground. Try to share code accordingly.
fg
takes a job ID as an optional argument. If no job ID is given and there is a “current” job,fg
should act on that job.- If
fg
acts on the “current” job, then there is no longer a “current” job. (job_set_status
implements this.) - If no job with the given ID exists, an error should be printed.
Built-in command bg
resumes a stopped job by running it in the background and updating its status in the job list.
bg
takes a job ID as an optional argument. If no job ID is given and there is a “current” job,bg
should act on that job.bg
prints a status message and sets its target job as the “current” job. (job_set_status
implements this.)- If the given job is already running in the background, the only effect is to set it as the “current” job.
- If no job with this ID exists, an error should be printed.
Implementation
- Tracking the “current” background job is already handled automatically by
job_set_status
. - You can get the current job (if any) by calling
job_get_current
on your jobs list. - Sending the
SIGCONT
signal to a stopped process will cause it to continue executing.- Use the
kill(2)
function to do this. - Do not confuse job IDs and process IDs.
- Instead of passing the desired process ID to
kill
, pass its negative (e.g.,-pid
) to send the signal to the entire process group root with that process. - Proper use of the functions from
terminal.h
ensures that each job’s process is the root of a process group with the same ID.
- Use the
Polite Exit
Specification
When given the command exit
or when the user types Control-D (EOF) at the prompt while any unfinished (and as yet unreapable) jobs remain running in the background or stopped, the shell should print the message There are unfinished jobs.
and return to the command prompt without exiting.
Example
-> sleep 100
^Z[1]+ 16501 Stopped sleep 100
-> sleep 20 &
[2]+ 16502 Running sleep 20 &
-> jobs
[1] 16501 Stopped sleep 100
[2]+ 16502 Running sleep 20 &
-> bg 1
[1]+ 16501 Running sleep 100
-> jobs
[1]+ 16501 Running sleep 100 &
[2] 16502 Running sleep 20 &
-> fg
sleep 100
^C-> exit
There are unfinished jobs.
[2] 16502 Running sleep 20 &
-> jobs
[2] 16502 Done sleep 20 &
-> exit
Implementation
Think about how to share code between this check and the case where you attempt to reap background jobs before each prompt. (See joblist_empty
.) Note that the readline()
function used in the main shell loop will return NULL
(and hence control will leave the loop) when Control-D is entered at the prompt.
Getting Out of Trouble
If your shell does not respond to any commands, Control-C or Control-Z, you can close the terminal and open a new terminal, but other options might be more useful in debugging and recovering:
Open a second terminal and use these commands (in bash
, not your shell).
top
will show what processes are executing and using resources.ps ux
will show all processes belonging to you, including their process IDs and execution status.kill
lets you send signals to other processes you own. This can be useful for killing (or sending arbitrary signals to) your shell or processes you started from within your shell. Just be careful what you kill – don’t kill the instance of emacs where you are editingshell.c
with many unsaved changes…kill 12345
to politely terminate process with ID 12345kill -9 12345
orkill -KILL 12345
to terminate it right now no matter what.kill -INT 12345
to send the same signal Control-C sends.kill -TSTP 12345
to send the same signal Control-Z sends.kill -CONT 12345
to continue a stopped process.
Part 4: Open-Ended Extensions
You are welcome to add any other advanced shell features you like.
Search documentation of bash
or other shells for ideas.
Submission
Submit: The course staff will collect your work directly from your hosted repository. To submit your work:
-
Test your source code files one last time. Make sure that, at a minimum, submitted source code is free of syntax errors and any other static errors (such as static type errors or name/scope errors). In other words: the code does not need to complete the correct computation when invoked, but it must be a valid program. We will not grade files that do not pass this bar.
-
Make sure you have committed your latest changes. (Replace
FILES
with the files you changed andMESSAGE
with your commit message.)$ git add FILES $ git commit -m "MESSAGE"
-
Run the command
cs240 sign
to sign your work and respond to any assignment survey questions.$ cs240 sign
(If this encounters an error, instead execute
cs240.s23 sign
.) -
Push your signature and your latest local commits to the hosted repository.
$ git push
Confirm: All local changes have been submitted if the output of
git status
shows both:
Your branch is up to date with 'origin/main'
, meaning all local commits have been pushednothing to commit
, meaning all local changes have been committed
Resubmit: If you realize you need to change something later, just repeat this process.
License
Shell
by Benjamin P. Wood at Wellesley College is licensed under a
Creative Commons Attribution-ShareAlike 4.0 International License.
Source code and evaluation infrastructure for Shell are available upon request for reuse by instructors in other courses or institutions.