Assignment: Concurrency

Contents

Overview

It’s time to build a shell! You have used the bash shell since the first CS 240 lab (and likely even earlier in other CS courses). You now know enough to build a small shell! Your shell, which should probably acquire a punny name by the time you are done building it, will provide basic functions familiar from bash.1

A shell manages commands on behalf of a user, using processes. You already implemented a parser for simple shell commands in the Pointers assignment. In this assignment you will implement a few standard process management facilities that are standard shell features.

Goals

  • To understand the process model and apply specific system calls for basic process management in the operating system (fork, execvp, waitpid, etc.).
  • To reason about and manage simple concurrency at the process level.
  • To develop debugging skills for more complex environments, including operating system interaction and multiple processes.
  • To think critically about how to implement a specification.
  • To read documentation to find and use relevant library functions.
  • To increase proficiency with C and basic systems programming.
  • To build a tool that you have been using (at least) all semester, then ruminate on circularity as you use your own shell… to work on your own shell.

Advice: figuring it out

This assignment gives a specification describing the required behavior of your shell. It is your job to figure out how to implement this specification using the following key tools:

  • Review process concepts.
  • Read the specifications carefully and think critically about how best to implement them.
  • Read the implementation suggestions carefully.
  • Search and consult documentation to learn about relevant library functions and system calls.
  • Experiment with features in bash and try out implementation ideas.

Time Reports

Self-reported times on this assignment from a previous offering:

  • 25% of students spent <= 6 hours.
  • 50% of students spent <= 8 hours.
  • 75% of students spent <= 10 hours.

Setup

Use a CS 240 computing environment. The shell you build might work on similar (macOS, Linux) platforms with your own implementation of command_parse (no guarantees!), but it must work on our platforms for grading.

Get your repository with cs240 start concurrency.

Your starter repository contains the following files:

  • Makefile: recipe for building the shell.
  • soln_command.a, soln_command.h: provided command-parsing library (or use yours from Pointers).
  • joblist.c, joblist.h: provided data structure for managing background jobs.
  • shell.c: you will implement the main shell logic in this file.
  • terminal.c, terminal.h: provided library for simple terminal control (needed for extra fun extensions only).

Compiling and running your shell

Compile the shell with make to produce an executable called shell.bin.

Run the shell with: ./shell.bin. Exit the shell by typing Control-d (or exit, once you have implemented it). Remove all compiled files (including the executable, if you want to recompile from scratch) with make clean.

Tasks

Implement the shell in parts. Complete and test each part before starting the next.

  1. Required support for basic built-in commands and foreground jobs.
  2. Required support for basic background jobs.

If you would like to implement optional extensions (for your own learning, not for extra credit), see Part 3 (job management) and Part 4 (extensions) from a previous offering here.

Grading considers design, documentation, style, memory safety, and correctness.

Preparatory Exercises

  1. Complete the lab activities on processes and bash.

  2. Read Part 1 and Part 2.

  3. What is the difference between a built-in command and an executable command?

  4. Before starting implementation of each part of the shell, try out the specified feature in bash (the shell you use in the course computing environments). Keep in mind that bash is a much more complex shell than what you will implement, but the basic behavior of the features described below is the same.

  5. Read through the starter stencil code (which is only about 50 lines of non-comment C code). Note the TODO: comments, which show as blue in VSCode. These are there to help guide your implementation.

Part 1: Basic builtin commands

In this part, you implement support for basic builtin shell commands. Each feature is described with two parts:

  • The declarative specification describes the required behavior that you must implement: “The shell does X.”
  • The imperative implementation guide recommends code structure and relevant library functions or system calls: “Call Y with Z. Documentation: …”

Test out any of these features in bash to understand expected behavior.

Shell Loop and Command Dispatch

Specification

At the top-level the shell is a loop that repeatedly:

  1. Prints a prompt (e.g., ->).
  2. Reads and parses a command entered at the prompt.
  3. Runs that command.

The stencil code is set up to have this basic structure, though it is not a memory-safe implementation—you will need to insert calls that free data in order to eliminate memory leaks. Recall that a memory leak is a case where the code allocates dynamic memory, but fails to free that memory once it is done being used.

The shell accepts two types of commands:

  • Built-in commands run features of the shell itself. They change or inspect the state of the shell without creating any new processes.
    • Example: the exit command exits the shell.
  • Executable commands run executable programs from the filesystem in a separate process. Any command that is not built-in is assumed to be the name or path of an executable to run. All executable commands are run in a single general way.
    • Example: the ls executable lists the contents of the current working directory.

Implementation

A top-level shell loop is provided in main. It prints a prompt for the user and accept user input for a command line. The provided loop (and shell) exits if the user types the Control-D key combination at a prompt, sending the special EOF (end-of-file or end-of-input) character. For each command entered, the main shell loop calls shell_run_command. Currently shell_run_command simply echoes every command; your job is to make it meet the assignment specification.

Replace the body of shell_run_command to determine whether the command is a builtin or executable command. For Part 1, run any builtin command. The provided code includes stubs for a helper function shell_run_builtin. You’ll want to use this and add in additional helper functions as you see fit.

Much of the actual implementation of this assignment will require calling library functions in C. Our implementation advice will point out which functions are necessary, but it is up to you to read the manual pages and documentation to determine exactly how to use each function. This is to help you practice reading language documentation in general (which is how you will program outside of classes, out in the world).

Manual Pages and Documentation

This document refers to manual pages for the standard libraries and GNU/Linux user commands. This documentation is available by running the man command or online and contains several sections. 1 is executable commands, 2 is system calls, 3 is library functions. We use standard notation for referencing manual pages. execvp(3) indicates a page about execvp in section 3: run man 3 execvp or look online.

Manual pages have several key pieces. Pages for system calls or library functions include a section called Return Value. Pay close attention: most of these functions may return special error values that your code should handle.

Built-in Commands

Specification

The shell should accept three built-in commands (for Part 1).

  • exit exits the shell process.
  • cd changes the working directory of the shell.
    • If a path is given as an argument, then cd uses the directory described by that path: cd cs240
    • If no argument is given, then cd uses the directory path given by the HOME environment variable.
    • If more than one argument is given, then cd reports an error to the user (but does not cause the shell to exit).
  • ` ` (the empty command) does nothing: on the user just entering a newline, the shell does not report an error or exit, it simply does nothing and waits for the next command.
  • (Optional) You may optionally implement help, which displays a message listing the usage of all built-in commands that your shell supports. (Yes, this list.) You will add more in Part 3.

Implementation

For Part 1, we suggest the following order of steps:

  1. Add code to handle the cd building shell command.
  2. Add code to handle the ` ` empty builtin shell command (think carefully about when to make this check to avoid a segmentation fault).
  3. Modify the existing code such that builtin commands do not leak memory. Rather than calling the standard library’s free function directly, you should insert calls to more specific functions (in this case, to free the command array before exiting).

The stencil’s existing shell_run_builtin function is designed to return 1 if the command passed in is a builtin command (in which case the function performs the commasnd), or 0 if the command is not (in which case, this function does nothing).

Do not fork or exec to run builtin commands – they are features of the shell itself. You should use/modify shell_run_builtin and create additional helper functions as needed. We have already provided one case—shell_builtin_exit—that you can view as an example (though exit is not yet memory safe in the original stencil code).

System call documentation: You will need to use the following C library functions: strcmp(3) (mind the return value), chdir(2), getenv(3).

Note that getenv(3) takes in the environment variable name without a leading $. For example, char *v = getenv("VAR"); stores a pointer to the value of the environment variable VAR in a local variable named v.

Testing builtins

Fully test cd later.

Wait to fully test cd after you implement executable commands, by using the executable command pwd to print the working directory after cd-ing. Don’t worry about adding C code to print the current working directory in the shell for now, just test that cd correct detects erroneous arguments.

For now, test cd just by ensuring that the entering the following commands do not cause your shell to crash, report an error, or exit:

-> cd
-> cd ..
->

And check that passing in more than 2 arguments does cause your shell to report an error (but not crash or exit):

-> cd a b
Error: cd: too many arguments
->

At this point, also test that you can enter an empty command without an error, crash, or exit:

->
->

Come up with your own additional cases to try against your shell at this point (there is a least one case that we have described, but not listed here).

Memory Management

The calls to command_parse() allocate memory. You must be sure to free this memory when the code is done with it. Aim to free memory as soon as possible after your code is done using it, but try to minimize the number of points in the code where freeing takes place. Avoid sprinkling excess command_free() calls around the program. In part 3, some of the job list calls include freeing for you.

Use valgrind ./shell.bin to check for memory errors. Valgrind will report memory accesses that fall outside the bounds of legal memory locations (local variables on the stack and allocated heap memory). It also prints a summary of memory leaks at program exit.

Your shell should run without valgrind errors when you have finished each Part.

The initial starter shell (which simply echoes every command and does nothing more) has memory leaks in some functions that are not currently called. Your final shell should have no memory leaks (as measured by valgrind) when the user runs any command then exit (or control-D). It is fine if your shell leaks memory after a control-C, since handling signals are beyond the required scope of this assignment.

Part 2: Executable Commands in the Foreground

For Part 2, you will implement support for running arbitrary executable programs in the foreground.

Specification

All non-built-in commands are executable commands that name executable programs on the filesystem to run in a separate process.

  • The first word of the command line indicates the executable to run.
    • If the executable is given by name (e.g., ls) rather than by absolute or relative path (e.g., /bin/ls or ./shell.bin), the shell finds the executable according to the PATH environment variable.
  • The shell passes the entire command array as arguments to the executable.
  • For foreground jobs (this part), the shell waits to print the next command prompt after the command terminates.

Example: given command ls ., the shell executes the /bin/ls command in a new process, using the argument array containing "ls" followed by the argument . (which indicates the current working directory). When the ls process has terminated, the shell prints a new prompt.

-> ls .
Makefile    command.h   joblist.h   terminal.c
command.c   joblist.c   shell.c     terminal.h
->

Implementation

Implement functionality to handle all executable commands:

  • fork a new child process to execute the requested executable command.
    • In the parent process (the shell), wait for the child process to terminate, then return.
    • In the child process, execute the requested executable with the entire command array as arguments.
  • Do error-checking for all system calls in the function that calls them. If the system call failed, print an error and exit the shell process with an error exit code.
  • Make sure to insert necessary calls to avoid memory leaks.

You’ll likely want to create a new helper function for this functionality.

System call documentation: You will need to use the following C library functions: exit(3), fork(2), waitpid(2), execvp(3).

  • Note that you do not need to handle the command-specific logic, such as special casing on the current directory with ., since the executables themselves implement that logic for you. You just need to execute the new process correctly using system calls.
  • Use execvp instead of the standard exec, since it will automatically finds the executable using PATH.
  • Pay special attention to the form of arguments these functions expect and how errors manifest when calling these functions.

Review our discussion of the fork-exec-wait model from class (especially the shell loop and process management examples), plus the lab on processes. The textbook covers the same material, but it sometimes dives into details too quickly in this section.

Reference

Error-Checking

Remember to check the return values of functions like fork, execvp, and waitpid to see if they succeeded or result in errors. There are no exceptions in C, so you must check for error return values explicitly. If one of these system calls results in an error, it sets a special global variable (errno) to indicate why. You should print this summary with the perror function. It takes a string, prints that string, and then prints a description of the error that occurred. In general, a system call failure means you should print the error and exit the current process immediately. For example:

// Call the function fork.
pid_t pid = fork();
// Check whether the return from fork indicates an error.
if (pid < 0) {
  // If there is an error, print a useful message and exit.
  perror("fork failed");
  exit(-1);
} ...

Pay attention to failures from execvp family of functions, especially if it seems like your shell requires you to type the exit command multiple times before finally exiting.

Testing foreground commands

First, check that ls works as expected:

-> ls .
joblist.c  Makefile   shell.c         soln_command.h  terminal.h
joblist.h  shell.bin  soln_command.a  terminal.c

Then, check that a call that takes some time to run, like sleep 5, does not return immediately:

-> sleep 5

It should return and you should see a new command prompt after 5 seconds. sleep takes one argument, a number, and it sleeps (does nothing) for that many seconds (see man sleep).

You should also check that errors from child processes are reported. For example, if you run ls with a file that does not exist, you should see:

-> ls foo
ls: cannot access 'foo': No such file or directory
->

You now have access to the system’s pwd call (print working directory), so you can more fully test that cd actually changes the current working directory:

-> pwd
/home/avh/cs240-repos/concurrency
-> cd
-> pwd
/home/avh
-> cd cs240-repos
-> pwd
/home/avh/cs240-repos
-> cd ../..
-> pwd
/home

Checking Memory Safety

Note that when your process forks, Valgrind still analyzes it – your process forks into two almost-identical processes, both running under Valgrind. So if the child exits in error right away (e.g., trying to execute a nonexistent executable file), you may see an extra Valgrind exit message. Once you have successfully exec‘d, the new program will not be running under Valgrind (since it is a new executable).

Ensure that you can run the above example trace under Valgrind without memory leaks before moving on to the next part.

Part 3: Background Jobs

In this part, you will add support for background jobs and a built-in command to list them.

Background Jobs

A job is a record of a process started by the shell.2

Specification

As you saw in Pointers, executable command ending with & indicate background command. Built-in commands are unaffected by &. The shell executes each background command in a new process like a foreground command, but unlike a foreground command:

  1. The shell prints a summary of the background job, but does not wait for the child process to terminate before printing the next command prompt.
  2. The shell “remembers” the child process and reaps the child process after it terminates.

To support these requirements for background jobs:

  1. The shell should save each new background job in a list of pending jobs.
  2. Before each command prompt (e.g., at the end of the previous loop iteration), the shell should reap, report, and remove any background jobs whose processes have terminated.

Difference from bash.

In most real-world shells, background commands will report their completion immediately upon their process’s termination, even if the user does not enter any additional commands. This is implemented via signals, which are covered in the optional components of this assignment, but not necessary for the required components.

Your shell need only report completed background jobs as the user enters new commands, not immediately upon their termination.

Example: the following shell interactions demonstrate background jobs.

-> sleep 30 &
[1]+ 4709  Running  sleep 30 &
-> sleep 55 &
[2]+ 4828  Running  sleep 55 &
-> echo she sells C shells by the C store
she sells C shells by the C store

… wait at least 30 seconds since starting sleep 30

-> echo world
world
[1]  4709  Done     sleep 30 &
-> cd ..
-> man sleep
...

… wait at least 55 seconds since starting sleep 55

-> echo hello
hello
[2]  4828  Done     sleep 55 &
->

Test-drive features in bash.

Before working on implementation, try out background jobs in bash to get a feel for how they work. A useful program for testing foreground and background jobs is sleep.

Implement one feature at a time, in order, testing and committing each as you go.

Implementation

We have implementing a data structure for keeping track of jobs for you (you’re welcome ;)); your task is to use this existing implementation. Read the documentation of the provided JobList data structure and associated functions in joblist.h to know which functions are available to you.

  • Use the single JobList instance that the stencil code creates before the main shell loop begins.
  • The JobList* parameters are for passing around a reference to this list. Because we are passing around a pointer to one place in memory, each function call you make on this parameter will modify the same list, as desired.
  • Save each new job in a list of jobs. (See job_save.)
    • For background jobs:
      • You should also report the job to the user (See job_print.)
      • The background job should stay on the jobs list until the child process terminates.
    • For foreground jobs.
      • You should remove foreground jobs from the job list immediately upon completion. (See job_delete.)
      • You should not report the creation of foreground jobs.
  • Call a function at the end of the shell loop body to reap, report, and remove any background jobs whose processes have terminated.
    • See job_iter, job_set_status, job_print, and job_delete for the JobList data structure.
    • Use waitpid(2) to check if a process has terminated. Look up how to use the WNOHANG option to allow waitpid to return even if the target process has not terminated. Use the return value to determine whether or not the child process has terminated.

System call documentation: You will need to use the following C library functions: waitpid(2). See WNOHANG. An example of the syntax for using WNOHANG (your other arguments will likely differ):

waitpid(pid, &s, WNOHANG)
Job Lists

Use the provided data structures and functions declared and documented in joblist.h to save, track, and report jobs. There are two kinds of C structs comprising the job list data structure:

  • A JobList structure stores a list of jobs. (You need exactly one JobList for the shell.)
  • A Job structure stores information about a single job (an executable command that has been started by the shell):
    • job ID (jid field) – the shell’s short, memorable ID for this job;
    • process ID (pid field) – the operating system’s process ID for the process in which the job’s command was executed (pidjid);
    • command array (command field) – the command array used to start the job;
    • and job status (status field)
      • JOB_STATUS_FOREGROUND – process is running as a foreground job
      • JOB_STATUS_BACKGROUND – process is running as a background job
      • JOB_STATUS_STOPPED – process is stopped (only used in the optional extensions)
      • JOB_STATUS_DONE – process has been reaped and the job is about to be deleted (transient state used only to affect job printing)

The shell job status is separate from the OS process status.

A job is the shell’s data structure record about a process it started, including its last known job status (JobStatus), but it is distinct from the process itself. The OS knows neither that this job data structure exists nor what it means; updating it is entirely the responsibility of the shell. Nothing here is automatic or implicit.

The operating system keeps its own internal information about the current state of each process. To detect changes in the status of a job’s process and effect changes in job status in response, the shell must explicitly request process status info from the OS with waitpid and update the job status according to the results.

Both structures also track extra information used in managing linked lists, terminal interaction, keyboard interrupt signals, etc, for the optional components of this project. For Parts 1 & 2, ignore any fields not listed above.

You do not need to create instances of these structures yourself. You can read fields of Job structs listed above (use the -> syntax), but any other operations on JobList or Job structs (including changes to job status) should be done via the provided functions.

Job list basics:
  • joblist_create allocates and initializes a new JobList.
  • joblist_empty returns 1 if the given list is empty and 0 otherwise.
  • joblist_free frees a (presumed empty) JobList.
Managing individual jobs:
  • job_save creates a new background Job with the given process ID, command, and status, assigns it the next available job ID, and saves it at the end of the list. The command array is saved by reference (not copied) as part of the job data structure.
    • Save a single command array in at most one job.
    • Do not free a command array after passing it to job_save. job_delete will handle that freeing the command array for you.
  • job_get gets the Job for a given job ID. You will likely not need this for the required components of this assignment.
  • job_print prints a specific Job.
  • job_set_status sets a Job’s status.
  • job_delete removes a Job from the list and frees it and its command array (by calling command_free).
  • job_iter iterates over a JobList, calling the given function (passed via function pointer) on each Job in the list.
    • A function pointer (K&R 5.11, p. 118) holds the address of a function, allowing functions to be passed as arguments to other functions. Any state used in the passed function must be passed explicitly as an argument.
    • job_iter allows you to pass a function (just give its name as an argument) for what to do with each Job, but it does not provide a way to build up any result by iterating over the list (because you should not need to do this!).
    • Example: prints each job in the list that has an even job ID.

      void printer(JobList* jobs, Job* job) {
        if ((job->jid % 2) == 0) {
          job_print(jobs, job);
        }
      }
      
      ... // somewhere in a function
      ... // where jobs is a JobList* ...
      job_iter(jobs, printer);
      

jobs command

Specification

In this part, add a new builtin jobs command that reports all background jobs, along with their job ID, process ID, execution status, and command.

-> jobs
[3]  5502  Running  evince bash-book.pdf &
[4]  5504  Running  sleep 20 &

Implementation

  • Reap completed jobs as part of the jobs command.
  • Aim to share code between two tasks:
    1. Job reaping/reporting done after each command is run.
    2. Explicit job reporting with jobs.
  • See the provided job list code, especially job_iter, job_set_status, job_print, job_delete.

Testing background (and foreground) commands: Example correct interaction

You should test your shell extensively (by entering series of commands: interspersing foreground and background commands as well as builtin commands).

Here is an example, correct interaction; run under valgrind to demonstrate that there are no memory leaks.

First, make, then run the executable under valgrind with the --leak-check=full option to show full details:

[avh@cs concurrency] make
make: 'shell.bin' is up to date.
[avh@cs concurrency] valgrind --leak-check=full ./shell.bin
==1081948== Memcheck, a memory error detector
==1081948== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==1081948== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==1081948== Command: ./shell.bin
==1081948==

Then, run a series of builtin cd commands and executable pwd commands to show that we can successfully change directories, including to the home directory via no explicit argument to cd:

-> pwd
/home/avh/cs240-repos/concurrency
-> cd
-> pwd
/home/avh
-> cd cs240-repos/concurrency
-> pwd
/home/avh/cs240-repos/concurrency

Then, show creating two background commands to run sleep. Each command prints a summary (via a call to job_print).

-> sleep 10 &
[1]+ 1082027  Running    sleep 10 &
-> sleep 20 &
[2]+ 1082028  Running    sleep 20 &

When we run Part 2’s builtin jobs, we see a report of the two running background jobs (again, implemented via calls to job_print):

-> jobs
[1]  1082027  Running    sleep 10 &
[2]+ 1082028  Running    sleep 20 &

We can also still run foreground commands:

-> echo hello
hello

After 10 seconds, if we enter an empty command line, we expect to see the first job finish and a corresponding report (via another call to job_print):

->
[1]  1082027  Done       sleep 10 &

Before the 20 seconds are up, we expect empty command lines to do nothing, until 20 total seconds have passed:

->
->
->
->
->
[2]  1082028  Done       sleep 20 &

As that point, if we run jobs again, the list should be empty:

-> jobs

We can then enter CMD-D to end the session (also be sure to test the built-in) exit. Because we have freed all allocated memory blocks, we see this report from valgrind with no leaks:

-> ==1081948==
==1081948== HEAP SUMMARY:
==1081948==     in use at exit: 0 bytes in 0 blocks
==1081948==   total heap usage: 39 allocs, 39 frees, 2,082 bytes allocated
==1081948==
==1081948== All heap blocks were freed -- no leaks are possible
==1081948==
==1081948== For lists of detected and suppressed errors, rerun with: -s
==1081948== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Submission

Submit: The course staff will collect your work directly from your hosted repository. To submit your work:

  1. Test your source code files one last time. Make sure that, at a minimum, submitted source code is free of syntax errors and any other static errors (such as static type errors or name/scope errors). In other words: the code does not need to complete the correct computation when invoked, but it must be a valid program. We will not grade files that do not pass this bar.

  2. Make sure you have committed your latest changes. (Replace FILES with the files you changed and MESSAGE with your commit message.)

    $ git add FILES
    $ git commit -m "MESSAGE"
    
  3. Run the command cs240 sign to sign your work and respond to any assignment survey questions.

    $ cs240 sign
    

    (If this encounters an error, instead execute cs240.f24 sign.)

  4. Push your signature and your latest local commits to the hosted repository.

    $ git push
    

Confirm: All local changes have been submitted if the output of git status shows both:

  • Your branch is up to date with 'origin/main', meaning all local commits have been pushed
  • nothing to commit, meaning all local changes have been committed

Resubmit: If you realize you need to change something later, just repeat this process.

Grading

Your grade is computed from 100 points, as follows:

  • Design, documentation, style (20 points):
    • Review notes from the Pointers assignment for tips on designing clean, readable C code.
    • Use helper functions to cleanly divide responsibilities between functions.
    • Document your code clearly with comments; see the provided function for style
  • Safety (20 points):
    • Your shell code must be free of compiler warnings, memory errors, and memory leaks (run valgrind!)
  • Correctness and completeness (60 points):
    • Your shell must implement the features described in the specification.
      • Part 1: 15 points
        • Built-in commands
      • Part 2: 20 points
        • Executable commands in the foreground
      • Part 3: 25 points
        • Executable commands in the background
        • jobs command
    • Perform extensive testing to increase your confidence in your shell’s correctness.

License

Creative Commons License
Concurrency by Benjamin P. Wood, with updates by Alexa VanHattum at Wellesley College is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Source code and evaluation infrastructure for Concurrency are available upon request for reuse by instructors in other courses or institutions.

  1. The full bash shell is orders of magnitude more complicated than what you will build, but yours will implement a few core features. 

  2. In more sophisticated shells, job may describe a related group of processes started together with one command.