Assignment: Concurrency

Contents

Overview

It’s time to build a shell! You have used the bash shell since the first CS 240 lab (if not earlier). You now know enough to build a small shell. Your shell, which should probably acquire a punny name containing “sh” by the time you are done building it, will provide basic functions familiar from bash.1

A shell is a command language interpreter and a prolific manager of processes. You already implemented a parser for simple shell commands in the Pointers assignment. In this assignment you, will implement a few standard process management facilities that are standard shell features.

Goals

  • To understand the process model and apply specific system calls for process management in the operating system (fork, execvp, waitpid, etc.).
  • To reason about and manage basic concurrency at the process level.
  • To develop debugging skills for more complex environments, including operating system interaction and multiple processes.
  • To think critically about how to implement a specification.
  • To read documentation to find and use relevant library functions.
  • To increase proficiency with C and basic systems programming.
  • To build a tool that you have been using (at least) all semester, then ruminate on circularity as you use your own shell… to work on your own shell.

Advice: figuring it out

This assignment gives a specification describing the required behavior of your shell. It is your job to figure out how to implement this specification using the following key tools:

  • Review process concepts.
  • Read the specifications carefully and think critically about how best to implement them.
  • Read the implementation suggestions carefully.
  • Search and consult documentation to learn about relevant library functions and system calls.
  • Experiment with features in bash and try out implementation ideas.

Time Reports

According to self-reported times on Parts 1 and 2 of this assignment from a previous offering:

  • 25% of students spent <= 5 hours.
  • 50% of students spent <= 6 hours.
  • 75% of students spent <= 8 hours.

Setup

Use a CS 240 computing environment. The shell you build might work on similar (macOS, Linux) platforms with your own implementation of command_parse (no guarantees!), but it must work on our platforms for grading.

Get your repository with cs240 start concurrency.

Your starter repository contains the following files:

  • Makefile: recipe for building the shell.
  • soln_command.a, soln_command.h: provided command-parsing library (or use yours from Pointers).
  • joblist.c, joblist.h: provided data structure for managing background jobs.
  • shell.c: you will implement the main shell logic in this file.
  • terminal.c, terminal.h: provided library for simple terminal control (needed for extra fun extensions only).

Compiling and running your shell

Compile the shell with make to produce an executable called shell.bin.

Run the shell with: ./shell.bin. Exit the shell by typing Control-d (or exit, once you have implemented it). Remove all compiled files (including the executable, if you want to recompile from scratch) with make clean.

Tasks

Implement the shell in parts. Complete and test each part before reading or starting the next.

  1. Required support for basic built-in commands and foreground jobs.
  2. Required support for basic background jobs.

If you would like to implement optional extensions (for your own learning, not for any extra credit), see Part 3 (job management) and Part 4 (extensions) from a previous offering here.

Grading considers design, documentation, style, memory safety, and correctness.

Preparatory Exercises

  1. Complete the lab activities on processes and bash before starting this assignment.

  2. Read Part 1.

  3. What is the difference between a built-in command and an executable command?

  4. Before starting implementation of each part of the shell, try out the specified feature in bash (the shell you use in the course computing environments). Keep in mind that bash is a much more complex shell than what you will implement, but the basic behavior of the features described below is the same.

Part 1: Basic builtin commands and foreground commands

In this part, you implement support for basic builtin shell commands and support for running arbitrary executable programs in the foreground. Each feature is described with two parts:

  • The declarative specification describes the required behavior that you must implement: “The shell does X.”
  • The imperative implementation guide recommends code structure and relevant library functions or system calls: “Call Y with Z. Documentation: …”

Test out any of these features in bash to understand expected behavior.

Shell Loop and Command Dispatch

Specification

At the top-level the shell is a loop that repeatedly:

  1. Prints a prompt (e.g., ->).
  2. Reads and parses a command entered at the prompt.
  3. Runs that command.

The stencil code is set up to have this basic structure, though it is not a memory-safe implementation (you will need to insert calls that free data in order to eliminate memory issues).

The shell accepts two types of commands:

  • Built-in commands run features of the shell itself. They change or inspect the state of the shell without creating any new processes.
    • Example: the exit command exits the shell.
  • Executable commands run executable programs from the filesystem in a separate process. Any command that is not built-in is assumed to be the name or path of an executable to run. All executable commands are run in a single general way.
    • Example: the ls executable lists the contents of the current working directory.

Implementation

A top-level shell loop is provided in main. It prints a prompt for the user and accept user input for a command line. The provided loop (and shell) exits if the user types the Control-D key combination at a prompt, sending the special EOF (end-of-file or end-of-input) character. For each command entered, the main shell loop calls shell_run_command, which is the starting point for your implementation. Currently shell_run_command simply echoes every command.

Replace the body of shell_run_command to determine whether the command is a built or executable command and run it appropriately. The provided code includes stubs for helper functions including shell_run_builtin and shell_run_executable that may help you organize and complete this task.

Much of the actual implementation in both Parts 1 and 2 of this assignment will require calling library functions in C. Our implementation advice will point out which functions are necessary, but it is up to you to read the manual pages and documentation to determine exactly how to use each function. This is to help you practice reading language documentation in general (which is how you will program outside of classes).

Manual Pages and Documentation

This document refers to manual pages for the standard libraries and GNU/Linux user commands. This documentation is available by running the man command or online and contains several sections. 1 is executable commands, 2 is system calls, 3 is library functions. We use standard notation for referencing manual pages. execvp(3) indicates a page about execvp in section 3: run man 3 execvp or look online.

Manual pages have several key pieces. Typically they list a Synopsis of function signature or command usage and provide a short Description of its behavior, followed by details and options. Pages for system calls or library functions include a section called Return Value. Pay close attention: most of these functions may return special error values that your code should handle.

Built-in Commands

Specification

The shell accepts three built-in commands (more later).

  • exit exits the shell process.
  • cd changes the working directory of the shell.
    • If a path is given as an argument, then cd uses the directory described by that path: cd cs240
    • If no argument is given, then cd uses the directory path given by the HOME environment variable.
    • If more than one argument is given, then cd reports an error to the user (but does not cause the shell to exit).
  • ` ` (the empty command) does nothing: on the user just entering a newline, the shell does not report an error or exit, it simply does nothing and waits for the next command.
  • (Optional) You may optionally implement help, which displays a message listing the usage of all built-in commands that your shell supports. (Yes, this list.) You will add more inPart 2)

Implementation

Implement functionality to handle the cd builtin shell command and the empty command directly in C code in the current process. You may find it helpful to use and extend the shell_run_builtin function, which should act as follows:

  1. If the command is built in, do the command and return true.
  2. If the command is not built in, return false.

Do not fork or exec to run builtin commands – they are features of the shell itself. Feel free to use shell_run_builtin and/or create and call other helper functions. We have provided one case: shell_builtin_exit.

System call documentation: You will need to use the following C library functions: strcmp(3) (mind the return value), chdir(2), getenv(3).

Note that getenv(3) takes in the environment variable name without a leading $. For example, char *v = getenv("VAR"); stores a pointer to the value of the environment variable VAR in a local variable named v.

Testing builtins

Fully test cd later.

Wait to fully test cd after you implement executable commands, by using the executable command pwd to print the working directory after cd-ing. Do not add C code to print the current working directory in the shell. It will cost you a lot of time and confusion at this point.

For now, test cd just by ensuring that the entering the following commands do not cause your shell to crash, report an error, or exit:

-> cd
-> cd ..
-> 

And check that passing in more than 2 arguments does cause your shell to report an error (but not crash or exit):

-> cd a b
Error: cd: too many arguments
-> 

At this point, also test that you can enter an empty command without an error, crash, or exit:

-> 
-> 

Come up with your own additional cases to try against your shell at this point.

Executable Commands

Specification

All non-built-in commands are executable commands that name executable programs on the filesystem to run in a separate process.

  • The first word of the command line indicates the executable to run.
    • If the executable is given by name (e.g., ls) rather than by absolute or relative path (e.g., /bin/ls or ./shell.bin), the shell finds the executable according to the PATH environment variable.
  • The shell passes the entire command array as arguments to the executable.
  • For foreground jobs (this part), the shell waits to print the next command prompt after the command terminates.

Example: given command ls ., the shell executes the /bin/ls command in a new process, using the argument array containing "ls" followed by the argument .. When the ls process has terminated, the shell prints a new prompt.

-> ls .
Makefile    command.h   joblist.h   terminal.c
command.c   joblist.c   shell.c     terminal.h
->

Implementation

Implement functionality to handle all executable commands:

  • Fork a new child process to execute the requested executable command.
    • In the parent process (the shell), wait for the child process to terminate, then return.
    • In the child process, execute the requested executable with the entire command array as arguments.
  • Do error-checking for all system calls in the function that calls them. If the system call failed, print an error and exit the shell process with an error exit code.

You may find it useful to extend the provided shell_run_executable stub for this functionality. For now, the bodies of your functions should ignore the jobs and foreground parameters, you will use in Part 2.

System call documentation: You will need to use the following C library functions: exit(3), fork(2), waitpid(2), execvp(3).

  • Note that you do not need to handle the command-specific logic, such as special casing on the current directory with ., since the executables themselves implement that logic for you. You just need to execute the new process correctly.
  • Use execvp instead of the standard exec, since it will automatically finds the executable using PATH.
  • Pay special attention to the form of arguments these functions expect and how errors manifest when calling these functions.

Review our discussion of the fork-exec-wait model from class (especially the shell loop and process management examples), plus the lab on processes. The textbook covers the same material, but it sometimes dives into details too quickly in this section.

Reference

Error-Checking

Remember to check the return values of functions like fork, execvp, and waitpid to see if they succeeded or result in errors. There are no exceptions in C, so you must check for error return values explicitly. If one of these system calls results in an error, it sets a special global variable (errno) to indicate why. You can print a summary easily with the perror function. It takes a string, prints that string, and then prints a description of the error that occurred. In general, a system call failure means you should print the error and exit the current process immediately. For example:

pid_t pid = fork();
if (pid < 0) {
  // Fork system call failed.
  perror("fork failed");
  exit(-1);
} ...

Pay special attention to failures of the exec family of functions, especially if it seems like your shell requires you to type the exit command multiple times before finally exiting.

Memory Management

The calls to command_parse() allocate memory. You must be sure to free this memory when the code is done with it – think carefully about this. Try to balance timely freeing with centralized freeing: free this memory as soon as possible after the time it becomes obviously unneeded, but try to minimize the number of points in the code where freeing takes place. Avoid sprinkling command_free() calls around the program. In part 2, some of the job list calls include freeing for you.

Use valgrind ./shell.bin to check for memory errors. Valgrind will report memory accesses that fall outside the bounds of legal memory locations (local variables on the stack and allocated heap memory). It also prints a summary of memory leaks (allocated but unfreed memory) at program exit. Note that when your process forks, Valgrind goes with it – your process forks into two processes running under Valgrind. So if the child exits in error (e.g., trying to execute a nonexistent executable file), you may see an extra Valgrind exit message. Once you have successfully exec‘d, the new program will not be running under Valgrind.

Your shell should run without valgrind errors when you have finished each Part.

The initial starter shell (which simply echoes every command and does nothing more) has memory leaks. Your final shell should have no memory leaks (as measured by valgrind).

Testing foreground commands

First, check that ls works as expected:

-> ls .
joblist.c  Makefile   shell.c         soln_command.h  terminal.h
joblist.h  shell.bin  soln_command.a  terminal.c

Then, check that a call that takes some time to run, like sleep 5, does not return immediately:

-> sleep 5

It should return and you should see a new command prompt after 5 seconds.

You should also check that errors from child processes are reported. For example, if you run ls with a file that does not exist, you should see:

-> ls foo
ls: cannot access 'foo': No such file or directory
-> 

You now have access to the system’s pwd call (print working directory), so you can more fully test that cd actually changes the current working directory:

-> pwd
/home/avh/cs240-repos/concurrency
-> cd 
-> pwd
/home/avh
-> cd cs240-repos
-> pwd
/home/avh/cs240-repos
-> cd ../..
-> pwd
/home

Part 2: Background Jobs

In this part, you add support for background jobs and a built-in command to list them.

Background Jobs

A job is a record of a process started by the shell.2

Specification

An executable command ending with & runs as a background command. Built-in commands are unaffected by &. The shell executes each background command in a new process like a foreground command, but unlike a foreground command:

  1. The shell prints a summary of the background job, but does not wait for the child process to terminate before printing the next command prompt.
  2. The shell remembers the child process and reaps the child process after it terminates.

To support these requirements for background jobs:

  1. The shell saves each new background job in a list of pending jobs.
  2. Before each command prompt (e.g., at the end of the previous loop iteration), the shell reaps, reports, and removes any background jobs whose processes have terminated.

Difference from bash.

In most real-world shells, background commands will report their completion immediately upon their process’s termination, even if the user does not enter any additional commands. This is implemented via signals, which are covered in the optional components of this assignment, but not necessary for the required components.

Your shell need only report completed background jobs as the user enters new commands, not immediately upon their termination.

Example: the following shell interactions demonstrate background jobs.

-> sleep 30 &
[1]+ 4709  Running  sleep 30 &
-> sleep 55 &
[2]+ 4828  Running  sleep 55 &
-> echo she sells C shells by the C store
she sells C shells by the C store

… wait at least 30 seconds since starting sleep 30

-> echo world
world
[1]  4709  Done     sleep 30 &
-> cd ..
-> man sleep
...

… wait at least 55 seconds since starting sleep 55

-> echo hello
hello
[2]  4828  Done     sleep 55 &
->

Test-drive features in bash.

Before working on implementation, try out background jobs in bash to get a feel for how they work. A useful program for testing foreground and background jobs is sleep. It takes one argument, a number, and it sleeps (does nothing) for that many seconds. (man sleep)

Implement one feature at a time, in order, testing and committing each as you go.

Implementation

Immediately after reading this implementation guide, read the documentation of the provided JobList data structure and associated functions in joblist.h.

  • Use the single JobList instance that the stencil code creates before the main shell loop begins.
  • The JobList* parameters are for passing around a reference to this list. (Avoid global variables!)
  • Save each new job in a list of jobs. (See job_save.)
    • For background jobs:
      • You should also report the job to the user (See job_print.)
      • The background job should stay on the jobs list until the child process terminates.
    • For foreground jobs.
      • You should remove foreground jobs from the job list immediately upon completion. (See job_delete.)
      • You should not report the creation of foreground jobs.
  • Call a function at the end of the shell loop body to reap, report, and remove any background jobs whose processes have terminated.
    • See job_iter, job_set_status, job_print, and job_delete for the JobList data structure.
    • Use waitpid(2) to check if a process has terminated. Look up how to use the WNOHANG option to allow waitpid to return even if the target process has not terminated. Use the return value to determine whether or not the child process has terminated.

System call documentation: You will need to use the following C library functions: waitpid(2). See WNOHANG. An example of the syntax for using WNOHANG (your other arguments will likely differ):

waitpid(pid, &s, WNOHANG)
Job Lists

Use the provided data structures and functions declared and documented in joblist.h to save, track, and report jobs. There are two kinds of C structs comprising the job list data structure:

  • A JobList structure stores a list of jobs. (You need exactly one JobList for the shell.)
  • A Job structure stores information about a single job (an executable command that has been started by the shell):
    • job ID (jid field) – the shell’s short, memorable ID for this job;
    • process ID (pid field) – the operating system’s process ID for the process in which the job’s command was executed (pidjid);
    • command array (command field) – the command array used to start the job;
    • and job status (status field)
      • JOB_STATUS_FOREGROUND – process is running as a foreground job
      • JOB_STATUS_BACKGROUND – process is running as a background job
      • JOB_STATUS_STOPPED – process is stopped (only used in the optional extensions)
      • JOB_STATUS_DONE – process has been reaped and the job is about to be deleted (transient state used only to affect job printing)

The shell job status is separate from the OS process status.

A job is the shell’s data structure record about a process it started, including its last known job status (JobStatus), but it is distinct from the process itself. The OS knows neither that this job data structure exists nor what it means; updating it is entirely the responsibility of the shell. Nothing here is automatic or implicit.

The operating system keeps its own internal information about the current state of each process. To detect changes in the status of a job’s process and effect changes in job status in response, the shell must explicitly request process status info from the OS with waitpid and update the job status according to the results.

Both structures also track extra information used in managing linked lists, terminal interaction, keyboard interrupt signals, etc, for the optional components of this project. For Parts 1 & 2, ignore any fields not listed above.

Do not create instances of these structures yourself. Do not read or manipulate fields of JobList structs directly. You may read fields of Job structs listed above (use the -> syntax), but any other operations on JobList or Job structs (including changes to job status) should be done via the provided functions.

Job list basics:
  • joblist_create allocates and initializes a new JobList.
  • joblist_empty returns 1 if the given list is empty and 0 otherwise.
  • joblist_free frees a (presumed empty) JobList.
Managing individual jobs:
  • job_save creates a new background Job with the given process ID, command, and status, assigns it the next available job ID, and saves it at the end of the list. The command array is saved by reference (not copied) as part of the job data structure.
    • Save a single command array in at most one job.
    • Do not free a command array after passing it to job_save. job_delete will handle that freeing the command array for you.
  • job_get gets the Job for a given job ID. You will likely not need this for the required components of this assignment.
  • job_print prints a specific Job.
  • job_set_status sets a Job’s status.
  • job_delete removes a Job from the list and frees it and its command array (by calling command_free).
  • job_iter iterates over a JobList, calling the given function (passed via function pointer) on each Job in the list.
    • A function pointer (K&R 5.11, p. 118) holds the address of a function, allowing functions to be passed as arguments to other functions. Any state used in the passed function must be passed explicitly as an argument.
    • job_iter allows you to pass a function (just give its name as an argument) for what to do with each Job, but it does not provide a way to build up any result by iterating over the list (because you should not need to do this!).
    • Example: prints each job in the list that has an even job ID.

      void printer(JobList* jobs, Job* job) {
        if ((job->jid % 2) == 0) {
          job_print(jobs, job);
        }
      }
      
      ... // somewhere in a function
      ... // where jobs is a JobList* ...
      job_iter(jobs, printer);
      

jobs command

Specification

In this part, add a new builtin jobs command that reports all background jobs, along with their job ID, process ID, execution status, and command.

-> jobs
[3]  5502  Running  evince bash-book.pdf &
[4]  5504  Running  sleep 20 &

Implementation

  • Reap completed jobs as part of the jobs command.
  • Share code between two tasks:
    1. Job reaping/reporting done after each command is run.
    2. Explicit job reporting with jobs.
  • See the provided job list code, especially job_iter, job_set_status, job_print, job_delete.

Testing background (and foreground) commands: Example correct interaction

You should test your shell extensively (by entering series of commands: interspersing foreground and background commands as well as builtin commands).

Here is an example, correct interaction; run under valgrind to demonstrate that there are no memory leaks.

First, make and run the executable under valgrind, with the --leak-check=full option to show full details:

[avh@cs concurrency] make
make: 'shell.bin' is up to date.
[avh@cs concurrency] valgrind --leak-check=full ./shell.bin
==1081948== Memcheck, a memory error detector
==1081948== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==1081948== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==1081948== Command: ./shell.bin
==1081948== 

Then, run a series of builtin cd commands and executable pwd commands to show that we can successfully change directories, including to the home directory via no explicit argument to cd:

-> pwd
/home/avh/cs240-repos/concurrency
-> cd
-> pwd
/home/avh
-> cd cs240-repos/concurrency
-> pwd
/home/avh/cs240-repos/concurrency

Then, show creating two background commands to run sleep. Each command prints a summary (via a call to job_print).

-> sleep 10 &
[1]+ 1082027  Running    sleep 10 &
-> sleep 20 &
[2]+ 1082028  Running    sleep 20 &

When we run Part 2’s builtin jobs, we see a report of the two running background jobs (again, implemented via calls to job_print):

-> jobs
[1]  1082027  Running    sleep 10 &
[2]+ 1082028  Running    sleep 20 &

We can also still run foreground commands:

-> echo hello
hello

After 10 seconds, if we enter an empty command line, we expect to see the first job finish and a corresponding report (via another call to job_print):

-> 
[1]  1082027  Done       sleep 10 &

Before the 20 seconds are up, we expect empty command lines to do nothing, until 20 total seconds have passed:

-> 
-> 
-> 
-> 
-> 
[2]  1082028  Done       sleep 20 &

As that point, if we run jobs again, the list should be empty:

-> jobs

We can then enter CMD-D to end the session (also be sure to test the built-in) exit. Because we have freed all allocated memory blocks, we see this report from valgrind with no leaks:

-> ==1081948== 
==1081948== HEAP SUMMARY:
==1081948==     in use at exit: 0 bytes in 0 blocks
==1081948==   total heap usage: 39 allocs, 39 frees, 2,082 bytes allocated
==1081948== 
==1081948== All heap blocks were freed -- no leaks are possible
==1081948== 
==1081948== For lists of detected and suppressed errors, rerun with: -s
==1081948== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Submission

Submit: The course staff will collect your work directly from your hosted repository. To submit your work:

  1. Test your source code files one last time. Make sure that, at a minimum, submitted source code is free of syntax errors and any other static errors (such as static type errors or name/scope errors). In other words: the code does not need to complete the correct computation when invoked, but it must be a valid program. We will not grade files that do not pass this bar.

  2. Make sure you have committed your latest changes. (Replace FILES with the files you changed and MESSAGE with your commit message.)

    $ git add FILES
    $ git commit -m "MESSAGE"
    
  3. Run the command cs240 sign to sign your work and respond to any assignment survey questions.

    $ cs240 sign
    

    (If this encounters an error, instead execute cs240.f24 sign.)

  4. Push your signature and your latest local commits to the hosted repository.

    $ git push
    

Confirm: All local changes have been submitted if the output of git status shows both:

  • Your branch is up to date with 'origin/main', meaning all local commits have been pushed
  • nothing to commit, meaning all local changes have been committed

Resubmit: If you realize you need to change something later, just repeat this process.

Grading

Your grade is computed from 50 points, as follows:

  • Design, documentation, style (10 points):
    • Review notes in the Pointers sample solutions for tips on designing clean, readable C code.
    • Document your code clearly with comments.
  • Safety (10 points):
    • Your shell code must be free of compiler warnings, memory errors, and memory leaks (run valgrind!)
  • Correctness and completeness (30 points):
    • Your shell must implement the features described in the specification.
      • Part 1: 15 points
        • Built-in commands
        • Foreground jobs
      • Part 2: 15 points
        • Background jobs
        • jobs command
    • Perform extensive testing to increase your confidence in your shell’s correctness.

License

Creative Commons License
Concurrency by Benjamin P. Wood at Wellesley College is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Source code and evaluation infrastructure for Concurrency are available upon request for reuse by instructors in other courses or institutions.

  1. The full bash shell is orders of magnitude more complicated than what you will build, but yours will implement a few core features. 

  2. In more sophisticated shells, job may describe a related group of processes started together with one command.