CS 341: Shell

Overview
Requirements
Features
Design
Tips
- Trouble
- History
Submission and Grading

Overview

Assign: Tuesday, 3 Nov
Checkpoint: send a progress update Friday, 6 Nov
Checkpoint: aim to complete IO redirection Monday, 9 Nov
Due: Wednesday, 11 Nov
Teams: pairs or individuals
Submit: git add, git commit, and git push your completed code.
Reference:

In this project, you will build a Unix-like shell. You are welcome to reference the CS 240 Shell assignment, including adapting the starter code from that assignment or your own completed code from that assignment (with self-attribution). As a word of caution, some design choices made in the provided code for the CS 240 shell may be an awkward fit for the requirements of this shell. Even if you wish to consult that code, building from scratch will help you be sure you understand all your code (and feel more accomplished!).

A top-level piece of advice is to understand the intended behavior of a feature before trying to implement it. The best way to do that is usually to experiment with that feature in the normal shell that you are already using in your work. The specification intentionally leaves lots of room for a variety of implementation choices and requires that you learn/understand how several key system calls work and interact with each other. Once you have looked at some documentation, I am quite happy to field questions as you interpret it and think about how to put these system calls together to accomplish your goal. This understanding piece will be a major component of the “programming” that you do for this project.

Requirements

Once you report your team to me, I will create a shared GitHub repository for you to use.

What to Submit

Upon submission, your repository should contain at least these files:

README.md: A text file, formatted using Markdown, that documents:
- How to compile your shell.
- How to use any other items included in the repository, such as tests or a Makefile.
- What features your shell supports as well as how to use them, including examples.
- A brief design guide explaining the major structural components of your shell code.
- Any assumptions, non-standard behaviors, or known limitations of your shell.
At least one .c file containing code for your shell. You are welcome to organize your code into as many C source (.c) and header (.h) files as you like.

Additionally, it is recommended, but not required, that you include:

Makefile: Basic rules for compiling/cleaning your shell executable with make.
Some tests. You may wish to copy the test infrastructure from the Syscalls assignment and replace the test specifications with a series of your own tests for the shell. Using shell scripts is also a useful way to save and test a sequence of commands.

What to Implement

Your shell program should support:

The ability to run in “interactive mode,” where the user types individual commands at a command prompt, or “script mode,” where your shell is invoked with a file containing a sequence of commands separated by newlines.
A few standard builtin commands, including at least cd and exit.
The ability to run simple single foreground executable commands by invoking executables outside the shell with command-line arguments (e.g., /bin/ls cs341/shell).
Support for input and output redirection for all kinds of executable commands. (Note, builtin commands should not support redirection.)
Support for pipes of executable commands. (Note, builtin commands should not support pipes.)

Suggestions

It is recommended that your shell support a “verbose mode” available either by changing a single static const variable (a constant) or macro in the source code or by passing a -v argument when invoking the shell. In verbose mode, the shell will print additional metadata as described below.

You are also encouraged to explore other advanced shell features of your choice if you have time.

Features

Interactive Mode and Script Mode

The shell is in interactive mode when the shell itself is invoked with zero arguments:

$ ./your-shell
your-prompt> /bin/ls
Makefile    README.md    your-shell.c
your-prompt> exit
$

When invoked in interactive mode, the shell should print a prompt each time it is ready to accept a new command.

The shell is in script mode when it is invoked with one argument, assumed to be the name of a file containing a sequence of commands separated by newlines.

$ cat your-script.sh
/bin/ls
echo hello world
exit
$ ./your-shell your-script.sh
your-prompt> /bin/ls
Makefile    README.md    your-shell.c
your-prompt> echo hello world
hello world
your-prompt> exit
$

When invoked in script mode, the shell should do the following for each line of the file whose path is given by the first argument:

Print a prompt followed by the line from the file.
Run the command indicated by the line from the file.

When invoked with 2 or more arguments, the shell should immediately exit in error with a message about its proper usage.

In both modes:

The shell should exit when it encounters the exit command or the end-of-file (EOF) indicator, which can be typed at an interactive shell with Control-D.
The prompt for the next command line must not be shown until the previous command line has completed.
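
As a rough sketch of how both modes could share one loop, the only differences are where lines come from and whether the line is echoed after the prompt. The names open_command_source, run_shell_loop, and stop_on_exit are hypothetical, and stop_on_exit only stands in for real command handling.

```c
#define _POSIX_C_SOURCE 200809L  /* for getline */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: pick the source of command lines.
 * argc == 1 means interactive mode (stdin); argc == 2 means script mode.
 * Returns NULL on a usage error or if the script cannot be opened. */
FILE *open_command_source(int argc, char *argv[], int *script_mode) {
    if (argc == 1) {
        *script_mode = 0;
        return stdin;
    }
    if (argc == 2) {
        *script_mode = 1;
        FILE *f = fopen(argv[1], "r");
        if (f == NULL) perror(argv[1]);
        return f;
    }
    fprintf(stderr, "usage: %s [script-file]\n", argv[0]);
    return NULL;
}

/* Hypothetical stand-in for real command handling: nonzero means "stop". */
int stop_on_exit(char *line) {
    return strcmp(line, "exit") == 0;
}

/* Hypothetical main loop. Stops at EOF or when handle_line returns nonzero.
 * Returns the number of command lines that were read. */
int run_shell_loop(FILE *in, int script_mode, int (*handle_line)(char *line)) {
    char *line = NULL;
    size_t cap = 0;
    int count = 0;
    for (;;) {
        printf("your-prompt> ");
        fflush(stdout);
        ssize_t len = getline(&line, &cap, in);
        if (len < 0) break;                        /* EOF (e.g. Control-D) */
        if (len > 0 && line[len - 1] == '\n') line[len - 1] = '\0';
        if (script_mode) printf("%s\n", line);     /* echo the script line */
        count++;
        if (handle_line(line) != 0) break;
    }
    free(line);
    return count;
}
```

A main function could call open_command_source, exit with an error status if it returns NULL, and otherwise pass the stream to run_shell_loop with a handler that parses and runs each line.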

Builtin Commands

Builtin commands invoke functions of the shell itself to change the state of the shell process or terminate it. They do not create new processes. The shell must support at least these builtins:

The builtin command cd takes a single path as an argument and changes the working directory of the shell process to the directory given by that path, if it exists. Optionally, cd may support any other convenience features such as:
- cd with no arguments changes the working directory to the current user’s home directory.
- cd - changes to the previous working directory.
- more of your choosing.
The builtin command exit terminates the shell process itself.

Optionally, the shell may support any other builtin commands that interest you such as pushd or popd.
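
A minimal sketch of builtin dispatch, assuming the command line has already been split into a NULL-terminated argument array. The helper name try_builtin is hypothetical. The chdir call runs in the shell process itself, which is why cd cannot be an ordinary executable.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if args[0] named a builtin (handled here,
 * with no new process), or 0 if the caller should treat it as an executable. */
int try_builtin(char *args[]) {
    if (args[0] == NULL) return 0;
    if (strcmp(args[0], "exit") == 0) {
        exit(0);                          /* terminates the shell itself */
    }
    if (strcmp(args[0], "cd") == 0) {
        const char *target = args[1];
        if (target == NULL) target = getenv("HOME");  /* optional convenience */
        if (target == NULL) {
            fprintf(stderr, "cd: no directory given\n");
        } else if (chdir(target) != 0) {
            perror("cd");                 /* e.g. the directory does not exist */
        }
        return 1;
    }
    return 0;
}
```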

Executable Commands

Any command that is not a builtin command is assumed to be an executable command. For executable commands, the shell must create a new child process and execute the given command in that new process. Once the child process has completed (and no sooner), the shell should continue to the next command prompt.

For example, this command runs the executable from the file /bin/ls with command-line arguments /bin/ls and cs341/shell.

your-prompt> /bin/ls cs341/shell
Makefile    README.md    your-shell.c
your-prompt>

If the command gives an executable that does not exist, an error message to this effect should appear, and then the shell should continue to the next command prompt.

your-prompt> no-such-executable whhaaaaaat
error: could not find executable "no-such-executable"
your-prompt>

Do not implement executables; implement the logic to launch an arbitrary executable.


Note: the shell itself does not implement any executables! Neither will you implement any executables except the shell. You will not write the logic of ls or cat, etc.

The shell merely invokes existing executable files, given their path in the filesystem. The shell is a launcher of executables, not a provider of executables. This means that a single case in the shell can handle all possible executable commands, whether they be executables that came with the system (like ls) or brand new programs that we write and compile.

When running the executable command /bin/ls cs341/shell, the shell has no clue what ls does or even whether it exists. It just attempts to launch a process and exec the file at the given path, /bin/ls, with the given arguments, "/bin/ls" and "cs341/shell".

In recommended verbose mode, the shell should print a message indicating the PID, executable name (and optionally arguments), and exit status of the child process when it launches and when it completes:

your-prompt> /bin/ls cs341/shell
[Launching: 2534 /bin/ls]
Makefile    README.md    your-shell.c
[Completed: 2534 /bin/ls with exit code 0]
your-prompt>
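
One way the launch-and-wait logic might look, as a sketch rather than a required structure. The helper name run_foreground and its verbose parameter are hypothetical, and the exit code 127 for a failed exec mirrors a common shell convention rather than anything this assignment requires.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run one executable command in the foreground.
 * args[0] is the path to the executable; args is NULL-terminated.
 * Returns the child's exit code, or -1 if it could not be determined. */
int run_foreground(char *args[], int verbose) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this process image with the executable. */
        execv(args[0], args);
        /* Only reached if execv failed. */
        fprintf(stderr, "error: could not find executable \"%s\"\n", args[0]);
        _exit(127);
    }
    /* Parent (the shell): report, then block until the child finishes. */
    if (verbose) {
        printf("[Launching: %d %s]\n", (int)pid, args[0]);
        fflush(stdout);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    int code = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    if (verbose) {
        printf("[Completed: %d %s with exit code %d]\n", (int)pid, args[0], code);
        fflush(stdout);
    }
    return code;
}
```

Because the child starts running as soon as fork returns, the launch message printed by the parent can appear after some of the child's own output.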

Optionally, the shell may support invoking executables by name only (without a complete path) by searching for executables with that name in directories listed by the PATH environment variable. For example, ls cs341/shell would have the same behavior as /bin/ls cs341/shell, assuming that /bin is listed in the PATH environment variable. Check documentation of exec-related functions as a starting point or implement path search yourself.
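
If you choose to implement the search yourself, a sketch might look like the following. The helper name find_in_path is hypothetical, and this version makes simplifying assumptions: it skips empty PATH entries and treats anything that passes access(..., X_OK) as a match.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: resolve a command name to an executable path.
 * Writes the result into out (capacity size). Returns 0 on success, -1 if
 * nothing executable was found. */
int find_in_path(const char *name, char *out, size_t size) {
    if (strchr(name, '/') != NULL) {
        /* The name already contains a path: do not search. */
        int n = snprintf(out, size, "%s", name);
        if (n < 0 || (size_t)n >= size) return -1;
        return access(out, X_OK) == 0 ? 0 : -1;
    }
    const char *path = getenv("PATH");
    if (path == NULL) return -1;
    while (*path != '\0') {
        size_t len = strcspn(path, ":");   /* length of this directory entry */
        if (len > 0) {
            int n = snprintf(out, size, "%.*s/%s", (int)len, path, name);
            if (n > 0 && (size_t)n < size && access(out, X_OK) == 0) return 0;
        }
        path += len;
        if (*path == ':') path++;          /* step over the separator */
    }
    return -1;
}
```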

Input and Output Redirection

For executable commands, the shell should include support for redirecting the standard input from a file (./executable < input-file.txt), redirecting the standard output to a file (./executable > output-file.txt), or both (./executable < input-file.txt > output-file.txt).

The key symbols identifying redirection are < for input redirection and > for output redirection. In both cases, the file name for redirection appears as the next token in the command line string to the right of the < or > token.

The effect of input redirection is that the standard input file descriptor (stdin) of the child process that runs the executable should be connected to the given file instead of the terminal keyboard.

your-prompt> /bin/cat input.txt
Hello world.
This is a file.
your-prompt> /bin/cat < input.txt
Hello world.
This is a file.
your-prompt>

Redirects are entirely invisible to the executable

Notice that the strings "<" and "input.txt" are not part of the argument array passed when executing /bin/cat. They are special directives to the shell indicating the shell should set up redirection of stdin for the child process in which it executes /bin/cat.
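
To make that separation concrete, here is a sketch that pulls redirect directives out of an already-tokenized command line. The struct command layout, MAX_ARGS, and the helper name split_redirects are all hypothetical, and the sketch assumes < and > arrive as their own tokens.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 64  /* hypothetical limit on arguments per command */

/* Hypothetical representation of one parsed executable command. */
struct command {
    char *args[MAX_ARGS];   /* NULL-terminated array handed to exec */
    const char *in_path;    /* filename after "<", or NULL */
    const char *out_path;   /* filename after ">", or NULL */
};

/* Hypothetical helper: copy ordinary tokens into cmd->args and record the
 * redirect filenames separately. Returns 0 on success, or -1 if a redirect
 * symbol has no filename after it or there are too many arguments. */
int split_redirects(char *tokens[], struct command *cmd) {
    int count = 0;
    cmd->in_path = NULL;
    cmd->out_path = NULL;
    for (int i = 0; tokens[i] != NULL; i++) {
        int is_in = strcmp(tokens[i], "<") == 0;
        int is_out = strcmp(tokens[i], ">") == 0;
        if (is_in || is_out) {
            if (tokens[i + 1] == NULL) return -1;   /* missing filename */
            if (is_in) cmd->in_path = tokens[i + 1];
            else cmd->out_path = tokens[i + 1];
            i++;                                    /* skip the filename */
        } else {
            if (count >= MAX_ARGS - 1) return -1;
            cmd->args[count++] = tokens[i];
        }
    }
    cmd->args[count] = NULL;
    return 0;
}
```

Because each redirect consumes only the token to its right, this loop also accepts redirects that appear before or between the ordinary arguments.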

If the input file does not exist, an error should appear, the executable should not be invoked, and the shell should then provide the next command prompt.

your-prompt> /bin/ls
input.txt
your-prompt> /bin/cat < not-here.txt
error: no such file
your-prompt>

The effect of output redirection is that the standard output file descriptor (stdout) of the child process that runs the executable should be connected to the given file instead of the terminal screen.

If the output file does not exist, it should be created. If the output file does exist (and is a file), it should be overwritten. If a directory exists at the output file path, an error should appear, the executable should not be invoked, and the shell should proceed to the next command prompt.

your-prompt> /bin/ls
hello.txt
your-prompt> /bin/cat hello.txt
Hello world.
your-prompt> /bin/cat hello.txt > a.txt
your-prompt> /bin/ls
a.txt    hello.txt
your-prompt> /bin/cat a.txt
Hello world.
your-prompt> /bin/echo Helloooooooooo wooooooooorld > a.txt
your-prompt> /bin/ls
a.txt    hello.txt
your-prompt> /bin/cat a.txt
Helloooooooooo wooooooooorld
your-prompt>

The best way to understand the expected behavior of these features is to use them in an existing shell.

The dup/dup2 system calls will prove useful. It is important to think about which process needs to use them, and when, relative to other steps.
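
One possible arrangement, offered as a sketch and not the only workable design: the shell opens the files first, so a failure means no child is ever created, and the child rewires its own stdin and stdout with dup2 just before exec. The helper name run_redirected and the 0644 permission mode for new files are assumptions.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run args with optional redirection. A NULL path means
 * "leave that stream alone". Returns the child's exit code, or -1 if the
 * command was not launched or its status could not be determined. */
int run_redirected(char *args[], const char *in_path, const char *out_path) {
    int in_fd = -1, out_fd = -1;
    if (in_path != NULL && (in_fd = open(in_path, O_RDONLY)) < 0) {
        perror(in_path);                   /* e.g. no such file */
        return -1;
    }
    if (out_path != NULL &&
        (out_fd = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) {
        perror(out_path);                  /* e.g. the path is a directory */
        if (in_fd >= 0) close(in_fd);
        return -1;
    }
    pid_t pid = fork();
    if (pid == 0) {
        /* Child only: point stdin/stdout at the files, then exec. */
        if (in_fd >= 0) { dup2(in_fd, STDIN_FILENO); close(in_fd); }
        if (out_fd >= 0) { dup2(out_fd, STDOUT_FILENO); close(out_fd); }
        execv(args[0], args);
        perror(args[0]);
        _exit(127);
    }
    /* Parent: the shell keeps its own stdin/stdout and drops the extra fds. */
    if (in_fd >= 0) close(in_fd);
    if (out_fd >= 0) close(out_fd);
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```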

Optionally, the shell may support the full flexibility of redirection syntax offered by most shells:

Spaces are not required on either side of < or >. For example: ./executable>output-file.txt
Redirection indicators can come in any order with respect to each other or the executable. For example, > output-file.txt ./executable arg1 <input-file.txt arg2. Note that it is still the case that the next token to the right of the < or the > must be the redirect filename.

Pipes

Shell pipe commands connect the standard output of one process to the standard input of another. For example, the command /bin/cat names.txt | /bin/sort launches two processes:

The first process runs the /bin/cat executable.
The second process runs the /bin/sort executable.

All output written to stdout (standard output) in the /bin/cat process becomes available as input readable from stdin (standard input) in the /bin/sort process. The shell continues to the next command prompt only once all processes in the pipeline have completed.

your-prompt> /bin/cat names.txt
Pendleton
Clapp
Lulu
your-prompt> /bin/sort < names.txt
Clapp
Lulu
Pendleton
your-prompt> /bin/cat names.txt | /bin/sort
Clapp
Lulu
Pendleton
your-prompt>

The pipe and dup/dup2 system calls will prove useful. It is important to think about which processes need to use them, and when, relative to other steps.
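
As a sketch of one ordering that works for a two-command pipeline: create the pipe before forking, have each child dup2 the end it needs, and close every unused pipe descriptor in all three processes. The helper name run_pipeline2 is hypothetical, and error handling is deliberately thin.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "left | right" and wait for both processes.
 * Returns the exit code of the right-hand process, or -1 on failure. */
int run_pipeline2(char *left[], char *right[]) {
    int fds[2];                            /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) < 0) {
        perror("pipe");
        return -1;
    }
    pid_t lpid = fork();
    if (lpid == 0) {
        dup2(fds[1], STDOUT_FILENO);       /* stdout now feeds the pipe */
        close(fds[0]);
        close(fds[1]);
        execv(left[0], left);
        perror(left[0]);
        _exit(127);
    }
    pid_t rpid = fork();
    if (rpid == 0) {
        dup2(fds[0], STDIN_FILENO);        /* stdin now drains the pipe */
        close(fds[0]);
        close(fds[1]);
        execv(right[0], right);
        perror(right[0]);
        _exit(127);
    }
    /* The shell must close both ends too; otherwise the reader never sees
     * end-of-file, because a write end would still be open. */
    close(fds[0]);
    close(fds[1]);
    int status = 0;
    int right_code = -1;
    if (lpid > 0) waitpid(lpid, &status, 0);
    if (rpid > 0 && waitpid(rpid, &status, 0) == rpid && WIFEXITED(status)) {
        right_code = WEXITSTATUS(status);
    }
    return right_code;
}
```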

Think carefully about error cases involving pipes (such as one process in the pipeline failing in error while the others await pipe interaction). I will be happy to think through the logic with you. I will also be forgiving in grading this feature, especially if you document assumptions and the behavior you have implemented.

In recommended verbose mode, the shell should print a message indicating the PID and executable name (and optionally arguments) of each child process in the pipeline when it launches, and its exit status when it completes:

your-prompt> /bin/cat names.txt | /bin/sort
[Launching: 3482 /bin/cat]
[Launching: 3483 /bin/sort]
[Completed: 3482 /bin/cat with exit code 0]
Clapp
Lulu
Pendleton
[Completed: 3483 /bin/sort with exit code 0]
your-prompt>

Note that the order of messages about separate processes and the output or messages from other pipeline processes may not be predictable.

your-prompt> /bin/cat names.txt | /bin/sort
[Launching: 3483 /bin/sort]
[Launching: 3482 /bin/cat]
Clapp
Lulu
[Completed: 3482 /bin/cat with exit code 0]
Pendleton
[Completed: 3483 /bin/sort with exit code 0]
your-prompt>
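
One source of that unpredictability is that children can finish in any order. A sketch of reporting completions in whatever order they occur, assuming the shell recorded each child's PID and name at launch (the helper name wait_for_all and its parameters are hypothetical):

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: wait until all n recorded children have finished,
 * printing a completion message for each one as it is reaped.
 * Returns the number of recorded children that were reaped. */
int wait_for_all(const pid_t pids[], char *const names[], int n, int verbose) {
    int reaped = 0;
    while (reaped < n) {
        int status;
        pid_t done = wait(&status);        /* returns whichever child ends next */
        if (done < 0) break;               /* no children left, or an error */
        for (int i = 0; i < n; i++) {
            if (pids[i] != done) continue;
            reaped++;
            if (verbose && WIFEXITED(status)) {
                printf("[Completed: %d %s with exit code %d]\n",
                       (int)done, names[i], WEXITSTATUS(status));
                fflush(stdout);
            }
        }
    }
    return reaped;
}
```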

Optionally, the shell may support:

Using both redirection and piping together: ./executable < input.txt | cat > output.txt
Pipelines with more than two commands: cat names.txt | sort | uniq | grep Wellesley

More Features

If you have more time and interest, you are encouraged (but not required) to support other interesting features beyond the standard requirements. These could include (but are not limited to) the following, listed in rough order of complexity (lowest to highest):

Command sequences with ; as a separator. The command line ./executable >output-file.txt; cat output-file.txt first runs ./executable >output-file.txt; after that command finishes, it runs cat output-file.txt. Sequencing should be usable with redirection and piping. The ; has the lowest precedence.
Several of the optional behaviors above.
Background jobs and job control. See typical definitions here.
Signal-handling for Control-C (generally relevant) and Control-Z (along with fg and bg commands).
Other combinations of the optional behaviors above.
Other shell language features such as variables or control-flow structures.
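
For the first item in the list, command sequences, splitting on ; before any other parsing is one way to give it the lowest precedence. A sketch, with the hypothetical helper name split_sequence and no handling of quoting:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split line in place at each ';'. Stores a pointer to
 * the start of each non-empty piece (leading blanks skipped) in parts, up to
 * max pieces. Returns the number of pieces stored. */
int split_sequence(char *line, char *parts[], int max) {
    int n = 0;
    char *start = line;
    while (start != NULL && n < max) {
        char *semi = strchr(start, ';');
        if (semi != NULL) *semi = '\0';            /* terminate this piece */
        while (*start == ' ' || *start == '\t') start++;
        if (*start != '\0') parts[n++] = start;    /* ignore empty pieces */
        start = (semi != NULL) ? semi + 1 : NULL;
    }
    return n;
}
```

Each piece can then be handed, in order, to whatever code already handles a single command line with redirection or pipes.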

Design

You may organize the code of your shell however you wish. It is likely that you want at least these key components:

Code for a main shell loop that deals with the repeated steps that occur in response to each new command line entered by the user.
Code for parsing a command line string.
Code for implementing shell builtin commands, which allow the user to invoke functions of the shell program itself.
Code for invoking executable commands with the various shell options including input/output redirection or pipes.
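
For the parsing component, a minimal whitespace tokenizer is often enough to get started. This sketch uses the hypothetical helper name tokenize and assumes tokens are separated by spaces, tabs, or newlines, with no quoting or escaping.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split line in place into whitespace-separated tokens.
 * Stores at most max - 1 token pointers in tokens, followed by NULL (the
 * layout exec expects). Returns the number of tokens stored. */
int tokenize(char *line, char *tokens[], int max) {
    int count = 0;
    const char *delims = " \t\n";
    for (char *tok = strtok(line, delims);
         tok != NULL && count < max - 1;
         tok = strtok(NULL, delims)) {
        tokens[count++] = tok;
    }
    tokens[count] = NULL;
    return count;
}
```

The token pointers refer into the original line buffer, so that buffer must stay alive and unmodified until the command has been launched.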

Tips

Trouble

As you start manipulating processes and adding redirection and pipe features, you will probably get stuck at some point: your shell might just stop responding to input. In this event, you can try Control-C to interrupt/kill your shell (assuming you have not implemented special signal-handling behavior), but sometimes even that may not work and even if it does, you may leave other processes stuck. Log in via a second SSH connection and check out these commands and their documentation:

ps ux
kill, kill -9
pgrep, pkill

You may find it helpful to include the process ID (pid) of your shell somewhere in its output to simplify this task…
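
For instance, the prompt itself could carry the PID. This is only a sketch: the helper name format_prompt and the prompt layout are made up for illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write a prompt containing the shell's own PID into
 * buf. Returns the number of characters the prompt needs, as reported by
 * snprintf. */
int format_prompt(char *buf, size_t size) {
    return snprintf(buf, size, "your-prompt[%d]> ", (int)getpid());
}
```

With the PID visible in the prompt, kill can be pointed at the right process from the second SSH connection without searching through ps ux output.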

History

You might find readline useful if you want to implement command-line history.

Submission and Grading

Submit by ensuring that the material you want evaluated is available on the main branch in your GitHub repository.

Evaluation will include a live demo and code review. It will focus on completeness/correctness (80%) and code clarity/style/documentation (20%).
