Pointer Potions
- Checkpoint: Sould be well underway by 11:00pm Thursday 15 October
- Due:
11:00pm Thursday 15 October11:00pm Monday 19 October - Starter code: fork wellesleycs240 / cs240-pointers (just once!) and add bpw as admin (Need help?)
- Submissions: Commit and push your final version. (Need help?)
- Relevant Reading:
- K&R Chapters 5-6
- C resources
- Debugging resources
- Collaboration: This assignment follows the standard collaboration policy. You may discuss problems and high-level strategies with other students, but your solutions must be your own work. You must cite anyone with whom you discussed the problems in a comment at the top of the relevant file.
If you missed out on learning how to use Emacs or Mercurial during the blur of the Bit Transfiguration assignment, please take the time to learn about them now via the linked materials. We will be using them much more from now through the end of the semester.
We provide and support two computing environments for CS 240 that include all of these tools. Unlike your earlier CS courses, CS 240 focuses on low-level implementation of computer systems. We apply the general concepts we learn in real computer systems, so our work often depends on low-level details of specific systems. It is typically difficult, error-prone, or impossible to complete CS 240 assignments elsewhere1 (e.g., on Mac or Windows). We are unable to offer support or guarantees if you choose to work on environments other than ours.
Contents
- Pointer Potions: Brewing a Command Parser
- Setup
- Starter Code and Review
- Command Structure and Parsing
- Required Functions
- Workflow
- Testing and Debugging
- C Function Declarations
Pointer Potions: Brewing a Command Parser
The main task of this assignment is to implement a shell command parser as a small library in C. The shell is the program that reads and interprets the commands you type in at the terminal command line. Later in the semester, you will build a full shell. For now, we will start with the first step toward a shell: splitting a user’s command line into its meaningful parts.
This parser takes a command line, stored as a single string, and converts it to an array of strings representing the command and its space-separated arguments. This may sound trivial given your previous programming experience with strings, and it is certainly not insurmountable, but you will be using a substantially different and lower-level toolbox to accomplish the job. We prohibit the use of string manipulation functions from the C standard library to force you to confront how strings are actually implemented and manipulated in memory.
The goals of this assignment are:
- to become familiar with the memory addressing model via the C programming language, pointers, arrays, and the link between pointer arithmetic and array indexing;
- to practice structured debugging techniques and tools for understanding execution of programs with pointers.
Please skim this full document before starting work. Pay special attention to “advice you might regret ignoring” highlighed in the orange boxes.
Setup
Starter code for this assignment is a shared Mercurial repository.
One time setup change if you have been having trouble with passphrases. If you have been having trouble remembering the passphrase you set for your SSH key (when setting up Mercurial), you may consider replacing it with an empty passphrase (less secure, but probably OK for this situation).
If you can remember or lookup that passphrase once and your password manager has not been remembering it for you:
- Run
ssh-keygen -p
, accept the default key (~/.ssh/id_rsa
), then enter the old passphrase, and the new one (just hit enter for empty) twice.
If you cannot remember your key passphrase:
- Remove your existing key:
rm ~/.ssh/id_rsa
- Then repeat the key generation process.
If you are using the wx appliance
Please log in as the wx
user and run the command wx upgrade
before working on this assignment to get updated versions of the tools we use.
If you already have a copy of the repository from lab
Please hg pull ssh://hg@bitbucket.org/wellesleycs240/cs240-pointers
and hg update
or hg merge; hg commit
as Mercurial directs you.
If you do not have already have a copy of the repository
Get a copy of the code as folows (need help?):
- Fork wellesleycs240 / cs240-pointers to create your own copy on Bitbucket (just once ever – you may have done this already for lab).
-
Clone your Bitbucket repository to your local computer (just once – you may have done this already for lab):
hg clone ssh://hg@bitbucket.org/yourusername/cs240-pointers
Starter Code and Review
Your working copy should contain several files covering this week’s lab material and this assignment:
Makefile
: rules to compile the various parts- Lab material (done with this, but good reference material):
tour.c
: a file containing a guided tour of pointers, arrays, and structs done before labpointer.c
: a file where you implemented some simple pointer manipulations in labptest.c
: tests forpointer.c
debug-practice
: several small C programs for debugging practice in labpractice.c
: starter and testing code for a few string manipulation functions
- Pointer Potions assignment:
command.c
: a file where you will implement a parser for simple shell commandscommand.h
: a C header file describing the functions you will implement incommand.c
command_test.c
: tests for your command parser
To compile the command-parsing code, run make ctest
. Run it with ./ctest
. [Updated to add this up here, 16 October]
If you are looking for additional reference material about C as a complement to The C Programming Language book, there is much material on the web. (And many other good books.) Start with the links for C on our tools page.
Command Structure and Parsing
Your task will be to write a handful of functions to convert command lines (strings) to command arrays (arrays of one-word strings) and work with command arrays. This section describes command lines, command arrays, and parsing in general. The next section describes the specific functions you will write.
Command Lines
A command line is a string. Here are two examples:
"ls -l cs240-pointers"
"emacs cs240-pointers/command.c &"
The command-line string may be comprised of several words (contiguous non-space characters) that indicate substructure:
- The first word is the name of the command to run:
ls
oremacs
- All remaining words are arguments to pass to this command:
-l
andcs240-pointers
orcs240-pointers/command.c
. - An optional
&
character at the end is never an argument (or part of an argument). For the parser, all we need to do is report the presence or absence of&
and leave it out of the result we return.- The
&
is metadata indicating that the command should be run as a background command. Command lines not ending with&
indicate foreground commands. When executing a foreground command, the shell waits for the command to finish before giving the next command prompt. When executing a background command, the shell gives the next prompt immediately, while the command continues to run in the background. Implementing command behavior will be part of our later shell implementation.
- The
Valid Command Lines
If the &
character appears in a command line, it should appear only as the last non-space character of the command line. For example, these are valid command lines:
"emacs &"
"emacs& "
" sleep 10 & "
These are invalid command lines (for our purposes – some are valid in other shells):
"&uhoh"
" & uh oh"
"uh & oh"
"uh& &oh"
"uh oh & &"
Command Arrays
The parser converts a command line into a command array, an array of strings representing the words of the command line, in order, terminated by a NULL
element. Recall that a string in C is not a special type; it is just an array of char
s terminated by a null character ('\0'
). A command array is thus an array of pointers to arrays of char
s and has the type char**
. All arrays in this structure must be null-terminated, using the right “null” for each array.
'\0'
is the null character. As a character, it is one byte in size.
NULL
is the null address. As an address, it is one machine word in size (32 or 64 bits, depending).
Do not mix them up!
Here is an example command array for the command line string "ls -l cs240pointers"
:
Command Array: Null-Terminated Strings
Index Contents (stored elsewhere in memory)
+----------+
0 | ptr *----------> "ls"
+----------+
1 | ptr *----------> "-l"
+----------+
2 | ptr *----------> "cs240-pointers"
+----------+
3 | NULL |
+----------+
Here is the same array drawn another way and showing how the array is arranged in memory, with each element’s offset from the base address of the array. This assumes addresses (and hence pointers) are 32 bits (4 bytes), so indices are related to offsets by a factor of 4.
Index: 0 1 2 3
Offset: +0 +4 +8 +12 +16
+-------+-------+-------+-------+
Contents: | * | * | * | NULL |
+---|---+---|---+---|---+-------+
| | |
V V V
"ls" "-l" "cs240-pointers"
Although we draw “strings” in the above pictures, this is an abstraction. Each string is actually represented by a '\0'
-terminated array of 1-byte characters in memory. Since each element is one byte, the offset is identical to the index.
Index: 0 1 2
Offset: +0 +1 +2 +3
+-----+-----+-----+
| 'l' | 's' |'\0' |
+-----+-----+-----+
Note: Practice reading memory diagrams every which way. Pay attention to what direction addresses grow. We are intentionally using different conventions in different drawings to help you get used to thinking on your feet…
Required Functions
Your task is to write four functions (plus as many helper functions as you would like) supporting command parsing in command.c
according to the headers in command.h
. If you write helper functions, be sure to read about how C function declarations work. We give a plan for implementation and testing below.
-
char** command_parse(char* line, int* foreground)
Parse a command-line string
line
and:- If the command line is valid:
- Store
0
where theforeground
pointer points if the command line contains&
after the last word. Store1
if it contains no&
. - Return a command array corresponding to the command-line string.
- Store
- If the command line contains invalid use of
&
, returnNULL
without setting foreground/backtround status.
- If the command line is valid:
-
void command_print(char** command)
Print a command array in the form of a command line, with the command words separated by spaces. Do not include a newline at the end. (We will rely on that when reusing this code later.) You will want to learn how to use printf.
-
void command_show(char** command)
Print the structure of a command array to aid in debugging and data inspection.
The output should make it clear exactly what strings the command array holds, but the format is left to you. For example, the output of a
command_show
using helpful formatting will distinguish a correct command array"ls -l cs240-pointers "
:command array: - "ls" - "-l" - "cs240-pointers" end
from an incorrect command array that contains some trailing spaces in words:
command array: - "ls " - "-l " - "cs240-pointers " end
while calling
command_print
will not make this so clear:ls -l cs240-pointers ls -l cs240-pointers
-
void command_free(char** command)
Free all parts of a command array structure previously created by
command_parse
and not previously freed.
Memory Allocation
All data structures used by the parser must be stored somewhere in memory. We should consider what code should be responsible for managing this memory and deciding when it is no longer needed.
Command-line strings are “owned” and managed by code outside of the command library. The parser should never free or mutate command-line strings.
Command arrays should be allocated dynamically by the parser with malloc
. Think carefully about how many allocations to make and how big they should be. Since command_parse
hands these command arrays over to the caller, the command_free
function is designed to make it easy for the caller to free the whole command array structure.
Coding Rules
Phase A: Array Indexing
In your first implementation, use as much array indexing as you wish.
Phase P: Pointer Arithmetic
Before beginning this phase:
- Make a copy of your Phase A implementation to keep for reference:
hg cp command.c command-a.c
- Commit your work:
hg commit
In the second phase, replace all uses of array indexing in command.c
with pointer arithmetic. Comment out and preserve lines that use array indexing, writing replacement lines with pointer arithmetic directly below. For example:
// x = a[5];
x = *(a + 5);
Feel free to restructure your code further when working with pointers only. It does not need to match the array code exactly if other idioms seem more direct.
In the final version of command.c
, the characters [
and ]
should appear only within comments (i.e., as the commented-out array-indexed version you used to start). You are not required to keep the array-indexed version around in comments in command.c
. However, you may find it helpful as additional documentation of your pointer arithmetic.
Grading
When grading, we will be looking for code that is:
- working according to the specification given here
- free of compiler warnings and memory errors (run
valgrind
!) - well-designed, well-organized, and well-documented
Workflow
Follow the workflow below, much of which also corresponds closely to the functionality of command_parse
as described in command.c
.
Programming with pointers is like brewing a potion. Both have the habit of exploding spectacularly in your face unless you brew them just right.
For each stage in the workflow: plan carefully before typing any code; implement one step at a time; test each step extensively before implementing the next; commit each tested step before implementing more. This careful process will save time by catching bugs early and saving working versions you can recover if things start going wrong later.
-
Add more hardcoded command array test cases in
command_test.c
. -
Implement
command_show
andcommand_print
first. They are only a few lines of code each. Test them on the constant, statically allocated command arrays incommand_test.c
. -
Implement
command_parse
in stages, testing each stage on several inputs and commiting a working version before continuing.-
Ignore detection of foreground/background commands to start. Assume commands contain no
&
characters. -
Count the number of words in
line
. -
Allocate the top-level command array.
-
Copy each word in
line
into a newly allocated string and save it in the command array. -
Add foreground/background status parsing.
- Commands containing
&
must place it as the last non-space character of the command line. - Other placements of
&
should causecommand_parse
to returnNULL
.
- Commands containing
-
-
Implement
command_free
. -
Convert functions from array indexing to pointer arithmetic.
Testing and Debugging
Programming with pointers in C is error-prone, even for experts. Haphazard print-based testing is an inefficient (and often ineffective) use of your time. Wield the tools in this section to catch errors sooner and faster.
Write extensive tests.
And other advice:
- Write tests driven from the
main
function incommand_test.c
, build your tests withmake ctest
for just this part, and run them with./ctest
. If you inspect theMakefile
’s rule forctest
, you will see it is produced by compilingcommand_test.c
andcommand.c
. - Do not add a
main
function incommand.c
– this will cause problems down the line. - Call
command_free
only on command arrays returned bycommand_parse
(not on statically declared command arrays).
Assert the truth.
Assertions are “executable documentation”: they document rules about how code should be used or how data should be structured, but they also make it easier to detect violations of these rules (a.k.a. bugs!). Use the assert(...)
tool in C by including assert.h
and asserting expected properties. For example, the provided code already includes code that asserts that the arguments to command_
functions are not NULL
. Thus if a NULL
argument is ever passed to these functions, an error message will be printed and execution will halt immediately. Detecting errors early like this (vs. crashing or corrupting data later wherever the code depends on this assumption) saves a lot of time. Add assertions to make the “rules” of your code clear wherever you make assumptions.
Use Valgrind.
Valgrind is an extended memory error checker. It helps catch pointer errors, misuse of malloc
and free
, and more. Run valgrind on your compiled program like this:
valgrind ./ctest
Valgrind will run your program and watch it execute to detect errors at run time. See the tools page for additional reference links.
Use GDB.
Use GDB to help debug your programs. (Review previous lab activities for tips.) When debugging programs with pointers, pay special attention to the pointer values your program generates. Inspect them like other variables or use the address-of (&
) and dereference (*
) operators at the gdb
prompt to help explore program state. See the tools page for additional reference links.
Does GDB tell you that it cannot display something due to compiler optimizations? If so, turn off compiler optimizations by using editing the CFLAGS = ...
line in your Makefile
to add -O0
at the end of the line. (That’s “space dash capital-oh zero”.) This will take effect the next time you use make
to compile your code.
Other Tips
- Compile from within Emacs by typing
M-x compile
and then editing themake
command to refer to the executable you want to compile (probablyctest
). Now you can click on errors to jump to the source line involved. Note: all the mini buffers where you type in commands, etc., have history accessible with up and down arrows. - Run GDB from with Emacs with
M-x gdb
, starting from a C buffer.- Click left margin to set/delete breakpoints.
- Use buttons at top to run, continue, step, next, navigate up and down the stack, etc.
C Function Declarations
In C, a function may be used only after (i.e., later in the file than) its declaration. This differs from Java, which allows you to refer to later methods. When declaring helper functions, you can do one of a few things to deal with this:
- Just declare your helper function before the functions that use it.
-
Write a function header earlier in the file and the actual definition later in the file. The function header just describes the name and type of the function, much like an interface method in Java. For example:
// A function header declares that such a function exists, // and will be implemented elsewhere. int helper(int x, int y); // Parameter names are optional in headers. int helper2(char*); void needsHelp() { // OK, because header precedes this point in file. helper(7, 8); helper2("hello"); } int helper(int x, int y) { return x + y; } int helper2(char* str) { return 7; }
-
If the functions would likely get used elsewhere, then put the header in a header file, a file ending in
.h
that contains only function headers (for related functions) and data type declarations. For example, if you added another general function (not just a helper function) for manipulating commands (not required in this assignment), it would be best to place a function header for it with the other function headers incommand.h
so that users of your command library can call it.Header files are included (essentially programmatically copy-pasted) by the
#include
directive you often see at the tops of C source files. Then these functions can be used and their implementations will later be found elsewhere.
-
If you run Linux already, it is likely you will have good luck installing tools directly. ↩