Pointers
Assignment: Pointers
- Assign: Sunday 5 March
- Checkpoint: Aim to complete at least Workflow steps 1 through 4 (including
command_show
) and 7b (the first versioncommand_parse
with word counting and top-level array allocation) by Friday 10 March - Due: Tuesday 21 March
- Policy: Pair graded synthesis assignment
- Partner search: Find a partner here.
-
Code:
cs240 start pointers
(if this doesn't work, usecs240.s23 start pointers
- Submit:
git commit
andgit push
your completed code. - Reference:
Contents
- Assignment: Pointers
- Overview
- Setup
- Tasks
- Preparatory Exercises
- Command Structure
- Specification
- Implementation Workflow
- Compiling and Testing
- Debugging
- Submission
- Grading
- More Tips
- Extra Fun Extensions
- License
Overview
The main task of this assignment is to implement a shell command parser as a small library in C. The shell is the program that reads and interprets the commands you type at a command line terminal. This assignment focuses on the first step of such a program: splitting a command line into its meaningful parts.
The parser takes a command line, a single string, and converts it to
a command array, an array of strings representing the command and
its space-separated arguments. The string split
function available
in many programming languages’ standard libraries would accomplish
most of the task, but in this assignment you will essentially implement split
at a low
level. We prohibit the use of string manipulation functions from the
C standard library to force you to confront how strings are
implemented and manipulated in memory.
Goals
- To become familiar with the byte-addressable memory model through pointers and arrays at the C programming language level of abstraction.
- To understand the link between pointer arithmetic and array indexing and representation.
- To practice principled reasoning about program execution, using assertions, testing, and structured debugging.
- To practice using tools for memory safety checking and debugging.
- To have at least one aha! moment in appreciating how storage abstractions in your favorite programming language relate to (and hide!) details of the byte-addressable memory model.
Advice
Do true pair programming.
We strongly recommend working in pairs on this assignment. (Find a partner using this doc. Assuming you work with a partner, do your programming together, sitting in front of one screen. Take turns driving. If both partners will use a copy of the code, use the Team Workflow.
Programming with C pointers has a tendency go wrong in entertaining or painful ways unless you practice careful discipline in programming.
For each stage in the suggested workflow:
- Plan carefully before typing any code.
- Implement one step at a time (with useful assertions).
- Test each step extensively, including with tools for detecting pointer errors, before implementing the next.
- Commit each tested step before implementing more.
This careful process will save time by catching bugs early and saving working versions you can recover if things start going wrong later.
Time Reports
According to self-reported times on this assignment from a previous offering:
- 25% of students spent <= 7 hours.
- 50% of students spent <= 9 hours.
- 75% of students spent <= 12 hours.
Setup
Get your repository with cs240 start pointers
, whether
working alone or with a teammate. (If the cs240
command gives an
error, instead use cs240.s23
.)
Your repository initially contains the following files:
- For the command library implementation:
command.c
: a file where you will implement a command parsing library for simple shell commandscommand.h
: a C header file describing the API of the command parsing library incommand.c
- For testing, demonstration, and automation:
command_lines.c
: a file where you will add test cases for your command parser to be used by all testing driverscommand_demo.c
: a simple demonstration of how the command parsing library functions are used, also useful for incremental testing during developmentcommand_show_test.c
,command_free_test.c
,command_parse_test.c
: drivers for testing parts of the libraryMakefile
: recipes to compile the various parts
Compile all command-parsing code and test drivers with the command
make
. This produces several executables for different testing
purposes during development. Run tests with
individual test drivers. Some tests automatically
report failures; others require manual inspection of results.
You must use a CS 240 GNU/Linux computing environment for CS 240 code assignments.
Tasks
You will modify command.c
to implement a small library for parsing
shell commands with the interface declared in
command.h
and the functionality described by the
Specification section below.
char** command_parse(char* line, int* status);
void command_show(char** command);
void command_free(char** command);
The core of your library is the command_parse
function, which takes
a command line – a string capturing the command typed in the
terminal – and parses it to produce a command array – an array of
strings capturing the individual words of the command line. You will
implement the command_parse
function, plus three other simple
functions for displaying and freeing command arrays.
A few provided test drivers include logic to help
automate testing of your parsing library. These drivers all use
command line test cases from command_lines.c
. You will add several
additional command line test cases for robust testing of your command
parsing library implementation.
Grading considers design, documentation, style, correctness, and performance.
The remainder of this document describes:
- Preparatory exercises to help you navigate and plan the assignment.
- The definition of command lines and command arrays.
- The specification of the functions you must implement.
- The suggested implementation workflow.
- Documentation of how to compile and test your implementation.
- Tools and strategies for debugging.
- The grading criteria.
- Additional tips.
Preparatory Exercises
Lab covers this!
The lab activities leading up to this assignment cover the necessary preparation. If you need to review, here are the most relevant parts to review.
Review the lab exercises for Lab06/07 on Pointers/Arrays in C In particular:
- Understand pointer types and operations.
- Review notations for representing memory.
- Review basic pointer programming skills, including how to use idiomatic pointer style to eliminate array indexing in your C programs.
- Review how to use
gdb
andvalgrind
for debugging and testing your programs.
The instructions for this assignment are rather long and contain lots of details, but it’s important to first at least skim through all the instructions in order to understand what you’re asked to do. At the very least, read enough of the assignment instructions to answer all of the following questions before proceeding to any other part of the assignment. If you are working with a partner, you and your parter should consult with each other about the answers to these questions to make sure you agree about all the answers.
- Can spaces appear at the start or end of command lines?
- Is the
&
character ever considered part of a command word? - Can a command line with zero
&
s be valid? - Can a command line with two or more
&
s be valid? - Is a command line that consists of just zero or more spaces valid?
- Is the command line consisting of just
&
valid? -
You can answer questions like the previous two by:
- adding new test command lines to which file?
- invoking which command to show the expected outputs for each test command line?
-
These questions involve the
status
of a command line:- For a valid command with zero ampersands, what should be stored in
status
? - For a valid command with one ampersands, what should be stored in
status
? - For any invalid command, should anything be store in
status
?
- For a valid command with zero ampersands, what should be stored in
- To write code in idiomatic pointer style, it’s sufficient to replace
all occurrences of the array subscript notation
array[index]
by the pointer notation*(array + index)
. - If a string has length L, how many bytes should be
malloc
ed to store the string? - If a command line has n command words, how many bytes should be
malloc
ed to store these words? - Is
command_parse
allowed to mutate the contents of a command line string it receives as an argument? -
Consider the lines printed by
command_show
for the command array that results from callingcommand_parse
on the command string"cp -pr ../tests files/test-cases"
,- How many total lines should be printed?
- How many total open/close curly braces (
{
and}
) should be printed? - How many total commas should be printed?
- How many total double quote characters
"
should be printed?
- What tool is used to check for memory safety violations?
-
According to the suggested workflow
- What is the first file that you should edit?
- Which of the three functions in
command.c
should you implement first? - When implementing
command_parse
in stages, should you handle&
characters in the input in the first stage or the last stage? - In the first stage of implementing
command_parse
, you should not try to populate the command array with the correct strings. What string(s) should you use instead? Why? - What two
make
commands should you execute to test the final two implementation stages ofcommand_parse
?
Command Structure
Software driven by a command line interface (CLI), such as most software used in CS 240, interprets textual commands typed by a user at a command prompt. To interpret a command, a system must extract the command’s parts to understand its structure. In this assignment, your command parsing library will convert between two representations of simple multi-word commands:
- A command line is a string capturing the full text of a command, just as the user types it in a terminal.
- A command array is an array of strings, where each string captures one meaningful token in the command, in order.
The remainder of this section defines the structure of commands and these two representations as used in this assignment.
Command Lines
A command line is a C string containing zero or more
space-separated words, with an optional single background status
indicator character, '&'
, after all words.
- Like all C strings, the command line is terminated by the null character
'\0'
. - Spaces may appear anywhere in a command line.
- Word characters include all printable characters except ampersand (
'&'
) and whitespace characters, which include space (' '
), tab ('\t'
), newline ('\n'
), and carriage return ('\r'
). - A command word is a contiguous sequence of word characters delineated on each end by a non-word character or the beginning or end of the string.
- Ampersand (
'&'
) is a special status indicator character, not a word character.- In a valid command, ampersand (
'&'
) may appear at most once, and only zero or more spaces after the final word, followed only by zero or more spaces to the end of the string. Any other placement of ampersand is invalid.1 - A valid command containing no ampersand is a foreground command.
- A valid command containing one ampersand is a background command.
- In a valid command, ampersand (
Command Line Examples:
Here are two typical command lines:
- The command line
"ls -l cs240-pointers"
indicates a foreground command containing the three command words"ls"
,"-l"
, and"cs240-pointers"
. - The command line
"emacs cs240-pointers/command.c &"
indicates a background command containing the two command words"emacs"
and"cs240-pointers/command.c"
.
The definition above allows any spacing. The following lines are all
valid command lines indicating the foreground command containing the
three words "ls"
, "-l"
, and "cs240-pointers"
:
"ls -l cs240-pointers"
"ls -l cs240-pointers "
" ls -l cs240-pointers "
The definition above requires that, if present, ampersand ('&'
) must
be the last non-space character of the command line. It may appear
adjacent to or separate from the last word. Regardless of placement,
ampersand ('&'
) is not a word character and is never part of any
command word. For example, these valid command lines all indicate
the background command containing the two words "emacs"
and
"cs240-pointers/command.c"
.
"emacs cs240-pointers/command.c &"
"emacs cs240-pointers/command.c&"
" emacs cs240-pointers/command.c & "
" emacs cs240-pointers/command.c& "
The following are some examples of invalid command lines, with ampersand misused:
"&uhoh"
" & uh oh"
"uh & oh"
"uh& &oh"
"uh oh & &"
" uh oh && "
Command Arrays
The parser converts a command line into a command array, an array of strings representing the words of the command line, in order, terminated by a NULL
element. Recall that a string in C is not a special type; it is just an array of char
s terminated by a null character ('\0'
). A command array is thus an array of pointers to arrays of char
s and has the type char**
. All arrays in this structure must be null-terminated, using the right notion of nullness for each array.
'\0'
is not NULL
or 0
'\0'
is the null character / null byte. As a character (char
), its size is one byte.NULL
is the null pointer / null address. As an address (pointer value), its size is the size of on one machine word. In the 64-bit architecture used this semester, the size of a pointer / machine word is 8 bytes.0
is the zero integer. As an integer, its size is four bytes.
Do not mix them up!
'a'
is not "a"
.
- A literal character (
char
) value in C is written in single quotes:'a'
. Its size is one byte. - A literal string value is in C is written in C in double quotes:
"a"
. Its size is the size of a pointer to the first character of the string. The size of a pointer this semester is 8 bytes.
Do not mix them up!
Comparing 'a' == "a"
will always yield a false result. 'a'
is the
ASCII encoding of the letter a
, a one-byte value of type char
.
"a"
is the address of the start of the '\0'
-terminated
string that begins with the letter a
somewhere in memory; this address
is an an 8-byte value of type char*
.
Here is an example command array for the command line string "ls -l cs240pointers"
:
Command Array: Null-Terminated Strings
Index Contents (stored elsewhere in memory)
+----------+
0 | ptr *----------> "ls"
+----------+
1 | ptr *----------> "-l"
+----------+
2 | ptr *----------> "cs240-pointers"
+----------+
3 | NULL |
+----------+
Here is the same array drawn another way and showing how the array is arranged in memory, with each element’s offset from the base address of the array. Addresses grow left to right, and are assumed to be 64 bits (8 bytes), so indices are related to offsets by a factor of 8.
Index: 0 1 2 3
Offset: +0 +8 +16 +24 +32
+-------+-------+-------+-------+
Contents: | * | * | * | NULL |
+---|---+---|---+---|---+-------+
| | |
V V V
"ls" "-l" "cs240-pointers"
Although we draw “strings” in the above pictures, this is an abstraction. Each string is actually represented by a '\0'
-terminated array of 1-byte characters in memory. Since each element is one byte, the addresses of adjacent characters differ by 1 and the offset is identical to the index. Below is the memory representation of the string "ls"
:
Index: 0 1 2
Offset: +0 +1 +2 +3
+-----+-----+-----+
| 'l' | 's' |'\0' |
+-----+-----+-----+
Note: Practice reading memory diagrams every which way. Pay attention to what direction addresses grow. We are intentionally using different conventions in different drawings to help you get used to thinking on your feet…
Specification
You must write three functions (plus any necessary helper functions) supporting command parsing in command.c
according to the headers in command.h
.
If you write helper functions, be sure to read about how C function declarations work. We give a plan for implementation and testing below.
Function Specifications
-
char** command_parse(char* line, int* status)
Parse the command line string,
line
, and:- If the command line is valid:
- Store
0
(false) in memory at the address given by the pointerstatus
if the command is a background command or store1
(true) at this location if the command is a foreground command. - Build a command array containing the words of the command line, in order, and return its base address.
- Store
- If the command line is invalid, return
NULL
without setting foreground/backgroundstatus
.
The
command_parse
function must not mutate the memory representing the original command line string. - If the command line is valid:
-
void command_show(char** commandArray)
Print the structure of a
commandArray
according to the following format to aid in debugging and data inspection. You will need to learn how to use printf.The output should use the following format, which makes clear exactly what strings the command array holds. For example, this C code:
int status_here; command_show(command_parse("ls -l cs240-pointers ", &status_here));
Should cause
command_show
to print this text:{ "ls", "-l", "cs240-pointers", NULL }
Note that the use of added quotes in the output format distinguishes a correct command array where all spaces are stripped away, such as above, from an incorrect command array that contains some spaces in words, as below:
{ "ls ", "-l ", "cs240-pointers ", NULL }
More generally, the output of
command_show
is the following sequence of lines, each ending with a newline character ('\n'
):- An initial line containing only a single left curly brace
{
. - One line for each word in the command array, in order, where:
- The line starts with two spaces
- a double quote
"
; followed by - the characters of the command word in sequence; followed by
- a double quote
"
; followed by - a comma
,
.
- The line starts with two spaces
- A line containing two spaces
NULL
, corresponding to the null-pointer terminator at the end of the command array. - A final line containing only a single right curly brace
}
. This line should end with a newline character, just like all the others.
This is valid C array initializer syntax. In other words,
command_show
prints a valid C source code description of the command array it is given, as long as the command array words do not contain characters in need of escape:"
('"'
),\
('\\'
), tab ('\t'
), carriage return ('\r'
), newline ('\n'
), etc. We will not test command lines that contain such characters. You are welcome to extend yourcommand_show
implementation to properly escape them in its output, but this is not required. - An initial line containing only a single left curly brace
-
void command_free(char** commandArray)
Free all parts of the given
commandArray
array structure that were allocated bycommand_parse
, including the individual command word strings and the top-level array.
Memory Rules
Naturally, your code should be free of memory safety violations (as
detected by the valgrind
memory error detector):
- No use of unallocated, uninitialized, or freed memory contents.
- No out-of-bounds accesses.
free
only pointers to allocated blocks returned bymalloc
, and only once per block.
In addition, your code must follow these stricter memory allocation rules:
- Clients of the command library own and manage the memory
representing command line strings.
- Command library functions must never free nor mutate command line strings.
- Clients may free or mutate command lines at any time outside a call to command library functions.
- The
command_parse
function must allocate command array structures dynamically withmalloc
and return them to the client code that invokes the library functions. Once returned, these structures are owned by the client. You may assume that:- Clients will not mutate command array structures returned by
command_parse
. - Clients will not
free
any parts of command array structures returned bycommand_parse
, except throughcommand_free
. - Clients will call
command_free
to free an allocated command array (returned bycommand_parse
) at most once, at any time outside a command library function.
- Clients will not mutate command array structures returned by
- Only
command_parse
may allocate memory and it must allocate no more memory than is necessary to represent the final command array structure at the time it is returned. This rule precludes temporary allocations. - Assuming that the client code eventually calls
command_free
on every command array returned bycommand_parse
, all memory allocated by the command library functions must be freed.
Coding Style Rules
Follow this general C style guide as a baseline, along with the following specific rules:
- Do not use any C standard library string functions in
command.c
. You must define all string processing from scratch. - For full credit, eliminate array indexing syntax in this assignment. Use idiomatic pointer style instead.
- Document in comments how your code implements the specifications
provided, connecting your C implementation to the specification.
- Our sample solution for
command_parse
is under 70 clean lines of code and 20–30 lines of comments.
- Our sample solution for
- Document pre- and post-conditions and use assertions to check assumptions where reasonable.
- Use well-scoped helper functions where reasonable.
- Clean solutions are possible with and without helper functions.
- If using helper functions, match them to well-defined sub-problems with clear arguments and results.
- Be sure to order function headers properly.
- Prefer simple efficient code over complicated code.
- Our sample solution for
command_parse
is under 70 clean lines of code and 20–30 lines of comments.
- Our sample solution for
- Ensure your code is free of compiler warnings and compiler errors.
- Ensure your code is free of run-time errors, including memory errors
such as segmentation faults, as well as more subtle memory safety
violations and leaks as detected by
valgrind
.
Idiomatic Pointer Style
For full credit, your submitted code must use only pointers and pointer arithmetic, with no array indexing notation. We require this to get you to confront the real memory representation of arrays in this assignment. Outside this assignment, array indexing is clearer style in many cases.
A simple way to think with arrays but write with pointers is to use
*(a + i)
wherever you think a[i]
. However, this simple
transformation rarely matches an idiomatic pointer style,
so you should not simply replace each occurrence of
a[i]
by *(a + i)
.
An idiomatic loop over an array with array indexing typically uses an integer index variable incremented on each iteration. Keep the scope of the index variable (or “loop variable”) as small as possible for the task.
// pre: s is the address of a null-terminated string.
// post: replaces all characters in s by 'Z'
for (int i = 0; s[i] != '\0'; i++) {
s[i] = 'Z';
}
An idiomatic loop over an array with pointer arithmetic typically uses a cursor pointer that is incremented to point to the next element on each iteration. Keep the scope of the cursor pointer variable as small as possible for the task.
// pre: s is the address of a null-terminated string.
// post: replaces all characters in s by 'Z'
for (char* cursor = s; *cursor != '\0'; cursor++) {
*cursor = 'Z';
}
In the above example, the cursor pointer is explicitly named cursor
,
which we think is a good name to highlight its purpose. However,
in many real C programs the name cursor
might be replaced by
a shorter name like p
or ptr
.
Your final code should contain zero array[index]
operations.
Implementation Workflow
Follow this workflow to understand and implement command_parse
and
friends. You may find it easiest to write a solution that uses array
indexing at first, and then transform it to follow the coding style
rules later, once it works. After each useful
working step, commit your changes!
The workflow below shows the simplest way to test each step,
by calling make
with a particular target.
This is a shorthand for calling make
on a different target
and then calling valgrind
on a particular test driver.
It’s a good idea to study the test driver section
to understand what each shorthand is doing and other options you
have for using the test drivers.
-
Finish all the prep exercises before starting to change any files.
-
Add several more valid command line test case strings in
command_lines.c
. -
Use
make test-expected
to show the expected results for all test command lines incommand_lines.c
. This will use a solution implementation to illustrate howcommand_parse
,command_show
, andcommand_free
should behave on each of the test cases incommand_lines.c
. Understand the results displayed for all the test cases before proceeding to the next step. -
Implement
command_show
. Test withmake test-show
at this stage. This will use a solution implementation forcommand_parse
to parse each of the test cases incommand_lines.c
. Then, for each of the resulting command arrays, it will display the printed output of your actual implementation ofcommand_show
on the command array followed by the expected printed output on the command array from the solution implementation ofcommand_show
.Compare the printed outputs carefully, and make sure that your actual output matches the actual output character for character! Pay special attention to braces, spaces, quotation marks, and commas!
-
Implement
command_free
. Test withmake test-free
at this stage. This will use a solution implementation forcommand_parse
to parse each of the test cases incommand_lines.c
, but will use your implementation ofcommand_free
to free the resultin command arrays and their contained strings.Study the output to see if it contains any
valgrind
error messages (these will begin with a notation like==29223==
, where the the particular number may vary) or memory leaks (listed under theLEAK SUMMARY:
section at the end of thevalgrind
output.Not quite sure what to do here? It’s fine to implement
command_parse
first and then come back to this step. -
Add many more valid and invalid command line test cases to
command_lines.c
to test all aspects of the specification extensively and adversarially. -
Implement and test
command_parse
in stages. Test each stage withmake test-parse
and commit a working version before continuing. The basic algorithm should proceed (and be developed) in this order:- It is strongly recommended that you initially ignore handling
&
in commands, since this will distract you from your key focus incommand_parse
, which is to correctlymalloc
storage for (1) the command array and (2) its string elements. It also means you can initialy ignore validity testing for commands, since without&
, all commands are valid. So, for the time being, comment out all test cases incommand_lines.c
that use&
. -
Implement a version of
command_parse
that counts the number of words n inline
and returns a command array that has the correct length, but whose first n slots are all populated with your favorite string constant, say"cat"
. E.g., for the command line"ls -l cs240-pointers"
, the resulting command array would be displayed bycommand_show
as:{ "cat", "cat", "cat", NULL }
- In this version, make sure that
status
is set to 1 (i.e., it’s a foreground command) before the command array is returned. - Test this version of
command_parse
viamake test-parse
.- This will use the solution versions (not your versions) of
command_show
andcommand_free
on the resulting command array. - Note that
valgrind
will display errors whenfree
is called on your favorite string. You should ignore thesevalgrind
errors in this version.
- This will use the solution versions (not your versions) of
- In this version, make sure that
-
Next, modify the previous version of
command_parse
so that it populates the command array with the correct strings, each of which is created bymalloc
. E.g., for the command line"ls -l cs240-pointers"
, the resulting command array should be displayed bycommand_show
as:{ "ls", "-l", "cs240-pointers", NULL }
Test this stage with both
make test-parse
andmake test-adversarial
. If your implementation is correct, you should not see anyvalgrind
errors in this version. - Finally, in
command_lines.c
, first uncomment all the test cases with&
, and then modify the previous version ofcommand_parse
so that it correctly handles inputs that contain one or more&
characters. In this version:- For all invalid commands,
command_parse
should returnNULL
without modifyingstatus
. - For all valid commands, make sure that
status
is set to the correct value before the command array is returned. - Test this stage with both
make test-parse
andmake test-adversarial
. If your implementation is correct, you should not see anyvalgrind
errors in this version.
- For all invalid commands,
- It is strongly recommended that you initially ignore handling
-
Once everything is implemented:
- Run
make test-demo
to test all of your functions together at once. - Comment out or remove any diagnostic printing you added during development. This is important because the diagonstic printing will confuse the autograder.
- Run
Compiling and Testing
To compile all test drivers, run make
.
You are writing a command parsing library, i.e., a collection of related useful functions. The library itself cannot be run directly; it is not a complete program. We provide several test driver programs that act as clients of the library, calling its functions. You will write several command line test cases to be used by these drivers.
Test Cases
Develop your own large suite of test cases by adding command line
strings to COMMAND_LINES
in command_lines.c.
By large, we mean
perhaps dozens. Be adversarial. Create test cases for as many corner
cases as possible in both valid and invalid commands. We will not
grade your test cases themselves, but preparing and using them will
help you ensure your code is correct and efficient. All test
drivers use this array as their source of test
commands.
Test Drivers
Each provided test driver is useful for different testing tasks. All
driver executable files end with .bin
(for binary
executable). Each driver has two variants:
- The
actual
variant uses yourcommand.c
library. Run this variant to test your code. - The
expected
variant uses a complete and correct (but hidden) solution implementation of the command library. Use it to see expected behavior or compare your code’s behavior to expected behavior.
For both variants, the executable can be run in two ways:
-
Without any arguments, the executable will run its test on each command line from
COMMAND_LINES
incommand_lines.c
. Example:$ valgrind ./command_demo_actual.bin
-
Given exactly one argument, the executable will run its on test using the single argument as a command line. To test on a command line with multiple words, be sure to quote it so it is captured as a single string, rather than one argument per word. (Talk about meta, we have to be careful with how your terminal parses your command line to make sure we are passing the string we expect to your command line parser!)
$ valgrind ./command_demo_actual.bin "Hello world!"
In general, always run these executables under valgrind
to help catch memory errors early. The drivers are listed below with
their purpose, methods for checking correctness, source
code file, command to compile just that driver, and the names of
the relevant executables.
- demo:
- Purpose: demonstration of library functions, testing
command_parse
before completion- This is a simple driver that is easy to modify and extend
by changing the the function
command_demo
in thecommand_demo.c
file. For example, you might invoke helper functions directly in addition tocommand_parse
. If doing so, you must copy their function headers tocommand.h
to make them visible fromcommand_demo.c
.
- This is a simple driver that is easy to modify and extend
by changing the the function
- Correctness: manually inspect outputs, check the output of
valgrind
for memory errors or memory leaks, compare total memory allocations reported byvalgrind
for the actual and expected versions, optionally add custom additional checks to the driver code - Source:
command_demo.c
- Compile:
make demo
- Executables:
./command_demo_actual.bin
and./command_demo_expected.bin
- Shorthands:
make test-expected
is equivalent tomake demo
followed byvalgrind ./command_demo_expected.bin
.make test-demo
is equivalent tomake demo
followed byvalgrind ./command_demo_actual.bin
.
- Purpose: demonstration of library functions, testing
- show:
- Purpose: testing
command_show
- Correctness: manually inspect outputs, check
valgrind
for memory errors - Source:
command_show_test.c
- Compile:
make show
- Executables:
./command_show_test_actual.bin
- Shorhand:
make test-show
is equivalent tomake show
followed byvalgrind ./command_show_test_actual.bin
.)
- Purpose: testing
- free:
- Purpose: testing
command_free
- Correctness: check the output of
valgrind
for memory errors and memory leaks - Source:
command_free_test.c
- Compile:
make free
- Executables:
./command_free_test_actual.bin
and./command_free_test_expected.bin
- Shorhand:
make test-free
is equivalent tomake free
followed byvalgrind ./command_free_test_actual.bin
.)
- Purpose: testing
- parse:
- Purpose: testing a complete
command_parse
implementation undervalgrind
- Correctness: check reports of mismatches between actual and
expected behavior, check
valgrind
for memory errors and memory leaks - Source:
command_parse_test.c
- Compile:
make parse
- Executables:
./command_parse_test_actual.bin
and./command_parse_test_expected.bin
- Shorhand:
make test-parse
is equivalent tomake parse
followed byvalgrind ./command_parse_test_actual.bin
.)
- Purpose: testing a complete
- adversarial:
- Purpose: testing a complete
command_parse
implementation- This driver replaces
malloc
andfree
with correct–but more adversarial–versions that make buggycommand_parse
implementations more likely to crash or cause obvious data corruption. Basically, allocated blocks are pre-filled with'%'
characters (byte value0x25
) and the immediately adjacent (out of bounds) chunks of memory are filled with'@'
characters (byte value0x40
) before the allocated block and filled with'^'
characters (byte value0x5E
) after the allocated block, while freed blocks are mostly filled with'$'
characters (byte value0x24
). This makes use of values from uninitialized ('%' == 0x25
), out of bounds ('@' == 0x40
and'^' == 0x5E
), and recently freed ('$' == 0x24
) locations less likely to get “lucky” by finding'\0'
and more likely to cause obvious corruption or segmentation faults. - It is reasonable to run this driver under
valgrind
, butvalgrind
may report fewer errors than for the other test drivers due to the extra manipulation that the driver is doing.
- This driver replaces
- Correctness: check reports of mismatches between actual and
expected behavior, check for crashes and obvious data
corruption, check
valgrind
for memory errors and memory leaks - Source:
command_parse_test.c
- Compile:
make adversarial
- Executables:
./command_adversarial_test_actual.bin
and./command_adversarial_test_expected.bin
- Shorthand:
make test-adversarial
is equivalent tomake adversarial
followedvalgrind ./command_adversarial_test_actual.bin
.
- Purpose: testing a complete
Assertions
Assertions are “executable documentation” or “executable
specifications”: they document
rules about how code should be used or how data should be structured,
but they also make it easier to detect violations of these rules
(a.k.a. bugs!). Use the assert(...)
statement in C by including
assert.h
and asserting expected properties.
The provided starter code already includes code that asserts that the arguments to
command_
functions are not NULL
. command_parse
begins with
assert(line != NULL);
assert(status != NULL);
and both command_show
and command_free
begin with
assert(commandArray != NULL);
If a NULL
argument is ever passed to these functions, an error message will be printed and
execution will halt immediately. Detecting errors early like this
(vs. crashing or corrupting data later wherever the code depends on
this assumption) saves a lot of time.
Add assertions to make the “rules” of your code clear wherever you make assumptions.
Note that assertions are clearer when the declaration involves a boolean expression.
Although assert(line != NULL);
and assert(line)
have the same behavior,
the first version is preferred because it explicitly indicates that
the assertion verifies that line
is not NULL
.
Debugging
valgrind
and gdb
are the tools of choice.
(See the Pointer Practice
exercises, which introduce their usage.) Avoid print-based
debugging. Backtrack from symptoms to cause.
valgrind
Valgrind is an extended memory error checker. It helps catch pointer
errors, misuse of malloc
and free
, and more. Run valgrind on your
compiled program like this: valgrind
./command_demo_actual.bin
. (Replace ./command_demo_actual.bin
with
any other executable to check others.) The valgrind
tool will run
your program and observe its execution to detect errors at run time.
See the tools page for
additional reference links.
Always run your code under valgrind
during development and testing
to help catch memory errors early.
gdb
Use GDB to help debug your programs when you need more information
than valgrind
offers. Review previous lab activities
for tips. When debugging programs with pointers, pay special
attention to the pointer values your program generates. Inspect them
like other variables or use the address-of (&
) and dereference (*
)
operators at the gdb
prompt to help explore program state.
Documentation and quick reference for gdb
:
- CS 240 GDB reference sheet (pdf, txt)
- GDB Quick Reference card
- GDB manual
- For documentation from the command line, type
help
at thegdb
command prompt, or typeman gdb
, orinfo gdb
in the shell. Some people also like to rungdb
under gdb-mode in Emacs.
Does GDB tell you that it cannot display something due to compiler
optimizations? If so, turn off compiler optimizations by adding
-O0
(that’s “space dash Capital-Oh zero”) at the end of the CFLAGS
= ...
line in your Makefile
. The CFLAGS
variable
here determines what flags are passed to the compiler. -O0
enables
level zero of optimization (i.e., none). Run make clean
and then make
to
compile your code again without optimizations.
Backtracking
Did your program hit an explicit error? An assertion failure? A segfault? Debug by collecting evidence and backtracking methodically to find the cause of this symptom.
- Localize the crash. Exactly what line of code, what
operation in this line, what value(s) used by this operation
manifested what error and crashed? This symptom does not
necessarily indicate the cause, but it marks where to begin the
search.
valgrind
will provide most of this information for every memory error.- If more information is needed, use
gdb
torun
the code until it crashes, then find the line that crashed (and the context in which it executed) withbt
,backtrace
, orwhere
. - Inspect the most relevant line of code and the error report to determine what the error means. Was it a segfault? If it was an assertion failure, what was the assertion checking?
- Inspect the arguments shown in the backtrace or
print
variables to determine what ill-formed pointer was dereferenced to cause a segmentation fault or what illegal values failed an assertion.
- Trace backward through the dependences of this broken
operation to the original logic error.
- Invalid values: Did any of this operation’s arguments hold
invalid values? Did this operation load any invalid values from
memory?
- What operations produced these invalid values?
- This is easy to answer for invalid arguments.
- For invalid values loaded from memory, consider what earlier operations (perhaps in completely separate function calls) may have saved (or failed to save) this value.
- Continue tracing backwards, treating these producer operations as broken operations.
- What operations produced these invalid values?
- Invalid choices: Was it invalid to apply this operation here
given correct arguments (or regardless of the arguments)?
- What control flow decision led to execution of this operation?
- Is the logic of the decision correct?
- If so, continue tracing backwards treating the control flow decision as a broken operation.
- Callers: If using helper functions, you may need to
trace back beyond the function where the crash happened.
- Get a
backtrace
to see where the current function was called. (Don’t just assume you know which function called this one. Get the truth.) - If you want to inspect local variables in functions that
called this one, use
up
anddown
ingdb
to step up and down the call stack, thenprint
what you want to see.
- Get a
- Invalid values: Did any of this operation’s arguments hold
invalid values? Did this operation load any invalid values from
memory?
- Minimize the input. Try to write a small command line that captures the essence of the command line on which your command parser encounters an error. A smaller input is easier to reason about in full.
Submission
Before submitting, disable any diagnostic printing in command.c
. Only
command_show
should print, as specified.
Submit: The course staff will collect your work directly from your hosted repository. To submit your work:
-
Test your source code files one last time. Make sure that, at a minimum, submitted source code is free of syntax errors and any other static errors (such as static type errors or name/scope errors). In other words: the code does not need to complete the correct computation when invoked, but it must be a valid program. We will not grade files that do not pass this bar.
-
Make sure you have committed your latest changes. (Replace
FILES
with the files you changed andMESSAGE
with your commit message.)$ git add FILES $ git commit -m "MESSAGE"
-
Run the command
cs240 sign
to sign your work and respond to any assignment survey questions.$ cs240 sign
(If this encounters an error, instead execute
cs240.s23 sign
.) -
Push your signature and your latest local commits to the hosted repository.
$ git push
Confirm: All local changes have been submitted if the output of
git status
shows both:
Your branch is up to date with 'origin/main'
, meaning all local commits have been pushednothing to commit
, meaning all local changes have been committed
Resubmit: If you realize you need to change something later, just repeat this process.
Grading
Your grade will be derived as follows:
- Functional Correctness (55 points):
- Your code behaves according to the specification.
- Memory Safety and Efficiency (20 points):
- Your code is free of memory safety violations.
- Your code allocates no more memory than strictly necessary.
- Your code does not suffer memory leaks (as detected by
valgrind
).
- Design, style, documentation, and test cases (25 points):
-
Your
command_lines.c
file contains sufficient test cases for normal and boundary cases for valid and invalid commands. (5 points)You should extend
command_lines.c
with your own large suite of test inputs and run test drivers undervalgrind
. Careful and extensive testing will help you check that your code meets the specification and runs free of memory safety violations and memory leaks. Take this seriously. - Your code in
command.c
is well-organized, easy to read, and uses clear and uncomplicated control flow. Helper functions are used to simplify code where appropriate. (10 points) - Your code uses idiomatic pointer style rather than using array indexing. (5 points)
- Your code is well documented, using appropriate comments to highlight aspects that may not be obvious from the code itself. (5 points)
-
We compute the correctness, memory safety, and efficiency components
of your grade by running your code on a private suite of test inputs
under valgrind
to detect memory safety violations,
measure memory allocation, and detect memory leaks.
More Tips
printf
The standard way of printing output in C is the printf
function from
the stdio
library. The printf
function accepts a format string
(a template) and additional arguments to format according to each of
the format specifiers (holes in the template with directions about
how to fill them given the right kind of value). Use this linked
documentation/introduction to get started with
printf
.
Here’s an example:
void display_line_info(char* command_line) {
// Note: strlen computes the length of a string,
// but you may not use it in this assignment.
printf("Command line is \"%s\" (%d characters).\n",
command_line, strlen(command_line));
}
Calling display_line_info("Hello world!")
would print this output:
Command line is "Hello world!" (12 characters).
Some important things to know when using printf
:
printf
prints exactly what you tell it to, not more: you must use a newline character ('\n')
if you want a new line to be printed.- Instead of figuring out how to build a larger string just to print
it, either use the
printf
format string to do the building or use multipleprintf
calls with smaller individual strings. -
Even when printing a preexisting string value, an explicit format string should generally be used:
char* my_string = computed_somehow(); // Use this: printf("%s", my_string); // Not this: printf(my_string);
- Since displaying individual characters to the screen is expensive,
standard output (
stdout
), the channel to whichprintf
prints to display in the terminal, is buffered: individual pieces of output are gathered up into larger chunks and flushed to the display all at once. This improves efficiency. For most printing tasks, this implementation detail is entirely imperceptible and not necessary to understand. However, if you are debugging code that interleaves smallprintf
s with other operations that could crash or causevalgrind
errors (e.g. memory operations), there’s a chance you will be confused if you are not aware of it.- The newline character (
'\n'
) typically flushes all buffer text. In other words, printing a newline causes any text that has already been printed–but has not yet appeared in the output–to appear in the output immediately, in the order it was printed. - Strings without a newline may not appear immediately – perhaps only when the next newline is printed.
- To flush the buffer explicitly at any time, you can use
fflush(stdout);
. This is not needed for a workingcommand.c
implementation, but if you run into seemingly truncated output while debugging, well-placedfflush
calls might help clarify things during debugging. In general, this is a good reason to use a debugger instead of print-based debugging.
- The newline character (
C Function Headers and Declaration Order
In C, a function is allowed to be used only after (i.e., later in the file than) its declaration. This differs from Java, which allows you to refer to later methods. When declaring helper functions, you can do one of a few things to deal with this:
- Just declare your helper function before the functions that use it.
-
Write a function header earlier in the file and the actual definition later in the file. The function header just describes the name and type of the function, much like an interface method in Java. For example:
// A function header declares that such a function exists, // and will be implemented elsewhere. int helper(int x, int y); // Parameter names are optional in headers. int helper2(char*); void needsHelp() { // OK, because header precedes this point in file. helper(7, 8); helper2("hello"); } int helper(int x, int y) { return x + y; } int helper2(char* str) { return 7; }
-
If the functions would likely get used elsewhere, then put the header in a header file, a file ending in
.h
that contains only function headers (for related functions) and data type declarations. For example, if you added another general function (not just a helper function) for manipulating commands (not required in this assignment), it would be best to place a function header for it with the other function headers incommand.h
so that users of your command library can call it.Header files are included (essentially programmatically copy-pasted) by the
#include
directive you often see at the tops of C source files. Then these functions can be used and their implementations will later be found elsewhere if compiled correctly.
Extra Fun Extensions
This part is not required. Extra Fun indicates ideas for optional extensions or challenges to try if you are having fun after completing the main assignment. Instructors will be happy to discuss ideas or review your solutions, but Extra Fun means what it says: it’s for extra fun, not extra credit.
Before trying either of these extensions:
- Finish and submit your work for the required parts with a descriptive commit message.
- Make a copy of your
command_parse
function and rename the copycommand_parse_standard
. - Replace entire the body of the
command_parse
function withreturn command_parse_standard(line, status);
. Now,command_parse
is just a wrapper for whatever version you are working on.
After completing a working extension, make sure to test again before submitting any work on the extensions.
Extra Fun Extend the Command Language with Quoting
Start by making a copy of command_parse_standard
and renaming it
command_parse_quotes
, then changing the body of command_parse
to
call this new function.
Extend your command-line parser to support quoting with the
single-quote ('
) or double-quote ("
) characters. Note:
- After a starting quote character, all characters (including
ampersand
&
and space) until the matching close quote are part of the current word. - The delimiting quote characters are not part of the word.
- A quoted section immediately adjacent to a non-quoted word (with no intervening space) together form a single word.
- A literal quote character may be used as part of a quoted or
unquoted word by using the escape character
\
to prefix it.
Here are some examples using single-quote:
- Basic quoting example with the single-quote character:
- Command line:
- As a C string literal:
"echo 'hello && world' si'ngle word' &"
- When printed:
echo 'hello && world' si'ngle word' &
- As a C string literal:
- Command array:
"echo"
"hello && world"
"single word"
NULL
- Command status: background
- Command line:
- With escape characters to allow quote characters inside quoted elements:
- Command line:
- As a C string literal:
"echo 'Don\\'t unquote until here,' I\\'m sure."
- When printed:
echo 'Don\'t unquote until here,' I\'m sure.
- As a C string literal:
- Command array:
"echo"
"Don't unquote until here,"
"I'm"
"sure."
NULL
- Command status: foreground
- Command line:
Extra Fun Replace Multi-Pass Loop Code with Single There-and-Back-Again2 Recursive Code
Define a small family of recursive helper functions that accomplish
the processes of counting words, counting letters, allocating,
copying, etc., without using any loops. Use only recursion. Replace
the body of command_parse
with a single call to the top-level
recursive function you define.
There are many levels of progressively more sophisticated recursive formulations.
Your loop-based code probably made several repeated scans of the command-line string or its subsections in order to avoid additional allocations. In the most sophisticated recursive versions, it is possible to accomplish the entire parsing process with only a single pass over the command-line string, with exactly one recursive function call and one command-line character array access for each character of the command line (plus null terminator). Local variables at each layer of recursion can remember (as recursion progresses deeper) and later reuse relevant input characters and other information (as recursion returns from deeper levels). This is not entirely free: no extra heap space or command-line scans are required, but the call stack will store information about each recursive call in memory.
License
Pointers
by Benjamin P. Wood at Wellesley College is licensed under a
Creative Commons Attribution-ShareAlike 4.0 International License.
Source code and evaluation infrastructure for Pointers are available upon request for reuse by instructors in other courses or institutions.
-
This formulation restricts placement of ampersand further than most shells, for simplicity. ↩
-
There and Back Again. Olivier Danvy and Mayer Goldberg. Fundamenta Informaticae, vol. 66, pp. 397–413. IOS Press. 2005. ↩