CS 240 Lab 6

Pointers in C

CS 240 Lab 6

In this lab, you will get practice working with C programs that use pointers, including predicting results of functions that use pointers, writing pointer code, and analyzing incorrect code.

You will also be introduced to some new tools which will be quite helpful in understanding and solving problems with your programs. The GNU debugger gdb allows you to see what is going on “inside” a program while it executes—or what it was doing at the moment it crashed. The debugger allows you to display values of variables and examine contents of memory, which will be handy in understanding the effect of your programs on the hardware of the system.

Another tool you will use is valgrind, an instrumentation framework for building dynamic analysis tools (pronounced like “grinned” not “grind” as it’s named after a gate in Norse mythology). The Valgrind module we will be using, called Memcheck, is for memory error detection.

Setup

On a lab computer booted to Linux, open VSCode.

To get all the files for this week’s lab, in the Terminal from the Linux command line enter:

wget http://cs.wellesley.edu/~cs240/s26/lab/starter/cmemory.zip
unzip cmemory.zip
cd cmemory

The cmemory folder contains lots of different practice files, some of which we’ll use in today’s lab.

Writing Code using Pointers

This section of the lab will give you some practice writing functions with pointers for string manipulation, which are very similar to some you will need for the next assignment.

Exercise 1:

Create a new file named arrayOps.c. Add the following code to the top:

// arrayOps.c
// Basic array operations to practice programming with pointers.
// <your names go here>

#include <stdio.h>  // gives us 'printf' for output
#include <assert.h>  // gives us 'assert' for testing
#include <stdlib.h>  // gives us 'malloc' and 'free'

Now we will work on a function called arraySum. This function should compute the sum of an array of integers. It will need two arguments: an array (i.e., a pointer-to-an-integer), plus a number that specifies the length of the array (unlike with strings, there’s no value we can naturally use to mark the end of an array that couldn’t appear as a normal value in the array).

To write this function, we’ll try out a method called test-driven development. This method involves starting the programming process by writing tests to ensure we understand what our function should do, and then writing documentation describing what it will do, and finally the last step is to write the code.

This method should improve your code quality, and it could save you time, although it does take some discipline not to jump right into writing code.

To start with testing, add the following code to your program:

// Test arrays
int test1[] = {1, 2, 3};

// Test our functions.
int main(void) {
    assert(arraySum(test1, 3) == 6);
    printf("All tests succeeded.\n");
}

You haven’t written arraySum yet, but we do know what it should do. Add at least 3 more tests using assert to the main function, along with new test arrays.

What’s an edge case that you should probably test?

Example answer: The sum of an array with a single number in it might be weird. Even weirder would be the sum of an array with 0 numbers. Providing a different number for the length than the actual length of the array is also interesting; it should work fine if the length is not larger than the true length, but behavior will be unpredictable if it’s larger. Also, what if the array pointer provided is NULL instead of an actual array? You could return -1 in that case, or you could specify that your function is not designed to handle NULL and so people who use it shouldn’t give NULL as an argument.

Can you identify some ways that writing tests first can help speed up your programming process?

Example answer: One big factor is that thinking about edge cases up-front allows you to avoid complicated and error-prone rewrites when you realize there’s an edge case after writing 75% of the code. Another is that once you’ve written your code, if you forgot to handle something, you’ll immediately have failed test that gives you a concrete example of what doesn’t work, and it’s an example that you’re already familiar with since you just wrote it, plus you know what the expected correct behavior is. This speeds up the debugging process a lot, even compared to debugging with tests that someone else wrote.

Now that you’ve thought about edge cases and come up with some tests, add a comment above the main function describing what your arraySum function will do, including how it will handle any edge cases you came up with. Add the declaration for arraySum below your comment (as with all C functions, a declaration starts with a return type, then the function name, then arguments in parentheses with a type for each).

Why might writing a comment explaining the function make sense before you’ve written the function? How could it help speed up development?

Example answer:

Writing a comment in your own words helps you think through your strategy for the function, and as you’re writing the code it serves as a reminder of what it should do with the edge cases you identified when writing tests. Writing is thinking, and writing function documentation after you write your code gives up the opportunity to do some thinking about the design of the function before you’ve started coding.

As a concrete example, if you write “returns 0 if the input is NULL” or a similar description of an edge case, you can start your function by using an if statement to handle that case and get it out of the way. If instead you wrote the code without that case but later got caught by a test, it would be more effort to understand what the test was trying, figure out what you need to do in that case, and then add the check to your code.

Also: if you’re writing multiple functions that call each other, having documentation for each of them can save a lot of time, especially if they have edge cases that interact with each other.

Compare your code to the example below:

Example answer:

/*
 * Computes the sum of an array of integers. Arguments are the array and
 * its length. If the length is 0 or if the array pointer is NULL, this
 * will return 0.
 */
int arraySum(int *array, size_t length) {
}

You may have chosen a different way to handle 0-length or NULL arrays, which is fine, as long as you explained it. You might have also chosen int instead of size_t for the size argument; that’s okay, but will limit the size of array that can be processed (size_t is a type that’s designed to hold the size of a data structure in memory, and is an 8-byte unsigned type, whereas int is a 4-byte signed type).

Now it’s time to actually write the code. Fill in your arraySum function, then since you already have tests set up, in a terminal run:

make arrayOps.bin
./arrayOps.bin

This will build and then run your code. If you get any warnings or error messages from the make step, fix those first and re-run make before running the file.

If everything is working, you should see the “All tests succeeded.” message we put at the end of main. If one of your tests fails, you should see a message about an assertion failure. You might also see “Command terminated” or “Segmentation fault” indicating an error with memory (we’ll see later how to use gdb to debug these errors). You can add printfs to your code to help debug things if necessary.

Note that if you don’t see “All tests succeeded.” but you also don’t see any error messages, and you don’t get another terminal prompt, it’s possible you’ve created an infinite loop. Hit control-C to force-stop the program, and try adding a printf to your loop to see if it’s infinite.

Here’s an example implementation you can compare to:

Example answer:

/*
 * Computes the sum of an array of integers. Arguments are the array and
 * its length. If the length is 0 or if the array pointer is NULL, this
 * will return 0.
 */
int arraySum(int *numbers, size_t length) {
    int result = 0;
    // Handle NULL array
    if (numbers == NULL) {
        return 0;
    }
    // Iterate through the array adding each number to our result
    for (size_t i = 0; i < length; ++i) {
        result += numbers[i];
    }
    // Return result. Will still be 0 if length was 0
    return result;
}
```|]


### Exercise 2:

To practice **idiomatic pointer style** programming, try to rewrite your
`arraySum` function so that it:

1. Uses an `int *cursor` variable which starts out as a copy of the array
    pointer.
2. Does NOT use `[]`.
3. Uses a `while` loop instead of a `for` loop.
4. Adds to the `cursor` variable within the loop.
5. Stops the loop based on the distance between the cursor and the
    original array pointer.
6. Dereferences the cursor using `*` to get the value it points to.

Re-run the file to ensure that it still passes your tests, and use the
debugging techniques mentioned above if not.

Here's an example of the function written in idiomatic pointer style:

[_.][|
```c
/*
 * Computes the sum of an array of integers. Arguments are the array and
 * its length. If the length is 0 or if the array pointer is NULL, this
 * will return 0.
 */
int arraySum(int *numbers, size_t length) {
    int result = 0;
    // Handle NULL array
    if (numbers == NULL) {
        return 0;
    }
    int *cursor = numbers;
    // Keep going as long as our cursor isn't too far from the start yet
    while ((cursor - numbers) < length) {
        result += *cursor;  // add cursor's target to sum
        cursor += 1;  // advance cursor to next entry
    }
    // Return result. Will still be 0 if length is 0
    return result;
}

What’s in Memory?

In previous classes like CS 111 and CS 230 we’ve drawn abstract memory diagrams do show how values are related to each other in memory. In these diagrams we used arrows to represent how values referred to each other, now you know that those arrows are actually pointers, and we can reason exactly about how much memory each object takes up (for example, in C an int is 4 bytes, a pointer is 8, or a character is 1). This allows us to draw more precise diagrams. Here’s an example of the CS 111 style and the same information with more precision:

This is the memory diagram format we’ve used in previous classes:

A classic memory diagram, showing the following data: Variables A, B, and C each have their own small box. The box for A contains the tail of an arrow that points to a big box with two smaller boxes inside, labeled 0 and 1. The second small box inside this big box has ‘null’ written in it. The first box contains the tail of an arrow that points to another big box, this time with three smaller boxes inside. The smaller boxes in the second big box read ‘C’, ‘S’, and ‘\0’. The small box for B contains the tail of a third arrow, which also points to the second large box. The small box for C contains the tail of an arrow that points to a third big box. This big box contains three small boxes labeled 0, 1, and 2. Inside these three small boxes are the values 111, 230, and 240. 

Here is the same data in CS 240 lab style:

A new memory diagram for the same data. It has the same info as the table just ahead. This diagram is a stack of rectangles, with data values inside some and arrows originating from others, along with comments on the right and variable names on the left. The first slot has variable name “A” and contains a red arrow pointing to the slot below it, with comment “char **.” The second slot has no variable name, but contains a blue arrow that points to a slot three levels lower, merging with another blue arrow along the way. This slot’s comment is “char”. The third slot just contains the word ”NULL” with comment ”char  (null)”. The fourth slot has variable name “B” and is the origin of a blue arrow pointing which merges with the blue arrow mentioned above, both pointing to the slot below “B”. The comment is “char ”. The fifth slot is the destination of the blue arrows. It’s divided horizontally into 8 squares, with the values 0x43, 0x53, and 0x00 in the first three squares (the rest are blank). The comment here says ”CS” -> ‘C’, ‘S’, ‘\0’. The sixth slot has variable name C and holds a purple arrow pointing to the slot beneath it. The comment is ”int ”. The seventh slot (destination of the purple arrow) is divided into two horizontally, with values 111 and 230. The comment is “Two 4-byte ints”. The eighth (and final) slot is also divided horizontally, with 240 in the left half and nothing in the right half. Its comment is “Third 4-byte int”. 

This table represents the same data (with the same layout) as the new-style diagram, but it may be more accessible to people using a keyboard to navigate and/or using a screen reader.

Variable Data Comments
A Type char **. Points to start of next slot.
Type char *. Points to start of fifth slot
0x0 Type char * (null)
B Type char *. Points to start of fifth slot (directly below this one)
0x43, 0x53, 0x0 Three bytes ‘C’ = 0x43, ‘S’ = 0x53, and NUL = 0x0, at increasing addresses.
C Type int *. Points to start of slot below.
111, 230 Two 4-byte ints in this 8-byte slot
240 Third int in the array that starts above

Exercise 3:

Note that we have enough information to fill in even more of the diagram: each of the data slots has an address number, and the value of each of the pointer slots is just the address that it points to. In fact, the arrows are just helpful reminders, but the actual values of those slots are memory addresses. Here’s the same table, with addresses added. Fill in the value of each pointer slot below, using hexadecimal notation:

Variable Address Data Comments
A 0x00 Correct answer: 0x08 Type char **. Points to start of next block up.
0x08 Correct answer: 0x20 Type char *. Points to start of fifth slot at address 0x20
0x10 0x0 Type char * (null)
B 0x18 Correct answer: 0x20 Type char *. Points to start of fifth slot at address 0x20
0x20 0x43, 0x53, 0x0 Three bytes ‘C’ = 0x43, ‘S’ = 0x53, and NUL = 0x0, at increasing addresses.
C 0x28 Correct answer: 0x30 Type int *. Points to start of slot below.
0x30 111, 230 Two 4-byte ints in this 8-byte slot
0x38 240 Third int in the array that starts above

Note that in a real program, 0x00 would not be a valid address (it’s reserved for NULL), and it’s unlikely that the data would be packed so closely in memory. We have introduced some gaps, for example the variable C starts at address 0x28, but it could in theory have started at 0x23, immediately after the end of the “CS” string. That would have made for an awkward diagram (with its 8 bytes stretched across parts of 2 different slots) and in a real computer, pointer values are also usually aligned to multiples of 8 bytes for efficiency reasons.

To help you understand pointers better, consider the following code:

char *message = "Eel cafe does  good.";  // two spaces before 'good'

int main(void) {
    printf(message);
    int words = 0;
    int inWord = 0;
    int *cursor = message;
    while (*cursor) {
        char letter = *cursor;
        // Detect word ends as space when we're in a word.
        if (letter == ' ') {
            if (inWord) {
                words += 1;
            }
            inWord = 0;
        } else {
            inWord = 1;
        }
        // Advance cursor
        cursor += 1;
    }
    if (inWord) {
        words += 1;
    }
    printf("Word count: %d\n", words);
    return 1;
}

In this code, when we say int *cursor = message, what’s the relationship between the cursor and message variables?

They are equal to each other, and will remain equal over the life of the program.
cursor is now a pointer that points to message.
message is now a pointer that points to cursor.
The value of message (a memory address) is copied into the space for the cursor variable, so they (for now) share the same value and thus point to the same place.

Explanation: Equals signs copy values. In previous courses, we talked about “copying arrows.” Now we can see that what is copied is just a number, like copying an integer, except the number happens to also be a memory address and will be treated as such by other code.

When will the loop while (*cursor) { stop?

When cursor becomes NULL.
When *cursor becomes NUL.
When *cursor is a non-truthy character like a space.
When cursor reaches a segment boundary.

Explanation: Characters are 8-bit numbers, and in C, if and while treat non-zero values as true and zero as false. So the loop will stop when we reach the NUL byte (value 0) at the end of the string.

Now use a whiteboard to draw a memory diagram of the results of this code, including address labels and pointer values. You may use single-quoted characters like ‘a’ or ‘b’ to describe the bytes in strings rather than looking up their ASCII values, for example, 0x43 in the previous table could be 'C' instead.

Make the following assumptions as you draw your diagram:

  • Assume that the string constants used in the program (“Eel cafe…” and “Word count…”) are stored starting at address 0x18000 and are contiguous with each other.
  • Assume that the address of the message variable is 0x20000. Use ‘…’ to indicate gaps between addresses in your diagram as appropriate.
  • Assume that all four local variables of main are stored consecutively in the order they’re declared, starting at address 0x80000.
    • Remember that ints are 4 bytes, poiners are 8, and chars are 1.
  • Diagram the state of memory after main has completed right before it returns.

Compare your diagram against this solution (your diagram doesn’t need to include comments):

[._][|

Variable Address Value Comments
0x18000 ‘E’, ‘e’, ‘l’, ’ ‘, ’c’, ‘a’, ‘f’, ‘e’ String constant stored in data segment
0x18008 ’ ‘, ’d’, ‘o’, ‘e’, ‘s’, ’ ‘,’ ‘, ’g’
0x18010 ‘o’, ‘o’, ‘d’, ‘.’, 0x0, ‘W’, ‘o’, ‘r’ Second string starts
0x18018 ‘d’, ’ ‘, ’c’, ‘o’, ‘u’, ‘n’, ‘t’, ‘:’
0x18020 , ’ ’, 0x0 NUL at end of string could also be written ‘\0’
message 0x20000 0x18000 Global variable
words, inWord 0x80000 4, 1 Counted 4 words; ended in a word; each int is 4 bytes
cursor 0x80008 0x18014 Pointing at NUL at end of first string.
letter 0x80010 ‘.’ Still holds last non-NUL letter from final loop iteration.

|]

gdb (GNU Debugger)

Next, let’s look at the program main4 from the cmemory folder in gdb, the GNU debugger, which will allow us to execute our program and examine variables and memory along the way to find problems.

For extra details, you may want to consult the official gdb manual, or some of the links on our tools page, in particular this CS240 reference sheet and this general quick-reference card.

Exercise 4:

First, build the main4.bin program (“bin” stands for “binary” and refers to the fact that it’s in machine code form, ready to run on the CPU) using make on the command line:

make main4.bin

If this command doesn’t work, double check that you’ve done the setup steps from the beginning of the lab. You need to be in the cmemory directory that was created when you ran cs240 start cmemory (or unzipped cmemory.zip).

Next, start the program from the command prompt using gdb:

 gdb main4.bin

You should now get a prompt that says (gdb). First, we’ll view the code of the program from within gdb:

(gdb) list      // note that line 7 contains the assignment of ‘B’ to ptr[20]
1   #include <stdio.h>
2   #include <stdlib.h>
3   
4   int main(void) {
5       char ch = 'A';
6       char* ptr = &ch;
7       ptr[20] = 'B';
8       printf("%c\n", ptr[20]);
9   }
10

Set a breakpoint:

(gdb) break 7

Run the program (it will hit the breakpoint)

(gdb) run
Breakpoint 1, main () at main4.c:7
7       ptr[20] = 'B';

The program is paused at this point, and is ready to execute line 7 next.

You can display the contents of variables or memory locations using the print command (the /x means display the value in hexadecimal notation). Print the pointer (address in memory) as a hexadecimal number:

(gdb) print /x ptr

Paste what is displayed, and explain it:

Example answer:

$1 = 0x7fffffffd597

This shows the value of the pointer ptr (in hex), which is the memory address of the character stored in variable ch.

Extra detail: because the address starts with 0x7ff, it’s on the stack.

Prints value being pointed to as a hexadecimal number:

(gdb) print /x *ptr

Paste what is displayed, and explain it:

Example answer:

$2 = 0x41

This shows one character (the letter ‘A’) as hex. It’s the value in memory pointed to by the pointer.

Print a value as a character (/c):

(gdb) print /c *ptr

Paste what is displayed, and explain it:

Example answer:

$3 = 65 ‘A’

Shows both the decimal value for this byte and the ASCII symbol.

Print a value located at offset from a base address, using ptr notation to refer to the value:

(gdb) print /x *(ptr+20)

Paste what is displayed, and explain it:

Example answer:

$1 = 0xf7

This is an uninitialized value, so it could be anything. Here we see the number 0xf7.

You can also print a value using array notation to refer to value offset from a base address:

(gdb) print /x ptr[20]

Next, display multiple memory values using x (stands for eXamine). You can specify multiple values to be displayed, started at the specified address:

(gdb) x /24bx ptr  //will display 24 bytes of data, starting at ptr in memory

You should see something like this:

0x7fffffffd597: 0x41    0xe7    0xd7    0xff    0xff    0xff    0x7f    0x00
0x7fffffffd59f: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffd5a7: 0x00    0x55    0xf5    0xa2    0xf7    0xff    0x7f    0x00

What are the numbers on the left-hand side, before the colons?

Example answer: These are the addresses of the data that’s displayed. We asked to display memory starting at 0x7fffffffd597 (the value of ptr we saw earlier), so that’s the first address shown. Depending on how wide your terminal is and therefore how many bytes can be displayed, the subsequent addresses may vary, since they’re the addresses of the first byte in each row.

What are the 2-digit hex values to the right of the colons?

Example answer:

These are the values stored in the target memory, starting with the byte that ptr points to and then continuing with the 23 subsequent bytes, since we asked for 24 total. Hex digits are 4 bits each, so 2 hex digits is 1 byte.

Note that the initial 0x41 is the ‘A’ we saw earlier.

The byte at ptr[20] is actually part of a larger value being stored on the stack for later use: 0x7ffff7a2f55 Do you see how you can read that value from the display? (Start at byte 22 and read backwards). Why do you have to read backwards?

Example answer: The debugger displays bytes left to right, with the byte with the lowest address leftmost (the opposite of little-endian, where the lowest byte holds the least-significant value, but that least-significant value is written to the right.). So, to write a multi-byte value with the most significant byte at the highest address (little-endian order, as we prefer), you need to reverse the order of the bytes as shown in gdb. With the ‘x’ command you can use a letter like ‘q’ or ‘g’ instead of ‘b’ to have it group multiple bytes together and display their values in the usual reading order.

Now, execute a single instruction using step:

(gdb) step

At this point, the ptr[20] = ‘B’; instruction would be executed, and you would expect the memory location 20 away from ptr to have a new value. Examine it as you did before:

(gdb) x /24bx  ptr

Paste the output and explain which byte got which new value:

Example answer:

0x7fffffffd597: 0x41    0x77    0xe0    0xff    0xff    0xff    0x7f    0x00
0x7fffffffd59f: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffd5a7: 0x00    0x55    0xf5    0xa2    0x42    0xff    0x7f    0x00

The byte 4th from the right in the bottom row has the value 0x42 now, which is the ASCII value for uppercase ‘B’.

Continue after the breakpoint:

(gdb) continue  // will continue running after the breakpoint to complete the program

What happens next?

Example answer:

First, B is printed. Then the program generates a segmentation fault. The message reads (note you may have a different specific address):

Program received signal SIGSEGV, Segmentation fault.
0x00007fff42c295d0 in ?? ()

Notice that gdb tells you where the segmentation fault occurred. What is significant about that address?

It contains the 0x42 byte that we wrote into memory.
It matches the 8-byte address stored directly after the ‘A’ in memory.
It doesn’t start with 7fff… so it’s not a stack address.

If you get a segmentation fault when running your program from the command prompt, you can try running it in gdb, as well, because you can get more information in the debugger about what is causing the problem. In a normal situation, for example, you could use the ‘backtrace’ command to see which functions have been called and what lines of code those calls are on (in the current situation we’ve corrupted the stack so that won’t work).

Now we’re ready to move on, so exit gdb by typing quit:

(gdb) quit

valgrind

(valgrind is pronounced like “val grinned,” after the gate to Valhalla from Norse mythology.)

To understand how pointers work, it is helpful to consider some more examples where code for allocating memory does not produce the desired result. In this exercise, you will run your code using valgrind, which should help you understand why your code is incorrect. The following files are in the cmemory repository:

fibloop2.c Makefile search.c strings3.c fibloop.c matvec.c strings2.c strings.c

Exercise 5:

Compile the buggy programs by running:

make bugs

Open the file fibloop.c and examine the code. Explain how it works:

This malloc slide has a quick overview of what the malloc function does.

Example answer: It loops through indices putting the fibonacci number in each index by adding together the two numbers in the two previous slots.

Run the program:

./fibloop.bin

Does the program seem to work correctly? Yes
No

Now, run the program using valgrind:

valgrind ./fibloop.bin

Examine the output carefully (it should mostly match what’s shown here):

==23213== Memcheck, a memory error detector
==23213== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==23213== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==23213== Command: fibloop
==23213==
==23213== Invalid write of size 4
==23213==    at 0x400625: fibs (fibloop.c:23)
==23213==    by 0x400651: main (fibloop.c:29)
==23213==  Address 0x4c2e054 is 0 bytes after a block of size 20 alloc'd
==23213==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==23213==    by 0x4005BF: fibs (fibloop.c:17)
==23213==    by 0x400651: main (fibloop.c:29)
==23213==
==23213== Invalid write of size 4
==23213==    at 0x400625: fibs (fibloop.c:23)
==23213==    by 0x40065F: main (fibloop.c:30)
==23213==  Address 0x4c2e0bc is 0 bytes after a block of size 28 alloc'd
==23213==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==23213==    by 0x4005BF: fibs (fibloop.c:17)
==23213==    by 0x40065F: main (fibloop.c:30)
==23213==
result = 0 1 1 2 3
==23213== Invalid read of size 4
==23213==    at 0x4006DE: main (fibloop.c:41)
==23213==  Address 0x4c2e0bc is 0 bytes after a block of size 28 alloc'd
==23213==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==23213==    by 0x4005BF: fibs (fibloop.c:17)
==23213==    by 0x40065F: main (fibloop.c:30)
==23213==
result = 0 1 1 2 3 5 8 13
==23213==
==23213== HEAP SUMMARY:
==23213==     in use at exit: 0 bytes in 0 blocks
==23213==   total heap usage: 2 allocs, 2 frees, 48 bytes allocated
==23213==
==23213== All heap blocks were freed -- no leaks are possible
==23213==
==23213== For counts of detected and suppressed errors, rerun with: -v
==23213== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 6 from 6)

Based on this output, describe the problem with the program.

Example answer: The loop in fibs has the wrong continuation condition (i <= n instead of i < n), causing it to run one extra time and write to memory beyond the end of the allocated block. One of the two prints in main also prints out this extra value.

Fix the problem by modifying the code, and save the code file. Before re-running it with valgrind, what command do you need to issue in the shell?

Example answer: You need to run make (or make bugs or make fibloop.bin) to re-build the machine code from the C code. Otherwise, valgrind will be running the old machine code that still has the bugs in it.

Now, re-run it, verifying that valgrind no longer produces errors.

Which lines of code needed to be changed to get rid of the valgrind errors?

Example answer:

Line 19 had the bug that caused an invalid write. Changing <= to < on this line fixes it. Line 40 had a similar bug, with the same solution.

Alternatively, changing 7 to 8 on line 30 would generate an extra number so that the loop on line 40 would be valid.

Programming with malloc

Exercise 6:

To get some practice using malloc and free, we’ll return to the arrayOps.c file you created earlier (and use test-driven development again). Write a function called everyOther which, given an array of integers and its length as arguments, allocates a new array using malloc that’s half as long and copies every other element of the original array into the new array. It should return the pointer to the new array, which whoever calls it (including your testing code) will be responsible for freeing.

Remember that to use malloc you’ll need to specify the number of bytes you need (sizeof helps here) and cast the result to the type you want. For example, to allocate an array of 3 integers, you’d do:

int *a = (int*) malloc(sizeof(int) * 3);

As before, start with your tests. What big design decisions were not mentioned in the description above?

Example answer: Two big ones are: how long should the result be if the input array has an odd length, and does “every other element” mean the even- or odd-index elements? Feel free to decide for yourself what answers these questions should have. Writing tests first helps force you to think about and commit to decisions like these up-front, rather than having them be made implicitly as you write your code.

Write at least 3-4 tests for everyOther and then compare to the example below to make sure you’ve formatted them correctly. Try to cover key edge cases.

Example answer:

// Test arrays
int test1[] = {1, 2, 3};
int test2[] = {111, 230, 231, 235, 240};

int main(void) {
    // Tests of arraySum
    ...

    // Tests of everyOther
    int *result = everyOther(test1, 3);
    assert(result[0] == 1);
    assert(result[1] == 3);
    free(result);
    result = everyOther(test2, 5);
    assert(result[0] == 111);
    assert(result[1] == 231);
    assert(result[2] == 240);
    free(result);
    result = everyOther(test2 + 1, 4);
    assert(result[0] == 230);
    assert(result[1] == 235);
    free(result);
    result = everyOther(test2, 0);
    assert(result == NULL);  // no need to free it
    result = everyOther(NULL, 3);
    assert(result == NULL);  // no need to free it

    // Done with tests
    printf("All tests succeeded.\n");
    return 0;
}

Notes:

  • We’ve chosen here to start from index 0 and to round up for odd-length inputs.
  • We make sure to free each result when we no longer need it.
  • The length of the result array is implicit. There’s no way to check that it’s correct.
  • Our third test uses a pointer to the second element of the second test array and reduces the reported length by 1 to compensate, allowing us to test an effectively-even-length array without declaring a 3rd test array.
  • We chose to return NULL when the length is 0.
  • We didn’t cover one possible edge-case: when the incoming length is 1.

Now that you’ve thought about the design and have an idea of how to handle odd and edge cases, write up your description as a comment and add your function signature, again making sure this is above where main is defined. Here’s an example:

Example answer:

/*
 * Allocates and returns a new array that's half (rounded up) the length
 * of the given array (whose length is also specified). The new array
 * contains every other element of the original array (the elements at
 * even indices starting with the first element at index 0).
 *
 * If the input array is NULL, or the array length is 0, this will return
 * NULL without allocating anything. If there isn't enough memory to
 * allocate the result, we'll print an error message and return NULL.
 *
 * Whoever calls this function is responsible for freeing the array it
 * allocates.
 */
int *everyOther(int *array, size_t length) {
}

Note that you do not have to make all the same choices that we did in this example.

Now you’re ready to write the code. As before, pay attention to any compiler warnings/errors, and debug using printf to help tell you what’s going on if you need to.

Here’s an example solution for everyOther, but note that if you made different design choices, your code might work differently:

Example answer:

/*
 * Allocates and returns a new array that's half (rounded up) the length
 * of the given array (whose length is also specified). The new array
 * contains every other element of the original array (the elements at
 * even indices starting with the first element at index 0).
 *
 * If the input array is NULL, or the array length is 0, this will return
 * NULL without allocating anything. If there isn't enough memory to
 * allocate the result, we'll print an error message and return NULL.
 *
 * Whoever calls this function is responsible for freeing the array it
 * allocates.
 */
int *everyOther(int *array, size_t length) {
    // Catch edge cases
    if (length == 0 || array == NULL) {
        return NULL;
    }
    size_t resultLength = length / 2;  // integer division
    if (length % 2 == 1) {
        resultLength += 1;  // round up
    }
    // Result length should not be zero here
    assert(resultLength > 0);
    int *result = (int*) malloc(sizeof(int) * resultLength);
    if (result == NULL) {
        printf(
            "Error: Out of memory in everyOther with %lu elements.\n",
            length
        );
        return NULL;
    }
    int *cursorFrom = array;
    int *cursorTo = result;
    while ((cursorFrom - array) < length) {
        *cursorTo = *cursorFrom;
        cursorFrom += 2;  // skip ahead by 2
        cursorTo += 1;  // advance 1
    }
    // Now cursorTo should be 1 past the end of the result
    assert(cursorTo - result == resultLength);
    return result;
}

Note that we include a few assertions at key points in the code to double-check things that should be true but could have been false because of a bug.

Let’s reflect on the test-driven development process. How did you feel about it overall, and does it seem like it saved you time and/or improved the quality of your code?

Example answer:

There’s no “correct” answer here. Academic studies have a range of results depending on the kind of software being made and the metrics used. There is evidence that in the long run when you account for maintenance and debugging, test-driven development saves time, and there’s stronger evidence that it tends to improve code quality. But it’s possible to spend too much time writing tests, especially in situations like interactive applications where tests can be very difficult to write.

For a well-defined problem with a small scope and easily testable behavior like the ones you’ll mostly encounter in course work, test-driven development should be a good fit.

Stretch Exercise: More valgrind Pratice

Exercise 7:

Use valgrind to fix the code in fibloop2.c.

Describe the problem(s) you fixed:

Example answer:

Same <= n issue with the loop, and the return value (now just a number) is also wrong because it returns the item after the end of the array. Modifying the malloc call to allocate one extra spot is the simplest/best way to fix this because it keeps the semantics (return the nth fibonacci number).

ALSO, the allocated memory is never freed, so more and more memory gets wasted every time you call fib (valgrind reports a “leak summary” about this).

Fixing this leak is tricky because you can’t free before you return unless you first copy your return value into a new local variable. You can’t free after you return, because your function does not return the array it allocates.

Stretch Exercise: Predicting the Behavior of Mystery Functions

To get more practice with pointers, we’ll predict the behavior of some functions and verify by running the programs and observing the output.

Exercise 8:

Examine the files main0.c through main3.c from the repository and predict the output of each (you already saw main4.c in the gdb section). Feel free to give a vague answer if you’re not sure, but do actually read the code:

main0:

Example answer: Prints the letter ‘A’

main1:

Example answer: Prints the letter ‘A’

main2:

Example answer: Prints the address of the variable ‘ch’, which will be somewhere on the stack.

main3:

Example answer: Prints a character on or after the stack 20 characters beyond ‘ch’. We don’t know what this character will be and it may change every time we run the program.

Compile all the files by running:

make predictions

Note: You can also compile any of the individual filename.c files by running make filename.bin. In general, if the makefile is set up properly, typing make <name> will create the file named <name>.

Run main0 through main4 to test your predictions by entering the commands as shown below at the command prompt. The expected output is also displayed.

Note: the $ shown here takes the place of the shell prompt and is included to indicate which lines you should run vs. which are output.

$ ./main0.bin
A
$ ./main1.bin
A
$ ./main2.bin
0x7ffe8bacf967
$ ./main3.bin
?

Note that the value you see for main2.bin may be different, and the character printed by main3.bin is effectively random.

The value printed by main2.bin is the address in memory that ptr refers to on the machine you are using to run the program. A large address in this range (starting with 7ff) in our machine is understood to be part of the stack, which is the area of memory used for storing local variables for a function (along with other important information about which we will soon be learning more).

The value printed by main3.bin is whatever is 20 spots beyond the local variable ch. We don’t know what will be there, and a security measure called address space layout randomization changes what will be there each time the program runs (this is also why the value for main2.bin varies). You can try running main2.bin and/or main3.bin several times to see if you get different results.

Now write code for contains_char_a. Note that there are multiple valid approaches here; the solution below is just one of them.

Example answer:

/**
 * Return 1 if the string given by haystack contains the character
 * given by needle.
 * Return 0 otherwise.
 *
 * Precondition: haystack is a valid pointer to a well-formed string.
 *
 * Use array indexing.
 * Do not use any other functions.
 */
int contains_char_a(char* haystack, char needle) {
    int index = 0;
    char this = 1;
    while (this != 0) {
        this = haystack[index];
        index += 1;
        if (this == needle) {
                return 1;
        }
    }
    return 0;
}

Once more, compile your code and check that the tests passed.

Next, write code for contains_char_p:

Example answer:

/**
 * Return 1 if the string given by haystack contains the character
 * given by needle.
 * Return 0 otherwise.
 *
 * Precondition: haystack is a valid pointer to a well-formed string.
 *
 * Do not use array indexing.  [] is banned.  Use pointer arithmetic instead.
 * Do not use any other functions.
 */
int contains_char_p(char* haystack, char needle) {
    while (*haystack) {
        if (*haystack == needle) {
                return 1;
        }
        haystack += 1;
    }
    return 0;
}

Run the file to make sure your code passes the tests.