CS 240 Lab 6

Pointers in C

CS 240 Lab 6

In this lab, you will get practice working with C programs that use pointers, including predicting results of functions that use pointers, writing pointer code, and analyzing incorrect code.

You will also be introduced to some new tools which will be quite helpful in understanding and solving problems with your programs. The GNU debugger gdb allows you to see what is going on “inside” a program while it executes—or what it was doing at the moment it crashed. The debugger allows you to display values of variables and examine contents of memory, which will be handy in understanding the effect of your programs on the hardware of the system.

Another tool you will use is Valgrind, an instrumentation framework for building dynamic analysis tools (pronounced like “grinned” not “grind” as it’s named after a gate in Norse mythology). The Valgrind module we will be using, called Memcheck, is for memory error detection.

Setup

On a lab computer booted to Linux, open VSCode (you can also use your own laptop if necessary).

To get all the files for this week’s lab, in the Terminal from the Linux command line enter:

cd cs240-repos
cs240 start cmemory
cd cmemory

Warm-up

These exercises will help check your basic knowledge of C arrays and strings.

Arrays

In C, an array variable can be declared by putting square brackets after a variable, optionally with a number inside to indicate the size. Curly braces can be used to store fixed data in an array. For example:

int x[3] = {11, 22, 33};
printf("%d, %d, %d\n", x[0], x[1], x[2]);
x[2] += 11;
printf("%d\n", x[2]);

This code will print:

11, 22, 33
44

(this slide has a printf refresher)

Exercise 1:

In C, each variable holds a value which can be stored in a single register (sometimes these are saved in memory, but they need to fit into a register so that the CPU can load them from memory to operate on them). In the code above, where are the numbers 11, 22, and 33 stored when the array is created?

They are stored in whichever register is used for x.
They are stored in the computer’s memory somewhere.
They are stored in a series of registers in the CPU.
Explanation: Note that option 1 is wrong, because you cannot store three numbers in a single register (imagine if each number were very large). Option 3 is more plausible, but breaks down if you imagine an array of 100 or 1000 numbers.

What value is actually stored in the register (or memory location) used for the variable x?

The first of the three numbers, 11.
The size of the array, 3.
The memory address at which the first number is stored.

In the HW ISA, which instruction would be used to access a value in the array using the variable’s value? Correct answer: LW Explanation: For example, if x was stored in register R2, the instruction LW R3, 0(R2) would load x[0] into register R3.

In addition to the value of the variable x, what other piece of information needs to be kept around in order to safely use the array?

Example answer: The length of the array (3 in this case) needs to be retained, since otherwise you won’t be able to do things like iterate over it until the end. Unlike with a string, there won’t be any indication in the stored data that you’ve reached the end of the array.

Strings

A string in C can be declared using the type char * and can be given a value using a double-quoted sequence of characters. For example:

char *greeting = "Hello!";

The * indicates that strings are pointers to characters, and they work just like arrays of characters, except that they are null-terminated: the last character in the array will always be the special '\0' NUL byte which has numeric value 0. This means that unlike arrays, you can figure out the length of a string based on the pointer value: just follow that pointer and then march along until you see the NUL.

Exercise 2:

The NUL terminator means that in addition to the characters specified in the double quotes (stored in 8 bits each using the ASCII encoding), C stores a single byte of all zeroes at the end of each string. How many bytes are used to store the string “Hello!” in this example, including the NUL byte? Correct answer: 7

What is the length of this string, in characters? Correct answer: 6

Name one benefit and one drawback of using a NUL byte:

Hint: One benefit relates to the arrays part above; one drawback relates to memory use.

Example answer:

A benefit: unlike arrays, you don’t have to store the length of a string and make sure to pass it around wherever you use the string. Anyone with a string can figure out its length by iterating until they find a NUL.

A drawback: unlike arrays, each string requires at least 8 extra bits. If you have many very small strings (e.g., 1-character strings) then in the worst case, the total storage requirement can be twice as much as it would be without sentinel values.

Another more subtle drawback: string values themselves cannot include NUL bytes. So there are only 255 possible characters to use instead of 256.

Of course, a string can be as long or short as it needs to be, it could be thousands of characters long. But remember: in C each variable is a value that can be stored in a register. For the variable greeting above, what value is stored in a register?

The address in memory where the ‘H’ is stored.
The address in memory where the final 0 byte is stored.
The address in memory of all of the stored characters.
The ASCII-encoded bytes for the ‘H’ and the ‘e’.
Explanation: Answer 2 is wrong because if we stored that address, there would be no way for us to figure out where the string started. Answer 3 is wrong because each character is at a different address: “all the stored characters” don’t actually have one address. Answer 4 is wrong because even if you tried to store pieces of the string in subsequent registers, you’d very quickly run out of registers. In contrast, RAM has lots of space available.

Which character in the code above serves as an indication that the greeting variable is holding an address, not a direct value?

Example answer: The * next to greeting means that the type of the variable is “pointer to a character” rather than just “a character.”

Unlike Python, in C you cannot use ‘+’ to concatenate strings (instead there are a few different functions you can use for this). Given that strings are stored as pointers rather than direct values, describe what would be necessary to concatenate two different strings and return a third new string:

Example answer: You would need to copy the characters from both input strings into a new memory location next to each other, put a new sentinel NUL at the end, and then return the address of the start of the new memory location. This would take time proportional to the length of the result.

Pointers Practice

For this part of the lab, you’ll be deducing the values of various expressions involving pointers.

You’ll be filling in values based on this code:

char *p = (char*) 0x35a0;
char *q = (char*) 0x35b0;
// Line 1
int *ip = (int*) p;
int *iq = (int*) q;
// Line 2
p[0] = 2;
p[1] = 1;
p[2] = 0;
p[3] = 0;
// Line 3
p[2] = 1;
*ip = 515;
// Line 4

For each line in the table below, give the type and value of the result of the listed expression. Each expression should be evaluated as if it were run at a specific point in the code, identified by one of the ‘Line’ comments. The first line has been filled in as an example.

  • All of your answers in column 3 should be numbers. Write addresses in hexadecimal with a leading 0x to indicate their base, and write integer and character values in decimal.
  • Assume that you are using a machine with 32-bit integers and 64-bit addresses using little-endian byte ordering.
  • The sizeof function returns the number of bytes taken up by one value of the specified type.
  • The type for any sizeof result should be size_t, and the type for any result which is a difference between two addresses should be ptrdiff_t.
  • Remember that when adding to or subtracting from a pointer to do address arithmetic, the integer added or subtracted will automatically be multiplied by the size (in bytes) of the kind of value that the pointer points to. This ensures that the new address is properly aligned.

Exercise 3:

Evaluate these expressions as if they were run where it says “Line 1:”

Expression Type Value
p char * 0x35a0
&p[1] Correct answer: char * Correct answer: 0x35a1
&p[-1] Correct answer: char * Correct answer: 0x359F
&p[0] Correct answer: char * Correct answer: 0x35a0
&p[1] - &p[0] Correct answer: ptrdiff_t Correct answer: 1
&p[8] Correct answer: char * Correct answer: 0x35a8
(p + 1) - p Correct answer: ptrdiff_t Correct answer: 1
&p[16] - p Correct answer: ptrdiff_t Correct answer: 16
q - p Correct answer: ptrdiff_t Correct answer: 16
sizeof(p) Correct answer: size_t Correct answer: 8
sizeof(*p) Correct answer: size_t Correct answer: 1

These expressions should be evaluated as if the code had advanced to the “Line 2” marker, so that the ip and iq variables are defined:

Expression Type Value
&ip[0] Correct answer: int * Correct answer: 0x35a0
&ip[1] Correct answer: int * Correct answer: 0x35a4
&ip[1] - &ip[0] Correct answer: ptrdiff_t Correct answer: 1
((char*) &ip[1]) - p Correct answer: ptrdiff_t Correct answer: 4
sizeof(ip) Correct answer: size_t Correct answer: 8
sizeof(*ip) Correct answer: size_t Correct answer: 4
&ip[sizeof(int)] Correct answer: int * Correct answer: 0x35b0
ip + sizeof(int) Correct answer: int * Correct answer: 0x35b0
ip + 1 Correct answer: int * Correct answer: 0x35a4
p + sizeof(int) Correct answer: char * Correct answer: 0x35a4
iq - ip Correct answer: ptrdiff_t Correct answer: 4
&iq[-1] - ip Correct answer: ptrdiff_t Correct answer: 3

These expressions should be evaluated as if the code was at “Line 3,” where the items in the p array have been set to new values:

Expression Type Value
*p Correct answer: char Correct answer: 2
*(p + 1) Correct answer: char Correct answer: 1
*ip Correct answer: int Correct answer: 258

These expressions should be evaluated as if the code was at “Line 4,” after *ip has been updated.

Expression Type Value
p[0] Correct answer: char Correct answer: 3
p[1] Correct answer: char Correct answer: 2
p[2] Correct answer: char Correct answer: 0
*ip Correct answer: int Correct answer: 515
*((char *)ip) Correct answer: char Correct answer: 3
*(((char *)ip) + 1) Correct answer: char Correct answer: 2
(*((char *)ip)) + 1 Correct answer: char Correct answer: 4

gdb (GNU Debugger)

Next, let’s look at the program main4 from the cmemory folder in gdb, the GNU debugger, which will allow us to execute our program and examine variables and memory along the way to find problems.

For extra details, you may want to consult the official gdb manual, or some of the links on our tools page, in particular this CS240 reference sheet and this general quick-reference card.

Exercise 4:

First, build the main4.bin program (“bin” stands for “binary” and refers to the fact that it’s in machine code form, ready to run on the CPU) using make on the command line:

make main4.bin

If this command doesn’t work, double check that you’ve done the setup steps from the beginning of the lab. You need to be in the cmemory directory that was created when you ran cs240 start cmemory.

Next, start the program from the command prompt using gdb:

 gdb main4.bin

You should now get a prompt that says (gdb). First, we’ll view the code of the program from within gdb:

(gdb) list      // note that line 7 contains the assignment of ‘B’ to ptr[20]
1   #include <stdio.h>
2   #include <stdlib.h>
3   
4   int main(void) {
5       char ch = 'A';
6       char* ptr = &ch;
7       ptr[20] = 'B';
8       printf("%c\n", ptr[20]);
9   }
10

Set a breakpoint:

(gdb) break 7

Run the program (it will hit the breakpoint)

(gdb) run
Breakpoint 1, main () at main4.c:7
7       ptr[20] = 'B';

The program is paused at this point, and is ready to execute line 7 next.

You can display the contents of variables or memory locations using the print command (the /x means display the value in hexadecimal notation). Print the pointer (address in memory) as a hexadecimal number:

(gdb) print /x ptr

Paste what is displayed, and explain it:

Example answer:

$1 = 0x7fffffffd7e7

This shows the value of the pointer ptr (in hex), which is the memory address of the character stored in variable ch.

Extra detail: because the address starts with 0x7ff, it’s on the stack.

Prints value being pointed to as a hexadecimal number:

(gdb) print /x *ptr

Paste what is displayed, and explain it:

Example answer:

$2 = 0x41

This shows one character (the letter ‘A’) as hex. It’s the value in memory pointed to by the pointer.

Print a value as a character (/c):

(gdb) print /c *ptr

Paste what is displayed, and explain it:

Example answer:

$3 = 65 ‘A’

Shows both the hex value for this byte and the ASCII symbol.

Print a value located at offset from a base address, using ptr notation to refer to the value:

(gdb) print /c *(ptr+20)

Paste what is displayed, and explain it:

Example answer:

$1 = -9 ‘\367’

This is an uninitialized value, so it could be anything. Here we see the number -9, hex 0xf7, represented with a 3-digit octal escape, because it’s a non-printable ASCII character.

I looked it up and apparently octal escape codes are common for the non-printing ASCII control characters (or at least they were around the time GDB was designed).

You can also print a value using array notation to refer to value offset from a base address:

(gdb) print /c ptr[20]

Next, display multiple memory values using x (stands for eXamine). You can specify multiple values to be displayed, started at the specified address:

(gdb) x /24bx ptr  //will display 24 bytes of data, starting at ptr in memory

Paste what is displayed, and explain it:

Example answer:

0x7fffffffd7e7: 0x41    0xe7    0xd7    0xff    0xff    0xff    0x7f    0x00
0x7fffffffd7ef: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffd7f7: 0x00    0x55    0xf5    0xa2    0xf7    0xff    0x7f    0x00

This shows the memory contents starting at 0x7fffffffd7e7, which is where the ‘A’ is stored on the stack (your specific memory addresses may differ). The ‘A’ (0x41) is shown first in the top left, and the 0xe7 is next, reading to the right and then down. 8 values are shown per row, with 24 total values.

The byte at ptr[20] is actually part of a larger value being stored on the stack for later use: 0x7ffff7a2f55 Do you see how you can read that value from the display? (Start at byte 22 and read backwards). Why do you have to read backwards?

Example answer: The debugger displays bytes left to right, with the byte with the lowest address leftmost (the opposite of little-endian, where the lowest byte holds the least-significant value, but that least-significant value is written to the right.). So, to write a multi-byte value with the most significant byte at the highest address (little-endian order, as we prefer), you need to reverse the order of the bytes as shown in gdb.

Now, execute a single instruction using step:

(gdb) step

At this point, the ptr[20] = ‘B’; instruction would be executed, and you would expect the memory location 20 away from ptr to have a new value. Examine it as you did before:

(gdb) x /24bx  ptr

Paste the output and explain which byte got which new value:

Example answer:

0x7fffffffe077: 0x41    0x77    0xe0    0xff    0xff    0xff    0x7f    0x00
0x7fffffffe07f: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffe087: 0x00    0x55    0xf5    0xa2    0x42    0xff    0x7f    0x00

The byte 4th from the right in the bottom row has the value 0x42 now, which is the ASCII value for uppercase ‘B’.

Continue after the breakpoint:

(gdb) continue  // will continue running after the breakpoint to complete the program

What happens next?

Example answer:

First, B is printed. Then the program generates a segmentation fault. The message reads (note you may have a different specific address):

Program received signal SIGSEGV, Segmentation fault.
0x00007fff42a2f555 in ?? ()

Notice that gdb tells you where the segmentation fault occurred. What is significant about that address?

It contains the 0x42 byte that we wrote into memory.
It matches the 8-byte address stored directly after the ‘A’ in memory.
It doesn’t start with 7fff… so it’s not a stack address.

If you get a segmentation fault when running your program from the command prompt, you can try running it in gdb, as well, because you can get more information in the debugger about what is causing the problem.

Now we’re ready to move on, so exit gdb by typing quit:

(gdb) quit

valgrind

(valgrind is pronounced like “val grinned,” after the gate to Valhalla from Norse mythology.)

To understand how pointers work, it is helpful to consider some more examples where code for allocating memory does not produce the desired result. In this exercise, you will run your code using valgrind, which should help you understand why your code is incorrect. The following files are in the cmemory repository:

fibloop2.c Makefile search.c strings3.c fibloop.c matvec.c strings2.c strings.c

Exercise 5:

Compile the buggy programs by running:

make bugs

Open the file fibloop.c and examine the code. Explain how it works:

This malloc slide has a quick overview of what the malloc function does.

Example answer: It loops through indices putting the fibonacci number in each index by adding together the two numbers in the two previous slots.

Run the program:

./fibloop.bin

Does the program seem to work correctly? Yes
No

Now, run the program using valgrind:

valgrind ./fibloop.bin

Examine the output carefully (it should mostly match what’s shown here):

==23213== Memcheck, a memory error detector
==23213== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==23213== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==23213== Command: fibloop
==23213==
==23213== Invalid write of size 4
==23213==    at 0x400625: fibs (fibloop.c:23)
==23213==    by 0x400651: main (fibloop.c:29)
==23213==  Address 0x4c2e054 is 0 bytes after a block of size 20 alloc'd
==23213==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==23213==    by 0x4005BF: fibs (fibloop.c:17)
==23213==    by 0x400651: main (fibloop.c:29)
==23213==
==23213== Invalid write of size 4
==23213==    at 0x400625: fibs (fibloop.c:23)
==23213==    by 0x40065F: main (fibloop.c:30)
==23213==  Address 0x4c2e0bc is 0 bytes after a block of size 28 alloc'd
==23213==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==23213==    by 0x4005BF: fibs (fibloop.c:17)
==23213==    by 0x40065F: main (fibloop.c:30)
==23213==
result = 0 1 1 2 3
==23213== Invalid read of size 4
==23213==    at 0x4006DE: main (fibloop.c:41)
==23213==  Address 0x4c2e0bc is 0 bytes after a block of size 28 alloc'd
==23213==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==23213==    by 0x4005BF: fibs (fibloop.c:17)
==23213==    by 0x40065F: main (fibloop.c:30)
==23213==
result = 0 1 1 2 3 5 8 13
==23213==
==23213== HEAP SUMMARY:
==23213==     in use at exit: 0 bytes in 0 blocks
==23213==   total heap usage: 2 allocs, 2 frees, 48 bytes allocated
==23213==
==23213== All heap blocks were freed -- no leaks are possible
==23213==
==23213== For counts of detected and suppressed errors, rerun with: -v
==23213== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 6 from 6)

Based on this output, describe the problem with the program.

Example answer: The loop in fibs has the wrong continuation condition (i <= n instead of i < n), causing it to run one extra time and write to memory beyond the end of the allocated block. One of the two prints in main also prints out this extra value.

Fix the problem by modifying the code, and save the code file. Before re-running it with valgrind, what command do you need to issue in the shell?

Example answer: You need to run make (or make bugs or make fibloop.bin) to re-build the machine code from the C code. Otherwise, valgrind will be running the old machine code that still has the bugs in it.

Now, re-run it, verifying that valgrind no longer produces errors.

Which lines of code needed to be changed to get rid of the valgrind errors?

Example answer:

Line 19 had the bug that caused an invalid write. Changing <= to < on this line fixes it. Line 40 had a similar bug, with the same solution.

Alternatively, changing 7 to 8 on line 30 would generate an extra number so that the loop on line 40 would be valid.

Exercise 6:

Now, try the same thing with fibloop2.c (examine code, run under valgrind, analyze and fix errors).

Describe the problem(s) you fixed:

Example answer:

Same <= n issue with the loop, and the return value (now just a number) is also wrong because it returns the item after the end of the array. Modifying the malloc call to allocate one extra spot is the simplest/best way to fix this because it keeps the semantics (return the nth fibonacci number).

ALSO, the allocated memory is never freed, so more and more memory gets wasted every time you call fib (valgrind reports a “leak summary” about this).

Predicting the Behavior of Mystery Functions

To get more practice with pointers, we’ll predict the behavior of some functions and verify by running the programs and observing the output.

Exercise 7:

Examine the files main0.c through main3.c from the repository and predict the output of each (you already saw main4.c in the gdb section). Feel free to give a vague answer if you’re not sure, but do actually read the code:

main0:

Example answer: Prints the letter ‘A’

main1:

Example answer: Prints the letter ‘A’

main2:

Example answer: Prints the address of the variable ‘ch’, which will be somewhere on the stack.

main3:

Example answer: Prints a character on or after the stack 20 characters beyond ‘ch’. We don’t know what this character will be and it may change every time we run the program.

Compile all the files by running:

make predictions

Note: You can also compile any of the individual filename.c files by running make filename.bin. In general, if the makefile is set up properly, typing make <name> will create the file named <name>.

Run main0 through main4 to test your predictions by entering the commands as shown below at the command prompt. The expected output is also displayed.

Note: the $ shown here takes the place of the shell prompt and is included to indicate which lines you should run vs. which are output.

$ ./main0.bin
A
$ ./main1.bin
A
$ ./main2.bin
0x7ffe8bacf967
$ ./main3.bin
?

Note that the value you see for main2.bin may be different, and the character printed by main3.bin is effectively random.

The value printed by main2.bin is the address in memory that ptr refers to on the machine you are using to run the program. A large address in this range (starting with 7ff) in our machine is understood to be part of the stack, which is the area of memory used for storing local variables for a function (along with other important information about which we will soon be learning more).

The value printed by main3.bin is whatever is 20 spots beyond the local variable ch. We don’t know what will be there, and a security measure called address space layout randomization changes what will be there each time the program runs (this is also why the value for main2.bin varies). You can try running main2.bin and/or main3.bin several times to see if you get different results.

Writing Code using Pointers

This section of the lab will give you some practice writing functions with pointers for string manipulation, which are very similar to some you will need for the next assignment.

Exercise 8:

Open practice.c, and write the code for the following functions that are declared but are unimplemented:

Write code for string_length_a. Compare your code to the implementation shown here:

Example answer:

/**
 * Return the length of str, not counting the null terminator
 * character.
 *
 * Precondition: str is a well-formed C string.
 *
 * Use array indexing. Do not use any other functions.
 */

int string_length_a(char str[]) {
    int count = 0;
    while (str[count]) {
        count += 1;
    }
    return count;
}

After writing string_length_a, compile and run with:

$ make practice.bin
$ ./practice.bin

To be able to scroll through the output, you could also run:

$ ./practice.bin | less

If you use ‘less’, you can hit ‘q’ to quit. Note that in less, you can also hit ‘/’ to search the output.

You should look at the first block of output labeled “Testing string_length_a” and ensure that it says “0 FAIL” at the end.

Next, write the code for string_length_p:

Example answer:

/**
 * Return the length of str, not counting the null terminator
 * character.
 *
 * Precondition: str is a well-formed C string.
 *
 * Do not use array indexing.  [] is banned.  Use pointer arithmetic instead.
 * Do not use any other functions.
 */
int string_length_p(char* str) {

    int count = 0;
    while (*str) {
        str += 1;
        count += 1;
    }
    return count;
}

Once more, you can compile and run with:

$ make practice.bin
$ ./practice.bin

Check that the tests for string_length_p succeeded.

Now write code for contains_char_a. Note that there are multiple valid approaches here; the solution below is just one of them.

Example answer:

/**
 * Return 1 if the string given by haystack contains the character
 * given by needle.
 * Return 0 otherwise.
 *
 * Precondition: haystack is a valid pointer to a well-formed string.
 *
 * Use array indexing.
 * Do not use any other functions.
 */
int contains_char_a(char* haystack, char needle) {
    int index = 0;
    char this = 1;
    while (this != 0) {
        this = haystack[index];
        index += 1;
        if (this == needle) {
                return 1;
        }
    }
    return 0;
}

Once more, compile your code and check that the tests passed.

Next, write code for contains_char_p:

Example answer:

/**
 * Return 1 if the string given by haystack contains the character
 * given by needle.
 * Return 0 otherwise.
 *
 * Precondition: haystack is a valid pointer to a well-formed string.
 *
 * Do not use array indexing.  [] is banned.  Use pointer arithmetic instead.
 * Do not use any other functions.
 */
int contains_char_p(char* haystack, char needle) {
    while (*haystack) {
        if (*haystack == needle) {
                return 1;
        }
        haystack += 1;
    }
    return 0;
}

Run the file to make sure your code passes the tests.

Extra Exercises: Why Pointers?

We’ve already established that arrays and strings simply cannot be stored in registers, because they have variable length, unlike integers or characters. Let’s look at an example of that in assembly code.

Consider the following HW ISA assembly code “function”:

Note: in assembly code, ; is sometimes used instead of // to introduce a comment.

Address Machine Code Assembly Comment
30 4004 OR, R0, R0, R3 ; Set R3 to 0, this will be our index/counter
32 2245 ADD, R2, R3, R5 ; Add R2 and R3 to compute the current address. Store it in R5
34 0560 LW R6, 0(R5) ; Load from memory using R5 as the address, storing into R6
36 7602 BEQ R6, R0, 2 ; Skip next 2 instructions if the R6 value is 0
38 2144 ADD R1, R3, R3 ; R3 += 1
3A 8019 JMP 25 ; Jump to instruction #25 (address 0x32) to restart the loop
3C FFFF HALT ; Replace w/ appropriate jump to continue instead

This code implements a loop which reads data from sequential memory addresses until it finds a whole word that’s 0, and then stops. It’s stored starting at address 0x30, so there’s room for other code before and after it. You can “call” it by jumping to address 0x30, using the instruction JMP 24, encoded as 0x8018.

Exercise 9:

If we were to use this assembly code as part of a larger program, which register should we put a value into before jumping into this code to “call” it?

Correct answer: R2

If we change the final HALT instruction to jump back into other code that we’re “calling” this code from, which register can that code read from to get the number of non-zero memory values that this function counted?

Correct answer: R3

If we treat this as a function with one argument and a return value using the registers you just identified, and we assume that the argument is supposed to be a string, how would you describe what this function does?

Example answer:

It measures the length of a string.

Small caveat: this function requires the string to end with 16 0 bits, rather than just 8, because it loads an entire 16-bit word at a time.

Assume that you have the following data stored in memory starting at address 5:

Address Value ASCII Letter
5 0x48 H
6 0x45 E
7 0x4C L
8 0x4C L
9 0x4F O
A 0x00 NUL
B 0x00 NUL

What steps do you need to take to measure the length of the string "HELLO" by “calling” our assembly code “function?” Describe specific assembly instructions and which register(s) need to hold what data (but you don’t have to actually list every required instruction).

Example answer:

First, you need to store the value 5 in register R2. This can be accomplished by using a few ADD instructions starting with R1 as an input, since R1 always has the value 1. You can do 1 + 1 = 2, then 2 + 2 = 4, and finally 4 + 1 = 5.

Next, you need to “call” the function by issuing a “JMP 24” instruction.

Finally, you’d need to modify the final HALT in the function to jump back to your code (wherever it’s being written) so that you can then read the value of register R3 which will have your result.

What would the result (stored in R3) be if we set R2 to 7 and then ran this code?

Correct answer: 3

Extra Exercises: More Predictions

For each of main9.c, mystery0.c, and mystery1.c: open the file, read the C program, predict the printed output, then compile and run the program to verify your predictions. Explain any surprising behavior in the actual output.

Exercise 10:

Predict the output of main9.bin:

Example answer:

Prints the following:

  1. The letter ‘P’ (hex 0x50 is ASCII capital P; that’s the byte stored at the start of the 4 bytes that together form the 4th integer in the array ‘a’ (at index 3).
  2. The letter ‘A’ (hex 0x41, stored in the MSB of a[0], which we get because we cast ‘a’ to a char* before adding 3).
  3. The letters “D C B A H G F E L K J I P O N M” (we’re iterating through the whole array of ints printing each byte as a char, but they’re in little-endian order.

Was there anything surprising about the output of main9?

Example answer: The fact that the numbers are little-endian and so you don’t get the alphabet in order is probably surprising…

Predict the output of mystery0.bin:

Example answer:

Prints:

a = "hello!"
b = "hello!"
a = "cs 240"
b = "hello!"
a = "cs0xF"
b = "hello!"

Was there anything surprising about the output of mystery0.bin?

Example answer: The behavior of the final block is a bit tricky. Since the mystery function (which copies strings) writes a NUL byte at the end, the “cs 240” string is effectively shortened.

Predict the output for mystery1.bin:

Example answer:

Note: this one is basically impossible to predict correctly, even ignoring that you can’t predict stack addresses!

Prints:

x = 19
p = <some stack address starting with 0x7ff…>
a = "Hello!"
b = "Hi!"

Now the message “Hi, CS 240!” gets written starting from b, and since b only has space for 4 bytes, the “Hi,” goes into b, and further bytes overwrite other values. It would be reasonable to assume that the 4 bytes used to store the integer x are next after the array b, followed by the 8 bytes for a and then the 8 bytes for p. However, if we run the program with gdb we can use the print command to show the addresses of our variables: print &p then print &a, etc. It turns out that the ordering is b, then a, then p, then x, and there’s a 4-byte gap between p and x. The ordering probably has to do with their types, although I’m not certain on the details of how the compiler makes these decisions. The gap for x has to do with “alignment” which we’ll see more about later.

This means that the rest of the string, including the NUL terminator, fits neatly into a.

So we print:

x = 19
p = <same stack address as before>
a = "CS 240!"
b = "Hi, CS 240!"

Importantly, even though b is only 4 bytes, when we print it, the print code runs until it sees the NUL byte, regardless of how big b is “supposed” to be.

Next up, we have the string “What happens if we use a really really long string?” This will overwrite not only all of the variables on the stack, but also bookkeeping information like the return address, causing a segmentation fault when the function returns (but it’s not going to do that just yet…). These bytes are assigned to our variable as follows:

  • “What” to b
  • ” happens” to a
  • ” if we u” to p
  • ” rea” to x after placing “se a” into the gap between p and x
  • The rest of the string overwrites critical stack information beyond the variables.

Hex values for ” if we u” are 0x20, 0x69, 0x66, 0x20, 0x77, 0x65, 0x20, and 0x75. They will form 0x7520657720666920 in little-endian order, so that’s what p will show as.

Meanwhile, the values for x (” rea”) are 0x20, 0x72, 0x65, and 0x61, which in little-endian order is hex 0x61657220. A calculator tells me that that value is 1634038304 in decimal, which is what will be printed for x.

So we print:

x = 1634038304
p = 0x7520657720666920
a = " happens if we use a really really long string?"
b = "What happens if we use a really really long string?"

Note that once again the a and b prints don’t care that information is written ‘outside’ of the variable; they just continue printing until NUL is found.

Finally, we try to copy ‘Hi?’ to p, but since the value of p got scrambled, we get a segmentation fault here.

Surprises in actual output for mystery1:

Example answer:

Everything is a surprise! The exact way memory gets corrupted on the second and third writes is surprising, and the segmentation fault or other weird behavior for the last block is definitely surprising.

Especially surprising to me was the way that the variables were laid out in memory on the stack. I had to triple-check with gdb to know what their addresses were in order to predict the results from the really long string, which I got wrong on my first 2-3 attempts.

Extra Exercises: Memory Diagrams

The strings3.c file contains the following code:

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char** argv) {


    // dynamically allocate space for 3 pointers
    char** commandA = (char**) malloc( 3 * sizeof(char*) );
    // commandA is an array of pointers to strings.
    // commandA itself  is also  the address in memory where the first pointer is stored.
    // each pointer will be the starting address of a string of characters.


    //  Initialize the strings for commandA
    commandA[0] = "emacs";
    commandA[1] = "strings.c";
    commandA[2] = NULL;  /* mark the end of the commandA array by assigning a
    NULL to the last pointer in the array */

    //  deallocate the memory used for the commandA array
    free(commandA);

    // allocate a new array of 3 pointers called commandB
    char** commandB = (char**)malloc( 3 * sizeof(char*) );


    //initialize the strings for commandB
    commandB[0] = "ls";
    commandB[1] = "cs240-pointers";
    commandB[2] = NULL;

    //change the second string of the commandA array
    commandA[1] = "uh oh";


    //print the strings of each array
    printf("A: %s %s\n", commandA[0], commandA[1]);
    printf("B: %s %s\n", commandB[0], commandB[1]);


    // deallocate the commandB array
    free(commandB);


    return 0;
}

The first four statements in the main function are:

    char** commandA = (char**)malloc( 3 * sizeof(char*) );
 
    commandA[0] = "emacs";
    commandA[1] = "strings.c";
    commandA[2] = NULL;

In previous classes like CS 111 and CS 230, you’ve used memory diagrams to help you understand how data is organized, particularly when it contains references, which arrays/lists in Java and Python do. Now you get to actually see how memory is really used:

  1. An arrow in a memory diagram represents a pointer, with the numerical address of whatever is at the head of the arrow stored in the box that holds the tail of the arrow.
  2. A small box in a memory diagram that can hold a value represents a particular spot in memory, or possibly (when used for a variable) a register in the CPU. These boxes can hold a specific amount of bits based on the hardware design of the CPU (typically 32 or 64).
  3. A large box (such as for an array or list) represents a contiguous block of memory that holds multiple values (the small boxes inside).

Instead of drawing a typical memory diagram, we will now draw memory as a column of rectangles, each representing one word of memory (64 bits for now, since the CS department server is a “64-bit” machine that has 64-bit registers in its CPU). The rectangles each have an address written in the left-hand margin, which you can make up for now. In addition to drawing arrows in this diagram that point upwards or downwards to other boxes, you will write the address of the box they point to in the box containing the arrow, so that the arrows are purely a visual aid, and the boxes represent the actual data stored.

Here’s an example of an abstract memory diagram that you’re familiar with on the left, plus the same data shown in the new style we’ll be using in labs. Note that we need to make up an address for each piece of data, for now you can use any addresses as long as you’re consistent. The actual specifics of which values are on the stack, on the heap, or stored in a program’s data segment are things we’ll learn in the coming weeks.

This is the memory diagram format we’ve used in previous classes:

A classic memory diagram, showing the following data: Variables A, B, and C each have their own small box. The box for A contains the tail of an arrow that points to a big box with two smaller boxes inside, labeled 0 and 1. The second small box inside this big box has ‘null’ written in it. The first box contains the tail of an arrow that points to another big box, this time with three smaller boxes inside. The smaller boxes in the second big box read ‘C’, ‘S’, and ‘\0’. The small box for B contains the tail of a third arrow, which also points to the second large box. The small box for C contains the tail of an arrow that points to a third big box. This big box contains three small boxes labeled 0, 1, and 2. Inside these three small boxes are the values 111, 230, and 240. 

Here is the same data in CS 240 lab style:

A new memory diagram for the same data. This time, it’s a stack of rectangles, with address numbers to their left and data values inside them, plus variable names in the left margin and arrows and then comments in the right margin. At the bottom, the lowest rectangle has variable name ‘A’, address 0x10, and value 0x30. Its comment reads ’char **’ (said like “char-star-star” or “character pointer pointer”). The next rectangle right above that has variable name ‘B’, address 0x18, value 0x78, and comment “char ” (said like ”char-star” or ”character pointer”). Above those two boxes is an ellipsis before a second pair of boxes. The bottom box of the second pair has address 0x30, value 0x78, and ”char ” as a comment. The top one has address 0x38, value 0x0, and “char * (null)” as a comment. Then there’s a second ellipsis, followed by a single box with address 0x78 that’s divided horizontally into eight squares. The first five squares from the left are blank, and the last three contain the values 0x00, 0x53, and 0x43, from left to right. The comment for this line states “CS” (little-endian). Then there’s another ellipsis, followed by a single box with variable name ‘C’, address 0xA0, value 0x1F8, and comment “int *“. Finally, there’s one more ellipsis with two boxes above it, each divided horizontally into two parts. The lower of these boxes has address 0x1F8 and the second has address 0x200. The lower one has values 230 on the left and 111 on the right, while the upper one is blank on the left and has the value 240 on the right. The lower box has the comment”Two 4-byte ints” while the upper one has the comment “Third 4-byte int”. Finally, there are some arrows near the right margin of the boxes, originating from dots inside certain boxes near their right edges, and terminating outside the right side of other boxes. The arrows are as follows: First, one red arrow from the ‘A’ box with value 0x30, pointing to the box with address 0x30 that contains the value 0x78. Next, two blue arrows, originating from the ‘B’ box with value 0x78 and from the box at 0x30 with value 0x78. These merge together to point to the box with address 0x78 (where the string “CS” is stored). Finally, one purple arrow from the ‘C’ box with value 0x1F8, pointing to the box with address 0x1F8. The same info is available as a table just ahead. 

This table represents the same data (with the same layout) as the new-style diagram, but it may be more accessible to people using a keyboard to navigate and/or using a screen reader.

Variable Address Data Comments
0x200 240 Third int in the array that starts below
0x1F8 230, 111 Two 4-byte int in this 8-byte slot
C 0xA0 0x1F8 Type int *. Points to start of block above.
0x68 0x0, 0x53, 0x43 Three bytes ‘C’ = 0x43, ‘S’ = 0x53, and NUL = 0x0. Little endian.
0x38 0x0 “null” pointer
0x30 0x68 Type char *. Points to start of block above.
B 0x18 0x68 Type char *. Points to start of third block up.
A 0x10 0x30 Type char **. Points to start of next block up.

Notes on the new style of memory diagrams:

  1. You get to make up whatever addresses you want, except that data which is by definition contiguous, like string characters or array entries, must be shown at contiguous addresses.
    • Smaller addresses start at the bottom. Note: You will see some contexts, like gdb “examine” output, where the convention is the opposite
    • When addresses are not contiguous, you should add ellipses to indicate this.
    • In this diagram, A and B did not need to be drawn together, and C was placed separately to illustrate this point. But drawing C next to A and B would have saved some ellipses and would also be fine.
  2. Each box is 64 bits. When multiple smaller values are packed into one box, we sub-divide the box appropriately.
    • Each address is a multiple of 8 bytes, so the last digit in hex is either a 0 or an 8.
    • Little-endian ordering affects where things are placed. The eight individual bytes in each box are addressed from right to left.
  3. Values are numbers, but may be written as hexadecimal (prefixed with 0x) or decimal as appropriate.
    • 1-byte character values can be written as single-quoted letters (not shown here; see below).
    • Interpretations of the data such as string values belong in the “comments” column.
    • Address values should always be written in hex (including pointers).
  4. The arrows always point to the address matching the value in the box. They only exist to help you trace where pointers refer to; they are not actually data stored in the system.

Here is a memory diagram for the commandA variable from the code above:

A memory diagram in the new style. Bottom box has address 0x48, variable name “commandA”, value 0xC0 and comment “char **“. Above it is an ellipsis beyond which are three boxes in a row with addresses 0xC0, 0xC8, and 0xD0. These contain values 0x100, 0x138, and 0x0. The comments are”char “,”char ”, and “char * (null)”. Above those three boxes is another ellipsis, after which is a box with address 0x100 divided horizontally into 8 squares. The comment is “emacs” and after two blank squares on the left, the squares have values 0x00, 0x73, 0x63, 0x61, 0x6D, and 0x65. Above this is a third ellipsis, and above that are two boxes each divided into 8 squares. These have addresses 0x138 and 0x140. The bottom row has comment “strings.c” and the row has comment “continued”. The squares in the bottom row have single-quoted characters in them (from left to right): ‘.’, ‘s’, ‘g’, ‘n’, ‘i’, ‘r’, ‘t’, ‘s’. The top row has 6 blank boxes, followed by a box with ‘\0’ in it and then the rightmost box with ‘c’ in it. There is a red arrow pointing from the very first box at address 0x48 with value 0xC0 to the first box of the second group, with address 0xC0. Blue and purple arrows point from the first two boxes in the second group with values 0x100 and 0x138. These point to the box in the third group and the first box in the top group, respectively, since those have addresses 0x100 and 0x138. A table with this information follows. 

Here’s the equivalent table:

Variable Address Data Comments
0x140 ‘\0’, ‘c’ continued from below
0x138 ‘.’, ‘s’, ‘g’, ‘n’, ‘i’, ‘r’, ‘t’, ‘s’ “strings.c”
0x100 0x00, 0x73, 0x63, 0x61, 0x6D, 0x65 “emacs”
0xD0 0x0 null
0xC8 0x138 Type char *
0xC0 0x100 Type char *
commandA 0x48 0xC0 Type char **

Here are the declarations for commandB:

char** commandB = (char**)malloc( 3 * sizeof(char*) );

commandB[0] = "ls";
commandB[1] = "cs240-pointers";
commandB[2] = NULL;

Draw a memory diagram for commandB similar to the one above, and check it against the table below (note that your addresses will likely be different, which is fine as long as they’re consistent with each other).

Example answer:

Variable Address Data Comments
0xD8 ‘\0’, ‘s’, ‘r’, ‘e’, ‘t’, ‘n’, ‘i’ continued
0xD0 ‘o’, ‘p’, ‘-’, ‘0’, ‘4’, ‘2’, ‘s’, ‘c’ “cs240-pointers”
0xA0 ‘\0’, ‘s’, ‘l’ “ls”
0x50 0x0 null
0x48 0xD0 Type char *
0x40 0xA0 Type char *
commandB 0x20 0x40 Type char **

Predict what you think the output will be after these 3 statements in the program:

commandA[1] = "uh oh";
printf("A: %s %s\n", commandA[0], commandA[1]);
printf("B: %s %s\n", commandB[0], commandB[1]);

Example answer:

Both commandA and commandB will have their second part updated, since commandB is likely allocated right on top of the freed memory for commandA.

Run the program:

$ ./strings3.bin

What happens? Use valgrind and re-examine the complete program to explain the problem (but do not fix the error):

Example answer:

It prints:

A: ls uh oh
B: ls uh oh

As predicted, the memory for commandB is corrupted because we accessed previously-freed memory which overlapped with it.

Use gdb to examine memory and understand the connection between the diagram and real memory.

$ gdb ./strings3.bin

Set a breakpoint at the beginning of the program:

(gdb) break main

What address in memory does main begin (you should be able to read that as a result of setting the breakpoint)?

Example answer:

0x4005cc (much earlier in memory than the stack)

Note that your value may differ slightly.

Run the program within gdb:

(gdb) run

You will hit the breakpoint, meaning you are at the beginning of the main program.

List the instructions:

(gdb) list
7    int main(int argc, char** argv) {
8      char** commandA = (char**)malloc( 3 * sizeof(char*) );
9    
10      commandA[0] = "emacs";
11      commandA[1] = "strings.c";
12      commandA[2] = NULL;
13    
14      free(commandA);

Examine the local variables at this point:

(gdb) info locals

There should be two local variables listed, commandA and commandB.

You can also display the value of a variable directly:

(gdb) print commandA

What is the value of commandA at this point? Why?

Example answer:

gdb should show something like:

$2 = (char **) 0x0

The code hasn’t started to run yet, so the variable does not yet have a value set. It’s not necessarily the case, but it’s a decent bet that variables which have not yet been initialized will be 0.

Execute one step (the first line in main, which allocates space for commandA):

(gdb) step 1
10      commandA[0] = "emacs";

Note: after you execute a step, gdb will display the next line to be executed.

Now print commandA again:

(gdb) print commandA

What has changed and why?

Example answer:

Now we see something like:

$3 = (char **) 0x602010

Now commandA points to some space on the heap, because we’ve called malloc.

Now, examine the contents of memory at commandA (using the /3a argument to the x command to mean “mean display the 3 addresses stored there”):

(gdb) x /3a commandA

You should see something very similar to this (although the actual addresses on your machine may be different):

0x602010:    0x0    0x0
0x602020:    0x0

Why are we interested in 3 addresses – what are you looking at? Why are they all 0 right now?

Example answer: We’re looking at the pointers in the commandA array-of-pointers. We hope they’re all 0 because they’re uninitialized, although they could have any values, because we did not call calloc to allocate + clear the memory, we called malloc to just allocate it.

Execute the next instruction:

(gdb) step 1

Examine commandA again:

(gdb) x /3a commandA
0x602010:    0x400740    0x0
0x602020:    0x0

Note: Remember that gdb displays lower address values first, followed by higher values, which is the opposite of the memory diagrams we just drew.

You could also examine the value with a print command, although it will only display the single value pointed to by commandA1 (not all three pointers/addresses in the array).

What has changed, and why? (Remember that the instruction you just executed is: commandA[0] = "emacs";)

Example answer:

As you might expect, the first pointer in the array of pointers has been assigned a value. It now points to the program data where “emacs” was stored as part of the program’s code.

To examine memory at the address specified by the new value you see, and display a byte in hexadecimal notation, run:

(gdb) x  /bx  *commandA

You can also do this with:

(gdb) print  /x  *commandA[0]

Or,

(gdb) print /x **commandA

Record what you see:

Example answer:

0x400740: 0x65

Again, your exact address may be different.

What does this value represent (remember that commandA is an array of pointers to strings, and that each string can be thought of as an array of characters)?

Example answer: This is the ‘e’ of the string “emacs”

Repeat the command, but display the contents of memory as characters (in this case, the 6 characters in memory starting at the specified address):

(gdb) x /6c *commandA

Examine memory at that same address again, this time formatting the data as a string:

(gdb) x  /s  *commandA

Or, to do the same thing:

(gdb) x  /s  commandA[0]

Or:

(gdb) print commandA[0]

Now, execute another step:

(gdb) step 1

Print the new value:

(gdb) print /a commandA[1]

What has changed? Explain what the new value/address you see represents:

Example answer:

You will see something like:

$5 = 0x400746

This is the memory address of the string “strings.c”, also allocated with the program data.

Examine memory at the new address, formatting the data as a string:

(gdb) print /s commandA[1]

Is it what you expected?

Example answer: Indeed this shows “strings.c”

Execute another step and print the third entry in the commandA array:

(gdb) step 1
(gdb) print  commandA[2]

Has anything changed? What did the last instruction accomplish? (commandA[2] = NULL; )

Example answer:

You will see:

$7 = 0x0

This sets the last part of the commandA array to NULL. If we had junk data in there before, it will be cleared away, but it’s reasonably likely the value was already zero (as shown above). Now it will reliably be zero.

After the next step, you will have executed the free() call:

(gdb) step 1

What does the free do?

Example answer:

Tells malloc that it can re-use the memory we had previously reserved. The first part of that memory may be immediately overwritten by a new value (it was in my testing).

Execute the next step, which initializes commandB, then print commandA and commandB again:

(gdb) step 1
(gdb) info locals

Can you now explain why the program prints “uh oh” twice?

Example answer:

commandB was allocated on top of the free memory that used to be commandA, so changes using the commandA pointer will affect commandB. It’s invalid to use the commandA pointer at all after it’s been freed.

Exit gdb:

(gdb) quit

Now, at the command prompt, run the program using valgrind, and examine the errors produced:

valgrind ./strings3.bin

The output should look like this:

==7698== Memcheck, a memory error detector
==7698== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==7698== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==7698== Command: ./strings3.bin
==7698==
==7698== Invalid write of size 8
==7698==    at 0x40064E: main (strings3.c:22)
==7698==  Address 0x5205048 is 8 bytes inside a block of size 24 free'd
==7698==    at 0x4C2B06D: free (vg_replace_malloc.c:540)
==7698==    by 0x40060E: main (strings3.c:14)
==7698==  Block was alloc'd at
==7698==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==7698==    by 0x4005D5: main (strings3.c:8)
==7698==
==7698== Invalid read of size 8
==7698==    at 0x40065D: main (strings3.c:23)
==7698==  Address 0x5205048 is 8 bytes inside a block of size 24 free'd
==7698==    at 0x4C2B06D: free (vg_replace_malloc.c:540)
==7698==    by 0x40060E: main (strings3.c:14)
==7698==  Block was alloc'd at
==7698==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==7698==    by 0x4005D5: main (strings3.c:8)
==7698==
==7698== Invalid read of size 8
==7698==    at 0x400664: main (strings3.c:23)
==7698==  Address 0x5205040 is 0 bytes inside a block of size 24 free'd
==7698==    at 0x4C2B06D: free (vg_replace_malloc.c:540)
==7698==    by 0x40060E: main (strings3.c:14)
==7698==  Block was alloc'd at
==7698==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==7698==    by 0x4005D5: main (strings3.c:8)
==7698==
A: emacs uh oh
B: ls cs240-pointers
==7698==
==7698== HEAP SUMMARY:
==7698==     in use at exit: 0 bytes in 0 blocks
==7698==   total heap usage: 2 allocs, 2 frees, 48 bytes allocated
==7698==
==7698== All heap blocks were freed -- no leaks are possible
==7698==
==7698== For lists of detected and suppressed errors, rerun with: -s
==7698== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)

Explain the meaning of each of the three errors shown above:

Example answer:

The first error is the write to commandA[1] which was already freed.

The second two errors are for reading commandA[0] and commandA[1] for the print.

Explain how to fix these errors.

Example answer: Move the free call for commandA down to where the free for commandB is.

Extra Exercises: More Pointer Code

If you want extra practice, here are solutions for some of the more complex functions in the practice.c file:

Remember that you can compile and run tests with:

make practice.bin
./practice.bin

You may wish to add | less to the second command to paginate the output. If you do, use ‘q’ to quit pagination when you’re done.

substring

Example answer:

/**
 * Return a pointer to a newly allocated string holding the characters
 * in haystack starting with the character at index start and ending
 * just before the character at index end.
 *
 * Preconditions:
 * - haystack is a valid pointer to a well-formed string.
 * - To begin, assume start and end respect the bounds of haystack:
 *    - start > 0
 *    - start < (length of string) - 1
 *    - end < (length of string) - 1
 *    - end - start >= 0
 *
 * Do not use any other functions besides malloc.
 */
char* substring(char* haystack, int start, int end) {

    int len = end - start; // abusable
    char *result = (char*) malloc((len + 1) * sizeof(char));
    int i;
    for (i = 0; i < len; ++i) {
            result[i] = haystack[start + i];
    }
    result[i] = '\0';
    return result;
}

contains_string

Example answer:

/**
 * Return 1 if the string given by haystack contains the string given
 * by needle as a substring.
 * Return 0 otherwise.
 *
 * Precondition: haystack and needle are both valid pointers to
 * well-formed strings.
 *
 * Do not use any other functions.  There are a few ways to implement
 * this, some more efficient than others.  Start with a simple
 * approach.  Optimize if you have time left over at the end of lab.
 */
int contains_string(char* haystack, char* needle) {
    int matchfrom;
    int matchto;
    int targetlen = string_length_a(needle);
    if (targetlen == 0) {
            return 1;
    }
    int index = 0;
    char this;
       do {
        this = haystack[index];
        index += 1;
        if (this == needle[0]) {
                // Attempt to match the whole string here before proceeding
                char cmp = 'A'; // just to get into the loop
                matchfrom = 1;
                matchto = 0; // because we've already added 1 to index
            while ((matchfrom < targetlen) && cmp) {
                cmp = haystack[index + matchto];
                if (cmp == needle[matchfrom]) {
                    matchfrom += 1;
                    matchto += 1;
                } else {
                        break;
                }
            }
            if (matchfrom == targetlen) {
                return 1;
            }
        }
    } while (this);
    return 0;
}