CS 240 Lab 6: C Programming & Pointers

Peter Mawhorter

How do Computers Work?

How do CPUs Work?

  • Power flows through chips, creating patterns of activation driven by a clock and determined by logic gates.
  • Flip-flops allow information to be stored and then updated on a clock edge, allowing circuits to compute values that change every tick.
  • An ALU performs basic operations, a register file stores operands & results, and RAM stores lots of data.
  • The control unit synchronizes the register file, ALU, RAM, and PC logic so that they can implement the basic operations in the instruction set architecture.

How do CPUs Represent Stuff?

  • High/low voltage patterns represent numbers in binary.
    • These are stored in arrays of flip-flops called registers.
    • Random Access Memory provides bigger/slower storage.
    • A sequence of data in RAM could be text or even a program.
  • The computer performs basic operations with these numbers, like addition, using an Arithmetic Logic Unit.
    • All other operations are just a series of basic operations.

What Controls the ALU?

  • Hardware instructions composed of an Operation Code & register targets control what the computer does each cycle.
    • Hardware instructions are stored in program memory and we normally advance one instruction per tick.
  • An instruction set architecture specifies what hardware instructions are available and what they do.
    • Assembly language is a way of writing down machine instructions in a (just slightly) more human-readable format.

But what about?

  • Where do machine instructions come from?
  • How is code in a language like C translated into machine instructions?

Short answer: the compiler handles this for us.

Outline

CPU Review

The datapath for the HW architecture. At the left, we start with the PC, which is a single register and which gets input from a line we’ll get to later called “PC input”. The PC feeds into the read address input of instruction memory, which outputs an instruction (16 bits) to the right. The PC also feeds into an adder whose other input is the number 2, and this gets forwarded to the branch-equal (or BEQ) mechanism. Back to the instruction, it gets split into bits 0-3, 4-7, 8-11, and 12-15. Bits 12-15 go to the control unit, drawn as a circle, which generates “Mem,” “Branch,” “Mem store,” “Reg Write,” and “ALU Control” outputs. We’ll see those pop up later, for now going back to the Instruction Memory outputs, bits 0-3 and 4-7 go into a 2x1 mux which feeds the write address of the register file, and which is selected by the “Mem” control signal. Bits 0-3 also bypass the Register file, go through a sign extender to become 16 bits, which we’ll call “offset.” This offset feeds into another 2x1 mux for the second ALU operand that’s also controlled by the “Mem” signal, as well as going to a shift-left-by-1 to become the second input to the first adder in the BEQ mechanism mentioned before. This BEQ mechanism adds the PC + 2 to the offset and then selects between that result and the original PC + 2 result using a 2x1 mux which is selected by an AND result combining the “Branch” control signal and the “Zero” output of the ALU. The result of that mux is the PC input mentioned right at the beginning. That wraps up destinations for the “offset,” backing up to the instruction bits, bits 4-7 also go to the Read address 2 for the Register file, while bits 8-11 go to the Read address 1. The only other Register file input is the Write Data, but we’ll get to that later. With two read addresses specified, along with a write address, the register file takes the “Reg Write” control signal as its “Write Enable” input, and outputs “Read data 1” and “Read data 2.” Read data 1 is 16 bits and goes straight in as the first operand of the ALU. Read data 2 (also 16 bits) is the second input to the mux described earlier that feeds the second ALU operand (whose other input if you recall is the sign-extended bits 0-3 of the instruction itself). Read data 2 is also forwarded to the Data Memory as its “Write Data” input (more on that in a second). This takes care of the outputs of the register file; with two operands to work with, the ALU also gets the ALU control signal from the control unit, and produces the aforementioned “Zero” output plus a 16-bit result. The ALU result serves as input to the Data Memory “Address” input, and also goes to a 2x1 mux below data memory which connects back to the “Write Data” input of the register file we mentioned earlier. That mux is controlled by the “Mem” signal from the control unit, and its other input is the “Read Data” output of the Data Memory, so either the ALU result or a piece of data whose address is specified by the ALU result will get written to a register file (if “Write Enable” is on, of course). As mentioned previously, the Data Memory also gets Write Data input from the Register File’s Read Data 2 output. It’s third and final input is a “Write Enable” which comes from the “Mem store” control unit signal. As already mentioned, its only output (“Read Data”) feeds into a mux and (if selected) back to the register file’s write input. And that’s the end of the datapath! 

The CPU connections listed as a table:

Component Inputs Outputs
PC
  • Next Address from BEQ Logic
  • PC to read address of Instruction Memory
  • PC + 2 to next address for BEQ Logic
Instruction Memory
  • Read Address from PC
  • 16-bit Instruction splits into
    • Bits 12-15 go to the Control Unit
    • Bits 8-11 go to Read Addr 1 of Register File
    • Bits 4-7 go to:
      • Read Addr 2 of Register File
      • Register File Write Addr Mux
    • Bits 0-3 go to:
      • Register File Write Addr Mux
      • Sign extend to 16 bits and then:
        • ALU 2nd Operand Mux
        • Shift left 1 into Offset for BEQ Logic
Control Unit
  • Bits 12-15 of Instruction
  • Reg Write signal goes to Write Enable of Register File
  • ALU Control signal goes to control inputs of ALU
  • Branch signal goes to BEQ Logic Branch input
  • Mem Store signal goes to Write Enable of Data Memory
  • Mem signal goes to:
    • Register File Write Addr Mux
    • ALU 2nd operand Mux
    • ALU/Memory result Mux
Register File
  • Read Addr 1 from bits 8-11 of Instruction
  • Read Addr 2 from bits 4-7 of Instruction
  • Write Addr from Register File Write Addr Mux (either bis 0-3 or 4-7 of instruction)
  • Write Data from Final Result Mux (either ALU Result or Data Memory Read Data output)
  • Write Enable from Reg Write output of Control Unit
  • Read Data 1 goes to ALU 1st Operand
  • Read Data 2 goes to:
    • The ALU 2nd operand Mux (along with sign-extended instruction bits)
    • The Write Data input for Data Memory
Arithmetic Logic Unit (ALU)
  • 1st operand from Read Data 1 of the Register File
  • 2nd operand from the ALU 2nd operand Mux (either Read Data 2 from the Register File or bits 0-3 of the instruction after sign extension to 16 bits)
  • ALU Control output from Control Unit
  • 16-bit result goes to:
    • Address input for Data Memory
    • Final Result Mux (along with Read Data from Data Memory)
  • Zero flag goes to Zero input to BEQ Logic
Data Memory (RAM)
  • Read Address from ALU result
  • Write Data from Register file Read Data 2 output
  • Write Enable from Control Unit Mem Store signal
  • Read Data result goes to Final Result Mux (along with ALU result)
BEQ Logic
  • Next Address from PC + 2
  • Offset from bits 0-3 of instruction after sign extend and shift left by 1
  • Branch from Control Unit
  • Zero from ALU Zero Flag
  • Adds PC + 2 and Offset inputs and then selects that result or the original PC + 2 result using an AND between the Branch and Zero inputs. Takes this result as the Next Address and feeds it back to the PC.

An image of the table below showing the assembly syntax, meaning, and encoding for each instruction in the HW ISA. The Opcode, Rs, Rt, and Rd values together are encoded in 16 bits (4 bits = 1 hex digit each). The opcode bits are the most significant bits, while the Rd bits are least significant. Note: The JMP offset is unsigned. All other offsets are signed. 

Assembly Syntax Meaning
(R = register file, M = data memory)
Opcode Rs Rt Rd
ADD Rs, Rt, Rd R[d] ← R[s] + R[t] 0010 s t d
SUB Rs, Rt, Rd R[d] ← R[s] - R[t] 0011 s t d
AND Rs, Rt, Rd R[d] ← R[s] & R[t] 0100 s t d
OR Rs, Rt, Rd R[d] ← R[s] | R[t] 0101 s t d
LW Rt, offset(Rs) R[t] ← M[R[s] + offset] 0000 s t offset
SW Rt, offset(Rs) M[R[s] + offset] ← R[t] 0001 s t offset
BEQ Rs, Rt, offset If R[s] == R[t] then
PC ← PC + 2 + offset * 2
0111 s t offset
JMP offset PC ← offset * 2 1000 offset
HALT Stops the program 1111 ignored

Code ↔︎ Assembly

  • Variable ≈ Register OR memory address
    • Complier uses 3-step (load + operate + store) with variables stored in memory
  • Function ≈ Chunk w/ jumps in & out
    • Arguments → Placed on stack by caller
    • Return location → Placed on stack by caller
    • Local variables → Placed on stack if needed
    • Newly created objects → Allocated on heap
    • Dedicated register for current stack location
  • if/while/catch/etc. ≈ BEQ ± JMP

Editing Code

Things you should be able to do in your favorite editor:

  • Go to function by name
  • Select a word/line
  • Copy/paste code precisely
  • (Un)comment blocks
  • Find/replace words
  • Autocomplete names
  • See code warnings
  • Autoindent/fix indentation

Bonus: do all of these without moving your hands off the home row.

printf

#include <stdio.h> gets you printf

printf("%d, %s, %#x\n", 240, "CS 240", 240);

Prints:

240, CS 240, 0xf0

Pointers

DON’T BE FOOLED

IN C THERE ARE ONLY POINTERS

An image of the table below showing the assembly syntax, meaning, and encoding for each instruction in the HW ISA. The Opcode, Rs, Rt, and Rd values together are encoded in 16 bits (4 bits = 1 hex digit each). The opcode bits are the most significant bits, while the Rd bits are least significant. Note: The JMP offset is unsigned. All other offsets are signed. 

Assembly Syntax Meaning
(R = register file, M = data memory)
Opcode Rs Rt Rd
ADD Rs, Rt, Rd R[d] ← R[s] + R[t] 0010 s t d
SUB Rs, Rt, Rd R[d] ← R[s] - R[t] 0011 s t d
AND Rs, Rt, Rd R[d] ← R[s] & R[t] 0100 s t d
OR Rs, Rt, Rd R[d] ← R[s] | R[t] 0101 s t d
LW Rt, offset(Rs) R[t] ← M[R[s] + offset] 0000 s t offset
SW Rt, offset(Rs) M[R[s] + offset] ← R[t] 0001 s t offset
BEQ Rs, Rt, offset If R[s] == R[t] then
PC ← PC + 2 + offset * 2
0111 s t offset
JMP offset PC ← offset * 2 1000 offset
HALT Stops the program 1111 ignored

Pointers

  • In an instruction, we use 4 bits to mean “which register,” OR “which address,” OR “how far to offset?” The OR and LW lines from the HW ISA instructions slide, with a highlights on the second and last columns. In the second column, the “R[s]” part which is used as an operand for OR but as the base address to read from for LW is highlighted. In the last column the highlight shows that for OR, the “Rd” value is ‘d’ which indicates a register to store into, while for LW, the “Rd” value is instead an offset. In both cases an arrow points points between the two entries with a question mark beside it. 
  • Consider the instructions
    5112 (OR R1, R1, R2) and
    0121 (LW R2, 1(R1))…
    • The 1s mean different things.

Pointers

  • A pointer is a variable which holds an address
    • Use the * operator to dereference it and talk about what’s at that address (can both read & write there)
    • Use the & operator to get the address of something which can be stored in a pointer
  • A pointer’s type determines the size of things it points to
    • Adding or subtracting from a pointer moves it that far (e.g., int = 4 bytes, so for int *p, p + 1 is 4 bytes farther)
    • void * when we don’t know what we’re pointing to
      (avoid this)

Pointers

  • The C compiler understands normal values and pointers. Both arrays and strings are implemented using pointers.
  • int vec[] is really just int *vec
  • Strings have type char *

Arrays?

int x[3] = {1, 2, 3};
printf("%d, %d, %d\n", x[0], x[1], x[2]);
x[2] += 2;
printf("%d\n", x[2]);

Prints:

1, 2, 3
5

Arrays?

int x[3] ...

Declare a variable x of type “array-of-int”
Actually: type int *, or pointer-to-int

... = {1, 2, 3};

Put the numbers 1, 2, 3 into that array
Actually: allocate space for three numbers as part of the program, and store the address of the first one in x

Arrays?

int x[3] = {1, 2, 3};
printf("%p\n", x);

Prints (%p is the format for a pointer):

0x7fffc5b375bc

This is the memory address of the first number

Arrays?

int x[3] = {1, 2, 3};
printf("%p\n", x);
printf("%d\n", *(int*)0x7fffc5b375bc);

Prints:

0x7ffff6755d7c
Segmentation fault (core dumped)

??? Addresses change each time you run it, for security

Arrays?

int x[3] = {1, 2, 3};
printf("%p\n", x);
printf("%d, %d, %d\n", *x, *(x + 1), *(x + 2));

Prints:

0x7ffff6755d7c (or something like it)
1, 2, 3

*(x + 1) ??? Add 1 to address of x, get value there

Arrays?

  • Array int x[] is really a pointer int *x
    • Empty [] means unspecified size.
    • Just points to the first element.
    • Can add to it to get a farther pointer.
  • Expression x[2] is really just *(x + 2)
  • What if we need to create a new array?
    • Use malloc to ask for some bytes

malloc

  • Stands for “Memory ALLOCate”
  • Puts your stuff on the “heap”
  • Allocates a specific number of bytes
    • Use sizeof to figure out # of bytes based on # of entries of a specific type
    • You need to free them later, or you get a memory leak
    • Result is void *, we cast to what we need
      (cast using (type) value)
int x[] = (int*) malloc(length * sizeof(int));

Strings?

  • Does C support strings?
    • Ha ha ha no.
    • But I can write char *string = "Some words."?

char *string

char *

Strings?

  • Strings are just pointers to characters.
  • Points to first character.
    • Up to your code to read the rest.
    • Strings include “NUL byte” = 0x00 at end.
      • Except when you mess up and they don’t :(
char *string = "Hello world!";
printf("%c\n", *string);
H

Strings?

char *string = "Hello world!";
printf("%s\n", string);  // note no '*'
while (*string != 0) {   // until the NUL byte
    printf(" %c", *string);  // space before
    string += 1;  // point to next
}
printf("\n");
printf("%d\n", *string); // as decimal
Hello world!
 H e l l o   w o r l d !
0

Strings

  • There are functions for dealing with strings in string.h
    (don’t use them for this lab)
  • Support for Unicode is pretty DIY :(
  • Lots of “sharp edges” particularly around user input

Arrays? Strings?

 

Arrays? Strings?

 

Tools

  • gdb and valgrind
    • gdb stands for “Gnu DeBugger”
    • valgrind after the gates of Valhalla (“grinned,” not “grind”)

gdb

  • Use gdb to debug a C program compiled with -g:
    • Pause and go step-by-step
    • Set breakpoints to stop at & continue
    • Print out variable values
    • Modify values on the fly

valgrind

  • Use valgrind ‘memcheck’ mode to check for memory errors during execution:
    • Prints messages when invalid memory is written or read
    • Gives summary of leaked memory at end of program

More Info