CS 240: Architecture

Assignment: Architecture

Assign: Sunday 16 February
Checkpoint: aim to complete Q1, Q2, and Q4 by by the end of the day Thursday 20 February
Due: Tuesday 25 February
Policy: Individual graded synthesis assignment
Submit: Upload a PDF written on the cs240-arch-worksheet-blank-S25.pdf submission sheet
Reference:

Goals

To practice using combinational, arithmetic, and sequential logic.
To understand how to encode computation with low-level building block including multiplexers, flip-flops, and adders.
To master every piece of a simplified but representative instruction set architecture—the HW ISA.

Time reports

For a shorter version of this assignment in a previous semester, students self-reported the following times:

25% of students spent <= 5 hours.
50% of students spent <= 7 hours.
75% of students spent <= 10 hours.

Assignment: Architecture
- Time reports
Contents
Exercises
Submission

Exercises

Please write your answers on the cs240-arch-worksheet-blank-S25.pdf submission sheet to streamline the grading process. Submit your work by scanning a PDF of all worksheet pages (in order) and uploading it to Gradescope. Remember that this course uses your Gradescope account attached to your Wellesley email address (the variety that matches your username). You can merge accounts across multiple email addresses if helpful.

Notes:

This assignment is worth 103 points.

Q1. Universal Muxification of Gates [10 points]

Recall that a simple MUX (a 2:1 mux) has 3 inputs: a selector input S, and two choice inputs, labeled 0 and 1:

2:1 muxes are universal in the sense that any circuit can be implemented using just 2-to-1 muxes. In this problem, you will show this by using muxes to implement various gates. In each circuit, draw the specified number of 2:1 muxes and implement the given boolean expression. A mux input can be A, B, 0, 1, or the output of another device (e.g., another mux or a decoder). Be sure to label the S, 0, and 1 inputs of every mux.

[1 point] Use one 2:1 mux to implement NOT A (i.e. A').
[1 points] Use one 2:1 mux to implement A AND B (i.e. AB).
[2 points] Use two 2:1 muxes to implement A NOR B.
[3 points] [Independent] Use two 2:1 muxes to implement A XOR B.
[3 points] [Independent] Use one 2:4 decoder and one 2:1 mux to implement A XOR B. Recall that a 2:4 decoder looks like the following, and that only one of the four outputs is 1 (and the other three are 0) as is determined by the output label, which corresponds to the the two input bits $𝐵_1$ $𝐵_0$. E.g., the output labeled 01 is 1 when $𝐵_1$ is 0 and $𝐵_0$ is 1. (Hint: Not all output wires of the decoder need to be used.)

Q2. vALUe Judgments [28 points]

In class and lab we designed an $n$-bit ALU by connecting $n$ 1-bit ALUs. In these exercises, you will improve the 4-bit version of our design.

Summary of our ALU design:

The Carry out from each bit is wired to the Carry in of the next bit, if present.
The full $n$-bit ALU has 4 lines of control input:
- Invert A controls the invert A inputs of all of the 1-bit ALUs.
- Negate B controls:
  - the invert B inputs of all of the 1-bit ALUs; and
  - the carry-in of the least-significant bit ALU.
- Two bits of Operation control the Operation inputs of all of the 1-bit ALUs.

Rules for extending the ALU:

Any additional logic may connect to any existing wires in the ALU and add gates or other building blocks as needed.
Minimize the number of gates or other building blocks you add. Gates such as AND can have > 2 inputs.
Choose the simplest option and clearly label new outputs.

Exercises:

[7 points]
1. [5 points] Design extra 1-bit outputs from the $n$-bit ALU for the four conditions codes. Extend the single ALU diagram provided on the worksheet to draw and label the additional logic required for all of the condition codes (flags):
  - Sign Flag [0.5 points]: 1 if the result is negative, otherwise 0.
  - Carry Flag [0.5 points]: 1 if there is a carry out from the operation, otherwise 0.
  - Zero Flag [1 point]: 1 if all bits of the result are 0, otherwise 0.
  - Overflow Flag [3 points]: 1 if there is a signed overflow, otherwise 0.
2. [2 points] Explain why your implementation of the Overflow Flag is correct
[4 points] Consider the scenario where Invert A = 1, Negate B = 1, and Operation = 10.
1. [2] Write the Result of the ALU in this scenario as an arithmetic expression using $A$ and $B$, e.g., Result = $A + B$ or Result = $\frac{2A}{B}$. Your answer should not involve Boolean/bitwise operators.
2. [2] Write an explanation that does involve Boolean/bitwise operators that explains how you derived the arithmetic expression for the Result in this scenario.
[12 points] Louise Reasoner is trying to design a Less-Than Flag output that computes $A < B$ for the 4-bit two’s-complement values $A$ and $B$. Her first idea is to compute $A - B$ and return a value based on the sign bit of the result: her hope is that a 0 sign bit for the result of the subtraction means $A \geq B$ and a 1 sign bit for the result means $A < B$.
1. [3 points] Give three pairs of 4-bit two’s complement signed values $A$ and $B$ where Louise’s approach correctly calculates $A < B$.
  - In the first pair, both $A$ and $B$ must be positive numbers.
  - In the second pair, both $A$ and $B$ must be negative numbers.
  - In the third pair, one of the two numbers must be positive and the other must be negative.
  Recall that 4-bit signed values are in the range [-8, 7], and that the result $A - B$ must also be in the range [-8, 7] after arithmetic mod 16 is perfomed.
2. [2 points] Give two pairs of values, $A$ and $B$, where Louise’s approach is incorrect in calculating $A < B$ (that is, where her approach either yields 0 when $A$ actually is less than $B$, or 1 when $A$ is not actually less than $B$).
  - In the first pair, $A$ should be positive.
  - In the second pair, $A$ should be negative.
3. [1 point] Describe the key reason(s) why Louise’s proposed approach causes answers from the examples in the previous subpart to be incorrect.
4. [3 points] Using minimal additional logic, help Louise design a 1-bit Less-Than Flag whose value always correctly indicates $A < B$ after $A - B$ is performed. That is, the new Less-Than Flag should have value 1 if $A < B$ and 0 otherwise. Extend the ALU diagram provided on the worksheet to draw and label the additional logic for the Less-Than Flag.
5. [2 points] Explain why your implementation of the Less-Than Flag always gives the correct result.
6. [1 point] Write how to control each of the 4 control lines of our 4-bit ALU to perform $A < B$:
  - Invert A = ?
  - Negate B = ?
  - Operation = ??
[5 points] Design a 1-bit Equality Flag output for the ALU using minimal additional logic. Given inputs $A$ and $B$, this output wire should carry 1 if $A == B$ and 0 otherwise.
1. [3 points] Using the provided sheet, draw and label your design.
2. [1 point] Explain why your design is correct.
3. [1 point] Write how to control each of the 4 control lines of our 4-bit ALU ALU to perform $A$ == $B$:
  - Invert A = ?
  - Negate B = ?
  - Operation = ??

Q3. Flop-flip-flopping [10 points]

Consider the following circuit that uses falling-edge triggered D flip-flops.

Assume the Clock input and values Q₀, Q₁, and Q₂ are all initially 0.

[3 points] In the worksheet diagram for this part, you are given a waveform for a clock with a falling edge at every time 𝑡 between 𝑡=0 to 𝑡=10. Draw waveforms for $Q_0$, $Q_1$, and $Q_2$ from 𝑡=0 to 𝑡=10.
[5 points] Fill out a table summarizing the values of $Q_0$, $Q_1$, and $Q_2$ after each Clock cycle. Note the order that the worksheet table lists $Q_0$, $Q_1$, and $Q_2$; this will help you identify a pattern in the next part.
[2 points] Briefly explain what this circuit does at a high level, treating the $Q$ outputs treating the $Q$ outputs as a single multi-bit value. A phrase to a few sentences is enough. Hint: you have seen a similar circuit behavior in lab.

Q4. Some Loopy Programs [22 points]

In this exercise you will simulate the execution of more HW ISA programs with loops.

Background

As an example, we consider a different program that has a loop to remind you of the conventions used in an execution table (as shown in the solution slides for the HW ISA architecture). Suppose that register R2 has been initialized with the signed decimal number 3 and register R3 has been initialized with the signed decimal number 7. Also suppose that the program P0 has the following Instruction Memory (IM):

Address	Contents
0x0–0x1	BEQ R2, R3, 3
0x2–0x3	ADD R2, R1, R2
0x4–0x5	SUB R3, R1, R3
0x6–0x7	JMP 0
0x8–0x9	HALT

Then below is an execution table that shows the step-by-step execution of the program P0.

Note that the way you should read the meaning of BEQ R1, R2, offset is that the change to the PC, PC <- PC + 2 + offset*2 replaces the non-branch change to the PC (PC <- PC + 2), as we designed in the HW ARCH slides. Specifically, note in the following example that the PC goes from 0 to 8 (not 10) based on the instruction BEQ R2, R3, 3, because PC + 2 + offset*2 = 0 + 2 + 3*2 = 8.

Execution table

PC	Instruction	State Changes
0x0	BEQ R2, R3, 3	PC ← PC+2 = 0+2 = 2 (because R[2] = 3 ≠ 7 = R[3])
0x2	ADD R2, R1, R2	R[2] ← R[2]+R[1] = 3+1 = 4; PC ← PC+2 = 2+2 = 4
0x4	SUB R3, R1, R3	R[3] ← R[3]-R[1] = 7-1 = 6; PC ← PC+2 = 4+2 = 6
0x6	JMP 0	PC ← 2*0 = 0
0x0	BEQ R2, R3, 3	PC ← PC+2 = 0+2 = 2 (because R[2] = 4 ≠ 6 = R[3])
0x2	ADD R2, R1, R2	R[2] ← R[2]+R[1] = 4+1 = 5; PC ← PC+2 = 2+2 = 4
0x4	SUB R3, R1, R3	R[3] ← R[3]-R[1] = 6-1 = 5; PC ← PC+2 = 4+2 = 6
0x6	JMP 0	PC ← 2*0 = 0
0x0	BEQ R2, R3, 3	PC ← PC+2+(2*3) = 0+2+6 = 8 (because R[2] = 5 = R[3])
0x8	HALT	program stops executing

Your Tasks

First, consider a HW ISA program P1 with the following Instruction Memory IM:

Address	Contents
0x0–0x1	ADD R0, R1, R2
0x2–0x3	ADD R1, R1, R3
0x4–0x5	ADD R3, R3, R4
0x6–0x7	BEQ R3, R4, 3
0x8–0x9	ADD R2, R4, R2
0xA–0xB	SUB R4, R1, R4
0xC–0xD	JMP 3
0xE–0xF	HALT

[8 points] In the worksheet, fill in the execution table for program P1. Use the same notational conventions used in the example execution table for P0 above. Any numbers beginning with 0x will be interpreted as hexadecimal; any numbers not beginning with 0x will be interpreted as decimal.
[3 points] Show the final values of the registers R2, R3, and R4 when the program execution halts. Again, any numbers beginning with 0x will be interpreted as hexadecimal; any numbers not beginning with 0x will be interpreted as decimal.
[3 points] Alice has started to translate program P1 (except for the final HALT instruction) into a sequence of statements in a version of C with a 16-bit int type. Fill in the rest of her program with assignments to variables named R2, R3, and R4 that represent the registers with these names in P1 and a while loop. You can use the online C compiler to check your C syntax.
```
// Alice starts with these C statements for program P1
int R0 = 0;
int R1 = 1;
int R2 = R0 + R1;
// Below, fill in the remaining C statements to complete program P1:
```
[5 points] Consider this new loopy program, P2, with the following instruction memory.
```
# R2 and R3 hold program inputs here.
0x0:  AND R2, R2, R4
0x2:  AND R3, R3, R5
0x4:  BEQ R5, R0, 3
0x6:  SUB R5, R1, R5
0x8:  ADD R4, R4, R4
0xA:  JMP 2
0xC:  HALT # Stops execution.
# What value is in R4?
```
Assume R2 and R3 hold input values when the following program starts. When the program stops, R4 will hold an output value that depends on the input values in R2 and R3. The address of each instruction in the instruction memory is shown to its left. The HALT instruction stops the computer.
1. [3 points] Execute the program with the initial state that R2 holds 5 and R3 holds 4 (you do not need to draw out the full execution table for this part).
  
  What are the final values in the registers R2, R3, R4, and R5 when the program reaches HALT? You may write your answers in either decimal or hex, but must use a 0x prefix for hex.
2. [2 points] Try a few more examples with (small) positive numbers in R2 and R3. What does this code do with R2 and R3 to compute R4? Answer with one simple line of valid C code of the form: R4 = exp;, where exp is a simple C expression using the the variables R2 and R3 (corresponding to the HW registers) and any simple C operations that accomplish the same effect as these combined HW instructions. exp must not include C statements like loops or conditionals. It should also not include any function calls.
3. [2 points] Explain why the single C line in the previous subpart calculates the same result as the instructions in program P2.

Q5. Taking Control [9 points]

In this exercise you will design the Control Unit for the HW microarchitecture we designed in class. (Base your design on the figure below labeled “HW microarchitecture datapath”.) Our design focused on connecting the datapath components (e.g., the register file, ALU, memory, etc.), but we left the Control Unit as black box.

Instead of drawing gates for the Control Unit, we will just build out a truth table for the behavior of the control unit. The truth table should take in a 4-bit HW instruction opcodes as input. As output, the truth table should produce control bits for each output control line, as needed for the specific opcode.

The worksheet provides a skeleton of the Control Unit truth table. The output control lines we need are:

Reg Write (1 bit): controls the write-enable of the register file. If an instruction should both read and write to the register file, the write-enable should be 1 (that is, the read data lines still produce the contents of their respective registers even when the write-enable is 1).
ALU Op (4 bits): controls the ALU (with the same 4 ALU control lines we developed earlier; see the end of the Arithmetic slides).
- From left to right: Invert A, Negate B, Op₁, Op₀.
Mem Store (1 bit): controls the write-enable of the data memory.
Mem (1 bit): controls three multiplexers deciding which input to provide for:
- the register file’s write address; and
- the register file’s write data; and
- the ALU’s second operand.
Branch (1 bit, added during class): controls whether to choose the next PC based on a branch target and test.

Your task for this problem is to fill in the rows for each of the following instructions, as shown on the worksheet: SW, ADD, SUB, AND, OR, BEQ.

Notes:

For this problem, all answers should be in binary (not decimal or hexadecimal).
We have completed the row for LW (“load word”) as an example. For LW, the control lines are:
- Opcode: 0000, as shown in the the HW ISA instruction table in slide 14 of the arch slides.
- Reg Write: 1, since loads do write to the register file
- ALU Op: 0010, since for a LW, the design needs to add the offset to a register’s contents.
- Mem Store: 0, since loads do not write to the memory.
- Mem: 1, to control the 3 multiplexers for the register file’s write and the ALU’s second operand.
- Branch: 0, since the design should not branch for the load instruction (LW should use the standard increment-PC-by-2 calculation).
For this problem, ignore the last column (with label Jump), which will be be filled in for a later problem.
For this problem, ignore the rows for instructions NAND and JMP, which will be filled in for later problems.

Q6. Instruction `NOT` Missing (12 points)

The HW ISA introduced in class lacks an instruction for bitwise negation. A logical choice would be NOT Rs,Rd, which takes the bitwise negation of the value in source register Rs and stores it into the destination register Rd.

[4 points] One way to implement NOT is to replace any occurrence of a NOT instruction by a sequence of existing HW instructions that has the same effect as NOT.
1. [1 point] Given a two’s complement signed integer X, express ~X in terms of X and arithmetic on signed integers.
2. [3 points] Based on the answer to the previous subpart, given source register identifier Rs and destination register identifier Rd, write a sequence of one or more HW instructions that has the same effect as NOT Rs,Rd. The replacement instructions may use values from any registers, but must store values only in Rd. Instructions may not store values in any other register or data memory location. The sequence may use only the instructions defined in the instruction table in slide 14 of the arch slides.
[6 points] A second way to implement NOT is to add a new instruction to the ISA, complete with an unique encoding, and to extend the microarchitecture to implement it. First, you will instead implement a related instruction (NAND) in this way.
1. [3 points] Implement a related instruction, NAND, as a new ISA instruction. In the worksheet, fill in a 16-bit encoding of the NAND instruction, following the format shown in the instruction table in slide 14 of the arch slides. You may choose any unused opcode for for your encoding.
2. [3 points] Next, fill in the row labeled NAND of the Control Unit truth table from earlier in the worksheet, to indicate how to control the datapath to implement NAND. Do not not add extra logic to the datapath diagram! Instead, use properties of boolean operations to calculate NAND using the logic already present.
[2 points] Implement NOT as a new ISA instruction in the worksheet by filling in a 16-bit encoding of the NOT instruction. You should effectively translate the the NOT instruction to an appropriate NAND instruction so that no extra logic is needed in the datapath diagram and no extra row is needed in the Control Unit truth table.

Q7. Jumping into the Unknown [12 points]

Add logic to the HW microarchitecture to implement the JMP instruction, which we did not implement in class. JMP sets the PC to the absolute instruction address given by its argument (a number) multiplied by 2. For example, JMP 3 sets the PC to 0x6, causing the instruction stored at address 0x6 (i.e., the 3rd instruction in the program) to be executed next.

[8 points] In the wiring diagram for the microarchitecture datapath on the worksheet, you will add:
- One new Jump control line from the Control Unit (in Problem 4), carrying 1 if the current instruction is JMP and 0 otherwise.
- Wiring and logic using the Jump control line and the offset bits from the JMP instruction encoding to correctly update the PC if the current instruction is JMP. Note that on the worksheet, we have added the red wire and extra space toward the top left of the microarchitecture diagram for you to draw and label your new logic.
Notes:
- Be sure to label the red wire with the bit range [??, ??] that indicates which range of bits are used for the red wire (as done in the diagram for other operands).
- The JMP hardware will need to use a Zero Extend unit that is different from the existing Sign Extend unit. A Zero Extend unit will add extra zeroes at the front of a number to make it the correct length as indicated by the bus size of the output of the unit.
- Do not attempt to reuse the existing Shift unit; if you need such a shift, use a separate copy of the Shift unit.
[1 point] In the Control Unit truth table from earlier in the worksheet, flesh out the contents of the the Jump control line in the final column.
[3 points] In the Control Unit truth table from earlier in the worksheet, flesh out the contents of the row for the JMP opcode. Note that if the bits for an entry don’t matter, you should say this explicitly