CS 240: Architecture

Assignment: Architecture

Assign: Wednesday 22 February
Due: Friday 3 March
Policy: Individual graded synthesis assignment
Submit: Upload a PDF written on the cs240-arch-worksheet-s23.pdf submission sheet
Reference:
- Sequential Logic
- A Simple Processor

Assignment: Architecture
Exercises
Submission

Exercises

Please write your answers on the cs240-arch-worksheet-s23.pdf submission sheet to streamline the grading process. Submit your work by scanning a PDF of all worksheet pages (in order) and uploading it to Gradescope. Remember that this course uses your Gradescope account attached to your Wellesley email address (the variety that matches your username). You can merge accounts across multiple email addresses if helpful.

Notes:

This assignment is worth 72 points.
You might want to start with the Problems on the HW ISA and its hardware implementation (Problems 3–6) before you attempt the problems on Sequential Circuits (Problems 1 & 2).

1. Flop-flip-flopping [10 points]

Consider the following circuit that uses falling-edge triggered D flip-flops.

Cycles Completed	Q₂	Q₁	Q₀
0 (initial state)	0	0	0
1
2
…
10

Table template for Flop-flip-flopping part b.

Assume the Clock input, and outputs Q₀, Q₁, and Q₂, are all initially 0.
In one Clock cycle, the Clock is first at level 0 for a duration d, then has a rising edge that jumps up to level 1 for the same duration d, and then ends with a falling edge that jumps back down to 0. On your worksheeet, the waveform for Clock has been given for ten cycles ($t=0$ to $t=10$)

[3] Draw waveforms for Q₀, Q₁, and Q₂ from $t=0$ to $t=10$.
[5] Fill out a table summarizing the values of Q₀, Q₁, and Q₂ after each Clock cycle.
[2] Briefly explain what this circuit does at a high level, treating the Q outputs together. A phrase to a few sentences is enough.

2. Reconstructing Memories (RAM Puzzles) [15 points]

In these exercises, you will show how one or more provided RAMs (random access memories) can be used to implement a RAM with different dimensions that has the same number of stored bits as the provided RAMs.

Provided RAMs

In an A×B RAM, A is the number of addressable locations (cells) in the memory and B is the width of each location (cell) in bits. Recall that a RAM is much like a register file, but the underlying storage technology is not based on flip-flops. Memory addresses are analogous to register numbers in a register file; memory locations (cells) are analogous to registers in a register file. Unlike the register file we designed, these RAMs support reading only one location at a time.

The provided RAMs have the following form:

Each RAM has a Data In port, a Data Out port, an Address input and a Write Enable input.
Each RAM is always reading from the location given by the current Address, expressing its contents on Data Out.
Each RAM also stores the value on Data In into the location given by the current Address iff the Write Enable input carries a 1.
Each RAM is indivisible and must be used as a “black box”.

Rules

You will design new RAMs with the same kind of inputs, outputs, and externally observable behavior, but with different dimensions. For each:

Draw one or more small boxes of the form above, representing the provided RAM(s), inside a larger box of the same form, representing the RAM you are implementing. (The worksheet already has the provided RAM(s) drawn for you inside the larger box that represents the RAM you are implementing.)
Connect inputs and outputs of the larger box with those of the smaller boxes, possibly adding combinational logic building blocks (e.g., muxes, decoders, gates) that make the logic correct.
Do not draw each individual line of a bus (e.g., Address); use hatch marks and width annotations instead. You must specify the number of bits (width of bus) carried by each bus.
When a larger bus splits into or is joined from smaller buses, you must show which bit ranges from the larger bus correspond to the width of each smaller bus, as shown in the examples below:

Exercises

[5 points] Design a 256×8 RAM by adding minimal wiring and logic to two identical 256×4 RAM components. Label the widths (in bits/lines) of the ports Address, Data In, and Data Out for each 256×4 RAM you use and for the 256×8 RAM you implement.
[10 points] Design a 64K×8 RAM by adding minimal wiring and logic to a single provided 16K×32 RAM. (K = “kilo” = 1024)
- Unlike the previous part, this one provided RAM component has the exactly same capacity as the desired final product, though it has different dimensions:
  - The provided RAM has fewer addressable memory locations (16K) than the RAM you will design (64K)
  - The provided RAM has a larger addressable unit of data (32 bits = 4 bytes) than the RAM you will design (8 bits = 1 byte).
- The Data Out of the inner RAM supplies 4 bytes, but the Data Out of the outer RAM can provide only 1 byte. How will you determine which of the 4 bytes is chosen? (Hint: what combinational logic building block chooses between inputs?)
- The Data In of the outer RAM supplies 1 byte, but the Data In of the inner RAM requires 4 bytes.
  - Where will the other 3 bytes come from?
  - When writing a 1-byte value into the inner RAM, you must be sure to change only the relevant 8 bits of storage. All other storage bits should hold exactly the same values as before the write operation. Think carefully about the combinational logic to implement this. (Hint: You will need one or more MUXes; a 2-to-4 decoder is also very useful in a particularly elegant solution, but there are other solutions that do not use a decoder.)
  - If you join four 1-byte buses to create a 4-byte bus, your wiring notation must unambiguously indicate the order (MSB down to LSB) of the four bytes.
- You might find it helpful to study this C program that uses arrays of different dimensions to illustrate in software the key ideas that you will need to solve this RAM puzzle in hardware.

3. A Loopy Program [14 points]

In this exercise you will simulate the execution of an HW ISA program with a loop.

Background

In the solution slides for the HW ISA architecure, there are examples (slides 8-11) of using an execution table to simulate the execution of HW ISA programs, one of which contains a loop.

Here we consider a different program that has a loop to remind you of the conventions used in an execution table. Suppose that register R2 has been initialized with the signed decimal number 3 and register R3 has been initialized with the signed decimal number 7. Also suppose that the program P0 has the following Instruction Memory (IM):

Address	Contents
0x0–0x1	BEQ R2, R3, 3
0x2–0x3	ADD R2, R1, R2
0x4–0x5	SUB R3, R1, R3
0x6–0x7	JMP 0
0x8–0x9	HALT

Then here is an execution table that shows the step-by-step execution of the program P0:

PC	Instruction	State Changes
0x0	BEQ R2, R3, 3	PC ← PC+2 = 0+2 = 2 (because 3 = R[2] != R[3] = 7)
0x2	ADD R2, R1, R2	R[2] ← R[2]+R[1] = 3+1 = 4; PC ← PC+2 = 2+2 = 4
0x4	SUB R3, R1, R3	R[3] ← R[3]-R[1] = 7-1 = 6; PC ← PC+2 = 4+2 = 6
0x6	JMP 0	PC ← 2*0 = 0
0x0	BEQ R2, R3, 3	PC ← PC+2 = 0+2 = 2 (because 4 = R[2] != R[3] = 6)
0x2	ADD R2, R1, R2	R[2] ← R[2]+R[1] = 4+1 = 5; PC ← PC+2 = 2+2 = 4
0x4	SUB R3, R1, R3	R[3] ← R[3]-R[1] = 6-1 = 5; PC ← PC+2 = 4+2 = 6
0x6	JMP 0	PC ← 2*0 = 0
0x0	BEQ R2, R3, 3	PC ← PC+2+(2*3) = 0+2+6 = 8 (because 5 = R[2] == R[3] = 5)
0x8	HALT	program stops executing

Your Task

This task considers a HW ISA program P1 with the following Instruction Memory IM:

Address	Contents
0x00–0x01	ADD R0, R0, R2
0x02–0x03	ADD R1, R1, R3
0x04–0x05	ADD R3, R1, R3
0x06–0x07	ADD R3, R3, R4
0x08–0x09	BEQ R3, R4, 3
0x0A–0x0B	ADD R2, R4, R2
0x0C–0x0D	SUB R4, R1, R4
0x0E–0x0F	JMP 4
0x10–0x11	HALT

[10 points] In the worksheet, create an execution table that shows the execution of program P1. Use the same notational conventions used in the example execution table for P0 above. Any numbers beginning with 0x will be interpreted as hexidecimal; any numbers not beginning with 0x will be interpreted as decimal.
[1.5 points[ Show the final values of the registers R2, R3, and R4 when the program execution halts. Again, any numbers beginning with 0x will be interpreted as hexidecimal; any numbers not beginning with 0x will be interpreted as decimal.
[2.5 points] Translate program P1 (except for the final HALT instruction) into a sequence of statements in one of Python, Java, or JavaScript (you choose!), where one of the statements is a while loop. Your program should include assignments to variables named R2, R3, and R4 that represent the registers with these names in P1. You may assume that the variables R0 and R1 have been initialized to 0 and 1, respectively, before the statements you write.

4. Taking Control (8 points)

In this exercise you will design the Control Unit for the HW microarchitecture we designed in class. (See the figure below labeled “HW microarchitecture datapath”.) Our design focused on connecting the datapath components (e.g., the register file, ALU, memory, etc.), but we left the Control Unit as black box.

Flesh out a truth table with 4-bit HW instruction opcodes as inputs, plus one output for each of the control lines, outputs of the Control Unit that are control inputs to other components. As shown below, the table should have one row for each of the seven instructions (one row for each opcode) except JMP from the first instruction table here.)

Instruction	Opcode_3:0	Reg Write	ALU Op_3:0	Mem Store	Mem	Branch
LW	0000
…	…	…	…	…	…	…

Control Unit truth table

The control lines we need are:

Reg Write (1 bit): controls the write-enable of the register file.
ALU Op (4 bits): controls the ALU (with the same 4 ALU control lines we developed earlier).
- From left to right: Invert A, Negate B, Op₁, Op₀.
Mem Store (1 bit): controls the write-enable of the data memory.
Mem (1 bit): controls three multiplexers deciding which input to provide for:
- the register file’s write address;
- the register file’s write data; and
- the ALU’s second operand.
Branch (1 bit, added during class): controls whether to choose the next PC based on a branch target and test

Please note that Mem Store, Mem, and Branch are all active high (0 = False, 1 = True).

5. Jumping into the Unknown [15 points]

[10 points]

Add logic to the HW microarchitecture depicted in the figure above to implement the JMP instruction, which we did not implement in class. JMP sets the PC to the absolute instruction address given by its argument (a number) multiplied by 2. For example, JMP 3 sets the PC to 0x6, causing the instruction stored at address 0x6 (i.e., the 3rd instruction in the program) to be executed next.
1. [7 points] In the wiring diagram for the microarchitecture on the worksheet, you will add:
  - One new Jump control line from the Control Unit (in Problem 4), carrying 1 if the current instruction is JMP and 0 otherwise.
  - Wiring and logic using the Jump control line and the offset bits from the JMP instruction encoding to store the JMP instructions’s target address to the PC if and only if the current instruction is JMP. You will need to “cut” one existing wire to splice in some new logic.
  Notes:
  - The JMP hardware will need to use a Zero Extend unit that is different from the existing Sign Extend unit.
  - Do not attempt to reuse the existing Shift Left by 1 unit; if you need such a shift, use a separate copy of this unit.
2. [1 point] In the Control Unit Truth Table from the previous problem, flesh out the contents of the the Jump control line in the final column.
3. [2 points] In the Control Unit Truth Table from the previous problem, flesh out the contents of the row for the JMP opcode.
[5 points] Assume R2 and R3 hold unknown input values when the following program starts. R4 will hold an output when the program stops. The address of each instruction in the instruction memory is shown at left. The (new) HALT instruction stops the computer.
```
# R2 and R3 hold program inputs here.
0:  AND R2, R2, R4
2:  AND R3, R3, R5
4:  BEQ R5, R0, 3
6:  SUB R5, R1, R5
8:  ADD R4, R4, R4
A:  JMP 2
C:  HALT # Stops execution.
# What value is in R4?
```
1. [3 points] Execute the program assuming R2 holds 5 and R3 holds 3. What are the final values in the registers R2, R3, R4, and R5 when the program reaches HALT? You may write your ansswers in either decimal or hex, but must use a 0x prefix for hex.
2. [2 points] Try a few more examples with (small) positive numbers in R2 and R3. What does this code do with R2 and R3 to compute R4? Answer with one simple line of valid C code of the form: R4 = exp;, where exp is a simple C expression using the the variables R2 and R3 (corresponding to the HW registers) and any simple C operations that accomplish the same effect as these combined HW instructions. exp must not include C statements like loops or conditionals.

6. Instruction Not Missing (10 points)

The HW ISA introduced in class lacks an instruction for bitwise complement. A logical choice would be NOT Rs,Rd, which takes the bitwise complement of the value in source register Rs and stores into destination register Rd. You will design two implementations of the proposed NOT instruction.

[4 points] One way to implement NOT is to replace any occurence of a NOT instruction by a sequence of existing HW instructions that has the same effect as NOT.

Given source register identifier Rs and destination register identifier Rd, write a sequence of one or more HW instructions that has the same effect as NOT Rs,Rd. The replacement instructions may use values from any registers, but must store values only in Rd. Instructions may not store values in any other register or data memory location. The sequence may use only the instructions defined in class. (See the instruction table here.)

Hint: Think about the relation of bitwise complement and arithmetic rules.
[4 points] A second way to implement NOT is to add a new instruction to the ISA, complete with an unique encoding, and to extend the microarchitecture to implement it. First, implement a related instruction (NAND) in this way.
1. [3] Define a 16-bit encoding of the NAND instruction as in the instruction table here. Choose any unused opcode for the encoding.
2. [1] Add a row to the Control Unit Table from the previous problem to indicate how to control the datapath to implement NAND. Do not add extra logic to the datapath.
[2 points] Consider how to implement NOT using NAND. Define an encoding for the NOT instruction directly in the ISA. Choose the encoding carefully to minimize additions to the Control Unit and the datapath. Assuming you have completed (b), an ideal encoding requires no new rows or columns in the Control Unit table.

Hint: Think of how to translate the NOT instruction to a NAND instruction and use this intuition when designing the bit encoding of NOT.

7 Extra Fun Points Affixed or Afloat in a C of Numbers

Extra Fun (Completely Optional!)

This exercise is completely optional. It will not earn any extra credit points, and is only for your own edification. It requires reading about fixed point and floating point number representations, which will not be covered in class in Fall 2022.

The Sea language (an imaginary programming language otherwise identical to C) supports a signed fixed8ths char type¹ using an 8-bit signed fixed point number system with precision to the 1/8 (eighths) place.
1. What are the minimum and maximum numbers representable by a signed fixed8ths char? Write your answer as a base ten number using fractions or decimals.
2. What are the minimum and maximum numbers representable by a signed fixed32nds char, which uses 8-bit signed fixed-point representation with precision to the 32nds place? Write your answer as a base ten number using fractions or decimals.
3. 8-bit two's complement adder.
  A, B, and Sum are 8-bit buses.
  
  We provide a hardware adder design (at right) that supports 8-bit two’s complement addition (e.g., for signed char numbers). Draw a logic diagram for a hardware adder that supports the signed fixed8ths char number system. You may use the provided two’s complement adder as a black box (no need to draw the internals) and extend it with whatever wiring and logic is necessary. There is a “trick” here! Keep it simple.
Using the 6-bit floating-point encoding defined in class, give decimal notation of the numbers represented by the following floating-point encodings.
1. 110101
2. 100001
3. 011100
4. 000011
5. 010010
6. 111101

Submission

char is often used for characters in C, including with literal character values like 'c', but it is just a number type. It is synonymous with signed char in most implementations, just a signed 8-bit integer. unsigned char is an unsigned 8-bit integer. ↩

Assignment: Architecture

Contents

Exercises

1. Flop-flip-flopping [10 points]

2. Reconstructing Memories (RAM Puzzles) [15 points]

Provided RAMs

Rules

Exercises

3. A Loopy Program [14 points]

Background

Your Task

4. Taking Control (8 points)

5. Jumping into the Unknown [15 points]

6. Instruction Not Missing (10 points)

7 Extra Fun Points Affixed or Afloat in a C of Numbers

Submission