

# A Simple Processor

- 1. A simple Instruction Set Architecture
- 2. A simple microarchitecture (implementation): Data Path and Control Logic





https://cs.wellesley.edu/~cs240/



### Program, Application

Programming Language

Compiler/Interpreter

**Operating System** 

**Instruction Set Architecture** 

Microarchitecture

**Digital Logic** 

Devices (transistors, etc.)

Solid-State Physics











### **ISAs** are an abstract model of the underlying hardware.

### This week:

HW ISA

An example ISA and hardware implementation for CS240





# HW ISA Summary (details to follow)

#### Registers

- Register size = 16 bits
- Number of registers = 16
- R0 always holds 0
- R1 always holds 1.

### Memory

- Access 16 bits at once
- Byte-addressable (new address every 8 bits)

Word size = 16 bits (2 bytes)



ALU computes on 16-bit values.

Instruction Fetch and Decode

- Instructions are 16 bits in size
- Stored in separate memory
- **Program counter (PC)** register holds address of next instruction



#### **R:** Register File





## Using your understanding of the powers of 2 needed to make selections, how many bits should be on the labeled busses?

- r = 8, w = 8
- r = 16, w = 16
- r = 4, w = 16
- r = 16, w = 4
- None of the above






## **R:** Register File





Word size = 16 bits, # registers = 16
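The bus widths follow from these two parameters. A minimal sketch, assuming `r` labels the register-select inputs and `w` labels the data busses (the figure labels themselves are not recoverable here, so both names are assumptions):

```python
import math

NUM_REGISTERS = 16   # from the HW ISA summary
WORD_SIZE_BITS = 16  # from the HW ISA summary

# Selecting one of N registers takes log2(N) bits.
r = int(math.log2(NUM_REGISTERS))
# Each data bus carries one full word.
w = WORD_SIZE_BITS

print(f"r = {r}, w = {w}")  # r = 4, w = 16
```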

[Figure: register file diagram showing its read ports]

We'll think of the register file like this:

Abstraction!

R0 always holds hardcoded 0

R1 always holds hardcoded 1

R2 – R15: general purpose (instructions can use them to hold anything)

| Reg | Contents |
|-----|----------|
| R0  | 0x0000   |
| R1  | 0x0001   |
| R2  |          |
| R3  |          |
| R4  |          |
| R5  |          |
| R6  |          |
| R7  |          |
| R8  |          |
| R9  |          |
| R10 |          |
| R11 |          |
| R12 |          |
| R13 |          |
| R14 |          |
| R15 |          |





#### Memory is byte-addressable, accesses full words (16 bits)

**Memory** is "Little Endian": the "little" (low) byte is stored at the lower address.

Example: storing 1 at address 0x0 yields 0x01 at address 0x0 and 0x00 at address 0x1.

We'll think of the data memory like this:

| Address   | Contents |      |
|-----------|----------|------|
| 0x0 – 0x1 | 0x01     | 0x00 |
| 0x2 – 0x3 |          |      |
| 0x4 – 0x5 |          |      |
| 0x6 – 0x7 |          |      |
| 0x8 – 0x9 |          |      |
| 0xA – 0xB |          |      |
| 0xC – 0xD |          |      |
| •••       |          |      |
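The little-endian layout can be sketched with a list of bytes. This is an illustration only: the helper names `store_word` and `load_word` and the 16-byte memory size are assumptions, not part of the HW ISA.

```python
# Byte-addressable data memory: one list entry per byte (size chosen arbitrarily).
memory = [0x00] * 16

def store_word(mem, addr, value):
    """Store a 16-bit value at addr in little-endian order."""
    mem[addr] = value & 0xFF             # low byte goes to the lower address
    mem[addr + 1] = (value >> 8) & 0xFF  # high byte goes to the next address

def load_word(mem, addr):
    """Load the 16-bit little-endian word that starts at addr."""
    return mem[addr] | (mem[addr + 1] << 8)

store_word(memory, 0x0, 1)
print([hex(b) for b in memory[0:2]])  # ['0x1', '0x0']
print(hex(load_word(memory, 0x0)))    # 0x1
```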



## HW ISA IM: Instruction Memory

#### Instructions are 1 word in size.

#### Separate *instruction memory*.

### **Program Counter (PC) register**

holds address of next instruction to execute.





We'll think of the instruction memory like this:

| Address   | Contents |
|-----------|----------|
| 0x0 - 0x1 |          |
| 0x2 – 0x3 |          |
| 0x4 – 0x5 |          |
| 0x6 – 0x7 |          |
| 0x8 – 0x9 |          |
| •••       |          |



## HW ISA



#### **Abstract Machine**

#### M: Data Memory

| Address   | Contents |
|-----------|----------|
| 0x0 – 0x1 |          |
| 0x2 – 0x3 |          |
| 0x4 – 0x5 |          |
| 0x6 – 0x7 |          |
| 0x8 – 0x9 |          |
| 0xA – 0xB |          |
| 0xC – 0xD |          |
| •••       |          |

#### **PC:** Program Counter

#### Processor Loop

- 1. ins  $\leftarrow$  IM[PC]
- 2.  $PC \leftarrow PC + 2$
- 3. Do ins

#### **IM:** Instruction Memory

| Address   | Contents |
|-----------|----------|
| 0x0 - 0x1 |          |
| 0x2 – 0x3 |          |
| 0x4 – 0x5 |          |
| 0x6 – 0x7 |          |
| 0x8 – 0x9 |          |
| •••       |          |

#### **R:** Register File

| Reg | Contents |
|-----|----------|
| R0  | 0x0000   |
| R1  | 0x0001   |
| R2  |          |
| R3  |          |
| R4  |          |
| R5  |          |
| R6  |          |
| R7  |          |
| R8  |          |
| R9  |          |
| R10 |          |
| R11 |          |
| R12 |          |
| R13 |          |
| R14 |          |
| R15 |          |



## HW ISA Instructions

16-bit encoding, MSB to LSB: Opcode, Rs, Rt, Rd (4 bits each).

| Assembly Syntax          | Meaning (R = register file, M = data memory)                | Opcode | Rs       | Rt  | Rd       |
|--------------------------|-------------------------------------------------------------|--------|----------|-----|----------|
| ADD R*s*, R*t*, R*d*     | $R[d] \leftarrow R[s] + R[t]$                               | 0010   | *s*      | *t* | *d*      |
| SUB R*s*, R*t*, R*d*     | $R[d] \leftarrow R[s] - R[t]$                               | 0011   | *s*      | *t* | *d*      |
| AND R*s*, R*t*, R*d*     | $R[d] \leftarrow R[s] \& R[t]$                              | 0100   | *s*      | *t* | *d*      |
| OR R*s*, R*t*, R*d*      | $R[d] \leftarrow R[s] \mid R[t]$                            | 0101   | *s*      | *t* | *d*      |
| LW R*t*, *offset*(R*s*)  | $R[t] \leftarrow M[R[s] + offset]$                          | 0000   | *s*      | *t* | *offset* |
| SW R*t*, *offset*(R*s*)  | $M[R[s] + offset] \leftarrow R[t]$                          | 0001   | *s*      | *t* | *offset* |
| BEQ R*s*, R*t*, *offset* | If $R[s] == R[t]$ then<br>$PC \leftarrow PC + 2 + offset*2$ | 0111   | *s*      | *t* | *offset* |
| JMP *offset*             | $PC \leftarrow offset*2$                                    | 1000   | *offset* (12 bits, spanning the Rs, Rt, and Rd fields) | | |
| HALT                     | Stops program execution                                     | 1111   |          |     |          |

JMP offset is unsigned. All other offsets are signed.
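The processor loop and the instruction table together can be sketched as a small interpreter. This is an illustrative model, not the course's implementation: the function names, the `max_steps` safety limit, dropping writes to R0 and R1, and initializing unspecified registers to 0 are all assumptions.

```python
ALU_OPS = {
    0b0010: lambda a, b: a + b,  # ADD
    0b0011: lambda a, b: a - b,  # SUB
    0b0100: lambda a, b: a & b,  # AND
    0b0101: lambda a, b: a | b,  # OR
}

def sign_extend4(field):
    """Interpret a 4-bit field as a signed two's-complement value (-8..7)."""
    return field - 16 if field & 0b1000 else field

def write_reg(regs, index, value):
    """Write a 16-bit value; writes to hardcoded R0 and R1 are dropped (assumption)."""
    if index > 1:
        regs[index] = value & 0xFFFF

def run(im, regs, mem, max_steps=1000):
    """Execute the 16-bit words in `im` until HALT, updating regs and mem in place."""
    pc = 0
    for _ in range(max_steps):   # safety limit, not part of the ISA
        ins = im[pc // 2]        # 1. ins <- IM[PC] (one list entry per 2-byte word)
        pc += 2                  # 2. PC <- PC + 2
        op = (ins >> 12) & 0xF   # 3. Do ins
        s, t, d = (ins >> 8) & 0xF, (ins >> 4) & 0xF, ins & 0xF
        if op == 0b1111:         # HALT
            return
        if op == 0b1000:         # JMP: unsigned 12-bit offset
            pc = (ins & 0xFFF) * 2
        elif op == 0b0111:       # BEQ: pc already holds PC + 2
            if regs[s] == regs[t]:
                pc += sign_extend4(d) * 2
        elif op == 0b0000:       # LW: little-endian load
            addr = (regs[s] + sign_extend4(d)) & 0xFFFF
            write_reg(regs, t, mem[addr] | (mem[addr + 1] << 8))
        elif op == 0b0001:       # SW: little-endian store
            addr = (regs[s] + sign_extend4(d)) & 0xFFFF
            mem[addr] = regs[t] & 0xFF
            mem[addr + 1] = (regs[t] >> 8) & 0xFF
        elif op in ALU_OPS:
            write_reg(regs, d, ALU_OPS[op](regs[s], regs[t]))
        else:
            raise ValueError(f"unknown opcode {op:04b}")
    raise RuntimeError("step limit reached without HALT")

# ADD R1, R1, R2 ; SW R2, 4(R0) ; HALT
program = [0x2112, 0x1024, 0xF000]
regs = [0, 1] + [0] * 14
mem = bytearray([0x0F, 0x00, 0x04, 0x01] + [0] * 12)
run(program, regs, mem)
print(hex(regs[2]), mem[4], mem[5])  # 0x2 2 0
```

Instruction memory is modeled as a list of words indexed by `pc // 2`, which keeps the byte-addressed PC of the ISA while avoiding a second byte array.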







## **Exercise 0**

HW ISA

Fill in the rest of the machine state based on this initial state

#### M: Data Memory

| Address   | Contents |      |
|-----------|----------|------|
| 0x0 – 0x1 | 0x0F     | 0x00 |
| 0x2 – 0x3 | 0x04     | 0x01 |
| 0x4 – 0x5 |          |      |
| 0x6 – 0x7 |          |      |
| 0x8 – 0x9 |          |      |
| 0xA – 0xB |          |      |
| 0xC – 0xD |          |      |
| •••       |          |      |

#### **PC:** Program Counter

#### Processor Loop

- 1. ins  $\leftarrow$  IM[PC]
- 2.  $PC \leftarrow PC + 2$
- 3. Do ins

#### **IM:** Instruction Memory

| Address   | Contents       |
|-----------|----------------|
| 0x0 - 0x1 | ADD R1, R1, R2 |
| 0x2 – 0x3 | SW R2, 4(R0)   |
| 0x4 – 0x5 | HALT           |
| 0x6 – 0x7 |                |
| 0x8 – 0x9 |                |
| •••       |                |


#### **R:** Register File

| Reg | Contents |
|-----|----------|
| R0  | 0x0000   |
| R1  | 0x0001   |
| R2  |          |
| R3  |          |
| R4  |          |
| R5  |          |
| R6  |          |
| R7  |          |
| R8  |          |
| R9  |          |
| R10 |          |
| R11 |          |
| R12 |          |
| R13 |          |
| R14 |          |
| R15 |          |



## Execution Table for *Exercise #0* (shows step-by-step execution) Solutions

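| PC  | Instr          | State Changes                                           |
|-----|----------------|---------------------------------------------------------|
| 0x0 | ADD R1, R1, R2 | R[2] ← R[1] + R[1] = 1 + 1 = 2; PC ← PC+2 = 0+2 = 2     |
| 0x2 | SW R2, 4(R0)   | M[R[0] + 4] = M[4] ← R[2] = 0x0002; PC ← PC+2 = 2+2 = 4 |
| 0x4 | HALT           | Program execution stops                                 |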







### **Exercise 0 Solutions**



#### M: Data Memory

| Address   | Contents |      |
|-----------|----------|------|
| 0x0 - 0x1 | 0x0F     | 0x00 |
| 0x2 – 0x3 | 0x04     | 0x01 |
| 0x4 – 0x5 | 0x02     | 0x00 |
| 0x6 - 0x7 |          |      |
| 0x8 – 0x9 |          |      |
| 0xA – 0xB |          |      |
| 0xC - 0xD |          |      |
| •••       |          |      |

#### **PC:** Program Counter

#### **Processor Loop**

- 1. ins  $\leftarrow$  IM[PC]
- 2.  $PC \leftarrow PC + 2$
- 3. Do ins

#### **IM:** Instruction Memory

| Address   | Contents       |
|-----------|----------------|
| 0x0 - 0x1 | ADD R1, R1, R2 |
| 0x2 – 0x3 | SW R2, 4(R0)   |
| 0x4 – 0x5 | HALT           |
| 0x6 – 0x7 |                |
| 0x8 – 0x9 |                |
| •••       |                |

#### **R:** Register File

| Reg | Contents |
|-----|----------|
| R0  | 0x0000   |
| R1  | 0x0001   |
| R2  | 0x0002   |
| R3  |          |
| R4  |          |
| R5  |          |
| R6  |          |
| R7  |          |
| R8  |          |
| R9  |          |
| R10 |          |
| R11 |          |
| R12 |          |
| R13 |          |
| R14 |          |
| R15 |          |





### **Exercise 1 Solutions**



### M: Data Memory

| Address   | Contents |      |
|-----------|----------|------|
| 0x0 - 0x1 | 0x0F     | 0x00 |
| 0x2 – 0x3 | 0x04     | 0x01 |
| 0x4 – 0x5 | 0x04     | 0x00 |
| 0x6 – 0x7 |          |      |
| 0x8 – 0x9 |          |      |
| 0xA – 0xB |          |      |
| 0xC - 0xD |          |      |
| •••       |          |      |

#### **PC:** Program Counter

#### **Processor Loop**

- 1. ins  $\leftarrow$  IM[PC]
- 2.  $PC \leftarrow PC + 2$
- 3. Do ins

#### **IM:** Instruction Memory

| Address   | Contents       |
|-----------|----------------|
| 0x0-0x1   | LW R3, 0(R0)   |
| 0x2 - 0x3 | LW R4, 2(R0)   |
| 0x4 – 0x5 | AND R3, R4, R5 |
| 0x6 – 0x7 | SW R5, 4(R0)   |
| 0x8 – 0x9 | HALT           |
| •••       |                |

#### **R:** Register File

| Reg | Contents   |
|-----|------------|
| R0  | 0x0000     |
| R1  | 0x0001     |
| R2  |            |
| R3  | 0x000F     |
| R4  | **0x0104** |
| R5  | 0x0004     |
| R6  |            |
| R7  |            |
| R8  |            |
| R9  |            |
| R10 |            |
| R11 |            |
| R12 |            |
| R13 |            |
| R14 |            |
| R15 |            |



#### Execution Table for *Exercise* #1 (shows step-by-step execution) **Solutions**

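| PC  | Instr          | State Changes                                                           |
|-----|----------------|-------------------------------------------------------------------------|
| 0x0 | LW R3, 0(R0)   | R[3] ← M[R[0] + 0] = M[0] = 0x000F; PC ← PC+2 = 0+2 = 2                 |
| 0x2 | LW R4, 2(R0)   | R[4] ← M[R[0] + 2] = M[2] = 0x0104; PC ← PC+2 = 2+2 = 4                 |
| 0x4 | AND R3, R4, R5 | R[5] ← R[3] & R[4] = 0x000F & 0x0104 = 0x0004; PC ← PC+2 = 4+2 = 6      |
| 0x6 | SW R5, 4(R0)   | M[R[0] + 4] = M[4] ← R[5] = 0x0004; PC ← PC+2 = 6+2 = 8                 |
| 0x8 | HALT           | Program execution stops                                                 |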



The bytes are swapped from the memory M picture on the previous page because the bytes are stored in Little Endian order.

E.g., for the byte pair 0x0F at address 0x0 and 0x00 at address 0x1, the byte at the lower address 0x0 is stored at the "little end" (LSB) of the 2-byte word, so the loaded word is 0x000F. As we'll soon see, this is consistent with the byte ordering in the C programming language.
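The swap can be checked with Python's built-in `int.from_bytes`. A small sketch, assuming the bytes are listed in address order as in the M: Data Memory table for Exercise 1; the variable names are illustrative.

```python
# Bytes at addresses 0x0, 0x1, 0x2, 0x3, in address order.
data_memory = bytes([0x0F, 0x00, 0x04, 0x01])

r3 = int.from_bytes(data_memory[0:2], byteorder="little")  # LW R3, 0(R0)
r4 = int.from_bytes(data_memory[2:4], byteorder="little")  # LW R4, 2(R0)
r5 = r3 & r4                                               # AND R3, R4, R5

print(f"R3 = 0x{r3:04X}")  # R3 = 0x000F
print(f"R4 = 0x{r4:04X}")  # R4 = 0x0104
print(f"R5 = 0x{r5:04X}")  # R5 = 0x0004
```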







### **Exercise 2 Solutions**



What is this code doing at a high level?

Multiplies the contents of R9 and R10!

**PC:** Program Counter

#### Processor Loop

- 1. ins  $\leftarrow$  IM[PC]
- 2.  $PC \leftarrow PC + 2$
- 3. Do ins

#### M: Data Memory

| Address   | Contents |
|-----------|----------|
| 0x0 - 0x1 | 0x0F     |
| 0x2 – 0x3 | 0x04     |
| 0x4 - 0x5 |          |
| 0x6 – 0x7 |          |
| 0x8 – 0x9 |          |
| 0xA – 0xB |          |
| 0xC - 0xD |          |
| •••       |          |

#### **IM:** Instruction Memory

| Address   | Contents        |
|-----------|-----------------|
| 0x0 – 0x1 | SUB R8, R8, R8  |
| 0x2 – 0x3 | BEQ R9, R0, 3   |
| 0x4 – 0x5 | ADD R10, R8, R8 |
| 0x6 – 0x7 | SUB R9, R1, R9  |
| 0x8 – 0x9 | JMP 1           |
| 0xA – 0xB | HALT            |
| •••       |                 |
#### **R:** Register File

| Reg | Contents (time: $\rightarrow$ )                                                        |
|-----|----------------------------------------------------------------------------------------|
| R0  | 0x0000                                                                                 |
| R1  | 0x0001                                                                                 |
| R2  |                                                                                        |
| R3  |                                                                                        |
| R4  |                                                                                        |
| R5  |                                                                                        |
| R6  |                                                                                        |
| R7  |                                                                                        |
| R8  | $0x???? \xrightarrow{(1)} 0x0000 \xrightarrow{(2)} 0x0003 \xrightarrow{(4)} 0x0006$    |
| R9  | $0x0002 \xrightarrow{(3)} 0x0001 \xrightarrow{(5)} 0x0000$                             |
| R10 | 0x0003                                                                                 |
| R11 |                                                                                        |
| R12 |                                                                                        |
| R13 |                                                                                        |
| R14 |                                                                                        |
| R15 |                                                                                        |




## Execution Table for *Exercise #2* (shows step-by-step execution)

#### **Solutions**

| PC  | Instr           | State Changes                                               |
|-----|-----------------|-------------------------------------------------------------|
| 0x0 | SUB R8, R8, R8  | R[8] ← R[8] - R[8] = 0; PC ← PC+2 = 0+2 = 2                 |
| 0x2 | BEQ R9, R0, 3   | PC ← PC+2 = 2+2 = 4 (because 2 = R[9] ≠ R[0] = 0)           |
| 0x4 | ADD R10, R8, R8 | R[8] ← R[10] + R[8] = 3 + 0 = 3; PC ← PC+2 = 4+2 = 6        |
| 0x6 | SUB R9, R1, R9  | R[9] ← R[9] - R[1] = 2 - 1 = 1; PC ← PC+2 = 6+2 = 8         |
| 0x8 | JMP 1           | PC ← 2\*1 = 2                                               |
| 0x2 | BEQ R9, R0, 3   | PC ← PC+2 = 2+2 = 4 (because 1 = R[9] ≠ R[0] = 0)           |
| 0x4 | ADD R10, R8, R8 | R[8] ← R[10] + R[8] = 3 + 3 = 6; PC ← PC+2 = 4+2 = 6        |
| 0x6 | SUB R9, R1, R9  | R[9] ← R[9] - R[1] = 1 - 1 = 0; PC ← PC+2 = 6+2 = 8         |
| 0x8 | JMP 1           | PC ← 2\*1 = 2                                               |
| 0x2 | BEQ R9, R0, 3   | PC ← PC+2+(2\*3) = 4+6 = 10 (because 0 = R[9] = R[0] = 0)   |
| 0xA | HALT            | Program execution stops                                     |
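At a high level, the loop traced above is multiplication by repeated addition. A sketch of the same control flow, with each line annotated by the instruction it mirrors; the function name `multiply` and the 16-bit masking are illustrative assumptions.

```python
def multiply(r9, r10):
    """Mirror the Exercise 2 loop: add R10 into R8 once for each unit of R9."""
    r8 = 0                        # SUB R8, R8, R8
    while r9 != 0:                # BEQ R9, R0, 3 leaves the loop when R9 == 0
        r8 = (r10 + r8) & 0xFFFF  # ADD R10, R8, R8 (16-bit wraparound)
        r9 = (r9 - 1) & 0xFFFF    # SUB R9, R1, R9
                                  # JMP 1 returns to the BEQ
    return r8                     # HALT; the product is left in R8

print(multiply(2, 3))  # 6
```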



## HW ARCH microarchitecture



#### One possible hardware implementation of the HW ISA



### **Instruction Fetch** (default, unless branch or jump)

### Fetch instruction from memory. Increment program counter (PC) to point to the next instruction.

Processor Loop

- 1. ins  $\leftarrow$  IM[PC]
- 2.  $PC \leftarrow PC + 2$
- 3. Do ins







## **Instruction Encoding: 3 formats**

#### Arithmetic instructions:

- 2 source register IDs (Rs, Rt)
- 1 destination register ID (Rd)

| 15:12  | 11:8 | 7:4 | 3:0 |
|--------|------|-----|-----|
| opcode | Rs   | Rt  | Rd  |

#### **Memory/branch instructions:**

- address/source register ID (Rs)
- data/source register ID (Rt)
- 4-bit offset

| 15:12  | 11:8 | 7:4 | 3:0    |
|--------|------|-----|--------|
| opcode | Rs   | Rt  | offset |

#### Jump instruction:

- 12-bit offset

| 15:12  | 11:0   |
|--------|--------|
| opcode | offset |

#### All have 4-bit opcode in MSBs
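Because every format places its fields at fixed bit positions, decoding is a matter of shifts and masks. A minimal sketch; the function name `decode` and the dictionary keys are illustrative, and which fields are meaningful depends on the opcode.

```python
def decode(ins):
    """Split a 16-bit instruction word into the fields used by the three formats."""
    return {
        "opcode": (ins >> 12) & 0xF,  # bits 15:12, present in every format
        "rs": (ins >> 8) & 0xF,       # bits 11:8
        "rt": (ins >> 4) & 0xF,       # bits 7:4
        "rd_or_offset": ins & 0xF,    # bits 3:0: Rd, or the 4-bit offset
        "jump_offset": ins & 0xFFF,   # bits 11:0: 12-bit offset, used only by JMP
    }

fields = decode(0x2368)  # 0010 0011 0110 1000
print(fields)
```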



## Arithmetic Instructions

| Instruction           | Meaning                        | Opcode | Rs   | Rt   | Rd   |
|-----------------------|--------------------------------|--------|------|------|------|
| ADD <i>Rs, Rt, Rd</i> | $R[d] \leftarrow R[s] + R[t]$  | 0010   | 0-15 | 0-15 | 0-15 |
| SUB <i>Rs, Rt, Rd</i> | $R[d] \leftarrow R[s] - R[t]$  | 0011   | 0-15 | 0-15 | 0-15 |
| AND <i>Rs, Rt, Rd</i> | $R[d] \leftarrow R[s] \& R[t]$ | 0100   | 0-15 | 0-15 | 0-15 |
| OR <i>Rs, Rt, Rd</i>  | $R[d] \leftarrow R[s] \mid R[t]$ | 0101   | 0-15 | 0-15 | 0-15 |
| •••                   |                                |        |      |      |      |

### **Example encoding:** ADD R3, R6, R8



#### **16-bit Encoding**

| Opcode | Rs   | Rt   | Rd   |
|--------|------|------|------|
| 0010   | 0011 | 0110 | 1000 |
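The same encoding can be produced by shifting each field into place. A sketch under the field layout above; `OPCODES` and `encode_arithmetic` are illustrative names.

```python
OPCODES = {"ADD": 0b0010, "SUB": 0b0011, "AND": 0b0100, "OR": 0b0101}

def encode_arithmetic(name, rs, rt, rd):
    """Pack opcode, Rs, Rt, Rd into one 16-bit word, MSB to LSB."""
    for reg in (rs, rt, rd):
        if not 0 <= reg <= 15:
            raise ValueError(f"register ID out of range: {reg}")
    return (OPCODES[name] << 12) | (rs << 8) | (rt << 4) | rd

word = encode_arithmetic("ADD", 3, 6, 8)  # ADD R3, R6, R8
print(f"{word:016b}")   # 0010001101101000
print(f"0x{word:04X}")  # 0x2368
```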



### **Arithmetic Instructions: Instruction Decode, Register Access, ALU**







## **Memory Instructions**

| Instruction       | Meaning                              | Op   | Rs   | Rt   | Rd     |
|-------------------|--------------------------------------|------|------|------|--------|
| LW Rt, offset(Rs) | $R[t] \leftarrow Mem[R[s] + offset]$ | 0000 | 0-15 | 0-15 | offset |
| SW Rt, offset(Rs) | $Mem[R[s] + offset] \leftarrow R[t]$ | 0001 | 0-15 | 0-15 | offset |
| •••               |                                      |      |      |      |        |

#### **Example encoding:**

## SW R6, -8(R3)
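A negative offset is stored as a 4-bit two's-complement value, so -8 becomes 1000. A sketch of the encoding for this example; `encode_memory` is an illustrative name, and the range check assumes the signed 4-bit range -8 to 7.

```python
LW, SW = 0b0000, 0b0001

def encode_memory(opcode, rs, rt, offset):
    """Pack opcode, Rs, Rt, and a signed 4-bit offset into one 16-bit word."""
    if not -8 <= offset <= 7:
        raise ValueError(f"offset does not fit in 4 signed bits: {offset}")
    # Masking with 0xF keeps the low 4 bits of the two's-complement form.
    return (opcode << 12) | (rs << 8) | (rt << 4) | (offset & 0xF)

word = encode_memory(SW, rs=3, rt=6, offset=-8)  # SW R6, -8(R3)
print(f"{word:016b}")  # 0001001101101000
```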





### **Memory Instructions:** Instruction Decode, **Register/Memory Access, ALU**



How can we support arithmetic and memory instructions?

What's shared?











## **Control-flow Instructions**

| Instruction        | Meaning                                                     | Op   | Rs   | Rt   | Rd     |
|--------------------|-------------------------------------------------------------|------|------|------|--------|
| BEQ Rs, Rt, offset | If $R[s] == R[t]$ then<br>$PC \leftarrow PC + 2 + offset*2$ | 0111 | 0-15 | 0-15 | offset |
| •••                |                                                             |      |      |      |        |

#### Example encoding:

### BEQ R1, R2, -2

#### **16-bit Encoding**

| Op   | Rs   | Rt   | Rd   |
|------|------|------|------|
| 0111 | 0001 | 0010 | 1110 |



## **Compute branch target for BEQ**
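The target arithmetic can be sketched in software: sign-extend the 4-bit offset field, double it, and add it to PC + 2. The function names are illustrative, and the PC value 0x8 in the example is hypothetical.

```python
def sign_extend4(field):
    """Interpret a 4-bit field as a signed two's-complement value (-8..7)."""
    return field - 16 if field & 0b1000 else field

def branch_target(pc, ins):
    """Return PC + 2 + offset*2 for a BEQ word, wrapped to 16 bits."""
    offset = sign_extend4(ins & 0xF)
    return (pc + 2 + offset * 2) & 0xFFFF

# BEQ R1, R2, -2 encodes as 0111 0001 0010 1110 = 0x712E.
print(hex(branch_target(0x8, 0x712E)))  # 0x6
```

This is the address used only when the branch is taken; otherwise the PC simply advances to PC + 2.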





## Make branch decision







## What's missing from what we covered in lecture?

- Details of Control Unit
  - ALU op is **not** instruction opcode; some translation needed
  - Reg Write bit (for ADD, SUB, AND, OR, LW)
  - Mem Store bit (for SW)
  - Mem bit (arithmetic/memory MUX bit)
  - Branch bit (for BEQ)
- Implementation of JMP
- Implementation of HALT (basically stops the clock running the computer; we won't implement this)

See **Arch** Assignment!
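One way to picture the control unit is as a lookup from opcode to control bits. This is only a sketch: the Reg Write, Mem Store, and Branch values follow the list above, while the Mem bit values (set for LW and SW) and all Python names are assumptions, and the ALU op translation, JMP, and HALT are left out.

```python
from typing import NamedTuple

class Control(NamedTuple):
    reg_write: int  # write the result into the register file
    mem_store: int  # write Rt into data memory
    mem: int        # arithmetic/memory MUX bit (values here are an assumption)
    branch: int     # allow a taken branch to replace PC + 2

CONTROL_BY_OPCODE = {
    0b0010: Control(reg_write=1, mem_store=0, mem=0, branch=0),  # ADD
    0b0011: Control(reg_write=1, mem_store=0, mem=0, branch=0),  # SUB
    0b0100: Control(reg_write=1, mem_store=0, mem=0, branch=0),  # AND
    0b0101: Control(reg_write=1, mem_store=0, mem=0, branch=0),  # OR
    0b0000: Control(reg_write=1, mem_store=0, mem=1, branch=0),  # LW
    0b0001: Control(reg_write=0, mem_store=1, mem=1, branch=0),  # SW
    0b0111: Control(reg_write=0, mem_store=0, mem=0, branch=1),  # BEQ
}

for opcode, control in CONTROL_BY_OPCODE.items():
    print(f"{opcode:04b}", control)
```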





## HW ARCH: not the only implementation

#### Single-cycle architecture

- Simple, (barely!) fits on a slide (and in our heads).
- One instruction takes one clock cycle.
- Slowest instruction determines minimum clock cycle.
- Inefficient.

#### Could it be better?

- Performance, energy, debugging, security, reconfigurability, ...
- Pipelining
- OoO: Out-of-order execution
- Caching
- ... enormous, interesting design space of Computer Architecture



## **Conclusion of unit: Computational Building Blocks (HW)**

#### Lectures

- Digital Logic
- Data as Bits
- **Integer Representation**
- **Combinational Logic**
- Arithmetic Logic
- Sequential Logic
- A Simple Processor

#### Labs

- 1: Transistors to Gates
- 2: Data as Bits
- 3: Combinational Logic & Arithmetic
- 4: ALU & Sequential Logic
- 5: Processor Datapath

#### Topics

- Transistors, digital logic gates
- Data representation with bits, bit-level computation
- Number representations, arithmetic
- Combinational and arithmetic logic
- Sequential (stateful) logic
- Computer processor architecture overview

#### Assignments

- Gates
- Zero
- Bits
- Arch

Mid-semester exam 1: HW February 22

