



# A Simple Processor

- 1. A simple Instruction Set Architecture
- A simple microarchitecture (implementation):
   Data Path and Control Logic

# Motivation

# oftware

```
int x = y * 2;
```

```
int p = q \& 0x0000FFFF;
```

```
for (int i = 0; i < 10; i++) {
    ...
}</pre>
```

# How do we connect these?

# Hardware

**ALU** with Adder (compute)



Registers (local data storage)



RAM (larger/longer data storage)



connection

implementation

# Program, Application

# Programming Language

# Compiler/Interpreter

Operating System

**Instruction Set Architecture** 

Microarchitecture

Digital Logic

Devices (transistors, etc.)

Solid-State Physics

# Hardware



Computer

ISAS define the *interface*between software and hardware

# Computer

ISAs are an abstract model of the underlying hardware.



### This week:

HW ISA

An example ISA and hardware implementation for CS240!

# Basic building block of an ISA:

the instruction!



# ISAs are an abstract model of the underlying hardware.

### This week:

HW ISA

An *example ISA* and *hardware implementation* for CS240!

# HWW ISA Summary (details to follow)

Word size = 16 bits (2 bytes)

Registers

- Register size = 16 bits
- Number of registers = 16
- R0 always holds 0
- R1 always holds 1.



ALU computes on 16-bit values.

Memory

- Access 16 bits at once
- Byte-addressable (new address every 8 bits)

Instruction
Fetch and
Decode

- Instructions are 16 bits in size
- Stored in separate memory
- Program counter (PC) register
   holds address of next instruction

# HWW ISA R: Register File



# Using your understanding of powers of 2 needed to make selections, how many bits should be on the labeled busses?



$$r = 8, w = 8$$

$$r = 16, w = 16$$

$$r = 4, w = 16$$

$$r = 16, w = 4$$

None of the above

# HW ISA R: Register File





We'll think of the register file like this:

R0 always holds hardcoded 0
R1 always holds hardcoded 1

R2 – R15: general purpose (instructions can use them to hold anything)

| Reg | Contents |
|-----|----------|
| R0  | 0x0000   |
| R1  | 0x0001   |
| R2  |          |
| R3  |          |
| R4  |          |
| R5  |          |
| R6  |          |
| R7  |          |
| R8  |          |
| R9  |          |
| R10 |          |
| R11 |          |
| R12 |          |
| R13 |          |
| R14 |          |
| R15 |          |
|     | 10       |

### Word size = 16 bits, # registers = 16



# HIW ISA M: Data Memory



We'll think of the data memory like this:

Memory is byte-addressable, accesses full words (16 bits)

**Memory** is "Little Endian": the "little" (low) byte is stored at the lower address.

Example: storing 1 at address 0x0 yields

| Address   | Contents |      |
|-----------|----------|------|
| 0x0 - 0x1 | 0x01     | 0x00 |
| 0x2 - 0x3 |          |      |
| 0x4 - 0x5 |          |      |
| 0x6 - 0x7 |          |      |
| 0x8 – 0x9 |          |      |
| 0xA - 0xB |          |      |
| 0xC - 0xD |          |      |
| •••       |          |      |

| Address   | Contents |      |
|-----------|----------|------|
| 0x0 - 0x1 | 0x01     | 0x00 |
| 0x2 – 0x3 | 0x23     | 0x45 |
| 0x4 – 0x5 | 0x67     | 0xab |
| 0x6 – 0x7 |          |      |
| 0x8 - 0x9 |          |      |
| 0xA – 0xB |          |      |
| 0xC - 0xD |          |      |
|           |          |      |

0x2345

0x4523

0x2300

0x0023

0x2367

# HIW ISA IM: Instruction Memory

Instructions are 1 word in size.

Separate instruction memory.

# Program Counter (PC) register

holds address of next instruction to execute.





We'll think of the instruction memory like this:

| Address   | Contents |
|-----------|----------|
| 0x0 - 0x1 |          |
| 0x2 - 0x3 |          |
| 0x4 - 0x5 |          |
| 0x6 - 0x7 |          |
| 0x8 - 0x9 |          |
| •••       |          |



### M: Data Memory

| TVII Data IVICIIIOI y |          |  |
|-----------------------|----------|--|
| Address               | Contents |  |
| 0x0 - 0x1             |          |  |
| 0x2 - 0x3             |          |  |
| 0x4 - 0x5             |          |  |
| 0x6 - 0x7             |          |  |
| 0x8 - 0x9             |          |  |
| 0xA - 0xB             |          |  |
| 0xC - 0xD             |          |  |
| •••                   |          |  |

# PC: Program Counter

### Processor Loop

- 1.  $ins \leftarrow IM[PC]$
- 2.  $PC \leftarrow PC + 2$
- 3. Do ins

### IM: Instruction Memory

| Address   | Contents |
|-----------|----------|
| 0x0 - 0x1 |          |
| 0x2 - 0x3 |          |
| 0x4 - 0x5 |          |
| 0x6 - 0x7 |          |
| 0x8 - 0x9 |          |
| •••       |          |

### R: Register File

| Reg | Contents |
|-----|----------|
| RO  | 0x0000   |
| R1  | 0x0001   |
| R2  |          |
| R3  |          |
| R4  |          |
| R5  |          |
| R6  |          |
| R7  |          |
| R8  |          |
| R9  |          |
| R10 |          |
| R11 |          |
| R12 |          |
| R13 |          |
| R14 |          |
| R15 |          |

# HW ISA Instructions

| Assembly Syntax    | Meaning (R = register file,<br>M = data memory)          | Opcode | Rs | Rt     | Rd     |                          |
|--------------------|----------------------------------------------------------|--------|----|--------|--------|--------------------------|
| ADD Rs, Rt, Rd     | $R[d] \leftarrow R[s] + R[t]$                            | 0010   | S  | t      | d      |                          |
| SUB Rs, Rt, Rd     | $R[d] \leftarrow R[s] - R[t]$                            | 0011   | S  | t      | d      | - Arithmetic             |
| AND Rs, Rt, Rd     | $R[d] \leftarrow R[s] \& R[t]$                           | 0100   | S  | t      | d      | Antimietic               |
| OR Rs, Rt, Rd      | $R[d] \leftarrow R[s] \mid R[t]$                         | 0101   | S  | t      | d      |                          |
| LW Rt, offset(Rs)  | $R[t] \leftarrow M[R[s] + offset]$                       | 0000   | S  | t      | offset |                          |
| SW Rt, offset(Rs)  | $M[R[s] + offset] \leftarrow R[t]$                       | 0001   | S  | t      | offset | <ul><li>Memory</li></ul> |
| BEQ Rs, Rt, offset | If $R[s] == R[t]$ then $PC \leftarrow PC + 2 + offset*2$ | 0111   | S  | t      | offset | - Control flow           |
| JMP offset         | PC ← offset*2                                            | 1000   |    | offset | •      |                          |
| HALT               | Stops program execution                                  | 1111   |    |        |        |                          |

MSB

16-bit Encoding

JMP offset is

unsigned

All other offsets

are signed

# HIW ISA IM: Instruction Memory

Instructions are 1 word in size.

Separate instruction memory.

# Program Counter (PC) register

holds address of next instruction to execute.





We'll think of the instruction memory like this:

| Address   | Contents       |
|-----------|----------------|
| 0x0 - 0x1 | ADD R0, R1, R2 |
| 0x2 - 0x3 | SUB R2, R1, R3 |
| 0x4 - 0x5 | OR R3, R3, R4  |
| 0x6 - 0x7 |                |
| 0x8 - 0x9 |                |
| •••       |                |



3. Do ins

| Address   | Contents       |
|-----------|----------------|
| 0x0 - 0x1 | ADD R0, R1, R2 |
| 0x2 - 0x3 | SUB R2, R1, R3 |
| 0x4 - 0x5 | OR R3, R3, R4  |
| 0x6 – 0x7 |                |
| 0x8 - 0x9 |                |
|           |                |

ADD SUB OR

None of the above



# Exercise 0

# HW ISA

Fill in the rest of the machine state based on this initial state

### PC: Program Counter

# Processor Loop

- 1.  $ins \leftarrow IM[PC]$
- 2.  $PC \leftarrow PC + 2$
- 3. Do ins

### M: Data Memory

| Address   | Contents |      |
|-----------|----------|------|
| 0x0 - 0x1 | 0x0F     | 0x00 |
| 0x2 - 0x3 | 0x04     | 0x01 |
| 0x4 - 0x5 |          |      |
| 0x6 - 0x7 |          |      |
| 0x8 - 0x9 |          |      |
| 0xA - 0xB |          |      |
| 0xC - 0xD |          |      |
| •••       |          |      |

### IM: Instruction Memory

| Address   | Contents       |
|-----------|----------------|
| 0x0 - 0x1 | ADD R1, R1, R2 |
| 0x2 - 0x3 | SW R2, 4(R0)   |
| 0x4 - 0x5 | HALT           |
| 0x6 - 0x7 |                |
| 0x8 - 0x9 |                |
| •••       |                |

### R: Register File

| Reg | Contents |
|-----|----------|
| RO  | 0x0000   |
| R1  | 0x0001   |
| R2  |          |
| R3  |          |
| R4  |          |
| R5  |          |
| R6  |          |
| R7  |          |
| R8  |          |
| R9  |          |
| R10 |          |
| R11 |          |
| R12 |          |
| R13 |          |
| R14 |          |
| R15 |          |

# Execution Table for Exercise #0 (shows step-by-step execution)





| PC  | Instr          | State Changes                                                                  |
|-----|----------------|--------------------------------------------------------------------------------|
| 0x0 | ADD R1, R1, R2 | $R[2] \leftarrow R[1] \& R[1] = 1 + 1 = 0x00002; PC \leftarrow PC+2 = 0+2 = 2$ |
| 0x2 | SW R2, 4(R0)   | $M[R[0] + 4] = M[4] \leftarrow R[2] = 0x0002; PC \leftarrow PC+2 = 6+2 = 8$    |
| 0x4 | HALT           | Program execution stops                                                        |

Reminder: the two bytes will are stored in **Little Endian** order when we store them to memory **M**.

That is, the byte 0x02 will be stored in the "little" end of the word—the lower address of the pair of addresses that store the word. 0x00 will be stored at the higher address.



# Exercise 0 Solutions

# HW ISA

### M: Data Memory

| Address   | Contents |      |
|-----------|----------|------|
| 0x0 - 0x1 | 0x0F     | 0x00 |
| 0x2 - 0x3 | 0x04     | 0x01 |
| 0x4 - 0x5 | 0x02     | 0x00 |
| 0x6 – 0x7 |          |      |
| 0x8 - 0x9 |          |      |
| 0xA - 0xB |          |      |
| 0xC - 0xD |          |      |
| •••       |          |      |

# PC: Program Counter

### Processor Loop

- 1.  $ins \leftarrow IM[PC]$
- 2.  $PC \leftarrow PC + 2$
- 3. Do ins

### IM: Instruction Memory

| Address   | Contents       |
|-----------|----------------|
| 0x0 - 0x1 | ADD R1, R1, R2 |
| 0x2 - 0x3 | SW R2, 4(R0)   |
| 0x4 - 0x5 | HALT           |
| 0x6 - 0x7 |                |
| 0x8 - 0x9 |                |
| •••       |                |

### R: Register File

| Reg | Contents |
|-----|----------|
| RO  | 0x0000   |
| R1  | 0x0001   |
| R2  | 0x0002   |
| R3  |          |
| R4  |          |
| R5  |          |
| R6  |          |
| R7  |          |
| R8  |          |
| R9  |          |
| R10 |          |
| R11 |          |
| R12 |          |
| R13 |          |
| R14 |          |
| R15 |          |



# Exercise 1

# HW ISA

Fill in the rest of the machine state based on this initial state

### PC: Program Counter

# Processor Loop

- 1.  $ins \leftarrow IM[PC]$
- 2.  $PC \leftarrow PC + 2$
- 3. Do ins

### M: Data Memory

| Address   | Contents |      |
|-----------|----------|------|
| 0x0 - 0x1 | 0x0F     | 0x00 |
| 0x2 - 0x3 | 0x04     | 0x01 |
| 0x4 - 0x5 |          |      |
| 0x6 - 0x7 |          |      |
| 0x8 - 0x9 |          |      |
| 0xA - 0xB |          |      |
| 0xC - 0xD |          |      |
| •••       |          |      |

### IM: Instruction Memory

| Address   | Contents       |
|-----------|----------------|
| 0x0 - 0x1 | LW R3, 0(R0)   |
| 0x2 - 0x3 | LW R4, 2(R0)   |
| 0x4 - 0x5 | AND R3, R4, R5 |
| 0x6 - 0x7 | SW R5, 4(R0)   |
| 0x8 - 0x9 | HALT           |
| •••       |                |

### R: Register File

| Reg | Contents |
|-----|----------|
| RO  | 0x0000   |
| R1  | 0x0001   |
| R2  |          |
| R3  |          |
| R4  |          |
| R5  |          |
| R6  |          |
| R7  |          |
| R8  |          |
| R9  |          |
| R10 |          |
| R11 |          |
| R12 |          |
| R13 |          |
| R14 |          |
| R15 |          |

# Execution Table for Exercise #1 (shows step-by-step execution)



| PC  | Instr       | State Changes |
|-----|-------------|---------------|
| 0x0 | LW R3 0(R0) |               |
|     |             |               |
|     |             |               |
|     |             |               |
|     |             |               |
|     |             |               |



# Exercise 2

# HW ISA

Fill in the rest of the machine state based on this initial state

# PC: Program Counter

### Processor Loop

- 1.  $ins \leftarrow IM[PC]$
- 2.  $PC \leftarrow PC + 2$
- 3. Do ins

### M: Data Memory

| Address   | Contents |      |
|-----------|----------|------|
| 0x0 - 0x1 | 0x0F     | 0x00 |
| 0x2 - 0x3 | 0x04     | 0x01 |
| 0x4 - 0x5 |          |      |
| 0x6 – 0x7 |          |      |
| 0x8 – 0x9 |          |      |
| 0xA - 0xB |          |      |
| 0xC - 0xD |          |      |
| •••       |          |      |

### IM: Instruction Memory

| Address   | Contents        |
|-----------|-----------------|
| 0x0 - 0x1 | SUB R8, R8, R8  |
| 0x2 - 0x3 | BEQ R9, R0, 3   |
| 0x4 - 0x5 | ADD R10, R8, R8 |
| 0x6 - 0x7 | SUB R9, R1, R9  |
| 0x8 - 0x9 | JMP 1           |
| 0xA - 0xB | HALT            |
| •••       |                 |

## R: Register File

| Reg | Contents (time: → ) |
|-----|---------------------|
| R0  | 0x0000              |
| R1  | 0x0001              |
| R2  |                     |
| R3  |                     |
| R4  |                     |
| R5  |                     |
| R6  |                     |
| R7  |                     |
| R8  |                     |
| R9  | 0x0002              |
| R10 | 0x0003              |
| R11 |                     |
| R12 |                     |
| R13 |                     |
| R14 |                     |
| R15 |                     |

# Execution Table for Exercise #2 (shows step-by-step execution)



| PC  | Instr          | State Changes |
|-----|----------------|---------------|
| 0x0 | SUB R8, R8, R8 |               |
|     |                |               |
|     |                |               |
|     |                |               |
|     |                |               |
|     |                |               |
|     |                |               |
|     |                |               |
|     |                |               |
|     |                |               |
|     |                |               |
|     |                |               |

# HW ARCH microarchitecture



One possible hardware implementation of the HW ISA

# Instruction Fetch (default, unless branch or jump)

Fetch instruction from memory.

Increment program counter (PC)

to point to the next instruction.

Processor
Loop

1.  $ins \leftarrow IM[PC]$ 2.  $PC \leftarrow PC + 2$ 3. Do ins





D-flip-flop Ripple-carry adder Encoder A & B

B & C

C & D

# Instruction Encoding: 3 formats

# **Arithmetic instructions:**

- 2 source register IDs (Rs,Rt)
- 1 destination register ID (Rd)

### All have 4-bit opcode in MSBs

| 15:12  | 11:8 | 7:4 | 3:0 |
|--------|------|-----|-----|
| opcode | Rs   | Rt  | Rd  |

# Memory/branch instructions:

- address/source register ID (Rs)
- data/source register ID (Rt)
- 4-bit offset

| 15:12  | 11:8 | 7:4 | 3:0    |
|--------|------|-----|--------|
| opcode | Rs   | Rt  | offset |

### Jump instruction:

- 12-bit offset

| 15:12  | 11:0   |
|--------|--------|
| opcode | offset |

# **Arithmetic Instructions**

**16-bit Encoding** 

| Instruction    | Meaning                        | Opcode | Rs   | Rt   | Rd   |
|----------------|--------------------------------|--------|------|------|------|
| ADD Rs, Rt, Rd | $R[d] \leftarrow R[s] + R[t]$  | 0010   | 0-15 | 0-15 | 0-15 |
| SUB Rs, Rt, Rd | $R[d] \leftarrow R[s] - R[t]$  | 0011   | 0-15 | 0-15 | 0-15 |
| AND Rs, Rt, Rd | $R[d] \leftarrow R[s] \& R[t]$ | 0100   | 0-15 | 0-15 | 0-15 |
| OR Rs, Rt, Rd  | $Rd \leftarrow R[s] \mid R[t]$ | 0101   | 0-15 | 0-15 | 0-15 |
| • • •          |                                |        |      |      |      |

# Example encoding:

ADD R3, R6, R8

| Opcode | Rs   | Rt   | Rd   |
|--------|------|------|------|
| 0010   | 0011 | 0110 | 1000 |

# Arithmetic Instructions: Instruction Decode, Register Access, ALU



# The control unit

A large instantiation of a truth table that controls parts of the microarchitecture



Input: the opcode from the instructions

Output: many wires controlling decisions

You will implement the control unit on the Arch Assignment!

# Memory Instructions

| Instruction       | Meaning                              | Ор   | Rs   | Rt   | Rd     |
|-------------------|--------------------------------------|------|------|------|--------|
| LW Rt, offset(Rs) | $R[t] \leftarrow Mem[R[s] + offset]$ | 0000 | 0-15 | 0-15 | offset |
| SW Rt, offset(Rs) | $Mem[R[s] + offset] \leftarrow R[t]$ | 0001 | 0-15 | 0-15 | offset |
| • • •             |                                      |      |      |      |        |

# Example encoding:

SW R6, -8(R3)

| Opcode | Rs   | Rt   | Rd   |
|--------|------|------|------|
| 0001   | 0011 | 0110 | 1000 |

# Memory Instructions: Instruction Decode, Register/Memory Access, ALU

How can we support arithmetic and memory instructions?



Choose between Arithmetic/Memory instructions with MUXs



# Control-flow Instructions

16-bit Encoding

| Instruction        | Meaning                                                  | Ор   | Rs   | Rt   | Rd     |
|--------------------|----------------------------------------------------------|------|------|------|--------|
| BEQ Rs, Rt, offset | If $R[s] == R[t]$ then $PC \leftarrow PC + 2 + offset*2$ | 0111 | 0-15 | 0-15 | offset |
| • • •              |                                                          |      |      |      |        |

# Example encoding:

BEQ R1, R2, -2

| Ор   | Rs   | Rt   | Rd   |
|------|------|------|------|
| 0111 | 0001 | 0010 | 1110 |

# Compute branch target for BEQ



# Make branch decision



# What's missing from what we covered in lecture?

- Details of Control Unit
  - ALU op is not instruction opcode; some translation needed
  - Reg Write bit (for ADD, SUB, AND, OR, LW)
  - Mem Store bit (for SW)
  - Mem bit (arithmetic/memory MUX bit)
  - Branch bit (for BEQ)
- Implementation of JMP
- Implementation of HALT (basically stops the clock running the computer; we won't implement this)

See Arch Assignment!

# HW ARCH not the only implementation

### Single-cycle architecture

- Relatively simple, (barely!) fits on a slide (and in our heads).
- Every instruction takes one clock cycle each.
- Slowest instruction determines minimum clock cycle.
- Inefficient.

### Could it be better?

- Performance, energy, debugging, security, reconfigurability, ...
- Pipelining
- OoO: Out-of-order execution
- Caching
- ... enormous, interesting design space of Computer Architecture

# Conclusion of unit: Computational Building Blocks (HW)

### Lectures

Digital Logic

Data as Bits

Integer Representation

Combinational Logic

Arithmetic Logic

Sequential Logic

A Simple Processor

### **Topics**

Transistors, digital logic gates

Data representation with bits, bit-level computation

Number representations, arithmetic

Combinational and arithmetic logic

Sequential (stateful) logic

Computer processor architecture overview

### Labs

1: Transistors to Gates

2: Data as Bits

3: Combinational Logic & Arithmetic

4: ALU & Sequential Logic

5: Processor Datapath (next week)

### Assignments

Gates

Zero

Bits

Arch (out now!)

Mid-semester exam 1: HW

October 10