CS 240 Lab 7: x86 Assembly

Peter Mawhorter

Outline

  • x86 Architecture
  • How Does a Computer Work?
  • x86 Details
  • Disassembly

x86 Architecture

  • x86 microarchitecture is extremely complicated
    • Instructions get broken into micro-ops
    • Multiple instructions run at once when possible
    • RAM access is slow and so requires scheduling and caches
  • We won’t cover x86 circuitry in this class

Diagram from techschems.com

Diagram of an Intel Core i7 CPU at a high level (we won’t need to understand this…). It has the following components: two Branch Predictors (one with BTB and RSB inside and one with Gshare, Indirect, and RSB inside) and an L2 Cache (shared between 2 cores) at the bottom. This cache has two arrows pointing to it from either end of the diagram. From the left, an arrow labeled “I$ prefetcher” comes from the “ITLB” as part of the Instruction Cache. From the right, a “D$ Prefetcher” arrow comes from the Store Buffers of the Data Cache. The Instruction Cache feeds into stacked Prefetch buffers, which feed into a “decode” unit with a “UROM” nearby (but not connected). The multiple decode results go into a stacked InstructionQueue, and results from that go to both an Integer Register File and an FP Register File (near this connection is an “Allocate Rename” linked to a “Reorder Buffer”). The FP Register File connects to FP Rename Buffers, which connect to multiple FPC Scheds, which connect to separate ALUs. One ALU has Shuffle and simul attached; another has FP Adder attached. These connect to a thick line which goes back around to the FP Register File. Meanwhile, the Integer Register File has an Integer Rename Buffers component; it connects to two IEC sched and an MEC sched. The IEC sched go to two different ALUs, one with a Shifter and the other with a JEU. These go to a thick line that connects back around to the Integer Register File. The MEC sched connects to an AGU with a TLB; it has a reissue Q next to it. The AGU is next to a Data Cache (which we saw earlier; it has Store Buffers that connect back to the L2 Cache). The Data Cache also connects to the thick arrow that goes back around to the Integer Register File.

x86 Architecture

There’s too much information here for us to understand all at once. But as in other parts of the class, we are honing our dealing-with-information-overload skills:

  • Pick out what’s important for your purposes
  • Cut a piece away using abstractions
  • Double-check foundational understanding

How Does a Computer Work?

  • ✓ It runs programs composed of machine instructions using a CPU
  • ? How are the programs I write translated into machine instructions?
    • How are functions, conditionals, and loops implemented?
  • ? Why should I care what the machine instructions are doing?

x86 Details

Anatomy of an x86 Instruction

INST OPERAND, DEST ↔︎ add %rdi, %rax

  • INST is the name, possibly w/ suffix (mov vs. movl).
  • OPERAND is the first operand
  • DEST is the destination (also second operand)
  • Condition codes (e.g., the zero flag) go into the “FLAGS” register (which can’t be named directly as an operand)
  • Lots of operand rules (e.g., “operand must be register”)

Beware Intel vs. AT&T syntax and GAS vs. NASM vs. MASM (we use AT&T/GAS)

x86 Operands

  • %RRR or %RR - register name (%rax, %ah)
  • $CONST - constant value ($0xF0, $240)
    • Can be decimal or hexadecimal depending on 0x prefix
  • NUMBER - constant memory address (0xF0, 123)
    • Lack of $ means memory access
  • NUMBER(%RRR, %RRR, STRIDE) - variable memory address
    • See next slide
  • NO arithmetic allowed
    • We are specifying each basic operation. We can’t ask for another operation in the middle of the one we’re specifying.

Address Calculation

  • Syntax: offset(base, index, stride), e.g., 8(%rdi, %rsi, 4)
    • base and index are registers
    • offset must be a constant number
    • stride is a constant number (only 1, 2, 4, or 8)
  • The computed address is base + offset + (index × stride)
  • Iterating through a string:
    • base is the start of the string (original pointer)
    • index is the index variable
    • offset could be used, e.g., to skip 1st letter
    • stride would be 1, but could be 4 for ints instead of chars
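
The address formula above can be sketched in C. This is only a model of the arithmetic, not how the hardware is implemented; the names effective_address, char_after_first, and int_at are made up for illustration.

```c
#include <stdint.h>

/* Models offset(base, index, stride): base + offset + (index * stride). */
uintptr_t effective_address(uintptr_t base, uintptr_t index,
                            intptr_t offset, uintptr_t stride) {
    return base + (uintptr_t)offset + index * stride;
}

/* Like 1(%rdi, %rsi, 1): stride 1 for chars, offset 1 skips the 1st letter. */
char char_after_first(const char *s, uintptr_t i) {
    return *(const char *)effective_address((uintptr_t)s, i, 1, 1);
}

/* Like (%rdi, %rsi, 4): stride is sizeof(int), which is 4 on typical x86-64. */
int int_at(const int *a, uintptr_t i) {
    return *(const int *)effective_address((uintptr_t)a, i, 0, sizeof(int));
}
```

For the slide's example 8(%rdi, %rsi, 4), if %rdi held 1000 and %rsi held 3, the address would be 1000 + 8 + (3 × 4) = 1020.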

Common Instructions

  • mov copies stuff
  • j* jumps (lots of varieties like je, jge, etc.)
  • cmp/test compares (to set up for conditional like je)
  • push stores to the stack and pop reads from it
  • lea stores address in register (think of &)
    • “Load Effective Address,” a.k.a. “Lovely Efficient Arithmetic”
  • Keep this reference handy and use compiler explorer right-click docs (this link lets you enter assembly code directly)

Credit to Ben Wood for “Lovely Efficient Arithmetic”
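
To make the mov vs. lea distinction concrete, here is a rough C analogy. The function names are hypothetical, and the assembly in the comments is only what a compiler might plausibly emit, assuming the first argument is in %rdi and the second in %rsi.

```c
#include <stddef.h>

/* mov reads memory: something like  mov (%rdi,%rsi,4), %eax */
int load_element(const int *a, size_t i) {
    return a[i];
}

/* lea only computes the address: something like  lea (%rdi,%rsi,4), %rax */
const int *element_address(const int *a, size_t i) {
    return &a[i];
}

/* lea as plain arithmetic: a compiler may emit  lea (%rdi,%rdi,4), %rax */
long times_five(long x) {
    return x + x * 4;
}
```

The last function is where the “Lovely Efficient Arithmetic” nickname comes from: the address hardware computes x + x × 4 without ever touching memory.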

Register Names

  • %rax Accumulator (return value)
  • %rcx Count (arg 4)
  • %rdx Data (arg 3)
  • %rbx Base
  • %rsp Stack Pointer
  • %rbp Base Pointer
  • %rsi Source Index (arg 2)
  • %rdi Destination Index (arg 1)
  • %r8 - %r15 (args 5 + 6 in r8 and r9)
  • %rip Instruction Pointer (only special access)
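
As a small illustration of the argument registers listed above, here is a hypothetical six-argument function. The comments assume integer arguments and the System V calling convention used on Linux; other platforms may assign registers differently.

```c
/* Hypothetical example: each comment names the register that
   argument would arrive in, per the list above. */
long sum6(long a,   /* arg 1: %rdi */
          long b,   /* arg 2: %rsi */
          long c,   /* arg 3: %rdx */
          long d,   /* arg 4: %rcx */
          long e,   /* arg 5: %r8  */
          long f) { /* arg 6: %r9  */
    return a + b + c + d + e + f;  /* return value goes in %rax */
}
```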

Registers for Arguments

Apparently this mnemonic is from Geoff Kuenning who I took classes from at HMC

Contrasts w/ HW ISA

  • Second operand is also destination
    • ADD R1, R2, R3 vs. add %rdi, %rax
    • Instead of x = a + b we only get x += a
  • Constants allowed (only in certain places)
    • ADD R1, R1, R6 vs. mov $2, %rsi
  • Index register in memory addresses
    • LW R2, 3(R5) vs. mov 3(%rax, %rdi, 8), %rdx
  • Two-step branching w/ fixed destinations
    • BEQ R1, R2, 3 vs. cmp %rax, %rdi / je 0x54321
  • ~700 instructions instead of 9 (you only need ~20-30)
    • Important: stack/call stuff (push, pop, call and ret)
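
The two-step branching pattern (compare, then jump to a fixed destination) can be imitated in C with goto. This is a sketch with made-up function names; the assembly in the comment assumes a is in %rdi and b is in %rsi, and a real compiler might choose different instructions.

```c
/* Ordinary structured version. */
long max_structured(long a, long b) {
    if (a >= b) {
        return a;
    }
    return b;
}

/* "Assembly-shaped" version: one compare, one conditional jump to a label. */
long max_goto(long a, long b) {
    long result = b;
    if (a < b) goto done;   /* roughly: cmp %rsi, %rdi  then  jl done */
    result = a;
done:
    return result;
}
```

Both functions return the same values; the second just makes the compare-then-jump structure visible.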

Disassembly

  • Use objdump -d, or the disas command within gdb
  • Can also compile using -S flag for gcc to get a file
    • Using -O0 (the default) does no optimization; many things will be stored on the stack unnecessarily
    • Using -O3 does lots of optimization; gets stuff done without the stack where possible
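
To try these tools on something small, a function like the following could be saved in a file (say, demo.c, a hypothetical name). It is a stand-in for practice, not the lab's own code. Running gcc -S -O0 demo.c and then gcc -S -O3 demo.c should each produce a demo.s file that you can compare.

```c
#include <stddef.h>

/* Hypothetical practice function: counts chars before the terminating NUL. */
size_t demo_string_length(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

At -O0, expect to see s and n being stored to and reloaded from the stack; at higher optimization levels they will likely stay in registers.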

Compiler Explorer

  • Compiler Explorer shows C and assembly side-by-side
  • Setup:
    • Select “C” language on the left side
    • Select “x86-64 gcc 11.5” from the right-side compiler menu
    • Add “-std=c99” to the compiler options on the right
      • Try “-Og” or “-O1” options for simpler assembly
    • Uncheck “Intel asm syntax” in the Output menu on the right
  • Right-click and use “View assembly documentation”

Exploring string_length_a

  • Compiler Explorer link
  • Can run objdump -d practice.bin
  • Could also run gdb practice.bin and then disas string_length_a

Lab Work

  • Let’s go on an adventure!
  • Partners reminder