CS 240 Lab 9: Data Structures in Memory

Peter Mawhorter

Outline

Assembly Coding
Data Structures

Assembly Coding

Segments

Programs are divided into three* segments:
- .text - read-only; contains program code
- .data - read/write; contains writable globals
- .bss - read/write; starts full of 0s

*Some formats support arbitrary segments

Details of ELF format if you’re curious

Details of the assembly relocation process

Directives

Start with . (reference)
Tell the assembler what to do next
(besides interpret assembly code into machine code).

.data    # assemble into .data segment
.quad 0  # assemble 8 bytes of 0
.string "My lovely string <3"
  # assemble these 20 bytes (including final NUL)

.text  # assemble into .text segment
.global main  # make "main" global for linking

Symbols

Store address for later use (reference)
Replaced with a number in machine code during linking

.data
count:  .quad 0  # remembers start of these 8 bytes

.text
countDown:  # remembers this instruction address
    mov count, %rax  # load from remembered address
    sub $1, %rax
    mov %rax, count  # store to remembered address
    ret

Symbols

Use $ (or lea) to use address as a value:

.text
msg: .string "Hi~\n"
.global show
show:
    sub $0x8, %rsp
    mov $msg, %rdi
    call printf # also a symbol
    add $0x8, %rsp
    ret

mov    $0x401136,%rdi
call   401030 # addr of printf
ret

Symbols

Use $ (or lea) to use address as a value:

.text
msg: .string "Hi~\n"
.global show
show:
    sub $0x8, %rsp
    mov msg, %rdi
    add $0x8, %rsp
    ret

mov    0x401136,%rdi  # WRONG
mov    $0x0,%rax
call   401030 # addr of printf
ret

Symbols

0000000000401136 <msg>:
  401136:   48                      rex.W
  401137:   69                      .byte 0x69
  401138:   7e 0a                   jle    401144 <show+0x9>
  40113a:   00                      .byte 0x0

000000000040113b <show>:
  40113b:   48 c7 c7 36 11 40 00    mov    $0x401136,%rdi
  401142:   48 c7 c0 00 00 00 00    mov    $0x0,%rax
  401149:   e8 e2 fe ff ff          call   401030 <printf@plt>
  40114e:   c3                      ret

Segmentation Fault!

Did you put code into the .data segment?
Did you try to write to the .text segment?
Did you forget to align the stack before a call?
Did you pass a bad pointer to printf?
Did you blow up the stack?
(check return pointer is okay)

Segmentation Fault!

Use GDB disas and x commands
- disas will show the current function
  (w/ address shows code that address is part of)
- stepi will step one instruction
- break to set breakpoints
  (use * for address-based; like break *show+7)
- info reg rsp (or another register) show contents
- use $rsp (or another register) w/ x:
  x /4xg $rsp would show first 4 quads on stack
  x /s $rdi to show string pointed to by %rdi

Prelab code outline

.data
# global variable z
z: .long 0
# format string for printf (read-only)
fmt: .string "Calculation produces %d\n"

.text
# Multiplies arg1 with itself and returns result (32-bit)
.global square
square:
    # multiply %edi by iteslf
    # copy %edi to %eax
    # return

.global main
main:
    # Don't forget to align the stack and
    # save callee-saved registers

    # First call: square(3)
    # Put 3 in %rdi, then call
    # store result in a caller-saved register

    # push that register before this call
    # Second call: square(4)
    # Put 4 in %rdi, then call
    # pop the value we pushed
    # store result in a caller-saved register
    # (or directly add it to the other result)

    # Add results into z
    mov %?, z

    # Setup args for printf
    # Set %rax to 0
    # Call printf

    # Don't forget to undo any stack alignment
    # Return 0

Data Structures

1D Arrays

Item i is at base + i * size
- base is the array address: the pointer
- size is the size of each element
Store size in a separate variable
Same formula as pointer math, arrays are pointers

2D Arrays

Extra magic for double-indexing:
- x[1] is the same as *(x + 1)
- x[1][2] is NOT ALWAYS the same as *(*(x + 1) + 2)
Instead, store whole array as if it were flat, to avoid overhead of having a pointer for each row.
- Compiler multiplies row-index by row length
- x[1][2] only makes sense if we know how big a row is
- Row length must be fixed at compile time (otherwise use pointers)

2D Arrays

// allocated statically on stack
int x[2][2];

x[1][1] is x + 4*2 + 4 = x + 12

4×4 + 8 = 24 bytes used

// allocated dynamically on heap
int **x;
x = (int**) malloc(sizeof(int*)*2);
x[0] = (int*) malloc(sizeof(int)*2);
x[1] = (int*) malloc(sizeof(int)*2);

x[1][1] is *(*(x + 8) + 4)

4×4 + 2×8 + 8 = 40 bytes used

2D Arrays

Would an array wear pants…

A 3-by-3 array containing the numbers 1-9 in the same order they appear on a telephone number pad (left-to-right 1/2/3 and top to bottom rows starting with 1/4/7). A pair of 3-legged pants has been put onto the array, with each column of the array wearing one leg of the pants.

…like this?

The same 3-by-3 array with the pants worn by the rows, so that the pant legs extend sideways from the left side to the right.

…or like this?

2D Arrays

The second image from the previous slide, where the array is wearing one pant leg on each row.

C uses row-major order: values are stored in-memory the same way that an English speaker would read them out: left-to-right across one row and then down to the next row.

2D Arrays

In assembly, you can just write things out in order, or use multiple rows to make it more legible:

nums: .long 1, 2, 3, 4, 5, 6, 7, 8, 9

nums: .long 1, 2, 3
      .long 4, 5, 6
      .long 7, 8, 9

The nums label just refers to the address of the first element; compiler needs to be told what the data means.

extern int nums[3][3];

Deeper Thoughts

All data is just bits + interpretation
Arrays, strings, x86 binary executables, etc.:
- Meaning/effect comes from human (or machine) interpretation
- Different interpretation → different meaning/effect
E.g., an array could be a hash table or a binary tree…
Semiotics is the study of how meaning gets made from language