CS 240: Malloc

Assignment: Malloc

Assign: Monday 27 November
Checkpoint: Prep exercises Part A through Part F by Wednesday 6 December
Due: Friday 15 December. Extensions through 4pm on Thursday December 21 may be granted, but that's a hard deadline.
Policy: Pair graded synthesis assignment
Partner search: Find a partner here.
Code: cs240 start malloc (if this doesn't work, use cs240.f23 start malloc)
Submit: git commit and git push your completed code.
Reference:

Assignment: Malloc
Overview
Setup
Tasks
Preparatory Exercises
- CSAPP Review
- Required Exercises to Complete for Checkpoint
Specification
- Function Specifications
- Programming Rules and Style
Implementation
- Basic: Implicit Free List Allocator
- Optional Improvements [Independent]
Compiling and Testing
How To Work
Submission
Grading
Extra Fun Open-Ended Extensions
- Implement realloc
- Extended Error-Checking

Overview¹

In this assignment you will implement your own dynamic memory allocator (malloc/free) for C programs!

Goals

To understand general principles and trade-offs in dynamic memory allocation.
To consider and empirically evaluate the effects of implementation and design choices on competing measures of performance.
To develop defensive design, coding, and debugging skills for environments where standard structure and amenities are unavailable.
To build a fundamental system component that you have been using (unwittingly?) since the first time you used a computer.
To synthesize your accumulated bit-twiddling, pointer-plying, and algorithm-accelerating wisdom to build and optimize a “1-sentence” project instead of taking a final exam.

Advice

This assignment requires careful bit-level manipulation of memory using many of C’s unsafe pointer casting features without much structure. Your allocator code will not be enormous, but it will be subtle. Take planning seriously and work methodically. Ignoring this advice could cost hours or days. Also keep in mind that performance is a major component of the evaluation for this assignment.

We recommend this path for preparation and development of a successful allocator:

Read the specification, complete the preparatory exercises,, and review effective work strategies.
Implement an implicit free list allocator in C (and test it).
Optionally, extend your implicit free list allocator to create an explicit free list allocator
Extra Fun Optionally, add open-ended extensions to the allocator.

Pair programming.

You can work in pairs on this assignment. If you would like use to assign you a partner based on goal score and working times, fill out this Google form (you can also find your own partner using the pair stream).

Time Reports

According to self-reported times on this assignment from a prior semester:

25% of students spent <= 6 hours.
50% of students spent <= 12 hours.
75% of students spent <= 15 hours.

These times represent a range of implementation strategies and stopping points.

Setup

Use a CS 240 computing environment. The allocator you build might work on similar platforms (no guarantees!), but it must work on our platforms for grading.

Get your repository with cs240 start malloc.

Your starter repository contains the following files:

Makefile – recipes for compiling
mdriver.c – testing driver
memlib.h – memory/heap interface
mm.c – memory allocator implementation
mm.h – memory allocator interface
traces/ – several trace files (.rep) used for simulated testing
remaining files are testing support files you do not need to inspect

Compile the allocator and test driver with make to produce an executable called mdriver.bin. Usage of the mdriver.bin testing executable is described later.

Tasks

You will modify mm.c to implement a memory allocator with the interface declared in mm.h and the functionality described by the Specification section below.

int   mm_init();
void* mm_malloc(size_t payload_size);
void  mm_free(void* payload_pointer);

The provided mm.c file partially implements an allocator based on an implicit free list. It provides skeletons of several helper functions beyond the interface above and an implementation of mm_malloc that uses these helper functions. Your job is to complete the helper functions (and possibly add more of your own) as well as mm_free() to implement a memory allocator with good balance of memory utilization and throughput. You may use any implementation strategy that works (subject to programming rules). Performance of your memory allocator is a large component of the grade. There are many opportunities for extensions, whether in increasingly sophisticated allocator implementations for better performance or in analysis and error-checking tools.

Grading considers design, documentation, style, correctness, and performance.

The remainder of this document includes:

Preparatory exercises for designing and simulating an implicit free list allocator.
The allocator specification.
Notes on implementing various allocator strategies.
Documentation of the test harness for correctness checking and performance evaluation.
Advice for effective work strategies.
Grading criteria.
Extra Fun Ideas for open-ended extensions.

Preparatory Exercises

As you read this document, complete these exercises to familiarize yourself with heap layout, block layout, and the provided starter code. These exercises will help you design your allocator implementation before diving into messy implementation details. Good planning is key when developing low-level systems software.

Preparation is your ticket for assistance.

You must complete the preparatory exercises (and show evidence) before asking questions about code or debugging on the main assignment.
You may ask questions on preparatory exercises at any time.

CSAPP Review

Read CSAPP section 9.9 to become familiar with the type of thinking and coding you will do to build the allocator. The book discusses an implicit free list allocator similar in organization to the one we suggest you build first. The valuable part is the descriptions. Do not get too caught up in the code details in Sec 9.9.12, since you will use different starter code.

Note: In CSAPP section 9.9, word means 4 bytes and double-word means 8 bytes, but in the CS240 slides and assignment, word means 8 bytes and double-word means 16 bytes,

Based on the CSAPP reading, you should complete the Lab 12 pre-lab.

In the past, we have also recommended the following practice problems on memory allocation, but you do not need to submit solutions to these. Solutions not in the textbook are available by visiting office hours.

CSAPP Practice Problem 9.6
CSAPP Practice Problem 9.7
CSAPP Homework Problem 9.15
CSAPP Homework Problem 9.16

Required Exercises to Complete for Checkpoint

These exercises will help you plan your allocator implementation and avoid wasting time writing C code before you understand what you are doing.

Unlike preparatory exercises for past assignments, these count toward your grade and have a separate checkpoint deadline. Create a Google Doc shared with your Malloc assignment partner and your instructors based on the template provided in lab. Write/draw your responses for these exercises in that document. Submit by sharing the Google Doc with both the instructors.

Other tips:

Take notes on anything you learn during these exercises.
Read instructions and comments in provided code carefully.
Draw detailed diagrams of the heap and execute operations on the visual heap by hand to become familiar with the details or to debug.
Every line of provided code does something meaningful and necessary.

A. Block Layout

Read mm.c to learn about the suggested starting block layout. Words are 8 bytes. Unless specific questions indicate otherwise, pages are 4096 bytes.² Pay special attention to the early comments in mm.c about block and heap layout, as well as the code in mm_init.

Submit text answers to these questions:

What is the minimum block size (in bytes) of our allocator? Explain your answer.
What is stored in the header of each block?
What is stored in the footer of each free block?
Why is a footer not needed in an allocated block?
What is the largest payload that could be allocated in a minimum-size block? Explain your answer.
What is stored in the last word of the heap? (We call this the heap footer; CSAPP calls this the epilogue.)
How much space in the heap is never part of any block? Explain your answer.

B. Heap Layout

For the following questions, you will draw heap layouts summarizing memory as an array of word cells. Use the block layout rules from mm.c and the heap setup code in mm_init.

Rather than using the horizontal heap layouts shown in CSAPP, use vertical heap layout consisting of vertically arranged rectangles, where each rectangle represents one or more 8-byte words. The heap should consist of the following:

An 8-byte heap header
A sequence of heap blocks, where
- Each free block has:
  - an 8-byte block header with
    - a block size (multiple of 16)
    - status bits:
      - 0b00 previous and current block both free
      - 0b01 previous block free and current block allocated
      - 0b10 previous block allocated and current block free
      - 0b11 previous and current block allocated
    E.g.,
    +------------------+ | 48 | 0b10 | # free block of size 48 bytes with prev block allocated +------------------+
  - A rectangle indicating the padding size in bytes (block size - 16 bytes)
    +------------------+ | ... 32 bytes ... | ~------------------+
  - An 8-byte block footer with the block size. (A footer has no status bits.). E.g.,
    +------------------+ | 48 | +------------------+
- Each allocated block has:
  - an 8-byte heap header with block size and status bits
  - A rectangle indicating the size in bytes for payload and internal fragmentation.
  An allocated block has no footer.
An 8-byte heap footer (with size 0 and status bits 0bp1, where the p bit indicates whether the previous bit is allocated.

Submit drawings of each the following heap layouts (for an implicit free list design):

A heap containing a single allocated block with size 48, assuming page size = total heap size = 64 bytes.
A heap containing a single free block with size 16, assuming page size = total heap size = 32 bytes.
A heap containing a free block of size 80, followed by an allocated block of size 32, assuming page size = total heap size = 128 bytes.
A heap containing, in order:
- an allocated block with size 48
- a free block with size 32
- a free block with size 64
- an allocated block with size 16
assuming page size = total heap size = 176

C. Starter Code Details

Read starter code in mm.c. Submit text answers to these questions:

What is the value of MIN_BLOCK_SIZE?
What is the result of PADD(0x0708, 5)?
What do the LOAD, LOAD_BLOCK_HEADER, and PLOAD functions do? Why might it be preferable to use them in your allocator code in place of normal C pointer operations? (Note that STORE, STORE_BLOCK_HEADER, and PSTORE are analogous.)
Answer the following questions after studying these helper functions and constants: block_get_header, block_set_header, block_get_footer, block_set_footer, block_succ, block_pred, status_size,status_pred, status_used, make_status, PRED_USED_BIT, USED_BIT.
1. Given a block pointer bp, explain what this statement does:
```
block_set_header(bp, make_status(block_get_header(bp), 0, USED_BIT))
```
2. Given a block pointer bp, write a single expression using the above helper functions to return the allocation status of the block following bp in memory order.
3. Given a block pointer bp, write a single expression using the above helper functions to return the allocation status of the block preceding bp in memory order.
4. Given a block pointer bp, write a sequence of statements using the above helper functions that changes the block header of the block pointed at by bp to indicate that the block preceding bp in memory order is an allocated block.
Answer the following questions about these functions from the starter code. Describe what they actually do in the starter code, not what they are supposed to do when the starter code is fleshed out. When answering these questions, assume that the DEBUG flag is set to 0.
1. What does mm_free do?
2. What does coalesce do?
3. What fit policy does the search function employ?
4. What are the possible values that search(48) can return, and under what conditions are they returned?
5. The extend_heap function is called only if no existing free block is large enough to satisfy the current allocation request. It calls coalesce on the heap space that it creates. In the starter code, coaleasce does nothing, but when you flesh it out later, it should correctly coalesce adjoining free blocks. Explain the purpose of calling the correct coalesce function in extend_heap.

D. Simulate the Starter Allocator

For this part, you will simulate the starter code manually on a sequence of allocation events and draw the state of the heap after each event in the simulations. Follow the drawing guidelines from part B. Copy-paste will speed your drawing.

The starter code is not a complete allocator. Do not simulate your guess of what an allocator should do. Simulate the starter code exactly as written. This will force you to read and understand the provided code carefully in detail. Your later self will appreciate it when you add comments to the starter code now to help remember what you learn about the starter code during the simulations.

Submit drawings of the heap after each event in this simulation. In the case where the heap is unchanged by an allocation event, you can just write “same heap as above”.

Simulate the following calls to the allocator code in order:

mm_init();
void* p0 = mm_malloc(12);
void* p1 = mm_malloc(16);
void* p2 = mm_malloc(16);
mm_free(p0);
mm_free(p1);
void* p3 = mm_malloc(24);

E. Run the Starter Allocator on a Trace File

A sequence of allocation events can be represented in a trace file. The format of trace files is described here. For example, the allocation events from the simulation in the previous part are encoded in the trace file traces/small.rep.

In this part, you will execute the starter file on this trace file. In mm.c, make sure that the DEBUG flag is set to 1 (the default). In your cs240-repos/malloc directory, execute the following two instructions:

make 
./mdriver.bin -V -f traces/small.rep

This should display a long log of allocation steps.

The log contains a sequence of chunks, where each chunk begins with a call to an allocation function and is followed by a summary of the heap displayed by calling the check_heap helper function. Calls to check_heap are guarded by a conditional involving the DEBUG flag, like this:

if (DEBUG) {
  check_heap(0);
}

This means that the heap will be displayed only when the value of the DEBUG flag is nonzero. When doing simple tests, you want the DEBUG flag to be set to 1, but should set it to 0 when there would be an overwelming amount of ouptut from the calls to check_heap.

Copy into your Google Doc the portion of the last chunk (showing a call to free).
Describe all the parts of a general chunk displayed by check_heap. What abbreviations are used in a line and what do they stand for?

F. Simulate the Correct Allocator

The simulation in parts D is not very interesting because it is for a very crude first-fit implicit list allocator that does not implement splitting, freeing, or coalescing. In this part, you should redo the simulation of the allocation events from Part D, but instead assume that splitting, freeing, and coalescing work as they are supposed to.

Submit drawings of the heap after each event in this simulation.

G. Write Features in Pseudocode

If you will not work with your lab partner for the main Malloc assignment, wait and work on this part with your assignment partner.

For each of the following features, add the feature by sketching pseudocode comments within mm.c.

Write pseudocode at a level of detail that is sufficient to simulate and draw concretely, but do not get tangled up in C-level pointer work. Well-written comments at this stage can remain as documentation when you move on to implementation in C.

Write a pseudocode implementation of mm_free as // inline comments in the body of mm_free.
Write a pseudocode implementation of splitting in allocate as // inline comments in the body of allocate.
Write a pseudocode implementation of coalesce as // inline comments in the body of coalesce.

Submit your pseudocode by copying your commented versions of mm_free, allocate, and coalesce into your Google Doc for this part.

Specification

Function Specifications

The three main memory management functions should work as follows.

int mm_init(): Initialize the heap. Return 0 on success or -1 if initialization failed. Client applications must call mm_init once to initialize the system before the first call to mm_malloc() or mm_free(). A complete implementation of mm_init for implicit free lists with first-fit allocation policy is provided.
void* mm_malloc(size_t payload_size): Allocate and return a pointer to a block payload of at least payload_size contiguous bytes or return NULL if the requested allocation could not be completed.
- Payload addresses must be aligned to 16 bytes.
- Blocks must be multiples of 16 bytes in size.
- Allocated blocks must be non-overlapping and within heap bounds.
- size_t is a type for describing sizes; it is an unsigned integer large enough to represent any size within the address space.
void mm_free(void* payload_pointer): Free the allocated block whose payload is referenced by the pointer payload_pointer. Assume that payload_pointer was returned by an earlier call to mm_malloc() and has not been passed to mm_free since its most recent return from mm_malloc.

The specifications for mm_malloc and mm_free match those for the malloc(1) and free(1) functions, respectively, in the standard C library.

Programming Rules and Style

Do not change any of the mm_ function types in mm.c/mm.h. You may add, remove, or change helper functions in mm.c as you wish. Declare all helper functions (other than the interface above) as static (visible only within the file, so they do not impact our testing code).
Do not call any standard memory-management related library functions or system calls such as malloc, calloc, free, etc. You may use all functions in memlib.c, but if you use the provided starter code, you likely will not need additional uses of the memlib.c functions.
Avoid declaring global or static mutable variables. (Constants are OK: static const MY_CONSTANT = ...;) If you think you need them, consider how to store them within the heap region instead.
Write function header comments for new functions (and expand the existing header comments for the main existing functions mm_malloc, mm_free, search, allocate, and coalesce) to describe what the function does, what policy it follows (if applicable), and what it assumes. Use inline comments to describe details as needed.
Since some of the unstructured pointer manipulation inherent to allocators can be confusing, we recommend small helper functions and short inline comments on steps of the allocation algorithms.

Implementation

You may take any approach to implementing the specified allocator behavior above. Your implementation will be evaluated based on correctness, performance, documentation, and style. This section suggests an incremental implementation strategy that allows you to choose what balance of implementation sophistication and performance you prefer.

Start with an implicit free list allocator implementation (and commit and push that code once it works!). If you wish to improve the performance of your allocator once you have implemented and tested an implicit free list approach, consider an explicit free list or other search strategies as next steps, or eventually explore more sophisticated improvements.

Basic: Implicit Free List Allocator

Convert your pseudocode feature implementations to C code one at a time, testing and committing each before starting C code for the next feature.

Implement and test mm_free.
Implement and test splitting in allocate.
Implement and test coalesce.

Run tests on individual traces for debugging and on all traces for general testing and performance evaluation.

Some provided traces will cause your allocator to run out of memory unless you have completed all three features.
Any other errors or crashes indicate correctness problems you must fix.
The traces small.rep, short1-bal.rep, and short2-bal.rep should work from the beginning, but they are suitable only as small debugging samples.
Write your own small traces to help test or debug individual cases.
Use check_heap to sanity-check each feature before implementing the next or to help debug when things go wrong.

Implement and test incrementally.

Implement, test, and commit one feature (or sub-feature!) at a time, testing and committing each before you start implementing the next. This will make it much easier to understand what code is involved in a bug.

Optional Improvements [Independent]

If you wish to improve the performance of your allocator further, consider any of these directions. Any improvements beyond the basic implicit free list are [Independent] work (though unlike previous independent work, course instructors will be more willing to discuss these extensions).

Be sure to commit your working implicit free list allocator with a recognizable commit message before attempting any improvements.

Basic/Intermediate: Search Strategy [Independent]

Once you have a working allocator using a first-fit policy with freeing, splitting, and coalescing implemented, one potential performance improvement strategy is to try alternative fit policies. Consider how they are likely to affect your performance index relative to first fit and relative to the performance improvements that could result from an explicit free list. For a next-fit policy, we suggest:

Add a static const int NEXT_FIT_POLICY that is 1 if using next fit or 0 otherwise. Write code such that you can easily switch policies by switching this value.
Use a global variable—or better, the first heap word—to save a pointer to the block where the next search should begin. Add helper functions for getting and setting this pointer if using the latter.

Intermediate: Explicit Free List Allocator [Independent]

The best way to develop an explicit free list allocator (or a more sophisticated seglist allocator) is to first develop a working implicit free list allocator and extend it to use explicit free lists.

Memory Order vs. List Order

Explicit free-list allocators must distinguish memory order from list order. Memory order refers to the order of blocks as arranged in the memory address space. We use the terms adjacent, preceding, predecessor, following, and successor to refer to memory order. List order refers to the order of blocks in the explicit free list. We use the terms next and previous to refer to list order. Confusing these orders will lead to tricky bugs.

Some suggestions as you implement explicit free lists:

An explicit free list allocator (or a next-fit allocator) must store a pointer to the list head node somewhere. The first word of the heap (or a single global pointer variable if necessary) is a good place to store this. Helper functions for getting and setting this pointer will keep list manipulations clean.
Free blocks must contain next and previous pointers. Consider how this affects MIN_BLOCK_SIZE. Choose a fixed offset within the block to store each of the pointers. Helper functions for getting and setting these pointers will keep list manipulations clean. Using block headers as the target of these pointers in all cases is the simplest, easiest, most efficient option.
Helper functions to insert a block into the free list and remove a block from the free list are advisable, as you will need to do these same steps in multiple places.
We suggest disabling coalescing, splitting and mm_free while you complete initial development of the explicit free list.
Update mm_init, search, allocate, and extend_heap to use your explicit free list. Test.
Update and test each of mm_free, splitting, and coalescing for the explicit free list.

Advanced: Seglists, Alternative Policies, and Beyond [Independent]

A clean and efficient working explicit free list allocator should achieve a respectable performance index. To achieve even better performance, you could consider implementing seglists, alternative free list representations, or alternative search policies. If you embark on this path, weigh difficulty of implementation against likely effect on performance index.

Compiling and Testing

Test Harness

The mdriver.bin program is a testing harness that evaluates your mm.c implementation for correctness, space utilization, and throughput. The test harness uses trace files to simulate memory management workloads using your mm.c implementation. A trace is a sequence of allocate and free events. mdriver.bin simulates a trace by calling mm_malloc and mm_free for each corresponding event in the trace in order.

To build the mdriver.bin program, run make or make mdriver.bin. To create an optimized version for performance testing (not debugging), run make mdriver.opt.bin.

Use the mdriver.bin executable during development, testing, and debugging. It is compiled with minimal compiler optimizations and with assertions enabled.
Use the mdriver.opt.bin executable for performance evaluation only. It is compiled with advanced compiler optimizations and with assertions disabled, making it more difficult to debug. Its usage is identical to the usage of mdriver.bin, so we list only the former in examples.

To run the mdriver.bin executable, use the command ./mdriver.bin -V or ./mdriver.bin -V. The -V flag displays helpful summary information as described below. The mdriver.bin executable accepts the following command line arguments:

-t <tracedir>: Look for the default trace files in directory tracedir instead of the default directory defined in config.h.
-f <tracefile>: Use one particular tracefile for testing instead of the default set of tracefiles.
-h: Print a summary of the command line arguments.
-l: Run and measure libc malloc in addition to your mm.c implementation.
-v: Verbose output. Print a performance breakdown for each tracefile in a compact table.
-V: More verbose output. Prints additional diagnostic information as each trace file is processed. Useful during debugging for determining which trace file is causing your mm.c implementation to fail.

Note that the mdriver.bin test harness runs each trace at least twice in each session: once to check correctness and once to check performance. It is important to remember this when examining debugging output.

Common Testing Tasks

Run mdriver.bin on individual traces for debugging: ./mdriver.bin -V -f traces/your-favorite-trace.rep
The traces short1-bal.rep and short2-bal.rep should work from the beginning (inefficiently). You may find it useful to write additional traces for debugging.
Run mdriver.bin on all traces for correctness testing and performance evaluation: ./mdriver.bin -V
Some traces in the traces directory will cause your allocator to run out of memory unless you have completed all three implicit allocator features.
Any other errors (other than out-of-memory) or crashes indicate correctness problems you must fix.

Trace Format

When learning about the starter code, you will simulate small traces. When testing and debugging, you may find it useful to write and test your own small traces.

Traces used by mdriver.bin summarize the execution of a program as a sequence of mm_malloc and mm_free calls in a simple format.

A trace file contains 4 header lines:

Suggested heap size (any number, ignored by our tests).
Total number of blocks allocated.
Total number of malloc/free events.
Weight (any number, ignored by our tests).

Remaining lines after the header give a sequence of memory management events, one per line. Each event is either an allocate event or a free event:

Event a i size indicates the ith call to malloc in the trace, requesting a payload og the given size. The ID i uniquely identifies the malloc call and the allocation it makes, starting at 0 for the first allocation in the trace.
Event f i indicates a call to mm_free with the pointer that was returned by the ith malloc call in the trace.

The following example C code would generate the corresponding trace below it.

C code:

p0 = malloc(12);
p1 = malloc(16);
p2 = malloc(16);
free(p0);
free(p1);
p3 = malloc(24);

A corresponding trace:

Heap Consistency Checker

The starter code function check_heap implements basic heap consistency checks for an implicit free list allocator and prints out all blocks in the heap in memory order.

You may find it helpful to insert calls to check_heap to assist in testing and debugging allocator features you add. Make sure to set DEBUG=0 to disable check_heap when you submit your code. These checks will otherwise hurt your performance evaluation.

You may also find it helpful to extend the check_heap function to check more detailed consistency properties. This becomes most interesting when considering an explicit free list (or more sophisticated) allocator. check_heap should return a nonzero value if and only if your heap is consistent according to the conditions you check.

Example checks to add:

Are all blocks present in the (explicit) free list also marked as free?
Are all blocks present in the (explicit) free list also valid block addresses?
Are all blocks marked as free also present in the (explicit) free list?
Are all free blocks surrounded by allocated blocks (i.e., are all free blocks fully coalesced)?
Are all blocks non-overlapping (check boundary tags)?
Are all heap words (except the heap header and heap footer) part of some block?

Feel free to rename or split check_heap into multiple static helper functions. When you submit mm.c, make sure all calls to check_heap are conditioned on DEBUG; otherwise they will impact performance.

Performance

Performance of your allocator is important. Use mdriver.opt.bin for performance evaluation.

For the most part, a correct implementation based on our provided code will yield passable performance. Two performance metrics will be used to evaluate your solution:

Space utilization: The peak ratio between the aggregate amount of memory used by the driver (i.e., allocated via mm_malloc but not yet freed via mm_free) and the size of the heap used by your allocator. The optimal ratio is 1. Choose policies that minimize fragmentation to maximize utilization.
Throughput: The average number of allocator operations completed per second.

The mdriver.opt.bin program computes a performance index, which is a weighted sum of the space utilization and throughput:

P = 0.6 × U_mm.c + 0.4 × min(1, T_mm.c / T_libc)

where U_mm.c is the space utilization of your mm.c implementation, T_mm.c is the throughput of your mm.c implementation, and T_libc is the estimated throughput of the standard C library (libc) version of malloc on the default traces. The performance index values both good space utilization and good throughput, with slight preference toward space efficiency.

Your allocator must balance utilization and throughput. A performance index of 70-80 or above (out of 100) is pretty good. For reference, representative implementations have achieved the following performance indices (out of 100):

Implicit free list with first-fit, splitting, and coalescing: around 50
Implicit free list with next-fit, splitting, and coalescing: high 50s to 70s or 80s
Explicit free list with first-fit, splitting, and coalescing: low 90s

Better performance likely requires a more sophisticated free list implementation. Before haphazardly attempting to “optimize” your implementation, read these hints about performance engineering and optimization.

How To Work

Low-level unguarded memory manipulation code like this allocator is prone to subtle and frustrating bugs. There’s a reason you often use higher-level languages with more protections! The following strategies help to work effectively and responsibly to avoid frustration and confusion.

Plan Carefully

Review allocator concepts, complete the preparatory exercises, and read the provided code carefully. Never write code until you have a good idea of what you are doing and what is already there.

Code Defensively

Expect things to go wrong. Anywhere your code relies on an assumption, check it explicitly. Explain to your partner why your code should work. Write error-prone code once carefully and reuse it.

Use assertions. Anywhere you assume a property of data to hold, you can assert that property explicitly. Assertions document your assumptions. They are also checked explicitly as your program runs, making it easier to localize the source of an error if one occurs.
Use the provided helpers for memory access (LOAD, STORE, PLOAD, PSTORE.) and pointer arithmetic (PADD and PSUB). This helps avoid pointer manipulation errors or at least catch them early.

Implement and Test Incrementally

Implement, test, and commit one feature (or sub-feature!) at a time, testing and committing each before you start implementing the next. This will make it much easier to understand what code is involved in a bug.

Debug Methodically

GDB and the check_heap function are the tools of choice. Here are some strategies and other tips.

As you are debugging, remember that the mdriver.bin testing harness runs each trace twice: once to check for correctness and once to measure performance. Keep this in mind when examining any debugging output, especially if it seems to appear twice…
Did your program hit an explicit error? An assertion failure? A segfault?
- Localize the crash. Exactly what line of code, what operation in this line, what value(s) used by this operation manifested what error and crashed? This symptom rarely indicates the cause, but it marks where to begin the search.
  - Use gdb to run the code until it crashes, then find the line that crashed (and the context in which it executed) with bt, backtrace, or where.
  - Inspect the most relevant line of code and the error report to determine what the error means. Was it a segfault? If it was an assertion failure, what was the assertion checking?
  - Inspect the arguments shown in the backtrace or print variables to determine what ill-formed pointer was dereferenced to cause a segmentation fault or what illegal values failed an assertion.
- Trace backward through the dependencies of this broken operation to the original logic error.
  - Invalid values: Did any of this operation’s arguments hold invalid values? Did this operation load any invalid values from memory?
    - What operations produced these invalid values?
      - This is easy to answer for invalid arguments.
      - For invalid memory values, consider what earlier operations (perhaps in completely separate function calls) may have saved (or failed to save) this value. Look ahead to the memory tips.
    - Continue tracing backwards treating these producer operations as broken operations.
  - Invalid choices: Was it invalid to apply this operation here given correct arguments (or regardless of the arguments)?
    - What control flow decision led to execution of this operation?
    - Is the logic of the decision correct?
    - If so, continue tracing backwards treating the control flow decision as a broken operation.
  - Callers: You may need to trace back beyond the function where the crash happened.
    - Get a backtrace to see where the current function was called. (Don’t just assume you know which function called this one. Get the truth.)
    - If you want to inspect local variables in functions that called this one, use up and down in gdb to step up and down the call stack, then print what you want to see.
  - Memory: Remember, mm_malloc and mm_free are called many times during a single execution. They depend on their arguments, but also the entire contents of the heap as it exists when they are called, so changes made by earlier calls to mm_malloc and mm_free affect data used by this one.
    - Use the check_heap function to scan the heap.
    - Run it at arbitrary times in gdb with call check_heap().
    - Or hard-code calls to it in your code so you see how the heap evolves with each mm_malloc or mm_free call.
- Minimize the input. Try to write a small trace that captures the essence of the large trace on which your allocator has an error. A smaller trace is easier to reason about in full.
valgrind will be less useful since you are doing very low-level work, implementing the memory allocator itself. valgrind excels in analyzing programs that use the memory allocator.
If resorting to printing (try gdb first), use fprintf(stderr, "...", ...) instead of printf("...", ...) and be sure to remember newline for clarity. Disable any printing in final version.
Use the mdriver.bin -f option. During initial development, using tiny trace files will simplify debugging and testing. We have included two such trace files (short1-bal.rep and short2-bal.rep) that you can use for initial debugging. You can also write your own for targeted debugging/testing.
Use the mdriver.bin -V option. The -V will option will give a detailed summary for each trace file and indicate when each trace file is read, which will help you isolate errors.
Does your code seem to be running forever?
- That might be because it is no longer broken! Set DEBUG=0 to disable those calls to check_heap. Printing the full contents of the heap on every memory management event costs orders of magnitude more than the allocations or frees themselves.
- Or, it might be because a logical mistake is causing an loop. Reenable debugging and check with gdb.

Optimize Judiciously

“… avoid premature optimization…” – Donald Knuth (or Tony Hoare?) and countless others.

When your program is not as efficient as you wish, it is easy to jump in and start hacking on the first part that comes to mind as a potential reason for poor performance. A much better approach is to first:

Make sure you have enabled all available automatic optimizations, such as compiler optimizations.
If this does not yield sufficient gains, then:
- Gather and analyze empirical data about where your program is spending its time or other resources.
- Use this data to form a hypothesis about what particular implementation features are “bottlenecks” contributing to poor performance.
- Now you are ready to attempt some modifications and use this same measurement process to evaluate their impact.
  - Think first in Big-O terms. Consider lower level implementation optimizations only once you have exhausted opportunities for algorithmic optimizations.

Using a Profiler for More Detailed Time Measurement

Your performance index measures the efficiency of your implementation, but gives no hints about why or where it is (in)efficient. Using a profiler can help answer these questions and support more informed choices about performance engineering.

A profiler essentially measures (via some approximation) where your code spends most of its execution time. Presented with a report of this information, it is hopefully intuitive that:

Functions that account for the largest chunk of execution time are the functions where you should focus your optimization efforts: In an ideal world, doubling the speed of a function that accounts for 50% of overall execution time should cut 25% off the execution time.
Functions that account for only small fractions of execution time are not worth optimizing: even if you could eliminate the cost of a function that accounts for 0.5% of execution time, this would make overall execution only 0.5% faster.

Things are not quite this clear-cut, since there can be complex interactions between the choices made in one function and the efficiency of another. However, to a first approximation, this is a useful way of thinking about profiling results.

gprof is a great place to start. Enable it by passing the -pg option to the compiler. (See comments in Makefile.) Now, when run, the compiled executable will generate a file gmon.out that can be interpreted by running the gprof command. Read about gprof for more on how to use the profiler and its results. Be sure to turn off -pg for your performance evaluation, because it will slow things down.

After making a change to the Makefile’s configuration of the compiler, make clean to remove the old executables, then make again to compile fresh executables using the new compiler configuration.

Submission

To submit the required prep exercises, share your Google Doc with the instructors.

Before submitting, disable any diagnostic printing that you added (except in check_heap). Make sure DEBUG is off (set to 0).

For code, submit the best implementation that you have. If you have a partially implemented improvement or if your most recent version is not your best:

Copy your partial or non-best version of mm.c to a new file starting with mm- and ending with .c. Then add and commit it with a descriptive message. For example:
```
cp mm.c mm-partial-explicit.c
git add mm-partial-explicit.c
git commit -m "Copied partial explicit free list implementation to mm-partial-explicit.c"
```
Use git log to find the commit ID (large hexadecimal number) of your best working version.
Use git checkout ID -- mm.c (replace ID with the commit ID you found above) to restore the working version you will submit.
Rebuild it with make clean and make, then test it to make sure this is the one.

Add and commit this version of mm.c:

git add mm.c
git commit -m "Restored best working version of mm.c"

Then, to submit your code, follow the usual steps:

Submit: The course staff will collect your work directly from your hosted repository. To submit your work:

Test your source code files one last time. Make sure that, at a minimum, submitted source code is free of syntax errors and any other static errors (such as static type errors or name/scope errors). In other words: the code does not need to complete the correct computation when invoked, but it must be a valid program. We will not grade files that do not pass this bar.
Make sure you have committed your latest changes. (Replace FILES with the files you changed and MESSAGE with your commit message.)
```
$ git add FILES
$ git commit -m "MESSAGE"
```
Run the command cs240 sign to sign your work and respond to any assignment survey questions.
```
$ cs240 sign
```
(If this encounters an error, instead execute cs240.f23 sign.)
Push your signature and your latest local commits to the hosted repository.
```
$ git push
```

Confirm: All local changes have been submitted if the output of git status shows both:

Your branch is up to date with 'origin/main', meaning all local commits have been pushed
nothing to commit, meaning all local changes have been committed

Resubmit: If you realize you need to change something later, just repeat this process.

Grading

Your grade will be calculated from 120 points as follows:

Preparatory Exercises (20 points):
- Submit a shared Google Doc by the separate prep exercise deadline.
- You are encouraged to start working on your code as soon as you have the prep exercises done, even if you do not have your grade back on them yet.
Documentation and Style (10 points):
- Document your code clearly.
- Follow the programming rules.
Correctness (50 points): based on passing mdriver.bin tests
- 2.5 points each for short1-bal.rep and short2-bal.rep.
- 5 points each for the 9 remaining traces, excluding the optional realloc*.rep traces.
Performance (40 points): 40 × NP
- NP is the normalized mdriver.opt.bin performance index.
- These points are possible only for working implementations.
Extra Fun Successful open-ended extensions may earn additional points.

You may use any implementation strategy that works (subject to programming rules). Allocator performance is a large component of the grade for this assignment. A high-quality implicit free list allocator with a performance index around 50/100 can achieve a grade of about 80; higher grades require improved implementations.

Extra Fun Open-Ended Extensions

There are a number of interesting allocator features you could add if looking for more fun or extra credit. Each is annotated with a possible extra credit award for a thorough, well-designed, well-implemented extension. Other improvements to allocator performance on the standard test suite will naturally be reflected in your grade independent of extra credit. Talk to the instructors if you are curious about any of these extensions or if you have ideas for others. Some larger extensions would be larger than the original assignment. In general, the amount of work required per extra point is far larger than the amount of work required per standard point, so you should only approach these once the rest of your project is complete, well-tested, and well-documented.

Implement realloc

(Up to +10) Implement another memory-allocation-related function: mm_realloc. The signature for this function, which you will find in your mm.h file, is:

extern void* mm_realloc(void* ptr, size_t size);

Similarly, you should add the following in your mm.c file:

void* mm_realloc(void* ptr, size_t size) {
  // ... implementation here ...
}

Follow the contract of the standard C library’s realloc exactly (pretending that malloc and free are mm_malloc and mm_free, etc.). The man page entry for realloc says:

   The realloc() function changes the size of the memory block pointed to by
   ptr to size bytes.  The contents will be unchanged in the range from the
   start of the region up to the minimum of the old and new sizes.  If the
   new size is larger than the old size, the added memory will not be
   initialized.  If ptr is NULL, then the call is equivalent to
   malloc(size), for all values of size; if size is equal to zero, and ptr
   is not NULL, then the call is equivalent to free(ptr).  Unless ptr is
   NULL, it must have been returned by an earlier call to malloc(), calloc()
   or realloc().  If the area pointed to was moved, a free(ptr) is done.

A good test would be to compare the behavior of your mm_realloc to that of realloc, checking each of the above cases. Your implementation of mm_realloc should also be performant. Avoid copying memory if possible, making use of nearby free blocks. You may not use standard library functions such as memcpy to copy memory. Instead, copy WORD_SIZE bytes at a time to the new destination while iterating over the existing data.

To run tracefiles that test mm_realloc, compile using make mdriver-realloc.bin. Then, run mdriver-realloc.bin with the -f flag to specify a tracefile, or first edit config.h to include additional realloc tracefiles in the default list.

Extended Error-Checking

(Up to +100) Augment the allocator to do additional error-checking on the fly to help detect common memory errors such as invalid frees (bad address), double-free, use-after-free, out-of-bounds access on heap objects corrupting heap metadata, etc. You could insert extra padding, use canary values, place recently freed blocks in a waiting area, track additional information about earlier allocations and frees to detect errors, or more. Come chat if you have ideas or interest here.

This document is based on the CSAPP Malloc Lab, which is available on the CSAPP website. The Malloc assignment includes additional structure in the starter code and makes the realloc implementation optional. ↩
The page size 4096 is the number of bytes returned by mem_pagesize() and used by heap enlarging function mem_sbrk to make the heap larger. ↩