Remembrallocator
code Implement a memory allocator.
- Due: By the end of finals, 4:00pm Monday, 16 May
- Starter Code: fork wellesleycs240 / cs240-malloc (just once!) and add bpw as admin. (Need help?)
- Submit:
- Commit and push your final revision and double-check that you submitted the up-to-date version. (Need help?)
- Do not submit a paper copy.
- Relevant Reference:
- Collaboration: Optional pair code assignment policy, as defined by the syllabus.
Ben will be available during most of reading period and finals for ad hoc office hours.
Contents
- Overview
- Specification
- Provided Code and Block Layout
- Memory System
- Programming Rules
- Compiling and Testing
- Design and Debugging Hints
- Grading
- OPTIONAL Extra Credit
Overview
In this assignment you will implement a dynamic memory allocator for C programs, i.e., your own version of malloc
and free
. This task requires careful bit-level manipulation of memory contents using many of C’s unsafe pointer casting features. The code for your allocator will not be very large, but it will be subtle.
The code for your allocator will not be very large, but it will be subtle. Before coding:
- Review the workings of explicit free-list allocators.
- Read and understand the provided code and heap layouts.
- Write all the steps of your malloc and free algorithms in pseudocode and pictures before writing a single line of C.
Preparation
In the past, students who took the time to read and understand material about allocators in the textbook went on to implement this allocator quickly and efficiently with relatively little debugging. Students who did not understand the preliminaries before starting struggled much more while coding and debugging. Here are two ways to prepare:
In lab
In lab, you will explore heap layout rules for this allocator, as well as provided code that establishes rules for manipulating the heap. As you go, you will annotate an under-documented version of the provided code. Be sure you understand the material examined in the lab well before writing any of your own code. Other general tips:
- Take notes on anything else you learn during the lab that it does not explicitly require.
- Read these instructions and comments in provided code carefully.
- Draw diagrams of the heap and execute operations on the visual heap by hand to become familiar with the details.
- Every line of provided code does something meaningful and necessary.
On your own
Read CSAPP section 9.9 to become familiar with the type of thinking and coding you will do to build the allocator. The book discusses an implicit free list allocator (we build an explicit freelist), but the material is still very relevant to your task. The following practice problems on memory allocation may be helpful in preparing for this assignment. They will not be graded. Solutions not in the textbook are available by visiting office hours.
Note “word” means 4 bytes for these problems.
- Practice Problem 9.6
- Practice Problem 9.7
- Homework Problem 9.15
- Homework Problem 9.16
Specification
The starter code includes several files. You will modify mm.c
to implement an allocator interface as declared in mm.h
.
int mm_init(); // Provided
void* mm_malloc(size_t size); // Implemented by you.
void mm_free(void* ptr); // Implemented by you.
The provided mm.c
file partially implements an allocator based on an explicit free list. It provides several helper functions beyond the interface above. Your job is to complete mm_malloc()
and mm_free()
. The three main memory management functions should work as follows:
-
int mm_init()
(provided): Initialize the heap. The return value is -1 if there was a problem in performing the initialization and 0 otherwise. The application program (in this case, the driver we provide) callsmm_init
exactly once to initalize the system before callingmm_malloc()
ormm_free()
. -
void* mm_malloc(size_t size)
: Return a pointer to an allocated block payload of at leastsize
bytes within the heap orNULL
if an error occurred. All payloads must be aligned to 8 bytes. The entire allocated block should lie within the heap region and should not overlap with any other allocated block. (size_t
is a type for describing sizes; it’s an unsigned integer large enough to represent any size within the address space.) -
void mm_free(void* ptr)
: Free the allocated block referenced by the pointerptr
. Assume thatptr
was returned by an earlier call tomm_malloc()
and has not yet been freed. These semantics match the semantics ofmalloc
andfree
in the standard C library. Typeman malloc
in the shell for complete documentation.
Provided Code and Block Layout
We provide code in mm.c
to help maintain an explicit free list using a specific block layout, documented with the code. You are encouraged to use this approach and code, but you may implement mm.c
in any way you wish. Be sure to understand the block layout and the preconditions and postconditions of the provided code before using it.
Explicit free-list allocators must distinguish memory order from list order. Memory order refers to the order of blocks as arranged in the memory address space. We use the terms adjacent, preceding, and following to refer to memory order. List order refers to the order of blocks in the explicit free list. We use the terms next and previous to refer to list order. Confusing these orders will lead to tricky bugs.
Memory System
The memlib.c
package simulates the memory system for your
dynamic memory allocator. You will not need to call these functions directly, but understanding their effects is necessary for understanding the provided helper functions above.
-
void* mem_sbrk(int incr)
: Expand the heap byincr
bytes, whereincr
is a positive non-zero integer and returns a pointer to the first byte of the newly allocated heap area.The semantics are identical to the Unix
sbrk
function, except thatmem_sbrk
accepts only a positive non-zero integer argument. (Runman sbrk
if you want to learn more about what this does in Unix.) This function and its approach to managing heap bounds date from before virtual memory. We use it for simplicity (rather than more flexible ways of managing available pages). -
void* mem_heap_lo()
: Return a pointer to the first byte in the heap -
void* mem_heap_hi()
: Return a pointer to the last byte in the heap. -
size_t mem_heapsize()
: Return the current size of the heap in bytes. -
size_t mem_pagesize()
: Return the system’s page size in bytes (4K on Linux systems).
Programming Rules
-
Do not change any of the function types in
mm.c
. You may add as many helper functions and macros as you wish. -
Do not invoke any standard memory-management related library calls or system calls. Do not use
malloc
,calloc
,free
,realloc
,sbrk
,brk
or any variants of these calls in your code. (You may use all the functions inmemlib.c
, of course.) -
Do not define any global or
static
compound data structures such as arrays, structs, trees, or lists in yourmm.c
program. You may declare global scalar variables such as integers, floats, and pointers inmm.c
, but avoid them if at all possible.
Compiling and Testing
The driver program mdriver.c
tests your mm.c
implementation for correctness, space utilization, and throughput. Use the
command make
to generate the driver code and run it with
the command ./mdriver -V
(the -V
flag
displays helpful summary information as described below).
The driver program is controlled by a set of trace files. Each trace file contains a sequence of allocate and free directions that instruct the driver to call
your mm_malloc
and mm_free
functions in some
sequence. The driver and the trace files are the same ones we will use
when we grade your submitted implementation.
The mdriver
executable accepts the following command
line arguments:
-t <tracedir>
: Look for the default trace files in directorytracedir
instead of the default directory defined inconfig.h
.-f <tracefile>
: Use one particulartracefile
for testing instead of the default set of tracefiles.-h
: Print a summary of the command line arguments.-l
: Run and measurelibc
malloc in addition to the student’s malloc package.-v
: Verbose output. Print a performance breakdown for each tracefile in a compact table.-V
: More verbose output. Prints additional diagnostic information as each trace file is processed. Useful during debugging for determining which trace file is causing your malloc package to fail.
Debugging haphazardly is slow and frustrating. Debugging methodically is much more efficient. See the tips below to become a better debugger.
Design and Debugging Hints
Low-level unguarded memory manuiplation code like this allocator is prone to subtle and frustrating bugs. There’s a reason you often use higher-level languages with more protections! The following strategies can help you replace long hours of frustration and confusion with shorter effective debugging efforts.
There are three main pieces to effective debugging. The first steps toward effective debugging happen before you write any code to help avoid bugs in the first place and make them more obvious if they do occur.
Plan Carefully
Never write code until you have a good idea of what you are doing.
-
Understand the malloc implementation in the textbook. The textbook has a detailed example of a simple implicit free list allocator. Use this as a point of departure. Do not start working on your allocator until you understand everything about the simple implicit list allocator.
-
Read and understand the provided code before writing any of your own.
-
Write out your algorithms in pseudocode and execute them in pictures before writing a single line of C.
Code Defensively
Expect things to go wrong. Anywhere your code relies on an assumption, check it explicitly. Explain to your partner why your code should work. Write error-prone code once carefully and reuse it.
-
Use assertions. Anywhere you assume a property of data to hold, assert that property explicitly. Assertions document your assumptions. They are also checked explicitly as your program runs, making it easier to localize the source of an error if one occurs.
-
Encapsulate pointer arithmetic in C preprocessor macros. Pointer arithmetic in memory managers is confusing and error-prone because of all the casting that is necessary. We have supplied maros that do this: see
PADD
andPSUB
.
Debug Methodically
- Did your program hit an explicit error? A segfault?
- Localize it. Exactly what line of code, what operation in this line, and what value used by this operation manifested the error and crashed?
- Use
gdb
to run the code until it crashes, then find the line that crashed. - Determine what ill-formed pointer was dereferenced to cause a segmentation fault. Use
print
to inspect the value of a variable or other expression.
- Use
- Trace backward through the dependences of this operation to the original logic error.
- Did any of this operation’s arguments hold invalid values?
- If so, what operations produced these invalid values?
- Continue tracing backwards treating these producer operations as broken operations.
- Was it invalid to apply this operation here given correct arguments (or regardless of the arguments)?
- If so, what control flow decision led to execution of this operation?
- Continue tracing backwards treating these producer operations as broken operations.
- You may need to trace back beyond the function where the crash happened.
- Get a
backtrace
to see where the current function was called. (Don’t just assume you know which function called this one. Get the truth.) - If you want to inspect local variables in functions that called this one, use
up
anddown
ingdb
to step up and down the call stack, thenprint
what you want to see.
- Get a
- Remember,
mm_malloc
andmm_free
are called many times during the execution of the driver program. They depend on their arguments, but also the entire contents of the heap as it exists when they are called, so changes made by earlier calls tomm_malloc
andmm_free
affect data used by this one.- Use the
mm_check
function to scan the heap. - Run it at arbitrary times in
gdb
withcall mm_check()
. - Or hard-code calls to it in your code so you see how the heap evolves with each
mm_malloc
ormm_free
call.
- Use the
- Did any of this operation’s arguments hold invalid values?
- Localize it. Exactly what line of code, what operation in this line, and what value used by this operation manifested the error and crashed?
-
valgrind
may be less useful since you are doing very low-level work, implementing the memory allocator itself. Usingvalgrind
“below” the memory allocator intended for analyzing programs that use the memory allocator. -
Use the
mdriver
-f
option. During initial development, using tiny trace files will simplify debugging and testing. We have included two such trace files (short1-bal.rep
andshort2-bal.rep
) that you can use for initial debugging. -
Use the
mdriver
-v
and-V
options. The-v
option will give you a detailed summary for each trace file. The-V
will also indicate when each trace file is read, which will help you isolate errors. - Does your code seem to be running forever? That might be because it is no longer broken! Comment out those calls to
mm_check
. Printing the full contents of the heap on every allocation event costs orders of magnitude more than the allocations or frees themselves.
Grading
Your grade will be calculated as follows:
- Correctness (50%). Based on
mdriver
tests. - Style (25%).
- Your code should use as few global variables as possible (ideally none!).
- Your code should be as clear and concise as possible.
- Each function should have a header comment that describes what it does, how it does it, and what it assumes. Keep this description high-level.
- Since some of the unstructured pointer manipulation inherent to allocators can be confusing, short inline comments on steps of the allocation algorithms are also recommended. (These will also help us give you partial credit if you have a partially working implementation.)
- Performance (25%). We are most concerned about the correctness
of your implementation. For the most part, a correct implementation
based on our provided code will yield reasonable performance. Two performance metrics will be
used to evaluate your solution:
- Space utilization: The peak ratio between the aggregate
amount of memory used by the driver (i.e., allocated via
mm_malloc
but not yet freed viamm_free
) and the size of the heap used by your allocator. The optimal ratio is 1. You should find good policies to minimize fragmentation in order to make this ratio as close as possible to the optimal. - Throughput: The average number of operations completed per second.
The driver program summarizes the performance of your allocator by computing a performance index, which is a weighted sum of the space utilization and throughput:
P = 0.6U + 0.4 min (1, T/Tlibc)
where U is your space utilization, T is your throughput, and Tlibc is the estimated throughput of the standard C libary (libc) version of
malloc
on the default traces. The performance index favors space utilization over throughput. To receive a good performance score, you must achieve a balance between utilization and throughput. A performance index of somewhere around 0.7-0.8 or above is pretty good. My basic solution to this assignment has a performance index around .9 and would receive full performance credit. - Space utilization: The peak ratio between the aggregate
amount of memory used by the driver (i.e., allocated via
OPTIONAL Extra Credit
There are a number of interesting allocator features you could add if looking for more fun or extra credit. Each is annotated with a possible extra credit award for a thorough, well-designed, well-implemented extension. Talk to me if you are curious about any of these or if you have ideas for others. Some of the larger extensions would be larger than the original assignment.
Basic Heap Consistency Checker (up to +25%)
Dynamic memory allocators are notoriously tricky to program correctly and efficiently. They involve a lot of subtle, untyped pointer manipulation. In addition to the usual debugging techniques, you may find it helpful to write a heap checker that scans the heap and checks it for consistency.
Some examples of what a heap checker might check are:
- Is every block in the free list marked as free?
- Are there any contiguous free blocks that somehow escaped coalescing?
- Is every free block actually in the free list?
- Do the pointers in the free list point to valid free blocks?
- Do any allocated blocks overlap?
- Do the pointers in a heap block point to valid heap addresses?
Your heap checker will extend the function int mm_check(void)
in
mm.c
. Feel free to rename it, break it into several functions, and
call it wherever you want. It should check any invariants or
consistency conditions you consider prudent. It returns a nonzero
value if and only if your heap is consistent. This is not required,
but may prove useful. When you submit mm.c
, make sure to remove any
calls to mm_check
as they will slow down your throughput.
Extended Error-Checking (up to +100%)
Building from a heap consistency checker, consider how to augment the allocator to do additional error-checking on the fly to detect common errors when using a memory allocator. You could insert extra padding, use canary values, place recently freed blocks in a waiting area, or track additional information about earlier allocations and frees to detect errors. In other words, implement a couple of the kinds of checks that valgrind
’s memcheck
tool offers.
Implement realloc (up to +25%)
Implement a final memory allocation-related
function: mm_realloc
. The signature for this function, which
you will find in your mm.h
file, is:
extern void* mm_realloc(void* ptr, size_t size);
Similarly, you should find the following in your mm.c
file:
void* mm_realloc(void* ptr, size_t size) {
// ... implementation here ...
}
Follow the contract of the standard C library’s
realloc
exactly (pretending that malloc
and
free
are mm_malloc
and mm_free
, etc.).
The man page entry for realloc
says:
The realloc() function changes the size of the memory block pointed to by
ptr to size bytes. The contents will be unchanged in the range from the
start of the region up to the minimum of the old and new sizes. If the
new size is larger than the old size, the added memory will not be
initialized. If ptr is NULL, then the call is equivalent to
malloc(size), for all values of size; if size is equal to zero, and ptr
is not NULL, then the call is equivalent to free(ptr). Unless ptr is
NULL, it must have been returned by an earlier call to malloc(), calloc()
or realloc(). If the area pointed to was moved, a free(ptr) is done.
A good test would be to compare the behavior of your mm_realloc
to
that of realloc
, checking each of the above cases.
Your implementation of mm_realloc
should also be performant.
Avoid copying memory if possible, making use of nearby free blocks.
Do not use memcpy
to copy memory; instead, copy
WORD_SIZE
bytes at a time to the new destination while
iterating over the existing data.
To run tracefiles that test mm_realloc
, compile using
make mdriver-realloc
. Then, run mdriver-realloc
with the -f
flag to specify a tracefile, or first edit
config.h
to include additional realloc tracefiles in the
default list.
Alternative Policies (up to +25%)
Reimplement the searchFreeList
function to implement next-fit or best-fit.
Seglists (up to +50%)
Convert our single-list allocator to a seglist allocator and measure performance impact. This will involve changing or replacing provided code at a fairly deep level. It may be more work than the assignment so far, but very interesting!