CS 341: mprotect

Overview
Part 0: Using xv6
- Build and Run xv6
- GDB for the xv6 Kernel
Part 1: The Life of a Syscall
Part 2: Protecting Against Null Pointer Dereference
Part 3: The mprotect System Call and Immutable Code
- Suggested Workflow
Submission and Grading

Overview

Assign: Thursday, 12 Nov
Checkpoint: aim to have started work on null protections; send a progress update Monday, 16 Nov
Checkpoint: aim to have finished work on null protections and started on mprotect; send a progress update Wednesday, 18 Nov
Due: Friday, 20 Nov
Teams: pairs or individuals
Submit: git add, git commit, and git push your completed code.
Reference:

In this project, you will implement some basic memory protection features for xv6, a small Unix-like operating system. The required tasks for the project are to:

Modify xv6 to ensure that dereferencing a null pointer (NULL) in a user program leads to an explicit error and termination of the process, rather than accessing memory contents at address 0. This part is fairly carefully guided.
Modify xv6 to implement the mprotect system call and use it to set restrictive permissions for protection of important regions of memory that should not be written, such as the code (text) section. This part is much more open.

Scope: While you may not write all that much code for this project, it will be much more involved and immersive than the preceding C programming tasks since you are working with a large code base. Much of your time in this project will be spent learning your way around the xv6 codebase before you actually get to writing code. Take this seriously and expect to spend a lot of time on it. Expect to feel a little overwhelmed at some moments, especially as you are figuring things out, but trust that we will find our way. Share your knowledge liberally! Ask/answer questions on Zulip; help others find their way; accept help from others to understand the existing code. The code you write should be your own, but please feel free to make navigation of the existing code more of an open class effort. Please check in with me often and do not be shy to report where you really are. We will work together to make the project work and adjust as needed.

Dependency chains: Parts 0 and 1 do not involve producing anything. They are about finding your way in xv6. They should absolutely be done before you attempt Parts 2 or 3. Parts 2 and 3 can be approached separately. Part 2 is more guided and requires less code. I do recommend doing it first, but if you are really stuck and Part 3 makes more sense to you, there’s no technical reason you cannot go there first.

Acknowledgment: This project is based on an OSTEP project plan, with some additional guidance.

Part 0: Using xv6

xv6 is a small educational Unix-like operating system designed and built at MIT. The xv6 book covers its design and implementation. (Throughout the book, numbers in parentheses refer to a specific line of code in the full source code listing of xv6, which matches your starter code.) xv6 is big, in the sense that its a few thousand lines of C and assembly code, but it’s small, in the sense that operating systems like Linux, macOS, or Windows are millions of lines of code. While we will not have time to become acquainted with all parts of xv6 this term as part of CS 341, doing so on your own time in the future is entirely feasible. Note: we are using the version of xv6 for the x86 architecture; newer versions use the RISC-V architecture.

Build and Run xv6

Since xv6, while relatively small, is a real OS, and running (or debugging!) a real OS on real hardware can get quite mysterious and painful when things are not working, we will use an emulator called QEMU. QEMU is just a software program that pretends to be x86 hardware. xv6 code will not really know the difference, but this will make our lives much better. The CS server (cs.wellesley.edu a.k.a. tempest) is setup with the tools you need. If you have a local Linux environment, you can likely get QEMU and other tools working, but the project alone will have plenty of places to spend your time, so I strongly advise using tempest. :-) Fair warning, tempest has a pretty old version of QEMU, so newer fancy features may not be available.

Once I have created your team’s GitHub repository, you will find a copy of the xv6 code there (with very minor tweaks in the Makefile for our setup). Clone your repository to somewhere in your CS home directory, cd into it, and then run make build. That will build xv6 and create a disk image for QEMU.

bpw@tempest $ make
gcc -fno-pic -static -fno-builtin -fno-strict-aliasing -O2 -Wall -MD -ggdb -m32 -Werror -fno-omit-frame-pointer -fc

[… much more excitement …]

dd if=kernel of=xv6.img seek=1 conv=notrunc
347+1 records in
347+1 records out
178088 bytes (178 kB) copied, 0.00536816 s, 33.2 MB/s

To run xv6 in QEMU (and first build anything that’s been updated), use make run. This will take over your current terminal session as QEMU launches. I recommend having two terminal sessions open so you can poke at the xv6 source code while also working in xv6 itself. (You will also need two sessions for debugging, later.) After make run, you should see some build messages go by, then you should end up with something like this:

SeaBIOS (version 1.11.0-2.el7)


iPXE (http://ipxe.org) 00:03.0 C980 PCI2.10 PnP PMM+1FF94780+1FED4780 C980
                                                                               


Booting from Hard Disk..xv6...
cpu1: starting 1
cpu0: starting 0
sb: size 1000 nblocks 941 ninodes 200 nlog 30 logstart 2 inodestart 32 bmap start 58
init: starting sh
$

Here you see the BIOS (very low-level pre-OS boot-time setup), then some output from xv6 as it boots, then a note that it is starting sh, its shell, and finally a shell prompt. By default, everything is in one top-level directory. Try ls.

$ ls
.              1 1 512
..             1 1 512
README         2 2 2287
cat            2 3 14452
echo           2 4 13308
forktest       2 5 8132
grep           2 6 15988
init           2 7 14196
kill           2 8 13340
ln             2 9 13280
ls             2 10 16136
mkdir          2 11 13368
rm             2 12 13344
sh             2 13 24776
stressfs       2 14 14296
usertests      2 15 67192
wc             2 16 15116
zombie         2 17 13004
console        3 18 0
$

Yep, that’s it! Note that the console support is very basic as is the shell. No fancy history or convenience editing commands. if you enter odd characters or move around with the arrow keys, you’ll probably see your next command fail, but a prompt will return and keep working. Keep it simple.

Try some commands using the executables you see here.

The next question is: how do we get out? Shutdown? Here, we will enter QEMU’s “monitor mode” and terminate the emulator. This will also work even if you have totally wedged the OS with changes that you make. To enter QEMU monitor mode, type Control-a then c. You should see a QEMU prompt:

QEMU 2.0.0 monitor - type 'help' for more information
(qemu)

Here you can type quit or just q to shut things down. There are many other things you can do here. (See help.)

Note that changes to the xv6 file system are not durable between runs. Each time we boot from the clean image.

Alternatively, to pause emulation in QEMU (kind of like stopping time for xv6), enter QEMU monitor mode with Control-a c, then type stop at the (qemu) prompt. When you are ready to resume, type continue at the (qemu) monitor prompt, then type Control-a c again to go back to the xv6 console.

Please make sure to shutdown QEMU when not in use.

The older version of QEMU we have on tempest is not especially efficient. Basically, it uses up 100% of a CPU core whenever it’s running xv6. This is fine while you are running and interacting with xv6, but please make sure to shutdown QEMU when you stop working.

If you might have lost your connection and left a QEMU running, you can SSH to tempest and run pkill -u $USER qemu* to terminate all of your running QEMUs. (Don’t worry, you won’t be able to terminate anyone else’s!)

GDB for the xv6 Kernel

Now, run xv6 with GDB so you can inspect the kernel. The setup for this is a bit more complicated than running a normal program under a debugger, so much of the setup steps are automated in a script for you.

Start xv6 under QEMU by running make gdb. (We have made some minor tweaks to the Makefile to reduce typing and configure for defaults on tempest.) On tempest, you will likely see startup that looks a bit like this:

bpw@tempest $ make gdb
sed "s/localhost:1234/localhost:26228/" < .gdbinit.tmpl > attach.gdb
which: no qemu in (/net/atlas/bin:/usr/local/bin:/net/bin:/usr/local/smlnj/bin:/usr/local/racket/bin:/opt/rh/)
which: no qemu in (/net/atlas/bin:/usr/local/bin:/net/bin:/usr/local/smlnj/bin:/usr/local/racket/bin:/opt/rh/)

(process:13057): GLib-WARNING **: 17:42:05.252: gmem.c:489: custom memory allocation vtable not supported
*** Now run 'gdb -x attach.gdb' in another terminal.
qemu-system-i386 -nographic -drive file=fs.img,index=1,media=disk,format=raw -drive file=xv6.img,index=0,media=dis8

(process:13060): GLib-WARNING **: 17:42:05.390: gmem.c:489: custom memory allocation vtable not supported

Aside: tmux

One convenient—but totally optional—way to work with multiple terminal sessions over a single SSH connection is to use tmux, a terminal multiplexer, which supports things like seeing multiple terminals at once, detaching from your session so you can log out of tempest but leave things running, then login and reattach to continue your terminal session later.

Notice that things seem to have paused. There is no further output; we have not reached the xv6 boot output or the xv6 shell prompt. This is because the QEMU startup process has paused to allow an external GDB running outside QEMU/xv6 to attach and observe the emulated machine.

There is some noise here that you can ignore, including lines from make and some warning about versions of software on tempest (the which and WARNING lines). The line of interest is:

*** Now run 'gdb -x attach.gdb' in another terminal.

Do that. Leave the current terminal where it is, then open a second terminal session on tempest, cd to the same directory, and run the command gdb -x attach.gdb. (It’s critical that you’re in the same directory as your xv6 source code.) In the new terminal, running this command should bring you to a GDB prompt like this:

bpw@tempest $ gdb -x attach.gdb 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-119.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
+ target remote localhost:26228
warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration
of GDB.  Attempting to continue with the default i8086 settings.

The target architecture is assumed to be i8086
[f000:fff0]    0xffff0:	ljmp   $0xf000,$0xe05b
0x0000fff0 in ?? ()
+ symbol-file kernel
(gdb)

GDB is now attached and in control. Let’s set a breakpoint for the exec function inside the kernel using the break or b command in GDB:

(gdb) break exec
Breakpoint 1 at 0x801009a0: file exec.c, line 12.
(gdb)

Now let the boot process continue until it first runs the exec function in the xv6 kernel by using the continue or c command in GDB:

(gdb) continue
Continuing.
[New Thread 2]
[Switching to Thread 2]
The target architecture is assumed to be i386
=> 0x801009a0 <exec>:	push   %ebp

Breakpoint 1, exec (path=<error reading variable: can't compute CFA for this frame>, 
    argv=<error reading variable: can't compute CFA for this frame>, argv@entry=0x8dfffea0) at exec.c:12
12	{
(gdb)

Take a look back at your first terminal. Notice that the boot process has continued and there is more output from xv6. However, we are still not at the shell prompt, since an exec syscall must run to get us there. Let’s see where we are with backtrace, bt, or where

(gdb) backtrace
#0  exec (path=<error reading variable: can't compute CFA for this frame>, 
    argv=<error reading variable: can't compute CFA for this frame>, argv@entry=0x8dfffea0) at exec.c:12
#1  0x80105173 in sys_exec () at sysfile.c:419
#2  0x8010469a in syscall () at syscall.c:139
#3  0x80105699 in trap (tf=<error reading variable: can't compute CFA for this frame>) at trap.c:43
#4  0x801053e2 in alltraps () at trapasm.S:20
(gdb)

Aha! Interesting. The back trace bottoms out in the entry point for traps in xv6. Of course, this trap must have been delivered by an explicit int (interrupt) instruction somewhere, but we cannot see beyond the trap handler, which sits at the base of the kernel stack for this process. (The user stack is elsewhere – or is it? Since this is the very first process, was this trap really issued from a user program? Hold that thought for now.)

Follow a few more steps to reach the shell prompt in xv6:

Take note of the xv6 output you see in the first terminal, then tell GDB to continue.
Now you should see output from xv6 up through init: starting sh, but it looks like we have arrived once more at the exec breakpoint over in GDB.
continue once more. GDB should appear to be stuck or running computation but not yet reaching a breakpoint. Over in xv6, you should see a prompt.

Now every time you run a command in the xv6 shell, you should catch the exec breakpoint over in GDB. Of course, you could remove this breakpoint or install a breakpoint anywhere else in the kernel. (Usually, it’s best to do that in a fairly “high level” function in the kernel, closer to the trap handler than to deep kernel internals.) You can also quit GDB, which will allow xv6 to continue without GDB supervision. Terminate the QEMU session as usual when you are done.

Part 1: The Life of a Syscall

As a warmup, take a tour of the life of a syscall in xv6. You will need to make some small additions along these code paths for the last part of the project.

There are a few things you can do to find the relevant code: use the ones that you find most useful. Whatever guide you follow, you should absolutely open up the files in your own xv6 codebase to follow along.

This document gives a brief overview of the code involved in getting from user code to kernel system call implementation and back..
Starting at 12:26, this video (by one of the OSTEP authors) gives a detailed tour through through the same code. Note: The version of xv6 that we are using has a completely flat filesystem structure, instead of the include, kernel, user, etc. directories shown in the video, but the code is nearly identical.
The xv6 book covers the entirety of xv6. Chapter 3 includes substantial detail on system calls. (Numbers in parentheses refer to a specific line of code in the full source code listing of xv6.)

Try using GDB to chase through some details of exec or another system call. Try placing a breakpoint near the trap entrypoint in trap (trap.c) or one of the system call handlers in sysproc.c. Then single-step (s or step) for a while to see what happens. To skip over an entire function call at once (instead of descending into it step by step), use n or next.

Part 2: Protecting Against Null Pointer Dereference

For simplicity, the standard xv6 implementation sets up the virtual address space of each new process to store its code (a.k.a., the text section) starting at virtual address 0 (or 0x00000000, to be completely explicit, since we’ll be working in a 32-bit world). This is simple, as the entrypoint (the first instruction to run) is predictably located at address 0, but it has a downside associated with null pointers. The null pointer is generally encoded as 0. Thus if we accidentally—or intentionally—dereference the null pointer (the zero address) we will access the code of the process! Furthermore xv6 does not protect the code (text) section in any way, meaning a program can not only read, but also overwrite its own code. That sounds like a recipe for fun or disaster, depending on your perspective.

Task 2a: Null Pointer Dereference

Write a user program in evilnull.c that reads some bytes of its own code using a null pointer, prints a few of those bytes somehow, and then writes some new value (like 0xBAD4FACE) into memory through the null pointer. This requires that you:

Write evilnull.c.
- Remember, xv6 provides a few familiar C functions in its user libraries, but not the full standard C library. You are also missing the NULL macro, which is just short for ((void*) 0). If you want to write NULL you can introduce your own macro definition with #define NULL ((void*) 0).
- Instead of #includeing normal C standard library headers, you will need to #include xv6 user headers. Check out cat.c for examples.
- printf in xv6 requires an explicit file descriptor argument. 1 is stdout. See cat.c for example.
- Your main function should end with a call to exit(). A simple return will not work quite like you expect in main for xv6.
- You may also wish to write a version that uses standard C headers and compile it manually to run directly on Linux, so you can observe its behavior there as well and compare with xv6.
Update the Makefile to ensure an evilnull program will be built from evilnull.c. You should not be writing new rules, just adding your program to the list of user programs. Check where the Makefile references _cat to figure out how to add your program to the list.
Launch xv6 in the emulator (make run), then run your evilnull program and confirm that its evilness succeeds. Be prepared for how to terminate the emulator, just in case your evilness is a bit more successful than you anticipated.

This program will serve as your basic test case to see if your work in later tasks can defeat its evil.

Task 2b: Prohibit Access to Page Zero

As a simple means of protecting against (most) null pointer dereferences, you will modify xv6 so that it reserves page 0 of every process’s virtual address space as inaccessible to a user process. As a result of this reservation, any attempt to access address 0 (or nearby positive offsets from address 0) will be intercepted by the system during a page fault, causing the xv6 to terminate the offending process.

The xv6 book, chapter 2 describes the two-level page table and virtual memory system of xv6, as well as how important system calls like exec manipulate these. Keep it close at hand when you are reading code, and vice versa.

GDB and QEMU for virtual memory

As you poke around the virtual memory system, you might like to inspect with GDB. The usual GDB commands like info locals, print, x, etc., are always helpful. In the QEMU monitor mode (Control-a c), you can also use info mem to inspect the virtual memory mappings.

Here is a rough plan to approach this task.

First, do some reconnaissance, starting from two system calls that do key setup of address spaces: exec and fork.

Examine the implementation of exec in exec.c. How does exec initialize the address space? This is a good place to practice your skills for filtering out unnecessary detail vs. important detail.
- Pay special attention to how exec loads a program’s code (or other parts) into memory from the ELF executable. (Remember, the xv6 book can help you interpret the code.) Where in the virtual address space do the executable segments get placed?
- Does exec make any other pages inaccessible to the process? How?
- Use the grep command to search for definitions and uses of functions in the xv6 source files, e.g., grep loaduvm *.
Examine the implementation of fork. Find the code that copies the page table (i.e., the virtual address space) of the parent process to create the page table/address space of the child process. Will something here need to change?

Next, modify code to ensure the page based at virtual address 0 is unused and inaccessible:

Change how executables are created to ensure that their code is not marked to appear starting at address 0, or anywhere within the first page of memory.
- This change can be made without any change to kernel code. It just requires a small tweak in the Makefile rules that direct the linker where to place the code (text) section when generating executables for user programs. Look for the _%: rule, which will be applied to build all the UPROGS executables. Currently, this build rule follows the assumption that code (text) is at virtual address 0 in the virtual address space.
- make clean and then rebuild and boot xv6 again. Run evilnull in the xv6 shell. Do you expect its behavior to change? If so, how? Did its behavior actually change? If so, how?
Simply linking executables that will avoid placing code at address zero is only the first step. Address 0 is still accessible, so there’s more to do in the next step.
Make changes in exec or fork as needed so that the page at address 0 in the virtual address space is left unused and is definitely marked inaccessible to the user process. Look for existing functions (especially those that are exposed through defs.h) that could accomplish steps you need.

Finally, test your protections. Run evilnull again. What happens now? The evilnull process should terminate with some kind of error related to accessing address 0, but xv6 itself and the shell should continue unaffected. Create variants of evilnull that perform only reads through a null pointer and only writes through a null pointer, respectively. Both of these variants should be defeated by your protections. In other words, your modified kernel must protect against both writes and reads with the null pointer; protecting against only writes (but allowing reads) does not suffice.

If successful protection against evilnull seems to require more than a few new or changed lines of code, consider rethinking your approach.

Task 2c: Check Assumptions, Safeguard Syscalls

When pointers are passed to the kernel as system call arguments, the kernel must take great care in using them. The kernel code that uses these arguments must avoid using bad pointer arguments from the user that may cause the kernel to clobber its own memory contents, or worse.

A useful experiment would be to make a copy of the cat program (call it evilcat) that, instead of using the read and write syscalls to read input to/write output from its buf buffer (array), uses those syscalls to read/write with memory at address zero ((void*) 0, or feel free to #define NULL ((void*) 0) if you’d prefer to use NULL) acting as the buffer. How is this different than accessing address zero directly in the user process? Do your protection mechanism guard against this? Try it!

After this experiment, review other kernel code—such as code for those read/write system calls that take a pointer argument—that makes assumptions about the virtual address space. Determine if that code needs to be updated for your modified approach to protecting page zero. Update that code if so. Make sure that you can defeat evilcat’s attempts to use the null pointer.

Task 2d: Documentation

Document your implementation with inline comments and give a summary of how it works in a Markdown file null.md. Also include any known issues or limitations.

Part 3: The `mprotect` System Call and Immutable Code

Linux and most other operating systems ban writing to code (text) pages of memory by default as a safety and security precaution. xv6 leaves read-write permissions open on all mapped portions of the virtual address space by default. In this part of the project, you will implement a pair of system calls that allow changing the permissions for page-granularity sections of the virtual address space:

int mprotect(void* base, int length)
int munprotect(void* base, int length)

The versions of these system calls available in Linux, macOS, and other POSIX systems, allow the caller to dictate specific permissions with a third argument. Our simplified form will implicitly control read access alone:

mprotect disables ~~read~~ write access for a section of the address space starting at the base address and continuing up through the end of the page containing the address base + length - 1.
munprotect reenables ~~read~~ write access for a section of the address space starting at the base address and continuing up through the end of the page containing the address base + length - 1.

After mprotecting a section of the address space any reads by the user program should still succeed, but writes should cause a trap and termination of the process. After munproctecting a section of memory, writes to that section should succeed.

There are restrictions on what arguments to these two system calls are valid:

The base address argument must be aligned to a page boundary.
The length argument must be positive. It shall be rounded up to the nearest multiple of the page size so that mprotect and munprotect operate over all pages that intersect the specified range.
All pages touched by the range specified by base and length must be pages that are already mapped and accessible by the user process. In other words, if mprotect is called with a range that includes virtual addresses of unmapped or inaccessible pages, it should return in error and leave protections as they were before the mprotect syscall.

The implementations of these system calls must check their arguments for validity. If the above rules are violated, or if any other error arises, the system calls should avoid making any changes and instead return -1 to indicate failure. If the arguments are valid, the system calls should proceed as requested and return 0 upon success.

Implementation note: after updating the page table, we need to make sure that the TLB is also up-to-date. With no explicit action from the system call, the TLB might otherwise continue to hold a now-stale copy of a page table entry that has out-of-date protection bits. For our purposes you should simply call the lcr3 function with the existing physical address of the page table. (grep around to find its usage in the existing code.) This function writes to the x86 cr3 register (the register holding the page table address) and in turn forces invalidation of the TLB entries. (This general sort of event is called a “TLB shootdown.”) The first following accesses to any pages will thus suffer TLB misses that cause the MMU to reload TLB entries from the (updated) in-memory page table. There are some subtleties to getting this right on a multicore system, but this level should suffice for our project.

Suggested Workflow

Build and test this incrementally. Work on mprotect alone to start. After mprotect is working, repeat the steps (probably much more quickly) for munprotect.

Write some simple test cases that try to “break” mprotect.
Build an mprotect that takes no arguments and does nothing other than print mprotect. Refer to the syscall tour to remember all the places you need to add info about this system call.
Add arguments to mprotect and print them; return 0.
Check the validity of arguments and create the return value accordingly.
Make the appropriate page table updates.

It may be useful to introduce temporary simplifying assumptions along the way such as assuming that a single mprotect call refers to only 1 page, and then remove those assumptions once the simple case is working.

Demonstrate correctness with tests and documentation: At the end, your tests should be able to demonstrate the correctness of your implementation. Document your implementation with inline comments and give a summary of how it works in a Markdown file mprotect.md. Include any known issues, as well as instructions for how to run your tests and what they should demonstrate.

Optional: reuse your mprotect implementation to remove write access on the code (text) sections of memory by default in exec.

Submission and Grading

Make sure to git add any new files and changes, git commit, and git push.

There will be an interactive demo opportunity. Grading will be based on:

80% correctness
20% code quality

Contents