Contents

Final Project

The goal of the open final project is to build and evaluate a piece of system-level software or conduct an in-depth experimental measurement study and evaluation of existing system-level software. Some sample ideas are given, but the definition of system-level is left intentionally vague and bounds are quite open. The effort and scope expectations for the project are greater than those for the earlier prescribed projects. Use the course grade weighting and calendar timelines as rough gauges for the expected relative effort and scope.

Format

All projects will include:

  • System-level programming.
  • Empirical measurement or testing.
  • A presentation to the class.
  • A written report or documentation.

Different types of projects may have different balances of the components. I will work with you to guide that balance. Most projects will fall into one of two styles: design and implementation or measurement study. Some projects may blend the two styles.

Design and Implementation Format

A design and implementation project focuses on creating a software artifact.

  • Substantial code will be written to implement the system.
  • The software artifact should be tested to demonstrate some functionality and measure basic performance metrics, but a large comparative experimental study is not required.
  • The presentation should focus on the problem addressed or functionality created, the design and implementation of the system, and a demo if possible.
  • The written component is primarily in the form of documentation: an extensive README describing design, implementation, and usage of the system, tests, any performance measurements, and known issues. Alternatively, the README may focus on usage and submitted components, with a PDF about design, implementation, performance, etc.
  • The code and tests should be well documented with comments.
  • There is likely little if any data to submit, except that required by test cases.

Measurement Study Format

A measurement project focuses on experimentally measuring properties of real, existing systems.

  • Some code may be written to automate the experiments or analyze their data, but existing tools likely will be applied to accomplish most of the measurement.
  • The presentation should focus on the measurement questions addressed by the project, the experimental design and measurement methodology, measurement results, and analysis of those results.
  • The experimental data should be submitted, along with any code to collect or analyze it.
  • The written component is primarily in the form of a report (a markdown file or a PDF) detailing the question the measurement study seeks to answer, the methodology and experimental design, a summary of experimental results, and analysis of these results, as well as discussion of any limitations of the study.
  • Experiment code should be well documented with comments and a README should identify what components have been submitted.

Stages

There are multiple staged deadlines for progress and components in the project:

  • Team Formation, Initial Proposal, and Meeting
  • Project Plan
  • Progress Check-in
  • Presentation
  • Code, Data, and Documentation/Report

For any project, keep in mind that your main implementation window will be only about two weeks long, maybe a bit longer. Keep your ambitions reasonable and think about achieving a minimum viable product before spending effort on fancy features. I encourage you to check in more often than the stage schedule requires and to reach out for advice. I would be happy to assist in finding or recommending suitable tools, discussing designs, etc.

Tools

While I may encourage you in certain directions, there generally are no restrictions on the programming languages, tools, or external libraries you use, so long as they are attributed and they do not implement the core of your project for you.

Code, data, and documentation/reports should be submitted in a Git repository that I can clone. By default I will make a hosted repository on GitHub for your team; you are welcome to choose an alternative if you prefer. You are encouraged to use extra features like GitHub Issues for organizing work on the project.

If you need administrator (root) level control over a system, I would suggest installing a preferred OS inside a virtual machine (VM), which is pretty easy these days. A benefit of a VM is that if you break something at the system level, you are not breaking your own system. If you want a ready-made option, here’s a VM image with Ubuntu 18.04 (and some CS 240/251 tools installed).

If you are doing an implementation project where performance or isolated access to a real machine matters, talk to me so we can set something up. If you’d like a space where you can all access and share files directly on tempest or another CS machine, get in touch.

Project Stages and Timeline

Team Formation, Initial Proposal, and Meeting

Deadline: by end of Wednesday 25 November.

Form a team and share a Google Doc with:

  • A short summary of your preferred project (a paragraph)
  • Questions for me about logistics/feasibility for this project
  • A listing of at least one alternative project or area of interest

Schedule a meeting with me during drop-in slots or at another time to discuss, ask questions, form a reasonable scope for the project, and address any initial logistical needs.

Project Plan

Deadline: by end of Tuesday 1 December.

By this time, work for the project should be underway.

Update your team Google Doc with:

  • Any updates on direction or progress since the meeting.
  • A list of relevant tools and resources you will use.
  • What major work components need to be accomplished, and in what order you will approach them.
  • How your team will divide or share work.
  • Basic characterization of a minimum viable product: the smallest thing you could get done and consider to be working. Be conservative.
  • A couple more levels that you would like to reach beyond the minimum viable product.
  • Anything you need from me to proceed with the project.
  • A tentative timeline, including at least:
    • Minimum and hopeful status for the Progress Check-in deadline.
    • Minimum and hopeful status for the Presentation deadline.

Progress Check-in

Deadline: by end of Monday 7 December.

Update your Google Doc with current status and revised timeline. Meet with me briefly.

Presentation

Scheduled: last two days of classes, Wednesday-Thursday 9-10 December.

Sign up for a time slot here. Presentation time slots will be 15 minutes long, inclusive of time for questions. Target 10-12 minutes of material (practice your timing!) to leave time for classmates to ask questions.

You may use whatever tools (such as live drawing, interactive demos, slides, key code snippets, or other media) you find useful to convey your ideas. Successful presentations should generally cover these areas, but there is plenty of leeway for how you organize, blend, or balance the content of the presentation across these areas:

  • Problem, Background, and Goals: What’s the central problem your system aims to solve or the question your measurement aims to answer? What are the critical requirements or properties that your system/measurement needs to accomplish? Where or why do (perhaps larger versions of) systems or measurements like yours matter in the real world? These will vary by project. Some familiar system properties will presumably pop up in some form, such as: latency, throughput, memory efficiency, utilization, fault-tolerance, reliability, safety, and more. Include some background on the nature of the problem or the new-for-this-project tools you are using for your implementation.
  • Design: How does the design of your system or measurement study approach its goals? At an architectural or algorithmic level (not lines of code), how does your system work?
  • Implementation: Are there any implementation techniques or details that are fascinating or that are key to achieving the design goals?
  • Evaluation: How can you measure the effectiveness of the system you are building or what are the key properties your experiments will measure? (If you have any results yet, sharing them in the presentation would be great! They should appear in the final report.)
  • Status and Conclusions: What’s implemented now? What do you target as the finished product? What have you learned from the project? Are there any interesting features or implementation improvements that you likely will not complete but would be interesting to explore?

All members of the team should share in the presentation duties.

Questions about the presentation? I would be happy to consult with you ahead of time. Get in touch.

Code, Data, and Documentation/Report

Deadline: by end of final exam period, Friday 18 December.

Complete and submit all components of the project, including code, any experimental data, and the documentation or report. The primary submission should be in the form of a GitHub (or other hosted) repository that you share with me. Any material outside the repo should be linked from the README.md.

The documentation and report for the final submission may be submitted as two separate documents or as a single fused document.

Your report should be organized like a miniature research paper, using the set of five presentation areas above as an outline. Essentially, the report should be much like a text version of the topics covered by your presentation, updated for the final status of your project as it is submitted and extended with any additional information you wish to convey that did not fit in your presentation. In addition to the presentation outline areas, the report should include a References section with citations in ACM reference format for any key resources, references, supporting software/tools, or related literature that informed your system/experiment design, implementation, or evaluation. (It is not required that you have a lot of citations or a dedicated report section about “related work,” but whatever you did use must be cited.)

In addition to this report material, your top-level documentation (probably a README.md) should cover:

  • Organization of the files/directories in your full repository/submission: where to find what.
  • Instructions to install required software dependencies (with links to software or other documentation as needed) and compile and configure your system.
  • Instructions to run your system on sample or test inputs and interpret its outputs. These instructions/suggested tests should demonstrate that it works. If compiling or running your system is tricky or depends on a specific environment, it’s best to include transcripts of the output of each of these suggested tests.
  • A list of known bugs or limitations. (Optionally, this may link to the issue tracker if you have been using it.)
  • Separately, code should use inline documentation in the form of comments, as usual.

If text alone does not do justice to what you wish to convey, feel free to include images, video, or other media.

In addition to the documentation and report, please make a final update to your project planning Google Doc to describe the contributions of your team members to the work of the project.

Questions about the report or documentation requirements? I would be happy to consult with you ahead of time. Get in touch. I am also willing to arrange the opportunity for you to give me another (or a first) live demo before the end of final exams if you think it would be useful (or tricky to capture in text). A live demo is not required.

Evaluation and Grading

As noted above, the effort and scope expectations for the project are greater than those for the earlier prescribed projects; use the course grade weighting and calendar timelines as rough gauges for the expected relative effort and scope.

In the open-ended spirit of the project, there is flexibility for the grading to fit your project. There is not a strict allocation of points among these goals. I will work directly with letter grades. One way of understanding the prioritization is as follows:

  • Roughly half of the grade is due to the quality of the design, implementation, and evaluation of the software and results of the system or measurement study.
    • For design and implementation projects, the primary factor is the software itself, but it cannot stand on its own without some evaluation demonstrating or measuring how it works.
    • For measurement study projects, the methodology, results, and interpretation are likely to be weighted more evenly.
  • Roughly half of the grade is due to the quality of the technical communication about the design, implementation, and evaluation.
    • By default, the documentation/report weighs more than the presentation, since the written parts will arrive at the final stage of the project when it is more complete.
    • I will give feedback within a day after your presentation on where I think things stand and, if needed, where I would like to see improvement before the final report/submission to reach a given level.

Ultimately, there’s not a clear way to separate the software/results and communication parts, so these rough halves are flexible. It can be difficult to appreciate software design, implementation, and evaluation or results without clear technical communication about them. The technical communication will be about the software design, implementation, and evaluation or results, so it can only get so far without those things. Each side amplifies the other: think of a product, not a sum.

Thus a second way of understanding the prioritization is that the grade is almost entirely about design, implementation, and evaluation or results, and there is some balancing of how much the software/results or communication contribute. Different projects and teams may balance across and within these components in different ways. As examples:

  • Software artifacts or results that are more modest in scope or quality can combine with excellent technical communication for a successful project.
  • A more substantive or higher-quality software artifact and results can combine with technical communication that is more modest in scope or quality for a successful project.
  • Substantial, high-quality work in both parts definitely results in an excellent or exceptional project.

If you have intentionally focused on certain components more than others, please feel free to note this in your documentation or report. I will take it into account in how I interpret and weigh the parts of your work.

Questions? Please check in at any time.

Project Ideas

Here are some ideas of potential projects:

  • Build an extended filesystem feature like snapshots, version history, or logging with FUSE.
  • Build an interesting new feature for xv6, such as:
    • kernel threads + locks + condition variables or semaphores
    • mmap for memory-mapped files + demand paging + interprocess shared memory.
    • A filesystem checker.
    • A more sophisticated scheduler.
  • Translate a component of xv6 to Rust.
  • Build some system utility programs in Rust.
  • Build an OS scheduler simulator and run measurement studies comparing several scheduling algorithms.
  • Implement basic userspace cooperative threads / fibers / tasks as a library.
  • Build and evaluate a persistent or concurrent key-value store.
  • Build a file search engine that parses files to build an in-memory index and supports fast searches over the file contents using this index. Use parallelism or concurrency to accelerate indexing, support concurrent updates and searches, or store and reload the index in a space-efficient file format.
  • Do a measurement study of filesystem storage behavior on tempest and the Systems Lab workstations (remotely), comparing and reporting performance on several combinations of storage backends.
  • Build a dynamic deadlock detector for multithreaded Java programs using RoadRunner.
  • Use ptrace (manual) or other features to interpose on system calls and build one of:
    • A security or “demo” sandbox that intercepts system calls related to files and I/O to ban access to configurable filesystem locations or capture all filesystem modifications inside a contained sandbox.
    • A simple record-and-replay debugging tool that records a log of system calls, their arguments, and their results for a process so that the process’s behavior can be replayed later by rerunning the program with system calls served from the log.

Here are some more general patterns:

  • Build a parallel or concurrent X.
  • Empirically measure system behavior Y for real-world programs / systems.
  • Build a simulator for system Z and experimentally compare alternative approaches.
  • Build a tool for tracing/auditing/debugging/introspection/analysis of program or process behavior.
  • Do some system-level programming for an application area of your choice.
  • Anything else that you can convince me is systems-related.