CS 301: Compiler and Runtime System Design, Spring 2016

Scheduled meetings: Tuesdays + Fridays 11:10am - 12:20pm, SCI E111
Instructor: Ben Wood, SCI E128, contact on website
Website: https://cs.wellesley.edu/~cs301/s16/
Google Group: announcements, discussion, questions. Stay subscribed and check your email.

Syllabus Contents:

Objective

CS 301 covers principle and practice in the design and implementation of compilers and programming language runtime systems. Topics include lexical analysis, parsing, symbols tables, semantic analysis, type checking, intermediate representations, program analysis and optimization, code generation, garbage collection, and other runtime support. As time permits, the course may also survey topics including alternative parsing techniques, just-in-time compilation, runtime optimization, program analysis, and applications to others areas of CS. Students will construct a full compiler and runtime system for a simple statically-typed programming language over the course of the semester.

Prerequisite: CS 230 and at least one of CS 240 or CS 251. CS 235 is recommended but not required.

Compiler design is a great CS capstone topic that unifies core computer science theory and practice to build an essential programming tool that you have taken for granted up to now. Compilers are nominally program translators, but they can be much more: The toolkit we develop for thinking about compilers and programming languages has applications to a range of problems in security, privacy, energy efficient computing, data analysis… even biology!

We will learn theoretical foundations and put them to work in a large project. To succeed in CS 301, you should have some maturity (and joy) as a programmer and learner, some interest in foundations, an eagerness to build things, and the drive and independence to tackle large challenges.

Texts

Dragon Book (main text): Compilers: Principles, Techniques, & Tools, 2nd ed. Aho, Lam, Sethi, and Ullman. Addison-Wesley, 2006. Note the errata list.
EC (optional): Engineering a Compiler, 2nd ed. Cooper and Torczon. Morgan Kaufmann, 2012.

Most of the course reading comes from the so-called Dragon Book. Both of these books have great parts and quirky parts. We mix and match from Dragon, EC, and other provided material. Get a copy of the Dragon Book. A new hardcover is expensive, but I have found reasonable used/paperback/international copies for $15 - $60 by searching online booksellers for “aho lam sethi ullman”. I have a few copies of each book to share, but the Dragon is a big enough part of this course that it is well worth getting your own. You are free to get your own copy of EC, but it is not required. (Similar used/paperback prices.)

Format

CS 301 follows a tutorial-style format1 with two key components:

  1. Weekly small group meetings with the instructor focus on readings and problem sets exercising theoretical foundations for compiler and runtime design.
  2. A challenging 5-stage team programming project puts theory into practice to build a compiler over the semester.

Normal scheduled class meetings support these two components with a mix of other activities.

First week

Before launching into the regular weekly format, we spend the first week together as a class building a simple compiler for a tiny language from scratch. The goal is hands-on perspective for the structure of a compiler and a few of its jobs before we dive into a semester-long walk through key compiler components in greater depth. This small project also gets us up to speed on the tools and language we will use for projects and tests out working with potential teammates.

Weekly tutorial1 meetings and assignments (topic)

The first core component of the course is the weekly tutorial1 meeting. In the first week of the semester I form tutorial groups of 3 students, after collecting some input from you. Starting the second week of the semester, each group has a weekly hour meeting with me to discuss the current reading and problem set assigned the week before. Weekly readings and problem sets consist of exercises to master theoretical foundations, explorations of design considerations, and occasional small programming exercises. Please read the advice below about technical reading for CS 301.

You are expected to complete the reading and a rough draft of solutions to the problem set in preparation for each meeting. (You submit polished solutions for a subset following the meeting.) The bulk of the meeting will be discussion of the reading and problem set led by students. My main role in meetings is to watch and listen, occasionally ask useful questions, or give you a nudge if we get stuck or head in the wrong direction. This means that, in addition to reading and preparing solutions, you should prepare to present your work and the key points of the readings. Do not prepare a polished lecture. Do take time to identify key ideas of the reading and problem set solutions, and how to present them orally, with the help of a whiteboard as needed.

This format promotes growth and confidence in your skills for independent learning and clear oral technical communication. Meetings might seem intimidating to start. Sometimes, you may get stuck on a problem, despite your best effort to solve it. Occasionally, you may get stuck presenting an idea you thought you understood. Awkward pauses ensue. These are all fine, expected, and useful in the course of learning. When I ask you to prepare for meetings, the goal is to be prepared to learn during the meeting, which is not necessarily equivalent to being prepared to explain everything flawlessly. The opportunity to work through these obstacles productively together, and at the right pace for you, is a merit of the format.

Collaboration and Honor Code for problem sets

Considering the workload and the technical level of the material, you are encouraged to work on weekly assignments in small groups. These groups need not be the same as your meeting groups, but they should be limited in size. Meeting-sized groups often work best. I encourage (and may require) you to change collaborators a few times through the semester to work with many of your classmates.

Regardless of group work, your solutions must be your own work and you must cite all collaborators on work you submit. When working through problems with others, do not write solutions as a group or copy solutions. Do not take pictures of solutions written on whiteboards. Do take general notes on the approach, structure, or key steps you and the group constructed. Then, later, write up solutions on your own, referring to your notes only if needed. A good rule of thumb is to let at least 30 minutes pass after working with the group and before writing your solutions. Then you know whether you actually understand the solutions and (as is important for meetings) how to derive and explain them.

Violations of these rules are considered violations of the Honor Code and handled accordingly.

Implementation project (code)

The second core component of the course is a semester-long team project applying the principles we learn to implement a full compiler for a Java-like language. The project is done in self-formed teams of 3 students, not necessarily the same as your tutorial groups. I may reorganize teams later if needed. The project begins in the third week of the course. Work with several classmates before then to find teammates that are a good fit. The project is a significant amount of work. Teammates must contribute to the project in similar measure. Consider how well your schedule and courseload fit those of potential teammates.

The project is organized into five stages (four prescribed parts and an open-ended extension), each containing intermediate graded checkpoints. All stages of the project are graded for design, documentation, style, correctness, and efficiency. The open-ended extension at the end of the semester allows (and requires) you to add, evaluate, present, and report on an interesting new feature beyond what we build in the main four stages.

Each stage is a significant amount of work. I encourage you to work in the microfocus or S173 for a sense of class camaraderie (and since necessary tools are provided there). Start each step early and meet each checkpoint. Starting at the last minute guarantees missing a checkpoint. The project is large and cumulative. Falling behind or leaving a stage incomplete have significant negative impact on the rest of the project. It is difficult to recover once behind.

Collaboration and Honor Code for programming projects

Teams may collaborate internally in any way, such that each team member contributes to each stage of the project in similar measure and can explain the product as a whole. I recommend doing a significant amount of the work in pair (or trio) programming style. It is sometimes reasonable to split some parts into separate tasks for individual team members as needed.

Members of separate teams are free to discuss higher-level concerns like design, algorithms, bugs, and so on, but prohibited from sharing or viewing each others’ code. All submitted work must be the work of your team (or code explicitly provided in the assignment). As with problem sets, teams must explicitly credit such discussions in their submitted work. Keep the code-viewing restriction in mind when debugging. You may help another team interpret error messages, but you may not view their code in the process. (You may accept help interpreting error messages, but you may not show your code in the process.)

Violations of these rules are considered violations of the Honor Code and handled accordingly.

Class meetings (lab)

Normal scheduled all-class meetings are used for a variety of purposes depending on our needs each week, including “lab” time for introductions to tools or project stages, occasional lectures, discussions, code reviews, or unstructured problem set/project work. Attendance is expected, though we may make some class meetings optional.

Grading

(Coarse) course grades are derived roughly half from weekly meetings/assignments and half from the programming project.

Late passes

Deadlines are necessary to keep the course moving. To add flexibility for a challenging workload, each project team is allotted a budget of 4 × 24-hour late passes to use within the semester. Each pass may be redeemed by emailing the instructor to delay one project deadline or checkpoint by exactly 24 hours. (e.g., 1 hour late or 23.5 hours late both cost exactly one 24-hour pass.) A severe grade penalty is assessed for late work without passes. Passes become void at the end of final exam period, no work is accepted. Consult your class dean and instructor in case of severe extenuating circumstances.

Late passes are not valid for weekly assignments, due to tight integration with tutorial meetings. If you are unable to prepare or attend a meeting, please notify the instructor as early as possible to discuss alternatives.

How to read for CS 301

Using technical reading as your primary means of knowledge acquisition can require significant adjustment from more passive lecture-based courses where you can rely on the instructor to highlight important points and provide alternative explanations. In this course, you and your group are in charge. I assist mainly by asking useful questions to guide you. Building skills for technical reading, as you do in this course, is invaluable as a computer scientist (and beyond).

It is rarely possible to read a technical document and absorb all the relevant detail in one pass. Instead, aim to read repeatedly at deepening levels of detail and work through examples. Read and work on exercises in stages:

  1. Do a cursory reading to identify important concepts and high-level structure within the topic (and also what looks less important so you can skip it initially). This is an important skill that takes time to develop.

  2. Do a detailed, methodical reading of each section and work through examples carefully, aiming to understand details. Become familiar with the important concepts of each section before moving on to the next. If you get stuck, take a break, jump ahead and come back later, or try a related exercise. Context from later material may provide the missing link for you.

  3. Begin the exercises once you have some working knowledge of the topic. Consider which concepts are at play in each exercise. Refer to the reading (especially examples) as needed. Exercises may reveal new connections. If an exercise involves a secondary detail you skipped during your detailed reading of important concepts, go back and read further.

  4. When you have completed the readings and exercises, take a few minutes to review them, pick out key points, and consider how to explain these.

  5. If you are still stuck on some idea or exercise, write down what you do know and how you think it might work so we can work from there in your meeting.

Tia Newhall (Swarthmore College) has more good advice on reading technical computer science material. It targets lecture courses, but many points can be extrapolated to this format.

Accommodations

Please feel free to speak with me about concerns or suggestions about how I can make the course more accessible to you. I will do my best to help. An anonymous feedback form is available in the Google Group if you would prefer.

Students with disabilities who are taking this course and who need disability-related accommodations are encouraged to work with Jim Wice (781.283.2434 or jwice@wellesley.edu), the Director of Disability Services. Jim’s office is located in the Pforzheimer Learning & Teaching Center on the third floor of Clapp Library. If you have a physical disability or a learning disability, Jim is the person to see to arrange for accommodations. If your learning disability is undocumented or if you are uncertain as to whether you have a disability, Jim can arrange for you to be tested.

Acknowledgments

This course material is based on similar assignments and projects developed at Cornell University and by Stephen Freund (at Williams College).

  1. Tutorial format can mean a variety of things, but in this context indicates instruction in small groups (e.g., 1-3 students and an instructor), where the students lead discussion. 2 3