CS305 Final Project

Final Project

 

Overview

The final project offers an opportunity for you to explore a topic in greater depth than we covered in class and to apply your machine learning knowledge to an interesting domain. For the project, you will identify a dataset and a task involving supervised classification, regression, or recommendations. The task itself could be novel, or you could explore new ways of attacking it compared to existing work. The project consists of trying different feature representations and machine learning models, and evaluating their performance. Your completed project and final submission will be a ~5 page paper summarizing your dataset, methods, and experimental results, together with your supporting data and code. You will present the completed project to the instructor during reading period or final exam period.

There are three milestones for the final project, as described below, each with its own deadline. You are required to work in teams of 2 or 3 for the final project. Working in teams is the norm for computer science. If, for some reason, you prefer to work alone, you will need to get permission from the instructor.

 

Assessment

Project outcomes will be evaluated on the amount of demonstrated effort, depth of research into the topic and into related work, creative problem solving, and demonstrated understanding of machine learning. You should aim to design correctly working programs that implement sensible algorithms and featurizers, and show that you made an effort to try multiple ideas.

There are three milestones for the project:

Milestone                         Percent of Final Project Grade    Due Date
Proposal                          10%                               See course schedule
In-Class Presentation             5%                                See course schedule
Completed Project / Final Paper   85%                               At meeting with instructor scheduled during Reading Period or Final Exam Period
 

Example Data Sources

For your final project, you are free to use data from any source. Below are a few possible sources (disclaimer - these sources are neither vetted nor endorsed by the course instructor), though you are by no means limited to these.

 

Milestone 1: Proposal

The project proposal sets out your topic and goals in a 1-2 page paper. The most important criterion for choosing a topic is that it genuinely excites you. Be creative! The second is feasibility -- you have limited time to work on your final project, so set your goals realistically. Keep in mind that it will take significantly less time to use an existing well-formatted dataset than to collect the data yourself or use data that requires considerable pre-processing.

Once you have established your team, you should identify your dataset and think carefully about the following questions: Is there previous work from others using this data? If so, what have they found? What do the data represent? What format is the data in -- is it easily input into machine learning algorithms? Can the data be visualized? How will you featurize the data? What questions are you trying to answer? What machine learning algorithm or set of algorithms will you use? Are there hyperparameters and, if so, what are they and how will you tune them? How will you evaluate the accuracy of the algorithms you are employing? How else might you assess your algorithms beyond their accuracy? What strategies will you use to check for underfitting/overfitting and/or to improve the performance of your algorithm? How will the project challenge you beyond what you learned in the class?
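As one illustration of the hyperparameter-tuning and overfitting questions above, the following is a minimal sketch of k-fold cross-validation used to compare values of a single hyperparameter. It is not a required approach. The helper names (`k_fold_indices`, `knn_predict`, `cv_accuracy`), the toy one-dimensional data, and the choice of a nearest-neighbor classifier are all assumptions made for illustration; in practice you would likely use a library implementation and your own dataset.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)       # seeded shuffle for reproducibility
    folds = [idx[i::k] for i in range(k)]  # k folds of roughly equal size
    for i, val in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, val


def knn_predict(train_x, train_y, x, k):
    """Majority label among the k nearest 1-D training points (toy model)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


def cv_accuracy(xs, ys, k_neighbors, n_folds=5):
    """Cross-validated accuracy for one hyperparameter value."""
    correct = 0
    for train, val in k_fold_indices(len(xs), n_folds):
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        correct += sum(knn_predict(tx, ty, xs[i], k_neighbors) == ys[i] for i in val)
    return correct / len(xs)


# Hypothetical toy data: label is 1 when the feature is at least 10.
xs = list(range(20))
ys = [0 if x < 10 else 1 for x in xs]

# Compare candidate hyperparameter values (odd values avoid tied votes).
scores = {k: cv_accuracy(xs, ys, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Whatever tuning procedure you choose, the held-out test set should not be used to select hyperparameters; a large gap between training accuracy and cross-validated accuracy is one common sign of overfitting.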

In general, the more time you spend initially thinking deeply about the above questions, the more smoothly your project will go and the less time you will spend later on heading down dead ends. Your proposal, 1-2 pages in length, should include the following:

  • A description of your problem and motivations.
  • A brief summary of any existing work, with links to relevant papers or websites, if applicable.
  • The dataset that you will be using, with a link if relevant.
  • A description of the featurization that you envision using, e.g., what features you will use, whether you can/will visualize the data, whether you will use clustering and dimensionality reduction to aid featurization or data exploration, and whether the data need to be scaled and normalized.
  • How the data will be split into training, development, and test sets, and whether cross-validation will be used.
  • What machine learning algorithm(s) you plan to use and why.
  • How you will evaluate your results (accuracy, precision/recall, etc.).
  • Will you consider any tradeoffs between accuracy, interpretability, and fairness of your algorithms?
  • Is the primary purpose of your task prediction, or do you also want to explain something about the data, e.g., analyzing features that are predictive of a class?
  • What aspects of your project do you anticipate will challenge you most?
  • Responsibilities of each team member.
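
To make the items on data splitting, scaling, and evaluation more concrete, here is a small sketch using only the Python standard library. The function names, the 60/20/20 split, and the example labels are hypothetical choices for illustration, not course requirements; a library such as scikit-learn offers equivalent, better-tested utilities.

```python
import random
import statistics


def split_indices(n, dev_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle indices 0..n-1 and split them into train, dev, and test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded so the split is reproducible
    n_test = round(n * test_frac)
    n_dev = round(n * dev_frac)
    test = idx[:n_test]
    dev = idx[n_test:n_test + n_dev]
    train = idx[n_test + n_dev:]
    return train, dev, test


def fit_standardizer(train_values):
    """Compute mean and standard deviation from the TRAINING data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values) or 1.0  # guard against zero spread
    return mean, std


def standardize(values, mean, std):
    """Scale values with statistics that were fit on the training set."""
    return [(v - mean) / std for v in values]


def precision_recall_accuracy(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy) for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, accuracy


# Hypothetical usage with made-up labels and predictions.
train, dev, test = split_indices(100)
mean, std = fit_standardizer([1.0, 2.0, 3.0])
scaled = standardize([1.0, 2.0, 3.0], mean, std)
precision, recall, accuracy = precision_recall_accuracy(
    [1, 1, 0, 0, 1], [1, 0, 0, 1, 1]
)
```

Fitting the scaling statistics on the training set alone, and then applying them to the development and test sets, keeps information from the held-out data from leaking into training.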
 

Milestone 2: In-Class Presentation

During class, each team will share their project ideas and plan with the rest of the class. In advance, you should prepare a Google Slides presentation and place it in your Google Drive FinalProject folder. Your in-class presentation should be approximately 10 minutes in duration and should clearly convey to your classmates what you will be doing in your project. The presentation should include:

  • Why you chose the data/topic that you did and what you find interesting about it.
  • A summary of the information from your Milestone 1 proposal.
  • Any changes you have made to your plan since submitting your Milestone 1 proposal.
  • Any progress you have made in implementing your project.
  • How project work is/will be distributed among team members.
  • What you consider to be the most challenging aspects of your project.
 

Milestone 3: Completed Project and Final Paper

For your final submission, you should have complete, working code that processes and featurizes your data, and applies one or more machine learning algorithms toward analysis of the data. You will submit a ~5 page paper that summarizes relevant literature, your methods, experimental results, and ideas for future work. Your final paper should include the following:

  • A description of your data, problem, and motivations.
  • An overview of any relevant existing work by others.
  • Details about the dataset that you used, including how you acquired it, and how you featurized it.
  • A description of the machine learning models you used and why you chose them. You should include information on how your models were trained, developed, and tested.
  • The results of your experiments, including numbers, visualizations, and interpretations, as appropriate.
  • Consideration of whether your algorithms are interpretable and whether they might exhibit any biases.
  • Analysis of any shortcomings of your work, and ideas for future research.
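
As one possible starting point for the bias consideration listed above, the sketch below compares accuracy across subgroups of the data. This is only one of many ways to probe for bias. The function name `accuracy_by_group`, the group labels, and the example predictions are invented for illustration and do not come from any particular dataset.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return a dict mapping each group label to the accuracy within that group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


# Hypothetical labels, predictions, and group membership.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1]
groups = ["a", "a", "a", "b", "b", "b"]

per_group = accuracy_by_group(y_true, y_pred, groups)
# A large gap between groups is a signal worth investigating and discussing.
gap = max(per_group.values()) - min(per_group.values())
print(per_group, gap)
```

A gap like this does not by itself establish that a model is unfair, but reporting per-group results alongside overall accuracy gives your paper something specific to interpret.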

You must present your completed project and final paper to your instructor during reading period or final exam period. This is not a formal presentation but rather a meeting with the instructor where you describe your final project and the instructor has the opportunity to ask you questions about it. Near the end of classes, the instructor will send out a schedule of times during reading period and final exam period so that you can sign up for a meeting at which you will present your completed project. Your code and final paper should be completed and submitted by the start of this meeting. To submit your code and final paper, upload them to the FinalProject folder in your Google Drive. If your dataset is less than a few megabytes in size, you should submit it as well. If your dataset is larger, you should not submit it. Regardless of whether you submit your dataset, your final paper should include details of how you acquired it.