CS313 Project 5

How to turn in this Project

You are required to turn in both a hardcopy and a softcopy. If working as a team, the team should submit a single hardcopy and a single softcopy (to either team member's account). Please make sure to keep a copy of your work, either on your own computer or in your private directory (or, to play it safe, both).

Hardcopy Submission

Your hardcopy packet should consist of:

The cover page.
Your Hierarchical_Clustering.java file from Task 1.
Your KMeans_Clustering.java file from Task 2.
Your CAST_Clustering.java file from Task 3.

Staple these together and submit them on the due date.

Softcopy Submission

You should submit your final version of your Clustering directory to the drop/project5 directory in your account on the CS server.

Commencement is just around the corner. Family... friends... everyone is talking about post-graduation plans. Getting a job, going to graduate school, making a difference in the world. If just one more person asks you what you are going to do after graduation, you might lose it. Don't they know that you are responsible enough to handle it? You're working on it. I mean, it's not as though you just sit around all day and watch TikTok... sometimes you play Wordle, too. You decide that you can't decide. You need time to find yourself. That will show them. You start planning an epic trip. You'll wander the globe in search of meaning, and you'll take nothing with you but a backpack and your resourcefulness... oh, and perhaps a folder of your CS313 notes organized with multi-colored tabs... those could come in handy.

Background

In this project, your goal is to implement three algorithms for clustering gene expression data: hierarchical clustering, k-means clustering, and CAST clustering. We have provided you with three classes that you can use:

An instance of the Gene class represents a gene, including the name of the gene, a description of the gene's putative function, and a collection of the gene's expression values from a set of RNA-seq experiments.
An instance of the Cluster class represents a group of genes that have been clustered together.
An instance of the Clustering class represents a clustering (i.e., partitioning) of genes into clusters.

The contracts for these classes can be found here.

Implementations for these classes are stored in the /home/cs313/download/Clustering subdirectory on the CS server.

The three above-mentioned classes do not require any modification. Your goal is to implement three new classes from scratch (Hierarchical_Clustering, KMeans_Clustering, and CAST_Clustering), as described below.

Task 1: Hierarchical Clustering

We have provided you with a Clustering application that, when executed, reads in information from a file about a set of genes and experiments as well as the expression values of each gene in all of the experiments. For example, the provided file data/yeast_10.txt contains information about the expression values of 10 yeast genes from 79 experiments. A summary of the 79 experiments can be found here. When the Clustering application is invoked as follows

java Clustering data/yeast_10.txt

then the program will read in from the specified file the expression values for the 10 yeast genes in the 79 experiments. The Clustering application creates an empty clustering, i.e., it clusters the genes into zero clusters. You do not need to modify the Clustering class. In this task, your goal is to create a class, Hierarchical_Clustering, that inherits from the Clustering class and implements centroid-linkage hierarchical clustering. The contract for the Hierarchical_Clustering class that you are asked to implement can be found here.

To begin, study the contracts of the three provided classes: Gene, Cluster, and Clustering. These classes contain many methods that will be useful when implementing your hierarchical clustering algorithm. After studying these three classes, you should create a new class, Hierarchical_Clustering, that extends the Clustering class and fulfills the Hierarchical_Clustering contract.

With the code available to download on the CS server, we have provided you with three files for testing your hierarchical clustering implementation: yeast_10.txt, yeast_150.txt, and yeast_2467.txt. The three data files contain information about expression values for 10 yeast genes in 79 experiments, for 150 yeast genes in 79 experiments, and for 2467 yeast genes in 79 experiments, respectively. We have also provided you with a sample solution, Test_Hierarchical, which you can execute and compare to your own solution.
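As a rough illustration of the idea behind centroid-linkage hierarchical clustering, the sketch below works on plain arrays of expression values rather than on the provided Gene, Cluster, and Clustering classes, whose contracts are not reproduced here. The class name CentroidLinkageSketch, its method names, the use of Euclidean distance, and the "stop when k clusters remain" rule are all assumptions made for this example; the Hierarchical_Clustering contract may specify a different distance measure and stopping condition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the provided Clustering API and not a project solution.
class CentroidLinkageSketch {

    // Centroid of a cluster: the coordinate-wise mean of its members' rows.
    static double[] centroid(List<Integer> members, double[][] data) {
        double[] c = new double[data[0].length];
        for (int idx : members) {
            for (int d = 0; d < c.length; d++) {
                c[d] += data[idx][d];
            }
        }
        for (int d = 0; d < c.length; d++) {
            c[d] /= members.size();
        }
        return c;
    }

    // Euclidean distance (an assumption; other measures are possible).
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return Math.sqrt(sum);
    }

    // Starts with one cluster per row, then repeatedly merges the two
    // clusters whose centroids are closest until only k clusters remain.
    static List<List<Integer>> cluster(double[][] data, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            List<Integer> single = new ArrayList<>();
            single.add(i);
            clusters.add(single);
        }
        while (clusters.size() > k) {
            int bestA = 0;
            int bestB = 1;
            double best = Double.POSITIVE_INFINITY;
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double dist = distance(centroid(clusters.get(a), data),
                                           centroid(clusters.get(b), data));
                    if (dist < best) {
                        best = dist;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            // Merge cluster bestB into cluster bestA, then drop bestB by index.
            clusters.get(bestA).addAll(clusters.get(bestB));
            clusters.remove(bestB);
        }
        return clusters;
    }

    public static void main(String[] args) {
        double[][] data = { {0, 0}, {0, 1}, {10, 10}, {10, 11} };
        System.out.println(cluster(data, 2));
    }
}
```

On the four toy points in main, the two nearby pairs end up in separate clusters. Recomputing every centroid on every pass keeps the sketch short but is slow; for a file as large as yeast_2467.txt you would likely want to cache centroids or pairwise distances.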

Task 2: k-Means Clustering

In this task, your goal is to create a class, KMeans_Clustering, that inherits from the Clustering class and implements k-means clustering. The contract for the KMeans_Clustering class that you are asked to implement can be found here.

To begin, study the contracts of the three provided classes: Gene, Cluster, and Clustering. These classes contain many methods that will be useful when implementing your k-means clustering algorithm. After studying these three classes, you should create a new class, KMeans_Clustering, that extends the Clustering class and fulfills the KMeans_Clustering contract.

With the code available to download on the CS server, we have provided you with a sample solution, Test_KMeans, which you can execute and compare to your own solution.
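For orientation, the general shape of k-means (alternating an assignment step and a centroid-update step) can be sketched on plain arrays as follows. This example does not use the provided Gene, Cluster, or Clustering classes. The class name KMeansSketch, the choice of the first k rows as initial centroids, the squared Euclidean distance, and the iteration cap are all assumptions made for illustration; the KMeans_Clustering contract may call for a different initialization and distance measure.

```java
import java.util.Arrays;

// Hypothetical sketch: not the provided Clustering API and not a project solution.
class KMeansSketch {

    // Squared Euclidean distance (an assumption; other measures are possible).
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return sum;
    }

    // Returns, for each row of data, the index of the cluster it ends up in.
    static int[] cluster(double[][] data, int k, int maxIterations) {
        int n = data.length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("k must be between 1 and the number of rows");
        }
        int dims = data[0].length;

        // Assumed initialization: the first k rows serve as starting centroids.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = data[c].clone();
        }

        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each row to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(data[i], centroids[c])
                            < squaredDistance(data[i], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no row switched clusters
            }

            // Update step: move each centroid to the mean of its rows.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[i]][d] += data[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int d = 0; d < dims; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] data = { {0, 0}, {10, 10}, {0, 1}, {10, 11} };
        System.out.println(Arrays.toString(cluster(data, 2, 100)));
    }
}
```

Because the starting centroids here are fixed, the result is deterministic. With random initialization, which is common in practice, different runs can produce different clusterings, so keep that in mind when comparing your output to a sample solution.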

Task 3: CAST Clustering

In this task, your goal is to create a class, CAST_Clustering, that inherits from the Clustering class and implements CAST clustering. The contract for the CAST_Clustering class that you are asked to implement can be found here.

To begin, study the contracts of the three provided classes: Gene, Cluster, and Clustering. These classes contain many methods that will be useful when implementing your CAST clustering algorithm. After studying these three classes, you should create a new class, CAST_Clustering, that extends the Clustering class and fulfills the CAST_Clustering contract.

With the code available to download on the CS server, we have provided you with a sample solution, Test_CAST, which you can execute and compare to your own solution.
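To give a feel for how CAST (Cluster Affinity Search Technique) grows one cluster at a time, here is a simplified, distance-based sketch on a plain distance matrix. It does not use the provided Gene, Cluster, or Clustering classes, and it is only one common textbook-style formulation: a gene counts as "close" to the open cluster when its average distance to the cluster's members is below a threshold theta. The class name CastSketch, the seeding rule, the way averages are computed, and the step cap that guards against endless add/remove cycles are all assumptions; the CAST_Clustering contract may define these details differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the provided Clustering API and not a project solution.
class CastSketch {

    // Average distance from gene i to the other members of c (0 if there are none).
    static double averageDistance(int i, List<Integer> c, double[][] dist) {
        double sum = 0.0;
        int count = 0;
        for (int j : c) {
            if (j != i) {
                sum += dist[i][j];
                count++;
            }
        }
        return count == 0 ? 0.0 : sum / count;
    }

    static List<List<Integer>> cluster(double[][] dist, double theta) {
        if (theta <= 0) {
            throw new IllegalArgumentException("theta must be positive");
        }
        int n = dist.length;
        boolean[] assigned = new boolean[n];
        int remaining = n;
        List<List<Integer>> partition = new ArrayList<>();

        while (remaining > 0) {
            // Assumed seeding rule: the unassigned gene with the most
            // unassigned neighbors closer than theta.
            int seed = -1;
            int bestDegree = -1;
            for (int i = 0; i < n; i++) {
                if (assigned[i]) continue;
                int degree = 0;
                for (int j = 0; j < n; j++) {
                    if (j != i && !assigned[j] && dist[i][j] < theta) degree++;
                }
                if (degree > bestDegree) {
                    bestDegree = degree;
                    seed = i;
                }
            }
            List<Integer> c = new ArrayList<>();
            c.add(seed);

            // Alternate add and remove steps until nothing changes.
            // The step cap is a safeguard added for this sketch.
            boolean changed = true;
            for (int step = 0; changed && step < 10 * n; step++) {
                changed = false;

                // Add the nearest outside gene if it is close to the cluster.
                int nearest = -1;
                for (int i = 0; i < n; i++) {
                    if (assigned[i] || c.contains(i)) continue;
                    if (nearest < 0 || averageDistance(i, c, dist)
                            < averageDistance(nearest, c, dist)) {
                        nearest = i;
                    }
                }
                if (nearest >= 0 && averageDistance(nearest, c, dist) < theta) {
                    c.add(nearest);
                    changed = true;
                }

                // Remove the farthest member if it is distant from the cluster.
                int farthest = c.get(0);
                for (int i : c) {
                    if (averageDistance(i, c, dist) > averageDistance(farthest, c, dist)) {
                        farthest = i;
                    }
                }
                if (averageDistance(farthest, c, dist) >= theta) {
                    c.remove(Integer.valueOf(farthest)); // remove by value, not index
                    changed = true;
                }
            }

            // Close the cluster and take its genes out of consideration.
            for (int i : c) assigned[i] = true;
            remaining -= c.size();
            partition.add(c);
        }
        return partition;
    }

    public static void main(String[] args) {
        double[][] dist = {
            {0, 1, 10, 10}, {1, 0, 10, 10}, {10, 10, 0, 1}, {10, 10, 1, 0}
        };
        System.out.println(cluster(dist, 5.0));
    }
}
```

Unlike k-means, CAST does not take the number of clusters as input; the threshold theta indirectly determines how many clusters are formed. A smaller theta tends to yield more, tighter clusters.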