How to turn in this Project
You are required to turn in both a hardcopy and a softcopy. If you are working as a team, the team should submit a single hardcopy and a single softcopy (to either team member's account). Please make sure to keep a copy of your work on your computer or in your private directory (or, to play it safe, both).
Hardcopy Submission
Your hardcopy packet should consist of:
- The cover page;
- Your Hierarchical_Clustering.java file from Task 1;
- Your KMeans_Clustering.java file from Task 2;
- Your CAST_Clustering.java file from Task 3.
Softcopy Submission
You should submit the final version of your
Clustering directory to the
drop/project5 directory in your
account on the CS server.
Commencement is just around the corner. Family... friends... everyone is talking about post-graduation plans. Getting a job, going to graduate school, making a difference in the world. If just one more person asks you what you are going to do after graduation, you might lose it. Don't they know that you are responsible enough to handle it? You're working on it. I mean, it's not as though you just sit around all day and watch TikTok... sometimes you play Wordle, too. You decide that you can't decide. You need time to find yourself. That will show them. You start planning an epic trip. You'll wander the globe in search of meaning, and you'll take nothing with you but a backpack and your resourcefulness... oh, and perhaps a folder of your CS313 notes organized with multi-colored tabs... those could come in handy.
Background
In this project, your goal is to implement three algorithms for clustering gene expression data: hierarchical clustering, k-means clustering, and CAST clustering. We have provided you with three classes that you can use:
- An instance of the Gene class represents a gene, including the name of the gene, a description of the gene's putative function, and a collection of the gene's expression values from a set of RNA-seq experiments.
- An instance of the Cluster class represents a group of genes that have been clustered together.
- An instance of the Clustering class represents a clustering (i.e., partitioning) of genes into clusters.
The contracts for these classes can be found here.
Implementations for these classes are stored in the 
/home/cs313/download/Clustering subdirectory on
the CS server. 
The three above-mentioned classes do not require any modification. Your goal is to
implement three new classes from scratch, Hierarchical_Clustering,
KMeans_Clustering, and CAST_Clustering as described below.
Task 1: Hierarchical Clustering
We have provided you with a Clustering application that, when
executed, reads in information from a file about a set of genes and
experiments as well as the expression values of each gene in 
all of the experiments. For example, the provided file
data/yeast_10.txt contains information about the expression
values of 10 yeast genes from 79 experiments. A summary of the
79 experiments can be found here.
When the Clustering application is invoked as follows
java Clustering data/yeast_10.txt
then the program reads the expression values of the 10 yeast genes in the
79 experiments from the specified file. The Clustering application then
creates an empty clustering, i.e., it clusters the genes into zero clusters.
You do not need to modify the Clustering class.
In this task, your goal is to create a class, Hierarchical_Clustering,
that inherits from the Clustering class and implements
centroid-linkage hierarchical clustering. The contract for the 
Hierarchical_Clustering class that you are asked to implement 
can be found here.
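To make the idea of centroid linkage concrete, here is a minimal, standalone Java sketch. It does not use the provided Gene, Cluster, or Clustering classes. Instead it assumes, hypothetically, that each gene is a double[] of expression values, that distance is Euclidean, and that merging stops once a requested number of clusters remains. The Hierarchical_Clustering contract may specify a different distance metric or stopping rule, so treat this only as an illustration of the merge loop: start with one cluster per gene, repeatedly find the two clusters whose centroids (mean expression vectors) are closest, and merge them. The class name HierarchicalSketch is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of centroid-linkage hierarchical clustering.
 * Assumptions (not from the project contract): each gene is a double[]
 * of expression values, distance is Euclidean, and merging stops once
 * numClusters clusters remain.
 */
final class HierarchicalSketch {

    // Euclidean distance between two expression vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Mean expression vector of the genes (given by index) in one cluster.
    static double[] centroid(List<Integer> members, double[][] genes) {
        double[] c = new double[genes[0].length];
        for (int g : members) {
            for (int i = 0; i < c.length; i++) {
                c[i] += genes[g][i];
            }
        }
        for (int i = 0; i < c.length; i++) {
            c[i] /= members.size();
        }
        return c;
    }

    // Returns the final clusters as lists of gene indices.
    static List<List<Integer>> cluster(double[][] genes, int numClusters) {
        if (numClusters < 1) {
            throw new IllegalArgumentException("numClusters must be at least 1");
        }
        List<List<Integer>> clusters = new ArrayList<>();
        for (int g = 0; g < genes.length; g++) {
            List<Integer> singleton = new ArrayList<>();
            singleton.add(g);
            clusters.add(singleton); // every gene starts in its own cluster
        }
        while (clusters.size() > numClusters) {
            int bestI = -1;
            int bestJ = -1;
            double best = Double.POSITIVE_INFINITY;
            // Find the pair of clusters whose centroids are closest.
            for (int i = 0; i < clusters.size(); i++) {
                double[] ci = centroid(clusters.get(i), genes);
                for (int j = i + 1; j < clusters.size(); j++) {
                    double d = distance(ci, centroid(clusters.get(j), genes));
                    if (d < best) {
                        best = d;
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            // Merge cluster j into cluster i; since j > i, removing j keeps index i valid.
            clusters.get(bestI).addAll(clusters.remove(bestJ));
        }
        return clusters;
    }
}
```

For example, with the four genes {0, 0}, {0.1, 0}, {5, 5}, and {5.1, 5} and a target of two clusters, the sketch returns [[0, 1], [2, 3]]. Recomputing every centroid on each pass keeps the code short but is slow for inputs the size of yeast_2467.txt; caching each cluster's centroid and updating it after a merge is one way to speed it up.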
To begin, study the contracts of the three provided classes: Gene,
Cluster, and Clustering. These classes contain many
methods that will be useful when implementing your hierarchical clustering
algorithm. After studying these three classes, you should create a new
class, Hierarchical_Clustering, that extends the 
Clustering class and fulfills the
Hierarchical_Clustering contract.
With the code available to download on the CS server, we have provided 
you with three files for testing your hierarchical clustering implementation:
yeast_10.txt, yeast_150.txt, and yeast_2467.txt.
The three data files contain information about expression values for 10 yeast genes
in 79 experiments, for 150 yeast genes in 79 experiments, and
for 2467 yeast genes in 79 experiments, respectively. We have also 
provided you with a sample solution, Test_Hierarchical,
which you can execute and compare to your own solution.
Task 2: k-Means Clustering
In this task, your goal is to create a class, KMeans_Clustering,
that inherits from the Clustering class and implements
k-means clustering. The contract for the 
KMeans_Clustering class that you are asked to implement 
can be found here.
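The following standalone Java sketch shows the two alternating steps of k-means (Lloyd's algorithm): assign each gene to its nearest centroid, then move each centroid to the mean of the genes assigned to it, stopping once no gene changes cluster. The setup is an assumption, not part of the KMeans_Clustering contract: genes are double[] vectors, distance is squared Euclidean, the first k genes seed the centroids, and a cluster that becomes empty keeps its previous centroid. The contract may prescribe a different initialization (for example, random seeding) or stopping condition, and the class name KMeansSketch is illustrative only.

```java
import java.util.Arrays;

/**
 * Hypothetical k-means sketch (Lloyd's algorithm).
 * Assumptions, not taken from the KMeans_Clustering contract: genes are
 * double[] expression vectors, distance is squared Euclidean, the first k
 * genes seed the centroids, and iteration stops when no assignment changes.
 */
final class KMeansSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns, for each gene, the index (0..k-1) of the cluster it is assigned to.
    static int[] cluster(double[][] genes, int k, int maxIterations) {
        if (k < 1 || k > genes.length) {
            throw new IllegalArgumentException("need 1 <= k <= number of genes");
        }
        int m = genes[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = genes[c].clone(); // seed with the first k genes (copied)
        }
        int[] assignment = new int[genes.length];
        Arrays.fill(assignment, -1); // -1 means "not yet assigned"

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: move each gene to its nearest centroid.
            boolean changed = false;
            for (int g = 0; g < genes.length; g++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(genes[g], centroids[c])
                            < squaredDistance(genes[g], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[g] != best) {
                    assignment[g] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no gene switched clusters
            }
            // Update step: recompute each centroid as the mean of its members.
            double[][] sums = new double[k][m];
            int[] counts = new int[k];
            for (int g = 0; g < genes.length; g++) {
                counts[assignment[g]]++;
                for (int i = 0; i < m; i++) {
                    sums[assignment[g]][i] += genes[g][i];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) { // an empty cluster keeps its old centroid
                    for (int i = 0; i < m; i++) {
                        centroids[c][i] = sums[c][i] / counts[c];
                    }
                }
            }
        }
        return assignment;
    }
}
```

Because the result depends on the initial centroids, different seeding choices can produce different clusterings; if the contract specifies how centroids are initialized, following it should make your output easier to compare with Test_KMeans.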
To begin, study the contracts of the three provided classes: Gene,
Cluster, and Clustering. These classes contain many
methods that will be useful when implementing your k-means clustering
algorithm. After studying these three classes, you should create a new
class, KMeans_Clustering, that extends the 
Clustering class and fulfills the
KMeans_Clustering contract.
With the code available to download on the CS server, we have provided 
you with a sample solution, Test_KMeans,
which you can execute and compare to your own solution.
Task 3: CAST Clustering
In this task, your goal is to create a class, CAST_Clustering,
that inherits from the Clustering class and implements
CAST clustering. The contract for the 
CAST_Clustering class that you are asked to implement 
can be found here.
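CAST (Cluster Affinity Search Technique) builds one cluster at a time: it repeatedly adds the unassigned gene with the highest affinity to the open cluster and removes the member with the lowest affinity, until neither step changes anything, then closes the cluster and starts a new one. The standalone Java sketch below follows that add/remove loop under assumptions that are not taken from the CAST_Clustering contract: the input is a precomputed, symmetric similarity matrix with values in [0, 1] and ones on the diagonal; a gene's affinity is the sum of its similarities to the current members; a gene is added when its affinity is at least t times the cluster size and removed when it falls below that; and a round cap guards against add/remove oscillation. How the project derives similarities from expression values, and how it defines the threshold, is set by the contract. The class name CastSketch is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical CAST sketch. Assumes sim is symmetric with entries in [0, 1],
 * sim[i][i] == 1, and 0 < t <= 1 (none of this comes from the contract).
 */
final class CastSketch {

    // Sum of similarities between gene x and every member of the open cluster.
    static double affinity(int x, List<Integer> open, double[][] sim) {
        double sum = 0.0;
        for (int y : open) {
            sum += sim[x][y];
        }
        return sum;
    }

    static List<List<Integer>> cluster(double[][] sim, double t) {
        List<Integer> unassigned = new ArrayList<>();
        for (int g = 0; g < sim.length; g++) {
            unassigned.add(g);
        }
        List<List<Integer>> clusters = new ArrayList<>();

        while (!unassigned.isEmpty()) {
            List<Integer> open = new ArrayList<>();
            boolean changed = true;
            int rounds = 0;
            // Alternate ADD and REMOVE phases until stable (capped to avoid oscillation).
            while (changed && rounds++ < 10 * sim.length) {
                changed = false;
                // ADD: take the highest-affinity unassigned gene while it meets the threshold.
                while (!unassigned.isEmpty()) {
                    int best = unassigned.get(0);
                    for (int u : unassigned) {
                        if (affinity(u, open, sim) > affinity(best, open, sim)) {
                            best = u;
                        }
                    }
                    if (affinity(best, open, sim) < t * open.size()) {
                        break;
                    }
                    unassigned.remove(Integer.valueOf(best)); // remove by value, not index
                    open.add(best);
                    changed = true;
                }
                // REMOVE: drop the lowest-affinity member while it is below the threshold.
                while (!open.isEmpty()) {
                    int worst = open.get(0);
                    for (int v : open) {
                        if (affinity(v, open, sim) < affinity(worst, open, sim)) {
                            worst = v;
                        }
                    }
                    if (affinity(worst, open, sim) >= t * open.size()) {
                        break;
                    }
                    open.remove(Integer.valueOf(worst));
                    unassigned.add(worst);
                    changed = true;
                }
            }
            clusters.add(open); // close the cluster
        }
        return clusters;
    }
}
```

With two tight pairs of genes (within-pair similarity 0.9, between-pair similarity 0.1) and t = 0.5, the sketch returns [[0, 1], [2, 3]]. Affinities are recomputed from scratch for clarity; keeping a running affinity per gene and updating it on each add or remove is the usual way to make this fast enough for thousands of genes.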
To begin, study the contracts of the three provided classes: Gene,
Cluster, and Clustering. These classes contain many
methods that will be useful when implementing your CAST clustering
algorithm. After studying these three classes, you should create a new
class, CAST_Clustering, that extends the 
Clustering class and fulfills the
CAST_Clustering contract.
With the code available to download on the CS server, we have provided 
you with a sample solution, Test_CAST,
which you can execute and compare to your own solution.