public class Hierarchical_Clustering extends Clustering
The Hierarchical_Clustering class represents a clustering (i.e., grouping or partitioning) of a collection of genes using the centroid-linkage hierarchical clustering method.
Fields inherited from class Clustering: clusters, experiments, genes
Constructor and Description |
---|
Hierarchical_Clustering(String fileName, int numClusters) Creates an initially empty Hierarchical_Clustering. |
Modifier and Type | Method and Description |
---|---|
void | hierarchical() Performs centroid-linkage hierarchical clustering. |
void | initiallyAssignOneGeneToEachCluster() Assigns each gene to its own unique cluster. |
static void | main(String[] args) The main method creates a Clustering based on gene and experiment data from a tab-delimited text file. |
void | mergeTwoClosestClusters() Identifies and merges together the two closest clusters. |
Methods inherited from class Clustering: getExperimentNamesFromFile, getGeneInformationFromFile, getNumClusters, getNumExperiments, getNumGenes, toString
public Hierarchical_Clustering(String fileName, int numClusters)
Creates an initially empty Hierarchical_Clustering.
The set of genes and experiments is determined from the specified String, which names a tab-delimited file. Genes and experiments are read in from that file. Initially, the constructed Hierarchical_Clustering is empty.
fileName - the name of a tab-delimited text file containing gene and experiment data
numClusters - an integer representing the desired number of clusters
public void initiallyAssignOneGeneToEachCluster()
Initially, each cluster contains only one gene, so the number of clusters equals the number of genes.
public void mergeTwoClosestClusters()
Identifies the two clusters that are closest together, i.e., whose mean expression vectors have the shortest distance between them. Then these two clusters are merged. In order to merge the two clusters, all genes in the second cluster are added to the first cluster and the second cluster is then removed. Thus, the total number of clusters is decreased by one following invocation of this method.
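The merge step described above can be sketched as follows. This is only an illustration under stated assumptions, not the class's actual implementation: the class name MergeStepSketch, the representation of a cluster as a list of expression vectors, and the use of Euclidean distance are all hypothetical, since the documentation does not name the distance metric or the internal data structures.

```java
import java.util.List;

// Hypothetical sketch of one merge step; not the real Hierarchical_Clustering code.
class MergeStepSketch {

    // Mean expression vector (centroid): the per-experiment average over the cluster's genes.
    static double[] centroid(List<double[]> cluster) {
        int numExperiments = cluster.get(0).length;
        double[] mean = new double[numExperiments];
        for (double[] gene : cluster) {
            for (int e = 0; e < numExperiments; e++) {
                mean[e] += gene[e];
            }
        }
        for (int e = 0; e < numExperiments; e++) {
            mean[e] /= cluster.size();
        }
        return mean;
    }

    // Euclidean distance between two vectors (assumed metric).
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Finds the pair of clusters with the closest centroids, adds all genes of the
    // second cluster to the first, then removes the second cluster.
    static void mergeTwoClosest(List<List<double[]>> clusters) {
        if (clusters.size() < 2) {
            return; // nothing to merge
        }
        int first = 0;
        int second = 1;
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i < clusters.size(); i++) {
            for (int j = i + 1; j < clusters.size(); j++) {
                double d = distance(centroid(clusters.get(i)), centroid(clusters.get(j)));
                if (d < best) {
                    best = d;
                    first = i;
                    second = j;
                }
            }
        }
        clusters.get(first).addAll(clusters.get(second));
        clusters.remove(second); // second > first, so the index of the first cluster is unaffected
    }
}
```

Each call reduces the size of the cluster list by exactly one, matching the behavior stated for mergeTwoClosestClusters().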
public void hierarchical()
Initially, each cluster contains one gene. The two closest clusters are identified (i.e., the two clusters whose mean expression vectors have the smallest distance between them), and these two clusters are merged into one cluster. The process is repeated, with each iteration reducing the number of clusters by one, until the desired number of final clusters is obtained.
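As a rough illustration of the full procedure, the sketch below starts with one cluster per gene and merges until the requested number of clusters remains. Everything here is an assumption made for the example: the class HierarchicalSketch, the nested Cluster type that keeps a running per-experiment sum, and the choice of Euclidean distance are hypothetical and are not taken from the documented class. Squared distances are compared, which selects the same closest pair as the unsquared distance.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of centroid-linkage clustering; not the real Hierarchical_Clustering code.
class HierarchicalSketch {

    // A cluster stores its member gene indices and the per-experiment sum of their
    // expression values, so its centroid is sum / size without re-scanning members.
    static final class Cluster {
        final List<Integer> genes = new ArrayList<>();
        final double[] sum;

        Cluster(int geneIndex, double[] expression) {
            genes.add(geneIndex);
            sum = expression.clone();
        }

        // Adds all genes of the other cluster to this one.
        void absorb(Cluster other) {
            genes.addAll(other.genes);
            for (int e = 0; e < sum.length; e++) {
                sum[e] += other.sum[e];
            }
        }

        // Squared Euclidean distance between the two centroids (assumed metric).
        double squaredDistanceTo(Cluster other) {
            double total = 0.0;
            for (int e = 0; e < sum.length; e++) {
                double d = sum[e] / genes.size() - other.sum[e] / other.genes.size();
                total += d * d;
            }
            return total;
        }
    }

    // Returns the gene indices of each final cluster.
    static List<List<Integer>> cluster(double[][] expression, int numClusters) {
        if (numClusters < 1) {
            throw new IllegalArgumentException("numClusters must be at least 1");
        }
        // Initially, each gene is in its own cluster.
        List<Cluster> clusters = new ArrayList<>();
        for (int g = 0; g < expression.length; g++) {
            clusters.add(new Cluster(g, expression[g]));
        }
        // Each iteration merges the two closest clusters, reducing the count by one.
        while (clusters.size() > numClusters) {
            int first = 0;
            int second = 1;
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i < clusters.size(); i++) {
                for (int j = i + 1; j < clusters.size(); j++) {
                    double d = clusters.get(i).squaredDistanceTo(clusters.get(j));
                    if (d < best) {
                        best = d;
                        first = i;
                        second = j;
                    }
                }
            }
            clusters.get(first).absorb(clusters.get(second));
            clusters.remove(second);
        }
        List<List<Integer>> result = new ArrayList<>();
        for (Cluster c : clusters) {
            result.add(c.genes);
        }
        return result;
    }
}
```

Starting from n genes and stopping at k clusters takes n - k merges. This sketch rescans every pair on each iteration, which is simple but not the most efficient approach.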
public static void main(String[] args)
The main method creates a Clustering based on gene and experiment data from a tab-delimited text file.
The clustering is determined using centroid-linkage hierarchical clustering. The text file and the desired number of clusters are specified as command line arguments. The computed set of clusters is output to standard output (and can be redirected to a file).
args - an array of Strings representing any command line arguments
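The following sketch shows one way the two command line arguments might be validated before clustering. It is an assumption-laden illustration: the class ArgsSketch and the method parseNumClusters are hypothetical, and the argument order (file name first, then number of clusters) is only inferred from the constructor's parameter order, not confirmed by the documentation.

```java
// Hypothetical argument check; not the real main method.
class ArgsSketch {

    // Returns the requested number of clusters from args, assuming the order
    // <fileName> <numClusters>. Throws IllegalArgumentException on bad input.
    static int parseNumClusters(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "usage: java Hierarchical_Clustering <fileName> <numClusters>");
        }
        // Integer.parseInt throws NumberFormatException (an IllegalArgumentException)
        // when the text is not a valid integer.
        int numClusters = Integer.parseInt(args[1]);
        if (numClusters < 1) {
            throw new IllegalArgumentException("numClusters must be at least 1");
        }
        return numClusters;
    }
}
```

Under that assumed argument order, an invocation such as `java Hierarchical_Clustering genes.txt 4 > clusters.txt` would request four clusters and redirect the result to a file; the file names here are placeholders.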