java.lang.Object
  MotifSearch
    EM_MotifSearch
public class EM_MotifSearch extends MotifSearch
An instance of the EM_MotifSearch class represents
 a search using the EM algorithm for a motif common to a set of 
 genomic sequences.
| Field Summary | 
|---|
| Fields inherited from class MotifSearch | 
|---|
| instanceLocations, matrix, motifLength, numSequences, sequences |
| Constructor Summary | |
|---|---|
| EM_MotifSearch(java.lang.String fileName, int motifLength) | Creates an initially empty EM_MotifSearch. |
| Method Summary | | |
|---|---|---|
| void | determineMatrixModel() | The Maximization step in the EM algorithm. |
| void | determineMotifInstances() | The Expectation step in the EM algorithm. |
| void | EM() | Executes the EM (Expectation Maximization) algorithm. |
| double | getInformationContentOfMatrix() | Returns the information content associated with the matrix model. |
| double | getScoreForMotifInstance(java.lang.String s) | Given a candidate motif instance, returns the score (probability) of that instance based on the matrix model. |
| static void | main(java.lang.String[] args) | The main method generates an EM_MotifSearch for a motif of the specified length in the genomic sequences found in the specified FASTA file. |
| void | run_EM_multiple_times(int iterations) | Executes the EM (Expectation Maximization) algorithm multiple times. |
| void | setRandomLocationsForMotifInstances() | The initial random seed step in the EM algorithm. |
| Methods inherited from class MotifSearch | 
|---|
| addPseudocountsToMatrix, getConsensusSequence, getInstanceLocations, getMatrix, getNucleotideContent, matrixToString, motifInstancesToString |
| Methods inherited from class java.lang.Object | 
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail | 
|---|
public EM_MotifSearch(java.lang.String fileName,
                      int motifLength)
Creates an initially empty EM_MotifSearch.
 
 The genomic sequences are read from the FASTA file whose name is given by the String parameter.
 The integer parameter is the desired length of the motif being searched for.
 This constructor need only invoke the constructor of the superclass.
fileName - the name of a FASTA file containing one or more genomic sequences
motifLength - the length of the desired motif
| Method Detail | 
|---|
public void setRandomLocationsForMotifInstances()
The initial random seed step in the EM algorithm. For each sequence, randomly determines the start index of a motif instance in the sequence.
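The seeding step can be sketched as follows. This is only an illustration under stated assumptions: the class `RandomSeedSketch`, the method `randomStarts`, and its parameters are hypothetical and not part of MotifSearch. The sketch assumes that a start index is valid only if the whole motif fits inside the sequence.

```java
import java.util.Random;

final class RandomSeedSketch {
    // Hypothetical helper (not part of MotifSearch): picks one start index per
    // sequence so that a motif of length motifLength fits inside the sequence.
    static int[] randomStarts(String[] sequences, int motifLength, Random rng) {
        int[] starts = new int[sequences.length];
        for (int i = 0; i < sequences.length; i++) {
            // Number of positions at which a full-length motif can begin.
            int validStarts = sequences[i].length() - motifLength + 1;
            if (validStarts <= 0) {
                throw new IllegalArgumentException(
                        "Sequence " + i + " is shorter than the motif length");
            }
            // nextInt(n) returns a value in the range 0 (inclusive) to n (exclusive).
            starts[i] = rng.nextInt(validStarts);
        }
        return starts;
    }
}
```

Passing the `Random` in as a parameter lets a caller supply a fixed seed, which makes repeated runs reproducible.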
public void determineMatrixModel()
The Maximization step in the EM algorithm. Based on the motif instances in the sequences, creates a matrix motif model. The resulting matrix model should be updated with pseudocounts so that no entry in the matrix is 0.0.
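One way to build such a model is sketched below. It does not use the inherited addPseudocountsToMatrix method, whose behavior is not described here. Instead it assumes a 4 x motifLength matrix with rows ordered A, C, G, T, uppercase sequences, and a caller-chosen pseudocount added to every cell before each column is normalized. All names in the sketch are hypothetical.

```java
final class MatrixModelSketch {
    // Assumed row order of the matrix: A, C, G, T.
    static final String ALPHABET = "ACGT";

    // Hypothetical helper: counts nucleotides at each motif position, adds a
    // pseudocount to every cell, and normalizes each column to sum to 1.
    static double[][] buildMatrix(String[] sequences, int[] starts,
                                  int motifLength, double pseudocount) {
        double[][] matrix = new double[4][motifLength];
        for (int i = 0; i < sequences.length; i++) {
            for (int j = 0; j < motifLength; j++) {
                int row = ALPHABET.indexOf(sequences[i].charAt(starts[i] + j));
                if (row >= 0) {            // characters outside ACGT are ignored
                    matrix[row][j] += 1.0;
                }
            }
        }
        for (int j = 0; j < motifLength; j++) {
            double columnTotal = 0.0;
            for (int r = 0; r < 4; r++) {
                matrix[r][j] += pseudocount;   // a positive pseudocount leaves no zero entries
                columnTotal += matrix[r][j];
            }
            for (int r = 0; r < 4; r++) {
                matrix[r][j] /= columnTotal;   // convert counts to frequencies
            }
        }
        return matrix;
    }
}
```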
public void determineMotifInstances()
The Expectation step in the EM algorithm. Based on the matrix model, identifies motif instances in the sequences. One motif instance is identified in each sequence. For each sequence, the motif instance that best matches the model is chosen.
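The scan over candidate instances can be sketched as a sliding window. In this sketch the scoring function is passed in as a parameter, so the scan is independent of how a window is scored. The class and method names are hypothetical, and breaking ties in favor of the earliest window is an assumption.

```java
import java.util.function.ToDoubleFunction;

final class MotifInstanceSketch {
    // Hypothetical helper: for each sequence, returns the start index of the
    // window of length motifLength with the highest score. Ties keep the
    // earliest window; a sequence shorter than motifLength keeps start 0.
    static int[] bestStarts(String[] sequences, int motifLength,
                            ToDoubleFunction<String> scorer) {
        int[] starts = new int[sequences.length];
        for (int i = 0; i < sequences.length; i++) {
            String seq = sequences[i];
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int start = 0; start + motifLength <= seq.length(); start++) {
                String window = seq.substring(start, start + motifLength);
                double score = scorer.applyAsDouble(window);
                if (score > bestScore) {
                    bestScore = score;
                    starts[i] = start;
                }
            }
        }
        return starts;
    }
}
```

In the class itself, the scorer would presumably be getScoreForMotifInstance.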
public double getScoreForMotifInstance(java.lang.String s)
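 Given a candidate motif instance, returns the score (probability) of that instance based on the matrix model.
 The length of the motif instance specified by String s must be the
 same as the number of columns in the matrix model.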
s - a String corresponding to a motif instance
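A minimal sketch of such a score, assuming the matrix holds per-position nucleotide frequencies with rows ordered A, C, G, T and that the score is the product of the matching entries. The class name, the explicit matrix parameter, and the exceptions thrown are assumptions for illustration.

```java
final class MotifScoreSketch {
    // Assumed row order of the matrix: A, C, G, T.
    static final String ALPHABET = "ACGT";

    // Hypothetical helper: multiplies, position by position, the matrix entry
    // for the nucleotide that appears at that position of s.
    static double score(double[][] matrix, String s) {
        if (s.length() != matrix[0].length) {
            throw new IllegalArgumentException(
                    "Instance length must equal the number of matrix columns");
        }
        double probability = 1.0;
        for (int j = 0; j < s.length(); j++) {
            int row = ALPHABET.indexOf(s.charAt(j));
            if (row < 0) {
                throw new IllegalArgumentException("Unexpected character: " + s.charAt(j));
            }
            probability *= matrix[row][j];
        }
        return probability;
    }
}
```

For long motifs the product can become very small; summing logarithms of the entries is a common alternative that preserves the ranking of candidates.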
public double getInformationContentOfMatrix()
 Returns the information content associated with the matrix model.
 The information content of a matrix model is described in
 Task 2 of Exercise 7.
 The method getNucleotideContent may be useful
 in determining the background frequency of different nucleotides.
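The definition from Exercise 7 is not reproduced on this page. The sketch below therefore assumes a common definition, the relative entropy of each column against the background frequencies, summed over all columns: for each entry p with background frequency q, it adds p * log2(p / q). The class name and the background array parameter are hypothetical.

```java
final class InformationContentSketch {
    // Hypothetical helper. matrix[r][j] is the frequency of nucleotide r at
    // motif position j; background[r] is the background frequency of r.
    // Assumed formula: sum over all entries of p * log2(p / q).
    static double informationContent(double[][] matrix, double[] background) {
        double total = 0.0;
        for (int j = 0; j < matrix[0].length; j++) {
            for (int r = 0; r < matrix.length; r++) {
                double p = matrix[r][j];
                if (p > 0.0) {   // an entry of 0 contributes nothing
                    total += p * (Math.log(p / background[r]) / Math.log(2.0));
                }
            }
        }
        return total;
    }
}
```

Under this assumed definition, a column that matches the background contributes 0, and a column that always holds the same nucleotide contributes 2 bits when the background is uniform.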
public void EM()
Executes the EM (Expectation Maximization) algorithm. Initially, the EM algorithm is randomly seeded, i.e., one motif instance is randomly chosen in each sequence. Then, the Maximization and Expectation steps are alternately repeated until convergence. The EM algorithm converges when the information content of the matrix model no longer improves.
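The control flow described above can be sketched as a loop over the three steps. To keep the example self-contained, the steps are passed in as callbacks. The class name, the callback parameters, and the returned round count are assumptions made for illustration.

```java
import java.util.function.DoubleSupplier;

final class EmLoopSketch {
    // Hypothetical driver: seeds once, then alternates the Maximization and
    // Expectation steps until the information content stops improving.
    // Returns the number of rounds that were executed.
    static int runUntilConverged(Runnable seed, Runnable maximization,
                                 Runnable expectation,
                                 DoubleSupplier informationContent) {
        seed.run();
        double previous = Double.NEGATIVE_INFINITY;
        int rounds = 0;
        while (true) {
            maximization.run();   // rebuild the matrix from the current instances
            expectation.run();    // choose new instances from the matrix
            rounds++;
            double current = informationContent.getAsDouble();
            // Stop when there is no strict improvement (this also stops on NaN).
            if (!(current > previous)) {
                return rounds;
            }
            previous = current;
        }
    }
}
```

In the class itself, the callbacks would presumably be setRandomLocationsForMotifInstances, determineMatrixModel, determineMotifInstances, and getInformationContentOfMatrix.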
public void run_EM_multiple_times(int iterations)
 Executes the EM (Expectation Maximization) algorithm multiple times.
 The number of times that the algorithm is executed is specified
 by the integer parameter. The best motif, as determined by
 information content, is identified over all executions
 of the algorithm. Upon completion of this method, this
 EM_MotifSearch should correspond to the best
 motif (including matrix model and motif instances) identified
 over all executions of the algorithm.
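Keeping the best result over several executions can be sketched generically. Here a run is any supplier of a result, and a separate function reports that result's information content. The class name, the generic type `T`, and both parameters are hypothetical. In the class itself, the result would be the matrix model together with the motif instance locations.

```java
import java.util.function.Supplier;
import java.util.function.ToDoubleFunction;

final class BestOfRunsSketch {
    // Hypothetical helper: executes runOnce the given number of times and
    // returns the result with the highest information content, or null if
    // iterations is not positive.
    static <T> T bestOf(int iterations, Supplier<T> runOnce,
                        ToDoubleFunction<T> informationContent) {
        T best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < iterations; i++) {
            T candidate = runOnce.get();
            double score = informationContent.applyAsDouble(candidate);
            if (best == null || score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }
}
```

Because each execution overwrites the current matrix and instance locations, the best ones need to be copied before the next execution starts and restored at the end.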
public static void main(java.lang.String[] args)
The main method generates an EM_MotifSearch
 for a motif of the specified length in the genomic sequences found in the specified
 FASTA file. The EM algorithm is executed the specified number of times. The
 set of motif instances, matrix, consensus sequence, and information content
 corresponding to the best execution are output.
args - an array of Strings representing any command line arguments