java.lang.Object > MotifSearch > EM_MotifSearch
public class EM_MotifSearch extends MotifSearch
An instance of the EM_MotifSearch class represents a search, using the EM algorithm, for a motif common to a set of genomic sequences.
Field Summary |
---|
Fields inherited from class MotifSearch |
---|
instanceLocations, matrix, motifLength, numSequences, sequences |
Constructor Summary | |
---|---|
EM_MotifSearch(java.lang.String fileName, int motifLength) | Creates an initially empty EM_MotifSearch. |
Method Summary | |
---|---|
void | determineMatrixModel() - The Maximization step in the EM algorithm. |
void | determineMotifInstances() - The Expectation step in the EM algorithm. |
void | EM() - Executes the EM (Expectation Maximization) algorithm. |
double | getInformationContentOfMatrix() - Returns the information content associated with the matrix model. |
double | getScoreForMotifInstance(java.lang.String s) - Given a candidate motif instance, returns the score (probability) of that instance based on the matrix model. |
static void | main(java.lang.String[] args) - The main method generates an EM_MotifSearch for a motif of the specified length in the genomic sequences found in the specified FASTA file. |
void | run_EM_multiple_times(int iterations) - Executes the EM (Expectation Maximization) algorithm multiple times. |
void | setRandomLocationsForMotifInstances() - The initial random seed step in the EM algorithm. |
Methods inherited from class MotifSearch |
---|
addPseudocountsToMatrix, getConsensusSequence, getInstanceLocations, getMatrix, getNucleotideContent, matrixToString, motifInstancesToString |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public EM_MotifSearch(java.lang.String fileName, int motifLength)
Creates an initially empty EM_MotifSearch.
The set of genomic sequences is read from the FASTA file whose name is given by the specified String. The integer parameter represents the desired length of the motif being searched for. Initially, the constructed EM_MotifSearch is empty. This constructor need only invoke the constructor of the superclass.
fileName - the name of a FASTA file containing one or more genomic sequences
motifLength - the length of the desired motif
Method Detail |
---|
public void setRandomLocationsForMotifInstances()
For each sequence, randomly determine the start index of a motif instance in the sequence.
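The random seeding step can be sketched as follows. This is a minimal illustration, not the class's actual implementation: the class name `RandomSeedSketch`, the helper `randomLocations`, and the choice to pass the sequences and motif length as arguments (rather than reading the inherited fields) are all assumptions made so the example stands alone. It also assumes a valid start index must leave room for the whole motif.

```java
import java.util.Random;

// Hypothetical stand-alone sketch; names here are not part of EM_MotifSearch.
final class RandomSeedSketch {

    // Returns one random start index per sequence, chosen so that a motif of
    // the given length fits entirely inside that sequence.
    static int[] randomLocations(String[] sequences, int motifLength, Random rng) {
        int[] locations = new int[sequences.length];
        for (int i = 0; i < sequences.length; i++) {
            int validStarts = sequences[i].length() - motifLength + 1;
            if (validStarts <= 0) {
                throw new IllegalArgumentException(
                        "Sequence " + i + " is shorter than the motif length");
            }
            locations[i] = rng.nextInt(validStarts); // uniform in [0, validStarts)
        }
        return locations;
    }

    public static void main(String[] args) {
        String[] sequences = {"ACGTACGTAC", "TTGACA", "GGGCCCAAAT"};
        int motifLength = 4;
        int[] starts = randomLocations(sequences, motifLength, new Random());
        for (int i = 0; i < starts.length; i++) {
            // Print the randomly chosen motif instance for each sequence.
            System.out.println(sequences[i].substring(starts[i], starts[i] + motifLength));
        }
    }
}
```

Passing the `Random` in explicitly makes repeated runs reproducible with a fixed seed, which can help when debugging the later steps.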
public void determineMatrixModel()
Based on the motif instances in the sequences, creates a matrix motif model. The resulting matrix model should be updated with pseudocounts so that no entry in the matrix is 0.0.
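One way to picture the Maximization step is the sketch below. It rests on several assumptions that the documentation does not confirm: the matrix is stored as 4 rows (in the order A, C, G, T) by `motifLength` columns, each column is normalized to frequencies, and a single pseudocount value is added to every cell. The inherited `addPseudocountsToMatrix` method may work differently; `MatrixModelSketch` and `buildMatrix` are hypothetical names.

```java
// Hypothetical stand-alone sketch; the real class would use its inherited fields.
final class MatrixModelSketch {

    static final String NUCLEOTIDES = "ACGT"; // assumed row order of the matrix

    // Builds a 4 x motifLength frequency matrix from one motif instance per
    // sequence, adding the same pseudocount to every cell before normalizing.
    static double[][] buildMatrix(String[] sequences, int[] locations,
                                  int motifLength, double pseudocount) {
        double[][] matrix = new double[4][motifLength];
        for (int i = 0; i < sequences.length; i++) {
            String instance = sequences[i].substring(locations[i], locations[i] + motifLength);
            for (int col = 0; col < motifLength; col++) {
                int row = NUCLEOTIDES.indexOf(Character.toUpperCase(instance.charAt(col)));
                if (row >= 0) {
                    matrix[row][col] += 1.0; // characters other than A, C, G, T are skipped
                }
            }
        }
        for (int col = 0; col < motifLength; col++) {
            double total = 0.0;
            for (int row = 0; row < 4; row++) {
                matrix[row][col] += pseudocount;
                total += matrix[row][col];
            }
            for (int row = 0; row < 4; row++) {
                matrix[row][col] /= total; // each column now sums to 1.0
            }
        }
        return matrix;
    }
}
```

With a positive pseudocount every entry stays strictly above 0.0, so a later product of matrix entries cannot collapse to zero because of a single unseen nucleotide.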
public void determineMotifInstances()
Based on the matrix model, identifies motif instances in the sequences. One motif instance is identified in each sequence. For each sequence, the motif instance that best matches the model is chosen.
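The Expectation step described above amounts to sliding a window of the motif length along each sequence and keeping the best-scoring start. The sketch below assumes the score of a window is the product of the matching matrix entries, that rows are ordered A, C, G, T, and that ties go to the earliest window; `BestInstanceSketch`, `windowScore`, and `bestStart` are hypothetical names, not part of the documented class.

```java
// Hypothetical stand-alone sketch of choosing the best motif instance in one sequence.
final class BestInstanceSketch {

    static final String NUCLEOTIDES = "ACGT"; // assumed row order of the matrix

    // Product of the matrix entries matching the window that starts at `start`.
    // Windows containing a character other than A, C, G, T score 0.0.
    static double windowScore(double[][] matrix, String sequence, int start) {
        double score = 1.0;
        for (int col = 0; col < matrix[0].length; col++) {
            int row = NUCLEOTIDES.indexOf(Character.toUpperCase(sequence.charAt(start + col)));
            if (row < 0) {
                return 0.0;
            }
            score *= matrix[row][col];
        }
        return score;
    }

    // Returns the start index of the highest-scoring window (earliest on ties).
    static int bestStart(double[][] matrix, String sequence) {
        int motifLength = matrix[0].length;
        if (sequence.length() < motifLength) {
            throw new IllegalArgumentException("Sequence is shorter than the motif length");
        }
        int best = 0;
        double bestScore = -1.0;
        for (int start = 0; start + motifLength <= sequence.length(); start++) {
            double score = windowScore(matrix, sequence, start);
            if (score > bestScore) {
                bestScore = score;
                best = start;
            }
        }
        return best;
    }
}
```

Applying `bestStart` to every sequence yields one instance location per sequence, which is the shape of result the method description calls for.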
public double getScoreForMotifInstance(java.lang.String s)
The length of the motif instance specified by String s must be the same as the number of columns in the matrix model.
s - a String corresponding to a motif instance
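A plausible reading of "score (probability)" is the product, over the columns of the matrix, of the entry for the nucleotide that `s` has at that position. The sketch below takes that reading as an assumption, along with the A, C, G, T row order; the class `MotifScoreSketch` and the explicit `matrix` parameter are hypothetical and exist only to keep the example self-contained.

```java
// Hypothetical stand-alone sketch of scoring a candidate motif instance.
final class MotifScoreSketch {

    static final String NUCLEOTIDES = "ACGT"; // assumed row order of the matrix

    // Multiplies, column by column, the matrix entry for the nucleotide found
    // at that position of the candidate instance.
    static double score(double[][] matrix, String s) {
        if (s.length() != matrix[0].length) {
            throw new IllegalArgumentException(
                    "Instance length must equal the number of matrix columns");
        }
        double probability = 1.0;
        for (int col = 0; col < s.length(); col++) {
            int row = NUCLEOTIDES.indexOf(Character.toUpperCase(s.charAt(col)));
            if (row < 0) {
                throw new IllegalArgumentException("Unexpected character: " + s.charAt(col));
            }
            probability *= matrix[row][col];
        }
        return probability;
    }

    public static void main(String[] args) {
        // A two-column model that strongly prefers A followed by G.
        double[][] matrix = {{0.7, 0.1}, {0.1, 0.1}, {0.1, 0.7}, {0.1, 0.1}};
        System.out.println(score(matrix, "AG"));
        System.out.println(score(matrix, "CC"));
    }
}
```

For long motifs a product of small probabilities can underflow; summing logarithms instead is a common alternative, though the documentation does not say which form is expected.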
public double getInformationContentOfMatrix()
The information content of a matrix model is described in Task 2 of Exercise 7. The method getNucleotideContent may be useful in determining the background frequency of different nucleotides.
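Exercise 7 is not reproduced on this page, so the exact formula is not known here. The sketch below uses a commonly used relative-entropy definition as a stand-in: the sum over all columns and nucleotides of p * log2(p / b), where p is the matrix entry and b is the background frequency of that nucleotide. Treat the formula, the `background` array parameter, and the names `InformationContentSketch` and `informationContent` as assumptions; the signature of `getNucleotideContent` is also not shown in this documentation, so it is not called.

```java
// Hypothetical stand-alone sketch; the formula is an assumed stand-in for Exercise 7.
final class InformationContentSketch {

    // Sums p * log2(p / b) over every cell of the matrix, where b is the
    // background frequency of the nucleotide for that row (order A, C, G, T).
    static double informationContent(double[][] matrix, double[] background) {
        double total = 0.0;
        for (int col = 0; col < matrix[0].length; col++) {
            for (int row = 0; row < matrix.length; row++) {
                double p = matrix[row][col];
                if (p > 0.0) { // a zero entry contributes nothing to the sum
                    total += p * (Math.log(p / background[row]) / Math.log(2.0));
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[] uniformBackground = {0.25, 0.25, 0.25, 0.25};
        double[][] matrix = {{0.7, 0.1}, {0.1, 0.1}, {0.1, 0.7}, {0.1, 0.1}};
        System.out.println(informationContent(matrix, uniformBackground));
    }
}
```

Under this definition a matrix identical to the background scores 0, and a column fully concentrated on one nucleotide contributes 2 bits against a uniform background.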
public void EM()
Initially, the EM algorithm is randomly seeded, i.e., one motif instance is randomly chosen in each sequence. Then, the Maximization and Expectation steps are alternately repeated until convergence. The EM algorithm converges when the information content of the matrix model no longer improves.
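The control flow described above can be sketched independently of the individual steps. In the example below, the nested `Steps` interface is a hypothetical stand-in for the four methods of this class, and `runEm` is a hypothetical driver; "no longer improves" is interpreted, as an assumption, to mean that the new information content is not strictly greater than the previous one.

```java
// Hypothetical stand-alone sketch of the EM driver loop.
final class EmLoopSketch {

    // Stand-in for the four EM_MotifSearch methods the loop relies on.
    interface Steps {
        void setRandomLocationsForMotifInstances();
        void determineMatrixModel();      // Maximization step
        void determineMotifInstances();   // Expectation step
        double getInformationContentOfMatrix();
    }

    // Seeds randomly, then alternates Maximization and Expectation until the
    // information content stops improving. Returns the final information content.
    static double runEm(Steps search) {
        search.setRandomLocationsForMotifInstances();
        double previous = Double.NEGATIVE_INFINITY;
        while (true) {
            search.determineMatrixModel();
            double current = search.getInformationContentOfMatrix();
            if (current <= previous) {
                return current; // converged: the matrix matches the current instances
            }
            previous = current;
            search.determineMotifInstances();
        }
    }
}
```

Checking for convergence right after the Maximization step means the loop exits with a matrix that was built from the instances currently selected, so the two stay consistent. A small tolerance on the comparison is a reasonable variation if floating-point noise causes extra iterations.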
public void run_EM_multiple_times(int iterations)
The number of times that the algorithm is executed is specified by the integer parameter. The best motif, as determined by information content, is identified over all executions of the algorithm. Upon completion of this method, this EM_MotifSearch should correspond to the best motif (including matrix model and motif instances) identified over all executions of the algorithm.
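Keeping the best result across restarts can be sketched as below. The `Result` holder, the `bestOf` helper, and the use of a `Supplier` to represent one full EM run are all hypothetical; the actual class would presumably save and restore its own `matrix` and `instanceLocations` fields instead. The sketch only snapshots the instance locations and the information content to stay short.

```java
import java.util.function.Supplier;

// Hypothetical stand-alone sketch of keeping the best of several EM runs.
final class BestOfRunsSketch {

    // Snapshot of one run: the chosen instance locations and their score.
    static final class Result {
        final int[] instanceLocations;
        final double informationContent;

        Result(int[] instanceLocations, double informationContent) {
            // Copy defensively, since a later run may overwrite the source array.
            this.instanceLocations = instanceLocations.clone();
            this.informationContent = informationContent;
        }
    }

    // Executes singleRun the given number of times and returns the result with
    // the highest information content (the earliest one on ties).
    static Result bestOf(int iterations, Supplier<Result> singleRun) {
        if (iterations < 1) {
            throw new IllegalArgumentException("iterations must be at least 1");
        }
        Result best = null;
        for (int i = 0; i < iterations; i++) {
            Result candidate = singleRun.get();
            if (best == null || candidate.informationContent > best.informationContent) {
                best = candidate;
            }
        }
        return best;
    }
}
```

The defensive copy matters because each new run overwrites the search state; without it, the "best" snapshot would silently track whatever the last run produced.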
public static void main(java.lang.String[] args)
The main method generates an EM_MotifSearch for a motif of the specified length in the genomic sequences found in the specified FASTA file. The EM algorithm is executed the specified number of times. The set of motif instances, matrix, consensus sequence, and information content for the best run (the one with the maximum information content over all iterations) are output.
args - an array of Strings representing any command line arguments