public class HMM
extends Object
HMM
class represents a
hidden Markov model, including the model's states, transitions,
and the optimal annotation (i.e., parse) of an observation
sequence.Constructor and Description |
---|
HMM(String states_fileName,
String transitions_fileName,
String sequence_fileName)
Creates a hidden Markov model based on the specified states, transitions,
and observation sequence.
|
Modifier and Type | Method and Description |
---|---|
void |
annotationToFile(String filename)
Outputs the optimal annotation and its score to a file.
|
String |
arrayToString(double[][] a)
Returns a String representation of a 2D array.
|
String |
arrayToString(int[][] a)
Returns a String representation of a 2D array.
|
void |
determineOptimalAnnotation()
Determines the optimal annotation of the observation sequence.
|
void |
fillInTableEntry(int row,
int col)
Fills in the entry at row
row and column col in both
the dynamic programming table and the backtracking table. |
int[][] |
getBacktrackTable()
Returns the backtracking table used by the Viterbi algorithm.
|
double |
getScoreOfOptimalAnnotation()
Returns the score (the natural logarithm of the probability) of
the optimal annotation of the observation sequence.
|
double[][] |
getTable()
Returns the dynamic programming table used by the Viterbi algorithm.
|
static void |
main(String[] args)
The
main method generates a HMM based on the state and
transition information in two files specified by command line arguments. |
String |
optimalAnnotationToString()
Returns a String representation of the optimal annotation.
|
String |
toString()
Returns a String representation of this HMM.
|
void |
viterbi()
Implements the Viterbi algorithm by filling in a dynamic programming table
and a backtracking table.
|
public HMM(String states_fileName, String transitions_fileName, String sequence_fileName)
The states of the HMM are read in from the file specified by the first parameter. The transitions between states of the HMM are read in from the file specified by the second parameter. The observation sequence emitting by the HMM is read in from the file specified by the third parameter. It is assumed that all paths (state sequences) through the HMM start in the starting state, which is the first state (i.e., the state at index 0). It is assumed that the first state emits characters only of length 1. It is assumed that for any state in the HMM, all characters emitted by that state are the same length. The constructor initializes the HMM, but it does not compute via the Viterbi algorithm the optimal annotation for the specified observation sequence.
states_fileName
- the name of a tab-delimited file specifying the states of the HMMtransitions_fileName
- the name of a tab-delimited file specifying the transitions
between states of the HMMsequence_fileName
- the name of a FASTA formatted file containing an observation sequence,
i.e., a sequence emitted by the HMM for which we want to determine the optimal annotationpublic void fillInTableEntry(int row, int col)
row
and column col
in both
the dynamic programming table and the backtracking table.
Each entry in the dynamic programming table corresponds to the natural logarithm
of a probability. For the dynamic programming table,
the table entry at row row
and column col
corresponds
to the probability (to be precise, the natural logarithm of the probability)
of the best annotation of the first (col
+1) characters in the
observationSequence assuming the best annotation of these
characters ends in state row
.
While the dynamic programming table enables computation of the
"probability" of the optimal annotation, the backtracking table
enables computation of the optimal annotation itself.
The backtracking table entry at row row
and column col
corresponds to the state prior to state row
in the optimal annotation,
assuming state row
is the end state after observing the first
(col
+1) characters in the observationSequence.
Any dynamic programming table entry corresponding to a state that is unreachable
should be populated with a value of negative infinity corresponding to the
natural logarithm of a probability of zero.
row
- the row of the table entry to be filled incol
- the column of the table entry to be filled inpublic void viterbi()
The method populates every entry in a dynamic programming table as well as every entry in a backtracking table. The tables have one row for each state in the HMM. The tables have one column for each character in the observationSequence. For the dynamic programming table, the table entry at row i and column j corresponds to the probability (to be precise, the natural logarithm of the probability) of the best annotation of the first (j+1) characters in the observationSequence assuming the best annotation of these characters ends in state i. While the dynamic programming table enables computation of the "probability" of the optimal annotation, the backtracking table enables computation of the optimal annotation itself. The backtracking table entry at row i and column j corresponds to the state prior to state i in the optimal annotation, assuming state i is the end state after observing the first (j+1) characters in the observationSequence. The following assumptions are made:
Since it is assumed that all annotations start in the first state, every other state besides the first state in the first column of the dynamic programming table should be populated with a value of negative infinity corresponding to the natural logarithm of a probability of zero.
public void determineOptimalAnnotation()
Once the dynamic programming table and backtracking table have
been filled in, this method uses the tables to construct a
String
representation of the optimal annotation.
The maximum value in the final column of the dynamic programming
table indicates the final state in the optimal annotation. The
backtracking table then can be used to backtrack from this final
state to determine the optimal state sequence (i.e., annotation).
The optimal annotation should be the same length as the
observation sequence and each state in the optimal annotation
should be represented by the first character in the state's name.
public double getScoreOfOptimalAnnotation()
double
representing the score of the optimal annotationpublic String optimalAnnotationToString()
String
representing the optimal annotation.public String toString()
toString
in class Object
String
representing this HMM.public double[][] getTable()
doubles
corresponding to the dynamic programming table.public int[][] getBacktrackTable()
ints
corresponding to the backtracking table.public String arrayToString(double[][] a)
A helpful method for debugging. A String representation of the dynamic programming table can be obtained using this method. However, the method is only practical for small 2D arrays.
a
- a 2D array of doubles
String
representation of the 2D arraypublic String arrayToString(int[][] a)
A helpful method for debugging. A String representation of the backtracking table can be obtained using this method. However, the method is only practical for small 2D arrays.
a
- a 2D array of ints
String
representation of the 2D arraypublic void annotationToFile(String filename)
filename
- the name of the output filepublic static void main(String[] args)
main
method generates a HMM
based on the state and
transition information in two files specified by command line arguments. An observation
sequence is determined from a FASTA formatted file also specified as a command line
argument. After the model is built based on the information in the three files,
the Viterbi algorithm along with backtracking is used to determine the optimal
annotation of the observation sequence. The optimal annotation, as well as its score,
is output to the screen and to a file named annotation.txt. The application can be
executed as follows:
java HMM states.txt
transitions.txt observation.txt
args
- an array of Strings
representing any command line arguments