Alignment

java.lang.Object
- Alignment

```
public class Alignment
extends java.lang.Object
```
An instance of the Alignment class represents an optimal pairwise alignment for two genomic sequences.

Constructor Summary

Constructors
Constructor and Description
`Alignment(java.io.File file1, java.io.File file2)` Creates an Alignment from the genomic sequences found in the two specified FASTA files.
`Alignment(java.lang.String sequence1, java.lang.String sequence2)` Creates an Alignment from the two specified genomic sequences.

Method Summary

Methods
Modifier and Type	Method and Description
`java.lang.String`	`alignmentTableToString()` Returns a `String` representation of the alignment table generated during alignment computation.
`java.lang.String`	`backtrackTableToString()` Returns a `String` representation of the backtrack table generated during alignment computation.
`void`	`calculatePValue()` Estimates the p-value of this `Alignment`.
`void`	`computeAlignment()` Computes the optimal pairwise alignment of two genomic sequences.
`java.lang.String`	`getAlignment()` Returns the optimal pairwise alignment.
`int`	`getAlignmentScore()` Returns the optimal pairwise alignment score for this `Alignment`.
`double`	`getPValue()` Returns the p-value of this `Alignment`.
`static void`	`main(java.lang.String[] args)` The `main` method creates an optimal pairwise alignment for two genomic sequences.
`static void`	`outputHistogramOfRandomAlignmentScores(java.lang.String fileName, java.util.Vector<java.lang.Integer> v)` Outputs a histogram of optimal pairwise alignment scores to a file.
`java.lang.String`	`sequence1()` Returns the first of two genomic sequences in this `Alignment`.
`java.lang.String`	`sequence2()` Returns the second of two genomic sequences in this `Alignment`.
`void`	`setAffineGaps(int alphaGapScore, int betaGapScore)` Score gaps in this `Alignment` using an affine model.
`void`	`setFastAlignment(int numGaps)` Indicate that a fast linear-time pairwise alignment should be performed.
`void`	`setFixedScoring(int matchScore, int mismatchScore)` Scores each pair of aligned characters, one from each genomic sequence, with the `match` score when the two characters are identical and with the `mismatch` score when they differ.
`void`	`setGlobalAlignment()` Indicate that a global pairwise alignment should be performed.
`void`	`setLinearGaps(int linearGapScore)` Score gaps in this `Alignment` using a linear model.
`void`	`setLocalAlignment()` Indicate that a local pairwise alignment should be performed.
`void`	`setMatrixScoring(java.lang.String fileName)` When aligning two characters, one from each genomic sequence, the alignment score of the two characters should be determined from a matrix of scores found in a file with the specified name.
`java.lang.String`	`toString()` Returns a `String` representation of this `Alignment`.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - Alignment
```
public Alignment(java.io.File file1,
         java.io.File file2)
```
    Creates an Alignment from the genomic sequences found in the two specified FASTA files.
    
    Parameters:
    file1 - a File object referring to a FASTA file containing a genomic sequence
    file2 - a File object referring to a FASTA file containing a genomic sequence
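    The FASTA input expected by this constructor can be illustrated with a small parsing sketch. It is not the class's own reader: the name `FastaSketch.firstSequence` is hypothetical, and the sketch assumes that only the first record of the file is used and that header lines start with `>`. To read from a file, the text could first be loaded with `java.nio.file.Files.readAllBytes`.

```java
final class FastaSketch {
    /** Returns the residues of the first record in FASTA-formatted text. */
    static String firstSequence(String fastaText) {
        StringBuilder sequence = new StringBuilder();
        boolean inRecord = false;
        for (String rawLine : fastaText.split("\n")) {
            String line = rawLine.trim(); // also drops a trailing '\r'
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                if (inRecord) {
                    break; // a second header starts the next record (assumed to be ignored)
                }
                inRecord = true;
            } else if (inRecord) {
                sequence.append(line); // sequence data may span several lines
            }
        }
        return sequence.toString();
    }

    public static void main(String[] args) {
        String fasta = ">seq1 example\nACGT\nTTGA\n>seq2\nGGGG\n";
        System.out.println(firstSequence(fasta)); // prints ACGTTTGA
    }
}
```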
  - Alignment
```
public Alignment(java.lang.String sequence1,
         java.lang.String sequence2)
```
    Creates an Alignment from the two specified genomic sequences.
    
    Parameters:
    sequence1 - a genomic sequence to be aligned
    sequence2 - a genomic sequence to be aligned
- Method Detail
  - sequence1
```
public java.lang.String sequence1()
```
    Returns the first of two genomic sequences in this Alignment.
    
    Returns:
    a String corresponding to the first of two genomic sequences in the pairwise alignment
  - sequence2
```
public java.lang.String sequence2()
```
    Returns the second of two genomic sequences in this Alignment.
    
    Returns:
    a String corresponding to the second of two genomic sequences in the pairwise alignment
  - computeAlignment
```
public void computeAlignment()
```
    Computes the optimal pairwise alignment of two genomic sequences.
    Either the optimal global or optimal local pairwise alignment is computed. Computation of the optimal pairwise alignment includes determining the optimal pairwise alignment score as well as the corresponding alignment.
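    As an illustration of what computing both the score and the alignment involves, here is a minimal sketch of a global alignment by dynamic programming, assuming fixed match/mismatch scores and a linear gap score. The class `GlobalAlignmentSketch` and its parameters are hypothetical and are not taken from the Alignment source. For brevity the sketch re-derives each backtracking step from the score table rather than storing a separate backtrack table.

```java
final class GlobalAlignmentSketch {
    /** Returns {aligned first sequence, aligned second sequence, score}. */
    static String[] align(String a, String b, int match, int mismatch, int gap) {
        int n = a.length();
        int m = b.length();
        int[][] score = new int[n + 1][m + 1];
        // Aligning a prefix against the empty string costs one gap per character.
        for (int i = 1; i <= n; i++) score[i][0] = i * gap;
        for (int j = 1; j <= m; j++) score[0][j] = j * gap;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = score[i - 1][j - 1] + pairScore(a, b, i, j, match, mismatch);
                int up = score[i - 1][j] + gap;   // gap in the second sequence
                int left = score[i][j - 1] + gap; // gap in the first sequence
                score[i][j] = Math.max(diag, Math.max(up, left));
            }
        }
        // Backtrack from the bottom-right cell to recover one optimal alignment.
        StringBuilder rowA = new StringBuilder();
        StringBuilder rowB = new StringBuilder();
        int i = n;
        int j = m;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0
                    && score[i][j] == score[i - 1][j - 1] + pairScore(a, b, i, j, match, mismatch)) {
                rowA.append(a.charAt(i - 1));
                rowB.append(b.charAt(j - 1));
                i--;
                j--;
            } else if (i > 0 && score[i][j] == score[i - 1][j] + gap) {
                rowA.append(a.charAt(i - 1));
                rowB.append('-');
                i--;
            } else {
                rowA.append('-');
                rowB.append(b.charAt(j - 1));
                j--;
            }
        }
        return new String[] {
            rowA.reverse().toString(), rowB.reverse().toString(), String.valueOf(score[n][m])
        };
    }

    private static int pairScore(String a, String b, int i, int j, int match, int mismatch) {
        return a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch;
    }
}
```

    For example, `GlobalAlignmentSketch.align("ACGT", "AGT", 1, -1, -2)` yields the rows `ACGT` and `A-GT` with a score of 1 (three matches and one gap).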
  - getAlignmentScore
```
public int getAlignmentScore()
```
    Returns the optimal pairwise alignment score for this Alignment.
    
    Returns:
    the optimal pairwise alignment score
  - alignmentTableToString
```
public java.lang.String alignmentTableToString()
```
    Returns a String representation of the alignment table generated during alignment computation. This method is used primarily for debugging and is only useful for small alignment tables, i.e., when aligning short sequences.
    
    Returns:
    a String representation of the alignment table
  - backtrackTableToString
```
public java.lang.String backtrackTableToString()
```
    Returns a String representation of the backtrack table generated during alignment computation. This method is used primarily for debugging and is only useful for small backtrack tables, i.e., when aligning short sequences.
    
    Returns:
    a String representation of the backtrack table
  - getAlignment
```
public java.lang.String getAlignment()
```
    Returns the optimal pairwise alignment.
    
    Returns:
    a String representing the optimal pairwise alignment
  - getPValue
```
public double getPValue()
```
    Returns the p-value of this Alignment.
    The p-value of this Alignment is the probability (between 0.0 and 1.0) that the optimal pairwise alignment score of two random sequences is greater than or equal to the optimal pairwise alignment score for this Alignment. A p-value close to 1.0 suggests that an alignment was likely to have occurred merely by chance. A p-value close to 0.0 suggests that an alignment was unlikely to have occurred by chance. In the latter case (especially if the p-value is less than about 0.05), the alignment is considered significant and the sequences are deemed similar.
    
    Returns:
    the p-value of the optimal pairwise alignment
  - setGlobalAlignment
```
public void setGlobalAlignment()
```
    Indicate that a global pairwise alignment should be performed.
  - setLocalAlignment
```
public void setLocalAlignment()
```
    Indicate that a local pairwise alignment should be performed.
  - setFastAlignment
```
public void setFastAlignment(int numGaps)
```
    Indicate that a fast linear-time pairwise alignment should be performed.
    For two sequences of length n, rather than using an O(n^2) algorithm that identifies the optimal alignment with any number of gaps, a fast alignment runs in O(numGaps*n) time, where numGaps is the number of gaps considered. This option is only available for global alignments.
    
    Parameters:
    numGaps - at least this many gaps are considered when computing the optimal alignment
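    One common way to reach O(numGaps*n) time is a banded dynamic program that only fills table cells within a fixed distance of the main diagonal. The sketch below assumes that approach; whether the Alignment class does the same is not stated here. `BandedAlignmentSketch` and the band half-width `k` (standing in for numGaps) are hypothetical names, and fixed scoring with a linear gap score is assumed.

```java
import java.util.Arrays;

final class BandedAlignmentSketch {
    private static final int NEG = Integer.MIN_VALUE / 2; // marks cells outside the band

    /** Global alignment score restricted to cells (i, j) with |i - j| <= k. */
    static int bandedScore(String a, String b, int match, int mismatch, int gap, int k) {
        int n = a.length();
        int m = b.length();
        if (k < 0 || Math.abs(n - m) > k) {
            throw new IllegalArgumentException("band too narrow for these sequence lengths");
        }
        int width = 2 * k + 1;        // cells kept per row; cell (i, j) lives at index j - i + k
        int[] prev = new int[width];
        int[] curr = new int[width];
        Arrays.fill(prev, NEG);
        for (int j = 0; j <= Math.min(m, k); j++) {
            prev[j + k] = j * gap;    // row 0: leading gaps
        }
        for (int i = 1; i <= n; i++) {
            Arrays.fill(curr, NEG);
            for (int j = Math.max(0, i - k); j <= Math.min(m, i + k); j++) {
                int d = j - i + k;
                int best;
                if (j == 0) {
                    best = i * gap;   // column 0: leading gaps
                } else {
                    int sub = a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch;
                    best = prev[d] + sub;                                        // diagonal
                    if (d > 0) best = Math.max(best, curr[d - 1] + gap);         // from the left
                    if (d + 1 < width) best = Math.max(best, prev[d + 1] + gap); // from above
                }
                curr[d] = best;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m - n + k];       // cell (n, m)
    }
}
```

    Each row touches at most 2k + 1 cells, so the work is proportional to k*n. The result matches the unrestricted optimum only when some optimal alignment stays inside the band.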
  - setLinearGaps
```
public void setLinearGaps(int linearGapScore)
```
    Score gaps in this Alignment using a linear model.
    With a linear model for scoring gaps, every gap is penalized the same amount as specified by the linearGapScore parameter.
    
    Parameters:
    linearGapScore - the (negative) contribution to the alignment score of each gap
  - setAffineGaps
```
public void setAffineGaps(int alphaGapScore,
                 int betaGapScore)
```
    Score gaps in this Alignment using an affine model.
    With an affine model for scoring gaps, the first gap in a sequence of consecutive gaps is penalized by the alphaGapScore parameter whereas subsequent gaps in a sequence of consecutive gaps are penalized by the betaGapScore parameter.
    Affine gap scoring is meant to model empirical biological evidence that the existence of a gap is more significant than the length of the gap. It is expensive, biologically, to add to or splice from a genomic sequence, but the length of the addition or deletion is less important.
    
    Parameters:
    alphaGapScore - the (negative) contribution to the alignment score of initiating each sequence of gaps
    betaGapScore - the (negative) contribution to the alignment score of extending each sequence of gaps
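    The difference between the linear and affine models can be made concrete with a small sketch that scores only the gaps in one row of an alignment. `GapPenaltySketch` is a hypothetical helper, and it assumes gaps are written as `-` characters. Passing the same value for both parameters reproduces the linear model.

```java
final class GapPenaltySketch {
    /**
     * Sums the gap contributions in one aligned row: the first gap of each
     * run of consecutive gaps contributes alphaGapScore, every further gap
     * in the same run contributes betaGapScore.
     */
    static int affineGapScore(String alignedRow, int alphaGapScore, int betaGapScore) {
        int total = 0;
        boolean inRun = false;
        for (int i = 0; i < alignedRow.length(); i++) {
            if (alignedRow.charAt(i) == '-') {
                total += inRun ? betaGapScore : alphaGapScore;
                inRun = true;
            } else {
                inRun = false;
            }
        }
        return total;
    }

    /** Linear model: every gap contributes the same amount. */
    static int linearGapScore(String alignedRow, int linearGapScore) {
        return affineGapScore(alignedRow, linearGapScore, linearGapScore);
    }

    public static void main(String[] args) {
        String row = "AC---GT-A"; // one run of three gaps and one single gap
        System.out.println(affineGapScore(row, -5, -1)); // prints -12
        System.out.println(linearGapScore(row, -2));     // prints -8
    }
}
```

    With alphaGapScore = -5 and betaGapScore = -1, the run of three gaps costs -7 in total, only slightly more than the -5 of the single gap, which reflects the idea that opening a gap matters more than extending it.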
  - setFixedScoring
```
public void setFixedScoring(int matchScore,
                   int mismatchScore)
```
    When aligning two characters, one from each genomic sequence, if the two characters are identical then the alignment score of the two characters should be the match score. If the two characters differ then the alignment score of the two characters should be the mismatch score.
    
    Parameters:
    matchScore - the (positive) contribution to the alignment score of aligning two identical characters
    mismatchScore - the (negative) contribution to the alignment score of aligning two different characters
  - setMatrixScoring
```
public void setMatrixScoring(java.lang.String fileName)
```
    When aligning two characters, one from each genomic sequence, the alignment score of the two characters should be determined from a matrix of scores found in a file with the specified name.
    In fixed scoring, all pairs of identical characters (e.g., G|G, C|C, T|T) are scored the same and all pairs of different characters (e.g., G|C, G|T, C|T) are scored the same. However, with genomic sequences, not all pairs of characters are equally similar or dissimilar. In matrix scoring, different pairs of identical characters (e.g., G|G, C|C, T|T) may be scored differently and different pairs of mismatching characters (e.g., G|C, G|T, C|T) may be scored differently. For example, adenine (A) and guanine (G) are both purines whereas cytosine (C) and thymine (T) are both pyrimidines. Since adenine is more similar to guanine than to thymine, an adenine aligned with a guanine (A|G) should not penalize an alignment as much as an adenine aligned with a thymine (A|T). Analogously for protein sequences, two different hydrophobic amino acids aligned together might not penalize an alignment as much as a hydrophobic amino acid aligned with a hydrophilic amino acid.
    In matrix scoring, the alignment score of every possible pair of characters is specified in a matrix that must be read in from a file. For DNA sequences, since there are 4 characters in the DNA alphabet, there are 16 possible pairs of characters and the matrix contains 16 entries. For protein sequences, since there are 20 characters in the protein alphabet, there are 400 possible pairs of characters and the matrix contains 400 entries.
    
    Parameters:
    fileName - the name of a file containing a matrix of alignment scores for all pairs of characters
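    A matrix lookup for DNA might be sketched as follows. The layout of the score file is not described here, so the sketch keeps the 16 entries in memory instead of reading them. `MatrixScoringSketch` and the particular score values (+2 for identical bases, -1 for purine/purine or pyrimidine/pyrimidine pairs, -3 otherwise) are hypothetical and chosen only for illustration.

```java
final class MatrixScoringSketch {
    private static final String ALPHABET = "ACGT";

    // Rows and columns follow the order of ALPHABET. Illustrative values only.
    private static final int[][] SCORES = {
        //  A   C   G   T
        {   2, -3, -1, -3 }, // A
        {  -3,  2, -3, -1 }, // C
        {  -1, -3,  2, -3 }, // G
        {  -3, -1, -3,  2 }, // T
    };

    /** Looks up the alignment score of one pair of characters. */
    static int score(char x, char y) {
        int row = ALPHABET.indexOf(x);
        int col = ALPHABET.indexOf(y);
        if (row < 0 || col < 0) {
            throw new IllegalArgumentException("character not in alphabet: " + x + " or " + y);
        }
        return SCORES[row][col];
    }

    public static void main(String[] args) {
        System.out.println(score('A', 'G')); // prints -1
        System.out.println(score('A', 'T')); // prints -3
    }
}
```

    In this table A|G is penalized less than A|T, matching the purine example above. A protein version would use a 20-character alphabet and a 20 by 20 table.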
  - calculatePValue
```
public void calculatePValue()
```
    Estimates the p-value of this Alignment.
    The p-value of this Alignment is the probability (between 0.0 and 1.0) that the optimal pairwise alignment score of two random sequences is greater than or equal to the optimal pairwise alignment score for this Alignment. A p-value close to 1.0 suggests that an alignment was likely to have occurred merely by chance. A p-value close to 0.0 suggests that an alignment was unlikely to have occurred by chance. In the latter case (especially if the p-value is less than about 0.05), the alignment is considered significant and the sequences are deemed similar.
    A p-value for an alignment of two sequences can be estimated as follows. Generate 1000 pairs of random sequences by randomly permuting the original two sequences. For each of the 1000 pairs, determine the optimal pairwise alignment score. These 1000 scores approximate an extreme value distribution. Calculate the mean and standard_deviation of the 1000 scores and use them to derive the two parameters of the extreme value distribution, beta and mu. beta can be calculated as standard_deviation * √6 / π. mu can then be calculated as mean - (0.5772*beta). Finally, the p-value can be calculated as 1.0 - e^(-e^(-(x-mu)/beta)) where x is the optimal pairwise alignment score of the original pair of sequences.
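    The arithmetic of this estimate can be sketched as below. `PValueSketch` is a hypothetical helper, not the class's own code: it takes the scores of the random pairs as an array, uses the population standard deviation (an assumption, since the divisor is not specified), and includes a Fisher-Yates shuffle as one way to permute a sequence.

```java
import java.util.Random;

final class PValueSketch {
    /** Estimates P(random score >= observedScore) from an extreme value fit. */
    static double pValue(int[] randomScores, int observedScore) {
        int n = randomScores.length;
        double mean = 0.0;
        for (int s : randomScores) mean += s;
        mean /= n;
        double variance = 0.0;
        for (int s : randomScores) variance += (s - mean) * (s - mean);
        double standardDeviation = Math.sqrt(variance / n); // population form (assumption)
        if (standardDeviation == 0.0) {
            throw new IllegalArgumentException("scores must not all be identical");
        }
        double beta = standardDeviation * Math.sqrt(6.0) / Math.PI;
        double mu = mean - 0.5772 * beta;
        return 1.0 - Math.exp(-Math.exp(-(observedScore - mu) / beta));
    }

    /** Returns a random permutation of the characters in the sequence. */
    static String shuffle(String sequence, Random rng) {
        char[] chars = sequence.toCharArray();
        for (int i = chars.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1); // pick a position in 0..i
            char tmp = chars[i];
            chars[i] = chars[j];
            chars[j] = tmp;
        }
        return new String(chars);
    }
}
```

    In a full estimate, `shuffle` would be applied to both original sequences 1000 times, each shuffled pair would be aligned, and the resulting scores would be passed to `pValue` together with the score of the original alignment.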
  - toString
```
public java.lang.String toString()
```
    Returns a String representation of this Alignment.
    
    Overrides:
    toString in class java.lang.Object
    
    Returns:
    a String representation of this Alignment
  - outputHistogramOfRandomAlignmentScores
```
public static void outputHistogramOfRandomAlignmentScores(java.lang.String fileName,
                                          java.util.Vector<java.lang.Integer> v)
```
    Outputs a histogram of optimal pairwise alignment scores to a file.
    A histogram with 101 bins is created from the set of alignment scores stored in the Vector v. The histogram indicates the number of alignment scores corresponding to each of the 101 bins. The histogram is output to the specified file fileName. The first column of the output file represents the x-axis of the histogram, i.e., the 101 bins corresponding to 101 possible alignment scores. The second column of the output file indicates the number of alignment scores corresponding to each bin. The third column is a normalized version of the second column, i.e., each entry in the third column is the corresponding entry in the second column divided by the total number of alignment scores. The fourth column represents a mathematical function - an extreme value distribution - that approximates the third column. The extreme value distribution is determined from the mean and standard deviation of the set of alignment scores in v.
    
    Parameters:
    fileName - the name of a file to which the histogram will be output
    v - a Vector of optimal pairwise alignment scores
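    The four columns described above can be sketched as follows. This is an illustration under stated assumptions, not the method's actual code: `HistogramSketch` and the `lowestScore` parameter are hypothetical, bin i is assumed to hold the integer score lowestScore + i, scores outside the 101 bins are skipped, the fourth column is assumed to be the extreme value density, and the rows are returned rather than written to a file.

```java
import java.util.List;

final class HistogramSketch {
    static final int BINS = 101;

    /** Returns one row per bin: {score, count, count / total, fitted density}. */
    static double[][] histogram(List<Integer> scores, int lowestScore) {
        int[] counts = new int[BINS];
        double sum = 0.0;
        double sumOfSquares = 0.0;
        for (int s : scores) {
            sum += s;
            sumOfSquares += (double) s * s;
            int bin = s - lowestScore;
            if (bin >= 0 && bin < BINS) {
                counts[bin]++;
            }
        }
        int total = scores.size();
        double mean = sum / total;
        double sd = Math.sqrt(Math.max(0.0, sumOfSquares / total - mean * mean));
        // Extreme value parameters derived from the mean and standard deviation.
        double beta = sd * Math.sqrt(6.0) / Math.PI;
        double mu = mean - 0.5772 * beta;
        double[][] rows = new double[BINS][4];
        for (int b = 0; b < BINS; b++) {
            double x = lowestScore + b;
            double z = (x - mu) / beta;
            rows[b][0] = x;
            rows[b][1] = counts[b];
            rows[b][2] = (double) counts[b] / total;
            rows[b][3] = Math.exp(-(z + Math.exp(-z))) / beta; // not meaningful if sd is 0
        }
        return rows;
    }
}
```

    Each returned row corresponds to one line of the output file described above, with the columns in the same order.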
  - main
```
public static void main(java.lang.String[] args)
```
    The main method creates an optimal pairwise alignment for two genomic sequences.
    The main method expects the Alignment program to be executed with exactly two command line arguments: the names of two FASTA files, each containing a genomic sequence. The main method creates an Alignment object based on the two genomic sequences and computes the optimal pairwise alignment of the sequences.
    
    Parameters:
    args - an array of Strings representing any command line arguments
