Class MotifSearch

java.lang.Object
  extended by MotifSearch
Direct Known Subclasses:
EM_MotifSearch

public class MotifSearch
extends java.lang.Object

An instance of the MotifSearch class represents a search for a motif common to a set of genomic sequences.


Field Summary
protected  java.util.Vector<java.lang.Integer> instanceLocations
          A collection of integers where each integer represents the index of the start location of a motif instance in a genomic sequence
protected  double[][] matrix
          A 2D array corresponding to a position-specific scoring matrix (PSSM) that is a model of the motif being searched for
protected  int motifLength
          The length of the motif being searched for
protected  int numSequences
          The number of genomic sequences
protected  java.util.Vector<java.lang.String> sequences
          A collection of genomic sequences
 
Constructor Summary
MotifSearch(java.lang.String fileName, int motifLength)
          Creates an initially empty MotifSearch.
 
Method Summary
 void addPseudocountsToMatrix()
          Replaces values of 0.0 in the matrix (motif model) with small pseudocount values (0.0001).
 java.lang.String getConsensusSequence()
          Returns the consensus sequence for the motif.
 java.util.Vector<java.lang.Integer> getInstanceLocations()
          Returns a copy of the indices of start locations of motif instances in the sequences.
 double[][] getMatrix()
          Returns a copy of the matrix that models the motif.
 double getNucleotideContent(char c)
          Returns a double representing the frequency with which the specified nucleotide character occurs in the genomic sequences.
static void main(java.lang.String[] args)
          The main method creates an initially empty MotifSearch for a motif of the specified length in the genomic sequences found in the specified FASTA file.
 java.lang.String matrixToString()
          Returns a String representation of the matrix that models the motif.
 java.lang.String motifInstancesToString()
          Returns a String representation of motif instances found in the genomic sequences.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

sequences

protected java.util.Vector<java.lang.String> sequences
A collection of genomic sequences


numSequences

protected int numSequences
The number of genomic sequences


motifLength

protected int motifLength
The length of the motif being searched for


matrix

protected double[][] matrix
A 2D array corresponding to a position-specific scoring matrix (PSSM) that is a model of the motif being searched for


instanceLocations

protected java.util.Vector<java.lang.Integer> instanceLocations
A collection of integers where each integer represents the index of the start location of a motif instance in a genomic sequence

Constructor Detail

MotifSearch

public MotifSearch(java.lang.String fileName,
                   int motifLength)
Creates an initially empty MotifSearch.

The genomic sequences are read in from the FASTA file named by the fileName parameter. The motifLength parameter gives the desired length of the motif being searched for. The constructed MotifSearch is initially empty.

Parameters:
fileName - the name of a FASTA file containing one or more genomic sequences
motifLength - the length of the desired motif
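
The documentation does not show how the FASTA file is parsed. The following is a minimal sketch of one common approach, not the class's actual implementation. It assumes that header lines start with '>', that a sequence may span several lines, and that blank lines and any text before the first header are ignored. The class name FastaSketch and the method readSequences are hypothetical.

```java
import java.util.Vector;

final class FastaSketch {
    // Hypothetical helper: collects one String per FASTA record.
    // Assumption: lines beginning with '>' are headers; all other
    // non-blank lines belong to the most recent header's sequence.
    static Vector<String> readSequences(Iterable<String> lines) {
        Vector<String> sequences = new Vector<>();
        StringBuilder current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                // A new header closes the previous record, if any.
                if (current != null) {
                    sequences.add(current.toString());
                }
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line);
            }
        }
        if (current != null) {
            sequences.add(current.toString());
        }
        return sequences;
    }
}
```

The lines of a file could be supplied with java.nio.file.Files.readAllLines(java.nio.file.Paths.get(fileName)).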
Method Detail

matrixToString

public java.lang.String matrixToString()
Returns a String representation of the matrix that models the motif.

Returns:
a String representation of the matrix modeling the motif

motifInstancesToString

public java.lang.String motifInstancesToString()
Returns a String representation of motif instances found in the genomic sequences.

Returns:
a String representation of motif instances found in the genomic sequences
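
To illustrate how the start locations relate to motif instances, here is a small sketch that extracts each instance as a substring. It assumes, without confirmation from this documentation, that the i-th entry of instanceLocations is the start index within the i-th sequence. The class name MotifInstanceSketch and the method extractInstances are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MotifInstanceSketch {
    // Hypothetical helper. Assumption: locations.get(i) is the start
    // index of the motif instance inside sequences.get(i).
    static List<String> extractInstances(List<String> sequences,
                                         List<Integer> locations,
                                         int motifLength) {
        List<String> instances = new ArrayList<>();
        for (int i = 0; i < sequences.size(); i++) {
            int start = locations.get(i);
            // substring's end index is exclusive, so this yields
            // exactly motifLength characters.
            instances.add(sequences.get(i).substring(start, start + motifLength));
        }
        return instances;
    }
}
```

Because java.util.Vector implements java.util.List, the documented sequences and instanceLocations fields could be passed to such a helper directly.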

addPseudocountsToMatrix

public void addPseudocountsToMatrix()
Replaces values of 0.0 in the matrix (motif model) with small pseudocount values (0.0001).
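
A minimal sketch of the replacement described above, written as a static helper on a plain array rather than on the protected matrix field. The value 0.0001 comes from the description. Whether the real method renormalizes the matrix afterwards is not stated, so this sketch does not. The names PseudocountSketch and addPseudocounts are hypothetical.

```java
final class PseudocountSketch {
    // Pseudocount value given in the method description.
    static final double PSEUDOCOUNT = 0.0001;

    // Replaces every entry that is exactly 0.0 with the pseudocount,
    // modifying the array in place. No renormalization is performed
    // (assumption: the description does not mention any).
    static void addPseudocounts(double[][] matrix) {
        for (double[] row : matrix) {
            for (int j = 0; j < row.length; j++) {
                if (row[j] == 0.0) {
                    row[j] = PSEUDOCOUNT;
                }
            }
        }
    }
}
```

Pseudocounts of this kind are typically used so that a later logarithm or division never encounters a zero probability.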


getNucleotideContent

public double getNucleotideContent(char c)
Returns a double representing the frequency with which the specified nucleotide character occurs in the genomic sequences.

Parameters:
c - a nucleotide character (e.g., A or C or G or T)
Returns:
a double representing the frequency with which c occurs in the sequences
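
One plausible reading of this frequency is the number of occurrences of c divided by the total number of characters across all sequences. The sketch below follows that reading. The case-sensitive comparison and the return value of 0.0 for empty input are assumptions, and the names NucleotideContentSketch and nucleotideContent are hypothetical.

```java
import java.util.List;

final class NucleotideContentSketch {
    // Hypothetical helper: fraction of all characters in the given
    // sequences that equal c (case-sensitive, by assumption).
    static double nucleotideContent(List<String> sequences, char c) {
        long matches = 0;
        long total = 0;
        for (String s : sequences) {
            total += s.length();
            for (int i = 0; i < s.length(); i++) {
                if (s.charAt(i) == c) {
                    matches++;
                }
            }
        }
        // Assumption: report 0.0 rather than NaN when there is no data.
        return total == 0 ? 0.0 : (double) matches / total;
    }
}
```

For the two sequences "ACGT" and "AAAA", the character A accounts for 5 of 8 positions, so the helper returns 0.625.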

getMatrix

public double[][] getMatrix()
Returns a copy of the matrix that models the motif.

Returns:
a 2D array representation of the matrix modeling the motif
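
The description says a copy is returned but does not say whether the copy is deep. For a double[][], a shallow clone would still share the row arrays with the original, so a caller-safe copy needs to clone each row. The following sketch shows that technique. The names MatrixCopySketch and deepCopy are hypothetical, and this is not necessarily how getMatrix is implemented.

```java
final class MatrixCopySketch {
    // Hypothetical helper: copies the outer array and every row, so
    // changes to the result never affect the source matrix.
    static double[][] deepCopy(double[][] source) {
        double[][] copy = new double[source.length][];
        for (int i = 0; i < source.length; i++) {
            copy[i] = source[i].clone();
        }
        return copy;
    }
}
```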

getInstanceLocations

public java.util.Vector<java.lang.Integer> getInstanceLocations()
Returns a copy of the indices of start locations of motif instances in the sequences.

Returns:
a collection of indices of start locations of motif instances in the sequences

getConsensusSequence

public java.lang.String getConsensusSequence()
Returns the consensus sequence for the motif. IUPAC symbols are used as the alphabet for the consensus sequence.

Returns:
a String representing the consensus sequence for the motif
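
The rule used to choose an IUPAC symbol at each position is not documented. As an illustration only, the sketch below includes every nucleotide whose value in a column is at least a caller-supplied threshold and maps the resulting set to its IUPAC code. It assumes the matrix has four rows in the order A, C, G, T and one column per motif position. That layout, the threshold rule, and the names ConsensusSketch and consensus are all hypothetical.

```java
final class ConsensusSketch {
    // IUPAC nucleotide codes indexed by a 4-bit set:
    // bit 0 = A, bit 1 = C, bit 2 = G, bit 3 = T.
    // Index 0 (empty set) falls back to N.
    private static final String IUPAC = "NACMGRSVTWYHKDBN";

    // Assumption: matrix[0..3] are the rows for A, C, G, T and
    // matrix[nt][pos] is the weight of nucleotide nt at position pos.
    static String consensus(double[][] matrix, double threshold) {
        int length = matrix[0].length;
        StringBuilder sb = new StringBuilder(length);
        for (int pos = 0; pos < length; pos++) {
            int mask = 0;
            for (int nt = 0; nt < 4; nt++) {
                if (matrix[nt][pos] >= threshold) {
                    mask |= 1 << nt;
                }
            }
            sb.append(IUPAC.charAt(mask));
        }
        return sb.toString();
    }
}
```

With a threshold of 0.25, a column split evenly between A and G yields R (purine), and a column with all four nucleotides at 0.25 yields N.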

main

public static void main(java.lang.String[] args)
The main method creates an initially empty MotifSearch for a motif of the specified length in the genomic sequences found in the specified FASTA file.

Parameters:
args - an array of Strings representing any command line arguments