Title: | Reduced Space Pair HMM |
Speaker: |
Philip Lam
|
Abstract |
Using a pair-Hidden Markov model (pair-HMM) with N states to align two
sequences of lengths L1 and L2 (DNA or protein) requires memory O
(NL1L2) and time O(N2L1L2). In practice, the sequences can be short
regions of few base-pairs long such as a protein binding motif, or
they can be long genomic fragments thousands of base-pairs long. In
the latter case, aligning two sequences might be impractical due to
the costly resource requirements. The solution is to find ways to
reduce the computational space. Different heuristic tricks have been
employed to do this for pair-HMMs, profile HMMs, and profile
stochastic context free grammars (profile SCFGs; for RNA
structural/sequence alignment). In most cases, prior knowledge of
certain features of the sequences is utilized.
Our novel method can work without having any prior knowledge of the
sequence. A process is applied to the Markov model at the beginning
which reduces our search space. Furthermore, the sum of probability of
removed state paths can be calculated.
|