Title: Reduced Space Pair HMM
Speaker: Philip Lam
Abstract Using a pair-Hidden Markov model (pair-HMM) with N states to align two sequences of lengths L1 and L2 (DNA or protein) requires memory O (NL1L2) and time O(N2L1L2). In practice, the sequences can be short regions of few base-pairs long such as a protein binding motif, or they can be long genomic fragments thousands of base-pairs long. In the latter case, aligning two sequences might be impractical due to the costly resource requirements. The solution is to find ways to reduce the computational space. Different heuristic tricks have been employed to do this for pair-HMMs, profile HMMs, and profile stochastic context free grammars (profile SCFGs; for RNA structural/sequence alignment). In most cases, prior knowledge of certain features of the sequences is utilized. Our novel method can work without having any prior knowledge of the sequence. A process is applied to the Markov model at the beginning which reduces our search space. Furthermore, the sum of probability of removed state paths can be calculated.