Hidden Markov Models (HMMs) are among the most prominent methods in bioinformatics. They are routinely applied to DNA sequences, for example for CpG-island or coding-sequence detection, and to protein sequences, for instance for the important problem of protein domain identification. Their flexible statistical foundation makes them the primary choice for many tasks. Given the exponentially increasing sequencing capacity provided by next- and third-generation sequencing approaches, there is a high demand for algorithms that learn HMMs efficiently from sequence data. SeqAn offers fast implementations of index structures, such as suffix trees and suffix arrays, which, combined with the HMM framework, can meet this demand.
This thesis has the following parts.
Part (1) should be based on Lifshits et al. (2009). The following plan covers six months of work.