You are here: ABI » ThesesHome » NewHMMLearningAlgorithms

MSc Thesis: Faster HMM Learning with Indexing Structures

Motivation

Hidden Markov Models (HMMs) are among the most prominent methods in Bioinformatics. They are routinely applied to DNA sequences, for example for CpG-island or coding sequence detection, and to protein sequences for the important problem of protein domain identification among others. The statistical flexible foundation of HMMs make it the primary choice for many tasks. Given the exponentially increasing amount of sequencing capacity by next and third generation sequencing approaches, their is a high demand for efficient HMM learning algorithms from sequence data. SeqAn offers fast implementation of indexing algorithms, like suffix trees and suffix arrays, which combined with the HMM framework, can satisfy the above mentioned needs.

Overview

This thesis has the following parts.

  1. Implementation of an HMM learning algorithm based on indexing structures.
  2. Comparison with standard learning algorithms which are implemented in SeqAn to a large extend already.

Part (1) should be based on (Lifshits et al. 2009). The following plan consists of 6 months of work.

Work Plan

  • week 1-2: Understanding the paper by Lifshits et al. 2009 and familiarization with HMMs
  • week 3-4: Exploration of the current HMM implementation in SeqAn
  • month2-3: Implementation of the new algorithm in SeqAn
  • month4 : Performance Experiments on standard application with HMMs and comparison to standard learning algorithms in SeqAn.
  • month5-6: Cleaning code, Documentation and Interpretation of results and writing the Master thesis.
This site is powered by FoswikiCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki? Send feedback