Topic
Motif finding using the STELLAR engine and SeqAn::TCoffee
The project builds on work on exact local alignment by Kehr, Weese and Reinert [1] and on segment-based multiple sequence alignment by Rausch, Emde, Reinert and colleagues [2]. STELLAR is a tool to compute exact local DNA matches without X-drops. SeqAn::TCoffee is a segment-based version of the popular T-Coffee algorithm by Notredame [3].
Motif finding is in general the task of finding a hidden sequence motif in a set of sequences.
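To make the task concrete, here is a minimal sketch of a planted-motif instance: a motif is mutated and hidden at a random position in each random background sequence. The function names (`mutate`, `planted_instance`, `hamming`), the example motif and the parameters (length 15, 4 substitutions, 20 sequences of length 600) are assumptions chosen for illustration; they are not part of SeqAn or of any particular benchmark distribution.

```python
import random

def mutate(motif, d, rng):
    """Return a copy of motif with exactly d positions substituted."""
    chars = list(motif)
    for pos in rng.sample(range(len(motif)), d):
        # Pick a base different from the current one, so the position really changes.
        chars[pos] = rng.choice([b for b in "ACGT" if b != chars[pos]])
    return "".join(chars)

def planted_instance(motif, d, n_seqs, seq_len, seed=0):
    """Hide one mutated copy of motif in each of n_seqs random DNA sequences."""
    rng = random.Random(seed)
    seqs, positions = [], []
    for _ in range(n_seqs):
        background = "".join(rng.choice("ACGT") for _ in range(seq_len))
        pos = rng.randrange(seq_len - len(motif) + 1)
        occurrence = mutate(motif, d, rng)
        seqs.append(background[:pos] + occurrence + background[pos + len(motif):])
        positions.append(pos)
    return seqs, positions

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical example motif and parameters.
motif = "ACGTACGTACGTACG"
seqs, positions = planted_instance(motif, d=4, n_seqs=20, seq_len=600)
```

A motif finder only sees `seqs`; `positions` and `motif` serve as ground truth when scoring its predictions.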
The aim of this project is to combine STELLAR and SeqAn::TCoffee into a motif finder for DNA in SeqAn. The steps are:
- Use the existing interface for motif finding in SeqAn
- Use the STELLAR algorithm to find local segment matches between the sequences
- Compute a multiple alignment of the pairwise segment matches using SeqAn::TCoffee
- Scan the alignment and find conserved motifs
- Refine the motifs using the EM algorithms in SeqAn
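The alignment-scanning step can be sketched as follows. This is an illustration under stated assumptions, not the project's method: the alignment is given as equal-length gapped strings, a column's conservation is the fraction of rows that carry its most common base, and a motif candidate is any fixed-width window in which every column reaches a threshold. The names `column_profile` and `conserved_windows` and the toy alignment are hypothetical; no SeqAn API is used.

```python
from collections import Counter

def column_profile(alignment):
    """For each column, return (most common non-gap base, fraction of rows carrying it)."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(b for b in column if b != "-")
        if counts:
            base, top = counts.most_common(1)[0]
        else:
            base, top = "-", 0  # column consisting only of gaps
        profile.append((base, top / len(alignment)))
    return profile

def conserved_windows(alignment, width, min_score):
    """Return (start, consensus) for every window whose columns all reach min_score."""
    profile = column_profile(alignment)
    hits = []
    for start in range(len(profile) - width + 1):
        window = profile[start:start + width]
        if all(score >= min_score for _, score in window):
            hits.append((start, "".join(base for base, _ in window)))
    return hits

# Toy alignment with a fully conserved block in columns 3 to 8.
alignment = [
    "ACGTTGACA-TT",
    "TCGTTGACAGTA",
    "A-GTTGACACGT",
    "GCCTTGACATTC",
]
print(conserved_windows(alignment, width=6, min_score=1.0))  # [(3, 'TTGACA')]
```

Lowering `min_score` admits windows with partially conserved columns; such candidates are the kind of starting point that a subsequent refinement step would improve.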
Timeline
- [Week 1-2] Getting acquainted with the necessary SeqAn components for motif finding, setting up a test bed (e.g. the Pevzner benchmark).
- [Week 3-6] Implementing the segment generation, refinement and alignment
- [Week 6-8] Implementing the extraction of motifs and local improvement (existing EM)
- [Week 8-10] Testing on benchmark, comparison with other programs
- [Week 10-12] Write up
References
[1] Kehr, Birte, David Weese, and Knut Reinert. 2011. “STELLAR: Fast and Exact Local Alignments.” BMC Bioinformatics 12 (Suppl 9): S15.
[2] Rausch, Tobias, Anne-Katrin Emde, David Weese, Andreas Döring, Cedric Notredame, and Knut Reinert. 2008. “Segment-Based Multiple Sequence Alignment.” Bioinformatics (Oxford, England) 24 (16) (August 15): i187–92. doi:10.1093/bioinformatics/btn281.
[3] Notredame, Cedric. 2000. “T-Coffee: a Novel Method for Fast and Accurate Multiple Sequence Alignment.” Journal of Molecular Biology 302 (1) (September 8): 205–217. doi:10.1006/jmbi.2000.4042.