Motif finding using the STELLAR engine and SeqAn::TCoffee

The project builds on work on exact local alignment by Kehr, Rausch, Emde and Reinert [1,2]. STELLAR is a tool that computes exact local DNA matches without X-drops. SeqAn::TCoffee is a segment-based version of the popular T-Coffee algorithm by Notredame et al. [3]. In general, motif finding is the task of finding a hidden sequence motif in a set of sequences.

The aim of this project is to use these algorithms to implement a motif finder for DNA in SeqAn. The steps are:

  • Use the present interface for motif finding in SeqAn
  • Use the STELLAR algorithm to find local segment matches between the sequences
  • Compute a multiple alignment of the pairwise segments using SeqAn::TCoffee
  • Scan the alignment and find conserved motifs
  • Refine the motifs using the EM algorithms in SeqAn
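To make the "scan the alignment" step more concrete, here is a minimal sketch of one possible scan. It uses only the C++ standard library, not SeqAn; the alignment is assumed to be given as equal-length rows with '-' for gaps, and the names columnConservation and conservedBlocks as well as the two thresholds are hypothetical choices for illustration, not part of SeqAn or of the planned implementation.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Fraction of rows that carry the most frequent nucleotide in one column.
// Gaps ('-') and any other non-ACGT character count against conservation.
double columnConservation(const std::vector<std::string>& rows, std::size_t col)
{
    std::array<std::size_t, 4> counts{};  // A, C, G, T
    for (const std::string& row : rows)
    {
        switch (row[col])
        {
            case 'A': ++counts[0]; break;
            case 'C': ++counts[1]; break;
            case 'G': ++counts[2]; break;
            case 'T': ++counts[3]; break;
            default: break;
        }
    }
    const std::size_t best = *std::max_element(counts.begin(), counts.end());
    return static_cast<double>(best) / static_cast<double>(rows.size());
}

// Returns half-open column ranges [begin, end) of maximal runs of columns
// whose conservation is at least minConservation and whose length is at
// least minLength. Assumes all rows have the same length.
std::vector<std::pair<std::size_t, std::size_t>>
conservedBlocks(const std::vector<std::string>& rows,
                double minConservation,
                std::size_t minLength)
{
    std::vector<std::pair<std::size_t, std::size_t>> blocks;
    if (rows.empty())
        return blocks;

    const std::size_t width = rows.front().size();
    for (const std::string& row : rows)
        assert(row.size() == width);

    std::size_t runBegin = 0;
    // col == width acts as a sentinel that closes a run ending at the last column.
    for (std::size_t col = 0; col <= width; ++col)
    {
        const bool conserved =
            col < width && columnConservation(rows, col) >= minConservation;
        if (!conserved)
        {
            if (col > runBegin && col - runBegin >= minLength)
                blocks.emplace_back(runBegin, col);
            runBegin = col + 1;
        }
    }
    return blocks;
}
```

For the rows "TTACGTAGG", "CAACGTACT" and "G-ACGTAAA", a threshold of 1.0 and a minimum length of 4 yield the single block [2, 7), i.e. the shared segment ACGTA. Blocks found this way would only be candidate motifs; they would still go through the EM refinement step.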


The proposed schedule is:

  • [Week 1-2] Getting acquainted with the necessary SeqAn components for motif finding, setting up a test bed (e.g. Pevzner benchmark).
  • [Week 3-6] Implementing the segment generation, refinement and alignment
  • [Week 6-8] Implementing the extraction of motifs and local improvement (existing EM)
  • [Week 8-10] Testing on benchmark, comparison with other programs
  • [Week 10-12] Write up
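A test bed in the spirit of the Pevzner benchmark could be generated roughly as follows. This is a sketch under stated assumptions: sequences are uniformly random DNA, one motif instance is implanted per sequence, and every instance carries exactly d substitutions. The names PlantedInstance, randomDna, mutate, makePlantedInstance and hammingDistance are hypothetical, and only the C++ standard library is used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <string>
#include <vector>

// One benchmark instance: the hidden motif, the sequences, and the
// position at which a mutated motif copy was implanted in each sequence.
struct PlantedInstance
{
    std::string motif;
    std::vector<std::string> sequences;
    std::vector<std::size_t> positions;
};

// Uniformly random DNA string of the given length.
std::string randomDna(std::size_t length, std::mt19937& rng)
{
    static const std::string alphabet = "ACGT";
    std::uniform_int_distribution<int> pick(0, 3);
    std::string s(length, 'A');
    for (char& c : s)
        c = alphabet[pick(rng)];
    return s;
}

// Copy of motif in which exactly d distinct positions are replaced
// by a different nucleotide (assumes motif contains only A, C, G, T).
std::string mutate(const std::string& motif, std::size_t d, std::mt19937& rng)
{
    assert(d <= motif.size());
    static const std::string alphabet = "ACGT";
    std::vector<std::size_t> idx(motif.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::shuffle(idx.begin(), idx.end(), rng);

    std::uniform_int_distribution<int> shift(1, 3);  // never maps a base to itself
    std::string result = motif;
    for (std::size_t i = 0; i < d; ++i)
    {
        const std::size_t p = idx[i];
        const std::size_t cur = alphabet.find(result[p]);
        result[p] = alphabet[(cur + static_cast<std::size_t>(shift(rng))) % 4];
    }
    return result;
}

// Builds numSeqs random sequences and implants one mutated motif copy in each.
PlantedInstance makePlantedInstance(std::size_t numSeqs, std::size_t seqLength,
                                    std::size_t motifLength, std::size_t d,
                                    unsigned seed)
{
    assert(motifLength <= seqLength);
    std::mt19937 rng(seed);
    PlantedInstance inst;
    inst.motif = randomDna(motifLength, rng);
    std::uniform_int_distribution<std::size_t> place(0, seqLength - motifLength);
    for (std::size_t i = 0; i < numSeqs; ++i)
    {
        std::string seq = randomDna(seqLength, rng);
        const std::size_t pos = place(rng);
        seq.replace(pos, motifLength, mutate(inst.motif, d, rng));
        inst.sequences.push_back(seq);
        inst.positions.push_back(pos);
    }
    return inst;
}

// Number of mismatching positions between two equal-length strings.
std::size_t hammingDistance(const std::string& a, const std::string& b)
{
    assert(a.size() == b.size());
    std::size_t dist = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        dist += (a[i] != b[i]) ? 1 : 0;
    return dist;
}
```

Calling makePlantedInstance(20, 600, 15, 4, seed) gives an instance with the parameters commonly used for the challenge problem (20 sequences of length 600, motif length 15, 4 substitutions). The stored positions allow a motif finder's predictions to be scored against the known implant sites.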



[1] Kehr, Birte, David Weese, and Knut Reinert. 2011. “STELLAR: Fast and Exact Local Alignments.” BMC Bioinformatics 12 (Suppl 9): S15.

[2] Rausch, Tobias, Anne-Katrin Emde, David Weese, Andreas Döring, Cedric Notredame, and Knut Reinert. 2008. “Segment-Based Multiple Sequence Alignment.” Bioinformatics (Oxford, England) 24 (16) (August 15): i187–92. doi:10.1093/bioinformatics/btn281.

[3] Notredame, Cedric, Desmond G. Higgins, and Jaap Heringa. 2000. “T-Coffee: a Novel Method for Fast and Accurate Multiple Sequence Alignment.” Journal of Molecular Biology 302 (1) (September 8): 205–217. doi:10.1006/jmbi.2000.4042.
Topic revision: r3 - 25 Jul 2012, BirteKehr