Implementation and evaluation of index-based seeding strategies in SeqAn


Substring Indices, read mapping, local alignment, q-mer indices


The goal of this thesis is to implement different seeding strategies using fixed length and variable length exact and approximate seeds and evaluate their runtime as well as specificity and sensitivity.

  • Implementation of a a benchmark by simulating local DNA sequences with various error rates from a given genomic sequence (see also [3]).
  • Implementation of q-mer and index based searches for exact and approximate seeds. Special emphasis will be on search in dislex transformed strings (see [1,2]
  • Evaluating the results in terms of run time specificity and sensitivity



  1. Kiełbasa, Szymon M, Raymond Wan, Kengo Sato, Paul Horton, and Martin C Frith. 2011. “Adaptive Seeds Tame Genomic Sequence Comparison.” Genome Research 21 (3) (March): 487–493. doi:10.1101/gr.113985.110.
  2. Horton, P, and SM Kiełbasa. 2008. “DisLex: a Transformation for Discontiguous Suffix Array Construction.” In.
  3. Kehr, B., David Weese, and Knut Reinert. 2011. “STELLAR: Fast and Exact Local Alignments.” BMC Bioinformatics 12 (Suppl 9): S15.

Topic revision: r2 - 27 Jan 2016, KnutReinert
  • Printable version of this topic (p) Printable version of this topic (p)