Implementation and evaluation of index-based seeding strategies in SeqAn


Substring indices, read mapping, local alignment, q-mer indices


The goal of this thesis is to implement different seeding strategies using fixed-length and variable-length seeds, both exact and approximate, and to evaluate their runtime, specificity, and sensitivity.

  • Implementation of a benchmark by simulating local DNA sequences with various error rates from a given genomic sequence (see also [3]).
  • Implementation of q-mer and index-based searches for exact and approximate seeds. Special emphasis will be on search in dislex-transformed strings (see [1,2]).
  • Evaluation of the results in terms of runtime, specificity, and sensitivity.
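
The three tasks above can be illustrated with a minimal sketch. It is written in Python for brevity rather than with SeqAn, and it is not the thesis implementation. It makes several assumptions that the description does not state: reads carry substitution errors only, seeds are exact fixed-length q-mers looked up in a plain hash-based index (no dislex transformation), and a read counts as found when one of its seed hits points to the read's true start position. All function names (`simulate_reads`, `build_qmer_index`, `candidate_starts`, `evaluate`) are hypothetical. Instead of a formal specificity value, the sketch reports the average number of false candidate positions per read as a simple stand-in.

```python
import random
from collections import defaultdict

ALPHABET = "ACGT"


def simulate_reads(genome, num_reads, read_len, error_rate, rng):
    """Sample substrings of `genome` and apply substitution errors only (assumption)."""
    reads = []
    for _ in range(num_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        bases = list(genome[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Replace the base with a different one, so every error is a real mismatch.
                bases[i] = rng.choice([b for b in ALPHABET if b != base])
        reads.append(("".join(bases), start))
    return reads


def build_qmer_index(text, q):
    """Map every q-mer of `text` to the sorted list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(text) - q + 1):
        index[text[pos:pos + q]].append(pos)
    return index


def candidate_starts(index, read, q):
    """Return the read start positions implied by all exact q-mer seed hits."""
    starts = set()
    for offset in range(len(read) - q + 1):
        for pos in index.get(read[offset:offset + q], ()):
            starts.add(pos - offset)
    return starts


def evaluate(genome, reads, q):
    """Return (sensitivity, average number of false candidate positions per read)."""
    index = build_qmer_index(genome, q)
    found = 0
    false_candidates = 0
    for read, true_start in reads:
        starts = candidate_starts(index, read, q)
        if true_start in starts:
            found += 1
        false_candidates += len(starts - {true_start})
    return found / len(reads), false_candidates / len(reads)


if __name__ == "__main__":
    rng = random.Random(0)
    # A random string stands in for the "given genomic sequence".
    genome = "".join(rng.choice(ALPHABET) for _ in range(5000))
    for error_rate in (0.0, 0.05, 0.10):
        reads = simulate_reads(genome, 200, 50, error_rate, rng)
        sensitivity, false_per_read = evaluate(genome, reads, q=8)
        print(error_rate, sensitivity, false_per_read)
```

Under these assumptions, a longer q reduces the number of false candidates but makes it more likely that every q-mer of an erroneous read contains a mismatch, which lowers sensitivity. This is the trade-off that the evaluation is meant to quantify.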



