Holger Schmeisky:

Approximate String Matching (Hamming Distance)

Requirements

C++ knowledge

Academic Advisor

Manuel Holtgrewe, Knut Reinert

Discipline

Bioinformatics

Degree

Bachelor of Science (B.Sc.)

Project

SeqAn

Status

finished

Approximate String Matching is an important problem in Bioinformatics applications, e.g. [Weese et al., 2009]. There are various ways of quantifying "approximate". For example Hamming or Levensthein distance are considered. In this thesis, approximate string matching with Hamming distance, sometimes also called k Mismatches Problem, is considered: Find a "short" string needle in a longer string haystack such that Hamming distance between the needle and the aligned part of the haystack is less than a given value k.

There a large body of literature on approximate string matching, see e.g. [Navarro, 2001]. However, it is still an active area of research, e.g. [Salmela et al., 2010].

The first aim of this thesis is to implement existing algorithms for the k Mismatches Problem that have shown to be competetive in recent studies. The second aim of this thesis is to perform an experimental study of the algorithms on real world inputs. The third aim of this thesis is a clear presentation and evaluation of the study's results.

Department of Mathematics and Computer Science

Algorithmic Bioinformatics (ABI)

Holger Schmeisky:

Approximate String Matching (Hamming Distance)

Requirements

Contents