Comparing short-read realignment algorithms.
Read alignment is a crucial step in the analysis of next-generation sequencing (NGS) data. Many downstream analyses, for example SNP or small indel calling, use the alignment information as their input. Indels, whether they are sequencing errors in the reads or differences between the donor and the reference, cause problems here:
The read alignment program (read mapper) computes a pairwise alignment of each read to the reference, independently of the other reads. Many of these pairwise alignments then have to be combined into one multi-read alignment (MSA) against the reference, and reads covering the same indel may place it at different positions. Realignment tools usually first create such an MSA heuristically from the pairwise alignments and then refine it [1,2,3].
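To illustrate the problem, the following minimal sketch projects two pairwise alignments onto reference coordinates and reports the columns where gaps and bases are mixed. The reference, the reads, and the helper names gapped_row and mixed_gap_columns are made up for illustration; only the CIGAR operations M and D are handled, and this is not the method of reAligner or SRMA.

```python
import re

def gapped_row(ref_len, pos, cigar, seq):
    """Project a read onto reference coordinates (CIGAR ops M and D only)."""
    row = ["."] * ref_len          # '.' marks reference positions the read does not cover
    r, q = pos, 0                  # r: reference position, q: position in the read
    for length, op in re.findall(r"(\d+)([MD])", cigar):
        n = int(length)
        if op == "M":              # match/mismatch: copy read bases
            row[r:r + n] = seq[q:q + n]
            q += n
        else:                      # deletion w.r.t. the reference: gap characters
            row[r:r + n] = "-" * n
        r += n
    return "".join(row)

def mixed_gap_columns(rows):
    """Return columns in which some covering reads have a gap and others a base."""
    cols = []
    for i, column in enumerate(zip(*rows)):
        covered = [c for c in column if c != "."]
        if "-" in covered and any(c != "-" for c in covered):
            cols.append(i)
    return cols

reference = "ACGTTTTGCA"           # hypothetical reference with a run of four Ts
read = "ACGTTTGCA"                 # donor read with one T deleted

# Two equally good pairwise alignments of the same read sequence:
rows = [
    gapped_row(len(reference), 0, "3M1D6M", read),   # gap at the start of the T run
    gapped_row(len(reference), 0, "6M1D3M", read),   # gap at the end of the T run
]

print(reference)
for row in rows:
    print(row)
print(mixed_gap_columns(rows))     # [3, 6]
```

Both rows describe the same one-base deletion, yet a column-wise variant caller sees two partially supported deletions at columns 3 and 6. A realignment step aims to make such placements consistent across reads.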
The SeqAn library already provides an implementation of the Anson-Myers algorithm (reAligner) for realignment. The task of this project is to implement the algorithm by Homer and Nelson (SRMA) and to compare the results of SRMA and reAligner.
For this, a careful implementation of SRMA is required. Furthermore, existing simulators should be used to generate data sets with SNPs and indels, and these variants should then be called using existing tools, e.g. [4] or GATK.
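One possible way to score the called variants against the simulated ones is precision and recall over exact matches. The sketch below assumes that variants are represented as (chromosome, position, ref, alt) tuples; this representation, the example values, and the function name precision_recall are illustrative assumptions, not the output format of any particular simulator or caller.

```python
def precision_recall(truth, called):
    """Compare called variants against simulated ones by exact tuple match."""
    truth, called = set(truth), set(called)
    true_positives = len(truth & called)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical simulated variants: one SNP, one deletion, one insertion.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "CT", "C"),
    ("chr1", 400, "G", "GA"),
}

# Hypothetical call set in which the deletion was placed two bases to the right.
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 252, "CT", "C"),
    ("chr1", 400, "G", "GA"),
}

precision, recall = precision_recall(truth, called)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Exact matching is deliberately strict: an indel reported at a shifted but equivalent position counts as both a false positive and a false negative, so indel positions may need to be normalized before the comparison.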
The task is thus twofold: