
BScReadRealignment

Comparing short-read realignment algorithms.

Background

Read alignment is a crucial step in the analysis of next-generation sequencing (NGS) data. Many downstream analyses, for example SNP or small indel calling, use the alignment information as their input. Indels (indel sequencing errors in the reads, or indels in the donor with respect to the reference) cause problems here:

The read alignment program (also known as a read mapper) computes pairwise alignments of the individual reads to the reference. Because each read is aligned independently, the same indel can be placed at different positions in different reads. For variant calling, many of these pairwise alignments have to be combined into one multi-read alignment (MSA) against the reference. Realignment tools usually first create such an MSA heuristically from the pairwise alignments and then refine it [1,2,3].
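To make the refinement goal concrete, here is a minimal Python sketch. It is an illustration only, neither ReAligner nor SRMA. It assumes the reads are already given as gapped rows of equal length covering the same window, and the helper names `column_consensus` and `disagreement` are hypothetical. The example shows how inconsistent gap placement in a homopolymer run inflates the disagreement with the column-wise consensus, which is roughly the kind of quantity a realignment step tries to reduce.

```python
from collections import Counter


def column_consensus(rows):
    """Return the majority character of every MSA column.

    Assumption: all rows have the same length; gaps ('-') count as characters.
    Ties are broken deterministically (highest count, then smallest character).
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same length")
    consensus = []
    for column in zip(*rows):
        counts = Counter(column)
        consensus.append(min(counts, key=lambda c: (-counts[c], c)))
    return "".join(consensus)


def disagreement(rows):
    """Count the characters that differ from the column consensus."""
    consensus = column_consensus(rows)
    return sum(a != b for row in rows for a, b in zip(row, consensus))


# Reference window: ACGTTTTGA. The donor lacks one T, so every read spells
# ACGTTTGA, but independent pairwise alignment may place the gap at either
# end of the T run.
before = ["ACG-TTTGA", "ACGTTT-GA", "ACG-TTTGA"]
# After realignment all reads agree on one gap position.
after = ["ACG-TTTGA", "ACG-TTTGA", "ACG-TTTGA"]

print(column_consensus(before))  # ACG-TTTGA
print(disagreement(before))      # 2
print(disagreement(after))       # 0
```

In the `before` rows a variant caller looking at single columns sees conflicting evidence at both ends of the T run, although all three reads support the same one-base deletion.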

Topic

The SeqAn library already provides an implementation of the Anson-Myers algorithm (reAligner) for realignment. The task of this thesis is to implement the algorithm by Homer and Nelson (SRMA) and to compare the results of SRMA and reAligner.

For this, a careful implementation of SRMA is required. Furthermore, existing simulators should be used to simulate data sets with SNPs and indels, and these variants should then be called using existing tools, e.g. SAMtools [4] or GATK.

The task is thus twofold:

  1. Provide a robust implementation of SRMA as a SeqAn library module, including tests and documentation.
  2. Test the SRMA implementation against the implementation of reAligner using synthetic and ideally also real-world data sets.
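The comparison on simulated data could be scored along the following lines. This is a minimal sketch under stated assumptions: variants are represented as hypothetical `(chrom, pos, ref, alt)` tuples, a call counts as correct only if it matches a simulated variant exactly, and `evaluate_calls` is an illustrative helper, not part of SeqAn, SAMtools or GATK. The example data are made up for illustration.

```python
def evaluate_calls(truth, calls):
    """Compare called variants with the simulated truth set.

    Assumption: both inputs are iterables of (chrom, pos, ref, alt) tuples
    in the same normalized representation, so exact matching is meaningful.
    """
    truth, calls = set(truth), set(calls)
    tp = len(truth & calls)   # simulated variants that were called
    fp = len(calls - truth)   # calls without a simulated variant
    fn = len(truth - calls)   # simulated variants that were missed
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Made-up example: one SNP, one deletion and one insertion were simulated.
simulated = [
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "CT", "C"),
    ("chr1", 400, "G", "GA"),
]
# Hypothetical calls obtained after running one of the realigners.
called = [
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "CT", "C"),
    ("chr1", 900, "T", "C"),
]

result = evaluate_calls(simulated, called)
print(result["tp"], result["fp"], result["fn"])  # 2 1 1
```

Running the same evaluation once per realigner on identical simulated reads gives directly comparable numbers. Exact matching is a simplification: equivalent indels in repeats can be written at different positions, so both sets would need to be normalized (for example left-aligned) before comparison.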

References

  • [1] Anson EL, Myers EW. ReAligner: a program for refining DNA sequence multi-alignments. J Comput Biol. 1997 Fall;4(3):369-83.
  • [2] Homer N, Nelson SF. Improved variant discovery through local re-alignment of short-read next-generation sequencing data using SRMA. Genome Biol. 2010;11(10):R99. doi: 10.1186/gb-2010-11-10-r99
  • [3] GATK Local Realigner.
  • [4] Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078-9.