You are here: ABI » ThesesHome » ThesisMCRazer

ThesisMCRazer

Check, whether abstract, description, and literature are in CMS

Parallelization of RazerS

Student
Martin Riese
Academic Advisor
David Weese, Prof. Dr. Knut Reinert

Abstract

In this work the existing implementation of the read mapping tool RazerS should be parallelized and improved in the handling of repeats.

Description

RazerS [1] is a fast read mapping tool with sensitivity control, i.e. it provides a seamless tradeoff between sensitivity and running time and guarantees not to lose more reads than specified. It is based on a specialization of the SWIFT [2] algorithm and uses a k-mer index of the reads to search for common k-mer in parallelograms. In this work 3 possible ways of parallelization should be examined:
  • The scanning of genomic regions could be distributed
  • The set of reads and there indices could be distributed
  • The verification processes could be distributed
  • Therefore different multi-core frameworks should be examined, e.g. Threads, Mutexes, … in SeqAn; OpenMP; Intel Threading Building Blocks [4]

Another goal of this work is to improve the handling of repeats. In contrast to non-repeat regions, low-complexity repeats and reads of these repeats cause a huge number of counting events. This number is linear in the length of repeat region and the number of reads covering this region. Different strategies should be developed to overcome this bottleneck, e.g. using repeat annotations / detection, compressing repeats, generating matches from a representative repeat substring. The verification process of RazerS should be extended by a quality based DP alignment and a banded Myers bitvector verification.

Literature

[1] Weese, D., Emde, A.-K., Rausch, T., Döring, A., and Reinert, K. (2009). Razers–fast read mapping with sensitivity control. Genome Res, 19(9):1646–54. [2] Rasmussen K., Stoye J., Myers G. (2006). Efficient q-gram filters for finding all ε-matches over a given length. J Comput Biol 13:296–308. [3] Morgulis A., Gertz E.M., Schäffer A.A., Agarwala R. (2006) A fast and symmetric DUST implementation to mask low-complexity DNA sequences. J Comput Biol. 2006 Jun;13(5):1028-40. [4] http://www.threadingbuildingblocks.org

Work Plan

Work plan

Weekly reports

Weekly reports
This site is powered by FoswikiCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki? Send feedback