Algorithmic Bioinformatics

Kathrin Trappe

Multi-Split-Mapping of NGS reads for variant detection

Academic Advisors: Tobias Rausch, Anne-Katrin Emde, Knut Reinert
Discipline: Bioinformatics
Degree: Master of Science (M.Sc.)
Date: Mar 06, 2012
Status: finished


Read mapping is a fundamental task in DNA sequence analysis. Current read mapping tools are fast and precise, but they usually fail to map reads that cross breakpoints of structural variants (SVs) or, in the case of RNA sequencing, exon-exon junctions. Such breakpoints cause one or more splits in the read-to-reference alignment, so that parts of the read map to different locations on the reference sequence. Identifying and classifying SVs is important for evaluating their functional impact but remains challenging. So far, there is no SV detection method that determines all types of SVs at single-nucleotide resolution while being independent of the sequencing platform (such as Illumina or 454) and of the read type (paired-end or single-end).

We designed and implemented a sound, generic multi-split chaining method with the C++ library SeqAn. The method uses SeqAn’s exact local aligner Stellar to find the local matches of a read and thereby detect its splits. Compatible local matches of a read are then identified, and all compatibility information is stored in a split-read graph over the matches. We then use a DAG shortest-path algorithm to determine the most probable chain of splits and report the underlying breakpoints.
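
The chaining step can be illustrated with a minimal Python sketch. It is not the thesis implementation: it does not call SeqAn or Stellar, and the `Match` fields, the overlap and gap tolerances, and the cost model (unaligned read bases plus a fixed `split_penalty` per split) are assumptions chosen for illustration. Strand information, which would be needed for inversions, is omitted. Matches sorted by their start position in the read form a topological order of the split-read graph, so a single relaxation pass yields the shortest path.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Match:
    """One local match of a read (half-open intervals; hypothetical layout)."""
    read_begin: int
    read_end: int
    chrom: str
    ref_begin: int
    ref_end: int
    errors: int = 0


def compatible(a: Match, b: Match, max_overlap: int = 5, max_gap: int = 10) -> bool:
    """True if b may follow a in a chain: b starts and ends later in the read,
    and the read overlap or gap between them stays within the tolerances."""
    if b.read_begin <= a.read_begin or b.read_end <= a.read_end:
        return False
    distance = b.read_begin - a.read_end  # > 0: gap, < 0: overlap
    return -max_overlap <= distance <= max_gap


def best_chain(matches: List[Match], read_len: int,
               split_penalty: int = 3) -> Tuple[List[Match], int]:
    """Shortest source-to-sink path through the DAG of compatible matches."""
    # Edges only lead to matches with a larger read_begin, so this sort
    # is a topological order of the graph.
    order = sorted(matches, key=lambda m: (m.read_begin, m.read_end))
    if not order:
        return [], read_len  # nothing aligned: whole read is unexplained

    # Source edge: pay for the unaligned read prefix and the match errors.
    cost = [m.read_begin + m.errors for m in order]
    pred: List[Optional[int]] = [None] * len(order)

    for j, b in enumerate(order):
        for i in range(j):
            a = order[i]
            if not compatible(a, b):
                continue
            # Assumed edge cost: one split, plus overlapping or skipped read bases.
            candidate = cost[i] + split_penalty + abs(b.read_begin - a.read_end) + b.errors
            if candidate < cost[j]:
                cost[j], pred[j] = candidate, i

    # Sink edge: pay for the unaligned read suffix.
    end = min(range(len(order)), key=lambda i: cost[i] + read_len - order[i].read_end)
    total = cost[end] + read_len - order[end].read_end

    chain: List[Match] = []
    node: Optional[int] = end
    while node is not None:
        chain.append(order[node])
        node = pred[node]
    chain.reverse()
    return chain, total


def breakpoints(chain: List[Match]) -> List[Tuple[Tuple[str, int], Tuple[str, int]]]:
    """Reference positions on both sides of every split in the chain."""
    return [((a.chrom, a.ref_end), (b.chrom, b.ref_begin))
            for a, b in zip(chain, chain[1:])]
```

For a 100-base read whose two halves match 450 bases apart on the same chromosome, `best_chain` would select both halves over a competing partial match, and `breakpoints` would return the single pair of reference positions flanking the split. Lower cost stands in for "more probable" here; a real scoring scheme would weigh errors, gaps, and split types differently.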

Our approach is more versatile than existing split-read methods. It allows for multiple splits at arbitrary locations in the read and can detect inversions, inter- and intra-chromosomal translocations, duplications, insertions, and deletions. It is also independent of the read length. We successfully applied our method to simulated Illumina read data and to 454 RNA-Seq data, obtaining robust results that are competitive with those of the tool SVDetect and of the Illumina and 454 analysis software.


