Read mapping is a fundamental task in DNA sequence analysis. Current read mapping tools are fast and precise but usually fail to map reads that span breakpoints of structural variants (SVs) or, in RNA sequencing, exon-exon junctions. Such breakpoints split the read-to-reference alignment once or several times, so that different parts of the read map to different locations on the reference sequence. Identifying and classifying SVs is important for evaluating their functional impact but remains challenging. So far, no sophisticated SV detection method can determine all types of SVs at single-nucleotide resolution while remaining independent of the sequencing platform (e.g., Illumina or 454) and of whether the reads are paired-end or single-end.
We designed and implemented a sound, generic multi-split chaining method using the C++ library SeqAn. It uses SeqAn's exact local aligner Stellar to detect the splits of a read. Compatible local matches of the read are then identified, and all compatibility information is stored in a split-read graph over these matches. Because this graph is a directed acyclic graph (DAG), a DAG shortest-path algorithm determines the most probable chain of splits, whose underlying breakpoints are then reported.
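The chaining step can be illustrated with a minimal C++ sketch of a shortest path through a split-read graph. It is not the SeqAn or Stellar implementation: the `Match` struct, the costs `kBreakpointPenalty` and `kMaxOverlap`, and the compatibility rule are hypothetical choices made for illustration. Matches sorted by their start on the read form a topological order, so a single dynamic-programming pass over them computes the shortest path from a virtual source (read start) to a virtual sink (read end).

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical local match of a read segment against the reference
// (a stand-in for local aligner output, not a SeqAn type).
struct Match {
    int readBegin, readEnd;  // half-open interval on the read
    int refId;               // chromosome or contig index
    long refBegin, refEnd;   // half-open interval on the reference
    bool forward;            // strand of the match
    int errors;              // edit distance of the local alignment
};

// Illustrative costs; the real scoring scheme is an assumption here.
constexpr long kBreakpointPenalty = 5;  // cost of introducing one split
constexpr int kMaxOverlap = 5;          // tolerated overlap of adjacent matches on the read

// Shortest path through the split-read graph: nodes are matches plus a virtual
// source (read start) and sink (read end); edges join read-compatible matches.
std::vector<Match> bestChain(std::vector<Match> matches, int readLength) {
    assert(readLength >= 0);
    // Sorting by read start yields a topological order: every edge goes from a
    // match to one that starts strictly later on the read.
    std::sort(matches.begin(), matches.end(),
              [](const Match& a, const Match& b) { return a.readBegin < b.readBegin; });
    const int n = static_cast<int>(matches.size());
    std::vector<long> dist(n);
    std::vector<int> pred(n, -1);  // -1 means "reached directly from the source"
    for (int j = 0; j < n; ++j) {
        const Match& b = matches[j];
        // Source edge: unaligned read prefix plus this match's errors.
        dist[j] = b.readBegin + b.errors;
        for (int i = 0; i < j; ++i) {
            const Match& a = matches[i];
            bool compatible = b.readBegin > a.readBegin && b.readEnd > a.readEnd &&
                              b.readBegin >= a.readEnd - kMaxOverlap;
            if (!compatible) continue;
            long gap = std::max(0, b.readBegin - a.readEnd);  // unaligned bases between splits
            long cost = dist[i] + kBreakpointPenalty + gap + b.errors;
            if (cost < dist[j]) { dist[j] = cost; pred[j] = i; }
        }
    }
    // Sink edges: charge the unaligned read suffix and pick the cheapest end.
    int best = -1;
    long bestCost = std::numeric_limits<long>::max();
    for (int i = 0; i < n; ++i) {
        long cost = dist[i] + (readLength - matches[i].readEnd);
        if (cost < bestCost) { bestCost = cost; best = i; }
    }
    // Backtrack from the best last match to recover the chain in read order.
    std::vector<Match> chain;
    for (int v = best; v != -1; v = pred[v]) chain.push_back(matches[v]);
    std::reverse(chain.begin(), chain.end());
    return chain;
}
```

For example, for a 100 bp read with matches on read intervals [0, 50) and [50, 100) at distant reference positions, this sketch reports the two-part chain rather than a single full-length match with many errors, as long as the split penalty is lower than the difference in errors.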
Our approach is more versatile than existing split-read methods. It allows multiple splits at arbitrary positions in the read and detects inversions, inter- and intra-chromosomal translocations, duplications, insertions, and deletions. At the same time, it is independent of the read length. We successfully applied our method to simulated Illumina reads and to 454 RNA-Seq data, obtaining robust results that are competitive with those of SVDetect and of the Illumina and 454 analysis software.
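To illustrate how the SV types listed above might be distinguished at a single breakpoint, the following C++ sketch classifies the junction between two consecutive matches of a chain using their chromosomes, strands, and read and reference gaps. The rules, the threshold `kMinSvSize`, and the types `Match` (a reduced, hypothetical match record) and `SvType` are simplified assumptions, not the method's actual classification. In particular, telling a deletion apart from an intra-chromosomal translocation generally needs more context than one pair of matches.

```cpp
#include <cassert>

// Hypothetical, reduced match record: read interval, reference interval, strand.
struct Match {
    int readBegin, readEnd;  // half-open interval on the read
    int refId;               // chromosome or contig index
    long refBegin, refEnd;   // half-open interval on the reference
    bool forward;            // strand of the match
};

enum class SvType { None, Deletion, Insertion, Inversion, TandemDuplication,
                    IntraTranslocation, InterTranslocation };

constexpr long kMinSvSize = 10;  // assumed minimum event size, for illustration only

// Classify the breakpoint between match a and the next match b on the read.
SvType classify(const Match& a, const Match& b) {
    assert(a.readBegin <= b.readBegin);
    if (a.refId != b.refId) return SvType::InterTranslocation;  // different chromosomes
    if (a.forward != b.forward) return SvType::Inversion;       // strand switch
    long readGap = b.readBegin - a.readEnd;
    // Distance travelled on the reference in the direction the read is aligned.
    long refGap = a.forward ? b.refBegin - a.refEnd : a.refBegin - b.refEnd;
    if (refGap <= -kMinSvSize) {
        // Backward jump: landing inside the previous match suggests a tandem duplication.
        bool landsInA = a.forward ? b.refBegin >= a.refBegin : b.refEnd <= a.refEnd;
        return landsInA ? SvType::TandemDuplication : SvType::IntraTranslocation;
    }
    if (refGap - readGap >= kMinSvSize) return SvType::Deletion;   // reference bases skipped
    if (readGap - refGap >= kMinSvSize) return SvType::Insertion;  // extra read bases
    return SvType::None;
}
```

Classifying each adjacent pair of a reported chain in turn yields one label per breakpoint, so a read with several splits can carry several events.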