You are here: ABI » ThesesHome » SIMDDpAlgo

SIMDDpAlgo

SIMD extension of the standard DP algorithms in SeqAn

Introduction

Newer processors are shipped with 16 registers having an extended width of 128 bit (SSE) or 256 bit (AVX) respectively. Those micro processors are supplied with an extended instruction set to perform SIMD (Single Instruction Multiple Data) operations on those registers. Many scientific applications benefit from those by that some computations can be performed in parallel on the level of registers.

One of the most fundamental operations in bioinformatics is the computation of pairwise sequence alignments. Here, dynamic programming is used to compute a matrix to determine the optimal alignment between two sequences by means of the given scoring and gap scheme. Alignments are used in many applications. During Read Mapping for example several million alignments between short DNA segments and a reference sequence are computed and stored in form of a CIGAR string in SAM file. Those applications would greatly benefit from high-parallelized computation of such alignments. One approach of course is to use a multi-threaded approach to distribute the work to multiple cores. However, all of those cores also support vector extensions and those can even increase the load of each process by running multiple alignments at the same time on one core.

The alignment algorithms in SeqAn have been unified using a highly modularised structure such that the standard DP algorithms can be easily extended by new features with minor effort. In this work the DP-Algorithms implemented in SeqAn have to be generically extended to support inter-SIMD parallelisation using the vector extension AVX.

Specifications

Global Function Interface

The global alignment interfaces need to be adapted to support pairwise alignments of set of strings instead of two single sequences. The user should be warned by an assertion if the chosen score value type is to small for the number of sequences used, such that a number overflow is likely to occur.

Internal Function Interface

Internally the score functions must be adapted to also support instructions with the extended SIMD vectors. A generic component has to be identified and interfaced to support both vectorised and scalar values.

Due to the modularised structure of the alignments many different configurations are possible. The SIMD interface should cover the same functionality as for the scalar version. This includes: banded and unbanded alignments, lokal, global and semi-global alignments and different traceback configurations. The components necessary to accomplish this need to be identified and adapted in a generic way.

The Traceback should be serialised in the end, as it takes only a minimal fraction of the time compared to the computation of the whole DP matrix.

Iterator Interface

There is an outer and an inner for-loop iterating through the columns and rows for all the DP algorithms respectively. To allow a generic use of the iterators a data structure with an iterator interface must be devised, such that the given sequences are transformed from an horizontal to an vertical layout, while the same iterator interface as used for the scalar version can be reused.

Evaluation

The SIMD alignment version should be tested thoroughly. In the end the performance should be evaluated and compared with the serial version as well as with other implementations like the stripped Smith-Waterman [1] which is implemented in the SSEARCH of the fasta package [2].

References

[1] Farrar, Michael. "Striped Smith–Waterman speeds database searches six times over other SIMD implementations." Bioinformatics 23.2 (2007): 156-161.

[2] http://fasta.bioch.virginia.edu/fasta_www2/fasta_intro.shtml

 
This site is powered by FoswikiCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki? Send feedback