You are here: Wiki>ABI Web>ThesesHome>ThesisGPGPUSeqan (22 Mar 2011, llangner)Edit

Page ThesisGPGPUSeqan

Myers Bit-Vector Algorithm on GPU for Seqan

Massiv parallelization of Myers Fast Bit-Vector Algorithm for Approximate String Matchinsg using GPGPU technics (NVidia CUDA) in Seqan

Lars Langner

Academic Advisor
Prof. Dr. Knut Reinert, David Weese


This work will be about the possibilty to use GPGPU methodes in Seqan (especially with NVidia CUDA) and the implementation and parallelization of Myers fast bitvector algorithm for approximate string matchings.
It is stated that using massive multiprocessor programming can dramatically increases the speed and performance of algorithms, depending on the amount that could be parallized and the efficiency it is implemented.
Using hundreds or thousands of parallel threads for read mapping/localizing/matching to a genom with the fast GPU cores and Myers fast matching algorithm can give an great gain of speed for sequence analyse and string matchings.

Serveral issues has to be worked out:
  • denote what preconditions are needed for using and compiling GPGPU/NVidia CUDA in Seqan
  • if and how generic programming or meta-programming can be used within the algorithm on GPU
  • generic implementation with any desired bitvector-length in an efficient manner for Seqan
  • generic implementation of Myers bitvector algorithm on CPU and GPU with different strategies (global vs. thread-local memorys)
  • extension to the banded Myers bitvector algorithm (Ukkonen Trick)
  • usage of different distance measures (Levenshtein and Damerau Edit Distances)
  • efficently distribution/scheduling of the tasks onto the GPU processors/threads
  • testing, performance and usage analyse for the algorithm and comparing to the CPU variant

This will require an efficient adopting of the problem and problem-size for partitioning the parallel work most effective on GPU with different hardware settings.


[1] Myers, G. (1999). A Fast Bit-Vector Algorithm for Approximate String Matching Based on Dynamic Programming, Journal of the ACM, 46(3): 495-415
[2] Hyyrö, H. (2001). A Bit-Vector Algorithm for Computing Levenshtein and Damerau Edit Distances
[3] Gröpl, C., Klau, G. and Reinert, K. (2009). Bit-parallel string matching
[4] Bailey, M., Cunningham, S. (2009). Graphics Shaders - Theory and Practice
[5] Kirk, David B., Hwu, Wen.mei W. (2010). Programming Massively Parallel Processors - A Hands-on Approach

Work Plan

This diploma thesis has a duration of 6 month from 01.10.2010 until 01.04.2011.

Month 1 - Integrate and Implement the algorithm for GPU and CPU

Month 2 - Implement different variants, strategies for scheduling and find usage capabilitys

Month 3 - Integrate into Seqan, Real-World data and performance tests

Month 4 - Enhance implementations and beginn writing diploma thesis

Month 5 & 6 - Working and writing out the diploma thesis

Weekly Report

Week 0 - Literature Study and Researches
Week 1 (4.10) - Seqan Retreat 2010 / Getting into Seqan
Week 2 (11.10) - Generic Bitvectors & Myers-Algorithm Implementation
Week 3 (18.10) - Generic CUDA Implementation
Week 4 (25.10) - Solving Memory & Threading issues
Week 5 (01.11) - Improved memory handling, matching detection
Week 6 (08.11) - Implem. other distance measures; first performance measurements
Week 7 (15.11) - Implem. CPU methode like the CUDA methode ; Unrolling loops, force inlining, performance issues
Week 8 (22.11) - begin OpenCL implementation, Seqan Interface Adaptions ; Code fixings ; double-buffer logic
Week 9 (29.11) - unify code for independent CUDA,OpenCL or CPU usage ; threading & memory issues ; trible-buffering logic
Week 10(06.12) - scheduling, computation and performance test & improvements, many threads test (up to 1 million)
Week 13(13.12) - implementing Myers Banded Variantions, better asyncronisation, commenting code
Week 14(20.12) - dddoc documentations, Seqan integration
Week 15(27.12) - ddoc- documentations, Christmas holiday
Week 16(03.01) - Seqan Finder vs. GPU Test, Diploma Thesis writing
Week 17(10.01) - Buffering,Stability & Performance Improvments, Test Suite Improvments, OpenCL run
Week 18(17.01) - OpenCL version for small bitvectors ( <=64 )
Week 19(24.01) - Writing Diploma Thesis
Week 20(31.01) - Writing Diploma Thesis
Week 21(07.02) - Writing Diploma Thesis
Week 22(14.02) - Adding Hamming Distance, Writing Diploma Thesis
Week 23(21.02) - Gathering performance results, Writing Diploma Thesis
Week 24(28.02) - Gathering performance results, Writing Diploma Thesis
Week 25(07.03) - Writing Diploma Thesis
Week 26(14.03) - Diploma Thesis Review
Week 27(21.03) - Diploma Thesis Review
Week 28(28.03) - Finishing and release


Topic revision: r28 - 22 Mar 2011, llangner
  • Printable version of this topic (p) Printable version of this topic (p)