You are here: ABI » ThesesHome » ThesisParallelismInSeqan

Parallelism In The SeqAn Library

The purpose of this thesis is to allow "easy parallelism" in the SeqAn library. This will consist of identifying parts of the library where parallelization can be applied effectively and actually engineering (designing, implementing and evaluating) parts of the library to use parallelism. Common parallelization patterns should be extracted into a generalized framework/library that can then expose high-level parallelism to the users of SeqAn.

The parallel execution of the string search algorithms in the module "find" as well as the algorithms for constructing indices are obvious candidate for parallelization. The parallel execution of string matching algorithms is a good starting point. Additionally, it allows the extension of the task to many interesting research questions, e.g. how to collect the results, how to process them etc.

Module "find" Cleanup

The module "find" is one of the oldest modules in SeqAn and a lot of "technical debt" has accumulated. We are currently working on a new draft of this module and a naive implementation of the various search types. For the application of the implementation, it will probably be easier to work with the new interface and adjust the old -- working -- code to the new specification.

To do this for the competetive algorithms (e.g. Myers Bit Vector Algorithm) is part of this thesis work.

There is lots of related practical and theoretical work. The implementation resulting from this thesis should find a "sweet spot" between reimplementing existing implementations and using them. The decision finding process is to be documented. Competetive previous implementation of algorithms are to be taken into consideration in an exhaustive experimental study.

Related work includes [TBB], [Singler, 2008], [Singler, 2007], [Naeher, 2008], [Tichy, 2009], [Tichy, 2008] and [Massingil et al., 2000]. The design of experiments and presentation of data could be guided by [Sanders, 2002].

Tentative Schedule

We propose the following tentative schedule.

  • Review of existing literature and other implementations (2-4 weeks).
  • Cleanup of module find (2-3 weeks).
  • Various parallel find implementations (2-3 weeks).
  • At the same time, thought should go into how to generalize the code into a generalized parallel framework (2 weeks).
  • Exploration of parallelization ideas for more parts of SeqAn and implementation (4-6 weeks).
  • Experimental study, comparison with other implementations (2-3 weeks).
  • Thesis write-up (4 weeks).

Tentative total: 25 weeks. There will be regular meetings with your supervisor.

We stress that one month should be allocated for the thesis write-up to ensure enough time for a high-quality write-up. However, with all other schedule items: If the work progresses more quickly than anticipated, time can be rescheduled.

The wiki pages ThesisParallelismInSeqanWorkSchedule and ThesisParallelismInSeqanWeeklyReports should track the updated actual schedule for the next weeks and weekly reports.

Requirements

  • The work should follow good software and engineering practice, e.g. There should be distinct planning and implementation phases, where possible.
  • The code should follow the SeqAn style guide.
  • The code should be well-tested and documented.
  • The thesis should show and document the design decisions.
  • The experimental evaluation should incorporate the main competitors (e.g. MCSTL, TBB).
  • The work progress is to be tracked in the AGABI wiki.

Literature

The following can serve as a starting point for your own research:

  • [TBB] Intel Threading Building Blocks http://osstbb.intel.com.
  • [Singler, 2008] Singler, J. and Konsik, B. The GNU libstdc++ parallel mode: software engineering considerations. Proceedings of the 1st international workshop on Multicore software engineering, 2008. p.15-22.
  • [Singler, 2007] Singler, J. and Sanders, P. and Putze, F. MCSTL: The multi-core standard template library. Euro-Par 2007 Parallel Processing. p682-694.
  • [Naeher, 2008] A Framework for Multi-Core Implementations of Divide and Conquer Algorithms and its Application to the Convex Hull Problem. CCCG 2008, Montreal, Quebec, August 13–15, 2008.
  • [Tichy, 2009] Tichy, W. and Pankratius, Victor. Lecture Slide: Softwareentwicklung für moderne, parallele Plattformen. SS2009.
  • [Tichy, 2008] Tichy W. et al. Lecture Slides: Multikernrechner und Rechnerbuendel.
  • [Sanders, 2002] P. Sanders. Presenting Data from Experiments in Algorithmics. In Experimental Algorithmics -- From Algorithm Design to Robust and Efficient Software, volume 2547 of LNCS, pages 181-196. Springer, 2002.
  • [Massingil et al., 2000] B. L. Massingill, T. G. Mattson, and B. A. Sanders. Patterns for finding concurrency for parallel application programs. In 7th. Pattern Languages of Programs Conference (PLOP 2000), 2000.
This site is powered by FoswikiCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki? Send feedback