Parallelism In The SeqAn Library: Patterns for Parallelization

A brief summary of patterns for parallelization, based on [Massingill et al. 2000].

Finding Concurrency

Decomposition

Task decomposition

Identify functional units that can be computed independently/in parallel.

Tasks may map to function calls or independent loop iterations.

Data decomposition

Identify physically contiguous blocks of data to operate on, partition the data appropriately, and assign a task to each partition.

For example, decompose loops into independent tasks for each iteration, or partition loop/data into chunks and assign one task to each chunk.

Dependency analysis

Task grouping

Aggregate tasks to form task groups, thus coarsening the granularity.

“Tasks that correspond to a high-level operation naturally group together.”

Merge tasks that share the same constraints (shared data …).

Task ordering

Identify ordering constraints between task groups:

  • Tasks that need to execute concurrently.
  • Tasks that need to execute sequentially because the second depends on the results of the first.
  • Absence of any constraints between sets of tasks.

Data sharing

Identify which data is

  • Task local
  • Shared, but read-only
  • Effectively task local (i.e. partitioned into subsets corresponding to tasks)
  • Read/write
    • Accumulated
    • Scattered

Identify border regions where data sets of different tasks overlap.

Evaluation

Suitability for the target platform

  • How many PEs (processing elements) are available?
  • How are data structures shared among the PEs?
  • What does the target architecture imply about the number of UEs (units of execution) and how data structures are shared among them?
  • Will the time spent doing useful work in a task be significantly greater than the time taken to deal with dependencies?

Design quality

  • Flexibility
  • Efficiency
  • Simplicity

Preparation for the next phase

  • How regular are the tasks and their data dependencies?
  • Are interactions between tasks (groups) synchronous or asynchronous?
  • Are the tasks grouped in the best way?

Algorithm Structure

Organized by …

… tasks

  • Task parallelism
  • Divide and conquer

… data decomposition

  • Geometric decomposition
  • Recursive decomposition
  • Domain decomposition

… flow of data

  • Pipeline – Each stage should take approximately the same time. Successive stages execute in parallel, communicating via a queue. Stateless stages can be replicated to increase parallelism.
  • Event-based coordination

Supporting Structures

Program structures

  • Master/worker
    • static: each worker is assigned exactly one task, processes it, then returns
    • dynamic: workers stand by until they receive work via an input queue, process it, push the result into an output queue, then idle again.
  • Producer/consumer
    • Fixed-size queue
    • Consumers wait while queue empty; producers wait while queue full
    • Requires synchronized access to the queue (e.g. a mutex with condition variables as guards)
  • Loop parallelism
  • Fork/join
  • Leader/follower
  • Work stealing
    • Asynchronous round-robin
    • Global round-robin
    • Random polling

Literature

  • [Massingill et al., 2000] B. L. Massingill, T. G. Mattson, and B. A. Sanders. Patterns for finding concurrency for parallel application programs. In 7th Pattern Languages of Programs Conference (PLoP 2000), 2000.
  • [Kumar et al., 1994] V. Kumar, A. Grama, A. Gupta, and G. Karypis. Introduction to Parallel Computing: Design and Analysis of Algorithms. The Benjamin/Cummings Publishing Company, Redwood City, California, 1994.
Topic revision: r12 - 21 Jun 2010, KonradRudolph
 