BscImprovementsOfGraphBasedRealignment

Improving the Graph-Based Realignment in SeqAn.

Background

The SeqAn library contains a powerful method for realignment and consensus computation based on the alignment graph [1]. While this works quite well, the approach has some drawbacks. The following picture shows a multi-read alignment for which the graph-based (re-)alignment method did not succeed in creating a good alignment.

[Figure: multi_read_alignment]

The input to the graph-based alignment consists of matches between the reads. The matches may conflict, so the alignment algorithm selects some matches and discards others. In the two marked regions on the left, this leads to many small insertions and deletions, while in the marked region on the right, the matches covering a long stretch TAATT…CAACA were discarded.
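As an illustration of how regions with many small insertions and deletions might be located automatically, the following is a minimal sketch, not part of SeqAn. It assumes a hypothetical representation in which every read is a row of equal length, `-` marks a gap inside a read, and `.` marks columns the read does not cover. The function name, window size, and threshold are assumptions chosen for the example.

```python
def gap_rich_windows(rows, window=10, max_gap_fraction=0.3):
    """Return (start, end) column ranges whose gap fraction exceeds the threshold.

    rows: equally long strings; '-' is a gap inside a read,
          '.' marks columns outside the read (not counted at all).
    """
    if not rows:
        return []
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have the same length")

    flagged = []
    for start in range(0, width, window):
        end = min(start + window, width)
        gaps = 0
        covered = 0
        for row in rows:
            for c in row[start:end]:
                if c == ".":
                    continue  # read does not cover this column
                covered += 1
                if c == "-":
                    gaps += 1
        # Flag the window if the covered cells are dominated by gaps.
        if covered > 0 and gaps / covered > max_gap_fraction:
            flagged.append((start, end))
    return flagged
```

Windows flagged this way would only be candidates for postprocessing; a real heuristic would likely also consider the lengths of individual gaps and the agreement of the bases in each column.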

Another problem of the realignment method is the cubic running time of the triplet consensus extension. This is problematic with deep alignments (hundreds of stacked sequences).
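To make the source of the cubic cost concrete, here is a simplified sketch of a triplet extension in the style known from consistency-based aligners. It is not SeqAn's implementation: the data layout (a dictionary of pairwise matches with weights), the function name, and the use of the minimum of the two weights are assumptions made for illustration. Every pair of sequences (i, j) is re-weighted through every third sequence k, so the number of (i, j, k) combinations grows cubically with the number of sequences.

```python
from collections import defaultdict


def triplet_extension(n_seqs, matches):
    """Re-weight pairwise matches by support through third sequences.

    matches: {(i, j): {(pos_i, pos_j): weight}} with i < j.
    Returns (extended_matches, number_of_triplets_visited).
    """

    def lookup(a, b):
        """Return {pos_a: {pos_b: weight}} for the pair (a, b) in either order."""
        out = defaultdict(dict)
        if a < b:
            for (pa, pb), w in matches.get((a, b), {}).items():
                out[pa][pb] = w
        else:
            for (pb, pa), w in matches.get((b, a), {}).items():
                out[pa][pb] = w
        return out

    # Start from the original weights; only `extended` is written to.
    extended = {pair: dict(m) for pair, m in matches.items()}
    triplets = 0
    for i in range(n_seqs):
        for j in range(i + 1, n_seqs):
            for k in range(n_seqs):
                if k == i or k == j:
                    continue
                triplets += 1
                ik = lookup(i, k)
                kj = lookup(k, j)
                for pi, via in ik.items():
                    for pk, w1 in via.items():
                        # A match i-k and a match k-j at the same position
                        # of k support the match i-j.
                        for pj, w2 in kj.get(pk, {}).items():
                            pair = extended.setdefault((i, j), {})
                            pair[(pi, pj)] = pair.get((pi, pj), 0) + min(w1, w2)
    return extended, triplets
```

In this sketch the triple loop visits n(n-1)(n-2)/2 combinations for n sequences, which is why alignments with hundreds of stacked reads become expensive.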

Topic

The aim of this thesis is to fix the issues mentioned above.

  • The artifacts could be fixed in a postprocessing step. The student should develop heuristics to find such artifacts and fix them (e.g. by creating pairwise read alignments in the problematic regions, adding them to the underlying alignment graph, and performing the progressive alignment again).
  • The running time problems could be addressed by performing the consensus extension hierarchically or with sampling.
In the case of read realignment, one could also use an adaptive method, i.e. full triplet extension in lower-coverage regions and sampling-based realignment in high-coverage regions.
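The adaptive idea can be sketched as follows. This is only one possible reading of it, under stated assumptions: reads are given as half-open intervals `(begin, end)` on the reference or consensus, the region is cut into fixed-size windows, and a hypothetical coverage threshold decides between full triplet extension and a sampled variant. All names and default values are illustrative and not taken from SeqAn.

```python
import random


def coverage(read_intervals, length):
    """Per-position coverage from half-open (begin, end) read intervals."""
    diff = [0] * (length + 1)
    for begin, end in read_intervals:
        diff[begin] += 1
        diff[end] -= 1
    cov = []
    running = 0
    for d in diff[:length]:
        running += d
        cov.append(running)
    return cov


def plan_windows(read_intervals, length, window=100, threshold=30):
    """Assign 'full' or 'sampled' extension to each window by its peak coverage."""
    cov = coverage(read_intervals, length)
    plan = []
    for start in range(0, length, window):
        end = min(start + window, length)
        mode = "full" if max(cov[start:end]) <= threshold else "sampled"
        plan.append((start, end, mode))
    return plan


def sample_reads(read_ids, sample_size, seed=0):
    """Return all reads if there are few, otherwise a random subset."""
    read_ids = list(read_ids)
    if len(read_ids) <= sample_size:
        return read_ids
    return random.Random(seed).sample(read_ids, sample_size)
```

In a "sampled" window, only the reads returned by `sample_reads` would serve as third sequences in the extension, which bounds the work per pair regardless of the coverage.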

References
