SlaganSupermap < ABI

You are here: Foswiki>ABI Web>LectureWiki>AdvancedAlgorithms>GenomeComparisonP4>SlaganSupermap (12 Jul 2010, heeger)Edit Attach

SLAGAN

Input: two genomic dna sequences in FASTA file(s)

Generation of local alignments (Svenja)

SLAGAN uses the CHAOS aligner for this phase:

finding seeds: k-mers with d mismatches (k-mer index?)
chaining seeds
scoring chains
removing chains with score < t
extending seeds & finding optimal gap positions
rescoring

Implementation: Chaos is part of lagan so this is somewhere implemented in SeqAn. But we will probably need to adjust it for our needs. Probably this step has to be run twice: sequence 1 vs sequence2 and sequence1 vs reverse complement of sequence2.

REINERT: have a look at lagan.cpp in the SeqAn demo folder. You will also find the call to chaos. Alternatively, Birte Kehr can provide you with her new local aligner.

Output: local alignments as six-tuples (start1,end1,start2,end2,score,strand) REINERT: Ask Birte for the standard output of her aligner. That format can also be read in again for the next step.

Building the 1-monotonic conservation map (Felix)

storing local alignments in eight data structures for the different chaining types
- which data structure? paper says: balanced binary trees. Cant we use any structure that gives us access to the right alignment in logarithmic time?
dynamic programming for finding high scoring chain (this is in fact a segment based approach)

Implementation: As this is the core feature of the algorithm, we will try to reimplement it.

Output: chain consisting of consistent subchains (how to store this?)

Aligning consistent subsegments (Svenja)

expand consistent subchains
re-align consistent subchains with lagan
clip overlaps in alignment in an optimal way

Implementation: SLAGAN uses LAGAN for this task. This is already implemented in SeqAn somewhere so we can use this. But we will have either to formate our input in a way that the LAGAN implementation can deal with it or adjust the implementation so it can deal with our input

Output: glocal alignment

Implementation Detail

Types of Gaps

The type of gap between two matches L1 and L2 is denoted by three bits (linearInSeq2, IPositive, JPositive)

linearInSeq2: L1 and L2 are collinear or consistent (i.e. L1.end2 < L2.start2)
IPositive: L1 is positive
JPositive: L2 is positive

Implemented a class that can store this three bits in one char. Realized afterward that there is a Bitset class in the STL...

Chaining

gap penalties:

gap open: constant
gap continue: |(L1.end1-L1.end2)-(L2.start1-L2.start2)| * constant (difference in diagonal numbers of end point of L1 and start point of L2 times constant)
distance: min(|L1.end1-L2.start1|, |L1.end2-L2.start2|) * constant

consistent (+++ and ---): gap open = 0, gap continue = -0.5, distance = ?
inversions (++- and --+): gap open = -1000, gap continue = ?, distance = 1*distance
translocation (+-+ and -+-): gap open = -2000, gap continue = ?, distance = 2.5*distance
translocated inversions (+-- and -++): gap open = ?, gap continue = ?, distance = ?

Problems and Milestones

Chaos is done but seems to return wrong results. Some closer examinations and examples are needed
- tested with a very short example: seq1: AAAAAAAAAAA, seq2: different combinations of 11-15 A's and other bases, k=11
- start-position of every local alignment which is found is always 0, end position always the last position of seq2 (that means, always the whole sequence seq2 is found)
- number of local alignments, which are found, would be correct
- further analysis is needed (look at the found seeds and the positions during the alignment step)
- seeds are okay on small examples
- re-align-step is not necessary

addSeed with Chaos-Tag (chaining seeds with Chaos algorithm) leads to segmentation faults and infinity loops → some serious debugging is needed here
- we will skip this for now and only use the Merge tag. Maybe we can use Birtes local aligner later...

scores for local matches can be extracted from the seedSets scoreMap
- position(seedSetIter, seedSet) seems not to work → count the steps the iterator made, use the counter to access the scoreMap

Chaining Algorithms seems to be way more complicated than we thought. We read the supplemental material to the slagan paper but it seems we will also have to read (and understand) the Eppstein Paper on sparse dynamic programming. Maybe it is a good idea to first implement a quadratic version of the chaining
- we will build a graph with nodes representing the matches ans edges represneting possible chainings (only 1-monotinic ones)
- use bellman-ford algorithm to find the heaviest path → highscoring chaining
gap costs are not really good explained in the paper (as can seen above we are lacking some of them)

Next steps

debugging chaos; output of chaos

implement the chaining in O(n^2)

Topic revision: r11 - 12 Jul 2010, heeger - This page was cached on 09 Mar 2025 - 17:12.

ABI

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki? Send feedback