You are here: Foswiki>ABI Web>ThesesHome>BScGenAlignGraphs (19 Sep 2013, kiekemo)Edit

BScGenAlignGraphs

Comparing graphs for genome alignment of multiple sequences under the presence of large structural events.

Schedule [Moritz]

Finishing Date Milestone estimated time needed
31.07. Modul for read-in of .maf files 1 week
21.08. Modul to create A-Bruijn graphs from .maf files 3 weeks
28.08. Modul for output of A-Bruijn graphs 1 week
04.09. Adding translocation and inversion to AlignmentGraph modul 1 week
11.09. Converting A-Bruijn graph to Alignment graph and vice versa (optional) 1 week
18.09. Benchmarks 1 week
25.09. Creating Bachelor Thesis 2 weeks

Weekly progress

Week progress
24.07. - 31.07. Finished maf parser with some problems. Fixed in time though.
31.07. - 07.08. Decided to change schedule. Going to finish A-Bruijn graph implementation first. Added inversion to AlignmentGraph. Altering buildAlignmentGraph() for translocation is probably very time consuming, therefore the schedule change. Read up on A-Bruijn graph mainly and worked out a plan for the implementation.
31.07. - 07.08. Implemented a scaffold for the A-Bruijn graph and cleared up the last details.
07.08. - 14.08. Implemented the buildAbruijnGraph()-function with support for inversions. maf files can now be converted into A-Bruijn graphs.
14.08. - 21.08. Implemented the two functions write2MAF() and write2DOT(). write2MAF() creates a .maf file of the A-Bruijn. write2DOT() creates a .dot file that can be read by GraphViz and converted into a graph.
21.08. - 28.08 Problem with the output after insertion and deletion of vertices. Problem solved by fixing the functions addVertex() and removeVertex(). Minor changes to the datatypes used (Align object and AlignBlock object). Wrote tests for all functions except for write2DOT() and write2MAF(). Taking care of AlignmentGraph now.
28.08. - 04.09. Tried to get the buildALignmentGraph() function to work with a test.maf, I created. Problems occured, when using Align objects. Apparently it tries to create an edge where source = target. Milestone wasn't reached. Wrote the interface for the abruijn program.
04.09. - 18.09. Fixed the problem with the Align objects (made some major changes in the maf parser header). Wrote my own (simplified) buildAlignmentGraph() function and a function to convert an Abruijn graph to an Alignment graph.

Background

Genome alignment compares whole genomes of different species or individuals with the aim of identifying homologous regions. Challenges are the huge amount of data involved and the presence of evolutionary events that affect the structure of a genome. Graphs have proven to be a versatile data structure to cope with both challenges ([5],[6]).

At the moment, there are four dominant graph structures that are able to represent a genome alignment of multiple genomes under the presence of large evolutionary events ([1]-[4]). Appearing very different at first glance, the graphs share similar substructures and modification schemes.

Topic

The SeqAn library provides a generic interface for graph components and already implements an alignment graph. Within the scope of this thesis, the task is to extend the alignment graph and implement a basic version of the A-Bruijn graph in SeqAn. A further step is to implement a (simple) transformation that converts one graph into the other, and then test the performance of both graphs in terms of construction time, space and transformation time (and space).

This thesis is integrated into a larger project, where in the first part the above mentioned graphs have been compared from a theoretical point of view. The thesis is the first step of a potential proceeding part, where the graphs are compared based on their applicability and time and space consumption regarding construction and modification operations performed on the graphs. This requires sound and comparable implementations of the graph specialisations, that are able to build up (and transform) a graph in adequate run time given large sets of genome data.

Comments

If the project is part of a bachelor thesis the student might consider doing an internship using the SeqAn library beforehand.

Literatur

[1] Kececioglu J: The maximum weight trace problem in multiple sequence alignment. In Proceedings of the 4th Symposium on Combinatorial Pattern Matching (CPM), Volume 684 of Lecture Notes in Computer Science, Springer-Verlag 1993:106–119. (alignment graph)

[2] Pevzner PA, Tang H, Tesler G: De novo repeat classification and fragment assembly. Genome Res 2004, 14(9):1786–1796. (A-Bruijn graph)

[2a] Raphael B, Zhi D, Tang H, Pevzner P: A novel method for multiple alignment of sequences with repeated and shuffled elements. Genome Res 2004, 14(11):2336–2346. (A-Bruijn graph)

[3] Paten B, Herrero J, Beal K, Fitzgerald S, Birney E: Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res 2008, 18(11):1814–1828. (Enredo graph)

[4] Paten B, Diekhans M, Earl D, John JS, Ma J, Suh B, Haussler D: Cactus graphs for genome comparisons. J Comput Biol 2011, 18(3):469–481. (Cactus graph)

Whole-genome alignment overview paper:

[5] Dewey CN, Pachter L.: Evolution at the nucleotide level: the problem of multiple whole-genome alignment. Human Mol Gen 2006, 15:R51-R56.

[6] Dubchak I, Poliakov A, Kislyuk A, Brudno M: Multiple whole-genome alignments without a reference organism. Genome Res 2009, 19(4):682–689.

 
Topic revision: r9 - 19 Sep 2013, kiekemo
 
  • Printable version of this topic (p) Printable version of this topic (p)