Algorithmic Bioinformatics

Felix Heeger

Suffix Tree Based Palindrome Detection

Academic Advisor: Knut Reinert , David Weese
Discipline: Bioinformatics
Degree: Bachelor of Science (B.Sc.)
Degree: Sep 28, 2009
Start: Aug 04, 2009
Status: finished


Using the suffix array implementation available in SeqAn, the goal is to design, develop and experimentally verify an algorithm in SeqAn that finds all maximal complemented RNA palindromes of significant length in a given RNA sequence.




An example of a maximal complemented RNA palindrome in the sequence CCAUUAGCUAAUU is, for instance, AUUAGCUAAU. As you can see the reverse complement of the second part CUAAU is AUUAG, which matches the first part of the sequence. Hence, the substring AUUAGCUAAU is a complemented RNA palindrome. It is maximal because it cannot be extended to the left and right. The complemented RNA palindromes of interest are those that are actually not adjacent to each other. When the two halves are not adjacent, we call the palindrome a separated palindrome. An example would be AUUAG***CUAAU with * being any character from the RNA alphabet. Such structures occur frequently in RNA sequences because they form the well-know stem and hairpin structure. The goal of this thesis is to identify maximal non-separated and maximal separated palindromes up to a gap length k between the two halves of the palindrome. That is, k is number of *'s in the above example.



[1] D. Gusfield, Algorithms on strings, trees, and sequences. (1997), chapter 9, page 196-199.