Page AdvancedAlgorithms11

Welcome to the Wiki of the Lecture Advanced Algorithms in Bioinformatics (P4).

News

  • 28.09. (Weese): The second examination (Nachklausur) will take place on Wednesday, 19.10., 2-4 pm, in room 005 (T9).
  • 12.07. (Weese): The examination results are online.
  • 31.05. (Weese): The examination (Klausur) will take place on Wednesday, 29.06., in room 006 (T9).
  • 07.04. (Weese): Please sign up for this lecture at https://www.mi.fu-berlin.de/kvv/course.htm?cid=9685&sid=22 (updated URL)
  • 21.03. (Weese): The first lecture will be given on Monday, 11.04., at 2:15 pm, room SR FB (1.1.16) in Arnimallee 14.
  • 21.03. (Weese): Important/urgent information will be distributed via our mailing list. Please sign up at https://lists.fu-berlin.de/listinfo/bioinf-P4-2011

Exam Results

Mat Pts Grade
xxx4940 62,5 2,0
xxx1192 27 5,0
xxx0760 0 5,0
xxx9983 0 5,0
xxx2042 45 3,3
xxx2152 49 3,0
xxx1288 0 5,0
xxx4523 0 5,0
xxx0284 39 3,7
xxx2000 71 1,3
xxx2779 0 5,0
xxx6890 62,5 2,0
xxx6034 39 3,7
xxx1235 65 1,7
xxx9546 68 1,3
xxx9672 57 2,3
xxx9469 43 3,3
xxx2200 43,5 3,3
xxx4248 43 3,3

Schedule

Event Day Time Address Room
Lecture Mon 14-16 Arnimallee 14 SR FB (1.1.16)
Lecture Fri 14-16 Takustr. 9 006
Exercise Wed 14-16 Takustr. 9 006

Date Lecture Block Lecturer
11.04.-06.05. Online String Searching and Filtering David Weese
09.05.-30.05. Index Data Structures David Weese
11.05. Review 1  
03.06.-17.06. Genome Comparison and RNA Analysis Sandro Andreotti
08.06. Review 2  
29.06. Klausur  
18.06.-16.07. Individual Projects  

Exercises

You can download the exercise sheets on the subpages of the different lecture blocks.

Reviews

Requirements for Aktive Teilnahme

You have to hand in 75% of all exercise problems. (An exercise problem counts, if it is clearly visible that a solution was attempted for some time.) In addition you need to reach 50% of all points of the two reviews.

Projects

The time from 18.06. to 16.07. is reserved for your projects. Please pick a project from below and sign in on this (coming soon) page. You are expected to write a C++ program which you have to successfully present and explain in a code review.

For additional information and material check: P4Projects

Project A: Suffix array search (Firdevs, Hatice)
  • Student 1
    • Read in a text from disk
    • Construct the suffix array with Quicksort or an algorithm of this lecture
    • Construct the lcp-table with the Kasai algorithm
    • Write the 2 tables to disk
  • Student 2
    • Read in a text the corresponding suffix array and lcp table from disk
    • Implement the suffix array search with mlr-heuristic
    • Implement the O(m + log n) suffix array search and therefor construct an lcp-interval tree
    • Compare the search times of both algorithms (use a genomic text and a large set of substrings)

Project B: FM-Index (Arthur, Felix, Duc)
  • Student 1
    • Read in a text from disk and construct its suffix array (you are allowed to use the results of Project A, you could also read it in after executing the tool of Project A)
    • Construct the Burrows-Wheeler Transformation (BWT) using the suffix array from Project A
    • Construct the Occ and c-table
    • Provide a function for the L-to-F mapping (used by Student 2)
  • Student 2
    • Implement the exact match algorithm that uses BWT, Occ and c table to search a string in a text
  • Student 3
    • Use the exact match algorithm of Student 2 and backtracking to implement an approximate string search with up to k errors

Project C: Pigeonhole Filter (Aileen, Anke, Yvonne)
Implement the hierarchical filter (PEX) to find approximate matches of a pattern with up to k edit-errors in a text.
  • Student 1
    • You are given the name of a text file, a pattern P and the number of edit-errors k on command line
    • Read in the text from disk
    • Construct the hierarchical tree as described in the PEX algorithm
  • Student 2
    • Write a semi-global DP-aligner to search a substring of P in a substring of T with edit-errors
    • Implement the PEX search algorithm
  • (Optional: Student 3)
    • Implement a banded version of Myers' bit-vector algorithm as a replacement for the DP-aligner

Project D: QUASAR (Daniel, Jörn)
Implement QUASAR to search semi-global alignments with up to k edit-errors of patterns in a text.
  • Student 1
    • Read in a text and construct the q-gram index (assume a DNA alphabet and q=10)
    • Read in a set of patterns (you can assume that all have the same size w)
    • For each pattern P
      • Count the occurrences of all overlapping q-grams of P in overlapping blocks of the text
  • Student 2
    • For each pattern P
      • For each block that reaches the threshold t (from q-gram lemma)
        • Verify if the block contains a semi-global alignment of P with up to k edit-errors
        • If so, output the pattern P, its end position in the text and the number of errors

Project E: Calculation of the exact q-gram threshold (Franzi, Christopher)
  • Student 1
    • Read in (from command line) a gapped shape, a text length m and a number of mismatches k
    • Calculate the exact gapped q-gram threshold t using the DP algorithm described in [1]
  • Student 2
    • Calculate a lower bound for t using the q-gram lemma
    • Calculate an upper bound using a greedy algorithm that tries to destroy as many q-grams as possible with k errors (see [2])
    • For different gapped shapes and values for m and k compare the results of Student 1 with your bounds

Project F: Skew algorithm (Mathias, Oliver, Jan-Martin)
Implement the skew-algorithm for the difference cover D = {1,2,4} and m = 7
  • Student 1
    • Sort (7 radix passes) and name the septuples T[i..i+6] where i mod 7 equals 1, 2 or 4
    • Create the text T' = t1 t2 t4 of septuple-names
    • Create suffix array A' of T' recursively
  • Student 2
    • Derive suffix array A124 from A'
    • Construct suffix arrays A0, A3, A5, A6 iteratively from A124
  • Student 3
    • Merge suffix arrays from Student 2 using a priority queue and appropriate comparisons

Project G: Nussinov SCFG (Martin, Kurt)
  • Student 1
    • Implement the Nussinov-SCFG for RNA-folding with the Nussinov CYK algorithm.
    • Together with Student 2 compare the folds predicted by your algorithm to those predicted by mfold and RNAFold.
  • Student 2
    • Implement a training procedure that uses RFAM-alignments as training input to learn the SCFG's parameters that can be used by Student 1.

References

Please note that some journals might only be accessible from granted subnets like the FU subnet (or VPN).

[1] Stefan Burkhardt, Juha Kärkkäinen, Better Filtering with Gapped q-Grams, Fundamenta Informaticae, 2003
[2] http://en.wikipedia.org/wiki/Maximum_coverage_problem

Topic revision: r23 - 01 May 2013, DavidWeese
 
  • Printable version of this topic (p) Printable version of this topic (p)