General information for programming exercises

Exercise 1

Read mapping with semi global alignment

Deadline: 02.05.2012 9:00 a.m.
  • Write a C++ program that:
    • reads two fast files genome.fa and reads.fa where genome.fa contains a single sequence and reads.fa contains an unknown number of reads.
    • uses semi global alignment with dynamic programming for each read to find all matching positions with edit distance at most k.
    • implements the semi global alignment with and without the Ukkonen trick.
    • creates a file containing for each read all hits (startPos, endPos, errors).

  • Test your program on the given Dataset (will be added)
  • Compare the performance of the naive approach and the Ukkonen trick. Also use the RazerS read mapper in Seqan and compare the runtimes.
  • Write a short application note (no more than 1-2 pages) where you describe your implementation and your observations.

Format specifications

The program usage should be: ./exercise1 <genome.fasta> <reads.fasta> <k> <m> where k its the number of allowed errors and m is either 0 (without Ukkonen) or 1 (with Ukkonen)

The program should produce an output file named hits.result with one line for each hit in the format:

<read_id>, <start_pos>, <end_pos>, <errors>

Point scheme

For each exercise you can score 4 Points, 3 for the program and 1 for the application note. For the program:
3 Pts = program runs without errors
2 Pts = program contains small errors
1 Pts = program contains critical errors
0 Pts = no program submitted

Requirements: You have to reach at least 75% of the points summed for all programming exercises
Topic revision: r8 - 30 Apr 2012, SandroAndreotti
 
  • Printable version of this topic (p) Printable version of this topic (p)