AssignmentWeek3 < ABI

You are here: Foswiki>ABI Web>LectureWiki>AdvancedAlgorithms12>AssignmentWeek3 (12 Jun 2012, SandroAndreotti)Edit Attach

Exercise 3

Compression with BWT

Deadline: 24.06.2012 6:00 p.m.

Implement a program that (Mode c)

reads a single sequence from a fasta file
calculates the BWT of that sequence
implements move-to-front encoding and Huffman coding to compress the BWT
writes the Huffman code into an outfile

Mode x

reads a file containing the BWT compression of some sequence
writes the uncompressed sequence into a fasta outfile.

Remarks

For the compressed file it would be optimal if you output the Huffman as binary bitstring. But this is not required. As an alternative you can also use a single char for each bit, so you can output a string of ones and zeros. To obtain the compression rate you can then divide the file size by 8 (which would correspond to the size if binary bitstring was used).

Format specifications
The file format for the compressed file shall be the following:

Alphabet (first line)
<string of all characters that appeared in the sequence e.g. AGCT (required to revert the move to front encoding)>
HuffmanCodes (third line)
0   <code for 0> 
1   <code for 1>
2   <code for 2>
3   <code for 3>
<SequenceName starting with a ">"  (e.g. ) (>random10M)>
<Huffman code of sequence either as binary bit string or as string of ones and zeros>

The program usage should be: ./exercise3 <input file> <output file> <mode> where:

mode: single char either c (compress) or x (extract).
for mode c input file is fasta file and output file is compressed file
for more x input file is compressed file and output file is fasta file

For the application note you should test your program on different sequences with different alphabets. Compare the compression rate to that obtained by other file compression tools.

Topic revision: r5 - 12 Jun 2012, SandroAndreotti - This page was cached on 11 Mar 2025 - 03:27.

ABI

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki? Send feedback