Sign up at ProgrammingGroupList for one of the exercises on Tuesday 12th and Wednesday 14th

Exercise 3

Compression with BWT

Deadline: 24.06.2012 6:00 p.m.

Implement a program that (Mode c)

Mode x

Remarks

For the compressed file it would be optimal if you output the Huffman as binary bitstring. But this is not required. As an alternative you can also use a single char for each bit, so you can output a string of ones and zeros. To obtain the compression rate you can then divide the file size by 8 (which would correspond to the size if binary bitstring was used).

Format specifications
The file format for the compressed file shall be the following:
Alphabet (first line)
<string of all characters that appeared in the sequence e.g. AGCT (required to revert the move to front encoding)>
HuffmanCodes (third line)
0   <code for 0> 
1   <code for 1>
2   <code for 2>
3   <code for 3>
<SequenceName starting with a ">"  (e.g. ) (>random10M)>
<Huffman code of sequence either as binary bit string or as string of ones and zeros>
The program usage should be: ./exercise3 <input file> <output file> <mode> where:

For the application note you should test your program on different sequences with different alphabets. Compare the compression rate to that obtained by other file compression tools.