This work presents a data structure for managing multiple biological sequences both efficiently and generically. The objective is a data structure that aggregates the similarities between biological sequences. The implemented method allows the contained sequences to be stored in compressed form while still supporting efficient analysis of multiple sequences through parallel computing. New techniques in both areas (compression and parallel analysis) have a strong impact on state-of-the-art genome research, especially as the widespread use of next-generation sequencing is growing exceptionally fast.
Literature:
[1] P. Pyl, Incremental Index Structures, 2010
[2] A. Döring, D. Weese, T. Rausch, and K. Reinert, SeqAn - An efficient, generic C++ library for sequence analysis, BMC Bioinformatics 2008, 9:11