Add BAM support to SeqAn SequenceFile

In this project you will write an interface that lets the sequence I/O of SeqAn read SAM/BAM files seamlessly, as if they were FASTA/FASTQ files.

Introduction

SAM/BAM Files: These are common file formats for storing alignments of short sequences (often called “reads”) against a reference sequence, which is usually much longer. To learn what a SAM/BAM file looks like, read the specification at https://samtools.github.io/hts-specs/SAMv1.pdf.

FASTA/FASTQ Files: These file formats store biological sequences, which can be DNA, RNA, or protein sequences. FASTQ additionally stores a quality string for each sequence.

The SeqAn FormattedFile class supports reading and writing both FASTA/FASTQ sequence files and SAM/BAM alignment files. However, many people use SAM/BAM alignment files only for the sequences inside them and discard the mapping information. That is, given a SAM/BAM file, one wants to extract the sequences together with their identifiers and qualities.
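
To make the goal concrete, here is a minimal sketch of the extraction for a single text SAM line, written in plain C++ without SeqAn. It relies only on the SAM specification, in which the 11 mandatory tab-separated fields place QNAME in column 1, SEQ in column 10, and QUAL in column 11. The names `ReadRecord`, `samLineToRead`, and `toFastq` are hypothetical helpers invented for this illustration; they are not part of SeqAn, and the actual project would work through the SeqAn file interfaces instead.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for this sketch; not a SeqAn type.
struct ReadRecord
{
    std::string id;    // QNAME, column 1
    std::string seq;   // SEQ, column 10
    std::string qual;  // QUAL, column 11 (empty if the file stores "*")
};

// Fills `rec` from one SAM alignment line and returns true.
// Returns false for header lines (starting with '@') and for lines
// that have fewer than the 11 mandatory fields.
bool samLineToRead(std::string const & line, ReadRecord & rec)
{
    if (line.empty() || line[0] == '@')
        return false;

    // Split the line on tab characters.
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, '\t'))
        fields.push_back(field);

    if (fields.size() < 11)
        return false;

    rec.id = fields[0];
    rec.seq = fields[9];
    // "*" in the QUAL column means that no qualities were stored.
    rec.qual = (fields[10] == "*") ? std::string() : fields[10];
    return true;
}

// Formats a record as a four-line FASTQ entry.
std::string toFastq(ReadRecord const & rec)
{
    return "@" + rec.id + "\n" + rec.seq + "\n+\n" + rec.qual + "\n";
}
```

Note that this sketch copies SEQ and QUAL exactly as stored. SAM stores reads that map to the reverse strand (FLAG bit 0x10) reverse-complemented, so recovering the originally sequenced read would need an extra step that is left out here. BAM is a binary format and cannot be parsed line by line like this.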

Tasks

Stretch goals

Extension as Bachelor Project

TODO

Literature