Page ThesisAnnoAlign

Brief overview

Next-generation sequencing (NGS) methods have given rise to a whole new set of algorithmic problems. RNASeq brings the advantages of NGS to transcriptome research by applying the same principle: a huge number of copies is cleaved into short reads, which are then sequenced. It offers a broader detection range and higher accuracy than traditional methods such as microarrays and can thus lead to new insights in expression analysis.

One major challenge in NGS techniques is the reassembly of reads into a consensus sequence. Transcriptome research, in contrast to genome sequencing, deals with a whole set of source sequences at different expression levels, because splicing events produce several transcripts from the same gene. This feature leads to some unique questions. Some of these questions are:

And last, but not least:

In this thesis we propose a modular pipeline that will address some of these questions.

Since gene annotation and splice sites have long been researched and plenty of data is readily available, we choose to integrate them into our program. In a first module, all required data is loaded and a set of possible splicing products, based on the observed reads, is created efficiently. The module then passes these so-called proxies, together with the read data, to an exchangeable read mapper. In the early stages of development we will focus on RazerS, which can efficiently handle the vast algorithmic effort needed to align the (short) proxies to the (long) reference genome. A second module then loads and analyses the alignment data to gain information on the genes involved and their splicing products, providing a sound basis for expression analysis within the pipeline.
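To make the idea of proxies more concrete, the following is a minimal sketch of how candidate splice junction sequences could be derived from annotated exons. It is an illustration only, not the pipeline's actual implementation: the function name `junction_proxies`, the half-open 0-based exon coordinates, the `flank` parameter and the toy reference are all assumptions, and this simple version enumerates every exon pair instead of restricting itself to junctions supported by observed reads.

```python
from itertools import combinations


def junction_proxies(reference, exons, flank):
    """Return a dict mapping (donor_end, acceptor_start) to a proxy sequence.

    reference: genomic sequence as a string (hypothetical toy input)
    exons:     list of (start, end) tuples, 0-based, end exclusive (assumed format)
    flank:     number of bases taken from each side of the junction
    """
    proxies = {}
    # Sort exons by position so that every pair is joined in genomic order.
    for (s1, e1), (s2, e2) in combinations(sorted(exons), 2):
        if e1 > s2:
            # Overlapping exons cannot be joined in this simple model.
            continue
        left = reference[max(s1, e1 - flank):e1]    # tail of the upstream exon
        right = reference[s2:min(e2, s2 + flank)]   # head of the downstream exon
        proxies[(e1, s2)] = left + right
    return proxies


# Toy data: four blocks of five bases each, three annotated exons.
reference = "AAAAACCCCCGGGGGTTTTT"
exons = [(0, 5), (10, 15), (17, 20)]
proxies = junction_proxies(reference, exons, flank=3)
```

Choosing `flank` as the read length minus one would ensure that every read spanning a junction fits inside the corresponding proxy; that choice, too, is an assumption rather than something stated above.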

Weekly Reports

ThesisAnnoAlignReports

Timeframe

21.10.2009 - ??

Comments