Chris Bauer, M.Sc.: Doctoral Thesis Defense

7 May 2013 | 16:00


Supervisor: Prof. Dr. K. Reinert

Dissertation topic:

Exploiting proteomics data

Defense topic:

Bioinformatic Challenges in Statistical Analysis of RNA-Seq Data

Transcriptome sequencing by RNA-Seq was 'expected to revolutionize the manner in which eukaryotic transcriptomes are analyzed' (Wang et al., Nature Reviews Genetics, 2009). In recent years, RNA-Seq has become the ideal technique for high-throughput studies of gene expression and a competitive alternative to microarray technologies. Although RNA-Seq is a very powerful technology, it is also highly complex, which leads to many challenges, especially in statistical data analysis.

We will present and discuss three problems in the computational analysis of RNA-Seq data. First, random hexamer priming, which is used to generate reads across the entire length of all expressed transcripts, results in a bias in the nucleotide composition at the start of sequencing reads (Hansen et al., Nucleic Acids Research, 2010). Second, there is a clear positive association between read counts and gene length, which is not entirely removed by scaling by gene length, as in the RPKM measure (Oshlack and Wakefield, Biology Direct, 2009; Bullard et al., BMC Bioinformatics, 2010). Third, repetitive gene sequences, pseudogenes and gene copies make a unique mapping of reads to genomic locations very difficult, and as a result some genes are not accessible or only partly accessible. Two RNA-Seq datasets are used to visualize these problems and to present adequate correction strategies: one of the first RNA-Seq datasets, from mouse liver, brain and muscle (Mortazavi et al., Nature Methods, 2008), and more recent RNA-Seq data from mouse fibroblasts (Schwanhäusser et al., Nature, 2011).
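For readers unfamiliar with the RPKM scaling mentioned above, the following is a minimal sketch of the standard definition (reads per kilobase of transcript per million mapped reads). The function name, the gene names and all numbers are hypothetical and chosen only for illustration; this is not code from the talk or the dissertation.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = 10^9 * C / (N * L), with C reads mapped to the gene,
    N total mapped reads in the sample, and L gene length in base pairs.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical toy data: two genes in one sample with 10 million mapped reads.
counts = {"gene_a": 500, "gene_b": 2000}       # reads mapped to each gene
lengths_bp = {"gene_a": 1000, "gene_b": 4000}  # gene lengths in base pairs
total_mapped = 10_000_000

rpkm_values = {
    gene: rpkm(counts[gene], lengths_bp[gene], total_mapped)
    for gene in counts
}

for gene, value in rpkm_values.items():
    print(gene, value)
```

In this toy example both genes receive the same RPKM although the longer gene contributes four times as many reads. The residual length effect described in the cited papers concerns the statistical behaviour of the underlying counts, so it would not be visible in such a point estimate alone.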

Time & Location

Institut für Informatik, Takustr. 9, 14195 Berlin, Room: SR 006