Dissertation topic:
Exploiting proteomics data
Disputation topic:
Bioinformatic Challenges in Statistical Analysis of RNA-Seq Data
Transcriptome sequencing by RNA-Seq was 'expected to revolutionize the manner in which eukaryotic transcriptomes are analyzed' (Wang et al., Nature Reviews Genetics, 2009). In recent years, RNA-Seq has become a standard technique for high-throughput studies of gene expression and a competitive alternative to microarray technologies. Although RNA-Seq is a very powerful technology, it is also highly complex, which leads to many challenges, especially in statistical data analysis.
We will present and discuss three problems in the computational analysis of RNA-Seq data. First, random hexamer priming, which is used to generate reads across the entire length of all expressed transcripts, results in a bias in the nucleotide composition at the start of sequencing reads (Hansen et al., Nucleic Acids Research, 2010). Second, there is a clear positive association between read counts and gene length, which is not entirely removed by scaling by gene length, as in the RPKM measure (Oshlack and Wakefield, Biology Direct, 2009; Bullard et al., BMC Bioinformatics, 2010). Third, repetitive gene sequences, pseudogenes and gene copies make a unique mapping of reads to genomic locations very difficult, and as a result some genes are not accessible or are only partly accessible. Two RNA-Seq datasets are used to visualize these problems and to present adequate correction strategies: one of the first RNA-Seq datasets, from mouse liver, brain and muscle (Mortazavi et al., Nature Methods, 2008), and more recent RNA-Seq data from mouse fibroblasts (Schwanhäusser et al., Nature, 2011).
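To make the second problem concrete, the following minimal sketch computes RPKM (reads per kilobase of gene length per million mapped reads) for two genes. The gene names, counts and lengths are made-up illustrative values, not taken from the datasets named above, and the function name `rpkm` is a hypothetical helper.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Illustrative (invented) values: two genes expressed at the same level,
# one ten times longer than the other.
counts = {"short_gene": 100, "long_gene": 1000}
lengths_bp = {"short_gene": 1_000, "long_gene": 10_000}
total_mapped_reads = 1_000_000

values = {
    gene: rpkm(counts[gene], lengths_bp[gene], total_mapped_reads)
    for gene in counts
}

for gene, value in values.items():
    print(f"{gene}: {counts[gene]} reads, RPKM = {value:.1f}")
```

In this toy case both genes receive the same RPKM, yet the longer gene is supported by ten times as many reads. Scaling by length equalizes the expression estimate but not the amount of evidence behind it, which is one way a length dependence can persist in downstream statistical tests.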
The disputation consists of the talk named above, followed by the presentation of the dissertation, each with a subsequent discussion.
All interested parties are cordially invited.
The Chair of the Doctoral Committee
Prof. Dr. K. Reinert
Institut für Informatik, Takustr. 9, 14195 Berlin, Room: SR 006