Over the last few years, next-generation sequencing (NGS) has become the method of choice for genome analysis. For gene expression analysis, conventional microarrays are now being replaced by these sequencing-based methods, which can identify and quantify rare transcripts without prior knowledge of a particular gene. In the group of Prof. Dr. Silke R. Sperling, whole transcriptome (mRNA-seq) and miRNome (miRNA-seq) data were generated from the right ventricles of 22 patients with Tetralogy of Fallot (TOF) as well as from the left and right ventricles of four healthy, unaffected individuals (eight normal heart samples in total). TOF accounts for 7-10% of all congenital heart diseases, which are the most common birth defects in humans, with an estimated incidence of around 1% of all live births. TOF is characterized by four cardiac features: ventricular septal defect, overriding aorta, right ventricular outflow tract obstruction and right ventricular hypertrophy.
The first aim of this master thesis is to review the literature and the current methods for mRNA-seq and miRNA-seq data analysis. The second aim is to set up an analysis pipeline comprising methods for read mapping, quantification of expressed mRNAs and miRNAs (including different isoforms and splice junctions) and differential expression analysis. The third aim is to perform a whole transcriptome and miRNome analysis in patients with TOF and healthy, unaffected individuals using the NGS data from the group of Prof. Sperling.
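To give a rough idea of what the quantification and differential expression steps work with, the following is a minimal sketch in Python. It assumes that read mapping has already produced raw read counts per feature and per sample. All names (`counts_per_million`, `mean_cpm`, `log2_fold_change`), the pseudocount and the toy counts are hypothetical and chosen only for illustration. The sketch computes a plain log2 fold change on library-size-normalized counts; it is not the statistical model used by edgeR [4], which is based on negative binomial distributions and would be the more appropriate choice for real data.

```python
import math


def counts_per_million(counts):
    """Scale the raw read counts of one sample to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {feature: 1e6 * n / total for feature, n in counts.items()}


def mean_cpm(samples, feature):
    """Average CPM of one feature over a list of samples."""
    return sum(counts_per_million(s)[feature] for s in samples) / len(samples)


def log2_fold_change(case_samples, control_samples, feature, pseudocount=1.0):
    """Log2 ratio of mean CPM (case over control).

    The pseudocount (an assumption of this sketch) avoids division by zero
    for features that are not expressed in one of the groups.
    """
    case = mean_cpm(case_samples, feature) + pseudocount
    control = mean_cpm(control_samples, feature) + pseudocount
    return math.log2(case / control)


# Hypothetical toy counts, two samples per group (not real data).
tof_samples = [{"geneA": 300, "geneB": 700}, {"geneA": 500, "geneB": 500}]
normal_samples = [{"geneA": 100, "geneB": 900}, {"geneA": 100, "geneB": 900}]

for gene in ("geneA", "geneB"):
    lfc = log2_fold_change(tof_samples, normal_samples, gene)
    print(f"{gene}: log2 fold change = {lfc:.2f}")
```

A positive value indicates higher normalized expression in the case group, a negative value lower expression. Deciding whether such a difference is statistically significant requires replicate-aware methods such as those reviewed in the first aim.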
There will be regular meetings to discuss progress and problems.
The following could be a good starting point for the literature review:
[1] RNA-Seq: a revolutionary tool for transcriptomics. Wang Z, Gerstein M, Snyder M. Nat Rev Genet. 2009 Jan;10(1):57-63.
[2] RazerS 3: Faster, fully sensitive read mapping. Weese D, Holtgrewe M, Reinert K. Bioinformatics. 2012 Aug 24.
[3] MicroRazerS: rapid alignment of small RNA reads. Emde AK, Grunert M, Weese D, Reinert K, Sperling SR. Bioinformatics. 2010 Jan 1;26(1):123-4.
[4] edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Robinson MD, McCarthy DJ, Smyth GK. Bioinformatics. 2010 Jan 1;26(1):139-40.
[5] http://www.oxfordjournals.org/our_journals/bioinformatics/nextgenerationsequencing.html