Assessment of off-target effects of non-coding RNAs

Bioinformatics, or computational biology, is one of the fastest-growing and most exciting fields in science. Through this project, you will learn how your computer science and statistics background can be used to answer biological questions. You will also learn the software development process with SeqAn.

The project will focus on small interfering RNA (siRNA), a key non-coding RNA (ncRNA) in RNA interference (RNAi). After completion, you can continue the project as your Bachelor project. In that case, you will analyze off-target effects of other non-coding RNA-mediated mechanisms such as CRISPR-Cas9. (Fire and Mello received the Nobel Prize in 2006 for RNAi, and CRISPR-Cas9 was one of the top 10 breakthroughs of 2014 according to MIT Technology Review.)

All implementation will be based on SeqAn, so you should be able to use C++. Students with basic knowledge of statistics and molecular biology are preferred, but this is not required. Unfortunately, the mentor does not speak German. If you need help in German, you should consider other mentors.


The central dogma of molecular biology states that genetic information flows from DNA to RNA to protein ("DNA makes RNA and RNA makes protein"). However, scientists have found a few exceptions, and ncRNA is one of them. These RNAs do not encode proteins but play important regulatory roles in many biological processes.

Wahlestedt, Nature Reviews Drug Discovery, 2013

siRNA is an ncRNA that is involved in RNA interference and regulates gene expression post-transcriptionally. Scientists use it in research because it can induce selective knockdown (reduced expression) of genes of interest. Several pharmaceutical companies have also invested in developing siRNA-based therapeutics. The mechanism of target recognition is illustrated below (a).

Lam et al, Molecular Therapy Nucleic Acids, 2015

However, there is an off-target effect (b). Sometimes an introduced siRNA acts like a microRNA (miRNA), another type of ncRNA that regulates gene expression by mRNA degradation or translational repression. In this case, the 6 nucleotides at positions 2-7 from the 5' end of the siRNA, which we call the "seed region", are very important for target recognition. This miRNA-like off-target effect can therefore be detected in the transcriptome by analyzing overrepresented k-mers in the 3'UTRs of downregulated genes. In other words, if the 6-mer that is the reverse complement of the seed region is significantly enriched in the 3'UTRs of downregulated genes, we can say that the transcriptome exhibits severe off-target effects.
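The seed-site idea above can be sketched in a few lines. This is a minimal illustration in plain standard C++, not SeqAn code; the function names (complementBase, seedSite, countSites) are hypothetical, and the sketch assumes the guide strand is given 5' to 3' and that 3'UTRs use the DNA alphabet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Complement of a single base. Accepts RNA (U) or DNA (T) input and always
// answers in the DNA alphabet, because 3'UTR sequences are assumed to be DNA.
// Unexpected characters map to 'N'.
char complementBase(char base)
{
    switch (base)
    {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': case 'U': return 'A';
        default:  return 'N';
    }
}

// Reverse complement of the seed region (positions 2-7 from the 5' end,
// 1-based) of a guide strand written 5' to 3'.
std::string seedSite(std::string const & guide)
{
    assert(guide.size() >= 7);
    std::string site = guide.substr(1, 6);  // positions 2..7
    std::reverse(site.begin(), site.end());
    std::transform(site.begin(), site.end(), site.begin(), complementBase);
    return site;
}

// Number of (possibly overlapping) occurrences of `word` in `utr`.
std::size_t countSites(std::string const & utr, std::string const & word)
{
    assert(!word.empty());
    std::size_t count = 0;
    for (std::size_t pos = utr.find(word); pos != std::string::npos;
         pos = utr.find(word, pos + 1))
        ++count;
    return count;
}
```

For a guide strand that starts with UGAGGUA, seedSite returns TACCTC, and countSites then reports how often that 6-mer occurs in a given 3'UTR. Whether the counts are significantly enriched among downregulated genes is a separate statistical question.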

Goals of the PMSB

You will become familiar with:

Computer Science
  • The software development process with SeqAn, including testing and documentation.
  • Standard file formats in bioinformatics, including FASTA, GTF, refFlat, etc.
  • Efficient algorithms for sequence analysis, including k-mer counting.

Biology
  • The central dogma and several regulatory mechanisms, including RNAi.
  • The current understanding of the transcriptome.

Statistics
  • Statistical tests frequently used in biomedical research, including the rank-sum, Kolmogorov-Smirnov (K-S), chi-square, and Fisher's exact tests.
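As an illustration of the k-mer counting goal listed above, the following is a minimal sketch in plain standard C++ (not SeqAn). It assumes a 2-bit encoding per base and a rolling hash, so each sequence is scanned once. The names baseCode, kmerIndex, and countKmers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Map a base to a 2-bit code; -1 marks characters outside A, C, G, T/U.
int baseCode(char base)
{
    switch (base)
    {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': case 'U': return 3;
        default:  return -1;
    }
}

// Index of a k-mer in the count table (base-4 number, first base most significant).
std::size_t kmerIndex(std::string const & kmer)
{
    std::size_t index = 0;
    for (char base : kmer)
    {
        int const code = baseCode(base);
        assert(code >= 0);
        index = (index << 2) | static_cast<std::size_t>(code);
    }
    return index;
}

// Add the k-mer counts of `seq` to `counts`, which must hold 4^k entries.
// Windows that contain an ambiguous character (e.g. N) are skipped.
void countKmers(std::string const & seq, unsigned k, std::vector<std::uint32_t> & counts)
{
    assert(k >= 1 && k <= 12);
    assert(counts.size() == (std::size_t(1) << (2 * k)));
    std::size_t const mask = counts.size() - 1;
    std::size_t hash = 0;
    unsigned valid = 0;  // consecutive valid bases seen so far, capped at k
    for (char base : seq)
    {
        int const code = baseCode(base);
        if (code < 0)
        {
            valid = 0;  // restart the window after an ambiguous base
            hash = 0;
            continue;
        }
        hash = ((hash << 2) | static_cast<std::size_t>(code)) & mask;
        if (valid < k)
            ++valid;
        if (valid == k)
            ++counts[hash];
    }
}
```

A table for 6-mers has 4^6 = 4096 entries, so one table per gene set fits comfortably in memory. Calling countKmers once per 3'UTR accumulates the counts for a whole set of sequences.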

Expected Output

Sylamer: van Dongen et al., Nature Methods, 2008 [1]

  • Input: 1) gene expression levels (based on transcriptome data), 2) 3'UTR sequences, 3) seed sequences of known miRNAs.
  • Output: 1) miRNA-like off-target scores, 2) potential seed sites of known miRNAs (if any).
  • All necessary raw data, including the transcriptome data, will be provided by the mentor.
  • A statistical model for assessing overrepresented k-mers, based on [1], will be provided by the mentor. Of course, you can also propose your own.
  • Vivid graphics are optional.
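To give a feeling for what an off-target score could look like, here is a minimal sketch of a one-sided enrichment test in plain standard C++. It assumes a hypergeometric model (equivalent to a one-sided Fisher's exact test) over gene counts; the model actually provided by the mentor may differ. The names logChoose and enrichmentPValue and their parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Natural log of the binomial coefficient C(n, k), computed with log-gamma
// so that large gene counts do not overflow.
double logChoose(unsigned n, unsigned k)
{
    assert(k <= n);
    return std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0);
}

// Upper-tail hypergeometric probability P(X >= observed), where
//   total    = number of genes considered,
//   withWord = genes whose 3'UTR contains the word (e.g. the seed site),
//   selected = downregulated genes,
//   observed = downregulated genes whose 3'UTR contains the word.
double enrichmentPValue(unsigned total, unsigned withWord, unsigned selected, unsigned observed)
{
    assert(withWord <= total && selected <= total);
    unsigned const maxHits = std::min(withWord, selected);
    // Fewest hits possible when the selection is larger than the word-free set.
    unsigned const minHits =
        (selected > total - withWord) ? selected - (total - withWord) : 0;
    assert(observed <= maxHits);
    double const logDenominator = logChoose(total, selected);
    double p = 0.0;
    for (unsigned i = std::max(observed, minHits); i <= maxHits; ++i)
        p += std::exp(logChoose(withWord, i)
                      + logChoose(total - withWord, selected - i)
                      - logDenominator);
    return std::min(p, 1.0);  // guard against rounding slightly above 1
}
```

A small p-value for the seed-site 6-mer suggests that the word is overrepresented among downregulated genes. In practice many words are tested, so a correction for multiple testing would be needed before interpreting the result.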

Possible continuation as Bachelor's Project

  • Build a SeqAn tool based on [1] that can be integrated into a KNIME workflow.
  • (For an ambitious student) Add an analysis module for CRISPR-Cas9 off-target effects. You will have to study structural variation, which is very complex but another exciting subject in genome analysis.


[1] van Dongen et al., Detecting microRNA binding and siRNA off-target effects from expression data, Nature Methods, 2008
Topic revision: r1 - 08 Feb 2017, kjk