Over the past few years it has been observed that a large portion of the genome is transcribed at some point in some tissue, and that most of the transcripts detected in mammals and other complex organisms are non-coding RNAs (ncRNAs), i.e. transcripts that do not encode proteins. Although the functional roles of the different ncRNA classes are not yet fully understood, this does not mean that they carry no information or have no function.
Recently, ncRNAs have emerged as key players in several biological processes and diseases. In our group we are interested in the functional mechanisms of non-coding RNAs that act as regulators of gene expression, and in their interplay with Transcription Factors (TFs), epigenetic marks, genetic variants and RNA Binding Proteins (RBPs). High-throughput genomics experiments provide a rich source of information, but making sense of such large and heterogeneous data requires data-integration pipelines as well as adequate statistical models and algorithms.
We are a young and heterogeneous research group interested in unraveling the ‘secrets’ of non-coding RNAs by means of bioinformatics methods. Our ultimate goal is to help narrow the huge gap between the number of annotated non-coding RNAs and the number of non-coding RNAs with a known function in the cell. Our preferred tools are supervised (regression models, SVMs, trees, NNs) as well as unsupervised (clustering, HMMs) machine learning methods, which we use to model diverse genomic data in order to answer questions such as:
- What is the role of Single Nucleotide Polymorphisms and epigenetic marks in microRNA regulation, and what are the implications for cancer?
- How many non-coding RNAs are actually ‘functional’ and how many are ‘transcriptional noise’?
- Do lincRNAs participate in gene-regulatory networks and how do they interact with their target genes?
- How do RNA Binding Proteins modulate RNA function in different biological contexts?
- How can we confidently find non-coding RNA and RBP biomarkers in immune-related processes?
Transcriptional regulation of microRNAs
We are currently working on an updated version of the PROmiRNA software (see previous projects) to predict microRNA promoters from high-depth FANTOM5 data. Knowing the location of microRNA promoters genome-wide enables us to model and study different aspects of microRNA regulation by means of diverse statistical models. For example, it allows us to model microRNA expression based on the location and features of genetic variants and histone modifications. In collaboration with the group of Dr. Marie Laure Yaspo at MPI Molgen, Lisa Barros, a PhD student in our lab, is applying such models to cancer patient data in order to discover potential biomarkers of colorectal cancer among microRNA-related features.
Small RNAs in the pathogenesis of bacterial and viral infections
As part of the SFB-TR84 consortium (http://www.sfb-tr84.de/) we are collaborating with the experimental group of Prof. Bernd Schmeck at the Institute for Lung Research at the Philipps University of Marburg to characterize the regulation of microRNAs during infection with L. pneumophila, an intracellular pathogen that causes severe pneumonia. Our preliminary results show a time-dependent de-regulation of several human miRNAs, as well as of bacterial RNAs, over the time course of infection. A post-doc in our lab, Brian Caffrey, integrates different types of high-throughput data (e.g. RNA-seq, ChIP-seq and proteomics experiments) in order to reconstruct the small RNA-mediated regulatory network describing the host-pathogen cross-talk during the inflammation process.
Protein-RNA interaction and footprints from CLIP-seq data
RNAs do not act alone to perform their function but often associate with several RNA-binding proteins, which determine their fate or mechanism of action. Interactions of RNAs with RNA-binding proteins can be detected via technologies such as CLIP-seq. This technology is relatively new, and so far few methods have been developed that reliably identify binding sites above noise when appropriate controls are available. Sabrina Krakau, a PhD candidate in our lab, is designing and implementing an algorithm to reliably identify RBP binding sites from high-throughput CLIP-seq experiments, taking into account several sources of technical and biological bias.
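To make the idea of calling binding sites "above noise" against a control concrete, here is a deliberately simple sketch. It is not the algorithm being developed in the lab: it assumes read counts per genomic window from one CLIP-seq sample and one control sample, scales the control to the CLIP library size, and flags windows whose CLIP count is unlikely under a Poisson background. The function names, the pseudocount and the p-value cutoff are all hypothetical choices made for illustration.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0); underflows for very large lam
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def enriched_windows(clip, control, pseudocount=1.0, p_cutoff=0.01):
    """Return (window index, p-value) for windows enriched over the control.

    clip and control are equal-length lists of read counts per window.
    The control is rescaled to the CLIP library size (an assumption of
    this sketch), and a pseudocount keeps the expected count above zero.
    """
    scale = sum(clip) / max(sum(control), 1)
    hits = []
    for i, (c, b) in enumerate(zip(clip, control)):
        expected = scale * (b + pseudocount)
        p = poisson_sf(c, expected)
        if p < p_cutoff:
            hits.append((i, p))
    return hits


# Toy counts: only the third window carries a strong CLIP signal.
clip_counts = [2, 3, 40, 2, 3]
control_counts = [2, 3, 3, 2, 3]
for index, p_value in enriched_windows(clip_counts, control_counts):
    print(index, p_value)
```

A real method would additionally need multiple-testing correction and explicit handling of the technical and biological biases mentioned above, which this sketch ignores.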
Together with our experimental partners Dr. Thomas Conrad and Dr. Ulf Orom from the long ncRNA research group at MPI Molgen, we will use Sabrina’s method to characterize the function and binding partners of newly discovered RNA Binding Proteins investigated in the lab.
RNA Binding Proteins (RBPs) influence RNA fate and processing in several ways and have been shown to be involved in many human diseases, such as neurological disorders and cancer. In lower organisms, such as bacteria, RBPs might contribute to pathogenicity in host cells by ‘mimicking’ host-specific RNA-binding mechanisms. In two different projects we use machine learning methods to predict the sequence and structure specificity of RBPs in different organisms (Annkatrin Bressin, master's student), as well as the specific RNA motifs recognized by a particular protein (David Heller, master's student). Collaborations: Dr. Benedikt Beckmann, IRI Humboldt University, and Dr. Ralf Krestel, Hasso-Plattner-Institut Potsdam.
Long non-coding RNAs: functional classification and involvement in gene-regulatory networks
Stefan Budach, a PhD student in our lab, is establishing a comprehensive functional classification of known and novel long intergenic RNAs (lincRNAs). Most lincRNAs show little evidence of evolutionary conservation, so their function cannot be inferred from sequence homology alone, as is often done for protein-coding genes. In collaboration with Knut Reinert’s group at the FU Berlin we are extending previous RNA motif finding algorithms to cluster lincRNAs into functional groups based on structural motifs, given that structures evolve more slowly than sequences.
Recent studies have reported enhancer functions of long non-coding RNAs, pointing to active transcription of previously identified enhancers. Several long non-coding RNAs have been shown to exhibit expression values correlated with those of ‘selected’ protein-coding genes, which allows us to infer a direct or indirect association between them. In addition, long ncRNAs have been shown to physically connect the genomic regions of regulated genes, thereby mediating gene activation or enhancer function through direct 3D chromatin interactions. We use sparse regression methods as well as network analysis of chromatin conformation data to prioritize significant gene-lincRNA interactions and identify biologically significant regulatory modules (Collaboration: Prof. Heike Siebert, Dr. Natasa Conrad, FU Berlin).
An interesting case study for us is the interaction network mediated by the long non-coding RNA Xist. In collaboration with the lab of Dr. Edda Schulz at MPI Molgen, we apply machine learning methods to understand important features of Xist-mediated gene silencing.
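As an illustration of how sparse regression can prioritize gene-lincRNA interactions, the following minimal sketch fits a lasso model by cyclic coordinate descent, with the expression of one gene as the response and the expression of candidate lincRNAs as predictors. It is a generic textbook lasso under stated assumptions (centred data, no intercept), not the group's actual pipeline; the function names, the toy data and the penalty value are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n) * ||y - Xw||^2 + alpha * ||w||_1.

    X is a list of rows (samples) and y a list of responses. Both are
    assumed to be centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw because w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Covariance of predictor j with the residual that excludes w[j].
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)
            ) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Toy data: 4 samples, 3 candidate lincRNAs (columns), already centred.
lincrna_expr = [
    [1.0, 1.0, 1.0],
    [-1.0, 1.0, -1.0],
    [1.0, -1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
# Target gene driven mainly by the first lincRNA, weakly by the third.
gene_expr = [2.05, -2.05, 1.95, -1.95]
print(lasso_coordinate_descent(lincrna_expr, gene_expr, alpha=0.1))
```

On this toy input only the first candidate keeps a non-zero coefficient: the L1 penalty removes the weak contribution of the third lincRNA, which is the behaviour that makes the approach useful for ranking candidate interactions.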
In my previous work, I developed a semi-supervised machine learning method for miRNA promoter recognition called PROmiRNA. Applying PROmiRNA to the human genome allowed us, for the first time, to study the characteristics of regulatory elements and transcription factors of different miRNA promoter classes.
PROmiRNA website http://promirna.molgen.mpg.de/
Global mature miRNA expression is regulated not only at the transcriptional level: several post-transcriptional steps also influence the final miRNA expression level. In our previous work, together with our experimental partners from the group of Dr. Ulf Orom, we defined a quantitative measure of miRNA processing from RNA-Seq data and built a classification model to discriminate efficient from non-efficient processing based on sequence features, i.e. specific and degenerate k-mers.
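The kind of sequence features mentioned above can be sketched as follows. This is a minimal illustration rather than the published feature set: it counts overlapping k-mers in an RNA sequence and treats degenerate k-mers as patterns written in the standard IUPAC nucleotide codes. The function names and the example patterns are hypothetical; the resulting count vectors could be passed to any classifier.

```python
from collections import Counter

# Standard IUPAC nucleotide codes, written for the RNA alphabet.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU",
    "K": "GU", "M": "AC", "B": "CGU", "D": "AGU",
    "H": "ACU", "V": "ACG", "N": "ACGU",
}


def normalise(seq):
    """Upper-case the sequence and convert DNA-style T to U."""
    return seq.upper().replace("T", "U")


def kmer_counts(seq, k):
    """Count every overlapping k-mer of length k in seq."""
    seq = normalise(seq)
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def degenerate_count(seq, pattern):
    """Count overlapping windows of seq that match an IUPAC pattern."""
    seq, pattern = normalise(seq), normalise(pattern)
    k = len(pattern)
    return sum(
        all(base in IUPAC[code] for base, code in zip(seq[i:i + k], pattern))
        for i in range(len(seq) - k + 1)
    )


def feature_vector(seq, patterns):
    """One count per pattern; a specific k-mer is a pattern without ambiguity codes."""
    return [degenerate_count(seq, p) for p in patterns]


# Hypothetical patterns: one specific and one degenerate k-mer.
print(kmer_counts("UGUGU", 2))
print(feature_vector("UGUGU", ["UG", "GNG"]))
```

Treating specific k-mers as a special case of degenerate patterns keeps the feature extraction to a single code path, at the cost of scanning the sequence once per pattern.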
Annalisa Marsico, Matthew R Huska, Julia Lasserre, Haiyang Hu, Dubravka Vucicevic, Anne Musahl, Ulf Andersson Orom and Martin Vingron
PROmiRNA: a new miRNA promoter recognition method uncovers the complex regulation of intronic miRNAs. Genome Biology 2013
Thomas Conrad*, Annalisa Marsico*, Maja Gehre and Ulf Andersson Orom
Sequence-dependent Microprocessor activity regulates dynamics of miRNA biogenesis. Cell Reports 2014