Department of Computational Molecular Biology (Martin Vingron)
Computational Molecular Biology Research Groups
The field of transcriptional regulation has developed rapidly over the last few years. This is due to the wealth of whole-genome sequence data and functional genomics data on gene expression, DNA-binding proteins, and epigenetics that have become available (e.g., the ENCODE data). The group exploits these data to gain a better understanding of transcriptional regulation in eukaryotes. The main questions concern the identification of regulatory sequence motifs and the interplay between epigenetic marks and regulation. To this end, we develop methods and analyse specific data sets. The ultimate goal is to unravel biological networks and pinpoint the transcriptional mechanisms that may underlie these interactions.
The main focus of the group is the development of algorithms and tools for the analysis of next-generation sequencing (NGS) data. Recent advances in high-throughput sequencing technologies have led to an enormous increase in the amount of data generated. Furthermore, steadily improving and newly emerging technologies require continuous adaptation of existing software. In many cases, dedicated software has to be developed to handle and analyze the vast amount of sequencing data efficiently. While a few processing steps, such as quality control and read mapping, are largely independent of the sequencing application (e.g., ChIP-seq, re-sequencing), each application requires specific software to address the questions of interest.
Therefore, the group has put strong emphasis on setting up an efficient processing infrastructure that can handle sequencing data, even for large cohorts of samples, within a short time. As a basic requirement for mutation screening and transcriptome analysis, we developed and published comprehensive tool sets for variant detection and transcript expression analysis, respectively. Our processing and data management pipeline is an essential prerequisite for addressing questions in, e.g., cancer genomics or diagnostics, where large numbers of samples have to be analyzed. All algorithms developed in the group were optimized and validated experimentally in collaboration with the respective laboratories (Ropers, Yaspo). As a result of these close interactions, we have published not only new algorithms but also their application to large-scale projects that could not otherwise have been tackled.
The bioinformatics group develops methods and tools for the analysis and interpretation of biological data, predominantly high-throughput sequencing data, and for the subsequent interpretation of these data at the level of human interaction networks. The group published 44 scientific papers during the reporting period. In several national and international consortia, we apply these tools and resources to the study of human disease processes (e.g., cancer, renal disorders, and toxicology). The group's work is structured into three areas: (1) methods development, (2) resource development, and (3) applications to human diseases.
The long-term goal of our group is to understand how a single transcription factor can regulate vastly different sets of genes depending on the cell type, and to identify and study processes that influence the expression levels of individual genes.
We study transcriptional regulation using the glucocorticoid receptor (GR), a member of the steroid hormone receptor family. Upon hormone stimulation, GR binds to specific DNA sequences to regulate the expression of target genes. Although GR is expressed throughout the body, the genes it regulates and the genomic loci it binds show little overlap between cell types. Current efforts are aimed at investigating the role of sequence motifs and chromatin in cell-type-specific genomic binding and transcriptional regulation. Furthermore, we study signals involved in fine-tuning the expression levels of individual target genes, in particular the role of DNA as a ligand that allosterically modulates the activity of GR.
Unravelling the evolutionary forces responsible for variation in genomes within one species, or for divergence between two species, is a major scientific challenge. To date, the genomes of many species, and of many individuals within species, have been sequenced. This gives us an unprecedented opportunity to analyse these data quantitatively from an evolutionary perspective. Owing to advances in next-generation sequencing technologies and the availability of public databases, such analyses can now be performed with greater power and precision than before.
We use data on variation within one genome, together with comparative genomics, to learn more about the processes that shape the genomes of humans and other species. We investigate processes on short length scales, such as nucleotide substitutions, insertions, and deletions, and on long length scales, such as insertions of repetitive elements and duplications. Our analyses are complemented by studies of the mathematical underpinnings of models for nucleotide substitution and phylogeny, as well as by experimental approaches to studying selection in vitro.