Research

Our projects

Epigenetic gene regulation

CENTRE predicts enhancer–promoter interactions specific to each cell type. Here, a particular interaction receives a high score in GM12878 cells but a low score in K562 cells.

Epigenetic data such as histone modifications (HMs) help predict enhancers, as we demonstrated in earlier work, including our CRUP algorithm for predicting enhancers from HM ChIP-seq data. In a typical use case, a lab that needs to locate enhancers in a new cell type only has to generate three HM ChIP-seq datasets for the algorithm to produce a reliable enhancer prediction.

We subsequently approached the problem of predicting enhancer–promoter interactions in the same way. Can these interactions be predicted from only a small set of experimental data that a lab would need to produce for a new cell type? Our new CENTRE algorithm does exactly this. It combines publicly available DNA data with a relatively small set of cell-type-specific data. This set includes the same three HMs used by CRUP, plus RNA-seq data to describe gene activity. With this information, our algorithm predicts interacting enhancer–promoter pairs as well as – or better than – leading methods that rely on a wide array of input data and are therefore costlier to apply to a new cell type [Rapakoulia et al., Bioinformatics, 2023].

DNA accessibility, as measured, for example, by ATAC-seq, is another highly informative feature regarding the regulatory potential of a region. In fact, cell types are characterized equally by their DNA accessibility patterns and their gene expression profiles. While single cells can now be classified into their cell type using scRNA-seq, it remains challenging to determine cell type from scATAC-seq profiles. We developed scATAcat, an algorithm that infers a cell’s type from scATAC-seq data using reference cell types defined by their typical DNA accessibility profiles [Altay and Vingron, NAR Genomics and Bioinformatics, 2024]. We see this as a step toward identifying accessible regions that define cell-type identity and in which we expect to find regulatory signals driving specificity. Along these lines, we also collaborated with Prof. Petra Knaus (Freie Universität) to analyze accessibility profiles of cells under shear stress.

In ongoing work, we are integrating accessibility into enhancer prediction. This has led us to examine in detail the characteristics of enhancers close to promoters (“proximal enhancers”). Furthermore, we are developing machine learning algorithms that use epigenetic data together with transcription factor binding motifs to improve predictions of cell-type-specific transcription factor binding.

Trisevgeni Rapakoulia, Sara Lopez Ruiz de Vargas, Persia Akbari-Omgba, Verena Laupert, Igor Ulitsky, and Martin Vingron, "CENTRE: a gradient boosting algorithm for Cell-type-specific ENhancer-Target pREdiction," Bioinformatics 39 (11), btad687 (2023).

Aybuge Altay and Martin Vingron, "scATAcat: cell-type annotation for scATAC-seq data," NAR: genomics and bioinformatics 6 (4), Article lqae135 (2024).

Single-cell data analysis: clustering, visualization, batch correction

UMAP of a single-cell RNA-seq dataset with clusters indicated by color. Overlaid dots represent genes characteristic of each cluster. Some genes are labeled; in the interactive version, users can hover over the dots to view all gene names.

Motivated by collaborations involving single-cell transcriptomics, we developed methods to address problems that we found had received too little attention in the literature. In 2022, we introduced Association Plots [Gralinska et al., J. Mol. Biol., 2022; Gralinska and Vingron, J. R. Stat. Soc. Ser. C, 2023], which comprehensively visualize the genes associated with a particular cluster of cells. Genes are represented by dots: the further to the right a gene lies, the stronger its association with the cluster. These plots are based on the geometry of correspondence analysis, in which cell clusters and their respective marker genes lie along an axis emanating from the origin of the coordinate system.
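The geometric idea can be illustrated with a minimal sketch. It assumes that genes and the cluster centroid are already given as coordinate vectors in correspondence analysis space, and that the horizontal position of a gene is its projection onto the line from the origin through the centroid, while the vertical position is its distance from that line. The function name `association_plot_coords` and this exact choice of coordinates are illustrative assumptions, not the published implementation.

```python
import math


def association_plot_coords(gene_vec, cluster_centroid):
    """Return (x, y) for one gene relative to one cluster (illustrative only).

    x: length of the gene vector's projection onto the centroid direction
    y: distance of the gene from the line through origin and centroid
    """
    norm_c = math.sqrt(sum(c * c for c in cluster_centroid))
    if norm_c == 0:
        raise ValueError("cluster centroid must not coincide with the origin")
    x = sum(g * c for g, c in zip(gene_vec, cluster_centroid)) / norm_c
    squared_length = sum(g * g for g in gene_vec)
    # Guard against tiny negative values caused by floating-point rounding.
    y = math.sqrt(max(squared_length - x * x, 0.0))
    return x, y


# A gene lying exactly on the centroid axis has y == 0 and plots far right.
on_axis = association_plot_coords((2.0, 0.0, 0.0), (1.0, 0.0, 0.0))
off_axis = association_plot_coords((3.0, 4.0, 0.0), (1.0, 0.0, 0.0))
```

Under these assumptions, a gene with a large x and a small y points in nearly the same direction as the cluster, which matches the reading that genes further to the right are more strongly associated with it.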

Building on this idea, we developed CAbiNet, a tool that co-clusters cells and genes and presents the results so that the marker genes for a given cluster are visible within that cluster. The figure in this section shows an example of what we call a biMAP: a UMAP overlay colored by cell cluster, with marker genes embedded within the corresponding clusters [Zhao, Kohl, et al., NAR, 2024]. In the interactive version, users can mouse over the dots embedded in a cluster to see the corresponding gene name.

For each cluster, we can now also generate an Association Plot, allowing us to visualize both the quality of the cluster and how strongly its marker genes characterize it. In ongoing work, we are developing an alternative algorithm that visually conveys the decisions made during clustering. This approach largely overcomes the curse of dimensionality by making high-dimensional geometry intuitively understandable.

Elzbieta Gralinska, Clemens Kohl, Sokhandan Fadakar, and Martin Vingron, "Visualizing Cluster-specific Genes from Single-cell Transcriptomics Data Using Association Plots," Journal of Molecular Biology 434 (11), 167525 (2022).

Elzbieta Gralinska and Martin Vingron, "Association Plots: visualizing cluster-specific associations in high-dimensional correspondence analysis biplots," Journal of the Royal Statistical Society - Series C: Applied Statistics 72 (4), 1023-1040 (2023).

Yan Zhao, Clemens Kohl, Daniel Rosebrock, Qinan Hu, Yuhui Hu, and Martin Vingron, "CAbiNet: joint clustering and visualization of cells and genes for single-cell transcriptomics," Nucleic Acids Research 52 (13), Article e57 (2024).

Protein–protein interaction and intrinsically disordered regions

Motivated by discussions – and later collaboration – with Denes Hnisz’s group, we investigated whether the sequences of intrinsically disordered regions (IDRs) can inform us about which proteins physically interact. Predicting protein–protein interactions from sequence alone has been a longstanding challenge in bioinformatics. We set out to develop a machine learning approach aimed at predicting interactions based specifically on the IDRs of the participating proteins [Kibar and Vingron, Proteins, 2023].

During the development of this algorithm, we learned a few unexpected lessons. First, when comparing predictions based on entire protein sequences with those based only on the IDRs, we found that the IDR-based predictions were as good as – or even better than – those using full sequences. This confirms that IDRs play a key role in protein interactions. Second, during evaluation, we realized that the problem itself is generally ill defined in the literature. To address this, we proposed two distinct problem formulations: the “symmetric” and “asymmetric” cases. In the symmetric case, neither of the two sequences appears in the training set. In the asymmetric case, one of the two sequences does. As one anonymous reviewer noted: “This is an interesting and important study that adds significantly to the field … the authors are congratulated for their study and for the clear demarcation between the asymmetric and symmetric problems.” These new definitions help explain discrepancies in reported method performance and now allow the two cases to be addressed more systematically.
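The distinction between the two formulations can be made concrete with a small sketch that sorts test pairs by how many of their proteins already occur in the training data. The function name `classify_test_pairs` and the protein identifiers are hypothetical, and the third bucket, for pairs in which both proteins were seen during training, is an added assumption that the text above does not name.

```python
def classify_test_pairs(train_pairs, test_pairs):
    """Group test pairs by how many member proteins appear in training."""
    seen = {protein for pair in train_pairs for protein in pair}
    buckets = {"symmetric": [], "asymmetric": [], "both_seen": []}
    for a, b in test_pairs:
        n_seen = (a in seen) + (b in seen)
        if n_seen == 0:
            # Neither sequence occurs in the training set.
            buckets["symmetric"].append((a, b))
        elif n_seen == 1:
            # Exactly one of the two sequences occurs in the training set.
            buckets["asymmetric"].append((a, b))
        else:
            # Assumption: kept apart, since this case is not defined above.
            buckets["both_seen"].append((a, b))
    return buckets


train = [("P1", "P2"), ("P2", "P3")]
test = [("P4", "P5"), ("P1", "P4"), ("P1", "P3")]
result = classify_test_pairs(train, test)
```

Reporting performance separately for each bucket is one way to keep results for the two formulations from being mixed in a single number.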

Building on our shared interest with the Hnisz lab in IDRs, we also developed a novel algorithm to predict sequences that may constitute IDRs. Starting from the observation that aromatic side chains in IDRs often follow a near-periodic spacing, we modeled this feature by measuring the “non-randomness” of that spacing. Specifically, we collected the distances between consecutive aromatic residues and compared their distribution to a random expectation modeled by a Poisson process. In theory, the random distances should follow a geometric distribution. The deviation between the observed and expected distributions is quantified using a Kolmogorov–Smirnov test. This simple approach works surprisingly well and has allowed us to rapidly screen large protein sequence datasets for potential IDRs. For the full study, see [Naderi et al., Nature Cell Biology, 2024].
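As a rough illustration of this idea, the sketch below collects the distances between consecutive aromatic residues and computes the Kolmogorov–Smirnov distance between their empirical distribution and a geometric distribution. Several details are assumptions made for the example only: the aromatic set (F, W, Y), the estimate of the per-position aromatic frequency `p` from the sequence itself, and the function names. No p-value is computed, because the standard Kolmogorov–Smirnov tables do not apply directly to discrete distributions with an estimated parameter.

```python
AROMATIC = frozenset("FWY")  # assumption: Phe, Trp, Tyr; the study may differ


def aromatic_gaps(seq):
    """Distances between consecutive aromatic residues."""
    positions = [i for i, aa in enumerate(seq.upper()) if aa in AROMATIC]
    return [b - a for a, b in zip(positions, positions[1:])]


def ks_distance_to_geometric(gaps, p):
    """Largest absolute difference between empirical and geometric CDFs.

    With aromatic residues placed independently at frequency p, a gap of
    d >= 1 has probability (1 - p)**(d - 1) * p, so the CDF is 1 - (1 - p)**d.
    """
    if not gaps:
        raise ValueError("need at least two aromatic residues")
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    n = len(gaps)
    worst = 0.0
    # Both CDFs are step functions on the integers, so checking every
    # integer up to the largest observed gap finds the maximum.
    for d in range(1, max(gaps) + 1):
        empirical = sum(g <= d for g in gaps) / n
        expected = 1.0 - (1.0 - p) ** d
        worst = max(worst, abs(empirical - expected))
    return worst


def spacing_score(seq):
    """Higher values indicate spacing that deviates more from random."""
    p = sum(aa in AROMATIC for aa in seq.upper()) / len(seq)
    return ks_distance_to_geometric(aromatic_gaps(seq), p)


periodic_score = spacing_score("FAAAA" * 10)  # perfectly periodic toy sequence
```

In this toy sequence every gap equals 5, so the empirical distribution is a single step and lies far from the smooth geometric expectation. Sequences with irregular aromatic spacing would be expected to score lower.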

Gözde Kibar and Martin Vingron, "Prediction of protein–protein interactions using sequences of intrinsically disordered regions," Proteins: Structure, Function, and Bioinformatics 91 (7), 980-990 (2023).

The role of PHF13 in chromatin structure

Sarah Kinkley

From 2019 to 2024, Sarah Kinkley led a lab in our department focused on understanding the regulation of chromatin architecture and genome integrity. One of the group’s interests was exploring the impact and function of specific H3K4me3 epigenetic readers – namely the paralogs PHF13 and PHF23 – on epigenome regulation, chromatin structure, and genome integrity. The group also aimed to decipher the role of RNA–DNA hybrids (R-loops) as potential drivers of oncogenesis. To this end, they performed various screens to assess the incidence of R-loops, their role as a precursor state in oncogenesis, and their involvement in synthetic lethality.

PHF13 is an H3K4me3 epigenetic reader. Having developed CRISPR knock-out, degron, tagged, and inducible cell lines, the group examined the functional domains of this protein and how they affect PHF13’s genomic functions. The group found that PHF13 can oligomerize in two distinct ways: one via its ordered N-terminal and C-terminal domains, and another via its intrinsically disordered regions (IDRs). This differential oligomerization promoted PHF13 phase transitions, influencing its role in gene regulation and higher-order chromatin compaction.

Oligomerization via its ordered regions resulted in a multivalent, ordered chromatin protein that could extend across nucleosomes, drive global chromosome compaction, and promote strong changes in gene expression – consistent with polymer–polymer phase separation. Oligomerization via PHF13’s IDRs promoted the formation of condensates similar to liquid-like phase separation and also influenced gene expression, albeit targeting different genes and with a weaker amplitude [Rossi et al., Nucleic Acids Research, 2025].

Another major interest of the group is to decipher the role of R-loops (RNA–DNA hybrids) as potential drivers of oncogenesis. RNA–DNA hybrids are highly genotoxic when aberrantly formed or inefficiently resolved. As a result, cells have evolved many dedicated enzymes and mechanisms to limit their formation and eliminate these structures. However, many of the factors regulating these structures are frequently mutated or disrupted in cancer, suggesting that RNA–DNA hybrids may represent a common precursor state to oncogenesis.

Unfortunately, there is a lack of tools for high-throughput, in vivo visualization of these structures, hampering studies aimed at exploring these questions. To address this deficit, we are developing high-precision tools that enable comprehensive, real-time, genome-wide visualization of RNA–DNA hybrids. Using these tools, we aim to perform a series of screens to investigate the incidence of R-loops, their role as a precursor to oncogenesis, and their involvement in synthetic lethality.

Francesca Rossi, Alexandre P. Magalhães, René Buschow, Tobias Schubert, Laura Viola Glaser, Andrea Fontana, Julia Mai, Hannah Staege, Astrid Grimme, Hans Will, Sabrina Schriener, Denes Hnisz, Martin Vingron, Andrea M. Chiariello, and Sarah Kinkley, "Differential oligomerization regulates PHF13 chromatin affinity and function," Nucleic Acids Research 53 (12) (2025).

Rupam Choudhury, Anuroop Venkateswaran Venkatasubramani, Jie Hua, Marco Borsò, Celeste Franconi, Sarah Kinkley, Ignasi Forné, and Axel Imhof, "The role of RNA in the maintenance of chromatin domains as revealed by antibody-mediated proximity labelling coupled to mass spectrometry," eLife 13, Article e95718 (2024).

Ranjan Kumar Maji, Beate Czepukojc, Michael Scherer, Sascha Tierling, Cristina Cadenas, Kathrin Gianmoena, Nina Gasparoni, Karl Nordström, Gilles Gasparoni, Stephan Laggai, Xinyi Yang, Anupam Sinha, Peter Ebert, Maren Falk-Paulsen, Sarah Kinkley, Jessica Hoppstädter, Ho-Ryun Chung, Philip Rosenstiel, Jan G. Hengstler, Jörn Walter, Marcel Holger Schulz, Sonja M. Kessler, and Alexandra K. Kiemer, "Alterations in the Hepatocyte Epigenetic Landscape in Steatosis," Epigenetics & Chromatin 16, Article 30 (2023).

Regulatory changes in evolutionary genomics

Stefan Haas

A cluster of ovotestis-specific enhancers (E1–E5) within the regulatory domain of the transcription factor SALL1 in the mole. Genome-wide analysis of enhancer activity and putative target gene expression identifies SALL1 as the top candidate gene in ovotestis.

Clade-specific genomic rearrangements are an important mechanism in evolution: by restructuring regulatory domains, they contribute to novel clade-specific phenotypes. In collaboration with the Mundlos group, we previously identified major regulatory genomic changes in moles linked to the mole-specific development of female ovotestis [Schindler et al., Development, 2023]. In a complementary approach, we used CRUP to analyze enhancer activity and putative target genes in gonadal tissues of moles and mice, based on the idea that a clade-specific phenotype is accompanied by multiple regulatory adaptations that support its robust development.

We screened for regulatory units with an increased number of ovotestis-specific enhancers potentially regulating development-related genes. In doing so, we discovered the TAD of the transcription factor SALL1, which is expressed specifically in mole ovotestis but not in mouse gonads. Additionally, the regulatory domain of SALL1 contains five strongly ovotestis-specific enhancers, four of which drive metanephros-specific expression in moles only. Intriguingly, these enhancers are widely conserved in mice as well; however, their activity in distinct tissues only partially overlaps with that in moles. This project shows that, during evolution, an entire group of enhancers can become functionally rewired to a new tissue context while still maintaining functional similarity in other tissues across clades.

M. Schindler, M. Osterwalder, I. Harabula, L. Wittler, A. C. Tzika, Dina K. N. Dechmann, M. Vingron, A. Visel, S. A. Haas, and F. M. Real, "Induction of kidney-related gene programs through co-option of SALL1 in mole ovotestes," Development 150 (17), dev201562 (2023).
