Research

Our projects

Network-based genome analysis

Some of our recent projects are described below.

Figure 1: NetCore workflow. (1) Data initialization. (2) Network propagation using node coreness. (3) Semisupervised module identification, integrating network propagation results with the seed gene list.

The Herwig Lab maintains ConsensusPathDB, a meta-database of molecular interactions in human, as well as mouse and yeast, integrated from 33 public resources [Kamburov and Herwig, Nucleic Acids Research, 2022]. The integrated protein-protein interaction (PPI) network serves as a scaffold for genome analysis. On this basis we developed NetCore, a mathematical framework based on random walk with restart, to analyze experimental data at the network level (Fig. 1) [Barel and Herwig, Nucleic Acids Research, 2020]. Unlike traditional approaches that rely on node degree, NetCore uses node coreness for re-ranking, which makes it particularly robust against biases in PPI experiments.
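The idea of propagating seed scores with a coreness-weighted random walk with restart can be illustrated with a minimal sketch. This is not the NetCore implementation: the function names, the toy network, the restart probability of 0.5, and the specific normalization (each edge into node i weighted by the coreness of i, with columns rescaled to sum to one) are assumptions chosen for illustration.

```python
import numpy as np

def core_numbers(adj):
    """Coreness (k-core number) of every node via iterative peeling."""
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.get)  # node with the lowest remaining degree
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core

def propagate(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on a coreness-normalized adjacency matrix.

    Assumes an undirected network without isolated nodes.
    """
    nodes = sorted(adj)
    idx = {v: i for i, v in enumerate(nodes)}
    core = core_numbers(adj)
    W = np.zeros((len(nodes), len(nodes)))
    for j in nodes:
        total = sum(core[k] for k in adj[j])
        for i in adj[j]:
            # Assumed normalization: step j -> i is proportional to coreness of i.
            W[idx[i], idx[j]] = core[i] / total
    p0 = np.array([1.0 if v in seeds else 0.0 for v in nodes])
    p0 /= p0.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = restart * p0 + (1.0 - restart) * (W @ p)
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break
    return dict(zip(nodes, p))

# Hypothetical toy network: a triangle (a, b, c) with a pendant node d.
ppi = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
scores = propagate(ppi, seeds={"d"})
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 4))
```

In this toy network a walker leaving node c moves to the pendant node d with probability 0.2 rather than the 1/3 that degree normalization would give, because d has lower coreness than a and b.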

We applied NetCore in collaboration with the German Diabetes Center Düsseldorf, analyzing time-resolved phosphorylation data to study insulin stimulation response in human muscle, identifying kinase network modules as insulin targets [Turewicz et al., Nature Communications, 2025]. With the Institute Pasteur Tunis, we examined dynamic gene expression data to track network changes in susceptible vs. resistant mice during Leishmania major infection [Bouabid et al., Frontiers in Immunology, 2023]. Internally, we collaborate with the Metzger Lab, using network propagation to integrate transcriptomic and proteomic data from diverse pig breeds, aiming to identify pathways and network modules influencing body size.

Atanas Kamburov and Ralf Herwig, "ConsensusPathDB 2022: molecular interactions update as a resource for network biology," Nucleic Acids Research 50 (D1), D587-D595 (2022).

Gal Barel and Ralf Herwig, "NetCore: a network propagation approach using node coreness," Nucleic Acids Research 48 (17), e98 (2020).

Michael Turewicz, Christine Skagen, Sonja Hartwig, Stephan Majda, Kristina Thedinga, Ralf Herwig, Christian Binsch, Delsi Altenhofen, D. Margriet Ouwens, Pia Marlene Förster, Thorsten Wachtmeister, Karl Köhrer, Torben Stermann, Alexandra Chadt, Stefan Lehr, Tobias Marschall, G. Hege Thoresen, and Hadi Al-Hasani, "Temporal phosphoproteomics reveals circuitry of phased propagation in insulin signaling," Nature Communications 16 (1), Article 1570 (2025).

Software development for long-read transcriptome sequencing (LRTS)

Figure 2: (A) IsoTools 2.0 workflow. (B) Gene structure model in a simplex view. Genes are mapped into different simplex regions based on variations in transcription start sites, exon chains, and polyadenylation sites. When comparing two cell types (e.g., endothelial and H1 stem cells), genes exhibiting positional shifts in the simplex can be identified. (C) Domain view of two transcripts, EDF1-0 and EDF1-3, relative to reference transcripts EDF1-203 and EDF1-201. Alternative last exon usage (red mark) in EDF1-3 results in the loss of the HTH cro/C1-type domain.

Our lab developed IsoTools (Fig. 2A), a comprehensive pipeline for mapping, annotation, and statistical analysis of third-generation PacBio Iso-Seq and Oxford Nanopore LRTS data. The package includes data quality control steps, isoform identification and quantification, detection of (coordinated) splicing events, and statistical tests for differential splicing [Lienhard et al., Bioinformatics, 2023]. With IsoTools we participated in LRGASP, an international challenge on long-read transcript identification and quantification organized by the GENCODE consortium, in which the software ranked among the top-performing tools in isoform quantification [Pardo-Palacios et al., Nature Methods, 2024]. We recently enhanced IsoTools with new features, including transcription start site detection from long reads, gene model statistics such as entropy, improved visualization components, and functional annotation using protein domains (Fig. 2B, C) [Bi et al., J Mol Biol, 2025].
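To make the entropy statistic concrete, the following sketch computes the Shannon entropy of isoform usage for a single gene from per-isoform read counts. It is an illustration only and does not use the IsoTools API; the function name, the optional normalization by the number of isoforms, and the read counts are assumptions, and the exact definition used in IsoTools may differ.

```python
import math

def isoform_entropy(read_counts, normalize=True):
    """Shannon entropy (in bits) of isoform usage within one gene.

    read_counts: read counts per isoform (hypothetical input format).
    normalize:   if True, divide by log2(number of isoforms) so the result
                 lies in [0, 1]; this normalization is an assumption.
    """
    total = sum(read_counts)
    if total == 0:
        return 0.0
    fractions = [c / total for c in read_counts if c > 0]
    entropy = -sum(f * math.log2(f) for f in fractions)
    if not normalize:
        return entropy
    n_isoforms = len(read_counts)
    return entropy / math.log2(n_isoforms) if n_isoforms > 1 else 0.0

# Hypothetical genes: one dominated by a single isoform, one with balanced usage.
dominated = [95, 3, 2]
balanced = [40, 30, 30]
print(isoform_entropy(dominated), isoform_entropy(balanced))
```

A gene expressed almost entirely from one isoform scores close to 0, while balanced usage of several isoforms scores close to 1 after normalization.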

F. J. Pardo-Palacios, D. Wang, F. Reese, M. Diekhans, S. Carbonell-Sala, B. Williams, J. E. Loveland, M. De Maria, M. S. Adams, G. Balderrama-Gutierrez, A. K. Behera, J. M. Gonzalez, T. Hunt, J. Lagarde, C. E. Liang, H. Li, M. Jerryd Meade, D. A. Moraga Amador, A. D. Prjibelski, I. Birol, H. Bostan, A. M. Brooks, M. Hasan Celik, Y. Chen, M. R. M. Du, C. Felton, J. Goke, S. Hafezqorani, R. Herwig, H. Kawaji, J. Lee, J. Liang Li, M. Lienhard, A. Mikheenko, D. Mulligan, K. Ming Nip, M. Pertea, M. E. Ritchie, A. D. Sim, A. D. Tang, Y. Kei Wan, C. Wang, B. Y. Wong, C. Yang, I. Barnes, A. Berry, S. Capella, N. Dhillon, J. M. Fernandez-Gonzalez, L. Ferrandez-Peral, N. Garcia-Reyero, S. Goetz, C. Hernandez-Ferrer, L. Kondratova, T. Liu, A. Martinez-Martin, C. Menor, J. Mestre-Tomas, J. M. Mudge, N. G. Panayotova, A. Paniagua, D. Repchevsky, E. Rouchka, B. Saint-John, E. Sapena, L. Sheynkman, M. Laird Smith, M. M. Suner, H. Takahashi, I. A. Youngworth, P. Carninci, N. D. Denslow, R. Guigo, M. E. Hunter, H. U. Tilgner, B. J. Wold, C. Vollmers, A. Frankish, K. Fai Au, G. M. Sheynkman, A. Mortazavi, A. Conesa, and A. N. Brooks, "Systematic assessment of long-read RNA-seq methods for transcript identification and quantification," Nature Methods 21 (7), 1349-1363 (2024).

Yalan Bi, Tom Lukas Lankenau, Matthias Lienhard, and Ralf Herwig, "IsoTools 2.0: Software for Comprehensive Analysis of Long-read Transcriptome Sequencing Data," Journal of Molecular Biology, Article 169049 (2025).

Effects of splicing factor mutations in cancer patients

Figure 3: (A) Distribution of SF3B1 mutations in CLL and MDS patient samples used for Iso-Seq, with each dot representing a mutated sample. SF3B1 is shown as the major isoform expressed, with the full splice match (FSM). (B) Significantly altered (Q-value < 0.01) 3′ASs separate samples by SF3B1 mutations in leukemia cell lines, as well as CLL and MDS patients, based on longer variant PSI values. (C) Distribution of 3′AS distances to the canonical splice site. Negative distances indicate alternatives located upstream, while positive values indicate alternatives located downstream, resulting in a shorter exon. The blue line represents proportion, green indicates the total number of 3′ASs, and dotted vertical lines show the enriched region (12–21 nt upstream of the canonical AG).

In a project funded by the German Research Foundation (DFG), we collaborated with groups at the universities of Cologne and Frankfurt to apply long-read transcriptome sequencing (LRTS) to investigate aberrant splicing induced by mutations in the splicing factor SF3B1 in chronic lymphocytic leukemia (CLL) and myelodysplastic syndrome (MDS) patients (Fig. 3A). IsoTools analyses revealed that SF3B1 hot-spot mutations specifically impact 3′ alternative splicing (3′AS; Fig. 3B) and identified a preferential selection of alternative splice sites located 12 to 21 nt upstream of the canonical splice site (Fig. 3C).
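The two quantities behind these results, the PSI of the longer variant and the signed distance of an alternative 3′ splice site to the canonical one, can be sketched as follows. This is a minimal illustration rather than the IsoTools implementation: the function names, the coordinate handling, and the example numbers are assumptions. The sign convention follows Figure 3C (negative means upstream).

```python
import math

def psi_longer_variant(long_reads, short_reads):
    """Percent spliced in: fraction of reads supporting the longer variant."""
    total = long_reads + short_reads
    return long_reads / total if total > 0 else math.nan

def signed_distance(alt_pos, canonical_pos, strand):
    """Distance of an alternative 3' splice site to the canonical site.

    Negative values are upstream in transcript orientation, so genomic
    coordinates are mirrored for genes on the minus strand.
    """
    d = alt_pos - canonical_pos
    return d if strand == "+" else -d

def in_enriched_window(distance, nearest=12, farthest=21):
    """True if the site lies 12 to 21 nt upstream of the canonical site."""
    return -farthest <= distance <= -nearest

# Hypothetical event on the minus strand: the alternative site has a larger
# genomic coordinate, which places it upstream in transcript orientation.
d = signed_distance(alt_pos=1015, canonical_pos=1000, strand="-")
print(d, in_enriched_window(d), psi_longer_variant(30, 10))
```

An alternative 3′ splice site upstream of the canonical one lengthens the exon, so reads using it count toward the longer variant in the PSI.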

Enhancing plausibility of machine learning predictions in biomedical applications

Figure 4: Workflow of the “Predict and Propagate” approach, which integrates machine learning predictions with network propagation to enhance biological plausibility in otherwise non-interpretable results. The method is illustrated using XGBoost for survival prediction.

Our lab participated in the AI initiative of the Federal Ministry of Education and Research (BMBF), coordinating a project to improve machine learning for biomedical applications by integrating biological background knowledge and leveraging methods that enable post hoc interpretability of machine learning methods.
For patient risk prediction using molecular data, we developed “Predict and Propagate,” an approach combining tree learning with XGBoost and subsequent network propagation of the learned features using NetCore. This method generates plausible network modules from otherwise non-interpretable machine learning methods (Fig. 4) [Thedinga and Herwig, 2021, 2022]. Using TCGA gene expression data from over 10,000 patients across 25 cancer types, we found that XGBoost ensemble tree learning outperforms classical decision trees and support vector machines. Additionally, pan-cancer training yielded better results than training on single cancer cohorts, identifying predictive biomarkers for survival across multiple cancer types. By integrating network propagation with machine learning, we further demonstrated that the tumor microenvironment is highly predictive for pan-cancer survival.

Kristina Thedinga and Ralf Herwig, "A gradient tree boosting and network propagation derived pan-cancer survival network of the tumor microenvironment," iScience 25 (1), 103617 (2021).

Kristina Thedinga and Ralf Herwig, "Gradient tree boosting and network propagation for the identification of pan-cancer survival networks," STAR Protocols 3 (2), 101353 (2022).

Deep learning models for drug response predictions

Figure 5: (A) Pre-training on in vitro drug sensitivity data enhances performance across different DNN models when applied to patient-derived organoids with limited sample sizes. X-axis: number of samples; Y-axis: correlation between predictions and ground truth. (B) Genes identified from pre-trained models exhibit higher functional relevance. X-axis: top genes from different drug models; Y-axis: pathway enrichment score. (C) Drug synergy network comprising 82 drugs and 1,334 binary drug-drug interactions derived from drug triplet analysis.

In cooperation with machine learning experts at the University of Potsdam, we explored deep neural network (DNN) architectures and transfer learning for drug response predictions. This approach leverages the vast amount of in vitro drug sensitivity data for pre-training, enabling more accurate predictions of drug sensitivity in preclinical models such as PDXs, PDOs, and ex vivo patient tissues, where available data is typically limited [Prasse et al., 2022a, 2022b]. Recently, we developed a deep learning method based on transformer technology to predict drug combinations of varying sizes, with a particular focus on triplet drug combinations [Campana et al., submitted to Briefings in Bioinformatics, 2025].

Paul Prasse, Pascal Iversen, Matthias Lienhard, Kristina Thedinga, Chris Bauer, Ralf Herwig, and Tobias Scheffer, "Matching anticancer compounds and tumor cell lines by neural networks with ranking loss," NAR Genomics and Bioinformatics 4 (1), lqab128 (2022).

Paul Prasse, Pascal Iversen, Matthias Lienhard, Kristina Thedinga, Ralf Herwig, and Tobias Scheffer, "Pre-Training on In Vitro and Fine-Tuning on Patient-Derived Data Improves Deep Neural Networks for Anti-Cancer Drug-Sensitivity Prediction," Cancers 14 (16), 3950 (2022).
