Design of protein models with machine learning
Prof. Dr. Cecilia Clementi
Our group works on the definition and implementation of strategies to study complex biophysical processes on long timescales. Despite significant advances, our quantitative understanding of biological function at the molecular and cellular level is still in its infancy. Experimental and theoretical approaches to characterize macromolecular dynamics and function have evolved dramatically in the last few decades. However, experiment and computation have co-existed with limited feedback. On the one hand, simulations can, in principle, resolve details not accessible to experiment, but they are based on empirical models and, alone, cannot be quantitatively predictive. On the other hand, a wealth of indirect data on the structure and dynamics of macromolecular complexes is available from thermodynamic and kinetic measurements on parts of the systems of interest, but there is no systematic way to combine these data into a structural model. We design multiscale models, adaptive sampling approaches, and data analysis tools that allow exploring large regions of a system's free energy landscape. We use data-driven methods for systematic coarse-graining of macromolecular systems, to bridge molecular and cellular scales. We work on a theoretical formulation that exploits the complementary information obtained from simulation and experiment, combining the approximate but high-resolution structural and dynamical information from computational models with the “exact” but lower-resolution information available from experiments.
Project: Coarse graining enables the investigation of molecular dynamics for larger systems and at longer timescales than is possible at atomic resolution. However, a coarse-grained model must be formulated such that the conclusions we draw from it are consistent with the conclusions we would draw from a model at a finer level of detail. We have recently shown that it is possible to use a supervised machine learning framework to generate a coarse-grained force field, which can then be used for simulation in the coarse-grained space.
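To illustrate how learning a coarse-grained force field can be posed as a supervised problem, here is a minimal sketch based on force matching. Force matching is assumed here as one common formulation; the text above does not name the specific loss. The one-dimensional toy system, the polynomial potential, and the function name `fit_force_matching` are all hypothetical and chosen only for illustration. A neural network would take the place of the polynomial in a realistic setting.

```python
import numpy as np

def fit_force_matching(x, f, degree=4):
    """Fit U(x) = sum_{k=1..degree} theta_k * x**k by least squares on forces.

    x: coarse-grained coordinates, shape (n_samples,)
    f: instantaneous forces mapped onto the coarse-grained coordinate
    Returns the coefficients theta_1 .. theta_degree.
    """
    powers = np.arange(1, degree + 1)
    # Model force: F(x) = -dU/dx = -sum_k k * theta_k * x**(k - 1)
    design = -(powers * x[:, None] ** (powers - 1))
    theta, *_ = np.linalg.lstsq(design, f, rcond=None)
    return theta

# Toy data (assumption): a harmonic mean force with stiffness k_true, plus
# noise standing in for the degrees of freedom removed by coarse graining.
rng = np.random.default_rng(0)
k_true = 2.0
x = rng.normal(size=5000)
f = -k_true * x + rng.normal(scale=0.5, size=x.size)

theta = fit_force_matching(x, f)
# theta[1] multiplies x**2, so it should be close to k_true / 2.
print(theta)
```

Minimizing the squared error against noisy instantaneous forces recovers the mean force, which is why the quadratic coefficient approaches `k_true / 2` even though no individual sample follows the harmonic law exactly.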

Building upon these results, we are currently investigating the design of coarse-grained force fields that are transferable across molecular systems. We are using continuous-convolution graph neural network architectures and exploring the optimal mapping of atoms into coarse-grained sites.
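As a rough illustration of what a continuous convolution over coarse-grained sites can look like, the sketch below implements a single continuous-filter convolution layer in NumPy. It assumes a SchNet-style layer, in which the filter is a function of pairwise distances expanded in radial basis functions; the group's actual architecture is not specified here. The names `rbf_expand` and `continuous_filter_conv`, and the random weights, sizes, and cutoff, are hypothetical.

```python
import numpy as np

def rbf_expand(d, centers, gamma=10.0):
    """Expand distances d (...,) in Gaussian radial basis functions."""
    return np.exp(-gamma * (d[..., None] - centers) ** 2)

def continuous_filter_conv(features, positions, weights, centers, cutoff=1.0):
    """One continuous-filter convolution over coarse-grained sites.

    features:  (n_sites, n_channels) per-site feature vectors
    positions: (n_sites, 3) site coordinates
    weights:   (n_rbf, n_channels) linear map from basis functions to filters
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                # (n, n)
    filters = rbf_expand(dist, centers) @ weights       # (n, n, n_channels)
    # Keep neighbors inside the cutoff and exclude self-interaction.
    mask = (dist < cutoff) & ~np.eye(len(positions), dtype=bool)
    filters = filters * mask[..., None]
    # Each site sums its neighbors' features, weighted channel-wise.
    return np.einsum("ijc,jc->ic", filters, features)

rng = np.random.default_rng(1)
positions = rng.uniform(0.0, 1.0, size=(5, 3))
features = rng.normal(size=(5, 4))
centers = np.linspace(0.0, 1.0, 8)
weights = rng.normal(size=(8, 4))

out = continuous_filter_conv(features, positions, weights, centers)

# The layer depends on positions only through distances, so a rigid motion
# of the whole system leaves the output unchanged.
angle = 0.3
rot = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                [np.sin(angle),  np.cos(angle), 0.0],
                [0.0, 0.0, 1.0]])
moved = positions @ rot.T + np.array([1.0, -2.0, 0.5])
out_moved = continuous_filter_conv(features, moved, weights, centers)
```

Because the filter depends only on inter-site distances, the output is invariant to rotations and translations of the molecule, a property that is desirable for features feeding an energy model.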
This project is in collaboration with the Artificial Intelligence for the Sciences Group of Prof. Dr. Frank Noé.
For more information, visit the website of the Theoretical and Computational Biophysics Group.