# SFB 1114 | A04

**Freie Universität Berlin**

Fachbereich Mathematik und Informatik

German Research Foundation (DFG)

**Efficient Calculation of Slow and Stationary Scales in Molecular Dynamics**

Molecular dynamics (MD) simulation is a technique that aids in the understanding of fundamental processes in biology and chemistry, and has important technological applications in pharmacy, biotechnology, and nanotechnology. Many complex molecular processes have cascades of timescales spanning the range from 10⁻¹⁵ s to 1 s, often with no pronounced gap that would permit efficient coarse-grained time integration.

In many applications, the slowest timescales and the associated structural rearrangements are the ones of interest. In this project, we study the challenging process of induced folding of peptides. The metastable states associated with the slowest timescale of an induced folding problem are shown in Fig. A04-1. Such states may be found when peptide ligands bind to proteins and when membrane-associated proteins anchor into the membrane. Here, association and conformational changes occur on physical timescales of nanoseconds to milliseconds, while dissociation events may require seconds or longer.

As a root model, we choose classical molecular dynamics with atomistic resolution and explicit solvent. Thus, the simulation system consists of a box that typically contains 10,000 to 100,000 classical particles, representing solvent, ions, and the solvated protein. The system evolves by a time-stepping scheme that approximates the solution of the classical equations of motion. Additionally, the time-stepping scheme usually contains a stochastic term that models the coupling of the molecular system to a heat bath, and thus ensures the desired thermodynamic ensemble (e.g., canonical). In this setting, molecular dynamics is a Markov process in a high-dimensional state space. The dominant timescales and their associated structure changes between metastable (long-lived) states are given by the eigenvalues and eigenfunctions of the transfer operator of the Markov process. These dominant eigenvalues and eigenfunctions therefore need to be approximated.
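To make the idea of a stochastic time-stepping scheme concrete, the following is a minimal sketch, not the integrator used in this project. It swaps the full atomistic model for a deliberately simple stand-in: overdamped Langevin (Brownian) dynamics of a single particle in a one-dimensional double-well potential, integrated with the Euler–Maruyama scheme. The potential, the parameter values, and the function names `force` and `simulate` are all assumptions chosen for illustration.

```python
import math
import random


def force(x):
    """Force -dU/dx for the assumed double-well potential U(x) = (x^2 - 1)^2."""
    return -4.0 * x * (x * x - 1.0)


def simulate(n_steps, dt=1e-3, kT=0.3, x0=-1.0, seed=0):
    """Euler-Maruyama integration of dx = force(x) dt + sqrt(2 kT) dW.

    Unit friction is assumed. The random term plays the role of the
    heat-bath coupling; each new position depends only on the current
    one, so the resulting sequence is a Markov process.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * kT * dt)
    x = x0
    traj = [x]
    for _ in range(n_steps):
        x = x + force(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
        traj.append(x)
    return traj


traj = simulate(n_steps=10000)
print(len(traj), min(traj), max(traj))
```

The two wells at x = -1 and x = +1 act as metastable states in this toy system: the particle fluctuates quickly inside one well and only rarely crosses the barrier, which is the kind of timescale separation described above.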

The introduction of Markov state models (MSMs) to molecular simulation in the past few years has been a breakthrough in providing the ability to perform such an approximation. An MSM consists of a discretization of the molecular state space into sets, often found by geometric clustering of available simulation data, and a matrix of transition probabilities between them, estimated from the same simulation data. The resulting matrix is an estimate of a set-based discretization of the transfer operator. Despite their success, the current algorithmic realization of MSMs for high-dimensional systems suffers from two fundamental problems:

1. Discretization Problem: When the initial discretization for the MSM, based on Euclidean distances in the data, is poor, the dominant transfer operator eigenvalues (and timescales) will be systematically underestimated, resulting in numerical unreliability of the approach. When the user is interested in approximating a sizable number (e.g., 10–100) of slow processes with high accuracy, the common practice of using data-driven geometric clustering methods may not be a viable approach.

2. Sampling Problem: MSMs contain information only about states that have been visited and transitions that have occurred in the simulation data. While the slowest events may occur on timescales of seconds, affordable simulation lengths are on the order of microseconds. Thus, MSM construction suffers from a severe sampling problem.
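The discretization problem can be illustrated on a toy model, shown here as a sketch under stated assumptions rather than as a result of this project. The assumed "true" dynamics is a reversible four-state chain with two pairs of rapidly mixing states separated by a rare transition. Two-set MSMs are then built exactly, in the limit of infinite data so that sampling error plays no role, for a discretization that respects the metastable pairs and for one that does not. The matrix `P`, the chosen partitions, and all function names are hypothetical.

```python
import math

# Assumed reference dynamics: symmetric (hence reversible) 4-state chain.
# States {0, 1} and {2, 3} mix quickly; the step 1 <-> 2 is rare.
P = [
    [0.70, 0.30, 0.00, 0.00],
    [0.30, 0.69, 0.01, 0.00],
    [0.00, 0.01, 0.69, 0.30],
    [0.00, 0.00, 0.30, 0.70],
]
pi = [0.25, 0.25, 0.25, 0.25]  # stationary distribution (uniform, as P is symmetric)


def lumped_lambda2(P, pi, set_a):
    """Second eigenvalue of the exact two-set MSM for the partition (set_a, rest)."""
    set_b = [i for i in range(len(P)) if i not in set_a]

    def stay(s):
        weight = sum(pi[i] for i in s)
        return sum(pi[i] * P[i][j] for i in s for j in s) / weight

    return stay(set_a) + stay(set_b) - 1.0


def true_lambda2(P, pi, n_iter=500):
    """Second eigenvalue of P by power iteration, deflating the constant eigenvector.

    Assumes a reversible chain whose eigenvalues are all non-negative.
    """
    n = len(P)
    v = [float(i) for i in range(n)]
    lam = 0.0
    for _ in range(n_iter):
        mean = sum(p * x for p, x in zip(pi, v))
        v = [x - mean for x in v]                      # remove stationary component
        norm = math.sqrt(sum(p * x * x for p, x in zip(pi, v)))
        v = [x / norm for x in v]
        w = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(p * x * y for p, x, y in zip(pi, v, w))  # Rayleigh quotient
        v = w
    return lam


def timescale(lam, lag=1):
    return -lag / math.log(lam)


lam_true = true_lambda2(P, pi)
lam_good = lumped_lambda2(P, pi, set_a=[0, 1])  # sets follow the metastable pairs
lam_poor = lumped_lambda2(P, pi, set_a=[0])     # boundary cuts through a metastable pair
for name, lam in [("true", lam_true), ("good", lam_good), ("poor", lam_poor)]:
    print(name, lam, timescale(lam))
```

Both discretized models report a shorter slow timescale than the reference chain, and the poorly placed boundary loses far more than the well-placed one. For reversible dynamics this direction of the error is expected, since an MSM eigenvalue cannot exceed the corresponding true eigenvalue.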

Both problems are coupled. Based on keystones set by recent theoretical results, we now set out to develop a concise numerical and algorithmic framework to address them.

The long-term aims of this project are to develop efficient modeling and simulation methods for the dominant (slow) timescales of complex biomolecular simulation systems, and apply them to folding-binding problems in biomolecules.

In contrast to previous conformation-dynamics approaches such as Markov state modeling that are driven by a set-based approach, we attempt a paradigm shift and will focus on developing methods to approximate and sample individual timescales and eigenfunctions one by one.