Open Scientific Questions, Problems and Projects

I: Analysis and Modeling of Molecular Dynamics

Essential Problems

Reliable and Automatic Generation of a Markov Model for a given Simulation Set

Nice-to-have

II: Adaptive Molecular Dynamics Simulation

Essential Problems

Incremental identification of clusters in MD data (Emal)

Currently, we use k-means clustering on a complete MD dataset to generate microstates, which are later merged into metastable states using, e.g., PCCA. In an adaptive MD simulation framework this approach is problematic, because a new or updated Markov model needs to be built whenever new simulations become available. Recomputing the clustering from scratch each time would be computationally inefficient; in addition, the stochastic nature of k-means is likely to introduce instabilities into the framework. Goal: Find or design an algorithm that incrementally adapts an appropriate microstate clustering when new data become available. Such an algorithm should be fast (linear or log-linear in the number of data points) and guarantee some basic smoothness properties (no sudden jumps in the results when a few data points are added). Ideally, it should guarantee that kinetically separated data are not merged. It may be based on crisp or soft clustering. Details...
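To make the requirements above concrete, here is a minimal sketch of one possible baseline, not a proposed solution: a deterministic "leader" style clustering in which centers are only ever added, never moved. The class name, the `d_min` spacing parameter and the Euclidean metric are assumptions made for illustration. Adding a center can only reassign points that lie closer to the new center than to their old one, so changes stay local; the cost per batch is proportional to (number of new points) x (number of centers). It does nothing to prevent merging of kinetically separated data.

```python
import numpy as np


class IncrementalLeaderClustering:
    """Online crisp clustering sketch: centers are added, never moved."""

    def __init__(self, d_min):
        self.d_min = float(d_min)  # hypothetical minimum spacing between centers
        self.centers = []          # list of 1-D coordinate arrays

    def partial_fit(self, X):
        """Process a new batch X of shape (n_points, n_dims)."""
        for x in np.asarray(X, dtype=float):
            if not self.centers:
                self.centers.append(x.copy())
                continue
            dists = np.linalg.norm(np.asarray(self.centers) - x, axis=1)
            # Open a new microstate only if x is far from all existing centers.
            if dists.min() > self.d_min:
                self.centers.append(x.copy())
        return self

    def predict(self, X):
        """Assign each row of X to its nearest center (Voronoi assignment)."""
        X = np.asarray(X, dtype=float)
        C = np.asarray(self.centers)
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        return d.argmin(axis=1)


# Usage: two batches arriving one after the other.
clu = IncrementalLeaderClustering(d_min=1.0)
batch1 = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
clu.partial_fit(batch1)
labels_before = clu.predict(batch1)

batch2 = np.array([[0.2, 0.1], [10.0, 0.0]])
clu.partial_fit(batch2)
labels_after = clu.predict(batch1)
```

In this toy run the second batch adds one center far away from the first batch, so the labels of the first batch are unchanged. Because the procedure is deterministic for a fixed data order, it avoids the run-to-run variation of randomly initialized k-means, although the result still depends on the order in which data arrive.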

Metakinetics/Optimally expanding transition network/Search phase (Jan-Hendrik)

Currently, we assume that the relevant conformational states are already known from the start. This is not the case in practice, where only one or a few conformations of a biomolecule may be available (often from NMR or X-ray experiments). Design a version of enhanced sampling that is optimal in terms of quickly finding all metastable states that can be reached from the starting state within a certain mean first passage time. Illustrate it on a discrete Markov model, a continuous toy system, and peptide dynamics. Compare it to Metadynamics and Parallel Replica MD.
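For the discrete Markov model test case, the target set ("all states reachable from the starting state within a certain mean first passage time") can be computed exactly when the transition matrix is known, which gives a reference against which a search strategy could be scored. The sketch below assumes a row-stochastic matrix `P` in which every target is reachable from all other states (otherwise the linear system is singular); the function names and the 3-state example matrix are made up for illustration.

```python
import numpy as np


def mfpt_to_target(P, target):
    """Mean first passage times (in steps) from every state to `target`."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    others = [i for i in range(n) if i != target]
    # For i != target: m_i = 1 + sum_{j != target} P_ij * m_j, and m_target = 0.
    A = np.eye(n - 1) - P[np.ix_(others, others)]
    m = np.zeros(n)
    m[others] = np.linalg.solve(A, np.ones(n - 1))
    return m


def states_within_mfpt(P, start, t_max):
    """States whose mean first passage time from `start` is at most t_max."""
    P = np.asarray(P, dtype=float)
    return [j for j in range(P.shape[0])
            if j != start and mfpt_to_target(P, j)[start] <= t_max]


# Hypothetical 3-state chain: 0 <-> 1 <-> 2, no direct 0 <-> 2 transitions.
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])

m_to_1 = mfpt_to_target(P, 1)   # from state 0: 10 steps
m_to_2 = mfpt_to_target(P, 2)   # from state 0: 30 steps
near = states_within_mfpt(P, start=0, t_max=15)
```

With a cutoff of 15 steps only state 1 counts as reachable from state 0 in this example; raising the cutoff above 30 steps also includes state 2.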

Enhanced Sampling of MR121-GSGSW (Emal + Jan-Hendrik)

Show, using the MR121-GSGSW peptide, that enhanced sampling can significantly reduce sampling time compared to single long simulations. Once this works reliably using the maximum-likelihood transition matrix as a "simulation engine", we must get it to work by either drawing (without replacement) from already available simulation trajectories or, if the pool is not sufficient, generating new ones. For now, this is done without changing the state definition, which would complicate things.
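The two stages described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the maximum-likelihood matrix is taken to be the non-reversible estimator (row-normalized transition counts), the trajectory pool is a plain dictionary keyed by starting state, and the fallback "generate new ones" is replaced here by the Markov chain itself as a stand-in for launching a real MD run. All function names are hypothetical.

```python
import numpy as np


def ml_transition_matrix(dtrajs, n_states, lag=1):
    """Non-reversible maximum-likelihood estimate: row-normalized counts."""
    C = np.zeros((n_states, n_states))
    for d in dtrajs:
        d = np.asarray(d, dtype=int)
        np.add.at(C, (d[:-lag], d[lag:]), 1.0)  # count transitions i -> j
    rows = C.sum(axis=1, keepdims=True)
    if np.any(rows == 0):
        raise ValueError("some state has no outgoing transitions")
    return C / rows


def simulate(P, start, n_steps, rng):
    """Use the transition matrix as a 'simulation engine' (Markov chain)."""
    traj = [start]
    for _ in range(n_steps):
        traj.append(int(rng.choice(P.shape[0], p=P[traj[-1]])))
    return traj


def next_trajectory(pool, P, state, n_steps, rng):
    """Draw a stored trajectory starting in `state` without replacement;
    if none is left, fall back to generating a new one (stand-in: the chain)."""
    if pool.get(state):
        idx = int(rng.integers(len(pool[state])))
        return pool[state].pop(idx)
    return simulate(P, state, n_steps, rng)


# Usage with made-up discrete trajectories over two states.
rng = np.random.default_rng(0)
P = ml_transition_matrix([[0, 0, 1, 1, 0]], n_states=2)
pool = {0: [[0, 1, 1]]}                           # one stored trajectory from state 0
first = next_trajectory(pool, P, 0, 3, rng)       # taken from the pool
second = next_trajectory(pool, P, 0, 3, rng)      # pool exhausted: newly generated
```

Keeping the pool lookup and the generator behind one function means the adaptive sampling loop does not need to know whether a trajectory came from existing data or from a new run.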

Nice-to-have

III: Molecular Polymerization and Aggregation

Essential Problems

Nice-to-have