Accurate Machine Learned Quantum-Mechanical Force Fields for Biomolecular Simulations

Oliver T. Unke, Martin Stöhr, Stefan Ganscha, Thomas Unterthiner, Hartmut Maennel, Sergii Kashubin, Daniel Ahlin, Michael Gastegger, Leonardo Medrano Sandonás, Alexandre Tkatchenko, Klaus-Robert Müller

May 17, 2022

Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes. Accurate MD simulations require computationally demanding quantum-mechanical calculations, being practically limited to short timescales and few atoms. For larger systems, efficient but much less reliable empirical force fields are used. Recently, machine learned force fields (MLFFs) emerged as an alternative means to execute MD simulations, offering similar accuracy as ab initio methods at orders-of-magnitude speedup. Until now, MLFFs have mainly captured short-range interactions in small molecules or periodic materials, due to the increased complexity of constructing models and obtaining reliable reference data for large molecules, where long-ranged many-body effects become important. This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations (GEMS) by training on "bottom-up" and "top-down" molecular fragments of varying size, from which the relevant physicochemical interactions can be learned. GEMS is applied to study the dynamics of alanine-based peptides and the 46-residue protein crambin in aqueous solution, allowing nanosecond-scale MD simulations of >25k atoms at essentially ab initio quality.
Our findings suggest that structural motifs in peptides and proteins are more flexible than previously thought, indicating that simulations at ab initio accuracy might be necessary to understand dynamic biomolecular processes such as protein (mis)folding, drug-protein binding, or allosteric regulation. Molecular dynamics (MD) simulations make it possible to determine the motion of individual atoms in chemical and biological processes, enabling mechanistic insights into molecular properties and functions, as well as providing a detailed interpretation of experimental studies. MD simulations require a reliable model of the forces acting on each atom at every time step of the dynamics [1]. It is most desirable to obtain atomic forces from accurate solutions to the many-body Schrödinger equation [2], but this is only feasible for short MD simulations of few atoms for the foreseeable future [3]. For larger systems, it is common practice to derive the forces from empirical models of the potential energy. Such force fields (FFs) approximate chemical interactions with computationally efficient terms and enable MD simulations of millions of atoms [4] for up to several milliseconds of dynamics [5]. A disadvantage of FFs is their limited accuracy due to the neglect of important quantum-mechanical effects, such as changes to hybridization states, interactions between orbitals delocalized over several atoms, or electronic correlations between distant molecular fragments. Further, many FFs require a predetermined covalent bonding structure, preventing bond breaking and formation. When additional accuracy and flexibility are required, for example to study an enzymatic reaction, a possible alternative is a quantum mechanics/molecular mechanics (QM/MM) simulation [3, 6]: The system is divided into a small QM region modelled with ab initio methods (e.g. substrate and active site of an enzyme) and an MM region (e.g. the remaining protein and solvent molecules) described with an FF.
However, the high computational cost associated with an accurate treatment of the QM region, and the fact that it is often unclear which atoms need to be included for an adequate description of the process of interest [7], may limit the applicability of QM/MM methods. In recent years, machine learned force fields (MLFFs) have emerged as an alternative means to execute MD simulations, combining the computational efficiency of traditional FFs with the high accuracy of quantum-chemistry methods [8]. To construct an MLFF, a machine learning (ML) model is trained on ab initio reference data to predict energies and forces from atomic positions, without the need to explicitly solve the Schrödinger equation outside of the reference data. MLFFs have led to numerous insights, e.g. regarding reaction mechanisms [9], or the importance of quantum-mechanical effects for the dynamics of molecules [10]. Despite these successes, until now, MLFFs have been applied primarily to MD simulations of small to medium-sized systems (tens of atoms), or periodic materials (e.g. metallic copper) [11]. Applications to large heterogeneous systems, like proteins or other biologically relevant systems, have remained elusive, due to the increased complexity of constructing physically informed ML architectures and obtaining reliable reference data for long-range interactions, which are known to play a key role in biomolecular dynamics [12, 13]. This work proposes a general approach to constructing accurate machine-learned force fields for large-scale molecular simulations (GEMS). Based on the divide-and-conquer principle, MLFFs for large heterogeneous systems are trained on molecular fragments of varying size, which cover the relevant chemical interactions but are still amenable to electronic-structure calculations.
From this fragment data, the model learns to recompose the original system, which allows GEMS to successfully address the long-standing challenge of biomolecular simulations at ab initio quality (Fig. 1A). While MLFFs can successfully learn local chemical interactions from small molecules [14], a sufficient number of larger fragments is needed to learn the long-range effects necessary to generalize to larger systems and achieve high prediction accuracy (0.450 meV/atom for energies and 36.704 meV/Å for forces). In this work, we rely on the recently proposed SpookyNet architecture [15], which models dispersion and electrostatics explicitly by embedding physically motivated interaction terms into the ML architecture and learning their parameters from reference data. In addition, an empirical term for short-ranged repulsion between atomic nuclei increases the robustness of the model for strong bond distortions. SpookyNet also includes a mechanism to describe effects like non-local charge transfer, which other MLFFs (with some exceptions [16]) are typically unable to capture. Taken together, these components enable the model to generalize to larger molecules when trained on appropriate reference data. However, ultimately, the quality and reliability of an MLFF should be judged by its predictions of experimental measurements; for example, we show that GEMS quantitatively reproduces experimental results regarding the helix stability of polyalanine systems at different temperatures and correctly describes infrared peak positions of water for a solvated protein. GEMS is applied to MD simulations of model peptides and the 46-residue protein crambin in aqueous solution (>25k atoms). Compared with conventional force fields, such as AMBER99SB-ILDN [17] (AmberFF), GEMS approximates energies and forces computed from density-functional theory much more closely (Fig. 1B).
Interestingly, our findings reveal previously unknown intermediates in the folding pathway of poly-alanine peptides (Fig. 1C) and a dynamical equilibrium between α- and 3₁₀-helices. In the simulations of solvated crambin, GEMS indicates that protein motions are qualitatively different and more flexible when compared to computations with a conventional FF (Fig. 1D), showing contrasting short- and long-timescale dynamics. These results suggest that simulations at ab initio accuracy may be necessary to fully understand dynamic processes like protein (mis)folding, drug-protein binding, or allosteric regulation. We start by generating reference data for smaller molecular fragments in order to train an MLFF, where the learned model accurately reflects the full large system.

[Figure 2 caption: (A) Top-down fragments are generated by cutting out a spherical region around an atom (including solvent molecules) and saturating all dangling bonds (the right side shows four top-down fragments generated from different regions). They are crucial for learning weak but long-ranged interactions, which are important for the dynamics of large systems. (B) Bottom-up fragments are generated by constructing chemical graphs consisting of one to eight non-hydrogen atoms (not all possible graphs are shown). The graphs are then converted to three-dimensional structures by adding hydrogen atoms. Due to their small size, multiple ab initio calculations for many different conformers of each generated structure can be performed, allowing extensive sampling of the potential energy surface, which is necessary for training robust models.]

There are several strategies to achieve this goal. On one hand, the model needs to be able to learn all relevant chemical interactions that are necessary to reconstruct a complete and accurate picture of the system of interest from the fragment data. This is important to capture weak, but long-ranged interactions, which collectively dominate e.g.
relative energy differences of different conformations of large molecules. On the other hand, it is necessary to prevent "holes" in the potential energy surface (PES) [22]: regions with low potential energy corresponding to unphysical structures, e.g. featuring unnaturally large or short bond lengths. The existence of holes in the PES prevents stable MD simulations, because long trajectories may eventually become trapped by such artefacts and behave unphysically [23]. To satisfy both requirements, we propose the use of two complementary methods to construct fragments, which allow models to learn different aspects of the PES of large systems. The first method follows a top-down approach, where fragments are constructed by "cutting out" spherical regions of the system of interest, including solvent molecules in the condensed phase (Fig. 2A) [24]. The regions are chosen as large as possible to sample important long-range effects, but still small enough that reference energies and forces computed with quantum-chemistry methods are accessible in a reasonable time. As our tests on poly-alanine systems demonstrate (see below), the top-down fragments we choose are sufficiently large for the systems studied in this work. Since it is difficult to collect enough reference data for the large top-down fragments to train robust models, they are enriched by smaller fragments, for which data points for many different conformations can be collected. Starting from single atoms, molecules similar to local bonding patterns of the system of interest [14] are systematically constructed by growing chemical graphs in a bottom-up fashion (Fig. 2B). By limiting the size of these fragments, it is possible to sample many different conformations, allowing models to learn the effects of strong distortions in local structural patterns, which is key to preventing holes in the PES.
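Detecting when a trajectory has fallen into such a hole can be partly automated by checking for unphysical geometries. A minimal sketch (the covalent radii and the stretch/compress thresholds below are illustrative assumptions, not values used in this work) that flags strongly distorted bonds in a snapshot:

```python
import numpy as np

# Rough single-bond covalent radii in Angstrom (illustrative subset).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}

def flag_unphysical_bonds(symbols, positions, bonds, stretch=1.5, compress=0.6):
    """Return bonds whose length deviates strongly from the covalent estimate.

    bonds: list of (i, j) index pairs giving the expected connectivity.
    A bond is flagged if it is stretched beyond `stretch` times, or compressed
    below `compress` times, the sum of the two covalent radii.
    """
    positions = np.asarray(positions, dtype=float)
    flagged = []
    for i, j in bonds:
        r = np.linalg.norm(positions[i] - positions[j])
        r0 = COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]]
        if r > stretch * r0 or r < compress * r0:
            flagged.append((i, j, r))
    return flagged
```

Applied along a trajectory, an empty return value for every frame is a cheap necessary (though not sufficient) condition for physically sensible dynamics.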
As a result, the combination of bottom-up and top-down fragments enables learning accurate and robust MLFFs for large systems. We apply GEMS to predict the properties and dynamics of several peptides consisting primarily of alanine. These are popular model systems for proteins and well studied both theoretically and experimentally. In addition, by limiting the number of residues, it is still possible to perform electronic-structure calculations for the full system. Thus, the predictions of an ML model trained only on fragment data can be directly compared to reference calculations, which allows verifying the ability of GEMS to reconstruct the properties of larger systems from the chemical knowledge extracted from smaller molecules. As a first test case, we consider the cooperativity between hydrogen bonds in poly-alanine peptides capped with an N-terminal acetyl group and a protonated lysine residue at the C-terminus (AceAlaₙLys+H⁺). In α-helices, the local dipole moments of hydrogen bonds formed between backbone peptide groups are aligned, leading to a cooperative polarization effect [26]. Thus, the relative stabilization energy of an α-helix compared to a fully extended structure (FES) fluctuates non-trivially with helix length and is a challenging prediction task. We find that GEMS closely agrees with the reference ab initio method, demonstrating that large-scale effects can be learned effectively from fragment data (Fig. 3A). Alanine-based peptides have a strong tendency to form helical structures. While short isolated helices are only marginally stable in solution, AceAla₁₅Lys+H⁺ is known to form stable helices in the gas phase. Experimental results suggest that AceAla₁₅Lys+H⁺ remains helical up to temperatures of ∼725 K [27], allowing a direct comparison with theoretical predictions.
By running GEMS simulations at different temperatures, we confirm that the peptide remains primarily helical up to 700 K, but forms a random coil at 800 K (see supplementary video 1 at https://youtu.be/QZIc3a4OjJk). An analysis of the formed hydrogen bonds reveals that the average number of α-helical hydrogen bonds decreases with increasing temperature (see Fig. S7A), while the number of 3₁₀-helical hydrogen bonds remains almost constant until a sudden drop at 800 K (see Fig. 3B), which agrees with results from ab initio simulations [28]. Interestingly, the long-ranged interactions learned from top-down fragments seem to be crucial to reproduce the experimental results, as a model that was only trained on bottom-up fragments predicts reduced thermal stability (see Fig. S7B). To investigate whether there are fundamental differences between GEMS and dynamics simulations performed with conventional FFs, we study the room-temperature (300 K) folding process of a pure poly-alanine peptide capped with an N-terminal acetyl group and a C-terminal N-methyl amide group (AceAla₁₅Nme) in the gas phase. Starting from the FES, MD simulations with GEMS suggest that AceAla₁₅Nme has a strong tendency to form H-bonds between peptide groups of directly adjacent residues within the first ∼100 ps of dynamics. The formed arrangements exhibit a "wavy" structure and φ and ψ backbone dihedral angles of ∼0° and ∼0°, which lie in a sparsely populated region of the Ramachandran plot. These intermediates are typically short-lived, with lifetimes of ∼25-50 ps, and fold readily into helical configurations via a characteristic twisting motion. There is still some controversy between theoretical and experimental results regarding the predominance of different helical conformations [29].
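The φ/ψ analysis above relies on the standard four-point torsion angle (φ from the backbone atoms C(i-1)-N(i)-CA(i)-C(i), ψ from N(i)-CA(i)-C(i)-N(i+1)). A small self-contained implementation of this textbook formula (not the authors' analysis code) could look as follows:

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four points, e.g. the
    backbone atoms C(i-1), N(i), CA(i), C(i) for the phi angle."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond b1.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    return np.degrees(np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w)))
```

Evaluating this over all residues and trajectory frames yields the (φ, ψ) pairs that populate a Ramachandran plot.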
We find that there may be cases where no single motif is preferred: Once a helix is formed, its structure fluctuates between pure α- and 3₁₀-helices, as well as hybrids of both helix types (see supplementary video 2 at https://youtu.be/ZuKW292DKKw for a complete folding trajectory). A 10 ns trajectory of the helical state suggests a dynamical equilibrium with a ∼38%/62% mixture of α- and 3₁₀-helices. In contrast, MD simulations with a conventional FF yield qualitatively different results, suggesting that a more rigid and primarily α-helical configuration is formed from the FES without distinct structural intermediates (see Fig. 3C). As a final test for the accuracy of GEMS, we compare the predictions of the ML model to ab initio data computed at the PBE0/def2-TZVPP+MBD [18-21] level of theory. To this end, we use 1554 and 1000 AceAla₁₅Nme structures sampled from densely and sparsely populated regions (rare events), respectively, of the configurational space visited in 100 aggregated 250 ps MD trajectories (25 ns total) in the NVT ensemble at 300 K simulated with GEMS (see Section S4 for details). We find that predicted energies and forces are in good agreement with the reference values in both cases. For energies and forces, correlation coefficients are R² = 0.996 and R² = 0.998, respectively, and mean absolute errors (MAEs) are 0.450 meV/atom and 36.704 meV/Å. Again, we find that the inclusion of top-down fragments during training is crucial for high accuracy, as prediction errors for a model trained only on bottom-up fragments are much larger (see Fig. S4).
In comparison, even though predictions with the conventional AmberFF [17] are remarkably accurate, with correlation coefficients of R² = 0.928 (for energies) and R² = 0.876 (for forces), the MAEs are much larger at 2.274 meV/atom and 329.328 meV/Å (distributions of predicted and reference energy values were shifted to have a mean of zero prior to computing MAEs in both cases, such that constant energy offsets between different methods do not influence the results). As a general trend, we observe that predictions with GEMS reproduce the reference across the whole range of values without a single outlier, whereas the AmberFF systematically under- and over-predicts small and large energy values, respectively (see Fig. 1B). These findings show that GEMS gives accurate predictions even for rare configurations and that the simulated MD trajectories are of essentially ab initio quality (see Figs. S2 and S3 for a more detailed analysis of correlations within the different subsets of configurations). GEMS enables accurate molecular simulations in the condensed phase. The 46-residue protein crambin in aqueous solution (25257 atoms) is chosen as a model system. Crambin contains 15 out of the 20 natural amino acids and forms common structural motifs such as β-sheets, α-helices, turns/loops, and disulfide bridges. To assess qualitative differences between simulations with a conventional FF (here, the AmberFF is chosen) and GEMS, we consider the power spectrum [33] computed from 125 ps of dynamics at a temporal resolution of 2.5 fs (Fig. 4A). The power spectrum is related to the internal motions of the system and reveals the dominant frequencies of molecular vibrations, which are influenced by the atomic structure and characteristic of the presence of certain functional groups.
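The mean-shifted MAE and the squared correlation coefficient used above are simple to compute; a minimal sketch of both metrics:

```python
import numpy as np

def shifted_mae(pred, ref):
    """MAE after removing the mean of each distribution, so that a constant
    energy offset between two methods does not affect the comparison."""
    pred = np.asarray(pred, dtype=float) - np.mean(pred)
    ref = np.asarray(ref, dtype=float) - np.mean(ref)
    return np.mean(np.abs(pred - ref))

def r_squared(pred, ref):
    """Squared Pearson correlation coefficient between two value series."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return np.corrcoef(pred, ref)[0, 1] ** 2
```

Note that forces need no such shift (they are offset-free derivatives), which is why only the energy distributions are centred before computing MAEs.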
In comparison to the results obtained from the dynamics with a conventional FF, peaks in the power spectrum computed with GEMS are shifted towards lower wavenumbers and lie close to the frequency ranges expected from measured infrared spectra. For example, the dominant peaks above 1000 cm⁻¹ correspond to bending and stretching vibrations of water molecules, which are experimentally expected at around ∼1600 cm⁻¹ and ∼3500 cm⁻¹, respectively [30], consistent with the GEMS spectrum. In contrast, the corresponding peaks for the conventional FF are blue-shifted by several hundred wavenumbers. Additionally, peaks in the GEMS spectrum are broader, indicating that the frequencies of characteristic vibrations are influenced more strongly by intermolecular interactions, which broadens their frequency range. Long-ranged interactions may particularly influence slow protein motions, i.e. the low-frequency parts of the power spectrum, where notable differences between GEMS and AmberFF can be observed. Similar to the results for AceAla₁₅Nme, we find that in comparison to simulations with the AmberFF, crambin is more flexible in GEMS simulations (Fig. 1D). Although a straightforward quantitative comparison is not possible, qualitatively, the increased flexibility agrees more closely with structures modelled from nuclear magnetic resonance (NMR) spectroscopy measurements (see Fig. S5). The increased flexibility is also indicated by a Ramachandran map of the backbone dihedral angles of crambin (Fig. 4B), which shows that a wider range of values is sampled in simulations with GEMS. This becomes even more apparent by projecting the trajectories into a low-dimensional space that allows a direct visualization of the path taken through conformational space (Fig. 4C). However, a time-resolved analysis of the trajectories reveals that structural fluctuations with GEMS are only larger on timescales in excess of ∼200 ps (Fig. 4D).
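A common way to obtain such a vibrational power spectrum (the text cites Ref. 33 for the exact definition used; the sketch below is the standard variant based on the Fourier transform of the velocity autocorrelation function, and is an assumption rather than the authors' code) is:

```python
import numpy as np

def power_spectrum(velocities, dt_fs):
    """Vibrational power spectrum from the velocity autocorrelation function.

    velocities: array of shape (n_steps, n_atoms, 3) from an unconstrained
    MD trajectory; dt_fs: time between frames in femtoseconds.
    Returns (wavenumbers in cm^-1, unnormalized spectral intensity).
    """
    v = np.asarray(velocities, dtype=float)
    n = v.shape[0]
    # Velocity autocorrelation, summed over atoms and Cartesian components,
    # averaged over time origins, for lags up to half the trajectory length.
    vac = np.array([np.sum(v[: n - t] * v[t:]) / (n - t) for t in range(n // 2)])
    intensity = np.abs(np.fft.rfft(vac))
    freq_per_fs = np.fft.rfftfreq(len(vac), d=dt_fs)    # cycles per fs
    wavenumber = freq_per_fs * 1.0e15 / 2.99792458e10   # Hz -> cm^-1
    return wavenumber, intensity
```

With a 2.5 fs frame spacing, as used for the crambin spectra, the accessible range extends to the Nyquist limit of ∼6670 cm⁻¹, comfortably covering the O-H stretch region near 3500 cm⁻¹.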
On shorter timescales, on the other hand, the trend is reversed. This suggests that there are qualitative differences between simulations with conventional FFs and GEMS on all timescales, and that simulations with ab initio accuracy might be crucial to fully understand protein dynamics. Modeling quantum-mechanical properties of large molecules is an outstanding challenge that holds promise for broad application in chemistry, biology, pharmacology, and medicine. We have developed a general framework for constructing MLFFs, GEMS, for large molecular systems such as proteins by learning from ab initio reference data of small(er) fragments, without the need to perform electronic-structure calculations for a whole protein, as the latter would constitute a computationally impractical task. The proposed divide-and-conquer strategy, employing a library of ∼3 million DFT+MBD computations on fragments and using a machine learning model that incorporates physical constraints and long-range interactions, allows MLFFs to be constructed efficiently such that they accurately reflect the quantum-mechanical energies and atomic forces in large molecules. An interesting insight of our ab initio accurate simulations is that proteins are significantly more flexible than previously thought. These molecular fluctuations and associated low-frequency vibrations are expected to strongly contribute to dynamical processes such as protein folding, drug-protein binding, and allosteric regulation [35]. While our work focuses exclusively on the study of peptides and proteins, the proposed framework can be applied to any atomic system too large to study with ab initio methods. We find that even small poly-alanine peptides display qualitatively different dynamics when simulated with GEMS in comparison to dynamics with conventional FFs.
For example, GEMS simulations suggest that the folding of AceAla₁₅Nme from the FES to a helical conformation occurs via short-lived intermediates characterized by hydrogen bonding between peptide groups of adjacent residues. Once a helix is formed, its structure fluctuates between 3₁₀- and α-helices in a dynamical equilibrium. This is in stark contrast to simulations with a conventional FF, where the peptide forms a rigid α-helix without visiting a common intermediate. These results are reminiscent of the first MD study of a protein [36], which showed that proteins are less rigid than previously thought [37]. The current findings, already alluded to above, indicate that proteins might be even more flexible, and our simulations of crambin suggest that the general trend observed for peptides in the gas phase also holds for proteins in solution. In particular, crambin samples a larger conformational space in GEMS simulations and its backbone dihedral angles have broader distributions. Interestingly, however, structural fluctuations on short timescales are reduced in comparison to simulations with a conventional FF. These observations show that there are qualitative differences in the dynamics of proteins when they are simulated with ab initio quantum-mechanical accuracy. We conjecture that these variations in the dynamics might be crucial for understanding effects like allostery, or processes like enzyme catalysis and protein (mis)folding. For example, a description of the potential energy at ab initio quality might be necessary to gain deeper insights into Levinthal's paradox [38]. A promising avenue for future work is to extend GEMS to larger systems and longer timescales, for example by distributing GEMS simulations over multiple accelerators, which requires non-trivial modifications to the way the MLFF is evaluated.
Other possible extensions to GEMS include incorporating nuclear quantum effects, which have been demonstrated to significantly change the dynamics of small molecules [39]. It is likely that similar effects can be observed for larger systems. Let us discuss some limits of MLFFs when compared to classical MD simulations. Although MLFFs are orders of magnitude more computationally efficient than ab initio calculations, their computational efficiency is lower than that of conventional FFs (as is to be expected). For example, simulating a single timestep of NPT dynamics of crambin in aqueous solution on an NVIDIA A100 GPU with GEMS takes roughly ∼500 ms, whereas GROMACS [40] only requires ∼2 ms for a single time step with a conventional FF on similar hardware. Consequently, at this moment, GEMS simulations are limited to shorter timescales. In addition, evaluating MLFFs usually requires more memory than conventional FFs, limiting the maximum system size that can be simulated with GEMS. Nonetheless, GEMS makes it possible to simulate several nanoseconds of dynamics for systems consisting of thousands of atoms with ab initio accuracy (it may be possible to combine GEMS with other ML methods to achieve further speed-ups, e.g. by allowing larger time steps during the dynamics [41]). Furthermore, GEMS, like every other MLFF, may lead to unphysical dynamics if not properly trained (see e.g. Ref. 23 for a discussion of such phenomena). As a rule, MLFF simulations should therefore always be subjected to more scrutiny than results from mechanistic FFs. In particular, the resulting trajectories need to be carefully checked for unphysical bond breaking or formation, or otherwise unphysically distorted conformations, which are prevented in traditional FFs by construction.
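The per-step costs quoted above translate directly into wall-clock throughput; the small helper below makes the gap concrete (the 2 fs conventional-FF timestep is an assumed typical value for constrained dynamics, whereas GEMS uses the 0.5 fs timestep reported in the Methods):

```python
def ns_per_day(ms_per_step, timestep_fs):
    """Simulated nanoseconds per wall-clock day for a given per-step cost."""
    steps_per_day = 24 * 3600 * 1000 / ms_per_step  # milliseconds in a day / cost
    return steps_per_day * timestep_fs * 1e-6       # total fs -> ns
```

With ∼500 ms/step and a 0.5 fs timestep, GEMS reaches on the order of 0.1 ns of crambin dynamics per GPU-day, versus tens of ns/day for the conventional FF, which is why the GEMS trajectories here are limited to the nanosecond scale.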
Nevertheless, it should be emphasized again that, compared to simulations with a conventional FF, GEMS offers greatly improved accuracy and enables the study of chemical transformations such as the making and breaking of chemical bonds and proton transfer processes. Another advantage of using accurate MLFFs is the availability of arbitrary derivatives, including the potential to obtain alchemical derivatives [42, 43]. This may enable the optimization of accessible observables, such as docking/binding energies, with respect to local (nearly isosteric) mutations. In a more conventional approach, MLFFs can be used to describe the effects of mutations via thermodynamic integration, as regularly performed with classical FFs [44-46]. Given the incorporation of non-locality in the present methodology, such analyses could naturally account for longer-range phenomena like (static) allosteric effects and the inherent non-additivity of interactions known to be relevant for the free energy of binding or stability [47]. In a similar vein, this may allow performing sensitivity analyses or engaging explainability methods (see, e.g., Ref. 48) to identify allosteric hotspots and networks, which have also been speculated to play an important role in the evolutionary aspects of proteins and the biomolecular machinery [49, 50]. Again, we would like to stress that such studies and the above approaches are not limited to biomolecular systems. They may equally well be applied in materials design; for example, studying and optimizing point defects in solid-state systems as relevant to the design of quantum materials. Finally, we would like to highlight a promising application of GEMS to modeling protein-protein interactions. Fig. 5 shows the binding curves of the angiotensin-converting enzyme 2 (ACE2) and the receptor binding domain (RBD) of the spike protein of SARS-CoV-1 and SARS-CoV-2 variants using either AmberFF or GEMS (in gas phase).
Here, as expected from experimental evidence [51], we observe stronger binding of the SARS-CoV-2 RBD both with the classical FF and with GEMS. Interestingly, however, GEMS yields a substantially stronger binding, by −1.1 eV. While the obtained binding energies do not account for solvation or entropic effects and cannot be directly compared to experimental binding affinities, this application provides evidence of the importance of ab initio accuracy when studying interactions between complex biological systems. We therefore would like to stress the high promise of GEMS for enabling quantum-mechanical insight in broad application domains across enzyme and protein chemistry or heterogeneous materials. Although top-down fragments in this work are system-specific, in the future, they may be generated to cover a wider range of systems and enable GEMS simulations with a chemically transferable and size-extensive "universal" machine-learned force field.

Bottom-up fragments

The construction of bottom-up fragments follows an approach similar in spirit to the one described by Huang et al. [14]. Ignoring hydrogen atoms, increasingly large chemical graphs with the same local bonding patterns as the system of interest are constructed until a maximum number of heavy atoms is reached. This is achieved by starting from graphs consisting of a single heavy atom. Larger graphs are constructed by successively adding heavy atoms and pruning graphs which do not appear as substructures in the original system. Once the graphs are constructed, they are converted to bottom-up fragments by saturating all valencies with hydrogen atoms and generating the corresponding three-dimensional molecular structure (e.g. using Open Babel [52]). For all structures, multiple conformers are sampled using either MD simulations at high temperatures or normal mode sampling [53]. Here, we use the bottom-up fragments for proteins generated in earlier work [54] (see Ref. 55 for details).
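Growing graphs while pruning those that do not occur in the original system is equivalent to enumerating the connected heavy-atom subgraphs of the system's own chemical graph. A minimal sketch of that enumeration (deduplication of chemically isomorphic graphs, which the real workflow would additionally need, is omitted here):

```python
def connected_subgraphs(adjacency, max_atoms):
    """Enumerate connected heavy-atom subgraphs of a molecular graph with up
    to `max_atoms` vertices, growing them one atom at a time (bottom-up).

    adjacency: dict mapping atom index -> set of bonded heavy-atom indices.
    Returns a set of frozensets of atom indices.
    """
    current = {frozenset([a]) for a in adjacency}  # graphs of a single atom
    found = set(current)
    for _ in range(max_atoms - 1):
        grown = set()
        for sub in current:
            # Extend each subgraph by one neighbouring atom, which keeps
            # every generated subgraph connected by construction.
            neighbours = set().union(*(adjacency[a] for a in sub)) - sub
            for nb in neighbours:
                grown.add(sub | {nb})
        current = grown - found
        found |= current
    return found
```

Because every generated index set is by construction a substructure of the input graph, no separate pruning step is required in this formulation.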
Since all similar local structures are covered by the same graphs, the bottom-up fragments optimally exploit any structural redundancies, often resulting in a surprisingly small number of fragments. For example, just 2307 chemical graphs (with a maximum of eight heavy atoms) are sufficient to cover all local bonding patterns appearing in proteins consisting of the twenty natural amino acids, even when considering different protonation states and the possibility of disulfide bridges [55].

Top-down fragments

Starting from an MD snapshot of the system of interest (sampled from conventional MD simulations, see below and Section S3 for more details), all atoms outside a spherical region around a central atom are deleted. The cutoff radius for selecting the spherical region should be chosen as large as possible, while still resulting in fragment sizes for which reference calculations are feasible (in this work, we choose 8 Å). Then, any resulting dangling single bonds on heavy atoms are saturated with hydrogen atoms. Valencies situated on hydrogen atoms or corresponding to double bonds are eliminated by including the bonded atom of the original system (outside the cutoff). This process is repeated until all valencies are saturated [24]. By choosing different central atoms, several (partially overlapping) top-down fragments can be constructed from a single configuration of the original system.

Density functional theory (DFT) reference calculations were performed using the Psi4 software package [56, 57] at the PBE0/def2-TZVPP+MBD [18-21] level of theory on Google Cloud Platform (GCP). This level of theory has been found to be well suited for modelling interactions of proteins in water [13, 58, 59]. Each fragment was run in an independent Docker container within a cloud compute engine virtual machine.
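The atom-selection part of this procedure can be sketched compactly. The version below is a simplification under stated assumptions: instead of capping each cut separately, it keeps growing the selection until every bond crossing the boundary is a heavy-atom single bond (which can then be capped with hydrogen), which mirrors the repeat-until-saturated loop described above:

```python
import numpy as np

def top_down_atom_selection(positions, center_atom, cutoff, bonds):
    """Select atoms for a top-down fragment: all atoms within `cutoff` of a
    central atom, then grow the selection until no severed bond involves
    hydrogen or has bond order > 1 (only such bonds can be H-capped).

    bonds: list of (i, j, order, involves_h) tuples describing connectivity.
    """
    positions = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(positions - positions[center_atom], axis=1)
    selected = set(np.nonzero(dist <= cutoff)[0].tolist())
    changed = True
    while changed:
        changed = False
        for i, j, order, involves_h in bonds:
            # A dangling bond can only be saturated with hydrogen if it is a
            # heavy-atom single bond; otherwise pull in the outside partner.
            if order > 1 or involves_h:
                if i in selected and j not in selected:
                    selected.add(j); changed = True
                elif j in selected and i not in selected:
                    selected.add(i); changed = True
    return sorted(selected)
```

The remaining heavy-atom single bonds crossing the boundary would then be capped with hydrogen atoms placed along the cut bond directions.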
We mostly used n2d-highmem-4 and n2-highmem-4 virtual machine instances with 4 cores, 32 GB RAM, and 768 GB of disk space each, with some larger fragments being manually relaunched on higher-memory machines if they crashed with out-of-memory errors. Execution was parallelized on up to 20 000 CPU cores. Calculations were shut down if they did not complete within 21 days, which was the case for a few outliers, but the median execution time per fragment was ≈48 hours. All MLFFs in this work use the recently proposed SpookyNet architecture (see Ref. 15 for details). We use three different trained ML models in this work: one for the simulations of all poly-alanine systems, one for the simulation of crambin in aqueous solution, and one for the gas-phase ACE2/SARS-CoV-1/2 RBD binding curves shown in Fig. 5 (CoV model). The poly-alanine and CoV models use the recommended architectural hyperparameters of T = 6 interaction modules and F = 128 features [15]. Due to hardware limitations when performing MD simulations for thousands of atoms, the crambin model uses T = 3 interaction modules and F = 64 features to reduce memory requirements. All models use a short-range cutoff of r_srcut = 10 a₀ (∼5.29 Å). The crambin and CoV models additionally employ a long-range cutoff of r_lrcut = 20 a₀ (∼10.58 Å) for the computation of the analytical electrostatic and dispersion correction terms included in the SpookyNet energy prediction (to achieve sub-quadratic scaling with respect to the number of atoms). We follow the training protocol described in Ref. 15 for fitting the parameters to reference energies, forces, and dipole moments; however, the mean squared

Conventional FF

All classical MD simulations have been performed with the GROMACS 2020.3 software package using NVIDIA V100 or A100 GPUs in a Kubernetes system on GCP. Throughout this work, we have employed the AMBER99SB-ILDN force field [17] for the conventional MD simulations.
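The sub-quadratic scaling afforded by a finite long-range cutoff rests on avoiding all-pairs distance checks when collecting interacting atom pairs. A minimal cell-list sketch of such a neighbour search (an illustration of the general technique, not SpookyNet's actual implementation):

```python
import numpy as np
from collections import defaultdict

def neighbour_pairs(positions, cutoff):
    """Find all atom pairs within `cutoff` in roughly O(N) time by binning
    atoms into cells of edge length `cutoff`: any interacting pair must then
    lie in the same or an adjacent cell, so only 27 cells are scanned per atom."""
    positions = np.asarray(positions, dtype=float)
    cells = defaultdict(list)
    for idx, p in enumerate(positions):
        cells[tuple((p // cutoff).astype(int))].append(idx)
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                        for i in members:
                            if i < j and np.linalg.norm(positions[i] - positions[j]) <= cutoff:
                                pairs.add((i, j))
    return sorted(pairs)
```

Evaluating cutoff-limited electrostatics and dispersion only over such pairs keeps the cost linear in the number of atoms at fixed density, which is what makes 25k-atom simulations tractable.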
Standard amino acid definitions have been adapted to accommodate charged Lys + H + termini in accordance with the AMBER99SB-ILDN parametrizations where needed. In the MD simulations of ACE2/SARS CoV-2 RBD, the binding of the Zn 2+ cofactor in ACE2 has been described via harmonic restraints to the experimentally determined ligands in order to avoid potential shortcomings in the description of the metal-ligand interaction. All solvated systems presented in this manuscript or used for sampling representative structures for generating top-down fragments were initially resolvated, optimized to a maximum atomic force of 1000 kJ/mol/nm, and equilibrated according to the protocol detailed in Section S2. Simulations for studying non-equilibrium processes (i.e., the gas-phase folding/unfolding of poly-alanine systems) have been started directly from optimized structures with velocities drawn from a Maxwell-Boltzmann distribution at twice the simulation temperature (such that the average kinetic energy during the simulation corresponds to the desired temperature [61]). The gas-phase simulations have thereby been realized in a pseudo-gas-phase setting as proposed and validated in Ref. 61. All constant-temperature MD simulations have been performed using temperature coupling via stochastic velocity rescaling [62], and a Parrinello-Rahman barostat [63] has been used for NPT simulations. To speed up computations, standard MD simulations employed the common constraint of bonds involving hydrogen, while the power spectra reported in this work have been obtained from fully unconstrained simulations. The starting structures of the poly-alanine systems have been generated with the Avogadro software [64], and the initial structure of crambin has been taken from PDB entry 2FD7 [31], where the incorrectly described residues (SER11 and VAL15) have been remodeled using PyMOL [65].
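Drawing initial velocities at twice the target temperature can be sketched as follows. This is a minimal illustration in GROMACS-like units (masses in amu = g/mol, velocities in nm/ps, energies in kJ/mol); all function and variable names are our own illustrative choices, not part of any simulation package.

```python
import numpy as np

# Boltzmann constant in kJ/(mol K); with masses in amu and velocities in
# nm/ps, m*v^2 comes out directly in kJ/mol.
KB = 0.0083144626

def maxwell_boltzmann_velocities(masses, temperature, rng):
    # per-component standard deviation: sigma_i = sqrt(kB T / m_i)
    sigma = np.sqrt(KB * temperature / masses)[:, None]
    return rng.normal(0.0, sigma, size=(len(masses), 3))

rng = np.random.default_rng(0)
masses = np.full(10000, 12.011)  # e.g. 10 000 carbon-like atoms
v = maxwell_boltzmann_velocities(masses, 2 * 300.0, rng)  # target T = 300 K
# the instantaneous kinetic temperature fluctuates around 2 T = 600 K
t_kin = (masses[:, None] * v**2).sum() / (3 * len(masses) * KB)
```

Starting from an optimized (zero-potential-gradient) structure, roughly half of this initial kinetic energy flows into potential energy, so the simulation equilibrates near the desired temperature, as described in the text.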
Our simulations of the ACE2/SARS CoV-1/2 RBD complex have been initiated from a set of representative conformations as identified in Ref. 34, or from pointwise mutations thereof. Currently available experimental results on the mutations present in the β-, γ-, δ-, and ε-variants of SARS CoV-2 do not indicate considerable structural changes to the spike RBD. After partial relaxation, simple pointwise mutations of the structural representatives obtained for the α-variant can thus be assumed to represent viable starting points for MD simulations of the different variants. GEMS All MD simulations with the GEMS method were performed using the SchNetPack [66] MD toolbox with a time step of 0.5 fs and without any bond constraints. Simulations of the poly-alanine systems were performed on NVIDIA V100 GPUs on GCP, whereas crambin simulations were performed on NVIDIA A100 GPUs with 80 GB of memory. To mimic experimental conditions [27], the simulations of AceAla 15 Lys + H + helix stability were performed in the NVE ensemble starting from an optimized structure with initial velocities drawn from a Maxwell-Boltzmann distribution at twice the simulation temperature, as explained above. The folding simulations of AceAla 15 Nme were performed in the NVT ensemble at 300 K starting from the optimized FES, using the same method to assign initial velocities. Simulations of crambin in aqueous solution were performed in the NPT ensemble at a temperature of 300 K and a pressure of 1.01325 bar, starting from an optimized structure and initial velocities drawn from a Maxwell-Boltzmann distribution according to the simulation temperature (the first 1 ns of dynamics was discarded to allow the system to equilibrate). Constant-temperature and/or constant-pressure simulations use the Nosé-Hoover chain thermostat/barostat [67, 68] implemented in SchNetPack, with a chain length of 3. We thank Michael Brenner for insightful comments. OTU acknowledges funding from the Swiss National Science Foundation. Correspondence to OTU, AT and KRM.
There are no competing interests to declare. DFT reference data for training the models and scripts for running MD simulations with GEMS will be made available upon publication.
[11] Weile Jia, Han Wang, Mohan Chen, Denghui Lu, Lin Lin, Roberto Car, E Weinan, and Linfeng Zhang. Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning.
Conventional force fields (FFs) make it possible to study large systems, e.g. entire viruses [69-71], in atomic detail. They achieve this remarkable efficiency by modeling chemical interactions as a sum over simple empirical terms [72-74]. However, while very efficient, their accuracy is limited [75], and they typically cannot describe chemical reactions. There are various efforts to increase the accuracy of classical FFs, for example by including polarization effects [76, 77] and sophisticated models for anisotropic charge distributions [78-82], or by developing reactive FFs [83, 84]; such models remain much faster to evaluate, but typically cannot compete with the accuracy of machine learned force fields (MLFFs) [85-91]. Machine learning (ML) methods "learn the rules" of quantum mechanics [92] and their representation from data, making it possible to skip computationally expensive ab initio calculations. Beyond FF construction, there are several other applications of ML to quantum chemistry (QC). One of the earliest uses of ML in QC was the exploration of chemical space [93-96]. ML can also be used to accelerate studies that typically rely on MD simulations or other dynamical equations [97]; for example, it can be used to directly sample equilibrium distributions [98, 99] or rare events [100], or to directly predict reaction rates [101].
Further, ML is used for predicting protein structure [102-104], solving the Schrödinger equation [105-107], predicting wave functions [108-110], modelling solvated systems [111], generating molecules and solving inverse design problems [112-118], and even for planning chemical syntheses [119]. For a more detailed overview of the use of ML in molecular and materials science, refer to Refs. 120-125; for an overview of applications in molecular simulations, refer to Ref. 126; and for reviews on the exploration of chemical space, refer to Refs. 127 and 128. Further general reviews can be found in Refs. 92, 119, and 129-134. After initial preparation (resolving doubly- or ill-defined residues and atom type definitions present in the original files from the respective sources specified in the main manuscript), classical molecular dynamics (MD) simulations of solvated systems were initialized by resolvating the systems in cubic simulation boxes with a minimum protein-to-box distance of 1.6 nm. Unless explicitly specified otherwise, simulation cells were solvated in TIP3P water with physiological concentrations of NaCl, with excess Na + or Cl - ions to neutralize the simulation box where needed. The solvated structures were subsequently optimized to a maximum atomic force of 1'000 kJ/mol/nm and equilibrated in a four-step procedure (a time step of 2 fs was used in all cases):
1) a short NVT simulation of 50'000 steps (simulation time 100 ps),
2) an NPT simulation (Berendsen barostat) of 50'000 steps (100 ps),
3) an NPT simulation (Parrinello-Rahman barostat [135]) of 100'000 steps (200 ps) with fully constrained bonds,
4) and 100'000 steps (200 ps) with constraints on all bonds involving hydrogen.
In all equilibration runs, a constant-temperature thermostat with stochastic velocity rescaling [136] set to the final simulation temperature was employed.
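As an illustration, equilibration step 3 above could be expressed as a GROMACS .mdp fragment along the following lines (the parameter values shown here are illustrative assumptions, not the exact settings used in this work):

```ini
; Illustrative .mdp fragment for NPT equilibration step 3:
; Parrinello-Rahman barostat, 2 fs time step, all bonds constrained.
integrator   = md
dt           = 0.002              ; 2 fs
nsteps       = 100000             ; 200 ps
tcoupl       = V-rescale          ; stochastic velocity rescaling
tc-grps      = System
tau-t        = 0.1
ref-t        = 300                ; final simulation temperature (K)
pcoupl       = Parrinello-Rahman
tau-p        = 2.0
ref-p        = 1.01325            ; bar
constraints  = all-bonds          ; fully constrained bonds in this step
```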
Throughout all steps, the AMBER99SB-ILDN force field [137] was used. For AcAla 15 Lys chains involving the charged LysH + terminus, topology and AMBER definitions have been adapted accordingly using the default AMBER99SB-ILDN parametrization. For the gas-phase AcAla 15 Lys+H +, we adopted pseudo-gas-phase settings as detailed in Ref. 138, using maximal unit cells while disabling particle-mesh Ewald electrostatics. The constant-temperature (pseudo-)gas-phase simulations were prepared by structure optimization (maximum atomic force of 1'000 kJ/mol/nm). The reported simulations were then run in an NVT ensemble initialized with velocities randomly drawn from a Maxwell-Boltzmann distribution corresponding to twice the simulation temperature. To train a model that can be used to simulate trajectories of a particular system of interest, we want to train it on a diverse set of top-down fragments representative of a variety of conformations. The general strategy is to cluster configurations that occur in classical MD simulations, select some representatives for each cluster, and then decompose the whole configurations into spherical regions that are small enough to run DFT calculations on them (top-down fragments). Different systems have different characteristics when it comes to the possible conformations:
• The poly-alanine systems unfold and thus show a lot of variation, but the overall system is comparatively small (contains few atoms).
• Crambin in aqueous solution contains many different atoms, but due to the presence of three disulfide bridges, the protein itself shows only minimal variations, mainly determined by the states of the disulfide bridges, so clustering is straightforward.
In our classical MD simulations of AceAla 15 Nme and AceAla n Lys + H + in solution at temperatures of 280 K, 300 K, and 310 K (2 µs each), the poly-alanine chain did not retain a helical structure, but assumed almost arbitrary conformations.
We therefore cannot define distinct, well-separated clusters, but we can still use a clustering algorithm to find a diverse and representative sample of the configurations seen during the trajectories. We used affinity propagation [139] as the clustering algorithm. This algorithm takes as input a matrix specifying "similarities" between pairs of objects, and a "preference" that specifies the cost of adding a new cluster (which is balanced against the gain in similarity obtained by switching nearby objects to the new cluster). The output is a set of clusters, with one object in each cluster designated as the representative of that cluster. The number of clusters is controlled by the relation between the similarities and the preference; default choices for the preference include the median similarity and the lowest similarity, but in general the preference can be tuned to produce clusters at the desired granularity. To compare two configurations of atoms, we move them so that their centers of mass are at the origin, and then use the rotation that gives the minimal mean square distance between the atoms. The similarity is then the negative sum of the square distances. We set the preference to -50, compared to a median similarity between -14 and -31 for the six trajectories; this gave 240 clusters for the AceAla n Lys + H + molecule and 266 clusters for the AceAla 15 Nme molecule. The initial structure was taken from PDB entry 2FD7. The incorrectly described residues SER11 and VAL15 have been remodeled using PyMOL. Crambin has three disulfide bridges (NCCS:SCCN atoms). These disulfide bridges have two stable positions; in two MD simulations over 5 µs each, we observed the first and the third disulfide bridge to flip between stable positions (measured as the dihedral angle of the N-C-C-S configurations). Together with a twist at an arginine residue, this explained almost all of the variation seen. We computed 22 clusters and made sure all observed variations were represented.
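The similarity described above (center both configurations, apply the optimal rotation, take the negative sum of squared distances) can be sketched with the Kabsch algorithm. The function name is our own; the resulting similarity matrix could then be passed to, e.g., scikit-learn's AffinityPropagation with affinity="precomputed" and the preference value quoted in the text.

```python
import numpy as np

# Negative sum of squared distances after optimal superposition
# (Kabsch algorithm via an SVD). `a` and `b` are (N, 3) coordinate arrays
# with corresponding atoms in the same order.

def similarity(a, b):
    a = a - a.mean(axis=0)                 # center both point sets
    b = b - b.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)
    d = np.sign(np.linalg.det(u @ vt))     # guard against reflections
    rot = u @ np.diag([1.0, 1.0, d]) @ vt  # optimal rotation (row-vector form)
    return -np.sum((a @ rot - b) ** 2)
```

Two configurations that differ only by a rigid-body motion obtain a similarity of (numerically) zero; any genuine structural difference yields a negative value.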
To check the accuracy of our GEMS simulations, we select samples from 100 trajectories of 2 500 steps each, starting from a common stretched initial conformation. We subsample by only taking every second time step, which leaves 125,000 conformations. We use affinity propagation (see Section S3) to obtain representative samples. The similarity is the negative sum of square distances between corresponding non-hydrogen atoms, after centering the molecules and applying an optimal rotation. However, using affinity propagation directly on these trajectories would introduce a large bias towards stable end conformations: our trajectories contain stable end conformations for roughly half of the time, so affinity propagation with the default settings would spend most representatives on the stable conformations, largely ignoring the interesting folding part. To reduce this bias, we use a preprocessing step that removes conformations that have a small distance to an already selected point. (The threshold used was 9 Å² for the sum of the square distances, corresponding to an RMSD of 0.3 Å over the non-hydrogen atoms.) This thins out the stable tails of the trajectories, but if we do this only within individual trajectories, we get another bias around the common initial conformation. Computing pairwise distances for the union of all trajectories would be computationally expensive, so we use an approximation: we randomly mix all 100 trajectories, subdivide them into a partition of smaller subsets, and remove "almost duplicates" (as above) only inside each subset of the partition. We then mix and thin out again three more times. This removes most of the "almost duplicates", and we arrive at 25,249 conformations from the original 125,000. On these 25,249 conformations we can then run affinity propagation; using a preference of −1500 Å² we arrive at 1 554 representatives.
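The mix-partition-thin procedure can be sketched as follows. This is a toy illustration with scalars standing in for conformations: `dist2` is any squared-distance function, the threshold and the four mixing rounds follow the text, while all names, the chunk size, and the seed are our own assumptions.

```python
import random

def thin_within(chunk, dist2, threshold):
    # greedily drop points that are close to an already kept point
    kept = []
    for x in chunk:
        if all(dist2(x, y) > threshold for y in kept):
            kept.append(x)
    return kept

def thin(pool, dist2, threshold=9.0, rounds=4, chunk_size=500, seed=0):
    rng = random.Random(seed)
    pool = list(pool)
    for _ in range(rounds):
        rng.shuffle(pool)  # mix all trajectories together
        pool = [x
                for i in range(0, len(pool), chunk_size)
                for x in thin_within(pool[i:i + chunk_size], dist2, threshold)]
    return pool
```

Because near-duplicates are only compared within each chunk, the cost per round is linear in the pool size (times the chunk size) rather than quadratic, at the price of occasionally missing duplicates that land in different chunks; repeating the shuffle removes most of the remainder.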
Plotting the distance to the nearest representative (blue curve) over the trajectories now shows an even average distance: the red curve is a rolling average over 100 time points, and it hovers around 0.4 Å. The x-axis gives the time steps in the simulation. After every 2 500 steps (= 250 ps) the next trajectory starts, so the figure shows data from the first two trajectories. In the blue curve, conformations that are selected as cluster representatives can be recognized as the time points at which the curve touches the x-axis (since the distance to the closest representative is then 0). We can see that there are regions that need more representatives (e.g., when folding happens) and regions with only occasional representatives (in the more stable end phase), but the average distance stays approximately constant. While this takes care of any obvious bias towards common or stable conformations, we also add a list of 1 000 conformations that are as far away as possible from all previously selected conformations. These can be thought of as untypical or unstable conformations, and we want to make sure that our model works as well for them as it does for the perhaps more typical cluster representatives. The MD simulations are performed with the SchNetPack [140] toolbox, which provides an interface to the Atomic Simulation Environment [141] for running MD simulations with machine learning models. SchNetPack includes a fully functional MD suite, which can be used to perform efficient MD and PIMD simulations in different ensembles. The SpookyNet [142] model implements the schnetpack.md.calculators.MDCalculator interface from SchNetPack. See Fig. S1 for a schematic and the corresponding papers for more details. Both SpookyNet and SchNetPack are written in PyTorch and can thus be used to run MD simulations directly on the GPU, increasing performance (and decreasing the time per simulation step) by orders of magnitude compared to CPU-based models.
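The selection of conformations that are as far away as possible from all previously selected ones, mentioned above, is an instance of farthest-point sampling. A minimal sketch, with plain Euclidean points standing in for aligned conformations and all names our own:

```python
import numpy as np

# Greedy farthest-point sampling: repeatedly pick the candidate whose
# distance to the nearest already-selected point is largest.
# `points` is an (N, D) array, `selected` a list of already-chosen indices.

def farthest_point_sample(points, selected, n_extra):
    points = np.asarray(points, dtype=float)
    # distance of every candidate to its nearest already-selected point
    d = np.min(np.linalg.norm(points[:, None] - points[list(selected)],
                              axis=-1), axis=1)
    picks = []
    for _ in range(n_extra):
        i = int(np.argmax(d))
        picks.append(i)
        d = np.minimum(d, np.linalg.norm(points - points[i], axis=1))
    return picks
```

For conformations, the Euclidean norm would be replaced by the aligned distance used throughout this section.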
References
Molecular dynamics simulations of biomolecules
An undulatory theory of the mechanics of atoms and molecules
Recent advances in first-principles based molecular dynamics
Scaling of multimillion-atom biological molecular dynamics simulation on a petascale supercomputer
Anton, a special-purpose machine for molecular dynamics simulation
QM/MM methods for biomolecular systems
How large should the QM region be in QM/MM calculations? The case of catechol O-methyltransferase
Machine learning force fields
Reactive atomistic simulations of Diels-Alder reactions: The importance of molecular rotations
Molecular force fields with gradient-domain machine learning: Construction and application to dynamics of small molecules with coupled cluster forces
Stability of polypeptide helices: Critical role of van der Waals interactions
Andras Perczel, Alexander Rashin, and Imre G Csizmadia. α- and 3 10 -helix interconversion: A quantum-chemical study on polyalanine systems in the gas phase and in aqueous solvent
The Coblentz Society desk book of infrared spectra. National Standard Reference Data System
Role of a salt bridge in the model protein crambin explored by chemical protein synthesis: X-ray structure of a unique protein analogue, [V15A]crambin-α-carboxamide
UMAP: Uniform manifold approximation and projection for dimension reduction
Computing vibrational spectra from ab initio molecular dynamics
Molecular basis for higher affinity of SARS-CoV-2 spike RBD for human ACE2 receptor
Molecular dynamics of an α-helical polypeptide: Temperature dependence and deviation from harmonic behavior
Dynamics of folded proteins
Are there pathways for protein folding?
Dynamical strengthening of covalent and non-covalent molecular interactions by nuclear quantum effects at finite temperature
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation
Super-resolution in molecular dynamics trajectory reconstruction with bi-directional neural networks
Alchemical predictions for computational catalysis: Potential and limitations
Alchemical derivatives of reaction energetics
Theoretical study of anion binding to calix[4]pyrrole: The effects of solvent, fluorine substitution, cosolute, and water traces
Predicting the effects of basepair mutations in DNA-protein complexes by thermodynamic integration
Effect of guanine to inosine substitution on stability of canonical DNA and RNA duplexes: Molecular dynamics thermodynamics integration study
Site-specific thermodynamics: Understanding cooperativity in molecular recognition
Explaining deep neural networks and beyond: A review of methods and applications
Evolutionarily conserved pathways of energetic connectivity in protein families
Origins of allostery and evolvability in proteins: A case study
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
Open Babel: An open chemical toolbox
ANI-1: An extensible neural network potential with DFT accuracy at force field computational cost
Solvated protein fragments data set
PhysNet: A neural network for predicting energies, forces, dipole moments, and partial charges
Psi4 1.1: An open-source electronic structure program emphasizing automation, advanced libraries, and interoperability
Psi4NumPy: An interactive quantum chemistry programming environment for reference implementations and rapid development
The individual and collective effects of exact exchange and dispersion interactions on the ab initio structure of liquid water
Exploring the conformational preferences of 20-residue peptides in isolation: Ac-Ala 19 -Lys + H + vs
Canonical sampling through velocity rescaling
Polymorphic transitions in single crystals: A new molecular dynamics method
Avogadro: an advanced semantic chemical editor, visualization, and analysis platform
PyMOL: An open-source molecular graphics tool. CCP4 Newsl. Protein Crystallogr.
SchNetPack: A deep learning toolbox for atomistic systems
Molecular dynamics simulations of a protein in the canonical ensemble
Explicit reversible integrators for extended systems dynamics
Mature HIV-1 capsid structure by cryo-electron microscopy and all-atom molecular dynamics
Citizen scientists create an exascale computer to combat COVID-19
On the determination of molecular fields. II. From the equation of state of a gas
Force fields and molecular dynamics simulations. In École thématique de la Société Française de la Neutronique
High-dimensional potential energy surfaces for molecular simulations: From empiricism to machine learning
Dynamic properties of force fields
Polarizable force fields
Polarizable force fields: History, test cases, and prospects
Force field modeling of conformational energies: Importance of multipole moments and intramolecular polarization
Beyond point charges: Dynamic polarization from neural net predicted multipole moments
Accuracy and tractability of a Kriging model of intramolecular polarizable multipolar electrostatics and its application to histidine
Multipolar electrostatics
Minimal distributed charges: Multipolar quality at the cost of point charge electrostatics
An empirical valence bond approach for comparing reactions in solutions and in enzymes
ReaxFF: A reactive force field for hydrocarbons
Generalized neural-network representation of high-dimensional potential-energy surfaces
QCTFF: On the construction of a novel protein force field
ANI-1: An extensible neural network potential with DFT accuracy at force field computational cost
First principles neural network potentials for reactive simulations of large molecular and condensed systems
Four generations of high-dimensional neural network potentials
Machine learning of accurate energy-conserving molecular force fields
Towards exact molecular dynamics simulations with machine-learned force fields
Machine Learning Meets Quantum Physics
Fast and accurate modeling of molecular atomization energies with machine learning
Machine learning of molecular electronic properties in chemical compound space
Assessment and validation of machine learning methods for predicting molecular atomization energies
Machine learning predictions of molecular properties: Accurate many-body potentials and nonlocality in chemical space
Sparse learning of stochastic dynamical equations
Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning
Equivariant flows: Exact likelihood generative learning for symmetric densities
Targeted adversarial learning optimized sampling
Exhaustive state-to-state cross sections for reactive molecular collisions from importance sampling simulation and a neural network representation
Improved protein structure prediction using potentials from deep learning
Highly accurate protein structure prediction with AlphaFold
Highly accurate protein structure prediction for the human proteome
Solving the quantum many-body problem with artificial neural networks
Ab initio solution of the many-electron Schrödinger equation with deep neural networks
Deep-neural-network solution of the electronic Schrödinger equation
Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions
A deep neural network for molecular wave functions in quasi-atomic minimal basis representation
SE(3)-equivariant prediction of molecular wavefunctions and electronic densities
Machine learning of solvent effects on molecular spectra and reactions
Deep reinforcement learning for de novo drug design
MolecularRNN: Generating realistic molecular graphs with optimized properties
Generating equilibrium molecules with deep neural networks
Symmetry-adapted generation of 3d point sets for the targeted discovery of molecules
Generating valid Euclidean distance matrices
Efficient multi-objective molecular optimization in a continuous latent space
Inverse design of 3d molecular structures with conditional generative neural networks
Machine learning the ropes: Principles, applications and directions in synthetic chemistry
Commentary: The Materials Project: A materials genome approach to accelerating materials innovation
Predicting crystal structure by merging data mining with quantum mechanics
Accelerating the discovery of materials for clean energy in the era of smart automation
Machine learning for molecular and materials science
Use machine learning to find energy materials
Machine learning for alloys
Machine learning for molecular simulation
Exploring chemical compound space with quantum-based machine learning
Ab initio machine learning in chemical compound space
Retrospective on a decade of machine learning for chemical discovery
Machine learning for chemical discovery
Combining machine learning and computational chemistry for predictive insights into chemical systems
Unsupervised learning methods for molecular simulation data
Machine learning for chemical reactions
Machine learning for electronically excited states of molecules
Polymorphic transitions in single crystals: A new molecular dynamics method
Canonical sampling through velocity rescaling
Improved side-chain torsion potentials for the Amber ff99SB protein force field
How to run molecular dynamics simulations on electrospray droplets and gas phase proteins: Basic guidelines and selected applications
Clustering by passing messages between data points
SchNetPack: A deep learning toolbox for atomistic systems
The atomic simulation environment - a Python library for working with atoms
SpookyNet: Learning force fields with electronic degrees of freedom and nonlocal effects
Three-dimensional structure of the water-insoluble protein crambin in dodecylphosphocholine micelles and its minimal solvent-exposed surface
Knowledge-based protein secondary structure assignment