Submitted 4 September 2019 Accepted 10 February 2020 Published 16 March 2020 Corresponding author F. Marty Ytreberg, ytreberg@uidaho.edu Academic editor James Procter Additional Information and Declarations can be found on page 10 DOI 10.7717/peerj-cs.264 Copyright 2020 Mirabzadeh and Ytreberg Distributed under Creative Commons CC-BY 4.0 OPEN ACCESS Implementation of adaptive integration method for free energy calculations in molecular systems Christopher A. Mirabzadeh1 and F. Marty Ytreberg1,2,3 1 Department of Physics, University of Idaho, Moscow, ID, United States of America 2 Institute for Modeling Collaboration and Innovation, University of Idaho, Moscow, ID, United States of America 3 Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID, United States of America ABSTRACT Estimating free energy differences by computer simulation is useful for a wide variety of applications such as virtual screening for drug design and for understanding how amino acid mutations modify protein interactions. However, calculating free energy differences remains challenging and often requires extensive trial and error and very long simulation times in order to achieve converged results. Here, we present an implementation of the adaptive integration method (AIM). We tested our implementation on two molecular systems and compared results from AIM to those from a suite of other methods. The model systems tested here include calculating the solvation free energy of methane, and the free energy of mutating the peptide GAG to GVG. We show that AIM is more efficient than other tested methods for these systems, that is, AIM results converge to a higher level of accuracy and precision for a given simulation time. Subjects Computational Biology, Scientific Computing and Simulation Keywords Adaptive integration, Monte Carlo, Free energy, Solvation, Protein, Biomolecule INTRODUCTION Measuring free energy differences using computer simulations can be computationally expensive, yet is useful for many different applications (see e.g., Steinbrecher & Labahn, 2010; Chodera et al., 2011; Mobley et al., 2012; Zhan et al., 2013; Miller et al., 2014; Petukh, Li & Alexov, 2015; Zhan & Ytreberg, 2015; Miller et al., 2016; Cournia, Allen & Sherman, 2017; Hossain et al., 2019; Aminpour, Montemagno & Tuszynski, 2019). Specific examples include determining protein conformational preferences, virtual screening for drug design or drug discovery (Steinbrecher & Labahn, 2010; Chodera et al., 2011; Zhan & Ytreberg, 2015; Śledź & Caflisch, 2018; Aminpour, Montemagno & Tuszynski, 2019; Zhang et al., 2019). Of specific relevance to the current study is that free energy calculations allow prediction of how amino acid mutations may modify protein-protein binding (Zhan et al., 2013; Miller et al., 2014; Petukh, Li & Alexov, 2015; Miller et al., 2016; Geng et al., 2019). We are particularly interested in developing and implementing efficient methods for calculating free energy differences and using them to understand how amino acid mutations modify protein–protein and protein-substrate interactions. How to cite this article Mirabzadeh CA, Ytreberg FM. 2020. Implementation of adaptive integration method for free energy calculations in molecular systems. PeerJ Comput. Sci. 6:e264 http://doi.org/10.7717/peerj-cs.264 https://peerj.com/computer-science mailto:ytreberg@uidaho.edu https://peerj.com/academic-boards/editors/ https://peerj.com/academic-boards/editors/ http://dx.doi.org/10.7717/peerj-cs.264 http://creativecommons.org/licenses/by/4.0/ http://creativecommons.org/licenses/by/4.0/ http://doi.org/10.7717/peerj-cs.264 For this study, we have implemented the adaptive integration method (AIM) introduced by Fasnacht, Swendsen & Rosenberg (2004) for use in the GROMACS (Berendsen, Van der Spoel & Van Drunen, 1995) molecular dynamics simulation package, and have compared results to a suite of other methods. In previous studies, AIM was shown to provide high quality, precise and efficient estimates of binding free energies (Ytreberg, Swendsen & Zuckerman, 2006; Kaus, Arrar & McCammon, 2014; Kaus & Mccammon, 2015). We focus on alchemical free energy calculation where a system is transformed from one state to another via an unphysical pathway. The progress along the pathway that connects the two states is defined by the parameter λ. For this study, we are interested in two ways to explore λ space—both of which have the goal of obtaining equilibrium sampling of system configurations at discrete λ values along the pathway. The first way is to treat λ as a system variable that can be biased and sampled. A class of such methods, termed generalized ensemble, use an extended Hamiltonian to sample λ (Bitetti-Putzer, Yang & Karplus, 2003). For example, λ-dynamics (Kong & Brooks, 1996; Knight & Brooks, 2009; Knight & Brooks, 2011) treats λ as a dynamic particle in the system with fictitious mass. By contrast, AIM uses Metropolis Monte Carlo to sample λ space (Fasnacht, Swendsen & Rosenberg, 2004). Monte Carlo moves between values of λ are based on running estimates of free energy differences; this is a key distinction from other methods and allows AIM to continuously improve the estimate for the free energy during the simulation. The second way to explore λ space is to perform standard molecular dynamics or Monte Carlo simulations at fixed values of λ, typically discarding some simulation time for equilibration. The configurational ensembles at each value of λ can then be used to estimate free energy differences (see e.g., Lyubartsev, Førrisdahl & Laaksonen, 1996; Gonçalves & Stassen, 2004; Kofke, 2005; Shirts, Mobley & Chodera, 2007; Chodera & Shirts, 2011; Klimovich, Shirts & Mobley, 2015). In order to provide comparisons for AIM to fixed λ methods, we used the Python tool alchemical-analysis.py (Klimovich, Shirts & Mobley, 2015), part of the Pymbar package (Shirts & Chodera, 2008). This tool estimates the free energy using a suite of methods such as the Bennett acceptance ratio, multistate Bennett acceptance ratio, thermodynamic integration and exponential averaging. For the current study, we chose two molecular systems that have well-documented results and are important starting points for biomolecular free energy studies. First, we calculated the solvation free energy of methane. Simulations were performed and the free energies were calculated using the fixed λ methods provided by alchemical-analysis. Simulations were also performed using AIM and results compared to fixed λ simulations. Using the lessons learned from the methane system, we then calculated the free energy of mutating the peptide GAG to GVG in water. For both systems, we found that AIM produces free energy estimates that are within statistical uncertainty of fixed λ methods but with greater efficiency (i.e., more accurate for a given simulation time). METHODS All methods, code and simulation input files are available in the supplemental materials. For this study, we performed alchemical free energy simulations where the system is changed Mirabzadeh and Ytreberg (2020), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.264 2/14 https://peerj.com http://dx.doi.org/10.7717/peerj-cs.264 from a reference state to an end state by constructing a reaction pathway that modifies, adds or removes atoms. Such alchemical simulations are non-physical, i.e., the simulation does not represent what could occur naturally. Since the free energy is a state variable, it is independent of the path taken, and we may provide any path we wish. To perform these simulations the reaction pathway is divided into many separate, non-physical, λ states between a reference state and an end state. The λ states represent the progress along the reaction pathway as the reference state transforms into the end state. Like most methods used to calculate free energies we start from the identity, F =U −TS, (1) where U is the potential energy, T is the temperature and S is the entropy of the system. For free energy differences we generalize the formulation of the change in free energy by separating calculations into two, non-overlapping, thermodynamic end states, A and B, at constant system temperature T , 1F ≡1FA→B=FB−FA=1U −T1S. (2) 1F is the change in free energy, 1U is the change in potential energy and 1S is the change in entropy of the system. According to statistical mechanics, the free energy difference between the two end states, A and B, of the system is the log of the ratio of the configurational partition functions (see discussion in Chipot & Pohorille (2007)), 1F =−kBT ln Z[UB(Ex)] Z[UA(Ex)] . (3) Here, kB is the Boltzmann constant and Z[U(Ex)] is the configurational partition function for the energy states UA(Ex) and UB(Ex), where Ex is the vector of configuration coordinates. The configurational partition function is given by Z[U(Ex)]= ∫ exp(−βU(Ex))dEx, (4) and β= 1kBT . Computationally, we calculate free energy differences between end states by performing molecular dynamics simulations along a reaction pathway of intermediate states, defined by λ, such that, 0≤λ≤1. This pathway connects the two end states of the system. In the case of poor overlap, where the end states may be separated by a high energy barrier, |UB−UA|�kBT , this pathway mitigates the otherwise very slow convergence of free energy estimates (Shirts, Mobley & Chodera, 2007). Care should be taken when choosing intermediate states such that there is adequate overlap in the conformation space between the end states (Shirts, Mobley & Chodera, 2007; Klimovich, Shirts & Mobley, 2015). For our simulations the number of λ values and time per λ were chosen through extensive trial and error (more on this below). The method of exponential averaging (DEXP, IEXP) (Zwanzig, 1954) starts from Eq. (3) above and then adding and subtracting exp(−βU(Ex)) from the integral in the configurational partition function of the numerator we end up with the final relationship, 1Fij =−kBT ln〈exp(−β1Uij(Ex))〉λi. (5) Mirabzadeh and Ytreberg (2020), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.264 3/14 https://peerj.com http://dx.doi.org/10.7717/peerj-cs.264 where 1Fij is the free energy between λi and λj and 〈·〉λi represents an average of the equilibrium configuration for λi. Unlike some other methods, exponential averaging has an exact solution since it is only used to evaluate the difference between two states. However, it is the least efficient method and should not be used if difference in potential energies are much larger than kBT (Shirts & Pande, 2005). In addition, exponential averaging can be noisy, biased and dependent on the tails of the distribution of λ states (Bruckner & Boresch, 2011; Shirts & Pande, 2005). For thermodynamic integration (TI) we estimate the free energy by first looking at the derivative of Eq. (1) with respect to λ, ∂F ∂λ = 〈 ∂U ∂λ 〉 λ . (6) This differential equation, Eq. (6), can then be integrated to give, 1F = ∫ 1 λ=0 〈 ∂Uλ(Ex) ∂λ 〉 λ dλ (7) where the 〈·〉λ notation represents the ensemble average at a given intermediate state, λ. The free energy is estimated by numerically integrating Eq. (7) after running equilibrium simulations at each intermediate λ state. Since numerical integration is required, TI can be biased by the chosen method of integration. Some of that bias can be removed by using cubic-spline interpolation or more complex integration estimators(Shirts & Pande, 2005; Shyu & Ytreberg, 2009). The Bennett (Bennett, 1976) and multistate (Shirts & Chodera, 2008) Bennett acceptance ratio (BAR and MBAR) methods are far more efficient than exponential averaging and are commonly used to avoid the shortcomings of other methods (Shirts & Pande, 2005; Ytreberg, Swendsen & Zuckerman, 2006). BAR and MBAR typically achieve the same statistical precision as TI with fewerλstates unless the integrand for TI is very smooth (Shirts & Mobley, 2013; Ytreberg, Swendsen & Zuckerman, 2006). The complete derivation can be found in Bennett’s paper (Bennett, 1976) but the premise is; for sufficiently large samples ni of Ui and nj of Uj, 1F(i→ j)=kBT ln 〈f (1Uij+C)〉j 〈f (1Uji−C)〉i +C. (8) C is a shift constant, C =kBT ln nj ni , (9) and f (x) is the Fermi function, f (x)= 1 1+exp(βx) . (10) Equation (8) is the ratio of canonical averages of two different potentials Ui and Uj acting on the same configuration space meaning it requires information from two neighboring states. However, this limitation is not too much of a concern with a trivial coordinate Mirabzadeh and Ytreberg (2020), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.264 4/14 https://peerj.com http://dx.doi.org/10.7717/peerj-cs.264 transformation or when using dummy coordinates in alchemical simulations. MBAR, an extension of BAR, differs in that it takes data from more than two states hence the name ‘‘multistate’’. AIM is similar to TI in that numerical integration of Eq. (7) is performed; the key difference is how the averages 〈∂U/∂λ〉λ are obtained. AIM uses Metropolis Monte Carlo to move in λ space and ordinary running averages are calculated at each λ value. In AIM, a random move from λold to λnew is accepted with probability min{1,exp(−β(Unew)−Uold)+β(Fnew−Fold)} (11) where Unew −Uold is the difference in the potential energy for the old and new λ values. Fnew−Fold is the estimated free energy difference based on the current running averages of ∂U/∂λ. Implementation AIM was implemented in GROMACS as an expanded ensemble calculation. That is, the Hamiltonian must be calculated along with its derivative, and an expanded ensemble step must be performed for every dynamics step. In GROMACS, nstexpanded is the number of integration steps between attempted λ moves changing the system Hamiltonian in expanded ensemble simulations. This value must be a multiple of nstcalcenergy, the number of steps before calculating the system energy, but can be greater or less than nstdhdl, the number of steps before calculating∂U/∂λ(referred to as dHdλ in GROMACS documentation). For a detailed explanation of all technical terms see reference Abraham et al. (2016). The GROMACS package was further altered to print out the ∂U/∂λ averages computed by AIM to the log file when AIM is used as the lmc-mover. AIM requires the ∂U/∂λ value from every dynamics step to be stored regardless of whether a move in λ space is attempted. Since ∂U/∂λ is only calculated at each step where free energies are calculated, every nstdhdl step, we set nstexpanded = nstdhdl = nstcalcenergy = 1 for AIM simulations. This further implies that lmc-stats functions were not used during AIM simulations because those functions modify the Hamiltonian which is not needed for AIM. For the implementation of AIM with GROMACS we follow the outline given in our previous study Ytreberg, Swendsen & Zuckerman (2006). 1. Start the simulation from an equilibrated configuration at λ=0 and perform one molecular dynamics step. 2. Randomly choose a trial move in λ space. For example, if our λ spacing is 0.05, a move from λ=0.35 to 0.4 or 0.3 may be attempted but not to 0.45. 3. Calculate the difference in potential energy between the trial and current λ values. 4. Estimate the free energy difference between the trial and current λ values using the running averages of ∂U/∂λ and the trapezoidal rule. 5. Accept λ trial with probability given in Eq. (11). 6. If the move is accepted then λ is updated to the trial value, otherwise the simulations stays at the current λ. 7. The running average of ∂U/∂λ is updated. Mirabzadeh and Ytreberg (2020), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.264 5/14 https://peerj.com http://dx.doi.org/10.7717/peerj-cs.264 Simulation details The first system used here, methane in water, is detailed in systematic studies of force fields and the free energies of hydration of amino acid side chain analogs (Sun et al., 1992; Lyubartsev, Førrisdahl & Laaksonen, 1996; Chodera & Shirts, 2011; Paliwal & Shirts, 2011). For the GAG to GVG mutation the PMX (Gapsys et al., 2015) software package was used to construct the tri-peptide mutation. Using PMX, we generated the hybrid protein structure and topology for simulations of the chosen mutation, alanine to valine. All simulations described in this paper were performed using the molecular dynamics package GROMACS 5.1.4. The simulations were carried out at 300 K and solvated in a dodecahedron box with TIP3P waters. The methane molecule was parameterized using the OPLS (Optimized Potential for Liquid Simulations) force field (Jorgensen, Maxwell & Tirado-Rives, 1996). The OPLS force field was chosen for this study because it is known to perform well on small molecules (Shirts et al., 2003). In future studies, we anticipate using AIM on protein systems where other force fields are more appropriate such as AMBER (Salomon-Ferrer, Case & Walker, 2013) and CHARMM (Mackerell, Banavali & Foloppe, 2001). Since all molecular dynamics force fields have similar form and number of parameters, it is expected that the performance of AIM would not depend on the force field chosen. For the GAG to GVG mutations, Na+ and Cl- ions were added to keep the simulation box neutral and reach a physiologically relevant 150 mM salt concentration. For both systems, energy minimization was performed using steepest descent for 1,000 steps. The system was then equilibrated using simulated annealing for 1,000 ps to heat the system from 100 K to 300 K. For production simulations, electrostatic interactions were handled by Reaction field with a cut-off of 0.9 nm, Potential-shift-Verlet modifier and Verlet cutoff scheme. Van der Waals interactions were handled by twin range cutoffs with neighbor list cutoff of 1.15 nm and van der Waals cutoff of 0.9 nm. The bonds involving hydrogens were constrained with the Shake algorithm, allowing for a 2 fs time step. Long range dispersion corrections for energy and pressure were applied. For the free energy calculations, softcore scaling was used with parameters sc-power=1, sc-r-power=6 and sc-alpha=0.5. In addition, the van der Waals and Coulomb interactions were separately turned on or off as a function of λ. That is, one is held fixed as a function of λ while the other changes. For the methods processed with alchemical-analysis.py we ran fixed λ simulations. That is, an equal amount of simulation time was spent at each λ value. For AIM we ran expanded ensemble simulations where we alternate between taking molecular dynamics steps and attempting trial moves in λ space. That is, for AIM the amount of time spent at each λ value is determined by the algorithm. In order to determine the best distribution of intermediate λ states we followed a simple strategy: (i) Conduct short simulations with a small set of intermediates. (ii) Generate a plot comparing slope values between AIM and fixed λ (iii) Determine the locations of curvature in the estimate of the free energy. (iv) Increase the density of intermediate states in locations of high curvature. (v) Repeat until all areas of high curvature have been well explored. We note that these steps should be performed for any method where the slope of the free energy is used to calculate free energies, including both TI and AIM. Similar Mirabzadeh and Ytreberg (2020), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.264 6/14 https://peerj.com http://dx.doi.org/10.7717/peerj-cs.264 Figure 1 Different λ densities for methane solvation free energy calculations. Eight trial simulations of 100 ps per λ for 11, 21 and 31 λ values. This shows how the number of λ values were chosen to effectively compare AIM to fixed λ simulations. The circles indicate the region where the λ density needed to be in- creased. Full-size DOI: 10.7717/peerjcs.264/fig-1 steps would be performed for methods such as DEXP, IEXP, BAR and MBAR to ensure that energy differences are not too large between λ values. RESULTS Methane After conducting short simulations, generating plots to determine locations of high curvature and increasing λ density in those regions, we averaged eight trial simulations of 100 ps per λ for separate λ distributions (see Fig. 1). We found, by progressively increasing the λ density between λ=0.5 and λ=1.0, that a distribution of 31 λ values gave us a dense enough distribution to properly compare AIM to fixed λ methods for the methane simulations. Figure 2 is a violin plot to visualize the distribution and probability densities over the eight trials for each method as a function of simulation time per value of λ. A violin plot combines a box plot and a density plot to show the shape of the distribution around the mean. The thick black bar in the center represents the interquartile range, the white dot is the median and the thin black line going vertically through the middle represents the upper and lower adjacent values. Reading a violin plot is similar to reading a density plot. The thicker parts represent high frequency values and the thinner parts represent low Mirabzadeh and Ytreberg (2020), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.264 7/14 https://peerj.com https://doi.org/10.7717/peerjcs.264/fig-1 http://dx.doi.org/10.7717/peerj-cs.264 Figure 2 Violin plot showing methane solvation results for 31 λ values averaged over eight trials. A vi- olin plot combines a box plot and a density plot to visualize the distribution and probability density. The graphic shows all methods have similarly converged at 1 ns per λ. AIM and AIM-CUBIC converge earlier than other methods at 750 ps per λ. Full-size DOI: 10.7717/peerjcs.264/fig-2 frequency values. The advantage of a violin plot over a box plot is that we are able to view the underlying distribution of the data. In Fig. 2 at 100 ps per λ most methods have similar standard deviations of around 0.05 kcal/mol (visualized by the height of the violin shape in the figure), but the slower convergence of MBAR in this case leads to a larger standard deviation of around 0.16 kcal/mol. By 750 ps AIM has converged to a smaller standard deviation of around 0.02 kcal/mol compared to the other methods at around 0.05 kcal/mol. GAG to GVG Mutation For the GAG to GVG mutation we first tested a distribution of 41 λ values averaged over 8 trial simulations of 1 ps and 100 ps per λ; see Fig. 3. By reviewing the smoothness of the function we concluded that 41 λ values was sufficient. Figure 4 shows the convergence of the free energy for GAG to GVG over time for each method. At just 1 ps per λ value the AIM estimates are within less than 1.0 kcal/mol of the converged (1 ns per λ) result with a standard deviation of around 0.5 kcal/mol. All other methods are more than 2.0 kcal/mol from this converged result with larger standard deviations of around 1.0 kcal/mol. At 100 ps all estimates are less than 0.1 kcal/mol from the converged result, but AIM estimates have a standard deviation of around 0.1 kcal/mol compared to other methods at 0.4 kcal/mol. All methods have similarly converged at 1 ns per λ. DISCUSSION In the limit of infinite sampling, all rigorous methods (i.e., statistical mechanics-based methods), performed properly with the same force-field and parameters, will yield the same result within uncertainty. Often, it is of interest to define accuracy by comparing results to experimental data. However, given that the purpose of this study is to compare various computational methods, we define accuracy by comparing results to the value upon which all methods converge. For fixed λ simulations the sampling time is typically the same for each λ state. Sampling time must be increased whenever convergence has not been achieved. However, if bias is introduced by using an insufficient number of λ values in regions of high curvature, increased sampling leads to radical convergence problems (Shyu & Ytreberg, 2009; Steinbrecher & Labahn, 2010). If the curvature of the underlying free energy slope values is large, averaging over a state space that is not dense enough Mirabzadeh and Ytreberg (2020), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.264 8/14 https://peerj.com https://doi.org/10.7717/peerjcs.264/fig-2 http://dx.doi.org/10.7717/peerj-cs.264 Figure 3 Different simulation times for alanine to valine mutation free energy calculations. Eight trial simulations of 41 λ values at 1 ps, 100 ps and 1 ns per λ. Note the smoothness of AIM versus fixed λ simu- lations. AIM requires less samples than fixed λ simulations to smooth the free energy function. Full-size DOI: 10.7717/peerjcs.264/fig-3 BA R MB AR IE XP T I TI -C UB IC DE XP AI M AI M- CU BI C 30.0 32.5 35.0 G (k ca l/m ol ) 1 ps BA R MB AR IE XP T I TI -C UB IC DE XP AI M AI M- CU BI C 100 ps BA R MB AR IE XP T I TI -C UB IC DE XP AI M AI M- CU BI C 1ns Figure 4 Violin plot showing alanine to valine mutation results for 41 λ values averaged over eight tri- als. The graphic shows all methods have similarly converged at 1 ns per λ. AIM and AIM-CUBIC con- verge more rapidly than other methods and are mostly converged at 100 ps per λ. Full-size DOI: 10.7717/peerjcs.264/fig-4 to fully describe the state function propagates this bias requiring significantly increased sampling time to achieve convergence. For TI, the bias will persist even for infinite sampling. In addition, increasing sampling time may not be realistic when dealing with limited computational resources. Paliwal & Shirts (2011) make a detailed argument to why convergence may not be possible for all systems due to hard limitations in computational resources. In particular, both TI and AIM are calculating the same slope averages and should agree very well for simple systems and reasonably long simulation times. However, before Mirabzadeh and Ytreberg (2020), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.264 9/14 https://peerj.com https://doi.org/10.7717/peerjcs.264/fig-3 https://doi.org/10.7717/peerjcs.264/fig-4 http://dx.doi.org/10.7717/peerj-cs.264 convergence is reached, due to the fact that AIM spends more time in some regions, we should not expect the approximation of AIM to exactly match TI with similar sampling time until the number of λ values has been sufficiently increased in high curvature regions. Once we have properly chosen the λ values then reasonably long simulations will lead to highly similar results between these two methods. AIM is able to estimate the free energy for the amino acid mutation within 1.0 kcal/mol at a total simulation time of only 41 ps. This is quite remarkable since ±1.0 kcal/mol is typically the range that is desired for mutation studies. Of course, further studies using a broader range of amino acid change are needed, but it suggests that AIM may be suitable for quick estimation. We believe the reason that AIM performs so well in such cases is due to the Monte Carlo sampling that allows AIM to more efficiently sample λ space compared to fixed λ simulations. The reader may note that AIM violates detailed balance since the acceptance criterion contains the free energy estimates that are updated continuously. AIM does however obey detailed balance asymptotically. As simulation time increases, the average free energy differences between λ values reach an equilibrium and detailed balance is satisfied. Once this equilibrium is attained the algorithm will sample all λ values equally, that is, the histogram of the number of configurations will become flat as a function of λ. CONCLUSION In this report we have implemented the adaptive integration method (AIM) for calculating free energy differences in GROMACS and applied it to two molecular systems. We have shown agreement within statistical uncertainty between AIM and a suite of fixed λ methods for methane solvation and an GAG to GVG mutation. We have also shown that AIM is more efficient than the other tested methods. That is, for a given amount of simulation time, AIM has a higher level of accuracy and precision. We anticipate these findings will extend to larger, more complex systems. Future studies will be performed to test whether this is the case. Further, we found that running longer simulations with too few intermediate λ states generated results that were inconsistent between methods. The density and sampling convergence of theλstates directly influences the agreement between all the tested methods. Since some states will contribute disproportionately to the variance of the estimate, we found that generating short test simulations of different λ densities before attempting longer simulations is advisable. ADDITIONAL INFORMATION AND DECLARATIONS Funding Support for this research was provided by the National Science Foundation (DEB 1521049 and OIA 1736253) and the Center for Modeling Complex Interactions sponsored by the National Institutes of Health (P20 GM104420). Computer resources were provided in part by the Institute for Bioinformatics and Evolutionary Studies Computational Resources Mirabzadeh and Ytreberg (2020), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.264 10/14 https://peerj.com http://dx.doi.org/10.7717/peerj-cs.264 Core sponsored by the National Institutes of Health (P30 GM103324). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Grant Disclosures The following grant information was disclosed by the authors: National Science Foundation: DEB 1521049, OIA 1736253. The Center for Modeling Complex Interactions sponsored by the National Institutes of Health: P20 GM104420. Institute for Bioinformatics and Evolutionary Studies Computational Resources Core sponsored by the National Institutes of Health: P30 GM103324. Competing Interests The authors declare there are no competing interests. Author Contributions • Christopher A. Mirabzadeh conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft. • F. Marty Ytreberg conceived and designed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft. Data Availability The following information was supplied regarding data availability: All code used to generate the results, including scripts and modified GROMACS code are available in the Supplemental Files. Supplemental Information Supplemental information for this article can be found online at http://dx.doi.org/10.7717/ peerj-cs.264#supplemental-information. REFERENCES Abraham M, Van der Spoel D, Lindahl E, Hess B. 2016. GROMACS user manual version 5.1.4. Available at http//:www.gromacs.org. Aminpour M, Montemagno C, Tuszynski JA. 2019. An overview of molecular modeling for drug discovery with specific illustrative examples of applications. Molecules 24(9):Article 1693 DOI 10.3390/molecules24091693. Bennett CH. 1976. Efficient estimation of free energy differences from Monte Carlo Data. Journal of Computational Physics 22:245–268 DOI 10.1016/0021-9991(76)90078-4. Berendsen HJ, Van der Spoel D, Van Drunen R. 1995. GROMACS: a message-passing parallel molecular dynamics implementation. Computer Physics Communications 91(1–3):43–56 DOI 10.1016/0010-4655(95)00042-E. Mirabzadeh and Ytreberg (2020), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.264 11/14 https://peerj.com http://dx.doi.org/10.7717/peerj-cs.264#supplemental-information http://dx.doi.org/10.7717/peerj-cs.264#supplemental-information http://dx.doi.org/10.7717/peerj-cs.264#supplemental-information http//:www.gromacs.org http://dx.doi.org/10.3390/molecules24091693 http://dx.doi.org/10.1016/0021-9991(76)90078-4 http://dx.doi.org/10.1016/0010-4655(95)00042-E http://dx.doi.org/10.7717/peerj-cs.264 Bitetti-Putzer R, Yang W, Karplus M. 2003. Generalized ensembles serve to improve the convergence of free energy simulations. Chemical Physics Letters 377(5–6):633–641 DOI 10.1016/S0009-2614(03)01057-1. Bruckner S, Boresch S. 2011. Efficiency of alchemical free energy simulations. I. A practical comparison of the exponential formula, thermodynamic integration, and Bennett’s acceptance ratio method. Journal of Computational Chemistry 32(7):1303–1319 DOI 10.1002/jcc.21713. Chipot C, Pohorille A. 2007. Free energy calculations. New York: Springer. Chodera JD, Mobley DL, Shirts MR, Dixon RW, Branson K, Pande VS. 2011. Alchemi- cal free energy methods for drug discovery: progress and challenges. Current Opinion in Structural Biology 21(2):150–160 DOI 10.1016/j.sbi.2011.01.011. Chodera JD, Shirts MR. 2011. Replica exchange and expanded ensemble simulations as Gibbs sampling: Simple improvements for enhanced mixing. Journal of Chemical Physics 135(19):0–15 DOI 10.1063/1.3660669. Cournia Z, Allen B, Sherman W. 2017. Relative binding free energy calculations in drug discovery: recent advances and practical considerations. Journal of Chemical Information and Modeling 57(12):2911–2937 DOI 10.1021/acs.jcim.7b00564. Fasnacht M, Swendsen RH, Rosenberg JM. 2004. Adaptive integration method for Monte Carlo simulations. Physical Review E 69(5):056704-1–056704-15 DOI 10.1103/PhysRevE.69.056704. Gapsys V, Michielssens S, Seeliger D, De Groot BL. 2015. pmx: automated protein struc- ture and topology generation for alchemical perturbations. Journal of Computational Chemistry 36(5):348–354 DOI 10.1002/jcc.23804. Geng C, Xue LC, Roel-Touris J, Bonvin AM. 2019. Finding the 11G spot: are predictors of binding affinity changes upon mutations in protein–protein interactions ready for it? Wiley Interdisciplinary Reviews: Computational Molecular Science 9(5):e1410 DOI 10.1002/wcms.1410. Gonçalves PFB, Stassen H. 2004. Calculation of the free energy of solvation from molecular dynamics simulations. Pure and Applied Chemistry 76(1):231–240 DOI 10.1351/pac200476010231. Hossain S, Kabedev A, Parrow A, Bergström CA, Larsson P. 2019. Molecular simulation as a computational pharmaceutics tool to predict drug solubility, solubilization processes and partitioning. European Journal of Pharmaceutics and Biopharmaceutics 137(February):46–55 DOI 10.1016/j.ejpb.2019.02.007. Jorgensen WL, Maxwell DS, Tirado-Rives J. 1996. Development and testing of the OPLS all-atom force field on conformational energetics and properties of or- ganic liquids. Journal of the American Chemical Society 118(45):11225–11236 DOI 10.1021/ja9621760. Kaus JW, Arrar M, McCammon JA. 2014. Accelerated adaptive integration method. Journal of Physical Chemistry B 118(19):5109–5118 DOI 10.1021/jp502358y. Kaus JW, Mccammon JA. 2015. Enhanced ligand sampling for relative protein-ligand binding free energy calculations. Journal of Physical Chemistry B 119(20):6190–6197 DOI 10.1021/acs.jpcb.5b02348. Mirabzadeh and Ytreberg (2020), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.264 12/14 https://peerj.com http://dx.doi.org/10.1016/S0009-2614(03)01057-1 http://dx.doi.org/10.1002/jcc.21713 http://dx.doi.org/10.1016/j.sbi.2011.01.011 http://dx.doi.org/10.1063/1.3660669 http://dx.doi.org/10.1021/acs.jcim.7b00564 http://dx.doi.org/10.1103/PhysRevE.69.056704 http://dx.doi.org/10.1002/jcc.23804 http://dx.doi.org/10.1002/wcms.1410 http://dx.doi.org/10.1351/pac200476010231 http://dx.doi.org/10.1016/j.ejpb.2019.02.007 http://dx.doi.org/10.1021/ja9621760 http://dx.doi.org/10.1021/jp502358y http://dx.doi.org/10.1021/acs.jpcb.5b02348 http://dx.doi.org/10.7717/peerj-cs.264 Klimovich PV, Shirts MR, Mobley DL. 2015. Guidelines for the analysis of free energy calculations. Journal of Computer-Aided Molecular Design 29:397–411 DOI 10.1007/s10822-015-9840-9. Knight JL, Brooks CL. 2009. λ—dynamics free energy simulation methods. Journal of Computational Chemistry 30(11):1692–1700 DOI 10.1002/jcc.21295. Knight JL, Brooks CL. 2011. Multisite λ dynamics for simulated structure—activity relationship studies. Journal of Chemical Theory and Computation 7(9):2728–2739 DOI 10.1021/ct200444f. Kofke DA. 2005. Free energy methods in molecular simulation. Fluid Phase Equilibria 228–229:41–48 DOI 10.1016/j.fluid.2004.09.017. Kong X, Brooks CL. 1996. 3—dynamics: a new approach to free energy calculations. The Journal of Chemical Physics 105(6):2414–2423 DOI 10.1063/1.472109. Lyubartsev AP, Førrisdahl OK, Laaksonen A. 1996. Calculations of solvation free energies by expanded ensemble method. In: 2nd international conference on natural gas hydrates. Toulouse (France), June 2–6, 1996, (6). 311–318. Mackerell AD, Banavali N, Foloppe N. 2001. Development and current status of the CHARMM force field for nucleic acids. Biopolymers (Nucleic Acid Sciences 56:257–265. Miller CR, Johnson EL, Burke AZ, Martin KP, Miura TA, Wichman HA, Brown CJ, Ytreberg FM. 2016. Initiating a watch list for Ebola virus antibody escape mutations. PeerJ 4:e1674 DOI 10.7717/peerj.1674. Miller CR, Lee KH, Wichman HA, Ytreberg FM. 2014. Changing folding and binding stability in a viral coat protein: a comparison between substitutions accessible through mutation and those fixed by natural selection. PLOS ONE 9(11):e112988 DOI 10.1371/journal.pone.0112988. Mobley DL, Liu S, Cerutti DS, Swope WC, Rice JE. 2012. Alchemical prediction of hydration free energies for SAMPL. Journal of Computer-Aided Molecular Design 26(5):551–562 DOI 10.1007/s10822-011-9528-8. Paliwal H, Shirts MR. 2011. A benchmark test set for alchemical free energy transfor- mations and its use to quantify error in common free energy methods. Journal of Chemical Theory and Computation 7(12):4115–4134 DOI 10.1021/ct2003995. Petukh M, Li M, Alexov E. 2015. Predicting binding free energy change caused by point mutations with knowledge-modified MM/PBSA method. PLOS Computational Biology 11(7):1–23 DOI 10.1371/journal.pcbi.1004276. Salomon-Ferrer R, Case DA, Walker RC. 2013. An overview of the Amber biomolecular simulation package. Wiley Interdisciplinary Reviews: Computational Molecular Science 3(2):198–210 DOI 10.1002/wcms.1121. Shirts MR, Chodera JD. 2008. Statistically optimal analysis of samples from multiple equilibrium states. Journal of Chemical Physics 129(12):124105-1–124105-10 DOI 10.1063/1.2978177. Shirts MR, Mobley DL. 2013. An introduction to best practices in free energy cal- culations. In: Monticelli L, Salonen E, eds. Biomolecular simulations. Methods Mirabzadeh and Ytreberg (2020), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.264 13/14 https://peerj.com http://dx.doi.org/10.1007/s10822-015-9840-9 http://dx.doi.org/10.1002/jcc.21295 http://dx.doi.org/10.1021/ct200444f http://dx.doi.org/10.1016/j.fluid.2004.09.017 http://dx.doi.org/10.1063/1.472109 http://dx.doi.org/10.7717/peerj.1674 http://dx.doi.org/10.1371/journal.pone.0112988 http://dx.doi.org/10.1007/s10822-011-9528-8 http://dx.doi.org/10.1021/ct2003995 http://dx.doi.org/10.1371/journal.pcbi.1004276 http://dx.doi.org/10.1002/wcms.1121 http://dx.doi.org/10.1063/1.2978177 http://dx.doi.org/10.7717/peerj-cs.264 in molecular biology (Methods and Protocols), vol. 924. Totowa: Humana Press, 271–311 DOI 10.1007/978-1-62703-017-5. Shirts MR, Mobley DL, Chodera JD. 2007. Chapter 4 alchemical free energy calculations: ready for prime time? Annual Reports in Computational Chemistry 3(February 2016):41–59 DOI 10.1016/S1574-1400(07)03004-6. Shirts MR, Pande VS. 2005. Comparison of efficiency and bias of free energies com- puted by exponential averaging, the Bennett acceptance ratio, and thermody- namic integration. Journal of Chemical Physics 122(14):144107-1–144107-16 DOI 10.1063/1.1873592. Shirts MR, Pitera JW, Swope WC, Pande VS. 2003. Extremely precise free energy calculations of amino acid side chain analogs: comparison of common molecular mechanics force fields for proteins. Journal of Chemical Physics 119(11):5740–5761 DOI 10.1063/1.1587119. Shyu C, Ytreberg FM. 2009. Reducing the bias and uncertainty of free energy estimates by using regression to fit thermodynamic integration data. Journal of Computational Chemistry 30(14):2297–2304 DOI 10.1002/jcc.21231. Śledź P, Caflisch A. 2018. Protein structure-based drug design: from docking to molecular dynamics. Current Opinion in Structural Biology 48:93–102 DOI 10.1016/j.sbi.2017.10.010. Steinbrecher T, Labahn A. 2010. Towards accurate free energy calculations in ligand protein-binding studies. Current Medicinal Chemistry 17(8):767–785 DOI 10.2174/092986710790514453. Sun Y, Spellmeyer D, Pearlman DA, Kollman P. 1992. Simulation of the solvation free energies for methane, ethane, and propane and corresponding amino acid dipeptides: a critical test of the bond-PMF correction, a new set of hydrocarbon parameters, and the gas phase-water hydrophobicity scale. Journal of the American Chemical Society 114(17):6798–6801 DOI 10.1021/ja00043a027. Ytreberg FM, Swendsen RH, Zuckerman DM. 2006. Comparison of free energy methods for molecular systems. Journal of Chemical Physics 125(18):1–11 DOI 10.1063/1.2378907. Zhan YA, Wu H, Powell AT, Daughdrill GW, Ytreberg FM. 2013. Impact of the K24N mutation on the transactivation domain of p53 and its binding to murine double- minute clone 2. Proteins: Structure, Function and Bioinformatics 81(10):1738–1747 DOI 10.1002/prot.24310. Zhan YA, Ytreberg FM. 2015. The cis conformation of proline leads to weaker binding of a p53 peptide to MDM2 compared to trans. Archives of Biochemistry and Biophysics 575:22–29 DOI 10.1016/j.abb.2015.03.021. Zhang H, Liao L, Saravanan KM, Yin P, Wei Y. 2019. DeepBindRG: a deep learning based method for estimating effective protein—ligand affinity. PeerJ 7:e7362 DOI 10.7717/peerj.7362. Zwanzig RW. 1954. High-temperature equation of state by a perturbation method. I. Nonpolar gases. Journal of Chemical Physics 22(8):1420–1426 DOI 10.1063/1.1740409. Mirabzadeh and Ytreberg (2020), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.264 14/14 https://peerj.com http://dx.doi.org/10.1007/978-1-62703-017-5 http://dx.doi.org/10.1016/S1574-1400(07)03004-6 http://dx.doi.org/10.1063/1.1873592 http://dx.doi.org/10.1063/1.1587119 http://dx.doi.org/10.1002/jcc.21231 http://dx.doi.org/10.1016/j.sbi.2017.10.010 http://dx.doi.org/10.2174/092986710790514453 http://dx.doi.org/10.1021/ja00043a027 http://dx.doi.org/10.1063/1.2378907 http://dx.doi.org/10.1002/prot.24310 http://dx.doi.org/10.1016/j.abb.2015.03.021 http://dx.doi.org/10.7717/peerj.7362 http://dx.doi.org/10.1063/1.1740409 http://dx.doi.org/10.7717/peerj-cs.264