key: cord-0992819-e1qsr8kg authors: Bhakat, Soumendranath title: Pepsin-like aspartic proteases (PAPs) as model systems for combining biomolecular simulation with biophysical experiments date: 2021-03-17 journal: RSC advances DOI: 10.1039/d0ra10359d sha: 6dd469162292f13b1479ed31a2586f49f0704bd6 doc_id: 992819 cord_uid: e1qsr8kg Pepsin-like aspartic proteases (PAPs) are a class of aspartic proteases which shares tremendous structural similarity with human pepsin. One of the key structural features of PAPs is the presence of a β-hairpin motif otherwise known as flap. The biological function of the PAPs is highly dependent on the conformational dynamics of the flap region. In apo PAPs, the conformational dynamics of the flap is dominated by the rotational degrees of freedom associated with χ1 and χ2 angles of conserved Tyr (or Phe in some cases). However it is plausible that dihedral order parameters associated with several other residues might play crucial roles in the conformational dynamics of apo PAPs. Due to their size, complexities associated with conformational dynamics and clinical significance (drug targets for malaria, Alzheimer's disease etc.), PAPs provide a challenging testing ground for computational and experimental methods focusing on understanding conformational dynamics and molecular recognition in biomolecules. The opening of the flap region is necessary to accommodate substrate/ligand in the active site of the PAPs. The BIG challenge is to gain atomistic details into how reversible ligand binding/unbinding (molecular recognition) affects the conformational dynamics. Recent reports of kinetics (K(i), K(d)) and thermodynamic parameters (ΔH, TΔS, and ΔG) associated with macro-cyclic ligands bound to BACE1 (belongs to PAP family) provide a perfect challenge (how to deal with big ligands with multiple torsional angles and select optimum order parameters to study reversible ligand binding/unbinding) for computational methods to predict binding free energies and kinetics beyond typical test systems e.g. benzamide–trypsin. In this work, i reviewed several order parameters which were proposed to capture the conformational dynamics and molecular recognition in PAPs. I further highlighted how machine learning methods can be used as order parameters in the context of PAPs. I then proposed some open ideas and challenges in the context of molecular simulation and put forward my case on how biophysical experiments e.g. NMR, time-resolved FRET etc. can be used in conjunction with biomolecular simulation to gain complete atomistic insights into the conformational dynamics of PAPs. Aspartic proteases are a class of enzymes which consist of two highly conserved aspartates in the active site. These enzymes use an active water molecule bound (by H-bond interaction) to aspartate for catalysis of their substrates. Based on their threedimensional structural similarity, aspartic proteases can be categorised into two categories: (1) pepsin-like aspartic proteases (PAPs) and (2) retroviral aspartic proteases (RAPs). 1 PAPs belong to the A01 family whereas RAPs belong to the A02 family of proteases within the MEROPS database 2 (Table 1) . In recent years, HIV protease (belongs to RAPs) gained signicant attention in context of structural and computational biology due to its role in HIV/AIDS. 3 Similar argument can be put forward for b-secretase enzyme, BACE1 which belongs to PAPs and it is a promising drug target for Alzheimer's disease. One of the comprehensive reviews ‡ on PAPs have been published by Ben Dunn in 2002. 1 Since then several blockbuster drug molecules have been proposed targeting BACE1. 4 However, conformational dynamics and molecular recognition in PAPs have not been extensively studied by biophysical experiments e.g. NMR, single molecule FRET, Raman spectroscopy etc. Recent computational studies on BACE1 and plasmepsins (malarial PAP) have identied key order parameters (dynamical The main aim of this section is to clarify the language, organize the concepts and dene structural features across a few representative PAPs. Common sequential features that are conserved in PAPs are as follows: (1) presence of two catalytic aspartates (oen known as catalytic site) and (2) the presence of -Asp-Thr-Gly-Ser-sequence (Fig. 1) . Structurally PAPs can be characterised by two main domains (1) the ap region and (2) the coil region. 3, 5, 6 The name coil region is not accurate. The so called coiled region consists of b-strand and coil structures. In this review, it will be denoted as CS region (coil-strand region) ( Fig. 2 and Table 2 ). The ap region acts as a lid over the catalytic site which governs the entry of the substrate or small molecule inhibitors. The extent of ap opening (described in detail in later sections) also dictates the volume of the binding site i.e. open ap conformation increases the volume of the binding site which can accommodate bigger ligands. The CS domain also plays a role in ligand/substrate binding but its role is not as pronounced as the ap region. One of the startling structural features of PAPs is the presence of conserved tyrosine and tryptophan residue. The tyrosine (Tyr) residue is highly conserved in the ap region of most of the PAPs. Tryptophan (Trp) is another conserved residue that is present in the PAPs. In most cases, Trp is present at the S1 region ( Fig. 3 ). However in case of BACE1 and BACE2, the conserved Trp is a part of the ap region (Fig. 3 ). Plasmepsin V (PlmV), an emerging drug target against malaria doesn't possess the conserved Trp residue. 14 Whereas in PlmIX and X, the conserved Tyr residue is replaced by phenylalanine (Phe) (Fig. 3 ). Visual inspection of several 3D crystal structure of Plm and BACE-1 shows that the orientation of Tyr and Trp inuences the extent of ap opening (Fig. 4) . In the next section, i will propose/review several order parameters that can be used to study the extent of ap opening and its connection with orientation of Tyr and Trp. Order parameter is a commonly used term in physical chemistry. A poor man's working denition of order parameter (sometimes referred as reaction co-ordinate) in context of biomolecular conformational dynamics can be expressed as follows: 'geometric or abstract co-ordinate systems which captures conformational changes along a pathway'. In biomolecular simulation, order parameters are oen known as collective variables (CVs). Some of the commonly used geometric order parameters are distance between two atoms, bond lengths, torsional angles etc. Abstract order parameters include vectorised linear combinations e.g. principal components. Time independent components, latent variable from variational auto-encoder etc. In this section, i will discuss how geometric and abstract order parameters can be used to capture conformational dynamics and molecular recognition in PAPs. Karubiu et al. proposed the Ca-Ca distance (DIST2) between catalytic Asp and the ap tip residue (ap tip residue varies across different PAPs) in PlmII as order parameters to capture the extent of ap opening in unbiased all-atom molecular dynamics (MD) simulation. 5 Rule of thumb to select ap tip residue. Residue number of the conserved Tyr/Phe + 1. For example, in PlmII it will be Val78 (77(Tyr) + 1). This distance criteria was inspired by the order parameter proposed to study spontaneous ap opening and closing in apo HIV protease. 21, 22 Karubiu et al. further proposed a distance metric (DIST3) which can capture opening/closing motion associated with CS domain (Fig. 5) . Recent study by Bhakat and Söderhjelm 6 showed that the uctuation of these two order parameters (DIST2 and DIST3) are uncorrelated which led to the hypothesis that in apo PAPs, the conformational dynamics of ap and CS domains are independent of each other (Fig. 12 ). Both DIST2 and DIST3 were further applied on BACE1 and other plasmepsins in order to gain insights into conformational dynamics from classical MD simulations. Bhakat and Söderhjelm performed test calculations (not published) where they have used DIST2 and DIST3 as CVs within metadynamics framework. The authors observed that the metadynamics biasing along these distances led to distorted ap conformations which suggests that these distance based order parameters are not the optimum choices to capture reversible conformational dynamics in PAPs but rather can be used as a quantitative measurement to capture the extent of ap/CS opening/closing. The authors further proposed two additional distance based order parameters COM ap and COM CS in case of PlmII with an aim that both these order parameters will simultaneously act as quantitative measurement of the conformational dynamics and CVs in enhanced sampling calculations e.g. metadynamics. COM ap : distance between centre of mass (COM) of the catalytic aspartates (only Ca atoms of catalytic aspartates) and COM of the ap region (Ca atoms of residues 58-88). COM CS : distance between COM of the catalytic aspartates (only Ca atoms of catalytic aspartates) and COM of the CS region (Ca atoms of residues 282-302). Authors further used COM ap and COM CS as CVs in twodimensional well-tempered metadynamics simulation. Metadynamics simulations with COM CVs showed signicant hysteresis and lack of reversible sampling of the conformational landscape as these CVs doesn't represent slow order parameters (more on this later) necessary for optimum conformational sampling (Fig. 13 ). Karubiu et al. further proposed a distance based order parameter (Fig. 5 ) which measures the distance (DIST1) between ap and CS region as a function of time. Due to uncorrelated dynamics of these regions, DIST1 doesn't provide any additional information compared to DIST2/DIST3. A few other distance based order parameter has also been proposed by different research groups. Gorfe and Caisch 9 proposed (in Fig. 12 shows that DIST2 and DIST3 are uncorrelated which gives an indication that in apo PAPs, the dynamics of flap and CS region are independent of each other. DIST1 acted as an auxiliary order parameter. These distance based order parameters can be generalised for other PAPs. The image was created using PDB:1LF4. context of BACE1) the distance between the Ca of the ap tip residue Thr72 and Cb of the catalytic aspartate as a measure of ap opening (Y). This order parameter is almost identical to the DIST2 proposed by Karubiu et al. The authors further proposed an analogous distance criteria of DIST1: distance between Ca atoms of ap tip Thr72 and Thr329 of CS region (Y 0 ). Spronk and Carlson 10 introduced an analogous distance parameter (in BACE1) of DIST2 (denoted as z) which measures the distance between the Ca of the Gln73 and the Cb of Asp32 (Fig. 6 ). Due to their similarity, either DIST2 or z can be applicable in PAPs to capture the extent of ap opening. Recently Shen and coworkers 23 proposed the following distance parameters to measure the extent of ap opening in human renin: (1) distance between Tyr-83-OH and Asp-38-CG and (2) distance between Ser-84-CB and Asp-226-CG ( Fig. 27 in ESI †). The distance between Tyr-83-OH and Asp-38-CG can be a deceptive measurement of the ap opening as the orientation of Tyr-83-OH is highly dependent on the rotational degrees (c1 and c2) of freedom of Tyr. Flipping (discussed later) of Tyr side-chain can change the orientation of the Tyr-83-OH. In that case, the distance between Tyr-83-OH and Asp-38-CG will not measure the true extent of ap opening. Distance between Ser-84-CB and Asp-38-CG consist of two stable anchor points which can act as an alternative to Tyr-83-OH and Asp-38-CG. Additional order parameters have been proposed to capture ap twisting. Spronk and Carlson proposed ap twisting angle, f as the dihedral angle involving Trp76C-Val69N-Thr72CA-Gln73CA (Fig. 7) . However, this denition is specialised for BACE1. A general order parameter for capturing ap twisting in PAPs can be expressed as a dihedral angle involving following residues: i + 5 À C-i À 2 À N-i + 1 À CA-i + 2 À CA where i is the residue number of conserved Tyr. Recently, Bhakat and Söderhjelm proposed the torsion angles (c1 and c2) of conserved Tyr (in PlmII and BACE1) as order parameters to capture conformational dynamics in PAPs. 6 A typical free energy prole of Tyr (along c1) has three free energy minimas which corresponds to three different sidechain orientations: gauche + , gauche À and trans. MD and metadynamics investigations using PlmII and BACE1 predicted that in apo PAPs both gauche + and trans conformations are equally populated § ( Fig. 8 and 9 ). Gauche + is stabilised by side-chain Hbond interaction between Tyr and Trp (denoted as normal state) whereas the trans conformation is stabilised by side-chain Hbond interaction between Tyr and one of catalytic Asp (denoted as ipped state, recently Shen and co-workers sampled this interaction using constant pH molecular dynamics on human renin) (Fig. 10) . Gauche À predicted to be the minor population which in case of BACE1 is stabilised by side-chain H-bond 1W50, 3TPL and 1SGZ differs in flap elevation and orientation of Tyr. For detail refer 6 ). As one can see simulations starting with 1W50 and 3TPL only sampled gauche + (G + ) conformation. Whereas, simulation starting with 1SGZ only sampled trans (T) conformation. Metadynamics simulation with c1 and c2 order parameters did a enhanced sampling of the conformational space. Looking at apparent free energy values (z axis unit in kJ mol À1 ) tells us that in apo BACE1 the flap remains in dynamic equilibrium between gauche + and trans conformations. Gauche À (G À ) is the minor population which was stabilised by H-bond interaction between Tyr and Lys. interaction between Tyr and Lys (PDB:1SGZ 25 ) ( Fig. 4 and 8) . A typical way to understand the rotation of Tyr and its connection with ap elevation is to plot a free energy prole involving DIST2/z against c1 (Fig. 10 ).{ Investigation of crystal structures of plasmepsin have shown that not only Tyr but the conserved Trp can also adapt ipped orientation (Fig. 11 ). Flipping of Trp (Table 3 ) combined with ap opening (measured using DIST2, see Table 4 ) leads to expansion of the binding pocket which can accommodate open ap inhibitors. The interplay between ap opening and dihedral order parameters of Tyr and Trp have further being { One can make a cos transformation of c1 which will handle the periodicity associated with c angles. exploited by Oefner and colleagues in human renin. 26 By increasing ap elevation by 1Å, rotating Tyr by À120 (ipped conformation) and displacing Trp slightly led to expansion of the binding sitek which led to identication of novel piperidine based inhibitors with micro-molar k i value. Further computational and experimental studies (X-ray crystallography and NMR) are necessary to understand if the combined ipping of Tyr and Trp is rare or a common phenomena in PAPs that correlates with ap opening and expansion of the binding site. Mutation of Tyr to Ala (amino acid with no bulky side-chain) in apo PlmII and BACE1 led to complete ap collapse in MD simulation which was captured by DIST2 (Fig. 12 ). This observation is consistent with previous experimental study by Suzuki and co-workers 29 which shows that Tyr to Ala mutation resulted in loss of enzyme activity in human renin. Further, mutation of Tyr to Thr, Ile, Val in chymosin also led to loss of enzyme activity. MD simulation based mutational study together with experiments led us to conclude that the conserved Tyr plays a critical role in enzyme activity. In future, similar experiments on PlmII and BACE1 are necessary to generalise the aforementioned claim. Substitution of Tyr to Phe in pepsin retains the enzyme activity. Structural investigation of Toxoplasma gondii PAP, PlmIX, PlmX and PlmV reveals the presence of Phe in place of Tyr (Fig. 4) . Since Phe possess rotational degrees of freedom along c1 and c2 order parameters hence it dictates the ap dynamics in the aforementioned PAPs. In future, this assumption can be validated by mutating Phe to Ala which results in loss of enzyme activity and ap collapse in MD simulation. Mutation of Trp and its effect on enzyme activity of PAPs has not been studied extensively. Park et al. 30 mutated conserved Trp (Trp39) of R. pusillus PAP with other residues and observed decreased enzyme activity. However, crystal structures of PlmV 14 from P. vivax and P. falciparum (homology modelled) doesn't possess conserved Trp. In future detailed computational and experimental studies are required in order to decipher the exact role of Trp in conformational dynamics of PAPs. MD simulations of PAPs can produce high-dimensional dataset with million or more data points. These data points in together describes conformational motion associated with protein. Recently several dimensionality reduction techniques (e.g. PCA, 31-33 tICA, 34-38 tSNE, 39-42 diffusion map 43, 44 ) have been proposed which can extract useful informations related to conformational dynamics from the plethora of the data generated during MD simulations. Principal component analysis (PCA) does a linear transformation which (in context of MD simulation) aims at nding motions that maximize the variance. Before I dive into how it can be used as an order parameter to capture conformational dynamics associated with PAPs, I will briey introduce the fundamental concept behind PCA. The rst principal component (oen denoted as PC1, PCA1 or z 1 ) is the linear trans formation of the original variables x 1 , x 2 , .., x p with a. Mathematically this transformation can be expressed as: The weight components (a 11 , a 12 , ., a 1p ) maximizes the variance of the original data x subject to the following constraint (normalization): Generally speaking k th PC can be expressed as z k ¼ a T k x,** has the maximum variance and uncorrelated with z 1 , z 2 , ., z kÀ1 . x is the eigenvector matrix. Cartesian coordinates for each time-step in a MD simulation denes each of the rows in x whereas, p columns of x consist of 3N cartesian co-ordinates for each atom. For detailed description of PCA in context of MD simulation please refer ref. 32 and 33 . PCA algorithm as a post processing tool for MD simulation have been implemented in several state-of-the art soware packages e.g. Gromacs, 45 MSMBuilder, 46 MDTraj 47 etc. In practice, rst few principal components capture the motions of maximal variance from MD trajectories. Bhakat and Söderhjelm performed two independent MD simulations ($500 ns) of PlmII Sampling of conformational space of apo PlmII using PC1 and PC2 as CVs in metadynamics (PCA). The authors also performed metadynamics simulations with COM CVs (COM). The sampling can be compared with metadynamics with c1 and c2 CVs (Fig. 9 ).Both PCA and COM based metadynamics didn't explicitly bias the torsional order parameters of conserved Tyr. Well-tempered metadynamics simulation with TIC1 and TIC3 as order parameters (TICA) did better sampling compared to PCA and COM CVs (ref. 6 ). and performed PCA using the Ca atoms of the combined trajectory (ignoring the tail part) using g covar tool integrated with Gromacs. The authors thought that rst few PCs will capture degrees of freedom associated with the ap opening of PlmII. Metadynamics simulations using PC1 and PC2 as order parameters did a poor sampling of the conformational space of apo PlmII. PCA using the Ca atomic co-ordinates didn't take into account the dihedral angles associated with Tyr (as described in previous section) hence it is not a surprise that the metadynamics with PC1 and PC2 as order parameters failed to enhanced the sampling of the conformational space in apo PlmII (Fig. 13 ). An alternative approach will be to perform PCA on the dihedral (oen known as dihedral PCA 31 ) space (using c1 and c2 angles of the ap region in apo PlmII) which in principle should able to capture Tyr mediated ap dynamics in PAPs. Second order independent component analysis (ICA) or otherwise known as time-lagged ICA (TICA) has also been applied in context of PAPs. While in PCA the aim was to nd orthogonal linear transformations (PCs) that maximizes the variance, the goal in TICA is to identify linear transformations where the vectors (ICs) are statistically independent and maximizes autocorrelation (Fig. 14) . ICA based approaches have been used widely to analyze time-series data from fMRI 48 and EEG 49 experiments. The rst TIC component/vector (oen denoted as TIC1) captures the slow (in context of timescale of biomolecular motions) dynamical mode from input high-dimensional time-series data (dihedral order parameters of the ap region in PAPs). TIC vectors can also be used as order parameters to perform well-tempered metadynamics simulations. Bhakat and Söderhjelm performed TICA analysis (for full mathematical description of TICA please refer 38 and 36 ) on the MD trajectory of apo PlmII to identify key dihedral order parameters (more precisely linear combination of dihedral angles with different weights for each angle) that captures slow dynamical modes from MD trajectory ( Fig. 15 and 16 ). The authors further used TIC vectors as order parameters in metadynamics simulation which enhances the sampling of ap dynamics compared to classical MD simulations (Fig. 13) . In principle analogous methods of TICA 51 i.e. VAC 52 and SGOOP 53 † † can also be applied on PAPs to identify and bias order parameters which capture slow dynamical modes (from the MD trajectory) corresponds to ap dynamics. Integration of several ICA algorithms e.g. Infomax, 55 kernel ICA, 37 JADE, 57 Fast ICA 58 with programming interfaces such as Python opens up the door to test the effectiveness of these methods in identifying order parameters from the highdimensional dihedral space of PAPs. An information theoretic approach for ICA has also highlighted its relation with MaxEnt 38,55,56 Binary classiers are a set of supervised machine learning algorithms which aims at classifying an object into one of the two possible categories e.g. classifying the pictures of cats from dogs. The concept of using binary classiers in automated selection of order parameters was rst introduced by Sultan and Pande. 59 However, one prerequisite of using binary classiers is that one has to sample the start and end states. In case of PAPs, these two states (otherwise known as state A and state B) can be normal and ipped states. If one extracts co-ordinates correspond to normal and ipped states from MD trajectory then the next step is to generate subset of order parameters (e.g. c1 and c2 angles of the ap region) which represents these two states. This is followed by training of the binary classier e.g. support vector machine (SVM), 60 passive-aggressive (PasAg) classier, 61 logistic regression, 62 perceptron 63 on the subset of order parameters. In case of PasAg classier, the classier decision boundary can be used as a order parameter to drive enhanced sampling calculation (Fig. 17) . One advantage of using binary classier based order parameters in metadynamics is that one can simultaneously bias multiple degrees of freedom (using parallel-bias metadynamics) which can lead to faster convergence (less chance for metadynamics to remain stuck along arbitrary orthogonal slow degrees of freedom) of metadynamics simulations. For example, if transition between normal to ipped state requires change in both c1 and c2 angles of Tyr and additional c1 and c2 angles of other amino acids (of the ap region), then binary classier based order parameter would allow addition of metadynamics bias along these additional features using a single order parameter (classier decision boundary). Initial calculations (unpublished) by the author have shown the effectiveness order parameters in sampling normal to ipped transition in apo PlmII ( Fig. 18 and 19 ). Another method that can be applied in capturing normal to ipped transition in PAPs is Fisher's linear discriminant analysis (LDA) 64 (Fig. 18 ). LDA has been routinely used in machine learning as a robust supervised classication method. Recently, Parrinello and co-workers have proposed a modied version of LDA, harmonic linear discriminant analysis (HLDA) 65 based order parameter which can be applied in context of normal to ipped transition in PAPs. However from a sampling point of view, i believe HLDA will not provide any additional benets compared to binary classier e.g. SVM, PasAg, logistic regression etc. Fig. 15 TIC vectors and corresponding weights assign to each features. In case of TIC1, feature index 21 has significant weight which corresponds to sin c2. Whereas in case of TIC3, feature index 3 has most significant weight which corresponds to sin c1. 2D well-tempered metadynamics simulations with TIC1 and TIC3 as CVs did a better sampling of the conformational space of apo PlmII compared to classical MD simulation (Fig. 13) . Note: technically it is possible to perform metadynamics simulations with all five TICs using parallel-bias metadynamics. This was the philosophy behind using alanine dipeptide, chignolin, BPTI (together these systems known as the mice model for simulation methods) as typical test systems for novel computational methods directed towards order parameter selection. As discussed before the c1 and c2 order parameters 6 of Tyr plays a critical role in conformational dynamics of apo PAPs. The rotation degrees of freedom associated with Tyr is believed to be one of the slow degrees of freedom which governs conformational dynamics of the ap. However, rotation degrees of freedom associated with conserved Trp and other residues present in the ap region also plays crucial role in overall conformational dynamics of PAPs. High dimensionality associated with dihedral angles of PAPs makes it an ideal play ground to test neural network based latent variable order parameters. In mathematical terms, latent variables are transformations that are not directly observed (probability distributions of dihedral angles) but are rather emerged (using mathematical transformations) from observed variables. Latent variable from bottleneck layer of neural network can effectively capture slow degrees of freedom from unbiased MD trajectories ( Fig. 21 and 20) . This type of neural network based order parameter can be further used to drive enhanced sampling calculations. Sidky et al. 66 and Wang et al. 67 laid out several machine learning algorithms that have been tested in context of biomolecular simulation. I believe most of these algorithms can be applied on PAPs with an aim to automatise identication of order parameters that governs its conformational dynamics. Bigger size of PAPs and prior knowledge on the role of conserved Tyr in ap dynamics (makes it easier to interpreted abstract order parameters) makes PAPs an ideal testing ground for novel machine learning algorithms (e.g. slow feature analyses, 68 LSTM 69 or transformer 70 like recurrent neural network). Can one design a deep learning algorithm which will take multiple X-ray structures as inputs and predict trial CVs that captures slow degrees of freedom which governs conformational dynamics of a biomolecule? One can use PAPs as a test case in this project. The variations in PDB structures of BACE1 and PlmII are mainly dominated by torsional order parameters associated with conserved Tyr. An output trial CV from a deep learning architecture should give higher weights to torsional order parameters associated with Tyr. Besides, it must capture some non-linear combinations of other torsional order parameters that vary among these X-ray structures. The trial CVs can be iteratively optimized using a metadynamics framework similar to VAC. 52 Order parameters to capture ligand binding/unbinding Ligand or substrate binding with PAPs require conformational change i.e. opening of the ap so that ligand/substrate can bind to the active site. Flap opening in PAPs is similar to the ap opening in HIV/SIV protease. 75 A generalised order parameter that can describe ligand binding/unbinding can be dened as follows: 'COM unbind : the distance between COM of the ligand heavy atoms and the COM of the catalytic aspartates (Ca atoms)' Plotting the aforementioned order parameter COM unbind in combination of DIST2 gives an indication of the extent of ap opening during ligand binding/unbinding. Further, plotting COM unbind with c1 and c2 angles of Tyr can give an idea on how ipping of conserved Tyr residue affects ligand binding/ unbinding. However, reversible sampling of ligand binding/unbinding using physical order parameters such as COM unbind is not a trivial task. In one such initial effort Bhakat and Söderhjelm used COM unbind order parameter within well-tempered metadynamics simulation. The results showed that ap opening is necessary for ligand unbinding (Fig. 22) . However, the challenge is to multiple reversible sampling of ligand binding/ unbinding in order to validate the role of ap opening with statistical certainty. Another open question which goes hand in hand with ligand binding: 'does ipping of conserved Tyr/Phe in PAPs necessary during ligand binding to accommodate the ligand in the active site?' Recently, Limongelli 77 reviewed several pathway based simulation methods that can capture conformational change induced ligand binding/unbinding. One of these simulation methods include funnel metadynamics 78 (facilitates frequent ligand binding/unbinding by using a funnel like restraint potential which reduces the sampling of the unbound state) and its variant volume-biased (sphere shaped restraint potential instead of funnel) metadynamics. 79 In theory it is possible to use COM unbind order parameter within funnel/ volume biased ‡ ‡ metadynamics framework to understand how conformational changes in PAPs induce ligand binding/ unbinding ( Fig. 23 and 24 ). Other enhanced sampling methods such as accelerated MD (aMD), 80-82 weighted ensemble (WE) method, 83-85 adiabatic-biased MD (ABMD) 86, 87 etc can also be tested on PAPs to explore conformational dynamics, kinetics and free energy associated with ligand binding/unbinding. Recently, macro-cyclic ligand bound structures of BACE1 were reported by Yen et al. 88 (PDB IDs: 6NV7, 6NV9, 6NW3) . The authors further reported experimental K i and K d values as well as several thermodynamic parameters (DH, TDS, and DG) of ligand binding (thermodynamic parameters were calculated using isothermal titration calorimetry (ITC) experiments). Keeping in mind the availability of experimental datas, size and number of torsion angles of the macro-cyclic ligands and large scale conformational dynamics of BACE1, it will be the perfect challenge for methods e.g. funnel metadynamics, volume biased metadynamics, variational autoencoder driven infrequent metadynamics, 89,90 WE, 91 aMD etc. To predict binding free energies and kinetics on clinically important protein-ligand systems beyond typical test cases e.g. benzamide-trypsin and on systems where the ligand is relatively small with fewer torsional angles and there is no large scale conformational dynamics upon ligand binding. Fig. 18 Weights assign to each features which were used as inputs. Feature 3 and 12 corresponds to c1 angle (3 and 12 represents sin c1 and cos c1 respectively) of Tyr (see Github for more details). One can clearly see that the weights corresponds to c1 angle is higher compared to other side-chains angle of the flap region. This gives a clear signal that the c1 angle of the conserved Tyr dictates transition between normal and flipped conformations. In case of LDA, the single value decomposition (LDA-SVD) was used as a solver (see scikit-learn for more options). Rotation of Tyr side-chain and opening of the ap region makes PAPs an ideal case to apply methods that combines molecular simulations with biophysical experiments. Combining MD simulation with NMR spin relaxation derived dynamical information [93] [94] [95] [96] [97] to capture residue-wise backbone/side-chain congurational entropy is a powerful method to understand stability of proteins and its changes upon perturbation. An analytical approach to capture congurational entropy is to measure residue-wise O 2 (oen written as S 2 ) from MD simulations. It has been shown that congurational entropy of protein can be described from dihedral distribution sampled during MD simulations. 95 For each side-chain dihedral angle u j , the probability distribution p(u j ) is determined by von-Mises kernel estimation which contributes toward the congurational entropy S: Fig. 20 Upper panel: timescale separation (otherwise known as spectral gap) along different TIC vectors (unbiased MD simulation of apo PlmII with length of $600 ns). TIC1 and TIC2 projected along c2 angle of Tyr-79 shows that c2 is one of the slow degrees of freedom in plasmepsin-II (corresponds with the higher weight on sin c2 in Fig. 15 . Middle panel: bottleneck layer of a variational autoencoder (VAE) 71 projected along TIC1 and TIC2. TIC1-2 were used as inputs in VAE. Different values of bottleneck layer represents different points along TIC vectors. One can see that the bottleneck layer did a better job in separating different metastable states compared to c2. Further, time-series projection of the bottleneck layer shows its fluctuation during MD simulation. The coefficients (weights) of the bottleneck layer can be used as order parameters in metadynamics simulations. This method in principal similar to combining RAVE 72 with SGOOP. 53 See the Github profile corresponding this article to see step-by-step guide on how to use variational autoencoder. Bottom panel: shows apparent free energy surface projected along TIC1, TIC2 and bottleneck layer. In this case, i have used a variational autoencoder with the following hyper-parameters: number of hidden layers ¼ 2, number of neurons in each hidden layer ¼ 20, number of epochs ¼ 100, batch size ¼ 500, learning rate ¼ 1e À 2. O 2 is mathematically connected with S using the following equation: where M denotes number of side-chain dihedral angles, A and B are tted parameters. One way to capture O 2 from MD simulation is to use principal component based method otherwise known as isotropic reorientation eigenmode dynamics (iRED). 98 In principal, O 2 derived from MD simulation can be compared with NMR spin relaxation experiments which can give us an idea of residue-wise dynamics associated with the ap region of PAPs (Fig. 25) . NMR experiments can further give The hypothesis is that the information output from z will follow principle of maximum entropy (the output will be dominated by input RCs with maximal information entropy). insights (both dynamics and kinetics) into ring ip associated with conserved Tyr/Phe and Trp in PAPs (ref. [99] [100] [101] . Besides combining NMR with MD simulation, time resolved uorescence spectroscopy, FRET and Raman spectroscopy § § can be applied to capture conformational dynamics of PAPs. This is an open area of research which didn't gain much attention from the biophysical or structural biology community. In theory, time-dependent order parameters from experiments (e.g. FRET) can be integrated with MD simulations using embedding theorem e.g. Taken's embedding theorem (ref. 104) . Ligand binding to PAPs can be calculated using isothermal titration calorimetry (ITC) 105 or differential scanning calorimetry (DSC) 106 which measures different thermodynamic parameters involved in the binding process. The major challenge in front of the simulation community is to calculate thermodynamic parameters from pathway based simulation methods (or from enhanced sampling calculations) that agrees with experiment. This is a growing area of research within biophysical community and PAPs can be an ideal test system for integrating biophysical experiments with biomolecular simulation. Ideal playing eld for drug repurposing Plasmepsins (especially PlmII, PlmIV, PlmV, PlmIX and X) are drug targets for malaria. 107, 108 Artemisinin and chloroquine are the two major blockbuster drugs against malaria. Emergence of artemisinin 109 and chloroquine 110 resistant P. falciparum possess a huge risk towards eradication efforts towards malaria in Africa and South-East Asia. This calls for identication of novel drug targets against malaria and plasmepsins has potential to become the next big target against malaria. BACE-1 is a promising drug target for Alzheimer's disease with several inhibitors in different stages clinical trials and several ligand bound crystal structures deposited in PDB. Due to structural similarity and similar mechanism of action which governs conformational dynamics of apo PlmII and BACE-1 (in general all PAPs), the ligands targeting BACE-1 can be repurposed on plasmepsins. Machine learning based methods (e.g. convolutional neural network) combined with molecular Fig. 23 Pictorial representation of a model funnel shaped potential that can be used to calculate binding free energy and binding/ unbinding induced conformational changes in PAPs.R cyl describes the radius of the cylindrical section and Z defines the axis to study binding/ unbinding. Z can be COM unbind or some other distance based order parameter. Z cc is the distance where the potential switches from cone to cylindrical shape. The effect of restraint potential can be rigorously calculated as described in ref. 92 . R cyl should be adjusted such that it doesn't affect the natural conformational dynamics of the ligand binding/unbinding. PAP-inhibitor complex provides a true challenge (compared to T4-lysozyme and other systems where the ligand has few torsional degrees of freedom and protein is relatively rigid) for funnel metadynamics as the ligands have large number of torsion angles and the protein undergoes large conformational changes during binding/unbinding. Abstract order parameters e.g. TICA/ SGOOP/VAC, path CVs can address this challenge. Funnel metadynamics using open flap inhibitors can in principle give insights into how flipping of Tyr and Trp accommodate bigger ligands which has direct application in ligand design. Pictorial representation of sphere shaped restraint potential applied in case of volume-biased metadynamics. Unlike funnel metadynamics, the sphere shape restraint potential is more general and doesn't assume any prior knowledge of the binding site. This method has been used to identify novel ligand binding pathways in T4-lysozyme. Using this method on PAP-ligand complex can find alternative ligand binding pathways as well as path CVs for binding/unbinding transition. The challenge is to set the radius of the sphere so that it doesn't hamper the large scale conformational dynamics of PAPs upon ligand binding/unbinding. § § Raman spectra can provide H-bonding pattern of Tyr and Trp side-chain residues. If Tyr residue shows Fermi resonance in Raman spectra that gives an indication of the rotational degrees of freedom associated with the Tyr side-chain. 103 docking, 111-113 free energy calculations 114 (e.g. FEP, ABFE, MM/ PBSA or MM/GBSA) and biochemical assay can then be used to identify BACE-1 inhibitors (in broad sense this concept is applicable for other PAP inhibitors e.g. human renin inhibitor) as potential plasmepsin inhibitors. This repurposing strategy can help to identify scaffolds{{ that can be further modied by synthetic chemists to develop potent molecules targeting Growing interest of artificial intelligence (AI) based drug discovery companies (e.g. Atomwise) within drug repurposing in the wake of COVID19 shows that AI can also be used within drug repurposing workflow in context of plasmepsins. {{ Without designing ligands from scratch. plasmepsins (Fig. 26) . I believe in future machine learning driven molecular docking combined with experiment will play an important role in repurposing other PAP inhibitors against plasmepsins. The main aim of this review was to highlight the possibilities of using PAPs as model systems to test novel simulation methods and to combine biomolecular simulation with biophysical experiments. In order to kick-start this effort I have curated a Github prole which contains inputs for performing biomolecular simulation and order parameters that can be monitored during molecular simulation. Although in practice there has been a few advances in application of state-of-the art biomolecular simulations on PAPs, a combination of biophysical experiments with biomolecular simulation (otherwise known as integrative structural biology 115 ) is far from a standard practice. Dynamic nature of PAPs makes it an ideal case to integrate experiment with molecular simulation using methods such as Bayesian-maximum entropy 116 etc. which can provide both atomic-level description and mechanistic understanding (e.g. population of normal/ipped states, role of force eld and water models in population of different states, role of Tyr/Phe in conformational dynamics and ligand binding, extent of ap opening etc.) of the PAPs. Finally, COVID-19 pandemic shows a resurgence of drug repurposing by combining molecular simulation with biochemical experiments. Projects like COVID Moonshot shows how an old simulation technique (free energy perturbation) can be effectively used in order to hunt novel drug candidates against COVID. Similar simulation strategies (AI driven molecular docking, free energy perturbation etc.) can be applied to re-purpose BACE1, human renin, pepsin inhibitors against plasmepsins followed up by experimental validation (biochemical assay, X-ray crystallography, NMR etc.). Further keeping in mind the neglected 117 status of malaria it is of utmost importance to share computational and experimental protocols openly using public data/code-sharing platforms e.g. Jupyter Notebook, Github, Google Collab etc. Finally, bigger and dynamical nature PAPs and their role in human disease makes it an ideal model system for experimentalists and computational chemists to work together towards developing new methods and applying old methods which will have direct application towards developing novel therapeutics against malaria, Alzheimer's disease etc. To date, no pathway based simulation methods able to predict thermodynamics and kinetic parameters of HIV protease inhibitors (for the sake of sample size we can focus on nine clinically approved drugs) and compare it with experiments. Even in the year 2020, the most widely used method for small molecule drug discovery (clinically relevant) is molecular docking (oen followed by visual inspection and FEP) which is computationally less expensive compared to physical pathway based methods. Two major problems of molecular docking are approximated energy functions and lack of sampling (especially of the protein). The remedy to this is a skin-in-the game problem. 'when the solution is about solving this very problem-Nassim Nicholas Taleb'. In D3R Grand Challenge 4 (ref. 118 ) which aims at predicting binding pose and free energy using BACE1, the top performing submissions didn't use pathway based sampling methods. Author declares no potential conicts of interest. Structure and Mechanism of the Pepsin-Like Family of Aspartic Peptidases The MEROPS database of proteolytic enzymes, their substrates and inhibitors in 2017 and a comparison with peptidases in the PANTHER database Flap Dynamics in Aspartic Proteases: A Computational Perspective BACE1 inhibitors: Current status and future directions in treating Alzheimer's disease Flap dynamics of plasmepsin proteases: insight into proposed parameters and molecular dynamics Flap dynamics in pepsin-like aspartic proteases: a computational perspective using Plasmepsin-II and BACE-1 as model systems Investigation of ap exibility of b-secretase using molecular dynamic simulations A comparative molecular dynamics study on BACE1 and BACE2 ap exibility Functional Plasticity in the Substrate Binding Site of bÀSecretase The role of tyrosine 71 in modulating the ap conformations of BACE1 Biophysical experiments and biomolecular simulations: A perfect match? Flexibility of the ap in the active site of BACE1 as revealed by crystal structures and molecular dynamics simulations Novel Uncomplexed and Complexed Structures of Plasmepsin II, an Aspartic Protease from Plasmodium falciparum Structural basis for plasmepsin V inhibition that blocks export of malaria proteins to human erythrocytes SWISS-MODEL: homology modelling of protein structures and complexes X-ray Structure of Plasmepsin II Complexed with a Potent Achiral Inhibitor Achiral, Cheap, and Potent Inhibitors of Plasmepsins I, II, and IV Fragment-Based Discovery of 2-Aminoquinazolin-4(3H)-ones As Novel Class Nonpeptidomimetic Inhibitors of the Plasmepsins I, II, and IV Conformational switching in an aspartic proteinase Crystal structures of native and inhibited forms of human cathepsin D: implications for lysosomal targeting and drug design HIV-1 protease aps spontaneously close to the correct structure in simulations following manual placement of an inhibitor into the open state HIV-1 protease aps spontaneously open and reclose in molecular dynamics simulations Exploring the pHdependent structure-dynamics-function relationship of human renin Apo and Inhibitor Complex Structures of BACE (b À secretase) Flap Position of Free Memapsin 2 (b-Secretase), a Model for Flap Opening in Aspartic Protease Catalysis Renin inhibition by substituted piperidines: A novel paradigm for the inhibition of monomeric aspartic proteinases? Exploiting Structural Dynamics To Design Open-Flap Inhibitors of Malarial Aspartic Proteases The three-dimensional structure of recombinant bovine chymosin at 2.3Å resolution Tyrosine 83 of renin has an important role in renin?angiotensinogen reaction Involvement of a residue at position 75 in the catalytic mechanism of a fungal aspartic proteinase, Rhizomucor pusilus pepsin. Replacement of tyrosine 75 on the ap by asparagine enhances catalytic efficiency Principal component analysis of molecular dynamics: On the use of Cartesian vs. internal coordinates Protein Dynamics: Methods and Protocols Principal Components Analysis: A Review of its Application on Molecular Dynamics Data, Annual Reports in Computational Chemistry Slow dynamics of a protein backbone in molecular dynamics simulation revealed by time-structure based independent component analysis What Is the Relation Between Slow Feature Analysis and Independent Component Analysis? Separation of a mixture of independent signals using time delayed correlations Modeling Molecular Kinetics with tICA and the Kernel Trick A Tutorial on Independent Component Analysis t-Distributed Stochastic Neighbor Embedding Method with the Least Information Loss for Macromolecular Simulations Time-Lagged t-Distributed Stochastic Neighbor Embedding (t-SNE) of Molecular Simulation Trajectories How to Use t-SNE Effectively Visualizing Data using t-SNE Nonlinear dimensionality reduction in molecular simulation: The diffusion map approach Rapid Exploration of Conguration Space with Diffusion-Map-Directed Molecular Dynamics GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers MSMBuilder: Statistical Models for Biomolecular Dynamics MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories Independent component analysis of functional MRI: what is signal and what is noise? Independent component analysis of EEG signals Data-Driven Model Reduction and Transfer Operator Approximation Accelerating Metadynamics by Using Kinetically Selected Collective Variables A variational conformational dynamics approach to the selection of collective variables in metadynamics Spectral gap optimization of order parameters for sampling complex molecular systems The Maximum Caliber Variational Principle for Nonequilibria An Information-Maximization Approach to Blind Separation and Blind Deconvolution Independent Component Analysis Independent component analysis: algorithms and applications, Neural Network Automated design of collective variables using supervised machine learning Support-vector networks Online Passive-Aggressive Algorithms Dual coordinate descent methods for logistic regression and maximum entropy models Large Margin Classication Using the Perceptron Algorithm PCA versus LDA Collective Variables from Local Fluctuations Machine learning for collective variable discovery and enhanced sampling in biomolecular simulation Machine learning approaches for analyzing and enhancing molecular dynamics simulations Predictive Coding and the Slowness Principle: An Information-Theoretic Approach Long Short-Term Memory Polosukhin Attention Is All You Need Variational encoding of complex dynamics Past-future information bottleneck for sampling molecular reaction coordinate simultaneously with thermodynamics and kinetics The Information Bottleneck Problem and Its Applications in Machine Learning Opening the Black Box of Deep Neural Networks via Information Structural insights into HIV-1 protease ap opening processes and key intermediates Structures of plasmepsin II from Plasmodium falciparum in complex with two hydroxyethylamine-based inhibitors Ligand binding free energy and kinetics calculation in 2020 Ligand binding free-energy calculations with funnel metadynamics Exhaustive Search of Ligand Binding Pathways via Volume-Based Metadynamics Accelerated molecular dynamics simulations of ligand binding to a muscarinic G-protein-coupled receptor Exploring ligand binding pathways on proteins using hypersound-accelerated molecular dynamics, bioRxiv Estimation of Drug-Target Residence Times by sÀRandom Acceleration Molecular Dynamics Simulations Escape of a Small Molecule from Inside T4 Lysozyme by Multiple Pathways Ranking of Ligand Binding Kinetics using a Weighted Ensemble Approach and Comparison with Milestoning Multiple Ligand Unbinding Pathways and Ligand-Induced Destabilization Revealed by WExplore G Protein-Coupled Receptors -Modeling and Simulation Adiabatic bias molecular dynamics: A method to navigate the conformational space of complex molecular systems Development of an Efficient Enzyme Production and Structure-Based Discovery Platform for BACE1 Inhibitors Unbinding Kinetics of a p38 MAP Kinase Type II Inhibitor from Metadynamics Simulations A combination of machine learning and infrequent metadynamics to efficiently predict kinetic rates, transition states, and molecular determinants of drug dissociation from G protein-coupled receptors Multiple Ligand Unbinding Pathways and Ligand-Induced Destabilization Revealed by WExplore Funnel metadynamics as accurate binding free-energy method Structural analysis of protein dynamics by MD simulations and NMR spin-relaxation On the relationship between NMR-derived amide order parameters and protein backbone entropy changes A Dictionary for Protein Side-Chain Entropies from NMR Order Parameters NMR Order Parameter Determination from Long Molecular Dynamics Trajectories for Objective Comparison with Experiment What NMR Relaxation Can Tell Us about the Internal Motion of an RNA Hairpin: A Molecular Dynamics Simulation Study General Framework for Studying the Dynamics of Folded and Nonfolded Proteins by NMR Relaxation Spectroscopy and MD Simulation Ring Flips Revisited: 13C Relaxation Dispersion Measurements of Aromatic Side Chain Dynamics and Activation Barriers in Basic Pancreatic Trypsin Inhibitor Slow ring ips in aromatic cluster of GB1 studied by aromatic 13C relaxation dispersion methods Infrequent cavity-forming uctuations in HPr from Staphylococcus carnosus revealed by pressure and temperature dependent tyrosine ring ips Information ow and protein dynamics: the interplay between nuclear magnetic resonance spectroscopy and molecular dynamics simulations Encapsulation of Aspartic Protease in Nonlamellar Lipid Liquid Crystalline Phases Nonlinear reconstruction of single-molecule free-energy surfaces from univariate time series Application of ITC-Based Characterization of Thermodynamic and Kinetic Association of Ligands With Proteins in Drug Design Differential scanning calorimetry as a tool to estimate binding parameters in multiligand binding proteins New paradigm of an old target: An update on structural biology and current progress in drug design towards plasmepsin {II} Plasmepsin Inhibitors in Antimalarial Drug Discovery: Medicinal Chemistry and Target Validation (2000 to Present) Artemisinin Resistance in Plasmodium falciparum Malaria Chloroquine-Resistant Malaria Guiding Conventional Protein-Ligand Docking Soware with Convolutional Neural Networks Deep neural network affinity model for BACE inhibitors in D3R Grand Challenge 4 AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery Relative Binding Free Energy Calculations in Drug Discovery: Recent Advances and Practical Considerations Interplay between Conformational Entropy and Solvation Entropy in Protein-Ligand Binding Bayesian maximum entropy approach and its applications: a review Understanding and creating value from open source drug discovery for neglected tropical diseases D3R grand challenge 4: blind prediction of protein-ligand poses, affinity rankings, and relative binding free energies Acknowledgements SB thanks Swedish A-kassa for the nancial support. The computations were performed on computer resources provided by the Swedish National Infrastructure for Computing (SNIC) at LUNARC (Lund University) and HPC2N (Umeå University).