key: cord-0279262-ijc0pi01 authors: Wu, Nan; Strömich, Léonie; Yaliraki, Sophia N. title: Prediction of allosteric sites and signalling: insights from benchmarking datasets date: 2021-08-26 journal: bioRxiv DOI: 10.1101/2021.08.16.456251 sha: a7a238b169ad47f68dd6a4315571ae840c4312d4 doc_id: 279262 cord_uid: ijc0pi01

Allostery is a pervasive mechanism that regulates the activity of proteins in living systems through the binding of a molecule at a site distant from the orthosteric site of the protein. The universality of allosteric regulation, together with the benefits of highly specific, potentially non-toxic allosteric drugs that modulate protein activity, makes uncovering allosteric sites on proteins invaluable for drug discovery. However, few computational methods can predict them effectively. Bond-to-bond propensity analysis, a recently developed method, has successfully predicted allosteric sites for a diverse group of proteins in 19 of 20 cases, using only the knowledge of the orthosteric sites and the corresponding ligands. The method is based on an energy-weighted atomistic protein graph and allows highly efficient computational analysis in atomistic detail. Here, we extended the analysis to 432 structures of 146 proteins from two existing benchmarking datasets for allosteric proteins: ASBench and CASBench. We further refined the metrics with two additional measures, which account for the cumulative effect of residues with high propensities and for the crucial residues in a given site. The allosteric site is recovered for 95/113 proteins (99/118 structures) from ASBench and 32/33 proteins (304/314 structures) from CASBench, with the orthosteric site residues as the only a priori knowledge. When the orthosteric ligands of the protein are known, the allosteric site is identified for 32/33 proteins (308/314 structures) from CASBench.

the entire protein structure as seen in multimeric proteins.
We show in this work that bond-to-bond propensity analysis achieves overall higher accuracy on the ASBench dataset. We further tested bond-to-bond propensities on a more recent dataset, CASBench, which contains 91 protein entries with multiple crystal structures [33]. We evaluated the allosteric site prediction performance of our method on these datasets using the four statistical measures from [53] and two new measures introduced in this work.

This is due to the other site (highlighted in blue) being in close proximity to the orthosteric site, where direct interactions, instead of long-range coupling, occur between the two sites.

2W4I, 3MWB, 4B1F), similar to those of 1Z8D above, are in close proximity. The allosteric effect is not mediated by long-range coupling and is thus not revealed by propensity analysis.

It is worth noting that the allosteric sites are generally large, given the definition in the ASBench database (residues within 6 Å of the allosteric ligand). In the previous bovine seminal ribonuclease (PDB ID: 11BG) example, the allosteric site contains eight residues, but only four of them form direct interactions with the allosteric ligand. Defining the allosteric site using these four residues, which form a sub-site of the original allosteric site, and rerunning all calculations gives slightly different results, as shown in Table 2. Similarly, not all residues in the orthosteric site defined in the database interact with the orthosteric ligand or support its binding. Because the structures in the ASBench database contain no orthosteric ligands, the results obtained with the orthosteric site residues and with the orthosteric ligand as perturbation source cannot be compared.

When the orthosteric ligand is selected as the perturbation source, the allosteric site is detected for 308/314 structures (32/33 proteins), according to at least one statistical measure.
When using the orthosteric site residues as the source, the allosteric site is detected for 304/314 structures (32/33 proteins), according to at least one statistical measure. In general, the allosteric site of a protein structure is identified by more statistical measures when the orthosteric ligand is set as the perturbation source. If the orthosteric ligand is selected as the source, the source bonds include the weak bonds formed between the ligand and the surrounding residues, whereas the orthosteric site includes all residues within 5 Å of the orthosteric ligand [33]. Therefore, the number of source bonds is much lower than when all orthosteric site residues are used as the source. When the orthosteric substrate is not available, selecting the orthosteric residues as the perturbation source remains a viable alternative. The results presented here strengthen confidence in allosteric site identification as predicted by bond-to-bond propensity (Table S5).

Bond-to-bond propensity was first introduced in Ref. [53] and further discussed in Ref. [58]; hence, it is only briefly summarised here. The edge-to-edge transfer matrix $M$ was introduced to study non-local edge coupling in graphs [74], and an alternative interpretation of $M$ is employed to analyse the atomistic protein graph. The element $M_{ij}$ describes the effect that a perturbation at edge $i$ has on edge $j$. $M$ is built from the following matrices: $B$, the $n \times m$ incidence matrix of the atomistic protein graph with $n$ nodes and $m$ edges; $W = \mathrm{diag}(w_{ij})$, the $m \times m$ diagonal matrix of all edge interaction energies, where $w_{ij}$ is the weight of the edge connecting nodes $i$ and $j$, i.e. the bond energy between the atoms; and $L^{\dagger}$, the pseudo-inverse of the weighted graph Laplacian matrix $L$ [75].
$L$, which defines the diffusion dynamics on the energy-weighted graph [76], is defined as:

$$L = B W B^{T}$$

To evaluate the effect of perturbations from a group of bonds $b'$, which belong to the orthosteric ligand or the orthosteric site residues (i.e., the source), on a bond $b$ anywhere else in the protein, we calculate the raw propensity of bond $b$, which reflects how strongly the bond is coupled to the source. As different proteins contain different numbers of bonds, the raw propensity is normalised to give the bond propensity $\Pi_b$. The residue propensity is then defined as the sum of the normalised bond propensities of all the bonds of a residue $R$:

$$\Pi_R = \sum_{b \in R} \Pi_b$$

Bond and residue propensities naturally decrease as the distance of the bond or residue from the perturbation source increases. To determine which bonds and residues are significant, bond and residue propensities at a similar distance from the source are compared using conditional quantile regression (QR) [77]. The distance of a bond $b$ from the perturbation source is defined as the minimum distance $d_b$ between $b$ and any bond $b'$ of the source:

$$d_b = \min_{b' \in \text{source}} \lVert \mathbf{x}_b - \mathbf{x}_{b'} \rVert_2$$

where the vector $\mathbf{x}_b$ contains the Cartesian coordinates of the midpoint of bond $b$. As $\Pi_b$ decays exponentially with distance, a linear model for the logarithm of the propensities is adopted, and the QR minimisation problem for quantile $p$ is:

$$\min_{\alpha_p,\, \beta_p} \sum_{b} \rho_p\!\left(\log \Pi_b - \alpha_p - \beta_p d_b\right)$$

where $\rho_p(\cdot)$ is the tilted absolute value function

$$\rho_p(y) = y\,\bigl(p - \mathbb{I}(y < 0)\bigr)$$

and $\mathbb{I}$ is the indicator function. The residue quantile score of residue $R$ is defined similarly, using the residue propensity $\Pi_R$ from eq. 5 and the distance $d_R$, the minimum distance between the atoms of the residue and those of the source.

where $N_{R,\text{allosteric site}}$ is the number of residues in the allosteric site.

4. The average reference bond quantile score of the allosteric site, where $N_{b,\text{allosteric site}}$ is the number of bonds in the allosteric site.
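The propensity pipeline described above can be sketched on a toy graph. This is a minimal numpy sketch under stated assumptions, not the authors' implementation: the text does not reproduce the formula for M, so the sketch assumes the flow-redistribution form M = W Bᵀ L† B, built from the incidence matrix B, the diagonal weight matrix W and the weighted Laplacian L = B W Bᵀ; other forms are possible. It also assumes that the raw propensity of bond b sums |M[b, s]| over the source bonds s, and that normalisation divides by the total over all bonds. In this sketch column j of M is the perturbed edge, so the response of bond b to source bond s is read from M[b, s]; this indexing is tied to the assumed formula. The ladder graph, its bond energies, the residue labels and all function names are hypothetical.

```python
import numpy as np


def incidence_matrix(n_nodes, edges):
    """Oriented n x m incidence matrix B (the orientation of each edge is arbitrary)."""
    B = np.zeros((n_nodes, len(edges)))
    for k, (i, j) in enumerate(edges):
        B[i, k], B[j, k] = 1.0, -1.0
    return B


def transfer_matrix(n_nodes, edges, weights):
    """ASSUMED form M = W B^T L^+ B with L = B W B^T.

    Column j is the perturbed edge: M[i, j] is the flow induced on edge i by a
    unit current injected across edge j (bond energies act as conductances).
    """
    B = incidence_matrix(n_nodes, edges)
    W = np.diag(np.asarray(weights, dtype=float))
    L = B @ W @ B.T                  # weighted graph Laplacian
    L_pinv = np.linalg.pinv(L)       # Moore-Penrose pseudo-inverse of L
    return W @ B.T @ L_pinv @ B


def bond_propensities(M, source):
    """HYPOTHETICAL: raw_b = sum over source edges s of |M[b, s]|, normalised to sum to 1."""
    raw = np.abs(M[:, source]).sum(axis=1)
    return raw / raw.sum()


def residue_propensities(bond_prop, bond_to_residue):
    """Residue propensity = sum of the propensities of the residue's bonds."""
    totals = {}
    for b, res in enumerate(bond_to_residue):
        totals[res] = totals.get(res, 0.0) + float(bond_prop[b])
    return totals


# Toy "protein": a 2 x 3 ladder graph with strong rails and weak rungs (made-up energies).
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
weights = [5.0, 5.0, 5.0, 5.0, 1.0, 1.0, 1.0]
M = transfer_matrix(6, edges, weights)
prop = bond_propensities(M, source=[0])   # edge 0 plays the role of the source
res = residue_propensities(prop, ["A", "A", "B", "B", "A", "B", "B"])
```

Because the rungs are weak, most of a perturbation injected across edge 0 stays on edge 0, so edge 0 receives the largest propensity in this toy. The dense `np.linalg.pinv` is only practical for small graphs; for a protein graph with many thousands of bonds, a sparse Laplacian solver would be the natural replacement.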
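The quantile-regression step can likewise be illustrated on synthetic data. The sketch below, intended for small inputs only, fits log Π_b = α_p + β_p d_b by minimising the tilted absolute loss exactly: it enumerates lines through pairs of points, relying on the fact that two-parameter linear quantile regression has an optimal solution interpolating two observations. The quantile score used here (the largest p on a grid whose fitted line lies on or below the point) is a hypothetical stand-in, since the exact definition of the score is not reproduced in this text; the function names, the grid and the data are illustrative.

```python
import itertools

import numpy as np


def tilted_abs(y, p):
    """Tilted absolute value rho_p(y) = y * (p - 1[y < 0])."""
    y = np.asarray(y, dtype=float)
    return y * (p - (y < 0).astype(float))


def fit_quantile_line(d, log_prop, p):
    """Exact quantile-p fit of log_prop ~ alpha + beta * d on small data.

    Some optimal line passes through two observations with distinct d, so
    enumerating all such pairs finds the minimum (O(n^3): toy data only).
    """
    d = np.asarray(d, dtype=float)
    log_prop = np.asarray(log_prop, dtype=float)
    best_loss, best_alpha, best_beta = np.inf, 0.0, 0.0
    for i, j in itertools.combinations(range(len(d)), 2):
        if d[i] == d[j]:
            continue  # a vertical line is not a valid fit
        beta = (log_prop[j] - log_prop[i]) / (d[j] - d[i])
        alpha = log_prop[i] - beta * d[i]
        loss = tilted_abs(log_prop - alpha - beta * d, p).sum()
        if loss < best_loss:
            best_loss, best_alpha, best_beta = loss, alpha, beta
    return best_alpha, best_beta


def quantile_scores(d, log_prop, grid=np.linspace(0.05, 0.95, 19)):
    """HYPOTHETICAL score: largest grid quantile whose fitted line lies on or
    below the point (0.0 if the point is below every fitted line)."""
    d = np.asarray(d, dtype=float)
    log_prop = np.asarray(log_prop, dtype=float)
    scores = np.zeros(len(d))
    for p in grid:
        alpha, beta = fit_quantile_line(d, log_prop, p)
        on_or_above = log_prop >= alpha + beta * d - 1e-9
        scores[on_or_above] = np.maximum(scores[on_or_above], p)
    return scores


# Synthetic data: log-propensity falling linearly with distance plus fixed scatter.
dist = np.arange(10.0)
log_pi = -0.5 * dist + np.array([0.3, -0.2, 0.1, -0.4, 0.0, 0.2, -0.1, 0.4, -0.3, 0.05])
scores = quantile_scores(dist, log_pi)
```

At any optimum of the quantile-p fit, at most a fraction p of the points lie strictly below the line and at most a fraction 1 − p strictly above it, which is a quick sanity check on the fit. The pair enumeration only suits toy data; for the many bonds in a protein, a linear-programming quantile-regression solver (for example statsmodels' `QuantReg`) would be the practical choice.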
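Site membership and the site-averaged measures can be sketched in the same way. Assuming atom coordinates in ångström, the helper below selects the residues with at least one atom within 6 Å of any ligand atom (the ASBench site definition) and then averages per-residue quantile scores over the N_R site residues, in the spirit of the averaged measures above. The exact averaged quantity of each measure is not reproduced in this text, so the averaging function is an illustrative assumption; all names, coordinates and scores are hypothetical.

```python
import numpy as np


def site_residues(atom_xyz, atom_residue, ligand_xyz, cutoff=6.0):
    """Residues with at least one atom within `cutoff` (same units as the
    coordinates, assumed angstrom) of any ligand atom."""
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    ligand_xyz = np.asarray(ligand_xyz, dtype=float)
    # Pairwise atom-ligand distances, then the closest ligand atom per protein atom.
    diff = atom_xyz[:, None, :] - ligand_xyz[None, :, :]
    nearest = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    return sorted({atom_residue[k] for k in np.flatnonzero(nearest <= cutoff)})


def average_site_score(scores, site):
    """Mean of per-member quantile scores over the site (sum divided by N_site)."""
    if not site:
        raise ValueError("empty site")
    return sum(scores[member] for member in site) / len(site)


# Hypothetical atoms of three residues and a one-atom ligand at the origin.
atoms = [[1.0, 0.0, 0.0], [5.0, 0.0, 0.0], [20.0, 0.0, 0.0], [7.0, 0.0, 0.0]]
owners = ["R1", "R2", "R2", "R3"]
ligand = [[0.0, 0.0, 0.0]]
site = site_residues(atoms, owners, ligand)   # 6 A cutoff
score = average_site_score({"R1": 0.9, "R2": 0.5, "R3": 0.1}, site)
```

Here R2 joins the site through its closer atom even though its other atom is 20 Å away, which is why distance-cutoff sites tend to be large. Passing `cutoff=5.0` together with the orthosteric ligand gives a 5 Å orthosteric site in the same way.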
To complement these previous measures and to investigate further aspects of allosteric site detection, two additional measures were introduced in this work:

All data presented in this study are available upon request.

Chapter 3 - Proteins
Chapter 4: Protein Interactions and Disease
Peptide phage display as a tool for drug discovery: targeting membrane receptors
Allostery in Disease and in Drug Discovery
Biomolecular Modeling: Goals, Problems, Perspectives
A study of communication pathways in methionyl-tRNA synthetase by molecular dynamics simulations and structure network analysis
Activation pathway of Src kinase reveals intermediate states as targets for drug design
Molecular Dynamics Simulation for All
Binding kinetics of darunavir to human immunodeficiency virus type 1 protease explain the potent antiviral activity and high genetic barrier
Normal mode analysis and applications in biological physics
Normal mode analysis of protein dynamics
Normal mode analysis of biomolecular structures: functional mechanisms of membrane proteins
Exploiting protein flexibility to predict the location of allosteric sites
PARS: a web server for the prediction of Protein Allosteric and Regulatory Sites
AlloPred: prediction of allosteric pockets on proteins using normal mode perturbation analysis
Improved Method for the Identification and Validation of Allosteric Sites
Structure-Based Statistical Mechanical Model Accounts for the Causality and Energetics of Allosteric Communication
Reversing allosteric communication: From detecting allosteric sites to inducing and tuning targeted allosteric response
Elastic network model of learned maintained contacts to predict protein motion
Prediction of allosteric sites and mediating interactions through bond-to-bond propensities
Protein multi-scale organization through graph partitioning and robustness analysis: application to the myosin-myosin light chain interaction
Uncovering allosteric pathways in caspase-1 using Markov transient analysis and multiscale community detection
Nearly-Linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems. STOC '04
Systems in Nearly-Linear Time
Allostery and cooperativity in multimeric proteins: bond-to-bond propensities in ATCase
Computational characterisation of protein interaction sites: from small ligand pockets to large domain interfaces
Allosteric Hotspots in the Main Protease of SARS-CoV-2. bioRxiv 2020
ProteinLens: a web-based application for the analysis of allosteric signalling on atomistic graphs of biomolecules
A potential allosteric subsite generated by domain swapping in bovine seminal ribonuclease
The Effect of Hinge Mutations on Effector Binding and Domain Rotation in Escherichia coli D-3-Phosphoglycerate Dehydrogenase
The crystal structure of human muscle glycogen phosphorylase a with bound glucose and AMP: An intermediate conformation with T-state and R-state features
Cooperativity and allostery in haemoglobin function
Study of Functional and Allosteric Sites in Protein Superfamilies
Co-repressor Induced Order and Biotin Repressor Dimerization: A Case for Divergent Followed by Convergent Evolution
A Python package for the construction of atomistic, energy-weighted graphs from biomolecular structures
Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation
Inorganic chemistry: principles of structure and reactivity
Hydrophobic Potential of Mean Force as a Solvation Function for
DREIDING: A generic force field for molecular simulations
The nature of π-π interactions
Structure of complex networks: Quantifying edge-to-edge relations by failure-induced flow redistribution
Algebraic graph theory
Random Walks, Markov Processes and the Multiscale Modular Organization of Complex Networks
Quantile Regression
An introduction to the bootstrap