key: cord-0744439-b56hk7jm authors: Amamuddy, Olivier Sheik; Verkhivker, Gennady M.; Bishop, Özlem Tastan title: Impact of emerging mutations on the dynamic properties the SARS-CoV-2 main protease: an in silico investigation date: 2020-05-29 journal: bioRxiv DOI: 10.1101/2020.05.29.123190 sha: 3a63149c3b7b3a91700b417120ad108aa0e8e0c2 doc_id: 744439 cord_uid: b56hk7jm The new coronavirus (SARS-CoV-2) is a global threat to world health and its economy. Its main protease (Mpro), which functions as a dimer, cleaves viral precursor proteins in the process of viral maturation. It is a good candidate for drug development owing to its conservation and the absence of a human homolog. An improved understanding of the protein behaviour can accelerate the discovery of effective therapies in order to reduce mortality. 100 ns all-atom molecular dynamics simulations of 50 homology modelled mutant Mpro dimers were performed at pH 7 from filtered sequences obtained from the GISAID database. Protease dynamics were analysed using RMSD, RMSF, Rg, the averaged betweenness centrality and geometry calculations. Domains from each Mpro protomer were found to generally have independent motions, while the dimer-stabilising N-finger region was found to be flexible in most mutants. A mirrored interprotomer pocket was found to be correlated to the catalytic site using compaction dynamics, and can be a potential allosteric target. The high number of titratable amino acids of Mpro may indicate an important role of pH on enzyme dynamics, as previously reported for SARS-CoV. Independent coarse-grained Monte Carlo simulations suggest a link between rigidity/mutability and enzymatic function. While several mutations are present in domain III, the current data suggests that these have 144 not (yet) further evolved at these positions, and probably suggests that there might a higher fitness 145 cost involved with mutating residues from this domain. It is also possible that under-sampling may 146 lead to a similar observation. Most of the mutations have occurred in solvent-exposed surfaces of 147 the protein, which may be under reduced selective pressure, especially if these are in loop regions. On the other hand, interfacial residue mutations (particularly at the chain interface) may exert their 149 effects in a more exacerbated manner by either over-stabilising or destabilising protein-protein 150 interactions, as reported in work on other proteins [42, 43] . One such mutation has already 151 happened at position 7 (A7V) in the N-finger region -a region that is vital for enzymatic activity 152 [31] . The local impacts of the A7V mutation are discussed in section 2.2. After the preliminary analysis of M pro at the sequence level, their 3D structures were built for 155 further investigation. The z-DOPE scores for the best homology models obtained for each sample 156 were all below -1, indicating that they were all native-like [51, 52] . Overall, z-DOPE values had a higher level of variation in residue protonation states amongst each of the seven histidine residues 161 found in each M pro protomer, only the catalytic residue and the non-synonymous mutations are 162 described herein, post homology modelling. We suspect that the high number of titratable amino 163 acids may play a role in influencing protein behaviour at varying pH levels. Previous work in SARS 164 CoV reports of a "pH-dependent activity-switch" of the main protease [53] . In our case, the catalytic 165 residue HIS41 was generally protonated at the delta nitrogen (HID) atom, but also occurred in its 166 fully protonated state (HIP) in one of the protomers for samples EPI_ISL_419710 and 167 EPI_ISL_425655. Protonated aspartic acid (ASH) was found in both protomers of sample 168 EPI_ISL_420510, which was the only isolate to contain the N151D mutation. ASH was also found in 169 sample EPI_ISL_421312, for only one of its protomers at residue position 289, which otherwise 170 occurs in its deprotonated form in all other samples. The R105H mutation (present only in sample 171 EPI_ISL_419984) occurs as a HID in each protomer. Similarly the Y237H mutation, present only in 172 sample EPI_ISL_416720 occurs as HID in both of its protomers. The local residue interactions around residue mutations of interest were also investigated, by 174 comparing them with their equivalent position in the M pro reference, using their modelled structure. These mutations comprised residues that underwent a higher number of mutations (G15D/S, 176 V157I/L and P184L/S) in addition to mutations that occurred at or close to the dimer interface (A7V 177 and A116V). A7V significantly increased the number of proximal interactions to neighbouring 178 residues when compared to the reference protein (Fig. 2) , and gaining in hydrophobic interactions, 179 although a clash in van der Waal radius is additionally present. G15D is found to increase the The Cα RMSD obtained after frame fitting to the initial frames and periodic image correction 196 ( Fig. 3A) We suspect that the changes may rather be observable at more local levels, such at the intra/inter-213 domain or residue levels. This twisting motion is summarised using the first non-trivial mode 214 (number 7) obtained from the anisotropic network model (ANM) of the reference protease (Fig. 3A ). The predictions 146, 147, 174, 29, 113, 163, 39, 124, 150, 7, 175, 38, 161 and 173, 295 ranging from 0.057 to 0.077 nm. On the 3D mapping ( Fig. 5A ), it can be seen that the regions with 296 the highest flexibility are solvent exposed surfaces, comprising loops or parts of helices connected 297 by loops. The central core of the enzyme has the lowest flexibility, most likely to provide structural 298 stability to the functional dimer. Catalytic residues (HIS41 and CYS145) are connected to these 299 stable core residues on one side. However, HIS41 is connected on the other side to a more mobile 300 structure composed of a 310 helix connected by loops on each end, which forms a lid structure, 301 similar to what is described earlier for previous human coronavirus strains [56, 57] . BC is a network centrality metric that is maximised when a large number of nodes (C α or GLY 312 C β atoms) traverse a single node along geodesic paths to reach nodes within a network. When 128, 115, 111, 112, 14, 18, 39, 11, 323 13, 146, 148, 147, 139, 6 and 140 Table S2 , residues positions 17 and 128 were found to occur as the most 332 common first two residues in all cases. In the previous work done in human heat shock protein, high BC residues found within cavities were found to correspond to allosteric hotspots, as these had 334 been independently verified by the sequential application of external forces on protein residues 335 using the perturbation response scanning (PRS) method. In our case, however these two positions 336 are not found in cavities, and are rather buried structural units that are not very mobile in the short 337 to medium range. The high BC at M17 is possibly due to the increased stability imparted by the 338 dimer interface. Due to the high centrality of these residues, it is possible that mutations leading to 339 the alteration of their physicochemical activity may be accompanied by a decrease in the dimer 340 stability. The COM distance distributions between the M pro protomers (Fig. 6) From the distributions of Rg values for the catalytic site ( Fig. 9 .), we promptly observe the 407 asymmetry between substrate binding pockets coming from each protomer in each sample. Partial 408 symmetry is seen in few cases, such as EPI_ISL_419710, EPI_ISL_425242, EPI_ISL_423007, 416720, As done for the substrate binding cavity, the degree of compaction of the interprotomer cavities was 421 also measured. Here as well, the distributions are asymmetric (Fig. 10) likely an artefact linked to the absence of the two C-terminal residues, as the residue is solvent RMSF of protein residues revealed the distribution of stable and flexible regions, thereby 506 allowing the assessment of the extent of mobility for mutational sites (Fig. 12, panel A) (S139, F140, Q189 ) also exhibited appreciable fluctuations, while residues G143, S144, H164,H163, 515 E166, P168 C145 showed only moderate changes and remained stable in CG simulations (Fig. 12, 516 panel A). Notably and as expected the catalytic residues C145 and H41 from the substrate binding 517 site also remained stable. The analysis generally showed that domain II residues were stable, while 518 domain III (residues 198 to 303) showed more flexibility, especially in the peripheral solvent- It is particularly important to dissect the connection between the function of some key residues 546 and their contribution in collective movements. The dimerisation residues (R4, M6, S10, G11, E14, N28, S139, F140, S147, E166, E290, R298) are characterized by different local flexibility but tend to 548 correspond to low moving regions of the protein in collective motions (Fig. 12, panel C, D) . The key 549 substrate binding residues (H163, H164, M165, E166, and L167) are located at the very border of 550 structurally immobilized and more flexible regions, and as such may constitute a hinge region that 551 controls cooperative movements. Notably, some other binding site residues D187, R188, Q189, T190, 552 and A191 are more flexible in slow modes and may undergo functional motions. Substrate 553 recognition sites tend to exhibit structural flexibility and sequence variations so as to enable specific 554 recognition required for mediating substrate specificity. We also explored the functional dynamics 555 profile of the ligand bound protease complex (Fig. 13 ). This structural map clearly illustrated that 556 the ligand binding site is comprised of both rigid and flexible residues and located in the region 557 that bridges area of high and low structural stability. In particular, we highlighted that residues S46, M49, T190, A191 in the substrate recognition site and in the ligand proximity may belong to moving 559 regions in the global motions (Fig. 13, panel B) . Our analysis shows that structural clusters of mutations may be distinguished by their 579 evolution propensity and global mobility in slow modes regions. The mobile residues may be 580 predisposed to serve as substrate recognition sites, whereas residues acting as global hinges during 581 collective dynamics are often supported by conserved residues. The observed conservation and 582 mutational patterns may thus be determined by functional catalytic requirements, structural 583 stability and geometrical constraints, and functional dynamics patterns. We found that some sites 584 and corresponding mutations may be associated with dynamic hinge function. The mutability of 585 hinge sites (Y101C, R105H, P108S, V157I/L, C160S, A173V, P184L/S) and nearby sites (T190I, A191V, 586 A193V) may be related with their structural and dynamic signatures to reside in the exposed 587 protein regions rather than in the more conserved protein core. We could also conclude that these 588 sites are located near the active site and control in the bending motions needed for catalysis, so their oriented package [50] . In each simulation, the total number of cycles was set to 1,000 and the 652 number of cycles between trajectory frames was 100. Accordingly, the total number of generated suggesting an active viral adaptation in the human host, trying to find a trade-off between viral 696 fitness and immune egress. More generally, regions of lowest flexibility (and high BC) were core 697 residues, while solvent-exposed loops were most flexible. All samples displayed a slight 698 interprotomer twisting motion. A high degree of variation was observed in (1) Additionally, we report of a possible set of dynamic hinging residues and their tendency to acquire 715 mutations in exposed protein regions, whilst being grounded by less mutable core residues. As a final note, it is important to be aware that there is an inherent lack of sampling depth in Domains I-III are annotated as red, blue and orange strips respectively. The N-finger is in cyan while the linker 727 is coloured green. Active site residues are denoted by the letter "x". is in cyan while the linker is coloured green. Active site residues are denoted by the letter "x". Table S1 . Acknowledgement and sequence support. Table S2 . High BC residues are shown for each chain (A and B). et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin Structure-based 753 design of antiviral drug candidates targeting the SARS-CoV-2 main protease Covid-19: death rate is 0.66% and increases with age, study estimates Endothelial cell infection and endotheliitis in COVID-19 Presenting Characteristics, Comorbidities, and 762 Outcomes Among 5700 Patients Hospitalized With COVID-19 in the New York City Area The COVID-19 Pandemic: 765 A Comprehensive Review of Taxonomy The epidemiology and pathogenesis of coronavirus disease (COVID-19) 768 outbreak COVID-19, school closures, and child poverty: a social crisis in the 770 making Data, disease and diplomacy: GISAID's innovative contribution to global 772 health Virus Variation Resource -improved response to emergent viral outbreaks Why are RNA virus mutation rates so damn high? HIV-1 subtype distribution and its demographic 780 determinants in newly diagnosed patients in Europe suggest highly compartmentalized epidemics. 781 Retrovirology Are All Subtypes Created Equal? The Effectiveness of 783 Nextstrain: real-time tracking of pathogen evolution Genotype and phenotype of COVID-19: Their roles in pathogenesis Highlight of Immune Pathogenic Response and Hematopathologic Effect in 792 SARS-CoV, MERS-CoV, and SARS-Cov-2 Infection Network Medicine Framework for Identifying Drug Repurposing Opportunities 796 for COVID-19 Network-based drug repurposing for novel 798 coronavirus 2019-nCoV/SARS-CoV-2 In silico discovery of candidate drugs against covid-19 Discovery of Potential Multi-Target-Directed Ligands by Targeting Host-specific SARS-803 CoV-2 Structurally Conserved Main Protease $ An investigation into the identification of potential 806 inhibitors of SARS-CoV-2 main protease using molecular docking study Mpro from COVID-19 virus and discovery of its inhibitors In silico studies on therapeutic agents for COVID-19: Drug repurposing 811 approach Machine learning 813 using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study Coronavirus Main Proteinase (3CLpro) Structure: Basis for Design of Anti-SARS Drugs Prediction of the SARS-CoV-2 (2019-nCoV) 3C-like protease 818 (3CLpro) structure: virtual screening reveals velpatasvir, ledipasvir, and other drug repurposing 819 candidates Structures 821 of Two Coronavirus Main Proteases: Implications for Substrate Binding and Antiviral Drug Design Design of Wide-Spectrum Inhibitors Targeting Coronavirus Main Proteases Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide 828 inhibitors. Science (80-. ). 2020, 412 Dimer Interface Results in the Complete Crystallographic Dimer Dissociation of Severe Acute 831 Respiratory Syndrome Coronavirus 3C-like Protease Structure of coronavirus 834 main proteinase reveals combination of a chymotrypsin fold with an extra ? -helical domain Dissection Study on the Severe Acute Respiratory Syndrome 3C-like Protease 837 Reveals the Critical Role of the Extra Domain in Dimerization of the Enzyme Structural basis of SARS-CoV-2 3CLpro 840 and anti-COVID-19 drug discovery from medicinal plants MD-TASK: a software suite for analyzing molecular dynamics trajectories Porcine respiratory coronavirus differs from transmissible 846 gastroenteritis virus by a few genomic deletions A new look at an old virus: patterns of mutation accumulation in the human 849 H1N1 influenza virus since 1918 Ribavirin and lethal mutagenesis of poliovirus: molecular 854 mechanisms, resistance and biological implications Increased Fidelity Reduces Poliovirus Fitness and Virulence under 857 Selective Pressure in Mice 859 Update of the Drug Resistance Mutations in HIV-1 Mutations at protein-protein interfaces: Small changes over big surfaces have large impacts on human 865 health Structure-Based Analysis of Single Nucleotide 867 Variants in the Renin-Angiotensinogen Complex ProDy: Protein Dynamics Inferred from Theory and Experiments VMD: Visual molecular dynamics Protein modeling and structure prediction with a reduced representation Structural Flexibility and Large-Scale Dynamics: Coarse-Grained Simulations and Elastic Network Protein Structures Using Monte Carlo Simulations and Knowledge-Based Statistical Force Fields CABS-flex standalone: 884 a simulation environment for fast modeling of protein flexibility Comparative Protein Modelling by Satisfaction of Spatial Restraints MODELLER: A Program for Protein Structure Modeling Release 9 The crystal 891 structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor Carbon-Oxygen Hydrogen Bonding in Biological Structure and Function Structure of Main Protease from Human Coronavirus 898 NL63: Insights for Wide Spectrum Anti-Coronavirus Drug Design Global Infectious Human Coronavirus, HCoV-HKU1 Allosteric Modulation of Human Hsp90α Conformational 904 The FTMap family of web servers for determining and characterizing ligand-binding hot spots of 907 proteins PyVOL: a PyMOL plugin for visualization, comparison, and 909 volume calculation of drug-binding sites PubChem 2019 update: improved access to chemical data PanDDA analysis group deposition --Crystal Structure CoV-2 main protease in complex with PCM-0102306 2020 RCSB Protein Data Bank: biological macromolecular structures enabling 918 research and education in fundamental biology, biomedicine, biotechnology and energy The PyMOL Molecular Graphics System architecture and applications Biopython: freely available Python tools for computational molecular 925 biology and bioinformatics Comparative Protein Modelling by Satisfaction of Spatial Restraints PDB2PQR: An automated pipeline for the 929 setup of Poisson-Boltzmann electrostatics calculations Web Server for Calculating and Visualising Interatomic Interactions in Protein Structures Dassault Systèmes BIOVIA Discovery Studio Visualizer GROMACS: Fast, 936 flexible, and free Data Structures for Statistical Computing in Python The NumPy array: A structure for efficient numerical 942 computation SciPy 1.0: fundamental algorithms for scientific computing 945 in Python A Modern Open Library for the Analysis of Molecular 948 NGLview-interactive molecular graphics for Jupyter notebooks Evolving COVID-19 conundrum and its impact Funding: This research received no external funding. Acknowledgements: We would like to thank the Centre for High Performance Computing for providing the 740 computer time for the simulations. We also thank GISAID making available the COVID-19 genomic material, 741 the originating and submitting laboratories (acknowledged in Supplementary Table A1 ). Genetic sequences are 742 purposely not disclosed in this article due to the restrictions from GISAID, but are accessible from the 743 database. Conflicts of Interest: The authors declare no conflict of interest.