key: cord-0961394-adhhwozi authors: Sobitan, Adebiyi; Mahase, Vidhyanand; Rhoades, Raina; Williams, Dejaun; Liu, Dongxiao; Xie, Yixin; Li, Lin; Tang, Qiyi; Teng, Shaolei title: Computational saturation mutagenesis of SARS-CoV-1 spike glycoprotein: stability, binding affinity, and comparison with SARS-CoV-2 date: 2021-06-30 journal: bioRxiv DOI: 10.1101/2021.06.30.450547 sha: d0bd7ec93066247599805b573f2d9e42b159c057 doc_id: 961394 cord_uid: adhhwozi Severe Acute respiratory syndrome coronavirus (SARS-CoV-1) attaches to the host cell surface to initiate the interaction between the receptor-binding domain (RBD) of its spike glycoprotein (S) and the human Angiotensin-converting enzyme (hACE2) receptor. SARS-CoV-1 mutates frequently because of its RNA genome, which challenges the antiviral development. Here, we performed computational saturation mutagenesis of the S protein of SARS-CoV-1 to identify the residues crucial for its functions. We used the structure-based energy calculations to analyze the effects of the missense mutations on the SARS-CoV-1 S stability and the binding affinity with hACE2. The sequence and structure alignment showed similarities between the S proteins of SARS-CoV-1 and SARS-CoV-2. Interestingly, we found that target mutations of S protein amino acids generate similar effects on their stabilities between SARS-CoV-1 and SARS-CoV-2. For example, G839W of SARS-CoV-1 corresponds to G857W of SARS-CoV-2, which decrease the stability of their S glycoproteins. The viral mutation analysis of the two different SARS-CoV-1 isolates showed that mutations, T487S and L472P, weakened the S-hACE2 binding of the 2003-2004 SARS-CoV-1 isolate. In addition, the mutations of L472P and F360S destabilized the 2003-2004 viral isolate. We further predicted that many mutations on N-linked glycosylation sites would increase the stability of the S glycoprotein. Our results can be of therapeutic importance in the design of antivirals or vaccines against SARS-CoV-1 and SARS-CoV-2. Author Summary Severe acute respiratory syndrome coronavirus (SARS-CoV-1) is an RNA virus that undergoes frequent mutations, which may result in more virulent SARS-CoV-1 variants. To prevent another pandemic in the future, scientists must understand the mechanisms of viral mutations and predict if any variants could become a dominant. The infection of SARS-CoV-1 in cells is largely depending on the interactions of the viral Spike (S) and human angiotensin-converting enzyme 2 (hACE2). We applied a computational method to predict S missense mutations that will make SARS-CoV-1 more virulent. We are interested in the variants that can change SARS-CoV-1 spike protein stability and/or change the virus-receptor interactions. We mutated each residue of SARS-CoV-1 spike to all possible amino acids; we calculated the differences between the folding energy and binding energy of each variant and the wildtype and identified the target S mutations with significant effects on protein stability and protein-protein interaction. We found some viral mutations could destabilize S and weaken S-hACE2 binding of SARS-CoV-1 isolate. Our results show that the computational saturation mutagenesis is a reliable approach in the analysis and prediction of missense mutations. The severe acute respiratory syndrome coronavirus (SARS-CoV-1) belongs to a family of coronaviridae that are enveloped, positive-strand RNA viruses (1) . In November 2002, the first case of SARS-CoV-1 occurred in the Guangdong province in China. The symptoms include upper respiratory infections, fever, chills, and general body weakness (2) . The other signs showing human-to-human transmission were coughing and sneezing. Horseshoe bat species might be the origin due to their sequence similarity (3) . By the end of the SARS epidemic in 2003, SARS had spread to over two dozen countries resulting in more than 8000 laboratory-confirmed cases and approximately 800 deaths (4) . The recent outbreak of a newer strain of coronavirus, SARS-CoV-2, began in December 2019 in the Wuhan city in China (5) . In a month, this new coronavirus had spread across the world due to global travels. Compared to SARS-CoV-1, SARS-CoV-2 has a higher infection rate. As of June 6, 2021, the number of confirmed global cases and global deaths due to SARS-CoV-2 are ~173 million and ~3.7 million, respectively (6) . The Spike protein (S) is a structural protein that protrudes outwards from the virus surface. The role of S is to mediate viral entry into the host's cells. Structural studies of SARS-CoV-1's S revealed the presence of two subunits, S1 (residues 12-667) is in the N-terminal and S2 (residues 667-1190) is in the C-terminal. Studies of mammalian coronaviruses with similarity to the SARS-CoV-1 showed that the S1 subunit helps with hACE2 receptor attachment, while the S2 subunit helps with the fusion (1) . Expression analysis showed that a fragment of the S1 subunit, the receptor-binding domain (RBD), residues 306 -527, is enough for tight binding to the human Angiotensin-converting enzyme 2 (hACE2) receptor (7) . A shorter fragment, residues 424 -494, within the RBD interacts with the hACE2 receptor in humans. This fragment, the receptor-binding motif (RBM), forms a loop that fits perfectly into the peptidase domain (PD) of the hACE2 receptor. The SARS-CoV-1 glycoprotein has two cleavage sites that promote viral infection. The first cleavage site is in the 667-668 residue positions. As the virus enters the host's cell, its spike protein is cleaved into the S1 and S2 subunit by protease activity. The second cleavage site is in the 797-798 residue positions. Cleavage at this position detaches the fusion protein from the S2 subunit and allows the fusion protein to bind with the host's membrane (8) . One similarity shared by SARS-CoV-1 and SARS-CoV-2 is that they use hACE2 as the receptor to enter human cells [10, 12] . Both SARS-CoV-1 and SARS-CoV-2 S proteins bind to the Peptidase domain in the N-terminal of the hACE2 receptor (1) . The other domain, the Collectrin domain, is found in the C-terminal of the hACE2 receptor. However, studies showed they exhibit varying binding affinities. A recent study reported that SARS-CoV-2 S protein has a 10-to-20 fold higher affinity to hACE2 than that of SARS-CoV-1 (7) . The interaction between the S protein of SARS-CoV-1 and the hACE2 receptor initiates entry into the human cell [12] . The higher affinity SARS-CoV-2 has for hACE2 may explain the virulent nature of its infection (10) . Another similarity is the sequence and structural homology between the S proteins of SARS-CoV-1 and SARS-CoV-2. However, despite the similarities, a study evaluated the binding of SARS-CoV-2 to experimentally verified monoclonal antibodies (mAbs) against SARS-CoV-1. The result showed a slight contrast in cross-reactivity, which had no binding between SARS-CoV-2 and the three mAbs (11) . This result supports the hypothesis that the slight difference in their sequences/structures might be responsible for the varying infectivity between SARS-CoV-1 and SARS-CoV-2. In a previous study, we indicated that SARS-CoV-2 has a stronger affinity towards hACE2 than SARS-CoV-1 because of its higher electric field density (12) . hACE2 plays a role in the renin-angiotensin pathway, that it maintains cardiovascular homeostasis (13) . hACE2 participates in microbial infection by serving as an entry point for coronaviruses (14) . The variation of hACE2 across species explains why SARS-CoV-1 infects humans and not rats nor mice. A study manipulated the protein sequences of the hACE2 of rats and mice by mutating specific residues to the residues in humans. The result found an increase in infectivity when the mouse or rat hACE2 has human residues in certain positions (15) . Understanding the interactions between the contact residues of SARS-CoV-1 and hACE2 can provide insights into how SARS-CoV-2 enters human cells. A recent bioinformatic study on SARS-CoV-1 revealed that different subdomains of the spike protein exhibit varying mutation rates. These varying mutation rates might be the driving force for the difference in the infectivity rate of SARS-CoV-1 (16) . Computational analysis has proven an effective method in studying protein dynamics (17) . SARS-CoV-1 has an unstable RNA genome. Comparatively, experimental approaches are limited in their ability to generate functional measurements of all potential missense mutations, which would be expensive, laborious, and time-consuming. Recently, we used a computational approach to predict and analyze missense mutations in SARS-CoV-2 S protein (18) . We predicted several missense mutations that affect the stability and binding affinity of SARS-CoV-2. These target mutations include D614G, N501Y, and K417N found in South Africa, United Kingdom, and Brazil variants. As a result, we employed the bioinformatics tools to analyze the effects of missense mutations on the stability and proteinprotein interactions of SARS-CoV-1. This approach is fast and less tedious in identifying key residues, which will help design therapeutic drugs against SARS-CoV-1. The Jalview tool (19) shows the aligned residues, the quality of the alignment, the conservations scores, and the consensus between the RBD sequences of SARS-CoV-1 and SARS-CoV-2 ( Fig 1A) . The sequence alignment using clustal omega algorithm within the Jalview tool indicates a ~76% sequence identity between RBD regions of SARS-CoV-1 and SARS-CoV-2. The bright yellow bars represent high conservation and quality between the aligned residues within each column. As shown in Fig 1B, This alignment yielded an RMSD of 0.878, which is more homologous. The shape and spatial orientation of the structural alignments overlapped, which indicates close atomic coordinates between the two structures. The SARS-CoV-1 S protein has 1255 residues that were used to generate 23,845 non-redundant missense mutations. The effect of each mutation on the stability of the SARS-CoV-1 S protein was evaluated. Of the total mutations performed, 20,083 missense mutations generated energy changes, while the remaining 3,762 missense mutations gave no output due to missing residues on the protein structure. (20) . Therefore, the folding energy changes within the range (-0.5<ΔΔG <0.5) are insignificant or categorized as having neutral effect. In more specific categories, 4,767 mutations had a highly destabilizing effect (ΔΔG>2.5kcal/mol) on the spike protein, 6,868 mutations moderately destabilize the spike's protein (0.5<ΔΔG≤2.5kcal/mol), 5484 mutations had a neutral effect (-0.5<ΔΔG≤0.5kcal/mol), 2,816 mutations moderately stabilize the S protein (-2.5ΔΔG≤-0.5kcal/mol), and 148 mutations have a highly destabilizing effect (ΔΔG<-2.5kcal/mol) on the S protein. We utilized the crystal structure of the RBD of SARS We compared the mutational effects of seven residues on SARS-CoV-1 S protein to corresponding residues on SARS-CoV-2 S protein. Four residues (T1059, G981, S500, and R1089) had the highest mean stabilizing effects, while three residues (G839, G634, and A430) had the highest mean destabilizing effects. Pairwise alignment of SARS-CoV-1 S and SARS-CoV-2 S showed that the residues T1059, G981, S500, R1089, G839, G634, and A430 on SARS-CoV-1 corresponds with residues T1077, G999, S515, R1107, G857, G648, and S443 on SARS-CoV-2, respectively. As shown in Table 1 The protein structure (PDB ID:2AJF) used for interaction analysis covered the S RBD that interact with the hACE2 receptor in humans. This RBD chain contains 180 residues, which spanned from residual position 323 to 502. A total of 3,420 mutations were computed, with 114 missing values. The remaining 3,306 mutations were classified into one of five categories according to their binding energy changes (ΔΔΔG). As shown in Fig 2C, The contact residues between hACE2 and the SARS-CoV-1 S were shown in Fig 6A. Not all the residues within the PD interact with SARS-CoV-1 and vice-versa. However, mutations on interacting or contact residues between hACE2 and SARS-CoV-1 showed to significantly affect their binding affinities. Fig 6 B-D shows key residues at the interface of hACE2 and SARS-CoV-1 that contributes to the binding affinity between the two molecules. The effect of mutation on the binding energy of T487 of the S protein weaken its affinity for residues, Y41 and K353, on hACE2 and vice-versa. As shown in Table 1 cysteine-rich residues (29) . This is known to mediate the fusion of the spike protein to hACE2. Predicted palmitoylation sites (30) were found in positions C19, C1217, C1218, C1222, C1232, C1235, and C1236. The average computed mutations at the C19 position predicted neutral effects, which showed no significant effect in the stability of SARS-CoV-1. However, the remaining six palmitoylated were located outside the residues covered by the structural protein. We used sequence-based mutation pathogenicity tools to predict the damaging effect of our computed mutations on the SARS-CoV-1 S function. We analyzed 25,101 missense mutations using the full-length SARS-CoV-1 S (1 -1255) . The PolyPhen2 scores gave probabilistic values on the tolerance and deleterious effect of missense mutations. A score less than 0.446 is considered benign, a score greater than 0.446 but less than 0.908 is considered possibly damaging, and a score greater than 0.908 is considered probably damaging. In Fig 8A, the missense mutations with neutral effect were predicted to be mostly tolerated with some classified as benign, while the mean value, as shown by the red line, is considered possibly damaging. Whereas the mean values of the moderately increasing and decreasing mutations were predicted to be possibly damaging. However, the mean values of large increasing and decreasing mutations were predicted to be probably damaging. The analysis of variance (ANOVA) showed that the means of all five categories were significantly different, with P-value < 2e-16 ( Fig 8A) . SNAP is a neural network machine learning algorithm that accepts protein structure as input for functional predictions (31) . SNAP scores less than zero have neutral effect, while SNAP scores greater than zero have a pathogenic effect. 13,179 (~53%) of the total 25,101missense mutations have pathogenic effects. We performed further analysis to correlate the effects of folding energy change ∆∆G on stability with their SNAP scores. As shown in Fig 8B, the SNAP scores of missense mutations with effects greater than 2.5 kcal/mol or less than -2.5 kcal/mol were higher than missense mutations with moderate effects (0.5<ΔΔG <= 2.5 or -2.5 =< ΔΔG < -0.5 kcal/mol). Furthermore, the missense mutations with neutral effects on SARS-CoV-1 stability also gave a neutral SNAP prediction. The statistical analysis showed that the correlation of SNAP scores in the five groups were significant with a P-value of <2e-16. Coronaviruses have been the cause of the most recent pandemics. The most recent coronaviruses are SARS-CoV-1 and SARS-Cov-2. A recent study showed a close relationship between the sequence and the structure of SARS-CoV-1 and SARS-CoV-2 (33). The structural alignment performed in our study also revealed an evolutionary relationship between their S proteins, RBDs, and RBMs. However, the orientation of SARS-CoV-1 residues, S461 -N473, did not align properly with SARS-CoV-2 residues, A475 -N487. This imperfect alignment could be responsible for their varying binding affinities to the human hACE2 receptor (34) . The stability of the S protein is crucial for the rapid transmissions of infection (35) . Understanding the role of mutations on S protein stability would help in designing therapeutic drugs and vaccines. Our prediction showed that more than half of the SARS-CoV We obtained the 3-dimensional structures of full-length S and RBD-hACE2 of both SARS-CoV-1 and SARS-CoV-2 from the RCSB Protein Data Bank(PDB) (38) . The structure of a trypsincleaved SARS-CoV-1's spike glycoprotein (PDB ID: 6ACG) was used for stability analysis. The structure of the complex of SARS-CoV-1 S RBD and the human hACE2 receptor (PDB ID: FoldX (20) was used for mutational analysis. We used the command line interface of FoldX to mutate each residue to the other 19 residues. The initial step used the 'RepairPdb' command to repair the wildtype protein structure. This was followed by either the use of the 'BuildModel' command and the 'AnalyseComplex' command to calculate the folding energy change and the interaction or binding energy change, respectively. For each mutation, we used FoldX to calculate the folding energy change (ΔΔG) and binding energy change (ΔΔΔG) (40) . We used the Polymorphism Phenotyping v2 (PolyPhen2) (41) and Screening for non-acceptable polymorphisms (SNAP) (31) prediction tools to predict the pathogenicity of each missense mutations. We utilized the R programming language (https://www.r-project.org/.) for data visualization for the purpose of drawing inferences. Specifically, we constructed boxplots to compare the prediction of pathogenicity between PolyPhen2 and SNAP. The FASTA sequences of SARS-CoV-1 and SARS-CoV-2 S proteins were retrieved from the Universal protein Knowledgebase (UniProtKB) (42) . We performed the pairwise sequence alignment of SARS-CoV-1 (Entry: P59594) and SARS-CoV-2 (Entry: P0DTC2) using the Clustal Omega computer program (https://www.ebi.ac.uk/Tools/msa/clustalo/) and Jalview2 Saturated computational mutagenesis of SARS-CoV-1 S protein proved to be effective in analyzing energy changes. Missense mutations in key residues such as A430 and S500 stabilized and destabilized SARS-CoV-1 full-length S and RBD, respectively. Moreover, missense mutations on residues G488 and T487 weakened the binding affinity of SARS-CoV1 S to hACE2. Mutation pathogenicity analysis showed that most highly destabilizing and highly stabilizing Structure of SARS coronavirus spike receptorbinding domain complexed with receptor The Severe Acute Respiratory Syndrome Molecular epidemiology, evolution and phylogeny of SARS coronavirus SARS | Home | Severe Acute Respiratory Syndrome | SARS-CoV Disease | CDC Emerging WuHan (COVID-19) coronavirus: glycan shield and structure prediction of spike glycoprotein and its interaction with human CD26 An interactive web-based dashboard to track COVID-19 in real time Cryo-EM structure of the SARS coronavirus spike glycoprotein in complex with its host cell receptor ACE2 Activation of the SARS coronavirus spike protein via sequential proteolytic cleavage at two distinct sites The novel coronavirus 2019 (2019-nCoV) uses the SARS-coronavirus receptor ACE2 and the cellular protease TMPRSS2 for entry into target cells. bioRxiv. Cold Spring Harbor Laboratory A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation Spike Proteins of SARS-CoV and SARS-CoV-2 Utilize Different Mechanisms to Bind With Human ACE2 Angiotensin-Converting Enzyme 2 Metabolizes and Partially Inactivates Pyr-Apelin-13 Apelin-17: Physiological Effects in the Cardiovascular System Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus Efficient Replication of Severe Acute Respiratory Syndrome Coronavirus in Mouse Cells Is Limited by Murine Angiotensin-Converting Enzyme 2 Structural assessment of the effects of Amino Acid Substitutions on protein stability and protein-protein interaction Systemic effects of missense mutations on SARS-CoV-2 spike glycoprotein stability and receptor-binding affinity. Brief Bioinform Jalview Version 2--a multiple sequence alignment editor and analysis workbench The FoldX web server: An online force field Exploitation of glycosylation in enveloped virus pathobiology Viruses and glycosylation: an overview Brunak S2, 4, 5, Wandall HH1, Levery SB1 CH N-GlyDE: a two-stage Nlinked glycosylation site prediction incorporating gapped dipeptides and pattern-based encoding SPRINT-Gly: predicting N-and O-linked glycosylation sites of human and mouse proteins by using sequence and predicted structural properties | Bioinformatics | Oxford Academic GlycoPP: A webserver for prediction of Nand O-glycosites in prokaryotic protein sequences Prediction of glycosylation sites using random forests Palmitoylation of the cysteine-rich endodomain of the SARS-coronavirus spike glycoprotein is important for spike-mediated cell fusion CSS-Palm 2.0: An updated software for palmitoylation sites prediction SNAP: Predict effect of non-synonymous polymorphisms on function Receptor and viral determinants of SARS-coronavirus adaptation to human ACE2 Structural, glycosylation and antigenic variation between 2019 novel coronavirus (2019-nCoV) and SARS coronavirus (SARS-CoV). VirusDisease The 29-Nucleotide Deletion Present in Human but Not in Animal Severe Acute Respiratory Syndrome Coronaviruses Disrupts the Functional Expression of Open Reading Frame 8 Quantitative determination of mechanical stability in the novel coronavirus spike protein Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus Predicting functional effect of human missense mutations using PolyPhen-2 The {PyMOL} Molecular Graphics System, Version~1 A method and server for predicting damaging missense mutations UniProt: A worldwide hub of protein knowledge