key: cord-0727232-unvabosp authors: Ou, Junxian; Zhou, Zhonghua; Dai, Ruixue; Zhao, Shan; Wu, Xiaowei; Zhang, Jing; Lan, Wendong; Cui, Lilian; Wu, Jianguo; Seto, Donald; Chodosh, James; Zhang, Gong; Zhang, Qiwei title: Emergence of SARS-CoV-2 spike RBD mutants that enhance viral infectivity through increased human ACE2 receptor binding affinity date: 2020-09-12 journal: bioRxiv DOI: 10.1101/2020.03.15.991844 sha: 425280106c4aa7bd61d3fc59ba20ea8746b31805 doc_id: 727232 cord_uid: unvabosp The current global pandemic of COVID-19 is caused by a novel coronavirus SARS-CoV-2. The SARS-CoV-2 spike protein receptor-binding domain (RBD) is the critical determinant of viral tropism and infectivity. To investigate whether naturally occurring mutations in the RBD have altered the receptor binding affinity and infectivity, firstly we analyzed in silico the binding dynamics between mutated SARS-CoV-2 RBDs and the human ACE2 receptor. Among 1609 genomes of SARS-CoV-2 strains isolated during the early transmission phase, 32 non-synonymous RBD mutants were identified and found clustered into nine mutant types under high positive selection pressure. Applying molecular dynamics simulations, three mutant types (V367F, W436R, N354D/D364Y) displayed higher binding affinity to human ACE2, likely due to the enhanced structural stabilization of the RBD beta-sheet scaffold. The increased infectivity of one mutant (V367F) circulating worldwide was further validated by performing receptor-ligand binding ELISA, surface plasmon resonance, and pseudotyped virus assays. Genome phylogenetic analysis of V367F mutants showed that during the early transmission phase, most V367F mutants clustered more closely with the SARS-CoV-2 prototype strain than the dual-mutation variants (V367F + D614G), which emerged later and formed a distinct sub-cluster. The analysis of critical RBD mutations provides further insights into the evolutionary trajectory of SARS-CoV-2 under high selection pressure and supports the continuing surveillance of spike mutations to aid in the development of COVID-19 drugs and vaccines. A novel coronavirus SARS-CoV-2 has caused outbreaks of Coronavirus Disease 2019 (COVID-19) globally beginning in mid-December 2019, with an epicenter in Wuhan, China [1] [2] [3] . As of September 8, 2020, SARS-CoV-2 had infected 27,236,916 people world-wide and caused 891,031 deaths with an estimated fatality rate of 3.27% [4] . This on-going pandemic of COVID-19 has become the most serious threat to public health in this century. The origin of SARS-CoV-2 remains elusive. However, the initial cases were largely associated with a seafood market, which indicated potential zoonotic transmissions [1, 5] . Although bats and pangolins are most likely the reservoir and the intermediate hosts in the wild, more evidence is needed to support zoonotic transmission and to track the origin of this new coronavirus [6] [7] [8] . Angiotensin-converting enzyme 2 (ACE2) is the host cellular receptor for the SARS-CoV-2 spike protein, which is similar to its counterpart in SARS-CoV [9] [10] [11] [12] . The receptor-binding domain (RBD) of the spike protein subunit S1 interacts directly with ACE2, providing for tight binding to the peptidase domain of ACE2 [13, 14] . Therefore, RBD is the critical determinant of virus-receptor interaction and reflects viral host range, tropism, and infectivity. Although the RBD sequences of different SARS-CoV-2 strains circulating globally are largely conserved, mutations have appeared; these may account for differences in viral infectivity and also contribute to its spread [15] [16] [17] [18] . The S protein expresses antigenic sites on its protein surface, likely with both T-cell and B-cell epitopes. The main antibody binding sites substantially overlap with RBD, and an antibody binding to these sites is likely to block viral entry into cells [19] [20] [21] . To investigate whether mutations in RBD have altered the receptor binding affinities and whether these strains may have been selected for higher infectivity, the binding dynamics and the infectivity between the SARS-CoV-2 RBDs of the mutant strains to date and human ACE2 receptor were modelled and assessed computationally. In addition, experimental validation of the enhanced affility and infectivity of the V367F mutant was assessed. The experimental results supported and validated our computational simulations. Among the 1609 SARS-CoV-2 strains with whole genome sequences available in public databases collected before April 2020, 32 strains contained amino acid mutations in the RBD (Supplementary Table 1 The geographic distribution of the 32 RBD mutants clustering into nine mutant types is displayed. Strains with names highlighted in red are mutants with the enhanced binding affinity. The strains with names noted in yellow are mutants with similar binding affinities. Since RBD is the only domain to bind human ACE2 and, in turn, initiates cell entry, it is believed that the RBD should be highly conserved. To test this hypothesis, we investigated the selective pressures of the S gene by calculating nonsynonymous/synonymous substitution rate ratios (dN/dS ratios) for various segments of the S gene in the 1609 early SARS-CoV-2 strains [22] . Interestingly, the entire S gene exhibited a dN/dS of 4.1197, remarkably greater than 1, showing that the S gene is indeed under positive selective pressure ( Table 1 ). The RBD showed a similar dN/dS (3.3545) as the entire S protein, indicating that high selective pressure was also applied to this essential domain. Therefore, the functional relevance of these RBD mutations may be inferred. To estimate the functional changes suggested by the RBD mutations, we performed molecular dynamics (MD) simulations for the prototype SARS-CoV-2 (Wuhan-Hu-1 strain) and the RBD mutants in order to assess their binding energy to human ACE2. These were performed using GROMACS 2019 [23, 24] . The complex structure of the SARS-CoV-2 RBD domain and human ACE2 was obtained from the National Microbiology Data Center (ID: NMDCS0000001) (PDB ID: 6LZG) (https://www.rcsb.org/structure/6LZG). Mutated amino acids of the SARS-CoV-2 RBD mutants were directly replaced in the model, and the bat/pangolin CoV RBD domain was modeled using SWISS-MODEL [25] . Each simulation was performed at 10ns and each model was simulated in triplicate. All trajectories reached a plateau of RMSD after 2~5ns ( Figure. 2A) , indicating that their structures reached equilibrium. All of the subsequent computations on thermodynamics were based on the 5~10ns trajectories. Three RBD mutant types (N354D and D364Y, V367F, W436R) exhibited significantly lowered Δ G, suggesting a significantly increased affinity to human ACE2; the other mutants showed a similar Δ G compared to the prototype ( Figure. 2B) . The Δ G of these three mutant types were around -58 kJ/mol, approximately 25% lower than the prototype strain (-46.5 kJ/mol, calculated from the experimentally measured KD) ( Figure. 2B) . Compared to the KD = 14.7 nM of the prototype RBD [9] , the equilibrium dissociation constant (KD) of the three mutants are calculated as 0.12 nM, 0.11 nM, and 0.13 nM, respectively ( Figure. 2C), which were two orders of magnitude lower than for the prototype strain, indicating a remarkably increased affinity to the human ACE2 receptor. For the only double amino acid mutant (N354D, D364Y), the N354D substitution decreased the affinity, while the D364Y provided a higher affinity than the double mutation ( Figure. 2B ). This indicated that the D364Y is the major contributor to the enhanced affinity. In comparison, the bat CoV RBD (strain RaTG13, with the highest genome similarity) showed a much lower binding affinity (KD=1.17mM; Δ G=-17.4kJ/mol) to human ACE2 than the pangolin CoV (KD=1.89μM; Δ G=-33.9kJ/mol). For comparison, the affinity of the pangolin CoV was slightly lower than the SARS-CoV-2 prototype strain (KD=14.7nM; Δ G=-46.5kJ/mol) ( Figure. The nine RBD mutant types that emerged during the early transmission phase were divided into two groups: a "similar affinity" group (V341I, F342L, R408I, A435S, G476S, V483A), whose affinity is not significantly increased, and a "higher affinity" group (N354D D364Y, V367F, W436R), whose affinity is significantly increased. To explain the structural basis of the increased affinity, we investigated the dynamics of the residues in these structures in greater detail. The binding surface of the RBD to ACE2 is largely arrayed in random coil conformation, which lacks structural rigidity. Logically, a firm scaffold should be necessary to maintain this conformation of the interaction surface, and thus may facilitate the binding affinity. The betasheet structure scaffold, centered by residues 510-524 ( Figure. 3A, marked as red), apparently provides this rigidity. "Higher affinity" mutants (N354D D364Y, V367F, and W436R) showed a considerable decrease of the RMSF (Root Mean Square of Fluctuation) at this region, demonstrating a more rigid structure; this was not observed for the "similar affinity" mutants ( Figure. 3B) . Coincidentally, the substitutions that account for the affinity increase (D364Y, V367F, and W436R) are all located near this fragment. Indeed, residues 475-485, which is a random coil near the binding site, showed a remarkably higher RMSF for the "similar affinity" group mutants, in contrast to the "higher affinity" group mutants ( Figure. As of August 24, 2020, among all of the mutants with dominant mutations in spike RBD (>20 strains), most mutations resulted in a substitution of amino acids with similar properties. V367F is the only mutant with higher binding affinity, as calculated by MD simulation ( Table 2 ). The other "higher affinity" mutants (N354D D364Y, and W436R) that were observed during the early transmission phase could not be detected again. Therefore, the binding affinity and the infectivity of V367F mutant were further validated experimentally. First, we performed experiments to assess the binding affinity in vitro with a receptor-ligand binding ELISA assay using purified S proteins and human ACE2 protein. The result showed that the V367F mutation significantly lowered the ED 50 concentration (ED 50 = 0.8±0.04 μ g/ml), as compared to the prototype (ED 50 = 1.7±0.14 μ g/ml) ( Figure. 4A ). This demonstrates that the V367F mutant has higher affinity than the prototype. Second, we performed surface plasmon resonance (SPR) experiments, which yielded the same conclusion: the prototype had a K D of 5.08 nM, compared to the V367F mutant with a K D of 2.70 nM (Figure. 4B) . These results qualitatively validated our computational simulations. Next, we performed an in vivo experiment to investigate the invasion efficiency of S proteins using a pseudovirus assay. Same amounts of S protein-containing pseudovirus were subjected to infection of ACE2overexpressed Vero and Caco-2 cells. A higher infection efficiency is represented by the increased copy number of the integrated lentivirus genome. At 24 hours post-infection, the V367F mutant pseudovirus showed 6.08x higher copy numbers than the prototype in Caco-2 cells. After 48 hours, the V367F mutant pseudovirus showed 4.38x and 3.88x higher copy number than the prototype in Vero and Caco-2 cells, respectively ( Figure. 4C) . These enhancements were statistically significant (P<0.01, t-test). Therefore, the in vitro and in vivo results were consistent with enhanced affinity and infectivity for the V367F mutant. The D614G mutation is located in the S1 region and is outside of the RBD of the SARS-CoV-2 spike protein. It has been confirmed that the D614G change increases virus infectivity by elevating its sensitivity to protease and that this mutation has spread widely [26] . Among all of the dominant mutations (>20 strains) in the spike RBD, V367F, S477N, V483A, G485R, A520S, P384L, A522V, and P330S were detected along with the D614 prototype during the early transmission phase. Later, these mutations were detected with the D614G mutant ( Table 2 ). The V367F mutants were initially discovered in January in Hong Kong, and the D614G+V367F dual mutant was initially discovered in March in the Netherlands. Afterwards, detection of both the V367F single mutant and the dual mutant increased rapidly ( Figure. 5) . Additionally, the distribution of V367F mutants and D614G/V367F dual mutants emerged mainly in Europe, including United Kingdom, the Netherlands, Spain, Northern Ireland, Switzerland, and Iceland, as well as in the USA, Australia, and Taiwan (Supplementary Table 2 ). The phylogenetic analysis of the V367F mutant genomes showed that most V367F mutants during the early transmission phase clustered more closely with the SARS-CoV-2 prototype strain. Intriguingly, all of the dual-mutation variants (V367F + D614G) formed a distinct sub-cluster separate from the V367 sub-cluster ( Figure. 6) . This indicates that the V367F mutation may have evolved along with D614G mutation, suggesting a synergistic effect of increased infectivity. The whole genome phylogenetic analysis, including all V367F mutants. For reference, the prototype strain Wuhan-HU-1 is marked with a filled circle. A phylogenetic tree was constructed using the maximum likelihood method with 1000 bootstrap replicates (MEGA X), and applying default parameters [27] . The distinct sub-cluster formed by all of the dual-mutation variants (V367F + D614G) is indicated in the orange box. Due to the challenging and on-going pandemic and given the evolving nature of the SARS-CoV-2 virus globally, identifying changes in viral infectivity may prove crucial to containing the COVID-19 pandemic. Quarantine policies need to be adapted with respect to the viral infectivity change. It is always a dilemma of quarantine and economy. Any government would balance the lost due to the quarantine lockdown versus the lost due to the disease. Numerous models have been rasied to estimate the two costs. For example, if the viral strain is more infectious, more stringent lockdown measure would be expected. This report provides computational and experimental insights into the functional outcome of mutations in RBD. As RBD mutations are under positive selection pressure, we identified the mutants that acquired increased binding affinity to human ACE2 receptor and presumably higher infectivity for human cells. First, our analysis of molecular dynamics simulation indicated a remarkable enhancement of the affinity efficiency of multiple mutant S proteins. Compared to the prototype strain Wuhan-Hu-1, the Δ G of mutants decreased ~25%. These mutants bind to ACE2 more stably due to the enhancement of the base rigidity of the S protein structure. Potential and recent animal-to-human transmission events of SARS-CoV-2 may explain the strong positive selection and enhancement of the affinity during the pandemic. Ongoing adaptation to transmission and replication in humans, including mutation events in the RBD may boost the binding affinity and lead to an increase in the basic reproduction number (R0), and in theory, further enable human to human transmission. The origin of this virus has been of considerable interest and speculation since the outbreak. Due to the high sequence similarity of the bat SARS-like CoV genome and the pangolin CoV RBD sequence to the SARS-CoV-2 sequences, these hosts were thought to have initiated the infection in humans [6] [7] [8] . Our results provide more clues to support this hypothesis. Our results suggest that the binding energy of the bat SARS-like CoV RBD is too high to bind human ACE2 effectively (K D in millimolar range). In contrast, the pangolin CoV showed a binding K D for human ACE2 at the micromolar range, just ~6x higher than that of human SARS virus (K D = 0.326μM) [28] (Figure. 3) , which indicates that the pangolin CoV has the potential to infect humans in unprotected close contact. Alignments of the genomic sequences of SARS-CoV-2 and pangolin CoV viruses suggest recombination events, particularly in the RBD domain, between pangolin and bat viruses 6 . The pangolin CoV has been detected among intercepted smuggled Malayan pangolins in multiple provinces in China, suggesting and supporting a high risk of zoonotic infections to humans, constantly and widely [7, 8] . It should be noted that during the early transmission phase, the mutation V367F that enhances the binding affinity was found in six strains: one in Hong Kong and five in France. As RBD is conserved in SARS-CoV-2, the coincidence of six strains with the same mutation across large geographic distances indicates that this mutant is more robust and that these strains originated as a novel sub-lineage, given the close isolation dates (January 22 and 23, respectively). An alternate view is that asymptomatic individuals with the same mutation were "super-infecting" travellers. Along with the epidemiological data, mutation surveillance is of critical importance as it can reveal more exact transmission routes of the epidemic and provide early warnings of additional outbreaks. Emergence of SARS-CoV-2 strains in Hong Kong, France, and other countries with RBD mutations allowing higher binding affinity to human ACE2 receptor suggests an increased risk and more severe morbidity and mortality during a sustained pandemic of COVID-19, particularly if effective precautions are not implemented. By performing assays comparing the prototype spike protein to the V367F mutation containing counterpart, we confirmed the significantly enhanced binding affinity and likely higher infectivity of the V367F mutant, and showed that the mutation stabilizes the RBD structure. By tracking mutation types in the SARS-CoV-2 spike RBD, only the V367F was continuously observed. The N354D, D364Y, and W436R mutations seemed to have disappeared during the course of the 0 pandemic, perhaps in part due to the extremely strict isolation policy set in China. Most of the dominant RBD mutants were first detected with the prototype D614, and then together with the G614 mutation. This is possibly due to multiple individual recombination events between the D614+V367F mutants and G614 mutants, although it is difficult to determine the exact position and time point due to the high sequence identities among SARS-CoV-2 mutants. D614G is distinct from the RBD mutationsV367F -it is not located in the RBD but enhances viral infectivity by elevating its sensitivity to protease [26] . Therefore, D614G and V367F may function independently and have synergistic effects on viral infectivity. Recombination is known to play an important role in natural coronavirus evolution, which may contribute to the convergence of dual enhancing mutants (D614G + V367F) [16, 18, 29, 30] . More attention should be paid to the risk of the advantage accumulation evolution through recombination among the variants, in which the recombinants may be more infectious and also better at immune escape. The S protein is also important for antigen recognition. By tracking dominant RBD mutation sequences up to August, 2020, multiple mutants with more than 20 sequences putatively related to weakly reduced host receptor binding and altered antigenicity have been detected [17] . These were found by comparing equivalent positions in more than 500 sequences of SARS and MERS genomes, in particular the N439K mutation [17] . Equivalent positions have been studied for V483A in the MERS-CoV genomes and for N439K in the SARS-CoV genomes. These mutations are hypothesized to result in weakly reduced host receptor binding and altered antigenicity, revealing possible immune escape driving virus evolution [31] , [32] . Since the RBD contains important antigenic epitopes, frequent mutations in RBD, especially those that change the amino acid properties, may weaken the binding affinity of any antibody raised against the prototype strain [20, 21] . This may lead to decreased vaccine efficacy and should be further studied. In summary, we have identified 32 RBD mutant strains clustering into nine mutant types during the early transmission phase. Three mutant types that emerged in Asia and Europe display apparent enhanced 1 structural stability of the spike protein along with higher binding affinities to the human ACE2 receptor. Multiple mutants apparently disappeared under high positive selection pressure; only the V367F mutation, which showed higher infectivity, was persistently observed along with the D614G mutation, the latter not in the RBD. This indicates that the V367F mutant is stable and may have acquired increased infectivity for humans during the COVID-19 pandemic. The emergence of dual mutants (V367F+D614G) suggests recombination among SARS-CoV-2 genomes may increase the infectivity of the virus along with enhanced escape from the host immune response. These findings support the continuing surveillance of spike mutations to aid in the development of COVID-19 drugs and vaccines. Full-length protein sequences of S protein RBD were downloaded from the NCBI GenBank Database, China 2019 Novel Coronavirus Resource (https://bigd.big.ac.cn/ncov), and GISAID EpiFluTM Database (http://www.GISAID.org). 1609 SARS-CoV-2 full-genome sequences isolated during early transmission phase (before April 5, 2020) were downloaded and the sequences with amino acid mutations in S protein and RBD region were parsed . The genome sequences with amino acid mutations in S protein and the RBD were analyzed in this study (Supplementary Table 1 ). For tracking the V367F mutation sequences and the dominant spike mutations "to date", 83,758 SARS-CoV-2 spike amino sequences isolated before August 24, 2020 were retrieved from the GISAID EpiFluTM Database (http://www.GISAID.org) and aligned. The genome sequences with either theV367F mutation or the V367F/D614G dual mutations in the S protein RBD were screened and analyzed in this study (Supplementary Table 2 ). Alignment of S protein sequences from different sources and comparison of ACE2 proteins among different species were performed using MAFFT version 7, with default parameters( https://mafft.cbrc.jp/alignmeloadnt/server/)and Bioedit [33, 34] . Selection pressure analyses were conducted using the Nei-Gojobori model [22] . Phylogentic analyses were conducted with Mega X(version 10.0.2, applying the maximum-likelihood method with 500 bootstrap replicates and default parameters [27] . The complex structure of the SARS-CoV-2 S-protein RBD domain and human ACE2 was obtained from the Nation Microbiology Data Center (ID: NMDCS0000001) (PDB ID: 6LZG). Mutated amino acids of the SARS-CoV-2 RBD mutants were directly replaced in the model, and the bat/pangolin CoV RBD domain was modelled using SWISS-MODEL [25] . Molecular dynamics simulation was performed using GROMACS 2019 with the following options and parameters: explicit solvent model, system temperature 37°C, OPLS/AA all-atoms force field, and LINCS restraints. With 2fs steps, each simulation was performed at 10ns, and each model was simulated three times to generate three independent trajectory replications. Binding free energy (ΔG) was calculated using MM-PBSA method (GitHub; https://github.com/Jerkwin/gmxtool), with the trajectories after structural equilibrium assessed using RMSD (Root Mean Square Deviation) [24, 35] . To calculate the equilibrium dissociation constant (K D ) and Δ G, the formula was used. Estimated Δ Gs of the RBD mutants were normalized using the Δ G of the prototype strain which was derived from experimental data [28] . The SARS-CoV-2 prototype S gene was cloned into pNPM5 vector (Novoprotein, NJ, USA), and fused with a C-terminal His 6 -tag. The V367F mutation was introduced using site-directed mutagenesis, consistent with the nucleotide sequence of the actual isolate. These two constructs were transfected into HEK293 cells using polyethyleneimine. Since the S protein includeed a signal peptide in its N-terminal 14 amino acids, it was secreted into the medium. Generation of SARS-CoV-2 S pseudovirus was done as previously described with some modifications [36] . Briefly, 293T cells, at about 70-90% confluence, were co-transfected with 9 ug of the transfer vector The number of nonsynonymous and synonymous differences per sequence from averaging over all sequence pairs are shown. Analyses were conducted using the Nei-Gojobori model [22] . The analysis involved 1609 SARS-CoV-2 S gene sequences. are evaluated by amino acid property change and by binding free energy using the MM-PBSA method [35] . aa: amino acid. Coronavirus disease (COVID-19) Weekly Epidemiological Update We gratefully acknowledge the authors, originating and submitting laboratories of the sequences from GISAID's EpiFlu™ Database on which this research is based. All submitters of data may be contacted directly via www.gisaid.org. This work was supported by grants from the National Key Research and The authors declare that they have no conflict of interest. Multiple alignments of the RBD amino acid sequences from January through March, 2020. SARS-CoV-2Wuhan-Hu-1, the first reported genome, is used as reference. A bat and a pangolin SARS-like coronavirus sequences are also included. Amino acid substitutions are marked. Dots indicate identical amino acids. Metadata of the strains with mutations in the RBD of spike glycoprotein (January through March, 2020). Metadata of the strains with V367F mutations in the RBD of spike glycoprotein (January through August, 2020).