key: cord-1012204-u57xmbz5 authors: Yurkovetskiy, Leonid; Wang, Xue; Pascal, Kristen E.; Tomkins-Tinch, Christopher; Nyalile, Thomas; Wang, Yetao; Baum, Alina; Diehl, William E.; Dauphin, Ann; Carbone, Claudia; Veinotte, Kristen; Egri, Shawn B.; Schaffner, Stephen F.; Lemieux, Jacob E.; Munro, James; Rafique, Ashique; Barve, Abhi; Sabeti, Pardis C.; Kyratsous, Christos A.; Dudkina, Natalya; Shen, Kuang; Luban, Jeremy title: Structural and Functional Analysis of the D614G SARS-CoV-2 Spike Protein Variant date: 2020-09-15 journal: Cell DOI: 10.1016/j.cell.2020.09.032 sha: 71b2c9ef671b39e6e68eca579052d2362cfff5c1 doc_id: 1012204 cord_uid: u57xmbz5 The SARS-CoV-2 spike (S) protein variant D614G supplanted the ancestral virus worldwide, reaching near fixation in a matter of months. Here we show that D614G was more infectious than the ancestral form on human lung cells, colon cells, and on cells rendered permissive by ectopic expression of human ACE2 or of ACE2 orthologs from various mammals, including Chinese rufous horseshoe bat and Malayan pangolin. D614G did not alter S protein synthesis, processing, or incorporation into SARS-CoV-2 particles, but D614G affinity for ACE2 was reduced due to a faster dissociation rate. Assessment of the S protein trimer by cryo-electron microscopy showed that D614G disrupts an interprotomer contact, and that the conformation is shifted towards an ACE2 binding-competent state, which is modeled to be on pathway for virion membrane fusion with target cells. Consistent with this more open conformation, neutralization potency of antibodies targeting the S protein receptor-binding domain was not attenuated. Next-generation sequencing permits real-time detection of genetic variants that appear in pathogens during disease outbreaks. 
Tracking viral variants now constitutes a requisite component of the epidemiologist's toolkit, one that can pinpoint the origin of a zoonotic virus and the trajectory it takes from one susceptible host to another (Hadfield et al., 2018; Shu and McCauley, 2017). Lagging behind sequence-based modeling of virus phylogenies and transmission chains is the ability to understand the effect of viral variants on the efficiency of transmission between hosts or on the clinical severity of infection. Most sequence variants that arise during virus replication are either detrimental to the fitness of the virus or without consequence. Even so, such variants can increase in frequency over the course of an outbreak by chance (Grubaugh et al., 2020). More rarely, though, increasing frequency of a variant can reflect competitive advantage due to higher intrinsic replication capacity, with increased viral load and transmissibility. In December 2019, an outbreak of unexplained fatal pneumonia became apparent in Wuhan City, Hubei Province, China. By early January 2020, SARS-CoV-2 was identified as the virus causing the disease (Lu et al., 2020; Wu et al., 2020a, 2020b; Zhou et al., 2020b; Zhu et al., 2020). After SARS-CoV (Drosten et al., 2003; Ksiazek et al., 2003) and MERS-CoV (Zaki et al., 2012), SARS-CoV-2 is the third human coronavirus this century known to cause pneumonia with a significant case-fatality rate (Coronaviridae Study Group of the International Committee on Taxonomy of Viruses, 2020). Hundreds of coronaviruses have been identified in bats, including at least 50 SARS-like Sarbecoviruses (Lu et al., 2020; Zhou et al., 2020a). The virus closest in sequence to SARS-CoV-2 observed to date was isolated from a bat (Zhou et al., 2020b), though the most proximal animal reservoir for SARS-CoV-2 remains unknown (Andersen et al., 2020; Lam et al., 2020).
Sarbecoviruses, the viral subgenus containing SARS-CoV and SARS-CoV-2, undergo frequent recombination, but SARS-CoV-2 is not a recombinant of any Sarbecoviruses detected to date. Its receptor-binding motif, important for human ACE2 receptor binding specificity, appears to be an ancestral trait shared with multiple bat viruses (Boni et al., 2020). Among RNA viruses, coronaviruses are remarkable for having the largest known genomes (Saberi et al., 2018) and for encoding a 3'-to-5'-exoribonuclease that permits high-fidelity replication by the viral RNA-dependent RNA polymerase (Denison et al., 2011; Smith et al., 2014). By preventing otherwise lethal mutagenesis (Smith et al., 2013), the coronavirus exonuclease is thought necessary for the coronavirus genome size to extend beyond the theoretical limit imposed by error rates of viral RNA polymerases (Holmes, 2003). Though the rate of sequence variation among SARS-CoV-2 isolates is modest, over the course of the pandemic the virus has had opportunity to generate numerous sequence variants, many of which have been identified among the thousands of SARS-CoV-2 genomes sequenced to date (https://www.gisaid.org/) (Hadfield et al., 2018). Here we investigate potential functional and structural consequences of one of these variants, the Spike protein variant D614G, which has been associated with increased viral load in people with COVID-19 (Korber et al., 2020; McNamara et al., 2020; Volz et al., 2020; Wagner et al.). Over the course of the SARS-CoV-2 pandemic, 12,379 single nucleotide polymorphisms (SNPs) have been identified in genomic data (GISAID download on 25 June 2020). 6,077 of the SNPs were seen only once in the dataset and only four SNPs rose to high frequency. These four SNPs include C241U in the 5'UTR, a silent mutation C3037U, C14408U encoding the RNA-dependent RNA polymerase variant P323L, and A23403G encoding the spike protein variant D614G.
A23403G was first reported at the end of January 2020 in virus genomes from China and Germany. Given how few SARS-CoV-2 genomes have been sequenced from early in the outbreak, the geographic origin of A23403G cannot be determined. The frequency of A23403G has increased steadily over time and is now present in approximately 74% of all published sequences (Figure 1A). Sequences available over recent weeks, though, indicate that A23403G has nearly reached fixation globally (Figure 1A and B). The ability of the D614G S protein variant to target virions for infection of ACE2-positive cells was assessed using single-cycle lentiviral vector pseudotypes in tissue culture. Mammalian expression plasmids were engineered to encode the ancestral S protein D614 or the D614G variant. Each S protein expression plasmid was separately transfected into HEK-293 cells with plasmids encoding HIV-1 structural proteins and enzymes. Separate plasmids were transfected that encode RNAs with HIV-1 cis-acting signals for packaging and replication, and either GFP or luciferase (Luc) reporter cassettes. For each condition tested, multiple virus stocks were produced, and each stock was tested in triplicate after vector particle normalization using reverse transcriptase activity. 48 hours after challenge with the vectors, the transduction efficiency of each virion preparation was assessed by measuring the percent GFP-positive cells using flow cytometry, or by quantitating target cell-associated luciferase activity. When Calu-3 human lung epithelial cells were used as targets, challenge with virus bearing D614G resulted in 6-fold more GFP-positive cells, or 5-fold more bulk luciferase activity, than did particles bearing D614 S protein (Figure 2A). When Caco-2 human colon epithelial cells were used as target cells, 4-fold higher infectivity was observed with D614G (Figure 2A).
Additionally, when HEK-293 cells or SupT1 cells had been rendered infectable by stable expression of exogenous ACE2 and TMPRSS2, D614G was 9-fold more infectious than D614 (Figure 2A). The likely zoonotic origin of SARS-CoV-2 raises the question of whether D614G was selected during the pandemic as a result of human-to-human transmission. To determine whether the increased infectivity of D614G is specific for certain ACE2 orthologs, HEK-293 cells were transfected separately with plasmids encoding ACE2 orthologs from human (Homo sapiens), Chinese rufous horseshoe bat (Rhinolophus sinicus), Malayan pangolin (Manis javanica), cat (Felis catus), and dog (Canis lupus). Each ACE2 transfectant was then challenged with luciferase reporter viruses bearing SARS-CoV-2 Spike protein, either D614 or D614G. Relative increase in infectivity due to D614G was comparable in cells expressing all of these ACE2 orthologs (Figure 2B), demonstrating that increased infectivity due to D614G is not specific for human ACE2. The effect of D614G on S protein synthesis, processing, and incorporation into virion particles produced by SARS-CoV-2 structural proteins was assessed next (Figure 3A). HEK-293T cells were transfected with plasmids expressing each of the four SARS-CoV-2 virion-associated structural proteins (Figure 3A). Western blots were then performed on the cell lysate and on proteins from the pellet after ultracentrifugation of the transfection cell supernatant. The minimal requirement for virion assembly by other coronaviruses is the membrane protein (M) and the envelope protein (E) (for mouse hepatitis virus (Bos et al., 1996; Vennema et al., 1996)), or the M and nucleocapsid protein (N) (for SARS-CoV (Huang et al., 2004)). In the case of SARS-CoV-2, M protein production was sufficient to release particles from transfected cells, though particle release by M was increased by co-production of N (Figure 3B).
The S proteins D614 and D614G were produced to comparable levels, processed to S1 and S2 with comparable efficiency, and incorporated into SARS-CoV-2 virus-like particles at similar levels (Figure 3B). These results suggest that increased infectivity due to D614G is primarily manifest after virion assembly, during entry into target cells. Though D614G is located outside of the receptor-binding domain, this non-conservative amino acid change might alter ACE2-binding properties via allosteric effects. Surface plasmon resonance (SPR) was used to determine if the kinetics of SARS-CoV-2 S protein binding to human ACE2 is changed by D614G. Human ACE2 was immobilized and the binding of soluble, trimeric SARS-CoV-2 S protein, either D614 or D614G, was detected. At 25°C, the rate of association with ACE2 was little different between D614G and D614, but D614G dissociated from ACE2 at a rate four-fold faster than D614, resulting in a 5.7-fold reduction in binding affinity (Figure 4). At 37°C, the association rate between D614G and ACE2 was slower than between D614 and ACE2, and the dissociation rate of D614G was faster, again resulting in a five-fold reduction in binding affinity (Figure 4). Consistent with the higher affinity of D614 for ACE2, the 25 nM dilution of D614 S protein was too high to be measured at 37°C (Figure 4C). These data demonstrate that the increased infectivity of D614G is not explained by greater ACE2 binding strength.
D614G and the ancestral S protein are equally sensitive to neutralization by monoclonal antibodies targeting the receptor binding domain
The global spread and enhanced infectivity of the SARS-CoV-2 D614G variant raises the question of whether this structural change would compromise the effectiveness of antiviral therapies targeting the S protein, especially if they were designed to target D614.
To determine if this is the case, the neutralization potency of four monoclonal antibodies that target the SARS-CoV-2 Spike protein receptor-binding domain was assessed. These fully human monoclonal antibodies (Hansen et al., 2020) are currently under evaluation in clinical trials as therapeutics for COVID-19 (NCT04425629, NCT04426695). Each of these monoclonal antibodies, whether tested individually or in various combinations, demonstrated similar neutralization potency against D614G as they did against D614 (Figure 5). Since the increased infectivity of D614G was not explained by increased affinity for ACE2 (Figure 4), cryo-electron microscopy (cryo-EM) was used to illuminate potential structural features that distinguish D614G from D614 (Table 1). Structural studies of the SARS-CoV-2 trimeric S protein ectodomain demonstrate that the receptor-binding domain of each protomer can independently adopt either a closed or an open conformation, giving rise to asymmetric trimers (Walls et al., 2020; Wrapp et al., 2020). The open conformation is required for ACE2 binding since the ACE2 binding site is partially shielded in the closed conformation (Shang et al., 2020; Yan et al., 2020), and the open conformation is believed to be on-pathway for S protein-mediated fusion of the virion membrane with the target cell membrane (Tortorici and Veesler, 2019). Both S protein variants, D614 and D614G, were expressed in mammalian cells as soluble trimers. When enriched from culture media, and eluted from a size exclusion column, single peaks were observed for each variant protein at ~500 kD, the predicted mass of the homotrimer (Figure S1A). Enrichment of the protein complexes, and integrity of the full-length protomers, was additionally confirmed by SDS-PAGE (Figure S1B).
Well-defined particles of S protein trimers were evident by cryo-EM (Figure S1C), and reference-free, two-dimensional clustering revealed structural details from different orientations (Figure S1D). Three-dimensional clustering and refinement generated the final density map for D614G (Figure 6A, EMD-22301), which showed a similar overall architecture to the published map of D614 (Figure 6B). Unmasked Fourier shell correlation (FSC) analysis indicated that the D614G map had a mean resolution of 3.7 Å (gold-standard criteria, Figure S1E, and half-map FSC, Figure S1F), which is sufficient to reveal fine differences from D614. The SARS-CoV-2 S protein consists of S1 and S2 subunits, with multiple domains within S1 (Figure 6C). Comparing the map of D614G with that of D614, the N-terminal domain (NTD), the intermediary domain (INT), and the C-terminal domain (CTD) of the S1 subunit were clearly identified (Figure 6A and 6B). However, the density corresponding to the receptor binding domain (RBD), which was well-resolved in D614 (arrows, Figure 6B), was scattered in the D614G map (Figure 6A), suggesting that its RBD is flexible and adopts multiple conformations (see section below). Based on the resolved ensemble cryo-EM density map (Figure 6A) and the primary sequence (Figure 6C), an atomic model of D614G without the RBD was built (Figure 6D, PDB: 6XS6) and validated (Figure S1G, model-to-map FSC; Figure S1H, local resolution). In the structural model, the S2 subunit of D614G overlapped well with the published structures for D614 (RMSD = 0.77 Å). In contrast, there was significant deviation within the S1 subunit (RMSD = 4.5 Å, Figure 6E and 6F). When the S1 subunit of D614G was superimposed on the closed conformation of D614, the S1-NTD and S1-INT shifted away from each other by 6 Å and 4 Å, respectively (Figure 6E and Figure S2B and S2C), revealing a wider space between these two domains.
When the S1 subunit of D614G was superimposed on D614 in the open conformation, the S1-NTD was shifted outwards by 3 Å, while the S1-INT overlapped well with that of D614 (Figure 6F). D614 localizes to the interface between two protomers where its side chain forms hydrogen bonds with the T859 side-chain in the adjacent protomer (Figure 6G). The effect of D614G on this interaction could be assessed since local resolution in the map generated here reached 3.2 Å. The atomic model showed that D614G has two consequences. First, D614G disrupts the inter-protomer hydrogen bond with Thr859 (Figure 6H and I), weakening the stability of the trimer. In effect, D614 acts as a "latch" that secures two protomers together, and D614G loosens this latch (Figure 6H and I). Second, intra-protomer distance between the backbone amine of residue 614 and the backbone carboxyl group of residue 647 is shortened from 3.4 Å to 2.7 Å, presumably stabilizing the CTD. To better assess the conformation of the RBD, the flexible region of S1 was subjected to masked 3D classification and refinement, with the aim of resolving the conformational heterogeneity in that region. Two distinct classes arose from this analysis of the dataset. 58% of the protomers adopted an open conformation in which the RBD is positioned to interact with ACE2, and 42% were in the closed conformation in which the RBD is buried (Fig. 7A and 7B). This ratio of the two conformations contrasted dramatically with assessment of D614 structural data (Walls et al., 2020) (Fig. 7D). Analysis of D614 (Walls et al., 2020) showed that 46% of particles were in the all closed conformation and 54% in the one open conformation. These data further emphasize the contrast between D614G conformational space and that of D614. The SARS-CoV-2 S protein variant D614G is one of only four SNPs, out of the more than 12,000 reported in GISAID, that has risen to high frequency (Figure 1).
This suggests that D614G confers a replication advantage to SARS-CoV-2, such that it increases the likelihood of human-to-human transmission. Data in which the presence of D614G correlates with increased rates of transmission through human populations would support this hypothesis. Several groups suggest that such an association exists (Furuyama et al., 2020; Korber et al., 2020; Volz et al., 2020), though adequately powered datasets, appropriately controlled for age and other variables, have eluded investigators. Future prospective comparisons of D614G transmission to that of D614 seem unlikely given that D614G has gone to near fixation world-wide (Figure 1). However, the SARS-CoV-2 genomes that have been sequenced are only a narrow snapshot of the pandemic and additional sequencing of archived samples might pinpoint the origin of D614G or better resolve the variant's trajectory. Indirect evidence that D614G is more infectious was provided here by experiments with pseudotyped viruses showing that D614G transduces 3 to 9-fold more efficiently than does the ancestral S protein (Figure 2A). This effect was seen with a range of cellular targets, including lung and colon epithelial cells. Efforts are underway to compare the replication efficiency of D614G with that of D614 in the context of the nearly 30,000 nucleotide SARS-CoV-2 genome. Such reverse genetic experiments, though, are technically difficult and potentially confounded by acquisition of unnatural, tissue culture-adapted mutations during genome rescue and expansion in transformed cell lines, as has occurred during similar assessments of Ebola virus variants (Marzi et al., 2018; Ruedas et al., 2017; Wang et al., 2017). Consistent with the increased ability of D614G to infect cells in tissue culture, several studies suggest that D614G is associated with increased viral load in people with COVID-19 (Korber et al., 2020; McNamara et al., 2020; Volz et al., 2020; Wagner et al.)
, though these studies quantitated SARS-CoV-2 RNA and did not measure infectious virus. If SARS-CoV-2 Spike D614G is an adaptive variant that was selected for increased human-to-human transmission following spillover from an animal reservoir, one might expect that increased infectivity would only be evident on cells bearing ACE2 orthologs similar to that in humans. In contrast to the primate-specific increase in infectivity that was reported for the major clade-forming Ebola virus glycoprotein variant from the 2013-2016 West African outbreak (Diehl et al., 2016; Urbanowicz et al., 2016), the increased infectivity of D614G was equally evident on cells bearing ACE2 orthologs from a range of mammalian species (Figure 2B). The fact that D614G was more infectious on cells expressing ACE2 orthologs from Chinese rufous horseshoe bat and Malayan pangolin raises the question of why D614G does not dominate the sequences of closely-related Sarbecoviruses that circulate in these species. Among these viruses, only SARS-CoV-2 possesses a polybasic furin cleavage site at the S1-S2 junction in the S protein, which is required for SARS-CoV-2 to infect human lung cells but not other cell types (Hoffmann et al., 2020). Interestingly, when non-lung cells are challenged, disruption of the furin-cleavage site increases SARS-CoV-2 infectivity to the same extent as does D614G. These observations suggest that D614G increases infectivity in the presence of the furin cleavage site, but that D614G offers no selective advantage when transmission is possible in the absence of the furin site, as appears to be the case in bats and pangolins. Interestingly, HIV-1 variants with increased infectivity in tissue culture have been isolated from the central nervous system, a compartment with reduced immune pressure (Peters et al., 2004; Quitadamo et al., 2018; Schnell et al., 2011).
Perhaps D614G dominated the SARS-CoV-2 pandemic because, unlike bats, humans are immunologically naive to Sarbecoviruses. Insight into the mechanism by which D614G increases infectivity was gleaned from cryo-EM studies of the SARS-CoV-2 S protein trimer. D614G exhibited striking conformational changes (Figure 6E and F, Figure 7C and D), all of which may be attributable to disruption of the interprotomer latch between D614 in S1 and T859 in S2 (Figure 6H exhibiting an all open state like that reported here (Gui et al., 2017; Pallesen et al., 2017; Walls et al., 2019; Yuan et al., 2017). When the SARS-CoV-2 S protein RBD is in its closed conformation, the binding site for ACE2 is physically blocked (Shang et al., 2020; Yan et al., 2020). Models of coronavirus S-mediated membrane fusion describe ACE2 binding to all three RBD domains in the open conformation as destabilizing the pre-fusion S trimer, leading to dissociation of S1 from S2 and promoting transition to the post-fusion conformation (Pallesen et al., 2017; Walls et al., 2019). According to these models, the well-populated all open conformation of D614G (Figure 7D) would reflect an intermediate that is on-pathway to S-mediated membrane fusion. Despite the increased infectivity of D614G in tissue culture, and the increased viral load in infected people, increased COVID-19 disease severity has not been detected in association with D614G infection (Korber et al., 2020; McNamara et al., 2020; Volz et al., 2020; Wagner et al.). Perhaps there are fitness tradeoffs for D614G in vivo due to the more open conformation of its RBD (Figure 7), which potentially renders D614G more immunogenic.
In keeping with the fact that the location of D614G within the S protein is remote from the receptor binding domain, that D614G affinity for ACE2 is less than that of D614 (Figure 4), and that the relatively better concealed D614 receptor binding domain is likely to be advantageous for immune evasion, the D614G and D614 variants are equally sensitive to neutralization by human monoclonal antibodies targeting the S protein RBD (Figure 5). Though the analysis of SARS-CoV-2 sequence variants presented here is based on viral RNA obtained from tens of thousands of people infected with the virus from around the world, the available samples are highly skewed in terms of geographic origin, and they reflect only a fraction of a percent of all circulating SARS-CoV-2. Additional sequencing of archived samples, or of viruses currently circulating, may shed further light on the pandemic trajectory of D614G. The current high frequency of D614G throughout the world suggests that this variant transmits person-to-person more efficiently than do viruses bearing D614, but demographically matched cohorts that might be used for comparing transmission likelihood of D614 versus D614G have been difficult to assemble. Another complication of any epidemiologic study of human transmission is that D614G is generally accompanied by three other sequence variants. Nonetheless, the pseudotype experiments presented here show a pronounced increase in infectivity with D614G in isolation, and the structural studies are consistent with conformational changes expected for a more infectious S protein variant. Ultimately, the pseudotype results presented here need confirmation in the context of full-length recombinant SARS-CoV-2, and extension to transmission studies using an animal model. Finally, the structural determination of D614G performed here was with a widely used soluble version of the S protein which differs from the native protein in three aspects.
First, the original furin cleavage site was removed. Second, a di-proline motif was introduced to stabilize the S protein. Third, the original transmembrane domain was substituted by a synthetic trimerization helix. Examination of the effect of D614G on native S protein will require electron cryotomography to directly visualize the S protein on virion-like particles. We thank all the members of the Sabeti, Shen, and Luban labs, and Thermo Fisher Scientific Workflow solutions team for technical assistance and helpful discussions. This work was supported by NIH grants R37AI147868 and R01AI148784 to J.L., NIH/NIAID U19AI110818 to (B) Lentiviral virions bearing a luciferase transgene, pseudotyped with either SARS-CoV-2 D614 or D614G S proteins, were produced by transfection of HEK293 cells, and used to transduce human HEK293 cells transiently transfected with plasmids encoding the indicated ACE2 orthologs. Relative infectivity of D614G vs D614, with D614 set at 1, was determined based on bulk luciferase activity. Each point represents the mean +/- standard deviation after transduction using lentiviral stock derived from an independent transfection, each of which is the mean of three technical replicates. P values are ratio paired t test (two-tailed). Total protein in cell lysates and in ultracentrifuge pellets from cell culture supernatant was normalized by Bradford assay, and then Western blots were performed with the primary antibodies indicated on the left of the blots. Anti-Raptor antibody was used as a loading control for the cell lysate. Uncleaved S protein, as well as the S1 and S2 cleavage products, are indicated on the right. Results here are representative of three independent rounds of transfection, ultracentrifugation, and westerns. (E) Summary of kinetic parameters measured in A-D. D614G binds ACE2 five-fold weaker than D614 at both temperatures tested.
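The kinetic summary in this legend rests on the relations stated in the SPR methods: the equilibrium dissociation constant is the ratio K_D = k_d / k_a, and the dissociative half-life in minutes is ln 2 / (60 × k_d). The following is a minimal sketch of that arithmetic, not the Biacore evaluation procedure. The function names and the example rate constants are hypothetical and chosen only for illustration; they are not measured values from this study.

```python
import math


def equilibrium_kd(k_d, k_a):
    """K_D (M) from dissociation rate k_d (1/s) and association rate k_a (1/(M*s))."""
    return k_d / k_a


def half_life_minutes(k_d):
    """Dissociative half-life in minutes: ln2 / (k_d * 60), with k_d in 1/s."""
    return math.log(2) / (k_d * 60)


def affinity_fold_change(kd_fold, ka_fold):
    """Fold change in K_D given fold changes in k_d and k_a (variant / reference)."""
    return kd_fold / ka_fold


# Hypothetical reference rate constants, for illustration only.
ref_k_a = 1.0e5   # 1/(M*s)
ref_k_d = 1.0e-3  # 1/s
ref_K_D = equilibrium_kd(ref_k_d, ref_k_a)

# The text reports a four-fold faster k_d and a 5.7-fold weaker K_D at 25 C.
# Under K_D = k_d / k_a, those two figures together imply this k_a ratio.
implied_ka_fold = 4.0 / 5.7

variant_K_D = equilibrium_kd(ref_k_d * 4.0, ref_k_a * implied_ka_fold)

print(f"reference K_D = {ref_K_D:.2e} M, t1/2 = {half_life_minutes(ref_k_d):.1f} min")
print(f"variant K_D is {variant_K_D / ref_K_D:.1f}-fold weaker")
```

Read this way, a four-fold faster dissociation rate on its own would account for a four-fold loss of affinity; the remainder of the reported 5.7-fold change would have to come from a modestly lower association rate, which is an inference from the rounded figures rather than a reported measurement.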
(C) Comparison of the two D614G protomer S1 subunit conformations with the corresponding conformations of the D614 protomer S1 subunit. Resource Availability Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Jeremy Luban (jeremy.luban@umassmed.edu). Plasmids: As itemized in the Key Resources The script for analyzing and plotting D614G variant frequency is available via GitHub: https://github.com/broadinstitute/sc2-variation-scripts. Coordinates for the CryoEM structures determined here have been deposited to PDB (6XS6) and EMDB (EMD-22301) databases. 24 hrs prior to transfection, 6 × 10^5 HEK-293 cells were plated per well in 6 well plates. All The frequency of the SARS-CoV-2 D614G S protein variant in published genomic data was examined using the full Nextstrain-curated set of sequences available from GISAID as of 25 June 2020 (Hadfield et al., 2018; Shu and McCauley, 2017). Sequences were aligned to the ancestral reference sequence (NCBI GenBank accession NC_045512.2) using mafft v7.464 (Katoh and Standley, 2013) with the "--keeplength" and "--addfragments" parameters, which preserve the coordinate space of the reference sequence. To remove lower-quality sequences from the dataset, all sequences in the alignment were masked with ambiguous bases ('N') in the regions spanning the first 100bp and the last 50bp, as well as at error-prone sites located at the (1-indexed, NC_045512.2 coordinate space) positions 13402, 24389, 24390. Sequences shorter than 28kb or with >2% ambiguous bases were removed from the alignment. The frequency of D614G was calculated in the resulting data by extracting the sequence region corresponding to the gene for the S protein, spanning 21563-25384bp. These sequences were processed using a script importing biopython (Cock et al., 2009) to remove any gaps introduced by the alignment process and translate the sequence to protein space.
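The masking, filtering, extraction, and translation steps described above can be sketched as follows. This is an illustrative sketch only, not the authors' script (which is in the GitHub repository cited above and imports biopython). To stay dependency-free it substitutes a hand-built standard codon table, and the function names, the order in which the mask and filter are applied, and the choice to measure length and ambiguity on ungapped bases are all assumptions.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Coordinates quoted in the text (1-indexed, inclusive, NC_045512.2 space).
S_START, S_END = 21563, 25384
ERROR_PRONE_SITES = (13402, 24389, 24390)


def mask_row(row, head=100, tail=50, sites=ERROR_PRONE_SITES):
    """Replace the first `head` bases, last `tail` bases, and listed sites with 'N'."""
    bases = list(row.upper())
    n = len(bases)
    for i in list(range(min(head, n))) + list(range(max(0, n - tail), n)):
        bases[i] = "N"
    for pos in sites:
        if pos <= n:
            bases[pos - 1] = "N"
    return "".join(bases)


def passes_filter(row, min_len=28000, max_ambiguous=0.02):
    """Keep rows with >= min_len non-gap bases and <= max_ambiguous fraction of 'N'."""
    ungapped = row.replace("-", "")
    if len(ungapped) < min_len:
        return False
    return ungapped.upper().count("N") / len(ungapped) <= max_ambiguous


def residue_614(row):
    """Extract the S gene, drop alignment gaps, translate, and return residue 614."""
    gene = row[S_START - 1:S_END].upper().replace("-", "")
    protein = "".join(
        CODON_TABLE.get(gene[i:i + 3], "X") for i in range(0, len(gene) - 2, 3)
    )
    return protein[613] if len(protein) > 613 else None
```

Because the alignment preserves reference coordinates, codon 614 of the S gene covers positions 23402 to 23404, so an A-to-G change at position 23403 converts an aspartate codon such as GAT into the glycine codon GGT. Codons containing a masked or ambiguous base translate to 'X' in this sketch and would need to be excluded before tabulating frequencies.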
The identity of the variant at amino acid position 614 was tabulated for the full dataset and reported as frequency by date using the collection dates reported in the Nextstrain-curated metadata file available from GISAID (Hadfield et al., 2018; Shu and McCauley, 2017). The frequency was calculated as (# sequences with D614G)/(# sequences). Frequency within the six continental regions was calculated based on the "region" geographic classification associated with each sample in the metadata. Frequency values were linearly interpolated for dates surrounded by valid data. The frequency of the last date with data was carried forward where recent dates lack data. The resulting values were rendered as plots using matplotlib (Hunter, 2007). The script for analyzing and plotting D614G variant frequency is available via GitHub: https://github.com/broadinstitute/sc2-variation-scripts The diversity of SNPs and their functional effects were assessed based on the same GISAID sequences and MAFFT alignment used to plot the frequency of D614G over time, with the 5' and 3' ends not masked. In the alignment, ambiguous nucleotide codes (R,Y,W,S,M,K,H,B,V,D) were all masked with "N" values. SNPs were calculated from the alignment using the snp-sites tool (Page et al., 2016). The resulting VCF-format file was normalized using bcftools (Li et al., 2009) to include only SNPs. The VCF file with SNPs was annotated for functional effects using SnpEff (Cingolani et al., 2012). VSV-SARS-CoV-2-S pseudoparticle generation and neutralization assays were performed as previously described (Hansen et al., 2020). The cells were incubated 1 hour at 37°C with 5% CO2. Cells were washed three times with PBS to remove residual input virus and overlaid with DMEM high glucose media (Life Technologies) with 0.7% low IgG BSA (Sigma), sodium pyruvate (Life Technologies), and gentamicin (Life Technologies).
After 24 hours at 37°C with 5% CO2, the supernatant containing pseudoparticles was collected, centrifuged at 3,000 x g for 5 minutes to clarify, aliquoted, and frozen at -80°C. For neutralization assays, Vero cells were seeded in 96-well plates 24 hours prior to assay and grown to 85% confluence before challenge. Antibodies were diluted in DMEM high glucose media containing 0.7% Low IgG BSA (Sigma), 1X Sodium Pyruvate, and 0.5% Gentamicin (this will be referred to as "Infection Media") to 2X assay concentration and diluted 3-fold down in Infection Media, for an 11-point dilution curve in the assay beginning at 10 ug/mL Proteins in VLP and cell lysate samples were separated by SDS-PAGE, as follows: 20 ul of unboiled VLP and cell lysate samples on a 10-20% Tris-Glycine gel to probe for the M protein; 2 ul of boiled cell lysate and 5 ul of boiled VLP samples on a 12% Tris-Glycine gel to probe for the N protein; 20 ul of boiled lysate and VLP samples on a 10% Tris-Glycine gel to probe for the S protein; 5 ul of boiled lysate on a 10% Tris-Glycine gel to probe for Raptor, as a loading control. Proteins were electro-transferred from the gels to PVDF membrane, which was blocked with 5% milk in Tris-Buffered Saline, pH 8.0, with 0.1% Tween-20, and detected with the indicated antibodies. Binding kinetics and affinities for ACE2.Fc were assessed using surface plasmon resonance technology on a Biacore T200 instrument (GE Healthcare, Marlborough, MA) using a Series S CM5 sensor chip in filtered and degassed HBS-EP running buffer (10 mM HEPES, 150 mM NaCl, 3 mM EDTA, 0.05% (v/v) polysorbate 20, pH 7.4). Capture sensor surfaces were prepared by covalently immobilizing a mouse anti-human Fc mAb (REGN2567) on the chip surface using the standard amine coupling chemistry, reported previously (Johnsson et al., 1991).
Following surface activation, the remaining active carboxyl groups on the CM5 chip surface were blocked by injecting 1 M ethanolamine, pH 8.0, for 7 minutes. The immobilization procedure typically yielded a signal of ~12,000 resonance units (RU). At the end of each cycle, the anti-human Fc surface was regenerated using a 12 second injection of 20 mM phosphoric acid. Following capture of ACE2.Fc on the anti-human Fc mAb surface, two-fold serial dilutions (0.78 nM to 50 nM, in duplicate) of soluble SARS-CoV-2 spike trimer protein, D614 or D614G, were injected for 3 minutes at a flow rate of 50 µL/min, with a 2 minute dissociation phase in the running buffer. All specific SPR binding sensorgrams were double-reference subtracted as reported previously (Myszka, 1999), and the kinetic parameters were obtained by globally fitting the double-reference subtracted data to a 1:1 binding model with mass transport limitation using Biacore T200 Evaluation software v 3.1 (GE Healthcare). The dissociation rate constant (kd) was determined by fitting the change in the binding response during the dissociation phase, and the association rate constant (ka) was determined by globally fitting analyte binding at different concentrations. The equilibrium dissociation constant (KD) was calculated as the ratio kd/ka. The dissociative half-life (t½) in minutes was calculated as ln2/(kd*60). Steady-state analysis was performed using Scrubber software to determine the KD value.
FreeStyle 293-F cells were cultured in SMM-293 TII serum-free media (SinoBiological) and maintained in a 37°C shaker with 8% CO2 and 80% humidity. 600 µg of plasmid encoding His-tagged SARS-CoV-2 S protein was transfected into 400 ml of 293 FreeStyle cells at 10^6 cells/ml, using 25 ml Opti-MEM and 1.8 ml PEI.
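The derived SPR quantities defined above (the equilibrium dissociation constant as the ratio of the dissociation and association rate constants, and the dissociative half-life in minutes as ln2 divided by the dissociation rate constant times 60) can be illustrated with a short sketch. The function names and the rate constants below are hypothetical placeholders, not values measured in this study, and the sketch does not reproduce the global 1:1 fit performed by the Biacore software.

```python
import math

def two_fold_series(top_nM, n_points):
    """Two-fold serial dilution, starting from the highest concentration (nM)."""
    return [top_nM / 2**i for i in range(n_points)]

def equilibrium_dissociation_constant(k_a, k_d):
    """K_D in M, from k_a in 1/(M*s) and k_d in 1/s."""
    return k_d / k_a

def dissociative_half_life_min(k_d):
    """t1/2 in minutes: ln2 / (k_d * 60), with k_d in 1/s."""
    return math.log(2) / (k_d * 60)

# Halving 50 nM six times gives 0.78125 nM, matching the stated 0.78 nM to 50 nM range.
concentrations_nM = two_fold_series(50.0, 7)

# Hypothetical rate constants for illustration only (not measured values).
k_a_example = 1.0e5   # 1/(M*s)
k_d_example = 1.0e-3  # 1/s
K_D_example = equilibrium_dissociation_constant(k_a_example, k_d_example)
t_half_example = dissociative_half_life_min(k_d_example)
```

Because KD is a ratio, a faster dissociation rate raises KD (weaker affinity) and shortens t½ even when the association rate is unchanged.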
60 hours later, the media was collected and applied to 3 ml of Ni-NTA resin (Qiagen).
Movie frame alignment, estimation of microscope contrast-transfer function parameters, particle picking, 2D classification, and homogeneous refinement using the published structure EMD-21452 as the initial model were carried out in cryoSPARC. Ensemble averaging of 266,356 particles with C3 symmetry imposed resulted in a 3.7 Å map, according to the "gold standard" Fourier shell correlation criterion of 0.143. Local resolution, estimated using cryoSPARC, ranged from 3 Å to 6 Å. Particle stacks with well-refined orientation parameters were imported into Relion3.1. Focused 3D classifications with a soft mask on the S1 subunit of the protomer were performed on the C3 symmetry-expanded particles. Two monomer conformations, namely the closed state (a) and the open state (b), were identified after 3 rounds of classification. Four trimer classes were identified by 3D classification on the trimer particles: class_3a (13,555 particles), class_3b (51,600 particles), class_2a1b (96,029 particles), and class_1a2b (105,118 particles). Homogeneous refinements were then performed on these 4 classes in Relion3.1, with EMD-21452 as the initial model. C3 symmetry was imposed on the refinement of particles from class_3a and class_3b; C1 symmetry was imposed on the refinement of particles from class_2a1b and class_1a2b. The final resolutions were 4 Å for class_1a2b, 3.9 Å for class_2a1b, 4.2 Å for class_3a, and 3.9 Å for class_3b. The final map was deposited with the accession code EMD-22301. Atomic models were prepared with Coot based on the resolved structures of D614 SARS-CoV-2 Spike (PDB: 6vxx and 6vsb). Real-space refinements were performed using PHENIX with secondary structure restraints. MolProbity was used to evaluate the geometries of the structural model. Corrected Fourier shell correlation curves were calculated using the refined atomic model and the cryo-EM density map.
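The particle counts reported above for the four trimer classes can be turned into class fractions and an overall fraction of open-state protomers. The sketch below is illustrative arithmetic only; it assumes that each class name encodes the number of closed (a) and open (b) protomers per trimer (for example, class_2a1b as two closed and one open), which is an interpretation of the naming rather than something stated explicitly in this section. The four class counts sum to 266,302, slightly fewer than the 266,356 particles in the consensus map.

```python
# Particle counts per trimer class, as reported in the text.
class_counts = {
    "class_3a": 13555,
    "class_3b": 51600,
    "class_2a1b": 96029,
    "class_1a2b": 105118,
}

# Assumption: open-state (b) protomers per trimer, read off the class names.
open_per_trimer = {
    "class_3a": 0,
    "class_3b": 3,
    "class_2a1b": 1,
    "class_1a2b": 2,
}

total_trimers = sum(class_counts.values())

# Share of classified trimers falling in each class.
class_fraction = {name: n / total_trimers for name, n in class_counts.items()}

# Share of all protomers (3 per trimer) that are in the open state.
open_protomers = sum(open_per_trimer[name] * n for name, n in class_counts.items())
open_protomer_fraction = open_protomers / (3 * total_trimers)

for name, frac in class_fraction.items():
    print(f"{name}: {frac:.1%}")
print(f"open protomers: {open_protomer_fraction:.1%}")
```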
The coordinates were deposited with the accession code 6XS6. To validate the map resolution and guard against overfitting of the structural model, the following analyses were performed: (1) "gold standard" FSC curve (Fig. S1E); (2) docking of previously resolved structures into the map and checking for the appearance of expected structural elements; (3) model-to-map (Fig. S1G) and map-to-map (Fig. S1F) FSC curves. All results validated the map and model.
GraphPad Prism 8.4.3 was used to analyze the infectivity data using a ratio paired t test. In these experiments, all values shown are the mean with standard deviation, with the actual calculated two-tailed P value indicated on the figure.
The proximal origin of SARS-CoV-2
Antibody cocktail to SARS-CoV-2 spike protein prevents rapid mutational escape seen with individual antibodies
Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic
The production of recombinant infectious DI-particles of a murine coronavirus in the absence of helper virus
A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118
Biopython: freely available Python tools for computational molecular biology and bioinformatics
The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2
Coronaviruses: an RNA proofreading machine regulates replication fidelity and diversity
Ebola Virus Glycoprotein with Increased Infectivity Dominated the 2013-2016 Epidemic
Identification of a novel coronavirus in patients with severe acute respiratory syndrome
Temporal data series of COVID-19 epidemics in the USA, Asia and Europe suggests a selective sweep of SARS-CoV-2 Spike D614G variant
We shouldn't worry when a virus mutates during disease outbreaks
Cryo-electron microscopy structures of the SARS-CoV spike glycoprotein reveal a prerequisite conformational state for receptor binding
Nextstrain: real-time tracking of pathogen evolution
Studies in humanized mice and convalescent humans yield a SARS-CoV-2 antibody cocktail
A Multibasic Cleavage Site in the Spike Protein of SARS-CoV-2 Is Essential for Infection of Human Lung Cells
Error thresholds and the constraints to RNA virus evolution
Clinical features of patients infected with 2019 novel coronavirus in Wuhan
Generation of synthetic severe acute respiratory syndrome coronavirus pseudoparticles: implications for assembly and vaccine production
Matplotlib: A 2D Graphics Environment
Immobilization of proteins to a carboxymethyldextran-modified gold surface for biospecific interaction analysis in surface plasmon resonance sensors
MAFFT multiple sequence alignment software version 7: improvements in performance and usability
Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus
A novel coronavirus associated with severe acute respiratory syndrome
Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins
The Sequence Alignment/Map format and SAMtools
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
Recently Identified Mutations in the Ebola Virus-Makona Genome Do Not Alter Pathogenicity in Animal Models
High-density amplicon sequencing identifies community spread and ongoing evolution of SARS-CoV-2 in the Southern United States
Improving biosensor analysis
SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments
Immunogenicity and structures of a rationally designed prefusion MERS-CoV spike antigen
Biological analysis of human immunodeficiency virus type 1 R5 envelopes amplified from brain and lymph node tissues of AIDS patients with neuropathology reveals two distinct tropism phenotypes and identifies envelopes in the brain that confer an enhanced tropism and fusigenicity for macrophages
HIV-1 R5 macrophage-tropic envelope glycoprotein trimers bind CD4 with high affinity, while the CD4 binding site on non-macrophage-tropic, T-tropic R5 envelopes is occluded
Spontaneous Mutation at Amino Acid 544 of the Ebola Virus Glycoprotein Potentiates Virus Entry and Selection in Tissue Culture
A planarian nidovirus expands the limits of RNA genome size
HIV-1 replication in the central nervous system occurs in two distinct cell types
Structural basis of receptor recognition by SARS-CoV-2
GISAID: Global initiative on sharing all influenza data - from vision to reality
Coronaviruses lacking exoribonuclease activity are susceptible to lethal mutagenesis: evidence for proofreading and potential therapeutics
Thinking Outside the Triangle: Replication Fidelity of the Largest RNA Viruses
Structural insights into coronavirus entry
Human Adaptation of Ebola Virus during the West African Outbreak
Nucleocapsid-independent assembly of coronavirus-like particles by co-expression of viral envelope protein genes
Evaluating the effects of SARS-CoV-2 Spike mutation D614G on transmissibility and pathogenicity
Comparing viral load and clinical outcomes in Washington State across D614G mutation in spike protein of SARS-CoV-2
Unexpected Receptor Functional Mimicry Elucidates Activation of Coronavirus Fusion
Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein
Biochemical Basis for Increased Activity of Ebola Glycoprotein in the 2013-16 Epidemic
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
Genome Composition and Divergence of the Novel Coronavirus (2019-nCoV) Originating in China
A new coronavirus associated with human respiratory disease in China
Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2
Cryo-EM structures of MERS-CoV and SARS-CoV spike glycoproteins reveal the dynamic receptor binding domains
Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia
The D614G mutation in the SARS-CoV-2 spike protein reduces S1 shedding and increases infectivity
A Novel Bat Coronavirus Closely Related to SARS-CoV-2 Contains Natural Insertions at the S1/S2 Cleavage Site of the Spike Protein
A pneumonia outbreak associated with a new coronavirus of probable bat origin
A Novel Coronavirus from Patients with Pneumonia in China