key: cord-0726952-pn6ym10f authors: Walls, Alexandra; Tortorici, M. Alejandra; Bosch, Berend‐Jan; Frenz, Brandon; Rottier, Peter J. M.; DiMaio, Frank; Rey, Felix A.; Veesler, David title: Crucial steps in the structure determination of a coronavirus spike glycoprotein using cryo‐electron microscopy date: 2016-10-18 journal: Protein Science DOI: 10.1002/pro.3048 sha: 6d3dd61e2046be5eb8a1cac9fa380482544eb31e doc_id: 726952 cord_uid: pn6ym10f

Abstract: The tremendous pandemic potential of coronaviruses was demonstrated twice in the last 15 years by two global outbreaks of deadly pneumonia. Entry of coronaviruses into cells is mediated by the transmembrane spike glycoprotein S, which forms a trimer carrying receptor-binding and membrane fusion functions. Despite their biomedical importance, coronavirus S glycoproteins have proven difficult targets for structural characterization, precluding high-resolution studies of the biologically relevant trimer.
Recent technological developments in single-particle cryo-electron microscopy allowed us to determine the first structure of a coronavirus S glycoprotein trimer, which provided a framework for understanding the mechanisms of viral entry and suggested potential inhibition strategies for this family of viruses. Here, we describe the key factors that enabled this breakthrough.

Keywords: coronavirus spike protein; cryo-electron microscopy; rational vaccine design; rosetta; relion

Broad Audience Statement: The recent emergence of highly pathogenic coronaviruses and the potential for future outbreaks have underscored the need for a vaccine. Using cryo-electron microscopy, the first structure of the key antigenic, infection-mediating protein has been solved. This structure will assist rational vaccine design and the development of strategies to combat this family of viruses.

† These authors contributed equally to this work.

Coronaviruses are enveloped viruses with large positive-sense RNA genomes. In humans, coronaviruses are responsible for up to 30% of respiratory tract infections, including mild upper respiratory tract infections (common cold), croup, bronchiolitis and pneumonia. 1 In addition, coronaviruses have attracted considerable attention in the last 15 years due to the emergence of deadly viruses with tremendous pandemic potential: severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV). 1,2 After its first occurrence, SARS-CoV rapidly spread around the world, reaching all five continents and resulting in 8096 cases and 774 deaths by July 2003. The emergence of MERS-CoV in 2012 has resulted in 1800 infections and 640 deaths at the time of writing. Currently, there are no approved antiviral treatments or vaccines for any human coronavirus. Coronaviruses use homotrimers of the spike (S) glycoprotein to promote cell attachment and fusion of the viral and host membranes.
As it is virtually the only antigen present at the virus surface, S is the main target of neutralizing antibodies during infection and a focus of vaccine design. 2 The coronavirus S is a class I viral fusion protein synthesized as a single-chain precursor of 1300 amino acids that trimerizes upon folding. It comprises an N-terminal S1 subunit containing the receptor-binding domain and a C-terminal S2 subunit, the membrane-anchored stalk that carries out membrane fusion. For some coronaviruses, such as murine hepatitis virus (MHV, the prototypical and best-studied coronavirus), furin-like host proteases cleave S at the junction between S1 and S2 (S1/S2 cleavage site) during biogenesis. [3] [4] [5] Coronavirus spike proteins have proven difficult targets for structural characterization, and all reported studies have provided atomic-resolution data for only a few isolated domains. [6] [7] [8] [9] [10] [11] [12] [13] The SARS-CoV S has also been studied in its native environment by cryo-electron microscopy (cryoEM) of intact virions, providing low-resolution insights into its overall shape. 14, 15 However, the lack of high-resolution data for any coronavirus spike trimer until earlier this year had prevented a detailed analysis of the mechanisms associated with infection. Single-particle cryoEM is an increasingly important technique in structural biology that enables the study of biological macromolecules in a near-native environment. CryoEM is undergoing a technological revolution due to the development of direct detection cameras and dedicated algorithms for tracking beam-induced motion and stage drift in recorded movies. [16] [17] [18] [19] [20] [21] These advances led to an explosion in the number of high-resolution structures determined by cryoEM worldwide, for numerous proteins and protein complexes that had previously been intractable using other structural techniques.
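The movie-based drift correction mentioned above can be illustrated with a minimal sketch. It assumes noise-free frames that differ only by a global, whole-pixel translation, and estimates each frame's shift from the peak of its FFT cross-correlation with the first frame before summing the aligned frames. The function names are hypothetical; the dedicated algorithms cited above additionally cope with the very low dose per frame, sub-pixel and local (patch- or particle-based) motion, and dose weighting.

```python
import numpy as np


def estimate_shift(reference, frame):
    """Return the integer (dy, dx) roll that best aligns `frame` onto `reference`.

    The shift is read off the peak of the circular cross-correlation,
    computed in Fourier space (toy model: global, whole-pixel drift only).
    """
    xcorr = np.fft.ifft2(np.fft.fft2(reference) * np.conj(np.fft.fft2(frame))).real
    peak = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Indices past the midpoint wrap around and encode negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, xcorr.shape))


def align_and_sum(frames):
    """Align every movie frame to the first frame; return (summed image, shifts)."""
    reference = frames[0]
    total = np.zeros(reference.shape, dtype=float)
    shifts = []
    for frame in frames:
        shift = estimate_shift(reference, frame)
        shifts.append(shift)
        total += np.roll(frame, shift, axis=(0, 1))
    return total, shifts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32))
    # Simulated stage drift: frame k is the image translated by (k, -k) pixels.
    frames = [np.roll(image, (k, -k), axis=(0, 1)) for k in range(4)]
    total, shifts = align_and_sum(frames)
    print(shifts)  # [(0, 0), (-1, 1), (-2, 2), (-3, 3)]
```

Each reported shift is the roll that undoes the simulated drift, so the aligned sum equals four copies of the original image rather than a blurred average.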
We leveraged these recent advances to determine the first near-atomic resolution structure of a coronavirus S glycoprotein trimer earlier this year. 22 Here, we describe the key factors that made this possible: the design of a pre-fusion stabilized ectodomain construct, the strategies employed for cryoEM data collection and processing, and the availability of a recently developed de novo model building algorithm using Rosetta. [23] [24] [25]

Construct Design

Viral fusion proteins adopt a metastable pre-fusion conformation at the virus surface until triggered to rearrange into a more stable post-fusion conformation that drives merger of the viral and host membranes. 26 The large conformational changes taking place during the fusion reaction could mask epitopes initially accessible in the pre-fusion state and expose new epitopes specific to the post-fusion state. As a result, vaccine design initiatives aim to target the pre-fusion state of viral fusion proteins, which corresponds to the conformation the immune system can detect before infection. The intrinsic metastability of viral fusion proteins often makes it challenging to preserve the pre-fusion state during purification. This is illustrated by the F protein of respiratory syncytial virus (a paramyxovirus), which required co-expression of the ectodomain (fused to a T4 foldon motif) with a pre-fusion-specific Fab to enable isolation of this conformation. [27] [28] [29] During biogenesis, the MHV S protein is often cleaved at the S1/S2 junction (S1/S2 cleavage site) by Golgi-resident furin(-like) proteases 3,30 [Fig. 2(A)], which increases its fusogenic propensity. After cleavage, the S1 and S2 subunits remain non-covalently associated in the metastable pre-fusion S trimer. In the case of SARS-CoV and MERS-CoV, S1/S2 processing has also been suggested to promote subsequent cleavage at a second site located just upstream of the fusion peptide (S2′ cleavage site), allowing the fusion reaction to proceed upon virion uptake by a target cell.
4, 5 We engineered a construct featuring a single amino acid substitution in the S1/S2 cleavage site to prevent furin processing and enhance the stability of the MHV S ectodomain pre-fusion structure. Substituting a serine for the arginine at position 717, at the site of cleavage (from RAHR↓ to RAHS), yielded a homogeneous, uncleaved protein product, as confirmed by SDS-PAGE analysis [Fig. 2(B)]. Although MHV S is known to oligomerize into homotrimers upon translation in vivo, expression of the ectodomain yielded predominantly monomers, indicating that the transmembrane domain is required for trimerization and/or trimer stabilization. To promote oligomerization, an engineered trimerization motif based on the transcription factor GCN4 31,32 was fused to the C terminus of the MHV S ectodomain, in frame with the heptad repeat 2 (HR2) helix [Fig. 2(A)]. Biophysical analyses using analytical size-exclusion chromatography coupled online to multi-angle light scattering 33 (SEC-MALS), as well as native mass spectrometry, confirmed the trimeric organization of the GCN4-stabilized MHV S ectodomain [Fig. 2(C)]. Proper folding of the purified MHV S ectodomain was confirmed by measuring its binding affinity for the CEACAM1a ectodomain (the viral receptor) using microscale thermophoresis [Fig. 2(D)]. We determined a dissociation constant of 48.5 ± 3.8 nM, in good agreement with the value of 21.4 ± 4.2 nM reported by Peng et al. 12 for the isolated receptor-binding domain. Negative-stain EM imaging of this sample further confirmed the homogeneity of the purified protein and its suitability for high-resolution studies [Fig. 2(E)]. Ice thickness strongly influences the final achievable resolution of single-particle reconstructions. Ideally, the vitreous ice should be as thin as possible while still accommodating the particles of interest, to maximize Thon ring intensity at high spatial frequencies.
34 Imaging was performed on a Titan Krios 300 kV microscope equipped with a Gatan K2 Summit direct electron detector operated in counting mode. 18 As in our previous work on the Thermoplasma acidophilum 20S proteasome, we initially sought to acquire data from holes with the thinnest possible vitreous ice. 35 However, the MHV S protein clearly showed signs of denaturation when images were acquired under such conditions [Fig. 3(A)]. We attribute this to the surface tension exerted on the S trimers in thin vitreous ice. Hence, we targeted holes with slightly thicker ice than ideal, in which we could observe compact, well-folded MHV S trimers similar to those observed by negative-stain EM [Fig. 3(B)]. We collected a large dataset (1,600 micrographs) at high defocus (2.0–5.0 µm) to maximize low-resolution contrast and thus our ability to align the particle images during subsequent processing. This example illustrates that although it is not always possible to acquire data in the thinnest areas of ice (especially for fragile protein complexes), near-atomic resolution reconstructions can still be obtained by tailoring the imaging conditions appropriately.

[Figure: (A) … curves of the initial 3D reconstruction obtained after 2D classification (pink), the 3D reconstruction obtained after the first round of 3D classification (green), and the final reconstruction obtained after particle polishing and an additional round of 3D classification with local searches (blue). (B–C) Density corresponding to the upstream helix, shown alone (B) or with the corresponding atomic model (C), for the three maps after scaling (same coloring as in (A)), illustrating the substantial improvement in map quality over the course of processing.]

One of the major challenges encountered during processing of cryoEM data is the presence of multiple 3D structures in a given dataset.
These differences can result from distinct conformations of the same protein, distinct chemical compositions due to the loss of one or several subunits of a protein complex, or (partial) denaturation of a fraction of the particles during purification or vitrification. If left untreated, this heterogeneity can limit the resolution and compromise the quality of the final map. 3D classification has emerged as an extraordinarily powerful tool for dealing with structural heterogeneity because it allows homogeneous subsets of the data to be isolated computationally. [36] [37] [38] We relied on extensive 2D and 3D classification using the Relion software 39, 40 to deal with the marked structural heterogeneity of the MHV S ectodomain trimer dataset. We ran a first round of 3D classification without imposing symmetry to improve the separation of "good" and "compromised" particle images. Figure 4 shows isosurface representations [Fig. 4(A)] and central slices [Fig. 4(B)] of each of the four reconstructions corresponding to the four classes requested during unsupervised 3D classification. Although the isosurfaces of the classified maps revealed differences between classes, the central slices confirmed the structural heterogeneity of this dataset at a glance, as previously suggested by Scheres et al. 19 At the resolution of our analysis, we could not identify distinct conformations of the MHV S trimer, and we postulated that the particles contributing to the less well-resolved classes could be partially denatured. Comparison of projection-matching refinements (using C3 symmetry) run before and after this 3D classification step suggested that both reconstructions had similar resolution (4.4 Å) according to the gold-standard Fourier shell correlation (FSC = 0.143) criterion [Fig. 5(A)].
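The principle behind unsupervised classification, computationally splitting particle images into more homogeneous subsets, can be shown with a deliberately simplified sketch. It assumes pre-aligned 2D images, white Gaussian noise of known standard deviation and no CTF, so it only captures the expectation-maximization core of likelihood-based classification: each image is softly assigned to K references, and each reference is then re-estimated as a probability-weighted average. Relion additionally marginalizes over orientations, models the CTF and noise spectrum, and regularizes the references; the function name and the farthest-point seeding are choices made for this example only.

```python
import numpy as np


def classify(images, n_classes, n_iter=20, sigma=1.0):
    """Toy likelihood-based classification of pre-aligned images.

    Each image is modelled as one of `n_classes` references plus white
    Gaussian noise with standard deviation `sigma`. Returns the class
    references and the most probable class of every image.
    """
    n = len(images)
    flat = images.reshape(n, -1).astype(float)
    # Seed references by farthest-point selection (a heuristic for this sketch).
    seeds = [0]
    for _ in range(1, n_classes):
        d2 = ((flat[:, None, :] - flat[seeds][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        seeds.append(int(d2.argmax()))
    refs = flat[seeds].copy()
    priors = np.full(n_classes, 1.0 / n_classes)
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each image.
        d2 = ((flat[:, None, :] - refs[None, :, :]) ** 2).sum(axis=2)
        log_post = np.log(priors) - d2 / (2.0 * sigma**2)
        log_post -= log_post.max(axis=1, keepdims=True)  # avoid overflow in exp
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: each reference becomes the probability-weighted mean image.
        weights = resp.sum(axis=0) + 1e-12  # guard against an empty class
        refs = (resp.T @ flat) / weights[:, None]
        priors = weights / weights.sum()
    return refs.reshape((n_classes,) + images.shape[1:]), resp.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    struct_a, struct_b = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
    images = np.concatenate([
        struct_a + 0.3 * rng.normal(size=(30, 8, 8)),  # particles of structure A
        struct_b + 0.3 * rng.normal(size=(30, 8, 8)),  # particles of structure B
    ])
    refs, labels = classify(images, n_classes=2)
    print(labels)
```

As in the text, the number of classes is chosen by the user; deciding which classes are "good" (here, which reference resembles the expected structure) remains a separate judgment.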
The quality of the two maps, however, differed significantly, as only the reconstruction computed after 3D classification showed features compatible with the resolution estimate [Fig. 5(B,C)]. This case study highlights that the gold-standard FSC measures the internal consistency between two halves of the data, 41,42 not resolution per se, and that the quality of the final map should always be consistent with any numerical estimate of resolution. Starting from 1,200,000 particle images, we used 2D and 3D classification to reduce the data set to 82,000 particles, which yielded the final 3D reconstruction at 4 Å resolution, showing well-resolved α-helices, β-strands and amino acid side chains for a large part of the map [Fig. 5(B,C)]. Obtaining an atomic model of the MHV S glycoprotein required a hybrid approach combining docking of available crystal structures, de novo modeling using Rosetta 23, 25, 43, 44 and Coot, 45, 46 and density-guided homology modeling using RosettaCM. 24 The C-terminal S2 subunit, which constitutes the fusion machinery, is the best-defined region of the density and was built using a combination of hand tracing in Coot and Rosetta de novo building. 25 Densities for large, bulky side chains, several disulfide bonds resolved in the map, and density putatively corresponding to glycans at several asparagine residues within N-glycosylation sequons served as internal controls during model building [Fig. 6(A,B)]. The density corresponding to the N-terminal receptor-binding S1 subunit is less well resolved than that of the fusion machinery and displays variable local resolution across the reconstruction. The availability of two crystal structures for domain A 12, 47 (including a structure of the MHV domain A) and of several crystal structures for domain B 10,11 was of tremendous assistance and allowed us to dock these models directly into the reconstruction.
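The gold-standard FSC check discussed above can be made concrete with a short sketch that correlates the Fourier coefficients of two half-maps within concentric shells and reports the resolution at the last shell before the curve first drops below 0.143. It assumes cubic maps with a known voxel size, uses nearest-integer shells, and neither interpolates the crossing nor applies the masking corrections used in practice; the function names are hypothetical. The usage example reproduces the caveat in the text: two identical half-maps give FSC = 1 in every shell, and hence a Nyquist-limited "resolution", even when the map contains nothing but noise.

```python
import numpy as np


def fsc(half1, half2):
    """Fourier shell correlation between two cubic half-maps.

    Returns (frequencies, curve): shell frequencies in cycles per voxel,
    from 0 up to Nyquist (0.5), and the FSC value in each shell.
    """
    n = half1.shape[0]
    f1, f2 = np.fft.fftn(half1), np.fft.fftn(half2)
    k = np.fft.fftfreq(n) * n  # Fourier-pixel coordinates along one axis
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int).ravel()
    n_shells = n // 2 + 1  # shells 0 .. Nyquist
    cross = np.bincount(shell, weights=(f1 * np.conj(f2)).real.ravel())
    power1 = np.bincount(shell, weights=(np.abs(f1) ** 2).ravel())
    power2 = np.bincount(shell, weights=(np.abs(f2) ** 2).ravel())
    curve = cross[:n_shells] / np.sqrt(power1[:n_shells] * power2[:n_shells])
    return np.arange(n_shells) / n, curve


def resolution_at(freqs, curve, voxel_size, threshold=0.143):
    """Resolution (same unit as voxel_size) at the last shell before the FSC
    first drops below `threshold`; the DC shell never counts as a resolution."""
    drops = np.nonzero(curve[1:] < threshold)[0]  # index j here is shell j + 1
    last_good = drops[0] if drops.size else len(curve) - 1
    if last_good == 0:
        return float("inf")
    return float(voxel_size / freqs[last_good])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.normal(size=(32, 32, 32))
    freqs, curve = fsc(noise, noise)  # identical "half-maps" of pure noise
    print(resolution_at(freqs, curve, voxel_size=1.0))  # 2.0, i.e. Nyquist
```

Because the FSC is normalized per shell, rescaling one half-map does not change the curve either; only a visual inspection of the map, as done for Fig. 5(B,C), reveals whether the reported number is meaningful.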
RosettaCM was then used to rebuild the core β-sheet of domain B and to derive a putative model (using density-guided homology modeling) for the disordered extension corresponding to the receptor-binding motifs in MERS-CoV and SARS-CoV. The quality of the map for domains C and D hampered manual sequence assignment in this region of the protein. Rosetta de novo 25 successfully identified a 30-residue fragment that anchored the sequence register for domains C and D. The placement of several bulky side chains accounted for by the density and the identification of putative N-linked glycans supported this assignment and allowed completion of the model [Fig. 6(C,D)]. The density for the linker connecting the S1 and S2 subunits is poorly resolved; Rosetta de novo was used to generate a putative model of this region, which should be interpreted cautiously, as indicated by its high B-factors. In addition to recent developments in direct detector technology, the determination of the first near-atomic resolution structure of a coronavirus spike glycoprotein trimer was made possible by (i) engineering a pre-fusion stabilized ectodomain construct, (ii) using extensive computational classification of particle images to sort out sample heterogeneity and (iii) relying on major advances in the Rosetta automated model building algorithm. Finally, our results allowed the identification of a conserved neutralizing epitope at the surface of the protein and suggested potential vaccinology strategies to elicit broadly neutralizing antibodies against coronaviruses. 22 This could pave the way toward the development of the first vaccine against human coronaviruses.
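The register-anchoring step described above for domains C and D relies on the fact that bulky side-chain densities can only be matched by certain positions in the sequence. The toy sketch below illustrates that principle only and is not how Rosetta de novo scores fragments: given a per-residue estimate of side-chain density along a traced 30-residue fragment, it slides the fragment along the sequence and keeps the offset whose expected side-chain size profile correlates best with the observed one. Side-chain heavy-atom counts serve as a crude, assumed proxy for expected density; the example sequence and all function names are hypothetical.

```python
import numpy as np

# Heavy atoms beyond C-alpha for each residue: a crude proxy (assumed for this
# sketch) for how much side-chain density a residue should contribute.
SIDE_CHAIN_ATOMS = {
    "G": 0, "A": 1, "S": 2, "C": 2, "P": 3, "T": 3, "V": 3,
    "D": 4, "N": 4, "I": 4, "L": 4, "M": 4, "E": 5, "Q": 5,
    "K": 5, "H": 6, "F": 7, "R": 7, "Y": 8, "W": 10,
}


def pearson(a, b):
    """Pearson correlation; returns -1.0 for a constant (uninformative) window."""
    a, b = a - a.mean(), b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else -1.0


def best_register(sequence, observed):
    """Slide a fragment's observed side-chain density profile along `sequence`
    and return (offset, correlation) for the best-matching register."""
    expected = np.array([SIDE_CHAIN_ATOMS[aa] for aa in sequence], dtype=float)
    observed = np.asarray(observed, dtype=float)
    scores = [
        pearson(expected[i:i + len(observed)], observed)
        for i in range(len(expected) - len(observed) + 1)
    ]
    best = int(np.argmax(scores))
    return best, scores[best]


if __name__ == "__main__":
    seq = "MKLLVLGLLFSSWAQTCNRYHDWEPGAAVSTIKMFQEERNLGWYPSDTCHVKQAFLIRGE"  # hypothetical
    rng = np.random.default_rng(0)
    true_profile = np.array([SIDE_CHAIN_ATOMS[aa] for aa in seq[10:40]], dtype=float)
    observed = true_profile + 0.1 * rng.normal(size=true_profile.size)  # noisy "density"
    print(best_register(seq, observed))
```

A sharp, unique correlation maximum is what makes such an assignment trustworthy; as in the text, independent checks (glycan densities, disulfides) are still needed to confirm it.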
References:
1. Coronaviruses: important emerging human pathogens
2. The spike protein of SARS-CoV: a target for vaccine and therapeutic development
3. The coronavirus spike protein is a class I virus fusion protein: structural and functional characterization of the fusion core complex
4. Host cell entry of Middle East respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein
5. Host cell proteases: critical determinants of coronavirus tropism and pathogenesis
6. Structural basis for coronavirus-mediated membrane fusion: crystal structure of mouse hepatitis virus spike protein fusion core
7. Central ions and lateral asparagine/glutamine zippers stabilize the post-fusion hairpin conformation of the SARS coronavirus spike glycoprotein
8. Structure of the fusion core and inhibition of fusion by a heptad repeat peptide derived from the S protein of Middle East respiratory syndrome coronavirus
9. Structure of a proteolytically resistant core from the severe acute respiratory syndrome coronavirus S2 fusion protein
10. Molecular basis of binding between novel human coronavirus MERS-CoV and its receptor CD26
11. Structure of SARS coronavirus spike receptor-binding domain complexed with receptor
12. Crystal structure of mouse coronavirus receptor-binding domain complexed with its murine receptor
13. Crystal structure of NL63 respiratory coronavirus receptor-binding domain complexed with its human receptor
14. Architecture of the SARS coronavirus prefusion spike
15. Conformational reorganization of the SARS coronavirus spike following receptor binding: implications for membrane fusion
16. Movies of ice-embedded particles enhance resolution in electron cryo-microscopy
17. Maximizing the potential of electron cryomicroscopy data collected using direct detectors
18. Electron counting and beam-induced motion correction enable near-atomic-resolution single-particle cryo-EM
19. Ribosome structures to near-atomic resolution from thirty thousand cryo-EM particles
20. Measuring the optimal exposure for single particle cryo-EM using a 2.6 Å reconstruction of rotavirus VP6
21. Beam-induced motion correction for sub-megadalton cryo-EM particles
22. Cryo-electron microscopy structure of a coronavirus spike glycoprotein trimer
23. Atomic-accuracy models from 4.5-Å cryo-electron microscopy data with density-guided iterative local refinement
24. High-resolution comparative modeling with RosettaCM
25. De novo protein structure determination from near-atomic-resolution cryo-EM maps
26. Viral membrane fusion
27. Structure of RSV fusion glycoprotein trimer bound to a prefusion-specific neutralizing antibody
28. Structure of respiratory syncytial virus fusion glycoprotein in the postfusion conformation reveals preservation of neutralizing epitopes
29. Structural basis for immunization with postfusion respiratory syncytial virus fusion F glycoprotein (RSV F) to elicit high neutralizing antibody titers
30. Coronavirus cell entry occurs through the endo-/lysosomal pathway in a proteolysis-dependent manner
31. Structure of the parainfluenza virus 5 F protein in its metastable, prefusion conformation
32. Crystal structure of GCN4-pIQI, a trimeric coiled coil with buried polar residues
33. Production and biophysical characterization of the CorA transporter from Methanosarcina mazei
34. Single-particle cryo-EM data acquisition by using direct electron detection camera
35. 2.8 Å resolution reconstruction of the Thermoplasma acidophilum 20S proteasome using cryo-electron microscopy
36. Likelihood-based classification of cryo-EM images using FREALIGN
37. Disentangling conformational states of macromolecules in 3D-EM through likelihood optimization
38. Sampling the conformational space of the catalytic subunit of human gamma-secretase
39. RELION: implementation of a Bayesian approach to cryo-EM structure determination
40. A Bayesian view on cryo-EM structure determination
41. Ambiguities in helical reconstruction
42. Resolution advances in cryo-EM enable application to drug discovery
43. Modeling symmetric macromolecular structures in Rosetta3
44. Cryo-EM model validation using independent map reconstructions
45. Features and development of Coot
46. Tools for macromolecular model building and refinement into electron cryo-microscopy reconstructions
47. Crystal structure of bovine coronavirus spike protein lectin domain