key: cord-0791799-gr60p3q0 authors: Semper, Cameron; Watanabe, Nobuhiko; Savchenko, Alexei title: Structural characterization of Nonstructural protein 1 from SARS-CoV-2 date: 2020-09-08 journal: bioRxiv DOI: 10.1101/2020.09.08.288191 sha: 4927ffe8bbe365e10f8f38baa815715ab94dfe8c doc_id: 791799 cord_uid: gr60p3q0

Severe acute respiratory syndrome (SARS) coronavirus-2 (SARS-CoV-2) is a single-stranded, enveloped RNA virus and the etiological agent of the current COVID-19 pandemic. Efficient replication of the virus relies on the activity of nonstructural protein 1 (Nsp1), a major virulence factor shown to facilitate suppression of host gene expression through promotion of host mRNA degradation and interaction with the 40S ribosomal subunit. Here, we report the crystal structure of the globular domain of SARS-CoV-2 Nsp1, encompassing residues 13 to 127, at a resolution of 1.65 Å. Our structure features a six-stranded, capped β-barrel motif similar to Nsp1 from SARS-CoV and reveals how variations in amino acid sequence manifest as distinct structural features. Through comparative analysis of structural homologues, we identified a topological signature associated with this protein fold that facilitated modeling of Nsp1 from MERS-CoV. Combining our high-resolution crystal structure with existing data on the C-terminus of Nsp1 from SARS-CoV-2, we propose a model of the full-length protein. Our results provide unparalleled insight into the molecular structure of a major pathogenic determinant of SARS-CoV-2.

In March of 2020, the World Health Organization (WHO) declared Coronavirus Disease 2019 (COVID-19) a global pandemic. As of September 2020, there have been more than 26,000,000 cases of infection reported globally and approximately 860,000 deaths attributed to COVID-19 [1].
The etiological agent of this pandemic has been identified as Severe Acute Respiratory Syndrome coronavirus-2 (SARS-CoV-2), a member of the Betacoronavirus genus and closely related to the SARS-CoV that caused the SARS outbreak of 2002-2004 [2, 3]. Coronaviruses infect a diverse array of vertebrates, with infection typically resulting in respiratory disease or gastroenteritis [4]. Their broad host range allows for a substantial reservoir for human infection and mutation facilitates cross-species transmission [5]. There is mounting evidence that SARS-CoV-2 originated via mutation and cross-species transmission of a pangolin coronavirus with which it shares ~97% sequence identity [6]. This highlights the dramatic impact that relatively few amino acid substitutions can have in coronaviruses and underscores the urgent need for characterization of these infectious agents at a molecular level.

SARS-CoV-2 is an enveloped, positive-sense RNA virus with a single-stranded genome approximately 30 kb in size. The genome is 5´-capped and 3´-polyadenylated, and the first ~2/3 of the genome encodes two overlapping reading frames that produce polyproteins 1a and 1ab [7]. Downstream of this region, the remaining 1/3 of the genome encodes structural proteins and a number of ORFs that produce accessory proteins largely of unknown function. Polyproteins 1a and 1ab are large polypeptides that are processed post-translationally by virally encoded proteases to produce non-structural proteins (nsp) 1-16 [8]. The genome also contains 5´ and 3´ untranslated regions (UTRs), the former of which plays a critical role in self-recognition that allows SARS-CoV-2 protein production to occur unabated while host gene expression is suppressed [9]. At the core of this mechanism is Nsp1, a 180 amino acid (AA) protein produced via processing of polyproteins 1a and 1ab by the papain-like protease domain of Nsp3 (Figure 1A) [10].

Nsp1 mediates a two-pronged approach to suppression of host gene expression. Firstly, it inhibits translation of host proteins during the initiation stage through interaction with the 40S ribosomal subunit. Secondly, Nsp1 promotes the degradation of host mRNA by endonucleolytic cleavage within the 5´ UTR, which in turn leads to accelerated Xrn1-mediated mRNA decay [11, 12]. Viral mRNAs are able to avoid the fate of host mRNAs through interaction between Nsp1 and the stem-loop 1 (SL1) motif found in the viral 5´ UTR [9]. Nsp1 from SARS-CoV has been shown to inhibit the Type I Interferon response in infected cells, allowing the virus to circumvent the innate immune response [13]. Expression of SARS-CoV Nsp1 has also been shown to induce the production of chemokines, suggesting this protein may play a role in the "cytokine storm", a maladaptive release of cytokines in response to infection, associated with a number of COVID-19 infections [14, 15]. Thus, Nsp1 has emerged as a major pathogenicity factor that plays a critical role in the coronavirus infection cycle [16]. Deletion or mutation of nsp1 results in viral attenuation in infection models and restores the innate immune response in infected cells [17]. Based on its central role in suppression of the host immune response and essentiality to infection, Nsp1 has been proposed as a therapeutic target. A prerequisite to any investigation into possible interventions targeting Nsp1 is high-resolution structural data that can facilitate robust in silico screening. A partial structure corresponding to the N-terminal fragment of Nsp1 from SARS-CoV was resolved by NMR spectroscopy and revealed a unique β-barrel motif.
The corresponding domain from SARS-CoV-2 shares 86% sequence identity with its SARS-CoV ortholog, which is a considerable level of sequence diversity compared to many other non-structural proteins encoded by the SARS-CoV-2 genome (e.g. Nsp12 shares 96% identity with its SARS-CoV ortholog [18]). This highlights the need for structural characterization of SARS-CoV-2 proteins.

Here, we report the high-resolution crystal structure of the globular N-terminal domain of Nsp1 from SARS-CoV-2 at 1.65 Å resolution. Our data reveal a high level of structural conservation between Nsp1 of SARS-CoV-2 and SARS-CoV, but also some unique structural features that likely contribute to increased stability of the β-barrel fold in SARS-CoV-2 Nsp1. Comparative analysis reveals additional structural homologues in Nsp1 proteins from Alphacoronaviruses, despite low levels of shared sequence identity. These results highlight the critical role this unique protein fold plays in facilitating viral infection and suppression of host gene expression.

In pursuit of structural characterization of Nsp1 from SARS-CoV-2 we expressed a codon optimised version of the ORF encoding this protein in E. coli. Using this expression system, we obtained purified full-length Nsp1 after a two-step purification protocol (see details in Material and Methods). However, we were unable to crystallize the full-length Nsp1 protein, likely due to the presence of flexibly disordered regions at the N- and C-termini.

Previous studies of Nsp1 from SARS-CoV reported the presence of a distinct globular domain comprised of residues 13 to 127 [19]. To structurally characterise the N-terminal fragment of Nsp1 from SARS-CoV-2 (Nsp1₁₃₋₁₂₇), we sub-cloned it and expressed it in E. coli. The expression level of Nsp1₁₃₋₁₂₇ was comparable to that of the full-length protein and with this fragment we obtained well-diffracting crystals in several conditions that enabled structural determination of this portion of Nsp1.

Figure 1 (caption, partial): [...] CoV-2 Nsp1₁₃₋₁₂₇ structure coloured from N-terminus (blue) to C-terminus (red). Secondary structure elements and the N- and C-termini are labeled. (D) Ribbon depiction of the structure, with side chains that contribute to the hydrophobic core of the β-barrel shown in stick representation. The three layers of side chains, plus the charged residues at the bottom, are coloured and the residues within the layers are labeled. (E) Electrostatic surface potential of the likely RNA-binding interface of SARS-CoV-2 Nsp1₁₃₋₁₂₇. R124, which is critical for the interaction with SL1 of the 5´ UTR, is labeled.

The crystal structure of SARS-CoV-2 Nsp1₁₃₋₁₂₇ was determined via molecular replacement using the SARS-CoV Nsp1 (PDB: 2HSX) solution structure as the search model and refined to a resolution of 1.65 Å (Table 1). The asymmetric unit contained a single molecule of Nsp1₁₃₋₁₂₇. Analysis of the Nsp1₁₃₋₁₂₇ structure by PDBePISA revealed the protein has a total solvent-exposed area of 6102 Å² [20].

As in the case of SARS-CoV, the structure of SARS-CoV-2 Nsp1₁₃₋₁₂₇ features a unique topological arrangement resulting in the formation of a six-stranded (n = 6) β-barrel that is primarily antiparallel, with the exception of strands β1 (Q15-V20) and β2 (C51-V54) (Figure 1B). Additional major structural features include the α1 helix (V35-D48), which is positioned as a cap along one opening of the β-barrel, two 3₁₀ helices that run parallel to each other, and the β5 strand (I95-Y97), which is not part of the β-barrel but forms a β-sheet interaction with the β4 strand (V84-L92) (Figure 1C).

The core of the β-barrel is highly hydrophobic and is mainly comprised of the side chains of thirteen amino acids organized into three layers. The first layer, which is adjacent to the α1 helix, is formed by the side chains of residues L16, L18, V69, L88, L107 and L123 (Figure 1D).
The opening of the β-barrel at this layer is obstructed by the α1 helix, which contributes the side chain of L46 to the centre of this first layer of core hydrophobic residues. The middle layer of the β-barrel is comprised of the side chains of residues V20, L53, I71, V86 and V121, while the bottom layer features residues V84 and L104. Adjacent to this bottom layer are four solvent-exposed residues, E55, R73, E102 and R119, whose side chains point inward towards the core of the β-barrel (Figure 1D). Consequently, the hydrophobic core is occluded at both ends of the β-barrel: by the α1 helix on one side and by the charged residues (E55, R73, E102, R119) on the other.

A distinctive feature of Nsp1 that is evident in the crystal structure of Nsp1₁₃₋₁₂₇ is the large number of flexible loops. For one of these loops (A76-H81), we were unable to resolve the structure due to a lack of interpretable electron density. This region has been previously shown to be highly flexible, as evidenced by the multitude of backbone conformations observed in the NMR structure of SARS-CoV Nsp1 [19]. The positions of other loops in the crystal structure were stabilized via interactions between specific secondary structure elements. Specifically, the two 3₁₀ helices are aligned in the crystal structure, forming H-bonds between R24 and Q63. This interaction appears to stabilize the position of two of the largest loops in the crystal structure, the β1-α1 (L21-S34) loop and the β2-β3 (E55-P67) loop.

Electrostatic surface potential analysis of the SARS-CoV-2 Nsp1₁₃₋₁₂₇ structure revealed several regions of charged residue colocalization [21]. The most electronegative and electropositive patches appear adjacent on the Nsp1₁₃₋₁₂₇ surface, separated by a small electroneutral region corresponding to the α1 helix. The electropositive patch continues around the surface of the protein, where the R124 residue is localized (Figure 1E).
This residue has been shown to be essential for the interaction between Nsp1 and SL1 of the 5´ UTR, implicating this charged surface in RNA binding [9].

The overall fold of SARS-CoV-2 Nsp1₁₃₋₁₂₇ was highly similar to that of the corresponding SARS-CoV Nsp1 fragment, in accordance with the significant primary sequence identity shared between the two proteins (Figure 2A). However, detailed comparative analysis revealed several notable differences between the two structures. The SARS-CoV-2 structure of Nsp1₁₃₋₁₂₇ features two 3₁₀ helices, compared to the single one found in SARS-CoV Nsp1. The primary sequence corresponding to the second 3₁₀ helix is completely conserved between SARS-CoV-2 and SARS-CoV Nsp1 proteins, suggesting this secondary structure element may be transient in [...]

Mapping the non-conserved residues onto the SARS-CoV-2 Nsp1₁₃₋₁₂₇ structure revealed that most of the differences are in solvent-exposed residues. Distinct patches of variation are observed immediately adjacent to both ends of β4 and in the loop between β3 and β4 (S74-H83), where 4 of 9 residues differ between SARS-CoV-2 and SARS-CoV (Figure 3A). The remaining sequence variation is broadly distributed across the Nsp1₁₃₋₁₂₇ structure. Residues that contribute to the hydrophobic core of the β-barrel are conserved between SARS-CoV-2 and SARS-CoV, suggesting this particular organization is critical to the function of Nsp1.

Figure 3 (caption, partial): [...] CoV-2) highlighted in red. Distinctive secondary structure features from the SARS-CoV-2 Nsp1₁₃₋₁₂₇ structure are labeled. (C) Homology model of the β-barrel domain of Nsp1 from MERS-CoV. Conserved residues between MERS-CoV and SARS-CoV-2 Nsp1 are highlighted in magenta and the N- and C-termini are labeled.

[...] levels of structural conservation, despite the low levels (<15%) of shared sequence identity, between these proteins highlights the plasticity of the unique protein fold found in coronaviral Nsp1 proteins (Figure 3B).
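
The sequence-identity figures quoted in this section (86% between the SARS-CoV-2 and SARS-CoV globular domains, <15% between the structural homologues) are ratios of matching positions to compared positions. As a minimal sketch of that arithmetic, assuming a pairwise alignment has already been computed with '-' marking gaps, the Python below counts identities over the non-gap columns. The function name `percent_identity` and the two short sequences are hypothetical and are not Nsp1 data; the alignment tool and the exact identity definition used for the reported values are not stated in the text, and other denominators (for example, the length of the shorter sequence) would give different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned (non-gap) positions to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment for illustration only (not real Nsp1 sequences).
seq_a = "MKV-LSTAG"
seq_b = "MRVQLS-AG"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

In the toy alignment, two of the nine columns contain a gap, so six matches are scored over seven compared columns.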

To further examine the plasticity of this fold, we analyzed the predicted secondary structure of Nsp1 proteins from other coronaviruses known to infect humans (MERS, OC43, HKU1, 229E, NL63). These Nsp1 proteins share minimal primary sequence identity with Nsp1 from SARS-CoV-2 and a BLASTP search failed to identify them as homologues (Table S1). Despite the low primary sequence identity, we were able to identify topological fingerprints, regions comprised of 6-7 β-strands and at least one α-helix, in each of the Nsp1 sequences that are indicative of an arrangement capable of forming the capped β-barrel motif observed in the SARS-CoV-2 Nsp1₁₃₋₁₂₇ crystal structure (Figure S1). We then modeled the region of MERS-CoV Nsp1 identified through this approach using our crystal structure as the threading template. This resulted in a high-scoring (C-score = 0.48) homology model featuring the capped β-barrel motif characteristic of the structure of SARS-CoV-2 Nsp1₁₃₋₁₂₇ (Figure 3C). Mapping the modest primary sequence conservation between MERS-CoV and SARS-CoV-2 onto the homology model revealed it to be distributed throughout the structure (Figure 3C). The β-barrel in the MERS-CoV Nsp1 homology model retains the hydrophobic core observed in SARS-CoV-2, highlighting a level of functional conservation between the two proteins that may not have been predicted through sequence-based comparisons alone. At the C-terminus of the model is the R146/K147 dipeptide (R124/K125 in SARS-CoV-2) that has been shown to be involved in [...]

Recent cryo-EM data of SARS-CoV-2 Nsp1 in complex with the 40S ribosome were able to resolve the structure of the C-terminal portion of Nsp1 at a resolution of 2.6 Å [25]. This study confirmed the presence of a C-terminal α-helix spanning residues S166 to G179 and revealed a second helix spanning residues Y154 to N160 that was not predicted bioinformatically.
The resulting helix-turn-helix motif interacts with the ribosome and anchors Nsp1 to the pre-initiation complex.

To place the crystal structure of Nsp1₁₃₋₁₂₇ in the context of the full-length protein, we analyzed the amino acid sequence using PSIPRED and DISOPRED3 [26, 27]. These tools predicted that the N-terminal 12 residues outside of our structure contain a disordered protein-binding region and that the C-terminal domain is largely unstructured, with the exception of a single predicted α-helix. The presence of large disordered regions adjacent to the globular domain of Nsp1 provides a strong rationale for why the full-length protein failed to crystallize.

Using our crystal structure data in combination with insights into the C-terminal domain of Nsp1, we generated a homology-based model of the full-length Nsp1 protein. This model maintains the capped β-barrel motif observed in our crystal structure and features a largely disordered linker between the end of the globular domain and the C-terminal helix-turn-helix. The C-terminal domain is shown as extending outward from the core of the structure at the bottom of the β-barrel, at the face opposite to where the α1 helix localizes; however, the disordered linker connecting this motif to the rest of the structure suggests that the position of this domain is likely to be highly dynamic and capable of multiple configurations (Figure 4). This predicted full-length structure aligns well with the partial density and model proposed by [...]

The portion of Orf1a encoding full-length Nsp1 was codon-optimized for E. coli expression, synthesized (Codex DNA) and cloned into the pMCSG53 expression vector at the SspI site via Gibson assembly. The fragment encompassing amino acid residues 13-127 of Nsp1 was PCR amplified and cloned into the same vector via ligation-independent cloning. Plasmids were transformed into the E. coli strain BL21(DE3)-Gold for protein expression.
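
The construct boundaries given above translate directly into sequence coordinates. The following Python sketch is only an illustration and is not part of the cloning workflow described here: it converts a 1-based, inclusive residue range into the fragment length and the matching nucleotide range within a coding sequence that starts at codon 1. The helper name `fragment_coordinates` and the `FragmentCoordinates` type are hypothetical; the 180-residue length and the 13-127 range come from the text, and the calculation assumes three nucleotides per codon in a contiguous ORF.

```python
from typing import NamedTuple


class FragmentCoordinates(NamedTuple):
    n_residues: int
    nt_start: int  # 1-based, inclusive, counted from the first base of codon 1
    nt_end: int    # 1-based, inclusive


def fragment_coordinates(first_res: int, last_res: int, protein_len: int) -> FragmentCoordinates:
    """Map a 1-based inclusive residue range onto nucleotide positions of its ORF."""
    if not 1 <= first_res <= last_res <= protein_len:
        raise ValueError("residue range must lie within the protein")
    n_residues = last_res - first_res + 1
    nt_start = 3 * (first_res - 1) + 1  # first base of the first codon in the range
    nt_end = 3 * last_res               # last base of the last codon in the range
    return FragmentCoordinates(n_residues, nt_start, nt_end)


# Residues 13-127 of the 180-residue Nsp1, as stated in the text.
coords = fragment_coordinates(13, 127, protein_len=180)
print(coords)
```

For residues 13-127 this gives a 115-residue fragment covering nucleotides 37-381 of the ORF (345 nt), before any primer overhangs required for ligation-independent cloning are added.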
The same procedure was used for purification of full-length Nsp1 and Nsp1₁₃₋₁₂₇. Cells were grown at 37°C and 200 rpm to an OD600 of 0.8, cooled to 20°C, then induced with 1 mM IPTG and incubated for 16 hours. Cells were harvested via centrifugation at 5000 x g, resuspended in binding buffer (300 mM NaCl, 50 mM HEPES pH 7.5, 5 mM imidazole, 5% glycerol) and lysed via sonication. Lysates were centrifuged at 20,000 x g for 45 minutes at 4°C and the supernatant was incubated with Ni-NTA resin and rotated for 1 hour at 4°C. Nsp1 and Nsp1₁₃₋₁₂₇ were eluted in elution buffer (300 mM NaCl, 50 mM HEPES pH 7.5, 250 mM imidazole, 5% glycerol) then incubated with Tobacco-etch virus (TEV) protease overnight to cleave the N-terminal polyhistidine tag while dialyzing to remove imidazole. The proteins were then passed over Ni-NTA resin a second time to remove impurities. Nsp1₁₃₋₁₂₇ was immediately dialyzed into precrystallization buffer (300 mM NaCl, 10 mM HEPES pH 7.5), while full-length Nsp1 was further purified via gel filtration using a Superdex75 column (300 mM NaCl, 10 mM HEPES pH 7.5).

Crystallization

Crystals of Nsp1₁₃₋₁₂₇ were grown at 298 K in 0.2 M sodium formate, 20% PEG3350 via the vapour diffusion sitting-drop method. Prior to data collection the crystals were soaked in 0.2 M sodium formate, 20% PEG3350, 30% glycerol and flash frozen in liquid nitrogen.

Data collection, structure determination and refinement

X-ray diffraction data of crystals of Nsp1₁₃₋₁₂₇ were collected at Advanced Photon Source Beamline 19-ID (Argonne, Illinois, USA) under cryo-stream at 93.15 K. Diffraction data were processed with the HKL-3000 suite [28]. Initial phases for Nsp1₁₃₋₁₂₇ were obtained by molecular replacement with Molrep using the NMR structure of SARS-CoV Nsp1 (PDB: 2HSX) as a search model [29]. Subsequently, the initial electron density map was improved through density modification by parrot and the model was built using buccaneer [29].
The final model was produced by cycles of manual model building and refinement using Phenix.refine and Coot [30, 31]. All geometry was verified using Phenix validation tools (Ramachandran statistics: favoured (97.1%), additionally allowed (2.9%), disallowed (0.0%)) and the wwPDB server.

Electrostatic potential surfaces were calculated in PyMOL using APBS [21]. Structure similarity searches of the Protein Data Bank were performed using the Dali server [22]. Secondary structure prediction of Nsp1 homologues was done using PSIPRED [26].

Homology modeling of MERS-CoV Nsp1 was performed by I-TASSER using the crystal structure of SARS-CoV-2 Nsp1₁₃₋₁₂₇ as the threading template [32, 33]. The model produced by I-TASSER was subjected to further energy minimization using the YASARA energy minimization server to improve its stereochemical properties [34]. Stereochemical properties of the final model were validated with a Ramachandran plot using the PROCHECK server, and 90% of residues are in the favoured/additionally allowed regions [35]. The model of the full-length SARS-CoV-2 Nsp1 was generated using the Robetta structural prediction server with the SARS-CoV-2 Nsp1₁₃₋₁₂₇ structure defined as the template [36].

The authors declare that they have no competing interests.

Novel Coronavirus Polymerase and Nucleotidyl-Transferase Structures: Potential to Target New Outbreaks
Novel beta-barrel fold in the nuclear magnetic resonance structure of the replicase nonstructural protein 1 from the severe acute respiratory syndrome coronavirus
Inference of macromolecular assemblies from crystalline state
Electrostatics of nanosystems: application to microtubules and the ribosome
Dali server update
A conserved region of nonstructural protein 1 from alphacoronaviruses inhibits host gene expression and is critical for viral virulence
MERS coronavirus nsp1 participates in an efficient propagation through a specific interaction with viral RNA
Structural basis for translational shutdown and immune evasion by the Nsp1 protein of SARS-CoV-2
The PSIPRED Protein Analysis Workbench: 20 years on
DISOPRED3: precise disordered region predictions with annotated protein-binding activity
HKL-3000: the integration of data reduction and structure solution - from diffraction images to an initial model in minutes
Overview of the CCP4 suite and current developments
PHENIX: a comprehensive Python-based system for macromolecular structure solution
Coot: model-building tools for molecular graphics
I-TASSER server: new development for protein structure and function predictions
COFACTOR: improved protein function prediction by combining structure, sequence and protein-protein interaction information
Improving physical realism, stereochemistry, and side-chain accuracy in homology modeling: Four approaches that performed well in CASP8
PROCHECK - a program to check the stereochemical quality of protein structures
High-resolution comparative modeling with RosettaCM. Structure

Acknowledgements

We thank Changsoo Chang and the Structural Biology Center Team at APS for data collection. The structure presented was solved as part of the Center for Structural Genomics of Infectious Diseases (CSGID). This project has been funded in whole or in part with U.S. Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of