key: cord-0839705-y87llpic authors: El‐Kamand, Serene; Du Plessis, Mar‐Dean; Breen, Natasha; Johnson, Lexie; Beard, Samuel; Kwan, Ann H.; Richard, Derek J.; Cubeddu, Liza; Gamsjaeger, Roland title: A distinct ssDNA/RNA binding interface in the Nsp9 protein from SARS‐CoV‐2 date: 2021-08-13 journal: Proteins DOI: 10.1002/prot.26205 sha: ea8c07e9c65c040bfcfedf159230503d93bc0a34 doc_id: 839705 cord_uid: y87llpic Severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) is a novel, highly infectious RNA virus that belongs to the coronavirus family. Replication of the viral genome is a fundamental step in the virus life cycle and SARS‐CoV‐2 non‐structural protein 9 (Nsp9) is shown to be essential for virus replication through its ability to bind RNA in the closely related SARS‐CoV‐1 strain. Two recent studies revealing the three‐dimensional structure of Nsp9 from SARS‐CoV‐2 have demonstrated a high degree of similarity between Nsp9 proteins within the coronavirus family. However, the binding affinity to RNA is very low which, until now, has prevented the determination of the structural details of this interaction. In this study, we have utilized nuclear magnetic resonance spectroscopy (NMR) in combination with surface biolayer interferometry (BLI) to reveal a distinct binding interface for both ssDNA and RNA that is different to the one proposed in the recently solved SARS‐CoV‐2 replication and transcription complex (RTC) structure. Based on these data, we have proposed a structural model of a Nsp9‐RNA complex, shedding light on the molecular details of these important interactions. SARS-CoV-2 have indicated that the protein forms a homo-dimer in solution with a weak affinity for single-stranded DNA (ssDNA) and RNA. 4, 6 Some similarities have been proposed to exist between viral Nsp9 proteins and members of the family of single-stranded DNA binding proteins (SSBs). 
4 The ability to recognize and sequester ssDNA renders SSBs essential in DNA replication, repair and in the maintenance of genomic stability. 7-9 Like SSB proteins, Nsp9 proteins are thought to bind to nucleic acids in a non-specific manner to ensure the correct processing of newly synthesized viral RNA. 5, 10 Understanding the complex RNA synthesis machinery of the virus is crucial in the development of therapeutic strategies against COVID-19 and other related coronaviruses. However, for some Nsp proteins, such as Nsp9, important molecular and structural details pertinent to function remain largely unknown. In this study, we have utilized nuclear magnetic resonance spectroscopy (NMR) in combination with biolayer interferometry (BLI) to map the RNA interaction interface of SARS-CoV-2 Nsp9. We have used our biophysical data to calculate structural models of Nsp9-RNA complexes revealing a set of binding residues (including one aromatic residue) that play an important role in the recognition of RNA. Importantly, the site of RNA binding to Nsp9, according to our structural model, differs from the proposed site in the recently published structure of the SARS-CoV-2 RTC. 5
2 | MATERIALS AND METHODS
Recombinant expression of 6xHis-Nsp9 from SARS-CoV-2 was induced in E. coli BL21(DE3) cells by the addition of 0.2 mM IPTG for 15 h at 20 °C. Isotopically labeled ¹⁵N-Nsp9 and ¹⁵N¹³C-Nsp9 were prepared in a bio-fermenter using the protocol described in Reference 11. Cells were lysed by sonication in lysis buffer (20 mM TRIS pH 8.0, 50 mM NaCl, 3 mM TCEP, 0.5 mM PMSF, 0.1% Triton X-100) or SEC lysis buffer (buffer used to determine molecular weight by size exclusion chromatography; 50 mM TRIS pH 8.0, 500 mM NaCl and 0.01% v/v Igepal CA630, supplemented with protease inhibitors).
Lysed cells were centrifuged, and the supernatant was subjected to Ni-NTA affinity chromatography followed by thrombin cleavage for 1 h at 25 °C, and a subsequent thrombin cleavage for 15 h at 4 °C. For size exclusion chromatography runs to determine molecular weight, no thrombin cleavage was carried out and the protein was eluted in SEC buffer (50 mM Tris-HCl pH 8.0, 150 mM NaCl and 0.01% v/v Igepal CA630) with 300 mM imidazole (see section below). For NMR and BLI experiments, the eluate from Ni-NTA affinity chromatography was subjected to size exclusion chromatography using a Superdex 75 column (120 ml) equilibrated with NMR buffer (25 mM sodium phosphate pH 6.0, 150 mM NaCl, 1 mM DTT). Fractions corresponding to a distinct peak in UV absorbance (280 nm) were collected and analyzed by SDS-PAGE. Protein concentrations were determined using the absorbance at 280 nm and the theoretical molar extinction coefficient for Nsp9 (12 950 M⁻¹ cm⁻¹).
Nsp9 was expressed and purified as described above at concentrations ~50 μM. The protein was then analyzed via size exclusion chromatography on a Superose 6 10/300 column (Cytiva) in SEC buffer at 0.3 ml/min (single peak at 18.11 ml retention volume). Standards were run on the same column under the same conditions (β-amylase β-AML, M = 200 kDa; alcohol dehydrogenase ADH, M = 150 kDa; carbonic anhydrase CA, M = 29 kDa; bovine serum albumin BSA, M = 66 kDa and cytochrome C, M = 12 kDa) and the elution volumes were used to calculate the molecular weight of Nsp9 (23 ± 6 kDa) using the equations described in Irvine et al. 12 The calculated molecular weight corresponds to a dimer in solution (predicted molecular weight of 28 kDa).
FIGURE 1 Coronavirus family Nsp9 sequence information. (A) Primary sequence of Nsp9 with structural features indicated on top (the α1 helix is part of the monomer-monomer interface). Residues in red boxes were successfully assigned in the HSQC spectra. (B) Sequence alignment of the Nsp9 proteins from SARS-CoV-2, SARS-CoV-1, MERS-CoV, porcine epidemic diarrhea virus (PEDV), porcine deltacoronavirus (PDCoV), transmissible gastroenteritis virus (TGEV), human coronavirus 229E (HCoV-229E), mouse hepatitis virus (MHV), and avian infectious bronchitis virus (IBV). Residues in bold exhibit significant chemical shift changes ("binding residues"), and residues in red boxes are involved in close contacts with RNA as revealed by molecular docking calculations; gray areas indicate high sequence conservation.
For each Nsp9 mutant protein and the wild type, a concentration series consisting of 4 to 5 solutions at concentrations between 90 and 2600 μM was prepared using NMR buffer. BLI steady-state analysis was carried out in which the proteins were bound to a 5′ biotinylated oligo(dU)23
The protein structure of one monomer of Nsp9 (residues 4-113) was taken from the recently solved crystal structure of the Nsp9 dimer (PDB 6WXD) 4 and used as input for HADDOCK, 14
A very recent study has already established that Nsp9 exists as a dimer in solution at concentrations as low as 1 μM with dissociation into monomers in the nM range. 17 After expression, we purified Nsp9
As expected, Nsp9 elutes at a volume that most closely corresponds to a dimer in solution with a calculated size of 23 ± 6 kDa.
To determine the molecular details of ssDNA and RNA binding, we used NMR spectroscopy to assign backbone amide resonances of Nsp9. A combination of 2D and 3D NMR experiments enabled us to assign around 70% of the 113 residues of the SARS-CoV-2 Nsp9 monomer (indicated in red boxes in Figure 1A and shown in Table S1). In agreement with the presence of a symmetric dimer in the crystal structure, 4 only one set of peaks was observed.
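The size exclusion calibration described in the Methods (protein standards of known mass run on the same column, then interpolation of the Nsp9 elution volume) can be illustrated with a minimal Python sketch. It assumes the common simplification that log10 of the molecular mass is linear in elution volume; the equations of Irvine et al. used in the paper may differ in detail, and the function names `fit_sec_calibration` and `estimate_mass_kda` are hypothetical. The elution volumes of the standards are not given in the text, so none are included here.

```python
import math


def fit_sec_calibration(standards):
    """Fit log10(mass) = intercept + slope * elution_volume by least squares.

    `standards` is a list of (mass_kda, elution_volume_ml) pairs.
    Assumption: a simple log-linear calibration, not necessarily the
    exact equations used in the paper.
    """
    volumes = [volume for _, volume in standards]
    log_masses = [math.log10(mass) for mass, _ in standards]
    n = len(standards)
    mean_v = sum(volumes) / n
    mean_m = sum(log_masses) / n
    # Ordinary least-squares slope and intercept.
    s_vv = sum((v - mean_v) ** 2 for v in volumes)
    s_vm = sum((v - mean_v) * (m - mean_m) for v, m in zip(volumes, log_masses))
    slope = s_vm / s_vv
    intercept = mean_m - slope * mean_v
    return intercept, slope


def estimate_mass_kda(elution_volume_ml, intercept, slope):
    """Convert an elution volume into an apparent mass using the fitted line."""
    return 10 ** (intercept + slope * elution_volume_ml)
```

With the measured elution volumes of the five standards as input, `estimate_mass_kda` would be evaluated at the Nsp9 retention volume (18.11 ml in the text) to obtain an apparent mass for comparison with the predicted dimer mass of 28 kDa.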
The relatively broad lines observed are also consistent with the dimeric state of the protein in solution, which is further supported by our size exclusion chromatography data (Figure 2). Apart from 10 peaks that exhibit substantial line broadening and could not be unambiguously assigned, the rest of the unassigned backbone signals were absent in the HSQC spectrum, most likely due to chemical exchange processes with the solvent. We used the obtained backbone resonance assignments to predict secondary structure elements of Nsp9 by TALOS-N 18 (Figure 3). Overall, the predictions are in very good agreement with the crystal structure 4 (shown on top of Figure 3). Next, to determine how ssDNA and RNA are recognized by Nsp9, we recorded HSQC spectra in the absence and presence of nucleic acids (Figure 4; for full spectra see Figure S1). RNA has been shown to bind Nsp9 extremely weakly; 4 we therefore utilized the highest concentrations of soluble protein and ssDNA/RNA that could be obtained (1.4-1.6 mM protein and 4.5-5.1 mM nucleic acid). To ensure that the length of the ssDNA/RNA is sufficient to recognize the entire Nsp9 dimer, 4 we utilized oligonucleotides comprising 23 bases (oligo(dT)23 and oligo(dU)23, respectively). As is evident from Figure 4, the observed chemical shift changes are very small (Figure 4A,B show HSQC spectra at binding saturation, that is, no additional chemical shift changes were observed upon further addition of nucleic acid). Calculation of weighted chemical shift changes 13 for Nsp9 upon binding to ssDNA/RNA revealed residues that undergo substantial changes in backbone structure (above the average of all observed chemical shift changes). These residues are either subject to distant conformational changes or, more likely, involved in nucleic acid binding (termed "binding residues", see Figure 4C,D; for a summary of all chemical shift changes see Table S1).
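The selection of "binding residues" described above (a weighted HN/N chemical shift change per residue, flagged when it exceeds the average over all residues) can be sketched in Python as follows. This is a minimal illustration, not the authors' script: the nitrogen scaling factor `n_weight=0.2` is an assumed, commonly used value (the exact weighting of Reference 13 is not stated in the text), and the function names are hypothetical.

```python
import math


def weighted_shift_change(delta_h_ppm, delta_n_ppm, n_weight=0.2):
    """Combine 1H and 15N shift differences into one weighted value (ppm).

    n_weight scales the 15N difference onto the 1H scale; 0.2 is an
    assumed value, not taken from the paper.
    """
    return math.sqrt(delta_h_ppm ** 2 + (n_weight * delta_n_ppm) ** 2)


def binding_residues(shifts_free, shifts_bound, n_weight=0.2):
    """Return (per-residue changes, residues whose change exceeds the mean).

    shifts_free / shifts_bound map a residue label to (HN ppm, N ppm).
    Residues missing from either spectrum (unassigned) are skipped.
    """
    changes = {}
    for residue, (h_free, n_free) in shifts_free.items():
        if residue not in shifts_bound:
            continue
        h_bound, n_bound = shifts_bound[residue]
        changes[residue] = weighted_shift_change(
            h_bound - h_free, n_bound - n_free, n_weight
        )
    if not changes:
        return changes, []
    mean_change = sum(changes.values()) / len(changes)
    flagged = sorted(r for r, value in changes.items() if value > mean_change)
    return changes, flagged
```

Using the mean as the cut-off mirrors the criterion stated in the text; other thresholds (for example mean plus one standard deviation) are also common and would flag fewer residues.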
Notably, the binding profiles of ssDNA and RNA are highly similar, indicating that the molecular details of the interaction of Nsp9 with single-stranded nucleic acids are conserved. Next, we mapped the RNA binding residues as determined in Figure 4D onto the X-ray crystal structure of the Nsp9 dimer (PDB 6WXD) 4 (Figure 5, colored in red, and Table S1). As seen in Figure 5, binding of RNA takes place on one side of the Nsp9 dimer (indicated as "binding interface" in Figure 5). The majority of binding residues are located on the external surface of the protein (at least 30% surface exposure), with the exception of a set of buried hydrophobic residues (V31, L42, A43, L45, F90, and V102). To visualize nucleic acid binding to Nsp9, we next used our NMR data as well as the existing crystal structure of SARS-CoV-2 Nsp9 4 to calculate structural models of Nsp9-RNA complexes (Figure 6). The symmetry of the Nsp9 dimer and the fact that we did not observe any splitting of individual peaks in the HSQC upon RNA binding confirmed that each Nsp9 monomer recognizes RNA in an identical manner. Thus, only one SARS-CoV-2 Nsp9 monomer from the dimeric crystal structure 4 was utilized in our structural modeling. Preliminary docking calculations using HADDOCK 14, 15 revealed that an RNA oligonucleotide nine bases in length (oligo(dU)9, constructed in silico using the program BABEL 16) is sufficient to recognize one Nsp9 protein molecule. We first performed HADDOCK runs with oligo(dU)9 RNA and all binding residues defined as active residues for the ambiguous interaction restraints (AIRs). Figure 6A depicts the lowest-energy structure from the lowest-energy cluster of 1000 calculated structures. Interactions between Nsp9 and the RNA occur mainly via the backbone of the nucleic acid, in agreement with the expectation that the binding is non-specific, ensuring that a variety of different viral RNA sequences can be processed.
Only two major electrostatic interactions were observed between positively charged R55 and K92 and the RNA backbone, which may explain the low binding affinity. Potential hydrogen bonds between two threonine residues (T35 and T64) and corresponding RNA bases were identified in most of the structural models; these residues are indicated as red boxes in
FIGURE 4 NMR analysis of Nsp9 protein in complex with ssDNA and RNA. Sections of the ¹⁵N-HSQC spectrum of Nsp9 in the absence (gray) and presence (1:1 mixture, red) of oligo(dT)23 (A) and oligo(dU)23 (B). Weighted backbone chemical shift changes 13 of HN and N atoms for Nsp9 upon binding to ssDNA (C) and RNA (D). Residues exhibiting changes larger than the average ("binding residues") are colored in red. Note the similarity in the binding profile between ssDNA and RNA.
FIGURE 5 NMR reveals distinct interaction surface. Nsp9 dimer structure (taken from PDB 6WXD) in surface and cartoon representation (colored in light blue). The RNA binding residues as determined in Figure 4C,D were mapped onto the deposited crystal structure (red). Note that a distinct binding interface exists on one side of the dimer structure.
Closer inspection of our NMR data (Figure 4) reveals that two aromatic residues present within the binding interface exhibit significant chemical shift changes and are sufficiently exposed to the solvent: F40 and Y66 (Figure 6B; also boxed in Figure 1B). To determine whether these aromatics could form π-π stacking interactions with RNA, as well as to confirm the contribution of the other binding residues, we made a series of Nsp9 alanine mutants (F40A, R55A, Y66A, K92A, and R55A/K92A) based on our NMR data and binding model (Figures 4 and 6). Dissociation constants for the binding of wild-type and mutant Nsp9 proteins to RNA (oligo(dU)23) were calculated using a steady-state analysis of BLI data (Figures 7 and S2).
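The steady-state analysis mentioned above can be sketched as a fit of the equilibrium BLI response against protein concentration. The following Python example assumes a simple 1:1 binding isotherm, R_eq = R_max · C / (K_D + C); the model actually used by the instrument software is not stated in the text, and the function names are hypothetical. For a fixed K_D the best R_max has a closed form, so only K_D is searched (golden-section search on log10 K_D, which assumes a single minimum in the search window).

```python
import math


def _rmax_and_sse(kd, concentrations, responses):
    """For a fixed K_D, return the least-squares R_max and the residual sum of squares."""
    fractions = [c / (kd + c) for c in concentrations]
    r_max = sum(r * f for r, f in zip(responses, fractions)) / sum(
        f * f for f in fractions
    )
    sse = sum((r - r_max * f) ** 2 for r, f in zip(responses, fractions))
    return r_max, sse


def fit_steady_state(concentrations, responses, iterations=200):
    """Fit R_eq = R_max * C / (K_D + C); returns (K_D, R_max).

    K_D is returned in the same units as the concentrations.
    Assumption: 1:1 binding and a unimodal error surface in log10(K_D).
    """
    low = math.log10(min(concentrations) / 100.0)
    high = math.log10(max(concentrations) * 100.0)
    golden = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        x1 = high - golden * (high - low)
        x2 = low + golden * (high - low)
        sse1 = _rmax_and_sse(10 ** x1, concentrations, responses)[1]
        sse2 = _rmax_and_sse(10 ** x2, concentrations, responses)[1]
        if sse1 < sse2:
            high = x2  # minimum lies in [low, x2]
        else:
            low = x1  # minimum lies in [x1, high]
    kd = 10 ** ((low + high) / 2.0)
    r_max, _ = _rmax_and_sse(kd, concentrations, responses)
    return kd, r_max
```

Such a fit is only meaningful when each sensorgram has reached a plateau; where no steady state is reached within the observation window, the equilibrium responses are undefined and K_D has to be estimated from the kinetic phases instead.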
All Nsp9 mutant proteins were assessed for correct folding using 1D NMR spectroscopy (Figure 7F). The F40A mutant exhibits a substantial decrease in binding affinity compared to the wild-type protein (Figure 7A,B,E), strongly suggesting that F40 forms stacking interactions with the RNA. In contrast, replacing either R55 or Y66 with alanine did not result in a significant change in the binding strength (Figure 7C,D,E). Notably, while binding of wild-type, F40A, R55A, and Y66A to RNA is instantaneous, interaction of both K92A and the double mutant R55A/K92A with the RNA is characterized by significantly longer association and dissociation phases, with no steady state reached within the observed time period (Figure S2). Consequently, the steady-state analysis could not be reliably performed for these two mutants; however, dissociation constants could be estimated from the kinetic binding data (KD > 450 μM and > 1000 μM for K92A and R55A/K92A, respectively). This indicates that K92 plays a major role in RNA recognition, as shown both by the increase in the dissociation constant and by the change in the interaction kinetics. In contrast, R55 may only make a minor contribution to RNA binding, as no significant effect on the RNA interaction of the single R55A mutant compared to the wild-type protein could be observed (Figure 7C). Based on our BLI data, we repeated our HADDOCK calculations in the presence of additional restraints between F40 and the RNA (AIRs and planar restraints). As seen in Figure 8A,B, the resulting structural model is very similar to the one shown in Figure 6A; however, additional base-stacking can now be observed between F40 and U8 (Figure 8C). Consistent with the significant but small chemical shift changes observed for F40, the overall structural change of the side chain of this residue upon binding is minor.
Overall, apart from F40, the binding interface is mostly made up of positively charged residues including K36, K58, and K92 (Figure 8C); however, refinement of the model resulted in significantly fewer contacts of R55 compared to K92, in good agreement with our mutational data (Figure 7).
It has been proposed that the structure of Nsp9 exhibits some topological similarities with OB (oligonucleotide/oligosaccharide binding) domain-containing proteins; 4 however, until now, the molecular details of RNA binding have not been resolved. In this study we present structural models of Nsp9-RNA complexes based on NMR chemical shift perturbation analysis and BLI data. Although the interaction is mostly electrostatic in nature, we identify one aromatic residue (F40) within the interaction surface that most likely plays a key role in the recognition of RNA via base-stacking mechanisms. A very recently published study used an RNA mimic (1,3-dimethyl-6H-pyrrolo[3,4-d]pyrimidine-2,4-dione; FR6) to probe nucleic acid binding by Nsp9. 19 Figure S3 depicts the crystal structure determined in that work (PDB 7KRI), revealing that FR6 engages Nsp9 via a tetrameric π-π stacking between F40 and the terminal bases of FR6, inducing the formation of a parallel Nsp9 trimer-of-dimers. These data confirm that F40 is indeed able to stack with RNA bases, providing further evidence of the validity of our model. π-π stacking mechanisms are very common in OB domain-containing proteins that recognize ssDNA or RNA, where up to four aromatic residues within one OB domain are utilized. 20 However, Nsp9 differs both in the location and composition of the RNA binding site (as determined in this study) as well as the binding affinity, suggesting that no significant structural similarities exist between the two protein families.
FIGURE 6 Structural models of Nsp9-RNA complex structures calculated using HADDOCK. (A) Surface and cartoon representation of an Nsp9-RNA complex model structure (containing one Nsp9 monomer) calculated using the binding residues (colored in red) as determined in Figure 4C,D in HADDOCK (Nsp9 is colored in light blue, RNA in dark blue). (B) Detailed view of the surface-exposed aromatic residues F40 and Y66 (stick representation; colored in green) that are located within the binding interface.
Although F40 is not conserved throughout the coronavirus family, other aromatics in close proximity to its position (Figure 1B Figure 1B). Several studies have shown that the binding strength depends on the oligomerisation state of Nsp9 (as mutants that disrupt dimerisation display lower affinities) as well as the length of the oligonucleotide used. 21-23 These data are consistent with our NMR experiments revealing that each protein monomer recognizes RNA in an identical manner. Furthermore, the interaction interface is located on one side of the Nsp9 dimer, allowing for one continuous strand of RNA to bind the Nsp9 dimer using the binding residues described by our structural model.
In this study, we have used NMR to obtain backbone chemical shift assignments of Nsp9 under physiological conditions. Using standard triple resonance approaches we have been able to assign the backbone chain atoms of 77 out of the 113 residues of Nsp9 (red boxed in Figure 1A and listed in Table S1) and have identified 27 out of 29 residues that exhibit significant chemical shift changes upon binding of ssDNA/RNA (Figure 4). During the course of our work, another two groups published near-complete solution-state backbone assignments of Nsp9, 17, 24 one of which was available in the BMRB database (50513), along with our data (50725).
Overall, our assignments are in good agreement with the published data with only minor differences observed, which could be explained by the different pH value used in our final NMR buffer (pH 6) compared to the other two studies (pH 7). Notably, one of the two studies (Buchko et al.) 17 revealed that Nsp9 exists predominantly as a dimer at protein concentrations >100 μM, in excellent agreement with our size exclusion data (Figure 2). However, although not observed at concentrations used in our NMR experiments, Buchko et al. suggested that dissociation into monomers could still play a functional role for these proteins, for example through binding to other proteins (such as Nsp12 in the context of RTC formation; see next section).
The Nsp9-RNA model in the context of the RTC structure
A very recent study has revealed the structural basis of assembly of the SARS-CoV-2 replication and transcription complex (RTC), including the Nsp9 protein. 5 Although an interaction of Nsp9 with Nsp8 has been proposed in the past, 6 no direct binding between these two proteins has been observed in the RTC structure. Indeed, Nsp9 appears to only interact with the RNA polymerase (RdRp, Nsp12) 5 (Figure 9A). This interaction is characterized by the insertion of the amino terminus of Nsp9 (one of the regions of very high sequence conservation; Figure 1B) into the NiRAN domain of Nsp12, affecting the catalytic activity of the polymerase. Nsp9 binds as a monomer, which can be explained by the fact that, apart from the amino terminus, residues that form part of the monomer-monomer interface in the dimeric crystal structure 4 play an important role in the binding to Nsp12. These data indicate that Nsp9 dimers, although being the dominant species in solution (apart from very low concentrations), are not required for RTC function.
FIGURE 8 Refined HADDOCK model of Nsp9-RNA complex. (A) Surface and cartoon representation of an Nsp9-RNA complex model structure calculated as in Figure 6 with the addition of restraints to achieve base-stacking between F40 and the corresponding RNA base (Nsp9 is colored in light blue, RNA in dark blue). (B) Same as in A with the protein (light gray) and the RNA (dark blue) shown as cartoon and stick representation, respectively. (C) Electrostatic potential (blue = positive, red = negative) of the RNA binding interface with positively charged binding residues indicated (top) as well as detailed view (as stick and cartoon representation) of the π-π stacking interaction between F40 and U8 (middle) and the location of the two electrostatic residues R55 and K92 (colored in green; bottom).
FIGURE 9 The RNA binding interface of Nsp9 within the replication and transcription complex (RTC). (A) Structure of the RTC 5 taken from PDB 7CYQ with Nsp9 colored in gray and Nsp7/8/12/13 proteins in salmon (cartoon representation). (B,C) Detailed view of the RNA interaction surface as proposed in Reference 5 (B) and as determined in this study (C). Nsp9 is shown as surface representation (light blue) and the RNA binding site is colored in red. Note that the proposed binding site (indicated by the arrow) differs from the one determined in our structural modeling.
Figure 9B (indicated by the arrow) have been assigned by NMR in our work and exhibit no significant chemical shift changes upon RNA binding (see also Table S1), further confirming that the recognition of RNA takes place on the opposite side, as proposed in our model (Figure 9C). The 3′ end of the RNA in our structural model points towards the flexible amino terminus of Nsp9 (consistent with the large chemical shift changes observed in the RNA titration experiments; Figure 4) and is in close proximity to two positively charged Nsp12 residues (R733 and R735) that may be able to direct RNA binding towards the polymerase. RNA processing within the RTC in viral replication is a complex process and the role of Nsp9 is not well understood.
Although it is proposed that Nsp9 stabilizes the 5′ end of the processed RNA, it is not clear how the RNA is directed to Nsp9, given the large distance (~110 Å) between the catalytic centre of Nsp12 NiRAN (RNA exit site) and the Nsp9 binding site. 5 Our study resolves some of the ambiguities around the exact location of RNA binding within Nsp9; however, further studies will be required to shed light on these important processes given the urgent need to develop new therapeutic strategies against COVID-19.
This research was funded by grants from Western Sydney University and facilitated by access to Sydney Analytical, a core research facility at the University of Sydney; in particular we thank Dr Mario Torrado Del Rey for assistance and advice with work related to protein expression and purification.
The peer review history for this article is available at https://publons.com/publon/10.1002/prot.26205.
The data that support the findings of this study are available from the corresponding author upon reasonable request.
https://orcid.org/0000-0003-1095-2569
1. Emerging coronaviruses: genome structure, replication, and pathogenesis
2. Author correction: a new coronavirus associated with human respiratory disease in China
3. A structural view of SARS-CoV-2 RNA replication machinery: RNA synthesis, proofreading and final capping
4. Crystal structure of the SARS-CoV-2 non-structural protein 9
5. Cryo-EM structure of an extended SARS-CoV-2 replication and transcription complex reveals an intermediate state in cap synthesis
6. The nsp9 replicase protein of SARS-coronavirus, structure and functional insights
7. Multiple human single-stranded DNA binding proteins function in genome maintenance: structural, biochemical and functional analysis
8. Human single-stranded DNA binding proteins are essential for maintaining genomic stability
9. A data-driven structural model of hSSB1 (NABP2/OBFC2B) self-oligomerization
10. The severe acute respiratory syndrome-coronavirus replicative protein nsp9 is a single-stranded RNA-binding subunit unique in the RNA virus world
11. An efficient and cost-effective isotope labeling protocol for proteins expressed in Escherichia coli
12. Determination of molecular size by size-exclusion chromatography (gel filtration)
13. Latent and active p53 are identical in conformation
14. HADDOCK versus HADDOCK: new features and performance of HADDOCK2.0 on the CAPRI targets
15. HADDOCK: a protein-protein docking approach based on biochemical or biophysical information
16. Open Babel: an open chemical toolbox
17. Backbone chemical shift assignments for the SARS-CoV-2 nonstructural protein Nsp9: intermediate (ms-μs) dynamics in the C-terminal helix at the dimer interface
18. Protein backbone and sidechain torsion angles predicted from NMR chemical shifts using artificial neural networks
19. Binding of a pyrimidine RNA base-mimic to SARS-CoV-2 non-structural protein 9
20. Single-stranded DNA-binding proteins: multiple domains for multiple functions
21. Dimerization of coronavirus nsp9 with diverse modes enhances its nucleic acid binding affinity
22. Structural basis for dimerization and RNA binding of avian infectious bronchitis virus nsp9
23. Severe acute respiratory syndrome coronavirus nsp9 dimerization is essential for efficient viral growth
24. Backbone chemical shift spectral assignments of SARS coronavirus-2 non-structural protein nsp9
How to cite this article: A distinct ssDNA/RNA binding interface in the Nsp9 protein from SARS-CoV-2