key: cord-0908624-t2trtlzi
authors: Gallo, Angelo; Tsika, Aikaterini C.; Fourkiotis, Nikolaos K.; Cantini, Francesca; Banci, Lucia; Sreeramulu, Sridhar; Schwalbe, Harald; Spyroulias, Georgios A.
title: (1)H,(13)C and (15)N chemical shift assignments of the SUD domains of SARS-CoV-2 non-structural protein 3c: “The SUD-M and SUD-C domains”
date: 2021-01-09
journal: Biomol NMR Assign
DOI: 10.1007/s12104-020-10000-9
sha: d9cb4955d6bf2f2d1ceb5df9e2ebac05e01fa0d1
doc_id: 908624
cord_uid: t2trtlzi

In SARS-CoV-2, nsP3c (non-structural protein 3c) spans the sequence of the so-called SARS Unique Domains (SUDs), first observed in SARS-CoV. Although the function of this viral protein is not fully elucidated, it is believed to be crucial for the formation of the viral replication/transcription complex (RTC) and for the interaction of various viral "components" with the host cell; thus, it is essential for the entire viral life cycle. The first two SUDs, the so-called SUD-N (the N-terminal domain) and SUD-M (the domain following SUD-N), exhibit topological and conformational features that resemble the nsP3b macro (or "X") domain. Indeed, they all fold into a three-layer α/β/α sandwich structure, as revealed by crystallographic investigation of the SARS-CoV SUDs, but they have been attributed a different substrate selectivity, as they selectively bind oligonucleotides. On the other hand, the C-terminal SUD (SUD-C) exhibits much lower sequence similarity than SUD-N and SUD-M, as reported in previous crystallographic and NMR studies of SARS-CoV. In the absence of 3D structures of the SARS-CoV-2 domains, we report herein the almost complete NMR backbone and side-chain resonance assignment (1H, 13C, 15N) of the SARS-CoV-2 SUD-M and SUD-C proteins, and the NMR chemical shift-based prediction of their secondary structure elements. These NMR data will set the basis for further understanding, at the atomic level, of the conformational dynamics of these proteins and will allow the effective screening of a large number of small molecules as binders with potential biological impact on their function.

A novel coronavirus (SARS-CoV-2) that causes Coronavirus Disease 2019 (COVID-19) emerged in a seafood and poultry market in the Chinese city of Wuhan in 2019 (Li et al. 2020). Cases have been detected in most countries worldwide, and on March 11, 2020, the World Health Organization characterized the outbreak as a pandemic. The other coronaviruses that have afflicted humankind to date are SARS-CoV (identified in 2003) and MERS-CoV (Middle East Respiratory Syndrome coronavirus; first reported in Saudi Arabia in 2012). Since these three viruses belong to the same family, they share significant similarities, including severe symptoms and high pathogenicity in humans. However, despite the structural similarities and other common features in pathogenicity, studies have identified some differences, inter alia in the spike protein (S), which is believed to be of immense importance for the increased efficiency of SARS-CoV-2 transmission and spread (Ou et al. 2020). Therefore, it is important to determine the differences in protein structure among these coronaviruses in an attempt to elucidate the mechanisms and the factors that determine the virulence of each individual virus.

Among the protein domains that are common in SARS viruses are the so-called SARS Unique Domains (SUDs), first identified in SARS-CoV. The polypeptide that includes these SUDs is part of the non-structural protein 3 (nsP3) and is named nsP3c. It consists of three separate domains: SUD-N, located at the N-terminus of the SUD; SUD-M (the middle domain); and SUD-C, the smallest of the three (Tan et al. 2007; Johnson et al. 2010; Serrano et al. 2009; Kusov et al. 2015; Burrell et al. 2017; Lei et al. 2018). These three SARS-CoV-2 domains share sequence identities of 68.57%, 81.6% and 73.44%, respectively, with the SARS-CoV SUD-N, SUD-M and SUD-C domains (Fig. 1).
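As an illustration of how such identity values are obtained from a pairwise alignment, the following is a minimal Python sketch. It assumes identity is counted over alignment columns in which neither sequence has a gap; the authors do not state which denominator they used, and the toy sequences are hypothetical, not the real SUD sequences. The residue counts in the last lines are back-calculated here from the reported percentages and the domain lengths given later in the text; they are not stated by the authors.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns without a gap character in either sequence.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Toy aligned fragments (hypothetical, for illustration only).
toy_a = "MKT-AYIAKQ"
toy_b = "MKTLAYLAKQ"
toy_identity = percent_identity(toy_a, toy_b)  # 8 matches over 9 ungapped columns

# Back-calculated consistency check (assumption, not from the source):
# 102 identical positions out of 125 and 47 out of 64 would reproduce
# the reported 81.6% (SUD-M) and 73.44% (SUD-C).
sud_m_identity = 100.0 * 102 / 125
sud_c_identity = 100.0 * 47 / 64
```

Other conventions (for example, dividing by the full alignment length or by the shorter sequence) give slightly different numbers, so the convention should be stated whenever identities are compared across studies.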

SUD-N and SUD-M exhibit a macro-like α/β/α sandwich fold consisting of ~120-140 amino acids. According to the literature, they have a greater affinity for oligonucleotides than for ADP-ribose (ADPr), they lack the capacity of macro domains to hydrolyze attached ADPr molecules, and they are potentially unable to de-MARylate substrates (Tan et al. 2009; Alhammad et al. 2020). Unlike these two domains, SUD-C is the shortest domain (~60-70 amino acids) and has a frataxin-like or double-wing motif α/β fold, consisting of five antiparallel β-strands packed against two α-helices (Johnson et al. 2010; Chatterjee et al. 2009; Tan et al. 2009). The proposed function of SUD-M and SUD-C is the binding of G-quadruplex-forming RNA (Hammond et al. 2017). It has also been reported that SUD-C from bat coronavirus has DNA- and metal ion-binding properties (Staup et al. 2019). Specifically, SUD-M, as a single domain, has been reported to bind (GGGA)2 and (GGGA)5 as well as (GGGA)2GG, while SUD-MC, as a double domain, binds only (GGGA)2GG but not (GGGA)2 or (GGGA)5, suggesting that SUD-C might play a role in tuning the binding selectivity of the SARS Unique Domain (Johnson et al. 2010). Moreover, in vivo experiments have shown SUD-M to be essential for the replication of the viral genome, in contrast to SUD-N and SUD-C, which are nonessential for virus genome replication (Kusov et al. 2015). One of the reported features of SUDs is the interaction with host proteins, such as RCHY1, an E3 ligase that regulates the function of the p53 protein, which might have an antiviral role (Ma-Lauer et al. 2016). These interactions might also be crucial for the acute symptoms experienced by individuals infected with the virus. In addition, a recent study demonstrated that SUD-MC interacts with specific cellular components, affecting pulmonary inflammation (Chang et al. 2020). Conformational dynamics and interaction properties of SUDs may be of great interest for the detailed functional characterization of the viral components and/or the discovery or identification of new lead compounds that bind to these proteins in the quest for new antiviral drugs.

Fig. 1 Sequence alignments of the SUD-M and SUD-C domains of SARS-CoV and SARS-CoV-2. Amino acid numbering is according to the sequence of the multi-domain non-structural protein 3c (nsP3c). The color coding is dark blue for conserved residues, light blue for conserved type of residues and white for non-conserved residues

We report herein the almost complete backbone and side-chain chemical shift assignments of the SARS-CoV-2 SUD-M and SUD-C domains (spanning residues 551-675 and 680-743, respectively, according to nsP3 numbering). These data can be exploited to elucidate, at the atomic level, the structure and dynamics of these domains and their interaction with a library of chemical compounds with potential antiviral properties.

The coding sequences of the SUD-M domain (residues 551-675 of nsP3) and the SUD-C domain (residues 680-743 of nsP3) were amplified using the primers (fwd: 5′ GAA TTC CAT ATG GGT ACC GTG AGC TGG AAC 3′ and rev: 5′ CCG CTC GAG TTA TTA GCT GCT GGT CAG 3′) and (fwd: 5′ CGC GGA TCC GAG GAA CAC TTC ATC G 3′ and rev: 5′ CCG CTC GAG TTA TTA GCT CAG CAG GG 3′), respectively. The primers were designed on a cDNA sequence encoding nsP3 residues 201-745 (GenBank entry MT066156.1, nucleotide numbering of the whole genome 3319-4954; GenBank entry QIA98553, orf1ab protein numbering 1019-1563). This sequence was codon-optimized for expression in Escherichia coli and synthesized by GenScript (Piscataway, NJ). The SARS-CoV-2 SUD-M coding sequence was cloned into the pET28a(+) expression vector, which contains an N-terminal His-tag followed by a thrombin cleavage site. The produced protein contained four artificial N-terminal residues (GSHM) preceding the native protein sequence. The SARS-CoV-2 SUD-C coding sequence was cloned into the pGEX4T-1 expression vector, which contains an N-terminal GST-tag followed by a thrombin cleavage site. The produced protein contained two artificial N-terminal residues (GS) preceding the native protein sequence.
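The primer sequences above can be inspected with a short script. The sketch below scans each primer for the recognition sequences of four common cloning enzymes and checks the reverse primers for stop codons. The enzyme names are an assumption introduced here: the text does not say which restriction enzymes were used, and the script only reports which recognition sequences happen to be present.

```python
# Recognition sequences of common cloning enzymes (standard values).
# Which enzymes the authors actually used is NOT stated in the text.
SITES = {"EcoRI": "GAATTC", "NdeI": "CATATG", "BamHI": "GGATCC", "XhoI": "CTCGAG"}

# Primer sequences copied from the text (5' to 3').
PRIMERS = {
    "SUD-M fwd": "GAA TTC CAT ATG GGT ACC GTG AGC TGG AAC",
    "SUD-M rev": "CCG CTC GAG TTA TTA GCT GCT GGT CAG",
    "SUD-C fwd": "CGC GGA TCC GAG GAA CAC TTC ATC G",
    "SUD-C rev": "CCG CTC GAG TTA TTA GCT CAG CAG GG",
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def clean(primer):
    """Remove the triplet spacing and normalise to upper case."""
    return primer.replace(" ", "").upper()


def find_sites(primer):
    """Map enzyme name -> 0-based position of its site in the primer."""
    seq = clean(primer)
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


def reverse_complement(primer):
    """Return the coding-strand sequence that a reverse primer anneals to."""
    return "".join(COMPLEMENT[base] for base in reversed(clean(primer)))


site_map = {name: find_sites(seq) for name, seq in PRIMERS.items()}
sud_m_rev_coding = reverse_complement(PRIMERS["SUD-M rev"])

for name, hits in site_map.items():
    print(name, hits)
```

Read on the coding strand, both reverse primers carry the sequence TAATAA, that is, two consecutive stop codons directly after the last native codons.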

SARS Unique Domain M (SUD-M) and SARS Unique Domain C (SUD-C) were expressed in 0.5 L of M9 medium (40 mM Na2HPO4, 22 mM KH2PO4, 8 mM NaCl) containing 0.5 g 15N-labeled NH4Cl, 2 g unlabeled or 13C-labeled D-glucose, 1 mL of a solution containing 0.5 mg/L biotin and 0.5 mg/L thiamin, 0.5 mL of 1 M MgSO4, 0.15 mL of 1 M CaCl2, 1 mL of solution Q (40 mM HCl, 50 mg/L FeCl2·4H2O, 184 mg/L CaCl2·2H2O, 64 mg/L H3BO3, 18 mg/L CoCl2·6H2O, 4 mg/L CuCl2·2H2O, 340 mg/L ZnCl2, 605 mg/L Na2MoO4·2H2O, 40 mg/L MnCl2·4H2O), and 1 mg/L of kanamycin (for SUD-M) or 1 mg/L of ampicillin (for SUD-C). The medium was inoculated with an LB preculture of BL21 (DE3) E. coli cells transformed with the corresponding plasmid, which had been grown overnight at 37 °C and 180 rpm. The culture was incubated at 37 °C and 180 rpm until the OD600 was between 0.6 and 0.8; IPTG was then added to a final concentration of 1 mM and the culture was incubated overnight (16 h) at 18 °C.
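For bench use, the M9 salt concentrations quoted above can be converted into weighed masses with mass = concentration × volume × molar mass. The sketch below does this for the 0.5 L culture. It assumes anhydrous salts (the text does not specify the hydration state) and, for the induction step, a hypothetical 1 M IPTG stock, which is not given in the text.

```python
# Molar masses in g/mol for the ANHYDROUS salts (assumption: the hydration
# state of the salts is not specified in the text).
MOLAR_MASS = {"Na2HPO4": 141.96, "KH2PO4": 136.09, "NaCl": 58.44}

# M9 salt concentrations from the text, in mM.
M9_SALTS_MM = {"Na2HPO4": 40.0, "KH2PO4": 22.0, "NaCl": 8.0}

CULTURE_VOLUME_L = 0.5  # culture volume from the text


def salt_mass_g(salt, conc_mm, volume_l):
    """Mass of salt (g) needed to reach conc_mm (mM) in volume_l (L)."""
    return conc_mm / 1000.0 * volume_l * MOLAR_MASS[salt]


def stock_volume_ml(final_mm, culture_volume_l, stock_m):
    """Volume of stock (mL) giving final_mm (mM); ignores the small dilution."""
    return final_mm / 1000.0 * culture_volume_l / stock_m * 1000.0


masses = {s: salt_mass_g(s, c, CULTURE_VOLUME_L) for s, c in M9_SALTS_MM.items()}
# Hypothetical 1 M IPTG stock; 1 mM final concentration is from the text.
iptg_ml = stock_volume_ml(1.0, CULTURE_VOLUME_L, 1.0)

for salt, grams in masses.items():
    print(f"{salt}: {grams:.2f} g per {CULTURE_VOLUME_L} L")
print(f"IPTG (1 M stock): {iptg_ml:.2f} mL")
```

If hydrated salts are used instead (for example Na2HPO4·2H2O), the molar masses in `MOLAR_MASS` must be replaced accordingly.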

Protein purification was performed according to standard protocols; details will be published elsewhere. The final NMR samples (concentration 0.9 mM for SUD-M and 0.7 mM for SUD-C) were prepared by adding 10% D2O and 0.25 mM DSS.

Protein NMR samples for the SUD-M and SUD-C domains were prepared in 500 μL of buffer at pH 7.2 containing 50 mM NaPi, 50 mM NaCl, 10% D2O, 2 mM DTT, 2 mM EDTA, 2 mM NaN3, bacterial inhibitor cocktail (Sigma Aldrich) and 0.25 mM DSS (4,4-dimethyl-4-silapentane-1-sulfonic acid) as internal 1H chemical shift standard. 13C and 15N chemical shifts were referenced indirectly to the 1H standard using a conversion factor derived from the ratio of NMR frequencies (Wishart et al. 1995). The protein concentration in the NMR sample was 0.9 mM for SUD-M and 0.7 mM for SUD-C. All NMR experiments were recorded at 298 K on a Bruker Avance III High-Definition four-channel 700 MHz NMR spectrometer equipped with a cryogenically cooled 5 mm 1H/13C/15N/D Z-gradient probe (TCI). The NMR experiments acquired for sequence-specific assignment are summarized in Table 1. We also performed selective CBCA(CO)NH experiments to help identify residues without CG and residues such as Ala, Cys and Ser (Table 1). All NMR data were processed with TOPSPIN 4.0.6 and analyzed with CARA 1.9.2a4 (Keller 2004).
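The indirect referencing step can be sketched as follows. The frequency ratios (Ξ) are the values recommended by Wishart et al. (1995) for 13C relative to DSS and for 15N relative to liquid NH3; the absolute 1H frequency of the DSS signal used below is a hypothetical number for a 700 MHz instrument, not a value from this work.

```python
# Frequency ratios Xi = nu_X(0 ppm) / nu_1H(DSS, 0 ppm), Wishart et al. 1995.
XI = {"13C": 0.251449530, "15N": 0.101329118}


def zero_ppm_frequency(dss_1h_mhz, nucleus):
    """Absolute frequency (MHz) corresponding to 0 ppm for the heteronucleus."""
    return dss_1h_mhz * XI[nucleus]


def to_ppm(freq_mhz, zero_mhz):
    """Chemical shift (ppm) of a signal at freq_mhz relative to zero_mhz."""
    return (freq_mhz - zero_mhz) / zero_mhz * 1e6


# Hypothetical absolute 1H frequency of the DSS methyl signal (MHz).
dss_1h = 700.130000

c13_zero = zero_ppm_frequency(dss_1h, "13C")
n15_zero = zero_ppm_frequency(dss_1h, "15N")

# A 13C signal lying 56 ppm above the 13C zero-ppm frequency.
example_shift = to_ppm(c13_zero * (1 + 56e-6), c13_zero)

print(f"13C 0 ppm: {c13_zero:.6f} MHz, 15N 0 ppm: {n15_zero:.6f} MHz")
```

Only the 1H spectrum needs an internal standard in this scheme; the 13C and 15N axes follow from the measured DSS frequency and the fixed ratios.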

The 2D 1H,15N-HSQC spectra show well-dispersed amide signals, as shown in Fig. 2 for SUD-M and in Fig. 3 for SUD-C. For nsP3c SUD-M we assigned 98.6% of ...

Secondary structure prediction for both SUD domains (M and C) was performed with TALOS+ (Shen et al. 2009), using the chemical shift assignments of six atoms (HN, Hα, Cα, Cβ, CO, N) for each residue in the sequence. The secondary structure elements of the SUD-M protein (125 a.a.) are organized in the following order from N- to C-terminus: β/α/β/α/β/β/α/β/α/α/β/α (Fig. 4). The order of the secondary structure segments is very similar to that of the nsP3b protein (Cantini et al. 2020) and to that of the SUD-N domain of nsP3c, apart from two extra β-strands and one extra α-helix. This domain also has high secondary structure identity with the SUD-M domain from SARS-CoV. The secondary structure elements of the SUD-C protein (64 a.a.) are organized in the following order from N- to C-terminus: α/β/β/β/β/α (Fig. 5). This domain has high secondary structure identity with the previously characterized SUD-C domain from SARS-CoV, and its secondary structure folding is very similar (Johnson et al. 2010).
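To compare the predicted topologies with other domains, the element orders quoted above can be tallied with a few lines of Python. This is only a bookkeeping sketch: the strings are transcribed from the text with "a" standing for an α-helix and "b" for a β-strand, and it does not reproduce the TALOS+ prediction itself.

```python
from collections import Counter

# Element order from N- to C-terminus, transcribed from the text
# ("a" = alpha-helix, "b" = beta-strand).
TOPOLOGY = {
    "SUD-M": "b/a/b/a/b/b/a/b/a/a/b/a",
    "SUD-C": "a/b/b/b/b/a",
}

# Domain boundaries in nsP3 numbering, from the text.
BOUNDARIES = {"SUD-M": (551, 675), "SUD-C": (680, 743)}


def count_elements(topology):
    """Count helices and strands in a slash-separated topology string."""
    return Counter(element.strip() for element in topology.split("/"))


def domain_length(first, last):
    """Number of residues in an inclusive residue range."""
    return last - first + 1


counts = {name: count_elements(t) for name, t in TOPOLOGY.items()}
lengths = {name: domain_length(*b) for name, b in BOUNDARIES.items()}

for name in TOPOLOGY:
    print(name, dict(counts[name]), f"{lengths[name]} residues")
```

The lengths computed from the domain boundaries (125 and 64 residues) agree with the sizes quoted for the two constructs.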

Chemical shift values for the 1H, 13C and 15N resonances of SARS-CoV-2 nsP3c SUD-M and SUD-C have been deposited at the BioMagResBank (https://www.bmrb.wisc.edu) under accession numbers 50516 and 50517, respectively.

The SARS-CoV-2 conserved macrodomain is a highly efficient ADP-ribosylhydrolase enzyme

Fenner and White's medical virology, 5th edn

1H, 13C, and 15N backbone chemical shift assignments of the apo and the ADP-ribose bound forms of the macrodomain of SARS-CoV-2 non-structural protein 3b

SARS Unique Domain (SUD) of severe acute respiratory syndrome coronavirus induces NLRP3 inflammasome-dependent CXCL10-mediated pulmonary inflammation

Nuclear magnetic resonance structure shows that the severe acute respiratory syndrome coronavirus-unique domain contains a macrodomain fold

SARS-unique fold in the Rousettus bat coronavirus HKU9

SARS coronavirus unique domain: three-domain molecular architecture in solution and RNA binding

The computer aided resonance assignment tutorial, 1st edn. Cantina Verlag

A G-quadruplex-binding macrodomain within the "SARS-unique domain" is essential for the activity of the SARS-coronavirus replication-transcription complex

NsP3 of coronaviruses: structures and functions of a large multi-domain protein

Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia

P53 down-regulates SARS coronavirus replication and is targeted by the SARS-unique domain and PLpro via E3 ubiquitin ligase RCHY1

Characterization of spike glycoprotein of SARS-CoV-2 on virus entry and its immune cross-reactivity with SARS-CoV

Nuclear magnetic resonance structure of the nucleic acid-binding domain of severe acute respiratory syndrome coronavirus nonstructural protein 3

TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts

Structure of the SARS-unique domain C from the bat coronavirus HKU4

The "SARS-unique domain" (SUD) of SARS coronavirus is an oligo(G)-binding protein

The SARS-unique domain (SUD) of SARS coronavirus contains two macrodomains that bind G-quadruplexes

1H, 13C and 15N chemical shift referencing in biomolecular NMR

Acknowledgements Work at BMRZ is supported by the state of Hesse. Work in COVID19-NMR was supported by the Goethe Corona Funds and the DFG in CRC902: "Molecular Principles of RNA-based regulation." Work at CERM is supported by the Italian Ministry for University and Research (FOE funding) to the Italian Center (CERM, University of Florence) of Instruct-ERIC, a European Research Infrastructure, ESFRI Landmark. This work was also supported by the INSPIRED (MIS 5002550) which is implemented under the Action 'Reinforcement of the Research and Innovation Infrastructure,' funded by the Operational Program 'Competitiveness, Entrepreneurship and Innovation' (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund). EU FP7 REGPOT CT-2011-285950-"SEE-DRUG" project is acknowledged for the purchase of UPAT's 700 MHz NMR equipment.

Conflict of interest The authors declare no conflict of interest.