key: cord-0924167-flo8skqc authors: Jia, Zhenghu; Liu, Chen; Chen, Yuewen; Jiang, Heng; Wang, Zijing; Yao, Jialu; Yang, Jie; Zhu, Jiaxing; Zhang, Boqing; Yuchi, Zhiguang title: Crystal structures of the SARS‐CoV‐2 nucleocapsid protein C‐terminal domain and development of nucleocapsid‐targeting nanobodies date: 2021-10-30 journal: FEBS J DOI: 10.1111/febs.16239 sha: e857710f2a7e803c870bac28d76387447bb121c6 doc_id: 924167 cord_uid: flo8skqc

The ongoing outbreak of COVID-19 caused by SARS-CoV-2 has resulted in a serious global public health threat. The nucleocapsid protein is a major structural protein of SARS-CoV-2 that plays important roles in viral RNA packaging, replication, assembly, and infection. Here, we report two crystal structures of the nucleocapsid protein C-terminal domain (CTD) at resolutions of 2.0 Å and 3.1 Å, respectively. These two structures, crystallized under different conditions, contain 2 and 12 CTDs in the asymmetric unit, respectively. Interestingly, despite different crystal packing, both structures show a similar dimeric form as the smallest unit, consistent with the solution form measured by size-exclusion chromatography, suggesting an important role of the CTD in the dimerization of nucleocapsid proteins. By analyzing the surface charge distribution, we identified a stretch of positively charged residues between Lys257 and Arg262 that are involved in RNA binding. By screening a single-domain antibody (sdAb) library, we identified four sdAbs that target different regions of the nucleocapsid protein with high affinities and have potential for use in viral detection and therapy.

COVID-19, an infectious disease caused by the severe acute respiratory syndrome coronavirus SARS-CoV-2, has infected more than 170 million people and caused 3.7 million deaths [1] [2] [3]. Because of the COVID-19 outbreak, the WHO declared a public health emergency of international concern.
Since SARS-CoV-2 is a newly emerged virus, there is no effective drug specifically targeting this type of virus. There is an urgent need to understand the fundamental biology of SARS-CoV-2 and to develop efficient detection and effective therapeutic methods accordingly. As a β-coronavirus (βCoV), SARS-CoV-2 shares four main structural proteins with other coronaviruses: the spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins [4, 5]. Among them, the N-protein is abundantly expressed during infection and is highly immunogenic [6]. The main role of the N-protein is to associate with the genomic RNA to form a ribonucleoprotein (RNP) complex, also called the capsid [7]. It also has roles in viral replication, assembly, and infection [8, 9]. In addition, through its double-stranded RNA binding activity, the N-protein functions as a viral RNA silencing suppressor (VSR) by counteracting host RNA-mediated antiviral responses [10]. Because of its high abundance, it can also induce strong postinfectious immune responses, which makes it a good target for diagnostic purposes and for vaccine development [11, 12].

The N-protein consists of two independently folded domains, the N-terminal domain (NTD) (residues 44-180) and the CTD (residues 255-362), connected by an intrinsically disordered linker (IDL) (residues 181-254). In addition, two disordered regions flank the NTD and CTD, called the N-arm (residues 1-43) and the C-tail (residues 363-419) [13] (Fig. 1A). It has been proposed that the NTD is responsible for RNA binding, that the CTD is involved in both RNA binding and oligomerization, and that the IDL regulates the RNA binding activity of the N-protein by affecting the interaction between the NTD and the CTD. The structures of the N-NTD and N-CTD from several coronaviruses have been solved [14] [15] [16] [17] [18].
However, because of the high flexibility of the disordered regions and the complicated oligomeric assembly, the structure of the full-length N-protein (FLN) remains unknown [19].

Antibodies targeting the key proteins of coronaviruses, such as SARS-CoV-1, MERS-CoV, and SARS-CoV-2, have proven useful for diagnosis and treatment [20] [21] [22] [23]. Compared to conventional antibodies, single-domain antibodies (sdAbs), which were initially discovered in camelids, generally confer increased affinity and specificity for the antigen [24]. Because they naturally lack a light chain, sdAbs contain only a single variable domain (VHH) rather than the two variable domains (VH and VL) that constitute the antigen-binding fragment (Fab) of traditional antibodies [24]. Interestingly, despite their smaller size, VHHs cloned and expressed alone have structural stability and antigen-binding activity comparable to or even higher than those of Fabs [25]. sdAbs also have several additional advantages. For example, they are less subject to the steric hindrance that may prevent the binding of larger conventional antibodies [26, 27], and they are easily constructed in multivalent forms with high thermal stability [28]. So far, a series of neutralizing sdAbs against the receptor-binding domain (RBD) of the SARS-CoV-1 and SARS-CoV-2 S proteins have been developed for prevention and therapy [29, 30].

Because the N-protein of SARS-CoV-2 is essential for viral RNP formation and genome replication, it has emerged as an important drug target. Blocking its RNA binding or dimerization has proven to be a good strategy for the development of antiviral drugs [16, 31, 32]. In addition, because of its high native abundance, the N-protein is also suitable for developing antibodies for rapid and accurate detection of the virus. Here, we report two crystal structures of the SARS-CoV-2 N-CTD at resolutions of 2.0 Å and 3.1 Å, respectively.
Our structures reveal the key residues involved in dimer formation and RNA binding. In addition, we developed a series of sdAbs targeting the N-protein of SARS-CoV-2 that have the potential to be used for virus detection and therapeutic purposes.

We screened a naive llama single-domain antibody library with a capacity of 10⁹ cfu·µg⁻¹. After three rounds of panning, several N-protein-specific sdAbs were enriched. Ninety-six phage plaques from the library were analyzed by ELISA, and 94 of them showed high absorbance values, indicating positive binding. After sequencing, 59 effective sdAb sequences were obtained. Based on the diversity of the amino acid sequences, six nonredundant sequences were finally identified (Fig. 1B). The six positive sdAbs were recombinantly expressed in the periplasm of Escherichia coli and purified to homogeneity using affinity and size-exclusion chromatography (Fig. 1C).

The full-length N-protein and four truncated versions, namely NTD + IDL + CTD (NLC), NTD + CTD (NC), NTD, and CTD, were also expressed in E. coli. The FLN was purified by a three-step protocol comprising affinity, ion exchange, and size-exclusion chromatography (SEC) steps, while the four truncated N-proteins were purified by a five-step protocol that included additional TEV cleavage and post-TEV affinity purification steps (Fig. 1D). To remove nucleic acids bound to the N-protein, nuclease was added after cell lysis. According to the SEC results, all the constructs containing the CTD form dimers in solution. In contrast, the NTD by itself is monomeric in solution (Table 1). This supports the conclusion that the CTD functions as a dimerization domain, as shown in our crystal structures.

We characterized the interactions between the sdAbs and the N-protein using isothermal titration calorimetry (ITC). We first tested their binding to the FLN. Four out of six sdAbs showed clear binding.
The Kd values of the positive sdAb-N2, sdAb-N3, sdAb-N5, and sdAb-N6 are 1.75 µM, 4.37 µM, 3.97 µM, and 3.53 µM, respectively (Fig. 2). Next, we tested the binding of these four sdAbs to the NLC protein. Only sdAb-N2 and sdAb-N3 showed positive results, with Kd values of 2.24 µM and 1.09 µM, respectively (Fig. 3), indicating that the binding of sdAb-N5 and sdAb-N6 requires the presence of the N-arm or C-tail of the N-protein. Subsequently, we tested the binding of sdAb-N2 to NC, NTD, and CTD separately. sdAb-N2 showed clear binding to NC and CTD but not to NTD (Fig. 4A-C). The binding affinity between sdAb-N2 and CTD (Kd = 2.38 µM) is similar to its affinities for the FLN (Kd = 1.75 µM), NLC (Kd = 2.24 µM), and NC (Kd = 1.77 µM), suggesting that the CTD itself forms the major binding site for sdAb-N2. We further analyzed the thermodynamic parameters of these binding events. All the interactions of sdAb-N2 are mainly entropy-driven and involve an unfavorable (endothermic) enthalpy change (Table 2). The N values for these interactions are close to 0.5, suggesting a 2:1 binding ratio between the N-protein and sdAb-N2. This is consistent with the ratio of the band intensities of the N-protein and sdAb-N2 shown by SDS/PAGE following SEC (Fig. 4D). In contrast, sdAb-N3 does not bind to either NTD or CTD alone (Fig. 5), suggesting that, unlike for sdAb-N2, the linker region contributes to the binding of sdAb-N3. The entropy-driven signature of sdAb-N2 suggests that the hydrophobic effect is the most prominent driving force for its binding. In contrast, the binding of the other three sdAbs is mainly enthalpy-driven with a reduction of entropy (Table 2), indicating a larger contribution from specific interactions such as H-bonds and electrostatic interactions.

We screened for crystals of the SARS-CoV-2 N-protein CTD in the absence and presence of a CTD-targeting sdAb (sdAb-N2) and solved their crystal structures individually (Table 3).
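As a rough illustration of how the ITC-derived dissociation constants reported above relate to binding free energy, the following Python sketch converts a Kd into ΔG via ΔG = RT ln(Kd) at 25°C (the temperature of the ITC experiments) and labels a binding event as entropy- or enthalpy-driven from ΔH and -TΔS. The function names and the simple classification rule are assumptions made for illustration, not the analysis performed by the ITC instrument software, and the ΔH value in the last lines is a hypothetical placeholder rather than a value from Table 2.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)

def delta_g_kj_per_mol(kd_molar, temp_k=298.15):
    """Binding free energy (kJ/mol) from Kd in mol/L, relative to a 1 M standard state."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R * temp_k * math.log(kd_molar) / 1000.0

def minus_t_delta_s(delta_g, delta_h):
    """Entropic term -T*dS (kJ/mol), rearranged from dG = dH - T*dS."""
    return delta_g - delta_h

def driving_force(delta_h, minus_tds):
    """Label the term that contributes more favorably (i.e. is more negative)."""
    return "entropy-driven" if minus_tds < delta_h else "enthalpy-driven"

# Kd values reported in the text for binding to the full-length N-protein
kd_values = {"sdAb-N2": 1.75e-6, "sdAb-N3": 4.37e-6,
             "sdAb-N5": 3.97e-6, "sdAb-N6": 3.53e-6}
for name, kd in kd_values.items():
    print(f"{name}: dG = {delta_g_kj_per_mol(kd):.1f} kJ/mol")

# Hypothetical endothermic case (dH > 0); NOT a value from Table 2
dg = delta_g_kj_per_mol(1.75e-6)
dh_example = 10.0
print(driving_force(dh_example, minus_t_delta_s(dg, dh_example)))
```

Under these assumptions, the low-micromolar Kd values correspond to ΔG of roughly -30 to -33 kJ·mol⁻¹, and any binding event with a positive ΔH must be carried by a favorable entropic term.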
Regardless of the presence of sdAb-N2, both structures contain only CTD; however, their crystal packings and space groups are very different. In the structure determined at 2.0 Å, there are two CTDs in the asymmetric unit (ASU), while in the structure determined at 3.1 Å, twelve CTDs are found in a single ASU (Fig. 6A). To examine the quaternary structure in solution, purified CTDs were subjected to size-exclusion chromatography. They elute as expected for a dimer with or without sdAb-N2 (Fig. 6B). Both structures show an interface with extensive interactions between two CTD monomers, implying a stable native dimeric structure (Fig. 6C). These interactions are mainly contributed by residues from two β-strands, including Ile320 and Met322 from β1 and Thr329, Trp330, Tyr333, Ile337, and Lys338 from β2, which together bury a total surface area of ~2500 Å². These residues are all conserved between the N-proteins of SARS-CoV-1 and SARS-CoV-2, but differ partially in MERS-CoV (Fig. 6D). Strands β1 and β2 form a β-hairpin motif, which is swapped between the two monomers to form extensive intersubunit interactions. In contrast, the other inter-CTD interfaces observed in the dodecameric structure are much smaller, burying only ~100-400 Å², suggesting that they are probably present only in protein crystals. Therefore, we focus our analysis on the dimeric units from both structures. The two dimeric structures are very similar to each other. The root mean square deviation (RMSD) between the CTD (dimer) and chains A and B of the CTD (dodecamer) is 0.4 Å for 212 Cα atoms. The RMSD values between the different dimeric units of the dodecameric CTD are comparable. Each monomer contains five α-helices, two β-strands, two 3₁₀-helices, and several connecting loops (Fig. 6A).
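For readers unfamiliar with the RMSD metric quoted above, the following minimal Python sketch shows the underlying arithmetic for two sets of paired atom positions. It assumes the coordinates have already been optimally superposed (in practice this is done by structure-analysis software before the RMSD is reported), and the toy coordinates are hypothetical, not taken from PDB entries 7F2B or 7F2E. The text does not state which tool or superposition settings were used, so this is an illustration of the formula only.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples, in the input units.

    Assumes the atoms are paired in order and the two sets are already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical toy coordinates (in Å): every paired atom is displaced by 0.4 Å
set_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
set_b = [(0.0, 0.4, 0.0), (3.8, 0.4, 0.0), (7.6, 0.4, 0.0)]
print(round(rmsd(set_a, set_b), 2))
```

For a real comparison, the input would be the 212 paired Cα positions after superposition, which is why an RMSD is always quoted together with the number of atoms it covers.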
The analysis of the surface charge distribution of the dimeric CTDs reveals a positively charged pocket, formed by a stretch of positively charged residues between Lys257 and Arg262 that is conserved among SARS-CoV-1, SARS-CoV-2, and MERS-CoV (Fig. 6E). It has been shown that mutations of these conserved positive residues can weaken RNA binding in SARS-CoV-1, SARS-CoV-2, and MERS-CoV [33] [34] [35] [36], suggesting a common, important role in RNA binding.

The COVID-19 pandemic has had a historic impact on global health and the economy. Antibodies targeting the key structural proteins of SARS-CoV-2 have proven effective in detecting and combating the virus. However, the precise selection of a proper epitope is crucial for the development of [...] observed in the dodecameric structure are much smaller than the dimer interface, which also supports the dimeric form being the minimal stable form of the CTD. The key residues involved in dimer formation were identified and found to be conserved among different coronaviruses. Based on the surface charge analysis, we propose that a positively charged surface area is important for RNA binding. Recently, crystal structures of the SARS-CoV-2 N-protein CTD have also been reported by other groups, and they show a similar structure [18, 36].

The N-protein is relatively conserved among different coronaviruses. The sequence identity of the N-proteins among SARS-CoV-1, SARS-CoV-2, and MERS-CoV is more than 50%. Thus, antibodies developed to target the SARS-CoV-2 N-protein could also bind the N-proteins of other coronaviruses with similarly high affinities. This is an advantage in therapeutics, because these antibodies could be used to treat multiple diseases, but a disadvantage in disease diagnosis because of the lack of specificity.
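To make the notion of sequence identity used above concrete, here is a minimal Python sketch that computes percent identity from two pre-aligned sequences. The function name, the choice to ignore gapped columns in the denominator, and the toy sequences are all assumptions for illustration; they are not real N-protein sequences, and the text does not specify which alignment tool or identity definition produced the figure of more than 50%.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned (non-gap) positions to compare")
    return 100.0 * matches / compared

# Hypothetical toy alignment, NOT real N-protein sequence
seq_a = "MKT-AYIAKQR"
seq_b = "MKTLAYLA-QR"
print(f"{percent_identity(seq_a, seq_b):.1f}%")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so identity values from different sources are only comparable when the definition is the same.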
In this work, we developed a series of sdAbs targeting [...]

Structure-based design is a useful strategy to improve the binding properties of antibodies. To obtain the structure of the complex between the N-protein and the sdAbs, we co-purified CTD with sdAb-N2, which eluted as a complex on SEC with a 2:1 binding ratio. However, the crystals eventually obtained from the complex contained only CTD. When we reloaded the complex on SEC after storage, it showed two peaks at the elution volumes of the dissociated individual components, reflecting the relatively low stability of the complex. When the affinity is not high enough, a mixture of CTD dimer and CTD dimer + N2 is present during crystallization. If the CTD dimer crystallizes more readily than the complex, its crystallization might shift the equilibrium, further dissociate the complex, and prevent the complex from crystallizing. It is also possible that the binding of N2 induces a conformational change in the CTD dimer, making the complexed form less favorable for crystal packing. Directed evolution in combination with structure-based design could further improve the affinity and specificity of our sdAbs, making them practical candidates for antiviral therapy and diagnosis.

A naive phage display sdAb library with a capacity of 10⁹ cfu·µg⁻¹ was used to screen the sdAbs. After three rounds of biopanning, sdAbs targeting the SARS-CoV-2 N-protein were enriched. For each round of biopanning, the antigen was coated on immune tubes, with tubes coated with 5% nonfat milk serving as a control, after which the phage library was added. The phages were incubated with the N-protein for 1 h and washed with PBST buffer (PBS with 0.05% Tween 20). The bound phages were eluted by digestion with 1 mL of 0.25 mg·mL⁻¹ trypsin and amplified in E. coli SS320 cells cultured in 2×YT media. Ninety-six individual clones from the third round of panning were picked for ELISA verification. The positive clones were sequenced.
The cloning, expression, and purification of sdAbs

sdAbs were cloned into the pET22b vector, which contains a C-terminal His-tag and an N-terminal pelB signal peptide. E. coli BL21 (DE3) cells (NEB) were used to express the protein. The cells were grown at 37°C in 2×YT media supplemented with 100 µg·mL⁻¹ ampicillin and induced with 0.2 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) when the OD600 reached 0.6-0.8. The cells were grown at 25°C for another 16 h and harvested by centrifugation (8000 g for 10 min at 4°C). The cells were resuspended in a hypertonic solution (30 mM Tris, 20% w/v sucrose, 1 mM EDTA, pH 8.8) and incubated for 20 min at 4°C. After centrifugation at 12 000 g at 4°C for 30 min, the pellets were resuspended in a hypotonic solution (5 mM MgSO₄), kept on ice for 20 min, and then centrifuged at 12 000 g at 4°C for 20 min. The supernatant was loaded onto a 10 mL HisTrap HP column (GE Healthcare) pre-equilibrated with buffer A (10 mM HEPES, pH 7.4, 250 mM KCl). The protein was eluted using buffer A supplemented with 150 mM imidazole and concentrated using Amicon concentrators (3K MWCO; Millipore, Darmstadt, Germany). The concentrated protein was injected onto a Superdex 200 16/600 gel filtration column (GE Healthcare) and eluted with buffer A [37].

The cloning, expression, and purification of the full-length and truncated N-proteins

The FLN was cloned into the pET28a vector, which contains an N-terminal His-tag, a T7 tag, and a thrombin cleavage site. The four truncated N-protein versions were cloned into the pET28HMT vector, which contains an N-terminal His-tag, an MBP-tag, and a TEV cleavage site. Plasmids were transformed into E. coli BL21 (DE3) cells (NEB) for expression. The growth conditions were the same as those for the sdAbs, except that the induction temperature was 30°C. The cells were lysed by sonication in a lysis buffer (10 mM HEPES, [...] was injected onto a Superdex 200 16/600 gel filtration column (GE Healthcare) and eluted with buffer A.
For the truncated versions, the protein eluted from the HisTrap HP column was first digested with TEV protease overnight and then further purified on an amylose resin column (New England Biolabs, Ipswich, MA, USA) and a TALON column (GE Healthcare) to remove the fusion tags. The protein in the TALON flow-through was collected and purified on an SP Sepharose high efficiency column (GE Healthcare) using the same protocol as for the FLN. Finally, the protein was purified on a Superdex 200 16/600 gel filtration column (GE Healthcare). The protein samples were concentrated to 10 mg·mL⁻¹ before being stored at -80°C. An analytical gel filtration column (Superdex 75 3.2/300) was used to determine the solution form of the CTD-N2 complex.

The purified sdAbs and N-proteins were dialyzed against a buffer containing 10 mM HEPES, pH 7.4, and 150 mM KCl at 4°C overnight. Titrations consisted of 20 injections of 2 µL of sdAb into the cell solution containing N-protein at a 10-fold lower concentration. Typical concentrations of the titrant were between 100 and 400 µM, depending on the affinity. The reference cell was filled with water. Experiments were performed at 25°C with a stirring speed of 750 rpm on a PEAQ-ITC instrument (Malvern, Worcestershire, UK).

Crystallization, data collection, and structure determination

Protein crystals were grown at 18°C using the hanging-drop method. The crystals of the CTD dodecamer (35 mg·mL⁻¹) were grown in 0.1 M phosphate citrate, pH 4.5, and 40% PEG 300. Diffraction data were collected on beamline BL18U1 at the Shanghai Synchrotron Radiation Facility (SSRF) [38] to a resolution of 3.1 Å. The crystals of the CTD dimer (15 mg·mL⁻¹) were set up in the presence of sdAb-N2 and grown in 0.1 M potassium phosphate, pH 6.2, 0.2 M sodium chloride, and 52% PEG 200. Diffraction data from a single crystal were collected on an in-house X-ray diffractometer (Rigaku MicroMax-007 HF) to a resolution of 2.0 Å. The datasets were indexed, integrated, and scaled using HKL [39].
Molecular replacement was performed using PHENIX [40, 41]. The model was built in COOT and refined with PHENIX [40]. UCSF Chimera was used to conduct all the structural analyses and to generate the structural figures [42].

[1] Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
[2] A pneumonia outbreak associated with a new coronavirus of probable bat origin
[3] A Novel Coronavirus from Patients with Pneumonia in China
[4] Coronavirus genome structure and replication
[5] Coronaviruses: an overview of their replication and pathogenesis
[6] Sensitivity in detection of antibodies to nucleocapsid and spike proteins of severe acute respiratory syndrome coronavirus 2 in patients with coronavirus disease 2019
[7] The coronavirus nucleocapsid is a multifunctional protein
[8] Coronavirus nucleocapsid protein facilitates template switching and is required for efficient transcription
[9] Nucleocapsid protein recruitment to replication-transcription complexes plays a crucial role in coronaviral life cycle
[10] SARS-CoV-2-encoded nucleocapsid protein acts as a viral suppressor of RNA interference in cells
[11] Antibody detection and dynamic characteristics in patients with coronavirus disease 2019
[12] Detection of SARS-CoV-2-specific humoral and cellular immunity in COVID-19 convalescent individuals
[13] The SARS coronavirus nucleocapsid protein - forms and functions
[14] Crystal structure of the severe acute respiratory syndrome (SARS) coronavirus nucleocapsid protein dimerization domain reveals evolutionary linkage between corona- and arteriviridae
[15] Ribonucleocapsid formation of severe acute respiratory syndrome coronavirus through molecular action of the N-terminal domain of N protein
[16] Structural basis for the identification of the N-terminal domain of coronavirus nucleocapsid protein as an antiviral target
[17] Structural characterization of the N-terminal part of the MERS-CoV nucleocapsid by X-ray diffraction and small-angle X-ray scattering
[18] Structures of the SARS-CoV-2 nucleocapsid and their perspectives for drug design
[19] Coronavirus nucleocapsid proteins assemble constitutively in high molecular oligomers
[20] Structural basis of neutralization by a human anti-severe acute respiratory syndrome spike protein antibody, 80R
[21] Unexpected Receptor Functional Mimicry Elucidates Activation of Coronavirus Fusion
[22] Evaluation of candidate vaccine approaches for MERS-CoV
[23] Importance of neutralizing monoclonal antibodies targeting multiple antigenic sites on the middle east respiratory syndrome coronavirus spike glycoprotein to avoid neutralization escape
[24] Bendahman N & Hamers R (1993) Naturally occurring antibodies devoid of light chains
[25] Single-domain antibodies and their formatting to combat viral infections
[26] Dual beneficial effect of interloop disulfide bond for single domain antibody fragments
[27] Fusion of hIgG1-Fc to 111In-anti-amyloid single domain antibody fragment VHH-pa2H prolongs blood residential time in APP/PS1 mice but does not increase brain uptake
[28] Llama antibody fragments with cross-subtype human immunodeficiency virus type 1 (HIV-1)-neutralizing properties and high affinity for HIV-1 gp120
[29] Structural basis for potent neutralization of betacoronaviruses by single-domain camelid antibodies
[30] Neutralizing nanobodies bind SARS-CoV-2 spike RBD and block interaction with ACE2
[31] Structure-based virtual screening and experimental validation of the discovery of inhibitors targeted towards the human coronavirus nucleocapsid protein
[32] Viral and host factors related to the clinical outcome of COVID-19
[33] Structure of the SARS coronavirus nucleocapsid protein RNA-binding dimerization domain suggests a mechanism for helical packaging of viral RNA
[34] Nucleocapsid protein-dependent assembly of the RNA packaging signal of Middle East respiratory syndrome coronavirus
[35] Solution structure of the c-terminal dimerization domain of SARS coronavirus nucleocapsid protein solved by the SAIL-NMR method
[36] Structural insight into the SARS-CoV-2 nucleocapsid protein C-terminal domain reveals a novel recognition mechanism for viral transcriptional regulatory sequences
[37] A sensory appendage protein protects malaria vectors from pyrethroids
[38] The macromolecular crystallography beamline of SSRF
[39] HKL-3000: the integration of data reduction and structure solution - from diffraction images to an initial model in minutes
[40] PHENIX: a comprehensive Python-based system for macromolecular structure solution
[41] Coot: model-building tools for molecular graphics
[42] Chimera - a visualization system for exploratory research and analysis

We thank J. Xu from the Instrument Analytical Center of the School of Pharmaceutical Science and Technology at Tianjin University for assistance with the in-house X-ray diffraction machine, and the staff at beamline BL18U1 of the Shanghai Synchrotron Radiation Facility. Funding for this research was provided by the National Natural Science Foundation of China (no. 32022073 and 31972287 to Z.Y.) and the Natural Science Foundation of Tianjin (no. 19JCYBJC24500 to Z.Y.).

The authors declare no conflict of interest.

ZH. J designed the methodology. CL, YW. C, and HJ performed the experiments. ZY supervised the project and wrote the manuscript. All authors read and approved the final manuscript.

The atomic coordinates and structure factors for the CTD dimer (PDB ID 7F2B) and the DBM CTD dodecamer (PDB ID 7F2E) have been deposited in the RCSB Protein Data Bank.