key: cord-0982884-nh340ang authors: Dang, Mei; Li, Yifan; Song, Jianxing title: ATP biphasically modulates LLPS of SARS-CoV-2 nucleocapsid protein and specifically binds its RNA-binding domain date: 2021-01-14 journal: Biochem Biophys Res Commun DOI: 10.1016/j.bbrc.2021.01.018 sha: ce822b7614369ccd2d296903784740f734b615ce doc_id: 982884 cord_uid: nh340ang
SARS-CoV-2 is a highly contagious coronavirus causing the ongoing pandemic. Very recently its genomic RNA of ∼30 kb was found to be packaged with nucleocapsid (N) protein into phase-separated condensates. Interestingly, viruses have no ability to generate ATP, but host cells have very high ATP concentrations of 2–12 mM. A key question thus arises whether ATP modulates liquid-liquid phase separation (LLPS) of the N protein. Here we discovered that ATP not only biphasically modulates LLPS of the viral N protein, as we previously found for human FUS and TDP-43, but also dissolves the droplets induced by oligonucleic acid. Residue-specific NMR characterization showed that ATP specifically binds the RNA-binding domain (RBD) of the N protein with an average Kd of 3.3 ± 0.4 mM. The ATP-RBD complex structure was constructed using NMR-derived constraints, in which ATP occupies a pocket within the positively charged surface utilized for binding nucleic acids. Our study suggests that ATP appears to be exploited by SARS-CoV-2 to promote its life cycle by facilitating the uncoating, localizing and packing of its genomic RNA. Therefore the interactions of ATP with the viral RNA and N protein might represent promising targets for the design of drugs and vaccines to terminate the pandemic.
Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the etiologic agent of the ongoing pandemic [1], is a highly contagious beta-coronavirus belonging to a large family of positive-stranded RNA coronaviruses, with a ∼30 kb genomic RNA packaged into a membrane-enveloped virion.
SARS-CoV-2 has four structural proteins: the spike (S) protein that recognizes the cell receptor angiotensin-converting enzyme 2 (ACE2), the nucleocapsid (N) protein responsible for packing viral genomic RNA, and the membrane-associated envelope (E) and membrane (M) proteins [2]. Its N protein is a 419-residue RNA-binding protein composed of two well-folded domains, namely the N-terminal domain (NTD) over residues 44-173 and the C-terminal domain (CTD) over residues 248-365 (Fig. 1A), as well as three intrinsically disordered regions (IDRs) over residues 1-43, 174-247 and 366-419, respectively, with low-complexity sequences (Fig. 1B). The N protein plays multifunctional roles in the coronavirus life cycle, which include assembling genomic RNA into the viral RNA-protein complex and localizing it to replicase-transcriptase complexes. Previous studies revealed that its NTD is an RNA-binding domain (RBD) while the CTD functions to dimerize/oligomerize to form high-order structures [3] [4] [5] [6] [7]. Very recently, it was shown that the N protein functions through liquid-liquid phase separation (LLPS), which could be significantly induced by dynamic and multivalent interactions with various nucleic acids [5] [6] [7]. In this context, the N-terminal RBD was proposed to bind specific sites of the genomic RNA to initiate the assembly into RNA-N-protein condensates. Strikingly, the NMR structure of the SARS-CoV-2 N protein has now been reported both in the free state and in complex with short RNA fragments [8].
ATP (adenosine triphosphate) is best known as the universal energy currency for all living cells. Mysteriously, the cellular concentrations of ATP are much higher than those required for its classic functions, ranging from 2 to 12 mM depending on the cell type [9, 10]. Only recently was it found that at concentrations > 5 mM, ATP functions as a biological hydrotrope to dissolve LLPS of RNA-binding proteins with prion-like domains such as FUS [10].
Our NMR studies further identified that ATP can biphasically modulate LLPS of the intrinsically disordered domains (IDDs) of FUS and TDP-43 by specifically binding to the Arg/Lys residues within the IDDs: induction at low ATP concentrations but dissolution at high concentrations [11, 12]. Moreover, we also found that ATP is capable of specifically binding the RNA-recognition motif (RRM) domains of FUS and TDP-43, as well as the SYNCRIP acidic domain (AcD), a non-canonical RNA-binding domain with an all-helical fold [13] [14] [15].
Intriguingly, viruses have no ability to generate ATP [16] and therefore, upon release into the infected cell, the viral RNA-N-protein condensate of SARS-CoV-2 is anticipated to experience a sudden exposure to an environment with very high ATP concentrations. Two key questions of both fundamental and therapeutic significance thus arise: 1) does ATP have any effect on LLPS of the SARS-CoV-2 N protein? 2) can ATP specifically bind to its RBD, whose structural fold is very different from those of RRM and AcD? In the present study, we first assessed the effect of ATP on LLPS of the SARS-CoV-2 N protein in the absence and in the pre-existence of 24-mer poly(A) (A24), as imaged by differential interference contrast (DIC) microscopy. Subsequently we characterized the binding of ATP to the RBD by NMR HSQC titrations, which led to the determination of an average dissociation constant (Kd) of 3.3 ± 0.4 mM as well as the construction of the ATP-RBD complex structure.
The gene encoding the 419-residue SARS-CoV-2 N protein was purchased from a local company (Bio Basic Asia Pacific Pte Ltd) and cloned into the expression vector pET-28a with a TEV protease cleavage site between the N protein and an N-terminal 6xHis-SUMO tag used to enhance solubility. Its RNA-binding domain (RBD) over residues 44-180 was also cloned into the same vector. The recombinant N protein and its RBD were expressed in E. coli BL21 cells with IPTG induction at 18 °C.
Both proteins were found to be soluble in the supernatant. For NMR studies, the bacteria were grown in M9 medium with addition of (¹⁵NH₄)₂SO₄ for ¹⁵N-labeling. The recombinant proteins were first purified by Ni²⁺-affinity column (Novagen) under native conditions, and in-gel cleavage by TEV protease was subsequently conducted. The eluted fractions containing the recombinant proteins were further purified on an FPLC chromatography system with a Superdex-75 column. The purity of the recombinant proteins was checked by SDS-PAGE gels and, for the RBD, by NMR assignment. The concentration of protein samples was determined by the UV spectroscopic method in the presence of 8 M urea [11] [12] [13] [14] [15]. ATP was purchased from Sigma-Aldrich as previously reported, and MgCl₂ was added to ATP for stabilization by forming the ATP-Mg complex [11] [12] [13] [14] [15].
The formation of liquid droplets was imaged on 50 µl of the N protein samples by DIC microscopy (OLYMPUS IX73 Inverted Microscope System with OLYMPUS DP74 Color Camera) as previously described [11, 12]. The N protein samples were prepared at 20 µM in 25 mM HEPES buffer (pH 7.5) with 70 mM KCl, with addition of ATP or A24 in the same buffer at different molar ratios. Subsequently, the N protein samples at 20 µM in the same buffer with the pre-existence of A24 at 1:0.5 were also imaged with further addition of ATP at different molar ratios. Turbidity (absorbance at 600 nm) was measured three times for each sample.
NMR experiments were conducted at 25 °C on an 800 MHz Bruker Avance spectrometer equipped with pulsed-field gradient units and a shielded cryoprobe as described previously [11] [12] [13] [14] [15]. For NMR HSQC titrations with ATP, two-dimensional ¹H-¹⁵N NMR HSQC spectra were collected on the ¹⁵N-labeled RBD samples at 50 µM in 10 mM phosphate buffer at pH 6.8 with 150 mM NaCl, in the absence and in the presence of ATP in the same buffer at 0.5, 1, 2.5, 5, 7.5, 10, 15 and 20 mM, respectively.
Sequential assignment was achieved based on the deposited chemical shifts (BMRB ID of 34511) [8]. To calculate the chemical shift difference (CSD), HSQC spectra collected without and with ATP at different concentrations were superimposed. Subsequently, the shifted HSQC peaks were identified and assigned to the corresponding RBD residues. The CSD was calculated as an integrated index using the formula described previously [11] [12] [13] [14] [15]. To obtain residue-specific dissociation constants (Kd), we fitted the shift traces of the 11 residues with significant shifts (CSD > average + STD) to the one-binding-site model, as we previously performed [13] [14] [15].
The N protein of SARS-CoV-2 at 20 µM has a turbidity value of 0.08 (Fig. 1C) and showed no droplets as imaged by DIC. However, upon titration with ATP, the turbidity
Very recently, it was shown that LLPS of the SARS-CoV-2 N protein could be induced by various nucleic acids regardless of their sequences [5] [6] [7]. Indeed, here we found that A24 also biphasically modulated LLPS of the N protein, as ATP did, but the ratios required to induce and dissolve droplets were much lower than those of ATP. Addition of A24 even at 1:0.5 resulted in a turbidity value of 0.97 (Fig. 1C) as well as the formation of a large number of dynamic droplets, some with diameters close to 2 µm (Fig. 1E). Interestingly, further addition of ATP to this sample led to a monotonic reduction of the turbidity values, as well as disappearance of the droplets as imaged by DIC. At 1:750, the droplets had completely disappeared.
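As a quick arithmetic check on the molar ratios quoted above, the following Python sketch converts a protein:ligand ratio into an absolute ligand concentration. It assumes that a ratio written as 1:x means x moles of ATP or A24 per mole of N protein, which the text implies but does not state outright; the function name is illustrative only.

```python
def ligand_concentration_mM(protein_uM: float, ligand_per_protein: float) -> float:
    """Ligand concentration in mM for a 1:ligand_per_protein molar ratio.

    Assumption: the ratio is protein:ligand, so the ligand concentration is
    simply the protein concentration multiplied by the second number.
    """
    return protein_uM * ligand_per_protein / 1000.0  # µM -> mM


n_protein_uM = 20.0  # N protein concentration used for the DIC samples

a24_mM = ligand_concentration_mM(n_protein_uM, 0.5)   # A24 at 1:0.5
atp_mM = ligand_concentration_mM(n_protein_uM, 750)   # ATP at 1:750

print(f"A24 at 1:0.5 -> {a24_mM * 1000:.0f} µM")
print(f"ATP at 1:750 -> {atp_mM:.0f} mM")
```

Under this reading, A24 at 1:0.5 corresponds to 10 µM and ATP at 1:750 to 15 mM for a 20 µM N protein sample.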
The results indicate that: 1) ATP has a much lower capacity than A24 for inducing LLPS of the N protein; but 2) ATP is nevertheless able to dissolve the droplets of the N protein induced by A24, implying that ATP and A24 most likely target the same sites of the N protein for biphasically modulating its LLPS, similar to what we previously observed for the biphasic effects of ATP and oligonucleic acids on LLPS of the FUS and TDP-43 IDRs [11, 12].
We then assessed whether ATP is able to bind the RBD of the SARS-CoV-2 N protein. As shown in Fig. 2A, the ¹⁵N-labeled RBD (44-180) at 50 µM has a well-dispersed HSQC spectrum in 10 mM sodium phosphate buffer containing 150 mM NaCl (pH 6.8), with most peaks superimposable on those previously reported in a different buffer at pH 5.5 [8]. Subsequently, we collected HSQC spectra of the RBD by titrating ATP at different concentrations up to 20 mM. Interestingly, at ATP concentrations < 1 mM, only minor shifts of HSQC peaks were observed, indicating that the RBD has no significant binding with ATP at concentrations below 1 mM. By superimposing HSQC spectra in the absence and in the presence of ATP at different concentrations, we found that only a small set of HSQC peaks showed large shifts upon adding ATP (Fig. 2A). We then calculated their chemical shift difference (CSD) induced by binding ATP at different concentrations, and the results indicate that 11 residues have significant shifts, which were largely saturated at 8 mM ATP (Fig. 2B). Strikingly, the 11 residues are distributed over the whole RBD sequence and include Asn48, Ser51, Leu56, Thr57, Arg89, Ala90, Arg92, Ser105, Arg107, Ala155 and Tyr172. As such, by fitting their CSD traces to the one-site binding model as we performed on other proteins [13] [14] [15], we obtained the residue-specific dissociation constant (Kd) values of all 11 residues, with an average value of 3.3 ± 0.4 mM (Fig. 2B).
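The analysis described above (combined CSD, a mean + STD cutoff, and a one-site fit per residue) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' script: the exact CSD weighting and binding equation are not reproduced in this text, so the sketch assumes a commonly used 15N weight of 1/4 and the standard one-site model with ligand depletion; whether the cutoff uses a population or sample standard deviation is also an assumption. All function names are hypothetical.

```python
import math
import statistics


def csd(delta_h, delta_n, n_weight=0.25):
    """Combined 1H/15N shift difference in ppm (n_weight is an assumed value)."""
    return math.sqrt(delta_h ** 2 + n_weight * delta_n ** 2)


def significant_residues(csd_by_residue):
    """Residues with CSD > average + STD (population STD assumed)."""
    values = list(csd_by_residue.values())
    cutoff = statistics.mean(values) + statistics.pstdev(values)
    return [name for name, value in csd_by_residue.items() if value > cutoff]


def bound_fraction(ligand, protein, kd):
    """One-site binding with ligand depletion; all concentrations in mM."""
    total = protein + ligand + kd
    return (total - math.sqrt(total * total - 4.0 * protein * ligand)) / (2.0 * protein)


def fit_one_site(ligand_concs, csd_values, protein, kd_min=0.01, kd_max=100.0, steps=200):
    """Least-squares fit returning (Kd in mM, CSD at saturation).

    For a fixed Kd the saturating CSD has a closed-form least-squares
    solution, so only Kd is searched (golden-section search on log Kd).
    """
    def profile(kd):
        frac = [bound_fraction(l, protein, kd) for l in ligand_concs]
        csd_max = sum(f * y for f, y in zip(frac, csd_values)) / sum(f * f for f in frac)
        sse = sum((y - csd_max * f) ** 2 for f, y in zip(frac, csd_values))
        return sse, csd_max

    golden = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = math.log(kd_min), math.log(kd_max)
    for _ in range(steps):
        left = hi - golden * (hi - lo)
        right = lo + golden * (hi - lo)
        if profile(math.exp(left))[0] < profile(math.exp(right))[0]:
            hi = right
        else:
            lo = left
    kd = math.exp((lo + hi) / 2.0)
    return kd, profile(kd)[1]


# Synthetic, noise-free trace at the ATP concentrations used in the titration
# (mM) for 50 µM RBD; the Kd and saturating CSD are made up for illustration.
atp_mM = [0.5, 1, 2.5, 5, 7.5, 10, 15, 20]
rbd_mM = 0.05
trace = [0.2 * bound_fraction(c, rbd_mM, 3.3) for c in atp_mM]

kd_fit, csd_max_fit = fit_one_site(atp_mM, trace, rbd_mM)
print(f"fitted Kd = {kd_fit:.2f} mM, saturating CSD = {csd_max_fit:.3f} ppm")
```

Fitting each significantly shifted residue separately and then averaging the resulting Kd values would mirror the residue-specific procedure described in the text. Real traces carry noise, so a fit of this kind should be accompanied by an error estimate.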
Remarkably, upon mapping the 11 residues back onto the NMR structure of the RBD, these residues cluster together to form a pocket (Fig. 2C). Due to the extremely low binding affinity, with Kd in the mM range, it is impossible to determine the three-dimensional structure of the ATP-RBD complex by NMR spectroscopy or X-ray crystallography. Therefore, to visualize the complex structure, the constraints derived from the NMR titrations were utilized to guide molecular docking with the well-established HADDOCK program [13] [14] [15]. Fig. 3A presents the lowest-energy docking structure of the ATP-RBD complex, in which ATP occupies a pocket constituted by the residues with significant shifts of their HSQC peaks upon binding ATP (Fig. 2C). This pocket is within the large surface of the RBD structure, which is highly positively charged (Fig. 3B-3D). In the complex, the purine ring of ATP appears to have π-cation interactions with several Arg residues. On the other hand, oxyanions of the β-phosphate of the triphosphate chain establish three hydrogen bonds with RBD residues: two with Asn8 and one with Thr9 (Fig. 3E).
Very recently, the NMR structures of the RBD (44-180) of the SARS-CoV-2 N protein were also reported in complex with both a single-stranded RNA (ssRNA) with a sequence of UCUCUAAACG (Fig. 4A) and a double-stranded RNA (dsRNA) with a sequence of CACUGAC (Fig. 4B). Noticeably, superimposition of these structures with that of the ATP-RBD complex revealed that ATP occupies a pocket within the large positively charged surface which binds both ssRNA and dsRNA. Therefore, the viral RBD of the SARS-CoV-2 N protein represents the third fold capable of binding ATP with a Kd of ~mM, in addition to the RRM (Fig. 4C) [13, 14] and AcD folds (Fig. 4D) [15]. Noticeably, although they have very different structural folds, their ATP-binding pockets are all located within the large surfaces utilized by these folds to bind various nucleic acids [13] [14] [15].
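To put a Kd of ~mM in context, the short Python sketch below estimates what fraction of the RBD would carry ATP across the 2 to 12 mM cellular range cited earlier. It is a rough illustration only: it assumes a single site, ATP in large excess over protein (so free ATP is approximately total ATP), and no competition from RNA or other ligands, none of which is claimed by the study. The function name is hypothetical.

```python
def fraction_bound(atp_mM: float, kd_mM: float = 3.3) -> float:
    """Occupancy of a single site when ATP is in large excess over protein."""
    return atp_mM / (atp_mM + kd_mM)


# Lower bound, midpoint-ish value and upper bound of the cited cellular range.
for atp_mM in (2.0, 5.0, 12.0):
    print(f"{atp_mM:5.1f} mM ATP -> {fraction_bound(atp_mM):.2f} of RBD bound")
```

With these simplifying assumptions the occupancy is roughly 38% at 2 mM and 78% at 12 mM ATP, and exactly half when the ATP concentration equals Kd.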
The catastrophic pandemic caused by SARS-CoV-2 has already resulted in >81 million infections and >1.7 million deaths. The SARS-CoV-2 N protein is not only an essential component of the packed viral genome, but also a key candidate for vaccine development due to its high expression in infected cells [18, 19]. Very recently, the N protein was identified to undergo LLPS as induced by various nucleic acids. Therefore, understanding the mechanisms underlying its LLPS is not only essential for developing therapeutic agents, but also critical for the design of effective vaccines, as the immune response to the phase-separated N protein could be very different from that to the soluble forms/fragments of the N protein. However, due to the extreme challenge of biophysically characterizing LLPS of the SARS-CoV-2 N protein, in which both folded domains and IDRs are involved, the high-resolution mechanism of its LLPS, particularly that induced by nucleic acids, currently remains unknown.
In the present study, for the first time, we found that, like nucleic acids, ATP, which is not present in the virions, does biphasically modulate LLPS of the N protein. On the other hand, as compared to A24, ATP has a much weaker (~400-fold weaker) capacity in both inducing and dissolving LLPS of the N protein. Furthermore, although much higher concentrations are needed, ATP can dissolve LLPS induced by A24. This set of results thus not only indicates that ATP and nucleic acids modulate LLPS of the N protein by targeting the same sites, as we previously observed for FUS and TDP-43 [11, 12], but further implies that the interactions of ATP and A24 with the N protein are specific binding, rather than nonspecific electrostatic effects from the phosphate groups.
As ATP has a triphosphate chain, which is much more negatively charged than the phosphate group in A24, ATP would be expected to have a higher capacity than A24 in biphasically modulating LLPS of the N protein if the modulating capacity resulted mainly from the non-specific electrostatic effect. However, the observation is the opposite: ATP has a much weaker capacity than A24 in both inducing and dissolving LLPS of the N protein. Therefore, based on the residue-specific results we previously obtained, namely that ATP biphasically modulates LLPS of the IDDs of FUS and TDP-43 by specifically binding Arg/Lys residues within the IDRs [11, 12], here we propose that ATP biphasically modulates LLPS of the SARS-CoV-2 N protein mainly by its bivalent interaction with Arg/Lys residues within its three IDRs (Fig. 1B). Briefly, ATP is capable of establishing the π-cation interaction between its purine ring and the side-chain cations of Arg/Lys, as well as the electrostatic interaction between its triphosphate chain and the side chains of Arg/Lys located in the IDRs. This rationalizes the result that A24 has a much higher capacity than ATP, because A24 has the multivalent ability to bivalently interact with Arg/Lys residues of the IDRs of the N protein, which thus leads to significantly enhanced affinity [20]. On the other hand, due to the low numbers of Arg/Lys residues within the three IDRs of the N protein (Fig. 1B), ATP is unlikely to modulate LLPS of the SARS-CoV-2 N protein by the direct mechanism that we found for the 156-residue RGG-rich IDD of FUS, by which the dynamic and bivalent binding of ATP to 25 Arg and 4 Lys residues is sufficient to drive the formation of large dynamic complexes manifesting as liquid droplets.
Instead, ATP most likely modulates LLPS of the N protein by the indirect mechanism that we found for the 150-residue TDP-43 prion-like domain, which has only 5 Arg and 1 Lys residues, by which the binding of ATP to Arg/Lys residues only acts to coordinate other forces, particularly the dimerization by a hydrophobic fragment, to drive LLPS. This may explain the previous reports that the dimerization domain of the N protein is essential for its RNA-induced LLPS [5] [6] [7].
Another novel and critical finding here is that although the RBD of the SARS-CoV-2 N protein has a structural fold very different from those adopted by RRM and AcD, it can also specifically bind ATP with similar Kd values of ~mM at a pocket within the surface for binding nucleic acids [13] [14] [15]. The result not only establishes the SARS-CoV-2 RBD as the first viral domain capable of binding ATP at biologically relevant concentrations (~mM), but also suggests that the RBD may have a pivotal role in specifically regulating the uncoating, localizing and packing of the genomic RNA by the N protein. As illustrated in Fig. 4E, immediately after infection, SARS-CoV-2 will release its genomic RNA-N-protein condensate, which is tightly packed into a gel-like state, into the infected cell [5]. Considering that at this stage one infected cell may have only one to several copies of the condensate, the ratios between ATP and N protein/genomic RNA are very high. Consequently, ATP acts to facilitate the uncoating of the condensate, such as by transforming the gel-like condensate into more dynamic liquid droplets or even a homogeneous solution. Furthermore, once new copies of viral RNA polymerase and N protein are synthesized by the host cell machinery, the ratios will decrease, and ATP may enhance LLPS of the mixture of the viral genomic RNA and N proteins as well as the host cell replicases to form replicase-transcriptase complexes.
Finally, after all components needed for the assembly of new virions are synthesized by the infected cell, the ratios between ATP and N protein/genomic RNA will be further reduced, and therefore a large population of the ATP-RBD complex will become dissociated. As such, the ATP-unbound RBD of the SARS-CoV-2 N protein becomes available for binding the specific sites of the genomic RNA to initiate the packing process, which might even be enhanced by ATP at low molar ratios.
In summary, here for the first time we discovered that ATP not only biphasically modulates LLPS of the viral N protein of SARS-CoV-2, but also dissolves its LLPS induced by oligonucleic acid. Furthermore, ATP also specifically binds its RNA-binding domain with a
[1] A pneumonia outbreak associated with a new coronavirus of probable bat origin
[2] Unique and conserved features of genome and proteome of SARS-coronavirus, an early split-off from the coronavirus group 2 lineage
[3] The coronavirus nucleocapsid is a multifunctional protein
[4] The coronavirus nucleocapsid protein is dynamically associated with the replication-transcription complexes
[5] The SARS-CoV-2 Nucleocapsid phosphoprotein forms mutually exclusive condensates with RNA and the membrane-associated M protein
[6] Nucleocapsid protein of SARS-CoV-2 phase separates into RNA-rich polymerase-containing condensates
[7] SARS-CoV-2 nucleocapsid protein phase-separates with RNA and with human hnRNPs
[8] Structural basis of RNA recognition by the SARS-CoV-2 nucleocapsid phosphoprotein
[9] Lehninger's Principles of Biochemistry
[10] ATP as a biological hydrotrope
[11] A unified mechanism for LLPS of ALS/FTLD-causing FUS as well as its modulation by ATP and oligonucleic acids
[12] ATP regulates TDP-43 pathogenesis by specifically binding to an inhibitory component of a delicate network controlling LLPS
[13] ATP binds and inhibits the neurodegeneration-associated fibrillization of the FUS RRM domain
[14] ATP is a cryptic binder of TDP-43 RRM domains to enhance stability and inhibit ALS/AD-associated fibrillation
[15] ATP binds nucleic-acid-binding domains beyond RRM fold
[16] The Origins of Viruses
[17] HADDOCK versus HADDOCK: new features and performance of HADDOCK2.0 on the CAPRI targets
[18] A proposed role for the SARS-CoV-2 nucleocapsid protein in the formation and regulation of biomolecular condensates
[19] The SARS-CoV-2 N Protein Is a Good Component in a Vaccine
potential surface and ATP in ball-and-stick. (D) Expanded view of the ATP-binding pocket of RBD. (E) The ATP-RBD complex showing hydrogen bonds of the β-phosphate of the triphosphate chain of ATP with sidechain/backbone atoms of Asn8 as well as with backbone atom of Thr9 of RBD.
Superimposition of the ATP-RBD complex and NMR structure of the ssRNA-RBD complex (PDB ID of 7ACS) with RBD in ribbon (I) and in electrostatic potential surface
Superimposition of the ATP-RBD complex and NMR structure of the dsRNA-RBD complex (PDB ID of 7ACT) with RBD in ribbon (I) and in electrostatic potential surface
The complex structure of ATP with the TDP-43 RRM1 we previously determined
This study is supported by Ministry of Education of Singapore (MOE) Tier 1 Grants R-154-000-B45-114 and R-154-000-B92-114 to Jianxing Song.