key: cord-0918408-7ema7ej5 authors: Iserman, Christiane; Roden, Christine A.; Boerneke, Mark A.; Sealfon, Rachel S.G.; McLaughlin, Grace A.; Jungreis, Irwin; Fritch, Ethan J.; Hou, Yixuan J.; Ekena, Joanne; Weidmann, Chase A.; Theesfeld, Chandra L.; Kellis, Manolis; Troyanskaya, Olga G.; Baric, Ralph S.; Sheahan, Timothy P.; Weeks, Kevin M.; Gladfelter, Amy S. title: Genomic RNA elements drive phase separation of the SARS-CoV-2 nucleocapsid date: 2020-11-27 journal: Mol Cell DOI: 10.1016/j.molcel.2020.11.041 sha: 9453acec5be67e8ce8dc5c7add59bcf230d01e96 doc_id: 918408 cord_uid: 7ema7ej5 We report that the SARS-CoV-2 nucleocapsid protein (N-protein) undergoes liquid-liquid phase separation (LLPS) with viral RNA. N-protein condenses with specific RNA genomic elements under physiological buffer conditions and condensation is enhanced at human body temperatures (33°C and 37°C) and reduced at room temperature (22°C). RNA sequence and structure in specific genomic regions regulate N-protein condensation while other genomic regions promote condensate dissolution, potentially preventing aggregation of the large genome. At low concentrations, N-protein preferentially crosslinks to specific regions with single-stranded RNA flanked by structure and these features specify the location, number, and strength of N-protein binding sites (valency). Liquid-like N-protein condensates form in mammalian cells in a concentration-dependent manner and can be altered by small molecules. Condensation of N-protein is RNA sequence and structure specific, sensitive to human body temperature, and manipulatable with small molecules, and presents a screenable process for identifying antiviral compounds effective against SARS-CoV-2. 
Biomolecular condensates are required for multiple cell biological processes and can form through liquid-liquid phase separation (LLPS) of proteins containing intrinsically disordered regions (IDRs) and RNA-binding domains (Brangwynne et al., 2009; Molliex et al., 2015; Nott et al., 2015; Pak et al., 2016; Wang et al., 2018; Zhang et al., 2015). Proteins with IDRs can sample many conformations to engage in weak, multivalent interactions that promote demixing and liquid-like properties of condensates (Holehouse and Pappu, 2018). In many instances, RNA promotes condensate formation (Maharana et al., 2018; Zhang et al., 2015). Important protein features that determine the molecular grammar of condensates have been discovered based on amino acid sequence composition (Martin et al., 2020; Nott et al., 2015; Patel et al., 2015; Wang et al., 2018). Nucleic acids are integral components of many biomolecular condensates; however, very little is understood about their roles. Indeed, only a few examples (Langdon et al., 2018; Ma et al., 2020; Maharana et al., 2018) describe specific RNA sequences and structures that contribute to LLPS.

abundant virus-produced subgenomic RNAs (Masters, 2019). Viral replication and gRNA packaging depend on the N-protein (Grossoehme et al., 2009; McBride et al., 2014). N-protein must find this single gRNA molecule in the midst of many host and viral RNAs and must ensure the large gRNA does not become entangled, as has been observed for long cellular RNAs (Guillén-Boixet et al., 2020; Ma et al., 2020). The N-protein has RNA-binding domains, forms multimers (Cong et al., 2017), and is predicted to contain IDRs (Figure 1A). N-protein thus has hallmarks of proteins that undergo LLPS, and we hypothesized that LLPS, mediated by specific viral RNA sequences, may be important for SARS-CoV-2 processes such as viral genome packaging.
We find that N-protein phase separates at 37°C with gRNA and that LLPS of N-protein is associated with specific patterns of RNA binding sites within gRNA. Specific RNA elements are correlated with condensate formation or dissolution, potentially specifying the number and location of protein binding sites (valency). We further demonstrate that RNA elements encode condensate material properties. Thus, a combination of distinct viral RNA-encoded elements ensures viral condensates of a specific molecular and physical identity. This study of SARS-CoV-2 reveals a new model viral system for uncovering rules for how RNA composition and physical state are specified in condensates and presents new assays for screening viral LLPS-disrupting therapeutics.

We reconstituted purified N-protein under physiological buffer conditions with RNAs encoding segments of SARS-CoV-2 gRNA and observed that N-protein produced either in mammalian cells (post-translationally modified) or bacteria (unmodified) phase separated with viral RNA (Figure S1A and B). Concentrations were chosen in part based on reported N-protein abundance in virions (Bar-On et al., 2020). Unmodified protein yielded larger, more abundant droplets, and the presence of an affinity tag or labeling the protein with dye did not alter behavior (Figure S1B). Since N-protein in SARS-CoV-1 virions is hypophosphorylated (Wu et al., 2009), and packaging (initiated by binding of N-protein to gRNA) first occurs in the cytoplasm (Fehr and Perlman, 2015; Stertz et al., 2007), where N-protein is thought to be in its unphosphorylated state (Fung and Liu, 2018), we used unmodified SARS-CoV-2 N-protein for subsequent experiments.

sequence at the 3′-End encoding Nucleocapsid RNA. Importantly, similar results for viral RNA sequence were observed by the Morgan lab (Carlson et al., 2020). N-protein binds gRNA in the cytosol in the presence of non-viral RNAs.
We therefore assessed how non-viral lung RNA influences LLPS. Total lung RNA did not alter N-protein-only LLPS; in contrast, when combined with 5′-End RNA, total lung RNA synergized with the 5′-End, increasing condensate size, number, and viral RNA enrichment (Figure S1F/G/H). gRNA is longer than many host RNAs and all subgenomic RNAs, and we reasoned that length contributes to an electrostatically driven component of N-protein LLPS given the protein pI (10.07). Addition of 0.3 kb or 2.4 kb of non-viral sequence to the 1 kb 5′-End or Frameshifting-region RNAs resulted in a progressive increase in condensate size/number, with the 5′-End driving enhanced condensates relative to the Frameshifting-region at all lengths tested (Figure 1H). In sum, N-protein undergoes LLPS under physiological conditions, including in the presence of abundant nonspecific RNA, and LLPS is enhanced by viral RNA. Both specific viral RNA sequences (located at the 5′-End) and increased RNA length promote N-protein LLPS, which raises the possibility that specific packaging of full-length genomic RNA (>30 kb) could occur via LLPS.

SARS-CoV-2 replication is most efficient at 33°C (V'kovski et al., 2020), and we therefore assessed the temperature dependence of LLPS. N-protein alone demixed into droplets in a temperature-dependent manner, highly pronounced at fever temperature (40°C) and above (45°C) (Figure 2A/B/C/D, Figure S2A). Addition of the 5′-End RNA lowered the most efficient condensation temperatures to 37°C and 33°C (which correspond to the exterior lung and upper airway temperatures, respectively) (McFadden et al., 1985). 5′-End RNA droplets showed increases in size and abundance in response to temperature, suggesting that temperature may change nucleation, fusion, and/or ripening (Figure 2A/B). The decrease in the critical temperature for LLPS was independent of RNA sequence and was also seen in condensates made of N-protein with Nucleocapsid RNA (Figure S2A).
Further, protein concentration in solution was anti-correlated with surface area occupied by droplets (Figure S2A). In infected cells, subgenomic viral RNAs, like Nucleocapsid RNA, are highly abundant species (Kim et al., 2020). We hypothesized that material property differences contribute to specific viral processes such as selective packaging of gRNA and examined N-protein condensates made with RNAs that yielded different material properties. We performed FRAP to examine droplets composed of 5′-End versus Nucleocapsid RNA and observed that N-protein signal recovered faster in 5′-End RNA droplets (t1/2 = 14 seconds) than in Nucleocapsid RNA droplets (t1/2 = 28 seconds) (Figure S2B). The 5′-End RNA promoted larger, more liquid-like condensates; in contrast, the Nucleocapsid RNA and a non-viral (luciferase) RNA induced smaller, solid-like, flocculated condensates (Figure S2C). To assess the relevance of these material differences to selectivity, we added Nucleocapsid RNA to preformed 5′-End droplets. 5′-End RNA readily mixed into preformed N-protein-5′-End condensates; in contrast, subgenomic Nucleocapsid RNA was excluded from the preformed 5′-End droplets (Nucleocapsid was excluded 10X more than 5′-End) and nucleated separate droplets (Figure 2E/F). Thus, material properties of N-protein condensates have clear RNA sequence specificity that excludes other sequences. Different viral RNAs thus can promote or limit LLPS and yield different material properties (Figure S2A/B/C). We hypothesized that some RNA segments might function to maintain liquidity and oppose problematic gelation. Given that the Frameshifting-region promoted dissolution at most concentrations, we examined whether this RNA could influence the condensation process and solubilize droplets made of other RNAs.
Differences in droplet size and abundance reflect changes to the nucleation, coarsening, and/or fusion capacity of condensates, while flocculation is evidence of slow relaxation times for droplets that come in contact with one another due to the interplay of viscosity and surface tension (Berry et al., 2018). We mixed Frameshifting-region RNA with either 5′-End or Nucleocapsid RNA. Mixtures containing the 5′-End and

We next examined how the underlying structures of SARS-CoV-2 RNA elements encode distinct LLPS behavior and material properties. We first experimentally assessed and modeled 5′-End and Frameshifting-region structures using SHAPE-MaP (Figure 3A-D, Figure S3A/B) (Siegfried et al., 2014). Both RNAs are highly structured. However, the Frameshifting-region forms a greater number of more complex, multi-helix junction structures and has a higher A/U content (62% vs 52% for 5′-End) (Figure 3A/B, Figure S4B/C). We next measured N-protein interactions with viral RNAs using RNP-MaP, which selectively crosslinks lysine residues to proximal RNA nucleotides, largely independent of nucleotide identity and local RNA structure (Weidmann et al., 2020). We mapped N-protein interactions at protein:RNA ratios that promote either diffuse or condensed droplets for both the 5′-End and Frameshifting-region (Figure S4A). For the 5′-End in the diffuse state (20x excess protein), there are two prominent N-protein binding sites, and each occurs in a long A/U-rich unstructured region flanked by strong stem-loop structures (Figure 3A/C). In the droplet state (80x/160x excess protein), the two principal sites from the diffuse state remain fully occupied and additional N-protein interaction sites appear (the valency increases). In contrast, the Frameshifting-region showed generalized binding across the RNA by N-protein at all ratios (Figure 3B/D).
Binding was observed both in single-stranded regions and in A/U-rich structured regions (Figure 3B/D). In sum, N-protein interacts specifically with a few preferred sites in the 5′-End in both diffuse and condensed states, and interacts more homogeneously across the Frameshifting-region (Figure 3E, Figure S4D). These highly distinct protein interaction patterns suggest that the different regions have distinct modes of influence on LLPS: 1) specific, multivalent binding at limited sites in the 5′-End that then increase in number during condensation and 2) generalized binding across the Frameshifting-region at all protein:RNA ratios that is consistent with solubilization (Figure 3E). We hypothesized that the gRNA may be a mixture of sequences that promote LLPS (like the 5′-End) and sequences that promote fluidity (like the Frameshifting-region). We therefore computationally evaluated sequence and structural properties of the 5′-End and Frameshifting-region and compared these to the rest of the gRNA. Compared to the 5′-End, most of the RNA genome has higher minimum free energies (MFEs) for predicted structures, a lower ∆G z-score, higher A/U-content, and higher ensemble diversity (ED; more dynamic structures) (Andrews et al., 2020). All of these metrics imply that most of the genome is similar to the Frameshifting-region (Figure 4A/S4A/B/C, Supplementary File 1). Interestingly, the two major LLPS-promoting sequences, the 5′-End and the nucleocapsid-encoding region at the 3′-End of the gRNA (Figure 1E), are predicted to share multiple features, particularly depletion in U and lower predicted MFE.
Nucleocapsid-encoding RNA is predicted to contain a number of highly structured regions, although it is not predicted to be as strongly structured as the 5′-End. We tested these predictions with RNP-MaP experiments for two additional genome regions: (i) the Nucleocapsid RNA, which is predicted to share sequence and structural features with the 5′-End and promote LLPS (Figure 1E), and (ii) the PS-region, which shares sequence and structural features with the Frameshifting-region and similarly limits LLPS. Indeed, these two genome regions showed distinct N-protein binding patterns consistent with their distinct LLPS behaviors (Figure 4B/C). Specifically, the Nucleocapsid RNA in the diffuse state (20x protein concentration) shows similarities in its N-protein interaction pattern to that of the 5′-End. The Nucleocapsid RNA sequence does not support generalized N-protein interaction in the diffuse condition, and instead there are few N-protein interactions. These sites of interaction are not as densely occupied as the principal sites in the 5′-End RNA. Thus, the 5′- and 3′-End RNAs share notable overall patterns. In contrast, the PS-region is homogeneously coated with N-protein in both diffuse and condensed states, similar to the Frameshifting-region. When combined, these data support a model in which the 5′ and 3′ genome ends interact with N-protein in a localized manner and specifically drive LLPS, while interior gRNA regions are more uniformly coated in N-protein and have a solubilizing property.

To manipulate N-protein interactions with the RNA, we mutated a conserved residue in the predicted RNA-binding domain of N-protein, Y109, to alanine (Figure 1A). This mutation diminishes N-protein binding to RNA by ~2000-fold (Kang et al., 2020), and N-protein with the equivalent amino acid mutation failed to support viral replication in MHV, a related betacoronavirus (Grossoehme et al., 2009).
Y109A mutation eliminates 5′-End RNA-driven condensation (Figure 5A), and interactions between N-protein Y109A and the 5′-End are diminished relative to the WT sequence in both 20x (40% average decrease in N binding intensity) and 160x (50% N binding decrease) conditions. The Y109A mutant also binds at several new sites, and overall, binding by the mutant is more diffuse and less punctate than for the wild-type protein (Figure 5B/C). N-protein Y109A has a greater propensity to demix in the absence of any added RNA, and the Frameshifting-region RNA retains an ability to dissolve these protein-only condensates (Figure S6). These data indicate that the RNA-binding domain plays a major role in LLPS but that N-protein motifs outside of Y109 contribute to solubilizing interactions.

To assess the ability of N-protein to condense in cells, we co-transfected HEK293 cells with N-protein fused to GFPspark and with H2B:mCherry (to mark nuclei in single cells). Cells with higher levels of transfection were more likely to form spherical droplets in the cytoplasm (Figure 6A/B), suggesting N-protein condensation is concentration dependent. N-protein signal was generally excluded from the nucleus (Figure 6C). N-protein droplets readily underwent fusion

We reasoned that small molecules that increase or decrease N-protein LLPS or alter RNA recruitment could be identified by screening. We examined 1,6-hexanediol (Kroschwald et al., 2017), lipoic acid (Wheeler et al., 2020), and kanamycin (Blount et al., 2005), each of which potentially alters LLPS by a distinctive mechanism. As a simple positive control that would be useful for future drug screening assays, we examined 1,6-hexanediol, which disrupts LLPS (Kroschwald et al., 2017) and, indeed, prevented condensate formation (Figure S7A/B).
Lipoic acid dissolves cellular stress granules (Wheeler et al., 2020), which, in the absence of N-protein phosphorylation, recruit SARS-CoV-1 N-protein during cellular stress (Peng et al., 2008). Lipoic acid treatment reduced condensate size (Figure 6G/H). The aminoglycoside kanamycin binds promiscuously to nucleic acids via electrostatic interactions and was implicated as an antiviral in HIV-1 by preventing RNA-protein interactions (Blount et al., 2005). Addition of kanamycin to droplets decreased the size of condensates, decreased the protein/RNA ratio in the reconstitution assay (Figure 6I/J), and caused N-protein to relocalize to the nucleus in 37% of treated cells (Figure 6K).

In this work, we show that the SARS-CoV-2 nucleocapsid protein (N-protein) phase separates in an RNA sequence- and structure-dependent manner. We present a potential mechanism for SARS-CoV-2 gRNA packaging through LLPS (Figure 3E, Figure 7: model), where N-protein condensate properties are conferred through specific gRNA sequences, structures, and length. We find that distinct regions of the viral RNA genome either promote LLPS (5′-End region and nucleocapsid-encoding region (Nucleocapsid RNA) located at the 3′-End) or act as solubilizing elements (Frameshifting-region, PS-region). Multivalent polymer interactions are a driving force of LLPS, and we propose that a punctate N-protein binding pattern enables the 5′-End and the 3′-End regions to promote LLPS. The Frameshifting-region and PS-region, conversely, are more uniformly bound by N-protein in both diffuse and droplet states. These sequences that are coated by N-protein have features predicted to be shared across much of the genome (Figure 4A), suggesting that many regions may contribute non-specific electrostatic interactions, likely promoting fluidity and solubilization to limit entanglement of the large gRNA molecule.
In this model, the full-length gRNA consists of a mixture of LLPS-promoting and aggregation-dissolving elements to promote regulated, selective LLPS, thereby excluding packaging of host mRNAs (Figure 7). LLPS may concentrate components to ensure efficient packaging and may also protect the sensitive gRNA in virions (van Doremalen et al., 2020). The membrane protein (M-protein) is also a known interactor of both gRNA and N-protein, and thus N-protein:genome condensates may also specifically interact with the M-protein to facilitate packaging of a single genome per virion (Narayanan et al., 2000). Future experiments are needed to test the proposed model that LLPS governs packaging and to investigate the complex interplay of various interaction partners in N-protein LLPS. While our model is focused on proposing a role for N-protein LLPS in packaging, N-protein LLPS could also be important for SARS-CoV-2 viral replication. LLPS was previously implicated in replication of other viruses (Alenquer et al., 2019; Heinrich et al., 2018; Nikolic et al., 2017), and RNA sequence, structure, and length could encode both specificity and material N-protein condensate properties that govern functions in viral replication independent of or in addition to membrane encapsulation, as might occur in "replication factories".

The temperature-dependent LLPS of N-protein provides a potential explanation for how SARS-CoV-2 and other coronaviruses spread through the likely reservoir species, Chinese horseshoe bats (Calisher et al., 2006). Bat body temperature drops during hibernation and rises during flight. In order to propagate, viral proteins must adapt to bat temperature extremes. It is possible that defects in N-protein LLPS slow viral replication during hibernation. Indeed, it has been observed that coronavirus infection occurs prior to hibernation and persists with mild symptoms during hibernation (Subudhi et al., 2017).
(Dao et al., 2018; Iserman et al., 2020; Jiang et al., 2015). LCST LLPS is mainly driven by the presence of aromatic and hydrophobic amino acids (Dao et al., 2018; Jiang et al., 2015; Li et al., 2014). In contrast, LLPS in response to lowered temperature, referred to as upper critical solution temperature (UCST), is mainly driven by polar residues such as arginine (Quiroz and Chilkoti, 2015). Interestingly, the N-protein is rich in certain hydrophobic amino acids (particularly alanine and glycine) and also certain polar amino acids (particularly glutamic acid and arginine), relative to all vertebrate proteins (Table S2), suggesting that a balance of these amino acids, as well as its interactions with RNA, dictates optimal N-protein LLPS. RNA sequence is not likely the major source of LCST, as 5′-End and Nucleocapsid RNAs behave similarly, and their structures do not resemble RNA thermometers.

New antivirals are needed for existing, emerging, and drug-resistant viral diseases. We suggest that LLPS could represent a new, easily screenable target for antivirals. The two compounds we tested, lipoic acid and kanamycin, were chosen as proof of concept and could serve as positive controls for a screen. Specific RNA sequences and structures that regulate N-protein LLPS may also be targeted directly in the development of antiviral therapies. These straightforward in vitro and in vivo assays comprise a powerful starting point for evaluating compounds to reveal new classes of antiviral strategies that target phase separation.

This study addresses mechanisms of LLPS of components of the SARS-CoV-2 virus. However, because the work involved reconstitution experiments from purified components and expression of viral proteins in mammalian cells rather than an actual infection, it is still unclear what step(s) in the viral replication cycle may utilize the mechanisms described.
critical reading of the manuscript, Alain Laederach for initial discussion on genomic sequence, and James Iserman for essential logistical support. A.S.G., C.I. and C. A. R. were supported by NIH R01GM081506, Fast Grants Award #2139, and an HHMI faculty Scholar Award, C.A.R. was supported by NIH T32 CA 9156-43, F32GM136164 and L'OREAL USA for Women in Science Fellowship. The work by RS, CYP, ASB, and CLT is supported by NIH grants R01HG005998, U54HL117798 and R01GM071966, HHS grant HHSN272201000054C K.M.W. is an advisor to and holds equity in Ribometrix, to which mutational profiling (MaP) technologies have been licensed. A.S.G is a scientific advisor of Dewpoint Therapeutics. C.I. is currently employed at Dewpoint Therapeutics. All other authors declare that they have no competing interests. ORF1ab. X axis is position in ORF1ab in codons. Y axis is level of synonymous constraint. Significant synonymous constraints at four confidence cutoffs (1e-3, 1e-4, 1e-5, 1e-6) assessed over a ten-codon sliding window are marked by magenta lines. Tested regions correspond to those shown in C. E) LLPS of N-protein is viral RNA sequence-dependent. Different RNA regions (magenta signal) from SARS-CoV-2 (at 5 nM) either drive or solubilize N-protein (1 µM) droplets (green signal). F) Ability of 5′-End and Frameshifting-region RNA to drive or solubilize condensation of N-protein (4 µM) over increasing RNA concentrations. Frameshifting-region only drives LLPS at 25 nM RNA. G) 5′-End promotes LLPS whereas Frameshifting-region promotes solubilization. Phase diagram of N-protein (green) with either 5′-End or Frameshifting-region RNA at indicated concentrations. Quantification corresponds to microscopy images in Figure S1D . H) Addition of RNA length enhances N-protein LLPS. (2 µM) LLPS was assessed with Frameshifting-region and 5′-End RNAs extended with non-specific plasmid sequences (at 5 nM RNA). Scale bar, 8 µm unless otherwise noted. 
E) Sub-genomic Nucleocapsid RNA is excluded from preformed 5′-End droplets. 5′-End (yellow, upper panel) is recruited into preformed 5′-End/N-protein droplets (pink and green), but Nucleocapsid RNA (yellow, lower panel) is not efficiently recruited and forms separate condensates. F) Quantification of (E) showing intensity of second RNA added to preformed droplets. Nucleocapsid RNA (purple) has lower distribution of signal than 5′-End RNA (grey) in regions with high preformed 5′-End RNA signal. G) Mixing 5′-End and Frameshifting-region RNAs makes N-protein condensates with intermediate properties. Left: 5′-End (magenta) and N-protein (green) produced condensates. Middle: Frameshifting-region (yellow) and N-protein did not produce condensates. Right: Combination of 5′-End and Frameshifting-region produced smaller condensates than 5′-End alone. Scale bar, 8 µm unless otherwise noted. Violin plots are scaled to have equal widths. Outliers not shown.

A and B) SHAPE-MaP secondary structure models for the 5′-End (A) or Frameshifting-region (B). RNP-MaP N-protein binding sites are marked by lines. Two principal binding sites on the 5′-End RNA, both flanked by strong RNA structures, are emphasized with arrows. C and D) 5′-End (C) and Frameshifting-region (D) display condition-specific RNP-MaP reactivity in condensed (80x, 160x) and diffuse (20x) conditions. X-axis is the position in nucleotides, y-axis is the reactivity (SHAPE or RNP-MaP). i: Windowed (15 nt windows) median SHAPE reactivity (black). ii: RNP-MaP site density (sites per 15 nt windows); individual nt SHAPE reactivities in colored histograms. iii: Arcs indicate base pair probabilities (from SHAPE). iv: N-protein binding sites (boxes: purple, at 160x; with black border, 20x; purple with black border, in 160x and 20x). v-vi: Raw RNP-MaP reactivity (black) in all conditions. Purple shading highlights RNP-MaP sites.
E) Model for LLPS: Left panel: 5′-End LLPS coincides with an increase in valency with specific N-protein binding sites. Right panel: Frameshifting-region RNA has many binding sites (dashed arrows: ensembles of binding sites at lower N concentrations) that enrich N-protein and prevent condensate formation, unless excess N-protein is present to drive LLPS via protein-protein interaction.

A) Computational prediction of the similarity of viral genomic sequences (MFE, ∆G z-score, ensemble diversity (ED), A/U content) to 5′-End or Frameshifting-region. Mean for each feature is computed over all 120 base pair windows with center in the region of interest. (B) and (C) Nucleocapsid RNA (predicted 5′-End-like sequence) RNP-MaP reactivities display similar patterns of binding to 5′-End between 20x and 160x conditions. PS-region RNA (predicted Frameshifting-region-like sequence) RNP-MaP reactivities display similar patterns of binding to Frameshifting-region between 20x and 160x conditions. i: Windowed (15 nt windows) median SHAPE reactivity (black). ii: RNP-MaP site density (sites per 15 nt windows). iii: Arcs indicate base pair probabilities (from SHAPE). iv: N-protein binding sites (boxes: purple, at 160x; with black border, 20x; purple with black border, in 160x and 20x). v-vi: Raw RNP-MaP reactivity (black) in all conditions. Purple shading highlights RNP-MaP sites.

Condensates recovered to 24% within 1 minute. Error bars show standard error from N=18 condensates. G) 2.38 µg/ml lipoic acid partially prevents N-protein/Frameshifting-region RNA LLPS relative to ethanol vehicle. Images show merge of protein (green) and RNA (red) signals. H) For lipoic acid, size and protein/RNA ratio are reduced relative to vehicle. Left, quantification of condensate area depicted in (G), and right, quantification of protein/RNA ratio. I) 0.5 mg/ml kanamycin partially prevents N-protein/Frameshifting-region RNA LLPS relative to water vehicle.
Images show merge of protein (green) and RNA (red) signals. J) For kanamycin, size and protein/RNA ratio is reduced relative to vehicle. Left, quantification of condensate area depicted in (I) and right quantification of protein to RNA ratio. K) 5 mg/ml kanamycin causes relocalization of N:GFP (Fire LUTs) to the cell nucleus (magenta H2B:mCherry signal) in 37% of treated cells (N= 105, 0% in H2O, N=100). Scale bar, 10 µm unless otherwise noted. Model: packaging of gRNA may be a temperature dependent LLPS process driven by single-stranded regions flanked by structured regions (5′-End-like) that are stable N-protein binding sites. The majority of the genome resembles the solubilizing (Frameshifting-region-like), while the region coding for N-protein is similar to the 5′-End. The balance between LLPS-promoting and solubilizing elements may facilitate gRNA packaging. Initial step of packaging (LLPS of N-protein with gRNA) may be targeted by compounds that either 1) induce condensate dissolution (1,6-hexanediol), 2) adjust condensate size through changes in kinetics or critical concentration (example kanamycin, lipoic acid) or 3) adjust protein/RNA ratio (example kanamycin). Resource Availability Lead Contact: Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Dr. Amy Gladfelter (amyglad@unc.edu). Materials Availability: Materials such as plasmid constructs will be available without further restrictions upon request to the Lead Contact (amyglad@unc.edu). All code is published. Raw and processed sequencing datasets analyzed in this study have been deposited in the Gene Expression Omnibus (GEO) database, https://www.ncbi.nlm.nih.gov/geo/, (accession no. XXXXX). Experimental Model and Subject Details: HEK293, HEK293T, and Vero-E6 cells were obtained from ATCC for this study. 
All cell lines were maintained in DMEM (Corning 10-013-CV) supplemented with 10% Fetal Bovine Serum (Seradigm V500-050) and grown at 37°C. No antibiotics were used.

for 1 hour. Droplets were imaged for seven seconds with one second per frame. Following bleaching with a 405 nm laser, recovery was monitored for at least one minute with one-second frame intervals. Puncta fluorescence recovery was quantified using ImageJ. Fluorescence was expressed relative to both the initial unbleached signal and an unbleached droplet in the same frame. Quantification represents 8 (5′-End) or 9 (Nucleocapsid) droplets from 9 movies, with error bars depicting standard error. To calculate t1/2, data were fit to a rising single exponential function, ft(A, k, x) = A*(1 - exp(x*k)).

condensates were preformed and after 1.5 h incubation, 5 nM Cy5-labeled RNA of interest was added, mixed, and incubated for another 14 hours before imaging.

and incubated at 37°C for 10 minutes, heated to 95°C for 5 minutes, cooled on ice for 2 minutes, and warmed to 37°C for 2 minutes. Proteinase K was then added to 0.5 mg/ml and incubated for 1 hour at 37°C, followed by 1 hour at 55°C. RNA was purified with 1.8× Mag-Bind TotalPure NGS SPRI beads (Omega Bio-tek), purified again (RNeasy MinElute columns, Qiagen), and eluted with 14 µl of nuclease-free water.

Library preparation and Sequencing: Double-stranded DNA (dsDNA) libraries for sequencing were prepared using the randomer Nextera workflow (Smola et al., 2015). Briefly, purified cDNA was added to an NEBNext second-strand synthesis reaction (NEB) at 16°C for 150 minutes. dsDNA products were purified and size-selected with SPRI beads at a 0.8× ratio. Nextera XT (Illumina) was used to construct libraries according to the manufacturer's protocol, followed by purification and size-selection with SPRI beads at a 0.65× ratio.
Library size distributions and purities were verified (2100 Bioanalyzer, Agilent) and sequenced using 2x300 paired-end sequencing on an Illumina MiSeq instrument (v3 chemistry).

were greater than 50,000 and nucleotides with a read depth of less than 5000 were excluded from analysis. The Superfold analysis software (Smola et al., 2015) was used with SHAPE reactivity data to inform RNA structure modeling by RNAstructure (Reuter and Mathews, 2010). Default parameters were used to generate base-pairing probabilities for all nucleotides (with a max pairing distance of 200 nt) and minimum free energy structure models. Local median SHAPE reactivities were calculated over centered sliding 15-nt windows to identify structured RNA regions, defined as regions with local median SHAPE reactivities below the global median. Secondary structure projection images were generated using VARNA (Visualization Applet for RNA) (Darty et al., 2009). A custom RNP-MaP analysis script (Weidmann et al., 2020) was used to calculate RNP-MaP "reactivity" profiles from the ShapeMapper 2 "profile.txt" output. RNP-MaP "reactivity" is defined as the relative MaP mutation rate

were detected based on a set of 44 Sarbecovirus genomes listed in (Jungreis et al., 2020). Genic regions were extracted, translated, and aligned based on the amino acid sequence using Muscle version 3.8.31 (Edgar, 2004). For each gene, sequences with less than 25% identity to the reference SARS-CoV-2 sequence (NC_045512) were removed. A nucleotide-level codon alignment was constructed based on the amino acid alignment, and gene-specific phylogenetic trees were constructed using RAxML version 8.2.12 with the GTRGAMMA model of nucleotide evolution (Stamatakis, 2014). Regions with excess synonymous constraint at a significance level of 1e-5 in ten-codon windows were extracted for further analysis.
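The sliding-window step of the SHAPE analysis described above (local medians over centered 15-nt windows, compared with the global median) can be sketched as follows. This is an illustrative reimplementation, not the Superfold code. It assumes that excluded nucleotides are stored as NaN and that windows are simply truncated at the ends of the RNA. The function names and the toy reactivity profile are hypothetical.

```python
import numpy as np


def local_median(reactivity, window=15):
    """Centered sliding-window median that ignores NaN (masked) positions.

    Assumption: windows are truncated at the 5' and 3' ends.
    """
    r = np.asarray(reactivity, dtype=float)
    half = window // 2
    out = np.full(r.shape, np.nan)
    for i in range(r.size):
        vals = r[max(0, i - half): i + half + 1]
        if np.any(~np.isnan(vals)):
            out[i] = np.nanmedian(vals)
    return out


def structured_mask(reactivity, window=15):
    """True where the local median reactivity is below the global median."""
    r = np.asarray(reactivity, dtype=float)
    return local_median(r, window) < np.nanmedian(r)


# Toy profile: 20 low-reactivity nucleotides followed by 20 high-reactivity ones.
profile = np.array([0.1] * 20 + [1.0] * 20)
mask = structured_mask(profile)
```

In the toy profile the first 20 positions are flagged as structured and the last 20 are not, because the global median (0.55) lies between the two reactivity levels.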
Thirty base pairs of flanking sequence were added on either side of each synonymous constraint element and RNAz 2.1 (Gruber et al., 2010) was used to scan for conserved, stable RNA structures. The rnazWindow.pl script was used to filter alignments and divide them into 120 base pair windows. Secondary structure detection was performed for both strands with SVM RNA-class probability set to 0.1. Percentage AT, mean free energy ∆G, ∆G Z-score, and ensemble diversity were determined in 120 bp sliding windows, where mean free energy ∆G, ∆G Z-score, and ensemble diversity are taken from (Andrews et al., 2020). All windows with center <= 1000 were used for the 5′-End region, and all windows with center >= 13401 and < 14401 were used for the Frameshifting-region.

Genome Analysis: A support vector machine with a linear kernel was trained using the scikit-learn Python library to distinguish between 120-base pair sliding windows in the 5′-End and the Frameshifting-region of SARS-CoV-2 (NC_045512.2) based on the following features: percent A content, percent U content, mean free energy ∆G, ∆G Z-score, and ensemble diversity, where mean free energy ∆G, ∆G Z-score, and ensemble diversity values were taken from (Andrews et al., 2020). Features were scaled before classification to have a mean of 0 and a standard deviation of 1. All windows with center <= 1000 were used for the 5′-End, and all windows with center >= 13401 and < 14401 were used for the Frameshifting-region. The classifier was then applied to all 120 bp windows outside the 5′-End and Frameshifting-region. The probability estimate for assignment of each sliding window to the 5′-End was plotted after linearly rescaling the probabilities, for visualization purposes, to have a maximum of 1 and a minimum of -1. Windows in the 5′-End are plotted with their class labels of 1, and windows in the Frameshifting-region are plotted with their class labels of -1.
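The classification workflow described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the published code. The feature matrices are synthetic stand-ins for the five per-window features. The scaler is assumed to be fit on the training windows only, which the text does not specify. All variable names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for 5 features per 120-nt window
# (%A, %U, free energy dG, dG Z-score, ensemble diversity).
X_five_prime = rng.normal(loc=1.0, scale=1.0, size=(60, 5))    # label +1
X_frameshift = rng.normal(loc=-1.0, scale=1.0, size=(60, 5))   # label -1
X_train = np.vstack([X_five_prime, X_frameshift])
y_train = np.array([1] * 60 + [-1] * 60)
X_rest = rng.normal(loc=0.0, scale=1.5, size=(200, 5))         # rest of genome

# Scale features to mean 0 and standard deviation 1
# (assumption: statistics are taken from the training windows).
scaler = StandardScaler().fit(X_train)

# Linear-kernel SVM with probability estimates enabled.
clf = SVC(kernel="linear", probability=True, random_state=0)
clf.fit(scaler.transform(X_train), y_train)

# Probability of assignment to the 5'-End class for every other window.
col = list(clf.classes_).index(1)
p = clf.predict_proba(scaler.transform(X_rest))[:, col]

# Linear rescaling to the range [-1, 1] for plotting.
score = 2.0 * (p - p.min()) / (p.max() - p.min()) - 1.0
```

Note that `probability=True` makes scikit-learn calibrate probabilities by internal cross-validation, so the estimates can vary slightly with `random_state`; the rescaling affects only the display range, not the ranking of windows.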
Images of representative cells are taken from at least 6 biological replicates pooled from at least 3 independent rounds of transfection/drug treatment. Average fluorescence intensity and area were obtained by thresholding max projections in ImageJ. The number of puncta per cell was counted manually from max projections. For FRAP, fluorescence was normalized by subtracting background fluorescence and expressed relative to both the initial unbleached signal and an unbleached punctum in an unbleached cell in the same frame. Quantification represents 18 puncta from 15 movies with error bars depicting standard error. For measurements of in vitro condensates, experiments were repeated >3 times and droplets were segmented based on a threshold of 4× background intensity. Any segmented region with an area less than 0.07 µm² was removed. The average protein and RNA intensity values within each droplet were calculated, and protein/RNA ratios were determined by dividing these averages on a per-droplet basis. For protein intensity and area, a two-sample Kolmogorov-Smirnov (KS) test was applied to compare protein-only with protein+RNA distributions at each temperature. A two-sample KS test was also used to make pairwise comparisons between each of the protein-only distributions, and between each of the protein+RNA distributions. Similarly, a two-sample KS test was performed to compare protein/RNA ratios. Images were processed in ImageTank (O'Shaughnessy et al., 2019) and plotted with Python using Matplotlib and Seaborn. Statistics were performed in Python with SciPy.
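The droplet quantification and statistical comparison described above can be illustrated with SciPy as follows. This is a sketch under stated assumptions, not the ImageTank pipeline used in the study. It assumes that the threshold is applied to the protein channel, that the background intensity and pixel area are supplied by the user, and that regions are defined by 4-connectivity. The function name, pixel size, and synthetic images are hypothetical.

```python
import numpy as np
from scipy import ndimage
from scipy.stats import ks_2samp


def droplet_ratios(protein, rna, background, pixel_area_um2, min_area_um2=0.07):
    """Segment droplets at 4x background and return (areas, protein/RNA ratios).

    Assumptions: thresholding uses the protein channel; regions smaller
    than min_area_um2 are discarded; ratios are per-droplet mean intensities.
    """
    mask = protein > 4 * background
    labels, n = ndimage.label(mask)
    if n == 0:
        return np.array([]), np.array([])
    idx = np.arange(1, n + 1)
    areas = ndimage.sum(mask.astype(float), labels, idx) * pixel_area_um2
    prot_mean = ndimage.mean(protein, labels, idx)
    rna_mean = ndimage.mean(rna, labels, idx)
    keep = areas >= min_area_um2
    return areas[keep], prot_mean[keep] / rna_mean[keep]


# Synthetic two-channel image: one 5x5 droplet and one single-pixel speck.
protein = np.full((20, 20), 10.0)
rna = np.full((20, 20), 10.0)
protein[2:7, 2:7], rna[2:7, 2:7] = 100.0, 50.0
protein[15, 15] = 100.0
areas, ratios = droplet_ratios(protein, rna, background=10.0, pixel_area_um2=0.01)

# Two-sample KS test between two synthetic per-droplet distributions.
rng = np.random.default_rng(1)
sample_a = rng.normal(0.0, 1.0, 200)
sample_b = rng.normal(1.0, 1.0, 200)
result = ks_2samp(sample_a, sample_b)
```

Here the single-pixel speck (0.01 µm² under the assumed pixel size) falls below the 0.07 µm² cutoff and is dropped, leaving one droplet. In practice, `sample_a` and `sample_b` would be the measured areas, intensities, or protein/RNA ratios from the two conditions being compared.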
[Figure residue: nucleotide sequences of the 5′-End and Frameshifting-region RNAs, extracted from secondary structure diagrams; sequence text omitted.]

Influenza A virus ribonucleoproteins form liquid organelles at endoplasmic reticulum exit sites
An in silico map of the SARS-CoV-2 RNA Structurome
RNA-Based Coacervates as a Model for Membraneless Organelles: Formation, Properties, and Interfacial Liposome Assembly
Reentrant Phase Transition Drives Dynamic Substructure Formation in Ribonucleoprotein Droplets
SARS-CoV-2 (COVID-19) by the numbers
Physical principles of intracellular organization via active and passive phase transitions
Conformational constraint as a means for understanding RNA-aminoglycoside specificity
Spontaneous driving forces give rise to protein-RNA condensates with coexisting phases and complex material properties
Germline P granules are liquid droplets that localize by controlled dissolution/condensation
Accurate detection of chemical modifications in RNA by mutational profiling (MaP) with ShapeMapper 2
Guidelines for SHAPE Reagent Choice and Detection Strategy for RNA Structure Probing Studies
Bats: important reservoir hosts of emerging viruses
Phosphoregulation of phase separation by the SARS-CoV-2 N protein suggests a biophysical basis for its dual functions
Coronavirus nucleocapsid proteins assemble constitutively in high molecular oligomers
Ubiquitin Modulates Liquid-Liquid Phase Separation of UBQLN2 via Disruption of Multivalent Interactions
Organelle-like membrane compartmentalization of positive-strand RNA virus replication factories
Prediction of protein disorder based on IUPred
MUSCLE: multiple sequence alignment with high accuracy and high throughput
The disordered P granule protein LAF-1 drives phase separation into droplets with tunable viscosity and dynamics
Coronaviruses: An Overview of Their Replication and Pathogenesis
Post-translational modifications of coronavirus proteins: roles and function
Coronavirus N protein N-terminal domain (NTD) specifically binds the transcriptional regulatory sequence (TRS) and melts TRS-cTRS RNA duplexes
RNAz 2.0: improved noncoding RNA detection
RNA-Induced Conformational Switching and Clustering of G3BP Drive Stress Granule Assembly by Condensation
Measles virus nucleo- and phosphoproteins form liquid-like phase-separated compartments that promote nucleocapsid assembly
Functional Implications of Intracellular Phase Transitions
Assembly of severe acute respiratory syndrome coronavirus RNA packaging signal into virus-like particles is nucleocapsid dependent
Nucleocapsid protein-dependent assembly of the RNA packaging signal of Middle East respiratory syndrome coronavirus
Matplotlib: A 2D Graphics Environment
Condensation of Ded1p Promotes a Translational Switch from Housekeeping to Stress Protein Production
Phase transition of spindle-associated protein regulate spindle apparatus assembly
Sarbecovirus comparative genomics elucidates gene content of SARS-CoV-2 and functional impact of COVID-19 pandemic mutations
Crystal structure of SARS-CoV-2 nucleocapsid protein RNA binding domain reveals potential unique drug targeting sites
The Architecture of SARS-CoV-2 Transcriptome
Ultrastructural characterization of arterivirus replication structures: reshaping the endoplasmic reticulum to accommodate viral RNA synthesis
Hexanediol: A Chemical Probe to Investigate the Material Properties of Membrane-Less Compartments
Functional analysis of the murine coronavirus genomic RNA packaging signal
mRNA structure determines specificity of a polyQ-driven phase separation
Molecular description of the LCST behavior of an elastin-like polypeptide
Unstructured mRNAs form multivalent RNA-RNA interactions to generate TIS granule networks
RNA buffers the phase separation behavior of prion-like RNA binding proteins
Valence and patterning of aromatic residues determine the phase behavior of prion-like domains
Coronavirus genomic RNA packaging
The Coronavirus Nucleocapsid Is a Multifunctional Protein
Thermal mapping of the airways in humans
Identification of a specific interaction between the coronavirus mouse hepatitis virus A59 nucleocapsid protein and packaging signal
Phase separation by low complexity domains promotes stress granule assembly and drives pathological fibrillization
Transmissible gastroenteritis coronavirus genome packaging signal is located at the 5' end of the genome and promotes viral RNA incorporation into virions in a replication-independent process
RNA basepairing complexity in living cells visualized by correlated chemical probing
SARS-CoV-2 nucleocapsid protein phase-separates with RNA and with human hnRNPs
Characterization of the coronavirus M protein and nucleocapsid interaction in infected cells
Negri bodies are viral factories with properties of liquid organelles
Phase transition of a disordered nuage protein generates environmentally responsive membraneless organelles
Virus factories: associations of cell organelles for viral replication and morphogenesis
Software for lattice light-sheet imaging of FRET biosensors, illustrated with a new Rap1 biosensor
Sequence Determinants of Intracellular Phase Separation by Complex Coacervation of a Disordered Protein
A Liquid-to-Solid Phase Transition of the ALS Protein FUS Accelerated by Disease Mutation
Phosphorylation of the arginine/serine dipeptide-rich motif of the severe acute respiratory syndrome coronavirus nucleocapsid protein modulates its multimerization, translation inhibitory activity and cellular localization
Sequence heuristics to encode phase behaviour in intrinsically disordered protein polymers
RNAstructure: software for RNA secondary structure prediction and analysis
Functional organization of cytoplasmic inclusion bodies in cells infected by respiratory syncytial virus
Selective 2'-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis
RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies
The intracellular sites of early replication and budding of SARS-coronavirus
A persistently infecting coronavirus in hibernating Myotis lucifugus, the North American little brown bat
Disparate Temperature-Dependent Virus-Host Dynamics for SARS-CoV-2 and SARS-CoV in the Human Respiratory Epithelium. bioRxiv
Aerosol and surface stability of HCoV-19 (SARS-CoV-2)
The Coronavirus Nucleocapsid Protein Is Dynamically Associated with the Replication-Transcription Complexes
SciPy 1.0: fundamental algorithms for scientific computing in Python
A Molecular Grammar Governing the Driving Forces for Phase Separation of Prion-like RNA Binding Proteins
Analysis of RNA-protein networks with RNP-MaP defines functional hubs on RNA
Small Molecules for Modulating Protein Driven Liquid-Liquid Phase Separation in Treating Neurodegenerative Disease
Glycogen synthase kinase-3 regulates the phosphorylation of severe acute respiratory syndrome coronavirus nucleocapsid protein and viral replication
RNA Controls PolyQ Protein Phase Transitions

We thank Rick Young, Phil Sharp, Alex Holehouse, Kathleen Hall, Andrea Sorrano, Ahmet Yildez and their lab members for sharing data and discussions, Timothy Mitchison for discussions and critical reading of the manuscript. David Adalsteinsson for his help with ImageTank software, Ian Seim for analysis consultation and discussions, Benjamin Stormo for Frameshifting-region