key: cord-0996715-667urmpq authors: Masters, Paul S. title: Coronavirus genomic RNA packaging date: 2019-08-30 journal: Virology DOI: 10.1016/j.virol.2019.08.031 sha: cab98f1b37768539e98924c21710fe2133cf1cf0 doc_id: 996715 cord_uid: 667urmpq
RNA viruses carry out selective packaging of their genomes in a variety of ways, many involving a genomic packaging signal. The first coronavirus packaging signal was discovered nearly thirty years ago, but how it functions remains incompletely understood. This review addresses the current state of knowledge of coronavirus genome packaging, which has mainly been studied in two prototype species, mouse hepatitis virus and transmissible gastroenteritis virus. Despite the progress that has been made in the mapping and characterization of some packaging signals, there is conflicting evidence as to whether the viral nucleocapsid protein or the membrane protein plays the primary role in packaging signal recognition. The different models for the mechanism of genomic RNA packaging that have been prompted by these competing views are described. Also discussed is the recent exciting discovery that selective coronavirus genome packaging is critical for in vivo evasion of the host innate immune response.
The concluding stages of viral replication typically involve assembly and egress of progeny virions from the infected cell. At this point viruses must address a critical issue consequential to their parasitic status: genome packaging. The most fundamental formulation of the packaging problem is: how do viruses ensure the specific incorporation of their genomes into assembled virions, faced with competition from numerous other nucleic acid species in the infected cell? For RNA viruses (most of which are cytoplasmic), the solutions that have evolved to overcome this problem vary considerably, according to the needs imposed by the wide range of lifestyles and replication strategies that RNA viruses can follow.
A broad array of positive-strand RNA viruses (picornaviruses, flaviviruses, alphaviruses, coronaviruses, and retroviruses) and negative-strand RNA viruses (rhabdoviruses, paramyxoviruses, and orthomyxoviruses) are generally held to selectively package genome-length RNA of the correct polarity and to exclude other viral and host species. This process occurs with a high degree of efficiency, although noteworthy exceptions can be found. For example, retroviruses package abundant amounts of host 7SL RNA (Onafuwa-Nuga et al., 2006), and arenaviruses incorporate entire ribosomes (Sarute and Ross, 2017). Specific packaging is often mediated by well-defined cis-acting RNA sequences or structures within the viral genome, designated packaging signals (PSs). One of the most intensively studied PS elements, named psi, is contained in the 5′ leader region of the HIV genome. The mechanism by which psi functions remains to be fully resolved, but recent studies have revealed that genome dimerization promotes the folding of psi and neighboring RNA elements into a conformer in which the genomic splice donor site is sequestered (Keane et al., 2015, 2016). These and other structural features suggest how the viral Gag protein selects unspliced genomic RNA and how packaging is coupled to dimerization of the pseudodiploid retroviral genome. Another well characterized PS occurs within the nsP1 coding region of the genome of Venezuelan equine encephalitis virus and is highly conserved in many other alphaviruses. This element consists of a cluster of short stem-loop structures, each containing a GGG loop motif, which appear to contribute additively to the efficiency of genome packaging (Kim et al., 2011). An additional level of complexity is encountered in the packaging of segmented viral genomes.
Although bi- and tri-segmented viruses may depend upon stochastic processes in order to accumulate virions with a complete genome, a recent analysis by single-molecule fluorescent in situ hybridization has shown that most individual influenza virions have a full complement of eight genome segments (Chou et al., 2012). Segment-specific PSs map near the termini of each segment, but how the network of segment-segment interactions is established is not yet precisely defined (Bolte et al., 2019). Influenza viruses, as well as rhabdoviruses and paramyxoviruses, also provide examples of an important distinction that can exist for some enveloped viruses: the difference between encapsidation and packaging. For these viruses, both genomic and antigenomic RNA are encapsidated into helical ribonucleoproteins, but only genomic RNA is packaged into virions.
https://doi.org/10.1016/j.virol.2019.08.031 Received 26 July 2019; Received in revised form 29 August 2019; Accepted 29 August 2019
A dedicated genomic structural element is not the only means by which an RNA virus can determine packaging specificity. Picornaviruses and flaviviruses, most of which lack a PS, tightly couple packaging to RNA replication (Nugent et al., 1999; Barrows et al., 2018). In particular, for poliovirus no clearly defined PS has ever been uncovered, despite extensive searches (Jiang et al., 2014). Moreover, exhaustive recoding experiments appear to rule out the existence of cryptic PSs in the poliovirus genome (Song et al., 2017). This has led to the hypothesis that polio capsid protein-replicase interactions are sufficient to ensure the specificity of genome packaging. An alternative view that is developing for many other icosahedral positive-strand RNA viruses is that there exist multiple dispersed (and in many cases, poorly defined) genomic PSs.
These are postulated to serve as recognition sites for capsid subunits, which cause collapse of genomic RNA into a more condensed configuration through protein-protein interactions. The resulting compacted intermediate structure then recruits additional capsid subunits to complete assembly of the virion shell (Borodavka et al., 2012; Mendes and Kuhn, 2018). Coronaviruses (CoVs) are a family of enveloped RNA viruses that infect mammals and birds, causing mainly respiratory and gastrointestinal diseases (Masters and Perlman, 2013). They have long been studied, in part because of their significant impact on livestock and companion animals. In humans, four coronaviruses, HCoV-229E, HCoV-NL63, HCoV-OC43, and HCoV-HKU1, generally cause upper respiratory tract infections and are prevalent worldwide. Of paramount concern to human health, however, are two deadly coronaviruses that emerged just in this century (de Wit et al., 2016). The first of these, the causative agent of severe acute respiratory syndrome (SARS), originated in bats. SARS-CoV caused an outbreak of atypical pneumonia that began in China in late 2002, spread worldwide, and was extinguished by mid-2003 through the rigorous application of public health measures. All told, the SARS epidemic was characterized by high human-to-human transmission and a case-fatality rate of 10%. Although SARS-CoV was eliminated from the human population, very closely related coronaviruses persist in bat reservoirs, posing a constant threat for reemergence. The second most noteworthy human coronavirus is the one responsible for Middle East respiratory syndrome (MERS), an outbreak of severe pneumonia that began in Saudi Arabia in 2012. At present, MERS is confined to the Arabian Peninsula, from where cases have sporadically dispersed to other parts of the world. Like SARS-CoV, MERS-CoV has an ancestral origin in bats, but its proximal reservoir is dromedary camels, and transmission from this source is ongoing.
In contrast to SARS-CoV, MERS-CoV does not spread as efficiently through human-to-human contact; nevertheless, it has an alarming case-fatality rate of 36%. The emergence of these two previously unknown coronaviruses has stimulated efforts to fully understand all aspects of the molecular biology of members of this family. The virions of coronaviruses contain four conserved structural proteins (Fig. 1): the spike (S) protein, which governs binding to host cell receptors and virus entry into cells; the membrane (M) protein and the envelope (E) protein, which mediate virion budding; and the nucleocapsid (N) protein, which together with genomic RNA (gRNA) constitutes the nucleocapsid. S, E, and M are embedded in a membrane envelope derived from the site of budding, the Golgi-endoplasmic reticulum intermediate compartment. The nucleocapsid resides in the interior of the virion envelope. The genomes of coronaviruses are nonsegmented, positive-sense RNA molecules of exceptional length (25-32 kb), with 5′ caps and 3′ polyadenylate tails (Fig. 1). The downstream end of the genome contains the genes for the structural proteins in the invariant order S-E-M-N, as well as interspersed accessory genes. The 5′ two-thirds of the genome contains two open reading frames (rep1a and rep1b) encoding the viral replicase-transcriptase polyprotein, which is expressed by a ribosomal frameshifting mechanism. The replicase is cotranslationally processed into 15 or 16 subunits designated nonstructural proteins (nsps) that carry out RNA-synthesis functions common to most RNA viruses, as well as a collection of enzymatic activities unique to coronaviruses and other members of the nidovirus order (Ziebuhr, 2005). There are three aspects of coronavirus molecular biology most critical to a consideration of genome packaging. First, viral infection entails not only replication of gRNA but also production of a set of subgenomic (sg) RNAs that serve as mRNAs for the downstream genes.
The sgRNAs, synthesized via negative-strand intermediates, form a nested set, each member of which contains a 5′ genomic leader RNA fused to a segment of the 3′ end of the genome (Fig. 1). Second, coronaviruses have helically symmetric nucleocapsids (Masters, 2006; Neuman et al., 2006; Gui et al., 2017), a feature that is unusual for positive-strand RNA viruses, which generally have icosahedral capsids. Third, the assembly of the nucleocapsid into the virion occurs through interactions between the carboxy termini of multiple monomers of N protein and of M protein (Escors et al., 2001; Hurst et al., 2005; Verma et al., 2006). For coronaviruses, the packaging problem amounts to achieving highly specific recognition and selection of gRNA from among an abundance of viral sgRNAs, cellular mRNAs, and other RNA species in infected cells. Although there have been some reports suggesting packaging of sgRNA (Hofmann et al., 1990; Sethna et al., 1989; Zhao et al., 1993), it has generally been observed that stringently purified virions almost exclusively contain positive-sense gRNA (Makino et al., 1990; Escors et al., 2003; Kuo and Masters, 2013). In the relatively few cases where it has been examined in detail, this selectivity has been found to be accomplished by means of a genomic PS. The PSs of coronaviruses are not broadly conserved across the family. Coronaviruses are taxonomically classified into four genera (alpha-, beta-, gamma-, and deltacoronaviruses), and these are further sorted into lineages or subgenera. In particular, the betacoronaviruses comprise four phylogenetic lineages. The first of these, the lineage A betacoronaviruses, provided the foundation for coronavirus packaging studies and, so far, have remained their principal focus. The discovery and initial characterization of a coronavirus PS was carried out with the prototype lineage A betacoronavirus, mouse hepatitis virus (MHV), through experiments with defective-interfering (DI) RNAs.
DI RNAs are extensively deleted gRNAs that retain the necessary cis-acting RNA elements that allow them to propagate by parasitizing the replicative machinery of the parental virus from which they originate. Comparisons of naturally arising and engineered DI RNAs that were either able or unable to be packaged into purified virions allowed the localization of the MHV PS to a portion of the replicase gene (Makino et al., 1990; van der Most et al., 1991). A more detailed study finely mapped the PS to a 190-nt segment of rep1b (Fosmire et al., 1992). This locus is some 20 kb downstream from the 5′ end of the genome, distant from cis-acting RNA elements required for viral RNA replication. As would be expected, it only appears in gRNA and not in the nested set of sgRNAs. Further mutational analysis identified a proposed 69-nt substructure within the 190-nt segment as the minimal functional PS (Fosmire et al., 1992). However, subsequent studies found that larger forms of the PS were more efficient for packaging of DI RNAs or heterologous RNAs into virions of helper virus or into virus-like particles (VLPs) formed from expressed structural proteins (Bos et al., 1996; Narayanan and Makino, 2001). Moreover, it was shown that bovine coronavirus (BCoV), another lineage A betacoronavirus, recognizes the PS of MHV, and reciprocally, MHV recognizes the PS of BCoV. A proposed model for the MHV PS was constructed through analysis of the folding of the homologous region from all lineage A betacoronavirus genomes (Chen et al., 2007). This structure, a 95-nt bulged stem-loop, contains four copies of an AGC/GUAAU motif, each displaying an AA (or GA) bulge on its 3′ side (Fig. 2A). These repeat units are spaced at regular intervals, with a two-fold quasi-symmetry centered around an internal loop. It was noted by Chen et al. (2007) that their model has a different RNA fold than the previously reported 69-nt substructure of Fosmire et al.
(1992), and it is not in complete accord with that earlier study. The 95-nt bulged stem-loop structure was shown to be in good agreement with in vitro chemical and enzymatic probing experiments. It is also strongly supported by phylogenetic criteria (Fig. 2B). The terminal loop and the repeat units are absolutely conserved in every currently known lineage A betacoronavirus. All other stem segments are preserved by covariation of base pairs. In contrast, wider divergence is allowed in the central internal loop, where numerous nucleotide substitutions or small insertions occur. Notably, the PS of equine coronavirus (EqCoV), a close relative of BCoV, contains a deletion that precisely encompasses the central bulge and one of the repeat units (Fig. 2B). The betacoronavirus lineage A PS falls in a region of rep1b that codes for the nsp15 subunit of the replicase-transcriptase (Deng and Baker, 2018). The PS RNA encodes a segment of nsp15 that is absent from the nsp15 sequences of betacoronavirus lineages B, C, and D, as well as those of alpha-, gamma-, and deltacoronaviruses (Fig. 2C). The encoded polypeptide corresponds to a surface loop of MHV nsp15 (Xu et al., 2006) that does not exist in the nsp15 structures of SARS-CoV (Joseph et al., 2007) or MERS-CoV (Zhang et al., 2018). Thus, there is no basis for the presumption that the same region of nsp15 contains the PS for either SARS-CoV (Hsieh et al., 2005) or MERS-CoV (Hsin et al., 2018). The ancillary positioning of the PS-encoded protein loop suggests that this PS was a late acquisition during the evolution of the coronavirus family. Alternatively, it remains possible that this PS was originally present in a common ancestor of all coronaviruses and was subsequently lost from most of them; however, the advantage provided in vivo by selective gRNA packaging (see below) makes such a scenario less likely.
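The repeat-unit description of the 95-nt stem-loop lends itself to a simple sequence scan. The following is a minimal sketch, not a method used in any of the studies cited: it reports every position at which a short motif, such as the GUAAU half of the AGC/GUAAU repeat unit, occurs in an RNA string. The function name `find_motif` and the toy sequence are hypothetical and serve only as illustration; the sequence is not taken from any viral genome. A primary-sequence match also says nothing about whether the motif is displayed in the proposed secondary structure.

```python
import re

def find_motif(rna: str, motif: str = "GUAAU") -> list[int]:
    """Return 0-based start positions of every motif match, overlaps included.

    `motif` is treated as a regular expression, so a degenerate motif can be
    written with a character class, e.g. "UUCCG[UC]".
    """
    # Accept DNA-style input by normalizing case and converting T to U.
    sequence = rna.upper().replace("T", "U")
    # A zero-width lookahead reports overlapping matches as well.
    pattern = re.compile(f"(?={motif})")
    return [match.start() for match in pattern.finditer(sequence)]

# Hypothetical toy sequence (NOT a real packaging signal).
toy = "CCGUAAUGGAAGCUUGUAAUCC"
print(find_motif(toy))          # positions of the GUAAU half of the repeat
print(find_motif(toy, "AGC"))   # positions of the AGC half of the repeat
```

Counting and spacing the reported positions is one way to see whether candidate repeat units recur at regular intervals, which would then have to be checked against a structural model.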
Curiously, previously unnoticed PS elements appear elsewhere in the genomes of two lineage A betacoronaviruses. Three of the four isolates of rabbit coronavirus (RbCoV-HKU-14; Lau et al., 2012) contain a 116-nt insertion that, it turns out, harbors a duplication of the upper half of the native PS (Fig. 2D). Remarkably, the remainder of the insertion carries an additional AGC/GUAAU repeat unit. Similarly, a duplication of the terminal segment of the PS can be found in a North American isolate of EqCoV (Zhang et al., 2007), but not in two Japanese isolates (Nemoto et al., 2015). Both the RbCoV and the EqCoV partial PS duplications occur at exactly the same site in a highly variable region of the replicase nsp3 subunit, situated between two domains designated NAB and G2M (Neuman et al., 2008). The functional significance of these duplications, if any, remains to be determined. As discussed above, other lineages and genera of coronaviruses clearly do not possess a counterpart of the lineage A betacoronavirus PS. Among these, the only species for which packaging has been examined in depth is the alphacoronavirus transmissible gastroenteritis virus (TGEV). TGEV PS studies began with the characterization of three naturally arising DI RNAs that were found to be efficiently packaged (Méndez et al., 1996). Notably, DI RNA-containing particles could be separated from those of virions by density gradient centrifugation, indicating that they were packaged independently of full-length gRNA. However, it is not clear if this is a general property of other packaged non-genomic RNAs, since even the smallest of the DI RNAs analyzed was quite large (9.7 kb). Dissection of smaller constructed DI RNAs, accompanied by stringent virion purification, led to the mapping of the TGEV PS to a segment spanning nucleotides 100-649 from the 5′ end of the viral genome (Escors et al., 2003). This same segment was shown to contain the 5′ cis-acting elements necessary for RNA replication.
A subsequent study, which assayed packaging by duplication of the packaging region within a sgRNA of an engineered virus, reduced the downstream boundary to nt 598 (Fig. 3A) (Morales et al., 2013). Further limitation of the RNA structures required for virion incorporation could not be attained through deletion analysis with this system, leading the authors to conclude that the TGEV PS encompasses the entire 5′-most 598 nt and possibly other parts of the genome. If correct, this would indicate that the TGEV PS is considerably more complex than that of MHV. A different, but more speculative, perspective on the PSs of TGEV and other alpha- and betacoronaviruses came from an analysis of the predicted RNA structures of the genomic 5′ ends of all coronaviruses (Chen and Olsthoorn, 2010). Among the features that are conserved across all genera is an articulated bulged stem-loop, denoted SL5, which straddles the 5′ UTR-nsp1 boundary. In the alphacoronaviruses, as well as in all betacoronavirus lineages except lineage A, a set of substructures emerge from the apex of SL5. For TGEV, these SL5 insertions consist of three smaller stem-loops displaying repeats of the motif UUCCG(U/C) (Fig. 3B); three highly similar repeat units appear in all other alphacoronaviruses. Moreover, for the lineage B, C, and D betacoronaviruses, including SARS-CoV and MERS-CoV, SL5 contains two or three repeating substructures, each with the loop motif UUUCGU. The repetitive nature of these elements, akin to that of the lineage A betacoronavirus PS (above) and to those of the PSs of alphaviruses (Kim et al., 2011), led to the proposal that these are the corresponding PSs of the alpha- and betacoronaviruses in which they occur (Chen and Olsthoorn, 2010). A straightforward test of this hypothesis by reverse genetics has yet to be carried out.
Fig. 1. Top, a schematic of the coronavirus genome (gRNA), typified by that of mouse hepatitis virus (MHV). The 5′ end contains the rep1a and rep1b genes, which encode the viral replicase-transcriptase, while the 3′ end contains the structural protein genes S, E, M, and N. Also at the 3′ end of the genome are accessory genes (unlabeled), the locations and numbers of which vary for different viruses; those shown are for MHV. Under the genome, to the right, are the nested set of subgenomic mRNAs (sgRNAs), of which MHV has six. To the left is a schematic of the virion, depicting the essential structural proteins: spike protein (S); membrane protein (M); envelope protein (E); and nucleocapsid protein (N).
Gamma- and deltacoronaviruses, on the other hand, do not contain a version of the SL5 loop insertions. For the prototype gammacoronavirus, infectious bronchitis virus (IBV), an early study was able to show packaging of a constructed 5.9-kb DI RNA composed of a mosaic of genome segments, although further localization of a PS was not pursued (Dalton et al., 2001). For the deltacoronaviruses, genome packaging studies may be delayed by the fact that currently only one species from this genus, porcine deltacoronavirus (PDCoV), can be propagated in vitro (Hu et al., 2015). However, the hallmarks of a coronavirus PS appear to be its insertion into a highly conserved region of the replicase gene and its display of repeated units of RNA sequence and structure. These criteria draw attention to a 104-nt RNA segment of the rep1b gene of some deltacoronaviruses that contains six small stem-loops, five of which present a repeated GUAC loop motif (Fig. 3C). This segment falls within a 40-amino-acid insertion at the nsp13-nsp14 junction that is present in roughly half of the known deltacoronavirus species, including bulbul coronavirus (BuCoV), but is absent from other members of the genus, including PDCoV. The actual in situ role of the PS in the coronavirus genome, rather than in DI RNAs, was explored through the construction of MHV mutants in which this element was altered, deleted, or relocated (Kuo and Masters, 2013).
A mutant, named silPS, was created through the introduction of 20 coding-silent mutations into the PS (Fig. 4A) . This allowed the complete disruption of PS RNA primary sequence and secondary structure in a manner that did not modify the product of the nsp15 gene in which it resides. Analysis of extensively purified virions revealed that the silPS mutant, in contrast to the wild type, packaged large amounts of sgRNAs in addition to gRNA (Fig. 4A) . The same loss of RNA packaging specificity was observed in another mutant in which the PS was entirely deleted. Moreover, it was shown that the PS was fully functional if deleted at its native site and transposed to another locus downstream of rep1b (but still unique to gRNA). It should be noted that in infected cells, all of these mutants synthesized the same complement of gRNA and sgRNAs, and in the same amounts, as the wild type. Contrary to initial expectations, the silPS or PS-deletion mutations had little to no effect on viral plaque size or growth kinetics in tissue culture. Virions of silPS did not have a measurably different particle-to-PFU ratio compared to that of the wild type. Nevertheless, growth competition experiments demonstrated that viruses containing the PS had a fitness advantage over mutants lacking this element (Kuo and Masters, 2013) . Similarly, in a separate study, a mutant was constructed in which most of the PS was replaced by a segment encoding an epitope tag and its complement (Athmer et al., 2017) . This mutant also packaged greatly elevated amounts of sgRNA into virions, and it was demonstrated to be out-competed by wild-type MHV during serial passaging in tissue culture. Thus, the different PS mutants showed that in coronavirus replication the function of the PS is to ensure the selective incorporation of gRNA into virions, as opposed to being strictly required for gRNA packaging. 
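The idea behind a coding-silent mutant such as silPS can be illustrated with a short check: a set of nucleotide changes is coding-silent if the mutated open reading frame still translates to the same polypeptide. The following is a minimal sketch under stated assumptions, not the procedure used to design silPS. The helper names (`translate`, `is_coding_silent`) and the 9-nt toy sequences are hypothetical, the standard genetic code is assumed, and the check ignores the RNA secondary structure that the real mutations were chosen to disrupt.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def normalize(seq: str) -> str:
    """Upper-case the sequence and convert DNA-style T to U."""
    return seq.upper().replace("T", "U")

def translate(rna: str) -> str:
    """Translate an in-frame RNA sequence; a trailing partial codon is ignored."""
    rna = normalize(rna)
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))

def is_coding_silent(wild_type: str, mutant: str) -> tuple[bool, int]:
    """Return (same protein?, number of nucleotide differences)."""
    wt, mut = normalize(wild_type), normalize(mutant)
    if len(wt) != len(mut):
        raise ValueError("sequences must have equal length (substitutions only)")
    changes = sum(a != b for a, b in zip(wt, mut))
    return translate(wt) == translate(mut), changes

# Hypothetical toy reading frames (NOT the MHV packaging signal).
wt_toy = "AUGGCUUCG"    # Met-Ala-Ser
mut_toy = "AUGGCAAGC"   # Met-Ala-Ser, encoded by different codons
print(is_coding_silent(wt_toy, mut_toy))
```

In this toy case four of nine nucleotides differ while the encoded tripeptide is unchanged, which is the property that allowed the PS RNA to be disrupted without modifying the nsp15 product.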
The minimally defective growth phenotype of PS mutants was at first surprising, because the prior studies with DI RNAs had fixed the notion that the PS is necessary for packaging of an RNA species into virions. However, the latter may only be true for DI RNAs, because in order to be packaged they must compete with the (PS-containing) genomes of helper viruses. In the intact viral genome the function of the coronavirus PS resembles that of the alphavirus PS, which confers selective packaging of its gRNA over that of a single, more numerous sgRNA. PS-negative alphavirus mutants, like their coronavirus counterparts, package both gRNA and sgRNA (Kim et al., 2011) . In contrast, PS-negative alphavirus mutants have much more severely growth-defective phenotypes, likely reflecting a limitation on the amount of RNA that can be accommodated by an icosahedral capsid. The helical coronavirus nucleocapsid, surrounded by a pleomorphic membrane envelope, is apparently considerably more tolerant to the inclusion of extraneous RNA. The minimal phenotype of MHV packaging mutants in tissue culture raised the question of whether packaging selectivity has a significant function. To address this issue, Athmer et al. (2018) examined viral replication in the mouse and discovered that packaging plays a decisive role in evasion of host innate immunity. This was most dramatically demonstrated by constructing the set of 20 silPS mutations in the highly neurovirulent JHM strain of MHV. Mice infected with the MHV-JHM-silPS mutant had decreased weight loss and greatly enhanced survival, compared to the substantial weight loss and nearly 100% mortality of those infected with wild-type MHV-JHM. Additionally, the mutant produced lower viral titers in the brains of infected mice and was cleared more rapidly than the wild type. 
This attenuation was shown to be strongly dependent upon type-I interferon (IFN) signaling, as virulence of the MHV-JHM-silPS mutant was almost completely restored in IFN alpha/beta receptor-knockout (IFNAR -/-) mice. However, the mutant remained attenuated in either mitochondrial antiviral signaling protein-knockout (MAVS -/-) mice or in toll-like receptor 7-knockout (TLR7 -/-) mice. These results indicated that neither MAVS nor TLR7 signaling alone can account for attenuation, suggesting that the two pathways are redundant or else that other circuitry is involved in initiation of the IFN response to the packaging mutant virus. One interpretation of these findings would be that packaged sgRNA exhibits a pathogen-associated molecular pattern detectable by an innate immune sensor that is somehow blind to gRNA (Athmer et al., 2018) . This might suggest that MHV genome replication introduces base modifications to gRNA that are absent from, or inefficiently applied to, sgRNAs during transcription. A different, but less likely, possibility is that the PS itself antagonizes an innate immune sensor. Irrespective of mechanism, the critical in vivo impairment of MHV packaging mutants argues that gRNA packaging selectivity must be an essential property of all coronaviruses. Alternatively, coronaviruses that fail to maintain gRNA packaging selectivity would need to have evolved activities that suppress the particular components of innate immunity that become activated by packaging-negative virions. The two candidates for a viral molecular partner that recognizes the PS are the N protein and the M protein. The coronavirus N protein (ca. 50 kDa) comprises two highly basic, independently-folded domains, designated NTD and CTD, and a mostly acidic carboxy-terminal domain, N3 (Fig. 4B) . 
The NTD and CTD are each RNA-binding modules, and they exhibit high structural homology among different species across multiple genera (Chang et al., 2014); additionally, the CTD serves as the dimerization domain for N molecules. Spacer segments appear at the amino terminus and linking the three domains of N; the central spacer contains a serine- and arginine-rich region (SR). In contrast to the NTD and CTD, N3 and the spacer segments are thought to be intrinsically disordered polypeptides. The M protein (ca. 30 kDa) consists of a small ectodomain, three transmembrane domains, and a large endodomain (Fig. 4B). The endodomain has a compact globular structure, only the final 20 residues of which constitute an accessible carboxy-terminal tail. The functional units of M are also dimers, which are connected into larger oligomeric arrays in the membrane (Neuman et al., 2011). N protein wraps the viral genome into a helical nucleocapsid that is bound to the network of M endodomains at the site of virion budding, which is the endoplasmic reticulum-Golgi intermediate compartment. N and M molecules associate with each other via their respective carboxy termini (Fig. 4B). Genetic and biochemical analyses have localized critical residues in both N (Hurst et al., 2005; Verma et al., 2006) and M (Escors et al., 2001; Kuo and Masters, 2002; Verma et al., 2007) that are essential for these assembly interactions. In virion ultrastructural studies, the N-M interaction has been visualized as thread-like connections between the two proteins (Bárcena et al., 2009; Neuman et al., 2011). M is the most abundant virion structural protein, present in a roughly 1.4:1 molar ratio to N protein (recomputed from Sturman et al., 1980). Thus, in principle, there are sufficient M endodomain tails on the internal surface of the virion membrane to accommodate an interaction with every domain N3 in the nucleocapsid.
However, there is structural evidence that virion M endodomains actually exist in two conformations, compact and elongated, with only the latter making contact with the nucleocapsid (Neuman et al., 2011).
Fig. 4. (A) Left, the primary sequence and RNA secondary structure of the wild-type (wt) MHV PS was disrupted with 20 coding-silent mutations to create a mutant designated silPS (Kuo and Masters, 2013). Altered nucleotides are those highlighted in blue in the wt PS structure and in green in the silPS structure. Right, a Northern blot of RNA from highly purified virions of wt MHV and the silPS mutant detected with a probe specific for the 3′ end of the genome, which is common to gRNA and all sgRNAs. (B) Linear schematics of the MHV N and M proteins. In the N protein, there are two RNA-binding structural domains (NTD and CTD) and the carboxy-terminal domain (N3). At the amino terminus, and connecting the three domains, are unstructured segments; the central spacer contains a serine- and arginine-rich region (SR). In the M protein, the amino-terminal ectodomain (ecto) is connected to the carboxy-terminal endodomain (endo) by three transmembrane segments (Tm). Red, loci of substitutions or point mutations of N protein, in either the CTD (Kuo et al., 2014) or in N3 (Kuo et al., 2016b), which separately abolish packaging selectivity in viruses that contain a wt PS. Green, loci of critical residues in N protein (Hurst et al., 2005; Verma et al., 2006) and M protein (Escors et al., 2001; Kuo and Masters, 2002; Verma et al., 2007) essential for N-M virion assembly interactions (arrow).
Although N protein must possess broadly nonspecific RNA-binding activity in order to coat the entire viral genome, efforts have been made to identify sequence-specific RNA substrates that may be relevant to packaging or other functions. Molenkamp and Spaan (1997) were able to demonstrate specific binding to the MHV PS by N protein in lysates from MHV-infected cells, using either electrophoretic mobility shift assays or UV crosslinking followed by immunoprecipitation. Specificity for the PS RNA was also observed by the latter method with N purified from virions or N expressed in the absence of other viral proteins. A contrasting view of this interaction came from another study, which made use of a filter-binding assay to gauge the affinity of RNA binding by N protein in MHV-infected cell lysates. This was applied to RNA substrates spanning the entire 5.5 kb of one of the smaller of the packaged MHV DI RNAs (van der Most et al., 1991). From this work it was found that, although the PS was bound more efficiently than non-viral RNA, the regions of the DI RNA having the highest efficiency of binding by N protein were located outside of the PS. This result did not appear to support a role in which the PS serves simply as the primary nucleation site for gRNA encapsidation. The determination that there are multiple genomic regions of preferred binding by N protein was consistent with the observation by multiple groups that anti-N antibodies coimmunoprecipitated all positive-sense viral RNAs, both gRNA and all sgRNAs, from MHV- or BCoV-infected cell lysates (Baric et al., 1988; Narayanan et al., 2000). Further pursuit of this observation by Narayanan et al. (2000) led to unexpected results. These investigators characterized the N-M interaction in infected cells, showing that anti-N antibody could coimmunoprecipitate M protein, and conversely, anti-M antibody could coimmunoprecipitate N protein. Moreover, pulse-labeling experiments demonstrated that M protein brought down by anti-N antibody was mostly unglycosylated; the O-glycosylation of MHV M protein occurs in the Golgi. Therefore, M and N associated in a pre-Golgi compartment assumed to be the virion budding site.
Most significantly, in contrast to the precipitation of the full spectrum of viral RNA species by anti-N antibody, anti-M antibody coimmunoprecipitated only gRNA. Finally, it was found that the N-M interaction could not be initiated merely by coexpression of N and M, suggesting a requirement for some other viral component. In a follow-up report, the PS was identified as that other required viral component (Narayanan and Makino, 2001). Analysis of a large set of constructed DI RNAs demonstrated that all of them were intracellularly associated with N protein, irrespective of the presence or absence of the PS. However, intracellular association of M protein with a given DI RNA correlated completely with the packaging ability of that DI RNA, which, in turn, was absolutely dependent upon the presence of the PS. Furthermore, it was shown that in MHV-infected cells N protein became bound to an expressed (nonreplicating) nonviral RNA (CAT gene RNA), whether or not it harbored the PS. Only the version of CAT gene RNA containing the PS was bound by M protein and was packaged into virions. This work was brought to a surprising conclusion in a subsequent study in which the basic coimmunoprecipitation experiments were recapitulated using only MHV components that were expressed from Sindbis virus vectors, in addition to RNA substrates that were produced from transfected plasmids with vaccinia virus-expressed T7 RNA polymerase (Narayanan et al., 2003). Here it was shown that anti-M antibody precipitated CAT gene (PS+) RNA, but not CAT gene (PS-) RNA, from cells that were expressing both M protein and N protein. Remarkably, the same result was obtained if only M protein, but not N protein, was expressed. Finally, it was demonstrated that CAT gene (PS+) RNA was packaged into VLPs produced solely by expression of the MHV M and E proteins, in the absence of N protein.
These results strongly implied that M protein (possibly with the assistance of E) is the viral component that recognizes and binds to the PS, and that formation of a nucleocapsid with N protein is not required for packaging specificity.

The presence in the coronavirus N protein of two distinct RNA-binding domains, the NTD and the CTD, is an unusual characteristic for an RNA virus nucleocapsid protein (Chang et al., 2014). To probe the functional significance of this arrangement, a genetic study was carried out in which each of these domains in the MHV N protein was substituted by its counterpart from the SARS-CoV N protein (Kuo et al., 2014). The NTDs and CTDs of MHV and SARS-CoV have 44% and 35% amino-acid identity, respectively. This degree of homology was expected to be sufficiently close to allow such substitutions to be viable, while also being sufficiently divergent to uncover sequence-specific interactions between N protein domains and other viral components. Indeed, both substitutions were well tolerated by MHV, and neither affected the N protein-M protein interaction or the fidelity of the leader-body sgRNA junctions formed during transcription. However, substitution of the CTD profoundly affected packaging. Notably, the SARS-CoV CTD substitution mutant had the same packaging-defective phenotype as that of the silPS mutant (above, Fig. 4A), despite having a completely wild-type copy of the PS. By contrast, the NTD substitution mutant had a wild-type packaging phenotype. Further construction of partial SARS-CoV chimeras of the MHV N CTD showed that substitution by just a 30-amino-acid central region of the SARS-CoV CTD was sufficient to abolish packaging (Fig. 4B). As discussed above, the MHV PS is unique to lineage A betacoronaviruses, and the genome of SARS-CoV (a lineage B betacoronavirus) does not contain a homolog of this RNA structure (Chen and Olsthoorn, 2010). Thus, the SARS-CoV N protein likely would not have evolved to recognize the MHV PS.
This outcome was reinforced by the subsequent demonstration that a mutant with a substitution of the even more divergent N protein CTD of the alphacoronavirus TGEV also exhibited a packaging-defective phenotype (Kuo et al., 2016b). These results provided strong support for a role for the CTD of the N protein in recognition of the PS. This finding (distinguishing the CTD, not the NTD) was somewhat counterintuitive, based on the many unliganded structures that have been solved for coronavirus N protein NTDs and CTDs (Chang et al., 2014). The NTD displays a beta-platform presenting a potential RNA-binding groove rich in basic and aromatic amino-acid residues, the latter of which are speculated to be able to form sequence-specific contacts with RNA bases. The CTD dimer, on the other hand, is a rectangular slab with a putative RNA-binding groove on one face that is lined with basic residues thought to be best suited for nonspecific interactions with the phosphodiester backbone. Nevertheless, the exact character of RNA binding by either the NTD or CTD awaits determination of structures containing bound RNA.

A separate region of the N molecule was later shown to also be involved in PS recognition (Kuo et al., 2016b). This finding resulted from examination of the effect of complete or partial replacement of the MHV M protein by the M protein of SARS-CoV. Such substitutions abolished packaging, but it could not be concluded that this defect was solely due to M. The construction of SARS-CoV M chimeric mutants was possible only because they also incorporated substitution of the SARS-CoV domain N3, the carboxy-terminal tail of N protein that is necessary and sufficient for virion assembly interactions between N and M. To clarify this outcome, multiple previously isolated MHV domain N3 mutants (all containing wild-type MHV M protein) were screened for their packaging ability, and some were found to have lost selective packaging of gRNA.
As with the CTD mutants, packaging-defective domain N3 mutants had the same phenotype as that of the silPS mutant but had a completely intact wild-type PS. Comparison of packaging-competent and packaging-defective viruses allowed mapping of the packaging defect to a 9-amino-acid segment of domain N3 (Fig. 4B) that was adjacent to, but distinct from, key residues that interact with M protein in virion assembly (Hurst et al., 2005). Together, these studies implicated both the N protein CTD and domain N3 as major determinants of PS recognition, a conclusion markedly in contrast with that of prior studies that suggested the primary role for M protein. To date, genetic evidence has not been obtained that could either rule in or rule out participation of the M protein in PS recognition, in part because the M protein endodomain is much less tolerant to mutational alteration than is the N protein (Kuo et al., 2016a).

MHV packaging studies based on genetic manipulations and studies based on analyses of DI RNAs and VLPs thus seem at odds as to whether M protein or N protein is the dominant player in choosing gRNA for virion assembly. However, these two sets of results are not necessarily mutually exclusive, and they possibly will be brought into alignment by further work. The apparent discrepancy between the outcomes of the two approaches may be analogous to the difference, discussed above, between the observed role of the MHV PS in DI RNA systems as opposed to its role in the intact viral genome.

At this stage, the currently available data suggest three types of models for how coronavirus packaging selectivity is achieved (Fig. 5). One model, which is derived from the genetic studies (Kuo et al., 2014, 2016b), builds upon the known roles of the CTD as an RNA-binding module (Chang et al., 2014) and of domain N3 as the sole region of N that interacts with M protein (Hurst et al., 2005; Verma et al., 2006; Kuo et al., 2016a).
In this proposed mechanism, the largely acidic domain N3 is originally sequestered by the CTD and then becomes dislodged, directly or indirectly, only as a result of the CTD binding to the PS (Fig. 5A). This may come about because the PS competes for the same site on the CTD that is occupied by the N3 peptide, or alternatively, because binding of the PS induces a conformational change in the CTD that frees N3 from a separate site. The resulting CTD-PS interaction would nucleate cooperative binding of N monomers along the whole length of the gRNA, with each monomer releasing its N3 domain to bind to an M monomer endodomain. The key feature of this model is that domain N3 is not available to interact with M protein until the CTD has bound to the PS. Thereby, RNA molecules that do not contain the PS are excluded from packaging, even though they may be nonspecifically bound by other N monomers. This model invokes a mode of RNA-binding specificity similar to that observed for the spliceosomal protein U1A and its target substrate, U1 hairpin II RNA (Law et al., 2013). In free U1A, a short carboxy-terminal helix occludes the beta-sheet platform of the RNA-binding domain of the molecule; only the sequence-specific U1 hairpin II can initiate events that displace the helix to form part of the RNA-binding surface. The role of domain N3 in the model is also akin to the chaperone-like function of the amino terminus of the paramyxovirus phosphoprotein, which maintains its nucleocapsid protein in an open state to prevent nonspecific RNA binding (Yabukarski et al., 2014).

A second model, which directly follows from the results of Narayanan et al. (2003), proposes that the M protein endodomain, which has oligomerized on the intracellular membrane of the assembly site, specifically recognizes the PS. This M-PS binding event then serves as the nucleation point for the condensation of all other M-N and N-N interactions that drive budding (Fig. 5B).
Although N protein is most likely the first virion structural protein to encounter the genome during, or shortly following, gRNA synthesis in the replication compartment (Bost et al., 2001; Stertz et al., 2007), its initial mode of binding to gRNA is proposed to be the same as that to all sgRNAs. The specific binding of M to the PS is thus envisioned to trigger formation of the helical ribonucleoprotein by N and concomitant incorporation of the nucleocapsid into nascent virions (Narayanan and Makino, 2001; Narayanan et al., 2003). This would suggest that in each assembled virion, there exists a unique point of protein-gRNA contact, between M and the PS, which cannot be discerned at the current level of resolution of ultrastructural work (Bárcena et al., 2009; Neuman et al., 2011). At the time this model was proposed there was no precedent for specific RNA binding by a transmembrane protein (Narayanan et al., 2003). Subsequently, however, there have emerged multiple examples of RNA recognition by membrane-bound cellular or viral proteins (Einav et al., 2008; Hsu et al., 2018). In particular, components of the essential Sec61 protein translocation pore complex form structurally well-defined bridges with certain rRNA helices of the large ribosomal subunit (Becker et al., 2009).

A third possible model is that neither the N protein nor the M protein by itself can bind to the PS, but that prior association of domain N3 and the M endodomain creates a surface that recognizes the PS (Fig. 5C). This model would seem to be ruled out by the existence of N3 mutants that have fully retained normal N-M assembly but have lost selective gRNA packaging (Kuo et al., 2016b). Nevertheless, it has been shown that not only the two structural domains, the NTD and CTD, but also unstructured segments of N, including N3, contribute to the affinity of RNA binding (Chang et al., 2009).
Thus, it is conceivable that a unique conformational state of N induced by association with M could generate or stabilize a specific PS-binding site.

Considerable further interrogation of the above models remains to be done to obtain a more complete understanding of coronavirus genome packaging. Genetic methods have yielded insights into the participation of N protein, but attempts to construct MHV mutants with partially chimeric M protein endodomains have been largely unsuccessful (Kuo et al., 2016b). A more productive approach might be to target basic amino-acid residues in the M endodomain that are common to lineage A betacoronaviruses but are not conserved in the other lineages. For N protein, a similar strategy applied to clusters of charged surface residues of the CTD could pinpoint interactions with domain N3 or the PS. Complementary biochemical investigations need to be undertaken to examine the RNA-binding properties of full-length N protein, as well as constructs of the CTD and CTD-N3. The methods used in many recent studies to examine N binding to nonspecific RNA and DNA (Chang et al., 2009, 2014) ought to now be applied to PS RNA. In vitro analysis of RNA binding by M protein would also be important but necessarily more difficult owing to the need to maintain M in an oligomeric membrane-bound state.

Perhaps the largest contribution to be made to our knowledge of packaging mechanisms will come from structural biology. Understanding virion assembly cannot be separated from the need to more generally comprehend the molecular details of coronavirus N-N and N-RNA interactions. Although structures are currently available for 6 NTDs and 4 CTDs from different viruses, there is as yet no structure for an RNA-bound complex of either of these domains.
Likewise, no structure has been solved for an entire N protein, with or without an RNA ligand, but a recent low-resolution cryo-EM reconstruction of MHV N has taken the first step in that direction (Gui et al., 2017). At the next level, we lack a clear picture of the coronavirus nucleocapsid. More basically, we lack even fundamental parameters, such as the stoichiometry of N to RNA. One speculative model of the nucleocapsid has 7 nucleotides of RNA being bound by one monomer of N (Chang et al., 2014); however, another estimate places this ratio at 14 to 40 nucleotides per N monomer (Neuman et al., 2011). A high-resolution nucleocapsid structure would provide details of how each NTD and CTD encapsidates gRNA and why the helical coronavirus nucleocapsid is much more flexible than the helical nucleocapsids of rhabdo- and paramyxoviruses. Similarly, a more detailed model of M protein structure, oligomerization, and M-N interactions in the virion would be highly valuable, although this will be a still more technically daunting achievement.

Another important future goal would be to proceed beyond lineage A betacoronaviruses to identify PSs across all four genera of the coronavirus family. It is possible that suitably constructed MHV chimeras could be used to trap heterologous PSs. Although PS identity is genus- or lineage-specific, the precise definition of more of these RNA elements and their interacting protein partners may reveal commonalities that are not currently apparent and shed light on unifying principles of coronavirus PS recognition.

Finally, moving beyond lineage A betacoronaviruses would create the potential for utilization of knowledge about genomic RNA packaging to devise antiviral strategies for pathogenic human coronaviruses. The discovery that a packaging-defective MHV mutant was markedly suppressed by host innate immunity suggests an attractive pathway toward live-attenuated vaccine design (Athmer et al., 2018).
The PS itself may also become a candidate for therapeutic intervention as RNA structures are being increasingly developed as small-molecule druggable targets (Anokhina et al., 2019; Ingemarsdotter et al., 2018). Additionally, more precise elucidation of packaging-specific oligomeric interactions could uncover molecular targets that would hinder the escape of drug-resistant viral mutants (Tanner et al., 2014).

Enhancing the ligand efficiency of anti-HIV compounds targeting frameshift-stimulating RNA
In situ tagged nsp15 reveals interactions with coronavirus replication/transcription complex-associated proteins
Selective packaging in murine coronavirus promotes virulence by limiting type I interferon responses
Cryo-electron tomography of mouse hepatitis virus: insights into the structure of the coronavirion
Interactions between coronavirus nucleocapsid protein and viral RNAs: implications for viral transcription
Biochemistry and molecular biology of flaviviruses
Structure of monomeric yeast and mammalian Sec61 complexes interacting with the translating ribosome
Packaging of the influenza virus genome is governed by a plastic network of RNA- and nucleoprotein-mediated interactions
Evidence that viral RNAs have evolved for efficient, two-stage packaging
The production of recombinant infectious DI-particles of a murine coronavirus in the absence of helper virus
Mouse hepatitis virus replicase protein complexes are translocated to sites of M protein accumulation in the ERGIC at late times of infection
The SARS coronavirus nucleocapsid protein-forms and functions
Multiple nucleic acid binding sites and intrinsic disorder of severe acute respiratory syndrome coronavirus nucleocapsid protein: implications for ribonucleocapsid protein packaging
New structure model for the packaging signal in the genome of group IIa coronaviruses
Group-specific structural features of the 5'-proximal sequences of coronavirus genomic RNAs
One influenza virus particle packages eight unique viral RNAs as shown by FISH analysis
Identification of a bovine coronavirus packaging signal
Identification of nucleocapsid binding sites within coronavirus-defective genomes
Cis-acting sequences required for coronavirus infectious bronchitis virus defective-RNA replication and packaging
An "old" protein with a new story: coronavirus endoribonuclease is important for evading host antiviral defenses
SARS and MERS: recent insights into emerging coronaviruses
Discovery of a hepatitis C target and its pharmacological inhibitors by microfluidic affinity analysis
Transmissible gastroenteritis coronavirus packaging signal is located at the 5' end of the virus genome
The membrane M protein carboxy terminus binds to transmissible gastroenteritis coronavirus core and contributes to core stability
Identification and characterization of a coronavirus packaging signal
Electron microscopy studies of the coronavirus ribonucleoprotein complex
Bovine coronavirus mRNA replication continues throughout persistent infection in cell culture
Assembly of severe acute respiratory syndrome coronavirus RNA packaging signal into virus-like particles is nucleocapsid dependent
Nucleocapsid protein-dependent assembly of the RNA packaging signal of Middle East respiratory syndrome coronavirus
Oncoprotein AEG-1 is an endoplasmic reticulum RNA-binding protein whose interactome is enriched in organelle resident protein-encoding mRNAs
Isolation and characterization of porcine deltacoronavirus from pigs with diarrhea in the United States
A major determinant for membrane protein interaction localizes to the carboxy-terminal domain of the mouse coronavirus nucleocapsid protein
An RNA-binding compound that stabilizes the HIV-1 gRNA packaging signal structure and specifically blocks HIV-1 RNA encapsidation
Crystal structure of a monomeric form of severe acute respiratory syndrome coronavirus endonuclease nsp15 suggests a role for hexamerization as an allosteric switch
Structure of the HIV-1 RNA packaging signal
NMR detection of intermolecular interaction sites in the dimeric 5'-leader of the HIV-1 genome
Conservation of a packaging signal and the viral genome RNA packaging mechanism in alphavirus evolution
A key role for the carboxy-terminal tail of the murine coronavirus nucleocapsid protein in coordination of genome packaging
Genetic evidence for a structural interaction between the carboxy termini of the membrane and nucleocapsid proteins of mouse hepatitis virus
Functional analysis of the murine coronavirus genomic RNA packaging signal
Analyses of coronavirus assembly interactions with interspecies membrane and nucleocapsid protein chimeras
Recognition of the murine coronavirus genomic RNA packaging signal depends on the second RNA-binding domain of the nucleocapsid protein
Isolation and characterization of a novel betacoronavirus subgroup A coronavirus, rabbit coronavirus HKU14, from domestic rabbits
The role of the C-terminal helix of U1A protein in the interaction with U1hpII RNA
Analysis of efficiently packaged defective interfering RNAs of murine coronavirus: localization of a possible RNA-packaging signal
The molecular biology of coronaviruses
Fields Virology
Alphavirus nucleocapsid packaging and assembly
Molecular characterization of transmissible gastroenteritis coronavirus defective interfering genomes: packaging and heterogeneity
Identification of a specific interaction between the coronavirus mouse hepatitis virus A59 nucleocapsid protein and packaging signal
Transmissible gastroenteritis coronavirus genome packaging signal is located at the 5' end of the genome and promotes viral RNA incorporation into virions in a replication-independent process
Cooperation of an RNA packaging signal and a viral envelope protein in coronavirus RNA packaging
Nucleocapsid-independent specific viral RNA packaging via viral envelope protein and viral RNA signal
Characterization of the coronavirus M protein and nucleocapsid interaction in infected cells
Complete genome analysis of equine coronavirus isolated in Japan
Supramolecular architecture of severe acute respiratory syndrome coronavirus revealed by electron cryomicroscopy
Proteomics analysis unravels the functional repertoire of coronavirus nonstructural protein 3
A structural analysis of M protein in coronavirus assembly and morphology
Functional coupling between replication and packaging of poliovirus replicon RNA
7SL RNA, but not the 54-kd signal recognition particle protein, is an abundant component of both infectious HIV-1 and minimal virus-like particles
New world arenavirus biology
Coronavirus subgenomic minus-strand RNAs and the potential for mRNA replicons
Limits of variation, specific infectivity, and genome packaging of massively recoded poliovirus genomes
The intracellular sites of early replication and budding of SARS-coronavirus
Isolation of coronavirus envelope glycoproteins and interaction with the viral nucleocapsid
A domain at the 3' end of the polymerase gene is essential for encapsidation of coronavirus defective interfering RNAs
Identification of functionally important negatively charged residues in the carboxy end of mouse hepatitis coronavirus A59 nucleocapsid protein
Importance of the penultimate positive charge in mouse hepatitis coronavirus A59 membrane protein
New antiviral target revealed by the hexameric structure of mouse hepatitis virus nonstructural protein nsp15
Structure of Nipah virus unassembled nucleoprotein in complex with its viral chaperone
Genomic characterization of equine coronavirus
Structural and biochemical characterization of endoribonuclease nsp15 encoded by Middle East respiratory syndrome coronavirus
Presence of subgenomic mRNAs in virions of coronavirus IBV
The coronavirus replicase