key: cord-0721102-3jijeemg authors: Nicholson, Beth L.; White, K. Andrew title: Functional long-range RNA–RNA interactions in positive-strand RNA viruses date: 2014-06-16 journal: Nat Rev Microbiol DOI: 10.1038/nrmicro3288 sha: 65a8a944f412a54d87240084799a76317a49f0af doc_id: 721102 cord_uid: 3jijeemg Positive-strand RNA viruses are important human, animal and plant pathogens that are defined by their single-stranded positive-sense RNA genomes. In recent years, it has become increasingly evident that interactions that occur between distantly positioned RNA sequences within these genomes can mediate important viral activities. These long-range intragenomic RNA–RNA interactions involve direct nucleotide base pairing and can span distances of thousands of nucleotides. In this Review, we discuss recent insights into the structure and function of these intriguing genomic features and highlight their diverse roles in the gene expression and genome replication of positive-strand RNA viruses. SUPPLEMENTARY INFORMATION: The online version of this article (doi:10.1038/nrmicro3288) contains supplementary material, which is available to authorized users. less prevalent, but can also have important functional consequences for viral gene expression and/or genome replication in plant and animal positive-strand RNA viruses (TABLE 1) . Such long-range interactions were first reported for positive-strand RNA bacteriophages in the 1990s 8- 11 and, since then, have been discovered in various positive-strand RNA viruses that infect eukaryotes [4] [5] [6] [7] . All of the long-range interactions that have been characterized so far involve Watson-Crick RNA base pairing, and they often include sequences that are positioned in loop regions or in internal bulges within RNA structures. In this Review, we focus on the burgeoning field of interactions that span large distances, that is, ≥1,000 nucleotides, and we discuss how these interactions modulate viral processes. 
In particular, we focus on: the regulation of translation initiation by 3ʹ cap-independent translational enhancers (3ʹ CITEs) and internal ribosome entry sites (IRESs); translational recoding events, including ribosomal frameshifting and stop codon readthrough; genome replication; and sgm-RNA transcription. We then consider the possible roles and regulatory mechanisms of such interactions and discuss how they function in the complex context of viral RNA genomes. Last, we discuss outstanding questions that will determine the future directions of research into long-range intragenomic RNA-RNA interactions. Positive-strand RNA viral genomes are directly translated into viral proteins by host ribosomes, and a 3ʹ cap-independent translational enhancers (3ʹ CITEs). Functional folded regions of RNA in or near to the 3ʹ UTRs of certain positive-strand RNA virus genomes; they bind to translation initiation factors or ribosomal subunits to facilitate translation. Internal ribosome entry sites (IRESs) . Functional folded regions of RNA in or near to the 5ʹ UTRs of certain positive-strand RNA virus genomes (and some cellular mRNAs) that recruit ribosomes directly to an internal location close to the initiation codon independently of a 5ʹ cap structure. A translational recoding event in which a proportion of translating ribosomes change their reading frame at a defined point in the mRNA, resulting in the addition of an alternative carboxy-terminal extension. minority are generated by translational recoding mechanisms that involve either ribosomal frameshifting or stop codon readthrough 12 . Regardless of the translational strategy that is used, all viral genomes compete against abundant cellular mRNAs for ribosomes. Accordingly, these RNA genomes contain structural features that assist in the recruitment of the host translational machinery. Some viruses use the conventional terminal structures of mRNAs, that is, the 5ʹ cap and the 3ʹ poly(A) tail 13 . 
In other cases, less typical elements are used, such as 3ʹ CITEs 5, 6 or IRESs 14 , which are located in genomic 3ʹ and 5ʹ UTRs, respectively. Some of the viruses that use 3ʹ CITEs or IRESs, or that rely on translational recoding mechanisms, also require associated long-range RNA-RNA interactions.

3ʹ CITE-mediated translation. 3ʹ CITEs have been identified in several genera of the virus family Tombusviridae, as well as in the related genera Luteovirus (family Luteoviridae) and Umbravirus (family unclassified) 5, 6 . These positive-strand RNA genomes lack both 5ʹ caps and 3ʹ poly(A) tails and instead rely on 3ʹ CITEs for protein translation. Several different structural classes of 3ʹ CITE have been uncovered, including a 68 nucleotide RNA stem-loop 15 and more complex multihelix pseudoknot 16, 17 or tRNA-shaped structures 18, 19 . Most 3ʹ CITEs function by binding to the ribosome-recruiting eukaryotic translation initiation factor 4F (eIF4F) complex via one or both of its eIF4G and eIF4E subunits 15, 16, [20] [21] [22] , whereas others bind directly to ribosomal subunits 23, 24 . Thus, 3ʹ CITEs can recruit ribosomes by either direct or indirect means. Despite their structural and functional diversity, 3ʹ CITEs share a common relative position near to, or within, the 3ʹ UTRs of viral genomes. Paradoxically, this 3ʹ-proximal position places them, and the recruited translational machinery, at the opposite end to the site of translation initiation. Consequently, many of these viruses use long-range RNA-based interactions to relocate 3ʹ CITEs that are bound to the translational machinery to positions that are close to their cognate 5ʹ UTRs. In the genome of barley yellow dwarf virus (BYDV; genus Luteovirus), complementary sequences that are located in the 5ʹ UTR and 3ʹ CITE form a long-distance RNA-RNA bridge via a base-pair-mediated kissing-loop interaction 25 .
The importance of this interaction for viral translation was confirmed by compensatory mutational analysis, whereby disruption, and then restoration, of base-pairing potential by substitutions correlated with low and high levels of translational activity, respectively 25 . Similar functional base-pairing interactions between 5ʹ UTRs and 3ʹ CITEs have also been shown for different members of the genus Tombusvirus, including maize necrotic streak virus (MNeSV), carnation Italian ringspot virus (CIRV) and tomato bushy stunt virus (TBSV) 15, 22, [26] [27] [28] , as well as for saguaro cactus virus of the genus Carmovirus 29 . These findings have led to a general model for 3ʹ CITE activity, in which simultaneous binding of the 3ʹ CITE to both the eIF4F complex and the 5ʹ UTR enables the eIF4F-mediated recruitment of the 40S ribosomal subunit to the 5ʹ end 15, 16, 21, 25, 27, 30 (FIG. 2a) . In vitro studies with the MNeSV 3ʹ CITE have confirmed the formation of the proposed tripartite 5ʹ UTR−3ʹ CITE-eIF4F complex and its requirement for efficient ribosome recruitment to the 5ʹ-proximal start codon 15 . 3ʹ CITE-dependent translation is more complicated in pea enation mosaic virus (PEMV; Umbravirus) 31 . PEMV contains two different types of 3ʹ CITE: a panicum mosaic virus-like translational enhancer (PTE), which is located in its 3ʹ UTR and binds to eIF4F 16, 17 , and a kissing-loop T-shaped structure (kl-TSS), which is positioned immediately upstream of the PTE and binds directly to 60S ribosome subunits 24 (FIG. 2b) . Accordingly, both direct and indirect modes of ribosome recruitment probably occur in PEMV. However, unlike the PTEs that have been identified in other viruses 29 , the PTE in PEMV does not interact with the 5ʹ end of the viral genome 16 . Instead, the adjacent kl-TSS engages in a long-distance interaction with a 5ʹ-proximal hairpin, thereby uniting the terminal regions 31 (FIG. 2b) . 
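The logic of the compensatory mutational analysis used to test these 5ʹ UTR-3ʹ CITE interactions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the loop sequences and the function name `count_wc_pairs` are hypothetical and are not taken from any viral genome, and only strict Watson-Crick pairs are scored.

```python
# Hypothetical loop sequences for illustration only; not real viral sequences.
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def count_wc_pairs(loop_5p: str, loop_3p: str) -> int:
    """Count Watson-Crick pairs between two equal-length loops.

    Both loops are written 5'->3'; pairing is antiparallel, so the
    second loop is read in reverse against the first.
    """
    if len(loop_5p) != len(loop_3p):
        raise ValueError("loops must have equal length")
    return sum((a, b) in WC_PAIRS for a, b in zip(loop_5p, reversed(loop_3p)))

wild_type = count_wc_pairs("GUCCA", "UGGAC")   # both partners wild type
disrupted = count_wc_pairs("GUGCA", "UGGAC")   # substitution in the 5' partner only
restored = count_wc_pairs("GUGCA", "UGCAC")    # compensatory substitution in the 3' partner

print(wild_type, disrupted, restored)
```

In an experiment of this design, the disrupted mutant would be expected to show reduced activity and the restored double mutant to recover it, which is the correlation that supports a functional base-pairing interaction rather than a requirement for a particular sequence.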
The specific contribution of each of the 3ʹ CITEs to the PEMV translational process remains to be fully determined; however, it is clear that the kl-TSS-mediated long-range interaction could be beneficial to the activity of both of the 3ʹ CITEs by repositioning them close to the site of translation initiation. IRESs are structured RNA elements that recruit ribosomes -either directly or with the assistance of cellular proteins -to the vicinity of a start codon 14 The viral protein that is encoded at the 5ʹ end, p1 (for example, the viral RNA-dependent RNA polymerase (RdRp)), is translated directly from the genome. RdRp synthesizes a full-length complementary negative-strand RNA using the genome as a template, which is subsequently used for the synthesis of progeny genomes. Certain viral genomes also function as templates for the transcription of subgenomic mRNAs (sgmRNAs), which is a process that is also mediated by the viral RdRp and that involves a negative-strand intermediate. The ORFs of additional viral genes that are located downstream of the first ORF are generally not efficiently translated, owing to poor ribosome access (not shown). The transcription of smaller viral sgmRNAs enables the expression of these additional proteins (for example, p2), as downstream ORFs in sgmRNAs are relocated to 5ʹ-proximal positions, which enables efficient ribosome access. A translational recoding event in which a proportion of ribosomes read a stop codon as a sense codon, which results in a carboxy-terminally extended protein. A 7-methyl guanine nucleotide that is linked via a 5ʹ-to-5ʹ triphosphate to the 5ʹ-terminal nucleotide of eukaryotic mRNAs and some viral RNA genomes. It is bound by the cap-binding protein, eukaryotic translation initiation factor 4E (eIF4E), which is part of the eIF4F complex that recruits the small ribosomal subunit. A chain of consecutive adenine nucleotides that is present at the 3ʹ end of eukaryotic mRNAs and of some viral RNAs. 
The poly(A) tail is bound by poly(A)-binding protein, which mediates increased mRNA stability and enhanced protein translation. Regions within cellular mRNAs or viral RNA genomes that do not encode proteins. These regions, which are most commonly located at 5ʹ or 3ʹ ends, often contain cis-acting RNA elements that regulate RNA activities. A higher-order RNA structure that involves base pairing between the loop of a stemloop structure and an upstream or downstream RNA sequence that is located outside of the stem-loop. (eIF4F). A translation initiation factor complex that consists of three subunits: the cap-binding protein eIF4E; the RNA helicase eIF4A; and eIF4G, which recruits the small ribosomal subunit. regions that are far downstream of translation initiation sites. For example, the 5ʹ-uncapped but 3ʹ-polyadenylated genome of the picornavirus foot-and-mouth disease virus (FMDV) contains a 5ʹ IRES that is positively regulated by interactions with the genomic 3ʹ UTR 32 . In vitro, this 3ʹ UTR participates in two long-range interactions with the 5ʹ UTR -one with the IRES and the other with a 5ʹ-terminal S-region that is involved in genome replication 33 (FIG. 2c) . Although the specific sequences involved were not identified, the interaction of the 3ʹ UTR with the IRES was found to be independent of its interaction with the S-region, and the two interactions could not form simultaneously. Therefore, the detected contacts could potentially modulate both translation and genome replication. Negative regulators of viral IRES activity have also been identified. The genome of the pestivirus classical swine fever virus (CSFV) lacks both a 5ʹ cap and a 3ʹ poly(A) tail, and the 3ʹ UTR inhibits the translational activity of the IRES in the 5ʹ UTR 34 . The negative regulatory sequence that affected IRES activity was mapped to a 3ʹ-terminal RNA hairpin that ends with CGGCCC-OH. 
This terminal sequence was also found to be complementary to a sequence that is located in the ribosome-binding region of the IRES (FIG. 2d) , which suggests that a CGGCCC-IRES base-pairing interaction inhibits ribosome recruitment 34 . Regulation in hepatitis C virus (HCV) is more complex and involves a network of RNA-RNA interactions. The HCV genome does not contain a 5ʹ cap or a 3ʹ poly(A) tail but instead has an IRES that binds directly to the 40S ribosomal subunit. IRES activity is downregulated by a long-range RNA-RNA interaction [35] [36] [37] that occurs between the apical loop of helix IIId in the IRES and a bulge in an essential 3ʹ-proximal cis-acting replication element in the coding region of the non-structural protein 5B (NS5B), which is known as 5BSL3.2 (REF. 35) (FIG. 2e) . Interestingly, the same 5BSL3.2 bulge sequence also mediates genome replication by interacting with a nearby upstream sequence located around nucleotide 9110 (FIG. 2e) . Because the two interactions are equally probable in a thermodynamic context 41 , shifting of the conformational equilibrium between the two interactions could regulate viral translation and genome replication 39, 41, 42 . In addition, there is both genetic and structural evidence that the terminal loop of 5BSL3.2 base pairs with the 3ʹ SL2 element, which is an RNA element that is located in the 3ʹ UTR and is involved in genome replication [39] [40] [41] 43 (FIG. 2e) . Accordingly, 5BSL3.2 functions as a central hub for a network of interactions that collectively modulate IRES-mediated translation and genome replication. Intriguingly, some uncapped and non-polyadenylated virus genomes use both a 5ʹ IRES and a 3ʹ CITE; for example, the plant nepovirus blackcurrant reversion virus (order Picornavirales) uses a hybrid 5ʹ IRES−3ʹ CITE-mediated translation mechanism and also requires a long-distance interaction between the 5ʹ and 3ʹ genomic ends for optimal translation [44] [45] [46] .
Although the details of the individual or combined functions of the 5ʹ IRES and 3ʹ CITE remain to be determined, it was suggested that this terminal interaction might help to facilitate the re-recruitment of terminated ribosomes 46 . Indeed, this potential function in ribosome recycling is also applicable to some of the above examples in which 5ʹ−3ʹ interactions increase translational efficiency. Translational recoding. Recoding via stop codon readthrough or ribosomal frameshifting leads to the production of carboxy-terminally extended proteins. In certain viruses, functional long-range base-pairing interactions were found to be required for both of these types of recoding events [47] [48] [49] [50] , and the involvement of such interactions is particularly prevalent in positive-strand RNA plant viruses 12 . The most common form of ribosomal frameshifting involves a small proportion of elongating ribosomes moving backwards one base and then resuming translation in the new -1 reading frame 12 . This process is facilitated by a 'slippery' heptanucleotide sequence at the frameshift site and a stimulatory RNA structure that is located a few nucleotides downstream 12 (FIG. 3a) . In addition, in the plant viruses BYDV and red clover necrotic mosaic virus (RCNMV; genus Dianthovirus), the efficient -1 frameshifting that produces their viral RdRps requires base pairing between their proximal stimulatory RNA structures and complementary sequences that are located ~4,000 and ~2,500 nucleotides downstream, respectively 47, 48 . In BYDV, a bulge in the stimulatory RNA structure next to the frameshift site interacts with the terminal stem-loop of an RNA hairpin near to the 3ʹ end of the genome (FIG. 3a) , and a similar interaction occurs in RCNMV 48 . 
In addition to mediating frameshifting, the interaction is also proposed to assist in the coordination of translation and negative-strand synthesis, which are directionally opposed processes for an RNA genome 47, 48 .

FIG. 2 legend (partial; panel b onward): … which is located in the 3ʹ UTR and binds to eIF4F; and a kissing-loop T-shaped structure (kl-TSS), which is positioned immediately upstream and binds directly to the 60S ribosome subunit and mediates long-distance RNA-RNA base pairing with a 5ʹ-proximal hairpin. c-e | Internal ribosome entry site (IRES)-mediated translation. c | The foot-and-mouth disease virus (FMDV) IRES is stimulated by the 3ʹ UTR, which engages in long-range contacts with two regions in the 5ʹ UTR: the IRES and a region that has been shown to be involved in genome replication, which is known as the S-region. The specific sequences that are involved in this interaction have not been identified and the interaction with the S-region may modulate genome replication. d | The 3ʹ-terminal hexamer CGGCCC in classical swine fever virus (CSFV) is a negative modulator of IRES-mediated translation and may confer its inhibition by pairing with a ribosome-binding region in the IRES, thus blocking ribosome binding. e | Hepatitis C virus (HCV) IRES activity is negatively regulated by an interaction between helix IIId of the IRES and a bulge in the structure 5BSL3.2, which is located in the coding region of non-structural protein 5B (NS5B). The same bulge in 5BSL3.2 also interacts with a genomic sequence around position 9110 of the genome, and the terminal loop of 5BSL3.2 can pair with the 3ʹ SL2 element located in the 3ʹ UTR, which may modulate genome replication. These interactions may coordinate viral translation and genome replication.
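The -1 frameshifting mechanism described above, in which a ribosome slips back one nucleotide at a slippery heptanucleotide and continues in the new reading frame, can be illustrated with a toy codon reader. This is a minimal sketch under stated assumptions: the mRNA string `MRNA`, the function `read_codons` and its `slip_after` parameter are hypothetical, the slip is forced rather than probabilistic, and no stimulatory RNA structure or long-range interaction is modelled.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Hypothetical mRNA containing the slippery heptanucleotide U_UUA_AAC
# (positions 5-11, zero-based); not a real viral sequence.
MRNA = "AUGGCUUUAAACUAGGCGAUUAA"

def read_codons(mrna, slip_after=None):
    """Read codons from position 0 until a stop codon or the end of the RNA.

    If slip_after is given, the reader moves back one nucleotide (a -1
    frameshift) when its position reaches that index, that is, right
    after decoding the last codon of the slippery site.
    """
    codons = []
    pos = 0
    while pos + 3 <= len(mrna):
        codon = mrna[pos:pos + 3]
        codons.append(codon)
        pos += 3
        if codon in STOP_CODONS:
            break
        if slip_after is not None and pos == slip_after:
            pos -= 1  # slip back one nucleotide into the -1 frame
    return codons

print(read_codons(MRNA))                 # no frameshift: terminates at UAG
print(read_codons(MRNA, slip_after=12))  # -1 frameshift: bypasses UAG, extended product
```

In the unshifted reading, the ribosome stops at the first in-frame stop codon; with the slip, the same stop codon is bypassed and a carboxy-terminally extended product is read, which mirrors how the frameshift protein arises as an extension of the upstream ORF.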
The interactions between the stimulatory RNA structures and the 3ʹ-proximal RNA hairpins in BYDV and RCNMV are presumed to occur intramolecularly; however, there has been a recent report of a -1 ribosomal frameshifting event that is enhanced by an intermolecular genomic interaction 49 . In the severe acute respiratory syndrome coronavirus (SARS-CoV), such an interaction involves a palindromic loop sequence in a local pseudoknot that is positioned just downstream of the frameshift site. The palindromic loop sequences in two SARS-CoV genomes form a kissing-loop structure that increases frameshifting efficiency in vitro. Mutations that disrupted the base pairing abolished dimerization, reduced frameshifting and inhibited the accumulation of viral RNA in infected cells 49 . The disruption also affected the ratio of genomic RNA to sgmRNA levels and growth kinetics, which suggests that this intermolecular interaction has a genuine regulatory role in the viral life cycle 49 .

Another common viral recoding strategy is stop codon readthrough 12 , whereby, instead of ribosome termination, the stop codon is decoded as a sense codon. Translation then proceeds in the original reading frame, which results in an extended protein that is produced at a low frequency. As with frameshifting, the efficiency of stop codon readthrough is typically influenced by RNA sequences and structures that immediately surround the stop codon 12 . In the plant tombusvirus CIRV, stop codon readthrough generates the viral RdRp, and this process requires a long-distance interaction between an RNA structure that is immediately downstream of the readthrough site (which is known as the proximal readthrough element (PRTE)) and a sequence in the 3ʹ UTR (which is known as the 3ʹ-proximal distal readthrough element (DRTE)) 50 (FIG. 3b) . The DRTE is associated with one of two mutually exclusive stem-loop structures, SL-T and SL-2, of which SL-2 is essential for genome replication 50 .
Formation of SL-T positions the complementary 3ʹ sequence in its terminal loop, which facilitates the establishment of the long-distance interaction that improves translational readthrough and simultaneously inhibits genome replication by precluding the formation of SL-2. Conversely, the SL-2-containing conformation promotes genome replication and impedes readthrough (FIG. 3b) . On the basis of these observations, SL-T and SL-2 were proposed to function as an RNA switch that assists in the coordination of translation and replication 50 . The replication of positive-strand RNA virus genomes occurs via the synthesis of a complementary negativestrand RNA, which is subsequently used as a template for the production of progeny positive-strand RNA genomes (FIG. 1) . This process is catalysed by a virally encoded RdRp and is assisted by viral and host proteins. The initiation of negative-strand synthesis involves the RdRp accessing the 3ʹ terminus of a genome, and RNA sequences and structures that facilitate this are usually located near to the 3ʹ end. However, there is compelling evidence that RNA elements that are considerably distal to 3ʹ ends can also influence the efficiency of complementary strand production 51, 52 . shows coding regions as cylinders. p39 and p60 correspond to proteins of 39 kDa and 60 kDa, respectively, which are encoded in two separate ORFs in different reading frames. p39 is produced when no ribosome frameshifting occurs. However, when a -1 frameshift occurs near the very 3ʹ end of the p39 ORF, ribosomes are shifted into the p60 reading frame and translate the p60 ORF as a carboxy-terminal extension of p39. The resulting frameshift protein is the viral RNA-dependent RNA polymerase (RdRp), which is approximately 99 kDa. Frameshifting is stimulated by a long-range interaction (double-headed arrow) between a bulge in an RNA structure that is near to the frameshift site and the terminal loop of a 3ʹ-proximal stem-loop. 
Interacting sequences are shown in green. b | Linear representation of the carnation Italian ringspot virus (CIRV) RNA genome, which shows readthrough translation of its RdRp. Ribosomes that initiate at the 5ʹ end of the genome normally terminate at the stop codon at the end of the p36 ORF, producing a protein of 36 kDa. Translational readthrough of the p36 stop codon produces a C-terminally extended readthrough protein of 95 kDa (p95), which is the viral RdRp. Readthrough requires base pairing between the proximal readthrough element (PRTE) that is located near to the stop codon and the 3ʹ-proximal distal readthrough element (DRTE). The DRTE 3ʹ sequence is associated with one of two mutually exclusive RNA conformations. The SL-T conformation facilitates readthrough and prevents genome replication, whereas the alternative conformation, which contains SL-2, promotes replication and inhibits readthrough. These two conformations thus represent a type of RNA switch that probably coordinates translation and replication. Atomic force microscopy (AFM). A high-resolution type of scanning probe microscopy that can be used to assess the surface topography of biological molecules. sequence, the upstream of AUG region (UAR) and the downstream of AUG region (DAR) (FIG. 4a) . The resulting RNA circularization has been observed directly by atomic force microscopy (AFM) in the absence of proteins, which shows that circularization can be entirely RNA-based 56 (FIG. 4a) . In addition, structural analysis by chemical probing also supports protein-independent interactions between the 5ʹ and 3ʹ termini of the genome 62 . However, although circularization can occur autonomously, protein factors might assist in the process, as both the flavivirus core protein and NS3 helicase have been shown to mediate 5ʹ-and 3ʹ-end base pairing of genomic RNA in vitro 63, 64 . 
The observed circularization is required for flavivirus genome replication because RdRp binds to an RNA stem-loop in the 5ʹ UTR, which positions RdRp ~11 kb upstream of the 3ʹ terminus where negative-strand synthesis initiates 65, 66 . The long-distance RNA-RNA interaction between the genomic termini thus repositions the RdRp to the 3ʹ end, where it can commence initiation 65 (FIG. 4a) . DENV genome circularization can be modulated by localized regulatory elements, such as a conserved RNA pseudoknot that is located adjacent to the cyclization sequence in the 5ʹ UTR 67 . Moreover, it was recently shown that the DENV genome requires a specific balance between circularized and linear (that is, non-circularized) conformations 68 . Parts of the 3ʹ UAR and 3ʹ DAR sequences can fold into a small local RNA hairpin in the 3ʹ UTR, which is known as sHP, and the formation of sHP inhibits the interaction with 5ʹ-proximal partner sequences 68 (FIG. 4a) . Virus replication was found to be sensitive to mutations that altered the natural balance between local sHP formation and long-range pairing, which indicates that a defined ratio between circularized and linear conformations is necessary for viability 68 . The regulatory function of sHP might be even more complex, as only base pairing of sHP is important for replication in mammalian cells, whereas, unexpectedly, both base pairing and sequence identity are important in mosquito cells 69 . Although the requirement for genome circularization in flavivirus replication is now generally accepted, it should be noted that an alternative model has recently been proposed, whereby the 5ʹ and 3ʹ complementary sequences would function in trans and generate dimers and/or oligomers of flavivirus genomes 70 . The formation of such concatamers would presumably be concentration-dependent and could have regulatory effects that differ from those of circularized monomers 70 .
Although such alternative pairing scenarios are theoretically feasible, their existence and possible biological relevance remain to be investigated. Similarly to flaviviruses, RdRp of the tombusvirus TBSV associates with the viral genome far upstream of the 3ʹ terminus. In this case, RdRp forms a complex with its auxiliary replication protein, p33, which binds specifically to an internal RNA element, known as RII, that is located more than 3 kb upstream of the 3ʹ end 71, 72 (FIG. 4b) . The 3ʹ terminus of the genome contains an RNA element known as RIV, which is essential for genome replication. RII and RIV are united by a long-range base-pairing interaction that occurs between an upstream linker sequence, which is 3ʹ-proximal to RII, and a partner downstream linker sequence, which is near to the 3ʹ terminus 52 (FIG. 4b) . In addition to facilitating 3ʹ end access to the RdRp, the association between these linker sequences generates a bipartite RII-RIV RNA platform that is necessary for the assembly of the replicase complex, which is composed of viral and host proteins 52, 73 . As RII-like internal replication elements are also present in other viruses of the Tombusviridae family, it is probable that other members of this family also require similar long-range intragenomic interactions for genome replication 74 .

Many positive-strand RNA viruses are polycistronic, which means that they encode multiple viral proteins within a single genome segment. ORFs that are located downstream are usually not efficiently translated, owing to poor ribosome access. Thus, to enable the robust expression of these proteins, these viruses transcribe smaller viral sgmRNAs in which the downstream ORFs are relocated to 5ʹ-proximal positions, which thereby enables efficient ribosome access (FIG. 1) .
The mechanism that is involved in the generation of sgmRNAs depends on the virus, and some viruses that use discontinuous template synthesis or premature termination mechanisms require long-range RNA-RNA interactions. Coronavirus genomes are extraordinarily large (they can be up to ~32 kb in length), and expression of their 3ʹ-proximal genes depends on sgmRNA transcription 75 . These sgmRNAs consist of a common 5ʹ-terminal region (known as the leader) that is fused to a variable 3ʹ-terminal segment (known as the body). They are transcribed from complementary negative-strand templates, which are generated by a discontinuous template synthesis mechanism 76 (FIG. 5a) . During negative-strand synthesis, RdRp dissociates from the positive-strand genome at specified locations for each sgmRNA (which defines the body segment) and then reprimes on the template within the 5ʹ UTR (where it copies the common leader sequence). The positions of RdRp release and repriming are guided by transcription-regulating sequences (TRSs) 76 . The discontinuous RNA template is then used to transcribe corresponding sgmRNAs, which contain a common 5ʹ leader that is connected to different 3ʹ-terminal body sections that encode the different viral ORFs. In transmissible gastroenteritis virus (TGEV), two long-distance interactions control the transcription of the sgmRNA that encodes the nucleocapsid protein (sgmRNA-N) [77] [78] [79] (FIG. 5a) . One of these interactions spans almost 26 kb, which is the longest that has been reported so far; it forms between complementary sequences known as the B-motif (BM) and the complementary BM (cBM), which are located upstream of the TRS for sgmRNA-N (TRS-N) and downstream of the leader TRS (TRS-L), respectively 79 . This interaction, in combination with a shorter-range interaction between a distal element and a proximal element (DE-PE), brings the two TRS elements into close proximity, which promotes efficient RdRp transfer [77] [78] [79] .

FIG. 5 legend (partial; from panel a): Transcription of sgmRNA-N involves discontinuous synthesis (dashed arrow) of a negative-strand RNA that contains sequences from the body (purple) region, which is located at the 3ʹ end of the genome, and leader (orange) region, which is located at the 5ʹ end of the genome. The RNA-dependent RNA polymerase (RdRp) repriming step is guided by the transcription-regulating sequences (TRS; red), and the negative strand that is generated is used as a template for sgmRNA-N transcription. In the lower panel, a simplified RNA secondary structure shows the two interactions, cBM-BM and DE-PE (green), bringing the TRS-N and TRS-leader (TRS-L) into close proximity to mediate RdRp repriming. b | The upper panel is a linear representation of the tomato bushy stunt virus (TBSV) genome, which shows the relative positions of long-range interactions that are involved in sgmRNA transcription. The transcription pathway that leads to sgmRNA2 production is shown on the right-hand side. The negative-strand template is generated by premature termination of RdRp, which is caused by an attenuation signal in the genome that is formed by a long-range interaction between an activator sequence and a receptor sequence (AS2-RS2). The lower left-hand panel shows a simplified RNA secondary structure that depicts the interactions that are involved in transcription of both sgmRNA1 and sgmRNA2. The long-range interactions AS1-RS1 and AS2-RS2 constitute the attenuation signals for sgmRNA1 and sgmRNA2, respectively. SgmRNA2 transcription requires an additional interaction between a distal element and a core element (DE-CE).

Replicase complex. A membrane-associated complex that consists of the viral RNA-dependent RNA polymerase and viral or host proteins; it mediates the replication of viral RNA genomes and the transcription of viral subgenomic mRNAs.

Tombusvirus sgmRNA transcription.
An alternative mechanism for the production of sgmRNAs involves the premature termination of the viral RdRp during negative-strand synthesis of the genome. This results in the generation of a 3ʹ-truncated negative-strand that is then used as a template for the synthesis of positive-strand sgmRNAs 1,80 (FIG. 5b) . Premature termination is facilitated by an RNA attenuation signal in the positive-strand genome, which is formed by a base-paired RNA segment that is located just upstream of the sgmRNA start site. This signal functions as a physical 'roadblock' that induces polymerase termination. Notably, the attenuation signals that promote the transcription of sgmRNA1 and sgmRNA2 in TBSV are formed by long-range interactions that involve activator and receptor sequences, which are known as AS1-RS1 and AS2-RS2, respectively 80,81 (FIG. 5b) . AS1-RS1 spans ~1.1 kb, whereas AS2-RS2 spans ~2.2 kb and requires an auxiliary ~1.0 kb long-range interaction between a distal element and a core element (DE-CE) 82, 83 . Interestingly, the identity of the nucleotides that form the attenuation signals is not important 83 ; however, the stability of the base-paired segments was found to be important 84 . sgmRNA transcription in the bisegmented insect nodavirus flock house virus (FHV) is also likely to occur via a premature termination mechanism. A sequence that is located in the central region of its larger genome segment interacts with two different downstream sequences, one of which is located more than 1.4 kb away and is positioned directly in front of the sgmRNA transcription start site 85 . The double-stranded RNA structure that forms ahead of the initiation site is probably functionally comparable to the attenuation signals in TBSV. It has also been proposed that another bisegmented virus, the dianthovirus RCNMV, uses a premature termination mechanism for transcription 86 .
However, for RCNMV, the attenuation signal forms in trans between two complementary sequences that are located separately in the RNA1 and RNA2 genome segments. During infections, the increase in the concentrations of the two genome segments promotes the formation of this bimolecular interaction, which, in turn, activates sgmRNA transcription. Accordingly, it was proposed that this interaction provides a concentration-dependent mechanism to temporally synchronize the transcription of the capsid protein-encoding sgmRNA with the accumulation of the two genomic segments that are co-packaged 86.

Roles and regulation of long-range interactions

As described in the above sections, long-range interactions have diverse functions. These include the relocalization of bound proteins to a distal genomic location (for example, repositioning 3ʹ CITE-bound factors near to 5ʹ termini or repositioning 5ʹ-proximally bound RdRps near to 3ʹ termini), the generation of a bipartite RNA platform for the assembly of protein complexes (for example, the RII-RIV RNA platform that is used for tombusvirus replicase complex assembly), the colocalization of two RNA elements that require proximity for function (for example, TRS elements involved in RdRp repriming in coronavirus transcription) and the formation of RNA structures that directly regulate a viral process (for example, the double-stranded attenuation signals that direct the premature termination of RdRps during sgmRNA transcription). In some cases, the need for a long-range interaction is clear, such as in relocating proteins; however, in other cases, the requirement is less obvious. For example, the attenuation signals that are involved in sgmRNA transcription in tombusviruses can be functionally replaced by local RNA hairpins, which indicates that the long-distance interactions are not essential 83.
Interestingly, some viruses that are related to tombusviruses (for example, carmoviruses and necroviruses) use local, rather than long-range, transcription-attenuation signals [87] [88] [89]. Thus, the differences that are observed could simply reflect the random nature of the emergence of local versus long-range interactions during virus evolution. Nevertheless, long-range interactions may provide yet-to-be-discovered regulatory advantages that could be mediated via genome-level RNA rearrangement.

Long-distance base-pairing interactions can be regulated by several mechanisms. The most basic strategy is to modulate the stability of the base-paired region by altering the composition and/or number of nucleotides that are involved. The presence of competing RNA structures provides an additional mechanism, which is exemplified by the RNA switches that regulate the interactions that are required for translation and/or replication in HCV 40, replication in DENV 68 and readthrough in CIRV 50 (as described in the previous sections). Furthermore, long-range interactions might also be regulated by proteins that could facilitate or prevent the formation of an interaction and/or destabilize or stabilize an interaction. In line with such concepts, the flavivirus core protein and NS3 helicase have been shown to facilitate circularization in vitro 63,64, and it seems probable that proteins also modulate some of the interactions in other systems. Interactions can also be mediated by several different contacts, as has been reported for flavivirus circularization [53] [54] [55] [56] [57] [58] [59] [60] [61], and this suggests that cooperative effects and different regulatory mechanisms for individual contacts could also have a role 60.
Intramolecular interactions are not generally predicted to be influenced by the local genome concentration; however, as crowding agents have been shown to increase the folding of small RNAs in vitro 90, it is possible that high concentrations of viral RNAs in vivo could also influence the formation or stabilization of intramolecular long-range interactions. Alternatively, if some of the proposed cis-interactions do actually occur in trans, as has been suggested 70, then genome abundance would clearly be an additional mode of control (as has been proposed for sgmRNA transcription in RCNMV 86 and for frameshifting in SARS-CoV 49). However, a dependence on cis-only interactions could be advantageous, as it could provide a form of quality control for genetic completeness to viruses that use intragenomic interactions by selecting for viral genomes that maintain the interacting sequences and presumably intact intervening sequences 4,65.

Whole-genome context of long-range interactions

Although we are beginning to understand the function and regulation of long-range interactions, it remains unclear how they are able to function in the complex context of viral RNA genomes that have multiple functions. Some interactions involve overlapping sequences and are therefore mutually exclusive, whereas others promote processes that are opposed either physically (such as translation and replication) or temporally (such as replication and encapsidation). Accordingly, proper coordination of these interactions must be essential for their function. This regulation would be particularly relevant for viruses that have multiple long-range interactions, such as tombusviruses, which have at least six different functional long-distance interactions, all of which span distances of 1 kb or more (TABLE 1). In such cases, the global structure of the viral genome must have features that enable it to form each of the different interactions at the correct time. Thus, genomes must be dynamic and able to adopt alternative conformations, which would be influenced by the intrinsic features of the RNA and its environmental context. Indeed, the structure of the viral genome is likely to be distinct during different steps of the viral life cycle: for example, when the genome is encapsidated, has just been released from its capsid, is being translated, is being replicated or transcribed or is undergoing packaging. Other events, such as co-replicative 5ʹ-to-3ʹ folding of the genome during its synthesis, would also influence initial and ultimate structures and the ability to form different long-range interactions 10.

Box 1 | The RNA secondary structure models for satellite tobacco mosaic virus (STMV; see the figure, part a) and tomato bushy stunt virus (TBSV; see the figure, part b) were generated by incorporating SHAPE (selective 2ʹ-hydroxyl acylation analysed by primer extension)-reactivity data into the 'RNAstructure' program. The global organization of the ~1.1 kb STMV RNA genome includes three domains that are formed by long-range base-pairing interactions 93. Consistent with this structural model, a fraction of the STMV genomes that were examined by atomic force microscopy (AFM) were found to adopt three-branch structures 93 (see the figure, part a). Although a second study that examined the STMV structure by SHAPE predicted different terminal configurations 94, the long-range interactions that define the large central domain were consistent in both models 93,94. The role of these long-range interactions in the STMV reproductive cycle remains to be determined; however, it was suggested that the helices that form these and other interactions could be important for genome packaging by interacting with capsid proteins within assembled particles 93,94.

SHAPE-based structural analysis was also used to study the larger ~4.8 kb TBSV RNA genome, which contains six known functional long-range interactions: 3ʹ cap-independent translational enhancer (3ʹ CITE)-5ʹ UTR, proximal readthrough element-distal readthrough element (PRTE-DRTE), upstream linker-downstream linker (UL-DL), activator sequence 1-receptor sequence 1 (AS1-RS1), AS2-RS2 and distal element-core element (DE-CE) 95. Interestingly, only two (AS1-RS1 and DE-CE) of these six known interactions (TABLE 1) were predicted to form in the proposed genome structure. Although the complementary segments that were involved in the other four interactions were not paired, the global organization of the genome brings these partner sequences into proximity, which suggests that the existing framework could allow for the formation of these additional interactions without the need for large-scale rearrangements 95. The TBSV genome structure also revealed the presence of differently sized domains (small domain (sD), blue; medium domain (mD), orange; and large domain (lD), green), and the large domains (which are 500-2,000 nucleotides long) correspond to different coding regions. These domains emanate from a central hub that is formed by long-range base-pairing interactions, and AFM images of the TBSV genome showed that it had compact floret-like shapes that were consistent with a multidomain structure 95 (see the figure, part b). This structural model for the TBSV genome is one of several different possible conformations and provides a reference structure with which those that have been

SHAPE structural mapping (Selective 2ʹ-hydroxyl acylation analysed by primer extension structural mapping). An RNA structure-probing technique that modifies the backbone in a base-nonspecific manner at positions of base flexibility. Highly modified positions correspond to unpaired nucleotides; this information is used to more accurately predict RNA secondary structure.
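As a rough illustration of how SHAPE reactivities of the kind described in the glossary entry above can be turned into a per-nucleotide pseudo-free energy term for a thermodynamics-based folding program, the following Python sketch uses a log-linear form that is widely used for this purpose, ΔG(i) = m ln(reactivity(i) + 1) + b. The function name, the example reactivities, the handling of missing data and the default slope and intercept (m = 2.6 and b = -0.8 kcal/mol) are assumptions made for illustration; they are not taken from this Review, and the studies cited here may have used different settings.

```python
import math


def shape_pseudo_energy(reactivity, slope=2.6, intercept=-0.8):
    """Return a pseudo-free energy term (kcal/mol) for one nucleotide.

    Assumed log-linear form: slope * ln(reactivity + 1) + intercept.
    The default slope and intercept are illustrative assumptions.
    Missing (None) or negative reactivities are treated as 'no data'
    and contribute no term (an assumed convention for this sketch).
    """
    if reactivity is None or reactivity < 0:
        return 0.0
    return slope * math.log(reactivity + 1.0) + intercept


# Hypothetical normalized reactivities for a short stretch of RNA;
# None marks a position without data.
reactivities = [0.0, 0.1, 0.9, 2.0, None]
penalties = [shape_pseudo_energy(r) for r in reactivities]

for r, dg in zip(reactivities, penalties):
    print(r, round(dg, 2))
```

Under these assumed parameters, nucleotides with low reactivity receive a negative term, which favours their inclusion in base pairs, whereas highly reactive (flexible) nucleotides receive a positive term that penalizes pairing.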
The factors that govern structural transitions at the genomic level are therefore of great interest. The study of diverse genome states under different in vitro and in vivo conditions will assist in gaining an understanding of the complex coordination of alternative conformations. Ideally, long-range interactions should be studied in their natural genomic contexts, and a logical first step would be to obtain information about the secondary structure of viral RNA genomes. These studies are now possible, owing to technical advances, such as high-throughput SHAPE structural mapping (selective 2ʹ-hydroxyl acylation analysed by primer extension structural mapping) 91,92. This chemical probing method provides information about nucleotide flexibility at each position in an RNA, which positively correlates with the likelihood that a residue is single-stranded in the structure. The SHAPE reactivity data can be incorporated into a thermodynamic-based RNA secondary structure-predicting program as a pseudo-free energy parameter to improve model prediction 91,92. Using this approach, the global organization of the 1,058 nucleotide-long satellite tobacco mosaic virus (STMV) 93,94 RNA genome and the 4,778 nucleotide-long TBSV RNA genome 95 have been predicted, and the results have provided insights into the genomic contexts of long-range interactions (BOX 1).

In this Review, we have discussed examples that highlight the important and varied roles of long-range RNA-RNA interactions in fundamental viral processes, including the translation of viral proteins, the replication of the viral genome and the transcription of sgmRNAs. It is clear that substantially different types of positive-strand RNA viruses use this distinctive regulatory strategy. These viruses have evolved to integrate long-range interactions within their genomes in a manner that provides them with mechanisms to regulate a diverse array of viral functions.
Indeed, it is quite remarkable that these relatively simple interactive structural features are able to carry out such a broad range of structure-based functions. Importantly, these interactions provide the viruses with a unique opportunity for regulation that is linked to genome structure; this, in turn, may provide benefits in addition to those of local RNA elements.

Many important advances have been made in understanding the structure and function of long-range interactions. Unfortunately, not all interactions are amenable to the available investigative methodologies. For example, interactions that are transient or that require special conditions in order to form may be difficult to detect biochemically, and interactions that have unknown, functionally relevant, alternative base-pair partners may be intractable to genetic techniques, such as compensatory mutational analyses. Regardless of the challenges, the interactions that are supported by both biophysical and genetic evidence should be viewed as having increased credibility. For the future, additional research and technological advances are needed to expand on existing findings and to clarify open questions; for example, it will be crucial to determine the detailed structures of different interactions and to establish how these structures influence activity. Other important challenges will be to identify the dynamics and energetic barriers of transitions and folding pathways. In addition, future research is needed to determine the regulatory advantage of long-range interactions, how these interactions are integrated and coordinated within viral genomes and whether any of these interactions occur in trans. Moreover, it will be crucial to investigate the possible involvement of viral and/or host proteins in regulating interactions and how the cellular environment affects interactions.
An increased understanding of long-range interactions will also help to determine whether such interactions are plausible targets for antiviral therapies. Finally, another area for future consideration extends beyond viral contexts to cellular messages: how common are functional long-range RNA-RNA interactions in cellular mRNA biogenesis and function? A limited number of recent reports suggest that these structures can indeed participate in the regulation of pre-mRNA splicing 96 as well as in the control of eukaryotic 97 and bacterial 98 mRNA translation. Consequently, the phenomenon that is observed in viruses may foreshadow a similar prevalence and diversity of functional long-range RNA-RNA interactions in cellular mRNAs.

Subgenomic messenger RNAs: mastering regulation of (+)-strand RNA virus life cycle
Diverse roles of host RNA binding proteins in RNA virus replication
Cis-acting RNA elements in human and animal plus-strand RNA viruses
Long-distance RNA-RNA interactions in plant virus gene expression and replication
3ʹ cap-independent translation enhancers of positive-strand RNA plant viruses
3ʹ cap-independent translation enhancers of plant viruses
Unmasking the information encoded as structural motifs of viral RNA genomes: a potential antiviral target
Secondary structure model for the last two domains of single-stranded RNA phage Qβ
A long-range interaction in Qβ RNA that bridges the thousand nucleotides between the M-site and the 3ʹ end is required for replication
Non-canonical translation in RNA viruses
Translational control in positive strand RNA plant viruses
Structural and functional diversity of viral IRESes
Tombusvirus recruitment of host translational machinery via the 3ʹ UTR
Structure of a viral cap-independent translation element that functions via high affinity binding to the eIF4E subunit of eIF4F
The cap-binding translation initiation factor, eIF4E, binds a pseudoknot in a viral cap-independent translation element
Structural domains within the 3ʹ untranslated region of Turnip crinkle virus
Solution structure of the cap-independent translational enhancer and ribosome-binding element in the 3ʹ UTR of turnip crinkle virus
A novel interaction of cap-binding protein complexes eukaryotic initiation factor (eIF) 4F and eIF(iso)4F with a region in the 3ʹ-untranslated region of satellite tobacco necrosis virus
The 3ʹ cap-independent translation element of Barley yellow dwarf virus binds eIF4F via the eIF4G subunit to initiate translation
Tombusvirus Y-shaped translational enhancer forms a complex with eIF4F and can be functionally replaced by heterologous translational enhancers
The 3ʹ proximal translational enhancer of Turnip crinkle virus binds to 60S ribosomal subunits
A ribosome-binding, 3ʹ translational enhancer has a T-shaped structure and engages in a long-distance RNA-RNA interaction
Base-pairing between untranslated regions facilitates translation of uncapped, nonpolyadenylated viral RNA
This is the first report of a long-range RNA-RNA interaction that facilitates translation initiation in a viral genome
5ʹ-3ʹ RNA-RNA interaction facilitates cap- and poly(A) tail-independent translation of tomato bushy stunt virus mRNA: a potential common mechanism for Tombusviridae
Analysis of a 3ʹ-translation enhancer in a tombusvirus: a dynamic model for RNA-RNA interactions of mRNA termini
Context-influenced cap-independent translation of Tombusvirus mRNAs in vitro
Long-distance kissing loop interactions between a 3ʹ proximal Y-shaped structure and apical loops of 5ʹ hairpins enhance translation of Saguaro cactus virus
Oscillating kissing stem-loop interactions mediate 5ʹ scanning-dependent translation by a viral 3ʹ-cap-independent translation element
The kl-TSS translational enhancer of PEMV can bind simultaneously to ribosomes and a 5ʹ proximal hairpin
IRES-driven translation is stimulated separately by the FMDV 3ʹ-NCR and poly(A) sequences
The 3ʹ end of the foot-and-mouth disease virus genome establishes two distinct long-range RNA-RNA interactions with the 5ʹ end region
The 3ʹ-terminal hexamer sequence of classical swine fever virus RNA plays a role in negatively regulating the IRES-mediated translation
A long-range RNA-RNA interaction between the 5ʹ and 3ʹ ends of the HCV genome
References 35 and 36 report the identification of a 5ʹ-3ʹ RNA-RNA interaction in HCV and its role in regulating IRES-mediated translation
The folding of the hepatitis C virus internal ribosome entry site depends on the 3ʹ-end of the viral genome
A hepatitis C virus cis-acting replication element forms a long-range RNA-RNA interaction with upstream RNA sequences in NS5B
A twist in the tail: SHAPE mapping of long-range interactions and structural rearrangements of RNA elements involved in HCV replication
Direct evidence for RNA-RNA interactions at the 3ʹ end of the hepatitis C virus genome using surface plasmon resonance
This is a biophysical study of HCV RNA elements that are involved in long-range interactions and their potential roles as RNA switches in viral life cycle regulation
End-to-end crosstalk within the hepatitis C virus genome mediates the conformational switch of the 3ʹX-tail region
Kissing-loop interaction in the 3ʹ end of the hepatitis C virus genome essential for RNA replication
Role of the RNA2 3ʹ non-translated region of Blackcurrant reversion nepovirus in translational regulation
The RNA2 5ʹ leader of Blackcurrant reversion virus mediates efficient in vivo translation through an internal ribosomal entry site mechanism
Translation mechanisms involving long-distance base pairing interactions between the 5ʹ and 3ʹ non-translated regions and internal ribosomal entry are conserved for both genomic RNAs of Blackcurrant reversion nepovirus
A -1 ribosomal frameshift element that requires base pairing across four kilobases suggests a mechanism of regulating ribosome and replicase traffic on a viral RNA
A long-distance RNA-RNA interaction plays an important role in programmed -1 ribosomal frameshifting in the translation of p88 replicase protein of Red clover necrotic mosaic virus
RNA dimerization plays a role in ribosomal frameshifting of the SARS coronavirus
Multifaceted regulation of translational readthrough by RNA replication elements in a tombusvirus
Genome cyclization as strategy for flavivirus RNA replication
A discontinuous RNA platform mediates RNA virus replication: building an integrated model for RNA-based regulation of viral processes
In vitro RNA synthesis from exogenous dengue viral RNA templates requires long range interactions between 5ʹ- and 3ʹ-terminal regions that influence RNA structure
Essential role of cyclization sequences in flavivirus RNA replication
Fine mapping of a cis-acting sequence element in yellow fever virus RNA that is required for RNA replication and cyclization
Long-range RNA-RNA interactions circularize the dengue virus genome
This report uses AFM to visualize single viral RNA molecule circularization and identifies a second long-range interaction that is required for Dengue virus replication
Functional analysis of dengue virus cyclization sequences located at the 5ʹ and 3ʹUTRs
West Nile virus genome cyclization and RNA replication require two pairs of long-distance RNA interactions
Genetic analysis of West Nile virus containing a complete 3ʹCSI RNA deletion
Interplay of RNA elements in the dengue virus 5ʹ and 3ʹ ends required for viral RNA replication
The 5ʹ and 3ʹ downstream AUG region elements are required for mosquito-borne flavivirus RNA replication
Structural complexity of Dengue virus untranslated regions: cis-acting RNA motifs and pseudoknot interactions modulating functionality of the viral genome
Core protein-mediated 5ʹ-3ʹ annealing of the West Nile virus genomic RNA in vitro
Novel ATP-independent RNA annealing activity of the dengue virus NS3 helicase
This study reports the identification of a 5ʹ-proximal RNA element that is recognized by the viral RdRp and proposes a model that involves the relocation of the RdRp to the 3ʹ-terminus via long-range RNA-RNA base pairing
Terminal structures of West Nile virus genomic RNA and their interactions with viral NS5 protein
Novel cis-acting element within the capsid-coding region enhances flavivirus viral-RNA replication by regulating genome cyclization
A balance between circular and linear forms of the dengue virus genome is crucial for viral replication
Differential RNA sequence requirement for dengue virus replication in mosquito and mammalian cells
Do RNA viruses require genome cyclisation for replication?
Kinetics and functional studies on interaction between the replicase proteins of Tomato bushy stunt virus: requirement of p33:p92 interaction for replicase assembly
Specific binding of tombusvirus replication protein p33 to an internal replication element in the viral RNA is essential for replication
Defining the roles of cis-acting RNA elements in tombusvirus replicase assembly in vitro
Internal RNA replication elements are prevalent in Tombusviridae
Nidovirales: evolving the largest RNA virus genome
A contemporary view of coronavirus transcription
Identification of a coronavirus transcription enhancer
Gene N proximal and distal RNA motifs regulate coronavirus nucleocapsid mRNA transcription
This paper reports the characterization of an extremely long-range RNA-RNA interaction, which is the longest interaction that has been reported so far, spanning ~26,000 nucleotides
Subgenomic mRNA transcription in Tombusviridae
An RNA activator of subgenomic mRNA1 transcription in tomato bushy stunt virus
Regulatory activity of distal and core RNA elements in tombusvirus subgenomic mRNA2 transcription
A complex network of RNA-RNA interactions controls subgenomic mRNA transcription in a tombusvirus
Higher-order RNA structural requirements and small-molecule induction of tombusvirus subgenomic mRNA transcription
Long-distance base pairing in flock house virus RNA1 regulates subgenomic RNA3 synthesis and RNA2 replication
RNA-mediated trans-activation of transcription from a viral RNA
RNA-based regulation of transcription and translation of aureusvirus subgenomic mRNA1
Subgenomic mRNA transcription in tobacco necrosis virus
Evidence for a premature termination mechanism of subgenomic mRNA transcription in a carmovirus
Molecular crowders and cosolutes promote folding cooperativity of RNA under physiological ionic conditions
RNA structure analysis at single nucleotide resolution by selective 2ʹ-hydroxyl acylation and primer extension (SHAPE)
SHAPE-directed RNA secondary structure prediction
Long-range architecture in a viral RNA genome
In vitro secondary structure of the genomic RNA of satellite tobacco mosaic virus
This paper reports secondary-structure modelling of the ~4.8 kb RNA genome of TBSV and analyses newly identified structures and interactions
Rbfox proteins regulate alternative mRNA splicing through evolutionarily conserved RNA bridges
5ʹ-3ʹ-UTR interactions regulate p53 mRNA translation and provide a target for modulating p53 induction after DNA damage
Base pairing interaction between 5ʹ- and 3ʹ-UTRs controls icaR mRNA translation in Staphylococcus aureus

The authors apologize to colleagues whose research could not be cited owing to space limitations. The work in the authors' laboratory is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

The authors declare no competing interests.