key: cord-0002997-ukcxh8qa
authors: Lim, Chun Shen; Brown, Chris M.
title: Know Your Enemy: Successful Bioinformatic Approaches to Predict Functional RNA Structures in Viral RNAs
date: 2018-01-04
journal: Front Microbiol
DOI: 10.3389/fmicb.2017.02582
sha: 82b043ec316b0bfb8350d65b09f6028d4ea9e564
doc_id: 2997
cord_uid: ukcxh8qa

Structured RNA elements may control virus replication, transcription and translation, and their distinct features are being exploited by novel antiviral strategies. Viral RNA elements continue to be discovered using combinations of experimental and computational analyses. However, the wealth of sequence data, notably from deep viral RNA sequencing, viromes, and metagenomes, necessitates computational approaches being used as an essential discovery tool. In this review, we describe practical approaches being used to discover functional RNA elements in viral genomes. In addition to success stories in new and emerging viruses, these approaches have revealed some surprising new features of well-studied viruses e.g., human immunodeficiency virus, hepatitis C virus, influenza, and dengue viruses. Some notable discoveries were facilitated by new comparative analyses of diverse viral genome alignments. Importantly, comparative approaches for finding RNA elements embedded in coding and non-coding regions differ. With the exponential growth of computer power we have progressed from stem-loop prediction on single sequences to cutting edge 3D prediction, and from command line to user friendly web interfaces. Despite these advances, many powerful, user friendly prediction tools and resources are underutilized by the virology community.

This review illustrates the key concepts and strategies used for prediction of RNA structural elements in RNA viral sequences. A range of RNA structure prediction software and relevant resources is available, but most are underutilized by virologists. Here the concepts and strengths of these methods are introduced using examples of successful approaches in viruses, with the intention of bridging the gap. The roles of RNA elements in viral biology are illustrated using well-studied viruses: flaviviruses, influenza, and barley yellow dwarf virus (BYDV). We further review the structures and functions of well-characterized types of RNA elements with the emphasis on prediction approaches and their limitations.

There have been several excellent recent reviews on predicting RNA structures in general, in particular relating to integrating experimental data and to 3D predictions (Cantara et al., 2014; Achar and Saetrom, 2015; Weeks, 2015; Dawson and Bujnicki, 2016; Lorenz et al., 2016; Turner and Mathews, 2016). RNA 3D structure prediction methodology and the incorporation of experimental constraints are beyond the scope of this review, but we include examples where they have been utilized.

Stems involving the canonical base-pairs (G-C and A-U Watson-Crick pairs, and G-U wobble pairs) are the basis of most viral RNA structures; indeed, the stem-loop is the basic building block (Table 1, Figure 1). These stems usually form an A-form helix, as the 2′-hydroxyl prevents the B-form helix found in DNA. RNA sections with unpaired bases may form structures such as loops or bulges (Bindewald et al., 2008; Table 1, Figure 1). An RNA secondary structure is more likely to be functional if it (i) has a low minimum free energy (MFE) that enables it to fold and base-pair, and/or (ii) is conserved during evolution with covarying stem base-pairs (compensatory base-pair changes). This RNA structural conservation is based on the concept that RNA stems can be conserved regardless of the base-pairs used (Akiyama et al., 2016; Rivas et al., 2017). Both of these features can form the basis of predicting new RNA elements (Xu and Mathews, 2016; Taylor and Hamilton, 2017) and can be integrated with experimental data. Notably, in addition to the fold with the lowest free energy, suboptimal MFE predictions are particularly useful in assessing possible alternative structures of RNA (e.g., pseudoknots; Theis et al., 2008) and long-distance base-pairs (long-range interactions; Table 1).
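The idea of folding a single sequence using only canonical base-pairs can be sketched with a toy dynamic program. The sketch below is a Nussinov-style base-pair maximization, which is a deliberate simplification: tools such as mfold and RNAfold minimize a thermodynamic free energy rather than count pairs. The function name `max_pairs_fold` and the minimum hairpin loop size of three bases are assumptions made for illustration.

```python
# Toy illustration only: maximizes the NUMBER of canonical pairs (Nussinov-style),
# not the thermodynamic free energy that mfold/RNAfold minimize.
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases enclosed by a pair


def max_pairs_fold(seq):
    """Return (number_of_pairs, dot_bracket) for one optimal pairing of seq."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - MIN_LOOP):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in CANONICAL:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back one optimal solution with an explicit stack of intervals.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue  # interval too short to contain a pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in CANONICAL:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    stack.append((k + 1, j - 1))
                    if k > i:
                        stack.append((i, k - 1))
                    break
    total = best[0][n - 1] if n else 0
    return total, "".join(structure)


if __name__ == "__main__":
    print(max_pairs_fold("GGGAAAUCCC"))
```

Several different pairings can reach the same maximum count, so the returned structure is only one of the optimal solutions. This degeneracy is one reason why energy-based scoring and suboptimal structure enumeration are preferred in practice.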

The limits of current methodology mean that stems are usually predicted initially using only the canonical base-pairs. However, many non-canonical base-pairs and other structural elements are found in experimentally determined RNA structures (Table 1, Figure 1). About 40% of bases in known crystal and solution structures are either unpaired or form non-canonical interactions (Stombaugh et al., 2009). Some of the more common non-Watson-Crick pairs in the RNA Basepair Catalog are U-U (about 10% as frequent as A-U pairs, 432 of 4,200) and A-G (about 2% as frequent as C-G, 191 of 9,316; Stombaugh et al., 2009). Examples of such elements in viruses include (i) the base-triple in retroviral encapsidation signals (D'souza et al., 2004) and the base-triples in the pseudoknots of Beet western yellows virus (Su et al., 1999) and Sugarcane yellow leaf virus (Cornish et al., 2005; Figure 1A), and (ii) the kink-turn/k-turn (Figure 1A, Table 1) in the panhandle RNA structure of Influenza A virus that is inducible (Lee et al., 2016) and the A-minor k-turn in the encapsidation signal of Moloney murine leukemia virus (Miyazaki et al., 2010; Table 1, Figure 1).

RNA 3D structures can also be predicted directly from sequences. The accuracy of these prediction tools, such as RNAComposer (Popenda et al., 2012; Antczak et al., 2016), 3dRNA (Zhao et al., 2012; Wang et al., 2017), and SimRNAweb (Magnus et al., 2016), has improved in the past few years (Miao and Westhof, 2017). Notably, SimRNAweb has accurately predicted a previously solved frameshifting RNA pseudoknot from Beet western yellows virus (Egli et al., 2002).

These predictions can be tested experimentally. For example, to demonstrate that a predicted RNA structure exists and is functionally important, a wild type phenotype can be destroyed with mutations that disrupt the RNA structure (e.g., Fang et al., 2012; Chapman et al., 2014b). This phenotype may be restored by compensatory base-pair changes, i.e., changes to the primary sequence that still allow base-pairing. However, primary sequence motifs and structures of loops and bulges may also have important roles (Bindewald et al., 2008).

The conventional representation of an RNA structure is the 2D stem-loop diagram (Figure 2). However, the stem-loop diagram is not suitable for representing higher order interactions such as pseudoknotted interactions (Figure 2C, Table 1). In contrast, these tertiary interactions are easier to interpret when represented by dot-bracket notation (Hofacker et al., 1994), circular diagrams (Nussinov et al., 1978), or arc diagrams (Wattenberg, 2002) (e.g., Figures 2A,B,D, respectively). These diagrams can be generated using VARNA, which requires dot-bracket notation as the input (Darty et al., 2009). Arc diagrams can also be created using R-CHIE, which is available as an R package and a web service (Lai et al., 2012).
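As an illustration of how dot-bracket notation encodes base-pairs, the following minimal sketch converts a dot-bracket string into a list of paired positions and reports crossing pairs, which indicate a pseudoknot. It assumes the common extended convention in which additional bracket types such as `[]` mark pseudoknotted stems; the function names are hypothetical, and the exact notation accepted by a given tool should be checked in its documentation.

```python
OPENERS = "([{<"
CLOSERS = ")]}>"


def parse_dot_bracket(structure):
    """Return a sorted list of 0-based (i, j) base-pairs from a dot-bracket string."""
    stacks = {opener: [] for opener in OPENERS}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in OPENERS:
            stacks[ch].append(pos)
        elif ch in CLOSERS:
            opener = OPENERS[CLOSERS.index(ch)]
            if not stacks[opener]:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stacks[opener].pop(), pos))
        # any other character (e.g. '.') is treated as an unpaired base
    unclosed = sorted(p for stack in stacks.values() for p in stack)
    if unclosed:
        raise ValueError(f"unclosed bracket(s) at positions {unclosed}")
    return sorted(pairs)


def crossing_pairs(pairs):
    """Return all pairs of base-pairs (i, j), (k, l) with i < k < j < l."""
    return [(a, b) for a in pairs for b in pairs if a[0] < b[0] < a[1] < b[1]]


if __name__ == "__main__":
    hairpin = parse_dot_bracket("((...))")
    knot = parse_dot_bracket("((..[[..))..]]")
    print(hairpin, bool(crossing_pairs(hairpin)))
    print(knot, bool(crossing_pairs(knot)))
```

A nested structure yields no crossing pairs, whereas the second toy string, in which the square-bracket stem spans the closing half of the round-bracket stem, yields crossings and would be drawn as overlapping arcs in an arc diagram.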

However, these diagrams and dot-bracket notation normally represent one sequence at a time. Often common RNA elements are found in related sequences, e.g., viral genotypes. Stockholm format is commonly used to represent the consensus RNA secondary structure of aligned sequences (Figure 2E; https://en.wikipedia.org/wiki/Stockholm_format). Stockholm format also stores some metadata, e.g., the description of the aligned RNA sequences. R2R uses a Stockholm file as the input to generate a novel stem-loop diagram of the consensus RNA secondary structure annotated with sequence conservation and covarying base-pairs (Figure 2F). Stockholm format and R2R stem-loop diagrams are both used by the Rfam database (see section on "Sources of Known RNA Structures"). Stockholm format file editors are available (Griffiths-Jones, 2005; Waterhouse et al., 2009).
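To make the format concrete, here is a minimal sketch of reading the aligned sequences and the consensus structure line (`#=GC SS_cons`) from a Stockholm file. The toy alignment is made up for illustration, and the parser handles only the features shown here; it ignores other markup lines and is not a substitute for an established parser.

```python
# Made-up toy alignment in Stockholm format (not taken from Rfam).
EXAMPLE = """# STOCKHOLM 1.0
#=GF DE Toy hairpin alignment for illustration
seqA         GGGAAAUCCC
seqB         GGCAAAUGCC
seqC         GAGAAAUCUC
#=GC SS_cons <<<....>>>
//
"""


def parse_stockholm(text):
    """Return (sequences, ss_cons) from Stockholm-formatted text.

    sequences maps each name to its aligned sequence. Interleaved blocks are
    concatenated. Only '#=GC SS_cons' markup is read; other '#' lines are skipped.
    """
    sequences = {}
    ss_cons = ""
    for raw in text.splitlines():
        fields = raw.split()
        if not fields or fields[0] == "//":
            continue
        if fields[:2] == ["#=GC", "SS_cons"] and len(fields) >= 3:
            ss_cons += fields[2]
        elif fields[0].startswith("#"):
            continue  # header, #=GF, #=GS, #=GR and other #=GC lines
        elif len(fields) >= 2:
            name, chunk = fields[0], fields[1]
            sequences[name] = sequences.get(name, "") + chunk
    return sequences, ss_cons


if __name__ == "__main__":
    seqs, consensus = parse_stockholm(EXAMPLE)
    for name, seq in seqs.items():
        print(f"{name:8s} {seq}")
    print(f"{'SS_cons':8s} {consensus}")
```

The consensus line uses angle brackets here, as is common in Rfam-style annotation; the paired columns it marks are the ones that would be checked for covariation.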

Many RNA structure prediction tools were initially released as command line software (Zuker, 1989). Biologists and virologists interested in using these tools first had to learn the command line interface, which was, and remains, a barrier for many researchers. However, developers have made significant efforts to make their tools more readily available where possible, as webservers (Fallmann et al., 2017) or integrated graphical user interfaces (e.g., RNAStructure, or the Simple Sequence Editor, SSE; Simmonds, 2012; Bellaousov et al., 2013; Wang et al., 2017). For example, mfold, which is the most cited RNA software in virology papers, was first released as command line software in the late 1980s and became available through web interfaces in the early 2000s (Zuker, 2003).

In contrast to folding one sequence at a time [single-sequence methods, e.g., mfold/UNAFold (Zuker, 2003) and RNAfold], a new generation of software such as LocARNA (Smith et al., 2010) and RNAz (Gruber et al., 2007) works on multiple sequences (comparative methods). This removes the need to predict RNA structures from related virus sequences one at a time and compare them manually. Different methods vary in whether they align or fold first or do both simultaneously (Gardner and Giegerich, 2004). The current range of functional RNA structures and prediction tools may seem intimidating (http://en.wikipedia.org/wiki/List_of_RNA_structure_prediction_software). However, many RNA structure prediction tools and RNA-RNA interaction prediction tools have been compared for use in different applications (Gardner and Giegerich, 2004; Gardner et al., 2005; Puton et al., 2013; Umu and Gardner, 2016). In general, comparative methods are more accurate than the older single-sequence methods (Puton et al., 2013).

TABLE 1 | Structural RNA elements, the most used prediction tools, and challenges for their prediction.

Stem-loop/hairpin. The helical stem consists of base-pairs. The loop consists of unpaired or non-canonical base-pairs (Zhang et al., 2011).
Most used prediction tools: mfold/UNAFold (Zuker, 2003), RNAfold (Lorenz et al., 2011), RNAStructure/Fold.
Challenges: Predictions normally only consider standard or canonical base-pairs C-G, U-A, and U-G. Single base-pairs ("lone pairs") are often excluded by default. Functionally important alternative structures depending on ligand binding need special consideration (e.g., riboswitches).

Bulge. A region of a helix where there are no canonical base-pairs at one strand (Zhang et al., 2011).
Most used prediction tools: mfold/UNAFold, RNAfold, RNAStructure/Fold.
Challenges: Predicted bulges may use non-canonical base-pairs, e.g., U-U, A-G, kink-turn.

Internal loop. A region of a helix where there are no canonical base-pairs at both strands (Zhang et al., 2011).
Most used prediction tools: mfold/UNAFold, RNAfold, RNAStructure/Fold.
Challenges: Predicted internal loops may use non-canonical base-pairs, e.g., U-U, A-G.

Tetraloop. A four-base terminal loop stabilized by intra-loop hydrogen bonds. This stabilizes the stem-loop structure. The GNRA loop is most common, where N represents any base and R represents either A or G (Zhang et al., 2011).
Most used prediction tools: mfold/UNAFold, RNAfold (free energy bonus). RNAComposer (Popenda et al., 2012), 3dRNA (Zhao et al., 2012) (RNA 3D motif prediction).
Challenges: Most 2D predictions do not predict the intra-loop pair (e.g., the G-A pair of the GNRA loop). 3D predictions may predict them. Should be considered if a terminal four-base loop is predicted. Other types of loops, e.g., tri-loops and anticodon-like loops, can also be stabilized.

Pseudoknot. Bipartite structure in which the loop of one stem-loop base-pairs with a sequence outside of the stem-loop (Zhang et al., 2011).
Most used prediction tools: PknotsRG (Janssen and Giegerich, 2015), DotKnot (Sperschneider and Datta, 2010), Pknots (Rivas and Eddy, 1999).
Challenges: Not predicted by most 2D software. Alternative forms of pseudoknot are found.

Kink-turn/k-turn. A three nucleotide bulge in a helix followed by G-A and A-G pairs. Bends the helix (Petrov et al., 2013).
Most used prediction tools: RNAComposer, 3dRNA.
Challenges: Widespread, but most software will not predict these due to non-canonical base-pairs. Requires 3D or homology based software, which is yet to be integrated into the most used RNA structure prediction tools.

Junction. The point of connection between a number of different helices (Lilley et al., 1995; Bindewald et al., 2008; Zhang et al., 2011).
Most used prediction tools: mfold/UNAFold, RNAfold, RNAStructure/Fold.
Challenges: Junctions may be important for ligand binding but 3D structures are difficult to predict.

Base-triple. A group of three bases which interact by hydrogen-bonds that include edge-edge bonds.
Most used prediction tools: RNAComposer, 3dRNA.
Challenges: Common in 3D structures.

Kissing hairpins/kissing loops/kissing stem-loops. Base-pair interactions between the loops of two stem-loops (Zhang et al., 2011).
Most used prediction tools: pAliKiss (Janssen and Giegerich, 2015).
Challenges: Difficult to predict without prior knowledge.

tRNA-like or cloverleaf structures. Structures with a tRNA-like tertiary structure. In viruses pseudoknots are often located nearby.
Most used prediction tools: Combination of stem-loop and pseudoknot prediction tools.
Challenges: No specialized tools available to date.

Base-pair interactions over long distances. Arbitrarily defined as base-pairs over 100 bases apart.
Most used prediction tools: mfold/UNAFold, LRIscan, CovaRNA (Bindewald and Shapiro, 2013).
Challenges: Difficult to predict without prior knowledge. Only two specialized tools are available to date: CovaRNA and LRIscan. Only LRIscan is optimized for viral genomes, and it is yet to be proven useful.

Inter-molecular interactions. Base-pair interactions between two RNA molecules, e.g., two copies of an RNA genome.
Most used prediction tools: RNAhybrid (Rehmsmeier et al., 2004), RNAaliduplex (Lorenz et al., 2011), bifold (Mathews et al., 1999).
Challenges: Difficult to predict without prior knowledge.

Many of these powerful applications have been underused by virologists. For example, Infernal (INFERence of RNA Alignment; Nawrocki and Eddy, 2013) and CMfinder (Yao et al., 2006), which are based on both sequence and RNA secondary structure conservation, allow sensitive detection of homologous RNA structures. A list of software that has been cited in selected virology publications is available (http://bioanalysis.otago.ac.nz/Lim2017.htm). Notably, these are beginning to include newer webservers which predict RNA 2D and 3D structures with high confidence.

Current methods often provide a 2D and oversimplified view of a certain sequence forming a single RNA structure. This is incorrect, particularly in viral RNAs, where structures need to be transiently formed and melted (Moss et al., 2012a; Zhu and Meyer, 2015). This one-to-one sequence and structure relationship is also not true in many RNA viruses because they may exist in a quasispecies state where sequence space is sampled by high levels of replication error (Holmes, 2010; Lauring and Andino, 2010; Marz et al., 2014). Conservation of RNA structures, but not of the primary sequences, across rapidly evolving species is particularly striking, e.g., the HIV frameshift site is one of the most conserved parts of the genome (Mathew et al., 2015).

The starting point for RNA structure analysis is likely to be a complete (or partial) RNA genome (Figure 3). This could be from a well-studied virus, or come from an outbreak of a new or emerging virus (e.g., SARS or Zika). The aim of these analyses is to further understand the biology of the viruses, and also to identify drug or vaccine targets. Viral RNA elements have been identified as antiviral targets due to conservation of sequence and functions that are distinguishable from the host (Panjaworayan and Brown, 2011; Chen et al., 2014; Cardno et al., 2015; Le Grice, 2015; Hermann, 2016; Hilimire et al., 2017). For example, the internal ribosome entry site (IRES) of hepatitis C virus (HCV) is targeted by benzimidazole (Dibrov et al., 2012). Another example is the HIV frameshift site, which has characteristics distinct from human frameshift sites (Mathew et al., 2015; Hermann, 2018). In addition, double stranded viral RNA structures could be targeted by the host innate immune response, e.g., through Protein kinase RNA-activated (PKR) (Hartmann, 2017).

Targeting these viral specific features requires understanding of both the viral genomic sequence and its functional and sequence variation constraints, including gene structure and RNA cis-elements (Newburn and White, 2015; Hermann, 2018).

As a first step, a similarity search on the NCBI/RefSeq database may not only allow identification of the virus, but also identify related viral sequences that could assist in predicting functional elements (Figure 3). Deep and accurate multiple sequence alignment is crucial in predicting likely RNA structures (Fallmann et al., 2017). Specialized databases may also provide high quality sequence alignments to researchers, such as the LANL sequence databases for HIV, HCV, and hemorrhagic fever viruses (e.g., Ebola; Kuiken et al., 2012; Hatcher et al., 2017).

A novel virus can be classified according to the International Committee on Taxonomy of Viruses (ICTV) (King et al., 2011). If its species is well-characterized, the viral biology can be inferred from published literature, facilitated by general databases, e.g., the ViralZone knowledgebase (Hulo et al., 2011), and by specialized parts of the sequence databases, e.g., NCBI Virus Variation Resource (Hatcher et al., 2017), RefSeq (O'leary et al., 2016), and NCBI Viral Genomes Resource (Brister et al., 2015). ICTV and ViralZone are further discussed in the next section "Virus Biology and RNA Structures."

In conjunction with RNA structure analysis, potential coding sequences (CDS) can be predicted, in at least the three forward reading frames. This is an important step prior to prediction of RNA structures located in the coding sequence (Liu et al., 2009; Firth, 2014), for example frameshifting elements (Giedroc and Cornish, 2009). The beginnings and ends of these potential CDS are hotspots for RNA structures (Newburn and White, 2015).
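A simple scan of the three forward reading frames can be sketched as follows. This is a minimal illustration that assumes AUG start codons, the standard stop codons, and a hypothetical minimum length cut-off (`min_codons`). Real viral genomes may use non-AUG initiation, frameshifting, readthrough, or the complementary strand, none of which this sketch handles.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def forward_orfs(genome, min_codons=30):
    """Return (frame, start, end) for ATG...stop ORFs in the three forward frames.

    Coordinates are 0-based; end is exclusive and includes the stop codon.
    min_codons counts the codons from the start codon up to, but not including,
    the stop codon (an arbitrary cut-off chosen for illustration).
    """
    seq = genome.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == START_CODON:
                start = pos  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((frame, start, pos + 3))
                start = None
    return orfs


if __name__ == "__main__":
    toy = "CC" + "AUG" + "GCU" * 5 + "UAA" + "GG"  # made-up sequence
    for frame, start, end in forward_orfs(toy, min_codons=3):
        print(f"frame {frame}: {start}-{end}")
```

The reported start and end coordinates mark the regions that the text above describes as hotspots for RNA structures, so they can be used to choose windows for closer structural analysis.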

If possible, alignments should be made to assist in identifying likely CDS and RNA structures (Firth, 2014; Figure 3). Similar sequences may be found with blastn, although non-coding similarity may be missed unless the initial hit size (word size) is reduced from the default of 11 to the greatest sensitivity available: 7. Alternatively, more sensitive local similarity search programs based on the Smith-Waterman algorithm, such as FASTA (SSEARCH; Lipman and Pearson, 1985; Pearson and Lipman, 1988) and SWIPE (Rognes, 2011), may be used, but these are slower than blastn. FASTA is available through EMBL-EBI tools (https://www.ebi.ac.uk/Tools; Mcwilliam et al., 2013). Creating alignments for detection of elements within CDS can be facilitated by searching with the encoded protein (e.g., tblastn and tblastx). This will give greater sensitivity than blastn searches.

If it is known that the RNA regions encode proteins (CDS) and/or contain RNA structures, alignment algorithms that consider this should be used [e.g., webPRANK (Löytynoja and Goldman, 2010) or R-Coffee (Taly et al., 2011), respectively; Figure 3]. RNA structures can also be detected in unaligned sequences, although these methods are more computationally intensive. Ideally, RNA primary sequence alignments should have dissimilarity of about 5-20% (Theis et al., 2015). Near identical aligned sequences may lack the complexity that allows accurate RNA structure prediction and are not usually included in the prediction phase (see the success story on "RNA Structures in Coding Regions of Influenza A Virus"). However, the phenotype of a viable virus with a mutation in the structure may be informative (Kobayashi et al., 2016).

FIGURE 3 | Approaches in prediction of structured RNA elements in RNA viruses. A virus sequence of interest can be matched to the NCBI/RefSeq database (see section "KNOW YOUR ENEMY"). A range of related sequences can be aligned using RNA structure informed and/or CDS informed approaches. Structured RNA elements of a virus are likely conserved in structure rather than primary sequence (red, blue, and green dots indicate mismatches). Secondary structures can be predicted from the aligned sequence. Covariation of a secondary structure can be tested statistically. Secondary structures can also be predicted directly using the minimum free energy (MFE) approach. RNA 3D prediction can also be done.
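The suggested dissimilarity range can be checked on an existing alignment with a short calculation. The sketch below computes pairwise percent dissimilarity over ungapped columns and the mean over all pairs. The decision to skip gapped columns and the function names are assumptions for illustration; published tools may define dissimilarity differently.

```python
from itertools import combinations


def percent_dissimilarity(a, b, gap="-"):
    """Percent of mismatched positions between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        mismatches += x != y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * mismatches / compared


def mean_pairwise_dissimilarity(aligned):
    """Mean percent dissimilarity over all pairs of aligned sequences."""
    values = [percent_dissimilarity(a, b) for a, b in combinations(aligned, 2)]
    if not values:
        raise ValueError("need at least two sequences")
    return sum(values) / len(values)


if __name__ == "__main__":
    toy = ["GGGAAAUCCC", "GGCAAAUGCC", "GGGAAAUCCC"]  # made-up alignment
    mean = mean_pairwise_dissimilarity(toy)
    verdict = "within" if 5.0 <= mean <= 20.0 else "outside"
    print(f"mean dissimilarity {mean:.1f}% ({verdict} the suggested 5-20% range)")
```

Alignments far below the range contain too few substitutions to reveal covariation, whereas alignments far above it may be too divergent to align reliably.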

Most RNA structures play cis-regulatory roles in various stages of the virus life cycle. Therefore, the functions of RNA structures can partly be inferred from their locations (Newburn and White, 2015). The RNA structures located near the 5′ end are mostly involved in replication and initiation of translation, such as the dimer linkage structure (DLS) of retroviruses (Johnson and Telesnitsky, 2010) and the IRES of Picornaviridae, Flaviviridae (in particular HCV), and Dicistroviridae, respectively (see section on "Internal Ribosome Entry Sites (IRES)"; Lee et al., 2017). Overlapping CDS may indicate frameshifts, which would then direct the search to specific primary features and nearby stem-loops or pseudoknots (Miras et al., 2017; see sections on "KNOW YOUR ENEMY" and "pseudoknots"). In contrast, RNA structures located near the 3′ end are often important in nuclear export of viral RNAs, such as the Rev response element (RRE) of human immunodeficiency virus (HIV) (Groom et al., 2009), and in replication, processing, or RNA stability (Newburn and White, 2015). However, other elements, e.g., cis-acting replication elements (CRE), can be found in various genomic locations.

For example, the CRE is located at the 3′ end of HCV but in the CDS of poliovirus (Tuplin et al., 2002; Dutkiewicz et al., 2016). Structured RNA elements in different locations of many viral genomes were reviewed in detail by Romero-López and Berzal-Herranz (2013).

Some guidance on what structures to look for can also be obtained from the classification and biology of the virus of interest. ViralZone provides up-to-date information about viral biology, but it is protein and virus centered, rather than RNA structure focused (Hulo et al., 2011). As of June 2017, it documents the biology of 110 viral families based on literature review, and each entry is linked to UniProt viral proteins. In ViralZone, summaries have been made under the section "Viral molecular biology: Transcription, replication, translation" (http://viralzone.expasy.org/915). This allows us to infer the viral molecular biology, which in turn provides some clues as to what structural RNA elements to search for.

Currently, the ICTV master species list (2016 v1.3) has the taxonomic classification of 4404 viruses and viroids, of which 44% are RNA viruses (Figures 4A,B). There are a total of 73 RNA virus families. Notably, over half (58%) of the RNA viruses are positive-sense single-stranded RNA viruses. RNA viruses are often enriched with RNA structures. This is partly due to the replication and transcription of RNA viruses occurring in the cytoplasm, which are regulated by viral RNA elements. The genomes and transcripts of some RNA viruses lack the 5′-m7G cap, requiring cap-independent translation (Simon and Miller, 2013). Indeed, some RNA viruses (e.g., picornaviruses) shut off host mRNA translation and use cap-independent translation such as IRES-mediated translation (Chase and Semler, 2012).

In contrast, over 99% of bacterial and archaeal viruses (bacteriophages) are DNA viruses (Figure 4A; ICTV master species list 2016 v1.3), although these may use RNA structures in their life cycles, notably as regulatory switches (Walsh and Mohr, 2011; Yang et al., 2014), and may have structured ncRNA (Hill et al., 2016). Only two RNA virus families infect bacteria, namely Leviviridae and Cystoviridae, which are positive-sense single-stranded RNA and double-stranded RNA viruses, respectively (Figure 4C). Several RNA bacteriophages are well-characterized, such as MS2, Qβ, F1, and phi6. In particular, the 19-nucleotide MS2 packaging signal stem-loop of E. coli MS2 phage has been extensively studied. This high affinity MS2 packaging signal stem-loop is located at the ribosomal binding site of the replicase mRNA. Translation is inhibited upon the strong and specific binding of MS2 capsid protein (Peabody, 1990; Lim and Peabody, 1994; Stockley et al., 1995; Johansson et al., 1998). Recent studies indicate that other RNA viral genomes may have multiple structured capsid protein binding sites (Patel et al., 2017). The properties of MS2 have been exploited for various novel applications such as pull-down, tethering proteins to RNAs, RNA affinity purification, and live cell imaging of RNAs and protein-RNA interactions (Bardwell and Wickens, 1990; Bertrand et al., 1998; Coller et al., 1998; Graveley and Maniatis, 1998; Rackham and Brown, 2004).

An example of a useful resource that is not frequently cited by virus research articles is Rfam, the database of RNA structure families (Nawrocki et al., 2014). It contains over 105 viral RNA structural elements from both DNA and RNA viruses (Rfam 12.2, release January 2017; Figure 5). The most common viral RNA elements in Rfam are those in 3′ UTRs (e.g., CITEs), 5′ UTRs (e.g., IRES), and packaging elements [e.g., packaging elements (n = 8) and cis-replication elements (n = 17, CRE) or encapsidation elements].

Importantly, the Rfam database can be used to annotate a viral sequence by searching for known RNA families with simple online tools (Nawrocki et al., 2014). Alternatively, the roles of novel RNA structures can be inferred by comparing them to the existing RNA families (Eggenhofer et al., 2013). Once new RNA structures are characterized, researchers can submit them to Rfam as new RNA families. Automated resources and guides for building families are available (Eggenhofer et al., 2016). Building these models may be facilitated by using combinations of software (Chen X. S. and Brown, 2012), in particular the WAR webserver (Torarinsson and Lindgreen, 2008). The resulting families can then be published as family descriptions online and/or in RNA Biology (Gultyaev and Olsthoorn, 2010; Lim and Brown, 2016).

Experimentally determined three dimensional RNA structures and descriptors of common structural elements (e.g., kink-turns, Table 1) are found in the Nucleic Acid Database (NDB) and related databases (Coimbatore Narayanan et al., 2014). These elements can be automatically included in homology based 3D predictions (Antczak et al., 2016).

In addition to this general database, there are specialized databases containing particular structural or functional classes of elements, some of which are overrepresented in viruses. The database of pseudoknots (Pseudobase++) contains 252 virus records (accessed in June 2017). IRESite contains 44 viral IRES entries (June 2017; Mokrejs et al., 2010) . Recode contains many viral recoding sites, in particular RNA elements stimulating frameshifting and readthrough (Bekaert et al., 2010) .

To illustrate the key concepts of RNA structure prediction, in this section we review the approaches used to successfully study the RNA structures located in flaviviruses, influenza, and BYDV. Common types of RNA structures are illustrated in Figure 1 and described in Table 1 .

In choosing these examples we note that different concepts and approaches should be used in predicting the RNA structures located in the CDS in contrast to UTRs (Figure 3). RNA structures in the CDS have often been overlooked, and have only been discovered recently in some well-characterized viruses (see below). We will therefore review a success story that began with several careful bioinformatics analyses of the CDS of the influenza A virus (Moss et al., 2011). As in experimental approaches, these examples show that independent approaches and tools have been required to accurately predict an RNA structure.

Flaviviruses are positive-sense single-stranded RNA viruses, e.g., the mosquito-borne Dengue and Zika viruses. The RNA structures of flaviviruses have recently been reviewed (Villordo et al., 2016; Fernández-Sanlés et al., 2017). The sequences and RNA structures of the 3′ UTRs of flaviviruses have been studied for over three decades. Earlier studies found that the 3′ UTR sequences of flaviviruses are highly divergent immediately after the stop codon, but remarkably similar at the distal region of the 3′ UTR (Mandl et al., 1993; Wallner et al., 1995; Poidinger et al., 1996). Earlier computational and biochemical studies also found that a long stable hairpin structure at the 3′ UTRs of flaviviruses (3′-LSH) had a similar structure, but not sequence (Grange et al., 1985; Brinton et al., 1986; Hahn et al., 1987; Mandl et al., 1993; Wallner et al., 1995; Shi et al., 1996). Remarkably, the dumbbell RNA structures of the 3′ UTRs of flaviviruses were first discovered by Proutski et al. and Rauscher et al. in 1997 using only computational approaches (Proutski et al., 1997; Rauscher et al., 1997). Rauscher et al. used the Vienna RNA package and a comparative approach including multiple sequence alignment (Lorenz et al., 2011). They found many covarying base-pairs in these structures, providing compelling evidence for RNA structure conservation. For example, there are 10 statistically significant covarying base-pairs in flavivirus dumbbell structures (RF00525; Figures 2E,F) currently annotated in Rfam (Nawrocki et al., 2014). Covarying base-pairs of RNA structures and the depth of aligned sequences can be statistically tested using R-scape (Rivas et al., 2017).
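The notion of a covarying base-pair can be illustrated with a simple count over two alignment columns. The sketch below only lists which canonical pair types occur and flags columns where both positions change while pairing is retained (a compensatory change). It is not the statistical test implemented in R-scape, and the function name, the returned fields, and the toy alignment are assumptions for illustration.

```python
CANONICAL = {"GC", "CG", "AU", "UA", "GU", "UG"}


def classify_pair_columns(aligned, i, j):
    """Summarize the base-pair types seen at alignment columns i and j (0-based).

    Returns a dict with:
      pair_types   - sorted canonical pair types observed
      incompatible - number of sequences that cannot form a canonical pair
      covarying    - True if two observed pair types differ at BOTH positions
                     (e.g. G-C and A-U), i.e. a compensatory change
    """
    observed = [(s[i] + s[j]).upper().replace("T", "U") for s in aligned]
    pair_types = sorted(set(observed) & CANONICAL)
    incompatible = sum(pair not in CANONICAL for pair in observed)
    covarying = any(
        p[0] != q[0] and p[1] != q[1] for p in pair_types for q in pair_types
    )
    return {
        "pair_types": pair_types,
        "incompatible": incompatible,
        "covarying": covarying,
    }


if __name__ == "__main__":
    # Made-up alignment of a hairpin with paired columns (0,9), (1,8), (2,7).
    toy = ["GGGAAAUCCC", "GGCAAAUGCC", "GAGAAAUCUC"]
    for i, j in [(0, 9), (1, 8), (2, 7)]:
        print((i, j), classify_pair_columns(toy, i, j))
```

A raw count like this cannot distinguish structural conservation from covariation that arises through shared ancestry, which is why a statistical test on a deep alignment is needed before drawing conclusions.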

The RNA structures of flavivirus 3′ UTRs were subsequently refined and proposed as H-type pseudoknots (tertiary structures) by Olsthoorn and Bol (2001) using mfold with suboptimal folding (e.g., 5′ dumbbell of dengue virus 2 flavivirus; Figures 2A-D). The structures of these flavivirus RNA elements were recently validated by SHAPE (Selective 2′-Hydroxyl Acylation analyzed by Primer Extension) chemical probing, mutation analysis and X-ray crystallography (Manzano et al., 2011; Chapman et al., 2014a,b; Villordo et al., 2015; Akiyama et al., 2016).

More importantly, many independent experiments have successfully uncovered their complex roles, which have clinical implications. For example, deletion of the dengue virus 5′ dumbbell structure attenuates the virus, generating vaccine candidates that have been used for clinical testing (Whitehead et al., 2007). It has been shown that assembly of the host RNA helicase DDX6 and other proteins at the dumbbell structures of dengue virus 2 is required for virus replication (Manzano et al., 2011; Ward et al., 2011). These 3′ UTR structures also protect flaviviral subgenomic RNAs (sfRNAs) from digestion by the host Xrn1 5′-3′ exonuclease (Pijlman et al., 2008). These sfRNAs are pathogenic and important in regulating the viral life cycle (Manzano et al., 2011; Chapman et al., 2014b; Akiyama et al., 2016) and have been targeted by specific antiviral oligomers (Zhang et al., 2008).

Influenza A virus is a zoonotic virus that infects a wide range of mammals and birds (Shi et al., 2014). It is a negative-sense single-stranded RNA virus that has an eight-segment genome. Moss et al. (2011) undertook a careful analysis of complete genomes of Influenza A strains H5N1 and H1N1 infecting humans, birds, and swine from the NCBI Virus Variation Resource (Hatcher et al., 2017). This enabled them to discover many putative structured RNA elements located in the CDS of Influenza A virus (Moss et al., 2011).

To create a multiple sequence alignment, Moss et al. first translated the CDS into protein sequences. The aligned protein sequences were then converted back to nucleotide sequences. They scanned the aligned CDS for putative RNA structures using RNAz (Gruber et al., 2007). They used sliding windows of 120 nucleotides, with 10-nucleotide steps. This allows rapid prediction of local RNA structures in the 120-nucleotide windows of the whole aligned sequence. Using SSE (Simmonds, 2012), they also detected synonymous (for the encoded protein) sites in the aligned CDS that were constrained during evolution. These codon-based alignments detect synonymous constraints, possibly due to the presence of structured RNA elements. This is based on the assumption that synonymous substitutions in a CDS are restricted by base-pairing required for RNA folding, but such constraints could also be due to primary sequence conservation in RNA (or DNA).
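The two mechanical steps of this workflow, threading codons back onto an aligned protein and generating sliding windows, can be sketched as below. This is a simplified illustration and not the pipeline used by Moss et al.: the function names are hypothetical, the CDS is assumed to be in frame with its stop codon removed, and the structure scoring itself (done by RNAz in the study) is not shown.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Convert an aligned protein back to a codon-aligned nucleotide sequence.

    Each residue is replaced by its original codon and each gap by three gap
    characters. Assumes cds is in frame and excludes the stop codon.
    """
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of three")
    codons = [cds[k:k + 3] for k in range(0, len(cds), 3)]
    residues = sum(ch != gap for ch in aligned_protein)
    if residues != len(codons):
        raise ValueError("protein and CDS lengths are inconsistent")
    codon_iter = iter(codons)
    return "".join(gap * 3 if ch == gap else next(codon_iter) for ch in aligned_protein)


def sliding_windows(length, window=120, step=10):
    """Return half-open (start, end) windows covering an alignment of given length."""
    last_start = max(length - window, 0)
    return [(s, min(s + window, length)) for s in range(0, last_start + 1, step)]


if __name__ == "__main__":
    print(thread_codons("M-K", "AUGAAA"))  # made-up two-codon CDS
    print(sliding_windows(150))
```

Each window of the codon-aligned sequences would then be passed to a structure prediction tool, and overlapping windows with consistent predictions merged into candidate regions. With a step of 10, a final stretch shorter than the step can remain uncovered, which a production script would need to handle.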

Alternatively, codon-based alignment could have been done using webPRANK (Löytynoja and Goldman, 2010) or Codon Alignment (HCV sequence database; Kuiken et al., 2005). Sites under significant synonymous constraint in an aligned CDS can also be detected using FRESco (Sealfon et al., 2015) or synplot2 (Firth, 2014). Codon-based alignments have been used in many automated searches of viral genomes for conserved RNA structures or overlapping CDS (Hofacker et al., 2004; Firth and Brown, 2006; Firth, 2014). Moss et al. (2011) predicted and refined the potential structured regions using RNAalifold and Dynalign (Mathews and Turner, 2002). Pseudoknots were predicted using DotKnot (Sperschneider and Datta, 2010). Notably, a predicted pseudoknot located in genome segment 2 was subsequently shown to be consistent with chemical probing results (Priore et al., 2015). The predicted RNA structures near the splice junctions of the M1/M2 and NS1/NEP transcripts were also validated experimentally and/or found to be important for virus viability (Moss et al., 2012b; Jiang et al., 2016).

To improve the power of detecting putative RNA structural elements, subsequent studies focused on specific genes/genome segments, namely HA (surface glycoprotein hemagglutinin; Gultyaev et al., 2016), M (Kobayashi et al., 2016), and NP (nucleoprotein; Gultyaev et al., 2014; Soszynska-Jozwiak et al., 2015), using deep multiple sequence alignments. Indeed, new structured RNA elements have been continuously discovered. For example, Kobayashi et al. (2016) analyzed 1884 sequences of the M gene from 88 Influenza A virus subtypes. Similar to the approach of Moss et al. (2011), they scanned the deeply aligned CDS for potential structured RNA regions and synonymous variation using SSE (Simmonds, 2012). Prediction of the structured RNA regions was based on the UNAfold MFE algorithm implemented in SSE (Simmonds, 2012). They then predicted RNA structures in the regions with both a low MFE and a low synonymous substitution rate using RNAalifold. Remarkably, disrupting the base-pairs of the RNA structures located at the 5′ and 3′ ends of the M gene using synonymous mutations reduced the infectivity and attenuated the virus, respectively (Kobayashi et al., 2016).
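To make the idea of synonymous variation at a codon column concrete, the sketch below counts how many distinct codons encode the majority amino acid at each column of a codon alignment. This is only a crude proxy written for illustration; it is not the statistic computed by SSE, FRESco, or synplot2, and the function name `synonymous_codon_usage` is hypothetical. A column whose amino acid has several possible codons but uses only one is a candidate for synonymous constraint, although a real analysis needs a null model and many sequences.

```python
from collections import Counter
from itertools import product
from typing import List, Optional, Tuple

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_codon_usage(codon_alignment: List[str]) -> List[Tuple[Optional[str], int]]:
    """Per codon column: (majority amino acid, number of distinct codons encoding it)."""
    n_columns = len(codon_alignment[0]) // 3
    summary = []
    for col in range(n_columns):
        codons = [seq[3 * col:3 * col + 3].upper().replace("U", "T") for seq in codon_alignment]
        codons = [c for c in codons if c in CODON_TABLE]  # skip gaps and ambiguous codons
        if not codons:
            summary.append((None, 0))
            continue
        majority_aa = Counter(CODON_TABLE[c] for c in codons).most_common(1)[0][0]
        distinct = {c for c in codons if CODON_TABLE[c] == majority_aa}
        summary.append((majority_aa, len(distinct)))
    return summary


# Hypothetical toy alignment: the leucine column varies synonymously, the others do not.
toy = ["ATGCTTAAA", "ATGCTCAAA", "ATGCTGAAA"]
usage = synonymous_codon_usage(toy)  # [("M", 1), ("L", 3), ("K", 1)]
```

In this toy case the methionine column is uninformative (AUG is its only codon), whereas the invariant lysine column could in principle have varied between AAA and AAG.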

In sum, these studies highlight the strength of the comparative approach for detecting RNA structures in the CDS. The different comparative methods used by these studies can be compared and combined to achieve better results. However, these powerful comparative approaches are underutilized by virologists.

Luteoviruses, including BYDV, are important plant pathogens. BYDV infects barley, maize, oats, rice, and wheat, causing yellowing and dwarfing of the hosts (D'arcy and Domier, 2000). It is arguably the viral genome with the greatest range and diversity of RNA structures. The type member BYDV-PAV has a 5.7 kb positive-stranded RNA genome, and three coding and non-coding subgenomic RNAs (Figure 6). From the initial sequence of the genome and prediction of ORFs, it was likely that it would require multiple non-canonical translation events to make key proteins: sgRNA expression, frameshift, readthrough, leaky-scanning, and cap-independent translation (Miller et al., 1988). Much careful experimental work indicated that these events require both structured and loosely-structured RNA elements (Miras et al., 2017). Studies on frameshifting and cap-independent translation in BYDV showed that both local and distant sequences are required for full activity (Miller et al., 2015). Some of these are used as illustrations in the following sections.

Most computational analyses begin by predicting stem-loops, but these can be built up into more complex structures (Figure 1, Table 1). Prediction accuracy is increased if an alignment of sequences that fold into the same structure is used (Gorodkin et al., 2014). When a stem-loop is predicted, attention should also be given to bulges, internal loops, and terminal loops (Figure 1). For example, terminal loops may form more stable structures (e.g., tetraloops) and be sites of RNA or protein interaction; apparent bulges may form non-canonical pairs (e.g., A-G); and unpaired bases are more likely to form sites of interaction (Lozano et al., 2016). Modeling of loops is more difficult, but can be done thermodynamically (Sloma and Mathews, 2016), or by using similarity to known elements (e.g., tetraloops) or to experimentally determined folds (Theis et al., 2015; Roll et al., 2016; Phan et al., 2017).

In some cases the terminal loop may form additional "pseudoknot" base-pairs (Table 1, Figure 1C, and Figures 2A-D). These are most easily visualized on arc and circular diagrams of the suboptimal RNA secondary structures (Figures 2B,D). Pseudoknots are found in specific parts of the viral genome involved in translation and replication (Brierley et al., 2007; Atkins et al., 2016), such as domain IIIf of the HCV IRES (Figure 7A) and the dumbbell structures of dengue virus 2 (Figure 5), respectively.
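On an arc diagram, a pseudoknot appears as two arcs that cross: base-pairs (i, j) and (k, l) with i < k < j < l. The sketch below checks for this in an extended dot-bracket string, where a second bracket type marks the pseudoknotted pairs. It is a minimal illustration of the definition, not a prediction method, and the helper names are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[int, int]

OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSE_TO_OPEN = {close: open_ for open_, close in OPEN_TO_CLOSE.items()}


def parse_pairs(structure: str) -> List[Pair]:
    """Convert extended dot-bracket notation into a sorted list of 0-based base-pairs."""
    stacks = {open_: [] for open_ in OPEN_TO_CLOSE}
    pairs = []
    for position, symbol in enumerate(structure):
        if symbol in OPEN_TO_CLOSE:
            stacks[symbol].append(position)
        elif symbol in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[symbol]]
            if not stack:
                raise ValueError(f"unbalanced '{symbol}' at position {position}")
            pairs.append((stack.pop(), position))
    if any(stacks.values()):
        raise ValueError("unclosed bracket in structure")
    return sorted(pairs)


def crossing_pairs(pairs: List[Pair]) -> List[Tuple[Pair, Pair]]:
    """Return every pair of base-pairs whose arcs cross (i < k < j < l)."""
    ordered = sorted(pairs)
    return [
        (p, q)
        for index, p in enumerate(ordered)
        for q in ordered[index + 1:]
        if p[0] < q[0] < p[1] < q[1]
    ]


def has_pseudoknot(structure: str) -> bool:
    return bool(crossing_pairs(parse_pairs(structure)))


# Hypothetical H-type pseudoknot: the loop of the first stem pairs downstream.
h_type = "((..[[..))..]]"
nested = "((..))..((..))"
```

Here `has_pseudoknot(h_type)` is true because the pair (0, 9) crosses (4, 13), whereas the two separate hairpins in `nested` contain no crossing arcs.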

The most common viral frameshift is the −1 frameshift, which often has one or more stimulatory pseudoknots. This frameshifting was first discovered in a retrovirus, Rous sarcoma virus (gag/pol frameshift; Jacks and Varmus, 1985). Many viral frameshifts have now been characterized [reviewed in detail by Atkins et al. (2016)]. Frameshifting elements consist of a slippery site, a spacer (∼5-9 nucleotides), and a frameshift stimulator (stem-loop or pseudoknot) [reviewed in detail by Cornish (2009) and Firth and Brierley (2012)]. However, frameshift stimulators are highly diverse (Chung et al., 2010). In some cases, non-canonical base-pairs (e.g., base triples) and long-range base-pairs may be required for −1 frameshifting (e.g., in BYDV, Figure 6; Miras et al., 2017).

Figure 7 caption (fragment): Unusual translation mechanisms can also occur in some polycistronic viral RNAs. The 5′ UTR of cauliflower mosaic virus is long and highly structured. The highly structured region contains multiple upstream AUGs. A highly structured 5′ UTR with multiple upstream AUGs could inhibit translation of the main open reading frame (mORF) of a eukaryotic mRNA. Cauliflower mosaic virus overcomes this problem with a ribosome shunt cis-element. A ribosome first translates the small ORF (sORF) in the viral 5′ UTR. During translation termination, the ribosome dissociates but the take-off site (the sequence surrounding the termination codon) induces ribosome shunting. This allows the ribosome to bypass the highly structured region of the 5′ UTR and land on the landing site, followed by translation of the mORF. (D) Feline calicivirus contains two ORFs with a slightly overlapping sequence AUGA. A structured motif called the stop/restart cis-element, located upstream of AUGA, permits effective reinitiation and translation of the second ORF. A termination upstream ribosome-binding site (TURBS) located in the RNA structure allows tethering of the 40S ribosomal subunit and eIF3. This promotes reinitiation of the second ORF.

In the prediction of −1 frameshifting RNA elements, a slippery site is usually searched for in the region where the two frames (zero and −1) overlap. This has the consensus sequence "X XXY YYZ," where the zero frame codons are separated by spaces, XXX is any three identical nucleotides, YYY is either AAA or UUU, and Z is not G (Brierley et al., 2007). A ribosome first encounters the slowly decoded codons of the slippery site. The stalled ribosome then "slips" one nucleotide backward (−1 frame; XXX YYY Z) to resume translation elongation. In most cases, a frameshift stimulator (e.g., an RNA structure) downstream of the slippery site is required for physiological frameshifting, although such a structure is dispensable in at least one virus, Hibiscus latent Singapore virus (Niu et al., 2014).
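The slippery-site consensus lends itself to a simple pattern search. The sketch below scans a sequence for heptamers of the form X XXY YYZ (three identical nucleotides, then AAA or UUU, then any nucleotide except G) and keeps only those whose codon boundaries match the zero frame. It is a minimal illustration under the consensus stated above; the function name and the `orf_start` parameter are hypothetical, and real slippery sites can deviate from the consensus.

```python
import re
from typing import List, Tuple

# Lookahead so that overlapping candidates are all reported.
# Group 1 is the heptamer; group 2 is the repeated nucleotide X.
SLIPPERY_SITE = re.compile(r"(?=(([ACGU])\2\2(?:AAA|UUU)[ACU]))")


def find_slippery_sites(sequence: str, orf_start: int = 0) -> List[Tuple[int, str]]:
    """Return (0-based start, heptamer) for X XXY YYZ motifs in the zero frame.

    The zero frame is defined by orf_start, the index of the first nucleotide
    of the upstream ORF. In X XXY YYZ the first X is the last nucleotide of a
    zero-frame codon, so the heptamer must begin one nucleotide before a codon start.
    """
    rna = sequence.upper().replace("T", "U")
    hits = []
    for match in SLIPPERY_SITE.finditer(rna):
        start = match.start()
        if start + 1 >= orf_start and (start + 1 - orf_start) % 3 == 0:
            hits.append((start, match.group(1)))
    return hits


# Hypothetical toy sequence with the heptamer G GGA AAC in the zero frame.
toy = "AUGCCGGGAAACUAA"
sites = find_slippery_sites(toy)  # [(5, "GGGAAAC")]
```

Candidate sites found this way would then be followed by inspection of the 5 to 9 nucleotide spacer and the downstream sequence for a stem-loop or pseudoknot.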

Pseudoknots are often predicted by visual inspection of the local secondary structures that follow these slippery sites. Software has also been used. For example, the H-type pseudoknot structure modulating −1 frameshifting in Japanese encephalitis virus was successfully predicted using PknotsRG (now part of pKiss; Janssen and Giegerich, 2015) and experimentally validated (Melian et al., 2009). This was confirmed in an independent study on a vaccine strain that harbors a synonymous mutation that abolishes the RNA structure. However, de novo computational prediction of pseudoknots remains challenging, and current tools are <5% accurate (Leamy et al., 2016). This may be improved by including experimental data (Hajdin et al., 2013).

These ribosomal frameshifting sites can be predicted specifically using KnotInFrame (Theis et al., 2008). Others have used more general software, e.g., RNA Shapes Studio (Janssen and Giegerich, 2015), as was done for Zika virus, or a combination of prediction programs, as was used to predict a functional pseudoknot in West Nile virus (Moomau et al., 2016). Pseudoknots can also be modeled in 3D, e.g., using MC-Sym, as was recently done for Venezuelan equine encephalitis virus (Kendra et al., 2017).

Kissing hairpins (also known as kissing-loops or kissing stem-loops) are formed by base-pairing between the loops of two stem-loops (Table 1, Figure 1). Many kissing hairpins are related to virus replication or transcription (You and Rice, 2008; Ganser and Al-Hashimi, 2016).

The first viral kissing hairpins were discovered in enteroviruses (plus strand viruses), namely poliovirus and coxsackievirus B3 (Pilipenko et al., 1992). These structures are located in the 3′ UTR of the enterovirus genomic RNA and are required for synthesis of the viral negative strand RNA template (Dutkiewicz et al., 2016). These kissing hairpins are formed by base-pairing of two adjacent stem-loops, which are known as the X and Y motifs. Interestingly, the primary sequences of these motifs are conserved only in certain enterovirus subgroups, but the Y motif variants were shown to be interchangeable between poliovirus and coxsackievirus B3 (Zoll et al., 2009).

Another well-studied example is the retroviral dimerization initiation site (DIS). This structure is involved in dimerization of virus genomic RNAs, which is a critical step in retroviral replication (Paillart et al., 2004). The "kissing" begins at the DIS of two virus genomic RNAs prior to encapsidation (Mailler et al., 2016).

DotKnot (Sperschneider and Datta, 2010) , pKnots (Rivas and Eddy, 1999) , pKiss (Theis et al., 2010) , and pAliKiss (Janssen and Giegerich, 2015) could also be used to predict such kissing hairpins. However, these are limited to predicting intramolecular kissing interactions.
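Before running a dedicated tool, a quick check of whether two loops could kiss at all is to look for the longest stretch of antiparallel complementarity between them. The sketch below does this with a simple dynamic programme; it ignores thermodynamics and loop geometry, so it only flags candidates. The function names and the example loop sequence are hypothetical, and because it compares any two loop sequences it can also be applied to loops from two different RNA molecules.

```python
from typing import Tuple

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(x: str, y: str, allow_gu: bool = True) -> bool:
    return (x, y) in WATSON_CRICK or (allow_gu and (x, y) in WOBBLE)


def longest_kissing_helix(loop_a: str, loop_b: str, allow_gu: bool = True) -> Tuple[int, str, str]:
    """Longest contiguous antiparallel helix between two loops.

    Returns (length, segment of loop_a, segment of loop_b), both written 5' to 3'.
    """
    a = loop_a.upper().replace("T", "U")
    # Reverse loop_b so that antiparallel pairing runs along a matrix diagonal.
    b_rev = loop_b.upper().replace("T", "U")[::-1]
    best_len, best_i, best_j = 0, 0, 0
    previous = [0] * (len(b_rev) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b_rev) + 1)
        for j in range(1, len(b_rev) + 1):
            if can_pair(a[i - 1], b_rev[j - 1], allow_gu):
                current[j] = previous[j - 1] + 1
                if current[j] > best_len:
                    best_len, best_i, best_j = current[j], i, j
        previous = current
    segment_a = a[best_i - best_len:best_i]
    segment_b = b_rev[best_j - best_len:best_j][::-1]
    return best_len, segment_a, segment_b


# Hypothetical loop carrying a six-nucleotide palindrome, paired with a copy of itself.
loop = "AAGCGCGCA"
helix = longest_kissing_helix(loop, loop, allow_gu=False)  # (6, "GCGCGC", "GCGCGC")
```

A self-complementary (palindromic) loop such as this toy example can pair with an identical loop on a second molecule, which is the arrangement needed for a dimerization-type kissing interaction.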

A tRNA-like structure harbors a four-way junction with three stem-loops (a cloverleaf), and in viral structures may also contain additional pseudoknots (Figure 1, Table 1). In enteroviruses, a cloverleaf structure known as oriL is involved in viral replication (Prostova et al., 2017). It is located in the 5′ leader of the plus strand genomic RNA (Dutkiewicz et al., 2016). The cellular PCBP [poly(rC)-binding protein] and the viral protein 3CDpro bind to two different stem-loops of oriL, forming a replication complex. Other proteins could also bind to oriL (Prostova et al., 2017). A cloverleaf structure is also formed in the negative strand RNA template, and the kissing interaction of the hairpin loops within this cloverleaf structure is required for viral genomic RNA synthesis (Melchers et al., 1997).

Many positive strand plant viruses have such tRNA-like structures in the 3′ UTRs of their genomic RNAs (Dreher, 2010). Most viral tRNA-like structures are aminoacylated (e.g., with Val, His, or Tyr), mimicking cellular tRNAs to regulate translation. A recent study proposed that these tRNA-like structures can also act as mobile elements in plants by promoting transport of viral transcripts via the phloem sap (Zhang et al., 2016). These tRNA-like structures are amenable to both modeling and experimental 3D determination. For example, the tRNA-like structure of Turnip yellow mosaic virus was recently solved by X-ray crystallography (Colussi et al., 2014); this tRNA-like structure has multiple additional upstream pseudoknots. Indeed, the viral tRNA-like structures discovered to date vary in sequence, length, and structure (Dreher, 2010).

The elements considered above form mainly local structures. These local structures may form in nascent RNA (Meyer, 2017) or be stabilized by protein or RNA binding. Local structures can be predicted using appropriate windows of sequence (e.g., 80-200 bases) and it is also practical to analyse local alignments of similar lengths (Lange et al., 2012) .

Long-range interactions spanning from a few hundred bases to >26 kb do occur in RNA viruses, but are difficult to predict accurately. Challenges include the large number of possible interactions; the likely complexity of the structures (e.g., pseudoknots); co-transcriptional folding, which limits the interactions that can form; and the binding of small molecules, proteins, RNAs, and complexes (e.g., ribosomes), which affects folding (Lai and Meyer, 2016; Napthine et al., 2016; Sun et al., 2017).

Methods based on MFE, when applied to long RNAs (e.g., mfold on a viral genome), tend to predict large structures with many long-distance interactions, which should be viewed with caution. Indeed, experimentally determined structures of full-length genomes show more local than long-range interactions, e.g., HIV RNA has many local structures (Watts et al., 2009) but only five long-distance interactions.
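One practical way to apply this caution is to separate the base-pairs of a genome-scale prediction by the sequence distance they span and to treat the long-range set as hypotheses that need comparative or experimental support. The sketch below is a minimal illustration; the function name is hypothetical and the 200-nucleotide cutoff is an assumption chosen to match the local window sizes mentioned above, not a published threshold.

```python
from typing import List, Tuple

Pair = Tuple[int, int]


def split_by_span(pairs: List[Pair], max_local_span: int = 200) -> Tuple[List[Pair], List[Pair]]:
    """Partition base-pairs into local and long-range sets by sequence separation."""
    local, long_range = [], []
    for i, j in pairs:
        target = local if abs(j - i) <= max_local_span else long_range
        target.append((i, j))
    return local, long_range


# Hypothetical base-pairs (0-based positions) from a whole-genome MFE prediction.
predicted = [(10, 40), (100, 5000), (300, 520)]
local, long_range = split_by_span(predicted)
# local == [(10, 40)]; long_range == [(100, 5000), (300, 520)]
```

The proportion of pairs falling in the long-range set gives a rough indication of how much of a prediction rests on interactions that are the least reliably predicted.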

Specific tools have been developed to predict long-range interactions, e.g., LRIscan, with 14 of 16 known long-distance interactions confirmed and plausible candidates from other viruses predicted.

Efficient frameshifting, in addition to the local frameshifting elements (e.g., pseudoknots), may require long-range interactions (Nicholson and White, 2014) . These have been well-characterized in BYDV (Paul et al., 2001; Barry and Miller, 2002) and red clover necrotic mosaic virus (RCNMV; Tajima et al., 2011) and involve long-range kissing interactions (Figure 6 ).

Some viral RNAs also form structures with other RNAs, both viral and cellular. Notable examples are viral RNA dimerization elements (see subsection "Kissing Hairpins"), co-packaging elements, or interactions during translation with the rRNA in the ribosome (Deforges et al., 2015; Angulo et al., 2016) . Co-packaging of multiple segments of RNA may suggest that inter-molecular interactions occur, for example in RCNMV the loop of origin of assembly stem-loop on RNA2 interacts with on RNA1 (Newburn and White, 2015) . Specific software e.g., RNAhybrid can be used to predict such inter-molecular interactions (Rehmsmeier et al., 2004) .

The roles of many structured RNA elements of viruses have been studied in detail. Some examples are discussed in this section.

Viral RNAs are not always capped, which means that they have evolved specific mechanisms to enable cap-independent translation. The RNAs of many viruses contain a large structured IRES to promote this. Well-characterized examples are found in picornaviruses and HCV (Lee et al., 2017; Figure 7A). The IRES recruits ribosomes near or directly to the translation initiation codons of viral mRNAs, bypassing the need for the cap-binding complex. This allows the virus to manipulate the host translation machinery by inhibition or proteolytic cleavage of host eukaryotic initiation factors (eIFs). Translation of viral mRNAs is therefore possible even during host translation shutoff (Lee et al., 2017).

However, viral IRES are among the most challenging structural elements to predict and characterize. This is because IRES are complex and diverse, often consisting of multiple stem-loops and/or pseudoknots (Dreher, 2010; Lozano et al., 2016). Limited progress has been made in the development of automated pipelines for IRES prediction; however, two specialized webservers are available, namely VIPS (Hong et al., 2013) and IRESPred (Kolekar et al., 2016). Both VIPS and IRESPred predict IRES based on known IRES sequences and structures. In particular, IRESPred looks for the binding sequence motifs of small subunit ribosomal proteins. Alternatively, a combination of tools, in particular BLAST, Pfold, Centroid Fold, mfold, and pKiss, has also proven to be useful in IRES prediction (Asnani et al., 2015, 2016).

Ribosome shunting consists of a series of unusual translation events (Figure 7C). A ribosome first initiates at a small ORF (sORF) and terminates right before a large RNA structure. The large ribosomal subunit dissociates, but the small subunit bypasses the RNA structure, docks on a landing site, and resumes scanning. The ribosome can then reinitiate, even at a non-AUG codon. Shunt elements were first discovered in a DNA virus, the pararetrovirus cauliflower mosaic virus (Fütterer et al., 1990, 1993), then in a retrovirus, prototype foamy virus (Schepetilnikov et al., 2009), and in a plant RNA virus, rice tungro spherical virus (Pooggin et al., 2012).

To predict the cis-elements driving ribosome shunting, several key characteristics of these elements have been taken into account. These elements are located in a long, highly structured 5′ UTR of the virus genomic RNA that has multiple upstream AUGs. These 5′ UTR features appear to inhibit translation of the main ORF(s). The sORF(s) preceding the large RNA structure are involved in shunting, whereas the upstream AUGs that follow are folded into the large RNA structure. This RNA structure has stable base-pairing at the base of the stem. The shunt take-off site (the sequence around the sORF termination codon) and landing site are expected to be conserved between closely related viruses or co-evolved viruses. For example, the shunt cis-elements are remarkably similar between a pair of co-evolved viruses, an RNA picorna-like virus, rice tungro spherical virus, and a DNA pararetrovirus, rice tungro bacilliform virus (Pooggin et al., 2012).

Cap-independent translation may be stimulated by local RNA structures but, surprisingly, in some cases also by long-distance base-pairing. This long-distance base-pairing has been well-characterized in several plant viral RNAs (Miras et al., 2017). For example, BYDV has a cap-independent translation element (CITE) located at the 3′ end (Figures 6, 7B). This element interacts with a stem-loop located in the 5′ UTR (long-range kissing interactions) to promote cap-independent translation (Miller et al., 2015). Other viral genera also use long-distance base-pairing or interaction with rRNA (Deforges et al., 2015). Published models of two such complex structures have been made using RNA2D3D (Mccormack et al., 2008) and MC-Sym (Wang et al., 2011; Newburn and White, 2015).

Some viruses use unusual mechanisms to reinitiate after translation of a long CDS. These stop/restart or termination-reinitiation mechanisms were initially found in Caliciviruses (Figure 7D) and then in Influenza B viruses (reviewed in detail by Powell, 2010). These mechanisms allow effective translation of both the first and second ORFs of a viral mRNA, producing two distinct functional proteins (Zinoviev et al., 2015). These mechanisms require several cis-regulatory elements that can be partially structured, and these may interact with other RNAs, e.g., the 18S rRNA.

These mechanisms are distinct from the mechanisms utilizing upstream ORFs (uORFs), or programmed ribosomal frameshifting (Miras et al., 2017). In eukaryotic mRNAs, including viral ones, uORFs are commonly found to repress translation of the mORFs (Hellens et al., 2016; Zong et al., 2017). These regulatory uORFs are usually short and therefore produce only small peptides (Hellens et al., 2016; Starck et al., 2016). In ribosomal frameshifting, by contrast, only one protein is produced using two overlapping ORFs (Atkins et al., 2016).

The stop/restart cis-elements in Caliciviruses and Influenza B viruses are found between position −84 and the start codon of the second ORF (Powell, 2010; Zinoviev et al., 2015). These cis-elements consist of a termination upstream ribosome-binding site (TURBS) and a stop/restart site. TURBS consists of motif 1 (the 18S rRNA complementary site) and motifs 2 and 2* (likely base-paired and structured). Motif 1 is loosely structured to allow tethering of the small ribosomal subunit for reinitiation, whereas motifs 2 and 2* could form an RNA structure that enhances translation of the second ORF (Lee et al., 2017).

Recently, stop/restart cis-elements were found in Helminthosporium victoriae virus 190S. These elements consist of an H-type pseudoknot and an AUGA stop/restart site, in which the start codon AUG overlaps the stop codon UGA (Li et al., 2015). This pseudoknot was successfully predicted using HPknotter (Huang et al., 2005). Disruption of the tertiary base-pairs abolishes translation of the second ORF (Li et al., 2015).

To predict stop/restart cis-elements, one could first look for slightly overlapping ORFs with a stop/restart site. However, these cis-elements also enable effective translation of a synthetic, non-overlapping second ORF within a range of 40 nucleotides downstream of the first ORF (Ahmadian et al., 2000; Napthine et al., 2009; Zinoviev et al., 2015). Motif 1 (the 18S rRNA complementary site) is likely to be present between position −84 and the start codon of the second ORF. An RNA structure may also be found within the region. However, suboptimal RNA structures could also be present (Napthine et al., 2009).
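The first step of such a search can be sketched as follows: locate the first in-frame stop codon of the upstream ORF, test whether it forms an overlapping AUGA stop/restart site, and extract the 84 nucleotides upstream of the second start codon for motif 1 and structure analysis. This is a minimal illustration restricted to the AUGA arrangement; the function names are hypothetical, and other overlapping or closely spaced stop/start arrangements are not detected.

```python
from typing import Optional, Tuple

STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_inframe_stop(rna: str, orf_start: int) -> Optional[int]:
    """Index of the first stop codon in frame with orf_start, or None."""
    for position in range(orf_start, len(rna) - 2, 3):
        if rna[position:position + 3] in STOP_CODONS:
            return position
    return None


def stop_restart_candidate(sequence: str, orf1_start: int, upstream: int = 84) -> Optional[Tuple[int, str]]:
    """Find an AUGA stop/restart site that terminates the first ORF.

    Returns (index of the second ORF's AUG, upstream window to search for
    TURBS motifs), or None if the first ORF does not end in an AUGA overlap.
    """
    rna = sequence.upper().replace("T", "U")
    stop = first_inframe_stop(rna, orf1_start)
    if stop is None or stop < 1:
        return None
    restart = stop - 1  # in AUGA, the AUG begins one nucleotide before the UGA
    if rna[restart:restart + 4] != "AUGA":
        return None
    window = rna[max(0, restart - upstream):restart]
    return restart, window


# Hypothetical toy mRNA: ORF1 is AUG GCC GCC GCC GCA UGA, and ORF2 starts at the AUG in GC[AUG]A.
toy = "AUGGCCGCCGCCGCAUGACCC"
candidate = stop_restart_candidate(toy, orf1_start=0)  # (14, "AUGGCCGCCGCCGC")
```

The returned window would then be examined for complementarity to the 18S rRNA (motif 1) and for local structure, for example with a pseudoknot-aware folding tool.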

This review has presented examples where virology research has been enhanced by the appropriate use of bioinformatic methods for RNA structure prediction. These examples highlight how computer predictions were used in conjunction with experimental tools for functional studies. Some of the computational tools and resources are generally applicable to RNA structure prediction, whereas others are specific to virology. Additional prediction tools are continually becoming available (Miao and Westhof, 2017).

However, some challenges remain for the application of newer RNA structure tools in virology (Table 1). Some of these are being addressed by user friendly suites and tools becoming available, as noted throughout this review and listed in the companion website (http://bioanalysis.otago.ac.nz/Lim2017.htm). In addition, specialized workshops and training may facilitate the use of these RNA tools, e.g., the EMBO Practical Course on Computational RNA Biology, for which course material is available online (https://bibiserv.cebitec.uni-bielefeld.de/EMBO-RNACourse/).

Both authors have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

This work is supported by a University of Otago Research Grant (UORG) and CSL is a recipient of a Dr. Sulaiman Daud 125th Jubilee Postgraduate Scholarship.

RNA motif discovery: a computational overview

Expression of the ORF-2 protein of the human respiratory syncytial virus M2 gene is initiated by a ribosomal termination-dependent reinitiation mechanism

Zika virus produces noncoding RNAs using a multipseudoknot structure that confounds a cellular exonuclease

LOOP IIId of the HCV IRES is essential for the structural rearrangement of the 40S-HCV IRES complex

New functionality of RNAComposer: an application to shape the axis of miR160 precursor structure

Widespread distribution and structural diversity of Type IV IRESs in members of Picornaviridae

Initiation on the divergent Type I cadicivirus IRES: factor requirements and interactions with the translation apparatus

Ribosomal frameshifting and transcriptional slippage: from genetic steganography and cryptography to adventitious use

RNA-bioinformatics: tools, services and databases for the analysis of RNA-based regulation

Purification of RNA and RNA-protein complexes by an R17 coat protein affinity method

A −1 ribosomal frameshift element that requires base pairing across four kilobases suggests a mechanism of regulating ribosome and replicase traffic on a viral RNA

Recode-2: new design, new search tools, and many more genes

RNAstructure: web servers for RNA secondary structure prediction and analysis

RNAalifold: improved consensus structure prediction for RNA alignments

Localization of ASH1 mRNA particles in living yeast

Computational detection of abundant long-range nucleotide covariation in Drosophila genomes

RNAJunction: a database of RNA junctions and kissing loops for three-dimensional structural analysis and nanodesign

Viral RNA pseudoknots: versatile motifs in gene expression and replication

Functions of the 3 ′ and 5 ′ genome RNA regions of members of the genus Flavivirus

The 3 ′ -nucleotides of flavivirus genomic RNA form a conserved secondary structure

NCBI viral genomes resource

Progress and outlook in structural biology of large viral RNAs

HIV-1 and human PEG10 frameshift elements are functionally distinct and distinguished by novel small molecule modulators

The structural basis of pathogenic subgenomic flavivirus RNA (sfRNA) production

RNA structures that resist degradation by Xrn1 produce a pathogenic Dengue virus RNA

Viral subversion of host functions for picornavirus translation and RNA replication

Distinct families of cis-acting RNA replication elements epsilon from hepatitis B viruses

Prospects for inhibiting the post-transcriptional regulation of gene expression in hepatitis B virus

Computational identification of new structured cis-regulatory elements in the 3 ′ -untranslated region of human protein coding genes

Coordination among tertiary base pairs results in an efficient frameshift-stimulating RNA pseudoknot

Frameshifting in Alphaviruses: a diversity of 3 ′ stimulatory structures

The Nucleic Acid Database: new features and capabilities

mRNA stabilization by poly(A) binding protein is independent of poly(A) and requires translation

The structural basis of transfer RNA mimicry and conformational plasticity by a viral RNA

A loop 2 cytidine-stem 1 minor groove interaction as a positive determinant for pseudoknot-stimulated −1 ribosomal frameshifting

Barley yellow dwarf

VARNA: interactive drawing and editing of the RNA secondary structure

Computational modeling of RNA 3D structures and interactions

mRNAs that specifically interact with eukaryotic ribosomal subunits

Structure of a hepatitis C virus RNA domain in complex with a translation inhibitor reveals a binding mode reminiscent of riboswitches

Viral tRNAs and tRNA-like structures

NMR structure of the 101-nucleotide core encapsidation signal of the Moloney murine leukemia virus

Structure and function of RNA elements present in enteroviral genomes

CMCompare webserver: comparing RNA families via covariance models

RNAlien -Unsupervised RNA family model construction

Metal ions and flexibility in a viral RNA pseudoknot at atomic resolution

Recent advances in RNA folding

Efficient −2 frameshifting by mammalian ribosomes to synthesize an additional arterivirus protein

Functional information stored in the conserved structural RNA domains of flavivirus genomes

Mapping overlapping functional elements embedded within the protein-coding regions of RNA viruses

Non-canonical translation in RNA viruses

Detecting overlapping coding sequences in virus genomes

Prediction of conserved long-range RNA-RNA interactions in full viral genomes

Positive and negative control of translation by the leader sequence of cauliflower mosaic virus pregenomic 35S RNA

Nonlinear ribosome migration on cauliflower mosaic virus 35S RNA

HIV-1 leader RNA dimeric interface revealed by NMR

A comprehensive comparison of comparative RNA structure prediction approaches

A benchmark of multiple sequence alignment programs upon structural RNAs

Frameshifting RNA pseudoknots: structure and mechanism

Concepts and introduction to RNA bioinformatics

Stable secondary structures at the 3 ′ -end of the genome of yellow fever virus (17 D vaccine strain)

Arginine/serine-rich domains of SR proteins can function as activators of pre-mRNA splicing

RALEE-RNA aLignment editor in emacs

Rev: beyond nuclear export

The Vienna RNA websuite

The RNAz web server: prediction of thermodynamically stable and evolutionarily conserved RNA structures

A family of non-classical pseudoknots in influenza A and B viruses

Subtype-specific structural constraints in the evolution of influenza A virus hemagglutinin genes

RNA structural constraints in the evolution of the influenza A virus genome NP segment

Conserved elements in the 3 ′ untranslated region of flavivirus RNAs and potential cyclization sequences

Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots

Nucleic acid immunity

Virus variation resource -improved response to emergent viral outbreaks

The emerging world of small ORFs

Small molecules targeting viral RNA

Viral RNA targets and their small molecule ligands

HIV-1 frameshift RNA-targeted triazoles inhibit propagation of replication-competent and multi-drug-resistant HIV in human cells

Prohead RNA: a noncoding viral RNA of novel structure and function

Fast folding and comparison of RNA secondary structures

Conserved RNA secondary structures in viral genomes: a survey

The RNA virus quasispecies: fact or fiction?

Viral IRES prediction system -a web server for prediction of the IRES secondary structure in silico

A heuristic approach for detecting RNA H-type pseudoknots

ViralZone: a knowledge resource to understand virus diversity

Expression of the Rous sarcoma virus pol gene by ribosomal frameshifting

The RNA shapes studio

Mutations designed by ensemble defect to misfold conserved RNA structures of influenza A segments 7 and 8 affect splicing and attenuate viral replication in cell culture

A thermodynamic analysis of the sequence-specific binding of RNA by bacteriophage MS2 coat protein

Retroviral RNA dimerization and packaging: the what, how, when, where, and why

Ablation of programmed −1 ribosomal frameshifting in Venezuelan equine encephalitis virus results in attenuated neuropathogenicity

Virus Taxonomy: Ninth Report of the International Committee on Taxonomy of Viruses

Computational and molecular analysis of conserved influenza A virus RNA secondary structures involved in infectious virion production

IRESPred: web server for prediction of cellular and viral internal ribosome entry site (IRES)

The LANL hemorrhagic fever virus database, a new platform for analyzing biothreat viruses

The Los alamos hepatitis C sequence database

A comprehensive comparison of general RNA-RNA interaction prediction methods

R-CHIE: a web server and R package for visualizing RNA secondary structures

Global or local? Predicting secondary structure and accessibility in mRNAs

Quasispecies theory and the behavior of RNA viruses

Targeting the HIV RNA genome: high-hanging fruit only needs a longer ladder

Bridging the gap between in vitro and in vivo RNA folding

Regulation mechanisms of viral IRES-driven translation

Structural features of influenza A virus panhandle RNA enabling the activation of RIG-I independently of 5 ′ -triphosphate

An RNA cassette from Helminthosporium victoriae virus 190S necessary and sufficient for stop/restart translation

A nomenclature of junctions and branchpoints in nucleic acids

Hepatitis B virus nuclear export elements: RNA stem-loop α and β, key parts of the HBV post-transcriptional regulatory element

Mutations that increase the affinity of a translational repressor for RNA

Rapid and sensitive protein similarity searches

Cis-acting RNA elements in human and animal plus-strand RNA viruses

ViennaRNA Package 2.0

Predicting RNA secondary structures from sequence and probing data

webPRANK: a phylogeny-aware multiple sequence aligner with interactive alignment browser

Modeling three-dimensional structural motifs of viral IRES

Coronavirus cis-Acting RNA Elements

SimRNAweb: a web server for RNA 3D structure modeling with optional restraints

The life-cycle of the HIV-1 Gag-RNA complex

Complete genomic sequence of Powassan virus: evaluation of genetic elements in tick-borne versus mosquito-borne flaviviruses

Identification of cis-acting elements in the 3 ′ -untranslated region of the dengue virus type 2 RNA that modulate translation and replication

Challenges in RNA virus bioinformatics

The highly conserved codon following the slippery sequence supports−1 frameshift efficiency at the HIV-1 frameshift site

Dynalign: an algorithm for finding the secondary structure common to two RNA sequences

Predicting oligonucleotide affinity to nucleic acid targets

Structural domains within the 3' untranslated region of Turnip crinkle virus

Analysis tool web services from the EMBL-EBI

Kissing of the two predominant hairpin loops in the coxsackie B virus 3 ′ untranslated region is the essential structural feature of the origin of replication required for negative-strand RNA synthesis

NS1' of Flaviviruses in the Japanese encephalitis virus serogroup is a product of ribosomal frameshifting and plays a role in viral neuroinvasiveness

In silico methods for co-transcriptional RNA secondary structure prediction and for investigating alternative RNA structure expression

RNA-puzzles round III: 3D RNA structure prediction of five riboswitches and one ribozyme

RNA Structure: advances and assessment of 3D structure prediction

Cis-and trans-regulation of luteovirus gene expression by the 3 ′ end of the viral genome

Sequence and organization of barley yellow dwarf virus genomic RNA

Noncanonical Translation in Plant RNA Viruses

Structure of a conserved retroviral RNA packaging element by NMR spectroscopy and cryo-electron tomography

IRESite-a tool for the examination of viral and cellular internal ribosome entry sites

Structural and functional characterization of programmed ribosomal frameshift signals in West Nile virus strains reveals high structural plasticity among cis-acting RNA elements

The 3 ′ splice site of influenza A segment 7 mRNA can exist in two conformations: a pseudoknot and a hairpin

The influenza A segment 7 mRNA 3 ′ splice site pseudoknot/hairpin family

Identification of potential conserved RNA secondary structure throughout influenza A coding regions

Expression of the VP2 protein of murine norovirus by a translation termination-reinitiation strategy

A novel role for poly(C) binding proteins in programmed ribosomal frameshifting

Infernal 1.1: 100-fold faster RNA homology searches

Rfam 12.0: updates to the RNA families database

Cis-acting RNA elements in positive-strand RNA plant virus genomes

Functional long-range RNA-RNA interactions in positive-strand RNA viruses

Exploring the architecture of viral RNA genomes

An infectious RNA with a heptaadenosine stretch responsible for programmed−1 ribosomal frameshift derived from a full-length cDNA clone of Hibiscus latent Singapore virus

Algorithms for loop matchings

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

Sequence comparison and secondary structure analysis of the 3 ′ noncoding region of flavivirus genomes reveals multiple pseudoknots

Dimerization of retroviral RNA genomes: an inseparable pair

Effects of HBV genetic variability on RNAi strategies

Rewriting nature's assembly manual for a ssRNA virus

A sequence required for −1 ribosomal frameshifting located four kilobases downstream of the frameshift site 1 1Edited by D. E. Draper

Translational repression by bacteriophage MS2 coat protein expressed from a plasmid. A system for genetic analysis of a protein-RNA interaction

Improved tools for biological sequence comparison

Automated classification of RNA 3D motifs and the RNA 3D motif atlas

Advancing viral RNA structure prediction: measuring the thermodynamics of pyrimidine-rich internal loops

A highly structured, nuclease-resistant, noncoding RNA produced by flaviviruses is required for pathogenicity

Towards identification of cis-acting elements involved in the replication of enterovirus and rhinovirus RNAs: a proposal for the existence of tRNA-like terminal structures

Molecular characterization of the Japanese encephalitis serocomplex of the flavivirus genus

Short ORF-dependent ribosome shunting operates in an RNA picorna-like virus and a DNA pararetrovirus that cause rice Tungro disease

Automated 3D structure composition for large RNAs

Translational termination-reinitiation in RNA viruses

The influenza A PB1-F2 and N40 start codons are contained within an RNA pseudoknot

Independent evolution of tetraloop in enterovirus oriL replicative element and its putative binding partners in virus protein 3C

Secondary structure of the 3' untranslated region of flaviviruses: similarities and differences

CompaRNA: a server for continuous benchmarking of automated methods for RNA secondary structure prediction

Visualization of RNA-protein interactions in living cells: FMRP and IMP1 interact on mRNAs

Secondary structure of the 3'-noncoding region of flavivirus genomes: comparative analysis of base pairing probabilities

Fast and effective prediction of microRNA/target duplexes

A dynamic programming algorithm for RNA structure prediction including pseudoknots

A statistical test for conserved RNA structure shows lack of evidence for structure in lncRNAs

Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation

JAR3D Webserver: scoring and aligning RNA loop sequences to known 3D motifs

Unmasking the information encoded as structural motifs of viral RNA genomes: a potential antiviral target

cis-Acting RNA elements in the hepatitis C virus RNA genome

Molecular dissection of the prototype foamy virus (PFV) RNA 5 ′ -UTR identifies essential elements of a ribosomal shunt

FRESCo: finding regions of excess synonymous constraint in diverse viruses

Evidence for the existence of a pseudoknot structure at the 3 ′ terminus of the flavivirus genomic RNA

Enabling the "host jump": structural determinants of receptor-binding specificity in influenza A viruses

SSE: a nucleotide and amino acid sequence analysis platform

3 ′ cap-independent translation enhancers of plant viruses

Exact calculation of loop formation probability identifies folding motifs in RNA secondary structures

Freiburg RNA Tools: a web server integrating INTARNA, EXPARNA and LOCARNA

A conserved secondary structural element in the coding region of the influenza A Virus Nucleoprotein (NP) mRNA is important for the regulation of viral proliferation

DotKnot: pseudoknot prediction using the probability dot plot under a refined energy model

Translation from the 5 ′ untranslated region shapes the integrated stress response

Probing sequence-specific RNA recognition by the bacteriophage MS2 coat protein

Frequency and isostericity of RNA base pairs

Minor groove RNA triplex in the crystal structure of a ribosomal frameshifting viral pseudoknot

Japanese encephalitis virus NS1 ′ protein depends on pseudoknot secondary structure and is cleaved by caspase during virus infection and cell apoptosis

Theory and modeling of RNA structure and interactions with metal ions and small molecules

A longdistance RNA-RNA interaction plays an important role in programmed−1 ribosomal frameshifting in the translation of p88 replicase protein of Red clover necrotic mosaic virus

Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures

Exploring RNA conformational space under sparse distance restraints

Prediction of RNA secondary structure including kissing hairpin motifs

KnotInFrame: prediction of−1 ribosomal frameshift events

RNA 3D modules in genome-wide predictions of RNA 2D structure

WAR: webserver for aligning structural RNAs

Thermodynamic and phylogenetic prediction of RNA secondary structures in the coding region of hepatitis C virus

RNA Structure Determination: Methods and Protocols

A comprehensive benchmark of RNA-RNA interaction prediction tools for all domains of life

RNA structure duplications and flavivirus host adaptation

Dengue virus RNA structure specialization facilitates host adaptation

The flavivirus 3'-noncoding region: extensive size heterogeneity independent of evolutionary relationships among strains of tick-borne encephalitis virus

Viral subversion of the host protein synthesis machinery

Optimization of RNA 3D structure prediction using evolutionary restraints of nucleotide-nucleotide interactions from direct coupling analysis

The capbinding translation initiation factor, eIF4E, binds a pseudoknot in a viral cap-independent translation element

Quantitative mass spectrometry of DENV-2 RNA-interacting proteins reveals that the DEAD-box RNA helicase DDX6 binds the DB1 and DB2 3 ′ UTR structures

Jalview Version 2-a multiple sequence alignment editor and analysis workbench

Arc diagrams: visualizing structure in strings

Architecture and secondary structure of an entire HIV-1 RNA genome

Review toward all RNA structures, concisely

Prospects for a dengue virus vaccine

Prediction of secondary structures conserved in multiple RNA sequences

Transcription regulation mechanisms of bacteriophages: recent advances and future prospects

CMfinder-a covariance model based RNA motif finding algorithm

3 ′ RNA elements in hepatitis C virus replication: kissing partners and long poly(U)

Co-selection of West Nile virus nucleotides that confer resistance to an antisense oligomer while maintaining long-distance RNA/RNA base pairings

RNA Structural Motifs

tRNA-related sequences trigger systemic mRNA transport in plants

Automated and fast building of three-dimensional RNA structures

Four RNA families with functional transient structures

Multiple mechanisms of reinitiation on bicistronic calicivirus mRNAs

The structure-function relationship of the enterovirus 3 ′ -UTR

Differential regulation of hepatitis B virus core protein expression and genome replication by a small upstream open reading frame and naturally occurring mutations in the precore region

On finding all suboptimal foldings of an RNA molecule

Mfold web server for nucleic acid folding and hybridization prediction

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Lim and Brown. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.