key: cord-0947462-3l3jznkj authors: Holbrook, Stephen R title: RNA structure: the long and the short of it date: 2005-05-16 journal: Curr Opin Struct Biol DOI: 10.1016/j.sbi.2005.04.005 sha: 66bad98f0d51b374c6e3f7f0b1c380726e61e38d doc_id: 947462 cord_uid: 3l3jznkj

The database of RNA structure has grown tremendously since the crystal structure analyses of ribosomal subunits in 2000–2001. During the past year, the trend toward determining the structure of large, complex biological RNAs has accelerated, with the analysis of three intact group I introns, A- and B-type ribonuclease P RNAs, a riboswitch–substrate complex and other structures. The growing database of RNA structures, coupled with efforts directed at the standardization of nomenclature and classification of motifs, has resulted in the identification and characterization of numerous RNA secondary and tertiary structure motifs. Because a large proportion of RNA structure can now be shown to be composed of these recurring structural motifs, a view of RNA as a modular structure built from a combination of these building blocks and tertiary linkers is beginning to emerge. At the same time, however, more detailed analysis of water, metal, ligand and protein binding to RNA is revealing the effect of these moieties on folding and structure formation. The balance between the views of RNA structure either as strictly a construct of preformed building blocks linked in a limited number of ways or as a flexible polymer assuming a global fold influenced by its environment will be the focus of current and future RNA structural biology.

Research in RNA structure determination and analysis has recently accelerated in two seemingly disparate directions: the determination of larger, more biologically relevant RNA structures by both X-ray crystallography and NMR methods; and the identification, classification and characterization of the small, ubiquitous structural motifs found in these RNA structures. Surprisingly, these two directions are related and mutually reinforce the advancement of RNA structural biology.
Continuing improvements in methods for the synthesis, purification, crystallization and derivatization of large RNA molecules, together with technical advances at, and increased availability of, synchrotron X-ray beamlines and the development of advanced structure-solution software, are allowing more and larger RNA structures to be determined [1]. The parallel development of high-resolution NMR methods has increased the size of RNA whose structure can feasibly be determined, as evidenced by the structure of the 101-nucleotide RNA responsible for signaling core encapsidation in Moloney murine leukemia virus (MMLV) [2]. Figure 1 shows the average size of the RNA structures determined each year, as catalogued in the Nucleic Acid Database (NDB) [3]. The solution of the first ribosomal subunit structures in 2000–2001 demonstrated that methods for determining large RNA structures were available, and these methods are now being widely applied to a variety of biologically important RNAs.

Structural studies and comparative sequence analyses suggest that biological RNAs are largely modular, composed primarily of conserved structural building blocks, or motifs [4], of secondary structure (helices, and internal, external and junction loops) and tertiary structure (coaxial stacks, kissing hairpin loops, ribose zippers, etc.). Although many secondary and tertiary structure motifs have been identified and characterized in terms of sequence preference, structural constraints, energetics and dynamics, it remains an open question whether we have observed a large or a small fraction of the universe of motifs. Immediately after the structures of the large [5,6] and small [7] ribosomal subunits were determined, inspection of the rRNA suggested that it consisted mainly of known RNA structural elements (i.e. few, if any, novel motifs were observed).
More detailed studies are showing, however, that there is an abundance of information still to be extracted from these structures and correlated with RNA structural features observed in other biomolecules. The determination of large RNA structures has in many cases benefited from the design of more crystallizable sequences based on an understanding of RNA structural motifs. Reciprocally, the identification and characterization of known and novel RNA motifs are greatly advanced by the structure determination of new types of large biological RNAs. In this review, recent structure determinations of several biological RNAs are presented and described. The secondary structure and tertiary interaction motifs of these RNAs are analyzed in terms of our current knowledge of RNA motifs, and the novel motifs observed in these structures are described. The ability to partition large RNA structures into well-characterized subunits, or motifs, is an important advance in understanding the relationship between structure and function, and in RNA structure prediction and design.

In addition to updated ribosomal subunit models at improved resolution, and the refinement and analysis of rRNA interactions with protein and metal ions [8,9], many structures of the ribosomal subunits bound to antibiotics [10] and other ligands [11,12] have been deposited in the Protein Data Bank (PDB). Other biological RNA structures that have recently been determined are summarized in Table 1. Among the most notable are three group I introns: an intact, self-splicing group I intron with both its 5′ and 3′ exons from the purple bacterium Azoarcus sp. [13,14], a group I ribozyme–product complex from phage Twort [15] and a group I intron ribozyme from Tetrahymena [16].
Other structures include the specificity domains of both A-type [18] and B-type [17] ribonuclease P; RNAs corresponding to a guanine-responsive riboswitch (xpt) complexed with guanine [19] or hypoxanthine [20], and an adenine-responsive riboswitch (add) complexed with adenine [19]; a highly conserved stem-loop motif found at the 3′ end of the genome of SARS (severe acute respiratory syndrome) virus and other coronaviruses [21]; the core encapsidation signal of MMLV [2]; a complex between a high-affinity RNA aptamer and the NF-κB p50 homodimer [22]; and a complex between the archaeal RNA-binding protein L7Ae and an RNA K-turn derived from an H/ACA small RNA [23]. In addition, a series of crystal structures of the hepatitis delta virus ribozyme in its precleaved state [24] revealed significant conformational changes, and the loss of a metal ion, relative to the product form of the ribozyme [25]. Finally, structures of a left-handed RNA duplex in high salt [26] and a mirror-image (L-configuration) RNA duplex [27] show variations on the canonical 'standard' RNA double helix. These structures demonstrate not only the prevalence of well-known RNA structural motifs, but also the presence of numerous novel structural elements that may serve as common modules in biological RNA. In contrast, an alternative tRNA conformation, designated the lambda form, has been found in complex with the tRNA modification enzyme archaeosine tRNA-guanine transglycosylase [28,29]. In this complex, the tertiary interactions between the D-loop and T-loop are disrupted, leading to an alternative base-pairing pattern. This suggests that RNA structure in general, and motif structure specifically, can change through interaction with environmental agents such as proteins, other RNAs, metals or other ligands.
Figure 1. Number of RNA structures deposited in the NDB (http://ndbserver.rutgers.edu/) (dark red) and average number of nucleotides per structure (yellow), by year. Although the number of structure determinations has grown only slowly, the average structure size has increased dramatically since 2000.

The availability of numerous diverse, large RNA structures has made possible the identification and classification of an increasing number of RNA secondary and tertiary structure motifs. Both manual and automated classification procedures are being used to identify and characterize motifs [4]. A database of non-canonical base pairs in RNA structures [30], coupled with tools for the automatic identification and classification of RNA base pairs [31], provides an initial description of RNA secondary structure. This description can be combined with automated motif searches [32–34] to identify both previously known and new RNA motifs. The SCOR (Structural Classification of RNA) database is a comprehensive, manually curated resource of RNA structural motifs that uses automated tools and literature descriptions to assist in the classification of RNA secondary and tertiary structure motifs [35,36]. Characterization of these motifs, or RNA building blocks, is in turn enabling RNA design [37,38], RNA structure prediction, RNA modeling and RNA gene finding [39]. Recently determined crystal structures of RNA and RNA–protein complexes (in addition to rRNA structures) are revealing many new structural elements as well as previously characterized motifs (Table 1). As additional RNA structures are determined, repeated occurrences of these elements may show that they are common structural motifs, enabling their characterization by sequence, structure and function.
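The idea of pairing a base-pair annotation with a motif search can be illustrated with a deliberately simplified sketch. It assumes the structure is given as a sequence plus a dot-bracket string (an assumed input format; the annotation tools cited above work from three-dimensional coordinates and classify pairs by geometry, which this sketch does not do). It labels each pair as Watson-Crick, G·U wobble or non-canonical from nucleotide identity alone, then scans four-nucleotide hairpin loops for the GNRA tetraloop pattern. All function names are hypothetical.

```python
import re
from typing import List, Tuple

# Sequence-level labels only. Real classifiers, such as those cited in the
# text, assign base-pair families from 3D geometry, which is not modelled here.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}

# GNRA tetraloop pattern: G, any nucleotide (N), a purine (R = A or G), A.
GNRA = re.compile(r"G[ACGU][AG]A")


def parse_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return 0-based (i, j) pairs from a dot-bracket string, sorted by i."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def classify_pair(a: str, b: str) -> str:
    """Label a base pair from nucleotide identity alone."""
    if (a, b) in WATSON_CRICK:
        return "Watson-Crick"
    if (a, b) in WOBBLE:
        return "G-U wobble"
    return "non-canonical"


def hairpin_loops(structure: str) -> List[Tuple[int, int]]:
    """Return (start, end) slices of unpaired runs closed by one base pair."""
    loops = []
    for i, j in parse_pairs(structure):
        inner = structure[i + 1 : j]
        if inner and set(inner) == {"."}:
            loops.append((i + 1, j))
    return loops


def find_gnra(sequence: str, structure: str) -> List[int]:
    """Return start positions of four-nucleotide hairpin loops matching GNRA."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    hits = []
    for start, end in hairpin_loops(structure):
        if end - start == 4 and GNRA.fullmatch(sequence[start:end]):
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Toy hairpin (not taken from any structure discussed in the text):
    # a 4-bp stem closing a GAAA loop.
    seq = "GUCGGAAACGCU"
    db = "((((....))))"
    for i, j in parse_pairs(db):
        print(i, j, seq[i] + "-" + seq[j], classify_pair(seq[i], seq[j]))
    print("GNRA loops start at:", find_gnra(seq, db))  # [4]
```

A geometry-based annotation would replace `classify_pair`, and a structural search would replace the regular expression; the two-stage shape (annotate pairs, then search the loops they define) is the part this sketch is meant to convey.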
Presented below are examples of recently determined RNA structures that contain both known RNA motifs and novel structural features that may correspond to new structural motifs. These motifs are likely to be crucial for both biological function and three-dimensional structure.

The 3′ end of the genome of SARS virus and related coronaviruses and astroviruses contains a highly conserved sequence called the s2m or stem-loop II element. As shown in Table 1, the crystal structure of a 48-nucleotide RNA corresponding to this element has been solved at 2.7 Å resolution [21]. In addition to previously characterized RNA structural motifs, such as a GNRA-like pentaloop and a dinucleotide platform base triple [36], novel structural features were observed. A unique base quartet is formed between Watson-Crick G·C pairs at the junction of two helices, as shown in Figure 2. Another novel RNA structural feature is formed by a three-purine bulge that excludes an adenosine from the central stack and forms tertiary interactions with residues of an asymmetric internal loop, creating a 'tunnel' that serves as a binding site for two magnesium ions [21].

The interaction between two hairpin loops in the guanine-responsive riboswitch is required for its biological function. Crystal structures of the riboswitch–hypoxanthine complex [20] and the riboswitch–guanine complex [19] show the details of this tertiary interaction between the two seven-nucleotide loops, which are held together by stacked base quartets (Figure 3). These quartets resemble the quartet of the SARS virus s2m element (see above) in that they include Watson-Crick pairs (one base from each loop), but here the Watson-Crick pairs interact in their minor groove with non-canonical pairs formed between the loops. This tertiary interaction pulls the loops, together with their attached stems, toward each other.
The Moloney murine leukemia virus core encapsidation signal

Specificity of packaging of retroviral genomes results from interaction between the nucleocapsid domains of the Gag polyproteins and the C-site of the viral genome. Conserved RNA secondary structure elements within the C-site have been identified for MMLV and other retroviruses. In MMLV, such a region, which includes three stem-loop structures, is highly conserved. The sequences of the three stem-loops (tetraloops) in this region were changed to prevent intermolecular interaction and enhance crystallization, while retaining function. The structure of this engineered, functional RNA, which corresponds to the core encapsidation signal, has been determined by NMR methods [2]. As seen in Figure 4, two of the stem-loops form a coaxial stack connected to the third stem-loop by a flexible linker. In addition to several familiar structural motifs, a new element, termed an 'A-minor kink turn', was observed. This structural feature, which resembles both the A-minor motif and the kink turn, is formed by the interaction of an extruded uridine and a GGAA bulge. In this element, two unpaired adenosines pack in the minor groove of a neighboring stem, inducing a kink in the helix. More structural examples are needed to determine whether this is a recurring RNA structural motif or unique to this structure.

Considering the large fraction of RNA that can be attributed to secondary and tertiary structure motifs, predicting these motifs from sequence would be a large step toward predicting RNA three-dimensional structure. Currently, the most successful approach to predicting RNA motifs begins by predicting the secondary structure, either by energy minimization using nearest-neighbor thermodynamic rules, as implemented in MFOLD [40] and the Vienna RNA package [41], or by covariation analysis, or by a combination of both [42].
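The energy-minimization step itself is too large for a short example, because it depends on extensive tables of experimentally derived nearest-neighbor parameters. As a simplified stand-in, the sketch below implements the Nussinov base-pair maximization algorithm, which is a different and much cruder method than the one used by MFOLD or the Vienna RNA package: it maximizes the number of allowed pairs (Watson-Crick and G·U) under an assumed minimum hairpin-loop length of three nucleotides, rather than minimizing free energy. It does, however, show the dynamic-programming recursion over subsequences that the energy-based methods also build on. The function name `nussinov` is hypothetical.

```python
from typing import List, Tuple

# Pairs counted by this toy model: Watson-Crick plus G-U wobble.
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> Tuple[str, int]:
    """Nussinov base-pair maximization (not a thermodynamic model).

    Returns a dot-bracket structure and its number of base pairs.
    best[i][j] holds the maximum pair count for the half-open slice seq[i:j].
    """
    n = len(seq)
    best: List[List[int]] = [[0] * (n + 1) for _ in range(n + 1)]

    def pair_options(i: int, j: int):
        """Yield (k, score) for each allowed partner k of base i within seq[i:j]."""
        for k in range(i + min_loop + 1, j):
            if (seq[i], seq[k]) in ALLOWED:
                yield k, best[i + 1][k] + 1 + best[k + 1][j]

    # Fill shorter slices first so every sub-slice is ready when needed.
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            score = best[i + 1][j]  # option 1: base i stays unpaired
            for _, s in pair_options(i, j):  # option 2: base i pairs with some k
                score = max(score, s)
            best[i][j] = score

    # Traceback: rebuild one optimal structure from the filled table.
    structure = ["."] * n
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k, s in pair_options(i, j):
            if s == best[i][j]:
                structure[i], structure[k] = "(", ")"
                stack.append((i + 1, k))
                stack.append((k + 1, j))
                break
    return "".join(structure), best[0][n]


if __name__ == "__main__":
    print(nussinov("GGGAAACCC"))  # ('(((...)))', 3)
```

Because pair counting ignores stacking, it tends to favor many isolated pairs; the nearest-neighbor rules used by MFOLD and the Vienna package reward stacked helices instead, which is one reason energy minimization, not pair maximization, is used in practice.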
Next, potential motifs are identified in the loop regions of the predicted secondary structure, using loop sizes and sequences known to be compatible with the motifs. The sequence profiles of homologous aligned RNAs are then checked against these potential motifs to confirm compatibility. This approach was used successfully to identify sarcin-ricin loops and loop E motifs in rRNA [34].

Figure 2. A Watson-Crick base quartet found in a highly conserved region of the SARS virus genome. The quartet is shown in orange, a GNRA-like pentaloop with a closing base pair is in green, a base pair from a second helix at a sharp angle to the tetraloop is shown in magenta and a dinucleotide linker between bases of the quartet is in yellow.

The ribose zipper tertiary structure motif has been characterized [43] in terms of sequence and secondary structure based on 97 instances observed in rRNA. The most common, or canonical, type of ribose zipper shows a strong sequence bias and is found to link stem or stem-like regions with loop regions. The residues involved in ribose zippers were also found to be phylogenetically conserved and to show covariation patterns. Thus, sequence and structure conservation analysis, as used for the prediction of secondary structure motifs, is also a powerful tool for predicting this tertiary interaction motif. Further studies applying this approach to other secondary and tertiary structure motifs are needed to validate it and establish it as a standard method.

The focus of RNA structure determination has shifted from small model structures and molecular fragments to large, naturally occurring biological RNAs, some in protein complexes and some engineered to enhance crystallization. These larger, more biologically relevant structures provide a diverse picture and represent a more uniform sampling of the secondary and tertiary structure motifs available to RNA.
Detailed examination of recently determined biological RNAs has revealed many new structural elements, in addition to repeated examples of previously characterized motifs. In fact, by far the greatest proportion of RNA structure can be described as combinations of well-characterized structural modules: double helices, motifs in hairpin loops and internal loops, and tertiary interaction motifs. Although we can identify many motifs in RNA structures, we are far from having a complete library of RNA structural motifs, and even farther from understanding the context of these motifs, their effect on the surrounding structure, and how to predict them directly from sequence. Because the potential benefits of being able to predict, design [44] and engineer [37,38] RNA structure through an understanding of its building blocks are so great, a directed effort in this area is warranted. Although this knowledge will eventually be obtained through research on individual biological RNAs, we should consider a structural genomics-type project to map the universe of RNA motifs, together with a parallel computational effort to predict motifs from sequence and to determine how they combine to build up an overall RNA structure.

An emerging trend in RNA structural biology is the association of RNA secondary structure and tertiary interaction motifs with human disease. Three recent reports recognize these relationships and correlate them with specific structures. The most recent of these proposes an RNA target, formed by kissing hairpin loop tertiary interaction motifs, for the binding of the KH2 domain of the fragile-X mental retardation protein (FMRP) [45]. This complex interaction competes with the binding of FMRP to brain polyribosomes and is associated with a crucial known mutation. Another is a proposed explanation for lead toxicity through the hydrolysis of mRNAs containing the leadzyme motif [46]. Finally, models of the role of mRNA secondary structure in a variety of human neurodegenerative diseases (trinucleotide repeat expansion diseases) caused by the expansion of CAG repeats in the open reading frames of certain genes have been proposed and are under active study [47,48].

Figure 3. Interacting hairpin loops from the guanine-responsive riboswitch. One loop is in cyan and the other is in magenta, with stacked quartets in the loops colored yellow and orange. Helical stems of the hairpin loops are colored blue.

Figure 4. MMLV core encapsidation signal. Green, GNRA tetraloop; salmon, disordered tetraloop; violet, Watson-Crick double helix; blue, non-canonical base pairs; cyan, base triple; orange, A-minor K-turn; yellow, linker region and bulge base.

References

[1] Crystallization of RNA and RNA–protein complexes.
[2] NMR structure of the 101-nucleotide core encapsidation signal of the Moloney murine leukemia virus. The RNA structure of the core recognition signal for encapsidation of the positive-strand genome of MMLV is reported. This is the largest NMR structure of an RNA determined to date.
[3] The Nucleic Acid Database: a comprehensive relational database of three-dimensional structures of nucleic acids.
[4] Analysis of RNA motifs.
[5] The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution.
[6] High resolution structure of the large ribosomal subunit from a mesophilic eubacterium.
[7] Crystal structure of the 30S ribosomal subunit from Thermus thermophilus: purification, crystallization and structure determination.
[8] The roles of ribosomal proteins in the structure assembly, and evolution of the large ribosomal subunit.
[9] The contribution of metal ions to the structural stability of the large ribosomal subunit.
[10] Functional insights from the structure of the 30S ribosomal subunit and its interactions with antibiotics.
[11] Atomic structures of the 30S subunit and its complexes with ligands and antibiotics.
[12] Crystal structure of an initiation factor bound to the 30S ribosomal subunit.
[13] Crystal structure of a group I intron splicing intermediate.
[14] Crystal structure of a self-splicing group I intron with both exons.
[15] Crystal structure of a phage Twort group I ribozyme–product complex.
[16] Structure of the Tetrahymena ribozyme: base triple sandwich and metal ion at the active site.
[17] Crystal structure of the specificity domain of ribonuclease P.
[18] Basis for structural diversity in homologous RNAs. The crystal structure of the specificity domain of A-type ribonuclease P RNA from Thermus thermophilus was determined. Comparison of the A- and B-type ribonuclease P structures shows that, although the secondary and tertiary structures differ
[19] Structural basis for discriminative regulation of gene expression by adenine- and guanine-sensing mRNAs. Crystal structures of guanine- and adenine-responsive riboswitches are determined. These structures illustrate how the RNAs discriminate between adenine and guanine, and how they perform their regulatory function.
[20] Structure of a natural guanine-responsive riboswitch complexed with the metabolite hypoxanthine.
[21] The structure of a rigorously conserved RNA element within the SARS virus genome. The crystal structure of a highly conserved fragment of the SARS virus genome was determined. The structure shows novel RNA secondary and tertiary structure elements.
[22] Crystal structure of NF-kappaB (p50)2 complexed to a high-affinity RNA aptamer.
[23] Structure of protein L7Ae bound to a K-turn derived from an archaeal box H/ACA sRNA at 1.8 Å resolution.
[24] A conformational switch controls hepatitis delta virus ribozyme catalysis.
[25] Crystal structure of a hepatitis delta virus ribozyme.
[26] High salt solution structure of a left-handed RNA double helix.
[27] Betzel C: First look at RNA in L-configuration.
[28] Alternative tertiary structure of tRNA for recognition by a posttranscriptional modification enzyme.
[29] tRNA structure goes from L to lambda.
[30] NCIR: a database of non-canonical interactions in known RNA structures.
[31] Tools for the automatic identification and classification of RNA base pairs.
[32] RNA structure comparison, motif search and discovery using a reduced representation of RNA conformational space.
[33] The identification of novel RNA structural motifs using COMPADRES: an automated approach to structural discovery.
[34] Motif prediction in ribosomal RNAs. Lessons and prospects for automated motif prediction in homologous RNA molecules.
[35] SCOR: Structural Classification of RNA, version 2.0.
[36] Three-dimensional motifs from the SCOR, structural classification of RNA database: extruded strands, base triples, tetraloops and U-turns. A compilation and classification of RNA secondary and tertiary motifs; several new motifs are identified in the database and described in this article.
[37] TectoRNA: modular assembly units for the construction of RNA nano-objects.
[38] Building programmable jigsaw puzzles with RNA. Self-assembling RNA building blocks were used to generate defined structures termed tectosquares. This illustrates the modularity of RNA, the ability to use known structural motifs to design building blocks and the potential of RNA in nanotechnology.
[39] A computational approach to identify genes for functional RNAs in genomic sequences.
[40] Mfold web server for nucleic acid folding and hybridization prediction.
[41] Vienna RNA secondary structure server.
[42] Secondary structure prediction for aligned RNA sequences.
[43] Sequence and structural conservation in RNA ribose zippers.
[44] Paradigms for computational nucleic acid design.
[45] Kissing complex RNAs mediate interaction between the Fragile-X mental retardation protein KH2 domain and brain polyribosomes.
[46] Lead toxicity through the leadzyme.
[47] RNA structure of trinucleotide repeats associated with human diseases.
[48] Molecular architecture of CAG repeats in human disease related transcripts.

The author acknowledges Jasmin Yang of the NDB, and Donna Hendrix and Elizabeth Holbrook of UC Berkeley for assistance in the preparation of this manuscript. SRH was supported by National Institutes of Health grants 1R01GM66199 and 1R01HG002665.