key: cord-0736338-qgit5qqo
authors: Reeder, Jens; Höchsmann, Matthias; Rehmsmeier, Marc; Voss, Björn; Giegerich, Robert
title: Beyond Mfold: Recent advances in RNA bioinformatics
date: 2006-06-25
journal: J Biotechnol
DOI: 10.1016/j.jbiotec.2006.01.034
sha: 58fc0a107bce6eea608a3fcba66fe6acab0880aa
doc_id: 736338
cord_uid: qgit5qqo

Computational analysis of RNA secondary structure is a classical field of biosequence analysis, which has recently gained momentum due to the manyfold regulatory functions of RNA that have become apparent. We present five recent computational approaches that address the problems of synoptic folding space analysis, pseudoknot prediction, structure alignment, comparative structure prediction, and miRNA target prediction. All these programs are in current use and are available via the Bielefeld Bioinformatics Server at http://bibiserv.techfak.uni-bielefeld.de.

RNA secondary structure analysis has been an important task in molecular biology throughout the recent years. Computational structure prediction based on a thermodynamic model has been around since the first version of the Mfold program (Zuker and Stiegler, (1) The MFE structure is not guaranteed to be the native one. For sequences longer than (say) 400 bases, it will almost surely be wrong. On the other hand, it will also contain some correct structural elements -given just the MFE structure and no additional clues, we do not know which parts of the prediction are good. And also, the native structure is typically a member of the near-optimal folding space, but again, we do not know which one.

(2) For reasons of computational complexity, standard folding programs disregard pseudoknots, which are an important structural feature. Since the pseudoknots are not contained in the folding space evaluated by the programs at all, they cannot be found among the near-optimals either.

(3) Better results can be obtained with a comparative approach, using the fact that structure is conserved more than sequence with homologous RNA molecules that share the same function. When a large number of well-aligned sequences is available, statistical methods yield good results (Doshi et al., 2004) . When only a few sequences are given, and when their alignment is difficult, much more sophisticated methods are required. (4) RNA molecules do not fold in isolation, and contacts with proteins or other RNA molecules may play an important role. Furthermore, RNA-RNA interaction is guided by the same thermodynamics as is the self-interaction that creates the structure of a single sequence, but its correct modeling again requires specialized approaches.

A large number of contributions has been made addressing the above problems. We only list a few for each category, adding in the ones to be presented below in the appropriate place:

(1) The fact that the truth may lie buried in the nearoptimal folding space has prompted researchers to augment their folding programs to also compute near-optimal structures. There are basically three approaches: Mfold provides a heuristic sample of near-optimal structures, taking care that the structures produced are not too similar. RNAsubopt (Wuchty et al., 1998 ) produces a complete account of the near-optimal folding space up to a certain energy threshold. In the first case, we do not know if an "interesting" structure is overlooked in the heuristic example, in the second case, we are confronted with a large number of structures, as the near-optimal folding space grows exponentially with sequence length as well with the allowed energy interval. Sfold (Ding and Lawrence, 2003) samples structures according to their probabilities derived from Boltzmann statistics. A sample size of 1000 structures is usually sufficient for good results. With subsequent clustering of the sampled structures according to structural similarity, a small number of centroids can be returned that can be taken as a representative ensemble of potentially relevant structures (Ding et al., 2005) . The MFE structure is not guaranteed to be contained in the sample, but can be easily added by traditional methods. This approach comes close to a complete, non-heuristic and yet useful (because of the compact result) analysis of the folding space. A more direct way to achieve the same goal is chosen by the program RNAshapes (Giegerich et al., 2004b) , to be presented below. Here the folding space is subdivided into families of structures that share the same abstract shape, and for each family, a representative structure is determined.

(2) The problem of pseudoknots has been widely studied, as it is also of great theoretical interest. The general case has been proved to be an NP-hard problem (Lyngsø and Pedersen, 2001; Akutsu, 2000) , and hence computationally intractable. Some researchers have abandoned the thermodynamic model and resorted to graph algorithms or genetic algorithms (Tabaska et al., 1998; Shapiro and Navetta, 1994) . The program pknots by Rivas and Eddy (1999) implements thermodynamic folding of a rather large subclass of pseudoknots in O(n 6 ) time and O(n 4 ) space, which implies that it can be used only for very short sequences. The program pknotsRG, to be presented below, restricts the class of pseudoknots further, but still recognizes many pseudoknots arising in practice, and runs in O(n 4 ) time and O(n 2 ) space. A filtering approach has recently be presented (Huang et al., 2005) that combines a pre-filter with the folding programs, which sometimes substantially gains in efficiency.

(3) Comparative structure prediction from unaligned sequences must solve the task of simultaneous alignment and thermodynamic folding with a combined objective function to be optimized. As multiple sequence alignment under the common sumof-pairs score by itself is NP-complete (Wang and Jiang, 1994) , one cannot hope to obtain an efficient non-heuristic algorithm. The Sankoff algorithm (Sankoff, 1985) , which gives the exact solution to this problem, has hence served as a source of derived approaches, for example, DNYALIGN (Mathews and Turner, 2002) and FOLDALIGN (Gorodkin et al., 1997) . These programs must make severe restrictions to the underlying model to achieve practical runtime. Even so, the results are often quite moderate; see (Gardner and Giegerich, 2004 ) for a recent survey. In this situation, the recent tool RNAforester (Höchsmann et al., 2003 , which allows for a meaningful way of aligning structures directly (without use of a sequence alignment), has opened a new road that circumvents the dilemma of the Sankoff approach. The program RNAcast (Reeder and Giegerich, 2005) efficiently determines the consensus shapes of a set of sequences, and from their representative structures, a multiple alignment can be obtained with RNAforester. Both RNAforester and RNAcast are described below. (4) When the sites are known where RNA molecules interact with other molecules, this can be incorporated into the folding algorithms. The aforementioned folding programs provide options to specify that a certain base must pair or should not pair. Although such information can be obtained by experimental or comparative methods, most often it is not available. General RNA-RNA interaction has been recently studied by Andronescu et al. (2005) . A special case of great current interest is the interaction between a miRNA or a siRNA with its target RNA. As this process is guided by thermodynamics, early approaches have modeled this situation by linking the small RNA to the end of its target, and then folding them jointly with a standard folding program (Stark et al., 2003) As this approach tends to produce artefacts, a proper model of hybridizing a small RNA to a larger target has been developed and implemented in the program RNAhybrid. This program, already widely used for miRNA and siRNA target prediction, is presented below.

A common problem in RNA secondary structure analysis is the huge number of suboptimal structures. Many of these are similar, whereas only few show differences that are biologically relevant.

The idea of abstract shape analysis (Giegerich et al., 2004b) is to classify structures upon their composition from structural elements, such as helices, multiloops, internal loops and bulge loops. For each shape class one representative structure is reported. This provides a synoptic and non-heuristic overview of the structure space. Furthermore, this partitioning allows to deduce properties of each class, such as its probability or number of members.

The secondary structure of an RNA is defined by its base pairs. A widely used representation of a secondary structure is the Dot-Bracket-Notation, also known as "Vienna"-string, where a base pair is represented by a pair of parentheses and unpaired bases by dots, see Fig. 1 (b). We use a similar representation for shapes. A paired region is represented by a pair of square brackets and unpaired regions by the underscore character, see Fig. 1 (c). The member of the shape class having lowest free energy is called shape representative structure or shrep for short.

We can also compute the probability of a shape as the sum of the probabilities of its members, according to Boltzmann statistics. The calculation of the probability of an individual structure follows McCaskill (1990) and makes use of the partition function known from thermodynamics. A manuscript describing the complete probabilistic shape analysis is currently in preparation.

The method of shape analysis is implemented in the tool RNAshapes. This tool computes shapes together with shreps, calculates probabilities of shapes, performs stochastic sampling of structures, and includes complete suboptimal folding and prediction of MFE structure.

Transfer RNAs (tRNAs) are one of the best analyzed RNA families. Various experiments have revealed the biological active structure of tRNAs which is known as the cloverleaf structure. In contrast to this we found that out of 99 tRNA sequences from the Rfam database (Griffiths-Jones et al., 2003) , only 30 have a cloverleaf as their predicted MFE structure (data not shown). The biological explanation for this is that tRNAs possess modified bases which may on the one hand be no longer capable of forming base pairs, or on the other hand are able to interact in a different way. This alters the free energy of the predicted conformation such that it rises above the free energy of the cloverleaf (or vice versa), letting the latter achieve the energetical optimum. For structure prediction, when the modifications are unknown, current practice is to calculate suboptimal structures for a certain energy range and to subsequently search (by eye or by a simple pat-tern matching algorithm) for the cloverleaf structure in the list of suboptimals. For tRNAs this means that about 50-300 structures have to be checked. To give an example we chose the Natronobacterium pharaonis tRNA for alanine (gb: AB003409.1/96-167). The predicted MFE structure is one hairpin with three internal loops, as depicted in Fig. 2 (a). The cloverleaf structure, shown in Fig. 2 (c) appears at position 104 in the energy sorted list of 199 suboptimals, produced by RNAsubopt. Using RNAshapes, we obtain three shapes in an energy range of 5 kcal/mol above the MFE, of which the rank 3 shrep is the cloverleaf structure. Furthermore, the probabilities indicate that without the aforementioned modifications it is highly unlikely that this tRNA attains the cloverleaf shape. The output of RNAshapes and the squiggle plots for the shreps are shown in Fig. 2. 

Pseudoknots play an important role in many biological processes. They build the catalytic core of some ribozymes and are an important building block of many structural RNAs. Pseudoknots are involved in telomerase activity and they can stimulate efficient ribosomal frameshifting, a mechanism used by a wide range of RNA viruses.

Standard RNA folding programs neglect pseudoknots for reasons of efficiency. While the standard methods need time proportional to the cube of the input sequence length, pseudoknot prediction is much more demanding. The algorithm presented in this section needs O(n 4 ) time and O(n 2 ) space, where n is the sequence length. The algorithm by Rivas and Eddy (1999) , which is able to predict a larger class of pseudoknots, needs O(n 6 ) time and O(n 4 ) space.

Most of the currently known pseudoknots are rather simple ones. They consist of only two helices, interacting in a crosswise fashion, as shown in Fig. 3 . If we allow the unpaired strands (u, v, w in Fig. 3 ) to build secondary structures internally in an arbitrary way, including multiloops and pseudoknots, we call this class simple recursive pseudoknots.

Our model further restricts this class by three canonization rules:

• Rule 1: the 5 and 3 part of each pseudoknot helix must have the same length and must not have an interruption. This disallows bulges and internal loops inside pseudoknot stems. • Rule 2: the pseudoknot helices must have maximal length, i.e. if there is a possible base pair at either end, it must be closed. • Rule 3: if due to Rule 2 the helices would overlap, the first helix (a-a ) is prioritized and the second one is shortened. We call the resulting class canonical simple recursive pseudoknots (csr-pk).

The canonization rules are motivated by the following observations: if a pseudoknot contains a bulge in one of the stems, it is not canonical. However, there must be a pseudoknot with a smaller stem without the bulge that serves as a canonical representative. This representative may have a higher energy, but usually the differences are small.

Rule 2 is justified by the fact that the energy model strongly favors helix extension. The only exception is, when bases of the pseudoknot stem compete with internal structures of the inner parts u, v, w.

Finally, the decision of Rule 3 will have only a minor effect, since the same bases are either stacked on helix a-a or b-b .

In an exhaustive analysis of the pseudoknot database PseudoBase, 1 it was shown, that out of 212 known pseudoknots, 172 are simple recursive (Reeder and Giegerich, 2004) . Almost 80% of these are even canonical simple recursive pseudoknots. This shows the abundance of the class csr-pk within all validated structures.

The program pknotsRG extends the usual RNA folding programs by the class of simple recursive pseudoknots. It uses the current thermodynamic energy model by the Turner group (Mathews et al., 1999) . pknotsRG provides three basic modes:

(1) Standard MFE folding with pseudoknots.

(2) Enforced folding: the best structure in the folding space that contains at least one pseudoknot is reported.

(3) Local folding: the pseudoknot with the best energy to length ratio is predicted.

All modes can also be run in a suboptimal fashion, where all suboptimal solution up to a user-defined threshold are computed.

The program is available as download, as interactive tool and WebService, which allows for an easy integration into other software. 1 http://wwwbio.leidenuniv.nl/Batenburg/PKB.html.

Recently, an unusual three-stemmed pseudoknot has been identified that promotes ribosomal frameshift in the SARS Coronavirus (Plant et al., 2005) . The pseudoknot is thought to pause the ribosome during translation, which then shifts back by one nucleotide on the "slippery site". This special pseudoknot seems to be conserved in all Coronaviruses, and thus could be a target for anti-viral therapeutics.

The three-stem topology is predicted by pknot-sRG as the optimal structure with an energy of −31.8 kcal/mol (Fig. 4) . Where the selective pressure acts on the function, often the structure instead of the sequence is conserved. In spite of all its success, purely sequence based comparative analysis gets to its limit when structural conservation is of interest. In contrast,

RNAforester is a tool that aligns the structures of RNA molecules.

From the viewpoint of computer modeling, RNA secondary structures are naturally represented as trees or forests. The parent and sibling relationship of nodes is determined by the nesting of base-pair bonds. The 5 -3 orientation of an RNA molecule imposes the order among sibling nodes. This produces a forest structure in general (where a forest is simply a sequence of trees) but a virtual root node can always turn a forest into a tree. Different tree representations of RNA structures that vary in their resolution have been proposed. See secondary structures, every distance measure on trees can be applied to compare RNA secondary structures.

The tree alignment distance was introduced by Jiang et al. (1995) . Let Σ be an alphabet and let T (Σ) denote the set of Σ-labeled trees. We define the alignment alphabet Σ n λ as the tuple alphabet where for each of its elements at least one component is different from λ. Formally, Σ n λ = (Σ ∪ {λ}) n \{λ} n . A tree alignment A is an element of T (Σ 2 λ ). Its component-wise projections A| 1 and A| 2 are elements of T (Σ ∪ {λ}). For some T ∈ T (Σ ∪ {λ}), π(T ) ∈ F (Σ) is the forest that results from the deletion of all nodes v with label(v) = λ.

(Deleting node v in T means that the children of node v become the children of the parent node of v. Moreover, if v has any siblings, the deletion preserves the preorder relation of these nodes.) A tree alignment A is an alignment of trees F and G, iff π(A| 1 ) = F and π(A| 2 ) = G. See Fig. 6 for an illustration of the tree alignment model.

Given this notation of tree alignment, similarity of trees can be defined in a way analogous to sequence similarity: the similarity score σ of an alignment A is the sum of scoring contributions for its node labels. The alignment similarity σ TA of T 1 and T 2 is the maximum score that an alignment of T 1 and T 2 can achieve. An alignment of T 1 and T 2 is optimal if it achieves this score. These definitions generalize to forests in a straightforward way.

Calculating global similarity of RNA secondary structures is not sufficient when the focus is to find similar substructures. This holds particularly if the substructures are far apart in sequence due to their 5 positions. Local similarity means finding the maximal similarity between two substructures. If these substructures are extended, the score decreases. This requires a scoring scheme that balances positive and negative scoring contributions. Otherwise, the similarity of the whole structure always achieves the maximum score. With suitable generalized definitions we can perform a BLAST-or Smith-Waterman-like local similarity search on RNA structures. A dynamic programming algorithm that solves this problem was proposed in (Höchsmann et al., 2003) and is implemented in RNAforester.

An alignment of forests is a forest and, hence, can again be aligned in the forest alignment model. This makes virtually every progressive strategy that was reported for multiple sequence alignment applicable to multiple forest alignments. The idea of the algorithms persists even if the type of the alignment is not a sequence. The concept of sequence profiles can be naturally extended to forests, which results in a profile representation of RNA secondary structures. This data structure and a progressive profile alignment algorithm for RNA secondary structures were proposed in and are implemented in RNAforester.

RNAforester is a command line based tool for comparing RNA secondary structures. It supports the computation of (local) pairwise and multiple alignment of structures based on the tree alignment model (Jiang et al., 1995) and the extensions and algorithms presented in (Höchsmann et al., 2003 . The user interface follows the philosophy of the Vienna RNA Package (Hofacker et al., 1994) and the program is part of the forthcoming Vienna RNA Package Version 1.6. An online version of RNAforester and the source code distribution is available at the Bielefeld Bioinformatics Server (BiBiServ).

In Fig. 7 (a) the local alignment of the 5 UTRs of the human ferritin heavy chain mRNA (D28463) and the mouse ferritin heavy chain mRNA (M60170) is displayed. Here, the IRE is not only conserved in structure but also in sequence. In contrast, the local structure alignment of the 5 UTR of the human ferritin heavy chain mRNA and the 3 UTR of the human transferrin receptor mRNA (X01060) (see Fig. 7 (b)) shows numerous compensatory mutations in the IRE. This shows that our approach can discover arbitrary conserved structural motifs in a larger structure, independent of their position and primary sequence. Fig. 8 shows an example of a multiple structure alignment and its progressive calculation.

Most RNAs exert their function by structure, not by sequence. Thus a family of RNAs with the same function has most likely a conserved tertiary and secondary structure. Determining the secondary structure by comparative methods usually gives better results than methods based on only one sequence. The best results were achieved by statistical analysis of covariation and sequence conservation within a RNA family (Gutell et al., 1992) . However, these methods need a large family and a very good, hand crafted alignment. For the comparative analysis of only a few related RNAs, the algorithm formulated by Sankoff (1985) is thought to be the ideal approach. It simultaneously aligns and folds a set of RNA sequences. Unfortunately, this method is exponential in the number of sequences and practical only for two sequences of moderate length. An intrinsic reason why the Sankoff algorithm is so expensive is the implicit multiple sequence alignment. The key to a practical, yet non-heuristic algorithm for consensus prediction is to find the consensus without constructing a sequence alignment. If desired, an alignment can be derived from the consensus afterwards with a structural alignment program, such as RNAforester.

We redefine the problem addressed by the Sankoff algorithm: a consensus structure for k sequences s 1 , . . ., s k is a set of secondary structures x 1 , . . ., x k , such that x 1 , . . ., x k have the same abstract shape.

Since the shape abstracts from sequence content and explicit loop and helix lengths, it lends itself per-fectly to a comparative approach: shapes of different structures and different sequences can be meaningfully compared for identity, using fast exact matching rather than alignment methods. This outlines our method termed RNAcast:

•

Step 1: for RNA sequences s 1 , . . ., s k , enumerate the near-optimal shape spaces P(s 1 ) = [p 1 1 , . . . , p l 1 1 ], . . . , P(s k ) = [p 1 k , . . . , p l k k ]. For each shape p j i , also compute the shape representative structurep j i , which is the best way sequence s i can fold into this particular shape.

Step 2: identify the shapes that occur in all shape spaces, i.e. that all sequences can fold into. This yields a list of all common shapes, together with their respective shreps:

Step 3: finally, rank each common shape p i by adding the individual free energies ofp i 1 , . . . ,p i k . The best scoring common shape is the consensus shape, and its shreps constitute our consensus structure prediction.

The idea of the RNA consensus abstract shapes technique is implemented in the tool RNAcast. For a set of (related) RNA sequences, it computes a consensus shape, and for each sequence, the shape representative structure (shrep). RNAcast predictions are unaligned structures. If an alignment is of interest, it can be computed afterwards with a multiple structure alignment program, such as RNAforester.

RNAcast consensus structure predictions are most reliable for sets of 5-10 sequences, not longer than 200 nucleotides. If fewer sequences are used, individual sequences may have a too strong overall influence. The strength of the comparative approach does not show to advantage. On the other hand, with more than 10 sequences in the input set, the user should check for potential outliers, since these may prohibit the finding of the native folding.

RNAcast has been evaluated on several RNA families, including micro RNAs, tRNAs, the RNA of the signal recognition particle, several spliceosomal RNAs and other. It shows a significant increase in accuracy over single sequence RNA folding and performs as good as the heuristic implementation of the Sankoff approach by Mathews and Turner (2002) , while being a lot faster.

As an example, we apply RNAcast to five signal recognition particle RNAs of Homo sapiens, Rattus norvegicus, Gallus gallus, Canis lupus and Anopheles gambiae (accession numbers: X04248, AC091616, AB073218, J01853, BX044091), each of length approximately 300 bases, taken from the SRP Database. 2 The predicted shreps are all very similar to the published reference structure (not shown). The consensus structure, computed by RNAforester afterwards from RNAcast's prediction clearly shows all the common features of the SRP (see Fig. 9 ). This particular computation took less than 20 s on current hardware, which demonstrates the efficiency of RNAcast.

MicroRNAs (miRNAs) are short RNAs of around 22 nt that post-transcriptionally regulate the expression of target genes by binding to the target mRNAs. In plants, this binding usually leads to cleavage and subsequent degradation of the target, in animals the target is usually not cleaved, but blocked from translation into protein. Although several thousand miR-NAs have been defined, only a small, albeit increasing, number of targets is known. The increase in known miRNA/target relationships is also due to the use of computational prediction methods, one of which is RNAhybrid (Rehmsmeier et al., 2004) . A related situation is the design of small interfering RNAs (siR-NAs), in which RNAhybrid could be used to reduce off-target effects by checking for possible hybridizations of designed siRNAs and genes other than the ones to be targeted.

RNAhybrid extends the classical RNA secondary structure prediction to two sequences. A small query RNA, such as a microRNA, is hybridized to a potential target RNA in a way that optimizes the free energy of the hybridization site. Secondary structure outside a hybridization site is not considered, and intra-molecular base pairings are not allowed. The sizes of bulge loops and internal loops can be restricted if from the nature of the RNA/RNA interaction this appears reasonable. This can be the case, for example, in the prediction of microRNA targets in plants, where microRNAs usually bind nearly perfectly to the target, the hybridizations showing only small loops of 1 nucleotide, if any. As a further refinement, Fig. 9 . RNA of the signal recognition particle: (a) structure of the dog SRP as found in the SRP database, (b) RNAforester consensus produced from an RNAcast prediction of five SRP sequences: human, chicken, rat, dog and mosquito. The predicted consensus structure clearly shows the conserved features of the SRP. hybridizations can be forced to have uninterrupted base pairing for specified regions, for example from nucleotide 2 to nucleotide 7 in the microRNA. Such "seeds" have been suggested as necessary for functioning microRNA/target interactions. Important in any prediction method is an assessment of statistical significance. In the RNAhybrid rationale, this is accounted for in a stepwise manner in which pvalues are calculated for, first, individual binding sites, second, multiple binding sites of one microRNA in a target, and third, binding sites of one microRNA in orthologous targets across multiple species. p-Values for individual binding sites are based on extreme value distributions that are estimated in a microRNA-specific way; p-values of multiple binding sites are modeled with a Poisson approximation that considers individual good binding sites as rare events; multi-species pvalues are based on the effective number of orthologous sequences, which describes the statistical dependence of these and which avoids biased overestimates of statistical significance.

RNAhybrid is written in C and is available as source distribution for download as a program bundle. For small query RNAs and constant maximal loop sizes, the time and memory efficiency of the algorithm is linear in the length of the target. There will be a new version available soon that integrates the complete prediction rationale in one program and also contains a Graphical User Interface to facilitate the use of the program. A version that allows for the easy prediction of energetically favorable binding sites is available as an online tool on the BiBiServ. Fig. 10 shows a prediction of Drosophila melanogaster miR-2b binding sites in the sickle 3 UTR sickle, with grim and reaper, constitutes a group of proapoptotic genes which have been demonstrated to be targets of miRNA-2 (Stark et al., 2003) . In this reference, however, only grim and reaper were among the predictions. In (Rehmsmeier et al., 2004) , sickle was predicted as possible target of miR-2b.

The tools presented here have already acquired a widespread usership. All programs are available for (Hofacker et al., 1994) , starting with version 1.6. All programs are also available as online tools at BiBiServ, and Table 1 shows their usage counts for January 2004 through July 2005. For online use, we normally restrict the size of jobs to be submitted, but exceptions may be negotiated.

Note that often the tools can be used in context. For example, a predicted miRNA target site should be checked for accessibility -it should not be hidden in helical regions of the target molecule structure. This requires a run of RNAshapes on the targets predicted by RNAhybrid. An important aspect of our future work will be to support the use of these tools in an integrated fashion.

Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots

Secondary structure prediction of interacting RNA molecules

RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble

A statistical sampling algorithm for RNA secondary structure prediction

Evaluation of the suitability of free-energy minimization using nearestneighbor energy parameters for RNA secondary structure prediction

A comprehensive comparison of comparative RNA structure prediction approaches

A discipline of dynamic programming over sequence data

Abstract shapes of RNA

Finding the most significant common sequence and structure motifs in a set of RNA sequences

Rfam: an RNA family database

Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods

Local similarity in RNA secondary structures

Pure multiple RNA secondary structure alignments: a progressive profile approach

Vienna RNA secondary structure server

Fast folding and comparison of RNA secondary structures

A heuristic approach for detecting RNA H-type pseudoknots

Alignment of trees an alternative to tree edit

RNA pseudoknot prediction in energy based models

Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure

Dynalign: an algorithm for finding the secondary structure common to two RNA sequences

The equilibrium partition function and base pair binding probabilities for RNA secondary structure

A three-stemmed mRNA pseudoknot in the SARS coronavirus frameshift signal

Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics

Consensus shapes: an alternative to the Sankoff algorithm for RNA consensus structure prediction

Fast and effective prediction of microRNA/target duplexes

A dynamic programming algorithm for RNA structure prediction including pseudoknots

Simultaneous solution of the RNA folding, alignment and protosequence problems

A massively parallel genetic algorithm for RNA secondary structure prediction

Identification of Drosophila microRNA targets

An RNA folding method capable of identifying pseudoknots and base triples

On the complexity of multiple sequence alignment

Complete suboptimal folding of RNA and the stability of secondary structures

Optimal computer folding of large RNA sequences using thermodynamics and auxiliary informations

Four of the five tools presented here have been developed with the method of algebraic dynamic programming (Giegerich et al., 2004a) . The authors gratefully acknowledge the contributions of Peter Steffen to the efficient implementation of this programming technique.