key: cord-354465-5nqrrnqr
authors: Haslinger, Christian; Stadler, Peter F.
title: RNA structures with pseudo-knots: Graph-theoretical, combinatorial, and statistical properties
date: 1999
journal: Bull Math Biol
DOI: 10.1006/bulm.1998.0085
sha: 
doc_id: 354465
cord_uid: 5nqrrnqr

The secondary structures of nucleic acids form a particularly important class of contact structures. Many important RNA molecules, however, contain pseudo-knots, a structural feature that is excluded explicitly from the conventional definition of secondary structures. We propose here a generalization of secondary structures incorporating ‘non-nested’ pseudo-knots, which we call bi-secondary structures, and discuss measures for the complexity of more general contact structures based on their graph-theoretical properties. Bi-secondary structures are planar trivalent graphs that are characterized by special embedding properties. We derive exact upper bounds on their number (as a function of the chain length n) implying that there are fewer different structures than sequences. Computational results show that the number of bi-secondary structures grows approximately like 2.35(n). Numerical studies based on kinetic folding and a simple extension of the standard energy model show that the global features of the sequence-structure map of RNA do not change when pseudo-knots are introduced into the secondary structure picture. We find a large fraction of neutral mutations and, in particular, networks of sequences that fold into the same shape. These neutral networks percolate through the entire sequence space.

The secondary structures of nucleic acids form a particularly important class of contact structures. Many important RNA molecules, however, contain pseudoknots, a structural feature that is excluded explicitly from the conventional definition of secondary structures. We propose here a generalization of secondary structures incorporating 'non-nested' pseudo-knots, which we call bi-secondary structures, and discuss measures for the complexity of more general contact structures based on their graph-theoretical properties. Bi-secondary structures are planar trivalent graphs that are characterized by special embedding properties. We derive exact upper bounds on their number (as a function of the chain length n) implying that there are fewer different structures than sequences. Computational results show that the number of bi-secondary structures grows approximately like 2.35 n . Numerical studies based on kinetic folding and a simple extension of the standard energy model show that the global features of the sequence-structure map of RNA do not change when pseudo-knots are introduced into the secondary structure picture. We find a large fraction of neutral mutations and, in particular, networks of sequences that fold into the same shape. These neutral networks percolate through the entire sequence space.

c 1999 Society for Mathematical Biology

Presumably the most important problem and the greatest challenge in present day theoretical biophysics is deciphering the code that transforms sequences of biopolymers into spatial molecular structures. A sequence is properly visualized as a string of symbols which together with the environment encodes the molecular architecture of the biopolymer. In case of one particular class of biopolymers, the ribonucleic acid (RNA) molecules, decoding of information stored in the sequence can be properly decomposed into two steps: (i) formation of the secondary structure, that is, of the pattern of Watson-Crick (and GU) base pairs, and (ii) the embedding of the contact structure in three-dimensional space.

The sequence structure relation of RNA was studied in detail in a series of papers (Fontana et al., 1991 (Fontana et al., , 1993a Bonhoeffer et al., 1993; Tacker et al., 1994; Grüner et al., 1996a,b; Tacker et al., 1996) at the level of secondary structures. The most salient findings of these investigations are:

(i) There are many more sequences than (secondary) structures.

(ii) There are few frequent and many rare structures. Almost all sequences fold into frequent or 'common' structures. (ii) Sequences that fold into a 'common' structure are distributed nearly uniformly in sequence space. (iv) A sequence folding into a 'common' structure has a large number of neutral neighbors (folding into the same structure) and a large number of neighboring sequences that fold into very different secondary structures. (v) Neutral paths percolate sequence space along which all sequences fold into the same secondary structure. In fact, there are extended neutral networks of sequences folding into the same 'common' structure (Grüner et al., 1996b; . (vi) Almost all 'common' structures can be found close to any point in sequence space. This property is called shape space covering.

The impact of these features on evolutionary dynamics is discussed in Schuster (1995) and : a population explores sequence space in a diffusion-like manner along the neutral network of a viable structure. Along the fringes of the population novel structures are produced by mutation at a constant rate (Huynen, 1996) . Fast diffusion together with perpetual innovation makes these landscapes ideal for evolutionary adaptation (Fontana and Schuster, 1998) . The 'classical' definition of secondary structures incorporates a quite restrictive condition on the set of base pairs that implies a tree-like arrangement of the doublehelical regions, see Fig. 1 . Additional interactions between different branches of this tree are referred to as pseudo-knots (for an exact definition see Section 2). Pseudoknots are excluded from many studies for a mostly technical reason (Waterman and Smith, 1978a, b) : the folding problem for RNA can be solved efficiently by dynamic programming (Waterman and Smith, 1978b; Zuker and Sankoff, 1984) in their absence.

On the other hand, an increasing number of experimental findings, as well as results from comparative sequence analysis, suggest that pseudo-knots are important structural elements in many RNA molecules (Westhof and Jaeger, 1992) . Notably, Figure 1 . The contact structure of Escherichia coli RNAse P RNA contains two pseudoknots [http://jwbrown.mbio.ncsu.edu/RNaseP/home.html]. The conventional secondary structure is drawn on the l.h.s., the (four) regions forming the pseudo-knots are marked by braces, interaction regions are connected. The arc diagram of the same structure is obtained by arranging the backbone along a line and indicating base pairs by arcs connecting the corresponding bases. The base pairs of the conventional secondary structure are drawn above the line, the two pseudo-knot stems are shown below the back-bone. For details see Section 2. functional RNAs such as RNAseP RNA (Loria and Pan, 1996) and ribosomal RNA (Konings and Gutell, 1995) contain pseudo-knots. The diversity of molecular biological functions performed by pseudo-knots can be subdivided into three groups. Pseudo-knots at the 5 -end of mRNAs appear to adopt a role in the control of mRNA translation. For instance, the expression of replicase is controlled in several viruses either by ribosomal frame shifting (Ten Dam et al., 1990; Brierley et al., 1991; Dinman et al., 1991; Chamorro et al., 1992; Tzeng et al., 1992) or by in-frame readthrough of stop codons (Wills et al., 1991) . Both mechanisms involve pseudo-knots. Core pseudo-knots are necessary to form the reaction center of ribozymes. Most of the enzymatic RNAs with core pseudo-knots, such as RNAseP, are involved in cleavage or self-cleavage reactions (Michel and Westhof, 1990; Forster and Altman, 1990; Brown, 1991; Haas et al., 1991) . Pseudo-knots in the tRNA-like motifs at the 3 -end of the genomic RNA mediate replication control in several groups of plant viral RNA (Mans et al., 1991) .

It is important, therefore, to include pseudo-knotted structures into investigations of RNA sequence-structure relationships. In particular, we need to know whether the findings (i) through (vi) described above remain true when pseudo-knots are taken into account. Assertion (i), the existence of more sequences than structures, is a necessary prerequisite for all subsequent statements concerning the sequencestructure map of RNA. It is necessary therefore to estimate the number of RNA structures with pseudo-knots in order to decide whether the results quoted above can in fact be true for 'real' RNA molecules.

In the following two sections we give a detailed mathematical analysis of what we call bi-secondary structures. In a nutshell, bi-secondary structures generalize to the notion of secondary structures to include pseudo-knots without allowing overly involved knotted structures or nested pseudo-knots. In fact, almost all known pseudo-knotted structures, with the notable exception of the E. coli αmRNA, fall into this class.

In Section 2 we review a variety of equivalent graph-theoretical characterizations of bi-secondary structures and provide a way of efficiently determining whether a list of base pairs corresponds to a bi-secondary structure. Then we briefly review a few graph invariants that might be useful for determining the complexity of higher-order structures beyond the realm of bi-secondary structures. At the end of Section 2 we show that a convenient distance measure for comparing secondary structures can be used also in the presence of pseudo-knots (Section 2.7). In Subsection 2.8 we argue that the intersection theorem is valid for general nucleic acid contact structures. We say that an RNA sequence is compatible with a structure s if it can in principle form this structure irrespective of energetic constraints. This means that for each base pair (i, j) in s the sequence positions x i and x j are one of the six possible RNA base pairs AU, UA, GC, CG, GU, or UG. The set of sequences that actually fold into a given structure s is therefore a subset of the set of compatible sequences. The intersection theorem (Reidys et al., 1997) now states that for any two structures s and s there are sequences which are compatible with both of them. This result is the reason why very different structures with very closely related sequences can exist. The fact that the intersection theorem holds for structures with pseudo-knots means that we have to expect shape space covering provided the fraction of neutral mutations is large enough (Reidys et al., 1997) .

In Section 3 we determine the number of different structures with pseudo-knots. Combinatorial aspects of RNA secondary structures have been studied in detail by Waterman and co-workers (Stein and Waterman, 1978; Waterman, 1978; Waterman and Smith, 1978a, b; Penner and Waterman, 1993; Schmitt and Waterman, 1994; Waterman, 1995) and Hofacker et al. (1999) . Using different techniques we give analytical upper bounds on the number of different bi-secondary structures showing that their number does not increase much faster than the number of secondary structures. The analytical results are complemented by numerical data (see Table 2 at the end of Section 3) indicating that the number S n of 'reasonable' bi-secondary structures with chain length n grows approximately as S n ∼ 2.35 n . 'Reasonable' means here that the structures have no isolated base pairs (i.e., the minimum stack size is l = 2) and that hairpin loops contain at least m = 3 unpaired bases. For comparison, the number of secondary structures without pseudo-knots grows like 1.86 n . Exhaustive enumeration for short sequences suggest that only 1.65 n different secondary structures appear as minimum energy structures of sequences of length n (Grüner et al., 1996a) . Hence the number 4 n of RNA sequences of length n, is much larger than the number of possible structures, independent of whether or not one takes pseudo-knots into account.

This observation poses the question how the sequences that fold into a given structure are distributed in sequence space. In Section 4 we describe a set of numerical experiments strongly suggesting that the inclusion of pseudo-knots does not alter the qualitative picture [properties (i) through (vi) above] of the RNA sequence-structure map. A short discussion (Section 5) concludes this contribution.

Readers who are not interested in the mathematical details of defining, characterizing, and counting contact structures of various types might want to skip Sections 2 and 3.

The three-dimensional structure of a linear biopolymer, such as RNA, DNA, or a protein can be approximated by its contact structure, i.e., by the list of all pairs of monomers that are spatial neighbors. Contact structures of polypeptides have been introduced by Ken Dill and co-workers in the context of lattice models of protein folding (Chan and Dill, 1988; Chen and Dill, 1995) . They arise implicitly in knowledge-based potentials for polypeptides such as the Delauney-Tesselation potential described in Singh et al. (1996) . Last but not least, RNA secondary structures form a special class of contact structures. The purpose of this section is to bring together different mathematical approaches that can be used to describe biopolymer structures: contact graphs, linked diagrams, book embeddings, and graph colorings.

A contact structure is represented by the contact matrix C with the entries C i j = 1 if the monomers i and j are spatial neighbors without being adjacent along the backbone, and C i j = 0 otherwise. Hence C i j = 0 if |i − j| ≤ 1. We shall use the notation [n] = {1, . . . , n}.

We define a diagram ([n] , ) to consists of n vertices labeled 1 to n and a set of arcs that connect non-consecutive vertices.

A closely related class of diagrams which also allow arcs between consecutive vertices are the linked diagrams introduced by Touchard (1952). These are studied in some detail in Hsieh (1973 ), Kleitman (1970 ), Stein (1978 and Stein and Everett (1978) .

It is customary to arrange the vertices along the x-axis and to draw the vertices in such a way that they are confined in either the upper or the lower half-plane. The diagram of a contact structure with contact matrix C has the set of arcs = {{i, j}|C i j = 1}.

(1)

The contact matrix is thus the adjacency matrix of the corresponding diagram.

With each diagram we may associate a diagram graph with the following properties: Let B be the adjacency matrix of the backbone, i.e., the matrix with the entries B i,i+1 = B i+1,i = 1, i = 0, . . . , n − 1, and B 0n = B n0 = 1. Then the adjacency matrix of a diagram graph with n + 1 vertices is of the form

Equation (2) establishes the 1-1 correspondence of diagrams and the associated diagram graphs. Essentially the same construction can be used for contact structures of molecules with a circular backbone, i.e., for circular ssRNA or ssDNA. The only restriction is that {1, n} cannot be an arc in the case of a circular molecule. It is convenient in this case to define the corresponding diagram graph without the artificial root 0. Each graph with a Hamiltonian cycle is then the diagram graph of a contact structure with a circular backbone. The results in the following discussion hold for both linear and circular nucleic acids.

A diagram is a 1-diagram if for any two arcs α, β ∈ holds α ∩ β = ∅ or α = β.

A diagram is a 1-diagram if and only if associated diagram graph ( ) has vertex degrees less or equal to 3. Such graphs are often called sub-cubic or trivalent. The diagram graphs of 1-diagrams are closely related to cubic Hamiltonian graphs. The latter are studied in detail in Section 9.4 of Wagner and Bodendiek (1990): a graph S is homeomorphic from a graph if S can be produced from by inserting vertices of degree 2 into some edges of . S is also called a subdivision of . Obviously each cubic Hamiltonian graph gives rise to a diagram graph on n vertices by subdividing the edges of a Hamiltonian cycle. On the other hand, not all diagram graphs are homeomorphic from a cubic Hamiltonian graph: suppose {1, 3} is an arc and 2 is an unpaired vertex. The corresponding diagram graph cannot be cubic because the triangle 1, 2, 3 cannot be obtained from a cubic graph.

The classical definition of secondary structures (Waterman, 1978) requires that each base interacts with at most one other nucleotide. Thus nucleic acid secondary structures are special types of 1-diagrams. The second defining condition is that arcs do not cross. In terms of the contact matrix this means: if C i j = C kl = 1 and i < k < j then i < l < j. With the following notation we will find a simpler formulation of condition 2:

Let α = {i, j} with i < j be an arc of a diagram. We writeᾱ = [i, j] ⊂ R for the associated interval. Two arcs of a diagram are consistent if they can be drawn in the same half-plane without crossing each other. Equivalently, two arcs α, β ∈ of a diagram are consistent if either one of the following four conditions is satisfied:

Case (iv) is ruled out by definition in 1-diagrams. The non-crossing condition thus may be expressed as follows: whenever the intervals of two arcs {i, j} and {k, l} have non-empty intersection then one is contained in the other (Schmitt and Waterman, 1994) . Equivalently, we may simply define that a secondary structure is a 1-diagram in which any two arcs are consistent.

As a consequence, each secondary structure can be encoded as a string s of length n in the following way: if the vertex i is unpaired, then s i = '.'. Each arc α = {p, q} with p < q translates to s p = '(' and s q = ')'. As the arcs are consistent their corresponding parentheses are either nested, (( )), or next to each other, ()(). As there are no arcs between neighboring vertices in a 1diagram there is at least one dot contained within each parenthesis. A variant of this notation is the mountain representation of RNA secondary structures (Hogeweg and Hesper, 1984) . The 'dot-parenthesis' notation is used as a convenient notation in input and output of the Vienna RNA Package, a piece of public domain software for folding and comparing RNA molecules .

A graph that can be embedded in the plane (or, equivalently on the sphere) is called planar. If it can be embedded in the plane in such a way that all its vertices lie on the exterior region it is called outer-planar. This class of graphs was introduced and characterized in terms of subgraphs in Chartrand and Harary (1967) and Sys lo (1979). Clearly, a 1-diagram is a secondary structure if and only if its diagram graph ( ) is outer-planar. The outer-planar embedding corresponds to the 'circle representation' of secondary structures.

A similar procedure leads to book-embeddings. A p-book is a set of p distinct half-planes (the pages of the book) that share a common boundary line , called the spine of the book. An embedding of a graph into a book B consists of an ordering of the vertices along the spine of the book together with an assignment of each edge to a page of the book, in which edges assigned to the same page do not cross. The book-thickness (sometimes also called the page-number) bt( ) of a graph is the minimal number p of pages of a book into which it can be embedded (Bernhart and Kainen, 1979) . Book-embeddings have a practical application in the context of VLSI design. For an overview see Chung et al. (1987) and Heath et al. (1992) .

Not surprisingly, the book thickness is closely related to other embedding properties of graphs. Below we list a few important results: (v) bt(K n ) = n/2 , where K n is the complete graph with n vertices (Bernhart and Kainen, 1979) .

n + 6 for sub-cubic graphs (Chung et al., 1987) .

of a graph is the minimum number of 'handles' one needs to add to a sphere so that the graph can be embedded on the resulting surface without crossing edges.)

The book thickness of a variety of other graph classes has been studied in detail, among them hypercubes (Chung et al., 1987) , De Bruijn graphs (Obrenić, 1993) , and various types of network graphs of practical interest (Games, 1986).

Essentially the same construction is used for the investigation of cubic Hamiltonian graphs in Wagner and Bodendiek (1990) . We shall see that the inconsistency graph is a useful construction for characterizing embedding properties of diagram graphs. THEOREM 1. Let be a diagram. Then the following statements are equivalent.

Proof. (i ⇐⇒ ii) can be drawn without intersection arcs if and only if ( ) is planar because the Hamiltonian cycle H of ( ) divides the plane into the interior and the exterior of H which correspond to the upper and lower half-plane of the diagram , respectively.

(ii ⇐⇒ iii) can be shown in the same way as the analogous result for cubic Hamiltonian graphs in Wagner and Bodendiek (1990) , see also Even and Itai (1971) . (ii ⇐⇒ iv) follows immediate from Bernhart and Kainen (1979, Theorem 2.5) as a planar diagram graph is by construction Hamiltonian.

As noted in Even and Itai (1971) , the determination of the book thickness of a is equivalent to finding a minimal vertex-coloring of a certain circle graph, which in our case is the intersection graph ( ). This problem is in general NP-complete. The following observation simplifies the task by reducing the number of arcs that have to be considered.

Two

It is easy to show that the arcs of a stem of a 1-diagram are either all isolated vertices or they are contained in the same component of the inconsistency graph ( ). Furthermore, all arcs of a stem have the same adjacent vertices in ( ). We may therefore use a reduced intersection graphˆ ( ), the vertices of which are the stems. (In addition, we may recursively remove vertices of degree 2 that are not contained in a triangle before forming the intersection graph. This has the effect of removing bulges and interior loops that interrupt stems.) Examples of reduced intersection graphs are given in Figs 3 and 4.

Most of the literature on linked diagrams deals with complete diagrams, that is, each vertex x ∈ [n] is incident with an arc (Touchard, 1952; Kleitman, 1970; Stein, 1978) . It is straightforward to extend Touchard's definition of reducible diagrams to the incomplete diagrams considered here:

The following equivalence is proved in Haslinger (1997):

Reducible diagrams can therefore be viewed as being composed of substructures. These substructures do not in general conform to the conventional decomposition into stems and loops of an RNA that forms the basis of the standard energy model of nucleic acid secondary structures (Freier et al., 1986) .

The contact structure of the proposed SRV-1 frameshift signal contains a pseudoknot, see Ref.

Ten Dam et al. (1994) . Pseudo-knots such as this one belong to the class of bi-secondary structures. Knots such as the one in the lower part of the figure do not belong to the class of bi-secondary structures. Knots, in contrast to pseudo-knots, may contain parallel stranded helices which so far have not been described for RNA.

We may draw the arcs in the upper or lower half-plane, but they are not allowed to intersect the x-axis. In other words, it can be embedded in 2-page book. Bi-secondary structures are therefore 'superpositions' of two secondary structures.

The virtue of bi-secondary structures is that they capture a wide variety of RNA pseudo-knots, [Figs 1 and 2 (upper part)], while at the same time they exclude true knots. Knotted RNAs could in principle arise either from parallel stranded helices (Fig. 2) , or in very large molecules from sufficiently complicated crosslinking patterns. Parallel-stranded RNA has not been observed (so far), see, however, Fortsch et al. (1996) on parallel-stranded DNA. Wollenzien Cantor et al. (1980) have searched unsuccessfully for knots in large RNAs. The definition of bi-secondary structures, by allowing a planar drawing of the structure, rules out both possibilities.

Among the RNA structures with pseudo-knots, almost all are bi-secondary structures. Our examples include several viral RNAs such as Coronavirus (Brierley et al., 1991) , Luteovirus (Ten Dam et al., 1990) , and Retrovirus RNA (Chamorro et al., 1992) , as well as catalytic RNAs such as RNAseP RNA (Loria and Pan, 1996), tmRNA (Vlassov et al., 1995; Felden et al., 1997) , and ribosomal RNAs (Gutell et al., 1994) . We have encountered only a single exception, namely αmRNA (Tang and Draper, 1990) . THEOREM 2. Let be a 1-diagram. Then the following statements are equivalent:

(v) Among any three arcs of at least two are consistent.

Proof. The equivalence of (i), through (iv) is proved in Theorem 1 for all diagrams. The equivalence of (v) and (vi) follows immediately from the definition of ( ). The implication (iii ⇒v) is obvious. Finally, it is possible to show that ¬(ii) implies ¬(v) based on Kuratowski's (1930) theorem. For the details we refer to Haslinger (1997).

The practical importance of Theorem 2 lies in the fact that existence or nonexistence of triangles in ( ) can be checked very easily, and hence we have a very efficient (polynomial time) method for deciding whether a diagram is a bi-secondary structure or not. Note that the equivalence of (iii) and (vi) does not hold for general diagrams. A counterexample is shown in Fig. 3 .

Being the union of the two secondary structures ([n], U ) and ([n], L ) we can represent each bi-secondary structure as a string s using two types of parentheses: as in a secondary structure we write a dot '.' for all unpaired vertices. A pair { p, q} ∈ U becomes s p = '(' and s q = ')', while an arc { p, q} ∈ L becomes s p = '[' and s q = ']'. Unfortunately, the decomposition of a bi-secondary structure into two secondary structures is in general not unique, see Fig. 4 .

The fact that ( ) is bipartite allows us to define a normal form for this representation by means of the following rule: the leftmost arc of each connected component of ( ) belongs to U . In particular, all isolated vertices of ( ) are contained in U . The normal form of a secondary structure therefore contains only dots and (round) parentheses. Within each non-trivial connected component of ( ) the distribution of arcs between U and L is unique because the component is bipartite. All arcs in a stack have a common neighboring vertex in ( ), hence they all belong to the same class of the partition. Therefore, in normal form, all arcs belonging to the same stack are written with the same type of brackets.

The following example shows that there are natural RNA structures that are more complicated than bi-secondary structures. The Escherichia coli α-operon mRNA folds into a structure that is required for allosteric control of translational initiation (Tang and Draper, 1990) . Compensatory mutations have defined an unusual pseudo-knotted structure (Tang and Draper, 1989) , the thermodynamics of which were subsequently investigated in detail (Gluick and Draper, 1994) . The diagram of its contact structure cannot be drawn without intersections, see Fig. 5 . To our knowledge it is the only known RNA structure that cannot be embedded in a 2-page book.

In this subsection we briefly discuss a few graph properties that could be used for a classification of polymer structure complexity beyond the realm of bi-secondary structures. Clearly, one may use its book thickness. A closely related quantity is the chromatic number of the intersection graph: a color partition of a graph is partition V = V 1 ∪ V 2 ∪ · · · ∪ V c of its vertex set into c subsets V i such that no two vertices in V i are adjacent. The chromatic number χ( ) is the smallest number c of colors for which a color partition of can be found.

An arbitrary diagram can be decomposed into substructures by means of the following obvious result: let = ([n], ) be a diagram and let V : = 1 ∪ Figure 5 . Diagram of the contact structure of E. coli α-mRNA. The structure contains five stems, labeled by uppercase Greek letters. We may choose the color partition if ( ) such that all arcs in a stem have the same color. It therefore suffices to draw the inconsistency graph for stems (r.h.s. of the figure) . It contains triangles, thus the diagram of this RNA structure is not a bi-secondary structure. It is easy to check that χ( ( )) = 3.

the inconsistency graph ( ). Noticing that χ( ) = 1 if contains no edges and χ( ) = 2 if is bipartite with non-empty edge set, the following characterization follows immediately:

is a bi-secondary structure iff χ( ( )) ≤ 2.

Clearly, χ ( ( )) equals the minimum number of pages of all book embeddings in which the the ordering of the vertices along the spine coincided with the natural ordering along the backbone. In general, we have bt( ( )) ≤ χ( ( )) for all diagrams. We remark that graphs with moderate chromatic numbers can be characterized by results similar to Kuratowski's theorem for planar graphs. For instance, one can show for k ≤ 4, that a graph with chromatic number χ( ) ≥ k contains a subdivision of the complete graph K k (Dirac, 1952) . The generalization of this proposition to k > 4 is known as Hajós' conjecture. It is false for k ≥ 7 and unsolved for k = 5 and k = 6 (Holton and Sheehan, 1993). It seems that χ( ( )) is in fact the more useful quantity, as there are no efficient algorithms to determine the book-thickness of a given graph, and χ( ( )) accounts for the immutable ordering of the backbone vertices, whereas the book-thickness might decrease by changing this ordering. A quite different algebraic graph invariant µ, introduced by de Verdière (1990), leads to the same classification of structures for small µ: µ = 1 ( ) is a circle, has no arcs. µ ≤ 2 ( ) is outer-planar, is a secondary structure. µ ≤ 3 ( ) is planar, is a bi-secondary structure.

The graphs with µ ≤ 4 have recently been identified as the flat or linklessly embeddable graphs (Lovász and Schrijver, 1996) . A useful characterization of this class of graphs is proved in Robertson et al. (1995, b) : 'A graph is non-flat if and only if it has no minor in the so-called Petersen family'. The graph V * 8 , Fig. 6 , is a valid diagram graph. It is easy to check that V * 8 is flat and that its inconsistency graph is γ δ α β γ δ α β Figure 6 . The graph V * 8 and its inconsistency graph.

(V * 8 ) = K 4 . Hence there are flat diagram graphs for which χ( ( )) ≥ 4. Thus there is no direct correspondence between χ( ( )) and µ, not even for 1-diagrams. . Interpreting each arc {i, j} as a transposition (i, j) on [n] we may assign the permutation

to each diagram . One observes: (i) if a 1-diagram then π( ) is an involution.

(ii) An involution π is the permutation representation of a 1-diagram if and only if its cycle decomposition does not contain a canonical transposition, i.e., a transposition of the form (i, i + 1). (iii) Different 1-diagrams give rise to different involutions. A natural set of generators for the symmetric group S n is the set T of all transpositions. The corresponding length function is

where cyc(π ) is the number of cycles into which π decomposes. We have (τ ) = 1 if and only if τ ∈ T is a transposition. The associated metric is the canonical metric on the Cayley graph (S n , T ), see Reidys and Stadler (1996) for a detailed discussion. As the involutions form a subset of S n we have THEOREM 3. The function

where π( ) denotes the permutation representation of a diagram , is a metric on the set of all 1-diagrams with n vertices.

In particular, two 1-diagrams and have distance d( , ) = 1 if and only if they differ by a single arc. Metrics on 'shape space' are necessary for a detailed quantitative study of sequence-structure maps. Applications to RNA secondary structures are reported for instance in Fontana et al. (1993a) and . (3) is not limited to defining a metric on the set of structures. Suppose we are given an alphabet of monomers (for instance {A, U, G, C} in the case of RNA) and a rule that determines which pairs of monomers may form a base pair (AU, UA, GC, CG, GU, UG in the case of RNA).

The proof of this result in Reidys et al. (1997) is valid for all 1-diagrams, not only for secondary structures. The intersection theorem sets the stage for shape space covering: it allows close-by sequences to fold into structures that are as different as desired -given a suitable folding potential. Further applications of equation (3) can be found in Weber (1997).

The number X n of all diagrams on n vertices is X n = 2 (n−1)(n−2)/2 as there are (n − 1)(n − 2)/2 possible arcs (Söler and Jankowski, 1991), which can be arbitrarily combined to form a diagram.

In Section 2.7 we have shown that all 1-diagrams correspond to involutions, therefore the number T n of involutions on [n] is an upper bound for the number D n of 1-diagrams on [n]. The combinatorics of involutions is discussed for instance in the book by Wilf (1994):

T n = T n−1 + (n − 1)T n−2 n ≥ 2 and T 0 = T 1 = 1 , and has the asymptotic form

The number of involutions T n therefore grows faster than exponential in the sense that n √ T n → ∞. 1-Diagrams can be counted by a very similar recursion as the following result shows:

Proof. The first few values of D n are obvious, D 0 = 1 is a convenient definition. The recursion is derived as follows: a 1-diagram on n + 2 vertices can be formed either by adding a lone vertex to a 1-diagram on n + 1 vertices or by adding an arc {1, k} to a 1-diagram on n vertices by inserting the vertex labeled k between the k − 1st and the kth vertex of . Note, however, that must be a 1-diagram, but in addition it might have an arc {k − 1, k} in , as these vertices are separated by the endpoint of the newly introduced arc in the new structure. Viewing this differently, we may either add the arc {1, k} or the -like structure consisting of the arcs {1, k} and {k − 1, k + 1}, which leaves us with a 1-diagram on n − 2 vertices and the same problem. Repeating this argument we arrive at the following expansion:

Hence we have D n+2 = D n+1 + n D n + (n − 1)D n−2 + (n − 3)D n−4 + · · · . Observing that D n+1 can of course be written in the same form and substituting into the above equations yields D n+2 = (n +1)D n +n D n−1 +(n −1)D n−2 +(n −2)D n−3 +· · ·+2D 1 + D 0 − D n−1 .

Subtracting the corresponding expansion for D n+1 yields

A simple rearrangement now completes the proof. Proof. The series D n is obviously monotonically increasing. Hence the series a n+2 = (n +1)a n , a 0 = a 1 = 1 is a lower bound. It is well known that a n = (n −1)!! grows faster than exponentially.

A very similar formula is obtained for the case of a circular backbone. There are D n−2 diagrams with arc {1, n} on n vertices. Thus the number of 1-diagrams with circular backbone is D n = D n − D n−2 .

An exponential upper bound can be found, however, on the numbers D n (c) of 1-diagrams whose inconsistency graph has chromatic χ( ( )) ≤ c. We find THEOREM 6. D n (c) ≤ (2c + 1) n .

Then there is a color partition of with c colors. As ([n] , i ) is a secondary structure, it can be encoded in dot-parenthesis notation. Coloring the parenthesis with a different color for each class i of the color partition hence yields a unique representation of . This representation can be interpreted as a string of length n over an alphabet consisting of '.' and c different pairs of brackets, i.e., with 2c + 1 letters.

Theorem 6 is not a very good estimate as we shall see in Section 3.3. 

A secondary structure on n+1 digits may be obtained from a structure on n digits either by adding a free end at the right-hand end or by inserting a base pair 1 ≡ (k + 2). In the second case the substructure enclosed by this pair is an arbitrary structure on k digits, and the remaining part of length n − k − 1 is also an arbitrary valid secondary structure. Therefore, we obtain the following recursion formula for the number S n of secondary structures:

This expression has first been derived by Waterman (1978) 

The most important numbers are collected in Table 1 . A more detailed table can be found in Hofacker et al. (1999) . Detailed combinatorial studies on various aspects of secondary structure graphs are based on equation (6), see for instance Penner and Waterman (1993) , Stein and Waterman (1978 ), Waterman (1978 , Smith (1978a, b) and Hofacker et al. (1999) . In the following we shall make use of the number

of secondary structures of length n with k base pairs. This closed formula was recently derived in Schmitt and Waterman (1994).

A first naive upper bound is D n (2) ≤ S 2 n , because on each side of the x-axis we have a secondary structure on n vertices. Theorem 5 implies D n (2) ≤ 5 n . A slightly better bound can be derived using the enumeration of secondary structures:

Proof. We start with the s(n, k) secondary structures with k arcs. In order to produce a bi-secondary structure we use 2l of the n − 2k unpaired positions for introducing l additional arcs. There are n−2k 2l possible choices for these additional pairs, which may form any of the C l = 1 l+1 2l l possible configurations of l matched parentheses. C l is a Catalan number. Without losing generality we may assume that l ≤ k, i.e., the partial secondary structure with the larger number of pairs is drawn above the x-axis. Thus

Replacing the sums by appropriate multiples of the maximum entry is trivial.

Note that this bound is still a gross overestimate: (i) it contains all the redundancy of the ().[]-representation. (ii) The number C l also counts conformations of square brackets of the form [], which do not correspond to a graph at all, and it counts conformations in which not all square brackets are inconsistent with an arc that is represented by a round bracket. These latter configurations are counted more than once. Proof. Let A n (k, l) denote argument of the maximum in Lemma 2. It is straightforward to compute

Solving the optimization problem that defines A is straightforward. A short computation shows thatŷ = 1/ √ 21 andx = (7 − √ 21)/14 is the only local maximum with x, y ≤ 1/2. It violates the condition y ≤ x, however. The solution thus lies on the boundary of the triangle (0, 0), (1/2, 0) and (1/4, 1/4). Setting y = 0 one obtains the maximumx = 1/2 − 1/ √ 20. Along the edge x + y = 1/2 we findŷ = 1/ √ 12 violating the condition y ≤ x. With x = y we arrive at the cubic equation 31x 3 − 31x 2 + 10x − 1 = 0 which has a single real solution x ≈ 0.1942. We find A(x,x) ≈ 1.5605329 = A, because this value is much larger than the values of A(x, y) at the three corners of the triangle. More sophisticated models of RNA take into account that (i) base pairs must enclose at least m = 3 other bases, and (ii) that isolated base pairs are energetically disfavored. In Hofacker et al. (1999) the numbers (m,l) n of secondary structures with stack size at least l base pairs and separation of the vertices incident with an arc at least m is derived. We define (m,l;κ) n to be the number of 1-diagrams with χ ( ( )) ≤ κ and with the same restrictions, and set

Clearly we have (m,l;2) n ≤ [ (m,l) n ] κ because the 1-diagram is a superposition of at most κ secondary structures. In particular, we find the upper bound A (2) 3,2 ≤ 3.418 for the biophysical case.

We have not been able to derive an exact counting series for bi-secondary structures. Hence we resorted to a numerical survey. We pursued three different strategies for estimating the number of bi-secondary structures:

(1) Complete enumeration is feasible only for very small values of n because the number of structures grows faster than 2 n . (2) As an alternative we produce random strings from the alphabet ().[] and check each string if it is the normal form of a bi-secondary structure. The number of secondary structures is then estimated by 5 n × N nf /N sample , where N sample is the size of the random sample and N nf is the number of detected normal forms in the sample. Our best estimates are compiled in Table 2 . In the biologically interesting case, m = 3 and l = 2, we find A (2) 3,2 ≈ 2.35. Judging from the exhaustive enumeration data (Grüner et al., 1996a) we should expect that the number of structures that actually occur as minimum energy structures is still smaller.

In order to incorporate pseudoknots into secondary structure computations we first have to devise an energy model. Naturally, we require that this energy function extends the standard model for RNA secondary structures without pseudo-knots. The standard energy model is based on decomposing a secondary structure into its 'loops' (Zuker and Sankoff, 1984) . For secondary structures without pseudo-knots this decomposition is unique and coincides with the so-called minimum cycle basis of the secondary structure graph (Leydold and Stadler, 1998) . The free energy of a particular secondary structure is computed as the sum of the contributions of the individual loops. These contributions depend on the type of the loop (stacked base pairs, hairpin loop, bulge, interior loop, or multi-branch loop), its size, and on the sequence of nucleotides, see e.g., Walter et al. (1994) .

We emphasize that the energy model for pseudo-knotted structures introduced in this section is not intended as an accurate potential for predicting pseudo-knots in particular (biologically relevant) sequences. It is intended as a simplified model that allows us to investigate the likelihood of pseudo-knots in an ensemble of sequences and the stability of pseudo-knots against point mutations of the sequence. It is shown in Tacker et al. (1996) for (pseudo-knot-free) RNA secondary structures that such statistical properties are surprisingly robust against changes in the parameter set and the choice of the folding algorithm. For instance, most global properties of RNA folding are already present in the 'maximum matching' model, which, instead of an elaborate energy model, simply seeks to maximize the number of base pairs (Tacker et al., 1996) . A potential function that captures the most salient features of pseudo-knots is therefore sufficient for our purposes.

Very little experimental information is available on the thermodynamics of pseudoknots, see, however, Wyatt et al. (1990) . On the other hand, the geometric constraints of RNA structures are well understood (Saenger, 1984; Pleij et al., 1985) . Hence we start from the following three principles:

(i) Loops that are not involved in pseudo-knots have the same energy contributions as in pseudo-knot-free RNA secondary structures. (ii) The stacking energies of base pairs are not affected by pseudo-knot formation even in stems that are part of pseudo-knots. (iii) Steric hindrance is the major contribution to the pseudo-knot energies.

The energy parameters detailed in Walter et al. (1994) , and implemented in release 1.2 of the Vienna RNA Package , are used in this study for . Schematic drawing of an RNA structure with pseudo-knots. The three loops A, B, C and the four stems 1, 2, 3, and 6 are involved in pseudo-knots. The evaluation of loops A and C is straightforward as they contain only a single paired region, namely stack 3. Three stems are contained in loop B; we assume that stack 6 is the longest one. The ν-parameters of the three pseudo-knotted loops are listed on the r.h.s. The energy contributions of base pair stacking and the contributions of all unmarked loops are evaluated according to the standard model. the non-pseudo-knot contributions. The basic idea for parameterizing the pseudoknot contributions rests on two simplifications: (i) RNA stacks are viewed as stiff rods and (ii) unpaired regions are assumed to be very flexible. Within a loop that is involved in pseudo-knot formation, we assume that each of the stacks formed by the pseudo-knotted base pairs is a stiff helix. This reasoning leads to an ansatz based upon the following quantities: u = number of unpaired bases in the loop. L max = number of base pairs in the longest pseudo-knot stack. L i = number of bases in pseudo-knot stack i. K = number of stacked base pairs that can be bridged by one unpaired base.

First we define a measure for the sterical hindrance in the pseudo-knotted loop:

This expression assumes that all other parts of a loop can be used to meet the constraint introduced by the longest stacked region L max within the loop, see Fig. 7 . The free energy contributions of the unpaired regions can be estimated from a theory by Jacobson and Stockmeyer (1950) . The same approach is used for long loops in the standard energy model for RNA secondary structures. If the free energy needed to join the ends of an unrestricted, zero volume polymer is known, the theory predicts the free energy needed to form a similar but larger loop. The minimum length of an RNA loop that behaves according to the Jacobson-Stockmayer theory is not known. We therefore introduce a parameterν and define the energy function as follows:

Our energy model therefore has four free parameters that need to be estimated from the available experimental data, namely K ,ν, E ps and α. For simplicity we fixed α at the same value that is used for all non-pseudo-knotted loops: α = 1078.56 cal mol −1 (at 37 • C).

Given the sequence, one can compute the secondary structure with the minimum energy by means of dynamic programming (Waterman, 1978; Zuker and Sankoff, 1984) . In the presence of pseudo-knots this is no longer true. In the present study we use Tacker's kinetic folding algorithm (Tacker et al., 1996) which is based on (Martinez, 1984) . It first produces a list of all possible stems of a given sequence and then determines the free energies of the loops and stacks. The most stable stem is the first one added to the folding structure. Using this as a constraint, we compile a list of the remaining possible stems and add the most stable one to the growing structure. This procedure is repeated until the free energy of the structure cannot be decreased anymore.

The parameters K ,ν, and E ps are adjusted by predicting the structures of a sample of sequences that are known to form pseudo-knots. This set includes seven fragments with about 80 nt from bacteriophages that form H-type pseudo-knots, E. coli tmRNA containing five pseudo-knots, and RNAse P sequences from several different species [for details see Haslinger (1997) ]. The best results were obtained using K = 4,ν = 9, E ps = 4.2 kcal mol −1 . The same value of E ps was used in Abrahams et al. (1990) . In order to check the influence of these parameters on the sequence-structure relation of RNA we also used a parameter set leading to an unrealistically large number of predicted pseudo-knots in the test sequences (K = 3,ν = 10, E ps = 2.0 kcal mol −1 ).

The average number of base pairs and related statistical properties of the predicted structures depend very little on the inclusion of pseudo-knots and the choice of the pseudo-knot parameters. This is not surprising as the relative stability of base pairs and unpaired regions remains essentially unchanged. The average loop size decreases with the 'unrealistic' pseudo-knot potential because loop regions may take part in pseudo-knots at very little entropic cost. The frequency of pseudo-knots in random sequences is tabulated in Table 3 . For the realistic potential we find a pseudo-knot every ∼2500 bases, while with the exaggerated potential one would expect one pseudo-knot in every random sequence of length n = 148.

As we have seen in the previous section, there are still many more sequences than structures. In order to obtain a better impression of the relationship between the numbers of sequences and structures that arise through folding, we determine the rank order statistics of folded structures. To this end we compute the structures of a large number of randomly chosen sequences and rank them according to their frequency f of occurrence in the sample. A plot of log f versus the logarithm of the rank reveals a generalized Zipf's law (Zipf, 1949) , Fig. 8 . While the inclusion of pseudo-knots somewhat increases the fraction and the diversity of rare structures (large ranks) it does not change the general shape of the distribution. As for 'pure' secondary structures there is only a small number of common structures into which almost all sequences fold. Naturally, we ask how sequences folding into the same (common) secondary structure are distributed in sequence space. We call the set S(ψ) of all sequences (genotypes) folding into phenotype (contact structure) ψ the neutral set of ψ. More precisely, S(ψ) is the pre-image of ψ w.r.t. the folding map algorithm. As for 'pure' secondary structures, a large fraction λ of point mutations is neutral, i.e., does not change the structure. On the other hand, RNA sequences folding into a particular structure are not significantly clustered: they form a percolating network spanning the entire sequence.

The fraction λ of neutral point mutations was estimated from 6000 independently generated random sequences, see Table 4 . As observed in Grüner et al. (1996a, b) , we find that λ decreases somewhat with chain length (the large values for n = 30 being caused in part by the large number of short sequences that 'fold' into the open structure). The fraction of neutral neighbors approaches an asymptotic value slightly above 0.5. Surprisingly, this value is almost independent of the potential function: even a potential leading to a large fraction of pseudo-knotted structures decreases λ only by a few percent. A random graph theory (Reidys et al., 1997; Reidys, 1997) shows that there is threshold value of about λ * = 0.307 (for a 4-letter alphabet). If the fraction of neutral neighbors exceeds this threshold, then the set of all sequences folding into a given structure s forms a single connected network, which has been termed the neutral network of s.

These neutral networks can be conveniently detected by means of a simple computer experiment. A neutral path starts at a randomly chosen sequence. Then we construct a series of subsequent mutants such that each sequence along the path folds into the same structure as the initial sequence, and such that each step increases the Hamming distance from the starting point. The strict logic on base pairing in RNA makes it necessary to consider two types of mutations: (i) point mutations in the unpaired regions of the molecules, and (ii) the substitution of one possible base pair (GC, CG, GU, UG, AU, UA). All other mutations in paired regions necessarily change the structure, for instance by changing a GU pair into a GG mismatch. If there are neutral networks in sequence space the neutral path will reach a length L close to n before there is no neutral mutant further away from the starting point (n is the maximal Hamming distance between sequences of length n). On the other hand, if the neutral sets S(ψ) form isolated clusters we will find L n. When interpreting the lengths of neutral paths we have to keep in mind that (i) the search procedure only produced lower bounds on the diameter of neutral networks, and (ii) that a pair of random sequences has an expected distance of 0.75n for a 4-letter alphabet. The data in Fig. 9 are therefore a clear indication for the existence of percolating neutral networks in the presence of pseudo-knots.

Secondary structures form a particular class of contact structures. In this contribution we have considered a natural generalization of this class. Indeed, most known RNA structures with pseudo-knots are bi-secondary structures (which do not involve nested pseudo-knots). Bi-secondary structures correspond to planar graphs while secondary structures form the sub-class of outer-planar graphs.

The inconsistency graph introduced in Section 2.4 is a useful construction capturing most of the geometrical features of nucleic acid structure. Its chromatic number may serve as a measure of structural complexity. It seems possible that an analogous construction will be useful for classifying and comparing protein structures as well. The analysis of graph-theoretical properties of classes of contact structures might also be useful for designing energy models that are more realistic and/or algorithmically easier than pair potentials. The standard folding potential for RNA and DNA secondary structures, for instance, is based on loops, that is, induced subgraphs of the diagram graph that are circles. The total energy of a secondary structure is defined as the sum of the sequence-dependent energy contributions of all loops [see, e.g., Freier et al. (1986) ]. It is by no means obvious how this energy function should be generalized to include non-secondary structure features such as pseudo-knots, G-quartets, or knots, because in general there is no unique decomposition of a graph into loops.

In order to understand the sequence-structure mapping of a class of biopolymers it is necessary to have bounds on the number of structures that can possibly be formed for a given set of sequences. We can expect the existence of neutral networks and shape space covering only if the number of sequences by far exceeds the number of structures. While the number of possible contact structures grows faster than exponentially with the length of the molecules we find exponential upper bounds when the structural complexity is limited. In particular, there are not more than some 4.7 n possible bi-secondary structures. If we enforce in addition the sterical (looplength at least 3) and thermodynamic (no isolated base pairs) constraints of natural RNA sequences, then this bound drops to 3.42 n . Exhaustive enumeration indicates that the actual number of bi-secondary structures with biophysical constraints grows roughly as 2.35 n . Therefore the number of RNA sequences, 4 n , exceeds by far the number of possible bi-secondary structures.

We have then devised a simple energy function extending the standard model to incorporate pseudo-knots. Our ansatz assumes that steric hindrance is the major contribution to pseudo-knot energies counteracting the stabilizing effect of the additional base pairings. Based on this approach we used a kinetic folding procedure to show that the inclusion of pseudo-knots does not significantly change the global features of the sequence structure map of RNA: there are many more sequences than structures, and almost all sequences fold into one of a small number of common structures. Common structures are uniformly distributed over sequence space.

Neutral networks in sequence space can therefore be modeled as random graphs (Reidys et al., 1997) . This ansatz generalizes from secondary structures to 1-diagrams without modifications. The only input parameter in this model, namely the fraction λ of neutral neighbors, has been determined computationally. Computer simulations agree with the prediction of a random graph theory: the fraction of neutral mutations, λ > 0.5, is well above the threshold value of λ * ≈ 0.306, hence all sequences folding into a given common structure form a single percolating network that spans the entire sequence space. This is verified by the detection of neutral paths that extend through the entire sequence space.

The intersection theorem is valid for bi-secondary structures, hence the random graph approach (Reidys et al., 1997) , can be used to predict the relative locations of the neutral networks of two different common structures. In particular, we have to expect shape space covering, i.e., the neutral networks of any two common structure come very close to each other at least in some parts of the sequence space. This sets the stage for the evolutionary transitions between different structures described in detail in Weber (1997) and Fontana and Schuster (1998) .

In summary, the mathematical results and the computer simulations presented in this contribution indicate that pseudo-knots do not change the qualitative picture of the RNA sequence-structure map as it was obtained from studying secondary structures.

Prediction of RNA secondary structure, including pseudoknotting, by computer simulation

The book thickness of a graph

RNA multi-structure landscapes. A study based on temperature dependent partition functions

Mutational analysis of the RNA pseudoknot component of a coronavirus ribosomal frameshifting signal

Structure and evolution of ribonuclease P RNA

Structure and topology of 16S ribosomal RNA. an analysis of the pattern of psoralen crosslinking

An RNA pseudoknot and an optimal heptameric shift site are required for highly efficient ribosomal frameshifting on a retroviral messenger RNA

Interchain loops in polymers: effects of excluded volume

Planar permutation graphs

Statistical thermodynamics of double-stranded polymer molecules

A-1 ribosomal frameshifting in a doublestranded RNA virus of yeast forms a gag-pol fusion protein

A property of 4-chromatic graphs and some remarks on critical graphs

Queues, stacks, and graphs

Probing the structure of the Escherichia coli 10Sa RNA (tmRNA)

Statistics of landscapes based on free energies, replication and degradation rate constants of RNA secondary structures

Statistics of RNA secondary structures

Continuity in evolution: on the nature of transitions

RNA folding and combinatory landscapes

Similar cage-shaped structures for the RNA component of all ribonuclease P and ribonuclease MRP enzymes

Parallel-stranded duplex DNA containing dA·dU base pairs

Improved free-energy parameters for predictions of RNA duplex stability

Optimal book embeddings of FFT, benes, and barrel shifter networks

Thermodynamics of folding a pseudoknotted mRNA fragment

Analysis of RNA sequence structure maps by exhaustive enumeration. I. Neutral networks

Analysis of RNA sequence structure maps by exhaustive enumeration. II. Structures of neutral networks and shape space covering

Lessons from an evolving rRNA: 16S and 23S rRNA from a comparative perspective

Long-range structure in ribonuclease P RNA

RNA secondary structures with pseudoknots. Master's thesis, Inst. f. Theoretische Chemie

Comparing queues and stacks as mechanisms for laying out graphs

Fast folding and comparison of RNA secondary structures

Combinatorics of RNA secondary structures

Energy directed folding of RNA sequences

The Petersen Graph

Proportions of irreducible diagrams

Exploring phenotype space through neutral evolution

Smoothness within ruggedness: the role of neutrality in adaptation

Intramolecular reaction in polycondensations

Proportions of irreducible diagrams

A comparison of thermodynamic foldings with comparatively derived structures of 16S and 16S-like rRNAs

Sur le problème des courbes gauches en topologie

Minimal cycle bases of outerplanar graphs

Domain structure of the ribozyme from eubacterial ribonuclease P

The Colin de Verdière number of linklessly embeddable graphs

Graphs with e edges have pagenumber o( √ E)

Transfer RNA-like structures: structure, function and evolutionary significance

An RNA folding rule

Modelling of the three-dimmensional architecture of group I catalytic introns based on comparative sequence anaysis

Embedding De Bruijn graphs and shuffle-exchange graphs in five pages

Spaces of RNA secondary structures

A new principle of RNA folding based on pseudoknotting

Random induced subgraphs of generalized n-cubes

Bio-molecular shapes and algebraic structures

Generic properties of combinatory maps: Neural networks of RNA secondary structures

Petersen family minors

Sachs' linkless embedding conjecture

Linear trees and RNA secondary structure

How to search for RNA structures. Theoretical concepts in evolutionary biotechnology

From sequences to shapes and back: A case study in RNA secondary structures

Comparing multiple RNA secondary structures using tree comparisons

Delaunay tessellation of proteins: Four body nearest neighbor propensities of amino acid residues

Modeling RNA secondary structures I. Mathematical structural model of predicting RNA secondary structures

On a class of linked diagrams, I. Enumeration

On a class of linked diagrams

On some new sequences generalizing the Catalan and Motzkin numbers

Characterizations of outerplanar graphs

Algorithm independent properties of RNA structure prediction

An unusual mRNA pseudoknot structure is recognized by a protein translation repressor

Evidence for allosteric coupling between the ribosome and repressor binding sites of a translationally regulated mRNA

Identification and analysis od the pseudoknot-containing gag-pro ribosomal frameshift signal of simian retrovirus-1

RNA pseudoknots and translational frameshifting on retroviral, coronaviral and luteoviral RNAs

Sur une problème de configurations et sur les fractions continues

Ribosomal frameshifting requires a pseudoknot in the saccharomyces cerevisiae double-stranded RNA virus

Sur un novel invariant des graphes et un critère de planarité

Cleavage of tRNA with imidazole and spermine imidazole constructs: a new approach for probing RNA structures

Co-axial stacking of helixes enhances binding of oligoribonucleotides and improves predicions of RNA folding

Secondary structure of single-stranded nucleic acids

Introduction to Computational Biology: Maps, Sequences, and Genomes

Combinatorics of RNA hairpins and cloverleaves

RNA secondary structure: a complete mathematical analysis

Dynamics on Neutral Evolution

RNA pseudoknots. Current Opinion Struct

Evidence that a downstream pseudoknot is required for translational readthrough of the moloney murine leukemia virus gag stop codon

RNA pseudoknots stability and loop size requirements

Embedding planar graphs in four pages

Human Behaviour and the Principle of Least Effort

RNA secondary structures and their prediction

Stimulating discussions with Ivo L. Hofacker and Christoph Flamm are gratefully acknowledged.