key: cord-0005048-umx8mewn authors: Pressing, J.; Reanney, D. C. title: Divided genomes and intrinsic noise date: 1984 journal: J Mol Evol DOI: 10.1007/bf02257374 sha: 614aabc49f42a808dd028adb926450fe0c8403b4 doc_id: 5048 cord_uid: umx8mewn

Segmental genomes (i.e., genomes in which the genetic information is dispersed between two or more discrete molecules) are abundant in RNA viruses, but virtually absent in DNA viruses. It has been suggested that the division of information in RNA viruses expands the pool of variation available to natural selection by providing for the reassortment of modular RNAs from different genetic sources. This explanation is based on the apparent inability of related RNA molecules to undergo the kinds of physical recombination that generate variation among related DNA molecules. In this paper we propose a radically different hypothesis. Self-replicating RNA genomes have an error rate of about $10^{-3}$–$10^{-4}$ substitutions per base per generation, whereas for DNA genomes the corresponding figure is $10^{-9}$–$10^{-11}$. Thus the level of noise in the RNA copier process is five to eight orders of magnitude higher than that in the DNA process. Since a small module of information has a higher chance of passing undamaged through a noisy channel than does a large one, the division of RNA viral information among separate small units increases its overall chances of survival. The selective advantage of genome segmentation is most easily modelled for modular RNAs wrapped up in separate viral coats. If modular RNAs are brought together in a common viral coat, segmentation is advantageous only when interactions among the modular RNAs are selective enough to provide some degree of discrimination against miscopied sequences. This requirement is most clearly met by the reoviruses.

Divided genomes, in which the genetic information is dispersed among two or more physically separate molecules, occur in over 17 groups of RNA viruses (Matthews 1979).
By contrast, only one group of DNA viruses appears to have a divided genome structure (Haber et al. 1981). Perhaps the most popular explanation for the existence of divided genomes is that they allow modular RNAs from related but different clones to exchange sequences by reassortment (Jaspers 1974; Joklik 1974; Nahmias and Reanney 1977; Reijnders 1978; Lane 1979). According to this view divided genomes have been selected for during evolution because they expand the pool of variation in interbreeding populations. The expanded variation model is widely favoured because reassortment has been documented convincingly in many groups of viruses with divided genomes, e.g., the influenza group (Webster and Granoff 1974; Palese and Young 1982), the reoviruses (Ahmed and Fields 1981; Joklik 1981) and the rotaviruses (Greenberg et al. 1982). Mixed infection experiments also support the concept that related viruses can exchange RNA segments: the RNA-3 of cowpea chlorotic mottle virus can substitute for the RNA-3 of brome mosaic virus (Bancroft 1972), and the Q strain of cucumber mosaic virus (CMV) and the V strain of tomato aspermy virus (TAV) form a vigorous hybrid that contains RNA-3 of CMV and RNA-1 and RNA-2 of TAV (Habili and Francki 1974).

Recent developments, however, have cast very substantial doubt on the concept that the raison d'être of divided genomes is to generate diversity for evolution. For one thing this explanation is strongly 'group selectionist', and the group-selection argument is now invoked only as a last resort by biologists (see Maynard-Smith 1978; Rose and Doolittle 1983). Perhaps more to the point, the rate of mutation in RNA viruses such as Qβ is so high that each viable viral genome in a clonally derived population differs from the 'average' sequence of the parental population in one or two positions (Domingo et al. 1978). This appears to be true of all RNA viruses (Holland et al. 1982; Reanney 1984).
Thus the identity of any RNA virus genome in nature is only maintained because selection continually removes the unacceptable variants that are continually generated by the error-prone copier mechanism. In a situation in which the pool of preexisting genetic variation is so high that 'only 14% of the population consists of "wild-type" phage' (i.e., virus) (Domingo et al. 1978) the need for additional variation to be generated by reassortment is not obvious. Other explanations may therefore be sought. Paradoxically, the very thing that damages the credibility of the 'generator of diversity' model, namely the high level of noise in the RNA copier mechanism, provides, in our view, the correct explanation for the widespread occurrence of divided viral genomes.

DNA is usually a double-stranded molecule that replicates in a semiconservative fashion. By contrast, RNA molecules replicate asymmetrically from single strands even if the copying process uses a double-helical template (as in reoviruses). This distinction has the fundamental consequence that lesions in RNA molecules cannot be repaired. This is because known error-correcting mechanisms always use the information specified by one intact strand of a duplex molecule to guide restorative processes on the damaged complementary strand (see Loeb and Kunkel 1982). Because RNA lacks the editing and proofreading functions of DNA, the frequency of mutation in RNA copier systems is between 100,000 and 100,000,000 times greater than that in DNA copier systems (see Kornberg 1980; Holland et al. 1982). The rate of mutation in the ribophage Qβ has been accurately calibrated at $3 \times 10^{-4}$ substitutions per nucleotide per generation (Domingo et al. 1978). Studies in polio, Sendai and influenza viruses on the evolution of variants resistant to monoclonal antibodies suggest that this value is essentially the same in all RNA viruses (Portner et al. 1980; Prabhakar et al. 1982).
However, measurements of mutation frequency are of dubious value unless the temperature at which replication occurs is taken into account, since error rates in RNA replication increase with temperature (Reanney and Pressing 1983). In the absence of repair, errors will accumulate in the genome. An RNA virus that replicates at 37°C may thus transmit considerably more errors to its progeny than one that replicates at 15°C.

Nondividing RNA molecules also can accumulate errors from a variety of sources. Physical agents may damage RNA by deaminating cytosine to uracil (heat) or inducing pyrimidine-to-pyrimidine dimers (ultraviolet radiation), while a chemical agent such as hydrogen peroxide may generate a variety of changes due to its oxidative capacity. Since these premutational lesions cannot be repaired in an RNA system, they may cause a significant, long-term deterioration in the quality of the genetic information encoded in RNA molecules. RNA genomes are also vulnerable to cleavage by RNases, which are abundant in most sites of RNA virus multiplication. Collectively these observations suggest that the level of noise in the RNA information transmission mechanism may be much higher than is generally appreciated.

How have RNA genomes compensated for these hazards? We suggest that genome segmentation is a direct adaptive consequence of the high error burden placed on RNA genes. This suggestion is based on the observation that a small module of information has a higher chance of passing through a noisy channel without damage than does a large one. Essentially our model depends on the fact that all of the agents that induce errors in RNA (including the replicase mechanism itself) do so in a length-dependent manner. To provide a rigorous, quantitative model of the 'protective' effect(s) of segmentation we have compared the 'survival rating' of a divided genome with that of an undivided genome of equivalent length.
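The length dependence invoked here can be illustrated numerically. The following minimal sketch assumes independent errors at a fixed per-nucleotide rate, so that a module of L nucleotides survives one round of copying undamaged with probability (1 - error rate)^L. The function name and the 10,000-nucleotide genome are illustrative assumptions and do not come from the paper; the error rate is the Qβ figure quoted earlier.

```python
def intact_probability(length: int, error_rate: float) -> float:
    """Probability that all `length` nucleotides are copied correctly,
    assuming independent errors at a fixed per-nucleotide rate."""
    return (1.0 - error_rate) ** length


# Per-nucleotide substitution rate quoted in the text for Q-beta.
rna_error_rate = 3e-4

# Hypothetical genome sizes chosen only for illustration.
whole_genome = intact_probability(10_000, rna_error_rate)
half_genome = intact_probability(5_000, rna_error_rate)

print(f"undivided 10,000-nt genome intact: {whole_genome:.4f}")
print(f"single 5,000-nt module intact:     {half_genome:.4f}")
print(f"ratio (module / undivided):        {half_genome / whole_genome:.2f}")
```

Under these assumptions a single half-length module is several times more likely to emerge intact than the undivided genome, while the chance that both halves are intact together equals the undivided value. Whether that difference becomes a selective advantage depends on how modules are packaged and transmitted, which is what the model below addresses.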
A detailed treatment requires consideration not only of the error rate per generation due to copying, but also of the long-term differential survival rate of mutants with respect to the wild type as a result of processes of chemical equilibrium and kinetics. The details of such processes are still poorly known, and we avoid the need to consider them by building our model in the following way: Let infection occur via a population of initially error-free viruses. From this base line we then derive the fidelity of the next generation of viruses for both the divided and undivided genome cases. The quotient of these two fidelities is considered to represent the selective advantage, $K$, of the divided genome strategy in each generation of virus multiplication. Development of the model shows that the protective effects of genome subdivision differ depending on whether the various modular RNAs are united in one capsid (monocompartment viruses) or dispersed among separate capsids (multicompartment viruses).

Consider a simple case where two RNA modules, A and B, are separately replicated and encapsidated. Let $q$ be the mean copying fidelity per nucleotide per generation. The corresponding error rate is then $1 - q$. Copy error evidently makes the largest contribution to $1 - q$, but there is also some deterioration in the quality of the genetic information due to the various factors mentioned earlier. The effects of these latter types of error burden should be comparable and are treated further on in the discussion. Consider first the undivided genome case in which a fraction $r$ of a total of $N$ virus particles enters the cells. If the genome length is $L$ nucleotides the number of correctly replicated viruses in the subsequent generation is just $rNq^L$ and the overall fidelity of the process is

$f = rNq^L/N = rq^L$ (1)

For a comparable divided genome, let the lengths of the two modular RNAs A and B be $L_A$ and $L_B$ nucleotides, with $L_A + L_B = L$.
Of the initial $N_A$ A-modules, $rN_A$ will enter cells. However, these will not be viable in any given cell unless at least one B is also present. Hence the number of viable As will be $rN_A\lambda_B$, where $\lambda_B$ is the overall fraction of cells inoculated with RNA B. These viable As will then produce a next generation of $rN_Aq^{L_A}\lambda_B$ correct As. Similarly, the number of correct Bs produced will be $rN_Bq^{L_B}\lambda_A$. The resulting fidelities of the new generation will differ for A- and B-type modules. To obtain an overall fidelity we reason as follows. The final amount of correct A-type RNA is $rN_Aq^{L_A}\lambda_B L_A$ and that of B-type RNA $rN_Bq^{L_B}\lambda_A L_B$, for a total of $r(N_AL_Aq^{L_A}\lambda_B + N_BL_Bq^{L_B}\lambda_A)$. The initial amount of RNA was $N_AL_A + N_BL_B$. Hence, the overall fidelity is given by

$F = \dfrac{r(N_AL_Aq^{L_A}\lambda_B + N_BL_Bq^{L_B}\lambda_A)}{N_AL_A + N_BL_B}$

and $K$, the selective advantage per generation, by

$K = \dfrac{F}{f} = \dfrac{N_AL_Aq^{L_A}\lambda_B + N_BL_Bq^{L_B}\lambda_A}{q^L(N_AL_A + N_BL_B)}$

This result may readily be generalized to the case of $n$ particles in which each A is viable only if it enters a cell already inoculated with at least one B, one C, etc. We obtain

$K = \dfrac{\sum_{i=1}^{n} N_iL_iq^{L_i}\prod_{j \ne i}\lambda_j}{q^L\sum_{i=1}^{n} N_iL_i}$

where $N_i$ is the initial number of RNA type $i$, $L_i$ is its genome length, and $\lambda_i$ is the overall fraction of cells inoculated with RNA $i$.

[Table footnotes; table body not recovered] Data from Matthews (1979). Some virus groups for which information is lacking or imprecise have not been included. The significant size asymmetry of this group is probably related to their highly elongated tubular structure, with consequent capsid size variation (cf. tobacco mosaic virus). This contrasts with the isometric or polyhedral or, in alfalfa mosaic virus, bacilliform capsid symmetries of the other groups.

Several comments follow from this equation. First, $\lambda_i$ is clearly an increasing function of $N_i$. In fact the most plausible hypothesis, random transmission of particles, may be shown to yield $\lambda_i = 1 - e^{-rN_i/N_H}$, where $N_H$ is the number of host cells. Second, the expression for $K$ may be shown to achieve its maximum value when both numbers and sizes of modules are equally distributed.
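The general expression for K and the random-transmission form of the inoculated fraction can be evaluated directly. The sketch below is a hedged illustration: it takes K as the sum over modules of N_i L_i q^(L_i) times the product of the other modules' inoculated fractions, divided by q^L times the sum of N_i L_i, with each inoculated fraction equal to 1 - exp(-r N_i / N_H). The function names and every numerical value in the example (module lengths, particle numbers, entry fraction, host count) are hypothetical and chosen only to show the calculation.

```python
import math


def fraction_inoculated(r: float, n_particles: float, n_hosts: float) -> float:
    """Fraction of host cells receiving at least one particle of a given
    type, assuming random (Poisson) transmission."""
    return 1.0 - math.exp(-r * n_particles / n_hosts)


def selective_advantage(q, lengths, numbers, r, n_hosts):
    """Selective advantage K of a multicompartment genome relative to an
    undivided genome of the same total length."""
    lam = [fraction_inoculated(r, n_i, n_hosts) for n_i in numbers]
    numerator = 0.0
    for i, (n_i, l_i) in enumerate(zip(numbers, lengths)):
        # Module i is viable only in cells that also received every other module.
        partners = math.prod(lam[j] for j in range(len(lam)) if j != i)
        numerator += n_i * l_i * q ** l_i * partners
    total_rna = sum(n_i * l_i for n_i, l_i in zip(numbers, lengths))
    return numerator / (q ** sum(lengths) * total_rna)


# Hypothetical example: a 6,000-nt genome split two ways.
q = 1.0 - 3e-4
k_equal = selective_advantage(q, [3000, 3000], [5e6, 5e6], r=0.5, n_hosts=1e6)
k_unequal = selective_advantage(q, [4000, 2000], [5e6, 5e6], r=0.5, n_hosts=1e6)
print(f"equal split:   K = {k_equal:.3f}")
print(f"unequal split: K = {k_unequal:.3f}")
```

For these illustrative inputs the equal split gives the larger K, in line with the statement that K is greatest when modules are equally distributed. The example is a single numerical check and not a proof.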
The equal-distribution optimum is shown mathematically in Appendix 1. The prediction of equal size distribution is well supported by existing data on multicompartment viruses (Table 1), considering that other biochemical factors are bound to influence the distribution to some degree. We expect equal number distribution to hold as well, but data on this are not available. Taking equal distribution as given, we may assess the dependence of $K$ on $n$ by writing $L_i \approx L/n$, $N_i \approx N/n$ and $\prod_{j \ne i}\lambda_j \approx \lambda^{n-1}$ to obtain

$K = \lambda^{n-1}q^{[(1/n)-1]L}$ (5)

valid for $n > 1$, where $\lambda$ is the average fraction of cells infected. From this it may be seen that the selective advantage of genome segmentation increases with error rate $(1 - q)$ and genome size $(L)$. Since $q^{[(1/n)-1]L}$ is a slowly increasing function of $n$ (when $n > 1$) for typical $q$ and $L$ values and $\lambda^{n-1}$ is sharply decreasing, $n$ is unlikely to be large (see Fig. 1). The data are in accord with this prediction: as seen in Table 1, $n$ is never greater than 3. As will be seen below, this contrasts sharply with the monocompartment case.

Fig. 1. [… Eq. (5)]. $\lambda = 0.97$; $q^{-L} = 1.10$ (10% errors). The precise range of $n$ within which $K > 1$ is a sensitive function of the infection density $\lambda$, and attention should be directed to the shape of the curve rather than its specific $n$-intercept.

We consider first the undivided genome case. If a population of initially error-free viruses of genome length $L$ nucleotides reproduces inside hosts, the overall fidelity after one generation will be $f = q^L$ and the fraction of incorrect copies, $1 - q^L$. For a comparable divided genome, consider first the simple case in which there are two modular RNAs A and B of length $L_A$ and $L_B$ nucleotides, with $L_A + L_B = L$. The result of the first replication will be to produce a new generation of $N_A$ As and $N_B$ Bs.
We assume here for simplicity that $N_A = N_B$, since (a) this clearly corresponds to the most efficient use of cell material for reproduction and (b) in many cases the mechanism of reproduction is known to be co-operative (e.g., when A codes for the replicase of both A and B and B codes for the protein coat). The fraction of correct copies would then be the same ($q^{L_A} \cdot q^{L_B} = q^L$) as in the undivided genome case if all possible pairings of one A and one B (viz. AB, A*B, AB*, A*B*, where * indicates a miscopied sequence) were equally likely to result in encapsidation. However, genetic and molecular data indicate that this equiprobability is not achieved in nature. For example, the reoviruses contain 10 to 12 modular RNAs (Matthews 1979). These RNAs are assembled in a highly specific manner such that each virus particle normally accumulates the correct quota of the genetic information (Silverstein et al. 1976; Joklik 1981). The molecular basis of this specificity is believed to be a set of selective RNA:RNA interactions and/or RNA:protein interactions (Silverstein et al. 1976) that operate while the RNAs are single stranded (Joklik 1981). Lane (1979) has proposed a co-operative process in which the binding of one RNA during assembly alters a nucleation complex to create a binding site for a second RNA and so on. If the binding of a given RNA is faulty, the subsequent RNAs have a smaller chance of entering the nascent particle. These selective interactions constitute a crude form of molecular proofreading (Reanney 1982), since miscopied RNAs are less likely than well-copied RNAs to recognise sequence-specific elements in complementary RNAs or RNA-binding sites in proteins. This bias towards accurately copied sequences allows us to define a molecular 'discrimination coefficient' $\sigma$ which theoretically may vary between 0 and 1.0. (When $\sigma = 1$, there is no discrimination against erroneous copies, and when $\sigma = 0$, there is complete discrimination.)
It is possible to provide a physical interpretation of $\sigma$ in the following manner. For the process of RNA association and encapsidation the fundamental rate constant, $k$, may be assumed to follow an Arrhenius-type equation

$k = Ce^{-E/kT}$

where $C$ is a constant, $kT$ is the Boltzmann factor, and $E$ is the activation energy of the reaction A + B → AB. Now consider the reaction A + B* → AB* and let the presence of one error in B* increase the activation energy by $\varepsilon$, two errors by $2\varepsilon$ and so on. If the mean number of errors in B* is $m_B$, then the corresponding activation energy is $E + m_B\varepsilon$ and the corresponding reaction constant is $Ce^{-(E + m_B\varepsilon)/kT}$, which can be shown (Appendix 2) to yield

$\sigma_B = e^{-m_B\varepsilon/kT}$ (7)

The result for module A has a similar form. The precise value of $\varepsilon$ is not known, but it may be estimated from the difference in mean free energy ($\Delta G$) of hydrogen bonding of an incorrect base pair (e.g., G-A) relative to a correct pair (e.g., G-C) times the probability that this error is located or expressed in the $s$ sites of specific interaction between the A-type and B-type RNAs. A rough estimate is thus

$\varepsilon \approx \Delta G \cdot s/L_B$

Since an average $\Delta G$ value is reported to be approximately 1.8 kcal (Tinoco et al. 1971), for sample values of $L_B = 1000$ and $s = 30$ we obtain $\varepsilon \approx 0.054$ kcal (see footnote 1). For the well-studied Qβ system, in which $m$ is known to be about 1.5, this yields $\sigma = 0.88$ at 37°C. This value agrees well with the measured reaction-rate ratios between mutant and wild-type forms of this virus (Domingo et al. 1978), which were typically 0.8-0.9, and shows that our model is physically realistic.

Footnote 1. The sample values for $L$ and $s$ have been chosen because they accord with known data. The value of 1000 given for $L$ is a 'rounded off' figure for genome segment 8 of simian rotavirus 11, which has been sequenced and which has a length of 1059 bases (Both et al. 1982). The average length of the 11 modular RNAs of this monocompartment virus is about 1110 bases, according to estimates of genome segment length given in the same reference. The value for $s$ is difficult to estimate because the mechanism of reovirus assembly is poorly understood. The value must be greater than 10, otherwise it would not be possible to assemble 10 to 12 modular RNAs in the same coat (see Lane 1979). If one assumes the basis of this specificity to be a set of RNA:RNA interactions, then the presumed interaction between small nuclear RNAs and the 'consensus' sequences at the exon:intron junction of split genes may provide a model of what happens. The number of nucleotides in this 'consensus' sequence is about 28 (Rogers and Wall 1980). If, as seems more likely, the specificity resides in RNA:protein interactions one can be more confident, because many examples of specific DNA:protein interactions are known in detail. The average of the published values is about 30, the value for $s$ used by us in this paper. There is no reason to believe that the number of nucleotides recognised by a protein would be greatly different if the substrate were single-stranded RNA, because the high degree of secondary structure in known RNAs confines most bases to double-helical regions.

[Table title; table body not recovered] Size of target sequences for some polynucleotide:protein interactions.

[Table footnotes; table body not recovered] (a) Data from Matthews (1979). (b) Names recommended by Matthews (1979) have been used, the suffixes "virus" or "viridae" being omitted for simplicity. (c) Double-stranded RNA virus.

To see the effect of $\sigma$ we first note that the A- and B-type RNAs may be equal or different in size. By appropriate choice of labels we then write $L_A \le L_B$. Since then $q^{L_A} \ge q^{L_B}$, there will either be fewer correct Bs than As or an equal number. In either case the number of correct Bs will limit the number of possible correct encapsidations. Now for the case of random association ($\sigma = 1$) the relative numbers of products would be:

AB: $q^{L_A}q^{L_B}$
A*B: $(1 - q^{L_A})q^{L_B}$
AB*: $q^{L_A}(1 - q^{L_B})$
A*B*: $(1 - q^{L_A})(1 - q^{L_B})$

For $\sigma < 1$ a certain fraction $g(\sigma)$ of the Bs that (for $\sigma = 1$) paired to form A*Bs will now preferentially associate with As that (for $\sigma = 1$) paired to form AB*s. This is the only way additional ABs may be formed. Since the number of A*Bs is less than or equal to the number of AB*s, there will be fewer such Bs than As, or an exactly equal number. Consequently the number of extra ABs formed for $\sigma < 1$ is proportional to $g(\sigma)$: $g(\sigma) \cdot$ (number of A*Bs for $\sigma = 1$) $= g(\sigma)(1 - q^{L_A})q^{L_B}$, and the total fraction of correct ABs is

$F = q^{L_A}q^{L_B} + g(\sigma)(1 - q^{L_A})q^{L_B}$ (9)

whence we find

$K = F/f = 1 + g(\sigma)(q^{-L_A} - 1)$ (10)

where $g$ satisfies the conditions $0 \le g(\sigma) \le 1$ […]. This is because an increase in error rate $(1 - q)$ will increase the mean number of errors per RNA molecule and cause $\sigma$ to decrease. That is, $d\sigma/d(1 - q) < 0$. Therefore, […]

From equation (10) we then obtain the following important conclusions:

1. The selective advantage ($K$) of genome segmentation increases with error rate $(1 - q)$.

2. As $\sigma$ approaches 1, $K$ approaches 1 as expected. As $\sigma$ approaches 0, $K$ approaches $q^{-L_A}$.

3. The overall advantage is cumulative. Even if the $K$ value per generation is small, the effect is multiplied over succeeding generations. The dependence of $K$ on $g(\sigma)$ may be shown in tabular form, where we assume a high error rate of approximately 10% per module so that $q^{-L_A} = 1.10$, as:

g(σ)    K
0.05    1.005
0.5     1.05
0.95    1.095

4. The overall effect of segmentation in monocompartment viruses is to decrease by a factor of as much as 2 the amount of genetic information subject to noise-induced damage, since for $\sigma \to 0$ with $L_A \approx L_B$, $K \to q^{-L/2}$.

Maximum message length is inversely related to the frequency of copy error (Eigen and Schuster 1977). Point (4) therefore suggests that genome subdivision should allow a segmented RNA virus to exceed significantly the maximum information capacity of a continuous RNA genome.
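The numerical values quoted for the monocompartment case can be reproduced with a short calculation. The sketch below is illustrative only and rests on stated assumptions: the energy increment per error is taken as ΔG times s divided by L_B, the discrimination coefficient as exp(-m × increment / kT), and the quoted energies are read as kcal per mole so that kT is evaluated with the molar gas constant. The function names are hypothetical.

```python
import math

# Molar gas constant in kcal mol^-1 K^-1 (assumption: energies are per mole).
GAS_CONSTANT_KCAL = 1.987e-3


def energy_increment(delta_g: float, sites: int, module_length: int) -> float:
    """Rough activation-energy increase per copying error:
    delta_g weighted by the chance that the error falls in an interaction site."""
    return delta_g * sites / module_length


def discrimination_coefficient(mean_errors: float, increment: float,
                               temperature_k: float) -> float:
    """sigma = exp(-m * increment / kT); 1 means no discrimination."""
    return math.exp(-mean_errors * increment / (GAS_CONSTANT_KCAL * temperature_k))


def monocompartment_advantage(g: float, q_pow_minus_la: float) -> float:
    """K = 1 + g(sigma) * (q**-L_A - 1), as in Eq. (10)."""
    return 1.0 + g * (q_pow_minus_la - 1.0)


# Sample values from the text: delta_g = 1.8 kcal, s = 30, L_B = 1000, m = 1.5, 37 C.
increment = energy_increment(1.8, 30, 1000)
sigma = discrimination_coefficient(1.5, increment, 273.15 + 37.0)
print(f"increment per error = {increment:.3f} kcal, sigma = {sigma:.2f}")

# Dependence of K on g(sigma) for q**-L_A = 1.10 (about 10% errors per module).
for g in (0.05, 0.5, 0.95):
    print(f"g = {g:<4}  K = {monocompartment_advantage(g, 1.10):.3f}")
```

With these inputs the sketch returns an increment of 0.054 kcal and a discrimination coefficient that rounds to 0.88, and the loop reproduces the three tabulated K values.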
It is therefore of interest to note that the largest RNA genomes are found among the reoviruses (Table 2), and it can hardly be coincidental that these upper-limit genomes ($12$–$20 \times 10^6$ daltons) are also the most highly divided (10-12 modules per particle). Indeed, a relatively high degree of segmentation for monocompartment viruses is predicted by our model (see below).

We now turn to a second question: Can we predict an optimal number of modular RNAs for monoparticulate viruses for which the above advantage holds true? Consider $n$ RNAs A, B, C, etc., and assume as before that we label the largest segment $L_B$ so that the concentration of Bs is a limiting factor. Let a mean $\sigma$ operate for each RNA association inside the viral particle. The enhancement of correct genomes over incorrect will now entail the factor $g(\sigma)$ (see earlier discussion) for each separate viral association. The fraction of correct copies may then be shown to be: […] where the product is calculated over all but the largest modular RNA ($i \ne B$).

It is not possible to determine unambiguously the optimal size distribution of RNA modules from Eq. (9) for the mathematical reason that the precise variation of $g(\sigma)$ with module size is unknown. If $g(\sigma)$ were independent of module size then $K$ would be maximized for an equal size distribution. It is far more likely that $g(\sigma)$ increases with individual module sizes, since covalent and hydrogen bonding sites sensitive to errors will increase in number as $L_i$ rises [this is readily shown from Eq. (4) of Appendix 2]. In this case the optimal distribution is expected to be asymmetrical; just how asymmetrical cannot be determined without a functional form for $g(\sigma)$. In any case an asymmetrical distribution is strongly preferred in nature for viruses with highly divided genomes, such as the influenza group and the reoviruses (Table 3).
To assess the dependence of $K$ on $n$ it is convenient to simplify the expression using the approximation $q^x \approx 1 + x(q - 1)$ for $q$ values very close to 1. For typical values of $q$ and $L$ involved here, this approximation is quite reasonable. We then find […] Now for fixed $L$, an increase in $n$ will proportionately decrease the average size of the RNA modules. If this holds for RNA B as well, we may write $L_B = fL/n$, so that […] where $f$ is a constant > 1. Hence, the important conclusion emerges that the greater $n$ is, the greater $K$ becomes (Fig. 2). In reality, various counteracting factors would come into play at sufficiently large values of $n$. There could, for example, be some variation of the mean $\sigma$ with $n$ due to conformational co-operativity, which would be expected to cause $K$ to increase less rapidly for larger $n$. In any case, we predict that $n$ values in monocompartment viruses should range to much larger values than in multicompartment viruses. This is as observed (Table 2).

One of the basic theorems of physics states that information cannot be transmitted over long periods of time without experiencing some deterioration in quality due to 'noise' in the copier process (see Shannon 1949). The error rate in the 'first' genes in prebiotic systems can be determined by measuring the rate at which non-complementary bases are incorporated into single-stranded RNA in the absence of enzymes. This value is about $10^{-1}$–$10^{-2}$ substitutions per nucleotide per doubling (Inoue and Orgel 1983). By contrast, in a present-day 'high-fidelity' system based on duplex DNA, the error rate can be as low as $10^{-…}$ (Drake 1974). Thus, during the course of evolution, noise levels in genetic replication have dropped by a factor of 100 million or so (Reanney 1984). This reduction has been brought about by the development of such corrective mechanisms as proofreading and mismatch repair. RNA cannot use any of these restorative processes, so RNA viral genomes are the only (surviving?)
genetic systems in which the error rate remains at the level ($3 \times 10^{-4}$) characteristic of unrepaired, enzymatically catalysed nucleic acid synthesis. This presents RNA genomes with a set of unique survival problems (Reanney 1982) and limits the amount of information that can be encoded in any single uninterrupted RNA molecule to about 23,000 nucleotides (Eigen and Schuster 1977; Reanney 1984). The premise of this article is that genome segmentation compensates for this high error level by dividing the genetic data among smaller subunits, thus presenting a lesser target size to the various error-promoting agents. The protection offered by segmentation thus applies not only to the deleterious effects of error-prone replicases (the chief source of error) but also to damage by physical agents such as heat.

Most metabolically active cells contain large numbers of RNase enzymes to maintain a rapid rate of mRNA turnover. Since a single endonucleolytic cut normally destroys infectivity, RNases pose a fundamental problem for the intracellular survival of RNA genomes. Most single-strand RNAs appear to minimise this danger by folding into compact, largely double-helical formats [e.g., the flower arrangement for the coat protein gene of ribophage MS2 (Min Jou et al. 1972)] that are relatively resistant to the action of most RNases, whose preference is for single-stranded sites. It is possible that the fully base-paired, double-helical character of segmental reoviral RNAs is a consequence of the need to protect the large amount of information in the reoviral genome from enzymatic degradation. One must remember here that duplex RNA is not functionally equivalent to duplex DNA, since the latter can be unwound, whereas the former cannot (Reanney 1982). RNases may also favour evolution towards small segmental genomes because the probability of a single cut being introduced into an RNA molecule should increase monotonically with length.
(In fact, one can show mathematically from a simple Brownian-motion model that the RNA degradation rate should vary with RNA length as $L^{1/…}$.) It may also be true that the chemical lability of RNA molecules at physiological pH favours genome segmentation, since the greater the number of phosphodiester bonds (i.e., the longer the molecule), the greater the chance of hydrolytic self-destruction (Reanney 1984). Thus, information divided between two (or more) RNA modules may have a significantly longer half-life in an RNase-rich, alkaline environment than would a continuous RNA of equivalent size.

The model developed in this paper shows that segmentation has followed two evolutionary routes which appear to be quite unrelated. Where modular RNAs are partitioned among discrete particles (multicompartment viruses) segmentation per se has a significant protective effect provided (a) the transmission of particles remains independent and random and (b) effective mechanisms exist for the host-to-host passage of viral genes. Point (a) is certainly true of natural viruses and point (b) is probably true of viruses that have insect vectors or that infect hosts (e.g., plants) that are grouped closely together in a common habitat (Nahmias and Reanney 1977). However, this kind of arrangement has the disadvantage that two or more particles must jointly infect a common cell if infection is to be successful. This limits the number of modular RNAs to two or three (Table 1 and Fig. 2). The sizes of the modular RNAs are also curtailed by this requirement, as the chances of successful coinfection are proportional to the number of available progeny viruses, which is in turn a function of genome lengths (the smaller the composite genome, the larger the number of RNAs that can be generated from a fixed pool of precursor elements). Thus, it is not surprising that multicompartment viruses in general have relatively small aggregate genome sizes.
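The limit of two or three modules for multicompartment viruses can be checked against Eq. (5), which for equally distributed modules gives K as λ^(n-1) multiplied by q raised to the power (1/n - 1)L. The sketch below is a minimal illustration using the parameter values stated for Fig. 1 (λ = 0.97 and q^(-L) = 1.10); the function name and the range of n scanned are assumptions made for the example.

```python
def multicompartment_advantage(n: int, lam: float, q_pow_minus_l: float) -> float:
    """Eq. (5): K = lam**(n-1) * q**((1/n - 1) * L), written in terms of
    q**-L so that q and L need not be specified separately. Valid for n > 1."""
    return lam ** (n - 1) * q_pow_minus_l ** (1.0 - 1.0 / n)


# Fig. 1 parameter values from the text.
advantages = {n: multicompartment_advantage(n, 0.97, 1.10) for n in range(2, 7)}

for n, k in advantages.items():
    verdict = "advantageous" if k > 1.0 else "disadvantageous"
    print(f"n = {n}: K = {k:.4f} ({verdict})")
```

For these values K exceeds 1 only for n = 2 and n = 3 and falls below 1 from n = 4 onward. As the Fig. 1 caption cautions, the exact crossover is sensitive to λ, so the shape of the trend matters more than the specific intercept.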
The problems associated with a multihit infection process may also explain why many RNA viruses have not adopted the divided genome strategy. The second evolutionary route was followed by those viruses whose modular RNAs were able to combine in a sequence-specific manner. Such specific interactions, if spread over a large enough number of nucleotides and/or stages, provide for some degree of discrimination against miscopied information. The improved overall fidelity that can be achieved by this mechanism allows the amount of genetic information that can be encoded in RNA genomes to expand significantly. Only a few groups of RNA viruses display this feature. Chief among these are the reoviruses, which have both the largest and the most highly divided of all RNA genomes. Is the model advanced in this paper sufficient to explain the abundance of divided genomes in RNA viruses? Lane (1979) lists four possible advantages of genome segmentation: (a) increased genetic flexibility; (b) more efficient packaging; (c) more efficient control of translation; and (d) increased resistance to inactivation by such environmental agents as ultraviolet radiation (UV). The argument for increased flexibility is essentially that reassortment combines genetic information from different sources and so provides an RNA version of the 'hybrid vigour' seen in higher systems. However, the idea that divided genomes have been selected for because they enhance genetic variation (see Joklik 1974) seems paradoxical in a situation in which the amount of inherent genetic variation is so great that the genome can only be defined in a probabilistic sense (Domingo et al. 1978; Reanney 1982) . Thus, while there is no doubt that reassortment among modular RNAs occurs in nature, the 'expanded variation' model seems inadequate to explain the genesis and maintenance of divided genomes among so many groups of RNA viruses. 
Point (b) (more efficient packaging) is suspect because it would apply to single-stranded DNA as well as single-stranded RNA, and any credible theory of the origin of divided genomes must explain why these structures are virtually confined to RNA (as opposed to DNA) viruses. Point (d), increased resistance to UV, supports the general argument of this paper, since UV-induced lesions in DNA viruses can be repaired, whereas those in RNA viruses cannot. This leaves point (c), more effective control of translation. Eukaryotic cells, unlike prokaryotes, transcribe their genes into monocistronic messenger RNAs. Jaspers (1974) has suggested that the replicative strategies of most groups of RNA viruses can be rationalised by assuming that they represent attempts to accommodate polycistronic RNAs to a biochemical environment tailored to process only monocistronic mRNAs (for a discussion see Reanney 1982). On this basis segmentation has the striking advantage that it divides RNA viral information into small units that closely resemble cellular mRNAs. There is no doubt, in our view, that this argument is correct, as far as it goes. However, it cannot be put forward as a general or unitary explanation of the divided genome phenomenon because other strategies open to and adopted by RNA viruses also enable RNA genomes to survive in nature. Thus, some RNA viruses, e.g., the polio group, overcome the problem of translation by translating their polycistronic genomes into long polyproteins which are then cleaved into specific, functional peptides. Other viruses, e.g., the influenza group, generate monocistronic RNAs by transcription from continuous, negative-strand genomes using discrete initiation and termination signals. Yet another mechanism has been adopted by the coronaviruses, and so on (for a discussion of these various points, see Reanney 1982).
But perhaps the most telling argument against the 'monocistronic message' concept as a general explanation for RNA genome segmentation is the presence of RNA viruses with divided genomes in prokaryotes, since, as stated, prokaryote messengers are polycistronic, not monocistronic. In the context of a bacterial cell the segmental RNA genomes of the cystoviruses (see Matthews 1979) seem very much out of place, suggesting that, at least in this instance, a different explanation of the divided genome phenomenon must be sought.
In summary, we believe the various selective advantages of genome segmentation proposed to date fail, either singly or together, to take account of what we believe to be the chief guiding influence on the evolution of RNA as opposed to DNA viruses, namely the $10^5$- to $10^8$-fold greater error rate of RNA replication compared with DNA replication. Although factors such as the need to adapt eukaryotic RNA viral genomes to the unit-message character of higher cells may have played a part in the tendency of RNA genomes to split into separate modules, any explanation that does not recognise the critical importance of genetic noise for the divided genome phenomenon is at best only a partial answer, and at worst, a misleading one. We suggest that the current theories be revised to accommodate the model presented in this paper.
[...] so that the activation energy for this reaction is $E + m_B\varepsilon$, where $m_B$ is the mean number of errors in B*. The reaction constant for this second reaction is then $C e^{-(E + m_B\varepsilon)/RT}$, which by comparison with Eq. (1b) yields
$\sigma_B = e^{-m_B\varepsilon/RT}$ (4)
and similarly for $\sigma_A$. To solve Eqs. (1) and (2) [...] so that as $t \to \infty$, and dropping concentration brackets, [...]. For $\sigma_A = \sigma_B = 1$, and since $A_0^* = B_0^*$ to the accuracy of the high error approximation, $AB_\infty \approx$ [...], which is the statistically correct result.
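The effect of the discrimination factors on the yield of error-free complexes can be explored numerically. The following is a minimal sketch, assuming that the four pairing reactions A + B, A + B*, A* + B and A* + B* are irreversible and second order, with rate constants k, k·sigma_B, k·sigma_A and k·sigma_A·sigma_B respectively. This rate assignment, the use of the gas constant R in the Arrhenius exponent, the explicit Euler integration, the function names and all numerical values (a 1:9 ratio of correct to miscopied modules, a penalty of 5700 J/mol per error, 298 K) are assumptions made for illustration, not taken from the paper.

```python
import math

R_GAS = 8.314  # gas constant in J/(mol K); its use here is an assumption


def discrimination_factor(mean_errors, penalty_per_error, temperature):
    """sigma = exp(-m * eps / (R * T)), with eps in J/mol and T in kelvin."""
    return math.exp(-mean_errors * penalty_per_error / (R_GAS * temperature))


def error_free_complexes(a0, a0_err, b0, b0_err, sigma_a, sigma_b,
                         k=1.0, eta=0.01, tol=1e-6, max_steps=1_000_000):
    """Integrate the assumed pairing kinetics with explicit Euler steps.

    Returns the amount of AB (both modules error-free) formed by the time
    the free pool of error-free A or B has fallen below `tol`.
    """
    a, a_err, b, b_err = a0, a0_err, b0, b0_err
    ab = 0.0
    for _ in range(max_steps):
        if a < tol or b < tol:
            break
        # Step size scaled so that no pool loses more than a fraction eta per step.
        dt = eta / (k * (a + a_err + b + b_err))
        r_ab = k * a * b * dt                               # A + B   -> AB
        r_ab_err = k * sigma_b * a * b_err * dt             # A + B*  -> AB*
        r_a_err_b = k * sigma_a * a_err * b * dt            # A* + B  -> A*B
        r_both = k * sigma_a * sigma_b * a_err * b_err * dt  # A* + B* -> A*B*
        a -= r_ab + r_ab_err
        b -= r_ab + r_a_err_b
        a_err -= r_a_err_b + r_both
        b_err -= r_ab_err + r_both
        ab += r_ab
    return ab


sigma = discrimination_factor(mean_errors=1.0, penalty_per_error=5700.0,
                              temperature=298.0)
random_pairing = error_free_complexes(1.0, 9.0, 1.0, 9.0, 1.0, 1.0)
selective_pairing = error_free_complexes(1.0, 9.0, 1.0, 9.0, sigma, sigma)
print(f"sigma = {sigma:.3f}")
print(f"AB formed without discrimination: {random_pairing:.3f}")
print(f"AB formed with discrimination:    {selective_pairing:.3f}")
```

With sigma_A = sigma_B = 1 the sketch reproduces the yield expected from random pairing of the two pools, and lowering sigma raises the yield of error-free complexes, in line with the qualitative statement that the advantage grows as discrimination becomes stronger.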
For $A_0^* = B_0^*$ and $\sigma_A \approx \sigma_B$, we may write [...]. This formula is readily generalized to an n-module system by the consideration of successive reactions $A + B \to AB$, $AB + C \to ABC$, $ABC + D \to ABCD$, ..., with the result
$F =$ [...] (12)
so that the selective advantage $K = F/q^L$ is
$K =$ [...] (13)
a monotonically increasing function of $n$ and monotonically decreasing function of $\sigma$. Steps in the approximate solution disallow the limit $\sigma \to 0$.
Reassortment of genome segments between reovirus defective interfering particles and infectious virus: construction of temperature sensitive and attenuated viruses by rescue of mutations from DI particles
A virus made from parts of the genomes of brome mosaic and cowpea chlorotic mottle viruses
A general strategy for cloning double-stranded RNA: nucleotide sequence of the Simian-11 rotavirus gene
Characterisation of the RNA of fowl plague virus
Nucleotide sequence heterogeneity of an RNA phage population
The role of mutation in microbial evolution
The hypercycle. A principle of natural self-organization. Part A: Emergence of the hypercycle
Rescue and serotypic characterisation of noncultivable human rotavirus by gene reassortment
Evidence for a divided genome in bean golden mosaic virus, a geminivirus
Comparative studies on tomato aspermy and cucumber mosaic viruses. III. Further studies on the relationship and construction of a virus from parts of the two viral genomes
Vande Pol S (1982) Rapid evolution of RNA genomes
A non-enzymatic RNA polymerase model
Plant viruses with a multipartite genome
DNA replication. WH Freeman and Co
The RNAs of multipartite and satellite viruses of plants
Fidelity of DNA synthesis
Classification and nomenclature of viruses
The evolution of sex
Nucleotide sequences of the gene coding for the bacteriophage MS2 coat protein
The evolution of viruses
Differences in RNA patterns of influenza A viruses
Variation of influenza A, B and C viruses
Similar frequencies of antigenic variants in Sendai, vesicular stomatitis and influenza A viruses
High frequency of antigenic variants among naturally occurring human Coxsackie B4 virus isolates identified by monoclonal antibodies
The evolution of RNA viruses
Heat as a determinative factor in the evolution of genetic systems
The molecular evolution of RNA viruses
The origin of multicomponent small ribonucleoprotein viruses
A mechanism for RNA splicing
Parasitic DNA--the origin of species and sex
The mathematical theory of communication
Separation of ten reovirus genome segments by polyacrylamide gel electrophoresis
The reovirus replicative cycle
Estimation of secondary structure in ribonucleic acids
(eds) Viruses, evolution and cancer
We seek the values of $L_i$ that will maximize the function $K =$ [...]. Keeping only first order terms in $\delta$, and using the excellent approximation $q^{L_i+\delta} = q^{L_i}[1 - \delta(1 - q)]$, substitution yields [...] valid for all $j, k$. This gives a rather complex set of relations between the $L_i$, $N_i$ and $q$. An additional set may be obtained by setting $\delta K = 0$ for variations in the $N_i$. However, the biology of the situation allows an appropriate simplification. Any real virus system must cope with a range of values of $q$, since daily or seasonal temperature changes may affect $q$ considerably, as may factors such as changing levels of ultraviolet radiation. The most generally valid solution to Eq.
(4) will then be one that treats $q$ as an independent variable and sets all individual coefficients of $q^{L_i}$ identically equal to zero, since the $q^{L_i}$ are independent functions (unless the $L_i$ are equal, which then gives our final result immediately). The only terms of the form $q^{L_i}$ for $i \neq j, k$ come from the right-hand side of Eq. (4), and their coefficients can only be zero if $N_j = N_k$, i.e., $N_i = N/n$ for all $i$. In this case all $\lambda_i$ are equal, and there is no solution for the coefficients of $q^{L_j}$ and $q^{L_k}$ unless $L_j = L_k = L/n$, as the reader may readily verify.
[...] (1d)
where $k$ is the basic reaction constant and the $\sigma$s indicate mean reductions of reaction rate for copies with errors. We have assumed second-order kinetics. In addition, there are four equations of material conservation: [...]
It is possible to provide a straightforward interpretation of the $\sigma$s as follows. The rate constant $k$ of Eq. (1a) may be assumed to follow an Arrhenius-type equation
$k = C e^{-E/RT}$ (3)
where $C$ is a constant and $E$ is the activation energy of the reaction $A + B \to AB$. Consider now the reaction $A + B^* \to AB^*$. Let the presence of each error in B* increase the activation energy by $\varepsilon$,