key: cord-022128-r8el8nqm
authors: Domingo, Esteban
title: Molecular basis of genetic variation of viruses: error-prone replication
date: 2019-11-08
journal: Virus as Populations
DOI: 10.1016/b978-0-12-816331-3.00002-7
sha: 
doc_id: 22128
cord_uid: r8el8nqm

Genetic variation is a necessity of all biological systems. Viruses use all known mechanisms of variation: mutation, several forms of recombination, and segment reassortment in the case of viruses with a segmented genome. These processes are intimately connected with the replicative machineries of viruses, as well as with fundamental physical-chemical properties of nucleotides when acting as template or substrate residues. Recombination has been viewed as a means to rescue viable genomes from unfit parents or to produce large modifications for the exploration of phenotypic novelty. All types of genetic variation can act conjointly as blind processes to provide the raw materials for adaptation to the changing environments in which viruses must replicate. A distinction is made between mechanistically unavoidable and evolutionarily relevant mutation and recombination.

Genetic change was a prerequisite for the early life forms to be generated and maintained (Chapter 1), and it is also a requirement for the evolution of present-day life. We may willingly or inadvertently modify selective pressures, but genetic change is rooted in all replication machineries. The results of genetic modifications, regarding relative dominances of variant forms, are guided by selective pressures and random events. The replicative machinery itself has probably been influenced by natural selection; as an example, polymerases devoid of a capacity to generate variants should have endured a long-term selective disadvantage. However, once the replicative machinery was established, the mechanisms of variation acted independently of the selective pressures applied or to come. Viruses use the same molecular mechanisms of genetic variation as other forms of life: mutation (which encompasses point mutations and insertions-deletions of different lengths), hypermutation, several types of recombination, and genome segment reassortment. Mutation is observed in all viruses, with no known exceptions. Recombination is also widespread, but its role in the generation of diversity appears to vary among viruses. Its occurrence was soon accepted for DNA viruses, but it was considered uncertain for the RNA viruses. Pioneering studies of poliovirus (PV) by P. D. Cooper, V. I. Agol and colleagues, and of foot-and-mouth disease virus (FMDV) by A. M. King and colleagues provided the first evidence of recombination in RNA. The present perception is that recombination is more widespread than thought only a few decades ago and that its frequency and the types of genomic forms it generates vary among viruses. For example, it appears that positive-strand RNA viruses recombine more easily than negative-strand RNA viruses to give rise to mosaic genomes of standard length.
Several negative-strand RNA viruses, however, can yield defective genomes through recombination, frequently characterized by deletions in their RNA. A connection between the structure of replication complexes (as viewed by X-ray diffraction or high-resolution cryo-electron microscopy) and the propensity to produce defective genomes has not been established. Defective genomes are increasingly perceived not only as unavoidable side-products of blind replicative imperfections but as classes of genome subpopulations that perform relevant biological roles for the standard, infectious viruses. Genome segment reassortment, a type of variation close to chromosomal exchanges in sexual reproduction, is an adaptive asset of segmented viral genomes, as continuously evidenced by the ongoing evolution of the influenza viruses. The three modes of virus genome variation are not incompatible, and reassortant-recombinant-mutant genomes are continuously arising in present-day viruses. The potential for genetic variation of RNA and DNA viral genomes is remarkable, and it is the ultimate molecular mechanism that lies at the origin of the virus diversity delineated in Chapter 1.

Mutation is a localized alteration of a nucleotide residue in a nucleic acid. It generally refers to an inheritable modification of the genetic material. In the case of viral genomes, mutations can result from different mechanisms: (i) template miscopying (direct incorporation of an incorrect nucleotide); (ii) primer-template misalignments that include miscoding followed by realignment, and misalignment of the template relative to the growing chain (polymerase "slippage" or "stuttering"); (iii) activity of cellular enzymes (e.g., deaminases); or (iv) chemical damage to the viral nucleic acids (deamination, depurination, depyrimidination, reactions with oxygen radicals, direct and indirect effects of ionizing radiation, photochemical reactions, etc.) (Naegeli, 1997; Bloomfield et al., 2000; Friedberg et al., 2006).

The basis of nucleotide misincorporation during template copying (defined as the incorporation of a nucleotide different from that expected from the template residue at that position) lies mainly in the electronic structure of the bases that make up DNA (adenine, A; guanine, G; cytosine, C; thymine, T) or RNA (with uracil, U, instead of T). Each base includes potential hydrogen-bonding donor sites (amino or imino protons) and hydrogen-bonding acceptor sites (carbonyl oxygens or aromatic nitrogens) that contribute to standard Watson-Crick base pairs (Fig. 2.1), as well as wobble base pairs (nonstandard Watson-Crick, but fundamental for RNA secondary structure and mRNA translation) (Fig. 2.2). The conformation of the purine and pyrimidine bases is highly dynamic. Amino and methyl groups rotate about the bonds that link them to the ring structure. In dilute solution, hydrogen bonds are established with water, and they can be displaced by nucleotide or amino acid residues to give rise to nucleotide-nucleotide or nucleotide-amino acid interactions. The difference in strength between the hydrogen bonds that a polynucleotide chain establishes with water and those formed between two bases in separate polynucleotide chains determines whether a double-stranded polynucleotide will be formed.

Purine and pyrimidine bases can acquire different charge distributions and ionization states. As a consequence, in addition to the standard Watson-Crick and wobble pairs, other base pairs are found in naturally occurring nucleic acids (notably cellular rRNA and tRNA) and in synthetic oligonucleotides (A-U or A-T Hoogsteen, and A-G, C-U, G-G, and U-U pairs, as well as interactions involving ionized bases). One of the types of electronic redistribution leads to tautomeric changes, such as the keto-enol and amino-imino transitions, which modify the hydrogen-bonding properties of the base; tautomeric imino and enol forms of the standard bases can produce non-Watson-Crick pairs. The proportion of the alternative tautomeric forms can be influenced by modifications in the purine and pyrimidine rings, which, in turn, can favor either the syn or anti conformation of a nucleoside, which is defined by the torsion angle of the bond between the 1′ carbon of the ribose and either N1 in pyrimidines or N9 in purines (Fig. 2.1). The anti conformation is usually the most stable in standard nucleotides and polynucleotides. The transition from the anti to the syn conformation may alter the hydrogen-bonding properties of the base, thereby inducing mutagenesis (Bloomfield et al., 2000; Suzuki et al., 2006). The understanding of conformational effects on the base-pairing tendencies of nucleoside analogs in the context of the active site of a polymerase is very relevant to the design of specific mutagenic analogs for viral polymerases in lethal mutagenesis-centered antiviral approaches (Chapter 9).

FIGURE 2.1 (caption partially recovered) [...] and G-C, with ribose as pentose. Phosphodiester bonds of two potential polynucleotide chains of different polarity (outer arrows) are indicated.

Base-base interactions are responsible not only for part of the mutations that occur during genome replication, but also for the formation of double-stranded nucleic acids, either within the same polynucleotide chain or between two different chains. Transitions from a coil-like into an organized double-stranded (or other) structure are functionally relevant for both RNA and DNA. In the case of RNA, double-stranded regions that alternate appropriately with single-stranded regions determine key catalytic or macromolecule-attracting abilities, for example, ribozyme activities (Chapter 1), the internal ribosome entry site (IRES) of several viral and cellular mRNAs, or multitudes of functional RNA-protein interactions (Denny and Greenleaf, 2018). Adjacent base stacking due to electronic interactions (rather than hydrophobic bonds as once thought) also contributes to the stability of double-helical regions in nucleic acid molecules. Structural transitions due to alternative stacking conformations, particularly within polypurine or polypyrimidine tracts, can affect nucleic acid-protein interactions. In turn, replication machineries (typically including viral and host proteins gathered in membrane structures) may also be affected by nucleic acid conformations; such effects are important in virology regarding consequences for mutant generation in a given template sequence context. These considerations on structural transitions are relevant to the nonneutral character of silent (also termed synonymous) mutations (those in open-reading frames that do not result in an amino acid substitution), a point to be addressed in the next section. Transitions from a single-stranded into a double-stranded nucleic acid structure and the relative stability of the two forms depend on multiple factors that include the nucleotide sequence of the nucleic acid, whether it is a ribo- or a deoxyribopolynucleotide, temperature, ionic environment, and ionic strength.
Positively charged counterions neutralize negatively charged phosphates and favor duplex stability [for an overview of the physical and chemical properties of nucleic acids and their nucleotide components, see (Bloomfield et al., 2000)].

FIGURE 2.2 Examples of a class of non-Watson-Crick base pairs termed wobble base pairs. The drawing is similar to that of Fig. 2.1, except that the sugar residues and phosphodiester bonds have been omitted. Hydrogen bonds (discontinuous lines in red) are shown between I (inosine) and C, U, and A, and between G and U. Wobble base pairs are important for codon-anticodon interactions, as described in the text.

Mutations resulting from any of the mechanisms just summarized can be divided into transitions, transversions (both referred to as point mutations), and insertions and deletions (referred to as indels) (Fig. 2.3). The latter occur preferentially at homopolymeric tracts and also at short, repeated sequences, which are prone to misalignment mutagenesis (Fig. 2.4). An example is an editing mechanism for some viral mRNAs, such as the phosphoprotein mRNA of the Paramyxovirinae [ (Kolakofsky et al., 2005)

Other examples in vivo are hot spots for variation in reiterated sequences in complex DNA genomes (Yamaguchi et al., 1998; Barrett and McFadden, 2008; McGeoch et al., 2008), or the insertion of two amino acids (often Ser-Ser, Ser-Gly, or Ser-Ala) between residues 69 and 70 of the HIV-1 reverse transcriptase, in concert with HIV-1 resistance to nucleoside inhibitors (Winters and Merigan, 2005) (Chapter 8). Hairpin structures in RNA and DNA may also induce deletions as a result of slippage mutagenesis (Pathak and Temin, 1992; Viguera et al., 2001). Transition mutations occur more frequently than either transversions or indels during virus replication. Nucleotide discrimination at the catalytic site of viral polymerases fits this observation, because a purine or pyrimidine nucleotide is more likely to be replaced by its structurally more similar nucleotide. In some cases, however, an abundance of indels and similar numbers of transitions and transversions have been recorded (Cheynier et al., 2001; Malpica et al., 2002). The molecular bases of such unexpected behavior regarding mutational spectra are not well understood.
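The distinction between transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine, or the reverse) can be made concrete with a short sketch. The function names `classify_point_mutation` and `ts_tv_counts` are hypothetical and introduced only for illustration; the code assumes two aligned sequences of equal length without gaps, so indels are outside its scope.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U", "T"}


def classify_point_mutation(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    for base in (ref, alt):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"unknown base: {base!r}")
    if ref == alt:
        raise ValueError("ref and alt are identical; this is not a mutation")
    # A transition keeps the chemical class (purine or pyrimidine) of the base.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def ts_tv_counts(seq_a, seq_b):
    """Count (transitions, transversions) between two aligned, gap-free sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            continue
        if classify_point_mutation(a, b) == "transition":
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions
```

Applied to a mutant spectrum and its consensus sequence, counts of this kind give the transition to transversion proportion for that sample; the sketch says nothing about why a given replication machinery favors one class over the other.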

The generation of point mutations and indels is subject to thermodynamic and quantum-mechanical uncertainties inherent to atomic fluctuations, rendering mutagenesis a highly unpredictable event, thus introducing stochasticity (randomness) in a key motor of evolution: the generation of diversity at the molecular level (Domingo et al., 1995; Eigen, 2013).

FIGURE 2.3 (caption partially recovered) [...] indicate point mutations, insertions or deletions (known as indels). A genome is depicted as an elongated rod. Symbols on the rod (cross, circle, and line) represent mutations. Hypermutation is generally associated with a high frequency of specific mutation types (crosses and lines). A region inserted or deleted from the genome is depicted as an empty rod.

The effect of mutations on the structure and function of proteins is highly relevant to understanding the mechanisms that drive virus evolution, since selection acts on phenotypes that are often embodied in protein molecules. Silent or synonymous mutations are those that do not give rise to an amino acid substitution despite being located in a protein-coding region of a genome. Their occurrence is due to the degeneracy of the genetic code: the same amino acid can be coded for by two or more triplets (codons), with the exception of AUG for methionine and UGG for tryptophan. Synonymous mutations are not necessarily selectively neutral, neutral meaning that they have no discernible consequence for any viral function.
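The degeneracy of the genetic code, and the resulting distinction between synonymous and nonsynonymous changes, can be sketched as follows. The code builds the standard genetic code (RNA alphabet) from the usual compact 64-letter listing; the function names `degeneracy` and `classify_codon_change` are hypothetical, and the separate label for changes that create a stop codon is an added convenience, not a category taken from the text.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def degeneracy():
    """Number of codons that encode each amino acid (or stop, '*')."""
    counts = {}
    for aa in CODON_TABLE.values():
        counts[aa] = counts.get(aa, 0) + 1
    return counts


def classify_codon_change(codon_before, codon_after):
    """Classify a codon replacement by its effect on the encoded amino acid."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"  # a sense codon replaced by a stop codon
    return "nonsynonymous"
```

A call such as `degeneracy()` shows a count of one only for methionine (M) and tryptophan (W). Note that the classification is purely formal: as the text stresses, a change labeled synonymous here is not thereby selectively neutral.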

The assumption that synonymous mutations are selectively neutral, and the fact that the early comparison of nucleotide sequences of homologous genes showed a dominance of synonymous over nonsynonymous mutations, contributed to the foundations of the neutral theory of molecular evolution. This theory attributes the evolution of organisms at the molecular level mainly to the random drift of genomes carrying neutral or quasi-(or nearly-) neutral mutations (King and Jukes, 1969; Kimura, 1983, 1989). The terms quasi-neutral or nearly-neutral may seem ambiguous to molecular biologists. However, in the formulation of the neutral theory they had a precise meaning of the selection coefficient (a parameter that measures fitness differences) being lower than the inverse of the effective population size, with minor variations in the equations of some formulations (Kimura, 1983). Despite random drift of genomes playing an important role in molecular evolution, evidence gathered over the last decades renders untenable the assumption that synonymous mutations are neutral. Evidence to the contrary has been obtained with viruses and cells, including mutations in the human genome that may affect enhancer functions (Hirsch and Birnbaum, 2015), mRNA folding (Faure et al., 2017; Mittal et al., 2018), and microRNA targeting (Brest et al., 2011), among other processes (Novella, 2004; Novella et al., 2004; Parmley et al., 2006; Hamano et al., 2007; Resch et al., 2007; Lafforgue et al., 2011; Nevot et al., 2011, 2018; Supek, 2016).
There are several mechanisms by which synonymous mutations can affect virus behavior: alteration of cis-acting regulatory elements in viral genomes, decrease of the stability of duplex structures within the RNA genome or between viral sequences and miRNAs or siRNAs, or changes in viral gene expression [splicing precision or translation fidelity through the modification of RNA-RNA or RNA-protein interactions; reviewed in (Martínez et al., 2016)].
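The precise meaning of quasi-neutral given earlier in this paragraph (a selection coefficient lower than the inverse of the effective population size) can be written as a one-line test. This is only a sketch of that stated criterion: the function name is hypothetical, the absolute value is used so that slightly deleterious and slightly advantageous mutations are treated alike, and the constant factors that differ among formulations are deliberately ignored.

```python
def is_effectively_neutral(selection_coefficient, effective_population_size):
    """True when |s| < 1/Ne, the nearly-neutral criterion quoted in the text.

    Assumption: no ploidy-dependent or formulation-specific factor is applied.
    """
    if effective_population_size <= 0:
        raise ValueError("effective population size must be positive")
    return abs(selection_coefficient) < 1.0 / effective_population_size
```

Under this criterion the same mutation can behave as nearly neutral in a small population and be visible to selection in a large one, since only the threshold 1/Ne changes.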

Synonymous codons use different tRNAs for protein synthesis, and different tRNAs do not have the same relative abundance in different host cell types. Thus, the rate of protein synthesis, an important phenotypic trait for cells and viruses, can be affected by the frequency of alternative synonymous codons present in mRNAs (Richmond, 1970; Akashi, 2001). Not only codon bias, but also specific codons or codon combinations may affect ribosome speed to regulate the folding of nascent proteins during translation (Makhoul and Trifonov, 2002; Rocha, 2004; Aragones et al., 2010; Brule and Grayhack, 2017). As a consequence, generation of rare codons by mutation of abundant codons (or vice versa) can modify viral fitness (Chapter 5). Rare codons may also limit the fidelity of amino acid incorporation when the frequency of the required aminoacyl-tRNAs is low (Ling et al., 2009; Zaher and Green, 2009; Czech et al., 2010). The frequency of codon pairs in RNA genomes is also a fitness determinant relevant to the preparation of attenuated viral vaccines.

To complicate matters further, a synonymous mutation may be neutral or quasi-neutral in one environment, but it may contribute to selection in a different environment, because of the phenotypic effects of RNA structure and codon usage. Neutrality is relative to the environment.

Regarding the effects of mutations (Box 2.1), the following general statements are applicable to viruses:

• Although difficult to prove due to the limited number of environments used for experimentation, truly neutral mutations (i.e., with no influence on the virus in any environment) are probably very rare. This applies to synonymous, as well as to nonsynonymous mutations.
• Mutations resulting in chemically conservative amino acid substitutions are more likely to be tolerated than those leading to chemically different amino acids. Tolerability (quantified by substitution matrices among amino acids in protein evolution) should be distinguished from neutrality. A tolerated mutation may cause a reduction in fitness, which is nevertheless compatible with virus replication.
• A conservative amino acid substitution may have important biological consequences.
• The effect of any individual mutation is context-dependent in two ways: it may depend on other mutations in the same genome (epistasis, see also Section 2.8 and Chapter 5) or on the mutant cloud that surrounds the genome harboring the mutation (effects of complementation, cooperation, or interference, discussed in Section 3.8 of Chapter 3).
• The previous points do not deny the influence of random drift of genomes on intrahost and interhost evolution. The currently most accepted view is that positive and negative selection and random drift occur continuously during virus evolution (Chapter 3).

The proportion of transition versus transversion mutations may depend initially on the specific replication machinery of a virus that tends to produce some mutation types preferentially over others. For a given virus, short-term evolution is often reflected in the dominance of transitions, a dominance which is less apparent when distantly related sequences of the same virus are compared. The effect of evolutionary distance on the transition to transversion ratio was observed in the FMDV genome sequence comparisons carried out in our laboratory over several decades, which ranged from analyses of mutant spectra relative to their corresponding consensus sequence to independent viral isolates from disease outbreaks separated by several decades [review of the work on FMDV evolution in (Domingo et al., 1990, 2003)]. These two levels of sequence comparisons (within mutant spectra vs. independent isolates) can be highly significant, as discussed in Chapters 3 and 7.

The proportion of synonymous and nonsynonymous mutations that have mediated the diversification of viral genomic sequences that belong to the same phylogenetic lineage is often considered informative of the underlying evolutionary forces. Probably because of the rooted (albeit uncertain) notion that biological function is more likely to reside in protein than in DNA or RNA, the ratio of the number of nonsynonymous substitutions per nonsynonymous site in the sequence under study (d_n) to the number of synonymous substitutions per synonymous site (d_s), termed ω (ω = d_n/d_s), is calculated to infer the dominant mode of evolution (Nei and Gojobori, 1986).

BOX 2.1 (partially recovered)

Mutations may affect stem-loop or other secondary and higher-order structures involved in regulatory processes through nucleic acid-nucleic acid or nucleic acid-protein interactions. The primary sequence in nonstructured, noncoding regions may also be functionally relevant.

In coding regions

The effect of a mutation may be context-dependent in two ways: it may be affected by other mutations in the same genome (epistasis) or by other genomes of the surrounding mutant spectrum.

When ω = 1, the evolution is considered neutral; when ω < 1, purifying (or negative) selection is dominant; and when ω > 1, positive (or directional) selection prevails (Yang and Bielawski, 2000). The types of selection undergone by viruses are discussed in Section 3.4 of Chapter 3.
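The calculation and conventional reading of ω can be sketched as follows, assuming that the numbers of synonymous and nonsynonymous differences and sites between two sequences have already been counted (the counting step of the Nei and Gojobori procedure is not reproduced here). The proportions of differences are converted into substitutions per site with the Jukes-Cantor correction; the function names and the `tolerance` used to call a value "equal to 1" are hypothetical choices made for illustration only.

```python
import math


def jukes_cantor_distance(p):
    """Substitutions per site estimated from a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("the correction is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return d_n / d_s from precomputed difference and site counts."""
    d_n = jukes_cantor_distance(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor_distance(syn_diffs / syn_sites)
    if d_s == 0.0:
        raise ValueError("d_s is zero; omega is undefined")
    return d_n / d_s


def interpret_omega(value, tolerance=0.05):
    """Conventional reading of omega; 'tolerance' is an arbitrary assumption."""
    if abs(value - 1.0) <= tolerance:
        return "neutral evolution"
    if value < 1.0:
        return "purifying (negative) selection"
    return "positive (directional) selection"
```

For instance, `interpret_omega(omega(10, 700, 30, 300))` falls in the purifying-selection class. The label summarizes a ratio averaged over all sites and should be read with caution, not as a demonstration of the evolutionary force at work.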

There are several reasons to be cautious about the significance of ω: (i) Synonymous mutations need not be neutral, for reasons discussed in Section 2.3. (ii) In the course of evolution, important but transient events of positive selection (termed episodic positive selection) due to one or a few amino acid substitutions may be accompanied by a larger number of synonymous, tolerated mutations. In this situation, ω computes as ω < 1, thus indicating purifying selection despite a critical role of positive selection, triggered by one or a few nonsynonymous mutations, in the evolutionary outcome (Crandall et al., 1999). (iii) In a striking proof of the above arguments, statistically significant mutational biases led to a value of ω indicative of positive selection in an in vitro evolution experiment simulating pseudogene evolution in which positive selection was not possible (Vartanian et al., 2001); this study represents a warning that is rarely mentioned when discussing the limitations of conclusions based on the value of ω. (iv) A synonymous change may permit the mutant codon to acquire a relevant nonsynonymous change through a point mutation. The term quasisynonymous has been used to describe codons that encode the same amino acid but have a different evolutionary potential regarding the amino acids that they can access through a point mutation. Alternative codons for a given amino acid bring a replicative system close to points of sequence space from which a phenotypically relevant change has a different probability (Chapters 3 and 4).

(v) Finally, ω was initially proposed to compare distantly related genomes, not the closely related genomes that are often compared in studies of the short-term evolution of viruses (Kryazhimskiy and Plotkin, 2008).
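Point (iv) above, the different evolutionary potential of quasisynonymous codons, can be illustrated by listing the amino acids that each codon reaches through a single point mutation. The sketch uses the standard genetic code (RNA alphabet, '*' for stop); the function name `accessible_amino_acids` is hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def accessible_amino_acids(codon):
    """Amino acids (or '*') reachable from codon by one point mutation.

    The amino acid encoded by the starting codon is excluded, so the result
    lists only the nonsynonymous (or stop) outcomes.
    """
    codon = codon.upper()
    original = CODON_TABLE[codon]
    reachable = set()
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue
            neighbor = codon[:position] + base + codon[position + 1:]
            reachable.add(CODON_TABLE[neighbor])
    reachable.discard(original)
    return reachable
```

The serine codons UCU and AGU make the point: both encode the same amino acid, yet `accessible_amino_acids("UCU")` and `accessible_amino_acids("AGU")` share only two members (cysteine and threonine), so a synonymous change between them relocates the genome in sequence space with respect to the substitutions available in the next step.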

For all these reasons, ω values as a diagnostic of forces mediating DNA and RNA virus evolution must be regarded only as indirect and suggestive, not as a definitive parameter. Despite these arguments, the use of ω to propose a model of virus evolution continues to go surprisingly unchallenged in the literature of virus evolution. We use ω only in a limited way in subsequent chapters because, in addition to the limitations just listed, it does not help in the interpretation of critical evolutionary events regarding viruses. Related shortcomings apply to other tests of neutrality developed to interpret the origin of DNA polymorphisms in the years following the summit of the neutralist-selectionist controversy (Fu, 1997; Achaz, 2009).

2.5 Mutation rates and frequencies for DNA and RNA genomes

Mutation rates quantify the number of misincorporations per nucleotide copied, irrespective of the fate (increase or decrease in frequency) of the mutated genome produced. A mutation rate for a genomic site measures a biochemical event dictated by the replication machinery and environmental parameters that affect the catalytic properties of the polymerase. In contrast, a mutant (or mutation) frequency describes the proportion of a mutant (or a set of mutants) in a genome population. The frequency of a mutant will depend on the rate at which it is generated (given by the mutation rate) and on its replication capacity relative to other genomes in the population (Drake and Holland, 1999) (Fig. 2.5). A specific mutation may be produced at a modest rate, but then be found at high frequency because the mutation is advantageous for genome replication in that environment. The converse situation may also occur. Some mutational hot spots (in the sense of genomic sites where mutations tend to occur with high probability) may never be reflected among the repertoire of mutations found in a genome population because of the selective disadvantage they inflict upon the genome harboring them.

A very significant example is the elongation of an internal oligoadenylate tract located between the two functional AUG initiation codons in the FMDV genome. The homopolymeric tract constitutes a hot spot for variation due to polymerase slippage (Fig. 2.4B). The elongation of the internal oligoadenylate was dramatic because it increased sixfold the number of adenylate residues present at that site; it was only observed when FMDV was subjected to repeated plaque-to-plaque (bottleneck) transfers, not large population passages. In fact, this drastic genetic modification has not been recorded among natural isolates of the virus. The molecular instruction to elongate the oligoadenylate was very strong because it was observed in many independent biological clones subjected to bottleneck transfers (Escarmís et al., 1996). Although the tract qualifies as a hot spot for variation, the first event in fitness recovery when the clones were subjected to large population passages was the reversion of the elongated tract to its original size (Escarmís et al., 1999). The interpretation of these findings, to be further analyzed in Chapter 6, is that during plaque-to-plaque transfers the negative selection to eliminate unfit genomes is less intense than during large, highly competitive population passages. Again, a clear molecular instruction to elongate a homopolymeric tract may not be reflected in a high frequency of the affected genomes. Therefore, although mutation rates and frequencies for viruses bear some relationship, rates cannot be inferred from frequencies and vice versa (Fig. 2.5).
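The difference between a mutation rate and a mutant frequency can be illustrated with a deliberately simple deterministic model, offered here only as a sketch. It assumes two genome classes (wild type with fitness 1 and a mutant with a given relative fitness), a constant mutation rate from wild type to mutant per replication round, no reversion, and an infinite population; the function name and all parameter values are hypothetical.

```python
def mutant_frequency_after(mutation_rate, relative_fitness, generations,
                           initial_frequency=0.0):
    """Mutant frequency after a number of replication rounds.

    Each round, wild-type genomes replicate with fitness 1 and a fraction
    'mutation_rate' of their progeny is mutant; mutants replicate with
    'relative_fitness'. Frequencies are renormalized every round.
    """
    x = initial_frequency
    for _ in range(generations):
        mutant = x * relative_fitness + (1.0 - x) * mutation_rate
        wild_type = (1.0 - x) * (1.0 - mutation_rate)
        x = mutant / (mutant + wild_type)
    return x


if __name__ == "__main__":
    # Higher rate but a fitness cost, versus lower rate with an advantage.
    print(mutant_frequency_after(1e-4, 0.5, 50))
    print(mutant_frequency_after(1e-5, 1.5, 50))
```

In this toy model a mutant generated at the higher rate but with half the replicative capacity stays at a frequency of the order of the mutation rate, whereas a mutant generated ten times less often but with a replicative advantage becomes dominant. Neither frequency reveals the underlying rate unless the fitness values are known, which is the point made in the text.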

The first calculations of mutation rates for cellular organisms and for some DNA bacteriophages were carried out by J.W. Drake, who pursued comparative measurements that have generally supported a difference between mutation rates for DNA and RNA viruses. The rates estimated for bacteriophages λ and T4 were about 100 times higher than those of their host E. coli. An approximately constant rate of 0.003 mutations per genome per replication round was calculated for a number of DNA-based microbes (Drake, 1991), an observation sometimes referred to as "Drake's rule." This rather surprising constancy suggests that different DNA organisms have accommodated the template-copying fidelity of their replication machineries to achieve a narrow window in the mutational load measured as mutations fixed per genome, a remarkable fitting of biochemistry with evolutionary needs. The basal mutation rate in mammalian cells has been estimated at about 10⁻¹⁰ substitutions per nucleotide and cell generation [reviewed in (Naegeli, 1997; Domingo et al., 2001; Friedberg et al., 2006)] (Table 2.1).
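A constant number of mutations per genome per replication round implies a per-nucleotide rate that is inversely proportional to genome size. The following sketch makes the arithmetic explicit. The value 0.003 is taken from the text; the genome sizes are rounded, approximate figures supplied here as assumptions for illustration, and the function name is hypothetical.

```python
DRAKE_MUTATIONS_PER_GENOME = 0.003  # per genome per replication round (from the text)

# Approximate genome sizes in base pairs; rounded values assumed for illustration.
APPROX_GENOME_SIZE_BP = {
    "bacteriophage lambda": 48_500,
    "bacteriophage T4": 169_000,
    "E. coli": 4_600_000,
}


def per_nucleotide_rate(genome_size_bp,
                        mutations_per_genome=DRAKE_MUTATIONS_PER_GENOME):
    """Per-nucleotide mutation rate implied by a fixed per-genome rate."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return mutations_per_genome / genome_size_bp


if __name__ == "__main__":
    for name, size in APPROX_GENOME_SIZE_BP.items():
        print(f"{name}: {per_nucleotide_rate(size):.1e} per nucleotide per round")
```

Under these assumptions the smaller the genome, the higher the per-nucleotide rate compatible with the same per-genome mutational load, which is the sense in which copying fidelity is described as commensurate with genome size.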

The synonymous mutation rate measured with experimental populations of bacteria has been assumed to reflect the neutral mutation rate (despite limitations explained in Section 2.3). Values for E. coli have ranged from 2 × 10⁻¹¹ up to 5 × 10⁻⁹ substitutions per synonymous site per generation (Ochman et al., 1999) with 5 × 10⁻¹⁰ as the most likely estimate (Lenski et al., 2003). The latter value is in agreement with a rate of 3 × 10⁻¹⁰ to 4 × 10⁻¹⁰ substitutions per base pair and generation based on whole-genome deep sequencing of an experimentally evolved lineage of Myxococcus xanthus (Velicer et al., 2006). There are biological phyla for which no mutation rates have been calculated. From current knowledge, we can assume that mutation rates in cells and viruses depend on the replicative machinery (generally a multiprotein complex that includes the relevant viral polymerase with additional viral and host proteins and membrane structures) and on multiple environmental parameters (template nucleotide sequence context, ionic environment, temperature, metabolites in interaction with components of the replication apparatus, etc.). Whether bacteria are in the exponential or stationary phase of growth can affect intracellular metabolites and proton exchange rates which, in turn, may alter the proportion of tautomeric forms in nucleotides and misincorporation tendencies (Friedberg et al., 2006). The sequence context of the template nucleic acids (presence of repeated sequences that can induce misalignment mutagenesis or G-C vs. A-T rich regions in relation to relative nucleotide substrate abundances, etc.) may impel or attenuate mutability. Insertion elements may enhance mutation rates at neighboring sites in a bacterial genome (Miller and Day, 2004). Despite these influences, vesicular stomatitis virus (VSV) displayed comparable mutation rates in several host cells (Combe and Sanjuan, 2014), suggesting that there is a limited range of average error rates needed for a virus to maintain fitness (Chapters 5 and 9).

FIGURE 2.5 Scheme that illustrates the difference between mutation rate and mutant frequency. Residue A in a template (top) can be misread to incorporate a C, A, or G into the complementary strand (discontinuous lines), at a rate of 10⁻⁴, 10⁻⁵, and 10⁻⁵ substitutions per nucleotide, respectively. The replicative capacity of the newly generated templates (with G, U, and C, continuous lines) will determine widely different mutant frequencies with G > C > U.

In addition to the general environmental and sequence context consequences for template-copying fidelity that may affect any genome type, mutation rates for DNA viruses will also be influenced by: (i) whether the DNA polymerase that catalyzes viral DNA synthesis includes or lacks a functional proofreading-repair activity. High copying fidelity is typical of DNA polymerases involved in cellular DNA replication (Bebenek and Ziuzia-Graczyk, 2018), and low copying fidelity is generally a feature of DNA polymerases involved in DNA repair (Friedberg et al., 2006; Ganai and Johansson, 2016). Thus, repair of lesions that by themselves might not be mutagenic may lead to the introduction of mutations during the error-prone repair process. (ii) Expression of proteins active in repair encoded in the viral genome, such as uracil-DNA glycosylase, DNA repair endonucleases, etc. (iii) The mechanism of viral DNA replication, particularly the occurrence of double-stranded versus single-stranded DNA in replicative intermediates. (iv) Intracellular site of replication and the availability of postreplicative DNA repair proteins (regarding both intracellular location and concentration) to the viral replication factories. Little is known of the spatial relationships and relative affinities of cellular and viral proteins and structures that may critically affect polymerase fidelity. Comparative measurements of mutation rates at specific genome sites of DNA viruses are needed, as a first step to define the cellular and biochemical influences on the fidelity of DNA virus genome replication.

TABLE 2.1

Genome type      Range (substitutions per nucleotide)
RNA viruses      10⁻⁵ to 10⁻³
Retroviruses     10⁻⁶ to 10⁻⁴
DNA viruses      10⁻⁸ to 10⁻³
Cellular DNA     10⁻⁹ to 10⁻¹¹

Values are expressed as substitutions per nucleotide. The range of values is the most likely according to several independent studies. No distinction is made between mutation rates and frequencies. See text for comments and references.

General genetic variability affecting the entire virus genome should be distinguished from localized variability at hot spots in a genome. Even the extremely complex human genome shows genetic instability at specific loci, some associated with genetic disease (Domingo et al., 2001; Alberts et al., 2002; Bushman, 2002). Genome size is a parameter pertinent to biological behavior, not only because it imposes a commensurate copying fidelity, but also because it affects the impact of genetic heterogeneity within infected organisms and upon the invasion of new hosts (Chapter 3). Mutation frequencies measured by subjecting a virus to a specific selective agent (e.g., mutants that escape the neutralizing activity of a monoclonal antibody or mutants that escape inhibition by a drug) span a broad range of values (10⁻³ to 10⁻⁸) for DNA and RNA viruses (Smith and Inglis, 1987; Sarisky et al., 2000; Domingo et al., 2001) (Table 2.1). The technical details of any procedure used to calculate a mutation frequency should be carefully evaluated to translate its meaning to the genome level. Important variables are the efficacy of the antibody or drug (which will be concentration-dependent) or the possibility of phenotypic hiding-mixing in the escape mutants to be quantified (Holland et al., 1989; Valcarcel and Ortin, 1989). Unexpectedly low levels of escape mutants (that would imply < 10⁻⁶ substitutions per nucleotide) for an RNA virus can mean either a general or site-specific high polymerase fidelity, a selective disadvantage of the genome that harbors the mutation or, when a phenotypic alteration is measured, the requirement of two or more mutations to produce the alteration. Conversely, a high mutation frequency for a DNA virus whose replication is catalyzed by a high-fidelity DNA polymerase may mean that either repair activities were not functional or that the mutant displayed a selective advantage and overgrew the wild type prior to the measurement of its frequency.
Mathematical treatments that take into account reversion of a low fitness mutant and its competition with wild-type virus have been used to calculate mutation rates (Batschelet et al., 1976; Coffin, 1990) .

Despite difficulties and limitations in the calculations, independent genetic and biochemical methods with different viruses support mutation rates for RNA viruses in the range of 10⁻³ to 10⁻⁵ substitutions per nucleotide copied [for representative articles and reviews see (Batschelet et al., 1976; Domingo et al., 1978, 2001; Steinhauer and Holland, 1986; Eigen and Biebricher, 1988; Varela-Echavarria et al., 1992; Ward and Flanegan, 1992; Mansky and Temin, 1995; Preston and Dougherty, 1996; Drake and Holland, 1999; Sanjuan et al., 2010; Bradwell et al., 2013)] (Table 2.1). A few early studies indicated unusually low mutation rates or frequencies for some RNA viruses. As discussed in some of the reviews listed above, there are technical reasons to suggest that such values were probably underestimates of the true average mutation rates or frequencies.

Obviously, it cannot be excluded that some genomic sites or viruses under a given environment might be unusually refractory to the introduction of mutations, but most evidence supports the range of values listed in Table 2.1. The near million-fold higher mutation rates for RNA viruses than cellular DNA, whose biological implications were presciently anticipated by J. Holland and colleagues (Holland et al., 1982), have been confirmed. That is, for RNA viruses of genome length between 3 kb and 32 kb, an average of 0.1-1 mutation is introduced per template molecule copied in the replicating population. Unless most mutations impeded viral replication, a continuous input of mutant genomes is expected, as indeed found experimentally (Chapter 3).
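The per-genome figure follows from multiplying a per-nucleotide mutation rate by the genome length. The following minimal sketch does that arithmetic for a few illustrative combinations. The specific rates and lengths are examples chosen within the ranges given in the text, and the Poisson estimate of the fraction of error-free copies is an added simplifying assumption (independent mutations, uniform rate along the genome), not a result taken from the chapter.

```python
import math


def mutations_per_copy(rate_per_nt: float, genome_length_nt: int) -> float:
    """Expected number of mutations introduced each time one template is copied."""
    return rate_per_nt * genome_length_nt


def error_free_fraction(rate_per_nt: float, genome_length_nt: int) -> float:
    """Fraction of copies carrying no mutation (Poisson approximation, an assumption)."""
    return math.exp(-mutations_per_copy(rate_per_nt, genome_length_nt))


# Illustrative values only: genome lengths span the 3-32 kb range cited in the text.
for length in (3_000, 10_000, 32_000):
    for rate in (1e-5, 1e-4):
        m = mutations_per_copy(rate, length)
        f = error_free_fraction(rate, length)
        print(f"L={length:>6} nt  rate={rate:.0e}  mutations/copy={m:.2f}  error-free={f:.2f}")
```

With a rate of 10⁻⁴ and a 10,000-nucleotide genome, the product is one mutation per copy, and under the Poisson assumption roughly a third of the copies would be free of mutations.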

High mutation rates for RNA genomes are also supported by measurements of template-copying fidelity by RNA polymerases, reverse transcriptases, and DNA polymerases devoid of 3′-5′ proofreading exonuclease (or under conditions in which such exonuclease is not functional) [(Steinhauer et al., 1992; Varela-Echavarria et al., 1992; Mansky and Temin, 1995; Domingo et al., 2001; Friedberg et al., 2002, 2006; Menéndez-Arias, 2002), and references therein]. In vitro fidelity tests may be based on genetic or biochemical assays using homopolymeric or heteropolymeric template-primers. Measurements include the kinetics of incorporation of an incorrect versus the correct nucleotide directed by a specific position of a template, or the capacity of a polymerase to elongate a mismatched template-primer 3′ end [these and other assays have been reviewed (Menéndez-Arias, 2002)] (see also Section 2.6). Differences between related enzymes (e.g., AMV RT is more accurate than HIV-1 RT), and the fact that amino acid substitutions in the polymerases affect nucleotide discrimination, demonstrate that proofreading-repair activities together with the structure of the polymerase and replication complexes are determinants of template-copying fidelity.

The term mutation rate is often used loosely in the literature of virus evolution, probably driven by nomenclature from classical population genetics. It is used to mean mutation frequency, rate of evolution, and sometimes to mean mutation rate in its real sense (as explained in Section 2.5). A particularly risky habit is to use mutation rate when what is measured is a mutation frequency. Some studies have claimed that they have a replication system devoid of selection, and therefore, the number of mutations counted corresponds to the true mutation rate of the system. This is incorrect. There is no replicative system devoid of selection because at least selection to maintain replication is in continuous operation. Furthermore, as taught by quasispecies dynamics (Stadler, 2016), supported by mutational waves in hepatitis C virus upon prolonged replication in a constant cellular environment (Moreno et al., 2017), the mutant spectrum per se is part of the environment. Since the mutant spectrum is constantly changing, so is the environment in which replication takes place. Unfortunately, some studies have proposed the existence of mutational cold spots (sites at which mutations as a biochemical event occur at a very low frequency), ignoring that negative selection might have eliminated newly arising mutations. These false conclusions imply the existence of genomic regions particularly suitable as drug or antibody targets because they cannot mutate. These are the types of studies and incorrect conclusions that keep perpetuating the problem of control of viral diseases by encouraging antiviral and vaccine designs doomed to failure (Chapters 8 and 9).

2.6 Evolutionary origins, evolvability, and consequences of high mutation rates: fidelity mutants

The amino acid substitutions in the core polymerase that affect fidelity can be located either close to or away from the active site of the enzyme. The change in fidelity can reach almost one order of magnitude, but virus viability is not compromised. Thus, error rates themselves can be subjected to selection, as supported by theoretical studies on evolvability (Earl and Deem, 2004). Early studies documented heterogeneity in the mutation rates among individual plaque isolates of influenza A virus (Suarez et al., 1992). It is not clear whether mutation rates of viruses have evolved to procure a balance between adaptability and genetic stability, or whether other selective constraints have imposed the observed values. It has been suggested that because of the generally deleterious nature of most mutations, the adaptive value of the high mutation rates for RNA viruses is debatable and that there might have been a trade-off between replication rate and copying fidelity (Elena and Sanjuan, 2005). Mutation rates would be a consequence of rapid RNA replication, and an increase in copying fidelity would come at a cost, resulting in a lower replication rate. A connection between elongation and error rate has been suggested by results with some viral and cellular polymerases [reviewed in (Kunkel and Erie, 2005)]. In an early study with the poliovirus RdRp in vitro, an increase in the error frequency was observed when the pH and Mg²⁺ ion conditions were modified, and the decreased fidelity correlated with increased RNA elongation rate (Ward et al., 1988). A possible connection between elongation rate and copying fidelity cannot be ruled out, but current evidence points to template-copying fidelity as being the result of multiple factors, not necessarily linked to the rate of genome replication (Vignuzzi and Andino, 2010; Campagnola et al., 2015; Domingo and Perales, 2019).

There is ample support for an adaptive value of high mutation rates for RNA viruses, independently of their biochemical origins. A poliovirus mutant, whose RdRp displayed a three- to fivefold higher fidelity than the wild-type enzyme, replicated at a slightly lower rate than wild-type virus in cell culture but displayed a strong selective disadvantage regarding the invasion of the brain of susceptible mice (Pfeiffer and Kirkegaard, 2005; Vignuzzi et al., 2006). The impaired capacity to cause neuropathology was due at least partly to the limited complexity of the mutant spectrum, since its broadening through mutagenesis restored the capacity to produce neuropathology. These and other studies have provided evidence that mutant spectrum complexity, by virtue of its impact on fitness, can be a virulence determinant. The work by M. Vignuzzi, J. Pfeiffer, R. Andino, C. Cameron, K. Kirkegaard, and their colleagues on poliovirus fidelity mutants opened a much-needed branch of research in virus evolution and quasispecies implications. As proof of this statement, the field of fidelity mutants is rapidly expanding (Bordería et al., 2016), and references to the information they provide will be made in several chapters.

Theoretical models and experimental observations suggest that mechanisms for error correction had to evolve to maintain functionality of increasingly complex genomes (Swetina and Schuster, 1982; Eigen and Biebricher, 1988; Domingo et al., 2001; Eigen, 2002, 2013) (here complexity means genome size, provided no redundant information is encoded). The coronaviruses have the largest genomes among the known RNA viruses, with 30-33 kb. This is about 10-fold more genetic information than encoded in the simple RNA bacteriophages, such as MS2 or Qβ. Coronaviruses are replicated by complex RNA-dependent RNA polymerases, which include a domain that corresponds to a 3′-5′ exonuclease, proofreading-repair activity. The protein displays exonuclease activity in vitro, and its inactivation affects viral RNA synthesis (Minskaia et al., 2006) and results in increases of about 15-fold in the average mutation frequency (Eckerle et al., 2007, 2010). A coronavirus mutant devoid of this repair function is more susceptible to lethal mutagenesis than the corresponding, nonmutated virus, as expected from a connection between replication accuracy and proximity to an error threshold for the maintenance of genetic information (Chapter 9). Thus, it is likely that a proofreading activity evolved (or was captured from a cellular counterpart) in RNA genomes whose genomic complexity was at the limit compatible with the fidelity achievable by standard RNA replicases. It would be interesting to discover new RNA viruses with a single RNA molecule longer than 30 kb as a genome to analyze whether they have evolved more accurate core polymerases or exhibit a proofreading-repair function during replication. Toward the other end of the RNA size scale, viroid RNAs display a mutation rate higher than (or close to the highest) recorded for RNA viruses, consistent with the correlation between genome size and template-copying accuracy (Gago et al., 2009).

Studies with bacteria have identified some of the factors that successively increase copying fidelity. It has been estimated that during E. coli DNA replication the error rate would be 10⁻¹ to 10⁻² mutations per nucleotide copied if accuracy relied only upon the strength of interactions provided by base pairing (Section 2.2). The error rate would decrease to 10⁻⁵ to 10⁻⁶ with base selection and proofreading-repair, to about 10⁻⁷ with the contribution of additional proteins present in the replication complex, and to about 10⁻¹⁰ misincorporations per nucleotide with the participation of postreplicative mismatch correction mechanisms (Naegeli, 1997; Kunkel and Erie, 2005; Friedberg et al., 2006). Reduction of bacterial genome size results in the increase of mutation frequency (Nishimura et al., 2017). The error rate of the bacteriophage φ29 DNA polymerase is about 10⁻⁶ without the proofreading exonuclease activity, and it decreases to 10⁻⁸ with the correcting activity [(de Vega et al., 2010) and references therein]. Postreplicative repair pathways act on double-stranded DNA, but not (or very inefficiently) on RNA or DNA-RNA hybrids. Therefore, the known postreplicative repair systems that operate in cellular DNA do not make a significant contribution to error correction in RNA viruses (Steinhauer et al., 1992).

The importance of copying fidelity for complex genomes is reflected in the fact that more than 100 proteins are directly or indirectly involved in the repair of the human genome. Elevated mutation rates in the range of those operating for RNA viruses would be lethal for large DNA genomes. Localized genetic modification occurs physiologically in processes such as somatic hypermutation and class-switch recombination in B cells of the germinal centers, as mechanisms of diversification of immunoglobulin genes (Upton et al., 2011; Methot et al., 2017). Chromosomal instability has long been associated with cancer (Gatenby and Frieden, 2004; Stratton et al., 2009). Surveys have been (and are currently being) used to identify genes associated with chromosomal instability and their role in aging and disease (Aguilera and Garcia-Muse, 2013; Vijg et al., 2017) (see also Chapter 10). While uncontrolled high mutability is deleterious for differentiated cellular organisms, it constitutes a modus vivendi for a great majority of viruses.

Despite its attractiveness, definitive proof of the hypothesis of a direct relationship between error rate and limited genome complexity will require additional functional and biochemical studies. Exceptions to the absence of repair activities in simple genetic elements have been described. A satellite RNA of the plant virus turnip crinkle carmovirus evolved a 3′-end RNA repair mechanism. It involves the synthesis of short oligoribonucleotides by the viral replicase using the 3′-end of the viral genome as a template. The mechanism consists probably of template-independent priming at the 3′-end of the damaged RNA to generate wild type, negative strand, and satellite RNA (Nagy and Simon, 1997). A reversible, NTP-dependent excision of the 3′ residue of the nascent nucleic acid product has been described in some retroviruses and hepatitis C virus (Meyer et al., 1998; Jin et al., 2013). This activity is important for drug resistance, and it may also modulate the overall fidelity of some polymerases. It cannot be excluded that some type of point mutation correction may operate in RNA genetic elements of less than 30 kb. Such putative mechanisms may even diminish mutation rates that would otherwise be prohibitively deleterious, and they do not overshadow high mutation rates as a feature of RNA and some DNA genomes (Table 2.1).

Limited copying fidelity in the absence of proofreading-correction mechanisms can be regarded as an unavoidable consequence of the molecular mechanisms involved in template copying by viral polymerases. Most nucleic acid polymerases share a structure that resembles a right hand, with fingers, palm, and thumb domains (Fig. 2.6A). Three-dimensional structures of viral RdRps and RTs indicate that interactions between the incoming nucleotide or residues of the template-primer with amino acids of the polymerase must permit displacement of the growing polynucleotide chain along the channel located at the palm domain of the polymerase (Steitz, 1999; Ferrer-Orta et al., 2006; Wu and Gong, 2018) (Fig. 2.6B). The polymerase of classical swine fever virus and other pestiviruses, such as bovine viral diarrhea virus, includes an N-terminal extra-domain of about 100 amino acids; its interaction with the palm domain is important for template-copying fidelity. If interactions around the catalytic site to ensure the correct nucleotide incorporation were so strong as to preclude misincorporations, the movement of the growing polynucleotide chain would be hampered. Again, this compromise suggests a match between biochemical and evolutionary needs.

FIGURE 2.6 (A) On the left, the Klenow fragment is represented with colored fingers, palm, and thumb domains, next to an open right hand. The structure on the right is that of PV RdRp, next to a closed right hand. Courtesy of N. Verdaguer and L. Vives-Adrián (the hand is that of L. Vives-Adrián). (B) The structure of the ternary complex between FMDV 3D, an RNA molecule, and UTP as the substrate (PDB id. 2E9Z). The left panel is a front view of the complex, depicting the polymerase chain as a yellow ribbon, the RNA in dark blue (template) and cyan (primer). The incoming UTP and the pyrophosphate product are shown in atom type, and two Mg²⁺ ions as magenta balls. The right panel is the same complex in a top-down orientation. (Figure courtesy of C. Ferrer-Orta and N. Verdaguer). (C) Scheme of the minimum number of steps involved in nucleotide incorporation. The first step consists of the binding of polymerase ′E to the template-primer R_n (elongated up to nucleotide n) to form a complex ′ER_n. Formation of the activated complex ER_n is governed by the rate constant k_assembly (k_a). The activated ER_n complex binds a nucleotide NTP with an apparent binding affinity given by K_d,app to form the ER_nNTP complex. Catalysis to covalently incorporate the NTP to the growing primer chain to yield ER_n+1 and pyrophosphate (PP_i) is governed by the rate constant k_pol. Other constants depicted in the scheme are the inactivation rate constant (k_inact) of ′E, and dissociation of E from RNA (k_off,ERn and k_off,ERn+1). Based on Arias, A., Arnold, J.J., Sierra, M., Smidansky, E.D., Domingo, E., et al., 2008. Determinants of RNA-dependent RNA polymerase (in)fidelity revealed by kinetic analysis of the polymerase encoded by a foot-and-mouth disease virus mutant with reduced sensitivity to ribavirin. J. Virol. 82, 12346-12355, and previous studies with PV polymerase 3D by C. E. Cameron and his colleagues.

The orientation of the triphosphate moiety of the incoming nucleotide substrate is important for nucleotide incorporation (Menéndez-Arias, 2002; Graci and Cameron, 2004; Ferrer-Orta et al., 2009). One of the several steps involved in nucleotide incorporation is the formation of a ternary complex (polymerase with template-primer and the incoming nucleotide) that undergoes a conformational change (reorientation of the divalent ion-complexed triphosphate moiety of the incoming nucleotide). This conformational change activates the complex for phosphoryl transfer, to link the nucleoside-monophosphate to the 3′-terminus of the primer (or growing chain). Steps involved in the nucleotide incorporation are represented in Fig. 2.6C. Both the conformational change and the relative rate of phosphoryl transfer for an incorrect nucleotide versus the correct nucleotide influence the error rate at each site of the growing chain. Critical kinetic constants in Fig. 2.6C that are determined experimentally to quantify relative nucleotide incorporations and misincorporations are K_d,app (expressed as μM), k_pol (expressed as s⁻¹), and the ratio k_pol/K_d,app (μM⁻¹ s⁻¹), termed the catalytic efficiency. The ratio of k_pol/K_d,app for the incorporation of an incorrect nucleotide to k_pol/K_d,app for a correct nucleotide gives the frequency of that particular misincorporation, and an assessment of polymerase fidelity [(Castro et al., 2005) and references therein]. Modifications of polymerase residues by site-directed mutagenesis, combined with comparisons of the relevant structures, have identified critical amino acid residues involved in template-copying fidelity.
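The ratio of catalytic efficiencies described in this paragraph can be written out as a short calculation. The sketch below is illustrative only: the class and function names are hypothetical, and the kinetic values are invented to show the arithmetic, not measurements from any of the cited studies. The only requirement on units is that both nucleotides use the same concentration unit for the apparent dissociation constant, since the final ratio is dimensionless.

```python
from dataclasses import dataclass


@dataclass
class IncorporationKinetics:
    """Kinetic constants for incorporating one nucleotide opposite one template base."""
    k_pol: float    # rate constant of incorporation, in s^-1
    k_d_app: float  # apparent dissociation constant, in a concentration unit

    @property
    def catalytic_efficiency(self) -> float:
        """The ratio k_pol / K_d,app."""
        return self.k_pol / self.k_d_app


def misincorporation_frequency(incorrect: IncorporationKinetics,
                               correct: IncorporationKinetics) -> float:
    """Efficiency for the incorrect nucleotide divided by that for the correct one."""
    return incorrect.catalytic_efficiency / correct.catalytic_efficiency


# Hypothetical numbers, for illustration only (not measured values).
correct = IncorporationKinetics(k_pol=50.0, k_d_app=100.0)
incorrect = IncorporationKinetics(k_pol=0.01, k_d_app=400.0)

freq = misincorporation_frequency(incorrect, correct)
print(f"misincorporation frequency = {freq:.1e}")
```

In this made-up case the incorrect nucleotide is disfavored both at binding (higher K_d,app) and at catalysis (lower k_pol), and the two effects multiply in the final frequency.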

High-fidelity mutants are frequently obtained by selecting viruses resistant to mutagenic nucleotide analogs such as the antiviral agent ribavirin (Beaucourt and Vignuzzi, 2014). Limited incorporation of a deleterious nucleotide can be attained either through specific discrimination against the analog (ribavirin or other) or through a general decrease of all types of misincorporations, that is, a high-fidelity phenotype. Structural modifications of viral polymerases that lead to high fidelity have inspired the design of mutant viral polymerases displaying either an increase or decrease of copying fidelity achieved through a single amino acid substitution (Wainberg et al., 1996; Menéndez-Arias, 2002; Mansky et al., 2003; Pfeiffer and Kirkegaard, 2003; Arnold et al., 2005; Domingo, 2005; Vignuzzi et al., 2006; Coffey et al., 2011; Gnadig et al., 2012; Meng and Kwang, 2014; Rozen-Gagnon et al., 2014; Borderia et al., 2016). The capacity of viruses to evolve at higher or lower rates than their ancestors is achievable through modest numbers of mutations (limited movements in sequence space, Chapter 3), again emphasizing the evolvability of mutation rates.

The rates of mutation and recombination need not be independent. M. Vignuzzi and colleagues have shown that a mutator Sindbis virus displays a higher recombination rate and greater production of DI particles than the wild-type virus (Poirier et al., 2015). A connection between mutation and recombination rates strengthens the evolutionary consequences of the modifications of template-copying fidelity that can be achieved through a single amino acid substitution.

2.7 Hypermutagenesis and its application to generating variation: APOBEC and ADAR activities

Some viral genomes either isolated from biological samples or evolved in cell culture show biased mutation types (e.g., monotonous G → A or C → U substitutions in the same genome), generally at frequencies of around 10⁻² substitutions per nucleotide (10- to 1000-fold higher than standard mutation rates and frequencies) (Table 2.1). Biased hypermutation was first observed in some defective interfering (DI) RNAs of vesicular stomatitis virus (VSV) (Holland et al., 1982), and in variant forms of measles virus, associated with postmeasles neurological disease, such as subacute sclerosing panencephalitis (Cattaneo and Billeter, 1992). Hypermutation is mainly due to the activity of cellular deaminases, such as the apolipoprotein B mRNA editing complex (APOBEC) or the adenosine deaminase acting on double-stranded RNA (ADAR) families, that are involved in cellular editing and regulatory functions (Sheehy et al., 2002; Santiago and Greene, 2008; Nishikura, 2010; Stavrou et al., 2014; Pfaller et al., 2018; Venkatesan et al., 2018). In the event of a viral infection, such cellular functions can become part of an innate defense mechanism against the invading virus. Viral proteins (e.g., Vif in HIV-1) bind some APOBEC proteins, thus inhibiting mutagenesis and permitting virus survival (Sheehy et al., 2002). In oncoretroviruses, retroviruses, and hepatitis B virus (HBV), the APOBEC-3 cytidine deaminase acts on single-stranded DNA and results mainly in G → A and C → U hypermutation, that may affect 40%-100% of the G residues. The preferred sequence context for G hypermutation in HIV-1 observed in vivo is GpA > GpG > GpT ≈ GpC. The specific dinucleotide context of the hypermutated sites provides a means to distinguish genomes that have undergone hypermutation by cellular activities from those that are heavily mutated by other mechanisms, such as the action of mutagenic agents (Chapter 9). APOBEC 3 proteins play a role in cancer through cytidine deaminase mutagenesis and generation of double-strand breaks in chromosomal DNA (Wang et al., 2016). APOBEC 3 levels in the cell may be regulated by cellular and viral proteins, for example, human papillomavirus (HPV) oncoprotein E that stabilizes APOBEC 3A in human keratinocytes that may promote cervical cancer (Westrich et al., 2018).
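The use of dinucleotide context as a signature can be illustrated with a small tally of G-to-A changes by the nucleotide that follows the mutated G. This is a minimal sketch under stated assumptions: the function name is hypothetical, the two sequences are made up, they are assumed to be already aligned without gaps, and the 3′ neighbor is read from the reference sequence. It is not a published hypermutation-detection method.

```python
from collections import Counter


def g_to_a_contexts(reference: str, variant: str) -> Counter:
    """Count G->A changes, keyed by the reference nucleotide 3' of the mutated G."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    reference, variant = reference.upper(), variant.upper()
    counts = Counter()
    # The last position is skipped because it has no 3' neighbor to define a context.
    for i in range(len(reference) - 1):
        if reference[i] == "G" and variant[i] == "A":
            counts["Gp" + reference[i + 1]] += 1
    return counts


# Made-up aligned sequences (DNA sense), for illustration only.
reference = "ATGGAGGACGGTGCA"
variant = "ATGAAGAACAGTGCA"

contexts = g_to_a_contexts(reference, variant)
for context, n in contexts.most_common():
    print(context, n)
```

A tally dominated by one or two contexts would be consistent with a context-dependent deaminase activity, whereas G-to-A changes spread evenly over all four contexts would point to some other source of mutations. Real analyses would also need to account for the number of available sites in each context.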

The ADAR-associated hypermutation was identified in negative-strand RNA riboviruses and results mainly in A → G and U → C hypermutation. It is originated by A → I (inosine) modifications in double-stranded viral RNA catalyzed by ADAR-1L, one of more than 100 proteins inducible by type I IFN (Maas et al., 2003). Inosine can be recognized as G by the replication machinery (Valente and Nishikura, 2005), although it can form wobble base pairs also with A and U (Fig. 2.2). Hypermutation can contribute to genetic variation of viruses (Hirose et al., 2018).

There are additional mechanisms of hypermutagenesis. Higher than average mutation frequencies can occur as a result of replication in the presence of biased concentrations of the standard nucleotide substrates; this has been applied to the in vitro generation of genes mutated at frequencies of 10⁻¹ to 10⁻² (mutagenic PCR), as a powerful tool to study sequence-function relationships and functional robustness of nucleic acids and proteins (Meyerhans and Vartanian, 1999). Error-prone PCR has been used in experiments of in vitro evolution of nucleic acid enzymes to generate heterogeneous collections of nucleic acid sequences to select for molecules capable of catalyzing specific reactions (Joyce, 2004) (Chapter 1).

2.8 Error-prone replication and maintenance of genetic information: instability of laboratory viral constructs

High mutation rates have practical implications in laboratory studies on the behavior of virus mutants obtained by molecular cloning of a biological sample, or constructed by site-directed mutagenesis. A transition mutation that causes a strong fitness decrease but that still allows residual RNA genome replication will most likely revert following infection or transfection of cells with the mutant construct and subsequent viral replication. Double or triple mutants (preferentially including transversions) should be engineered (when possible according to the genetic code) to study the behavior of a viral mutant with an amino acid replacement of interest that may produce a fitness decrease. As an example, a C → U transition found in an open reading frame of an RNA virus may convert a Pro into a Ser (CCG → UCG). Since Ser will revert to Pro through a U → C transition in the triplet (a common type of misincorporation by most polymerases), Ser should be engineered to be encoded by AGU; in the course of replication, reversion to Pro would require at least two transversions since the codons for Pro are CCU, CCC, CCA, or CCG. Thus, if effects derived from the difference in the primary sequence of the RNA or codon bias do not intervene in the behavior of the viral genome, codons with a high genetic barrier to reversion should be engineered for studies involving viral replication.
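The Pro/Ser example can be checked by counting, for a given mutant codon, the fewest nucleotide changes needed to reach any codon of the original amino acid, and classifying each change as a transition or a transversion. The sketch below is only an illustration of that bookkeeping; the function names are hypothetical, and it ignores the sequence-context and codon-bias effects mentioned in the text.

```python
PURINES = {"A", "G"}  # C and U (or T) are pyrimidines


def classify(a: str, b: str):
    """Return 'transition', 'transversion', or None if the nucleotides are identical."""
    if a == b:
        return None
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


def changes(codon_from: str, codon_to: str) -> list:
    """List the types of substitution needed to convert one codon into another."""
    kinds = (classify(a, b) for a, b in zip(codon_from, codon_to))
    return [k for k in kinds if k is not None]


def reversion_barrier(mutant_codon: str, target_codons: list) -> list:
    """Shortest set of changes reaching any target codon (fewest transversions on ties)."""
    return min(
        (changes(mutant_codon, t) for t in target_codons),
        key=lambda ch: (len(ch), ch.count("transversion")),
    )


PRO_CODONS = ["CCU", "CCC", "CCA", "CCG"]

print("UCG ->", reversion_barrier("UCG", PRO_CODONS))  # Ser codon from the C -> U example
print("AGU ->", reversion_barrier("AGU", PRO_CODONS))  # engineered Ser codon
```

The Ser codon UCG is one transition away from Pro, while the Ser codon AGU is two transversions away, which is the higher genetic barrier to reversion that the text recommends engineering.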

In general, deletions revert at a much lower frequency than point mutations, and when appropriate for the question under study, a deletion should be introduced within the gene of interest to probe gene function in reverse-genetics studies. High mutation rates also imply that infection or transfection with debilitated mutant viruses may result in progeny with sequences that differ from the input. V. I. Agol and colleagues have coined the term quasi-infectious to refer to mutant viruses that are capable of yielding progeny, but the progeny differs from the initial genome (pseudorevertants) (Gmyl et al., 1993; Agol and Gmyl, 2018). The difference between the input mutant and the rescued progeny virus will depend on the type of genetic lesion in the input virus and its consequences for virus multiplication. A single point mutation that decreases replication is likely to evolve to yield a true revertant (return to the original sequence) upon replication. If the same reversion depends on two or more mutations, a true revertant will require extended replication for exploration of sequence space (Chapter 3), and selection of compensatory mutations elsewhere in the genome (sometimes referred to as second site revertants) becomes an alternative for fitness gain. The term compensatory applies to mutations that compensate for the deleteriousness of other mutations. A typical example is a mutation that decreases the stability of a stem in an RNA stem-loop that functions as a cis-acting element. A compensatory mutation restores a stable stem needed for the activity. Transfection of cells by an engineered virus with some preselected genetic modification (produced either from cDNA copies of a viral genome or by chemical synthesis) may yield progeny genomes, which differ from the parent. If a substantial loss of replicative capacity is produced by a drastic genetic change (an indel, loss of a stem-loop structure, etc.) selection of a true revertant becomes extremely unlikely. The compensatory generation of alternative structures (or constellations of point mutations) that restores replication (partially or completely) becomes an interesting and informative possibility.

Procedures to copy an entire viral RNA genome into a cDNA for reverse genetics studies are now available (Fan and Di Bisceglie, 2010). If for technical reasons an infectious cDNA clone is constructed from several molecules, which were copied from different genomes present in the mutant spectrum, the ligation product may be transcribed into an RNA, which is not infectious. This is because, while some constellations of mutations may be compatible with infectivity, others may not, or may allow limited, suboptimal replication, thus favoring the selection of additional mutations or reversions. The same applies to a synthetic genome based on one of the multiple genomic sequences from a viral isolate. Individual mutations may be detrimental either per se, or by the combined presence of other mutations. The joint effect of different mutations in the same genome is often referred to as epistasis. Mutations that reinforce each other with regard to a viral function are said to produce positive epistasis, and those that interfere with each other produce negative epistasis (also mentioned in Section 2.3). Epistasis in RNA viruses may be blurred by the weight of mutant spectra in determining viral behavior through intergenomic interactions (Chapter 3).
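One common way to put a number on epistasis, used here purely as an illustration, is to compare the fitness of a double mutant with the product of the single-mutant fitness values, all expressed relative to the wild type. The multiplicative null model, the function names, and the fitness values below are assumptions introduced for this sketch; the chapter itself defines positive and negative epistasis only qualitatively.

```python
def epistasis(w_a: float, w_b: float, w_ab: float) -> float:
    """Deviation of double-mutant fitness from the multiplicative expectation.

    All fitness values are relative to wild type (wild type = 1.0).
    """
    return w_ab - w_a * w_b


def classify_epistasis(w_a: float, w_b: float, w_ab: float, tol: float = 1e-9) -> str:
    """Label the sign of epistasis; values within tol of zero count as none."""
    e = epistasis(w_a, w_b, w_ab)
    if e > tol:
        return "positive"
    if e < -tol:
        return "negative"
    return "none"


# Hypothetical relative fitness values, for illustration only.
w_a, w_b = 0.8, 0.5  # two single mutants; multiplicative expectation is 0.4

for w_ab in (0.6, 0.4, 0.2):
    print(f"double mutant fitness {w_ab}: {classify_epistasis(w_a, w_b, w_ab)} epistasis")
```

Under this convention a double mutant that is fitter than expected from its two single mutations shows positive epistasis, and one that is less fit than expected shows negative epistasis.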

An interesting contrast that recapitulates concepts given in Sections 2.5 and 2.6 is the effect of an active proofreading-repair activity in maintaining the infectivity of a viral genome upon its extended replication in vitro (in a test tube, in the absence of cellular extracts). The 19,285 bp bacteriophage φ29 DNA can be amplified at least 4000-fold without detectable loss of infectivity due to the fidelity of φ29 DNA polymerase conferred by a 3′-5′ proofreading-repair exonuclease activity (Bernad et al., 1989). Engineered φ29 DNA polymerases provide a powerful amplification tool in genomics (de Vega et al., 2010). In contrast, the 4220-nucleotide-long Qβ RNA rapidly loses its infectivity when replicated by Qβ replicase in vitro due to the accumulation of mutations and deletions in the viral RNA (Mills et al., 1967; Sabo et al., 1977). The error-prone Qβ replicase is not adequate to amplify infectious viral RNA, but it was at the origin of the quasispecies concept to be discussed in Chapter 3. Mutagenic DNA polymerases (generally those involved in DNA repair) are an alternative to mutagenic PCR (Section 2.7) to produce randomly mutated collections of nucleic acid molecules (Forloni et al., 2018).

Recombination is the formation of a new genome by covalent linkage of genetic material from two or more different parental genomes (Fig. 2.7). Recombination can also involve different sites of the same genome to yield insertions or deletions, such as in the formation of defective interfering (DI) genomes. It is a widespread mechanism of genetic variation in all biological systems, and in cells, it underlies critical physiological and developmental processes (splicing, generation of diversity in immunoglobulin genes and T cell receptors, transposition events, phase variation in bacteria, repair pathways that promote postreplicative error correction, etc.). Cellular DNA recombination relates to replication, repair, and completion of DNA replication, operations that involve multiple proteins displaying a variety of activities (Smith and Jones, 1999; Alberts et al., 2002; Nimonkar and Boehmer, 2003; Friedberg et al., 2006).

FIGURE 2.7 RNA recombination and segment reassortment. (A) Scheme of replicative and nonreplicative RNA recombination. RNA polarity is indicated by +, − symbols. Replicative recombination is displayed as the result of template switching during minus-strand RNA synthesis. Nonreplicative recombination is depicted as the outcome of breakage and ligation (joining) of fragments of plus-strand RNA. (B) An example of genome segment reassortment with the formation of a new segment constellation in which six genomic segments originate from one parent (blue) and two from the other (gold). Influenza virus is the best-known example (see text).

Recombination occurs in both DNA and RNA viruses, often with the participation of the virus replication machinery. Several types of recombination have been distinguished in viruses: homologous versus nonhomologous recombination, according to the extent of nucleotide sequence identity around the recombination (crossover) site, and replicative versus nonreplicative recombination, according to the requirement of viral genome replication for recombination to occur (Kirkegaard and Baltimore, 1986; King, 1988; Lai, 1992; Nagy and Simon, 1997; Plyusnin et al., 2002; Boehmer and Nimonkar, 2003; Gmyl et al., 2003; Chetverin et al., 2005; Agol, 2010; Simmonds, 2010; Bujarski, 2013; Perez-Losada et al., 2015; Agol and Gmyl, 2018; Bentley and Evans, 2018).

As in the case of cells, homologous recombination in double-stranded DNA viruses is intimately connected with DNA replication and repair. It implicates multiple viral gene products (DNA polymerase, single-stranded DNA-binding proteins, processivity factors, helicase-primase, eukaryotic topoisomerase I, etc.), and a succession of protein-catalyzed steps (Czarnecki and Traktman, 2017). In the copy choice (or template switching) mechanism, the nascent DNA switches from one template molecule to another, resulting in the synthesis of recombinant daughter DNAs. In its basic form, recombination by breakage and rejoining starts with the introduction of a nick at one of the strands of each parental DNA, and proceeds through strand invasion of one parental DNA by the other, branch migration, ligation at the nicks (linking DNA strands from the two parents), and further isomerization and cleavage reactions. DNA recombination is responsible for the endonuclease-mediated isomerization of herpesvirus genomes [four isomers defined by the orientation of the long (L) and short (S) regions of the viral genome]. During the late phase of herpes simplex virus-1 replication, the frequency of recombination has been estimated at 0.6% per kb of the genome. Integration or excision of proviral DNA or temperate bacteriophage DNA are examples of site-specific recombination that involves specific enzyme activities (e.g., retroviral integrases), and requires a short stretch of nucleotide sequence identity.

The copy choice mechanism of homologous RNA recombination is also associated with genome replication. An RNA polymerase molecule with its nascent RNA product jumps to the corresponding position of another template molecule to complete synthesis of the RNA product (Fig. 2.7). Given the large numbers of viral genomes often present in replication complexes (also termed replication factories) in each infected cell, it is not surprising that this mechanism may give rise to frequent recombinant progeny genomes. Formation of mosaic genomes has long been recognized as an essential feature of the genetics of some retroviruses and plant RNA viruses. For HIV-1 and some plant RNA viruses, recombination frequencies have been estimated at 2%–10% of progeny per 100 nucleotides; for picornaviruses and coronaviruses, the number of recombinants amounts to 10%–20% of the progeny (King, 1988; Lai, 1992; Nagy and Simon, 1997; Levy et al., 2004; Urbanowicz et al., 2005; Sztuba-Solinska et al., 2011). Using a phylogenetic approach, the average recombination rate of HIV-1 in vivo was estimated at 1.4 × 10⁻⁴ recombination events/site/generation, which is about fivefold greater than the average point mutation rate (Shriner et al., 2004). A tenfold lower value of 1.4 × 10⁻⁵ recombination events/site/generation was estimated from the changes in the genetic composition of HIV-1 within single patients (Neher and Leitner, 2010). Recombination is required for HIV-1 replication and genome integrity (Rawson et al., 2018). In negative-strand RNA viruses, recombination may be inefficient or absent, but some of them can display homologous recombination (Plyusnin et al., 2002), and a high rate of generation of DI RNAs and other types of defective genomes (Roux et al., 1991; Rezelj et al., 2018).
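The per-site rates quoted above become easier to compare when converted into expected events per genome. The sketch below does this arithmetic for the two HIV-1 estimates. The genome length of about 9700 nucleotides is an approximation that does not appear in the text, and the point mutation rate is only the value implied by the "about fivefold" statement, not a separately measured figure.

```python
HIV1_GENOME_LENGTH = 9_700  # nucleotides; approximate value, an assumption not given in the text

# Recombination rates in events/site/generation, as quoted in the text.
estimates = {
    "Shriner et al., 2004 (phylogenetic approach)": 1.4e-4,
    "Neher and Leitner, 2010 (within single patients)": 1.4e-5,
}


def events_per_genome(rate_per_site: float, genome_length: int) -> float:
    """Expected number of events per genome per generation."""
    return rate_per_site * genome_length


for source, rate in estimates.items():
    expected = events_per_genome(rate, HIV1_GENOME_LENGTH)
    print(f"{source}: {expected:.2f} events/genome/generation")

# Point mutation rate implied by "about fivefold greater" (derived, not quoted).
implied_mutation_rate = 1.4e-4 / 5
print(f"implied point mutation rate: {implied_mutation_rate:.1e} per site per generation")
```

With these inputs, the higher estimate corresponds to roughly one recombination event per genome per generation, and the lower estimate to roughly one event per ten genomes.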

Recombination frequency may be altered by environmental factors that affect viral replication. A decrease of intracellular nucleotide levels as a result of treatment of cells with hydroxyurea may favor template switching, reflected in an increase of intra- and intermolecular recombination (Pfeiffer et al., 1999; Svarovskaia et al., 2000). Homologous RNA recombination can also be influenced by amino acid substitutions in the polymerase, the primary sequence in the RNA (e.g., high frequency of template switching in AU-rich regions), the sequence identity between the nascent strand and acceptor template, and secondary structures at or around the crossover sites, among other influences (Nagy and Simon, 1997; Agol, 2010; Agol and Gmyl, 2018). Since recombination necessitates coinfection of the same cell by at least two parental genomes, the persistence of a viral genome in a cell increases the likelihood of sequential coinfections, unless some reinfection or superinfection exclusion mechanism operates [(Webster et al., 2013) and references therein]. Without such restrictions, persistently infected cells may be an environment with a higher probability of recombination than transiently infected cells, assuming comparable genome loads at the sites of replication.

Nonreplicative recombination does not require replication of the viral genome, and has been described upon cotransfection of cells with viral RNA fragments that could not replicate by themselves (Gmyl et al., 2003; Gallei et al., 2004; Agol, 2010; Agol and Gmyl, 2018; Bentley and Evans, 2018). It appears to be a promiscuous event with a required 3′-phosphate in the 5′ partner RNA and a 5′-hydroxyl residue in the 3′ partner RNA, mediated by cellular components whose mechanisms of activity are not understood.

The emerging picture is that the frequency of recombination varies among viruses, and that as new tools for genome analyses have become available, recombination has been detected in an increasing number of viruses. Recognition of recombination in a viral system is facilitated when a cell culture system is available. Controlled infection of cells with genetically marked parental viruses has been essential to estimate recombination frequencies, and to distinguish true recombination from mutation-reversion events that may mimic the formation of recombinants. As with the concept of high genetic variation in RNA viruses, recombination has often gone from being considered marginal to prominent and relevant; HCV is a typical example (Galli and Bukh, 2014), with presently at least one chimera established as a circulating recombinant form in the field.

Viral replicative machineries may be endowed with features that influence the occurrence of recombination. One such feature is processivity of the viral polymerase (capacity of continued copying of the same template molecule). Detachment of the polymerase complex from one genome to bind either to a different genome or to a distant site of the same genome is part of the standard replicative cycle of viruses such as retroviruses and coronaviruses. Reverse transcriptase participates in strand transfer during DNA synthesis, and coronavirus polymerase switches from one template site to another during discontinuous RNA synthesis. It may be significant that they belong to viral families displaying high recombination frequencies (Makino et al., 1986). Thus, here again, we encounter a "molecular instruction" that evolved as an essential feature of viral genome replication, and that can be exploited to generate variation, and permit new genomic forms to undergo the scrutiny of selection (Neher and Leitner, 2010).

Recombination is diagnosed by discordant positions of different genes or genomic regions in phylogenetic trees, as a result of the transfer of part of a viral genome from representatives of one lineage to representatives of another lineage (Chapter 7). A commonly used procedure measures similarity values between sequences using a sliding-window scanning method. The recombination crossover point (where the two parental sequences meet) is identified by the point (or region) where the similarity plot crosses from one sequence into another (Salemi and Vandamme, 2004; Martin et al., 2005; Kosakovsky Pond et al., 2006; Perez-Losada et al., 2015). Crossover points along a viral genome are not distributed at random, either because polymerase detachment from the template is sequence-dependent or because many of the recombination events do not lead to viable progeny. Absence of recombinant viability introduces a parallel with the distinction between mutation rate and mutation frequency (Section 2.5); that is, a difference between what does occur at the biochemical level during replication and what is subsequently observed upon analysis of the replication products. Newly arising recombinants may be subjected to negative selection, and only a viable subset might be detectable in the progeny virus (King, 1988; Lai, 1992). An elegant study by D. J. Evans and colleagues documented "imprecise" enterovirus recombinant intermediates that were lost upon serial virus passage (Lowry et al., 2014). Recombination is viewed as a biphasic process consisting of initial imprecise events followed by a stage of resolution in favor of fit recombinants.
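The sliding-window idea can be sketched in a few lines. The example below is a toy illustration, not the procedure of any of the cited programs. It assumes that the three sequences are already aligned and of equal length, uses plain percent identity as the similarity measure, and works on synthetic sequences. The function names, window size, and step are hypothetical choices.

```python
def identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity_scan(query, parent_a, parent_b, window=20, step=5):
    """Return (window_midpoint, identity_to_A, identity_to_B) for each window."""
    if not (len(query) == len(parent_a) == len(parent_b)):
        raise ValueError("sequences must be pre-aligned and of equal length")
    rows = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        rows.append((start + window // 2,
                     identity(query[start:end], parent_a[start:end]),
                     identity(query[start:end], parent_b[start:end])))
    return rows


def crossover_regions(rows):
    """Pairs of window midpoints between which the more similar parent changes."""
    regions = []
    last_mid, last_sign = None, 0
    for mid, id_a, id_b in rows:
        sign = (id_a > id_b) - (id_a < id_b)
        if sign == 0:
            continue  # window equally similar to both parents: uninformative
        if last_sign != 0 and sign != last_sign:
            regions.append((last_mid, mid))
        last_mid, last_sign = mid, sign
    return regions


# Synthetic parents that differ at every fifth position (illustration only).
parent_a = "A" * 100
parent_b = "".join("C" if i % 5 == 0 else "A" for i in range(100))
recombinant = parent_a[:50] + parent_b[50:]  # true crossover at position 50

print(crossover_regions(similarity_scan(recombinant, parent_a, parent_b)))
```

The scan reports an interval rather than a single position, because the crossover can only be placed between the closest informative windows on either side. Real analyses add a proper alignment step and statistical support for each breakpoint, which this sketch omits.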

The distinction between generation and resolution events that applies both to mutants and recombinants has yet another implication for RNA virus genetics. Some mutants or recombinants that in isolation do not exhibit sufficient replicative fitness to acquire dominance in a population may nevertheless persist as minority genomes. They may display low-level replication or be maintained by complementation by partner genomes (as in the case of two FMDV genome segments that are described in Section 2.11). As minority genomes, they may engage in modulatory activities (Chapter 3).

In viruses whose genomes are composed of two or more RNA or DNA segments, genome segment reassortment consists of the formation of new constellations of viral genomic segments from two or more parental genomes (McDonald et al., 2016) (Fig. 2.7). Reassortment can produce new phenotypic traits. It is the main mechanism of antigenic shift of influenza A viruses, often associated with new influenza pandemics (Webster et al., 1992; Morse, 1994; Gibbs et al., 1995; Domingo et al., 2001), as opposed to antigenic drift, which is mediated by amino acid substitutions in the surface proteins hemagglutinin and neuraminidase (Barbezange et al., 2018). Reassortments occur among the 9–12 double-stranded RNA segments of the widespread Reoviridae family (Tanaka et al., 2012). Fitness differences among all possible segment combinations (2ⁿ, for two types of coinfecting particles with n genome segments) determine the types of genome segment groupings that dominate subsequent rounds of infection. In the laboratory, analysis of reassortant viruses has been applied to map a viral function into one segment or a combination of segments.
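The 2ⁿ count can be checked by direct enumeration. The sketch below lists every way of drawing each of n segments from one of two coinfecting parents, using n = 8 as for influenza A virus. The parent labels and the function name are hypothetical, and the enumeration says nothing about which constellations are viable.

```python
from itertools import product


def segment_constellations(n_segments: int, parents=("A", "B")):
    """All ways of drawing each of n segments from one of the coinfecting parents."""
    return list(product(parents, repeat=n_segments))


n = 8  # number of genome segments, as in influenza A virus
constellations = segment_constellations(n)

# Constellations in which every segment comes from the same parent.
parental = [c for c in constellations if len(set(c)) == 1]

# Constellations with six segments from parent A and two from parent B,
# the pattern drawn in Fig. 2.7B.
six_two = [c for c in constellations if c.count("A") == 6]

print(len(constellations), len(parental), len(six_two))
```

Of the 256 combinations, two are the parental genomes and the remaining 254 are reassortants. Fitness differences among them, as the text notes, decide which ones dominate later rounds of infection.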

Genomic segments can be encapsidated either into a single virus particle (as in Orthomyxoviruses or Arenaviruses) or into separate particles (as in multipartite plant viruses). A multipartite virus can have either RNA or DNA as genetic material. The plant Nanoviruses have 6–8 molecules of single-stranded circular DNA of about 900–1000 nucleotides, and each segment encodes a single protein. In the case of the nanovirus Faba bean necrotic stunt virus, its eight segments vary in frequency in a host-dependent manner (Sicard et al., 2013). This observation led the authors to propose a "setpoint genome formula," which may reflect the control of segment (gene) copy number that may provide some still unrecognized benefit to the multipartite phenotype (see Section 2.11). In principle, replication of multipartite viruses requires that each cell be coinfected by at least one of each type of particle harboring a different genome type, which in fact represents a remarkable cost for replicative efficiency. The fact that unsegmented and segmented RNA viruses are well represented in our biosphere suggests that neither of the two organizations confers a definitive and general advantage for long-term survival.
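The cost of requiring one particle of each type per cell can be illustrated with a simple probability calculation. The sketch below rests on assumptions that are not in the text: the number of particles of each type entering a cell follows an independent Poisson distribution, and all types have the same mean (the multiplicity of infection per type). The function name and the chosen values are hypothetical.

```python
import math


def p_complete_set(n_segment_types: int, moi_per_type: float) -> float:
    """Probability that a cell receives at least one particle of every type.

    Assumes independent Poisson-distributed particle counts with the same
    mean (`moi_per_type`) for each particle type.
    """
    p_at_least_one = 1.0 - math.exp(-moi_per_type)
    return p_at_least_one ** n_segment_types


# n = 1 stands for a virus with all segments in one particle;
# n = 8 matches the eight segments of Faba bean necrotic stunt virus.
for n in (1, 2, 8):
    for moi in (0.1, 1.0, 5.0):
        print(f"n = {n}, MOI per type = {moi}: P(complete set) = {p_complete_set(n, moi):.4f}")
```

Under these assumptions, the probability of a productive infection falls steeply with the number of particle types unless the multiplicity of infection is high, which is one way to read the "remarkable cost for replicative efficiency" mentioned above.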

The origin of viral genome segmentation is a debated issue, although there is general agreement that it may confer adaptive flexibility to viruses. Most proposals have been based on theoretical studies. Segmentation has been viewed as a form of sex that facilitates genomic exchanges to counteract the effect of deleterious mutations (Chao, 1988; Szathmary, 1992). An alternative, not mutually exclusive model is that segmentation confers an advantage because replication of shorter RNA molecules is completed earlier than that of the unsegmented counterparts (Nee, 1987). Yet another possibility is that the lifestyle of a virus (in particular, the particle yield in connection with the number of surrounding susceptible cells), shaped over long evolutionary periods, may favor segmentation over intactness of a genetic message or vice versa.

An experimental system of genome segmentation is available with the picornavirus foot-and-mouth disease virus (FMDV). Its single-stranded RNA genome underwent a modification akin to genome segmentation when the standard virus was subjected to 260 passages in BHK-21 cells at high multiplicity of infection. The experiment was originally intended to investigate the limits of fitness gain following prolonged multiplication in a defined environment, in this case, BHK-21 cells in culture. The starting FMDV had not been well adapted to the BHK-21 cell culture environment since it derived from a diseased swine during a disease outbreak, and it was minimally propagated in BHK-21 cells to obtain a biological clone by plaque isolation. Upon extensive replication of the clone in BHK-21 cells, the virus evolved toward a bipartite genome (García-Arriaza et al., 2004). Each of the two pieces of RNA that composed the bipartite (or segmented) genome version contained in-frame deletions affecting trans-acting proteins (Fig. 2.8). Each segment in isolation could not infect cells productively, but, when present together, they were infectious by complementation, and killed cells in the absence of standard FMDV. A low multiplicity of infection rapidly selected the full-length genome as a result of recombination of the two parental, defective segments (García-Arriaza et al., 2004). The particles containing the shortened RNA were thermally more stable than the standard particles (Ojosnegros et al., 2011), but this difference did not explain the initial trigger of the segmentation event. The solution to this question came with the demonstration that the transition toward genome segmentation was possible because of an extensive exploration of the mutational sequence space by the standard virus. 
Indeed, the mutations that accumulated during serial passages enhanced the fitness of the segmented genome version to a much higher extent than the fitness of the standard genome (Moreno et al., 2014) (Fig. 2.8). Gradual evolution (drift in sequence space) was therefore a requirement for the major transition toward segmentation, which adds reassortment to mutation and recombination as potential mechanisms of genetic variation of this laboratory-adapted picornavirus. Cooperation and complementation are discussed in Section 3.8 of Chapter 3.

It should be noted that segmented forms of RNA viruses have been engineered, but little is known of the incurred fitness cost; it is relevant to have shown that segmentation was possible upon unperturbed replication of a virus with an unsegmented RNA genome. The experimental result suggests that in evolution there is no insurmountable barrier to the conversion between intact and split forms of the same genome, reflecting remarkable genome flexibility that will be emphasized in Chapter 7 in the context of the relevance of virus variation in the emergence of viral pathogens.

Mutation, recombination, and segment reassortment contribute to the evolution of most DNA and RNA viruses. Sometimes one form of genetic change appears to be more prominent than another, and sometimes the concerted action of recombination or reassortment with mutation is apparent [e.g., antigenic drift in influenza virus, following the origin of a new antigenic type through reassortment (Ghedin et al., 2005)]. Mutation is a universal form of genetic change. It underlies numerous adaptive responses and critical biological transitions in viruses, and it is a prerequisite for recombination and reassortment to have a biological impact. If mutations were not present in different template molecules during replication, recombinants with the crossover point at equivalent positions of the parental genomes would be "silent," and display the same behavior as the parental genomes. Apparently, "silent" recombination events may take place within replicative units; even if some mutations distinguished individual genomes of the same quasispecies swarm, a recombinant would not be distinguished from a mutant genome. The frequency of recombination in HIV-1 was noticed only when the acquired immune deficiency syndrome (AIDS) pandemic had advanced, and the virus had diversified through the accumulation of mutations. Similar arguments apply to segment reassortment. Genomes necessitate mutation-driven diversification for reassortment to provide a biological difference; detection of a reassortant will be easier the larger the replicative advantage it confers to the virus (see also Chapter 10).

FIGURE 2.8 Evolution toward RNA genome segmentation in the laboratory. The monopartite, standard FMDV genome (clone C-S8c1 or pMT28, top) was subjected to 260 passages in BHK-21 cells. The resulting population p260 lacked detectable standard genome, which could be rescued by low-MOI passages. The evolved C-S8p260 accumulated 30 point mutations (depicted as vertical lines on the genome at the bottom) and consisted of two segments that were infectious by complementation: Δ417, which lacked most of the L protease-coding region, and Δ999, which lacked most of the region coding for the capsid proteins VP3 and VP1. See text for further details and references.

The evolutionary significance of recombination has been viewed in two opposite ways: as a means to rescue fit genomes from less fit parents (a conservative force that eliminates deleterious mutations), or as a means to explore new genomic forms for adaptive potential (a vast substrate for the exploration of sequence space; Chapter 3) [reviewed in (Zimmern, 1988; Lai, 1992; Worobey and Holmes, 1999; Simmonds, 2010; Perez-Losada et al., 2015)]. Recombination has probably been at the origin of new viruses that presently occupy a well-established niche, and it is also at play today to expand diversity during the spread of viruses. As a historical event, the coronavirus mouse hepatitis virus appears to have acquired its hemagglutinin-esterase gene by recombination with an influenza C virus. The alphavirus Western equine encephalitis virus probably originated by recombination between Sindbis-like and Eastern equine encephalitis-like viruses [reviewed in different chapters of ].

Several recent poliomyelitis outbreaks have been associated with recombinants between oral poliovirus vaccine (OPV) viruses and other circulating enteroviruses (Gavrilin et al., 2000; Kew et al., 2002; Oberste et al., 2004; Muslin et al., 2015). Intersubtype HIV-1 recombinants play a key role in current HIV-1 diversification, with around 100 circulating recombinant forms (and the number is growing) displaying complex mosaic structures (multiple crossover sites) (Thomson et al., 2002; Gerhardt et al., 2005). In addition, other HIV-1 recombinants have been characterized that are not established epidemiologically. Fewer HCV recombinants have been identified, but the number is likely to increase as the virus diversifies in nature. Positive selection of HIV-1 recombinants that unite different drug-resistance mutations in the same genome offers an example of the conservative force of recombination to rescue fit viruses in the face of a strong selective constraint (Menéndez-Arias, 2002). Recombination is expected to play an increasing role in the spread of drug resistance among viruses for which new antiviral agents are in use, such as HBV and HCV.

Some defective DNA and RNA genomes that include indels, notably DI RNAs, which originate from recombination events, can play an important role in the establishment and maintenance of persistent infections in cell culture, and can modulate viral infections in vivo (Holland and Villarreal, 1974; Roux et al., 1991; Rezelj et al., 2018). Detailed genetic and biochemical analyses by A. Huang, J.J. Holland and their colleagues on the generation of VSV DIs and their interplay with the standard, infectious VSV helped to unveil a continuous dynamics of genetic variation, competition, and selection, observable within short time intervals, a hallmark of RNA genetics (Palma and Huang, 1974; Holland et al., 1982), fully confirmed by application of new sequencing techniques. DI particles and defective genomes are present in populations of positive- and negative-strand RNA viruses as they multiply in their natural hosts (Nüesch et al., 1989; Drolet et al., 1995; Li et al., 2011; Saira et al., 2013; Ke et al., 2013; Rezelj et al., 2018). Their widespread presence in vivo may mean that they are an unavoidable side-product of the replication machineries (i.e., instruction to recombine) or that selection might have favored their generation. Both possibilities are compatible. An instruction whose result is a means to modulate replication of the corresponding standard viruses or the antiviral immune response will be selected as a consequence of its biological effects.

DI RNAs can be regarded as the tip of the iceberg of many classes of defective genomes with a range of interfering or potentiating capacities that may coexist with standard animal, plant, insect, and bacterial viruses, and that may facilitate persistence and modulate disease symptoms (Holland et al., 1982; Vogt and Jackson, 1999; López-Ferber et al., 2003; Rosario et al., 2005; Sachs and Bull, 2005; Villarreal, 2005; Aaskov et al., 2006; Rezelj et al., 2018). Noncytopathic coxsackievirus B3 (CVB3) variants with deletions at the 5′ untranslated genomic region were isolated from hearts of mice inoculated with CVB3. The variants replicated in vivo and were associated with long-term viral persistence (Kim et al., 2005).

Despite the continuous dynamics of the escape of infectious virus from the interfering activities of DIs, some authors consider DIs as potential antiviral agents (Dimmock and Easton, 2014). If defective genomes are competent in RNA (or DNA) synthesis or are complemented to replicate, they can act as dominant-negative swarms, provided they reach a sufficient load. In this manner, they may underlie the suppressive effects of mutant spectra of viral quasispecies. Intramutant spectrum decrease of replicative capacity due to the presence of defective genomes is one of the mechanisms of virus extinction evoked by enhanced mutagenesis (Chapters 3 and 9).

Recombination events must have been the last step in ancestral processes of horizontal gene transfer that mediated the incorporation of host genes (or gene segments) into viral genomes, and vice versa. Host genes related to immune responses were probably captured by complex DNA viruses at early stages of their evolution (Alcami, 2003; McFadden, 2005). Mosaicism associated with nonhomologous recombination events is the norm among tailed bacteriophages (Canchaya et al., 2003). Nonhomologous recombination can give rise to genomic sequences with a viral and a nonviral moiety. They include DI RNAs of Sindbis virus containing cellular RNA sequences at their 5′ ends, some cytopathic forms of bovine viral diarrhea virus, RNA of potato leafroll virus containing tobacco chloroplast RNA, or an influenza virus with an insertion of ribosomal RNA into the hemagglutinin gene, mentioned in Chapter 5 regarding transient, high fitness levels [reviewed in (Domingo et al., 2001)].

Phylogenetic analyses have suggested that recombination between RNA and DNA viruses might have occurred to give rise to some present-day single-stranded DNA viruses (Stedman, 2013). However, the evidence for this attractive possibility is indirect and, to my knowledge, no experimental evidence of viral RNA-viral DNA recombination in cell culture or in vivo has been reported. Viability of mutant and recombinant viral genomes is severely constrained by the evolutionary history of the virus that has shaped viral genomes as coordinated sets of modules (Botstein, 1980, 1981; Zimmern, 1988; Koonin and Dolja, 2014). Experimental studies with engineered recombinant viruses have shown that modularity can restrict recombination (Martin et al., 2005).

The three molecular mechanisms of viral genome variation (mutation, recombination, and reassortment) are not incompatible, although it may sometimes be difficult to discern their occurrence (Varsani et al., 2018) . It would be truly remarkable if a viral system could be proven to be totally devoid of one of the mechanisms of genetic variation. It would imply that there are powerful molecular reasons to dispense with an effective adaptive mechanism. Absence of a mechanism is extremely difficult to demonstrate but, if we could, its basis would open a new chapter of molecular virology.

There is an ongoing controversy regarding clonality versus nonclonality in biological evolution, particularly concerning the evolution of cellular parasites (Heitman, 2010; Tibayrenc and Ayala, 2012, 2014; Ramirez and Llewellyn, 2014; Hauser and Cushion, 2018). Clonality means asexual progeny from a single ancestor. In the case of viruses, clonal evolution emphasizes reproduction without the exchange of genetic material between two or more parental genomes. Sexual reproduction necessarily involves the exchange of genetic material. The question for viruses is interesting because we would be inclined to propose clonal evolution despite considerable promiscuity of recombination and reassortment. A tentative solution was offered based on one assumption and some experimental observations. The assumption is that recombination is not a requirement for viruses to complete their infection cycles. Despite the possibility that recombination might be imprinted in the replication apparatuses (rendering it inherent to replication), its occurrence is not a necessity. The experimental observations are of two sorts. One is that historically recombination and reassortment have been at the origin of the emergence and reemergence of viral pathogens [western equine encephalitis virus, pandemic influenza viruses, emergent circulating poliovirus, HIV-1 and HCV recombinants, etc. (Section 2.12)]. The second observation is that recombination-based genome segmentation can occur given the adequate population dynamics and competitive environment, as documented with FMDV (Section 2.11). In consequence, the distinction was made between mechanistically unavoidable but biologically irrelevant recombination, and meaningful recombination (Perales et al., 2015). The latter form of recombination requires prior diversification of parental genomes by mutation and a number of cellular and epidemiological conditions. 
Despite its relevance to evolutionary transitions and viral emergence, it is not a requirement for virus survival, propagation, and evolution. This marks a contrast with the genomic exchanges associated with sexual reproduction. The proposal is that viruses evolve clonally at widely different time scales (intrahost or within-host evolution vs. long-term evolution at the epidemiological level). Similar arguments apply to mutation. This point will be addressed again in the closing Chapter 10, concerning implications of clonality.

All forms of genetic variation of viruses must be viewed essentially as blind processes despite preferences of nucleotide sequences or structures for mutation and recombination events: hot spots with higher than average rate values, and cold spots with lower than average rate values. Mutation originates largely in fluctuations of electronic structure that modify base-pairing properties, and from features of polymerase-template interactions, not subject to regulation, in the sense that we understand the regulation of gene expression or enzymatic activity. Absence of regulation is not incompatible with long-term evolution having shaped the molecular interactions that yield a level of mutagenesis compatible with survival and adaptability. Given the biological consequences of mutation rates, many additional studies are needed across the biological phyla, to quantify not only basal mutation rates but also the possible presence of mutator strains.

Similar arguments can be used for recombination and reassortment. The number of segments that enter a new genomic constellation may be regulated but not which variant forms of the individual segments will make up the new viral particles. It is short-term selection acting at the very center of replication and recombinant complexes that preserves some mutant and recombinant forms to the detriment of others. Subsequent levels of selection occur when variant forms expand in multiple rounds of infection first within cells, then within an organism and then at the epidemiological level. The very nature of life on our planet has been built upon an inherent tendency to instruct variation in an incessant fashion, as necessary and unavoidable as the physical principles that dictate the behavior of our universe.

The net result of all mechanisms of genetic variation available to a virus is the generation of repertoires of variant genomes for random drift and selective forces to act upon. In other terms, genetic variation sets the scene for the actors of evolution to play their roles, and secure a continuous input of new forms despite subtle or catastrophic environmental perturbations. The same forces that drive general evolution have produced the dominant virus forms we see in nature, with all their nuances in the interaction with cell components. The adaptation of viruses to participate in intracellular processes with cells dictates that genetic variation of viruses has its limits to prevent deleteriousness. This is currently exemplified by the effects of amino acid substitutions in viral polymerases that either increase or decrease template-copying fidelity. Viruses have reached a compromise between the stability of core information and flexibility for adaptability.

Although not yet treated in this chapter, viral population numbers are a key parameter in evolutionary events. The next chapters address some of these questions, not only in general conceptual terms but also in the way evolution affects our daily confrontation with viral disease (see Summary Box).

• Mutation, recombination, and genome segment reassortment are the mechanisms of genetic variation used by DNA and RNA viruses. Mutations are due mainly to changes in the electronic distributions of the standard nucleotides, to damage of nucleotides by external influences, and to alignment alterations of the template relative to product polynucleotide chains. The effect of a mutation can range from being well-tolerated to highly detrimental or lethal.
• Mutation frequencies are only an indirect consequence of mutation rates. Their values for viruses whose replication is catalyzed by polymerases devoid of proofreading-repair activity are 10^5- to 10^6-fold higher than those displayed by replicative cellular DNA polymerases. Error-prone replication is a hallmark of RNA viruses and some DNA viruses. The larger the amount of genetic information encoded in a viral genome, the lower the mutation rate must be to maintain the genetic message.
• Several mechanisms of genetic recombination have been described for DNA and RNA viruses. The best characterized is homologous recombination, whose frequency of occurrence is dependent on the replicative machinery, in particular, polymerase processivity. Genome segment reassortment is operative in segmented genomes, and it gives rise to biologically relevant changes, such as antigenic shift in the influenza type A viruses.
• Studies with foot-and-mouth disease virus have shown that an unsegmented RNA genome, upon extensive evolution, can undergo a recombination-mediated transition akin to genome segmentation. Therefore, segmented and unsegmented forms of RNA viruses need not be considered as completely unrelated classes of genome organization.
• Recombination and genome segment reassortment have been viewed as conservative forces to rescue viable genomes from a damaged pool, and also as a means to explore new genomic compositions that deviate from their parents.
All forms of genetic variation give rise to repertoires of variant genomes on which selection and random drift act to produce the viral forms that we isolate and study.
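
The inverse relationship between genome size and tolerable mutation rate noted in the Summary Box can be illustrated with a simple calculation. The sketch below is not taken from the chapter: it assumes that copying errors are independent and occur at the same per-nucleotide rate at every site (ignoring hot spots and cold spots), and the rates and genome lengths are illustrative values chosen only to show the trend. The function names are hypothetical.

```python
import math

def mutations_per_copy(mu: float, genome_length: int) -> float:
    """Expected number of mutations introduced per genome copied."""
    return mu * genome_length

def error_free_fraction(mu: float, genome_length: int) -> float:
    """Probability that a newly copied genome carries no mutation.

    Assumes independent errors with a uniform per-nucleotide rate `mu`.
    """
    return (1.0 - mu) ** genome_length

# Illustrative (assumed) values, not measurements from the chapter.
examples = {
    "10 kb genome, mu = 1e-4": (1e-4, 10_000),
    "30 kb genome, mu = 1e-4": (1e-4, 30_000),
    "30 kb genome, mu = 1e-6": (1e-6, 30_000),
}

for label, (mu, length) in examples.items():
    print(
        f"{label}: {mutations_per_copy(mu, length):.2f} mutations per copy, "
        f"{error_free_fraction(mu, length):.1%} error-free copies"
    )
```

Under these assumptions, tripling the genome length at a fixed error rate sharply reduces the proportion of progeny genomes that are exact copies of the template, whereas lowering the error rate restores it. This is a qualitative illustration only; it does not model selection, fitness effects of mutations, or recombination.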
