key: cord-1049194-g768fi93
authors: Domingo, Esteban
title: Long-Term Virus Evolution in Nature
date: 2015-10-01
journal: Virus as Populations
DOI: 10.1016/b978-0-12-800837-9.00007-1
sha: 3f54a73bbb6fc8e974a1dd3fb30dde2fe8633801
doc_id: 1049194
cord_uid: g768fi93

Viruses spread to give rise to epidemics and pandemics, and some key parameters that include virus and host population numbers determine virus persistence or extinction in nature. Viruses evolve at different rates depending on the polymerase copying fidelity during genome replication. Calculated rates of evolution in nature vary depending on the time interval between virus isolations. In particular, intra-host evolution is generally more rapid than inter-host evolution, and several possible mechanisms for this difference are considered. The mechanisms by which the error-prone viruses evolve render very unlikely the operation of a molecular clock (constant rate of incorporation of mutations in the evolving genomes). Several computational methods are reviewed that permit the alignment of viral sequences and the establishment of phylogenetic relationships among viruses. The evolution of viruses in the form of dynamic mutant clouds in each infected individual, together with multiple environmental influences, renders the emergence and reemergence of viral pathogens an unpredictable event, another example of biological complexity.

Intrahost virus replication and evolution are the first steps in the process of virus diversification that continues with successive virus transmission events that are a condition for long-term survival in nature.

Viruses are perpetuated as a consequence of many rounds of persistent or acute infections, with possible extracellular stages in which genomes remain basically invariant. Despite lacking direct evidence, we presume that multitudes of successive transmissions have allowed viruses to survive at least for thousands of years, probably undergoing continuous genetic change. Picornavirologists are familiar with an Egyptian stela dated 1550-1333 B.C. (18th Egyptian dynasty) that portrays the image of a man with an atrophic leg probably a consequence of infection with poliovirus (PV) or a related virus (Eggers, 2002) . In this chapter we deviate from the focus on how viral population numbers affect short-term survival and evolution, and we turn to features of viruses as they infect successive hosts to persist in nature. Viruses can be transmitted vertically or horizontally. Vertical transmission occurs from parental organisms to their offspring, and it includes infection through the germ line in animals and plants, from the mother to the embryo during fetal development, and also postnatal transmission to the newborn via blood, milk, or contact (Mims, 1981; Nash et al., 2015) . In the horizontal transmission, a virus spreads from infected individuals to susceptible recipients. This is the type of transmission we are most familiar with. It frequently gives rise to disease outbreaks (infection episodes localized in space and time that affect a few individuals), epidemics (that affect an ample geographical area and are often extended in time), and pandemics (that affect most areas of our planet), typically the periodic influenza pandemics.

All transmission modes have probably contributed to the maintenance of viruses in our biosphere. Persistent infections are likely to have played a major role when the number of individual humans or animals living in close contact was limited throughout the preagricultural era, earlier than 10,000 years ago. A favorable climate change during the Holocene (the geological epoch that began at the end of the Pleistocene, around 12,000 years before present; compare with Chapter 1) was probably an important driver toward large-scale domestication of plants and animals, 10,000-7000 years B.C. From the behavior of current viruses, there might have always been a dynamics of virus change for adaptability within individual hosts, and transmissions among animals or plants. The probability of transmission increased as host population numbers rose with agricultural practices and urban life in the last several thousand years. Intensive agriculture must have contributed to accelerated sequence space exploration by viruses with consequences for the emergence of viral disease (Section 7.7). There has been probably a continuous dynamics of viral emergences, reemergences, and extinctions with patterns that may be parallel to those observed with present-day viruses. Virology has existed as an organized scientific discipline with the possibility to isolate, store, and study viruses only for about one century. The challenge to reconstruct the events that might have led to viruses similar to the ones we isolate today was addressed in Chapter 1, with a critical first question being if viruses originated 4000 million years ago, or "only" 2000 million years ago (diagrams in Figures 1.3 and 1.4, and Section 1.5 in Chapter 1). In this chapter, we are more modest in our aspirations and we will analyze with the tools of genomics what happens when viruses evolve for months or years in what we call interhost virus evolution.

Unfortunately viruses have not left any fossil record (at least that we can uncover with the available tools), since according to current paleontology nitrogen-and phosphorus-rich molecules are unlikely to be protected in fossils older than 1 million years. At most, decades-old biological samples containing viruses (such as those from lung specimens or a frozen body infected with the incorrectly called "Spanish" influenza of 1918) have been preserved, and sequences have been retrieved. Fortunately, there is a different type of "molecular fossil" record of viral genomes in the DNA of differentiated organisms, in the form of integrated virus-like genetic elements. This research area is termed Paleovirology (Aswad and Katzourakis, 2012) . The presence of recognizable viral genomic sequences in cellular DNA suggests a history of long-term interaction between viruses and cells, in support of some models of virus origins that propose a long coevolution between precellular and cellular entities with virus-like elements (Chapter 1). Despite the current capacity to amplify tiny amounts of viral nucleic acids for nucleotide sequence determinations, proposals on how long-term viral evolution might proceed have to be based mainly on the comparison of viral genomes and the structure of viral proteins from modern representatives of different virus groups. First we should understand the basic concepts related to virus transmission, keeping in mind viral population numbers and the complexity of viral populations.

The basic reproductive ratio (R0) is the average number of infected contacts per infected individual. At a population level, a value of R0 larger than 1 means that a virus will continue its propagation among susceptible hosts, if no environmental changes or external influences intervene. An R0 value lower than 1 means that the virus is doomed to extinction at the epidemiological level under those specific circumstances. The basic models of infection dynamics were developed by R.M. Anderson, R.M. May, and M.A. Nowak, with inclusion of the following key parameters: rate k at which uninfected hosts enter the population of susceptibles (x), their normal death rate (u) (so that the equilibrium abundance of uninfected hosts is k/u), number of infected hosts (y), mortality due to infection (v) (so that 1/(u + v) is the average lifetime of an infected host), and a rate constant (β) that characterizes parasite infectivity (so that βx is the rate of new infections per infected host and βxy is the rate at which infected hosts transmit the virus to uninfected hosts). These parameters are schematically indicated in Figure 7.1 and they provide a theoretical value for R0 (Anderson and May, 1991; Nowak and May, 2000; Nowak, 2006). R0 values are not a universal constant for viruses because, as discussed in Chapters 3 and 4, virus variation may affect viral fitness and viral load in infected individuals, and the latter, in turn, may influence the amount of virus that surfaces in a host to permit transmission. Despite uncertainties, consistent R0 values have been estimated for different viral pathogens based on field observations. Values of R0 for human immunodeficiency virus type 1 (HIV-1) and severe acute respiratory syndrome (SARS) coronavirus range from 2 to 5, for PV from 5 to 7, and for Ebola virus from 1.5 to 2.5. For measles virus (MV), which is one of the most contagious viruses described to date, the R0 reaches 12-18 (Heffernan et al., 2005; Althaus, 2014).
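The parameters listed above can be combined into a small numerical sketch. It assumes the standard two-equation form of this host-parasite model, dx/dt = k − ux − βxy and dy/dt = y(βx − u − v), for which R0 = βk/[u(u + v)]. This form is consistent with the definitions given in the text, but the equations themselves are not written out in this section, so the code should be read as an illustration only. The parameter values and function names are hypothetical.

```python
def basic_reproductive_ratio(k, u, v, beta):
    """R0 under the assumed model: beta * (k / u) * 1 / (u + v)."""
    uninfected_equilibrium = k / u       # abundance of uninfected hosts without virus
    infected_lifetime = 1.0 / (u + v)    # average lifetime of an infected host
    return beta * uninfected_equilibrium * infected_lifetime


def simulate(k, u, v, beta, x0, y0, dt=0.01, steps=20000):
    """Forward-Euler integration of the assumed model; returns the final (x, y)."""
    x, y = x0, y0
    for _ in range(steps):
        dx = k - u * x - beta * x * y    # uninfected hosts
        dy = y * (beta * x - u - v)      # infected hosts
        x += dx * dt
        y += dy * dt
    return x, y


if __name__ == "__main__":
    # Hypothetical values: the same host population, two different infectivities.
    for beta in (0.01, 0.002):
        r0 = basic_reproductive_ratio(k=10.0, u=0.1, v=0.4, beta=beta)
        x, y = simulate(k=10.0, u=0.1, v=0.4, beta=beta, x0=100.0, y0=1.0)
        print(f"beta={beta}: R0={r0:.2f}, infected hosts at end={y:.3f}")
```

With these illustrative values the first virus has R0 above 1 and settles at a stable number of infected hosts, whereas the second has R0 below 1 and dies out, which is the qualitative behavior described in the text.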
Most isolates of the SARS coronavirus that circulated months after the emergence of this human pathogen had modest R0 values, and this is consistent with SARS not having reached the pandemic proportions feared immediately following its emergence. In contrast, MV is highly transmissible, thus explaining frequent outbreaks as soon as a sizable population stops vaccinating its infants. Since some of the parameters that enter the basic equations of viral dynamics depend on the nucleotide sequence of the viral genome, mutations may alter R0 values, allowing some virus variants to overtake those that were previously circulating in the population (Figure 7.2). Viral replication, fitness, load, transmissibility, and virulence are all interconnected factors that contribute to virus persistence in its broader sense of virus being perpetuated in nature. These parameters can affect both disease progression in an infected individual and transmissibility at the epidemiological level.

Figure 7.1. A schematic representation of the main parameters of viral dynamics that enter the equations that predict the rate of variation of uninfected and infected (internal horizontal lines in the human figure) individuals (shaded box on the left) and the R0 value (shaded box on the right). The meaning of parameters and literature references are given in the text.

7.2 REPRODUCTIVE RATIO AS A PREDICTOR OF EPIDEMIC POTENTIAL

The difference between the number of infectious particles that participate in transmission and the total number of viral particles in an infected donor organism provides a first picture of the indeterminacies involved in viral transmissions. The larger the population size and genetic heterogeneity of the virus in an infected individual, the higher will be the likelihood that independent transmission events have different outcomes. Individual susceptible hosts will receive subsets of related but nonidentical genomes.
In an insightful article that emphasized the molecular evidence and medical implications of quasispecies in viruses, J.J. Holland and colleagues wrote the following statement: "Therefore, the acute effects and subtle chronic effects of infections will differ not only because we all vary genetically, physiologically, and immunologically, but also because we all experience a different array of quasispecies challenges. These facts are easily overlooked by clinicians and scientists because disease syndromes are often grossly similar for each type of virus, and because it would appear to make no difference in a practical sense. However, for the person who develops Guillain-Barré syndrome following a common cold, or for the individual who remains healthy despite many years of HIV-1 infection, for example, it may make all the difference in the world" (Holland et al., 1992). Indeterminacies in the process of virus spread can be viewed as an extension of the diversification due to bottleneck events in the case of virus transmission, as visualized in Figures 6.1 and 6.2 in Chapter 6, when dealing with the limitations of the virus samples retrieved from an infected host as the starting material for experimental evolution approaches.

Figure 7.2. Displacement of a virus variant by another, by virtue of the latter displaying a higher R0 value. The competing viruses are depicted as horizontal lines with a distinctive symbol. Differences in R0 recapitulate part of the determinants of epidemiological fitness (Section 5.9 in Chapter 5). Concepts of competition among clones or populations within infected host organisms or cell cultures, treated in previous chapters, can be extended at the epidemiological level, with the appropriate choice of the key parameters. References are given in the text.

Despite a necessarily approximate and imprecise knowledge of how many and which types of genomes participate in successive horizontal and vertical transmissions, we can obtain an overall estimate of the rate at which viruses evolve in nature. This is commonly done by comparing consensus genomic nucleotide sequences of viruses isolated at different times during an outbreak, epidemic, or pandemic.

The rate of evolution (also termed rate of fixation or rate of accumulation of mutations) is generally expressed as substitutions per nucleotide and year (s/nt/y). The term fixation is not the most adequate when dealing with virus evolution given that the term refers to a consensus that, in addition to being an average of the real sequences, has a fleeting dominance. Yet, the term is frequently used in the literature of general genetics and virus evolution. The rate of evolution is calculated from genetic distances between consensus viral genomic sequences of successive viral samples from a single persistently or acutely infected host, or from different host individuals infected at different times. Rates of evolution are only indirectly related to mutation rates and mutation frequencies that do not include a time factor in them. There have been several comparisons of rates of evolution for viruses that document the differences between RNA and DNA viruses (Jenkins et al., 2002; Hanada et al., 2004; Domingo, 2007) . A few comparative values are given in Table 7 .1.
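As a minimal sketch of the calculation just described, the following computes a rate of evolution in s/nt/y from two aligned consensus sequences and the time between isolations. It uses the uncorrected proportion of differing sites as the genetic distance, which is an assumption made here for simplicity (it does not correct for multiple substitutions at the same site, and gap handling is omitted). The sequences, the time interval, and the function names are hypothetical.

```python
def count_differences(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def rate_of_evolution(seq_early, seq_late, years_between):
    """Substitutions per nucleotide per year (s/nt/y) between two consensus sequences."""
    if years_between <= 0:
        raise ValueError("years_between must be positive")
    differences = count_differences(seq_early, seq_late)
    return differences / (len(seq_early) * years_between)


if __name__ == "__main__":
    # Toy 20-nucleotide consensus sequences isolated half a year apart.
    early = "ACGTACGTACGTACGTACGT"
    late = "ACGTACGAACGTACGTACCT"
    print(rate_of_evolution(early, late, years_between=0.5))
```

Because the result is a distance divided by a time, it depends on which two isolates are compared and on the interval between them, which is the reason such values are only indirectly related to mutation rates and frequencies.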

Herpes simplex virus constitutes an example of a complex DNA virus for which, despite uncertainties (Firth et al., 2010), a calculated rate of evolution was 10⁻⁸ s/nt/y (Sakaoka et al., 1994), which is actually closer to the rate estimated for cellular genes than for most viruses. Yet, its mutation frequencies, measured by independent procedures, are in the range of 7 × 10⁻³ to 1 × 10⁻⁵ (see also Section 7.4.2). The latter values may result from the selective agent targeting a replicating herpes simplex virus that has produced multiple variants, while the overall slow rate of evolution may be influenced by periods of latency. Slow evolution is expected for retroviruses such as human T-cell lymphotropic virus types 1 and 2 (HTLV-1 and HTLV-2) whose life cycles are dominated by the integrated provirus stage, with the viruses following the clonal expansion of their host cells (Melamed et al., 2014). Some single-stranded DNA viruses display rates of evolution typical of the rapidly evolving RNA viruses (Table 7.1).

Different genes of the same virus may show different rates of evolution (e.g., the polymerase and other nonstructural proteins may evolve more slowly than structural proteins). Thus, a rate of evolution is far from being a universal feature of a virus. A comparison of rates of synonymous substitutions (under the assumption that synonymous substitutions do not affect protein function; see Chapter 2 for limitations of considering synonymous mutations as neutral) for several RNA viruses yielded a range of evolutionary rates of 6 × 10⁻² to 1 × 10⁻⁷ synonymous substitutions per synonymous site per year (Hanada et al., 2004). The values were recalculated from primary phylogenetic data using maximum likelihood (ML) (Section 7.6), under the assumption of the molecular clock, and inference of the ancestral nucleotide sequences at the tree nodes. The five orders of magnitude variation were attributed mainly to the degree of virus replication rather than to differences in error rate. We will deal with the molecular clock hypothesis (constant rate of accumulation of mutations) in Section 7.3.3, but the major features of virus evolution studied in previous chapters (mainly those typical of mutant swarm-forming RNA and DNA viruses) should make us skeptical of similar evolution rates in different biological contexts. Rate variations were documented with HIV-1 subpopulations in different compartments of the human brain. The data did not fit a "global" molecular clock for the virus in the brain, and "local" clocks showed that meninges and temporal lobe HIV-1 subpopulations evolved 30 and 100 times faster, respectively, than other HIV-1 populations in the brain. It is believed that these differences were due to random drift rather than selection. An additional complication is that, even when virus isolations are restricted to the same biological material in a standard epidemiological setting, several measurements indicated discontinuities in evolutionary rates.
The discontinuities had at least two origins: the nonlinear effect of time, and some unique features of evolution occurring inside an infected host. These points are examined next.

Noncumulative sequence changes in the hemagglutinin of influenza virus (IV) type C were found in an early study by Buonagurio et al. (1985). The authors proposed a cocirculation of variants that belonged to different evolutionary lineages. If multiple evolutionary pathways coexist in a given geographical area, and they establish a network of lineages that evolve with time, variations of calculated rates of evolution are expected, and they may distort the rate of evolution of individual lineages. A second early observation was made during an episode of foot-and-mouth disease (FMD) in Spain. Estimates of the rate of evolution of the virus ranged from less than 4 × 10⁻⁴ to 4 × 10⁻² s/nt/y, depending on the genomic region analyzed, and the time period between isolations (Sobrino et al., 1986). Cocirculation of multiple heterogeneous foot-and-mouth disease virus (FMDV) samples ("evolving quasispecies") was proposed. The result to be emphasized here is that the calculated rates of evolution were extremely high (higher than 10⁻² s/nt/y) if the two FMDVs compared were isolated at close time points, while lower values were calculated when the viruses were sampled from different animals at distant time points.

The dependence of the calculated rate of evolution during the epidemic spread of the virus on the time interval between virus isolations for sequence determination is expected for viruses that need not be transmitted by direct contact between an infected and a susceptible host. Some viruses remain infectious in the environment for prolonged time periods, until they reach a susceptible host in which to initiate replication rounds. This is the case of viruses transmitted by the fecal-oral route such as enteroviruses. FMDV can adhere and remain infectious on many objects (fomites), including dust particles, food products with neutral pH, or insects that can transport the virus mechanically. Infectious FMDV can traverse long distances (many kilometers) on dust particles, people, trains, and the like. Even if some infectivity is lost, a few infectious particles are sufficient to infect an animal (Sellers, 1971, 1981). There are some classic examples of long-distance transport of FMDV, a virus subjected to close scrutiny due to its economic impact. One is the spread of SAT1 and A22 FMDV during the 1960s in Turkey along the railway line from the cattle raising region of Lake Van to slaughterhouses in Istanbul [this and other examples are described in (Brooksby, 1981)]. Computer models have been developed to explain and predict possible airborne FMDV transmission in different geographical areas (Sorensen et al., 2000). [As an anecdote, in my experience as a member of the Research Group of the Standing Technical Committee for the Control of FMD of the Food and Agriculture Organization of the United Nations (FAO) in the 1980s, FMD outbreaks in any country always came from somewhere else.]

For viruses that can remain infectious outside their hosts, and that do not need donor-recipient host contacts to perpetuate transmission chains, the time between isolations will influence the calculated rate of evolution based on genomic nucleotide sequences. The reason is that during the extracellular stages, the virus will not undergo genetic change, at least not to the extent of variation during intracellular replication (possible mutations due to chemical damage in viral genomes are discussed in Section 2.2 of Chapter 2). The effect of nonreplicative time intervals on the calculated rate of evolution is illustrated in Figure 7.3.

Figure 7.3. Illustration of the inverse correlation between the time between viral isolations for consensus sequence determination, and the calculated rate of evolution. Animals sustain the replication (inside curved arrow) of virus that will be transmitted to a susceptible animal. The time that the virus spends outside an animal (absence of replication) is depicted by a horizontal arrow. The time between virus isolations is given by t1, t2, and t3. The number of nucleotide differences in the virus isolates relative to the sequence of the initial (reference) animal is given by d1, d2, and d3 (vertical arrows). Because of the increasing periods of stasis (addition of horizontal arrows), calculated rates of evolution given by the d/t ratio will be higher the shorter the time interval between isolations. See text for references.

Some complications should be considered in the interpretation of the analyses depicted in Figure 7.3: (i) the consensus sequence determined to characterize the virus shed by each animal is a simplification of the real genome composition of the virus. (ii) Individual animals vary in physiological and immunological status, and, obviously, they are not in line waiting to be infected; they move, gather around water and food sources, some are isolated, others are in close contact with their peers, and so on. (iii) In this case, virus transport is assumed to be mechanical (on dust particles carried by wind, aerosols, insects, etc.) without additional viral replication during transport. Yet, subpopulations of the most environment-resistant particles, or particles that adhere best to the transporter object, may bias the composition of the virus that will reach an animal to pursue replication. Such events, occurring for tens to hundreds of rounds of host infections, render the appalling virus diversity described in Chapter 1 a bit less appalling.
Since several additional environmental circumstances are changeable and unpredictable, it is unlikely that rates of viral evolution in nature can remain invariant on the basis of some internal principle of constant mutation occurrence (as if accumulation of mutations was as monotonous as radioactive decay!).
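The argument summarized in Figure 7.3 can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that substitutions accumulate at a fixed rate only while the virus replicates inside a host and not at all during extracellular transport. The numerical values and names are hypothetical, and this deterministic bookkeeping ignores the sampling and selection effects listed in points (i) to (iii).

```python
def apparent_rates(intrahost_rate, replication_years, stasis_years):
    """Cumulative (t, d, d/t) at each successive isolation along a transmission chain.

    intrahost_rate: substitutions per nucleotide per year while replicating (assumed constant)
    replication_years[i]: time the virus replicates in host i before it is isolated
    stasis_years[i]: extracellular time before the virus reaches host i (no genetic change)
    """
    if len(replication_years) != len(stasis_years):
        raise ValueError("one stasis interval is required per host")
    results = []
    elapsed = 0.0      # t: time since the reference isolate
    divergence = 0.0   # d: substitutions per nucleotide relative to the reference
    for replicating, stasis in zip(replication_years, stasis_years):
        elapsed += stasis + replicating
        divergence += intrahost_rate * replicating
        if elapsed <= 0:
            raise ValueError("elapsed time must be positive")
        results.append((elapsed, divergence, divergence / elapsed))
    return results


if __name__ == "__main__":
    # Three isolations: the first within the reference host, the others after
    # increasingly long extracellular periods (all values are illustrative).
    chain = apparent_rates(
        intrahost_rate=1e-2,
        replication_years=[0.05, 0.05, 0.05],
        stasis_years=[0.0, 0.2, 0.5],
    )
    for t, d, rate in chain:
        print(f"t={t:.2f} y  d={d:.4f} s/nt  d/t={rate:.2e} s/nt/y")
```

In this toy chain the d/t ratio equals the intrahost rate only for the first, closely spaced isolation and falls as periods of stasis are added, even though the rate during replication never changes.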

Additional observations against constant mutational input with time have been made with HIV-1 and human and avian hepatitis B virus (HBV). The main finding is that interhost rates of evolution are lower than intrahost rates, even under a comparable set of epidemiological parameters. Several proposals have been made to account for this difference. A.J. Leslie and colleagues described cytotoxic T lymphocyte (CTL)-escape mutants of HIV-1 from infected patients. Some of the mutants reverted to the wild-type sequence after transmission to individuals negative for the human leukocyte antigen (HLA) alleles associated with long-term HIV-1 control (Leslie et al., 2004). Strong intrahost selective pressures, followed by reversion of part of the selected mutations upon transmission to a susceptible individual, constitute one of the possible mechanisms behind diminished evolutionary rates when viruses from multiple hosts are compared (Figure 7.4, Box 7.1).
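A toy calculation shows how adaptation followed by reversion lowers the interhost rate relative to the intrahost rate. It assumes that every host adds the same number of substitutions during infection and that a fixed fraction of them reverts after each transmission. Both assumptions, and all names and values, are hypothetical simplifications rather than a model taken from the cited studies.

```python
def intra_and_interhost_rates(subs_per_host, years_per_host, reverting_fraction, n_hosts):
    """Return (intrahost rate, interhost rate) in substitutions per nucleotide per year."""
    if not 0.0 <= reverting_fraction <= 1.0:
        raise ValueError("reverting_fraction must be between 0 and 1")
    if n_hosts < 1 or years_per_host <= 0:
        raise ValueError("need at least one host and a positive infection time")
    intrahost = subs_per_host / years_per_host
    retained = 0.0
    for _ in range(n_hosts):
        retained += subs_per_host                       # changes selected within this host
        retained -= reverting_fraction * subs_per_host  # part of them reverts after transmission
    interhost = retained / (n_hosts * years_per_host)
    return intrahost, interhost


if __name__ == "__main__":
    intra, inter = intra_and_interhost_rates(
        subs_per_host=0.01, years_per_host=2.0, reverting_fraction=0.6, n_hosts=5
    )
    print(f"intrahost: {intra:.4f} s/nt/y, interhost: {inter:.4f} s/nt/y")
```

Under these assumptions the interhost rate is simply the intrahost rate scaled by the fraction of substitutions that survive transmission, so any reversion at all makes evolution along the chain of hosts appear slower than evolution within each host.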

J.T. Herbeck, J.I. Mullins, and colleagues systematically observed lower nucleotide sequence divergence between HIV-1 isolates from different individuals sampled in primary infection than between isolates from individuals with advanced illness. HIV-1 regained some ancestral features when infecting a new host, again explaining a higher intrahost than interhost evolutionary rate (Herbeck et al., 2006) . In a study of HIV-1 transmission between several pairs of individuals over an 8-year period, A.D. Redd and colleagues reported that the viral populations found in the newly infected recipients were more closely related to ancestral sequences from the donor than to the sequences found in the donor near the time of transmission (Redd et al., 2012) . Preferential transmission of ancestral sequences may also contribute to lower interhost than intrahost rates of evolution (Box 7.1). K.A. Lythgoe and C. Fraser provided evidence that cycling of HIV-1 through long-lived memory CD4 + T cells is probably the main contributing factor to slower HIV-1 evolution at the epidemic level (Lythgoe and Fraser, 2012) . Ancestral sequences of HIV-1 in infected individuals may arise by the activation of proviral sequences kept in the form of quasispecies memory. In this case, it is the type of molecular memory that we defined as reservoir, anatomical, or cellular memory in Section 5.5 of Chapter 5. A related type of reservoir memory is found in HBV, in the form of covalently closed circular DNA (cccDNA) that persists in the nuclei of infected hepatocytes, and acts as a template for the synthesis of pregenomic RNA and viral mRNAs (Kay and Zoulim, 2007) . In this case, a record of ancient sequences is registered in the cccDNA. It should be noted that memory levels are dependent on fitness values, as evidenced experimentally with FMDV and expected from the theoretical basis of memory implementation (Chapter 5). 
In consequence, the most abundant memory genomes established early in an infection might be those displaying the highest fitness early in infection, and they might be better adapted to initiate infections than to sustain them (Figure 7 .4).

Additional mechanisms for the time dependence of evolutionary rates have been suggested for HBV. In a 17-year follow-up of several patients, HBV diversity increased during periods of active host immune response, and viral copy numbers decreased. When the immune response was weak, viral genome diversity decreased and viral copy numbers increased; these periods are expected to be those of high transmissibility (Wang et al., 2010). Endogenous hepadnaviruses are present in the genomes of several organisms. There is evidence that some of the integration events in avian hosts are at least 19 million years old. These integrated hepadnaviruses maintain about 75% nucleotide sequence identity with present-day hepadnaviruses, and the comparisons suggest that the long-term substitution rates are 10³-fold lower than those for circulating avian HBVs (Gilbert and Feschotte, 2010). Permanence of viral genomic sequences in cellular DNA is a mechanism of evolutionary stasis, as was emphasized in Chapter 3 with the comparison of the evolutionary rate of the retroviral v-mos gene and its cellular counterpart c-mos (Gojobori and Yokoyama, 1985), among other evidence. Considerable evolutionary stasis is also observed by comparing isolates of HTLV-1 and HTLV-2 whose replication displays preference for maintaining its integration in cellular DNA (Melamed et al., 2014). For viruses that have a dual potential of error-prone replication and of cellular DNA-like stasis, the permanence in cellular DNA may also contribute to reduced long-term evolutionary rates.

Box 7.1
• For viruses that remain infectious in the extracellular environment, stasis due to absence of replication will result in rates of evolution inversely correlated with the time between the isolation of the compared viruses.
• Adapt and revert. Mutants that permit adaptation to a new host individual revert upon transmission.
• Preferential transmission of ancestral sequences. Despite diversification in any host, ancestral sequences have a selective advantage in transmission. They may be retrieved from cellular memory (integrated provirus in HIV-1 or cccDNA in HBV).
• Colonization-adaptation trade-off. Sequential changes in the intensity of the host immune response favor dominance of some genome subpopulations over others. Upon transmission, ancestral minority subpopulations may become dominant.

Figure 7.4. A possible mechanism for a faster intrahost than interhost rate of evolution. Transmission events are represented by long arrows, and intrahost evolution by short arrows. The virus in the person on the left (black outline) has evolved to generate a complex mutant spectrum. However, only a subset of the genomes is efficiently transmitted to the recipient person (brown outline). The virus in the recipient person evolves toward a complex mutant spectrum. Again, in this new mutant spectrum only a subset of genomes that resemble the ones in the first transmission are efficiently transmitted to the third person (green outline). The net result is that because at each transmission the genomes related to those that first entered the previous host have an advantage, rates of evolution will appear slower than those within each host. Boxes at the bottom summarize the major event at each step. See text for additional related mechanisms and references.

HBV quasispecies dynamics was examined in virus that infected members of the same family that presumably acquired the virus through mother-to-infant transmission (Lin et al., 2015) . Again, intrahost evolutionary rate was higher than interhost rate, and the latter decreased with the number of transmissions. The differences were mainly due to nonsynonymous substitutions at limited sites. These observations were interpreted as a rapid switch of HBV between colonization (invasion of new host) and adaptation (quasispecies optimization in the new host). The authors referred to the colonization-adaptation trade-off (CAT) model, or alternations of virus facing an environment marked by a limited host immune response followed by a period of active immune response. In the former environment, viruses displaying rapid replication are selected, while in the latter environment, HBV escape mutants with lower productivity are selected. In each transmission, when the virus reaches a new host, the previously adapted subpopulations are overgrown by the rapidly replicating ones. cccDNA can serve as a reservoir of ancient sequences.

In agreement with these proposals, rates of evolution measured in a single infected individual persistently infected with a continuously replicating virus tend to be higher than those observed with the same viruses isolated from different individuals (Morse, 1994; Domingo et al., 2001; Domingo, 2006) . Slowly evolving viral genes may nevertheless undergo episodes of rapid evolution and, vice versa, a rapidly evolving gene may be transiently static. This should be considered in statistical approaches to evolution (Gaucher et al., 2002) .

Several not mutually exclusive mechanisms can account for the difference between intra- and interhost rates of evolution, as well as for the inverse correlation between the time interval between viral isolations and the calculated rate of evolution in a scenario of viral disease outbreaks or epidemics (Box 7.1). Some of the mechanisms involve viral population numbers and competition among subpopulations of mutant spectra as critical ingredients. The molecular clock hypothesis dwindles as a conceptual framework because, from all evidence, virus evolution is far from being dictated by a steady accumulation of mutations in viral genomes. The major event is "replacement of subpopulations" rather than "accumulation of mutations in genomes." This conceptual change is as important for the long-term evolution of viruses as was the consideration of the wild type as a cloud of mutants in the definition of a viral population (Chapter 3).

There are additional arguments against the operation of a molecular clock in virus evolution. According to the clock hypothesis, the rate of accumulation of mutations coincides with the rate at which mutations arise in the infected individuals. This holds for neutral mutations and, as documented in several chapters of this book, very few mutations occurring in highly compact viral genomes are truly neutral (with no functional consequences in any environment; see Section 2.3 in Chapter 2). Even for neutral evolution, spatial asymmetries in populations are sufficient to perturb the molecular clock rate, as documented with a theoretical model of broad applicability (Allen et al., 2015) . Yet, quasispecies and the operation of a clock are not totally irreconcilable. An epidemiological study with FMDV suggested that viral quasispecies could produce a transient molecular clock due to the periodic sampling of components of the mutant spectrum in transmission (Villaverde et al., 1991) . In this case, both time differences between transmissions and spatial heterogeneities would blur the transiently observed regularity.
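One simple way to examine whether a set of isolates behaves in a clock-like manner is to regress genetic distance from a reference sequence on sampling time and inspect the fit. The sketch below is an illustration under stated assumptions, not a method described in this chapter: it applies ordinary least squares to hypothetical (time, distance) pairs, and a low coefficient of determination would only suggest, not prove, a departure from a constant rate.

```python
def linear_fit(times, distances):
    """Least-squares fit of distance on time; returns (slope, intercept, r_squared)."""
    n = len(times)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two paired observations")
    mean_t = sum(times) / n
    mean_d = sum(distances) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)
    s_dd = sum((d - mean_d) ** 2 for d in distances)
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, distances))
    if s_tt == 0:
        raise ValueError("sampling times must not all be equal")
    slope = s_td / s_tt                   # apparent rate (distance units per year)
    intercept = mean_d - slope * mean_t
    r_squared = 1.0 if s_dd == 0 else (s_td ** 2) / (s_tt * s_dd)
    return slope, intercept, r_squared


if __name__ == "__main__":
    years = [0, 1, 2, 3, 4]
    # Hypothetical distances (s/nt) from a reference isolate.
    steady = [0.000, 0.002, 0.004, 0.006, 0.008]       # steady accumulation
    fluctuating = [0.000, 0.006, 0.003, 0.007, 0.004]  # replacement of subpopulations
    for label, data in (("steady", steady), ("fluctuating", fluctuating)):
        slope, _, r2 = linear_fit(years, data)
        print(f"{label}: slope={slope:.4f} s/nt/y, r^2={r2:.2f}")
```

The second, fluctuating series mimics the kind of noncumulative change discussed in this chapter; its poor linear fit illustrates why a single slope can be a misleading summary when subpopulations replace one another.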

Viruses can change their antigenic properties gradually, in a process termed antigenic drift, or suddenly, in a process termed antigenic shift. The distinction between antigenic drift and shift was established with IV (Gething et al., 1980; Webster, 1999; Parrish and Kawaoka, 2005) . Shift in IV is due to genome segment reassortment that incorporates new hemagglutinin or neuraminidase genes. In monopartite viruses, the difference between gradual and drastic antigenic change has also been established (Martínez et al., 1991b) .

The antigenic diversification of one FMDV serotype was examined over a six-decade period by comparing amino acid sequences of the major antigenic sites of the virus isolated in three continents (Martínez et al., 1992). The evolution of the capsid genes was associated with linear accumulation of synonymous mutations, but not of amino acid substitutions. Remarkably, the antigenic variation over six decades was due to fluctuations among limited combinations of amino acid residues without net accumulation of amino acid substitutions over time (Figure 7.5). This result suggests that constraints at the protein level may maintain a long-term virus identity at the antigenic level. In a related observation on the inter-host evolution of HIV-1 mentioned in Section 7.3.2, HIV-1 recovered ancestral features when infecting a new host (Herbeck et al., 2006). Thus, multiple constraints in viruses may limit the rate and mode of long-term diversification, resulting in different numbers of circulating serotypes among related viruses.

Figure 7.5. Evolution of a major antigenic site of FMDV over four decades. The sequence at the top is that of amino acid residues 129-151 of capsid protein VP1 of FMDV C2 Pando, isolated in Uruguay in 1944. Key amino acid positions did not diverge in a linear fashion and isolates over four decades displayed only two types of amino acid residues at each position. See text for possible mechanisms and references.

A puzzling question in evolutionary virology is that despite sharing high mutation rates, some viruses display extensive antigenic diversity in nature reflected in multiple serotypes, while other viruses maintain a relatively invariant antigenic structure, with only one serotype recorded. For the latter group of viruses, the same vaccine can maintain its efficacy over many decades; examples are rabies virus (RV) and MV, two RNA viruses that show remarkable genetic diversity in nature, and estimates of mutation rates and frequencies comparable to other RNA viruses. Antigenic constancy versus variation is also a determinant of long-lasting immunity after infection or vaccination. MV infection produces lifelong immunity (probably as a result of several factors) while patients that have cleared hepatitis C virus (HCV) can be re-infected by the same virus. Cases of patients infected with HCVs of different genotypes are increasingly identified, as more refined diagnostic tests are utilized. No correlation between virus structure (or morphotype) and antigenic diversity has been found. Among structurally closely related viruses, differences in antigenic diversity are apparent. A dramatic case is that of the picornaviruses since encephalomyocarditis virus (EMCV) or hepatitis A virus (HAV) have a single serotype, while human rhinoviruses (HRVs) have been divided into more than 100 serotypes. Other picornaviruses have intermediate numbers of serotypes: three in the case of PV and seven in the case of FMDV. Although it may seem that a diverse antigenic structure may predict a broad host range, this is actually not the case. HAV is highly specialized for the human host while EMCV infects more than 30 species, including mammals, birds, and invertebrates (Knowles et al., 2010). Several not mutually exclusive models have been proposed to account for differences in the antigenic stability (number of serotypes) among viruses:

• Differences in mutation rates, either the average value for the entire genome, or the local mutation rate at the genomic sites that encode antigenic determinants.
• The presence of some dominant and invariant antigenic sites that evoke long-lasting antibodies in the infected hosts, and that obscure other antigenic sites that produce different antibodies that have a limited impact on the antigenic profile of the virus.
• Differences among the assays used for serotype classification. If a universal and standard procedure to classify virus isolates in different serotypes were applied, differences among viruses would be largely lost.
• Difference in the history of virus circulation. Ancient viruses that undergo many rounds of genome replication in each infected host have had an opportunity to diversify antigenically in a manner not possible with viruses that have a more limited history of circulation among susceptible hosts. Antigenic diversification of some viruses currently viewed as antigenically invariant will take place during the next hundreds of years if their circulation continues.
• Some viruses have antigenic sites that cannot vary because they are under severe constraints to accept amino acid substitutions. Antigenic variants may exist as low-fitness subpopulations, but their frequency is too low to modify the results of the diagnostic tests used for serological classifications.

Consideration of these possibilities requires examining some experimental data on virus antigenicity. First, as a conceptual precision, we assume that the number of serotypes is essentially determined by amino acid sequences located in the virus particle, and that either directly or indirectly can affect the interaction of virus with antibodies. Neutralizing and nonneutralizing antibodies may contribute to serological distinctions, depending on the assays performed for serotyping. Serum neutralization tests will identify differences in sensitivity to neutralization while enzyme-linked immunosorbent assay (ELISA) tests will capture reactivity by all raised antibodies. Antibodies can be obtained from infected natural hosts, or from some laboratory animals which are not a natural host for the virus. An ensemble of amino acid residues forms an antigenic determinant which is usually composed of multiple epitopes [defined here as a unit of interaction with a monoclonal antibody (MAb)]. Epitopes can be either continuous (also termed linear) or discontinuous (also termed structured). Continuous epitopes are those whose primary amino acid sequence has the information to react with the cognate antibody. Discontinuous epitopes are those whose reactive residues come from distant positions of the same protein or from residues of different proteins. Many overlapping epitopes can be found within the same antigenic site. Epitopes can include modified amino acid residues such as glycosylated amino acids. Reactivity of discontinuous epitopes with the cognate antibody is generally lost as a consequence of denaturation of the proteins that form the epitope.

With these introductory clarifications, we can now examine the different possibilities listed above.

There is no correlation between limited antigenic diversity and low average mutation rate. Mutation rates and frequencies for RNA viruses fall in the range of 10⁻⁵ to 10⁻³ substitutions per nucleotide (Chapter 2). However, mutation rates along a viral genome are not uniform, as evidenced by the occurrence of hot spots for variation. Influences such as nucleotide sequence context or RNA structure may conceivably alter mutation rates. It was proposed that a predicted double-stranded RNA at the region encoding the major antigenic site of FMDV might increase locally the polymerase error rate and give rise to multiple amino acid substitutions (Weddell et al., 1985). While at some specific sites polymerases may be more error prone than average, subsequent evidence for FMDV indicated that antigenic variation is due to amino acid substitutions at different antigenic sites and that even variation at the major site can be mediated by distant amino acids on the viral capsid (Rowlands et al., 1983; Geysen et al., 1984; Mateu et al., 1990; Feigelstock et al., 1996). Later molecular studies have not provided evidence that viruses may have a large number of serotypes because their polymerases are more error prone when copying regions encoding amino acids that belong to antigenic sites. Therefore, the possibility that differences in mutation rates determine the number of serotypes is highly unlikely.

Most viruses include multiple antigenic sites, and antibodies are raised against several surface proteins to produce an array of neutralizing and nonneutralizing antibody molecules. Taking again picornaviruses as an example, the number of antigenic domains (each composed of multiple epitopes) varies between one and four (Mateu, 1995) . There is no evidence that a restriction in the number of sites or epitopes or that the expression of a salient class of antibody molecule may explain a 100-fold difference in the number of serotypes among picornavirus genera. Thus, the second proposal is unlikely to be correct.

The argument based on differences among classification assays does not have an easy response. Indeed, there is no universal procedure used to classify viruses serologically, and, therefore, strictly speaking, there is the possibility that different numbers may be obtained using alternative classification procedures. FMDV is a pertinent example. Its seven serotypes are defined on the basis of a very stringent test that cannot be performed with a human virus for obvious reasons: absence of cross protection resulting from vaccination or infection with a given FMDV. Infection or vaccination with FMDV of one serotype does not confer protection against FMDV of a different serotype. In contrast, the subtype classification of FMDV was based on serological assays such as cross-neutralization or complement fixation tests, usually using sera raised in guinea pigs. These assays allowed classification of FMDV in more than 65 serological subtypes. Subtyping was stopped when it was realized that using increasingly discriminatory assays such as reactivity with MAbs, virtually any new isolate could define a new subtype [(Mateu et al., 1988); see Domingo et al., 1990; Sobrino and Domingo, 2004 for review of serotype and subtype classification of FMDV]. Despite these considerations, it is unlikely that serological assays using in vitro tests would be responsible for a 100-fold difference between two human pathogens such as HRV and HAV. Thus, it does not seem justified to attribute antigenic constancy to an artifact derived from diagnostic procedures.

More extensive virus circulation will favor genetic and antigenic diversification, as repeatedly justified in several chapters of this book. Obviously, following hundreds of additional years of circulation of a virus, a single serotype may diversify into multiple serotypes. What we are describing today may be a snapshot of an evolving process. Genotype differentiation is actually being witnessed during the expansion of HCV pandemics, partly due to a true genetic diversification of the virus as it circulated over the last decades, and partly due to increasing capacity of virus surveillance, and of molecular and phylogenetic tools for genome analysis. The reader can find an illustration of this point by comparing the expanded phylogenetic HCV tree from six to seven genotypes and the subtype ramifications, published by P. Simmonds and colleagues in 1993 [compare (Simmonds et al., 1993) and (Smith et al., 2014)]. Although it cannot be excluded that time might tend to equalize the number of serotypes among viruses, current evidence does not justify invoking unknowns of long-term evolution to settle this issue.

We come to constraints at antigenic sites that limit the number of accepted amino acid substitutions as a model for antigenic invariance. It is the preferred model of molecular virologists. The initial concept was proposed by M.G. Rossmann, in his canyon hypothesis (Rossmann, 1989), based on studies with HRV14. A canyon in the virus preserves the receptor-binding site inside the canyon, while permitting amino acid substitutions that affect antigenicity, without consequences for receptor recognition. A physical and functional separation between receptor and antibody binding allows extensive antigenic variation. Clearly, in many viruses there is an overlap between antigenic and receptor recognition sites (Section 4.5 in Chapter 4) that could limit antigenic variation. A difference in constraints is also supported by a structural comparison carried out by J.M. Casasnovas and his colleagues of the interaction of PV and HRV16 with their respective cellular receptors, that revealed a receptor-binding site more accessible in PV than in HRV16 (Xing et al., 2000). This would render HRV the picornavirus most prone to antigenic variation, as indeed found in nature. Thus, constraints imposed by the requirement to interact with the cellular receptor may explain limited capacity for antigenic diversification, and perhaps with the contribution of other possible influences, the puzzle of widely different antigenic types despite similar high genome mutability. Additional structural and functional studies with viruses of different families are necessary to substantiate this proposal.

If some viruses have a limitation in accepting amino acid substitutions at their antigenic sites, they are expected to yield low frequencies of MAb-escape mutants [monoclonal antibody-resistant mutant (MARM) frequencies] in laboratory experiments. Comparison of MARM frequencies of different viruses shows that this is not the case (Table 7.2). In particular, the cardiovirus Mengo virus (one serotype) displays MARM frequencies similar to those of HRV (more than 100 serotypes). In fact, none of the RNA and DNA viruses listed in Table 7.2 deviate from a broad range of MARM frequencies of 10⁻³ to 10⁻⁵, except for substitutions at some discontinuous epitopes of FMDV (Lea et al., 1994). In several of the studies listed, the stability of the selected escape mutants was tested after a few passages in cell culture, but in other studies, lack of reversion of the antigenic change was not ascertained. Two FMDV escape mutants showed a selective disadvantage over the parental wild-type virus (fitness decrease); upon continued replication, the mutants acquired fitness-enhancing mutations without reversion of the antigenic change (Martínez et al., 1991a). Unless the escape mutations are selectively neutral, the expectation is that MARM frequencies may be an underestimate of the real rate at which the amino acid substitutions occur. Thus, it is possible that following selection by an antibody, some mutants may decrease in frequency due to a fitness cost, or that their level is maintained due to additional compensatory mutations acquired by the replicating genomes (Figure 7.6). Viruses that are highly constrained for antigenic variation may be diagnosed through fitness decrease of MARM mutants despite their occurring at similar rates as those that affect unconstrained sites. This is a concept similar to the distinction between fitness and function that we made in Section 5.8 of Chapter 5.
That is, the occurrence of an antigenic change does not guarantee that the change will be perpetuated in nature and contribute to natural antigenic diversification. Again, fitness should be considered as a relevant parameter, and fitness effects on antigenic stability have been largely unexplored. 
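
The two outcomes described above (an escape mutant that fades because of a fitness cost, versus one that persists because it is neutral or compensated) can be illustrated with a minimal competition sketch. It assumes a hypothetical two-variant population with discrete generations, a constant relative fitness of the mutant, and no further mutation; the function name and the numerical values are illustrative assumptions, not data from the studies cited.

```python
def mutant_frequency(p0, relative_fitness, generations):
    """Frequency trajectory of an escape mutant competing with wild type.

    Hypothetical two-variant model: discrete generations, constant
    relative fitness of the mutant (wild type = 1.0), no new mutations.
    """
    trajectory = [p0]
    p = p0
    for _ in range(generations):
        # Standard haploid selection recursion for two competing variants.
        p = p * relative_fitness / (p * relative_fitness + (1.0 - p))
        trajectory.append(p)
    return trajectory


# The mutant is dominant (90%) when the antibody pressure is removed.
costly = mutant_frequency(0.9, 0.8, 50)       # fitness cost, no compensation
compensated = mutant_frequency(0.9, 1.0, 50)  # neutral or compensated mutant
print(f"costly mutant after 50 generations:      {costly[-1]:.6f}")
print(f"compensated mutant after 50 generations: {compensated[-1]:.6f}")
```

Under these assumptions the costly mutant is displaced by the wild type, whereas the neutral or compensated mutant keeps its frequency. A real mutant spectrum involves many more genome classes and changing fitness values.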

The likely multiple origin of viruses, followed by extended events of interaction with evolving host organisms of all phyla have produced the myriads of viral particles that at the present time outnumber cells by a factor of 10 (Chapter 1). The way to put order into such diversity is to classify viruses as done periodically by the International Committee on Taxonomy of Viruses (ICTV) (http://www.ictvonline.org/). The computational procedures developed to study phylogenetic relationships in evolutionary biology are routinely applied to virology to establish relationships among closely or distantly related viruses (Page and Holmes, 1998; Hall, 2001; Felsenstein, 2004; Salemi and Vandamme, 2004; Yang, 2006). No phylogenetic tree that connects the viruses that have been characterized to date can be derived in a reliable way, not even a tree for DNA or for RNA viruses. What we can do is to produce trees for related viruses that probably share a common ancestor. Many data banks are available for viruses to retrieve sequences for comparison with new isolates. Despite the fact that data banks are periodically updated, some are listed in Table 7.3, and can serve as the starting point to reach the desired uniform resource locator (URL) to implement a procedure for genome characterization. Prior to any comparative study of nucleotide or amino acid sequences (not only to establish phylogenetic relationships, but also to calculate genetic distances, to identify regulatory regions, functional domains, and structural motifs, to design oligonucleotide primers for amplifications, or other applications) it is essential to align sequences accurately, and some programs for sequence alignments are also given in Table 7.3.

Figure 7.6. Stability of selected mutant subpopulations when selective pressure is removed. This scheme is the same portrayed in Figure 6.3 of Chapter 3, but shifted toward the right of the time scale. Five different colors have been chosen to depict fluctuations of four genomic classes. In a real population, thousands of genomes may be involved in each infected cell. In the present diagram, the red line represents genomes selected for their resistance to neutralizing monoclonal or polyclonal antibodies. Once the antibody pressure is removed, the mutant genomes may remain dominant either because the relevant substitutions do not affect viral fitness or because compensatory mutations have been acquired (top diagram). In contrast, upon removal of the antibody pressure, the proportion of selected genomes may fade with time due to a fitness cost and absence of compensatory mutations (bottom diagram). In the latter case, the antibody-resistant mutant will not contribute to the long-term antigenic diversification of the virus.

Databases differ in format and contents, which may include prediction of traits derived from sequence information (RNA secondary structures, antiviral drug sensitivity levels, assignments to homologous protein families, etc.). Some of them offer a link with the web page of the ICTV, thus providing background information to assign newly determined sequences to current taxonomic groups. A structure-based amino acid sequence alignment of protein homologues can be carried out based on three-dimensional structures of proteins. Such types of amino acid sequence alignments may help in the identification of relevant structural and functional motifs. Sequence variability among a set of aligned sequences can be quantitated by the number of variable sites, mean pairwise diversity, mutation frequency, and other estimators (e.g., Watterson's estimator) (Page and Holmes, 1998; Mount, 2004; Salemi and Vandamme, 2004) (see also Chapter 3 for parameters used to quantify mutant spectrum complexity). Relevant information on protein evolution can be derived from alignment of the protein sequences of related viruses (or of isolates from one virus, or of components of the same mutant spectrum) and analyzing the statistical acceptability of the divergent amino acids at each position. Statistical acceptability derives in part from the chemical nature and shape of the amino acid side chains and from the limitations that the genetic code imposes on amino acid replacements (Porto et al., 2005). The basic assumption is that the more conserved the amino acid sequences, and the more similar the variant amino acids, the more likely it is that the proteins derive from a common ancestor. M. Dayhoff pioneered the early comparison of protein sequences, establishing a protein information resource (PIR) in the middle of the twentieth century.
Tables named PAM (percent accepted mutation) were constructed and several evolved versions such as BLOSUM matrices, based on the BLOCKS database, are used to compare protein sequences. The BLOSUM62 amino acid substitution matrix groups amino acids according to their chemical structure and provides a score for the occurrence of each amino acid replacement relative to chance expectation: zero, amino acid replacement expected by chance; positive number, replacement found more often than by chance; and negative number, replacement found less often than by chance.
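
The sign convention just described (zero, positive, negative) follows from expressing each matrix entry as a log-odds score. The sketch below illustrates that idea only. It assumes scores are the rounded value of twice the base-2 logarithm of the ratio between observed and chance-expected pair frequencies (half-bit units, the scaling commonly described for BLOSUM62), and the frequencies used are invented for illustration rather than taken from the BLOCKS database.

```python
import math


def log_odds_score(observed_pair_freq, expected_pair_freq):
    """Rounded log-odds score in half-bit units (assumed scaling).

    observed_pair_freq: frequency with which two residues are found aligned.
    expected_pair_freq: frequency expected if residues paired by chance.
    """
    return round(2.0 * math.log2(observed_pair_freq / expected_pair_freq))


# Hypothetical frequencies, chosen only to show the three possible signs.
print(log_odds_score(0.020, 0.020))  # replacement as frequent as chance
print(log_odds_score(0.040, 0.010))  # replacement favored
print(log_odds_score(0.005, 0.020))  # replacement disfavored
```

With these invented inputs the three calls return a zero, a positive, and a negative score, respectively, which is how the entries of a substitution matrix are read.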

The URLs listed in Table 7.3 give access to computational analyses that allow sequence alignments and derivation of phylogenetic trees, which are extremely informative of middle- and long-term evolutionary change of viruses (Page and Holmes, 1998; Notredame et al., 2000; Mount, 2004; Salemi and Vandamme, 2004; Holmes, 2008, 2009). Application of phylogenetic methods to virus evolution requires careful consideration of the evolutionary models to be used, including probabilities of the different types of nucleotide and amino acid replacements, and the rates at which they may occur. Statistical methods (e.g., likelihood ratio tests) are available to select an adequate model for a given data set (Salemi and Vandamme, 2004). At the nucleotide sequence level, it is often assumed that when transitions are more frequent than transversions in a set of related sequences, no saturation of mutation took place. In contrast, when transversions are more frequent than transitions, saturation is presumed (Xia and Xie, 2001). Parameter α applied to amino acid sequence alignments (e.g., using program AAml from the PAML package, version 3.14) takes into account multiple amino acid replacements per site, as well as unequal substitution rates among sites (Yang et al., 2000). Parameter α can be calculated using the amino acid replacement matrix WAG available in the program MODELTEST (Posada and Crandall, 1998). Despite their obvious utility, it is unlikely that these statistical procedures, which were developed on the assumption of successions of defined sequences (rather than mutant clouds), can capture the complexities underlying long-term evolution of viruses in nature. Phylogenetic reconstructions based on nucleotide (and deduced amino acid) sequence alignments are generally possible with selected genes of relatively close viruses (i.e., that belong to the same family).
The main methods used to derive evolutionary trees are: maximum parsimony, distance, maximum likelihood (ML), Bayesian methods of phylogenetic inference, and splits-tree analysis [reviewed in (Eigen, 1992; Page and Holmes, 1998; Mount, 2004; Salemi and Vandamme, 2004; Sullivan, 2005; Holmes, 2008)] (Table 7.3).
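
The comparison of transitions and transversions mentioned above as a rough indicator of mutational saturation can be sketched as follows. This is a minimal illustration for one pair of aligned sequences; the function name and the example sequences are hypothetical, and the formal saturation tests cited in the text involve more than a simple count.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences.

    Gaps and ambiguous symbols are skipped; U is treated as T so that RNA
    sequences can be compared directly.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    pairs = zip(seq_a.upper().replace("U", "T"), seq_b.upper().replace("U", "T"))
    for a, b in pairs:
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1   # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    return transitions, transversions


ts, tv = count_ts_tv("AAGCU-ACGU", "GACUA-ACGC")
print(f"transitions={ts}, transversions={tv}")
print("saturation suspected" if tv > ts else "no evidence of saturation")
```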

Maximum parsimony predicts the minimal mutation steps needed to produce the observed sequences from ancestor sequences. It is most suitable for closely related sequences. Often, all possible trees are examined before a consensus tree is produced, and, therefore, the method is time consuming. Most programs based on maximum parsimony assume the operation of a molecular clock, with the limitations that were discussed in Section 7.3.
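
The counting of minimal mutation steps can be illustrated with Fitch's small-parsimony procedure, named here as one standard way to score a given tree; it is not presented as the algorithm of any particular program listed in Table 7.3. The sketch assumes a rooted binary tree written as nested tuples of leaf names, and the alignment is invented. Searching over all candidate trees, which is the time-consuming part mentioned above, is not attempted: only two topologies are compared.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimal changes) for one site.

    tree: leaf name (str) or a 2-tuple of subtrees.
    states: dict mapping each leaf name to its residue at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:  # children agree on at least one state: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimal number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical four-sequence alignment and two candidate topologies.
alignment = {"A": "AAG", "B": "AAG", "C": "ACG", "D": "GCG"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, alignment), parsimony_score(tree_2, alignment))
```

The topology with the lower score requires fewer mutation steps and would be preferred under the parsimony criterion.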

Distance methods are based on the calculation of genetic distances between any two sequences of a multiple sequence alignment. Large genetic distances require a correction for multiple mutational steps (e.g., the Kimura 2-parameter distance). Most distance methods can handle large numbers of sequences, and results are relatively reliable even when a molecular clock does not operate. Commonly applied distance methods include neighbor joining (NJ) (that does not assume a molecular clock and yields an unrooted tree), several variant versions of NJ, and the unweighted pair group method with arithmetic mean (UPGMA, a clustering method that assumes a molecular clock and produces a rooted tree). The software package TREECON was developed to derive NJ trees.
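
As an illustration of the correction for multiple mutational steps, the Kimura 2-parameter distance can be written as d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions, respectively. The sketch below computes it for one pair of aligned sequences; the function name and the example sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    pairs = zip(seq_a.upper().replace("U", "T"), seq_b.upper().replace("U", "T"))
    for a, b in pairs:
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous symbols
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    w1, w2 = 1.0 - 2.0 * p - q, 1.0 - 2.0 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1 * math.sqrt(w2))


a, b = "ACGTACGTAC", "GCGTACGTAT"  # two transitions in ten sites
print(f"uncorrected p-distance: {2 / 10:.3f}")
print(f"K2P distance:           {k2p_distance(a, b):.3f}")
```

The corrected distance exceeds the uncorrected proportion of differences, and the gap widens as sequences diverge. A matrix of such pairwise distances is the input for NJ or UPGMA clustering.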

ML methods use probability calculations to derive a branching pattern from the mutations at different positions of the nucleic acids under study. They can estimate both distances and the most accurate mutational pathway between sequences. Generally, supercomputers are needed when many sequences are compared since all possible trees are examined. ML methods are included in several programs listed in Table 7.3. Bayesian methods (based on conditional probabilities derived by Bayes' rule) (Huelsenbeck et al., 2001; Ronquist and Huelsenbeck, 2003; Huelsenbeck and Dyer, 2004) have the advantage of increased speed of data processing, but they still require time to avoid incorrect inferences.
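
The principle of ML estimation can be shown in its simplest setting: the distance between two sequences, rather than a full tree. The sketch assumes the Jukes-Cantor (JC69) substitution model, chosen here only because its likelihood is short to write; it finds the distance that maximizes the likelihood by a grid search and compares the result with the closed-form JC69 estimate. Function names and the site counts are illustrative assumptions.

```python
import math


def jc69_log_likelihood(d, n_same, n_diff):
    """Log-likelihood of distance d (substitutions per site) under JC69."""
    x = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * x   # probability that a site is unchanged
    p_diff = 0.25 - 0.25 * x   # probability of one specific different base
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_diff)


def ml_distance(n_same, n_diff, step=0.001, max_d=3.0):
    """Grid search for the distance with the highest likelihood."""
    grid = [step * k for k in range(1, int(max_d / step) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(d, n_same, n_diff))


def jc69_analytic(n_same, n_diff):
    """Closed-form JC69 distance, used to check the grid search."""
    p = n_diff / (n_same + n_diff)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical comparison: 100 aligned sites, 10 of which differ.
print(f"ML (grid search): {ml_distance(90, 10):.3f}")
print(f"analytic JC69:    {jc69_analytic(90, 10):.3f}")
```

Tree-building programs apply the same principle to every branch of every candidate topology, which explains the computational cost noted above.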

Splits-tree procedures are based on split-decomposition theory or statistical geometry, and they provide a geometrical representation of the distance relationships in sequence space (Eigen, 1992; Dopazo et al., 1993; Salemi and Vandamme, 2004) (Chapter 3). The procedure has been used to analyze rapidly evolving viral sequences (http://bibiserv.techfak.uni-bielefeld.de/splits/); methods that allow the inclusion of insertions and deletions have been adapted to the splits-tree program (Cheynier et al., 2001) . Phylogenetic trees can be presented as rooted trees (with a reference out-group) and unrooted trees.

When possible, it is advisable to apply different phylogenetic procedures to compare tree topologies. Resampling methods (e.g., bootstrapping, jackknifing, etc.) are used to assess the statistical reliability of the trees (Page and Holmes, 1998; Salemi et al., 1998; Mount, 2004; Salemi and Vandamme, 2004). A tree defines clades or lineages of a virus attending to groupings by relatedness. Different tree topologies can be obtained when analyzing different genes of the same virus set. Discordant phylogenetic positions of two different genes of the same virus are suggestive of recombination, which should be evaluated statistically (Worobey, 2001; Salemi and Vandamme, 2004; Martin et al., 2005). Recombination is very frequent in viruses, and in some of them is intimately linked to the replication mechanism (Chapters 2 and 10).
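
Bootstrapping resamples the columns of an alignment with replacement and asks how often a grouping is recovered in the pseudo-replicates. The sketch below is deliberately simplified: instead of rebuilding a tree for each replicate, it uses "these two sequences are strictly the closest pair by uncorrected distance" as a stand-in for "these two sequences form a clade". That proxy, the function names, and the alignment are assumptions made for illustration.

```python
import random
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences."""
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def pair_support(alignment, pair, replicates=500, seed=1):
    """Fraction of replicates in which `pair` is strictly the closest pair."""
    rng = random.Random(seed)
    target = frozenset(pair)
    hits = 0
    for _ in range(replicates):
        sample = bootstrap_replicate(alignment, rng)
        distances = {frozenset((x, y)): p_distance(sample[x], sample[y])
                     for x, y in combinations(sample, 2)}
        best = min(distances.values())
        winners = [k for k, d in distances.items() if d == best]
        if winners == [target]:
            hits += 1
    return hits / replicates


# Hypothetical alignment of three isolates.
alignment = {"iso1": "ACGTACGTACGTACGT",
             "iso2": "ACGTACGTACGAACGT",
             "iso3": "ACTTACGAACGAACCT"}
print(f"support for (iso1, iso2): {pair_support(alignment, ('iso1', 'iso2')):.2f}")
```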

The more conserved genes (i.e., the polymerase and other nonstructural protein-coding genes) may permit the establishment of phylogenetic relationships among some distant virus groups. Examples are the clustering of a number of animal and plant RNA viruses as supergroups (Morse, 1994). Families of DNA-dependent DNA polymerases group some bacterial and bacteriophage DNA polymerases with some eukaryotic polymerases (Morse, 1994; Villarreal, 2005), in support of active exchange of modules during coevolution of viruses and their hosts (Botstein, 1980, 1981; Zimmern, 1988). In contrast to conserved genes, variable genes (typically capsid proteins and surface glycoproteins) serve to establish short-term evolutionary relationships within the same virus group, including the survey of virus variation during outbreaks, epidemics, and pandemics (Gorman et al., 1992; Martínez et al., 1992; Morse, 1994; Gavrilin et al., 2000).

Distantly related viruses, with no discernible nucleotide or amino acid sequence identity, can sometimes be grouped on the basis of the three-dimensional structures of viral proteins. The evolutionary trace (ET) clustering method combines phylogenetic partition of sequences with structural information (Chakravarty et al., 2005), and it may help identify functionally relevant domains shared by divergent isolates, in particular in highly variable capsid and surface viral proteins. ET can be applied to proteins and nucleic acids, and its clustering features may reveal conserved structures that are overlooked when all sequences are compared together. As explained in Chapter 1, the great diversity of amino acid sequences recorded among viral structural proteins (several URL links in Table 7.3) is actually reduced to a limited number of morphotypes at the structural level. In another approach, the probabilities of equivalence between pairs of residues in viral proteins are converted into evolutionary distances (Bamford et al., 2005; Ravantti et al., 2013). The structure-based classification has grouped the coat protein of icosahedral viruses in separate classes, each of which, interestingly, embraces different domains of life (Archaea, Bacteria, and Eukarya). A lineage of structurally related viruses includes tailed bacteriophages and the herpesviruses, suggesting that parts of the genomes of complex viruses may have a very ancient origin. They might have belonged to viruses that infected primitive cells, before the latter diverged into the domains of life that we identify in our biosphere (Bamford et al., 2005; Villarreal, 2005) (compare with models of virus origins in Chapter 1). Viral clades may cluster with clades of their host species, suggesting either virus-host coadaptation or an extended parasite-host relationship, with limited possibilities of jumping the host barrier (Section 7.7).
Hantaviruses and their rodent hosts (Plyusnin and Morzunov, 2001), lyssaviruses and bat species, spumaviruses and their primate hosts, and herpesviruses and their vertebrate hosts are some among other examples of long-term host-virus coevolution (McGeoch and Davison, 1999; Woolhouse et al., 2002; Switzer et al., 2005). (See, however, a discussion on time scale discrepancies of coevolutionary rates (Sharp and Simmonds, 2011), and compare with Section 7.3.3.)

The viral groups defined by phylogenetic methods may or may not occupy a defined geographical location. It will depend on whether viral vectors or infected individuals carry the virus over long distances or not. A defined phylogenetic group may include viruses that produce similar or different pathology. This is because the capacity of a virus to cause disease may depend on modest genetic change (i.e., one or a few amino acid substitutions) that does not alter its position in a phylogenetic tree. It is important to emphasize that, independently of the time frame considered, the tips of phylogenetic trees are a cloud of mutants, that genomes within the cloud are the origin of future diversification pathways, and that individual cloud components may differ in pathogenic potential. Figure 7.7 summarizes the diversification of HIV-1 since it entered the human population. Once HIV-1 originated from multiple introductions of a chimpanzee simian immunodeficiency virus (SIVcpz), the four major HIV-1 groups M, O, N, and P were generated, and group M evolved into the multiple subtypes and recombinant forms that circulate at present. Many factors determine the pathogenic potential of any of the HIV-1 subtypes and the newly arising recombinant forms. The relevance of the mutant cloud in determining viral fitness and survival was documented by comparing five isolates of West Nile virus (WNV) that had identical consensus sequences and differed in the mutant spectrum, as analyzed by next-generation sequencing (NGS) (Kortenhoeven et al., 2015) (Figure 7.8). The study concerned a WNV lineage 2 that circulated in Europe during the beginning of the twenty-first century. Environmental changes modified the haplotype composition while maintaining an invariant consensus sequence, an example of "perturbation" manifested only at the level of the mutant spectrum (see Section 6 in Chapter 6).
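
How two populations can share a consensus sequence while differing in their mutant spectra can be made concrete with a small sketch. The haplotype counts below are invented and do not reproduce the WNV data of the study cited. The consensus is taken as the majority residue at each position, and the Shannon entropy of the haplotype distribution is used as one possible measure of mutant spectrum complexity.

```python
import math
from collections import Counter


def consensus(haplotypes):
    """Majority residue at each position; haplotypes maps sequence -> count."""
    length = len(next(iter(haplotypes)))
    result = []
    for i in range(length):
        column = Counter()
        for seq, count in haplotypes.items():
            column[seq[i]] += count
        result.append(column.most_common(1)[0][0])
    return "".join(result)


def shannon_entropy(haplotypes):
    """Shannon entropy (natural log) of the haplotype frequency distribution."""
    total = sum(haplotypes.values())
    return -sum((c / total) * math.log(c / total) for c in haplotypes.values())


# Hypothetical haplotype counts for two samples of the same virus.
sample_1 = {"ACGT": 90, "ACGA": 5, "TCGT": 5}
sample_2 = {"ACGT": 60, "ACGA": 20, "TCGT": 10, "ACCT": 10}

for name, sample in (("sample 1", sample_1), ("sample 2", sample_2)):
    print(name, consensus(sample), f"entropy={shannon_entropy(sample):.3f}")
```

Both samples yield the same consensus, yet the second harbors a more complex mutant spectrum. Such a difference is invisible to any analysis that uses only consensus sequences as the tips of a tree.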

HIV-1 is a notorious case of successful emergence of a new viral pathogen from a zoonotic reservoir of a related virus. However, despite limited records, there is also evidence that some viruses that once produced human disease might be now extinct. One example is Economo's disease (also termed lethargic encephalitis or epidemic encephalitis), a degenerative disease of the brain that produced loss of neurons. The disease had an acute phase of variable duration and intensity, followed by a chronic phase, sometimes with a late onset of symptoms. The disease showed a seasonal character with maximum incidence in late winter. The first cases were recorded in Eastern Europe in 1915 and the disease was first described by Baron C. Von Economo in Vienna in 1917. In 1920-1923 the disease attained pandemic proportions, although the number of cases and mortality were limited. It was estimated that between 1917 and 1929 about one hundred thousand cases occurred in Germany and Great Britain and then, mysteriously, the number of cases decreased and the disease disappeared (Ford, 1937). Economo's disease, of a likely viral origin, is now extinct. At the time it was suspected that a virus similar to IV or some picornavirus might have been the etiological agent of this disease, but no proof could be provided.

[Figure 7.8 source: Kortenhoeven et al. (2015). BMC Genomics is an open access journal, and the article can be reproduced under the terms of the Creative Commons Attribution. The figure has been reproduced with permission of the authors.]

FMDV, the agent of the economically most important disease of cattle and other farm animals, circulated until recently as seven different serotypes termed A, O, C, Asia 1, SAT1, SAT2, and SAT3, and each serotype as multiple subtypes and antigenic variants (review in Sobrino and Domingo, 2004). Interestingly, in the 1980s the incidence of serotype C FMDV decreased to the point that at the beginning of the twenty-first century this FMDV serotype was considered nearly extinct and its eradication feasible. It cannot be totally excluded, however, that type C FMDV is replicating in some persistently infected ruminant in some remote part of our planet and that the virus will reemerge. If not, its ecological niche has been occupied by FMDVs of other serotypes. This is one important issue behind virus eradication (smallpox in the late 1970s or rinderpest in 2011): the possibility that the niche left by an eradicated pathogen is occupied by a related pathogen. A.E. Gorbalenya, E. Wimmer, and colleagues examined the possible evolutionary origin of present-day PV, and the possibility that other picornaviruses might occupy the PV niche in the event of its eradication (Jiang et al., 2007). Their phylogenetic analysis suggests that PV could have originated from a C-cluster coxsackie A virus through amino acid substitutions in the capsid that led to a change of receptor specificity (other cases are discussed in Chapter 4). They generated chimeras of PV and its putative ancestors, and some of them were viable and pathogenic for transgenic mice expressing the PV receptor. The authors suggest that in a world without anti-PV neutralizing antibodies, coxsackieviruses may mutate to generate a new PV-like agent.

Thus, despite virology being a very recent scientific activity, there is ample evidence of emergence of new viral pathogens, as well as cases of extinctions due to human interventions, and possible extinctions by natural influences. Viruses may evolve with regard to the symptoms they inflict upon their hosts. An increase in the severity of dengue virus infection has been observed in some areas of the world, consisting of neurological manifestations in patients with dengue fever or dengue hemorrhagic fever (Cam et al., 2001), among other examples of human and veterinary viral diseases. The extinction of mutant viruses and their replacement by other forms is a continuous process, like the cycles of birth and death of any organism, but in a highly accelerated fashion.

We now turn to the pressing problem of the emergence and reemergence of viral disease.

New human viral pathogens emerge or reemerge at a rate of about one per year, representing an important concern for public health. Emergence is defined as the appearance of a new pathogen for a host, while reemergence often refers to the reappearance of a viral pathogen following a period of absence. As this is a popular topic, the reader will find numerous books and reviews on the subject. It is worth emphasizing that in the twentieth century many authors took the lead in highlighting the problem of viral emergences, and the need to investigate the underlying mechanisms, notably S.S. Morse and J. Lederberg [see several chapters of Morse (1993, 1994)]. Given the adaptive capacity of viruses, in particular the RNA viruses, the reader will certainly suspect that genetic variation of viruses must be one of the factors involved in viral emergences. Indeed, most of the high-impact new viral diseases recorded recently or historically are due to RNA viruses. A statement by J. Lederberg reflects our vulnerability in the face of the nearly unlimited potential of viruses to vary: "Abundant sources of genetic variation exist for viruses to learn new tricks, not necessarily confined to what happens routinely or even frequently" (Lederberg, 1993). The situation is even more complex because genetic variation of viruses is only one of many ingredients that promote the introduction of new viral pathogens in the human population. A report issued by the US Institute of Medicine in 2003 analyzed and documented 13 factors that individually or in combination participate in the emergence of microbial disease. They include a number of sociological, environmental, and ecological influences that act to promote the emergence and reemergence of viruses, bacteria, fungi, and protozoa (Smolinski et al., 2003) (Box 7.2).

Here we will deal briefly with those factors of viral emergence related to the virus and host population numbers, in line with the focus of this book. Other aspects have been covered elsewhere (Antia et al., 2003; Haagmans et al., 2009; Wang and Crameri, 2014; Lipkin and Anthony, 2015; among others). The emergence of a viral disease can be regarded as a consequence of virus adaptation to a new environment and therefore involves the concepts and mechanisms dissected in previous chapters. In particular, a relevant parameter is the variation of viral fitness in different environments (Domingo, 2010; Wargo and Kurath, 2011).

Fitness can directly or indirectly impact any of the three steps involved in viral disease emergence or reemergence, which can be summarized as follows:

• Introduction of virus into a new host species.
• Establishment of the virus in the new host.
• Dissemination of the virus among individuals of the new host species to produce outbreaks, epidemics, or pandemics.

For the introduction and establishment steps, replicative fitness is critical while for the dissemination step, epidemiological fitness plays the major role (Chapter 5).

Two population numbers are key for the establishment step: the number of viral particles shed by the infected donor host, and the number of potential new hosts that come into contact with the infected donor. We are now aware that even if two viral populations shed by an infected host have an identical number of infectious particles, not all mutant spectra might have the genome subpopulations needed to permit establishment in the human host (Figure 7.9). There is a natural lottery regarding which quasispecies subpopulations will hit which host. In the words of J.J. Holland and colleagues: "[…] themselves will not really be new, but rather mutated and rearranged to allow infection of new hosts, or to cause new disease patterns. It is important to remember that every quasispecies genome swarm in an infected individual is unique and 'new' in the sense that no identical population of RNA genomes has ever existed before and none such will ever exist again" (Holland et al., 1992). The higher the number of viral particles shed by an infected host, the higher the probability of transmission to susceptible hosts (Section 7.2), and also of producing an emergence in a new host species. Viral population numbers and the number of transmissible particles can be largely amplified in immunocompromised individuals. Such individuals are termed super-spreaders, and can contribute large amounts of variant viruses to the transmission lottery (Rocha et al., 1991; Paunio et al., 1998; Gavrilin et al., 2000; Khetsuriani et al., 2003; Small et al., 2006; Odoom et al., 2008). Concerning the recipient hosts, the higher the number of potentially susceptible hosts that come into contact with an infected donor, the higher the probability of establishment of an emergent infection. It is likely that the advent of agricultural practices some 10,000 years ago, combined with increased contacts between humans and animals, inaugurated a time of new viral emergences. In the new scenario, viruses could shift from a persistent (low interhost transmission) mode into an acute (high interhost transmission) infection mode.
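The dependence of establishment on the two population numbers can be made concrete with a deliberately simple probabilistic sketch. It assumes that each transmitted genome is drawn independently from the donor mutant spectrum, that a fixed fraction of genomes is competent to replicate in the new host, and that contacts are independent of each other. These assumptions, the function names, and the numerical values are all hypothetical and chosen only for illustration; real transmission involves bottlenecks, complementation, and host factors that this calculation ignores.

```python
def p_at_least_one_competent(n_particles, competent_fraction):
    """Probability that at least one of n independently sampled genomes
    belongs to the subset able to replicate in the new host."""
    return 1.0 - (1.0 - competent_fraction) ** n_particles


def p_emergence(n_contacts, n_particles, competent_fraction):
    """Probability that at least one of several independent donor-recipient
    contacts delivers a competent genome (an upper bound on establishment)."""
    per_contact = p_at_least_one_competent(n_particles, competent_fraction)
    return 1.0 - (1.0 - per_contact) ** n_contacts


# Hypothetical values: one genome in 10,000 is competent in the new host.
fraction = 1e-4
for n_particles in (1, 100, 10_000):
    print(n_particles, p_at_least_one_competent(n_particles, fraction))

# Same number of transmitted particles, different mutant spectrum composition.
print(p_at_least_one_competent(1_000, 1e-4))
print(p_at_least_one_competent(1_000, 1e-6))

# More recipient hosts in contact with the donor raise the overall probability.
for n_contacts in (1, 10, 100):
    print(n_contacts, p_emergence(n_contacts, 100, fraction))
```

Under these assumptions the probability rises with both the number of particles transmitted and the number of contacts, and two populations of equal size can differ widely in their chance of establishment if their mutant spectra differ in the proportion of competent genomes.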

Not only are population numbers important; the connections between the spatial habitats of potential donor and recipient hosts are also highly relevant (Figure 7.10). As correctly emphasized by S.S. Morse, changes in viral traffic may allow viruses to come near potential new hosts that had never been encountered before. Several sociological and ecological factors that can impact directly or indirectly the accessibility to an infected donor play a role. A typical example that connects several of the points listed in Box 7.2 is provided by the increase of arbovirus vectors during an especially humid season due to climate change, because insect larvae can proliferate in water reservoirs. Increased travel may put humans infected with arboviruses in contact with the flourishing insect vector population. Climate change may modify the migration routes of some birds, again putting these potential vertebrate hosts in contact with infected animals and insect vectors.

Figure 7.9. Relevance of virus population size and mutant spectrum composition in the zoonotic transmission of a virus. Only a subset of the genomes that surface from an infected pig may be able to establish an infection in humans. The scheme indicates that a single genome that reached a human was not adequate to establish infection (pathway A). When multiple genomes reached the human (pathways B and C), only those that included a subset that displayed a minimum fitness in humans were able to initiate an infection and expand in the new host (pathway B). For pathways A and C, events are as if the contact between donor and recipient host had not taken place (arrows with cross). See text for implications.

Other points listed in Box 7.2 are worth commenting on: close human-to-human contacts are favored by urbanization. In 1975, there were five megacities in the world (meaning cities with a population of more than 10 million) while at the time of writing this book the number of megacities exceeds 20. Humans in close contact are, in addition, highly mobile. At present it is possible to go around the world in about 36 h (if you choose the right airports), which represents a 1000-fold increase in spatial mobility of humans relative to the mobility in the year 1800. The 2014-2015 Ebola epidemic in Africa was made worse by the breakdown of public health measures, poverty, and the lack of political will of local and international agencies to put efforts into stopping transmission. Underdeveloped countries are a reservoir of viral infections that represent a global threat due to several of the points listed in Box 7.2 [Smolinski et al. (2003); several chapters of Singh (2014)].

Concerning the establishment and dissemination steps, the molecular mechanisms of quasispecies optimization in the new host environment apply. The underlying events are those governed by the extended Darwinian concepts of variation, competition, and selection, with the perturbations derived from stochastic effects (treated in different chapters of this book).

Figure 7.10. Types of habitats that may limit or facilitate interaction between hosts that can potentially establish an emergent infection in a local habitat. In separate habitats contacts are restricted while in overlapping habitats contacts are facilitated. In most cases, habitats cannot be reduced to the standard extremes, and are multicomponent habitats with various degrees of complexity. See text for implications for viral emergences.

The meanings of complexity in virology were discussed in Section 3.9 of Chapter 3, one of them being the inability to explain a whole as the sum of its parts (Solé and Goodwin, 2000). R.V. Solé and B. Goodwin define the sciences of complexity as "the study of those systems in which there is no simple and predictable relationship between levels, between the properties of parts and of wholes." Several levels of complexity can be identified in the events that give rise to the emergence of a viral disease (Domingo, 2010; Sáiz et al., 2014). One level of complexity concerns the behavior of viral populations: behavior is often determined by interactions among components of mutant spectra in a way that cannot be predicted from the individual components of the population, even if we knew them!

The second level of complexity that can have an impact on the emergence of viral disease stems from the environmental, sociological, and ecological variables that must converge for a virus from some animal reservoir to come into contact with and successfully infect a new host, for example, a human. Despite close surveillance, emergences of viral disease are unpredictable. Experts expected new influenza pandemics to arise somewhere in Asia from the avian reservoirs of IV, yet in 2009 the new influenza pandemic originated in Mexico. Paradoxically, despite a general agreement that surveillance of human and zoonotic virus reservoirs should be intensified using new molecular tools (e.g., NGS to go beyond the consensus sequences), the reality is that what we have learnt are the reasons why emergences are unpredictable. The "abundant sources of genetic variation" that were emphasized by J. Lederberg (see Section 7.7.1) should be extended to refer to "abundant sources of complexity in viral emergences." For the time being we have to be ready to react once the emergence has already occurred.

Viruses have survived because they have undergone multiple rounds of vertical and horizontal transmission in their host organisms, and because occasionally they have found new suitable hosts in which to replicate. Among the many parameters involved, in this chapter we have emphasized the relevance of virus and host population numbers for sustained transmissions and the long-term maintenance of viral entities. A point that is often either ignored or not sufficiently emphasized is that the quasispecies nature of viral populations introduces an element of uncertainty regarding which types of mutants are transmitted to new hosts. Despite being a complication that cannot be easily handled, it is a fact that should stimulate new approaches to the surveillance of virus transmission and the identification of the founder viruses in new infections.

A steady accumulation of mutations during evolution in the field was a proposal that agreed with the neutral theory of molecular evolution developed in the last century. One suspects that this agreement resulted in a premature preference for a regular clock, that is, a steady incorporation of mutations, in the case of viruses. The evidence, however, is that there are multiple molecular mechanisms that render the operation of a molecular clock for viruses very unlikely, and perhaps fortuitous in some cases. Several possible mechanisms of variable evolutionary rates have been discussed, and further clarification is expected from entire genome sequencing applied to viruses during outbreaks and epidemics. Viruses are probably not the best biological systems to obtain experimental evidence in support of the clock hypothesis.

A puzzling and unresolved issue in viral evolution is the interpretation of the widely different number of viral serotypes, despite viruses sharing comparably large mutation rates and frequencies. Different possibilities have been examined, and a slight preference for variable constraints acting on the amino acid residues that determine the antigenic properties of viruses has been expressed. Again, additional work is necessary to solve this interesting problem.

Procedures for sequence alignments and the establishment of phylogenetic relationships among related viruses have been briefly summarized, with some pointers to useful websites. The comparison of genomic sequences (and encoded amino acids) of new viral isolates with those of the viruses characterized to date is important given the increasing number of new viruses discovered in natural habitats.

The important problem of viral emergences and reemergences has been treated with emphasis on the concept of complexity. There are multiple interacting influences that converge to produce the emergence or reemergence of a viral pathogen, one of them being the heterogeneity of viral populations at the genetic and phenotypic level. Despite considerable methodological progress we are still in the realm of uncertainty regarding prediction of when and where a new viral pathogen will emerge (see Summary Box).

• Long-term evolution of viruses is the result of a history of virus transmission among hosts. Basic principles of transmission dynamics must take into consideration sampling effects and the inherent heterogeneity of viral populations.
• Rates of evolution of viruses in nature are extremely high as compared to the estimated rate for their host organisms. Contrary to some tenets of neutral evolution, rates of viral evolution are not constant with time. In particular, several mechanisms explain why intrahost virus evolution is faster than interhost evolution.
• Several procedures for sequence alignments and derivation of phylogenetic trees allow a partial description of virus diversification in nature.
• Antigenic diversification of viruses is subjected to constraints that differ among viruses. Some viruses have a single serotype while others have 100 serotypes. Several possible mechanisms may contribute to this difference.
• The emergence and reemergence of new viral pathogens is a multifactorial event with a clear influence of host and virus population numbers. Several levels of complexity participate in the emergence of a new pathogen, rendering the event highly unpredictable.

The molecular clock of neutral evolution can be accelerated or slowed by asymmetric spatial structure

Estimating the reproduction number of Ebola virus (EBOV) during the 2014 outbreak in West Africa

Infectious Diseases of Humans

The role of evolution in the emergence of infectious diseases

Paleovirology and virally derived immunity

What does structure tell us about virus evolution?

Characterization of Mengo virus neutralization epitopes

A theory of modular evolution for bacteriophages

A modular theory of virus evolution

Surveillance and control of virus diseases: Europe, Middle East and Indian sub-continent

Noncumulative sequence changes in the hemagglutinin genes of influenza C virus isolates

Prospective case-control study of encephalopathy in children with dengue hemorrhagic fever

Evolutionary trace residues in noroviruses: importance in receptor binding, antigenicity, virion assembly, and strain diversity

Insertion/deletion frequencies match those of point mutations in the hypervariable regions of the simian immunodeficiency virus surface envelope gene

Quasispecies: concepts and implications for virology

Virus evolution

Mechanisms of viral emergence

Genetic variability and antigenic diversity of foot-and-mouth disease virus

Quasispecies and RNA Virus Evolution: Principles and Consequences

Split decomposition: a technique to analyze viral evolution

History of poliomyelitis and poliomyelitis research

Steps towards life

Poliovirus neutralization epitopes: analysis and localization with neutralizing monoclonal antibodies

Emerging foot-and-mouth disease virus variants with antigenically critical amino acid substitutions predicted by model studies using reference viruses

Using time-structured data to estimate evolutionary rates of double-stranded DNA viruses

Diseases of the Nervous System in Infancy

Predicting functional divergence in protein evolution by site-specific rate shifts

Evolution of circulating wild poliovirus and of vaccine-derived poliovirus in an immunodeficient patient: a unifying model

Cloning and DNA sequence of double-stranded copies of haemagglutinin genes from H2 and H3 strains elucidates antigenic shift and drift in human influenza virus

Use of peptide synthesis to probe viral antigens for epitopes to a resolution of a single amino acid

Genomic fossils calibrate the long-term evolution of hepadnaviruses

Rates of evolution of the retroviral oncogene of Moloney murine sarcoma virus and of its cellular homologues

Evolutionary processes in influenza viruses: divergence, rapid evolution, and stasis

The application of genomics to emerging zoonotic viral diseases

Phylogenetic Trees Made Easy: A How-To Manual for Molecular Biologists

A large variation in the rates of synonymous substitution for RNA viruses and its relationship to a diversity of viral infection and transmission modes

Perspectives on the basic reproductive ratio

Human immunodeficiency virus type 1 env evolves toward ancestral states upon transmission to a new host

Mutation frequencies at defined single codon sites in vesicular stomatitis virus and poliovirus can be increased only slightly by chemical mutagenesis

RNA virus populations as quasispecies

Comparative studies of RNA virus evolution

The Evolution and Emergence of RNA Viruses. Oxford Series in Ecology and Evolution

Bayesian estimation of positively selected sites

Bayesian inference of phylogeny and its impact on evolutionary biology

Rates of molecular evolution in RNA viruses: a quantitative phylogenetic analysis

Evidence for emergence of diverse polioviruses from C-cluster coxsackie A viruses and implications for global poliovirus eradication

Hepatitis B virus genetic variability and evolution

Persistence of vaccine-derived polioviruses among immunodeficient persons with vaccine-associated paralytic poliomyelitis

Overview of taxonomy

Virus genome dynamics under different propagation pressures: reconstruction of whole genome haplotypes of west Nile viruses from NGS data

The structure and antigenicity of a type C foot-and-mouth disease virus

Viruses and humankind: intracellular symbiosis and evolutionary competition

HIV evolution: CTL escape mutation and reversion after transmission

New insights into the evolutionary rate of hepatitis B virus at different biological scales

Virus hunting. Virology 479-480C

New insights into the evolutionary rate of HIV-1 at the within-host and epidemiological levels

RDP2: recombination detection and analysis from sequence alignments

Fitness alteration of foot-and-mouth disease virus mutants: measurement of adaptability of viral quasispecies

Two mechanisms of antigenic diversification of foot-and-mouth disease virus

Evolution of the capsid protein genes of foot-and-mouth disease virus: antigenic variation without accumulation of amino acid substitutions over six decades

Antibody recognition of picornaviruses and escape from neutralization: a structural view

Extensive antigenic heterogeneity of foot-and-mouth disease virus of serotype C

A single amino acid substitution affects multiple overlapping epitopes in the major antigenic site of foot-and-mouth disease virus of serotype C

The molecular evolutionary history of the herpesviruses

Clonality of HTLV-2 in natural infection

Vertical transmission of viruses

Location and primary structure of a major antigenic site for poliovirus neutralization

Antigenic structure of polioviruses of serotypes 1, 2 and 3

Emerging Viruses

The Evolutionary Biology of Viruses

Mims' Pathogenesis of Infectious Disease

T-Coffee: a novel method for fast and accurate multiple sequence alignment

Evolutionary Dynamics

Virus Dynamics. Mathematical Principles of Immunology and Virology

Changes in population dynamics during longterm evolution of sabin type 1 poliovirus in an immunodeficient patient

Molecular Evolution. A Phylogenetic Approach

The origins of new pandemic viruses: the acquisition of new host ranges by canine parvovirus and influenza A viruses

Explosive school-based measles outbreak: intense exposure may have resulted in high risk, even among revaccinees

Virus evolution and genetic diversity of hantaviruses and their rodent hosts

Prediction of site-specific amino acid distributions and limits of divergent evolutionary changes in protein sequences

MODELTEST: testing the model of DNA substitution

Automatic comparison and classification of protein structures

Previously transmitted HIV-1 strains are preferentially selected during subsequent sexual transmissions

Antigenic and genetic variation in influenza A (H1N1) virus isolates recovered from a persistently infected immunodeficient child

MrBayes 3: Bayesian phylogenetic inference under mixed models

The canyon hypothesis. Hiding the host cell receptor attachment site on a viral surface from immune surveillance

Chemical basis of antigenic variation in foot-and-mouth disease virus

Molecular and evolutionary mechanisms of viral emergence

Quantitative analysis of genomic polymorphism of herpes simplex virus type 1 strains from six countries: studies of molecular evolution and molecular epidemiology of the virus

The phylogeny handbook. A Practical Approach to DNA and Protein Phylogeny

Evolutionary rate and genetic heterogeneity of human T-cell lymphotropic virus type II (HTLV-II) using isolates from European injecting drug users

Phylodynamic analysis of human immunodeficiency virus type 1 in distinct brain compartments provides a model for the neuropathogenesis of AIDS

Spontaneous mutation rate of measles virus: direct estimation based on mutations conferring monoclonal antibody resistance

Quantitative aspects of the spread of foot-and-mouth disease

Factors affecting the geographical distribution and spread of virus diseases of food animals

Evaluating the evidence of virus/host co-evolution

Use of monoclonal antibodies to identify four neutralization immunogens on a common cold picornavirus, human rhinovirus 14

Classification of hepatitis C virus into six major genotypes and a series of subtypes by phylogenetic analysis of the NS-5 region

Viral Infections and Global Change

Super-spreaders and the rate of transmission of the SARS virus

The mutation rate and variability of eukaryotic viruses: an analytical review

Expanded classification of hepatitis C virus into 7 genotypes and 67 subtypes: updated criteria and genotype assignment web resource

Microbial Threats to Health, Emergence, Detection and Response

Foot-and-Mouth Disease: Current Perspectives

Fixation of mutations in the viral genome during an outbreak of foot-and-mouth disease: heterogeneity and rate variations

Signs of life. How Complexity Pervades Biology

An integrated model to predict the atmospheric spread of foot-and-mouth disease virus

Neutralization escape mutants define a dominant immunogenic neutralization site on hepatitis A virus

Antibody-selected variation and reversion in Sindbis virus neutralization epitopes

Maximum-likelihood methods for phylogeny estimation

Ancient co-speciation of simian foamy viruses and primates

Viruses and the Evolution of Life

Fixation of mutations at the VP1 gene of foot-and-mouth disease virus. Can quasispecies define a transient molecular clock

Emerging zoonotic viral diseases

Distinct hepatitis B virus dynamics in the immunotolerant and early immunoclearance phases

In vivo fitness associated with high virulence in a vertebrate virus is a complex trait regulated by host entry, replication, and shedding

Antigenic variation in influenza viruses

Sequence variation in the gene for the immunogenic capsid protein VP1 of foot-and-mouth disease virus type A

Antigenic variants of rabies virus

Biological and biomedical implications of the co-evolution of pathogens and their hosts

A novel approach to detecting and measuring recombination: new insights into evolution in viruses, bacteria, and mitochondria

DAMBE: software package for data analysis in molecular biology and evolution

Distinct cellular receptor interactions in poliovirus and rhinoviruses

Computational Molecular Evolution

Codon-substitution models for heterogeneous selection pressure at amino acid sites

Evolution of RNA viruses