key: cord-0929903-any83elf authors: Gutierrez, Bernardo; Escalera-Zamudio, Marina; Pybus, Oliver G title: Parallel molecular evolution and adaptation in viruses date: 2019-01-28 journal: Curr Opin Virol DOI: 10.1016/j.coviro.2018.12.006 sha: f62d64a50f26c6a45fc9f049fcaba788600bedbb doc_id: 929903 cord_uid: any83elf Parallel molecular evolution is the independent evolution of the same genotype or phenotype from distinct ancestors. The simple genomes and rapid evolution of many viruses mean they are useful model systems for studying parallel evolution by natural selection. Parallel adaptation occurs in the context of several viral behaviours, including cross-species transmission, drug resistance, and host immune escape, and its existence suggests that at least some aspects of virus evolution and emergence are repeatable and predictable. We introduce examples of virus parallel evolution and summarise key concepts. We outline the difficulties in detecting parallel adaptation using virus genomes, with a particular focus on phylogenetic and structural approaches, and we discuss future approaches that may improve our understanding of the phenomenon. The processes that drive the cross-species transmission and emergence of viruses in natural systems are diverse and involve a range of ecological, evolutionary, and genetic factors. Amongst the evolutionary mechanisms, molecular adaptation by natural selection is a key phenomenon that entails the generation and spread of beneficial mutations that increase virus fitness in a specific environment. If rapidly evolving virus populations repeatedly experience comparable environmental changes and similar selective pressures, then viruses may exhibit what is known as parallel evolution. The concept of parallel evolution is applied to organisms across the tree of life and can refer to all manners of phenotypes and traits at different levels of biological organisation [1] . 
Parallel evolution manifests itself in a variety of ways, from the repeated fixation of single point mutations or larger genetic changes (e.g. indels or genome rearrangements) [2], to the evolution of structural, functional or behavioural phenotypes. The terms 'parallel' and 'convergent' evolution are sometimes given distinct definitions: convergent evolution refers to the independent evolution of similar traits from different ancestral starting points [3], whereas parallel evolution describes the independent emergence of similar traits from the same ancestral state [4]. Here, we use 'parallel' to refer to both situations. Further, we use 'adaptation' to refer to evolutionary change through positive natural selection, and 'evolution' to encompass change via any process (e.g. by random genetic drift). Parallel evolution in viruses may arise from adaptation to new host species [5] or different host demographic structures [6], from evasion of host immune responses [7,8], or from the circumvention of anti-viral drugs [9]. These situations involve changes to the virus' environment that generate strong selective pressures, favouring the recurrent evolution of certain genotypes. Here, we summarise the theoretical concepts behind parallel molecular evolution in viruses, and explore some of the methods that can be used to identify such phenomena. We highlight the importance of phylogenetic methods to detect recurring evolutionary patterns generated by natural selection. Viruses are useful model systems for studying parallel evolution and adaptation. Many viruses, particularly RNA viruses, can adapt rapidly due to a combination of high mutation rates, large population sizes, short generation times, and large mutational selection coefficients [10]. Further, the small genomes of RNA viruses (and some DNA viruses) may limit the range of genetic solutions available to viruses as they respond to environmental change.
In contrast, the larger genomes of DNA viruses, such as Myxoma [11], may offer more potential genetic routes to the same phenotype. Perhaps the best described example of virus parallel evolution is the development of anti-viral drug resistance by the human immunodeficiency virus (HIV). HIV exhibits all the features of rapid adaptation listed above [12]; consequently, many HIV drug resistance mutations have been identified, which independently and repeatedly arise during chronic infection in different HIV patients [12,13,14]. Interestingly, even within a genome as constrained as HIV's, different mutations can confer resistance to specific drugs [15]. HIV also displays rapid and repeatable evolution in response to host immune responses, as demonstrated by the parallel evolution in different patients of escape mutations to HLA-restricted T-cell responses [7]. Viruses that register frequent cross-species transmission events might also exhibit parallel evolution as a result of adaptation to new host environments [16]. Barriers to virus replication in a new host may include (i) a lack of suitable receptors for virus cell entry, (ii) innate and adaptive immune responses, (iii) reduced replication efficacy in specific cell types, and (iv) mechanisms of virion release from infected cells [17]. For example, HIV-1 groups M, N and O are derived from separate cross-species transmissions to humans of simian immunodeficiency viruses (SIVs) in chimpanzees and gorillas. Following spillover, these groups independently acquired a Met → Arg amino acid change in the retroviral Gag protein [18]. Under experimental conditions, this mutation increases SIV replication in human lymphoid tissue [19]. Rabies virus (RABV) offers another example: the RABV lineage that infects dogs has changed host species on multiple occasions, causing outbreaks in other Carnivora species.
Amongst these events, parallel amino acid changes were observed in two separate RABV zoonoses from dogs to ferret-badgers (a Leu → Ser mutation in the RABV nucleoprotein and a Lys → Arg change in the polymerase) [5]. Parallel evolution and adaptation may also target larger genome regions, and sites involved in post-translational modification. For example, the hemagglutinin of H7 subtype avian influenza virus (AIV) can acquire an N-glycosylation site when the virus jumps from wild to domestic birds [20], and highly pathogenic strains of AIV have emerged independently many times, through the evolution of different polybasic cleavage sites in hemagglutinin [21]. For HIV-1, it is now understood that, after zoonosis, groups M and O independently evolved the ability to antagonise human tetherin (an anti-viral restriction factor), but did so using entirely different retroviral accessory genes [22,23]. Parallel molecular evolution has been described for many viruses both under laboratory conditions and in natural systems. Table 1 provides an illustrative selection of RNA virus examples. Whilst theoretical studies have focused on the ability of natural selection to increase the probability of parallel evolution [24], experimental studies have indicated the potential impact of other factors, including epistatic interactions amongst sites and heterogeneity of mutation rates across the genome [25,26]. It is commonly hypothesised that natural selection is the key process involved in parallel molecular evolution. Negative selection eliminates many mutations from viral populations because strong functional constraints render most genetic changes disadvantageous. In contrast, positive selection favours the accumulation of mutations that are beneficial to virus replication and transmission.
Determining whether a given site or genomic region has evolved under natural selection can be achieved by comparing the relative rates of synonymous (dS) and non-synonymous (dN) changes, quantified as a dN/dS ratio [27]. Because positive selection can act on specific sites at particular points in time, it is important to explore variation in dN/dS amongst sites and amongst virus lineages. Several methods are available to undertake such calculations [28,29,30]. If dN/dS > 1 for a specific codon, then multiple amino acid changes have likely occurred at that codon, and the codon is a candidate for the detection of parallel evolution (see below). However, dN/dS methods have limitations and may fail to detect instances of sites under selection [29]. In particular, some studies report that dN/dS tests have failed to identify sites that appear to have undergone parallel adaptation [5,31]. The robust identification of parallel evolution and adaptation can be broken down into four steps: (i) detection of parallel changes (usually amino acid changes), (ii) association of those changes with a phenotype or environmental change of interest, (iii) characterisation of the evolutionary forces behind the repeated occurrence of such changes, and (iv) evaluation of the functional effects of mutations in the context of host-pathogen biology. A powerful method for detecting repeated, independent evolution is comparative phylogenetics, which infers the evolutionary history of a given trait (including reconstruction of ancestral states) using parsimony or other phylogenetic models [32]. This approach is typically robust, so long as recombination or horizontal gene transfer (which can mimic parallel evolution) is absent. Steps (ii) and (iii) are less straightforward.
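As a rough illustration of the dN/dS comparison described above, the following Python sketch estimates the proportions of non-synonymous and synonymous differences (pN and pS) between two aligned, in-frame coding sequences. It is a simplified counting approach under stated assumptions, not one of the methods cited in the text: there is no correction for multiple substitutions, codons that differ at more than one position are not classified, and the function names and example sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Expected number of synonymous sites (0 to 3) in a single codon."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            # A single-base change is synonymous if the amino acid is unchanged.
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                total += 1 / 3
    return total


def pn_ps(seq_a, seq_b):
    """Return (pN, pS, n_nonsyn, n_syn) for two aligned, in-frame sequences."""
    syn_sites = nonsyn_sites = 0.0
    n_syn = n_nonsyn = 0
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        ca, cb = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if ca not in CODON_TABLE or cb not in CODON_TABLE:
            continue  # skip gaps and ambiguous bases
        s = (synonymous_sites(ca) + synonymous_sites(cb)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        n_diff = sum(x != y for x, y in zip(ca, cb))
        if n_diff == 1:
            if CODON_TABLE[ca] == CODON_TABLE[cb]:
                n_syn += 1
            else:
                n_nonsyn += 1
        # Assumption: codons with 2 or 3 differences are left unclassified,
        # which a full method would resolve by averaging over mutational paths.
    p_n = n_nonsyn / nonsyn_sites if nonsyn_sites else float("nan")
    p_s = n_syn / syn_sites if syn_sites else float("nan")
    return p_n, p_s, n_nonsyn, n_syn


# Hypothetical toy alignment: one Lys/Arg change and one silent Leu change.
p_n, p_s, n_nonsyn, n_syn = pn_ps("ATGAAACTG", "ATGAGACTA")
print(f"pN={p_n:.3f} pS={p_s:.3f} pN/pS={p_n / p_s:.3f}")
```

In this toy alignment the ratio is below 1, because the single amino acid change is spread over many more non-synonymous sites than the single silent change is over synonymous sites. Site-specific or lineage-specific estimates, as discussed in the text, require the model-based methods cited there.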
It is tempting to detect associations between a mutation and a phenotype by calculating the mutation frequencies in viruses that do or do not exhibit the relevant phenotype. This generates a contingency table of observed counts (Figure 1) that is evaluated using a Chi-squared or Fisher's exact test. However, it has long been known in the evolutionary literature that this approach can lead to false positive conclusions, because the counts are not independent observations, but instead are correlated due to shared ancestry [33,34]. The problem can be resolved by considering the phylogenetic history of the trait in question (Figure 1). Specifically, recurrent acquisition of a mutation in the context of independent changes in phenotype (Figure 1b) indicates that the mutation is evolutionarily linked to the phenotype. However, if mutation and phenotypic change occur only once in the phylogeny (Figure 1a) then their association could be due to chance. Bhattacharya et al. [7] referred to the latter situation as the 'founder effect' and used a phylogenetically corrected statistical test to demonstrate that previously reported HLA-associated escape mutations in HIV [35] were not statistically robust. Another phenomenon that may cause mutations to appear artefactually linked to a specific phenotype or environment is genetic hitchhiking, whereby an observed mutation does not affect the associated phenotype, but is genetically linked to a mutation that does. Further, the phylogenetic association between a mutation and a phenotype may be weak if the phenotype is polygenic in nature, or depends on an interaction between the viral genotype and the environment.

Figure 1. Different evolutionary scenarios underlying the same apparent association between virus mutation and phenotype. (a) An apparently perfect association between the presence/absence of a mutation (blue/red dots) and a specific phenotype/environment (shading) can arise from a single evolutionary change (red cross). This association is typically represented and tested using a contingency table (bottom). However, each virus sequence state is not an independent observation due to this evolutionary history. (b) The same apparent association can arise through multiple independent mutations (red crosses) linked to change in phenotype/environment. This latter scenario is less likely to occur by chance and provides evidence for parallel adaptation. (c) Experimental evolution studies are often designed to observe changes (red crosses) in variants from a known ancestral strain (blue dot, centre) that is introduced to different environments (shading).

It is useful to place virus experimental evolution studies in the same phylogenetic context (Figure 1c). Experimental evolution was, for example, used to identify adaptive mutations that increase viral fitness in vesicular stomatitis virus [26]. Such studies often initiate independent virus populations from the same source inoculum. The replicate populations are passaged in different environments and later scanned for shared evolutionary changes. This experimental design is, thus, equivalent to a star-shaped phylogeny with a known ancestor (Figure 1c). Since the replicate populations are not correlated by shared ancestry, differences amongst them are independent and can be tested using standard statistics. As explained above, phylogenetic analyses of viral sequences may be able to detect parallel evolution, but are insufficient to establish whether a given trait has arisen repeatedly through adaptation. Experimental evidence, from molecular biology or animal challenge studies, is ultimately required to determine the functional consequences of parallel mutations.
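The contrast between the scenarios in Figure 1a and Figure 1b can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions: the two trees, the tip names and the perfect mutation/phenotype association are hypothetical, Fisher's exact test is computed directly from the hypergeometric distribution, and Fitch parsimony is used as a simple stand-in for counting independent origins. It is not the phylogenetically corrected test of Bhattacharya et al. [7].

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables (same margins) that are no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def fitch_changes(tree, state):
    """Fitch parsimony on a binary nested-tuple tree: (state set, minimum changes)."""
    if isinstance(tree, str):  # a tip, identified by its name
        return {state[tree]}, 0
    left_set, left_n = fitch_changes(tree[0], state)
    right_set, right_n = fitch_changes(tree[1], state)
    shared = left_set & right_set
    if shared:
        return shared, left_n + right_n
    return left_set | right_set, left_n + right_n + 1


# Hypothetical data: tips m1..m4 carry the mutation, tips w1..w4 do not,
# and the phenotype is assumed to match the mutation exactly.
mutation = {**{f"m{i}": 1 for i in range(1, 5)}, **{f"w{i}": 0 for i in range(1, 5)}}
phenotype = dict(mutation)

tips = sorted(mutation)
a = sum(1 for t in tips if mutation[t] and phenotype[t])
b = sum(1 for t in tips if mutation[t] and not phenotype[t])
c = sum(1 for t in tips if not mutation[t] and phenotype[t])
d = sum(1 for t in tips if not mutation[t] and not phenotype[t])
p_value = fisher_exact_two_sided(a, b, c, d)

# Same tip states, two different (hypothetical) evolutionary histories.
single_origin = ((("m1", "m2"), ("m3", "m4")), (("w1", "w2"), ("w3", "w4")))
repeated_origin = ((("m1", "w1"), ("m2", "w2")), (("m3", "w3"), ("m4", "w4")))

n_single = fitch_changes(single_origin, mutation)[1]
n_repeated = fitch_changes(repeated_origin, mutation)[1]
print(f"Fisher p = {p_value:.4f}; changes: {n_single} vs {n_repeated}")
```

Both trees produce the same contingency table and therefore the same Fisher's exact p-value, yet parsimony infers a single change on the first tree and four independent changes on the second. Only the second history resembles the repeated acquisition shown in Figure 1b, which is why the tip counts alone cannot distinguish the two scenarios.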
However, it may be possible to further refine candidate sites for experimental confirmation by undertaking in silico analyses that combine structural and evolutionary information. This is especially true if virus proteins evolve the same structural change (e.g. a change in charge) via a variety of mutational paths. Mapping of virus mutations onto resolved protein structures can provide information about the functional effects of genetic changes, including thermodynamic stability, or interactions with other proteins or biologically important molecules [36]. For example, a recent study of human influenza B virus genomes observed parallel amino acid changes occurring three times in the HA gene of the Yamagata lineage [37]. Computational structural analyses showed that these were located in a major antigenic epitope [37]. These approaches include structural alignment methods [38] that can compare divergent virus proteins that exhibit little homology at the sequence level, but which share homology at the structural level. Other combined phylogenetic approaches can estimate the effect of specific mutations on protein stability, as demonstrated in a functional analysis of influenza A virus hemagglutinin gene variation [39]. We envisage that structural analyses could form part of a phylogenetically informed workflow for investigating the significance of parallel evolutionary changes (Figure 2), and for selecting appropriate candidates for experimental validation. The results obtained would also provide insights into the evolvability, mutability and robustness of virus proteins [40]. Other functional constraints, beyond those imposed by protein structure, could also play a role in parallel molecular evolution during emergence or zoonosis.
Non-coding viral genetic elements, such as untranslated regions (UTRs) [41], internal ribosome entry sites (IRES) [42,43], or noncoding RNAs [44,45], can be functionally important for virus replication and infection. Flaviviruses provide many examples of the importance of RNA structures for viral replication [46] and host adaptation [41], some of which are proposed to have arisen through parallel evolution [47]; different functional roles have been inferred for RNA structures in other viruses [48,49]. Evaluating the effects of natural selection on RNA genome regions encoding secondary structures represents an additional challenge because (i) our understanding of how such structures evolve is poor, including the degree to which non-coding elements are structurally constrained, and (ii) it is difficult to define neutrally evolving sites in such regions [50], which limits the use of dN/dS estimation methods. Various approaches have been developed to test for selection on non-coding regions [51] and to predict the secondary structure of RNA molecules [52,53], but to date their use has been limited in evolutionary analyses of viruses. Further development of these methods could improve our understanding of the role that RNA structures play in virus evolution [54]. Detecting parallel evolution in viruses, and inferring the mechanisms that underlie such changes, can benefit both basic and applied problems. If, in a rapidly evolving virus population, parallel mutations repeatedly arise in the context of the same change in phenotype or environment (i.e. Figure 1b), then at least some aspects of virus evolution are predictable. Such repeating patterns, at the sequence or structural level, may help to forecast virus emergence and cross-species transmission events in the future.
Further, placing these changes within a structural context could be used to triage newly discovered viruses, enabling surveillance and research to focus on those with the greatest predicted propensity for cross-species transmission or emergence. We note, however, that instances of parallel evolution identified by computational means must be interpreted with caution; confirmation of the fitness costs of evolutionary change ultimately depends on experimental validation, as shown for some recent outbreak scenarios [55,56]. A number of experimental techniques are now available to link virus genotypes to phenotypes, including pseudotyping [57], minigenome systems [58], viral reverse genetics [59], and site-directed mutagenesis [60]. These and related approaches are vital to determining the functional role of viral mutations and their effects on virus and host fitness. Current challenges for inferring parallel molecular evolution are twofold. First, to better understand and predict parallel adaptation, and apply it to the specific scenario of virus emergence, we need a model that describes how the phenomenon depends on factors such as mutation rates, population sizes, selection coefficients and genome evolvability. Second, improvements in statistical methods and software are needed to make it easier to test parallel evolution hypotheses in the correct way. Advances have been made in testing specific phenomena associated with parallel evolution, such as coevolution [61] and episodic selection [28,30]; however, an integrated inference framework that maximises the accuracy of site identification, and is robust to confounding processes such as recombination and genetic hitchhiking, is still required. New high throughput experimental approaches, such as deep mutational scanning, have been applied to virus pathogens and will provide more comprehensive information on the mutability of virus proteins [62,63].
Results from similar empirical approaches, combined with the rapid growth of genome sequences for viruses from multiple hosts and ecosystems, will provide a broader evidence base for virus parallel evolution during virus emergence.

Parallel genotypic adaptation: when evolution repeats itself
Convergent evolution: the need to be explicit
Convergent evolution on the molecular level
Large-scale phylogenomic analysis reveals the complex evolutionary history of rabies virus in multiple carnivore hosts
Myxomatosis in Australia and Europe: a model for emerging infectious diseases
Founder effects in the assessment of HIV polymorphisms and HLA allele associations
HIV evolution: CTL escape mutation and reversion after transmission
HIV-1 drug resistance and resistance testing
Assessing the epidemic potential of RNA and DNA viruses. This work provides an overview of DNA and RNA viruses with emergence potential, paired with simulated data to assess factors that determine the ability to establish outbreaks.
Evolutionary history and attenuation of myxoma virus on two continents
update of the drug resistance mutations in HIV-1. Summary of currently reported mutations associated with drug resistance in HIV-1, which includes many cases of parallel molecular evolution.
Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID
Multiple, linked human immunodeficiency virus type 1 drug resistance mutations in treatment-experienced patients are missed by standard genotype analysis
Current perspectives on HIV-1 antiretroviral drug resistance
The evolution and genetics of virus host shifts
Pathways to zoonotic spillover. This work explores the scenarios and factors that can prompt spillovers from animals to humans and the barriers that a pathogen sorts during zoonotic events.
Adaptation of HIV-1 to its human host
Efficient SIVcpz replication in human lymphoid tissue requires viral matrix protein adaptation
Host shifts and molecular evolution of H7 avian influenza virus hemagglutinin
Molecular determinants within the surface proteins involved in the pathogenicity of H5N1 influenza viruses in chickens
Tetherin-driven adaptation of Vpu and Nef function and the evolution of pandemic and nonpandemic HIV-1 strains
Nef proteins of epidemic HIV-1 group O strains antagonize human tetherin
The probability of parallel evolution
What drives parallel evolution?: How population size and mutational variation contribute to repeated evolution. Meta-analysis of experimental evolution datasets to analyse the effect of population sizes on the probability of occurrence of parallel molecular evolution.
Molecular basis of adaptive convergence in experimental populations of RNA viruses
The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing
Statistical properties of the branch-site test of positive selection
Accuracy and power of Bayes prediction of amino acid sites under positive selection
Detecting individual sites subject to episodic diversifying selection
Parallel evolution of drug resistance in HIV: failure of nonsynonymous/synonymous substitution rate ratio to detect selection
Evolutionary Pathways in Nature: A Phylogenetic Approach
The Comparative Method in Evolutionary Biology
Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters
Evidence of HIV-1 adaptation to HLA-restricted immune responses at a population level
Proposed integrated framework for mapping the effects of structural variants of proteins in the context of the human genome
Genome-wide evolutionary dynamics of influenza B viruses on a global scale
Algorithms, applications, and challenges of protein structure alignment
Inferring stabilizing mutations from protein phylogenies: application to influenza hemagglutinin
Stability effects of mutations and protein evolvability
RNA structure duplications and Flavivirus host adaptation
Viral IRES RNA structures and ribosome interactions
Structural basis for ribosome recruitment and manipulation by a viral IRES RNA
Zika virus produces noncoding RNAs using a multi-pseudoknot structure that confounds a cellular exonuclease
Viral noncoding RNAs: more surprises
Role of RNA structures present at the 3′ UTR of dengue virus on translation, RNA synthesis, and viral replication
Molecular archaeology of Flaviviridae untranslated regions: duplicated RNA structures in the replication enhancer of flaviviruses and pestiviruses emerged via convergent evolution
RNA structural constraints in the evolution of the influenza A virus genome NP segment
Functional analysis of RNA structures present at the 3′ extremity of the murine norovirus genome: the variable polypyrimidine tract plays a role in viral virulence
Detecting selection in noncoding regions of nucleotide sequences
Methods to detect selection on noncoding DNA
RNAalifold: improved consensus structure prediction for RNA alignments
Pfold: RNA secondary structure prediction using stochastic context-free grammars
Structural constraints on RNA virus evolution
Ebola virus glycoprotein with increased infectivity dominated the 2013-2016 epidemic
Human adaptation of Ebola virus during the West African outbreak
Review of the use of pseudotyping methods to investigate emerging viruses such as SARS-CoV
Minigenomes, transcription and replication competent virus-like particles and beyond: reverse genetics systems for filoviruses and other negative stranded hemorrhagic fever viruses
RNA virus reverse genetics and vaccine design
Site-directed mutagenesis
Detecting coevolving positions in a molecule: why and how to account for phylogeny
Example of the use of deep mutational scanning to determine the mutability of the hemagglutinin in Influenza A viruses
Experimental estimation of the effects of all amino-acid mutations to HIV's envelope protein on viral replication in cell culture
Evolutionary genomics of host adaptation in vesicular stomatitis virus
Adaptive evolution of MERS-CoV to species variation in DPP4
Receptor specificity in human, avian, and equine H2 and H3 influenza virus isolates
Transmission of influenza virus in a mammalian host is increased by PB2 amino acids 627K or 627E/701N
A single positively selected West Nile viral mutation confers increased virogenesis in American crows
The evolutionary pathway to virulence of an RNA virus

We thank Michael Golden for discussion and feedback. B.