key: cord-0035157-8djm8mi6
authors: Carrington, Christine V. F.
title: Viral Genomics: Implications for the Understanding and Control of Emerging Viral Diseases
date: 2011-12-28
journal: Genomics Applications for the Developing World
DOI: 10.1007/978-1-4614-2182-5_7
sha: cfe29880cb5fec8bb792b61371b58d83b850cbf2
doc_id: 35157
cord_uid: 8djm8mi6

In recent decades, many infectious diseases have significantly increased in incidence and/or geographic range, in some cases impacting heavily on human, animal or plant populations. Some of these 'emerging infectious diseases' are associated with pathogens that have appeared in populations for the first time as a result of cross-species transmission (e.g. human immunodeficiency virus and acquired immunodeficiency syndrome (HIV-AIDS), severe acute respiratory syndrome (SARS)), while others were previously known but are rapidly increasing in incidence or geographic range as a result of underlying epidemiological changes (e.g. multi-drug resistant Staphylococcus aureus (MRSA) infection, dengue, West Nile encephalitis, foot and mouth disease, cassava mosaic disease). The latter include such prominent diseases as tuberculosis, malaria and yellow fever, which were once on the decline but are now 're-emerging diseases'.

Factors such as deforestation, habitat fragmentation, urbanisation and modern agricultural practices provide increased opportunities for human interaction with infected reservoirs and vectors, and the existence of rapid global transport networks and high-density human and animal populations facilitates the spread of pathogens at an unprecedented rate, often over very large distances (Morens et al. 2004; Taylor et al. 2001; Weiss and McMichael 2004; Woolhouse and Gaunt 2007; Woolhouse 2002; Woolhouse et al. 2005). Although emerging and re-emerging diseases are associated with all types of microbes, viruses (and in particular RNA viruses) predominate (Taylor et al. 2001).
This is considered a consequence of their large population sizes and capacity for rapid evolutionary change (Woolhouse 2002), which together can produce large pools of phenotypic variants including viruses with altered virulence, transmissibility or host range that have increased epidemic potential in their original hosts or are able to jump species boundaries and establish themselves in new hosts. Emergence that leads to successful host switching may be classified into three stages: (1) initial single infection of a new host with no onward transmission (i.e. spillovers into 'dead-end' hosts), (2) spillovers that go on to cause local chains of transmission in the new host population before epidemic fade-out (i.e. outbreaks) and (3) epidemic or sustained endemic host-to-host disease transmission in the new host population (Parrish et al. 2008). In human populations, the majority of emerging diseases are caused by viruses that originate in wildlife populations and spill over into humans either directly or via domestic animals (Taylor et al. 2001). Wolfe et al. (2007) defined five stages through which animal-only (i.e. stage 1) viruses progress to become human-only (stage 5) viruses such as measles, smallpox and mumps (see Table 7.1). For the majority of emerging viruses, humans apparently represent dead-end hosts, and only a few proceed to stage 3 and beyond to achieve human-to-human transmission and cause epidemics or sustained endemic transmission (Parrish et al. 2008; Wolfe et al. 2007). Nonetheless, when this does occur, the impact in terms of morbidity, mortality and economic costs can be immense, as has been well demonstrated by the emergence of HIV, SARS coronavirus and H1N1 influenza. In terms of understanding and eventually controlling viral disease emergence, the challenge lies in identifying and quantifying the factors that determine which viruses may make the species jump and whether a new disease will progress to epidemic stage or not.
At the other end of the spectrum, there is the ever-present challenge of developing effective therapies and vaccines against rapidly evolving viral pathogens. In this regard, emerging viruses, the nature and extent of their diversity, their evolutionary processes and disease mechanisms need to be fully characterised and understood. Viruses were the first organisms to have their genomes completely sequenced (Fiers et al. 1976), and because of their small size, this could be done relatively quickly and cheaply even prior to the advent of 'next-generation' sequencing technologies. There is no doubt, however, that the latter have opened the floodgates, since viral genomes can now be generated at lower cost and much more rapidly than was possible using conventional sequencing approaches. The number of viral genomes available in public databases continues to increase exponentially. This wealth of data has led to significant progress in terms of rapid identification and characterisation of emerging viruses, as well as knowledge about their biodiversity and evolution. In terms of evolutionary biology, the beauty of working in viral genomics lies in the ability to study evolutionary changes on the same time scales as the events that shape them. For several viruses, historical samples are available for retrospective study, and their analysis has contributed to our understanding of the viral evolutionary and epidemiological factors and events accompanying their emergence, maintenance and spatial diffusion (reviewed in Pybus and Rambaut 2009). In addition to enabling the exploitation of existing virus collections, the new sequencing technologies and accompanying bioinformatic tools provide the potential for comprehensive tracking of viral evolution and population dynamics in real time. Unfortunately, much less progress has been made in areas that impact directly on virus control and treatment (Holmes 2009).
This is largely a consequence of the lack of appropriate clinical and epidemiological data to accompany the wealth of sequences (Holmes 2009). The other challenge that cannot be ignored is the ability of current computational approaches to deal with the huge volume of sequence data being generated. In this chapter, I discuss how viral genomics has contributed to our understanding of each of the stages of viral emergence and how it might contribute to disease prevention and control in the future. Although disease emergence in other species can be of equal importance and ultimately impacts on human development, for the purpose of brevity, I concentrate primarily on diseases that have emerged in human populations and draw examples from those that have most deeply affected the developing world. While there is no apparent relationship between the tendency for new human pathogens to be reported and a country's geographic location or level of development (Woolhouse and Gaunt 2007), inadequate public health surveillance and response systems in developing countries, coupled with the existence of underlying disease conditions, have meant that disease burden is usually greater in developing than in developed countries (as well illustrated by the recent H1N1 pandemic; Archer et al. 2009). Additionally, prevention and control strategies that are effective in more developed countries often fall short in resource-limited settings, which can then act as pockets of refuge where pathogens persist and may serve as future source populations for outbreaks in other regions. It has been suggested that preventing viral disease emergence in human populations begins with a systematic survey of viral diversity in animal populations (Wolfe et al. 2007).
Such knowledge would enable identification of animal populations harbouring viruses that have previously infected humans or that are likely to do so by virtue of their relatedness to known human pathogens, or perhaps their ability to infect human cell lines. Zoonotic viruses generally cause little or no apparent disease in their original hosts; thus animal reservoirs are often not obvious. Since it is clearly impossible to survey all animal species, the focus of animal surveillance should be on species that are more likely to harbour potentially emergent viruses, for example species with large and/or dense populations, and in particular those that live in close proximity to and are more closely related to humans and their domestic mammals, such as rodents, bats and birds. Non-human primate populations (regardless of their size or population density) are also worth surveying because of their close evolutionary relationship with humans and the fact that a number of important human pathogens have emerged from them (e.g. dengue virus (DENV), chikungunya virus (CHIKV), yellow fever virus (YFV), human T-cell leukaemia virus (HTLV) and HIV). Finally, any other species having direct (e.g. bushmeat, livestock) or indirect (e.g. vector-mediated) contact with humans that could have led to human infections in the past should also be included (Wolfe et al. 2007). Traditional approaches to virus discovery such as electron microscopy, cell culture, animal inoculation studies and serology (Storch 2007) have a number of limitations, the most important being that not all viruses can be cultured in the laboratory (Amann et al. 1995). There is now a range of sensitive molecular approaches to virus discovery that circumvent this problem by relying on detection and characterisation of viral genomes rather than targeting viral particles, antigens or their cytopathic effects (reviewed in Bexfield 2011).
These include hybridisation-, PCR- and sequence-based approaches that have varying levels of reliance on sequence information from known pathogens and thus differ in terms of the range of pathogens they would be expected to detect. For example, hybridisation-based techniques (such as microarray (Wang et al. 2002) and subtractive hybridisation (Lisitsyn et al. 1993)) require sequence information from known pathogens to detect related pathogens and are unable to detect completely novel virus families. Likewise, PCR-based approaches using degenerate primers are limited to amplification and detection of related viruses. However, there are also sequence-independent PCR approaches that facilitate detection of completely novel pathogens. These include sequence-independent single primer amplification (SISPA), degenerate oligonucleotide primed PCR, random PCR and rolling circle amplification (reviewed in Bexfield 2011). When these approaches are coupled with 'next-generation' sequencing technology (Margulies et al. 2005) such as 454 pyrosequencing (Roche), Illumina (Solexa) and SOLiD™ (Applied Biosystems) for definitive identification of amplified fragments, they very efficiently generate large amounts of sequence data that can then be analysed using bioinformatic tools. Next-generation sequencing also obviates the need for amplification prior to sequencing and has opened the field of metagenomics, i.e. the culture-independent study of microbial communities in environmental or biological samples by analysing the sample's nucleotide content. First applied to environmental samples such as sea water (Angly et al. 2006; Breitbart et al. 2002; Williamson et al. 2008), fresh water (Breitbart et al. 2009; Djikeng et al. 2009), soil (Fierer et al. 2007) and marine sediments (Breitbart et al. 2004), this approach has now been used to define the 'microbiomes' of a range of biological samples including human nasopharyngeal swabs (Bogaert et al. 2011), termite gut (Hongoh 2011) and cow rumen (Hess et al. 2011). It has also been adapted to specifically target viral metagenomes or 'viromes', by enriching samples for intact virions and then treating with nucleases to remove naked DNA and RNA, i.e. nucleic acid that is not protected within virion particles (Djikeng et al. 2008). In terms of targeting potential reservoirs or vectors for emerging diseases, studies have been performed on faecal, oral, urine and tissue samples from bats (Donaldson et al. 2010; Li et al. 2010a), insect pools (Victoria et al. 2008), chimpanzees and farm animals (Li et al. 2010b). The metagenomic approach has also been used for the identification and characterisation of 2009 pandemic H1N1 influenza A virus from nasopharyngeal swabs (Greninger et al. 2010), to study previously 'uncharacterisable' viruses that have been isolated through culture (Victoria et al. 2008), to explore within-host diversity of HIV and SIV (Bimber et al. 2010) and in comparative studies to identify viruses found in diseased versus healthy tissues from a variety of species (Blomström et al. 2010; Ng et al. 2009a; Ng et al. 2009b; Willner et al. 2009). However, one important limitation of this approach to detecting novel viruses is that the protocol currently used to enrich samples for viruses prior to sequencing includes a filtration step designed to exclude cells, cell debris and bacteria, which may also exclude very large viruses ('giant viruses') such as mimiviruses. Also, nuclease treatment eliminates the genomes of any viruses whose integrity has been disrupted by the enrichment process, and depending on the titre of remaining intact virions, these may not be efficiently sequenced (Djikeng et al. 2008).
One intriguing new approach to virus discovery that is worth noting in terms of its ability to characterise viral diversity in insects (which can be important viral vectors) is 'virus discovery in invertebrates by deep sequencing and assembly of total small RNAs' or vdSAR (Kreuze et al. 2009; Wu et al. 2010). This approach involves deep sequencing of viral small interfering RNAs (vsiRNAs) produced by host immune machinery in response to infection. vsiRNAs are produced by cutting up viral genomes, so piecing their sequences together recovers the virus sequence. In addition to being a sequence-independent approach, the process is expected to be more efficient since only a small proportion of host small RNAs need to be sequenced and data-mined (Wu et al. 2010). Additionally, since vdSAR assembles viral genomes from the products of an active host immune response to infection, only replicating and infectious viruses that induce the immune response are identified by this approach (Wu et al. 2010). In addition to facilitating surveys of animal reservoirs and vectors, all of the techniques described above (with the exception of vdSAR) may be used to rapidly detect and characterise newly emerged viruses in human populations. This is usually the primary research focus when an apparently new infectious disease first appears, as it facilitates the development of screening tests for early detection and epidemiological investigations aimed at identifying risk groups, reservoirs and possible transmission routes. Such information can then be used to inform control and prevention strategies, including the development of vaccines and antiviral therapies. The role that viral genomics can play in this regard was well demonstrated during the emergence of SARS, the first cases of which appeared in November 2002 in southern China. In March 2003, traditional cell culture resulted in the isolation of a novel virus from patient specimens (Drosten et al. 2003; Ksiazek et al. 2003; Peiris et al. 2003). Within days of this, the virus was identified as a coronavirus through the use of a pan-viral microarray and confirmed by sequencing using two parallel approaches. The first involved designing primers based on known coronaviruses and amplifying regions of the novel virus; in the second, viral sequences were directly recovered from the surface of the microarray to which they were hybridised, then cloned and sequenced without the need to design specific primers (Wang et al. 2003). Comparison with previously characterised coronavirus strains demonstrated that the virus identified was distinct from all known human pathogens (Wang et al. 2003). Thus within 24 h, an unknown virus was identified as a coronavirus, and within days partial genome sequences had been generated. Comparative genomics and evolutionary analyses also played a major role in pinpointing bats as the source of the precursor to the SARS virus and the primary reservoirs for SARS-like coronaviruses (Dominguez et al. 2007; Gloza-Rausch et al. 2008; Lau et al. 2005; Poon et al. 2005; Tang et al. 2006; Tong 2009; Woo et al. 2006; Carrington et al. 2008). As sequencing costs continue to fall and computing capacity improves, metagenomic approaches to virus detection and characterisation will no doubt become more and more routine aspects of public health activities. Researchers have demonstrated the potential utility of high-throughput pyrosequencing for the detection of viruses in human clinical specimens such as stool (Nakamura et al. 2009), nasopharyngeal swabs (Bogaert et al. 2011; Nakamura et al. 2009), autopsy-derived liver and kidney tissues (Palacios et al. 2008) and serum (Briese et al. 2009). This includes identification of novel viruses associated with high-mortality outbreaks of unknown aetiology (Briese et al. 2009) and in tissues from individuals who died following organ transplantation from the same donor (Palacios et al. 2008).
Others have demonstrated the potential usefulness of metagenomic sequencing in field surveillance for arboviruses by applying the technique to mosquitoes experimentally infected with dengue virus (Bishop-Lilly et al. 2010). It has even been suggested that metagenomic sequencing may be used for continual surveillance of large human populations for known and unknown viral pathogens (Anderson et al. 2003). The suggestion is that large pooled samples of human serum and plasma (possibly discarded specimens from diagnostic laboratories) could be enriched for viral particles and then subjected to metagenomic sequencing on a routine basis. Such large-scale continual surveillance could allow identification of viruses that have entered the human population even before the usual detection thresholds (which would normally depend on several people being infected) have been reached. According to the authors, this approach could be used to 'monitor the levels of known viruses, rapidly detect outbreaks and systematically discover novel or variant human viruses' (Anderson et al. 2003). Evidence suggests that transmission of viruses from animal reservoirs to humans is not uncommon (Hahn et al. 2000; Wolfe et al. 2005; Wolfe et al. 2004). However, in the majority of cases, humans are dead-end hosts or, even when they are not, the zoonotic virus cannot be sustained in prolonged transmission chains, such that outbreaks are small and die out quickly. The barriers to onward transmission are primarily biological (Woolhouse and Gaunt 2007). For example, tissue tropism or viral titres achieved might not allow for efficient human-to-human transmission, or transmission might be restricted by reliance on a vector that does not commonly interact with humans or in which the virus does not achieve high enough titres to efficiently infect humans.
In an apparent minority of cases, viruses surmount these barriers and can be maintained in the human population and may even lose their ability to replicate in the animal species they originated from. The evolutionary events that enable cross-species transmission and subsequent adaptation to the new host are poorly understood. However, they are more likely to be the result of viral rather than human evolutionary changes, since the time scale of human evolution is so much longer than the time frame implied by the frequency with which these events occur (Schliekelman et al. 2001). Studying viral evolution and comparative genomics applied to viruses before and after a transition, or to phylogenetically related human-animal pathogen pairs, can help us to understand the changes involved in adaptation to humans and other aspects of successful emergence. This type of approach, coupled with in vitro and in vivo studies, was used to identify a single amino acid change in the envelope glycoprotein that is responsible for enzootic strains of Venezuelan equine encephalitis virus (VEEV) gaining the ability to cause epidemics of neurological and potentially fatal disease in horses, with humans as spill-over hosts (Anishchenko et al. 2006). VEEV, an arbovirus belonging to the genus Alphavirus, is usually maintained in an enzootic rodent-mosquito-rodent cycle. An amino acid change (Thr → Arg) at position 213 in the E2 glycoprotein confers the ability to cause high-titre viraemia in horses, whereas the wild type is either unable to replicate in horses or does so at very low titres (Anishchenko et al. 2006). Likewise, the dramatic emergence of CHIKV (another mosquito-borne alphavirus) in Asia has been linked to a single amino acid change in the envelope 1 glycoprotein (E1-A226V) of the Indian Ocean lineage (IOL) responsible (de Lamballerie et al. 2008; Hapuarachchi et al. 2010; Kumar et al. 2008; Ng et al. 2009c; Sam et al. 2009; Schuffenecker et al. 2006).
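The comparison underlying such findings can be illustrated with a minimal sketch that lists the amino acid differences between two aligned protein sequences in the conventional notation (original residue, position, new residue, as in A226V). The function name `list_substitutions` and the short toy fragments are assumptions made purely for illustration. They are not real E1 or E2 sequences, and real analyses would start from a proper multiple sequence alignment.

```python
def list_substitutions(reference, variant, start=1):
    """List substitutions between two aligned, equal-length protein sequences.

    Each change is reported as <reference residue><position><variant residue>,
    e.g. 'A226V'. `start` is the protein position of the first aligned column.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    for offset, (ref_aa, var_aa) in enumerate(zip(reference, variant)):
        # Skip identical residues and alignment gaps ('-').
        if ref_aa == var_aa or "-" in (ref_aa, var_aa):
            continue
        changes.append(f"{ref_aa}{start + offset}{var_aa}")
    return changes


# Hypothetical five-residue windows (NOT real viral sequences), numbered so
# that the third column corresponds to position 226.
ancestral_window = "KLAMT"
epidemic_window = "KLVMT"
print(list_substitutions(ancestral_window, epidemic_window, start=224))
# prints ['A226V']
```

A list of candidate substitutions produced in this way is only a starting point. As the VEEV and CHIKV studies show, the phenotypic effect of a change has to be established experimentally.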
The E1-A226V change results in increased infectivity and transmissibility by Aedes albopictus (Tsetsarkin et al. 2007; Vazeille et al. 2007), previously considered as only a secondary vector in human-mosquito-human cycles (urban epidemic cycles), which typically involve Ae. aegypti. While these findings in VEEV and CHIKV provide proof of concept, they are both unusual in that only one amino acid change resulted in adaptation to a new host or vector. This may be because in both cases, the viruses already had the ability to infect the 'new' host, albeit inefficiently. In the case of viruses entering a new species for the first time, the scenario is expected to be much more complicated. This may be why mutations associated with emergence remain unknown for other zoonoses, including intensely studied viruses like HIV. Also, more recent work on CHIKV has shown that the effect of the E1-A226V mutation is lineage specific, working only in the Indian Ocean lineage (IOL) genomic background, with endemic Asian CHIKV strains requiring a second mutation (E1-98T) to become Ae. albopictus adapted (Tsetsarkin et al. 2011). Next-generation sequencing technology allows for rapid and comprehensive surveys of the extent and nature of viral diversity within and amongst animal reservoir hosts, vectors and human populations. This would provide a basis for investigating the fitness distribution and relevance of the mutations produced. The latter, coupled with good ecological, epidemiological, immunological and experimental data from in vitro and in vivo systems, is crucial if we are to understand the mechanisms involved in adaptation. Phylogenetic inference may be used to reconstruct the demographic history of a population from molecular sequences sampled from the population (Drummond et al. 2005).
The approach is based on a population genetic model known as the coalescent, which describes the relationship between the shape of the genealogical tree of sampled sequences and the demographic history of the population from which they were sampled (i.e. rates of population growth and decline, extent of population subdivision and patterns of migration) (Kingman 1982; Griffiths and Tavare 1994). In the case of RNA viruses, exploiting this link between population dynamics and molecular evolution, i.e. exploring their 'phylodynamics' (Holmes 2009; Grenfell et al. 2004), is particularly attractive since their high mutation rates, short generation times and large population sizes can result in significant genetic differences between sequences sampled within years, months or even days of each other. Additionally, the relatively short time frames involved mean that evolutionary and demographic events may be temporally aligned with the immunological, transmission and ecological events that shaped them. Given date- and location-stamped sequences, and depending on the nature and spatiotemporal resolution of the sampling, it is then possible to estimate when and where a given epidemic began or particular lineages arose, the order and timing of transmission events, the timing of changes in population growth rates, and the pattern and rate of virus movement between geographic regions, epidemiological risk groups, individuals and even tissues within an individual (reviewed in Pybus and Rambaut 2009). These inferences are all very pertinent given that ecological and immunological rather than genetic factors are thought to be the main determinants of viral emergence (Holmes 2006). One of the potential pitfalls of this approach is that inferences are based on estimated genealogies that have been derived with a level of uncertainty, as the reconstructed genealogy is in fact only one of many that can be derived from the data.
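The link between genealogy shape and population size, and the reason a single genealogy is an uncertain guide, can be made concrete with a small simulation. The sketch below assumes the simplest textbook case, which is not taken from the chapter: a haploid population of constant effective size N, in which the waiting time while k lineages remain is exponentially distributed with mean 2N / (k(k - 1)) generations. All function names are illustrative.

```python
import random


def simulate_coalescent_intervals(n_samples, pop_size, rng):
    """Waiting times (generations) while k = n, n-1, ..., 2 lineages remain,
    under a constant-size haploid coalescent (an assumed textbook model)."""
    intervals = []
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / (2.0 * pop_size)  # pairwise coalescence rate
        intervals.append(rng.expovariate(rate))
    return intervals


def expected_tmrca(n_samples, pop_size):
    """Expected time to the most recent common ancestor: 2N(1 - 1/n)."""
    return 2.0 * pop_size * (1.0 - 1.0 / n_samples)


def naive_size_estimates(intervals, n_samples):
    """Invert E[T_k] = 2N / (k(k-1)) separately for each interval."""
    ks = range(n_samples, 1, -1)
    return [t * k * (k - 1) / 2.0 for t, k in zip(intervals, ks)]


rng = random.Random(1)
n, N = 20, 1000
tmrcas = [sum(simulate_coalescent_intervals(n, N, rng)) for _ in range(5000)]
print("mean simulated TMRCA:", round(sum(tmrcas) / len(tmrcas), 1))
print("expected TMRCA:      ", expected_tmrca(n, N))
print("range across genealogies:", round(min(tmrcas), 1), "to", round(max(tmrcas), 1))
```

Across many simulated genealogies the average tree depth tracks the population size, but individual genealogies from the same population differ widely, so estimates taken from one tree (as in `naive_size_estimates`) are noisy. This is the uncertainty that inference over many plausible genealogies is meant to capture.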
While the reconstructed genealogy may be the best estimate, the true genealogy is rarely, if ever, known with absolute certainty. One solution is to account for this uncertainty by using probabilistic models to estimate parameters over many plausible genealogies, thereby providing a more rigorous statistical framework. The most commonly used model is the Bayesian skyline plot (Drummond et al. 2005) incorporated into the BEAST software package (Drummond and Rambaut 2007). This approach uses a Markov chain Monte Carlo (MCMC) sampling procedure to derive a distribution of trees from which a distribution of population size estimates is determined at intervals going back to the most recent common ancestor of the gene sequences (Drummond and Rambaut 2007; Drummond et al. 2002). The result is a plot of the estimated effective population size over time with credibility intervals that represent both phylogenetic and coalescent uncertainty (see Boxes 7.1 and 7.2). BEAST also jointly estimates substitution rates and divergence times (i.e. times to the most recent common ancestors of individual lineages and the genealogy as a whole) with credibility intervals and provides the option of using relaxed molecular clock models that allow for substitution rate variation across lineages in a tree (i.e. it does not assume a strict molecular clock) (Drummond et al. 2006). Several models that assume a particular pattern of population growth (e.g. exponential growth, constant population size) are also available for comparison (Drummond and Rambaut 2007). Boxes 7.1 and 7.2 describe results from two studies in which the demographic histories of dengue viruses were reconstructed from molecular sequences using the skyline plot in BEAST (Bennett et al. 2010; Carrington et al. 2005).

Box 7.1. All four DENV serotypes currently circulate in the Americas. DENV-4 was first reported in 1981 and identified as subtype II originating from Asia (Carrington et al. 2005; Lanciotti et al. 1997). In the same year, an Asian strain of DENV-2, distinct from the previously existing American subtype, was also reported (Deubel et al. 1986; Lewis et al. 1993; Twiddy et al. 2002). Figure 7.1 shows molecular clock phylogenies (top panel) and skyline plots (centre panel) estimated for the invading strains using sequences derived from DENV isolated from several countries in the Caribbean, South and Central America over about 20 years. Both skyline plots describe rapid exponential growth followed by maintenance of genetic diversity across the epidemic peaks and troughs, the latter being estimated from the number of countries reporting DENV-2/-4 each year (bottom panel). This is likely a result of population subdivision, which is reflected, for example, in the clustering of sequences from mainland (red) and island (blue) countries. Therefore, in this case, genetic diversity cannot be reliably interpreted as proportional to population size. The faster initial increase in genetic diversity for DENV-4 compared to DENV-2 may reflect the immunological landscape, in that there was no herd immunity to DENV-4 in 1981, but another subtype of DENV-2 had already been circulating for many decades. The dates at which the most recent common ancestors of each subtype existed are indicated by arrows along the x-axis of the skyline plot. They pre-date the first epidemiological reports of each virus by about a year. This suggests that viruses remained undetected until the number of infections or disease incidence reached a detection threshold, which might be quite high given the inadequate surveillance in many countries.

Fig. 7.1. The y-axes of the skyline plots represent relative genetic diversity, which is equal to the product of effective population size and generation length in the absence of population structure. For both viruses, the maximum a posteriori tree is presented on the same time scale as the skyline plot, with tip times corresponding to sampling times. The thick black lines are the median estimates, and the areas between the 95% CIs are shaded grey. Isolates on the trees are identified by their country of origin; mainland countries are labelled in red and islands in blue. The numbers of countries reporting DENV-2 and DENV-4 activity in each year are summarised in the histograms shown. In the case of DENV-2, this represents the activities of both subtypes III and V, which are not distinguished in epidemiological reports (Figure reproduced

Box 7.2. All four DENV serotypes have existed in Puerto Rico (PR) since the 1970s, causing regular and increasingly severe outbreaks (Bennett et al. 2010). Figure 7.2 shows (A) a skyline plot inferred from DENV-4 sequences (~4,000 nucleotides) derived from viruses isolated in PR between 1981 and 1998. The pattern of cyclic epidemics described by the overlaid census data (number of confirmed dengue cases) strongly correlates with the estimates of effective population size (derived using the Bayesian coalescent framework in the software package BEAST), with a 7-month lag (increases in population size precede outbreaks) that is adjusted for in panel (B). The reason for the observed time lag is unclear. It may be that increased diversity provides the variation from which fitter, epidemic-causing strains are more likely to arise. Then, during epidemics, diversity might be lost due to selection. Alternatively, the discrepancy may be due to non-random and biased sampling in case counts and isolations (e.g. epidemiologic surveillance) (Bennett et al. 2010).

The BEAST programme was also recently extended to allow for inference, visualisation and hypothesis testing of phylogeographic history (Lemey et al. 2009).
In the first model implemented, the geographic locations from which sequences were derived are considered as discrete states (Lemey et al. 2009). The spatial diffusion of the virus is then reconstructed using the coalescent approach to infer when and where direct ancestors of the sampled sequences existed. Different scenarios and models of spatial diffusion can be investigated and compared by specifying different prior distributions for the diffusion rates amongst the sampling locations (Lemey et al. 2009; Auguste et al. 2010; Talbi et al. 2010; Allicock et al. 2012). Phylogeographic inferences may be summarised using virtual globe software (Google Earth) such that spread over time may be visualised as an interactive animation. Examples of virtual globe projections demonstrating the diffusion dynamics through time are available online at http://www.phylogeography.org. The above-mentioned discrete model, however, requires the assumption that at any point along the phylogeny, the ancestral lineages existed in one of the sampled locations. To address this limitation, a more realistic 'continuous trait' model that allows for diffusion over a continuous landscape was recently implemented. Box 7.3 illustrates the spatial spread of rabies virus amongst raccoons in North America reconstructed using this model.

In addition to those shown in the boxes, there are numerous other examples where a 'phylodynamic' approach to viral evolutionary analysis has been successfully applied. They include, for example, the reconstruction of the origin and global dissemination of HIV-1 (Gilbert et al. 2007; Korber et al. 2000; Vidal et al. 2000; Zhu et al. 1998), reconstruction of the spread of rabies virus in North Africa with an investigation of factors underlying the patterns observed (Talbi et al. 2010), inference of YFV and DENV spatial diffusion in the Americas (Auguste et al. 2010; Allicock et al. 2012), investigation of the mechanism by which YFV is maintained between epidemics (Auguste et al. 2010) and elucidation of the role of natural selection and global migration in influenza A epidemic patterns (Nelson et al. 2007; Rambaut et al. 2008; Russell et al. 2008). Although this approach cannot replace good epidemiological data, it complements traditional epidemiological approaches and provides insights into the evolutionary dynamics underlying epidemic behaviour. Its ability to recover information not available in census data may be particularly useful in regions that have flawed monitoring and surveillance systems, such as in the developing world. For example, in the analysis described in Box 7.3, the geographic area to which the raccoon rabies virus is estimated to have spread by 1973 includes the location where the first raccoon rabies case was reported in 1977, even though the data did not include a sequence for this case.

Box 7.3 The spatiotemporal dynamics of rabies virus in North America were reconstructed from rabies virus nucleotide sequences using the Bayesian phylogeographic framework in the BEAST software package (Drummond and Rambaut 2007; Lemey et al. 2009; Lemey et al. 2010). The data set consisted of 47 rabies virus genomic fragments (2,811 nucleotides in length) sampled from a 30-year epidemic (Biek et al. 2007). Figure 7.3 shows snapshots of the dispersal pattern at different time points as illustrated by a projection of the inferred rabies virus phylogeny onto a map. The shaded regions represent the uncertainty about the locations of the rabies virus. Interestingly, the area contained in the 1973 diffusion pattern includes the location of the first raccoon rabies case reported in 1977 (green circle) even though the data do not include a sequence for this case. The changing tempo of the diffusion over time can be observed using the interactive animated visualisation available at www.phylogeography.org.

Fig. 7.3 Spatiotemporal dynamics of the rabies epidemic amongst North American raccoons. We provide snapshots of the dispersal pattern for August 1973, August 1983, August 1993 and August 2003. Lines represent MCC phylogeny branches projected on the surface. The uncertainty on the location of raccoon rabies is represented by transparent polygons. These 80% highest posterior density regions are obtained by contouring a time-slice of the posterior phylogeny distribution and imputing the location on each branch in each phylogeny using the precision matrix parameters for the respective sample. The white-red colour gradient informs the relative age of the dispersal pattern (older to more recent). A green circle marks Pendleton County, WV, where the epizootic's first case was reported in 1977. The maps are based on satellite pictures made available in Google Earth (http://earth.google.com). A dynamic visualisation of the spatiotemporal reconstruction can be explored at http://www.phylogeography.org/

The rapid viral evolution that facilitates species jumps and emergence also underlies viruses' ability to escape our immune systems and often presents a challenge in terms of developing effective vaccines and antiviral therapies. As described above, analysis of genomic data can provide valuable insights into virus evolution and epidemiology. The plethora of genomic sequence data being generated and the availability of rapid high-throughput sequencing technology therefore represent valuable resources that have already impacted on the way vaccine and therapeutic development is approached. In particular, they present the opportunity to understand the scope and distribution of the genomic diversity that must be tackled for a given virus and facilitate monitoring of the spatiotemporal dynamics of this biodiversity, thereby underpinning reverse vaccinology, pan-genomic and comparative genomic approaches to identifying vaccine and/or drug targets (Seib et al. 2009) (see Box 7.4).
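As a rough illustration of the pan-genomic idea mentioned above, in which multiple genomes of one pathogen are compared to find conserved candidate targets, the sketch below scores each column of a toy protein alignment and reports fully conserved windows. It is a simplified example under stated assumptions, not the method of Seib et al. (2009): the sequences are invented, the function names (`column_conservation`, `conserved_windows`) are hypothetical, and the input is assumed to be already aligned with `-` marking gaps.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences sharing the most common residue at each column.

    Gap characters ('-') never count as the shared residue, so a gappy
    column scores lower than a fully occupied one.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment if seq[i] != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / len(alignment))
    return scores


def conserved_windows(alignment, window, threshold=1.0):
    """Start indices of windows whose columns all meet the conservation threshold."""
    scores = column_conservation(alignment)
    return [
        start
        for start in range(len(scores) - window + 1)
        if all(score >= threshold for score in scores[start:start + window])
    ]


# Toy alignment (invented sequences): columns 1 and 5 vary between strains.
toy_alignment = [
    "MKTAYIAKQR",
    "MKTAYLAKQR",
    "MRTAYIAKQR",
    "MKTAYIAKQR",
]

print(column_conservation(toy_alignment))
print(conserved_windows(toy_alignment, window=3))
```

In practice conservation is only one criterion; a candidate target would also need to be, for example, surface-exposed and immunogenic, which this sketch does not attempt to assess.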
Genomic approaches are also expected to accelerate the identification of genetic and other molecular markers of prognostic and therapeutic relevance, such as markers of disease severity and drug resistance. Access to genomic data also enables researchers to go beyond genomics to transcriptomics, proteomics and other 'omics' approaches to studying emerging viruses. However, despite this immense potential, genomic developments of direct relevance to clinical care have been slow in coming (Holmes 2009), with a few exceptions such as the use of pyrosequencing to screen for mutations associated with antiviral resistance in influenza (Deyde et al. 2009; Deyde et al. 2010; Bright et al. 2005; Deng et al. 2011; Dharan et al. 2009; Duwe and Schweiger 2008; Hurt et al. 2009; Lackenby et al. 2008) and resources such as the Stanford HIV drug resistance database (http://hivdb.stanford.edu/index.html). This is likely because genomic data are seldom associated with the data on clinical manifestations and host immunological responses that would enable them to be fully exploited (Holmes 2009). Notable exceptions are the aforementioned Stanford HIV drug resistance database and the Los Alamos HIV databases (http://www.hiv.lanl.gov/content/index), and more recently, large-scale whole genome sequencing projects such as the Broad Institute's Genome Resources in Dengue Consortium (GRID) project (http://www.broadinstitute.org/annotation/viral/Dengue/) and the influenza genome sequencing projects (IGSP) by The Institute for Genomic Research (TIGR) (http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html) have sought to incorporate these and other metadata. The Broad dengue sequencing initiative, for example, aims to sequence over 3,500 dengue genomes tagged with information on geographic origin and disease severity (i.e. whether the disease outcome is dengue fever (DF) or the more severe, life-threatening dengue haemorrhagic fever (DHF) and dengue shock syndrome (DSS)) in an attempt to determine the impact of introduced strains versus indigenous evolution on disease outcomes, understand genomic correlates of disease severity and provide a map of genomic distributions with reference to DF, DHF and DSS (http://www.broadinstitute.org/annotation/viral/Dengue/projects.html). Dengue sequence diversity within individual patients with well-characterised disease outcomes, and for whom time courses for viraemia and status as primary or secondary infections are available, will also be investigated, in order to determine how intra-host diversity drives viraemia and disease and how it correlates with disease severity and primary versus secondary infection. At the time of writing, the GRID project, which was initiated in 2005, had sequenced 2,372 dengue genomes, IGSP (launched in the same year) had generated over 3,400 of approximately 7,400 planned genome sequences and there were tens of thousands of HIV sequences available in the Los Alamos database, of which 2,788 were HIV-1 complete genomes. The current and potential impact of these and other dengue, HIV and influenza sequencing initiatives is well reviewed by Holmes (2009).

Box 7.4
Reverse vaccinology: Bioinformatic approaches are used to screen whole pathogen genomes for genes that are potential vaccine and drug targets by virtue of the predicted function and other attributes of the proteins they encode.
Pan-genomics: Multiple genomes from a given pathogen are analysed in order to identify conserved antigens/targets that would ensure that vaccines or therapies based on them are effective against the full spectrum of pathogen diversity.
Comparative genomics: Genomes from pathogenic and non-pathogenic strains of a given pathogen are analysed in order to identify antigens/targets associated with disease.
For dengue, in addition to the previously detailed insights into evolution and epidemiology, analyses suggest that some genotypes differ in virulence and/or fitness (Armstrong and Rico-Hesse 2001; Bennett et al. 2003; Cologna et al. 2005; Cologna and Rico-Hesse 2003; Klungthong et al. 2004; Leitmeyer et al. 1999; Rico-Hesse et al. 1997; Sittisombut et al. 1997; Thu et al. 2004; Wittke et al. 2002; Zhang et al. 2005) and that immune-mediated natural selection may determine which genotypes survive (Adams et al. 2006). Thus the fitness of a given genotype may vary with the changing immunological landscape, which has major implications for vaccine development since tetravalent vaccines designed to induce immunity to all four DENV serotypes are unlikely to provide complete cross-protection (Whitehead et al. 2007). For influenza virus, analysis of IGSP data has already altered basic concepts of influenza virus evolution, shed light on the evolution of drug resistance, identified important source and sink populations and provided data on genomic diversity that will improve and accelerate the process of choosing which strains to incorporate into annual vaccines (reviewed in Holmes 2009). HIV is perhaps the greatest disappointment in terms of our inability to arrive at a vaccine despite a wealth of genomic data on the virus. In this regard, the major lesson learned from viral genomics is that HIV is immensely diverse both within and between individual hosts and that vaccines are likely to have to be location/population specific and require regular updating (Holmes 2009).

The ability to generate viral genomes increasingly rapidly and cheaply and the development of bioinformatic tools for analysing these data have transformed the study of emerging viruses.
Metagenomic sequencing and evolutionary analyses will soon become routine diagnostic and surveillance tools, allowing us to detect and visualise viral emergence and spatiotemporal dynamics in real time. In addition to enabling rapid responses in terms of development of pathogen-specific screening tests, identification of source populations and disease tracking, this will facilitate generation of hypotheses about evolutionary mechanisms and ecological factors underlying the patterns observed. However, despite the immense potential, addressing prevention and control issues of more direct clinical relevance such as the development of vaccines and therapeutics will only be possible if genomic data are accompanied by relevant clinical, immunological, phenotypic, host genomic and epidemiological data, with biological measures from in vitro and in vivo experimental studies incorporated as they arise. The development and maintenance of widely accessible and flexible genomic databases is therefore key in this regard. Furthermore, if we are to avoid the limitations of past efforts, it is essential that data from across the clinical spectrum be included so that the all too common bias towards symptomatic and/or severe cases is avoided. An ideal database would also include viral genomic and corresponding metadata from animal reservoir and/or vector populations, particularly if our goal is to predict future viral emergence. In addition to traditional sources, these data might be derived from programmes and early warning systems such as the Global Viral Forecasting Initiative (http://www.gvfi.org/), USAID PREDICT (http://www.vetmed.ucdavis.edu/ohi/predict/index.cfm) and the WHO/FAO/OIE Global Early Warning and Response System (GLEWS; http://www.glews.net/), which focus on identification and control of potentially emergent pathogens through surveillance at the animal-human interface.
This is a tall order in terms of the level of coordination and collaboration required to bring all of these data together: public health practitioners, field epidemiologists, clinicians, veterinarians and researchers would all have to work together. More important, however, is the computational challenge. There is no shortage of good ideas, but many of the analyses involved are very computationally intensive, and this is already a limiting factor. Bioinformatic and computational tools will therefore have to evolve further to handle the volumes of genomic data and associated metadata generated. Given our level of globalisation and population mobility (which is only going to increase), it is also essential that all affected geographic regions and populations be represented in these efforts. In addition to providing a complete picture of viral biodiversity against the full span of existing host genomic backgrounds, this will ensure that needs are addressed where the burden of disease is often greatest. It will also reduce the number of surveillance and control 'blind spots' where viruses might take refuge and eventually re-emerge. It is therefore essential that developing countries be fully integrated into the genomic age, through collaboration, technology transfer and in-country capacity building. The availability of open source databases, computational tools and scientific literature also goes a long way in this regard.

Cross-protective immunity can account for the alternating epidemic pattern of dengue virus serotypes circulating in Bangkok
Phylogeography and population dynamics of dengue viruses in the Americas
Phylogenetic identification and in situ detection of individual microbial cells without cultivation
Global screening for human viral pathogens
The marine viromes of four oceanic regions
Venezuelan encephalitis emergence mediated by a phylogenetically predicted viral mutation
Interim report on pandemic H1N1 influenza virus infections in South Africa.
Epidemiology and factors associated with fatal cases
Differential susceptibility of Aedes aegypti to infection by the American and Southeast Asian genotypes of dengue type 2 virus
Yellow fever virus maintenance in Trinidad and its dispersal throughout the Americas
Selection-driven evolution of emergent dengue virus
Epidemic dynamics revealed in dengue evolution
Metagenomics and the molecular identification of novel viruses
A high-resolution genetic signature of demographic and spatial expansion in epizootic rabies virus
Whole-genome characterization of human and simian immunodeficiency virus intrahost diversity by ultradeep pyrosequencing
Arbovirus detection in insect vectors by rapid, high-throughput pyrosequencing
Detection of a novel astrovirus in brain tissue of mink suffering from shaking mink syndrome using viral metagenomics
Variability and diversity of nasopharyngeal microbiota in children: a metagenomic analysis
Genomic analysis of uncultured marine viral communities
Diversity and population structure of a near-shore marine-sediment viral community
Metagenomic and stable isotopic analyses of modern freshwater microbialites in Cuatro Cienegas
Genetic detection and characterization of Lujo virus, a new hemorrhagic fever-associated arenavirus from southern Africa
Incidence of adamantane resistance among influenza A (H3N2) viruses isolated worldwide from 1994 to 2005: a cause for concern
Invasion and maintenance of dengue virus type 2 and type 4 in the Americas
Detection and phylogenetic analysis of group 1 coronaviruses in South American bats
American genotype structures decrease dengue virus output from human monocytes and dendritic cells
Selection for virulent dengue viruses occurs in humans and mosquitoes
Chikungunya virus adapts to tiger mosquito via evolutionary convergence: a sign of things to come
A comparison of pyrosequencing and neuraminidase inhibition assays for the detection of oseltamivir-resistant pandemic influenza A(H1N1) 2009 viruses
Nucleotide sequence and deduced amino acid sequence of the structural proteins of dengue type 2 virus, Jamaica genotype
Pyrosequencing as a tool to detect molecular markers of resistance to neuraminidase inhibitors in seasonal influenza A viruses
Detection of molecular markers of drug resistance in 2009 pandemic influenza A (H1N1) viruses by pyrosequencing
Outbreak of antiviral drug-resistant influenza A in long-term care facility
Viral genome sequencing by random priming methods
Metagenomic analysis of RNA viruses in a fresh water lake
Detection of group 1 coronaviruses in bats in North America
Metagenomic analysis of the viromes of three North American bat species: viral diversity among different bat species that share a common habitat
Identification of a novel coronavirus in patients with severe acute respiratory syndrome
BEAST: Bayesian evolutionary analysis by sampling trees
Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data
Bayesian coalescent inference of past population dynamics from molecular sequences
Relaxed phylogenetics and dating with confidence
A new and rapid genotypic assay for the detection of neuraminidase inhibitor resistant influenza A viruses of subtype H1N1, H3N2, and H5N1
Metagenomic and small-subunit rRNA analyses reveal the genetic diversity of bacteria, archaea, fungi, and viruses in soil
Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene
The emergence of HIV/AIDS in the Americas and beyond
Detection and prevalence patterns of group I coronaviruses in bats
Unifying the epidemiological and evolutionary dynamics of pathogens
A metagenomic analysis of pandemic influenza A (2009 H1N1) infection in patients from North America
Sampling theory for neutral alleles in a varying environment
AIDS as a zoonosis: scientific and public health implications
Re-emergence of chikungunya virus in South-east Asia: virological evidence from Sri Lanka and Singapore
Metagenomic discovery of biomass-degrading genes and genomes from cow rumen
The evolution of viral emergence
RNA virus genomics: a world of possibilities
Viral evolution and the emergence of SARS coronavirus
Toward the functional analysis of uncultivable, symbiotic microorganisms in the termite gut
Emergence and spread of oseltamivir-resistant A(H1N1) influenza viruses in Oceania
Global trends in emerging infectious diseases
The coalescent
The molecular epidemiology of dengue virus serotype 4 in
Timing the ancestor of the HIV-1 pandemic strains
Complete viral genome sequence and discovery of novel viruses by deep sequencing of small RNAs: a generic method for diagnosis, discovery and sequencing of viruses
A novel coronavirus associated with severe acute respiratory syndrome
A226V mutation in virus during the 2007 chikungunya outbreak in Kerala, India
Rapid quantitation of neuraminidase inhibitor drug resistance in influenza virus quasispecies
Molecular evolution and phylogeny of dengue-4 viruses
Severe acute respiratory syndrome coronavirus-like virus in Chinese horseshoe bats
Dengue virus structural differences that correlate with pathogenesis
Bayesian phylogeography finds its roots
Phylogeography takes a relaxed random walk in continuous space and time
Phylogenetic relationships of dengue-2 viruses
Bat guano virome: predominance of dietary viruses from insects and plants plus novel mammalian viruses
Host range, prevalence, and genetic diversity of adenoviruses in bats
Cloning the differences between two complex genomes
Genome sequencing in microfabricated high-density picolitre reactors
The challenge of emerging and re-emerging infectious diseases
Direct metagenomic detection of viral pathogens in nasal and fecal specimens using an unbiased high-throughput sequencing approach
Phylogenetic analysis reveals the global migration of seasonal influenza A viruses
Discovery of a novel single-stranded DNA virus from a sea turtle fibropapilloma by using viral metagenomics
Novel anellovirus discovered from a mortality event of captive California sea lions
Entomologic and virologic investigation of chikungunya
A new arenavirus in a cluster of fatal transplant-associated diseases
Cross-species virus transmission and the emergence of new epidemic diseases
Coronavirus as a possible cause of severe acute respiratory syndrome
Identification of a novel coronavirus in bats
Evolutionary analysis of the dynamics of viral infectious disease
The causes and consequences of HIV evolution
The genomic and epidemiological dynamics of human influenza A virus
Origins of dengue type 2 viruses associated with increased pathogenicity in the Americas
The global circulation of seasonal influenza A (H3N2) viruses
Chikungunya virus of Asian and Central/East African genotypes in Malaysia
Natural selection and resistance to HIV
Genome microevolution of chikungunya viruses causing the Indian Ocean outbreak
The key role of genomics in modern vaccine and drug design for emerging infectious diseases
Possible occurrence of a genetic bottleneck in dengue serotype 2 viruses between the 1980 and 1987 epidemic seasons in
Diagnostic Virology
Phylodynamics and human-mediated dispersal of a zoonotic virus
Prevalence and genetic diversity of coronaviruses in bats from China
Risk factors for human disease emergence
Myanmar dengue outbreak associated with displacement of serotypes 2, 3, and 4 by dengue 1
Detection of novel SARS-like and other coronaviruses in bats from Kenya
A single mutation in chikungunya virus affects vector specificity and epidemic potential
Chikungunya virus emergence is constrained in Asia by lineage-specific adaptive landscapes
Phylogenetic relationships and differential selection pressures among genotypes of dengue-2 virus
Two Chikungunya isolates from the outbreak of La Reunion (Indian Ocean) exhibit different patterns of infection in the mosquito, Aedes albopictus
Rapid identification of known and new RNA viruses from animal tissues
Unprecedented degree of human immunodeficiency virus type 1 (HIV-1) group M genetic diversity in the Democratic Republic of Congo suggests that the HIV-1 pandemic originated in Central Africa
Microarray-based detection and genotyping of viral pathogens
Viral discovery and sequence recovery using DNA microarrays
Social and environmental risk factors in the emergence of infectious diseases
Prospects for a dengue virus vaccine
Distribution of Mycobacterium ulcerans in buruli ulcer endemic and non-endemic aquatic sites in Ghana
Metagenomic analysis of respiratory tract DNA viral communities in cystic fibrosis and non-cystic fibrosis individuals
Extinction and rapid emergence of strains of dengue 3 virus during an interepidemic period
Naturally acquired simian retrovirus infections in central African hunters
Bushmeat hunting, deforestation, and prediction of zoonoses emergence
Origins of major human infectious diseases
Molecular diversity of coronaviruses in bats
Population biology of emerging and re-emerging pathogens
Ecological origins of novel human pathogens
Emerging pathogens: the epidemiology and evolution of species jumps
Virus discovery by deep sequencing and assembly of virus-derived small silencing RNAs
Clade replacements in dengue virus serotypes 1 and 3 are associated with changing serotype prevalence
An African HIV-1 sequence from 1959 and implications for the origin of the epidemic