title: Towards a genomics-informed, real-time, global pathogen surveillance system
authors: Gardy, Jennifer L.; Loman, Nicholas J.
date: 2017-11-13
journal: Nat Rev Genet
DOI: 10.1038/nrg.2017.88
The recent Ebola and Zika epidemics demonstrate the need for the continuous surveillance, rapid diagnosis and real-time tracking of emerging infectious diseases. Fast, affordable sequencing of pathogen genomes, now a staple of the public health microbiology laboratory in well-resourced settings, can affect each of these areas. Coupling genomic diagnostics and epidemiology to innovative digital disease detection platforms raises the possibility of an open, global, digital pathogen surveillance system. When informed by a One Health approach, in which human, animal and environmental health are considered together, such a genomics-based system has profound potential to improve public health in settings lacking robust laboratory capacity.
SUPPLEMENTARY INFORMATION: The online version of this article (doi:10.1038/nrg.2017.88) contains supplementary material, which is available to authorized users.
In late 2013 and early 2014, a lethal haemorrhagic fever spread throughout forested Guinea (Guinée forestière), undiagnosed for months. By the time it was reported to be Ebola, the virus had spread to three countries 1 and was likely past the point at which case-level control measures, such as isolation and infection control, could have contained the nascent outbreak. In 2015, a new dengue-like illness was implicated in a dramatic increase in Brazil's microcephaly cases; one year later, analyses revealed that the Zika virus had been sweeping through the Americas, unnoticed by existing surveillance systems, since late 2013.
Although public health surveillance systems have evolved to meet the changing needs of our global population, we continue to dramatically underestimate our vulnerability to pathogens, both old and new 5. Indeed, the recent events in West Africa and Brazil highlight the gaps in existing infectious disease surveillance systems, particularly when dealing with novel pathogens or pathogens whose geographic range has extended into a new region. Despite the lessons learned from previous outbreaks 6 such as the severe acute respiratory syndrome (SARS) epidemic in 2002-2003 and the 2009 influenza pandemic (particularly the need for enhanced national surveillance and diagnostic capacity), infectious threats continue to surprise and sometimes overwhelm the global health response. The cost of these epidemics demands that we take action: with fewer than 30,000 cases, the Ebola outbreak ultimately resulted in over 11,000 deaths, left nearly 10,000 children without parents 7 and caused cumulative gross domestic product losses of more than 10% 8. As with prior crises, in the wake of Ebola, multiple commissions have offered suggestions for essential reforms 8, 9. Most focus on systems-level change, such as funding research and development or creating a centralized pandemic preparedness and response agency. However, they also call for enhanced molecular diagnostic and surveillance capacity coupled to data-sharing frameworks. This hints at an emerging paradigm for rapid outbreak response, one that employs new tools for pathogen genome sequencing and epidemiological analysis (FIG. 1) and that can be deployed anywhere. In this model, portable, in-country genomic diagnostics are targeted to key settings for routine human, animal and environmental surveillance or rapidly deployed to a setting with a nascent outbreak.
Within our increasingly digital landscape, in which a clinical sample can be transformed into a stream of data for rapid analysis and dissemination in a matter of hours, we have a tremendous opportunity to respond more proactively to disease events. However, the potential benefits of such a system are not guaranteed, and many obstacles remain. Here, we review recent advances in genomics-informed outbreak response, including the role of real-time sequencing in both diagnostics and epidemiology. We outline the opportunities for integrating sequencing with the One Health and digital epidemiology fields, and we examine the ethical, legal …
Public health surveillance. The systematic collection, analysis and dissemination of health-related data to support planning, implementation and evaluation of public health practices and response.
Outbreaks and epidemics. Both are defined as increases in the number of cases of a particular disease beyond what is expected in a given setting. In outbreaks, the affected settings are smaller geographic regions; epidemics can span larger areas.
Clinical metagenomics. With its untargeted approach to sequencing, clinical metagenomics can cross disciplines in a way that clinical microbiology struggles to, identifying viral, bacterial, fungal and other eukaryotic pathogens in a single assay 11 and coupling pathogen detection to pathogen discovery. Given the current high cost of the technique (conservatively estimated at several thousand dollars), it is most often used for potentially lethal infections that fail the conventional diagnostic paradigm. Examples include the recent diagnosis of an unusual case of meningoencephalitis caused by the amoeboid parasite Balamuthia mandrillaris 12 and the diagnosis and treatment of neuroleptospirosis in a critically unwell teenager 13.
In the latter case, despite a high index of suspicion for infection, Leptospira santarosai was not detected by culture or PCR. The diagnostic primer sequences were later found to be a poor match to the genome of the pathogen. Following the metagenomic diagnosis, intravenous antibiotic therapy resulted in rapid recovery. In cases like this, the costs are easily justified, particularly when offset against the cost of a stay in an intensive treatment unit. However, routine diagnostic metagenomics is currently limited to a handful of clinical research laboratories worldwide. It is therefore regarded as a 'test of last resort' and kept in reserve for vexing diagnostic conundrums. Substantial practical challenges hinder the adoption of metagenomics for diagnostics (FIG. 2) (reviewed in depth in REF. 11). Chief among these is analytic sensitivity, which depends on three groups of factors:
- pathogen factors (for example, genome size, ease of lysis and life cycle);
- analytic factors (for example, the completeness of reference databases and the potential to mistake a target for a close genetic relative);
- sample factors (for example, pathogen abundance within a sample and contaminating background DNA).
As an example of a problematic sample, during Zika surveillance, attempts to perform untargeted metagenomic sequencing on blood yielded few reads, and in some cases none, owing to low viral titres 14. Target-enrichment technologies such as bait probes (reviewed in REF. 15) can be employed. However, even these were unsuccessful at recovering whole Zika genomes, so PCR enrichment was necessary 14. Sensitivity is not the only obstacle to universal pathogen detection through clinical metagenomics. Others include specificity issues arising from misclassification or contaminated reagents, the challenge of reproducing results from a complex clinical workflow, nucleic acid stability under varying assay conditions, ever-changing bioinformatics workflows and cost.
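A back-of-the-envelope model makes the dependence of analytic sensitivity on pathogen abundance concrete. The sketch below assumes, purely for illustration, that each sequencing read independently comes from the pathogen with a fixed probability. That probability is the pathogen's share of all nucleic acid in the library, so the number of pathogen reads is approximately Poisson-distributed. The model ignores lysis efficiency, genome size, reference database gaps and misclassification, all of which the text names as further factors. The function names and the read counts in the example are hypothetical.

```python
import math


def expected_pathogen_reads(total_reads: int, pathogen_fraction: float) -> float:
    """Expected pathogen reads if each read is pathogen-derived with a fixed,
    independent probability `pathogen_fraction` (a simplifying assumption)."""
    return total_reads * pathogen_fraction


def detection_probability(total_reads: int, pathogen_fraction: float,
                          min_reads: int = 1) -> float:
    """Poisson approximation to P(at least `min_reads` pathogen reads)."""
    lam = expected_pathogen_reads(total_reads, pathogen_fraction)
    # P(X < min_reads) = sum over k < min_reads of e^-lam * lam^k / k!
    p_below = sum(math.exp(-lam) * lam**k / math.factorial(k)
                  for k in range(min_reads))
    return 1.0 - p_below


if __name__ == "__main__":
    # Hypothetical scenario: the pathogen makes up one read in a million
    # (e.g. a low-titre sample swamped by host background).
    for reads in (1_000_000, 5_000_000, 20_000_000):
        p = detection_probability(reads, 1e-6, min_reads=3)
        print(f"{reads:>11,} reads -> P(>=3 pathogen reads) = {p:.3f}")
```

Under this model, deeper sequencing or depletion of background nucleic acid both raise the expected pathogen read count. As the Zika example shows, targeted enrichment may still be needed in practice.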
Given these issues, could metagenomics replace conventional microbiological and molecular tests for infection? Recent studies have used metagenomics in common presentations, including sepsis 16, pneumonia 17, urinary tract infections 18 and eye infections 19. These studies have generally yielded promising results, albeit typically at a lower sensitivity than conventional tests and at a much greater cost. Despite these problems, two factors will eventually drive sequencing into routine clinical practice. First, laboratory operations stand to gain from the ever-decreasing cost of sequencing. They also stand to gain from the potential savings of using a single diagnostic modality instead of tens or hundreds of different assays, each of which may require specific instrumentation, reagents, validation and labour. Second, and perhaps most compelling, is the additional information afforded by genomics. This includes the ability to predict virulence or drug resistance phenotypes, to detect polymicrobial infections and to reconstruct phylogenies for outbreak analysis.
Novel technologies: portable sequencing. Outbreaks of emerging infectious diseases (EIDs) most often occur in settings with minimal laboratory capacity, where routine culture and bench-top sequencing are simply not feasible. A portable diagnostic platform capable of in situ clinical metagenomics and outbreak surveillance is therefore clearly needed. A trend towards smaller and less expensive bench-top sequencing instruments was seen with three systems released in close succession 20: the 454 Genome Sequencer Junior system (since discontinued), the Ion Torrent Personal Genome Machine (PGM) system and the Illumina MiSeq system. Each of these instruments costs <$150,000 and puts next-generation sequencing (NGS) capability into the hands of smaller laboratories, including clinical settings.
In 2014, the MinION from Oxford Nanopore Technologies was released to early access users 21, heralding the potential for highly portable 'lab-in-a-suitcase' sequencing. The MinION is pocket-sized and is controlled and powered through a laptop USB connection. It is provided under a model in which the hardware is free but the consumer pays a premium for the reagent and flow cell consumables. Unlike bench-top instruments, it requires no rolling service contract or regular engineer visits. This makes it theoretically possible to scale the platform out to an essentially unlimited number of laboratories. Importantly, the MinION has been used in field situations, including in diagnostic tent laboratories during the Ebola epidemic 22, 23 and in a roving bus-based mobile laboratory in Brazil as part of the ZiBRA project 3, 24.
Figure 1 | A genomics-informed surveillance and outbreak response model. Portable genome sequencing technology and digital epidemiology platforms form the foundation for both real-time pathogen and disease surveillance systems and outbreak response efforts. All of these exist within the One Health context, in which surveillance, outbreak detection and response span the human, animal and environmental health domains. (Figure panels: Outbreak response; Portable genome sequencing; Digital epidemiology; One Health.)
Transmission. The event through which a pathogen is transferred from one entity to another. Transmission can be person-to-person, as in the case of Ebola; vector-to-person, as with Zika; or environment-to-person, via routes including food, water and contact with a contaminated object or surface.
Genomic epidemiology. The use of genome sequencing to understand infectious disease transmission and epidemiology. See FIG. 3.
Others have taken the MinION to more extreme environments where even the smallest traditional bench-top sequencer could not go. These include the Arctic 25 and Antarctic 26, a deep mine 27, zero gravity aboard a reduced-gravity aircraft (nicknamed the 'Vomit Comet') 28 and the International Space Station 29. However, this technology is not yet a panacea. Remaining challenges include:
- high DNA or RNA input requirements (currently hundreds of nanograms), which often necessitate PCR-based amplification;
- a flow cell cost of $500, which keeps the cost per sample high despite multiplexing;
- high error rates, which require genomes to be sequenced to high coverage and analysed at the signal level for single nucleotide polymorphism-based analysis.
Moreover, the long reads produced by the MinION overcome a number of challenges in assembling eukaryotic microbial pathogen genomes, such as the presence of discrete chromosomes or long repetitive regions. However, the upstream nucleic acid extraction steps required to obtain genomic DNA vary across microbial domains and might require reagents and equipment far less portable than the MinION.
From transmission to epidemic dynamics. Genomics can inform not just pathogen diagnostics but also epidemiology. Pathogen sequencing has been used for decades to understand transmission in viral outbreaks, from early studies of hantavirus in the United States 30 to human immunodeficiency virus (HIV) in the United Kingdom 31. More recently, the approach has been successfully extended to bacterial pathogens (reviewed in REF. 32) and has come to be known as genomic epidemiology 32. The term encompasses everything from population dynamics to the reconstruction of individual transmission events within outbreaks. Most transmission-focused investigations to date have been retrospective, with only a subset unfolding in real time, as cases are diagnosed 33-37.
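A simple majority-vote consensus model shows why high per-read error rates call for high coverage. The sketch below assumes that read errors at a site are independent. It also assumes, pessimistically, that every erroneous read reports the same wrong base. Real nanopore errors are partly systematic (for example, in homopolymers), so deeper coverage alone helps less than this model suggests, which is one reason signal-level analysis is used. The error rate and depths in the example are hypothetical.

```python
from math import comb


def majority_consensus_error(depth: int, read_error: float) -> float:
    """Probability that a simple majority vote at one site calls the wrong base.

    Assumes independent read errors that all support the same wrong base.
    Ties (possible at even depth) are counted as consensus errors.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    error = 0.0
    for k in range(depth + 1):  # k = number of reads carrying the error
        if 2 * k >= depth:      # wrong base wins or ties the majority vote
            error += comb(depth, k) * read_error**k * (1 - read_error)**(depth - k)
    return error


if __name__ == "__main__":
    per_read_error = 0.1  # hypothetical per-base error rate for a single read
    for depth in (1, 5, 11, 21, 51):
        print(depth, f"{majority_consensus_error(depth, per_read_error):.2e}")
```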
Figure 2 | Challenges to in-field clinical metagenomics for rapid diagnosis and outbreak response. A mobile medical unit deploying a portable clinical metagenomics platform has been established at the epicentre of an infectious disease outbreak, but the team faces challenges throughout the diagnostic process and epidemiological response. In the case of Zika virus, for example, three factors combine to complicate detection of viral nucleic acid by a strictly metagenomic approach: low viral titres in samples such as blood, a small genome of <11 kb and transient viraemia 120. Furthermore, obtaining enough viral nucleic acid for genome sequencing beyond simple diagnostics requires a tiling PCR and amplicon sequencing approach 14. Other challenges include access to a reliable Internet connection, the ability to collect sample metadata and the translation of genomic findings into real-time, actionable recommendations.
Basic reproductive number (R0). The average number of secondary cases of an infectious disease produced by a single infectious case, given a completely susceptible population.
Zoonotic. A term describing infectious diseases that typically exist in an animal reservoir but that can be transmitted to humans.
Survivor transmission. The transmission of an infectious disease, such as Ebola, from a survivor of that disease who has recovered from their symptoms.
Vector-borne. A term describing infectious diseases that are transmitted to humans through contact with a non-human species, particularly those diseases spread through insect bites. An example is the Zika virus, which is carried by mosquitoes.
Hot spots. Geographical settings where a variety of factors converge to create the social and environmental conditions that promote disease transmission.
Spillover. The process by which an infectious disease changes from existing exclusively in animals to being able to infect, then transmit between, humans. See FIG. 4.
In transmission-focused investigations, genetic variants are used to identify person-to-person transmission events (FIG. 3). This is done either through manual interpretation of the variants shared between outbreak cases 38 or via model-based approaches 39, and the result is a transmission network. Epidemic investigations are very different: only a subset of the epidemic cases are sequenced, so the goal is to use the population structure of the pathogen to understand the overall dynamics of the epidemic. Here, phylodynamic approaches are used to infer epidemiological parameters of interest. Phylodynamics was first conceptualized in 2004 by Grenfell et al. as a union of "immunodynamics, epidemiology, and evolutionary biology" (REF. 40). It captures both epidemiological and evolutionary information from measurably evolving pathogens. These are the viruses and bacteria for which high mutation rates and/or a range of sampling dates contribute a meaningful amount of genetic variation between sequences 41,42. In other words, there must be enough genetic diversity to infer an evolutionary history for the pathogen of interest, even if that history spans only the short time frame of an outbreak or epidemic. This is possible for most pathogens, particularly single-stranded DNA viruses, RNA viruses and many bacterial species 42, 43. For certain species, however, the lack of a strict molecular clock and/or frequent recombination complicates both phylodynamic studies and attempts to infer transmission events 42. Phylodynamics relies on tools such as Bayesian evolutionary analysis sampling trees (BEAST) 44. In BEAST, sequence data are used to build a time-labelled phylogenetic tree guided by a specific evolutionary process, often a variation on a theme of coalescent theory 45. From the tree, one can infer epidemiological parameters, including the basic reproductive number R0 (REF. 46). While the insights that can be gained from genomic data alone are exciting, the utility of phylodynamic approaches is greatly extended when additional data are integrated into the models (reviewed in REF. 47).
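The manual approach described above, in which variants shared between outbreak cases are used to link them into a transmission network, can be approximated with a simple pairwise-distance heuristic. The sketch below counts single nucleotide differences between aligned consensus genomes. It links any pair at or below a distance threshold and groups linked cases using a union-find structure. This is not one of the model-based methods cited above. The threshold is a hypothetical, pathogen-specific choice, and any link the code proposes would still need to be reviewed against epidemiological information, as in the final step of the workflow in FIG. 3. The toy sequences are invented.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences of equal length,
    skipping sites where either sequence has an ambiguous 'N'."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))


def putative_links(seqs: dict[str, str], threshold: int) -> list[tuple[str, str, int]]:
    """All case pairs whose SNP distance is at or below `threshold`."""
    links = []
    for (id1, s1), (id2, s2) in combinations(sorted(seqs.items()), 2):
        d = snp_distance(s1, s2)
        if d <= threshold:
            links.append((id1, id2, d))
    return links


def transmission_clusters(seqs: dict[str, str], threshold: int) -> list[list[str]]:
    """Connected components of the link graph, found with union-find."""
    parent = {case: case for case in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _ in putative_links(seqs, threshold):
        parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for case in seqs:
        groups.setdefault(find(case), []).append(case)
    return sorted(sorted(g) for g in groups.values())


# Invented toy alignment: four cases, ten aligned sites each.
EXAMPLE_SEQS = {
    "case1": "ACGTACGTAC",
    "case2": "ACGTACGTAT",
    "case3": "ACGAACGTAT",
    "case4": "TTTTACGTAC",
}

if __name__ == "__main__":
    print(putative_links(EXAMPLE_SEQS, threshold=1))
    print(transmission_clusters(EXAMPLE_SEQS, threshold=1))
```

With a threshold of 1, case1, case2 and case3 form one cluster and case4 stands alone. Single-linkage grouping chains case1 to case3 through case2 even though those two differ at two sites. This is one reason such clusters are treated as hypotheses rather than as transmission trees.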
Genomic epidemiology in action: Ebola. The many genomic epidemiology studies from the Ebola outbreak (reviewed in REF. 48) used bench-top and portable sequencing platforms to reveal outbreak-level events and epidemic-level trends. Real-time analyses published around the peak of the epidemic suggested the following:
- the outbreak probably arose from a single introduction into humans rather than repeated zoonotic introductions 49, 50;
- sexual transmission had a previously unrecognized role in maintaining transmission chains 51;
- survivor transmission, another unrecognized phenomenon, contributed to disease flare-ups later in the outbreak 52.
The first sequencing efforts unfolded months into the epidemic, and all of them affected the epidemiological response in real time. We can only speculate about their potential impact had they been deployed earlier. Arguably, the most compelling use of early sequencing would have been to provide a definitive Ebola diagnosis in this previously unaffected region of West Africa. However, even after the outbreak was underway, sequencing could have benefited the public health response. For example, ruling out bush meat as a source of repeated viral introductions could have shifted public health messaging campaigns away from avoiding bush meat and towards the importance of hygiene and safe funeral practices 53, potentially averting some cases. Portable sequencing and phylodynamic approaches are currently being deployed in the ongoing Zika epidemic. Whether real-time reporting of genomic findings can alter the course of a vector-borne epidemic remains to be seen. Retrospective phylodynamic investigations are also useful for pandemic preparedness planning. A recent analysis of 1,610 Ebola virus genomes (approximately 5% of all cases) reconstructs the movement of the virus across West Africa and reveals drivers of its spread 1.
The authors deduce that Ebola importation was more likely to occur between regions of a country than across international borders. They also find that both population size and distance to a nearby large urban centre were associated with local expansion of the virus. These findings may affect decision-making around border closures in future Ebola outbreaks and point to the need to develop surveillance, diagnostic and treatment capacity in urban centres.
The role of the environment
In deploying genomics for surveillance, diagnostics and epidemiological investigation, a key question remains: where? Many regions lack the diagnostic laboratory capacity to carry out basic surveillance, but continuous genomic surveillance in all of these settings would be impossible. Numerous projects have attempted to describe the pool of geographic hot spots and candidate pathogens from which the next epidemic or pandemic will arise. Determining these factors is key to predicting and preventing spillover events (FIG. 4). … (REF. 55). They report an increasing number of events each decade, generally located in hot spots defined by specific environmental, ecological and socio-economic characteristics. Most EIDs are zoonotic in origin. The highest risk of spillover is in regions with high wildlife diversity that have experienced recent demographic change and/or recent increases in farming activity 55. A global biogeographic analysis of human infectious disease further supports the use of biodiversity as a proxy for EID hot spots 56. Reviews focused on systems-level, rather than ecological, factors identify the breakdown of local public health systems as a driver of outbreaks. This suggests targeting surveillance to settings where biodiversity and changing demographics coincide with three further conditions: inadequate sanitation and hygiene, a lack of public health infrastructure for delivering interventions, and limited or no resources for the control of zoonoses and vector-borne diseases 57.
These analyses provide a shortlist of regions on which genomic and other surveillance activities should be focused 55, 58, including parts of eastern and southeastern Asia, India and equatorial Africa. Within these regions, sewer systems and wastewater treatment plants could be important foci for sample collection, providing a single point of entry to biological readouts from an entire community. Indeed, proof-of-concept metagenomics studies have revealed the presence of antibiotic resistance genes 59, human-specific viruses 60 … 61. Most were zoonotic in origin, and over one-quarter had been detected in non-human species many years before being identified as human pathogens. A later review reiterates this observation, noting that recent agents of concern (Ebola, Zika and chikungunya) had been identified decades before they achieved pandemic magnitude 62. Owing to NGS technology, the pace of novel virus discovery is accelerating. Recent large-scale studies revealed 184 new viruses sampled from macaque faeces in a single geographic location 63 and 1,445 new viruses discovered from RNA transcriptomic analyses of multiple invertebrate species 64. However, understanding which of these new entities might pose a threat requires a new approach.
One Health. The emergence of a zoonotic pathogen proceeds in stages 65 (FIG. 4). The One Health movement was launched in 2004 to better anticipate these transitions and respond more proactively to emerging threats. It recognizes that human, domestic animal and wildlife health and disease are linked to each other and that changing land-use patterns contribute to disease spread. On that basis, One Health aims to develop systems-minded, forward-thinking approaches to disease surveillance, control and prevention 66.
The One Health community aims to build an integrated system incorporating human, animal and environmental surveillance, a goal in which genomics can have an important role. To do so, it is investing in infrastructure for human and animal health surveillance, committing to timely information sharing and establishing collaborations across multiple sectors and disciplines. The One Health approach has been implemented through the PREDICT project, part of the Emerging Pandemic Threats (EPT) programme of the US Agency for International Development (USAID). PREDICT explores the spillover of selected viral zoonoses from particular wildlife taxa 67. Its early efforts have focused on three areas:
- developing non-invasive sampling techniques for wildlife 68;
- estimating the breadth of mammalian viral diversity, with at least 320,000 undiscovered viral species across nine viral families 69;
- demonstrating that viral community diversity is at least partially deterministic, which suggests that forecasting community changes that may signal spillover is possible 63.
Using integrated surveillance information to predict an outbreak is still many years away. Even so, One Health studies are already leveraging the tools and techniques of genomic epidemiology to understand current outbreaks.
Figure 3 | Inferring transmission events from genomic data. Genomic approaches to identifying transmission events typically involve four steps. First, outbreak isolates, and often non-outbreak control isolates, are sequenced and their genomes are either assembled de novo or mapped against a reference genome. Second, the genomic differences between the sequences are identified. Depending on the pathogen and the scale of the outbreak, these may include features such as genetic variants, insertions and deletions, or the presence or absence of specific genes or mobile genetic elements. Third, these features are examined to infer the relationships between the isolates from which they came. A variant common to a subset of isolates, for example, suggests that those cases are epidemiologically linked. Finally, the genomic evidence for epidemiological linkages is reviewed in the context of known epidemiological information, such as social contact between two cases or a common location or other exposure. Recently, automated methods for inferring potential epidemiological linkages from genomic data alone have been developed, greatly facilitating large-scale genomic epidemiological investigations 121.
Combining genomic data with data streams from enhanced One Health surveillance platforms presents an opportunity to detect the population expansions and/or cross-species transmissions that may precede a human health event. For example, genome sequences from a raccoon-associated variant of rabies virus (RRV) were paired with fine-scale geographic information and data from Canadian and US wildlife rabies vaccination programmes. Together, these data demonstrated that multiple cross-border incursions were responsible for the expansion of RRV into Canada and for sustained outbreaks in several provinces 70. This finding led to renewed concern about, and action against, rabies on the part of public health authorities 71. One of the first studies coupling detailed wildlife and livestock movement data with phylodynamic analysis of a bacterial pathogen revealed that cross-species jumps from an elk reservoir were the source of increasing rates of Brucella abortus infections in nearby livestock 72. As brucellosis is the most common zoonosis of humans, control programmes will benefit substantially from this sort of One Health approach 73.
This model, in which diagnostic testing in reference laboratories triggers genomic follow-up, is an effective near-term way to integrate genomics into One Health surveillance. Meanwhile, the community continues to explore solutions to the many challenges facing in situ clinical metagenomic surveillance of animal populations (reviewed in REF. 74). Initial forays into this area have been successful. For example, metagenomic analysis of human diarrhoeal specimens and of stools from nearby pigs revealed potential zoonotic transmission of rotavirus 75. However, metagenomic sequencing across a range of animal species and environments yields more questions than answers. What is an early signal of pathogen emergence, and what is background microbial noise 65? Which emerging agents are capable of crossing the species barrier and causing human disease 74? What degree of sampling is required to capture potential spillovers 67? Ultimately, a more efficient use of metagenomics in a One Health surveillance strategy might be to scan for zoonotic 'jumps' in selected sentinel human populations rather than to pursue a sweeping animal surveillance strategy 62. Sentinels would be chosen according to EID hot spot maps and other factors 65, and interesting genomic signals would trigger follow-up sequencing in the relevant animal reservoirs. Combining genomic data from these targeted surveillance efforts with phylodynamic approaches will make it possible to move beyond simple presence or absence signals to useful epidemiological insights. These include signals of population expansion, evidence of transmission within and between animal reservoirs and humans, and epidemiological analysis of a pathogen's early expansion.
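One question raised above, what degree of sampling is required to capture potential spillovers, has a classical back-of-the-envelope answer in surveillance design: the number of samples needed to detect at least one positive with a given confidence. The sketch below assumes random, independent sampling from a large population with a fixed prevalence and an assay of known sensitivity and perfect specificity. Clustering of infection within reservoir populations and sampling bias both violate these assumptions, so the result is an optimistic starting point rather than a sampling plan. The prevalence and sensitivity values are hypothetical.

```python
import math


def prob_detect_at_least_one(n_samples: int, prevalence: float,
                             test_sensitivity: float = 1.0) -> float:
    """P(at least one positive result among n independent random samples).

    Each sample is infected with probability `prevalence`, and an infected
    sample tests positive with probability `test_sensitivity` (perfect
    specificity is assumed).
    """
    p_positive = prevalence * test_sensitivity
    return 1.0 - (1.0 - p_positive) ** n_samples


def samples_needed(prevalence: float, confidence: float = 0.95,
                   test_sensitivity: float = 1.0) -> int:
    """Smallest n for which prob_detect_at_least_one(n, ...) >= confidence."""
    p_positive = prevalence * test_sensitivity
    if not 0.0 < p_positive < 1.0:
        raise ValueError("prevalence * sensitivity must lie strictly between 0 and 1")
    # Solve 1 - (1 - p)^n >= confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_positive))


if __name__ == "__main__":
    # Hypothetical sentinel-population scenarios.
    for prevalence in (0.05, 0.01, 0.001):
        print(prevalence, samples_needed(prevalence),
              samples_needed(prevalence, test_sensitivity=0.7))
```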
Most modern surveillance systems use human, animal, environmental and other data 76 to carry out disease-specific surveillance, in which a single disease is monitored through one or more data streams, such as positive laboratory test results or reportable communicable disease notifications. Despite marked advances over the preceding decades, testimony from multiple expert groups has repeatedly emphasized the need for improved surveillance capacity 8, 77. This includes the use of syndromic surveillance, a more pathogen-agnostic approach aimed at early detection of emerging disease 78, 79. Syndromic surveillance systems might treat non-traditional data streams as signals of illness in a population. Examples include school or employee absenteeism, grocery store or pharmacy purchases of specific items, and calls to a nursing hotline. Increasingly, digital streams are being used as inputs to these systems. These include participatory epidemiology projects such as Flu Near You 80, automated analysis of trending words or phrases on social media sites such as Twitter 81,82, and Internet search queries 83-85. This new approach to surveillance is known as digital epidemiology, or digital disease detection 86. In digital epidemiology, information passes through four stages 87:
1. retrieval from a range of sources, including digital media, newswires, official reports and crowdsourcing;
2. translation and processing, which includes extracting disease events and ensuring that reports are not duplicated;
3. analysis for trends;
4. dissemination to the community through media such as websites, email lists and mobile alerts.
… digital epidemiology platforms are currently operating 88. Their flexible nature and cost-effective, real-time reporting make them effective tools for gathering epidemic intelligence, particularly in settings lacking traditional disease surveillance systems.
Figure 4 | In spillover, a pathogen previously restricted to animals gradually begins to move into the human population. During stage one (pre-emergence), as a result of changing demographics and/or land use, a pathogen undergoes a population expansion, extends its host range or moves into a new geographic region. During stage two (localized emergence), contact with animals or animal products results in spillover of the pathogen from its natural reservoir(s) into humans, but with little to no onward person-to-person transmission. During stage three (pandemic emergence), the pathogen can sustain long transmission chains, that is, a series of disease transmission events such as a sequential series of person-to-person transmissions. Its movement across borders is facilitated by human travel patterns 65.
The fields of One Health and digital epidemiology are increasingly overlapping. In the PREDICT consortium, the HealthMap system 89 and local media surveillance were combined to identify 307 health events in five countries over a 16-week period 90. PREDICT also suggested a role for digital epidemiology not just in event detection but also in identifying changing EID drivers. EIDs are driven by multiple factors, many of which have digital outputs and represent novel sources of surveillance data 91. For example:
- mobile phone data or the patterns of lighted cities at night can reveal human movement;
- hunting data collected by states can reveal interactions between humans and wildlife;
- social media and digital news sources can reveal early signals of famine, war and other social unrest.
A major challenge is that the number of digital data sets available for each driver varies substantially. There are hundreds for surveying land use change, many based on remote sensing data 92, but mere handfuls for social inequalities and human susceptibility to infection, and most data are biased towards North America and Europe. The digital and genomic epidemiology domains are also starting to overlap.
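A toy pipeline can illustrate the four-stage digital epidemiology workflow described earlier in this section: retrieval; translation and processing, including event extraction and deduplication; trend analysis; and dissemination. The sketch below assumes that reports have already been retrieved and translated into English. It uses exact matching of normalized text for deduplication and keyword matching for event extraction. These are hypothetical stand-ins for the natural language processing used by real platforms such as HealthMap and do not describe any particular system. The reports, keywords and alert threshold are invented.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Report:
    source: str      # e.g. newswire, official report, social media (stage one)
    published: date
    text: str        # assumed to be already translated into English


def normalise(text: str) -> str:
    """Lower-case and collapse punctuation/whitespace so reprints compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def deduplicate(reports: list[Report]) -> list[Report]:
    """Stage two: keep only the earliest report for each normalised text."""
    first_seen: dict[str, Report] = {}
    for report in sorted(reports, key=lambda r: r.published):
        first_seen.setdefault(normalise(report.text), report)
    return list(first_seen.values())


def extract_events(reports, keywords):
    """Stage two: crude event extraction, one (disease, ISO year, ISO week)
    tuple per keyword found in a report."""
    events = []
    for report in reports:
        words = set(normalise(report.text).split())
        year, week, _ = report.published.isocalendar()
        events.extend((kw, year, week) for kw in keywords if kw in words)
    return events


def alerts(events, threshold: int) -> list[str]:
    """Stages three and four: count events per disease-week and format an alert
    message for every disease-week at or above `threshold`."""
    counts = Counter(events)
    return sorted(f"{disease} {year}-W{week:02d}: {n} reports"
                  for (disease, year, week), n in counts.items() if n >= threshold)


# Invented example reports; the second is a reprint of the first.
REPORTS = [
    Report("newswire", date(2014, 3, 17), "Fever cluster reported in District A."),
    Report("blog", date(2014, 3, 18), "fever cluster reported in district a"),
    Report("official", date(2014, 3, 19), "Ministry confirms fever outbreak in District B"),
    Report("newswire", date(2014, 3, 20), "Cholera cases rise in District C"),
]

if __name__ == "__main__":
    unique = deduplicate(REPORTS)
    for message in alerts(extract_events(unique, {"fever", "cholera"}), threshold=2):
        print(message)
```

Deduplicating before counting matters here. Without it, the reprinted report would inflate the fever count for that week.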
In the Ebola outbreak, digital epidemiology revealed several drivers of infection risk: households lacking a radio, high rainfall and urban land cover 93. This echoes evidence from a genomic study suggesting that sites where urban and rural populations mix contribute to the spread of disease 1. During the Zika epidemic, Majumder et al. used HealthMap and Google Trends to estimate the basic reproductive number R0 at 1.42-3.83 94. Phylodynamic estimates from Brazilian genomic data gave a similar range (1.29-3.85) 3. Both types of data stream can therefore be leveraged to calculate epidemiological parameters that help shape the public health response.
A digital pathogen surveillance era
Recent reports have called for the integration of genomic data with digital epidemiology streams 92, 95. When informed by a One Health approach, the epidemiological potential of such a digital pathogen surveillance system is profound. Imagine parallel networks of portable pathogen sequencers deployed to laboratories and communities in EID hot spots, regions that are traditionally underserved with respect to laboratory and surveillance capacity. These sequencers would process samples collected from targeted sentinel wildlife species, insect vectors and humans (FIG. 5). Samples would be pooled for routine surveillance, either through targeted diagnostics or, if the issue of analytical sensitivity can be overcome, through metagenomics. Individual samples would receive a full genomic work-up should a pathogenic signal be detected. At the same time, existing Internet-based platforms such as HealthMap and new local participatory epidemiology efforts would collect data to identify potential hot spot regions and to detect EID events. These data would guide both prospective and rapid-response deployment of additional sequencers.
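The pooling strategy described above resembles classical two-stage (Dorfman) group testing. Samples are screened in pools, and individual samples receive a full work-up only when their pool gives a signal. The sketch below computes the expected number of assays per sample under that scheme. It assumes a perfect assay, independent samples with a common prevalence, and no loss of sensitivity from diluting a positive sample in a pool. That last assumption is optimistic given the analytic sensitivity concerns discussed earlier. The prevalence values are hypothetical.

```python
def expected_tests_per_sample(prevalence: float, pool_size: int) -> float:
    """Expected assays per sample under two-stage (Dorfman) pooling.

    Stage one: one assay per pool of `pool_size` samples.
    Stage two: if the pool is positive, every member is re-tested individually.
    Assumes a perfect assay and independent samples with a common prevalence.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    if pool_size == 1:
        return 1.0  # individual testing: no second stage is needed
    p_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + p_pool_positive


def best_pool_size(prevalence: float, max_pool: int = 50) -> int:
    """Pool size in 1..max_pool minimising expected assays per sample."""
    return min(range(1, max_pool + 1),
               key=lambda k: expected_tests_per_sample(prevalence, k))


if __name__ == "__main__":
    for prevalence in (0.001, 0.01, 0.05, 0.4):
        k = best_pool_size(prevalence)
        print(f"prevalence {prevalence:.1%}: pool size {k}, "
              f"{expected_tests_per_sample(prevalence, k):.3f} assays per sample")
```

Take a hypothetical prevalence of 1%. Under this model, pools of 11 samples cut the expected number of assays to roughly one-fifth of what individual testing needs. Above a prevalence of roughly 30%, pooling no longer saves assays at all.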
Genome sequencing data coupled with rich metadata would then be released in real time to web-based platforms, such as Virological for collaborative analysis and Nextstrain for analysis and visualization 96. These sites, already used in the Ebola and Zika responses, would act as the nexus for a global network of interested parties contributing to real-time phylodynamic and epidemiological analyses and looking for signals of spillover, pathogen population expansion and sustained human-to-human transmission. Results would be immediately shared with the One Health frontline (epidemiologists, veterinarians and community health workers), who would then implement evidence-based interventions to mitigate further spread.

The pathway to such a reality is not without its roadblocks. Apart from technical and implementation challenges, a series of larger concerns surrounds the rollout of genomics-based rapid outbreak response, ranging from the uptake of a new, disruptive technology to effecting systems-level change on a global scale.

Sequencing-based diagnostics, particularly clinical metagenomics approaches, still straddle the boundary between research and clinical use. In this realm, uncertainty is a certainty, be it uncertainty inherent to the technology itself or informational uncertainty, such as how accurate, complete and reliable results actually are 97. Early adopters of genomics in the academic domain are used to uncertainty, often acknowledging and appraising it, but routine clinical use requires meeting the evidentiary thresholds mandated by a range of stakeholders, from regulators to the laboratories implementing new sequencing-based tests.
Decision criteria that influence whether a new genomic test is adopted include the ability of the assay to differentiate pathogens from commensals, the correlation of pathogen presence with disease, the sensitivity and specificity of the test, its reproducibility and robustness across sample types and settings, and a cost comparable to that of existing platforms 98. Validation (defining the conditions needed to obtain reliable results from an assay, evaluating the performance of the assay under those conditions and specifying how the results should be interpreted, including outlining limitations 99) is also critical. Much can be learned from the domain of microbial forensics, in which sequencing plays a large part 100. Budowle et al. review validation considerations for NGS 101, noting that this technology requires validation of sample preparation protocols (including extraction, enrichment and library preparation steps), sequencing protocols, downstream bioinformatics analyses (including alignment and assembly, variant calling, and the underlying reference databases and software tools) and the interpretation of the data. Complete validation of a sequencing assay may not always be possible, particularly for emerging pathogens. Therefore, just as the West African Ebola virus outbreak triggered a review of the ethical context for trialling new therapeutics and vaccines 102, the scale-up of NGS in emerging epidemics will engender similar conversations. Rather than waiting for this to happen, it is best to take an anticipatory approach: outlining the exceptional circumstances under which unvalidated approaches might be used, selecting the appropriate approach and examining the benefits of a potentially untested approach in light of individual and societal interests.
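Two of the adoption criteria listed above, sensitivity and specificity, have standard definitions based on a 2x2 comparison of a new test against a reference standard. The sketch below computes them, together with positive and negative predictive values, from hypothetical validation counts; the counts and the `ValidationCounts` type are invented for illustration and do not come from any study cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCounts:
    """Hypothetical 2x2 results of a new test against a reference standard."""
    true_pos: int   # reference positive, new test positive
    false_neg: int  # reference positive, new test negative
    true_neg: int   # reference negative, new test negative
    false_pos: int  # reference negative, new test positive


def sensitivity(c: ValidationCounts) -> float:
    # Fraction of truly positive samples that the new test detects.
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ValidationCounts) -> float:
    # Fraction of truly negative samples that the new test calls negative.
    return c.true_neg / (c.true_neg + c.false_pos)


def positive_predictive_value(c: ValidationCounts) -> float:
    # Probability that a positive call is a true positive; unlike sensitivity
    # and specificity, this depends on prevalence in the tested population.
    return c.true_pos / (c.true_pos + c.false_pos)


def negative_predictive_value(c: ValidationCounts) -> float:
    # Probability that a negative call is a true negative.
    return c.true_neg / (c.true_neg + c.false_neg)


if __name__ == "__main__":
    counts = ValidationCounts(true_pos=90, false_neg=10, true_neg=850, false_pos=50)
    print(f"sensitivity {sensitivity(counts):.3f}")  # 0.900
    print(f"specificity {specificity(counts):.3f}")  # 0.944
    print(f"PPV {positive_predictive_value(counts):.3f}")  # 0.643
    print(f"NPV {negative_predictive_value(counts):.3f}")  # 0.988
```

For metagenomic tests, deciding what counts as a positive call is itself part of validation, because the assay must distinguish pathogens from commensals before these ratios can be computed.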
If the social landscape surrounding the introduction of a new technology is not considered, prior experience suggests that the road to implementation will be difficult, with hurdles ranging from public mistrust to moratoria on research 103. The enthusiasm of the scientific community for new technology must not lead to inflated claims of clinical utility and poor downstream decisions around the deployment of that technology. Howard et al. outline several principles for successfully integrating genomics into the public health system, and as we pilot digital pathogen surveillance, the community would do well to keep many of them in mind: ensuring that the instruments and processes used are reliable and that reporting is standardized and readily interpretable by end users; that the technology is used to address important health problems; that the advantages of the approach outweigh the disadvantages; and that economic evaluation suggests savings to the health care system and society 104.

It is also important to reconsider the role of the diagnostic reference laboratory in the new genomic landscape. As the mandates of these laboratories expand to include enhanced surveillance and closer collaboration with field epidemiologists, laboratory directors will face new challenges, from managing exploratory work alongside routine clinical care to hiring a new sort of technologist, one with basic genomics and epidemiology training.

The ethical, social and legal implications of digital pathogen surveillance are an emerging area of research (reviewed in REF. 105). Chief among the issues that Geller et al. identify is the tension that exists when a new technology has the power to identify a problem but there is limited or no capacity to address the issue.
Balancing the benefits and harms to both individuals and populations is challenging when the predictive insight offered by a genomic technology is variable. For example, using genomics to identify an individual as a 'super spreader' has important implications for quarantine and isolation, but that label may be predicated on a tenuous prediction. The problem is further compounded by the fact that many infectious disease diagnoses carry a certain amount of stigma and that an individual's right to privacy might be superseded by the need to protect the larger population 105.

Data sharing and integration.

A critical need for successful digital pathogen surveillance is the capacity for rapid, barrier-free data sharing, and arguments for such sharing are frequently rehashed after outbreaks and epidemics. Genomic epidemiology was born largely in the academic sphere, with early papers coming from laboratories with extensive histories in microbial genomics and bioinformatics. For this community, open access to genome sequences, software and, more recently, publications has tended to be the rule rather than the exception. Indeed, a 2004 National Research Council report described "the culture of genomics" as "unique in its evolution into a global web of tools and information" (REF. 106). The same report includes a series of recommendations on access to pathogen genome data, including the statement that "rapid, unrestricted public access to primary genome sequence data, annotations of genome data, genome databases, and Internet-based tools for genome analysis should be encouraged" (REF. 106). As genomics has moved into the domain of clinical and public health practice, the notion of free and immediate access to genomic surveillance data has encountered several barriers: the siloing of critical metadata across multiple public health databases with no interoperability; the need to balance openness and transparency with patient privacy and safety; variable data quality, particularly in resource-limited settings; concerns over data reuse by third parties; a lack of standards and ontologies to capture metadata; and career advancement disincentives to releasing data 107-109. Despite these challenges, the spirit of open access and open data remains strong in the community, with over 40 public health leaders from around the world recently signing a joint statement on data sharing for public health surveillance 110.

In one such region, the syndromic surveillance system reports higher-than-average sales of a common medication used to relieve fever. Spatial analysis of the data from the pharmacies in the region suggests that the trend is unique to a particular district; a follow-up geographic information system (GIS) analysis using satellite data reveals that this area borders a forest and is increasingly being used for the commercial production of bat guano. An alert is triggered, and the field response team meets with citizens in the area. Nasopharyngeal swabs are taken from humans and livestock with fever, and guano and bat tissue are collected in the area. The samples are immediately analysed using a portable DNA sequencer coupled to a smartphone. An app on the phone reports the clinical metagenomic results in real time, revealing that in many of the ill humans and animals, a novel coronavirus makes up the bulk of the microbial nucleic acid fraction. The sequencing data are immediately uploaded to a public repository as they are generated, tagged with metadata about the host, sample type and location and stored according to a pathogen surveillance ontology. The data release triggers an announcement via social media of a novel sequence, and within minutes, interested virologists have created a shared online workspace and open lab notebook to collect their analyses of the new pathogen.
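In the surveillance scenario above, the first alert comes from higher-than-average pharmacy sales of a fever medication. One simple way such a syndromic signal could be flagged is to compare each day's count with the mean and standard deviation of a trailing baseline window. The sketch below assumes daily sales counts per district, a 7-day baseline and a threshold of three standard deviations; these parameters, the `flag_aberrations` function and the data are illustrative assumptions, not the system the scenario describes.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def flag_aberrations(
    daily_counts: Sequence[float],
    baseline_days: int = 7,
    threshold_sd: float = 3.0,
) -> List[int]:
    """Return indices of days whose count exceeds the trailing baseline.

    A day is flagged when its count is greater than the mean of the previous
    `baseline_days` days plus `threshold_sd` standard deviations. Days without
    a full baseline window are never flagged.
    """
    if baseline_days < 2:
        raise ValueError("baseline_days must be at least 2 to estimate spread")
    flagged: List[int] = []
    for day in range(baseline_days, len(daily_counts)):
        baseline = daily_counts[day - baseline_days:day]
        mu = mean(baseline)
        sd = stdev(baseline)
        # With a perfectly flat baseline (sd == 0), any rise above the mean
        # is treated as unusual.
        limit = mu + threshold_sd * sd if sd > 0 else mu
        if daily_counts[day] > limit:
            flagged.append(day)
    return flagged


if __name__ == "__main__":
    # Invented daily sales of a fever medication in three districts.
    sales: Dict[str, List[int]] = {
        "district_a": [20, 22, 19, 21, 20, 23, 21, 22, 20, 21],
        "district_b": [15, 14, 16, 15, 17, 15, 16, 15, 31, 38],
        "district_c": [30, 28, 31, 29, 30, 32, 29, 31, 30, 31],
    }
    for district, counts in sales.items():
        days = flag_aberrations(counts)
        if days:
            # prints: district_b: unusual sales on day(s) [8, 9]
            print(f"{district}: unusual sales on day(s) {days}")
```

A trailing-window threshold like this is deliberately crude. A real system might also need to handle day-of-week and seasonal patterns, and, as the day-9 check shows, a baseline that already contains the rising counts inflates the threshold for the following days.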
The Ebola and Zika responses in particular highlight the role of real-time sharing of data and samples, be it through the use of chat groups and a LabKey server to disseminate Zika data 111 or GitHub to share Ebola data 112. In the wake of Ebola, Yozwiak et al. 113 and Chretien et al. 114 outline additional issues facing data sharing, from differing cultures and academic norms to complicated consent procedures and technical limitations. They note that we as a community must agree on standards and practices that promote cooperation, a conversation that could begin by examining how the Global Alliance for Genomics and Health (GA4GH) framework for responsible sharing of genomic and health-related data (BOX 1) could be adapted for the digital pathogen surveillance community.

The future: the sequencing singularity?

Transformative change to public and global health is profoundly difficult. Complicating the existence of a rapid, open, transparent response is the fact that, no matter the setting, there are often conflicting interests at work. In an outbreak scenario, conflict may result from governments wishing to keep an outbreak quiet and/or from the tension between lower-income and middle-income countries with few resources for generating and using data and the researchers or response teams from better-resourced settings 115. Indeed, the conflicting values in outbreak responses meet the definition of a 'wicked' problem, one that resists simple resolution, spans multiple jurisdictions and elicits a different perspective on the solution from each stakeholder. Even the International Health Regulations (IHR), which ostensibly provide a legal instrument for global health security, fail to ensure basic surveillance and outbreak response: as of the most recent self-reporting, only 30% of the 196 member countries of the IHR are in compliance, meeting the prescribed minimum public health core capacities 5.
In these settings, digital pathogen surveillance must be within the purview of the larger global health community and its diverse group of non-state actors rather than being solely the responsibility of nations themselves 116. This raises an important issue: if nations are willing to cede a certain amount of surveillance and diagnostic control to the global health community, the notion of reciprocity suggests that they should derive some corresponding local benefit. The 'trickle-down' effects of global genomic surveillance have yet to be fully articulated, but they are likely to be realized first in the zoonotic domain, where global surveillance efforts will feed back into improved animal health at a local level, in turn benefiting local farmers.

Box 1 | The Global Alliance for Genomics and Health (GA4GH) framework for genomic data sharing

In the 1948 Universal Declaration of Human Rights, Article 27 outlines the right of every individual "to share in scientific advancement and its benefits". In this spirit, the Global Alliance for Genomics and Health (GA4GH) data-sharing framework 119, which covers data donors, producers and users, is guided by the principles of privacy, fairness and non-discrimination and has as its goal the promotion of health and well-being and the fair distribution of benefits arising from genomic research. The core elements of the framework include the following:
• Transparency: knowing how the data will be handled, accessed and exchanged
• Accountability: tracking of data access and mechanisms for addressing misuse
• Engagement: involving citizens and facilitating dialogue and deliberation around the societal implications of data sharing
• Quality and security: mitigating unauthorized access and implementing an unbiased approach to storing and processing data
• Privacy, data protection and confidentiality: complying with the relevant regulations at every stage
• Risk-benefit analysis: weighing benefits (including new knowledge, efficiencies and informed decision making) against risks (including invasion of privacy and breaches of confidentiality), minimizing harm and maximizing benefit at the individual and societal levels
• Recognition and attribution: ensuring recognition is meaningful to participants, providing due credit to all who shared data and ensuring credit is given for both primary and secondary data use
• Sustainability: implementing systems for archiving and retrieval
• Education and training: advancing data sharing, improving data quality, educating people on why data sharing matters, and building capacity
• Accessibility and dissemination: maximizing accessibility, promoting collaboration and using publication and digital dissemination to share results

Outbreaks occur at the intersection of risk perception, governance, policy and economics 117, and outbreak response is often based on political instinct rather than data 5. Building a resilient and responsive public health system is therefore more than just enhancing surveillance and coupling it to novel technology; it requires engagement, trust, cooperation and local capacity building 8, as well as a focus on pandemic prevention through development rather than pandemic response via disaster relief mechanisms 57. Expert panels convened by Harvard and the London School of Hygiene and Tropical Medicine 9 and by the National Academy of Medicine 8 have called for a central pandemic preparedness and response agency and have also underscored the need for deeper partnerships between formal and informal surveillance, epidemiology, and academic and public health networks 5. More recently, evolutionary biologist Michael Worobey wrote: "Systematic pathogen surveillance is within our grasp, but is still undervalued and underfunded relative to the magnitude of the threat" (REF. 118).
If we are to achieve the sequencing singularity (the moment at which pathogen, environmental and digital data streams are integrated into a global surveillance system), we require a community united behind a vision in which public health and the attendant data belong to the public, and behind the idea that we are a better, healthier society when the public is able to access and benefit from the data being collected about us and the pathogens we share the planet with.

Virus genomes reveal factors that spread and sustained the Ebola epidemic
Zika virus in the Americas: Early epidemiological and genetic findings
This work is the first to leverage genome sequences generated early in the Zika outbreak to provide a real-time glimpse into the spread of the virus.
Establishment and cryptic transmission of Zika virus in Brazil and the Americas
Genomic epidemiology reveals multiple introductions of Zika virus into the United States
This paper is the first to use a genomic approach to track the entry of Zika into the USA.
Our shared vulnerability to dangerous pathogens
Progress in global surveillance and response capacity 10 years after severe acute respiratory syndrome
West African Ebola crisis and orphans
Commission on a Global Health Risk Framework for the Future. The Neglected Dimension of Global Security: A Framework to Counter Infectious Disease Crises
Will Ebola change the game? Ten essential reforms before the next pandemic. The report of the Harvard-LSHTM Independent Panel on the Global Response to Ebola
Application of next generation sequencing in clinical microbiology and infection prevention
This report, from the American Society for Microbiology and the College of American Pathologists, provides a comprehensive overview of clinical metagenomics and the associated validation challenges.
Diagnosing Balamuthia mandrillaris encephalitis with metagenomic deep sequencing
Actionable diagnosis of neuroleptospirosis by next-generation sequencing
Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples
Clinical and biological insights from viral genome sequencing
Next-generation sequencing diagnostics of bacteremia in septic patients
Rapid pathogen identification in bacterial pneumonia using real-time metagenomics
Identification of bacterial pathogens and antimicrobial resistance directly from clinical urines by nanopore-based metagenomic sequencing
Illuminating uveitis: metagenomic deep sequencing identifies common and rare pathogens
Performance comparison of benchtop high-throughput sequencing platforms
The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community
Real-time, portable genome sequencing for Ebola surveillance
Nanopore sequencing as a rapidly deployable Ebola outbreak tool
Mobile real-time surveillance of Zika virus in Brazil
Extreme metagenomics using nanopore DNA sequencing: a field report from Svalbard
Real-time DNA sequencing in the Antarctic Dry Valleys using the Oxford Nanopore sequencer
Deep sequencing: intra-terrestrial metagenomics illustrates the potential of off-grid Nanopore DNA sequencing
Nanopore sequencing in microgravity
Nanopore DNA sequencing and genome assembly on the International Space Station
Genetic identification of a hantavirus associated with an outbreak of acute respiratory illness
The molecular epidemiology of human immunodeficiency virus type 1 in Edinburgh
Whole genome sequencing: implications for infection prevention and outbreak investigations
Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli
Realtime investigation of a Legionella pneumophila outbreak using whole genome sequencing
A multi-country Salmonella enteritidis phage type 14b outbreak associated with eggs from a German producer: 'near real-time' application of whole genome sequencing and food chain investigations, United Kingdom
Rapid draft sequencing and real-time nanopore sequencing in a hospital outbreak of Salmonella
Implementation of nationwide real-time whole-genome sequencing to enhance listeriosis outbreak detection and investigation
A brief primer on genomic epidemiology: lessons learned from Mycobacterium tuberculosis
Genomic infectious disease epidemiology in partially sampled and ongoing outbreaks
This paper introduces the concept of phylodynamics, which has since become a key tool in the population genomics and epidemiology toolboxes.
Measurably evolving populations
Genome-scale rates of evolutionary change in bacteria
Towards a new paradigm linking virus molecular evolution and pathogenesis: experimental design and phylodynamic inference
This paper describes BEAST, a frequently used toolkit for phylogenetics and phylodynamic reconstructions.
Inferring epidemiological dynamics with Bayesian coalescent inference: the merits of deterministic and stochastic models
The epidemic behavior of the hepatitis C virus
Emerging concepts of data integration in pathogen phylodynamics
The evolution of Ebola virus: insights from the 2013-2016 epidemic
Emergence of Zaire Ebola virus disease in Guinea
Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak
Molecular evidence of sexual transmission of Ebola virus
Reduced evolutionary rate in reemerged Ebola virus transmission chains
The wanderings of the communication on the Ebola virus disease
Ecological origins of novel human pathogens
This landmark work surveys the emergence of infectious diseases since 1940 and identifies a number of hot spots for disease emergence.
Global biogeography of human infectious diseases
Preventing pandemics via international development: a systems approach
The structure and diversity of human
Environmental surveillance of viruses by tangential flow filtration and metagenomic reconstruction
Search strategy has influenced the discovery rate of human viruses
Detecting the emergence of novel, zoonotic viruses pathogenic to humans
Non-random patterns in viral diversity
Redefining the invertebrate RNA virosphere
Prediction and prevention of the next pandemic zoonosis
Conference summary: One World, One Health: building interdisciplinary bridges to health in a globalized world
One Health proof of concept: bringing a transdisciplinary approach to surveillance for zoonotic viruses at the human-wild animal interface
Optimization of a novel noninvasive oral sampling technique for zoonotic pathogen surveillance in nonhuman primates
A strategy to estimate unknown viral diversity in mammals
Processes underlying rabies virus incursions across US-Canada border as revealed by whole-genome phylogeography
The changing face of rabies in Canada
Genomics reveals historic and contemporary transmission dynamics of a bacterial disease among wildlife and livestock
Brucellosis in livestock and wildlife: zoonotic diseases without pandemic potential in need of innovative one health approaches
Viral metagenomics on animals as a tool for the detection of zoonoses prior to human infection?
Unbiased whole-genome deep sequencing of human and porcine stool samples reveals circulation of multiple groups of rotaviruses and a putative zoonotic infection
Traditional and syndromic surveillance of infectious diseases and pathogens
Committee on Achieving Sustainable Global Capacity for Surveillance and Response to Emerging Diseases of Zoonotic Origin. Sustaining Global Surveillance and Response to Emerging Zoonotic Diseases
Implementing syndromic surveillance: a practical guide informed by the early experience
What is syndromic surveillance? MMWR Suppl
Flu Near You: crowdsourced symptom reporting spanning 2 influenza seasons
The reliability of tweets as a supplementary method of seasonal influenza surveillance
Twitter improves influenza forecasting
Web queries as a source for syndromic surveillance
Google trends: a web-based tool for real-time surveillance of disease outbreaks
Assessing Google flu trends performance in the United States during the 2009 influenza virus A (H1N1) pandemic
Digital disease detection: harnessing the Web for public health surveillance
An overview of internet biosurveillance
Digital disease detection: a systematic review of event-based internet biosurveillance systems
HealthMap: the development of automated real-time internet surveillance for epidemic intelligence
HealthMap has become one of the most important digital epidemiology resources.
Evaluation of local media surveillance for improved disease recognition and monitoring in global hotspot regions
Drivers of emerging infectious disease events as a framework for digital detection
Precision global health in the digital age
Spatial determinants of Ebola virus disease risk for the West African epidemic
Utilizing nontraditional data sources for near real-time estimation of transmission dynamics during the 2015-2016 Colombian Zika virus disease outbreak
Precision public health for the era of precision medicine
This paper describes the nextflu project, which gave rise to the Nextstrain platform, whose approach to analysis and visualization recently earned an international prize for open science.
Known unknowns: building an ethics of uncertainty into genomic medicine
Delphi technology foresight study: mapping social construction of scientific evidence on metagenomics tests for water safety
Criteria for validation of methods in microbial forensics
Expansion of microbial forensics
Validation of high throughput sequencing and microbial forensics applications
The Ebola clinical trials: a precedent for research ethics in disasters
Germline genome-editing research and its socioethical implications
The ethical introduction of genome-based information and technologies into public health
Genomics and infectious disease: a call to identify the ethical, legal and social implications for public health and clinical practice
US) Committee on Genomics Databases for Bioterrorism Threat Agents. Seeking Security: Pathogens, Open Access, and Genome Databases
Perspectives on data sharing in disease surveillance. Chatham House: The Royal Institute of International Affairs
Overcoming barriers to data sharing in public health: a global perspective
Big data or bust: realizing the microbial genomics revolution
Public health surveillance: a call to share data. International Association of Public Health Institutes
Real-time sharing of Zika virus data in an interconnected world
Democratic databases: science on GitHub
Data sharing: make outbreak research open access
Make data sharing routine to prepare for public health emergencies
Best practices for ethical sharing of individual-level health research data from low- and middle-income settings
Grand challenges in global health governance
Social and economic aspects of the transmission of pathogenic bacteria between wildlife and food animals: a thematic analysis of published research knowledge
Epidemiology: molecular mapping of Zika spread
Framework for responsible sharing of genomic and health-related data
Literature review of Zika virus
Using genomics data to reconstruct transmission trees during disease outbreaks

Smith Foundation for Health Research programmes. Both authors contributed equally to all aspects of the article.