key: cord-276916-j53i5xfs authors: Kraemer, M. U. G.; Cummings, D. A. T.; Funk, S.; Reiner, R. C.; Faria, N. R.; Pybus, O. G.; Cauchemez, S. title: Reconstruction and prediction of viral disease epidemics date: 2018-11-05 journal: Epidemiol Infect DOI: 10.1017/s0950268818002881 sha: doc_id: 276916 cord_uid: j53i5xfs A growing number of infectious pathogens are spreading among geographic regions. Some pathogens that were previously not considered to pose a general threat to human health have emerged at regional and global scales, such as Zika and Ebola Virus Disease. Other pathogens, such as yellow fever virus, were previously thought to be under control but have recently re-emerged, causing new challenges to public health organisations. A wide array of new modelling techniques, aided by increased computing capabilities, novel diagnostic tools, and the increased speed and availability of genomic sequencing allow researchers to identify new pathogens more rapidly, assess the likelihood of geographic spread, and quantify the speed of human-to-human transmission. Despite some initial successes in predicting the spread of acute viral infections, the practicalities and sustainability of such approaches will need to be evaluated in the context of public health responses. Infectious disease outbreaks pose a significant threat to human health. The frequency of such outbreaks is thought to have increased over the past decade. For example, quickly after an epidemic of Ebola virus affected Guinea, Sierra Leone, and Liberia in 2013-2016 [1] , chikungunya virus (CHIKV) caused an extensive international epidemic in the Americas and beyond, and was quickly followed by Zika virus (ZIKV) emergence. To date, there have been more than 500 000 confirmed or probable cases of ZIKV but the true number of cases remains unknown [2] . Yellow fever (YF), a vaccine-preventable disease, recently posed major public health problems. 
In 2015-2016, the largest YF outbreak since the 1980s was observed in Angola and the Democratic Republic of the Congo, causing 962 confirmed cases and 393 deaths [3]. YF also poses an ongoing public health risk to large, urban and under-vaccinated populations in the coastal areas of southern Brazil, a country that successfully eradicated YF in the 1950s and 1960s [3] [4] [5]. Examples of other emerging pathogens that have caused international health security concerns include the severe acute respiratory syndrome (SARS) virus and the Middle East Respiratory Syndrome Coronavirus (MERS-CoV) [6] [7] [8] [9]. This list extends to other pathogens such as influenza, Nipah and henipaviral diseases, and Lassa fever [10]. These examples show the continued risks that infectious diseases pose and highlight the challenges of large international outbreaks to epidemic planning and response. During emerging infectious disease outbreaks, empirical information and mathematical modelling techniques are now commonly used to characterise and predict the spatio-temporal dynamics of the spread of pathogens. Such analyses may help policymakers to evaluate the threat to public health, determine the resources required to reduce disease burden, and guide disease surveillance efforts and the deployment of interventions. In the last decade, our ability to perform such assessments has been improved by advances in a number of disciplines, including digital disease surveillance [11], environmental modelling [12, 13], genomics [14] and mathematical modelling [15]. For example, environmental variables such as rainfall and precipitation [13] [16] [17] [18] [19] [20] [21] [22] can be used to better understand the landscape within which the disease may be transmitted, and detailed transmission data from a small sampled population can be extrapolated to larger, un-surveyed areas [23].
Attempts have been made to illustrate the spatial structure of epidemics, mainly using human movement data [24] [25] [26] [27], to provide mechanistic insights into how the disease may disperse locally [3, 28, 29] or how effective reactive vaccination campaigns may be [30] [31] [32]. There are continued efforts to reconstruct epidemic dynamics using information derived from pathogen genomic data, which contain unique information about the history of transmission [2] [33] [34] [35]. Although each of these disciplines has an established relationship to disease prevention and control, the benefits of integrating them into a unified framework have yet to be fully achieved. Here we describe the common applications and models used to predict acute viral diseases and discuss the current challenges and limitations. We then outline the advantages of integrating disparate data sources to advance our understanding of epidemic spread. We discuss how such research has been used in recent outbreaks and outline shortcomings that may be addressed in the future. Phylogenetic and phylodynamic tools are increasingly being used to infer a range of outbreak properties [36]. Common spatiotemporal analyses of pathogen genomes focus on mapping and predicting virus lineage exchange among locations, with the underlying aim of reconstructing the pathways of disease introduction and spread, albeit at a coarse spatial resolution, and often retrospectively [2, 8, 33, 35, 37, 38]. An additional feature that can be inferred from genomic data is the timing of individual founder introductions [39]. Blue arrows in Fig. 1 indicate the time when the first report was published inferring the likely geographic origin of four major international infectious disease outbreaks.
Phylogenetic tools can help to characterise the number of introductions that lead to disease transmission in a new location [41], quantify the risk of cross-species transmission [42], and infer ecological drivers of transmission [43, 44]. Genome-derived estimates have been compared qualitatively to those from epidemiological data, but formal model-based integration of both data sources is rare [45, 46]. In principle, pairing genomic information with epidemiological inference should enable us to quantify the number of cases missed in each location and help to estimate parameters such as the basic reproductive number and doubling time of the epidemic, as done for ZIKV at the tail end of the epidemic (Fig. 1a) [46] [47] [48]. A common limitation when genetic data are used is the absence of a rigorous and formal sampling scheme. In many instances, genomic sampling is affected by convenience and expedience and may not reflect underlying incidence, although this can be improved post-hoc in large data sets via sub-sampling using, for example, continuous phylogenetic inference [49] [50] [51]. Strong sampling biases may affect estimates of the arrival time of a pathogen and its pathways of dissemination among locations [33]. Static disease mapping is a powerful tool to visualise and define the landscape within which transmission occurs, based on ecological drivers of transmission [17, 18, 22]. When combined with global data on human travel and mobility, it can be used to understand the global dynamic risk surface of infectious disease, especially when there are strong ecological determinants of transmission, as there are for the vector-borne diseases Zika, dengue, chikungunya and YF [27, 52]. Publication of reports that estimate geographic spread for the diseases in Fig. 1 is indicated by green arrows. The global epidemic history of Zika, for example, remains poorly understood.
The challenge to accurately reconstruct the epidemic pathway of the virus is further complicated by its relatively unspecific clinical presentation. This may explain why the initial studies that aimed to understand the geographic origin of the Zika epidemic in the Americas were published relatively late into the epidemic (>1 year, Fig. 1a ). For the other major outbreaks highlighted in Fig. 1 , estimates of the geographic origin were documented between 6 and 8 months after the first reports of human cases ( Fig. 1b-d ; Table 1 ). However, given the underlying ecological determinants of transmission that restrict the reproduction of the virus in the mosquito vector species, large areas can be excluded from the risk of local virus transmission. When overlaying information on the reported presence of Zika cases vs. the underlying ecological risk map, surveillance gaps may be identified [19, 27] . Areas where there is a mismatch in the predicted presence and reported presence (i.e. cases detected) should be targeted for active surveillance. The spatial spread process of new pathogens, however, is not only determined by the underlying ecological determinants in each location but also by the dynamic nature of importation, often driven by human movements [61] . Spatial models that take into account the patterns of human spread and mobility may, therefore, improve our ability to characterise and anticipate spatial expansion. Different models have been proposed to predict the geographic spread of epidemics but rarely have they been used in real time during the course of an epidemic [3, 62] (Fig. 1) . For example, during the YF outbreak in Angola and the Democratic Republic of the Congo, estimates of geographic spread to provinces outside Luanda, the capital of Angola, were published >6 months after the last cases were reported (Fig. 1c) . Such information could guide public health institutions to decide where and when to implement surveillance and control programs [27] . 
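The comparison of predicted and reported presence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `surveillance_gaps`, the location labels, the suitability scores and the 0.5 cut-off are all hypothetical and are not taken from the cited studies, which rely on full ecological risk maps rather than a single threshold.

```python
def surveillance_gaps(predicted_suitability, reported_locations, threshold=0.5):
    """Return locations predicted suitable for transmission but reporting no cases.

    predicted_suitability: dict mapping location -> suitability score in [0, 1]
                           (hypothetical output of an ecological risk map).
    reported_locations:    iterable of locations with at least one reported case.
    threshold:             assumed cut-off above which a location counts as suitable.
    """
    reported = set(reported_locations)
    gaps = [
        (location, score)
        for location, score in predicted_suitability.items()
        if score >= threshold and location not in reported
    ]
    # Highest predicted suitability first; ties broken alphabetically.
    gaps.sort(key=lambda item: (-item[1], item[0]))
    return [location for location, _ in gaps]


# Hypothetical example: four locations, only one of which has reported cases.
suitability = {"A": 0.9, "B": 0.7, "C": 0.2, "D": 0.6}
candidates = surveillance_gaps(suitability, reported_locations={"A"})
```

Here `candidates` lists the locations that might be prioritised for active surveillance, ordered by predicted suitability. In practice the threshold would need to be justified against the uncertainty of the underlying risk map.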
More work, however, is needed to dynamically map the spread of infectious diseases and to extract meaningful and interpretable quantities for public health practitioners. In parallel to these efforts to model the spread of pathogens at a meta-population level (e.g. between cities, regions, countries or continents), we also need to better understand transmission dynamics at a much more granular level and assess the characteristics of inter-human transmission. Historically, the potential for inter-human transmission has often been summarised with a single statistic, the reproduction number R (i.e. the average number of secondary infections generated by a case). However, it has long been recognised that it is also essential to assess heterogeneities in individual R values, since the presence of super-spreading events may have a major impact on the risk of emergence and our ability to control outbreaks [63]. This was exemplified in a large MERS-CoV outbreak in South Korea in 2015 in which only a small number of cases were responsible for the majority of infections [64, 65]. Other factors that may drive the spatial differences in the reproductive number are ecological (population density, climatic factors, or others) and can now be readily incorporated in transmission models [66]. Ideally, these assessments should be performed on detailed data documenting chains of transmission, as such data can provide precise quantification of the transmission potential and the impact of targeted interventions in different settings and over time, and allow testing specific hypotheses about the transmission process (e.g. what is the contribution of re-introductions to the overall dynamic?) [67]. However, such data are rarely available as it is difficult to identify the source of infection for most pathogens.
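To make the point about heterogeneity in individual R values concrete, the following Python sketch assumes that the number of secondary infections per case follows a negative binomial distribution with mean R and dispersion parameter k, a common modelling choice in the superspreading literature. The function names and the parameter values are illustrative assumptions rather than estimates from any outbreak discussed here.

```python
def offspring_pgf(s, R, k):
    """Probability generating function of a negative binomial offspring
    distribution with mean R and dispersion k (assumed model)."""
    return (1.0 + R * (1.0 - s) / k) ** (-k)


def prob_no_transmission(R, k):
    """Probability that a single case infects nobody."""
    return offspring_pgf(0.0, R, k)


def extinction_probability(R, k, iterations=10_000):
    """Probability that a chain started by one case dies out on its own.

    Computed as the smallest fixed point of q = g(q), where g is the
    offspring PGF, by repeated iteration starting from q = 0.
    """
    q = 0.0
    for _ in range(iterations):
        q = offspring_pgf(q, R, k)
    return q


# Same mean R = 2, different degrees of heterogeneity (smaller k = more skewed).
q_homogeneous = extinction_probability(R=2.0, k=1.0)
q_superspreading = extinction_probability(R=2.0, k=0.1)
```

With the same average R, the more heterogeneous setting (smaller k) yields a higher chance that an introduction fades out without causing a large outbreak, while the outbreaks that do take off tend to be driven by a few cases. This is one reason why a single summary statistic can be misleading.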
As a result, sophisticated statistical techniques are often required to reconstruct chains of transmission and estimate transmission parameters from more limited data that may include: (i) in the context of zoonoses, the size of human clusters [68] [69] [70] or the proportion of surveillance cases that reported a contact with the natural reservoir [71] , (ii) the growth rate in the case count [72] [73] [74] [75] , (iii) partial data on chains of transmission [76] , or (iv) outbreak data where the timing of symptom onsets and location of cases are recorded in small communities such as households [77] [78] [79] [80] , schools [81] or villages [82] . In cases of high-density sampling, genomic data can help to reconstruct transmission chains [83] . Mechanistic models of infectious disease dynamics can be used to make predictions about the future course of an outbreak within a given location [84] . Increasingly, such models are being used in real time, such that predictions are updated every time a new data point becomes available [85, 86] . Some other applications track pathogen evolution over time as data become available [87] . However, the perceived ability of such models to successfully or unsuccessfully make 'correct' predictions can generate considerable controversy [88, 89] . There are few studies that systematically investigate forecasting accuracy and its relationship to the length of time that is being predicted and to the quantity and quality of data available [90, 91] . Other examples are forecasting challenges for ongoing epidemics such as CHIKV in the Americas (https://www.darpa.mil/news-events/2014-08-15), EVD in West Africa [92] and seasonal influenza [85, 93] , designed and initiated by funding agencies and public health governments. This is an important area for future research. There are clear benefits to combining information from different data sources in order to better predict viral epidemic spread. 
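As an illustration of item (ii) in the list of limited data sources above, the growth rate in the case count can be converted into a reproduction number once a generation-time distribution is assumed. The Python sketch below assumes a gamma-distributed generation time, for which R = (1 + r Tg / a)^a, where r is the exponential growth rate, Tg the mean generation time and a the gamma shape parameter. The function names and the numerical values are hypothetical.

```python
import math


def growth_rate_from_doubling_time(doubling_time):
    """Exponential growth rate r (per day) implied by a doubling time (days)."""
    return math.log(2) / doubling_time


def reproduction_number(growth_rate, mean_generation_time, shape=1.0):
    """Reproduction number implied by an exponential growth rate.

    Assumes a gamma-distributed generation time with the given mean and
    shape. shape=1 corresponds to an exponential generation time
    (R = 1 + r*Tg); very large shape approaches a fixed generation time
    (R = exp(r*Tg)).
    """
    base = 1.0 + growth_rate * mean_generation_time / shape
    if base <= 0:
        raise ValueError("growth rate too negative for this generation-time model")
    return base ** shape


# Hypothetical example: cases double every 7 days, mean generation time 7 days.
r = growth_rate_from_doubling_time(7.0)
R_exponential = reproduction_number(r, mean_generation_time=7.0, shape=1.0)
R_near_fixed = reproduction_number(r, mean_generation_time=7.0, shape=1e6)
```

The two results differ even though the growth rate and mean generation time are identical, which shows how strongly an estimate of R from the epidemic curve depends on the assumed shape of the generation-time distribution.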
Previous work most commonly presents estimates from different sources side-by-side, for example, estimates of the epidemic reproductive number derived from genomic vs. epidemiological data [46]. Such comparisons are important to assess the consistency of data sources and may help to derive new hypotheses. Spurred by technological innovation such as portable sequencing using the MinION device (Oxford Nanopore Technologies, Oxford, UK) [94] and by interdisciplinary collaborations during disease outbreaks, researchers have started to combine three types of transmission data: spatial, genomic and epidemiological. Such combined analyses have now been published for three of the four major outbreaks we considered here (Fig. 1, red arrows) [33, 38, 41]. For example, such interdisciplinary work helped to identify the introduction of Zika into the Americas [2], investigated the main drivers of transmission of ZIKV through climatic suitability of its mosquito vectors [25] and tried to extrapolate how many people had been infected with the virus [23, 92, 93]. In the context of phylogenetic analyses, environmental and other spatial data may be helpful in reconstructing the drivers of transmission and spread using, for example, information on the reservoir or host movements [35, 95]. In turn, phylogenetic information may complement epidemiological analysis by providing more evidence on the transmission routes that are common in an outbreak [96]. This may be particularly useful for diseases that have a highly structured transmission dynamic, such as MERS or SARS, where a small number of people are responsible for the majority of secondary cases [63, 97], transmission from the animal reservoir is frequent, or importation drives locally observed epidemics [33].
One common assumption in many epidemiological models is that it is equally likely for people to meet and infect others living in the same location and that population immunity is proportional to the demographic structure [98]. Hence, observed cases are often assumed to arise from other cases that are reported locally as long as they are consistent with the generation time of the disease. However, a well-connected location can, in principle, accrue a large number of incident cases through recurring introductions from elsewhere, rather than via local transmission [33]. These results can have large implications for surveillance and control, as different competing strategies (e.g. limiting importations or eradicating the disease locally) may be considered. While analytical approaches of various degrees of complexity have been proposed to probabilistically reconstruct transmission trees from incomplete outbreak data [73, 81, 97], contact tracing, which can be very labour intensive [67], remains a gold standard information source. This may allow us to determine the true distribution of cluster sizes (i.e. the number of subsequent cases resulting from each introduction) but is often only available for a small number of locations. Genomic data can help to refine the understanding of heterogeneity in transmission, but such a framework does not yet allow us to quantify exactly the fraction of observed cases that are attributable to local transmission versus introduction from elsewhere, or to determine how many importations are responsible for the local incidence, despite its crucial importance for eradication campaigns [42, 100, 101]. In the context of the Zika outbreak in Florida, combining genomic data from the outbreak with epidemiological analysis revealed that the outbreak was driven by a large number of introductions rather than by persistent local transmission.
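The cluster-size definition given above (the number of subsequent cases resulting from each introduction) can be illustrated with a small Python sketch. It assumes an idealised situation in which contact tracing has identified, for every case, either its infector or the fact that it was an introduction. The data structure, function names and case labels are hypothetical, and real outbreak data would rarely be this complete.

```python
def cluster_sizes(infector_of):
    """Count the subsequent cases attributable to each introduction.

    infector_of: dict mapping case id -> infector's case id, or None if the
                 case was introduced from elsewhere (assumed fully known).
    Returns a dict mapping each introduced case -> number of downstream cases.
    """
    def find_introduction(case):
        visited = set()
        while infector_of[case] is not None:
            if case in visited:
                raise ValueError("transmission tree contains a cycle")
            visited.add(case)
            case = infector_of[case]
        return case

    sizes = {case: 0 for case, source in infector_of.items() if source is None}
    for case, source in infector_of.items():
        if source is not None:
            sizes[find_introduction(case)] += 1
    return sizes


def fraction_introduced(infector_of):
    """Fraction of observed cases that were introductions rather than
    the result of local transmission."""
    introduced = sum(1 for source in infector_of.values() if source is None)
    return introduced / len(infector_of)


# Hypothetical tree: three introductions (A, D, F), three locally acquired cases.
tree = {"A": None, "B": "A", "C": "B", "D": None, "E": "D", "F": None}
sizes = cluster_sizes(tree)
share_introduced = fraction_introduced(tree)
```

In this toy tree half of the observed cases are introductions, and most introductions lead to few or no subsequent cases. An epidemic curve alone would not reveal this pattern, which is why the text stresses contact tracing and genomic data.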
In the recent yellow fever outbreak in southern Brazil, linking epidemiological, spatial and genomic data and techniques could provide insights into the transmission potential and risk of urban transmission [102]. One dataset and analysis alone would not have been strong enough to support such conclusions [33]. Inferences about epidemic processes made using mathematical models rely on a number of assumptions. Geographic modelling approaches, mostly informed by spatial ecology, attempt to fill gaps where no data have been observed; hence inferences may be uncertain, as the underlying ecological process may be poorly understood and dynamical aspects of the invasion process are ignored. These deficiencies can be ameliorated, in part, by adding virus genome data that contain information about past transmission and invasion patterns [103]. However, due to incomplete and poor sampling (as discussed above), genomic data alone may provide an incomplete picture of the timing of viral introduction and spread among locations. This, in turn, can be supported by the addition of epidemiological time series of reported cases and serological information about population immunity [104, 105]. Despite this, building a joint inference framework that combines all available data sources and that characterises observation and sampling processes correctly is a daunting task. We are, however, entering a period in which the data for this task are becoming available in a timely fashion, and we need to ensure that results are communicated as soon as they are generated in order to avoid delays. Initial successes have already led to important advancements in epidemic control and should progress to a tool-kit for guiding public health, hopefully available in real time for future epidemics. Author ORCIDs. M. U. G.
Kraemer http://orcid.org/0000-0001-8838-7147
Ebola virus disease in West Africa - the first 9 months of the epidemic and forward projections
Zika virus in the Americas: early epidemiological and genetic findings
Spread of yellow fever virus outbreak in Angola and the Democratic Republic of the Congo 2015-16: a modelling study
The elimination of urban yellow fever in the Americas through the eradication of Aedes aegypti
The hidden geometry of complex, network-driven contagion phenomena
Emergence and pandemic potential of swine-origin H1N1 influenza virus
Unifying viral genetics and human transportation data to predict the global transmission dynamics of human influenza H3N2
Spread of a novel influenza A (H1N1) virus via global airline transportation
WHO's blueprint list of priority diseases
Digital disease detection - harnessing the web for public health surveillance
Updates to the zoonotic niche map of Ebola virus disease in Africa
Mapping the zoonotic niche of Marburg virus disease in Africa
Towards a genomics-informed, real-time, global pathogen surveillance system
Modeling infectious disease dynamics in the complex landscape of global health
The global distribution and burden of dengue
Progress and challenges in infectious disease cartography
Global distribution and environmental suitability for chikungunya virus
Mapping global environmental suitability for Zika virus
Mapping the zoonotic niche of Ebola virus disease in Africa
Mapping the zoonotic niche of Lassa fever in Africa
Existing and potential infection risk zones of yellow fever worldwide: a modelling analysis
Model-based projections of Zika virus infections in childbearing women in the Americas
Assessing Seasonal Risks for the introduction and Mosquito-borne spread of Zika virus in Europe
Potential for Zika virus introduction and transmission in resource limited countries in Africa and Asia-Pacific
Assessment of the potential for international dissemination of Ebola virus via commercial air travel during the 2014 West African outbreak
Zika virus transmission in Angola and the potential for further spread to other African settings
Nowcasting the spread of chikungunya virus in the Americas
Local and regional spread of chikungunya fever in the Americas
Yellow Fever in Africa: estimating the burden of disease and impact of mass vaccination from outbreak and serological data
The revised global yellow fever risk map and recommendations for vaccination, 2010: consensus of the informal WHO working group on geographic risk for yellow fever
Global yellow fever vaccination coverage from 1970 to 2016: an adjusted retrospective analysis. The Lancet Infectious Diseases
Genomic epidemiology reveals multiple introductions of Zika virus into the United States
Evolution and spread of Ebola virus in Liberia
Virus genomes reveal factors that spread and sustained the Ebola epidemic
Unifying the epidemiological and evolutionary dynamics of pathogens
Virus evolution and transmission in an ever more connected world
Epidemic establishment and cryptic transmission of Zika virus in Brazil and the Americas
Bayesian phylogeography finds its roots
Situation report. Yellow fever
Genomic and epidemiological characterisation of a dengue virus outbreak among blood donors in Brazil
MERS-CoV spillover at the camel-human interface
Explaining the geographic spread of emerging viruses: a new framework for comparing viral genetic information and environmental landscape data
Using viral gene sequences to compare and explain the heterogeneous spatial dynamics of virus epidemics
Middle East respiratory syndrome coronavirus: quantification of the extent of the epidemic, surveillance biases, and transmissibility
Pandemic potential of a strain of influenza A (H1N1): Early findings
Birth-death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV)
Insights into the early epidemic spread of Ebola in Sierra Leone provided by viral sequence data
Bayesian molecular dating: opening up the black box
Phylogeography takes a relaxed random walk in continuous space and time
Phylodynamic assessment of intervention strategies for the West African Ebola virus outbreak
Anticipating the international spread of Zika virus from Brazil
Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak
Assessing the international spreading risk associated with the 2014 West African Ebola outbreak
Early epidemic dynamics of the West African 2014 Ebola outbreak: estimates derived with a simple two-parameter model
Resurgence of yellow fever in Angola
Modelling the large-scale yellow fever outbreak in Luanda, Angola, and the impact of vaccination
Transmission and evolution of the Middle East respiratory syndrome coronavirus in Saudi Arabia: a descriptive genomic study
Potential for the international spread of Middle East respiratory syndrome in association with Mass Gatherings in Saudi Arabia. PLOS Current Outbreaks
Interhuman transmissibility of Middle East respiratory syndrome coronavirus: estimation of pandemic risk
Inferences about spatiotemporal variation in dengue virus transmission are sensitive to assumptions about intra-urban human mobility
Spatial spread of the West Africa Ebola epidemic
Superspreading and the effect of individual variation on disease emergence
MERS in South Korea and China: a potential outbreak threat?
MERS-CoV outbreak following a single patient exposure in an emergency room in South Korea: an epidemiological. The Lancet Elsevier Ltd
Big city, small world: density, contact rates, and transmission of dengue across Pakistan
Chains of transmission and control of Ebola virus disease in Conakry
Strategies for containing an emerging influenza pandemic in Southeast Asia
Inference of R0 and transmission heterogeneity from the size distribution of stuttering chains
Detecting differential transmissibilities that affect the size of self-limited outbreaks
Using routine surveillance data to estimate the epidemic potential of emerging zoonoses: application to the emergence of US swine origin influenza A H3N2v virus
How generation intervals shape the relationship between growth rates and reproductive numbers
Similar impacts of control measures
Practice of epidemiology a new framework and software to estimate time-varying reproduction numbers during epidemics
Real-time estimates in early detection of SARS
Exposure patterns driving Ebola transmission in West Africa: a retrospective observational study
A Bayesian MCMC approach to study transmission of influenza: application to household longitudinal data
Household transmission of 2009 pandemic influenza A (H1N1) virus in the United States
Association between antibody titers and protection against influenza virus infection within households
Household transmission of influenza virus
Role of social networks in shaping disease transmission during a community outbreak of 2009 H1N1 pandemic influenza
How social structures, space, and behaviors shape the spread of infectious diseases using chikungunya as a case study
Quantifying influenza virus diversity and transmission in humans
Mathematical modeling of the West Africa Ebola epidemic
Real-time assessment of health-care requirements during the Zika virus epidemic in Martinique
Temporal changes in Ebola transmission in Sierra Leone and implications for control requirements: a real-time modelling study
Nextstrain: real-time tracking of pathogen evolution
Ebola: the power of behaviour change
Estimating the future number of cases in the Ebola epidemic - Liberia and Sierra Leone
Probabilistic forecasting in infectious disease epidemiology: the 13th Armitage lecture
Evaluating probabilistic dengue risk forecasts from a prototype early warning system for Brazil
The RAPIDD Ebola forecasting challenge special issue: preface
Results from the second year of a collaborative effort to forecast influenza seasons in the United States
Real-time, portable genome sequencing for Ebola surveillance
Genomic epidemiology reconstructs the introduction and spread of Zika virus in Central America and Mexico
Emerging concepts of data integration in pathogen phylodynamics
Unraveling the drivers of MERS-CoV transmission
Infectious Diseases of Humans: Dynamics and Control
Mapping residual transmission for malaria elimination
Spatial and temporal dynamics of superspreading events in the 2014-2015 West Africa Ebola epidemic
Measuring the path toward malaria elimination
Genomic and epidemiological monitoring of yellow fever virus transmission potential
Global spread of dengue virus types: mapping the 70 year history
Understanding herd immunity
Use of serological surveys to generate key insights into the changing global landscape of infectious disease