key: cord-0752606-g3ofidds authors: Bhatia, Sangeeta; Imai, Natsuko; Cuomo-Dannenburg, Gina; Baguelin, Marc; Boonyasiri, Adhiratha; Cori, Anne; Cucunubá, Zulma; Dorigatti, Ilaria; FitzJohn, Rich; Fu, Han; Gaythorpe, Katy; Ghani, Azra; Hamlet, Arran; Hinsley, Wes; Laydon, Daniel; Nedjati-Gilani, Gemma; Okell, Lucy; Riley, Steven; Thompson, Hayley; van Elsland, Sabine; Volz, Erik; Wang, Haowei; Wang, Yuanrong; Whittaker, Charles; Xi, Xiaoyue; Donnelly, Christl A.; Ferguson, Neil M. title: Estimating the number of undetected COVID-19 cases among travellers from mainland China date: 2020-06-15 journal: Wellcome Open Res DOI: 10.12688/wellcomeopenres.15805.1 sha: 493b48b13eb63f17818fc6fe381cf946043d3f46 doc_id: 752606 cord_uid: g3ofidds

Background: Since the start of the COVID-19 epidemic in late 2019, there have been more than 152 affected regions and countries with over 110,000 confirmed cases outside mainland China. Methods: We analysed COVID-19 cases among travellers from mainland China to different regions and countries, comparing the region- and country-specific rates of detected and confirmed cases per flight volume to estimate the relative sensitivity of surveillance in different regions and countries. Results: Although travel restrictions from Wuhan City and other cities across China may have reduced the absolute number of travellers to and from China, we estimated that more than two thirds (70%, 95% CI: 54% - 80%, compared to Singapore; 75%, 95% CI: 66% - 82%, compared to multiple countries) of cases exported from mainland China have remained undetected. Conclusions: These undetected cases potentially resulted in multiple chains of human-to-human transmission outside mainland China.

As of 18th March 2020, over 80,000 cases of COVID-19 (formerly 2019-nCoV) have been reported in China (with 3231 deaths), and more than 110,000 cases have been detected in 152 regions and countries outside mainland China (including Hong Kong SAR and Macau SAR) 1.
Several analyses have been undertaken to predict or estimate the risk of exported cases by country on the basis of flight connections between Wuhan City, China or mainland China as a whole and other regions and countries [2] [3] [4] [5] [6]. In this analysis we built on published work 4 to analyse COVID-19 cases reported and confirmed in different countries that were exported from mainland China, comparing the region- and country-specific rates of detected cases per flight volume to estimate the relative sensitivity of surveillance in different countries. We then estimate the number of COVID-19 cases exported from mainland China that have remained undetected worldwide.

Air traffic volume. Air travel data for the months of January, February, and March 2016 were obtained from the International Air Transport Association (IATA), with the sum divided by three to get destination-region- and destination-country-specific monthly averages. These numbers were not scaled up to reflect recent growth in air travel because any constant scaling of the monthly averages would simply be absorbed into the λ estimate and not affect other results. Flows of passengers within mainland China were excluded from this analysis.

We collated data on 3276 cases in international travelers from media reports and provincial and national department of health press releases up until 27 February 2020 1 7. We defined a local transmission as any transmission that occurred outside mainland China (Hong Kong SAR and Macau SAR are considered outside mainland China for this analysis). We only considered cases that were not transmitted locally. That is, we only considered cases detected outside mainland China that had a travel history to China and arrived outside mainland China by air, excluding repatriation flights (Table 1). Based upon these inclusion criteria, a total of 173 cases were included in our analysis.
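The remark above that a constant scaling of the flight volumes is absorbed into the λ estimate can be checked numerically. The following is a minimal Python sketch, not the authors' code (the source reports using R). It assumes the Poisson model used in this analysis, in which the observed count in country i has mean λ s_ei F_i and the count in the reference country j has mean λ F_j. The counts, flight volumes, scaling factor, function names and the grid search for the likelihood-based interval are all hypothetical choices made for illustration.

```python
import math


def log_lik(lam, s, x_i, f_i, x_j, f_j):
    """Joint Poisson log likelihood for countries i and j (additive constants dropped)."""
    return x_i * math.log(s) + (x_i + x_j) * math.log(lam) - lam * (s * f_i + f_j)


def mle(x_i, f_i, x_j, f_j):
    """Closed-form maximum likelihood estimates (lambda_hat, s_hat)."""
    return x_j / f_j, (x_i * f_j) / (x_j * f_i)


def profile_log_lik(s, x_i, f_i, x_j, f_j):
    """Log likelihood at a fixed s, maximised over lambda."""
    lam = (x_i + x_j) / (s * f_i + f_j)  # maximiser of log_lik in lambda for this s
    return log_lik(lam, s, x_i, f_i, x_j, f_j)


def profile_ci(x_i, f_i, x_j, f_j, drop=1.92, steps=20001):
    """Grid approximation to the 95% likelihood-based interval for s (needs x_i, x_j > 0)."""
    _, s_hat = mle(x_i, f_i, x_j, f_j)
    best = profile_log_lik(s_hat, x_i, f_i, x_j, f_j)
    inside = []
    for k in range(steps):
        # Log-spaced grid from s_hat * exp(-5) to s_hat * exp(5); the midpoint is s_hat.
        s = s_hat * math.exp(-5.0 + 10.0 * k / (steps - 1))
        if best - profile_log_lik(s, x_i, f_i, x_j, f_j) <= drop:
            inside.append(s)
    return min(inside), max(inside)


# Hypothetical inputs: 4 detected cases in country i, 18 in reference country j.
x_i, f_i = 4, 10000.0
x_j, f_j = 18, 12000.0

for scale in (1.0, 1.3):  # 1.3 mimics a uniform growth in air travel since 2016
    lam_hat, s_hat = mle(x_i, scale * f_i, x_j, scale * f_j)
    lower, upper = profile_ci(x_i, scale * f_i, x_j, scale * f_j)
    print(f"scale={scale}: lambda_hat={lam_hat:.6f}, s_hat={s_hat:.4f}, "
          f"95% CI for s=({lower:.4f}, {upper:.4f})")
```

Under these assumptions only the λ estimate changes with the scaling factor (it is divided by it), while the relative sensitivity, its interval and the expected counts λ̂ F_i stay the same.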
The earliest date of travel for the cases included in the analysis is 1 January 2020, and the latest date of travel is 25 February 2020.

We assume that the number of exported cases in a country i is Poisson distributed with a mean that depends on the air traffic from Wuhan to i, and the sensitivity of surveillance in i relative to a country j, denoted by $s_{ei}$. For each country i, let $X_i$ be the number of exported cases (a count) and let $F_i$ be the volume of air traffic from Wuhan to country i, so that $X_i \sim \mathrm{Poisson}(\lambda s_{ei} F_i)$ and, for the reference country, $X_j \sim \mathrm{Poisson}(\lambda F_j)$. We can then write a joint log likelihood for the data from countries i and j:

$$\ell(\lambda, s_{ei}) = X_i \ln(s_{ei}) + (X_i + X_j) \ln(\lambda) - \lambda \left( s_{ei} F_i + F_j \right),$$

ignoring additive constants. Thus, the maximum likelihood estimates for λ and $s_{ei}$ are:

$$\hat{\lambda} = \frac{X_j}{F_j} \quad \text{and} \quad \hat{s}_{ei} = \frac{X_i F_j}{X_j F_i}.$$

The likelihood-based confidence intervals are obtained by calculating the maximum log likelihood (over values of λ) for each value of $s_{ei}$. Then the 95% confidence interval includes all those values of $s_{ei}$ such that this maximum log likelihood is within 1.92 of the overall maximum (1.92 = 3.84/2, where 3.84 is the 95th percentile of the chi-squared distribution with 1 degree of freedom). These calculations were all performed using R version 3.6.0.

The relative sensitivities can also be estimated relative to J countries simultaneously using a method similar to above but with the log likelihood:

$$\ell(\lambda, s_{ei}) = X_i \ln(s_{ei}) + \Big( X_i + \sum_{j=1}^{J} X_j \Big) \ln(\lambda) - \lambda \Big( s_{ei} F_i + \sum_{j=1}^{J} F_j \Big).$$

Expected values can then be calculated for every country i as simply $\hat{\lambda} F_i$, and the expected value for all countries as $\hat{\lambda} \sum_{i=1}^{N} F_i$, where N is the total number of countries with air traffic from Wuhan Tianhe International Airport (N = 119).

The number of exported cases by country was plotted as a function of the average monthly passenger volume originating from Wuhan Tianhe International Airport on international flights (Figure 1 7). This showed Singapore to be an outlier in terms of having relatively many exported cases compared to the measure of air traffic volume. The relative sensitivity of surveillance in individual countries was estimated compared to Singapore. Finland, Nepal, Philippines, Sweden, India, Sri Lanka, and Canada were all found to have relative sensitivity estimates greater than 1 (i.e.
more cases were detected per passenger flight than in Singapore). Thus, a second set of relative sensitivity estimates was obtained for all other individual countries compared simultaneously to Singapore, Finland, Nepal, Philippines, Sweden, India, Sri Lanka, and Canada. The region- and country-specific expected numbers of exported COVID-19 cases were in several cases substantially higher than the numbers detected (Figure 2 7). The sum of the expected numbers of exported COVID-19 cases for all regions and countries other than mainland China was 576.8 (95% CI: 372.2 - 845.4), based on the analysis relative to Singapore only, and 704.4 (95% CI: 510.3 - 942.3), based on the analysis relative to Singapore, Finland, Nepal, Philippines, Sweden, India, Sri Lanka, and Canada. Given that 173 such cases were detected, these central estimates suggest that between 70% (95% CI: 54% - 80%, relative to Singapore only) and 75% (95% CI: 66% - 82%, relative to Singapore, Finland, Nepal, Philippines, Sweden, India, Sri Lanka, and Canada) remained undetected.

[Footnote 1: This has been updated since the analysis presented here was released as a public report by the Imperial College London Coronavirus Response Team on 22nd February 2020. This report is available at https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-6international-surveillance/. See https://doi.org/10.5281/zenodo.3736643.]

Consistent with similar analyses 4, we estimated that more than two thirds of COVID-19 cases exported from mainland China have remained undetected worldwide, potentially leaving sources of human-to-human transmission unchecked (70%, 95% CI: 54% - 80% and 75%, 95% CI: 66% - 82%, undetected, based on comparisons to Singapore only and to Singapore, Finland, Nepal, Philippines, Sweden, India, Sri Lanka, and Canada, respectively). Undoubtedly, the exported cases vary in the severity of their clinical symptoms, making some cases more difficult to detect than others.
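The undetected proportions quoted above follow from simple arithmetic on the reported totals: one minus the ratio of the 173 detected cases to the expected number of exported cases. The short Python sketch below reproduces them. It assumes that the interval for the proportion is obtained by applying the same transformation to the endpoints of the interval for the expected total; the text does not state this explicitly, but it matches the reported figures. The function and variable names are illustrative.

```python
def fraction_undetected(detected, expected):
    """Share of expected exported cases that were not detected."""
    return 1.0 - detected / expected


DETECTED = 173  # cases meeting the inclusion criteria

# (central estimate, lower 95% limit, upper 95% limit) of expected exported cases
EXPECTED = {
    "relative to Singapore only": (576.8, 372.2, 845.4),
    "relative to eight reference countries": (704.4, 510.3, 942.3),
}

for label, (central, lower, upper) in EXPECTED.items():
    pct = [round(100 * fraction_undetected(DETECTED, e)) for e in (central, lower, upper)]
    print(f"{label}: {pct[0]}% undetected (95% CI: {pct[1]}% - {pct[2]}%)")
# relative to Singapore only: 70% undetected (95% CI: 54% - 80%)
# relative to eight reference countries: 75% undetected (95% CI: 66% - 82%)
```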
However, some countries have detected significantly fewer cases than would have been expected based on the volume of flight passengers arriving from Wuhan City, China. These undetected cases potentially resulted in multiple chains of human-to-human transmission outside mainland China.

The air travel data used in this analysis can be purchased from the International Air Transport Association (IATA) via the following link: https://www.iata.org/en/services/statistics/airtransport-stats/.

Underlying data
Zenodo: mrc-ide/COVID19_surveillance_sensitivity: Data and code used for submission. http://doi.org/10.5281/zenodo.3736643 7.
This project contains the following underlying data:
• exported_cases.csv (information on the date of report, country of report and travel history of 3,276 cases outside mainland China)

Zenodo: mrc-ide/COVID19_surveillance_sensitivity: Data and code used for submission. http://doi.org/10.5281/zenodo.3736643 7.
This project contains the following extended data:
• data_processing.R (R code to post-process international case data)

Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore

This is a well-done analysis that estimates the number of unreported cases that were imported from China early on in the pandemic. It does this using data on air travel volume between China and other countries, and the reported numbers of cases in the places that reported the highest numbers of cases given their air traffic volume. This analysis was highly relevant early in the pandemic as infections spread from China. I have a few comments on points for clarification below.

Abstract: The statement of the results about 70/75%... in the abstract is confusing. Suggest rephrasing. Conclusion: I wonder if this is a conclusion from the paper.
I would suggest the addition here that the analysis leads to estimates that there were many unreported imported infections, and that these potentially led to transmission.

Main text: Background: It would be helpful to have a statement about what the previous analysis in reference 4 did/showed.

Methods: Please add more detail on how airports in China were used in the flow calculation, and also on the definition of a destination region. In the results section initially Wuhan is focused on but the general conclusions seem to be from all of mainland China. Please clarify throughout. Please add more detail on where the collated data on imported cases were obtained. Were all excluded cases excluded because they were defined as local or due to missing information on this? Was data available on which location was travelled from within China for the imported cases? If not, how was this dealt with in the analysis?

Results: Figure 2 legend: are the numbers shown relative to Singapore numbers, or is the analysis done relative to Singapore and then the estimates of imported cases shown? At the moment, the legend reads as the former, but my understanding of the analysis is that it is the latter. Please clarify.

Discussion: Please add the limitations of the analysis, in particular how this relates to the available data including classification as imported vs local, and that this data needed to be publicly available.

Are all the source data underlying the results available to ensure full reproducibility? Yes

Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Christl Donnelly, Imperial College London, London, UK

The statement in the abstract has been rephrased for clarity. The conclusion in the abstract has been edited as suggested.
We have added a sentence about the analysis and conclusions of Reference 4. Destination-regions refer to Hong Kong SAR and Macau SAR; this clarification has been added to the text. We have now edited the Results and the Discussion sections to emphasise that we are estimating the number of exported cases from Wuhan (rather than China). We have expanded the section on data collation to clarify the points raised by the reviewer. The inclusion criteria have been elaborated to clarify when a case was excluded. As the reviewer has rightly noted, the numbers in figure 2

imported to other countries that originated in China. The observed number of cases imported to each country is assumed to be Poisson distributed, and the expected number of imported cases relative to Singapore (used as a reference country because of its high ratio of detected cases per flight volume) is derived using standard maximum likelihood estimation methods. A similar procedure is used to compare the expected number of imported cases relative to a set of reference countries which had estimated sensitivities of surveillance higher than that of Singapore. The main finding is that at least 2/3 of cases exported from China remained undetected worldwide.

Issues and comments: The code provided does not run out-of-the-box. In data_processing.R `exported_cases_paper_test.csv` should be `exported_cases.csv` and the line defining the total number of exported cases should be moved after the definition of `exported`. Furthermore, only the data processing script for cases in international travelers is provided, not the code for data analysis or visualization of results. While we understand that the authors cannot share the IATA data, they should at least provide results sufficient to reproduce Figure 2, and preferably all code for analysis and visualization such that anyone with access to the data could reproduce the results.
There is a disconnect between the use of passenger flow data from only Wuhan Tianhe International Airport (WTIA) and analysis of cases thought to have acquired infection in mainland China, regardless of whether they travelled through Wuhan or Hubei; some explanation of the decision to use these inconsistent definitions is warranted.

Please clarify why repatriation flights have been excluded when selecting cases for analysis.

At the beginning of the analysis section, "We assume that the number of exported cases in a country i…" should read "We assume that the observed number of exported cases in a country i…". This distinction should be clarified throughout.

Some mention could be made of the reason for using 2016 flight data rather than more recent data.

Some mention of potential biases and pitfalls in methods and the data would be appropriate.

There are a small number of relevant, recently published works not mentioned, e.g. https://www.medrxiv.org/content/10.1101/2020.03.23.20038331v2 1 and https://doi.org/10.1016/S0140-6736(20)30411-6 2.

The background in the abstract is very out of date; suggest updating numbers and adding a date marker ('as of…' or similar).

Additional labels for the figures would be useful (beyond the select countries labeled in Figure 1). Alternatively, per-country estimates could be made available as a table.

In the methods section ('Data Sources') the parameter lambda is referred to before being introduced.

In the discussion "Consistent with similar analyses" has only one citation.

Reconstructing the global dynamics of underascertained COVID-19 cases and infections. medRxiv. 2020.

Is the work clearly and accurately presented and does it cite the current literature? Partly
Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others?

Estimating the number of undetected COVID-19 cases exported internationally from all of China.
medRxiv.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Partly
If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility?

imported cases. The legend has been edited to reflect this. We have expanded the discussion to highlight the limitations and potential biases of the analysis.

Competing Interests: No competing interests were disclosed.

Are all the source data underlying the results available to ensure full reproducibility? Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: infectious disease dynamics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Christl Donnelly, Imperial College London, London, UK

The section on data collation has been expanded to clarify the points raised by the reviewer. Thanks, we have edited the indices. A direct comparison with the results of Golding et al is difficult because they present country-specific ascertainment estimates while the goal of our analysis was to estimate the number of undetected COVID-19 cases globally and we have therefore not provided per-country estimates of surveillance sensitivity. We have expanded the discussion to highlight the limitations and potential biases of the analysis.

Competing Interests: No competing interests were disclosed.
We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.

Christl Donnelly, Imperial College London, London, UK

We apologise for the omission of relevant code. We have now added all relevant code so that someone with relevant data can reproduce the results. We have also included a file with dummy data on travel volume to help the readers. The README file has also been updated to include instructions on running the code.

Everyone we classified as cases detected overseas had travel by air either explicitly mentioned, or implied as the most probable mode of travel from mainland China to the destination (e.g. from China to Italy). Where multiple modes of travel are possible, e.g. from mainland China to Hong Kong, we have only classified individuals as cases detected overseas where the mode of travel was explicitly mentioned as air. The text has been updated to clarify this.

In most instances, all passengers on repatriation flights had been tested for the presence of SARS-CoV-2. The cases detected through surveillance of repatriation flights are therefore not representative of the typical sensitivity of surveillance in a country. We have therefore excluded these from the analysis. The text has been updated to clarify this.

This sentence has now been edited and the distinction has been emphasised in the rest of the text.

The data from 2016 were the most recent data to which we had access when undertaking our analysis. Further, any constant scaling of the volume of passengers would not affect the estimates of model parameters (lambda and s_e). This has been emphasised in the text.

We have included the limitations of the method and data sources in the discussion.

Thanks for highlighting these relevant references. We have now included reference to these in the section Background.
The abstract and the reference were updated as of August 2021.

The reference to lambda has been removed from the methods section and a reference added to the appropriate section.

We have added reference to other studies conducted at the time which estimate surveillance sensitivity globally.

Competing Interests: No competing interests were disclosed.