key: cord-0740032-pcn2jsft
authors: Qin, J.; Chen, F.; Ma, H.; Liu, Y.; Follmann, D.; Zhou, Y.
title: How many COVID-19 PCR positive individuals do we expect to see on the Diamond Princess cruise ship?
date: 2020-11-16
journal: medRxiv : the preprint server for health sciences
DOI: 10.1101/2020.11.14.20230938
sha: 5adbb733ba8028c6446e442f4d7a65854b2833b5
doc_id: 740032
cord_uid: pcn2jsft

When COVID-19 was detected among passengers on the Diamond Princess (DP) cruise ship at the end of January and the beginning of February 2020, the ship unfortunately became an ideal experimental model for studying the transmission potential of COVID-19 in a closed environment, something that is hard to study in the wider open population. Information collected from such an outbreak is crucial for policy makers to understand and manage the epidemic. To recover information such as infection onset time and transmission time from the incomplete observed data, we must develop valid statistical models and sound inference methods. Because priority for RT-PCR testing for COVID-19 was given to symptomatic individuals, their close contacts, and elderly individuals, we have to take this selection bias into consideration in the statistical inference. Based on RT-PCR test data collected on the Diamond Princess, in this paper we propose a novel mixture model, in which the mixing proportions vary with time, to estimate the infection distribution and the total number of infected individuals after a 14-day quarantine. Compared with the epidemiologic description of COVID-19 spread in open space, we have found some unique features in the Diamond Princess cruise ship. Our findings may shed light on preventing future pandemic outbreaks on cruise ships.
Daily time series of RT-PCR test data from February 5 to February 20 can be found in Table 1 of Mizumoto et al. (2020), including the number of tests, the number of positive cases, and the number of cases with and without symptoms. However, the infection time and the infection rate were unknown. Data for February 11 and February 14 were not available in the original data sources. We carefully address this missing data problem in our statistical analysis. At the beginning, PCR tests were conducted mainly for symptomatic individuals and their high-risk close contacts, and then for almost all persons in the second week. As of February 20, a total of 3063 respiratory specimens had been tested, with 634 positive, including one quarantine officer, one nurse, and one administrative officer. Of these 634 cases, 313 were female and 321 were male; 476 cases were aged 60 years or older (Mizumoto et al., 2020). For convenience, the daily time series of the number of tested individuals and the number of individuals with positive results are provided in Table 1.

medRxiv preprint doi: https://doi.org/10.1101/2020.11.14.20230938; this version posted November 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

In this section, we provide an obvious lower bound estimate of the number of infected individuals and a more sophisticated statistical method to estimate this number and the infection distribution using the daily time-series data described in Table 1. Let $f(x)$ and $F(x)$ be the density and cumulative distribution functions of the infection onset time $X$, measured from February 4. Then $F_i = F(i)$ is the probability that the infection onset time occurs before day $i$, counting from February 4. For example, $F_1$ represents the probability of testing positive on February 5. Because a distribution function is non-decreasing, the $F_i$ must satisfy the constraint $F_1 \le F_2 \le \cdots \le F_{14}$.

First, we may hypothetically assume that no PCR tests were performed in the first 13 days of quarantine. On February 20, the last quarantine day, 52 individuals were selected by the officer on the ship for testing and 13 positives were found. Therefore the infection rate is 13/52 = 25%. Out of the $n = 3711$ passengers and crew members, we expect to see $3711 \times 0.25 \approx 928$ PCR-positive results. In reality, the selection of individuals for PCR testing in the first week was not random: symptomatic individuals, elders, and their close contacts were selected first. Therefore, 928 should be a lower bound estimate of the total number of PCR-positive individuals at the end of quarantine, because these 52 individuals should be less likely to be PCR-positive than the ship as a whole. We give a full theoretical explanation of the larger estimate of $N$ below.

Next we show statistically that non-random sampling was used in the selection for PCR testing. Suppose there is no selection bias, i.e., random sampling was used. Let $n_i$ and $N_i$ be the number of positive cases and the number of tests on day $i$, respectively, and let $X_{ij}$ be the infection time of the $j$th subject tested on day $i$, $j = 1, 2, \ldots, N_i$; $i = 1, 2, \ldots, 14$. Instead of observing the exact infection time $X_{ij}$, we can only observe the number of positive tests $n_i = \sum_{j=1}^{N_i} I(X_{ij} \le i)$. We use the nonparametric likelihood method directly to estimate $F_i$.
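Under the random-sampling assumption, each daily count $n_i$ behaves like a binomial draw with success probability $F_i$, and a monotone fit of the $F_i$ can be obtained as a weighted isotonic regression of $n_i/N_i$ with weights $N_i$. The following is a minimal sketch of that idea using pool-adjacent-violators. The function name `pava_binomial` and the four-day counts are hypothetical and are not the Table 1 data.

```python
def pava_binomial(positives, tests):
    """Monotone (non-decreasing) fit of daily positive rates.

    Weighted pool-adjacent-violators: isotonic regression of
    positives[i] / tests[i] with weights tests[i]. Every entry of
    `tests` must be positive, so days without data are left out.
    """
    blocks = []  # each block holds [pooled positives, pooled tests, days covered]
    for n_i, N_i in zip(positives, tests):
        blocks.append([n_i, N_i, 1])
        # Pool the last two blocks while the earlier rate exceeds the later one.
        # Rates are compared by cross-multiplication to avoid division.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            n_last, N_last, days_last = blocks.pop()
            blocks[-1][0] += n_last
            blocks[-1][1] += N_last
            blocks[-1][2] += days_last
    fitted = []
    for n_blk, N_blk, days in blocks:
        # Every day in a pooled block receives the block's pooled rate.
        fitted.extend([n_blk / N_blk] * days)
    return fitted


# Hypothetical counts for four test days (assumed for illustration only).
example_positives = [3, 10, 5, 20]
example_tests = [10, 20, 25, 40]
fitted_rates = pava_binomial(example_positives, example_tests)
print(fitted_rates)
```

In this toy input the raw rates are 0.3, 0.5, 0.2, and 0.5, so days two and three violate monotonicity and are pooled to 15/45. Comparing such fitted values with the raw rates is the kind of check that can reveal departures from random sampling.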
The observed likelihood function is
$$L = \prod_{i=1}^{14} F_i^{n_i} (1 - F_i)^{N_i - n_i}.$$
This is a standard current status data problem discussed extensively in the statistical literature; see, for example, Sun (2006). To maximize the log-likelihood under the non-decreasing constraint $F_1 \le F_2 \le \cdots \le F_{14}$, we can use the well-known pool adjacent violators algorithm (PAVA) of Ayer et al. (1955), which is implemented in R packages such as Iso and isotone. We again find $\hat{F}_{14} = 0.25$. Therefore the estimate of the number of PCR-positive patients remains the same. However, the observed frequency and the estimated probability on each day differ substantially, especially in the first week. This demonstrates that the random selection assumption was violated.

Figure 1: Comparison of $n_i/N_i$ and fitted infection rates using PAVA. $n_i$ is the daily number of positive individuals and $N_i$ is the total number of tests each day. Scatter points show the rates $n_i/N_i$. Blue points are days on which the total number of tests was less than 100; red points are days on which it was larger than 100. The black line shows the fitted infection distribution consisting of the $F_i$.

Next we provide a novel mixture modeling strategy that fully utilizes the non-random sampling method adopted on the Diamond Princess cruise during the 14-day quarantine. Although COVID-19 is a very contagious disease, some people may be immune to it. Naturally the cure rate model discussed by Farewell (1982), for example, can be used. On each day $i$, we suppose that the tested individuals are a mixture of a proportion $\lambda_i$ of susceptible individuals who eventually become infected and a proportion $1 - \lambda_i$ who are not susceptible to COVID-19 and never become infected on the cruise. This means [...], $\theta_4 < 0$, $i = 8, 9, \ldots, 14$, where $\theta_1$ and $\theta_2$ are, respectively, the shape and scale parameters of the Weibull distribution and coefficients [...] and more unsusceptible subjects were left to be tested. At this time, we have $E(n_i) = \lambda_i N_i F_i$. The observed likelihood function is [...] We propose to maximize the following weighted log-likelihood [...]

Let $n = 3711$ be the total number of passengers and crew on board on February 5. Denote [...] as the theoretical number of PCR-positive individuals at the end of the 14-day quarantine, which can be estimated by $\hat{N}^* =$ [...]. In the above formulation, [...] $\hat{F}(14)$ represents the estimated number of PCR-positive cases among individuals tested before February 20, and $(n - N)\hat{\lambda}_{14}\hat{F}(14)$ is the predicted number of infections among those who took PCR tests later.
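The relation $E(n_i) = \lambda_i N_i F_i$ suggests treating each $n_i$ as a binomial count with success probability $\lambda_i F_i$. The sketch below assumes that binomial form and a Weibull $F$, and evaluates an unweighted log-likelihood together with the term $(n - N)\lambda_{14} F(14)$ for individuals not yet tested. The function names, the plain-list representation of the $\lambda_i$, and all numeric values are illustrative assumptions. The paper's parametric form for $\lambda_i$ and its likelihood weights are not reproduced here.

```python
import math


def weibull_cdf(x, shape, scale):
    """Weibull distribution function F(x) = 1 - exp(-(x / scale) ** shape)."""
    return 1.0 - math.exp(-((x / scale) ** shape))


def mixture_loglik(positives, tests, days, lam, shape, scale):
    """Unweighted binomial log-likelihood with P(positive on day i) = lam_i * F(i).

    Assumes 0 < lam_i * F(i) < 1 for every day; the binomial coefficient
    is dropped because it does not depend on the parameters.
    """
    total = 0.0
    for n_i, N_i, day, lam_i in zip(positives, tests, days, lam):
        p_i = lam_i * weibull_cdf(day, shape, scale)
        total += n_i * math.log(p_i) + (N_i - n_i) * math.log(1.0 - p_i)
    return total


def untested_positives(n_total, n_tested, lam_last, shape, scale, last_day=14):
    """Predicted positives among untested people: (n - N) * lam_14 * F(14)."""
    return (n_total - n_tested) * lam_last * weibull_cdf(last_day, shape, scale)


# Illustrative values only: lam_last, shape, and scale are assumptions,
# not the fitted parameters from the paper.
predicted = untested_positives(3711, 3063, lam_last=0.5, shape=1.0, scale=14.0)
print(round(predicted, 1))
```

A function like `mixture_loglik` could be handed to a numerical optimizer once a parametric form for the $\lambda_i$ is chosen. It is kept separate from the Weibull helper so that either piece can be swapped out.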
We apply the proposed parametric mixture model to the PCR test data from the Diamond Princess and obtain maximum likelihood estimators of the parameters. We used a parametric bootstrap approach to derive confidence intervals for the unknown parameters. Specifically, we generated 500 bootstrap samples from the parametric mixture model with the estimated parameters. The 95% confidence intervals (CI) were derived through a normal approximation, with standard errors estimated by the standard deviation of the bootstrap estimators. Using the mixture modelling strategy, the maximum likelihood estimators by maximizing the [...]

Based on our novel mixture model, the estimated total number of infected individuals $N$ at the end of quarantine is 1036, with 95% confidence interval [970, 1103]. We estimated $N$ by combining the total number tested before February 20, 3063, with the 3711 − 3063 = 648 individuals whose test results were unknown. The estimate of 1036 is larger than the obvious lower bound estimate of 928 from the previous section, which is in accordance with our expectation. Figure 2 displays the estimated Weibull density function, which shows the infection pattern among susceptible individuals on the cruise. It is easy to see from Figure 2 that the infection [...] week. The estimated mean infection time among susceptible individuals was day 6.06 of the quarantine. Moreover, in this scatter plot, we used red and blue to indicate whether the total number of tests satisfied $N_i > 100$ or not, respectively. We found that the estimated infection proportions fit better for days with $N_i > 100$.
This is not surprising, because more weight should be given to points with $N_i > 100$, where larger numbers of individuals were tested. Points with $N_i < 100$, where few individuals were tested, might cause statistical instability. It is worth pointing out that only 6 [...]

According to publicly available data released by Johns Hopkins University, there were 712 confirmed cases on the Diamond Princess cruise ship as of April 6, 2020. Our "obvious" estimate of 928, based on the data from the last day of quarantine, and our mixture-model estimate of 1036, which takes the biased selection process into account, both suggest that the number of PCR-positive individuals should be larger than the reported one. Below is some evidence that may support our finding of a larger estimated $N$.

Hung (2020) recruited 215 passengers from Hong Kong who had been on board the Diamond Princess cruise ship. All 215 participants had tested negative for SARS-CoV-2 by PCR 4 days before disembarking and were transferred to further quarantine in a public estate in Hong Kong, where they were recruited. Participants were prospectively screened by quantitative PCR and other detection methods during the quarantine. Of these 215 participants, nine were positive for SARS-CoV-2 by RT-qPCR or serology, although they had been considered "uninfected" on the cruise.
The unprecedented spread of COVID-19 owes much to its high transmissibility through pre-symptomatic and asymptomatic transmission. A large fraction of infections are asymptomatic, and many others result in mild symptoms that could be mistaken for other respiratory illnesses (Perkins et al., 2020). [...] transmission via the central air conditioning system or drainage systems (Zhang et al., 2020). These factors point to a potentially large reservoir of unobserved infections.

Our statistical methods focus on exploring the information contained in the data, without regard to the impact of unnoticed incidents. For example, it should be noted that some asymptomatic cases may be missed because of the imperfect sensitivity of the PCR test. As pointed out by Zhao [...] pairs both with known onset of symptoms, Ferretti et al. (2020) found that nearly 37% of them involved pre-symptomatic transmission. Direct transmission through contact might occur before individuals are aware of the need to take precautionary measures. Meanwhile, identification of infected individuals depends mainly on symptom-based surveillance all over the world. Many individuals linked to an identified source can be discovered by retrospective analysis. Tracing back to the presumed exposure or to a location where the virus spread is a good approach for uncovering more unnoticed infections.

In this paper we have explored two statistical methods and focused mainly on a novel mixture model to quantify the infection distribution and the susceptible proportions among the people tested each day.
Since the daily numbers of infected cases are not observable, it is worth pointing out that our statistical modelling and inference play an important role in estimating them accurately. Based [...]

False-Negative Results of Initial RT-PCR Assays for COVID-19: A Systematic Review. medRxiv
An empirical distribution function for sampling with incomplete information
The Use of Mixture Models for the Analysis of Survival Data with Long-Term Survivors
Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing
Estimating the generation interval for coronavirus disease (COVID-19) based on symptom onset data
The scaling of contact rates with population density for the infectious disease models
SARS-CoV-2 shedding and seroconversion among passengers quarantined after disembarking a cruise ship: a case series
Passengers to be evacuated from Antarctic cruise ship after almost 60% test positive for coronavirus
France finds more than 1,000 virus cases on aircraft carrier
Variation in False Negative Rate of RT-PCR Based SARS-CoV-2 Tests by Time Since Exposure
Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia
A preliminary study on serological assay for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in 238 admitted hospital patients
What the cruise-ship outbreaks reveal about COVID-19. Available from
Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship
Field Briefing: Diamond Princess COVID-19 Cases
Estimating unobserved SARS-CoV-2 infections in the United States. medRxiv
COVID-19 outbreak on the Diamond Princess cruise ship: estimating the epidemic potential and effectiveness of public health countermeasures
Non-severe vs severe symptomatic COVID-19: 104 cases from the outbreak on the cruise ship "Diamond Princess"
Haplotype networks of SARS-CoV-2 infections in the Diamond Princess cruise ship outbreak
The Statistical Analysis of Interval-censored Failure Time Data
Weak Convergence and Empirical Processes: With Applications to Statistics
Distribution Theory, Stochastic Processes and Infectious Disease Modelling
Antibody responses to SARS CoV-2 in patients of novel coronavirus disease 2019
Preliminary estimation of the basic reproduction number of Novel Coronavirus (2019-nCoV) in China, from 2019 to 2020: a data-driven analysis in the early phase of the outbreak
Estimation of the reproductive number of Novel Coronavirus

We thank Benjamin Snow, ELS, from Leidos Biomedical Research, Inc. for providing a technical review of the manuscript.