key: cord-0712341-281t4lyn
authors: Fah Yap, Fook; Yong, Minglee
title: Implementation of a real-time, data-driven online Epidemic Calculator for tracking the spread of COVID-19 in Singapore and other countries
date: 2021-10-14
journal: Infect Dis Model
DOI: 10.1016/j.idm.2021.10.002
sha: 68ca0c40e49c05bc61fef75d4a7b57f43cba2e99
doc_id: 712341
cord_uid: 281t4lyn

While there are many online data dashboards on COVID-19, there are few analytics available to the public and non-epidemiologists to help them gain a deeper insight into the COVID-19 pandemic and evaluate the effectiveness of social intervention measures. To address the issue, this study describes the methods underlying the development of a real-time, data-driven online Epidemic Calculator for tracking COVID-19 growth parameters. From publicly available infection case and death data, the calculator is used to estimate the effective reproduction number, final epidemic size, and death toll. As a case study, we analyzed the results for Singapore during the “Circuit breaker” period from April 7, 2020 to the end of May 2020. The calculator shows that the stringent measures imposed have an immediate effect of rapidly slowing down the spread of the coronavirus. After about two weeks, the effective reproduction number reduced to about 1.0. Since then, the number has been fluctuating around 1.0 for more than a month. The COVID-19 Epidemic Calculator is available in the form of an online Google Sheet and the results are presented as Tableau Public dashboards at www.cv19.one. By making the calculator readily accessible online, the public can have a tool to assess the effectiveness of measures to control the pandemic meaningfully.

Disclosure of potential conflicts of interest
The authors declare that they have no conflict of interest.

Ethics approval for research involving Human Participants
This research study did not involve collection of data from human participants. Ethics approval is not required.
Informed consent
Informed consent was not required since there were no individual participants in this study.

Data and/or Code availability
The data used in the calculations are publicly available and the sources are cited in the References section. The code for the calculator can be downloaded from the website, www.cv19.one.

Authors' contribution statements
Conceptualization: Fook Fah YAP and Minglee YONG; Methodology: Fook Fah YAP and Minglee YONG; Formal analysis and investigation: Fook Fah YAP; Writing - original draft preparation: Fook Fah YAP; Writing - review and editing: Fook Fah YAP and Minglee YONG

As countries worldwide take drastic measures to contain the COVID-19 pandemic, people need to understand the effectiveness of such interventions. Making sense of epidemiological data can be challenging, given confusing and overlapping terminology. Raw data and statistics on infection numbers (e.g., Figure 1) do not directly help answer the following questions: (1) Are social distancing measures working? (2) How much longer does it take to flatten the curve? (3) What will be the final death toll? Data analysts in other disciplines such as social sciences, economics, and management could also explore how epidemiological trends impact their own areas.

This paper describes the methods underlying an online COVID-19 Epidemic Calculator for tracking and estimating COVID-19 growth parameters, including reproduction number, final epidemic size, and death toll. These methods are illustrated using the case example of Singapore. We demonstrate how the calculator can reveal the effect of imposing strict social distancing measures ("Circuit breaker") from April 7, 2020, an effect that is not apparent from just looking at infection numbers. While our methodology is similar in certain aspects to several freely available software packages and programming codes for calculating the effective reproduction number (e.g. [21, 23, 24, 33]), we differ from that work because we implement real-time, data-driven calculations in the widely used Excel spreadsheet, with sub-minute execution time, even for a 4-month, 100-country data set. Furthermore, this calculator is readily available as an online spreadsheet [30] to facilitate sharing and collaboration. The input data for the calculator are obtained from publicly available sources [2, 3, 31] and are automatically updated daily.

Figure 2 identifies the different overlapping terms used in epidemiology and illustrates the timeline for the various stages of infection. These terms and variables will be used for calculating parameters that can help us understand and monitor the spread of the COVID-19 infection in a country. Exposed is the state at which an individual first becomes infected but is not yet contagious. The latent period is the time from being infected (exposed) to becoming contagious. An infected person can be contagious even before the onset of symptoms. Data suggest that some people could have infected others 1 to 3 days before they developed symptoms [10, 11]. The incubation period is the time from exposure to the onset of symptoms. The mean incubation period for COVID-19 is estimated to be 5 days [4, 29]. The infectious period is the time from becoming contagious to the time of removal or recovery. Hence, it is the difference between the time of removal and the latent period ($T_{removed} - T_{latent}$). In Singapore, the 14-day average time from the onset of symptoms to removal ranges from 1.5 to 6 days after the start of the Circuit breaker on April 7, 2020 (Figure 3). The serial interval is the time between the onset of symptoms in a primary case and the onset of symptoms in a secondary case that it infects. For COVID-19 in Singapore, the serial interval between transmission pairs ranges between 3 days and 8 days [5]. Other researchers have reported serial intervals within the same range [6, 7, 8, 29].
Figure 2. Labels shown: Serial Interval (7), Incubation period (5), Infectious period (5), Onset-Removal.

The rate of infection growth in a population can be estimated using the effective reproduction number. The effective reproduction number is the average number of secondary cases directly infected by a primary case in a population. Social distancing measures should reduce the spread of infection and this would be reflected by a reduction in the effective reproduction number. Hence, monitoring the effective reproduction number over time will allow us to evaluate if social distancing measures or any other interventions are working. We will demonstrate how to estimate the effective reproduction number using a Bayesian approach. We will also show how to derive estimates of the dates of actual symptom onset and the dates of exposure, which are important for our estimation of the effective reproduction number.

Estimating the time needed to flatten an epidemic curve is an important part of forecasting the scope of an infectious disease outbreak. When new cases are significantly reduced, social distancing restrictions can be relaxed and other less intrusive measures can be put in place. In this study, we will show how logistic and Gompertz models can be used for forecasting the future number of cases and deaths over time using only publicly available data. These numbers will allow us to gauge the vulnerability of the population and quantify the direct health impact of COVID-19.

Since these parameters could help non-epidemiologists understand the spread of COVID-19 in their own countries and in other countries and regions, the objective of this research study is to develop a readily available online COVID-19 Epidemic Calculator to provide estimates of the critical parameters described here.
The interested public can access this online calculator to gain a deeper insight into the COVID-19 pandemic and evaluate the effectiveness of a range of public health and social intervention measures.

Step 1: Deriving symptom onset dates from confirmation dates
The daily number of reported cases is partly dependent on the number of tests conducted, which may vary due to factors such as testing capacity and the day of the week. To account for this variation, we take a running 7-day average of the reported cases. Other methods of applying a smoothing filter to the time series may be used if appropriate. Another issue is the delay between the onset of symptoms and case confirmation (removal or isolation). Case onset dates can be derived if records of onset-to-confirmation dates are available for every individual (e.g. see Figure 3). Otherwise, case onset dates can be estimated by using the following procedure.

i) For each date, distribute case counts back in time according to a Poisson distribution with a mean of 3 days (symptom onset to removal), as illustrated in Figure 4.

The onset counts for the most recent days are incomplete because they exclude cases that are yet to be reported. We correct this by estimating the percentage of onset cases on Day (t-a) that have not yet been reported by today (Day t). We can use the cumulative distribution function of the Poisson "onset-to-removed" distribution to adjust the number of onset cases, thus removing right censoring.

$$\text{Adjusted onset} = \frac{\text{Onset}}{P(\text{Delay} \le \text{Days from present date})}$$

Consider the example illustrated in Figure 6. Three days ago, there were 470 reported onset cases. This is only the fraction of the actual number that has been reported within the 3 days since then. This fraction is equal to the value of the cumulative distribution function of our Poisson distribution at Day 3, which is 65%. Hence, the current count of onset on that day represents 65% of the actual total. After adjustment, the actual total is estimated to be (1/0.65) of 470, which is 723.
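The two operations in Step 1 can be sketched in Python as follows. This is a minimal illustration, not the calculator's spreadsheet implementation: the function names and the 14-day truncation of the delay distribution (`max_delay`) are assumptions made here; only the Poisson delay with a mean of 3 days and the division by its cumulative distribution come from the text.

```python
import math


def poisson_pmf(k, mean):
    """Probability that a Poisson variable with the given mean equals k."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def poisson_cdf(k, mean):
    """Probability that a Poisson variable with the given mean is <= k."""
    return sum(poisson_pmf(j, mean) for j in range(k + 1))


def onset_from_confirmed(confirmed, mean_delay=3.0, max_delay=14):
    """Distribute each day's confirmed count back in time over onset dates.

    max_delay is an assumed cut-off; delays longer than this are ignored.
    Counts that would fall before the first day of the series are dropped.
    """
    onset = [0.0] * len(confirmed)
    for day, count in enumerate(confirmed):
        for delay in range(max_delay + 1):
            if day - delay >= 0:
                onset[day - delay] += count * poisson_pmf(delay, mean_delay)
    return onset


def adjust_right_censoring(onset, mean_delay=3.0):
    """Divide each onset count by the fraction expected to be reported so far."""
    last_day = len(onset) - 1
    return [
        value / poisson_cdf(last_day - day, mean_delay)
        for day, value in enumerate(onset)
    ]
```

With the unrounded cumulative probability at Day 3 (about 0.647), the adjusted count for the example above is about 726; the figure of 723 in the text comes from rounding the probability to 65%.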
Figure 7 shows the adjusted onset curve.

Step 2: Deriving infection (exposed) dates from onset dates
A procedure similar to that in Step 1 can be applied to the onset counts to derive the infection (exposed) time series. Figure 8 shows the adjusted exposed time series where the incubation period (from exposed to symptom onset) follows a Poisson distribution with a mean of 5 days.

If the effective reproduction number, R(t), is greater than 1.0, the infection is growing at an exponential rate. If R(t) is at 1.0, the spread is sustained at a linear rate. If R(t) is less than 1.0, the infection is spreading at a slower pace and will eventually die out. Although R(t) cannot be measured directly, it can be estimated in different ways. We describe two methods that can be implemented in a spreadsheet without any programming code.

The Bayesian approach allows us to continuously update our estimate of a set of parameters, Θ, as more data become available. $P(\Theta)$, the prior distribution, represents our prior estimates about the true value of Θ. Given the data, the posterior distribution follows from Bayes' theorem:

$$P(\Theta \mid \text{data}) \propto P(\text{data} \mid \Theta)\, P(\Theta)$$

where ∝ means "proportional to". For estimating Rt, the Bayes' theorem that we use is

$$P(R_t \mid k_t) \propto P(k_t \mid R_t)\, P(R_t) \qquad (4)$$

where the data, kt, is the daily number of cases, and the parameter, Rt, is the effective reproduction number. Equation (4) is updated every day by using yesterday's posterior, $P(R_{t-1} \mid k_{t-1})$, as today's prior, $P(R_t)$. On day two, the equation becomes

$$P(R_2 \mid k_2) \propto P(k_2 \mid R_2)\, P(k_1 \mid R_1)\, P(R_1)$$

So generally,

$$P(R_t \mid k_t) \propto P(R_1) \prod_{i=1}^{t} P(k_i \mid R_i)$$

Assuming a uniform starting prior $P(R_1)$, this reduces to:

$$P(R_t \mid k_t) \propto \prod_{i=1}^{t} P(k_i \mid R_i)$$

Note that the posterior on any given day is influenced by the distant past as much as by the most recent days. This is fine if we are estimating a static parameter that does not change with time. However, the value of Rt is dynamic and is more closely related to recent values than to older ones. To address this issue, we can adopt Systrom's approach [21] of only incorporating the last m days of the likelihood function:

$$P(R_t \mid k_t) \propto \prod_{i=t-m+1}^{t} \mathcal{L}(R_i \mid k_i) \qquad (8)$$

To calculate the likelihood function $\mathcal{L}(R_t \mid k_t) = P(k_t \mid R_t)$, we first assume that the number of new infections on any given day can be described by a Poisson probability distribution with a mean of λ.
The probability of seeing k new cases is

$$P(k \mid \lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!}$$

Bettencourt & Ribeiro [22] have derived an equation relating Rt to λ:

$$\lambda = k_{t-1}\, e^{\gamma (R_t - 1)}$$

where γ is the reciprocal of the serial interval (see Figure 2). Figure 9 shows the variation of λ with Rt for some values of kt-1. In evaluating the posteriors, it is more convenient to use the logarithm of the likelihood function:

$$\log \mathcal{L}(R_t \mid k_t) = k_t \log \lambda - \lambda - \log(k_t!) \qquad (12)$$

To perform the Bayesian update, we can sum the log-likelihoods over the last m days and then exponentiate to get the likelihood. From equations (8) and (12),

$$P(R_t \mid k_t) \propto \exp\left( \sum_{i=t-m+1}^{t} \log \mathcal{L}(R_i \mid k_i) \right)$$

From the posterior distribution (Figure 11) we can also obtain the confidence interval for Rt.

When the growth rate is slowing down (Rt < 1), we can project the final total cases and death counts by fitting publicly available data to a logistic model. The logistic model is often used to describe the shape of the cumulative epidemic curve (Figure 12), where the number of infected cases grows exponentially at first, then slows down, and finally flattens to a maximum limit. The final epidemic size can be estimated based on this slowing growth. Some research (e.g. [34, 35, 36, 37]) has also suggested that another parametric model that can be used for forecasting COVID-19 case or death counts is the Gompertz function, defined as

$$f(t) = a\, e^{-b\, e^{-ct}}$$

where a is the upper asymptote and b and c are positive constants that set the displacement along the time axis and the growth rate. It is a special case of the generalised logistic function. The final value asymptote of the function is approached more slowly by the curve than the initial value asymptote, unlike the simple logistic function in which both asymptotes are approached by the curve symmetrically. For example, Figure 13 shows the cumulative deaths over time for a few countries that clearly illustrate the asymmetry.

Figure 13. Cumulative COVID-19 deaths for Singapore, UK, Brazil and USA.

To find the best curve fit to the data and an estimate for CF, the final epidemic size, we use the maximum likelihood method [15]. We assume that the number of reported cases, $x_i$, at time, $t_i$, follows the Poisson distribution and has a mean of $C(t_i)$, where $C(t_i)$ is the calculated number of cases at time, $t_i$. Then, the log-likelihood function to be maximized is

$$\ln L = \sum_{i} \left[ x_i \ln C(t_i) - C(t_i) - \ln(x_i!) \right]$$

We choose the parameter values for CF and r that maximize the log-likelihood function. This can be done by using the Solver function in Excel. The parameter CF is estimated over a rolling window of, say, 60 days to obtain a moving update. See Figure 20.

Figure 14 shows the most likely values of Rt and the confidence interval over time for Singapore during the Circuit breaker period, calculated using the Bayesian approach. The serial interval is assumed to follow a Gamma distribution with a mean of 7 days and a mode of 4 days (standard deviation = 4.6 days). We can see that Rt changes with time and the confidence interval narrows with more data. The results generally agree with those calculated using the EpiEstim code (Figure 15) [23, 24]. One problem with the calculation method is that it can only provide a good estimate for the reproduction number up to about one to two weeks before the current date. This is due to the time lag between infection and confirmation. As we get closer to the present day, the calculated mean value of Rt always tends to 1. For example, suppose that today is April 10, 2020, and case data are only available up to this date.

Table 1 shows the 3-month forecasts for a few countries including Singapore and how they compare with the projections by the Institute for Health Metrics and Evaluation (IHME) [26] at University of Washington Medicine, and with the actual death toll. The projections by the IHME are based on more complex analytics and consider factors such as changes in social distancing measures, diagnostic capability, and hospital capacity. Even though we did not directly account for these factors, our forecasts of the total number of deaths were within 20% of the actual numbers. This demonstrates the concurrent validity of our model. Assuming current prevailing conditions in the populations, results from the COVID-19 Epidemic Calculator are likely to be realistic estimates.
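For readers who would rather script the calculation than use the spreadsheet, the rolling-window Bayesian estimate of Rt described in the Methods can be sketched in Python. This is an illustrative sketch, not the calculator's own code: the function names, the grid of candidate Rt values (`R_GRID`), the 7-day window, and the single 7-day serial interval (γ = 1/7) are assumptions made here. The Poisson likelihood with mean λ = k(t-1)·exp(γ(Rt − 1)) is the Bettencourt & Ribeiro relation, and the uniform prior follows the text. Each candidate value of Rt is scored against every day in the window, and the previous day's count must be positive.

```python
import math

# Candidate Rt values from 0.00 to 12.00 (assumed grid, uniform prior).
R_GRID = [i / 100 for i in range(1201)]


def log_poisson(k, lam):
    """Log of the Poisson probability of k cases given mean lam (lam > 0)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def rt_posterior(cases, serial_interval=7.0, window=7):
    """Posterior over R_GRID for the last day in `cases`.

    Sums the log-likelihoods of the last `window` days for each candidate
    value, then exponentiates and normalizes.
    """
    gamma = 1.0 / serial_interval
    first = max(1, len(cases) - window)
    log_post = []
    for r in R_GRID:
        total = 0.0
        for i in range(first, len(cases)):
            lam = cases[i - 1] * math.exp(gamma * (r - 1.0))
            total += log_poisson(cases[i], lam)
        log_post.append(total)
    peak = max(log_post)  # subtract the peak to avoid underflow
    weights = [math.exp(v - peak) for v in log_post]
    norm = sum(weights)
    return [w / norm for w in weights]


def summarize(posterior, mass=0.95):
    """Most likely Rt and an equal-tailed interval holding `mass` probability."""
    mode = R_GRID[posterior.index(max(posterior))]
    tail = (1.0 - mass) / 2.0
    cumulative, low, high = 0.0, None, None
    for r, p in zip(R_GRID, posterior):
        cumulative += p
        if low is None and cumulative >= tail:
            low = r
        if high is None and cumulative >= 1.0 - tail:
            high = r
    return mode, low, high
```

Calling `summarize(rt_posterior(daily_cases))` on a smoothed daily series returns the most likely Rt for the latest day together with its interval. Repeating the call on successively longer prefixes of the series gives a curve over time, subject to the time-lag caveat discussed above.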
Our simpler model is more accessible to non-epidemiologists and data analysts in other fields who might want to incorporate it in their own analytics.

This paper describes the methods underlying the online COVID-19 Epidemic Calculator for tracking COVID-19 growth parameters. From publicly available data, the calculator is used to estimate the distributions at time of symptom onset and infection, the effective reproduction number, the final epidemic size, and the death toll for Singapore and other countries. The calculator and the associated graphs clearly show that the Circuit breaker measures imposed from April 7, 2020 in Singapore had an immediate effect of rapidly slowing down the spread of COVID-19. Additionally, the results reveal that the effective reproduction number settled to around 1.0 after about two weeks. Since then, it has remained at that level for more than a month. This indicates that the infection rate among the dormitory residents is sustained and not likely to be reduced until this group becomes less susceptible.

The COVID-19 Epidemic Calculator is available in the form of an online spreadsheet [30] that imports daily infection data from publicly available sources [2, 3, 31]. The results are presented online as dashboards on Tableau Public [32] (Figure 21). It has the advantage of fast execution time without the need for any specialized software package or programming script. Users can also interact with the models by changing the parameters. Comparing over eighteen months of data with other similar work, we find that our parameter estimates are in good agreement with those estimated using different models and software. By making the COVID-19 Epidemic Calculator readily accessible online, it is hoped that the public and interested analysts from non-epidemiological disciplines have the tool to assess our effort in fighting COVID-19 meaningfully.

COVID-19 Situation Report, Ministry of Health Singapore
Our World in Data. Coronavirus Source Data
The COVID Tracking Project
The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application
Investigation of three clusters of COVID-19 in Singapore: implications for surveillance and response measures
Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
Serial interval of novel coronavirus (COVID-19) infections
The serial interval of COVID-19 from publicly reported confirmed cases. medRxiv
Serial intervals of respiratory infectious diseases: a systematic review and analysis
Geneva: World Health Organization
Presymptomatic Transmission of SARS-CoV-2 - Singapore
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
SEIR Transmission dynamics model of 2019 nCoV coronavirus with considering the weak infectious ability and changes in latency duration
A Time-dependent SIR model for COVID-19 with Undetectable Infected Persons
Estimating Epidemic Exponential Growth Rate And Basic Reproduction Number
Epidemic Modeling and Estimation
How generation intervals shape the relationship between growth rates and reproductive numbers
Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures
An introduction to probability theory and its applications
The distribution of a sum of independent binomial random variables
The Metric We Need to Manage COVID-19. Rt: the effective reproduction number
Real time bayesian estimation of the epidemic potential of emerging infectious diseases
A new framework and software to estimate time-varying reproduction numbers during epidemics
Epidemic doubling time of the COVID-19 epidemic by Chinese province
Institute for Health Metrics and Evaluation (IHME) COVID-19 Projections
Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions
The estimation of the basic reproduction number for infectious diseases
Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study
COVID-19 Epidemic Calculator
Data on the geographic distribution of COVID-19 cases worldwide
Global Covid19 Reproduction Number and Doubling Time Tracker
R0: estimation of R0 and real-time reproduction number from epidemics
Modeling and prediction of COVID-19 in Mexico applying mathematical and computational models
Growth forecast of the covid-19 with the gompertz function, Case study: Italy, spain, Hubei (China) and South Korea
Predicting the Trajectory of Any COVID19 Epidemic From the Best Straight Line