key: cord-0746772-39ywzw6a authors: Utsunomiya, Yuri Tani; Utsunomiya, Adam Taiti Harth; Torrecilha, Rafaela Beatriz Pintor; Paulan, Silvana Cassia; Milanesi, Marco; Garcia, Jose Fernando title: Growth rate and acceleration analysis of the COVID-19 pandemic reveals the effect of public health measures in real time date: 2020-04-02 journal: nan DOI: 10.1101/2020.03.30.20047688 sha: 0d0bb3af5f5665e4b529c7307b56ea40202ebc5d doc_id: 746772 cord_uid: 39ywzw6a Background: Ending the COVID-19 pandemic is arguably one of the most prominent challenges in recent human history. Following closely the growth dynamics of the disease is one of the pillars towards achieving that goal. Objective: We aimed at developing a simple framework to facilitate the analysis of the growth rate (cases/day) and growth acceleration (cases/day^2) of COVID-19 cases in real-time. Methods: The framework was built using the Moving Regression (MR) technique and a Hidden Markov Model (HMM). The dynamics of the pandemic was initially modeled via combinations of four different growth stages: lagging (beginning of the outbreak), exponential (rapid growth), deceleration (growth decay) and stationary (near zero growth). A fifth growth behavior, namely linear growth (constant growth above zero), was further introduced to add more flexibility to the framework. An R Shiny application was developed, which can be accessed at http://www.theguarani.com.br/covid-19 or downloaded from https://github.com/adamtaiti/SARS-CoV-2. The framework was applied to data from the European Center for Disease Prevention and Control (ECDC), which comprised 853,200 cases reported worldwide as of April 2nd 2020. Results: We found that the impact of public health measures on the prevalence of COVID-19 could be perceived in seemingly real-time by monitoring growth acceleration curves. 
Restriction of human mobility produced a detectable decline in growth acceleration within a few days, deceleration within ~2 weeks and near-stationary growth within ~6 weeks. Countries exhibiting different permutations of the five growth stages indicated that the evolution of COVID-19 prevalence is more complex and dynamic than previously appreciated. Conclusions: These results corroborate that mass social isolation is a highly effective measure against the dissemination of SARS-CoV-2, as previously suggested. Apart from the analysis of prevalence partitioned by country, the proposed framework is easily applicable to city, state, region and arbitrary territory data, serving as an asset to monitor the local behavior of COVID-19 cases.

We used moving regression to model the growth rate (cases/day) and acceleration (cases/day²) of COVID-19 cases in 123 countries as of March 25th 2020. In countries entering stationary growth (China and South Korea), a decline in acceleration was observable up to 1 week after severe restrictions on human mobility were adopted as a preventive measure. Deceleration was detectable within 2 weeks, whereas stationary growth was reached within 6 weeks. These results corroborate that mass social isolation is a highly effective measure against the dissemination of SARS-CoV-2. We also found that the impact of public health measures could be evaluated in seemingly real time by monitoring COVID-19 growth curves. Moreover, reasonable daily predictions of new cases were obtained (R² ~ 0.95). Apart from the analysis of prevalence partitioned by country, moving regression can be easily applied to city, state, region or arbitrary territory data to help monitor the local behavior of COVID-19 cases.

The World Health Organization (WHO) officially declared COVID-19 a global pandemic on March 11th 2020 (1). The disease is caused by a novel coronavirus, namely SARS-CoV-2 (2, 3), which first emerged in Wuhan, China on December 12th 2019 (4, 5).
Worldwide dissemination has been extremely rapid, and by the time this study was completed a total of 416,916 cases and 18,565 deaths had been reported across 123 countries according to data from the European Center for Disease Prevention and Control (ECDC) (6). Although 86% of all cases are estimated to have been undocumented prior to the cordon sanitaire in China (7), partial COVID-19 prevalence data are still an invaluable resource to help monitor and control the disease. In particular, extracting daily estimates of growth rate (cases/day) and acceleration (cases/day²) in disease dissemination from real-time case reports can be decisive for effective and prompt action to restrain further contagion. Here we report the application of a simple statistical framework, namely moving regression (MR), to the analysis of publicly available COVID-19 case reports that are updated daily by the ECDC. We start from the reasonable assumption that the cumulative number of COVID-19 cases over time (i.e., the growth curve) in a specific country or territory should typically follow an unknown sigmoidal function (Fig. 1a). In fact, empirical data from China (Fig. 1b) and South Korea (Fig. 1c) strongly support that assumption. We note, however, that this assumption is substantially relaxed later by our model to accommodate complex dynamics in the evolution of COVID-19 prevalence. We define growth rate and growth acceleration as the first and second order derivatives of the prevalence of COVID-19 with respect to time.
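To make these two definitions concrete, the sketch below differentiates a sigmoidal curve numerically. It is only an illustration: the logistic function, its parameters (`cap`, `rate`, `midpoint`) and the use of central finite differences are assumptions chosen for simplicity, not the model or the estimator used in this study.

```python
import math


def logistic(t, cap=1000.0, rate=0.2, midpoint=40.0):
    """Hypothetical sigmoidal growth curve (cumulative cases at time t)."""
    return cap / (1.0 + math.exp(-rate * (t - midpoint)))


days = list(range(81))                      # t = 0, 1, ..., 80
cases = [logistic(t) for t in days]

# Central finite differences approximate the first and second derivatives.
# Both are defined for interior days only (days 1 to 79).
interior = days[1:-1]
growth_rate = {t: (cases[t + 1] - cases[t - 1]) / 2.0 for t in interior}
acceleration = {t: cases[t + 1] - 2.0 * cases[t] + cases[t - 1] for t in interior}

# Growth rate (cases/day) is bell-shaped and peaks at the inflection point.
rate_peak_day = max(interior, key=lambda t: growth_rate[t])
# Acceleration (cases/day^2) has a peak before the inflection and a valley after it.
accel_peak_day = max(interior, key=lambda t: acceleration[t])
accel_valley_day = min(interior, key=lambda t: acceleration[t])

print("growth rate peaks on day", rate_peak_day)
print("acceleration peaks on day", accel_peak_day)
print("acceleration bottoms out on day", accel_valley_day)
```

In this toy curve the acceleration is positive before the midpoint, crosses zero at the midpoint and is negative afterwards, which is the sign pattern used throughout the text to separate accelerating from decelerating growth.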
In our analysis, we selected MR over competing models that are frequently used to describe the behavior of growth curves, such as the Gompertz model (8), because: (i) it depends on a single free parameter, the "smooth factor", which represents the number of neighboring days used in local regression; (ii) growth rate and acceleration estimates are approximated by ordinary least squares equations, which are computationally inexpensive; (iii) we performed extensive simulations of growth curves and found that it produces reasonably accurate estimates of growth rate (median R² = 0.99 with a smooth factor of 3) and acceleration (median R² = 0.92 with a smooth factor of 3) (Supplementary Fig. 1); (iv) it is very robust to departures from sigmoidal curves; and (v) it does not rely on observations of the whole curve and thus can produce estimates in almost real time. Argument (v) is especially relevant to the analysis of COVID-19 data, since the pandemic is ongoing and each country will be at a different stage of the growth curve as time passes. A clear disadvantage of MR is that it may over-fit the growth curve to the data, especially if the selected smooth factor is small (say < 3), in which case accurate prediction of new cases of COVID-19 is limited to very few days in the future. Still, even single-day predictions can be of great use during a pandemic if reasonably accurate. In the ECDC data set, a forward validation showed that single-day predictions were sufficiently accurate (R² ~ 0.95) (Supplementary Fig. 2). Growth curves can be partitioned into four easily distinguishable stages (Fig.
1a): (a) the lagging stage, which corresponds to the beginning of the outbreak or disease importation, where the number of cases is low and increases only marginally every day; (b) the exponential stage, when growth starts accelerating and the number of new cases increases rapidly day by day; (c) the deceleration stage, where the number of new cases falls each day and the curve approaches its asymptote; and (d) the stationary stage, characterized by stagnation of the prevalence with sporadic new cases occurring each day. The growth rate is the first order derivative of the growth curve and it is approximately bell-shaped, with its peak corresponding to the inflection of the exponential stage. This inflection point signals the beginning of a decline in the growth rate. The growth acceleration is the second order derivative and consists of a combination of two bell-shaped curves: the first one with a peak and the second with a valley. The peak indicates the point where acceleration starts descending towards zero. The moment when acceleration is exactly zero coincides with the inflection of the exponential stage, which marks the beginning of growth deceleration (i.e., negative acceleration). The latter corresponds to the entire concave section of the curve, but the very bottom of the valley indicates that the prevalence is moving towards stagnation. Although sigmoidal curves follow these four stages sequentially, we anticipate that the growth of COVID-19 cases may not necessarily obey a sigmoidal curve in practice, since the dynamics of the disease are complex and highly dependent on public health measures. This implies that a country that has already reached a stationary stage could resume exponential growth, for example by seeding a new outbreak via importation. Likewise, decelerating countries could regain acceleration by relaxing prevention measures.
These scenarios could produce more complex growth curves that combine multiple exponential, deceleration and stationary stages. Of note, MR has sufficient flexibility to model these complex scenarios and can easily accommodate curves exhibiting anomalous combinations of these four stages. In this study we sought to ascertain whether these characteristics of growth curves could have direct implications for understanding the dynamics of COVID-19 prevalence both globally and locally. Using MR on ECDC data frozen on March 25th 2020, we found that the numbers of countries in each stage of growth were: lagging = 37 (30.1%), exponential = 81 (65.9%), deceleration = 3 (2.4%) and stationary = 2 (1.6%). The observation that the majority of the countries were in exponential growth implies that the pandemic is expanding. Only a handful of countries (i.e., those in deceleration or stagnation) were found to be experiencing an apparent control of COVID-19 dissemination. The countries found in the stationary stage were China and South Korea (Fig. 1b-c), whereas the ones in deceleration were Denmark, Estonia and Qatar (Fig. 2). By projecting official government announcements against the fitted curves of these countries, we observed that the decline in growth acceleration occurred shortly after (< 1 week) the implementation of measures that drastically reduced human movement. Deceleration of growth was typically achieved within 2 weeks in these five countries. For China and South Korea, COVID-19 prevalence plateaued within 6 weeks. These results showed that: (i) the effect of public health measures on SARS-CoV-2 prevention can be detected in seemingly real time by monitoring the behavior of the growth rate and acceleration curves; and (ii) restriction of human mobility is very effective in controlling the spread of the disease, but takes several weeks to produce stationary growth.
These findings are in line with a recent study showing that human mobility explained early growth and decline of new cases of COVID-19 in China (9). As discussed before, one should not immediately assume that a country in deceleration or plateau will remain stationary, since acceleration could take off again. To further illustrate the utility of MR to monitor the COVID-19 pandemic, we decided to look more closely at data from three countries that were in the exponential growth stage as of March 25th 2020, namely the United States of America (USA), Brazil and Italy (Fig. 3). The latter has been severely impacted by the disease, and by the time we completed our study the country had recorded 69,176 cases and 6,820 deaths. On March 10th 2020 Italy implemented a strict quarantine, and five days later the country reached its maximum acceleration and started to move towards an inflection of the exponential growth. Our results indicate that, if Italy persists on this path, deceleration could begin within a few weeks in the country. In fact, on March 25th Italy implemented a complete shutdown of its borders, which may help the country to reach deceleration sooner. In contrast, Brazil and the USA continued a march of exponential growth with no clear sign of reaching an inflection point soon. Still, since the acceleration response to effective measures seems rapid, both Brazil and the USA could bend their acceleration curves within days if such measures are implemented. In order to facilitate the analysis of growth rate and acceleration of COVID-19 cases, we built an application using R (10) and Shiny (11). This application automatically loads the latest ECDC case reports and applies MR to real-time data. The app also allows the user to upload custom data (e.g., city, region, province or state), which can be used to monitor the growth behavior of COVID-19 locally.
Once the COVID-19 pandemic ends, this tool could be useful for the analysis of future outbreaks and epidemics, or even for the analysis of historical disease data. It is important to note that MR relies on case reports, such that under-reporting, delayed communication and the elapsed time between sample collection and diagnostic results may impact the real-time inference of growth dynamics in disease transmission and consequently jeopardize the timely detection of transitions in the growth curve. In conclusion, we demonstrated that the real-time analysis of growth curves of COVID-19 cases can be a powerful tool to monitor the impact of public health measures on the spread of the disease. We also showed that the pandemic is expanding, and that restrictions on human mobility can decelerate the incidence of new cases. Furthermore, we found that only a handful of countries were exhibiting signs of deceleration or stagnation of COVID-19 dissemination as of March 25th 2020.

We would like to express our highest gratitude to all health agents and individuals involved in reporting cases and making COVID-19 prevalence data available to the public.

Funding: This study did not receive financial support and was conducted during voluntary social isolation.

Author contributions: Y.T.U. conceived the study, performed simulations, coordinated the data analysis and wrote the manuscript. A.T.H.U. built R code for data analysis and programmed the Shiny App Dashboard. R.B.P.T., S.C.P., M.M. and J.F.G. reviewed growth curves for all 123 countries and pinpointed dates of measures taken by them to reduce human mobility. All authors revised and agreed with the contents of the manuscript.

Competing interests: The authors declare no competing interests.
Data and materials availability: The data in this study were obtained from the European Center for Disease Prevention and Control (ECDC) and are publicly available at https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtributionworldwide-2020-03-25.xlsx (accessed on March 25th 2020). The source code for the Shiny App used for data analysis is found in our GitHub repository: https://github.com/adamtaiti/SARS-CoV-2.

This preprint is made available under a CC-BY-NC-ND 4.0 International license. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Material and Methods, Figs. S1 and S2

Fig. 1 Moving regression curves for countries approaching stationary growth. All curves shown here were fitted with a smooth factor of 5. (a) Simulated data using a four-parameter Gompertz model with an asymptote at 80,000, growth coefficient of 0.15, inflection time at 35, and time ranging from 1 to 80. (b) Fitted curves for China between December 31st 2019 and March 25th 2020. The first red dot marks the midpoint between January 23rd and 24th 2020, when a strict cordon sanitaire was imposed on Wuhan, Shanghai, Jiangsu and Hainan. The second red dot pinpoints February 4th 2020, when the cordon was extended to a larger portion of the eastern part of China. (c) Fitted curves for South Korea between January 20th and March 25th 2020. The red dot is placed between February 20th and 21st 2020, when a collection of restrictions on human mobility was imposed, including lockdown of Daegu city, suspension of flights, cancellation of mass gatherings and lockdown of all South Korean military bases. Importantly, reaching a stationary stage is not a guarantee that the disease will remain controlled, since acceleration could rebound.

Fig. 2 Moving regression curves for countries in growth deceleration. All curves shown here were fitted with a smooth factor of 3.
(a) On March 11th 2020 (red dot), Denmark became the second European country to establish a lockdown. The country started decelerating new cases of COVID-19 two days later. (b) On March 13th 2020 (red dot) a state of emergency was declared by the Estonian government, which imposed significant restrictions on travel and mass gatherings. A decline in growth rate was observed one day later, and deceleration started within 4 days. (c) Our analysis indicated that Qatar reached its peak acceleration on March 9th 2020 (red dot), the same day on which Qatari officials announced the closure of schools, nurseries and universities, in addition to strong travel restrictions. One day later, the country registered a sudden spike in the number of cases (arrow). The days that followed were marked by a clear decline in acceleration and eventually deceleration of growth, which coincided with a succession of measures by the Qatari government that eventually led to significant restrictions on human movement. If the trend continues, these three countries may transition to a stationary stage soon. The curves of these three countries should be followed closely over time to verify whether they will enter a stationary stage or regain acceleration.

Fig. 3 Moving regression curves for three countries in exponential growth. All curves shown here were fitted with a smooth factor of 5. By the time the study was completed, the governments of the United States of America (a) and Brazil (b) had not announced strict measures to restrain human movement. These two countries had primarily focused on reducing mass gatherings, in addition to closing schools, nurseries, universities and other places that facilitate agglomeration, such as shopping malls. This is in contrast with Italy (c), which imposed a strict quarantine on March 10th 2020 (red dot). Five days later the country reached its peak acceleration, which then began to decline.
The MR technique adopted here aimed at fitting a smooth growth curve to the COVID-19 prevalence data, such that the resulting curve could describe the cumulative number of cases as a function of time. For n recorded days in a given country or territory, let $\mathbf{x}$ be an n-dimensional column vector of days since the first case report and $\mathbf{y}$ the corresponding column vector whose elements are the cumulative numbers of cases. Relative to day d, we define $\mathbf{y}_d$ and $\mathbf{x}_d$ as k-sized subset vectors of $\mathbf{y}$ and $\mathbf{x}$, respectively, where k = 1 + 2s and s is a free parameter representing the number of offset days before and after day d. Hereafter, we refer to s as the "smooth factor", since it controls the compromise between over-smoothing (large s) and over-fitting (small s) the curve to the data. Finally, we define $\mathbf{X}_d = [\mathbf{1}_k \;\; \mathbf{x}_d]$, where $\mathbf{1}_k$ is a k-dimensional column vector with all elements equal to one. The local growth rate was estimated by ordinary least squares regression:

$$\begin{bmatrix} m_d \\ g_d \end{bmatrix} = \left(\mathbf{X}_d^\top \mathbf{X}_d\right)^{-1} \mathbf{X}_d^\top \mathbf{y}_d \qquad (1)$$

where $m_d$ is an intercept and $g_d$ is the estimated growth rate (cases/day) at day d. In practice, $g_d$ corresponds to an estimate of the instantaneous rate of change in the number of cases at day d, which in turn is an approximation to the first order derivative of the unknown growth function evaluated at time d. The smoothed growth curve was obtained by calculating fitted values as:

$$\hat{y}_d = m_d + g_d \, d$$

After fitting equation (1) to all n records, we define $\mathbf{g}$ as a vector of size n containing all estimated local growth rates and $\mathbf{g}_d$ as a k-sized subset vector of $\mathbf{g}$. The local growth acceleration at day d was then obtained by adapting equation (1):

$$\begin{bmatrix} c_d \\ a_d \end{bmatrix} = \left(\mathbf{X}_d^\top \mathbf{X}_d\right)^{-1} \mathbf{X}_d^\top \mathbf{g}_d$$

where $c_d$ is the corresponding intercept and $a_d$ is the estimated growth acceleration (cases/day²) at day d. Now $a_d$ is an estimate of the instantaneous rate of change of the growth rate at day d, which consequently approximates the second order derivative of the unknown growth function evaluated at time d.
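The two regression passes described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' R implementation: the function names `ols_slope` and `moving_slope` are hypothetical, the toy curve is not real data, and days without a complete window are simply left undefined rather than handled as in the ECDC analysis.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least squares line ys = m + g * xs."""
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def moving_slope(x, values, s):
    """Local OLS slope of `values` on `x` in a window of k = 1 + 2s days.

    Days without a complete window (or whose window contains a missing
    value) are returned as None.
    """
    n = len(x)
    slopes = [None] * n
    for d in range(s, n - s):
        window_x = x[d - s:d + s + 1]
        window_v = values[d - s:d + s + 1]
        if any(v is None for v in window_v):
            continue
        slopes[d] = ols_slope(window_x, window_v)
    return slopes


# Toy cumulative curve (not real data): y = t^2, so the true growth rate is
# 2t cases/day and the true acceleration is 2 cases/day^2.
days = list(range(21))
cumulative = [float(t * t) for t in days]

s = 3  # smooth factor
growth_rate = moving_slope(days, cumulative, s)     # first pass, as in equation (1)
acceleration = moving_slope(days, growth_rate, s)   # second pass, on the rates

print(growth_rate[10], acceleration[10])
```

Because the window is symmetric around day d, the local slope of a quadratic curve equals its exact derivative at d, which makes this toy series a convenient sanity check for both passes.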
To test the performance of MR in approximating growth curves and their rate of change and acceleration in scenarios where these curves have been observed only partially (i.e., real-time case reports), we selected a widely used sigmoidal mathematical function, namely the Gompertz model, to generate 50,000 simulated growth curves. We used a parameterization of the Gompertz model that is dependent on four parameters:

$$y(t) = a \exp\{-\exp[-k(t - d)]\}$$

where t is a time point, a is the asymptote (i.e., number of cases at the stationary stage), exp is the exponential function, k is a growth coefficient and d is the time at inflection of the exponential stage (i.e., time when the growth rate reaches its maximum value and acceleration transitions from positive to negative). All simulations were performed considering a 100-day period, with parameters sampled as follows: a ~ Uniform(500, 10000), k ~ Uniform(0.05, 0.95) and d ~ Uniform(5, 95). Completely stationary curves were discarded. The accuracy of growth rate and acceleration estimates produced by MR with smooth factors ranging from s = 3 to s = 10 was then evaluated by taking the coefficient of determination (R²) of the regression of true values onto estimates. Results from this simulation study are presented in Figure S1.

We analyzed case reports that have been updated daily by the European Center for Disease Prevention and Control (ECDC). The moving regression framework was applied to those data using smooth factors ranging from s = 3 to s = 10. The acceleration curves were clipped at observation n − s to avoid poor growth acceleration estimates at the end of the curve. Likewise, the last s days had their growth rates estimated by compounding rates from n − s to n using the acceleration estimated for day n − s. Finally, next-day predictions of COVID-19 prevalence were obtained by summing the last observed prevalence with its estimated growth rate.
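As an illustration of the simulation study, the sketch below generates a single Gompertz curve, estimates its growth rate with a local regression and compares the estimate with the analytical derivative. It assumes the parameterization y(t) = a·exp(−exp(−k(t − d))), whose first and second derivatives follow from the chain rule; the helper names, the fixed example parameters and the use of the squared Pearson correlation as R² are choices made here for illustration, not details taken from the authors' code.

```python
import math
import random


def gompertz(t, a, k, d):
    """Cumulative cases at time t (assumed parameterization)."""
    return a * math.exp(-math.exp(-k * (t - d)))


def gompertz_rate(t, a, k, d):
    """Analytical first derivative (cases/day)."""
    u = math.exp(-k * (t - d))
    return a * k * u * math.exp(-u)


def gompertz_acceleration(t, a, k, d):
    """Analytical second derivative (cases/day^2); zero at t = d."""
    u = math.exp(-k * (t - d))
    return a * k * k * u * (u - 1.0) * math.exp(-u)


def sample_parameters(rng):
    """Draw (a, k, d) from the uniform ranges quoted in the text."""
    return rng.uniform(500, 10000), rng.uniform(0.05, 0.95), rng.uniform(5, 95)


def ols_slope(xs, ys):
    """Slope of the ordinary least squares line ys = m + g * xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def r_squared(truth, estimate):
    """Squared Pearson correlation between true and estimated values."""
    n = len(truth)
    mt, me = sum(truth) / n, sum(estimate) / n
    cov = sum((t - mt) * (e - me) for t, e in zip(truth, estimate))
    var_t = sum((t - mt) ** 2 for t in truth)
    var_e = sum((e - me) ** 2 for e in estimate)
    return cov * cov / (var_t * var_e)


# One example curve with fixed, hypothetical parameters over a 100-day period.
a, k, d = 5000.0, 0.15, 35.0
times = list(range(1, 101))
curve = [gompertz(t, a, k, d) for t in times]

s = 3  # smooth factor
inner = range(s, len(times) - s)  # days with a complete window
estimated = [ols_slope(times[i - s:i + s + 1], curve[i - s:i + s + 1]) for i in inner]
true_rate = [gompertz_rate(times[i], a, k, d) for i in inner]

r2_rate = r_squared(true_rate, estimated)
print("R^2 of growth rate estimates:", round(r2_rate, 4))
print("acceleration at the inflection time:", gompertz_acceleration(d, a, k, d))
print("one random parameter draw:", sample_parameters(random.Random(1)))
```

Repeating the same comparison over many sampled parameter sets, and for the acceleration against `gompertz_acceleration`, would mimic the structure of the simulation summarized in Figure S1.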
In order to measure the accuracy of these predictions, we performed a step-wise simulation by censoring observations ahead of each day, fitting MR to the remaining data and then comparing predicted and true next-day prevalence. Accuracy of predictions was again measured by linear regression, and only countries presenting at least 20 observed days were included in this analysis (n = 47). Results from this forward validation are found in Figure S2. All analyses presented in this paper were performed using R version 3.4.4 (1).

To visualize the growth rate and acceleration of the COVID-19 pandemic, we implemented a simple Shiny (2) dashboard application, which offers an intuitive web interface and allows users to stay updated on new cases and the prevalence of COVID-19 worldwide. The application automatically loads the latest case reports from ECDC. Alternatively, users can upload their own data to visualize the growth rate and acceleration of COVID-19 in specific states, provinces or cities, or in aggregate data from arbitrary territory definitions. A simulator implementing a simple acceleration model based on trigonometric functions also provides users with an intuitive interface to simulate complex growth curves for comparison with empirical data. For the implementation we used the following packages: shiny v1.4.0 (3), shinydashboard v0.7.1 (4), shinydashboardPlus v0.7.0 (5), readxl v1.3.1 (6), shinyalert v1.0 (7), httr v1.4.1 (8) and plotly v4.9.2 (9), all available on CRAN (Comprehensive R Archive Network, https://cran.r-project.org/). The application can be downloaded from our GitHub repository at https://github.com/adamtaiti/SARS-CoV-2.

Fig. S1. Accuracy (R²) of moving regression estimates of growth rate and growth acceleration from 50,000 simulated Gompertz growth curves.

Fig. S2. Accuracy (R²) of moving regression predictions of next-day COVID-19 prevalence using real data from 47 countries with a minimum of 20 observed days.
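The forward validation of next-day predictions can be sketched as follows. This is a simplified illustration under stated assumptions: the growth rate at the last observed day is taken here as the OLS slope over the trailing k = 1 + 2s observations, which replaces (and is not equivalent to) the end-of-curve compounding used in the ECDC analysis, and the names `predict_next_day` and `forward_validation` are hypothetical.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least squares line ys = m + g * xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def predict_next_day(cumulative, s):
    """Last observed prevalence plus an estimated growth rate.

    Assumption: the rate is the OLS slope over the last k = 1 + 2s
    observations (a trailing window), not the authors' exact procedure.
    """
    k = 1 + 2 * s
    window = cumulative[-k:]
    rate = ols_slope(list(range(len(window))), window)
    return cumulative[-1] + rate


def forward_validation(cumulative, s):
    """Censor all data after each day, predict the next day, keep the pair."""
    k = 1 + 2 * s
    pairs = []
    for end in range(k, len(cumulative)):
        observed = cumulative[:end]           # data available "in real time"
        predicted = predict_next_day(observed, s)
        pairs.append((predicted, cumulative[end]))
    return pairs


# Toy series (not real data): constant growth of 5 cases/day.
series = [10.0 + 5.0 * t for t in range(30)]
pairs = forward_validation(series, s=3)
max_abs_error = max(abs(p - truth) for p, truth in pairs)
print(len(pairs), "predictions, largest absolute error:", max_abs_error)
```

On real case reports the predicted and observed values would differ, and regressing one on the other would yield an accuracy measure comparable in spirit to the R² reported in Figure S2.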
medRxiv preprint (which was not peer-reviewed). doi: https://doi.org/10.1101/2020.03.30.20047688

World Health Organization, Coronavirus disease (COVID-2019) situation reports
Coronaviridae Study Group of the International Committee on Taxonomy of Viruses. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2
The proximal origin of SARS-CoV-2
R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing
Easy web applications in R
shiny: Web Application Framework for R
shinydashboard: Create Dashboards with 'Shiny'
Add More 'AdminLTE2' Components to 'shinydashboard'
readxl: Read Excel Files
Easily Create Pretty Popup Messages (Modals) in 'Shiny'
httr: Tools for Working with URLs and HTTP