key: cord-0056743-00ybcgoj
authors: Sharma, Aryan; Sapkal, Srujan; Verma, Mahendra K.
title: Universal Epidemic Curve for COVID-19 and Its Usage for Forecasting
date: 2021-02-27
journal: Trans Indian Natl
DOI: 10.1007/s41403-021-00210-5
sha: 312ef7050e5c33053da3eb3891060ca09bb9b3fe
doc_id: 56743
cord_uid: 00ybcgoj

We construct a universal epidemic curve for COVID-19 using the epidemic curves of eight nations that have reached saturation for the first phase, and then fit an eighth-degree polynomial to the universal curve. We take India's epidemic curve up to January 1, 2021 and match it with the universal curve by minimizing the root-mean-square (rms) error between the model prediction and the actual value. The constructed curve has been used to forecast epidemic evolution up to February 25, 2021. The predictions of our model and those of the supermodel for India (Agrawal et al. in Indian J Med Res, 2020; Vidyasagar et al. in https://www.iith.ac.in/~m_vidyasagar/arXiv/Super-Model.pdf, 2020) are reasonably close to each other considering the uncertainties in data fitting.

Universality is an important paradigm of science. In this framework, varied natural phenomena are explained by a single theory. For example, Newton showed that the gravitational force that acts between planets and stars is the same as the force that Earth exerts on objects on its surface (Verma 2016); he derived the universal theory of gravitation based on these observations. All kinds of phase transitions are classified as first- or second-order transitions (Wilson and Kogut 1974). Materials are classified as insulators, metals, or semiconductors using the theory of band structures. Note that the materials are very different, yet there is a universal theory due to the common dynamics of electron hopping in different kinds of materials. Similarly, we have a universal theory of hydrodynamic turbulence (Leslie 1973; Verma 2019). Biological growth too is described by certain universal dynamics (Goriely 2017).
Pingala-Virahanka-Fibonacci numbers and the golden mean are used to describe many biological patterns. In the early phase, an epidemic grows among the susceptible population in an exponential manner, somewhat similar to biological growth (Goriely 2017; Bjørnstad 2018; Daley and Gani 2001). This growth saturates after some time due to nonlinear effects (Bjørnstad 2018; Daley and Gani 2001). The growth rates and saturation levels of an epidemic depend on many parameters, such as the population's immunity levels, health care facilities, climate, and the nature of intervention (social distancing, lockdowns). Still, the growth and saturation dynamics of an epidemic at different locations are somewhat similar, even though the levels of the epidemic may depend on various factors. These features were highlighted in Verma et al. (2020), who showed that the epidemic curves follow a sequence of power laws before saturation (called flattening of the curve). Similarly, Schüttler et al. (2020) showed that I(t) or the total death count could be modelled using the error function. Martelloni (2020) attempted a universal forecast based on epidemic data. Based on these observations, we attempted to construct a universal curve for the COVID-19 epidemic using the data of several nations. We observed that the epidemic data for the first wave of eight nations follow a universal curve when scaled with the maximum time and maximum infection counts. The universality in the COVID-19 epidemic is somewhat surprising considering the significant differences in demography, government actions, lockdown conditions, etc. across nations. However, we believe this universality to be due to the underlying basic dynamics of epidemic growth (epidemic spread by contacts and nonlinear effects) (Schüttler et al. 2020). We make a cautionary remark that this issue needs to be investigated extensively with more data (e.g., more countries, data for second and third waves, etc.). Still, this preliminary finding is very interesting and useful.
For example, we use the universal epidemic curve for COVID-19 to forecast epidemic evolution for India.

The COVID-19 pandemic, one of the most devastating disasters in the last 100 years, is raging around the world. As of January 12, 2021, the total infection count is around 90 million, and the total number of deaths has crossed 1.9 million (WorldOmeter 2020). The whole world is trying to mitigate the pandemic; an essential input for this effort is the prediction of epidemic evolution because it allows policymakers to prepare and plan countermeasures. To this end, some modellers describe the epidemic using a set of differential equations, while others employ data-driven algorithms (including machine learning). Even though our model belongs to the latter class, for completeness and contrast, we summarize key models that are based on differential equations.

The SIR model, constructed by Kermack and McKendrick (1927), was one of the first models of epidemic evolution. In this model, the variables S and I describe the numbers of susceptible and infected individuals, respectively, while the variable R represents the removed individuals who have either recovered or died. When the epidemic has subsided, I = 0, but S and R take different values depending on the initial condition and the state of the population. The SEIR model, a generalization of the SIR model, includes exposed individuals, E, who are infected but not yet infectious (Bjørnstad 2018; Daley and Gani 2001). The SEIR model has a richer phase space description, but it cannot capture the complexities of COVID-19. For example, asymptomatically infected persons spread the COVID-19 virus very fast; hence, such individuals are called super-spreaders. Also, lockdowns, social distancing, and other restrictions are important parameters for the epidemic evolution. Vaccination and travel restrictions too play a crucial role in the epidemic growth.
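For reference, the SIR dynamics summarized above can be sketched numerically. The snippet below is a minimal illustration, not part of the paper's method: it integrates the standard Kermack-McKendrick equations with forward-Euler steps. The function name `simulate_sir`, the transmission rate `beta`, the removal rate `gamma`, and the population size are all hypothetical values chosen only to show the growth-and-saturation behaviour.

```python
def simulate_sir(beta, gamma, s0, i0, r0=0.0, days=300, steps_per_day=10):
    """Integrate dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I
    with forward-Euler steps and return one (day, S, I, R) tuple per day."""
    n = s0 + i0 + r0              # total population, conserved by the model
    dt = 1.0 / steps_per_day      # Euler time step in days
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        history.append((day, s, i, r))
    return history

# Hypothetical parameters: beta = 0.3/day, gamma = 0.1/day, one million people.
history = simulate_sir(beta=0.3, gamma=0.1, s0=999_990, i0=10)
day, s_end, i_end, r_end = history[-1]
peak_day = max(history, key=lambda row: row[2])[0]   # day with the most infected
print(f"peak on day {peak_day}; susceptible fraction left: {s_end / 1_000_000:.3f}")
```

In such a run, I first grows roughly exponentially, peaks, and then decays towards zero, while S and R settle at values that depend on the initial condition and on the ratio beta/gamma, in line with the description above.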
Considering the impact of COVID-19 on the economy and public health, researchers have made many epidemic models (e.g., see Hethcote 2000; Marathe and Vullikanti 2013; López and Rodo 2020 and references therein). Here, we list only some of these models. Peng et al. (2020) constructed a seven-variable model that includes quarantined and death variables and predicted that the daily counts of exposed and infectious individuals in China would be negligible by March 30, 2020. Chinazzi et al. (2020) and Hellewell et al. (2020) studied the effects of travel restrictions and isolation on epidemic evolution. Mandal et al. (2020) constructed an India-specific model that includes intercity connectivity. Shayak et al. (2020) modelled epidemic evolution using delayed-differential equations. In addition, Rahmandad et al. (2020) used a model to predict Indian epidemic growth. As described earlier, asymptomatic carriers are super-spreaders of COVID-19. Hence, attempts have been made to model the effects of super-spreaders. In particular, Liu et al. (2020) constructed a susceptible-asymptomatic-infected-removed (SAIR) model that takes into account this important factor. Ansumali et al. (2020) and Robinson and Stilianakis (2013) generalized this model by incorporating various factors such as lockdowns and herd immunity. Recently, Vidyasagar et al. (2020) and Agrawal et al. (2020) adopted the SAIR model to study the epidemic evolution in India; this model, termed the supermodel, makes many predictions. For example, it predicted 10.6 million cases by the end of 2020, which is quite close to the actual count of 10.286 million on December 31, 2020.

Data-driven models are also used for epidemic forecasts. For example, Sharma and Nigam (2020) employ time series analysis to forecast epidemic growth in India. Recent analysis of COVID-19 data reveals that the epidemic curve begins with an exponential growth, after which it follows a sequence of power laws (Ziff and Ziff 2020; Komarova et al.
2020; Manchein et al. 2020; Blasius 2020; Cherednik 2020; Chatterjee et al. 2020; Verma et al. 2020; Marsland and Mehta 2020; Singer 2020; Asad et al. 2020; Ranjan 2020). The epidemic curve flattens after square-root growth. Using the recorded epidemic data (first wave) of eight nations, we construct a universal epidemic curve for COVID-19 by appropriate normalization.

The above universal behaviour (Manchein et al. 2020; Martelloni 2020) can be utilized to predict epidemic evolution in various countries. In this paper, we fit India's epidemic curve to the universal curve by minimizing the rms (root mean square) error between the model prediction and the actual numbers. Even though India's epidemic curve has not yet saturated, the fit function describes the present data quite well. In particular, the model predictions for the five weeks from December 11, 2020 to January 14, 2021 are in general agreement with the observed data (with a maximum error of about 137%). We argue that our predictions are reasonable considering that the present daily counts of infections are under-reported. Note that at present, the testing rates have decreased, and many people are getting cured of COVID-19 quite easily (hence, they go unreported). In the next section, we construct a universal epidemic curve using the epidemic data of several countries.

To construct a universal curve for the COVID-19 epidemic, we consider the first-wave epidemic evolution of eight countries: France, Spain, Italy, Switzerland, Turkey, Netherlands, Belgium, and Germany. We choose these countries because the first-wave infections for them have saturated. We take the data from the EU Open Data Portal Covid-19 (2020) and WorldOmeter (2020) websites. The starting dates of the data collection for these countries are given in Table 1. The epidemic curves, I(t) vs. t, for the above nations look quite different.
However, these curves collapse (approximately) to a single curve when we normalize their I(t) and time t by I_max and t_max respectively. I_max and t_max for each country are defined as the values of I(t) and t on June 30, 2020 (see Table 1). See Fig. 1 for an illustration. In the figure, the dashed curves represent I(t) for individual countries, whereas the solid black curve represents an average of the I(t)'s of the eight countries. We term the solid curve the universal epidemic curve for COVID-19. Note that the universal curve starts with an exponential part, and then it follows various power laws before saturation (see Fig. 2 and the first row of Table 2). Also refer to Verma et al. (2020), Blasius (2020), Marsland and Mehta (2020), Singer (2020), and Asad et al. (2020) for further details on the various power-law regimes of the epidemic curves.

For modelling and forecasting, it is convenient to fit a single function that passes through the universal epidemic curve of Fig. 1. For this purpose, we choose an eighth-order polynomial that fits from t/t_max = 0 to 1. For simplicity, we include the exponential regime in the interpolating polynomial. Note that the uncertainty in the transition region between the exponential part and the power-law regime would introduce additional errors. This polynomial fit is listed in the second row of Table 2. Note, however, that the eighth-order polynomial does not describe the exponential phase of the universal curve well; it shows oscillations in this range.

Here, we make a cautionary remark that the above universal curve was constructed using the epidemic data of eight nations that have similar geographical conditions and cultural milieu. We need more data and careful analysis for a definitive conclusion and for detecting anomalies. This work will be carried out in the future. In the next section, we will model the Indian epidemic curve using the derived universal curve.
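The normalization-and-averaging step above, together with the rms fit that places a not-yet-saturated curve on the universal one, can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: synthetic logistic curves stand in for the country data, the averaged curve is evaluated by linear interpolation rather than by the eighth-order polynomial, and the rms error is taken as the root mean square of I(t) - I_max P(t/t_max), which is an assumption about its exact form. The function names (`normalise`, `interpolate`, `universal_curve`, `fit_scales`, `logistic`) are hypothetical, and the scan over t_max with a closed-form I_max is a simple substitute that need not coincide with the paper's Algorithm 1.

```python
import math

def normalise(curve):
    """Rescale daily cumulative counts so that time and counts both end at 1."""
    t_max, i_max = len(curve) - 1, curve[-1]
    return [(day / t_max, count / i_max) for day, count in enumerate(curve)]

def interpolate(points, x):
    """Linearly interpolate sorted (x, y) pairs at a point 0 <= x <= 1."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]  # x beyond the last sample: hold the saturated value

def universal_curve(curves, n_grid=101):
    """Average the normalised curves on a common grid of t/t_max values."""
    grid = [k / (n_grid - 1) for k in range(n_grid)]
    scaled = [normalise(c) for c in curves]
    mean = [sum(interpolate(s, x) for s in scaled) / len(scaled) for x in grid]
    return list(zip(grid, mean))

def fit_scales(observed, universal, t_max_candidates):
    """Scan t_max; for each value the best I_max follows from linear least squares."""
    best = None
    for t_max in t_max_candidates:
        p = [interpolate(universal, min(day / t_max, 1.0))
             for day in range(len(observed))]
        # Minimising sum (I - i_max * P)^2 over i_max gives this closed form.
        i_max = sum(o * q for o, q in zip(observed, p)) / sum(q * q for q in p)
        rms = math.sqrt(sum((o - i_max * q) ** 2
                            for o, q in zip(observed, p)) / len(p))
        if best is None or rms < best[0]:
            best = (rms, t_max, i_max)
    return best

def logistic(n_days, i_max):
    """Synthetic first-wave curve (a stand-in for real country data)."""
    return [i_max / (1 + math.exp(-12 * (d / (n_days - 1) - 0.4)))
            for d in range(n_days)]

# Two synthetic "countries" with different durations and sizes collapse after scaling.
universal = universal_curve([logistic(90, 2.0e5), logistic(150, 3.0e4)])

# A synthetic, not-yet-saturated curve: only the first 300 of 601 days are observed.
observed = logistic(601, 5.0e6)[:300]
rms, t_max, i_max = fit_scales(observed, universal, range(300, 1001, 10))
print(f"t_max = {t_max} days, I_max = {i_max:.3e}, rms error = {rms:.3e}")
```

Because the model is linear in I_max for a fixed t_max, only a one-dimensional scan is needed here; a two-parameter optimizer would serve equally well.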
Since India's epidemic curve is yet to reach saturation, we cannot determine t_max and I_max from the epidemic curve. Therefore, we estimate t_max and I_max by an optimization procedure that minimizes the rms error between the predicted value (using the universal curve) and the actual data from March 4, 2020 to January 1, 2021. We outline the optimization procedure in Algorithm 1. The function to be minimized is the rms error of Eq. (1), in which P(t/t_max) is the polynomial fit for the universal function, listed in Table 2. We estimate the t_max and I_max for which this error is minimum (see Algorithm 1). This process converges towards a unique minimum with I_max = 16.22 million and t_max = 722 days (approximately 2 years), for which the error (Eq. 1) is 0.319. See Fig. 3 for an illustration.

Using the above parameters, we obtain maximum overlap between the observed epidemic curve and the universal curve. See Fig. 4 for an illustration. In the same figure, we also plot İ(t), the derivative of I(t), which corresponds to daily infections. We employ Python's gradient function for the derivative computation. We observe that İ(t) computed using the fit function matches the observed daily cases quite well. We expect to get a better fit at a later date when more data become available.

Table 2 First row: best-fit curves for the exponential and power-law regimes of the universal curve (refer to Fig. 2); second row: a polynomial fit for the universal curve of Fig. 1. The error (standard deviation, std) between the polynomial and the black solid curve of Fig. 1 is 0.089. The scaled time variable is t/t_max.

In the next section, we will compare the predictions of our model with the actual data.

Once India's epidemic curve I(t) has been constructed, we can predict the infection count at any date. For simplicity, we employ the eighth-order polynomial for this purpose. In Fig.
5 we present the model predictions of I(t) and İ(t) from July 1, 2020 to March 1, 2021 using red and blue curves respectively. In the same figure, we also present the reported cumulative and daily counts using black dashed curves. In Table 3, we list the weekly new cases, along with model predictions, for India. Note that the model predictions are reasonably close to the actual data. The maximum error between the prediction and the actual data is approximately 137%. We argue that such large errors are expected for COVID-19 because the current infection counts are under-reported. COVID-19 is reasonably well managed at present, with many patients getting cured at home; many such cases are not added to the main tally. In addition, there are uncertainties in mathematical modelling.

Regarding the forecast, the universal curves indicate that the linear regime (I(t) ∝ t) starts at around t/t_max = 0.25. For India, t/t_max = 0.25 translates to the last week of September 2020. Note that the daily cases are approximately constant in the linear regime, but they decrease after the linear regime.

Fig. 5 Model predictions of I(t) (solid red curve) and İ(t) (solid blue curve) for the duration of July 1, 2020 to March 1, 2021 (using the polynomial of Table 2). The dashed black curves represent the corresponding reported counts. Note that the solid curves of Fig. 4 correspond to data up to 01/01/2021. Also refer to Table 3.

Comparison with Other Leading Epidemic Models for India

As described in Sec. 1, there are interesting low-dimensional models of epidemic evolution. These models are refinements of the SEIR model. In this section, we compare our model predictions with some of the leading epidemic models for India. In one such model, Rahmandad et al. (2020) forecasted that in early 2021, the daily infection count in India would reach 0.287 million (2.87 lakh). Also, refer to Song et al. (2020). Our model predicts much lower counts for 2021, which is consistent with the observed counts.
For example, we predicted 43.7 thousand new infections on December 31, 2020, which is about 2.3 times the actual number of 19,046 (WorldOmeter 2020). This number is reasonable considering the uncertainties in numerical modelling and data collection.

India's supermodel (Agrawal et al. 2020), which is based on the SAIR model (Ansumali et al. 2020; Robinson and Stilianakis 2013), has gained major prominence recently. This model predicts that India may have reached herd immunity, with around 38 crore (380 million) of the population either infected or having antibodies. One of the predictions of the supermodel is that the infection count at the end of 2020 would be 10.6 million, which is quite close to the actual number of 10.286 million on December 31, 2020.

The predictions of our model are similar to those of the supermodel. For example, we predict the total infections on December 31, 2020 to be 11.1 million, which is slightly larger than the prediction of the supermodel. In addition, for the week of January 1-7, 2021, our prediction for the weekly infection count is approximately 300 thousand, which is approximately 2.3 times the reported number. We believe that our predictions are reasonable given the large errors in the fitting algorithm, as well as the uncertainties in the data. For example, the present testing rate is much lower than that in the past. Also, many people who recover quite easily from COVID-19 go unreported. Considering these factors, we believe that the reported COVID-19 counts are several times lower than the actual numbers.

The first-phase epidemic curves for the eight countries (France, Spain, Italy, Switzerland, Turkey, Netherlands, Belgium, and Germany) collapse into a single curve when normalized with the maximum infection count (I_max) and the total time duration (t_max). We construct a universal curve for the COVID-19 epidemic by averaging over the above eight curves. In addition, we fit an eighth-degree polynomial to the universal curve.
Demography, government actions, lockdown conditions, and other factors have a strong impact on COVID-19 epidemic evolution. However, as described in the introduction, due to certain similarities in the growth and saturation dynamics of epidemic evolution, the epidemic curves of different nations collapse into a single curve after proper normalization. Note that the individual epidemic curves are quite different due to local factors.

Chatterjee et al. (2020) studied the second phase of the epidemic curve for the whole world and observed it to have certain similarities with the first phase of the epidemic curve. However, there are instances when the second and third waves of the epidemic are very different from the first wave. For example, the present wave of the epidemic curve for Germany has many oscillations (WorldOmeter 2020). Given these uncertainties, we cannot assume that the universal epidemic curve presented in this paper would work for the second and third waves of the epidemic. We plan to study this aspect in the near future.

We make a cautionary remark that the universal curve has been constructed using the epidemic data of eight nations during the first wave. These nations have demographic and climatic homogeneity. In the future, we will study the epidemic curves of other nations, as well as those during the second and third waves. Also, even though some Indian states observed first and second waves of the COVID-19 epidemic, the cumulative count for India is going through the first wave of the epidemic (see Fig. 4).

The discovery of the universal epidemic curve gives us an interesting handle for forecasting the epidemic evolution. An advantage of this approach over others is that it is purely data-driven. Hence, we do not need to model various parameters and terms of the differential equations of the model. However, a disadvantage of our method is that we do not have any control parameter.
For example, the SAIR model can be tuned by changing the coefficients of some terms of the differential equations, but we cannot do so in our data-driven model.

We compared India's reported epidemic curve with the universal curve with appropriate scaling. Using an optimization algorithm, we showed that India's present epidemic curve overlaps with a part of the universal curve. This discovery enables us to forecast epidemic evolution in India. We observe that our predictions match the observed data quite well in spite of so many uncertainties. Note, however, that our predictions tend to be systematically larger than the actual data, which could be due to uncertainties in the data and errors in mathematical modelling.

Our model predicts that the daily counts of India's COVID-19 epidemic are falling rapidly; this observation is consistent with the recorded data. Also, the predictions of our model and those of the supermodel are reasonably close to each other. We believe that the present trend will continue till saturation of the curve, unless the COVID-19 virus mutates in India or a mutated virus from elsewhere spreads rapidly in India.

The universal curve could be further refined using more advanced algorithms, such as machine learning and deep neural networks. In addition, it will be interesting to work out the universal curves for the daily cases, as well as for the active cases. We are in the process of such extensions.

In summary, the universality of epidemic growth is a useful idea that can help in effective modelling of the COVID-19 pandemic.
Modelling the spread of SARS-CoV-2 pandemic-Impact of lockdowns and interventions
Modelling a pandemic with asymptomatic patients, impact of lockdown and herd immunity
Evolution of COVID-19 pandemic in India
Epidemics: models and data using R
Power-law distribution in the number of confirmed COVID-19 cases
Evolution of COVID-19 pandemic: power-law growth and saturation
Momentum managing epidemic spread and Bessel functions
The effect of travel restrictions on the spread of the (2019) novel coronavirus (COVID-19) outbreak
COVID-19 India cases tracker
Epidemic modelling: an introduction. Cambridge University Press, Cambridge
Goriely A (2017) The mathematics and mechanics of biological growth
Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts
The mathematics of infectious diseases
A contribution to the mathematical theory of epidemics
Patterns of the COVID19 epidemic spread around the world: exponential vs power laws
Developments in the theory of turbulence
A new SAIR model on complex networks for analysing the 2019 novel coronavirus (COVID-19)
A modified SEIR model to predict the COVID-19 outbreak in Spain and Italy: simulating control scenarios and multi-scale epidemics
Strong correlations between power-law growth of COVID-19 in four continents and the inefficiency of soft quarantine strategies
Prudent public health intervention strategies to control the coronavirus disease 2019 transmission in India: a mathematical model-based approach
Computational epidemiology
Data-driven modeling reveals a universal dynamic underlying the COVID-19 pandemic under social distancing
Modelling the downhill of the Sars-Cov-2 in Italy and a universal forecast of the epidemic in the world
Epidemic analysis of COVID-19 in China by dynamical modeling
Estimating COVID-19 under-reporting across 86 nations: implications for projections and control
Temporal dynamics of COVID-19 outbreak and future projections: a data-driven approach
A model for the emergence of drug resistance in the presence of asymptomatic infections
Covid-19 predictions using a Gauss model
Modeling and forecasting of COVID-19 growth curve in India
Transmission dynamics of COVID-19 and impact on public health policy
Short-term predictions of country-specific COVID-19 infection rates based on power law scaling exponents
An epidemiological forecast model and software assessing interventions on COVID-19 epidemic in China
Introduction to mechanics
Energy transfers in fluid flows: multiscale and spectral perspectives
Indian supermodel for COVID-19 pandemic
The renormalization group and the ε expansion
Fractal kinetics of COVID-19 pandemic

Acknowledgements
We thank Soumyadeep Chatterjee, Shashwat Bhattacharya, and Asad Ali for useful discussions. This project is supported by a SERB MATRICS project SERB/F/847/2020-2021.