key: cord-0815333-kmlt9q3m authors: Triambak, S.; Mahapatra, D. P.; Mallick, N.; Sahoo, R. title: A new logistic growth model applied to COVID-19 fatality data date: 2021-11-20 journal: Epidemics DOI: 10.1016/j.epidem.2021.100515 sha: 483360846edb69f159bdec2b859124f8c59a9a17 doc_id: 815333 cord_uid: kmlt9q3m Background: Recent work showed that the temporal growth of the novel coronavirus disease (COVID-19) follows a sub-exponential power-law scaling whenever effective control interventions are in place. Taking this into consideration, we present a new phenomenological logistic model that is well-suited for such power-law epidemic growth. Methods: We empirically develop the logistic growth model using simple scaling arguments, known boundary conditions and a comparison with available data from four countries, Belgium, China, Denmark and Germany, where (arguably) effective containment measures were put in place during the first wave of the pandemic. A non-linear least-squares minimization algorithm is used to map the parameter space and make optimal predictions. Results: Unlike other logistic growth models, our presented model is shown to consistently make accurate predictions of peak heights, peak locations and cumulative saturation values for incomplete epidemic growth curves. We further show that the power-law growth model also works reasonably well when containment and lock down strategies are not as stringent as they were during the first wave of infections in 2020. On the basis of this agreement, the model was used to forecast COVID-19 fatalities for the third wave in South Africa, which is currently in progress. Conclusions: We anticipate that our presented model will be useful for a similar forecasting of COVID-19 induced infections/deaths in other regions as well as other cases of infectious disease outbreaks, particularly when power-law scaling is observed. The COVID-19 pandemic has reinvigorated efforts at an unprecedented scale to better understand the dynamics and mechanism of infectious disease spread. Presently, there is significant interest worldwide to model region-specific infection and mortality curves, while also working on effective intervention and containment strategies. It is hoped that such a collective endeavor would continue working towards preventing an uncontrolled proliferation of the disease, while simultaneously countering near irreparable socio-economic damage from multiple waves of infections. This has resulted in a deluge of scientific literature related to the pandemic, that have proved to be a challenge to keep up with [1] . A large subset of research papers investigated the spatio-temporal evolution of the disease [2, 3, 4] , mostly using variants of the compartmental SIR (Susceptible-Infected-Removed) epidemiological model [5, 6, 7, 8, 9, 10, 11 ] to analyze the number of infections (or deaths) in specific regions. Other methods involved the use of phenomenological models [12, 13] , time-varying and non-linear Markov processes [14, 15] , superpositions of epidemic waves [16] , hybrid nonparametric models [17] and other data-driven approaches [18, 19, 20] , including those based on artificial intelligence [21] , etc. Along these lines, we recently performed a random walk Monte Carlo study to make temporal growth exponent predictions for COVID-19-like disease spread [22] , particularly for a spatially constrained, yet stochastically interacting population. In that work, similar to other simulational approaches [23, 24] , the spread of the disease was modeled on the basis of proximity-based interactions. We identified certain similarities between our simulation results and those obtained from other compartmentalized differential-equation-based extended SIR models [5, 6] , and further showed that spatial mobility plays a key role in determining the eventual growth in the total number of infections/deaths as a function of time. While this conclusion should not be surprising [25] , it was corroborated by a recent data-driven analysis of the 'mobility-network' in Germany, using cellular phone data [26] . These investigations established a connection between the three approaches (data-driven, simulation, and compartmental model) used to better understand infectious disease spread. In terms of phenomenological modelling, some of the most extensively used tools for making COVID-19 related predictions are the family of logistic growth models (LGMs) [13, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38 ]. This is not unexpected. LGMs have been successfully used in the past for predicting growth curves in epidemics such as Ebola, SARS, H1N1, dengue, etc. [39, 40, 41] . Along these lines, our previous work [22] also showed that the commonly used generalized LGM [42] works for exponential growth, it may fail to satisfactorily fit epidemic data when there is sub-exponential power-law growth. As a continuation of our engagement with this problem, we present in this work a logistic growth model that describes such data more accurately. In the context of the COVID-19 pandemic, the simplest LGM used by some research groups [29, 30, 31, 32, 33] is described by the well-known Verhulst differential equation where λ is the intrinsic growth constant and K (also called the carrying capacity) is the asymptotic (saturation) limit for N (t), as t → ∞. The general solution of Eq. (1) is of the form where the point of inflection is at t = ln B/λ. The above is a special case of the Richards LGM [42] which is a solution of the Richards differential equation Here, the parameter m decides both the shape of the growth curve as well as its inflection point. For example, as m → 1, Eq. (4) becomes the Gompertz growth curve [43, 44] . The special case of m = 2 describes classical logistic growth, shown in Eqs. (1) and (2) . To be consistent with other recent literature and given the fact that we are only interested in the family of curves with m > 1, we rewrite Eq. (4) as where q = |1 − m| and λ ′ = λ/q. Recently, it has been shown [45] that in order to allow for sub-exponential growth, one can further generalize the Richards equation by replacing N with N p in Eq. (5), where p ≤ 1 is a 'deceleration' parameter [46] . Such subexponential growth was observed with initial COVID-19 data from China, where an analysis [5] of the number of reported cases from several provinces in the country showed a t α type power-law growth in N . This was attributed to effective containment and mitigation measures, as well as behavioral changes of the population [5]. Such control interventions prevent a homogeneous mixing [47] of the population, which if unchecked would lead to exponential growth, provided there is no depletion of the susceptible population [48] . Our simulations [22] further showed that the minimum growth exponent obtained under the most stringent mobility restrictions is quadratic growth (α = 2). More realistically one would expect growth exponents that are slightly higher than 2, under the most effective containment scenarios [5, 22] . It is reasonable to expect that during the first wave of the pandemic (in 2020) most countries followed similar contaiment strategies at various levels to counter the spread of COVID-19 within their populace. Therefore their cumulative infection (and fatality) curves are expected to have power-law growth exponents in the range of 2 to 3 [22] . Below we develop a new LGM that can adequately describe such data, and make reasonably accurate and consistent predictions. To develop the model, we start similarly as in Eq. (1), with the ansatz that the daily infection rate is proportional to N , the number of individuals who are already infected by the disease. Furthermore, it is apparent that for bounded (logistic) growth, one requires the daily rate to also be proportional to a term similar to the ones described in the parentheses of Eqs. (1) and (5). Therefore, for power-law growth, with N ∝ t α , we write a general form of daily infection rate, analogous to Eq. (5) as It is important to note that here dN/dt has an explicit dependence on time, unlike Eqs. (1) and (5). The parameter β is in units of time, so that in the asymptotic limit as t → β (well past the peak of the epidemic curve, for large values of β), dN/dt → 0. The α, γ and δ parameters are dimensionless, while λ has units of 1/t. In the next step, we empirically tested and developed this model, by fitting the above function to available data from four countries, Belgium, China, Denmark and Germany, during the first wave of infections in 2020. These countries were chosen because the data show a reasonably successful containment of the spread of COVID-19 within their population [49] , during the first wave. Similar to our previous work [22] , we performed a time-series analyses for the number of reported daily deaths 2 , instead of infections. This was due to several reasons. Firstly, the death toll is far more important to quantify than the infection rate in a given population, although they are related. Secondly, when performing a global comparison of data from different countries, we assume that COVID-19-related deaths are more accurately and uniformly recorded in general. And finally, given the strong correlation between the number of infected cases and number of deaths, the time-series trends in both death and infection rates are expected to be similar to one another. 3 The fits were performed using a non-linear least squares (NLS) algorithm that minimized the sum of squared residuals (SSR), defined by with respect to the daily reported deaths D i , where and N is the cumulative number of deaths at time t. However, the NLS fitting procedure showed that the five parameter fit in Eq. (6) was not optimal for such analysis. The five parameters were found to be highly correlated, with correlation coefficients in the range of 0.83 ≤ |ρ| ≤ 1. Successive fits to the same data, for different initial values for the parameters resulted in arbitrary and widely-varying values for the converged fit parameters, particularly λ, β and δ. Despite this, the fits yielded very similar values for the minimum SSR and nearly indistinguishable results. The above showed that the model in Eq. 6 2 All data described in this work are 5-day rolling averaged. 3 We caution that one must be careful in making this assumption, which may fail when live-saving treatment options are put in place (or inaccessible) midway, thereby affecting daily mortality rates. Such real-time interventions affect all phenomenological models. was not feasible for an out-of-sample forecasting with partial dN/dt epidemic curves. Consequently, we modified Eq. (6) to as a means to bypass the problem. For this part of the analysis, the β parameter was kept fixed at a large value (β = 500 days). The above prescription reduced the problem to four parameters, while placing significant restrictions on the allowed parameter space. Similar fits, performed as before showed nearly no dependence on β (as long as it is large and fixed), with the parameters consistently converging to very similar values. On fitting the data using our modified power-law logistic function, we obtain good agreement with the daily mortality curves from the four countries considered earlier. This is shown in Fig. 1 . As expected, the product of λ and t α mainly contribute to the rising part of the dN/dt curve. The other two parameters γ and ǫ contribute to truncating the rise. Together, these parameters decide the position of the central value of the peak as well as the nature of any tailing feature that follows it. The tailing in the data depends on country-specific mitigation and containment measures. It is found to be more prominent in the cases of Belgium, Denmark, and Germany. While this part of the analysis was necessary to show the more than satisfactory agreement of our power-law growth model with data, particularly from countries where the pandemic peak had long passed, this is not a robust test of the forecasting ability of the model. We next performed an out-of-sample forecasting test, this time only using data points from the rising part of the dN/dt curve. To avoid fit convergences to 'fake' minima that do not correspond to realistic values, we used a comparison with available data to empirically develop the procedure described below. First the full parameter space was mapped to arrive in the vicinity of the correct SSR minimum. This was done by performing a first round of non-linear SSR minimization with the constraints 4 0 < λ < 0.1, 2 < α < 3, 2 < γ < 3 and 0.5 < ǫ < 1. After this initial step, the restrictions on λ, α and γ were removed and the iterative grid-search NLS algorithm was used evaluate the values that yielded the minimum SSR in that region of parameter space. Our analyses showed that the procedure described above yielded reasonable agreement with the full data from different countries consistently. This agreement is illustrated in Fig. 2 , that also compares our fit results to those obtained from the classical LGM and both versions of the Richards LGM. The forecasting performance of the four models are further compared in Table 1 , which lists three performance metrics. These include the root mean squared (RMS) error, where n is the total number of data points in the curve, the mean absolute percentage error (MAPE), and the coverage of the 95% prediction interval, which quantifies the proportion of observations that fell within that range. As evident in the Fig. 2 and Ta Another check of the robustness of our analysis was performed by systematically reducing the number of in-sample calibration points (marked in red in Fig. 2 ) in the NLS fitting procedure. This effectively tested the stability of our predictions. The results of this systematic check are shown in Fig. 3 and Once assured that our data analysis procedure was on a secure footing, we performed a similar analysis for the second and third waves in South Africa. It may be noted that strict lock down and containment policies were not imposed in these scenarios (compared to the first wave) and that only a partial dN/dt curve for the third wave is available at the present time, allowing us to make predictions. Furthermore, the vaccinated status of part of the population and the different variants of the SARS-CoV-2 virus add additional complications that allow a rigorous test of the power-law growth model. Figure 4 shows power-law growth model fits to the second and third wave fatality data from South Africa. The in-sample red data points were fit similarly as before, with two minor differences. Since the growth exponent is expected to be higher in these scenarios [22] , the ranges on α and γ were increased to 2-6 in the initial restricted fit. An additional 'background' parameter was required to added to Eq. In summary, we used an empirical analysis to present a new logistic powerlaw growth model (LGM) that was applied to COVID-19 fatality data. This is relevant, as sub-exponential power-law growth is not adequately described by earlier variants of LGMs. Our model is found to be rather robust in accurately predicting peak and saturation values in epidemic growth curves from Belgium, China, Denmark and Germany. Following this validation, the power-law LGM is used to predict the COVID-19 induced-fatalities in the second and third waves for South Africa, after robustly testing the model predictions for the former. We anticipate that our presented growth model will be useful for forecasting general. We are thankful to Prof. Niranjan Barik for useful discussions related to this work. ST acknowledges support from the NRF (National Research Foundation), South Africa, under SARChI Grant. No. 85100. The red points show the in-sample calibration points used for the forecasting. The green curves show ±95% prediction intervals. In each case, the date that marks the beginning of the data-analysis region for the epidemic wave is indicated at t = 0 on the time axis. Right panel: Cumulative deaths recorded from the World Health Organization, shown together with our model forecasts at the 95% prediction interval. Time varying Markov process with partially observed aggregate data: An application to coronavirus Superposition of waves for modeling COVID-19 epidemic in the world and in the countries with the maximum number of infected people in the first Spatiotemporal dynamics, nowcasting and forecasting of COVID-19 in the United States Improving the estimation of the COVID-19 effective reproduction number using nowcasting Nowcasting fatal COVID-19 infections on a regional level in Germany Nowcasting Covid-19 statistics reported withdelay: a case-study of Sweden A survey on applications of artificial intelligence in fighting against COVID-19 A random walk Monte Carlo simulation study of COVID-19-like infection spread Spatial contact models for ecological and epidemic spread Studying and approximating spatio-temporal models for epidemic spread and control Acceleration of evolutionary spread by long-range dispersal COVID-19 lockdown induces disease-mitigating structural changes in mobility networks Fatemeh Bahranizadd, and Mohammad Hosein Zare, Nowcasting and Forecasting the Spread Generalized logistic growth modeling of the COVID-19 outbreak: comparing the dynamics in the 29 provinces in China and in the rest of the world The COVID-19 pandemic: growth patterns, power law scaling, and saturation Estimation of the final size of the COVID-19 epidemic Predicting the ultimate outcome of the COVID-19 outbreak in Italy Logistic approximations used to describe new outbreaks in the 2020 COVID-19 pandemic Logistic growth modelling of COVID-19 proliferation in China and its international implications Rational evaluation of various epidemic models based on the COVID-19 data of China Prediction and analysis of Coronavirus Disease A Note on the Evolution of Covid-19 in Italy Dynamics of tumor growth A generalized-growth model to characterize the early ascending phase of infectious disease outbreaks Mechanistic movement models to understand epidemic spread The mathematical theory of infectious diseases and its applications