key: cord-0944963-ek0j1a3e authors: Deepmala; Srivastava, Nishant Kumar; Singh, Sanjay Kumar; Singh, Umesh title: Analysis and prediction of COVID-19 spreading through Bayesian modelling with a case study of Uttar Pradesh, India date: 2022-03-31 journal: OPSEARCH DOI: 10.1007/s12597-022-00580-6 sha: 2da7730ba47157a5aa64c8795df521a5a3ca8cf8 doc_id: 944963 cord_uid: ek0j1a3e

Predicting the dynamics of COVID-19 cases is imperative for enhancing the capacity of the health care system, monitoring the effects of policy interventions, and controlling transmission. With this in view, this paper examines the transmission process of COVID-19 using three types of cases (confirmed, deceased, and recovered) in Uttar Pradesh, India. We demonstrated an approach that can predict the number of confirmed, deceased, and recovered cases of COVID-19 in the near future, given the past occurrences. We used the logistic and Gompertz non-linear regression models under a Bayesian setup. The prior distributions of the model parameters were built using information obtained from other states of India that have already reached an advanced stage of COVID-19. This analysis did not consider any changes in government control measures.

The world has been undergoing an outbreak of the most infectious communicable disease of the modern era, the coronavirus, whose severity led the World Health Organization (WHO) to classify it as a global public health emergency on January 30, 2020 [1] and subsequently as a pandemic on March 11, 2020 [2, 3]. This spread poses a new peril to the world, especially in key dimensions such as the economy, social well-being, and the health sector.
Against this backdrop, a basket of precautionary measures, such as maintaining social distancing, wearing masks, cleaning hands regularly, and avoiding touching the eyes, nose, and mouth (WHO, 2020), came into regular practice to mitigate the spread of the disease at its initial stage. These practices, however, proved insufficient to contain it in the long run, especially in high-density countries like India. Since the confirmation of the first case of COVID-19 in India on January 30, 2020, there has been a sharp rise in the number of confirmed and deceased cases across all states and Union territories. As a result, India is now the most affected country in Asia, with the third-largest number of confirmed cases in the world after the United States and Brazil [4], and if the current circumstances persist, it could soon take the top spot in terms of COVID-19 infections [5, 6]. Furthermore, state-level data indicate that Maharashtra is one of the hotspots of COVID-19, with more than four lakh (400,000) recorded cases [7], followed by Tamil Nadu, Andhra Pradesh, Karnataka, the NCT of Delhi, and Uttar Pradesh, where the reported number of cases has recently edged past one lakh (100,000) [7]. Uttar Pradesh (UP) makes a relatively significant contribution to the economy, trade, manufacturing, and services, which makes it the second-largest economy in India after Maharashtra in terms of net state domestic product. Its confirmed cases stood at a record high of 108,974 [7], out of the 2,025,423 confirmed cases reported at the all-India level as of August 6, 2020. As a result, Uttar Pradesh, a state with well over 200 million inhabitants, has started to struggle to arrest the transmission of COVID-19 among its population.
Therefore, UP is a cause for concern today: data on reported COVID-19 deaths from some other states signal that UP could be hurtling towards a similar crisis that will stretch its health system in the coming days. In such a grim scenario, efficient management of the current health system is central to dealing with the pandemic. Many outcomes are therefore of potential interest to policymakers. For instance: how many confirmed, deceased, and recovered cases will be seen, and at what time? How many patients will be admitted to the ICU or need ventilators? When will the declining trend of COVID-19 begin? Even simple answers to these questions will help the government identify the optimal policies to arrest the transmission rate. As there is no vaccine or treatment for this virus as of now, non-pharmaceutical strategies to control the spread, supported by computational modeling, statistical tools, and quantitative analyses, are essential to curb the current outbreak. A wide range of mathematical and statistical models can be used to study the uncertain transmission process of COVID-19, to predict its prevalence, and to explore the likely number of cases, recoveries, and deaths. Therefore, to keep the damage of the COVID-19 outbreak to a minimum, research on COVID-19 prediction models has become a burning topic. The most commonly used models for predicting infectious diseases include internet-based infectious disease prediction models, time series prediction models, and differential equation prediction models based on dynamics. Several models have been proposed in the literature [8-11]. Studies of the Indian region are very limited in number; among them are [12-16].
From an analysis point of view, non-linear models are also important tools for prediction [17-19]. Since the mechanisms of COVID-19 spreading are not yet well understood, it is of interest to analyze the ongoing pandemic within the Bayesian framework [20]. This approach has the advantage of giving researchers great flexibility to incorporate relevant information beyond the recorded data into the model. This article provides an in-depth study of the hidden patterns of COVID-19 through three indicators of interest, viz. confirmed, deceased, and recovered cases, in five selected states, and a way to process this information to analyze and project the pandemic situation of COVID-19 in the state of Uttar Pradesh, India. The databases used for the analysis are described in Sect. 2. The mathematical framework is elaborated in Sect. 3, where two non-linear models that accord with the statistical law, viz. the logistic and Gompertz non-linear regression models, are discussed in detail. The process of prior elicitation is explained in Sect. 4. The results and conclusion are given in Sects. 5 and 6, respectively.

The statistical framework used in this paper consists of the Bayesian non-linear logistic model (LM) and Gompertz model (GM), whereby the prior information is obtained through the classical non-linear regression model applied to the following five states, which have already entered the advanced stages of the COVID-19 outbreak: Andhra Pradesh (AP), Delhi (DL), Karnataka (KA), Maharashtra (MH), and Tamil Nadu (TN). These states have experienced diverse trajectories of COVID-19 cases, making them suitable sources of reasonable prior information. In their standard three-parameter forms, with $b > 0$ for a growing curve, the logistic and Gompertz non-linear regression models can be expressed as

$$y_t = \frac{d}{1 + \exp\{-b\,(t - e)\}}$$

and

$$y_t = d \exp\{-\exp[-b\,(t - e)]\},$$

respectively.
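The two growth curves can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the standard three-parameter forms with asymptote d, growth rate b > 0, and inflection time e, and the function names and parameter values are hypothetical rather than taken from the paper.

```python
import math


def logistic(t, d, b, e):
    """Logistic mean curve (assumed form): d / (1 + exp(-b (t - e)))."""
    return d / (1.0 + math.exp(-b * (t - e)))


def gompertz(t, d, b, e):
    """Gompertz mean curve (assumed form): d * exp(-exp(-b (t - e)))."""
    return d * math.exp(-math.exp(-b * (t - e)))


# Hypothetical parameter values, chosen only to illustrate the shapes.
d, b, e = 1000.0, 0.05, 100.0

# Share of the asymptote reached at the inflection time t = e.
logistic_share = logistic(e, d, b, e) / d
gompertz_share = gompertz(e, d, b, e) / d

# Symmetry check: values at equal distances before and after t = e.
h = 10.0
logistic_pair = logistic(e - h, d, b, e) + logistic(e + h, d, b, e)
gompertz_pair = gompertz(e - h, d, b, e) + gompertz(e + h, d, b, e)

print(logistic_share, gompertz_share)
print(logistic_pair, gompertz_pair)
```

Under these forms the logistic curve reaches half of its asymptote at t = e and its values at e - h and e + h always sum to d, whereas the Gompertz curve reaches only exp(-1), about 37%, of its asymptote at t = e and has no such symmetry.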
It should be noted that the interpretation of the underlying parameters is identical for the two models expressed above. Here, $y_t$ represents the cumulative cases (confirmed, deceased, or recovered) of COVID-19 observed at time t, d is the asymptote, i.e., the predicted maximum of cumulative cases at the end of the outbreak, b is the slope around the inflection point and represents the growth rate coefficient, and e is the time at inflection, i.e., the time when the maximum daily cases will occur. The only difference between the two models is that the logistic model is symmetric around the inflection point while the Gompertz model is not. The estimation of the above-mentioned non-linear regression models is done within the R environment. The information criteria AIC (Akaike Information Criterion) and its generalised version WAIC (Watanabe-Akaike Information Criterion) are used to compare the classical and Bayesian models, respectively. In addition, the coefficient of determination ($R^2$) is used to assess how well the considered models fit the data. It is defined as

$$R^2 = 1 - \frac{\sum_t (y_t - \hat{y}_t)^2}{\sum_t (y_t - \bar{y})^2},$$

where $y_t$ denotes the cumulative cases of interest, $\hat{y}_t$ the corresponding fitted value, and $\bar{y}$ the average of the observed cases. A value of $R^2$ closer to one indicates a more accurate fit.

We have used the data from the official website Covid19India [7]. The COVID-19 data up to August 06, 2020 have been used for carrying out the predictions in this paper.

As already mentioned in Sect. 3, we have considered informative priors, and this information is extracted through the principle of least squares estimation (LSE). Furthermore, from the time plot and the LSE results for the observed cases, we noticed that the parameter d, representing the asymptote of the curve, differs substantially among confirmed, deceased, and recovered cases.
The typical reasons for such variations include the number of tests being conducted and the number of confirmed cases reported in different states. We did not find any such variations for the other two parameters, b and e. Therefore, to define the prior distribution of the parameter d, we used the classical results together with the additional information mentioned above, whereas for the parameters b and e we used only the least squares results, i.e., the parameter estimates and their standard errors. Within this setting, the prior for d is specified separately for confirmed cases and for recovered and deceased cases, and the priors for b and e are specified from the least squares results. In these specifications, $\hat{d}_i$, $\hat{b}_i$ and $\hat{e}_i$ are the least squares estimates corresponding to the ith state, $T_i$ and $C_i$ are the total number of tests conducted and the total number of confirmed cases in the ith state, respectively, and T and C are the total number of tests conducted and the total number of cases that occurred in UP, respectively. The standard errors of the least squares estimates of d, b and e enter these specifications as well. Moreover, the Gibbs sampling algorithm is employed to obtain the data-based inferences under the Bayesian setup.

The daily and cumulative cases of COVID-19 for the selected states are depicted in Fig. 1 through bar graphs and scatter plots, respectively, to show how cases grow over time in different states. These graphs indicate that the considered states exhibit different trends at the same point in time, owing to the different stages of COVID-19 prevailing in them. The estimates of the model parameters obtained by the least squares principle for the three indicators of interest are listed in Tables 1, 2 and 3, respectively.
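The Bayesian setup described earlier in this section, with priors built from least squares estimates and their standard errors, can be illustrated with a small Python sketch of an unnormalised log-posterior. Several things here are assumptions made only for illustration: the priors are taken to be normal with the estimate as mean and the standard error as standard deviation, the logistic curve is written in its standard three-parameter form, the error standard deviation `noise_sd` is treated as known, and all names and numbers are hypothetical. The adjustment of the prior on d by test and case counts is not reproduced, and no sampler is shown.

```python
import math


def logistic(t, d, b, e):
    """Assumed three-parameter logistic mean curve."""
    return d / (1.0 + math.exp(-b * (t - e)))


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and standard deviation."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def log_posterior(params, times, cases, priors, noise_sd):
    """Unnormalised log-posterior: normal log-priors plus Gaussian log-likelihood."""
    d, b, e = params
    if d <= 0.0 or b <= 0.0:
        return float("-inf")  # outside the assumed parameter space
    log_prior = sum(
        normal_logpdf(value, *priors[name]) for name, value in zip("dbe", params)
    )
    log_lik = sum(
        normal_logpdf(y, logistic(t, d, b, e), noise_sd) for t, y in zip(times, cases)
    )
    return log_prior + log_lik


# Hypothetical (mean, sd) pairs standing in for a donor state's estimates and standard errors.
priors = {"d": (1100.0, 200.0), "b": (0.05, 0.01), "e": (95.0, 10.0)}
times = list(range(0, 201, 10))
cases = [logistic(t, 1000.0, 0.05, 100.0) for t in times]  # noise-free toy data

near = log_posterior((1000.0, 0.05, 100.0), times, cases, priors, noise_sd=20.0)
far = log_posterior((2000.0, 0.05, 100.0), times, cases, priors, noise_sd=20.0)
print(near > far)
```

A sampler such as Gibbs sampling draws parameter values from the distribution that a function of this kind defines up to a normalising constant; parameter values that agree with both the prior and the data receive a higher log-posterior.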
An immediate observation from Tables 1 and 2 is that the upper asymptote values of the Gompertz model are greater than those obtained from the logistic model. A similar picture, except for Andhra Pradesh (AP), is provided by Table 3, where the estimated asymptote of the Gompertz model is significantly higher than that of the logistic model. We also found that the two models give different values for the time of inflection. The values of $R^2$ indicate that both models fit the data well for all the considered indicators. Fig. 1 shows that Delhi has crossed the inflection point of the curve and reached its plateau. Furthermore, Tables 1, 2 and 3 show that the estimated standard error of the asymptote for Delhi is small, i.e., there is very little chance of fluctuation in the upper asymptote. This signifies that Delhi has now entered the controlled stage of the pandemic. In all the considered states except Delhi, the COVID-19 pandemic has not reached the plateau stage, so there is much uncertainty regarding the asymptote of the curve. Moreover, the AIC values in Tables 1 and 3 reveal that the logistic model is the better model for the confirmed and recovered cases in the majority of the states. For deceased cases, however, Table 2 shows that the Gompertz model outperforms the logistic model in the majority of the states.

Tables 4, 5 and 6 summarize the Bayesian estimates, the standard errors of the estimates, and $R^2$ for the logistic and Gompertz models fitted to the cumulative confirmed, deceased, and recovered cases of COVID-19 in UP, respectively. The Watanabe-Akaike information criterion (WAIC) is also computed. Based on this criterion, the prior information extracted from Maharashtra is the best choice among the five states.
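The fit and comparison measures used in this section (the coefficient of determination, AIC, and WAIC) can be sketched in Python as follows. The snippet rests on assumptions that the text does not confirm: the AIC is computed for a least squares fit with Gaussian errors and counts the error variance as one extra parameter, and the WAIC uses the common definition -2(lppd - p_WAIC) computed from pointwise log-likelihoods at posterior draws. All function names and toy numbers are hypothetical.

```python
import math


def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def gaussian_aic(observed, fitted, n_curve_params=3):
    """AIC = -2 log L + 2k for a least squares fit with Gaussian errors.

    k counts the curve parameters plus the error variance (an assumption here).
    """
    n = len(observed)
    rss = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    sigma2 = rss / n  # maximum-likelihood estimate of the error variance
    log_lik = -0.5 * n * (math.log(2.0 * math.pi) + math.log(sigma2) + 1.0)
    return -2.0 * log_lik + 2.0 * (n_curve_params + 1)


def waic(log_lik):
    """WAIC from log_lik[s][i] = log p(y_i | draw s); needs at least two draws."""
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        top = max(column)  # log-sum-exp trick for numerical stability
        lppd += top + math.log(sum(math.exp(v - top) for v in column) / n_draws)
        mean_ll = sum(column) / n_draws
        p_waic += sum((v - mean_ll) ** 2 for v in column) / (n_draws - 1)
    return -2.0 * (lppd - p_waic)


# Toy numbers for illustration only.
observed = [10.0, 20.0, 40.0, 70.0]
fitted = [12.0, 19.0, 41.0, 68.0]
toy_log_lik = [[-1.0, -2.1, -1.4, -0.9], [-1.2, -1.9, -1.5, -1.1]]

print(r_squared(observed, fitted))
print(gaussian_aic(observed, fitted))
print(waic(toy_log_lik))
```

For both AIC and WAIC, a smaller value indicates the preferred model among the candidates fitted to the same data.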
In addition, the fitted curves suggest that the course of the COVID-19 pandemic in UP is in line with that of Maharashtra. This result does not imply, however, that the prevalence rates of the two states are equal or that they have equal cumulative cases. It therefore seems reasonable to use the prior information from Maharashtra to predict the cases of interest more accurately. Alternatively, we can compare the COVID-19 situation in the two states, MH and UP, from the beginning of the pandemic. Mass-level screening for COVID-19 was inadequate in both states. Many people migrated from other countries to Maharashtra, and there was no contact tracing in the beginning. Similarly, millions of people, including labor-class workers, migrated from other states of India to UP, and contact tracing of infected people was a problem.

In Figs. 5, 6 and 7, we depict the prediction curves of the confirmed, deceased, and recovered cases of COVID-19 using the best-fitted model, i.e., the Bayesian non-linear regression model with prior information from MH. The prediction results show that the model predicts the pandemic situation very well for confirmed and recovered cases. For the daily number of deceased cases, however, the prediction is not up to the mark. A possible explanation is that the factors affecting the death rate differ from those affecting the cumulative confirmed and recovered cases, so there is more uncertainty in the daily number of deceased cases. As per the current trend, the predicted maximum cumulative numbers of confirmed, deceased, and recovered cases across all the considered models lie in the ranges 357,576 to 4,749,919, 1,889 to 11,916, and 72,836 to 1,908,931, respectively. The daily numbers of confirmed, deceased, and recovered cases will peak between 50 and 155 days, 24 and 112 days, and 36 and 145 days from 16 June 2020, respectively.
In Table 9, we provide the predicted maximum numbers of cumulative confirmed, deceased, and recovered cases in UP, and the dates on which these maxima will be reached, using the best-fitted model. The findings show that the cumulative confirmed cases will reach their maximum of 1,157,335 on 3 June 2021. The cumulative deceased cases will reach their maximum of 5,843 on 28 March 2021, and the cumulative recovered cases will reach their maximum of 1,145,829 on 1 July 2021. Under the best-fitted model, the daily numbers of confirmed, deceased, and recovered cases will peak on the 104th, 73rd, and 124th day, respectively, from 16 June 2020.

Fig. 7 Cumulative and daily recovered cases of COVID-19 in Uttar Pradesh, India, and predicted curve by Bayesian non-linear regression model using prior information

After analysing the existing data with the above-defined setup, we observed that the logistic model with the prior information taken from Maharashtra is the most suitable choice: it gives the most favourable $R^2$ and information criterion values among the assessed models. The results in Tables 7 and 8 show that the predicted daily confirmed, deceased, and recovered cases on September 25, 2020 will be 12,589, 40, and 9,057, respectively, and that the predicted cumulative confirmed, deceased, and recovered cases will climb to 548,710, 4,245, and 328,662, respectively. We also conclude that the model predicts the values for all three types of cases (confirmed, deceased, and recovered) very well. Note that, due to space constraints, predictions for the most crucial period, i.e., the near term, are presented at daily frequency, while those for the rest of the period are presented at intervals of 5 days. The results of the best-fitted model also suggest that COVID-19 will probably be over by early June 2021.
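The peak days quoted above are counted from 16 June 2020, and converting them to calendar dates is simple date arithmetic. The sketch below assumes that 16 June 2020 is day 0; the text does not say whether it is counted as day 0 or day 1, and the function name is hypothetical.

```python
from datetime import date, timedelta

ORIGIN = date(2020, 6, 16)  # reference date used for the day counts


def day_to_date(day_index, origin=ORIGIN):
    """Calendar date that lies `day_index` days after the origin (origin = day 0)."""
    return origin + timedelta(days=day_index)


# Peak days of the daily cases under the best-fitted model, as reported in the text.
peak_days = {"confirmed": 104, "deceased": 73, "recovered": 124}
peak_dates = {kind: day_to_date(day) for kind, day in peak_days.items()}

for kind, when in peak_dates.items():
    print(kind, when.isoformat())
```

Under the day-0 assumption the peaks fall on 28 September 2020 (confirmed), 28 August 2020 (deceased), and 18 October 2020 (recovered); if 16 June 2020 is instead counted as day 1, each date moves one day earlier.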
We conclude that the proposed method can be useful, and we believe that this study provides valuable information for strengthening the implementation of strategies to increase health system capacity and for helping public health authorities make the relevant decisions (Table 9).

A review of the 2019 Novel Coronavirus (COVID-19) based on current evidence
Effective containment explains subexponential growth in recent confirmed COVID-19 cases in China
World Health Organization: Coronavirus disease 2019 (COVID-19) situation report-51
India becomes third worst affected country by coronavirus, overtakes Russia
COVID-19 pandemic in India: what lies ahead
Worldometer-real time world statistics
COVID-19: forecasting short term hospital needs in France
Data-based analysis, modelling and forecasting of the COVID-19 outbreak
Modelling the epidemiological trends and behavior of COVID19 in Italy
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
Modeling and forecasting the COVID-19 pandemic in India
Prediction and analysis of COVID-19 positive cases using deep learning models: A descriptive case study of India
Real-time forecasts and risk assessment of novel coronavirus (COVID-19) cases: A data-driven analysis
SEIR and Regression Model based COVID-19 outbreak predictions in India
Time series analysis and forecast of the COVID-19 pandemic in India using genetic programming
Applied regression analysis
Modeling and forecasting trend of COVID-19 epidemic in Iran
Nonlinear regression modelling
Bayesian Data Analysis

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author contributions D., NKS, SKS, US.
Funding Not applicable.
The datasets analysed are available on the official website Covid19India, https://www.covid19india.org/.
The authors declare that they have no competing interests.
Consent for publication Not applicable.