key: cord-304429-qmcrvufu
authors: Deepmala,; Srivastava, N. K.; Kumar, V.; Singh, S. K.
title: Analysis and prediction of Covid-19 spreading through Bayesian modelling with a case study of Uttar Pradesh, India
date: 2020-08-31
journal: nan
DOI: 10.1101/2020.08.25.20180265
sha: 
doc_id: 304429
cord_uid: qmcrvufu

The pandemic of coronavirus disease 2019 (COVID-19) started in Wuhan, China, and spread worldwide. In India, COVID-19 cases increased rapidly throughout India. Various measures like awareness program, social distancing, and contact tracing have been implemented to control the COVID-19 outbreak. In the absence of any vaccine, the prediction of the confirmed, deceased, and recovered cases is required to enhance the health care system's capacity and control the transmission. In this study, the cumulative and the daily confirmed, deceased, and recovered cases in Uttar Pradesh, India, were analyzed. We used the Logistic and Gompertz non-linear regression model using a Bayesian paradigm. We build the prior distribution of the model using information obtained from some other states of India, which are already reached at the advanced stage of COVID-19. Results from the analysis indicated that the predicted maximum number of confirmed, deceased, and recovered cases will be around 1157335, 5843, and 1145829. The daily number of confirmed, deceased, and recovered cases will be maximum at 104th day, 73rd day, and 124th day from 16 June 2020. Moreover, the COVID-19 will be over probably by early-June, 2021. The analysis did not consider any changes in government control measures. We hope this study can provide some relevant information to the government and health officials.

The world is facing the outbreak of the coronavirus which is a highly contagious disease also it is declared as a global public health emergency by the World Health Organisation (WHO) [1] . It is originated in Wuhan, Hubei, China in December 2019 and spread all over the world. The World Health Organization officially named the disease COVID-19 and classified the COVID-19 outbreak as pandemic on March 11, 2020 [9, 21] This virus is highly infectious and transmittable. Precautionary measures for COVID-19 contain maintaining social distancing, wearing masks, cleaning hands regularly, avoiding touching the eyes, nose, and mouth (WHO, 2020). The first confirmed case of COVID-19 in India was recorded on 30 January 2020. Since then, the number of confirmed cases and deceased cases is continuously increasing and most of the states affected by this virus COVID-19. This disease has recorded millions of cases and thousands of deaths due to rapid pandemic potential. At present, India is the most affected country in Asia and has the third-largest number of confirmed cases in the world after the USA and Brazil [11] . It could be soon in the top spot in terms of cases [7, 22] . In India, Maharastra has the highest number of cases of COVID-19 [6] . Currently, in Maharastra, more than four lakhs cases are recorded. Tamil Nadu, Andhra Pradesh, Karnataka, NCT of Delhi and Uttar Pradesh have reported more than one lakh cases [6] . There is no vaccine until now of this virus. The absence of a vaccine creates the situation worse for the already overstretched health care system. For example, the number of hospital beds per thousand population is less than one [10] . It is indicating the miserable situation of India's health care system.

In India, Uttar Pradesh (UP) is a state that holds a significant value concerning the economy, trade, manufacturing, and services. It has the second-largest economy in India after Maharashtra in terms of net state domestic product. As of 06 August 2020 in India, 2025423 confirmed cases had been reported. Of these, 108974 confirmed cases are from the state of Uttar Pradesh, India [6] . Uttar Pradesh, India's largest democracy with a population of 200 million, will have difficulty controlling the transmission of COVID-19 among its population. In such a grim scenario, many outcomes are potential interest to policymakers; for example: how many confirmed, deceased, and recovered cases will be seen, and by what time? How many will be admitted to ICU? How many will need ventilators? When will be started the declining trend of COVID-19? The government could take the optimal steps to improve the health system to slow down the transmission rate by knowing even roughly the answer to these questions. Multiple strategies would be highly necessary to handle the current outbreak; these include computational modeling, statistical tools, and quantitative analyses to control the spread and the rapid development of a new treatment. It is imperative that the trends and predictions for the state of Uttar Pradesh must be studied well so that effective strategies can be implemented. We focus on the number of confirmed, deceased, and recovered cases as our target of prediction due to limited available data on the other outcomes.

Several mathematical and statistical models can be established to study and analyze the transmission process of COVID-19. So that we can accurately predict the prevalence and explore the situation of the probability of cases and the recovery or deaths. Therefore, to reduce the harm of the COVID-19 outbreak, the analysis and research of Covid-19 prediction models have become a hot research topic.

The most commonly used models for predicting infectious diseases are internetbased infectious disease prediction models, time series prediction models, and differential equation prediction models based on dynamics. Several models have been proposed in literature [5, 8, 13, 16] . The studies are very less for the India region, and some work has been done by [12, [18] [19] [20] . From an analysis point of view, non-linear models are also important methods to deal with the problem of prediction [3, 14, 15] . As we know, the mechanisms of COVID-19 spreading are not completely understood. Therefore Bayesian analysis of the epidemic spreading can be interesting. This approach has the advantage that it includes some other prior information in the model other than the data.

This study focuses on the analysis and the prediction of the epidemic situation of COVID-19 in the state of Uttar Pradesh, India, using logistic and Gompertz nonlinear regression model, which are accord with the statistical law of epidemiology. The Bayesian approach [4] is used for estimation purposes. Here prior information is obtained from other states of India.

In this paper, we used the following major databases:

(1) Github-Covid-19 in India available from https://github.com/imdevskp/covid-19-india-data (2) Covid19India is a open source database for India available from: https://www.covid19india.org/

The main objective of the study is to make predictions of the cumulative and daily confirmed, deceased, and recovered cases in the state of Uttar Pradesh, India. 

respectively. Both models have the same meaning of the parameters. Here y t represents cumulative cases (confirmed, deceased or recovered cases) of COVID-19 observed at t th time, d is the asymptote that represents the predicted maximum of cumulative cases at the end of the outbreak, b is the slope around the inflection point that represents the growth rate coefficient, e is the time at inflection that represents the time when will occur the maximum daily cases. The difference between these two models is that the logistic model is symmetric around the inflection point while Gompertz is not.

Non-linear regression models were fitted using the drm function of R package drc [17] of R statistical software. The criteria AIC (Akaike Information Criterion) and WAIC ( Watanabe Akaike information criterion) are used for comparison of classical and Bayesian models, respectively. The regression coefficient (R 2 ) is used to measure the fitting ability of various models. It is defined as

where y t is the actual cumulative confirmed, deceased, or recovered COVID-19 cases, y is the average of the actual cumulative confirmed, deceased, or recovered COVID-19 cases. The fitting coefficient is closer to 1 that represents more accurate prediction.

By using the results of the non-linear models fitted by least square estimation (LSE), we define the prior distribution of the parameters of the Bayesian non-linear models for estimating and predicting the cumulative and the daily confirmed, deceased, and recovered cases of Uttar Pradesh state. From the scatter plot of UP states and other states, we found that the parameter d that represents asymptote of the curve has a large number of variations for confirmed, deceased, and recovered cases. But we did not find such variations for the other two parameters b and e. Therefore for defining the prior distribution of parameter d, we use least square estimates and standard error of these estimates along with other important factors associated with that considered states, due to which such variations are being found. The considered important factors as additional information are: the total number of tests conducted and the total number of confirmed cases (as of till 06 August 2020). But for parameters b and e, we use only least square estimates and standard error of these estimates to specify the prior distribution.

The prior information on the parameter d can be provided as: For confirmed cases,

For recovered and deceased cases,

The prior information on the parameters b and e can be provided as:

whered i ,b i andê i are least square estimates for i th state. T i and C i be the total number of tests conducted at i-th state and total number of confirmed cases at i-th state respectively. T and C be the total number of tests conducted in UP state and total number of cases occured in UP state respectively. σ d , σ b and σ e are standard errors of least square estimates of d, b and e. Parameter inference was done via the Gibbs sampling algorithm. We use Watanabe-Akaike information criterion (WAIC) and R 2 to compare models. These Bayesian analyses were done using the bmrs [2] R package. The prediction on daily cases for UP state was done by using the results of Bayesian non-linear regression model for cumulative cases.

In Figure 1 , we plotted the bar graphs for daily confirmed cases, daily deceased cases and daily recovered cases, and the scatter plot for the cumulative confirmed cases, cumulative deceased cases, and cumulative recovered cases of COVID-19 for some 4 All rights reserved. No reuse allowed without permission.

perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in

The copyright holder for this this version posted August 31, 2020. . states of India. Figure 1 shows that there are fluctuations in the daily confirmed cases, daily deceased cases, and the daily recovered cases of COVID-19 for all the considered states. Tables 1, 2 and 3 represent the least square estimates for the parameters of the logistic and Gompertz models with respect to considered variables. We used the regression coefficient R 2 to analyze the fitting ability of the models. The value of R 2 closer to one indicates a better prediction. Tables 1 and 2 show that the upper asymptote value is found to be greater through the Gompertz model as compared to those obtained using the logistic model. From Table 3 , we observe that except Andhra Pradesh (AP), Gompertz models provided the greater value of asymptote than the logistic model. We see that there is a difference between the values of time inflection for both the models. The value of R 2 indicates that both the models fit the data well with respect to all the considered variables. From the graphs and tables, we see that Delhi is in the controlled stage of the pandemic. Delhi crossed the inflection point of the curve and reached it's a plateau, and there are fewer chances of fluctuation in the upper asymptote, which explains the low-value standard error of the estimate of the asymptote. From the COVID-19 updates of India, we observe that in Delhi, the death rate is about 2 %, and the recovered percentage is more than 90 %. The possible reasons behind this achievement may be the vast screening of COVID-19, effective containment, patients treatment, and contact tracing. Maharastra has the highest number of cumulative confirmed cases and the cumulative deceased cases (as of till 06 August 2020). Except Delhi the COVID-19 pandemic is not in the plateau for other considered states, so there is much uncertainty regarding the asymptote of the curve, which describes the high values of the standard error of the estimate of the asymptote. AIC values from Tables 1 and 3 showed that the logistic model was the best model for the data of the majority of the states. Table 2 showed that the logistic model was the best model only to the data of Andhra Pradesh and Karnataka, and for the rest of the states, the Gompertz model was found be best.

Our aim is not to analyze COVID-19 in all the states which have been considered. The aim of this paper is the analysis and prediction of the cumulative confirmed cases, cumulative deceased cases, and the cumulative recovered cases in Uttar Pradesh, India. However, we explored the situation of COVID-19 in other states of India to extract the prior information for the Bayesian non-linear regression models. Figures 2, 3 and  4 show the cumulative and the daily number of confirmed cases, deceased cases, and recovered cases of COVID-19 in Uttar Pradesh respectively and the fitted curve by the Baysian non-linear regression model using the prior information. In these figures, we see that due to the lack of randomness in cumulative confirmed cases, cumulative deceased cases, and cumulative recovered cases, the sum of the square residual is quite low.

Tables 4, 5, and 6 represent the Bayesian estimates, standard error of the estimates, and R 2 for the logistic and Gompertz models fitted to the cumulative confirmed cases, cumulative deceased cases and cumulative recovered cases of COVID-19 in UP respectively. Also, Watanabe Akaike information criterion (WAIC) is computed from the fitting of Bayesian Gompertz and logistic models to the data of the cumulative confirmed cases, cumulative deceased cases, and cumulative recovered cases of COVID-19 in UP, India. We used the prior distributions based on the least square estimates from the data of other considered states.

Based on the WAIC criterion, we can say that the COVID-19 pandemic curve of UP is quite closer to the Maharastra curve. This result does not conclude that the prevalence rate of both the state is equal, or they have equal cumulative confirmed cases, cumulative deceased cases, and cumulative recovered cases. The meaning of this 5 All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted August 31, 2020. . perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted August 31, 2020. . perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in

The copyright holder for this this version posted August 31, 2020. . https://doi.org/10.1101/2020.08.25.20180265 doi: medRxiv preprint In Figures 5, 6 and 7 , we plotted the predicted curve by Bayesian non-linear regression model using prior information and the best fitted model, i.e., logistic model for UP state. The prediction results show that the model can predict the pandemic situation of COVID-19 very well for the considered cumulative and daily variables. Prediction is not well for the daily decreased number of cases, the possible explanation for this perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted August 31, 2020. . https://doi.org/10.1101/2020.08.25.20180265 doi: medRxiv preprint may be that the factors affecting the death rate are different from that of cumulative confirmed cases and cumulative recovered cases and hence, there is uncertainty in the daily number of deceased cases.

According to the fitting analysis of the existing data, the logistic model with the Maharashtra prior may be the best among the considered models studied in this paper. In Tables 7 and 8 , we provided the predicted values of the daily and the cumulative number of confirmed cases, deceased cases and recovered cases of UP state. We presented the actual and predicted values up to 6 August 2020, and after that, only predicted values are given up to 25 September 2020. Here due to space constraints, we presented only five days interval and recent out of sample values at the daily level. From the tables, we conclude that the model predicts the values well to a great extent. Table 7 shows that the predicted daily confirmed cases on 25 September 2020 will be 12589, the predicted daily decreased number will be 40 on the same date, and the predicted daily recovered cases will be 9057 on the same date. Table 8 shows that the predicted cumulative confirmed cases on 25 September 2020 will be 548710, the predicted cumulative deceased number will be 4245 on the same date, and the predicted cumulative recovered cases will be 328662 on the same date.

The predicted maximum cumulative number of confirmed cases, deceased cases, and recovered cases by all the considered models will be: 357576-4749919, 1889-11916, and 72836-1908931 respectively as per the current trend. The daily number of confirmed, deceased, and recovered cases will be maximum in between 50-155 days, 24-112 days and 36-145 days from 16 June 2020 respectively. In Table 9 , we provided the predicted maximum cumulative number of confirmed cases, deceased cases and recovered cases and the predicted date when theses cases reach to the maximum using the best-fitted model in UP state. Results show that the maximum number of cumulative number of confirmed cases will be 1157335 on 3 June 2021. The maximum number of cumulative deceased will reach to the 5843 on 28 March 2021 and the maximum number of recovered cases will be 1145829 on 1 July 2021. The daily number of confirmed, deceased and recovered cases will be maximum at 104th day, 73rd day and 124th day from 16 June 2020 using the best-fitted model. The results of the best fitted model also show that the COVID-19 will be over probably by early-June, 2021. We conclude that the proposed method can be useful, and we believe this study can provide some valuable information to strengthen the implementation of strategies to increase the health system capacity and also help the public health authorities to make the relevant decision.

A review of the 2019 Novel Coronavirus (COVID-19) based on current evidence

Advanced Bayesian multilevel modeling with the R package brms

Applied regression analysis

Bayesian Data Analysis

COVID-19: Forecasting short term hospital needs in France

COVID-19 pandemic in India: what lies ahead

Data-based analysis, modelling and forecasting of the COVID-19 outbreak

Effective containment explains subexponential growth in recent confirmed COVID-19 cases in China

Hospital beds (per 1,000 people). (2020)

India becomes third worst affected country by coronavirus, overtakes Russia

Modeling and forecasting the COVID-19 pandemic in India

Modelling the epidemiological trends and behavior of COVID19 in Italy

Modeling and forecasting trend of covid-19 epidemic in iran

Nonlinear regression modelling

Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study

Package drc. Creative Commons

Prediction and analysis of COVID-19 positive cases using deep learning models: A descriptive case study of India

SEIR and Regression Model based COVID-19 outbreak predictions in India

Time Series Analysis and Forecast of the COVID-19 Pandemic in India using Genetic Programming

Coronavirus disease 2019 (COVID-19) situation report -51

Worldometer-real time world statistics

Deceased