key: cord-0839142-n82362ly
authors: Kumar, Jitendra; Agiwal, Varun; Yau, Chun Yip
title: Study of the trend pattern of COVID-19 using spline-based time series model: a Bayesian paradigm
date: 2021-06-07
journal: Jpn J Stat Data Sci
DOI: 10.1007/s42081-021-00127-x
sha: 7dc5b8cda2704196aa62e4fa71f58aadb8338b66
doc_id: 839142
cord_uid: n82362ly

A vast majority of countries are facing economic and health crises due to the current epidemic of coronavirus disease 2019 (COVID-19). The present study analyzes COVID-19 using time series, an essential tool for understanding the growth of infection and its changing behavior, especially through a trend model. We consider an autoregressive model with a non-linear time trend component that is approximated by linear trends using a spline function. The spline function splits the COVID-19 series into piecewise segments between successive knots, corresponding to the various growth stages, and fits a linear time trend within each segment. First, we obtain the number of knots and their locations in the COVID-19 series to identify the transmission stages of COVID-19 infection. Then, the model parameters are estimated under the Bayesian setup for the best-fitted model. The results suggest that the proposed model appropriately determines the location of knots based on the different transmission stages and indicates the current transmission situation of the COVID-19 pandemic in a country.

The 2019 novel coronavirus is receiving a great deal of attention, because it is a new pandemic disease that affects every part of the world. Millions of people have died from this disease, and millions of confirmed cases have been recorded worldwide because of the absence of antiviral drugs and vaccines. Researchers have developed various methodologies to analyze and control the spread of COVID-19 and to predict future coronavirus cases. Jiang et al.
(2020) established a time series and kinetic model for infectious diseases and produced short-term predictions of the trend of COVID-19 transmission. Al-Rousan and AL-Najjar (2020) analyzed the effect of factors such as sex, region, infection reasons, and birth year on recovered and deceased cases in South Korea. They found that sex, region, and infection reasons affected both recovered and deceased cases, while birth year affected only deceased cases. Gondauri et al. (2020) considered the chain-binomial type Bailey's model for studying the correlation between the total volume of COVID-19 virus spread and recovery in different countries.

Most studies investigate the growth of COVID-19 cases using regression and time series models, because these models are frequently applied to examine the development or trend of a disease. Under the COVID-19 pandemic, the series of daily recorded cases has non-linear characteristics such as shifting behavior and non-stationarity, and shows a non-linear trend pattern as a country enters different stages of transmission. This non-linear trend may be cast as a piecewise time series model with a high-order polynomial in time. The spline function is an alternative way to deal with such a time trend polynomial in piecewise form. It handles period-wise discontinuity by fitting high-order polynomials and joining them at knots. Knots are the points at which the series suddenly rises or falls, and the result is a piecewise smooth function of time. Eubank (1999) observed that the smoothest piecewise polynomial is a spline function that retains a segmented nature, while Hurley et al. (2006) described splines as continuous and smooth line or curve functions. Morton et al.
(2009) considered a smoothing spline function to analyze the trend in generalized additive models with correlated errors and applied it to data from a chemical process and to stream salinity measurements. Montoril et al. (2014) studied spline estimation of the functional-coefficient regression model with autoregressive errors and showed the convergence rates of the proposed estimator. Qiao et al. (2015) used a B-spline model to examine the persistence of changes in a frequency signal over time. Conrad et al. (2017) modeled forced expiratory volume 1 (FEV1) data from cystic fibrosis (CF) and chronic obstructive pulmonary disease (COPD) patients using median regression splines. Osmani et al. (2019) used B-spline and kernel methods to estimate the model coefficients and applied them to data on psoriasis patients.

In this paper, we study the trend pattern of the COVID-19 series using an autoregressive model with a trend approximated by a linear spline function. The number of knots and their locations are identified using the Bayes factor and the posterior probability, respectively. We use appropriate priors for the model parameters to determine the posterior distribution and derive the conditional posterior distributions for making inferences about the parameters. We apply the Metropolis-Hastings (M-H) algorithm within the Gibbs sampler to generate posterior samples from the conditional posterior distributions and obtain Bayesian estimates of the unknown parameters. The number and location of knots within a country explain the stages of transmission and the time points at which they are reached. Thus, this study gives an overview of the present trend of daily recorded COVID-19 cases and identifies the current transmission stage of the most affected countries.

A time series model is commonly used to describe the trend pattern of the coronavirus series.
COVID-19 is transmitted mainly through person-to-person contact and passes through four stages: stage-1 (imported cases), stage-2 (local transmission), stage-3 (community transmission), and stage-4 (transmission out of control), according to the infection trend. The trend/growth pattern of the daily recorded COVID-19 cases is not always linear, because a country passes through different transmission stages. The trend therefore takes a non-linear form, as the growth rate of COVID-19 infected cases shows considerable variability. Thus, a non-linear model is more suitable than a linear process for studying the trend scenario. The present paper expresses the non-linear trend using the spline function. A spline function consists of piecewise polynomial segments joined together at knots, which are placed at shifts or quantal jumps. A knot is a common joining point that occurs where the pattern of the series changes between intervals. Such changes are also seen in the COVID-19 series across transmission stages, because the infection rate within a country depends upon the various steps taken by the government and administration. Thus, knots are useful for determining which stage of transmission a country has reached. For example, the GDP of India went down during demonetization. The pattern of the GDP series therefore differs across periods, such as before, during, and after demonetization. Hence, there is a need to determine the exact location of the points (knots) where the structure of the series changes. The selection of knot locations is discussed in detail in Denison (1998), Biller (2000), and Ülker and Arslan (2009). Recently, this model was discussed by Kumar et al. (2020) for testing the unit root hypothesis in the presence of a spline function through the posterior odds ratio, with an application to the monthly import series of ASEAN Regional Forum (ARF) countries. Complete details of the model are given in Kumar et al. (2020). Here, we only write the key expression of the model.
Let {y_t : t = 1, 2, …, T} be a time series from the model given in Eq. (1) [equation not recovered from the source]. In Eq. (1), ρ is the autoregressive coefficient, δ_0 is the intercept term, δ is the trend coefficient, r is the number of knots, located at t_1, t_2, …, t_r, and ψ_i is the coefficient of the ith knot. The ε_t's are i.i.d. normally distributed random variables with mean zero and unknown variance τ^{−1}, and s_i(t) is a spline function of linear polynomial form [equation not recovered from the source].

In the existing literature, researchers model the COVID-19 series using various regression and time series models but ignore the irregular behavior of daily confirmed cases that arises because most countries take steps to control the spread of COVID-19. These steps make the growth of COVID-19 infected patients rise and fall. As a result, the trend pattern is not linear, and sudden jumps occur in the series at different stages of transmission. Thus, there is a need for non-linear piecewise models that provide better results across the various transmission stages. The proposed model is well suited to analyzing the non-linear trend pattern of the COVID-19 series, because it splits the series into linear pieces at the knot locations so that infection growth can be observed through the stages of transmission (Liu, 2009; Lusa & Ahlin, 2020). In matrix notation, the model is written as Eq. (2) [equation and accompanying definitions not recovered from the source], where L is a T × T matrix whose elements in the (i + 1)th row and ith column are equal to 1 and whose remaining elements are equal to 0, and I is the T × T identity matrix.

The main objective is to study the tendency of daily COVID-19 confirmed cases by fitting this model in piecewise form and to understand the spread of infection in terms of the different transmission stages. For this, the number of knots and their locations are determined using the posterior probability for each country's COVID-19 series to identify the transmission stages.
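As a hedged illustration of the piecewise-linear trend described above, the following Python sketch builds a linear spline basis and simulates an autoregressive series around it. The display forms of Eq. (1) and of s_i(t) are not available in this text, so the sketch assumes the common truncated-linear basis s_i(t) = max(t − t_i, 0) and the recursion y_t = δ_0 + δ t + Σ ψ_i s_i(t) + ρ y_{t−1} + ε_t. Both forms, the function names, and all numeric values are assumptions for illustration, not the authors' exact specification.

```python
import numpy as np

def linear_spline_basis(T, knots):
    """T x r matrix whose i-th column is max(t - t_i, 0) for t = 1..T.
    The truncated-linear form is an assumption, not taken from the paper."""
    t = np.arange(1, T + 1, dtype=float)
    knots = np.asarray(knots, dtype=float)
    return np.maximum(t[:, None] - knots[None, :], 0.0)

def piecewise_trend(T, delta0, delta, psi, knots):
    """Intercept + linear trend + slope changes psi_i that start at each knot."""
    t = np.arange(1, T + 1, dtype=float)
    basis = linear_spline_basis(T, knots)
    return delta0 + delta * t + basis @ np.asarray(psi, dtype=float)

def simulate_series(T, rho, delta0, delta, psi, knots, tau, seed=0):
    """Simulate y_t = trend_t + rho * y_{t-1} + eps_t with eps_t ~ N(0, 1/tau).
    The recursion and the starting value y_0 = 0 are assumed for illustration."""
    rng = np.random.default_rng(seed)
    trend = piecewise_trend(T, delta0, delta, psi, knots)
    eps = rng.normal(0.0, 1.0 / np.sqrt(tau), size=T)
    y = np.empty(T)
    prev = 0.0
    for i in range(T):
        y[i] = trend[i] + rho * prev + eps[i]
        prev = y[i]
    return y

# Hypothetical series with two knots (r = 2) at t = 40 and t = 80.
y = simulate_series(T=120, rho=0.5, delta0=10.0, delta=2.0,
                    psi=[3.0, -4.0], knots=[40, 80], tau=0.25)
```

Under this assumed basis, each ψ_i is the change in the slope of the trend after knot t_i, so the slope of the trend on the segment after the ith knot is δ + ψ_1 + … + ψ_i.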
The countries studied are the USA, India, Brazil, Russia, South Africa, and Peru. Then, Bayesian estimators of the model parameters are derived from the conditional posterior distributions, and the estimates are obtained using numerical techniques.

A Bayesian approach is used to make inferences about the unknown parameters. In the Bayesian approach, the posterior distribution is proportional to the product of the likelihood function and the prior distribution. Here, a discrete uniform prior is assumed for the location of knots over all ordered sub-sequences of (2, 3, …, T) of length r, i.e., each admissible vector of knot locations (t_1, …, t_r) has the same prior probability for given r. The remaining model parameters (ρ, γ, ψ, τ) are given prior distributions similar to those described in Kumar et al. (2020). The form of the prior distribution of the parameters is [equation not recovered from the source]. Then, the posterior specification for this model is [equation not recovered from the source].

For parameter estimation under the Bayesian setup, a loss function is used to select the estimator that minimizes the posterior risk associated with each parameter. Here, we consider the squared error loss function (SELF), a symmetric loss function. Under this loss function, the Bayesian estimator of a parameter is the mean of its posterior distribution. Because the posterior mean involves multiple integrals, it cannot be evaluated without a computational method. Therefore, a Markov chain Monte Carlo (MCMC) technique is applied to obtain the values of the estimators. For that, we derive the conditional posterior distributions/probabilities for the model parameters [equations not recovered from the source]. The conditional posteriors of the location of knots and of the autoregressive coefficient do not have closed distributional forms. Hence, the M-H algorithm is applied to draw samples from these conditional posterior distributions. In contrast, posterior samples of the remaining parameters are generated with the Gibbs sampler, because their conditional posterior distributions are available in closed form.
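To make the sampling mechanism concrete, the sketch below shows one way a random-walk M-H update for the autoregressive coefficient could look, followed by the posterior mean that serves as the estimator under SELF. The conditional posterior used here is an assumption: it takes the recursion y_t = trend_t + ρ y_{t−1} + ε_t with a flat prior on (−1, 1), which is not the authors' prior and is used only to illustrate the accept/reject step. In a full sampler this update would sit inside a Gibbs cycle that also draws the remaining parameters; the function names and tuning values are hypothetical.

```python
import numpy as np

def log_cond_post_rho(rho, y, trend, tau):
    """Log conditional posterior of rho, up to a constant, under the ASSUMED
    recursion y_t = trend_t + rho * y_{t-1} + eps_t and a flat prior on (-1, 1)."""
    if not -1.0 < rho < 1.0:
        return -np.inf
    resid = y[1:] - trend[1:] - rho * y[:-1]
    return -0.5 * tau * np.sum(resid ** 2)

def metropolis_rho(y, trend, tau, n_iter=4000, step=0.05, rho0=0.0, seed=0):
    """Random-walk Metropolis draws for rho with the other parameters held fixed."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_iter)
    rho = rho0
    logp = log_cond_post_rho(rho, y, trend, tau)
    for k in range(n_iter):
        proposal = rho + step * rng.normal()
        logp_prop = log_cond_post_rho(proposal, y, trend, tau)
        # Accept with probability min(1, posterior ratio); the proposal is symmetric.
        if np.log(rng.uniform()) < logp_prop - logp:
            rho, logp = proposal, logp_prop
        draws[k] = rho
    return draws

# Hypothetical check on a trend-free AR(1) series generated with rho = 0.6.
rng = np.random.default_rng(1)
T = 500
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + rng.normal()
draws = metropolis_rho(y, trend=np.zeros(T), tau=1.0)
rho_hat = draws[1000:].mean()  # posterior mean after burn-in (estimator under SELF)
```

The burn-in length, proposal step, and number of iterations are arbitrary choices for this toy series and would need tuning on real data.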
The step-by-step procedure for implementing the proposed method is as follows:
Step 1. At the first iteration, start with initial values of the parameters ρ^(0), γ^(0), ψ^(0), and τ^(0).
Step 2. Generate posterior samples from the conditional posterior distributions of the model parameters given in Eqs. (5)-(8), and substitute the sampled values into the conditional posterior distribution of the location of a knot.
Step 3. Detect the ith knot location by considering every time point in the interval (t_{i−1} + 1, T − r + i − 1) as a candidate knot point and recording the probabilities corresponding to these time points.
Step 4. Take the ith knot location to be the time point with the maximum of these probabilities, i.e., t̂_i = arg max P(t_i | y, r).
Step 5. This gives a vector of knot points at the first iteration, t̂^(1) = (t̂_1, t̂_2, …, t̂_r).
Step 6. Repeat the process for k iterations to build a sequence of parameter values and knot locations. Their averages are the estimated values of the parameters and the knot locations.

The number of knots is determined using the Bayes factor. The Bayes factor (BF_{n,m}) compares one model/hypothesis against another; here it is defined as the posterior probability of n knots divided by that of m knots. For this model, BF_{n,m} is expressed as [equation not recovered from the source]. The procedure starts from a series with no knot and evaluates the evidence in support of one or more knots. If there is significant evidence for the existence of knots, we then check whether there is one knot, two knots, and so on. We therefore look for strong evidence between the models/hypotheses before deciding on the number of knots. Kass and Raftery (1995) provided a rule of thumb for interpreting the magnitude of the Bayes factor using the transformation 2 log_e(BF_{n,m}), as defined in Table 1, which is on the same scale as the likelihood ratio test statistic.
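The sequential search in Steps 3 to 5 can be sketched as follows. The conditional posterior probability of a knot location is not reproduced in this text, so the sketch takes the scoring function as an argument. The example stand-in scores a candidate by the negative residual sum of squares of a least-squares piecewise-linear fit, which is a swapped-in criterion and not the authors' posterior probability. The interval endpoints are treated as inclusive, which is also an assumption.

```python
import numpy as np

def sequential_knot_search(score, T, r):
    """Pick r knots one at a time. The i-th knot maximizes score(t, chosen) over
    t in [t_{i-1} + 1, T - r + i - 1], with the first interval starting at 2."""
    knots = []
    lower = 2
    for i in range(1, r + 1):
        upper = T - r + i - 1
        candidates = range(lower, upper + 1)
        values = [score(t, knots) for t in candidates]
        best = candidates[int(np.argmax(values))]
        knots.append(best)
        lower = best + 1  # the next knot must lie strictly after this one
    return knots

def neg_rss_score(y):
    """Stand-in score (NOT the paper's posterior probability): negative residual
    sum of squares of a linear-spline least-squares fit with the given knots."""
    T = len(y)
    t = np.arange(1, T + 1, dtype=float)

    def score(candidate, chosen):
        cols = [np.ones(T), t] + [np.maximum(t - k, 0.0) for k in chosen + [candidate]]
        X = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return -np.sum((y - X @ beta) ** 2)

    return score

# Hypothetical noiseless series whose slope changes once, at t = 30.
t = np.arange(1, 61, dtype=float)
y_toy = 1.0 + 2.0 * t + 3.0 * np.maximum(t - 30.0, 0.0)
found = sequential_knot_search(neg_rss_score(y_toy), T=60, r=1)
```

Because each knot is chosen conditionally on the ones before it, the search is greedy. With r > 1 it is not guaranteed to recover the jointly best set of knots under the stand-in score.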
Another approach is to determine the number of knots using an information criterion, as discussed by Kumar et al. (2020).

We collect COVID-19 data from the World Health Organization's official daily situation reports (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports). These reports give the daily total number of people infected by the virus in every country. We model the series of the most affected countries (USA, India, Brazil, Russia, South Africa, and Peru) and determine the growth structure and current stage of transmission by fitting the proposed model. For each selected country, the study period starts from the point at which 500 coronavirus cases were recorded and runs up to 1 September 2020. We notice that confirmed cases are rising rapidly in some countries, namely the USA, India, Brazil, and Peru. In contrast, the spread of the coronavirus has slowed in the remaining countries (Russia and South Africa).

Based on the proposed methodology, the number of knots is determined using the Bayes factor to identify the transmission stage of a country, and the results are recorded in Table 2. From Table 2, we observe that the proposed model with r = 3 knots is superior to the models with fewer knots for the series of the USA, Brazil, and India, because the corresponding value of 2 log_e(BF_{n,m}) is between 2 and 6 (positive evidence). This indicates evidence in favor of r = 3 against r = 0, 1, and 2 for the observed series. We conclude that these countries have reached the third stage of coronavirus transmission, i.e., community transmission. For Russia, South Africa, and Peru, the model with r = 2 knots is selected, because strong and positive evidence in favor of two knots is recorded relative to r = 0, 1, and 3 knots. For these countries, at most two knots are needed to fit the proposed model, as these countries contained the COVID-19 confirmed cases within the second stage of transmission.
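The reading of 2 log_e(BF_{n,m}) used in the comparison above can be sketched as a small lookup. The category labels follow the Kass and Raftery (1995) rule of thumb. How values exactly on a boundary (2, 6, or 10) are assigned, the handling of negative values, and the function names are assumptions made for this illustration.

```python
import math

def two_log_bf(post_prob_n, post_prob_m):
    """2 * log_e(BF_{n,m}), with BF_{n,m} taken as the ratio of the posterior
    probability of n knots to that of m knots."""
    return 2.0 * math.log(post_prob_n / post_prob_m)

def interpret_two_log_bf(value):
    """Evidence against the denominator model (r = m) on the Kass-Raftery scale.
    The treatment of boundary values is an assumption of this sketch."""
    if value < 0:
        return "Favors the denominator model"
    if value < 2:
        return "Not worth more than a bare mention"
    if value < 6:
        return "Positive"
    if value <= 10:
        return "Strong"
    return "Very strong"

# Hypothetical posterior probabilities for models with n and m knots.
value = two_log_bf(0.6, 0.2)
label = interpret_two_log_bf(value)
```

A negative value means the evidence points toward the denominator model, in which case the comparison can be read in the other direction by swapping n and m.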
Hence, we observe that the first knot occurs in the early days of transmission of the coronavirus, because most cases are then reported on the basis of travel history from affected countries, whereas the second and third knots are associated with local and community transmission within the country, respectively. Hence, the number of knots indicates the stage of COVID-19 transmission for a particular country.

Table 2 Number of knots in the COVID-19 series determined using the Bayes factor

Once the suitable number of knots is determined, the locations of the knots are found from the conditional posterior probability given in Eq. (4). The values of the conditional posterior probability are plotted against the observed series of confirmed cases and displayed in Fig. 1. The knot locations are shown by vertical dashed lines at the points where the maximum probability is recorded in a particular interval. Based on Fig. 1, the first knot is selected by considering every time point in the interval (2, T − r) as a candidate knot location and recording the probabilities corresponding to these time points. The first knot point t_1 is the time point with the maximum of these probabilities. Next, the second knot location, denoted t_2, is determined in the same way as the time point with the maximum probability in the interval (t_1 + 1, T − r + 1). Similarly, the (i + 1)th knot location is the time point with the maximum probability in the range (t_i + 1, T − r + i). The locations of the knots, based on Fig. 1, are displayed in Table 3. Table 3 shows that in most countries the first knot occurs during the period of lockdown, suspension of travel, restrictions on economic activities, etc. (Mbunge, 2020; Pai et al., 2020; Soni, 2021; Tang et al., 2020).
During this period, most cases are reported on the basis of the travel history of infected persons who traveled from affected nations and imported the virus into the country. The second knot occurs in mid-June, when the number of daily COVID-19 cases increases rapidly for India, Brazil, and the USA (da Candido Saha & Chouhan, 2021; Zhang et al., 2020). At this time, the government granted some relaxations: shortening the daily lockdown hours, permitting economic activities, and reopening public offices under certain guidelines. Therefore, the COVID-19 virus may have infected nearby people who had direct contact with an infected person. At this stage, it is still manageable to trace infected persons, provide them with prompt medical care, and control the spread of the virus. The situation is under control in Russia, South Africa, and Peru (Garba et al., 2020; Stiegler & Bouchard, 2020; Zemtsov & Baburin, 2020). The third knot is located in the period when the number of confirmed cases is increasing rapidly (more than 50,000 per day) for the series of India, Brazil, and the USA (Lin et al., 2020; Ray & Subramanian, 2020). In this period, community transmission occurs in clusters at multiple locations and is not easily traceable.

Based on the best-fitted model with the appropriate knot locations, the Bayesian estimates of the model parameters for each country's series are summarized in Table 4. Table 4 shows that variability is high in every country's series, because all of them record a large number of COVID-19 confirmed cases. The series of all countries are stationary, because the estimated value of the autoregressive coefficient (ρ) satisfies the stationarity condition. A positive (negative) value of the intercept term (δ_0) indicates an increase (decrease) in the daily confirmed cases when the other variables are absent.
A positive (negative) value of the trend coefficient (δ) gives the increase (decrease) in confirmed cases expected for a unit change in time (t). The daily confirmed cases show an increasing (decreasing) time trend pattern when the estimated value of the spline coefficient (ψ_i) is positive (negative). Based on the estimated values of the parameters, the observed and fitted series are plotted in Fig. 2. From Fig. 2, we observe that the estimated model fits the observed series closely.

The COVID-19 pandemic is a severe challenge to human survival. COVID-19 has wide-ranging consequences for human life worldwide, because millions of people have died due to the coronavirus. Therefore, there is a need to study the growth of COVID-19 cases using predictive models. The literature also reports that transmission of the COVID-19 virus has four stages. Based on daily recorded cases, we observe that the structure of the COVID-19 series in various countries is not linear, because many factors, such as lockdown, infection modes, and poor health infrastructure, act to control or expand the disease. Thus, our paper deals with a non-linear time series model that uses the spline function to convert the non-linear trend component into piecewise linear trends. The series is analyzed in segments, and the linear trend autoregressive model is fitted to each segment. Each segment corresponds to a stage of COVID-19 transmission. The parameter estimates and the number of knots are determined under the Bayesian approach. The results show that the number of knots and their locations are useful for assessing the transmission stage and the time at which it was reached. Hence, the proposed methodology readily analyzes the non-linear trend of the COVID-19 series using the spline function.
Data analysis of coronavirus COVID-19 epidemic in South Korea based on recovered and death cases
Adaptive Bayesian regression splines in semiparametric generalized linear models
Median regression spline modeling of longitudinal FEV1 measurements in cystic fibrosis (CF) and chronic obstructive pulmonary disease (COPD) patients
Evolution and epidemic spread of SARS-CoV-2 in Brazil
Automatic Bayesian curve fitting
Nonparametric regression and spline smoothing
Modeling the transmission dynamics of the COVID-19 pandemic in South Africa
Research on COVID-19 virus spreading statistics based on the examples of the cases from different countries
An evaluation of splines in linear regression
Statistical analysis on COVID-19
Bayes factors
Bayesian unit root test for AR(1) model with trend approximated by linear spline function
The spatiotemporal estimation of the risk and the international transmission of COVID-19: A global perspective
Non-linear time series modeling using spline-based nonparametric models
Restricted cubic splines for modelling periodic data
Effects of COVID-19 in South African health system and society: An explanatory study. Diabetes and Metabolic Syndrome
Spline estimation of functional coefficient regression models for time series with correlated errors
Smoothing splines for trend estimation and prediction in time series
Kernel and regression spline smoothing techniques to estimate coefficient in rates model and its application in psoriasis
Investigating the dynamics of COVID-19 pandemic in India under lockdown
The application of cubic B-spline collocation method in impact force identification
India's lockdown: An interim report
Lockdown and unlock for COVID-19 and its impact on residential mobility in India: An analysis of the COVID-19 Community Mobility Reports
Effects of COVID-19 lockdown phases in India: An atmospheric perspective. Environment, Development and Sustainability
South Africa: Challenges and successes of the COVID-19 lockdown
Epidemiology of COVID-19 in Brazil: Using a mathematical model to estimate the outbreak peak and temporal evolution
Automatic knot adjustment using an artificial immune system for B-spline curve approximation
COVID-19: Spatial dynamics and diffusion factors across Russian regions

We are thankful to Mila Marinkovic, Preprints Editor of Preprints.org, for offering an effective pre-publication vehicle and for considering this manuscript as a preprint. The authors are thankful to the Editor-in-Chief and the anonymous referees for providing useful comments on an earlier version of this manuscript.

Availability of data and materials: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports and https://ourworldindata.org/coronavirus-source-data.

Code availability: R code is provided for the reviewers as needed.

The authors declare that they have no conflict of interest.

Numerator (r = n) against denominator model (r = m)    2 log_e(B_{n,m})    Evidence against model with r = m
USA
r = 1 against r = 0                                     0.9998              Not worth more than a bare mention
r = 2 against r = 0                                     1.1162              Not worth more than a bare mention
r = 2 against r = 1                                     0.1164              Not worth more than a bare mention
r = 2 against r = 3                                     1.0334              Not worth more than a bare mention

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.