key: cord-0787588-i637iv7d authors: S.T., Pavan Kumar; Lahiri, Biswajit; Alvarado, Rafael title: Multiple change point estimation of trends in Covid-19 infections and deaths in India as compared with WHO regions date: 2021-09-03 journal: Spat Stat DOI: 10.1016/j.spasta.2021.100538 sha: 6cb26967b5856ee586bc87a12f93f7560a38fb57 doc_id: 787588 cord_uid: i637iv7d

The present study aims to estimate multiple change points (MCP) in the time series of COVID-19 confirmed cases and deaths, and the trends within the estimated change points, in India as compared with the WHO regions. The data were summarized using descriptive statistical measures, and the change points were estimated with the E-divisive procedure. The trend within the estimated change points was then tested using Sen's slope and the Mann-Kendall test. India, along with the African, American, and South-East Asia regions, experienced a significant surge in fresh cases up to the 5th change point. Among the WHO regions, the American region was the worst hit by the pandemic in terms of both fresh cases and deaths. The European region experienced an early negative trend in fresh cases during the 3rd and 4th change points, but the situation reversed by the 5th (7th July 2020) and 6th (6th August 2020) change points. The trend of deaths in India and the South-East Asia Region was similar, and global deaths had a negative trend from the 4th (17th May 2020) change point onwards. The change points were estimated at a prefixed significance level α < 0.002. Trends in infections and deaths were positive and significant for India and the SEARO region across change points. In the other WHO regions, infection was significant at intervals of 30 days, and any delay in the infections was due to interventions.
The European region is expected to experience a second wave, with infections trending upward during the 5th and 6th change points, although the trend at the two earlier change points was significantly negative. The study highlights the efficacy of change point analysis in understanding the dynamics of Covid-19 cases in India and across the world, which in turn helps in developing effective public health strategies.

A decade after the outbreaks of SARS (Severe Acute Respiratory Syndrome) and of the Middle East Respiratory Syndrome coronavirus (MERS-CoV), a highly pathogenic virus in Middle Eastern countries [12, 70], the world has been facing the serious problem of the novel coronavirus disease, widely known as Covid-19. The novel coronavirus (2019-nCoV), or Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), is of zoonotic origin and was first noticed in Wuhan City, Hubei Province, China [23, 39, 71, 72]. It spread rapidly across the world, became a public health emergency of international concern in January 2020, and was declared a pandemic by the World Health Organization on 11th March 2020. Coronaviruses can cause sickness in both human beings and animals. Patients with coronavirus disease show symptoms such as dry cough, fever, fatigue, sore throat, and loss of the senses of taste and smell. About 80 per cent of people recover without hospital treatment, as per the WHO Report (2020) [64]. As of 4th March 2021, 105 million cases of Covid-19 and 2.19 million deaths had been reported globally, and India had reported 20.7 million cases and 0.263 million deaths. After the first confirmed case of COVID-19 infection was reported in Kerala on 30th January 2020, India became the most affected country in Asia by June 2020. Six major cities (Mumbai, Delhi, Ahmedabad, Chennai, Pune, and Kolkata) account for about half of all reported cases in the country [42, 52]. Thus, an effective model to track and predict the course of the epidemic is the need of the hour.
In the American region, 38.7 million cases were reported, with [...] 34%), the South-East Asia Region (SEARO: 8%), the Eastern Mediterranean Region (EMRO: 6%), the African Region (AFRO: 3%), and the West Pacific Region (WPRO: 1%) [64].

Many researchers around the world have already studied various aspects of Covid-19. Many mathematical models have been explored to contribute to the understanding of COVID-19 [6, 30, 34, 45]. Real-time, data-driven models that track and predict the course of the epidemic have been found effective in developing strategies for the public health and economic crisis [14, 24, 49, 52, 53]. The ARIMA (Auto-Regressive Integrated Moving Average) model has been widely used for modelling and forecasting time series data in COVID-19 studies. It explains a given time series in terms of its own past values (its lags) and the lagged forecast errors, so that the fitted equation can be used to forecast future values [5, 10, 27, 56].

The models most frequently used to study the evolution of an epidemic over time are the SIR (Susceptible, Infectious, Recovered) model and its variants [33, 44]. Despite its usefulness in investigating the time evolution of an epidemic, the SIR model hardly addresses the spatial distribution of the infection. In this regard, an epidemiological model that incorporates a fractal structure allows a more detailed description of the observed data on the virus in terms of geographical distribution [1]. Studies of fractional Susceptible-Exposed-Infectious-Removed (SEIR) models established the local stability of equilibrium points under treatment, some sufficient conditions, and the uniform asymptotic stability of equilibria under a general incidence rate [50, 69]. The modified SEIQRP model, synthesized from the generalized fractional-order SEIR model, successfully captured the development process of COVID-19 and provides an important reference for understanding the trend of the outbreak in the USA [67].
An epidemic spreading model on a network, built on concepts from percolation theory, was used to describe the effects of lockdowns within a population. The network model performed well with constant parameters, whereas the SIR model needed more time-dependent parameters to achieve similar fitting accuracy [8]. Deep learning-based models were also used to predict the number of new death cases (DC) and to model mathematically the potential effect of the coronavirus in the fifteen most affected countries of the world. The model used simple linkage functions and provided highly reliable results for the time series prediction of COVID-19 in these countries [54]. Forecasting and time series analysis of the COVID-19 pandemic data in Turkey indicated that the daily numbers of deaths and cases were expected to decrease in the short term [35]. Similarly, regression models for death cases in Iran showed an increasing trend, but with some evidence of turning. The infection rate and the population density have a polynomial relationship, which signifies the importance of estimating multiple change points and trends in a pandemic situation [48].

The nonparametric Pettitt test [47] is an effective method for identifying a change in the temporal trend of a time series. It has been used extensively because of its sensitivity to breaks in the middle of temporal records [18, 26, 28, 31, 40, 60]. Change point analysis deciphers a complex pattern of change. It is a nonparametric technique and hence does not depend on any distributional assumptions; for this reason, non-parametric methods are used more widely than parametric ones. A change point method mainly detects the point at which a significant change has occurred in time-ordered observations [9].
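As an illustration of the single change point idea behind the Pettitt test [47], the following is a minimal sketch in Python with NumPy. The function name `pettitt_test` and the toy series are hypothetical, and the p-value uses the usual large-sample approximation rather than an exact computation; it is not the code used in any of the cited studies.

```python
import numpy as np

def pettitt_test(x):
    """Pettitt test for a single shift in a series (illustrative sketch).

    Returns (t, K, p): t is the number of observations before the most
    likely break, K = max_t |U_t|, and p is the approximate two-sided
    p-value 2 * exp(-6 K^2 / (n^3 + n^2)).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # s[i, j] = sgn(x_i - x_j) for every pair of observations
    s = np.sign(x[:, None] - x[None, :])
    # U_t = sum_{i <= t} sum_{j > t} sgn(x_i - x_j), for t = 1, ..., n-1
    u = np.array([s[:t, t:].sum() for t in range(1, n)])
    idx = int(np.argmax(np.abs(u)))
    k = float(np.abs(u[idx]))
    p = min(1.0, 2.0 * float(np.exp(-6.0 * k**2 / (n**3 + n**2))))
    return idx + 1, k, p

# Toy series (hypothetical data): a level shift after the 20th observation.
series = np.concatenate([np.zeros(20), np.ones(20)])
t_hat, k_stat, p_value = pettitt_test(series)
```

Because the statistic is built only from the signs of pairwise differences, it makes no distributional assumption, but it reports at most one break; the multiple change point setting considered in this paper needs a procedure that can be applied repeatedly.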
Change point analysis is applied in a wide variety of fields such as financial modelling [57], bioinformatics [43, 58], engineering, climatology [38], neuroscience [36], and other fields of science. The technique has also been applied to estimate change points in high-dimensional time series data [4] and in the presence of outliers [15]. In the experimental and mathematical sciences, the so-called retrospective AMOC (at most one change) change point problem arises in areas such as epidemiology and quality control, while classical change point problems are considered in various bio-statistical and engineering applications [2]. The multiple change point approach for multivariate data using the e-divisive method estimates the change points hierarchically and tests the statistical significance of each estimated change point [9]. This method has often shown consistency owing to its estimation procedure. In epidemiology, change points have been estimated to identify the outbreak of certain diseases [17], and it helps to make [...] [13, 21, 25, 29, 46, 68].

In this paper, we propose to estimate the multiple change points, and the trends within the estimated change points, of Covid-19 cases and deaths in India as compared with the WHO regions. The results of this study could serve as a yardstick for understanding the directional change of infection in India and in the WHO regions. Detecting single or multiple change points alone may not bring clarity; the dynamics within the change points are extremely important. Such data-oriented approaches help researchers to investigate further the reasons for the varying trends of infection and death due to Covid-19.

The World Health Organization (WHO), as a global organization, manages and maintains a wide range of data related to global public health and wellbeing.
The WHO created a separate dashboard to display its real-time global COVID-19 database and also provides a downloadable database for researchers worldwide. This database comprises records published by the ministries of various countries across the globe. The present study is based on secondary data on the daily fresh cases and deaths of COVID-19, collected from the WHO website [64]. The time series for the WHO regions covers 4th January 2020 to 9th September 2020 (250 days); for India, the data on new cases and deaths cover 30th January 2020 (the date the first case was detected, in Kerala state) to 9th September 2020 (224 days). The data were summarized using descriptive measures.

The change points in this method are estimated with a data-splitting approach. The first step in detecting a change point is to divide the data series into two candidate phases, $X_{1:t}$ and $X_{t+1:n}$, for which the characteristic functions differ maximally. Following [9], this difference is measured by the scaled sample divergence

$$\hat{Q}(X_{1:t}, X_{t+1:n}; \alpha) = \frac{t(n-t)}{n}\left[\frac{2}{t(n-t)}\sum_{i=1}^{t}\sum_{j=t+1}^{n}\|X_i - X_j\|^{\alpha} - \binom{t}{2}^{-1}\sum_{1\le i<k\le t}\|X_i - X_k\|^{\alpha} - \binom{n-t}{2}^{-1}\sum_{t<j<k\le n}\|X_j - X_k\|^{\alpha}\right],$$

where $\|\cdot\|$ denotes the Euclidean distance, $n$ denotes the size of the time series, and $\alpha \in (0, 2)$ is the moment index of [9]. The optimum change point is the value of $t$ that maximizes $\hat{Q}$.

The second step is to test, through a permutation test, whether the estimated change point, that is, the point that maximizes $\hat{Q}$, is significant at the pre-specified significance level α < 0.002. The test is conducted by generating R permuted time series, each obtained by randomly changing the time order of the sequence. In the third step, once the change point obtained in the first step is found significant, the series is divided further to find any additional change points. The resulting points are hierarchically structured.
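The first two steps described above (choose the split that maximizes the divergence, then assess it by permutation) can be sketched as follows. This is a minimal illustration for a univariate series under stated assumptions: the function names, the `min_size` guard, and the add-one p-value convention are hypothetical choices made here, and the sketch follows the simplified two-phase description in the text rather than reproducing the R implementation used by the authors.

```python
import numpy as np

def scaled_divergence(x, t, alpha=1.0):
    """Scaled sample divergence between the phases x[:t] and x[t:]."""
    a, b = x[:t], x[t:]
    m, k = a.size, b.size
    between = np.abs(a[:, None] - b[None, :]) ** alpha
    within_a = np.abs(a[:, None] - a[None, :]) ** alpha
    within_b = np.abs(b[:, None] - b[None, :]) ** alpha
    # Each within-phase matrix counts every unordered pair twice and has a
    # zero diagonal, so sum / (m * (m - 1)) is the mean over unordered pairs.
    e = (2.0 * between.mean()
         - within_a.sum() / (m * (m - 1))
         - within_b.sum() / (k * (k - 1)))
    return m * k / (m + k) * e

def best_split(x, min_size=5, alpha=1.0):
    """Return (t, Q): the split size t that maximizes the divergence."""
    x = np.asarray(x, dtype=float)
    candidates = range(min_size, x.size - min_size + 1)
    q = [scaled_divergence(x, t, alpha) for t in candidates]
    i = int(np.argmax(q))
    return candidates[i], float(q[i])

def permutation_pvalue(x, R=499, min_size=5, alpha=1.0, seed=0):
    """Permutation p-value for the best split (add-one convention assumed)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    _, q_obs = best_split(x, min_size, alpha)
    exceed = 0
    for _ in range(R):
        # Shuffling the time order destroys any change point structure.
        _, q_perm = best_split(rng.permutation(x), min_size, alpha)
        exceed += q_perm >= q_obs
    return (exceed + 1) / (R + 1)

# Toy series (hypothetical data): the mean shifts after 20 observations.
toy = np.concatenate([np.zeros(20), np.full(20, 5.0)])
t_hat, q_hat = best_split(toy)
p_hat = permutation_pvalue(toy, R=99)
```

If the split is judged significant, the same two functions would be applied again to each resulting phase, which is what gives the estimated change points their hierarchical structure.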
The bisection continues until the optimal set of change points is obtained for the data series, that is, until no further division of the data into phases yields a significant change point. The e-divisive method is consistent in its estimation of change points [9]. The data were analyzed in R [51], using a package for non-parametric multiple change point analysis of multivariate data.

The trend within the estimated change points was tested with the Mann-Kendall test. The test statistic is given by

$$S = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\operatorname{sgn}(x_j - x_i),$$

where $x_j$ and $x_i$ are sequential data values, $n$ is the number of observations, and $\operatorname{sgn}(\cdot)$ takes the value 1, 0, or -1 according to the sign of its argument. The variance of $S$ is given as

$$\operatorname{Var}(S) = \frac{1}{18}\left[n(n-1)(2n+5) - \sum_{t} f_t(f_t-1)(2f_t+5)\right],$$

where $t$ varies over the set of tied ranks and $f_t$ is the frequency with which rank $t$ appears. The test statistic is

$$Z = \begin{cases} (S-1)/se, & S > 0 \\ 0, & S = 0 \\ (S+1)/se, & S < 0 \end{cases}$$

where $se = \sqrt{\operatorname{Var}(S)}$ is the standard deviation of $S$; if there is no monotonic trend, $Z$ is approximately standard normal. The significance of the trend is tested at the 5% (α = 0.05) level.

The magnitude of the trend is estimated by Sen's slope Q [55], which is based on the median of the pairwise slopes

$$Q_i = \frac{x_j - x_k}{j - k}, \qquad j > k, \quad i = 1, \dots, N.$$

With the $Q_i$ sorted in ascending order, the estimator is given by

$$\beta = \begin{cases} Q_{(N+1)/2}, & \text{where } N \text{ is odd} \\ \tfrac{1}{2}\left(Q_{N/2} + Q_{(N+2)/2}\right), & \text{where } N \text{ is even} \end{cases}$$

where $N$ represents the length of the sample of slopes and $\beta$ is the slope estimator. A positive Q value indicates an upward trend and a negative value a downward trend. The magnitude of the trend is tested at the 5% (α = 0.05) significance level.

Descriptive statistics of the Covid-19 data, presented in Tables 1 and 2, revealed that on average 20,000 fresh cases and 330 deaths were reported daily in India up to 9th September. The major portion of fresh cases and deaths in the SEARO region was reported from India. Total death cases in India exceeded the total average deaths from the EURO and AFRO regions together. No significant peak was observed in fresh cases in India; rather, fresh cases continued to increase steadily.
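For reference, the Mann-Kendall test and Sen's slope used in this study for the trend within each pair of change points can be computed with the Python standard library alone. The sketch below is illustrative: the function names and the toy series are hypothetical, the p-value relies on the normal approximation, and it is not the authors' R code.

```python
import math
from collections import Counter
from statistics import median

def mann_kendall(x):
    """Return (S, Z, p) for the Mann-Kendall trend test with tie correction."""
    n = len(x)
    # S sums the signs of all forward differences x_j - x_i with j > i.
    s = sum((x[j] > x[i]) - (x[j] < x[i])
            for i in range(n - 1) for j in range(i + 1, n))
    # Groups of size 1 contribute nothing, so untied values are harmless here.
    tie_term = sum(f * (f - 1) * (2 * f + 5) for f in Counter(x).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    se = math.sqrt(var_s)
    if s > 0:
        z = (s - 1) / se
    elif s < 0:
        z = (s + 1) / se
    else:
        z = 0.0
    # Two-sided p-value under the standard normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p

def sens_slope(x):
    """Median of all pairwise slopes (x_j - x_i) / (j - i) with j > i."""
    n = len(x)
    slopes = [(x[j] - x[i]) / (j - i)
              for i in range(n - 1) for j in range(i + 1, n)]
    # statistics.median averages the two middle values when the count is
    # even, which matches the odd and even cases of the estimator.
    return median(slopes)

# Toy segment (hypothetical data): a strictly increasing series.
segment = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
s_stat, z_stat, p_value = mann_kendall(segment)
slope = sens_slope(segment)
```

In a workflow like the one described here, both functions would be applied separately to the observations lying between consecutive estimated change points, with the sign of the slope giving the direction of the trend in that phase.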
The daily average of Covid-19 confirmed cases was estimated at 110,010 across the world, and the major portion of confirmed cases globally came from the AMRO region (more than 50 per cent of confirmed cases), followed by the SEARO, EURO, EMRO, AFRO, and WPRO regions. [...] EURO, and AFRO regions were highly inconsistent, and the WPRO region reported a peak number of death cases in a single day, 100 days after the COVID-19 outbreak (Table 2).

Results from Tables 3 and 4 reveal that a total of 7 change points were estimated for the AFRO and WPRO regions, whereas 6 change points were estimated for the remaining regions and for India. A significance level of α < 0.002 was used for the estimation of each change point, with R = 499 iterations for the permutation tests. The e-divisive procedure was executed with the moment index set to α = 1. For India and the world, the first significant change point with respect to a change in mean (µ) was estimated on the 31st day after the very first case was reported (Fig 1, 2 and S4 Fig.). Significant change points were estimated at 30-day intervals for India and the WPRO region up to 121 days of Covid-19, at a significance level of α < 0.002. After that, India took approximately 50 days (8th July 2020) to reach the next significant change point, and it was clear that the spread of the virus narrowed its gap in India (S1 Table). The number of daily fresh cases in India declined between the 5th and 6th change points, while fresh cases around the world continued in a 30-day cycle. The number of fresh deaths in India declined from the 4th change point onwards (delayed by a week between the 4th and 5th change points). Around the globe, fresh cases declined from the 3rd change point onwards. The gap between the change points continued to widen until the 5th change point.
Table note: values in parentheses are p-values; T = 250 for WHO regions, T = 224 for India; Alpha = 1.

In the case of AMRO (S1 Fig.), EMRO (S2 Fig.), EURO (S3 Fig.), and SEARO (S5 Fig.) [...] Similarly, in the AFRO region (S6 Fig.) the spread of the virus was delayed between the 4th and 5th change points (April-May 2020), but after this point it fell into a 30-day cycle. The results indicate that the spread of virus infection was significant at 30-day intervals for India and most of the regions. The trend of fresh cases and deaths for the SEARO region (which includes India) was on the higher side compared with other regions (Fig. 3 & 4). The trend of death cases was slightly slower than that of daily new confirmed cases in India (Fig. 5 & 6) and across the WHO regions (Fig. S7) as well. The change points estimated on 12th July 2020 for EMRO and 9th August 2020 for SEARO were non-significant (S2 Table).

New death cases due to Covid-19 followed a pattern similar to that of new confirmed cases across regions (Fig 7, S8-S13 Fig.). The death cases were delayed between the 4th and 5th change points, where it took approximately 44 days on average; otherwise, the death cycle was at 30-day intervals. In the AMRO region (S8 Fig.), the first significant change point was observed 93 days (p < 0.002) after the first death was reported in the region, but later changes were found at 30 days (Table 4 & S2 Table).

[...] deaths. The rate of increase was similar to that of fresh cases. The rate decreased by 5 per cent between the 5th and 6th change points, and it was observed that the spread of the virus widened its gap during the 5th and 6th change points. For world death cases, the trend was positive and significant up to the 3rd change point; later, the trend was non-significant (4th and 5th change points). The growth rate was positive during the 1st and 2nd change points and negative from the 3rd change point onwards.
The negative rate was due to fewer death cases being reported from the WHO regions, i.e., AMRO, EURO, EMRO, and WPRO. [...] Table) [...] consequently for 2 change points. In the WPRO region, death cases decreased after 31 days until the 5th change point. Later, the trend was positive during the 6th change point but non-significant. The death cases reported across the world were on a decreasing trend from the 83rd day (26th March) onward, but the trend was not significant (S4 Table).

The e-divisive method estimates the change points in a hierarchical fashion with measures [16]. The nationwide lockdown in India was announced in March [61], and its effect was observed after 30 days for new confirmed cases and after 60 days for death cases. The trend of fresh cases and deaths was increasing in India compared with other regions. The trend was almost similar to that of the SEARO region for both fresh cases and deaths, but the trend of fresh cases had a wider range (Fig 3 & 4). The trend of new cases has a wider range than that of deaths in India (Fig. 5). The trend of India's Covid-19 infections was increasing at all the estimated change points, while world cases were on a declining trend during the 6th change point; the trend was significant in both cases. The box-and-whisker plots revealed that the trend of fresh cases in India surpassed the world's trend, i.e., fresh infections were significantly recorded daily in India [26]. The early lockdown process yielded a significant downtrend of fresh cases in the rest of the world during the last change point. Increased testing and better medical facilities in the developed countries also resulted in reduced infections in the world cases [41, 59, 62, 63, 65]. The trend of fresh deaths was increasing at all the estimated change points for India.
The world deaths, by contrast, declined from the 4th change point (4th May 2020) onwards and showed a negative trend during the 5th change point. There was a huge improvement in the trend of fresh deaths across the world over the initial lockdown; countries were able to contain the number of deaths by imposing strict Covid protocols [19, 37, 66]. Many countries in the WHO regions started to unlock in phases during May-June 2020 to boost the economy and provide relief to people in the lower economic stratum, those who had lost jobs, and the unemployed population [20, 22]. Lifting the restrictions on various public activities resulted in a surge in cases and narrowed the infection gap between the 5th and 6th change points across the WHO regions. Better health infrastructure and public attitude towards health warnings helped reduce infections and death cases in most of the world regions. Any changes in fresh cases or deaths could be inferred as the result of interventions and mitigation strategies [11].

The COVID-19 pandemic emerged as a big challenge for human civilization.

[1] Fractal signatures of the COVID-19 spread
[2] Change point trend analysis of GNI per capita in selected European countries and Israel
[3] Prediction and analysis of COVID-19 positive cases using deep learning models: A descriptive case study of India. Chaos, Solitons and Fractals
[4] Simultaneous multiple change-point and factor analysis for high-dimensional time series
[5] Application of the ARIMA model on the COVID-2019 epidemic dataset. Data in Brief
[6] Dynamic analysis of a mathematical model with health care capacity for COVID-19 pandemic
[7] Time series forecasting of COVID-19 transmission in Canada using LSTM networks
[8] Spreading of infections on random graphs: A percolation-type model for COVID-19
[9] A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data
[10] Forecasting of COVID-19 Confirmed Cases in Different Countries with ARIMA Models. medRxiv
[12] From SARS to COVID-19: A previously unknown SARS-related coronavirus (SARS-CoV-2) of pandemic potential infecting humans - Call for a One Health approach. One Health
[13] Time series analysis of COVID-19 infection curve: A change-point perspective
[14] Analysis and forecast of COVID-19 spreading in China, Italy and France
[15] Changepoint Detection in the Presence of Outliers
[16] Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe
[17] Outbreak definition by change point analysis: a tool for public health decision
[18] Changes in stream flow and sediment discharge and the response to human activities in the middle reaches of the Yellow River. Hydrology and Earth System Sciences Discussions
[19] Spread and dynamics of the COVID-19 epidemic in Italy: effects of emergency containment measures
[20] COVID-19 pandemic and challenges for socio-economic issues, healthcare and National Health Programs in India
[21] Projections for COVID-19 pandemic in India and effect of temperature and humidity. Diabetes and Metabolic Syndrome
[22] How the Coronavirus Lockdown Impacts the Impoverished in India
[23] Clinical Characteristics of Coronavirus Disease 2019 in China
[24] Spread of SARS-CoV-2 in the Icelandic population
[25] Assessment of temporal trend of Covid-19 outbreak in India
[26] Estimating the Impact of Daily Weather on the Temporal Pattern of COVID-19 Outbreak in India. Earth Systems and Environment
[27] Trend Analysis and Forecasting of COVID-19 outbreak in India
[28] Assessing homogeneity and climate variability of temperature and precipitation series in the capitals of north-eastern Brazil
[29] Statistical procedures for evaluating trends in coronavirus disease-19 cases in the United States
[30] Mathematical modeling of the spread of the coronavirus disease 2019 (COVID-19) considering its particular characteristics: The case of China. Communications in Nonlinear Science and Numerical Simulation
[31] Statistical analysis for change detection and trend assessment in climatological parameters
[32] Rank Correlation Methods
[33] A contribution to the mathematical theory of epidemics
[34] Modeling the dynamics of novel coronavirus (2019-ncov) with fractional derivative
[35] A close look at 2019 novel coronavirus (COVID 19) infections in Turkey using time series analysis and efficiency analysis
[36] Single and Multiple Change Point Detection in Spike Trains: Comparison of Different CUSUM
[37] First-wave COVID-19 transmissibility and severity in China outside Hubei after control measures, and second-wave scenario planning: a modelling impact assessment
[38] Multiple change point detection via genetic algorithms
[39] A conceptual model for the coronavirus disease 2019 (COVID-19) outbreak in Wuhan, China with individual reaction and governmental action
[40] A simulation study to examine the sensitivity of the Pettitt test to detect abrupt changes in mean
[41] On rapid spread of Covid-19 variant. 6th
[42] Ministry of Health and Family Welfare, Government of India. www.mohfw.gov.in [Retrieved on 21st
[43] Efficient change point detection for genomic sequences of continuous measurements
[44] Mathematical biology. An Introduction. 3ed
[45] Mathematical modeling of COVID-19 transmission dynamics with a case study of Wuhan
[46] Data-driven estimation of change points reveal correlation between facemask use and accelerated curtailing of the COVID-19 epidemic in Italy
[47] A non-parametric approach to the change point problem
[48] Spatial modeling, risk mapping, change detection, and outbreak trend analysis of coronavirus (COVID-19) in Iran (days between
[49] Evaluation and prediction of COVID-19 in India: A case study of worst hit states
[50] Analysis of a fractional SEIR model with treatment
[51] RStudio: Integrated Development for
[52] A data driven epidemic model to analyse the lockdown effect and predict the course of COVID-19 progress in India
[53] COVID-19 epidemic in Switzerland: on the importance of testing, contact tracing and isolation
[54] Evolutionary modelling of the COVID-19 pandemic in fifteen most affected countries
[55] Estimates of the regression coefficient based on Kendall's Tau
[56] Temporal relationship between outbound traffic from Wuhan and the 2019 coronavirus disease (COVID-19) incidence in China. medRxiv
[57] Long Memory in Economics
[58] Dual multiple change-point model leads to more accurate recombination detection
[59] Situation report; World Health Organization: Geneva, Switzerland
[60] Homogeneity of 20th century European daily temperature and precipitation series
[61] Covid-19 lockdown in India
[62] World Health Organization, weekly update on Corona Virus Disease (COVID-19)
[63] World Health Organization, weekly update on Corona Virus Disease (COVID-19)
[64] WHO Coronavirus Disease (COVID-19) Dashboard
[65] World Health Organization, WHO Western Pacific Regional Action Plan for Response to Large-Scale Community Outbreaks of
[66] Evaluation of lockdown impact on SARS-CoV-2 dynamics through viral genome quantification in Paris wastewaters
[67] Forecast analysis of the epidemics trend of COVID-19 in the USA by a generalized fractional-order SEIR model
[68] Spatio-temporal patterns of the 2019-nCov epidemic at the country level in Hubei Province
[69] Stability of a fractional order SEIR model with general incidence
[70] Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia
[71] A novel coronavirus from patients with pneumonia in China
[72] Coronavirus disease 2019 (COVID-19): a perspective from China

Authors are thankful to the data management teams of the WHO for the quality database. Authors are also thankful to the Central Agricultural University, Imphal, Manipur, India, for providing the necessary infrastructural support for conducting the study.

The authors declare that they have no known competing interest.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Ethical approval is not required for the study because it does not include any human subject.