key: cord-133468-fkwtgq69 authors: Hariharan, Ramya title: When to Relax Social Distancing Measures? An ARIMA Based Forecasting Study date: 2020-10-15 journal: nan DOI: nan sha: doc_id: 133468 cord_uid: fkwtgq69

The spread of the novel coronavirus across various countries is wide and rapid. The number of confirmed cases and the reproduction number are some of the epidemiological parameters utilized in scientific studies for the analysis and prediction of the viral transmission. The positive rate, an indicator of the extent of testing in the population, aids in understanding the severity of the infection in a given geographic location. The positive rate for selected countries has been considered in this study to construct ARIMA based statistical models. The goodness of fit of the models is verified by the investigation of residuals, the Box-Ljung test and the forecast error values. The positive rates forecasted by the ARIMA models are utilized to investigate the scope for relaxing social distancing measures in some countries and the necessity to tighten the rules further in others.

In the first two decades of the 21st century, the re-emergence of infectious diseases has been on the rise. Outbreaks of viruses such as Ebola, Zika and H1N1, predominantly of zoonotic origin, have resurfaced frequently in the past few years [1, 2]. Recently, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a novel virus belonging to the family of human coronaviruses, originated in the city of Wuhan, China in December 2019 [3, 4]. The virus was transmitted from bats to humans through an intermediate host. It has a higher infectivity than the influenza viruses, as manifested by its high reproduction number. Owing to the severity of the respiratory illness, the World Health Organization (WHO) declared the SARS-CoV-2 outbreak a pandemic in March 2020.
The virus has rapidly spread across the globe and almost all countries have been affected. As on 01 September 2020, as many as 213 countries around the world have reported a total of 21,095,532 confirmed cases of SARS-CoV-2. The global death toll has reached 757,779. All the affected countries are battling to reduce the transmission by proclaiming stringent guidelines on safety precautions, social distancing, lockdown, home/institutional quarantine and travel restrictions [5]. In spite of these measures, the number of confirmed cases in most of the countries is on the rise. In particular, highly populated countries such as Brazil, USA and India have emerged as epicenters of the infectious disease [6, 7].

Numerous research groups have utilized the epidemic data to understand the trend and trajectory of the disease. Most of the studies have been conducted by analyzing the country-wise or city-wise data on the number of confirmed cases, recovery rates and mortality rates [8, 9]. Algorithms based on machine learning, deep learning and artificial neural networks have been utilized to forecast the transmission of the disease [10-12]. However, the re-emergence of SARS-CoV-2 cases in countries such as New Zealand, Spain, Germany and Iran has raised speculation on the possibility of a second epidemic wave. Testing for infection, which aids in the effective isolation of infected persons and in tracing their contacts, is very important to overcome the viral dissemination [13]. Testing the population also helps in efficiently utilizing the medical resources, which are under heavy demand in the pandemic situation.

In this study, an epidemiological parameter, namely the positive rate, has been utilized to frame guidelines pertaining to social distancing measures. The positive rate is an indicator of the level of testing with respect to the extent of the outbreak.
The time series data collected from some of the worst affected countries of the world has been used to build Auto-Regressive Integrated Moving Average (ARIMA) models. The countries are chosen so as to assess concerns such as the increase in SARS-CoV-2 cases as a function of population, economy, testing rate and travel regulations. The study provides a new perspective on the current pandemic situation and also provides insight into using the available resources efficiently.

The data on positive rate has been collected from the open source database of Our World in Data [14]. As on the first week of September 2020, USA, Russia, South Africa, India, Mexico and Spain are some of the countries badly affected by the virus. Therefore, the master dataset has been filtered to obtain the positive rates for these countries. The data was collected from 1 April to 12 September 2020. The country-based positive rates are shown in Figure 1, which also highlights the countries selected in this study. The data was checked for missing values, which were approximated by the corresponding monthly average. The as-collected data showed a time series behavior. In order to build a suitable model to forecast the trend in the positive rate, the following steps are performed on the collected data: test for stationarity, identification of parameters, estimation, evaluation of model performance and forecasting.

Among the statistical models, the most powerful and robust procedure is the method established by Box and Jenkins [15]. The ARIMA model, which is a combination of the autoregressive (AR) and moving average (MA) models, has been used in this study to analyze and forecast the time series data [16]. The "Arima" function in the "forecast" package of R programming (version 3.4.2) was used to build the model. In addition, packages such as "timeseries", "Metrics" and "ggplot2" were used for forecasting, statistical analysis and visualization respectively.
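The missing-value treatment described above (replacing a gap by the average of the same month) can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the R code used in the study; the function name `fill_with_monthly_mean`, the dictionary layout and the sample values are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Optional, Tuple


def fill_with_monthly_mean(series: Dict[date, Optional[float]]) -> Dict[date, float]:
    """Replace missing (None) values by the mean of the observed values
    recorded in the same calendar month (hypothetical helper)."""
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)

    # First pass: accumulate the observed values per (year, month).
    for day, value in series.items():
        if value is not None:
            key = (day.year, day.month)
            sums[key] += value
            counts[key] += 1

    # Second pass: fill each gap with its month's mean.
    filled: Dict[date, float] = {}
    for day, value in series.items():
        if value is None:
            key = (day.year, day.month)
            if counts[key] == 0:
                raise ValueError(f"no observed values in {key[0]}-{key[1]:02d}")
            value = sums[key] / counts[key]
        filled[day] = value
    return filled


# Made-up positive rates (in %) with one missing day.
sample = {
    date(2020, 4, 1): 4.0,
    date(2020, 4, 2): None,
    date(2020, 4, 3): 6.0,
}
filled_sample = fill_with_monthly_mean(sample)
print(filled_sample[date(2020, 4, 2)])  # 5.0
```

Averaging within the month keeps the filled value close to the local level of the series, which matters for a quantity that drifts over several months as the positive rate does.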
The positive rate values for the chosen countries, namely USA, Russia, South Africa, India, Mexico and Spain, are converted into their corresponding time-series plots. Depending on the nature of the time-series data, suitable ARIMA models are built for each of the countries. Initially, the unit root test is conducted to check the time series for non-stationarity. In order to apply the statistical theories, it is mandatory for the time series to be stationary. One of the popular tests to assess the stationarity of data is the Augmented Dickey-Fuller (ADF) test [17]. The probability of significance, the P-value, should be < 0.05 to confirm that the time series is stationary. It is observed from Table 1 that the data from all the chosen countries lacked stationarity, as manifested by their P-values > 0.05.

A non-seasonal ARIMA model is generally represented by the parameters (p,d,q). The primary step in fitting an ARIMA model is the determination of the order of differencing, 'd', necessary to stationarize the series [18]. The country-wise order of differencing required to make the positive rate series stationary is shown in Table 1. The ADF test has been repeated on the differenced data series, which showed P-values < 0.01, thereby confirming its stationarity.

Table 1.

Subsequently, the ARIMA models are built using the least-squares estimation process. The accuracy of the ARIMA models is diagnosed using Akaike's Information Criterion (AIC) and the Schwarz Bayesian Information Criterion (BIC). AIC, shown in Equation 1, is a widely used measure of a statistical model that quantifies its goodness of fit and parsimony. A good model is identified as the one which has the minimum AIC among all the candidate models [19]. The models are selected based on this criterion and the corresponding AIC values are listed in Table 1. In Equation 1, L is the likelihood value, N is the number of measurements recorded and k is the number of estimated parameters.
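For reference, the standard textbook forms of the two criteria, written in the symbols defined above, are sketched below. That these match the exact expressions used in this study is an assumption; software packages sometimes report variants, for example with a small-sample correction.

```latex
\begin{align}
  \mathrm{AIC} &= -2\ln L + 2k \\
  \mathrm{BIC} &= -2\ln L + k\ln N
\end{align}
```

In both expressions the first term rewards goodness of fit and the second penalizes the number of estimated parameters, so the model with the lowest value is preferred. For N of 8 or more, ln N exceeds 2 and BIC penalizes additional parameters more heavily than AIC.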
However, the outliers are also retained in this study to build a realistic model. Similarly, the estimated autocorrelation coefficients (ACF) of the residuals pertaining to the various models are shown in Figure 3. It is evident that for all the models, the autocorrelations at the lags shown in Figure 3 lie well within the confidence interval. It is also noticed from Table 2 that the ACF values are statistically insignificant, which implies that the residuals are random and confirms the lack of noticeable correlation in the residual series [20]. The histograms of the residuals for the chosen countries are shown in Figure 4. The histograms of all the ARIMA models show a predominantly normal distribution of the residuals. They also confirm the lack of significant variance. The mean values are also noted to be near zero. This observation is confirmed by the low mean error (ME) values compiled in Table 2. The investigation of the residuals confirms that the built ARIMA models have a good fit with the actual values. The ARIMA models are also verified by the Box-Ljung test, which provides statistical evidence of a good fit [18]. The recorded P-values for all the ARIMA models are tabulated in Table 2.

Figure 5 shows the country-wise 30-day forecasts of the positive rate with a confidence interval of 80%. Figure 5a shows a steady reduction in the positive rate for USA from 6% to 4.8% in the forthcoming month. The forecast indicates that the current testing rate in USA is adequate to isolate the infected population. WHO has provided guidelines to observe the positive rate for 14 days before taking a decision on relaxing social distancing measures. It has also recommended a positive rate of 5% or lower as a metric to evaluate the viral spread and to frame government policies [21].
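The criterion described above (a positive rate of 5% or lower, observed over 14 days) can be applied to a series of daily values as in the following minimal sketch. It is written in Python rather than the R environment of the study, and it assumes that the 14 days are the most recent consecutive days of the series; the function names and the sample values are hypothetical and are not forecasts from this study.

```python
from typing import Sequence


def trailing_days_at_or_below(rates: Sequence[float], threshold: float = 5.0) -> int:
    """Count how many of the most recent consecutive daily values are
    at or below the threshold (values in percent)."""
    count = 0
    for rate in reversed(rates):
        if rate > threshold:
            break
        count += 1
    return count


def meets_positivity_criterion(
    rates: Sequence[float], threshold: float = 5.0, window: int = 14
) -> bool:
    """True if the last `window` days all have a positive rate <= threshold."""
    return trailing_days_at_or_below(rates, threshold) >= window


# Made-up daily positive rates (in %): three days above 5%, then 14 days below.
hypothetical_rates = [6.0, 5.6, 5.2] + [4.9] * 14
print(meets_positivity_criterion(hypothetical_rates))  # True
```

Counting only the trailing run means that a single day above the threshold restarts the observation period, which is the stricter of the possible readings of the 14-day guideline.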
Based on the forecasted positive rate, USA would comfortably meet the criterion in the forthcoming months, and the social distancing policies could be relaxed to allow people to interact and move. Figure 5f shows the forecasted positive rates for Spain. The values show a moderate increase from 9.09% to 15.14%. The positive rate is higher than the WHO prescribed limit and the country should increase the testing rate by at least 30-40% to reduce the positive rate to below 5%. In the case of Spain this measure is critical because it is a European Union country which is well connected by land and other means of transport with most of the European countries. Hence the government should not only increase the testing rate, but also implement stringent rules until the positive rate falls below 5%.

This study provides an insight into the current and forecasted trend in the positive rate in the selected countries. Countries such as USA and South Africa, similar to Australia and South Korea, are on the path to attaining low positive rates [22]. Hence, the guidelines introduced to mitigate the transmission of SARS-CoV-2, such as restrictions on the movement of people and social distancing, could be relaxed in a month in these countries. However, countries such as India, Mexico and Spain should adopt more precautions to protect their population by increasing the testing rates. The social distancing rules should also be strictly enforced in these countries for at least a few more months until the positive rate is < 5%. Countries such as Russia should be more proactive in maintaining their low positive rates in order to avoid the outbreak of a second wave of the pandemic. This knowledge is important to properly assess the key decisions to be implemented on social distancing and lockdown policies.
These measures are essential because the SARS-CoV-2 pandemic holds the potential to spread more widely and quickly, which can have a devastating impact not only on the economy of the affected country but also on the global economy.

In this study, the positive rate for different countries is utilized to build ARIMA based forecasting models. The models have been carefully built by testing for stationarity, appropriately choosing the key parameters and validating their accuracy. The verification of the models is conducted by observing the residuals and from the results of the Box-Ljung test. The forecasted values for USA and South Africa showed that in 30 days, the positive rate would be consistently below 5%. This would pave the way to relax the existing stringent measures adopted by the governments to reduce the viral transmission. However, in countries such as India, Mexico and Spain, the positive rate is beyond the safe limit of about 5%. In India, even though the positive rate is not inherently high, precautionary measures such as active testing would prevent a steep increase in the positive rate. In Mexico, the current testing rates are highly insufficient and need to be increased by at least 60-70%. In Spain, the testing rate should be consciously increased by 30-40% in order to prevent the cross-country spread of the pathogen.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

The author declares that they have no conflict of interest.
[1] Estimation of the reproductive number of the Spanish flu epidemic in
[2] Zaim, Zika; a continuous global threat to public health
[3] Coronavirus pandemic: A predictive analysis of the peak outbreak epidemic in South Africa, Turkey, and Brazil
[4] A new study of unreported cases of 2019-nCOV epidemic outbreaks
[5] COVID-19 outbreak: Migration, effects on society, global environment and prevention
[6] Rising Home Values and Covid-19 Case Rates in Massachusetts
[7] A sharp increase in the number of COVID-19 cases and case fatality rates after lifting the lockdown in Kurdistan region of Iraq
[8] Time series modelling to forecast the confirmed and recovered cases of COVID-19
[9] Predictive model and risk factors for case fatality of COVID-19: a cohort of 21
[10] Predicting the growth and trend of COVID-19 pandemic using machine learning and cloud computing
[11] Partial derivative Nonlinear Global Pandemic Machine Learning prediction of COVID 19
[12] Deep learning methods for forecasting COVID-19 time-Series data: A Comparative study
[13] Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts
[14] Our World in data, Coronavirus (COVID-19) Testing - Statistics and Research - Our World in Data
[15] Time series analysis of aerosol optical depth over New Delhi using Box-Jenkins ARIMA modeling approach
[16] Chapter 14 - Time Series: Understanding Changes Over Time
[17] Distribution of the Estimators for Autoregressive Time Series With a Unit Root
[18] Study of ARIMA and least square support vector machine (LS-SVM) models for the prediction of SARS-CoV-2 confirmed cases in the most affected countries
[19] Trend analysis and ARIMA modelling of pre-monsoon rainfall data for western India
[20] Research on COVID-19 based on ARIMA modelΔ - Taking Hubei, China as an example to see the epidemic in Italy
[21] Public health criteria to adjust public health and social measures in the context of COVID-19, WHO
[22] Pathological findings of COVID-19 associated with acute respiratory distress syndrome

The author is thankful to the Our World in Data organization for the valuable data. The author profoundly thanks the Management and the Principal of B.M.S. College of Engineering, Bangalore for their support.