key: cord-0771357-3686wv7v
authors: Fang, Lanlan; Wang, Dingjian; Pan, Guixia
title: Analysis and Estimation of COVID-19 Spreading in Russia Based on ARIMA Model
date: 2020-10-09
journal: SN Compr Clin Med
DOI: 10.1007/s42399-020-00555-y
sha: aeee9e6c7fc858a7224451cc2014ee88125eaa7b
doc_id: 771357
cord_uid: 3686wv7v

Russia is currently a "hard-hit" area of the COVID-19 outbreak, with more than 396,000 confirmed cases as of May 30. It is necessary to analyze and predict its epidemic situation to help formulate effective public health policies. Autoregressive integrated moving average (ARIMA) models were developed to predict the cumulative confirmed, dead, and recovered cases, respectively. R 3.6.2 software was used to fit the data from January 31 to May 20, 2020, and to predict the data for the next 30 days. The COVID-19 epidemic in Russia was divided into two stages and reached its peak in May. The epidemic began to stabilize on May 19. The case fatality rate has been at an extremely low level. ARIMA (2,2,1), ARIMA (3,2,0), and ARIMA (0,2,1) were the models of cumulative confirmed, dead, and recovered cases, respectively. After testing, the mean absolute percentage errors (MAPE) of the three models were 0.6, 3.9, and 2.4, respectively. This paper indicates that Russia's health system capacity can effectively respond to the COVID-19 pandemic. The three ARIMA models fit the data well and can be used for short-term prediction of the COVID-19 trend, providing a theoretical basis for Russia to formulate new intervention policies.

Since January 2020, the disease caused by a new coronavirus from Wuhan, China, named coronavirus disease 2019 (COVID-19), has been spreading around the world. The COVID-19 pandemic is the most serious global crisis since World War II, affecting almost all countries on our planet [1]. On March 11, WHO publicly declared COVID-19 a "global pandemic."
As of May 30, 2020, it had infected more than 6 million people and killed more than 367,000 (https://www.who.int/docs/default-source/coronaviruse/situation-reports). The pandemic is still growing in most countries and is far from under control. Russia is currently a "hard-hit" area of the COVID-19 outbreak, with more than 396,000 confirmed cases as of May 30. From May 2 to May 30, more than 8000 new cases were confirmed daily (https://www.worldometers.info/coronavirus/country/Russia). This also poses a challenge to the prevention and control of a second wave of the epidemic in China, which borders Russia. Therefore, constructing a reasonable prediction model for Russia helps to prevent and control the spread of COVID-19. Moreover, it can provide a methodological reference for the prediction of infectious diseases in the future.

In recent studies, the prediction models of COVID-19 fall mainly into two categories: infectious disease models and their modified versions, and artificial intelligence algorithms based on massive data. For example, Shengli et al. [2] established an SEIR epidemic dynamics model that can be used to evaluate and predict the COVID-19 epidemic in Hubei province. Yang et al. [3] applied a modified SEIR model and AI to predict the trend of China's COVID-19 epidemic under public health intervention. Pereira et al. [4] developed a modified stacked auto-encoder for modeling the transmission dynamics of epidemics and applied this model to forecasting COVID-19 dynamics in Brazil. Al-Qaness et al. [5] improved the adaptive neuro-fuzzy inference system (ANFIS) by applying an enhanced flower pollination algorithm using the Salp Swarm Algorithm to estimate the number of confirmed COVID-19 cases in China. However, these approaches also have disadvantages. Epidemiological models require parameters and depend on many assumptions. Artificial intelligence algorithms require huge amounts of data and involve high complexity and uncertainty [6].
To overcome these limitations and support public health planning and decision-making, we use the autoregressive integrated moving average (ARIMA) model, which has the advantages of simple structure, wide applicability, and a strong ability to interpret data sets [7]. It has been widely used for short-term prediction of infectious diseases. For instance, Earnest et al. [7] used the ARIMA model to predict and monitor the number of beds occupied during a SARS outbreak in a tertiary hospital in Singapore. Wang et al. [8] used the ARIMA model to estimate the morbidity of influenza in Ningbo, China, for 2006-2014. Singh et al. [9] used the ARIMA model to forecast the transmission trajectory of COVID-19 in the 15 worst-hit countries over the following 2 months. Therefore, we established separate ARIMA models based on the monitoring data of cumulative confirmed cases, cumulative deaths, and cumulative recovered cases of COVID-19 in Russia as of May 20, 2020, to predict the trend in the next 30 days.

Russia's daily reported cumulative confirmed, dead, and recovered COVID-19 data from January 31 to May 30, 2020, were extracted from the Johns Hopkins Coronavirus Resource Center (https://coronavirus.jhu.edu/). The data from January 31 to May 20 are used as a training set to build the ARIMA models, and the data from May 21 to May 30 are used as a verification set to test the accuracy of these models. Excel 2010 is used to build the Russian COVID-19 time series database, and R 3.6.2 software is used to build the ARIMA models and make predictions. Descriptive statistics of the COVID-19 data of Russia between 2020/1/31 and 2020/5/20 are given in Table 1.

The ARIMA model is a time series forecasting method proposed by Box and Jenkins in the 1970s. The model family includes AR(p), MA(q), ARMA (p, q), ARIMA (p, d, q), and so on.
Among them, ARIMA (p, d, q) means that the time series is differenced d times, and each observation in the differenced sequence is represented by a linear combination of the past p observations and q residuals [10]. Here p and q are the autoregressive order and the moving average order, and d is the order of differencing. The basic idea of the model is to treat the sequence as a set of time-dependent random variables, describe it with a mathematical model, and predict future values from the previous values of the sequence [11]. The predicted value is a linear function of the most recent observed values and the most recent prediction errors. The ARIMA (p, d, q) model can be presented as given in Eq. (1).

The steps to build the ARIMA model include the following:

(1) Stationarity test. Use the augmented Dickey-Fuller (ADF) test to detect whether the time series is stationary. A P value of the ADF test below 0.05 indicates that the sequence is stationary. If the original sequence is not stationary, it can be converted to a stationary sequence by differencing [12, 13], which determines the value of d.
(2) Model identification. Use the autocorrelation function (ACF) plot and the partial autocorrelation function (PACF) plot to determine the values of p and q.
(3) Model fitting and evaluation. Use the Box.test() function to test whether the residual sequence is white noise. If P > 0.05, the residual sequence is white noise, and the ARIMA model fits the data adequately. In this study, mean percentage error (MPE), mean absolute percentage error (MAPE), and mean absolute scaled error (MASE) were applied to verify the fitting effect of the developed ARIMA models [14]. It is expressed mathematically in Eq. (2).

The Tendency Analysis of COVID-19 in Russia

As seen in Fig. 1a, Russia had 308,705 confirmed cases, 2972 deaths, 85,392 recovered cases, and 220,341 existing cases as of May 20, 2020.
The cumulative number of confirmed cases remained at 2 from January 31 to March 1, and these patients recovered and left the hospital on February 12. Therefore, Russia was free of active cases from February 12 to March 1. Russia's case fatality rate has remained very low, at 0.96% as of May 20. As seen from Fig. 1b

(1) Stationarity test. Judging from Fig. 1a, the original sequence has a clear long-term trend, indicating that it is not stationary. The diff() function was used to take the second-order difference, and the adf.test() function was then used for the ADF test. The results show that Dickey-Fuller (confirm) = 3.55, P = 0.04; Dickey-Fuller (death) = 7.01, P = 0.01; Dickey-Fuller (recover) = −6.66, P = 0.01, so the sequence after differencing is stationary, and d = 2.
(2) Model identification. The values of p and q were determined from the ACF and PACF plots of the differenced sequence. Figure 2a shows that the ACF and PACF both decay gradually to 0, so a mixed ARIMA (p, d, q) model is used. The values 0, 1, and 2 were tried in turn for the orders, from low order to high order, and the candidates were compared by the minimum Akaike information criterion (AIC); ARIMA (2,2,1) was finally chosen as the confirmed model. Figure 2b shows that the ACF tails off and the PACF drops to 0 after the third lag, that is, it cuts off at the third order, so p = 3 and the death model is ARIMA (3,2,0). Figure 2c shows that the ACF drops to 0 after the first lag, that is, it cuts off at the first order, so q = 1, while the PACF tails off, so the recovered model is ARIMA (0,2,1).

Table 2 shows that the residual sequences are all white noise (P > 0.05). The values of MPE, MAPE, and MASE are also small, with maximum values of 2.05, 11.75, and 0.30, respectively, indicating that the models fit well.
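The second-order differencing that gives d = 2 and the fit measures reported in Table 2 can be illustrated with a short sketch. The study itself was carried out in R; the Python below is only an independent illustration, and the function names, the toy numbers, and the metric conventions (error taken as actual minus predicted, MASE scaled by the in-sample mean absolute error of a one-step naive forecast) are assumptions rather than the authors' code or data.

```python
def difference(series, order=1):
    """Apply first differencing `order` times (order=2 corresponds to d = 2)."""
    values = list(series)
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


def mpe(actual, predicted):
    """Mean percentage error in percent; the sign shows the direction of bias."""
    return 100.0 * sum((a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mase(actual, predicted, training):
    """Mean absolute scaled error: MAE of the predictions divided by the
    in-sample MAE of the naive forecast (each value predicted by the previous one)."""
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    naive_mae = sum(abs(step) for step in difference(training, 1)) / (len(training) - 1)
    return mae / naive_mae


# Hypothetical toy cumulative counts, NOT data from the study.
training = [100, 110, 130, 160, 200]
actual = [200, 250]
predicted = [190, 260]

print(difference(training, order=2))  # second-order difference of the toy series
print(mpe(actual, predicted), mape(actual, predicted), mase(actual, predicted, training))
```

With real data, `training` would hold the cumulative series from January 31 to May 20, and `actual` and `predicted` would hold the observed and model values being compared.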
The accuracy of the models was tested by comparing the predicted values with the actual values from May 21 to May 30, 2020. As seen in Table 3, the MAPE of the confirmed model is 0.60, indicating that the model has high prediction accuracy and is robust. The MAPEs of the death model and the recovery model were 3.90 and 2.40, respectively, indicating that these models are relatively robust. The established ARIMA models were then used to predict the cumulative confirmed, death, and recovered data for the next 30 days in Russia. As seen in Fig. 3, on June 19 the cumulative numbers of confirmed, death, and recovered cases are predicted to reach 547,143, 6487, and 305,558, respectively.

Discussion

COVID-19 in Russia was divided into two stages. From January 31 to March 1, the epidemic had essentially no impact. From March 2 to March 20, the disease broke out, spread, and grew exponentially. There were three main reasons for the Russian epidemic: (1) Russia did not strictly control the entry of travelers from Europe immediately after the outbreak in Europe. According to relevant reports, more than 100,000 people entered Russia from Europe and dispersed to various places in Russia. (2) After the domestic epidemic had been controlled, Russia relaxed its prevention and control measures. (3) Russian self-developed test kits were two grades below international standards, so many patients who were asymptomatic or had mild symptoms were not diagnosed and isolated in time. These reasons offer a useful warning to countries that have controlled the epidemic and seek to prevent a second wave. The epidemic in Russia peaked in May, with more than 10,000 new cases confirmed daily for 10 consecutive days. Since May 11, there has been a downward trend. Judging from the recovery curve, the numbers of daily confirmed cases and daily recovered cases coincided on May 19, which showed that the epidemic situation in Russia was tending to stabilize.
Russia's case fatality rate had been at a very low level of 0.96%, which might be because the cases were mostly mild, and which also indirectly showed that its intervention measures were effective. Although the number of confirmed cases ranked among the top 5 in the world on May 20, Russia's health system capacity could effectively respond to the COVID-19 pandemic.

Time series data can help to predict epidemic trends and improve prevention systems. Time series forecasting models also include the exponential smoothing model and the artificial neural network. Although these models can be used to predict the incidence of infectious diseases, they have some limitations. For the exponential smoothing model, it is difficult to determine the smoothing coefficient because the weight decreases as the forecast period extends. Artificial neural networks may not explain the nonlinear relationships in time series data well. The ARIMA model is the most popular and classic time series forecasting model because of its simplicity, systematic structure, and acceptable prediction performance [15]. It obtains a stationary sequence through differencing and then selects the best model for prediction. It can not only handle autocorrelation in the data but also analyze data with periodic effects. Therefore, it can better simulate the past epidemic situation and achieve better predictions.

In this study, ARIMA models were developed to predict the cumulative confirmed, dead, and recovered cases of COVID-19 in Russia, respectively. The data from May 21 to May 30 were taken as a test set to evaluate the extrapolation performance of the models, with MAPE as the evaluation criterion. The cumulative confirmed model was very robust, with a MAPE of 0.60. With MAPEs of 3.9 and 2.4, the cumulative death and recovery models were relatively stable. Many articles have used the ARIMA model to predict the trend of COVID-19 in different countries [9, 16]. In Iran, Moftakhar et al.
[17] used the ARIMA model and an artificial neural network to predict daily new COVID-19 infections and found that the ARIMA model was more accurate. Ceylan [16] estimated the prevalence of COVID-19 in Italy, Spain, and France, with MAPE (Italy) = 4.752, MAPE (Spain) = 5.849, and MAPE (France) = 5.634. The MAPEs obtained in this study (0.60, 3.9, and 2.4) are lower than these values. Therefore, the ARIMA (2,2,1), ARIMA (3,2,0), and ARIMA (0,2,1) models were considered reasonable for forecasting with high accuracy. According to the predicted data from May 21 to June 19, although the cumulative number of confirmed cases is still on the rise, the number of daily new confirmed cases fell to 8000 on May 30 and then basically stabilized at about 7000, lower than the number of daily new recovered cases. Therefore, the Russian epidemic is expected to be brought further under control.

Constructing a predictive model based on time series data is an important method for the prevention and control of infectious diseases. The ARIMA model has a mature theoretical basis and clear criteria and can effectively predict the epidemic trend of infectious diseases. This will help to allocate medical resources effectively and will have guiding significance for the scientific prevention and control of COVID-19. Russia's interventions against COVID-19 have achieved remarkable effects, and its health system capacity can effectively respond to the COVID-19 pandemic. The established ARIMA models fit the data well and can be used for short-term prediction of the COVID-19 trend, providing a theoretical basis for Russia to formulate new intervention policies.
1. Modeling and forecasting of epidemic spreading: the case of Covid-19 and beyond
2. Study on the epidemic development of corona virus disease-19 (COVID-19) in Hubei province by a modified SEIR model
3. Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions
4. Forecasting Covid-19 dynamics in Brazil: a data driven approach
5. Optimization method for forecasting confirmed cases of COVID-19 in China
6. Strengths, weaknesses, opportunities, and threats analysis of artificial intelligence and machine learning applications in radiology
7. Using autoregressive integrated moving average (ARIMA) models to predict and monitor the number of beds occupied during a SARS outbreak in a tertiary hospital in Singapore
8. Epidemiological features and forecast model analysis for the morbidity of influenza in
9. Prediction of the COVID-19 pandemic for the top 15 affected countries: advanced autoregressive integrated moving average (ARIMA) model
10. Reducing demand uncertainty in the platelet supply chain through artificial neural networks and ARIMA models
11. The analysis of time series: an introduction
12. Time series analysis: forecasting and control, 5th edition
13. Permeate flux prediction in the ultrafiltration of fruit juices by ARIMA models
14. Application of a combined model with seasonal autoregressive integrated moving average and support vector regression in forecasting hand-foot-mouth disease incidence in Wuhan
15. Comparison of ARIMA and GM(1, 1) models for prediction of hepatitis B in China
16. Estimation of COVID-19 prevalence in Italy
17. Exponentially increasing trend of infected patients with COVID-19 in Iran: a comparison of neural network and ARIMA forecasting models