key: cord-0891334-rju936u8
authors: Das, Ramesh Chandra
title: Forecasting Incidences of COVID-19 using Box-Jenkins Method for the Period July 12-Septembert 11, 2020: A study on highly affected countries
date: 2020-08-24
journal: Chaos Solitons Fractals
DOI: 10.1016/j.chaos.2020.110248
sha: f7284aa17e76827de5ba8edcd1e89acf2a63f9c2
doc_id: 891334
cord_uid: rju936u8

BACKGROUND: The devastating spread of the novel coronavirus, named COVID-19, starting its journey from Wuhan, China, on January 21st, 2020, now threatens lives in almost all countries of the world, in different magnitudes. Mostly the developed countries have been hit hard, besides emerging countries like China, India and Brazil. Scientists and policy makers are in the dark about how far it will spread and how many lives it will claim in the coming days. OBJECTIVES: The present study aims to forecast the number of incidences in seven severely affected countries, USA, UK, Italy, Spain, France, China and India, for the period July 12-September 11, 2020, and compares the forecasted values with the actual values to judge the depth of severity and growth. METHOD: The study uses the Box-Jenkins method of forecasting in an Autoregressive Integrated Moving Average (ARIMA) structure on the basis of the daily data published by the World Health Organization from January 21st to July 11, 2020. RESULTS: It is observed that USA and India are the two countries whose increasing trends will continue in the forecast period (July 12 to September 11); the others, except China, will face lower numbers of incidences. China's incidence has come to a halt at around 80000. The growth rates of the number of incidences for all the countries during the forecast period will be diminishing. The mean difference test results between the forecasted and actual values in level and growth forms show that in the former case USA, India and UK will face higher forecasted than actual numbers, but in the latter case all of the countries will face significantly lower growth rates in the forecasted values compared to their actual growth values.

Human civilization is now at high risk, its valor bowed down before an organism thousands of times smaller than the tip of a needle. Starting its journey from Wuhan, China, on 21st January, 2020, the novel coronavirus, named COVID-19, a variant of SARS and MERS, has spread to almost all countries of the world, affecting around a crore of people and claiming lacs of lives to date. It has mutated several times within a very short period, so that scientists have been mere spectators to its spread and devastating features. Hospitals and health centres are flooded with COVID patients, and in most countries open spaces have been converted into temporary hospitals. Graveyards are overloaded, and mass burials are taking place in many countries. It is a striking fact that most of the so-called developed countries with improved health facilities are highly affected. USA leads the group, followed by many European countries. From the developing world, India and Brazil are next in line. No government's policies and management systems have been able to control its devastating pace. Scientists have been putting all their efforts into discovering medicines and vaccines to guard against its spread. As of now, the World Health Organization's (WHO) data tell us how many have been affected and how many have died, but we do not know how many more will be affected or will die. The present study aims to forecast the number of incidences across seven highly affected countries: USA, UK, Italy, Spain, France, China and India.

The study is organized as follows: a brief literature review, rationale and objectives, data and methodology, results and discussion, and finally conclusion and recommendations.

Owing to its novelty, COVID-19 has compelled scientists and researchers around the globe to focus on it. There are not many studies in the related field to date, and studies on its forecasting are scantier still in the short-lived literature on COVID. We explore here some studies on the roles of immunity and of socio-economic and environmental factors behind its spread and devastation, and some studies on forecasting or prediction of COVID-19 and of other viral or bacterial diseases.

According to Raja (2008), Indians have some genetic advantage in fighting viruses and bacteria, which may be one of the reasons why the world's second most populous country has been less affected. Hoch (2010), in his work on the immune mechanism activated by hunger and stress, finds that hunger or stress causes the production of peptides which protect against bacteria. Hence, countries in Africa and Asia, where the hunger rate is relatively high, may be less prone to COVID. Science Daily (Oct 20, 2016) reports that Africans have higher immunity than Europeans, which makes the former better able to fight infectious diseases. In another study, Barreiro (2016) demonstrated that Americans of African descent have a stronger immune response to infection than Americans of European descent. Curtis et al (2020) assert that BCG vaccination against tuberculosis in the weaker regions of the world, Asia and Africa, could have increased the level of immunity against viruses. Studies on demographic factors are also worth mentioning: Rook et al (2014) argue that people living in urban centers with less access to green spaces may be more likely to have chronic inflammation, a condition caused by immune system dysfunction.

With respect to the role of socio-economic factors behind the spread of COVID-19, Lau et al (2020) opine that the increasing numbers of COVID-19 cases in many countries are mostly attributable to regular international flight connections with China. The study indicates a strong linear correlation between domestic and international COVID-19 cases and air traffic volume for regions within and outside China. Banik et al (2020) analyse the factors that determine fatality rates across 29 economies from both the developing and the developed world. The study reveals that factors such as the public health system, population age structure, poverty level and BCG vaccination are powerful contributors to fatality rates. Mukherji (2020) examines the socioeconomic and health factors that can explain the differential impact of the coronavirus pandemic. It observes that counties with high per capita personal income have a high incidence of both reported cases and deaths. The results are striking in the sense that developed counties in USA in particular, or developed regions of the world in general, may not be safe from the outbreak; rather, they are more vulnerable than the less developed or developing ones.

With respect to studies on predictions of COVID in particular, the works of Yang et al (2020), Singh et al (2020), Kumar et al (2020a) and Kumar et al (2020b) are worth mentioning. Yang et al (2020) predict the COVID incidences for Hubei in China and for Italy and show that they will rise under different specifications. Singh et al (2020) predicted the number of incidences and death rates due to COVID for 15 countries for the period April 24-July 7 using the ARIMA method, and observed that the predicted values of confirmed cases, deaths and recoveries would double in all the observed countries except China, Switzerland and Germany. It was also observed that the death and recovery rates would rise faster than confirmed cases over the next 2 months, and that countries like USA and Italy would suffer more. In a similar study, Kumar et al (2020a) predicted trajectories of COVID-19 for the coming days (until April 30, 2020) using an advanced ARIMA model. The results predicted very frightening outcomes: conditions would worsen in Iran and the whole of Europe, especially Italy, Spain and France, while USA would come as a surprise and become the epicenter for new cases during mid-April 2020. Further, for India, Kumar et al (2020b), using ARIMA and Richards models, predicted that by the end of April 2020 the incidence of new cases would be 5200 under the ARIMA model versus 6378 under the Richards model, with an estimated 197 deaths and, with a drop in the recovery rate, around 501 recoveries.

In the prediction of other infectious diseases using time series forecasting models, Siregar and Makmur (2019) investigated the role of climate change in the prediction of dengue fever in the districts of Medan, Indonesia, using monthly data for 2012-16, and showed that the trend is seasonal and the impact is high during the rainy season. In a study on dengue in Bangladesh, Choudhury, Banu and Islam (2008) modelled the monthly number of dengue fever cases in Dhaka and forecast dengue incidence using Seasonal Autoregressive Integrated Moving Average models on monthly data from January 2000 to October 2007; the results showed that the predicted values were consistent with the upturns and downturns of the observed series. A forecast for the period November 2007 to

1. AR (autoregressive) Process: Past values of the variable and an error term generate the data.
2. MA (moving average) Process: Only the errors or disturbance terms generate the data.
3. ARMA (autoregressive and moving average) Process: The data are generated by a combination of AR and MA processes.

Sometimes it is taken as an ARIMA model, where 'I' stands for the order of integration of the series, that is, the number of times the series must be differenced to make it stationary.

An AR(p) process is one where the current value of a variable y depends only on its own past values plus an error term. If the process is of order p, i.e. the current value of y depends on p past values (t-1, t-2, ..., t-p) and an error term of the current period, then the AR(p) process can be written as

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + u_t \qquad (1)$$

where $c$ is a constant, the $\phi$'s are the AR coefficients and $u_t$ is the white noise (WN) error term with zero mean, constant variance and zero autocovariance.
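The AR recursion described above can be illustrated with a minimal Python sketch. The function name `simulate_ar`, the constant `c` and the choice of zero pre-sample values are illustrative assumptions and are not taken from the study.

```python
def simulate_ar(phi, shocks, c=0.0):
    """Generate y_t = c + sum_i phi[i] * y_{t-1-i} + u_t.

    phi    : AR coefficients [phi_1, ..., phi_p]
    shocks : white-noise terms u_t, one per period
    Pre-sample values of y are taken to be zero (an assumption).
    """
    y = []
    for t, u in enumerate(shocks):
        value = c + u
        for i, coef in enumerate(phi):
            lag = t - 1 - i
            if lag >= 0:
                value += coef * y[lag]
        y.append(value)
    return y


# AR(1) with phi_1 = 0.5 and a single unit shock in the first period:
# the effect of the shock halves in every later period.
series = simulate_ar([0.5], [1.0, 0.0, 0.0, 0.0])
print(series)
```

Because the coefficient is below one in absolute value, the effect of the initial shock dies out geometrically, which is the behaviour expected of a stationary AR process.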

An MA(q) process, on the other hand, is a linear combination of the current white noise term and its q past values. That is, the current value of y t depends on the current value of the WN error term and on the past q values of the error term. Because all the errors are WN, an MA process of finite order is necessarily a stationary process: it is a linear combination of positive and negative error values which hover around zero.
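A moving average process can be sketched in the same spirit. In the Python fragment below, the function name `simulate_ma`, the coefficient list `theta` and the plus-sign convention for the lagged errors are assumptions made for illustration only.

```python
def simulate_ma(theta, shocks, c=0.0):
    """Generate y_t = c + u_t + sum_j theta[j] * u_{t-1-j}.

    theta  : MA coefficients [theta_1, ..., theta_q]
    shocks : white-noise terms u_t, one per period
    Shocks before the first period are treated as zero (an assumption).
    """
    y = []
    for t, u in enumerate(shocks):
        value = c + u
        for j, coef in enumerate(theta):
            lag = t - 1 - j
            if lag >= 0:
                value += coef * shocks[lag]
        y.append(value)
    return y


# MA(1) with theta_1 = 0.5: a unit shock affects only the current
# period and the next one, after which its influence vanishes.
series = simulate_ma([0.5], [1.0, 0.0, 0.0, 0.0])
print(series)
```

Unlike the autoregressive case, the shock has a finite memory of q periods, which is why a finite-order MA process is stationary whatever the values of its coefficients.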

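Hence, an MA(q) process can be written as

$$y_t = c + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \dots + \theta_q u_{t-q} \qquad (2)$$

where $c$ is a constant and the $\theta$'s are the MA coefficients.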

An AR process is stationary if the roots of its characteristic equation lie outside the unit circle, i.e. have absolute values greater than 1. For an AR(1) process the root is 1/φ, so this condition is equivalent to |φ| < 1: the AR coefficient itself lies inside the unit circle, and the model then has the stability property. More generally, the inverse roots of the characteristic equation should lie within the unit circle.
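The stationarity condition can be checked numerically for low-order processes. The sketch below is limited to AR(1) and AR(2) so that the roots can be obtained with the quadratic formula from the standard library; the function names are hypothetical, and higher orders (such as the AR(9) term reported later for USA) would need a general polynomial root finder.

```python
import cmath


def ar2_characteristic_roots(phi1, phi2=0.0):
    """Roots z of 1 - phi1*z - phi2*z**2 = 0 for an AR(1) or AR(2) process."""
    if phi2 == 0.0:
        if phi1 == 0.0:
            return []              # pure white noise: no roots to check
        return [1.0 / phi1]        # AR(1): single root 1/phi1
    # Rewrite as phi2*z**2 + phi1*z - 1 = 0 and apply the quadratic formula.
    disc = cmath.sqrt(phi1 ** 2 + 4.0 * phi2)
    return [(-phi1 + disc) / (2.0 * phi2), (-phi1 - disc) / (2.0 * phi2)]


def is_stationary(phi1, phi2=0.0):
    """True when every characteristic root lies outside the unit circle."""
    return all(abs(z) > 1.0 for z in ar2_characteristic_roots(phi1, phi2))


print(is_stationary(0.5))        # root 2 lies outside the unit circle
print(is_stationary(1.0))        # unit root
print(is_stationary(0.7, 0.4))   # phi1 + phi2 > 1
```

A coefficient of exactly one gives a root on the unit circle, the unit-root case that the differencing step of an ARIMA model is meant to remove.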

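An ARIMA(p, d, q) process is the combination of AR and MA processes, I being the order of integration, represented by d, the number of times the series is differenced to convert it from non-stationary to stationary. The ARMA(p, q) model for the (differenced) series can then be written as

$$y_t = c + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + u_t + \theta_1 u_{t-1} + \dots + \theta_q u_{t-q} \qquad (3)$$

where $c$ is a constant and the $\phi$'s and $\theta$'s are the AR and MA coefficients respectively.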

Relation (3) also reflects the invertibility between the AR and MA representations: under suitable conditions an AR process can be expressed as an MA process and vice versa.
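The role of the differencing order d can be illustrated with a short Python sketch. The helper name `difference` and the quadratic example series are hypothetical; the series is not WHO data.

```python
def difference(series, d=1):
    """Apply first differencing (y_t - y_{t-1}) d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


# Hypothetical cumulative counts growing quadratically (not WHO data).
cumulative = [t * t for t in range(6)]       # 0, 1, 4, 9, 16, 25
first_diff = difference(cumulative, d=1)     # still trending upwards
second_diff = difference(cumulative, d=2)    # constant
print(first_diff, second_diff)
```

In this toy case the first difference still trends upwards and only the second difference is flat, which mirrors the situation of a series that is integrated of order 2. Each round of differencing shortens the series by one observation.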

The B-J approach covers several sub-models, and it is thus necessary to determine which model is appropriate. The entire procedure follows four steps:

Step 1: Identification: To determine the appropriate values of p, d and q. The main tools in this search are the correlogram and partial correlogram, where the values of the autocorrelation function (ACF) and partial autocorrelation function (PACF) are generated.

Step 2: Estimation: To estimate the parameters of the chosen model. The parameters are all AR and MA terms and a constant term.

Step 3: Diagnostic Checking: To check whether the residuals from the fitted model are white noise. The check is also based on the statistical significance of the estimated AR and MA terms, the value of adjusted R-square (which should be as high as possible), and the lowest possible values of information criteria such as the Akaike Information Criterion (AIC) and the Schwarz Information Criterion (SIC). If the residuals are white noise, accept the chosen model; if not, start afresh. That is why the B-J methodology is an iterative process.

Step 4: Forecasting: The ultimate test of a successful ARIMA model lies in its forecasting performance, within the sample period as well as outside it. On the basis of acceptable results from Steps 1 to 3, forecasting is carried out with the appropriate ARIMA model. The forecasting results are judged by the root mean square error (RMSE) and by the bias, variance and covariance proportions. Acceptable forecasts are those whose RMSE is as low as possible and whose covariance proportion is greater than the bias and variance proportions.
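For the identification step, the sample autocorrelations that make up a correlogram can be computed directly. The following Python sketch uses the usual estimator with a common denominator for all lags; the function name `sample_acf` and the alternating test series are illustrative assumptions. The PACF is not covered here.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a series.

    Uses the standard estimator in which every lag shares the same
    denominator (the sum of squared deviations from the mean).
    The series must not be constant, otherwise the denominator is zero.
    """
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    acf = []
    for k in range(max_lag + 1):
        num = sum(dev[t] * dev[t - k] for t in range(k, n))
        acf.append(num / denom)
    return acf


# A strictly alternating series has strong negative correlation at lag 1
# and positive correlation at lag 2.
acf = sample_acf([1, -1, 1, -1, 1, -1], max_lag=2)
print(acf)
```

In practice the pattern of these values across lags, together with the PACF, guides the choice of p and q: slowly decaying autocorrelations suggest that further differencing is needed.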

Before predicting the number of cases for the seven selected countries, we present a diagrammatic view of the actual trends over all the available data. Figure 1 presents the same.

It is observed from the countries' series on number of incidences that all countries have experienced increasing trends since the first case on 21st January, 2020 in Wuhan, China. After rising steeply during the first several weeks, China's cases stagnated from mid-March 2020. In the meantime, Italy became the epicenter of Europe in terms of number of incidences and deaths, followed by Spain, France and UK. The ramifications were not restricted to the European zone; the disease quickly spread to USA with a very high rate of growth in both cases and deaths. USA overtook China on March 28 and still holds the top position in the world, with more than a lac deaths and more than 30 lacs cases. In the interim, side by side with Europe and the Americas, India became the only epicenter in Asia, and it overtook the top European country on the list, UK, on June 12. Where the countries will stand in terms of incidence and death is unknown at present; we can only forecast it for future periods. The present study aims to forecast the number of incidences for the two months from July 12. The following section derives the forecasted values using all four steps of the B-J method.

Step 1-3 of forecasting

Table 1 presents the basic results required in the first three steps of the forecasting process. Column 2 gives the ADF values, which show the stationarity of the series and the order of integration; column 3 gives all possible forms of ARIMA based on the shapes of the ACF and PACF; column 4 shows the regression results which determine the values of the AR and MA terms; column 5 gives the values of adjusted R-square; and columns 6 and 7 show the values of the AIC and SIC respectively.

[Table 1; only a fragment of the table body survives: (1,2,1) AR(1) = -0.14 (0.15) MA(1)]
Source: Computed by the author

It is observed from the table that the order of integration of the series for USA, Italy, Spain, France and India is 2, which means these series are second difference stationary. UK's series is first difference stationary, while China's series is stationary at level. On the basis of these Augmented Dickey-Fuller (ADF) unit root test results and the shapes of the ACF and PACF in the respective correlograms, we have determined all possible combinations of AR and MA terms. The results marked in bold, obtained by regressing the current values of the number of incidences on all possible lagged AR and MA terms for each country, indicate the acceptable ARIMA structure, with the highest adjusted R-square and low AIC and SIC values serving as the selection criteria. The optimum and stable structure of ARIMA for USA is (9,2,1), UK (1,1,1), Italy (7,2,1), Spain (2,2,1), France (1,2,1), China (1,0,28) and India (1,2,2).
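The use of AIC and SIC as selection criteria can be sketched as follows. The Python fragment assumes one common textbook form of the criteria, based on the residual sum of squares of a Gaussian model; statistical packages scale the criteria differently, so only the ranking of candidate models is comparable. The candidate labels, residual sums of squares and sample size are hypothetical and are not the study's estimates.

```python
import math


def aic(rss, n, k):
    """Akaike criterion (up to an additive constant): n*ln(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k


def sic(rss, n, k):
    """Schwarz criterion (up to an additive constant): n*ln(RSS/n) + k*ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)


# Hypothetical candidates: (label, residual sum of squares, number of parameters)
candidates = [("ARIMA(1,2,1)", 120.0, 3), ("ARIMA(2,2,1)", 118.0, 4)]
n_obs = 170  # hypothetical number of usable observations

best_by_aic = min(candidates, key=lambda m: aic(m[1], n_obs, m[2]))
best_by_sic = min(candidates, key=lambda m: sic(m[1], n_obs, m[2]))
print(best_by_aic[0], best_by_sic[0])
```

In this made-up example the two criteria disagree: the small gain in fit from the extra AR term is enough for AIC but not for SIC, whose penalty per parameter is larger whenever ln(n) exceeds 2.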

On the basis of the acceptable results obtained from Steps 1 to 3, forecasting is made on the appropriate model of ARIMA. The forecasting results are accepted on the basis of the acceptable values of root mean square error (RMSE), bias proportion, variance proportion and covariance proportion. Figures 2 and 3 present the graphical plots of forecasted values of number of incidences of the selected countries. The numerical values of the forecasted series are given in the Appendix (Table A1).
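The forecast evaluation measures named above can be computed as in the following Python sketch, which assumes the standard Theil decomposition of the mean squared error into bias, variance and covariance proportions. The function name and the example numbers are hypothetical.

```python
import math


def forecast_evaluation(actual, forecast):
    """RMSE and the bias, variance and covariance proportions of the MSE.

    The mean squared error must be non-zero (i.e. the forecast is not perfect).
    """
    n = len(actual)
    mse = sum((f - a) ** 2 for a, f in zip(actual, forecast)) / n
    mean_a = sum(actual) / n
    mean_f = sum(forecast) / n
    # Population (divide-by-n) moments, so that the three parts sum to one.
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    sd_f = math.sqrt(sum((f - mean_f) ** 2 for f in forecast) / n)
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(actual, forecast)) / n
    return {
        "rmse": math.sqrt(mse),
        "bias": (mean_f - mean_a) ** 2 / mse,
        "variance": (sd_f - sd_a) ** 2 / mse,
        "covariance": 2.0 * (sd_a * sd_f - cov) / mse,
    }


# Hypothetical in-sample comparison (not the study's data).
result = forecast_evaluation([10, 12, 15, 19], [11, 12, 14, 20])
print(result)
```

A forecast is usually regarded as satisfactory when most of the error falls in the covariance proportion, since the bias and variance proportions capture systematic errors in the mean and in the variability of the forecast.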

Source: Drawn by the authors based on the derived forecasted values

It is observed from the figures that USA leads the group in both actual and forecasted numbers of incidences, followed by India. Both countries have been experiencing an exponential increase in incidences and are expected to do so in the coming two months. There are no signs of stagnancy or stationarity in the forecasted values up to September 11. It is thus inferred that these two countries may not reach the peak of severity in the next 60 days. USA is forecasted to record over 80 lacs cases and India more than 40 lacs cases on September 11.

We now come to the forecasted trends of the remaining five countries. Figure 3 shows that UK will follow India, but with a huge gap between the two. On September 11, UK may face around 30 lacs cases, around 10 lacs fewer than India. China is in the next position after UK, followed by Spain, France and Italy, with Italy at the bottom. All these countries except USA, India and UK have already passed their peaks and are narrowing the gap between actual and forecasted occurrence. A remarkable sign is observed for China: its peak was reached on February 20 and the trend thereafter stayed at a stationary level of around 80000. The European victims are now getting relief in terms of number of incidences, but USA and India are expected to be hit hard in the coming days.

Another way of gauging the severity of the forecasted number of incidences is the growth trend. Figure 4 depicts it.

It is observed from the figure that the Chinese growth rate was the highest in the very first phase of the outbreak, followed by those of USA and UK. But, on average, the growth rate of China over the whole forecast period is 0.00001 per cent, while that of the others is around 1 per cent.
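As a minimal illustration of the growth form, the following Python sketch computes period-on-period percentage growth. The study does not state its exact growth formula, so the simple percentage change used here is an assumption, and the example counts are hypothetical.

```python
def growth_rates(series):
    """Period-on-period percentage growth of a series.

    Returns None for periods where the previous value is zero,
    because the growth rate is undefined there.
    """
    rates = []
    for prev, curr in zip(series, series[1:]):
        if prev == 0:
            rates.append(None)
        else:
            rates.append(100.0 * (curr - prev) / prev)
    return rates


# Hypothetical cumulative case counts (not WHO data).
counts = [100, 150, 180, 189]
rates = growth_rates(counts)
print(rates)
```

The example shows how cumulative counts can keep rising while the growth rate falls from period to period, which is the pattern the study reports for the forecast period.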

We now examine whether there is a significant difference between the forecasted values (for the period July 12-September 11) of the number of incidences, in level and growth forms, and the actual values (for the period January 21st-July 11). For this purpose, we computed the mean values of the forecasted and of the actual series in levels and in growth rates, and then the standard deviations of the same in both the actual and the forecast periods. The test statistic for examining whether there are significant increases or decreases in the level and growth of the forecasted values vis-à-vis the corresponding actual values across the countries is Student's t statistic.

The t statistic for the mean difference has $(n_f + n_a - 2)$ degrees of freedom, where $\mu$ stands for the mean value and $S^2$ for the variance of the forecasted (f) and actual (a) values, and $n_f$ and $n_a$ are the corresponding numbers of observations.
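A mean-difference test of this kind can be sketched in Python as follows. Since the degrees of freedom are stated as the two sample sizes minus 2, the sketch assumes the pooled-variance form of the two-sample t statistic; this is an assumption about the study's formula, and the function name and example numbers are hypothetical.

```python
import math


def pooled_t_statistic(forecast, actual):
    """Two-sample t statistic for mean(forecast) - mean(actual).

    Assumes a pooled variance; returns the statistic and its degrees
    of freedom, n_f + n_a - 2.
    """
    n_f, n_a = len(forecast), len(actual)
    mean_f = sum(forecast) / n_f
    mean_a = sum(actual) / n_a
    # Sample variances (divide by n - 1).
    var_f = sum((x - mean_f) ** 2 for x in forecast) / (n_f - 1)
    var_a = sum((x - mean_a) ** 2 for x in actual) / (n_a - 1)
    df = n_f + n_a - 2
    pooled_var = ((n_f - 1) * var_f + (n_a - 1) * var_a) / df
    t = (mean_f - mean_a) / math.sqrt(pooled_var * (1.0 / n_f + 1.0 / n_a))
    return t, df


# Hypothetical growth rates (per cent): forecast period vs. actual period.
t_value, dof = pooled_t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0])
print(t_value, dof)
```

A negative statistic indicates that the forecasted mean lies below the actual mean; its significance is judged against the t distribution with the returned degrees of freedom.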

The mean and standard deviation (SD), mean differences and t values for the level and growth forms are given in Tables 2 and 3 respectively.

When we turn to the differences in growth rates between forecasted and actual values (refer to Table 3), we see that all the countries' forecasted growth rates are lower than their actual growth rates. This is evidenced by the negative and significant values of the calculated t statistic.

Note: Bold marks indicate significant results at 5% level.

The negative sign of the t values is a relieving sign for the affected countries in the sense that, although the number of cases is increasing day by day, the growth rates of such increases are going down over time. But it is not clear from the above analysis on which day the growth of cases will reach zero for the countries other than China.

The study is now in a position to conclude. We started with the objectives of forecasting the number of incidences in seven highly affected countries of the globe on the one hand and assessing the severity of these forecasted values on the other. It was observed that USA and India are the two countries whose increasing trends will continue in the forecast period (July 12 to September 11); the others, except China, will face lower numbers of incidences. China's incidence has come to a halt at around 80000.

The growth rates of the number of incidences for all the countries during the forecast period will be diminishing. The mean difference test results between the forecasted and actual values in level and growth forms show that in the former case USA, India and UK will face higher forecasted than actual numbers, but in the latter case all of the countries will face significantly lower growth rates in the forecasted values compared to their actual growth values. Hence, in terms of the total number of incidences the forecasting results paint a gloomy picture, but in terms of growth the signs are definitely relieving.

The study recommends maintaining appropriate measures such as physical/social distancing, awareness campaigns, large scale testing, use of masks and sanitizers, incentives for developing vaccines, spending a sizable share of national income on health care facilities, relief in kind for the affected zones so that people can avoid leaving home, etc.

In preparing the manuscript the author did not face any conflict of interests and did not use any such materials of others where such conflict would at all arise.

While doing the research and preparing the manuscript the author did not use funds of any government or agencies. 

It is to declare that in preparing the manuscript I have used the open data of WHO and research articles openly available and I have no conflict of interest with any person in this regard.

Why Do COVID-19 Fatality Rates Differ Across Countries? An Explorative Cross-country Study Based on Select Indicators

Immune system of African-Americans

Time series analysis forecasting and control

Forecasting dengue incidence in Dhaka, Bangladesh: A time series analysis

Forecasting the dynamics of COVID-19 pandemic in top 15 countries in April 2020: ARIMA model with machine learning approach

Forecasting COVID-19 impact in India using pandemic waves Nonlinear Growth Models

Prediction of the COVID-19 pandemic for the top 15 affected countries: Advanced autoregressive integrated moving average (ARIMA) model

Time series Analysis of Dengue Hemorrhagic Fever Cases and Climate: a Model for Dengue Prediction

Research on COVID-19 Based on ARIMA Model-Taking Hubei, China as an example to see the epidemic in Italy

Considering BCG vaccination to reduce the impact of COVID-19. The Lancet

The association between international and domestic air traffic and the coronavirus (COVID-19) outbreak