key: cord-0765747-2j93ibmg authors: Ajadi, N. A.; Ogunsola, I. A.; Damisa, S. A. title: Modelling the Occurrence of the Novel Pandemic COVID-19 Outbreak; A Box and Jenkins Approach date: 2020-06-16 journal: nan DOI: 10.1101/2020.06.15.20131136 sha: 86cb387e7f15d14fc9f80547ca0fef62f1b97fd9 doc_id: 765747 cord_uid: 2j93ibmg The corona virus disease 2019 (COVID-19) is a novel pandemic disease that spreads very fast and causes severe respiratory problem to its carrier and thereby results to death in some cases. In this research, we studied the trend, model Nigeria daily COVID-19 cases and forecast for the future occurrences in the country at large. We adopt the Box and Jenkins approach. The time plot showed that the cases of COVID-19 rises rapidly in recent time. KPSS test confirms the non-stationarity of the process (p < 0.05) before differencing. The test also confirmed the stationarity of the process (p > 0.05) after differencing. Various ARIMA (p,d,q) were examined with their respective AICs and Log-likelihood. ARIMA (1, 2, 1) was selected as the best model due to its least AIC (559.74) and highest log likelihood (-276.87). Both Shapiro-Wilk test and Box test performed confirm the fitness of the model (p > 0.05) for the series. Forecast for 30 days was then made for COVID-19 cases in Nigeria. Conclusively, the model obtained in this research can be used to model, monitor and forecast the daily occurrence of COVID-19 cases in Nigeria. organ including the lungs. When the virus gets contacted with the cells, the body immune system fights back and reduces the efficiency and spread the virus to the entire cell. Due to this, there is high mortality rate in the older people since they have low immune system to fight the virus and this makes the number of recovered cases in the youths to be higher because they have stronger immune system. According to the world health organization (WHO 2020a), only 2% of the reported cases were below the age of 18 years, and the virus molecule can survive for about 8 hours on paper and fabric while it can survive for more hours on metals, plastic and glass. When an individual has contact with any of these materials which has been contaminated and still in its inoculation period, the virus will temporarily settle on the host, if the host eventually gets to mouth, the eyes or inner part of nose, which are medium to the nasal area, such individual is at high risk on being infected. The initial symptoms include sneezing, itchy eye, dry cough and so on. Recent study shows that, if an individual have closed relationship with a patient with any of the aforementioned symptoms, such individual will be in isolation for at least two weeks, before undergoing a test, if tested positive, will immediately be quarantine for immediate medical care and treatment. In this research, we focus on the study of the trend of the pandemic virus, investigate the causes, develop a suitable model to determine its trend and also use the model to predict future expected developed cases. Literature Review: Though the COVID-19 is a novel pandemic outbreak, but several authors have made immense contribution towards investigating the outbreak and proffer immediate solution to curb the spread of the disease. Ying et al. (2020) published an up to date review of the literatures on the novel pandemic disease. This shall be discussed in details in this section. According to WHO (2020), the average reproductive number is 1.95, that is, it ranges from 1.4 to 2.5. Joseph et al. (2020) was among recent authors to write about the outbreak of coronavirus in Wuhan, the authors applied the use of the Stochastic Markov chain Monte-Carlo methods with sampling prior using the posterior distribution. The authors discovered that the average reproductive number (R 0 ) for the pandemic outbreak is 2.68. i.e. (R 0 =2.68). Shen et al. (2020) developed a model with population divided into five compartments, the authors introduced the non-linear least squares method to get the point estimate with R 0 =6.49, which far greater than the WHO projection. Also, Lin et al (2020) introduce statistical exponential growth with the use of SARS by fitting the exponential growth using poisson regression, the R 0 was found to be 2.90, which is not too wide than the WHO value. Read et al. (2020) also developed a mathematical transmission model by using the Poisson distribution to assume the daily time increment. The reproductive number was found to be 3.11. The mathematical incidence decay and exponential adjustment model was introduced by Majumder et al. (2020). The authors adopted mean serial interval lengths in fitting the model. Their R 0 was recorded 2.55, which is very close to the WHO standard. Recent articles have published in modelling the occurrence and studying the trend of the novel COVID-19 pandemic, such articles include Cori et al (2020), Dung et al (2020), Yuan et al (2020) and WHO (2020b). In this article, instead of using the basic reproductive number in . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 16, 2020. . determining the spread of the pandemic outbreak, we intend to adopt a suitable ARIMA model that will assist in forecasting the occurrence of the spread in Nigeria. The Box-Jenkins approach will be adopted in this research and the autoregressive integrated moving average (ARIMA) will be discussed in this section. The Box-Jenkins refers to a systematic method of identifying, fitting, checking, and using integrated autoregressive, moving average (ARIMA) time series models. The method is appropriate for time series of medium to long length (at least 50 observations). In this research we will adopt the Box-Jenkins method, concentrating on the Arima model. One of the steps in the Box -Jenkins method is to transform a non-stationary series into a stationary one. ARIMA models are, in theory, the most general type of models for forecasting a time series which when not stationary can made to be "stationary" by differencing if necessary, probably in conjunction with nonlinear transformations such as logging or deflating. A random variable that is a time series is stationary if its statistical properties are all constant over time. A stationary series has no trend, its variations around its mean have a constant value, and it changes in a consistent way, i.e., its short-term random time patterns always look the same in a statistical sense. The latter condition means that its autocorrelations (correlations with its own prior deviations from the mean) remain constant over time. There autoregressive moving average (p,q) is of the form The white noise of the process is expressed as{ε t }. In lag form, equation (1) becomes; (2) . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 16, 2020. . An ARIMA process of order (p, d, q) with ARMA (p, q) process can be defined by: Where p, d, q are non-negative integers. Note: When d =0, the ARIMA (p, d, q) becomes ARMA(p, q). In this research the ARIMA model would be applied to the data on Covid-19 to model total daily new cases of the virus from inception February 27 2020 and to further forecast for Total daily occurrence of the virus in the country for a period of time. The most useful and informative diagnostic checks deal with determining whether the assumptions conditioning the innovation series are satisfied by the residuals of the autoregressive integrated moving average (ARIMA) model. Diagnostic checks only have meaning if the parameters of the model are efficiently estimated using the maximum likelihood approach at the estimation stage. For all of the diagnostic checks presented in this paper, it is assumed that a maximum likelihood estimator is used to estimate the model parameters. The paper discusses Shapiro and box test, whiteness tests, portmanteau tests, normality tests, constant variance tests, and so on. The result of the analysis obtained from daily cases of total Covid-19 in Nigeria between February 27, 2020 and May 1, 2020 is presented in this section. The exploratory analysis, Auto Correlation Function (ACF),Partial Auto Correlation Function (PACF), Augmented Dickey-Fuller (ADF) test, Kwitatkowski Philips Schmidt Shin (KPSS) test, Akaike Information Criterion (AIC) and the Autoregressive Integrated Moving Average (ARIMA) test were carried out, and the result were presented in section 4. More so, the discussion of the analysis is presented in the next section. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 16, 2020. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 16, 2020. . https://doi.org/10.1101/2020.06.15.20131136 doi: medRxiv preprint . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 16, 2020. . https://doi.org/10.1101/2020.06.15.20131136 doi: medRxiv preprint . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 16, 2020. Table 1 and Figure 1 show the summary of statistics of Covid-19 daily cases in Nigeria from February 27 to May 1 and the histogram of the distribution of the cases. The exploratory data analysis was done to determine the average case for the time frame used in this research, it was recorded that average daily number of thirty-four (34) cases were reported, with the initial case reported to be 1, this was at the initial stage of the outbreak, and the maximum daily case recorded so far is 238 as at May 1. Both the standard error and variance are very high indicating a high difference among the daily cases. Figures 2 and 3 also show the time plots of total daily Covid-19 cases before and after differencing. The result shows that the process is not stable over time. High and persistence rise is greatly noticed from the plot. Table 2 contains KPSS test . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 16, 2020. . statistics. The test was used to confirm the non-stationarity of the process before differencing and stationarity of the process after first differencing. Figures 4 and 5 show the ACF and PACF of the series. The ACF decays slowly while the PACF only exceeds the significant bound at lag 1 and 5 only. Table 3 show the summary of ARIMA (p,d,q) models examined with their AICs and Log-likelihoods. The result showed that ARIMA (1,2,1) has the least AIC (559.74) and the highest Log-likelihood (-276.87). Hence, ARIMA (1,2,1) is selected for the process. The model selected is written as: Also, Figure 6 shows the diagnostic plot for the selected model. The results shows that the model fits the process (p-values > 0.5), thereby we fail to reject the null hypothesis and conclude that the model fit the daily occurrence of Covid-19. Table 4 shows Shapiro-Wilk and Box test and the two tests attest to the goodness of fit of the fitted model. Conclusively, forecast for thirty days (May 8, 2020 to June 6, 2020) was then made for the daily cases of COVID-19 cases in Nigeria (see Appendix II) to show how crucial and paramount the need for quick and proactive measure and intervention to curb high and persistence increase in the occurrence of Covid-19 cases. Succinctly put, the rate of daily occurrence of the novel pandemic Covid-19 outbreak in Nigeria continue to increase every day. Thereby resulting into high loss of lives which may lead to extinction and paralysis of the economy and so its control must be of high concern. Hence, the model obtained from this research can be used to model, monitor and forecast the future case of Covid-19 in Nigeria as a whole so as to put a stop to the high mortality rate caused by the pandemic disease. The fear for the coronavirus is really high among people due to the high rate of its spread and lack of adequate knowledge about the cause(s) and how it spread. Hence, more articles and publications about sensitizing people on the coronavirus need to be made available. Also, preventive measure highlighted by WHO such as sanitization of hand regularly, avoiding direct contact with eyes, mouth and nose with unclean hands, keeping distance of at least one meter, avoiding handshake, not travelling while one is sick and seeking medical attention when symptoms of the Covid-19 such as high fever and so on are observed are hereby recommended as solutions to avoid quick spread of the diseases. It is expected that effective implementation of the above recommendations will guide and help in preventing the spread of the disease and thereby reduces its daily occurrences. 1. https://www.who.int/news-room/q-a-detail/q-a-coronaviruses . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 16, 2020. . . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 16, 2020. . https://doi.org/10.1101/2020.06.15.20131136 doi: medRxiv preprint Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study Modelling the epidemic trend of the 2019 novel coronavirus outbreak in China Transmission dynamics of 2019 novel coronavirus (2019-nCoV) Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions Early transmissibility assessment of a novel coronavirus in Wuhan Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root A new framework and software to estimate time-varying reproduction numbers during epidemics An interactive web-based dashboard to track COVID-19 in real time World Health Organization. Coronavirus (COVID-19) events as they happen Monitoring Transmissibility and Mortality of COVID-19 in Europe