key: cord-0191495-pkmryqjm
authors: Waqas, Muhammad; Farooq, Muhammad; Ahmad, Rashid; Ahmad, Ashfaq
title: Analysis and Prediction of COVID-19 Pandemic in Pakistan using Time-dependent SIR Model
date: 2020-05-05
journal: nan
DOI: nan
sha: b98364ed9cb505e44d8cff50fe29f0287458cf9e
doc_id: 191495
cord_uid: pkmryqjm

The current outbreak is known as Coronavirus Disease or COVID-19 caused by the virus SAR-COV-2 which continues to wreak havoc across the globe. The World Health Organization (WHO) has declared the outbreak a Public Health Emergency of International Concern. In Pakistan, the spread of the virus is on the rise with the number of infected people and causalities rapidly increasing. In the absence of proper vaccination and treatment, to reduce the number of infections and casualties, the only option so far is to educate people regarding preventive measures and to enforce countrywide lock-down. Any strategy about the preventive measures needs to be based upon detailed analysis of the COVID-19 outbreak and accurate scientific predictions. In this paper, we conduct mathematical and numerical analysis to come up with reliable and accurate predictions of the outbreak in Pakistan. The time-dependent Susceptible-Infected-Recovered (SIR) model is used to fit the data and provide future predictions. The turning point of the peak of the pandemic is defined as the day when the transmission rate becomes less than the recovering rate. We have predicted that the outbreak will reach its maximum peak occurring from late May to 9 June with unrecovered number of Infectives in the range 20000-47000 and the cumulative number of infected cases in the range of 57500-153100. The number of Infectives will remain at the lower end in the lock-down scenario but can rapidly double or triple if the spread of the epidemic is not curtailed and localized. The uncertainty on single day projection in our analysis after April 15 is found to be within 5%.

The Corona pandemic that began in the city of Wuhan in South China in early December 2019 has now become the global pandemic. The cause of the virus outbreak was later on identified as a novel coronavirus known as SAR-COV-2. The WHO has already declared the outbreak as a Public Health Emergency of International Concern and subsequently as a pandemic. In the last three months, the pandemic has spread around the world and has infected around three and half million people with about 250,000 causalities worldwide. The infected cases of COVID-19 are rapidly increasing in Pakistan as well and the positive cases have been reported from all parts of the country. Out of the total number of 19103 infections, as recorded on May 2, 4817 have recovered whereas 440 individuals have died.

The rising numbers of infections can overburden the already fragile health care system of the country in the coming months if the spread is not controlled. Any future treatment or vaccination of the COVID-19 will likely take at least a year to be available to the public. In the meantime, the only way to curtail the spread of COVID-19 is through precautionary measures and countrywide lock-downs. Such measures bring economic problems with themselves and are not easy to implement without economic losses. Therefore, an informed and effective decisions by policymakers based on a proper modeling of the pandemic can reduce the spread of the infection.

In the last few months studies have been carried out to understand the spread of the disease. For example, the Susceptible-Exposed-Infectious-Removed (SEIR) model is used to model the outbreak in the city of Wuhan China [1] . Analyses about Pakistan have been conducted but lack the reasonable estimates of the pandemic. The simplest version of the SIR model seems to be working better than other variants, in particular, the prediction made for China in [2] was found to be in agreement with the real data and hence used as a starting point. We followed a similar strategy to implement the time-dependent SIR model. Besides the fitting method in [2] , we checked the machine learning approach as documented in [3] but that method does not give superior results. The SIR model depends upon two tunable parameters β(t), that measures the transmissions per unit time and γ(t) that measures the recoveries per unit time. The proper estimation of these parameters can predict a reasonable number of infections and the peak of the pandemic. The exponential fit is used to estimate their values from data and then predicting future trends.

This manuscript is organized in the following way. In section 2, the mathematical model used for prediction is outlined. In section 3, the data used for the study in this paper and the analysis methods are discussed. Validation of the analysis machinery is discussed in section 4. The results are presented in section 5. The main conclusions are presented in the last section 6.

In the standard SIR model, the total population is divided into three groups. The Susceptible (S), the fraction of the total population that is vulnerable and at a risk of being infected. The Infective (I), the population that has been tested positive for infection. The Recovered or Removed (R), the population that has either recovered and reported negative or lost their lives. Here, we assume that a recovered person will not be infected again, because of developing immunity and through informed preventive measures and isolation. The model consists of the following set of non-linear differential equations that can be found e.g. in [4] .

where t is the time (in day), β(t) is the infection rate that means the number of persons contracting infection per unit time. The parameter γ(t) is the rate at which infected individuals are either recovered or died and are no longer host to the disease. The COVID-19 outbreak is a contiguous disease that has become pandemic quite quickly worldwide and, therefore, can be modelled with SIR. The set of equations (1-3) are coupled through the terms βS(t)I(t) (the newly added infected cases) and γI(t) (the newly recovered cases). The three groups of populations which are represented by variables S(t), I(t), R(t) and determined by the above set of equations, will sum up to give the total number of population i.e.

In the spirit of the model developed in [2] we take the total population of Pakistan as susceptible, however, the lock-down and social distancing measures reduced the size of susceptible population but a large number still remains vulnerable. Initially the number of confirmed infected cases is very low, and most of the population are in the suspectable state, therefore the S(t) is assumed as constant equal to the total population N and hence equation (1) can be ignored in the discussion here. In addition, the constant S in equation (2) can be absorbed in parameter β. The set of equations in SIR model can be simplified into a discrete version for implementation to the COVID-19 pandemic as the data is provided on daily basis. Equation (2) in the finite difference form becomes,

where ∆t is the interval for integration and the change in the infected individuals with unit time depends upon difference of β(t) and γ(t). From these equations, the expression for β(t) is given by,

The second parameter of the model γ(t) is determined from the difference of recoveries given by the expression,

3 Data Analysis

Corona Virus Disease that started in early December has become a global pandemic. It has already spread to more than 183 countries in about 3 months time. In Pakistan, the total number of infected cases have increased from zero to more than 19,000 in about two months as shown in Table 1 . As explained in section 2, the time-varying variables namely infection rate β(t) and removal rate γ(t) are the two very important parameters that have to be estimated from the available epidemiological data. The data used in this paper is taken from the John Table 1 : Epidemiological data of the COVID-19 pandemic, cumulative infected rate is represented by I(t), cumulative recovery rate including both dead and recovered cases are denoted as R(t).

Date Hopkins University dashboard [5] , which provides comprehensive list of worldwide data in the form of spreadsheet for comparison with other countries and analyses.

One important constraint on the estimation of parameters β(t) and γ(t) is that these variables should be monotonically decreasing and increasing functions with time, respectively. These constraints ensure the number of Infectives I(t) in the SIR model to follow a bell-shaped curve in accordance with the epidemic transmission model [7] . On physical ground, the parameter β(t) should be monotonically decreasing with time, otherwise, the pandemic will not stop unless all the susceptible population gets infected. The later case will be a very extreme situation, that has not been happening in the epidemics like SARS in 2003 and still not the case in COVID-19 in many countries such as China and Switzerland which are recovering and finishing the disease period. A second constraint on the parameter β(t) is that it should not be sharply decreasing otherwise the model will significantly underestimate the epidemic transmissibility. A good estimate for the parameter β(t) is found by fitting different subsets of data and constraining it by using the above mentioned criterions.

The parameter γ(t) is generally very small at the beginning of the epidemic because most Infectives have not recovered or removed yet and hence varies slowly with time. We have tried the time-varying parameterizations of γ(t) which did not help significantly. An unreasonably small γ values will cause unrealistically long epidemic duration and vice versa. In this analysis, we followed a similar strategy for estimating parameters β(t) and γ(t) from the data as discussed in [2] . Models are scanned for different fixed values of γ(t) and the uncertainty in the prediction of Infectives is presented for a window of removal rates. The interval for γ must contain the model which fits reasonably well to the data.

To predict the peak position of the distribution of Infective cases versus time, we also use another variable called the reproduction number R 0 (t), which is defined as the ratio of β(t) and γ(t). The reproduction number is also time-dependent variable. Initially, when the infection rate β(t) is large and removal rate γ(t) is small, the R 0 (t) is expected to be large, which decreases gradually and will become zero at the end of the epidemic. The peak of infective cases can be defined as the day when the R 0 (t) becomes equal to unity, afterward the removal rate dominates and the number of Infectives start decreasing. 

To cross-check our analysis, we have applied the same machinery as used in this paper to countries where the pandemic period has either ended or in the finishing phase. For the validation purpose of our analysis procedures and tools, we have reproduced some of the predictions for Chinese pandemic (plots not shown) and performed analysis for Switzerland, Italy, Germany and Spain. For the last four countries a summary is given in Table 2 . The numbers agree reasonably well with the data for these countries, except the prediction of the peak date of the pandemic which is based on the distribution of active infected cases as shown in Figure 1 (a). The peak date of the pandemic is off (overestimated) by a week with respect to the peak position in the data for Switzerland. The reason for discrepancy could be due to the fixed value of γ being used in the model instead of the time dependent parametrization. Another method(plots not shown) which has been studied for predicting the peak of the pandemic, is by comparing the number of daily new infections with the SIR model. That distribution gives the peak of the pandemic to fall in the mid May but the method itself is very sensitive to the fluctuations in data. On the other hand the cumulative active infection distribution is not very sensitive to any fluctuations in data(which may arises due to changes in the testing capacity/strategies or sudden spikes in the number of infections) and is therefore used as baseline method in this analysis. To take the discrepancy into account, an uncertainty of one week is assigned to the peak date of the pandemic determined with the default method used in this paper. The distribution of active Infectives and cumulative Infectives as a function of time for Switzerland are shown in Figure 1 . There has been a very good agreement between the data points and the SIR model for a recovery rate of 0.047. The inflection point of peak of pandemic occurred around 8 April and the model predicts around 29909 cumulative Infective cases in Switzerland, which can be compared to the total cumulative infection of 29905 on 3rd May with almost flatten curve. The results validate our analysis procedure and give us confidence to accurately predict the pandemic scenarios for Pakistan.

The input epidemiological data which has been used in this paper is shown in Table 1 . The cumulative infected cases in Pakistan on March 1st and 2nd were 4 and, therefore the infection rate β(t) is zero. On March 5, the cumulative Infective increased to 5 and, therefore, the infection rate β(t) jumps to 0.25. For the following two days, the number of Infective remains constant and thus β(t) is zero. The recovery rate γ(t) remains zero for obvious reasons and for the first time jumps to 0.17 when one person recovers on March 8. The infection rate jumps sharply to 1.67 on March 10 when the number of cumulative Infective becomes almost triple the number of cases on the previous day. During the following days, β(t) drops sharply once again. On March 16, there is a spike in the number of Infectives which makes β(t) to rise sharply. Such a sudden rise in β(t) due to change in the number of Infectives could be attributed to sharp increase in the number of tests performed or testing a large group of people arriving from abroad. To estimate the time-dependent infection and recovery rates, we pass cumulative Infectives data per day as input to SIR model. The parameters β(t) and γ(t) are estimated by using equations (6) and (7) . The values of β(t) and γ(t) obtained are shown as red circles in Figure 2 (a) and green circles in Figure 2 A time-dependent parametrization of β(t) is obtained by performing exponential fits to the infection rate data as shown in Figure 2(a) . To find the best fit, several subsets of data are extracted by sequentially excluding the data between 1st March to 5th March in the one-day interval. The results of exponential fits for five subsets of data are shown in Figure 2 (a). All the fits are having negative slops which give rise to β(t) decreasing with time. The fits corresponding to subsets of data from 2nd March to 5th March are excluded due to rapidly falling slope, as they will underestimate the peak and cumulative infected cases, and the ending of the pandemic. For pandemic predictions and evolution in coming months, parametrization obtained based on the data set of 1st March, is passed to the SIR model as time dependent parametrization of β(t). The exponential fit corresponding to 1st March is shown as solid line in Figure 2(a) . The exponential function based on dataset starting from 1st March essentially includes all the data since the start of the pandemic in Pakistan when four infected cases were reported.

The variable γ(t) is obtained from data by using equation (7) and plotted in Figure 2 (b). The parameter γ(t) is a slow varying variable with average value of around 0.02. In this analysis different fixed values of γ(t) are used instead of the time-dependent parameterizations to find an optimum window where accurate predictions can be made. For a range of recovery rates (different values of γ(t)), a plot of active Infectives, which is defined as the difference of cumulative Infectives minus recovered cases, is shown in Figure 3(a) . The active infected cases agree reasonably well with the predicted SIR model distribution for a recovery rate of 0.043. The peak of the pandemic will occur at 29 May with active infected cases of around 20000. The peak Infectives exclude the number of cases who either recovered or died as they no more contribute to the transmission of infection. The modelling has been done based on the current data which corresponds to the strict lock-down scenario. Any change in the lock-down and social distancing scenarios could rapidly escalate the infected cases which will shift the peak of Infectives accordingly. Therefore the prediction has to be made in a window of models for different recovery rates. For recovery rates from 0.043 to 0.034, the peak of the pandemic will occur between 29 May and 9 June, 2020. Comparison of data with predictions and the trend of active infection rate don't support models with γ(t) larger than 0.049. Those models will significantly underestimate the cumulative number of Infectives and hence excluded from the discussion here. The cumulative Infective cases as a function of time are shown in Figure 3 (b). The figure shows that the cumulative curves will start flattening in June-July. For a change of 0.003 in the recovery rate there is a significant change in the total number of cumulative Infectives as well as flattening of the curves. This suggests that better health care and recovery rates can significantly reduce the total number of infected people and can shorten the duration of the pandemic. The total number of infected cases could be 57651 for a recovery rate of 0.046 and can rise to 153149 for a recovery rate of 0.034. The total number of Infectives can easily cross a figure of 100,000 if the recovery gets worsen. The cumulative infections for upper and lower bounds are summarized in Table 3 . A comparison of the data and model prediction is zoomed-in in Figure 4 (a). The disagreement between 15 March and 10 April could be attributed to the change in health care strategies such as improving patients testing capabilities. Through enforcing rigorous countrywide lock-down, the active infection rate stabilizes after 10 April. The percentage error on the number of Infective per day is shown in Figure 4 (b). Apart from some outliers, the error on one-day prediction lies within 5% uncertainty band, which is shown as shaded region in the figure. To cross-check the turning point or peak position of the model, a variable known as reproduction number is defined as the ratio of infection rate to recovery rate. The plot of reproduction number R 0 as a function of time is shown in Figure 5 . The solid horizontal line shows the turning point when the recovery rate will dominate the infection rate. At this point, the pandemic will turn around as more people will recover than infected and this also marks the peak position in the SIR model. For different recovery rates the models cross the horizontal line in the time window from late May to 9th June, which marks the turning point of the pandemic. The reproduction number R 0 becomes less than unity after the peak or turning point of the outbreak and the pandemic is expected to fade away with 97% of the recovery already happened by August-September 2020.

The modelling and prediction of COVID-19 for Pakistan have been presented by using a simple time dependent SIR model. The two input parameters namely the infection rate β(t) and recovery rate γ(t) are estimated from data using equations (6) and (7). A time-dependent parametrization of β(t) is obtained by performing exponential fit to the data. The analysis has been validated by predicting the pandemic peak and cumulative Infectives of Switzerland and other European countries. The cumulative Infectives for Switzerland are predicted to be around 29909 when the curves will flatten in the first half of May. This prediction is in good agreement with the total Infectives of 29905 on 3rd May. The analysis is then performed to predict the pandemic scenario in Pakistan. A comparison of active infection rate for data and models are made for different values of recovery rate γ(t). The best match occurs for the model with a recovery rate of 0.043 for unrecovered peak Infectives of 20094, which is expected to happen on 29 May(with uncertainty of a week on the peak date). Based on a window of recovery rate γ(t), the prediction interval of turning or peak point of the pandemic is expected to occur from late May to 9 June with unrecovered number of Infectives from 20000-47000. The cumulative number of Infectives in the existing lock-down scenario is predicted to be in the range 57651-153100. These numbers are very sensitive to both transmission and recovery rates. For example, a 1% reduction in the recovery rate will triple the number of cumulative infected people. The number of Infectives will remain at the lower end under the strict preventive scenario and can rapidly double or triple if the spread of the epidemic is not contained through strict measures. To keep the cumulative Infectives around 60000 and to get rid of the pandemic by August-September, it is proposed to keep strict preventive measures intact till mid-June, 2020. The turning point of the pandemic is also modelled with the reproduction number R 0 which predicts the same time window as given by the SIR model. The reproduction number R 0 becomes less than unity after the peak of the active Infective distribution. The pandemic is expected to fade away completely with 97% Infectives recovered by August-September.

A conceptual model for the coronavirus disease 2019 (COVID-19) outbreak in Wuhan, China with individual reaction and governmental action

Early Prediction of the 2019 Novel Coronavirus Outbreak in the Mainland China Based on Simple Mathematical Model

A Time-dependent SIR model for COVID-19 with Undetectable Infected Persons. Link to the latest version of paper

Networks: An Introduction

Covid-19

Mathematical modelling of SARS and other infectious diseases in China: A review

A simple stocchastic model for an epidemic-numerical experiments with MAT-LAB