key: cord-0888685-g8epsrc5 authors: Banerjee, R.; Bhattacharjee, S.; Varadwaj, P. K. title: Analyses and Forecast for COVID-19 epidemic in India date: 2020-06-28 journal: nan DOI: 10.1101/2020.06.26.20141077 sha: 16e5239eb40bb2ea32118e49f6c1412767d3a697 doc_id: 888685 cord_uid: g8epsrc5

COVID-19 is a highly infectious disease, caused by the newly discovered coronavirus SARS-CoV-2, that is causing havoc across the entire world. In this study, the dynamics of COVID-19 for India and a few selected states with different demographic structures have been analyzed using a SEIRD epidemiological model. A systematic estimation of the basic reproductive ratio R_0 is made for India and for each of the selected states. The study has analysed and predicted the temporal progression of the disease in India and eight selected states: Andhra Pradesh, Chhattisgarh, Delhi, Gujarat, Madhya Pradesh, Maharashtra, Tamil Nadu, and Uttar Pradesh. For India, the most optimistic scenario with respect to the duration of the epidemic shows that the peak of infection will appear before mid-September, with the estimated R_0 = 1.917 from the SEIRD model. Further, we show that a Gaussian fit of the daily incidences indicates the peak will appear around the middle of August this year. Our analyses suggest that the earliest dates when the epidemic will start to decline in most states are between June and August. For India, the number of infected people at the time of the peak will be around 1.6 million, including asymptomatic people. If community transmission is prevented, then the epidemic will infect not more than 3.1 million people in India. We also compared India's position in containing the disease with two countries with higher and lower numbers of infections than India, and show that the early imposition of lockdown has reduced the number of infected cases significantly.
COVID-19 was declared a global pandemic by the World Health Organization (WHO) [1] owing to its high contagiousness and pathogenicity; it has been spreading rapidly throughout the world since its first reported outbreak in China in December 2019. This highly contagious virus is spreading at a much higher rate than the earlier reported coronaviruses. In India, the first case of COVID-19 was reported on 30 January 2020, and even after 23 weeks of its existence and four phases of lockdown, the daily rise of cases is alarming. To manage an epidemic like this, the administrations and health departments of a country like India, with a population of 1.35 billion, need a reasonably clear idea of the challenges they must tackle during the course of the epidemic. A forecast of the outbreak can also help in decisive planning and management of resource allocation. Therefore, a detailed analysis based on real data should prove very useful for estimating the number of testing kits, hospitalizations, quarantine centres, etc. needed to contain the disease.

Email addresses: rudrab@iiita.ac.in (Rudra Banerjee), srijuster@gmail.com (Srijit Bhattacharjee), pritish@iiita.ac.in (Pritish Kumar Varadwaj)

The motivation behind this work is to forecast the situation with the help of the SEIRD population model, a well-established model for studying the dynamics of infectious diseases. This may be useful for health officials and the administration in estimating how they should prepare, in the coming days, to provide treatment and containment measures for COVID-19. The assessment for different states will also show the trend of the epidemic. With a sufficient amount of data in hand, we can nowcast and forecast the possible situations that a state or the whole country is going to face. India invoked a complete nationwide lockdown in the early phase of the epidemic, on March 25th.
This has prevented rapid transmission of the virus among people and has also given the administration and health officials time to prepare the measures needed to tackle the situation. However, an early lockdown for a country with over 1.3 billion people has slowed down the pace of the spread of the virus considerably and resulted in a delayed appearance of flattening in the temporal progression of the infected population. It is thus important to find out how long the country should remain alert and careful about the spread of the virus.

For prediction of the progression of the epidemic, we need well-structured data. Accurate estimation of the model parameters would give us the most plausible scenario. Although the data published by ICMR and other agencies are quite well organised, they are not sufficient to compute certain parameters accurately. In this situation we have two options. Either we take estimates of the parameters from other countries where well-organised data were used, or we take data from various states in India with different demographics, compute the parameters for each of them, and use those values to arrive at an estimate of the parameters. We have chosen three states with high population density (Delhi, Tamil Nadu, Uttar Pradesh), three states with medium population density (Andhra Pradesh, Maharashtra, Gujarat) and two states with lower population density (Madhya Pradesh, Chhattisgarh). These states are further divided into those with high, moderate and low infection rates. We have employed the same methodology for each of these states, projected the values of the different parameters and executed the Susceptible-Exposed-Infected-Recovered-Dead (SEIRD) model to see the trend of COVID-19 transmission. The trends are then compared with the existing data for those states, which in turn enhances predictability.

The recently reported work [2] has compared the whole genome sequences of 28 viral strains from India.
The analysis has reported a novel non-synonymous mutation in the NSP3 gene of SARS-CoV-2 along with other frequent and important mutations reported across the globe. This suggests the possibility of different strains in different parts of India, depending on demographic density, travel history, inflow of migrant labourers, cultural practices and living style. It may also be the reason for the variation of the basic reproductive ratio (R_0) across the selected states, as reported in this study.

As already mentioned, for studying the SEIRD model, parameters such as the incubation rate, mortality rate, R_0 and recovery rate need to be calculated. For a given epidemic R_0 is fixed; it denotes the number of secondary infections generated from a seed infection. The value of R_0 is calculated when an epidemic is in its growing or early stage. We have employed a maximum likelihood estimation method based on daily incidence and generation time, as outlined in [3, 4]. The value of R_0 is estimated to be 1.917, which is within the ranges reported in various studies [4, 5, 6, 7, 8, 9, 10]. The range is expected to become narrower if analysed with more data. Containment measures may change the effective reproductive ratio R_e, and in India it is currently between 1.35 and 1.46 as estimated in this study.

The incubation period (d_i) for COVID-19 has been estimated to be between 2 and 14 days (WHO). Many studies suggest that the mean incubation period is between 5 and 8 days [11, 12, 6, 13, 14]. The average incubation period is also estimated to be 5.2 days in a Wuhan-based study [15], as noted in Worldometer. Motivated by these studies and by Indian studies [16, 17], we have taken a range of 5-7 days for the mean incubation period.

The other important parameter for modelling the epidemic is the period of infectiousness. For India, in the absence of relevant data, we have resorted to other studies.
These studies suggest that the mean infectiousness period (d_if) is between 3 and 5 days [7, 18]. It should also be kept in mind that d_if for India will be on the lower side, i.e. close to 3 days, as lockdown, social distancing, contact tracing and isolation of infected or exposed people have been in place from a very early stage of the disease. However, for a comparative study we have considered the values 3, 4 and 5 days for analysing the model.

The mortality rate of COVID-19 is generally low. In India, only 3.1% of the total infected population has died as of the current data. We have calculated the mortality rate for the 8 states and for India using the total number of deaths and the total confirmed cases to date.

This study shows that India has done much better than some other countries where the number of infections per million has been quite large. The infection in India will decline around 10-12 September. Other states may also see flattening of the curve between June and August. Without the containment measures in place, the number of infections in India would have been at least three times higher than in the present scenario.

We have employed a standard epidemiological model for infectious disease, namely the SEIRD model. Let us briefly discuss the model. To model and analyze the data, the Susceptible (S) - Exposed (E) - Infectious (I) - Recovered (R) - Dead (D) model is used. COVID-19 has a latent phase during which the individual is infected but not yet infectious. This delay between the acquisition of infection and the infectious state can be incorporated within the simpler SIR model by adding a latent/exposed population, E, and letting infected (but not yet infectious) individuals move from S to E and from E to I. D is the dead population. The dynamical equations for the SEIRD model are: [...] For simulation purposes we cast the above equations in the following discrete time progression form: [...] with initial conditions S(0) > 0, E(0) ≥ 0, I(0) ≥ 0 and R(0) ≥ 0. The constants β, γ, σ, ν are defined in section 3.
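The displayed SEIRD equations are not reproduced in this text, so the following is only a minimal sketch under stated assumptions. It assumes a commonly written SEIRD form, dS/dt = -βSI/N, dE/dt = βSI/N - σE, dI/dt = σE - (γ + ν)I, dR/dt = γI, dD/dt = νI, advanced one day at a time (forward Euler). The function name `simulate_seird`, the placement of ν in the I equation, the relation β = R_0 (γ + ν), the population size, the seed values and the value of ν are illustrative assumptions and not the authors' implementation; only R_0 = 1.917, a 5-day incubation period and a 3-day infectious period come from the text.

```python
def simulate_seird(n, e0, i0, r0_ratio, d_inc, d_inf, nu, days):
    """Advance an assumed SEIRD model one day at a time (forward Euler).

    Returns a list of (S, E, I, R, D) tuples, one per day, starting at day 0.
    """
    sigma = 1.0 / d_inc              # incubation rate = 1 / mean incubation period
    gamma = 1.0 / d_inf              # recovery rate = 1 / mean infectious period
    beta = r0_ratio * (gamma + nu)   # assumption: R_0 = beta / (gamma + nu)

    s, e, i, r, d = n - e0 - i0, e0, i0, 0.0, 0.0
    history = [(s, e, i, r, d)]
    for _ in range(days):
        new_exposed = beta * s * i / n   # S -> E
        new_infectious = sigma * e       # E -> I
        new_recovered = gamma * i        # I -> R
        new_dead = nu * i                # I -> D
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered - new_dead
        r += new_recovered
        d += new_dead
        history.append((s, e, i, r, d))
    return history


# Illustrative run: population, seeds and nu are made-up values.
history = simulate_seird(n=1_000_000, e0=10.0, i0=1.0, r0_ratio=1.917,
                         d_inc=5, d_inf=3, nu=0.001, days=400)
peak_day = max(range(len(history)), key=lambda t: history[t][2])
print("day of peak infectious population:", peak_day)
```

Re-running the sketch with longer incubation or infectious periods moves the peak later, which is the qualitative behaviour the study reports for its scenarios.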
The mortality rate can be calculated from Covid-India data and is related to ν. The recovery rate is the inverse of the mean infectiousness period, that is, of the average time to get cured from the onset of symptoms. The mean infectiousness period is taken as 3, 4 and 5 days, as suggested in [6] from the studies based on Wuhan, the inverse of which gives γ in equation (2.1).

The daily numbers of people infected, recovered and deceased from 14th March to 12th June were obtained from the Indian government's publicly available websites [19, 20] and from public-domain data [21, 22]. A few pre-processing steps were carried out on these data for the SEIRD-based model calculations. The population density was collected from the 2011 census [23].

Let us now describe the definitions and the method of computation of the parameters for studying the population dynamics with the help of the SEIRD model.

• Basic Reproductive Ratio (R_0): The basic reproductive ratio R_0 can be calculated from the daily cases starting from day 1 (14 March). Assuming that daily incidence obeys an approximate Poisson distribution, one employs the maximum likelihood estimation method. Given observations of (N_0, N_1, ..., N) incident cases over consecutive time units, and a generation time distribution w, R_0 is estimated by maximizing the log-likelihood [...] The random vector w_i can be estimated as the mean doubling time. w is also called the serial interval distribution [11, 24]. R_0 calculated for data up to 23 March produced a value between 1.5 and 4 [10]. R_0 for data up to 24 May is 2.093, and R_0 = 1.917 for data up to 12 June.

• Serial Interval (T_g): The serial interval is taken to be the sum of the incubation period and the infectious period. The incubation period is the number of days after which a person develops symptoms after being exposed to an infected person, whereas the infectious period is defined as the number of days an infected person remains infectious, or can infect another person, after getting infected.
Doubling time is defined as the time period in which the cumulative infection doubles [25, 26]. Doubling time is a function of both T_g and R_0: doubling time = ln(2) × T_g / (R_0 - 1) [27].

• Effective Reproductive Ratio (R_e): This is a dynamical quantity that reflects, on average during a pandemic, the number of persons infected by one infected person. R_e takes into account the effect of the measures taken to prevent the spread of the infectious disease. It will therefore vary as time progresses and as lockdown, social distancing, contact tracing etc. are employed by the Government and local administrations. The effective reproductive ratio can be calculated in several ways [3] using detailed data that can trace the secondary infections generated by a primary infector. However, in the absence of such data we have employed a different method, described in [28]. The method uses the average latent period (incubation period) d_i, the average serial interval T_g, and the growth rate λ of the epidemic when it is in its exponential phase. The reproductive number can then be estimated as [...] where λ = ln[Y(t)]/t, and f is the ratio of the mean latent period, i.e., the time from infection to onset of infectiousness, to the serial interval or doubling time in our case. T_g is taken to be the sum of the mean latent period and the mean duration of infectiousness. With T_g = 10 days, we find that the effective reproductive ratio has reduced from 1.84 on the 55th day to 1.45 on the 90th day.

This preprint version was posted June 28, 2020 and is made available under a CC-BY-NC-ND 4.0 International license. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

[...] [29] indicate the infectious period is between 3-5 days.

• Incubation rate (σ): The incubation rate is calculated by taking the inverse of the mean incubation period d_i. WHO data and other studies [29] indicate the incubation period is between 5-7 days.
• Mortality Rate (ν): The mortality rate is calculated as the number of individuals who died per day divided by the total infected population.

We have chosen 8 states based on their population density and the number of COVID-19 infected per million population, as shown in Figure (1). We have omitted the low population density, high infection category, as it does not exist in the Indian scenario. We have considered a population density > 500/km^2 as high and < 300/km^2 as low. On the other hand, a number of infections > 500 per 10^6 people is treated as high and < 100 per 10^6 is considered low in our distribution.

Previous results suggest an average incubation period of the SARS-CoV-2 virus of between 5 and 7 days [11, 30, 6]. So, we have studied the scenario for incubation periods of 5 (best-case scenario), 6 and 7 (worst-case scenario) days for each state. Also, for each incubation period, we have calculated the infection for infectious periods d_if = 3, 4 and 5 days.

Andhra Pradesh (AP), a state in the Deccan peninsula of India, has moderate population density and low COVID-19 infection, as shown in Table (1b). The SEIRD results for incubation periods of 5, 6 and 7 days and infectious periods of 3, 4 and 5 days are shown in Figure (2). Detailed data are shown in Table (1a). According to our calculations, R_0(AP) [...]

[...] calculations also show the pandemic should be over in Delhi in 101 to 153 days. 4.8% to 7.99% of the susceptible population may be actively infected in Delhi.
Gujarat (GJ), a western state of India, has moderate population density and also moderate COVID-19 infection, as shown in Figure (1 [...] Table (1g). According to our calculation, R_0(TN) is 2.356. Our computation also shows the pandemic should be over in TN in 95 to 143 days. 6.2% to 10.4% of the susceptible population may be actively infected in TN.

Uttar Pradesh (UP), with the highest population in India, has high population density but low COVID-19 infection, as shown in Figure (1). SEIRD results for UP are shown in Figure (9) and the detailed data are shown in Table (1h). According to our calculation, R_0(UP) is 2.416. Our study also shows the pandemic should be over in UP in 99 to 150 days. 6.5% to 10.8% of the susceptible population may be actively infected in UP.

Finally, India as a whole has high population density but low COVID-19 infection, with only 30 infected per million population. But in absolute figures, with a population of 1.3 billion, this number is alarmingly high and beyond the capacity of the existing medical infrastructure. SEIRD results for India are shown in Figure (10) and the detailed data are shown in Table ( [...]

To get a realistic estimate of the total infected population at the time the disease crosses the R_e ≤ 1 mark, we have calculated R_e for seven periods between days 51 and 90, starting from 14th May, using equation (3.2). A linear fit shows that R_e should go below 1.0 by the middle of August, as shown in Figure (11a).
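The expression referred to as equation (3.2) is not reproduced in this text. As a hedged sketch only: the description given for it (growth rate λ, serial interval T_g, and the ratio f of the latent period to the serial interval) matches a widely used approximation, R = 1 + λ T_g + f (1 - f) (λ T_g)^2, and that form is assumed here. The function names and every numerical input below are hypothetical and are not taken from the study's data; the second function only illustrates how a least-squares straight line through a series of R_e values gives the day on which the line reaches 1.

```python
def effective_r(growth_rate, serial_interval, latent_period):
    """Assumed form: R = 1 + lam*Tg + f*(1 - f)*(lam*Tg)**2, with f = d_i / Tg."""
    f = latent_period / serial_interval
    x = growth_rate * serial_interval
    return 1.0 + x + f * (1.0 - f) * x * x


def day_when_r_reaches_one(days, r_values):
    """Fit r = intercept + slope*day by least squares; return the day where r = 1."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(r_values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, r_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return (1.0 - intercept) / slope


# Hypothetical growth rate (per day) with T_g = 10 days and d_i = 5 days.
print(effective_r(growth_rate=0.04, serial_interval=10, latent_period=5))

# Hypothetical, perfectly linear R_e series, used only to show the extrapolation.
print(day_when_r_reaches_one([50, 60, 70], [1.6, 1.5, 1.4]))
```

With real data the extrapolated crossing day depends strongly on which periods enter the fit, so a result from a few points should be read as indicative rather than precise.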
This mid-August estimate matches a Gaussian fit of the daily infections, which also reaches its peak around the same time, as in Figure ( [...] From the Gaussian fit, the total number of active cases is calculated to be around 5,12,700 (about 0.51 million) at the time of the peak. The total number of infected people will be roughly 3.1 million at the end of transmission. With the SEIRD model, the peak appears around the middle of September.

From the ICMR data [31], it is observed that the current rate of positive cases is 6.4% of the number of tests done. If the test rate remains the same, with a steady increase of 15,000 tests per fortnight, the total number of tests in India will reach approximately 25.8 million around mid-September. The present rate yields a total of 1,650,560 infected cases in India around mid-September. These projections are remarkably consistent with the values predicted from the Gaussian fit. From the Gaussian fit, the total infected at the peak appears to be approximately 1.612 million. It must be noted that the numbers projected from the Gaussian fit exclude the asymptomatic cases that are not being reported officially.

Finally, we compare our result with other countries. For this calculation, we have taken data from Worldometers [22] from 15th February to 25th June. We have chosen Italy (3,969 cases/million population), where the pandemic is virtually over, and Russia (4,254 cases/million population), where the pandemic is still severe but in decline. As per the discussion in the previous paragraph, Figure (11c) shows that, though India has a quite low infection rate (364/million population), [...] per day. Also, at the peak India will see almost 30,000 cases, which is lower than in countries like the USA and Brazil, where the number at the peak was well above 30k.
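The test-based projection quoted earlier in this section can be checked with simple arithmetic. The sketch below assumes an unrounded test count of 25.79 million, which is a back-calculation from the quoted 1,650,560 cases and is not stated in the text (the text gives only "approximately 25.8 million"). The function name is hypothetical, and the positivity rate is passed as cases per thousand tests to keep the arithmetic in integers.

```python
def projected_cases(total_tests, positives_per_thousand):
    """Cases implied by a fixed test-positivity rate (integer arithmetic)."""
    return total_tests * positives_per_thousand // 1000


# 6.4% positivity = 64 positives per thousand tests.
print(projected_cases(25_790_000, 64))  # assumed unrounded test count
print(projected_cases(25_800_000, 64))  # rounded count quoted in the text
```

The two results differ by only a few hundred cases, so rounding the test count to 25.8 million does not change the projection at the level of precision used in the comparison with the Gaussian fit.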
This study can be divided into two parts. The first part contains the estimation, from the data, of a few parameters that were used to predict the course of the disease in India and the selected states. The second part contains the implementation of the SEIRD model with the help of the estimated parameters. In particular, using COVID-19 data we have computed R_0 for India and for 8 states with varying population density and infection status. We have implemented a simple but effective model to explain the growth and decline of the epidemic in the Indian context. Our study shows that for states with a low (Chhattisgarh) or moderate (Uttar Pradesh) infection rate the pandemic peaks as early as June, whereas for a state like Maharashtra it may take a month longer. In view of the unavailability of detailed and structured data from which to project the incubation rate, recovery rate, etc., we have considered a range of values motivated by the trend of the disease in those 8 states and in India. The general trend is that for longer incubation and infectious periods the peak appears later. We have shown that an incubation period of 5 days and an infectious period of 3 days are the most favourable values, for which the infection will be under control within 182 days from the onset (assumed to be 14 March in our study) in India. This means the flattening of the curve will start around 10 September. If the containment measures are not obeyed and community transmission happens, then the effective reproduction ratio will increase and there will be a delay in attaining control over the disease. In the worst scenario the peak may appear around mid-December.

A Gaussian fit to the available data for India indicates that the flattening of the number of daily incidences will start around the middle of August. Although this is almost three to four weeks earlier than the prediction from the SEIRD model, it gives a reasonable idea of the expected number of infected at the peak. This is calculated to be around 1.6 million.
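The durations quoted in this discussion are counted from the assumed onset of 14 March 2020, so they can be converted to calendar dates with a short sketch. The convention that the onset is day 0 is an assumption (counting the onset as day 1 shifts every date one day earlier), and the function name is hypothetical.

```python
from datetime import date, timedelta

ONSET = date(2020, 3, 14)  # onset assumed in the study


def day_to_date(day_count, onset=ONSET):
    """Calendar date reached day_count days after the onset (onset = day 0)."""
    return onset + timedelta(days=day_count)


for label, days in [("end of data window", 90), ("most favourable scenario", 182)]:
    print(label, day_to_date(days).isoformat())
```

Under this convention day 90 falls on 12 June, the last date of the data used, and day 182 falls on 12 September, which is consistent with the 10-12 September window given for India.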
Given an early estimate of R_0 as high as 4 before lockdown, the containment measures have reduced the total infected population to at most one third of what that pre-lockdown value would have projected. States like Chhattisgarh, Karnataka and Kerala have already attained their peaks or are on the verge of attaining them. In particular, our analysis suggests that Chhattisgarh has already attained its peak infection. This is also consistent with the official data. For other states the peak will be attained within 1-2 months from now. So overall, the situation is improving. This is also indicated by the gradual reduction of the R_e values for India in Figure (11a).

The presentation made here has necessary caveats. The computation method for the effective reproduction ratio (R_e) can be improved with more detailed data and by using a time-dependent method suggested by [24]. The apparent mismatch of 3-4 weeks between the Gaussian fit and the model [...] where actually the first case was detected as early as the last week of January. The incubation period and the infectiousness period need to be calculated from more detailed data identifying an index case and then studying its subsequent transmission channels. That will make the prediction more accurate. An age-structured analysis of the disease would be useful to see the effect of the disease on different age groups.

We acknowledge the infrastructural support provided by IIIT Allahabad, where this work was done during the lockdown period.

medRxiv preprint (which was not certified by peer review), doi: https://doi.org/10.1101/2020.06.26.20141077

WHO Director-General's remarks at the media briefing on COVID-19
Phylogenetic analysis of the novel coronavirus reveals important variants in Indian strains
The R0 package: A toolbox to estimate reproduction numbers for epidemic outbreaks
Estimation of the reproductive number of novel coronavirus (COVID-19) and the probable outbreak size on the Diamond Princess cruise ship: A data-driven analysis
Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy
Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen
The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study
A systematic review of COVID-19 epidemiology based on current evidence
cddep.org, COVID-19 in India
Prudent public health intervention strategies to control the coronavirus disease 2019 transmission in India: A mathematical model-based approach
The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: Estimation and application
A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster
Incubation period of 2019 novel coronavirus (2019-nCoV) infections among travellers from Wuhan, China
Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing
Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
COVID-19 pandemic scenario in India compared to China and rest of the world: a data driven and model analysis, medRxiv
Studying the progress of COVID-19 outbreak in India using SIRD model, medRxiv
Clinical presentation and virological assessment of hospitalized cases of coronavirus disease 2019 in a travel-associated transmission cluster
Covid19 statewise status
Covid-19 statewise status
Total coronavirus cases in India
Provisional population totals
Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures
An Introduction to Infectious Disease Modelling
Infectious Diseases of Humans
Infectious disease dynamics
Transmission dynamics and control of severe acute respiratory syndrome
Estimation of the effective reproduction number of influenza based on weekly reports in Miyazaki Prefecture
SARS-CoV-2 (COVID-19) testing: Status update