key: cord-0698343-0ti5n24x authors: Kotwal, Atul; Yadav, Arun Kumar; Yadav, Jyoti; Kotwal, Jyoti; Khune, Sudhir title: Predictive models of COVID-19 in India: A Rapid Review date: 2020-06-17 journal: Med J Armed Forces India DOI: 10.1016/j.mjafi.2020.06.001 sha: 07be88e7d250df9a8bbd712789efdc4faf880202 doc_id: 698343 cord_uid: 0ti5n24x

Background: Mathematical modelling of the coronavirus disease 2019 (COVID-19) pandemic has been attempted by a wide range of researchers since the first cases appeared in India. An initial analysis of the available models revealed large variations in scope, assumptions, predictions, projected course, effect of interventions, effect on health-care services, and so on. A rapid review was therefore conducted for narrative synthesis and to assess the correlation between predicted and actual numbers of cases in India.

Methods: A comprehensive, two-step search strategy was adopted. Databases such as Medline, Google Scholar, medRxiv, and bioRxiv were searched first. Articles were then hand-searched, and known modelers were contacted for unpublished models. Data from the included studies were extracted independently by two investigators and checked by a third researcher.

Results: Based on the literature search, 30 articles were included in this review. For the narrative synthesis, data from the studies were summarized in terms of assumptions, model used, predictions, main recommendations, and findings. Pearson's correlation coefficient (r) between predicted and actual values (n = 20) was 0.7 (p = 0.002), with R² = 0.49. For Susceptible, Infected, Recovered (SIR) and its variant models (n = 16), r was 0.65 (p = 0.02). The correlation for long-term predictions could not be assessed owing to a paucity of information.

Conclusion: The review has shown the importance of assumptions and a strong correlation between short-term projections and actual values, but uncertainty in long-term predictions.
Thus, short-term predictions may be revised as more data become available. The assumptions also need to expand and firm up as the pandemic evolves.

Mathematical modelling of this evolving pandemic has been attempted by a wide range of researchers since the first cases were reported in India. An initial analysis of these models revealed large variations in scope, assumptions, predicted numbers, the projected course of the pandemic in India, the effect of various interventions, the effect on health-care services, etc. The literature search did not reveal any review of the available models. This study was therefore conducted as a rapid review of the mathematical models used to predict COVID-19 in India. Its aims were narrative synthesis and assessment of the correlation between predicted and actual numbers of cases in India.

A review protocol was prepared and submitted to PROSPERO for registration (Application ID: 180513). All articles on mathematical models of COVID-19 in India that met predefined inclusion and exclusion criteria were included. This review covers different types of mathematical modelling, so it did not fit any of the existing guidelines for systematic reviews. It is therefore presented as a rapid review, with narrative synthesis as its first objective. However, the PRISMA guidelines were followed to the extent possible.

To be included in this rapid review, studies had to meet the following criteria: a) a study of predictive modeling; and b) a study done only for India, or a multi-country predictive model with India as one of the countries. The following were excluded: a) perspective articles without modeling; and b) other studies or reviews without modeling. Only studies published in English were included. A comprehensive two-step search strategy was formulated and adopted.
First, databases (Medline, Google Scholar, medRxiv, and bioRxiv) were searched, covering all COVID-19 articles submitted to them. The search was carried out until 10:00 AM (IST) on 22 Apr 2020. The search strategy for Medline using PubMed is provided as Supplementary File No. 1. In the second step, articles were hand-searched, and known modelers were contacted for unpublished modeling of Indian COVID-19 data.

Data extraction: Two investigators (2 and 3) independently extracted data from the included studies onto a data extraction form. Discordant data were resolved through discussion with the third, senior researcher (1). The extracted data were tabulated in two tables. Data were extracted for the following variables: type of mathematical model; software used; profession of the modelers; whether the effect of lockdown was studied; assumptions used; peak infected numbers; and data and data sources.

The main summary measures were the peak infection rate and the predicted number of COVID-19 cases. The models are based on hard data acquired from different sources, so the predicted numbers may differ between studies depending on the mathematical model used and the assumptions made. Where feasible, peak infection figures were to be averaged across models that predicted the full cycle of the epidemic. For a few studies that did not report a predicted value, the value was calculated from the reported model. Predicted values were plotted against actual values for the same date of the epidemic. The relationship between them was explored using Pearson's correlation coefficient and the coefficient of determination. Qualitative synthesis was attempted for the non-quantitative variables. Statistical analysis was performed with Stata Statistical Software: Release 13 (StataCorp LP, College Station, TX; 2013). A p value of less than 0.05 was taken as statistically significant.

Based on the literature search, 30 studies were selected for inclusion in the rapid review. The PRISMA chart is shown in Figure 1. The study characteristics, including variables studied, lockdown effect, date of data collection, and peak infected numbers, are shown in Table 1. The data extracted from the 30 research articles showed that modelling on data available in the public domain started as early as 21 Mar 20 13. The latest data used for modeling were from 13 Apr 20 40. The mathematical techniques used for modeling also varied. The types of models used are depicted in Figure 2, which shows that most studies (17, 56%) used the SIR model or one of its variants. The assumptions made by different models regarding R0 (R naught), infectious period, recovery time, serial interval, etc., are given in

There was a sudden increase in the number of cases on 04 Mar 20 for various reasons, one of them being a change in testing policy 13. Thereafter, the data on testing, cases, and deaths are consistent and amenable to modeling. The earliest SEIR modeling used data from 05 Mar 20 to 23 Mar 20 13. Most models were based on SIR or its modifications. The SIR model was first introduced by Kermack and McKendrick and has since been popular for modeling infectious diseases 5. All SIR models (and their modifications) make certain assumptions, many of which act as model limitations. The commonest are a fixed, homogeneous population; random mixing; compartmentalization; and no allowance for changes in population dynamics and agent characteristics during the epidemic.
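A short simulation makes the compartmental logic of the SIR family, and the assumptions listed above, concrete. The following is a minimal sketch of the classic deterministic SIR model. It does not reconstruct any of the reviewed models. The population size, initial cases, 7-day infectious period and 365-day horizon are hypothetical choices. The two R0 values are simply the ends of the 1.4 to 4.98 range reported across the reviewed models. Transmission is derived as beta = R0 * gamma, and the equations are integrated with a fixed-step fourth-order Runge-Kutta scheme.

```python
"""Minimal deterministic SIR sketch (illustrative; not one of the reviewed models).

Built-in assumptions, as discussed in the text: a fixed, homogeneous population,
random (mass-action) mixing, and parameters that do not change during the epidemic.
"""
from dataclasses import dataclass


@dataclass
class SIRResult:
    peak_infected: float      # largest I(t) seen on the time grid
    peak_day: float           # time (days) at which that peak occurred
    final_susceptible: float  # S at the end of the horizon
    final_infected: float     # I at the end of the horizon
    final_recovered: float    # R at the end of the horizon


def sir_rhs(s, i, beta, gamma, n):
    """Return (dS/dt, dI/dt, dR/dt) for the SIR equations."""
    new_infections = beta * s * i / n
    recoveries = gamma * i
    return -new_infections, new_infections - recoveries, recoveries


def simulate_sir(r0, gamma, n, i0, days, dt=0.1):
    """Integrate SIR with fixed-step 4th-order Runge-Kutta; beta = r0 * gamma."""
    beta = r0 * gamma
    s, i, r = float(n - i0), float(i0), 0.0
    peak_i, peak_t = i, 0.0
    for step in range(1, int(round(days / dt)) + 1):
        k1 = sir_rhs(s, i, beta, gamma, n)
        k2 = sir_rhs(s + dt / 2 * k1[0], i + dt / 2 * k1[1], beta, gamma, n)
        k3 = sir_rhs(s + dt / 2 * k2[0], i + dt / 2 * k2[1], beta, gamma, n)
        k4 = sir_rhs(s + dt * k3[0], i + dt * k3[1], beta, gamma, n)
        # Weighted RK4 increment for each compartment (S, I, R).
        ds, di, dr = (dt / 6 * (a + 2 * b + 2 * c + d) for a, b, c, d in zip(k1, k2, k3, k4))
        s, i, r = s + ds, i + di, r + dr
        if i > peak_i:
            peak_i, peak_t = i, step * dt
    return SIRResult(peak_i, peak_t, s, i, r)


if __name__ == "__main__":
    # Hypothetical inputs: 1 million people, 10 initial infections,
    # a 7-day infectious period (gamma = 1/7) and a one-year horizon.
    for r0 in (1.4, 4.98):
        res = simulate_sir(r0=r0, gamma=1 / 7, n=1_000_000, i0=10, days=365)
        print(f"R0 = {r0}: peak infected {res.peak_infected:,.0f} on day {res.peak_day:.0f}; "
              f"recovered by day 365 {res.final_recovered:,.0f}")
```

The model assumes a fixed, homogeneous, randomly mixing population with constant parameters, so R0 is the only difference between the two runs. The large differences in peak size and timing show why wide-ranging R0 assumptions lead to widely differing predictions. Over a long enough horizon, the recovered fraction z approaches the solution of the standard SIR final-size relation z = 1 - exp(-R0 * z).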
The approach is flexible enough to accommodate all these assumptions. However, doing so increases complexity and makes interpretation harder, and data on these assumptions are often unavailable. Arithmetic, geometric, and exponential progressions are other methods of prediction. Linear regression models, including variants such as lasso and ridge regression, were also used. These methods are easy to understand and good for short-term predictions. However, their inherent properties preclude accurate long-term predictions, as is evident from our review of these models 21, 36. Even techniques such as ARIMA, used alone or in combination with wavelet transformation, may be improved by adding regressors 27, 33. However, because they are based on time series, they may not capture any deviation from past trends.

The predicted numbers varied widely among the models. This may be attributed to differing assumptions and to predictions made for different time periods. Hence it was not possible to synthesize pooled results. It is extremely important to understand the assumptions in the models. Our review showed that a few models did not explicitly state their assumptions 17, 21, 29, while some had too few 15 or too many 13 assumptions. The review also shows how widely the modelling assumptions varied; for example, the value of R0 ranged from 1.4 to 4.98. Assumptions spanning such a wide range have implications for the number of cases the models predict. These assumptions reflect uncertainty about the disease, especially in an evolving pandemic. The study found a fair correlation for short-term predictions, emphasizing the need to correct predictive models as more data become available.
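The review assessed agreement between predicted and actual values with Pearson's correlation coefficient and the coefficient of determination, using Stata. The computation itself can be sketched in a few lines of Python. The predicted and actual series below are hypothetical placeholders, not data from the included studies. The sketch also shows that, for a least-squares line with an intercept, R² is simply r².

```python
"""Sketch: agreement between predicted and actual case counts.

The numbers below are hypothetical placeholders, not data from the review.
"""
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared_ols(xs, ys):
    """R^2 = 1 - SS_res / SS_tot for the least-squares line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def t_statistic(r, n):
    """t for H0: rho = 0, to be compared with a t distribution on n - 2 df."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    # Hypothetical predicted vs actual cumulative cases on matching dates.
    predicted = [500, 900, 1500, 2600, 4200, 7000, 11000]
    actual = [560, 1020, 1400, 2300, 4800, 6500, 12500]
    r = pearson_r(predicted, actual)
    print(f"r = {r:.3f}, r^2 = {r * r:.3f}, OLS R^2 = {r_squared_ols(predicted, actual):.3f}")
    print(f"t = {t_statistic(r, len(predicted)):.2f} on {len(predicted) - 2} df")
```

For instance, the abstract's r = 0.7 corresponds to r² = 0.49, which is the R² it reports. The two-sided p-value is 2 * P(T > |t|) on n - 2 degrees of freedom. The sketch leaves that step to Stata or a library routine such as `scipy.stats.t.sf` so that it stays dependency-free.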
In our opinion, long-term predictions may be difficult because predictive models rely on parsimonious inputs for the sake of better understanding. These inputs, together with the assumptions, may not simulate real-life scenarios. Short-term predictions are nevertheless equally important for health planners, decision makers, etc., in arranging adequate resources to tackle epidemics. Complex or hybrid models are required, especially in an epidemic as unique and evolving as COVID-19. Their assumptions should be explicit and should cover important factors such as the effect of non-pharmacological interventions (NPIs), age structure, interactions, stochasticity, quarantine, isolation, and socio-economic factors. Most models did not incorporate uncertainty in the data, which is an important paradigm of epidemiology. However, this could be attributed to the limited data available at the outset and is not a comment on the approaches or methodology adopted.

Another important contribution of mathematical models is the qualitative information each model generates, which provides a range of inputs to planners at various levels. This review has provided a narrative synthesis of 30 models that can be used by modelers, planners, and researchers. Many models have been released rapidly, a large number without peer review, and the lay press has interpreted them in its own way. This carries the danger of bringing models into disrepute. As researchers and planners, we need to look beyond the straightforward answers from the models (magnitude, numbers, mortality). Instead, we should use models to implement policies that may change the predictions under various scenarios, for the greater public good. For example, a model of the epidemic in Italy divided the population into eight stages: susceptible (S), infected (I), diagnosed (D), ailing (A), recognized (R), threatened (T), healed (H), and extinct (E), collectively termed SIDARTHE 44. With more data now available, future models for India may also pursue further refinements using different approaches and tools, making better use of the models' quantitative outputs.
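The text makes two related points. Progression and regression models can be adequate for short-term extrapolation but are unreliable over long horizons, because they cannot anticipate departures from past trends. A toy example illustrates this. It is a minimal sketch under stated assumptions, not a model from the review. Hypothetical "actual" cumulative cases follow a logistic curve with 100 initial cases, a ceiling of 100,000 and a growth rate of 0.15 per day. An exponential is fitted by log-linear least squares to the first 20 days only, as a simple stand-in for the exponential and regression approaches discussed.

```python
"""Sketch: why short-term extrapolation is easier than long-term extrapolation.

Hypothetical setting: 'actual' cases follow a logistic curve, and an exponential
is fitted to the early phase only by least squares on log(cases).
"""
import math


def logistic(t, c0=100.0, ceiling=100_000.0, rate=0.15):
    """Hypothetical 'actual' cumulative cases: logistic growth toward a ceiling."""
    return ceiling / (1 + (ceiling - c0) / c0 * math.exp(-rate * t))


def fit_exponential(days, cases):
    """Fit log(cases) = log(a) + k * day by ordinary least squares; return (a, k)."""
    n = len(days)
    logs = [math.log(c) for c in cases]
    mean_d, mean_log = sum(days) / n, sum(logs) / n
    k = (sum((d - mean_d) * (lg - mean_log) for d, lg in zip(days, logs))
         / sum((d - mean_d) ** 2 for d in days))
    return math.exp(mean_log - k * mean_d), k


if __name__ == "__main__":
    fit_days = list(range(20))  # only the first 20 days are "observed"
    a, k = fit_exponential(fit_days, [logistic(d) for d in fit_days])
    print(f"fitted growth rate k = {k:.4f} per day")
    for horizon in (25, 90):  # 5 days vs 70 days beyond the data
        predicted, actual = a * math.exp(k * horizon), logistic(horizon)
        print(f"day {horizon}: predicted {predicted:,.0f}, actual {actual:,.0f}, "
              f"ratio {predicted / actual:.2f}")
```

Run as a script, the fitted exponential stays within a few percent of the logistic curve five days past the data. At day 90, however, it overshoots by more than two orders of magnitude, because nothing in the early data signals the coming saturation.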
Mathematical modelling involves equations, and predictions are made by solving them, so there is little scope for subjectivity. The risk of bias therefore cannot be assessed in the way it is in other epidemiological studies, and in this respect the review differs from other rapid reviews. Every predictive modelling study should state its assumptions explicitly, along with the basis for them. The varied assumptions and mathematical models make it difficult to synthesize the results. Another important limitation is assessing the quality of mathematical modelling studies. A consensus may evolve over time, but at present there is no scale for quantifying the quality of such studies.

This review has clearly shown the importance of assumptions and a strong correlation between short-term projections and actual values, but uncertainty in long-term predictions. The results for long-term predictions could not be synthesized because very few studies provided them. Short-term predictions may be revised as more data become available. The assumptions also need to expand and firm up as the pandemic evolves. Data are sparse at the start of a pandemic and correct assumptions are difficult to make, so models with more realistic assumptions may be developed as data accumulate. There is a case for state-specific models in our country, owing to the large variation in assumptions for each state.

References
The biblical plague of the Philistines now has a name, tularemia. Medical Hypotheses
Drought, epidemic disease, and the fall of classic period cultures in Mesoamerica (AD 750-950)
WHO | Disease outbreaks by year. WHO. Available at
An attempt at a new analysis of the mortality caused by smallpox and of the advantages of inoculation to prevent it
A contribution to the mathematical theory of epidemics
Infectious Diseases of Humans: Dynamics and Control
Mathematical modeling of infectious disease dynamics
Mathematical modelling and prediction in infectious disease epidemiology
A Novel Coronavirus from Patients with Pneumonia in China
Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia
Live): 2,697,316 Cases and 188,857 Deaths from COVID-19 Virus Pandemic - Worldometer
Healthcare impact of COVID-19 epidemic in India: A stochastic mathematical model
Age-structured impact of social distancing on the COVID-19 epidemic in India
Modeling and Predictions for COVID 19 Spread in India
Modelling and analysis of COVID-19 epidemic in India
Nature of transmission of Covid19 in India
Epidemic Landscape and Forecasting of SARS-CoV-2 in India
COVID-19 in India: Predictions, Reproduction Number and Public Health Preparedness
Linear Regression Analysis to predict the number of deaths in India due to SARS-CoV-2 at 6 weeks from day 0 (100 cases
Recent update on COVID-19 in India: Is locking down the country enough?
Corona Epidemic in Indian context: Predictive Mathematical Modelling
COVID-19 epidemic: Power law spread and flattening of the curve
COVID-19: Mathematical Modeling and Predictions. ResearchGate. Available at
Study of Epidemiological Characteristics and In-silico Analysis of the Effect of Interventions in the SARS-CoV-2 Epidemic in India
Projections for COVID-19 and evaluation of Epidemic Response: Strategies for India
Forecasting COVID-19 impact in India using pandemic waves Nonlinear Growth Models
Predictions for COVID-19 outbreak in India using Epidemiological models
SEIR and Regression Model based COVID-19 outbreak predictions in India
A Comprehensive Analysis of COVID-19 Outbreak situation in India
Estimating the Final Epidemic Size for COVID-19 Outbreak using Improved Epidemiological Models
A Predictive Model for the Evolution of COVID-19
Real-time forecasts and risk assessment of novel coronavirus (COVID-19) cases: A data-driven analysis
Prudent public health intervention strategies to control the coronavirus disease 2019 transmission in India: A mathematical model-based approach
COVID-19 pandemic: Impact of lockdown, contact and non-contact transmissions on infection dynamics
Susceptibility and Sustainability of India against CoVid19: a multivariate approach
Germany and the USA on the basis of power law scaling
How much of SARS-CoV-2 Infections is India detecting? A model-based estimation
Fear of exponential growth in Covid19 data of India and future sketching
Possibilities of exponential or Sigmoid growth of Covid19 data in different states of India
Risk Assessment of nCOVID-19 Pandemic In India: A Mathematical Model And Simulation
Predictions, role of interventions and effects of a historic national lockdown in India's response to the COVID-19 pandemic: data science call to arms
User's guide to correlation coefficients
Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy

50,00,000 cases in Jun 20