key: cord-0459308-uhlqxe3e authors: Gupta, Sanit; Shah, Sahil; Chaturvedi, Sumit; Thakkar, Pranav; Solanki, Parvinder; Dibyachintan, Soham; Roy, Sandeepan; Sushma, M. B.; Godbole, Adwait; Jaseem, Noufal; Kumar, Pradumn; Ravikanti, Sucheta; Das, Aritra; Babu, Giridhara R.; Bhatnagar, Tarun; Maji, Avijit; Mitra, Mithun K.; Vinjanampathy, Sai title: An India-specific Compartmental Model for Covid-19: Projections and Intervention Strategies by Incorporating Geographical, Infrastructural and Response Heterogeneity date: 2020-07-28 journal: nan DOI: nan sha: 25bf1bf1db3c2507ae63f3ad6bbc5b69e105635e doc_id: 459308 cord_uid: uhlqxe3e We present a compartmental meta-population model for the spread of Covid-19 in India. Our model simulates populations at a district or state level using an epidemiological model that is appropriate to Covid-19. Different districts are connected by a transportation matrix developed using available census data. We introduce uncertainties in the testing rates into the model that takes into account the disparate responses of the different states to the epidemic and also factors in the state of the public healthcare system. Our model allows us to generate qualitative projections of Covid-19 spread in India, and further allows us to investigate the effects of different proposed interventions. By building in heterogeneity at geographical and infrastructural levels and in local responses, our model aims to capture some of the complexity of epidemiological modeling appropriate to a diverse country such as India. Epidemiological compartmental models of the Susceptible-Infected-Recovered (SIR) category and its generalizations [1] [2] [3] [4] [5] [6] [7] have proven to be an invaluable tool in modeling the population-level spread of infectious diseases. These models can often serve to study various prevention, mitigation and preparedness strategies -and can help inform the public health response to the pandemic. 
The limitations of such modelling often arise from the underlying set of assumptions. One common issue in theoretical models is the inherent uncertainty of the datasets from which model predictions are derived. This issue is amplified in the Indian context by economic constraints, cultural norms and other factors that might delay or avoid seeking care. It is in this context that we construct an India-centric model to understand the qualitative features of the spread of COVID-19 and the impact of intervention policies. Our model, summarised in the section below, is a generalized SEIR model with additional compartments representing pre-symptomatic and asymptomatic populations in addition to symptomatic infected individuals. In order to explicitly model uncertainties in testing rates, we introduce a new compartment which accounts for infected populations who have been tested and found to be positive. This, in conjunction with estimates of true infection numbers, allows us to forecast the spread of the epidemic with a greater degree of accuracy. Furthermore, we incorporate methods from estimation theory to mitigate the uncertainty in the initial data and the uncertainty in the fitted parameters of the model. One of the major drawbacks of the SIR class of models lies in the assumption of well-mixed populations. In the context of a large and heterogeneous country such as India, such assumptions are clearly not valid at a national scale. Our meta-population model assumes a well-mixed population at a district or state scale and then uses a novel technique to construct a district/state level transportation matrix that connects different districts or states. The transportation matrix is constructed using available Census data [53] and allows for the generation of district-wide predictions for the entire country. 
These models can be used in conjunction with agent-based models at a finer scale, such as ward- or city-level modeling, in order to build a comprehensive picture of disease spread. Compartmental models can help project the qualitative features of epidemic spread. We note that the quantitative projections depend on the choice of parameters, and hence should be interpreted with caution. The value of this model lies in understanding the effect of different proposed interventions and how local heterogeneous responses to the epidemic, realistic measures of transport connectivity, and the infrastructural and healthcare status of different states can affect the spread of the disease at a national scale. In the rest of this section, we briefly summarize important related work on compartmental models. Extended SIR models of COVID-19 disease transmission in different countries, including India, have been studied previously. In [16], a modified SIR model with an additional compartment that quantifies the symptomatic quarantined infected people is considered to study the COVID-19 outbreak in Mainland China. References [17] and [18] study the impact of COVID-19 in Australia and India respectively using an age-stratified transmission model. The role of the health care system and clinical capacity in reducing COVID-19 morbidity and mortality rates is analyzed in [17]. The authors of [18] explore an extended SEIR model to study the effect of mitigation strategies such as social distancing and testing-quarantine, and show that the testing-quarantine strategy yields better results than social distancing alone and is more sustainable than a prolonged lockdown. As there are a large number of asymptomatic carriers, the authors introduce a separate compartment for them. However, their model assumes instantaneous implementation of the lockdown, which is generally not practical.
In our paper, we incorporate a more realistic scenario: a gradual implementation of the containment or lockdown. The basic model structure is summarized in Fig. 1(a). One of the major difficulties in the detection and treatment of Covid-19 is that a large proportion of those who test positive seem to be asymptomatic [8] [9] [10] [11], with studies claiming that up to 50-75% of the population is asymptomatic. Moreover, since the disease has a long incubation period (5.1 days [8] [9]), these asymptomatic infected individuals can spread the disease to the susceptible population even when they show no symptoms [17]. Further, existing studies suggest that some proportion of the infected population may never develop symptoms over the course of the disease. While there is some debate over whether these true asymptomatics can act as carriers for the infection, studies have suggested that they may still act as carriers, albeit with lower infectivity [12, 23]. In order to model this significant asymptomatic infected population, we introduce two new compartments: True Asymptomatics (E), who never develop symptoms over the course of the disease; and Pre-symptomatics (A), who are currently asymptomatic but will develop symptoms after some incubation period. The pre-symptomatic population becomes symptomatic infected (I) at a rate $\sigma$ (where $\sigma$ is the inverse of the mean incubation period, taken in this study to be 5.1 days). One of the major interventions undertaken in India was the imposition, in stages, of an unprecedented national lockdown from the 26th of March, 2020 to the 31st of May, 2020. Following that, heterogeneous lockdowns have been imposed in various parts of the country by state agencies. Any modeling effort to predict, even qualitatively, the future time course of the epidemic in India must take into account the effect of this lockdown.
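To make the pre-symptomatic to symptomatic transition concrete, the following minimal Python sketch treats it as a constant-rate flow with rate equal to the inverse of the 5.1-day mean incubation period quoted above. The function names and the forward-Euler step are illustrative assumptions and are not taken from the paper; all other flows out of the pre-symptomatic compartment (such as testing) are ignored here.

```python
import math

MEAN_INCUBATION_DAYS = 5.1            # value quoted in the text
SIGMA = 1.0 / MEAN_INCUBATION_DAYS    # per-day rate of the A -> I transition


def fraction_symptomatic(t_days: float, sigma: float = SIGMA) -> float:
    """Fraction of an initial pre-symptomatic cohort that has become
    symptomatic by day t, assuming a constant rate and no other exits."""
    return 1.0 - math.exp(-sigma * t_days)


def step_presymptomatic(a: float, i: float, dt: float, sigma: float = SIGMA):
    """One forward-Euler step of dA/dt = -sigma * A, dI/dt = +sigma * A."""
    flow = sigma * a * dt
    return a - flow, i + flow


if __name__ == "__main__":
    for day in (1, 5.1, 10):
        print(f"day {day}: {fraction_symptomatic(day):.3f} of cohort symptomatic")
```

Under this assumption, roughly 63% of a cohort has developed symptoms after one mean incubation period, which is the usual property of an exponential waiting time.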
We introduce shadow compartments for the population in lockdown corresponding to the Susceptible ($X_S$), Asymptomatic ($X_E$), Pre-symptomatic ($X_A$), and Symptomatic Infected ($X_I$) categories. On the date of imposition of the lockdown, the population starts moving from the normal compartments into the corresponding lockdown compartments at a rate $\kappa_0$. Similarly, on lifting the lockdown, people move back from the lockdown compartments to their normal compartments at a rate $\mu$. In this study we have assumed the timescale for this population movement between the normal and lockdown compartments to be one week in both cases ($\kappa_0$ and $\mu$ are the inverse of this timescale). We introduce a new compartment for people who have tested positive for Covid-19 (P). One of the major public health challenges in India (and indeed, worldwide) has been the possibly low number of people tested, in comparison to projected estimates of the "true" number infected. Since accurate estimation of the real magnitude of the infected population critically affects transmission and hence disease progression, we introduce this new P compartment to reflect shortcomings in the testing criteria as well as in testing kit availability and the overall healthcare infrastructure. People who have been diagnosed as positive for Covid-19, and hence quarantined in isolation or in a medical facility, are removed from the general population, which helps contain the spread of infection. This population can only infect healthcare providers, which is incorporated in our model by a term proportional to the fraction of the population who are healthcare workers [12]. The ratio of the number of people who have tested positive to the "true" number of infected people gives a measure of the testing fraction $f_{test}$, which then influences the rates at which people move from the disease compartments ($E$, $A$, $I$, $X_E$, $X_A$, $X_I$) to the tested positive compartment.
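The gradual movement into and out of lockdown described above can be illustrated with a small sketch. It tracks only the susceptible compartment and its lockdown shadow, uses the one-week timescale stated in the text for both rates, and integrates with a forward-Euler step. The function name, the step size and the restriction to a single compartment pair are assumptions made for illustration; infection dynamics are deliberately left out.

```python
ENTER_RATE = 1.0 / 7.0   # per day, normal -> lockdown (one-week timescale)
EXIT_RATE = 1.0 / 7.0    # per day, lockdown -> normal (one-week timescale)


def simulate_lockdown_flow(s0, days_locked, days_unlocked, dt=0.1):
    """Return a list of (t, S, X_S) tuples.

    For the first `days_locked` days people flow S -> X_S; afterwards the
    lockdown is lifted and people flow X_S -> S. No infection is modelled.
    """
    s, x, t = float(s0), 0.0, 0.0
    history = [(t, s, x)]
    n_locked = round(days_locked / dt)
    n_unlocked = round(days_unlocked / dt)
    for step in range(n_locked + n_unlocked):
        if step < n_locked:
            flow = ENTER_RATE * s * dt    # entering lockdown
            s, x = s - flow, x + flow
        else:
            flow = EXIT_RATE * x * dt     # leaving lockdown
            s, x = s + flow, x - flow
        t += dt
        history.append((t, s, x))
    return history


if __name__ == "__main__":
    hist = simulate_lockdown_flow(1000.0, days_locked=7, days_unlocked=7)
    for t, s, x in hist[::35]:
        print(f"t={t:5.1f}  S={s:7.1f}  X_S={x:7.1f}")
```

With these rates about 63% of the population has entered the lockdown compartment one week after imposition, which is what distinguishes a gradual implementation from an instantaneous one.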
As such, this testing fraction is a critical component of our model, as it determines the fraction of the population moving to the P compartment and hence disease progression at a population level. Unfortunately, systematic and accurate estimation of the testing fraction is a complex problem in its own right and is beyond the scope of this paper. Instead we use a combination of statistical inference and empirical methods to arrive at plausible values of the testing fraction for each state. We assume that Kerala, as the state with the best healthcare indicators in the union, has accurate reporting of Covid-19 related mortality. We then use estimates of the mortality to recovery ratio [54] to arrive at a projected value of the testing fraction for Kerala, given by $f_{test}^{KL} = 0.4$, which implies that Kerala identifies 1 in every 2.5 infected people. We then assign a testing fraction to the individual states by benchmarking their testing performance and healthcare index from the National Health Mission to Kerala's according to an empirical formula. The details of this procedure and the implications for the testing numbers of each state are given in the Appendix. In order to account for different social and economic constraints on different age groups, as well as the well documented age-based differential Covid-19 mortality and severity, we consider a coarse age-stratification within our model where this basic model structure is replicated across three age groups: (i) 0-20 years, (ii) 20-60 years, (iii) >60 years. Such a coarse graining reflects three qualitatively different groups, namely young children and school-going adults, working-age adults, and adults who are more likely to be retired and/or to stay at home. The contact matrices are aggregated according to a weighted average using the age distribution of the Indian population.
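The text does not spell out the aggregation formula, so the sketch below shows one common way to collapse a fine-grained contact matrix into coarse age groups using a population-weighted average. Each coarse entry is the average, weighted by the population of the contributing fine bands, of the total contacts those bands have with the target group. The function name, the argument layout and the example numbers are all hypothetical.

```python
def aggregate_contacts(fine_matrix, population, groups):
    """Collapse a fine age-band contact matrix into coarse age groups.

    fine_matrix[i][j]: mean daily contacts of one person in band i with
                       people in band j.
    population[i]:     number of people in band i (the weights).
    groups:            list of lists of fine-band indices, one list per
                       coarse group.
    """
    coarse = []
    for rows in groups:
        pop_rows = sum(population[i] for i in rows)
        coarse_row = []
        for cols in groups:
            # Contacts with the whole target group, weighted by source band size.
            total = sum(
                population[i] * sum(fine_matrix[i][j] for j in cols)
                for i in rows
            )
            coarse_row.append(total / pop_rows)
        coarse.append(coarse_row)
    return coarse


if __name__ == "__main__":
    # Hypothetical 4-band matrix collapsed to 3 groups: bands 0-1, band 2, band 3.
    fine = [
        [3.0, 1.0, 2.0, 0.5],
        [1.0, 4.0, 6.0, 0.5],
        [1.0, 2.0, 5.0, 1.0],
        [0.5, 0.5, 1.0, 2.0],
    ]
    pop = [10, 30, 40, 20]
    for row in aggregate_contacts(fine, pop, [[0, 1], [2], [3]]):
        print(row)
```

Weighting by the source-band population keeps the total number of contacts made by each coarse group unchanged, which is the property one usually wants to preserve when coarse-graining.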
The contact matrices between these three age groups are obtained from the literature and sub-divided into contact matrices corresponding to home, school, work, and other interactions [13]. This stratification of contact matrices allows for the exploration of different intervention strategies. The susceptible population is infected when it comes into contact with the population in any of the disease compartments, with an inherent state-specific transmission rate. Within the lockdown period, the home contact matrix is unaffected, while the school contact matrix is completely switched off. However, transmission between the population in lockdown and the population not in lockdown (essential workers and other partially open sectors) continues via the contact matrices corresponding to work and other interactions, albeit reduced by a factor $l$ ($0 \le l \le 1$). Likewise, interactions between two individuals in lockdown (work and other interactions) are reduced by a factor of $l^2$. This parameter models the leakiness of the lockdown, with $l = 0$ indicating a perfect lockdown, whereas $l = 1$ indicates a completely leaky lockdown. Thus in lockdown, disease transmission is not completely halted, but rather continues with a reduced contact matrix, which reflects the realities of a complex Indian society. Infected people can move from the disease compartments either directly to the recovered (and removed) compartment, or they can first move to the tested positive compartment (P) and thereafter to the recovered (R) compartment. We extract mortality projections from the R compartment by using previously determined estimates of age-stratified mortality for Covid-19. The meta-populations at a district or state level were allowed to come into contact with each other by estimating the number of people commuting across district boundaries. For this, an inter-district transportation matrix representing the expected number of people commuting across district borders was developed.
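As a hedged illustration of how a leaky lockdown rescales contacts, the sketch below builds an effective contact matrix from the home, work and other components. The school matrix is omitted because it is switched off during lockdown. The scaling applied is the leakiness factor raised to the number of interacting individuals who are in lockdown, so a pair with one person in lockdown is scaled once and a pair with both in lockdown is scaled twice. The function name and the `n_locked` argument are assumptions introduced for this example.

```python
def effective_contact_matrix(home, work, other, leak, n_locked):
    """Contact matrix during the lockdown period (school contacts are off).

    leak:     leakiness in [0, 1]; 0 is a perfect lockdown, 1 is fully leaky.
    n_locked: how many of the two interacting individuals are in lockdown
              (0, 1 or 2). Work and other contacts are scaled by leak ** n_locked.
    """
    if not 0.0 <= leak <= 1.0:
        raise ValueError("leak must lie between 0 and 1")
    if n_locked not in (0, 1, 2):
        raise ValueError("n_locked must be 0, 1 or 2")
    scale = leak ** n_locked
    size = len(home)
    return [
        [home[a][b] + scale * (work[a][b] + other[a][b]) for b in range(size)]
        for a in range(size)
    ]


if __name__ == "__main__":
    # Hypothetical 2-group matrices, for illustration only.
    home = [[1.0, 0.5], [0.5, 2.0]]
    work = [[0.0, 1.0], [1.0, 4.0]]
    other = [[2.0, 1.0], [1.0, 0.0]]
    print(effective_contact_matrix(home, work, other, leak=0.5, n_locked=2))
```

Setting `leak=0` leaves only home contacts for anyone in lockdown, while `leak=1` recovers the unrestricted home, work and other contacts.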
During biological disasters like COVID-19, home-to-work commutes would predominate. Though comprehensive mobility plans for major Indian cities are readily available, transportation planning studies for rural districts are hard to find. So, to estimate the number of workers commuting across a district boundary, we used surrogate measures such as the district-level aggregated home-to-work trip length, worker population density and geographic features. The home-to-work trip length and worker population details were obtained from the 2011 census data [53], and the geographic features of districts were assimilated in GIS from various sources [56] and verified with available information [57]. Suitable traffic analysis zones (TAZs) were developed by concentric segmentation of the district GIS maps [60]. A trip length frequency distribution developed from the home-to-work trip length details was used to estimate the trips across the district boundary from each TAZ. Such trips for all the TAZs of a district were added to represent the worker population expected to commute beyond the district boundary. We assumed that work-related commutes do not extend beyond the adjoining districts, and hence used the relative density of worker population of the adjoining districts to estimate the number of workers commuting to each of them. For this estimation, the GDP (at purchasing power parity) per capita of the districts would be a better choice of parameter, but it was not readily available for all Indian districts at the time of this study. A detailed description of the process is available in Appendix E. There are two "measurements" that must be matched with our model to make predictions about the future progression of the disease. The first of these is the rate associated with mortality, where death occurs on average about 17 days [14] after the onset of the disease, and the second is the number of people who test positive on any given day.
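The last step of the transportation-matrix construction described above, distributing a district's out-commuting workers among its adjoining districts in proportion to their worker population density, can be sketched as follows. The function name, the dictionary layout and the example numbers are hypothetical; the estimation of the out-commuter total from TAZs and the trip length distribution is not reproduced here.

```python
def allocate_out_commuters(out_commuters, neighbour_worker_density):
    """Split a district's out-commuting workers among adjoining districts.

    out_commuters:            workers expected to commute beyond the boundary.
    neighbour_worker_density: {district name: worker population density}
                              for the adjoining districts only.
    Returns {district name: expected commuters to that district}.
    """
    total_density = sum(neighbour_worker_density.values())
    if total_density <= 0:
        raise ValueError("at least one neighbour must have positive density")
    return {
        district: out_commuters * density / total_density
        for district, density in neighbour_worker_density.items()
    }


if __name__ == "__main__":
    # Hypothetical district with two neighbours.
    print(allocate_out_commuters(1000, {"B": 300, "C": 100}))
```

Repeating this for every district fills one row of the inter-district matrix at a time, with zeros for non-adjoining districts under the stated assumption.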
Since these two measurements are inherently unreliable, owing to (among other things) socio-economic reasons which might delay seeking care, the data must be treated as inherently noisy. This noisy signal would then lead to variations in the projections of the model. We incorporate this into our model by using an extended Kalman filter (EKF) [15] to estimate the future trajectory by extending the simulation. The EKF is a Bayesian update method that incorporates the measurement errors and allows us to make predictions consistent with the uncertainty in the data. We used the age-dependent mortality data available in the literature to model the mortality predictions used in the EKF. While our model can simulate a variety of intervention strategies, we propose and investigate two intervention strategies in addition to the base (control) scenario for India. We detail these scenarios below:

• Base scenario: We take the 3rd of May, 2020 as the reference date for the national lockdown in India. As a control simulation, we assumed that the lockdown ends on this date and that, subsequent to this period, all activities return to normal throughout the nation. The predictions for this scenario can then be used to assess the effectiveness of different proposed interventions.

• Intervention 1 - Enhanced testing: In this scenario, the national lockdown ends on the 3rd of May, 2020. Preparatory to this, all states undertake efforts to enhance testing capabilities and criteria on an emergency basis. We divide states into three categories based on their Health index score by the National Health Mission. This supplies a ranking of the states based on their health care preparedness and infrastructure. We then propose a differential target for each state by benchmarking to the testing performance of Kerala, to be achieved by the 3rd of May. The top third of the states achieve Kerala's testing fraction, the middle third achieve 50% of Kerala's testing rate, while the bottom third achieve 25% of Kerala's testing rate. The increase in the testing fraction is assumed to be linear. The details of the categorization of states and hence their proposed target testing rates are provided in the Appendix.

• Intervention 2 - Heterogeneous lockdown: In this scenario, the national lockdown ends on the 3rd of May, 2020. Subsequent to this, schools and colleges remain closed until the end of June, and hence their contribution in the age-stratified contact matrix is switched off. The contact matrices corresponding to work and other interactions are switched on at 50% of their base value, corresponding to continued and stringent enhanced physical distancing measures, public health precautions and enhanced work-from-home situations, wherever possible. The transport matrix is switched on at 50% of its base value, again reflecting a reduction in non-essential travel. If, despite these measures, the total tested positive numbers in any state cross 0.01% of the state's population, the state goes into immediate lockdown. Transport between this affected state and others is completely switched off, and the contact matrices corresponding to work and other interactions drop to 20% of their base value within the state. These restrictions continue until the end of June, and testing fractions continue to hold at their currently determined levels.

Though the two proposed interventions are simulated assuming only the first lockdown till May 03, 2020, the purpose of studying the results is to ascertain the effect of transportation and heterogeneity in managing the cases of COVID-19 across the country while minimizing economic impact, which disproportionately affects those whose contact matrices cannot be reduced due to economic necessity.
Subsequent to this, India went through differing lockdown protocols, which were not simulated in this work. This choice is in part due to the ongoing national and state-level interventions, which are evolving rather dynamically even at the time of this writing. The predictions of these interventions should thus be interpreted in a qualitative fashion, taking into account the various assumptions and limitations of the model we have detailed.

Fig. 2: National level projections of (a) total infected numbers, and (b) predicted mortality estimates from the three scenarios described in the text. Note that the predicted mortality figures show the cumulative numbers. The enhanced testing intervention, where states proportionately improve their testing performance in line with their respective Health index scores, provides a measurable improvement in the total infected numbers, and almost a 30-35% improvement in projected mortality. The heterogeneous lockdown scenario flattens the curve even further, given that the proposed measures are extremely stringent. Also note that this intervention pushes the peak later in time, beyond the current simulated timeframe.

We first present the results of the base scenario and the two proposed interventions from a state-level national simulation. Figure 2 shows the aggregate national results to provide a comparison of the efficacy of the different interventions. The first intervention, corresponding to enhanced testing with a differential target for each state based on its health index scores, provides a measurable improvement in the total infected numbers, bringing it down from 35% at peak to around 25% at peak. The cumulative number of projected deaths drops from around 6 million to around 4.2 million.
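As a quick consistency check on the figures quoted above, the relative improvements under intervention 1 can be worked out directly from the rounded numbers in the text. This is plain arithmetic on those rounded values, not an additional model result.

```latex
\[
\text{mortality: } \frac{6.0 - 4.2}{6.0} = 0.30,
\qquad
\text{peak infected: } \frac{35 - 25}{35} \approx 0.29 .
\]
```

Both values sit at the lower end of the 30-35% improvement quoted in the caption of Fig. 2.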
While testing at higher rates would yield even better results, for example if all states attempted to measure up to the Kerala rates, this intervention is proposed as an achievable target given the current state of the healthcare infrastructure. The heterogeneous intervention, with reduced contacts and reduced transportation, achieves a major improvement on two fronts. On the one hand, it brings down the total number of infections and the total mortality projections because of stringent restrictions. At the same time, it shifts the peak of the curve significantly to the right, delaying the onset of the peak, and thus allows the country time to put in place better testing and other health infrastructure to manage the pandemic. Under this scenario, the cumulative mortality is at 1 million at the end of August, an almost six-fold improvement on the no-intervention scenario. Note, however, that the peak of the infection curve is yet to manifest at this point, which implies that mortality would continue to rise in the absence of any other medical or social interventions. These statistics can help interpret the differential effects on the different states under the proposed interventions. First, we discuss the baseline no-intervention scenario. Kerala has some of India's best healthcare and socioeconomic indicators alongside a robust public healthcare system. The high testing rates observed thus far in Kerala are reflected in the large testing fraction chosen in our model. This causes all three scenarios to have modest increases in disease progression (in % population) in comparison with other states. For instance, in the no-intervention scenario, the peak infected numbers are about 6%. This contrasts with the no-intervention scenario of West Bengal, which peaks at 60% (note also the much larger population of West Bengal).
Likewise, states such as NCT Delhi show more favourable disease progressions than neighbors such as Uttar Pradesh, which have lower healthcare indices and fewer tests per million. To compare interventions, we use the peak simulated infection as a percentage of the population of each state. Since India has on average about 0.50 beds per thousand people [59], this is a good measure of the potential load on the healthcare system. This is important given that cities such as Mumbai and Delhi have already seen a scarcity of available hospital beds. Other measures of interest, such as total infecteds and total mortality, track the peak infected percentage and can be inferred from the figures in the main text and appendix. Both our proposed interventions do well in lowering simulated infections, though their effectiveness strongly depends on the state. For instance, under intervention 1, Kerala's peak infection rate sees a reduction from 6% to approximately 3.5%, whereas West Bengal sees a reduction from 60% to 35%, a factor of about 1.7 in both cases. Gujarat's peak infections reduce from 40% to approximately 25%, once again a similar overall reduction, by a factor of 1.6. Karnataka, on the other hand, shows a dramatic improvement from 26% to 2%, which is a reduction in peak infections by a factor of 13. To understand this, we look at the testing rates and health care indices of these states. Kerala has one of the highest testing rates in the country at 0.4, in comparison with Gujarat, which has a testing rate of 0.235, and both states are in the upper tertile of the health index rankings. Karnataka also has a relatively high health index ranking, putting it in the upper tertile as well, though its current testing rate is quite low at approximately 0.144. Under intervention 1, Karnataka's testing rate was moved up to Kerala's value of 0.4. We believe this, alongside the reduction in contact matrices, explains the rapid reduction seen in Karnataka's case.
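The reduction factors discussed above follow directly from the quoted peak percentages, and the bed availability can be put on the same percentage scale for comparison. The short sketch below does both using only numbers stated in the text; the helper name and the dictionary layout are illustrative.

```python
def reduction_factor(before_pct, after_pct):
    """Ratio of peak infections without and with the intervention."""
    return before_pct / after_pct


# Peak infected (% of state population): (no intervention, intervention 1),
# as quoted in the text.
PEAKS = {
    "Kerala": (6.0, 3.5),
    "West Bengal": (60.0, 35.0),
    "Gujarat": (40.0, 25.0),
    "Karnataka": (26.0, 2.0),
}

BEDS_PER_THOUSAND = 0.50
BEDS_PCT_OF_POPULATION = BEDS_PER_THOUSAND / 1000 * 100   # 0.05 %

if __name__ == "__main__":
    for state, (before, after) in PEAKS.items():
        print(f"{state:12s} reduction factor {reduction_factor(before, after):.1f}")
    print(f"Hospital beds: {BEDS_PCT_OF_POPULATION:.2f}% of population")
```

Beds amount to about 0.05% of the population, far below even the lowest simulated peak infection percentages above. Not every infection needs a bed, so this is only a rough indication of potential load.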
To further inspect this, we look at two other states with proximate testing rates, namely Punjab and Himachal Pradesh. Himachal Pradesh has a relatively high health index score, and hence moves to Kerala's testing rate, causing the peak infection to come down from 12% to approximately 2%. Punjab likewise has a high Health index, putting it in the upper tertile, and hence its simulated results should resemble those of Karnataka and Himachal Pradesh in showing large reductions in infection rates. However, we only see approximately a 2-fold decrease in the disease progression as measured by the aforementioned ratio. We believe that this discrepancy might be due to differences in the transport rate between these states. In comparison to the baseline scenario and intervention 1, our proposed intervention 2 is far more severe: a heterogeneous lockdown is imposed at a strict threshold set at 0.01% of the state's population. This period of heterogeneous lockdown is represented in the figures as the shaded region. Such a serious intervention clearly produces drastic results, reducing the simulated peak infections to a manageable 0.02% of the population for Gujarat and 0.001% of the population for Kerala. Likewise, several other states show a dramatic reduction in the number of infections under intervention 2. The state of West Bengal is an exception to this trend: under the strict intervention 2 there is no abatement of the spread of infections. This is likely due to the fact that the low (existing) testing rate of West Bengal is insufficient to curb the growth of infections. This yet again highlights the importance of testing. A major feature of our model is its capability to generate district-level predictions for the entire country. As a demonstration of this level of granularity, we simulate the time course of the pandemic at a district level for the whole of India until the end of May.
The data for reported cases, recoveries and mortality were obtained until the 3rd of May, and the simulations were run until the 26th of May assuming that lockdown was lifted after the 3rd of May and that normal life and transportation resumed at pre-lockdown levels. The results are shown in Fig. 9. Note that, as with all the results of the model, these predictions depend on the various model assumptions and parameter values, and should only be interpreted in a qualitative sense. Note also that these projections are not meant to simulate reality, since the nation went through various levels of lockdown of varying, heterogeneous severity during this period. Our simulation aims to show that national and even state-level predictions can often mask a wide range of performances depending on the level of granularity, and therefore accurate models aiming to predict the pandemic should take this granularity into account. In addition, improved parameter estimates can help provide more accurate predictions. Also, note that for districts which have not reported any cases, one can in principle adopt two approaches to these "silent" districts: one can trust the reporting of the data and assume there have been no cases in these districts, or one can assume that the reporting itself is uncertain and that the true numbers in these districts are unknown. We show in Fig. 9 the results corresponding to the first assumption, where we trust the data reporting. The colour map corresponds to the total number of pre-symptomatic and symptomatic infected people, as well as all diagnosed cases ($A + I + X_A + X_I + P$). Note that the country as a whole presents a very heterogeneous status of disease progression at any given point of time. The testing rates introduce an immediate source of variability, while the transportation matrix ensures a disparate spread of the disease once lockdown is lifted, even within districts of the same state.
Note that since this full lifting of lockdown and a return to pre-lockdown contact rates is not reflective of the true situation, the model projects numbers which are much higher than the true reported number of infections. A faithful prediction would involve incorporating time-varying heterogeneous interventions for different districts, as was implemented in practice. While our model is capable of incorporating such heterogeneous interventions, we choose not to attempt to simulate this since the policy for individual states was often unclear.

Fig. 9: District level predictions of the projected number of cases at the end of May. The total number of infections are reported as the sum of the Pre-symptomatic (A), Symptomatic Infected (I), the corresponding lockdown compartments ($X_A$ and $X_I$), and the diagnosed population (P). We do not show the true Asymptomatic population (E and $X_E$) since they never display symptoms and hence would not be a burden on the healthcare infrastructure.

Another simplification we have made in the current model is that the transmission rate is assumed to be the same for all districts within a given state, and was determined by fitting the state data to the model. For accurate projections, the data for each district needs to be fitted individually. We choose not to do this since many districts have very few cases, such that an accurate determination of the transmission rate is not feasible. We now turn to quantifying the effect of our mobility matrix on the spread of infection in the country. We simulate the country at a district level using input data until the 3rd of May, and then run our simulations until the 26th of May, under two scenarios: with the transportation matrix switched on, and with the transportation matrix switched off. All other conditions, including resumption of all contacts to pre-lockdown levels, were kept constant between the two simulations. The comparative district-wide plots on the 5th of May and 26th of May are shown in Fig. 10.
For the 5th of May, since this is immediately after lockdown, the difference between the transportation and no-transportation scenarios is minimal, as can be seen in Fig. 10(a). However, with time, the unequal mixing due to resumed mobility of the population kicks in, leading to drastic differences in infected numbers, shown in Fig. 10(b). We wish to highlight two important features that can be attributed to the effect of transportation. Firstly, there is an overall rise in the number of cases due to the mixing of susceptible and infected populations through transportation, as is expected. More interestingly, clusters with very high infection numbers emerge when the transportation matrix is switched on. Some such sample clusters are marked in the figure. This highlights how transportation can bring in new infections and cause a growth of cases in areas that would otherwise have the epidemic under control. Note that this effect is heterogeneous: there are areas in the country where the infection stays within control even with transportation. This highlights the inherent disparity in transportation patterns, and suggests that there are more effective methods of epidemic control than blanket bans on travel. Transportation may be allowed in different areas at different rates while still keeping the spread of the epidemic in check. Our model aims to provide a framework within which to incorporate multiple levels of heterogeneity appropriate to a country such as India. Accurate quantitative projections require dynamic updating of the model to take into account time variations of parameters such as testing rates, higher quality and more reliable data, and clarity regarding heterogeneous policy implementations at the district and state levels.
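The qualitative effect of switching the transportation matrix on or off can be reproduced with a toy example. The sketch below couples two districts with a plain SIR model, which is a deliberate simplification of the paper's compartment structure, and lets a fraction of each district's contacts occur with the other district. All parameter values, the function name and the symmetric coupling are hypothetical choices made only to illustrate the mechanism.

```python
def simulate_two_districts(commute_fraction, days=60, dt=0.1,
                           beta=0.3, gamma=0.1):
    """Two-district SIR sketch; returns final infected counts [I_A, I_B].

    District A starts with 100 infections, district B with none.
    commute_fraction is the share of contacts made in the other district;
    0.0 corresponds to transportation switched off.
    """
    n = [1_000_000.0, 1_000_000.0]
    i = [100.0, 0.0]
    s = [n[0] - i[0], n[1] - i[1]]
    r = [0.0, 0.0]
    c = commute_fraction
    for _ in range(round(days / dt)):
        prevalence = [i[0] / n[0], i[1] / n[1]]
        # Infectious prevalence experienced by residents of each district.
        seen = [
            (1 - c) * prevalence[0] + c * prevalence[1],
            (1 - c) * prevalence[1] + c * prevalence[0],
        ]
        for k in range(2):
            new_infections = beta * s[k] * seen[k] * dt
            new_recoveries = gamma * i[k] * dt
            s[k] -= new_infections
            i[k] += new_infections - new_recoveries
            r[k] += new_recoveries
    return i


if __name__ == "__main__":
    print("transport off:", simulate_two_districts(0.0))
    print("transport on: ", simulate_two_districts(0.05))
```

With the coupling set to zero, district B never sees a case, while even a small commuting fraction seeds an outbreak there. This mirrors the observation that transportation can bring new infections into areas that would otherwise have the epidemic under control.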
Nevertheless, in order to benchmark the basic accuracy of our model, we show a comparison of the predicted trends under the base scenario and the two proposed interventions in this manuscript with the actual reported data for the number of diagnosed Covid-19 cases. Note that for our simulations, the reported data for the positive population and the Covid-19 associated mortality were incorporated until the 3rd of May, and the simulations were then run until the end of August. The reported data was considered until the 15th of July. The comparison between the different scenarios is shown in Fig. 11.

Fig. 11: Comparison of the predicted trends with the reported data. The data is till 15th July, 2020, while the predictions were generated using a dataset till 3rd May, 2020.

The data qualitatively agrees with the predicted trend under intervention 2, which corresponds to a heterogeneous lockdown scenario with reduced contacts due to stringent physical distancing and public health measures, reduced nationwide transportation, and local lockdowns depending on the severity of the outbreak. We note that this is the scenario which corresponds most closely to the actual policies implemented in India after the end of the national lockdown on the 3rd of May. The base scenario considers the situation where, post May 3rd, the nation returned completely to pre-lockdown levels of contacts and transportation. This was evidently not the case, as India went through varying levels of lockdown post May 3rd, and varying levels of unlock protocol as well, post June 8th. The first intervention, which again corresponds to resumption of normal life post May 3rd with the addition of health-performance based enhanced testing, also does not correspond to the real-life situation for similar reasons. Thus it is expected that the base scenario and intervention 1 predict much higher infection numbers compared to the actual data.
The broad qualitative agreement between the predictions of the second intervention and the actual data serves as a strong consistency check on the background assumptions and methodology of the model, even in the absence of implementation of granular district level intervention policies and testing rates. Incorporation of these granular details in the model can be used to generate more accurate quantitative predictions to guide future policy.

We use a simple age stratified model in order to capture differential disease outcomes, intervention strategies, and transportation effects. We use three age categories: people under 20 years, people between 20 and 60 years and people above 60 years. While a finer stratification is available, we chose these three coarse grained age stratifications in order to minimize the number of compartments, and hence parameters, in the model, while at the same time ensuring that one can test different India specific interventions. One common strategy is to ensure closure of schools and colleges across the country for an extended period of time, which affects the 0-20 age bracket. Further, the inter-district transportation rates were computed for the worker population of a particular district, and hence these were incorporated only in the 20-60 year age bracket. Finally, projected mortalities were estimated using an age-dependent mortality rate, as is relevant for Covid-19. An important ingredient in our model is a state-wide identification of testing rates. This is important because it allows us to build in a state-level heterogeneity that incorporates (i) the Health Index and (ii) the number of Covid-19 tests per million population. The Health Index provides a measure of the public health infrastructure of states, including hospital bed availability, number of ASHA and ANM personnel and other relevant health indicators. This is important because the state of the public health infrastructure constrains the response of a specific state.
Contact tracing and other measures depend on the state of health preparedness of the state, which is hence an important metric that must be taken into account. The second metric, the number of Covid-19 tests, depends on the testing policy and on the availability of testing kits, and also affects identification and quarantine efforts. It should be emphasized that the numbers for the testing rates may not be the "true" numbers; however, this empirical estimate helps capture the heterogeneous response in different states of the country and hence critically affects outcomes and projections at a national level. Finally, we note that the testing rates are taken as (static) numbers in our model, whereas they are perhaps more accurately described as time series. Incorporating the time series as a determinant of the testing rates can help provide more quantitative predictions for the progression of the epidemic. However, we note that there have been consistent concerns about the accuracy of the data reporting, and further, that accurate quantitative projections require granular details of local intervention policies, which are currently not available for the nation as a whole. Since our aim was to provide a framework for incorporating heterogeneity in testing rates, we choose to restrict the model to static rates for the purposes of the current analysis.

An important contribution of the model is to estimate and construct a national transportation matrix at a district level from available census data. Our model simulations show the importance of transportation in the spread of the epidemic and underscore the importance of incorporating this in any theoretical study. Although the transport network in this paper is constructed by inference, as described in the methods section, future work should focus on building real data-driven mobility networks for the entire country at a granular level.
We also note that the national level lockdown set in motion an unprecedented large scale movement of migrant labourers throughout the country. Similar large scale migration also happened with the onset of the unlock protocols. The transportation matrix proposed in this work builds on mobility patterns in normal times and does not take into account this large scale migration. Quantitative estimates of the patterns of this migration can help in improving the model and producing better quantitative predictions.

The limitations of our model should be emphasised to interpret the results in the appropriate context. The first limitation of our model is the fact that compartmental models are built on the assumption that the underlying population is well mixed. Though this is somewhat mitigated by the fact that our model has structure (age structure and geographic specificity), the mixing assumption still implies that there are fluctuations not captured by our model. Furthermore, the aggregation of the contact matrices into three age bins is an approximation to the true dynamics. We also include the health indices of each state in our model through the assumptions detailed in the description of the interventions. The separation of all states into three groups assigned to execute intervention policies is a further approximation that allows us to incorporate the health indices in a straightforward fashion. Simplifying assumptions were also made about the infection rates. The district infection rates are pegged to the state infection rates, and the coefficient that models the inefficiencies in the lockdown is pegged to the national/state level in the appropriate simulation. Another assumption of our model is that the testing rates are not functions of time. Besides this, our model is built on simplifying assumptions about modelling various parameters via their median values, though this can be generalized easily.
For instance, we match model predictions against reported mortality by assigning 2% of the simulated infections to become fatal in a median of 17 days. Both these numbers are better approximated as probability distributions. These distributions are often long-tailed, and hence a single representative parameter value may provide poor projections. A stochastic version of this model which incorporates these underlying probability distributions would be more relevant from a predictive standpoint. Finally, we made the simplifying assumption that the noise models for the measurement and state errors in the extended Kalman filter are Gaussian, as described in the methods section. We presume that this is adequate for large populations but highlight it here for the sake of completeness.

We present a comprehensive compartmental model that is relevant in the context of India. We take into account a detailed generalization of an SEIR model that reflects the disease dynamics of Covid-19. In addition, we use state level health metrics and Covid-19 responses to model the heterogeneous spread, which is essential for a country of the scale of India. Further, our district-level transportation matrix uses Census data to project the number of people coming into and moving out of a particular district and hence builds in a geographical heterogeneity. Our proposed interventions must then be interpreted against the background of this realistic national model. We show how a combination of various strategies may be utilised to emerge from this pandemic while minimizing its human and economic costs. We hope that our model is relevant in the context of organising a more comprehensive solution to the ongoing pandemic in India. Our model, which is documented with open source code made available online [50], can be modified to qualitatively model both the spread of the disease and the economic cost of interventions.
Furthermore, in addition to age and district level stratification, our model may be generalized to include income groups. Such a generalised model would potentially address how different socio-economic groups are affected by the pandemic and the onset of the loss of income due to lockdown. It is generally accepted that the optimal strategy to control the pandemic in any region with an active lockdown is to use that time to reinforce testing facilities, hospital beds and personal protective equipment. We hope that our model plays some role in assessing the risks relative to the rewards in combating COVID-19.

In conclusion, we would like to end with a caveat. For a country of the scale and complexity of India, it is impossible for any single model to predict accurately the course of an epidemic. Local responses, super-spreader events, time varying strategies and interventions, and other spatio-temporal fluctuations all play an important role in the final trajectory of the epidemic. Models such as this one can only serve as a guiding tool for policy-makers. A holistic approach to the pandemic requires a combination of expertise from public health officials, health-care workers, epidemiologists, sociologists, and economists, in addition to modelers, to come together to determine the most effective response, contingent on the available data. The present work should be taken in the spirit of being only one component in this multi-faceted approach, and should not be considered in isolation.

The meta-population model presented in this manuscript can be simulated at a district or state level. Let $S_i^{\alpha}$ denote the susceptible population in the age group $\alpha$ in the $i$-th district/state. $N_s = 36$ is the total number of states and union territories in India. The dynamics of the susceptible population is given by, The susceptible population becomes infected on contact with infected individuals (asymptomatic, presymptomatic, and symptomatic) across all age groups.
The infection term carries the inherent infectivity of the $i$-th district. We assume that the true asymptomatics infect with a lower infectivity ($b_1 < 1$) than presymptomatic or symptomatic individuals. The diagnosed population can only infect healthcare workers and this lower infectivity is captured by the parameter $b_2$ ($\ll 1$). The contacts between different age groups were decomposed into home, school, work, and "others" and were calculated using the data in Prem et al. [49]. For the three-tiered age structuring in our model, the contact matrices are $3 \times 3$ matrices and are given by, The total contact matrix is the sum of all four of these constituent matrices, $C = C_{\text{home}} + C_{\text{school}} + C_{\text{work}} + C_{\text{others}}$. Since all lockdowns are imperfect, the susceptible population can come into contact with the infected compartments in lockdown as well. This contact is however reduced from the non-lockdown contact matrix for work and other contacts by a factor $C_1$, which then indicates the leakiness of lockdown. We set the school contact matrix to zero since schools are completely closed during this period. This lockdown contact matrix is then given by $C_L = C_{\text{home}} + C_1 (C_{\text{work}} + C_{\text{others}})$.

The second term models the mobility of the working population into and out of a particular district. This transport term is effective only if lockdown is not active, which is implemented by the prefactor $(1 - \Xi(t, T_L, T_E))$, where $\Xi(t, T_L, T_E) = 1$ if $T_L \le t \le T_E$ and zero otherwise, and $T_L$ and $T_E$ denote the start and end dates of the national lockdown. The mobility or transportation matrix $M^{\alpha}_{ij}$ denotes the number of people in age group $\alpha$ who leave from district $j$ to arrive at district $i$ per day. Our transportation matrix, as detailed subsequently, is calculated for the working age population, and hence is non-zero only for the 20-60 age group. The final two terms in the time evolution equation of the susceptible population model the rate ($k_0$) at which the susceptible population locks down. The lockdown is effective from time $T_L$ to $T_E$. After the lockdown is lifted at time $T_E$, the population returns to a susceptible state at a rate $\mu$, where $\Theta$ represents the usual Heaviside function.

Now, of the susceptible population who become infected, we assume that a fraction $f$ are the true asymptomatics (E), i.e. people who will never show any symptoms over the entire progression of the disease. The time evolution of this compartment is given by, Note that this equation closely mirrors the dynamics of the susceptible population, with a fraction $f$ of the newly infected arriving in the E compartment. The transportation matrix accounts for inter-district/state travel, and the $k_0$ and $\mu$ terms reflect the imposition and lifting of lockdown. The penultimate term models the rate $t'$ at which these true asymptomatic people are detected. This can be achieved by contact tracing of known infected individuals. The final term models the recovery of these individuals at a rate $\gamma_1$, which is the inverse of the recovery time for the asymptomatic population.

The remaining fraction $1 - f$ of the population transitions into the presymptomatic (A) category. The time evolution of this compartment is given by Note that the testing of presymptomatic individuals is achieved at a rate of $t''$ and these presymptomatic individuals progress to the symptomatic stage at a rate $\sigma$, which is the inverse of the mean incubation period of the disease.

The presymptomatic people develop symptoms and move to the symptomatic infected (I) compartment. The infected population can be diagnosed via positive tests at a rate $t$, or they may recover directly without having been diagnosed at a rate $\gamma_2$. In addition they transition to the lockdown compartments and vice versa, and also travel from other districts or states, as in the previous cases. The time evolution of this compartment is then described by, We now turn to the time evolution of the shadow compartments corresponding to the lockdown population.
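Before those equations, the leaky-lockdown contact matrix and the lockdown indicator $\Xi$ introduced above can be illustrated with a minimal sketch. The $3 \times 3$ matrix entries below are placeholder numbers chosen for illustration, not the values derived from Prem et al. [49], and all function names are hypothetical.

```python
# Placeholder 3x3 contact matrices (rows/columns: 0-20, 20-60, >60 years).
# These numbers are illustrative assumptions, not the paper's values.
C_HOME = [[2.0, 1.0, 0.5], [1.0, 1.5, 0.5], [0.5, 0.5, 1.0]]
C_SCHOOL = [[4.0, 0.5, 0.0], [0.5, 0.25, 0.0], [0.0, 0.0, 0.0]]
C_WORK = [[0.25, 0.5, 0.0], [0.5, 3.0, 0.25], [0.0, 0.25, 0.25]]
C_OTHERS = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 0.5]]


def mat_add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[s * x for x in row] for row in a]


def total_contacts():
    """C = C_home + C_school + C_work + C_others."""
    return mat_add(mat_add(C_HOME, C_SCHOOL), mat_add(C_WORK, C_OTHERS))


def lockdown_contacts(c1):
    """Schools closed; work and other contacts reduced by the leakiness c1."""
    return mat_add(C_HOME, mat_scale(mat_add(C_WORK, C_OTHERS), c1))


def lockdown_indicator(t, t_start, t_end):
    """Xi(t, T_L, T_E): 1 while the lockdown is active, 0 otherwise."""
    return 1 if t_start <= t <= t_end else 0


def transport_prefactor(t, t_start, t_end):
    """(1 - Xi): inter-district transport is switched off during lockdown."""
    return 1 - lockdown_indicator(t, t_start, t_end)
```

With `c1 = 0` the lockdown matrix collapses to the home contacts alone, while larger values of `c1` represent a leakier lockdown.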
Since lockdowns are leaky, as discussed previously, the susceptible population in lockdown can come into contact with the non-lockdown population with the contact matrix $C_L$ and with other lockdown population with a further reduced contact matrix $C_{LL}$ given by $C_{LL} = C_{\text{home}} + C_1^2 (C_{\text{work}} + C_{\text{others}})$. Note that the lockdown is assumed not to impact home contact matrices at all. Note that for the population under lockdown, the mobility terms do not appear in the time evolution equations, since this population, by definition, cannot travel between districts. Analogously, we can write down the evolution equations for the $X_E$, $X_A$ and $X_I$ compartments as follows,

Finally, we turn to the compartments for those people who have been diagnosed as positive for Covid-19 (P) and the recovered/removed (R) compartment. For the P compartment, the total number of positive cases is the sum of all people who have tested positive from the $E$, $A$, $I$ compartments and their lockdown counterparts $X_E$, $X_A$, $X_I$. These people recover at a rate given by $\gamma_3$. Thus, we have, People can recover either from this diagnosed compartment (P) or directly from the symptomatic and true asymptomatic compartments ($I$, $X_I$, $E$, $X_E$) without having been detected over the course of progression of the disease. This recovery compartment then evolves according to the equation, This set of equations completely defines the time evolution of the model. At each meta-population level (district or state), there are 30 compartments: 10 population compartments, times 3 age stratifications for each of them. In subsequent appendices, we will discuss the choice of parameters for which this model was simulated.

The model presented here is necessarily complicated, given the complex disease trajectory of SARS-CoV-2 as well as the complex societal and geographical realities of India. The parameters of the model sensitively determine the disease trajectory and the geographical spread of the disease.
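To make the flows between compartments concrete, the following is a minimal sketch of a reduced version of the model: a single district, a single age group, no lockdown compartments and no transport. It is an illustration assembled from the verbal description of the terms, not the authors' implementation. The names `beta`, `t1` and `t2` (standing for the infectivity, $t'$ and $t''$) are assumptions, and every parameter value except $\sigma = 1/5$ and $b_2 = 0.002$ is a hypothetical placeholder.

```python
# Reduced single-district, single-age-group sketch (assumption: no lockdown,
# no transport). State order: S, E, A, I, P, R.
PARAMS = {
    "beta": 0.3,        # infectivity (hypothetical value)
    "b1": 0.5,          # reduced infectivity of true asymptomatics (hypothetical)
    "b2": 0.002,        # reduced infectivity of diagnosed cases (from the text)
    "f": 0.3,           # fraction of true asymptomatics (hypothetical)
    "sigma": 1.0 / 5,   # presymptomatic -> symptomatic (from the text)
    "gamma1": 1.0 / 10, # recovery of E (hypothetical)
    "gamma2": 1.0 / 10, # recovery of undiagnosed I (hypothetical)
    "gamma3": 1.0 / 10, # recovery of P (hypothetical)
    "t": 0.1,           # testing rate of I (hypothetical)
    "t1": 0.01,         # testing rate of E, i.e. t' (hypothetical)
    "t2": 0.01,         # testing rate of A, i.e. t'' (hypothetical)
}


def step(state, p, dt):
    """Advance the six compartments by one explicit Euler step of size dt."""
    s, e, a, i, pos, r = state
    n = sum(state)
    # Force of infection: E and P infect with reduced weights b1 and b2.
    force = p["beta"] * (p["b1"] * e + a + i + p["b2"] * pos) / n
    new_inf = force * s
    ds = -new_inf
    de = p["f"] * new_inf - (p["t1"] + p["gamma1"]) * e
    da = (1 - p["f"]) * new_inf - (p["t2"] + p["sigma"]) * a
    di = p["sigma"] * a - (p["t"] + p["gamma2"]) * i
    dpos = p["t1"] * e + p["t2"] * a + p["t"] * i - p["gamma3"] * pos
    dr = p["gamma1"] * e + p["gamma2"] * i + p["gamma3"] * pos
    rates = (ds, de, da, di, dpos, dr)
    return tuple(x + dt * dx for x, dx in zip(state, rates))


def simulate(state, p, days, dt=0.1):
    """Integrate for the given number of days and return the final state."""
    for _ in range(int(round(days / dt))):
        state = step(state, p, dt)
    return state
```

Because every outflow from one compartment appears as an inflow to another, the total population is conserved, which is a convenient sanity check when extending the sketch with lockdown or transport terms.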
We have used accepted parameters for the disease progression from the literature, while India-specific parameters have been estimated by a variety of methods as detailed below. Note that different parameter choices will lead to different projections from the model. Hence the numbers reported in this analysis should be interpreted as qualitative trends only, in order to gauge the effectiveness of different interventions and the qualitative projected severity of the disease. This model will be continuously updated as and when better estimates of the parameters become available.

Basic infectivity parameter (fitted per state): This was determined individually for each state by fitting the available data. The full list of the parameter for each state is given in the subsequent Table.
$\sigma$ = 1/5: The rate of progression from pre-symptomatic to symptomatic individuals. This is the inverse of the mean incubation time of the disease, which is taken to be 5 days [8, 9].

India specific parameters
$C_1$ (fitted per state): The leakiness of lockdown. Indicates the reduction in the workplace and others contact matrices under lockdown. This was determined individually for each state by fitting the available data. The full list of the parameter for each state is given in the subsequent Table.
$b_2$ = 0.002: The confirmed positive people (P) can only infect health-care workers, since they will be in strict quarantine/hospitalization. As a proxy for this parameter, we can then use the estimate of the number of healthcare workers in India, which is 20 per 10000 population [12].
$k_0$ = 1/7: This represents the rate at which the general population goes into lockdown. In the absence of concrete data, we assume a timescale of one week. Note that different values of this rate will not affect the disease progression, but will only shift the projected curves. [See A. Dhar [18]]
$\mu$ = 1/7: This represents the rate at which the population exits lockdown. Again in the absence of data, we assume a timescale of one week.
The testing rate for each state is estimated using a semi-empirical method. We detail the calculation of this testing rate below. The testing rate was then used to obtain estimates of $t'$ and $t''$.

Testing rates: In order to compute the testing rates, we follow the procedure outlined in Ref. [58]. The case fatality ratio is defined as $\mathrm{CFR} = \frac{\text{Number of deaths}}{\text{Number of deaths} + \text{Number of recovered}}$. Let $D(t)$ denote the number of deaths at time $t$, and $r(t)$ denote the number of recovered people at time $t$. Let us define the ratio $\rho = D(t)/r(t)$. The CFR can then be estimated from this ratio as $\mathrm{CFR} = \frac{\rho}{1+\rho}$. However, because of under-reporting and testing criteria, if the true number of infected $I(t)$ is uncertain, then the true number of recoveries is also uncertain. An analysis for various countries shows that $\rho$ is likely to be a biased statistical estimator for the CFR. If we assume that the reporting of deaths is error-free and that only the number of recoveries is under-reported, then, assuming a CFR of 0.023 (2.3%) [2], we can use the reported data for $\rho$ to estimate the testing fraction $\xi$. We do this analysis for Kerala (KL) only, since KL occupies the top spot in the country on the Health Index and hence may be assumed to have accurate mortality reporting. This analysis is meaningful only in the initial stages of the epidemic, and hence we plot this ratio $\rho$ for one week after the first reported death in KL. We note that while the testing rates are not obtained from the data for each state, the procedure builds in a heterogeneity in the relative response of states based on the long-term health indices and the number of Covid-19 tests conducted by each state, and hence can be assumed to be an accurate metric of the relative performance of different states. For the district level simulation, we assume all the districts of a particular state have the same testing rate.

In order to calculate the mortality rates, we first determine the population distribution of each state into ten-year bins.
The mortality rates of each of these 10-year bins are taken from the literature [55] and are listed in the table below. We then appropriately weight the population distribution of each state in order to determine a state-specific mortality rate for each of the three age groups in our model (0-20, 20-60, >60).

We will consider the disease compartments of the model described above without the lockdown compartments to calculate the basic reproduction number using the next generation matrix method. The evolution of the disease compartments is given by The usual procedure follows by evaluating the Jacobian of the sub-model evolution separated into three parts. This yields the two matrices and the basic reproduction number $R_0$. For completeness, we note two limits of the evaluated $R_0$. The first is expressed in terms of the infection rate and the recovery rate alone. In the second, for $f = 0$, $t = 0$, $t'' = 0$, $b_3 = 0$, our model reduces to the SEIR model, and the A bin of our model is the E bin of the SEIR model. The basic difference between the SEIR model and our special case is that the infection rate in the SEIR model is proportional to $I/N$, while in our case it becomes proportional to $(I + A)/N$. Hence $R_0$ for our case is slightly different from that of the original SEIR model and is given as:

For a given model, the process of estimating the true state of the model, given the equations for state dynamics and measurements, is a standard problem in estimation theory. The state vector of our system can be written as $\psi = (S, X_S, A, X_A, E, X_E, I, X_I, P, R)$ for the fundamental unit cell of our model (which we use to outline the estimation theory). The problem can now be stated as estimating $\psi$, with $\psi_i \in \mathbb{R}$ and $\psi_i \ge 0$, given the two noisy measurements, namely $P(t_i)$ and the mortality $m(t_i)$, for each day $t_i$.

The state equations for a generic state vector $x$ of a nonlinear state space model can be expressed as: where $u_k$ represents the control input to the system, and $\eta_k$ and $v_k$ represent the process noise and measurement noise respectively.
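For a generic nonlinear state-space model of this kind with additive Gaussian noise, one predict/update cycle of an extended Kalman filter can be sketched in its simplest, scalar form. This is a textbook illustration and not the filter used for the full compartmental state: the logistic growth map, the observation factor and all numerical values are hypothetical, and in the multi-dimensional case the scalar products below become matrix products with the Jacobians.

```python
def ekf_predict(x, p, f, f_jac, q):
    """Propagate the estimate x and its variance p through the dynamics f."""
    a = f_jac(x)                 # linearisation of f around the estimate
    return f(x), a * p * a + q   # q is the process-noise variance


def ekf_update(x_pred, p_pred, y, h, h_jac, r):
    """Correct the prediction with a measurement y of noise variance r."""
    hj = h_jac(x_pred)               # linearisation of the measurement map
    s = hj * p_pred * hj + r         # innovation variance
    k = p_pred * hj / s              # Kalman gain
    x_new = x_pred + k * (y - h(x_pred))
    p_new = (1.0 - k * hj) * p_pred
    return x_new, p_new


# Hypothetical scalar example: logistic growth, partially observed.
GROWTH, CAPACITY, REPORTED = 0.2, 1000.0, 0.5


def growth(x):
    return x + GROWTH * x * (1.0 - x / CAPACITY)


def growth_jac(x):
    return 1.0 + GROWTH * (1.0 - 2.0 * x / CAPACITY)


def observe(x):
    return REPORTED * x


def observe_jac(x):
    return REPORTED


def run_filter(x0, p0, measurements, q, r):
    """Filter a series; a measurement of None means 'predict only'."""
    x, p = x0, p0
    history = []
    for y in measurements:
        x, p = ekf_predict(x, p, growth, growth_jac, q)
        if y is not None:
            x, p = ekf_update(x, p, y, observe, observe_jac, r)
        history.append((x, p))
    return history
```

Passing `None` for dates beyond the end of the dataset makes the filter fall back on pure prediction, in which case the variance of the estimate grows from step to step.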
For our system model, several modelling assumptions can be made that make the system amenable to Kalman filtering while preserving its stochastic properties. The evolution of $\psi$ is not governed by any control inputs. We assume that $\eta_k$ and $v_k$ are additive noises to the respective equations. The state transition equation is further discretized and the cumulative effect of the process noises is incorporated within an additive noise $w_k$. Hence, the state equations of the discrete-time nonlinear system described by $\psi$ are: where The disturbances $w_k$ and $v_k$ are assumed to be Gaussian, that is, The function $f(\psi(\tau))$, $\tau \in \{t_k, t_{k+1}\}$, is obtained from the time evolutions of each compartment provided earlier in Section A. We obtain $F(\psi_k)$ by numerically integrating the differential equation in real time. The measurements are composed of the compartment that tested positive as well as the mortality. The latter is calculated heuristically based on the compartments of symptomatic people, and on the mortality rate, which depends on the age bin and the state. Hence $h(\psi(\tau))$ can be expressed as: We define the observer $\hat{\psi}_k$, which is the estimate of the state vector at time instant $t_k$, with the following state dynamics and measurement model: The observer state vector is a multivariate random variable, as is the case with every kind of optimal estimator. For this system, it is of multivariate Gaussian form and has an associated covariance matrix. The prediction step of the Extended Kalman Filter (henceforth referred to as the EKF) consists of the following steps: This prediction is updated based on the measurements obtained. The equations for the update step are as follows: The dataset has measurements up to 3rd May; hence for further dates, we use the predictions in the following manner:

The process to develop the inter-district transportation matrix representing the population commuting between adjoining districts is shown in Figure D1.
For this, the trip length frequency distribution, obtained from the worker population trip length recorded during the 2011 census data collection, was used to estimate the probability density function of trip length. The worker population ($P_W$) residing at distance $D_{DB}$ from the district boundary may either commute towards the district boundary or in directions not leading to the district boundary. We assumed half of them would travel toward the district boundary. When commuting towards the district boundary, workers would cross the boundary if their trip length ($L_T$) exceeds $D_{DB}$. Hence, the workers expected to commute across the district boundary can be obtained from the trip length distribution. The GIS map of each district was segmented as shown in Figure D2 to discretize $D_{DB}$. Each of the segmented zones represents a traffic analysis zone (TAZ). The adopted $D_{DB}$ values for the TAZs were 1 km, 5 km, 10 km, 20 km, 30 km and 50 km, and the total worker population ($TP_i$) expected to commute across the district boundary was estimated from Equation D1. The adjoining districts would attract the commuting workers based on their economic status. A district level GDP (at purchasing power parity) per capita can be a good economic status indicator. However, in the absence of that, we adopted the relative worker population density as a surrogate measure to estimate the workers commuting to a particular adjoining district. Equation D2 is used for this purpose. This process is repeated for all the districts of India to come up with the inter-district transportation or mobility matrix.
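The construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than a reproduction of Equations D1 and D2: the probability that a trip exceeds a given distance is estimated here from a hypothetical sample of trip lengths instead of a fitted probability density, and the function names, zone populations and neighbour densities are invented for the example.

```python
def crossing_fraction(distance_km, trip_lengths_km):
    """Empirical probability that a trip is longer than distance_km."""
    longer = sum(1 for length in trip_lengths_km if length > distance_km)
    return longer / len(trip_lengths_km)


def workers_crossing(zones, trip_lengths_km):
    """Expected workers leaving a district per day.

    zones: list of (distance_to_boundary_km, worker_population) pairs, one
    per traffic analysis zone. Half of the workers in each zone are assumed
    to head towards the boundary, and they cross it only if their trip is
    longer than the zone's distance to the boundary.
    """
    return sum(
        0.5 * population * crossing_fraction(distance, trip_lengths_km)
        for distance, population in zones
    )


def allocate_to_neighbours(total_crossing, neighbour_density):
    """Split the outgoing workers among adjoining districts in proportion
    to their relative worker population density."""
    total_density = sum(neighbour_density.values())
    return {
        name: total_crossing * density / total_density
        for name, density in neighbour_density.items()
    }
```

Repeating `workers_crossing` and `allocate_to_neighbours` for every district fills one column of the mobility matrix at a time, with entries only between adjoining districts.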
A contribution to the mathematical theory of epidemics
Compartmental models in epidemiology
The mathematics of infectious diseases
Scaling for dynamical systems in biology
A generalization of the Kermack-McKendrick deterministic epidemic model
An introduction to infectious disease modelling
An introduction to compartmental modeling for the budding infectious disease modeler
The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application
Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia
Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship
Estimation of the asymptomatic ratio of novel coronavirus infections (COVID-19)
Covert coronavirus infections could be seeding new outbreaks
Situation analysis of the health workforce in India
Age-structured impact of social distancing on the COVID-19 epidemic in India
Application of statistical filter theory to the optimal estimation of position and velocity on board a circumlunar vehicle
Effective containment explains sub-exponential growth in confirmed cases of recent COVID-19 outbreak in Mainland China
Modelling the impact of COVID-19 in Australia to inform transmission reducing measures and health system preparedness
Covid-19: analysis of a modified SEIR model, a comparison of different intervention strategies and projections for India
With COVID-19, modeling takes on life and death importance
A simple planning problem for COVID-19 lockdown
Optimal COVID-19 quarantine and testing policies
Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy
Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand
Optimal mitigation policies in a pandemic: Social distancing and working from home
Why is it difficult to accurately predict the COVID-19 epidemic? Infectious Disease Modelling
A modified SIR model for the COVID-19 contagion in Italy
Data analysis and modeling of the evolution of COVID-19 in Brazil
Modelling transmission and control of the COVID-19 pandemic in Australia
Age-structured impact of social distancing on the COVID-19 epidemic in India
Modeling and predictions for COVID-19 spread in India
Healthcare impact of COVID-19 epidemic in India: A stochastic mathematical model
Multi-city modeling of epidemics using spatial networks: Application to 2019-nCov (COVID-19) coronavirus in India
Prudent public health intervention strategies to control the coronavirus disease 2019 transmission in India: A mathematical model-based approach
COVID-19: Mathematical modeling and predictions
Estimation of the reproductive number of novel coronavirus (COVID-19) and the probable outbreak size on the Diamond Princess cruise ship: A data-driven analysis
Prediction models for diagnosis and prognosis of COVID-19 infection: systematic review and critical appraisal
Insights from early mathematical models of 2019-nCoV acute respiratory disease (COVID-19) dynamics. arXiv: Populations and Evolution
Estimates of the severity of coronavirus disease 2019: a model-based analysis
An epidemiological forecast model and software assessing interventions on COVID-19 epidemic in China
Early dynamics of transmission and control of COVID-19: a mathematical modelling study
An updated estimation of the risk of transmission of the novel coronavirus (2019-nCov)
Spread and dynamics of the COVID-19 epidemic in Italy: Effects of emergency containment measures
Predictions, role of interventions and effects of a historic national lockdown in India's response to the COVID-19 pandemic: data science call to arms
Modeling and forecasting of the COVID-19 pandemic in India
INDSCI-SIM: A state-level epidemiological model for India
COVID-19 Epidemic: Unlocking the lockdown in India
Effectiveness of Testing, Tracing, Social Distancing and Hygiene in Tackling Covid-19 in India: A System Dynamics Model
An alternating lock-down strategy for sustainable mitigation of COVID-19
Projecting social contact matrices in 152 countries using contact surveys and demographic data
An India-specific compartmental model for Covid-19
Positive RT-PCR test results in patients recovered from COVID-19
Report of the WHO-China joint mission on coronavirus disease 2019 (COVID-19)
Census data
COVID-19 Coronavirus Pandemic
The Epidemiological Characteristics of an Outbreak of 2019 Novel Coronavirus Diseases (COVID-19) - China, 2020
Estimating the number of COVID-19 infections in Indian hot-spots using fatality data
Hospital beds (indicator)
Activity-based travel demand models: a primer