key: cord-0186897-1cvqsqpn authors: Ghatak, Anirban; Patel, Shivshanker Singh; Bonnerjee, Soham; Roy, Subhrajyoty title: A Generalized Epidemiological Model for COVID-19 with Dynamic and Asymptomatic Population date: 2020-11-19 journal: nan DOI: nan sha: 51f7383991b19d68ab121d15d9b6af2ca237d257 doc_id: 186897 cord_uid: 1cvqsqpn

In this paper, we develop an extension of standard epidemiological models that is suitable for COVID-19. This extension incorporates transmission by pre-symptomatic or asymptomatic carriers of the virus. The model also captures the spread of the disease caused by the movement of people to and from different administrative boundaries within a country. It describes the probabilistic rise in the number of confirmed cases due to the concomitant effects of (incipient) human transmission and multiple compartments. The associated parameters can help shape public health policy and the operational management of the pandemic. For instance, the model shows that increasing the testing of symptomatic patients has no major effect on the progression of the pandemic, whereas the testing rate of the asymptomatic population plays a crucial role. The model is applied to data from the state of Chhattisgarh in the Republic of India, where it shows significantly better predictive capability than other epidemiological models. It can readily be applied to any administrative boundary (state or country), and to other epidemics as well.

1 Introduction

The novel coronavirus SARS-CoV-2, which causes the disease commonly called COVID-19, is spreading across the globe at a rapid pace and has already caused destruction on an unprecedented scale, economically, physically and socially.
The transmissibility of the virus, the proportion of asymptomatic carriers, the absence of pre-existing antibodies against the virus in the population and, most importantly, people's general lack of experience in handling such a scenario have all contributed substantially to this catastrophe. At the time of writing, around 29 million people have been infected worldwide, and around 1 million people have already died of the disease. Although this mortality may not sound catastrophic against a world population of 7.8 billion, it has occurred within 8 months and shows no signs of slowing down in most of the world, which makes it a matter of extreme concern and urgency. Prediction is extremely difficult in situations like this, where few data points are available and there is no history of an outbreak of a contagious disease on this scale. Learning from earlier outbreaks, or from outbreaks in other countries, usually gives researchers a decent head start; here, however, the very different policies adopted by different governments make tracking the spread of the virus in other countries an unreliable learning mechanism. The first documented case of SARS-CoV-2 in India was reported on 30 January 2020 in the state of Kerala. As per the current count, India has the largest number of confirmed cases in Asia and the second-highest number of confirmed cases in the world after the United States [1]. The total number of confirmed cases in India crossed 100,000 on 19 May, 200,000 on 3 June and 1,000,000 on 17 July 2020 [2]. The documented mortality rate in India is among the lowest in the world, at around 1.8% as of 29 August 2020, and shows a monotonically decreasing trend [2]. India has gone through several restrictive phases to contain the spread of the virus.
The first nationwide lockdown was announced on 24 March 2020 for 21 days, halting all modes of transportation within and to or from the country. On 14 April, India extended the lockdown until 3 May, and it was further continued through two-week extensions starting on 3 and 17 May. From 3 May 2020 onwards, the Government of India began to relax the lockdown restrictions and, for the first time since 24 March 2020, officially allowed migrant workers to return to their home states. This event is important for this paper, and we return to it later. From 1 June, the government started "unlocking" the country (barring "containment zones") in three unlock phases. The ability to predict the spread of the virus in advance is useful for being better prepared, locating the most important clusters and making policy decisions to contain the virus until vaccines become available. Many researchers worldwide are already trying to predict the spread of the virus in various ways [3, 4, 5]. In this paper, we analyze a compartmental epidemiological model with an emphasis on computing the basic reproduction number, commonly known as $R_0$. We focus on two aspects: (i) inter-state migration within the country, and (ii) the asymptomatic population. This is by no means an all-encompassing discussion of infectious disease modeling, but a supplement to more comprehensive texts such as [6], [7], [8] and [9]. The following section introduces the model developed in this paper. The SIR epidemic model, introduced by Kermack and McKendrick (1927) [10], is the simplest and most widely used compartmental model. In a compartmental model of an infectious disease, the population is divided into compartments such as S, I and R (Susceptible, Infectious and Recovered).
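As a point of reference for the extension developed below, the classical SIR dynamics can be sketched in a few lines. This is a minimal forward-Euler illustration of the standard formulation with infection term $\beta S I / N$; the function name `simulate_sir` and the parameter values in the usage lines are hypothetical and are not taken from this paper.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the SIR equations with forward Euler; return one (S, I, R) triple per day.

    dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I
    """
    n = s0 + i0 + r0                      # population size stays constant in SIR
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = int(round(1 / dt))
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            # Both flows are computed from the current state before updating it.
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical values: beta/gamma = 3, one initial case in a population of 1000.
trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=999, i0=1, r0=0, days=160)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print(f"infections peak on day {peak_day}")
```

Throughout the run, $S + I + R$ stays at 1000: this is the mass-conservation property that a model with a dynamic, migrating population gives up.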
In this paper, we develop an extension of the SIR model, again a compartment-based epidemiological model, named SINTRUE, to track and predict the spread of the virus, and we demonstrate it on the data available for the state of Chhattisgarh in India *. The data are sourced from the official reports of the Government of India, as unofficially collated by the open-source COVID-19 India Project [2]. The model is developed to be as close to reality as possible, accounting for the different contagion rates of asymptomatic (used interchangeably with pre-symptomatic in this paper †) and symptomatic patients, the inflow to and outflow from the population due to migration, the possibility of reinfection, and the extent of testing. The model is novel in its granularity, and especially in its use of a dynamic population, which is often not considered in compartment-based epidemiological models. The model provides an estimate of the 'unseen' COVID-19-infected patients in the population and, most importantly, its compartment of 'unrecorded' recoveries gives a hint about the status of herd immunity in the population. When applied to the data, the model also shows the presence of a second wave of infection. It performs better than other epidemiological models for COVID-19, and its results provide concrete policy suggestions for healthcare management during the pandemic. The SINTRUE model comprises seven compartments describing the progression of the disease, together with an inflow of people to, and an outflow from, the population. The seven compartments are Susceptible, Infected and pre-symptomatic, Infected and Symptomatic but Not tested, Tested positive, Recorded recovered, Unrecorded recovered, and Expired.
The details of the model, along with the infection dynamics and the compartments, are given in Section 2. Section 3 details the estimation of the model parameters. Finally, Section 4 concludes with the results and discussion.

To incorporate a realistic view of the dynamics of COVID-19 spread across India, we propose an extension of the SIR-type model with 7 compartmental states. The interstate movement of migrant workers during the lockdown was an undeniable part of the vital dynamics; it has been considered in only a few previous works [11], although it remained a concern for the media and the governments [12, 13, 14]. In our proposed model, we incorporate this interstate migrant movement by keeping the total population dynamic, with an incoming population of migrants divided into three types, Not infected ($M_n$), Pre-symptomatic ($M_p$) and Symptomatically infected ($M_i$), as well as an outgoing population of migrants drawn from the part of the population that is not under medical surveillance. Incoming migrants who are not infected, pre-symptomatic or symptomatically infected join, respectively, the susceptible ($S$), pre-symptomatic ($I_p$) and symptomatic tested ($I_t$) compartments at the destination. Starting from the three compartments of the SIR model [10], we extend it in three main ways:
1. Adding a compartment $E$ for the recorded deaths of patients suffering from COVID-19.
2. Splitting the group of infectious persons into three compartments.
(a) Pre-symptomatic individuals $I_p$, who lack at least one of the primary symptoms of the disease.
While in the literature this compartment is popularly known as asymptomatic [15], we prefer to call it pre-symptomatic, following the analysis of the World Health Organization (WHO) [16], which clearly favours pre-symptomatic transmission rather than truly asymptomatic transmission.
(b) Symptomatic but not tested individuals, $I_{sn}$. These are people who, despite being symptomatic, are not under any kind of medical surveillance.
(c) Tested positive individuals, $I_t$. This is the part of the population that has tested positive for COVID-19 and is under medical surveillance. Hence, the reported figures of COVID-19 patients relate only to this compartment.
3. Splitting the group of recovered persons into two compartments.
(a) $R$, the recorded part of the population that has recovered from COVID-19.
(b) $U$, the unrecorded part of the population that has recovered from COVID-19. These people have gained immunity by recovering from the disease naturally, but their recovery is not reported to hospitals, governments or any other agencies.

† We have used asymptomatic and pre-symptomatic interchangeably, as our model places such patients in the same compartment as long as they do not show symptoms, and then assigns them to different compartments depending on whether they become symptomatic or remain asymptomatic. We understand that the two terms are medically different, but the construction of the model allows us to use them interchangeably from an epidemiological perspective.

The model is illustrated in fig. 1 along with the corresponding notation. Starting with the susceptible population, a person may contract the virus through interaction with an asymptomatic person as well as with a symptomatic person. While people naturally tend to avoid symptomatic persons, no such tendency applies to interactions with asymptomatic persons, since they show no visible signs of infection.
There is also evidence that transmission differs with viral load, with complex incubation periods and with the rate of disease progression [17]. Furthermore, as awareness of the disease spreads, the general population tends to avoid interactions and to reduce exposure times. Thus, we take $\beta_{1t}$ and $\beta_{2t}$ as the (possibly different) time-varying transmission rates for interactions of the susceptible population with the pre-symptomatic population $I_p$ and with the symptomatic but not tested population $I_{sn}$, respectively. The implicit assumption that the transmission rate between the susceptible and the symptomatic tested populations is 0 is justified because people under medical surveillance are quarantined and restricted from interacting with the susceptible population, while health workers interact with these patients wearing proper Personal Protective Equipment (PPE), so that transmission through these interactions is negligible. Apart from these disease transmissions, the transitions between the infectious compartments of our model follow a single principle: at any point in time, an infectious person either recovers naturally, progresses to the next stage of the disease, or comes under medical surveillance, for instance by means of contact tracing. While there is little evidence that the coronavirus evolved during the time span of the pandemic [18], there were significant changes in testing strategies [19, 20, 21, 22]. It is therefore reasonable to assume that the rates of exacerbation or natural cure of the disease are not time-varying, as they depend on inherent biological variables unaffected by external circumstances; in contrast, the rates at which people come under medical surveillance must remain time-varying.
Under this logic, a pre-symptomatic person in $I_p$ either heals naturally and moves to $U$ following a counting process with rate $\lambda$, starts to show symptoms (the next stage of the disease) following another independent counting process with rate $\alpha$, or comes under medical surveillance with a time-varying rate $\theta_t$. Similarly, a symptomatic but not tested person in $I_{sn}$ either heals naturally and moves to $U$ with rate $\kappa$, progresses to the next stage of the disease, resulting in death, with rate $\zeta$, or tests positive with rate $\delta_t$. Finally, as in the SIR model, people under medical surveillance ($I_t$) either recover with rate $\gamma$ or die with rate $\tau$. We also allow for reinfection by assuming that a proportion $f$ of the people recovering each day joins the susceptible group instead, owing to a lack of sufficient antibodies. The complete dynamics of the model can be expressed mathematically as a set of differential equations, shown in eq. (1) ‡.

‡ Note: This model does not have the mass conservation property, as we consider a dynamic population whose input and output rates are not equal.

While the general model in eq. (1) is closer to reality, it is also very complex to analyze. Given the available data, certain assumptions are needed to estimate all the parameters of the model effectively. The first implicit assumption in the model dynamics is that the transitions between the states of the model can be modelled by counting processes with the respective rate parameters. In particular, an arrow from state A to state B with rate $\theta$ in fig. 1 means that the number of people moving from state A to state B on any given day is governed by a Poisson process with parameter $\theta$; in other words, an individual in state A moves to state B after an exponentially distributed number of days with mean $1/\theta$.
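Since eq. (1) itself is not reproduced in this text, the following is only a sketch of the transitions described above, under explicit assumptions: mixing is frequency-dependent ($\beta S I / N$, with $N$ taken as the living population), migration terms are omitted, and the reinfection fraction $f$ is applied to all recovery flows. Only $\kappa$, $\alpha$, $\lambda$ and $b$ (used for $\beta_{1t}$) are estimates quoted in this paper, and $\zeta = f = 0$ follow its assumptions; $\theta$, $\delta$, $\gamma$ and $\tau$ are made-up constants, and all function names are hypothetical. With migration switched off, the seven compartments sum to a constant, which is exactly the property the footnote above says the full model lacks.

```python
import numpy as np

# Illustrative constant rates (per day). kappa, alpha, lam and beta1 = b follow the
# estimates quoted in this paper; theta, delta, gamma and tau are hypothetical.
PARAMS = {
    "beta1": 0.102, "beta2": 0.617 * 0.102,
    "lam": 0.079, "alpha": 0.012, "theta": 0.002,
    "kappa": 0.0113, "zeta": 0.0, "delta": 0.015,
    "gamma": 0.05, "tau": 0.002, "f": 0.0,
}


def sintrue_rhs(state, p):
    """Derivatives of (S, I_p, I_sn, I_t, R, U, E) for a migration-free reading of the model."""
    S, Ip, Isn, It, R, U, E = state
    N = S + Ip + Isn + It + R + U                       # living population (assumption)
    infections = (p["beta1"] * Ip + p["beta2"] * Isn) * S / N
    rec_recorded = p["gamma"] * It                      # I_t -> R
    rec_unrecorded = p["lam"] * Ip + p["kappa"] * Isn   # I_p, I_sn -> U
    f = p["f"]                                          # share of recoveries returning to S
    return np.array([
        -infections + f * (rec_recorded + rec_unrecorded),                  # dS/dt
        infections - (p["lam"] + p["alpha"] + p["theta"]) * Ip,             # dI_p/dt
        p["alpha"] * Ip - (p["kappa"] + p["zeta"] + p["delta"]) * Isn,      # dI_sn/dt
        p["theta"] * Ip + p["delta"] * Isn - (p["gamma"] + p["tau"]) * It,  # dI_t/dt
        (1 - f) * rec_recorded,                                             # dR/dt
        (1 - f) * rec_unrecorded,                                           # dU/dt
        p["tau"] * It + p["zeta"] * Isn,                                    # dE/dt
    ])


def simulate(state0, p, days, dt=0.1):
    """Forward-Euler integration; returns one state vector per day."""
    state = np.asarray(state0, dtype=float)
    steps_per_day = int(round(1 / dt))
    daily = [state.copy()]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = state + dt * sintrue_rhs(state, p)
        daily.append(state.copy())
    return np.array(daily)


path = simulate([1e6, 100, 20, 10, 0, 0, 0], PARAMS, days=120)
```

Setting `zeta` and `f` to 0 reduces the $R$ and $E$ equations to $\gamma I_t$ and $\tau I_t$, which is the simplification the estimation of $\gamma$ and $\tau$ relies on.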
One crucial assumption we need to make is that $f = 0$. This is supported by the lack of strong evidence of reinfection among recovered patients, with only sporadic and inconsistent reported incidents [23, 24, 25]. Next, we take $\zeta = 0$, with the implicit assumption that all deaths due to COVID-19 are reported. Moreover, as the deceased data do not distinguish between the sources of deaths (i.e. whether the deceased were under medical surveillance prior to death), we are compelled to make this assumption to ensure that the parameters of our model are estimable. In addition, we assume that the transmission rates $\beta_{1t}$ and $\beta_{2t}$ differ only through the different exposure times with the susceptible population. Since the coronavirus associated with COVID-19 is relatively new, no significant biological study has yet measured the contagiousness of the virus transmitted by a pre-symptomatic individual as distinct from a symptomatic one. A susceptible person can differentiate between a pre-symptomatic and a symptomatic stranger only on the basis of externally visible symptoms such as cough, and not on the basis of symptoms such as fever, fatigue, increased blood pressure or low oxygen levels, which are unlikely to be perceived without proper medical instruments. Thus, we may assume $\beta_{2t} = P(\text{Cough})\,\beta_{1t}$ for all $t$. According to [4], cough is present in 61.7% of positive COVID-19 cases, and we therefore take $\beta_{2t} = 0.617\,\beta_{1t}$ as a restrictive assumption in our model. To estimate the transmission rate $\beta_{1t}$, we assume the specific two-parameter family of curves in eq. (2), where $a$ and $b$ are parameters to be estimated. The threshold of 1 May 2020 is chosen to mark the beginning of official migrant movement [26].
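Eq. (2) is not reproduced in this text, so the curve below is a hypothetical stand-in. It encodes only the two features stated in the text: a transmission rate that declines before the 1 May 2020 threshold and stays at the constant $b$ afterwards, together with the restriction $\beta_{2t} = 0.617\,\beta_{1t}$. The linear pre-threshold decline and the function names are our own illustrative choices, not the paper's functional form.

```python
from datetime import date

MIGRATION_START = date(2020, 5, 1)   # threshold stated in the text
P_COUGH = 0.617                      # share of COVID-19 cases with cough, from [4]


def beta1(day: date, a: float, b: float) -> float:
    """Hypothetical two-parameter curve: decreases linearly (slope a per day)
    until 1 May 2020, then stays constant at b."""
    days_before_threshold = (MIGRATION_START - day).days
    return b + a * max(days_before_threshold, 0)


def beta2(day: date, a: float, b: float) -> float:
    """Transmission rate from symptomatic, untested individuals: P(Cough) * beta1."""
    return P_COUGH * beta1(day, a, b)
```

For example, with a hypothetical $a = 0.001$ and $b = 0.102$, `beta1(date(2020, 4, 21), 0.001, 0.102)` is approximately $0.112$, while any date from 1 May onwards returns $b$ itself.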
Before this official migration started, the various lockdown enforcement schemes were expected to reduce the transmission rate continuously. After 1 May 2020, however, psychological issues among migrants [27], the difficulty of enforcing the lockdown [28, 29] and aggressive violence against health care workers [30] suggest that the transmission rate could not be reduced indefinitely and should remain constant in a state of partial lockdown.

We start by describing the estimation procedure for the parameters associated with migrant movement, i.e. $c_1$, $c_2$ and $c_3$. The total numbers of incoming and outgoing migrants can be estimated as in [11]. Note that $c_3$ is essentially the proportion of incoming migrants who are symptomatic and infected, i.e. who contracted the virus in the origin state and developed symptoms by the time they reached the destination state. We use Destination State to denote the particular administrative area that serves as the destination of the in-migrants. Similarly, Origin State denotes the administrative area from which an in-migrant starts the journey to the Destination State. Let $D$ denote a destination state and $O_1, \dots, O_n$ the $n$ origin states. We estimate $c_3$ by the estimated proportion of reported COVID-19-positive people among the incoming migrants on each day. We do not yet have an origin-state-specific breakdown of the migrants entering a destination state each day, which, once normalized, would have yielded $m_D$. We do, however, have data § on the proportion of infected incoming migrants from each origin state, and we use that vector as $\hat{m}_D$.

§ Obtained from Pt. J.N.M. Medical College Raipur, Chhattisgarh.
This is because, given the relatively large total migrant influx compared with the infected migrant influx (i.e. those belonging to state $I_t$), and the fairly large number of infections in all the significant origin states, we can assume that infection among in-migrants is independent of the origin state. Mathematically, this assumption can be written as
$$P(\text{origin} = O_i \mid \text{infected in-migrant}) = P(\text{origin} = O_i \mid \text{in-migrant}), \qquad i = 1, \dots, n.$$
This assumption justifies $\hat{m}_D$ as an estimate of $m_D$. We estimate $A_D$ by assuming that the "Other States" and/or "Unassigned/Unknown" columns in the data provided by the Covid19India dashboard [2] correspond to migrants. We have an estimate $M_i$, $i = 1, \dots, n$, of the total number of migrants likely to move, as in [11]. Let $(A_i)_t$ denote the total number of unassigned cases in origin state $O_i$ at time $t$. Then, $c_3$ at time $t$ is estimated as: To compute $c_2$, we note that incoming migrants who are undetected would have contracted the disease during the journey from fellow migrants. Thus, at time $t$, a symptomatic migrant from origin state $O_i$ would infect on average $(R_i)_t$ people, where $R_i$ denotes the day-wise reproduction number of the $i$-th origin state. We use the EpiEstim package [31] to obtain these reproduction numbers. Then $c_2$ at time $t$ is estimated as: where $\odot$ denotes the Hadamard product, i.e. element-wise multiplication. Finally, we obtain:

Now we move on to estimating the other parameters. Once we assume that $f = 0$ and $\zeta = 0$, the differential equations describing the changes in states $E$ and $R$ simplify to
$$\frac{dR}{dt} = \gamma I_t, \qquad \frac{dE}{dt} = \tau I_t.$$
Since the data periodically released by the Indian government [32], along with the historical records maintained by the Covid19India group [2], contain the numbers of recovered, deceased and currently hospitalized COVID-19 patients, the quantities $dR/dt$, $dE/dt$ and $I_t$ can be computed from the data using standard numerical differentiation techniques. We use the $L_1$ norm of the errors as the criterion for estimating $\gamma$ and $\tau$, i.e.
$$\hat{\gamma} = \arg\min_{\gamma} \sum_t \left| \frac{dR}{dt} - \gamma I_t \right|, \qquad \hat{\tau} = \arg\min_{\tau} \sum_t \left| \frac{dE}{dt} - \tau I_t \right|,$$
which essentially reduces to finding robust estimators for least absolute deviation (LAD) regression problems, expressing the dependent variables $dR/dt$ and $dE/dt$ as intercept-free linear functions of the independent variable $I_t$.

From the testing strategy described in [22], it is evident that pre-symptomatic people are tested only if they are traced as high-risk contacts of a confirmed case. In other words, a pre-symptomatic person is tested if someone he/she contacted in the last few days has the coronavirus and tests positive. Now, consider a single asymptomatic person, and let $N$ be a random variable denoting the number of contacts that the person had in the past few days. Assume that $N$ follows a Poisson distribution with mean $N_c$. Conditional on $N$, the numbers of symptomatic and pre-symptomatic contacts, $N_s$ and $N_a$, satisfy
$$N_s \mid N \sim \text{Binomial}(N, p_s), \qquad N_a \mid N \sim \text{Binomial}(N, p_a),$$
where $p_s$ and $p_a$ denote the probabilities that an individual is symptomatic or pre-symptomatic, respectively, given that he/she has not already been tested. By Poisson thinning, $N_s$ and $N_a$ are unconditionally independent and Poisson distributed, with means $N_c p_s$ and $N_c p_a$, respectively. Thus,
$$
\begin{aligned}
e^{-\theta_t d} &= P(\text{the person gets tested in} \geq d \text{ days}) \\
&= P(\text{all symptomatic and asymptomatic contacts get tested in} \geq d \text{ days}) \\
&= E\left[e^{-\theta_t d N_a}\right] E\left[e^{-\delta_t d N_s}\right] \\
&= \exp\left\{-N_c p_a \left(1 - e^{-\theta_t d}\right) - N_c p_s \left(1 - e^{-\delta_t d}\right)\right\} \\
&\approx \exp\left\{-N_c \left(p_a \theta_t + p_s \delta_t\right) d\right\},
\end{aligned}
$$
assuming that $\theta_t d$ and $\delta_t d$ are small, which finally implies
$$\theta_t \approx N_c \left(p_a \theta_t + p_s \delta_t\right).$$
To obtain the probabilities $p_a$ and $p_s$, the model suggests the plug-in estimators
$$\hat{p}_a = \frac{I_p}{N_t - I_t - E - R}, \qquad \hat{p}_s = \frac{I_{sn}}{N_t - I_t - E - R},$$
i.e. the ratio of the number of pre-symptomatic (or symptomatic) individuals in the population to the number of individuals not under any kind of medical surveillance (i.e. those at risk of getting tested). However, since $I_p \theta_t + I_{sn} \delta_t$ is the total number of individuals testing positive on day $t$ (see eq. (1)), the estimating equation for $\theta_t$ simplifies to
$$\theta_t = \frac{N_c\, \Delta(I_t + E + R)}{N_t - I_t - E - R}. \qquad (5)$$
A quick interpretation of this estimating equation is that, when $\Delta(I_t + E + R)$ new individuals test positive on day $t$, on average $N_c \times \Delta(I_t + E + R)$ individuals are tested the next day by means of contact tracing. Thus, a pre-symptomatic individual, who is externally similar to a healthy individual, is selected for testing at a rate equal to the ratio of $N_c \times \Delta(I_t + E + R)$ to the remaining population $N_t - I_t - E - R$. Since $N_c$ is an unknown parameter, we take it as the average number of high-risk contacts traced (and tested for COVID-19) per patient testing positive.

For the testing rate $\delta_t$ of symptomatic individuals, we again assume that the exits from $I_{sn}$ occur according to independent Poisson processes with the corresponding rates, so that, at any time $T$, the probability that a symptomatic individual gets tested before recovering (or dying) is determined by $\delta_t$ relative to $\kappa + \zeta$. This probability, $P(\text{Tested} \mid \text{Symptomatic and has covid})$, can be written using Bayes' theorem as
$$P(\text{Tested} \mid \text{Symptomatic and has covid}) = \frac{P(\text{Symptomatic} \mid \text{Tested and has covid})\; P(\text{Tested and has covid})}{P(\text{Symptomatic} \mid \text{Covid})\; P(\text{has covid})}. \qquad (6)$$
From the patient database of the Chhattisgarh government, we could obtain the number of symptomatic people among those testing positive for COVID-19, which gives an estimate of $P(\text{Symptomatic} \mid \text{Tested and has covid})$. The quantity $P(\text{Tested and has covid})$ can be estimated from the number of samples tested for COVID-19. The quantities in the denominator, however, cannot be estimated for the particular country under study, since data are collected only for tested individuals. Siordia [4] reported the rates of different symptoms among COVID-19 patients, which can be used to estimate $P(\text{Symptomatic} \mid \text{Covid})$ by defining a symptomatic person as an infected individual showing all three primary symptoms, namely fever, cough and fatigue. Multiplying the three rates (i.e. treating the symptoms as independent) gives $P(\text{Symptomatic} \mid \text{Covid}) = 0.822 \times 0.617 \times 0.44 = 0.22315656$.
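Eq. (5) needs only the cumulative confirmed count $I_t + E + R$, the population $N_t$ and $N_c$, because the unobserved inflow $I_p\theta_t + I_{sn}\delta_t$ is replaced by the observed number of new positives. The sketch below assumes daily data, takes $\Delta$ as the day-over-day difference and evaluates the denominator on day $t$; the function name is hypothetical. The default $N_c$ is computed from the Chhattisgarh contact-tracing figures quoted later in this paper (23007 primary contacts traced from 3257 positive cases).

```python
import numpy as np

# Primary contacts traced per positive case in Chhattisgarh (23007 contacts, 3257 cases).
N_C = 23007 / 3257


def theta_series(confirmed_cum, population, n_contacts=N_C):
    """theta_t = N_c * Delta(I_t + E + R) / (N_t - I_t - E - R), for days 1..T."""
    c = np.asarray(confirmed_cum, dtype=float)          # I_t + E + R = cumulative confirmed
    n = np.broadcast_to(np.asarray(population, dtype=float), c.shape)  # scalar or daily N_t
    new_positives = np.diff(c)                          # Delta(I_t + E + R) on day t
    not_under_surveillance = n[1:] - c[1:]              # N_t - I_t - E - R on day t
    return n_contacts * new_positives / not_under_surveillance
```

For instance, `theta_series([100, 110, 130], 1_000_000, n_contacts=7.0)` gives $70/999890$ and $140/999870$ for the two days with new positives.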
To estimate the true prevalence $P(\text{has covid})$, we use data from countries such as South Korea and the United Arab Emirates as a reference frame, since these countries perform the most tests relative to their population [33]. The test positivity rates for these countries range from 2.5% to 5% on average. Some studies [34] show that the prevalence of COVID-19 among health workers who are high-risk contacts is close to 5%. In the available data, the prevalence among tested high-risk contacts is 4.229% for Chhattisgarh and 8.642% for India as a whole. Since high-risk contacts are more likely to be infected than the general population, the true prevalence should be lower than 4.229%, but it needs to be estimated properly. Let us denote this true prevalence by $\epsilon$, which we also estimate from the available data. Given $\epsilon$ and $\kappa$, and since $\zeta = 0$ by assumption, we obtain a plug-in estimate for $\delta_t$ as
$$\delta_t(s) = \frac{d}{ds}\left[\frac{\kappa s\; P(\text{Tested} \mid \text{Symptomatic and has covid})(s)}{1 - P(\text{Tested} \mid \text{Symptomatic and has covid})(s)}\right], \qquad (7)$$
where the probability function is evaluated at the discrete time points using eq. (6) with the true prevalence, after which numerical differentiation yields the time-varying rate $\delta_t$. For the transmission rate, eq. (2) implies that knowing $a$ and $b$ allows one to estimate $\beta_{1t}$, and in turn $\beta_{2t} = 0.617\,\beta_{1t}$. Thus, for a specific choice of $\alpha, \lambda, \kappa, \epsilon, a, b$, we can obtain the time-varying parameters $\beta_{1t}$, $\beta_{2t}$ and $\delta_t$. Combining these with the estimates of $\theta_t$, $\tau$ and $\gamma$ already obtained, and with the time-varying estimates $c_1, c_2, c_3$ for migrant movement, the whole system can be simulated by numerically solving the differential equations in eq. (1).
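The two steps that produce $\delta_t$ can be sketched as follows: Bayes' rule (6) turns observable quantities and an assumed prevalence $\epsilon$ into $P(\text{Tested} \mid \text{Symptomatic and has covid})$, and eq. (7) is then differentiated numerically. We read the bracket in eq. (7) as the cumulative testing hazard up to day $s$; under this reading, if $\delta$ is constant then $P = \delta/(\delta + \kappa)$ (with $\zeta = 0$), the bracket equals $\delta s$, and the derivative returns $\delta$, which the usage check below confirms. The function names and the day indexing $s = 1, 2, \dots$ are our assumptions.

```python
import numpy as np

# P(Symptomatic | Covid): fever, cough and fatigue rates from [4], multiplied together.
P_SYMP_GIVEN_COVID = 0.822 * 0.617 * 0.44


def p_tested_given_symptomatic(p_symp_given_tested_pos, p_tested_and_pos, prevalence):
    """Eq. (6): P(S | T, C) * P(T, C) / (P(S | C) * P(C))."""
    numerator = (np.asarray(p_symp_given_tested_pos, dtype=float)
                 * np.asarray(p_tested_and_pos, dtype=float))
    return numerator / (P_SYMP_GIVEN_COVID * prevalence)


def delta_series(p_tested, kappa):
    """Eq. (7) with zeta = 0: differentiate kappa * s * P(s) / (1 - P(s)) numerically."""
    p = np.asarray(p_tested, dtype=float)
    s = np.arange(1, len(p) + 1, dtype=float)     # day index (assumption)
    cumulative_hazard = kappa * s * p / (1.0 - p)
    return np.gradient(cumulative_hazard, s)


# Consistency check: a constant delta = 0.015 with kappa = 0.0113 is recovered exactly.
recovered = delta_series(np.full(10, 0.015 / (0.015 + 0.0113)), kappa=0.0113)
```

The output of eq. (6) is a valid probability only when $\epsilon$ is large enough to keep it below 1, and `delta_series` requires $P(s) < 1$, so the prevalence grid searched for $\epsilon$ has to respect that bound.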
Denoting $\phi = (\alpha, \lambda, \kappa, \epsilon, a, b)$, we estimate the parameter $\phi \in \Phi \subseteq \mathbb{R}^6$, where $\Phi$ denotes the underlying parameter space, by minimizing the criterion $L(\phi)$ in eq. (8), a weighted combination of the loss functions described below, where the $w$'s are specifically chosen weights. These loss functions measure the difference between the simulated counts and the observed counts in the available data. For instance, $L_I(\phi)$, $L_R(\phi)$ and $L_E(\phi)$ measure the discrepancy between the available data on the numbers of infected, recovered and deceased individuals and the corresponding numbers obtained from the numerical solution of eq. (1) with parameters $\phi$. Similarly, $L_{symp}(\phi)$ and $L_{asymp}(\phi)$ measure the discrepancy between the observed numbers of new symptomatic and pre-symptomatic patients testing positive each day and their theoretical counterparts $\delta_t I_{sn}$ and $\theta_t I_p$, respectively. Finally, at the most recent time point $T$ with available data, $(1 - \epsilon) N_T$, the number of individuals without the virus (since $\epsilon$ is the true prevalence of COVID-19), is matched against the size of the susceptible population $S$. The best-fitting parameter $\phi$ is obtained by minimizing $L(\phi)$. However, to circumvent numerical underflow or overflow and convergence issues arising from a possibly non-convex optimization with numerous local minima, we use a grid search to find the $\phi$ that minimizes $\log L(\phi)$. The weights in eq. (8) are chosen by cross-validation to minimize the prediction sum of squares, so that the individual loss functions are normalized to the same range and become comparable, and all of them are minimized simultaneously when $\log L(\phi)$ is minimized.

All subsequent results are based on the data available as of August 22, 2020, collected from official [32] and unofficial [2] sources.
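The weighted criterion of eq. (8) and the grid search over $\log L(\phi)$ can be sketched generically. The exact form of the component losses is not reproduced here, so the sketch assumes that $L(\phi)$ is a weighted sum of non-negative component losses supplied by the caller; all names, the toy loss and the toy grid are hypothetical. The number of evaluations is the product of the grid sizes, so the full model would be simulated that many times.

```python
import itertools
import math


def weighted_loss(components, weights):
    """L(phi) as a weighted sum of component losses (assumed additive form of eq. (8))."""
    return sum(weights[name] * value for name, value in components.items())


def grid_search(loss, grid):
    """Exhaustive search over a Cartesian grid, minimising log L(phi).

    loss: callable mapping a dict of parameters to a positive number L(phi).
    grid: dict mapping each parameter name to a sequence of candidate values.
    Returns (best_phi, best_log_loss).
    """
    names = list(grid)
    best_phi, best_log_loss = None, math.inf
    for values in itertools.product(*(grid[name] for name in names)):
        phi = dict(zip(names, values))
        log_loss = math.log(loss(phi))
        if log_loss < best_log_loss:
            best_phi, best_log_loss = phi, log_loss
    return best_phi, best_log_loss


# Toy usage: a made-up two-parameter loss (kept positive by the +1) with its minimum on the grid.
toy_grid = {"a": [0.0, 0.1, 0.2, 0.3], "b": [0.05, 0.1, 0.15]}
best, best_val = grid_search(
    lambda phi: weighted_loss(
        {"fit_a": (phi["a"] - 0.2) ** 2, "fit_b": (phi["b"] - 0.1) ** 2},
        {"fit_a": 1.0, "fit_b": 1.0},
    ) + 1.0,
    toy_grid,
)
```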
Following the robust regression approach for estimating the fatality rate $\tau$ and the recovery rate $\gamma$, we use the MASS package [35] in the R programming language to obtain these estimates. From the historical data on confirmed, active, recovered and deceased cases for each state, available at [2], we obtain estimates of $\tau$ and $\gamma$ for some of the major states in India. The estimates are shown in table 1. The fit of the robust regression approach is shown in fig. 2, which indicates a reasonably good linear relationship between the numerical derivatives of the deceased and recovered counts and the number of active cases, except for a few outlying data points; these outliers are one reason for using a robust version of linear regression. For the migrant population, the total numbers of incoming and outgoing migrants can be estimated as in [11]. In the case of Chhattisgarh, about 49.36% of migrants were found to come from Maharashtra, and Maharashtra together with Uttar Pradesh, Delhi, Telangana, Gujarat, Tamil Nadu, Haryana, Odisha and Andhra Pradesh accounts for more than 90% of the incoming migrants. Thus, when estimating the parameters related to migrant movement, we take into account the reproduction numbers of these states, the number of special trains connecting Chhattisgarh to each of them, and the workers' populations in their major cities. Following the estimation process described above, we estimate the time-varying proportions $c_1$, $c_2$, $c_3$ of the different groups of new incoming migrants; the results for Chhattisgarh are shown in fig. 3. They show an overall decreasing pattern in $c_1$ and an increasing pattern in $c_2$ and $c_3$. Interestingly, a week before the official migration started, the proportions $c_1$ and $c_2$ become non-zero, which could indicate unofficial migrant movement.
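The intercept-free $L_1$ fits for $\gamma$ and $\tau$ have an exact solution that makes the method easy to check: since $\sum_t |y_t - \gamma x_t| = \sum_t |x_t|\,|y_t/x_t - \gamma|$, a minimiser is the $|x_t|$-weighted median of the ratios $y_t/x_t$. The sketch below uses this closed form with NumPy rather than the MASS routines used in this paper, so its numbers need not match table 1 exactly; `np.gradient` supplies the numerical derivative of the cumulative series (one value per day), and the function names are hypothetical.

```python
import numpy as np


def weighted_median(values, weights):
    """Return a minimiser of sum_i weights[i] * |values[i] - m|."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values, kind="stable")
    v, w = values[order], weights[order]
    cumulative = np.cumsum(w)
    # First position where the cumulative weight reaches half of the total weight.
    k = np.searchsorted(cumulative, 0.5 * cumulative[-1])
    return v[k]


def lad_rate_no_intercept(y, x):
    """Intercept-free L1 regression y ~ rate * x via the weighted-median closed form."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    keep = x != 0                      # rows with x = 0 do not depend on the rate
    return weighted_median(y[keep] / x[keep], np.abs(x[keep]))


def estimate_rate(cumulative, active):
    """Estimate gamma (from R) or tau (from E) given daily active cases I_t."""
    derivative = np.gradient(np.asarray(cumulative, dtype=float), edge_order=2)
    return lad_rate_no_intercept(derivative, active)
```

With one grossly outlying day out of five, `lad_rate_no_intercept` still returns the rate shared by the other four days, which illustrates why a robust criterion suits the outliers visible in fig. 2.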
A sharp increase in $c_2$ and $c_3$ is also noticeable from mid-June, possibly a lagged effect of "Unlock-1.0" declared by the Government of India [36]. The study in [37] shows that, under normal circumstances, a person has on average 13.4 contacts per day, with the mean number of contacts varying across countries, regions and socio-economic strata. Because of the skewness of the distribution of the number of contacts, the national lockdown restrictions and overall awareness of disease spread, it is natural to assume that the average number of contacts is smaller than 13.4. Indeed, a more recent and relevant study by Leung et al. [38] shows that the average number of reported contacts relevant to the spread of a respiratory illness is much lower, ranging from 5.12 to 8.21 across age groups and socio-economic strata, with an overall mean of 6.93. In addition, the official contact tracing data of Chhattisgarh show that 23007 primary contacts were traced from 3257 COVID-19-positive cases, suggesting an estimate of $N_c = 23007/3257 \approx 7.0638$. Since this closely matches the conclusions of [38], we take $N_c = 7.0638$ in our model and estimate $\theta_t$ using eq. (5). Similarly, minimizing eq. (8) gives the optimized parameter $\kappa$, which in turn yields the time-varying testing rate for symptomatic individuals through eq. (7). The resulting estimates are shown in fig. 4. Although the two estimated rates are highly correlated and show similar patterns, the difference in their magnitudes indicates that, ceteris paribus, a symptomatic individual is about 7 to 8 times as likely to be traced and tested as a pre-symptomatic individual.
However, since about 85-90% of the new patients testing positive in Chhattisgarh are pre-symptomatic or mildly symptomatic, while θ_t is estimated to be very small, the pool of pre-symptomatic individuals I_p must be enormous. The short-term predictions obtained with the estimated parameters are shown in fig. 5, where they can be compared against the actual observed values. Here, we use only the data available up to August 1, 2020, and use the estimated parameters to make a short-term prediction up to August 22, 2020 to check the validity of our estimation process. From fig. 5, the estimates of the numbers of hospitalized and recovered patients appear reasonable. One of the most important quantities in epidemiological studies is the reproduction number R_0, the average number of secondary infections caused by a single infected person. In our model, the infected population is divided into three groups, I_p, I_sn and I_t, each with its own transmission and recovery rates. For the model described in eq. (1), the reproduction number R_0 can be calculated as in eq. (9). The expression in eq. (9) takes the reproduction number corresponding to each group of infected individuals and then averages them, weighted by the sizes of the groups. Note that the transmission rate of hospitalized patients is taken to be 0 (or negligible), on the assumption that quarantine protocols are fully effective. To validate our estimation procedure, we use the EpiEstim package [31] in R to independently estimate the time-varying reproduction number and compare these estimates against our estimates of R_0 for Chhattisgarh. The results are shown in fig. 6.
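The size-weighted averaging described for eq. (9) can be sketched as below. Since the group-specific reproduction numbers of eq. (9) are not restated in this section, the sketch takes them as inputs; the function name, group sizes and R values in the example are hypothetical, and hospitalized patients (I_t) are given a reproduction number of 0 in line with the quarantine assumption above.

```python
def weighted_r0(group_sizes, group_r):
    """Size-weighted average of group-specific reproduction numbers.

    group_sizes, group_r: dicts keyed by compartment name (e.g. "I_p").
    """
    total = sum(group_sizes.values())
    if total <= 0:
        raise ValueError("total infected population must be positive")
    return sum(group_sizes[g] * group_r[g] for g in group_sizes) / total

# Hypothetical group sizes and group-specific R values, for illustration only
sizes = {"I_p": 8000, "I_sn": 1500, "I_t": 500}
group_r = {"I_p": 1.4, "I_sn": 1.2, "I_t": 0.0}   # I_t assumed non-transmitting

# (8000 * 1.4 + 1500 * 1.2 + 500 * 0) / 10000 = 1.3
print(round(weighted_r0(sizes, group_r), 3))
```

Because the weights are the current group sizes, the aggregate R_0 shifts over time even if the group-specific values stay fixed, which is one reason a time-varying estimate can be compared with EpiEstim.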
Overall, our estimates of R_0 closely resemble the EpiEstim estimates, except in July, when our estimate is slightly higher. To perform long-term prediction, we use seasonal ARIMA models to forecast the time-varying parameters such as θ_t and δ_t, and Akaike's Information Criterion (AIC) is used to select the best-fitting ARIMA model. Also, in the current state of partial lockdown, it is reasonable to assume that β_1t will not vary considerably in the near future, so it is taken as the constant b = 0.102 throughout the prediction period. Figure 7 shows the long-term prediction over the next 8 months of the numbers of pre-symptomatic (I_p), symptomatic (I_sn), hospitalized (I_t) and recorded recovered (R) patients. To compute a confidence interval for the predictions, we rely on the prediction confidence intervals obtained from the ARIMA forecasts of the time-varying parameters. For each time-varying parameter, we take the upper and lower bounds of its 95% confidence interval, and for each combination of these boundary values the model is simulated to the end of the prediction period. With 4 independent time-varying parameters, θ_t, δ_t, c_2 and c_3, and one dependent time-varying parameter c_1 = 1 − c_2 − c_3, this yields 2^4 = 16 prediction scenarios, with the other parameters fixed at their estimated values. Finally, the day-wise minimum and maximum over all these scenarios are taken to obtain an approximate 95% confidence interval for the predictions.

Table 2: Estimates of the parameters of our model for Chhattisgarh
Parameter | Estimate | Explanation
κ | 0.0113 | On average, about 1.13% of all symptomatic individuals recover from COVID-19 naturally every day. By comparison, about 1.5% of all symptomatic individuals test positive every day, as seen in fig. 4.
α | 0.012 | Different studies and reports [16] show that the average incubation period is about 4-5 days. At a daily rate of 1.2%, this means that about 4.8% to 6% of newly infected pre-symptomatic individuals are expected to develop symptoms over that period, before getting tested or recovering naturally.
λ | 0.079 | About 7.9% of the newly infected pre-symptomatic (or mildly symptomatic) individuals are expected to recover naturally from the disease.
 | | The true prevalence of COVID-19 is approximately 1.95%, affecting more than 6 lakh individuals among the 3.2 crore population of Chhattisgarh.

Interestingly, fig. 7 indicates a small primary wave in April 2020, which ends around mid-May 2020. This could indicate the effectiveness of the lockdown measures imposed by the Govt. of India. However, the second wave, starting in mid-May 2020, may reflect the increase in migrant movement across India, which makes this second wave potentially more detrimental. While the number of pre-symptomatic infected individuals is predicted to rise considerably, to approximately 2 lakh, the number of hospitalized patients remains much lower, within a range of 25-30 thousand. We also perform a simple sensitivity analysis to assess the importance of each parameter in prediction. The estimates of the fatality rate τ and the recovery rate γ turn out to be extremely robust: small changes to the estimates given in table 2 affect only the numbers of individuals in the deceased state (E) and the recovered state (R). The estimate of κ primarily affects the I_p and I_sn states, and to some extent the prediction of I_t as well. Increasing κ to 0.15 decreases both I_p and I_sn by 32%, while slightly increasing the I_t, R and E states by 4%.
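The scenario-based confidence band described for fig. 7 can be sketched as follows: every combination of lower and upper bounds of the independent time-varying parameters is simulated, c_1 is derived as 1 − c_2 − c_3, and the day-wise minimum and maximum across runs form the band. The `simulate` callable, the function name and the toy response in the example are hypothetical and are not the SINTRUE model; bound values may be scalars or day-by-day arrays (for example, ARIMA forecast interval paths), since NumPy broadcasts either way.

```python
import itertools
import numpy as np

def scenario_envelope(simulate, bounds):
    """Day-wise min/max over all combinations of parameter interval bounds.

    simulate: hypothetical callable taking a dict {name: value} and returning
              a 1-D array of predicted counts, one entry per day.
    bounds:   dict {name: (lower, upper)} for each independent parameter.
    Returns (lower_band, upper_band, number_of_scenarios).
    """
    names = list(bounds)
    runs = []
    for combo in itertools.product(*(bounds[n] for n in names)):
        params = dict(zip(names, combo))
        # dependent migrant proportion, as in the text: c1 = 1 - c2 - c3
        if "c2" in params and "c3" in params:
            params["c1"] = 1.0 - params["c2"] - params["c3"]
        runs.append(np.asarray(simulate(params), dtype=float))
    runs = np.vstack(runs)                 # shape: (n_scenarios, n_days)
    return runs.min(axis=0), runs.max(axis=0), runs.shape[0]

# Toy usage: an illustrative response curve, NOT the SINTRUE model
days = np.arange(30)

def toy_simulate(p):
    return 1000.0 * np.exp((0.05 - p["theta"]) * days) * (1.0 + p["c1"])

bounds = {"theta": (0.01, 0.03), "delta": (0.1, 0.2),
          "c2": (0.2, 0.3), "c3": (0.1, 0.2)}
lo, hi, n_scen = scenario_envelope(toy_simulate, bounds)
print(n_scen)   # 16 = 2**4 combinations of bounds
```

Taking the day-wise extremes of the boundary runs is a conservative approximation: it is not a formal 95% interval for the compartment counts, but it brackets the runs driven by the parameter interval bounds.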
Figure 7: Long-term prediction of pre-symptomatic, symptomatic, hospitalized and medically recovered (recorded) patients for Chhattisgarh, along with 95% confidence intervals.
The predictions are fairly robust to α as long as α lies between 0.006 and 0.018, i.e. within about 50% of its current estimate. However, for relatively high values of α, such as 0.05 or more, the estimates of I_sn and I_t increase rapidly. We found that the estimates of λ and a are quite sensitive and directly affect the shape of the incidence curves of I_p and I_sn; changes of about 7% in these estimates retain a similar shape, with the two phases of COVID-19 spread shown in fig. 7. The estimate of b and the migrant-related parameters c_1, c_2, c_3 have subtle effects on I_p and I_t. An increase in b slightly increases each of I_p, I_sn and I_t and adds a lagged increase in U, E and R; an increase in c_1 or c_2 increases I_p, while an increase in c_3 raises the incidence curve of I_t. The time-varying parameters θ_t and δ_t (the testing rates for asymptomatic and symptomatic patients, respectively) both contribute significantly to the prediction. A change in δ_t is found to affect only the recorded hospitalized (I_t) and symptomatic (I_sn) cases, while a change in θ_t affects the hospitalized (I_t) and pre-symptomatic (I_p) cases. In the case of Chhattisgarh, increasing θ_t by 10% reduces the peak of pre-symptomatic cases by about 7.2% while increasing the peak of hospitalized cases by about 9.8%. However, as most of the detected COVID-19 cases are asymptomatic (pre-symptomatic) or mildly symptomatic, the predictions are fairly robust to an increase in δ_t, which decreases I_sn by only 2.3% and increases I_t by even less, 1.3%. Furthermore, we have seen that a change in δ_t has minimal effect on the public health aspects of the disease, i.e. the size of the peak, the proportion of the population affected, etc.
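The percentages above come from one-at-a-time perturbations: scale a single parameter, rerun the model, and compare an output such as the peak. The sketch below shows that procedure on a deliberately simple SIR-type toy model in which undetected infected individuals leave the infected pool through recovery (gamma) or through testing and isolation (theta). The toy model, its parameter values and the function names are illustrative assumptions, not the SINTRUE equations, so its percentages will not match those reported for Chhattisgarh.

```python
def toy_peak(beta=0.3, gamma=0.1, theta=0.05, days=300, dt=0.1, i0=1e-4):
    """Peak infected fraction of a toy SIR-type model (NOT the SINTRUE model).

    Infected individuals are removed by natural recovery (gamma) or by
    detection and isolation (theta); integrated with forward Euler steps.
    """
    s, i = 1.0 - i0, i0
    peak = i
    for _ in range(int(days / dt)):
        new_inf = beta * s * i * dt
        s -= new_inf
        i += new_inf - (gamma + theta) * i * dt
        peak = max(peak, i)
    return peak

def oat_sensitivity(model, base, name, rel_change=0.10):
    """Percent change in the model output when one parameter is scaled by (1 + rel_change)."""
    y0 = model(**base)
    perturbed = dict(base, **{name: base[name] * (1 + rel_change)})
    y1 = model(**perturbed)
    return 100.0 * (y1 - y0) / y0

base = {"beta": 0.3, "gamma": 0.1, "theta": 0.05}
for p in base:
    print(p, f"{oat_sensitivity(toy_peak, base, p):+.1f}%")
```

In this toy model a higher testing rate theta lowers the peak, which mirrors the qualitative direction reported for θ_t on the pre-symptomatic peak, but the magnitudes depend entirely on the toy's assumed parameters.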
On the other hand, the size of the peak and the duration of the disease in the population are extremely sensitive to θ_t. We elaborate on this in the concluding section. We have performed two comparative studies of the predictive performance of our model against existing models in the literature, namely the extended SIR model (eSIR) proposed by Song et al. [3] and the SIDARTHE model proposed by Giordano et al. [5]. To compare the SINTRUE model with the eSIR model of Song et al. [3], both models are fitted to the data from Chhattisgarh up to August 15, 2020, and, using the estimated parameters, the numbers of Reported Active cases and Recorded Recovered cases up to September 5, 2020 are predicted. The Root Mean Square Error (RMSE) of the predictions is used as the measure of deviation from the truth, and is found to be 4865.11 for the prediction of Reported Active cases. Figure 8 compares the predicted and observed Active and Recovered cases for both models. From Figure 8, we see that the eSIR model consistently underestimates the number of recorded active cases and overestimates the number of recorded recovered cases. In comparison, the SINTRUE model gives highly accurate predictions for recorded recovered cases and slightly better predictions for recorded active cases, although the problem of underestimation persists. To assess the performance of the SIDARTHE model relative to the SINTRUE model, we consider Italy's data, similarly to Giordano et al. [39]. However, there was very little trustworthy data about the migration situation in Italy [40], which compelled us to reformulate the SINTRUE model as a closed-population model with c_1 = c_2 = c_3 = 0 and, in particular, dIn/dt = dOut/dt = 0. Using the data up to March 12 from various sources [41, 42], the closed-population SINTRUE model is trained, and a prediction of the active cases is made from March 13, 2020 to September 5, 2020.
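For reference, the deviation measure used in both comparisons is the root mean square error, RMSE = sqrt((1/n) Σ_t (ŷ_t − y_t)^2), taken over the n days of the prediction window. A minimal Python helper (the function name is hypothetical) is:

```python
import numpy as np

def rmse(predicted, observed):
    """Root mean square error between two equally long daily series."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if predicted.shape != observed.shape:
        raise ValueError("series must have the same shape")
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))
```

For example, predictions [110, 190, 310] against observations [100, 200, 300] give an RMSE of exactly 10. Because errors are squared before averaging, a few large misses (such as an overestimated peak) weigh heavily in this measure.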
In comparison, among the different prediction scenarios presented in Giordano et al. [39], we find that the scenario with slightly stronger lockdown and social distancing aligns better with the observed incidence curve than the minimal social distancing scenario suggested in the paper. So, the parameter values presented in Giordano et al. [39] for the mildly stronger social distancing scenario are chosen. The RMSE is found to be 27477.46 for the SIDARTHE model and slightly lower, 24768.48, for the SINTRUE model. Figure 9 reveals finer details. In the left panel, the SIDARTHE model starts with an almost exact prediction but soon diverges, giving especially poor predictions near the tail. It also underestimates the peak of the pandemic, which might affect any policy decisions based on this model. In the right panel, the SINTRUE model provides an excellent estimate both at the start, i.e. in the short term, and at the tail of the prediction curve, i.e. in the long term. That its RMSE is close to that of the SIDARTHE model is explained by the overestimation at the peak, which can be thought of as an upper bound on the peak number of affected cases, thereby helping policy formulation. Further, the SINTRUE model replicates the overall shape of the actual curve using data only up to March 12, whereas the shape predicted by the SIDARTHE model is far from correct. This is expected since, unlike the SINTRUE model, the SIDARTHE model does not use any time-varying parameters, rendering it unsuitable for long-term predictions where the parameter values are liable to change. The model presented in this paper comprises seven compartments in the progression of the disease, with the addition of an inflow of people to and an outflow of people from the population.
Further, we have incorporated into the model the pre-symptomatic/asymptomatic population, as well as the population who contract the virus but remain undetected throughout their journey from infection to recovery. The seven compartments considered in the model are Susceptible, Infected and Pre-symptomatic, Infected and Symptomatic but Not Tested, Tested Positive, Recorded Recovered, Unrecorded Recovered, and Expired. One extremely important observation from the SINTRUE model and the subsequent simulation is that the testing rate of symptomatic patients does not affect the disease dynamics in any major way. Rather, the testing rate of asymptomatic patients turns out to be an extremely crucial parameter that can make or break the fight against the pandemic. The dynamics are extremely sensitive to the testing rate of asymptomatic patients: once this rate goes up, R_0 comes down drastically. The current R_0 indicates that around 23.664% of the population (of Chhattisgarh) needs to be affected in order to reach herd immunity [43, 44]. Hence, one definitive suggestion from our model is that, in order to fight the pandemic, efforts to test asymptomatic patients must be scaled up. An increase in θ_t can be achieved either by increasing N_c in our model, which means intensifying contact tracing, or by allowing on-demand testing. Both strategies will increase θ_t and bring down the peak considerably. In the case of India, until September 4 people could not get themselves tested without a valid reason (symptoms, contacts etc.) and a prescription from a medical practitioner [22]. From September 4 onwards [45], India changed its testing strategy to permit anyone to get tested without any reason or prescription.
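The classical herd-immunity threshold for a homogeneously mixing population is 1 − 1/R_0. Assuming this is the relation behind the 23.664% figure (the text cites [43, 44] without restating the formula), the threshold can be inverted to recover the implied R_0 of about 1.31. A short sketch with hypothetical function names:

```python
def herd_immunity_threshold(r0):
    """Classical threshold 1 - 1/R0 for a homogeneously mixing population."""
    if r0 <= 1:
        return 0.0          # no sustained spread, so no threshold to reach
    return 1.0 - 1.0 / r0

def r0_from_threshold(h):
    """Invert h = 1 - 1/R0, i.e. R0 = 1 / (1 - h)."""
    if not 0 <= h < 1:
        raise ValueError("threshold must lie in [0, 1)")
    return 1.0 / (1.0 - h)

implied_r0 = r0_from_threshold(0.23664)
print(f"implied R0 = {implied_r0:.2f}")   # 1.31
```

The relation also makes the policy argument concrete: anything that lowers R_0, such as a higher θ_t, lowers the fraction of the population that must be affected before the epidemic stops growing.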
We believe that this will have a direct positive impact on the asymptomatic testing rate θ_t and can strongly affect the progression of the disease in a positive manner. An advantage of the SINTRUE model is that the parameters associated with the extinct and recovery flows are straightforward to estimate. Further, the parameter governing the movement of the population from "Asymptomatic" to "Tested Positive" is estimated even without knowing the prevalence rate. This directly informs us that, just by looking at the rates of 'confirmed' cases, we cannot readily judge the prevalence of the disease in the population. To model the infection of the susceptible population by the asymptomatic or symptomatic population, we choose the values a and b. The parameters λ (the rate at which the population heals naturally without being detected), a and b control the shape of the incidence curve in a way that can reproduce second-wave or third-wave scenarios of COVID-19. So, one advantage of the model is that it admits a much broader range of incidence curves, and from a realized incidence curve, a direct idea of the transmission rate can be obtained. On the downside, the final grid search requires heavy computation (although the code may be optimized further). The model also requires a good amount of granular data (such as the recorded ratio of pre-symptomatic and symptomatic patients). Based on the available data for one of India's states, Chhattisgarh, the results discussed in detail in § 4 suggest that the parameters estimated for migration and asymptomatic transmission can certainly help in designing COVID-19 control policy. The predictive ability of the model suggested in this paper is also compared (in § 4.1) with the extended SIR model (eSIR) proposed by Song et al. [3] and the SIDARTHE model proposed by Giordano et al. [5].
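The cost of the final grid search grows multiplicatively with the number of candidate values per parameter, since every combination requires a full model run. The sketch below illustrates an exhaustive grid search that minimizes RMSE against an observed curve. The parameters searched (here a and b), the `simulate` callable, the function name and the toy exponential curve are hypothetical stand-ins, since the exact search space is not specified in this section.

```python
import itertools
import numpy as np

def grid_search(simulate, observed, grid):
    """Exhaustive search over a Cartesian grid of parameter values.

    simulate: hypothetical callable mapping a dict of parameters to a
              predicted series with the same length as `observed`.
    grid:     dict {name: sequence of candidate values}.
    Returns (best_params, best_rmse, n_evaluations).
    """
    observed = np.asarray(observed, dtype=float)
    names = list(grid)
    best, best_err, n_eval = None, np.inf, 0
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        pred = np.asarray(simulate(params), dtype=float)
        err = float(np.sqrt(np.mean((pred - observed) ** 2)))
        n_eval += 1                       # one full model run per combination
        if err < best_err:
            best, best_err = params, err
    return best, best_err, n_eval

# Toy usage with an illustrative exponential curve (not the SINTRUE model)
t = np.arange(30)

def toy_simulate(p):
    return p["a"] * np.exp(p["b"] * t)

observed = 2.0 * np.exp(0.1 * t)
grid = {"a": [1.0, 1.5, 2.0, 2.5, 3.0], "b": [0.05, 0.08, 0.1, 0.12, 0.15]}
best, err, n_eval = grid_search(toy_simulate, observed, grid)
print(best, n_eval)   # 25 runs for a 5 x 5 grid
```

Adding a third parameter with five candidate values would already require 125 runs, which is why optimizing the simulation itself (or switching to a local optimizer around a coarse grid) matters in practice.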
The predictive ability of the model proposed in this paper is significantly better, based on the root mean squared error (RMSE) criterion and on long-term predictability. The results obtained from the model presented in this paper show significantly better predictive capability than other recently developed models of COVID-19 progression when applied to data from Chhattisgarh State (India) and Italy. Further, this model could be extended to an aggregate model for the whole country. Since the country was virtually under lockdown from May 2020, we can consider the model in a state-wise aggregated form, in which it exhibits the mass conservation property, as dN/dt will essentially be 0 for the country. We can show that, given that the initial states sum to one, the disease dies out in the equilibrium state of the nationwide model. At equilibrium only Susceptible, Recovered (documented or undocumented) and Expired people remain in the system, which means that the epidemic (in the sense of the pandemic being treated only within the country) is over. This state will be reached eventually, but to speed it up, the recommendation from the SINTRUE model is to change the testing policy to cover as much of the population as possible; only then can the spread be arrested.

India becomes country with second highest number of COVID cases. The Guardian.
COVID-19 India Org Data Operations Group. Covid-19india dataset. Accessed
An epidemiological forecast model and software assessing interventions on COVID-19 epidemic in China. medRxiv.
Epidemiology and clinical features of COVID-19: A review of current literature.
Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy.
Infectious diseases of humans: Dynamics and control.
Mathematical models in population biology and epidemiology.
Compartmental Models in Epidemiology.
Modeling Infectious Diseases in Humans and Animals.
A contribution to the mathematical theory of epidemics.
Implication of repatriating migrant workers on COVID-19 spread and transportation requirements.
Coronavirus: Migrant movement creates fresh spike in COVID cases in India.
Lockdown 2.0: From Bandra to Surat, migrant crisis across India makes coronavirus fight tougher.
Migrants' issue in cabinet meet: 10,000 ST buses to ply migrants.
COVID-19: asymptomatic carrier transmission is an underestimated problem.
Coronavirus disease 2019 (COVID-19) situation report - 73.
Transmission and clinical characteristics of asymptomatic patients with SARS-CoV-2 infection.
No evidence for distinct types in the evolution of SARS-CoV-2.
Strategy for COVID-19 testing in India (version 2).
Strategy for COVID-19 testing in India (version 3).
Strategy for COVID-19 testing in India (version 4).
Strategy for COVID-19 testing in India (version 5).
Risk of reactivation or reinfection of novel coronavirus (COVID-19).
World Health Organization (WHO).
COVID-19 reinfection: Myth or truth? The Hindu.
Yuthika Bhargava. Coronavirus lockdown - railways to run 'shramik special' trains to move migrant workers, other stranded persons.
Ministry of Health and Family Welfare (MOHFW), Govt. of India.
The impossibility of social distancing among the urban poor: the case of an Indian slum in the times of COVID-19.
Coronavirus: Is social distancing an oxymoron in India?
COVID-19: IMA warns of black day on April 23 if no action taken.
EpiEstim: Estimate Time Varying Reproduction Numbers from Epidemic Curves.
India National Informatics Center. #indiafightscorona COVID-19.
COVID-19 Coronavirus pandemic (COVID-19). Our World in Data.
Prevalence of COVID-19 Infection and Outcomes Among Symptomatic Healthcare Workers in
Unlock-1: malls, restaurants, places of worship to reopen June 8.
Social contacts and mixing patterns relevant to the spread of infectious diseases.
Social contact patterns relevant to the spread of respiratory infectious diseases in Hong Kong.
A SIDARTHE model of COVID-19 epidemic in Italy.
Information disorders on Italian Facebook during COVID-19 infodemic.
An Interactive Web-Based Dashboard to Track COVID-19 in Real Time.
APIs for COVID-19 statistics.
Herd immunity: Understanding COVID-19.
Herd immunity - estimating the level required to halt the COVID-19 epidemics in affected countries.
Strategy for COVID-19 testing in India (version 6).

In this research, in addition to the open data sources cited, for the specific data used to discuss the results for Chhattisgarh state we acknowledge the support of Dr. Vinit Jain (Professor, Orthopaedics), Dr. Kamlesh Jain (Professor, Preventive Medicine) and Dr. Santosh Singh Patel (Associate Professor, Ophthalmology) from Pandit Jawahar Lal Nehru Memorial Medical College, Raipur. This work was supported by IIM Visakhapatnam under the seed research grant project titled "Towards Prediction and Control of COVID-19 in Particular and any Pandemic in General".