key: cord-0533341-x5xub3y5 authors: Banerjee, Buddhananda; Pandey, Pradumn Kumar; Adhikari, Bibhas title: A model for the spread of an epidemic from local to global: A case study of COVID-19 in India date: 2020-06-04 journal: nan DOI: nan sha: d7656c0ad5bd101f1814d36d40ee1724c21258bb doc_id: 533341 cord_uid: x5xub3y5
In this paper we propose an epidemiological model for the spread of COVID-19. The dynamics of the spread are based on four fundamental categories of people in a population: tested and infected, non-tested but infected, tested but not infected, and non-tested and not infected. The model is based on two levels of dynamics of spread in the population: the local level and the global level. The local-level growth is described with data and parameters that include testing statistics for COVID-19, preventive measures such as a nationwide lockdown, and the migration of people across neighboring locations. In the context of India, the local locations are districts, and migration or traffic flow across districts is defined by the normalized edge weights of the metapopulation network of districts that are infected with COVID-19. Based on this local growth, state-level predictions are made for the number of people who test positive for COVID-19. Further, treating the states as local locations, a prediction is made at the country level. The values of the model parameters are determined by grid search, minimizing an error function while training the model with real data. The predictions are based on the present testing statistics and on linear and log-linear growth of testing at the state and country levels. Finally, it is shown that the spread can be contained if the number of tests is increased linearly or log-linearly by certain factors, along with the preventive measures, in the near future. This is also necessary to prevent sharp growth in the count of infected people and to avoid a second wave of the pandemic.
COVID-19 is a pandemic that is actively spreading across the whole world and is an unprecedented challenge for the human race. All the countries affected by COVID-19 are struggling to mitigate the spread through various strategies. The disease spreads by inhalation of, or contact with, infected droplets or fomites. It is observed that successful medical testing, and the resulting detection of people infected with SARS-CoV-2, is one of the crucial control strategies for the spread of COVID-19 [1]. For instance, the epidemic curve of South Korea suggests that this control strategy has curtailed the epidemic there. Besides, testing is linked to tracing the contacts of infected people, and the subsequent self-isolation of those contacts helps against the spread. Proactive testing also contributed to the successful containment of COVID-19 in Taiwan [2]. Given that no effective vaccine or antiviral drug is expected soon, different countries have adopted different prevention strategies, including voluntary or compulsory quarantine, bans on mass gatherings, closure of educational institutions and workplaces, social distancing, and even nationwide lockdown. However, these strategies may be less effective for infected people who are at the pre-symptomatic stage, who then act as invisible spreaders of the disease [3]. Thus mass medical testing becomes increasingly important for a country. Several researchers around the world are actively working on mathematical models of the spread of COVID-19. Here we quote that 'model-based predictions can help policy makers make the right decisions in a timely way, even with the uncertainties about COVID-19' [4].
The primary preventive steps adopted by the Government of India fall into five categories: social distancing, movement restrictions, public health measures, social and economic measures, and nationwide lockdown. A few notable decisions by the Government of India are given in Table 7. It should be noted that the complete nationwide lockdown from March 25 till May 13 helped to control the spread of the disease over large distances but failed to prevent it in neighboring districts, as observed in [5]. For example, before the lockdown, infected cases were reported from districts across India that are far apart; during the lockdown period, however, new spread was reported in districts that are neighbors of infected districts. Besides, due to the lack of a well-planned policy for migrant workers, several of them travelled to their native districts during the lockdown. Data on such traffic flow across districts would be crucial for a precise analysis of the spread, but it is unavailable. It can also be seen that the preventive measures adopted by the Government of India are similar to those adopted in other countries.
It is observed in various studies that COVID-19 exhibits significantly different epidemiological attributes from other well-studied epidemics of the past. Thus it is of paramount interest to develop mathematical models that can characterize the inherent dynamics of the spread of COVID-19. Standard epidemic models such as the SIR model consider human-to-human transmission and describe the diffusion process through three mutually exclusive stages of infection: Susceptible, Infected and Recovered. These models are also called compartmental models [6], since they compartmentalize the individuals in a population according to their state with respect to the epidemic. Such a model can help gain some insight into the growth of the infection by approximating the model parameters from the available data.
However, due to the peculiar growth of COVID-19 in different countries, researchers have extended the SIR model and other existing models such as the SIS model in order to acquire meaningful insights into the spread of COVID-19 [7]. It is important to note that these studies can help to frame control strategies and policies that can mitigate the epidemic [8] [9]. One of the first models for the spread of COVID-19 was proposed by Anastassopoulou et al., based on the confirmed cases reported in the Hubei province of China from the 11th of January until the 10th of February, 2020 [10]. They propose a discrete SIRD (Susceptible-Infected-Recovered-Dead) model and estimate from the data the mean values of the corresponding epidemiological parameters, such as the basic reproduction number and the case fatality and case recovery ratios. This model enables forecasts of the spread in the near future. In another attempt, the authors of [11] study datasets of transmission from within and outside Wuhan, China, to estimate how transmission in Wuhan varied between December 2019 and February 2020, and use a stochastic transmission dynamic model to assess the potential for sustained human-to-human transmission in locations outside Wuhan. In [12], a mean-field epidemiological model is proposed for the COVID-19 epidemic in Italy by extending the classical SIR model. Here, in addition to susceptible (S) and infected (I), the other stages of individuals are diagnosed (D), ailing (A), recognized (R), threatened (T), healed (H) and extinct (E), collectively termed SIDARTHE. In [6], an age-stratified model of COVID-19 is proposed to capture the age-dependent dynamics for nowcasting and forecasting in Switzerland. This model incorporates compartments of symptomatic and asymptomatic infected individuals along with susceptible and exposed individuals. In [13], the authors propose a model of COVID-19 epidemic dynamics under quarantine conditions.
They also develop methods to estimate quarantine effectiveness in a country or region infected with COVID-19. Besides, a few models based on a metapopulation network approach have been proposed for understanding and predicting the spread of COVID-19, see [14] [15] [16]. Several mathematical models have also been proposed that fit the available COVID-19 data of India to classical epidemic models while incorporating other factors such as nationwide lockdown and social distancing, see [17] [18] [19] and the references therein. In [20], a mathematical model of the spread of COVID-19 is proposed based on an age-structured SIR model. However, the comparison of this model's prediction with real data is criticized by Dhar in [21]. In [22], the authors perform a state-wise analysis of the data on the infected population in different states based on three models: the exponential model, the logistic model and the SIS model. They also provide state-wise predictions of the number of infected people in the near future. An elementary network-based model for the geographical spread of COVID-19 in India is proposed in [23]. In [24], a model for the spread of COVID-19 in India is proposed that emphasizes migration of the population over a spatial network of cities and incorporates the growth dynamics of the SIR model at the city level.
In this paper, we propose an epidemiological model for the spread of a contagious epidemic in a region or country. The model combines two growth processes of the spread, at the local and the global level. By local we mean the level of a city, town, district or province, and by global we mean the level of a state or country. First we develop a new discrete model for the growth dynamics of infected people at the local level, as follows. We consider four types of individuals living at a location.
These are individuals who are tested and infected ($X_1$), tested and non-infected ($X_2$), untested but infected (asymptomatic or pre-symptomatic, $X_3$), and untested and non-infected ($X_4$) for the disease. The total number of such individuals equals the total population living at that location. Given the time series data of these numbers $X_i(t)$, $i = 1, 2, 3, 4$, we define the growth statistic $X_i(t+1) - X_i(t)$ using $X_j(t)$, $j \neq i$, and four other parameters, each of which is related to the spreading pattern of the virus that causes the disease. Note that the standard compartmental models in the literature, based on susceptible, infected, recovered, and deceased individuals, do not capture the effect of such parameters in an epidemic like COVID-19. In our proposed model, the growth dynamics at the local level include the following parameters: (a) spread due to infected but asymptomatic and pre-symptomatic individuals; (b) the effect of preventive measures like lockdown or restricted movement of individuals across locations; (c) daily testing statistics.
Then we consider the metapopulation network of all the locations at the local level in order to incorporate the transmission dynamics of the disease at the global level. Here we mention that the metapopulation network model is a standard and popular model for analyzing the spread of highly contagious diseases, including Zika virus [25]; see also [26] and the references therein. In our proposed model, the vertices of the metapopulation network are the locations infected with the disease, and the links connecting them represent the possible modes of transportation or a spatial distance, such as the great circle distance between the latitude and longitude coordinates of the locations at the local level. The weights of these links represent the rate or percentage of the population transmitted per unit time, such as a day. The final model is then defined by combining the dynamics of the spread at the local and global levels.
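The great circle distance mentioned above can be computed from latitude and longitude with the haversine formula. The following is a minimal sketch rather than the authors' code: the paper does not state which formula or Earth radius it uses, so the haversine form and the mean radius of 6371 km are assumptions, the function name `great_circle_km` is hypothetical, and the coordinates are approximate values used only as an example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates of two locations (illustrative only).
delhi = (28.61, 77.21)
mumbai = (19.08, 72.88)
d_kl = great_circle_km(*delhi, *mumbai)  # distance between locations k and l
```

A full distance matrix for the network would be obtained by applying this function to every pair of locations.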
The values of the model parameters are obtained by a learning technique based on training data and error minimization. In the case of COVID-19, we consider the model parameters at the local level to be the testing statistic, social distance, and the rate at which people are infected by an infected but untested (asymptomatic or pre-symptomatic) individual per unit time. In the context of India, the locations are districts, which constitute the states and union territories of India. There are 28 states and 8 union territories in India, with a total of 718 districts. Based on the proposed model we predict the number of COVID-19 infected people at both the state level and the country (India) level. The prediction depends on the number of tests performed.
Let $V = \{\, l \mid l \text{ is the index of a location} \,\}$ be the set of locations where persons infected with COVID-19 are likely to stay in or move to on a day $t$. Suppose that $N_l$ is the population size of location $l$. We introduce the following notation to model the distribution and dynamics of the pandemic. If $T_l(t)$ denotes the number of individuals tested in location $l$ up to time $t$, then $\bar{T}_l(t) = N_l - T_l(t)$ is the number of non-tested individuals up to time $t$. Let $C^+_l(t)$ and $C^-_l(t)$ be the total numbers of people infected and not infected with COVID-19, respectively, in a location $l \in V$. These quantities vary with time $t$, measured in days. For any location $l$ and day $t$, we define a random vector $X^{[l]}(t) = (X^{[l]}_1(t), X^{[l]}_2(t), X^{[l]}_3(t), X^{[l]}_4(t))^T$ with four components for the distribution of the population $N_l$. Based on the above discussion, $X^{[l]}(t)$ can be represented as a $2 \times 2$ contingency table whose rows are the tested and non-tested individuals (row totals $T_l(t)$ and $\bar{T}_l(t)$) and whose columns are the infected and non-infected individuals (column totals $C^+_l(t)$ and $C^-_l(t)$). The four cells sum to $N_l$, the total population at the location $l$, though some components, including $X^{[l]}_4(t)$, are unobserved or latent random variables. Unlike in standard epidemic models, the people who are infected with COVID-19 but not tested, including the asymptomatic ones, that is, $X^{[l]}_3(t)$, contribute to $X^{[l]}_1(t')$ at a future date $t'$ later than $t$. Besides, $C^+_l(t)$ depends strongly on the contact networks of $C^+_l(t'')$ at a previous date $t''$ earlier than $t$.
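The four categories and their marginal totals can be pictured as a small data structure. The following is a minimal sketch, not the authors' implementation; the class name `LocationState`, its field and property names, and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationState:
    """Population of one location on one day, split into the four categories."""
    x1: int  # tested and infected
    x2: int  # tested and not infected
    x3: int  # untested but infected
    x4: int  # untested and not infected

    @property
    def tested(self) -> int:  # row total: number of tested individuals
        return self.x1 + self.x2

    @property
    def untested(self) -> int:  # row total: number of non-tested individuals
        return self.x3 + self.x4

    @property
    def infected(self) -> int:  # column total: all infected individuals
        return self.x1 + self.x3

    @property
    def not_infected(self) -> int:  # column total: all non-infected individuals
        return self.x2 + self.x4

    @property
    def population(self) -> int:  # grand total: population of the location
        return self.x1 + self.x2 + self.x3 + self.x4


# Illustrative counts only; these are not data from the paper.
state = LocationState(x1=120, x2=880, x3=300, x4=98_700)
```

The sketch only makes explicit that the row and column totals are determined by the four cells, so that the untested count is always the population minus the tested count.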
But only $X^{[l]}_1(t)$ is observed. Thus the number of people tested for COVID-19 on a given day governs the dynamics of $X^{[l]}(t)$ at a location $l$ over time. Let $T_l(t+1)$ be a strategic number giving the target quantity of new COVID-19 tests to be performed on day $(t+1)$ in location $l$. Given the statistic $X^{[l]}(t)$, the number of new tests also depends on the availability of test kits. It further depends on $\bar{T}_l(t)$, the number of people not yet tested for the disease at the location $l$. Hence, the possible number of tests to be performed at $l$ is defined in terms of the target $T_l(t+1)$ and the untested population $\bar{T}_l(t)$.
In Table 2, we introduce some generic notation for the model parameters used to develop the dynamics of the system, and some hyper-parameters involved in training and updating the model parameters. All parameters carry subscripts or superscripts for time and location as needed.
Model parameters    Interpretations
$\lambda_1$         Testing-coverage probability among the infected
$\lambda_2$         Infection spreading probability
$\lambda_3$         Probability of population migration among locations
$\alpha$            Average family size
$\theta$            Mobility of individuals
$\epsilon$          Error parameter
Hyper-parameters    Interpretations
$\alpha_1$          Changing rate of $\lambda_1$
$\alpha_2$          Changing rate of $\lambda_1$ for future
$\beta_1$           Changing rate of $\lambda_2$
$r_1$               Rate of increment in testing under linear growth
$r_2$               Rate of increment in testing under log-linear growth
Now we define the dynamics of change of $X^{[l]}(t)$ for any location $l$. Here $\lambda^{[l]}_1(t+1) \in (0, 1)$ is the testing-coverage probability among the infected in location $l$ at time $(t+1)$. Hence, only a fraction of $X^{[l]}_3(t)$ will be identified as $\Delta_t X^{[l]}_1$, so it is modelled with a binomial distribution. $\lambda^{[l]}_2(t+1) \in (0, 1)$ is a probability indicating the average spread of infection among the people near a group of infected individuals. So the new spread from identified infected people is also modelled with a binomial random variable, $\mathrm{bin}(\alpha \Delta_t X^{[l]}_1, \lambda^{[l]}_2(t+1))$. $\lambda^{[l]}_3(t+1) \in (0, 1)$ is a probability close to zero indicating the influence from adjacent locations.
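One day of the local dynamics can be sketched with the two binomial draws described above, plus Poisson terms for migration and noise. This is an illustrative sketch under stated assumptions, not the paper's update equations: the function `local_step`, the argument `inflow` (a stand-in for the traffic-weighted population arriving from adjacent locations), the values of `lam1` and `lam2`, and the exact way the terms are combined are all assumptions.

```python
import numpy as np


def local_step(x1, x3, lam1, lam2, lam3, alpha, eps, inflow, rng):
    """Advance the detected (x1) and hidden (x3) infected counts by one day."""
    # Each untested infected person is detected with probability lam1.
    new_detected = int(rng.binomial(x3, lam1))
    # About alpha close contacts per detected case, each infected with probability lam2.
    new_local = int(rng.binomial(int(alpha * new_detected), lam2))
    # Infections arriving from adjacent locations (assumed Poisson mean: lam3 * inflow).
    new_imported = int(rng.poisson(lam3 * inflow))
    # Background noise with Poisson mean eps.
    noise = int(rng.poisson(eps))
    x1_next = x1 + new_detected
    x3_next = x3 - new_detected + new_local + new_imported + noise
    return x1_next, x3_next


rng = np.random.default_rng(seed=0)
x1, x3 = 10, 50  # illustrative starting counts
for _ in range(7):  # simulate one week
    x1, x3 = local_step(x1, x3, lam1=0.1, lam2=0.3, lam3=1e-3,
                        alpha=4, eps=2, inflow=1000.0, rng=rng)
```

Because detected individuals are removed from the hidden pool in the same step, the hidden count can never become negative in this sketch.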
As a consequence, it is modelled with a Poisson distribution whose mean involves $\lambda^{[l]}_3(t+1)$. The parameter $\epsilon > 0$ stands for the average noise, which follows a Poisson distribution.
Now we consider the metapopulation network $G(t)$ with vertex set $V$ of locations in order to incorporate the effect of transmission of COVID-19 across locations. Let $A(t) = [a_{kl}(t)]$ denote the adjacency matrix associated with $G(t)$, and let $d_{kl}$ denote the distance between $k$ and $l$. The weights $w_{kl}$ of the edges of $G(t)$ are defined by Eq. (6) in terms of $d_{kl}$ and $\theta(t)$, the mobility parameter. Here $w_{kl}$ denotes the diffusion weight for the human traffic flow per day between the neighboring locations $k$ and $l$. The value of $\theta(t) > 0$ may be controlled through government policies. For instance, under a strict lockdown $\theta(t)$ may be taken to be small. From these weights we define a matrix that is row-stochastic. Finally, we propose a predictive model at the level of state and country for the number of COVID-19 infected people. Note that the traffic flow between locations influences the value of $X^{[l]}_4(t+1)$ through Eq. (5), which contributes to $X^{[l]}_3(t+1)$ and finally to the number of infected people $X^{[l]}_1(t+1)$. Besides, the number of nodes in the metapopulation network $G(t)$ varies with time: at time $t$, the nodes of $G(t)$ represent the districts that are affected by the disease at time $t$. Thus, at the level of a state $S$, which consists of some locations, $X^S_1(t) = \sum_{l \in S} X^{[l]}_1(t)$ at any time $t$. Further, the number of infected people at the country level is calculated from the proposed dynamics of $X^{[l]}(t)$, where $l$ is now a state. This is done because the traffic flow between neighboring districts may differ from the traffic flow between neighboring states. Hence, at the country level, say India, denoted by $I$, $X^I_1(t) = \sum_{l} X^{[l]}_1(t)$ at any time (day) $t$, where the sum runs over the states $l$.
In this section we discuss how to determine the values of the parameters involved in the proposed epidemiological model.
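A row-stochastic traffic matrix of the kind described for the metapopulation network can be sketched as follows. The paper's weight formula, Eq. (6), is not reproduced here; the exponential distance-decay kernel `exp(-d_kl / theta)` is an assumption, chosen because it matches the behaviour the text attributes to the mobility parameter (small values favour nearby locations, large values give nearly uniform weights). The function name `traffic_weights` and the toy distances are hypothetical.

```python
import numpy as np


def traffic_weights(dist, theta):
    """Row-stochastic weights from a distance matrix, with an assumed exp(-d/theta) kernel."""
    dist = np.asarray(dist, dtype=float)
    kernel = np.exp(-dist / theta)
    np.fill_diagonal(kernel, 0.0)  # assumption: no flow from a location to itself
    return kernel / kernel.sum(axis=1, keepdims=True)  # each row sums to 1


# Toy distances (km) between three locations: 0 and 1 are close, 2 is far from both.
dist = [[0, 10, 500],
        [10, 0, 490],
        [500, 490, 0]]
w_lockdown = traffic_weights(dist, theta=50)   # small theta: mostly local flow
w_open = traffic_weights(dist, theta=2000)     # large theta: long-distance flow allowed
```

With the small value of `theta`, almost all of the flow out of location 0 goes to its close neighbour; with the large value, a substantial share reaches the distant location as well.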
Note that the initial values can be chosen sensibly based on characteristics observed in the data, and then, as time passes, the model can update the values of the parameters from observed and simulated data. Let $[t_0, t_1]$ be the learning period throughout which real data is available and the model can learn from the data to estimate the values of the parameters. Consequently, growth dynamics of the parameters can be defined that update the values of the parameters when real data is not available in the future. First we consider the parameter $\lambda^{[l]}_1$, whose update is given by Eq. (7) with changing rate $\alpha_1$: when the points simulated using $\lambda^{[l]}_1(t+1)$ fit the data poorly, the error in $X^{[l]}_1(t)$ increases, and the parameter must be updated for a better fit of the model. Next, $X^{[l]}_1(t+1)$ is the number of tested positive cases, which is also affected by the rate of infection spread $\lambda^{[l]}_2$. The growth of $\lambda^{[l]}_2$ is given by Eq. (8), where $\beta_1 \ge 0$ and $T_l(t)$ denotes the number of tests performed at the location $l$ at time $t$. The intuition behind Eq. (8) is that the probability of spread of the disease depends on the number of tests done at the location $l$. We consider constant values of $\lambda^{[l]}_3$.
Recall that individuals at the asymptomatic and pre-symptomatic stages of infection act as invisible spreaders of the disease. Hence, detection of individuals infected with the virus plays an important role in the growth dynamics of the number of infected individuals at a particular location. Thus one of the control strategies to prevent the spread is to conduct enough tests per day and separate out the infected people. In a country like India, where approximately 1.4 billion people live, conducting enough tests per day can be a difficult exercise. Besides, owing to a shortage of test kits and medical facilities, India faces many challenges in performing enough tests per day. The testing data for India, obtained from [27], is plotted in Fig. 2.
It may be observed that the data is not available for three consecutive days after the 30th day. Besides, the testing data is not available before March 19, 2020. Note that testing random samples of individuals for COVID-19 is not desirable, given the scarcity of testing kits and medical support facilities for a large population. Indeed, targeted testing by tracing the social contacts of individuals newly detected with COVID-19 can be more efficient for identifying asymptomatic and pre-symptomatic individuals who are infected with the virus. Hence the increment in the number of tests per day should depend on the testing-coverage probability among the infected individuals at a particular location, that is, $\lambda^{[l]}_1$. In this model we incorporate two possible growths of testing over time at a location: linear and log-linear. The parameters, which we call the rates of gain in the number of tests for COVID-19, are denoted by $r_1$ and $r_2$ for the linear and log-linear growth equations respectively. From the real data it can be observed that the number of tested positive cases has a positive correlation (0.9177) with the number of tests performed. The linear increment of testing is given by Eq. (9) and the log-linear increment by Eq. (10). Thus, after assigning small values to $\lambda^{[l]}_1(t)$ and $\lambda^{[l]}_2(t)$ at the beginning of the simulation, $\lambda^{[l]}_1(t)$, $\lambda^{[l]}_2(t)$, and $T_l(t)$ are updated according to Eqs. (7), (8), and (10) or (9) respectively. Further, $\alpha_1$, $\beta_1$, and $r_2$ can be selected from the interior of the unit cube $(0, 1) \times (0, 1) \times (0, 1)$, whereas $r_1$ can be larger than 1. This search method is well known as a three-dimensional grid search. Indeed, by matching the growth given by Eqs. (7), (8), and (10) to real data, the values of $\alpha_1$, $\lambda^{[l]}_2(t)$, $r_2$ (or $r_1$), and $T_l(t)$ can be learned and estimated such that the total testing $\sum_l T_l(t')$ and the total tested and infected cases $\sum_l X^{[l]}_1(t')$ at time $t' \le t$ are close to the real data. This is discussed in detail in the next subsection.
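The two testing-growth scenarios can be sketched as below. The exact forms of Eqs. (9) and (10) are not recoverable from the text here, so both update rules are assumptions: a linear rule that adds `r1 * lam1` tests per day and a log-linear rule that multiplies the daily tests by `(1 + r2 * lam1)`. All function names and the value of `lam1` are hypothetical.

```python
def next_tests_linear(tests_today, r1, lam1):
    """Assumed linear rule: add a fixed increment, scaled by the testing-coverage probability."""
    return tests_today + r1 * lam1


def next_tests_loglinear(tests_today, r2, lam1):
    """Assumed log-linear rule: grow by a fixed proportion, scaled by the same probability."""
    return (1.0 + r2 * lam1) * tests_today


def project(tests_today, days, step, rate, lam1):
    """Project daily tests forward for a number of days under the given rule."""
    series = [float(tests_today)]
    for _ in range(days):
        series.append(step(series[-1], rate, lam1))
    return series


# Start from roughly 1,00,000 tests per day, as reported in the text; lam1 is illustrative.
linear = project(100_000, days=3, step=next_tests_linear, rate=5000, lam1=0.5)
loglinear = project(100_000, days=2, step=next_tests_loglinear, rate=0.4, lam1=0.5)
constant = project(100_000, days=5, step=next_tests_linear, rate=0, lam1=0.5)
```

Setting the rate to zero in either rule keeps the daily number of tests constant, which corresponds to the baseline scenario in which testing stays at its present level.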
These estimated values can be used for the training of the model; the parameter values and $T_l(t')$ are the values learned from the given data. However, for any $t$ later than $t'$, when real data are not available, the trained model can be used for prediction. For this purpose an update of $\lambda^{[l]}_1$ for future times is defined, with changing rate $\alpha_2$.
Let $\hat{X}^{[l]}_1(t)$ and $X^{[l]}_1(t)$ be the simulated and observed numbers, respectively, of people detected as infected with COVID-19 after testing at a location $l$ at time (day) $t$. Consider the time series of real data $X^{[l]}_1(t)$, $t_0 \le t \le t_1$, for a particular location $l \in V$, the vertex set of the metapopulation network. This constitutes the complete observed data set $X_1$, which is divided into two sets, the training set $X^T_1$ and the validation set $X^V_1$, for estimating the model parameters. The model parameters are calculated by minimizing an error function that combines the errors $L_e$ and $T_e$ with a weight $w$. Note that $T_e$ and $L_e$ are computed over the two different sets $X^V_1$ and $X^T_1$ respectively, and the weight $w$ is defined so as to avoid imbalances in the data.
Now we discuss how the values of the model parameters estimated from real data at the local level can be used to predict the number of infected people at a global level, such as the state and country level, in the near future. We propose to train the model at two levels: the state level and the country level. Recall that a state in India consists of several districts (locations denoted by $l$), and that India has 28 states and 8 union territories. In this paper we adopt a two-step approach for the prediction. The global-level parameters include the social mobility parameter $\theta$ and the traffic flow across the local-level locations, given by the edge weights of the metapopulation network. First, we make the state-level prediction, that is, $X^S_1$ is obtained from $X^{[l]}_1$ considered at the district level $l$, where $S$ is a state of India.
The metapopulation network for a state $S$ is formed by the vertices that are the districts belonging to $S$, and the traffic flow is represented by the weights $w_{kl}$ defined by Eq. (6). The distance $d_{kl}$ between two districts $k$ and $l$ is the great circle distance between the longitude and latitude coordinates of $k$ and $l$. Next, once the estimates of $X^S_1$ are obtained for all states $S$ in India, the prediction at the national level is obtained by applying the proposed model with the states as locations. Thus the model parameters are further estimated by comparison with the real data at the state level, as described above. Further, the metapopulation network of states is constructed, and the traffic flow is calculated using the weight formula for $w_{kl}$, where the distance between two states $k$ and $l$ is the great circle distance between their longitude and latitude coordinates. Note that the predictions at both state and country level incorporate the social mobility parameter $\theta$, which captures the effect of government policies. For instance, during lockdown the value of $\theta$ is taken to be around 50 (more weight on local travel), and around 2000 (which includes long-distance travel) when there is no lockdown. Besides, the metapopulation network between the locations $l$ plays a crucial role in the prediction. The rate of traffic between two locations is given by Eq. (6). Observe that the effect of the social mobility of individuals is also incorporated in the traffic flow.
In this section, the proposed model is trained with the data on the population infected with COVID-19 and the number of tests performed in India from March 4, 2020 to May 7, 2020 [28]. Since there was a nationwide lockdown during this period, the traffic flow across states was low. Therefore we simulate the model from an initial time $t_0 = 0$, which is March 4, 2020, by setting initial parameter values, including $\lambda^{[l]}_3(t) = 1 \times 10^{-3}$ for all $t \ge t_0$, for every location $l$.
These values are assumed for the following reasons. (1) The testing-coverage probability is very small, since the number of people infected with COVID-19 at the beginning of the spread is small. (2) Since the average family size is 4, approximately 3 people out of 10 may be exposed to infection, assuming that a person is in close and frequent contact with a group of infected people. (3) For simplicity of the model, we choose the value of $r_2$ such that the total number of tested individuals is close to the real data. We obtained $r_2 = 0.28$ during the optimization (training) using grid search, so as to match the number of tests performed each day (approximately 1,00,000 per day) under Eq. (10). (4) Due to the nationwide lockdown, the traffic flow across locations is low. Hence $\theta \le 70$ and $\lambda^{[l]}_3(t)$ is very small. The values of $\theta$ and $\lambda^{[l]}_3(t)$ are obtained using grid search. During error optimization, $(\alpha_1, \beta_1, r_2, \theta, \lambda_3)$ is selected by a five-dimensional grid search.
After initializing the model, the parameter values are learned from the real data. For instance, the number of tests throughout the period March 04, 2020 to May 20, 2020 in India is approximately $10^5$ per day [28, 27], and hence the value of $r_2$ is kept fixed while training the model with real data. The number of tests at a location $l$ is assumed to be a random number between 1 and 5 when the first case of COVID-19 is reported there. The model is trained with the real data collected from How India Lives [28, 27] for the period March 04, 2020 to May 07, 2020 (65 days of data). The remaining data, that is, the real data for the period May 08, 2020 to May 14, 2020, is used for validation; see Table 3. After training and validation, two cases of gain in the rate of testing are considered: linear and log-linear. We consider the 8 states with the highest numbers of tested positive cases. For each state, we learn a model and predict the probable number of tested positive cases 7 days after the last day of the validation data.
We consider only those states that have sufficient data to train the model (at least 2000 tested positive cases on May 7, 2020). The learned values of the model parameters for each state are given in Table 3. After training the model on the data of each state, we predict the total number of tested positive cases in each state on May 20, 2020. Predicted and actual values are given in Table 4. In all the experiments performed in this work, we set $\epsilon = 2$ and $\alpha = 4$, while $\lambda_3$ and $\theta$ are selected using grid search, with values given in Table 3. Thus, as of May 20, 2020, we conclude that the model is able to learn and predict the total number of tested positive cases in each of these states. In Figures 3 and 4, the data and the corresponding fitted curves are shown: blue dots correspond to the real data used as the training set, and sky blue dots correspond to the validation set. The grey circles represent the trained and predicted values of the proposed model, which follow the real data very well. The grey square in each plot (the point marked X = 78) indicates the predicted value on May 20, 2020. Apart from the prediction for the next 7 days, we also make predictions for 60 days, 180 days, and 365 days after the date of validation (May 14, 2020) under different values of the testing rates $r_1$ and $r_2$. Predicted values are tabulated in Tables 4 and 5 for different states and in Table 6 for India. Note that $r_1 = 0$ means that the testing statistics remain as in the testing data on May 07, 2020. However, if the total number of tests increases linearly or log-linearly as defined by Eqs. (9) and (10), then the number of people detected with COVID-19 increases significantly. This is discussed in the next subsection.
The estimate of the total number of COVID-19 infected people in India would be approximately 4,60,000 on July 7, 2020; 19,00,000 on November 7, 2020; and 46,00,000 on May 7, 2021, if testing maintains the present statistics, including the lockdown condition. Currently, approximately 1,00,000 samples are tested per day in India. However, training the proposed model on the real data gives the rate of testing $r_2 = 0.28$. The testing data is available at the state level but not at the district level. After learning from the data, the model uses different rates of gain in testing for validation. If $r_2 = 0$, per day testing is constant for the entire period of interest. In order to observe the effect of testing statistics on the total number of people infected with COVID-19, we consider two types of growth in testing: linear and log-linear. First we perform this experiment with testing increased linearly using Eq. (9), for $r_1 = 5000$ and $r_1 = 50000$. As follows from Figure 5, we consider three cases: (I) Per day testing is constant; the corresponding curve, in grey, is the lowest of the three. The total number of COVID-19 cases increases slowly because few tests are performed over a huge population. (II) An increment in tests per day with $r_1 = 5000$ results in a drastic change in the number of infected cases, which increases continuously; after a certain time the rate of infection goes down, but the spread is not contained. This implies that this rate of testing may not control the spread of the disease. (III) Finally, when testing grows linearly with $r_1 = 50000$, the simulation shows that the number of infected cases stabilizes and the spread is contained. During the simulation, we consider parameter values at the state level. Now we discuss the effect of testing when it increases at a log-linear rate under Eq. (10).
We consider the following cases: (I) The number of tests is constant at its value as of May 7, 2020. (II) If the gain in the rate of testing is $r_2 = 0.1$, the spread does not stabilize. (III) When $r_2 = 0.4$, the simulation shows that the spread stabilizes and can be controlled. The number of infected cases stabilizes after a certain time; a magnified view is shown in Figure 6.
Figure 7 (caption): effect of the mobility $\theta$, the migration probability $\lambda_3(t)$, and the testing rate $r_2$. There are three regions in both sub-figures: (I) a narrow region on the left with fewer tested positive cases than the middle region (II), which is followed by a wide region (III). This signifies that beyond a certain rate of testing $r_2$, the spread of infection can be controlled before it reaches a pandemic-like situation. (a) When $\lambda_3(t)$ (= 1/100) is very small, $\theta$ shows no impact. Dark red patches correspond to points $(r_2, \theta)$ with large numbers of tested positive cases, and these points are scattered all over the middle region (II) (the highest value of tested positive cases is almost $6 \times 10^6$). (b) When $\lambda_3(t)$ (= 1/10) is significantly large, it shows the contribution of local mobility to the spread of infection. In the plot, the dark red patches are more concentrated inside the circle (lower values of $\theta$ and $r_2$). In the middle (red) region, lower values of $\theta$ give more weight to nearby locations and less weight to distant locations, while higher values of $\theta$ give almost equal weight to all regions for mobility.
Now we study the effect of the mobility of individuals ($\theta$), the probability of migration across locations ($\lambda_3(t)$), and the testing rate ($r_2$). During the lockdown period, transmission occurs mainly between districts that are not far apart and belong to the same state, and the same is true for states within the country. Besides, the probability of migration is very small owing to the lockdown. In the simulation, we notice that the mobility parameter reaches values up to 70 and the migration probability is less than 0.1.
In Figure 7 (in both plots) we identify three regions: (I) the region in the middle, where the number of simulated tested-positive cases is highest; (II) a narrow region to the left of (I), where the number of tested-positive cases is lower than in (I); and (III) a stabilizing region to the right of (I), which has fewer tested-positive cases and signifies that beyond a certain value of r_2 the spread of infection can be controlled. We observe the following: when λ_3(t) = 0.1, the contribution of local mobility to the spread of the disease becomes noticeable. In the second plot of Figure 7, there are dark red patches (inside circles) corresponding to lower values of θ and r_2. In the middle (red) region, the number of infected cases is higher than in the previous case. Moreover, note from Eq. (6) that if the value of the social mobility parameter θ is much larger than max_{k,l∈V(t)} d_kl, then the traffic flow indicator w_kl becomes almost uniform for all k and l. On the other hand, a small value of θ generates more traffic flow between neighboring locations and induces less traffic flow between locations that are far apart. For comparatively large θ, the traffic flow is almost equal across locations. Besides, if θ is such that it induces higher traffic flow between locations that are significantly infected, the number of infected people in both locations increases significantly. Consequently, the dark red patches are widespread over the middle region in Figure 7(a). Therefore, it is reasonable to conclude that the dark red patches at lower values of θ signify local transmission, and that larger values of θ correspond to long-distance transmission of COVID-19.

In this section, we make predictions at the country level. X to May 14, 2020.
In Figure 8(a), blue dots are the data points used to train the model (from March 4, 2020 to May 7, 2020), grey circles are the trained and predicted values, and sky blue dots are the data points used to validate the prediction (from May 8, 2020 to May 14, 2020) [27]. The plot shows that the model is trained on 65 days of infection data and validated on the next 7 days of data, and that the prediction for those 7 days is almost accurate. The sky blue dots (real data) almost coincide with the centres of the corresponding predicted values (grey circles). Here the training error is 37.2070. Apart from short-range prediction, we also make long-range predictions: the possible numbers of tested-positive cases after 60 days, 180 days, and 365 days are noted in the last column of Table 6 under different testing rates r_1 and r_2. We consider r_1 = 0, 20000, 60000, and r_2 = 0, 0.1, 0.4; after training, r_1 = 0 means testing continues at the current volume (approximately 1,00,000 per day). If the number of tests increases linearly with r_1 = 10^3, the total number of people infected with COVID-19 would be approximately 5.3 million on July 7, 2020; 88 million on November 7, 2020; and 220 million on May 7, 2021. With linear growth of r_1 = 5 × 10^3 in testing, the approximate total number of infected people in India is 2 million on July 7, 2020; 59 million on November 7, 2020; and 130 million on May 7, 2021; see Table 6. The corresponding figures for a log-linear increase in testing per day are also given in Table 6.

In this section, we discuss the stabilization of the spread of COVID-19 in the future. This means that the number of newly affected people gradually decreases and the total number of infected people at country level becomes almost constant.
From the analysis of the effect of the mobility parameter θ and the gain in testing rate r_2 (log-linear) or r_1 (linear) in Figure 7, it can be concluded that a higher testing rate is more effective, given that the presently available data were obtained under a very low mobility rate. However, as mobility increases after the nationwide lockdown is lifted, the infection will presumably spread very fast. Here, we present a time series analysis of the spread of infection under different values of the testing parameter r_1 for all the states and for India. Figures 8(b) and 6(a) at country level, and Figures 9 and 10 at state level, show that the number of tested-positive cases stabilizes once the increase in testing exceeds a certain threshold.

The second wave of a pandemic is often observed in a region when interventions are effectively applied to mitigate the spread of the disease but are then lifted [29]. In the proposed model, the second wave can be examined under the following scenarios: (I) if the number of tests performed daily is not enough, that is, it is only on par with the cumulative number of all social contacts of people previously detected with COVID-19, then there would be no sign of a second wave; (II) if the number of tests is large enough that the number of cases available to be tested on the next day decreases continuously, the spread will be controlled soon; and (III) if the number of tests performed daily is sufficient to detect the cases at the present state of infection, but certain events intervene (for example, the number of daily tested-positive cases decreases and the number of tests performed also falls below the required limit), then a second wave of diffusion may be observed. Thus it is important to track the last trail of infection diffusion completely in order to control it.
Using simulation, we show in Figure 11 that the second wave can be observed under different scenarios, including when the number of tests is increased (a) linearly and (b) log-linearly.

We have proposed an epidemiological model for the spread of COVID-19. The model is based on spread at the local level, which can be a province, town, city or district, and combines a statistical approach with the metapopulation network of infected locations. The model incorporates a few parameters that represent the effect of spread by asymptomatic or pre-symptomatic individuals, the restricted mobility of individuals, and the testing statistics. Predictions of the total number of people who test positive for COVID-19 are made at the level of the state and of the entire country, based on testing data as of May 7, 2020, and under linear and log-linear growth of testing. Finally, it is shown that the spread can be contained in the very near future if linear or log-linear growth of testing is adopted. The stabilization of infected cases depends primarily on the number of tests and on the inter-location movement of the population, that is, on the strictness of the lockdown. If the testing rate is low or moderate, it may show a lower count of infected cases, but there is a chance that a second wave will hit back. If the testing rate is sufficiently large and testing is executed with a proper sampling scheme, the count of positive cases will stabilize much earlier. The proposed epidemiological model can be applied and generalized to predict the total number of people testing positive for COVID-19 in any country. Indeed, if the metapopulation network is a network of countries, predictions can be made at the world level based on data on the movement of populations across countries.
COVID-19 epidemic in Switzerland: on the importance of testing, contact tracing and isolation
Response to COVID-19 in Taiwan: big data analytics, new technology, and proactive testing
An alternating lock-down strategy for sustainable mitigation of COVID-19
How will country-based mitigation measures influence the course of the COVID-19 epidemic?
Why lockdown: On the spread of SARS-CoV-2 in India, a network approach
Age-stratified model of the COVID-19 epidemic to analyze the impact of relaxing lockdown measures: nowcasting and forecasting for Switzerland
Robust and optimal predictive control of the COVID-19 outbreak
How and when to end the COVID-19 lockdown: an optimisation approach
A statistical model to monitor COVID-19 contagion growth
Data-based analysis, modelling and forecasting of the COVID-19 outbreak
Early dynamics of transmission and control of COVID-19: a mathematical modelling study
Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy
Simplified model of COVID-19 epidemic prognosis under quarantine and estimation of quarantine effectiveness
A mathematical model for the spatiotemporal epidemic spreading of COVID19
Metapopulation modeling of COVID-19 advancing into the countryside: an analysis of mitigation strategies for Brazil
COVID-19 containment policies through time may cost more lives at metapopulation level
SEIR and regression model based COVID-19 outbreak predictions in India
Predictions for COVID-19 outbreak in India using epidemiological models
Forecasting COVID 19 growth in India using susceptible-infected-recovered (SIR) model
Age-structured impact of social distancing on the COVID-19 epidemic in India
A critique of the COVID-19 analysis for India by Singh and Adhikari
COVID-19 in India: State-wise analysis and prediction
Modeling geographical spread of COVID-19 in India using network-based approach
Multi-city modeling of epidemics using spatial networks: Application to 2019-nCoV (COVID-19) coronavirus in India
Spread of Zika virus in the Americas
Spatial epidemiology of networked metapopulation: An overview
Not all interventions are equal for the height of the second peak

Acknowledgement. The authors thank Vaidik Dalal of How India Lives for his help with the data.