key: cord-0124064-rxwc89ma authors: Biswas, Kathakali; Khaleque, Abdul; Sen, Parongama title: Covid-19 spread: Reproduction of data and prediction using a SIR model on Euclidean network date: 2020-03-16 sha: 1360b6c29928bc88e40f1820b6da4bd0efaa6e05 doc_id: 124064 cord_uid: rxwc89ma

We study the data for the cumulative as well as the daily number of cases in the Covid-19 outbreak in China. The cumulative data can be fit to an empirical form obtained from a Susceptible-Infected-Removed (SIR) model studied previously on a Euclidean network. Plotting the number of cases against the distance from the epicenter for both China and Italy, we find an approximate power-law variation with an exponent $\sim 1.85$, strongly indicating that spatial dependence, a factor included in the model, plays a key role. We report here that the SIR model on the Euclidean network can reproduce the data for China with high accuracy for suitable parameter values, and can also predict when the epidemic, at least locally, can be expected to be over.

The novel coronavirus (COVID-19), which causes an acute respiratory disease in humans, has already spread to nearly 150 countries and has recently been declared a pandemic by the World Health Organisation [1, 2]. The original epicenter has been identified as the city of Wuhan in mainland China; the virus later spread to other countries, affecting Italy and Iran most severely. At present, the epicenter is believed to have shifted to Europe. While the rise in the number of cases in the western world is alarming, drastic precautionary steps taken in China, South Korea and Hong Kong have so far been successful in containing the virus. The space-time dependence of the spread of the virus can be understood best from the data for China, where the numbers are quite large.
A considerable number of analyses of the available data on the number of cases and deaths have been attempted recently, and a few data-driven models have also been proposed [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18]. In particular, an exponential growth in time in the early stage is noted in all the majorly affected countries, whereas the number of deaths is seen to follow a power-law behaviour [17]. This could be due to purely medical reasons, as the possibility of survival also depends on the stage of detection and the treatment received. Since the virus can be contracted only once, an affected person either dies or recovers; the disease therefore belongs to the class of infectious diseases described by Susceptible-Infected-Removed (SIR) type models. In this model, the population is divided into three categories, Susceptible, Infected and Removed, and the total population (including the deaths) is constant. As the disease propagates, susceptibles are liable to get infected, and the infected people either die or recover (both outcomes are treated in the same manner); the total population infected over time forms the removed category. Usually the densities of the three categories, denoted by S, I and R, are considered; they depend on time t, with $I = dR/dt$ and $S + I + R = 1$. In such epidemics, I, which corresponds to the newly infected density in real situations, initially shows a slow increase with time that changes into a steep rise before it reaches a maximum value. Typically, the increasing phase shows an exponential behaviour. After reaching the peak, I gradually decreases to zero, at which point one can declare that the epidemic is over. China is already in this decreasing phase, while most other countries are yet to reach the peak value. Accordingly, the cumulative data R show a saturation once the peak value has been crossed.

In Fig. 1, we plot the total fraction of cases R versus time for China, where the total number of cases is divided by the total population $N_{China}$, using the data in [19]; the daily reports are available starting from January 21, 2020 (Day 0), and the number of records up to March 11 is 50. A jump in the data shows up on the 26th day, presumably because the criteria for confirming that a person has actually contracted the disease were initially more stringent and were relaxed later on. The resulting jump makes the data somewhat difficult to fit with a smooth function; nevertheless, one can fit the data to the empirical form

$$R(t) = \frac{a\,e^{t/T}}{1 + c\,e^{t/T}} \qquad (1)$$

with $a = 1.15698 \times 10^{-6}$, $c = 0.0201529$ and $T = 5.22$. The errors in a and c are about 16%, while that in T is 4.3%. This fit was already noted in an earlier report for the data up to day 40 [18]. The data for the number of newly infected people I have more scatter, so instead of fitting them directly, we calculate I by differentiating R with respect to t. Maximising I gives its peak value and locates the time $t_p$ at which the peak occurs:

$$t_p = T \ln(1/c).$$

This gives $t_p = 20.38$, which is larger than the actual value (about 15); the discrepancy may be due to the discontinuity in the data mentioned earlier. Note that this analysis can be made only once I has entered the decreasing phase. The empirical form of Eq. (1) was in fact obtained from a theoretical SIR model on a Euclidean network studied previously [20], and was also shown to fit accurately the data for the Ebola outbreak in West Africa [21], which occurred about five years earlier. In this model, one assumes that apart from the nearest neighbours, there are also random connections at a distance ℓ with a probability $P(\ell) \propto \ell^{-\delta}$. Here an infected person can infect her nearest neighbours with a probability q.
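As a quick consistency check on the quoted fit, the Python sketch below evaluates the cumulative fraction, its derivative $I = dR/dt$, and the peak time. It assumes that the empirical form of Eq. (1) is the logistic-type curve $R(t) = a e^{t/T}/(1 + c e^{t/T})$; this reading is our assumption, supported by the fact that with the quoted a, c and T it reproduces the peak time of 20.38 days and gives a saturation value $a/c \approx 5.7 \times 10^{-5}$. The function names are hypothetical.

```python
import math

# Fit values quoted in the text for China (R is a fraction of the population).
A = 1.15698e-6
C = 0.0201529
T = 5.22  # days


def cumulative_fraction(t: float, a: float = A, c: float = C, tau: float = T) -> float:
    """Assumed form of Eq. (1): R(t) = a e^{t/T} / (1 + c e^{t/T})."""
    x = math.exp(t / tau)
    return a * x / (1.0 + c * x)


def new_case_density(t: float, a: float = A, c: float = C, tau: float = T) -> float:
    """I(t) = dR/dt = (a/T) e^{t/T} / (1 + c e^{t/T})^2, differentiated analytically."""
    x = math.exp(t / tau)
    return (a / tau) * x / (1.0 + c * x) ** 2


def peak_time(c: float = C, tau: float = T) -> float:
    """dI/dt = 0 where c e^{t/T} = 1, so t_p = T ln(1/c)."""
    return tau * math.log(1.0 / c)


if __name__ == "__main__":
    t_p = peak_time()
    print(f"peak time t_p  = {t_p:.2f} days")  # 20.38
    print(f"peak I(t_p)    = {new_case_density(t_p):.3e} (equals a/(4cT))")
    print(f"saturation a/c = {A / C:.3e}")
```

Differentiating the fitted R analytically, rather than fitting the noisier daily counts, mirrors the procedure described in the text.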
Although one assumes that the agents do not move here, the fact that they can infect a person at a distance implicitly includes the possibility that an agent can travel. To justify the need for a Euclidean model to fit the data, we have also calculated the number of cases R(d) recorded as a function of the Haversine distance d (in km) from the epicenter. For China, these data are obtained taking Wuhan as the epicenter. Since Italy has also recorded a large number of cases, we plot the data for Italy alongside, taking as the epicenter Bergamo, where the largest number of cases has been recorded. Both data sets are consistent with a power-law decay at larger distances, as shown in Fig. 2,

$$R(d) \propto d^{-\gamma} \qquad (2)$$

with $\gamma = 1.85 \pm 0.1$. Since the data understandably have a large amount of scatter (even after binning), this estimate is only approximate. While R(d) is expected to decrease with d, the power-law behaviour is not obvious. The correlation coefficient has also been calculated for the two data sets: it is -0.268 for China and -0.383 for Italy. This distance dependence, however, can only be obtained once the epicenter is identified; otherwise one obtains no systematic dependence, as was the case when the data for the rest of the world were plotted with Wuhan as the epicenter [18]. The form of R and the power-law dependence of the infection on distance support the claim that a Euclidean network model is appropriate for studying the outbreak. However, to establish that an SIR model on a Euclidean network can indeed explain the data, we next perform an "agent-based" simulation to find the values of the parameters, if any, that reproduce the data accurately. For the Ebola outbreak, this was done quite successfully using the same simple model. The SIR model on the Euclidean network has two parameters: δ, already defined, and q, the infection probability. The value of δ essentially sets the range of contact; δ ≥ 2 corresponds to a short-range model.
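The distance analysis above can be sketched with a few helper functions. This is a minimal illustration under stated assumptions, not the authors' code: the mean Earth radius of 6371 km, logarithmic binning with arithmetic means of distances and counts, a least-squares fit of $\log R$ against $\log d$ to estimate γ, and the use of the Pearson coefficient for the quoted correlation values are all our assumptions. Real input would be per-region (latitude, longitude, case count) records plus the epicenter's coordinates; none are included here, and the example uses synthetic data only.

```python
import math
from typing import Sequence

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (Haversine) distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def log_bin(d: Sequence[float], r: Sequence[float], base: float = 2.0):
    """Average distances and counts inside logarithmic bins [base^k, base^(k+1))."""
    bins: dict[int, list[tuple[float, float]]] = {}
    for dist, count in zip(d, r):
        bins.setdefault(math.floor(math.log(dist, base)), []).append((dist, count))
    centres = [sum(p[0] for p in bins[k]) / len(bins[k]) for k in sorted(bins)]
    means = [sum(p[1] for p in bins[k]) / len(bins[k]) for k in sorted(bins)]
    return centres, means


def power_law_exponent(d: Sequence[float], r: Sequence[float]) -> float:
    """Least-squares slope of log R versus log d; returns gamma for R(d) ~ d^(-gamma)."""
    xs = [math.log(v) for v in d]
    ys = [math.log(v) for v in r]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Synthetic, exactly power-law data (not real case counts).
    distances = [100.0, 200.0, 400.0, 800.0, 1600.0]
    cases = [5000.0 * dist ** -1.85 for dist in distances]
    print(round(power_law_exponent(distances, cases), 2))  # 1.85
```

With real, scattered data the fitted slope would carry a sizeable error, which is why the text quotes γ only approximately.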
The model shows long-range behaviour for δ < 2 and, for sufficiently small values of δ, small-world behaviour [22]. For real-world networks δ is therefore expected to be less than 2; however, the data plotted in Fig. 2 suggest that it is not very far from 2. In the model there exists a critical value of the infection probability, denoted by $q_c$, above which the disease becomes epidemic. $q_c$ depends on δ, and for an appropriate reproduction of the data, q must be above $q_c(\delta)$. In the simulation, 100 different network configurations have been used. Initially, one randomly chosen node is infected, and for each network 800 such choices have been considered. A system of $2^{12}$ nodes has been taken; each node is connected to its nearest neighbours and, on average, to one more node at a random distance. To compare the results of the model with the real data, the latter must also be normalised, and thus we consider the fraction of total infected people plotted in Fig. 1. However, this number is rather small even at saturation, while in the model it is O(0.1). Thus one needs to rescale the data obtained from the model to show the agreement. The time unit in the simulation is one Monte Carlo step (MCS), which also needs to be rescaled according to the real data. The rescaling factor for the fraction of affected people is determined from the saturation values of the real data and of the simulation data, such that the two coincide; of course, this does not ensure that the entire data will match. We argue that this factor can be interpreted as the fraction of the population exposed to the disease; for China, the fraction (rescaling factor) is $\rho = 6.0783 \times 10^{-5}$. For the time axis, we find that one MCS corresponds to about 0.95 days, and the rescaling is done accordingly. Fig. 3 shows the real data and the results from the model after rescaling, with the parameters δ = 1.8 and q = 0.85; the comparison for the new case density is also shown. The data match the simulation results quite well, apart from the discontinuity in the real data, which cannot be reproduced in the simulation. When δ and q are varied within ±0.05, the agreement remains fairly good.

From the model, one can also predict when the disease is going to stop. Ideally this happens when I becomes zero, i.e. when no infected person is left in the population. Here, once I becomes less than $1/N_{China}$, one can assume that no infected person remains. Accordingly, we find this time to be around 64 MCS, which corresponds to 61 days. On the other hand, the simulation shows zero infected agents at 82 MCS, which corresponds to about 78 days. This indicates that the disease can be expected to be eradicated from China within approximately 60 to 78 days from January 21, 2020. Of course, these are averaged values, and there will be fluctuations.

Discussions: We find a rather high value of δ that gives good agreement; this may indicate that the range of contact between infected and susceptible agents is quite short. In principle, δ can vary from country to country depending on population density, travel patterns and other factors. It is also significant that the value of δ is quite close to γ (Eq. (2)) found in the real data. As mentioned earlier, the model had also reproduced the data for the Ebola outbreak quite well. In comparison, we find a larger value of q, the infection probability, here. This is consistent with the fact that Ebola is transmitted through more intimate contact. An important and interesting point is that we obtained good agreement without any factor representing preventive measures in the model. It is therefore important to understand how the disease is contained in the model.
In the model, at later stages, the disease gets controlled because an infected person no longer finds enough susceptible people to infect. Indirectly, this means that the susceptible people now live in isolation, which is exactly what curbs on the mobility and social life of citizens aim at. So the model, in which mobility is not allowed, in a way successfully mimics this situation. Incorporating preventive factors, e.g. by making q time dependent or by reducing the number of neighbours in the model, will of course flatten the daily cases curve [23]; on the other hand, one must then also include the possibility that the virus mutates and becomes stronger. In another recent analysis, the death factor has also been incorporated in the SIR model [24]. In any case, these factors make the model more complicated, and our results show that they are not necessary. Also, whether improvement in treatment actually helps in controlling the disease is debatable: a recent analysis suggests that enhanced medical facilities can in theory flatten the daily cases curve, but not in practice [25]. Social isolation is therefore the key factor ultimately responsible for controlling the disease.
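The simulation protocol described earlier can be sketched as follows. This is a hedged, minimal Python version, not the authors' code. The text fixes only the network size ($2^{12}$ nodes), nearest-neighbour links plus on average one long-range link with $P(\ell) \propto \ell^{-\delta}$, the infection probability q, 100 networks with 800 initial seeds each, $\rho = 6.0783 \times 10^{-5}$, and about 0.95 days per MCS. Everything else is our assumption: a one-dimensional ring geometry, synchronous updates, an agent that stays infectious for exactly one MCS before being removed (so that the newly infected fraction per step plays the role of $I = dR/dt$), real fractions obtained by multiplying model fractions by ρ, and a population of $1.4 \times 10^9$ used only for illustration. All function names are hypothetical.

```python
import random
from itertools import accumulate
from typing import Optional

SUSCEPTIBLE, INFECTED, REMOVED = 0, 1, 2


def build_network(n: int, delta: float, rng: random.Random) -> list[set[int]]:
    """Assumed ring of n nodes with nearest-neighbour links plus n // 2 long-range
    links (one extra link per node on average) of length l with P(l) ~ l^(-delta)."""
    adj = [{(i - 1) % n, (i + 1) % n} for i in range(n)]
    lengths = list(range(2, n // 2 + 1))
    weights = [length ** (-delta) for length in lengths]
    for length in rng.choices(lengths, weights=weights, k=n // 2):
        u = rng.randrange(n)
        v = (u + rng.choice((-length, length))) % n
        adj[u].add(v)
        adj[v].add(u)
    return adj


def run_epidemic(adj: list[set[int]], q: float, seed: int, rng: random.Random) -> list[float]:
    """One SIR run from a single infected seed; returns the newly infected
    fraction at each MCS (assumed: infectious for one MCS, then removed)."""
    n = len(adj)
    state = [SUSCEPTIBLE] * n
    state[seed] = INFECTED
    infected = [seed]
    new_fraction = [1 / n]
    while infected:
        newly = []
        for u in infected:
            for v in adj[u]:
                if state[v] == SUSCEPTIBLE and rng.random() < q:
                    state[v] = INFECTED
                    newly.append(v)
        for u in infected:
            state[u] = REMOVED
        infected = newly
        new_fraction.append(len(newly) / n)
    return new_fraction


def averaged_curves(n: int = 2 ** 12, delta: float = 1.8, q: float = 0.85,
                    networks: int = 100, seeds: int = 800, rng_seed: int = 1):
    """Average I(t) over networks x seeds; R(t) is its running sum, so I = dR/dt."""
    rng = random.Random(rng_seed)
    runs = []
    for _ in range(networks):
        adj = build_network(n, delta, rng)
        for _ in range(seeds):
            runs.append(run_epidemic(adj, q, rng.randrange(n), rng))
    steps = max(len(run) for run in runs)
    i_avg = [sum(run[t] for run in runs if t < len(run)) / len(runs) for t in range(steps)]
    return i_avg, list(accumulate(i_avg))


def end_day(i_avg: list[float], rho: float = 6.0783e-5,
            population: float = 1.4e9, days_per_mcs: float = 0.95) -> Optional[float]:
    """First MCS after the peak at which rho * I falls below 1/population,
    converted to days (the population value is illustrative)."""
    peak = max(range(len(i_avg)), key=i_avg.__getitem__)
    for t in range(peak, len(i_avg)):
        if rho * i_avg[t] < 1 / population:
            return t * days_per_mcs
    return None


if __name__ == "__main__":
    # Reduced sizes for a quick run; the text uses networks=100, seeds=800.
    i_avg, r_avg = averaged_curves(networks=5, seeds=20)
    print("days until rescaled I < 1/population:", end_day(i_avg))
    print("MCS until no infected agent remains:", len(i_avg) - 1)
```

With the defaults, which mirror the sizes quoted in the text, the pure-Python loop performs 80,000 runs and is slow; reduced values as in the `__main__` block give a quick qualitative picture. The last index of `i_avg` is the step at which the longest run has no infected agent left, the analogue of the 82 MCS quoted in the text.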
References:
A pneumonia outbreak associated with a new coronavirus of probable bat origin
Abnormal respiratory patterns classifier may contribute to large-scale screening of people infected with COVID-19 in an accurate and unobtrusive manner
Data analysis for the COVID-19 early dynamics in Northern Italy
A Time-dependent SIR model for COVID-19
Viewing the Progression of the Novel Corona Virus (COVID-19) with NewsStand
Predicting the cumulative number of cases for the COVID-19 epidemic in
How many infections of COVID-19 there will be in the "Diamond Princess
The Reconstruction and Prediction Algorithm of the Fractional TDD for the Local Outbreak of COVID-19
The Outbreak Evaluation of COVID-19 in Wuhan District of China
Deep Learning System to Screen Coronavirus Disease 2019 Pneumonia
Youjin Deng, Scaling features in the spreading of COVID-19
Effective containment explains sub-exponential growth in confirmed cases of recent COVID-19 outbreak in Mainland China
Artificial Intelligence Forecasting of Covid-19 in China
Visual Data Analysis and Simulation Prediction for COVID-19
Epidemic analysis of COVID-19 in China by dynamical modeling
Trend and forecasting of the COVID-19 outbreak in China
Fractal kinetics of COVID-19 pandemic
Space-time dependence of corona virus
The susceptible-infected-recovered model on a Euclidean network
An empirical analysis of the Ebola outbreak in West Africa
Sociophysics: an Introduction
Analysis and forecast of COVID-19 spreading in China
Don't Flatten the Curve

Acknowledgement: We thank Anirban Kundu, Arnab Chatterjee and Soumyajyoti Biswas for interesting discussions and suggestions. Private communication with Amita Kapoor and Robert Ziff is also acknowledged.