key: cord-1029692-ead7ocw1 authors: nan title: Analysis, Modeling, and Representation of COVID-19 Spread: A Case Study on India date: 2021-05-17 journal: IEEE Trans Comput Soc Syst DOI: 10.1109/tcss.2021.3077701 sha: d8db24daaff54a3ff37987999e52adfe78577f18 doc_id: 1029692 cord_uid: ead7ocw1

The coronavirus outbreak is one of the most challenging pandemics the entire human population has faced. Measures such as isolating infected people and maintaining social distancing are among the main preventive measures against the pandemic. Estimating the actual number of infected people from limited data is an underdetermined problem faced by data scientists. The existing literature offers several techniques, including the reproduction number and the case fatality rate, for predicting the duration of a pandemic and the size of the infected population. This article presents a case study of different techniques for analyzing, modeling, and representing the data associated with a pandemic such as COVID-19. We further propose an algorithm for estimating infection transmission states in a particular area. This work also presents an algorithm for estimating the end time of a pandemic from the susceptible, infectious, and recovered (SIR) model. Finally, this article presents an empirical and data analysis of the impact of the transmission probability, the rate of contact, and the infectious and susceptible populations on the pandemic spread.

A disease is a harmful deviation from the normal condition that adversely affects human health [1]. Every disease is a medical challenge and may present symptoms and signs that aid its identification. These symptoms make it easier to spot the disease at an early stage. However, some diseases have no symptoms, which makes them hard to trace. Rapid advances in medical science help in curing most existing diseases. However, some diseases are caused by previously unidentified viruses or bacteria and can give rise to pandemics such as COVID-19.
The newly identified novel coronavirus is responsible for the outbreak of COVID-19, which has the potential to cause a very large number of deaths [2]. The effect of the pandemic (COVID-19) and of preventive measures such as quarantine and social distancing can be analyzed using the data analysis, modeling, and representation techniques of data mining [3], [4]. These techniques help in predicting the disease's behavior, which makes it easier to take the necessary preventive measures. The reproduction number is an essential parameter for predicting the behavior of a pandemic in terms of infection spread. It estimates the number of secondary infections caused by the primarily infected people. Next, the case fatality rate captures the proportion of infected people who die during a pandemic. It plays a vital role in estimating the actual damage to human lives. Furthermore, there are four states of infection transmission in a pandemic, i.e., no contact, local, untraceable, and source missing. The first state is the least harmful and often occurs at the initial stage of the pandemic, whereas the last state is critical, indicating that the spread of infection is severe. The literature discusses different pandemic models, including the susceptible, infectious, and recovered (SIR) model, the susceptible and infectious (SI) model, and the susceptible, infectious, and susceptible (SIS) model. We live in an information-centric era that encourages the prediction and analysis of different parameters associated with a pandemic disease. The predictions are performed by models that learn patterns from historical data. The performance of these models depends heavily on data representation and storage. Therefore, in this article, we summarize different techniques presented in the existing literature for estimating various parameters of pandemics. We further discuss models that predict the end time of a pandemic and the total infected and susceptible populations.
de Arruda et al. [1] analyzed the impact of a heterogeneous recovery rate on pandemic spread in a complex network, using a modified version of the SIS model to determine infectivity. Li et al. [2] developed a new model named SEIQR, which incorporates susceptible (S), exposed (E), infectious (I), confirmed (Q), and recovered (R) populations; SEIQR helped estimate the imported cases of COVID-19 in discrete time. Zhou et al. [5] proposed a method for estimating the basic reproduction number using the SEIR model and estimated the transmission probability of COVID-19 in China. Next, van den Driessche and Watmough [6] proposed a method for analyzing the disease-free equilibrium in a heterogeneous population. They proposed an epidemic model named SEIT, which incorporates susceptible (S), exposed (E), infected (I), and treated (T) populations. Ranjan [7] discussed a methodology for epidemiological modeling and predicted the reproduction number of COVID-19. Furthermore, Zhang et al. [8] proposed an improved version of the SIR model for characterizing the dynamics of a pandemic on social contacts. A layered architecture for modeling the pandemic is discussed in [9]. Finally, Shi et al. [10] examined the epidemic threshold that determines whether a disease spreads. Table I presents a comparative summary of some existing work on pandemic modeling. It compares the pandemic models and parameters, including the reproduction number (R_o), the case fatality rate (C_r), and the state of infection transmission. To the best of our knowledge, this is the first case study that covers the data analysis, modeling, and representation of the coronavirus spread in India.

2329-924X © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
This article makes the following major contributions:
1) Reproduction Number: First, we describe the role of the reproduction number in estimating the infected population during a pandemic such as COVID-19. We also determine the average value of the reproduction number for India up to April 18, 2020, by analyzing the existing dataset.
2) Case Fatality Rate: This work analyzes the role of the case fatality rate in estimating the damage caused by the pandemic. We explain why the case fatality rate is needed alongside the reproduction number when estimating the infected and recovered populations.
3) Infection Transmission States: Next, we describe the different infection transmission states of a pandemic such as COVID-19. This work also proposes an algorithm for iteratively estimating the infection transmission states.
4) Data Modeling: Furthermore, we present different models of a pandemic, including SIR, SI, and SIS. We also propose an algorithm for estimating the end time of a pandemic using the SIR model.
5) Empirical Analysis: Finally, this article presents an empirical analysis of the impact of the transmission probability, the rate of contact, and the infectious and susceptible populations.

The rest of this article is organized as follows. Section II describes the open-access COVID-19 dataset and the parameters used for measuring the impact of a pandemic. Section III describes the different mathematical models of the pandemic. Section IV presents the empirical results, and Section V concludes this article.

In this section, we first describe the open-access COVID-19 dataset that we use in the rest of this article. We next define the parameters used for measuring the impact of the pandemic. This work uses the COVID-19 dataset available in [11] for predicting and analyzing the pandemic in India. The dataset provides a detailed description of the patients infected with the coronavirus.
Table II shows the first eight entries of the dataset, whose columns include patient number, state patient number, date announcement, age bracket, gender, detected city, detected state, state code, current status, nationality, type of transmission, and status change date. The column patient number gives the running patient count in India, and state patient number combines the state code, the detected city code, and the patient count. The column current status indicates the current state of the patient. The nationality column is followed by type of transmission, which indicates whether the infection was imported from another country or transmitted locally. Finally, status change date records the date on which an infected patient's status changed to recovered or deceased.

Pandemic disease modeling incorporates the reproduction number (denoted by R_o) to understand the disease outbreak by summarizing all infection transmissions in a particular area. The reproduction number specifies the number of people infected by the infectious population [5]. We can define R_o as follows.

Definition 1: The reproduction number of a given region is the ratio of the total number of people infected to the total number of people who are infecting, i.e.,

$$R_o = \frac{\text{Total number of people infected}}{\text{Total number of people who are infecting}} \quad (1)$$

where "the total number of people infected" means the number of secondary cases arising from the primary cases, and "the total number of people who are infecting" means the number of patients who have infected other people. R_o is an essential metric because it determines how contagious the disease is. A disease is said to be highly contagious when an infected individual transmits the infection to a large number of people. Providing proper treatment or appropriate handling of any disease requires full information about how the infection develops in the body, its mode of transmission, and the social structure in which it can spread.
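Definition 1 can be computed directly from contact-tracing records. The following minimal sketch assumes each record is a hypothetical (patient_id, infected_by) pair, where infected_by is None when no source is known; these field names and the function name are illustrative and are not the column names of the dataset in [11].

```python
from typing import Iterable, Optional, Tuple


def reproduction_number(records: Iterable[Tuple[str, Optional[str]]]) -> float:
    """R_o per Definition 1: secondary cases divided by the people who infected them.

    Each record is a hypothetical (patient_id, infected_by) pair; infected_by is
    None when the source of infection is unknown.
    """
    secondary_cases = 0
    infectors = set()
    for _patient_id, infected_by in records:
        if infected_by is not None:
            secondary_cases += 1          # a case traced back to a known infector
            infectors.add(infected_by)    # each infector is counted once
    if not infectors:
        raise ValueError("no traced transmissions; R_o is undefined")
    return secondary_cases / len(infectors)


# Toy records: P1 infects P2 and P3, P2 infects P4; P1 and P5 have no known source.
toy = [("P1", None), ("P2", "P1"), ("P3", "P1"), ("P4", "P2"), ("P5", None)]
print(reproduction_number(toy))  # 3 secondary cases / 2 infectors = 1.5
```

Only traced transmissions enter the ratio, so the estimate is only as complete as the infection-source information in the records.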
If information about how the infection develops, how it is transmitted, and the social structure it spreads through is available, the outbreak can be detected and monitored easily. The reproduction number R_o captures this information through three components: population density, rate of infection, and rate of recovery [6]. The estimation accuracy of R_o depends not on the dataset itself but on the method used to count the people being infected. Fig. 1 shows the value of R_o for the given dates, along with the minimum and maximum values of R_o across the states of India. The large difference between the minimum and maximum values of R_o indicates that the pandemic is severe in some states while others face a lower threat. The value of R_o changes over time, as shown in Fig. 1. Although R_o is time-varying, it always remains greater than 1, and a value of R_o > 1 indicates an outbreak of the pandemic. Furthermore, Fig. 1 shows that R_o exhibits sharp peaks and sudden drops. This is because the number of reported transmissions on a given date can rise or fall sharply, independently of the previous days. Table III shows the value of R_o for the spread of COVID-19 in different states of India, computed from the dataset available in [11] up to April 18, 2020. The average R_o value of some states is less than 1, which indicates a lower spread in those states.

The case fatality rate (denoted by C_r) estimates the proportion of infected people who die from the disease during its spread in a particular area.

Definition 2: The case fatality rate of a given region is the ratio of the total number of people who died from the pandemic to the total number of people who are infected, i.e.,

$$C_r = \frac{\text{Total number of people who died}}{\text{Total number of people infected}} \quad (2)$$

The reproduction number only provides the rate of disease spread and cannot estimate how many people lose their lives over the entire course of the spread. Therefore, the case fatality rate is crucial for such estimation.
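Definition 2, together with the kind of cross-state minimum and maximum comparison used in this section, can be sketched as follows. The function names and the per-state counts are hypothetical placeholders, not values from the dataset in [11].

```python
from typing import Dict, Tuple


def case_fatality_rate(deaths: int, infected: int) -> float:
    """C_r per Definition 2: deaths divided by the total number infected."""
    if infected <= 0:
        raise ValueError("infected count must be positive")
    return deaths / infected


def fatality_range(by_state: Dict[str, Tuple[int, int]]) -> Tuple[float, float]:
    """Minimum and maximum C_r across states, given {state: (deaths, infected)}."""
    rates = [case_fatality_rate(d, n) for d, n in by_state.values()]
    return min(rates), max(rates)


# Placeholder counts for three hypothetical states.
counts = {"State A": (2, 100), "State B": (10, 250), "State C": (0, 40)}
print(fatality_range(counts))  # (0.0, 0.04)
```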
The two metrics can diverge: measles, with an R_o of around 18, has a lower case fatality rate than Ebola, whose R_o is around 1.5 [12]. Fig. 2 shows the value of C_r for the given dates, along with the minimum and maximum values of C_r across the states of India. The large difference between the minimum and maximum values of C_r indicates that the pandemic is severe in some states while others are largely unaffected. The results also show that the minimum value of C_r increases with time, which suggests that the pandemic is spreading in India at a slower rate. Table IV shows the case fatality rate of different states of India.

Measures such as social/physical distancing reduce the spread of infection by breaking the chains of transmission. The infection transmission (infection_transmission) of each individual can be obtained easily by tracing the infected person's contacts with other people. In this work, we use the dataset available in [11], which also provides a dedicated column for tracing the source of infection. Therefore, we can easily determine infection_transmission for each individual. The spread of the infection is categorized in the following four states [7], [

Data modeling helps in quantifying the damage caused by a pandemic such as COVID-19 and in estimating its probable end time. The models simulate person-to-person infection transmission and its geographical spread over the entire population. This section discusses the existing pandemic models. In addition, we estimate different parameters using the existing dataset. The SIR model [8] is one of the most widely used models for estimating the damage caused by a pandemic. The SIR model incorporates three types of people, i.e., susceptible, infectious, and recovered, denoted as S, I, and R, respectively. People of one type switch to another through a specific course of action; for example, frequent contact with infectious people converts S into I.
If a pandemic covers the entire population M, consisting of S, I, and R types of people, then M = S + I + R. The phases (types of people) of the SIR model are shown in Fig. 5. All people in a particular area who are not infected with the disease and have no reported immunity against the infection are categorized as susceptible (S). The absence of immunity in a person results from no prior exposure to the infection and no vaccination against it. During an outbreak of a disease such as COVID-19, some people come into close contact with an infected person and get infected afterward, forming a group called infectious (I). These people are solely responsible for the spread of the disease in the community, and quarantining them protects the community from infection. Every disease, such as COVID-19, has a specific period during which the infection persists. After this period, the infected population recovers from the infection with proper treatment. However, a severe spread of infection in the human body and inadequate treatment can result in death. The sum of the total recovered and the total dead is termed recovered (R), or sometimes removed. The transmission of the disease changes the number of people in each phase over time. The SIR model can be represented as [9]

$$R = \alpha_2 I - \gamma I \quad (3)$$

where α_1 and α_2 are the force of infection and the recovery rate, respectively, γ denotes the death rate, and ∪ is the union operator. The force of infection α_1 is the rate at which a susceptible person becomes infectious upon close contact with an infected person. The recovery rate α_2 is the rate at which an infected person recovers from the infection with proper treatment. The force of infection is not a constant quantity; it is proportional to the transmission rate τ.
The transmission rate τ is the product of the rate of contact r_c and the probability of transmission p_t from susceptible to infectious:

$$\tau = r_c \times p_t \quad (6)$$

Using (6), we can define the force of infection α_1 as

$$\alpha_1 = \tau \times \left\{\frac{I}{M}\right\} \times 100 \quad (7)$$

where {I/M} × 100 represents the percentage of the population that is infectious among the total population in a particular area. The SIR model adopts differential equations that describe the rates of change in S, I, and R and thus capture the progression of the disease. An essential step after estimating the rates of change in S, I, and R using the differential equations is to determine the equilibrium state of the SIR model. Two equilibria exist side by side in this setting, i.e., the disease-free equilibrium and the endemic equilibrium [14]. Algorithm 2 illustrates the steps for estimating the time t for which a pandemic persists using the SIR model.

Algorithm 2: Estimating the end time of a pandemic using the SIR model
Calculate the rate of contact (r_c) among the persons.
Calculate the probability of transmission (p_t).
Estimate the force of infection α_1 using Eq. (7).
Calculate I and R for the given S, i.e., I = α_1 S and R = α_2 I − γ I, using Eqs. (4) and (3).
Calculate the derivatives dS/dt, dI/dt, and dR/dt.
t ← t + Δt.
Either the disease-free or the endemic equilibrium is reached.
The pandemic ends after time t.

Fig. 6 shows the relationship between the SIR cases in India up to April 18, 2020. The figure demonstrates that the number of infectious persons grows up to a limiting value, after which recovery proceeds at a higher pace. By then, most people have moved out of the susceptible group, and the outbreak is about to die out. To generate this result, we used the reproduction number R_o = 1.79.

Fig. 7. Illustration of the infectious population using the SI model on the COVID-19 dataset of India up to April 18, 2020.

Ranney et al. [15] proposed the SI model and assumed that the total population in a particular area during a pandemic is either susceptible or infectious.
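Before turning to the SI model in detail, the loop of Algorithm 2 can be sketched in Python. This is a minimal sketch, not the authors' implementation. It assumes the standard SIR rate equations dS/dt = −α_1 S, dI/dt = α_1 S − (α_2 + γ)I, and dR/dt = (α_2 + γ)I, with the force of infection α_1 = τI/M taken as a fraction rather than the percentage in Eq. (7), forward-Euler steps of size dt, and a hypothetical stopping rule that declares the outbreak over once fewer than one person remains infectious. The function names and the rate values (α_2 = 0.15 and γ = 0.0176 per day, chosen so that τ/(α_2 + γ) ≈ 1.79) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class SIRResult:
    end_time: float      # time at which the outbreak is declared over
    susceptible: float
    infectious: float
    removed: float       # recovered plus dead, as in the R compartment


def transmission_rate(r_c: float, p_t: float) -> float:
    """Eq. (6): tau = rate of contact x probability of transmission."""
    return r_c * p_t


def estimate_end_time(M: float, I0: float, r_c: float, p_t: float,
                      alpha2: float, gamma: float,
                      dt: float = 0.1, t_max: float = 10_000.0) -> SIRResult:
    """Step an SIR model with forward Euler until fewer than one person is infectious."""
    tau = transmission_rate(r_c, p_t)
    S, I, R, t = M - I0, I0, 0.0, 0.0
    while I >= 1.0 and t < t_max:
        alpha1 = tau * I / M                   # force of infection as a fraction (assumption)
        new_infections = alpha1 * S * dt       # flow S -> I
        removals = (alpha2 + gamma) * I * dt   # flow I -> R (recovered or dead)
        S -= new_infections
        I += new_infections - removals
        R += removals
        t += dt                                # t <- t + dt
    return SIRResult(t, S, I, R)


result = estimate_end_time(M=1000, I0=1, r_c=1.0, p_t=0.3, alpha2=0.15, gamma=0.0176)
print(f"outbreak over after ~{result.end_time:.0f} days, {result.removed:.0f} removed")
```

The stopping rule stands in for "the disease-free equilibrium is reached"; a different threshold or an ODE solver with adaptive steps would be equally reasonable choices.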
Unlike the SIR model, the SI model does not consider recovered cases in a particular location. Let the numbers of infectious and susceptible people at time t = 0 be I_o and S_o, respectively, and let τ represent the transmission rate given in (6). The rates of change in the susceptible and infectious populations are then expressed in terms of τ, S, and I, and, upon further calculation as given in [15], the number of infectious people during a pandemic outbreak can be obtained in closed form. Fig. 7 shows the fraction of the infectious population over the past 30 days of the COVID-19 outbreak in India. The curve we obtained for the infectious population is S-shaped: it grows exponentially in the early stage, when the infection starts spreading in the country. For this result, we used the reproduction number R_o = 1.79.

The SIS model is an extension of the SI model [10], in which the authors assume that people who recover from the infection are still susceptible. The rate of change in the susceptible population is expressed using τ and α, which represent the transmission rate and the fraction of the infected population, respectively. The term τSI can be understood as the average number of new infections per unit time: each infectious individual contacts τM people, of whom a fraction S/M is susceptible. The rate of change in the infectious population is defined analogously, and using the assumption that the pandemic could infect the entire population such that M = S + I, the model reduces to a single equation in I. The result shown in Fig. 8 depicts the relationship between the susceptible and infectious fractions of the population in India up to April 18, 2020. The result is estimated by taking the reproduction number as R_o = 1.79, i.e., the status of India with a total of 12 079 active cases. If the rate of infection in India decreases or does not exceed the current rate, the spread of COVID-19 would end within the next 70 days. Fig. 8 shows that the susceptible and infectious rates remain constant between 20 and 70 days.
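The SI and SIS behavior described above can be illustrated with a short sketch. It assumes the common normalized forms dI/dt = τSI/M for SI and dI/dt = τSI/M − αI for SIS (treating α as a per-capita recovery rate), both with S = M − I; the normalization used in [15] and [10] may differ. Under these assumptions, the SI model has the logistic solution I(t) = M I_o e^{τt} / (M − I_o + I_o e^{τt}), which gives the S-shaped curve described for Fig. 7, and the SIS model settles at I* = M(1 − α/τ) when τ > α. The function names and parameter values are hypothetical.

```python
import math


def si_infectious(t: float, M: float, I0: float, tau: float) -> float:
    """Closed-form logistic solution of dI/dt = tau * S * I / M with S = M - I."""
    growth = math.exp(tau * t)
    return M * I0 * growth / (M - I0 + I0 * growth)


def sis_equilibrium(M: float, tau: float, alpha: float) -> float:
    """Endemic level of dI/dt = tau*S*I/M - alpha*I with S = M - I (zero if tau <= alpha)."""
    return max(0.0, M * (1.0 - alpha / tau))


def sis_simulate(M: float, I0: float, tau: float, alpha: float,
                 days: float, dt: float = 0.01) -> float:
    """Forward-Euler integration of the SIS model; returns I after `days`."""
    I = I0
    for _ in range(int(days / dt)):
        S = M - I
        I += (tau * S * I / M - alpha * I) * dt
    return I


# S-shaped SI curve: slow start, rapid middle, saturation near M.
print([round(si_infectious(t, M=1000, I0=1, tau=0.3)) for t in (0, 10, 20, 30, 40)])
# The SIS simulation approaches the endemic equilibrium when tau > alpha.
print(round(sis_equilibrium(1000, tau=0.3, alpha=0.1)),
      round(sis_simulate(1000, 1, 0.3, 0.1, days=200)))
```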
In the SIS model, people who recover from the infection are still susceptible. Moreover, if the number of susceptible people is constant, no new person will be infected during the pandemic. Furthermore, at the given recovery and case fatality rates, the entire infected population will recover within the next 70 days. Thus, no new population will be infected, which indicates that the pandemic has ended. Fig. 9 shows the compartments of the different pandemic models, including SI, SIS, and SIR. In the SIS model, the susceptible and infectious populations form a cycle, because infected people become susceptible again after recovery.

This section empirically evaluates the performance of the SIR model on a population of size 1000. Throughout the evaluation, we take the reproduction number R_o to be 1.79. The empirical results on the SIR model rely on an expression from [16] in terms of I_o, the initial infected population; r_c, the rate of contact; and p_t, the transmission probability. We consider only the SIR model in the empirical analysis because of its broader applicability to modeling pandemics such as COVID-19. The SI and SIS models do not account for a recovered population, which plays an essential role in determining the extent of a pandemic's spread. Furthermore, the SIS model assumes that infected people remain susceptible. This assumption is not suitable for COVID-19, because it remains uncertain whether people who have been infectious become susceptible again.

1) Impact of Infectious and Susceptible: First, we carry out an empirical analysis to estimate the infectious population as the susceptible population increases from 25% to 100%. Here, we fix the rate of contact r_c = 1 and the probability of transmission p_t = 0.3. Fig. 10(a) shows the increase in infectious people with the increase in the percentage of susceptible people. When the susceptible population reaches 100% of a population of 1000, the infection can spread to 487 people.
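The expression from [16] is not given here, so the following sketch does not reproduce the counts in Fig. 10. Assuming a standard deterministic SIR model, it only illustrates the same setup: a population of 1000, r_c = 1, p_t = 0.3, and a removal rate set to τ/R_o so that R_o = 1.79 when everyone is susceptible. The hypothetical function total_infected makes only a share of the population susceptible (the rest is treated as immune), seeds one infectious person, and counts new infections until fewer than one person remains infectious.

```python
def total_infected(M: float, susceptible_share: float, I0: float,
                   r_c: float, p_t: float, R_o: float, dt: float = 0.1) -> float:
    """Cumulative new infections in a deterministic SIR run (forward Euler).

    Only `susceptible_share` of the population M can be infected; the rest is
    treated as immune. The removal rate is assumed to be tau / R_o.
    """
    tau = r_c * p_t                      # Eq. (6)
    removal = tau / R_o                  # assumption: R_o = tau / removal rate
    S0 = susceptible_share * M - I0      # the seed cases come out of the susceptible share
    S, I = S0, I0
    while I >= 1.0:                      # stop once fewer than one person is infectious
        new_infections = tau * S * I / M * dt
        I += new_infections - removal * I * dt
        S -= new_infections
    return S0 - S


# Vary the susceptible share from 25% to 100%, as in the first experiment.
for share in (0.25, 0.5, 0.75, 1.0):
    count = total_infected(1000, share, I0=1, r_c=1.0, p_t=0.3, R_o=1.79)
    print(f"{share:.0%} susceptible -> about {count:.0f} new infections")
```

In this sketch, shares for which 1.79 × share < 1 (the 25% and 50% cases) produce almost no spread, which is the threshold behavior of a deterministic SIR model.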
Similarly, Fig. 10(b) shows that an increase in the infectious population accelerates the growth of the susceptible population. The number of susceptible people reaches 564 in a population of 1000 with 40% infectious. This indicates that the effect of a pandemic such as COVID-19 can be reduced by decreasing either the susceptible or the infectious population.

2) Impact of Rate of Contact and Probability of Transmission: Next, we study the impact of the rate of contact (r_c) and the probability of transmission (p_t) on the susceptible and infectious populations with a population size of 1000. First, we hold the probability of transmission constant (p_t = 0.3) and vary the rate of contact. Fig. 11(a) shows that the increase in infectious people is nearly linear in the rate of contact and the population size. At larger population sizes (>800), the curve starts flattening, as the infection has covered roughly half of the population. Similarly, the number of susceptible people increases linearly with the rate of contact. Furthermore, Fig. 11(c) and (d) show that the rate of infection spread and the increase in the susceptible count are highly sensitive to the transmission probability. Here, we fix the rate of contact at r_c = 1 to calculate the infectious and susceptible counts. Fig. 11(c) shows that the infectious count is directly proportional to the transmission probability and increases linearly with the population size. In a similar pattern, Fig. 11(d) shows that the number of susceptible people increases with the transmission probability.

3) Impact of Case Fatality Rate and Recovery Rate: Finally, we evaluate the impact of the case fatality and recovery rates on the rate of change in the infectious and susceptible populations. Fig. 12(a) shows the variation in infectious and susceptible people when both the case fatality and recovery rates increase. The result shows that a simultaneous increase in the case fatality and recovery rates increases the number of infectious and susceptible people.
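In a standard SIR model, both the parameters varied in 2) and the rates varied in 3) act through the reproduction number τ/(α_2 + γ) = r_c p_t/(α_2 + γ). As a minimal sketch under that assumption (it does not reproduce Fig. 11 or Fig. 12), the code below uses the classical final-size relation z = 1 − exp(−R_o z) for a closed, fully susceptible population, solved by fixed-point iteration, together with the fact that, with competing recovery and death rates, a fraction γ/(α_2 + γ) of removals are deaths. The rates α_2 = 0.15 and γ = 0.0176 per day, the ×1.5 and ×0.5 scalings, and all function names are hypothetical.

```python
import math
from typing import Tuple


def final_size(r0: float, tol: float = 1e-10, max_iter: int = 10_000) -> float:
    """Nonzero root of z = 1 - exp(-r0 * z): the fraction of a fully susceptible
    population eventually infected in a closed SIR model (0 when r0 <= 1)."""
    if r0 <= 1.0:
        return 0.0
    z = 1.0  # iterating from 1 converges monotonically to the nonzero root
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


def sir_summary(r_c: float, p_t: float, recovery: float,
                death: float) -> Tuple[float, float, float]:
    """Return (reproduction number, final infected fraction, share of removals that are deaths)."""
    exit_rate = recovery + death        # alpha_2 + gamma
    r0 = r_c * p_t / exit_rate          # tau / (alpha_2 + gamma), with tau from Eq. (6)
    return r0, final_size(r0), death / exit_rate


M = 1000           # population size used in the empirical section
RECOVERY = 0.15    # hypothetical alpha_2 per day
DEATH = 0.0176     # hypothetical gamma per day; tau = 0.3 then gives r0 close to 1.79

# 2) Vary the rate of contact and the probability of transmission.
for r_c, p_t in [(0.5, 0.3), (1.0, 0.3), (1.5, 0.3), (1.0, 0.1), (1.0, 0.5)]:
    r0, z, _ = sir_summary(r_c, p_t, RECOVERY, DEATH)
    print(f"r_c={r_c}, p_t={p_t}: r0={r0:.2f}, eventually infected ~ {M * z:.0f}")

# 3) Scale the recovery and death rates up (x1.5) or down (x0.5).
scenarios = {
    "both increase": (1.5, 1.5),                 # (recovery factor, death factor)
    "fatality up, recovery down": (0.5, 1.5),
    "both decrease": (0.5, 0.5),
    "fatality down, recovery up": (1.5, 0.5),
}
for name, (rec_factor, death_factor) in scenarios.items():
    r0, z, fatal_share = sir_summary(1.0, 0.3, RECOVERY * rec_factor, DEATH * death_factor)
    print(f"{name}: r0={r0:.2f}, eventually infected ~ {M * z:.0f}, "
          f"deaths among removed={fatal_share:.3f}")
```

In this sketch, raising r_c or p_t raises the reproduction number and the final epidemic size, while the two rates in 3) pull in different directions: faster exits from I lower the reproduction number, and the balance between α_2 and γ sets how many of those exits are deaths.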
This increase in the infectious and susceptible populations continues up to a threshold, beyond which the influence of the case fatality rate becomes weaker than that of the recovery rate. We also observe from Fig. 12(a) that, at a population size of 50, the changes in the case fatality and recovery rates are nearly equal. As the population size increases, the recovery rate grows faster than the case fatality rate, i.e., the gap between the recovery and case fatality rates becomes considerable. Furthermore, at a population size of 600, the infectious and susceptible populations cover the entire population of 1000 people; thereafter, they only decrease.

Case Fatality Rate Increases and Recovery Rate Decreases: This is the most adverse situation for a pandemic, in which the case fatality rate is increasing and the recovery rate is decreasing. It indicates that the pandemic is in its worst phase and can infect the entire population in a particular area. Fig. 12(b) shows this adverse situation, in which both the infectious and susceptible populations increase. The next case is similar to the previous one, but one positive factor that mitigates severe damage is the decrease in the fatality rate: Fig. 12(c) shows that a simultaneous decrease in both the case fatality and recovery rates increases the infectious and susceptible populations.

Case Fatality Rate Decreases and Recovery Rate Increases: The last combination is the condition in which the case fatality rate decreases and the recovery rate increases. It is a favorable situation that indicates that the pandemic is ending. Fig. 12(d) shows that both the infectious and susceptible populations decrease under such circumstances.

COVID-19 is spreading rapidly around the globe. The empirical and data analysis in this article examined the impact of various parameters, such as the probability of transmission, the contact rate, and the infectious and susceptible populations. The results indicate that the spread of COVID-19 can be analyzed, modeled, and represented in tabular and graphical form.
The data analysis has provided detailed insight into the coronavirus spread and estimated the reproduction number of COVID-19 to lie between 1.5 and 4, with a low fatality rate. This article has covered popular models for analyzing different aspects of the pandemic and identified the advantages and disadvantages of each model relative to the others. The representations provide a more natural way to visualize the actual effect of the pandemic spread.

Impact of the distribution of recovery rates on disease spreading in complex networks
Analysis of COVID-19 transmission in Shanxi province with discrete time imported cases
The model repository of the models of infectious disease agent study
Dynamics of information diffusion and its applications on complex networks
Preliminary prediction of the basic reproduction number of the Wuhan novel coronavirus 2019-nCoV
Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission
Predictions for COVID-19 outbreak in India using epidemiological models
Modeling epidemics spreading on social contact networks
Epidemic propagation with positive and negative preventive information in multiplex networks
An SIS model with infective medium on complex networks
Four Stages Of Coronavirus Transmission
Deviations from the law of large numbers and extinction of an endemic disease
Correlates of depressive symptoms among at-risk youth presenting to the emergency department
Stability analysis for differential infectivity epidemic models

The authors would like to thank Vinit Kumar Patra for the implementation and data preprocessing.