key: cord-0839525-hr2g5ww6 authors: Balde, M. A. M. T. title: Fitting SIR model to COVID-19 pandemic data and comparative forecasting with machine learning date: 2020-05-01 journal: nan DOI: 10.1101/2020.04.26.20081042 sha: 70e8c05786b7d95423088ebb9d2ee823d54e61fd doc_id: 839525 cord_uid: hr2g5ww6 In this work, we use a classical SIR model to study COVID-19 pandemic. We aim to deal with the SIR model fitting to COVID-19 data by using different technics and tools. We particularly use two ways: the first one start by fitting the total number of the confirmed cases and the second use a parametric solver tool. Finally a comparative forecasting, machine learning tools, is given. The classical SIR model is given as follow: with the dependent variables S(t), I(t), R(t) being respectively the number of susceptible individuals at time t, the number of infectious individuals at time t and the number of recovered individuals at time t. In addition, the unknowns S(t), I(t), R(t) satisfy N = S(t) + I(t) + R(t). N is the total number of individuals. The initial conditions are S(0) = S 0 ≥ 0, I(0) = I 0 ≥ 0 and R(0) = R 0 ≥ 0. Some parameters must be presented: β is the contact rate, 1/γ is the average infectious period, R 0 = β/γ is the basic reproduction number. The SIR model satises some properties: • If R 0 > N/S 0 , then I(t) increases up to a maximum value I max = I 0 +S 0 −(1+ln( • If R 0 < N/S 0 , then I(t) decreases to 0. Remark II.1. At the outset of an epidemic, nearly everyone (except the infected case) is susceptible. So we can say that S 0 ≈ N and then we can replace N/S 0 by 1. It is also possible to couple the SIR model with other dierential equations like death equation to study the number of death due the epidemic disease or economic model equations to study the impact of the epidemic disease on the economy. To couple the SIR model with a death equation, we consider the death rate µ due to the infection. So the system is given by: with D being the number of death due to the infection. Then the total removed is R(t) + D(t) and N = S(t) + I(t) + R(t) + D(t). 2 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020 . . https://doi.org/10.1101 /2020 i. Fitting function to data The starting point of the work is to nd a function that ts the data of total conrmed cases. As in [Liu, Magal, Seydi and Webb, 2020] , we set the total number of infectious cases by: T N I(t) = t t0 I(s)ds. ( The total number of infectious cases function, that t the data, can be written as T N I(t) = b exp(ct) − a. Where a, b, c are parameters to estimate by using the least square method. Then with T N I(t) we can obtain information on the unknown functional variables and parameters of the SIR model. After calculation as in [Liu, Magal, Seydi and Webb, 2020] , we obtain: The simulation results are given in subsection i, gure 1a, 1b, 1c and 1d. Also the values of the parameters β and γ are given. We consider now that at a time T some measures are taken like social distancing, half or full connement. It can be interpreted as the contact rate is reduced by some factor or at a time point, the contact rate is close to 0. Indeed in [Liu, Magal, Seydi and Webb, 2020] , the authors considered that the transmission of COVID-19 from infectious to susceptible individuals stopped after strong measures has been taken in China. Then they have xed the contact rate to 0. In [Lauro, Kiss and Miller, 2020] , the authors considered that the contact rate is reduced by some factor when a social distancing intervention is introduced with a certain duration. Then they have replaced β by (1 − c)β. It is also possible to consider that β will decrease depending on time like in [Liu, Magal, Seydi and Webb, 2020] , where authors has considered an exponential decreasing function of time. In this work, we consider that the contact rate decreases progressively to 0. Then we choose continuous function with a slow decrease to describe the contact rate starting at the date-time of the measures. We propose two types of function for β notedβ(t). The rst one is :β where δ and p are parameters to choose. The second one is: where κ and q are parameters to choose. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 1, 2020 . . https://doi.org/10.1101 /2020 By solving (6), we obtain the new expressions of T N I(t), I(t) and R(t). We show the results in subsection i. Remark III.1. 1. The values of the parameters δ, p, κ and q can be xed such thatβ(t) decreases slowly. 2. Sinceβ depends on time, we must always consider values of t such thatβ remains positive. ii. Fit to data with scaling When there is not enough data the t is generally dicult since the maximal values of the dependent variables can be very huge. For example, S 0 depends on the total population and generally S 0 ≈ N , with N the size of a chosen sample of the population. In SIR models, it is generally possible to choose dierent population sizes with similar characteristics. If we have characteristic measures we can make the model dimensionless so that the quality of the results is always good. Then we scale the SIR model to the data by using scaling [Langtangen and Pedersen, 2016] . Let us x the following constant characteristics size: t c , S c , I c , R c respectively of the time t, the susceptible S(t), the infected I(t) and the recovered R(t). Then the dimensionless variables t,S,Ī,R are given by t =tt c , S =SS c , I =ĪI c , R =RR c . Replacing in (1) and calculating, we obtain the scaled system: Remark III.2. For the adjusting of the data, we choose t c = t d max , with t d max being the last time of the data. 4 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020 May 1, . . https://doi.org/10.1101 May 1, /2020 doi: medRxiv preprint SIR model t to COVID-19 data• 2020, April • Vol. , No. The scaled system can be written, by dropping the -, as follows: Remark III.3. By choosing the characteristics such that S c = I c = R c , we obtainβ 1 =β 2 and γ 1 =γ 2 . And then we get a new SIR model. We estimate the parameters β and γ by using parametric solver in Mathematica. With the parametric solver, we solve the SIR model (2) with solutions depending on the parameters we want to estimate such that the model ts the data. Then plotting the parametric solution for dierent values of β and γ, nally, give a t. Parametric solver typically solve dierential equations by going through several dierent stages, depending on the type of equations. Machine learning is programming computers to optimize a performance criterion using example data or past experience. We have a model dened up to some parameters, and learning is the execution of a computer program to optimize the parameters of the model using the training data or past experience. The model may be predictive to make predictions in the future, or descriptive to gain knowledge from data or both. Machine learning uses the theory of statistics in building mathematical models because the core task is making inferences from a sample.(See [Alpaydin, 2010] ). Then machine learning can be used for the estimation of parameters of an epidemic model. We can go deep by doing data mining since we have massive world data. We can use machine learning to learn about the worldwide data of pandemic COVID-19 and then to predict the future evolution of the disease. We can also try to learn on the spatial distribution and progression of the disease and predict the location susceptible to be a high level of risk. In this work, we just use a machine learning tool to learn on small data of Senegal's COVID-19 cases and then to predict the evolution in future days. We use it as a comparison with the other forecasts based on the work we do in this paper. Numerical simulations i. The data In this subsection we show table of data we use for simulation. The tables 1, 2 and 3 give data respectively of Senegal, China and France. Data for Senegal is obtained from daily press releases on 5 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020. ii. The function t Here we show simulations related to the subsection i of section III. The formula of T N I(t) is given by T N I(t) = b exp(ct) − a, with a = 13.9324, b = 9.61779 and c = 0.100095 (gure 1a). We use γ = 1/7 and then we obtain: t 0 = 3.7051, I 0 = 1, 39456. The total population of Senegal is N = 16743927 from Senegal Population (2020) -Worldometer (www.worldometers.info) and then we obtain S 0 = N − I 0 . Since S 0 ≈ N we simplify the calculation and obtain β = 0, 242957. Now we consider the time of the measures as 2020, March 23. Then T = 23. Forβ given by (4), results are shown by gures 2a, 2c and 2e. Forβ given by (5), results are shown by gures 2b, 2d 6 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020 May 1, . . https://doi.org/10.1101 May 1, /2020 doi: medRxiv preprint SIR model t to COVID-19 data• 2020, April • Vol. , No. Recovered cases 0 0 We do the same as in the paragraphs above for France. T N I(t) = b exp(ct)−a, with a = 1, b = 0.88 and c = 0.158 (gure 3a). We use γ = 1/7 and then we obtain: t 0 = 0.809, I 0 = 0.158. The total population of France is N = 65241903 from France Population 2020 (https://worldpopulationreview.com) and then we obtain S 0 = N −I 0 . Since S 0 ≈ N we simplify the calculation and obtain β = 0.300857. For France, we consider the time of the measures as 2020, March 17. Then T = 56. Forβ given by (4), results are shown by gures 4a and 4b. Forβ given by (5), results are shown by gures 5a and 5b. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020 . . https://doi.org/10.1101 /2020 Fit with data of the total cases: T N I(t) is the blue line and the data are the red dotted. (b) Plot of T N I(t)(red line) with data of total cases(red dotted), the I(t)(blue line) with the data of new cases(blue dotted) and R(t)(green line) with the data recovered cases(green dotted). (c) Plot of T N I(t)(red line) with data of total cases(red dotted) and the I(t)(blue line) with the data of new cases(blue dotted). (d) Plot of T N I(t)(red line) with data of total cases(red dotted) and R(t)(green line) with the data recovered cases(green dotted). . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020. . https://doi.org/10.1101/2020.04.26.20081042 doi: medRxiv preprint (a) Fit with data of the total cases: T N I(t) is the blue line and the data are the red dotted. (b) Plot of T N I(t)(red line) with data of total cases(red dotted) and R(t)(green line) with the data recovered cases(green dotted). . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020. . https://doi.org/10. 1101 /2020 iii. The t by scaling The scaled SIR model (8) is plotted in gure 6a. And in gure 6b we t with the data of Senegal in table 1. We x the characteristic parameters as: t c = 30 which is the number of days of the data of Senegal in table 1, I c = S c = R c = 1000. We use β = 5.0753, γ = 1/45 to obtain β 1 = β 2 = 0.00939349 and γ 1 = γ 2 = 0.535714. Here we use a parametric solver in Wolfram Mathematica to solve the SIR model (2) with respect to parameters β and γ. We start by using the values β = 5.0753 and γ = 1/45. Then after we get the new values for the t. The results are shown in gures 7a, 7b and 7c for Senegal, in gures 8a, 8b and 8c for China and in gures 9a, 9b and 9c for France. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020 . . https://doi.org/10.1101 /2020 Plot of the SIR model(2) with the parameters β = 5.0753 and γ = 1/45 and the proportion of death µ = 0.01γ. The susceptible S(t) in red line, The population in blue line, the infected I(t) in green line, the recovered in orange line and the death D(t) in purple. (b) Plot of the SIR model (2) with the parameters β = 5.0753, γ = 1/45, the proportion of death µ = 0.01γ and reduced population. And plot without t to the Infected data in red dotted, the recovered data in blue dotted, the death data in black dotted. (c) Plot of the SIR model (2) with the parameters β = 0.15, γ = 1/45 and the proportion of death µ = 0.01γ. And plot with t to data. 12 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020 . . https://doi.org/10.1101 /2020 Plot of the SIR model(2) with the parameters β = 5.0753 and γ = 1/45 and the proportion of death µ = 0.11γ. The susceptible S(t) in red line, The population in blue line, the infected I(t) in green line, the recovered in orange line and the death D(t) in purple. The total population is reduced. (b) Plot of the SIR model (2) with the parameters β = 5.0753, γ = 1/45, the proportion of death µ = 0.01γ and reduced population. The susceptible S(t) in red line, The population in blue line, the infected I(t) in green line, the recovered in orange line and the death D(t) in purple line. And plot without t to the Infected data in red dotted, the recovered data in blue dotted, the death data in black dotted. (c) Plot of the SIR model(2) with the parameters β = 0.35, γ = 1/40 and the proportion of death µ = 0.11γ. And plot with t to the Infected data in red dotted, the recovered data in blue dotted, the death data in black dotted. Figure 8 : Plot of the SIR model(2) compared with plot of China's data in table 2. The total population of Hubei we use is N=59172000 (https://en.wikipedia.org/wiki/2020_Hubei_lockdowns). On the abscissa axis, the graduation 63 represents 2020, March 08. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020 . . https://doi.org/10.1101 /2020 Plot of the SIR model(2) with the parameters β = 5.0753 and γ = 1/45 and the proportion of death µ = 0.01γ. The susceptible S(t) in red line, The population in blue line, the infected I(t) in green line, the recovered in orange line and the death D(t) in purple. (b) Plot of the SIR model(2) with the parameters β = 5.0753, γ = 1/45, the proportion of death µ = 0.01γ and reduced population. And plot without t to the Infected data in red dotted, the recovered data in blue dotted, the death data in black dotted. (c) Plot of the SIR model (2) with the parameters β = 0.18, γ = 1/32 and the proportion of death µ = 0.01γ. And plot with t to data. (d) Plot of the SIR model(2) with the parameters β = 0.18, γ = 1/32 and the proportion of death µ = 0.01γ. And plot of the t without data. The SIR model is tted to the data. This allows us to note that the maximum number of infected can go up to 900000 (gure 7d) in Senegal and 2700000 (gure 9d) in France. While in the scaling the gures 6a shows that the maximum number of infected can go up to 320000. However, this analysis does not take into account the nationwide anti-pandemic measures. Using the results shown in the gures 2c, 2e, 2d, 2f, 4a, 4b, 5a and 5b, we can see that the data deviate from their rst exponential evolution to follow a new, slower trajectory which can still be exponential. To understand the nature of this new trajectory, the way to manage the contact rate is decisive and can lead to dierent analyzes. However, having considered a slow decrease over time of the contact rate, we can see in the gures 2e, 2f, 4b and 5b that the updated data pass under the new trajectory for Senegal while in France the data follow the new trajectory. This could, therefore, be due to the eects of the measures taken by these countries. 14 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020 . . https://doi.org/10.1101 /2020 In this section, we plot some predictions for Senegal. The prediction does not concern the total conrmed cases, but only the eective cases obtained by reduced from the total conrmed cases, the recovered cases and the death cases. We show the curve of eective infected cases given by T N I(t)−R(t)−D(t) with bothβ (4) and (5), the curve of the tted infected I(t) in gure 7c, and the curve obtained by using machine learning. We plot these curves with additional dates until 2020, April 21. We use machine learning, based only on data, to do forecasting. Predict is a function of Automated Machine Learning in Wolfram Mathematica. It allows for automatic training and data prediction. We can choose dierent method of regression algorithm: RandomForest, LinearRegression, NeuralNetwork, GaussianProcess, NearestNeighbors, etc. We use the NeuralNetwork regression algorithm which predicts using an articial neural network. Let's recall that our forecasting use in a rst part the results of the works in subsection i,ii and iii where we use a SIR model to t data. And in a second part, the forecasting using only data. The forecasting show in gures 10a, 10c an optimistic situation for Senegal. Also the machine learning gure 10e shows an optimistic forecasting. But in all cases, we see that the additional data (March 31-06) goes down the predicting path. It may be caused by the nationwide antipandemic measures in Senegal. In addition, we have considered in gures 10a and 10b a contact rate gradually reduced since the measures were taken and we see that the data come below the path of the predicted curve. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020 . . https://doi.org/10.1101 /2020 Plot of the number of infected cases using the scaling SIR model (6) with β given by (4). On the abscissa axis, the graduation 30 represents 2020, March 31. Then 2020, April 06 is the graduation 36. (b) Plot of the number of infected cases using the scaling SIR model (6) with β given by (5) . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020 . . https://doi.org/10.1101 /2020 Plot of the number of infected cases using the scaling SIR model (6) withβ given by (4). On the abscissa axis, the graduation 70 represents 2020, March 31. Then 2020, April 06 is the graduation 76. (b) Plot of the number of infected cases using the scaling SIR model (6) withβ given by (5). On the abscissa axis, the graduation 70 represents 2020, March 31. Then 2020, April 06 is the graduation 76. (c) Plot of the number of infected cases using SIR model (2) with parametric solve. On the abscissa axis, the graduation 70 represents 2020, March 31. Then 2020, April 06 is the graduation 76. (d) Plot of the number of infected cases by training a PredictorFunction to predict the average next values of infected. On the abscissa axis, the graduation 70 represents 2020, March 31. Then 2020, April 06 is the graduation 76. Figure 11 : Forecasting for France, using the results of the works in subsection i,ii and iii and machine learning. The data plotted (red dotted, green dotted), is given in table 3, but completed until 2020, April 06. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020 . . https://doi.org/10.1101 /2020 Conclusion and Perspectives In this paper, we have used the classical SIR model for tting data and then we have done forecasting. We have also estimated the contact rate, and the average infectious period parameters of the SIR model to obtain t. Machine learning can help in epidemiology to understand the disease, but also to study the impact or eectiveness of the anti-pandemic measures taken. We can also study the ideal date for stopping the containment measures. Ideal in the sense that even without the measures the disease can no longer spread. We can also study the possibility of making periodic connements to reduce the economic impact of the measures. The economy may be impacted by the evolution of the disease and measures. It would, therefore, be interesting to couple economic models with epidemiological models. Since the impact can also be long-term, scaling can be used to dimension the epidemiological-economic coupling. The impact of the environment is also to be taken into account in the models. This means studying the contamination due to the environment (air, objects, etc). i. Using function t in SIR model Then I(t 0 ) = bc exp(ct 0 ) = ac = I 0 and I(t) I(t 0 ) = exp(c(t − t 0 )). Hence, we obtain I(t) = I(t 0 ) exp(c(t − t 0 )), thenİ (t 0 ) = cI(t 0 ). From the second equation in the SIR model (1), we obtain at t 0 ,İ (t 0 ) = βI 0 S 0 N − γI 0 = cI(t 0 ). Hence, we get c = βS 0 N − γ. Using the third equation in the SIR model (1) yieldṡ then integrating, we obtain R(t) = γ c (I(t) − I(t 0 )). 18 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 1, 2020 . . https://doi.org/10.1101 /2020 .04.26.20081042 doi: medRxiv preprint SIR model t to COVID-19 data• 2020 ii. Determination of the scaled SIR S(t) = S(t ct ), I(t) = I(t ct ) and R(t) = R(t ct ), then replacing in left members of the SIR model (1), we obtain:Ṡ = S c t cṠ Replacing again in right members of the SIR model (1), we get: Then we have: Finally, we obtain: . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 1, 2020 . . https://doi.org/10.1101 /2020 Massachusetts Institute of Technology Infectious Diseases of Humans Coronavirus propagation modeling considerations Epidemiology models Mathematica package. System-Modeling at GitHub Epidemiology models modications Mathematica package The Mathematics of Infectious Diseases Scaling of Dierential Equations Understanding Unreported Cases in the 2019-Ncov Epidemic Outbreak in Wuhan, China, and the Importance of Major Public Health Interventions Predicting the cumulative number of cases for the COVID-19 epidemic in China from early data Epidemic Data for Novel Coronavirus COVID-19