title: A Dynamical Estimation and Prediction for Covid19 on Romania using ensemble neural networks
authors: Petrica, Marian; Popescu, Ionel
date: 2022-02-28

In this paper, we propose an analysis of the Covid19 evolution and prediction for Romania combined with the mathematical SIRD model, an extension of the classical SIR model which includes the deceased as a separate category. The reason is that, because we cannot fully trust the reported numbers of infected or recovered people, we base our analysis on the more reliable number of deceased people. In addition, one of the parameters of our model captures the proportion of infected and tested people among all infected people. Since many factors have an impact on the evolution of the pandemic, we treat the estimation and the prediction based on the previous 7 days of data, the number of deceased being particularly important here. We perform the estimation and prediction using neural networks in two steps. Firstly, by simulating data with our model, we train several neural networks which learn the parameters of the model. Secondly, we use an ensemble of ten of these neural networks to forecast the parameters from the real Covid19 data for Romania. Many of these results are backed up by a theorem which guarantees that we can recover the parameters from the reported data.

Infectious disease pandemics have had a major impact on the evolution of mankind and have played a critical role in the course of history. Over the ages, pandemics claimed countless victims, decimating entire nations and civilizations. As medicine and technology have made remarkable progress in the last century, the means of fighting pandemics have become significantly more efficient. On the other hand, globalization, the development of commerce, and the ease of traveling all over the world facilitate the transmission of a new disease much more than in the past. In 2019, Covid19, a virus from the coronavirus family, appeared and spread around the world very quickly. This dramatically changed our world as we know it. On the 31st of December 2019, the first cases of infection with an unknown virus causing symptoms similar to those of pneumonia were reported from China to the World Health Organization. Shortly after that, the new virus was identified as a new coronavirus, and public health organizations feared that the situation might degenerate into one similar to the SARS epidemic of 2003. The SARS outbreak was also caused by a newly identified coronavirus; it caused about 8,000 infection cases, 774 deaths, and significant financial losses. Within less than 3 months, the COVID-19 outbreak became a global pandemic, spreading across almost all countries of the world. As of the 29th of March 2020, there were 725,980 confirmed COVID-19 cases that caused 35,186 deaths. Although the outbreak started in China, the virus spread rapidly all over the globe. The fast-evolving spread of the new coronavirus, which has been officially declared a pandemic, is represented below. The charts in Figure 1 show the countries where at least 1,000 cases of COVID-19 infection had been reported at the mentioned date. Though in February 2020 the virus was still affecting mainly China, it quickly spread to the whole world.
Since then, we have witnessed many changes in political decisions, including lockdowns, school closures and many other restrictions aimed at controlling the spread of the virus. Because new variants have appeared in the meantime, the fight against the disease is still ongoing. Fighting COVID-19 was at first driven by quarantine and other restriction measures. These were imposed because the mechanisms of infection were not well understood and the hospitals were overwhelmed. Later on, various degrees of restrictions were imposed in order to control the spread and, at the same time, let the economy recover. Thus many political decisions affected, for better or for worse, the transmission of the virus.

As with many natural phenomena, mathematical modeling is by now one of the scientific pillars on which we build our understanding of the world. There is a growing body of mathematical models used at the moment for the spread of the virus which are somewhat related to our approach in this note. We mention a few sources without any claim of completeness [1, 18, 4, 10, 15, 16, 3, 17]. Mathematical modeling has turned out to be a valuable tool for the assessment, prediction and control of infectious diseases, as is the case for the COVID-19 pandemic. The purpose of this work is to develop a predictive model that can accurately assess the transmission dynamics of COVID-19. One popular choice for mathematically modeling an epidemic is the SIR model, which appeared in [12, 13, 14, 6, 9, 8]. This is a compartmental model where an individual can be in one of the following 3 states at any given time: susceptible (S), infected (I) or recovered (R). Inside the latter category we also count the number of deaths. The basic model assumes constant parameters over time.

Many studies have been done on the evolution of the pandemic around the world. The main goal of this paper is to analyze the evolution of the pandemic in Romania, though our methodology can easily be adapted to any country. The starting point of our study is the fact that during the pandemic it is very difficult to measure the numbers of infected, susceptible or recovered people. Among the numbers reported during the pandemic, the number of deaths seems more reliable than the number of infected or recovered. Deceased people are critical cases that have been hospitalized, where they were tested and recorded. Guided by this, we propose a SIRD model which we introduced in [11] and which was also studied independently in [2]. As opposed to the SIR model, the main idea of the SIRD model is to consider the category of deaths separately from the recovered.

Given the model, we face another challenge. The dynamics of the pandemic were driven by many factors, for instance lockdowns, the opening of restaurants, elections, the opening or closing of schools, vacationing and other measures taken by the authorities in order to mitigate the effects of the pandemic. Thus any reasonable parametric model is negatively impacted by the fact that the coefficients are not constant in time. This implies that the estimations cannot be carried out over the long term. In the previous paper [11] we considered a regime switch which was geared toward adapting the parameters to the political decisions of lockdown or relaxation. In this paper the main idea is to use a dynamic model which does not take into account long-term evolution or outside assumptions about the status of the pandemic.
This time we consider the data over a short period of time, in our case seven days, and estimate the parameters based primarily on the number of deceased people. We do this in several steps. The first one is the cleaning and smoothing of the data. We fix some anomalies in the reporting and take a moving average over a two-week window. The second step is to exploit the model and generate data with parameters chosen at random. Based on only seven days of data, we train neural networks to learn the parameters of the model. We then use these neural networks to estimate the parameters from the real data. The key here is a form of ensemble learning which is interesting in itself: a single neural network does not seem to predict the parameters very well, however the average has a much better prediction power. The third and last step is to predict the evolution for a number of future days and compare it with the real data. For a range of ten days, we get results very close to the real data.

We should point out that, in mathematical terms, our approach is a typical inverse problem. Generically, inverse problems can be ill posed. We show that our model is actually well posed, meaning that the observations identify the coefficients uniquely. This is the backbone of our analysis and estimation.

The organization of the paper is as follows. In Section 2 we briefly discuss the SIR model and introduce the SIRD model. Here we state Theorem 1, whose proof is in Appendix 5. Also here we describe the guiding idea of our approach. In Section 3 we discuss the data and its cleaning and describe the construction of the neural networks. Then we proceed with the Conclusions in Section 4.

The first attempts at developing a mathematical model of infectious disease spreading were made at the beginning of the twentieth century. One of the most important models that can describe infectious diseases is the SIR model. The first ones to develop SIR epidemic models were Bernoulli, Ross [13, 14], Kermack-McKendrick [7] and Kendall [5]. The SIR model is a mathematical model that can be used in epidemiology to analyze, at a given time and for a specific population, the interactions and dependencies between the number of individuals who are susceptible to an infectious disease, the number of people who are currently infected, and those who have already recovered or died as a consequence of the infection. This model can be used to describe diseases that can be contracted only once, meaning that a susceptible individual gets the disease by contracting an infectious agent and is afterward removed (by death or recovery). It is assumed that an individual can be in one of the following three states: susceptible (S), infected (I) or removed/recovered (R). This can be represented by the following schema:

$$\text{Susceptible} \xrightarrow{\;\beta\;} \text{Infected} \xrightarrow{\;\gamma\;} \text{Recovered}$$

where $\beta$ is the transmission rate and $\gamma$ is the removal rate. We consider $N$ to be the total population in the affected area. We assume $N$ to be fixed, with no births or deaths by other causes, for a given period of $n$ days. Therefore, $N$ is the sum of the three categories previously defined, the number of susceptible people, the ones infected, and the ones removed:

$$N = \bar S(t) + \bar I(t) + \bar R(t).$$

Therefore, we analyze the following SIR model: at time $t$, we consider $\bar S(t)$ as the number of susceptible individuals, $\bar I(t)$ as the number of infected individuals, and $\bar R(t)$ as the number of removed/recovered individuals.
The equations of the SIR model are the following:

$$\frac{d\bar S(t)}{dt} = -\bar\beta\, \bar S(t)\bar I(t), \qquad \frac{d\bar I(t)}{dt} = \bar\beta\, \bar S(t)\bar I(t) - \gamma \bar I(t), \qquad \frac{d\bar R(t)}{dt} = \gamma \bar I(t), \tag{1}$$

where:

• $\frac{d\bar S(t)}{dt}$ is the rate of change of the number of individuals susceptible to the infection over time;
• $\frac{d\bar I(t)}{dt}$ is the rate of change of the number of individuals infected over time;
• $\frac{d\bar R(t)}{dt}$ is the rate of change of the number of individuals recovered over time.

Because there is no canonical choice of $N$, we transform the system (1) by dividing by $N$ and considering $S(t) = \bar S(t)/N$, $I(t) = \bar I(t)/N$ and $R(t) = \bar R(t)/N$. It is customary to choose $N = 10^6$ for convenience, but this is just an arbitrary choice. For instance, the analysis of smaller communities or cities involves fewer than $10^6$ people; however, $10^6$ is a common choice because countries number their populations in multiples of $10^6$. With these notations, we translate (1) into

$$\frac{dS(t)}{dt} = -\beta S(t) I(t), \qquad \frac{dI(t)}{dt} = \beta S(t) I(t) - \gamma I(t), \qquad \frac{dR(t)}{dt} = \gamma I(t), \tag{2}$$

where $\bar\beta = \beta/N$ and $\gamma$ is the same as in (1). Observe that we actually have $S(t) + I(t) + R(t) = S_0 + I_0 + R_0 = 1$ for all $t \geq 0$.

We propose a slight change to the SIR model which accounts for the number of deceased people separately, the main idea being that the number of deceased people may be more reliable than other data, such as the number of infected. In the plain vanilla SIR model the recovered and the deceased are combined into the single category of removed. The idea of the SIRD model is to use the provided data on deceased people in a significant way. To this end, we work with four variables changing in time, namely $S(t)$, $I(t)$, $R(t)$ and $D(t)$, where $R(t)$ is the proportion of recovered and alive people while $D(t)$ is the proportion of deceased people. We set the SIRD model as an interaction driven by the system of differential equations:

$$\frac{dS(t)}{dt} = -\beta S(t) I(t), \quad \frac{dI(t)}{dt} = \beta S(t) I(t) - (\gamma_1 + \gamma_2) I(t), \quad \frac{dR(t)}{dt} = \gamma_1 I(t), \quad \frac{dD(t)}{dt} = \gamma_2 I(t). \tag{3}$$

This model appears in [11] and was also introduced independently in [2]. Notice that in this setup the removed population bifurcates into the recovered ones, accounted for by $R$, and the dead ones, accounted for by $D$. We also point out that by taking the sum of the two factors, $\bar R(t) = R(t) + D(t)$, we fall back into the classical SIR model with $\gamma = \gamma_1 + \gamma_2$. The reason for accounting for $D(t)$ separately is that the data reports the number of deaths separately, and the model above allows us to fit the parameters using this data.

Even if the above system is satisfactory to a certain degree, we would like to point out that, in practice, the number of infected as well as the number of recovered is not really known. The data at our disposal reports the numbers of infected and recovered which are documented; the real number of infected is not actually observed. Therefore we adjust the model a little by introducing another parameter $\alpha$ which measures the proportion of observed infected and recovered. Therefore, we denote

$$\tilde I(t) = \alpha I(t), \qquad \tilde R(t) = \alpha R(t).$$

In terms of these new quantities, the SIRD model becomes

$$\frac{dS(t)}{dt} = -\frac{\beta}{\alpha} S(t) \tilde I(t), \quad \frac{d\tilde I(t)}{dt} = \beta S(t) \tilde I(t) - (\gamma_1 + \gamma_2) \tilde I(t), \quad \frac{d\tilde R(t)}{dt} = \gamma_1 \tilde I(t), \quad \frac{dD(t)}{dt} = \frac{\gamma_2}{\alpha} \tilde I(t). \tag{4}$$

We are going to estimate the parameters $\alpha, \beta, \gamma_1, \gamma_2$ from the data for a certain number of days. The advantage of getting an estimate of $\alpha$ is that we can in fact predict the real number of infected people and also the real number of recovered people. Of course, in our adjusted model we have

$$S(t) + \frac{\tilde I(t)}{\alpha} + \frac{\tilde R(t)}{\alpha} + D(t) = 1 \tag{5}$$

for all times $t \geq 0$. Summing up all the equations in (3), we get that the derivative of $S(t) + I(t) + R(t) + D(t)$ is zero, so this sum is constant in time. Since $S_0 + I_0 + R_0 + D_0 = 1$, as these represent proportions of the entire population, we get that $S(t) + I(t) + R(t) + D(t) = 1$, and using the definition of $\alpha$ and $\tilde I, \tilde R$ we arrive at (5).

We present next the main mathematical result, with a proof in the appendix. It guarantees that the problem is well posed and we can recover the main parameters of the model.

Theorem 1. Given $\tilde I(0)$, $\tilde R(0)$ and $D(0), D(1), D(2), D(3), D(4)$, we can uniquely determine the parameters $\alpha, \beta, \gamma_1, \gamma_2$ of (4).
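To make the model concrete, the following is a minimal sketch of how system (4) can be integrated numerically to produce daily values of $(\tilde I, \tilde R, D)$. It is written in Python with scipy; the parameter values and initial conditions are illustrative assumptions, not fitted values from the paper, and the function name sird_observed is ours (it is reused in a later sketch).

```python
import numpy as np
from scipy.integrate import solve_ivp

def sird_observed(t, y, beta, gamma1, gamma2, alpha):
    # y = (S, I, R, D): susceptible fraction, observed infected,
    # observed recovered and deceased fractions, as in system (4)
    S, I, R, D = y
    dS = -(beta / alpha) * S * I
    dI = beta * S * I - (gamma1 + gamma2) * I
    dR = gamma1 * I
    dD = (gamma2 / alpha) * I
    return [dS, dI, dR, dD]

# Illustrative parameters and initial observed data (not fitted values)
beta, gamma1, gamma2, alpha = 0.4, 0.1, 0.005, 0.3
I0, R0, D0 = 0.005, 0.004, 0.001
S0 = 1 - (I0 + R0) / alpha - D0   # from the constraint (5)

# Integrate over one week and sample the solution once per day
sol = solve_ivp(sird_observed, (0, 7), [S0, I0, R0, D0],
                args=(beta, gamma1, gamma2, alpha),
                t_eval=np.arange(8), rtol=1e-8)
print("daily deceased fractions:", sol.y[3])
```

Note that the initial susceptible fraction is computed from the constraint (5), so the generated trajectory stays consistent with the model.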
This result is fundamental for our approach. It shows that given a number of daily observations, at least 5 days, we can uniquely determine the parameters of the model. In practice, given any day k of the pandemic, we use the data of a number of preceding days to determine the parameters.

The guiding idea: We can imagine the map from the daily data to the parameters as a function $\Phi(\mathrm{data}_{gen}) = (\beta, \gamma_1, \gamma_2, \alpha)$, where $\mathrm{data}_{gen} = (\tilde I_0, \tilde R_0, D(0), D(1), \dots, D(4))$ is generated by (4). The theorem ensures that this function is well defined on the set

$$\mathrm{Data}_{gen} = \{(\tilde I_0, \tilde R_0, D(0), D(1), \dots, D(4)) \text{ solution of (4) for some } (\beta, \gamma_1, \gamma_2, \alpha)\}.$$

At this point, given a real data point $\mathrm{data} = (\tilde I_0, \tilde R_0, D(0), D(1), \dots, D(4))$, it is difficult to check that it belongs to $\mathrm{Data}_{gen}$. Thus our goal is to find the best approximation of $\mathrm{data}$ by a point in $\mathrm{Data}_{gen}$. In general this is achieved using projection methods, for instance non-linear least squares, as is done in [15]. In our case we consider an extension problem rather than a projection method, through neural networks trained on data simulated with the SIRD model. These networks construct approximations of $\Phi$ on the set $\mathrm{Data}_{gen}$ which extend naturally to the whole space, and thus can be interpreted as (approximate) extrapolations of $\Phi$ to the whole space. As we describe below, a single neural network did not work very well in our numerical experimentation, while an ensemble of neural networks achieves better performance.

Numerical simulations show that we get more robust results by considering a larger number of days for the deceased: we noticed that 7 days instead of 5 gives more robust results. We also exploit 2 days of data for the infected and recovered to strengthen the robustness. One of our main findings is that the prediction works very well, as shown in Figure 2.

Figure 2: For each day k we predict the next 10 days using our method and take the average for each category; these are plotted alongside the averages of the real data for the 10 days starting from day k. (a) Averages of deaths, (b) averages of infected, (c) averages of recovered, each computed over 10 future days for predicted and real data.

We can observe from Figure 2 that the prediction power is excellent for 10 forward days. We detail and discuss how these predictions are obtained in the next section.

In this section we present our strategy for the data cleaning and for the estimation of the parameters $\beta, \gamma_1, \gamma_2, \alpha$ of the SIRD model. We clean the raw data, which has several anomalies, and we regularize it by averaging. The basic strategy is the following. Based on the assumed SIRD model we generate data for 7 days and then train a neural network which learns the parameters from the data on this 7-day time interval. Then, using the real data and the neural network, we find the parameters in a dynamical way for any 7 consecutive days. Given these 7 days, we can predict, based on the model, what is going to happen in the next few days. In the real world the parameters do not stay constant; they change dynamically, and we would like to capture part of this behavior.

We took the data from https://datelazi.ro, which keeps a record of all the data during the pandemic in Romania. During the pandemic, the reported numbers and the reporting methodology changed several times, causing delays or bad reporting.
In October 2020, the definition of a recovered person changed, causing an anomaly in the reported number of recovered people. In particular, we can see a spike of 44,000 newly reported recovered cases from one day to the next, comparable to the cumulative number of cases up to that point. To alleviate this anomaly, we distribute the extra number of cases proportionally over the previous days. There are also periods of time in which the reported number of recovered people is 0 for almost three weeks. One look at the data shows that the reporting during the weekends is significantly different from the reporting during the weekdays. Moreover, by law, the reporting methodology of the health centers allows cases to be reported within an interval of two weeks. In order to mitigate the above deficiencies, we replaced the data at time t with the average of the data over the two weeks preceding time t. We present in Figure 3 the data before and after the cleaning. The rows describe the data (from top to bottom) of recovered, infected and dead. The left column represents the raw data and the right column represents the adjusted data as described above. Notice the scale and the spike in the first picture, which is adjusted as we pointed out. The data we work with is scaled by 10,000,000.

To deal with the estimation of the parameters, we follow two main steps. The first one is the generation of data. We take the range of $\beta$ in the interval $J_\beta = (0, 1.5)$, for $\gamma_1$ we consider the range $J_{\gamma_1} = (0, 1)$, for $\gamma_2$ we consider the interval $J_{\gamma_2} = (0, 0.01)$ and for $\alpha$ we consider the interval $J_\alpha = (0.01, 1)$. Next we split each of these intervals into 7 sub-intervals of the same size, which we index accordingly as

$$J_{\beta,i} = \left(\tfrac{1.5\, i}{7}, \tfrac{1.5\,(i+1)}{7}\right), \quad J_{\gamma_1,i} = \left(\tfrac{i}{7}, \tfrac{i+1}{7}\right), \quad J_{\gamma_2,i} = \left(\tfrac{0.01\, i}{7}, \tfrac{0.01\,(i+1)}{7}\right), \quad J_{\alpha,i} = \left(0.01 + \tfrac{0.99\, i}{7},\ 0.01 + \tfrac{0.99\,(i+1)}{7}\right)$$

for $i \in \{0, 1, \dots, 6\}$. The splitting is motivated by the fact that we want a good representation of the parameters and, at the same time, we want to avoid a concentration of the parameters in one single region. We previously tried a simple uniform choice for each parameter over the whole interval, but we ran into the problem of misrepresentation of small values of the parameters. It seems that this phenomenon is due to some form of concentration of measure, which is alleviated by this splitting method. With this strategy we also avoid the overfitting problem of the neural networks. Therefore we obtain 7 sub-intervals for each of the 4 parameters, which gives $7^4 = 2401$ combinations of sub-intervals. For each combination of sub-intervals we sample parameters and solve (4) to produce simulated series; all these simulated series form the dataset $\Delta$.

Next, to train the networks we apply the following procedure. For each $i \in \{1, 2, \dots, 10\}$, we create a training sample $B_i$ from the dataset $\Delta$, of size 90% of $\Delta$, chosen at random without replacement. Using the sample $B_i$ we train a neural network $NN_i$, $i \in \{1, 2, \dots, 10\}$, having as input $X_{train} = (I_0, I_1, R_0, R_1, D_0, D_1, \dots, D_6)$ and our parameters $Y_{train} = (\beta, \gamma_1, \gamma_2, \alpha)$ as output. According to our Theorem 1, we know that we can recover the parameters $\beta, \gamma_1, \gamma_2, \alpha$ from $I_0, R_0, D_0, D_1, D_2, D_3, D_4$ alone; however, we choose to use more data because the estimates are more robust. This choice can also be interpreted as a regularization which decreases the training/test loss. The architecture of the neural network we use is of the following form. The size of the training, respectively test, data split for $NN_i$ is 80%, respectively 20%, of the sample $B_i$.
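As a concrete illustration, here is a condensed sketch, in Python, of the generation and ensemble-training pipeline described above. It reuses the sird_observed function from the earlier sketch. The number of samples per sub-interval combination, the fixed initial conditions, and the network architecture (a small scikit-learn MLP with two hidden layers) are illustrative assumptions, since the text does not fix them.

```python
import numpy as np
from itertools import product
from scipy.integrate import solve_ivp
from sklearn.neural_network import MLPRegressor

# assumes sird_observed from the previous sketch is in scope
rng = np.random.default_rng(0)

def simulate(params, I0=0.005, R0=0.004, D0=0.001, days=7):
    """Integrate system (4): one row per compartment, one column per day."""
    beta, gamma1, gamma2, alpha = params
    S0 = 1 - (I0 + R0) / alpha - D0          # constraint (5)
    sol = solve_ivp(sird_observed, (0, days - 1), [S0, I0, R0, D0],
                    args=(beta, gamma1, gamma2, alpha),
                    t_eval=np.arange(days), rtol=1e-6)
    return sol.y

def sub_intervals(lo, hi):
    """Split (lo, hi) into 7 equal sub-intervals, as in the text."""
    return [(lo + (hi - lo) * i / 7, lo + (hi - lo) * (i + 1) / 7)
            for i in range(7)]

grids = [sub_intervals(0, 1.5),    # beta
         sub_intervals(0, 1),      # gamma1
         sub_intervals(0, 0.01),   # gamma2
         sub_intervals(0.01, 1)]   # alpha

# Build the simulated dataset Delta over the 7**4 = 2401 grid cells
X, Y = [], []
for cell in product(*grids):
    for _ in range(5):                       # samples per cell: illustrative
        p = [rng.uniform(lo, hi) for lo, hi in cell]
        _, I, R, D = simulate(p)
        X.append(np.concatenate([I[:2], R[:2], D[:7]]))  # (I0,I1,R0,R1,D0..D6)
        Y.append(p)
X, Y = np.array(X), np.array(Y)

# Train an ensemble of 10 networks, each on a random 90% subsample of Delta
nets = []
for i in range(10):
    idx = rng.choice(len(X), size=int(0.9 * len(X)), replace=False)
    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)  # assumed
    nets.append(net.fit(X[idx], Y[idx]))

def estimate_parameters(x):
    """Ensemble estimate of (beta, gamma1, gamma2, alpha) for one input."""
    return np.mean([net.predict(x[None, :])[0] for net in nets], axis=0)
```

Training each network on a different random subsample is what de-correlates the ensemble members, which is the point of the averaging step described next.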
After training the neural networks, the prediction of our parameters is made by averaging the predictions of all the individual neural networks on the real data:

$$(\hat\beta, \hat\gamma_1, \hat\gamma_2, \hat\alpha) = \frac{1}{10} \sum_{i=1}^{10} NN_i(\mathrm{data}).$$

With this approach, we achieve better performance of our model because we manage to decrease the variance without increasing the bias. Usually, the prediction of a single neural network is sensitive to noise in the training set, while the average of many uncorrelated neural networks is not. Bootstrap sampling is a good method of de-correlating neural networks, by training them on different training sets; if we train many neural networks on the same dataset, we obtain strongly correlated neural networks. In Figure 4 we present the results of the estimated parameters $\beta, \gamma_1, \gamma_2, \alpha$.

Knowing the parameters of the model $(\beta_k, \gamma_{1,k}, \gamma_{2,k}, \alpha_k)$ at each time k and the values $(I_k, R_k, D_k)$ from the real data, we can generate the predictions $P_{k,0}, \dots, P_{k,10}$ using the system (4) for the time interval $[k, k+10]$. We take the average of $P_{k,0}, \dots, P_{k,10}$ and call this $P_k$. In Figure 5 we plot, for each day k, the average of the real (smoothed) data for the 10 days starting with k alongside the average $P_k$ computed above. As we already pointed out, the fit is very good.

Figure 5: The plots of the real data and the predicted averages for the next 10 days ((a) predicted and real deaths, (b) predicted and real infected, (c) predicted and real recovered). As described above, for each day k we compute the average of the real data for the next 10 days and the average of the predicted data for the next 10 days.

Notice the important fact that each prediction is made in terms of the previous 7 days. The close match suggests a very good prediction power of our approach. A slight difference appears in the case of the number of infected people during the fourth wave of the pandemic, namely the Fall of 2021. The next images, in Figure 6, show the prediction of the deaths for 30 and 60 days. In many cases the prediction is good; however, there are regions in which the prediction ceases to be accurate. This depends strongly on the timeframe we choose for making the predictions.

Figure 6: In both pictures, the real (reported) number of deaths is in blue. For each day k, we plot, in red, the prediction of the deaths starting with day k: (a) on the left, the prediction is plotted for 30 days, while (b) on the right it is plotted for 60 days. Remark that the 30-day prediction is much better than the 60-day prediction.

Next, in Figure 7, we look in more detail at the images above, to see the refined structure of the behavior. We do this for 10 days versus 30 days, starting at different moments in time.

The dynamics of an infectious disease are highly impacted by numerous factors, including the measures imposed by governments and the attitude of the population towards them, as the COVID-19 pandemic has shown. Therefore, it is very unlikely that the parameters of a model designed to assess the spread of a virus are constant over time. We have to be aware of the fact that, in the long term, the prediction is affected by all the restrictions/relaxations adopted by most of the countries. We believe that this methodology has a high enough degree of generality to be used in many other prediction problems, particularly those where the prediction depends on many factors which change the behavior of the model.
We introduced in our model an extra parameter which accounts for the proportion of the infected and reported population versus the whole infected population, the point being that not every infected person is actually tested or reported; thus the actual number of infected people should be higher. We do not account for the vaccination campaign, though this does not affect our model since we are looking at the parameters over relatively short periods of time. The vaccination should in principle change the parameters, which is in fact exactly what we look for.

The last author would like to thank Iulian Cimpean, Lucian Beznea and Mihai N. Pascu for interesting discussions about this paper.

The goal of this section is to provide the proof of Theorem 1. Recall the system (4), given by

$$\frac{dS(t)}{dt} = -\frac{\beta}{\alpha} S(t) \tilde I(t), \quad \frac{d\tilde I(t)}{dt} = \beta S(t) \tilde I(t) - (\gamma_1+\gamma_2) \tilde I(t), \quad \frac{d\tilde R(t)}{dt} = \gamma_1 \tilde I(t), \quad \frac{dD(t)}{dt} = \frac{\gamma_2}{\alpha} \tilde I(t). \tag{6}$$

The statement of Theorem 1 is the following.

Theorem 2. Given $\tilde I(0), \tilde R(0)$ and $D(0), D(1), D(2), D(3), D(4)$, we can uniquely determine the parameters $\alpha, \beta, \gamma_1, \gamma_2$ of (6).

Proof. We first reduce the analysis to a single equation, namely the one for $D(t)$. To do this we write each of the involved quantities as a function of $D(t)$, as follows. The easiest to deal with is $\tilde R$, because from the last two equations of (6) we get

$$\frac{d\tilde R}{dD} = \frac{\gamma_1 \alpha}{\gamma_2},$$

which leads to $\tilde R(t) = \frac{\gamma_1 \alpha}{\gamma_2}(D(t) - D_0) + \tilde R_0$. Now we treat the function $u$ which determines $S(t) = u(D(t))$. Dividing the first and the last equations of (6) we get

$$\frac{dS}{dD} = -\frac{\beta}{\gamma_2} S,$$

which can be integrated to give $S(t)$ in terms of $D(t)$ as

$$S(t) = S_0\, e^{-\frac{\beta}{\gamma_2}(D(t) - D_0)}. \tag{7}$$

Furthermore, this allows us to solve for $\tilde I(t) = v(D(t))$. To see this, add the first equation of (6) to the second divided by $\alpha$, and then combine this with the last one, to arrive at

$$\frac{d}{dt}\Big(S + \frac{\tilde I}{\alpha}\Big) = -\frac{\gamma_1+\gamma_2}{\alpha}\, \tilde I = -\frac{\gamma_1+\gamma_2}{\gamma_2}\, \frac{dD}{dt},$$

from which we deduce that

$$S(t) + \frac{\tilde I(t)}{\alpha} = S_0 + \frac{\tilde I_0}{\alpha} - \frac{\gamma_1+\gamma_2}{\gamma_2}(D(t) - D_0).$$

Solving now for $\tilde I$ and using (7), we obtain

$$\tilde I(t) = \tilde I_0 + \alpha S_0\Big(1 - e^{-\frac{\beta}{\gamma_2}(D(t)-D_0)}\Big) - \frac{\alpha(\gamma_1+\gamma_2)}{\gamma_2}(D(t) - D_0),$$

which, combined with the last equation of (6), shows that $D(t)$ satisfies the differential equation

$$\frac{dD(t)}{dt} = \frac{\gamma_2 \tilde I_0}{\alpha} - (\gamma_1+\gamma_2)(D(t)-D_0) + \gamma_2 S_0 \Big(1 - e^{-\frac{\beta}{\gamma_2}(D(t)-D_0)}\Big). \tag{8}$$

Before we move forward, we treat a slightly more general problem. Assume we take a differential equation of the form

$$\frac{dX(t)}{dt} = f(X(t)), \qquad X(0) = X_0, \qquad f(X_0) > 0.$$

The solution starts at $X_0$, where the derivative is positive, thus the solution is non-decreasing for a while. Moreover, the solution is defined for all $t \geq 0$ by general results for ordinary differential equations. By continuity, the solution stays in the region $f > 0$, and thus it is increasing for all times it stays inside the region $f > 0$. It cannot hit in finite time a point where $f(X(t_c)) = 0$, since then, reverting the equation (looking at $X(t_c - s)$) and combining this with the uniqueness of the solution, we would have $X(t) = X(t_c)$ for all $t$, which is a contradiction. Thus the solution is increasing, and we can integrate the equation as follows:

$$\phi(X(t)) := \int_{X_0}^{X(t)} \frac{ds}{f(s)} = t.$$

Notice here that the function $\phi$ is well defined on the interval where $f > 0$ which contains $X_0$. Therefore we have $\phi(X(t)) = t$ for all $t \geq 0$, and $X(t)$ is increasing.

Now assume that we have two differential equations,

$$\frac{dX(t)}{dt} = f(X(t)) \qquad \text{and} \qquad \frac{dY(t)}{dt} = g(Y(t)),$$

with the same initial condition $X(0) = Y(0) = X_0$. At this stage, the point is that if $X(t_i) = Y(t_i)$ for some sequence of points $0 = t_0 < t_1 < t_2 < t_3 < t_4$, then we obtain that $\phi(X(t_i)) - \psi(X(t_i)) = 0$, where

$$\phi(x) = \int_{X_0}^{x} \frac{ds}{f(s)} \qquad \text{and} \qquad \psi(x) = \int_{X_0}^{x} \frac{ds}{g(s)}.$$

In particular, this implies that the function $\phi(x) - \psi(x)$ has at least five zeros. Since the function $\phi(x) - \psi(x)$ is $C^1$, this implies that its derivative has at least four zeros; in other words, $\frac{1}{f(x)} - \frac{1}{g(x)}$ has at least four zeros. Finally, this means that $f(x) = g(x)$ has at least four solutions. Returning to our problem, we now take parameters $a, b, c, d, \tilde a, \tilde b, \tilde c, \tilde d$ and consider $f(x) = a - bx + c(1 - e^{-dx})$ while $g(x) = \tilde a - \tilde b x + \tilde c(1 - e^{-\tilde d x})$.
In the case that $f(x) - g(x) = 0$ has at least four solutions, we also get that $f'(x) - g'(x) = 0$ has at least three solutions, which, upon taking another derivative, gives that $f''(x) - g''(x) = 0$ has at least two solutions. Now this means that

$$c d^2 e^{-dx} = \tilde c\, \tilde d^2 e^{-\tilde d x}$$

has at least two different solutions. The point is that if the above is satisfied for two different values of $x$, say $x_1$ and $x_2$, then

$$\frac{\tilde c\, \tilde d^2}{c\, d^2} = e^{(\tilde d - d)x_1} = e^{(\tilde d - d)x_2},$$

which leads to the conclusion that we must have $d = \tilde d$ and $c = \tilde c$. Going now back up the ladder, using the fact that $f'(x) = g'(x)$ for three distinct values of $x$, we have $-b + cd\, e^{-dx} = -\tilde b + cd\, e^{-dx}$ and thus $b = \tilde b$. Finally, having $f(x) = g(x)$ for five different values of $x$ means that we also get $a = \tilde a$; thus all the parameters must be equal.

Taking this back to our equation (8), with $X(t) = D(t) - D_0$, knowing the values $X(0), X(1), X(2), X(3), X(4)$ we can uniquely determine the values of

$$a = \frac{\gamma_2 \tilde I_0}{\alpha}, \qquad b = \gamma_1 + \gamma_2, \qquad c = \gamma_2 S_0, \qquad d = \frac{\beta}{\gamma_2}.$$

Knowing these values alone is not enough to determine all of $\gamma_1, \gamma_2, \beta, \alpha$, because $S_0$ is not directly observed. However, from (5) we know that

$$S_0 = 1 - \frac{\tilde I_0 + \tilde R_0}{\alpha} - D_0,$$

which shows that we can now solve

$$\alpha = \frac{c\, \tilde I_0 + a(\tilde I_0 + \tilde R_0)}{a(1 - D_0)}, \qquad \gamma_2 = \frac{a\, \alpha}{\tilde I_0}, \qquad \gamma_1 = b - \gamma_2, \qquad \beta = d\, \gamma_2.$$

Consequently, knowing $D_0, D_1, D_2, D_3, D_4$ and $\tilde I_0, \tilde R_0$, we can determine the parameters $\beta, \gamma_1, \gamma_2, \alpha$.

References

[1] Mathematical modeling of infectious diseases dynamics. In: Encyclopedia of Infectious Diseases: Modern Methodologies.
[2] The multicompartment SI(RD) model with regime switching: an application to the COVID-19 pandemic.
[3] Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand.
[4] Mathematical models of infectious disease transmission.
[5] Deterministic and stochastic epidemics in closed populations. In: Contributions to Biology and Problems of Health.
[6] Contributions to the mathematical theory of epidemics, I.
[7] A contribution to the mathematical theory of epidemics.
[8] Contributions to the mathematical theory of epidemics, III. Further studies of the problem of endemicity.
[9] Contributions to the mathematical theory of epidemics, II. The problem of endemicity.
[10] Early dynamics of transmission and control of COVID-19: a mathematical modelling study.
[11] A regime switching on Covid19 analysis and prediction in Romania.
[12] The prevention of malaria.
[13] An application of the theory of probabilities to the study of a priori pathometry, Part I.
[14] An application of the theory of probabilities to the study of a priori pathometry, Part II.
[15] Assessment of 21 days lockdown effect in some states and overall India: a predictive mathematical study on COVID-19 outbreak.
[16] COVID-19 predictions using a Gauss model.
[17] Modeling, state estimation, and optimal control for the US COVID-19 outbreak.
[18] Spatio-temporal analysis of surveillance data. In: Handbook of Infectious Disease Data Analysis.