key: cord-0624280-1j5g1kk8 authors: Petrica, Marian; Stochitoiu, Radu D.; Leordeanu, Marius; Popescu, Ionel title: A regime switching on Covid19 analysis and prediction in Romania date: 2020-07-27 journal: nan DOI: nan sha: 5eed9625c29d7ed7e5ca973a0df6037d535a63fb doc_id: 624280 cord_uid: 1j5g1kk8 In this paper we propose a regime separation for the analysis of Covid19 on Romania combined with mathematical models of SIR and SIRD. The main regimes we study are, the free spread of the virus, the quarantine and partial relaxation and the last one is the relaxation regime. The main model we use is SIR which is a classical model, but because we can not fully trust the numbers of infected or recovered we base our analysis on the number of deceased people which is more reliable. To actually deal with this we introduce a simple modification of the SIR model to account for the deceased separately. This in turn will be our base for fitting the parameters. The estimation of the parameters is done in two steps. The first one consists in training a neural network based on SIR models to detect the regime changes. Once this is done we fit the main parameters of the SIRD model using a grid search. At the end, we make some predictions on what the evolution will be in a timeframe of a month with the fitted parameters. Infectious disease pandemics have had a major impact on the evolution of mankind and have played a critical role in the course of history. Over the ages, pandemics made countless victims, decimating entire nations and civilizations. As medicine and technology have made remarkable progress in the last century, the means of fighting pandemics have become significantly more efficient. Another problem is that globalization, the development of the commerce and the ease to travel all over the would facilitates the transmission mechanism of a new disease much more than it did in the past. In 2019 a new pneumonia was associated to a new virus from the corona virus family known now as COVID-19. This spread out very quickly around the wolds as the next subsection shows. On 31 st of December 2019, the first cases of infection with an unknown virus causing symptoms similar to those of pneumonia were reported to the World Health Organization in China. Shortly after that the new virus was identified as a new coronavirus, and the public health organizations feared that the situation may degenerate in one similar to the SARS epidemic in 2003. The SARS outbreak was also caused by a newly identified coronavirus and it caused 8000 infection cases, 774 deaths and significant financial losses. Within less than 3 months COVID-19 outbreak has become a global pandemic, spreading across almost all countries all over the world. As of 29 th of March 2020, there were 725980 COVID-19 confirmed cases which caused 35186 deaths. Although the outbreak started in China, the virus has spread rapidly all over the globe, the most affected countries being now Italy, Spain, China, Iran, France, United States of America and United Kingdom. The fast-evolving spread of the new coronavirus, which has been officially declared a pandemic, is represented below. The following charts show the countries where there have been reported at least 1000 cases of COVID-19 infection: While at the beginning of February 2020 the virus was still affecting mainly China, it has started to spread rapidly to other countries, causing infections especially in western Europe countries at the beginning of March 2020. In less than a month, by the end of March 2020, the outbreak was present on all continents, affecting most of the countries in the world, which led the World Health Organization to officially name it a pandemic. In the fight with the COVID-19, quarantine was one of the main measures, at least when the hospitals were overwhelmed with patients and the virus propagation and its inside body working was not well understood. A basic tool in analyzing the spread of the virus is the mathematical modeling. There is a growing body of mathematical models used at the moment as for a small sample by no means exhaustive see [WDM19, CGR07, GF08, KRD + 20, SNC20, SSSK20, FLNG + 20, TLSB20]. This turned out to be a valuable tool that can be used in the assessment, prediction and control of infectious diseases, as it is the COVID-19 pandemic, which significantly impacted almost all countries, with important social and economical implications worldwide. The main purpose of this work is to develop a predictive model that can accurately assess the transmission dynamic of COVID-19. In this paper we use as a starting point the standard SIR model initiated in [Ros] and later investigated in depth in [KM91a, KM91b, KM33] . The basic SIR model uses the assumption that the parameters are constant over time. By starting with given initial conditions and observing the solution at a single day, we can determine in a unique way the parameters of the model which we prove in Proposition 1. This suggests that by fixing initial conditions and observing the solutions on a given day we can determine the parameters uniquely. As a new tool here we use a neural network having as input the solutions on a certain number of days for the SIR model with the output the parameters of the model. Now using the data of infected and recovered in Romania, we use the neural network to guess in the first round what the parameters are from day to day. This is not very accurate, because there is a lot of uncertainty in the data. For instance we can not have an accurate estimate of the infected number of individuals, nor can we accurately estimate the number of recovered, particularly when there are so many asymptotic cases and not much testing as it happened at the beginning of the pandemic. By analyzing the data, we draw the conclusion that assuming the parameters constant in time is not a valid assumption. The parameters of the model vary over time due to the measures implemented in the effort to mitigate the spread of the disease. By examining the data, we noticed that we can split the time frame into separate regimes, with transitions periods between them, and we can consider the parameters constant on each such regime with a transition allowed between regimes modeled by a logistic function. The actual fit of the parameters is based on a grid search around the averages suggested by the neural network output we described above. This is worked out by fitting the number of deaths reported and the solution to the differential equation which drives the dynamic of the number of deaths. The organization of the paper is as follows. In Section 2 we introduce briefly the SIR model and provide the main result that knowledge of solution to the system at any given time determines the parameters in a unique way. In section 3 we provide details on the construction of the neural network and the rough guess of the parameters based on the public data. In Section 4 we give a description of the SIRD model in which we separate the deceased numbers from the recovered and discuss how one can arrive at a differential equation for the deaths alone. Next, in Section 5 we introduce the regimes idea and how we model it. Here we also describe how we fit the paramters and in Section 6 we discuss the predictions based on this model. Finally Section 7 includes the main conclusions. The first attempts of developing a mathematical model of the infectious diseases spreading were made at the beginning of the twentieth century. One of the most important models that can describe infectious diseases is the SIR model. The first ones that developed SIR epidemic models were Bernoulli, Ross, Kermack-McKendrick and Macdonald. The SIR model is a mathematical model that can be used in epidemiology in order to analyze, at a given time for a specific population, the interactions and dependencies between the number of individuals who are susceptible to get an infectious disease, the number of people who are currently infected and those who have already been recovered or have died as cause of the infection. This model can be used to describe diseases that can be contracted just one time, meaning that a susceptible individual gets a disease by contracting an infectious agent, which is afterwards removed (death or recovery). It is assumed that an individual can be in either one of the following three states: susceptible (S), infected (I) and removed/ recovered (R). This can be represented in the following mathematical schema: Infected Recovered β γ where: • β = infection rate • γ = recovery rate. We consider N as the total population in the affected area. We assume N to be fixed, with no births or deaths by other causes, for a given period of n days. Therefore, N is the sum of the three categories previously defined: the number of susceptible people, the ones infected, and the ones removed: Therefore, we analyze the following SIR model: at time t, we considerS(t) as the number of susceptible individuals,Ī(t) as the number of infected individuals, andR(t) as the number of removed/recovered individuals. The equations of the SIR model are the following: where: • dS dt is the rate of change of the number of individuals susceptible to the infection over time; • dĪ dt is the rate of change of the number of individuals infected over time; • dR dt is the rate of change of the number of individuals recovered over time. Because there is no canonical choice of N , we will transform the system (1) by dividing it by N and considering S(t) =S(t)/N , I(t) =Ī/N and R(t) =R(t)/N . It is customary to choose N = 10 6 for convenience but this is just an arbitrary choice. For instance, analysis on smaller communities, or cities involves less than 10 6 , however 10 6 is a common choice because countries number their populations in multiples of 10 6 . With these notations we translate (1) into where β =β/N and γ is the same as in (1). Notice that now we actually have that S(t) + I(t) + R(t) = S 0 + I 0 + R 0 = 1 for all t ≥ 0. Since we are interested in the reverse problem, namely determining the parameters β, γ from the observations, we put this as a formal mathematical result as follows. Proposition 1. Referring to the system (2), if we know I 0 , S 0 and the values I(t 1 ), S(t 1 ) for some t 1 > 0, these determine uniquely the parameters β and γ of the system. Notice there the main assumption, that the parameters β, γ do not change in time. Proof. The first step is to notice that by assumption, β, γ constants in time yields in the first place that which in turn gives that and finally integrating this shows that (here we denote ρ = γ/β) In particular, this means that Typically the initial value of S 0 is close to 1 and I 0 is relatively small. In particular, if we assume that the epidemic ends somewhere then we definitely have I(t) = 0 and thus S(t) solves the equation In particular if we assume that I(t ∞ ) = 0 and S(t) converges as t → t ∞ , then we get in the limit that S(t ∞ ) solves (4). One consequence of this argument is that for all time 0 ≤ t ≤ t ∞ , we have that Another important consequence of this model is that if we assume S 0 and I 0 fixed (obviously R 0 will also be determined) but, for a given time t = t 1 > 0, knowing S(t 1 ) and I(t 1 ) (therefore R(t 1 ) as well), we can determine uniquely the parameters β and γ. Indeed this is clearly seen from (3) which gives On the other hand, from (3) in the first line of (2), then we obtain that The problem is that we can not integrate explicitly this to obtain an analytic expression for S(t). However, what we can still show is that by knowing I 0 , S 0 , I(t 1 ), S(t 1 ) we can determine the parameter β. As we already pointed out, we know how to determine ρ = γ/β, thus we can rewrite (5) in the form Now, for α : and notice that using this function, integrating (6), we arrive at Φ(S(t 1 )) − Φ(S(0)) = βt 1 from which it is clear that β is completely determined by S(t 1 ), S 0 , I 0 . Knowing β and ρ, we can immediately solve for γ = ρβ, thus all parameters are determined. Our next goal is to get estimates on the parameters β, γ of the SIR model. There are two basic ideas here. The first one is to train a neural network using a typical inverse problem. The second one is to use this neural network combined with the data to estimate the regimes of the parameters. In a real world the parameters do not stay constant, they change slightly and we would like to catch part of this behavior. We combine the neural network with this day by day estimate to get an indication of the regimes for β and γ. In fact, what we do is we try to detect the regimes where the parameters stay more or less constant. As we will see the regime change is confirmed by the quarantine imposed as a fighting measure against the virus. To deal with the parameter estimates, we do the following. First we discretize β by considering 200 points equally spaced in the interval [0.1; 1.5] and for γ we consider 100 points equally spaced in the interval [0.05; 0.67]. These intervals were chosen based on apriori analysis and much experimentation. Next, we solve the system of differential equations for each pair (β i , γ j ), for 50 days, for a population of 10 6 individuals, and we store the results in a dataset. We train a neural network on the resulting dataset so that the input is of the form: XT rain = (Day, #Susceptible, #Inf ected, #Recovered) or in the terminology of the previous paragraph, we have XT rain = (t, S(t), I(t), R(t)) where t = 0, 1, . . . , 50. and the output is exactly the pair Y T rain = (β, γ), which generated the solution above. We fixed the initial conditions S 0 = 1 − I 0 , I 0 = 2/N and R 0 = 0. We started with 2 infected people because there were two initial individuals who traveled in outside the country in exposed regions and were first spotted as the original spreaders. The neural network we used is of the following form Before we move on with the results of the day by day estimates, we point out that the result of Proposition 1 guarantees that the parameters estimated should be well-determined by the network as each triple (t, S(t), I(t)) determines in a unique way the parameters (β, γ). Once the model had been trained, we use it to predict the day by day β and γ for Romania Covid-19 reported numbers. What this means is that we try to predict a set of parameters such that for a given day t, what we observe is exactly the number of suspected, infected and recovered reported on that day by the officials. Therefore, we assume and try to predict a single set of parameters for the time period [0, t], t here being the corresponding day. The results are represented in the chart below. What this suggests is that we can identify three regimes. The first one is characterized by uncertainty, with a high infection spread and big variation of the two parameters values from day to day. This is approximately for the first 15 days or so. This may be due to the fact that, even if there were not many cases reported yet and the restrictions have not already been imposed, people were starting to be aware of the severity of the situation. On the other hand, the last 25 days have a lower volatility for both parameters, which can be a consequence of the measures taken by the authorities. The intermediate regime can be considered a transition between the first one and the last one. This is roughly centered on the 30th day with a period of ±10 days of regime switching. We should also commment on the fact that the data that is available shows the number of individuals that have been tested positive, but it is very likely that the real number of people infected is in fact much higher, as there are also asymptomatic individuals, people that are not being tested although they present the specific symptoms, so they are not part of the official reports. Another aspect that should be taken into consideration is that the long incubation period characteristic to this virus determines a delay between the moment when a person has been infected and the moment when that person has been tested positive. In order to reduce the effects of the above deficiencies, we consider another model below which accounts the number of deceased as separate rubric. In this process we keep in mind all the points discussed until now, an important one being that the parameters are not constants for all period, they can be at most constant on pieces, reflecting in fact the regime imposed on the population. In the sequel we propose a model in which we modify the SIR model in two different directions. The first one is to consider an interaction between the recovered and the susceptible on one hand and the other direction is to have an account of the deaths in this analysis. If we want to take into account the interaction between the recovered and the susceptible, we really need to separate the deceased ones from the recovered ones. At the end of the day the number of deaths is probably the most reliable number we can account for, as the number of infected people is wildly unknown and the number of recovered is also largely unknown. Thus we have four variables changing with time now. These are S(t), I(t), R(t) and D(t) where R(t) is the proportion of recovered and alive people while the D(t) is the proportion of deceased people. We set the interaction as follows Notice that in this setup the recovered population bifurcates into recovered ones, accounted by R and the dead ones accounted by D. The point of this model is to see how many people die in the long run from this disease. Of course this is not complete as there are other factors which should be taken into account, but we are going to use this simple model. Notice that for R taken as the sum of the two factors R + D above we fall into the classical SIR model. There are two points here for the model. One is that we separate the dead people from the recovered ones which are mixed up in the classical SIR model. We are going to manipulate these equations and reduce the computations to a single equation involving only one of these quantities, the most reliable one, namely D(t). To do this we will write all the other quantities as functions of D as follows: The easiest to deal with is R because from the last two equations we get from which we deduce that R(t) = γ 1 γ 2 (D(t) − D 0 ) + R 0 . Now, we deal with the function u from S(t) = u(D(t)). Dividing the first and the last we get which can be integrated and gives S in terms of D as On the other hand this allows us to solve for I = v(D). First we notice that from which we deduce that therefore we obtain that (as functions of D) This last deduction works in the case the parameters β, γ 1 , γ 2 are all assumed constant in time. However, if they vary with time, then, the equation is a little bit different, the main equation becomes now The main idea from here is to use the data on the death cases to estimate the coefficients involved. As we already pointed out, the number of perished people is the most reliable data, since all the other data is very rough. For instance, the proportion of people which are infected is grossly underestimated since there are probably more infected people than the reported cases tested usually in the hospitals. It is also true that even the dead numbers are probably overestimated as many people perish due to existing conditions which lead to complications which in the end makes the task of deciding the cause of death much more difficult. However this is the most reliable data we can trust. From the technical standpoint, equation (9) is not easy to handle and we will use equation (8) instead together with a regime switch and a piecewise parameter fit. In other words, we fit the number of deceased on pieces where we assume that the parameters do not change. We talked about the existence of different regimes in the spreading of the disease because of the measures that have been taken which had a significant impact on the evolution of the infection rate. We choose to apply an approach in which we have separate scenarios for the free spread period and the case of quarantine, as well as a transition period between these two scenarios. First we define the sigmoid function: σ(x, c, s) = 1 1+e s(x−c) where c models the turning point and s models how swift the transition between these two regimes is taking place. Considering 2 different regimes and a transition period between them the SIRD model becomes σ(t, c, s) )) · S · I dI dt = (β 1 · σ(t, c, s) + β 2 · (1 − σ(t, c, s))) · S · I −(γ 11 · σ(t, c, s) + γ 21 · (1 − σ(t, c, s))) · I − (γ 12 σ(t, c, s) + γ 22 (1 − σ(t, c, s)))I dR dt = (γ 11 · σ(t, c, s) + γ 21 · (1 − σ(t, c, s))) · I dD dt = (γ 12 · σ(t, c, s) + γ 22 · (1 − σ(t, c, s))) · I. where • β 1 , β 2 represent the infection rates in the first regime respectively in the second one; • γ 11 , γ 21 represent the recovery rates in the first regime respectively in the second one; • γ 12 , γ 22 represent the fatality rates in the first regime respectively in the second one With the same notations and same logic we will consider β, γ 1 , γ 2 in (8) as follows: • β = β 1 · σ(t, c, s) + β 2 · (1 − σ(t, c, s)) • γ 1 = γ 11 · σ(t, c, s) + γ 21 · (1 − σ(t, c, s)) • γ 2 = γ 12 · σ(t, c, s) + γ 22 · (1 − σ(t, c, s)) We have previously shown that we can derive all the parameters using the number of dead people, as it is the most reliable one, using equation (8). In other words we will use the above substitutions for β, γ 1 , γ 2 in equation (8) and we aim at finding the solution of the following problem: minimize β 1 , β 2 , γ 11 , γ 12 , γ 21 , γ 22 It is important to mention that we fix some of the parameters. We take as follows • c = 30 based on the moment when the recommendations were made or restrictions were imposed, when people started to be aware of the severity of this virus • s = 0.1 representing how swift was the transition between the two regimes. • n = 60 we solve the problem using the data from the first 60 days. • The optimization problem is solved by using a grid search around the average values of the β, γ obtained from the neural network on each of the regimes outlined in the section above and plotted in Figure 2 . In particular we use here the fact that γ 11 + γ 12 = γ 1 , for the first regime and similarly we have γ 21 + γ 22 = γ 2 for the second regime. We interpreted γ 11 = pγ 1 and γ 12 = (1 − p)γ 1 and thus we in fact search for p on a certain grid for the best fit in (11). We point out that the last two parameters depend on the country, as different countries have acted differently. Solving the optimum problem in these conditions we obtain: We plot here for 2 different timeframes: 60 days and 90 days. An important aspect that we have to keep in mind is that the prediction of the disease evolution can not be done accurately for a long timeframe, as the new restrictions or other measures, treatment and medication developed over the time definitely impact the model. Considering this idea, the prediction of the infectious disease evolution should always incorporate in the model the new information that arises as the time passes and we should split the timeframe in separate regimes. Starting 15th of May, the Romanian Government started to gradually relax the COVID-19 restrictions and on the 15th of June the secound round of relaxations were issued. During this time, various restrictions have been lifted and a new package of relaxation measures has been released every two weeks. In order to generate reliable predictions, we now have to take into consideration the effect of lifting the restrictions and to adapt the model to this new regime. Therefore, we analyze how the measures of relaxation have impacted the parameters of the model and we generate a new prediction. We expect the number of deaths to grow. With the same logic that we used before we consider these new parameters: • β 3 represents the infection rate in the third regime • γ 31 represents the recovery rate in the third regime • γ 32 represents the fatality rate in the third regime; Now, in order to make a new prediction, we will consider β, γ 1 , γ 2 in (8) as follows: • β = [β 1 · σ(t 1 , c 1 , s 1 ) + β 2 · (1 − σ(t 1 , c 1 , s 1 ))] · σ(t 2 , c 2 , s 2 ) + β 3 · (1 − σ(t 2 , c 2 , s 2 )) • γ 2 = [γ 12 · σ(t 1 , c 1 , s 1 ) + γ 22 · (1 − σ(t 1 , c 1 , s 1 ))] · σ(t 2 , c 2 , s 2 ) + γ 32 · (1 − σ(t 2 , c 2 , s 2 )) • c 2 = 115 corresponding to 15th of June when the last round of relaxations were put in place, in particular the opening of restaurants, bars, hotels and pools were regulated. With this we take s 2 = .22 which corresponds to a switch of approximately 5 days from one regime to the other. Solving the equation with the third regime included we obtain the following prediction: This is the prediction based on the three regimes. The confidence intervals shown in blue are based on = 10%, 15%, 20% adjustment done as follows. With the parameters from above we constructed a grid around the mean values of β 3 and γ 3 and took the range of (1± )β 3 , (1± )γ 3 and split this into a mesh with 10 values in each range. Furthermore, we evaluate the number of deaths for each such combination and arrange them according to difference from the number of deaths predicted with β 3 , γ 3 . Then we take the values for which the number of deaths is the most extreme at the end of the period, both underestimated and overestimated. This will give our (β 3,max / min , γ 3,max / min ). These will give the boundary curves for each = 10%, 15%, 20%. As a note, for this training we kept the ratio γ 31 /γ 32 as constant. The left figure is predicting the evolution until 1st of August based on the training till 15th of July, while the second figure gives a prediction until 1st of September. On the new regime, the model predicts that in 30 days (with the fitting based on 141 days, which is up until July 15th), the number of deaths will reach 2318 deaths on 1 st of August (the 160th day), 2478 deaths by Aug 15th and 2600 deaths by Sep 1 st and the corresponding confidence intervals outlined above. It is worth comparing the previous scenario, when we had only two regimes, with the new situation when we have three such regimes: The two graphs above show that the partial lifting of the restrictions will have a negative impact on the number of deaths caused by COVID-19. In the initial approach the number of predicted deaths on 1 st of August would have been 1370 while in the second approach which includes the relaxation period, the predicted number of deaths increases significantly, to 2318. We can easily notice that the first approach having one regime when the pandemic starts and one regime when restrictions are imposed, seems to understand the evolution of the epidemic better. Even so, considering the social and economical aspects and the behavior of the population and governments in the affected countries, we can deduce that the approach which adds the relaxation regime fits better the real situation. We use now the same parameters for SIRD model and in Figure 6 is the plot of the evolution of all the main categories. Figure 6 : The predicted evolution using the SIRD model with three different regimes. The estimates of the main categories for the period Feb 26 -Sep 1 st , which amounts for 190 days. Probably the most interesting part is that by then 80% of the population has been exposed to the virus already. We should note that part of assumptions here is the fact that some form of immunity is built up and the reinfection is not taken into account. Now we summarize what we did here. The main idea is that within the models we used, be it SIR or more realistic SIRD, we split the problem according to various regimes. In this paper we take three regimes. One is the regime before any measures were taken. The second regime is the one in which the quarantine was imposed on the population. We also model the transition from one regime to another. The third regime we consider is the one following the relaxation. The transition is also modeled with the help of logistic function. The fit is done using the number of deaths and using the SIRD model. The search of the parameters is done around the values of β provided by the neural network constructed based on the simpler SIR model. We believe that this methodology is a general one and can be extended to any country provided that we have data, in particular some information about the regime switch for each of the regimes. As a disclaimer, there are several assumptions made here. One of them is that people build up immunity to this virus and the reinfections are negligible. Mathematical modeling of infectious diseases dynamics, Encyclopedia of infectious diseases: modern methodologies Mathematical models of infectious disease transmission Contributions to the mathematical theory of epidemics. iii.further studies of the problem of endemicity Contributions to the mathematical theory of epidemicsi Contributions to the mathematical theory of epidemicsii. the problem of endemicity Early dynamics of transmission and control of covid-19: a mathematical modelling study The prevention of malaria Assessment of 21 days lockdown effect in some states and overall india: a predictive mathematical study on covid-19 outbreak Covid-19 predictions using a gauss model Modeling, state estimation, and optimal control for the us covid-19 outbreak Spatio-temporal analysis of surveillance data, Handbook of Infectious Disease Data Analysis