key: cord-0487787-iouc1bbx
authors: Benavides, Efren M.
title: Robust predictive model for Carriers, Infections and Recoveries (CIR): predicting death rates for CoVid-19 in Spain
date: 2020-03-31
journal: nan
DOI: nan
sha: 9ea311bcfd2eeec8ebfb4caa6a00ceef5f420089
doc_id: 487787
cord_uid: iouc1bbx

This article presents a new model to predict the evolution of infectious diseases under uncertainty or low-quality information, as happened in the initial phase of the CoVid-19 spread in China and Europe. The model has been used to predict the death rate in Spain but can be used to predict the demand for ICUs or mechanical ventilators under different restraint policies. The main novelty of the model is that it keeps track of the date of infection of a single individual and uses stochastic distributions to aggregate individuals who share the same date of infection. In addition, it uses two types of infections, mild and serious, with different recovery times. These features are implemented in a set of differential equations which determine the number of Carriers, Infections, Recoveries, Hospitalized and Deaths. Comparison with real data shows good agreement.

Since the end of January, CoVid-19 has spread through Europe, generating an unprecedented medical collapse, first in Italy and then in Spain. Medical staff and services (especially Intensive Care Units, ICUs), mortuary services, healthcare suppliers, etc. all appear overwhelmed by the number of new patients and deaths per day: current predictive models seem to have underestimated the rate of infection. This article presents a new model to predict the evolution of infectious diseases under uncertainty or low-quality information, as happened in the initial phase of the spread of CoVid-19 in China and Europe. The model uses a low number of input parameters and stochastic distributions.
It is expected that this will provide the model with the robustness required to predict the demand on services accurately. In particular, the model has been used to predict deaths, but it can easily be modified to predict the demand for ICUs or mechanical ventilators under different restraint policies. To achieve this goal, the model implements the following four key characteristics:

For the previous reasons, the mathematical structure of the proposed model, a Carriers-Infections-Recoveries (CIR) model, departs significantly from that of a Susceptible-Infectious-Recovered (SIR) model. The kernel of the presented model consists of four differential equations: two for Infections and two for Recoveries. The kernel can be complemented with additional differential equations, one for each healthcare event to be predicted. In the case presented in this article two additional differential equations are implemented: one for predicting the input rate of medical services, say 'hospitalization'; the other for predicting the output rate, which includes recovery and death. This new set of six differential equations has been used to predict the death rate in Spain, showing, to date, excellent agreement with the reported numbers. One important fact is that only one set of input parameters has been used for the prediction: the input parameters are the same since the onset of the infection and hence no readjustment of the parameters has been required so far to fit the results. For the case studied, and at the time of this publication, the proposed model seems to have enough predictive accuracy.

Let $h(\Delta t, \tau)$ be the probability that an infected individual stops being contagious exactly at time $t$, after living with the virus for a period of time $\Delta t$; $\tau$ is a parameter of the distribution, which is discussed in Appendix A (note that, if required, the distribution could accommodate more parameters or be replaced by another one).
Here, 'stopping being contagious' means reaching immunity or death, either of which produces a barrier to propagation. According to the discussion in Appendix B, I suppose that there are two groups of contagious individuals: mild (type 1) and serious (type 2). The difference is that type 2 can recover or die, whereas type 1 can only recover. Each type follows a different probability distribution, $h(\Delta t, \tau_i)$, where $i \in \{1,2\}$ identifies the type (note that, if necessary, the model can easily accommodate more types).

(1)

One of the main novelties of the model comes from assuming that infected people need different periods of time to become recovered (immune or dead), and for this reason they are labelled with their date of infection $t_0$. Indeed, assume that $\dot{I}_i(t_0)$ is the number of people (per unit time) who were infected exactly at the instant $t_0$ and that these people have lived with the illness for a period of time $\Delta t = t - t_0$ to reach the current date $t$; then the number of such people who become recovered is $h(\Delta t, \tau_i)\,\dot{I}_i(t_0)$. Since $t_0$ can be any past time, the total number of new recovered people at time $t$ is $\int h(\Delta t, \tau_i)\,\dot{I}_i(t_0)\,dt_0$. Let $dR_i(t)$ be that number, i.e., the total number of new recoveries (immunity or death) generated from type $i$ at time $t$ during the period $dt$; then

$$\frac{dR_i}{dt} = \int_0^{t} h(t - t_0, \tau_i)\,\dot{I}_i(t_0)\,dt_0 \qquad (2)$$

Equations (1) and (2) allow us to find $I_i$ and $R_i$ with $i \in \{1,2\}$ for a set of initial conditions. In this paper it is supposed that the infection starts with one mild infection, that is, with the following initial conditions: $I_1(0) = 1$, $I_2(0) = 0$ and $R_i(0) = 0$. Note that, in this model, $I_i$ and $R_i$ are, respectively, the total number of infections and the total number of recoveries which come from each type. By definition, the carriers (contagious people) are obtained by subtracting the recoveries from the infections. Thus, calling $C_i(t)$ the number of carriers of type $i$ at time $t$, we have that

$$C_i(t) = I_i(t) - R_i(t) \qquad (3)$$

Note that equations (1) to (3) make a huge difference with respect to the SIR model. This is because the convolution in eq. (2), which comes from keeping track of the date of infection, forces us to separate the number of contagious individuals from the number of infections and, hence, the number of carriers replaces the SIR model's number of infectious individuals. For this reason, the present model is a CIR model where the total number of carriers is

$$C(t) = C_1(t) + C_2(t) \qquad (4)$$

the total number of infections is

$$I(t) = I_1(t) + I_2(t) \qquad (5)$$

and the total number of recoveries is

$$R(t) = R_1(t) + R_2(t) \qquad (6)$$

The number of new infections is proportional to the number of free carriers, who are those carriers whose mobility has not been restricted: $C(t) - Q(t)$. Here $Q(t)$ is the number of carriers who are isolated in hospitals or who keep themselves at home. I suppose that they are a fraction $q$ of the serious (type 2) carriers: $Q(t) = q\,C_2(t)$. The number of new infections is also proportional to the frequency $f_i(t)$ with which a carrier meets people who are susceptible to being infected as type $i$. Therefore, finally, the rate of infections is given by

$$\frac{dI_i}{dt} = f_i(t)\,\left[C(t) - Q(t)\right] \qquad (7)$$

The function $f_i(t)$ includes (a) the average frequency $\nu$, which is the average number of persons whom an average person meets per day; (b) the factor $\varepsilon$, which measures the average success of contagion; (c) the factor $p_i$, which is the average fraction of infections that will be of type $i \in \{1,2\}$, where $p_2 = 1 - p_1$ is the risk fraction; and (d) the fraction of susceptible people. Let $N$ be the total susceptible (available) population. Since carriers and recoveries are not susceptible to being infected, the susceptible people are $S(t) = N - C(t) - R(t)$ and, using eq. (3), this becomes $S(t) = N - I(t)$. Thus, the fraction of susceptible people is $1 - I(t)/N$. The previous considerations let us write

$$f_i(t) = \nu\,\varepsilon\,p_i\left[1 - \frac{I(t)}{N}\right] \qquad (8)$$

It is convenient to define the following fractions

Using these fractions and placing equations (7) and (8) into (1) and (2) we reach the four differential equations that determine the temporal evolution of the infections and recoveries of each type $i$:

This system of four differential equations is highly non-linear and very different from the SIR models normally used.
Appendix C shows that, for small fractions and only one type of contagious people, the system has a stationary solution when time tends to infinity.

Type-2 carriers are seriously infected people who, after the incubation period, will use medical services (hospitals, ICUs, etc.). These 'hospitalized' patients will leave the medical services after a period of time, either because they have died or because they have recovered enough. Let $h(\Delta t, \tau_{\mathrm{in}})$ be the probability that an infected individual enters a medical service exactly at time $t$, after living with the virus for a period of time $\Delta t$. Let $h(\Delta t, \tau_{\mathrm{out}})$ be the probability that an infected individual leaves a medical service exactly at time $t$, after being in the hospital for a period of time $\Delta t$. Both $\tau_{\mathrm{in}}$ and $\tau_{\mathrm{out}}$ are parameters which are discussed in Appendix A. Therefore, following an argument similar to the one given for equation (2), the fractions of inputs and outputs are calculated by the following two differential equations

Equation (16) gives the number of people using the medical service at any time. In addition, it is supposed that deaths are a constant fraction of the people leaving the hospital, so that

The set of six differential equations given by (12) to (15) (as well as the convolution integrals inside them) has been numerically integrated with a low-order integrator (Euler) using a time step of 0.4 days. More precision and numerical stability can be obtained using high-order integration methods and a smaller step, but this has not been necessary so far.

Appendix B presents an estimation of the following input parameters for the model: $\tau_{\mathrm{in}} = 3.10$ days, $\tau_{\mathrm{out}} = 11.36$ days, $\tau_1 = 6.72$ days, $\tau_2 = 13.92$ days, a death fraction of 0.283 and a contagion success $\varepsilon = 0.165$. However, there is no estimation for $N$, $q$, $\nu$ and $p_2$. In addition, the date of the first infection is also unknown (Appendix B gives a plausible range).
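As a rough illustration of the numerical scheme described above, the following Python sketch integrates the kernel of the model (infections and recoveries of each type) with an explicit Euler step of 0.4 days, evaluating the recovery convolution as a discrete sum over past infection dates. It is only a sketch under stated assumptions: `delay_density` is a stand-in gamma density of shape 2, not the distribution of Appendix A; the names `simulate_cir`, `contact`, `success`, `risk` and `isolated` are hypothetical; the defaults for `contact` and `isolated` are illustrative values, while the population, contagion success, risk fraction and recovery parameters are taken from the values quoted in the text; the hospitalization and death equations and the confinement step are omitted.

```python
import math


def delay_density(dt, tau):
    # Stand-in delay density (gamma, shape 2, scale tau). This is an
    # assumption for illustration, NOT the distribution of Appendix A.
    return (dt / tau ** 2) * math.exp(-dt / tau)


def simulate_cir(days, step=0.4, n_pop=283_300.0, contact=3.0, success=0.165,
                 risk=0.167, isolated=0.5, taus=(6.72, 13.92)):
    """Explicit Euler integration of infections and recoveries of types 1 and 2.

    Returns a list of (time, total infections, total recoveries) tuples.
    """
    n_steps = int(round(days / step))
    shares = (1.0 - risk, risk)   # fraction of new infections of each type
    new_inf = [[1.0], [0.0]]      # infections per step; one mild case at t = 0
    infected = [1.0, 0.0]         # cumulative infections of each type
    recovered = [0.0, 0.0]        # cumulative recoveries of each type
    history = []
    for k in range(1, n_steps + 1):
        # Carriers are infections minus recoveries; isolated serious
        # carriers do not take part in new contagions.
        carriers = [infected[i] - recovered[i] for i in range(2)]
        free = carriers[0] + (1.0 - isolated) * carriers[1]
        susceptible = 1.0 - sum(infected) / n_pop
        for i in range(2):
            # Convolution over infection dates: the cohort infected at
            # step j recovers now with density h(t_k - t_j, tau_i).
            rec = step * sum(
                delay_density((k - j) * step, taus[i]) * new_inf[i][j]
                for j in range(k)
            )
            inc = step * contact * success * shares[i] * susceptible * free
            recovered[i] += rec
            infected[i] += inc
            new_inf[i].append(inc)
        history.append((k * step, sum(infected), sum(recovered)))
    return history
```

Calling `simulate_cir(120)` returns one tuple per time step; the number of carriers at each step follows as infections minus recoveries. Storing the infections of every step is what distinguishes this scheme from an SIR integrator, which only needs the current state.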
To estimate these unknowns, an optimization process was launched to minimize the difference between the calculated data and the real data (on a logarithmic scale) for three starting dates: beginning with a distant date (scenario 1) and ending with a recent date (scenario 3). The values of the parameters $N$, $q$, $\nu$ and $p_2$ which minimize the error for each scenario are collected in Table 1. In all scenarios, the restriction of movements imposed by the government has been taken into account by a tenfold reduction of $\nu$, that is, the value of $\nu$ before the day of confinement (March 15th, one day after its official publication) is the one given in Table 1 and it is $\nu/10$ after that day.

The errors reached in the three scenarios are almost identical, showing that this error cannot be used to fix the date of the first infection. Indeed, the values of $p_2$ and $N$ and this date are strongly correlated: fixing one of them largely fixes the others. The more recent the date of the first infection, the larger the value of $p_2$. For example, the assumption of March 16th as the initial date leads to $p_2 > 1$, which is not possible, and hence this date must be discarded.

In all scenarios the confinement has reduced the impact on the medical services because of the reduction of the susceptible population (note that the reduction of $\nu$ becomes effective after the day the carriers reach their maximum: the value of $\nu$ has little effect if there are no new susceptible people to be infected). However, scenarios 2 and 3 have a lower number of recoveries than scenario 1 and therefore almost all the Spanish population will be susceptible to contagion on the day mobility is restored. If such is the case, an effective practice would be to test as many people as possible so that the mobility of infected individuals can be restricted as soon as possible.

Results show, as expected, that the susceptible population to be exposed is not the whole population of Spain. This means that there are regions of Spain (towns, small cities, etc.) which are not accessible to the contagion. Even in cities like Madrid and Barcelona there could be isolated areas with negligible exposure to the contagion. Obviously, the longer the time available for viral spreading, the greater the exposed population. This explains the differences in Table 1. Discovering empirically which scenario is the real one would require conducting tests over a significant part of the population. Without this information we can only make a conjecture about the most plausible scenario. This conjecture comes from comparing the value of $p_2$ (note that $q$ and $\nu$ are very similar). Indeed, $p_2$ changes by two orders of magnitude following the change of the susceptible population from 13 to 0.13 million. The risk fraction of 0.00342 is very far from the values 0.143 and 0.5 estimated as upper bounds in Appendix B, whereas the risk fraction of 0.369 is in that range. The conclusion is that scenario 3 is more plausible than scenario 1 and that the initial date will probably be between January 31st and February 5th. Assuming $p_2 = 0.167$, the minimization of the error leads to February 3rd, $N = 0.2833$ million, $q = 0.550$, $\nu = 2.95$, $R(\infty) = 0.2811$ million and $D(\infty) = 13280$.

The solution of the differential equations for this set of input parameters is shown in Figure 1, where the agreement is notably good. Note that the reported active cases for mild and serious cases have not been used to fit any parameter, since their definitions are not clear (are the infected who stay at home counted as mild cases?) and probably do not coincide with the ones given in this article. At the time of writing this article, the number of active cases in Spain was 80110 [1], whereas the final number calculated for Figure 1 is 281100, and it can be much higher if the initial date is moved further back. For this reason, only deaths have been used for fitting purposes: the uncertainty in reported deaths, although not zero, is lower.
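The fitting criterion used above (a difference between calculated and reported deaths measured on a logarithmic scale) can be sketched as follows. The text does not state the exact error norm or the optimizer, so the root-mean-square difference of base-10 logarithms and the plain search over candidates used here are assumptions; `log_scale_error`, `best_candidate` and `predict` are hypothetical names, and the data in the usage lines are synthetic, not reported counts.

```python
import math


def log_scale_error(predicted, observed):
    """Root-mean-square difference of log10 values (assumed error norm).

    Pairs in which either value is not positive are skipped, since the
    logarithm is undefined for them.
    """
    pairs = [(p, o) for p, o in zip(predicted, observed) if p > 0 and o > 0]
    if not pairs:
        raise ValueError("no pair of positive values to compare")
    total = sum((math.log10(p) - math.log10(o)) ** 2 for p, o in pairs)
    return math.sqrt(total / len(pairs))


def best_candidate(candidates, observed, predict):
    """Return the candidate whose predicted curve has the smallest log error."""
    return min(candidates, key=lambda c: log_scale_error(predict(c), observed))


# Synthetic example: recover the growth factor of an exponential curve.
observed = [10.0 * 1.3 ** d for d in range(10)]
predict = lambda growth: [10.0 * growth ** d for d in range(10)]
print(best_candidate([1.1, 1.2, 1.3, 1.4], observed, predict))  # prints 1.3
```

Measuring the error on a logarithmic scale weights the early, small counts as much as the late, large ones, which matters when the curve spans several orders of magnitude.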
To illustrate how strongly the calculated number of infections depends on the assumed initial date, Figure 2 shows the calculated curves for the less probable case (i.e., scenario 1 in Table 1). As can be seen, the agreement is as good as the one obtained for the case of Figure 1. However, the difference in the number of final recoveries is huge: 13 million (nearly 27% of the Spanish population) in the less probable case and 0.2811 million (nearly 0.6%) in the other case.

A new predictive model based on differential equations and convolutions has been described and used to estimate the death rate by CoVid-19 in Spain. It has been shown that 1) only one set of parameters is required to obtain a prediction over a full curve; 2) there is a strong dependence between the date of the first infection, the susceptible population and the risk fraction, which allows that date to be estimated; 3) the model can be used to estimate the number of susceptible people in near-future massive infections; and 4) it can be used to estimate the demand for hospital services and the effect of different governmental actions.

Let $h(\Delta t, \tau)$ be the probability that an infected individual suffers a given event (for example, developing symptoms, leaving the ICU, leaving the hospital, recovering or passing away, etc.) exactly at time $t$, after having been infected for a time $\Delta t$. It is plausible to assume that it follows a general distribution of the form (note that it could be replaced by any other distribution without changing the model):

It is convenient to use a new dimensionless function and a new dimensionless independent variable defined as follows

A Gaussian-like distribution has = 2. A stochastic distribution with enough uncertainty has 2 = 2 , so that, = 6.484478437, 〈∆ 〉 = 1.871119609 , = 1.800621898 and

Reference [2] reports statistics over 191 patients, of whom 137 were discharged and 54 died in hospital. This means that 28.3% of patients died (26% required ICU), which leads to a death fraction of 0.283.
We use these data to obtain the following parameters for the serious infection, that is, for type 2. However, this value should be treated as an upper limit, since we do not know whether there were undiagnosed cases.

Reference [3] reports that a Spanish woman who was infected on February 29th began to show symptoms on March 5th, that is, 5 or 6 days after the infection. She recovered completely on March 12th, that is, 12 or 13 days after the infection. Her CoVid-19 test was positive on March 10th. She did not transmit the virus to any of her relatives nor to any of the 11 people whom she met before knowing she was infected. We can use this case to estimate the parameter for mild infections, that is, for type 1: a period of 13 days gives $\tau_1 = 6.72$ days. We can also use this case to estimate the parameter for developing symptoms and entering the medical service:

Reference [4] reports that a party held at the end of February in Spain brought together 80 (apparently healthy) people. The result was 14 infections, of whom 7 were hospitalized and 1 was admitted to the ICU. He was admitted to the ICU on March 10th, nearly 12 days after the party; this means that, for type 2, a delay of 12 days gives a parameter of 6.20 days, which is very similar to the result obtained previously from reference [3]. It is relevant to remark that this person was healthy, athletic, non-smoking and in his 50s; thus we conclude that the whole population is susceptible to falling into type 2. However, only 7 out of 14 were seriously affected and 1 out of 14 was very seriously affected. This sets a rough estimation for the fraction of type 2 infections, $p_2 \approx 0.5$, but note that this number should be taken as an upper limit because there could be undetected carriers. In addition, it seems plausible that, at that party, there was only one initial carrier of type 1 and hence the success of the contagion can be estimated as $\varepsilon = 13/79 = 16.5\%$.

Reference [5] reports that the first case in Spain (on the island of La Gomera) was confirmed on January 31st, 2020.
The infected person had mild symptoms and was discharged on February 14th. The contagion came from a German person who had been diagnosed. This case, and many others like the one reported in reference [3], make the hypothesis of two types of infection, mild and serious, very plausible. The first case on the Iberian Peninsula (in Catalonia) was a 36-year-old woman whose infection was confirmed on February 26th. Supposedly, she was exposed to the virus in the north of Italy (Milan and Bergamo) between the 12th and the 22nd of February. Taking into account an incubation period of 6 days, we have two rough estimations for the initial date: January 25th and February 23rd. Almost certainly, there were other infections which were not detected because they lay in the mild-condition group (note that this group includes even infected people with no symptoms) and hence the initial infection could have happened, as a first guess, near February 5th. These data have great uncertainty and must be the object of further discussion.

[2] Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
[3] Alejandra, 20 años, curada del Covid-19: No me sentía mal y seguía quedando
[4] Ingresados cuatro miembros de una misma familia sevillana por coronavirus, uno de ellos en UCI
[5] ¿Cuál fue el primer caso de coronavirus en España y en la península?

Assuming that the fraction of carriers is small and that there is only one type of infection, equations (11) and (12) lead to