key: cord-0661889-6nejnjxp authors: Yanev, Nikolay M.; Stoimenova, Vessela K.; Atanasov, Dimitar V. title: Branching stochastic processes as models of Covid-19 epidemic development date: 2020-04-29 journal: nan DOI: nan sha: c2ba6d6a03e72f774e21ada55f8f7ced14c8183f doc_id: 661889 cord_uid: 6nejnjxp The aim of the paper is to describe two models of Covid-19 infection dynamics. For this purpose a special class of branching processes with two types of individuals is considered. These models are intended to use only the observed daily statistics to estimate the main parameter of the infection and to give a prediction of the mean value of the non-observed population of the infected individuals. Similar problems are considered also in the case when the processes admit an immigration component. This is a serious advantage in comparison with other more complicated models where the officially reported data are not sufficient for estimation of the model parameters. In this way the specific development of the Covid-19 epidemics is considered also for all countries as it is given in the specially created site http://ir-statistics.net/covid-19 where the obtained results are updated daily. infection. In fact this parameter m represents the mean value of the infected individuals by one individual per day. Using the observed statistics some methods for estimation are proposed and corresponding graphics are presented. Two models with and without immigration are compared. In this way we are able to give a prediction of the possible development of the mean value of the infected individuals. Notice that both type processes with or without immigration have an exponential growth in the supercritical case m > 1 but in the critical case m = 1 and in the subcritical case m < 1 the asymptotic behaviour is essentially different. In the critical case the mean value of the process with immigration grows linearly while for the process without immigration the mean value is constant. In the subcritical case the mean value of the immigration process converges to a positive constant but for the process without immigration the mean value goes to zero. The estimation of the immigration mean a is a serious problem because this parameter cannot be estimated by the observed data and one needs additional information. Hence the processes with immigration need more careful investigation. As it is given in the paper the proposed estimators can be applied also in the case when the processes are inhomogeneous in time. The behaviour of the estimators shows that the observed processes are able to change the criticality during the development of the epidemics. The estimated values for the main epidemical parameter m vary greater than 1, equal (or very closed) to 1 and even less than 1. Moreover it seems that the real epidemic process develops like a mixture of both type of the models with and without immigration. In this case four stages of epidemic development are available: exponential growth (m > 1), linear growth (m = 1 and an immigration component), non-increasing and almost stable population (due to m = 1 or m < 1 with an immigration component), convergence to zero (extinction of the epidemics due to m < 1 without immigration). It is obvious that the restriction of the immigration component is very important to the limitation of the epidemic process. It seems that in some countries exist some regions which can be considered as immigration sources for the other regions and in general for the whole country. The abroad immigration plays also an important role. The paper continues the investigations started in [16] where some results for the model without immigration were presented only for Bulgaria, Italy and globally. In the present paper as an illustration of the models with and without immigration the obtained results are presented for several countries: USA, Italy, France, Germany, Spain and Bulgaria. Additional information, reports and plots, related to this research for all countries all over the world can be found on the site http://ir-statistics.net/covid-19. The data used for the estimation of the parameters of the model are taken from European Centre for Disease Prevention and Control [21] , similar to the data provided by World Health Organization [20] . Since these databases are updated daily, the proposed here model is applied regularly on each new data set. Using these results one can compare the infection rate on different countries and regions on the basis of es-timated growth rate. For example, on Table 3 , the 10 countries with lowest and highest growth rate are shown. Even in the cases where the infection growth is less than 1, the confidence interval goes above 1, which states that there is a possibility for increasing of the infection growth in the future. The theoretical model based on two type branching process is described in detail in Section 2. The p.g.f.'s and the mathematical expectations are obtained. Regardless of its simplicity the model has a great advantage using only the observed official data for the lab-confirmed cases. The two-type branching process assuming an additional immigration component is considered in Section 3. The estimation problems are presented in Section 4. Some conclusive remarks are given in Section 5. Finally the estimation of the mean parameter of infection can be considered as a fast test to estimate the rate of Covid-19 epidemic in a country or a region. It can be used also as a first stage of a construction to some more complicated epidemiological models where it is not possible to estimate directly this parameter. Obviously the solution of this problem requires the collaboration of specialists in various fields as epidemiology, mathematics, medicine, microbiology, molecular biology and informatics among others. 2 Two-type branching process as model of Covid-19 population dynamics. Assume that the epidemic process of infection begins with some finite number of immigrants and then the process of immigration is isolated under the quarantine. To describe this situation we can consider a two type branching process {Z 1 (n), Z 2 (n)} where type T 1 are infected (but still healthy) individuals who don't know that they are Covid-19 infected and type T 2 of discovered with Covid-19 virus individuals (and this is the data we use). Every individual of type T 1 (infected) produces per day a random number of new individuals of type T 1 (infected) or only one individual of type T 2 (more precisely, in this case the individual type T 1 is transformed into an individual type T 2 ). Note that T 2 is a final type, i.e. the individuals of this type don't take part in the further evolution of the process because they are isolated under the quarantine. Let 2 ) be the offspring vector of type T 1 . Then the offspring joint probability generating function (p.g.f.) of type T 1 can be defined as follows: Obviously h 2 (s 1 , s 2 ) ≡ 1 because the type T 2 has (0, 0) offspring. Note that p 0 is the probability that type T 1 goes out of the reproduction process (the individual becomes healthy or goes out of the country, i.e. emigrates), p j is the probability to produce new j infected individuals of type T 1 and q is the probability that the individual type T 1 is confirmed ill (or dead). In other words, q = P{T 1 → T 2 }, i.e. with probability q an individual of type T 1 is transformed into an individual of type T 2 . Then from (1) we can obtain also that the marginal p.g.f. are If we assume that Z 1 (0) > 0 and Z 2 (0) = 0 then for n = 1, 2, ... 2 (n; j)} are independent and identically distributed (iid) as (ξ 2 ). The recurrent formula (4) defines two-type branching process {(Z 1 (n), Z 2 (n)), n = 0, 1, 2, ...}. Notice that Z 1 (n) is the total number of individuals (type T 1 ) in the n-th day infected by the individuals of the (n − 1)-th day; Z 2 (n) is the total number of the registered Covid-19 individuals (type T 2 ) in the n-th day. The process starts with is the number of individuals of type T 1 in the n-th day infected by the j-th individual of type T 1 from the (n − 1)-th day, j = 1, 2, ..., Z 1 (n − 1). Similarly the random variable ξ (1) 2 (n; j) is the number of the confirmed infected individuals (type T 2 ) in the n-th day transformed by the j-th infected individual type T 1 from the (n − 1)-th day, j = 1, 2, ..., Z 1 (n − 1). Note that P{ξ .., l; l = 0, 1, 2, ... In other words the probability q can be interpreted as a proportion of the confirmed individuals in the day n among all infected individuals in the day n − 1. Let h 0 (s) = Es Z1(0) , F 1 (n; s) = E(s Z1(n) ), F 2 (n; s) = E(s Z2(n) ). Introduce the following p.g.f. Then it is not difficult to check that for n = 0, 1, 2, ..., we are able to obtain the p.g.f. of the process: Notice that the asymptotic behaviour of the process depends essentially of parameter m. Especially, if m > 1 (supercritical case) then the mean value of the infected individuals M 1 (n) grows exponentially, in the critical case m = 1 it is a constant and for m < 1 (subcritical case) M 1 (n) → 0 as n → ∞. We will use these results to present in the next section a more complicated model with immigration. Note that we can observe only Z 2 (1), Z 2 (2), ..., Z 2 (n). What can be estimated with these observations? Note first that EZ2(n+1) EZ2(n) = m. Hence we can consider (11) m n = Z2(n+1) Z2(n) , n = 1, 2, ...., as an estimator of the parameter m (similar to Lotka-Nagaev estimator for the classical BGW branching process). It is possible to use also the following Harris type estimator (12) m n = n+1 i=2 Z 2 (i)/ n j=1 Z 2 (j), n = 1, 2, ...., or Crump and Hove type estimators (13) m [12] for more details. Estimating m we are able to predict the mean value of the infected (non observed) individuals in the population. In the case when we assume that Z 1 (0) = 1 then M 1 (n) = EZ 1 (n) can be approximated respectively by m n n , or m n n , or m n n,N . In fact it means that we can obtain three types of estimators (14) M 1 (n) = m n n , M 1 (n) = m n n and M 1 (n) = m n n,N . In other words we could say that we have at least three prognostic lines. Therefore if we have the observations (Z 2 (1), Z 2 (2), ..., Z 2 (n)) over the first n days, we are able to predict the mean value of the infected individuals for the next k days by the relations: (15) M 1 (n+k) = m n+k n , M 1 (n+k) = m n+k n and M 1 (n+k) = m n+k n,N , k = 1, 2, ... We are able to estimate also the proportion α(n) of the registered infected individuals among the population in the n-th day. Then we can obtain the following three types of estimators: α(n) = Z 2 (n)/{Z 2 (n) + M 1 (n)}, α(n) = Z 2 (n)/{Z 2 (n) + M 1 (n)}, α(n) = Z 2 (n)/{Z 2 (n) + M 1 (n)}. All obtained estimators will be presented by the observed registered labconfirmed cases. The quality of the estimation, however, depends on the representativeness of the sample due to the specifics of the data collection in each country. Remark 1. In fact our model (4) can be generalized as non-homogeneous in time. In this case m(l) = d ds h * (l; s)| s=1 = Eξ Notice that in this case EZ2(n+1) EZ2(n) = m(n). Hence we can use (11) − (13) to estimate M 1 (n) from (16). Therefore (17) M 1 (n) = Π n l=1 m l , M 1 (n) = Π n l=1 m l , n = 1, 2, .. 3. The two-type branching process with immigration as model of Covid-19 epidemic development. The model considered in Section 2 assumes that the process of infection begins with some random number of infected immigrants and then the immigration process is bounded and it is not essential for the Covid-19 population dynamics. But in some cases the role of the immigration process cannot be negligible. That is why we will introduce random variables {I n }, where I n gives the number of infected immigrants in the n − th day which take part in the process of infection. We will assume first that {I n } are iid r.v. with a p.g.f. (18) g(s) = Es In = l k=0 g k s k , |s| ≤ 1. Then instead of (4) we will consider the following branching process with immigration 2 (n; j)} are independent and identically distributed as (ξ 2 ) with p.g.f. (1) − (3) and they are also independent of {I n }. We can assume that Y 1 (0) > 0 is some random variable independent of 2 (n; j)} and {I n }, and also the case Y 2 (0) = 0. Another possible assumption is Y 1 (0) = Y 2 (0) = 0, which means that in fact the process starts with the first real immigrants. Interpretation: Y 1 (n) is the total number of individuals (type T 1 ) in the n-th day infected by the individuals of the (n − 1)-th day plus the new infected immigrants I n ; Y 2 (n) is the total number of the officially registered infected individuals (type T 2 ) in the n-th day. Then it is not difficult to check that for n = 0, 1, 2, ..., we are able to obtain from (18) and (19) the p.g.f.'s of the process: G 1 (n; s) = E(s Y1(n) ) = g(s)G 1 (n − 1; h * (s)) = f (h * n (s))Π n−1 k=0 g(h * k (s)), G 2 (n; s) = E(s Y2(n) ) = G 1 (n − 1; h(s)) = g( h(s))G 1 (n − 2; h * ( h(s))) = f (h * n−2 ( h(s)))Π n−1 k=0 g(h * k ( h(s))), where the p.g.f. h * k (s) and h k (s) are obtained after k iterations of the p.g.f. h * (s) and h(s) as it is given in (8) and f (s) = Es Y1(0) = N k=0 f k s k , |s| ≤ 1. Notice that if we assume that Y 1 (0) = 0 then f (s) ≡ 1 and (20) G 1 (n; s) = Π n−1 k=0 g(h * k (s)), G 2 (n; s) = Π n−2 k=0 g(h * k ( h(s))). From (18) we can introduce the immigration mean a = EI n = d ds g(s)| s=1 = l k=0 kg k . Then from (18)−(20) it is not difficult to obtain that for n = 1, 2, ... where it is assumed that A 1 (0) = EY 1 (0) = 0 and the parameters m and q are well defined in Section 2 by (5). Hence from (21) and (22) one has (23) A 1 (n) = a(m n − 1)/(m − 1), m = 1, and A 1 (n) = an, m = 1, (24) A 2 (n) = qa(m n−1 − 1)/(m − 1), m = 1, and A 2 (n) = qa(n − 1), m = 1 Therefore by (23) and (24) one obtains as n → ∞ A 1 (n) ∼ am n /(m − 1), m > 1; A 1 (n) = an, m = 1; A 1 (n) → a/(1 − m), m < 1, A 2 (n) ∼ qam n−1 /(m − 1), m > 1; A 2 (n) ∼ qan, m = 1; A 1 (n) → qa 1−m , m < 1. In the general case A 1 (0) = EY 1 (0) = M 0 > 0 and instead of (19) and (20) one has (25) A 1 (n) = M 0 m n + a n−1 k=0 m k , (26) A 2 (n) = q(M 0 m n−1 + a n−2 k=0 m k ). We would like to point out once again that we can observe only the statistics Y 2 (1), Y 2 (2), ..., Y 2 (n) and we have to use for estimation only these observations. Notice first that for m ≥ 1 we obtain lim n→∞ EY2(n+1) EY2(n) = m. Hence for large enough n we can consider (27) m n = Y 2 (n + 1)/Y 2 (n) as an estimator of the parameter m (similar to Lotka-Nagaev estimator for the classical BGW branching process). It is possible to use also for m > 1 and large enough n the following Harris type estimator (28) . See [12] for more details. Estimating m > 1 we are able to predict the mean value of the infected (non observed) individuals in the population. In the case when we assume that the process begins with the first immigrants then A 1 (n) = EY 1 (n) can be approximated using the estimators (27) − (29). The problem is how to estimate the immigration mean a. First of all there is an special case when a = m. Then using (23) and (24) with the Harris estimator we have (30) A 1 (n) = m n ( m n n − 1)/( m n − 1), m > 1; A 1 (n) = m n n, m = 1. In general we have to use some additional information. For example, if we can observe {I k } then we can apply the estimator a * n = n −1 n k=1 I k . Hence (31) A 1 (n) = a * n ( m n n − 1)/( m n − 1), m > 1; A 1 (n) = a * n n, m = 1. One can proceed similarly for the other estimators m n n and m n n,N . Remark 2. Similarly as it is shown in Remark 1 from Section 2 the model (19) can be generalized in the case with non-homogeneous in time offspring distributions. In this case m(l) = d ds h * (l; s)| s=1 = Eξ Recall that both type processes with or without immigration have exponential growth in the supercritical case m > 1. In the critical case m = 1 and in the subcritical case m < 1 the asymptotic behaviour is essentially different. In the critical case the mean value of the process with immigration grows linearly while for the process without immigration the mean value is constant. In the subcritical case the mean value of the immigration process converges to a positive constant but for the process without immigration the mean value limit is equal to zero. The estimation of the immigration mean a cannot be estimated by the observed statistics and we need some additional information. We would like to point out once again that the considered in Section 2 model is versatile but the application in each country is specific because it depends essentially on the official data from the country. The plots and tables below illustrate well some specific details for different countries as well as the common trend. The data used for the estimation of the parameters of the model come from European Centre for Disease Prevention and Control [21]. We will consider first the process without immigration. Note that the observed data is the number of the newly (daily) registered individuals denoted by Z 2 (n). The data about the new number of infected individuals (denoted by Z 1 (n)) is unobservable. The initial number m 0 = EZ 1 (0) is also unknown. Here n is the corresponding day from the beginning of the infection. The estimation of the parameters of the defined model can be summarized in the following steps. Firstly, we will demonstrate the approach described above by the data of the reported laboratory-confirmed COVID-19 daily cases for USA provided by the European Centre for Disease Prevention and Control [21] (the data are retrieved on 02.05.2020). Table 1 represents the estimated model parameters for the last 5 days of the available data set. Every row in the table represents the Harris estimatem n , as well as it's 95 % confidence interval ( CI l -CI r ), the proportion of the registered infected individuals α(n) and the expected values of the non-confirmed cases M 1 ( or A 1 for the process with immigration) , based on n − k observations, i.e. Z 2 (1), ..., Z 2 (n − k); k = 0, ..., 4. Table 1 : Estimation of the model parameters The following figures Figure 1 -6 represent the parameters of the process for the USA data. Figure 3 . After the initially large estimated values it stabilizes below 1.1, which is determined by the branching processes theory as a slightly supercritical process. This corresponds to the exponential growth shown above. The next results shows that the Harris type estimator has more stable behaviour than the Lotka-Nagaev type estimator. The last 5 days results for Italy, Germany, France, Spain and Bulgaria can be compared on Table 2 (the data are retrieved on 02.05.2020). The value of the proportion of the registered infected individuals is considerably higher in USA and France than in Germany and Italy, while countries with a longer infection period observe relatively small values of the Harris estimator of the mean value of the number of the confirmed infected individuals by one infected individual. Even more, the lower boundary of the confidence interval for the Harris estimator falls beneath the value of 1. Using the same data set one can compare the infection rate for different countries and regions on the basis of the estimated growth rate. For example, on Table 3 , the 10 countries with lowest and highest growth rate are shown. Even in the cases where the infection growth is less than 1, the upper bound of the confidence interval goes above 1, which states that there is a possibility that the infection growth will increase in the future. In the countries, where the infection is growing, the growth rate is isoclinically above 1, even though the lower boundary of the confidence interval is less Table 3 : Comparison of the growth rate than 1. It is usually due to the small number of observed infected individuals. First of all the estimation of the mean value of reproduction m allows us to classify the contamination process as supercritical (m > 1), critical (m = 1) and subcritical (m < 1). Recall that both type processes with or without immigration have exponential growth in the supercritical case m > 1. In the critical case m = 1 and in the subcritical case m < 1 the asymptotic behaviour is essentially different. In the critical case the mean value of the process with immigration grows linearly while for the process without immigration the mean value is constant. In the subcritical case the mean value of the immigration process converges to a positive constant but for the process without immigration the mean value limit is equal to zero. Finally the estimating of the mean parameter of infection can be considered as a first stage to construction of a more complicated epidemiological model. As an example, one can use a branching process with random migration considered in [17 − 19] or some other model of controlled branching processes (see [8] ). But for a general pandemic model the collaboration with specialists of epidemiology, mathematics, medicine, microbiology, molecular biology and informatics is absolutely necessary for the application of all information about Covid-19 phenomenon. Remark 3. For more detailed investigation and simulation the following models can be applied in the considered situation: * (s) = q + p 0 + p 1 s + p 2 s 2 + ... + p k s k , where q = 1 − References The Theory of Branching Processes Branching Processes. Nauka, Moscow, 1971 Multitype Branching Processes Branching Processes Branching Processes with Biological Applications Branching Processes Branching Processes: Variation, Growth and Extinction of Populations Controlled Branching Processes Transient Processes in Cell Proliferation Kinetics Branching Processes in Biology Statistical Inference for Branching Processes Statistical inference for branching processes Branching processes as models of progenitor cell populations and estimation of the offspring distributions Stochastic modeling and estimation of COVID-19 population dynamics Critical branching processes with nonhomogeneous migration Critical branching processes with random migration Branching Processes with two types of emigration and state-dependent immigration The authors would like to express their gratitude to P. Jagers, C. Athreya, E. Yarovaya, M. Molina, E. Waymire and all the colleagues of the "branching community" for the useful discussion and suggestions on the first paper [16] on Covid-19 topic.The research was partially supported by the National Scientific Foundation of Bulgaria at the Ministry of Education and Science, grant No KP-6-H22/3 and by the financial funds allocated to the Sofia University "St. Kliment Ohridski", grant N: 80-10-116/2020. Table 2 : Estimation of the model parameters k j=0 p j and p j , j = 0, 1, ..., k, can be specially chosen for k = 2, 3, 4, 5, 6, 7, 8.(It is possible to consider also the restricted geometrical distribution up to some k = 2, 3, 4, 5, 6, 7, 8.(Similarly it is possible to consider also the restricted Poisson distribution up to some k = 2, 3, 4, 5, 6, 7, 8. Note that the parameters of these distributions can be set in the manner that