key: cord-0648162-fc2q7fym authors: Jorge, D. C. P.; Oliveira, J. F.; Miranda, J. G. V.; Andrade, R. F. S.; Pinho, S. T. R. title: Estimating the effective reproduction number for heterogeneous models using incidence data date: 2021-02-25 journal: nan DOI: nan sha: 2667f27774bcef98ea651f57abb3518b213bd4e3 doc_id: 648162 cord_uid: fc2q7fym The effective reproduction number, R(t), is a central point in the study of infectious diseases. It establishes in an explicit way the extent of an epidemic spread process in a population. The current estimation methods for the time evolution of R(t), using incidence data, rely on the generation interval distribution, g(tau), which is usually obtained from empirical data or already known distributions from the literature. However, there are systems, especially highly heterogeneous ones, in which there is a lack of data and an adequate methodology to obtain g(tau). In this work, we use mathematical models to bridge this gap. We present a general methodology for obtaining an explicit expression of the reproduction numbers and the generation interval distributions provided by an arbitrary compartmental model. Additionally, we present the appropriate expressions to evaluate those reproduction numbers using incidence data. To highlight the relevance of such methodology, we apply it to the spread of Covid-19 in municipalities of the state of Rio de janeiro, Brazil. Using two meta-population models, we estimate the reproduction numbers and the contributions of each municipality in the generation of cases in all others. Our results point out the importance of mathematical modelling to provide epidemiological meaning of the available data. The effective reproduction number, R(t), is a central point in the study of infectious diseases. It establishes in an explicit way the extent of an epidemic spread process in a population. The current estimation methods for the time evolution of R(t), using incidence data, rely on the generation interval distribution, g(τ ), which is usually obtained from empirical data or already known distributions from the literature. However, there are systems, especially highly heterogeneous ones, in which there is a lack of data and an adequate methodology to obtain g(τ ). In this work, we use mathematical models to bridge this gap. We present a general methodology for obtaining an explicit expression of the reproduction numbers and the generation interval distributions provided by an arbitrary compartmental model. Additionally, we present the appropriate expressions to evaluate those reproduction numbers using incidence data. To highlight the relevance of such methodology, we apply it to the spread of Covid-19 in municipalities of the state of Rio de janeiro, Brazil. Using two metapopulation models, we estimate the reproduction numbers and the contributions of each municipality in the generation of cases in all others. Our results point out the importance of mathematical modelling to provide epidemiological meaning of the available data. The human population gets along with different microorganisms. Some of them lead to transmitted diseases that result in epidemics, even pandemics such as SARS-Cov-2. It is very important to define reliable measures to characterize the spread of those pathogens, both at the beginning and in the course of epidemics. The reproduction number R(t) indicates the number of new infections that result from a single infected individual at any time t. When it is greater than 1, each infected individual tends to generate more than one infected individuals resulting in the occurrence of an epidemic outbreak. The reproduction number is usually known from its initial value, when all of the population is susceptible, known as the basic reproduction number R 0 . Due to the recovery of infected individuals, a portion of the population becomes immune to the disease, thus, herd immunity begins to emerge in the population. This acts as a barrier to the transmission of the disease, so R(t) takes into account the evolution of the number of susceptible individuals in the population and is usually referred as the effective reproduction number. It provides us with information on how disease transmission occurs during the outbreak it is an important metric to make healthcare managers aware of the current level of the disease propagation. By considering heterogeneities in the homogeneous compartmental models, the reproduction number is affected. Indeed different classes of infected individuals can lead to different contributions for the generation of new infections. A lot of studies have taken the heterogeneity into account when dealing with the basic reproduction number R 0 [1, 2] , however, this is not the same for the effective reproduction number, R(t). To study reproduction number, the renewal equation, a Lotka-Euler type equation [3] , is usually used. This equation emerges from the mathematical models dynamics and can be derived using infection-age models. This type of modelling comes from the study of age-structured population growth and was first used within the scope of mathematical demography with the aim of modeling the birth dynamics due to the offspring produced by an individual over its lifespan. Analogously, the same methodology can be applied to the infectious diseases scenario by modelling the generation of new infected individuals during the infection period of those previously infected [4] . Infection-age models are the origins of the widely known SIR and SEIR models developed by Kermack-Mckendrick [5] . Most known compartment ODE models for infectious diseases have their infection-age identical counterparts. The advantage with respect to the infection-age models is that they allow us to have access to the distribution of the infected individuals during the infectious period. The estimation of the reproduction number in the course of an outbreak usually rely on the daily count of new cases along with the renewal equation. This approach uses the generation interval distribution, also called the serial interval or generation time distribution, and was popularized by [3] . Several works indicate how to obtain this distribution through empirical data [6] [7] [8] [9] or from other known distributions and models [3, [10] [11] [12] [13] . However, these distributions rely on epidemiological and empirical studies and several systems are very difficult to analyse or even do not have the required data for the envisaged evaluation. Moreover, highly heterogeneous systems do not have a general methodology for estimating the reproduction number and its generation interval distribution [14] [15] [16] [17] . In this study, we use mathematical models to bridge the gap between the reported cases and the reproduction number. Therefore, we present a methodology that derives reproduction numbers and the generation interval distribution for an arbitrary compartment mathematical model. With such methodology, we can investigate highly heterogeneous systems using appropriate models and evaluate their reproduction numbers. With that in mind, apply the method to a meta-population model to analyse the role of spatial heterogeneity in the spreading of COVID-19 in the Brazilian territory. Our work is separated as follow: in section 2, we introduce heterogeneity by allowing the existence of different groups of individuals in a population that can be sorted into compartments [18], considering then a very general heterogeneous model that can be reduced to most models in literature. We use this general formulation to develop our methodology to obtain the reproduction numbers and expressions that correlate it to actual data. Then, in section 3, we apply it to a specific scenario related to the role of the inter-municipal commuter flow of people and extract its reproduction number and the generation interval distribution. In section 4 and 5 we use actual data of the emerging SARS-Cov-2 coronavirus pandemic in the municipalities of the state of Rio de Janeiro, Brazil, to estimate the reproduction numbers that emerge from the system. Additionally, we reconstruct the time series of cases from the contributions of each municipality in the propagation of the disease. In a heterogeneous population, individuals can be discernible by age-groups, spatial locations, behaviour, different susceptibility to diseases or any other factor that may distinguish them for one another. In this section, we consider a general heterogeneous model that separates the individuals in m homogeneous compartments. For the purpose of evaluating the reproduction number, we first consider a sub-set of variables encompassing the infected compartments, i.e., those compartments containing individuals that carry the etiologic agent in their organisms. These compartments are denoted by x(t, τ ) = (x 1 (t, τ ), ..., xn(t, τ )), where t and τ indicate, respectively, the calendar time and the infection-age, the time elapsed since they got the infection. Of course the condition x(t, τ ) = 0 for τ > t must be satisfied. If m = m i + m 0 , where m i and m 0 indicate the number of infected and non-infected compartments in the homogeneous model, n = m i . Therefore, x represents the number of individuals in each compartment at a specific calendar time "t" and at infection age "τ ". x entries can be called the infection-age distributions. The total number of individuals in each compartment, denoted as X i (t), can be obtained by integrating x i (t, τ ) with respect of τ from zero to infinity, where X(t) = (X 1 (t), ..., Xn(t)). For simplicity, we will be using matrix notation in some equations. The matrices and vectors will be identified by using bold font, whereby M (y, z) = M ij (y, z) . Also, as usual, integrals and derivatives with respect of scalar variables operate componentwise. Drawing a parallel with Van den Driessche's next generation method [19] , we distinguish the new infections from the other compartment flows, defining F (t) = (F(t), ..., Fn(t)) as the rate of appearance of new infections and V(t, τ ) = (V 1 (t, τ ), ..., Vn(t, τ )) as the rate of transfer into compartments. Therefore, F (t) describes the flow from non-infected compartments into infected ones and depends on X(t). On the other hand,V(t, τ ) is related to the flow between infected compartments or from infected to non-infected ones and must depend on x(t, τ ). Thus, an usual infection-age model can be written as: Infection-age models are partial differential equations (PDE) that takes into account the time elapsed since the infection τ . The equations (2.1) and (2.2) can usually be linked to their respective ODE models, and in fact the Kermack-Mckendrick SIR and SEIR models [5] are the special cases of their infection-age correspondents. This modeling allows us to have access to the distribution of the infected during the infectious period, which enable us to estimate the number of active infected individuals and the reproduction number along the outbreak using the reported data. Integrating (2.1) from zero to infinity of with respect of τ we obtain: which is the correspondent to the sub-set of equations for the infected variables of the ODE model. Therefore, even though we are dealing with an infection-age model, one can simply build F and V from an ODE system and apply the methodology in this work. We demonstrate this for usual ODE models in the Supplementary Material 1. Since F i (t) and V i (t, τ ) are well defined, we proceed to solving equation (2.1) using the method of integration along the characteristic line. Thus, the left side of the equation (2.1) indicates that the characteristics of those PDE's are lines of slope 1, which implies that t = τ + c with c being an arbitrary constant. We fix a point (t 0 ,τ 0 ) and introduce a variable ω such that u i (s) = x i (t 0 + ω, τ 0 + ω) are functions that provide the values of the compartment densities along the characteristic. After straightforward calculations [4] , we obtain In epidemic models,V i (ω) are usually linear equations, making this system easy to solve. In that case, the system can be represented as: is the matrix of the linear system. Assuming that there are no infected individuals prior to t = 0, we only need to take into account the solution where t > τ , leading to ω = τ , t = τ + t 0 and τ 0 = 0. The solution for a linear system can be written as where Γ (ω) = Γ (t, τ ), is the fundamental matrix obtained by solving (2.4). Therefore, we identify F (t 0 ) = u(0) in (2.2). So, (2.6) becomes For a linear V(t, τ ), Γ (τ ) components are exponential functions. Equation (2.7) can also be used to express the solution of a general non-linear V(t, τ ), but if it is not the case, such system will not be considered in this work. On the other hand, we can also assume that F i (t) can be written as (2.8) in which Ω(t, τ ) is related to the generation of infected individuals in "i" due to "j". If Ω does not depend on τ , as in ODE models, it assumes the form of Most system in literature satisfies (2.7) and (2.8), thus the method in this work is very general and can be applied to a wide range of models. To estimate the reproduction number using the reported data of new infections, we need to link them with the equations of the model. This can be done by substituting (2.7) in (2.8) . Arranging the equation we get The functions A ij (t, τ ), analogously to [10] , are referred as the rate of new infections in X i due to X j previous infections at a calendar time t and infection-age τ , whereby A ij (t, τ > t) ≡ 0. So, we can describe the new cases in "X i " from the cases that occurred previously in all the groups. In fact, (2.10) is a general form for the widely known renewal equation, [3, 10] , actually a sum of renewal equations. Thus, by defining the number of new infections in the "i" compartment due to "j" previews infections as it becomes clear that F i (t) = n j J ij (t). Thus, it is possible to quantify the influence of the infections occurred in one compartment new ones in another compartment. To obtain the expected number of new infections in X i a newly infected individual form X j is expected to generate, thus the reproduction number, we integrate A ij (t, τ ) from zero to infinity with respect to τ , as in [10] . (2.13) R is usually called the next-generation matrix, where its entries corresponds to the reproduction numbers of the system [14] . It is remarkable that, for a ODE model where V is linear, the next-generation matrix evaluated at the free-disease fixed point R|x 0 , is equivalent to the one developed in [19] . Naturally from (2.13), we define where g ij (t, τ ) is normalized and denoted as the generation interval distribution [3, 10] . Thus, it is related to the flow of individuals between infected compartments and their recover process. We assume a generation interval distribution depending on t and τ , which is general enough to be applied to models whose dynamics can change over time. Therefore, using (2.13) and (2.14) in (2.12) we obtain: It is important to highlight that R ij (t) is not necessarily the number of new infected in X i generated by X j . Instead, the meaning of R ij (t) is linked to the generation of new infected ones in "i", F i (t), due to infected individuals previously generated in "j", F j (t − τ ), regardless of what stage of the disease these infected individuals who entered "j" are at instant t. That is the case because a new infected individual may generate new infections through out many stages of the disease. In the SEIR model, for example, all of the new infections are in the E compartment, but the individual only becomes infective when it goes to the I. To access the reproduction number a newly infected in X j is expected to generate to all of the other compartments, we can just sum the R ij over i, defining (2.16) Through out this work, the over-line represents the merge of the first index , in this case, in the form of a sum. This lead us to analogously define A j = n i A ij , whereby its integral from zero to infinity with respect to τ is R j . Thus, we are able to write where J j (t) = n i J ij (t) and is the total number of new infections generated by previews infections in X j . It is interesting that the generation interval distribution g j (t, τ ) takes the form of a weighted average in which the g ij (t, τ ) are the weights. Thus, the implementation of the proposed method in this work amounts to: identifying the terms F and V from a model; using them to find Ω and Γ ; obtaining A and integrating to get R and g. Further in this work we present applications of the method and estimations using actual data. Examples of the method in different types of models can be found on the Supplementary Material 1. After obtaining the reproduction numbers of the constituents of the system, we now intend to define a reproduction number for the whole system. The number of new infections in a group, F i (t), can be described as a fraction of the total number of cases from all groups, Since α i (t) is the proportion of the total number of cases that F i (t) represents, the condition 1 = i α i (t) must be satisfied. Thus, using (2.10) with (2.19), we obtain: where R T (t) = α · R is the reproduction number of the system and . (2.21) Since R T (t) is the scalar product between α and R, we can interpret it as the projection of the R over the fractions α forming a weighted average. Thus, it is interesting to note that we can describe the system as an average of its heterogeneities. Since the definition of the total number of reproductions R T is very general, it is not always meaningful. For example, if we form a system of two independent dynamics, that is, (R ij = 0 for i = j) it is still possible to obtain a R T , even if there is no meaning to it. The α i (t) functions are the key for analysing the feasibility of a total reproduction number that has a dynamical meaning. We focus our attention on a case where the α i (t) appears naturally in the equations. If Ω ij (t, τ ) can be separated in a function of t depending on the "i" index and another function of t and τ depending on the "j" index, then it can be written as where ⊗ represents a tensor product. We define Ω j = i Ω ij , and noticed that A j = k Ω k Γ kj e A = α ⊗ A. Thus, Ω and A can be separated in proportions, α and Ω. In fact, the above equation also impacts (2.16) and (2.17) which can be factorized in terms of R = α ⊗ R e J = α ⊗ J . Using (2.18) we can also get g ij (t, τ ) = g j (t, τ ). Furthermore, because the next generation matrix is obtained from a tensor product of vectors, the largest eigenvalue of R corresponds to the scalar product of R e α, that is R T . Thus, in these systems the total reproduction number assessed at the disease-free equilibrium point, t = 0, corresponds to the basic reproduction number, R T (0) = R 0 . Also, in that case, the R T is the spectral radius of R, as in [19] . This is very common in disease transmission models, in fact the SIR and SEIR model are examples of this case. So far, we've developed a general framework to estimate the reproduction numbers from the rate of new infections F. However, when we deal with real data what we have is the collection of all of the new infections in an ∆t period of time, which leads to the definition of B and T . Therefore B i (t) are the reported cases in X i , where ρ i is a constant related to a higher or lower notification, due to sub/super-notification. Analogously we define T ij (t) as the number of reported cases in X i due to previous cases in X j . By assuming that R ij , g ij (t, τ ) and F i are approximately constants during a ∆t interval, we use (2.15) and (2.8) to derive is a general form for the discrete version of the renewal equation. In fact, by using (2.20) we are able to recover a well known result in literature [17] In this section we analyse two meta-population models detailed in Supplementary Material 2. The methodology developed on this study is applied to both models to obtain the correspondent reproduction numbers and generation interval distributions, see Supplementary Material 1 for detailed calculations. (a) SIR-type meta-population model In this section we will consider a meta-population model takes into account groups of spatially separated "island" populations with some level of interaction. Such models are widely used for systems in which the movement of individuals between meta-populations is considered [20] [21] [22] [23] [24] [25] [26] . Since each population is connected with the others, the system can be interpreted as a network for which the nodes represent the meta-populations and the weight of their edges represent the intensity of the movement between them. This type of model does not describe the daily movement of individuals explicitly, but as an interaction of the meta-populations. It is suitable when the population sizes are not permanently affected by the flow of individuals, as in the case of commuter movement between locations of residence, work and study. This type of movement is obligatory cyclical, predictable and recurring regularly, most of the time on a daily basis. In this model, we assumed that each meta-population i, with N i individuals, has its own transmission rate β i (t). Also, the movement of individuals between meta-populations is taken into account, where we introduce Φ ij (t) as the density of flow between the populations i and j. That is, the amount of i resident individuals commuting from i to j divided by the total population of i, N i . The parameters β i (t) and Φ ij (t) are time dependent, so we can incorporate the changes in the behavior of the populations on those variables.In Supplementary Material 2 we derive this model, inspired by [26] , and in Supplementary Material 1 we obtain the corresponding reproduction numbers and generation interval distribution for it, which reads: Here λ ij is related to the transmission of the disease from a meta-population "i" to other metapopulation "j" and is derived based on simple assumptions about the commuter movement of individuals in the network, see Supplementary Material 2. Notably, if we isolate the metapopulations from the network, Φ ij (t) = 0, for all i and j, then the reproduction numbers and the generation interval distributions becomes identical to the classical SIR model [10] . The classical SIR model is a very simple and qualitative approach to disease transmission dynamics, but it does not provide the best description for most diseases. For now on, we will be focusing on a model for a specific disease, the SARS-Cov-2 coronavirus. In this case, the transmission can be facilitated by the existence of individuals whose symptoms are very weak or even nonexistent [27] , therefore, this heterogeneity can change the dynamics. In order to have consistence description of this aspect, it is wise to assume that the existence of two classes of infected individuals, the symptomatic and the asymptomatic/undetected ones, as considered in a general model for the SARS-Cov-2 coronavirus [28] . Therefore, we now take into account infected individuals who do not need to be hospitalized and are not recorded in official registered data, thus becoming undetectable. For the sake of simplicity, we will refer to those individuals only as asymptomatic ones, for simplicity. In Supplementary Material 2 we proceed to derive this model, based in the meta-population SIR-type approach described in the previous section. In Supplementary Material 1 the expressions for the reproduction number and generation interval distribution are detailed derived. There, we show that we only need R for i, j ≤ n to describe the dynamics. Thus, in this main framework, whenever we refer to R or g we will be alluding to their i, j ≤ n elements. Thus, we obtain: the λ ij 's expressions are the same presented for the SIR-type model in Supplementary Material 2. δ is a factor that reduces or enhances the asymptomatic infectivity, p is the proportion of individuals that becomes symptomatic when infected, γa and γs are the recover rates of the asymptomatic and symptomatic individuals, respectively. g a (τ ) and g s (τ ) are expressed in terms of exponential functions described according to (3.3) . Whereby, if we isolate the meta-populations from the network, Φ ij (t) = 0, for all i and j, the reproduction numbers and generation interval distributions returns to the expression obtained in [28] . In this section, we present results for the methodology applied to the meta-population model developed in the previews section. We use actual data of the first six months of the COVID-19 pandemic in Brazilian cities, such as: reported cases in each municipality, daily commuter movement due to work between municipalities and daily mobility tends towards workplaces. In Supplementary Material 3 we derive and present the expressions and parameters needed to estimate the reproduction numbers for both meta-population models. Thus, we obtain a daily time series of the reproduction numbers for each model. In this work, we use daily notifications of new cases due to COVID-19 in Brazil, obtained from public websites: https://covid.saude.gov.br/ and https://brasil.io/datasets/ , which provide data from the Health Ministry. We obtained intermunicipal commuter movement of workers and students data from a study on population arrangements and urban concentrations in Brazil conducted by IBGE (Brasilian Institute of Geography and Statistics) in 2015 that can be found at [29] . In addition, we obtain daily mobility data for each Brazilian state from a public report by Google, accessed at: https://google.com/covid19/mobility/. We used the data observed until September 14th, 2020, and performed a 10-day moving average, in order to attenuate noise and better express the data trend. To take the social distancing restrictions into account, we considered only the commuter movement data related to work, since, with the mitigation measures of COVID-19, the flow due to education were significantly reduced. In addition, because the movement towards work also dropped due to social isolation policies, we used the mobility data obtained from the community mobility report provided by Google to estimate this reduction. This database compares for each state the daily mobility to workplaces with the past trends, therefore, we can access the percentage of reduction in commuting to work. A moving average is performed in the mobility time series and we reduce the intermunicipal work flow according to the percentage indicated on the data. Thus, we obtain the flow of individuals due to commuter work that leads to Φ ij (t). The parameters used to feed the model were obtained in [30] and are displayed in Supplementary Material 3. The data from the state of Rio de Janeiro was selected for this analysis. Ten cities with the highest commuter flow with the capital Rio de Janeiro (RJ), were chosen, namely: In our first results, shown in Figure 1 , we present a comparison between the SIR and SEIIR outputs. Using the daily time series of the reproduction numbers, see Supplementary Material 3, we obtain the series of R(t) and T (t) from (2.16) and (2.24), respectively. We observe that the R of SEIIR model is, in the average, 33% higher then the corresponding results for the SIR model. On the other hand, the estimations of the total number of exported cases that are reported, t n i T ij (t) for i = j, is very similar for both SIR and SEIIR models Also, it seems that the total commuter movement, which is the sum of all the inflow and outflow happening in a municipality, is not the only main factor that determine the number of exported cases of a municipality. This non-linearity can be observed when comparing São João de Meriti (SJdM) and Niterói (Nt) or Duque de Caxias (DdC) and Nova Iguaçu (NI) (see Figure 1b) . Those municipalities have a similar amount of total flow but very different results for the exported cases. Interestingly, even not having the highest R j , the capital, Rio de Janeiro (RJ) presents the largest amount of exported cases, which also showcases the non-linear dynamics of the phenomenon. From now on, we will only look at the results of the SEIIR model. With T ij (t) we are able to access the contribution of each municipality on the outbreaks happening in the state. Thus, by dividing T ij (t) by B i (t) in every time step, we obtain a time series for the proportion of the total cases in "i" generated by "j". Therefore, we proceed into evaluating the mean value through time of T ij /B i , where it is displaced in Figure 2 . It is observed a high autochthonous behavior on the disease transmission, whereby the highest influence of a city is on itself. Therefore, most of the cases generated in a municipality are caused by its own individuals. However, we also identify cities where the cases generated by other municipalities on it are very important. The R(t) matrix also corroborates the presence of an important autochthonous behavior as its diagonal elements correspond to the highest values of the reproduction numbers. We also observed lots of very smalloff-diagonal elements low values through out the matrix. In Figure 3 we illustrate the results of Figure 2 , whereby only non-autochthonous influences above 5% are considered. Thus, we must highlight the capital, Rio de Janeiro (RJ), as the most important agent on the disease transmission to the municipalities on the network. However, cities like Nova Iguaçu (NI) also presents itself as a relevant disseminator of the pathogen. As shown in Figure 1 , Nova Iguaçu NI is the highest city in case exportation, besides the capital. In Figure 3 (Mq) and Queimados(Q) as the main receptors of those cases. Niterói (Nt), even having a NIlike number of exported cases, did not presented a high influence on a lot of cities. On the other hand, Niterói (Nt) generates a significant amount of cases in São Gonçalo (SG), highlighting the importance of the connection of those two municipalities. We illustrate the time series reconstruction in Figure 4 where two scenarios are displaced. In the first one, we focus on the cases reported in São Gonçalo (SG) and compare total amount, B i (t), with the number of daily cases in SG generated by Rio de Janeiro (RJ) and Niterói (Nt). We observe that Nt has a predominance over the capital, presenting a higher number of cases generated in all times. The second scenario is related to the total number of cases in São João de Meriti (SJdM) and the contributions due to Rio de Janeiro (RJ) and Nova Iguaçu (NI). In this case, the capital presents the largest number of cases generated in that city, besides the city itself. It is also interesting to observe in both scenarios how the cities contributions merge into a part of the total cases notified on a municipality. Understanding and dealing with infectious diseases is an on going challenge to humankind. The methodology presented in this work is very general and can be applied to multiple disease transmission systems. By merging the model dynamics with the available epidemic data, this method is able to estimate key epidemiological factors, like the reproduction numbers and the generation interval distributions. These theoretical results are the basis for a lot of possible data analyses, specially for highly heterogeneous systems that have been demanding for a suitable methodology for estimating the reproduction number through actual data. The method is robust and reproduces known results in literature, as shown in Supplementary Material 1 for the SIR, SEIR and SEIIR models. This methodology opens room for the analyses of more sophisticated models, that leads to a better understanding and control of infectious diseases. With the emerging of the COVID-19 pandemic, due to SARS-Cov-2 coronavirus, in December 2019, scientists allover the world have joined forces to formulate control strategies [31] . Factors like the presence of asymptomatic individuals, risk groups and spatial distribution of the disease brings out a system with multiples layers of heterogeneity [26, 27, 32, 33] . Although the application of some vaccines starts recently, mitigating COVID-19 with non-pharmacological strategies is essential. Therefore, the second half of this work we focused on the application of the developed methodology for two models with spatial heterogeneity, one focusing on the COVID-19 specific dynamics. Each municipal demography, captured by β parameters, and the commuter movement were combined by the model lead to some interesting results. The reproduction numbers are obtained for both models indicate that they only differ by multiplying factors, as 1/γ and p γs + δ(1−p) γa , as in [28, 30] . The measured results shows that the asymptomatic presence is related to an increase on the reproduction number, because the model predicts more infected individuals than what is reported. Another substantial result is obtained when comparing the exported cases with the total commuter movement in a municipality. In this case, the relationship is not direct, since the intricate topology of the network fosters a non-linear dynamic in the phenomenon. This result points up the importance of a model to provide epidemiological meaning of the available data. For this specific case, it became evident that analysing the system only by the commuter movement data wouldn't be enough to point out the key cities on the disease transmission dynamic, as we did using the models and methodology. The results of the model display the role of each municipality on the epidemic on the network. Cities like Nova Iguaçu, Niterói and São Gonçalo pops out as important agents on the spread of the pathogen through out the metropolitan region of Rio de Janeiro, Brazil. The capital is highlighted as the main hub of spreading, which is related to its high incidence of the disease combined with its central role on the movement behavior of individuals on the network, as also observed in [26, 30, 34] . However, cases like Niterói and São Gonçalo, where the interaction of both cities is higher than the capital's influence, cannot be neglected. This call's the attention for the relevance of analysis, like in this work, that provides the epidemiological interpretation of data. In this work, we choose to present an analysis with actual data in which the reproduction number is not the main result. We proceeded with this approach to portrait the reproduction number not as a dead end result, but as a tool for obtaining deeper analyses, such as the reconstruction of the time series and the number of exported cases. Our results for the Rio de Janeiro metropolitan area, however, have limitations. The model developed in this work provides a very simple approximation of intercity commuter flow, while more sophisticated models available in the literature are required to provide more precise description of the movement behavior. Another limitation of this work is that it does not consider the interstate flow, since the state of Rio de Janeiro was portrayed as a closed system Also, we did not take into account further heterogeneous features, which could be accounted as well by the general theoretical formalism, besides space and asymptomatic presence, leaving a gap for other key COVID-19 dynamics characteristics, like age-groups. In addition, the examples presented in this work considered actual reported data of confirmed cases and mobility, which may present problems of underestimation and report delay. Taking those limitations into account, the results displaced here are still able to give substantial understanding of the system studied, which is a common feature in mathematical modelling. Finally, we reinforce the importance of the method proposed on this work and highlight its broad application on infectious disease models. Ethics. Since all data handled in this study is publicly available, an approval by an ethics committee is not We proceed to illustrate the application of the method developed in this work to compartmental epidemic models. We chose two simple models that are known in literature and two variations of a meta-population model that is used in the main framework and developed in Supplementary Material 2. We seek to show the method step by step, presenting its detailed calculations. All models presented are composed of ordinary differential equations. In the transitions between infected compartments described in these models, V(t, τ ), the the infected compartments x(t, τ ) appear with linear dependence. Therefore we can simplify: In addition, all the parameters of the models in this Supplementary Material are constant, which leads to The SEIR model is designed by introducing a new exposed E stage of the disease into the SIR model 1 . We can assume that when an individual becomes infected, it must go through a latency period before showing its first symptoms and starting to infect other individuals. This is accomplished by introducing the exposed compartment E and its removal rate κ. Thus, all individuals who are infected start in the exposed state and, on average, after a 1/κ latency time are introduced into the I compartment. It is also possible to introduce a factor related to a pre-symptomatic infection. This way, individuals can start to infect in the exposed compartment, before presenting symptoms. Thus, this model, considering pre-symptomatic infection, can be written as Thus, we can sort the infected compartments as X(t) = [E(t), I(t)]. In this way the distributions of the infectious phase are defined as x(t, τ ) = [i e (t, τ ), i i (t, τ )]. We get F (t) e V(t, τ ): as we recover the sub-set of equations for the infected compartments with: The change from E to I is not considered a new infection, but the progression of the disease stage. Therefore, new infections only occur in the exposed compartment E, causing F 2 (t) = 0. Thus, from F (t), we obtain Ω(t): We proceed to obtain Γ(τ ) by solving In order to obtain Therefore: With Ω(t) and Γ(τ ) we perform the multiplication between the matrices, in order to obtain All terms in the second line are null, leaving only Performing the integral from zero to infinity with respect to τ we obtain the reproduction numbers of the system R(t): Where it is clear that Since there is only generation of new infected in the exposed compartment, we have to F 1 (t) = F T (t). Thus, the vector α(t) can be written as α = 1, 0 , thus Thus, to obtain the total reproduction number of the system, it is enough to make the scalar product between R and α: which leads to When we make → 0, we retrieve the result of the reproduction number of the classic SEIR model (without pre-symptomatic infection),R 0 = β/γ. Similarly, the generation interval distribution of the SEIR model is also reduced to the well-known form in the literature 2 , demonstrating robustness in the method. It is trivial to apply the method to the SIR model, whose analysis corresponds to tanking a limit of κ → ∞. Therefore, both the reproduction number and the generation interval distribution, g(τ ) = γe −γτ , return to the known results from literature 3 . This is an adaptation of the SIR model, where we include two different manifestations of the disease, I 1 and I 2 . In this model, there is only one susceptible population whose individuals can evolve into two compartments that carry the infectious agent. Both types of the disease carry the same pathogen, so that individuals infected with either type can go for I 1 and I 2 . When infected, an individual has the probability p of manifesting the type I 1 of the disease and q = (1 − p) of manifesting the type I 2 . We assume two independent infection rates β 1 and β 2 for I 1 and I 2 . Likewise, each slot has its own γ 1 or γ 2 removal rate. So, we write the model equations: We sort the infected compartments as where it is clear that We obtain F (t) and V(t, τ ) as: Thus The linear O.D.E system described by the matrix −∂V/∂x it is simple to solve, since this is a diagonal matrix. So, we get which, when multiplied by the matrix Ω(t) results in It is interesting to realize that this is one of the cases where A ij (t, τ ) = α i A j (t, τ ), recalling that A j (t, τ ) = i A ij (t, τ ). Therefore α = p, q and we can factorize the matrix into A and α as: Where ⊗ represents the tensorial product, α ⊗ A = α i A j . Integrating A(t, τ ) in relation to τ we get: Therefore, we proceed to the scalar product between α and R: When t → 0, R 0 = pβ 1 /γ 1 + qβ 2 /γ 2 . We realize that the total reproduction number is the sum of the reproduction numbers of the two types of infection times the percentage of occurrence of each. Thus, we proceed to obtain the distribution of the generation interval, which takes the form: In this section we consider a meta-population model that will be used in the main framework and is detailed at Supplementary Material 2. Here we summarize the model. We consider the existence of "n" meta-populations with coupled SIR-type dynamics. Where, due to the movement of individuals between meta-populations, an infected compartment in one meta-population can influence the disease transmission process of all the others. The coupling of the equations happens by the transmission rates λ ik related to the contamination process that emerges from the flow of individuals. The SIR-type model for n meta-populations can be written as in which S i (t), I i (t) and R i (t) correspond to the susceptible, infected and removed individuals that belong to the meta-population "i". The λ ij parameter is related to the transmission between the meta-populations "i" and "j", see Supplementary Material 2. We assume that the recover rate γ is uniform for all meta-populations.The infected compartments are sorted as X(t) = [I 1 (t), I 2 (t), . . . , I n (t)] and x(t, τ ) = [i 1 (t, τ ), i 2 (t, τ ), . . . , i n (t, τ )], where it is clear that X(t) = ∞ 0 x(t, τ )dτ . We proceed to identify F i (t) and V i (t, τ ) as Therefore we get: where I represents the identity matrix of dimension "n". Thus, when multiplying the matrices Ω and Γ we arrive at: Integramos A de forma a obter a matriz de próxima geração do sistema R, dada por: That leads to: Whereby, if only one meta-population is considered, the SIR model results are recovered 3 . Finally, we present a meta-population model related to the transmission dynamics of SARS-Cov-2 coronavirus. As in the previous section, the meta-populations are connected by the commuter movement of individuals. In this model, individuals that are infected have to pass through a latency period to become infectious, during that time, we consider that those are in the exposed compartment E. After a mean latency period of 1/k, those infected can become symptomatic or asymptomatic, I s and I a respectively. While in these compartments, the individuals of "j" are able to generate new infected ones in "i" based on the transmission rate λ ij . However, asymptomatic ones are considered to have a lower transmissibility, thus, the transmission rate must be multiplied by a factor δ that lowers it's infectivity. The SEIIR model was developed by Oliveira in 4 and in Supplementary Material 2 we derive an SEIIR-type meta-population model that can be written as: To proceed with the methodology, we sort the compartments as X = [E 1 , ..., E n , I 1 a , ..., I n a , I 1 s , ..., I n s ] and x = [i e 1 , ..., i e n , i s 1 , ..., i n s , i 1 a , ..., i n a ] in a way that if there are n meta-populations we have 3n infected compartments. Therefore, we obtain F (t) and V(t, τ ) as: where I s j = X j+n and I a j = X j+2n . Thus, from (47), we obtain: Whereby I n is an n × n identity matrix and ⊗ is a tensor product. Also, Ω is divided into nine n × n submatrices, where 0 is an all-zeroes n × n matrix, Ω s ≡ λ ij S i and Ω a ≡ δΩ s . Solving the system of differential equations on the characteristic line, we obtain: We proceed into the multiplication of the matrices Ω and Γ in order to obtain: whereby the submatrices are defined as: A s ≡ e −γ s τ Ω s and A a ≡ e −γ a τ Ω a . Therefore, the next generation matrix is where: Because there is no generation of infected individuals on the symptomatic and asymptomatic compartments, the expressions for R s and R a are not needed. This is so because F i (t) for i > n is null for any time. Therefore, the renewal equations of F i (t) for i > n have the tautological result that zero equals zero, regardless of the values of R s or R a . Thus, all we need to describe the dynamics is R for i, j ≤ n, that is R e . Lastly, we obtain the generation interval distribution matrix for i, j ≤ n: for g a (τ ) = κγ a γ a − κ (e −κτ − e −γ a τ ), g s (τ ) = κγ s γ s − κ (e −κτ − e −γ s τ ). In this Supplementary Material we will be, inspired by 5 , developing a meta-population model that takes into account the movement of individuals between meta-populations to describe the propagation of a disease through out multiple municipalities . We will be interpreting the inter-municipal flow as a complex network where its nodes represent municipalities and the weight of its edges represents the intensity of the flow between the connected municipalities. We also consider that each municipality can be interpreted as a meta-population, with its own compartments and parameters, which can be represented by a vector y i (t), where the index "i" indicates the meta-population portrayed.Each entry of y i (t) represents a compartment of this meta-population, such as y i (t) = S i (t), I i (t), R i (t) in the case of a SIR model. In which S i (t), I i (t) and R i (t) correspond to the susceptible, infected and removed individuals that are residents of the meta population "i", respectively. The sum of the elements of y i corresponds to the number of individuals of the meta-population "i", in the SIR model we have S i (t) + I i (t) + R i (t) = N i (t). We can represent the amount of individuals that goes from the meta-population "i" to another meta-population "j" each day as ϕ ij , so it represents the flow of individuals between those meta-populations. Since each meta-population is described from it's compartments, we can describe the flow of those using y i (t) in We define Φ ij (t) ≡ ϕ ij (t) N i as the density of flow. In that way, Φ ij y i is the number of individuals from each compartment class that are flowing from "i" to "j". It's natural to see that sum of Φ ij y i for each compartment class of y i is equal to ϕ ij , in the SIR model for example: Given that the meta-populations are connected through the flow , it is necessary to identify how they. For this purpose, graphs are used which, in general, represent complex networks, given the large number of connections. To represent networks, matrices are used, as in the case of adjacency matrices whose elements represent the density of flow (Φ ij ). Thus, the values of each Φ ij (t) are the weight of the edges with a null value representing the absence of an edge. Since there are no self interactions in this network, we do not consider the flow of a meta-population to itself, thus ϕ ii ≡ 0 . Due to the circulation of individuals through the network, the characteristics of the meta-populations will be changed. Thus, we define the effective population of "i" described by the vector y e, i = S e ,i (t), I e ,i (t), R e ,i (t) which corresponds to individuals located, at time t, in the "i" meta-population regardless of their origin metapopulation.The effective population represents the new characteristics of a population due to the flow; for example, a meta-population that does not have infected resident individuals may have carriers of the pathogen in their effective population due to the flow from some other meta-population that presents infected individuals. We can write y e, i as: It's clear then that the effective population of "i" is the sum of the individuals from "i" that are in "i" , y i 1 − n j =i Φ ij , with the individuals from other meta populations that are in "i", n j =i Φ ji y j . If we sum each element of y e, i we have the effective population number: N e, i = N i − n j =i ϕ ij + n j =i ϕ ji . It is important to highlight that the resident population is not changed, therefore, we do not consider migration but only commuter periodic movement, whereby the individuals of one meta-population go to another to work or study. We now proceed in order to describe how the transmission of a disease occurs in a meta-population, taking into account the flow of individuals in the network. We must then describe the occurrence of new infections in the Epidemic Models with Heterogeneous Mixing and Treatment An intuitive formulation for the reproductive number for the spread of diseases in heterogeneous populations How generation intervals shape the relationship between growth rates and reproductive numbers An introduction to mathematical epidemiology rspa.royalsocietypublishing.org Proc A contribution to the mathematical theory of epidemics Inferring generation-interval distributions from contact-tracing data Quantitative, modelbased estimates of variability in the generation and serial intervals of Plasmodium falciparum malaria Times to key events in Zika virus infection and implications for blood donation: a systematic review Serial interval of novel coronavirus (COVID-19) infections The Effective Reproduction Number as a Prelude to Statistical Estimation of Time-Dependent Epidemic Trends. Mathematical and Statistical Estimation Approaches in Epidemiology Intrinsic and realized generation intervals in infectiousdisease transmission On the convolution of exponential distribution Equivalence of the Erlang-distributed SEIR epidemic model and the renewal equation Transmission potential of the new influenza A(H1N1) virus and its age-specificity in Japan A practical generation-intervalbased approach to inferring the strength of epidemics from their speed 2020 Modeling the Spatiotemporal Epidemic Spreading of COVID-19 and the Impact of Mobility and Social Distancing Interventions Estimating Individual and Household Reproduction Numbers in an Emerging Epidemic An immunization model for a heterogeneous population Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission Multi-patch and multi-group epidemic models: a new framework Spatial Heterogeneity in Epidemic Models A deterministic model for gonorrhea in a nonhomogeneous population Epidemiological implications of mobility between a large urban centre and smaller satellite cities Individual identity and movement networks for disease metapopulations Scaling effect in covid-19 spreading: The role of heterogeneity in a hybrid ode-networkmodel with restrictions on the inter-cities flow rspa.royalsocietypublishing.org Proc Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2) Mathematical modeling of COVID-19 in 14.8 million individuals in Bahia, Brazil Instituto Brasileiro de Geografia e Estatística -IBGE. In: Estimates for the population of Bahia in 2019 Assessing the nationwide impact of COVID-19 mitigation policies on the transmission rate of SARS-CoV-2 in Brazil | medRxiv The Battle Against COVID-19: Where Do We Stand Now? Introducing risk inequality metrics in tuberculosis policy development 2020 To mask or not to mask: Modeling the potential for face mask use by the general public to curtail the COVID-19 pandemic 2021 Network analysis of population flow among major cities and its influence on COVID-19 transmission in China A contribution to the mathematical theory of epidemics Equivalence of the erlang-distributed seir epidemic model and the renewal equation The effective reproduction number as a prelude to statistical estimation of timedependent epidemic trends Mathematical modeling of covid-19 in 14.8 million individuals in bahia, brazil Scaling effect in covid-19 spreading: The role of heterogeneity in a hybrid ode-network model with restrictions on the inter-cities flow Assessing the nationwide impact of covid-19 mitigation policies on the transmission rate of sars-cov-2 in brazil IBGE. Arranjos populacionais e concentrações urbanas no brasil Acknowledgements. The authors acknowledge the discussions and suggestions from members of the CoVida Network (http://www.redecovida.org). Supplementary Material 2-Meta-population models formulation. Supplementary Material 4-Additional information about the municipalities. meta-population y i (t), which can occur within the "i" meta-populationor on another meta-population "j"β i (t) and β j (t) being the transmission rates within the "i" and "j" meta-populations, respectively. The parameters β i (t) and Φ ij (t) are time dependent, so we can incorporate the changes in the behavior of the populations on those variables. Thus, the number of new infections of individuals that belong to "i", F i (t), will be equal to the sum of (60) and (61). We can substitute equation (59) in expressions (60) and (61), in order to expand the term of the effective population of infected individuals(I e ,i (t) ,I e ,i (t)) and be able to express them according to the infected residents in each meta-population. Carrying out these operations, we get to:For:The λ ij (t) are related to the transmission between individuals of the same meta-population and from individuals of the "j" meta-population to individuals of the "i" meta-population. In this section, we construct the set of equations for the SIR-type model based in the contamination process modeled in this Supplementary Material. We will be modeling the resident populations of each municipality and will not be considering migration, therefore, resident individuals of one meta-population will always remain residents of that same meta-population. It is assumed that the susceptible population can only decrease due to infection process, therefore, no life and death dynamic. In the same way, the removed individuals are only generated by a removing rate γ, uniform for all meta-populations. Gathering those considerations, we write the system of equations:Where it is clear that when ϕ ij = 0, for all i's and j's, each meta-population will be described by the classical SIR homogeneous model. Of course, this SIR approach is not taking into account the other heterogeneities of a disease besides space. On the following section we present a more sophisticated approach focused on the Covid-19 transmission dynamics. A meta-population model for Covid-19 (SEIIR) In this section, we establish a more precise description for the dynamics of SARS-Cov-2 coronavirus on the municipalities network. To do so, we consider, as in 4 , that the infected individuals can be separated in three classes: the exposed E, individuals infected which are in the latency period and do not transmit the disease; the symptomatic individuals I s , that are infectious, present a substantial amount of symptoms and are registered in the official data; the asymptomatic/undetected ones I a , that are infectious but present mild/non-existing symptoms and are not registered in the official data. The infected individuals always start in the exposed compartment, a portion p of them eventually becomes symptomatic and the other (1 − p) portion becomes asymptomatic. Since we have two types of infectious individuals, both of them must be taken in consideration in the generation of new infected. Therefore, assuming that the asymptomatic individuals transmission rate is a fraction δ of the symptomatic transmission rate, it is simple to derive, similarly to (60) and (61), the number of infected individuals that are generated on the exposed compartment of each meta-population:Therefore, by the same assumptions presented on the previews section, the following model is obtained:Whereby κ, γ s and γ a are the removing rate of the exposed, symptomatic and asymptomatic compartments, respectively, and are uniform in all meta-populations. Expressions for evaluating R(t) using incidence dataHere, we derive the expressions and parameter values needed to estimate the reproduction numbers for the metapopulation models. Firstly, we start by substituting the explicit expressions of the reproduction numbers of each model in the equation 2.24 of the main framework. After some straightforward calculations, we obtain:whereby, Q i (t) = B i (t)/S i (t) andfor the SIR model andfor the SEIIR model. We consider that the B i (t) is the collection of all the new infections during a ∆t time interval of 1 day. With appropriate units of measure, we have ∆t = 1. In the SIR model we consider that every infection is reported, ρ i = 1 and in the SEIIR case only the symptomatic individuals report their infections, ρ i = p. We proceed into estimating the susceptible population. If we integrate (65) and substitute (62) and equation 2.23 of the main framework, we get:Finally, substituting (63) and (64) in (74) results in:for:Therefore value of Q i (t) and θ ij for each day can be obtained using the reported data and parameters. Thus, for each day, this leaves us with algebraic system of "n" variables, β j (t) and "n" equations. We can analytically solve this system for every meta-population, with the help of a computer algorithm, and obtain the daily values of β j (t) of every meta population. Substituting the values of every β j (t) in (63) and (64), we can compute the values of the λ i j(t)'s which leads us to the values of the reproduction number, equations 3.1 and 3.2 of the main framework.Since the available data is of daily number of cases, the R(t) is evaluated for each calendar day. Parameters To estimate the β i (t) parameters, and consequently estimate the reproduction numbers, the parameters of both models must be obtained. δ, γ a , γ s , p, and κ are estimated for the state of Rio de Janeiro in 6 and can be found on Table 1 . Additionally, in the SIR type model we assume γ = γ s . The intermunicipal commuter movement of workers and students for the cities of Brazil, ϕ ij , can be found in a study conducted by IBGE (Brasilian Institute of Geography and Statistics) 7 . Table 2 Supplementary Table 3 . The selected cities of the state of Rio de Janeiro in the Southeast of Brazil with their acronyms, number of inhabitants, and the reported COVID-19 cases until September 14th (2020). In bold, the capital (Rio de Janeiro).