key: cord-0620799-7pnz98o6
authors: Ponciano, José Miguel; Ponciano, Juan Adolfo; Gómez, Juan Pablo; Holt, Robert D.; Blackburn, Jason K.
title: Poverty levels, societal and individual heterogeneities explain the SARS-CoV-2 pandemic growth in Latin America
date: 2020-05-22
sha: f307584c6765a36ad6a2eec560e34b3b425c771a
doc_id: 620799
cord_uid: 7pnz98o6

Latin America is experiencing severe impacts of the SARS-CoV-2 pandemic, but poverty and weak public health institutions hamper the gathering of the kind of refined data needed to inform classical SEIR models of epidemics. We present an alternative approach that draws on advances in statistical ecology and conservation biology to enhance the value of sparse data in projecting and ameliorating epidemics. Our approach leads to what we call a Stochastic Epidemic Gompertz model, which has few parameters yet can flexibly incorporate heterogeneity in transmission within populations and across time. We demonstrate that poverty has a large impact on the course of the pandemic across fourteen Latin American countries, and show how our approach provides flexible, time-varying projections of disease risk that can be used to refine public health strategies.

One Sentence Summary: The growth modality of SARS-CoV-2 among Latin-American countries is well explained by differences in poverty and by individual and temporal heterogeneities.

". . .Major global crises. . . demand cooperative global responses that do not leave out the poor. Once SARS-CoV-2 is under control, the world cannot return to business as usual," von Braun et al. (1) concluded in a recent editorial commentary in Science entitled "The moment to see the poor."
As the recent flurry of research on the SARS-CoV-2 pandemic shows, the languages of mathematics, statistics and computer science are essential instruments for grappling with the uncertain course of the pandemic. Joel Cohen's (2) remark almost twenty years ago that "mathematics is biology's next microscope, only better" has never been more salient. Deterministic epidemiological models of the SEIR type (Susceptible S(t), Exposed E(t), Infected I(t) and Recovered R(t)) have long enabled an in-depth exploration of infectious disease processes (3-6) and provide a framework of fundamental principles for managing infectious diseases (7-12), including the SARS-CoV-2 pandemic (10, 13-15). While these models are useful, parameterizing them for a novel viral pandemic with limited diagnostics and little systematic data collection can be challenging. We address this need (1) with a complementary, multi-model approach (16) that incorporates social, individual and temporal heterogeneities. Approaches employing simple yet biologically sound models with few parameters are particularly needed in regions like Latin America, where sound strategies for collecting public health data are seldom employed. Here we show that a multi-model, multi-stage modeling approach helps elucidate i) early epidemic growth in fourteen Latin-American countries, ii) the role of poverty in shaping the growth rate of the number of cases, and iii) the probability that the number of SARS-CoV-2 cases exceeds any given amount within arbitrarily defined small windows of time, starting from the present. Characterizing complex epidemiological processes depends on adequately formulating probabilistic models of the governing processes. Survival, extinction and growth of natural populations, including infectious diseases, are inherently stochastic (17).
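To make the SEIR structure concrete, here is a minimal sketch of a classic, homogeneous-mixing SEIR model integrated with a fixed-step fourth-order Runge-Kutta scheme. It assumes frequency-dependent transmission βSI/N, a mean latent period 1/σ and a mean infectious period 1/γ; the function names and parameter values are hypothetical illustrations, not the fits reported in Table 1.

```python
def seir_rhs(state, beta, sigma, gamma, n):
    """Right-hand side of the classic SEIR ODEs (homogeneous mixing)."""
    s, e, i, _ = state
    new_infections = beta * s * i / n  # frequency-dependent transmission
    return (
        -new_infections,
        new_infections - sigma * e,    # exposed become infectious at rate sigma
        sigma * e - gamma * i,         # infectious recover at rate gamma
        gamma * i,
    )


def rk4_step(state, dt, beta, sigma, gamma, n):
    """Advance the state by one fourth-order Runge-Kutta step of size dt."""
    def shifted(k, h):
        return tuple(x + h * dx for x, dx in zip(state, k))

    k1 = seir_rhs(state, beta, sigma, gamma, n)
    k2 = seir_rhs(shifted(k1, dt / 2), beta, sigma, gamma, n)
    k3 = seir_rhs(shifted(k2, dt / 2), beta, sigma, gamma, n)
    k4 = seir_rhs(shifted(k3, dt), beta, sigma, gamma, n)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_seir(initial, beta, sigma, gamma, days, steps_per_day=10):
    """Return the (S, E, I, R) state at the start and at the end of each day."""
    n = sum(initial)  # total population, conserved by the ODEs
    dt = 1.0 / steps_per_day
    state = initial
    trajectory = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, dt, beta, sigma, gamma, n)
        trajectory.append(state)
    return trajectory


# Hypothetical illustration: one million people, 10 initial infections.
traj = simulate_seir((999_990.0, 0.0, 10.0, 0.0), beta=0.5, sigma=1 / 5, gamma=1 / 7, days=120)
```

Because the four derivatives sum to zero, S + E + I + R stays at the initial population size, which is a convenient sanity check on any integrator.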
At its core, modeling the growth of cumulative SARS-CoV-2 cases is a stochastic population dynamics problem that calls for a full characterization of the probabilities of the possible trends. Computer-intensive methods for fitting biologically sound stochastic models to case growth allow these probabilities to be estimated. Notably, these methods allow a process-based estimation, grounded in the dynamics of the infection process, of the time-varying probability that the number of cases will exceed any given number within a given window of time. Mathematically, this problem is analogous to the conservation biology goal of estimating extinction risk for endangered populations. The application of stochastic processes to estimating extinction risks and the average time until an event of interest occurs has enriched conservation biology (18), wildlife management (19, 20) and evolutionary microbial population dynamics (21). In conservation biology, Population Viability Analysis (PVA) was transformed by stochastic processes used to predict population growth and extinction (17), despite the complexities of target species' life cycles (16). Ideally, a population dynamics model is relatively simple yet retains the essential features of demographic and environmental stochasticity (17, 22) and remains robust in the face of other sampling complexities (16). These features are particularly important given the paucity of complete data. The analysis of SARS-CoV-2 time series data from countries with scant public health resources is not unlike the study of time series data from endangered species: both seek to use whatever information is available to distill the general principles governing data fluctuations in order to make informed projections and assess "what-if" scenarios. It is now recognized that long-term estimates of persistence probabilities are of little use because the conditions for population growth are ever-changing (18).
Viable Population Monitoring (VPM) aims to estimate and update persistence probabilities over short time horizons using fresh data (18). Such stepwise, continuously updated estimates of short-term persistence probabilities are more practical and actionable than long-term projections. We draw on prior work in conservation biology, population dynamics and epidemiological theory to complement the current suite of deterministic epidemiological models, characterize the role of urban poverty in shaping the region's SARS-CoV-2 epidemics, and develop a methodology to generate short-term (5-15 day), sequentially updatable, process-based forecasts.

Early epidemics: patterns and processes

Characterizing the early epidemic growth profiles in Latin America (Fig. 1A-D) is a first step towards understanding the transmission dynamics of SARS-CoV-2 in the region. Unconstrained growth is often a fair assumption at the outset of an emerging disease, so the growth of the number of cases n(t) over time is properly described by an exponential growth model, in which the per capita rate of change in total case numbers is the constant intrinsic growth rate r:

$$\frac{1}{n(t)}\frac{dn(t)}{dt} = r.$$

Accordingly, the per capita contribution to the growth of the epidemic is unaffected by the total number of cases. Early in an epidemic, r has been used to estimate the basic reproduction number $R_0$ through approximate relations (6, 23), but the accuracy of these relations is model-dependent. Epidemic growth is frequently limited by many factors, including reactive behavior changes or spatially constrained contact structures. However diverse, these factors tend to act as "density-dependent" processes that slow growth (5). Chowell et al. (5) note that different compartmental models all lead to early sub-exponential growth.
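Under the exponential model, ln n(t) grows linearly in t with slope r, so a quick way to estimate r from early counts is a least-squares fit of ln n(t) on t. The sketch below does that and then applies one approximate relation between r and $R_0$, namely $R_0 \approx 1 + rD$ for an SIR model with an exponentially distributed infectious period of mean D. That relation is our illustrative choice (as noted above, such relations are model-dependent), and the function names and synthetic data are hypothetical.

```python
import math


def estimate_growth_rate(days, cases):
    """Least-squares slope of ln(cases) on days; under exponential growth this estimates r."""
    logs = [math.log(n) for n in cases]  # requires strictly positive counts
    t_bar = sum(days) / len(days)
    y_bar = sum(logs) / len(logs)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(days, logs))
    sxx = sum((t - t_bar) ** 2 for t in days)
    return sxy / sxx


def r0_from_r_sir(r, mean_infectious_period):
    """Approximate R0 = 1 + r * D (SIR model, exponential infectious period of mean D)."""
    return 1.0 + r * mean_infectious_period


# Synthetic, noise-free check: 10 initial cases growing at r = 0.2 per day.
days = list(range(10))
cases = [10 * math.exp(0.2 * t) for t in days]
r_hat = estimate_growth_rate(days, cases)
r0_hat = r0_from_r_sir(r_hat, mean_infectious_period=5.0)
```

With real, noisy counts the fit should be restricted to the early window where growth still looks exponential; once growth turns sub-exponential, the log-linear slope underestimates the initial r.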
A key factor is inhomogeneous mixing in contact rates, formalized in (5) as a non-linear contact rate function. The resulting sub-exponential growth describes epidemiological data reported in the literature for many important epidemics (5). Importantly, this model encapsulates, albeit phenomenologically, the outcome of lower-level mechanistic models (5) in which clustering, structuring, other forms of heterogeneity, and the mean and variance of the contact distribution are ultimately responsible for the degree of sub-exponentiality. The key observation that triggered our research was that early sub-exponential growth in fourteen Latin-American countries could be clearly divided into four distinct growth profiles (Fig. 1A-D) according to how closely it approached pure exponential growth. We then hypothesized that such differences in the degree of sub-exponential growth could be explained by differences in poverty, which we expect to modulate the distribution of contacts and possibly other mechanisms (5). Compartmental, multivariate SEIR models are an indispensable everyday tool for obtaining conceptual and practical insights. But these models are data-hungry and statistically costly (25) because of their (potentially) large number of parameters. The ratio of data to parameters needing estimation remains a challenge in Latin America because of the lack of data-gathering infrastructure, cohesive contact-tracing plans, and testing and monitoring capabilities. Despite these challenges, we show that in twelve of these fourteen countries, including some form of heterogeneity (e.g., structuring by age or poverty, or inhomogeneous mixing of susceptible and infected individuals (24)) improves model fits compared with the classic SEIR model most commonly used to date (Table 1).
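The contact rate function itself is not reproduced here, but the flavor of sub-exponential growth can be sketched with the generalized-growth model dC/dt = r C^p, commonly used in the early-epidemic literature. Its exponent 0 ≤ p ≤ 1 plays a role similar to the inhomogeneity parameter c discussed later: p = 1 recovers exponential growth and smaller p slows growth toward polynomial. Using this model as a stand-in for the paper's formulation is our assumption, and the parameter values are hypothetical.

```python
import math


def generalized_growth(c0, r, p, t):
    """Closed-form solution of dC/dt = r * C**p with C(0) = c0 > 0.

    p = 1 gives exponential growth; 0 <= p < 1 gives sub-exponential
    (polynomial-like) growth, slower for smaller p.
    """
    if p == 1.0:
        return c0 * math.exp(r * t)
    m = 1.0 - p
    return (c0 ** m + m * r * t) ** (1.0 / m)


# Hypothetical comparison: cumulative cases after 30 days from 10 initial cases.
for p in (1.0, 0.9, 0.7, 0.5):
    print(p, round(generalized_growth(10.0, 0.3, p, 30.0)))
```

The spread between these curves after only a month illustrates why early growth profiles can be sorted by how close they come to pure exponential growth.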
First, we fitted the following SEIR-type deterministic model variants: a classic SEIR model with and without non-homogeneous mixing, and a SEIR model structured by poverty and age class, also with and without non-homogeneous mixing. The two forms of demographic structuring and the non-homogeneous mixing were included to assess the effect of different sources of heterogeneity. For all 14 countries, we modeled the start of the epidemic as resulting from imported cases and included the effect of each nation's decision to close its airports (see Table 1 and Supplementary Material). Next, our analyses focused on developing actionable, theory-grounded univariate models requiring fewer parameters. These models incorporate i) different degrees of heterogeneity among hosts in pathogen transmission and ii) variability in the dynamics due to overall poverty levels. The overall time series variability is decomposed into sampling error and two forms of process error: demographic and environmental variability (17, 30, 45).

The early epidemic involves demographic stochasticity

To model the dynamics of initial infection, we used a stochastic pure birth process, a continuous-time, discrete-state Markov process used in various epidemiological contexts (22, 26, 27). The type of variability displayed here, demographic stochasticity (17, 20, 22, 26), represents chance variation in infection due to heterogeneities in individual contact rates. This type of process variability looms large when the number of infected individuals is low (17). Let N(t) be the random number of accumulated cases in a country at time t and $p_n(t) = P(N(t) = n(t))$ the probability of observing n(t) cases at time t. We introduced an inhomogeneous contact rate function to obtain a form of the birth rate $\lambda_n$ that leads to sub-exponential growth early in an epidemic. How to incorporate heterogeneity into a univariate model is detailed next.
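A pure birth process can be simulated exactly with the Gillespie algorithm, because in state n the waiting time to the next case is exponential with rate $\lambda_n$. The sketch below assumes a hypothetical birth rate $\lambda_n = \lambda n^c$ (c < 1 slows growth), an illustrative stand-in rather than the paper's $\lambda_n$. It estimates $p_n(t)$ by repeated simulation. Because the first passage time to a threshold $n_c$ is a sum of independent exponential waits, one per state, it also gives the exact mean first passage time $\sum_k 1/\lambda_k$ and a sampled estimate of the probability of reaching $n_c$ within τ days.

```python
import random
from collections import Counter


def birth_rate(n, lam, c):
    """Hypothetical state-dependent birth rate lambda_n = lam * n**c."""
    return lam * n ** c


def simulate_pure_birth(n0, lam, c, t_end, rng):
    """Gillespie simulation of a pure birth process; returns N(t_end)."""
    t, n = 0.0, n0
    while True:
        t += rng.expovariate(birth_rate(n, lam, c))  # exponential wait for the next case
        if t > t_end:
            return n
        n += 1                                        # one new case, no removals


def empirical_pn(n0, lam, c, t_end, reps, seed=1):
    """Monte Carlo estimate of p_n(t_end) = P(N(t_end) = n) over the observed n."""
    rng = random.Random(seed)
    counts = Counter(simulate_pure_birth(n0, lam, c, t_end, rng) for _ in range(reps))
    return {n: k / reps for n, k in sorted(counts.items())}


def mean_first_passage_time(n0, n_c, lam, c):
    """Exact E[T] from n0 to n_c cases: sum of mean waits 1 / lambda_k, k = n0..n_c-1."""
    return sum(1.0 / birth_rate(k, lam, c) for k in range(n0, n_c))


def prob_reach_within(n0, n_c, lam, c, tau, reps=10000, seed=1):
    """Monte Carlo estimate of P(N(t + tau) >= n_c | N(t) = n0) = P(T <= tau)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # T is a sum of independent exponential waits, one per state n0..n_c-1
        t_pass = sum(rng.expovariate(birth_rate(k, lam, c)) for k in range(n0, n_c))
        hits += t_pass <= tau
    return hits / reps


# Hypothetical parameters: 5 initial cases, sub-exponential growth (c = 0.8).
pn = empirical_pn(n0=5, lam=0.3, c=0.8, t_end=10.0, reps=2000)
p_reach_40_in_10 = prob_reach_within(5, 40, 0.3, 0.8, tau=10.0)
```

Since cases never decrease in a pure birth process, "exceeding n_c by time τ" and "first passing n_c before τ" are the same event, so the tail of `pn` and `p_reach_40_in_10` estimate the same probability by two routes.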
We calculate $p_n(t)$ either analytically or numerically and use it to compute the probability that the process exceeds a given threshold $n_c$ within a pre-determined future time interval (Fig. S2), as well as first passage times (fpts), defined as $T_{n_c} = \inf\{t \ge 0 : N(t) \ge n_c\}$. Pure birth processes become quasi-deterministic at large population sizes (here, large case numbers), but process variability remains important. When case numbers are large, deviations from deterministic predictions can emerge from temporal (or spatial) variation in the transmission rate, known as environmental stochasticity (17, 20, 22), in addition to observation error (19). Spatial heterogeneities may be determined by socio-economic factors. We build a hierarchical model of the accumulated total number of cases, including poverty and heterogeneity in transmission, and fit it jointly to 14 Latin-American countries (Fig. 1A-D). We then extend this model formulation to include environmental variability and use it to formulate practical risk assessment tools for each country. (20, 28) and reflects what one might call the "Cohen principle," which holds that distributions of abundance do not provide a shortcut to understanding the mechanisms that generate those distributions. Each requires analysis (31). Our minimal-assumptions approach has direct, practical consequences: it redirects the inferential focus towards biologically relevant forms of the variance of N(t). These are key to accurately estimating population risks (30). Second, this framework is amenable to multiple parameterizations of the infection dynamics, for example, by assuming inhomogeneity in p(t). Using Eq. (6) in (5) (an expression for inhomogeneous mixing during infection) we let is the number of infected that day. If γ is the per-day recovery probability, then $I(t)^{-b} = (n(t) - n(t)\gamma)^{-b}$, so that the average one-step change in the number of cases is: Fig. 2A) values of c closer to 1 (i.e.
higher homogeneity in contact rates), which in turn implies that the accumulation of cases is closer to exponential growth (Fig. 2A). Indeed, in poorer countries a higher proportion of the population lives in poverty, which amounts to greater social homogeneity. Having demonstrated the significance of the urban poverty covariate, we repeated the estimation without Venezuela, because it was the only country for which we had difficulty cross-checking the data from multiple sources. Transparency in data reporting is paramount for scientific inference. This time, besides the random poverty-driven effect, we postulated that the growth of the epidemic was dominated by environmental stochasticity. In this new model, Eq. (2) on the natural logarithm scale is the mean of a Markovian transition probability distribution. We called this model the Stochastic Epidemic Gompertz (SEG) model. Using the SEG model, we obtained a tighter relationship (Fig. 2B) between poverty and the inhomogeneity parameter, despite the added layer of randomness. Just as before, a higher urban poverty index yields, on average (thick black line in Fig. 2B), values of c closer to 1 (i.e., higher homogeneity in contact rates). Because the SEG is a model with environmental stochasticity, it can accommodate a large mean number of contacts per individual while keeping the variance of the "offspring" distribution null (i.e., no demographic stochasticity, only environmental noise). In network theory models, this would amount to specifying a distribution of the number of contacts that has a large mean but a very small (if not null) variance. Our model construction process focuses on specifying the nature of the variability (see Supplementary Material), and hence can readily accommodate many mean-to-variance relationships besides the one implied by the SEG model. The point is that the contact and infection processes (32), not the distributional assumptions, take center stage, as in (28).
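One common way to write a stochastic Gompertz model with environmental noise and sampling error, borrowed from the statistical-ecology literature, works on the log scale X(t) = ln N(t): X(t+1) = a + c X(t) + E(t) with E(t) ~ Normal(0, σ_E²), and observations Y(t) = X(t) + F(t) with F(t) ~ Normal(0, σ_O²). We assume this form purely for illustration; the paper's Eq. (2) and its poverty random effect are not reproduced, and the parameter names are hypothetical. With c = 1 the model gives exponential growth at rate a on the case scale, while c < 1 decelerates it.

```python
import math
import random


def simulate_seg(x0, a, c, sd_env, sd_obs, steps, rng):
    """Simulate log-cases under an assumed Gompertz state-space form:
    X[t+1] = a + c * X[t] + E[t],  E[t] ~ Normal(0, sd_env**2)  (environmental noise)
    Y[t]   = X[t] + F[t],          F[t] ~ Normal(0, sd_obs**2)  (sampling error)
    Returns (latent X, observed Y), each of length steps + 1."""
    xs = [x0]
    for _ in range(steps):
        xs.append(a + c * xs[-1] + rng.gauss(0.0, sd_env))
    ys = [x + rng.gauss(0.0, sd_obs) for x in xs]
    return xs, ys


# Hypothetical parameters: 50 initial cases, mildly sub-exponential growth (c = 0.97).
rng = random.Random(42)
xs, ys = simulate_seg(x0=math.log(50), a=0.25, c=0.97, sd_env=0.05, sd_obs=0.02, steps=30, rng=rng)
cases = [round(math.exp(y)) for y in ys]  # back-transform observations to the case scale
```

The noise enters multiplicatively on the case scale, which is how a model of this kind can carry large environmental fluctuations without any demographic (offspring) variance.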
The Kalman Filter (KF) applied to our poverty fit yielded joint, time-dependent predictions of the effective reproduction number ($R_t$) for the thirteen countries plotted in Fig. 2B (see Fig. 3 and Supplementary Material for the $R_t$ approximation). In all cases, $R_t$ declines but remains above one. Notably, poorer countries tend to have higher effective reproduction numbers. To illustrate the conceptual and practical benefits of complementing SEIR modeling with our SEG models, we conducted two numerical experiments. First, we compared the predictive qualities of our SEG modeling approach with the deterministic predictions of the best SEIR model variant fit in each country (Table 1). In the SEIR case, the projections consisted of the deterministic solution of the best-fitting ODE model, shown for four countries in Fig. 4. Similar results for other countries are in the Supplementary Material. For our SEG model, computing these projections amounted to simulating 50000 trajectories of 16 days using its maximum likelihood estimates. We then plotted, for the same four countries, the most probable path along with the inter-quartile (IQ) range of these 50000 simulated paths (Fig. 4). This comparison (analogous to stochastic forecasting of hurricane paths) clearly shows that the hierarchical model approach is at least as good as, or better than, the deterministic predictions (Fig. 4, Table 1). In every case, the future observations (data at the end of the observed time series, retained for testing and not used in fitting) are closer to the most probable path than to the deterministic predictions, or (for Colombia) equally close. Epidemic forecasts from the SEG model thus appear more reliable over longer horizons than the deterministic solutions. This property of the SEG model could be particularly useful in the face of sudden social, political, or public health policy changes in the context in which the epidemic is developing.
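Because a log-scale Gompertz model with Gaussian noise is linear and Gaussian, a Kalman filter gives its exact likelihood and filtered states. The sketch below assumes the hypothetical state-space form X(t) = a + c X(t-1) + E(t), Y(t) = X(t) + F(t), and then projects the filtered state forward to obtain the daily median and inter-quartile range of cases analytically (the quartiles of N are the exponentials of the quartiles of X). This analytic shortcut stands in for the 50000 simulated trajectories described above and is our illustration, not the authors' code; the $R_t$ approximation is not reproduced, and the case counts are made up.

```python
import math
from statistics import NormalDist

Z75 = NormalDist().inv_cdf(0.75)  # upper-quartile z-score of the standard normal


def kalman_filter(ys, a, c, var_env, var_obs, m0, v0):
    """Kalman filter for X[t] = a + c*X[t-1] + E, Y[t] = X[t] + F (log scale).
    Returns filtered means, filtered variances and the Gaussian log-likelihood."""
    m, v = m0, v0                    # prior mean and variance for X[0]
    means, variances, loglik = [], [], 0.0
    for t, y in enumerate(ys):
        if t > 0:                    # predict one step ahead
            m = a + c * m
            v = c * c * v + var_env
        s = v + var_obs              # variance of the one-step forecast of Y
        loglik -= 0.5 * (math.log(2 * math.pi * s) + (y - m) ** 2 / s)
        gain = v / s                 # update with the new observation y
        m += gain * (y - m)
        v *= 1.0 - gain
        means.append(m)
        variances.append(v)
    return means, variances, loglik


def forecast_quartiles(m, v, a, c, var_env, horizon):
    """Project latent log-cases forward (sampling error excluded);
    returns (Q1, median, Q3) of cumulative cases for each future day."""
    out = []
    for _ in range(horizon):
        m = a + c * m
        v = c * c * v + var_env
        sd = math.sqrt(v)
        out.append((math.exp(m - Z75 * sd), math.exp(m), math.exp(m + Z75 * sd)))
    return out


# Made-up cumulative counts and hypothetical parameters, for illustration only.
ys = [math.log(n) for n in (20, 27, 35, 44, 56, 69, 84, 101)]
means, variances, ll = kalman_filter(ys, a=0.3, c=0.98, var_env=0.01, var_obs=0.001, m0=ys[0], v0=0.1)
bands = forecast_quartiles(means[-1], variances[-1], a=0.3, c=0.98, var_env=0.01, horizon=10)
```

In practice the log-likelihood returned by the filter would be maximized over (a, c, var_env, var_obs) before forecasting; here the parameters are simply fixed.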
Risk projections and future waves

We developed an RPM tool that mirrors conservation biology approaches (18) to serially update quasi-extinction probabilities each time the time series of population abundances grows by one observation. Applied to SARS-CoV-2, this process involves using the past records of the cumulative number of cases to estimate the SEG model parameters and then using these to predict, for the near future (τ = 5 or 10 days), the probability that the number of cases will rise above a given critical threshold $n_{crit}$:

$$p_{n_{crit}}(n(t), \tau) = \Pr\left(N(t+\tau) \ge n_{crit} \mid N(t) = n(t)\right).$$

With every passing day $t'$, the estimate of $p_{n_{crit}}(n(t'), \tau)$ is updated. The resulting trend in $p_{n_{crit}}(n(t'), \tau)$ can be used to diagnose the near-future risk of an increase in the number of cases by any amount. This risk can then be propagated to compute the chances of needing the same number of extra intensive care units. The approach is illustrated for Costa Rica in Fig. 5. There, the probability of a spike of 50 cases or more declined over time, indicating that the epidemic was waning. This approach is also applicable to time series of deaths, for studies aiming to assess trends in mortality risk. Finally, the advantage of introducing non-standard contact rate functions that include heterogeneity is that they can produce non-linear phenomena such as bistability, which, in the face of environmental stochasticity, explains how a second wave can arise without a phenomenological environmental forcing function (see Fig. 6 in (32)).

We have used a multi-model approach to better understand and predict the SARS-CoV-2 dynamics in fourteen countries of Latin America. Where possible, we incorporated inhomogeneous mixing and/or the effect of urban poverty. The models we examined and compared included compartmental SEIR models, models with demographic stochasticity, and models with environmental noise plus sampling error and a poverty random effect (the SEG model).
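Under an assumed log-scale Gompertz form X(t+1) = a + c X(t) + E(t), E(t) ~ Normal(0, σ²), the distribution of X(t+τ) given X(t) = ln n(t) is Gaussian, with mean $c^\tau \ln n(t) + a(1 + c + \dots + c^{\tau-1})$ and variance $\sigma^2(1 + c^2 + \dots + c^{2(\tau-1)})$, so $p_{n_{crit}}(n(t), \tau)$ has a closed form. The sketch below obtains that mean and variance by iterating the one-step recursions, evaluates the exceedance probability, and recomputes it day by day as new counts arrive. For simplicity it holds the parameters fixed and ignores sampling error, whereas the procedure described above re-estimates them with each new observation; parameter values, counts and function names are hypothetical.

```python
import math
from statistics import NormalDist


def exceedance_probability(n_t, n_crit, tau, a, c, sd_env):
    """P(N(t + tau) >= n_crit | N(t) = n_t) under the assumed model
    X[t+1] = a + c * X[t] + E, E ~ Normal(0, sd_env**2), with X = ln N."""
    mean = math.log(n_t)
    var = 0.0
    for _ in range(tau):  # iterate the one-step mean and variance recursions
        mean = a + c * mean
        var = c * c * var + sd_env ** 2
    if var == 0.0:        # no noise: the projection is deterministic
        return 1.0 if mean >= math.log(n_crit) else 0.0
    z = (math.log(n_crit) - mean) / math.sqrt(var)
    return 1.0 - NormalDist().cdf(z)


def risk_trend(series, spike, tau, a, c, sd_env):
    """For each day t', the probability of `spike` or more new cases within tau days."""
    return [exceedance_probability(n, n + spike, tau, a, c, sd_env) for n in series]


# Hypothetical cumulative counts; risk of 50 or more extra cases within 5 days.
cases = [300, 330, 355, 375, 390, 401, 410]
trend = risk_trend(cases, spike=50, tau=5, a=0.2, c=0.968, sd_env=0.03)
```

A declining `trend` would be read the same way as the Costa Rica example: the chance of a spike of the chosen size is shrinking as the epidemic slows.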
We used the SEG model to illustrate how, with only time series of cumulative SARS-CoV-2 cases, countries with scant public health resources can make practical and conceptual advances in understanding the pandemic and forecasting its effects. Our SEG model is one of a suite of models aimed at decomposing the contributions of demographic, environmental and individual heterogeneities to observed time series data. Accurately accounting for the factors shaping the variance of the growth rate of N(t) yields better forecasts (Fig. 4) (20, 30) and the power to estimate short-term trends in the risk of epidemic growth (Fig. 5). These projections, along with our finding that urban poverty shapes the region-wide dynamics of the SARS-CoV-2 pandemic in Latin America, highlight the potential power of our approach. Developing reliable tools to better understand and predict complex epidemiological processes in poor countries depends on mathematical and statistical approaches attuned to a multiplicity of realities. Yes, more data are needed and always will be. Mathematical "microscopes" tracking the complexities of human behavior are also needed. Yet here we show that fundamental ecological principles can illuminate the uncertain fate of countries in need, using only the most readily available source of information in these countries: the reported time series of the total number of cases up to any given day.

References
The moment to see the poor
An introduction to stochastic processes with applications to biology
Theoretical population biology
Proceedings of the Royal Society of London. Series B: Biological Sciences
Parameter estimation for Markov chain models: fundamental statistical concepts and computer intensive methods (National Institute for the Mathematical and Biological Synthesis)
Tools for statistical inference

Funding for this study was provided by the Universidad de San Carlos de Guatemala for J.A.P., the Universidad del Norte for J.P.G.
and the National Institutes of Health Grant 1R01GM117617 to J.M.P., R.D. Holt and J.K. Blackburn.

Figs. S1 to S14; References (33-46)