key: cord-0791573-q7qdkqm2
authors: Kong, J. D.; Tekwa, E.; Gignoux-Wolfsohn, S.
title: Social, economic, and environmental factors influencing the basic reproduction number of COVID-19 across countries
date: 2021-01-25
DOI: 10.1101/2021.01.24.21250416
sha: 26f5f5c45e95944f999f6df351684450862d0ea7
doc_id: 791573
cord_uid: q7qdkqm2

Objective: To assess whether the basic reproduction number (R0) of COVID-19 differs across countries, and which national-level demographic, social, and environmental factors characterize initial vulnerability to the virus.

Methods: We fit logistic growth curves to reported daily case numbers up to the first epidemic peak; this fitting estimates R0. We then use a generalized additive model to discern the effects of the covariates on R0, and include 5 random-effect covariates to account for potential differences in testing and reporting that can bias the estimated R0.

Findings: We found that the mean R0 is 1.70 (S.D. 0.57), with a range between 1.10 (Ghana) and 3.52 (South Korea). We identified four factors as having strong relationships with R0: population between 20 and 34 years old (youth), population residing in urban agglomerations of over 1 million (city), social media use to organize offline action (social media), and GINI income inequality. Intermediate levels of youth and GINI inequality are associated with high R0, as are high city population and high social media use. Environmental and climate factors were not found to have strong relationships with R0.

Conclusion: Studies that aim to measure the effectiveness of interventions should account for the intrinsic differences between populations.

The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has passed the first peak in the majority of countries in the world.
Scientists, health officials, and citizens have tried to anticipate and explain why the epidemic initially (i.e., before novel interventions) unfolded differently among countries, but only now have the relevant data reached sufficient global coverage and temporal length to support statistical analyses. Existing studies that jointly examine some of the factors that may contribute to differences among countries generally use metrics such as mortality, daily and cumulative case numbers, or the effective reproduction number (1-4). These metrics are time-varying and sensitive to differences in reporting and testing, and are therefore not easily comparable across countries. For instance, a country that reduces testing will see its reported cases drop, so raw case counts cannot be compared directly across countries.

A key metric, R0, has the practical advantage of being reliably estimable (5) and comparable across countries even if testing and reporting rates differ, so long as these rates are either constant or change in roughly the same way over time. R0 is the basic reproduction number, which indicates how many secondary infections are caused by an infected individual at the beginning of an epidemic (6). Without interventions, a fraction 1 - 1/R0 of the population must be infected or immunized before the epidemic starts to decline (the herd-immunity threshold). For example, an R0 of 3 implies that ⅔ of the population would have to be infected or immunized. Estimates of R0 for COVID-19 range from 1.4 (7) to 8.9 (8), with a likely value of 2.5 (9). Many studies either implicitly assume or are understood to imply that R0 is intrinsic to the infectious disease (9), but it is increasingly acknowledged that many non-interventive factors could produce heterogeneity in R0 among local populations or countries (10).
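As a quick check of the 1 - 1/R0 arithmetic in the paragraph above, the sketch below applies it to the R0 values reported in this paper's abstract (1.10 for Ghana, the cross-country mean of 1.70, and 3.52 for South Korea) and to the worked example of R0 = 3. It assumes a single, homogeneously mixing population, which is the setting in which the formula holds; the function name `immunity_fraction` is illustrative only.

```python
def immunity_fraction(r0: float) -> float:
    """Fraction 1 - 1/R0 of a homogeneously mixing population that must be infected or immunized."""
    if r0 <= 1.0:
        return 0.0  # with R0 <= 1 the epidemic cannot grow, so there is no threshold to reach
    return 1.0 - 1.0 / r0


# R0 values quoted in the abstract, plus the worked example from the text.
examples = {
    "Ghana (lowest estimate)": 1.10,
    "cross-country mean": 1.70,
    "South Korea (highest estimate)": 3.52,
    "worked example": 3.0,
}
for label, r0 in examples.items():
    print(f"{label}: R0 = {r0:.2f} -> {immunity_fraction(r0):.1%}")
```

The printed fractions range from about 9.1% (R0 = 1.10) to about 71.6% (R0 = 3.52), with 66.7% for the worked example, which shows how much the cross-country spread in R0 matters.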
Interventive responses that occur during the initial exponential phase of COVID-19 can be understood as proximate causes of differences in R0 across populations, but ultimately they are likely pre-adaptations anchored on existing social, demographic, and environmental factors. Later interventions generally affect Re, the effective reproduction number at any given time during the epidemic (4).

Our goal is to use a diverse and comprehensive set of demographic, social, and environmental-climatic factors to begin explaining differences in the initial dynamics of COVID-19 across countries. The predictors are non-contemporary with COVID-19, meaning they were measured before the current epidemic began. The dependent variable is the basic reproduction number R0, which is derived from the maximum growth rate of COVID-19 (number of additional hosts infected per infected individual per day) within a country. R0 can be estimated from the beginning of epidemic curves (5). The results in this study cannot be used to infer the eventual epidemic sizes among countries, which are still unfolding and can be very different from the initial dynamics due to novel interventions. We exclude proximal explanations of R0, such as policies enacted during the initial rise of COVID-19, because such explanations would introduce statistical endogeneity: the initial epidemic growth may have partly caused the responses, so the responses cannot simply be used as predictors. Instead, our study focuses on how pre-existing country characteristics can explain the initial growth phases of COVID-19, although still without implying causation. We did not attempt to include all possibly relevant covariates, because correlations are high even within a limited set, and because the limited number of countries dictates that a small subset be preselected in order to retain sufficient degrees of freedom for statistical analyses. Observed correlations between the covariates tested here and R0 may be caused by any number of other covariates that correlate with the identified covariates. Observed relationships should therefore be used for hypothesis generation and further investigation.

Covariates chosen belonged to seven categories: demographics, disease, economics, environment, habitat, health, and social. All of these categories have been suggested previously as possible factors in COVID-19 transmission. The factors most commonly studied previously were temperature (11-24), pollution (13,25-31), precipitation/humidity (18,32,33), population density (34,35), age structure (1,36,37), and population size (1,11,31). For these and additional covariates either previously studied or only mentioned in the media, we rely on statistics measured at the national level. A review of previously found effects on initial COVID-19 epidemic rates related to R0 is documented in Table 1. We examined these categories simultaneously in order to better understand which group may have a larger influence on R0 and should therefore be investigated further at both the national and other scales. This analysis is not meant to be exhaustive or definitive, but rather to help reveal baseline epidemiological differences across countries, shape the direction of future research on COVID-19, and improve understanding of infectious disease transmission in general.

(medRxiv preprint doi: https://doi.org/10.1101/2021.01.24.21250416; this version posted January 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.)
All data and code are available in a GitHub repository (38).

The basic reproduction number R0 (the dependent variable) is given by the formula (39) […] to fitting an exponential curve to early case numbers, given that case numbers do plateau in reality. In addition, the logistic growth model performs as well as or better than more complicated models when confronted with data (5,44). Mechanistic models with multiple compartments (45) and with time-dependent rates (46,47) may be more realistic for COVID-19 outbreaks that in some places exhibit multiple peaks, but such models contain more parameters, require much more data, and are statistically harder to infer reliably. Such complexity is also likely unnecessary to describe the initial outbreaks, which appear qualitatively logistic (Figure 1). In the logistic growth model, the cumulative case number I is given by: […] count, because some countries have lingered near peak daily count for much longer than a logistic growth model would predict, which would pull the model peak later than the actual date of peak incidence and thereby underestimate r. We manually checked each time series and ensured that the highest daily count occurred only during a first peak. We included all countries that were at least 6 days into a period with at least 30 daily cases as of July 29, 2020, after truncating at the peak. We eliminated countries whose logistic growth model R² was less than 0.9.
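The fitting steps described above (truncate each country's series at its first peak, fit a logistic curve to cumulative cases, discard fits with R² < 0.9, then convert the growth rate r to R0) can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' code, which is in their repository (38). The three-parameter form K / (1 + exp(-r (t - tm))) is a standard logistic parameterization assumed here, because the paper's own equation is missing from this copy. The fit is a simple grid search over r and tm, with K given in closed form for each pair. The conversion to R0 assumes a gamma-distributed generation interval, for which the relation R0 = 1/M(-r) from reference (39), with M the moment generating function of the generation interval, becomes (1 + r/b)^a with shape a and rate b. All function names, the grid bounds, and the generation-interval mean and SD supplied by the caller are hypothetical.

```python
import math


def logistic_shape(t, r, tm):
    """Assumed logistic form 1 / (1 + exp(-r (t - tm))), written to avoid overflow."""
    z = r * (t - tm)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def truncate_at_peak(daily):
    """Keep daily counts up to and including the first day with the highest count."""
    peak = max(range(len(daily)), key=lambda i: daily[i])
    return daily[: peak + 1]


def fit_logistic(days, cumulative, r_grid, tm_grid):
    """Least-squares fit of cumulative cases to K * logistic_shape(t, r, tm).

    Grid search over (r, tm); for each pair the model is linear in K,
    so the best K is sum(s * y) / sum(s * s).
    """
    best = None
    for r in r_grid:
        for tm in tm_grid:
            s = [logistic_shape(t, r, tm) for t in days]
            k = sum(si * yi for si, yi in zip(s, cumulative)) / sum(si * si for si in s)
            sse = sum((yi - k * si) ** 2 for si, yi in zip(s, cumulative))
            if best is None or sse < best[0]:
                best = (sse, k, r, tm)
    sse, k, r, tm = best
    mean_y = sum(cumulative) / len(cumulative)
    sst = sum((yi - mean_y) ** 2 for yi in cumulative)
    r2 = 1.0 - sse / sst if sst > 0 else 0.0
    return {"K": k, "r": r, "tm": tm, "r2": r2}


def r0_from_growth_rate(r, gi_mean, gi_sd):
    """R0 = 1 / M(-r) for an assumed gamma-distributed generation interval."""
    shape = (gi_mean / gi_sd) ** 2  # gamma shape a
    rate = gi_mean / gi_sd ** 2     # gamma rate b (per day)
    return (1.0 + r / rate) ** shape


def country_r0(daily, gi_mean, gi_sd, min_r2=0.9):
    """Hypothetical pipeline: truncate at peak, fit, screen on R^2, convert r to R0."""
    daily = truncate_at_peak(daily)
    cumulative, total = [], 0.0
    for count in daily:
        total += count
        cumulative.append(total)
    days = list(range(len(cumulative)))
    r_grid = [i / 100 for i in range(1, 61)]                # 0.01 to 0.60 per day (assumed bounds)
    tm_grid = [j / 4 for j in range(0, 6 * len(days) + 1)]  # 0 to 1.5x the window, step 0.25
    fit = fit_logistic(days, cumulative, r_grid, tm_grid)
    if fit["r2"] < min_r2:
        return None  # mirrors the exclusion of fits with R^2 < 0.9
    return fit, r0_from_growth_rate(fit["r"], gi_mean, gi_sd)
```

A production fit would normally use a nonlinear least-squares routine rather than a grid; the grid keeps the sketch dependency-free and makes the search bounds explicit. The inclusion rule based on days with at least 30 daily cases is not implemented here.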
Countries were assigned to the regions of […]

Covariates

Next, we compiled data on predictors for each of the countries studied from seven categories (demographics, disease, economics, environmental, habitat, health, and social) from publicly available databases (Table 1). We chose covariates that are diverse, specific, and do not obviously covary; for example, gross domestic product per capita was not used because it covaries with many other, more precise covariates. In addition, we chose covariates that are comparable across countries; for example, we chose nurses per capita over doctors per capita because in many countries nurses are the primary caregivers. For each predictor, we used the most recent available data, which ranged from 2000 to 2019. When appropriate, data reported in absolute numbers were divided by total population to obtain per capita figures. Data with highly skewed distributions were log-transformed, and all distributions were centred and standardized before regression. Four additional covariates were examined but were eliminated through sequential variance inflation factor (VIF) analysis based on the mixed-effect generalized additive model described in Section 2.3 (adapted from 'rms::vif' in RStudio 1.2.5033). The goal was to reduce the collinearity of the final covariate set, so that we could make better statistical attributions of how each covariate affects R0. In the analysis, we eliminated the covariate with the highest VIF and iterated the elimination procedure until a representative and epidemiologically reasonable set was left (the set in Table 1). The eliminated covariates were: 1. population greater than 65 years old (50), 2. life expectancy at birth (50), 3. hospital beds per capita (50), and 4. mortality rate attributed to unsafe water, unsafe sanitation and lack of hygiene (51).
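The sequential VIF elimination described above can be illustrated with an ordinary linear VIF. This is a simplification assumed for the sketch: the authors computed VIFs from their mixed-effect GAM via an adaptation of `rms::vif`, whereas here VIF_j is the j-th diagonal element of the inverse correlation matrix of the standardized covariates, which equals 1/(1 - R_j²) from regressing covariate j on all the others. The stopping threshold of 5 is a hypothetical rule of thumb; the paper instead stopped once a representative and epidemiologically reasonable set remained. Function names are illustrative.

```python
import math


def standardize(column):
    """Centre to mean 0 and scale to unit (population) standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / sd for x in column]


def correlation_matrix(columns):
    """Pearson correlations computed from standardized columns."""
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    size = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(size)] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariates are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def vifs(columns):
    """Linear VIFs: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


def sequential_vif_elimination(data, threshold=5.0):
    """Repeatedly drop the highest-VIF covariate until all VIFs are below threshold.

    `data` maps covariate names to equal-length lists of values.
    """
    data = dict(data)
    dropped = []
    while len(data) > 1:
        names = list(data)
        scores = dict(zip(names, vifs([data[name] for name in names])))
        worst = max(scores, key=scores.get)
        if scores[worst] < threshold:
            break
        dropped.append(worst)
        del data[worst]
    return list(data), dropped
```

Highly skewed covariates would be log-transformed (for example with `math.log`) before being passed in, as the text describes; a constant covariate has zero standard deviation and would need to be removed first.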
After compiling the variables, we fitted the generalized additive model (GAM) using the 'mgcv' package in RStudio 1.2.5033 to analyze the effects of the covariates listed in Table 1 on R0 across the globe. The covariates are standardized for effect comparisons. The main advantage of GAMs over traditional regression methods is their capability to model non-linear relationships (a common feature of many datasets) between a response variable and multiple covariates using non-parametric smoothers. The general formula of a GAM is: […]

There are ongoing efforts to correct for these temporal biases based on delayed mortality rates (52,55), but the results are currently not credible for smaller countries with poor reporting. At this point we must rely on the reported case numbers, and use random effects to partially account for possible biases.

We use the anova() function in R to compare the candidate models and see which one provides the most parsimonious fit to the data. Because these models differ in their use of the random variables, ANOVA tests whether including random effects leads to a significant improvement over using just the given covariates without any random variables. For the goodness-of-fit test, we use a chi-squared test.

[…]

We found that across the globe, R0 (1.70 ± 0.57 S.D.) was variable and on average slightly lower than previous estimates (8,9). However, previous studies focussed on data from China and other
countries with early epidemic onset, which our estimates show to have higher than average R0. We identified four factors (youth, city, social media, and GINI inequality) as having strong relationships with COVID-19 R0 across countries. Environmental factors, which are the factors most commonly identified previously (temperature (11-24), pollution (13,25-31), precipitation/humidity (18,32,33)), did not have strong relationships with R0 when other factors were considered simultaneously, although pollution, temperature, and humidity all have positive associations.

The positive relationship between social media usage and R0 observed here has not been previously found for COVID-19. The trend may be proximally caused by the propagation of false information on social media, for example information downplaying the potential danger of COVID-19 or the effectiveness of masks and social distancing, or propping up conspiracy theories about the disease (56). One study showed that more than 80% of online claims about COVID-19 were false at the beginning of the pandemic (57). These proximal mechanisms, at least at the initial onset of COVID-19, seem to have overridden the potential benefits of social media as an accurate information spreader that allows […] contact rates and conforms with the main empirical trend (34). However, it is unclear why a low level of city dwelling is also associated with a high R0, although the rise is relatively slight. In comparison to the quadratic effects of youth and GINI inequality, the effect of city dwelling appears close to monotonic.

Our analysis is based on coarse-grained country-level case data, without explicitly correcting R0 estimates using temporal trends in testing, reporting, and mortality.
The factors we analyzed hold across regions within a country to some extent, but it can be argued that each factor or its substitute can be measured more locally (10,65), resulting in better statistical power. R0 can also be estimated using less phenomenological, more mechanistic models such as multiple-compartment (e.g., susceptible-exposed-infectious-recovered-susceptible) (45), social network (66), or time-varying (46,47) models. However, these approaches are more data intensive, and the required data are not currently available in many countries. Our country-level analysis of R0 serves as a coarse-grained baseline for future analyses pending data availability. An international perspective like the one we took here can help us understand COVID-19 in a broader context, even though we sacrifice the ability to infer local causality.

We emphasize that R0 is not indicative of eventual outbreak sizes or the nature of subsequent waves. Given the same population, a higher R0 can lead to a larger outbreak, but this does not account for intervention measures that occur after the initial epidemic growth. For instance, a high initial epidemic growth may provide a strong signal to both citizens and governments, which then may mount a […] (48,49)). The dynamic coupling between R0 and the response is one reason why it is harder to infer the effectiveness of intervention without taking into account how pre-existing characteristics relate to initial epidemic growth. It is reasonable to believe that early interventions are actually symptoms of pre-existing social, demographic, and environmental characteristics and are not easy to implement in some countries.
The factors influencing R0 identified here reflect the naive or intrinsic factors that may determine a […]

Ethics and consent: All authors have been personally and actively involved in substantial work leading to the paper, and will take public responsibility for its content.

Competing Interest Statement: The authors declare no conflict of interest.

A re-analysis in exploring the association between temperature and COVID-19 transmissibility: an ecological study with 154 Chinese cities.
Air Pollution and COVID-19: The Role of Particulate Matter in the Spread and Increase of COVID-19's Morbidity and Mortality. IJERPH.
Role of the chronic air pollution levels in the Covid-19 outbreak risk in Italy. Environmental Pollution.
Effects of meteorological conditions and air pollution on COVID-19 transmission: Evidence from 219 Chinese cities. Science of The Total Environment.
Is there an association between the level of ambient air pollution and COVID-19? American Journal of Physiology-Lung Cellular and Molecular Physiology.
Effect of ambient air pollutants and meteorological variables on COVID-19 incidence.
Regional air pollution persistence links to COVID-19 infection zoning.
The spread of 2019-nCoV in China was primarily driven by population density. Comment on "Association between short-term exposure to air pollution and COVID-19 infection: Evidence from China" by Zhu et al. Science of The Total Environment.
Impacts of transportation and meteorological factors on the transmission of COVID-19. International Journal of Hygiene and Environmental Health.
Association between climate variables and global transmission of SARS-CoV-2. Science of The Total Environment.
Crowding and the shape of COVID-19 epidemics.
The role of the urban settlement system in the spread of Covid-19 pandemic. The Italian case. TeMA - Journal of Land Use.
Rich at risk: socio-economic drivers of COVID-19 pandemic spread.
The relatively young and rural population may limit the spread and severity of COVID-19 in Africa: a modelling study. BMJ Glob Health.
Data and Code for: Social, economic, and environmental factors influencing the basic reproduction number of COVID-19 across countries.
How generation intervals shape the relationship between growth rates and reproductive numbers.
Serial Interval of COVID-19 among Publicly Reported Confirmed Cases. Emerg Infect Dis.
Temporal dynamics in viral shedding and transmissibility of COVID-19.
Generalized logistic growth modeling of the COVID-19 outbreak: comparing the dynamics in the 29 provinces in China and in the rest of the world.
The impact of non-pharmaceutical interventions, demographic, social, and climatic factors on the initial growth rate of COVID-19: A cross-country study. Science of The Total Environment.
Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period.
Mathematical modelling of COVID-19 transmission and mitigation strategies in the population of Ontario.
Evaluating the Effectiveness of Social Distancing Interventions to Delay or Flatten the Epidemic Curve of Coronavirus Disease. Emerg Infect Dis.
European Centre for Disease Prevention and Control. Download today's data on the geographic distribution of COVID-19 cases worldwide.
Trending on Social Media: Integrating Social Media into Infectious Disease Dynamics.
Income Inequality and COVID-19 Cases and Mortality in the USA.
Changing Age Distribution of the COVID-19 Pandemic - United States.
Demographic science aids in understanding the spread and fatality rates of COVID-19.
COVID-19 working group. Age-dependent effects in the transmission and control of COVID-19 epidemics.
Impact assessment of non-pharmaceutical interventions against coronavirus disease 2019 and influenza in Hong Kong: an observational study. The Lancet Public Health.
Estimation of country-level basic reproductive ratios for novel SARS-CoV-2/COVID-19) using synthetic contact matrices.
Hypertension, the renin-angiotensin system, and the risk of lower respiratory tract infections and lung injury: implications for COVID-19. Cardiovascular Research.
COVID-19 pandemic: The African paradox.
The World Bank. Climate Data API.

Note to Table 1: Previous effects on epidemic rates are not necessarily on the basic reproduction number R0, but rather on cumulative case load, daily cases at certain stages, or the effective reproduction number. Effects on epidemic rates are recorded as positive (+), negative (-), insignificant (0), or non-monotonic (U-shape or n-shape). Effects accompanied by (?) are theoretical.