key: cord-0905197-fpmd9nzf authors: Ventura, L.; Vitali, M.; Romano Spica, V. title: BCG vaccination and socioeconomic variables vs Covid-19 global features: clearing up a controversial issue date: 2020-05-26 journal: nan DOI: 10.1101/2020.05.20.20107755 sha: 195055c998033ab7dcd36ab7b645e8a670f14465 doc_id: 905197 cord_uid: fpmd9nzf Background: The Covid-19 pandemic is characterized by extreme variability in the outcome distribution and mortality rates across different countries. Some recent studies suggested an inverse correlation with BCG vaccination at the population level, while others rejected this hypothesis. In order to address this controversial issue, we performed a strict epidemiological study collecting data available on a global scale, considering additional variables such as cultural-political factors and the coverage of other vaccinations. Methods: Data on 121 countries, accounting for about 99% of Covid-19 cases and deaths globally, were obtained from the Johns Hopkins Coronavirus Resource Center, World Bank, International Monetary Fund, United Nations, Human Freedom Report, and BCG Atlas. Statistical models used were Ordinary Least Squares, Tobit and Fractional Probit, implemented in Stata/MP 16. Results: Based on our results, countries where BCG vaccination is or has been mandated in the last decades have seen a drastic reduction in Covid-19 diffusion (-80% on average) and mortality (-50% on average), even controlling for the relative wealth of countries and their governmental health expenditure. A significant contribution to this reduction (respectively -50% and -13% on average) was also associated with outbreak onset during summer, suggesting a possible influence of seasonality. Other variables turned out to be associated, though to a lesser extent.
Conclusions: Relying on a very large dataset and a wide array of control variables, our study confirms a strong and robust association of Covid-19 diffusion and mortality with BCG vaccination and a set of socio-economic factors, opening new perspectives for clinical speculations and public health policies.
Since the novel coronavirus SARS-CoV-2 was initially detected in Wuhan, China, in December 2019 (1), more than two million cases of Covid-19 have been confirmed worldwide, with a death toll of about 140,000 by April 2020, as reported by WHO (https://www.who.int/docs/defaultsource/coronaviruse/situation-reports/20200417-sitrep-88-covid-191b6cccd94f8b4f219377bff55719a6ed.pdf?sfvrsn=ebe78315_6). One of the puzzles associated with the spread of this pandemic is the extreme geographic (2) and ethnic (3) variability of its outcomes, both in terms of contagion and mortality, with inevitable economic implications (4). We have witnessed an increased variance in fatality rates as more countries were hit by the virus, generating a clustering of countries in terms of incidence and mortality rates (MR), both across and within affected continents. In Europe, for example, the case fatality rate (CFR) is below or around 3% in countries such as Portugal, Ireland, Norway and Finland (3.45%, 3.79%, 2.35% and 2.35%, respectively), but much higher, with rates hovering around and above 10%, in countries such as [...]
[...] (14), last updated in 2017 in the online version (http://www.bcgatlas.org/index.php). Data about human freedom come from the 2019 Human Freedom Report by the Fraser Institute (https://www.cato.org/sites/cato.org/files/human-freedom-index-files/human-freedom-index-2018-revised.pdf). Data from a total of 121 countries, out of the 209 that reported cases of Covid-19, accounting for about 99% of both confirmed cases and deaths, have been used. The countries in the analysis, listed in the supplementary appendix, have been chosen in view of the availability of observations for the covariates.
This version of the preprint was posted on medRxiv on May 26, 2020 (https://doi.org/10.1101/2020.05.20.20107755) and was not certified by peer review. The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
The set of dependent and independent variables is reported in Table 1. In particular, we used confirmed cases per million inhabitants as a proxy for the intensity of contagion; the number of cases 15 days earlier as a proxy for the stage of the diffusion of the virus; population in the largest city as a proxy for density and the degree of urbanization; life expectancy at birth as a comprehensive health indicator, and as a proxy for the share of aged people in the population; and the latitude to define both the season as of April 17th (above or below the Equator) and tropical countries (those countries whose latitude, as defined by the corresponding variable in the World Development Indicators, lies between the two tropics). As for BCG vaccination policy, two alternative continuous measures were constructed and used for robustness checks: the BCG coverage, as reported in national surveys in various years, and the years of absence of mandated vaccination, until 2020. Coverage rates for different vaccines (Hepatitis B, Measles and DPT) were also used, to disambiguate the effects of BCG from those of a more general vaccination policy. Among the variables proxying for economic ties with China, where the epidemic first appeared, we include imports from China, and the levels of inward and outward Foreign Direct Investment (FDI) relative to China. Finally, to proxy for compliance with the lockdown measures implemented by the various governments, we use the Index of Human Freedom (HFI), a weighted average of 79 distinct indicators (37 for the personal freedom subindex and 42 for the economic freedom subindex), each one ranging from 0 to 10, with 10 representing the most freedom. The HFI therefore ranges from 0 to 10, in increasing order of freedom (https://www.cato.org/sites/cato.org/files/human-freedom-index-files/human-freedom-index-2018-revised.pdf).
We used Gross Domestic Product (GDP) per capita, and private and general government health expenditure, to proxy for the countries' level of development (general and of their health system) and for the countries' testing capability (more income and a richer health system should be positively correlated with more Covid-19 testing).
To model our dependent variables, we used both ordinary least squares (OLS), as a reference estimator, and nonlinear estimation methods. In particular, Tobit regression, which estimates both the impact of the covariates on the probability of a country reporting more than 100 cases as of April 17th and their effect on relative diffusion, was our preferred estimation method. The reported coefficients in the Tobit regression represent the marginal effects of the explanatory variables on the outcome variable, after accounting for the inclusion of countries in the high incidence group. The second and third outcome variables, i.e. CFRs and MRs, were first modelled by ordinary least squares to obtain benchmark estimates, and then by fractional Probit regression to account for the fractional nature of the dependent variables (15). When the dependent variable is a fraction, as with CFRs and MRs, using a log-odds transformation or Tobit regressions with lower and upper limits set to 0 and 1 may yield biased results (15, 16). Therefore, fractional regressions will be our preferred estimation method for fatality and mortality rates.
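As an illustration of the fractional Probit idea described above, the following is a minimal sketch in plain Python, not the Stata code used for the paper. It assumes the usual formulation in which the conditional mean of a fraction is the standard normal CDF of a linear index and the coefficients maximize a Bernoulli quasi-log-likelihood. The function names, the toy covariate rows and the coefficient values are hypothetical and are not taken from the study.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def fractional_probit_mean(x, beta):
    """Conditional mean E[y | x] = Phi(x'beta) for a fraction y in [0, 1]."""
    index = sum(xi * bi for xi, bi in zip(x, beta))
    return _STD_NORMAL.cdf(index)


def quasi_loglik(beta, X, y):
    """Bernoulli quasi-log-likelihood; y may be any fraction in [0, 1]."""
    total = 0.0
    for row, yi in zip(X, y):
        p = fractional_probit_mean(row, beta)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += yi * math.log(p) + (1.0 - yi) * math.log(1.0 - p)
    return total


def average_dummy_effect(beta, X, k):
    """Average change in the predicted fraction when dummy k goes from 0 to 1."""
    diffs = []
    for row in X:
        on = list(row)
        off = list(row)
        on[k], off[k] = 1.0, 0.0
        diffs.append(fractional_probit_mean(on, beta)
                     - fractional_probit_mean(off, beta))
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Hypothetical toy data: columns are (intercept, vaccination dummy).
    X = [[1.0, 1.0], [1.0, 0.0]]
    y = [0.02, 0.10]            # made-up fatality fractions
    beta = [-1.5, -0.5]         # made-up coefficients, not estimates
    print(quasi_loglik(beta, X, y))
    print(average_dummy_effect(beta, X, 1))
```

In practice the coefficients would be obtained by maximizing `quasi_loglik` numerically and reported with robust standard errors; the average discrete effect of a dummy is the quantity that is comparable to a change in percentage points.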
For consistency and comparison purposes, all models included the same set of explanatory variables. Moreover, ordinary least squares and fractional regressions also account for heteroskedasticity, by using robust variance-covariance estimators. All statistical analyses have been performed using Stata/MP 16 for Windows.
Table 2 reports the regression results for relative incidence (reported cases over total population) obtained by OLS, Tobit without controlling for other vaccines, and Tobit controlling for other vaccines. For both OLS and Tobit, the estimated coefficients represent the marginal effect of the covariates on the outcome variable. Table 2 shows large and strongly significant effects for per capita gross domestic product and the Human Freedom Index (positive) and for the summer season and BCG vaccination (negative), even controlling for other vaccinations. Table 3 contains estimated coefficients, with corresponding standard errors and marginal effects, for the CFR and MR fractional regressions. This table reveals large and strongly significant effects for health expenditure and tropical position (positive) and for the summer season and, above all, BCG vaccination (negative), even controlling for other vaccinations. Table 4 contains the results of a robustness analysis on relative incidence, where alternative measures for BCG have been used as explanatory variables instead of the BCG dummy. Table 5 contains the results of the same robustness analysis performed on the CFRs and MRs. Both Tables 4 and 5 confirm the results obtained with the previous regressions.
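To make the Tobit specification used for relative incidence more concrete, here is a minimal sketch of the log-likelihood of a left-censored Tobit model in plain Python. It is an illustration under stated assumptions rather than the authors' Stata implementation: the censoring limit `lower`, the function names and the toy numbers are hypothetical, and the marginal effect shown is the textbook effect on the expected observed outcome.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def _linear_index(x, beta):
    return sum(xi * bi for xi, bi in zip(x, beta))


def tobit_loglik(beta, sigma, X, y, lower=0.0):
    """Log-likelihood of y = max(lower, x'beta + e), with e ~ N(0, sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for row, yi in zip(X, y):
        mu = _linear_index(row, beta)
        if yi <= lower:
            # Censored observation: probability that the latent value is <= lower.
            total += math.log(_STD_NORMAL.cdf((lower - mu) / sigma))
        else:
            # Uncensored observation: normal density of the residual.
            z = (yi - mu) / sigma
            total += math.log(_STD_NORMAL.pdf(z)) - math.log(sigma)
    return total


def tobit_marginal_effect(beta, sigma, x, k, lower=0.0):
    """Effect of regressor k on E[y | x]: beta_k * P(latent value > lower)."""
    mu = _linear_index(x, beta)
    return beta[k] * _STD_NORMAL.cdf((mu - lower) / sigma)


if __name__ == "__main__":
    # Hypothetical toy data: columns are (intercept, vaccination dummy).
    X = [[1.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
    y = [0.0, 350.0, 620.0]         # made-up cases per million, first one censored
    beta, sigma = [500.0, -400.0], 200.0   # made-up parameters, not estimates
    print(tobit_loglik(beta, sigma, X, y))
    print(tobit_marginal_effect(beta, sigma, [1.0, 0.0], 1))
```

The second function shows why a Tobit coefficient and the effect on the observed outcome differ: the coefficient is scaled by the probability of being uncensored, so the two coincide only for observations far above the limit.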
All regressions feature the same set of explanatory variables, listed in the data section, but the tables only report statistically significant coefficients, together with their respective significance levels (10%, 5%, or 1%).
To the best of our knowledge, this is the first study assessing the impact of BCG vaccination on the diffusion and mortality of Covid-19 at the global level by controlling for a comprehensive set of social, economic, geographic and demographic variables. This greatly reduces the risk of spurious correlations among variables and confers high statistical robustness on the results, which are therefore more amenable to causal interpretation. From the second and third columns in Table 2, containing the statistically significant Tobit coefficients for relative incidence, we observe that the number of positive cases 15 days earlier has a very significant and sizeable positive coefficient, capturing the different stages of the epidemic across countries (a larger number of cases implies a higher probability of contagion). Among the demographic variables, population in the largest city has a very sizeable, positive and statistically significant coefficient, indicating that high urban density fosters the epidemic. Other demographic variables do not reach statistical significance, possibly in view of the high variability in the data. Lastly, the percentage of immigrants over total population is negatively correlated with the extent of the epidemic, but its coefficient is not statistically significant. Its sign, however, might be read in the light of the impact of the BCG vaccine, as explained below.
Or it might be read as resulting from the lower probability of those immigrant communities being tested, as suggested by Borjas (17). Among the geographical variables, the dummy for Summer (countries below the Equator) is negative, large, and highly statistically significant. This strongly reinforces the results of Ozdemir and colleagues (7), who found that the mean ratio of cases to population was higher in the Northern hemisphere. The magnitude of the coefficient is very large, and indicates that countries in the Southern hemisphere have, on average and all else equal, over 200 fewer cases per million (against a mean value of confirmed cases per million inhabitants of about 470). Seasonality is a key question for describing the trend of the pandemic and predicting future transmission dynamics (18). Among the economic and health policy variables, GDP has a large and highly significant coefficient in all specifications, which corresponds to our a priori expectations, in view of the maintained hypothesis of a positive correlation between income and the number of tests performed. In other words, including the per capita GDP variable makes it possible to control for the different testing policies implemented across countries. Domestic private health expenditure also features a positive coefficient, which can most likely be explained in the same way.
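The geographic dummies and the size of the seasonal coefficient discussed above can be sketched as follows. This is a minimal Python illustration, assuming that the Summer dummy is coded as a latitude below the Equator (as in the text) and that the tropics lie at roughly 23.44 degrees; the function names and the example latitudes are hypothetical.

```python
# Approximate latitude of the Tropics of Cancer and Capricorn, in degrees.
TROPIC_LATITUDE_DEG = 23.44


def summer_dummy(latitude_deg):
    """1 if the country lies below the Equator (the paper's Summer dummy), else 0."""
    return 1 if latitude_deg < 0 else 0


def tropical_dummy(latitude_deg):
    """1 if the latitude lies between the two tropics, else 0."""
    return 1 if abs(latitude_deg) <= TROPIC_LATITUDE_DEG else 0


def cases_per_million(confirmed_cases, population):
    """Confirmed cases per million inhabitants, the proxy for contagion intensity."""
    if population <= 0:
        raise ValueError("population must be positive")
    return confirmed_cases * 1_000_000 / population


def relative_effect(coefficient, sample_mean):
    """Size of a coefficient expressed as a share of the sample mean."""
    return coefficient / sample_mean


if __name__ == "__main__":
    # Hypothetical latitudes, not taken from the paper's dataset.
    for lat in (-33.9, 1.3, 48.0):
        print(lat, summer_dummy(lat), tropical_dummy(lat))
    # About 200 fewer cases per million against a mean of about 470.
    print(relative_effect(-200.0, 470.0))
```

Reading the figures quoted in the text this way, a difference of roughly 200 cases per million against a mean of about 470 corresponds to a reduction of somewhat more than 40% of the sample mean.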
The main and well-taken criticism, addressed by Curtis and colleagues (6) [...] In our study we avoid this problem by explicitly accounting for the relative wealth of countries and of their respective health systems, thus netting out the relationship between income and testing policies, and by using a large dataset including quite a few high income countries where the BCG vaccine is also mandatory (Japan, as a notable example). Closer economic ties to China exert an ambiguous effect. On the one hand, imports from China have a negative and statistically significant effect on the extent of the epidemic, most likely because countries closer to China were affected to a limited extent. On the other hand, FDI from China has a positive but limited effect, which might derive from the more frequent personal contacts between Chinese and western businessmen around the start of the crisis. In any case, if ties with China might have explained the diffusion of the epidemic at earlier stages, they no longer seem to do so. The HFI has a strong and significant impact on the incidence of the epidemic, suggesting that in freer countries lockdown measures were milder, and that compliance might have been lower in such countries. Most importantly, the BCG dummy variable has a strong and negative impact in the Tobit specifications. This finding corroborates, in a more comprehensive and robust framework, the findings presented in some recent contributions (7-10) arguing in favor of a correlation between universal BCG vaccination policy and reduced morbidity and mortality for COVID-19.
Another study by Dayal and Gupta (11) reaches similar conclusions, comparing the CFRs of countries where BCG re-vaccination is adopted vs. those countries where vaccination is practiced only once in a lifetime. As the significance of BCG might be spurious, driven by the correlation with other vaccinations, the third column in Table 2 [...] In fact, the most important effect on CFRs and MRs, as for relative incidence, is exerted by the BCG variable, which is associated with a strongly significant reduction of CFR and MR of, respectively, 4.5 percentage points (both with and without additional vaccination controls) and 1.9 percentage points (1.2 with additional vaccination controls). To get a relative idea of the magnitude of these estimates, note that the mean values of CFRs and MRs in the estimation sample are, respectively, 4.1% and 2.8%. Anecdotal evidence, especially in Europe, seems largely consistent with the large explanatory power of BCG on CFRs and MRs. We mentioned the wide and puzzling differences in CFRs and MRs between contiguous countries, such as Portugal and Spain, Ireland and the UK, Norway/Finland and Sweden. In all those pairs of countries, the first has mandatory BCG vaccination or had it until recent times, while the second does not.
Last but not least, the HFI turns out to have a negative and significant effect on the CFR. This, however, might only be due to the results reported above, i.e. the positive effect of HFI on the number of reported cases, which is the denominator of the CFR.
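The denominator argument made above for the HFI can be illustrated with a short Python sketch. It only assumes that the CFR is computed as deaths over reported cases, as stated in the text; the function name and the numbers are hypothetical and are not drawn from the dataset.

```python
def case_fatality_rate(deaths, reported_cases):
    """CFR as deaths divided by reported cases (the denominator of the CFR)."""
    if reported_cases <= 0:
        raise ValueError("reported_cases must be positive")
    return deaths / reported_cases


if __name__ == "__main__":
    deaths = 50  # made-up number of deaths, held fixed
    # Same deaths, more reported cases: the CFR falls mechanically.
    for reported in (1000, 2000):
        print(reported, case_fatality_rate(deaths, reported))
```

With deaths held fixed, any variable that raises the number of reported cases lowers the CFR, even if it has no effect on deaths, which is the mechanism the text suggests for the negative HFI coefficient.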
As a further check of the robustness of these results, additional regressions have been run by replacing the BCG dummy with two different continuous variables related to this vaccination policy. One is BCG coverage, as reported in national surveys. The other is the number of years of missing vaccination, until 2020. Unfortunately, data on coverage were available for a more limited number of countries, but even so the results seem noteworthy. Table 4 contains the results of two Tobit regressions of the relative number of cases, using the alternative BCG variables. Table 5 reports the results of fractional regressions on both CFRs and MRs, using the two alternative BCG variables. Tables 4 and 5 show a strong and negative impact for the BCG coverage variable, of a magnitude comparable to that obtained for the BCG variable in the previous analyses. Missing years of vaccination also feature a coefficient with the expected positive sign, but it is only significant in the case of relative incidence and CFRs.
Our study, by integrating a wide set of controls, attenuates, if not eliminates altogether, the effects of possible confounding factors affecting previous studies on BCG and Covid-19. The nonlinear, probabilistic methods used in the analyses confer additional statistical robustness on the results. In this way, we are able to confirm a robust and large effect of BCG vaccination on Covid-19 diffusion and mortality, and to uncover other noteworthy and potentially relevant statistical relationships. In this wider context, the association between BCG vaccination and Covid-19 diffusion and mortality is fully addressed from a comprehensive epidemiological perspective, which can support novel hypotheses as well as clinical or experimental studies in the field.
https://www.thelancet.com/action/showPdf?pii=S0140-6736%2820%2930757-1.
A novel coronavirus from patients with pneumonia in China
Emergence of a Novel Coronavirus, Severe Acute Respiratory Syndrome Coronavirus 2: Biology and Therapeutic Options
Racial health disparities and Covid-19 - Caution and Context
Combating COVID-19: health equity matters
An interactive web-based dashboard to track COVID-19 in real time
Considering BCG vaccination to reduce the impact of COVID-19
We deeply thank Charles Yuji Horioka and Yoko Niimi for discussions, key insights, comments, and crucial information about the main topic of this paper. Useful advice and help from Maria Ventura and Lory Marika Margarucci is also gratefully acknowledged.