key: cord-0709592-f3ctbxe5
authors: Pawlowski, Colin; Puranik, Arjun; Bandi, Hari; Venkatakrishnan, A. J.; Agarwal, Vineet; Kennedy, Richard; O’Horo, John C.; Gores, Gregory J.; Williams, Amy W.; Halamka, John; Badley, Andrew D.; Soundararajan, Venky
title: Exploratory analysis of immunization records highlights decreased SARS-CoV-2 rates in individuals with recent non-COVID-19 vaccinations
date: 2021-02-26
journal: Sci Rep
DOI: 10.1038/s41598-021-83641-y
sha: 7c21314e0bd87c9ca51cf679dced565919a6a12f
doc_id: 709592
cord_uid: f3ctbxe5

Clinical studies are ongoing to assess whether existing vaccines may afford protection against SARS-CoV-2 infection through trained immunity. In this exploratory study, we analyze immunization records from 137,037 individuals who received SARS-CoV-2 PCR tests. We find that polio, Haemophilus influenzae type-B (HIB), measles-mumps-rubella (MMR), Varicella, pneumococcal conjugate (PCV13), Geriatric Flu, and hepatitis A/hepatitis B (HepA–HepB) vaccines administered in the past 1, 2, and 5 years are associated with decreased SARS-CoV-2 infection rates, even after adjusting for geographic SARS-CoV-2 incidence and testing rates, demographics, comorbidities, and number of other vaccinations. Furthermore, age-, race/ethnicity-, and blood group-stratified analyses reveal a significantly lower SARS-CoV-2 rate among Black individuals who have received the PCV13 vaccine, with a relative risk of 0.45 at the 5-year time horizon (n: 653, 95% CI (0.32, 0.64), p-value: 6.9e−05). Overall, this study identifies existing approved vaccines that may be promising candidates for pre-clinical research and randomized clinical trials aimed at combating COVID-19.

Since the genome for SARS-CoV-2 was released on January 11, 2020, scientists around the world have been racing to develop a vaccine 1 . However, vaccine development is a long and expensive process, which takes on average over 10 years under ordinary circumstances 2 . Even for the previous epidemics of the past decade, including SARS, Zika, and Ebola, vaccines were not available before the virus spread was largely contained 3 .

Conventionally, vaccinations are intended to train the adaptive immune system by generating an antigen-specific immune response. However, studies also suggest that certain vaccines protect against other infections through trained immunity for up to 1 year, and in the case of live vaccines for up to 5 years 4 . For instance, vaccination against smallpox showed protection against measles and whooping cough 5 . Live vaccinia virus was successfully used against smallpox. Given the urgent need to reduce the spread of COVID-19, scientists are turning to alternative methods, such as repurposing existing vaccines. It has been hypothesized that the Bacillus Calmette-Guérin (BCG) and live poliovirus vaccines may provide some protective effect against SARS-CoV-2 infection [6] [7] [8] . Several ongoing or recruiting clinical trials are testing the protective effects of existing vaccines against SARS-CoV-2 infection, including the Polio 9 , Measles-Mumps-Rubella 10 , Influenza 11 , and BCG vaccines [12] [13] [14] [15] .

In this work, we conduct a systematic analysis to determine whether a set of existing non-COVID-19 vaccines used in the United States is associated with decreased rates of SARS-CoV-2 infection. Fig. 1 gives an overview of the study design and statistical analyses. We consider data from 137,037 individuals in the Mayo Clinic electronic health record (EHR) database who received PCR tests for SARS-CoV-2 between February 15, 2020 and July 14, 2020 and have at least one ICD diagnostic code recorded in the past five years (see "Methods" section). Table 1 shows the clinical characteristics of the study population. In particular, 92,673 (67%) individuals received at least one vaccine in the 5 years before their PCR testing date. Fig. 2 presents the SARS-CoV-2 infection rates for subsets of the study population with particular clinical covariates. In Table 3, we provide a breakdown of patient counts by vaccine type (inactivated, live attenuated, recombinant, unknown) for each of the 18 vaccines analyzed in this study.

Given this dataset, we first assess the overall association of vaccination status with the risk of SARS-CoV-2 infection (see "Methods" section). We use propensity score matching to construct unvaccinated control groups for each of the vaccinated populations at the 1-year, 2-year, and 5-year time horizons. The unvaccinated control groups are balanced on covariates including demographics, county-level SARS-CoV-2 incidence and testing rates, comorbidities, and the number of other vaccines taken in the past 5 years. We then compare the SARS-CoV-2 rates between each vaccinated cohort and its matched unvaccinated control group, which has similar clinical characteristics. Second, we repeat the analysis on age-, race-, and blood type-stratified subgroups of the study population: for each subgroup, we run propensity score matching and compute the difference in SARS-CoV-2 infection rate between the vaccinated and unvaccinated (matched) cohorts. Third, we compare the rates of hospital and ICU admission between the 1-year vaccinated cohorts and their unvaccinated control groups, to see whether vaccination status is related to COVID-19 disease severity. Finally, we run a series of sensitivity analyses to evaluate whether these results may be biased by unobserved confounders or other factors.

In Fig. 3, we present the vaccination coverage rates for each of these vaccines in the study population for all time horizons. Overall, the Polio and HIB vaccinated cohorts generally have the lowest relative risks of SARS-CoV-2 infection across all time horizons. The relative risk of SARS-CoV-2 infection is 0.57 (n: 2402, 95% CI (0.42, 0.77), p-value: 0.003) for individuals who have taken the Polio vaccine in the past year, and 0.53 (n: 2061, 95% CI (0.37, 0.77), p-value: 3.2e−03) for individuals who have taken the HIB vaccine in the past year. These vaccines are almost exclusively administered to individuals under 18 years of age, as shown in Fig. 4. Other vaccines that are commonly administered to younger individuals and show strong negative associations with SARS-CoV-2 infection include the MMR and Varicella vaccines.

The other vaccines that are consistently associated with lower SARS-CoV-2 rates are PCV13, Geriatric Flu, and HepA-HepB. At the 1-year time horizon, the relative risks of SARS-CoV-2 infection are 0.72 for PCV13 (n: 4693, 95% CI (0.56, 0.92), p-value: 0.03), 0.74 for Geriatric Flu (n: 12,085, 95% CI (0.61, 0.89), p-value: 5.6e−03), and 0.80 for HepA-HepB (n: 5858, 95% CI (0.67, 0.97), p-value: 0.05). Although these associations are weaker than those for Polio and HIB, they may be particularly interesting to explore further because these vaccines are administered across a broader age range of the population (see Fig. 4).

Pairwise correlation analysis reveals strong associations between administration of HIB, Polio, Rotavirus, Varicella, and MMR vaccines. To identify vaccines that may confound the associations of other vaccines with reduced rates of SARS-CoV-2 infection, we conduct a pairwise correlation analysis. For example, the lower rate of SARS-CoV-2 infection observed for one vaccine could in fact be driven by another vaccine that is highly correlated with it. To measure these correlations, we use Cohen's kappa, a measure of agreement between categorical variables that ranges from −1 to +1. Here, Cohen's kappa = +1 indicates that the pair of vaccines is always administered together, Cohen's kappa = 0 indicates agreement no better than expected by chance (as when the two vaccines are administered independently of each other), and Cohen's kappa = −1 indicates that every individual received exactly one of the two vaccines, i.e. the pair is never administered together. These five vaccines form a correlated cluster and are administered predominantly to younger individuals (see Fig. 4). Within this cluster, HIB, Polio, Varicella, and MMR are all consistently associated with lower SARS-CoV-2 rates, which suggests that some of the lower rates of SARS-CoV-2 observed in these vaccinated cohorts may be confounded by the other vaccines in this group.

In Tables 7, 8 and 9, we present the results of propensity score matching at the 1-, 2-, and 5-year time horizons, respectively, on study cohorts stratified by race. We observe that PCV13 vaccination is linked with significantly decreased SARS-CoV-2 rates in the Black subpopulation. In particular, the relative risk of SARS-CoV-2 infection for Black individuals who have been administered PCV13 is 0.45 at the 5-year time horizon (n: 653, 95% CI (0.32, 0.64), p-value: 6.9e−05). In addition, we observe that the Polio, HIB, and PCV13 vaccines are linked with decreased SARS-CoV-2 rates in the White subpopulation. However, since 119,979 (88%) of individuals in the study population are White, the relative risks for these vaccinated cohorts are close to the relative risks for the overall population (see Tables 4, 5, 6).
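To make the Cohen's kappa statistic used in the pairwise correlation analysis concrete, the plain-Python sketch below computes it for two vaccines encoded as 0/1 indicators per individual over one time horizon. The function name and the toy indicator lists are illustrative assumptions, not the authors' code or data.

```python
def cohens_kappa(a, b):
    """Cohen's kappa between two equal-length 0/1 vaccination indicator lists.

    a[i] (b[i]) is 1 if individual i received vaccine A (vaccine B) within the
    chosen time horizon, else 0. Hypothetical helper for illustration.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(a)
    # Observed agreement: share of individuals with the same status for both vaccines.
    p_observed = sum(1 for x, y in zip(a, b) if x == y) / n
    # Agreement expected by chance, from each vaccine's coverage alone.
    cov_a, cov_b = sum(a) / n, sum(b) / n
    p_expected = cov_a * cov_b + (1 - cov_a) * (1 - cov_b)
    if p_expected == 1:
        raise ValueError("kappa is undefined when both indicators are the same constant")
    return (p_observed - p_expected) / (1 - p_expected)


# Toy example: four individuals, two hypothetical vaccine indicators.
print(cohens_kappa([1, 1, 0, 0], [1, 1, 0, 0]))  # identical administration -> 1.0
print(cohens_kappa([1, 1, 0, 0], [1, 0, 1, 0]))  # chance-level agreement -> 0.0
```

For real data, `sklearn.metrics.cohen_kappa_score(a, b)` computes the same statistic. Note that kappa reaches −1 only when each individual received exactly one of the two vaccines and each vaccine covers exactly half of the population.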
We also performed matching within age groups (0-18, 19-49, 50-64, 65+) and blood groups (A, B, AB, O), but found no significant within-subgroup associations between any vaccine and SARS-CoV-2 rates. This suggests that the associations between vaccines and SARS-CoV-2 infection rates may not be strongly specific to particular age ranges or blood groups.

In Tables 10 and 11, we show the COVID-19 hospitalization and ICU admission rates, respectively, among the vaccinated and unvaccinated (matched) cohorts at the 1-year time horizon. These rates are similar between the two cohorts, with no statistically significant differences. This lack of a statistically significant association may be due to the relatively low rates of hospitalization and ICU admission among COVID-19 patients in this dataset. Taken together with the previous analyses, these results suggest that vaccination status is associated with differential rates of SARS-CoV-2 infection, but there is not enough evidence to determine whether vaccination status is associated with COVID-19 disease severity.

Sensitivity analysis. Tipping point analysis shows that the associations of reduced SARS-CoV-2 rates with the Polio vaccine (1- and 2-year time horizons) and PCV13 (5-year time horizon) are the most robust to unobserved confounders. In this retrospective study, we evaluate the correlations between vaccination and SARS-CoV-2 infection while taking into account a number of possible confounding variables, such as demographics and geographic COVID-19 incidence rate (see "Methods" section). However, the results may still have been influenced by unobserved confounders. For example, we do not explicitly control for travel history, which was a significant risk factor for SARS-CoV-2 infection early in the pandemic.

In Fig. 6, we present the results of the tipping point analysis for the statistically significant associations between vaccination and reduced rates of SARS-CoV-2 infection in the overall study population. For each time horizon, we show the relative prevalence and effect size that an unobserved confounder would need in order to overturn the conclusion for a given (vaccine, time horizon) pair. For reference, we show the effect size of the covariate (county-level COVID-19 incidence rate ≥ median value) as a potential confounder, which has a large relative risk of 2.78. At the 1-year and 2-year time horizons, the associations of the Polio vaccine with lower rates of SARS-CoV-2 infection are the most robust to a potential unobserved confounder. In particular, an unobserved confounder with a large effect size of 2.78 would need an absolute difference in prevalence of 17.8% (30.9%) between the vaccinated and unvaccinated cohorts to overturn the results for the 1-year (2-year) time horizon. At the 5-year time horizon, on the other hand, the association of PCV13 with lower rates of SARS-CoV-2 infection is the most robust to the influence of unobserved confounders: an unobserved confounder with a large effect size of 2.78 would need an absolute difference in prevalence of 19.1% between the vaccinated and unvaccinated cohorts to render the finding insignificant.

Ongoing clinical studies offer preliminary evidence that existing vaccines may reduce the risk of SARS-CoV-2 infection. For example, interim results from the ACTIVATE trial 13 indicate that the BCG vaccine reduces SARS-CoV-2 infection rates by up to 53%. While specific vaccines such as BCG are being tested for cross-protective effects against SARS-CoV-2 infection based on their prior potential for protection against other diseases 15 , to our knowledge a systematic, hypothesis-free analysis to identify vaccines that may have beneficial effects against SARS-CoV-2 infection is lacking. Our retrospective study analyzed 18 different vaccines and identified key vaccines that are associated with lower rates of SARS-CoV-2 infection after controlling for confounding factors (see "Results" section). In particular, we find that individuals who have recently received one of the Polio, HIB, MMR, Varicella, PCV13, Geriatric Flu, or HepA-HepB vaccines have lower rates of SARS-CoV-2 infection. These vaccines are promising candidates for follow-up pre-clinical animal studies and clinical trials for COVID-19. For the remaining vaccines among the 18 that we considered, the correlations with SARS-CoV-2 infection were either insignificant or varied across the time horizons of interest. Some of these vaccines may serve as negative controls in clinical trials testing the safety and efficacy of novel COVID-19 vaccines. For example, a clinical trial evaluating the COVID-19 vaccine candidate ChAdOx1 uses the Meningococcal vaccine as a comparator arm 16 . There, the Meningococcal vaccine was used as a control instead of the typical saline solution to reduce the risk of unblinding, because viral vector vaccinations are known to be associated with certain typical adverse reactions. Preliminary results from this trial indicate that, as expected, the Meningococcal vaccine does not induce antibody responses against the SARS-CoV-2 spike protein.
It may be interesting to evaluate the antibody responses for some of the vaccines that we found to be significantly correlated with lower rates of SARS-CoV-2 infection, to explore whether there is an underlying immunologic mechanism for the associations that we observe. Because the BCG vaccine is rarely administered in the US, it did not meet the sample size threshold for inclusion in our analysis. From the limited data available, 51 individuals in the study population had taken the BCG vaccine in the past 5 years, and none of them tested positive for SARS-CoV-2 infection (95% CI (0.0%, 7.0%)). Among the 198 individuals who had taken the BCG vaccine at least once in their lifetime, 6 (3.0%) tested positive for SARS-CoV-2 infection (95% CI (1.4%, 6.5%)). No individuals in the study population received BCG within the 1-year time horizon, and only 1 within the 2-year time horizon. More data from additional medical centers would therefore be required to assess the association between the BCG vaccine and SARS-CoV-2 infection.
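The text does not state how the confidence intervals for these small proportions were computed. As a check, assuming the Wilson score interval (a common choice for small counts), the short Python sketch below reproduces the reported bounds for both BCG groups; the function name `wilson_interval` is ours, not the authors'.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Two-sided Wilson score interval for a binomial proportion (z = 1.96 -> 95%).

    Assumed method; the paper does not name the interval it used.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# BCG counts reported above: 0/51 (past 5 years) and 6/198 (ever vaccinated).
for positives, total in [(0, 51), (6, 198)]:
    lo, hi = wilson_interval(positives, total)
    print(f"{positives}/{total}: {100 * lo:.1f}% to {100 * hi:.1f}%")
# 0/51: 0.0% to 7.0%
# 6/198: 1.4% to 6.5%
```

Unlike the simple Wald interval, which collapses to (0%, 0%) when no positives are observed, the Wilson interval still gives a meaningful upper bound for the 0/51 group.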

Prior studies have highlighted mechanisms by which vaccines activate broad immune signalling pathways, which might also provide protection against SARS-CoV-2. This nonspecific innate response, which confers protection against other infections, is termed 'trained immunity' 17 . For example, the tuberculosis vaccine Bacillus Calmette-Guérin (BCG) induces immune responses against microorganisms beyond Mycobacterium tuberculosis, such as Candida albicans and Staphylococcus aureus 18 . There is also evidence of epigenetic histone modifications in monocytes/macrophages that promote the expression of pattern-recognition molecules upon stimulation with BCG 17, 19 . Recently, a number of studies have systematically explored the effect of the BCG vaccine in treating COVID-19 patients 4, 6, 8, 20, 21 . In the case of Haemophilus influenzae type-B, activation of the complement system by Haemophilus influenzae type-B is well studied 22 , and a recent report associated decreased complement C3 levels with poor prognosis in patients with COVID-19 23 . The complement C3 inhibitor AMY-101 is currently in a phase 2 clinical trial for the treatment of COVID-19 24 . Here, cross-protection provided by the Haemophilus influenzae type-B vaccine could potentially be mediated through regulation of the complement system. In the case of MMR, an engineered live measles vaccine has previously been suggested to confer protection from SARS-CoV in animal models 25 . At a molecular level, IFNAR2 deficiency is reported to cause hemophagocytic lymphohistiocytosis (HLH) following measles-mumps-rubella vaccination because of excessive IFN signalling. Although there are reports of SARS-CoV-2 inhibiting the production of IFNβ, externally administered interferons are observed to block the replication of viruses.
Thus, interferon signalling indirectly mediated through MMR vaccine could potentially contribute to cross-protection towards SARS-CoV-2. Overall, there are interesting hypotheses around trained immunity from pre-existing vaccines having a potential effect against SARS-CoV-2 and further studies to investigate these are warranted.

Due to the observational nature of this study, potential biases may have affected the findings, including confounding, selection bias, and measurement bias. We used propensity score matching to account for confounding. Although propensity score matching takes some potential confounders into account, there may still be residual confounding from unobserved factors (e.g. socioeconomic status, adherence to social distancing measures, use of personal protective equipment) that may differ for each vaccine. For example, travel history is a risk factor for exposure to SARS-CoV-2 that we do not explicitly account for in this study. The tipping point sensitivity analysis estimates the effect size and prevalence that an unobserved confounder would need in order to overturn the statistically significant findings (see Fig. 6). Even among the variables that we consider, there is potential for bias if the cohorts are poorly matched on those covariates. In Tables S1-S7, we present the propensity score matching results for a number of vaccines at the 1-year time horizon to show the matching quality for each of these statistical comparisons, and in Fig. S1 we plot the distribution of the age covariate in particular. For some vaccines, differences in age between the vaccinated and unvaccinated (matched) cohorts may have influenced the results. In addition, restricting the study population to SARS-CoV-2 PCR-tested individuals may have introduced selection bias. For example, vaccinated individuals may engage in more health-seeking behaviors to reduce their potential COVID-19 risk, and may also be more likely to seek out a PCR test. This type of bias is known as the "healthy user effect", which is suspected to have influenced the findings of recent COVID-19 observational studies 17, 18 . We performed sensitivity analyses using breast cancer and colon cancer screening as negative controls, which suggest that propensity score matching is partly effective at filtering out the healthy user effect in the associations between vaccination status and SARS-CoV-2 risk. Finally, measurement bias is a concern, as vaccination records may be incomplete for individuals in our cohort who received vaccines outside of the Mayo Clinic system. We plan to perform additional sensitivity analyses to further explore these potential sources of bias.

Table 8. Summary of SARS-CoV-2 rates for race/ethnicity-stratified vaccinated and unvaccinated propensity score matched cohorts (2 year time horizon).

Table 9. Summary of SARS-CoV-2 rates for race/ethnicity-stratified vaccinated and unvaccinated propensity score matched cohorts (5 year time horizon).

As an initial exploratory analysis linking historical vaccination records to SARS-CoV-2 PCR testing results, this study warrants further research to confirm its findings. We plan to update this analysis in the coming months as more PCR testing data become available. This study is also based on data from a single academic medical center in the United States, which restricts the analysis to vaccines administered in this geographic region. Notably, we do not have sufficient immunization record data on the BCG vaccine, which has shown promise in early clinical trials. The findings from this study would therefore be well complemented by similar studies from hospitals around the world.

Study design. This is an observational study in a cohort of individuals who underwent polymerase chain reaction (PCR) testing for suspected SARS-CoV-2 infection at the Mayo Clinic and hospitals affiliated with the Mayo health system. The full dataset includes 152,548 individuals who received PCR tests between February 15, 2020 and July 14, 2020. We restricted the study population to the 137,037 individuals in this dataset who have at least one ICD code recorded in the past 5 years. This exclusion criterion was applied to restrict the analysis to individuals with medical history data. Within this PCR-tested cohort, we define COVID pos to be persons with at least one positive PCR test result for SARS-CoV-2 infection, which includes 5679 individuals. Similarly, we define COVID neg to be persons with only negative PCR test results, which includes 131,358 individuals.

For the study population of 137,037 individuals, we obtain a number of clinical covariates from the Mayo Clinic electronic health record (EHR) database, including demographics (age, gender, race, ethnicity, county of residence), ICD diagnostic billing codes from the past 5 years, and immunization records from the past 5 years (68 unique vaccines; we focus on the 18 taken by at least 1000 individuals over the past 5 years). We use the Elixhauser Comorbidity Index to map each individual's ICD codes from the past 5 years to a set of 30 medically relevant comorbidities 19 . In addition to the Mayo Clinic EHR database, we use the Corona Data Scraper online database to obtain county-level incidence rates of COVID-19 in the United States 19, 20 . By linking the county of residence from the EHR with the COVID-19 incidence rates from Corona Data Scraper, we obtain county-level incidence rates of COVID-19 for 136,313 individuals in the study population. We also obtain county-level testing data for 100,433 individuals in the study population from (i) Minnesota state government records and (ii) public county-level testing data scraped from other state/county websites. In Table 1, we present the average values for each of the clinical covariates in the study population.

Given these clinical covariates, we conduct a series of statistical analyses to assess whether each of the 18 vaccines has an association with lower rates of SARS-CoV-2 infection at the 1-year, 2-year, and 5-year time horizons. For each vaccine and time horizon, the vaccinated cohort is defined as the set of individuals in the study population who received the vaccine within the past time horizon. For example, the "2-year polio vaccinated cohort" is the set of individuals who received the polio vaccine within the past 2 years. Similarly, for each vaccine and time horizon, the unvaccinated cohort is defined as the set of individuals in the study population who did not receive the vaccine within the past time horizon. For example, the "5-year influenza unvaccinated cohort" is the set of individuals who did not receive the influenza vaccine within the past 5 years. In the following sections, we describe the statistical methods that we use to compare the rates of COVID-19 between the vaccinated and unvaccinated cohorts for each of the (vaccine, time horizon) pairs. First, we describe the propensity score matching analysis used to construct unvaccinated control groups that have similar clinical characteristics to the vaccinated cohorts. Second, we describe the statistical tests that we use to determine which of the (vaccine, time horizon) pairs have the most significant association with lower rates of SARS-CoV-2 infection for the 1-year, 2-year, and 5-year time horizons, both overall and for particular demographic subgroups. Third, we describe the covariate-level stratification analysis used to identify the vaccines with the largest association with lower rates of SARS-CoV-2 infection for particular demographic subgroups.
Finally, we describe the sensitivity analyses that we use to evaluate the robustness of the statistical methods to potential biases from unobserved confounders or other factors that could impact the overall results from this observational study.

Propensity score matching to construct unvaccinated control groups. Before running the propensity score matching step, first we filtered to vaccinated cohorts with at least 1000 persons. For the overall statistical analysis, there were 13, 15, and 18 vaccines which met this threshold for the 1 year, 2 year, and 5 year time horizons, respectively.
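The cohort definitions and the 1000-person threshold above can be sketched in a few lines of Python. The record layout (`pcr_date`, `vaccines`), the inclusive window ending on the PCR test date, and the day counts per horizon are all assumptions made for illustration; the paper does not specify them.

```python
from datetime import date, timedelta

# Assumed day counts per time horizon (the paper states horizons in years only).
HORIZON_DAYS = {1: 365, 2: 730, 5: 1826}


def vaccinated_within(person, vaccine, years):
    """True if `vaccine` was given in the `years` before (up to and including) the PCR test date.

    `person` is a hypothetical record: {"pcr_date": date, "vaccines": [(name, date), ...]}.
    """
    end = person["pcr_date"]
    start = end - timedelta(days=HORIZON_DAYS[years])
    return any(name == vaccine and start <= given <= end for name, given in person["vaccines"])


def split_cohorts(people, vaccine, years, min_size=1000):
    """Return (vaccinated, unvaccinated) cohorts, or None if the vaccinated cohort
    has fewer than `min_size` people (the size threshold applied before matching)."""
    vaccinated, unvaccinated = [], []
    for person in people:
        target = vaccinated if vaccinated_within(person, vaccine, years) else unvaccinated
        target.append(person)
    if len(vaccinated) < min_size:
        return None
    return vaccinated, unvaccinated


# Toy records (hypothetical): a polio dose about 8 months and about 3.3 years before testing.
recent = {"pcr_date": date(2020, 5, 1), "vaccines": [("Polio", date(2019, 9, 1))]}
older = {"pcr_date": date(2020, 5, 1), "vaccines": [("Polio", date(2017, 1, 1))]}
print(vaccinated_within(recent, "Polio", 1), vaccinated_within(older, "Polio", 2),
      vaccinated_within(older, "Polio", 5))  # True False True
```

Because the horizons are nested, the same person can fall in the 5-year vaccinated cohort and the 2-year unvaccinated cohort for the same vaccine, as with `older` here.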

For each vaccinated cohort with sufficient numbers of individuals, we applied 1:1 propensity score matching to construct a corresponding unvaccinated control group with similar clinical characteristics 21 . We refer to this as the "unvaccinated (matched)" cohort, which is a subset of the unvaccinated cohort. We considered the following clinical covariates in the propensity score matching step:

Demographics: age, gender, race, ethnicity.
County-level COVID-19 incidence rate, computed over the window within ± 1 week of the PCR testing date as

$$\text{County-level COVID-19 incidence rate} = \frac{\text{Number of positive SARS-CoV-2 PCR tests in county within } \pm 1 \text{ week of PCR testing date}}{\text{Total population of county}} \qquad (1)$$

Fig. 6. For each vaccine that is associated with lower SARS-CoV-2 rates in a particular time horizon, we plot the (prevalence, effect size) combinations of an unobserved confounder that would be required to overturn the results. The x-axis indicates the absolute difference in prevalence of the confounder between the vaccinated and unvaccinated (matched) cohorts. For example, if the unobserved confounder is present in 25% of the vaccinated cohort and 5% of the unvaccinated cohort, then the absolute difference in prevalence is 20%. The y-axis indicates the relative COVID pos risk (effect size) of the unobserved confounder. For reference, we show the relative risk of (county-level COVID-19 incidence rate ≥ median value) as a horizontal dotted line, which is equal to 2.78. Each plot is annotated with the top 3 vaccines that are most robust to unobserved confounders, along with the intersection point between the vaccine curve and the reference line. For example, for the polio vaccine at the 1-year time horizon, an unobserved confounder with a relative risk of 2.78 that is prevalent in 17.8% of the vaccinated cohort and 0% of the unvaccinated cohort could explain the differences in SARS-CoV-2 infection rates that we observe in the data.

For each of the vaccinated cohorts, we fit a logistic regression model to predict whether or not the individual was vaccinated, using these covariates as predictors. We trained the logistic regression model using the scikit-learn package in Python 22 . We then used the model-predicted probability of an individual receiving the vaccine as that individual's propensity score. Matching was done without replacement using greedy nearest-neighbor matching within calipers, so vaccinated individuals without a sufficiently close unvaccinated match were dropped from the vaccinated cohort in this procedure. The matching was performed with caliper width 0.2 × (pooled standard deviation of scores), as suggested in the literature 23 .
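The matching step can be sketched as follows, taking the propensity scores as already computed (for example with scikit-learn's `LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]`). This is a minimal illustration, not the authors' implementation: the pooled-SD formula (square root of the mean of the two sample variances), the random processing order of vaccinated individuals, and the function name are assumptions.

```python
import random
import statistics


def greedy_caliper_match(treated, control, caliper_sd=0.2, seed=0):
    """1:1 greedy nearest-neighbor propensity score matching without replacement.

    treated, control: dicts mapping person id -> propensity score (predicted
    probability of vaccination). Returns (treated_id, control_id) pairs; treated
    individuals with no unused control inside the caliper are dropped.
    """
    t_scores, c_scores = list(treated.values()), list(control.values())
    if len(t_scores) < 2 or len(c_scores) < 2:
        raise ValueError("need at least two scores per group to estimate the pooled SD")
    # Assumed pooled SD: square root of the mean of the two sample variances.
    pooled_sd = ((statistics.variance(t_scores) + statistics.variance(c_scores)) / 2) ** 0.5
    caliper = caliper_sd * pooled_sd

    available = dict(control)           # controls not yet used (no replacement)
    order = list(treated)
    random.Random(seed).shuffle(order)  # assumed: treated units visited in random order
    pairs = []
    for t_id in order:
        if not available:
            break
        score = treated[t_id]
        c_id = min(available, key=lambda c: abs(available[c] - score))
        if abs(available[c_id] - score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Toy scores (hypothetical ids and values).
print(greedy_caliper_match({"a": 0.30, "b": 0.90}, {"x": 0.31, "y": 0.10, "z": 0.50}))
# [('a', 'x')]  ("b" has no control within the caliper and is dropped)
```

The linear scan over unused controls is O(n·m); for cohorts of this size, a sorted score array with binary search would be the usual optimization.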

After the propensity score matching step, we compare the COVID pos rates for the vaccinated and unvaccinated (matched) cohorts. First, we compute the relative risk, which is equal to the COVID pos rate for the vaccinated (matched) cohort divided by the COVID pos rate for the unvaccinated (matched) cohort. We use a Fisher exact test to compute the p-value for this association. We then apply the Benjamini-Hochberg (BH) adjustment 24 to the p-values over all vaccines for each time horizon to control the false discovery rate at 0.05. We also compute and report 95% confidence intervals for the relative risks.
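The three quantities in this comparison (relative risk with a confidence interval, a two-sided Fisher exact p-value, and BH-adjusted p-values) can be sketched in plain Python. The paper does not say how its intervals were computed, so the log-scale (Katz) interval here is an assumption, and the counts in the usage lines are toy values, not study data. Library equivalents are `scipy.stats.fisher_exact` and statsmodels' `multipletests(pvals, method="fdr_bh")`.

```python
import math


def relative_risk(pos_v, n_v, pos_u, n_u, z=1.959964):
    """RR = (pos_v/n_v) / (pos_u/n_u) with an assumed log-scale (Katz) Wald 95% CI.

    Requires at least one positive in each cohort.
    """
    rr = (pos_v / n_v) / (pos_u / n_u)
    se_log = math.sqrt(1 / pos_v - 1 / n_v + 1 / pos_u - 1 / n_u)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = math.comb(n, row1)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    probs = {x: math.comb(col1, x) * math.comb(n - col1, row1 - x) / total
             for x in range(lo, hi + 1)}
    cutoff = probs[a] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(p for p in probs.values() if p <= cutoff))


def benjamini_hochberg(pvals):
    """BH-adjusted p-values in input order; reject where adjusted <= 0.05."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy counts only: 10/100 positive vaccinated vs. 20/100 positive matched unvaccinated.
rr, ci_lo, ci_hi = relative_risk(10, 100, 20, 100)
p = fisher_exact_two_sided(10, 90, 20, 80)  # rows: vaccinated, unvaccinated; cols: pos, neg
print(f"RR {rr:.2f} (95% CI {ci_lo:.2f}, {ci_hi:.2f}), Fisher p = {p:.3f}")
print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.20])])
# [0.04, 0.0533, 0.0533, 0.2]
```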

Age-, race/ethnicity-, and blood type-stratified subgroups. We repeat the statistical analysis on subsets of the study population stratified by age, race/ethnicity, and blood type. For age, we consider the subgroups 0 to 18 years, 19 to 49 years, 50 to 64 years, and ≥ 65 years old. For race/ethnicity, we consider the subgroups White, Black, Asian, and Hispanic. For blood type, we consider the subgroups O, A, B, and AB. Age and race/ethnicity were recorded in the dataset for all subjects, but blood type information was only available for 41,828 subjects.
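A minimal sketch of how records might be split into these subgroups before rerunning matching within each one. It assumes ages are recorded as integer completed years and that a missing blood type is stored as `None`; the record fields and helper names are hypothetical.

```python
# Age brackets used for the age-stratified analysis (inclusive bounds, completed years).
AGE_BRACKETS = [(0, 18, "0-18"), (19, 49, "19-49"), (50, 64, "50-64"), (65, None, "65+")]


def age_bracket(age_years):
    """Map an integer age in completed years to its stratification bracket."""
    for low, high, label in AGE_BRACKETS:
        if age_years >= low and (high is None or age_years <= high):
            return label
    raise ValueError(f"age out of range: {age_years}")


def stratify(people, key):
    """Group records by key(person); records with key None (e.g. no blood type) are skipped."""
    groups = {}
    for person in people:
        label = key(person)
        if label is not None:
            groups.setdefault(label, []).append(person)
    return groups


# Hypothetical records; blood type is missing for one person.
people = [
    {"age": 12, "race": "White", "blood_type": "O"},
    {"age": 70, "race": "Black", "blood_type": None},
    {"age": 19, "race": "Asian", "blood_type": "A"},
]
by_age = stratify(people, lambda p: age_bracket(p["age"]))
by_blood = stratify(people, lambda p: p["blood_type"])
# Each subgroup would then go through the same matching and testing steps.
print(sorted(by_age), sorted(by_blood))  # ['0-18', '19-49', '65+'] ['A', 'O']
```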

For each vaccine, at the 1, 2, and 5 year time horizons, we use propensity score matching to construct unvaccinated control groups for each age bracket, race/ethnicity, and blood type subgroup. Matching was done on the same covariates as in the overall analysis (apart from the race/ethnicity covariates for the race/ethnicity subgroups). We then compared the COVID pos rates between the vaccinated and unvaccinated (matched) cohorts, and reported the relative risk, 95% confidence interval, and BH-corrected p-value.

Sensitivity analyses. We performed two sets of sensitivity analyses, as described below.

Cancer screens as negative controls for propensity score matching procedure. To assess the effectiveness of the propensity score matching procedure, we ran the statistical analysis using cancer screens as the exposure variable instead of vaccinations (i.e. negative control exposure). This set of experiments serves as a negative control because it is highly unlikely that cancer screenings are causally linked to risk of SARS-CoV-2 infection. In particular, we considered the following two cancer screens as negative controls:

Colon cancer screen: whether or not the individual received a screening for colon cancer (within a specified time horizon relative to the PCR testing date).
Mammogram: whether or not the individual received a mammogram screening for breast cancer (within a specified time horizon relative to the PCR testing date).

In Table 12, we present the results from the negative control experiments. In the unmatched cohorts, persons who have had a mammogram in the past 1, 2, or 5 years have significantly lower rates of SARS-CoV-2 infection than persons who have not had mammograms during the same period. For example, the SARS-CoV-2 infection rate is 2.5% among persons with mammograms in the past 5 years and 4.5% among persons without mammograms in the past 5 years (p-value: 1.9e−47). This significant difference in SARS-CoV-2 infection rate can be explained by confounding variables, because the unmatched cohorts have different underlying clinical characteristics. After propensity score matching, however, the SARS-CoV-2 infection rate is 2.8% among persons with mammograms in the past 5 years and 2.8% among persons without mammograms in the past 5 years (p-value: 1). We observe similar results for the colon cancer screening covariate. For example, the SARS-CoV-2 infection rate is 2.5% among persons with colon cancer screens in the past 5 years and 4.4% among persons without colon cancer screens in the past 5 years (p-value: 9.3e−44). After propensity score matching, the SARS-CoV-2 infection rate is 2.5% with and 2.4% without colon cancer screens in the past 5 years (p-value: 1). In total, 6 comparisons were made (2 controls, 3 time horizons each). Combining their p-values with Fisher's method gives a combined p-value of 0.22 (χ² = 15, df = 12), so we cannot reject the joint null hypothesis that none of the negative controls is associated with SARS-CoV-2 infection after propensity score matching.
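Fisher's method combines k independent p-values into the statistic X² = −2 Σ ln pᵢ, which follows a chi-square distribution with 2k degrees of freedom under the joint null. The plain-Python sketch below uses the closed-form chi-square tail for even degrees of freedom; the function names are ours, and `scipy.stats.combine_pvalues(pvals, method="fisher")` is the library equivalent. The six individual p-values are not all reported, so the usage lines start from the test statistic. They also show that the reported χ² = 15 is presumably rounded: exactly 15 on 12 df gives about 0.24, while 0.22 corresponds to a statistic of roughly 15.4.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X >= x) for a chi-square variable with an even number of degrees of freedom.

    Closed form: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    Fisher's method always yields df = 2 * (number of p-values), which is even.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


def fisher_combine(pvals):
    """Fisher's method: X^2 = -2 * sum(ln p_i) on 2k degrees of freedom."""
    stat = -2 * sum(math.log(p) for p in pvals)
    df = 2 * len(pvals)
    return stat, df, chi2_sf_even_df(stat, df)


# Six comparisons -> 12 degrees of freedom, as in the negative control analysis.
print(f"{chi2_sf_even_df(15.0, 12):.2f}")  # 0.24
print(f"{chi2_sf_even_df(15.4, 12):.2f}")  # 0.22
```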

We expect that individuals who have recently had cancer screenings may have lower rates of SARS-CoV-2 infection due to the "healthy user effect" 18 . In particular, persons who have recently had mammograms or colonoscopies may engage in general health-seeking behaviors that decrease their risk of SARS-CoV-2 infection or, more generally, their risk of a positive PCR test result. The results from the negative control experiments demonstrate that propensity score matching is able to correct for confounding variables that may contribute to spurious findings, such as those caused by the healthy user effect.
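The negative-control logic above rests on propensity score matching. A minimal sketch of one common variant follows: greedy 1:1 nearest-neighbour matching without replacement on the logit of the propensity score, with a caliper of 0.2 pooled standard deviations of the logit, the width recommended in "Optimal caliper widths for propensity-score matching…" from the reference list. The record layout, the greedy ordering, the toy values, and the assumption that propensity scores were already estimated (e.g. by logistic regression) are illustrative choices, not the authors' documented procedure.

```python
import math
import statistics


def logit(p):
    return math.log(p / (1 - p))


def caliper_match(treated, control, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching on logit(propensity score).

    treated, control: lists of (id, propensity_score, positive) tuples, where
    `positive` is 1 for a positive PCR test (hypothetical record layout).
    Matches without replacement; treated records with no control inside the
    caliper are dropped, as is usual for caliper matching.
    """
    t_logits = [logit(ps) for _, ps, _ in treated]
    c_logits = [logit(ps) for _, ps, _ in control]
    pooled_sd = math.sqrt((statistics.variance(t_logits) + statistics.variance(c_logits)) / 2)
    caliper = caliper_sd * pooled_sd
    available = list(control)
    pairs = []
    for t in sorted(treated, key=lambda r: r[1]):
        if not available:
            break
        best = min(available, key=lambda c: abs(logit(c[1]) - logit(t[1])))
        if abs(logit(best[1]) - logit(t[1])) <= caliper:
            pairs.append((t, best))
            available.remove(best)  # without replacement
    return pairs


def positivity_rates(pairs):
    """SARS-CoV-2 positivity in the matched treated and matched control groups."""
    if not pairs:
        raise ValueError("no matched pairs")
    n = len(pairs)
    return sum(t[2] for t, _ in pairs) / n, sum(c[2] for _, c in pairs) / n


if __name__ == "__main__":
    # Toy, hypothetical records: (id, propensity score, PCR positive).
    treated = [("t1", 0.8, 0), ("t2", 0.6, 1), ("t3", 0.3, 1)]
    control = [("c1", 0.79, 0), ("c2", 0.2, 1), ("c3", 0.61, 1), ("c4", 0.1, 1)]
    pairs = caliper_match(treated, control)
    print([(t[0], c[0]) for t, c in pairs])
    # prints [('t2', 'c3'), ('t1', 'c1')]
    print(positivity_rates(pairs))
    # prints (0.5, 0.5)
```

In this toy example, `t3` has no control within the caliper and is dropped; in the crude comparison the treated and control positivity rates differ (2/3 vs. 3/4), while the matched rates coincide. Greedy matching is order-dependent, and optimal matching is a common alternative.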

Tipping point analysis. In order to evaluate how robust the associations between vaccinations and SARS-CoV-2 infection found in this study are to the effects of potential confounders, we conduct a "tipping point" analysis 25 . The purpose of this analysis is to find the point at which an unobserved confounder would "tip" the conclusion on each vaccine, making the results no longer statistically significant. Here, there are two dimensions to consider: (1) the effect size (i.e. relative risk of SARS-CoV-2 infection) of the confounder, and (2) the relative prevalence of the confounder in the vaccinated vs. unvaccinated (matched) cohorts. For each vaccine, we compute the relative prevalence and effect size that would be required for an unobserved confounder to overturn the conclusion for a given (vaccine, time horizon) pair. We present the results from the tipping point analysis in Fig. 6 .
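One standard way to compute such tipping points uses the bias formula for a binary unmeasured confounder from Lin, Psaty and Kronmal, whose paper on assessing sensitivity to unmeasured confounders appears in the reference list: observed RR = true RR × (1 + p₁(γ − 1)) / (1 + p₀(γ − 1)), where γ is the confounder's effect size and p₁, p₀ its prevalence in the vaccinated and unvaccinated cohorts. Whether the authors used exactly this parameterization is an assumption. The sketch also needs a baseline prevalence p₀, which the two dimensions above do not fix, so p₀ is a hypothetical input; the function names are likewise hypothetical.

```python
def bias_factor(gamma, p1, p0):
    """Factor by which a binary unmeasured confounder multiplies the true RR.

    gamma: relative risk of SARS-CoV-2 infection associated with the confounder.
    p1, p0: confounder prevalence in the vaccinated / unvaccinated matched cohorts.
    """
    return (1 + p1 * (gamma - 1)) / (1 + p0 * (gamma - 1))


def adjusted_rr(rr_obs, gamma, p1, p0):
    """Observed RR (or CI bound) corrected for the hypothesised confounder."""
    return rr_obs / bias_factor(gamma, p1, p0)


def tipping_relative_prevalence(upper_ci, gamma, p0):
    """Relative prevalence p1/p0 at which the adjusted upper 95% CI bound reaches 1.

    p0 is an assumed baseline prevalence (hypothetical input). Returns None when
    gamma == 1 (a confounder with no effect cannot tip the result) or when the
    required prevalence p1 falls outside [0, 1].
    """
    if gamma == 1:
        return None
    p1 = (upper_ci * (1 + p0 * (gamma - 1)) - 1) / (gamma - 1)
    if not 0 <= p1 <= 1:
        return None
    return p1 / p0


if __name__ == "__main__":
    # Upper CI bound 0.64 from the PCV13 result in the abstract; gamma = 0.5 and
    # p0 = 0.2 are hypothetical confounder parameters.
    print(round(tipping_relative_prevalence(0.64, 0.5, 0.2), 2))
    # prints 4.24
```

Under these assumed values, a confounder that halves infection risk would need to be about 4.2 times as prevalent among vaccinated individuals (84.8% vs. 20%) before the upper bound of the confidence interval reached 1. A weaker confounder (γ = 0.9) at the same p₀ cannot tip the result at any feasible prevalence.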

Institutional Review Board (IRB) for study at the Mayo Clinic. This research was conducted under IRB 20-003278 at the Mayo Clinic, "Study of COVID-19 patient characteristics with augmented curation of Electronic Health Records (EHR) to inform strategic and operational decisions". The Mayo Clinic granted IRB/ethical approval for this study and waived the need for informed consent (https://www.mayo.edu/research/institutional-review-board/overview). All methods were performed in accordance with the relevant guidelines and regulations supplied by the Mayo Clinic and with HIPAA regulations regarding patient privacy protection. Subjects without research authorization on file were excluded.

The primary data underlying this study were accessed via the Mayo Clinic upon approval of IRB 20-003278, entitled "Study of COVID-19 patient characteristics with augmented curation of Electronic Health Records (EHR) to inform strategic and operational decisions". On a case-by-case basis, requests for access to the deidentified data sets will be considered by the Mayo Clinic, in keeping with HIPAA guidelines for patient privacy protection and the specific contents of the data requests. Address data requests to the corresponding authors of this manuscript.

Table 12. Summary of SARS-CoV-2 rates for individuals who did vs. did not receive negative control treatments, before and after propensity score matching. SARS-CoV-2 positive rates, relative risks, and associated BH-adjusted Fisher exact p-values for individuals who did or did not receive negative control treatments over the past 1 year, 2 years, and 5 years prior to the PCR test. The negative control treatments considered are: (1) colon cancer screen and (2) mammogram. The BH adjustment is applied per time horizon, as in the main analysis. Numbers are shown before and after propensity score matching; unmatched numbers are shown in bold.

The COVID-19 vaccine development landscape

Risk in vaccine research and development quantified

Developing covid-19 vaccines at pandemic speed

Defining trained immunity and its role in health and disease

Trained immunity-based vaccines: A new paradigm for the development of broad-spectrum anti-infectious formulations

Considering BCG vaccination to reduce the impact of COVID-19

Can existing live vaccines prevent COVID-19?

BCG vaccine protection from severe coronavirus disease 2019 (COVID-19)

OPV as potential protection against COVID-19. ClinicalTrials.gov

HCW. ClinicalTrials.gov

Influenza vaccination, ACEI and ARB in the evolution of SARS-Covid19 infection. ClinicalTrials.gov

BCG vaccination for healthcare workers in COVID-19 pandemic. ClinicalTrials.gov

Bacillus Calmette-Guérin vaccination to prevent COVID-19. ClinicalTrials.gov

BCG vaccination to protect healthcare workers against COVID-19. ClinicalTrials.gov

BCG vaccine for health care workers as defense against COVID 19. ClinicalTrials.gov

Safety and immunogenicity of the ChAdOx1 nCoV-19 vaccine against SARS-CoV-2: A preliminary report of a phase 1/2, single-blind, randomised controlled trial

Collider bias undermines our understanding of COVID-19 disease risk and severity

Healthy user and related biases in observational studies of preventive interventions: A primer for physicians

Comorbidity measures for use with administrative data

An introduction to propensity score methods for reducing the effects of confounding in observational studies

Scikit-learn: Machine learning in Python

Optimal caliper widths for propensity-score matching when estimating differences in means and differences in proportions in observational studies

Controlling the false discovery rate: A practical and powerful approach to multiple testing

Assessing the sensitivity of regression results to unmeasured confounders in observational studies

The authors thank Murali Aravamudan, Patrick Lenehan, Jake Martin, Walter Kremers, and Hilal Maradit-Kremers for their careful review and helpful feedback on this manuscript.

C.P. and A.P. developed the methods and analytical techniques, interpreted the results, and wrote the manuscript. H.B. and V.A. supported the statistical analysis conducted. A.J. and V.S. led the study design and wrote the manuscript. C.P. and V.S. conceptualized the study and reviewed the manuscript. R.K., J.C.O.H., G.J.G., A.W.W., J.H. and A.D.B. interpreted the results and reviewed the manuscript. All authors reviewed the findings and revised the manuscript based on critical feedback from reviewers and colleagues.

Funding was provided by the Division of Intramural Research, National Institute of Allergy and Infectious Diseases (AI110173); amfAR, The Foundation for AIDS Research (109593-62-RGRL); and the National Institute of Allergy and Infectious Diseases (AI120698).

One or more of the investigators associated with this project and Mayo Clinic have a Financial Conflict of Interest in technology used in the research, and the investigator(s) and Mayo Clinic may stand to gain financially from the successful outcome of the research. This research has been reviewed by the Mayo Clinic Conflict of Interest Review Board and is being conducted in compliance with Mayo Clinic Conflict of Interest policies. ADB is a consultant for Abbvie, is on scientific advisory boards for Nference and Zentalis, and is founder and President of Splissen therapeutics. The authors from nference are employees of nference and have financial interests in the success of nference. A provisional patent application filed covers some of the findings from this study, with CP, AP and VS named as inventors and nference as the assignee.

Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1038/s41598-021-83641-y.

Correspondence and requests for materials should be addressed to V.S.

Reprints and permissions information is available at www.nature.com/reprints.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.