key: cord-0324291-r8i71jqd authors: Nguyen, M.; Paul, E.; Mills, P. K.; Paul, S. title: Risk of COVID-19 Reinfection and Vaccine Breakthrough Infection, Madera County, California, 2021 date: 2022-01-23 journal: nan DOI: 10.1101/2022.01.22.22269105 sha: 783d00c9950ab7f58ef7f24bb76e810c6a88be7c doc_id: 324291 cord_uid: r8i71jqd

The probability of either testing COVID-19 positive or dying was compared for three cohorts in Madera County, California in 2021. These cohorts included (1) those unvaccinated, (2) those vaccinated and (3) persons with a previous COVID-19 infection. The three groups were made generally comparable by matching on age, gender, postal zip code of residence, and the date of either COVID-19 infection or of vaccination. The hazard ratio (HR) for death (from all causes) after COVID-19 infection vs. vaccination was 11.7 (95% CI 5.91-23.1, p<0.05). The HR for testing positive for COVID-19 >14 days after initial COVID-19 infection or after completing primary COVID-19 vaccination was 1.98 (95% CI 1.53-2.58, p<0.001). As the majority of positive COVID-19 tests in the post COVID-19 cohort occurred within 90 days of the initial infection, and as these early positives may not represent a new infection, we also compared rates of testing COVID-19 positive more than 90 days after initial infection or vaccination. After removing the early positive COVID-19 tests that occurred between days 14-90, the HR for testing COVID-19 positive was lower for the post COVID-19 cohort than for the vaccinated cohort: the HR for a positive COVID-19 test occurring more than 90 days after an initial COVID-19 infection or after vaccination was 0.54 (95% CI 0.33-0.87, p<0.05) for the post COVID-19 group vs. the vaccinated group. Thus the risk of testing COVID-19 positive was higher in the first 90 days after COVID-19 infection than after vaccination.
However, from 90 to 300 days after COVID-19 infection, the post COVID-19 infection cohort had a lower risk of testing COVID-19 positive than those fully vaccinated.

Prospective studies of specific groups such as health care workers can provide high quality data but may be limited in generalizability. (3) (4) (5) Community level observational studies may be biased as they are based on subject-driven testing: persons differ in their access to, and motivation for, COVID-19 testing when symptomatic. There are also unknown levels of undiagnosed prior COVID-19 in the community, and changing levels of COVID-19 incidence and COVID-19 variants in the community over time. (3) The primary goal of this study was to compare the risk of re-infection in persons recovered from an initial COVID-19 infection to the risk for persons who were fully vaccinated. The benefit of full vaccination has been well studied in randomized clinical trials (i.e. vaccine efficacy) and also in large observational studies (i.e. effectiveness), (6) so the protection from full vaccination can serve as a benchmark for comparison. To address various critical real-world issues of changing COVID-19 incidence over time, new variants emerging over time, and the difference in access and willingness to test in different populations, we chose to study the incidence of COVID-19 infections in three matched cohorts. Initially we selected three groups of persons from the available testing and vaccination data in Madera County. The first, the "unvaccinated" group, consists of persons who had tested COVID-19 negative prior to the start of the study (January 1, 2021) and who were not vaccinated. The criterion of a prior negative test was used partly for convenience, as the testing registration process elicited demographic information for this population. This information is not available for the untested, unvaccinated population at large in the County.
By testing negative, persons in this group have demonstrated access to testing, and possibly a level of risk and concern for COVID-19 higher than that of the never-tested population. The second group, the "Vaccinated," consists of persons who had either never been tested for COVID-19 or had tested negative for COVID-19 prior to vaccination. The "post COVID-19" group includes all people who tested positive for the first time on or after January 1, 2021. There were significant differences in the characteristics of these three groups, most importantly age (due to the age-based rollout of vaccine eligibility) and the distribution of dates of testing positive for COVID-19 in 2021 (as there were few vaccinated persons in the first 4 months of 2021). As comparing mortality rates in groups of differing ages, or infection rates over different time periods, could introduce significant bias, one matched cohort was selected from each of the three groups. These three cohorts were created by selecting persons from each of the three groups that had identical gender, postal zip code of residence, age (+/-2 years) and date of infection or completing vaccination (+/-10 days). Residence postal code was included as a matching criterion as a proxy for socioeconomic status and race. Data on socioeconomic status were not available for individuals, and race data from vaccine and test registration are often incomplete (especially for earlier time periods), which would make matching difficult. However, as these demographic variables are correlated with location of residence in the Madera County population, we matched by postal code in an effort to match for race and socioeconomic status. The goal of matching on the final criterion, date of vaccination or of COVID-19 infection, was to ensure that identical time periods of risk were being observed for each cohort.
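The matching criteria described above can be sketched in code. The following is a minimal illustration only, not the authors' implementation: the `Person` record, the function names, and the greedy one-to-one pairing without replacement are all assumptions made for the example; only the criteria themselves (identical gender and zip code, age within 2 years, index date within 10 days, random choice among several candidates) come from the text.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    """Hypothetical record; field names are illustrative, not from the study data."""
    person_id: str
    gender: str
    zip_code: str
    age: int
    index_date: date  # first positive test, or completion of primary vaccination


def is_match(a, b, max_age_gap=2, max_day_gap=10):
    """True if two persons satisfy the matching criteria stated in the text."""
    return (
        a.gender == b.gender
        and a.zip_code == b.zip_code
        and abs(a.age - b.age) <= max_age_gap
        and abs((a.index_date - b.index_date).days) <= max_day_gap
    )


def match_pairs(vaccinated, post_covid, seed=0):
    """Greedy 1:1 matching without replacement (an assumed strategy).

    When several candidates qualify, one is selected at random.
    """
    rng = random.Random(seed)
    available = list(post_covid)
    pairs = []
    for v in vaccinated:
        candidates = [p for p in available if is_match(v, p)]
        if candidates:
            chosen = rng.choice(candidates)
            available.remove(chosen)  # each person is used in at most one pair
            pairs.append((v, chosen))
    return pairs


if __name__ == "__main__":
    v = Person("v1", "F", "Z1", 40, date(2021, 6, 1))
    pool = [
        Person("p1", "F", "Z1", 42, date(2021, 6, 10)),  # qualifies
        Person("p2", "F", "Z1", 45, date(2021, 6, 1)),   # age gap too large
    ]
    for vac, inf in match_pairs([v], pool):
        print(vac.person_id, "matched with", inf.person_id)
```

Extending the sketch to the unvaccinated cohort would require a further assumption about how the start date is assigned, so it is left out here.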
COVID-19 testing results (negative and positive, including antigen tests, molecular testing and COVID-19 antibody tests) must be reported to the California Reportable Disease Information Exchange system (CalREDIE). (7) Commercial, public health and most clinical laboratories report results electronically directly to CalREDIE. Rapid antigen testing may be reported by manual entry through internet or app-based access. The extent to which home self-testing results are reported is not known. All reported COVID-19 test results

This version posted January 23, 2022. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Persons were identified and all duplicates were removed by grouping results based on a unique date of birth (DOB), gender and first three initials of their first and last names. For each person the following information was determined: their first and last COVID-19 positive test dates, their first COVID-19 negative test date, their vaccination status and administration dates, whether they completed the primary vaccine series, and their date of death if deceased (all cause mortality). Vaccine recipients who had no COVID-19 test data were also included in this dataset. Thus, these are the selection criteria for the three subject groups.

Matching: There were significant differences in the demographic characteristics of these three groups, most importantly age (due to the age-based rollout of vaccine eligibility) and distribution of dates of testing positive for COVID-19 in 2021 (as there were few vaccinated persons in the first 4 months of 2021). As
there could be significant bias from comparing mortality rates in groups of differing ages, and infection rates if different time periods were compared, three matched cohorts were selected from the three groups. These three cohorts were created by selecting persons from each of the three groups that had identical gender, postal zip code of residence, age (+/-2 years) and date of infection or completing vaccination (+/-10 days). Residence postal code was included as a matching criterion as a proxy for socioeconomic status and race. Data on socioeconomic status were not available for individuals in the test and vaccine registration information, and race data are often incomplete (especially for earlier time periods), which would make matching difficult. However, as differences in these demographic variables

Cohort 1 (unvaccinated, prior COVID-19 test negative): the date of group 2 vaccination that was used for matching cohorts 2 and 3 was also used as the initial date to begin recording events in cohort 1. This start date was chosen to ensure that identical time periods at risk were used for observing events in the three groups. Deaths were recorded on or after this matching date. Infection events were recorded starting after day 14 (or after day 90) from this matching date. These recording start dates were chosen to be identical with the event recording time-period of cohorts 2 and 3.

Risk of "failure" in the three above defined risk groups was evaluated using a survival analysis approach. and plotted using the matplotlib library for Python 3.9. (18)
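To illustrate the kind of survival analysis referred to above, the sketch below computes a Kaplan-Meier product-limit estimate in plain Python. Treating the analysis as Kaplan-Meier is an assumption based on the reference list, which includes "Nonparametric Estimation from Incomplete Observations"; the function name and the toy inputs are hypothetical and do not reproduce the study data or the authors' code.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier product-limit estimate of the survival function.

    durations: follow-up time for each subject (e.g. days since the index date)
    observed:  True if the "failure" (death or positive test) occurred at that
               time, False if the subject was censored at that time
    Returns a list of (time, survival_probability), one entry per event time.
    """
    data = sorted(zip(durations, observed))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if events:
            # Subjects censored at t are still counted as at risk at t.
            survival *= 1.0 - events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Toy data: four subjects, one of them censored at day 10.
    for t, s in kaplan_meier([5, 10, 10, 20], [True, False, True, True]):
        print(f"day {t}: estimated survival {s:.3f}")
```

The resulting step function is what would typically be drawn with a plotting library such as matplotlib, for example as a step plot with one curve per cohort.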
To compare risk of failure in the unvaccinated and post COVID-19 infected groups to the vaccinated cohort, hazard ratios (HR) and 95% confidence limits were calculated using Stata software. (16) Chi-square calculations were performed using web based calculators. (19)

Results: Testing volume varied significantly over time (testing increased during times of increased COVID-19 incidence) and also increased with school reopening in the fall of 2021. During the time period of this study, the average rate of COVID-19 testing in Madera County was 383 tests/100,000 person-days. There were no reported cases of the omicron variant in Madera County during the time period of this study. Prior to matching, the vaccinated group was significantly older on average (48 years) than the unvaccinated and post COVID-19 groups (37 and 31 years respectively) (see Table 1). The matching process identified 6,318 persons in each group who had a matching age, gender, postal code and date of infection/vaccination (if more than one person could be matched, then a single match was selected at random). The average age of these groups was 36 years old, and 52% were female (see Table 1). Race and ethnicity were not used for matching as there were many persons missing this demographic information. However, the race and ethnicity data that are available allow a comparison of the three cohorts after matching.
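Rates expressed per 100,000 person-days, such as the testing rate quoted above, divide an event count by the total observation time summed over all persons. The sketch below shows that arithmetic under simple assumptions; the function names and the example inputs are hypothetical and are not taken from the study data.

```python
from datetime import date


def total_person_days(intervals):
    """Sum of observation time, in days, over (start_date, end_date) pairs."""
    return sum((end - start).days for start, end in intervals)


def rate_per_100k_person_days(event_count, person_days):
    """Events per 100,000 person-days of observation."""
    if person_days <= 0:
        raise ValueError("person_days must be positive")
    return event_count * 100_000 / person_days


if __name__ == "__main__":
    # Made-up example: two persons, each observed for 100 days, one event.
    follow_up = [
        (date(2021, 1, 1), date(2021, 4, 11)),
        (date(2021, 1, 1), date(2021, 4, 11)),
    ]
    days = total_person_days(follow_up)
    print(days, "person-days;", rate_per_100k_person_days(1, days), "per 100,000")
```

Because the denominator depends on when each person's observation starts and ends, shifting the start dates of a cohort changes the rate even if the people are the same.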
As Cohort 1 (unvaccinated) included all persons with negative COVID-19 test results beginning in 2020, and 2020 was when race and ethnicity data were most often not recorded, Cohort 1 has by far the largest percentage of subjects missing race/ethnicity data (59% for Cohort 1 vs. 9.2% and 11.9% for Cohorts 2 and 3). The data in Table 1 demonstrate how the use of zip code for matching resulted in similar race/ethnicity composition for Cohorts 2 and 3 (the differences in race/ethnicity data between groups 2 and 3 are statistically significant, however, p<0.01 by chi-square analysis). (19) The demographics of Madera County overall, and the three groups prior to and after matching are shown in Table 1.

[Table 1: demographics of Madera County overall, the three groups, and the matched cohorts drawn from these three groups. Footnote 1: Only White-Hispanics were reported in the three matched cohorts.]

Death and COVID-19 testing results: The number of deaths and positive tests for Madera County as a whole and the three matched cohorts are shown in Table 2.

Table 2: Outcomes for Madera County, the three selected groups, and the matched cohorts from each of these three groups.
* For Madera County data no 14 day lag was included: 13,577 is the total number of all COVID-19 positive tests for 1/1/2021-12/7/2021 and the incidence rate of 23.6/100,000 person-days is based on that number of infections.
** For the unvaccinated group prior to matching, a start date for observation of 1/1/2021 was used. After matching and assigning a start date matched to the vaccination date for the matched pair, the mean start date for the unvaccinated cohort changed to 7/7/2021. This shift in mean start date moved the observation period so that it corresponded with the fall surge in infections, resulting in a markedly higher incidence rate post matching.

In this analysis, the risk of death after testing positive for COVID-19 was, as expected, significantly higher than for the vaccinated and unvaccinated cohorts. For subjects that survived their initial COVID-19 infection, the risk of testing positive for COVID-19 again was higher in the first 14-90 days for the post COVID-19 cohort compared with the vaccinated cohort. However, the hazard ratio for retesting
COVID-19 positive between 90-300 days after the initial COVID-19 infection or after completing vaccination was significantly lower for the post COVID-19 cohort than for the vaccinated cohort (HR 0.54, 95% CI 0.44-0.87, p<0.05). The unvaccinated cohort had, as expected, the highest risk of COVID-19 infection during this 300-day time period.

One goal of this study was to compare the risk of reinfection post COVID-19 infection with the risk of vaccine breakthrough infections. Vaccine breakthrough infection rates can serve as a risk benchmark, as these rates have been studied both in large prospective clinical trials and in ongoing observational studies. The findings of our analysis of community testing based reinfection and vaccine breakthrough rates can provide accurate insight into these relative rates of infection to the extent that the following hold true: that the proportion of undiagnosed infections is similar in both cohorts (which depends largely on equivalent willingness and access to testing in the two cohorts), that the rate of prior immunity from undiagnosed COVID-19 infection is similar in both cohorts, and that the risk of exposure to COVID-19 and specific variants was also similar in both cohorts. We matched cohorts on date of infection/vaccination, age, gender, and postal code of residence to minimize these potential sources of bias, which would limit the generalizability of our findings. The matching of subjects based on date of initial COVID-19 infection or vaccination completion is of critical importance for limiting bias between the cohorts in risk of COVID-19 exposure and in exposure to different COVID-19 variants. This matching ensured that new positive tests were being recorded in all three cohorts during identical time periods of community-wide COVID-19 incidence and the presence of the same COVID-19 variants. Matching for age was also of critical importance.
Due to the age-based rollout of vaccination eligibility, the age distribution of the vaccinated group deviated strongly from community demographics, and this deviation changed over time. Our study design ensured we were comparing rates of death and COVID-19 test positivity for people of the same ages over identical time periods. For COVID-19 infection, age is probably the strongest predictor of mortality and matching on age is critical for this survival analysis. For the outcome of testing COVID-19 positive, matching for age is likely also important for correcting for differing exposure risks that may be age related (e.g., working vs. retired age groups), and for differences in willingness to test and access to healthcare between age groups. Vaccination eligibility was also prioritized initially for persons with high-risk medical conditions. We were not able to correct for that factor with the data available. This health-status bias may explain the ongoing mortality in the vaccinated cohort that appears to be higher than in the post COVID-19 cohort.

that all three groups showed a female preponderance both prior to matching and after matching. While one could posit a higher risk of exposure for one gender or a biological predisposition for infection, that would not explain the female predominance for testing COVID-19 negative (Group 1), or for vaccination. This female bias may be due to willingness to access health care (vaccination and testing) or to a higher risk for non-COVID-19 conditions (such as seasonal allergies) that could lead to higher rates of COVID-19 testing. This female bias demonstrates the importance of our use of matching criteria.
Matching by postal zip code was used as a proxy for race/ethnicity and socioeconomic status, and also for proximity to health care services. There are significant differences between these demographic factors in Madera County by postal code area. We did not have income data available for subjects, and race/ethnicity data had too many incomplete entries to effectively match on this variable directly without excluding large numbers of persons. However, a post-matching comparison of the race and ethnicity data that is available shows that our matching process for the cohorts did create similar race and ethnicity demographic profiles, especially for the two cohorts of greatest interest with the most complete data for these variables (post COVID-19 infection and vaccinated group).

Our "unvaccinated" cohort is likely the least representative of a truly random sample of unvaccinated persons in Madera County. To have demographic information available and a defined cohort of people to observe for positive test events, we used persons who had completed a test prior to 1/1/2021 that was negative. It is likely that persons who have tested differ from the 31% of the county that had no test results recorded. Having tested negative (vs. never testing) suggests this group may have higher risk of exposure, better access to testing, and a greater willingness to test than the "never-tested" group. These biases could make this cohort have a higher incidence of positive COVID-19 test results than the group of all unvaccinated persons in Madera County.

Recovery scenario and immunity in COVID-19 disease: A new strategy to predict the potential of reinfection
Incidence of COVID-19 reinfection: an analysis of outpatient-based data in the United States of America. medRxiv
Robust humoral and cellular immune responses and low risk for reinfection at least 8 months following asymptomatic to mild COVID-19
Evolution of antibody responses up to 13 months after SARS-CoV-2 infection and risk of reinfection
COVID-19 Vaccines and Vaccination
Outdoor and Indoor Youth and Recreational Adult Sports
Order of the State Public Health Officer: Requirements for Visitors in Acute Health Care and Long-Term Care Settings
Order of the State Public Health Officer: Vaccine Verification for Workers in Schools
Nonparametric Estimation from Incomplete Observations
Matplotlib - Visualization with Python
Chi Square Calculator - Up To 5x5