key: cord-0900249-yys8umd8 authors: Goss, Charles W.; Maricque, Brett B.; Anwuri, Victoria V.; Cohen, Rachel E.; Donaldson, Kate; Johnson, Kimberly J.; Powderly, William G.; Schechtman, Kenneth B.; Schmidt, Spring; Thompson, Jeannette Jackson; Trolard, Anne M.; Wang, Jinli; Geng, Elvin title: SARS-CoV-2 active infection prevalence and seroprevalence in the adult population of St. Louis County date: 2022-03-08 journal: Ann Epidemiol DOI: 10.1016/j.annepidem.2022.03.002 sha: 4f496cd5f359b0c7b87304cdb2eb864bef80322b doc_id: 900249 cord_uid: yys8umd8 Background: The true prevalence of COVID-19 is difficult to estimate due to the absence of random population-based testing. To estimate current and past COVID-19 infection prevalence in a large urban area, we conducted a population-based survey in St. Louis County, Missouri. Methods: The population-based survey of active infection (PCR) and seroprevalence (IgG antibodies) among adults (≥ 18 years) was conducted through random-digit dialing and targeted sampling of St. Louis County residents, with oversampling of Black residents. Infection prevalence was estimated using design-based weights and raking. Results: Between August 17 and October 24, 2020, 1,245 residents completed a survey and underwent PCR testing; 1,073 residents completed a survey and either underwent PCR and IgG testing or self-reported prior test results. The weighted prevalence of active infection was 1.9% (95% CI, 0.4% to 3.3%), and 5.6% of residents had ever been infected (95% CI, 3.3% to 8.0%). Overall infection hospitalization and fatality ratios were 4.9% and 1.4%, respectively. Conclusions: Through October 2020, the percentage of residents who had ever been infected was relatively low. A markedly higher percentage of Black and other minority residents than White residents had been infected with SARS-CoV-2. The St. Louis region remained highly vulnerable to widespread infection in late 2020.
Although the COVID-19 pandemic has led to almost six million deaths globally [1] and over 900,000 in the United States alone [2], fundamental epidemiological characteristics of the epidemic remain incompletely understood. In particular, because clinical manifestations during the first year of the pandemic spanned a remarkable spectrum of severity (as many as 50% of infected people are asymptomatic, and 90% are not severely ill) [3] [4] [5] , routine public health surveillance based on cases diagnosed in routine care vastly underestimates both the prevalence of active infection at any given time and the cumulative incidence over time. The limited supply of tests early in the pandemic [6] [7] [8] [9] and the variation in these limitations by geography [10] , time, and racial and socio-economic group [11] [12] [13] make it inappropriate to apply any single "correction" factor. Without an estimate of the true incidence and prevalence of infection, however, the actual burden of disease, and differences in burden by geography, racial group, and socio-economic status, remain incompletely known. In addition, calculating the risk of hospitalization and death, as well as the reach of municipal testing efforts, requires estimates of the true prevalence of infection. Few existing studies in the United States (US) that assess the prevalence of both active and cumulative COVID-19 infections [14] [15] [16] [17] [18] have been based on survey sampling methods. A number of studies estimating SARS-CoV-2 seroprevalence have been conducted globally [19, 20] , but most were conducted outside the US and are therefore of uncertain relevance to the US epidemic. Most US surveys are based on convenience samples [but see 21, 22], which are more likely than population-based probability samples to generate biased prevalence estimates. St. Louis County is among the 50 most populous counties in the US and is the most populous county in Missouri; about one quarter of its residents identify as Black and two thirds as White [23] . Early in the pandemic (April-May 2020), estimates of seroprevalence in the St. Louis metropolitan area [24] and the Missouri region [25] indicated that about 3% of individuals had been infected with SARS-CoV-2 at some point. However, these studies relied on convenience serum samples collected during routine healthcare, and accurate estimates of infection, which are critical for estimating key epidemiological measures (e.g., risk of death and hospitalization) and for informing public health decisions, were largely lacking in Missouri during the early stages of the pandemic. In the fall of 2020, we undertook a complex survey sampling study in St. Louis County to assess the prevalence of active SARS-CoV-2 infection (as measured by PCR assay) and the prevalence of infection to date (using both PCR and antibody assays). We stratified the sample to enable examination of prevalence by racial group and geography, and thereby to identify variable disease burden in sub-populations in the region. We also used the estimate of true infection prevalence to assess the regional penetration of testing (number of PCR-positive tests compared with the estimated number of active infections) as well as infection hospitalization and fatality ratios (cumulative hospitalizations and deaths compared with the number ever infected). This formal application of a survey sampling approach offers additional insight into the extent of the COVID-19 pandemic, regional variability in disease burden, and the impact of public health activities. Our target population consisted of St. Louis County, Missouri residents aged ≥ 18 years who were not in long-term care facilities and who could be reached by landline or cell phone. We sampled individuals from this population through random digit dialing (RDD).
Telephone numbers were obtained from Marketing Systems Group (MSG), a commercial vendor that supplies contact information for a range of survey purposes (e.g., public health surveillance, marketing, political campaigns). We initially sought equal numbers of Black and White respondents to obtain estimates of comparable precision in each group. Because ~24% of county residents identify as Black, and the racial/ethnic identity of phone owners is not known a priori, we oversampled phone numbers linked to geographical areas where the majority of residents are Black. Initially, purely random-digit-dialed numbers generated according to Behavioral Risk Factor Surveillance System (BRFSS) [26] protocols were used to recruit St. Louis County residents for testing (eligible population ~777,067). Because of a slow ramp-up of calling activities and time constraints, we later supplemented the sample with listed cell and landline phone numbers provided by MSG. Using listed numbers increases operational efficiency, because the sample excludes nonworking and business numbers and is accompanied by socio-demographic information, but it sacrifices a pure probability sample, since the mechanisms of selection into listed samples are not comprehensively known to the vendor. Residents reached by phone were offered SARS-CoV-2 testing and invited to participate in a 15-minute survey. PCR and/or antibody testing was conducted at one of seven locations distributed throughout St. Louis County. The Health and Behavioral Risk Research Center (HBRRC) at the University of Missouri Columbia was contracted to complete calling, phone interviews, and test scheduling. We offered gift card incentives for completing the survey and for participating in PCR and/or antibody testing; we also provided round-trip rides via Uber Health and cab vouchers to participants who needed transportation. The surveys were primarily based on the 2020 BRFSS and included newly developed, custom questions related to the COVID-19 pandemic.
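Oversampling means that each respondent from an oversampled area stands for fewer residents than a respondent from elsewhere, and the analysis has to undo this with weights. As a minimal sketch, assuming a simple two-stratum design with simple random sampling within each stratum (the stratum names, sizes, and respondent counts are hypothetical, not study data), a base weight is the inverse of each respondent's selection probability:

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    name: str
    population: int   # eligible adults in the stratum (hypothetical)
    respondents: int  # completed surveys from the stratum (hypothetical)


def base_weights(strata):
    """Inverse-probability base weight for respondents in each stratum.

    Under simple random sampling within a stratum, each respondent's
    selection probability is respondents / population, so the base
    weight (residents represented per respondent) is its inverse.
    """
    return {s.name: s.population / s.respondents for s in strata}


# Hypothetical design: equal respondent counts drawn from strata of very
# different sizes, i.e., the smaller stratum is oversampled.
strata = [
    Stratum("majority_black_areas", population=150_000, respondents=600),
    Stratum("other_areas", population=600_000, respondents=600),
]
weights = base_weights(strata)
weighted_total = sum(weights[s.name] * s.respondents for s in strata)

for name, w in weights.items():
    print(f"{name}: each respondent represents {w:.0f} residents")
print(f"weighted total: {weighted_total:.0f}")
```

Here each oversampled respondent counts for a quarter as many residents as a respondent elsewhere, and the weighted total recovers the hypothetical population. The study's actual weights also used raking and calibration, which this sketch omits.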
Nasopharyngeal swabs were collected at each of the seven specimen collection sites, and PCR tests were offered to all participants in this study. Presence or absence of infection with SARS-CoV-2 was determined using FDA-approved RT-PCR assays on the Roche Cobas® platform [27] . All PCR testing was performed in CAP/CLIA-certified laboratories. At five of the seven sites, 5 mL venous blood samples were collected from participants who agreed to an IgG antibody test; because of personnel limitations, antibody tests were performed at only 2 of the sites during the last 2 weeks of the study. Presence or absence of SARS-CoV-2-specific IgG antibodies was determined using FDA-approved Abbott Architect chemiluminescent microparticle immunoassays [28] . All IgG antibody testing was performed in CAP/CLIA-certified laboratories. To obtain county-level estimates of the number of individuals infected at the time of testing, we combined the prevalence estimated in our study with St. Louis County census and testing data from the Missouri Department of Health & Senior Services. The percentage of true active infections captured by county testing was determined by tabulating the number of residents with PCR-positive tests around the midpoint of the study testing period (September 20, 2020) and comparing this total with the expected number of individuals with active infection. We assumed that individuals remained PCR-positive for 10 days on either side of the date of their positive PCR test (±10 days). Given the sensitivity of estimates to the assumed duration of infection and the uncertainty about the duration of PCR positivity in nonhospitalized patients, we also explored the effects of assuming ±7 and ±14 days [29] . Repeat positive tests and tests from congregate facilities (e.g., nursing homes) were removed from the testing dataset.
The infection hospitalization ratio (IHR) and infection fatality ratio (IFR) were determined by tabulating the cumulative number of COVID-19-related hospitalizations and deaths (excluding congregate facilities such as nursing homes) from March 2020 through the midpoint of our testing period (hospitalization data from the Missouri Hospital Association, and death data from the Missouri Department of Health & Senior Services). These totals were then divided by the estimated number of residents who had ever been infected to obtain IHR and IFR estimates. To obtain the denominators for the 95% confidence bounds of IHR and IFR, the lower and upper confidence bounds of the ever-infected prevalence estimate were multiplied by the county census estimates. Estimates are reported overall and stratified by age, race, and sex. Where race was unknown, it was imputed using a hot-deck imputation approach [30] . Our analytical approach entailed imputation and weighting of the survey data followed by a weighted frequency analysis. Sociodemographic variables used in the weighting process (sex, geographic area, age, race, ethnicity, education, income, smoking history, COVID-19 testing history, depression, and reduced adult contact due to the epidemic), as well as other key survey variables, were imputed using a hierarchical hot-deck imputation approach [30] . The probability sample (n = 951) was weighted using a combination of design-based weights and raking on geodemographic variables. For the targeted non-probability sample (n = 1,363), a set of calibration variables (depression, smoking history, testing history, and reduced adult contact during the pandemic) was used in addition to the geodemographic weighting variables, to better integrate these data with the probability-based sample and obtain the final weights for the total survey sample (n = 2,314).
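The study's raking was done in SAS, and its exact specification is in the supplement. As a generic illustration of the technique only, here is a minimal NumPy sketch of raking (iterative proportional fitting), which repeatedly rescales weights so that the weighted totals for each variable in turn match known population totals. The variables, category codes, and targets are hypothetical.

```python
import numpy as np


def rake(base_weights, categories, targets, max_iter=100, tol=1e-8):
    """Rake weights so weighted category totals match population totals.

    base_weights: design-based weights, one per respondent.
    categories: {variable: integer array of category codes per respondent}.
    targets: {variable: {category code: population total}}.
    Assumes every target category contains at least one respondent.
    """
    w = np.asarray(base_weights, dtype=float).copy()
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for var, codes in categories.items():
            for cat, total in targets[var].items():
                mask = codes == cat
                factor = total / w[mask].sum()
                w[mask] *= factor  # match this margin exactly
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
        if largest_adjustment < tol:  # every margin already matched
            break
    return w


# Hypothetical example: six respondents, two raking variables.
sex = np.array([0, 0, 1, 1, 1, 1])  # 0 = female, 1 = male
age = np.array([0, 1, 0, 1, 1, 0])  # 0 = younger, 1 = older (hypothetical bands)
weights = rake(
    base_weights=np.full(6, 100.0),
    categories={"sex": sex, "age": age},
    targets={"sex": {0: 520.0, 1: 480.0}, "age": {0: 400.0, 1: 600.0}},
)
print(weights.round(1))
print(f"female total: {weights[sex == 0].sum():.1f}")
print(f"older total: {weights[age == 1].sum():.1f}")
```

Raking matches only the marginal totals, not the full cross-classification, which is why it works with the one-way census margins typically available for geodemographic variables.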
Individuals with active infection were identified from the subset of participants who completed the survey and took a PCR test (n = 1,245) . The "ever infected" group (total n = 1,073) comprised individuals who completed a survey and took both PCR and IgG tests, plus those who did not test with us but reported a recent positive or negative test as their main reason for not wanting to test in our study (n = 31). Data for both outcomes were re-weighted using design-based weights and an iterative raking approach. More details regarding the imputation and weighting approach can be found in the supplement (see Supplementary Methods). Prevalence of active infection (PCR) and of ever infection (PCR, IgG, or self-reported prior positive or negative test) was estimated using Taylor series linearization [31] to adjust variance estimates for the survey weights. In addition to overall prevalence, we estimated prevalence and associations between infection and key survey variables: age (18-39, 40-60, > 60 years), race (White, Black/other), sex (F, M), county division (Central, Inner North, Outer North, South, West), prior COVID-19 testing history, mask-wearing frequency (always vs not always), COVID-19-related symptoms, income level (< $35,000 vs ≥ $35,000), and education (attended college vs did not attend college). Results are reported as n (%), mean ± SD, and weighted prevalence (%) with 95% confidence intervals (CIs). As sensitivity analyses, we estimated the unweighted prevalence of active infection (PCR), of ever infection (PCR, IgG, self-reported), and of ever infection with self-reported results excluded. All analyses were conducted using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA), and P-values < 0.05 were considered significant. Over the course of the study, a total of 121,423 persons were reached and 4,994 were eligible to be tested ( Figure 1 ).
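For intuition about the variance adjustment, here is a minimal sketch of a weighted prevalence with a Taylor-series-linearized standard error. It assumes a single-stage design with no strata, clusters, or finite-population correction, and it uses a simple Wald interval. Survey procedures such as SAS PROC SURVEYFREQ handle the general case and offer other interval types; this sketch is not the authors' code, and the data are hypothetical.

```python
import math

import numpy as np


def weighted_prevalence(y, w, z=1.96):
    """Weighted prevalence with a Taylor-linearized standard error.

    Treats each respondent as an independent sampling unit drawn with
    replacement (no strata, clusters, or finite-population correction).
    y: 1 if infected, 0 otherwise; w: final survey weights.
    Returns (prevalence, standard error, Wald 95% CI).
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n = y.size
    total_w = w.sum()
    p = np.sum(w * y) / total_w
    # Linearized values of the ratio estimator sum(w*y) / sum(w).
    u = w * (y - p) / total_w
    var = n / (n - 1) * np.sum((u - u.mean()) ** 2)
    se = math.sqrt(var)
    return p, se, (p - z * se, p + z * se)


# Hypothetical data: 3 positives among 10 respondents with unequal weights.
y = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
w = [250, 1000, 1000, 250, 1000, 800, 600, 400, 1000, 700]
p, se, (lo, hi) = weighted_prevalence(y, w)
print(f"prevalence {p:.1%} (95% CI {lo:.1%} to {hi:.1%}), SE {se:.3f}")
```

With equal weights the linearized variance reduces to p(1 - p)/(n - 1), a useful sanity check; unequal weights generally inflate it.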
Of these, 1,245 took a PCR test, and 1,073 took both PCR and IgG tests or declined to test with us but reported a recent prior negative or positive test result. Sixty-three percent of respondents were female, and 63%, 35%, and 2% were White, Black, or another minority group, respectively. The mean age of the cohort was 60 ± 15.5 years. The highest percentages of respondents were in the northern (51%) and western (23%) parts of the county, with the southern (13%) and central (13%) parts together comprising less than 30% of the total (Figure 2 ). Additional characteristics are provided in Table 1 . We estimated the weighted prevalence of active infection, as detected by PCR assay, to be 1.9% (95% CI, 0.4% to 3.3%) (Table 2). Prevalence in females was slightly higher than in males (2.6% vs 1.0%), although the difference was not significant (P = 0.207). By age group, prevalence was 2.0% among residents aged 18-39 years, 2.4% among those aged 40-60 years, and 1.2% among those > 60 years (P = 0.797). There were small, non-significant (P = 0.312) differences in prevalence among regions of the county, with the highest prevalence in the northern areas (Inner North, 3.9%; Outer North, 2.5%), followed by South (2.0%) and West (1.0%); no participant from Central County had a positive PCR test. Symptom status, mask wearing, education level, and income level were not significantly associated with active infection (Table 2) . Compared with the weighted results, the unweighted absolute prevalence estimates and relative differences between groups were smaller, but the direction of differences was largely the same (Supplemental Table 1 ). We estimated ( Table 3 ) the prevalence of ever having been infected (as indicated by a positive antibody test, a positive PCR test, or a self-reported positive test) to be 5.6% (95% CI, 3.7% to 8.5%).
The estimated prevalence among Black and other minority residents was 10.5%, almost three times that among White residents (3.6%) (P = 0.008). Those reporting a prior test (any result) had an estimated prevalence of 9.9%, compared with 3.7% among those reporting no previous test (P = 0.020). The 18-39 and 40-60 age groups had higher estimated prevalence than the > 60 group, but the difference was not significant (P = 0.691). Prevalence in females was slightly higher than in males (6.3% vs 4.8%), although not significantly (P = 0.539). The northern and southern portions of the county had the highest prevalence, with estimates near or above 7%, whereas the western and central areas had lower estimates of 4.8% and 1.1%, respectively (P = 0.460). Mask wearing, education, COVID-19 symptoms, and income level were not strongly associated with ever being infected (Table 3 ). Sensitivity analyses showed that, in general, the unweighted and weighted results yielded similar prevalence estimates (Supplemental Table 2 ). When we excluded self-reported prior positive or negative test results, the overall unweighted prevalence was just over 1 percentage point lower than the weighted estimate (4.4% overall); although absolute prevalence estimates differed across groups, the relative differences were generally similar (Supplemental Table 3 ). Based on the prevalence of active infection estimated in our survey (as indicated by PCR), we estimated that routine testing captured 12.6% to 22.6% of true infections, depending on how long individuals are assumed to remain actively infected ( Table 4 ). The overall infection hospitalization ratio (IHR) was 4.9% (95% CI, 3.3% to 7.5%), and the infection fatality ratio (IFR) was 1.4% (95% CI, 0.9% to 2.2%).
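The IHR and IFR calculation described in the Methods, including how the confidence bounds are obtained by substituting the bounds of the ever-infected prevalence into the denominator, can be sketched as below. The prevalence estimate and its interval are the ones reported above; the population and event counts are hypothetical, so the output is not the study's IHR.

```python
def infection_ratio(events, prevalence, prev_lower, prev_upper, population):
    """Events (hospitalizations or deaths) per estimated infection.

    The denominator is prevalence * population. Because the ratio falls as
    the denominator grows, the upper prevalence bound yields the lower
    ratio bound and vice versa.
    """
    ratio = events / (prevalence * population)
    lower = events / (prev_upper * population)
    upper = events / (prev_lower * population)
    return ratio, lower, upper


# Ever-infected prevalence and 95% CI as reported in the Results; the
# population and hospitalization count below are hypothetical.
ratio, lower, upper = infection_ratio(
    events=2_000, prevalence=0.056, prev_lower=0.037, prev_upper=0.085,
    population=800_000,
)
print(f"IHR {ratio:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

This approach propagates only the uncertainty in prevalence. It treats the event counts and the census denominator as fixed.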
IHR and IFR estimates were similar across race and sex groups (Table 5) ; however, they differed strongly by age: the youngest group (18 to 39 years) had the lowest IHR (1.5%) and IFR (~0%), the 40-60 year-old group had intermediate estimates (IHR 5%, IFR 0.4%), and the oldest group (> 60 years) had more than double the estimates of the next-oldest group (IHR 10.6%, IFR 5%). Here, we report population-based estimates of the prevalence of active and past SARS-CoV-2 infection among residents of St. Louis County. During the study period (August 17 through October 24, 2020), 1.9% of St. Louis County residents had active SARS-CoV-2 infection, and 5.6% had either an active or a past infection. These data indicate a substantial burden of disease in the St. Louis region and highlight the considerable portion of the population that was still at risk of SARS-CoV-2 infection at the time of this study in the fall of 2020. This is consistent with the rapid rise in cases in the region during December 2020 and January 2021. Results from our study provided critical and timely information to St. Louis County health officials making decisions about COVID-19 policy in the region. These data are also consistent with studies in other relatively heavily affected regions that yielded similar estimates of disease burden at around the same calendar time [32] . Although we recorded data on several factors thought to be associated with infection prevalence, race was the only factor with consistently significant associations with both active and ever-infected outcomes. Communities of color have been disproportionately affected by SARS-CoV-2 infection and COVID-19 illness [10] [11] [12] [33] .
Consistent with nationwide trends and local testing data, we observed a significant disparity in overall infection prevalence between Black and White residents, with Black residents nearly three times as likely as White residents to have been infected with SARS-CoV-2. These disparities are consistent with long-standing race-based disparities in health [34] [35] [36] [37] , overrepresentation in essential work settings [38] , and lack of access to effective COVID-19 testing resources [10, 11] . Age, sex, and income level have been associated with infection prevalence in a recent global meta-analysis [19] . In our study, we found numerical differences in the expected direction for two of these three factors (higher prevalence at younger ages and at lower incomes), but none of the differences were significant. This suggests that our study was underpowered to detect differences across these groups, that these factors are not important in the region we studied, or both. Nevertheless, the data presented here (especially the racial disparities) are critical for prioritizing resource allocation, modifying testing approaches, and improving rates of testing in the communities hardest hit by COVID-19. Results from the current study indicate that routine testing in St. Louis likely detected only about 1 in 5 COVID-19 infections, meaning that the majority of infections were never identified as cases. A wide range of factors may contribute to this, including a substantial percentage of asymptomatic infections, a lack of widespread surveillance testing, decentralized recommendations about when testing was indicated, and accessibility issues. While COVID-19 is a disease with remarkably heterogeneous clinical manifestations [39] , the low case-ascertainment rate and the frequency of mild and asymptomatic infections underscore the challenges to public health.
Low detection rates severely limit the value of public health strategies such as quarantine of detected cases or contact tracing [40] , strategies whose impact depends directly on the proportion of cases that are rapidly diagnosed and acted upon. Indeed, in most regions of the United States, these measures could not keep up with the epidemic and had minimal impact on it, in part because of the limited penetration of testing. Rapid development of strategies to optimize testing, alongside the characteristics of the tests themselves, will be critical to the response to the present pandemic as well as the next one. Consistent with previous estimates, our work suggests that approximately 5% of persons infected with SARS-CoV-2 were hospitalized. Population-based studies in Indiana [41] and Connecticut [42] from early in the pandemic (March through June 2020) reported overall IHR estimates of 2.1% and 6.9%, respectively, which bracket our estimate. This figure is relatively sensitive to the age structure of those infected (which depends on the age structure of the population), and our results show a strong increase in IHR with age that mirrors these studies. The proportion of infected people who are hospitalized is an important quantity for anticipating the burden on the health system, and therefore for allocating and optimizing resources. The relationship is also of high interest for modeling exercises: because the threshold for hospitalization is relatively uniform, hospitalizations can be used to estimate the true burden of infection when diagnosed cases are known to capture only a fraction of true cases. Infection fatality ratio (IFR) estimates from our study indicated that just over 1% of infected individuals died, with a strong positive association between age and fatality: essentially 0% of infections in the youngest group (18-39 years) and 5% of infections in the oldest group (> 60 years) resulted in death.
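To illustrate why the overall IHR depends on the age structure of those infected, here is a minimal sketch that treats the overall IHR as an infection-weighted average of age-specific IHRs. The age-specific values are this study's point estimates from the Results; the two age distributions of infections are hypothetical.

```python
def overall_ihr(age_specific_ihr, infection_share):
    """Overall IHR as an infection-weighted average of age-specific IHRs.

    infection_share gives the fraction (or count) of infections in each
    age group; it is normalized, so the shares need not sum to 1.
    """
    total = sum(infection_share.values())
    return sum(
        age_specific_ihr[group] * infection_share[group] for group in age_specific_ihr
    ) / total


# Age-specific IHRs reported in this study's Results.
ihr_by_age = {"18-39": 0.015, "40-60": 0.05, ">60": 0.106}

# Two hypothetical distributions of infections across age groups.
younger_infections = {"18-39": 0.5, "40-60": 0.3, ">60": 0.2}
older_infections = {"18-39": 0.3, "40-60": 0.3, ">60": 0.4}

for label, shares in [("younger", younger_infections), ("older", older_infections)]:
    print(f"{label} infection mix: overall IHR {overall_ihr(ihr_by_age, shares):.1%}")
```

Holding age-specific risks fixed, shifting infections toward the oldest group raises the overall IHR. This is one reason overall IHRs from populations with different age structures need not agree even when age-specific risks are similar.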
As with the IHR estimates, our overall IFR and its positive association with age align with previously reported IFR estimates from early in the pandemic [42] . These IFR estimates are also within the range of results from a recent mathematical modelling study in England [43] . Results from the current study contribute to our understanding of the severity of disease once infected, and confirm that, despite the heterogeneity of clinical manifestations, SARS-CoV-2 infection can have severe clinical consequences in a small but substantial proportion of individuals. This study has several limitations. While the study was in progress, we modified the sampling approach to ensure an adequate number of tests among Black residents, including listed phone numbers to obtain targeted samples at the cost of decreased representativeness. Furthermore, the relatively low response rates (especially among younger participants) could have introduced additional bias into our estimates. The closure of some lesser-used testing sites and limited evening and weekend testing availability may have deterred some individuals from testing. PCR-based prevalence estimates are limited by the timing of testing relative to infection: an infected individual is most likely to test positive in PCR-based assays within 4 days of the onset of infection. IgG-based estimates are limited by decreasing antibody assay sensitivity as the time between SARS-CoV-2 infection and testing increases [44] . Our sample did not include residents of long-term care facilities; this group is particularly vulnerable to COVID-19 infection, hospitalization, and death, and its exclusion may have led to underestimates of IHR and IFR. Finally, for our "ever infected" prevalence estimates, we included self-reported results for a subset of 31 individuals who did not test during the study.
The self-reported test results were not confirmed with outside sources (e.g., health care records), and only a subset of participants indicated this as the main reason for not testing during the study. In St. Louis County through October 2020, the prevalence of current or past infection was relatively low. Notably, a higher percentage of Black and other minority residents than White residents were currently or had ever been infected with SARS-CoV-2. The St. Louis region remained highly vulnerable to widespread infection in late 2020. Routine testing did not identify most true cases, implying that mitigation strategies that depend on case identification (e.g., contact tracing) were unlikely to be effective. Because every method for studying the epidemiology of COVID-19 has shortcomings, researchers must triangulate among different sources of data to develop an accurate understanding of transmission dynamics and disease burden, and this paper contributes to that understanding. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
COVID-19 Global Map
Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
Estimates of the severity of coronavirus disease 2019: a model-based analysis. The Lancet Infectious Diseases 2020
Coronavirus Testing Falls Woefully Short as Trump Seeks to Reopen U.S. - The New York Times
RNA Extraction Kits for COVID-19 Tests Are in Short Supply in US | The Scientist Magazine®
Nasal Swab Sampling for SARS-CoV-2: a Convenient Alternative in Times of Nasopharyngeal Swab Shortage
Comparison of seven commercial RT-PCR diagnostic kits for COVID-19
Racial Segregation, Testing Site Access, and COVID-19 Incidence Rate in Massachusetts
Racial Disparities in COVID-19 Testing and Outcomes: Retrospective Cohort Study in an Integrated Health System
Understanding Drivers of COVID-19 Racial Disparities: A Population-Level Analysis of COVID-19 Testing among Black and White Populations
Patterns of COVID-19 testing and mortality by race and ethnicity among United States veterans: A nationwide cohort study
Data-based analysis, modelling and forecasting of the COVID-19 outbreak
Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy
Effectiveness of isolation, testing, contact tracing, and physical distancing on reducing transmission of SARS-CoV-2 in different settings: a mathematical modelling study. The Lancet Infectious Diseases
Mathematical modelling of the dynamics and containment of COVID-19 in Ukraine
Modeling, state estimation, and optimal control for the US COVID-19 outbreak
Update on SARS-CoV-2 seroprevalence: regional and worldwide
SARS-CoV-2 seroprevalence worldwide: a systematic review and meta-analysis
Estimated seroprevalence of SARS-CoV-2 antibodies among adults in Orange County
Seroprevalence of SARS-CoV-2-Specific Antibodies Among Adults
Antibodies in Children and Adults in
Roche's cobas SARS-CoV-2 Test to detect novel coronavirus receives FDA Emergency Use Authorization and is available in markets accepting the CE mark
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
Survey Data Imputation with PROC SURVEYIMPUTE
A Simple Method for Approximating the Variance of a Complicated Estimate
Prevalence of SARS-CoV-2 in Spain (ENE-COVID): a nationwide, population-based seroepidemiological study
Invited commentary: "race," racism, and the practice of epidemiology
COVID-19 and Racial/Ethnic Disparities
Racial Health Disparities and Covid-19 - Caution and Context
Racial Disparities in COVID-19 Mortality Among Essential Workers in the United States
An Outbreak of Covid-19 on an Aircraft Carrier
Modelling the impact of testing, contact tracing and household quarantine on second waves of COVID-19
How Many SARS-CoV-2-Infected People Require Hospitalization? Using Random Sample Testing to Better Inform Preparedness Efforts
SARS-CoV-2 Infection Hospitalization Rate and Infection Fatality Rate Among the Non-Congregate Population in Connecticut
Levels of SARS-CoV-2 population exposure are considerably higher than suggested by seroprevalence surveys
SARS-CoV-2 antibody magnitude and detectability are driven by disease severity, timing, and assay