key: cord-0755436-c2hgkbg6 authors: Rosenberg, Eli S.; Tesoriero, James M.; Rosenthal, Elizabeth M.; Chung, Rakkoo; Barranco, Meredith A.; Styer, Linda M.; Parker, Monica M.; John Leung, Shu-Yin; Morne, Johanne E.; Greene, Danielle; Holtgrave, David R.; Hoefer, Dina; Kumar, Jessica; Udo, Tomoko; Hutton, Brad; Zucker, Howard A. title: Cumulative incidence and diagnosis of SARS-CoV-2 infection in New York date: 2020-06-17 journal: Ann Epidemiol DOI: 10.1016/j.annepidem.2020.06.004 sha: 73ec0ef86bf8d9857de0c2c9869f2bd731f52d56 doc_id: 755436 cord_uid: c2hgkbg6 PURPOSE: New York State (NYS) is an epicenter of the SARS-CoV-2 pandemic in the United States. Reliable estimates of cumulative incidence in the population are critical to tracking the extent of transmission and informing policies. METHODS: We conducted a statewide seroprevalence study among a 15,101 patron convenience sample at 99 grocery stores in 26 counties throughout NYS. SARS-CoV-2 cumulative incidence was estimated from antibody reactivity by first post-stratification weighting then adjusting by antibody test characteristics. The percent diagnosed was estimated by dividing diagnoses by estimated infection-experienced adults. RESULTS: Based on 1,887 of 15,101 reactive results (12.5%), estimated cumulative incidence through March 29 was 14.0% (95% CI: 13.3-14.7%), corresponding to 2,139,300 (95% CI: 2,035,800-2,242,800) infection-experienced adults. Cumulative incidence was highest in New York City (NYC) 22.7% (95% CI: 21.5-24.0%) and higher among Hispanic/Latino (29.2%), non-Hispanic black/African American (20.2%), and non-Hispanic Asian (12.4%) than non-Hispanic white adults (8.1%, p<.0001). An estimated 8.9% (95% CI: 8.4-9.3%) of infections in NYS were diagnosed, with diagnosis highest among adults ≥55 years (11.3%, 95% CI: 10.4-12.2%). 
CONCLUSIONS: From the largest US serosurvey to date, we estimated > 2 million adult New York residents were infected through late March, with substantial disparities, although cumulative incidence remained below herd immunity thresholds. Monitoring, testing, and contact tracing remain essential public health strategies.
The first cases of COVID-19 were identified in New York State (NYS) in early March 2020, and since then NYS, particularly the metropolitan New York City (NYC) area, has become one of the most impacted communities in the United States [1, 2]. As of June 2, 2020, over 370,000 laboratory-confirmed diagnoses have been made, accounting for approximately 25% of US diagnoses [2, 3]. As with most infections, lab-confirmed diagnoses undercount the true population-level burden of infections; with SARS-CoV-2, the virus that causes COVID-19, key factors that contribute to underdiagnosis include absent or mild symptoms and limited access to testing [4]. Thus, although NYS has tested more residents for COVID-19 than any other state (over 2,229,000 persons tested through June 2, 2020), it is likely that laboratory-confirmed cases represent a relatively small portion of the total number of persons with a history of infection in NYS [3]. Estimates of COVID-19 cumulative incidence (i.e., prevalence of previous or current infection) can inform the extent of epidemic spread as well as the number of persons still susceptible and progress towards herd immunity, which are critical for parameterizing simulation models and informing policies, including those for altering societal restrictions [5]. Furthermore, such data provide needed denominators for understanding the extent of diagnosis; rates of hospitalization, morbidity, and mortality; and geographic differences. Antibody testing for SARS-CoV-2 has emerged as an important tool for understanding infection history.
The utility of antibody tests for diagnostics is limited by a several-week window period for development of IgG antibodies and by evidence that not all persons with infection develop an antibody response, and their interpretation for short- and long-term immunity remains uncertain. Nevertheless, as with other infections, antibody prevalence serostudies with validated assays can assess population-level cumulative incidence in the recent past [6-11]. Antibody serostudies for SARS-CoV-2 are being conducted in other countries and, in the US, at the national and county levels, but none have been conducted at the state level and only one population-based serostudy has been peer-reviewed [12-15]. The current array of recommendations against individual movement and business operation during the pandemic complicates study specimen collection. A recent RNA survey in Iceland and serosurveys in two California counties conducted sampling at centralized testing sites, which offer ease of execution, particularly in small geographies, but with potentially large self-selection biases [13, 15, 16]. Alternative approaches include random at-home mail-in testing and community-intercept studies in high-traffic locations that remain open [14]. To provide a statewide picture of COVID-19 infection through late March and diagnoses by early April 2020, the NYS Department of Health (NYSDOH) conducted a community-based serostudy throughout NYS during April 19-28, 2020. Cumulative incidence among non-institutionalized adults, by geographic and demographic features, was estimated from weighted reactivity rates that were adjusted for validated test characteristics. Combining these findings with cumulative diagnoses enabled estimation of the percent of infections diagnosed.
NYSDOH recruited a convenience sample of over 15,000 New Yorkers attending 99 grocery stores across 26 counties, located in all regions of NYS, which together contain 87.3% of the state's population (Figure).
Grocery stores were chosen as the testing venue because they were classified as essential businesses permitted to remain open and, due to the necessity of grocery shopping, they attract a heterogeneous clientele [17]. Store locations were chosen to increase sample coverage of the racial and ethnic diversity of the statewide population. Testing occurred over 6 distinct days from April 19 through April 28, 2020. Each store had a team of 6-8 staff responsible for recruiting participants, collecting specimens, recording data, and managing specimen transport to Wadsworth Center Laboratory (Albany, NY) for analysis. Eligible subjects were adults ≥18 years who were New York residents, irrespective of county. They were recruited through a flyer posted at stores and by systematically approaching each patron as they entered the store. To minimize selection bias, community testing site locations were not announced ahead of time and were changed frequently (i.e., 99 venues in 6 days). Most locations were used only once and no individual site was used more than twice. Testing was halted at locations that became publicized on social media. Patrons were given information about the testing and, if interested, completed written informed consent. Procedures included a brief demographic questionnaire and dried blood spot (DBS) collection by trained personnel. Approximately 13% of participants initially had missing demographic data. Staff attempted to capture these data through >2,500 follow-up phone calls, reaching all but approximately 75 participants, who were subsequently excluded from analyses. Test results were delivered to participants by text message if non-reactive and by phone if indeterminate or reactive.
Testing approach
Blood was collected by fingerstick onto custom 903 filter paper cards labeled with a specimen ID. Cards were dried for 3-4 hours at ambient temperature and transported to the Wadsworth Center. A fully saturated ≥3-mm diameter DBS was required.
A total of 525 DBS cards from eligible individuals were rejected: 433 with insufficient or improperly collected blood and 92 with no specimen ID. Acceptable DBS cards were processed for testing. SARS-CoV IgG testing was conducted using a microsphere immunoassay (MIA) developed and validated for DBS by the NYSDOH Wadsworth Center. Briefly, nucleocapsid (N) antigen-coupled magnetic beads were incubated with blood eluted from a 3-mm DBS punch. Phycoerythrin-labeled goat anti-human IgG secondary antibody was used to detect microsphere-bound IgG antibodies, and median fluorescence intensity (MFI) was determined using a FlexMap 3D (Luminex Corp., Austin, TX). The mean MFI of 90-100 negative DBS was used to set cut-offs: results greater than the mean MFI + 6 standard deviations (SD) were reported as reactive, results less than the mean MFI + 3 SD were nonreactive, and results between the mean MFI + 3 SD and the mean MFI + 6 SD were indeterminate. Serosurvey testing was initiated with SARS-CoV IgG v1 which used SARS-CoV-1 N antigen (Wadsworth Center, Albany, NY), and was completed [6, 7]. We estimated SARS-CoV-2 cumulative incidence from observed antibody reactivity using two sequential steps: 1) post-stratification weighting to standardize to the New York State population and 2) adjustment by estimated antibody test characteristics. Using the National Center for Health Statistics bridged-race file, weights were assigned to each participant based on their membership in each of 160 strata of sex, race and ethnicity (Hispanic, non-Hispanic white, non-Hispanic black, non-Hispanic Asian, and non-Hispanic other), age (18-34, 35- [18]. Post-stratification weights were defined as the proportion of the state's population represented by each stratum divided by the analogous proportion in the sample [19, 20].
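As a minimal sketch of the weighting step described above, the following Python code computes a post-stratification weight for each participant as the stratum's share of the population divided by its share of the sample, then a weighted percent reactive. The function names, the two-stratum layout, and all numbers are hypothetical and chosen only for illustration; they are not study data, and the study's 160 strata and survey-procedure confidence intervals are not reproduced here.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_counts):
    """Return one weight per participant: population share / sample share of their stratum."""
    sample_counts = Counter(sample_strata)
    n_sample = len(sample_strata)
    n_population = sum(population_counts.values())
    weights = []
    for stratum in sample_strata:
        population_share = population_counts[stratum] / n_population
        sample_share = sample_counts[stratum] / n_sample
        weights.append(population_share / sample_share)
    return weights


def weighted_percent_reactive(weights, reactive_flags):
    """Weighted percent reactive; indeterminate results are passed in as False."""
    reactive_weight = sum(w for w, r in zip(weights, reactive_flags) if r)
    return 100.0 * reactive_weight / sum(weights)


# Toy illustration (made-up numbers): stratum A is 60% of the population
# but only 20% of the sample, so its participants are weighted up.
population = {"A": 600, "B": 400}
strata = ["A", "A"] + ["B"] * 8
reactive = [True, False, True] + [False] * 7

weights = poststratification_weights(strata, population)
estimate = weighted_percent_reactive(weights, reactive)
print(f"unweighted: {100 * sum(reactive) / len(reactive):.1f}%  weighted: {estimate:.1f}%")
```

In this toy case the unweighted reactivity is 20% while the weighted estimate is 35%, because the under-sampled stratum has the higher reactivity. With weights defined this way, the weights sum to the sample size.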
Next, we computed weighted frequencies for the percent reactive statewide, with one-way stratifications by sex, race and ethnicity, age group, and region, and two-way stratifications within levels of region, including 95% confidence intervals (CI), with differences assessed using Rao-Scott χ² tests [21]. Indeterminate results were assumed non-reactive and statistical procedures were two-sided at α=0.05. In the second step, weighted reactivity estimates ($P_{\text{reactive}}$) and their 95% CI bounds were corrected for test sensitivity and specificity, based on validation data, to yield cumulative incidence, per Bayes' Rule as applied to the diagnostic 2x2 table [13, 22]:

$$\text{cumulative incidence} = \frac{P_{\text{reactive}} + \text{specificity} - 1}{\text{sensitivity} + \text{specificity} - 1}$$

Primary analyses used the sensitivity and specificity point-estimates from the validation studies, with sensitivity analyses at the extremes of test characteristics' 95% CI ([96.1% specificity, 92.1% sensitivity], [100% specificity, 83.7% sensitivity]). Test-characteristic adjusted cumulative incidence values were multiplied by the one- and two-way non-institutionalized adult populations (e.g., excluding settings such as prisons and nursing homes) from the American Community Survey 2014-2018 Public Use Microdata Sample file [23]. This yielded the estimated total 'infection-experienced' adults with SARS-CoV-2 within each stratum. With a study mid-point of April 23, and literature estimates of mean 4 days from infection to symptom onset and mean 21 days from onset to IgG detection, results represent cumulative incidence through approximately March 29 [6, 8, 24]. In NYS, diagnostic testing for SARS-CoV-2 is mandatorily reported electronically to NYSDOH. Using cumulative diagnoses reported and total numbers of infection-experienced adults, we estimated the percent of infections diagnosed overall and by region, sex, and age.
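The test-characteristic correction in the second step, (reactivity + specificity - 1) / (sensitivity + specificity - 1), is often called the Rogan-Gladen estimator. The sketch below applies it to the 12.5% weighted reactivity at the two sensitivity-analysis extremes quoted above. The function name is hypothetical, and the validation point estimates used for the primary analysis are not stated in this text, so they are not used here.

```python
def adjust_for_test_characteristics(reactivity, sensitivity, specificity):
    """Correct an observed reactive proportion for imperfect sensitivity and specificity.

    All arguments are proportions in [0, 1]. A negative result means the
    observed reactivity is below the expected false-positive rate, so the
    cumulative incidence cannot be estimated under those test characteristics.
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return (reactivity + specificity - 1.0) / denominator


weighted_reactivity = 0.125  # 12.5% weighted reactive, as reported in the text

# Extremes of the test characteristics' 95% CI, as quoted in the Methods.
low = adjust_for_test_characteristics(weighted_reactivity, sensitivity=0.921, specificity=0.961)
high = adjust_for_test_characteristics(weighted_reactivity, sensitivity=0.837, specificity=1.0)

print(f"low scenario:  {100 * low:.1f}%")   # about 9.8%
print(f"high scenario: {100 * high:.1f}%")  # about 14.9%
```

The same function can be applied to the lower and upper CI bounds of a weighted reactivity estimate to obtain adjusted bounds, as the text describes.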
For primary analyses, we accumulated diagnoses through April 9, based on the March 29 final infection date, 4 days to symptom onset, and mean 7 days from onset to diagnosis. Supplemental upper-bound estimates used the last plausible diagnosis date of May 8, based on the April 28 final study day, 4 days being the earliest time from onset to IgG detection, and allowing PCR detection up to 14 days post-onset [8].
Across NYS, a total of 15,626 adult residents with complete data were tested, of whom 15,101 (96.6%) had suitable specimens, of which 1,887 (12.5%) were reactive and 340 (2.3%) indeterminate. Following weighting, 12.5% were estimated reactive and following further adjustment for test characteristics, estimated cumulative incidence was 14 (Table 2). Males had significantly higher cumulative incidence in all regions outside of, but not within, NYC. The patterns of racial disparity observed statewide were similar and statistically significant within NYC, Westchester/Rockland, and Long Island, but not in the rest of the state (ROS). In each of the former 3 regions, Hispanic/Latino persons represented >37% of infection-experienced adults, whereas in the latter non-Hispanic whites comprised a majority of infection-experienced adults (79.4%). An estimated 8.9% (95% CI: 8.4-9.3%) of infections in NYS were diagnosed as of April 9, 2020 (Table 3). Males (9.4%, 95% CI: 8.8-10.1%) had higher diagnosis levels than females (8.2%, 95% CI: 7.7-8.8%). Those ≥55 years were most likely to be diagnosed (11.3%, 95% CI: 10.4-12.2%). Diagnosis rates in NYC (7.1%, 95% CI: 6.7-7.5%) and ROS (7.5%, 95% CI: 6. Table 8 ).
From the largest US SARS-CoV-2 serosurvey to date, we estimated that over 2 million adult NYS residents were infected through the end of March. Our findings estimate the extent of transmission of and community experience with SARS-CoV-2, particularly in the NYC metropolitan region.
Despite large numbers of persons acquiring SARS-CoV-2, this represents only 14.0% of adult residents, suggesting that, even in this COVID-19 epicenter, the epidemic is substantially below the estimated ~70% US herd immunity threshold [25]. Against this remaining epidemic potential, rigorous and extensive epidemic monitoring, testing, and contact tracing are necessary components of the ongoing vigilance needed to predict, prevent, and/or mitigate a second epidemic wave, consistent with state and federal guidance for reopening [5, 26]. This vigilance is needed even in the rest of NYS outside the metropolitan region, which is in the first phases of reopening and where the lowest cumulative incidence suggests the highest proportion susceptible. Our finding of higher cumulative incidence in the regions of the NYC metropolitan area, particularly NYC, is consistent with the known distribution of diagnoses. Further, in these regions of high urbanicity, significant racial/ethnic disparities in infection history were found, with minority communities experiencing disproportionate risk. The drivers of greater COVID-19 risk and disparities in urban areas continue to be studied, but may relate to population density and the mechanisms by which transportation, employment, housing, and other socioeconomic or environmental factors shape opportunities for transmission [27-29]. A recent NYS study on a random sample of COVID-19 hospitalizations showed limited racial/ethnic differences in clinical outcomes, suggesting that observed differences in mortality by race and ethnicity may be in large part driven by different infection histories in the community [3, 30-32]. Research is needed to understand the drivers of increased COVID-19 risk experienced by minority communities, followed by actions to improve health equity.
The finding that an estimated 8.9% of infection-experienced adults were diagnosed reveals opportunities for further expansion of diagnostic testing in NYS; yet, in the context of far higher diagnosis and testing levels than in other US settings, it also suggests substantial progress to date [1, 13]. Compared to all persons with infection history, there was a higher representation of males and those over age 55 among diagnosed persons. Given the lower reactivity rates observed among this age group, our results expand observations from previous studies that older adults may be more likely to exhibit symptoms or illness or be more likely to seek care [30, 33-35]. Although not an aim of this analysis, we note that in conjunction with 12,822 publicly reported COVID-19 deaths for NYS through April 17 (reflecting median 19 days post-infection to death), our findings suggest an infection fatality ratio (IFR) of 0.6%. This estimate is in line with estimates of 0.5-1.0% observed in other countries; however, additional analyses are needed to more precisely estimate the IFR in NYS [36, 37]. Strengths of our study include a large sample, which contained 0.1% of the adult NYS population, and a systematic sampling approach in one of the only open public venues in the state, where a necessary commodity is purchased. Although this was a convenience sample, survey weights adjusted for biased demographic/geographic representation, and the general agreement of unweighted and weighted results suggests demographic representativeness of the study sample. We further adjusted results for assay performance under varied scenarios. Our study may nevertheless be limited by residual non-representativeness of the underlying population. This includes potential undersampling of persons from vulnerable groups who might be less likely to go grocery shopping.
For this to impact our findings, those remaining home would need to have differential antibody prevalence compared to their age/sex/racial-ethnic/regional group peers. If persons staying at home had lower prevalence due to self-isolation, our study's cumulative incidence would be a slight overestimate. Further, our sample did not include those who have died from COVID-19 or those who reside in long-term care facilities, which have been differentially impacted, causing a slight underestimate, nor those in the hospital or at home due to COVID-19 illness, some of whom would be expected to have detectable antibodies [38, 39]. Such actively symptomatic persons would be expected to be a small portion of the cumulative infection burden since the outbreak's commencement, and given most would have been infected after March 29, their exclusion also likely causes observed values to be overestimated. Although data are limited on the potential for self-selection to alter our results, a recent Icelandic study found comparable prevalence when participants were tested following online self-registration vs. random invitation [16]. This finding, in conjunction with our systematic community intercept approach, suggests that this bias may be small, outside of outright non-response. We note that although every effort was made to ensure unbiased sampling through a DOH staff-led recruitment process, patron-initiated requests for testing were honored, and in some sites accounted for a significant percentage of total tests performed. It is possible that customers who seek out testing may be more likely to have been exposed to SARS-CoV-2. If true, our cumulative incidence would be overestimated. Another source of potential recruitment bias comes from patron refusal to be tested, either upon initial request or after agreeing to participate.
Although refusals were not systematically recorded, nightly report-outs by testing leads indicated that most persons approached agreed to be tested and that few persons left after agreeing to be tested, regardless of wait time, supporting low non-response. Results presented may differ from publicly discussed preliminary estimates, given both our inclusion of more participants and analytic adjustments for test characteristics. Timeframes utilized for cumulative infections and diagnoses are approximate, being based on the evolving SARS-CoV-2 immunological and testing literature, with the 10-day sampling period during a linear-growth phase of the epidemic.
The findings of this study suggest extensive SARS-CoV-2 transmission in NYS and highlight the remaining opportunities for prevention and diagnosis. As the epidemic grows in other regions of the country, this study offers a potential model for other jurisdictions to monitor their epidemic. Estimates of cumulative incidence can be combined with diagnostic totals, or other epidemic markers such as mortality, to provide a holistic epidemic view during a time of unprecedented pandemic and to best craft high-impact approaches to prevention, containment, treatment and mitigation.
Sensitivity analyses for cumulative incidence and percent diagnosed
Table 6.
Reactivity and test-characteristic adjusted cumulative incidence of COVID-19, overall and by demographic factors and region: sensitivity 92.1%, specificity 96.1%
Test-characteristic adjusted estimated cumulative incidence
COVID-19 Testing, Epidemic Features, Hospital Outcomes, and Household Prevalence
COVID-19 United States Cases
Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)
Amid Ongoing COVID-19 Pandemic
Antibody Detection and Dynamic Characteristics in Patients with COVID-19
Antibody responses to SARS-CoV-2 in patients of novel coronavirus disease 2019
Interpreting Diagnostic Tests for SARS-CoV-2
COVID-19 and Postinfection Immunity: Limited Evidence, Many Remaining Questions
Estimating Prevalence of Hepatitis C Virus Infection in the United States
Reopening Society and the Need for Real-Time Assessment of COVID-19 at the Community Level
Seroprevalence of immunoglobulin M and G antibodies against SARS-CoV-2 in China
COVID-19 Antibody Seroprevalence
NIH. NIH begins study to quantify undetected cases of coronavirus infection 2020
Seroprevalence of SARS-CoV-2-Specific Antibodies Among Adults
Spread of SARS-CoV-2 in the Icelandic Population
New York State on PAUSE 2020
Census Populations With Bridged Race Categories
Post Stratification
Analysis of Health Surveys
On Chi-Squared Tests for Multiway Contingency Tables with Cell Proportions Estimated from Survey Data
Modern Epidemiology
Public Use Microdata Sample (PUMS) Documentation
Dynamics of anti-SARS-Cov-2 IgM and IgG antibodies among COVID-19 patients
Herd immunity - estimating the level required to halt the COVID-19 epidemics in affected countries
Opening Up America Again
High population densities catalyse the spread of COVID-19
Racial Health Disparities and Covid-19 - Caution and Context
Assessing Differential Impacts of COVID-19 on Black Communities
Association of Treatment With Hydroxychloroquine or Azithromycin With In-Hospital Mortality in Patients With COVID-19 in
Disparities In Outcomes Among COVID-19 Patients In A Large Health Care System In California
Hospitalization and Mortality among Black Patients and White Patients with Covid-19
COVID-19): Cases in
Clinical Characteristics of Covid-19 in
Using Early Data to Estimate the Actual Infection Fatality Ratio from COVID-19 in France
The Rate of Underascertainment of Novel Coronavirus (2019-nCoV) Infection: Estimation Using Japanese Passengers Data on Evacuation Flights
COVID-19 in a Long-Term Care Facility
Asymptomatic and Presymptomatic SARS-CoV-2 Infections in Residents of a Long-Term Care Skilled Nursing Facility
True cumulative incidence is greater than 0, as evidenced by diagnosed cases, rendering study-based cumulative incidence inestimable under these test characteristics. Values that cannot be estimated are indicated with ** in table.
Stratified estimates may not exactly sum to total due to rounding and differences between weighting scheme and non-institutionalized population totals
b. Boroughs of Bronx