key: cord-0964678-jhf2hhlx authors: Routledge, Isobel; Epstein, Adrienne; Takahashi, Saki; Hakim, Jill; Janson, Own; Duarte, Elias; Turcios, Keirstinne; Vinden, Joanna; Sujishi, Kirk; Rangel, Jesus; Coh, Marcelina; Besana, Lee; Ho, Wai-Kit; Oon, Ching-Ying; Ong, Chui Mei; Yun, Cassandra; Lynch, Kara; Wu, Alan; Wu, Wesley; Karlon, William; Thornborrow, Edward; Peluso, Michael; Henrich, Timothy; Pak, John; Briggs, Jessica; Greenhouse, Bryan; Rodriguez-Barraquer, Isabel title: Citywide serosurveillance of the initial SARS-CoV-2 outbreak in San Francisco date: 2021-02-04 journal: Res Sq DOI: 10.21203/rs.3.rs-180966/v1 sha: 27a572187938564738b622cbb250642d28e81dc3 doc_id: 964678 cord_uid: jhf2hhlx

Serosurveillance provides a unique opportunity to quantify the proportion of the population that has been exposed to pathogens. Here, we developed and piloted Serosurveillance for Continuous, ActionabLe Epidemiologic Intelligence of Transmission (SCALE-IT), a platform through which we systematically tested remnant samples from routine blood draws in two major hospital networks in San Francisco for SARS-CoV-2 antibodies during the early months of the pandemic. Importantly, SCALE-IT allows for algorithmic sample selection and rich data on covariates by leveraging electronic medical record data. We estimated overall seroprevalence at 4.2%, corresponding to a case ascertainment rate of only 4.9%, and identified important heterogeneities by neighborhood, homelessness status, and race/ethnicity. Neighborhood seroprevalence estimates from SCALE-IT were comparable to local community-based surveys, while providing results encompassing the entire city that have been previously unavailable. Leveraging this hybrid serosurveillance approach has strong potential for application beyond this local context and for diseases other than SARS-CoV-2.

The rapid spread of the SARS-CoV-2 virus has laid bare important gaps in routine infectious disease surveillance.
Serological data, particularly when collected at high spatial and temporal resolutions, are a key resource for addressing many epidemiological questions since they directly quantify the proportion of the population that has been infected by a pathogen 1,2. For SARS-CoV-2, serology is particularly useful given the high levels of disease under-ascertainment: serologic surveillance is the gold standard for estimating attack rates (the proportion of the population that has been infected) and highly complementary to virologic and syndromic surveillance systems for providing vital information on where a population is along the epidemic curve 3. Population-based serosurveys that employ a probabilistic sampling frame are considered to be the gold standard for estimating seroprevalence. However, large population-based serosurveys can be prohibitively resource-intensive to initiate swiftly or perform repeatedly, especially during an ongoing outbreak, as demonstrated by the relative sparsity of population-based vs. convenience-sampled serosurveys for SARS-CoV-2 that have been conducted to date 3. For example, to date, no population-based serosurveys have been conducted for the city of San Francisco or the wider Bay Area, and few have been conducted in the United States, limiting our ability to identify risk factors for infection, understand population-level immunity, and determine which populations and localities may be in need of targeted public health resources such as testing, contact tracing, or vaccine allocation 4. Residual blood samples from readily available sources (e.g., blood donors or remnant samples collected from routine medical care visits), especially when linked to individual-level metadata, provide a unique opportunity to address these limitations and to efficiently survey a population for antibodies over an extended period of time 5,6.
Such studies were found to be useful in the 2009 H1N1 influenza pandemic 7-13, facilitating analyses on a broader spatial and temporal scale than typical cross-sectional serological surveys allow. However, in most studies that use residual blood samples the source population is unknown 14. This presents a major limitation, as the results are difficult to interpret when it is not known whether the sampled population is representative of the population of interest.

The San Francisco Bay Area has widely been recognized for mounting an early and proactive response to COVID-19. San Francisco Bay Area counties introduced a shelter-in-place order on 17 March 2020, requiring residents to remain at home unless leaving the house for essential activities. Relative to many other US cities, few cases were detected in San Francisco during the early months of the epidemic, a pattern which continued as the pandemic progressed. However, as in many other areas, a high proportion of asymptomatic infections and limited access to diagnostic testing during this time make it difficult to interpret these numbers. Results from an early San Francisco seroprevalence study conducted on convenience samples in late March to early April 2020 suggested that <1% of the population had been infected overall 16, in contrast to a seroprevalence of >6% estimated by a community study focusing on a specific neighborhood, particularly among the Hispanic/Latinx population 17. The lack of citywide, representative seroprevalence estimates during this time period limits the ability to determine to what degree these discrepancies reflect heterogeneous exposure or differences in study design.
Here we present a blueprint and early results of the ongoing SCALE-IT study (Serosurveillance for Continuous, ActionabLe Epidemiologic Intelligence of Transmission), leveraging residual serum samples from two large hospital systems in San Francisco, California to quantify the prevalence of SARS-CoV-2 antibodies. Importantly, these remnant samples are linked to electronic medical records (EMRs), enabling careful algorithmic selection based on demographic and clinical variables and improving their representativeness of the general population. We tested over 5,000 samples collected from late March to June 2020 from San Francisco residents, and calculated raw and adjusted seroprevalence estimates over space, time, and socio-demographic indicators. These data provide estimates of the overall seroprevalence in San Francisco during the initial phase of the local SARS-CoV-2 outbreak and highlight spatial and demographic heterogeneities in transmission across the city.

After obtaining the list of eligible samples according to the above criteria, we selected serum samples for the study using a sampling algorithm designed to ensure an adequate sample size for each of five age strata and to maximize geographic representativity. After setting a daily target sample size for our overall population, we divided this equally between five age bins to set a target sample size for each age bin. We also set a target sample size for each zip code which was proportional to its population size. For each zip code with a larger number of eligible samples than its target size, we kept all samples from age groups with sample sizes below or at their target and obtained a random sample from any age group that had an eligible sample size above the target size.
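The selection procedure described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the study's code: the function names, the record fields (`id`, `zip`, `age_bin`), and the age-bin labels are all hypothetical, and the rule that each age bin within a zip code receives an equal share of that zip code's target is an assumption made only to make the sketch concrete.

```python
import math
import random
from collections import defaultdict

# Hypothetical labels; the text only states that five age strata were used.
AGE_BINS = ("0-19", "20-39", "40-59", "60-79", "80+")


def zip_targets(daily_target, zip_population):
    """Split the daily target across zip codes in proportion to population."""
    total = sum(zip_population.values())
    return {z: daily_target * pop / total for z, pop in zip_population.items()}


def select_samples(eligible, daily_target, zip_population, seed=0):
    """Pick samples from `eligible`, a list of dicts with 'zip' and 'age_bin'.

    Zip codes at or below their target keep every eligible sample. In zip
    codes above target, age groups at or below the per-bin quota are kept in
    full and larger age groups are randomly down-sampled to the quota.
    """
    rng = random.Random(seed)
    targets = zip_targets(daily_target, zip_population)

    by_zip = defaultdict(list)
    for sample in eligible:
        by_zip[sample["zip"]].append(sample)

    selected = []
    for zip_code, samples in by_zip.items():
        target = targets.get(zip_code, 0.0)
        if len(samples) <= target:
            selected.extend(samples)
            continue
        # Assumption: equal share of the zip-code target for each age bin.
        quota = math.ceil(target / len(AGE_BINS))
        by_age = defaultdict(list)
        for sample in samples:
            by_age[sample["age_bin"]].append(sample)
        for group in by_age.values():
            if len(group) <= quota:
                selected.extend(group)
            else:
                selected.extend(rng.sample(group, quota))
    return selected
```

Fixing the random seed keeps a day's selection reproducible, which may be useful when the same eligibility list has to be re-processed.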
We intentionally over-sampled pregnant women as a healthy sentinel population by aiming to obtain up to 10% of the samples from pregnant women undergoing routine care, as defined by ICD-10 codes.

We used two serologic assays for this study in order to maximize assay specificity. First, we screened all samples using an in-house ELISA, and then performed confirmatory testing on a subset of samples above a threshold value using an in-house Luminex assay. The ELISA detected IgG to the receptor binding domain (RBD) of the spike (S) protein, based on published protocols with minor modifications 21. Briefly, 1 µg of RBD was used to coat each well of 384-well high binding plates, secondary antibody was

individual patients, from UCSF Health (n=3037 patients) and ZSFG (n=1698 patients) (Figure 1). By design, the age distribution of sampled individuals remained consistent throughout the study period, and the geographic distribution of residents matched the proportion of the San Francisco population living in each zip code (Figure 2). Our sample did not achieve the target sample size for the youngest age group due to the limited number of children receiving routine phlebotomy in the UCSF and ZSFG health systems (Table 1) [CrI]: 2.1%-6.3%). Based on the number of cases reported during the period covered by the study, we estimate that only 4.9% of all infections were ascertained by the reporting system (95% CrI: 3.3%-9.9%) (Supplementary Text 1). Amongst pregnant women seeking routine care (N=268), we estimated a raw seroprevalence of 3.4% (9/268 seropositive), and after adjusting for test performance characteristics we estimate 3.5% (95% CrI: 1.1-6.4%) seroprevalence amongst this group. This estimate in our sentinel population group is consistent with the estimates across our overall population of samples.
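To make the two adjustments above concrete, the following Python sketch shows one common way to correct a raw seroprevalence for imperfect test performance and to turn a seroprevalence into a case ascertainment rate. It is not the authors' method: the reported credible intervals indicate a Bayesian model, whereas this sketch swaps in the simpler Rogan-Gladen point estimator. The function names are hypothetical, and any sensitivity, specificity, population, or case-count values passed in are assumptions supplied by the reader, since this section does not state them.

```python
def rogan_gladen(raw_prevalence, sensitivity, specificity):
    """Adjust an observed positive fraction for test sensitivity/specificity.

    Rogan-Gladen estimator: (p_obs + Sp - 1) / (Se + Sp - 1), clipped to [0, 1].
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    adjusted = (raw_prevalence + specificity - 1.0) / denom
    return min(1.0, max(0.0, adjusted))


def ascertainment_rate(reported_cases, population, seroprevalence):
    """Fraction of estimated infections that appear as reported cases."""
    estimated_infections = population * seroprevalence
    if estimated_infections <= 0:
        raise ValueError("estimated infections must be positive")
    return reported_cases / estimated_infections


if __name__ == "__main__":
    # Raw count from the text: 9 of 268 pregnant women were seropositive.
    raw = 9 / 268
    # Hypothetical assay characteristics, for illustration only.
    print(rogan_gladen(raw, sensitivity=0.95, specificity=0.999))
    # Hypothetical population and case count, chosen only to show the arithmetic.
    print(ascertainment_rate(reported_cases=2058, population=1_000_000,
                             seroprevalence=0.042))
```

An ascertainment rate of 4.9% means that roughly 1/0.049, or about 20, infections occurred for every reported case.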
We did not observe statistically significant differences in seroprevalence by age (Figure 3A) or hospital system (Supplementary Table 2). We found seroprevalence to be nearly twice as high in uninsured individuals (6.3%, 95% CrI: 3.1-9.9%) than in those with some form of insurance [Private/Commercial: 3.4% (95% CrI: 1.6-4.7%); Government: 4.0% (95% CrI: 2.3-5.0%)] (Figure 3B). With respect to race/ethnicity, seroprevalence was highest in those identifying as Hispanic (6.3%, 95% CrI: 4.4-8.3%) followed by Black or African American (4.8%, 95% CrI: 2.8-7.0%), and lowest in those who identified as (Figure 3D). Although these samples were obtained over a three-month collection period, given the relatively low attack rate during these initial stages of the pandemic in San Francisco, we were not able to detect meaningful differences in seroprevalence over time (Supplementary Table 2).

Geographically, we found seroprevalence to be highest in the Bayview neighborhood in the southeast region of the city, at 8.1% (95% CrI: 4.6%, 12.3%) (Figure 4A, Supplementary Table 3). Although several other neighborhoods had similarly high seroprevalences, there was much more uncertainty around these estimates (Figure 4B). These findings are consistent with patterns of incidence in the city during this period of time (Figure 4C). We identified 157 individuals who were homeless in our study, and amongst this group seroprevalence was estimated to be 10.8% (95% CrI: 6.1%, 16.5%).

As validation of the representativity of our approach using curated remnant samples, we compared results from this study to two contemporaneous community-based serosurveys conducted in specific neighborhoods of San Francisco.
First, we compared these results to a cross-sectional serosurvey carried out in a census tract within the Mission District (census tract 022901, zip code 94110) between April 25 and April 28, 2020 17. Chamie et al. tested 2,545 census tract residents for SARS-CoV-2 antibodies and estimated seroprevalence to be 3.1% (95% CI: 2.5-3.9%). This is consistent with our findings of 3.8% seroprevalence (95% CrI: 1.8-6.3%) between April and June 2020 in the broader Mission District neighborhood. Second, we compared our results to a cross-sectional serosurvey carried out in two census tracts in San Francisco's 10th District between May 30 and June 2, 2020 (https://unitedinhealth.org/sf-district-10), located in the Bayview neighborhood. Among the nearly 1,600 individuals tested for antibodies, seroprevalence was estimated at 5.6% in Latinx participants (n=320), 2.3% in Black participants (n=397) and 0.4% in white participants (n=231). The relatively high seroprevalence we detected in the Bayview neighborhood through our study is comparable to the results of this community-based study, and the disparities by race/ethnicity were similar in direction, though different in magnitude, to those identified

comparison also rely upon convenience sampling as participation in the studies was voluntary, and therefore may contain inherent selection biases themselves.

In this study, we developed and piloted a scalable and systematic pipeline using remnant samples from two major hospital networks in San Francisco to select, collect, and test specimens for SARS-CoV-2 antibodies (SCALE-IT). Through this effort, we estimated seroprevalence during the early months of the epidemic to be relatively low throughout San Francisco (4.2%), but still representing more than 20 times the number of infections identified by PCR-confirmed cases at that time.
This may be due to the limited availability of PCR testing during the beginning of the pandemic and the lack of testing of asymptomatic individuals. We also identified important disparities in seroprevalence at the neighborhood level, with highest seroprevalence in the Bayview neighborhood in the southeast region of the city, as well as disproportionately higher seroprevalence in individuals experiencing homelessness and those identifying as Hispanic, Black/African American, or male. Leveraging this hybrid serosurveillance approach has potential for broad application beyond this local context and for diseases other than SARS-CoV-2.

The heterogeneities in seroprevalence we observed by race/ethnicity and socio-economic status (here obtained from EMR data on health insurance status and whether individuals were housed) echo patterns which have been highlighted over the course of the pandemic at national and global levels 29,30. Specific to San Francisco, our results provide estimates of SARS-CoV-2 cumulative exposure at a granular spatial resolution with a scope covering the entire city; despite low overall seroprevalence, we identified specific neighborhoods with disproportionately higher seroprevalence. Interestingly, we also found seroprevalence to be approximately twice as high in those identifying as male compared to female.
Potential explanations for this difference include differential pathogen exposure by sex, which is supported by findings of other studies in San Francisco that reported PCR positivity rates of 1.2% (20/1658) in women and 3.3% (63/1908) in men, with an odds ratio of 2.71 (1.64-4.69) for PCR positivity in males, and found that the majority (74%) of those who tested positive by PCR or were seropositive for SARS-CoV-2 were frontline workers unable to shelter in place 17. Males and females have also been found to differ in immune responses and infection severity 31, which could affect assay sensitivity. However, we believe this is unlikely to explain the large difference we see in our estimates, as we do not see sex-based differences in the sensitivity of our assay on the positive controls used in the study, which represent a range of disease severities.

While a key strength of our approach was leveraging residual sera from two large health system networks and using data from EMRs to algorithmically select samples for inclusion, there are limitations to this type of surveillance that require consideration. Most obviously, patient samples may not be fully representative of the underlying population. This may be particularly true during "shelter-in-place" periods, when behavioral changes may affect the availability and characteristics of the patient population. These issues can ideally be mitigated by careful sample selection, as done here by focusing on a subset of outpatients, with the possibility of further refinement by inclusion of additional selection criteria (e.g., by restricting or weighting sampling to consider specific visit types or underlying conditions). Representativity of the serosurveillance system could also be enhanced by including a broader network of local health systems.
We also recognize that the generalizability of our findings may differ by age group, and is likely to be lower in children, who were under-represented in our sample set despite the stratified sampling framework. Additional study designs, such as school-based serosurveys, could be leveraged to augment these data to prospectively assess seroprevalence in specific age groups, possibly by using non-invasive, saliva-based antibody testing 32. Despite including over 5,000 samples, our study was not powered to detect differences between covariates or by time in a multiple regression framework, in part due to San Francisco's success in maintaining low transmission and thus low seroprevalence during this time period. Lastly, while we validated our estimates against results from two available community-based studies, further validation against additional data sources would be ideal.

In this pilot study, we developed and implemented a SARS-CoV-2 serosurveillance system to detect population-level pathogen exposure in near-real time, and demonstrated how data collected through this platform were comparable to results from more resource-intensive community-based serological studies and incidence data. The appeal of this hybrid approach is that it achieves many of the strengths of population-based surveys and provides rich data, while leveraging existing infrastructure to achieve the much greater efficiencies often seen in convenience sampling approaches. Using EMR data, we were able to develop a stratified sampling frame, ensuring improved representativeness of the results in contrast to serosurveys performed using convenience samples without these key pieces of information 14.
At the same time, we used these data to identify important spatial and demographic heterogeneities in seroprevalence within our study site; by contrast, serosurveys performed on residual samples are often limited to coarser levels of metadata on the sampled population 33. The relative ease with which SCALE-IT can be implemented means that it can be deployed over a broad geographic scale, continuously over time, and dynamically adjusted to address specific surveillance needs.

We envision multiple lines of work for future directions. First, the samples that we have selected, collected, and processed in this work could serve as a valuable biorepository for future applications. The ability to link rich EMR data to a large bank of well-curated serum samples opens up opportunities for additional analysis including longitudinal studies of patients. Second, as serosurveillance efforts will be fundamental to monitor SARS-CoV-2 transmission rates and evaluate the impact of control interventions (both non-pharmaceutical interventions (NPIs) and pharmaceuticals) over the coming months and years, future work could leverage these and prospective serological data to parametrize mechanistic models and to study the effects of control strategies on infection rate. Third, as discussed by others 1,2, our local SCALE-IT platform could easily be expanded to contribute to a 'Global Immunological Observatory' to perform serosurveillance for other pathogens beyond the SARS-CoV-2 virus. Data generated by such an observatory could be used to address specific public health gaps including serosurveillance for seasonal pathogens such as influenza or emerging infections. Lastly, the insights gained from developing this platform could serve as a blueprint for adoption by other health systems in various contexts.
Use of serological surveys to generate key insights into the changing global landscape of infectious disease
A Global Immunological Observatory to meet a time of pandemics
SeroTracker: a global SARS-CoV-2 seroprevalence dashboard
High prevalence of antibodies to the 2009 pandemic influenza A(H1N1) virus in the Norwegian population following a major epidemic
Seroprevalence of Influenza A(H1N1)pdm09 Virus Antibody
Sero-immunity and serologic response to pandemic influenza A (H1N1) 2009 virus
Seroprevalence of antibodies to SARS-CoV-2 in 10 sites in the United
Johns Hopkins Coronavirus Resource Center
SARS-CoV-2 seroprevalence and neutralizing activity in donor and patient blood
Community Transmission of Severe Acute Respiratory Syndrome Coronavirus 2 Disproportionately Affects the Latinx Population During Shelter-in-Place in San Francisco
Universal PCR and antibody testing demonstrate little to no transmission of SARS CoV-2 in a rural community
Annual Reports
SARS-CoV-2-specific ELISA development
EPPIcenter/flexfit: Flexible format standard curve fitting and data processing
antigens to assess changes in malaria transmission using sero-epidemiology
Conditional dependence between tests affects the diagnosis and surveillance of animal diseases
Google Geocoding API
Core Team. R: A language and environment for statistical computing. R Foundation for Statistical
Stan Development Team. 2020. Stan Modeling Language Users Guide and Reference Manual
Antibody responses to SARS-CoV-2 in patients with COVID-19
Johns Hopkins Coronavirus Resource Center
COVID-19 Hospitalization and Death by Race/Ethnicity (COVID-19)
Sex differences in immune responses that underlie COVID-19 disease outcomes
Supervised self-collected SARS-CoV-2 testing in indoor summer camps to inform school reopening
Prevalence of SARS-CoV-2 antibodies in a large nationwide sample of patients on dialysis in the USA: a cross-sectional study

We acknowledge the significant contribution to this work made by the following persons and organizations: Dr.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. None of the authors have conflicts of interest to disclose.