key: cord-0692081-p5t6gd5n
authors: Siegler, Aaron J.; Sullivan, Patrick S.; Sanchez, Travis; Lopman, Ben; Fahimi, Mansour; Sailey, Charles; Frankel, Martin; Rothenberg, Richard; Kelley, Colleen F.; Bradley, Heather
title: Protocol for a national probability survey using home specimen collection methods to assess prevalence and incidence of SARS-CoV-2 infection and antibody response
date: 2020-08-11
journal: Ann Epidemiol
DOI: 10.1016/j.annepidem.2020.07.015
sha: dac2cd94c287fa05075f433553ec188e624c5395
doc_id: 692081
cord_uid: p5t6gd5n

BACKGROUND: The US response to the SARS-CoV-2 epidemic has been hampered by early and ongoing delays in testing for infection; without data on where infections were occurring and the magnitude of the epidemic, early public health responses were not data-driven. Understanding the prevalence of SARS-CoV-2 infections and immune response is critical to developing and implementing an effective public health response. Most serological surveys have been limited to localities that opted to conduct them and/or were based on convenience samples. Moreover, results of antibody testing are subject to high false positive rates due to a presumably low prevalence of seroconversion and imperfect test specificity. METHODS: We will conduct a national serosurvey for SARS-CoV-2 positivity and immune response. A probability sample of US addresses will be mailed invitations and kits for the self-collection of anterior nares swab and finger prick dried blood spot specimens. Within each sampled household, one adult 18 years or older will be randomly selected and asked to complete a questionnaire and to collect and return biological specimens to a central laboratory. Nasal swab specimens will be tested for SARS-CoV-2 RNA by RNA PCR; dried blood spot specimens will be tested for antibodies to SARS-CoV-2 (i.e., immune experience) by enzyme-linked immunoassays. Positive screening tests for antibodies will be confirmed by a second antibody test with a different antigenic basis to improve the positive predictive value (PPV) of antibody test results. All persons returning specimens in the baseline phase will be enrolled into a follow-up cohort and mailed additional specimen collection kits 3 months after baseline. A subset of 10% of selected households will be invited to participate in full household testing, with tests offered to all household members aged ≥3 years.
The main study outcomes will be period prevalence of infection with SARS-CoV-2 and immune experience and incidence of SARS-CoV-2 infection and antibody responses. RESULTS: Power calculations indicate that a national sample of 4,000 households will facilitate estimation of national SARS-CoV-2 infection and antibody prevalence with acceptably narrow 95% confidence intervals across several possible scenarios of prevalence levels. Oversampling in up to 7 populous states will allow for prevalence estimation among sub-populations. Our 2-stage algorithm for antibody testing produces acceptable PPV at prevalence levels ≥1.0%. Including oversamples in states, we expect to receive data from as many as 9,156 participants in 7,495 US households. CONCLUSIONS: In addition to providing robust estimates of prevalence of SARS-CoV-2 infection and immune experience, we anticipate this study will establish a replicable methodology for home-based SARS-CoV-2 testing surveys, address concerns about selection bias, and improve positive predictive value of serology results. Prevalence estimates of SARS-CoV-2 infection and immune experience produced by this study will greatly improve our understanding of the spectrum of COVID-19 disease, its current penetration in various demographic, geographic and occupational groups, and the clinical range of symptoms associated with infection. These data will inform resource needs for control of the ongoing epidemic and facilitate data-driven decisions for epidemic mitigation strategies.

The SARS-CoV-2 pandemic and its associated illness (Coronavirus Disease 2019, or COVID-19) have emerged very quickly, challenging traditional systems of clinical and public health response. 1, 2 There is broad consensus that the United States response to the COVID-19 epidemic has been hampered by lack of adequate testing for SARS-CoV-2. [3] [4] [5] [6] Globally, available statistics representing the scale and growth of the epidemic are based on the numbers of people diagnosed and reported with SARS-CoV-2 infections and the number of people who have died from COVID-19 disease. These measures are informative but biased: diagnoses of COVID-19 disease predominantly count people who were sick and symptomatic enough to be tested. Moreover, the data are differentially biased by time and jurisdiction. Testing policies have changed over time as test availability has increased, and testing policies in heavily impacted areas may be more restrictive for people with mild illness than policies in less impacted areas. Importantly, there are limited population-based data about the proportion of people infected with SARS-CoV-2 who remain asymptomatic or about the proportion of people who may already possess antibodies to the virus.

Traditional public health surveillance programs that are linked to disease prevention efforts focus on diagnosing people with an infectious disease and then helping them take steps to minimize the risks of onward transmission. Surveillance data to characterize epidemics are collected from testing and intervention programs, and surveillance data improve as public health screening and testing programs grow. In the COVID-19 epidemic, the traditional public health model in the United States has been disrupted by the speed with which the epidemic emerged and by limited testing and contact tracing capacity. Testing resources are limited both in supplies (e.g., shortages of swabs for collection, viral transport media, and personal protective equipment for healthcare workers collecting invasive specimens) 7, 8 and in personnel to collect samples. Because of these limitations, testing resources in many areas have been focused on the sickest people, yielding testing data that underestimate the true extent of the epidemic and that differentially undercount mildly symptomatic and asymptomatic people. Therefore, it is critical to develop a representative depiction of the distribution of SARS-CoV-2 infection and immune experience to inform public health policies and prevention and control interventions.

The field of antibody testing is rapidly evolving, and our understanding of the clinical significance of seropositivity is limited. Local serosurveys using a variety of sampling methods have reported relatively low prevalence estimates for seropositivity. [9] [10] [11] For low-prevalence serosurveys, the predictive value of positive antibody tests is a substantial issue. With overall prevalence findings in many surveys in the single digits, even slight shortfalls in specificity could substantially change outcomes. For instance, if a serosurvey in a population with a 'true' prevalence <6% uses an assay with ≤97% specificity, a substantial share of positive specimen findings would be false positives, and below about 3% prevalence most of them would be. Moreover, serological surveys are a lagging indicator of infection, with one study identifying the median time to detectable seroconversion as 13 days post-exposure across antibody types 12 and another finding 15-20 days post-exposure across antibody types. 13 An additional concern with some existing methods, unrelated to bias, is the appropriateness of conducting in-person testing given the limited availability of testing resources for persons who are ill and of personal protective equipment for healthcare workers. An optimal study design might avoid use of such resources.
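The dependence of positive predictive value on prevalence follows directly from Bayes' rule. The short Python sketch below is illustrative only: it assumes a perfectly sensitive assay with 97% specificity, values chosen to match the example above rather than the performance of any study assay.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Share of positive results that are true positives (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# Illustrative assumption: perfect sensitivity, 97% specificity.
for prev in (0.01, 0.03, 0.06):
    ppv = positive_predictive_value(prev, sensitivity=1.0, specificity=0.97)
    print(f"prevalence {prev:.0%}: PPV = {ppv:.2f}")
```

Under these assumptions PPV is roughly 0.25 at 1% prevalence, 0.51 at 3%, and 0.68 at 6%; false positives outnumber true positives only when prevalence falls below 0.03/1.03, or about 2.9%. Any loss of sensitivity lowers PPV further.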

To provide less biased estimates of the prevalence of SARS-CoV-2 virus and immune experience, we propose a study design that differs from previous studies in four critical ways. First, we propose to use address-based sampling, commonly considered to be the reference standard for developing population-representative estimates, with a sample frame that includes nearly all addresses in the United States. 14,15 Second, we will use home specimen collection and remote laboratory testing procedures, which have higher acceptability than in-person specimen collection and can reach otherwise hard-to-reach populations such as workers and persons in rural areas. 16 Third, the use of a serology screening test followed by a high-specificity confirmatory test that targets a different antigenic component will improve the predictive value of positive specimens. Last, performing both viral detection and antibody testing will provide a simultaneous understanding of the prevalence of viral shedding (and potential infectiousness) and of the prevalence of antibodies (and potentially immunity). In addition to providing an initial assessment of prevalence, this survey will serve as a baseline for future serial rounds of viral detection and serology testing, allowing for development of population-based, minimally biased estimates of the incidence of SARS-CoV-2 infection and immune experience, overall and in key subgroups (e.g., racial/ethnic minorities, rural areas, specific age groups).

We will use a national address-based household sample to collect survey data and self-collected specimens for SARS-CoV-2 RNA PCR and serology testing from approximately 4,000 US participants. The overall design of the study is illustrated in Figure 1.

National Probability Sampling Frame: The study will use an address-based sampling (ABS) frame for selection of a probability-based sample, a method commonly considered the reference sampling strategy in the cell phone era because of its complete coverage of US households compared with telephone- and internet-based frames. 14, 15 The frame is based on the USPS Computerized Delivery Sequence File (CDSF), which includes roughly 130 million residential addresses, covering all delivery points in the United States. 17 Each address is geocoded to a unique latitude and longitude before its related geodemographic data from the Census and commercial databases such as Experian are retrieved. Moreover, approximately 50% of addresses are matched to landline and/or cellular phone numbers, which will allow for implementation of a multimodal approach as well as rigorous refusal conversion procedures. This frame, constructed by Marketing Systems Group (MSG) from the latest CDSF, has been used previously in numerous health research studies. [18] [19] [20] [21] For addresses in non-oversampling strata, each address will be selected with an equal probability of selection method (EPSM) to ensure the most efficient estimates at the national level. To increase geographic representation, we will use systematic random sampling, in which the frame is first ordered by 9-digit ZIP+4 code. Next, a random starting point is selected and every nth address after the random start is selected. Drop units (multi-unit addresses) in the frame each have a drop count, and the frame will be expanded to account for drop units.
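Systematic random sampling on a sorted frame can be sketched as follows. This is a minimal illustration, not the sampling vendor's procedure: the frame layout (tuples of ZIP+4, address, and drop count) and the function name `systematic_sample` are hypothetical, and the fractional sampling interval is one common way to handle an N/n that is not an integer.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Systematic random sample of n delivery units with a random start.

    frame: list of (zip4, address, drop_count) tuples (hypothetical layout).
    A multi-unit "drop" address with drop_count k is expanded into k entries.
    """
    expanded = [(zip4, address, unit)
                for zip4, address, drop_count in frame
                for unit in range(max(1, drop_count))]
    # Ordering by ZIP+4 spreads the sample geographically (implicit stratification).
    expanded.sort(key=lambda row: row[0])
    interval = len(expanded) / n                        # sampling interval k = N / n
    start = random.Random(seed).random() * interval     # random start in [0, k)
    # Take the units at start, start + k, start + 2k, ... (guarding float rounding).
    return [expanded[min(int(start + i * interval), len(expanded) - 1)]
            for i in range(n)]
```

Because each step advances by at least one frame position whenever n does not exceed the expanded frame size, every selected delivery unit is distinct, and each has the same selection probability n/N.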

The sample for this study will comprise 4,000 households selected across the nation using the ABS frame. We will use a two-stage sampling methodology whereby in the first stage a representative sample of households will be selected, followed by random selection of one adult in each sampled household. We will assume a household-level response rate of 22% and an anticipated 5% rate of addresses that may be vacant or otherwise unreachable at the time of survey administration, for an overall yield rate of 20.9% (22% × 95%).

Recent probability samples for COVID-19 testing in Atlanta 22 and Indiana 23 have achieved similar participation rates. We believe this is a feasible yield rate: although contingent valuation can overestimate willingness, a recent national online survey nonetheless found that 88% of respondents reported willingness to participate in COVID-19 specimen self-collection research. 24 The total sample will therefore include at least 19,139 addresses (4,000 / 0.209 ≈ 19,139).
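The yield arithmetic above can be reproduced in a few lines. The response and occupancy rates are the protocol's stated assumptions; rounding up gives the minimum number of addresses to mail.

```python
import math

household_response = 0.22    # assumed household-level response rate
occupied_reachable = 0.95    # 1 - assumed 5% vacant or unreachable addresses
target_households = 4000     # national sample target

yield_rate = household_response * occupied_reachable
addresses_needed = math.ceil(target_households / yield_rate)
print(f"yield = {yield_rate:.1%}, addresses needed = {addresses_needed:,}")
# prints: yield = 20.9%, addresses needed = 19,139
```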

Invitations and kits will be sent in waves to allow adjustments for observed response rates and underrepresentation of important subgroups. If there is substantial non-response among racial/ethnic minority groups in Wave 1, we will oversample geographic areas (e.g., Census blocks) with high representation of African American and/or Hispanic/Latinx populations in subsequent waves. Oversampling strategies will be designed with the goal of attaining sample sizes in racial/ethnic minority subgroups proportional to their size in the underlying U.S. population, facilitating robust estimation for each group. The finite population correction, (N-n)/(N-1), will be virtually equal to 1, so the difference between sampling with or without replacement in subsequent waves is moot. All waves will be pooled for analysis, and during the weighting process, design weights will be calculated to reflect any oversampling used during address selection. 25, 26

State and locality oversamples: Study methods are amenable to developing locality-specific estimates. We will target a total sample size of at least n=600 per state through oversampling, in addition to the national 4,000-household sample, to develop stable state-specific prevalence estimates for up to seven populous states (CA, FL, GA, IL, NY, TX, WA), contingent on availability of resources. These states represent different geographic regions of the US and include some states that were impacted early. For these oversampled states, we will coordinate with state health departments to maximally inform local public health efforts. Depending on resource availability, we may also include additional oversampling for some states to enhance estimation for their local public health needs.
For instance, in GA the total sample will include 1,200 households, with participation offered to all household members aged 3 or older, allowing for greater precision of local estimates. In total, we anticipate an additional 5,156 participants in 3,495 households in the state and full household oversamples, for a national total of up to 9,156 participants in 7,495 households.
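To show why the finite population correction is negligible here, and how oversampling enters the base weights, the sketch below evaluates (N-n)/(N-1) for a CDSF-sized frame and defines a design weight as the inverse selection probability. The address count is approximate and the function names are illustrative, not part of the study's software.

```python
def finite_population_correction(N: int, n: int) -> float:
    """FPC factor (N - n) / (N - 1) applied to the variance of a sample mean."""
    return (N - n) / (N - 1)


def design_weight(selection_probability: float) -> float:
    """Base (design) weight: inverse of an address's selection probability.

    Addresses in oversampled strata have higher selection probabilities and
    therefore receive proportionally smaller weights.
    """
    return 1.0 / selection_probability


N = 130_000_000   # approximate residential addresses on the CDSF
n = 19_139        # approximate national address sample (4,000 / 0.209)
print(round(finite_population_correction(N, n), 5))   # prints 0.99985
```

An address sampled at twice the base rate (for example, in an oversampled state) gets half the base weight, which is how pooled waves with different sampling rates can be analyzed together.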

Household procedures: All households selected will receive a first-class letter introducing the study and its website, an evidence-based practice demonstrated to increase survey response. 27 The study website, hosted on an academic server and featuring multimedia content, will provide basic information regarding the study, address common participant questions, and encourage participant confidence in the study. Each household for which a consent and baseline survey is completed will receive a home participation kit (HPK) in a study-branded box (with branding consistent with the welcome letter) that will detail how they can participate in the research if they are interested. We may assess whether distribution of HPK to households concurrent with the invitation letter increases participation. The HPK will include (1) instructions on accessing electronic or phone procedures for consent, enumeration, and behavioral survey and (2) instructions and materials for self-collection of specimens and return mailing to our central laboratory for testing. Specimen self-collection materials will include a flocked swab for collection of anterior nares specimens and a transport tube containing phosphate-buffered saline (PBS); for self-collection of dried blood spot (DBS) cards, the kit will include a finger stick device, alcohol wipe, adhesive bandage, and Whatman 5-spot DBS specimen card.

The HPK will have a unique identifier code that will be used for all baseline procedures, including laboratory testing, incentives, and the survey. Instructions will direct participants to an online link or toll-free number for screening consent and the household enumeration procedure (detailed below). After household enumeration is complete, one adult household member will be randomly selected by an automated algorithm in our survey platform. The selected person will be asked to complete the study consent and baseline questionnaire. Households not completing the enumeration survey will receive two additional postcard reminders to complete the questionnaire and home test kit. Using a multimodal approach, households in the sampling frame with phone numbers (approximately 60%) and those who provided their contact information at enrollment will also receive up to three text messages and/or calls as additional reminders from a call center specializing in research partnerships. Households will have a one-month period to complete their kits for inclusion in the study. We will adjust for differential nonresponse among households lacking phone number information by using information available at the Census block level. 28 Consenting persons who complete baseline participation will be enrolled into an incidence cohort for a follow-up period. We will use similar contact procedures, and an HPK, to encourage follow-up participation at 3 months after the baseline survey.
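The protocol does not specify the survey platform's selection algorithm. The sketch below assumes simple equal-probability selection among enumerated adults; the function name and member record layout are hypothetical. It also returns the number of eligible adults, the within-household factor that enters the person-level base weight.

```python
import random


def select_household_adult(members, rng=None):
    """Randomly select one adult (age >= 18) with equal probability.

    Returns the selected member and the within-household weight factor
    (number of eligible adults), i.e., the inverse of the selection probability.
    """
    rng = rng or random.Random()
    adults = [m for m in members if m["age"] >= 18]
    if not adults:
        raise ValueError("household has no eligible adult")
    return rng.choice(adults), len(adults)


# Hypothetical enumeration record: initials, age, and gender per member.
household = [
    {"initials": "A.B.C.", "age": 45, "gender": "F"},
    {"initials": "D.E.F.", "age": 47, "gender": "M"},
    {"initials": "G.H.I.", "age": 12, "gender": "F"},
]
person, factor = select_household_adult(household, random.Random(7))
```

Here the child is ineligible, so each adult has a one-in-two chance of selection and the selected adult's weight is multiplied by 2.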

Consent and survey processes: All consent and survey data will be hosted on a secure, HIPAA-compliant electronic survey platform. Participants will be able to self-complete the consent and surveys directly in the online platform or call a toll-free number for assistance in completing these processes. Participants who wish to complete surveys by phone will be assisted by staff trained in human subjects protocols and study-specific procedures, with trained staff entering data into the same secure electronic platform used by self-completing participants. Call center procedures such as hours of operation, number of contact attempts, and handling of participant questions will be adapted from procedures used in other large surveys. 29 All study materials will be available in English and Spanish. Study procedures have been approved by the Emory University Institutional Review Board (Protocol 00000695).

The enumeration process will be adapted from procedures used in other national studies. 29 An adult aged 18 or older in the household will complete enumeration, providing first, middle, and last initials, age, and gender for each household member. One adult aged ≥18 years will be randomly selected from the enumeration list to complete the baseline questionnaire and HPK. There is no exclusion criterion other than age. Persons who have a clotting disorder, are on blood thinners, or are aged ≥80 years will complete only the anterior nares specimen procedure.
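The specimen eligibility rule above can be written as a small function. The function name and boolean inputs are illustrative assumptions about how the questionnaire responses might be encoded.

```python
def specimens_to_collect(age: int, clotting_disorder: bool, on_blood_thinners: bool) -> list:
    """Return the specimen types a selected participant is asked to self-collect."""
    specimens = ["anterior nares swab"]
    # Persons with a clotting disorder, on blood thinners, or aged >= 80
    # complete only the nasal swab (no finger-prick dried blood spot).
    if not (clotting_disorder or on_blood_thinners or age >= 80):
        specimens.append("dried blood spot")
    return specimens
```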

Specimen self-collection: The sampled household member will be asked to use the HPK to provide two specimens for laboratory testing. An anterior nares (nasal) swab will allow for detection of SARS-CoV-2 RNA by RNA PCR; CDC identifies at-home self-collected anterior nares swabs as a suitable specimen type for PCR analysis, 30 and self-collected anterior nares swabs have high sensitivity for RNA detection compared with nasopharyngeal (NP) swabs. 31 Swabs will be stored and shipped in phosphate-buffered saline (PBS). Participants will also perform a finger prick with an automated lancet and fill in a Whatman 903 protein saver specimen card for the detection of antibodies to SARS-CoV-2. 8 Specimen collection instructions will follow standard instructions, similar to those we have previously published for self-collected specimens, 32,33 customized with branding for this study. Instructions will also include videos that demonstrate each component of the self-collection process. 34 Instructions guide participants to return specimens within 48 hours of collection; stability testing indicates a longer window, allowing time for shipping and laboratory processing. Laboratorians will determine specimen adequacy based on visual inspection of the specimen and the collection date, and will request repeat collection for specimens not determined to be adequate. All specimens will be returned through US mail to the central study laboratory in biohazard bags and sturdy outer boxes.

The 15- to 20-minute baseline questionnaire will assess domains of demographics, COVID-19 knowledge, SARS-CoV-2 testing history, medical history, symptom history, illness in household, social distancing and isolation practices, and life changes due to COVID-19 (Supplement 1). Demographic measures will be adapted from the Census Bureau's American Community Survey 35 and include age, race/ethnicity, gender, education, income, and health insurance. COVID-19 knowledge items will be adapted from several sources, focusing on information relevant to protective and proactive health behaviors.

SARS-CoV-2 testing history will be based on items previously used for HIV testing history in validated questionnaires. 36 We will assess symptom history at two time points relevant for virus and antibody testing, respectively, using 1-month recall and recall since January 2020. Clinical history will be based on COVID-19 symptoms identified by the US Centers for Disease Control and Prevention and will use a severity index for experienced symptoms based on an instrument validated for influenza. 37 A number of studies have used different criteria to classify persons as "mildly" or "moderately" symptomatic for COVID-19 disease, including assessments based on fever, 38-40 respiratory symptoms and their severity, 38,40-42 cough, 39 non-specific or other symptoms, 38, 39 and risk factors, 42 yet there is currently no consensus method for assessing symptoms remotely. 43 We are therefore implementing a broad list of symptoms and severity assessments to allow us to meet emerging consensus definitions for case classification.

Social distancing and isolation practices will be based on measures previously used to inform modeling studies, assessing the number of persons with close contact. 44, 45

Laboratory testing:

RNA-PCR: Anterior nares swab specimens will be processed as previously described. 34 Specimens will first be checked for quality. The samples will then undergo total nucleic acid extraction using the Thermo KingFisher platform (Fisher Scientific, Waltham, Mass). Isolated RNA will be reverse transcribed to DNA using a one-step, one-tube system with reagents from Thermo (Fisher Scientific, Waltham, Mass). The second half of the one-tube system will involve qPCR. The reverse-transcribed DNA will undergo qPCR with primers and probes targeting 3 gene regions of the SARS-CoV-2 genome (N, S, ORF1), using reagents from Thermo. The results will be analyzed and interpreted based on cycle threshold values and positive identification of the nucleic acid.

Serology tests: A two-step process will be used: all specimens will be tested with a screening ELISA, and all specimens testing positive on the screening assay will be tested a second time with a second ELISA. We implement this strategy by screening with a test with relatively high sensitivity to detect total antibodies (BioRad, Hercules California: Sens: (Table 1) . Performance data from FDA also contain 95% confidence intervals for sensitivity and specificity of tests. Under worst-case scenarios (e.g., the lower 95% CI bound for sensitivity of the screening test and the lower 95% CI bound for specificity of the confirmatory test), the predictive value of positive tests is more variable, especially in low-prevalence scenarios.
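The effect of a serial screen-then-confirm algorithm on overall performance can be sketched as below. Two assumptions should be stated plainly: the tests' errors are treated as conditionally independent given true status (likely optimistic, even with different antigenic targets), and the performance values are placeholders, not the Table 1 figures. Substituting lower 95% CI bounds gives the worst-case scenarios described above.

```python
def serial_algorithm(se1, sp1, se2, sp2):
    """Sensitivity and specificity of a two-test serial algorithm.

    A specimen is called positive only if it is reactive on the screening
    test AND on the confirmatory test. Assumes the two tests' errors are
    conditionally independent given true infection status.
    """
    sensitivity = se1 * se2
    specificity = 1.0 - (1.0 - sp1) * (1.0 - sp2)
    return sensitivity, specificity


def ppv(prevalence, sensitivity, specificity):
    tp = prevalence * sensitivity
    fp = (1.0 - prevalence) * (1.0 - specificity)
    return tp / (tp + fp)


# Hypothetical performance values for illustration (not the Table 1 values).
se, sp = serial_algorithm(se1=0.92, sp1=0.99, se2=0.95, sp2=0.99)
for prev in (0.01, 0.05):
    print(f"prevalence {prev:.0%}: single-test PPV {ppv(prev, 0.92, 0.99):.2f}, "
          f"two-test PPV {ppv(prev, se, sp):.2f}")
```

The design trades a small loss of sensitivity (the product of the two sensitivities) for a large gain in specificity, which is what drives PPV at low prevalence.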

DBS specimens will first be visually checked for quality. 34 A 6 mm punch will be obtained from the dried blood spot, and the material will undergo a standard antibody extraction method using TRIS buffer. Once the material is added to the reaction tube, the EIA primary and secondary antibodies (SARS-CoV-2 assay, Total immunoglobulins; BioRad, Hercules, California) will be added using an automated liquid handler instrument (DSX; Dynex Technologies, Chantilly, Virginia). All serologic and molecular tests have FDA Emergency Use Authorization (EUA). The protocol will follow the manufacturer's guidelines for reaction conditions, data interpretation, and control checks.

For specimens with reactive results on the screening test, a second eluate will be tested with the confirmatory test using the same elution procedures.

Return of results: If use of study assays with self-collected specimens has been approved or cleared under Emergency Use Authorization (EUA) by the FDA at the time of testing, laboratory results will be returned to participants and they will be instructed to seek follow-up care with their usual physician if they have concerns or questions about their results. Results of the overall antibody algorithm will be reported (e.g., if negative on first test, a non-reactive result will be reported; if positive on both tests, a reactive result will be reported). For specimens with discordant results, a third antibody test on a different platform will be run as a tiebreaker and the result of the tiebreaker test will be reported. If the assays used in the study have not received EUA, we will return test results as research results to the extent allowable by FDA, under the auspices of Emory's IRB approval and an informed consent process. For specimens with insufficient quantity or other extraction failure, participants will be mailed an additional kit to allow for repeat specimen collection.
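The reporting logic for the antibody algorithm, including the tiebreaker for discordant specimens, can be sketched as follows. The function name and the callable used to run the third test are hypothetical; the callable stands in for whatever laboratory workflow produces the tiebreaker result.

```python
from typing import Callable, Optional


def report_antibody_result(screen_reactive: bool,
                           confirm_reactive: Optional[bool] = None,
                           run_tiebreaker: Optional[Callable[[], bool]] = None) -> str:
    """Overall antibody result: screening -> confirmation -> tiebreaker."""
    if not screen_reactive:
        return "non-reactive"           # negative on the first test
    if confirm_reactive is None:
        raise ValueError("screen-reactive specimens need a confirmatory result")
    if confirm_reactive:
        return "reactive"               # positive on both tests
    # Discordant (screen reactive, confirmatory non-reactive): a third test decides.
    if run_tiebreaker is None:
        raise ValueError("discordant specimens need a tiebreaker test")
    return "reactive" if run_tiebreaker() else "non-reactive"
```

Passing the tiebreaker as a callable means the third test is only run for discordant specimens, mirroring the protocol's sequence.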

Primary outcomes: The primary outcomes will be (1) the weighted proportion and 95% confidence interval (CI) of specimens with SARS-CoV-2 RNA detected by RNA-PCR and (2) the weighted proportion and 95% CI of persons with detectable SARS-CoV-2 antibodies (i.e., algorithm-determined positive). These outcomes will be estimated with inference to the United States and to the oversampled states. Table 2 provides more detail regarding study outcomes.
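A weighted proportion with a Taylor-linearized 95% CI can be sketched as below. This is a simplification under stated assumptions: a single stratum, no clustering, and a with-replacement approximation; the study's actual variance estimation would also reflect its sampling strata and oversamples. The function name is illustrative.

```python
import math


def weighted_proportion_ci(y, w, z=1.96):
    """Weighted proportion with a Taylor-linearized Wald 95% CI.

    y: 0/1 outcomes (e.g., RNA detected); w: final survey weights.
    Assumes a single-stratum, with-replacement design without clustering, so
    the variance of the ratio estimator p = sum(w*y) / sum(w) is
    n/(n-1) * sum((w_i * (y_i - p))**2) / sum(w)**2.
    """
    n = len(y)
    total_w = sum(w)
    p = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    var = n / (n - 1) * sum((wi * (yi - p)) ** 2 for wi, yi in zip(w, y)) / total_w ** 2
    se = math.sqrt(var)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)
```

With equal weights the variance reduces to p(1-p)/(n-1), the familiar simple-random-sampling result.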

Data system: A unified data system that leverages information from research partners will be used for participant management. Figure 2 represents the data sources, including sources of participant-related data (call center, survey data, laboratory data, and incentive), which will be combined with data from Emory and from the sample frame provider into a secure, unified cloud-based participant data management system (DMS). The DMS will allow for real-time tracking of participant progress in the study and system-automated responses to facilitate scalability and rapid response. This is best illustrated by describing how participant data will flow through the system. The sample frame company will enter the list of randomly selected addresses into the DMS, allowing invitation letters to be mailed and the lab to mail an HPK to each selected address. The sending of an HPK will be registered in the DMS for management purposes. When the household respondent completes their questionnaire, the DMS will receive notification from the survey application programming interface (API).

After a participant has completed their specimen self-collection and returned the specimen to the laboratory, the lab API will place completion data into the DMS, which will trigger the DMS to automatically order a $40 electronic incentive card for the participant through the API of the gift card provider. The DMS will also notify the call center that further follow-up calls should cease. The system will be built to accommodate different participant pathways, such as for participants who need additional reminders or have opted out of study participation. Study staff will remotely manage the DMS and participant contact process, ensuring participants' needs are met and that those interested in participating are on target to complete study tasks in a timely manner. The overarching goal of developing the data system, and of engaging external partners accustomed to handling large volumes of interactions (including the study call center, sampling partner, and laboratory), is to make it feasible to conduct the project on an expedited timeline.
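The event-driven flow described above can be illustrated with a toy model. Everything here is hypothetical: the class names, event handlers, and in-memory order list stand in for the real survey, laboratory, call center, and gift card APIs, which the protocol does not describe at the code level.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    kit_id: str
    survey_complete: bool = False
    specimen_received: bool = False
    incentive_ordered: bool = False
    calls_active: bool = True
    opted_out: bool = False


@dataclass
class DMS:
    """Toy participant data management system; all names are hypothetical."""
    participants: dict = field(default_factory=dict)
    incentive_orders: list = field(default_factory=list)

    def on_survey_complete(self, kit_id):
        self.participants[kit_id].survey_complete = True

    def on_specimen_received(self, kit_id, incentive_usd=40):
        p = self.participants[kit_id]
        p.specimen_received = True
        if not p.incentive_ordered:          # order the incentive only once
            self.incentive_orders.append((kit_id, incentive_usd))
            p.incentive_ordered = True
        p.calls_active = False               # tell the call center to stop follow-up

    def on_opt_out(self, kit_id):
        p = self.participants[kit_id]
        p.opted_out = True
        p.calls_active = False


dms = DMS()
dms.participants["HPK-0001"] = Participant("HPK-0001")
dms.on_survey_complete("HPK-0001")
dms.on_specimen_received("HPK-0001")
dms.on_specimen_received("HPK-0001")   # a duplicate lab notification is ignored
```

Making the incentive order idempotent guards against duplicate notifications from an external API, one of the pathway variations such a system would need to handle.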

First, base (inverse probability) weights will be computed to reflect selection probabilities for both households and persons within households. Next, post-stratification weights will be computed by ratio-adjusting base weights to characteristics of the survey population, based on the latest population estimates from the Census American Community Survey (ACS). For this purpose, we will use a 'raking' procedure to ensure alignment with the U.S. population (and over-sampled states) with respect to various geodemographic characteristics, including gender, age, race-ethnicity, education level, region, income, home ownership, and metropolitan area.
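Raking (iterative proportional fitting) repeatedly ratio-adjusts the weights to each margin in turn until all margins match. The sketch below is a generic implementation for illustration; the function name, data layout, and convergence settings are assumptions, and production weighting would typically add weight trimming and use many more ACS margins.

```python
def rake(weights, categories, targets, max_iter=100, tol=1e-8):
    """Rake (iterative proportional fitting) survey weights to population margins.

    weights:    base weights, one per respondent
    categories: {variable: [category label for each respondent]}
    targets:    {variable: {category label: population total}}
    """
    w = list(weights)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for var, labels in categories.items():
            # Current weighted total in each category of this variable.
            totals = {}
            for wi, label in zip(w, labels):
                totals[label] = totals.get(label, 0.0) + wi
            # Ratio-adjust each weight so this variable's margin hits its target.
            factors = {label: targets[var][label] / total for label, total in totals.items()}
            w = [wi * factors[label] for wi, label in zip(w, labels)]
            largest_adjustment = max(largest_adjustment,
                                     max(abs(f - 1.0) for f in factors.values()))
        if largest_adjustment < tol:
            break   # every margin already matched on this pass
    return w
```

For example, raking four respondents with unit base weights to sex totals {F: 60, M: 40} and age totals {young: 70, old: 30} returns weights that reproduce both margins.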

The resulting design effect, which can be approximated by , will be examined to assess the weighting efficiency. We will use Taylor Series linearization for variance estimation.
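The approximation formula for the design effect did not survive extraction. A common choice for the design effect due to unequal weighting is Kish's approximation, 1 + CV² of the weights, equivalently n·Σw²/(Σw)²; the sketch below assumes that formula, which may not be exactly the one the authors intended. Its reciprocal is the weighting efficiency, and n divided by it is the effective sample size.

```python
def kish_deff(weights):
    """Kish's approximate design effect from unequal weighting.

    Equals n * sum(w^2) / (sum(w))^2, i.e., 1 + CV^2 of the weights.
    """
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def effective_sample_size(weights):
    """Sample size an equally weighted sample would need for the same precision."""
    return len(weights) / kish_deff(weights)
```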

We will develop weighted estimates for study outcomes, including period prevalence and incidence of SARS-CoV-2 infection and SARS-CoV-2-specific immune response, estimates of the proportion of cases that lead to diagnosed COVID-19 or to fatality, and of the numbers and proportions of cases that are mildly symptomatic or asymptomatic. We will create estimates for each of these main outcomes for subgroups including age group, gender, race/ethnicity, symptomology, region, and urbanicity. We will conduct a series of analyses using regression models appropriate for each type of data, such as logistic regression models to assess the predictive utility of various symptoms for study outcomes. Several sensitivity analyses will be conducted to assess potential bias, such as full household estimation (more detail below) to explore potential deviations from random selection of household members, and a separate analysis to assess the impact of enumerated household members who are unavailable to participate due to hospitalization for a respiratory condition. It will also be important to correct for known imperfections in the diagnostic performance of laboratory tests. Although our confirmatory testing procedure will serve to minimize false positives, we will nonetheless account for the imperfect sensitivity and specificity of the overall test algorithm. We will use the procedures proposed by Diggle, implemented in a Bayesian framework, such that uncertainty from both design effects and diagnostic inaccuracy is quantified in prevalence estimates. 48

Study Sample Size Estimation: Early serosurvey data have identified a range of antibody prevalence values: 2.8% weighted prevalence in Santa Clara County, 10 4.3% antibody prevalence in Los Angeles County, 11 and 12.3% in New York state. 9 Figure 3 displays the overall margin of error as a function of different sample sizes and prevalence levels.
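Two calculations in this paragraph can be sketched briefly. The margin of error is the half-width of a 95% CI for a proportion, inflated by a design effect; the deff of 1.5 used below is a hypothetical value, not a study parameter. For test-error correction the sketch uses the classical Rogan-Gladen estimator, a simpler point estimator swapped in for illustration; it is not the Bayesian procedure of Diggle that the protocol specifies, which additionally propagates uncertainty in sensitivity and specificity.

```python
import math


def margin_of_error(p, n, deff=1.0, z=1.96):
    """Half-width of a 95% CI for a proportion, inflated by a design effect."""
    return z * math.sqrt(deff * p * (1.0 - p) / n)


def rogan_gladen(apparent_prev, sensitivity, specificity):
    """Classical Rogan-Gladen correction of apparent prevalence for test error."""
    corrected = (apparent_prev + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return min(1.0, max(0.0, corrected))   # truncate to the [0, 1] range


# Prevalence levels from the early serosurveys cited above; deff = 1.5 is hypothetical.
for p in (0.028, 0.043, 0.123):
    print(f"p = {p:.1%}: margin of error with n = 4,000 is ±{margin_of_error(p, 4000, 1.5):.2%}")
```

Truncation matters at low prevalence: when the apparent prevalence is below the false positive rate (1 - specificity), the raw Rogan-Gladen value is negative.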
In Table 3 based on a rationale that full household testing would likely reduce response rates (leading to bias) and that the additional laboratory testing would greatly add to the study cost without substantially increasing study power due to within-household correlation of outcomes. There is potential for bias in selecting a single household respondent: some households will be more likely to select a member who has symptoms, leading to overestimation of primary outcomes.

Conversely, other households may have one member who is sick and assumed to have COVID-19, and they may instead test a household member with milder symptoms, leading to underestimation of endpoints. To describe potential bias in selection of a single household member for testing, we will randomly select a group of households to receive additional testing for all household members aged 3 or older. We will target participation of 400 households, or 10% of households in the national sample, for this purpose. Based on the proportion of single-person households and an average US household size of 2.6, 35 we anticipate this will lead to an additional 396 tests and surveys, allowing for assessment of possible bias in selection of household members for testing. Potential bias will be characterized by comparing test positivity by participant characteristics in fully sampled households with that in standard households in which one person was sampled. This subset of fully enumerated households will additionally allow for characterizing within-household transmission in households with symptomatic and asymptomatic positive members.

Incidence cohort design: Participants completing baseline procedures will be mailed identical follow-up testing kits 3 months after the return of their initial test, regardless of their SARS-CoV-2 infection and immune experience at baseline. We will retest participants using the same laboratory methods as for baseline testing and calculate incidence for each laboratory assessment as the weighted number of participants with a newly positive result at time $T_1$ divided by the weighted number of susceptible people at time $T_0$. Persons testing PCR- or serology-positive at baseline will be retested at follow-up to characterize ongoing viral shedding, development of SARS-CoV-2-specific antibodies, and behavioral changes at subsequent time points. This will improve understanding of these outcomes for a representative sample of persons who are asymptomatic or mildly symptomatic. In past studies we have observed 80% cohort retention; 32,49-51 we anticipate slightly lower (70%) retention at 3 months for this cohort study. We may also include additional incidence- or resample-assessment time points based on the interests and identified needs of collaborating state and local health departments.
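The incidence ratio described above can be sketched for a single assay (run separately for PCR and for serology). This is a minimal illustration, not the study's code. The `Participant` record and its field names are hypothetical. The sketch also restricts the $T_0$ susceptible denominator to participants who returned a $T_1$ specimen, which assumes loss to follow-up is non-informative. An attrition-reweighted denominator would be an alternative.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Participant:
    weight: float                # survey weight (hypothetical field name)
    positive_t0: bool            # result of a given assay at baseline (T0)
    positive_t1: Optional[bool]  # same assay at follow-up (T1); None if lost to follow-up


def weighted_incidence(people: Iterable[Participant]) -> float:
    """Weighted new positives at T1 divided by weighted susceptibles (negative at T0).

    Assumption: susceptibles without a T1 result are dropped, i.e., loss to
    follow-up is treated as non-informative.
    """
    at_risk = [p for p in people if not p.positive_t0 and p.positive_t1 is not None]
    denominator = sum(p.weight for p in at_risk)
    if denominator == 0:
        raise ValueError("no susceptible participants with follow-up results")
    numerator = sum(p.weight for p in at_risk if p.positive_t1)
    return numerator / denominator


if __name__ == "__main__":
    cohort = [
        Participant(2.0, False, True),   # new positive
        Participant(1.0, False, False),  # remained negative
        Participant(1.5, False, None),   # lost to follow-up: excluded
        Participant(3.0, True, True),    # positive at T0: not susceptible
        Participant(1.0, False, False),
    ]
    print(f"weighted incidence: {weighted_incidence(cohort):.3f}")
```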

Seroprevalence data serve as a key input into dynamic transmission models of infectious disease. 52, 53 For SARS-CoV-2, susceptible-exposed-infectious-recovered (SEIR) model frameworks are being widely used for a range of applications, including forecasting, inferring transmission patterns, and examining the potential impact of interventions such as social distancing. 54-57 We will use our study's estimates of seroprevalence to set the size of the R (recovered) compartment in both simple and age-structured transmission models, thereby increasing the realism of the model scenarios. Our seroprevalence estimates can also be used by the wider modeling community to refine other models for a range of applications. We will specifically use our seroprevalence estimates to model how social distancing can be relaxed in an acceptably safe manner. In doing so, we will investigate how serological testing at the individual and population levels may provide data for public health actions.
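A minimal, non-age-structured SEIR sketch shows where a seroprevalence estimate enters such a model. The R compartment is initialized at seroprevalence times population. This treats seropositive people as immune, which is an assumption rather than an established fact. The transmission, latency, and recovery rates are hypothetical placeholders, and the model is integrated with simple forward-Euler steps rather than a production ODE solver.

```python
def simulate_seir(population, seroprevalence, initial_infectious, beta, sigma, gamma,
                  days, dt=0.1):
    """Deterministic SEIR model integrated with forward Euler steps.

    The R compartment starts at seroprevalence * population (seropositive people
    assumed immune). Returns the peak number infectious and the final (S, E, I, R).
    """
    r = seroprevalence * population
    i = float(initial_infectious)
    e = 0.0
    s = population - r - i
    peak = i
    for _ in range(int(round(days / dt))):
        infections = beta * s * i / population * dt  # S -> E
        onsets = sigma * e * dt                      # E -> I
        recoveries = gamma * i * dt                  # I -> R
        s -= infections
        e += infections - onsets
        i += onsets - recoveries
        r += recoveries
        peak = max(peak, i)
    return peak, (s, e, i, r)


if __name__ == "__main__":
    # Hypothetical parameters: 5-day latent period, 7-day infectious period, beta = 0.5/day.
    for sero in (0.0, 0.05, 0.12):
        peak, _ = simulate_seir(1_000_000, sero, 10, beta=0.5, sigma=1 / 5, gamma=1 / 7, days=365)
        print(f"seroprevalence {sero:.0%}: peak infectious ~ {peak:,.0f}")
```

A larger initial R compartment lowers the effective reproduction number from the outset, so projected peaks shrink. This is the mechanism by which better seroprevalence estimates change scenario outputs.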

No results of the study are currently available. We anticipate that participant recruitment and data collection will begin in July 2020. We will report the main outcomes, period prevalence of infection with SARS-CoV-2 and of SARS-CoV-2-specific immune response, by variables including symptoms, symptom history, underlying conditions, county population concentration (urbanicity), region, gender, age, family size, and isolation practices. Given the urgent nature of the pandemic, we will develop a study website for results dissemination to supplement reporting of study outcomes through academic publication. The study website, COVIDVu.org, will serve as a venue to report preliminary prevalence estimates, as a portal for sharing non-identifying public-use study datasets, and as a place to share infographics that communicate key study findings to a broader audience.

The COVID-19 pandemic has had wide-reaching economic and social impact in the United States.

We propose to leverage a probability sample and home-based testing to produce estimates of the prevalence of SARS-CoV-2 infection and SARS-CoV-2-specific immune response that will include people without symptoms, who might otherwise not seek testing, as well as people who might be unable to travel to a testing location or unwilling to be tested at a clinical site.

This latter concern is important: our preliminary data indicate that nearly half of people may be unwilling to attend a laboratory testing venue for SARS-CoV-2 as part of a research study. 24 Higher participation rates will help minimize selection bias. Importantly, willingness to participate in a home test study for COVID-19 was high overall and did not differ across key demographic variables such as age and race/ethnicity. 24 Further, our mail-based testing platform is highly acceptable to participants, 32 and an initial evaluation of self-specimen collection indicated that the great majority of participants are able to self-collect specimens for SARS-CoV-2 PCR and serology testing. 58 To date, a number of important studies of antibody prevalence have likely been biased: because they over-represented people seeking care; 9,10,59 because they recruited people in ways that encouraged those concerned about their possible exposures to participate; 10 or because they recruited at convenience locations where attendance might be selectively associated with lower perceived vulnerability to infection. 9 A probability-based, nationally representative survey will provide a context in which to understand studies that are local and/or based on convenience samples.

We recognize that our strategy, while emphasizing representativeness and minimizing selection bias, has important limitations. Some critical populations will not be reached with our sampling strategy. For example, people who are homeless or who reside in informal lodgings without individual postal addresses will be missed by our sampling; this is an important limitation because there is evidence that homeless persons may be at especially high risk. 60 People who are incarcerated, who are also at high risk, will likewise be excluded. 61 It is critical that other types of serosurveillance efforts be developed and implemented to complement household probability-based serosurveys; triangulation based on an understanding of the weaknesses of a surveillance system is a well-accepted approach to developing an overall understanding of the impact of a health condition. 62,63

Our approach is vulnerable to non-response at baseline and to challenges with retention at follow-up. We will attempt to mitigate non-response by maximizing opportunities for participants to corroborate the authenticity of the study: using distinctive, professional branding that will be recognizable across interactions; providing a website hosted on an academic web platform with information about the study in multiple formats (e.g., video, FAQ); and mailing a professionally branded and designed home participation kit to all selected households. We will assess differential non-response by characteristics of the census tracts of selected participants (for example, median income, racial/ethnic distribution, health insurance coverage, and health literacy) and weight to adjust for such non-response. We have assumed a 22% response rate, lower than reported willingness to participate in home testing studies for COVID-19 research, 24 to account for the possibility that the contingent valuation overestimated willingness. We will seek to enhance retention by collecting a full set of detailed contact information at baseline and by using multiple modes of contact at follow-up, strategies we have used successfully in other remote home testing studies. 32, 64, 65 We will perform analyses at follow-up to assess ways in which differential retention across subgroups might affect study findings. We will also consider the impact of non-response and loss to follow-up through data triangulation, assessing data across different sources, including other infection and antibody surveys and local case surveillance data.
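The protocol states that it will weight to adjust for non-response by tract characteristics but does not name the method. A weighting-class adjustment is one common option, with raking or response-propensity models as alternatives. The following is a minimal sketch of that technique. The `SampledAddress` record, its fields, and the class labels are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SampledAddress:
    adjustment_class: str  # e.g., a tract-level income or insurance category (hypothetical)
    base_weight: float     # inverse of the address's selection probability
    responded: bool


def nonresponse_adjusted_weights(sample):
    """Weighting-class adjustment.

    Within each class, respondents' base weights are multiplied by
    (sum of base weights of all sampled units) / (sum of base weights of respondents),
    so respondents also represent the nonrespondents in their class. Classes with
    no respondents must be collapsed with neighbors beforehand; otherwise their
    weight is silently lost.
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for unit in sample:
        total[unit.adjustment_class] += unit.base_weight
        if unit.responded:
            responding[unit.adjustment_class] += unit.base_weight
    adjusted = []
    for unit in sample:
        if unit.responded:
            factor = total[unit.adjustment_class] / responding[unit.adjustment_class]
            adjusted.append((unit, unit.base_weight * factor))
    return adjusted


if __name__ == "__main__":
    sample = [
        SampledAddress("lower-income tract", 100.0, True),
        SampledAddress("lower-income tract", 100.0, False),
        SampledAddress("lower-income tract", 100.0, False),
        SampledAddress("higher-income tract", 50.0, True),
        SampledAddress("higher-income tract", 50.0, True),
    ]
    for unit, weight in nonresponse_adjusted_weights(sample):
        print(unit.adjustment_class, weight)
```

The class with lower response receives a larger adjustment factor, which is how differential non-response by tract characteristics is offset in the weighted estimates.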

Data for public health action is most powerful when it is most local. Our proposal aims to develop national estimates with systematic sampling to ensure adequate geographic diversity. It also represents a framework and a set of tools that will be applied to state-level estimation and could be expanded to smaller jurisdictional levels such as metropolitan areas. As part of the initial study, we will also collect data to develop overall state-level estimates of RNA-PCR and antibody positivity for a number of highly populated states.

Understanding the prevalence of SARS-CoV-2 infection, and of SARS-CoV-2-specific antibodies that may indicate immunity, can provide a roadmap for estimating required resources, improving our understanding of the spectrum of COVID-19 disease, and understanding differences in infection across key sub-populations in the US epidemic. By combining traditional methods of recruiting a representative sample with novel methods that allow laboratory assessment without requiring participants to visit a clinical location for specimen collection, we will be able to represent all housed persons in the US, including otherwise hard-to-reach populations such as people who live in rural areas and people who are hesitant to go into healthcare settings because of concern about contracting the virus. Public health surveillance is the conscience of an epidemic; 66 our planned national survey will be an important data source that will provide a representative national picture and help to put case reporting and other types of serosurveys in context.


Coronavirus disease 2019 (COVID-19): situation report

Preventing a covid-19 pandemic

On the importance of early testing even when imperfect in a pandemic such as COVID-19

Priorities for the US Health Community Responding to COVID-19

Covid-19: BMA calls for rapid testing and appropriate protective equipment for doctors

Diagnostic testing for the novel coronavirus

Emergence of a Novel Coronavirus Disease (COVID-19) and the Importance of Diagnostic Testing: Why Partnership between Clinical Laboratories, Public Health Agencies, and Industry Is Essential to Control the Outbreak

Understanding Antibody Testing for COVID-19

Amid Ongoing COVID-19 Pandemic, Governor Cuomo Announces Results of Completed Antibody Testing Study of 15,000 People Show 12.3 Percent of Population has Covid-19 Antibodies. Governor's Office

COVID-19 Antibody Seroprevalence

Seroprevalence of SARS-CoV-2-Specific Antibodies Among Adults

Antibody responses to SARS-CoV-2 in patients with COVID-19

Serology characteristics of SARS-CoV-2 infection since exposure and post symptom onset

New Developments in Survey Data Collection

Address-Based Sampling - Alternatives for Surveys That Require Representative Samples of Households

Willingness to seek laboratory testing for SARS-CoV-2 with home, drive-through, and clinic-based specimen collection locations. medRxiv

Efficient Use of Commercial Lists in U.S. Household Sampling

Sample design and cohort selection in the Hispanic Community Health Study/Study of Latinos

Exposure to Court-Ordered Tobacco Industry Antismoking Advertisements Among US Adults

Exposure to Suicide in the Community: Prevalence and Correlates in One U.S. State

Address-Based Sampling for Recruiting Rural Subpopulations: A 2-Phase, Multimode Approach

Centers for Disease Control and Prevention. Population Point Prevalence of SARS-CoV-2 Infection Based on a Statewide Random Sample - Indiana

Willingness to Use Home Specimen Collection Test Kits for SARS-CoV-2 Testing for Research. preprint under review

Vital and Health Statistics, National Health and Nutrition Examination Survey

Poststratification of Pooled Survey Data

Cell-phone-only households and problems of differential nonresponse using an address-based sampling design. Public Opinion Quarterly

Centers for Disease Control and Prevention. Interim Guidelines for Collecting, Handling, and Testing Clinical Specimens from Persons for Coronavirus Disease

Swabs Collected by Patients or Health Care Workers for SARS-CoV-2 Testing

Developing and assessing the feasibility of a home-based PrEP monitoring and support program

Suitability and Sufficiency of Telehealth Clinician-Observed, Participant-Collected Samples for SARS-CoV-2 Testing: The iCollect Cohort Pilot Study

Detection of SARS-CoV-2 RNA and Antibodies in Diverse Samples: Protocol to Validate the Sufficiency of Provider-Observed, Home-Collected Blood, Saliva, and Oropharyngeal Samples

The Annual American Men's Internet Survey of Behaviors of Men Who Have Sex With Men in the United States: Protocol and Key Indicators Report

Development of the Flu-PRO: a patient-reported outcome (PRO) instrument to evaluate symptoms of influenza

Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study. The Lancet Infectious Diseases

Investigation of a COVID-19 outbreak in Germany resulting from a single travel-associated primary case: a case series. The Lancet Infectious Diseases

CT image visual quantitative evaluation and clinical classification of coronavirus disease (COVID-19)

Digestive Symptoms in COVID-19 Patients With Mild Disease Severity: Clinical Presentation, Stool Viral RNA Testing, and Outcomes

Smell dysfunction: a biomarker for COVID-19. Int Forum Allergy Rhinol

Covid-19: a remote assessment in primary care

Social contacts and mixing patterns relevant to the spread of infectious diseases

Study design and protocol for investigating social network patterns in rural and urban schools and households in a coastal setting in Kenya using wearable proximity sensors

Distinct features of SARS-CoV-2-specific IgA response in COVID-19 patients

Estimating prevalence using an imperfect test

Double-Blind, Single-Center, Randomized Three-Way Crossover Trial of Fitted, Thin, and Standard Condoms for Vaginal and Anal Sex: C-PLEASURE Study Protocol and Baseline Data

Levels of clinical condom failure for anal sex: A randomized cross-over trial

Explaining racial disparities in HIV incidence in black and white men who have sex with men in Atlanta, GA: a prospective observational cohort study

Estimation of effective reproduction numbers for infectious diseases using serological survey data

Estimating reproduction numbers for adults and children from case data

Modeling shield immunity to reduce COVID-19 epidemic spread

Estimates of the severity of coronavirus disease 2019: a model-based analysis. The Lancet Infectious Diseases

Early dynamics of transmission and control of COVID-19: a mathematical modelling study. The Lancet Infectious Diseases

Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period

Suitability and Sufficiency of Telehealth Clinician-Observed, Participant-Collected Samples for SARS-CoV-2 Testing: The iCollect Cohort Pilot Study

USC-LA County Study: Early Results of Antibody Testing Suggest Number of COVID-19 Infections Far Exceeds Number of Confirmed Cases

COVID-19 and the US response: accelerating health inequities

The point of triangulation

Updated guidelines for evaluating public health surveillance systems: recommendations from the Guidelines Working Group

An Electronic Pre-Exposure Prophylaxis Initiation and Maintenance Home Care System for Nonurban Young Men Who Have Sex With Men: Protocol for a Randomized Controlled Trial

Usability and Acceptability of a Mobile Comprehensive HIV Prevention App for Men Who Have Sex With Men: A Pilot Study

The Conscience of the Epidemic. The Open AIDS Journal