key: cord-0819339-mrc41hu0
authors: Kahn, Rebecca; Kennedy-Shaffer, Lee; Grad, Yonatan H; Robins, James M; Lipsitch, Marc
title: Potential Biases Arising from Epidemic Dynamics in Observational Seroprotection Studies
date: 2020-09-01
journal: Am J Epidemiol
DOI: 10.1093/aje/kwaa188
sha: 9c94db7632ed2b126f1e6229fcd5b4a03c60e0bc
doc_id: 819339
cord_uid: mrc41hu0

The extent and duration of immunity following SARS-CoV-2 infection are critical outstanding questions about the epidemiology of this novel virus, and studies are needed to evaluate the effects of serostatus on reinfection. Understanding the potential sources of bias and methods to alleviate biases in these studies is important for informing their design and analysis. Confounding by individual-level risk factors in observational studies like these is relatively well appreciated. Here, we show how geographic structure and the underlying, natural dynamics of epidemics can also induce noncausal associations. We take the approach of simulating serologic studies in the context of an uncontrolled or a controlled epidemic, under different assumptions about whether prior infection does or does not protect an individual against subsequent infection, and using various designs and analytic approaches to analyze the simulated data. We find that in studies assessing whether seropositivity confers protection against future infection, comparing seropositive individuals to seronegative individuals with similar time-dependent patterns of exposure to infection, by stratifying or matching on geographic location and time of enrollment, is essential to prevent bias.

The extent and duration of immunity following SARS-CoV-2 infection are critical outstanding questions about the epidemiology of this novel virus (1). Serologic tests, which detect the presence of antibodies, are becoming more widely available (2).
However, the presence of antibodies, or seroconversion, does not guarantee immunity to reinfection, and experimental data with other coronaviruses raise concerns that antibodies could under some circumstances enhance future infections (3). Studies are needed to evaluate the short- and long-term effects of seropositivity. Understanding the potential sources of bias and methods to alleviate biases in these studies is important for informing their design and analysis.

Serologic studies may be useful for a variety of reasons, including to assess the cumulative incidence of infection within a community, to identify risk factors for transmission, and to determine the extent of clustering of infections within a community (4, 5). (6). A crude analysis of this longitudinal study would compare time from enrollment to infection between those who are seropositive and those who are seronegative at enrollment. However, because seroprotection studies are observational, in that the exposure (i.e., seropositivity) is not assigned at random, potential confounders must be controlled for to obtain unbiased estimates. Studies of seropositivity and its effect on future infection are particularly prone to confounding because factors that affect someone's risk of infection and therefore their serostatus prior to enrollment (the exposure) are likely similar to factors that affect someone's risk of infection after enrollment (the outcome). For example, individuals in high-risk occupations (e.g., health care workers) are more likely to become seropositive and are more likely to be exposed again once they are seropositive.

Confounding by individual-level risk factors is relatively well appreciated. Less obvious, perhaps, is that geographic structure (7) or the underlying, natural dynamics of epidemics (8, 9) can induce noncausal associations between an exposure and an outcome.
For example, even when seropositivity confers no protection against future infection, if the overall size of an epidemic is very different in different communities, individuals in communities with small epidemics will have low prevalence of the exposure (seropositivity) and low incidence of the outcome (infection after enrollment), while individuals in communities with larger epidemics will have higher prevalence of the exposure and higher incidence of the outcome, biasing estimates of the effect of seroprotection.

Bias may also occur if individuals are enrolled at different times during an epidemic. If enrollment occurs during an upward trajectory (such as the early exponential phase of an epidemic), individuals enrolled early in the epidemic will be both less likely to be seropositive (exposure) and also less likely to become infected at a given point in time after enrollment (outcome) than those with a later date of enrollment. Moreover, in an epidemic that is controlled (thus with an up-then-down trajectory of incidence), the representation of seropositive individuals will increase with time, but the rate at which these individuals experience the outcome will increase and then decrease, creating potential for confounding in either direction.

In this study we take the approach of simulating such studies in the context of an uncontrolled or we assume partially immune) (13, 14). In simulations with partial immunity, we make the simplifying assumption that susceptibility is immediately decreased following the infectious period and remains constant over time. Seroconversion is assumed to be detectable at the end of the infectious period.

We simulate scenarios with limited control measures in place (R_E = 1.5) and scenarios in which control measures that reduce the force of infection per infected individual are implemented at day 120 of the study period, reducing R_E from 2 to 0.8. The force of infection per infected individual is set to yield these values of R_E.
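The community-size example given earlier can be checked with simple arithmetic. The following minimal Python sketch is an illustration only: the two communities, their seroprevalence, and their infection risks are hypothetical values that do not come from Table 1, and the calculation uses expected counts rather than the network simulation described in this paper. It assumes seropositivity confers no protection, so within a community the risk of infection after enrollment is the same for seropositive and seronegative individuals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Community:
    n: int                 # number enrolled (hypothetical)
    seroprevalence: float  # fraction seropositive at enrollment (hypothetical)
    risk: float            # post-enrollment infection risk, identical for both serostatus groups


def pooled_risk_ratio(communities: List[Community]) -> float:
    """Crude risk ratio (seropositive vs. seronegative) computed from expected counts."""
    pos_n = sum(c.n * c.seroprevalence for c in communities)
    neg_n = sum(c.n * (1 - c.seroprevalence) for c in communities)
    pos_cases = sum(c.n * c.seroprevalence * c.risk for c in communities)
    neg_cases = sum(c.n * (1 - c.seroprevalence) * c.risk for c in communities)
    return (pos_cases / pos_n) / (neg_cases / neg_n)


# Hypothetical communities: a small epidemic and a large epidemic.
small = Community(n=1000, seroprevalence=0.05, risk=0.02)
large = Community(n=1000, seroprevalence=0.30, risk=0.20)

within_small = pooled_risk_ratio([small])    # no protection, so the ratio is 1
within_large = pooled_risk_ratio([large])    # no protection, so the ratio is 1
pooled = pooled_risk_ratio([small, large])   # pooling across communities inflates the ratio

print(f"within small community: {within_small:.2f}")
print(f"within large community: {within_large:.2f}")
print(f"pooled across communities: {pooled:.2f}")
```

With these invented numbers, the risk ratio is 1 within each community but about 1.81 when the communities are pooled, because seropositive individuals are concentrated in the community with the higher infection risk.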
Table 1 shows the specific numbers corresponding to these parameters of the simulations, and Web Appendix 1 describes the generation of the network and outbreak in more detail.

For each simulation setting (one or ten communities, well-mixed or clustered communities, control measures or not, and seroprotective efficacy), we consider three sampling designs: Second, given the potential for stochasticity to generate heterogeneous outbreaks between communities (7), we also conduct an analysis stratified by community and day of enrollment to prevent confounding by these variables. In this analysis, a Cox proportional hazards model with time starting from enrollment is fit with a separate baseline hazard function for each combination of community and day of enrollment, but a common hazard ratio due to seropositivity. R code for the simulations and analysis is available on GitHub (15), and additional analyses examined are described in Web Appendix 2, Web Figure 2, and Web Figure 3.

Figure 1 shows the results for 1,000 simulations for each of 36 combinations of parameters (see Table 1). Figures Figures 1A, 1B, and 1D), an unadjusted analysis creates the same upward bias, regardless of whether enrollment is on the same or multiple calendar dates, as the same calendar date does not mean the same phase of the epidemic in each of the communities. Once again, the bias is upward because individuals in communities with larger or more advanced epidemics are exposed to higher hazards and are more likely to be seropositive at baseline (Figures 2A-D). As before, the

Clustering of contacts within communities (a departure from the assumption of a well-mixed epidemic, Figure 1C) produces an upward bias even in the matched design and stratified analyses. As noted, this reflects the fact that different parts of the network have different local prevalence at any given time, resulting in a milder form of the same heterogeneity-induced bias seen when there are many discrete communities.
Because these clusters of high- and low-prevalence areas overlap and arise during the study, there is no a priori way to adjust for them. When there is clustering within communities, a slight upward bias remains, suggesting that the local network structure in a study is an important factor to consider.

While most individuals are susceptible when they are enrolled into the study, it is possible for individuals to be exposed or infectious upon enrollment. Excluding individuals who are infected soon after enrollment (e.g., within the average latent period length) would remove many of these

These simulations focus on the bias inherent in some study designs that may be considered, but do not address the feasibility of implementing these designs. In addition, we do not focus on the power of these studies; this may have important consequences in determining an adequate sample size. Sample size considerations will be particularly important in balancing the advantage of starting enrollment later, when the cumulative incidence is higher and thus the exposure arms are more likely to be balanced, against the need to avoid the tail of an outbreak or a setting after control measures have been implemented, which will reduce the infection risk for all participants.

We have shown that matching can address these issues, but matching requires exposure status to be known at enrollment. This may be feasible if the study is designed following a serological survey, where individuals can be enrolled on the basis of their antibody presence from the survey. If the exposure needs to be measured for the seroprotection study, however, matching

Investigators will need to consider the relative sample size requirements and testing burden of these designs in the context of their specific study.

As serologic studies begin, understanding potential sources of bias and how to alleviate them is important for accurately estimating the extent and duration of immunity to SARS-CoV-2.
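The effect of stratifying on community and day of enrollment can be illustrated with a small worked example. The sketch below does not reproduce the stratified Cox proportional hazards model used in this paper (which was fit in R); as a simpler stand-in, it computes a Mantel-Haenszel incidence rate ratio over person-time strata and compares it with the crude rate ratio. The `Stratum` class, the stratum labels, and all event and person-time counts are hypothetical and were chosen so that seropositivity has no effect within any stratum.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stratum:
    label: str         # one community and day-of-enrollment combination (hypothetical)
    pos_events: float  # infections after enrollment among seropositive participants
    pos_time: float    # person-days at risk among seropositive participants
    neg_events: float  # infections after enrollment among seronegative participants
    neg_time: float    # person-days at risk among seronegative participants


def crude_rate_ratio(strata: List[Stratum]) -> float:
    """Rate ratio that ignores strata by pooling all events and person-time."""
    pos_rate = sum(s.pos_events for s in strata) / sum(s.pos_time for s in strata)
    neg_rate = sum(s.neg_events for s in strata) / sum(s.neg_time for s in strata)
    return pos_rate / neg_rate


def mantel_haenszel_rate_ratio(strata: List[Stratum]) -> float:
    """Mantel-Haenszel rate ratio: compares serostatus groups only within strata."""
    num = sum(s.pos_events * s.neg_time / (s.pos_time + s.neg_time) for s in strata)
    den = sum(s.neg_events * s.pos_time / (s.pos_time + s.neg_time) for s in strata)
    return num / den


# Hypothetical data: the infection rate is equal by serostatus within each stratum,
# but the second community has both higher seroprevalence and a higher rate.
strata = [
    Stratum("community A, day 30", pos_events=1, pos_time=100, neg_events=19, neg_time=1900),
    Stratum("community B, day 30", pos_events=30, pos_time=300, neg_events=70, neg_time=700),
]

crude = crude_rate_ratio(strata)
stratified = mantel_haenszel_rate_ratio(strata)

print(f"crude rate ratio: {crude:.2f}")
print(f"stratified (Mantel-Haenszel) rate ratio: {stratified:.2f}")
```

In this constructed example the crude rate ratio is about 2.26, while the stratified estimate is 1, matching the absence of any true effect. A stratified Cox model pursues the same goal by allowing a separate baseline hazard for each stratum while estimating a common hazard ratio.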
Here

Opinion | Who Is Immune to the Coronavirus?
CDC begins studies for more precise count of undetected Covid-19 cases
News Feature: Avoiding pitfalls in the pursuit of a COVID-19 vaccine
Use of serological surveys to generate key insights into the changing global landscape of infectious disease
Serology for SARS-CoV-2: Apprehensions, opportunities, and the path forward
World Health Organization.
Correlates of vaccine-induced protection: methods and implications
Bias due to misclassification in the estimation of relative risk
Network theory and SARS: predicting outbreak diversity