key: cord-0429637-1ntedt0m authors: Vos, E. R. A.; van Bove, M.; den Hartog, G.; Backer, J. A.; Klinkenberg, D.; van Hagen, C. C. E.; Boshuizen, H.; van Binnendijk, R. S.; Mollema, L.; van der Klis, F. R. M.; de Melker, H. E. title: Associations between measures of social distancing and SARS-CoV-2 seropositivity: a nationwide population-based study in the Netherlands date: 2021-02-12 journal: nan DOI: 10.1101/2021.02.10.21251477 sha: ae66adc39d1fcd7babf8da122169c00840a13303 doc_id: 429637 cord_uid: 1ntedt0m

This large nationwide population-based seroepidemiological study provides evidence on the effectiveness of physical distancing (>1.5 m) and indoor group size reductions on SARS-CoV-2 infection. Additionally, young adults seem to play a significant role in viral spread, in contrast to children up to primary school age, with whom close contact is permitted.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This version posted February 12, 2021. https://doi.org/10.1101/2021.02.10.21251477

Here, we provide evidence from a large population-based study on the effectiveness of physical distancing (>1.5 m) as well as indoor group size reductions on SARS-CoV-2 infection, and these data substantiate the policy of allowing close contact between teachers and children in primary school.

Our results on physical distancing are in line with the few previous reports, mostly derived from healthcare settings and households [8]. Seroprevalence rates were low in children aged ≤12 years despite close contact, and similar to observations from other European countries with comparable nationwide estimates [1, 9]. Interestingly, the likelihood of infection among persons in close contact with children was not statistically significantly increased, most likely indicating a low contribution of children to transmission, as suggested previously [10–13].
On the other hand, young adults, who engage in relatively more social interaction than older age groups [14] and often live in larger (student) households, most probably play an increased role. Applying physical distancing measures within households may not always be feasible; however, stressing its relevance in outbreak management could help to reduce (ongoing) transmission. Further, as in many other countries [15], these data underline the increased risk of infection among nursing home workers. Because these workers care for the most vulnerable, this group requires specific attention.

Our study has strengths and limitations. A strength is that our study provides a large population sample covering the full age range from young to old, combining a sound indicator of prior infection, i.e., seropositivity, with extensive questionnaire data. Also, samples could be classified accurately since antibodies were measured with a highly specific and sensitive immunoassay. Limitations include the relatively low response rate, which might have introduced selection bias, e.g., toward relatively health-conscious individuals adhering to social distancing measures. Further, some variables […]

It is made available under a CC-BY-NC-ND 4.0 International license.
[…] area under the curve (as a measure of goodness-of-fit) of 0.72. D. Adjusted odds ratio (aOR) with 95% confidence envelope for age, derived from the multivariable model, with age 12 years as the reference category. E. SARS-CoV-2 seropositivity (with 95% confidence intervals) by number and nature of non-household close contacts on the day before filling out the questionnaire. Nature of non-household close contact was defined as the proportion of non-household close contacts who were children aged <10 years.

In this supplement we detail our sampling strategy, provide information on nonresponse rates, and explain how we have included post-stratification weighting in the analyses. The risk factor analyses in the main text use logistic regression based on binary classification of the data (seronegative versus seropositive).
Here, we also provide an underpinning of this classification using a two-component mixture model. In this model, samples are not rigidly classified as either seronegative or seropositive, but belong to either the negative or the positive component with a certain probability [1, 2]. As the probability of seropositivity may depend on age, we model the mixing parameter (i.e., the probability of seropositivity, or seroprevalence) with an age-dependent penalized spline [3]. We fit the model to antibody concentration measurements from the population sample described in the main text while incorporating information from a test panel of proven negative and positive samples [4]. Subsequently, we derive test characteristics (sensitivity, specificity) for various cut-offs, showing that the binary classification used in the main text performs well. In a final step, we present additional results for the weighted seropositivity estimates by municipal health service region, and for the age-specific odds of seropositivity.

The current cohort includes persons who had participated in our earlier SARS-CoV-2 serosurveillance study in March 2020 (sample 1) and an additional nationwide sample (sample 2). Details on the first sample have been described previously [5, 6]. In this previous study, 2,634 participants had been included. Anticipating a 10% drop-out rate from the first study, and given the low estimated seroprevalence in the first study (2.8%), we aimed to increase the overall power of the study. Hence, the initial cohort was supplemented with an additional nationwide sample. A total sample size of 6,400 participants, i.e., an average of 380 participants per age group, would enable us to estimate the overall and age-specific seroprevalence with a precision of 1.25% and 5%, respectively. Based on previous experience, we anticipated a response rate of at least 15%.
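The precision figures above (1.25% overall, 5% per age group) can be reproduced under an assumption the text does not state: that they are half-widths of a normal-approximation (Wald) 95% confidence interval evaluated at the most conservative prevalence p = 0.5, which maximizes p(1 − p). The function name `ci_half_width` is hypothetical; this is a sketch of the arithmetic, not the authors' sample-size calculation.

```python
import math

Z_95 = 1.96  # two-sided 95% standard-normal quantile


def ci_half_width(n: int, p: float = 0.5, z: float = Z_95) -> float:
    """Half-width of a Wald confidence interval for a proportion.

    p = 0.5 maximizes p * (1 - p), so it gives the widest (most
    conservative) interval for a given sample size n.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


overall = ci_half_width(6400)       # whole planned sample
per_age_group = ci_half_width(380)  # average planned age group

print(f"overall precision: {overall:.2%}")
print(f"age-group precision: {per_age_group:.2%}")
```

At a prevalence near the 2.8% observed in the first round, the same formula yields a considerably narrower interval, because p(1 − p) is then far below 0.25.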
Hence, for the additional sample, we randomly selected 27,200 persons from the population registry, of whom 26,854 remained eligible for participation after an initial screening. All randomly selected persons who were invited in the first serological study were also invited for the current study. Of these, 2,317 participated in the current study. This cohort was subsequently supplemented with the additional sample described above: of the 26,854 randomly selected persons invited, 4,496 participated, resulting in an overall number of 6,813 participants (2,317 + 4,496). Table S1 shows the number of participants and response rates, stratified by sex, age group, region, and ethnic background. See main text for details and discussion.

Post-stratification weights are based on the following quantities: X_i is the total number of persons in stratum i, N is the total population size (i.e., of the Netherlands), x_ij is the number of participants in stratum i in study sample j, and n_j is the number of participants in sample j.

Figure S1 shows the regional distribution of samples, and Figure S2 shows the antibody concentration measurements by age. For the analyses, we also include a validation panel that has been used for validation of the assay [4]. Specifically, we take a set of 384 samples from uninfected persons that had been drawn from the Dutch population before the pandemic, and a set of 115 samples from proven SARS-CoV-2 infections with mild to severe disease [4].

The negative and positive components are characterized by the mean and standard deviation of the log-transformed data, so that θ_neg = (µ_neg, σ_neg) and θ_pos = (µ_pos, σ_pos), while the mixing parameter is modelled with a Bayesian penalized spline using cubic basis functions and first-order penalization [7, 8]. Throughout, we consider the age range [0, 100] years, placing knots at 10-year intervals (11 knots in total), so that the total number of basis functions is 13, i.e., 10 intervals plus 3 for the cubic degree [7, 8]. Parameters are estimated in a Bayesian framework using Hamiltonian Monte Carlo, implemented in Stan [9].
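The count of 13 basis functions can be checked with a small sketch using the Cox–de Boor recursion, which also shows how a seroprevalence curve would follow from basis weights on the logistic scale together with a first-order (random-walk) penalty. Assumptions: the knot sequence is extended by three equally spaced knots beyond each end of [0, 100], as in the Eilers–Marx construction [7]; the function names and the flat weight vector are hypothetical; this illustrates the spline set-up and is not the authors' Stan model.

```python
import math

DEGREE = 3  # cubic basis functions
AGE_MIN, AGE_MAX, KNOT_STEP = 0.0, 100.0, 10.0


def knot_vector(lo: float, hi: float, step: float, degree: int) -> list[float]:
    """Equally spaced knots on [lo, hi], extended by `degree` knots on each side."""
    n_knots = int(round((hi - lo) / step)) + 1  # 11 knots: 0, 10, ..., 100
    return [lo + (k - degree) * step for k in range(n_knots + 2 * degree)]


def bspline(x: float, knots: list[float], i: int, degree: int) -> float:
    """Value at x of the i-th B-spline of the given degree (Cox-de Boor recursion).

    Knots are distinct and equally spaced here, so no denominator is zero.
    """
    if degree == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = (x - knots[i]) / (knots[i + degree] - knots[i]) * bspline(x, knots, i, degree - 1)
    right = (knots[i + degree + 1] - x) / (knots[i + degree + 1] - knots[i + 1]) * bspline(
        x, knots, i + 1, degree - 1
    )
    return left + right


def basis_row(x: float, knots: list[float], degree: int = DEGREE) -> list[float]:
    """All basis functions evaluated at x (one row of the design matrix)."""
    n_basis = len(knots) - degree - 1
    return [bspline(x, knots, i, degree) for i in range(n_basis)]


def prevalence(age: float, weights: list[float], knots: list[float]) -> float:
    """Seroprevalence at `age`: spline on the logit scale, then logistic back-transform."""
    eta = sum(b * w for b, w in zip(basis_row(age, knots), weights))
    return 1.0 / (1.0 + math.exp(-eta))


def rw1_penalty(weights: list[float]) -> float:
    """First-order random-walk penalty: sum of squared successive weight differences."""
    return sum((b - a) ** 2 for a, b in zip(weights, weights[1:]))


knots = knot_vector(AGE_MIN, AGE_MAX, KNOT_STEP, DEGREE)
n_basis = len(knots) - DEGREE - 1
print(n_basis)  # 13

# Hypothetical flat weights at logit(p) = -3: constant prevalence, zero penalty.
flat = [-3.0] * n_basis
print(round(prevalence(35.0, flat, knots), 4), rw1_penalty(flat))
```

Because the basis functions sum to one on [0, 100], equal weights give an age-independent prevalence; the random-walk penalty only charges for differences between neighbouring weights, which is what pulls the fitted curve toward smoothness.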
To improve performance at low prevalence, we model the age-specific prevalence on the logistic scale. For the spline smoothing parameter (RWvar) we take an inverse gamma prior distribution [8], and for the weights of the spline basis functions w_i (i = 1, …, 13) we take […], where it should be noted that the prior weights are defined on the logistic scale. Results from properly converged chains are obtained within hours (using 10 cores on our servers). Estimates for the parameters defining the mixing distribution and the spline smoothing parameter are given in Table S2, together with the convergence diagnostics R̂ and n_eff [9]. In a sensitivity analysis, we re-ran the fitting procedure with uninformative prior distributions (only assuming that µ_pos > µ_neg). These analyses yield virtually identical results (not shown). Figure S3 visualizes the data (gray histograms) and model fit (colored lines), suggesting good agreement between the two. Note also that the overlap between the negative and positive components is small, which bodes well for efforts to distinguish seronegative from seropositive samples. To further investigate the implications of the analyses, Figure S4 shows the estimated probability of infection as a function of antibody concentration. Here, the probability of infection is calculated as the estimated positive density (at a given concentration) divided by the sum of the positive and negative densities (at that concentration) [1]. The figure shows that, in the absence of information on age-specific prevalence, the estimated probability of infection is close to 0 for concentrations of −1 (log(AU)/mL) and lower, and close to 1 at concentrations of 0 (log(AU)/mL) and higher. The above results show that for the majority of samples there is limited uncertainty as to whether they should be classified as seronegative or seropositive. Therefore, we are confident that reliable binary classification of the samples is feasible.
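The probability of infection described above is the posterior membership probability of a two-component mixture. The sketch assumes normally distributed components on the log-concentration scale (the text specifies only θ = (µ, σ) per component) and uses hypothetical parameter values rather than the Table S2 estimates. With the default `prior_pos = 0.5` both components are weighted equally, so the result reduces to the positive density divided by the sum of both densities, mirroring the "no age-specific prevalence" case; passing an age-specific seroprevalence as `prior_pos` would give the age-dependent version.

```python
from statistics import NormalDist

# Hypothetical component parameters on the log-concentration scale (illustration only).
NEG = NormalDist(mu=-2.5, sigma=0.5)
POS = NormalDist(mu=1.0, sigma=0.6)


def prob_infected(log_conc: float, prior_pos: float = 0.5,
                  neg: NormalDist = NEG, pos: NormalDist = POS) -> float:
    """Posterior probability that a sample belongs to the positive component.

    With prior_pos = 0.5 this equals f_pos / (f_pos + f_neg), i.e. the
    positive density divided by the sum of both densities.
    """
    f_pos = prior_pos * pos.pdf(log_conc)
    f_neg = (1.0 - prior_pos) * neg.pdf(log_conc)
    return f_pos / (f_pos + f_neg)


for c in (-3.0, -1.0, 0.0, 1.0):
    print(c, round(prob_infected(c), 3))
```

Very far into either tail both densities can underflow to zero in floating point; a production version would work with log-densities to avoid that.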
Here, we investigate the optimal cut-off value for such binary classification, and the associated test characteristics (sensitivity and specificity). For a given cut-off, the proportion of the negative distribution with concentrations above the cut-off equals 1 − specificity, i.e., the false-positive rate (a high proportion implies low specificity), while the proportion of the positive distribution with concentrations below the cut-off equals 1 − sensitivity, i.e., the false-negative rate. Technically, both sensitivity and specificity are calculated using the cumulative distribution functions of the negative (specificity) and positive (sensitivity) components [1]. Figure S6 […] In Table S3 we show test characteristics for two specific scenarios. The first takes the cut-off that maximizes the Youden index (sensitivity + specificity − 1). Here, the estimated optimal cut-off is −0.56 (95% CrI: −0.67 to −0.44) and the estimated maximal Youden index is 0.97 (95% CrI: 0.95–0.98). This cut-off, however, is not useful in practice: expected seroprevalence is low (<10%), so control of the false-positive rate is more important than control of the false-negative rate. Therefore, in a second scenario we aimed at a specificity of 0.999. Such specificity can be reached with the test at a cut-off of 0.04, with a sensitivity of 0.943, which is high for such a specificity. In the following and in the main text, we have opted for a cut-off of 0.04. Finally, Figure S8 shows the posterior distribution of test sensitivity at a cut-off of 0.04 (log(AU)/mL). The mean and standard deviation of this distribution are 0.942 and 0.0151, respectively. These values can be incorporated in Rogan–Gladen-type corrections for estimating true prevalence from the observed apparent prevalence under binary classification [10, 11]. The main text presents the main results and interpretation of the logistic regression analyses using the binary classification described above.
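The cut-off calculations in this section can be sketched with the component cumulative distribution functions: specificity is the negative-component CDF at the cut-off, and sensitivity is one minus the positive-component CDF. The sketch assumes normal components with hypothetical parameters, so its cut-offs will not reproduce the −0.56 and 0.04 reported above; only the Rogan–Gladen step uses the reported sensitivity (0.942) and the target specificity (0.999). All function names are hypothetical.

```python
from statistics import NormalDist

# Hypothetical mixture components on the log-concentration scale (illustration only).
NEG = NormalDist(mu=-2.5, sigma=0.5)
POS = NormalDist(mu=1.0, sigma=0.6)


def specificity(cut: float, neg: NormalDist = NEG) -> float:
    """Share of the negative component at or below the cut-off."""
    return neg.cdf(cut)


def sensitivity(cut: float, pos: NormalDist = POS) -> float:
    """Share of the positive component above the cut-off."""
    return 1.0 - pos.cdf(cut)


def youden(cut: float) -> float:
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity(cut) + specificity(cut) - 1.0


def youden_optimal_cutoff(lo: float = -4.0, hi: float = 3.0, steps: int = 7001) -> float:
    """Grid search for the cut-off that maximizes the Youden index."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=youden)


def cutoff_for_specificity(target: float, neg: NormalDist = NEG) -> float:
    """Lowest cut-off whose specificity reaches `target` (inverse CDF)."""
    return neg.inv_cdf(target)


def rogan_gladen(apparent: float, sens: float, spec: float) -> float:
    """Rogan-Gladen estimate of true prevalence, clipped to [0, 1]."""
    estimate = (apparent + spec - 1.0) / (sens + spec - 1.0)
    return min(1.0, max(0.0, estimate))


cut_y = youden_optimal_cutoff()
cut_s = cutoff_for_specificity(0.999)
print(f"Youden cut-off {cut_y:.2f}: sens {sensitivity(cut_y):.3f}, spec {specificity(cut_y):.3f}")
print(f"Spec-0.999 cut-off {cut_s:.2f}: sens {sensitivity(cut_s):.3f}")

# Correct a hypothetical apparent prevalence of 5% using the reported test characteristics.
print(round(rogan_gladen(0.05, sens=0.942, spec=0.999), 4))  # 0.0521
```

Fixing specificity first reflects the reasoning above: at low prevalence, even a small false-positive rate can make up a large share of the apparent positives, and the Rogan–Gladen step then corrects the remaining bias using the estimated sensitivity and specificity.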
Below we provide additional results on the regional estimates of seroprevalence (Figure S9), as well as the age-specific estimates of the unadjusted odds ratios for seropositivity derived from the univariable model (Figure S10).

Figure S10. Estimates of the unadjusted odds ratios for seropositivity as a function of age (see main text for adjusted odds ratios). The estimate is based on (random-effects) univariable logistic regression. Also shown is the 95% confidence envelope. Reference age is 12 years (odds ratio = 1).

References
[1] Usefulness of sero-surveillance for Trichinella infections in animal populations.
[2] Age-dependent patterns of infection and severity explaining the low impact of 2009 influenza A (H1N1): evidence from serial serologic surveys in the Netherlands.
[3] Infectious reactivation of cytomegalovirus explaining age- and sex-specific patterns of seroprevalence.
[4] SARS-CoV-2-specific antibody detection for seroepidemiology: a multiplex analysis approach accounting for accurate seroprevalence.
[5] Third national biobank for population-based seroprevalence studies in the Netherlands, including the Caribbean Netherlands.
[6] Nationwide seroprevalence of SARS-CoV-2 and identification of risk factors in the general population of the Netherlands during the first epidemic wave.
[7] Flexible smoothing with B-splines and penalties.
[8] Bayesian P-splines.
[9] Stan: a probabilistic programming language.
[10] Estimating prevalence from the results of a screening test.
[11] Confidence limits for prevalence of disease adjusted for estimated sensitivity and specificity.