key: cord-1001292-c3fziyni authors: Barchuk, A.; Skougarevskiy, D.; Titaev, K.; Shirokov, D.; Raskina, Y.; Novkunkskaya, A.; Talantov, P.; Isaev, A.; Pomerantseva, E.; Zhikrivetskaya, S.; Barabanova, L.; Volkov, V. title: Seroprevalence of SARS-CoV-2 antibodies in Saint Petersburg, Russia: a population-based study date: 2020-11-04 journal: nan DOI: 10.1101/2020.11.02.20221309 sha: b3bdfeb41127aced2ca8c13980e15cb4f4aa8b46 doc_id: 1001292 cord_uid: c3fziyni
Estimates from SARS-CoV-2 serological surveys could be biased due to convenience sampling and non-response. This study aims to estimate the seroprevalence of SARS-CoV-2 infection in Saint Petersburg, Russia, accounting for non-response bias. We recruited a sample of adults residing in St. Petersburg with random digit dialling. The telephone interview was followed by an invitation to take two anti-SARS-CoV-2 antibody tests, CMIA and ELISA. The seroprevalence estimates were corrected for non-response with the aid of a bivariate probit model that jointly estimated individual propensity to agree to participate in the survey and seropositivity. 66,250 individuals were contacted, 6,440 adults agreed to be interviewed and blood samples were obtained from 1,038 participants between May 27, 2020 and June 26, 2020. Naive seroprevalence corrected for test characteristics was 9.0% (7.2-10.8) by CMIA and 10.5% (8.6-12.4) by ELISA. Correction for non-response decreased seroprevalence estimates to 7.4% (5.7-9.2) and 9.1% (7.2-10.9) for CMIA and ELISA, respectively. The most pronounced decrease in non-response bias-corrected seroprevalence was attributed to the history of any illnesses in the past 3 months and COVID-19 testing. Seroconversion was negatively associated with smoking status, self-reported history of allergies and changes in hand-washing habits. These results suggest that even low estimates of seroprevalence in Europe's fourth-largest city can be an overestimation in the presence of non-response.
Serosurvey design should attempt to identify characteristics that are associated both with participation and seropositivity. Further population-based studies are required to explain the lower seroprevalence in smokers and participants reporting allergies.
Serological surveys in the midst of the COVID-19 pandemic address the underestimation of the number of cases registered officially with RT-PCR using material from nasopharyngeal swabs [1; 2]. They use blood antibody tests that are markers of past infection. WHO recommends serological surveys to monitor COVID-19 spread [3]. However, estimates from serological surveys can also be biased. Estimates can be distorted by non-response bias, non-representativeness of the study sample, and imperfect test characteristics. Previous serological surveys have addressed the latter two sources but largely not the first [4] [5] [6] [7] [8] [9] [10]. This poses a significant problem when some observed factors that influence the decision to participate in the survey may also be associated with test results [11]. Non-response or self-selection bias has been widely acknowledged in descriptive epidemiology [12] [13] [14] [15]. In particular, it has been predominantly addressed in seroprevalence surveys of HIV [16]. In this paper we present seroprevalence estimates coming from the first cross-sectional data of our longitudinal study with serial sampling to assess the spread of COVID-19 in Saint Petersburg, Russia, conducted between May 27 and June 26, 2020. St. Petersburg is the second largest city in the country and the fourth largest in Europe, with a population of approximately 5.2 million. The first case in the city was registered on 5 March, 2020 and 36,667 cases (7.1 per 1000) were reported as of 31 August, 2020. The study of the spread of COVID-19 in St. Petersburg was established to estimate the extent of the epidemic in a population-based [...] compute conditional probability to participate in the study (holding all but one variable at mean levels at a time).
Our bivariate probit model is formally introduced in the Statistical Appendix (App. 1). We analyse variables obtained from CATI and the clinic paper-based survey (ordered or unordered factor variables), and results of antibody tests (binary variables). Participant age was split into groups (18-34, 35-49, 50-64, or ≥65 years old). In the secondary analyses we also assessed seroprevalence by week based on the date of interview and the date of blood sampling. In subgroup analysis we first compared seroprevalence estimates corrected for non-response between different groups of individuals based on their answers in CATI. To explore individual risk factors for test positivity and obtain prevalence ratios we estimated a generalised linear model with Poisson distribution and a log link, restricted to data from participants who completed the clinic paper-based survey. We considered using a robust variance-covariance matrix in our adjusted prevalence ratio analysis. However, such adjustment narrowed the confidence intervals, rendering our adjusted estimates less conservative [18]. For this reason we report confidence intervals from the unadjusted variance-covariance matrix. In sensitivity analysis we explored how the inclusion of different sets of observable characteristics of individuals (namely, travel history, face mask use, public transport use, visits to public places and others) in the model that corrected seroprevalence for non-response influenced the results. We also applied alternative definitions of seroprevalence (test combination either favouring sensitivity or specificity).
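The prevalence-ratio analysis described above, and the effect of switching between model-based and robust variances, can be illustrated with a minimal sketch. It assumes a single binary covariate, in which case the Poisson log-link fit reduces to a ratio of two sample proportions and both variances have closed forms; the study's adjusted models contain more covariates and were fitted in R. The function name `prevalence_ratio` and all counts are hypothetical and are not study data.

```python
import math
from statistics import NormalDist


def prevalence_ratio(pos_exposed, n_exposed, pos_ref, n_ref, level=0.95):
    """Prevalence ratio for one binary covariate, with two Wald intervals.

    For a Poisson GLM with log link and a single binary covariate, the fitted
    coefficient equals the log of the ratio of the two sample proportions, so
    no iterative fitting is needed in this special case.
    """
    pr = (pos_exposed / n_exposed) / (pos_ref / n_ref)
    z = NormalDist().inv_cdf(0.5 + level / 2)

    # Model-based (Poisson) standard error of log(PR): sqrt(1/a + 1/c).
    se_model = math.sqrt(1 / pos_exposed + 1 / pos_ref)
    # Robust (sandwich) standard error: sqrt(1/a - 1/n1 + 1/c - 1/n0).
    se_robust = math.sqrt(
        1 / pos_exposed - 1 / n_exposed + 1 / pos_ref - 1 / n_ref
    )

    def interval(se):
        return (pr * math.exp(-z * se), pr * math.exp(z * se))

    return {
        "pr": pr,
        "model_ci": interval(se_model),
        "robust_ci": interval(se_robust),
    }


# Hypothetical counts (assumption, not study data):
# 10 of 200 exposed participants positive, 80 of 760 reference participants.
result = prevalence_ratio(10, 200, 80, 760)
print(result)
```

In this special case the robust interval is always the narrower of the two, which is consistent with the observation above that the robust adjustment made the estimates less conservative.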
To account for possible sample non-representativeness in sensitivity analysis we computed raking weights to match the survey age group and educational attainment proportions in the 2016 representative survey of the adult city population (see Supplementary Appendix Table A3 for a description of this survey and the target proportions). R package anesrake was used to compute the weights [19]. We then estimated seroprevalence on re-weighted data. We treated refusals to answer certain phone or paper-based survey questions as missing data; for this reason, all results that follow are reported after listwise deletion of observations with missing variables. All reported seroprevalence results were also corrected for test characteristics using the manufacturer's validation data: sensitivity (100% and 98.7%) and specificity (99.6% and 100%) for the CMIA and ELISA tests, respectively [20]. Standard errors were computed with the delta method. A detailed description of the statistical analysis is provided in the Statistical Appendix (App. 1). All analyses were conducted in R with the aid of the GJRM package [21]; study data and code are available online (https://github.com/eusporg/spb_covid_study20).
The study was approved by the Research Planning Board of European University at St. Petersburg (on May 20, 2020) and the Ethics Committee of the Clinic "Scandinavia" (on May 26, 2020). The study was registered with the following identifiers: NCT04406038 and ISRCTN11060415.
Participation rates. Between May 21 and June 25, 2020, 66,250 individuals were reached using RDD. Of the 13,071 respondents who agreed to participate in the CATI, 6,671 were excluded for various reasons (see Figure 1). The resulting 6,400 individuals responded to the CATI questionnaire (see Supplementary Appendix Table A2 for details regarding missing records on variables of interest).
The respondents were representative of the city population in terms of their gender, employment status, and household size, but were younger than the adult city population as of 2016 and had higher levels of educational attainment (see Supplementary Appendix Table A3). 3,390 of the surveyed individuals agreed to receive a phone call from the clinic and schedule a visit for antibody testing. Between May 27 and June 26, 2020 only 1,038 individuals who satisfied the eligibility criteria visited the clinic and provided blood samples (16.2% of those who were interviewed and 30.6% of those who agreed to participate in the serosurvey). The rest declined the invitation or did not show up at the test site. 1,038 CMIA tests and 1,035 ELISA tests were eventually performed on eligible individuals. The clinic-visiting participants also filled out 965 clinic paper-based survey forms. 652 (62.8%) of 1,038 participants were women; 396 (38.2%) were aged 18-34 years, 357 (34.4%) were aged 35-49 years, 218 (21.0%) were aged 50-64 years, and 67 (6.5%) were older than 65 years; the majority of participants, 843 (81.2%), lived in multiple-person households (see Supplementary Appendix Table A2 for summary statistics on phone survey respondents and tested individuals). In the course of the study we observed gradual attrition of participants. Compared with the individuals who limited their participation to the CATI, participants who took part in antibody testing were younger, more likely to be female, and more likely to report a higher education level, illnesses in the previous 3 months, a history of previous COVID-19 testing and a change in their hand-washing habits during the epidemic. Our attempt to randomly incentivize respondents to take part in the study by offering a taxi did not achieve its purpose (see Supplementary Appendix Figure A2a).
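The participation figures quoted above can be checked with a few lines of arithmetic. This is only a consistency check on the counts given in the text; the variable names are illustrative.

```python
# Counts as reported in the participation paragraph.
agreed_to_cati = 13_071
excluded = 6_671
agreed_to_testing = 3_390
tested = 1_038

# Respondents who completed the CATI questionnaire.
interviewed = agreed_to_cati - excluded

# Tested participants as a share of each earlier stage, in percent.
share_of_interviewed = 100 * tested / interviewed
share_of_agreed = 100 * tested / agreed_to_testing

print(interviewed, round(share_of_interviewed, 1), round(share_of_agreed, 1))
```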
CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted November 4, 2020. ; https://doi.org/10.1101/2020.11.02.20221309 doi: medRxiv preprint
Seroprevalence estimates. Between May 27 and June 26, 2020, 115 positive results were reported by any test (97 positive tests out of 1,038 were reported by CMIA and 107 positive tests out of 1,035 were reported by ELISA). 30 of these 115 (26.1%) individuals with any positive test result did not report any symptoms of past illnesses in the previous 3 months. Naïve seroprevalence corrected for test specificity and sensitivity was 9.0% (95% CI 7.2-10.8) by CMIA and 10.8% (8.8-12.7) by ELISA (see Table 1). When we accounted for non-response bias with respect to demographic and socioeconomic characteristics, our seroprevalence point estimates did not change considerably. Inclusion of characteristics associated with seroprevalence as regressors in our single imputation model shifted point estimates of seroprevalence downwards, and after adjustment for all aforementioned characteristics in the model seroprevalence was 7.4% (95% CI 5.7-9.2) for CMIA and 9.3% (7.4-11.2) for ELISA.
Secondary subgroup analysis. Seroprevalence was similar between men and women and was slightly lower in the older (65+) age group (see Table 2). Seroprevalence was higher for individuals who reported a past history of illnesses (15.1% (95% CI 11.6-18.6) for CMIA and 20.0% (95% CI 14.8-25.2) for ELISA) compared to those who did not (3.8% (95% CI 2.1-5.5) for CMIA and 7.4% (95% CI 5.4-9.3) for ELISA). It was also higher for individuals who reported a past history of COVID-19 tests, but was slightly lower in individuals who reported that they started washing hands more often since the onset of the pandemic and in those who lived alone.
There was noticeable variation in seropositivity between city districts (see Figure 2). We observed a slight increase in seroprevalence by the week of the phone interview (see Figure 3a) and by the week of the blood draw (see Figure 3b). Our secondary analysis of participants who filled out clinic paper-based survey forms revealed additional covariates associated with seroconversion. It was negatively associated with smoking status, with prevalence ratios of 0.46 (95% CI 0.22-0.87) and 0.34 (95% CI 0.14-0.72) (PR for current smokers vs non-smokers based on CMIA and ELISA, respectively), and with self-reported history of allergies, with prevalence ratios of 0.54 (95% CI 0.30-0.90) and 0.53 (95% CI 0.28-0.93) (see Supplementary Appendix Table 3).
Sensitivity analysis. Alternative definitions of seroprevalence (test combination either favouring sensitivity or specificity) did not qualitatively change the effect of non-response bias (see Supplementary Appendix Table A4). Seroprevalence estimates obtained on re-weighted survey data (based on age group and educational attainment level) were similar to estimates from the main analysis (see Supplementary Appendix Table A5).
Our study aimed to assess the spread of the epidemic in the fourth largest European city, St. Petersburg. Although the seroprevalence estimate varied based on the test used and the type of correction applied, the share of the population with detectable antibodies was still far lower than the proportion needed for herd immunity. Overall seroprevalence in the range between 7% and 10% was in line with the results obtained from previous studies and provides evidence of similar epidemic development across the world, with less than one tenth of the population affected in the first months [5; 6].
To the best of our knowledge, this is the first seroprevalence survey of COVID-19 that applied correction based on characteristics that are associated with the risk of seropositivity in combination with incentivised participation. Early COVID-19 serological surveys are likely to exhibit high sampling error because of recruitment methods [22]. Population-based studies with random sampling relied on probability weighting obtained from the comparison with the source population [5] [6] [7]. Our findings show that even low estimates of seroprevalence (around or below 10%) obtained in population surveys can be an overestimation in populations with a high risk of non-response bias. We detected only a slight change in the estimate of seroprevalence when we corrected our estimates for non-response bias with respect to demographic or socioeconomic characteristics, but a far larger difference was detected when several behavioural characteristics were included in the models and applied in the correction. In general, our analysis shows that naïve estimates that do not account for non-response bias tend to drive prevalence estimates upward. In contrast to the findings in the literature examining non-response bias in HIV serosurveys, on average participants who are more likely to have antibodies are more likely to participate in COVID-19 surveys [16; 23]. Participants with a history of illness or of COVID-19 testing in the last 3 months were more likely to agree to antibody testing in our study, probably seeking external confirmation. In our sample of participants we found only a slight age difference in the seropositivity rates, and there was no difference between men and women, which is in line with previous findings [6]. However, we observed several clear differences in seroprevalence estimates in a subgroup analysis.
First of all, we detected an elevated seroprevalence in participants who reported a history of illness and a history of any COVID-19 test in the last 3 months; this association was seen regardless of the modelling approach. Second, seroprevalence was lower in participants who lived alone and in those who reported that they started to wash their hands more often. Third, in the secondary analysis of participants who were tested we observed that seroprevalence was lower in current smokers compared to never smokers; it was also lower in participants who reported a past history of allergies. None of the associations revealed in our study should be immediately regarded as causal, due to limitations in the study design and analysis. The associations with history of testing and illness in the last 3 months can be easily interpreted. Seroprevalence among those reporting a history of COVID-19 testing was relatively low (around 20%); this can be explained by the high scale of testing in Russia since the onset of the epidemic. However, our study is not direct evidence of the effectiveness of hand hygiene, as a self-reported change in habits can reflect other differences between sub-populations. There is limited and conflicting evidence about the smoking rates in COVID-19 patients [24; 25]. While our study is one of the first to compare population-based seroprevalence estimates between smokers and non-smokers, more studies are needed to confirm this finding [9]. There are many examples where smoking effects were subject to structural epidemiological biases [26]. Even if this association is causal, the behavioural or biological mechanisms remain to be explored.
Smoking is a well-established risk factor for many diseases and it is likely linked to COVID-19 severity regardless of the risk of infection [24]. It is also tempting to immediately search for a biological explanation that links allergy status and risk of infection [27]. However, we should be very cautious due to the limitations of the study design and other possible explanations, e.g. people who self-report being allergic may behave in a way that minimizes their risk of being infected. The question about allergy was very general in our paper-based survey, which also limits the value of this finding. An important source of bias in serological studies is the performance and the nature of the serological tests [28]. One possible explanation for the difference between the two tests in our study is the different classes of Ig analysed: IgG in the case of CMIA and IgG+IgM+IgA in the case of ELISA. However, given the total seroprevalence of not more than 10%, it seems that the lack of IgM and IgA in the CMIA test can only partially explain the difference. A recent study showed that seroconversion started on day 5 after disease onset and that the IgG level rose even earlier than IgM [29]. Another possible explanation for the different seroprevalence estimates of the two tests is the nature of the antigen. SARS-CoV-2 antibody responses specific to the Spike (S) and/or the nucleocapsid (N) proteins are equally sensitive in the acute infection phase [30]. However, compared to anti-S antibody responses, those against the N protein appear to wane in the post-infection period [31]. Recent evaluations of the CMIA test used in our study reported sensitivity far below the 100% reported by the manufacturer. This may also explain the difference [32; 33]. Another source of underestimation is the proportion of infected individuals who do not seroconvert. Straightforward adjustments for this sort of bias are not available without additional laborious testing [34]. Our study has several other important limitations.
We address seroprevalence in adults only, while previous studies also included participants younger than 18 years old [5; 6]. Our study had a relatively low participation rate given the existing propensity to answer phone calls in the city. However, we assumed missingness at random for those who did not complete the interview or did not pick up the phone. Comparison with the previous representative city survey showed that our sample was representative (see Supplementary Appendix Table A3). We also excluded distant city districts from our sampling. Even though we observed statistically significant differences between by-district seroprevalence, the lion's share of city residents (about 4.3 million of 5.2 million) live in the surveyed districts. Our randomized incentivisation scheme was not successful because the randomly assigned taxi offer was not associated with participation agreement and failed to become a valid exclusion restriction. In our main analysis we did not apply the post-stratification methods adopted previously [5]. However, application of raking weights estimated to match targets from a representative survey of the adult city population showed little to no change in weighted seroprevalence estimates. We attribute this to the weak or absent association between seroconversion and age or education level. Finally, we report cross-sectional results, but longitudinal data are needed to offer additional insights into immunity waning and prolonged defence against re-infection.
Conclusion. The COVID-19 pandemic has already affected at least 300,000 residents of St. Petersburg, which can be extrapolated to millions in the whole country. However, the vast majority of the population does not carry antibodies to SARS-CoV-2. This highlights the need for further high-quality population-based studies that can provide evidence for measures to diminish the impact of the pandemic.
The study was funded by Polymetal International plc.
The main funder had no role in study design, data collection, data analysis, data interpretation, writing of the report or decision to submit the publication. The European University at St. Petersburg, clinic "Scandinavia" and Genetico had access to the study data and The European University at St. Petersburg had final responsibility for the decision to submit for publication.
AB, DSk, VV, KT, LB, DSh and PT conceived the study. AB, DSk and VV drafted the first version of the manuscript. KT, YR, AN, EP and DSh contributed to drafting sections of the manuscript. DS, AB and DSh did data analyses. SZ and EP did lab analyses. All authors participated in the study design, helped to draft the manuscript, contributed to the interpretation of data and read and approved the final manuscript.
AB reports personal fees from MSD and Biocad outside the submitted work. AI, EP and SZ report a pending patent for the test system (ELISA) for detecting antibodies specific to SARS-CoV-2 in a biological sample. Other authors have no conflict of interest to declare.
"Demographic characteristics" means the following variables: individual age group (18-34, 35-49, 50-64, 65+ years old) and sex. "Socioeconomic characteristics" means the following variables: higher education status and higher self-reported income level. "Characteristics associated with seropositivity" means the following variables: history of illness in the last 3 months, history of COVID-19 testing, whether respondent lives alone, change in hand washing habits during pandemic, week of the phone interview, and city district. All models include a variable indicating random offer of taxi transportation to and from the clinic test site for interviewed participants. All estimates are corrected for test characteristics (see Statistical appendix for details).
All estimates are from the model that includes demographics, socioeconomic status and characteristics associated with seropositivity. All estimates are corrected for test sensitivity and specificity (see Statistical appendix for details).
Figure 3. Prevalence estimates over time. (b) Naïve prevalence by blood sample draw week.
When participation is universal, $D_i(Z) = Z_i \;\forall i$. In reality one could observe non-zero refusal rates. In what follows we assume one-sided noncompliance.
Variable $Y_i$ characterises the antibody status (seroconversion) of the $i$-th individual and takes values in $\{1, 0\}$, where 1 means the individual has antibodies to SARS-CoV-2 and 0 means the individual does not. Antibody status can be both observed and unobserved: $Y_i(Z_i, D_i)$. We are able to observe antibody status for $Y_i(Z_i = 1, D_i = 1)$, i.e. for the surveyed individuals who agreed to volunteer in the study and were tested. We are interested in the population seroprevalence $\pi \equiv \frac{1}{N} \sum_{i=1}^{N} Y_i$. Having conducted our study we can estimate the naïve seroprevalence $\hat{\pi} = \frac{1}{n} \sum_{i:\, D_i = 1} Y_i$, where $n$ is the number of tested individuals. To arrive at population-level seroprevalence estimates the following assumptions are required [1]. First, we assume that the decision of a surveyed individual to agree to participate in the survey and come to the clinic test site is determined by a latent variable
$$D_i^* = X_i \beta + \gamma T_i + \varepsilon_i \qquad (1)$$
such that $D_i = 1$ if $D_i^* > 0$ and $D_i = 0$ otherwise. $T_i$ is a variable equal to unity if the surveyed individual was offered a free taxi to and from the clinic test site during the phone survey, and $\varepsilon_i$ is the error term. We observe $D$ only for $n$ out of $N$ individuals in the population. We observe antibody status only for those with $D_i = 1$ and assume that seroconversion is determined by a latent variable
$$Y_i^* = X_i \alpha + \zeta_i \qquad (2)$$
such that $Y_i = 1$ if $Y_i^* > 0$ and $Y_i = 0$ otherwise. Since we have offered the taxi to random phone survey participants we can safely assume that $\mathrm{cov}([Y \mid X], T) = 0$ and $T$ becomes a valid exclusion restriction.
We impose a structural assumption of independent and identically Normal-distributed error terms $\varepsilon_i$ and $\zeta_i$ with zero mean and unit variance. Their joint cumulative distribution function is given by $\Phi(\varepsilon_i, \zeta_i, \rho)$, where $\rho$ is the covariance (correlation coefficient).
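The role of the error correlation $\rho$ can be illustrated with a small simulation. This is a sketch under simplifying assumptions and not the estimation procedure itself: covariates and the taxi indicator are omitted, the two intercepts (-1.0 and -1.3) are hypothetical values chosen so that roughly 16% participate and roughly 10% are seropositive, and the function name `simulate` is illustrative.

```python
import random


def simulate(rho, n=200_000, seed=1):
    """Simulate participation D and antibody status Y with correlated errors.

    Returns (population prevalence, naive prevalence among the tested).
    """
    rng = random.Random(seed)
    population_positive = 0
    tested = 0
    tested_positive = 0
    for _ in range(n):
        eps = rng.gauss(0, 1)
        # zeta has unit variance and correlation rho with eps.
        zeta = rho * eps + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        d = eps - 1.0 > 0   # hypothetical selection index
        y = zeta - 1.3 > 0  # hypothetical seroconversion index
        population_positive += y
        if d:
            tested += 1
            tested_positive += y
    return population_positive / n, tested_positive / tested


# With independent errors the naive estimate is close to the truth;
# with positively correlated errors it overstates prevalence.
true_indep, naive_indep = simulate(rho=0.0)
true_corr, naive_corr = simulate(rho=0.5)
print(true_indep, naive_indep)
print(true_corr, naive_corr)
```

When $\rho = 0$ the tested subsample is, conditional on the modelled characteristics, informative about everyone; when $\rho > 0$ people who are more likely to participate are also more likely to be seropositive for unobserved reasons, and the naive estimate is biased upward.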
This bivariate probit is estimated with R package GJRM [2]. We use different definitions of seropositivity $Y = 1$ depending on antibody tests or combinations thereof, and different sets of variables in the design matrix $X$. Our first step is to estimate $\rho$ and test whether it is statistically significantly different from zero. Estimates under different sets of variables, with 95% CIs in parentheses, are reported in Supplementary Appendix Table A1. As our baseline we adopt a model where a rich set of demographic, socioeconomic, and seropositivity-related characteristics is included in the design matrix $X$. In this model, results from both the simulated CIs and the Lagrange multiplier test with null $\rho = 0$ (not reported here, available on request) suggest that one cannot reject the null hypothesis of error term independence between the selection stage and the antibody test result stage. Under error term independence, Heckman correction is not required to arrive at seroprevalence estimates for the entire city population when response is non-random. However, the naïve seroprevalence estimate can still be biased since the tested individuals are not representative of the city population. To circumvent this we use the estimated parameters from the baseline seroconversion probit (see equation 2 above) and predict antibody status $Y$ for all surveyed individuals regardless of their agreement to participate in the survey. Such (univariate) single imputation, which assumes no unobserved confounders, permits us to correct the naïve seroprevalence estimates for missing data for those individuals who refused to get tested or did not visit the clinic. Symmetric confidence intervals come from standard errors estimated with the delta method. Results do not change qualitatively when we consider non-symmetric confidence intervals after Bayesian posterior simulation of the parameter vector estimate (not reported here, available on request).
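The single-imputation step can be sketched for the simplest possible case. The example assumes one categorical covariate, for which a saturated probit reproduces the observed positivity in each stratum, so the model prediction for every surveyed person is just the stratum proportion among those tested. The actual model includes many covariates and was fitted in R; the function name `imputed_prevalence`, the strata, and all counts are hypothetical and are not study data.

```python
def imputed_prevalence(surveyed_counts, tested_counts, positive_counts):
    """Average stratum-level predicted positivity over all surveyed people.

    With a single categorical covariate, a saturated probit predicts
    positive_counts[s] / tested_counts[s] for every person in stratum s,
    whether or not that person was tested.
    """
    total_surveyed = sum(surveyed_counts.values())
    predicted_positive = sum(
        surveyed_counts[s] * positive_counts[s] / tested_counts[s]
        for s in surveyed_counts
    )
    return predicted_positive / total_surveyed


# Hypothetical strata by recent illness (assumption, not study data).
surveyed = {"ill": 1500, "not_ill": 4500}
tested = {"ill": 400, "not_ill": 600}
positive = {"ill": 60, "not_ill": 30}

naive = sum(positive.values()) / sum(tested.values())
corrected = imputed_prevalence(surveyed, tested, positive)
print(naive, corrected)
```

In this toy setting people with a recent illness are over-represented among the tested and are more often seropositive, so averaging predictions over all surveyed individuals lowers the estimate relative to the naive one, which is the direction of the correction reported in the main text.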
Finally, we rely on manufacturers' test characteristics to correct all reported prevalence estimates $\hat{\pi}$ using the formula
$$\hat{\pi}_{\mathrm{corrected}} = \frac{\hat{\pi} + \mathrm{specificity} - 1}{\mathrm{sensitivity} + \mathrm{specificity} - 1}; \qquad \mathrm{std.dev.}(\hat{\pi}_{\mathrm{corrected}}) = \frac{\mathrm{std.dev.}(\hat{\pi})}{\mathrm{sensitivity} + \mathrm{specificity} - 1}.$$
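The correction for test characteristics can be checked numerically against the naive counts and manufacturer figures quoted in the main text. The sketch below assumes a binomial (Wald) standard error for the uncorrected proportion, which is not stated explicitly in the text; the function name `corrected_prevalence` is illustrative.

```python
import math
from statistics import NormalDist


def corrected_prevalence(positives, n, sensitivity, specificity, level=0.95):
    """Apply the test-characteristic correction and return (estimate, low, high)."""
    naive = positives / n
    # Assumption: binomial standard error for the uncorrected proportion.
    naive_se = math.sqrt(naive * (1 - naive) / n)
    denominator = sensitivity + specificity - 1
    estimate = (naive + specificity - 1) / denominator
    se = naive_se / denominator
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate, estimate - z * se, estimate + z * se


# Counts and manufacturer-stated characteristics as given in the main text.
cmia = corrected_prevalence(97, 1038, sensitivity=1.0, specificity=0.996)
elisa = corrected_prevalence(107, 1035, sensitivity=0.987, specificity=1.0)

print([round(100 * x, 1) for x in cmia])   # [9.0, 7.2, 10.8]
print([round(100 * x, 1) for x in elisa])  # [10.5, 8.6, 12.4]
```

Under these assumptions the calculation gives 9.0% (7.2-10.8) for CMIA and 10.5% (8.6-12.4) for ELISA, matching the naive corrected estimates given in the abstract.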
All models include a variable indicating random offer of taxi transportation to and from the clinic test site for interviewed participants. All estimates are corrected for test characteristics. The serosurvey sample was re-weighted with raking weights estimated so that its age group and educational attainment proportions match those in a 2016 representative survey of the adult city population (see Supplementary Appendix Table A3 for a description of this survey and the target proportions). The R package anesrake was used to compute the weights.

References

Assessing the extent of SARS-CoV-2 circulation through serological studies.
The paramount importance of serological surveys of SARS-CoV-2 infection and immunity.
Population-based age-stratified seroepidemiological investigation protocol for coronavirus 2019 (COVID-19) infection, 26 May 2020. World Health Organization.
Population-based surveys of antibodies against SARS-CoV-2 in Southern Brazil.
Seroprevalence of anti-SARS-CoV-2 IgG antibodies in Geneva, Switzerland (SEROCoV-POP): a population-based study. The Lancet.
Prevalence of SARS-CoV-2 in Spain (ENE-COVID): a nationwide, population-based seroepidemiological study. The Lancet.
Estimation of seroprevalence of novel coronavirus disease (COVID-19) using preserved serum at an outpatient setting in Kobe, Japan: A cross-sectional study. medRxiv.
Performance Characteristics of the Abbott Architect SARS-CoV-2 IgG Assay and Seroprevalence Testing in Idaho.
Antibody prevalence for SARS-CoV-2 in England following first peak of the pandemic: REACT2 study in 100,000 adults. medRxiv.
Seroprevalence of immunoglobulin M and G antibodies against SARS-CoV-2 in China.
Survey non-response in the Netherlands: effects on prevalence estimates and associations.
Nonresponse research: an underdeveloped field in epidemiology.
Invited commentary: selection bias without colliders.
Commentary: on representativeness.
Analysis of non-response bias in a mailed health survey.
Analytical methods used in estimating the prevalence of HIV/AIDS from demographic and cross-sectional surveys with missing data: a systematic review.
Exact confidence limits for prevalence of a disease with an imperfect diagnostic test.
A modified Poisson regression approach to prospective studies with binary data.
Package 'anesrake'. The Comprehensive R Archive Network.
Correcting HIV prevalence estimates for survey nonparticipation using Heckman-type selection models.
A joint regression modeling framework for analyzing bivariate binary data in R. Dependence Modeling.
A simultaneous equation approach to estimating HIV prevalence with nonignorable missing responses.
A note on COVID-19 seroprevalence studies: A meta-analysis using hierarchical modelling.
Validation, replication, and sensitivity testing of Heckman-type selection models to adjust estimates of HIV prevalence.
The impact of COPD and smoking history on the severity of COVID-19: a systemic review and meta-analysis.
Clinical characteristics of coronavirus disease 2019 in China.
Deconstructing the smoking-preeclampsia paradox through a counterfactual framework.
Allergic inflammation alters the lung microbiome and hinders synergistic co-infection with H1N1 influenza virus and Streptococcus pneumoniae in C57BL/6 mice.
Diagnostic accuracy of serological tests for covid-19: systematic review and meta-analysis.
Serologic Response to SARS-CoV-2 in COVID-19 Patients with Different Severity.
Side by side comparison of three fully automated SARS-CoV-2 antibody assays with a focus on specificity.

We acknowledge personal support from Vitaly Nesis (Chief Executive Officer, Polymetal International, plc). We thank Alla Samoletova (European University at St. Petersburg) for administrative support and management of the study. We are also beholden to Dmitriy Serebrennikov (EU SPb) for managing paper-based survey data entry and to Ruslan Kuchakov (EU SPb) for initial assistance with visualizations. We also gratefully acknowledge support from Yana Novikova and Aleksey Gladkikh (Invitro Laboratory) regarding the CMIA testing, Yulia Stepantsova (Chursina) regarding phone-based interviewers, Maya Perestoronina (Clinic "Scandinavia") for comments on the protocol, Lizaveta Dubovik and Irina Shubina for the science communication, and Sergey Nechiporenko for the protocol translation. We thank the interviewers, nurses, general practitioners, and administrative personnel of the Clinic "Scandinavia". We also thank Ilya Fomintsev for his help and support during the initial stages of the study. We also thank all study participants.

All analyses were conducted in R with the aid of the GJRM package [21]; study data and code are available online (https://github.com/eusporg/spb_covid_study20). The study was approved by the Research Planning Board of European University at St.
Petersburg (on May 20, 2020) and the Ethics Committee of the Clinic "Scandinavia" (on May 26, 2020). The study was registered with the following identifiers: NCT04406038 and ISRCTN11060415.