key: cord-0429101-dcaq26k0
authors: Bouman, J. A.; Kadelka, S.; Stringhini, S.; Pennacchio, F.; Meyer, B.; Yerly, S.; Kaiser, L.; Guessous, I.; Azman, A. S.; Bonhoeffer, S.; Regoes, R. R.
title: Applying mixture model methods to SARS-CoV-2 serosurvey data from the SEROCoV-POP study
date: 2021-07-19
journal: nan
DOI: 10.1101/2021.07.19.21260410
sha: 09e779a990fd4da0a9e0026cc5344bf9b15f6cb7
doc_id: 429101
cord_uid: dcaq26k0

Serosurveys are an important tool to estimate the true extent of the current SARS-CoV-2 pandemic. So far, most serosurvey data have been analysed with cut-off based methods, which dichotomize individual measurements into sero-positives or negatives based on a predefined cutoff. However, mixture model methods can gain additional information from the same serosurvey data. Such methods refrain from dichotomizing individual values and instead use the full distribution of the serological measurements from pre-pandemic and COVID-19 controls to estimate the cumulative incidence. This study presents an application of mixture model methods to SARS-CoV-2 serosurvey data from the SEROCoV-POP study from April and May 2020 (2766 individuals). Besides estimating the total cumulative incidence in these data (8.1% (95% CI: 6.8% - 9.8%)), we applied extended mixture model methods to estimate an indirect indicator of disease severity, which is the fraction of cases with a distribution of antibody levels similar to hospitalised COVID-19 patients. This fraction is 51.2% (95% CI: 15.2% - 79.5%) across the full serosurvey, but differs between three age classes: 21.4% (95% CI: 0% - 59.6%) for individuals between 5 and 40 years old, 60.2% (95% CI: 21.5% - 100%) for individuals between 41 and 65 years old and 100% (95% CI: 20.1% - 100%) for individuals between 66 and 90 years old. Additionally, we find a mismatch between the inferred negative distribution of the serosurvey and the validation data of pre-pandemic controls.
Overall, this study illustrates that mixture model methods can provide additional insights from serosurvey data.

Introduction

[...] from Stringhini et al. [2]. Besides the age of each individual and the measured IgG OD ratio of the Euroimmun SARS-CoV-2 serological assay, these data also report the sex of each individual, the household structure between the individuals and the date of the measurement. The Euroimmun SARS-CoV-2 serological assay measures the IgG and IgA antibodies against the S1-domain of the spike protein of SARS-CoV-2 [14]. The IgG ratio is the immunoreactivity of the sample measured at an optical density of 450 nm (OD450) divided by the OD450 of the calibrator [14], [15].

Mixture model methods

We have assembled all observations of the SEROCoV-POP study from April and May and apply the mixture [...] COVID-19 control measurements, and π is the cumulative incidence.

The likelihood is extended for the model where the outpatient and hospitalized cases are estimated separately, see Equation 2. Here, π_out is the cumulative incidence of outpatient cases and π_hosp the cumulative incidence of hospitalized cases; σ_i can be 0 (no past infection), 1 (past outpatient infection) or 2 (past hospitalized infection).

The 95% confidence intervals are estimated by bootstrapping the control distributions as well as the observations from the serosurvey. The various mixture models are compared with a likelihood ratio test.

We applied the extended model described above to the serosurvey data segregated into three age categories [...] This model is then compared to the model of Equation 2 to test if the additional distribution has significantly improved the likelihood of observing the serosurvey data.

This preprint is made available under a CC-BY-NC 4.0 International license. The author/funder has granted medRxiv a license to display the preprint in perpetuity.

[...] [14]. Solid lines indicate the empirical distributions.
The purple solid line shows the inferred additional distribution that is an indication of the mismatch between the pre-pandemic controls and the serosurvey data.

[...] outpatient control data is significantly better than model based on one type of controls only

The significant difference between the distributions of the IgG OD ratios for the hospitalized and the outpatient controls allows the mixture model method to simultaneously estimate the cumulative incidence of both types of cases in the data from the SEROCoV-POP study from April and May 2020 (see Equation 2). We find a cumulative incidence of 4.0% (95% CI: 0.8% - 7.4%) for cases with a distribution of antibody levels similar to hospitalized controls and 4.2% (95% CI: 1.4% - 7.4%) for cases with a distribution of antibody levels similar to outpatient controls. As a result, the fraction of cases in the serosurvey that can be explained with the distribution of the IgG OD ratios from the hospitalized controls, which we refer to as the indirect indicator of disease severity, is 51.2% (95% CI: 9.9% - 83.7%). The large 95% CI of this indicator of disease severity is [...]

To investigate if the model improves by including a separate estimate for both types of positive controls, we compared the likelihood of the estimates above to the likelihood from a model that is based on either the hospitalized or outpatient control data only (see Equation 1 and Table 1). The p-values in Table 1 indicate that the model is indeed significantly improved by estimating two cumulative incidences separately.
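The model comparison described above can be illustrated with a minimal sketch. It assumes that the likelihood referred to as Equation 2 is a three-component mixture in which each observation x has density π_out p_out(x) + π_hosp p_hosp(x) + (1 - π_out - π_hosp) p_neg(x); the equations themselves are not preserved in this text, so that form is an assumption. The Gaussian control densities on a log scale, their parameters, the synthetic survey, the EM update used to maximise the likelihood and the chi-square reference with one degree of freedom are all illustrative choices, not the authors' implementation.

```python
import math
import random

def normal_pdf(x, mu, sd):
    """Gaussian density; stands in for a control distribution (assumption)."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical control distributions (mean, sd) on a log IgG OD ratio scale.
CONTROLS = {
    "neg": (-1.5, 0.4),   # pre-pandemic controls
    "out": (0.5, 0.5),    # outpatient COVID-19 controls
    "hosp": (2.0, 0.4),   # hospitalized COVID-19 controls
}

def fit_mixture(data, components, n_iter=200):
    """Maximise the mixture likelihood over the weights with EM.

    The component densities stay fixed; only the weights are estimated.
    Returns (weights keyed by component name, maximised log-likelihood).
    """
    dens = [[normal_pdf(x, *CONTROLS[c]) for c in components] for x in data]
    w = [1.0 / len(components)] * len(components)
    for _ in range(n_iter):
        totals = [0.0] * len(components)
        for d in dens:
            mix = sum(wj * dj for wj, dj in zip(w, d))
            for j, dj in enumerate(d):
                totals[j] += w[j] * dj / mix  # posterior membership probability
        w = [t / len(dens) for t in totals]
    loglik = sum(math.log(sum(wj * dj for wj, dj in zip(w, d))) for d in dens)
    return dict(zip(components, w)), loglik

def lrt_p_value(loglik_full, loglik_restricted):
    """Likelihood ratio test p-value for one extra parameter (chi-square, 1 df)."""
    stat = max(0.0, 2.0 * (loglik_full - loglik_restricted))
    return math.erfc(math.sqrt(stat / 2.0))

def simulate_survey(n_neg, n_out, n_hosp, seed=1):
    """Synthetic serosurvey drawn from the hypothetical control distributions."""
    rng = random.Random(seed)
    counts = {"neg": n_neg, "out": n_out, "hosp": n_hosp}
    return [rng.gauss(*CONTROLS[c]) for c, n in counts.items() for _ in range(n)]

survey = simulate_survey(n_neg=920, n_out=40, n_hosp=40)
full, ll_full = fit_mixture(survey, ["neg", "out", "hosp"])
hosp_only, ll_hosp = fit_mixture(survey, ["neg", "hosp"])
print("pi_out = %.3f, pi_hosp = %.3f" % (full["out"], full["hosp"]))
print("LRT p-value vs hospitalized-only model:", lrt_p_value(ll_full, ll_hosp))
```

Because the restricted model fixes a weight at zero, which lies on the boundary of the parameter space, the one-degree-of-freedom chi-square reference used here is conservative. The source does not state which reference distribution was used.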
Table 1 also shows that the point estimate of the total cumulative incidence is higher if the mixture model is [...]

Table 1
Control data used for the positive distribution | Total cumulative incidence (95% CI) | p-value
Hospitalized data only | 7.7% (6.3% - 9.6%) | 5.0e-07
Outpatient and hospitalized data treated as separate distributions | 8.1% (6.8% - 9.9%) | -

Indirect indicator of disease severity differs between age groups

It is known that there is a correlation between the age of an infected individual and the severity of a SARS-CoV-2 infection [16]. To validate our methodology, we estimated the indirect indicator of disease severity for three age classes: 5 to 40 years, 41 to 65 years and 66 to 90 years. These estimates, together with the total cumulative incidence estimates for the age classes, are shown in Table 2. Indeed, the indirect indicator of disease severity is highest for the oldest age class: we estimated that 100% of the cases in the serosurvey can be explained by the distribution of the hospitalized COVID-19 controls; for the middle and young classes this is 60.2% and 21.4%, respectively (see Figure 2). Figure 3 shows that the maximal observed IgG ratio as well as the median of all values above the cutoff provided by the manufacturer (red dots) increase with age. However, the overall median of the distribution does not increase with age (black dots). This illustrates that the observed increase in the indirect indicator of disease severity is driven by the upper part of the IgG ratio distributions. The model that separately considers the age classes is significantly better than the model without these age classes after correcting for the increased number of parameters (likelihood-ratio test, p-value = 0.009).
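The indirect indicator of disease severity and a bootstrap interval for it can be sketched as follows. This is a minimal illustration under stated assumptions: the indicator is computed as π_hosp / (π_out + π_hosp), the control distributions are approximated by Gaussians fitted to (hypothetical, simulated) control samples on a log scale, the weights are fitted with EM, and the interval is a simple percentile bootstrap that resamples the controls as well as the survey. None of the function names or numbers below come from the original study.

```python
import math
import random
import statistics

def normal_pdf(x, mu, sd):
    """Gaussian density used as a stand-in for a control distribution."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def fit_weights(dens, n_iter=60):
    """EM for mixture weights with fixed component densities."""
    k = len(dens[0])
    w = [1.0 / k] * k
    for _ in range(n_iter):
        totals = [0.0] * k
        for d in dens:
            mix = sum(wj * dj for wj, dj in zip(w, d))
            for j in range(k):
                totals[j] += w[j] * d[j] / mix
        w = [t / len(dens) for t in totals]
    return w

def severity_indicator(survey, neg, out, hosp):
    """Fraction of inferred cases attributed to the hospitalized-like component."""
    params = [(statistics.mean(c), statistics.stdev(c)) for c in (neg, out, hosp)]
    dens = [[normal_pdf(x, mu, sd) for mu, sd in params] for x in survey]
    _, pi_out, pi_hosp = fit_weights(dens)
    return pi_hosp / (pi_out + pi_hosp)

def bootstrap_ci(survey, neg, out, hosp, n_boot=200, seed=0):
    """Percentile bootstrap: resample survey and all control sets each round."""
    rng = random.Random(seed)

    def resample(sample):
        return rng.choices(sample, k=len(sample))

    values = sorted(
        severity_indicator(resample(survey), resample(neg), resample(out), resample(hosp))
        for _ in range(n_boot)
    )
    lo = values[int(0.025 * (n_boot - 1))]
    hi = values[int(math.ceil(0.975 * (n_boot - 1)))]
    return lo, hi

# Synthetic example for one (hypothetical) age class.
rng = random.Random(42)
neg = [rng.gauss(-1.5, 0.4) for _ in range(100)]
out = [rng.gauss(0.5, 0.5) for _ in range(100)]
hosp = [rng.gauss(2.5, 0.4) for _ in range(100)]
survey = ([rng.gauss(-1.5, 0.4) for _ in range(270)]
          + [rng.gauss(0.5, 0.5) for _ in range(15)]
          + [rng.gauss(2.5, 0.4) for _ in range(15)])

estimate = severity_indicator(survey, neg, out, hosp)
low, high = bootstrap_ci(survey, neg, out, hosp, n_boot=50)
print("indicator = %.2f (95%% CI: %.2f - %.2f)" % (estimate, low, high))
```

Applying the same two functions to the survey subset of each age class would give class-specific estimates. With few inferred cases per class, the bootstrap interval becomes wide, which is consistent with the broad intervals reported above.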
Mismatch between pre-pandemic controls and individuals without previous SARS-CoV-2 [...]

[...] overlaying part of the distribution of the pre-pandemic control samples (see Figure 1). This indicates that the [...]

The mismatch we identified between the control data and the serosurvey data appeared in the lower range of the IgG OD ratio. Hence, we assembled all values below 0.34 into a single point mass to eliminate this mismatch and test for an additional mismatch on the higher end of the observed IgG OD ratios. However, we did not find any evidence for such an additional mismatch. This suggests that the individuals with high IgG OD ratios in the serosurvey are well represented by the positive control data.

In this study, we present an application of mixture model methods to SARS-CoV-2 serosurvey data. Serosurvey data are currently used to determine the proportion of seropositivity and to estimate the cumulative incidence and the relative risk of seropositivity in various sub-groups. This is usually done by introducing a cutoff for seropositivity.

We show that mixture models that use the entire distribution of the antibody levels, rather than a cutoff for seropositivity, provide additional insights into aspects of an epidemic that are usually not addressed in serosurveys. Specifically, we have used mixture models to infer the cumulative incidence from distinct [...]

This preprint was not certified by peer review.
Assessing the extent of SARS-CoV-2 circulation through serological studies
SEROCoV-POP): a population-based study
Assessing the age specificity of infection fatality rates for COVID-19: systematic review, meta-analysis, and public policy implications
Serological evidence of human infection with [...] a systematic review and meta-analysis
How can we interpret SARS-CoV-2 antibody test results
Performance assessment of 11 commercial serological tests for SARS-CoV-2 on hospitalised COVID-19 patients
Validation of a commercially available SARS-CoV-2 serological immunoassay
Respiratory Syndrome Coronavirus 2-Specific Antibody Responses in Coronavirus Disease Patients
Association between age and clinical characteristics and outcomes of COVID-19
Male sex identified by global COVID-19 meta-analysis as a risk factor for death and ITU admission
An alternative, empirically-supported adjustment for sero-reversion yields a 10 percentage point lower estimate of the cumulative incidence of SARS-CoV-2 in Manaus by
Immunological memory to SARS-CoV-2 assessed for up to 8 months after infection
serological assays informs future diagnostics and exposure assessment