key: cord-0262879-twae11pi authors: Dite, G. S.; Murphy, N. M.; Spaeth, E.; Allman, R.; Initiative, Lifelines Corona Research title: Validation of a clinical and genetic model for predicting severe COVID-19 date: 2022-01-15 journal: nan DOI: 10.1101/2022.01.14.22269270 sha: fac453496bcf5b7f212b1fdf1a159d8a00665977 doc_id: 262879 cord_uid: twae11pi

Using nested case-control data from the Lifelines COVID-19 cohort, we undertook a validation study of a clinical and genetic model to predict the risk of severe COVID-19 in people with confirmed COVID-19 and in people with confirmed or self-reported COVID-19. The model performed well in terms of discrimination of cases and controls for all ages (area under the receiver operating characteristic curve [AUC] = 0.680 for confirmed COVID-19 and AUC = 0.689 for confirmed and self-reported COVID-19) and in the age group in which the model was developed (50 years and older; AUC = 0.658 for confirmed COVID-19 and AUC = 0.651 for confirmed and self-reported COVID-19). There was no evidence of over- or under-dispersion of risk scores but there was evidence of overall over-estimation of risk in all analyses (all P < 0.0001). In the light of large numbers of people worldwide remaining unvaccinated and continuing uncertainty regarding vaccine efficacy over time and against variants of concern, identification of people at high risk of severe COVID-19 may encourage the uptake of vaccinations (including boosters) and the use of non-pharmaceutical interventions.
medRxiv preprint, this version posted January 15, 2022; https://doi.org/10.1101/2022.01.14.22269270. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

the regular online COVID-19 questionnaires via an emailed link during the first eight weeks of data collection [4].

The questionnaire response dates corresponded to the period from around one month after the beginning of the first wave of the COVID-19 pandemic in the Netherlands through to the peak of the fourth wave in May 2021. During this time, the original SARS-CoV-2 virus accounted for over 95% of infections in the Netherlands until early January 2021, after which the alpha variant became more prevalent and had accounted for over 95% of infections by the end of March 2021 [7]. The presence of the delta variant was negligible during the period of data collection for this study.

COVID-19 vaccinations became available in the Netherlands in mid-January 2021 and were initially offered to high-risk groups and then progressively to other groups (such as care workers) and younger age groups until all adults became eligible in mid-June 2021 [8].

At the beginning of data collection, when testing for SARS-CoV-2 infection was not widely available in the Netherlands, Lifelines COVID-19 questionnaires 1-4 asked participants whether a doctor had told them they had COVID-19. From questionnaire 5 (early May 2020) onwards, the questionnaires also asked about positive test results. From these questions we identified a group of participants with confirmed COVID-19. In addition, the questionnaires asked participants to self-report having had COVID-19.
We used this question with the previous questions to identify a broader group of participants who had either confirmed or self-reported COVID-19.

Given the limited availability of testing early in the data collection period, the confirmed COVID-19 group is likely to miss some participants who had COVID-19, whereas the confirmed and self-reported COVID-19 group is likely to have some false positives. The true number of participants who had COVID-19 will be somewhere between the two. Therefore, we conducted two sets of analyses: (i) using participants with confirmed COVID-19 and (ii) using participants with confirmed and self-reported COVID-19.

As we did previously, we used hospitalization as a proxy for severe COVID-19. The Lifelines COVID-19 questionnaires specifically asked participants whether they had been hospitalized for COVID-19. The questionnaires also asked about being given supplemental oxygen, admission to an intensive care unit and being placed on a ventilator, but there were too few positive responses to these questions to allow separate analysis.

The risk factors included in the calculation of the risk of severe COVID-19 are age; sex; body mass index; a history of cerebrovascular disease, diabetes, haematological cancer, non-haematological cancer, hypertension, kidney disease or respiratory disease (excluding asthma); and the genotypes of seven single nucleotide polymorphisms (SNPs): rs112641600, rs10755709, rs118072448, rs7027911, rs71481792, rs112317747 and rs2034831 [3].
The log odds of the risk of severe COVID-19 is the sum of the intercept and, for each of the risk factors listed in Supplementary Table S1, the product of its value and its beta coefficient. The probability of severe COVID-19 is then the inverse logit of the log odds (x), that is,

$$p = \frac{e^{x}}{1 + e^{x}}.$$

We used the age reported at the completion of the participant's first Lifelines COVID-19 questionnaire. The questionnaires asked about a history of cancer, cerebrovascular disease, diabetes, hypertension, kidney disease and respiratory disease on three occasions. If any of the participants' responses to the risk factor questions were missing for all answered questionnaires, we used responses from their Lifelines baseline questionnaire.

participants so we used the risk associated with having a non-haematological cancer for all reported cancers. In the Lifelines questionnaires, the respiratory disease question included asthma, whereas this is excluded in the model calculations. Because we were not able to distinguish respiratory disease solely due to asthma, we included all reports of respiratory disease in the model calculations. Gender, ethnicity, weight and height were taken from the Lifelines baseline questionnaire. If two weight or height measurements were available, we used the most recent weight measurement and the mean of the height measurements.
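The risk calculation described above (intercept plus the sum of value-times-coefficient terms, passed through the inverse logit) can be sketched in a few lines of Python. This is only an illustration, not the authors' Stata code: the function names are made up for this sketch, and the intercept and beta values in the usage example are placeholders, not the coefficients from Supplementary Table S1.

```python
import math


def inverse_logit(x):
    """Convert log odds x into a probability: e^x / (1 + e^x)."""
    return 1.0 / (1.0 + math.exp(-x))


def severe_covid_probability(intercept, betas, values):
    """Return the predicted probability of severe COVID-19.

    betas  : dict mapping a risk factor name to its beta coefficient
    values : dict mapping the same names to the participant's values
             (for example 0/1 for a disease history, or a risk allele count)
    """
    # Log odds = intercept + sum over risk factors of (value * beta).
    log_odds = intercept + sum(betas[name] * values[name] for name in betas)
    return inverse_logit(log_odds)


if __name__ == "__main__":
    # Placeholder coefficients for illustration only (NOT from Table S1).
    example_betas = {"diabetes": 0.5, "hypertension": 0.25}
    example_values = {"diabetes": 1, "hypertension": 0}
    print(severe_covid_probability(-1.5, example_betas, example_values))
```

Keeping the coefficients in a dictionary keyed by risk factor name makes it harder to pair a value with the wrong coefficient than with two parallel lists.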
To extend the model to people aged less than 50 years, we estimated the risk associated with younger age groups using data from the Centers for Disease Control and Prevention [9] such that, compared with the 50-69 years baseline age group, people aged 18-29 years were at 0.27 times the risk, people aged 30-39 years were at 0.43 times the risk, and people aged 40-49 years were at 0.67 times the risk.

In each analysis, (i) using participants with confirmed COVID-19 and (ii) using participants with both confirmed and self-reported COVID-19, the cases were those who reported having been hospitalized for COVID-19 and the controls were the remainder of the group. We also did analyses restricting the dataset to those aged 50 years or older (the ages in which the model was developed).

As we did previously [3], we assessed the association between quintile of risk score and severe COVID-19 using logistic regression. We used the area under the receiver operating characteristic curve (AUC) to assess discrimination. We used logistic regression of the log odds of the risk score to assess calibration in terms of the overall estimation of risk (the intercept) and the dispersion of risk (the slope), and we drew calibration plots of deciles of expected and observed cases of severe COVID-19. We used Stata MP version 13.1 (StataCorp LP, College Station, Texas, USA) for all analyses and all statistical tests were two sided.

The Lifelines protocol has been approved by the Medical Ethical Committee of the University Medical Center Groningen, The Netherlands, under Approval Number 2007/152.
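As an aside on the discrimination measure named in the statistical methods above, the AUC can be read as the probability that a randomly chosen case has a higher risk score than a randomly chosen control, with ties counting as one half. The following Python sketch computes it by direct pairwise comparison. It is an illustration under that rank-based definition, not the Stata routine used in the study, and the function name and example scores are hypothetical.

```python
def auc(case_scores, control_scores):
    """Rank-based AUC: share of case-control pairs in which the case
    has the higher risk score, counting tied pairs as one half."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    favourable = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                favourable += 1.0
            elif case == control:
                favourable += 0.5
    return favourable / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical risk scores, for illustration only.
    cases = [0.30, 0.20]
    controls = [0.10, 0.20, 0.25]
    print(auc(cases, controls))  # 4.5 favourable pairs out of 6
```

The pairwise loop is quadratic in the number of participants, which is acceptable for a dataset of this size (55 cases and a few thousand controls); a rank-sum formulation would be preferable for much larger data.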
All participants provided written informed consent to Lifelines before data collection began. This research was conducted using Lifelines data under Project Number OV20-00101.

The data used in this study was made available to us by Lifelines and is not publicly available. Researchers can apply to use the Lifelines data used in this study, and more information about how to request Lifelines data and the conditions of use can be found on their website (https://www.lifelines.nl/researcher/how-to-apply). Stata MP Version 13.1 code for the analysis is available for non-commercial purposes from the corresponding author on request.

In the final dataset, 55 participants were hospitalized for their COVID-19 infection and were considered cases in this study. We used two control groups: the first comprised the 1355 participants who had confirmed COVID-19; the second comprised both the first control

age was 57.6 years (standard deviation [SD] = 10.3) and the mean number of completed questionnaires was 17.0 (SD = 6.3). In the confirmed COVID-19 control group, there were 905 (66.8%) women and 450 (33.2%) men; their mean age was 53.0 years (SD = 11.6) and the mean number of completed questionnaires was 14.1 (SD = 6.8). In the confirmed and self-reported COVID-19 control group, there were 2414 (62.3%) women and 1459 (37.7%) men; their mean age was 51.5 years (SD = 11.8) and the mean number of completed questionnaires was 12.2 (SD = 7.2).
In the cases, the mean probability of severe COVID-19 was 0.225 (SD = 0.019); in the confirmed COVID-19 controls, the mean was 0.165 (SD = 0.002); and in the confirmed and self-reported COVID-19 controls, the mean was 0.165 (SD = 0.001). The risk distributions for the cases, both control groups and the whole Lifelines COVID-19 cohort are shown in Supplementary Figure S1.

The top half of Table 1

The bottom half of Table 1

The true number of people with COVID-19 is unknown but is likely to be somewhere between the number who test positive for SARS-CoV-2 infection and the number who self-report having had COVID-19. In this study we have addressed this uncertainty by conducting two sets of analyses: the first in individuals with confirmed COVID-19, and the second in individuals with confirmed and self-reported COVID-19. In terms of discrimination, the AUC of the risk prediction model was almost identical in the two analyses and only slightly lower than the AUC in the validation group in the model development paper [3]. This and the similarity in the association per quintile of risk (Table 1)

have been unrelated to COVID-19, some will represent people who became infected and were too unwell to complete a Lifelines COVID-19 questionnaire before they died. This limitation may have attenuated some of the results seen in this study.
As the pandemic continues to evolve, there are two major issues that can affect the utility of our risk model. First, we have to address the impact of viral variants on the performance of the risk model. The model development paper [3] and the present study used datasets in which the original and alpha SARS-CoV-2 variants were predominant. We have not been able to assess our model in datasets with known delta or omicron SARS-CoV-2 variants. We hypothesize that the clinical and genetic risk factors have broad effects in terms of risk of severe disease because the delta and omicron SARS-CoV-2 variants appear to affect transmissibility rather than severity [10].

Second, our model does not incorporate the protection offered by vaccination. Thus, in vaccinated adults, the model will overestimate their risk of developing severe disease. However, we know that vaccine immunity wanes over about six months through a steady reduction in antibody levels, leading to a greater number of breakthrough infections among the vaccinated [11]. The wide range of immunity across individuals makes it hard to predict the impact of waning vaccination in terms of risk. Thus, we believe that the model can be used to provide a baseline risk of developing severe disease, even in the context of vaccinated adults.

Herein, we have validated our model to predict risk of severe COVID-19 if infected with SARS-CoV-2 in a dataset unrelated to the one in which the model was originally developed and validated. Despite new SARS-CoV-2 variants of concern, the model may complement current public health efforts in vaccine (and booster) uptake and may enable healthcare providers to have more informed discussions with patients about their risk-mitigation options and early treatment awareness, if ever infected.
December. Available at https://www.scientificamerican.com/article/omicrons-effect-wont-be-as-mild-as-hoped1/ (Accessed 13 January 2021).

(especially the data management team), the contributing research centres delivering data to Lifelines and all the study participants.

The analyses undertaken in this study were fully funded by Genetic Technologies Limited.

which is assigned to Genetic Technologies Limited.
Risk factors for severe and critically ill COVID-19 patients: A