key: cord-0906213-79srk0fa
authors: Xiao, Shujie; Sahasrabudhe, Neha; Hochstadt, Samantha; Cabral, Whitney; Simons, Samantha; Yang, Mao; Lanfear, David E; Williams, L Keoki
title: Predicting death from COVID-19 using pre-existing conditions: implications for vaccination triage
date: 2021-12-22
journal: BMJ Open Respir Res
DOI: 10.1136/bmjresp-2021-001016
sha: d1239eed31c8a244c7df11a6ae0ed80305ea1e46
doc_id: 906213
cord_uid: 79srk0fa

INTRODUCTION: Global shortages in the supply of SARS-CoV-2 vaccines have resulted in campaigns to first inoculate individuals at highest risk for death from COVID-19. Here, we develop a predictive model of COVID-19-related death using longitudinal clinical data from patients in metropolitan Detroit. METHODS: All individuals included in the analysis had a laboratory-confirmed SARS-CoV-2 infection. Thirty-six pre-existing conditions with a false discovery rate p<0.05 were combined with other demographic variables to develop a parsimonious prediction model using least absolute shrinkage and selection operator regression. The model was then prospectively validated in a separate set of individuals with confirmed COVID-19. RESULTS: The study population consisted of 15 502 individuals with laboratory-confirmed SARS-CoV-2. The main prediction model was developed using data from 11 635 individuals with 709 reported deaths (case fatality ratio 6.1%). The final prediction model consisted of 14 variables with 11 comorbidities. This model was then prospectively assessed among the remaining 3867 individuals (185 deaths; case fatality ratio 4.8%). When compared with using an age threshold of 65 years, the 14-variable model detected 6% more of the individuals who would die from COVID-19. However, below age 45 years and its risk equivalent, there was no benefit to using the prediction model over age alone. DISCUSSION: Using a prediction model, such as the one described here, may help identify individuals who would most benefit from COVID-19 inoculation, and thereby may produce more dramatic initial drops in deaths through targeted vaccination.

The COVID-19 pandemic, caused by SARS-CoV-2, has exceeded 32 million cases and half a million deaths in the USA. 1 Real-world vaccination effectiveness studies suggest that the mRNA-based vaccines are highly effective in preventing symptomatic disease, with effectiveness of up to 82% after one dose and 94% after two doses. 2 Even with the recent overwhelming spread of the SARS-CoV-2 Delta variant, vaccination is associated with a more than 11-fold lower age-standardised incidence rate of death. 3 To date, over 7 billion vaccine doses have been administered, yet the distribution of these doses has been skewed: approximately 65% of individuals in high-income countries have been vaccinated, in contrast to 6.5% in low-income countries. 4 Given the limited supply of vaccines, immunisation roll-out strategies have prioritised high-risk individuals; this prioritisation has been largely based on patient age. 5 6 Vaccination prioritisation could be improved through better algorithms to identify individuals at highest risk of death once infected. Here, we leverage detailed longitudinal clinical information on pre-existing comorbidities to develop a prediction model of COVID-19-related death in a racially diverse patient population from southeast Michigan. It is hoped that the information gleaned from this well-characterised patient population can inform COVID-19 severity prediction and vaccination roll-out strategies elsewhere.

► Prioritisation for COVID-19 vaccination has largely been based on age; it is not known whether using additional comorbidity information can improve the targeting of high-risk individuals without substantially increasing the number needing to be vaccinated.
► We show that using comorbidities substantially improves the identification of individuals likely to die from COVID-19 if infected, compared with using age alone as a predictor; however, the relative benefit of this added information disappears below the risk equivalent of age 45 years.
► Our prediction algorithm was developed in a large and diverse patient population from metropolitan Detroit with electronic data on longitudinal care; hence, we were able to build and validate a prediction model of COVID-19-related death that is broadly generalisable and easily applied.

This study was developed to identify individuals at highest risk for COVID-19 death to help target individuals for early immunisation. For the purposes of this study, we use the terms African American and European American to refer to individuals who self-identified as non-Hispanic black and non-Hispanic white, respectively.

The prediction models were developed in accordance with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines. 7 We used medical records from patients receiving care at the health system to identify adults with the following characteristics: age ≥20 years, a PCR-confirmed SARS-CoV-2 infection, and ≥1 outpatient visit between 2 years and 1 month before the first positive SARS-CoV-2 test was collected (ie, the index date). Online supplemental figure 1 illustrates how individuals were identified and how their data were used for developing and testing the prediction model. A COVID-19-related death was defined as one occurring during a hospital admission or within 14 days of hospital discharge for a COVID-19 infection (n=805). We also included patients who died outside of the hospital and whose last diagnosis of record was a COVID-19 infection (n=89). There were 15 502 laboratory-confirmed COVID-19 cases with 894 related deaths (13 686 cases from 2020 and 1816 cases from 2021); index dates ranged from 12 March 2020 to 21 February 2021. We used 85% of the cases from 2020 (n=11 635 with 709 deaths) to develop the prediction model (training set), and we randomly set aside a group of 2051 cases (with 123 deaths) from 2020, together with the 1816 cases (with 62 deaths) from 2021, to measure model performance (testing set). Cases were distinguished by year of disease onset to ensure that the prediction model was robust to the changing characteristics of the epidemic and the patient populations affected.
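The inclusion criteria above can be expressed as a short check. The following is a minimal sketch only, not the authors' extraction code: the function and constant names are hypothetical, and the day counts used for "2 years" (730 days) and "1 month" (30 days) are assumptions, since the text does not state how these intervals were measured.

```python
from datetime import date, timedelta

# Assumed day counts for the look-back window; the paper says only
# "2 years to 1 month before" the index date.
LOOKBACK_START = timedelta(days=730)
LOOKBACK_END = timedelta(days=30)
MIN_AGE_YEARS = 20


def in_lookback_window(event_date, index_date):
    """True if event_date lies between 2 years and 1 month before index_date."""
    return index_date - LOOKBACK_START <= event_date <= index_date - LOOKBACK_END


def is_eligible(age_years, pcr_confirmed, outpatient_visit_dates, index_date):
    """Apply the three stated criteria: age, PCR confirmation, prior outpatient visit."""
    has_prior_visit = any(
        in_lookback_window(visit, index_date) for visit in outpatient_visit_dates
    )
    return age_years >= MIN_AGE_YEARS and pcr_confirmed and has_prior_visit


# Hypothetical patient whose first positive test was collected on 1 April 2020.
index = date(2020, 4, 1)
print(is_eligible(54, True, [date(2019, 10, 15)], index))  # visit inside the window
print(is_eligible(54, True, [date(2020, 3, 20)], index))   # visit too close to index
```

Requiring a visit at least 1 month before the index date keeps encounters related to the acute infection itself from counting as evidence of prior care.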

All primary encounter diagnoses within the health system between March 2018 and January 2021 were categorised into 133 separate clinical categories, and each category consisted of multiple International Classification of Diseases, 10th Revision (ICD-10) codes. A pre-existing condition was defined as receiving at least two diagnoses (ICD-10 codes) within a category between 2 years and 1 month before the index date. Conditions in which ≤1 individual with COVID-19 was affected or conditions confined to only one sex (eg, pregnancy, erectile dysfunction and menopause) were not analysed. This resulted in 78 separate pre-existing conditions available for assessing association with COVID-19-related death. Patient age, sex, race-ethnicity, smoking status, body mass index (BMI), serum creatinine values and pre-existing conditions (the latter restricted to those with a false discovery rate adjusted p<0.05) were used as initial input in constructing the prediction model.

*The training set comprised individuals used to identify risk factors for COVID-19-related death; these individuals were used to create the prediction model. The testing set comprised a separate group of individuals used to test the performance of the prediction model. †Other or unknown race-ethnicity included individuals who did not identify as exclusively African American, European American, Asian or Latino. This group also included individuals who identified as being part of multiple groups. ‡This was based on the most recent measure of BMI or creatinine >1 month prior to the index date. Index date was defined as the date that the first SARS-CoV-2 test was collected. §Ever smokers included both past and current smokers. BMI, body mass index.

Least absolute shrinkage and selection operator (LASSO) regression was used to select a parsimonious set of predictor variables for COVID-19-related death. LASSO regression was used since it can handle overfitting and multicollinearity 8 9 ; less influential coefficients are shrunk to zero. 8 The penalty parameter (lambda=0.01) for LASSO regression was selected to optimise performance in 10-fold cross validation. 9 10 A risk score (RS) was constructed from the variables selected via LASSO; this score was assessed for its predictive performance in the set-aside group of 3867 individuals. Variable weights were transformed by multiplying each coefficient by 1000; this ensured that all model weights were >1. Each individual's composite RS was calculated by summing the weighted variable results. Analyses were conducted using the statistical software R 11 ; the R packages glmnet and caret were used for LASSO regression and for calculating the confusion matrix, respectively. 12 13
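To make the shrinkage and weighting steps concrete, the sketch below illustrates the soft-thresholding operation that underlies the LASSO penalty, followed by the multiplication of surviving coefficients by 1000. It is an illustration of the mechanism only and not the authors' glmnet fit: the closed-form soft-threshold solution holds for a linear model with orthonormal predictors, whereas a penalised logistic model is fitted iteratively. The predictor names and coefficient values are hypothetical, and the penalty of 0.01 merely echoes the reported lambda without being on the same scale.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within the penalty become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def to_weights(coefficients, scale=1000):
    """Keep non-zero coefficients and rescale them, as described for the risk score."""
    return {
        name: round(coef * scale, 2)
        for name, coef in coefficients.items()
        if coef != 0.0
    }


# Hypothetical unpenalised coefficients (not taken from the paper).
unpenalised = {
    "predictor_a": 0.55,
    "predictor_b": 0.08,
    "predictor_c": 0.004,
    "predictor_d": -0.006,
}
penalty = 0.01

shrunk = {name: soft_threshold(coef, penalty) for name, coef in unpenalised.items()}
weights = to_weights(shrunk)
print(weights)  # predictor_c and predictor_d are dropped from the model
```

Predictors whose coefficients fall within the penalty are removed entirely, which is how the approach arrives at a parsimonious variable set.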

As shown in table 1, the average age of the 15 502 study individuals was 56.0 years (SD=18.0 years), and 9118 (58.8%) were female. The race-ethnic breakdown included 9176 (59.2%) European Americans, 4117 (26.6%) African Americans, 609 (3.9%) Latinos and 284 (1.8%) Asians. Overall, 894 of the 15 502 individuals with laboratory-confirmed SARS-CoV-2 died from COVID-19 (case fatality ratio of 5.8%). The demographic characteristics of the individuals who died of COVID-19 differed from those who survived. When compared with those who survived, individuals who died tended to be older (76.95 years vs 54.73 years), were more likely to be male (54.7% vs 40.4%), had higher serum creatinine levels (1.58 mg/dL vs 0.99 mg/dL) and were more likely to have a history of smoking (60.7% vs 40.8%). Patients in the training and testing sets had similar characteristics. A total of 709 of the 11 635 individuals in the training set died of COVID-19 (case fatality ratio 6.1%), and 185 of the 3867 individuals in the testing set died of COVID-19 (case fatality ratio 4.8%).
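The case fatality ratios quoted above follow directly from the reported counts. As a brief worked check, the sketch below recomputes them and confirms that the training and testing groups add up to the full cohort; the group labels and function names are illustrative, while the counts are those given in the text.

```python
# (deaths, laboratory-confirmed cases) for each group, as reported in the text.
COHORTS = {
    "training": (709, 11635),
    "testing_2020": (123, 2051),
    "testing_2021": (62, 1816),
}


def case_fatality_ratio(deaths, cases):
    """Return deaths as a percentage of confirmed cases."""
    return 100.0 * deaths / cases


def pooled(*names):
    """Sum deaths and cases over the named groups."""
    deaths = sum(COHORTS[name][0] for name in names)
    cases = sum(COHORTS[name][1] for name in names)
    return deaths, cases


training = pooled("training")
testing = pooled("testing_2020", "testing_2021")
overall = pooled("training", "testing_2020", "testing_2021")

for label, (deaths, cases) in [
    ("training", training),
    ("testing", testing),
    ("overall", overall),
]:
    print(f"{label}: {deaths}/{cases} = {case_fatality_ratio(deaths, cases):.1f}%")
```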

The prediction model was developed and trained in a randomly selected set of 11 635 SARS-CoV-2-infected individuals. The demographic and clinical variable association results from model development are shown in table 2 and online supplemental table 1. Age was the most significant predictor of COVID-19-related death (p=1.76×10^−144), but male sex (p=2.68×10^−5), African American race (p=5.94×10^−5), a history of smoking (p=3.28×10^−6) and higher serum creatinine values (p=7.28×10^−16) were also significantly associated. BMI was not associated with COVID-19-related death after adjusting for the above variables (p=0.675). Thirty-six pre-existing conditions were also associated with COVID-19-related death with a false discovery rate adjusted p<0.05. The most significant pre-existing conditions were a history of respiratory failure (p=1.22×10^−18) and congestive heart failure (p=1.27×10^−17).

Test performance was assessed in a separate group of 3867 individuals (figure 1 and table 3). The optimum cutpoint, defined by the Youden index, 14 was an RS of ≥6685.4 in the 14-variable model and an age of ≥68 years in the age-only model. An age threshold of ≥65 years, the age used by many states to define early eligibility for vaccination, 15 had a sensitivity of 83.2%, a specificity of 70.2%, a positive predictive value (PPV) of 12.3% and a negative predictive value (NPV) of 98.8%. In comparison, the 14-variable RS with the same specificity of 70.2% (RS ≥6646.5) had a sensitivity of 89.2%, a PPV of 13.1% and an NPV of 99.2%. At an age threshold of ≤45 years, there was no difference in sensitivity using the 14-variable RS with the same specificity (RS ≤5304.8).
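The relationship between these performance measures can be illustrated with a small worked example. The confusion-matrix counts below are not reported in the paper; they are back-calculated assumptions chosen to reproduce the published percentages for the 3867-person testing set (185 deaths, 3682 survivors), and the function name is hypothetical.

```python
def performance(tp, fn, fp, tn):
    """Compute standard test metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Youden index: sensitivity + specificity - 1
        "youden": sensitivity + specificity - 1,
    }


# Assumed counts, back-calculated from the reported percentages.
age_65 = performance(tp=154, fn=31, fp=1097, tn=2585)      # age >= 65 years
risk_score = performance(tp=165, fn=20, fp=1097, tn=2585)  # RS >= 6646.5

for label, metrics in [("age >= 65", age_65), ("RS >= 6646.5", risk_score)]:
    summary = ", ".join(f"{k}={100 * v:.1f}%" for k, v in metrics.items())
    print(f"{label}: {summary}")

gain = risk_score["sensitivity"] - age_65["sensitivity"]
print(f"absolute gain in sensitivity: {100 * gain:.1f} percentage points")
```

Because the two thresholds are matched on specificity, the number of survivors flagged as high risk is the same in both rows, and the gain of roughly 6 percentage points in sensitivity comes entirely from additional deaths being identified.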

Vaccines have been effective at reducing COVID-19 severity and death, 16 yet global supplies are still limited, particularly in developing countries. 4 Even in countries with ready access to vaccination, uptake has been insufficient to halt the spread of SARS-CoV-2 and a resurgence of infections and deaths. For example, in the USA, more than 40% of individuals are not fully vaccinated. This underscores the continued importance of targeting high-risk individuals, both in the initial stages of vaccine roll-out and for uptake once supply needs are met.

Wynants et al performed a systematic review of existing prediction models of COVID-19-related outcomes, but found a number of deficiencies in the existing literature. 17 Of the 107 articles on COVID-19 prognostic models reviewed, 39 were for predicting mortality. Problematic issues in the existing literature included small study sizes and high potential bias (eg, by not adhering to prediction model reporting standards, using proxy measures for outcomes, and including study individuals not reflective of the larger target population). However, the review did identify three studies with uncertain bias but large sample sizes. [18] [19] [20] Nevertheless, only two of these studies predicted COVID-19 mortality, 19 20 and these were among individuals already severe enough to be admitted to the hospital.

In contrast, our prognostic score may be useful in identifying high-risk individuals based on pre-existing conditions (ie, characteristics present prior to infection that predispose to dying from COVID-19). Design features which bolster the importance of our findings include using separate large and racially diverse groups for model development and validation, restricting cases to those with laboratory-confirmed SARS-CoV-2 diagnoses, drawing on an extensive longitudinal record of pre-existing clinical conditions, and accounting for COVID-19-related deaths both within and outside of the hospital. In this regard, our RS represents a valuable tool to identify individuals at greatest risk of dying from COVID-19 and thus could inform vaccination roll-out schemas. For example, as compared with using an age cut-off of ≥65 years alone, our study found that incorporating pre-existing comorbidities could identify 6% more of the individuals who would die from COVID-19 if infected without increasing the total number of individuals deemed 'high risk' (ie, improved sensitivity with the same specificity). Conversely, our data suggest that once high-risk individuals (RS ≥5304.8) and individuals aged ≥45 years have been vaccinated, additional triage based on age or risk score among adults is not needed.

Our study should be considered in light of potential limitations. First, all the participants were recruited from a single health system in southeast Michigan. While this may limit the generalisability of our prediction model, it is important to note that our study population included all documented cases of SARS-CoV-2 infection within the health system. As a result, we broadly captured the diversity of the Detroit metropolitan area. Second, as this is an observational study, it is possible that our model missed other important predisposing clinical conditions. Nevertheless, the large number of conditions that we considered (ie, diagnoses made over nearly 3 years for the entire covered patient population) makes it unlikely that we missed common diseases with large effects.
However, the large number of variables that we evaluated simultaneously could also result in erroneous parameter estimation via multicollinearity, as has been observed elsewhere. 21 To mitigate the effect of multicollinearity, we used LASSO regression. This penalised regression method constrains the degree of parameter inflation, selecting some variables for model inclusion while shrinking the parameter estimates of others to zero. In so doing, LASSO regression can improve model prediction accuracy while limiting the number of variables to those with the strongest effects. 8 Third, the factors that predispose to COVID-19-related death may not have the same relationship to vaccine response. For example, older age, smoking and obesity have been associated with lower response to SARS-CoV-2 vaccination. 22 23 Unfortunately, in this study, we did not have measures of vaccine response; hence, we could not incorporate this into our model of COVID-19 mortality prediction. On the other hand, vaccines against SARS-CoV-2 were not widely available for the vast majority of our observation period. Therefore, it is highly unlikely that vaccination status confounded our risk model of COVID-19-related death. Lastly, the COVID-19 pandemic has been ever changing, with new viral variants emerging rapidly. 24 25 This rapid evolution has implications for vaccine response, breakthrough infections and viral virulence. 26 27 To partially address this issue, we evaluated the performance of our model in patients first infected in 2020 and in 2021; our model produced similar results (data not shown). Thus, although the predictive ability of our model may change over time, we did not observe a noticeable difference within our time window.

§OR for COVID-19-related death according to a one-unit increase in the listed variable (coding and directionality for each variable retained is described in the preceding footnote). BMI, body mass index; FDR, false discovery rate; LASSO, least absolute shrinkage and selection operator.

Figure 1 Receiver operating characteristic (ROC) curves demonstrating the performance of two models to predict COVID-19-related deaths among individuals with laboratory-confirmed SARS-CoV-2 infection (n=3867) from southeast Michigan and the Detroit metropolitan area. The black line denotes the 14-variable prediction model with black circles representing risk score thresholds. The grey line denotes the age-only prediction model with grey circles representing age thresholds. Red circles represent the Youden index (ie, the point that maximises sensitivity + specificity − 1). The area under the curve (AUC) for the 14-variable ROC curve was 0.868 (0.846-0.891), and the AUC for the age-only ROC curve was 0.846 (0.821-0.871).

*High risk is defined as greater than or equal to the threshold age or risk score. †Prediction models were tested in 3867 individuals with PCR-confirmed SARS-CoV-2 infection in 2020 and 2021. Test specificity at a given age was used to determine the corresponding 14-variable risk score performance for the same specificity. ‡The 14-variable prediction model risk score was calculated using the formula given after these footnotes. §Absolute difference in sensitivity=sensitivity of 14-variable model − sensitivity of age-only model. ¶Absolute difference in the per cent at high risk=per cent designated high risk using 14-variable model − per cent designated high risk using age-only model.

Risk score=2340.58+62.90×age (in years)+7.18×sex (male=1, female=0)+129.33×serum creatinine (in mg/dL)+583.17×history of respiratory failure (yes=1, no=0)+539.28×congestive heart failure (yes=1, no=0)+296.49×chronic obstructive pulmonary disease (yes=1, no=0)+77.60×coronary artery disease (yes=1, no=0)+76.81×atrial fibrillation/atrial flutter (yes=1, no=0)+127.74×cerebrovascular disease (yes=1, no=0)+91.23×musculoskeletal disease affecting thorax (yes=1, no=0)+141.21×peripheral vascular disease (yes=1, no=0)+217.31×pressure ulcer (yes=1, no=0)+48.44×traumatic brain injury (yes=1, no=0)+134.12×dementia (yes=1, no=0).
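For readers wishing to apply the published 14-variable risk score formula, the following is a minimal sketch of how it might be computed. The weights are copied from the formula in the table footnote; the function name, dictionary keys and example patients are hypothetical, and the sketch is not a validated clinical tool.

```python
# Weights copied from the published 14-variable risk score formula.
INTERCEPT = 2340.58
AGE_WEIGHT = 62.90          # per year of age
MALE_WEIGHT = 7.18          # male=1, female=0
CREATININE_WEIGHT = 129.33  # per mg/dL of serum creatinine

# Comorbidity weights (yes=1, no=0); the key names are illustrative labels.
COMORBIDITY_WEIGHTS = {
    "respiratory_failure": 583.17,
    "congestive_heart_failure": 539.28,
    "copd": 296.49,
    "coronary_artery_disease": 77.60,
    "atrial_fibrillation_flutter": 76.81,
    "cerebrovascular_disease": 127.74,
    "musculoskeletal_disease_thorax": 91.23,
    "peripheral_vascular_disease": 141.21,
    "pressure_ulcer": 217.31,
    "traumatic_brain_injury": 48.44,
    "dementia": 134.12,
}

# Threshold matched to the specificity of an age cut-off of >=65 years.
HIGH_RISK_THRESHOLD = 6646.5


def risk_score(age_years, is_male, creatinine_mg_dl, conditions=()):
    """Sum the weighted variables for one individual."""
    unknown = set(conditions) - set(COMORBIDITY_WEIGHTS)
    if unknown:
        raise ValueError(f"Unrecognised conditions: {sorted(unknown)}")
    score = (
        INTERCEPT
        + AGE_WEIGHT * age_years
        + MALE_WEIGHT * int(is_male)
        + CREATININE_WEIGHT * creatinine_mg_dl
    )
    # Each condition counts once, even if listed more than once.
    return score + sum(COMORBIDITY_WEIGHTS[name] for name in set(conditions))


# Hypothetical individuals.
younger = risk_score(58, True, 1.1, ["congestive_heart_failure", "copd"])
older = risk_score(65, False, 1.0)
print(round(younger, 2), younger >= HIGH_RISK_THRESHOLD)
print(round(older, 2), older >= HIGH_RISK_THRESHOLD)
```

In this illustration, a 58-year-old man with congestive heart failure and chronic obstructive pulmonary disease scores above the threshold, whereas a 65-year-old woman with no listed comorbidities and a creatinine of 1.0 mg/dL scores just below it, which is the kind of reclassification the comparison with age alone describes.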

In conclusion, we have developed a prediction model of COVID-19-related death using pre-existing patient characteristics and comorbidities. Since our model was based on a large, diverse and well-characterised patient population, we believe that the resulting prediction equation may be broadly suited to identifying individuals at high risk of COVID-19 death. Our model suggested that while age is an important and dominant risk factor for COVID-19-related death, if used alone to determine vaccine prioritisation, it would miss a substantial portion of high-risk individuals (ie, persons who would receive the largest risk benefit from vaccination).

1. COVID data tracker: centers for disease control and prevention
2. Interim Estimates of Vaccine Effectiveness of Pfizer-BioNTech and Moderna Vaccines Among Health Care Personnel - 33 U.S. Sites
3. Monitoring Incidence of COVID-19 Cases, Hospitalizations, and Deaths, by Vaccination Status - 13
4. COVID-19) Dashboard: World Health Organization
5. The Advisory Committee on Immunization Practices' Interim Recommendation for Allocating Initial Supplies of COVID-19 Vaccine - United States
6. Machine learning prediction for mortality of patients diagnosed with COVID-19: a nationwide Korean cohort study
7. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement
8. Regression shrinkage and selection via the lasso
9. Regularization paths for generalized linear models via coordinate descent
10. Classification and regression trees
11. R: a language and environment for statistical computing
12. Lasso and Elastic-Net Regularized Generalized Linear Models
13. Building Predictive Models in R Using the caret Package
14. Index for rating diagnostic tests
15. As States Expand Access To the Vaccine, Hopes Rise Along With the Confusion. The New York Times
16. SARS-CoV-2 vaccines
17. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal
18. Individualizing risk prediction for positive coronavirus disease 2019 testing: results from 11,672 patients
19. Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO clinical characterisation protocol: development and validation of the 4C mortality score
20. IL-6 and CD8+ T cell counts combined are an early predictor of in-hospital mortality of patients with COVID-19
21. Potential multicollinearity among NLR and other variables in the prediction model for the COVID-19 mortality
22. Central obesity, smoking habit, and hypertension are associated with lower antibody titres in response to COVID-19 mRNA vaccine
23. Infliximab is associated with attenuated immunogenicity to BNT162b2 and ChAdOx1 nCoV-19 SARS-CoV-2 vaccines in patients with IBD
24. Spike mutation D614G alters SARS-CoV-2 fitness
25. Genomic reconstruction of the SARS-CoV-2 epidemic in England
26. COVID vaccine makers brace for a variant worse than delta
27. SARS-CoV-2 variants and vaccines

Contributors SX and LKW conceived the work; SX, NS, SH, WC, SS, MY, DEL and LKW were involved in either acquiring, analysing or interpreting the data for the work; SX, NS, DEL and LKW were involved in drafting the work; SX, NS, SH, WC, SS, MY, DEL and LKW revised the work for important intellectual content; all authors gave final approval of the version to be published; and all authors agree to be accountable for all aspects of the work, ensuring that questions related to its accuracy or integrity are appropriately investigated and resolved.

Funding This work was supported by the Fund for Henry Ford Hospital (DEL and LKW) and by the following institutes of the National Institutes of Health: the National Institute of Allergy and Infectious Diseases (R01AI079139 to LKW), the National Heart Lung and Blood Institute (R01HL103871 and R01HL132154 to DEL and R01HL118267, R01HL141845, and X01HL134589 to LKW) and the National Institute of Diabetes and Digestive and Kidney Diseases (R01DK113003 to LKW).

Competing interests DEL reports serving as a consultant for Amgen, Janssen, Ortho Diagnostics, DCRI (Novartis), Cytokinetics and Martin Pharmaceuticals and having participated in running clinical trials for Amgen, Bayer, and Janssen; these activities are unrelated to the subject matter of the current manuscript. LKW reports owning stock in companies which produce SARS-CoV-2 vaccines; there was no transactional relationship related to this manuscript. None of the other authors report any competing interests.

Ethics approval This study involves human participants and was approved by the Institutional Review Board (IRB) of Henry Ford Health System. No approval ID was provided. The IRB permitted a waiver of individual consent to use and analyze longitudinal clinical records of health system patients in order to build a prediction model of COVID-19-related death. This waiver was predicated on the use of the data involving no more than minimal risk to study subjects (ie, data were collected as part of clinical care), that the research could not be practicably performed without a waiver, and that the waiver did not negatively affect the rights or welfare of study subjects. Given the rapidly evolving pandemic, it was also not possible to include patients or the public in the design or conduct of the study or in the reporting or dissemination of this work.

Provenance and peer review Not commissioned; externally peer reviewed.

Data availability statement All data relevant to the study are included in the article or uploaded as online supplemental information.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

Open access This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.