key: cord-1020469-3re5dpjd authors: Hao, Boran; Hu, Yang; Sotudian, Shahabeddin; Zad, Zahra; Adams, William G; Assoumou, Sabrina A; Hsu, Heather; Mishuris, Rebecca G; Paschalidis, Ioannis C title: Development and validation of predictive models for COVID-19 outcomes in a safety-net hospital population date: 2022-05-09 journal: J Am Med Inform Assoc DOI: 10.1093/jamia/ocac062 sha: 22577e0fcd191a303b252b1574b672c0d0131274 doc_id: 1020469 cord_uid: 3re5dpjd OBJECTIVE: To develop predictive models of coronavirus disease 2019 (COVID-19) outcomes, elucidate the influence of socioeconomic factors, and assess algorithmic racial fairness using a racially diverse patient population with high social needs. MATERIALS AND METHODS: Data included 7,102 patients with positive (RT-PCR) severe acute respiratory syndrome coronavirus 2 test at a safety-net system in Massachusetts. Linear and nonlinear classification methods were applied. A score based on a recurrent neural network and a transformer architecture was developed to capture the dynamic evolution of vital signs. Combined with patient characteristics, clinical variables, and hospital occupancy measures, this dynamic vital score was used to train predictive models. RESULTS: Hospitalizations can be predicted with an area under the receiver-operating characteristic curve (AUC) of 92% using symptoms, hospital occupancy, and patient characteristics, including social determinants of health. Parsimonious models to predict intensive care, mechanical ventilation, and mortality that used the most recent labs and vitals exhibited AUCs of 92.7%, 91.2%, and 94%, respectively. Early predictive models, using labs and vital signs closer to admission had AUCs of 81.1%, 84.9%, and 92%, respectively. DISCUSSION: The most accurate models exhibit racial bias, being more likely to falsely predict that Black patients will be hospitalized. 
Models that are only based on the dynamic vital score exhibited accuracies close to the best parsimonious models, although the latter also used laboratories. CONCLUSIONS: This large study demonstrates that COVID-19 severity may accurately be predicted using a score that accounts for the dynamic evolution of vital signs. Further, race, social determinants of health, and hospital occupancy play an important role. Coronavirus disease 2019 (COVID-19) has affected more than 450 million people globally. Although about 65% of the US population has been vaccinated against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), 1 rates of immunization have been uneven, especially among different racial/ethnic groups and between rural versus urban communities. 2 Limited vaccination rates and the emergence of new variants suggest that COVID-19 will remain a concern for health systems worldwide. 3 Making predictions about disease severity is important for clinical triage, resource allocation, staffing, and overall planning, both within a hospital system and at the state/country scale. Artificial Intelligence (AI) methods have been used to that end, 4 including the prediction of patient outcomes for COVID-19. [5] [6] [7] [8] [9] [10] [11] However, these studies used data from relatively few patients (the largest used 2,500 6 ) and a limited collection of pre-existing conditions, laboratories, and in-hospital data. More importantly, no predictive models of hospitalization, disease severity, and mortality have been developed using data from a safety-net hospital caring for a large percentage of racially/ethnically diverse patients, including many lower-income individuals with pressing needs associated with social determinants of health (SDOH). In addition, no models have leveraged SDOH for patients receiving clinical care.
While work exploring disparities has considered aggregate nationwide data in the United States 12 and Brazil, 13 there is a need for more detailed analysis 14 and a concern that, if not properly adjusted, models may perpetuate biases. 15 The present study includes a large percentage of Black and Hispanic patients and person-level information on SDOH, enabling a characterization of the specific race/ethnicity and SDOH variables that influence the predictive models. An additional characteristic of the current study is the availability of rich information on the daily/hourly evolution of vitals for hospitalized patients. Most of the previously published predictive models were static: they considered a snapshot of the patient's condition and made a forward prediction. In the current work, we leveraged neural networks with long short-term memory cells 16 and a transformer 17 encoder to build a score of vitals that captures their dynamic evolution. Models based just on this score perform surprisingly well compared to more complex models that also use a host of laboratories. In addition, access to hospital occupancy data reveals how it may influence care decisions. We de-identified data for all 7,102 patients with a positive reverse transcription polymerase chain reaction (RT-PCR) SARS-CoV-2 test at the Boston Medical Center (BMC) between January 1 and December 31, 2020. As a tertiary care academic medical center, BMC is the largest safety-net hospital in New England, providing care for about 30% of Boston residents. 18 Features extracted included demographics, SDOH variables, depression status, travel-contact information, vital signs, radiological findings, past medical history, symptoms, medications, laboratory tests, hospital occupancy, hospitalization course, admission to the Intensive Care Unit (ICU), mechanical ventilation, and mortality.
SDOH variables were based on answers to the THRIVE survey administered at BMC, which identifies social needs in 8 domains: housing, food, medication, transportation, utilities, childcare, employment, and education. 19 We also used self-reported race and ethnicity in the electronic health records and hospital occupancy, which was measured by the daily bed usage percentage for surgeries, COVID, and non-COVID patients. The Supplementary Material includes additional details. The study was approved by the BMC Institutional Review Board. We developed predictive models for the following outcomes: (1) hospitalization, (2) ICU care, (3) mechanical ventilation, and (4) mortality. For each patient, we built a profile containing all outcome labels and extracted features. Instead of using computer vision techniques to extract information from radiology images, 20 we used natural language processing (NLP) to extract radiology findings from text (see Supplementary Material). We applied "one-hot" encoding to represent categorical features as 0 and 1. We retained variables for which we had values from at least 350 patients, and we imputed the missing values of a continuous-valued feature using the mean of its nonmissing values. All features were standardized to zero mean and unit standard deviation. For the hospitalization model, we used the admission date as the reference date for admitted patients and the earliest positive SARS-CoV-2 test as the reference date for non-admitted patients. Features were extracted according to the reference dates. We utilized all features except laboratory results, medications, and radiological findings, which were typically not available for non-hospitalized patients. The features we used for predicting hospitalization include pre-existing conditions and SDOH information from the patient's hospital record, symptoms, and observed vital signs.
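The preprocessing steps just described (one-hot encoding of categorical features, a minimum-support threshold, mean imputation, and standardization) can be sketched as follows. The 350-patient threshold mirrors the text; the function and column names are illustrative, not the authors' code:

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, min_support: int = 350) -> pd.DataFrame:
    """One-hot encode categoricals, drop sparse columns, mean-impute, standardize."""
    # "One-hot" encode categorical features as 0/1 indicator columns.
    cat_cols = df.select_dtypes(include=["object", "category"]).columns
    df = pd.get_dummies(df, columns=list(cat_cols), dtype=float)

    # Retain variables observed for at least `min_support` patients.
    df = df.loc[:, df.notna().sum() >= min_support]

    # Impute missing continuous values with the mean of the non-missing values.
    df = df.fillna(df.mean())

    # Standardize every feature to zero mean and unit standard deviation.
    std = df.std(ddof=0).replace(0, 1.0)  # guard against constant columns
    return (df - df.mean()) / std
```

On a real cohort this would run once per model, after joining demographics, SDOH answers, and vitals into a patient-level frame.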
All these features would have been readily available to physicians in either the emergency room or the outpatient clinics making these decisions. For admitted patients, the closest records before their reference date were extracted, and we only included records that were within 48 h before the reference date. For non-admitted patients, we only included records that were within 48 h before or after the reference date. For the ICU, mechanical ventilation, and mortality prediction models, we only considered admitted patients. In addition to the features utilized in the hospitalization model, laboratory results and radiological findings were used, and we excluded symptoms since severe COVID-19 patients were less likely to describe their symptoms. The earliest and latest vitals and lab results used for the ICU/intubation/mortality models depend on the timeline settings, which are introduced in the "Timeline Strategy" section. For patients with the identified outcomes (ICU care, mechanical ventilation, death), the date of the outcome was used as the reference date. For patients with absent outcomes, a random date during their hospitalization was used as their reference date. In general, all input features for predicting the various in-hospital outcomes would have been readily available to physicians in advance of the predicted outcome. We introduced a timeline strategy to capture the dynamic evolution of vital signs, labs, and radiological findings for predicting ICU care, mechanical ventilation, and mortality. Given the reference date t and a desired "drop time" s0, we first eliminated all features during the interval [t − s0, t]. Then, we defined k consecutive time windows of length s each, tracing back from t − s0. The mean (for continuous features) or maximum (for categorical features) of all feature records in the ith time window [t − s0 − i·s, t − s0 − (i − 1)·s] was computed and defined as "feature-is." We used the sequence "feature-1s", ..., "feature-ks" as a feature timeline to train the models. We did not implement timelines for the hospitalization model, because laboratory and radiology findings were not used and vital sign records were sparse. We applied linear and nonlinear classifiers to predict outcomes. Linear methods included logistic regression (LR) and support vector machines (SVM). 21 Nonlinear methods included XGBoost 22 and Random Forest (RF). 23 We introduced regularizations to prevent the influence of outliers in the data. 24, 25 Furthermore, we used LSTM-Transformer neural networks to compute a score capturing the dynamic evolution of vitals over the timeline. We applied statistical feature selection (SFS), removing variables with high p-values. We removed one from each pair of features with absolute correlation coefficient >0.8. We further implemented ℓ1-regularized LR recursive feature elimination (RFE). Features retained from RFE were used to derive a parsimonious LR model (see Supplementary Material for details). We evaluated model performance using 2 metrics: the area under the curve (AUC) of the receiver-operating characteristic (ROC) and the weighted-F1 score. The ROC plots the recall (or sensitivity) against the false positive rate, and the AUC can be interpreted as the probability that a randomly chosen sample from the positive class will score higher than a randomly chosen sample from the negative class. The F1 score is the harmonic mean of recall and precision. The weighted-F1 score is calculated by weighting the F1 score of each class by the number of samples in that class. Values for both metrics are between 0 and 1, and a higher value implies a better model. We split patients into a training (80%) and test set (20%). We trained the models on the training set and evaluated them on the test set. We repeated this procedure 5 times, each with a different random split. The average and standard deviation on the test set over the 5 random splits are reported.
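The timeline windowing described above can be sketched for a single continuous feature: window i summarizes records in [t − s0 − i·s, t − s0 − (i − 1)·s]. Column and feature names below are illustrative assumptions, not the authors' schema:

```python
import pandas as pd

def timeline_features(records: pd.DataFrame, t: pd.Timestamp,
                      drop_h: int, win_h: int, k: int) -> dict:
    """Summarize one continuous feature over k windows tracing back from t - drop_h.

    `records` needs a 'time' column and a 'value' column (e.g., one vital sign).
    Window i covers [t - drop_h - i*win_h, t - drop_h - (i-1)*win_h).
    """
    feats = {}
    for i in range(1, k + 1):
        hi = t - pd.Timedelta(hours=drop_h + (i - 1) * win_h)
        lo = t - pd.Timedelta(hours=drop_h + i * win_h)
        in_win = records[(records["time"] >= lo) & (records["time"] < hi)]
        # Mean for continuous features (the paper uses the max for categorical ones).
        feats[f"value-{i}s"] = in_win["value"].mean()  # NaN if the window is empty
    return feats
```

Concatenating such per-feature dictionaries across all vitals yields the "feature-1s", ..., "feature-ks" timeline used to train the models.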
For each split, we further applied 5-fold cross-validation on the training set to find the best hyperparameters of each model; therefore, the test set is completely independent and kept separate from the training process. We performed external validation to assess the generalizability of our hospitalization models. We trained hospitalization models using all BMC samples and evaluated their performance on data from Mass General Brigham used in our earlier work. 6 We did not attempt external validation for the other models because they rely on clinical variables, and it was not possible to match those across the 2 data sets. We compared our models with the NEWS2 26 score for predicting deterioration and the sepsis score qSOFA. 27 These are computed from vital signs, so we compared them with our LSTM-Transformer vital score. In addition, we trained the "BMC protocol," which is a classifier using a group of labs and vital signs chosen for evaluating COVID-19 severity by BMC physicians (see Supplementary Material).

Prediction models

The hospitalization model used the entire data set, labeling patients as hospitalized (class 1) or non-hospitalized (class 0). A total of 126 variables for each patient were retained after preprocessing. The average of the obtained metrics over 5 random splits is reported in Table 1. We compared the performance of linear (ie, best performing SVM and LR) and nonlinear (ie, XGBoost and RF) methods using all 126 variables. After SFS, 70 variables were retained, and RFE retained 20 variables. The latter "parsimonious" model was enhanced by adding 2 hospital utilization variables, while controlling for additional relevant variables. Specifically, for each patient we added "Total Non-COVID Percentage" and "Total COVID Percentage," indicating the ratio of the number of patients treated for non-COVID diseases and COVID, respectively, over the total number of BMC beds, computed at the patient's reference time.
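The probabilistic reading of AUC stated earlier can be checked directly: the AUC equals the fraction of (positive, negative) pairs in which the positive sample scores higher, with ties counted as half. A small pure-Python illustration (function name is ours):

```python
def auc_pairwise(scores, labels):
    """AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count pairwise "wins" of positives over negatives; ties count as 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, scores [0.9, 0.8, 0.3, 0.2] with labels [1, 0, 1, 0] give 3 winning pairs out of 4, i.e., AUC = 0.75; perfectly separated scores give AUC = 1.0.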
This resulted in parsimonious models with 22 variables. The parsimonious models performed almost as well as the models with all 126 features. Table 1 also reports the composition of an ℓ2-regularized LR model. Larger values of the variables with a positive (respectively, negative) coefficient increase (respectively, decrease) the likelihood of hospitalization. For instance, the likelihood of hospitalization decreases with increased hospital occupancy. Two SDOH variables (food insecurity and need for transportation) were observed to increase hospitalization likelihood. We trained the model with the 20 variables retained after RFE on all BMC patients and evaluated its performance on patients from 5 hospitals in the Mass General Brigham system used in earlier work 6 (Table 1). We retained 249 Black and 251 White patients for testing and trained a model (with the 22 features of the parsimonious model) on the rest of the patients. Table 1 presents the performance of this model on the 2 cohorts. We used a treatment equality 28 definition to evaluate the fairness of the hospitalization decision, which requires the ratio of the false positive rate (FPR) over the false negative rate (FNR) to be equal between the 2 cohorts. This ratio is 73.6% higher for Black patients than for White patients. Note that we are controlling for the most important variables associated with a hospitalization; hence, this bias is due to unmeasured factors not used by the model, or possibly to missing values of variables the model uses that may more severely affect one of the cohorts. To resolve this racial bias, we modified the prediction threshold of the LR model (to which the predicted likelihood is compared). The default value for this threshold is 0.5. We selected 2 different thresholds, one for Black patients and one for White patients, seeking to equalize the FPR/FNR ratio while keeping the FNR relatively low (around 0.25). Table 1 reports these thresholds and the resulting metrics.
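The per-group threshold adjustment described above can be sketched as follows: compute FPR and FNR at a candidate threshold and, for each group separately, pick the threshold that keeps FNR near the target (around 0.25 in the text). The grid search and target value here are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np

def fpr_fnr(prob, y, thr):
    """False positive and false negative rates at a decision threshold."""
    pred = prob >= thr
    fp = np.sum(pred & (y == 0)); tn = np.sum(~pred & (y == 0))
    fn = np.sum(~pred & (y == 1)); tp = np.sum(pred & (y == 1))
    return fp / (fp + tn), fn / (fn + tp)

def pick_threshold(prob, y, target_fnr=0.25):
    """Largest threshold whose FNR stays at or below the target (lowest FPR)."""
    best = 0.5
    for thr in np.linspace(0.05, 0.95, 91):
        if fpr_fnr(prob, y, thr)[1] <= target_fnr:
            best = thr  # keep raising the bar while FNR is still acceptable
    return best
```

Applying `pick_threshold` to each cohort's predicted probabilities yields group-specific thresholds that can be compared via the resulting FPR/FNR ratios.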
The ICU prediction results are in Table 2. We first trained an immediate model (0-drop), using the features from the past 36 h to predict the need for immediate ICU care. For vitals we used a k = 6, s = 6 h timeline, while for laboratory and radiology findings we only used one s = 36 h window, since most laboratory data and imaging were taken at most once a day. After combining the vitals into the LSTM-Transformer score, we trained parsimonious models, as well as 12-h and 24-h drop models; with these drop times, the model does not use any information for the patient in the 12/24 h before ICU admission, respectively. The parsimonious models maintained a high AUC of 86.5% and 81.1%, respectively, which match and exceed the corresponding best nonlinear full models with AUC of 86.6% and 79.9%, respectively. While for these early predictions the NEWS2- and qSOFA-based models performed poorly, the LSTM-Transformer score remained a strong predictor. For immediate predictions all models did relatively well, whereas for longer-term predictions the LSTM-Transformer score, and other models including it, show a significant advantage. The mechanical ventilation prediction results are in Table 3. As with the ICU models, we trained an immediate model (0-drop), using the past 36 h of features to predict if a patient needs to be intubated immediately. For vitals we used a k = 6, s = 6 h timeline, while for laboratory and radiology findings we only used one s = 36 h window. After combining vitals into the LSTM-Transformer score, we selected 10 features using RFE and trained a parsimonious model. The top features are reported in Table 3. The parsimonious model obtained an average AUC of 91.2%, close to the AUC of the best full model (93.8%). Using only the NEWS2 or qSOFA scores yields an AUC of 66.0% and 63.1%, respectively, lower than the AUC of 90.0% obtained by using just the LSTM-Transformer vital score.
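For context, the qSOFA comparator used above is a simple three-item score from the Sepsis-3 definitions: one point each for respiratory rate ≥ 22/min, systolic blood pressure ≤ 100 mm Hg, and altered mentation (Glasgow Coma Scale < 15). A sketch with variable names of our choosing:

```python
def qsofa(resp_rate: float, sys_bp: float, gcs: int) -> int:
    """quick SOFA score (0-3); 2 or more flags higher risk of poor outcome."""
    score = 0
    score += resp_rate >= 22   # tachypnea
    score += sys_bp <= 100     # hypotension
    score += gcs < 15          # altered mentation
    return int(score)
```

Its simplicity explains why, as a static snapshot of three vitals, it trails a score built on the full dynamic vitals timeline.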
We further trained 2 extreme models to predict if a patient would need intubation after 12 h (12-h drop model) and 24 h (24-h drop model); the corresponding parsimonious models have AUC of 90.3% and 84.9%, respectively. NEWS2- and qSOFA-based models do considerably worse in these advance predictions. Due to the relatively longer mean time gap between hospitalization and death, we built different timelines for the mortality models. The first mortality model only uses features within 3 d after admission (adm-based model), and k = 3, s = 24 h are applied in this timeline. Consequently, we can predict a patient's mortality at a very early stage of hospitalization. Another model uses a drop time of 24 h prior to death (24-h drop model), using k = 7 and s = 48 h for the timeline. For both settings, the LSTM-Transformer vital score is used in the parsimonious models. Performance and top features are reported in Table 4. For the adm-based models, the best full model achieved 91.4% AUC, while the parsimonious model using LR with only 13 features did better (AUC of 92.0%). The AUC of the qSOFA and NEWS2 models did not exceed 69%, and the LSTM-Transformer score yielded a model with an AUC of 84.3%. For the 24-h drop model, the best nonlinear model achieved 96.2% AUC, and the parsimonious model using 13 features achieved an AUC of 94.0%. When the outcome draws near, the advantage of the LSTM-Transformer score over the NEWS2 score remains significant. The best AUCs achieved by the 4 models are between 93% and 96%, indicating strong predictive power. Strong predictions are achieved with relatively few features used by the parsimonious models. These models use no more than 22 features each for hospitalization and mortality prediction, and no more than 10 features each for ICU and ventilation prediction, yielding similar (or better) performance with an AUC differential of −2.6% to +1.2% compared to the best models.
This indicates the possibility of implementing simple, actionable predictive models to aid triage, staffing, and resource planning. The models produced outperformed related models in the literature (eg, the ventilation model outperforms an earlier model with a 74% AUC 29 ). Patients' vital signs were the most important factors for ICU, ventilation, and mortality prediction. These vital signs imply the severity of the disease and the potential need for cardiorespiratory resuscitation. Most of the prior studies use vital signs as "static" independent predictive variables. [6] [7] [8] [9] In this study, we used an LSTM + Transformer encoder deep neural network to develop a single score combining all vitals and capturing their dynamic evolution over time. Models for ICU and ventilation (short-term and longer-term) predictions using just the LSTM-Transformer vital score have an AUC within 1.2-4.9% of the corresponding parsimonious models, which also use other clinical variables; essentially, for these models, vital sign trends alone suffice! For mortality predictions, the LSTM-Transformer score is the top variable, but other clinical variables significantly enhance performance. Long-term predictions are more challenging than short-term: ICU, ventilation, and mortality predictions deteriorate as we move further from the time of the outcome. While most models do relatively well for short-term predictions, the parsimonious models which include the LSTM-Transformer score increase their advantage over baseline models (eg, NEWS2 and qSOFA) when longer-term predictions are sought. Specifically, the AUC differential between the parsimonious model and the best of the NEWS2- and qSOFA-based models increases from 8.4-25.2% for short-term predictions to 17.3-32.2% for longer-term predictions. Incidentally, the protocol used at BMC fares better and is closer to the parsimonious model for both short- and long-term predictions.

Note (Table 1): The values inside the parentheses denote the standard deviation of the corresponding metric. SVM-L1 and LR-L1 refer to the ℓ1-norm regularized SVM and LR models. We report the composition of an ℓ2-norm regularized LR model, including the coefficient of each variable (Coef), the correlation of the variable with the outcome (Y-corr), the mean of the variable (Y1-mean) in the hospitalized, and the mean of the variable (Y0-mean) in the non-hospitalized. For each variable, we also report the corresponding p-value, the odds ratio (OR), and its 95% confidence interval (CI). SpO2: oxygen saturation; BP: blood pressure; BMI: body mass index; PMH: past medical history; CKD: chronic kidney disease; COPD: chronic obstructive pulmonary disease; CHF: congestive heart failure; SDOH: social determinants of health; Total non-COVID percentage: (total number of non-COVID patients at the hospital/total number of beds)×100; Total COVID percentage: (total number of COVID patients at the hospital/total number of beds)×100.

Some of the variables included in the ICU prediction model have previously been identified in the literature. Patient age and past medical history such as renal disease (CKD) and cardiac disease (CAD) have extensively been described as factors influencing disease severity. 30 Laboratory data such as CRP and ferritin 31 are acute-phase reactants and have also previously been associated with COVID-19 disease severity. The large and diverse population used in our work strongly supported these findings, and our interpretable LR model coefficients further numerically show their relative importance. The mortality model also includes laboratory data that have previously been identified in the literature as being associated with disease severity, such as CRP, ferritin, and LDH. 6, 32 Since the mortality prediction models use multiple time windows for labs as well, the most informative period of a certain lab is further revealed.
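The LSTM + Transformer encoder vital score can be sketched generically as below. This is an architecture sketch under our own assumptions (layer sizes, mean pooling over the timeline, and a sigmoid output are illustrative choices), not the authors' exact network:

```python
import torch
import torch.nn as nn

class VitalScore(nn.Module):
    """Toy LSTM -> Transformer-encoder network mapping a vitals timeline to a scalar score."""
    def __init__(self, n_vitals: int, hidden: int = 32, heads: int = 4):
        super().__init__()
        self.lstm = nn.LSTM(n_vitals, hidden, batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):           # x: (batch, k windows, n_vitals)
        h, _ = self.lstm(x)         # contextualize the windowed vitals in sequence
        h = self.encoder(h)         # self-attention across the timeline
        return torch.sigmoid(self.head(h.mean(dim=1))).squeeze(-1)  # score in (0, 1)
```

Trained against the outcome label, the scalar output can then be fed to the parsimonious LR models as a single "dynamic vitals" feature.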
Analyzing data from a safety-net hospital with a high proportion of Black patients and information on SDOH needs gave us an opportunity to assess the effect of racial bias and socioeconomic variables. We elected to consider potential racial bias only between Black and White patients, choosing not to also examine bias involving Hispanic or Latino patients, another group with a sufficient number of patients for such an analysis. There were several reasons for this choice: (1) there is considerable ambiguity in how people self-identify as Hispanic or Latino 33 ; (2) in our data set about 44% of the patients have a missing race variable, and the majority of those also identified as Hispanic/Latino (about 80%); and (3) there are disparities even between Black and White Hispanic/Latino individuals. 34

Note (Table 2): For each full model, we only report results from the algorithm with the highest AUC out of LR, SVM, XGBoost, and RF. We present the LR coefficients of each variable (Coef), the correlation of the variable with the outcome (Y-corr), the p-value, the mean of the variable (Y1-mean) in the ICU patients, and the mean of the variable (Y0-mean) in the non-ICU patients. LDH: lactate dehydrogenase; BUN: blood urea nitrogen; NRBC: nucleated red blood cell; CKD: chronic kidney disease; PMH: past medical history; CAD: coronary artery disease; DVT: deep vein thrombosis; HLD: hypersensitivity lung disease; Total COVID percentage: (total number of COVID patients at the hospital/total number of beds)×100.

Food insecurity and need for transportation became the top predictive features in the hospitalization model, possibly because they serve as a marker for severe economic hardship. Food is the most basic need and is related to patients' lifestyles and state of health. The COVID pandemic further expanded food insecurity worldwide, making it harder for vulnerable households to address their needs.
Similarly, patients with transportation needs rely more on the most affordable public transit, which increases their risk of exposure to the SARS-CoV-2 virus, while people with private cars and those who work from home can avoid exposure. Further, delayed access to care can lead to a worse clinical condition upon arrival in acute care settings. The other SDOH variables, such as housing insecurity, were not as predictive as "Food" and "Transportation," possibly because the homeless rate in Boston has dropped sharply in recent years; specifically, 97-98% of the homeless population has been sheltered according to the latest homeless census. 35 The percentage of Black patients in the data set is 35.1%, yet their percentage among the admitted, ICU, mechanically ventilated, and deceased patients ranges from 43.1% to 45.5%. Predictive models exploit biases in the underlying data. 36 The hospitalization model exhibits bias, being more likely to falsely predict that a Black patient will be hospitalized. This reinforces the consideration of race as a social construct; persons who identify as Black are adversely affected by structural racism and associated with a host of circumstances, conditions, and comorbidities that increase hospitalization risk. 37 As discussed earlier, it is possible to correct for this bias by employing different decision thresholds for Black and White patients.

Note (Table 3): For each full model, we only report results from the algorithm with the highest AUC out of LR, SVM, XGBoost, and RF. We present the LR coefficients of each variable (Coef), the correlation of the variable with the outcome (Y-corr), the p-value, the mean of the variable (Y1-mean) in the intubated patients, and the mean of the variable (Y0-mean) in the nonintubated patients. CRP: C-reactive protein; Total elective surgery percentage: (total number of elective surgeries/total number of beds)×100.
Hospital census measures, such as the percentage of COVID-19 and non-COVID-19 patients and of elective surgeries performed, affect the prediction results. This implies that an oversaturated hospital does affect resource allocation for new patients and further exacerbates the risk of future decompensation without adequate medical support. Our hospitalization prediction model can be useful in any outpatient or emergency care setting, given that the variables used are readily available to clinicians. This includes information on SDOH, which is regularly collected at BMC. We note that such SDOH information-gathering practices are becoming more widespread. Patients from underrepresented groups with potential SDOH needs are in fact more likely to present to an emergency care setting with ambulatory-sensitive conditions compared to others. 38 An FNR on the order of 0.25 achieved by the modified models corresponds to a reasonable compromise between false positive and false negative decisions. Not hospitalizing, or not transferring soon enough, can lead to increased overall resource utilization as patients will present with more severe disease. This has implications for the spread of the disease while they remain outside the hospital, the length of the hospitalization and recovery if they end up being hospitalized, and associated economic consequences such as loss of wages.

Note (Table 4): For each full model, we only report results from the algorithm with the highest AUC out of LR, SVM, XGBoost, and RF. We present the LR coefficients of each variable (Coef), the correlation of the variable with the outcome (Y-corr), the p-value, the mean of the variable (Y1-mean) in the deceased, and the mean of the variable (Y0-mean) in the nondeceased. PMH: past medical history; CHF: congestive heart failure; CAD: coronary artery disease; CRP: C-reactive protein; LDH: lactate dehydrogenase.
On the other hand, hospitalizing patients who may not need it leads to increased resource utilization and can result in hospitals being full and unable to treat other patients who require care. Due to the novelty of COVID-19, including emerging variants, the related costs are not well characterized and vary greatly across regions/hospitals, depending also on local epidemiological conditions; therefore, hospitals may set this threshold based on their specific local situations. A potential limitation of the study is that even though the hospitalization model has been externally validated, it has not been possible to do the same with the remaining models, particularly using data from other safety-net hospitals. In addition, although the past medical history data we used have no time limitations, underlying comorbidities may not be recorded in the EHR, which can potentially influence the performance of our models or introduce bias into them. Our COVID-19 prediction models, which are based on a large, diverse patient population, can accurately predict outcomes, potentially aiding in triage, resource allocation, and staffing determinations. Additionally, the use of dynamic variables such as vital signs improves the predictive ability of models and should be considered in future model development. This study highlights the importance of ensuring diverse patient populations are represented in advanced analytics development and suggests how to carefully consider and interpret race within predictive models. BH, YH, SS, and ZZ developed the models, obtained results, and co-wrote the manuscript. WGA, SAA, HS, and RGM provided access to data, medical intuition, contributed to writing the manuscript, and reviewed the manuscript. ICP designed/led the study, contributed to model development, and co-wrote the manuscript. Supplementary material is available at Journal of the American Medical Informatics Association online.
Tracking coronavirus vaccinations around the world. The New York Times.
Disparities in COVID-19 vaccination coverage between urban and rural counties - United States.
Challenges and issues about organizing a hospital to respond to the COVID-19 outbreak: experience from a French reference centre.
Personalized predictions of patient outcomes during and after hospitalization using artificial intelligence.
China Medical Treatment Expert Group for COVID-19. Development and validation of a clinical risk score to predict the occurrence of critical illness in hospitalized patients with COVID-19.
Early prediction of level-of-care requirements in patients with COVID-19.
An interpretable mortality prediction model for COVID-19 patients.
Prediction for progression risk in patients with COVID-19 pneumonia: the CALL Score.
Clinical and laboratory predictors of in-hospital mortality in patients with coronavirus disease-2019: a cohort study in Wuhan, China.
A tool for early prediction of severe coronavirus disease 2019 (COVID-19): a multicenter study using the risk nomogram in Wuhan and Guangdong, China.
Predictors for severe COVID-19 infection.
Variation in racial/ethnic disparities in COVID-19 mortality by age in the United States: a cross-sectional study.
Physiological and socioeconomic characteristics predict COVID-19 mortality and resource utilization in Brazil.
Racial health disparities and Covid-19 - caution and context.
Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal.
Long short-term memory.
Attention is all you need.
Implementing an EHR-based screening and referral system to address social determinants of health in primary care.
COVID-19 automatic diagnosis with radiographic imaging: explainable attention transfer deep neural networks.
Support-vector networks.
XGBoost: a scalable tree boosting system.
Random forests.
Detection of unwarranted CT radiation exposure from patient and imaging protocol meta-data using regularized regression.
The national early warning score 2 (NEWS2).
Assessment of clinical criteria for sepsis: for the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3).
Fairness in machine learning: a survey.
Development and validation of prediction models for mechanical ventilation, renal replacement therapy, and readmission in COVID-19 patients.
Mortality and clinical outcomes among patients with COVID-19 and diabetes.
C-reactive protein, procalcitonin, D-dimer, and ferritin in severe coronavirus disease-2019: a meta-analysis.
Lactate dehydrogenase levels predict coronavirus disease 2019 (COVID-19) severity and mortality: a pooled analysis.
Issues in the assessment of "race" among Latinos: implications for research and policy.
Are black Hispanics black or Hispanic? Exploring disparities at the intersection of race and ethnicity.
Dissecting racial bias in an algorithm used to manage the health of populations.
Racism, not race, drives inequity across the COVID-19 continuum.
Factors influencing emergency department preference for access to healthcare.

None declared. A data use agreement with the Boston Medical Center does not allow us to make the original data available. Code for the various algorithms that produced the results is available at https://github.com/noc-lab/BMC_COVID.