key: cord-0715279-qnrfis4f authors: Chen, Zhe; Russo, Nicholas W.; Miller, Matthew M.; Murphy, Robert X.; Burmeister, David B. title: An observational study to develop a scoring system and model to detect risk of hospital admission due to COVID‐19 date: 2021-03-31 journal: J Am Coll Emerg Physicians Open DOI: 10.1002/emp2.12406 sha: 09fc260e414536497c956471ff42f7a1b50fb70f doc_id: 715279 cord_uid: qnrfis4f

BACKGROUND: COVID‐19 has caused an unprecedented global health emergency. The strains of such a pandemic can overwhelm hospital capacity. Efficient clinical decision‐making is crucial for proper healthcare resource utilization in this crisis. Using observational study data, we set out to create a predictive model that could anticipate which COVID‐19 patients would likely be admitted and to develop a scoring tool that could be used in the clinical setting and for population risk stratification. METHODS: We retrospectively evaluated data from COVID‐19 patients across a network of 6 hospitals in northeastern Pennsylvania. Analysis was limited to age, gender, and historical variables. After creating a variable importance plot, we selected the best predictors to train a logistic regression model. Variable selection was done using a lasso regularization technique. Using the coefficients in our logistic regression model, we then created a scoring tool and validated the score on a held-out test set. RESULTS: A total of 6485 COVID‐19 patients were included in our analysis, of whom 707 were hospitalized. The strongest predictors of hospitalization included age, a history of hypertension, diabetes, chronic heart disease, gender, tobacco use, and chronic kidney disease. The logistic regression model demonstrated an AUC of 0.81. The coefficients of our logistic regression model were used to develop a scoring tool. Low‐, intermediate‐, and high‐risk patients were estimated to have a 3.5%, 26%, and 38% chance of hospitalization, respectively.
The best predictors of hospitalization included age (odds ratio [OR] = 1.03, confidence interval [CI] = 1.02–1.03), diabetes (OR = 2.08, CI = 1.69–2.57), hypertension (OR = 2.36, CI = 1.90–2.94), chronic heart disease (OR = 1.53, CI = 1.22–1.91), and male gender (OR = 1.32, CI = 1.11–1.58). CONCLUSIONS: Using retrospective observational data from a 6‐hospital network, we determined risk factors for admission and developed a predictive model and scoring tool, for use in the clinical and population settings, that could anticipate admission for COVID‐19 patients.

health emergency with more than 23.7 million confirmed cases globally as of August 25, 2020. 2 In the United States alone, there have been 5.7 million confirmed cases, with over 177,000 deaths, numbers that increase by the day. 2 The virus displays person-to-person transmission, 3, 4 and to cope with the tremendous burden this places on the healthcare system, governments worldwide have instituted quarantine measures to slow the spread. Although typical intensive care unit (ICU) occupancy is 60%-80%, 5 the strains of a pandemic such as COVID-19 can overwhelm hospital capacity. 6 Furthermore, COVID-19 presents with widespread symptomatology, varying degrees of illness, and inconsistent radiologic findings. Although the majority of patients present with fever, cough, shortness of breath, and respiratory distress, 7, 8 gastrointestinal symptoms such as nausea and vomiting have also been reported, 9 and some patients are asymptomatic. 10 Predictive analytics that use algorithms to identify patterns in large amounts of data 18 25 and act as a COVID-19 mortality predictor using clinical features. 26 Although data have been used to inform decision making and to predict prognosis and hospitalization, 27, 28 these models rely on clinical data that must be obtained during a hospital visit (ie, laboratory values, vital signs, and chest X-ray findings).
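In a logistic regression, each reported odds ratio is an exponentiated coefficient, OR = exp(β), and a Wald 95% CI is exp(β ± 1.96·SE). Assuming the intervals quoted in the abstract are Wald intervals (the paper does not say), the minimal sketch below recovers β and an approximate SE from the reported binary-predictor ORs, and shows how a per-unit OR for a continuous variable compounds multiplicatively. It also assumes the age OR of 1.03 is per year of age, which the abstract does not state; age is left out of the SE back-calculation because its rounded interval (1.02–1.03) is too coarse.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile


def coef_from_or(odds_ratio, lower, upper):
    """Back out the log-odds coefficient and an approximate SE from an OR and its 95% CI.

    Assumes a Wald interval that is symmetric on the log scale: exp(beta +/- Z95 * se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return beta, se


def or_for_increment(or_per_unit, units):
    # Log-odds add linearly, so an OR per unit compounds multiplicatively: OR_k = OR_1 ** k.
    return or_per_unit ** units


# Binary predictors as reported in the abstract: (OR, lower 95% CI, upper 95% CI).
reported = {
    "diabetes": (2.08, 1.69, 2.57),
    "hypertension": (2.36, 1.90, 2.94),
    "chronic heart disease": (1.53, 1.22, 1.91),
    "male gender": (1.32, 1.11, 1.58),
}

for name, (odds_ratio, lower, upper) in reported.items():
    beta, se = coef_from_or(odds_ratio, lower, upper)
    print(f"{name}: beta = {beta:.3f}, approx. SE = {se:.3f}")

# Assuming the age OR of 1.03 is per year of age (an assumption, see above):
print(f"age, per 10 years: OR ~ {or_for_increment(1.03, 10):.2f}")  # prints 1.34
```

Because the published OR of 1.03 is rounded, the decade-level figure of about 1.34 is only approximate.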
An accurate model that predicts, using only age and historical variables, whether a patient who tested positive for COVID-19 will be admitted to the hospital has not yet been introduced. Such a model can be used at both the clinical and population levels. We created a logistic regression model using retrospective observational study data to predict which patients who test positive for COVID-19 are likely to be admitted to the hospital. We also recognize that, on a practical level, incorporating a predictive model into an electronic health record for decision support can be challenging; thus, we also used variables from our logistic regression model to develop a practical scoring tool. This study is a retrospective observational study that includes description and analysis of COVID-19 patients across our 6-hospital network in northeastern Pennsylvania whose data were collected in a COVID-19 registry database. The registry was developed as an analytical tool as part of the organization's COVID quality improvement initiatives. It was built in the Epic electronic health record system (Epic Systems, Verona, WI) and was developed and is maintained by the network's enterprise analytics team in the information support (IS) department. Deidentified patient data were extracted from the database, and variables within the database were used in our analysis. A COVID-19 patient was defined as a patient who had a positive SARS-CoV-2 PCR test. Patients younger than 18 years and older than 90 years were excluded from the analysis, as required by our network institutional review board to maintain a quality improvement initiative and preserve patient confidentiality. We limited our analysis to age, gender, and historical variables because our goal was to develop an ambulatory predictive tool to identify which patients are likely to be hospitalized.
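The cohort definition above, together with the 7-day hospitalization rule stated in the methods, can be expressed as two small predicates. This is a minimal sketch under stated assumptions: the record fields are hypothetical (the registry schema is not described), age is assumed to be stored in whole years, and the 7-day window is assumed to be inclusive and to apply whether the admission order came before or after the positive test.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

OUTCOME_WINDOW = timedelta(days=7)


@dataclass
class RegistryRecord:
    # Hypothetical fields: the registry schema is not described in the paper.
    patient_id: str
    age: int                                # whole years (assumed)
    positive_pcr_at: Optional[datetime]     # time of positive SARS-CoV-2 PCR, None if never positive
    admission_order_at: Optional[datetime]  # time the admission order was placed, None if not admitted


def in_cohort(rec: RegistryRecord) -> bool:
    # Positive PCR and age 18-90 inclusive (younger than 18 or older than 90 are excluded).
    return rec.positive_pcr_at is not None and 18 <= rec.age <= 90


def hospitalized_for_covid(rec: RegistryRecord) -> bool:
    # Positive test and admission order within 7 days of each other, in either order.
    if rec.positive_pcr_at is None or rec.admission_order_at is None:
        return False
    return abs(rec.admission_order_at - rec.positive_pcr_at) <= OUTCOME_WINDOW


def build_analytic_set(records: List[RegistryRecord]) -> List[Tuple[str, int]]:
    # One (patient_id, outcome) pair per eligible patient; outcome 1 = hospitalized for COVID-19.
    return [(r.patient_id, int(hospitalized_for_covid(r))) for r in records if in_cohort(r)]


records = [
    RegistryRecord("a", 45, datetime(2020, 4, 1), datetime(2020, 4, 5)),   # admitted 4 days after test
    RegistryRecord("b", 17, datetime(2020, 4, 1), None),                   # excluded: under 18
    RegistryRecord("c", 70, datetime(2020, 4, 1), datetime(2020, 4, 20)),  # admission outside the window
    RegistryRecord("d", 62, datetime(2020, 4, 3), datetime(2020, 4, 1)),   # admitted 2 days before test
]
print(build_analytic_set(records))  # [('a', 1), ('c', 0), ('d', 1)]
```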
The demographic and historical variables were defined using Epic electronic health record groupers, which are generally used in our electronic health record to capture patient clinical history. The groupers aggregate ICD-10 codes on problem lists that fall into a particular category. The initial independent variables extracted were age, hypertension, diabetes, chronic heart disease, gender, smoking history, chronic kidney disease, whether the patient was taking an ACE inhibitor, history of cancer, chronic obstructive pulmonary disease, asthma, chronic liver disease, chronic renal failure, corticosteroid use, whether the patient was taking an immunosuppressant, history of chronic bronchitis, and HIV status. We first split the data into a training set consisting of 80% of the data and a test set consisting of the remaining 20%. Exploratory data analyses and model development were conducted on the training set. Hospitalization for COVID-19 was defined as a positive test (a positive SARS-CoV-2 PCR test) and a hospitalization (defined by the time of admission order placement) occurring within 7 days of each other. We used the best predictors to train a logistic regression model to predict which patients are likely to be hospitalized, and we reported odds ratios (ORs) and confidence intervals (CIs) for the variables in the model. The performance of the model was validated on the 20% test set, which the model did not see during development. We used the coefficient values from our logistic regression model, with age binned, to develop a scoring tool that allows manual calculation of the risk of hospitalization, and we then validated the score on the test set. All statistical analyses were conducted in R, and the stats and randomForest packages were used for model development. This study proposal was reviewed by our institutional review board and deemed to be non-human-subjects research.
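The methods describe turning the model's coefficients (with age binned) into a manual score and validating that score on the test set. The paper does not give its point assignments, so the sketch below uses one common convention, which is an assumption here: divide each coefficient by the smallest positive coefficient and round to whole points. Every coefficient in it is hypothetical, not one of the paper's estimates. It also computes the AUC of the integer score, counting ties as one half, because summed integer points produce many tied scores.

```python
# Hypothetical log-odds coefficients for illustration only; NOT the paper's fitted values.
AGE_BINS = [  # (min age, max age, coefficient relative to the 18-39 reference bin)
    (18, 39, 0.00),
    (40, 59, 0.65),
    (60, 74, 1.40),
    (75, 90, 2.00),
]
FACTORS = {"hypertension": 0.85, "diabetes": 0.73, "chronic_heart_disease": 0.42, "male": 0.30}


def points_table(age_bins, factors):
    """Scale each coefficient by the smallest positive one and round to integer points (assumed convention)."""
    coefs = [c for _, _, c in age_bins] + list(factors.values())
    unit = min(c for c in coefs if c > 0)
    age_pts = [(lo, hi, round(c / unit)) for lo, hi, c in age_bins]
    factor_pts = {name: round(c / unit) for name, c in factors.items()}
    return age_pts, factor_pts


def score(age, present, age_pts, factor_pts):
    # Total points = age-bin points + points for each risk factor the patient has.
    age_points = next((p for lo, hi, p in age_pts if lo <= age <= hi), None)
    if age_points is None:
        raise ValueError("age outside the 18-90 range used in the analysis")
    return age_points + sum(factor_pts[f] for f in present)


def roc_auc(labels, scores):
    """AUC = P(random positive scores above random negative); ties count 1/2 (Mann-Whitney)."""
    pos = [s for s, yv in zip(scores, labels) if yv == 1]
    neg = [s for s, yv in zip(scores, labels) if yv == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


age_pts, factor_pts = points_table(AGE_BINS, FACTORS)
print(age_pts)     # [(18, 39, 0), (40, 59, 2), (60, 74, 5), (75, 90, 7)]
print(factor_pts)  # {'hypertension': 3, 'diabetes': 2, 'chronic_heart_disease': 1, 'male': 1}
print(score(67, ["male", "hypertension", "diabetes"], age_pts, factor_pts))  # 11

# Toy held-out labels and point scores (made up), showing how tied integer scores are handled.
print(roc_auc([1, 1, 0, 0], [5, 3, 3, 1]))  # 0.875
```

Rounding to whole points trades a little discrimination for ease of manual calculation, which is why the score is checked on held-out data separately from the model itself. The pairwise AUC loop is quadratic in the number of cases, which is acceptable for a test set of this size.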
This retrospective study used data from 6,485 patients with COVID-19 at 6 hospitals in Pennsylvania to develop a logistic regression model and scoring tool that predict hospitalization from easily obtained input variables. This model has the potential to support resource planning for patients with COVID-19.

A total of 6485 patients were included in our analysis. Of these, 707 patients were defined as being hospitalized for COVID-19. Hospitalized patients were clearly older than non-hospitalized patients, with mean ages of 64 and 48 years, respectively. Table 1 shows the differences in variables between hospitalized and non-hospitalized patients. The best predictors of hospitalization were age, a history of hypertension, diabetes, chronic heart disease, gender, tobacco use, and chronic kidney disease (see Figure 1, variable importance plot). The receiver operating characteristic (ROC) curve for the model tested on the 20% test set is presented in 27 We then used differences in historical variable data to inform our variable importance plot and used the top features to train a logistic regression model. Our model was internally validated on a held-out test set. We then created an easy-to-use risk stratification score based on our statistical analysis. There are other models that predict the risk of hospitalization and prognosis. 28 Given the high prevalence of these chronic diseases 37 and the potential role they play in disease progression, it is important that our model and future models incorporate these diagnoses. We are currently using this model to inform referrals to our network's remote home monitoring program to allow early remote intervention. Unlike other models, which require clinical data collected at a visit, our model can be calculated without an office visit. Therefore, referrals can be made as soon as patients' SARS-CoV-2 test results are available.
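The discussion notes that the top features were used to train a logistic regression model, and the abstract states that variable selection used lasso regularization. The paper does not name the R package or the penalty strength, so the following is a minimal pure-Python sketch of L1-penalized logistic regression fit by proximal gradient descent (ISTA), not the authors' implementation. The synthetic data, the penalty `lam`, the step size, and the iteration count are all illustrative assumptions; the point is that the soft-thresholding step can set an uninformative coefficient to exactly zero, which is what makes the lasso a variable-selection method.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(w, t):
    # Proximal operator of t*|w|: shrinks w toward zero and sets it to exactly 0 when |w| <= t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_lasso_logistic(X, y, lam, step=0.5, iters=1000):
    """L1-penalized logistic regression fit by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum(|beta_j|). The intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # Gradient of the mean log-loss with respect to the intercept and each coefficient.
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, xi))) - yi
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        # Plain gradient step for the intercept; gradient step plus soft-thresholding for the rest.
        b0 -= step * g0
        beta = [soft_threshold(beta[j] - step * g[j], step * lam) for j in range(p)]
    return b0, beta


# Synthetic binary "history" variables (an assumption): x0 is informative, x1 is pure noise.
rng = random.Random(0)
X, y = [], []
for _ in range(400):
    x0, x1 = rng.randint(0, 1), rng.randint(0, 1)
    X.append([x0, x1])
    y.append(1 if rng.random() < sigmoid(-1.5 + 3.0 * x0) else 0)

b0, beta = fit_lasso_logistic(X, y, lam=0.06)
print("intercept:", round(b0, 3), "coefficients:", [round(b, 3) for b in beta])
# With a penalty this large, the noise coefficient is typically shrunk to exactly 0.
```

In practice the penalty is usually chosen by cross-validation within the training set only, so the held-out test set stays untouched; the sketch fixes `lam` for brevity.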
This could allow us to reduce ER visits and hospitalizations, and we hope to publish the results of our program. In summary, we have described the predictors of hospital admission from our observational data and created a tool that predicts hospitalization rates in COVID-19 patients using common clinical variables and comorbidities, without collecting vital signs or laboratory values. This information can be used to guide clinical decision making and to increase the efficiency and prioritization of patient care in an era when hospital resources are being pushed to their limits.

The authors would like to acknowledge Erin Shigo, BA, and Marna Greenberg for their scholarly work in formatting and editing.

None.

ZC, MM, and RM developed the study concept and design and participated in acquisition of the data. ZC performed the analysis, and all authors (ZC, NR, MM, RM, DB) participated in interpretation of the data; NR and ZC drafted their portions of the manuscript, and all authors participated in critical revision of the manuscript for important intellectual content. All authors take final responsibility for the manuscript as a whole.

1. World Health Organization. WHO Director-General's Opening Remarks at the Media Briefing on COVID-19 -20
2. An interactive web-based dashboard to track COVID-19 in real time
3. Importation and human-to-human transmission of a novel coronavirus in Vietnam
4. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
5. ICU occupancy and mechanical ventilator use in the United States
6. Locally informed simulation to predict hospital capacity needs during the COVID-19 pandemic
7. Clinical characteristics of Coronavirus disease 2019 in China
8. The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health - the latest 2019 novel coronavirus outbreak in Wuhan, China
9. First case of 2019 novel coronavirus in the United States
10. Covid-19: identifying and isolating asymptomatic people helped eliminate virus in Italian village
11. Metabolic syndrome and COVID-19: an update on the associated comorbidities and proposed therapies
12. Prevalence of underlying diseases in hospitalized patients with COVID-19: a systematic review and meta-analysis
13. Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID-19 in the New York City area
14. CDC COVID-19 Response Team. Severe outcomes among patients with Coronavirus disease 2019 (COVID-19) - United States
15. Estimates of the severity of coronavirus disease 2019: a model-based analysis
16. Prevalence of comorbidities in patients and mortality cases affected by SARS-CoV2: a systematic review and meta-analysis
17. Comorbidity and its impact on patients with COVID-19
18. An introduction to machine learning for clinicians
19. Machine learning and medical education
20. AI-Driven tools for coronavirus outbreak: need of active learning and cross-population train/test models on multitudinal/multimodal Data
21. Artificial intelligence-enabled rapid diagnosis of patients with COVID-19
22. Machine learning to assist clinical decision-making during the COVID-19 pandemic
23. Using machine learning to predict ICU transfer in hospitalized COVID-19 patients
24. Detection of COVID-19 infection from routine blood exams with machine learning: a feasibility study
25. Artificial intelligence-enabled rapid diagnosis of COVID-19 patients
26. Clinical predictors of COVID-19 mortality. The Lancet Digital Health
27. Hospitalization rates and characteristics of patients hospitalized with laboratory-confirmed Coronavirus disease 2019 - COVID-NET, 14 States
28. Development and validation of a model for individualized prediction of hospitalization risk in 4,536 patients with COVID-19
29. CoVA: an acuity score for outpatient screening that predicts Coronavirus disease 2019 prognosis
30. Can early treatment of patients with risk factors contribute to managing the COVID-19 pandemic
31. Covid-19 in critically ill patients in the Seattle Region - Case Series
32. Clinical characteristics of 113 deceased patients with coronavirus disease 2019: retrospective study
33. Baseline characteristics and outcomes of 1591 patients infected with SARS-CoV-2 admitted to ICUs of the Lombardy Region
34. Risk factors of critical & mortal COVID-19 cases: a systematic literature review and meta-analysis
35. Clinical features of COVID-19 and factors associated with severe clinical course: a systematic review and meta-analysis
36. Predictors of mortality in hospitalized COVID-19 patients: a systematic review and meta-analysis
37. An empirical study of chronic diseases in the United States: a visual analytics approach