key: cord-0886116-22l36xkv authors: Rizzo, S.; Chawla, D.; Zalocusky, K.; Keebler, D.; Chia, J.; Lindsay, L.; Yau, V.; Kamath, T.; Tsai, L. title: Descriptive epidemiology of 16,780 hospitalized COVID-19 patients in the United States date: 2020-07-29 journal: nan DOI: 10.1101/2020.07.17.20156265 sha: fededfaaa25c8169e662b80cf96bad6f17134207 doc_id: 886116 cord_uid: 22l36xkv BACKGROUND: Despite the significant morbidity and mortality caused by the 2019 novel coronavirus disease (COVID-19), our understanding of basic disease epidemiology remains limited. This study aimed to describe key patient characteristics, comorbidities, treatments, and outcomes of a large U.S.-based cohort of patients hospitalized with COVD-19 using electronic health records (EHR). METHODS: We identified patients in the Optum De-identified COVID-19 EHR database who had laboratory-confirmed COVID-19 or a presumptive diagnosis between 20 February 2020 and 6 June 2020. We included hospitalizations that occurred 7 days prior to, or within 21 days after, COVID-19 diagnosis. Among hospitalized patients we describe the following: vital statistics and laboratory results on admission, relevant comorbidities (using diagnostic, procedural, and revenue codes), medications (NDC, HCPC codes), ventilation, intensive care unit (ICU) stay, length of stay (LOS), and mortality. RESULTS: We identified 76,819 patients diagnosed with COVID-19, 16,780 of whom met inclusion criteria for COVID-related hospitalization. Over half the cohort was over age 50 (74.5%), overweight or obese (72.1%), or had hypertension (58.1%). At admission, 29.1% of patients presented with fever (>38 C) and 30.6% had low oxygen saturation (<90%). Among the 16,099 patients with complete hospital records, we observed that 58.9% had hypoxia, 23.4% had an ICU stay during hospitalization, 18.1% were ventilated, and 16.2% died. The median LOS was 6 days (IQR: 4, 11). CONCLUSIONS: To our knowledge, this is the largest descriptive study of patients hospitalized with COVID-19 in the United States. We report summary statistics of key clinical outcomes that provide insights to better understand COVID-19 disease epidemiology. The novel coronavirus disease poses an unprecedented public health challenge with significant morbidity and mortality. In March 2020, the World Health Organization (WHO) officially declared COVID-19 a pandemic. As of 17 July 2020, the number of deaths attributed to the virus numbered over 120,000 in the United States, and these numbers continue to rise every day. 1 Patients with COVID-19 present with a wide range of symptoms, including but not limited to fever, cough, shortness of breath, headache or body aches, fatigue, and/or loss of taste and smell. 2, 3 Transmission primarily occurs during person-to-person contact via respiratory droplets, but can potentially occur by touching a surface with the virus and subsequently touching your eyes, mouth, or nose via fomites. 4 Additionally, many reports suggest asymptomatic cases and transmission, which pose a unique challenge to epidemiological studies of this disease. 5 The scientific community is quickly gaining insights into the SARS-COV-2 virus and clinical spectrum of COVID-19 disease, and rigorous epidemiological studies from real-world settings are a key component to fully understanding this novel disease. Specifically, detailed epidemiologic studies of hospitalized patients¾which reflect a more severe subset of the infected population¾can help inform scientific understanding of disease progression, comorbidities, treatment patterns, and important clinical endpoints like ventilation and ICU stay. While some real-world studies of hospitalized patients in the United States have been published, 6, 7, 8, 9 few studies possess the necessary sample size and inpatient medication details to provide a robust snapshot of patients with COVID-19. Additionally, reproducing results from various study populations is necessary for synthesis and understanding of the true nature of the disease. 10 To help fill this knowledge gap, the primary objective of this study is to describe key patient characteristics, comorbidities, treatments, and outcomes in a large U.S.-based cohort of patients hospitalized with COVID-19 using electronic health records (EHR). Additionally, the study describes vital signs and clinical conditions upon admission. To the authors' knowledge, this study describes the largest cohort of patients hospitalized with COVID-19 in the United States to date. To address this research objective, individuals with COVID-19 were extracted from the Optum® De-identified COVID-19 EHR dataset from 20 February 2020 to 6 June 2020, sourced from a dataset containing patient-level medical and administrative records from hospitals, emergency departments, outpatient centers, and laboratories from across the United States. Data are deidentified in compliance with the HIPAA Expert Method and managed according to Optum® customer data use agreements. The COVID-19 EHR dataset sources clinical information from hospital networks that provide data meeting Optum's internal data quality criteria. Missing end dates from inpatient records were imputed using evidence of continuous inpatient presence using diagnostic, administered medications, and vital records, allowing for a 1-day gap. The last date of continuous presence was used as the hospitalization end date. This imputation method was validated using inpatient records with known end dates as well as by physician review. As the Optum EHR data included patient records for individuals who may still be hospitalized at the time when the database was censored, patient records were treated the following way: (1) patients who remained in-hospital at the time of data delivery were included in estimations of COVID-related outcomes and descriptions of clinical characteristics on admission, and (2) patients with incomplete hospitalization up to a week before data delivery were excluded from estimates that require completed hospitalization (length of stay [LOS], intensive care unit [ICU], ventilation, medications administered). Patient characteristics for the overall and hospitalized COVID-19 cohorts are reported in Tables 1 and 2 . Comorbidities were assessed using relevant ICD-9/10 diagnosis codes in the 1 year prior to index date. We report the presence of any pregnancy-related diagnosis code in the 6 months prior to index date. The Charlson Comorbidity Index (CCI) reported in Table 2 is a summary metric for describing comorbidity burden in patients. 13 Selection of relevant clinical outcomes, laboratory values, and medications was based on prior publications and/or clinical consultation. Clinical conditions and outcomes for the patients hospitalized with COVID-19 are reported in Table 3 , overall and age-stratified. Hypoxia, any ventilation, invasive ventilation, bilevel positive airway pressure (BiPAP) or continuous positive airway pressure (CPAP) therapy, and extracorporeal membrane oxygenation (ECMO) were identified using ICD-9/10 diagnosis, ICD-9/10 PCS, CPT, and REV codes. Respiratory failure including acute respiratory distress syndrome (ARDS), pneumonia, and sepsis were identified using ICD-9/10 diagnosis codes, and reported stratified by timing of occurrence. Mortality was captured primarily from EHR records where death has been documented. Only month and year of death were available; exact day of death was not included in the dataset. Select laboratory values for the hospitalized cohort are reported in Table 5 . We report the first measurement of each analyte within a 3-day window of hospital admission. We exclude laboratory values with non-numeric results or missing unit values. For troponin, we additionally exclude laboratory values with missing reference ranges since this laboratory value is reported as a proportion above normal range. If multiple testing methods were reported for the same analyte, the most prevalent test(s) were used until greater than 95% of measurements were captured. Similarly, when multiple units were reported for the same analyte, the most prevalent unit(s) were used, and units were harmonized where appropriate. Relevant medications administered during hospitalization are reported in Table 6 . 14, 15, 16, 17 When appropriate, medication classes from ATC were converted to NDC or HCPCS, and these NDC and HCPCS codes were used to ascertain medications administered. Since remdesivir lacked an NDC code at the time of this analysis, we used a free-text match for 'remdesivir' in the medication descriptions. Due to particular clinical interest in these drugs, we report dexamethasone separately from 'corticosteroids excluding dexamethasone'; similarly, we report azithromycin separately from 'antibiotics excluding azithromycin'. As the objectives of this analysis were descriptive; no statistical significance testing was conducted. Python was used to conduct all analyses. 18,19,20 We identified 76,819 patients that met the inclusion criteria for the overall COVID-19 cohort. Of these, 51.8% had a positive diagnostic test, 76.2% had a U07.1 diagnosis, and 16.2% had a B97.29 diagnosis. We observed very few pediatric patients (younger than 18 years of age) (2.3%), while 29.4% of the cohort was above the age of 65. Among those with race/ethnicity data, the cohort was 27.4% African-American and 15.1% Hispanic, though race/ethnicity data were missing for almost a quarter of patients. Most patients came from the Northeast (50.8%) or Midwest (42.5%). More than half of patients were overweight or obese (75.5%, among patients with non-missing values), though BMI was missing in 25.8% of patients. Approximately 4.9% of patients had death recorded between COVID-19 diagnosis date and end of study period (see Table 1 ). Within the broader cohort, we then identified 16,780 patients (21.8%) that met inclusion criteria for the hospitalized COVID-19 cohort. Of these, 16,099 had a completed hospitalization. We again observed very few pediatric patients (1.1%), while 45.2% of this cohort was over the age of 65. Over three-quarters of the hospitalized patients were overweight or obese (77.2%, among patients with non-missing values), which was consistent with the overall COVID-19 cohort. Similar patterns for race, ethnicity, and geographic region were observed as compared to the overall cohort (see Table 1 ). African-American patients had higher rates of hospitalizations as compared to white patients (28.7% vs. 20.0%). Approximately 16.2% of patients with completed COVID-19 hospitalizations had death recorded between hospital admission and end of study period (see Table 3 ). Comorbidity burden was high in the overall and hospitalized cohorts. In particular, diabetes mellitus (19.5%, 34.1%) and hypertension (38.0%, 58.1%) were prevalent in the overall and hospitalized cohorts, respectively. Approximately 53.9% of the overall cohort had no reported CCI-eligible comorbidities, as compared to 33.5% of the hospitalized cohort (see Table 2 ). A sensitivity analysis to confirm patient presence in the EHR system indicated that patients lacking records prior to COVID-19 index date accounted for <0·01% (approximately 1 in 20,000 patients), indicating low missing data in this 1-year comorbidity lookback. Those with at least one CCI-eligible comorbidity were hospitalized at a higher rate than those without comorbidities (31.5% vs. 13.6%). We observed that 58.9% of patients had hypoxia at any point during hospitalization. Among the hospitalized cohort, 18.1% of patients were ventilated, 4.1% had non-invasive CPAP or BiPAP therapy, 14.4% had invasive ventilation, and 23.4% of patients were in the ICU at some point during their hospitalization. Median LOS was 6 days (IQR: 4, 11), and varied by age group (see Table 3 ). Among patients who had an ICU stay at some point during their hospitalization, the overall LOS was 11 days (IQR: 7, 19) , and the ICU-specific LOS was 3 days (IQR: 1, 9) . Many patients were admitted to the hospital with complications, including respiratory failure (41.3%), pneumonia (60.7%), and sepsis (20.4%). Vital signs and laboratory values at hospital admission are reported in Tables 4 and 5 In this study, we report key patient characteristics and clinical outcomes in a large, multicenter cohort of patients hospitalized with COVID-19. Consistent with other publications, we observed significant comorbidity burden among hospitalized patients, as evidenced by the high proportions for both individual and combined (>1) comorbidities as measured by the CCI. In fact, those with at least one CCI-eligible comorbidity were hospitalized at a higher rate than those without comorbidities. We also observed a higher median age among hospitalized patients compared to the overall COVID-19 cohort. Rates of hospitalization increased by age, with 12.2% of adults younger than 40 years and 37.0% of patients older than 75 years being hospitalized. These observations are consistent with reports that older patients and patients with high comorbidity burden are more likely to develop severe disease that requires hospitalization. 21, 22 The high proportion of hospitalized patients with hypertension (58.1%) is particularly striking. This report is consistent with Richardson et al, which reported that 56.6% of their hospitalized cohort had hypertension. 6 Some published reports hypothesize that high rates of hypertension in patients with COVID-19 may be explained by the fact that hypertensive patients are often treated with angiotensin-converting enzyme (ACE) inhibitors, which may lead to increased expression of ACE2, increasing the risk of COVID-19 infection. 23, 24, 25, 26 Pneumonia is often part of the COVID-19 sequelae. 27, 28 Using diagnosis codes, we observed pneumonia in 61% of patients on admission and an additional 10.1% after admission. Acute respiratory distress syndrome (ARDS) is an acute inflammatory lung condition that sometimes occurs as part of the sequelae for patients with severe COVID-19. However, ARDS is often underreported by clinicians, 29 and instead potentially captured more broadly as respiratory failure. Hence, we report all respiratory failure which includes ARDS, which accounted for 41.3% of patients on admission and an additional 11.7% after admission (Table 3) . Bacterial pneumonia is also hypothesized to occur as a complication of COVID-19, though reports are inconsistent. 30, 31, 32 While we observed bacterial pneumonia (3.6%) on admission, these results should be interpreted with caution as it may be underreported in patients who are already diagnosed with COVID-19. Furthermore, much of the bacterial pneumonia in this population may be associated with invasive ventilation, and thus develops over the course of inpatient stay rather than on admission. We assessed medication use among patients hospitalized with COVID-19 that have been hypothesized in the literature 14, 15, 16, 17 to be related to COVID-19 disease trajectories. Among the medications captured, anticoagulants were strikingly prevalent, with 70.9% of patients receiving anticoagulants at some point during hospitalization. We hypothesize that in most cases these medications were most likely used prophylactically to prevent deep vein thrombosis (DVT) due to reports of higher risk of blood clots in COVID-19 patients. 33, 34 We observed low use of remdesivir and tocilizumab (1.3% and 2.9%, respectively). Finally, we observed approximately a third of hospitalized patients received hydroxychloroquine/chloroquine, which is striking considering more recent reports on the lack of efficacy of this drug class for COVID-19. 35, 36 6. This study contributes meaningfully to our understanding of the descriptive epidemiology of COVID-19 in a real-world U.S.-based setting, including disease presentation, clinical outcomes, and medication use; however, some important limitations must be considered. First, the case definition utilized in this study relied on either a diagnosis code or positive diagnostic test. The diagnostic test aspect of the case definition allows for high specificity but potentially low sensitivity. Conversely, the diagnostic code aspect of the case definition allows for a broader capture of patients hospitalized with COVID-19 that do not have a recorded diagnostic test; this may result in lower specificity and higher sensitivity. This definition excludes a subset of patients who (a) did not get tested, (b) got tested outside of the relevant window, (c) did not receive the appropriate diagnosis code, or (d) had COVID-19, with or without signs and symptoms, but did not seek care. This may result in potential selection bias whereby our cohort reflects a more severe COVID-19 population; therefore, our hospitalization and mortality rates in the overall COVID-19 cohort should be interpreted with caution. Notably, our analysis had a broader case definition as compared to other published reports of patients hospitalized with COVID-19, which restrict to laboratory-confirmed cases. 6, 7 We conducted a sensitivity analysis removing the B97.29 code from our case definition and observed an overall cohort size reduction of 4.8% and no meaningful changes to key outcomes. Finally, it is possible that patients received a SARS-CoV-2 test within the Optum EHR system, but are admitted elsewhere, causing an underestimation of hospitalization rate in this cohort. Given the use of EHR data that has been captured in real-time during a pandemic setting, the issue of missing data is an important consideration and impacts a number of key study outcomes. In particular, we were unable to report supplemental oxygen and ventilator-associated pneumonia due to insufficient capture of these outcomes. Additionally, we rely on diagnostic, procedural, and revenue codes to capture key conditions like ARDS, bacterial pneumonia, sepsis, and respiratory failure, and outcomes such as ventilation, meaning that these features are likely to be underreported. It is also noteworthy to highlight that our mortality estimates may underestimate the true mortality rate, since we do not capture deaths that occur outside of hospitalization. Finally, our study population may not perfectly generalize to the broader population of patients hospitalized with COVID-19. Of note, most of our cohort is from the Northeast or the Midwest. Public health response, availability of testing, social-distancing policies, hospital and ICU capacity, and discharge practices differ by hospital and geography, which leads to (a) substantial heterogeneity within our cohort, and (b) limited ability to generalize findings to the overall population of the United States. Recently pregnant was defined as the presence of pregnancy-related ICD 9/10 diagnosis codes in the 6 months prior to COVID-19 diagnosis date. 10. A. Cohort definition The U07·1 and U07·2 ICD-10 diagnosis codes reflect the WHO and CDC guidelines for coding confirmed or probable COVID-19 cases effective date 1 April 2020 (WHO citation, CDC citation). To ensure full capture of the earliest COVID-19 cases, we also included the B97·29 code ('Other coronavirus as the cause of disease classified elsewhere'), which was the interim code used before the U07·1 and U07·2 codes were implemented on 1 April 2020 (CDC citation, Nalleballe et al). For the diagnostic laboratory tests used for inclusion, we only allowed COVID-19 tests that explicitly have the following strings in their name : 'pcr', 'rna', 'naat', 'np' (nasopharyngeal), or 'op' (oral-pharyngeal) to ensure tests were diagnostic tests; antibody tests were not used as part of our case definition. For the index date used in Tables 1 and 2, earliest date of diagnosis code or positive diagnostic test was used. We did not impose any age restrictions on the study population. A sensitivity analysis was performed to evaluate the inclusion of the 4.8% of cases that had a B97.29 code and no U07.1 diagnosis or positive lab. No changes were observed in the results. Additionally, a sensitivity analysis was performed to evaluate the inclusion of COVID-19 diagnosis codes in the definition. No meaningful changes were observed in the results when restricting to a cohort of patients with a positive PCR laboratory test (n=39,798). Centers for Disease Control and Prevention -Daily Updates of Totals by Week and State Centers for Disease Control and Prevention -Symptoms of Corona Virus Clinical Characteristics of Coronavirus Disease 2019 in China Centers for Disease Control and Prevention -How Covid-19 Spreads Asymptomatic Transmission, the Achilles' Heel of Current Strategies to Control Covid-19 Presenting Characteristics, Comorbidities, and Outcomes Among 5700 Patients Hospitalized With COVID-19 in the New York City Area Characteristics of Hospitalized Adults with COVID-19 in an Integrated Health Care System in California Epidemiology, clinical course, and outcomes of critically ill adults with COVID-19 in New York City: a prospective cohort study Clinical Characteristics of Hospitalized Covd-19 Patients in New York City. medRxiv Posted Defining the Epidemiology of Covid-19 -Studies Needed Using Electronic Health Records to Estimate the Prevalence of Agitation in Alzheimer disease/dementia Assessing Occurrence of Hypoglycemia and Its Severity From Electronic Health Records of Patients With Type 2 Diabetes Mellitus Charlson Comorbidity Index: ICD-9 Update and ICD-10 Translation COVID-19 Diagnosis and Management: A Comprehensive Review The Origin, Transmission and Clinical Therapies on Coronavirus Disease 2019 (COVID-19) Outbreak -An Update on the Status Dexamethasone in the Management of Covid -19 Effect of tocilizumab in hospitalized patients with severe pneumonia COVID-19: a cohort study The pandas development team Data structures for statistical computing in python Control and Prevention -Preliminary Estimates of the Prevalence of Selected Underlying Health Conditions Among Patients with Coronavirus Disease 2019 -United States Risk Factors of Critical & Mortal COVID-19 Cases: A Systematic Literature Review and Meta-Analysis Are Patients With Hypertension and Diabetes Mellitus at Increased Risk for COVID-19 Infection? Antihypertensive Drugs and Risk of COVID-19? Structural Basis for the Recognition of SARS-CoV-2 by Full-Length Human ACE2 ACE inhibitors, AT1 receptor blockers and COVID-19: clinical epidemiology evidences for a continuation of treatments. The ACER-COVID study Current Status of Epidemiology, Diagnosis, Therapeutics, and Vaccines for Novel Coronavirus Disease 2019 (COVID-19) Center for Disease Control and Prevention. Interim Clinical Guidance for Management of Patients with Confirmed Coronavirus Disease (COVID-19) Definition and Epidemiology of Acute Respiratory Distress Syndrome Bacterial and Fungal Co-Infection in Individuals With Coronavirus: A Rapid Review to Support COVID-19 Antimicrobial Prescribing Bacterial and Fungal Coinfection Among Hospitalised Patients With COVID-19: A Retrospective Cohort Study in a UK Secondary Care Setting Online ahead of print Recognition and Management of Respiratory Coinfection and Secondary Bacterial Pneumonia in Patients with COVID-19 Incidence of Asymptomatic Deep Vein Thrombosis in Patients With COVID-19 Pneumonia and Elevated D-dimer Levels COVID-19 and Its Implications for Thrombosis and Anticoagulation Chloroquine and Hydroxychloroquine in covid-19 An Updated Systematic Review of the Therapeutic Role of Hydroxychloroquine in Coronavirus Disease-19 (COVID-19) Comorbidities were defined as the presence of relevant ICD 9/10 diagnosis codes in the 1 year prior to COVID-19 diagnosis date.