key: cord-0786135-a4babgpl
authors: Murk, William; Gierada, Monica; Fralick, Michael; Weckstein, Andrew; Klesh, Reyna; Rassen, Jeremy A.
title: Diagnosis-wide analysis of COVID-19 complications: an exposure-crossover study
date: 2021-01-04
journal: CMAJ
DOI: 10.1503/cmaj.201686
sha: a5c0302bbebed9a1158eee6e744b1c8cc49051d6
doc_id: 786135
cord_uid: a4babgpl
BACKGROUND: Many studies reporting coronavirus disease 2019 (COVID-19) complications have involved case series or small cohorts that could not establish a causal association with COVID-19 or provide risk estimates in different care settings. We sought to study all possible complications of COVID-19 to confirm previously reported complications and to identify potential complications not yet known.
METHODS: Using United States health claims data, we compared the frequency of all International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) diagnosis codes occurring before and after the onset of the COVID-19 pandemic in an exposure-crossover design. We included patients who received a diagnosis of COVID-19 between Mar. 1, 2020, and Apr. 30, 2020, and computed risk estimates and odds ratios (ORs) of association with COVID-19 for every ICD-10-CM diagnosis code.
RESULTS: Among 70 288 patients with COVID-19, 69 of 1724 analyzed ICD-10-CM diagnosis codes were significantly associated with COVID-19. Disorders showing both strong association with COVID-19 and high absolute risk included viral pneumonia (OR 177.63, 95% confidence interval [CI] 147.19–214.37, absolute risk 27.6%), respiratory failure (OR 11.36, 95% CI 10.74–12.02, absolute risk 22.6%), acute kidney failure (OR 3.50, 95% CI 3.34–3.68, absolute risk 11.8%) and sepsis (OR 4.23, 95% CI 4.01–4.46, absolute risk 10.4%).
Disorders showing strong associations with COVID-19 but low absolute risk included myocarditis (OR 8.17, 95% CI 3.58–18.62, absolute risk 0.1%), disseminated intravascular coagulation (OR 11.83, 95% CI 5.26–26.62, absolute risk 0.1%) and pneumothorax (OR 3.38, 95% CI 2.68–4.26, absolute risk 0.4%).
INTERPRETATION: We confirmed and provided risk estimates for numerous complications of COVID-19. These results may guide prognosis, treatment decisions and patient counselling.
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a novel strain of coronavirus that has been identified as the cause of the coronavirus disease 2019 (COVID-19) pandemic. As of Nov. 20, 2020, more than 50 million people have received a diagnosis of COVID-19 globally. 1 The clinical spectrum of disease is wide and can range from symptoms typical of the common cold to respiratory failure and death. 2 Most patients have mild symptoms and can be managed as outpatients, but as many as 20% have a severe form of the disease requiring admission to hospital, commonly presenting with hypoxia secondary to pneumonia. 3 Studies also show that COVID-19 is associated with a wide variety of nonrespiratory sequelae, including endothelial, thrombotic, cardiac, inflammatory, neurologic and other complications. [4] [5] [6] [7] [8] [9] Whether these associations are causal is not well established, as many of these findings originate from case reports, which are prone to publication bias and cannot provide risk estimates, or from cohort studies that often do not provide relative risk estimates. An alternative strategy for identifying potential complications of COVID-19 is studying all possible complications as captured in International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) diagnosis codes, which allows for the discovery of unreported complications and can confirm previously identified ones.
The objective of our study was to analyze all diagnoses associated with COVID-19, to identify those that could be complications of the disease and to present both the absolute risk and relative odds of any complications identified.
This study used de-identified United States medical claims from HealthVerity's Marketplace data set. These data are claims from nationally representative health plans that encompass all major payer types (commercial, Medicaid and Medicare). The data include open claims, which are sourced in near real time from practice management systems, billing systems and claims clearinghouses, as well as closed claims, which are sourced from insurance providers and payers. Data from open claims can be captured within days of a health care encounter, while data from closed claims have longer lag times. Thus, open claims provide more recent data, and closed claims encompass a more complete view of a patient's interactions with the health care system. These data sources are further described in Appendix 1, Methods, available at www.cmaj.ca/lookup/doi/10.1503/cmaj.201686/tab-related-content. Claims were available with dates of service from Nov. 1, 2019, through May 30, 2020.
We included all patients who had had at least 1 medical encounter related to COVID-19, defined as any claim with an ICD-10-CM diagnosis code of U07.1 (COVID-19, virus identified) or B97.29 (other coronavirus as the cause of diseases classified elsewhere) that occurred within the period from Mar. 1, 2020, to Apr. 30, 2020. Apr. 30 was the last eligibility date to allow for a month of possible follow-up for all patients. Codes were selected based on US Centers for Disease Control and Prevention (CDC) coding guidance for confirmed COVID-19 cases. 10, 11 The first eligible claim within the study period was the index date. Patients with a code for U07.1, B97.29 or B97.21 (SARS-associated coronavirus) occurring before index, and those with a missing year of birth, were excluded.
Patients with a code for Z03.818 occurring between 7 days before index and 30 days after index were also excluded, because this code indicates that SARS-CoV-2 exposure has been ruled out. 10 To ensure that all analyzed patients were observable before their COVID-19 diagnosis, patients were excluded if they did not have any medical claim between 120 days before index and 30 days before index. We also assessed subgroups of patients treated only on an outpatient basis, patients treated as inpatients, and patients who were admitted to the intensive care unit (ICU) (Appendix 1, Methods). We used an exposure-crossover design 12 to identify diagnoses whose odds increased after the onset of COVID-19. This design minimizes unmeasured confounding by comparing a patient's condition upon COVID-19 diagnosis (the "hazard period") with their own condition at a previous time (the "baseline period"). Any increased odds of a condition are likely attributable to COVID-19 sequelae or treatment. To capture activity related to COVID-19 that occurred just before the patient received a diagnosis of the disease (Appendix 1, eFigure 2), we defined the hazard period as the 7 days before the index date through the 30 days after index. We defined the baseline period as the 120 days before the index date through 30 days before index. We chose this 90-day window as it was long enough to capture most chronic conditions while ensuring equivalent capture of baseline status for all patients. We considered the 23 days between the baseline and hazard period as a period of unknown COVID-19 disease state (Appendix 1, eFigure 2) and excluded that person-time from the analysis. We also performed sensitivity analyses to assess the effect of the duration of the baseline period by conducting an analysis where the baseline period was identical in length to the hazard period (days 68 through 30 before index; 38 days total). The study design is illustrated in Appendix 1, eFigure 3. 
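The window definitions above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the function names and the claim representation (a service date paired with an ICD-10-CM code) are hypothetical. The boundary handling is also an assumption, chosen so that the day counts match those stated in the text: a half-open baseline of days -120 up to but not including -30 (90 days), a 23-day excluded gap and a hazard period of days -7 through +30 inclusive (38 days). Codes are truncated to their 3-character header, mirroring the header-level aggregation used in the primary analysis.

```python
from datetime import date
from typing import Dict, Iterable, Optional, Set, Tuple

# Day offsets relative to the index date (index = day 0).
# Assumption: baseline is half-open and hazard is closed, which
# reproduces the 90-day, 23-day and 38-day spans given in the text.
BASELINE_START, BASELINE_END = -120, -30   # [-120, -30)
HAZARD_START, HAZARD_END = -7, 30          # [-7, +30]


def classify_period(index_date: date, service_date: date) -> Optional[str]:
    """Return 'baseline', 'hazard', or None for excluded person-time."""
    offset = (service_date - index_date).days
    if BASELINE_START <= offset < BASELINE_END:
        return "baseline"
    if HAZARD_START <= offset <= HAZARD_END:
        return "hazard"
    return None  # washout gap, or outside the observation window


def codes_by_period(
    index_date: date, claims: Iterable[Tuple[date, str]]
) -> Dict[str, Set[str]]:
    """Collect the ICD-10-CM header codes seen in each period for one patient."""
    seen: Dict[str, Set[str]] = {"baseline": set(), "hazard": set()}
    for service_date, icd_code in claims:
        period = classify_period(index_date, service_date)
        if period is not None:
            seen[period].add(icd_code[:3])  # e.g. "R43.0" -> "R43"
    return seen
```

For a hypothetical patient with an index date of Apr. 1, 2020, a claim dated Mar. 10 falls in the excluded gap and contributes to neither period, while a claim dated Apr. 3 counts toward the hazard period.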
To identify ICD-10-CM codes associated with COVID-19, we paired observations of each patient in the baseline and hazard periods and computed matched odds ratios (ORs) to estimate the strength of association of the code with COVID-19, as well as McNemar's test p values. 13 We used exact tests if the number of discordant pairs was less than 25. ICD-10-CM codes are constructed as a 3-character header indicating system classification and disease and can be followed by up to 4 digits offering greater specificity. 14, 15 For our primary analysis, we analyzed all 1724 ICD-10-CM diagnosis codes present in the data set, aggregated at the header level (e.g., "R43 - Disturbances of smell and taste"). For any header code from the primary analysis found to be significantly associated, we also analyzed it at the individual code level (e.g., "R43.0 - Anosmia"). For our secondary analysis, we analyzed all 64 931 individual ICD-10-CM codes. We applied Bonferroni-corrected significance thresholds, defined as a nominal type 1 error of 0.05 divided by the number of analyzed codes (2.9E-05 and 7.7E-07 for primary and secondary analyses, respectively). 16 We excluded codes used to identify cases (B97 and U07). A candidate COVID-19 complication was any code that increased in odds with COVID-19 at a Bonferroni-corrected level of significance. We also calculated the absolute risk of becoming newly diagnosed with each ICD-10-CM code upon having a COVID-19 diagnosis. This was the percentage of patients with the code in the hazard period, calculated among patients who did not have the code in the baseline period (Appendix 1, eFigure 3).
This study was approved under exemption by the New England Institutional Review Board (#1-9757-1).
We identified 70 288 patients with a diagnosis of COVID-19 (Table 1). Excluded patients are described in Appendix 1, eFigure 4. Of these patients, 53.4% were admitted to hospital and 4.7% to the ICU. The median age was 65 years, and 55.8% were female.
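The paired analysis described in the statistical methods above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the authors' code: the `PairCounts` type and all function names are hypothetical, the large-sample McNemar statistic is computed without a continuity correction (the text does not say whether one was applied), and the exact test is taken to be a two-sided binomial test on the discordant pairs. Only the Python standard library is used.

```python
import math
from typing import NamedTuple


class PairCounts(NamedTuple):
    """Per-code patient counts from pairing baseline and hazard periods."""
    both: int           # code present in both periods
    hazard_only: int    # discordant: code only in the hazard period
    baseline_only: int  # discordant: code only in the baseline period
    neither: int        # code absent in both periods


def matched_odds_ratio(p: PairCounts) -> float:
    """Matched OR: ratio of the two discordant-pair counts."""
    if p.baseline_only == 0:
        return math.inf
    return p.hazard_only / p.baseline_only


def mcnemar_p_value(p: PairCounts, exact_below: int = 25) -> float:
    """McNemar's test; exact binomial when discordant pairs < exact_below."""
    b, c = p.hazard_only, p.baseline_only
    n = b + c
    if n == 0:
        return 1.0
    if n < exact_below:
        # Two-sided exact test: discordant pairs ~ Binomial(n, 0.5).
        tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
        return min(1.0, 2 * tail)
    # Chi-square with 1 df (no continuity correction; an assumption).
    statistic = (b - c) ** 2 / n
    return math.erfc(math.sqrt(statistic / 2))


def bonferroni_threshold(n_codes: int, alpha: float = 0.05) -> float:
    """Nominal type 1 error divided by the number of analyzed codes."""
    return alpha / n_codes


def absolute_risk(p: PairCounts) -> float:
    """Share with the code in the hazard period among those without it at baseline."""
    at_risk = p.hazard_only + p.neither
    return p.hazard_only / at_risk if at_risk else float("nan")
```

With the code counts given in the text, `bonferroni_threshold(1724)` and `bonferroni_threshold(64931)` round to the reported 2.9E-05 and 7.7E-07. A code would be flagged as a candidate complication when its matched OR exceeds 1 and its p value falls below the relevant threshold.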
The 5 most common states of residence of patients were New York (19.2%), New Jersey (10.2%), Michigan (9.3%), Pennsylvania (7.5%) and Illinois (7.2%). The median numbers of diagnosis codes in the baseline and hazard period are shown in Appendix 1, eTable 1. Baseline prevalence estimates of selected conditions that may affect the risk of COVID-19 or its sequelae are listed in Table 1.
Among all 1724 diagnosis codes considered at the code header level in our primary analysis, we identified 69 codes that increased in odds with COVID-19 at a Bonferroni-corrected level of statistical significance. The respiratory and circulatory systems were the most broadly affected physiologic systems (Figure 1). Odds ratios for these candidate COVID-19 complications are shown in Figure 2. For the respiratory system, codes having the strongest association with COVID-19 included pneumonia (e.g., viral pneumonia: OR 177.63, 95% CI 147.19–214.37) [...]
Among identified respiratory system codes, those with the highest risks included pneumonia (e.g., viral pneumonia: 27.6% and 81.0% among all patients and patients admitted to the ICU, respectively), respiratory failure (22.6% and 75.3%, respectively) and ARDS (4.3% and 26.0%, respectively). Among identified circulatory system codes, the highest risks included atrial fibrillation or flutter (4.6% and 16.1% among all patients and patients in the ICU, respectively), hypotension (3.8% and 17.3%, respectively) and acute myocardial infarction (1.8% and 8.1%, respectively). The risk for acute kidney failure was 11.8% among all patients and 50.7% among patients in the ICU, and the risk for other sepsis was 10.4% and 54.1%, respectively. Other disorders of the brain (including encephalopathy and other conditions; Appendix 1, eTable 4) showed risks of 4.9% and 24.9%, respectively.
Figure 3 shows risk estimates plotted against OR estimates, where it is evident that the highest-risk, most strongly associated disorders included viral pneumonia, respiratory failure, sepsis or systemic inflammation, acute kidney failure and ARDS. Risks for these conditions by age and hospital admission status are shown in Figure 4 and Appendix 1, eTable 6. Most other complications had relatively low overall risk estimates even if they were highly associated with COVID-19, such as pneumothorax (0.4% and 2.6% absolute risk among all patients and patients in the ICU, respectively), disseminated intravascular coagulation (0.1% and 0.9%, respectively) and acute myocarditis (0.1% and 0.5%, respectively). Appendix 1 contains risk information for all candidate COVID-19 complications identified in the primary analysis, calculated among all patients (eTable 2), age-stratified patients (eTable 7), sex-stratified patients (eTable 8), and by hospital admission status (eTable 9).
Our sensitivity analysis to evaluate the effect of using a shorter, 38-day baseline period resulted in many chronic conditions apparently increasing in frequency after COVID-19, confirming that the longer, 90-day baseline period of the primary analysis was necessary to detect chronic conditions that were already present before COVID-19 (Appendix 1, eTable 10). It also found that codes that may represent acute conditions, such as "other venous embolism and thrombosis," became moderately increased in frequency with COVID-19 (OR 1.74, 95% CI 1.58–1.93), suggesting that the primary analysis could have missed some acute conditions with comparatively modest associations with COVID-19.
In this study of more than 70 000 individuals who received a diagnosis of COVID-19, we found that the disease was associated with a broad range of complications.
The more common complications that we identified, including viral pneumonia, respiratory failure, acute kidney failure and sepsis, were expected, as they have been well described in the literature. [17] [18] [19] [20] We also identified less common complications, previously described in case series or small studies, such as disseminated intravascular coagulation, 21 pneumothorax, 22 myocarditis 23 and rhabdomyolysis. 24 This study provides estimates of absolute risk and relative odds for all identified diagnoses related to COVID-19, which are needed to help providers, patients and policy-makers understand the likelihood of complications. For example, acute myocarditis was found to have an OR of 8.17 but an overall risk of 0.1%, illustrating how a very strong association of a condition with COVID-19 does not necessarily translate into a high overall risk.
Since the first case of COVID-19 was described in 2019, a sharp rise in the number of studies related to COVID-19 has made it challenging for clinicians to keep up with the literature and distinguish spurious findings from causal effects. Reporting bias and publication bias can result in an over-representation of some findings, and published findings beget confirmation of the same. A key strength of our analytic approach is that it considered all possible ICD-10-CM diagnosis codes and simply quantified diagnoses that occurred after the onset of COVID-19.
Although COVID-19 has been widely reported to increase the risk of stroke, 25 this was not seen in our study. For I63 ("cerebral infarction") in our overall population, we observed an OR of 0.58 and an overall risk of 1.5% (Appendix 1, eTable 3), suggesting that while patients with COVID-19 do experience stroke at significant frequencies, a causal association with COVID-19 was not supported in this population.
Multisystem inflammatory syndrome in children could not be directly assessed because it has no specific ICD-10-CM code, although no association was noted for the similar condition Kawasaki disease, as evaluated under the code M30.3. We also observed many diagnosis codes that appeared to decrease in odds after the onset of COVID-19. We propose 2 possible explanations. During the initial stages of COVID-19 treatment, many chronic conditions and less severe conditions may not have been considered priorities for care and were therefore less likely to be captured in a claim. In addition, for acute conditions, the baseline period (90 d) could be accruing more events than the hazard period (38 d) because it is longer in duration. This could explain why, for example, cerebral infarction appeared to decrease in odds after the onset of COVID-19 in our primary analysis, while it slightly increased in odds in our sensitivity analysis using a shorter baseline period. In interpreting our risk estimates, it is important to note that mild or less clinically overt conditions are not as commonly coded in claims data. This likely explains the fact that, although we observed strong associations with COVID-19 for complications such as cough and disturbances of smell or taste, their overall risk estimates (22.0% and 0.6%, respectively, in the overall population) were substantially lower than has been reported (79% and 65%, respectively). 9, 26 Thus, the findings for codes related to relatively mild conditions may be more valuable in representing strengths of association than absolute risk estimates. In contrast, risk estimates for severe, overt disorders are more likely to reflect actual risk, as they are more likely to be treated or brought to the attention of a care provider, and thus are more consistently captured in a medical claim. 
27 For example, our risk estimates among inpatients for respiratory failure (40.0%) and acute kidney failure (21.2%) are similar to previously reported estimates (54% and 15%, respectively). 20 An additional caveat to the interpretation of our risk estimates is that they estimate only the risk of newly diagnosed disease and do not estimate the risk of events where a pre-existing condition may have been exacerbated by COVID-19. Moreover, our estimates reflect only the risk among patients who seek medical care for COVID-19. Strengths of this study include its large sample size and data-driven approach. An additional strength is the exposure-crossover design, which controls for confounding by matching observations pre- and post-COVID-19 diagnosis. 12 By using inpatient and outpatient claim records, we were able to leverage a relatively complete medical history for each patient to account for conditions that were already present before COVID-19 diagnosis.
A limitation of the claims data is that we identified COVID-19 cases using diagnosis codes. Although the codes we used to identify patients are intended for confirmed cases of COVID-19, 10, 11 it is possible that some patients were misclassified. Furthermore, the recency of the COVID-19 pandemic necessitated the use of open medical claims that may not be fully adjudicated at the time of capture and may be subject to change if disputed by a payer. As a result, some misclassification may be present in the data, which could reduce the precision of estimates. In addition, the ICD-10-CM codes are not necessarily validated disease definitions and are assigned by medical professionals working in different places of care throughout the US; thus, these codes do not capture their intended disease concepts with complete consistency or fidelity. However, given the size of the data set, we believe that these issues did not have a substantial impact on our results.
Moreover, because older data in the baseline period are likely more complete than those of the hazard period, there may have been an underestimation of ORs, resulting in a generally conservative bias in the identification of COVID-19 complications. Given that the results of this study are broadly in line with known complications of COVID-19, we believe that this did not meaningfully affect our results. The care settings reflected in the database were not exclusively academic medical centres, where more severe cases of COVID-19 may be treated; as such, our results may have missed some of the more severe cases. Another limitation on generalizability is that patients were required to have at least 1 medical claim and thus may be more ill in the baseline period than the general population. As a result, risk estimates reported here may be greater than the risks in the general population, as the studied patients were more likely to have comorbidities. Finally, some of our findings may not be direct consequences of infection with SARS-CoV-2 but instead may be iatrogenic effects of treatment.
Overall, the most common complications associated with COVID-19 among patients seeking medical care include pneumonia, respiratory failure, kidney failure, and sepsis or systemic inflammation. After analyzing all possible diagnosis codes, we confirm that COVID-19 is also associated with a diverse array of additional cardiac, thrombotic and other conditions, although the overall risks for most of these complications are comparatively low. Understanding the full range of associated conditions can aid in prognosis, guide treatment decisions and better inform patients as to their actual risks for the variety of COVID-19 complications reported in the literature and media.
1. [...] COVID-19) dashboard. Geneva: World Health Organization
2. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China
3. Interim clinical guidance for management of patients with confirmed coronavirus disease (COVID-19)
4. Endothelial cell infection and endotheliitis in COVID-19
5. Incidence of thrombotic complications in critically ill ICU patients with COVID-19
6. Cardiovascular complications in COVID-19
7. Multisystem inflammatory syndrome in children during the COVID-19 pandemic: a case series
8. Neurologic manifestations of hospitalized patients with coronavirus disease 2019 in Wuhan, China
9. Alterations in smell or taste in mildly symptomatic outpatients with SARS-CoV-2 infection
10. Atlanta: Centers for Disease Control and Prevention
11. ICD-10-CM official coding and reporting guidelines
12. The exposure-crossover design is a new method for studying sustained changes in recurrent events
13. Exact McNemar's Test and matching confidence intervals
14. ICD-10: history and context
15. Baltimore: Centers for Medicare and Medicaid Services
16. Multiple significance tests: the Bonferroni method
17. Acute respiratory failure in COVID-19: Is it "typical
18. Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID-19 in the New York City area
19. Clinical characteristics of COVID-19 in New York City
20. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
21. Abnormal coagulation parameters are associated with poor prognosis in patients with novel coronavirus pneumonia
22. SARS-CoV-2 infection associated with spontaneous pneumothorax
23. Clinical predictors of mortality due to COVID-19 based on an analysis of data of 150 patients from Wuhan, China
24. Rhabdomyolysis as potential late complication associated with COVID-19
25. Stroke risk, phenotypes, and death in COVID-19: Systematic review and newly reported cases
26. Early chest CT features of patients with 2019 novel coronavirus (COVID-19) pneumonia: relationship to diagnosis and prognosis
27. Guidance for industry and FDA staff: best practices for conducting and reporting pharmacoepidemiologic safety studies using electronic healthcare data
William Murk is a consultant to and holds stocks in Aetion, Inc. Monica Gierada, Andrew Weckstein and Jeremy Rassen are employees of and hold stock options or stocks in Aetion, Inc. Reyna Klesh is an employee of HealthVerity, Inc., which provided the data for this study. No other competing interests were declared.
This article has been peer reviewed.
Data sharing: Data-sharing agreements prohibit the patient-level data to be publicly available. Aggregate data of patient counts for diagnosis codes not already provided in the supplement are available from the authors upon request.
This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY-NC-ND 4.0) licence, which permits use, distribution and reproduction in any medium, provided that the original publication is properly cited, the use is non-commercial (i.e. research or educational use), and no modifications or adaptations are made. See: https://creativecommons.org/licenses/by-nc-nd/4.0/