key: cord-0076703-gjsu5u6y
authors: Ellis, Randall P.; Hsu, Heather E.; Siracuse, Jeffrey J.; Walkey, Allan J.; Lasser, Karen E.; Jacobson, Brian C.; Andriola, Corinne; Hoagland, Alex; Liu, Ying; Song, Chenlu; Kuo, Tzu-Chun; Ash, Arlene S.
title: Development and Assessment of a New Framework for Disease Surveillance, Prediction, and Risk Adjustment: The Diagnostic Items Classification System
date: 2022-03-25
journal: JAMA Health Forum
DOI: 10.1001/jamahealthforum.2022.0276
sha: 9ed5c1ecd3a9a664eed6bd12789bb9fa6da3d1a6
doc_id: 76703
cord_uid: gjsu5u6y

IMPORTANCE: Current disease risk-adjustment formulas in the US rely on diagnostic classification frameworks that predate the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM).

OBJECTIVE: To develop an ICD-10-CM–based classification framework for predicting diverse health care payment, quality, and performance outcomes.

DESIGN, SETTING, AND PARTICIPANTS: Physician teams mapped all ICD-10-CM diagnoses into 3 types of diagnostic items (DXIs): main effect DXIs that specify diseases; modifiers, such as laterality, timing, and acuity; and scaled variables, such as body mass index, gestational age, and birth weight. Every diagnosis was mapped to at least 1 DXI. Stepwise and weighted least-squares estimation predicted cost and utilization outcomes, and their performance was compared with models built on (1) the Agency for Healthcare Research and Quality Clinical Classifications Software Refined (CCSR) categories, and (2) the Health and Human Services Hierarchical Condition Categories (HHS-HCC) used in the Affordable Care Act Marketplace. Each model’s performance was validated using R², mean absolute error, the Cumming prediction measure, and comparisons of actual to predicted outcomes by spending percentiles and by diagnostic frequency.
The IBM MarketScan Commercial Claims and Encounters Database, 2016 to 2018, was used, which included privately insured, full- or partial-year eligible enrollees aged 0 to 64 years in plans with medical, drug, and mental health/substance use coverage.

MAIN OUTCOMES AND MEASURES: Fourteen concurrent outcomes were predicted: overall and plan-paid health care spending (top-coded and not top-coded); enrollee out-of-pocket spending; hospital days and admissions; emergency department visits; and spending for 6 types of services. The primary outcome was annual health care spending top-coded at $250 000.

RESULTS: A total of 65 901 460 person-years were split into 90% estimation/10% validation samples (n = 6 604 259). In all, 3223 DXIs were created: 2435 main effects, 772 modifiers, and 16 scaled items. Stepwise regressions predicting annual health care spending (mean [SD], $5821 [$17 653]) selected 76% of the main effect DXIs with no evidence of overfitting. Validated R² was 0.589 in the DXI model, 0.539 for CCSR, and 0.428 for HHS-HCC. Use of DXIs reduced underpayment for enrollees with rare (1-in-a-million) diagnoses by 83% relative to HHS-HCCs.

CONCLUSIONS: In this diagnostic modeling study, the new DXI classification system showed improved predictions over existing diagnostic classification systems for all spending and utilization outcomes considered.

Notes: HCC is the Hierarchical Condition Category model, CCSR is the Clinical Classifications Software Refined model, DXI is the Diagnostic Items model, OLS is ordinary least squares, and SW is stepwise. For the HCC, CCSR, and DXI models, we calculated the residuals from the top-coded total spending model at the enrollee-year level and then assigned these residuals to every unique ICD-10-CM diagnosis each enrollee had in a year.
We then calculated enrollee-weighted mean residuals in the validation sample using the binned frequencies of diagnoses in the full sample, with frequency intervals determined by powers of ten per million. Plot whiskers correspond to 95% confidence intervals, corrected for clustering at the patient level.

The count of inpatient admissions by enrollee-year was defined as the count of distinct values of the CASEID variable from the Inpatient Admissions Tables I. The year of an inpatient admission was based on the date of admission. Watson Health used a proprietary admission construction methodology to group claims and encounters into inpatient admissions, which were uniquely identified in the Inpatient Admissions Tables I by the [...]

In total, there were 89 enrollee-years whose annualized sum of length of stay (LOS) exceeded 365 days (or 366 in 2016). To avoid this, we top-coded the annualized sum of LOS by enrollee-year at 366 in all years.

Emergency department (ED) claims were identified using the last two characters of the service sub-category code in the SVCSCAT variable from the Outpatient and Inpatient Services Tables O and S, which corresponded to the service type. SVCSCAT values whose last two characters were "20" corresponded to ED visits. A single ED visit may have had multiple professional and facility claims associated with it. ED claims with adjacent service dates for the same enrollee may have corresponded either to a single overnight ED visit or to distinct ED visits. To group ED claims into ED visits, we followed the Yale Operational Definition for ED Visitation presented in Venkatesh et al. (2017) [27]. In this method, a professional ED claim is treated as a unique ED visit. ED facility claims that occur within ± 1 day of a professional ED claim are grouped with the professional ED claim. All other ED facility claims are treated as distinct ED visits.
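The grouping rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `count_ed_visits`, the tuple layout of a claim, and the labels "P" (professional) and "F" (facility) are hypothetical, since the source does not list the actual FACPROF values. The sketch first collapses claims to one entry per enrollee, service date, and claim type, then counts every professional day as a visit and every facility day with no professional day within ± 1 day as a further visit.

```python
from collections import defaultdict
from datetime import date, timedelta


def count_ed_visits(claims):
    """Count ED visits per enrollee.

    claims: iterable of (enrollee_id, service_date, kind) tuples, where kind is
    "P" for professional or "F" for facility (hypothetical labels).
    """
    prof_days = defaultdict(set)  # enrollee -> dates with a professional ED claim
    fac_days = defaultdict(set)   # enrollee -> dates with a facility ED claim
    for enrollee_id, service_date, kind in claims:
        target = prof_days if kind == "P" else fac_days
        target[enrollee_id].add(service_date)  # sets collapse same-day duplicates

    one_day = timedelta(days=1)
    visits = {}
    for enrollee_id in set(prof_days) | set(fac_days):
        prof = prof_days.get(enrollee_id, set())
        fac = fac_days.get(enrollee_id, set())
        # A facility day stands alone only if no professional day falls
        # on the same day, the day before, or the day after.
        standalone = [d for d in fac if not ({d - one_day, d, d + one_day} & prof)]
        visits[enrollee_id] = len(prof) + len(standalone)
    return visits


example_claims = [
    ("a", date(2017, 3, 1), "P"),
    ("a", date(2017, 3, 1), "P"),   # duplicate professional claim, same day
    ("a", date(2017, 3, 2), "F"),   # within 1 day of the professional day
    ("a", date(2017, 3, 10), "F"),  # no professional claim nearby
    ("a", date(2017, 3, 11), "F"),  # no professional claim nearby
    ("b", date(2017, 5, 5), "F"),
]
ed_visits = count_ed_visits(example_claims)
```

In this toy input, enrollee "a" has one professional day that absorbs the adjacent facility day, plus two stand-alone facility days, and enrollee "b" has one stand-alone facility day.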
Because multiple facility and professional claims may be associated with a single ED visit in the data, we grouped all ED claims by enrollee, date of service, and professional or facility category into day episodes before we applied this method. That is, a professional ED day episode was treated as a unique ED visit. ED facility day episodes that occurred within ± 1 day of a professional ED day episode were grouped with the professional ED day episode. All other ED facility day episodes were treated as distinct ED visits. Claims were classified as professional or facility using the variable FACPROF.

A small number of enrollees had a high number of ED visits with dates of service that fell outside their period of enrollment in the Enrollment Tables A. Because the count of ED visits at the enrollee-year level was annualized using the months of enrollment from the Enrollment Tables A, this resulted in 2 enrollee-years in the development sample where the annualized count of ED visits exceeded 365 (or 366 in 2016). To avoid this, we top-coded the annualized count of ED visits at 366 in all years.

IP facility and specialty drug claims were identified using the last two characters of the service sub-category code in the SVCSCAT variable from the Inpatient Services Tables S. The last two digits of SVCSCAT corresponded to the service type: "34" indicated facility pharmacy services and "36" indicated specialty drug services.

Outpatient Facility Pharmacy Spending
OP facility and specialty drug claims were identified using the last two characters of the service sub-category code in the SVCSCAT variable from the Outpatient Services Tables O. The last two digits of a SVCSCAT value corresponded to the service type: "34" indicated facility pharmacy services and "36" indicated specialty drug services.
Outpatient Retail Pharmacy Spending
All records from the Outpatient Pharmaceutical Claims Tables D corresponded to retail pharmacy and mail-order drug claims.

Laboratory Spending
Laboratory claims were identified using the last two characters of the service sub-category code in the SVCSCAT variable from the Outpatient and Inpatient Services Tables O and S. The last two digits of a SVCSCAT value corresponded to the service type: laboratory services had "5" as the penultimate character. Laboratory services included chemistry tests, hematology, immunology, microbiology, pathology, urinalysis tests, and other laboratory services.

Imaging Spending
Imaging claims were identified using the last two characters of the service sub-category code in the SVCSCAT variable from the Outpatient and Inpatient Services Tables O and S. The last two digits of a SVCSCAT value corresponded to the service type: imaging services had "6" as the penultimate character. Imaging services included CT scans, mammograms, MRIs, nuclear medicine, PET scans, therapeutic radiology, ultrasounds, X-rays, and other radiology services.

Preventive Care Visits Spending
Preventive care visit claims were identified using the last two characters of the service sub-category code in the SVCSCAT variable from the Outpatient and Inpatient Services Tables O and S. The last two digits of a SVCSCAT value corresponded to the service type: "24" indicated preventive care visit services.

Notes: All outcomes are annualized and then weighted by the fraction of the year eligible for all enrollee-years except newborns. We recoded spending for the 0.008% of enrollee-years with negative total spending to zero, and the spending for the 0.016% with spending over three million dollars to three million dollars. Together these two adjustments lowered mean total spending and paid amounts by 0.051% and 0.059%, respectively.
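The annualizing, weighting, and recoding steps in these notes can be illustrated with a short Python sketch. It is a hedged reading of the notes rather than the authors' implementation: the function names are hypothetical, eligibility is assumed to be measured in whole months, recoding is assumed to be applied after annualization, and the special handling of newborns is not modeled. The $250 000 cap in the last call is the primary-outcome top-code stated in the abstract.

```python
def annualize(amount, months_enrolled):
    """Scale a partial-year amount to a 12-month equivalent (assumed monthly eligibility)."""
    if not 1 <= months_enrolled <= 12:
        raise ValueError("months_enrolled must be between 1 and 12")
    return amount * 12.0 / months_enrolled


def recode(amount, floor=0.0, cap=3_000_000.0):
    """Set negative spending to zero and top-code spending above the cap."""
    return min(max(amount, floor), cap)


def weighted_mean(values, weights):
    """Mean of values weighted by the fraction of the year each enrollee was eligible."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Three hypothetical enrollee-years: (observed spending, months enrolled).
records = [(3000.0, 6), (12000.0, 12), (-50.0, 12)]
annualized = [recode(annualize(spend, months)) for spend, months in records]
weights = [months / 12.0 for _, months in records]
mean_spending = weighted_mean(annualized, weights)

# The primary outcome uses a lower cap of $250 000.
primary_outcome = [recode(x, cap=250_000.0) for x in annualized]
```

Weighting by the eligible fraction of the year offsets the annualization, so a six-month enrollee with $3000 of observed spending contributes $6000 at half weight.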
We recoded the spending by type of service for enrollee-years with spending over one million dollars for a given type of service to one million dollars.

This appendix provides additional information on specific topics related to creating and evaluating DXIs that are mentioned, but not discussed extensively, in the main text.

For the existing Marketplace risk adjustment model, as well as for most performance assessment and other severity adjustments done using ICD-10-CM diagnoses, the norm is to filter out non-billable diagnoses before calculating payments [1]. In many research projects this filtering is not done, since even non-billable diagnoses may contain information that is useful for prediction and/or disease surveillance. Most root codes of billable diagnoses are not billable and thus may be called "invalid diagnoses" even if more detailed codes are valid. Despite this standard policy for payment purposes, non-billable diagnoses were relatively common in the IBM MarketScan commercial claims dataset used here, comprising 1.31 percent of all diagnoses appearing on claims. This was true even after processing the claims using CMS algorithms to remove diagnoses not attached to clinician types that the Medicare program considers valid for assigning diagnoses. We mapped all 22,512 non-billable diagnoses and 71,934 billable codes as of October 2019 (N = 94,446 in total). Subsequent to our original physician reviews of all chapters, we further included the 2020 emergency use ICD-10-CM codes for COVID-19 and vaping-related disorders.

The DXI modifier categories and their labels were created by physicians primarily using the long and short labels of individual diagnoses and their root codes as reported on the AHRQ web site as of May 2020 [2]. This project was also informed by the March 2019 release of the WHO ICD-11 coding system, scheduled for use in adopting countries in January 2022, which included a chapter for "extension codes" that can be added to ICD-11 diagnoses to capture additional clinical detail [3]. Although the content of many of these extensions was already present in the existing ICD-10-CM labels used for this project, in a few cases the ICD-11 naming system was relied on to standardize terminology. ICD-11 extensions were prominent in the neoplasm and injuries chapters, which used a matrix rather than a list format for presenting diagnostic information about disorders. Physician assignment of DXI modifiers was supplemented by text searches for ICD-11 extension code description strings in the US long descriptions of ICD-10-CM codes, including key words such as "bilateral", "left", "right", and "unspecified side".

Although it was our goal to create DXIs with at least 500 cases in them, for some sets of diagnoses grouped into DXIs this was not possible, and smaller sizes were allowed. In our development sample, only six DXI_1s had fewer than 200 cases; we excluded these six indicator variables from regressions due to concerns of imprecision. These zero or low frequency DXI_1s included Ebola, COVID-19, severe acute respiratory syndrome (SARS-2), and vaping-related disorders. These DXIs were created for future use in disease tracking and research, but no regression coefficients were assigned to any of them in our predictive models. We excluded 75 variables that were collinear with other variables in the model, including CCSR categories that coincided with our DXI_1s or their sums after filtering on ICD-10-CM billable codes.

Only 18.4% of all ICD-10-CM diagnoses were ultimately assigned to one DXI. eFigure 1 shows the distribution of diagnoses according to the number of DXIs (of all three types) assigned, which ranged from one to seven.
eFigure 2 illustrates the similar structure of the CCSR, where multiple CCSR categories were allowed, but only 8.2% of diagnoses were assigned to multiple categories (ranging from one to five CCSR categories per diagnosis).

In this paper we used only the DXI_1 main effects, reserving for future research the incorporation of the information in the DXI_2 modifiers or the DXI_3 scaled variables. All scaled variables were also stored as binary flags for each value as DXI_2 modifiers, which we did not attempt to aggregate to maintain at least 500 cases in each variable. Since some of the modifier information, such as initial, subsequent, and sequela, has been used in the HCC and CCSR classification systems, our DXI+CCSR system partially benefitted from such modifiers.

In this study we estimated predictive models without imposing any restrictions on coefficients or imposing hierarchies on variables. Non-negativity restrictions are common in payment models, where researchers have often constructed models to ensure that all included coefficients are positive. Previous work has documented that the original CMS-HCC models included manual corrections to coefficients to avoid negative predictions, such as resetting to zero one or more negative age-sex interaction terms and constraining selected HCCs for severe developmental disabilities to be nonnegative [4, 5].

In our framework, a single negative coefficient does not necessarily imply that payment predictions will be negative. For example, suppose there are DXIs A, B, and C such that DXI_A = DXI_B + DXI_C and the coefficient on DXI_B is less than the coefficient on DXI_C. Then, when only DXI_A and DXI_B are included in the model, the coefficient on DXI_B will be negative, and the sum of the coefficients on DXI_A and DXI_B is positive. This will remain true when variables are simply correlated rather than perfectly collinear.
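The DXI_A, DXI_B, DXI_C example above can be checked numerically. The following sketch uses made-up, noise-free costs (100 for anyone with B, 300 for anyone with C) and a hypothetical helper, `ols_two_regressors`, that solves the two-variable least-squares problem without an intercept through its normal equations; none of these numbers come from the study. Algebraically, 100·B + 300·C = 300·A − 200·B when A = B + C, so the fitted coefficient on B should be negative while the prediction for someone with B stays positive.

```python
def ols_two_regressors(x1, x2, y):
    """Least squares for y = b1*x1 + b2*x2 (no intercept) via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    s1y = sum(a * v for a, v in zip(x1, y))
    s2y = sum(b * v for b, v in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2


# Six hypothetical enrollees: two with B, three with C, one with neither.
dxi_b = [1, 1, 0, 0, 0, 0]
dxi_c = [0, 0, 1, 1, 1, 0]
dxi_a = [b + c for b, c in zip(dxi_b, dxi_c)]  # A = B + C

# Illustrative true costs: B adds 100, C adds 300.
spending = [100 * b + 300 * c for b, c in zip(dxi_b, dxi_c)]

# Fit the model that includes only A and B.
coef_a, coef_b = ols_two_regressors(dxi_a, dxi_b, spending)
prediction_for_b = coef_a + coef_b  # an enrollee with B also has A
```

Here the coefficient on A picks up the cost of C, the coefficient on B becomes the difference between the costs of B and C, and their sum recovers the cost of B.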
This holds true in particular in our framework because we intentionally included overlapping CCSR and DXI variables: it was common for sets of detailed DXIs to be a strict subset, or approximately so, of a CCSR category. The linear prediction models developed here illustrate the predictive power of each of the information sets examined but have not been optimized for use in payment models. Understanding the predictive power of the different diagnostic classification systems is informative even if further work is needed to ensure that predictions are non-negative.

Figure 2 in the main text shows average residuals by diagnostic frequency for our top-coded and not top-coded total spending models. To calculate these frequencies, we counted for each billable diagnosis in the full sample how many enrollee-years had at least one claim with the given billable ICD-10-CM code, and divided this count by the number of enrollee-years in the sample. We grouped diagnoses by prevalence into logarithmic base 10 bins (< 1 per million, 1-10 per million, …, 10,000-100,000 per million). We generated a dataset with each distinct combination of enrollee-year and billable diagnosis in the validation sample, and then mapped onto it the residuals from that sample by enrollee-year. We then calculated the validation sample mean residual values by model and disease prevalence bin. Because the data sample included repeated draws of enrollees across calendar years, we calculated the standard errors of the sample means correcting for clustering at the enrollee-year level.

Clinical Classifications Software Refined (CCSR), version v2020.3. ICD-10-CM Diagnosis Tool Fiscal Year 2020, Released May 2020 - valid for ICD-10-CM diagnosis codes through
ICD-11 extension codes support detailed clinical abstraction and comprehensive classification
Using diagnoses to describe populations and predict costs
Risk adjustment of Medicare capitation payments using the CMS-HCC model