key: cord-0968402-y65qfhno authors: Fecho, Karamarie; Ahalt, Stanley C; Appold, Stephen; Arunachalam, Saravanan; Pfaff, Emily; Stillwell, Lisa; Valencia, Alejandro; Xu, Hao; Peden, David B title: Development and Application of an Open Tool for Sharing and Analyzing Integrated Clinical and Environmental Exposures Data: Asthma Use Case date: 2022-04-01 journal: JMIR Form Res DOI: 10.2196/32357 sha: 6d4f20addaa85b3ec99858984c64111aa8115aa0 doc_id: 968402 cord_uid: y65qfhno BACKGROUND: The Integrated Clinical and Environmental Exposures Service (ICEES) serves as an open-source, disease-agnostic, regulatory-compliant framework and approach for openly exposing and exploring clinical data that have been integrated at the patient level with a variety of environmental exposures data. ICEES is equipped with tools to support basic statistical exploration of the integrated data in a completely open manner. OBJECTIVE: This study aims to further develop and apply ICEES as a novel tool for openly exposing and exploring integrated clinical and environmental data. We focus on an asthma use case. METHODS: We queried the ICEES open application programming interface (OpenAPI) using a functionality that supports chi-square tests between feature variables and a primary outcome measure, with a Bonferroni correction for multiple comparisons (α=.001). We focused on 2 primary outcomes that are indicative of asthma exacerbations: annual emergency department (ED) or inpatient visits for respiratory issues; and annual prescriptions for prednisone. RESULTS: Of the 157,410 patients within the asthma cohort, 26,332 (16.73%) had 1 or more annual ED or inpatient visits for respiratory issues, and 17,056 (10.84%) had 1 or more annual prescriptions for prednisone. 
We found that close proximity to a major roadway or highway, exposure to high levels of particulate matter ≤2.5 μm (PM(2.5)) or ozone, female sex, Caucasian race, low residential density, lack of health insurance, and low household income were significantly associated with asthma exacerbations (P<.001). Asthma exacerbations did not vary by rural versus urban residence. Moreover, the results were largely consistent across outcome measures. CONCLUSIONS: Our results demonstrate that the open-source ICEES can be used to replicate and extend published findings on factors that influence asthma exacerbations. As a disease-agnostic, open-source approach for integrating, exposing, and exploring patient-level clinical and environmental exposures data, we believe that ICEES will have broad adoption by other institutions and application in environmental health and other biomedical fields. Several large-scale initiatives are advancing efforts to reduce barriers surrounding access to patient data maintained in electronic health record (EHR) systems. Relevant initiatives include Columbia Open Health Data [1] and Medical Information Mart for Intensive Care [2] . The common goal is to promote open access to and sharing of patient data for research purposes, while respecting and preserving patient privacy and institutional assurances. As part of the Biomedical Data Translator program (Translator) [3, 4] , supported by the National Center for Advancing Translational Sciences, we have developed a disease-agnostic, regulatory-compliant framework and approach for openly exposing and exploring patient data: the Integrated Clinical and Environmental Exposures Service (ICEES) [5] . ICEES was designed to overcome the regulatory, cultural, and technical challenges that hinder efforts to openly share and explore patient data [6, 7] . 
ICEES differs from similar efforts toward open patient data in that the service provides access to clinical data that have been integrated at the patient level with environmental exposures data derived from a variety of public sources. Thus, ICEES allows for patient-level research in environmental health and related fields. Herein, we describe the further development and application of ICEES to data on a large cohort of patients with a diagnosis of asthma or a related condition. We examine the impact of select airborne pollutant exposures, demographic factors, and socioeconomic exposures on asthma exacerbations, which we define using 2 primary outcome measures: annual emergency department (ED) or inpatient visits for respiratory issues and annual prescriptions for prednisone. We present our findings and compare results for the 2 outcome measures. All study procedures were approved by the Institutional Review Board at the University of North Carolina at Chapel Hill (protocol #16-2978). Informed consent was not required because the study involved only existing biomedical data and no patient contact. ICEES was designed as a disease-agnostic, regulatory-compliant, open platform. For the work described here, we focused on 157,410 patients with asthma or a related pulmonary condition at UNC Health (all available sites). The specific criteria used to select patients for inclusion in the ICEES asthma cohort were adapted from [8] and included a combination of diagnoses, medications, and laboratory measures. (Details can be found in [5].)
Briefly, we captured data on (1) patients with a diagnostic code for "asthma" and prescribed or administered medications that are typically used to treat asthma; (2) patients with a diagnostic code for a respiratory condition other than asthma and prescribed or administered medications that are typically used to treat asthma; (3) patients with a diagnostic code for a pulmonary condition other than asthma but prescribed tests or procedures that are typically used to manage asthma; and (4) patients with a diagnostic code for a respiratory condition other than asthma but with frequent ED visits in which albuterol nebulizer treatments were administered. "ICEES integrated feature tables" are key to the open design of ICEES. These tables were created using a complex custom software pipeline within a secure environment and under a protocol (#16-2978) approved by the Institutional Review Board at the University of North Carolina at Chapel Hill. For data extraction, Clinical Asset Mapping Program for Health Level 7 Fast Healthcare Interoperability Resource (CAMP FHIR) converted patient data from the PCORnet common data model to FHIR files [9] . FHIR Patient data Integration Tool (FHIR PIT) then ingested the FHIR files and integrated the patient data with multiple sources of environmental exposures data, using patient geocodes as reported in the EHR and dates [10] . The exposures data were derived from public sources and included airborne pollutant exposures data from the United States (US) Environmental Protection Agency Fused Air Quality Surface Using Downscaling repository; major roadway or highway exposures data (a proxy for airborne pollutant exposures) from the US Department of Transportation; and socioeconomic exposures data from the US Census Bureau American Community Survey. (Additional information on the sources of environmental exposures data can be found in [11] .) 
After the data were integrated, the resultant ICEES integrated feature tables were stripped of identifiers per the Safe Harbor method outlined in the Health Insurance Portability and Accountability Act (HIPAA) before being exposed with an open application programming interface (OpenAPI). ICEES integrated feature tables were created with respect to 1-year "study" periods, that is, calendar years, to provide a reference point for date-based calculations such as age and estimated exposure. Rows contained binned or recoded data on individual patients, with column headers representing data fields for each of the integrated feature variables. Of note, our institution classifies exposure estimates as "secondary protected health information" because the estimates are derived using primary protected health information (PHI; namely, geocodes and dates) to account for the fact that exposure estimates vary across space and time. We addressed this concern by binning all exposure estimates. The binning strategy that was applied to each feature variable was based on a combination of expert opinion, published literature, and mathematical approaches. Age on day 1 of the 1-year study period was binned using our prior approach [5, 12, 13] : <5, 5-17, 18-44, 45-64, and 65-89 years (89 years being the oldest permissible age per HIPAA). Sex was treated as male or female as coded in the EHR. Multiple race categories were available in ICEES; we focused on Caucasian and African American, as each of the other categories encompassed ≤1% of the total patients. Rural versus urban residence was examined using the US Census Bureau classifications based on American Community Survey-estimated residential density: rural area (<2500 persons per Census block group); urban cluster (between 2500 and 50,000 persons per Census block group); and urbanized area (>50,000 persons per Census block group). 
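The fixed-cutoff binning described above for age and residential density can be sketched in a few lines of Python. This is a minimal illustration and not the ICEES pipeline code: the function names are hypothetical, and the handling of values outside the stated ranges (rejecting ages above 89, and treating exactly 50,000 persons as an urban cluster) is an assumption that the text does not spell out.

```python
from bisect import bisect_right

# Lower bounds of the second through fifth age bins given in the text.
AGE_EDGES = [5, 18, 45, 65]
AGE_LABELS = ["<5", "5-17", "18-44", "45-64", "65-89"]


def bin_age(age_years):
    """Map age in whole years (on day 1 of the study period) to an age bin.

    Assumption: ages are already limited to 0-89 upstream, so anything
    outside that range is rejected rather than silently recoded.
    """
    if age_years < 0 or age_years > 89:
        raise ValueError("age must be between 0 and 89 years")
    return AGE_LABELS[bisect_right(AGE_EDGES, age_years)]


def classify_density(persons_per_block_group):
    """Map estimated residential density to the three categories in the text.

    Assumption: the boundary value 50,000 falls in the urban cluster bin,
    following the wording "between 2500 and 50,000".
    """
    if persons_per_block_group < 2500:
        return "rural area"
    if persons_per_block_group <= 50000:
        return "urban cluster"
    return "urbanized area"


if __name__ == "__main__":
    for age in (3, 12, 30, 50, 70):
        print(age, "->", bin_age(age))
    for density in (1200, 8000, 75000):
        print(density, "->", classify_density(density))
```

Exposing only the bin label, rather than the underlying value, is what allows the estimates to be shared without revealing the continuous values derived from geocodes and dates.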
Estimated probability of no health insurance and estimated median household income were binned using the pandas.qcut function, which bins according to frequencies. We accessed the ICEES OpenAPI through the ICEES Swagger OpenAPI interface and by command-line requests. An ICEES user interface was also available. ICEES was designed to support several functionalities for exploring and displaying the data, including chi-square tests, with counts of patients, chi-square statistics, and probabilities returned to users. In this study, we applied an ICEES functionality that allows users to run multiple chi-square comparisons based on available features and a primary outcome measure, with options to include a correction metric for multiple comparisons or collapse contiguous bins. In all cases, missing data were excluded from analysis. We queried the ICEES OpenAPI for data on all patients included in the asthma cohort and focused on outcomes in year 2016, which was the most recent year available with complete exposures data. We ran separate queries for each of the following primary outcome measures: (1) 1 or more annual ED or inpatient visits for respiratory issues; and (2) 1 or more annual prescriptions for prednisone. Specifically, we asked the following natural language question: "Among all patients within the ICEES asthma cohort, what airborne pollutant exposures, demographic features, and socioeconomic exposures differ significantly between patients with 0 versus 1 or more annual ED or inpatient visits for respiratory issues in year 2016?" The corresponding command-line API request was:

curl -X POST "https://icees.renci.org:16340/patient/2016/cohort/COHORT%3A12/associations_to_all_features" -H "accept: text/tabular" -H "Content-Type: application/json" -d "{\"feature\":{\"TotalEDInpatientVisits\":{\"operator\":\"=\", \"value\":0}},\"maximum_p_value\":1}"

A similar query was used to examine the primary outcome of 1 or more annual prescriptions for prednisone.
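For readers who prefer a scripted request to curl, the same call can be assembled with the Python standard library. The sketch below only builds the request object and does not send it; the helper name is hypothetical, the host, path, headers, and feature name are copied from the curl command above, and whether the endpoint is still reachable at that address is not something this example assumes.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Host and port exactly as they appear in the curl command in the text.
BASE_URL = "https://icees.renci.org:16340"


def build_associations_request(year, cohort_id, feature, operator, value,
                               maximum_p_value=1):
    """Build (but do not send) a POST request for the 1 x N association query."""
    # quote(..., safe="") percent-encodes the colon, so COHORT:12 -> COHORT%3A12.
    url = (f"{BASE_URL}/patient/{year}/cohort/"
           f"{quote(cohort_id, safe='')}/associations_to_all_features")
    payload = {
        "feature": {feature: {"operator": operator, "value": value}},
        "maximum_p_value": maximum_p_value,
    }
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "text/tabular", "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_associations_request(
        2016, "COHORT:12", "TotalEDInpatientVisits", "=", 0
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually submit it: urllib.request.urlopen(request)
```

Keeping the feature name as a parameter means the second query, on annual prednisone prescriptions, would differ only in that argument; the corresponding feature name is not given in the text, so it is not guessed here.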
The exploratory 1 × N feature association functionality available via the ICEES OpenAPI automatically invoked chi-square tests of the association between each available feature and our user-defined primary outcome measure, applying the significance level and multiple-comparison correction that we specified. We considered the primary outcomes of 1 or more annual ED or inpatient visits for respiratory issues and 1 or more annual prescriptions for prednisone. We focused our analysis on select feature variables that were considered a priori to have a potential impact on asthma exacerbations and were available for patients within the asthma cohort: demographic factors (age, sex, and race); socioeconomic exposures (residential density, health insurance access, and median household income); and airborne pollutant exposures (proximity to major roadway or highway, and exposure to PM 2.5 and ozone). We set the significance level at α=.05, which was adjusted by Bonferroni correction to α=.001. A power calculation was not conducted, as this was an observational, exploratory study focused on existing biomedical data. We successfully queried the ICEES OpenAPI for outcomes data for year 2016. Of the 157,410 patients who met the criteria used to define the asthma cohort, 26,332 patients (16.73%) had 1 or more annual ED or inpatient visits for respiratory issues, and 17,056 patients (10.84%) had 1 or more annual prescriptions for prednisone. Table 1 provides additional details on the cohort, including demographic and clinical profile and environmental exposures. We then examined associations between select feature variables and annual ED or inpatient visits for respiratory issues, focusing initially on demographic factors (Figure 1A-C). We found that the percentage of patients with asthma exacerbations was higher among females than males. We also examined associations between socioeconomic exposures and annual ED or inpatient visits for respiratory issues (Figure 2A-C).
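As a brief aside on the statistical approach described above, the chi-square test and Bonferroni adjustment can be sketched as follows. This is an illustration only and not the ICEES implementation: the 2 × 2 counts are invented toy numbers, no continuity correction is applied, and the figure of 50 comparisons is an assumption chosen because α=.05 divided by 50 gives the stated α=.001 (the text does not report the number of comparisons).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and P value for a 2x2 table of counts.

    Uses 1 degree of freedom and no continuity correction.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


if __name__ == "__main__":
    # Toy counts (NOT study data): rows = feature level,
    # columns = outcome present / outcome absent.
    toy_table = [[30, 70], [50, 50]]
    chi2, p = chi_square_2x2(toy_table)
    threshold = bonferroni_alpha(0.05, 50)  # assumed number of comparisons
    print(f"chi-square = {chi2:.3f}, P = {p:.4f}")
    print(f"significant at alpha=.05: {p < 0.05}")
    print(f"significant after Bonferroni (alpha={threshold:.3f}): {p < threshold}")
```

With these toy counts the association passes the uncorrected threshold but not the corrected one, which is the situation the Bonferroni adjustment is meant to guard against when many features are tested against a single outcome.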
We found that the percentage of patients with 1 or more annual ED or inpatient visits for respiratory issues was higher among patients residing in low-density rural areas than among those residing in higher-density urban clusters. We then examined associations between airborne pollutant exposures and annual ED or inpatient visits for respiratory issues (Figure 3A-C). Results for the primary outcome of annual prescriptions for prednisone are shown in Figures 1D-F. We describe the further development and application of ICEES+ to explore select feature variables associated with asthma exacerbations in a large cohort of patients with asthma or a related condition. We focused on select demographic factors, socioeconomic exposures, and airborne pollutant exposures. We compared results for 2 outcome measures that are indicative of asthma exacerbations: annual ED or inpatient visits for respiratory issues and annual prescriptions for prednisone. We found that female sex, Caucasian race, rural residential density, high probability of no health insurance, low estimated median household income, close residential proximity to a major roadway or highway, and exposure to relatively high levels of PM 2.5 or ozone were significantly associated with asthma exacerbations. Moreover, the results were largely consistent across outcome measures, even though rates of annual ED or inpatient visits for respiratory issues were higher than those for annual prednisone prescriptions. Our study has several limitations that should be considered when interpreting the results. Specifically, as an open service that exposes EHR data, ICEES must abide by stringent regulatory and institutional requirements that limit the granularity of data that can be exposed and the statistical capabilities that are supported. For instance, ICEES exposes binned or recoded data, not raw data.
In addition, our institution treats exposure estimates as secondary PHI because they are derived from primary PHI (ie, geocodes and dates); as such, we are unable to reveal the estimated values themselves, only the bins, thus preventing a determination of mean exposures and other statistics based on continuous values. Finally, ICEES currently only supports basic bivariate statistical capabilities. However, we are developing approaches to adapt ICEES to support, in a regulatory-compliant manner, more sophisticated multivariate statistical approaches and machine learning algorithms [15, 16] . We highlight several scientific findings and discuss unexpected findings. First, we observed an increase in the proportion of asthma exacerbations among females versus males. Asthma and acute exacerbations of asthma are more common in males than females in childhood. However, in adulthood, the effect of sex shifts, with females accounting for the majority of asthma and asthma exacerbations. As the majority of patients in our cohort were adults, this observation is consistent with what has been reported in the literature [17] [18] [19] . In addition, the increase in asthma exacerbations among patients with lower median household income and those lacking health insurance reflects established disparities in asthma management, particularly among minorities [20] . However, the increase in the proportion of asthma exacerbations among Caucasians versus African Americans was unexpected and contradicts both our findings [10] and those of other investigators [21] . While the reason for this apparent discrepancy is unclear, several possible explanations exist, including the fact that our institution's racial category of "Caucasian" does not definitively distinguish Hispanic Caucasians from non-Hispanic Caucasians, which may have introduced variability. 
We are currently exploring approaches that may allow us to clearly distinguish Hispanic and non-Hispanic Caucasians and thus refine our racial and ethnic categorization. Another possible explanation is that our prior study focused on year 2010 [10] , whereas this study focused on year 2016, and our institution's demographics and patient catchment area have changed significantly over that period [22] . Second, the relationship between age and asthma exacerbations was U-shaped when based on annual ED or inpatient visits for respiratory issues and linear when based on annual prescriptions for prednisone. We suspect that this difference is due to the heterogeneity of wheezing phenotypes in the younger age range, which can be associated with different long-term prognoses for the development of asthma and variance in the use of oral corticosteroids for disease exacerbation [23] [24] [25] [26] . Third, one of the key features of ICEES is that it supports research on the impact of environmental exposures such as airborne pollutants on health and disease. Indeed, we identified that asthma exacerbations increased with increasing exposure to PM 2.5 and ozone, as we and others have shown [5, 27] . We also found an increase in asthma exacerbations among patients residing in close proximity to a major roadway or highway, as others have found when using roadway exposure as a proxy for airborne pollutant exposure [14, 28, 29] , although the effect in this study was modest. While one might have expected an increase in asthma exacerbations among persons living in densely populated areas, we found the opposite to be true, with increased asthma exacerbations among persons residing in low-density regions classified by the US Census Bureau as rural areas versus higher-density regions classified as urban clusters. We suspect that several factors might explain these findings. 
For instance, UNC Health's patient catchment area draws heavily from rural regions of North Carolina, with multiple clinics and small hospitals located across the state and many patients relying on the state hospital system for health care services. Indeed, not a single patient in the cohort described in this study resided in a region classified by the US Census Bureau as an urbanized area. This may have introduced bias into the results. In addition, we note that many major roadways and highways run through rural parts of our patient catchment area, and so any presumption that close proximity to a major roadway or highway is more common in urban versus rural regions may not be valid. A related point is that rural exposures carry risks that may differ from urban exposures. For instance, we are expanding ICEES to include data on concentrated animal farming operations and landfills so that we can begin to examine exposures that may uniquely impact persons residing in rural regions. We also highlight key technical aspects of this study and discuss limitations. First, the data reported herein are openly available via the ICEES OpenAPI, without any regulatory restrictions or login credentials. This allowed us to rapidly execute the queries and analyze the results, thereby accelerating the speed of discovery. Because ICEES is designed to be disease agnostic and is not restricted to patients with asthma and related conditions, we can adapt our approach and the service itself to support any number of use cases and explore environmental influences on virtually any disease. Indeed, we have deployed additional ICEES instances that expose data on patients with drug-induced liver injury and patients with coronavirus infection. In addition, we are adapting ICEES to support a use case on primary ciliary dyskinesia and related rare pulmonary disorders. Second, by using health care system EHR data, a large and clinically relevant patient sample can be identified. 
In this study, our sample size was approximately 160,000 patients, thus supporting rigorous open statistical analysis. While the statistical tests available via the ICEES+ OpenAPI are currently limited to bivariate analyses, we are developing approaches to support multivariate analyses such as generalized linear models, random forest trees, and causal inference models, with options to control for potential covariates, account for missing data, and examine only those patients who are active in a given year, meaning that they were seen at 1 or more clinics within UNC Health. One significant challenge is the binning approach that is adopted for variables. For instance, automated binning algorithms typically bin data by value or by frequency. The former supports the study of extreme values, but at the expense of evenly distributed bin sizes; the latter supports an even distribution of observations among cells, but at the expense of overlap in patients with equal exposures between bins and bin cutoff points that may not be scientifically meaningful. We are systematically exploring this issue. Our results demonstrate that the open-source ICEES can be used to replicate and extend published findings on factors that influence asthma exacerbations. While we are actively researching the limitations of the service and developing ways to improve it, we believe that ICEES will greatly speed and democratize the use of EHR data to support research and discovery. Moreover, to the best of our knowledge, ICEES is the only open source of clinical data that have been integrated at the patient level with multiple sources of public environmental exposures data. While we have described an application use case focused on asthma, ICEES is disease agnostic. 
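The trade-off between binning by value and binning by frequency, raised in the discussion of binning above, can be made concrete with a small sketch. The functions below are simplified stand-ins written for illustration; they are not pandas.qcut or the ICEES binning code, the rank-based frequency binning is an assumption chosen to make the tie-splitting effect visible, and the input values are invented.

```python
from collections import Counter


def equal_width_bins(values, n_bins):
    """Bin by value: split the observed range into n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Bin by frequency: rank the observations and cut the ranks into equal groups.

    Because cuts are made on rank, observations with identical values can be
    assigned to different bins.
    """
    order = sorted(range(len(values)), key=values.__getitem__)
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank * n_bins // len(values)
    return bins


if __name__ == "__main__":
    # Invented, skewed exposure values with several ties and one extreme value.
    exposures = [1, 1, 1, 1, 2, 3, 4, 100]
    by_value = equal_width_bins(exposures, 4)
    by_frequency = equal_frequency_bins(exposures, 4)
    print("by value:    ", by_value, dict(Counter(by_value)))
    print("by frequency:", by_frequency, dict(Counter(by_frequency)))
```

In this toy input, binning by value isolates the extreme observation but leaves two bins empty, whereas binning by frequency fills every bin evenly but places observations with the same value of 1 in two different bins.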
We expect the service to advance research in environmental health and related fields and continue to grow as we expand both our user base and the service itself to support new clinical use cases, additional EHR elements (eg, laboratory measures), and new data sources (eg, survey data). Moreover, because ICEES is open source, the model and software code [30, 31] can be adopted by other institutions as a novel approach for openly exposing and sharing sensitive data. Indeed, ICEES may have application as an open, privacy-preserving approach to inform decision making by the US Environmental Protection Agency and other federal agencies regarding the patient-level impact of environmental exposures on risk of disease. Finally, we are assessing regulatory-compliant options for applying ICEES as a tool for clinical decision support by identifying patients with asthma (and eventually patients with other chronic diseases) or geographical regions at high risk for poor health outcomes based on their exposures profile and then flagging those patients in their EHR to inform patient care.

[1] Columbia Open Health Data, clinical concept prevalence and co-occurrence from electronic health records. Sci Data
[2] MIMIC-III, a freely accessible critical care database. Sci Data
[3] Biomedical Data Translator Consortium. The Biomedical Data Translator program: conception, culture, and community
[4] Toward a universal biomedical data translator
[5] A novel approach for exposing and sharing clinical data: the Translator Integrated Clinical and Environmental Exposures Service
[6] Biomedical Data Translator Consortium. Clinical data: sources and types, regulatory constraints, applications
[7] Biomedical Data Translator Consortium. Sex, obesity, diabetes, and exposure to particulate matter among patients with severe asthma: scientific insights from a comparative analysis of open clinical data sources during a five-day hackathon
[8] Asthma outcomes: exacerbations
[9] Fast Healthcare Interoperability Resources (FHIR) as a meta model to integrate common data models: development of a tool and quantitative validation study
[10] FHIR PIT: an open software application for spatiotemporal integration of clinical data and environmental exposures data
[11] Translator Exposure APIs: open access to data on airborne pollutant exposures, roadway exposures, and socio-environmental exposures and use case application
[12] Postoperative mortality after inpatient surgery: incidence and risk factors
[13] Anesthesia-related perioperative adverse events during in-patient and out-patient procedures
[14] Toll-like receptor 4 pathway polymorphisms interact with pollution to influence asthma diagnosis and severity
[15] An approach for open multivariate analysis of integrated clinical and environmental exposures data
[16] Open application of statistical and machine learning models to explore the impact of environmental exposures on health and disease: an asthma use case
[17] Factors associated with exacerbations among adults with asthma according to electronic health record data
[18] Sex and gender in asthma
[19] Epidemiology of asthma exacerbations
[20] Asthma Disparities in America: A Roadmap to Reducing Burden on Racial and Ethnic Minorities. Asthma and Allergy Foundation of America
[21] Neighborhood poverty, urban residence, race/ethnicity, and asthma: Rethinking the inner-city asthma epidemic
[22] About Us - History. UNC Health
[23] Asthma and wheezing in the first six years of life
[24] Asthma phenotypes in childhood
[25] Management of preschool recurrent wheezing and asthma: a phenotype-based approach
[26] Research priorities in pediatric asthma: results of a global survey of multiple stakeholder groups by the Pediatric Asthma in Real Life (PeARL) Think Tank
[27] Outdoor PM2.5, ambient air temperature, and asthma symptoms in the past 14 days among adults with active asthma
[28] Near-roadway pollution and childhood asthma: implications for developing "win-win" compact urban development and clean vehicle strategies
[29] Proximity to major roadways and asthma symptoms in the School Inner-City Asthma Study

This work was supported by the National Center for Advancing Translational Sciences within the National Institutes of Health (OT3TR002020, OT2TR003430, UL1TR002489) and the US Environmental Protection Agency (CR 83578501). The sponsors were otherwise not involved in the work. The authors acknowledge the following persons: James Champion for assistance with the patient data; Ann Moss Joyner and Allan Parnell of Cedar Grove Institute for Sustainable Communities for guidance on the ACS data; Patrick Wang and Max Wang of CoVar Applied Technologies for development of the Integrated Clinical and Environmental Exposures Service user interface; and Perry Haaland, Ashok Krishnamurthy, Bo Lan, Priya Sharma, and Meghamala Sinha for helpful discussions and critical comments. We also thank our collaborators within the Biomedical Data Translator Consortium and the Personalized Environment and Genes Study at the National Institute of Environmental Health Sciences.

DBP receives funding from the National Institute of Environmental Health Sciences; the National Institute of Allergy and Infectious Diseases; the National Heart, Lung, and Blood Institute; the US Environmental Protection Agency; and the US Department of Defense. He has been a consultant for GlaxoSmithKline, Teva, and Sanofi.
All other authors declare no potential conflicts of interest.

KF, SCA, and EP conceived the general design of the Integrated Clinical and Environmental Exposures Service (ICEES); KF led the scientific implementation of ICEES; HX led the technical implementation of ICEES; KF and HX developed and implemented the binning strategy and statistical functionalities and also performed quality control testing; KF analyzed the data, prepared the figures and table, and drafted the first version of the manuscript; S Appold contributed the US Census Bureau American Community Survey data; S Arunachalam contributed the US Environmental Protection Agency Fused Air Quality Surface Using Downscaling airborne pollutant data; LS and AV contributed the US Department of Transportation roadway data; EP contributed the UNC Health patient data; LS performed all geocoding; SCA provided project leadership; DBP led the study design and asthma use case, provided scientific rationale, and served as the subject matter expert. All authors reviewed and approved the manuscript for journal submission.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.