key: cord-0840291-1d9trdn0
authors: Tariq, A.; Tang, S.; Sakhi, H.; Celi, L. A.; Newsome, J.; Rubin, D.; Trivedi, H.; Gichoya, J. W.; Banerjee, I.
title: Fusion of Imaging and Non-Imaging Data for Disease Trajectory Prediction for COVID-19 Patients
date: 2021-12-05
journal: nan
DOI: 10.1101/2021.12.02.21267211
sha: d9ad0094f0879a94e26a5d2451f8d9707a51a30f
doc_id: 840291
cord_uid: 1d9trdn0

Purpose: This study investigates whether graph-based fusion of imaging data with non-imaging EHR data can improve prediction of the disease trajectory for COVID-19 patients beyond the performance achievable with imaging or non-imaging EHR data alone.

Materials and Methods: We present a novel graph-based framework for fine-grained clinical outcome prediction (discharge, ICU admission, or death) that fuses imaging and non-imaging information using a similarity-based graph structure. Node features are represented by image embeddings, and edges are encoded with clinical or demographic similarity.

Results: Our experiments on data collected from the Emory Healthcare network indicate that our fusion modeling scheme performs consistently better than predictive models using only imaging or only non-imaging features, with f1-scores of 0.73, 0.77, and 0.66 for discharge from hospital, mortality, and ICU admission, respectively. External validation was performed on data collected from the Mayo Clinic. Our scheme also surfaces known biases in the model predictions, such as bias against patients with a history of alcohol abuse and bias based on insurance status.

Conclusion: The study underlines the importance of fusing multiple data modalities for accurate prediction of the clinical trajectory. The proposed graph structure can model relationships between patients based on non-imaging EHR data, and graph convolutional networks can fuse this relational information with imaging data to predict future disease trajectory more effectively than models employing only imaging or only non-imaging data. Forecasting clinical events can enable intelligent resource allocation in hospitals. Our graph-based fusion modeling framework can easily be extended to other prediction tasks that combine imaging data with non-imaging clinical data.

To successfully perform any clinical prediction task, it is essential to learn effective representations of the various data captured during patient encounters and to model their interdependencies, including patient demographics, diagnostic codes, and radiologic imaging. Graph convolutional neural networks (GCN) offer an intuitive and elegant way of processing multi-modal data organized as a graph structure. Recent work on GCNs has enabled the fusion of various data types while preserving their interdependencies, combining imaging data such as MRI, CT scans, or X-rays with non-imaging electronic health record (EHR) or phenotypic data. The availability of comprehensive public databases such as TADPOLE [1], the Alzheimer's Disease Neuroimaging Initiative (ADNI), and ABIDE, which contain imaging and non-imaging information for patients with Alzheimer's disease and autism spectrum disorder (ASD), respectively, has facilitated the use of GCNs for disease prediction [2, 3, 4, 5, 6], with architectural innovations involving kernel size selection and the use of recurrence. We build on this line of research by applying the GCN model to learn the relationship between imaging and non-imaging data and by incorporating holistic weighted edge formation based on patient clinical history and demographic information.
We propose a novel framework for fine-grained clinical event prediction for COVID-19 patients based on GCN-inspired models. The literature on predictive modeling for COVID-19 patients has largely focused either on diagnosis by processing imaging data (chest X-rays) through deep convolutional neural networks (CNN) [7-12] or on mortality prediction using clinical risk factors [13-17]. In contrast, we propose a graph-based framework (see Figure 4) for predictive modeling of the disease trajectory during a patient's hospitalization. Our framework includes multi-stage predictive models for clinical events such as discharge from the hospital, ICU admission, and mortality. While most previous work has focused on either imaging data or hand-picked clinical factors [15, 16, 18-20], our framework fuses chest radiographs with non-imaging EHR data to capture patient similarity based on clinical history and demographic factors. While a pre-trained model was used for chest X-ray featurization, we developed innovative feature engineering schemes to model the sparse information on patients' past medical procedures and diagnosed illnesses. Our relational graph-based modeling allows the prediction process to gather cues from similar cases, i.e., patients with similar demographic features and medical histories. Our experiments clearly indicate the merits of this fusion modeling in comparison to models employing only imaging or only non-imaging features. Because biases based on demographic features like race, socio-economic indicators like insurance status, or alcohol use may affect clinical decision making regarding discharge from the hospital or admission to the ICU [21, 22], we also investigated disparity in the predictions made by the proposed framework.

We developed a graph-based fusion AI model to predict the COVID-19 disease trajectory using multi-modal patient data, including radiology imaging data, demographic information, and clinical history (Figure 1).

Cohort description: Following approval by the Emory Institutional Review Board (IRB), we collected all chest X-rays of patients with at least one positive RT-PCR test for COVID-19 performed in 12 centers of the Emory Healthcare network from January 2020 to December 2020. As shown in Figure 2, the data consisted of 47,555 chest X-rays belonging to 23,831 unique patients. We only considered posteroanterior (PA) and anteroposterior (AP) views of chest radiographs acquired during the period of hospitalization. We also collected hospital admission, mortality, and discharge data from the hospital billing system. Patients who were never admitted, or who were admitted but did not receive X-rays at regular intervals (at minimum every three days), were discarded. This left 2,201 unique patients corresponding to 2,983 hospitalizations for which chest X-rays were obtained at least every 3 days (7,813 total chest X-rays). Because some patients were hospitalized multiple times during 2020, we applied a patient-wise split of the train and test sets to avoid data leakage, as sketched below.
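As a minimal sketch of the patient-wise split just described (an illustration under stated assumptions, not the study's actual code), scikit-learn's GroupShuffleSplit keeps all exams of a patient in the same partition; the DataFrame `exams` and its `patient_id` column are hypothetical names introduced here.

```python
# Hedged sketch: patient-wise train/test split that prevents leakage when the
# same patient has multiple exams. `exams` and `patient_id` are hypothetical.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

def patient_wise_split(exams: pd.DataFrame, test_size: float = 0.2, seed: int = 0):
    """All chest X-rays of one patient land in the same partition, so no
    patient-specific signal leaks from train into test."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(exams, groups=exams["patient_id"]))
    return exams.iloc[train_idx], exams.iloc[test_idx]
```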
We considered each chest X-ray exam date and time as a timepoint for predicting future clinical events and stratified these events by severity: discharge from the hospital within 3 days, admission to the ICU, or death. There were 929 patients with 4,850 chest X-rays after which the patient remained admitted to the hospital for more than three days. Of these, 380 patients corresponding to 543 chest X-rays were admitted to the ICU within 3 days. There were 1,754 patients corresponding to 2,963 chest X-rays who were discharged from the hospital within 3 days, and 220 patients corresponding to 502 chest X-rays who died within 3 days. The selection process for this cohort is shown in Figure 2. Outcome data for this cohort were collected from hospital billing records and warehouse EHR data sources. Our goal was to predict adverse events as early as possible to give clinicians sufficient decision-making time. In preliminary experiments we evaluated prediction windows from 3 up to 7 days; given the unbalanced dataset, the 3-day window was the most prevalent across the different prediction labels and provided a rather balanced distribution of discharged vs. not-discharged labels (Table 1).

Multimodal encounter data description and feature engineering: To fulfill the aim of multi-modal data fusion, we tested different approaches to handling imaging and non-imaging data.

Non-imaging EHR data: We extracted the current and historic EHR data from 2020 for all patients in our cohort.

Demographic information: Gender (male/female), self-reported race (African American, Caucasian, Native Hawaiian or Other Pacific Islander, Asian, American Indian or Alaska Native, Multiple, Unknown), ethnic group (Hispanic or Latino, Non-Hispanic or Latino, Unknown), and age at the time of admission (binned in 10-year intervals). All features were one-hot encoded.

Current Procedural Terminology (CPT) codes: A CPT code is a five-digit procedure code that reports medical, surgical, and diagnostic procedures and services to entities such as physicians, health insurance companies, and accreditation organizations. CPT codes are maintained and grouped in a hierarchical structure by the American Medical Association (AMA). Each CPT code was reduced to a higher-order parent category based on the defined hierarchy (https://www.aapc.com/codes/cpt-codes-range/; details available in the supplementary material). We selected all 21 groups with more than 1,000 occurrences.

Comorbidities: Past and current diagnoses of patients are structured as International Classification of Disease, 9th edition (ICD-9) codes, which we grouped based on their hierarchical structure [22] (details available in the supplementary material). We selected all 29 groups occurring more than 5,000 times in the data.

Like a CNN, a GCN learns to generate a vector embedding of each node optimized for the downstream prediction task. While a CNN defines the 'neighborhood' of an instance based on spatial proximity, such as neighboring pixels in an image, a GCN allows the neighborhood to be defined by edges in a graph structure. Edges can be defined by any relevant similarity metric or interconnection between nodes, providing extreme flexibility for modeling complex clinical scenarios. The node embeddings generated by a GCN depend both on the node representation itself and on the neighborhood of edge-connected nodes.
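The following is a minimal sketch of the EHR feature construction described above, under illustrative assumptions: the study's actual 21 CPT and 29 ICD-9 groups come from the AMA hierarchy and the grouping in [22] (see the supplementary material), and the group names and `patient` dictionary layout here are hypothetical.

```python
# Hedged sketch of the one-hot / multi-hot EHR feature vector. Group lists and
# the `patient` dict layout are illustrative assumptions, not the paper's code.
import numpy as np

AGE_BIN_COUNT = 11  # 10-year bins: 0-9, 10-19, ..., 100+
SEXES = ["male", "female"]
RACES = ["African American", "Caucasian", "Native Hawaiian or Other Pacific Islander",
         "Asian", "American Indian or Alaska Native", "Multiple", "Unknown"]
CPT_GROUPS = [f"cpt_group_{i}" for i in range(21)]   # 21 CPT parent categories
ICD_GROUPS = [f"icd_group_{i}" for i in range(29)]   # 29 ICD-9 comorbidity groups

def ehr_feature_vector(patient: dict) -> np.ndarray:
    """Concatenate one-hot demographics with multi-hot CPT/ICD group flags.
    Ethnicity would be handled analogously to race (omitted for brevity)."""
    age = np.zeros(AGE_BIN_COUNT)
    age[min(patient["age"] // 10, AGE_BIN_COUNT - 1)] = 1.0
    sex = np.array([patient["sex"] == s for s in SEXES], dtype=float)
    race = np.array([patient["race"] == r for r in RACES], dtype=float)
    cpt = np.array([g in patient["cpt_groups"] for g in CPT_GROUPS], dtype=float)
    icd = np.array([g in patient["icd_groups"] for g in ICD_GROUPS], dtype=float)
    return np.concatenate([age, sex, race, cpt, icd])
```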
In our graph design, chest X-ray embeddings (1024-dimensional vectors) are used as nodes, and edges are determined by similarity in the patients' demographic information or medical history (see Figure 3). Our GCN therefore supports 'neighborhoods' based on clinical and/or demographic similarity between patients and makes predictions for future clinical events by considering trends among similar patients. For edge formation, we encoded an EHR feature vector for each node (chest radiograph) based on one-hot encoding of the EHR information of the patient corresponding to that radiograph. Edge formation only employs retrospective data collected before the chest radiograph exam date, to avoid any data leakage. The cosine similarity between the EHR feature vectors of two nodes determines whether an edge connects them, and the edge weight encodes the strength of the similarity.

Many GCN variations are designed for transductive learning: they can only process the graph structures used during training and cannot generalize to unseen nodes or new graph structures. To avoid this limitation, we used the SAmple and aggreGatE graph convolutional network (GraphSAGE) [25], which optimizes a sample-and-aggregate function to collect 'messages' from neighboring nodes while generating the vector embedding of a node. At inference time, GraphSAGE applies the optimized aggregate function to generate embeddings for unseen nodes in unseen graph structures. GraphSAGE-based prediction models can therefore reason inductively, assigning predictive labels to unlabeled nodes by learning from the labeled nodes in the graph.

The class distribution was highly unbalanced, especially for Model-2 and Model-3 (Figure 2). We employed undersampling of the majority label and a weighted loss to tackle this challenge. In this framework, the disease trajectory is predicted through events indicating a worsening (ICU admission, mortality) or improving (discharge from hospital) prognosis. A patient is evaluated at every chest radiograph taken during the hospital stay.

Model comparison and statistical evaluation: Evaluation of the proposed modeling framework focuses on two aspects: (1) the performance of the fusion model compared with models using a single modality (either imaging or non-imaging data); and (2) the comparative effectiveness of different EHR data sources for defining the graph structure. We also designed a Random Forest classifier that uses the Pulmonary X-ray Severity (PXS) score computed by a CNN [26] to predict clinical events. As the PXS score is derived from deep-learning processing of imaging data, this comparison establishes the benefit of the imaging features chosen for our relational graph. A sketch of the graph construction and a GraphSAGE classifier follows.
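The sketch below illustrates the cosine-similarity edge construction and a GraphSAGE node classifier using PyTorch Geometric. The similarity threshold, hidden width, and two-layer depth are assumptions for illustration, not the paper's reported hyperparameters; note also that plain mean-aggregation SAGEConv does not consume edge weights, so the weight tensor is returned only to mirror the weighted-edge description above.

```python
# Hedged sketch: similarity-based edges over image-embedding nodes, plus a
# small GraphSAGE classifier. Threshold and sizes are illustrative assumptions.
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

def build_edges(ehr: torch.Tensor, threshold: float = 0.9):
    """Connect two nodes when the cosine similarity of their EHR feature
    vectors exceeds `threshold`; the similarity value is the edge weight."""
    unit = F.normalize(ehr, dim=1)
    sim = unit @ unit.T
    sim.fill_diagonal_(0)                      # drop self-similarity
    src, dst = (sim > threshold).nonzero(as_tuple=True)
    edge_index = torch.stack([src, dst])       # shape [2, num_edges]
    return edge_index, sim[src, dst]           # weights unused by plain SAGEConv

class SageClassifier(torch.nn.Module):
    """Two GraphSAGE layers over 1024-d chest X-ray embeddings, then a
    binary classification head (one such model per clinical event)."""
    def __init__(self, in_dim: int = 1024, hidden: int = 256, n_classes: int = 2):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden)
        self.conv2 = SAGEConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, n_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        return self.head(x)                    # logits per node (radiograph)
```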
We report model performance in terms of the Area Under the Receiver Operating Characteristic curve (AUROC), precision, recall, and f1-score on a held-out test set from the Emory University cohort, as well as on an external test set composed of Mayo Clinic patients.

Quantitative performance: Our framework consists of three binary prediction models: Model-1 (discharge from hospital), Model-2 (mortality), and Model-3 (ICU admission). Tables 2-4 present results for these three prediction models, evaluated using class-wise and aggregated (weighted-average) precision, recall, and F-score with 95% confidence intervals on a randomly held-out set of test samples; Figure 5 shows the corresponding receiver operating characteristic (ROC) curves. The tables compare four predictive models: (1) EHR-only: a Random Forest model using all EHR data sources as input (demographics, CPT groups, ICD-9 groups); (2) PXS-only: a Random Forest model using only the PXS score derived from the images; (3) Image-only: softmax classification by the pretrained DenseNet-121 using the chest X-ray as input; and (4) GraphSAGE with XX: a GraphSAGE network whose graph structure is based on one of the EHR sources (XX), namely demographics, CPT groups, or ICD-9 groups.

For prediction of both discharge from hospital and mortality, GraphSAGE with CPT achieved the highest performance, surpassing the single-modality baselines (Image-only and EHR-only), while the PXS-only model achieved suboptimal performance. GraphSAGE with demographics and GraphSAGE with ICD did not provide a significant boost over the baselines on these two tasks. However, GraphSAGE with ICD was the best-performing model for ICU admission prediction, surpassing both the single-modality models and the PXS-only model. Overall, the results demonstrate that graph-based models raise performance beyond that of any individual data source for all target prediction tasks. The performance tables report p-values, computed by a statistical t-test against the best-performing model at the corresponding decision point, to assess the statistical significance of performance differences.

Model bias analysis: Interestingly, the GraphSAGE models trained on demographic similarity do not surpass the performance of the baseline models, even for ICU admission and mortality. We used the Aequitas toolkit [27] to analyze model disparity, focusing on the false positive rate (FPR) and false omission rate (FOR). Figure 6 shows the FPR and FOR disparity plots based on alcohol use, race, and insurance status for discharge (Model-1) and ICU admission (Model-3).
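The paper runs its disparity audit with the Aequitas toolkit; as a transparent stand-in, the sketch below computes the two audited quantities, per-group FPR and FOR, directly with pandas. The column names `y_true` and `y_pred` and the grouping column are hypothetical.

```python
# Hedged sketch of the per-subgroup fairness quantities shown in Figure 6:
# false positive rate FPR = FP / (FP + TN) and false omission rate
# FOR = FN / (FN + TN). Column names are illustrative assumptions.
import pandas as pd

def group_disparity(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Per-group FPR and FOR from binary columns `y_true` and `y_pred`."""
    rows = {}
    for group, g in df.groupby(group_col):
        fp = int(((g["y_pred"] == 1) & (g["y_true"] == 0)).sum())
        fn = int(((g["y_pred"] == 0) & (g["y_true"] == 1)).sum())
        tn = int(((g["y_pred"] == 0) & (g["y_true"] == 0)).sum())
        rows[group] = {"FPR": fp / max(fp + tn, 1),   # among actual negatives
                       "FOR": fn / max(fn + tn, 1)}   # among predicted negatives
    return pd.DataFrame(rows).T
```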
Following the proposed strategy, we built graphs based on demographics (age, gender, race, and ethnicity) and on the common ICD-10 and CPT codes of this cohort, and we tested our GraphSAGE models, as well as the image-only and EHR-only models, on all prediction labels. Their performance is reported in Tables 2-4. Graph-based models show higher and more consistent generalization capability than the baseline models (image-only and EHR-only), achieving better F-scores for all three tasks; the baselines achieve a higher AUROC only for mortality prediction. It appears that elements of clinical history, such as past medical procedures and comorbidities, play a larger role in clinical event prediction than demographic features like age, gender, and race. Although past medical procedures often produced better performance than coded comorbidities, past procedures are themselves correlated with specific comorbidities. In addition to the single-modality models, we also compared the different GraphSAGE models against a model trained on the PXS score [26], where PXS was computed directly from the X-ray images by a CNN pretrained on ~160,000 images from CheXpert with transfer learning on 314 CXRs from patients with COVID-19. The GraphSAGE models also outperformed the PXS-only model by a statistically significant margin.

We further validated GraphSAGE against the individual modalities at an external institution where the comorbidities common in our internal training data (diabetes, renal disease) are rare (<20% versus >55% internally) and, as a consequence, the corresponding procedures are performed less frequently. Under such distribution shift, machine learning models often fail to generalize [28]; graph-based modeling, however, is flexible enough to handle the shift by updating the similarity definition used for edge generation, producing comparable results on the external data. Despite the encouraging performance, the bias analysis shows that the model predictions for discharge and ICU admission are biased with respect to race and current insurance status. This likely reflects clinical practice, as multiple past studies have highlighted biases in critical care services and discharge delays based on insurance status [29, 30, 31, 32, 33, 34].

Limitations of the study: The proposed framework may be limited in its scope of application, as it requires imaging data collected at regular intervals and was trained on data from a highly integrated academic healthcare system. The prediction interval is also limited to 3 days, which is nonetheless longer than in most past studies [18, 19]. Because the two-fold information fusion in GraphSAGE involves mathematically irreversible computations, traditional model interpretation techniques are not directly applicable, making it hard to explain the reason behind each node-level prediction.
We proposed a graph-based framework that preserves the interdependencies of multi-modal data, i.e., EHR and radiologic images, to predict future clinical events (e.g., discharge, ICU admission, and mortality) within a 3-day window for the inpatient population testing positive for COVID-19. During graph-based learning, in theory, the two-fold information fusion of node features and graph structure ensures that the features (nodes and graph structure) relevant to the target task are amplified while non-relevant attributes are suppressed. To our knowledge, this is the first attempt to encode and learn patient-wise similarity alongside imaging data using a GCN model and to apply it to predictive modeling for hospital resource optimization. Our experiments clearly establish the superiority of relational graph-based fusion modeling over single-modality predictive models, in terms of both prediction performance and adaptation to different populations.
[1] TADPOLE Challenge: Prediction of Longitudinal Evolution in Alzheimer's Disease.
[2] Disease prediction using graph convolutional networks: application to autism spectrum disorder and Alzheimer's disease.
[3] Using DeepGCN to identify the autism spectrum disorder from multi-site resting-state data.
[4] InceptionGCN: receptive field aware graph convolutional network for disease prediction.
[5] Going deeper with convolutions.
[6] Multiple-graph recurrent graph convolutional neural network architectures for predicting disease outcomes.
[7] COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images.
[8] COVID-ResNet: A deep learning framework for screening of COVID19 from radiographs.
[9] COVIDiagnosis-Net: Deep Bayes-SqueezeNet based diagnosis of the coronavirus disease 2019 (COVID-19) from X-ray images. Medical Hypotheses.
[10] COVID-CAPS: A capsule network-based framework for identification of COVID-19 cases from X-ray images.
[11] Automated detection of COVID-19 cases using deep neural networks with X-ray images. Computers in Biology and Medicine.
[12] CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest X-ray images.
[13] Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. The Lancet.
[14] Clinical Characteristics, Associated Factors, and Predicting COVID-19 Mortality Risk: A Retrospective Study in Wuhan.
[15] Machine Learning to Predict Mortality and Critical Events in COVID-19 Positive New York City Patients: A Cohort Study.
[16] Development and validation of a machine learning-based prediction model for near-term in-hospital mortality among patients with COVID-19.
[17] Federated Learning of Electronic Health Records Improves Mortality Prediction in Patients Hospitalized with COVID-19. medRxiv.
[18] Development and Prospective Validation of a Transparent Deep Learning Algorithm for Predicting Need for Mechanical Ventilation. medRxiv.
[19] Using Machine Learning to Predict ICU