key: cord-0055436-7h2jgz93
authors: Wang, Winston T; Zhang, Charlotte L; Wei, Kang; Sang, Ye; Shen, Jun; Wang, Guangyu; Lozano, Alexander X
title: Clinical longitudinal evaluation of COVID-19 patients and prediction of organ specific recovery using artificial intelligence
date: 2020-12-28
journal: Precis Clin Med
DOI: 10.1093/pcmedi/pbaa040
sha: fec8b40d5f66543f00b4e1299b6da1b995e127e8
doc_id: 55436
cord_uid: 7h2jgz93

Within COVID-19 there is an urgent unmet need to predict at the time of hospital admission which patients will recover from the disease, and how fast they recover in order to deliver personalized treatments and to properly allocate hospital resources so that healthcare systems do not become overwhelmed. To this end we have combined clinically salient CT imaging data synergistically with laboratory testing data in an integrative machine learning model to predict organ-specific recovery of patients from COVID-19. We trained and validated our model in 285 patients on each separate major organ system impacted by COVID-19 including the renal, pulmonary, immune, cardiac, and hepatic systems. To greatly enhance the speed and utility of our model, we applied an artificial intelligence method to segment and classify regions on CT imaging, from which interpretable data could be directly fed into the predictive machine learning model for overall recovery. Across all organ systems we achieved validation set area under the receiver operator characteristic curve (AUC) values for organ-specific recovery ranging from 0.80 to 0.89, and significant overall recovery prediction in Kaplan-Meier analyses. This demonstrates that the synergistic use of an AI framework applied to CT lung imaging and a machine learning model that integrates laboratory test data with imaging data can accurately predict the overall recovery of COVID-19 patients from baseline characteristics.
Moreover, little is known about the organ-specific recovery of COVID-19 symptoms after hospital discharge (18). In general, lung lesion sizes on CT scans of COVID-19 patients increase from initial hospital admission to the beginning of remission (19, 20), but the longitudinal evolution of their lesions and clinical presentation remains unclear. It would be of great clinical utility to characterize and predict recovery within

Cohort statistics

The general scheme of our study design and procedures is described in Figure 1. We prospectively followed 285 COVID-19 patients from January to July of 2020. Patient characteristics are presented in Table 1. These patients were followed up monthly after their initial discharge from the Yichang Central Hospital, where they were first diagnosed with COVID-19. Patients who agreed to participate in this longitudinal study were enrolled and consented to the use of their anonymized medical record data, including demographic information, lifestyle (including smoking and alcohol use), routine physical examination, and clinical laboratory data. We then analyzed these data and established an AI model to predict organ functional recovery (Fig. 1). Demographics and clinical parameters of all the study subjects are provided in Table 1, and the association of each clinical parameter with critical illness is quantified with a t-test p-value.

We trained five organ-specific machine learning models that synergistically used AI-analyzed baseline CT scan data combined with laboratory testing values to predict organ recovery for the renal, pulmonary, immune, hepatic, and cardiac organ systems, where baseline data are defined as the first laboratory and imaging data obtained after admission. The models predicted whether a patient would recover at any point during their follow-up, including after hospital discharge.
In order to ensure that the models were not overfit and that they were generalizable, we then applied these models to a held-out validation set of 87 patients. We achieved strong AUCs

In order to demonstrate the prognostic utility of the organ-specific recovery AI models, we applied our model scores to predict prognosis within the context of a Kaplan-Meier analysis, which is well-suited to the analysis of data with differing follow-up timepoints. We used the models to stratify the patient population into "high model score" and "low model score" groups for each organ-system-specific overall recovery prediction, applied to the pooled training and validation cohorts (Fig. 3). The Kaplan-Meier curves show that patients predicted by the model to recover tend to recover earlier and more consistently than those predicted not to recover. The liver, lung, and coagulation models also demonstrate a statistically significant ability to stratify patients into recovering and non-recovering populations, with the most striking difference coming early in the patients' disease course (Fig. S2).

In order to characterize the relationships between each laboratory test and lesion size, we computed Spearman's rank correlation coefficient across the set of laboratory test values and CT lung imaging feature sizes as determined by the AI imaging pipeline, and then applied a hierarchical clustering algorithm (hclust(), implemented in R using the complete linkage method) to the results (Fig. S3). The laboratory and CT feature values are clustered such that highly correlated features come together and highly anti-correlated features move apart. Two large clusters are formed. The first is located in the lower right portion of the figure and comprises features that are positively correlated with larger lung lesion sizes, and generally indicate a heightened inflammatory state that portends worse clinical outcome.
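The correlation-then-clustering step described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the feature names and values are toy data invented for illustration, the pure-Python `complete_linkage` function only mimics what `hclust()` does with complete linkage, and the dissimilarity 1 - rho is an assumption, since the text does not state which distance was passed to the clustering routine.

```python
import math
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def complete_linkage(names, dist):
    """Agglomerative clustering; dist maps frozenset({a, b}) to a distance.

    The distance between two clusters is the largest pairwise distance
    between their members (complete linkage).
    """
    def cluster_distance(c1, c2):
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Toy values for six hypothetical patients (not study data).
features = {
    "lesion_volume": [1, 2, 3, 4, 5, 6],
    "crp": [2, 1, 4, 3, 6, 5],
    "albumin": [6, 5, 4, 3, 2, 1],
}
names = list(features)
rho = {frozenset((a, b)): spearman(features[a], features[b])
       for a, b in combinations(names, 2)}
dist = {pair: 1.0 - r for pair, r in rho.items()}  # assumed dissimilarity
merges = complete_linkage(names, dist)
print(merges)
```

In this toy example the two positively correlated features merge first and the anti-correlated feature joins last, which is the qualitative behaviour the clustered correlation figure relies on.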
Higher levels of several key features in this cluster are known in the literature to be associated with severe COVID-19 disease, including erythrocyte sedimentation rate (21), C-reactive protein (22), lactate dehydrogenase (23), and blood urea nitrogen (24). Conversely, the cluster in the top left of the figure comprises features for which higher values correspond to more favorable outcomes and smaller lesion size on CT imaging. Several are described in the literature as anti-correlated with COVID-19 severity, including sodium (25), hemoglobin (26), and albumin (27).

To our knowledge, this study is the first to demonstrate organ-specific recovery prediction from baseline imaging and laboratory values in COVID-19 patients. The integrative machine learning models were shown to generalize to a held-out validation dataset, achieving AUCs greater than 0.8 for prediction of recovery in each of the organ systems to which they were applied. In the era of precision healthcare, prediction of recovery using only baseline data is highly valuable to clinicians, as it is critical to understand the longitudinal

In sum, COVID-19 can have critical impacts on key organ systems within the body, and there is an unmet need to predict organ-specific recovery from the impacts of this disease in order to better allocate healthcare resources and provide patients with highly personalized medical treatments. We have trained five organ-specific integrative machine learning models, validated in a held-out test set, that predict recovery from baseline patient imaging and laboratory testing upon hospital admission, and demonstrated AUCs > 0.8 across each organ system to which the models are applied.

This was a retrospective observational study performed at the Yichang Central

if they did not meet any single criterion of recruitment above. CT scans from the 285 patients were collected longitudinally both before and after their hospital discharge (Fig. 1).
Scans were acquired using a Siemens CT scanner with 2-3 mm thick slices. In order to ensure that these images were accurate and usable, first-pass screening was done on all images to filter out low-quality, unreadable, and artifact-heavy scans. These scans were then fed into a segmentation and classification algorithm (19) in order to calculate the volume of different lung lesions such as ground glass opacity (GGO), consolidation, interstitial thickening, pleural effusion, and fibrosis lesions.

Both demographic data and a wide range of laboratory values were collected from each patient. Blood tests were collected many times for each patient, although the specific timing and number of blood tests were variable. Laboratory values that were collected include albumin, C-reactive protein, lactate dehydrogenase, and other salient values (Table 1) (19). We applied the LightGBM machine learning model (28-30)

Patients were categorized by hospital physicians into critically ill and non-critically ill cohorts, with ICU admission or need for mechanical ventilation as the criteria for the critically ill cohort. Each organ or system that we examined in this study (liver, lung, kidney, immune, and coagulation) was associated with a set of observed laboratory values (CT scan values for lung) that were available to us in the study.

(36), and the MELD score for liver disease (37). In all three clinical scoring systems, several markers of disease are aggregated linearly to result in a single score. Similarly in our case, if any of a patient's laboratory values was determined to be abnormal as per guidelines in the UpToDate clinical decision resource (38), that patient's organ function score would increase by 1. Therefore, in the example above, a patient could have a minimum score of 0 and a maximum score of 5. As there was no strong rationale to treat organ markers differently, we weighted them equally within the organ score function.
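The equal-weight scoring rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's code: the marker names and reference ranges are hypothetical placeholders (they are not UpToDate values and not the markers used in the paper), and the choice to score only the markers that were actually measured is also an assumption.

```python
# Hypothetical (low, high) reference ranges for five placeholder markers.
# These numbers are illustrative only; they are NOT clinical reference values.
REFERENCE_RANGES = {
    "marker_a": (10.0, 40.0),
    "marker_b": (0.0, 5.0),
    "marker_c": (35.0, 50.0),
    "marker_d": (100.0, 250.0),
    "marker_e": (0.5, 1.2),
}


def is_abnormal(name, value, ranges=REFERENCE_RANGES):
    """A value is abnormal when it falls outside its reference range."""
    low, high = ranges[name]
    return not (low <= value <= high)


def organ_score(labs, ranges=REFERENCE_RANGES):
    """Equal-weight score: add 1 for every abnormal marker that was measured."""
    return sum(1 for name, value in labs.items()
               if is_abnormal(name, value, ranges))


# Example panel for one hypothetical patient: marker_b and marker_d are
# outside their placeholder ranges, so the score is 2 out of a possible 5.
example_labs = {
    "marker_a": 25.0,
    "marker_b": 12.0,
    "marker_c": 42.0,
    "marker_d": 400.0,
    "marker_e": 0.9,
}
example_score = organ_score(example_labs)
print(example_score)
```

With five markers per organ the score ranges from 0 (every measured value in range) to 5 (every value out of range), matching the bounds given in the text.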
In addition, we did not have sufficient data to stratify organ function classification with more granularity than "normal" and "abnormal". We therefore considered a score of 0 to be normal or recovered, and a score of 1 or above to be abnormal. We then defined a patient as recovered for a given organ when the score for that organ reached zero, i.e. when all the associated laboratory values for the organ had returned to normal. For example, a patient would be considered "liver recovered" if they had follow-up laboratory tests with a liver score of 0. Additionally, we estimated the day of recovery as the number of days after admission on which the first blood test with a score of 0 was taken.

In order to assess the model's performance for each classification task, receiver operator characteristic (ROC) curve analysis was applied. ROC curves are generated by plotting the true positive rate (sensitivity) versus the false-positive rate (1-specificity) across different cut points for the output machine learning model score. An AUC value of 1 indicates perfect performance, whereas an AUC approaching 0.5 indicates performance equivalent to random chance. Sensitivity, specificity, and accuracy were determined by using a cutpoint model score of 0.5.

The organ-specific trained machine learning models were used to stratify patients into two populations, which were allocated into high and low risk groups in a

References

1. A Novel Coronavirus Emerging in China - Key Questions for Impact Assessment
2. A novel coronavirus outbreak of global health concern. The Lancet
3. Importation and Human-to-Human Transmission of a Novel Coronavirus in Vietnam
4. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster
5. A Novel Coronavirus from Patients with Pneumonia in China
6. Olfactory and gustatory dysfunctions as a clinical presentation of mild-to-moderate forms of the coronavirus disease (COVID-19): a multicenter European study
7. Clinical Characteristics of Coronavirus Disease 2019 in China
8. Clinical features of patients infected with 2019 novel coronavirus in Wuhan
9. Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. The Lancet Respiratory Medicine
10. Quantitative CT Analysis of Diffuse Lung Disease
11. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning
12. Use of Chest CT in Combination with Negative RT-PCR Assay for the 2019 Novel Coronavirus but High Clinical Suspicion
13. Assay Techniques and Test Development for COVID-19 Diagnosis
14. Chest CT for Typical Coronavirus Disease 2019 (COVID-19) Pneumonia: Relationship to Negative RT-PCR Testing
15. Outcomes of Cardiovascular Magnetic Resonance Imaging in Patients Recently Recovered From Coronavirus Disease 2019 (COVID-19)
16. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
17. Persistent Symptoms in Patients After Acute COVID-19
18. Long-term Health Consequences of COVID-19
19. Clinically Applicable AI System for Accurate Diagnosis, Quantitative Measurements, and Prognosis of COVID-19 Pneumonia Using Computed Tomography
20. Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study. The Lancet Infectious Diseases
21. Erythrocyte sedimentation rate is associated with severe coronavirus disease 2019 (COVID-19): a pooled analysis
22. C-reactive protein, procalcitonin, D-dimer, and ferritin in severe coronavirus disease-2019: a meta-analysis
23. Lactate dehydrogenase levels predict coronavirus disease 2019 (COVID-19) severity and mortality: A pooled analysis
24. Diagnostic performance of initial blood urea nitrogen combined with D-dimer levels for predicting in-hospital mortality in COVID-19 patients
25. Sodium status and kidney involvement during COVID-19 infection
26. Anemia is associated with severe coronavirus disease 2019 (COVID-19) infection
27. Decreased serum albumin level indicates poor prognosis of COVID-19 patients: hepatic injury analysis from 2,623 hospitalized cases
28. LightGBM: A highly efficient gradient boosting decision tree
29. Consistent individualized feature attribution for tree ensembles
30. A LightGBM-Based EEG Analysis Method for Driver Mental States Classification
31. A Next-generation Hyperparameter Optimization Framework. arXiv preprint
32. Clinical course, treatment, and multivariate analysis of risk factors for pyogenic liver abscess
33. Blood urea nitrogen is elevated in patients with nonalcoholic fatty liver disease
34. Baseline circulating IL-17 predicts toxicity while TGF-beta1 and IL-10 are prognostic of relapse in ipilimumab neoadjuvant therapy of melanoma
35. Multiple Organ Dysfunction Score: A reliable descriptor of a complex clinical outcome
36. The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3)
37. Model for end-stage liver disease (MELD) and allocation of donor livers

The cohort of 285 patients with COVID-19 pneumonia is stratified into groups of critically ill and non-critically ill patients. P values indicate comparison of critically ill patients versus non-critically ill patients and were calculated by independent-samples t-test via

The authors extend their gratitude to Prof. Shan X. Wang of Stanford University for many productive discussions about the project. Charlotte Zhang performed a summer internship at Stanford University during which she participated in this project.