key: cord-0914264-21m8m660 authors: Zhang, Mudan; Zeng, Xianchun; Huang, Chencui; Liu, Jun; Liu, Xinfeng; Xie, Xingzhi; Wang, Rongpin title: An AI-based radiomics nomogram for disease prognosis in patients with COVID-19 pneumonia using initial CT images and clinical indicators date: 2021-08-10 journal: Int J Med Inform DOI: 10.1016/j.ijmedinf.2021.104545 sha: 33768bc87c400d8819ba5bee4d4aa32c97571441 doc_id: 914264 cord_uid: 21m8m660
This study used a comprehensive nomogram to evaluate the prognosis of patients with COVID-19 pneumonia. Methods: COVID-19 pneumonia data were divided into a training set (256 of 321, 80%), an internal validation set (65 of 321, 20%) and an independent external validation set (n = 188). After image processing, lesion segmentation, feature extraction and feature selection, radiomics signatures and clinical indicators were used to develop a radiomics model and a clinical model, respectively. A radiomics nomogram was then built by combining radiomics signatures and clinical indicators. The performance of the proposed models was evaluated by the area under the receiver operating characteristic curve (AUC). Calibration curves and decision curve analysis were used to assess the performance of the radiomics nomogram. Results: Two clinical indicators (age and chronic lung disease or asthma) and 21 radiomics features were selected to build the radiomics nomogram. The radiomics nomogram yielded an AUC of 0.88 and an accuracy of 0.80 in the training set, an AUC of 0.85 and an accuracy of 0.77 in the internal validation set, and an AUC of 0.84 and an accuracy of 0.75 in the independent external validation set. In the independent external validation set, the radiomics nomogram performed better than the clinical model (AUC = 0.77, p < 0.001) and the radiomics model (AUC = 0.72, p = 0.025). Conclusions: The radiomics nomogram may be used to assess the deterioration of COVID-19 pneumonia.
some patients (5) (6).
However, there was no clear evidence showing how RT-PCR results, clinical symptoms and laboratory tests correlated with the severity of COVID-19. In China, CT findings were used as criteria for the clinical diagnosis of COVID-19 in some areas because CT has a higher sensitivity than RT-PCR for detecting COVID-19 pneumonia (7) (8) (9). A few studies reported that chest CT could accurately locate lesions and reflect the severity of, and changes in, the lesion area during the course of the disease (10) (11). Yuan et al. published a simple CT scoring method that predicts mortality based on the association between radiologic findings and mortality in patients with COVID-19 (12). However, using CT scans to understand disease prognosis has limitations. First, evaluating disease severity on routine CT images relies on radiologists' expertise. Second, the number and appearance of the different types of lesions on chest CT images are often varied and irregular (13). Third, CT images may appear normal during early infection, or abnormal even in the absence of symptoms (10). Thus, more research is needed to understand how CT findings correlate with the severity and progression of the disease (14). Artificial intelligence (AI) technology has been used to improve the efficiency of clinicians in radiology. A recent study showed that AI surpassed human-level performance in the automatic detection of lung diseases during the COVID-19 outbreak (15). Zheng et al. developed a deep learning-based model for the automatic detection of COVID-19 lesions on chest CT that attained AUC values of 0.95-0.97 (16). Another study reported that AI systems performed well in the diagnosis, quantitative measurement and prognosis of COVID-19 pneumonia (17).
Our study developed an AI-based radiomics nomogram to assess the prognosis of COVID-19 pneumonia by integrating radiomics signatures from initial CT images with clinical indicators. We hope that the radiomics nomogram can help hospital teams manage the COVID-19 epidemic, especially in hospitals with a shortage of medical resources. Data of patients with COVID-19 pneumonia in the training and internal validation sets were collected from the Radiology Quality Control Center database of Hunan Province, Optics Valley Hospital of Hubei Province (18) and four hospitals in Guizhou Province. Data of patients in the independent external validation set were collected from Huoshenshan Hospital, China. This multicenter study was approved by the ethics committees of all participating hospitals (2020, NO.01, listed in the supplements). Because of the retrospective nature of the study, the requirement for informed consent was waived. The study was performed in accordance with the principles of the Declaration of Helsinki. Figure 1 shows the workflow of our study. The study included patients with confirmed COVID-19 pneumonia who were hospitalized between January 12 and April 30, 2020. In the training and internal validation sets, 185 patients had moderate pneumonia and 136 had severe pneumonia. In the independent external validation set, 101 patients had moderate pneumonia and 87 had severe pneumonia. The diagnosis and clinical classification of each patient were confirmed by two associate chief physicians based on electronic health records (EHRs), the laboratory information system and the Diagnosis and Treatment Protocol for Novel Coronavirus Pneumonia (Trial Version 7, listed in the supplements). When the two physicians disagreed, a chief physician made the final diagnosis, which was used as the gold standard. Patients with a normal initial chest CT and patients with mild symptoms (19) were excluded.
All images were non-contrast chest CT images retrieved from the Picture Archiving and Communication System (PACS) and reconstructed at a slice thickness of 1.00 mm. Details of the CT characteristics are listed in Supplementary Table 1. We used the chest CT images acquired within four days of the initial diagnosis (20), together with the clinical features. Clinical examination and CT scanning were completed within 24 hours after admission. If a CT scan or examination was performed more than once, we chose the one closest to the initial diagnosis. Missing data were imputed using the median of the variable distribution. Before any data preprocessing and model construction, the data of 321 patients were randomly split into a training set (80%, n = 256) and an internal validation set (20%, n = 65). The data of 188 patients from Huoshenshan Hospital were used as the independent external validation set. Supplementary Figure 1 shows the inclusion and exclusion criteria. Secondly, features with a missing rate higher than 10% were excluded. For features with a missing rate below 10%, missing values were replaced with the mean value computed on the training set. The same steps were applied to the internal and independent external validation sets. First, region of interest (ROI) volumes were segmented by an automated segmentation architecture based on three deep learning algorithms. The accuracy of the automatic segmentation was evaluated before image segmentation. B-spline interpolation resampling was used to normalize the voxel size, resampling anisotropic voxels to isotropic voxels of 1.0 mm × 1.0 mm × 1.0 mm. Radiomic features were extracted using PyRadiomics. Six common feature groups were extracted from the original images.
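The data split and missing-value handling described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the use of `None` for a missing value, the fixed random seed, and treating a missing rate of exactly 10% as acceptable are all hypothetical choices. The sketch follows the training-set mean imputation described for features, and it rounds the validation size up, which reproduces the 256/65 split of 321 patients (the same convention scikit-learn's `train_test_split` uses for a float `test_size`).

```python
import math
import random
from statistics import fmean


def split_train_validation(records, test_fraction=0.2, seed=0):
    """Randomly split records into (training, validation) lists.

    The validation size is rounded up, as scikit-learn's train_test_split
    does for a float test_size. The seed is an illustrative assumption."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_missing_value_policy(train_rows, max_missing_rate=0.10):
    """Return {feature: training-set mean} for every feature whose missing
    rate in the training set does not exceed max_missing_rate.

    Rows are dicts; None marks a missing value (an assumption)."""
    kept = {}
    for name in train_rows[0]:
        observed = [row[name] for row in train_rows if row[name] is not None]
        missing_rate = (len(train_rows) - len(observed)) / len(train_rows)
        if observed and missing_rate <= max_missing_rate:
            kept[name] = fmean(observed)
    return kept


def apply_missing_value_policy(rows, kept_means):
    """Drop excluded features and fill gaps with the training-set means."""
    return [
        {name: row[name] if row.get(name) is not None else fill
         for name, fill in kept_means.items()}
        for row in rows
    ]
```

Fitting the policy on the training rows only and then calling `apply_missing_value_policy` on the internal and external validation rows keeps validation data from influencing the imputed values.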
The six feature groups were first-order features, shape features, gray-level co-occurrence matrix (GLCM), gray-level run length matrix (GLRLM), gray-level size zone matrix (GLSZM) and gray-level dependence matrix (GLDM) features (21). Next, the training set was standardized with a standard scaler (z-score normalization), and the scaler fitted on the training set was applied to the two validation sets. Because high-dimensional features were extracted, we performed feature dimension reduction to select the most relevant features. In addition, intra-class correlation coefficients (ICCs) were used to evaluate intra-observer agreement (repeated measurements by the same observer) and inter-observer agreement (measurements of the same quantity by different observers) (Supplementary Figure 2). Features with both intra- and inter-observer ICCs > 0.75 were considered to have satisfactory agreement and were selected for further analysis. Next, a univariable selection method, K-best, was employed (22). This method ranks features by the ANOVA F-value between the label and each feature and keeps the K highest-scoring ones; features with a significant difference (p < 0.05) were selected. The least absolute shrinkage and selection operator (LASSO) algorithm was then used to select the most informative image features and avoid the "curse of dimensionality". After feature extraction and selection, a logistic regression (LR) model was trained with a five-fold cross-validation strategy to construct a radiomics model for disease prognosis. All of these steps, including feature extraction, selection and model construction, were performed on the training set and then applied to the internal and independent external validation sets. We used univariate analysis to assess the relationship between clinical factors, serum biomarkers and disease outcome.
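Two of the radiomics preprocessing steps above, z-score standardization fitted on the training set only and ranking features by their ANOVA F-value, can be sketched without dependencies. This is what scikit-learn's `StandardScaler` and `SelectKBest` paired with `f_classif` compute; the helper names here are hypothetical. The sketch returns only the F statistic (the p-value used in the study would additionally require the F distribution), and the ICC filter and LASSO steps are not shown.

```python
from statistics import fmean, pstdev


def fit_standard_scaler(train_column):
    """Learn z-score parameters on the training set only.

    Uses the population SD, as scikit-learn's StandardScaler does;
    a constant column gets scale 1 to avoid dividing by zero."""
    mu = fmean(train_column)
    sigma = pstdev(train_column)
    return mu, (sigma if sigma > 0 else 1.0)


def apply_standard_scaler(column, params):
    """Apply training-set parameters to any set (training or validation)."""
    mu, sigma = params
    return [(x - mu) / sigma for x in column]


def anova_f_value(feature, labels):
    """One-way ANOVA F statistic of one feature across the label groups."""
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(y, []).append(x)
    n, k = len(feature), len(groups)
    grand_mean = fmean(feature)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum(sum((x - fmean(g)) ** 2 for x in g) for g in groups.values())
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(features, labels, k):
    """Names of the k features with the highest ANOVA F-values."""
    scores = {name: anova_f_value(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A feature whose values separate the moderate and severe groups cleanly gets a large F-value and is ranked first; a feature whose group means barely differ relative to its within-group spread is ranked lower.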
Clinical factors and serum biomarkers with p < 0.05 in the univariate analysis were entered into a multivariable logistic regression analysis to select a combination of clinical factors and serum biomarkers. We then built a clinical model with the selected clinical indicators to predict disease prognosis. A radiomics nomogram combining the radiomic features and the clinical indicators was constructed using multivariable logistic regression in the training set. To detect multicollinearity among the variables in the radiomics nomogram, we calculated the variance inflation factor (VIF) for each variable. The radiomics nomogram was then verified in the validation sets. Calibration curves and the Hosmer-Lemeshow test were used to assess the agreement between predicted risks and actual outcomes. Finally, decision curve analysis (DCA) was used to evaluate the net benefit of the radiomics nomogram. Before modelling, differences in clinical factors and serum biomarkers between the moderate and severe pneumonia groups were assessed using the Mann-Whitney U test or the independent t-test for continuous variables and the chi-square test for categorical variables (SPSS for Windows, v.20.0; Chicago, IL). A p-value < 0.05 was considered statistically significant. The area under the receiver operating characteristic (ROC) curve (AUC) with its 95% confidence interval (95% CI) was used to evaluate model performance, and accuracy was calculated to assess prediction performance. Differences in AUC between models were assessed with the DeLong test. The Youden index was used to classify patients into high-risk and low-risk groups. A total of 509 patients were included in this study: 321 in the training and internal validation sets and 188 in the independent external validation set.
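The ROC-based metrics and the decision-curve quantity described above can be written out directly. This sketch assumes labels coded 1 for severe and 0 for moderate pneumonia and calls a case high-risk when its score is at or above the threshold; those conventions and the function names are assumptions, and the DeLong test and 95% CIs are not implemented. In practice scikit-learn's `roc_auc_score` and `roc_curve` provide the ROC computations. Net benefit uses the standard decision-curve formula NB = TP/n - (FP/n) × pt/(1 - pt) at threshold probability pt.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen positive (severe) case
    scores higher than a randomly chosen negative (moderate) case.
    Ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Threshold t maximizing J = sensitivity + specificity - 1,
    where a case is called high-risk when score >= t."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # keep the lowest threshold among ties
            best_t, best_j = t, j
    return best_t, best_j


def net_benefit(probs, labels, threshold):
    """Decision-curve net benefit of treating cases with risk >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(labels, threshold):
    """Net benefit of the treat-all strategy (treat-none is always 0)."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

On a decision curve, the model is preferred at a given threshold when its net benefit exceeds both the treat-all curve and zero (treat-none); the Youden cutoff is chosen on the training set and then reused unchanged on the validation sets.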
Patient characteristics in the training and internal validation sets are listed in Table 1. No significant differences were observed between the training and internal validation sets in age (p = 0.890) or sex (p = 0.214). All patients were Asian. High-risk heart conditions (including hypertension, hyperglycemia, and dyslipidemia), chronic lung disease or asthma, white blood
Seventeen clinical factors and serum biomarkers were included in our study (Table 1). Twelve factors were selected by univariate logistic regression analysis, and two predictive indicators were selected by multivariate logistic regression analysis (Supplementary Figure 3). The clinical model for predicting and assessing COVID-19 pneumonia was developed from two independent predictive factors: age and chronic lung disease or asthma. Higher total points based on the sum of the assigned number of points for each factor in this model were
Feature selection and radiomics model building
Pneumonia segmentation on the CT images of 30 patients randomly selected from the entire data set was evaluated using the Dice coefficient (DC) as the metric. The average DC across these 30 patients was 0.825 ± 0.047, suggesting good segmentation. In the intra-reader analysis, 1157 of 1218 (95%) radiomic features showed good agreement, with ICCs ranging from 0. Multivariable analysis revealed that the radiomics score and the two clinical indicators were significant independent factors for assessing disease prognosis. A radiomics nomogram incorporating these variables was built (Figure 4). Collinearity diagnosis showed that the VIFs for the radiomics score and the two clinical indicators were all below 10, indicating no severe collinearity among these factors (Supplementary Table 3). DeLong's test was used to compare the performance of the clinical model, the radiomics model and the radiomics nomogram.
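The Dice coefficient used above to evaluate the automatic segmentation compares a predicted lesion mask with a reference mask. This is a minimal sketch assuming both 3D masks have been flattened into equal-length sequences of 0/1 voxels; treating two empty masks as perfect agreement and reporting the sample SD in the mean ± SD summary are assumptions, and the function names are hypothetical.

```python
from statistics import fmean, stdev


def dice_coefficient(pred_mask, truth_mask):
    """Dice = 2 * |A & B| / (|A| + |B|) for two binary masks given as
    flat sequences of 0/1 voxels; two empty masks count as a perfect match."""
    intersection = sum(1 for p, t in zip(pred_mask, truth_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in truth_mask if t)
    return 1.0 if total == 0 else 2.0 * intersection / total


def summarize_dice(per_patient_scores):
    """Mean and sample standard deviation across patients (mean ± SD)."""
    return fmean(per_patient_scores), stdev(per_patient_scores)
```

A Dice value of 1 means the automatic and reference lesion masks overlap exactly, and 0 means they share no voxels; averaging the per-patient values gives a summary of the form reported for the 30 evaluated patients.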
In the independent external validation set, DeLong's test showed that the radiomics nomogram performed significantly better than the clinical model (p < 0.001) and the radiomics model (p = 0.025). The calibration curves showed agreement between predicted and observed values. The Hosmer-Lemeshow tests were not significant in the training set (p = 0.973), the internal validation set (p = 0.932) or the independent external validation set (p = 0.273), suggesting no significant departure from the observed values. The calibration curves of the radiomics nomogram are shown in Figure 5. Decision curve analysis (DCA) was used to evaluate the radiomics nomogram (Figure 6). For threshold probabilities above 20%, the radiomics nomogram provided a greater net benefit than the other models and than the treat-all or treat-none strategies, indicating good potential for clinical application. In this study, we identified 21 radiomic features and 2 clinical indicators that were significantly associated with the prognosis of COVID-19 pneumonia. We then constructed and validated a radiomics nomogram for predicting disease prognosis based on radiomics features extracted from initial CT images combined with clinical indicators. Our results indicated that the radiomics nomogram performed better than the radiomics model and the clinical model. There was a small difference in performance between the internal validation set and the independent external validation set, which may be due to significant differences between the two populations or to the use of different scanners in the two sets. Our results suggest that radiomics could be a potential tool for evaluating the prognosis of COVID-19 pneumonia. Fang et al. also investigated the value of radiomics in screening for COVID-19. Chen et al.
constructed a deep learning-based system for detecting COVID-19 pneumonia on high-resolution CT (23), which reduced radiologists' reading time by 65%. Other researchers have conducted similar studies (24) (25) (26). In our study, we implemented an AI-based semi-automatic method that substantially reduced the time required to obtain ROIs compared with a completely manual process. We designed and implemented the models following MINIMAR (MINimum Information for Medical AI Reporting) (27) and checked them against the IJMEDI checklist (28) to address concerns about accuracy and bias. The rapid increase in AI research on COVID-19 has raised critical doubts: Michael Roberts et al. (29) systematically reviewed 62 studies of COVID-19 models and found that none were of potential clinical use because of methodological flaws and/or underlying biases. Other researchers have raised similar concerns and suggested that, to solve this problem, standard reporting guidelines must be followed, for example TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis), proposed in 2015 (30), MINIMAR, or the IJMEDI checklist (28) that we used in our study. In addition, we used multicenter data to train and evaluate our model. However, our study still has some drawbacks. For instance, we did not report the brand and model of the analyzers used for the hematochemical parameters, nor did we report on model interpretability and explainability. Moreover, the most meaningful evaluation of an algorithm's performance is to assess it in a clinical setting and to gain deep insight into the biological meaning of radiomic features (31); these are future directions of our research.
Our results suggested that the inflammatory signs on initial CT images of patients with severe pneumonia differed from those of patients with moderate pneumonia, which may be related to different pathological changes caused by the virus. Another study found that CT findings of viral pneumonia are diverse and may be affected by the immune status of the host and the underlying pathophysiology of the viral pathogen (32). Mild and moderate cases of COVID-19 mimic common respiratory viral infections. However, histological examination of a patient who died of COVID-19 (33) showed evident desquamation of pneumocytes and hyaline membrane formation in the right lung, indicating acute respiratory distress syndrome (ARDS), and pulmonary edema with hyaline membrane formation in the left lung, indicative of early-phase ARDS. This suggests that disease severity is related to ARDS. Thus, early pathological differences between the lungs of patients with moderate and severe disease are likely to be detectable on chest CT images. Patients with COVID-19 pneumonia may become co-infected with other pathogens in the later stages of the disease, which could aggravate the disease. The pathological changes and CT signs of viral pneumonia differ from those of other pneumonias (13). Fang et al. suggested that hospital-acquired pneumonia (HAP) is possible in the later stages of the disease and that clinicians should be aware of it (6). Bassetti et al. noted that bacterial infections (pneumonia or bloodstream infection) developed in 10% of COVID-19 patients in their study (34), so appropriate antibiotics are administered to these patients. In conclusion, using an AI-based method, we established a radiomics nomogram for disease risk prediction based on the initial CT images and clinical indicators of patients with COVID-19 pneumonia.
We believe that this radiomics nomogram can be used during the COVID-19 epidemic, especially in settings with a shortage of healthcare workers.
Figure 2 A, B, C: box plots of the radiomics scores in the training set, internal validation set and independent external validation set. The optimal cut-off value was 0.059 according to the maximized Youden index in the training set. The difference in radiomics scores between the moderate and severe pneumonia groups was assessed with a t-test.
References
WHO Director-General's statement on IHR Emergency Committee on Novel Coronavirus (2019-nCoV). Geneva: WHO
Clinical management of COVID-19: interim guidance. WHO
CRISPR-Cas12-based detection of SARS-CoV-2
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
Retrospective analysis of clinical features in 134 coronavirus disease 2019 cases
Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study. The Lancet Infectious Diseases
Correlation of Chest CT and RT-PCR Testing in Coronavirus Disease
Novel Coronavirus (2019-nCoV) Pneumonia
Time Course of Lung Changes On Chest CT During Recovery From
Association of radiologic findings with mortality of patients infected with
Radiographic and CT Features of Viral Pneumonia
COVID-19 pneumonia: what has CT taught us? Lancet Infect Dis
Digital technology and COVID-19
A Weakly-Supervised Framework for COVID-19 Classification and Lesion Localization From Chest CT
Quantitative Measurements, and Prognosis of COVID-19 Pneumonia Using Computed Tomography
Multidimensional Evaluation of All-Cause Mortality Risk and Survival Analysis for Hospitalized Patients with COVID-19
The Role of Chest Imaging in Patient Management During the COVID-19 Pandemic: A Multinational Consensus Statement From the Fleischner Society
Time Course of Lung Changes at Chest CT during Recovery from Coronavirus Disease 2019 (COVID-19)
Image biomarker standardisation initiative
Scikit-learn: Machine Learning in Python
Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution computed tomography: a prospective study
Radiomics Analysis of Computed Tomography helps predict poor prognostic outcome in COVID-19
Radiomics nomogram for the prediction of 2019 novel coronavirus pneumonia caused by SARS-CoV-2
Radiologists and Clinical Information in Predicting Outcome of Patients with COVID-19 Pneumonia
MINIMAR (MINimum Information for Medical AI Reporting): Developing reporting standards for artificial intelligence in health care
The need to separate the wheat from the chaff in medical informatics
Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans
Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration
The Biological Meaning of Radiomic Features
Viral pneumonia
COVID-19-New Insights on a Rapidly Changing Epidemic
The novel Chinese coronavirus (2019-nCoV) infections: Challenges for fighting the storm
Radscore = -0.05650492 + wavelet-LHL_firstorder_Range*-0.139362611046986 + wavelet-LHL_firstorder_Maximum*-0.138133196001071 + wavelet-HHL_glszm_ZoneEntropy*-0 169 + log-sigma-5-0-mm-3D_glcm_Contrast*-0
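The Radscore above is a linear combination: an intercept plus each selected feature value multiplied by its coefficient. The sketch below shows how such a score is evaluated, using only the intercept and the two terms whose coefficients are fully legible in the source; the remaining terms are truncated there, so this is deliberately not the complete published score. It also assumes, based on the standardization step described in the methods, that inputs are already z-scored with training-set parameters. The names `PARTIAL_COEFFICIENTS` and `radscore` are hypothetical.

```python
# Intercept and the only two fully legible terms of the published Radscore.
# The remaining terms are truncated in the source and are NOT included,
# so this partial score is illustrative only.
INTERCEPT = -0.05650492
PARTIAL_COEFFICIENTS = {
    "wavelet-LHL_firstorder_Range": -0.139362611046986,
    "wavelet-LHL_firstorder_Maximum": -0.138133196001071,
}


def radscore(standardized_features, intercept, coefficients):
    """Linear radiomics score: intercept + sum(coefficient * feature value).

    standardized_features maps feature names to values that are assumed
    to be already z-scored with training-set parameters."""
    return intercept + sum(
        coef * standardized_features[name] for name, coef in coefficients.items()
    )
```

With every feature at its training-set mean (a z-score of 0), the score reduces to the intercept; each further term shifts it in proportion to how far that feature lies from the training mean.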