key: cord-1037719-rik9xqlc authors: Wang, Lu; Kelly, Brendan; Lee, Edward H.; Wang, Hongmei; Zheng, Jimmy; Zhang, Wei; Halabi, Safwan; Liu, Jining; Tian, Yulong; Han, Baoqin; Huang, Chuanbin; Yeom, Kristen W.; Deng, Kexue; Song, Jiangdian title: Multi-classifier-based identification of COVID-19 from chest computed tomography using generalizable and interpretable radiomics features date: 2021-01-15 journal: Eur J Radiol DOI: 10.1016/j.ejrad.2021.109552 sha: d6a9438c7c331414043f5e9d6f1182035b8eabfe doc_id: 1037719 cord_uid: rik9xqlc
PURPOSE: To investigate the efficacy of radiomics in diagnosing patients with coronavirus disease (COVID-19) and other types of viral pneumonia with clinical symptoms and CT signs similar to those of COVID-19.
METHODS: Between 18 January 2020 and 20 May 2020, 110 SARS-CoV-2 positive and 108 SARS-CoV-2 negative patients were retrospectively recruited from three hospitals based on the inclusion criteria. Manual segmentation of pneumonia lesions on CT scans was performed by four radiologists. The latest version of Pyradiomics was used for feature extraction. Four classifiers (linear classifier, k-nearest neighbour, least absolute shrinkage and selection operator [LASSO], and random forest) were used to differentiate SARS-CoV-2 positive and SARS-CoV-2 negative patients. The performance of the classifiers and the radiologists was compared using the ROC curve and the Kappa score.
RESULTS: We manually segmented 16,053 CT slices, comprising 32,625 pneumonia lesions, from the CT scans of all patients. Using Pyradiomics, 120 radiomic features were extracted from each image. The key radiomic features screened by different classifiers varied and led to significant differences in classification accuracy.
The LASSO achieved the best performance (sensitivity: 72.2%, specificity: 75.1%, and AUC: 0.81) on the external validation dataset and attained excellent agreement (Kappa score: 0.89) with radiologists (average sensitivity: 75.6%, specificity: 78.2%, and AUC: 0.81). All classifiers indicated that "Original_Firstorder_RootMeanSquared" and "Original_Firstorder_Uniformity" were significant features for this task.
CONCLUSIONS: We identified radiomic features that were significantly associated with the classification of COVID-19 pneumonia using multiple classifiers. The quantifiable interpretation of the differences in features between the two groups extends our understanding of CT imaging characteristics of COVID-19 pneumonia.
The number of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections is rapidly increasing. As of 1 December 2020, over 62.1 million cases and 1.4 million deaths had been reported globally since the start of the pandemic [1]. The coronavirus strain SARS-CoV-2 causes the 2019 coronavirus disease (COVID-19), resulting in impaired lung function and decreased blood oxygen saturation in affected patients. The main symptoms of COVID-19 include fever, cough, fatigue, and body aches. In addition to the recommended viral nucleic acid real-time reverse transcriptase-polymerase chain reaction (RT-PCR) test, the value of radiological images for diagnosis of COVID-19 has been verified [2, 3]. Based on radiologists' evaluations of the signs of pneumonia on chest computed tomography (CT), an average sensitivity of 80% and specificity of 83% of clinical diagnosis have been achieved [4]. The distinct characteristics of COVID-19 pneumonia on CT images include bilateral involvement, peripheral distribution, multifocality, mixed ground-glass opacity, consolidation, and vascular thickening [5, 6].
However, because radiologists' evaluations of these specific characteristics may be affected by subjective experience, the accuracy of COVID-19 diagnosis varies [4], especially in cases of viral pneumonia with clinical symptoms and CT signs similar to those of COVID-19. Previous studies have confirmed the value of pre-designed artificial intelligence (AI) frameworks for task-oriented COVID-19 analysis based on radiological images [7, 8]. However, this approach is currently hampered by insufficient training data. Recent studies underscore the need for further verification of the sensitivity and stability of current AI-based solutions [9] and for more evidence to establish AI as a production-ready solution for COVID-19 diagnosis [10, 11]. Based on the region of interest (ROI) for pneumonia lesions delineated by radiologists, radiomics may provide additional knowledge for survival prognosis and classification of illness severity for COVID-19 pneumonia [12-15]. The latest analytical tool of radiomics, Pyradiomics, has paved the way for standardised radiomics analysis [16]. Pyradiomics is an open-source Python package for extracting radiomic features from the ROI on medical images. Its standardised feature extraction program can reduce implementation bias between researchers, improve the reproducibility of radiomic features, and thus enhance the credibility of radiomics results.
We hypothesised that radiomics features could be used to differentiate COVID-19 from other types of viral pneumonia with clinical symptoms and CT signs similar to those of COVID-19. We evaluated the performance of radiomics features using different feature selection and prediction classifiers, and investigated standardised high-throughput CT image features extracted by Pyradiomics to clarify the value of radiomics features for the diagnosis of COVID-19.
Between 18 January 2020 and 20 May 2020, patients with viral pneumonia were initially recruited from two hospitals in China and one hospital in the United States. All procedures performed in studies that involved human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration. Our institutional review board approved this retrospective study (IRB: no.* BLIND*) and waived the need for written informed consent. There are no conflicts of interest to declare.
All patients included in this study were admitted during the COVID-19 pandemic. RT-PCR tests for COVID-19 were performed using respiratory secretions obtained by bronchoalveolar lavage, endotracheal aspirate, nasopharyngeal swab, or oropharyngeal swab. These RT-PCR tests were performed at least twice for each patient to reduce the risk of a false-negative result. For SARS-CoV-2 positive patients, the inclusion criteria were as follows: 1) patient over 18 years of age, 2) complete clinical data and CT scan available at admission, and 3) all RT-PCR test results positive. CT scans were reviewed by at least two radiologists to identify the outline of each pneumonia lesion. Patients whose pneumonia lesions could not be visually detected by radiologists were excluded from this study because the ROI of the pneumonia abnormalities could not be delineated for radiomics analysis. For SARS-CoV-2 negative patients, points 1) and 2) of the above-mentioned inclusion criteria applied, and 3) all RT-PCR test results had to be negative.
All patients underwent a chest CT scan at admission. The detailed CT scanning parameters of the three hospitals included in this study are listed in Supplementary Appendix A. We tabulated the number of days between symptom onset and date of the CT scan.
The time from symptom onset to admission for CT examination was defined as early (0-2 days), intermediate (3-5 days), or late (6-12 days).
Four radiologists from our local hospital, each with >5 years of radiological experience, completed the manual segmentation of all pneumonia lesions layer-by-layer using ITK-Snap software (v.3.6.0) [17, 18]. Any potential signs of COVID-19 on chest CT were reviewed by radiologists, including bilateral involvement, peripheral distribution, multifocality, mixed ground-glass opacity, consolidation, and vascular thickening, as reported previously [5, 6]. Examples of pneumonia lesion segmentation are provided in Supplementary Appendix B. All four radiologists underwent training to identify the characteristics of COVID-19 pneumonia on CT images, to distinguish the CT signs of COVID-19 pneumonia from those of other types of pneumonia, and to detect the boundary of the pneumonia lesions on CT images. All training sessions were completed online, and approximately eight lessons were provided by radiologists/respiratory experts with experience in the clinical diagnosis of COVID-19 pneumonia. All segmentations were summarised and reviewed by an expert with >10 years of radiological experience. Based on the expert evaluation, the four radiologists held an online consensus meeting to resolve disagreements about lesion segmentation and to determine the boundaries of these lesions on the CT images. To verify the credibility of the manual segmentation and feature extraction in this study, ten patients' CT scans were randomly selected and segmented by one radiologist from a different hospital to correct for potential experience-related bias in feature extraction. Analysis of variance (ANOVA) and Mann-Whitney U tests were performed on the features extracted from the two rounds of pneumonia lesion segmentation to verify the inter-observer agreement.
The recommended standardised radiomics analysis workflow [19] was used in this study.
Pyradiomics was used to extract high-throughput image features of the ROI, as presented previously [16, 20]. To verify the performance of radiomics features for the classification of COVID-19 pneumonia, the following classifiers were used in this study: linear classifier, k-nearest neighbour (KNN), least absolute shrinkage and selection operator (LASSO), and random forest (RF). The pneumonia lesion images from China were randomly divided into training and test datasets with an 80:20 ratio, following related radiomics studies [21-23]. After the classification model was obtained from the training dataset on the server in China, the model was applied to the dataset from the United States. The images from the United States were used as an external validation dataset to independently verify the method proposed in this study. The pneumonia lesion images manually delineated by the radiologists will be published for open access after the review process.
To compare the diagnostic results of radiomics with those of radiologists, CT images in the two test datasets were additionally diagnosed by three radiologists from another hospital in China with 3, 5, and 10 years of radiological experience, respectively. The radiologists were blinded to the patients' RT-PCR results. We compared the sensitivity and specificity of the diagnoses obtained by the radiologists and the radiomics methods. In addition, a Kappa consistency analysis was used to evaluate the agreement between the radiologists and the radiomics methods.
The ROC curve, the area under the curve (AUC), sensitivity, and specificity were used to evaluate the diagnostic accuracy for COVID-19. All statistical analyses were performed using the R language (version 3.4.3, Vienna, Austria). The linear classifier, KNN, LASSO, and RF were implemented by the "lm", "kknn", "glmnet", and "randomForest" functions, respectively. Chi-square tests and ANOVA were used to evaluate the differences in demographics between the two groups.
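The evaluation measures named in the statistical analysis (sensitivity, specificity, AUC, and Kappa agreement) can be illustrated with a minimal sketch. It is written in plain Python rather than the R functions used in the study, the toy labels and scores are invented for illustration, and the function names are hypothetical. The agreement bands in `agreement_band` are the ones stated in the paper; everything else is a generic textbook formulation, not the authors' code.

```python
from collections import Counter


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); label 1 = SARS-CoV-2 positive, 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


def agreement_band(kappa):
    """Map a Kappa score to the agreement bands stated in the paper."""
    if kappa >= 0.85:
        return "excellent"
    if kappa >= 0.6:
        return "good"
    if kappa >= 0.45:
        return "moderate"
    return "poor"


# Invented toy data (assumption): 4 positive and 4 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1]   # hypothetical classifier outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # hypothetical 0.5 cut-off
reader = [1, 1, 0, 1, 0, 0, 0, 0]                    # hypothetical radiologist calls

sens, spec = sensitivity_specificity(y_true, y_pred)
kappa = cohen_kappa(y_pred, reader)
print(sens, spec, auc(y_true, scores), kappa, agreement_band(kappa))
```

The pairwise AUC used here is equivalent to the area under the empirical ROC curve. It is quadratic in the number of cases, which is acceptable for a sketch but not for large datasets.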
For the Kappa analysis, agreement between the classification results of two methods was considered excellent when the Kappa score was ≥0.85, good for scores in [0.6, 0.85), moderate for scores in [0.45, 0.6), and poor when the Kappa score was <0.45. P<0.05 was considered statistically significant.
In total, 266 patients were initially recruited from the three hospitals. Table 2 . The ROC curves of the four classifiers are presented in Figure 2. The radiologists' diagnoses showed an average sensitivity of 75.6% and specificity of 78.2% for the images in the two test datasets. The detailed comparison is presented in Table 3. For the Kappa analysis, the Kappa score between the radiologists was 0.93, and excellent agreement was obtained between the LASSO and the radiologists (Kappa=0.89). In contrast, the agreement between the radiologists and Table 4 ).
The values of the "original_firstorder_Uniformity" feature in the COVID-19 lesion images were significantly lower than those in the non-COVID-19 lesion images in the three datasets (P<0.05, t-test). Evaluation of the differences in this feature reflected in the CT images revealed that the structure of COVID-19 lesion images was more chaotic than that of non-COVID-19 lesion images with high feature values. Specifically, the internal structure and the boundary structure of COVID-19 lesions were more likely to demonstrate a sharp contrast, and the internal structure tended to be more heterogeneous than that of non-COVID-19 lesions. Further, the uniformity of COVID-19 pneumonia lesion images with low feature values tended to be worse (Figure 3). Additionally, although the discrimination ability of the "diagnostics_Image-original_Mean" feature was significant in both the training and test datasets, its performance decreased in the external validation dataset.
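To make the interpretation of the two first-order features concrete, the following minimal sketch computes Uniformity (the sum of squared histogram-bin probabilities) and Root Mean Squared from a list of voxel intensities, following the first-order feature definitions documented for Pyradiomics. It is a simplified stand-in, not a call into Pyradiomics: the fixed bin width of 25, the function names, and the intensity values are assumptions chosen for illustration. A lesion whose voxels fall into a single grey-level bin has Uniformity 1, whereas a lesion whose voxels spread over many bins has a low value, which matches the reading above that lower Uniformity reflects a more heterogeneous lesion.

```python
import math
from collections import Counter


def uniformity(intensities, bin_width=25):
    """Sum of squared bin probabilities of the discretised intensity histogram."""
    bins = Counter(math.floor(v / bin_width) for v in intensities)
    n = len(intensities)
    return sum((count / n) ** 2 for count in bins.values())


def root_mean_squared(intensities, shift=0):
    """Square root of the mean of squared (optionally shifted) intensities."""
    return math.sqrt(sum((v + shift) ** 2 for v in intensities) / len(intensities))


# Invented intensity values (assumption), loosely in the Hounsfield-unit range of lung CT.
homogeneous_roi = [-700] * 8
heterogeneous_roi = [-900, -700, -500, -300, -100, 50, -650, -20]

print(uniformity(homogeneous_roi), uniformity(heterogeneous_roi))
print(root_mean_squared(homogeneous_roi), root_mean_squared(heterogeneous_roi))
```

Because Uniformity depends on how intensities are binned, values are only comparable between lesions when the same discretisation is used for all of them.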
We verified the value of radiomics for differentiating COVID-19 from other types of viral pneumonia with clinical symptoms and CT signs similar to those of COVID-19 based on the standardised radiomics workflow. Using the features extracted by Pyradiomics, our study identified the radiomics features that were significantly associated with the classification of pneumonia patients with and without COVID-19 using multiple classifiers. We also clarified the differences in CT images reflected by different key features.
Image recognition technologies have been demonstrated to be effective methods for the clinical diagnosis of COVID-19 [24, 25]. Deep learning-based methods have developed rapidly because they do not require manual segmentation of pneumonia lesions on CT images [26, 27]. However, owing to the black-box nature of AI [28, 29], neither the operator nor the clinician can intuitively understand how AI distinguishes patients with COVID-19 pneumonia from those with non-COVID-19 pneumonia. Although high diagnostic accuracy was reported by AI, we test datasets. However, their performance in the external validation dataset was poorer (P<0.05) than that of the LASSO. Therefore, studies that use a single machine learning classifier for radiomics feature selection and signature construction may have potential defects. Using multiple classifiers to evaluate radiomic features in future radiomics studies may play an important role in ensuring the credibility of radiomics results. Although the main purpose of this study was not to propose a method that surpasses the radiologists' diagnosis, we found that the diagnostic performance and Kappa score of the LASSO were closest to those of the radiologists based on the diagnosis of CT slices. This finding supports the superiority of the LASSO in the radiomics workflow, which has been reported in previous studies of, for example, human oncological diseases and immunotherapy-induced pneumonitis [18, 30-34].
Our results support the application of radiomics to assist with the diagnosis of COVID-19 pneumonia. The recent development of radiomics has provided a new research paradigm in clinical studies [35, 36], and radiomics studies have already been published on COVID-19 survival prognosis and illness severity identification [12-15]. However, radiomics studies on the differentiation of COVID-19 from other types of viral pneumonia with clinical symptoms and CT signs similar to those of COVID-19, and on the evaluation of radiomic features across different classifiers for COVID-19, are scarce. Our study differed from previous studies [37-39] in several ways. First, we used a variety of machine learning classifiers to demonstrate the effectiveness of radiomics for COVID-19 lesion feature analysis, and we determined the performance of the current mainstream classifiers for COVID-19 classification. Next, the radiomic features that were significant for COVID-19 classification were identified by multiple classifiers. This further strengthens the conclusion of previous studies that radiomics can be used to assist in the diagnosis of COVID-19 pneumonia, and provides a reference for more accurate COVID-19 diagnosis. Finally, we found that different classifiers produced significantly different classification results for the same radiomic features. This finding suggests that future radiomics studies should validate multiple machine learning classifiers to improve the credibility of the radiomics results. In addition to the current specific CT manifestations of COVID-19 observed by radiologists, there is a clinical need for interpretable features that reflect the differential expression of COVID-19 and non-COVID-19 pneumonia on CT images.
This study has several limitations. First, only patients from China and the United States were included.
Future studies should employ datasets from other countries and provide a more detailed classification of pneumonia subtypes, such as eosinophilic pneumonia and other interstitial pneumonias, to improve the robustness of the radiomics results. Next, four current mainstream classifiers were used in this study to differentiate COVID-19 and non-COVID-19 pneumonia. Still, the results indicated that the key radiomics features varied among the classifiers, and the results from the external validation dataset indicated that all four classifiers overfitted. More approaches should be tested to determine the optimal classifier. Also, even though each patient underwent at least two RT-PCR tests, the possibility of double false-negative RT-PCR tests cannot be eliminated. Finally, this study did not limit CT scanning parameters. Although robustness was improved, the accuracy on the external validation dataset was reduced. Future studies should design an optimised image normalisation method to mitigate the decrease in accuracy caused by different CT scanning parameters.
In conclusion, this study provides new evidence supporting the use of radiomics for the diagnosis of COVID-19 pneumonia. Our study identified the CT imaging features with significant discriminatory potential for patients with COVID-19 pneumonia and non-COVID-19 viral pneumonia using multiple classifiers. We also clarified the differences in CT images reflected by these key features. Our results expand knowledge regarding CT characteristics of COVID-19 pneumonia, which will improve the clinical diagnosis of COVID-19 pneumonia in the current radiological workflow.
Declarations of interest: none
(b1-b3) images by the feature of "original_firstorder_Uniformity". Figure a(1) represents the lung of a 63-year-old male with fever, chest tightness, and anorexia for 9 days. CT manifested as bilateral involvement and multifocality.
Figure a(2) represents the lungs of an 88-year-old male with anorexia for 6 days and fever for 2 days. CT manifestation indicates peripheral distribution, diffuseness, and mixed ground-glass opacity. Figure a(3) represents the lungs of a 53-year-old male with weakness and muscle aches for more than 10 days. CT manifestation included bilateral involvement, consolidation, and vascular thickening.
Coronavirus disease (COVID-2019) situation reports, World Health Organization
Chest CT for Typical 2019-nCoV Pneumonia: Relationship to Negative RT-PCR Testing
Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study
Performance of radiologists in differentiating COVID-19 from viral pneumonia on chest CT
Relationship to Duration of Infection
Coronavirus Disease 2019 (COVID-19): A Perspective from China
Digital technology and COVID-19
Artificial intelligence-enabled rapid diagnosis of patients with COVID-19
Artificial intelligence vs COVID-19: limitations, constraints and pitfalls
Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review
Covid-net: A tailored deep convolutional neural network design for detection of covid-19 cases from chest x-ray images
Identification of common and severe COVID-19: the value of CT texture analysis and correlation with clinical characteristics
Radiomics nomogram for the prediction of 2019 novel coronavirus pneumonia caused by SARS-CoV-2
A Quantitative and Radiomics approach to monitoring ARDS in COVID-19 patients based on chest CT: a retrospective cohort study
Radiomics Analysis of Computed Tomography helps predict poor prognostic outcome in COVID-19
Computational Radiomics System to Decode the Radiographic Phenotype
Variability of manual segmentation of the prostate in axial T2-weighted MRI: A multi-reader study
Development and validation of a prognostic index for efficacy evaluation and prognosis of first-line chemotherapy in stage III-IV lung squamous cell carcinoma
Radiomics: the bridge between medical imaging and personalized medicine
PyRadiomics: Radiomic Features
Evaluating the HER-2 status of breast cancer using mammography radiomics features
Development and validation of a radiomic signature to predict HPV (p16) status from standard CT imaging: a multicenter study
Predicting diabetic retinopathy and identifying interpretable biomedical features using machine learning algorithms
A Fully Automatic Deep Learning System Diagnostic and Prognostic Analysis
Deep learning for classification and localization of COVID-19 markers in point-of-care lung ultrasound
False-Negative Results of Real-Time Reverse-Transcriptase Polymerase Chain Reaction for Severe Acute Respiratory Syndrome Coronavirus 2: Role of Deep-Learning-Based CT Diagnosis and Insights from Two Cases
Deep learning for healthcare: review, opportunities and challenges
Opportunities and obstacles for deep learning in biology and medicine
Development and Validation of a Radiomics Nomogram for Preoperative Prediction of Lymph Node Metastasis in Colorectal Cancer
Radiomics Features of Multiparametric MRI as Novel Prognostic Factors in Advanced Nasopharyngeal Carcinoma, Clinical cancer research: an official journal of the American Association for
Radiomics signature of computed tomography imaging for prediction of survival and chemotherapeutic benefits in gastric cancer
Discovery of pre-therapy 2-deoxy-2-(18)F-fluoro-D-glucose positron emission tomography-based radiomics classifiers of survival outcome in non-small-cell lung cancer patients
Radiomics to predict immunotherapy-induced pneumonitis: proof of concept
A review of original articles published in the emerging field of radiomics
Beyond imaging: The promise of radiomics
A Novel Machine Learning-derived Radiomic Signature of the Whole Lung Differentiates Stable From Progressive COVID-19 Infection: A Retrospective Cohort Study
Decoding COVID-19 pneumonia: comparison of deep learning and radiomics CT image signatures
Discrimination of pulmonary ground-glass opacity changes in COVID-19 and non-COVID-19 patients using CT radiomics analysis
This study received funding from *BLINDED*, *BLINDED*, and *BLINDED*. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. There are no conflicts of interest to declare.