key: cord-0838283-a4jyoqlv
authors: Xu, Fangyi; Lou, Kaihua; Chen, Chao; Chen, Qingqing; Wang, Dawei; Wu, Jiangfen; Zhu, Wenchao; Tan, Weixiong; Zhou, Yong; Liu, Yongjiu; Wang, Bing; Zhang, Xiaoguo; Zhang, Zhongfa; Zhang, Jianjun; Sun, Mingxia; Zhang, Guohua; Dai, Guojiao; Hu, Hongjie
title: An original deep learning model using limited data for COVID‐19 discrimination: A multicenter study
date: 2022-04-18
journal: Med Phys
DOI: 10.1002/mp.15549
sha: 4fb2522a29a291f482fe09808e3bf18e2cbfbc08
doc_id: 838283
cord_uid: a4jyoqlv

OBJECTIVES: Artificial intelligence (AI) has proved to be a highly efficient tool for COVID‐19 diagnosis, but the large datasets and heavy labeling effort required for algorithm development, together with the poor generalizability of AI algorithms, limit to some extent the application of AI technology in clinical practice. The aim of this study was to develop a highly robust AI algorithm for COVID‐19 discrimination using limited chest CT data.

METHODS: A three‐dimensional algorithm that combined multi‐instance learning with the LSTM architecture (3DMTM) was developed for differentiating COVID‐19 from community‐acquired pneumonia (CAP), while logistic regression (LR), k‐nearest neighbor (KNN), support vector machine (SVM), and a three‐dimensional convolutional neural network were set up for comparison. In total, 515 patients with or without COVID‐19, seen between December 2019 and March 2020 at five different hospitals, were recruited and divided into a relatively large dataset (150 COVID‐19 and 183 CAP cases) and a relatively small dataset (17 COVID‐19 and 35 CAP cases) for either training or validation, plus an independent dataset (37 COVID‐19 and 93 CAP cases) for external testing. Area under the receiver operating characteristic curve (AUC), sensitivity, specificity, precision, accuracy, F1 score, and G‐mean were used for performance evaluation.
RESULTS: In the external test cohort, the relatively large data‐based 3DMTM‐LD achieved an AUC of 0.956 (95% confidence interval [CI], 0.929∼0.982), with a sensitivity of 86.2% and a specificity of 98.0%. 3DMTM‐SD achieved an AUC of 0.937 (95% CI, 0.909∼0.965), while the AUC of 3DCM‐SD decreased dramatically to 0.714 (95% CI, 0.649∼0.780) with the reduction in training data. KNN‐MMSD, LR‐MMSD, SVM‐MMSD, and 3DCM‐MMSD benefited significantly from the inclusion of clinical information, while models trained with the relatively large dataset gained only a slight performance improvement in COVID‐19 discrimination. 3DMTM, trained with either CT or multi‐modal data, presented comparably excellent performance in COVID‐19 discrimination.

CONCLUSIONS: The 3DMTM algorithm presented excellent robustness for COVID‐19 discrimination with limited CT data. 3DMTM based on CT data performed comparably in COVID‐19 discrimination to 3DMTM trained with multi‐modal information. Clinical information could improve the performance of KNN, LR, SVM, and 3DCM in COVID‐19 discrimination, especially when the training data were limited.

The novel coronavirus disease 2019 (COVID-19) has spread as a pandemic all over the world since its first outbreak in late 2019, posing great threats to human life and carrying major economic implications. 1 As of February 2021, there had been more than 110 million confirmed cases worldwide, including almost 2.5 million deaths, according to the latest report from the World Health Organization. 2 Presently, the reverse transcriptase polymerase chain reaction (RT-PCR) is widely used for the diagnosis of patients with COVID-19. 3 Nevertheless, RT-PCR might not be sensitive enough for COVID-19 screening, especially for early detection of suspected patients.
[4] [5] [6] [7] As a fast imaging technology, computed tomography (CT) can show the pulmonary structure and certain abnormalities of patients rapidly without any invasive operation, and it has been shown to provide complementary information for early detection in suspected COVID-19 patients and for severity assessment in confirmed cases. 5, [8] [9] [10] However, the demand for chest CT examinations in COVID-19 screening among highly suspected cohorts dramatically increased the interpretation burden on radiologists and consumed limited medical resources in emergency scenarios. Furthermore, COVID-19 can present heterogeneous imaging findings and may share radiological features with pneumonia caused by other infections, making it challenging to discriminate between COVID-19 and other types of pneumonia. 5 Recently, artificial intelligence (AI) has developed rapidly and has been extensively applied in clinical settings to medical tasks such as pulmonary nodule detection, cerebral hemorrhage prediction, identification of malignancy in masses in human organs, and treatment management and prognosis prediction of tumors. [11] [12] [13] [14] [15] Regarding COVID-19 diagnosis, AI has proved to be a highly efficient and accurate tool. 16 Several studies have demonstrated the promise of machine learning and deep learning in COVID-19-related investigations. [17] [18] [19] [20] [21] A deep learning algorithm was developed with 19291 CT scans from 14435 pneumonia patients with or without COVID-19 and achieved an accuracy of 94% for lesion detection in validation cohorts. 19 In another study, 1381 patients were used to build an automated radiomics CT signature for COVID-19 detection, which had an area under the receiver operating characteristic curve (AUC) of 0.882 (95% CI, 0.851∼0.913) in the test cohort consisting of 641 patients.
22 However, previous AI studies on COVID-19 usually required either substantial labeling effort or a large number of targeted cases for algorithm development, which was physically and emotionally exhausting. Considering the radiological similarity between COVID-19 and community-acquired pneumonia (CAP), specific clinical features such as laboratory test results might provide critical supplemental information for COVID-19 diagnosis, 1 but the diversity of laboratory tests and the validity of the corresponding results increased the difficulty of data collection, which to some extent limited their use in COVID-19-related AI studies. Therefore, the purpose of this study was to construct a highly robust diagnostic algorithm using limited multi-modal data for the discrimination between COVID-19 and CAP.

The institutional review boards of the five hospitals approved this multicenter retrospective study and waived informed consent because patient information was anonymized to ensure privacy. A total of 644 patients were enrolled between December 2019 and March 2020 from five different hospitals. The corresponding clinical information and CT data were collected and reviewed. Patients with positive RT-PCR results for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) were included in the COVID-19 dataset. Patients with positive CT findings who were diagnosed with other CAP on the basis of negative RT-PCR results since the COVID-19 outbreak were included in the CAP dataset. The exclusion criteria for the COVID-19 and CAP datasets were as follows: (1) lack of corresponding laboratory test results; (2) a time interval between the RT-PCR test and chest CT scan of >14 days; (3) CT images of poor quality. The process of patient enrollment is shown in Figure 1, and detailed information on patient distribution and clinical types is summarized in Table 1 and Appendix S-1.
Image acquisition

A novel weakly supervised algorithm that combined multi-instance learning with the long short-term memory (LSTM) architecture (MIL-LSTM) was designed for the discrimination between COVID-19 and CAP. The lesion layers in 3D CT scans, rather than one randomly selected slice from averaged groups or all slices in the CT scans, were selected as the input instances for this novel 3D-MIL-LSTM (3DMTM) algorithm using a lesion instance generator based on a pneumonia segmentation model (constructed by Infervision Medical Technology Co., Ltd.), 23 so as to reduce the annotation effort and to enhance model performance by extracting more spatial information about lesions. Meanwhile, a three-dimensional convolutional neural network (3D CNN) and three classic machine learning algorithms, namely logistic regression (LR), k-nearest neighbor (KNN), and support vector machine (SVM), were also developed using 3D CT data to validate the feasibility of the newly proposed algorithm. To verify the role of clinical information in identifying COVID-19, clinical and radiological features were also concatenated for training, so that the effect of multi-modal information on algorithm performance could be explored. Notably, the impact of training data size on model performance was also studied by exchanging the training and validation cohorts. Figure 1 shows the process of model development. Details of the algorithm design are available in Figure 2.

The proposed 3DMTM algorithm trained with the relatively large dataset (150 COVID-19 and 183 CAP cases) presented excellent performance (area under the receiver operating characteristic curve (AUC) = 0.956) in differentiating COVID-19 from CAP. 3DMTM trained with the relatively small dataset (17 COVID-19 and 35 CAP cases) maintained good diagnostic performance (AUC = 0.937) for COVID-19 discrimination, while the AUC of the 3D CNN decreased dramatically from 0.803 to 0.714 with the reduction in training data.
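The MIL-LSTM idea described in this section can be illustrated with a small forward pass. The sketch below treats one CT scan as a bag of lesion-slice feature vectors, runs a single-layer LSTM across the slices in order, and maps the final hidden state to one bag-level probability. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the feature dimension, hidden size, random untrained weights, and the names `init_params` and `lstm_bag_probability` are hypothetical, and the per-slice feature vectors stand in for the output of an image encoder that the text does not specify.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(feat_dim, hidden_dim, seed=0):
    """Random, untrained weights (hypothetical sizes) for one LSTM layer and a classifier."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        # Stacked weights for the input, forget, cell-candidate and output gates.
        "W": rng.normal(0.0, scale, (4 * hidden_dim, feat_dim)),
        "U": rng.normal(0.0, scale, (4 * hidden_dim, hidden_dim)),
        "b": np.zeros(4 * hidden_dim),
        # Bag-level linear classifier applied to the final hidden state.
        "w_out": rng.normal(0.0, scale, hidden_dim),
        "b_out": 0.0,
    }


def lstm_bag_probability(instances, params):
    """Return P(bag is positive) for slice features of shape (n_slices, feat_dim)."""
    hidden_dim = params["w_out"].shape[0]
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for x in instances:  # slices are visited in their order within the scan
        z = params["W"] @ x + params["U"] @ h + params["b"]
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g      # cell state carries information across slices
        h = o * np.tanh(c)
    # Only this single bag-level output would be compared with the scan label
    # during training; no slice carries its own label.
    return float(sigmoid(params["w_out"] @ h + params["b_out"]))


params = init_params(feat_dim=8, hidden_dim=4)
rng = np.random.default_rng(1)
bag_short = rng.normal(size=(3, 8))    # a scan with 3 lesion slices
bag_long = rng.normal(size=(11, 8))    # a scan with 11 lesion slices
p_short = lstm_bag_probability(bag_short, params)
p_long = lstm_bag_probability(bag_long, params)
```

Because the recurrence consumes one slice at a time, bags of different sizes yield a single probability each, which is what allows supervision with scan-level labels only.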
No matter which dataset was used for primary training, 3DMTM based on CT data showed comparable performance in COVID-19 discrimination to 3DMTM based on multi-modal information.

Data from four of the five hospitals were used for model development and were divided into relatively large and small datasets that served as either training or validation cohorts in different combinations, while the fifth hospital supplied the independent external test cohort. Detailed dataset combinations are shown in Table 2. Three classic machine learning models, namely KNN, SVM, and LR, were trained with selected clinical features or the combination of clinical and radiomics features. We first used the relatively large dataset as the training cohort and the relatively small dataset as the validation cohort. Subsequently, the training and validation cohorts were switched to reveal the robustness of the employed algorithms on datasets of different sizes. A total of 12 machine learning models were obtained (Table 3). Of note, 3DCM and 3DMTM were both developed with the same procedure and dataset combinations (LD-CT and SD-CT, MMLD and MMSD). The corresponding models are listed in Table 3. To understand how the 3DCM and 3DMTM models identified COVID-19, we visualized the most informative regions for these models on CT images using gradient-weighted class activation mapping (Grad-CAM). 24 As an output, attention heat maps were generated to indicate the suspicious areas in CT images that contributed most to identifying COVID-19.

FIGURE 1 Flow diagram of patient enrollment. A total of 644 patients with or without COVID-19 were collected from five hospitals in this study. Based on the inclusion and exclusion criteria, 204 COVID-19 patients (298 CT scans) and 311 CAP patients (470 CT scans) were finally recruited for model development. Patients from four hospitals (H1∼4) were used for model development, while patients from the fifth hospital (H5) served as independent external test data. During model development, the large and small datasets were exchanged once between training and validation sets for robustness assessment. CAP, community-acquired pneumonia; COVID-19, coronavirus disease 2019; CT, computed tomography; H1∼5, hospital 1∼5; RT-PCR, reverse transcriptase polymerase chain reaction.

Abbreviations: CI, clinical information; CT, computed tomography.

Area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, specificity, precision, accuracy, F1 score, and G-mean were used to evaluate the diagnostic performance of the proposed models. using DeLong test. 26

In addition to the newly proposed 3DMTM algorithm, we also used the 3D CNN algorithm and classical machine learning models (KNN, SVM, and LR) to identify COVID-19. A relatively large dataset (150 COVID-19 cases with 251 CT scans and 183 CAP cases with 334 CT scans from H1 and H4) and a relatively small dataset (17 COVID-19 cases with 17 CT scans and 35 CAP cases with 35 CT scans from H2 and H3) were first used as the training and validation datasets, respectively. As shown in Figures 3A and 4

To explore the feasibility of the proposed algorithms in different scenarios, the relatively small and large datasets were switched once so that the small dataset served as the training set, simulating a data-insufficient scenario and revealing the impact of data size on model performance. Although a performance decrease was noted in all small data-based models in the validation cohort, 3DMTM-SD still showed excellent ability to differentiate COVID-19 from CAP (AUC, 0.928; 95% CI, 0.898∼0.957), with an increased F1-score of 0.919 (Figures 3C and 4). In the independent external test cohort, 3DMTM-SD outperformed the other small data-based algorithms, with an AUC of 0.937 (95% CI, 0.909∼0.965) and an F1-score of 0.910, comparable to 3DMTM-LD (Figures 3D and 4 and Appendix S-5).
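The evaluation metrics reported in this section all derive from confusion-matrix counts and score rankings. The sketch below shows one way to compute them in plain Python on a made-up toy example; it is an illustration rather than the authors' evaluation code. It assumes the usual definition of G-mean as the geometric mean of sensitivity and specificity (the text does not spell this out), computes AUC as the fraction of positive-negative pairs ranked correctly, and uses hypothetical function names, labels, scores, and a 0.5 decision threshold.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = COVID-19)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Threshold-based metrics; assumes both classes occur and tp + fp > 0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": (tp + tn) / len(y_true),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        # Assumed definition: geometric mean of sensitivity and specificity.
        "g_mean": math.sqrt(sensitivity * specificity),
    }


def auc_score(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): 4 positive and 6 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = auc_score(y_true, scores)
```

G-mean is informative when the classes are imbalanced, as in the cohorts here, because it drops sharply if either sensitivity or specificity is poor even when overall accuracy looks high.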
3DCM-SD showed significantly inferior diagnostic performance with the reduction in training data (Figure 3D and Appendix S-5).

Noticing the value of radiological information in identifying COVID-19, we further studied whether multi-modal data would improve diagnostic performance in discriminating between COVID-19 and CAP by combining CT imaging features with selected clinical features. All models in our study benefited from the additional clinical features in the validation cohort, regardless of which dataset (the relatively small or the relatively large one) they were trained on (Figure 4). In the external test cohort, the performance of KNN-SD, LR-SD, SVM-SD, and 3DCM-SD improved dramatically, while 3DMTM benefited only slightly from the inclusion of clinical information (Appendix S-5).

Attention heat maps were generated in our study to interpret the diagnostic process of 3DCM and 3DMTM; they provide visual information such as lesion location and the probability that a targeted lesion is COVID-19. As can be seen in Figure 5, the inflammatory lesions attended to by 3DMTM were much larger than those noted by 3DCM and were reasonably consistent with the gold-standard lesions annotated by senior radiologists.

In this study, a novel weakly supervised 3DMTM algorithm was developed for the discrimination between COVID-19 and CAP. Compared with previous studies, this study offers four innovations. First, the original 3DMTM algorithm was developed with limited multi-modal and multicenter data; second, no manual annotation was required for algorithm training; third, we systematically evaluated the performance of 3DMTM, classic machine learning algorithms, and 3D-CNN in distinguishing COVID-19 from CAP; last, the impact of sample size on the performance of those algorithms was investigated, and an independent external dataset was used to verify model robustness.
Many researchers have demonstrated the promising value of machine learning or deep learning technology in the diagnosis, prognosis prediction, and medical management of COVID-19 since its outbreak. 20,28-31 Li et al. used a dataset consisting of 3322 patients with 4356 chest CT exams to develop a deep learning model that could detect COVID-19 fully automatically with an AUC of 0.96 in the test set. 4 In another study, which included 1020 chest CT images from 108 COVID-19 patients and 86 non-COVID-19 pneumonia patients, 10 well-known CNNs were trained and showed good performance in differentiating COVID-19 from non-COVID-19 pneumonia, with AUCs of 0.894-0.994. 32 Xu et al. established an early screening system to differentiate COVID-19 from influenza-A viral pneumonia (IAVP) and normal patients with 618 CT samples (219 COVID-19, 224 IAVP, and 175 normal cases), whose overall accuracy reached 86.7% when CT cases were considered as a whole. 33 Compared with previous deep learning studies of COVID-19, which tended to recruit a large amount of data or annotation for algorithm training, the novel deep learning algorithm in our study, 3DMTM-LD, was trained with fewer than 500 patients (150 COVID-19 cases with 251 CT scans and 183 CAP cases with 334 CT scans) and showed comparable excellent In addition, no manual annotation was required during model development. Moreover, 3DMTM also remained feasible when trained on the small dataset (17 COVID-19 cases with 17 CT scans and 35 CAP cases with 35 CT scans) and validated on the relatively large dataset, as evidenced by the unaffected diagnostic performance of 3DMTM-SD (AUC = 0.928, accuracy = 95.3%). In contrast, an obvious decrease was noted in the performance of 3DCM-SD in differentiating COVID-19 from CAP when trained on the relatively small dataset.
The robustness of the 3DMTM algorithm in differentiating COVID-19 from CAP stems from its key component, the MIL-LSTM architecture. The automatic segmentation algorithm in the lesion instance generator enabled efficient selection of instances (slices) containing lesions from whole CT scans, improving the signal-to-noise ratio (SNR). MIL, in which labels are associated with bags rather than with the instances in a bag, greatly reduces the labeling requirement, whereas a CNN is a fully supervised deep learning model that requires fully labeled samples for training. [34] [35] [36] [37] LSTM is a special type of recurrent neural network (RNN) that has better control over long-term memory, which reduces the signal loss seen in conventional RNN architectures and provides spatial information among layers. 18, 38, 39 Thus, combining these two techniques allowed 3DMTM to extract more spatial information with high SNR from targeted lesions without any manual annotation. This is especially valuable when training data are insufficient, because 3DMTM can still extract useful information from limited data. Given that SARS-CoV-2 may coexist with humans for a long time, radiological manifestations may vary with viral mutation or with regional differences among COVID-19 patients around the world. The robustness of our algorithm across data sizes may allow timely diagnosis and treatment management for patients with mutated SARS-CoV-2 from different regions, and it may also have potential value in the medical management of rare diseases. Epidemiological investigations verified the role of clinical information in the diagnosis and management of COVID-19 patients. 1,40-43 Li et al.
discovered several new associations between clinical features by reviewing COVID-19 data from 151 published studies and developed an AI model to discriminate COVID-19 from influenza cases with a sensitivity of 92.5% and a specificity of 97.9%. 44 Zhang et al. developed an AI system for differentiating COVID-19 from common pneumonia and normal controls with 3777 patients and demonstrated that clinical data could significantly improve the performance of the system in prognosis prediction. 45 No matter which dataset was used for training and validation, the inclusion of clinical information improved the diagnostic performance of all models proposed in our study, which confirmed the importance of clinical data for COVID-19 diagnosis. Meanwhile, in the external test cohort, KNN-MMSD, LR-MMSD, SVM-MMSD, and 3DCM-MMSD benefited significantly from clinical information, while models trained with the relatively large dataset achieved only a slight enhancement in COVID-19 discrimination, indicating the essential role of multi-modal information when the sample size is limited. Of note, the only slight enhancement of 3DMTM with multi-modal data might result from its ability to effectively extract key and additional spatial information from lesions on CT images, which offset the impact of multi-modal data on model performance. Considering the difficulty of clinical data collation, the 3DMTM algorithm in this study might be useful in the early screening of COVID-19, especially when comprehensive clinical information is unavailable.

The black-box nature of deep learning technology leads to a lack of transparency in its operation. 4 To improve the interpretability of the deep learning algorithms in this study, attention heat maps were generated using Grad-CAM to indicate the suspicious areas that contribute most to the identification of COVID-19.
24 The visualization of 3DCM and 3DMTM in our study showed not only the decision process of the two models but also the more precise recognition of inflammatory lesions in CT scans by 3DMTM. Without manual annotation, 3DMTM attended to a larger lesion area than 3DCM and obtained a higher SNR, which might explain its outstanding diagnostic performance in identifying COVID-19. This visual output provided relatively intuitive information about lesion location and reference proportion in the deep learning process, which might be especially useful for the detection of subtle pathological changes in asymptomatic patients with no obvious macroscopic imaging findings.

There were also several limitations in our study. First, pneumonia can be caused by different factors such as bacteria, viruses, fungi, and medication; we focused only on the binary discrimination between COVID-19 and CAP rather than a detailed etiological classification because the CAP cases in this study lacked etiological confirmation. Second, the 3DMTM algorithm was trained only for COVID-19 diagnosis in our study. In future work, we will expand our data collection to cover severity classification, prognosis prediction of COVID-19, and detailed etiological analysis of pneumonia. Third, 3DMTM was not compared with radiologists in COVID-19 diagnosis; a systematic analysis of the potential value of 3DMTM in clinical practice is planned.

In conclusion, the weakly supervised algorithm 3DMTM developed in this study showed excellent robustness in discriminating between COVID-19 and CAP with limited chest CT data. Clinical information could significantly improve the performance of KNN, LR, SVM, and 3DCM in COVID-19 discrimination when the training data were limited. 3DMTM based on CT data performed comparably in COVID-19 discrimination to 3DMTM trained with multi-modal information.
AUC: area under the receiver operating characteristic curve
CAP: community-acquired pneumonia
CI: confidence interval
COVID-19: coronavirus disease 2019
CT: computed tomography
KNN: k-nearest neighbor
LR: logistic regression
MIL-LSTM: multi-instance learning with the long short-term memory
RT-PCR: reverse transcriptase polymerase chain reaction
SARS-CoV-2: severe acute respiratory syndrome coronavirus 2
SVM: support vector machine
3D CNN: three-dimensional convolutional neural network
3DMTM: three-dimensional MIL-LSTM algorithm

WHO coronavirus disease (COVID-19) dashboard
Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR
Using artificial intelligence to detect COVID-19 and community-acquired pneumonia based on pulmonary CT: evaluation of the diagnostic accuracy
Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases
Sensitivity of chest CT for COVID-19: comparison to RT-PCR
Positive conversion of COVID-19 after two consecutive negative RT-PCR results: a role of low-dose CT
The role of chest imaging in patient management during the COVID-19 pandemic: a multinational consensus statement from the Fleischner Society
Chest CT findings of COVID-19 pneumonia by duration of symptoms
Role of computed tomography in predicting critical disease in patients with COVID-19 pneumonia: a retrospective study using a semiautomatic quantitative method
Application of computational biology and artificial intelligence technologies in cancer precision drug discovery
Artificial intelligence in lung cancer pathology image analysis
Artificial intelligence in cancer diagnosis and prognosis: opportunities and challenges
Development and validation of deep learning-based automatic detection algorithm for malignant pulmonary nodules on chest radiographs
A nomogram model of radiomics and satellite sign number as imaging predictor for intracranial hematoma expansion
Review of artificial intelligence techniques in imaging data acquisition, segmentation, and diagnosis for COVID-19
Applications of machine learning and artificial intelligence for Covid-19 (SARS-CoV-2) pandemic: a review
Deploying machine and deep learning models for efficient data-augmented detection of COVID-19 infections
A deep learning approach to characterize 2019 coronavirus disease (COVID-19) pneumonia in chest CT images
Deep learning analysis provides accurate COVID-19 diagnosis on chest computed tomography
From community-acquired pneumonia to COVID-19: a deep learning-based method for quantitative analysis of COVID-19 on thick-section CT scans
Development and validation of an automated radiomic CT signature for detecting COVID-19
A deep learning integrated radiomics model for identification of coronavirus disease 2019 using computed tomography
Grad-CAM: visual explanations from deep networks via gradient-based localization
The meaning and use of the area under a receiver operating characteristic (ROC) curve
Fast implementation of DeLong's algorithm for comparing the areas under correlated receiver operating characteristic curves
Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach
A predictive model and scoring system combining clinical and CT characteristics for the diagnosis of COVID-19
Utility of artificial intelligence amidst the COVID 19 pandemic: a review
COVID-19 pneumonia diagnosis using a simple 2D deep learning framework with a single chest CT image: model development and validation
Predictive value of CT in the short-term mortality of coronavirus disease 2019 (COVID-19) pneumonia in nonelderly patients: a case-control study
Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: results of 10 convolutional neural networks
A deep learning system to screen novel coronavirus disease 2019 pneumonia. Engineering (Beijing)
A prototype learning based multi-instance convolutional neural network
A transfer learning-based multi-instance learning method with weak labels
Classification of volumetric images using multi-instance learning and extreme value theorem
Fast multi-instance multi-label learning
Long short-term memory
Antidecay LSTM for siamese tracking with adversarial learning
A diagnostic model for coronavirus disease 2019 (COVID-19) based on radiological semantic and clinical features: a multi-center study
Modeling and forecasting the COVID-19 pandemic in India
Retrospective analysis of clinical features in 134 coronavirus disease 2019 cases
An AI-based radiomics nomogram for disease prognosis in patients with COVID-19 pneumonia using initial CT images and clinical indicators
Using machine learning of clinical data to diagnose COVID-19: a systematic review and meta-analysis
Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography

The authors have no conflict to disclose.

The datasets generated for this study are available upon request to the corresponding author.