key: cord-0859153-16k8ytj1
authors: Zhao, Wentao; Jiang, Wei; Qiu, Xinguo
title: Deep learning for COVID-19 detection based on CT images
date: 2021-07-12
journal: Sci Rep
DOI: 10.1038/s41598-021-93832-2
sha: 2ce267a608196b0627d6f2fdb5c1e3bae603454c
doc_id: 859153
cord_uid: 16k8ytj1

COVID-19 has tremendously impacted patients and medical systems globally. Computed tomography (CT) images can effectively complement reverse transcription-polymerase chain reaction (RT-PCR) testing. This study adopted a convolutional neural network for COVID-19 testing. We examined the performance of different pre-trained models on CT testing and found that larger, out-of-field datasets boost the testing power of the models. This suggests that the a priori knowledge the models acquire from out-of-field training is also applicable to CT images. The proposed transfer learning approach proves to be more successful than the current approaches described in the literature. We believe that our approach has achieved state-of-the-art performance in identification thus far. Experiments with randomly sampled training datasets show satisfactory performance by our model. We also investigated the relevant visual characteristics of the CT images used by the model; these may assist clinical doctors in manual screening.

www.nature.com/scientificreports/

• We used various training steps and resolutions, with and without mixup, to test the impact of these hyperparameters on the results, and found that a higher resolution and an appropriate number of training steps are effective in raising the model performance. As the model itself already yields excellent results, provided the data are sufficient, implementing mixup has little impact on the results.
• With five different strategies for parameter initialization in the models, we studied the impact of initialized parameters on the model performance.
Our results demonstrate that different pre-training parameters influence the final performance of fine-tuned models. By utilizing a larger out-of-field dataset for pre-training, the model generalizes more effectively.
• By comparing our results with those from previous studies, we demonstrate that our models based on transfer learning outperform those based on structural design and achieve state-of-the-art performance. Furthermore, we evaluated our model in a setting with only a small quantity of downstream data and found that it still showed excellent performance in identifying COVID-19.
• With visualization, we investigated the mechanism behind the model for COVID-19 testing to better aid clinical decision-making.

COVID-19 research. Research on COVID-19 is currently being carried out in various areas. Reference 20 reviews the various types of scalable telehealth services used to support patients infected by COVID-19 and other diseases. Reference 21 discusses the wearable monitoring devices and respiratory support systems frequently used to assist people affected by the coronavirus. Reference 22 presents an overview of the existing technologies frequently used to support the respiration of infected patients, and outlines a comparative analysis of the developed devices, the necessary challenges, and possible future directions for the proper selection of affordable technologies. Reference 23 proposes a system that restricts the spread of COVID-19 by detecting people not wearing a facial mask in a smart city network. Given the potential of CT images as a complementary screening method for COVID-19, alongside the challenges of interpreting CT for COVID-19 screening, extensive studies have been conducted on how to detect COVID-19 using CT images.
Deep learning is now widely used in all aspects of COVID-19 research aimed at controlling the ongoing outbreak [24] [25] [26] [27] [28]. Reference 29 gives an overview of the recently developed systems based on deep learning techniques using different medical imaging modalities such as CT and X-ray. Reference 17 established a database of hundreds of CT scans of COVID-19 positive cases and developed a deep learning approach with high sample efficiency based on self-supervision 30 and transfer learning 31 . In addition, researchers have developed an artificial intelligence system capable of diagnosing COVID-19 and separating the disease from other common pneumonia as well as from normal cases 32 . Furthermore, reference 33 created a library containing CT images of 1,521 pneumonia patients (including those with COVID-19), 130 clinical symptoms (a series of symptoms including biochemical and cellular analysis of blood and urine), as well as the clinical symptoms of SARS-CoV-2, and predicted whether each patient was a negative, mild, or severe case. With machine-driven design exploration, reference 34 proposed a deep convolutional neural network structure, COVIDNet-CT, based on CT images. Similarly, leveraging 104,009 CT images from 1,489 patients collected from the China National Center for Bioinformation (CNCB) 32 , combined with data cleaning and preparation in a format suitable for benchmarking, the COVIDx-CT dataset was built, along with explainability-driven performance validation and analysis using the GSInquire technology 35 . Building upon this progress, researchers proposed the COVIDx CT-2 datasets, which increase the number and diversity of patients 36 .

Transfer learning. Transfer learning is a cornerstone of computer vision. With datasets of limited size, various image categorization tasks 37 can achieve greater performance with transfer learning than with any other method.
Previous work has shown that effective performance can be achieved through pretrained models fine-tuned on specific tasks 38, 39 .

With the global spread of the COVID-19 pandemic, access to first-hand CT images and clinical data is critical for guiding clinical decisions, providing information that can deepen our understanding of the patterns of infection by the virus, and offering systematic models for early diagnosis and timely medical interventions. A key approach is to establish a comprehensive database with open access to CT images and associated clinical symptoms to facilitate the global fight against COVID-19. As mentioned in the Related work section, several datasets have been built and are open to researchers, doctors, and data scientists for COVID-19-related research. Although the COVIDx-CT dataset is evidently larger than many other CT datasets used in the literature on COVID-19 testing, a potential limitation of using COVIDx-CT for deep neural network learning lies in its limited patient demographic diversity. Specifically, as COVIDx-CT is collected from the CNCB, only information from the different provinces in China is available, meaning that the symptoms of COVID-19 in the CT images may not generalize appropriately to cases beyond China. Increasing the number and diversity of patients would expose deep neural networks to more varied and comprehensive data, making them more generalizable and applicable in different clinical environments around the world. By carefully processing and organizing the CT images of patients based on various CT devices, solutions, and validation abilities, previous researchers 36 established the COVIDx CT-2A and COVIDx CT-2B datasets. COVIDx CT-2A involves 194,922 images from 3,745 patients aged between 0 and 93, with a median age of 51. Each CT scan of a patient consists of many CT slices. We use the CT slices as the input images to detect COVID-19, making the COVID-19 detection problem an image 45 .
The purpose of establishing this validation set is to investigate, for instance, whether adding weakly validated training data (i.e., findings without RT-PCR and radiological tests) would boost the performance of the model. This validation can further increase the breadth and diversity of the dataset. To allow comparison with previous models, and because the data are open, in the present study we employed COVIDx CT-2A for COVID-19 testing. Figure 1 illustrates relevant examples from the COVIDx CT-2A dataset, which includes 3 types of CT scans: novel coronavirus pneumonia (NCP) caused by SARS-CoV-2 infection, common pneumonia (CP), and normal controls. We applied some modifications to images from the database to facilitate model training. Specifically, as the contrast in the background of the images may bias the models, we removed the background with an automatic cropping algorithm to standardize the field to the body area (as shown by the red frames in Fig. 1 ). By comparing the various types, we identified ground glass opacity (GGO), lung consolidation 46 , and even the presence of white pneumonia in the CP and NCP groups. However, owing to the considerably subtle visual differences between the images of those infected with common pneumonia and those infected with SARS-CoV-2, there might be tremendous variations in the ability to distinguish between the diseases, even for radiologists. Figure 2 presents the distribution of the different types of infections and images in the training, test, and validation sets.

Using machine-driven design exploration, previous researchers 34 designed the deep convolutional neural network COVIDNet-CT for COVID-19 testing based on CT images. The subsequent COVID-Net CT-2 36 was then designed using this architecture as its basis. In our experiment, we adopted ResNet-v2, a modified version of ResNet 47 .
Next, we replaced batch normalization 49 with group normalization 48 and applied weight standardization 50 to all convolutional layers. To investigate how transfer learning utilizes external data in COVID-19 testing based on CT images, we used parameters pre-trained on CIFAR-10 51 , ILSVRC-2012 52 , and ImageNet-21k 53 to initialize the models before training.

Hyperparameter settings for training. The general flowchart of the COVID-19 diagnosis system based on deep learning is illustrated in Fig. 3 . The overall system contains two sections. In the training section, the training data are used to update the model parameters, and the performance of the developed model is appraised on test data. In the test section, the model is used to extract features and finally to identify the class labels based on these features. Lastly, the developed model is assessed with evaluation metrics such as accuracy, sensitivity, and specificity. The pseudocode for fine-tuning the convolutional neural network (CNN) and obtaining the accuracy can be seen in Algorithm 1. For each iteration, we randomly selected b CT images to calculate the gradient and updated the network parameters. Unlike the standard training process, we did not fix the number of epochs, but fixed the number of training steps instead. Regarding the choice of hyperparameters, we used stochastic gradient descent (SGD) and set the learning rate to 0.003, the momentum to 0.9, and the batch size to 64. RGB reordering was applied, and the final input to the proposed model was provided as a 512 × 512 × 3 image. Concerning data augmentation, for the training set we first cropped the images according to the annotated cropping frame, then resized them to 512 × 512 pixels and randomly cropped them to 480 × 480 pixels, followed by random horizontal flips and normalization.
For the test set, we simply cropped the images according to the annotation and then resized them to 480 × 480 pixels. We used 10,000 training steps in our experiments. To fine-tune the model, we first applied a warmup 54 to the learning rate, and then reduced the learning rate by a factor of 10 three times over the course of training. The details are provided in the Parameter sensitivity section. Finally, we used mixup (Eq. (1)) for data augmentation.

$$\tilde{x} = \lambda x_i + (1 - \lambda) x_j, \qquad \tilde{y} = \lambda y_i + (1 - \lambda) y_j \tag{1}$$

Here, $x_i$ and $x_j$ are the initial input vectors, $y_i$ and $y_j$ are the corresponding labels, and $\lambda \in [0, 1]$ is the mixing coefficient. Through mixup, we obtained the new vectors $\tilde{x}$ and labels $\tilde{y}$. Because the cross-entropy loss is convex in the model outputs, and convex optimization problems have good convergence properties when solved by gradient descent, we used cross entropy as the loss function (Eq. 2).

$$\mathrm{loss}(x, class)_i = -\log \frac{\exp(x[i, class[i]])}{\sum_{j=0}^{C-1} \exp(x[i, j])} \tag{2}$$

where $x \in \mathbb{R}^{N \times C}$ is the output of the model, $class \in \mathbb{R}^{N}$ is the label of the CT images, and $0 \le class[i] \le C - 1$.

In this section, we investigate the model performance in testing for COVID-19. Specifically, we endeavor to address the following questions:
• How do different hyperparameters, including resolution, training steps, and mixup, affect the model performance?

Test performance. We used the training settings described in the Hyperparameter settings for training section to train the models. The results are summarized in Table 1 and are compared with those from the current most advanced methods. Random, Bit-S, and Bit-M are the models used in our experiments; they refer to random initialization, pre-training on ILSVRC-2012, and pre-training on ImageNet-21k, respectively, as introduced in the Impact of parameter initialization section. We compared our model with the most advanced COVID-Net CT-2 L. Table 1 reveals that our Bit-S and Bit-M models, which rely on transfer learning, achieved an accuracy 0.71% and 1.12% higher than that of the COVID-Net CT-2 L model, respectively.
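As a concrete illustration of the mixup step (Eq. 1) and the cross-entropy loss (Eq. 2) referred to earlier in this section, the following minimal sketch implements both in plain Python. The function names, the three-class label order, and the Beta(α, α) sampling with α = 0.1 are assumptions made for illustration; the text does not state how the mixing coefficient was drawn.

```python
import math
import random

def one_hot(k, num_classes=3):
    """One-hot label; the class order (e.g. NCP, CP, normal) is an assumption."""
    return [1.0 if c == k else 0.0 for c in range(num_classes)]

def sample_lambda(rng, alpha=0.1):
    """Draw the mixing coefficient from Beta(alpha, alpha); alpha is assumed."""
    return rng.betavariate(alpha, alpha)

def mixup(x_i, y_i, x_j, y_j, lam):
    """Convex combination of two inputs and of their label vectors."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y

def log_softmax(logits):
    """Numerically stable log-softmax over one row of model outputs."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]

def cross_entropy(logits, target):
    """Cross entropy against a probability vector (one-hot or mixed)."""
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))

# Usage: mix two toy "images" and score a uniform prediction.
rng = random.Random(0)
lam = sample_lambda(rng)
x_mix, y_mix = mixup([1.0, 3.0], one_hot(0), [3.0, 1.0], one_hot(2), lam)
loss = cross_entropy([0.0, 0.0, 0.0], y_mix)
```

With a one-hot target, `cross_entropy` reduces to the negative log-probability of the true class; with a mixed target it becomes the λ-weighted sum of the two per-class losses.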
In addition, the accuracy of our randomly initialized model was 3.60% higher than that of COVID-Net CT-1, suggesting that, in comparison with models using structure space search, our model with random parameter initialization also performs excellently. Figure 4 shows the distribution of the CT image representations after dimensionality reduction, which highlights the proper differentiation of the different categories. In the confusion matrix 55 in Fig. 5 , we demonstrate that even though radiologists may sometimes fail to distinguish between CP and NCP,

Figure 4. Distribution of characteristics of CT images after dimensionality reduction with t-SNE 57 . Each node refers to a different CT image, the color reflects the category, and the meaning of each color is defined in the legend.

]. The first parameter is the number of warmup steps, the last parameter is the end step, and the remaining parameters are the steps at which the learning rate decays by a factor of 10. Figure 6 displays the test accuracy for different resolutions and training steps with and without mixup. The results emphasize that a higher resolution can increase identification accuracy, which means that clearer CT images contain more diagnostic clinical information. A larger number of training steps can also improve accuracy, but the effect is less significant beyond 10,000 steps. For a resolution of (512, 480) and 10,000 training steps, the accuracy rates in Fig. 6a and b are exactly the opposite (the hyperparameter settings for the experiments are the same). This phenomenon is a result of random sampling. It indicates that mixup does not enhance the performance of the model because the data are already rich enough.

To evaluate the impact of parameter initialization on the task performance, we used the pre-trained ResNet50x1 models to investigate how upstream pre-training affects the fine-tuning performance.
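The learning-rate schedule described in this section (a warmup, then decays by a factor of 10 at given step nodes until the end step) can be sketched as follows. The base learning rate of 0.003 comes from the text; the linear shape of the warmup and the example boundaries `[500, 3000, 6000, 9000, 10000]` are hypothetical values chosen only to make the sketch runnable, not the settings used in the experiments.

```python
def learning_rate(step, schedule, base_lr=0.003):
    """Return the learning rate at a given step, or None once training is over.

    schedule = [warmup_steps, decay_step_1, ..., decay_step_k, end_step]
    """
    warmup, *decay_points, end = schedule
    if step >= end:
        return None
    if step < warmup:
        # Linear ramp from 0 to base_lr (the ramp shape is an assumption).
        return base_lr * step / warmup
    # Each decay point already passed divides the rate by 10.
    drops = sum(1 for s in decay_points if step >= s)
    return base_lr / (10 ** drops)

# Hypothetical schedule with three decays over 10,000 steps.
example_schedule = [500, 3000, 6000, 9000, 10000]
rates = [learning_rate(s, example_schedule) for s in (0, 250, 500, 3000, 9500)]
```

Expressing the schedule as a single list keeps the warmup length, decay points, and stopping step together, which matches the parameter-list notation used in this section.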
Random means the parameters were randomly initialized in the models. BIT-M was pre-trained on the complete ImageNet-21k dataset, a public dataset with 14,200,000 images and 21,000 categories. The images could contain multiple labels. BIT-S was pre-trained on the ILSVRC-2012 variant of ImageNet, which includes 1,280,000 images and 1,000 categories. BIT-M-S was first pre-trained on the ImageNet-21k dataset and then fine-tuned on ILSVRC-2012. BIT-M-C was first pre-trained on the ImageNet-21k dataset and then fine-tuned on CIFAR-10, which contains 60,000 images (32 × 32 pixels) across 10 categories. The pre-trained weights, obtained on out-of-domain data, were taken from a previous study 58 . For a fair comparison, we set the number of training steps to 10,000 and used mixup, while the other settings were the same as those in the Hyperparameter settings for training section. The impact of weight initialization is illustrated in Table 3 . We repeated the experiment, and the results differed slightly from those in Table 1 because of random sampling and random initialization of model parameters. We found that the parameters pre-trained on ImageNet-21k generalized better than those pre-trained on ILSVRC-2012. Moreover, this performance was not affected even by fine-tuning on out-of-field datasets. We then calculated the test performance every 100 steps, as presented in Fig. 7 . The models pre-trained on ImageNet-21k (BIT-M, BIT-M-S, and BIT-M-C) performed better on the test set at later stages than did the model initialized with ILSVRC-2012 weights (BIT-S). This result highlights that training with the larger dataset results in greater generalizability.

Influence of the size of labeled training data on model performance.
To evaluate how the models perform on small downstream datasets akin to those that would be used in real-world situations, a certain number of images from each category were randomly selected for a performance test. For each category, we randomly chose 50, 100, 500, and 1,000 samples for training and evaluated the trained model's identification rate on the test set. The results of these tests are presented in Fig. 8 . The histogram on the right shows the outcomes of the ImageNet-21k pre-trained model using the entire training set, CT-2L, CT-2S, and CT-1. In these tests, BIT-M achieved a high test accuracy with a limited number of labeled images. When 100 images were selected from each category, the accuracy (94.8%) already exceeded that of CT-1 (94.5%). When 1,000 images were selected, the accuracy (98.0%) was as good as that of CT-2S (97.9%). This lends support to the immense potential of our transfer learning models, which still function well with a limited dataset. It suggests that the a priori knowledge learned through pre-training on large, out-of-field datasets can still ensure excellent performance when training data are limited.

Qualitative analysis of COVID-19 testing of the model. Although performance indicators are useful for model evaluation, they fail to explain the decision-making behavior of the network. We therefore employed the Grad-CAM 59 visualization technique to explore the areas the models attend to in COVID-19 testing, to better understand which characteristics of CT images are key for diagnostic accuracy, and thus to aid clinical decision-making. As demonstrated in Fig. 9 , we first cropped the images using the detection frame (introduced in the Hyperparameter settings for training section), enlarged them to 480 × 480 pixels, and used Grad-CAM for visual explanation.
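For readers unfamiliar with Grad-CAM, the core computation behind such heatmaps can be sketched in a few lines. This is a simplified illustration, not the pipeline used in the study: it assumes the feature maps of the last convolutional layer and the gradients of the class score with respect to them have already been obtained from the network, and it omits upsampling the map to the 480 × 480 input. The function name and the final scaling to [0, 1] are illustrative choices.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap from precomputed tensors.

    activations: K feature maps, each an H x W nested list.
    gradients:   d(class score)/d(activation), same shape as activations.
    """
    k = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weights: global average of the gradients over spatial positions.
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum of the feature maps, followed by ReLU.
    cam = [[max(0.0, sum(alphas[c] * activations[c][i][j] for c in range(k)))
            for j in range(w)] for i in range(h)]
    # Scale to [0, 1] for display (a presentation choice, not part of the method).
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam

# Toy example with two 2 x 2 feature maps.
acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(acts, grads)
```

The ReLU keeps only regions that push the class score up, which is why the resulting map highlights evidence for the predicted class rather than against it.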
All the predictions of the model for the CT images in Fig. 9 agree with the actual detection results. In most cases, the behavior of the model matches what would be expected from typical human visual cognition. This is particularly true for CP, as the model successfully focuses on the disease areas and highlights the affected regions of the lungs. Radiologists can further apply a Grad-CAM-based color visualization approach to make efficient and confident decisions 60 . For the normal case, the model focuses more on the lower region. Although NCP due to SARS-CoV-2 could be detected using the first and third CT images (third row in Fig. 9 ), the model was more interested in the texture at the periphery. Such a visual heuristic, which differs from human visual perception, merits further exploration to gain better knowledge of how the model detects COVID-19 and which features it considers most diagnostic. The discovery of these features would contribute to explaining the power of the model in COVID-19 testing, as well as assisting clinical doctors in discovering new visual indicators of COVID-19 infection for use in manual screening based on CT images.

Our study applied transfer learning to COVID-19 testing using CT images and discussed the impact of various initialization parameters on the results, demonstrating that our model, which was pre-trained on ImageNet-21k, generalizes strongly to CT images. The proposed model provides an accuracy of 99.2% in detecting COVID-19 cases. Compared to the neural architecture search model, our model shows state-of-the-art performance across all the metrics we have described. These results help ensure that COVID-19-negative patients are correctly diagnosed as negative in the vast majority of cases, reducing the probability of diagnosing COVID-19-negative cases as positive and reducing the burden on the health care system.
Additionally, we examined the performance of the model with limited data and found that the model still performs satisfactorily. This shows that our model is still applicable with limited data, which is characteristic of real situations, where large and diverse datasets may not be readily available. Finally, we explored the relevant mechanism of COVID-19 testing using the Grad-CAM visualization technique to make the proposed deep learning model more interpretable and explainable. Interpretability-driven performance validation shows that the model behaves in a manner consistent with the radiologist's interpretation for CP. The investigation of normal and NCP CT images helps to explore new visual indicators to assist clinical doctors in further manual screening. The experiments demonstrate that our models are effective in COVID-19 testing. In the future, we will focus on evaluating the severity of COVID-19 and attempt to discover more valuable information from CT images to combat the pandemic. We will further conduct explanatory analyses of the models, which will shed light on the detection mechanism of COVID-19, to identify key characteristics in the CT images and to facilitate screening by clinical doctors. Although the system performs well on public datasets, the work is still at the theoretical research stage, and the models have not been validated in actual clinical routine. Therefore, we will test our system in clinical routine and communicate with physicians to understand how they use it and their opinions about the models, so that we can further improve the models in our future work.

The datasets analysed during the current study are available in the COVIDNet-CT repository, https://github.com/haydengunraj/COVIDNet-CT.

Coronaviridae study group of the international committee on taxonomy of viruses.
the species severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2
A new coronavirus associated with human respiratory disease in China
A novel coronavirus from patients with pneumonia in China
Chest CT findings in patients with coronavirus disease 2019 and its relationship with clinical features
A pneumonia outbreak associated with a new coronavirus of probable bat origin
Coronavirus disease 2019 (COVID-19): A perspective from China
Detection of SARS-CoV-2 in different types of clinical specimens
Sensitivity of chest CT for COVID-19: Comparison to RT-PCR
Stability issues of RT-PCR testing of SARS-CoV-2 for hospitalized patients clinically diagnosed with COVID-19
Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases
Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study
Clinical, laboratory and imaging features of COVID-19: a systematic review and meta-analysis
Pulmonary pathology of early-phase 2019 novel coronavirus (COVID-19) pneumonia in two patients with lung cancer
The role of chest computed tomography in asymptomatic patients of positive coronavirus disease 2019: a case and literature review
Performance of radiologists in differentiating COVID-19 from non-COVID-19 viral pneumonia at chest CT
Artificial intelligence-enabled rapid diagnosis of patients with COVID-19
Sample-Efficient Deep Learning for COVID-19 Diagnosis Based on CT Scans
Covid-net: a tailored deep convolutional neural network design for detection of covid-19 cases from chest x-ray images
COVIDNet-S: Towards computer-aided severity assessment via training and validation of deep neural networks for geographic extent and opacity extent scoring of chest X-rays for SARS-CoV-2 lung disease severity
Scalable telehealth services to combat novel coronavirus (COVID-19) pandemic
Wearable technology to assist the patients infected with novel coronavirus (COVID-19)
Breathing aid devices to support novel coronavirus (COVID-19) infected patients
An automated system to limit COVID-19 using facial mask detection in smart city network
Deep learning applications to combat novel coronavirus (COVID-19) pandemic
A combined deep CNN-LSTM network for the detection of novel coronavirus (COVID-19) using X-ray images
Automated COVID-19 diagnosis from X-ray images using convolutional neural network and ensemble of machine learning classifiers
Diagnosis of COVID-19 from X-rays using combined CNN-RNN Architecture with transfer learning
Predictive data mining models for novel coronavirus (COVID-19) infected patients' recovery
A review on deep learning techniques for the diagnosis of novel coronavirus (COVID-19)
Unsupervised feature learning via non-parametric instance discrimination
A survey on transfer learning
Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography
Open resource of clinical data from patients with pneumonia for the prediction of COVID-19 outcomes via deep learning
COVIDNet-CT: A Tailored Deep Convolutional Neural Network Design for Detection of COVID-19 Cases From Chest CT Images
Do explanations reflect decisions? a machine-centric strategy to quantify the performance of explainability algorithms
COVID-Net CT-2: Enhanced deep neural networks for detection of COVID-19 from Chest CT images through bigger, more diverse learning
Convolutional neural networks for medical image analysis: full training or fine tuning
How to represent paintings: A painting classification using artistic comments
Compare the performance of the models in art classification
Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets
A Fully Automated Deep Learning-based Network For Detecting COVID-19 from a New And Large Lung CT Scan Dataset
Towards efficient covid-19 ct annotation: A benchmark for lung and infection segmentation
The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans
COVID-19 | Radiology Reference Article | Radiopaedia
Chest CT Scans with COVID-19 Related Findings Dataset. Preprint, Radiology and Imaging
CT manifestations of coronavirus disease-2019: A retrospective analysis of 73 cases by disease severity
Deep residual learning for image recognition
Group normalization. Proceedings of the European Conference on Computer Vision (ECCV)
Batch normalization: Accelerating deep network training by reducing internal covariate shift
Learning multiple layers of features from tiny images
Imagenet: A large-scale hierarchical image database
Imagenet large scale visual recognition challenge
Accurate, large minibatch sgd: Training imagenet in 1 hour
Multiclass confusion matrix library in Python
Application of deep learning for fast detection of COVID-19 in X-Rays using nCOVnet
Visualizing data using t-SNE
General Visual Representation Learning
Grad-cam: Visual explanations from deep networks via gradient-based localization
A deep learning and grad-CAM based color visualization approach for fast detection of COVID-19 cases using chest X-ray and CT-Scan images

W.Z.
conceived the experiment, W.Z. and X.Q. conducted the experiment, W.Z. and W.J. analyzed the results. All authors reviewed the manuscript.

The authors declare no competing interests.

Correspondence and requests for materials should be addressed to X.Q.

Reprints and permissions information is available at www.nature.com/reprints.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.