key: cord-1000160-qp7ct9wu authors: Nguyen, P.; Iovino, L.; Flammini, M.; Linh, L. T. title: Deep Learning for Automated Recognition of Covid-19 from Chest X-ray Images date: 2020-08-14 journal: nan DOI: 10.1101/2020.08.13.20173997 sha: cf02eda4abb178855765e26cc753a82ed247968e doc_id: 1000160 cord_uid: qp7ct9wu

Abstract

Background: The pandemic caused by the coronavirus in recent months is having a devastating global effect, putting the world under an unprecedented emergency. Since there are no effective antiviral treatments for Covid-19 yet, it is crucial to detect the disease early and to monitor its progression, thus helping to reduce mortality. While a vaccine is being developed and different measures are being used to combat the virus, medical imaging techniques have also been investigated to assist doctors in diagnosing this disease.

Objective: This paper presents a practical solution for the detection of Covid-19 from chest X-ray (CXR) images, exploiting cutting-edge Machine Learning techniques.

Methods: We employ EfficientNet and MixNet, two recently developed families of deep neural networks, as the main classification engine. Furthermore, we apply different transfer learning strategies, aiming at making the training process more accurate and efficient. The proposed approach has been validated on two real datasets: the former consists of 13,511 training images and 1,489 testing images, the latter has 14,324 and 3,581 images for training and testing, respectively.

Results: The results are promising: in all the experimental configurations considered in the evaluation, our approach always yields an accuracy larger than 95.0%, with the maximum accuracy obtained being 96.64%.

Conclusions: Compared with various existing studies, we can thus conclude that our performance improvement is significant.
Covid-19 is a coronavirus-induced infection that can be associated with a coagulopathy and infection-induced inflammatory changes [1]. The disease poses a serious threat to public health, and thus in March 2020 the World Health Organization (WHO) declared Covid-19 a pandemic. So far, the virus has infected more than ten million people across the world and has claimed over five hundred thousand lives. The clinical spectrum of the disease is very wide: it ranges from fever, dry cough and diarrhea, which can be combined with mild pneumonia and mild dyspnoea. In some cases, the infection can evolve to severe pneumonia, leading to severe lung dysfunction in approximately 5% of the infected patients. In such circumstances, patients need ventilation, as they are highly exposed to multiple extrapulmonary organ failure. Since no effective vaccine or antiviral treatment for Covid-19 is available so far, it is crucial to reduce mortality by detecting the disease early and monitoring its progression [2], so as to effectively personalize each patient's treatment.

Radiology is a fundamental part of the process of determining whether or not the radiological outcomes are consistent with the infection, and radiologists should expedite the examination as much as possible and provide accurate reports of their findings. Chest X-ray (CXR) images of Covid-19 patients usually show multifocal, bilateral and peripheral lesions, but in the early phase of the disease they may present a unifocal lesion, most commonly located in the inferior lobe of the right lung. Providing doctors with a preliminary diagnosis of Covid-19 from CXR images would be of great importance, also considering the number of false positives obtained by swab results.
In recent years, Artificial Intelligence (AI) has been at the forefront of methodologies applied to improve products and services in various aspects of everyday life. The proliferation of advanced Machine Learning algorithms enables numerous applications in various domains. Machine Learning (ML) algorithms attempt to simulate humans' cognitive functions [3], aiming to acquire real-world knowledge autonomously [4]. In this way, ML techniques are capable of generalizing from concrete examples, without needing to be manually coded [5, 6]. For example, they have been applied to improve Web search by learning from a user's long-term search history [7]. For recommender systems, ML algorithms demonstrate their superiority by analyzing sentiment with ensemble techniques in social applications [8], or by allowing systems to learn from various profiles, thus boosting the recommendation outcomes [4]. In the health care sector, the potential of ML to allow for rapid diagnosis of diseases has also been proven by various research works [3, 9, 10, 11].

Aiming to assist clinical care, this paper presents a practical solution for the detection of Covid-19 from CXR images exploiting two cutting-edge deep neural network families, EfficientNet [12] and MixNet [13], empowering the learning process by means of three different transfer learning strategies, namely ImageNet [14], AdvProp [15], and Noisy Student [16]. Our experimental results on two considerably large datasets show that the proposed solution outperforms the existing studies that we are aware of in terms of prediction accuracy.
The main contributions of our work can thus be summarized as follows:
• A framework for recognition of Covid-19 from CXR images using state-of-the-art deep learning techniques;
• A successful empirical evaluation on two large datasets of CXR images;
• A software prototype in the form of a mobile app ready to be downloaded.

The paper is organized into the following sections. In Section 2 we briefly review convolutional neural networks, EfficientNet and MixNet, as well as the transfer learning methods. Section 3 explains the dataset and metrics used for our evaluation, together with the main results. The related work is reviewed in Section 5. Finally, Section 6 provides some concluding remarks and discusses possible future research directions.

As a base for our presentation, we provide a background on convolutional neural networks in Section 2.1. Two families of deep neural networks, i.e., EfficientNet and MixNet, which are used as the classification engine in our work, are introduced in Section 2.2. Finally, a brief introduction to transfer learning is given in Section 2.3.

Convolutional neural networks (CNNs) [17] are a family of supervised learning techniques that work on images, attempting to capture some of their intrinsic features, such as spatial and temporal structures, using a filter or kernel. A filter is a small square sliding window that is used to capture features from an input image, such as nodes and edges. Various types of features can be captured with several filters. The convolution operation is performed by sliding the filter along the width and height of a feature map, which is either the input image or the result of a previous convolution operation. The output feature map of one layer becomes the input feature map of the succeeding layer.
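The sliding-window operation described above can be illustrated with a minimal sketch in plain Python. It is not the implementation used in this work: the function name `conv2d_valid`, the toy 4x4 image, and both filters are hypothetical, and stride 1 without padding is an assumption made for brevity. As in most deep learning frameworks, the filter is applied without flipping.

```python
def conv2d_valid(feature_map, kernel):
    """Slide `kernel` over `feature_map` with stride 1 and no padding."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(feature_map) - k_h + 1
    out_w = len(feature_map[0]) - k_w + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Weighted sum of the window currently covered by the filter.
            acc = 0
            for i in range(k_h):
                for j in range(k_w):
                    acc += feature_map[row + i][col + j] * kernel[i][j]
            out_row.append(acc)
        output.append(out_row)
    return output


# Hypothetical 4x4 input with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 filter that responds to dark-to-bright vertical edges.
edge_filter = [
    [-1, 1],
    [-1, 1],
]

first_map = conv2d_valid(image, edge_filter)
# The output of one layer becomes the input of the next one.
second_map = conv2d_valid(first_map, [[1, 1], [1, 1]])
```

In this toy case the first feature map is non-zero only where the filter straddles the edge, which is the sense in which a filter "captures" a feature.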
In general, a CNN also contains the following intrinsic elements:
• Convolution layer: as the name suggests, this layer extracts important features of an input image by convolving the image with filters;
• Pooling layer: this layer is used to downsample a feature map by taking the maximum value within a window, normally a square one, to reduce the number of parameters [17];
• Fully-connected layer: this layer works as a conventional perceptron, i.e., each of its neurons is fully connected to the previous layer;
• Dropout: it is used to distribute the learned representation across all the neurons. Dropout is an effective measure to combat overfitting [18];
• Softmax: this function converts a set of real numbers to probabilities, which in turn sum to 1.0 [17]. Softmax is normally used as the activation function in the last fully-connected layer of a CNN. Given $C$ categories, and denoting by $y_k$ the output of the $k$-th neuron, the class that gets the maximum probability is selected as the final prediction, i.e., $\hat{y} = \arg\max_{k \in \{1, \dots, C\}} p_k$, with $p_k$ being defined as below.

$$p_k = \frac{e^{y_k}}{\sum_{j=1}^{C} e^{y_j}}$$

Based on the observation that better accuracy and efficiency can be obtained by imposing a balance between all network dimensions, EfficientNet [12] scales a network in three dimensions, i.e., width, depth, and resolution, using a set of fixed scaling coefficients that meet some specific constraints. In the most compact configuration, i.e., EfficientNet-B0 shown in Fig. 2, there are 18 convolution layers in total, i.e., D=18, and each layer is equipped with a kernel k(3,3) or k(5,5). The input image contains three color channels: R, G, and B.

The conventional practice is to use k(3,3) [19], [20], k(5,5) [21], or k(7,7) kernels [22]. However, larger kernels can potentially improve the model accuracy and efficiency. Furthermore, large kernels help to capture high-resolution patterns, while small kernels are better at extracting low-resolution ones.
To maintain a balance between accuracy and efficiency, the MixNet [13] family has been built based on the MobileNets architectures [20, 23]. This network family also aims to reduce the number of parameters as well as FLOPs, i.e., the metric used to measure the computational complexity [24].

In order to tune their internal parameters, i.e., weights and biases, CNNs normally need a huge amount of labeled data. Furthermore, the deeper a network is, the more parameters it contains. In this respect, deeper networks require more data to prevent overfitting and be effective. As a result, it is crucial to feed them with enough data, so as to foster the training process. However, such a requirement is hard to meet in practice, since the labeling process is usually done manually, and is thus time consuming and prone to error [25]. To this end, transfer learning has been conceptualized as an effective way to extract and transfer the knowledge from a well-defined source domain to a novice target domain [26, 27]. In other words, transfer learning facilitates the export of existing convolution weights from a model trained on large datasets to create new accurate models using a relatively small number of labeled images. As has been shown in various studies [28, 29], transfer learning remains helpful even when the target domain is quite different from the one in which the original weights have been obtained.

In this work, we consider the following learning methods:
• ImageNet [14]: the ImageNet dataset has been widely exploited for transfer learning by several studies, since it contains more than 14 million images covering miscellaneous categories;
• AdvProp [15]: adversarial propagation has been proposed as an improved training scheme, with the ultimate aim of avoiding overfitting.
• NS [16]: the Noisy Student learning method attempts to improve ImageNet classification by making the trainee/student equal to or larger than the trainer/teacher.

In an attempt to develop an expert system that can help doctors to detect Covid-19 early from CXR images, we make use of EfficientNet and MixNet as the classification engine. Moreover, we obtain network weights by means of the three different learning strategies mentioned above, i.e., ImageNet, AdvProp, and NS. In the following section, we present the evaluation settings used to study the performance of our approach.

This section explains in detail the material and methods used to evaluate the proposed approach. In particular, we made use of two existing datasets and recent implementations of EfficientNet and MixNet, which were built on top of the PyTorch framework. Moreover, we adopted pre-trained weights from different sources to speed up the learning process. The tool developed for this paper has also been published on GitHub to make it available for future research.

We answer the following research questions to study the performance of the classifiers with respect to the different transfer learning methods:
• RQ1: Which network family between EfficientNet and MixNet brings the best prediction performance? For a classifier, it is crucial to get accurate outcomes according to various quality metrics. We determine which deep neural network family yields the best prediction performance.
• RQ2: Which transfer learning technique is beneficial to the final outcome? We are interested in finding which transfer learning method among ImageNet, AdvProp, and NS helps which network, i.e., EfficientNet or MixNet, to obtain a better outcome.

We exploited existing datasets, used by some previous works [30, 31]. All images in $D_1$ and $D_2$ have been assigned a label, i.e., Normal, Pneumonia, or Covid-19.
From the testing data, three independent groups of images with the same label were created, i.e., $G = (G_1, G_2, G_3)$, also called ground-truth data. Using either EfficientNet or MixNet as classifier on the test set, we obtained three predicted classes of images, i.e., $C = (C_1, C_2, C_3)$. The classifier performance is evaluated by measuring the similarity of the classified categories with the ground-truth ones. To this end, we exploited the following metrics: accuracy, precision, recall, and $F_1$ score [29]. The rationale behind the selection of such metrics is that precision, recall and $F_1$ are useful when the number of positive images accounts for a very small percentage of all the items in the dataset.

We denote by $TP_i = |G_i \cap C_i|$, $i = 1, 2, 3$, the number of true positives, i.e., the items that appear both in the results and in the ground-truth data of class $i$. The metrics are then defined as follows.

Accuracy: This is defined as the fraction of correctly classified items over the total number of images in the test set.

$$\text{accuracy} = \frac{\sum_{i=1}^{3} TP_i}{\sum_{i=1}^{3} |G_i|}$$

Precision and Recall: Precision measures the fraction of the images classified into class $C_i$ that are found in the ground-truth group $G_i$, while Recall is the fraction of the images in the ground-truth group $G_i$ that are true positives.

$$\text{precision}_i = \frac{TP_i}{|C_i|}, \qquad \text{recall}_i = \frac{TP_i}{|G_i|}$$

$F_1$ score: The metric is computed as the harmonic average of precision and recall by means of the following formula:

$$F_{1,i} = \frac{2 \cdot \text{precision}_i \cdot \text{recall}_i}{\text{precision}_i + \text{recall}_i}$$

Furthermore, we make use of an additional metric to measure the computational efficiency.

Recognition speed: We measure the average number of generated predictions per second, using a system whose configurations are presented in Table 2.

To train deep neural networks such as EfficientNet and MixNet, it is necessary to have a server with a powerful computational capability. 100MB of storage space each.
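The accuracy, precision, recall, and $F_1$ definitions given earlier in this section can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `evaluate` is hypothetical, the six labels below are made-up toy data rather than results from the paper, and returning 0.0 when a denominator is empty is a convention chosen here, not something the paper specifies.

```python
def evaluate(ground_truth, predicted, classes):
    """Return overall accuracy plus per-class precision, recall and F1."""
    pairs = list(zip(ground_truth, predicted))
    correct = sum(1 for truth, pred in pairs if truth == pred)
    report = {"accuracy": correct / len(pairs)}
    for label in classes:
        # TP_i = |G_i ∩ C_i|: items of class `label` that were predicted as `label`.
        tp = sum(1 for truth, pred in pairs if truth == label and pred == label)
        n_predicted = sum(1 for pred in predicted if pred == label)   # |C_i|
        n_truth = sum(1 for truth in ground_truth if truth == label)  # |G_i|
        precision = tp / n_predicted if n_predicted else 0.0
        recall = tp / n_truth if n_truth else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Made-up toy labels, for illustration only.
classes = ["Normal", "Pneumonia", "Covid-19"]
truth = ["Normal", "Normal", "Pneumonia", "Pneumonia", "Covid-19", "Covid-19"]
preds = ["Normal", "Normal", "Pneumonia", "Normal", "Covid-19", "Pneumonia"]

report = evaluate(truth, preds, classes)
```

On this toy input the Covid-19 class has perfect precision but a recall of only one half, which mirrors the situation where a classifier's positive predictions are reliable yet some ground-truth positives are missed.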
In the evaluation, we applied the five-fold cross-validation technique on the datasets, i.e., each dataset is divided into five equal parts and the validation is performed in five independent rounds. In each round, one part is used for testing and the other four parts are used for training. In the next section, we present in detail the experimental results by referring to the aforementioned research questions.

This section reports and analyzes the results obtained from our experiments. We address our two research questions separately.

The classifiers are able to predict the testing images with high precision. With respect to recall, we can see that for the category Covid-19 all the classifiers get a considerably low score. In particular, the highest recall is 0.700, obtained by $C_5$. This means that while the approach is able to find good predictions for the category, it cannot return all the items in the ground-truth data. We suppose that this happens due to the limited data available for training.

Altogether, through Table 4 we can see that $C_5$, i.e., the row marked in gray, is the configuration that brings the best prediction performance.

Compared to existing work that performs evaluation on the same dataset [30, 32], our approach achieves a better performance with respect to accuracy, precision, recall, and $F_1$-score. For instance, in the work by Wang et al. [30], the maximum accuracy is 93.0% with similar experimental settings. In this respect, we conclude that the application of the two network families EfficientNet and MixNet, as well as of the different transfer learning techniques, brings a good prediction performance on the considered dataset.

Using the system specified in Table 3, we counted the number of predictions returned by the classifiers in a second, as depicted in Fig. 3.
From the figure it is clear that $C_1$, $C_2$, and $C_3$, corresponding to using EfficientNet-B0 as the classification engine, are the most efficient configurations, as they return 138 images per second on average. EfficientNet-B3 also yields a good timing performance, i.e., using $C_4$, $C_5$, or $C_6$ as the experimental configuration, the system generates 127 predictions per second. All the configurations that use the MixNet family as the classification engine are less efficient than those of the EfficientNet family. In particular, MixNet-XL is the least efficient configuration, returning only 83 predictions per second.

In this research question, we performed experiments following the five-fold cross-validation methodology. Moreover, to further investigate the applicability of the proposed approach, we made use of the $D_2$ dataset, which contains more images than $D_1$ (cf. Table 1). Fig. 4(a), Fig. 4(b), and Fig. 4(c) depict the confusion matrices for EfficientNet-B0 using the three different transfer learning techniques mentioned in Section 2.3. The computed metrics for all the confusion matrices are shown in Table 5.

Transfer learning with AdvProp (cf. Fig. 4(b)) induces a better performance for the category Normal, i.e., 1,981 among 2,038 images are classified into the correct category. Looking at Fig. 4(c), we see that, compared to the other transfer learning methods, NS has an adverse effect on the recognition of all the categories. In summary, we can conclude that EfficientNet-B0 with ImageNet transfer learning fosters the best prediction performance. For EfficientNet-B3, we see that weights pre-trained with ImageNet are beneficial to the Normal category (cf. Fig. 4(d)).
At the same time, AdvProp is the transfer learning method that is suitable for the recognition of Pneumonia, i.e., it helps to detect 1,388 out of 1,477 pneumonia images, which is the best result.

Finally, let us consider the results obtained by running MixNet-XL with weights from ImageNet, as depicted in Fig. 5.

From Table 5, we see that Configuration $C_1$, i.e., the row marked in gray, corresponding to training EfficientNet-B0 with weights pre-trained on ImageNet, is the most effective configuration with respect to accuracy, precision, recall, and $F_1$ for almost all categories. Moreover, together with the results obtained from RQ1, we conclude that ImageNet is the best transfer learning strategy for both network families on the two datasets $D_1$ and $D_2$.

Answer to RQ2: Using EfficientNet-B0 in combination with weights pre-trained on the ImageNet dataset brings the best performance.

This section describes the threats to the internal, external, construct, and conclusion validity.

To the best of our knowledge, compared to different existing studies [34, 36, 44], our work is the first one that deals with big datasets. In particular, there are 15,000 images in dataset $D_1$ and 17,905 in $D_2$. Even with such a large amount of data, our proposed approach is still able to obtain a high prediction accuracy while keeping a reasonable recognition speed. Thus, in our opinion the results demonstrate a more reliable applicability in practice, although we believe that the proposed system can be refined with more training data, so as to make it more effective in real-world settings.

In this paper we proposed a practical solution for the detection of Covid-19 from chest X-ray images exploiting two suitable building blocks: EfficientNet and MixNet as the prediction engine, and effective transfer learning strategies. The approach has been validated on two existing datasets which have been widely used in various studies.
The experimental results show that our pro-

A few useful things to know about machine learning, Commun
Machine learning in manufacturing: advantages, challenges, and applications, Production & Manufacturing Research
Probabilistic models for personalizing web search
Enhancing deep learning sentiment analysis with ensemble techniques in social applications
Artificial intelligence-enabled rapid diagnosis of patients with covid-19
Coronet: A deep neural network for detection and diagnosis of covid-19 from chest x-ray images
Covid-19 identification in chest x-ray images on flat and hierarchical classification scenarios
Rethinking model scaling for convolutional neural networks
Mixed Depthwise Convolutional Kernels
ImageNet Large Scale Visual Recognition Challenge
Adversarial Examples Improve Image Recognition
Self-training with noisy student improves imagenet classification
Deep convolutional neural networks for image classification: A comprehensive review
Dropout: A simple way to prevent neural networks from overfitting
Xception: Deep Learning with Depthwise Separable Convolutions
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Platform-Aware Neural Architecture Search for Mobile
ProxylessNAS: Direct neural architecture search on target task and hardware
Mobile Networks for Classification, Detection and Segmentation
Pruning Filters for Efficient ConvNets
Deep learning in agriculture: A survey
A survey of transfer learning
Using advice to transfer knowledge acquired in one reinforcement learning task to another
Transfer Learning with Deep Convolutional Neural Network for SAR Target Classification with Limited Labeled Data, Remote Sensing
Automated fruit recognition using efficientnet and mixnet
Covid-net: A tailored deep convolutional neural network design for detection of covid-19 cases from chest radiography images
Covid-19 image data collection
Automated detection of covid-19 cases using deep neural networks with x-ray images
Estimating uncertainty and interpretability in deep learning for coronavirus (covid-19) detection
Classification of covid-19 in chest x-ray images using detrac deep convolutional neural network
Automatic detection of coronavirus disease (covid-19) using x-ray images and deep convolutional neural networks
Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks, Physical and Engineering Sciences in
Towards an efficient deep learning model for covid-19 patterns detection in x-ray images
Covid-19 screening on chest x-ray images using deep learning based anomaly detection
Covidx-net: A framework of deep learning classifiers to diagnose covid-19 in x-ray images
Deep learning based drug screening for novel coronavirus 2019-ncov, Interdisciplinary Sciences
Predicting commercially available antiviral drugs that may act on the novel coronavirus (sars-cov-2) through a drug-target interaction deep learning model, Computational and structural biotechnology journal
Prediction of criticality in patients with severe covid-19 infection using three clinical features: a machine learning-based prognostic model with clinical data in wuhan
Towards an artificial intelligence framework for data-driven prediction of coronavirus clinical severity
Explainable deep learning for pulmonary disease and coronavirus covid-19 detection from x-rays
Finding covid-19 from chest x-rays using deep learning on a small dataset
Automated detection of covid-19 cases using deep neural networks with x-ray images
Covid-net: A tailored deep convolutional neural network design for detection of covid-19 cases from chest x-ray images
Covid-caps: A capsule network-based framework for identification of covid-19 cases from x-ray images
Estimating uncertainty and interpretability in deep learning for coronavirus (covid-19) detection