title: Transfer learning-based convolutional neural network for COVID-19 detection with X-ray images
authors: Sahinbas, Kevser; Catak, Ferhat Ozgur
date: 2021-05-21
journal: Data Science for COVID-19
DOI: 10.1016/b978-0-12-824536-1.00003-4

Abstract: Countries the world over have focused on protecting human health and combating the COVID-19 outbreak. The disease has had a destructive effect on human health and daily life; many people have been infected and have died. It is critical to control and prevent the spread of COVID-19 by applying quick alternative diagnostic techniques. Although laboratory tests have been widely applied as diagnostic tools, findings suggest that X-ray and computed tomography (CT) images together with pretrained deep convolutional neural network (CNN) models can help in the accurate detection of this disease. In this study, we propose a model for COVID-19 diagnosis, applying a deep CNN technique to raw chest X-ray images of COVID-19 patients that are publicly accessible on GitHub. Fifty positive and 50 negative COVID-19 X-ray images are used for training, and 20 positive and 20 negative images for testing. Because the classification of X-ray images needs a deep architecture to cope with the complicated structure of the images, we apply five well-known pretrained deep CNN architectures: VGG16, VGG19, ResNet, DenseNet, and InceptionV3. The pretrained VGG16 model detects COVID-19 from non-COVID-19 cases with the highest classification performance among the five proposed models, 80% accuracy, and it can serve as a helpful tool in the radiology department. The proposed model uses a limited dataset of COVID-19 X-ray images and can achieve more accurate performance as the number of instances in the dataset increases.

Deep learning models can detect COVID-19 and provide a clinical diagnosis by using radiographic changes of COVID-19 in CT images. In this way, such a model can save critical time in diagnosing the disease. In this study, we used X-ray images of patients undergoing COVID-19 scanning, which are publicly accessible on GitHub. Because the datasets were collected from different hospitals, we resized the pictures to 256 × 222 pixels. Because of the insufficient number of examples, we enriched the dataset with image augmentation methods, aiming to improve the performance of the classification model. The proposed model was evaluated with 50 positive and 50 negative COVID-19 X-ray images for training and 20 positive and 20 negative images for testing. Five well-known pretrained deep CNN architectures (VGG16, VGG19, ResNet, DenseNet, and InceptionV3) were used for transfer learning, employing the X-ray images of COVID-19 patients. Owing to its deeper structure, each pretrained deep CNN model can analyze an X-ray image to distinguish COVID-19 cases from healthy ones. Our model achieves a classification accuracy of 80% with VGG16, which gives the best performance. We used an open public dataset of chest X-ray and CT images to enable COVID-19 detection, and we demonstrate the classification performance with graphs and tables. We also present a brief summary of related research on transfer learning-based diagnostics of COVID-19 using chest X-rays as inputs.
X-ray and CT images have been extensively applied to detect COVID-19, and several studies provide public datasets containing COVID-19 X-ray images of infected patients. According to some studies, combining chest radiological imaging with clinical findings may assist in the early detection of COVID-19 [8,9]. Ardakani et al. [10] proposed a COVID-19 diagnostic method applying an artificial intelligence technique to 1020 CT images, with 108 infected patients as the COVID-19 group and 86 other patients as the non-COVID-19 group. They used 10 well-known CNNs to separate the COVID-19 group from the non-COVID-19 group. The best performance was achieved by the ResNet101 and Xception architectures, which reached 99.51% and 99.02% accuracy, respectively; they showed that ResNet101 is a highly sensitive model for diagnosing COVID-19 infections. Ucar et al. [11] presented a SqueezeNet-based network design for COVID-19 diagnosis from X-ray images, enhanced with Bayesian optimization. Ozturk et al. [12] presented a model for COVID-19 detection from chest X-ray images, implementing both binary and multiclass classification with a DarkNet model of 17 convolutional layers trained on 1125 images (125 COVID-19); their model can assist radiologists through a cloud system. Hemdan et al. [13] introduced COVIDX-Net, a new deep learning framework to help radiologists diagnose COVID-19 automatically in X-ray images; they applied seven well-known CNN models, such as VGG19 and DenseNet, using images of 25 confirmed COVID-19 patients. Wang et al. [14] presented a deep CNN framework called COVID-Net to detect COVID-19 cases in chest X-ray (CXR) images, providing an open access CXR dataset of 13,800 chest radiography images from 13,725 patients to assist clinicians. Ioannis et al. [15] detected coronavirus using CNN architectures that apply transfer learning on a dataset of X-ray images with 224 confirmed COVID-19 cases, achieving an overall accuracy of 97.82%. Narin et al. [16] proposed CNN-based models (pretrained ResNet50, InceptionV3, and InceptionResNetV2) to detect COVID-19 in chest X-ray images with 98% accuracy, with the pretrained ResNet50 model being the best. Song et al. [17] introduced a CNN-based CT diagnosis system to classify COVID-19 patients using chest CT scans of 88 diagnosed patients, with an area under the curve of 0.99. Wang et al. [14] presented a proof of principle for a modified Inception transfer learning CNN model to classify COVID-19 cases using 325 CT images, with 85.2% accuracy. Xiaowei et al. [18] presented an early-screening three-dimensional deep learning model to separate COVID-19 pneumonia from influenza-A viral pneumonia and irrelevant-to-infection groups using CT images, with 86.7% accuracy. Xu et al. [19] provided a pretrained deep CNN ResNet model to detect COVID-19 patients from CT images, with an accuracy of 86.7%. Sethy and Behera [20] proposed a CNN model built on the pretrained ResNet50 with a support vector machine (SVM) classifier using X-ray images and demonstrated the best performance. Finally, other studies in this field have focused on X-ray and CT datasets related to COVID-19 detection [21,22].

The organization of the chapter is as follows: Section 1 contains the introduction, information about COVID-19 disease, the aims of this chapter, and a brief summary of related studies using neural network methods with X-ray and CT images.
Section 2 covers materials and methods, including CNN and transfer learning techniques. Section 3 describes the experiments: data preprocessing, the amount of data used for training and testing, and the evaluation of model performance. Finally, Section 4 provides the conclusion.

In this section, we consider deep neural network architectures and transfer learning methods. The CNN is the neural network class most frequently used to analyze visual images in deep learning. A CNN contains many layers of neurons and provides solutions especially for image and video recognition, classification, and analysis. CNN architectures were designed with inspiration from the organization of the visual cortex, similar to the connectivity pattern of neurons in the human brain [23]. Recently, learning from large-scale datasets such as ImageNet has been an important factor in the success of CNNs. A CNN basically consists of three main layer types: the convolution layer, the pooling layer, and the fully connected layer. The convolution and pooling layers provide the learning of the model, while the fully connected layer provides the classification [24].

The convolution layer is the main part of CNN architectures. In this layer, the properties of the inputs are learned: a feature map is created by applying high- and low-level filters to the input image. In general, the sigmoid, ReLU, or tanh function can be used as the activation function in this layer [24]. The convolution process is presented in Eq. (24.1), where m and n index the dimensions of the m × n kernel K, and i and j indicate the coordinates of the input matrix I at which the convolution is calculated:

$$ S(i, j) = \sum_{m}\sum_{n} I(i + m,\, j + n)\, K(m, n) \qquad (24.1) $$

A pooling layer is generally applied between the convolution layers. Its main purpose is to reduce the computational power required by the model by reducing the size of the feature map. In addition, it supports effective training by extracting the invariant and dominant features of the input. Although there are many pooling operations, maximum and average pooling are the most commonly applied. In the fully connected layer, all neurons from the previous layer are connected to each neuron in this layer; depending on the structure of the CNN architecture, there can be one or more fully connected layers. The output layer comes after the last fully connected layer. In classification studies, Softmax regression is used at this stage to obtain probability distributions over the output classes [24].
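To make Eq. (24.1) concrete before moving on, the short NumPy sketch below (our illustration, not code from the chapter) computes a "valid" convolution in the cross-correlation form that most CNN frameworks use:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Eq. (24.1): slide an m x n kernel over the input and sum the
    elementwise products at each output position (i, j)."""
    m, n = kernel.shape
    rows = image.shape[0] - m + 1
    cols = image.shape[1] - n + 1
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.sum(image[i:i + m, j:j + n] * kernel)
    return out

# Toy example: a vertical edge filter applied to a random 8 x 8 "image".
img = np.random.rand(8, 8)
k = np.array([[1.0, 0.0, -1.0]] * 3)  # 3 x 3 kernel
print(conv2d_valid(img, k).shape)     # (6, 6): the spatial size shrinks
```

Stacking many such filtered maps, interleaved with the pooling operation described above, is what the convolution and pooling layers of a CNN do with trainable kernels.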
Transfer learning is a strategy in machine learning in which information extracted by a CNN from given, related data is transferred to solve a different but related problem [25]. It is based on the principle of improving learning in a new task by transferring knowledge obtained from a related task that has already been learned. Pretrained deep learning networks are already well trained on other datasets and can be refined to obtain high accuracy with a much smaller dataset; thus, the approach is preferred by researchers [26]. Pan and Yang [27] provide a comprehensive review of transfer learning. In transfer learning, the learning process does not start from scratch; rather, it starts with patterns learned while solving a different problem. In this way, previous learning is reused and the learning process does not have to begin from the beginning, which avoids time-consuming computation and enables the creation of an accurate model [28].

It can be said that transfer learning is a design methodology for machine learning, not a machine learning model or technique. This design methodology is commonly applied with pretrained models based on deep convolutional neural networks. In deep learning, the method starts with the initial training of a CNN for a classification problem on a large-scale training dataset. Because a CNN model can learn to extract the critical properties of an image, the availability of data for this initial training is an essential part of successful training. Whether a model is fit for transfer learning is evaluated based on the CNN's capacity to recognize and select the most salient visual features.

The VGG-Net model was developed by Simonyan et al. [29] using small convolution filters. Although it is a simple model, its most significant difference from previous models is its deeper structure, in which blocks of two or three consecutive convolution layers are followed by a pooling layer; in previous models, pooling and convolution layers simply alternate [29]. Approximately 138 million parameters are calculated in this model [30]. VGG provides a good feature representation for more than a million images (the ImageNet dataset) from 1000 different categories and can therefore function as a useful feature extractor for suitable new images. A model trained on the ImageNet dataset is able to extract relevant features from images, even new ones that do not exist in the dataset or belong to entirely different categories. This provides the advantage of using pretrained models as effective feature extractors [29]. Fig. 24.1 shows the architecture of VGG16.

Figure 24.1. VGG16 architecture [29]. cv, convolution; fc, fully connected.

The VGG16 architecture uses 3 × 3 convolution filters in 13 convolution layers for feature extraction; each convolution layer is followed by a ReLU layer, and maximum pooling layers are used for downsampling. It has three fully connected layers for classification, two of which serve as hidden layers, whereas the final classification layer consists of 1000 units representing the image categories in the ImageNet database [29]. This structure simulates a larger filter while preserving the benefits of smaller filter sizes. VGGNet has been shown to perform better using fewer parameters, especially compared with previous models. In addition, stacking the convolution layers means that two ReLU layers are used instead of a single ReLU layer across two convolution layers. Because the spatial size of the input volumes decreases in each layer (the result of the convolution and pooling layers), the depth of the volumes increases owing to the increasing number of filters. The model works well for both object classification and edge detection problems [30].

We fine-tune the VGG16 network model for our task. Suppose we have a dataset with m samples {(x^{(1)}, y^{(1)}), …, (x^{(m)}, y^{(m)})} for training. The overall cost function of the network can be defined as [29]:

$$ J(W, b) = \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\lVert h_{W,b}\big(x^{(i)}\big) - y^{(i)} \right\rVert^{2} + \frac{\lambda}{2}\sum_{l}\sum_{i}\sum_{j}\Big(W_{ji}^{(l)}\Big)^{2} \qquad (24.2) $$

where \(W_{ji}^{(l)}\) is the connection weight between the jth unit of layer l and the ith unit of layer l + 1, and b is the bias term of the hidden-layer neurons. The term on the right side of Eq. (24.2) is a regularization term that prevents overfitting by greatly reducing the weights; the parameter λ adjusts the relative importance of the two terms of the cost function. The minimum of J(W, b) is found with the well-known batch gradient descent optimization algorithm, calculating the partial derivatives with respect to W and b by the backpropagation algorithm.
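In code, this fine-tuning setup can be sketched as follows with Keras (the chapter does not name its framework or the exact head architecture, so the head below — flatten, one dense layer, dropout, and a sigmoid output — is our assumption):

```python
import tensorflow as tf

IMG_SHAPE = (256, 222, 3)  # the chapter resizes X-ray images to 256 x 222

# VGG16 convolutional base pretrained on ImageNet, without the 1000-class head.
base = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=IMG_SHAPE)
base.trainable = False  # freeze the base; only the new head is optimized

# Hypothetical binary head: COVID-19 positive vs. negative.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
```

Freezing the base means gradient descent updates only the head's W and b, which mirrors the training strategy described later in the experiments section.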
ResNet is an architecture designed with a deeper structure than all previous architectures; it consists of up to 152 layers. ResNet was developed in 2015 [31] and ranked first in the ImageNet competition held that year, with an error rate of 3.6% [31]. Fig. 24.2 shows the residual mapping of the model. The most significant feature that distinguishes it from other architectures is the shortcut connection that feeds a layer's output forward into a later layer. This value, added every two layers between the linear output and the ReLU activation, changes the computation: the activation \(a^{[l]}\) from an earlier layer is added to the linear output two layers ahead, so that \(a^{[l+2]} = g\big(z^{[l+2]} + a^{[l]}\big)\) [32]. Increasing the number of layers in a model should in theory increase performance, but in practice the situation differs. In a plain network, if \(W^{[l+2]} = 0\), then \(a^{[l+2]} = g\big(b^{[l+2]}\big)\), and the derivative can collapse to 0, which is undesirable [31]. With the shortcut connection, however, the learning error is still optimized even if the contribution of the two intermediate layers vanishes, because \(a^{[l+2]} \approx g\big(a^{[l]}\big) = a^{[l]}\) for ReLU activations, and the network trains faster.

The ResNet50 architecture is obtained by replacing each two-layer block in a 34-layer network with a three-layer bottleneck block. This 50-layer network requires 3.8 billion floating-point operations (FLOPs). ResNet101 and ResNet152 are built by stacking more of these three-layer blocks; ResNet152 requires 11.3 billion FLOPs yet still has lower complexity than the VGG16/19 networks [31]. The ResNet architecture was also the winner in a study on the CIFAR10 dataset, which has 10 classes, 50,000 training images, and 10,000 test images [33]. As the depth of CNN models increases, vanishing and exploding gradients affect the convergence of the models; this architecture addresses the problem with the residual blocks shown in Fig. 24.2. In the residual block, the convolution of input x yields a result F(x) after the series of convolution and ReLU operations. This result is then added to the original input x, which is expressed as H(x) = F(x) + x. The ResNet50 model is easy to train and provides a significant practical advantage because it learns residuals from images instead of raw features [31].

While neural networks are being trained, feature maps shrink owing to convolution and subsampling processes, and image information is lost in the transitions between layers. To use image information more effectively, Huang et al. [34] developed the DenseNet architecture, in which each layer is fed forward to all subsequent layers. In this way, any layer l can access the feature information of all layers before it:

$$ x_{l} = H_{l}\big([x_{0}, x_{1}, \ldots, x_{l-1}]\big) \qquad (24.3) $$

where \([x_0, x_1, \ldots, x_{l-1}]\) denotes the concatenation of the feature information of the preceding layers and \(H_l\) is the transfer function used to process it. The general architecture of densely connected neural networks is shown in Fig. 24.3 [34].

Figure 24.3. DenseNet architecture [34]. BN, batch normalization; conv, convolution.

In this way, the rate of propagation of the features increases, feature reuse becomes easier, and the number of parameters decreases significantly.
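The two skip-connection styles — addition in ResNet, H(x) = F(x) + x, and concatenation in DenseNet, Eq. (24.3) — can be contrasted in a few lines of Keras (a schematic sketch of ours, not the chapter's implementation):

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """ResNet-style block: H(x) = F(x) + x. The input must already have
    `filters` channels so the addition is well defined."""
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.Activation("relu")(layers.Add()([x, y]))  # add, then ReLU

def dense_step(feature_maps, growth_rate):
    """DenseNet-style step, Eq. (24.3): the new layer sees the
    concatenation of all previous feature maps."""
    x = layers.Concatenate()(feature_maps)  # [x0, x1, ..., x_{l-1}]
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    return feature_maps + [layers.Conv2D(growth_rate, 3, padding="same")(x)]

inp = tf.keras.Input(shape=(64, 64, 16))
res = residual_block(inp, 16)        # addition keeps the channel count
feats = dense_step([inp, res], 8)    # concatenation grows it layer by layer
```

The design difference is visible in the channel counts: addition requires matching shapes and keeps them fixed, whereas concatenation lets each dense step grow the feature volume by its growth rate.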
In 2013, Lin et al. [35] proposed an innovative solution to the computational complexity of previous models with the 1 × 1 convolution operation. The idea behind Inception is to provide an optimal local sparse structure in a convolutional vision network that can be approximated and covered by readily available dense components [36]. The basic idea of 1 × 1 convolution is to reduce the number of channels in the image, and thus the number of parameters. Normally, a 1 × 1 convolution does not affect the spatial size of the matrix; however, if the input is multichannel, the number of channels of the output equals the number of 1 × 1 convolution filters applied.

The network model named Inception consists of several modules, in each of which convolutions of different sizes and maximum pooling are applied. In the literature, GoogLeNet is also known as InceptionV1; later versions called InceptionV2, InceptionV3 [37], and InceptionV4 [38] were also developed. The InceptionV3 architecture has two parts: feature extraction and classification. The feature extraction section is the CNN; InceptionV3 is a well-known architecture whose network input should be an image of 299 × 299 pixels. The classification section, on the other hand, is fully connected and includes Softmax layers. The Inception modules consist of 3 × 3 maximum pooling and 1 × 1, 3 × 3, and 5 × 5 convolution layers, as indicated in Fig. 24.4. After applying an Inception module to the data from the previous layer, the network concatenates the filter outputs and transfers them to the next layer. With the contribution of this module, both the general and the specific properties of an object may be discovered. In Fig. 24.4, the box numbered (1) is the input and the top one is the output of the model. The computational cost increases because there are too many parameters during the convolution operations in the Inception structure of Fig. 24.4; to overcome this problem, 1 × 1 convolutions are applied for dimension reduction, as shown in Fig. 24.5.

In this work, we used an open public dataset of chest X-ray and CT images of COVID-19 patients who were positive or suspected for COVID-19 or for other viral and bacterial diseases (Middle East respiratory syndrome, severe acute respiratory syndrome, and acute respiratory distress syndrome [ARDS]). The dataset was collected from public sources as well as indirectly from hospitals and physicians, and all images and data are published openly in the GitHub repository. The original dataset contains both X-ray images and CT scans; in this application, we chose the X-ray images to build a COVID-19 detection model. We used 50 positive and 50 negative COVID-19 posteroanterior X-ray scan images to create a training dataset and another 20 positive and 20 negative scans to test the model. The X-ray images were rescaled to a size of 256 × 222. We implemented different image augmentation methods to produce more enhanced input instances, such as flipping right–left and up–down, rotation, and translation using five random angles. Fig. 24.6 shows two examples from the dataset used in our experiments.
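A sketch of this augmentation step with Keras' ImageDataGenerator is shown below; the parameter values follow those reported in the next section, and the directory path is a hypothetical placeholder:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation roughly matching the operations described above:
# flips, rotation, and translation (shifts).
augmenter = ImageDataGenerator(
    rotation_range=20,       # random rotations
    zoom_range=0.15,         # random zoom
    width_shift_range=0.2,   # horizontal translation
    height_shift_range=0.2,  # vertical translation
    horizontal_flip=True,    # right-left flip
    vertical_flip=True,      # up-down flip
    rescale=1.0 / 255,
)

# "data/train" is a hypothetical layout with one subfolder per class.
train_iter = augmenter.flow_from_directory(
    "data/train", target_size=(256, 222), batch_size=8, class_mode="binary")
```

Each epoch then draws randomly perturbed variants of the 100 training scans, which is how a dataset this small can still support training a classification head.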
The dataset used in the study can be said to contain an insufficient number of X-ray scanning results; nevertheless, at the time of writing, this publicly accessible dataset is the largest labeled collection available, and in the future both the number of such datasets and the number of samples they contain will increase. Another issue is labeling: the dataset mostly contains X-ray scan images with disease findings. There are only three records in the dataset that are COVID-19-negative with the "finding" field set to no finding. Therefore, X-ray images with ARDS and Streptococcus findings were also tagged to select COVID-19-negative samples. In the future, we believe that samples of COVID-19-negative patients will increase in this type of data collection (Fig. 24.6).

We used image augmentation techniques to increase the number of samples in the dataset and improve the performance of the classification model. Our image augmentation parameters were a rotation range of 20, a zoom range of 0.15, a width shift range of 0.2, and a height shift range of 0.2.

Although the dataset used in our experiments is almost balanced, traditional accuracy-based performance evaluation is not enough to find an optimal classifier. We therefore used four different metrics (overall prediction accuracy, average recall, average precision, and F1 score) to evaluate classification accuracy; these are common measurement metrics in machine learning [39–41]. Precision is defined as the fraction of retrieved samples that are relevant:

$$ \text{Precision} = \frac{TP}{TP + FP} $$

Recall is defined as the fraction of relevant samples that are retrieved:

$$ \text{Recall} = \frac{TP}{TP + FN} $$

The F1 score is defined as the harmonic mean of precision and recall:

$$ F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} $$

where TP, FP, and FN denote true positives, false positives, and false negatives, respectively. The dataset was divided into two groups: 80% to train the model and 20% to evaluate the classification performance.
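These metrics can be computed directly with scikit-learn; the label arrays below are hypothetical placeholders rather than the chapter's actual predictions:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Hypothetical ground-truth and predicted labels (1 = COVID-19 positive).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0]

print("accuracy :", accuracy_score(y_true, y_pred))   # 6/8 = 0.75
print("precision:", precision_score(y_true, y_pred))  # TP/(TP+FP) = 0.75
print("recall   :", recall_score(y_true, y_pred))     # TP/(TP+FN) = 0.75
print("f1 score :", f1_score(y_true, y_pred))         # harmonic mean = 0.75
```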
We used the dataset containing X-ray images of patients undergoing COVID-19 scanning, which is publicly accessible on GitHub [42]. Because these data were obtained from different hospitals, the picture resolutions differed from each other; to overcome this problem, we resized the pictures to 256 × 222. Because we used a CNN-based method, it was not affected by the adverse effects of this type of data compression. Owing to the insufficient number of examples, we enriched the dataset with image augmentation methods such as flipping and rotation at different angles; in this way, we aimed to improve the performance of the classification model. To the last part of each transfer model, we added a classification head model that we designed, transforming the transfer learning models to detect COVID-19 patients from X-ray scan images. The output of the classification model in this problem is binary: positive or negative. We disabled training of the layers belonging to the transfer models; the model we created was trained by optimizing only the parameters of the head model.

The findings in Table 24.1 indicate that the pretrained VGG16 model provides the highest classification performance for automated COVID-19 classification, with 80% accuracy, compared with the other four proposed models. The VGG19, DenseNet, and InceptionV3 models show the same classification performance, with an accuracy of 0.60, and ResNet shows the lowest classification performance, with 0.50 accuracy. The VGG19, DenseNet, and InceptionV3 models also have the same COVID-19 classification performance in terms of F1 score, at 0.55, whereas the VGG16 model presents the highest classification performance with an 80% F1 score.

Medical imaging techniques such as X-ray and CT, used for rapid diagnosis, have a key role in managing COVID-19. In this chapter, we first introduced the reasons for using X-ray images in COVID-19 detection and then described several related studies on pretrained CNN methods using X-ray images. We used an open public dataset of chest X-ray and CT images to build a COVID-19 detection model. Because of the insufficient number of public COVID-19 datasets, we collected 50 positive and 50 negative COVID-19 X-ray images for training and 20 positive and 20 negative images for testing. We resized the pictures to 256 × 222 and enriched the dataset with image augmentation methods such as flipping and rotation at different angles. We applied five pretrained deep CNN models (VGG16, VGG19, ResNet, DenseNet, and InceptionV3) for transfer learning using the X-ray images of COVID-19 patients. The pretrained VGG16 model provided the highest classification performance for automated COVID-19 classification, with 80% accuracy, compared with the other four proposed models; we showed the classification performance in graphs and tables. The VGG19, DenseNet, and InceptionV3 models had the same classification accuracy of 0.60, and ResNet had the lowest classification performance, with 0.50 accuracy. The dataset used in the study contains an insufficient number of examples; therefore, a limited dataset of COVID-19 X-ray images was used. By increasing the number of instances in the dataset, the model can achieve more accurate performance.

References
Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China
A review of coronavirus disease-2019 (COVID-19)
Coronavirus disease 2019 (COVID-19): a perspective from China
CT features of coronavirus disease 2019 (COVID-19) pneumonia in 62 patients in Wuhan, China
CT imaging features of 2019 novel coronavirus (2019-nCoV)
Coronavirus disease 2019 (COVID-19): role of chest CT in diagnosis and management
Chest imaging appearance of COVID-19 infection
COVID-19 pneumonia: what has CT taught us?
Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: results of 10 convolutional neural networks
COVIDiagnosis-Net: deep Bayes-SqueezeNet based diagnosis of the coronavirus disease 2019 (COVID-19) from X-ray images
Automated detection of COVID-19 cases using deep neural networks with X-ray images
COVIDX-Net: a framework of deep learning classifiers to diagnose COVID-19 in X-ray images
COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest radiography images
COVID-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks
Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks
Deep learning enables accurate diagnosis of novel coronavirus (COVID-19) with CT images
Deep learning system to screen coronavirus disease 2019 pneumonia
Deep learning system to screen coronavirus disease 2019 pneumonia, arXiv preprint, 2020
Detection of coronavirus disease (COVID-19) based on deep features
Coronavirus (COVID-19) classification using CT images by machine learning methods
Deep learning-based detection for COVID-19 from chest CT using weak label
Comprehensive guide to convolutional neural networks
Simple convolutional neural network on image classification
A study on CNN transfer learning for image classification
Accurate seat belt detection in road surveillance images based on CNN and SVM
A survey on transfer learning
Deep convolutional neural networks for image classification: a comprehensive review
Very deep convolutional networks for large-scale image recognition
Visualizing and understanding convolutional networks
Deep residual learning for image recognition
Derin Öğrenme ile Kalabalık Analizi Üzerine Detaylı Bir Araştırma [A detailed study on crowd analysis with deep learning]
Learning multiple layers of features from tiny images
Densely connected convolutional networks
Going deeper with convolutions
Rethinking the inception architecture for computer vision
Inception-v4, Inception-ResNet and the impact of residual connections on learning
User performance versus precision measures for simple search tasks
Introduction to information retrieval
Performance measures for information extraction
COVID-19 image data collection, arXiv, 2020
Data augmentation based malware detection using convolutional neural networks