key: cord-0988806-suy0cjpi
authors: Wang, Xusheng; Gong, Cunqi; Khishe, Mohammad; Mohammadi, Mokhtar; Rashid, Tarik A.
title: Pulmonary Diffuse Airspace Opacities Diagnosis from Chest X-Ray Images Using Deep Convolutional Neural Networks Fine-Tuned by Whale Optimizer
date: 2021-12-01
journal: Wirel Pers Commun
DOI: 10.1007/s11277-021-09410-2
sha: 87608e9639c5a0123c42fda113e144b8571e9187
doc_id: 988806
cord_uid: suy0cjpi

The early diagnosis and accurate separation of COVID-19 from non-COVID-19 cases based on pulmonary diffuse airspace opacities is one of the challenges facing researchers. Recently, researchers have tried to exploit the capability of Deep Learning (DL) methods to assist clinicians and radiologists in diagnosing positive COVID-19 cases from chest X-ray images. In this approach, DL models, especially Deep Convolutional Neural Networks (DCNNs), offer real-time, automated, and effective models for detecting COVID-19 cases. However, conventional DCNNs usually use Gradient Descent (GD)-based approaches for training fully connected layers. Although GD-based Training (GBT) methods are easy to implement and fast, they demand extensive manual parameter tuning to perform optimally. Besides, the GBT procedure is inherently sequential, so parallelizing it on Graphics Processing Units is very difficult. Therefore, to obtain a real-time COVID-19 detector with parallel implementation capability, this paper proposes the use of the Whale Optimization Algorithm (WOA) for training fully connected layers. The designed detector is benchmarked on a verified dataset called COVID-Xray-5k, and the results are verified by a comparative study with classic DCNN, DUICM, and the Matched Subspace classifier with Adaptive Dictionaries (MSAD). The results show that the proposed model, with an average accuracy of 99.06%, provides 1.87% better performance than the best comparison model. The paper also uses the concept of the Class Activation Map (CAM) to detect the regions potentially infected by the virus. This was found to correlate with clinical results, as confirmed by experts. Although the results are promising, further investigation is needed on a larger dataset of COVID-19 images to have a more comprehensive evaluation of accuracy rates.

feature is related to its automatic characteristic: users do not need any knowledge about DCNN's structure. However, this method's main drawback is that the GA's chromosomes become too large in large DCNNs, which slows down the algorithm.

A deep ANFIS network has been optimized with the social-spider optimization algorithm to predict biochar yield; however, this combination suffers from the ill-conditioning problem [64].

It is worth mentioning that other algorithms and techniques have been used to improve the performance of deep networks in different fields of study, including multi-objective transportation [65], an improved feature selection technique [66], a firefly-based algorithm with Levy distribution [67], discrete transforms with selective coefficients [68], a signcryption technique [69], an equivalent transfer function [70], and an enhanced whale optimization algorithm [71]. However, all of these methods suffer from high computational complexity and unreliability when dealing with image processing problems with high-dimensional search spaces.

Considering the aforementioned limitations and shortcomings, our proposed approach trains a DCNN from scratch on the COVIDetectioNet dataset [72] to learn to classify infected and normal X-ray images. Subsequently, the last fully connected layer of the pre-trained DCNN is replaced by a new fully connected layer, which is fine-tuned with WOA [73]. In this regard, each searching agent represents a particular network configuration. The weights of the remaining layers of the pre-trained DCNN are preserved, so the procedure amounts to training a linear model on the features produced by the preceding layer. The proposed scheme is as follows:

• Feed the COVIDetectioNet dataset to the convolutional DCNN and load its pre-trained weights.
• Substitute the final fully connected layer of the pre-trained DCNN with the WOA-trained fully connected layer.
• Preserve the remaining layers' weights.
• Finally, retrain the whole model using WOA.

The rest of this paper is organized as follows. Section 2 reviews some background material. Section 3 introduces the proposed scheme. Section 4 presents the simulation results and discussion, and finally, conclusions are presented in Sect. 5.

This section presents the background knowledge, including the WOA algorithm, the DCNN, and the COVID-Xray dataset.

Generally, WOA is a novel Swarm Intelligence (SI) optimization algorithm that mathematically formulates the hunting behavior of humpback whales [73]. The main difference between WOA and other benchmark optimization algorithms is the update model used to improve the candidate solutions. This mathematical model covers searching for and attacking prey, called bubble-net feeding. The bubble-net feeding behavior is shown in Fig. 1.

This figure shows that intelligent hunting is carried out by creating a trap: the whale moves in a spiral "9-shaped" path around the prey and generates bubbles along that path. Encircling is another hunting mechanism of humpback whales. WOA initiates the hunting process by encircling the prey using the bubble-net feeding mechanism. The mathematical model of WOA is as follows [73]:

$$X(t+1) = \begin{cases} X^{*}(t) - Q \cdot D, & r < 0.5 \\ D' \, e^{kl} \cos(2\pi l) + X^{*}(t), & r \geq 0.5 \end{cases} \tag{1}$$

where X is a vector representing the positions of the whales during iterations, X* is a vector indicating the location of the prey, r is a random number inside [0, 1], D and D′ indicate the distance between the ith searching agent and the prey in the encircling and spiral phases, respectively, k is a constant determining the shape of the bubble-net spiral, l is a random number inside [-1, 1], t indicates the current iteration, Q is a vector that linearly decreases from two to zero during iterations, and R is a random vector inside [0, 1]. While r < 0.5, Eq. (1) simulates the encircling behavior; otherwise, it models the bubble-net feeding mechanism. Therefore, r randomly switches the model between these two main phases.

Fig. 1 The bubble-net feeding mechanism of hunting.
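The two-phase position update described above can be illustrated with a short NumPy sketch. It is only a sketch under stated assumptions: the coefficient definitions (Q = 2aR − a with a decreasing linearly from 2 to 0, C = 2R, and the two distance terms) are taken from the standard WOA formulation in [73] rather than from this paper, the function name `woa_step` and the default `k = 1` are hypothetical, and the random-agent exploration phase of the original algorithm is omitted because only two phases are described here.

```python
import numpy as np

def woa_step(positions, best, t, max_iter, rng, k=1.0):
    """One WOA update of all searching agents (two-phase sketch).

    positions: (n_agents, dim) array of whale positions X(t).
    best:      (dim,) array, the prey position X*(t).
    Assumed from the standard WOA formulation, not from this paper:
    Q = 2*a*R - a with a decreasing linearly from 2 to 0, and C = 2*R.
    """
    a = 2.0 * (1.0 - t / max_iter)             # decreases linearly 2 -> 0
    new_positions = np.empty_like(positions, dtype=float)
    for i, x in enumerate(positions):
        r = rng.random()                        # selects the phase
        if r < 0.5:
            # Encircling phase: X(t+1) = X* - Q * D
            Q = 2.0 * a * rng.random(x.shape) - a
            C = 2.0 * rng.random(x.shape)
            D = np.abs(C * best - x)
            new_positions[i] = best - Q * D
        else:
            # Bubble-net (spiral) phase: X(t+1) = D' e^{kl} cos(2 pi l) + X*
            l = rng.uniform(-1.0, 1.0)
            D_prime = np.abs(best - x)
            new_positions[i] = D_prime * np.exp(k * l) * np.cos(2.0 * np.pi * l) + best
    return new_positions
```

A typical call would be `woa_step(positions, best, t, max_iter, np.random.default_rng(0))`, repeated for each iteration after the best agent has been re-evaluated.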

Generally, a DCNN is a Multi-Layer Perceptron (MLP) variant based on three concepts: connection weight sharing, local receptive fields, and temporal/spatial sub-sampling. These concepts are realized in two classes of layers: sub-sampling layers and convolution layers. As shown in Fig. 2, the processing layers include three convolution layers, C1, C3, and C5, interleaved with the sub-sampling layers S2 and S4, followed by the final output layer F6. These sub-sampling and convolution layers are organized as feature maps.

Neurons in a convolution layer are linked to a local receptive field in the prior layer. Consequently, neurons in the same feature map (FM) receive data from different input regions until the whole input has been scanned, while sharing the same weights.

In the sub-sampling layer, the FMs are spatially down-sampled by a factor of 2. As an illustration, an FM of size 10 × 10 in layer C3 is sub-sampled to a corresponding FM of size 5 × 5 in the next layer, S4. The final layer (F6) performs the classification.

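In this structure, each FM is the outcome of convolving the previous layer's maps with their corresponding kernel, which acts as a linear filter. The weights w^k and the bias b_k generate the kth FM, FM^k_ij, using the tanh function as in Eq. (7).

$$FM^{k}_{ij} = \tanh\left((w^{k} * x)_{ij} + b_{k}\right) \tag{7}$$

where x denotes the input taken from the previous layer's maps.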

By reducing the resolution of the FMs, the sub-sampling layer leads to spatial invariance, in which each pooled FM refers to one FM of the prior layer. The sub-sampling function is defined as Eq. (8).

$$a_{j} = \tanh\left(\beta \sum_{N \times N} a_{i}^{n \times n} + b\right) \tag{8}$$

where a_i^{n×n} are the inputs, and β and b are the trainable scalar and bias, respectively. After several convolution and sub-sampling layers, the last layer is a fully connected structure that carries out the classification task. There is one neuron for each output class. Thereby, in the case of the COVID-19 dataset, this layer contains two neurons, one for each class.
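The convolution and sub-sampling steps described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the convolution is written in the cross-correlation form commonly used in CNN code (the kernel is not flipped), and the sub-sampling is assumed to sum non-overlapping n × n blocks before applying the trainable scalar, the bias, and tanh.

```python
import numpy as np

def conv_feature_map(x, w, b):
    """Valid 2-D convolution (cross-correlation form) of input x with
    kernel w, plus bias b, passed through tanh."""
    kh, kw = w.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    fm = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            fm[i, j] = np.tanh(np.sum(x[i:i + kh, j:j + kw] * w) + b)
    return fm

def subsample(fm, beta, b, n=2):
    """Sum each non-overlapping n x n block, scale by beta, add bias b,
    and apply tanh; the spatial size shrinks by a factor of n."""
    h, w = fm.shape[0] // n, fm.shape[1] // n
    blocks = fm[:h * n, :w * n].reshape(h, n, w, n)
    return np.tanh(beta * blocks.sum(axis=(1, 3)) + b)
```

For example, a 14 × 14 input convolved with a 5 × 5 kernel yields a 10 × 10 feature map, which `subsample` reduces to 5 × 5, matching the C3 to S4 size change mentioned in the text.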

In this paper, the COVID-Xray-5k dataset, including 2084 training and 3100 test images, was utilized [74]. Following radiologists' advice, only anterior-posterior COVID-19 X-ray images are used, because lateral images are not suitable for detection purposes. Expert radiologists evaluated the images, and those that did not show clear signs of COVID-19 were removed. In this way, 19 of the 203 images were removed, and 184 images with clear signs of COVID-19 remained, providing the community with a more clearly labeled dataset. Of these 184 images, 100 are assigned to the test set and 84 to the training set. Data augmentation is applied to increase the number of COVID-19 samples to 420. Since the number of non-COVID images in the covid-chestxray-dataset [75] was very small, the supplementary CheXpert dataset [76] was employed. This dataset includes 224,316 chest X-ray images of 65,240 patients. From this dataset, 2000 and 3000 non-COVID images were chosen for the training set and test set, respectively. The final number of images in each class is reported in Table 1. Figure 3 shows six random sample images from the COVID-Xray-5k dataset, including two COVID-19 and four normal samples.

Table 1 Number of images per class in the COVID-Xray-5k dataset
Split          COVID-19                       Non-COVID
Training set   84 (420 after augmentation)    2000
Test set       100                            3000

Fig. 3 Six random sample images from the COVID-Xray-5k dataset

Generally, there are two main issues in tuning a deep network using a meta-heuristic optimization algorithm. First, the structure's parameters must be represented by the searching agents (candidate solutions) of the meta-heuristic algorithm. Second, the fitness function must be defined according to the problem at hand [77]. The representation of network parameters is a distinct phase in tuning a DCNN using the WOA algorithm. Thereby, the important parameters of the DCNN, i.e., the weights and biases of the fully connected layer, should be determined to provide the best detection accuracy [78].

To sum up, the WOA algorithm optimizes the weights and biases of the last layer, and the loss function computed with them serves as the fitness function. The weight and bias values of the last layer are used as the searching agents in the WOA algorithm.

Generally speaking, three schemes are used to represent the weights and biases of a DCNN as candidate solutions of a meta-heuristic algorithm: vector-based, matrix-based, and binary-state. Since WOA needs the parameters in a vector-based form, in this paper the candidate solution is represented as in Eq. (9) and Fig. 4.
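To make the idea of scoring a searching agent concrete, the sketch below evaluates one candidate for the last fully connected layer on features produced by the frozen preceding layers. It is illustrative only: the function name, the layout of the candidate vector (flattened weights followed by biases), the softmax output, and the mean-squared-error score are all assumptions, not details confirmed by the paper.

```python
import numpy as np

def last_layer_loss(candidate, features, targets):
    """Score one candidate last layer on frozen features.

    candidate: 1-D vector, flattened weights then biases (assumed layout).
    features:  (n_samples, n_features) outputs of the frozen layers.
    targets:   (n_samples, n_classes) one-hot desired outputs.
    """
    n_features, n_classes = features.shape[1], targets.shape[1]
    W = candidate[:n_features * n_classes].reshape(n_features, n_classes)
    b = candidate[n_features * n_classes:]
    logits = features @ W + b
    # Numerically stable softmax (assumed output activation).
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    return float(np.mean((probs - targets) ** 2))
```

A population-based optimizer would call this function once per searching agent and keep the candidate with the lowest value as the current best solution.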

where n is the number of input nodes, W_ij indicates the connection weight between the ith input node and the jth hidden neuron, b_j is the jth hidden neuron's bias, and M_jo shows the connection weight from the jth hidden neuron to the oth output neuron. As previously stated, the proposed architecture is a simple LeNet-5 structure. In this section, two structures, namely i-6c-2s-12c-2s and i-8c-2s-16c-2s, are used, where c and s denote convolution and sub-sampling layers, respectively [79, 80]. The kernel size of all convolution layers is 5 × 5, and the sub-sampling layers down-sample by a factor of 2.

Fig. 4 Assigning the DCNN's parameters as the candidate solution (searching agents) of WOA
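The vector-based representation described above can be sketched as a pair of helper functions that pack the input-to-hidden weights, the hidden biases, and the hidden-to-output weights into one searching agent and unpack them again. The names `encode` and `decode` and the ordering of the three parts inside the vector are assumptions made for illustration.

```python
import numpy as np

def encode(W, b, M):
    """Pack W (n x h), b (h,), and M (h x o) into one flat candidate vector.
    The ordering W, then b, then M is an assumed convention."""
    return np.concatenate([W.ravel(), b.ravel(), M.ravel()])

def decode(vec, n, h, o):
    """Inverse of encode: recover W, b, and M from a flat candidate vector."""
    W = vec[:n * h].reshape(n, h)
    b = vec[n * h:n * h + h]
    M = vec[n * h + h:].reshape(h, o)
    return W, b, M
```

With n input nodes, h hidden neurons, and o output neurons, each searching agent then has n·h + h + h·o dimensions, which is the size of the search space explored by the optimizer.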

In the proposed meta-heuristic method, the WOA algorithm trains the DCNN (DCNN-WOA) to obtain the best accuracy while minimizing the classification error and network complexity. This objective can be computed by the loss function of the meta-heuristic searching agent or by the Mean Square Error (MSE) of the classification procedure. The loss function used in this method is as follows:

where o is the estimated output, u is the desired output, and N is the number of training samples. The proposed WOA algorithm uses two termination criteria: reaching the maximum number of iterations or reaching a predefined loss value. The pseudo-code of DCNN-WOA is shown in Fig. 5.

As stated before, the principal goal is to improve the classification accuracy of the classic DCNN by using the WOA algorithm. Following references [25, 73], and in order to have a fair comparison, the population size is set to 10 and the maximum number of iterations is set to 10. The DCNN parameters, i.e., the learning rate α and the batch size, are set to 1 and 100, respectively. The number of epochs ranges between 1 and 10 for every evaluation. The evaluation was carried out in MATLAB R2019a on a PC with an Intel Core i7-4500U processor and 16 GB of RAM, running Windows 10, over five separate runs.
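As a rough illustration of the fitness evaluation and the two termination criteria, the sketch below uses a plain mean squared error over the N training samples. The text names MSE as one way to compute the objective; the exact loss used by the authors may differ, and the function names and the `target_loss` default are hypothetical.

```python
import numpy as np

def mse_fitness(outputs, targets):
    """Mean squared error between network outputs o and desired outputs u.
    Plain MSE is an assumption; the authors' exact loss may differ."""
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return float(np.mean((outputs - targets) ** 2))

def should_stop(iteration, best_loss, max_iter=10, target_loss=1e-3):
    """Stop when the maximum iteration count or a predefined loss is reached.
    max_iter=10 follows the stated setup; target_loss is a hypothetical value."""
    return iteration >= max_iter or best_loss <= target_loss
```

In an optimization loop, `should_stop` would be checked after each update of the best searching agent.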

As shown in the literature [81, 82], the accuracy rate alone does not provide enough information about a detector's performance. Therefore, Receiver Operating Characteristic (ROC) curves were used: the classifier was applied to all samples in the test dataset, assigning an estimated probability P_T to each image. Next, a threshold value T ∈ [0, 1] was introduced, and the detection rate was calculated for each threshold value. The computed values were then plotted as a ROC curve. Generally speaking, the area under the ROC curve (AUC) shows the probability of correct detection. Figure 6 shows the calculated ROC curves for the detection of COVID-19 samples using DCNN-WOA and the conventional DCNN. This comparison is meaningful because the test dataset, the initial conditions, and the primary convolutional network (LeNet-5 DCNN) are identical in both cases. Therefore, the effect of the WOA algorithm on the classic LeNet-5 DCNN can be fairly assessed. The ROC curves show that DCNN-WOA significantly outperforms the LeNet-5 DCNN on the test dataset.
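The threshold sweep behind a ROC curve can be sketched as follows. This is a generic illustration, not the authors' MATLAB code: the function names are hypothetical, a sample is counted as detected when its score is at least the threshold, and the area is approximated with the trapezoidal rule.

```python
import numpy as np

def roc_points(scores, labels, thresholds):
    """Return (false-positive rates, true-positive rates), one pair per
    threshold, ordered from the highest threshold to the lowest."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    fpr, tpr = [], []
    for T in np.sort(np.asarray(thresholds, dtype=float))[::-1]:
        predicted = scores >= T                  # detected as positive
        tpr.append(np.sum(predicted & labels) / np.sum(labels))
        fpr.append(np.sum(predicted & ~labels) / np.sum(~labels))
    return np.array(fpr), np.array(tpr)

def area_under_curve(fpr, tpr):
    """Trapezoidal approximation of the area under the ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

Calling `roc_points(scores, labels, np.linspace(0, 1, 101))` and passing the result to `area_under_curve` gives an AUC estimate whose resolution depends on the number of thresholds.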

The training was carried out ten times, with training times varying between 5 and 10 min, and the designed DCNN-WOA achieved a detection accuracy between 98.11% and 99.38% on the COVID-19 validation set.

Because of the wide range of results, the ten trained DCNN-WOA models are combined into an ensemble by weighted averaging, using the validation accuracy as the weights.

To further evaluate the performance of DCNN-WOA in distinguishing COVID-19 samples from uninfected ones, recently proposed benchmark models, including the conventional DCNN [83], DUICM [84], and the matched subspace classifier with adaptive dictionaries [85], are utilized. Figures 7 and 8 show the resulting ROC and precision-recall curves for the i-6c-2s-12c-2s and i-8c-2s-16c-2s structures, respectively.

As can be seen from these figures, the DCNN-WOA detector achieves outstanding COVID-19 detection results compared with the other benchmark models. Specifically, the proposed DCNN-WOA provides over 98.25% correct COVID-19 sample detection at a false alarm rate of less than 1.75%, which shows the WOA algorithm's capability to increase the performance of the DCNN model.
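The weighted-averaging ensemble mentioned above can be sketched in a few lines. The function name is hypothetical, and normalizing the validation accuracies so that the weights sum to one is an assumption about how the averaging is carried out.

```python
import numpy as np

def weighted_ensemble(probabilities, val_accuracies):
    """Combine per-model predicted probabilities by weighted averaging.

    probabilities:  (n_models, n_samples) predicted COVID-19 probabilities.
    val_accuracies: (n_models,) validation accuracy of each model, used as
                    weights after normalization (assumed) to sum to one.
    """
    probs = np.asarray(probabilities, dtype=float)
    weights = np.asarray(val_accuracies, dtype=float)
    weights = weights / weights.sum()
    return weights @ probs
```

When the validation accuracies are close to each other, as with values between 98.11% and 99.38%, the result is nearly a plain average of the individual models.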

Generally, the precision-recall plot shows the tradeoff between recall and precision for various threshold levels [86]. A high area under the precision-recall curve represents both high precision and high recall, where high precision indicates a low false-positive rate and high recall indicates a low false-negative rate [86, 87]. As can be observed from the curves in Figs. 7 and 8, DCNN-WOA has a higher area under the precision-recall curve. Therefore, it yields lower false-positive and false-negative rates than the other benchmark detectors.

It is evident that as the number of epochs increases, the time efficiency of the WOA becomes more prominent, because the stochastic nature of the WOA algorithm reduces the complexity of the search space. It should be pointed out that the results for the i-8c-2s-16c-2s structure in Tables 4 and 5 confirm the prior conclusion. Consequently, WOA can improve the performance of the DCNN with the i-8c-2s-16c-2s structure as well as the i-6c-2s-12c-2s structure.
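The precision and recall values that make up a precision-recall curve can be computed per threshold as in the sketch below. The function name is hypothetical, and returning a precision of 1.0 when nothing is predicted positive is a common convention assumed here rather than something stated in the paper.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall of the positive class at one threshold."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1            # true positive
        elif predicted and not label:
            fp += 1            # false positive (false alarm)
        elif not predicted and label:
            fn += 1            # false negative (missed case)
    # Convention (assumed): precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Evaluating this function over a range of thresholds and plotting precision against recall gives the curve; few false positives keep precision high, and few false negatives keep recall high.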

From the viewpoint of data science experts, the best result can be expressed in terms of the confusion matrix, overall accuracy, precision, recall, ROC curve, etc. However, these results might not be sufficient for medical specialists and radiologists if they cannot be interpreted. Identifying the Region of Interest (ROI) that drives the network's decision will enhance medical experts' understanding [88]. In this section, the results provided by the designed networks for the COVID-Xray-5k dataset are investigated. Class Activation Mapping (CAM) results are displayed for the COVID-Xray-5k dataset to localize the areas suspected of COVID-19 infection. To emphasize the discriminative regions, the probability predicted by the DCNN model for each image class is mapped back to the last convolutional layer of the corresponding model. The CAM for a given image class is computed from the activation maps of the Rectified Linear Unit (ReLU) layer following the last convolutional layer, weighted by how much each activation map contributes to the final score of that class. The novelty of CAM is the global average pooling layer applied after the last convolutional layer, which pools over spatial locations to produce the connection weights. Thereby, it allows identifying the regions within an X-ray image that determine the class decision before the Softmax layer, which leads to better predictions.
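The CAM computation described above amounts to a weighted sum of the last convolutional layer's activation maps, using the class-specific weights that follow global average pooling. The sketch below illustrates this; the function name, the array layout, and the final rescaling to [0, 1] for display are assumptions.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Weighted sum of activation maps for one class.

    feature_maps: (K, H, W) activations after the last convolution and ReLU.
    fc_weights:   (n_classes, K) weights applied after global average pooling.
    Returns an (H, W) map rescaled to [0, 1] for display (assumed convention).
    """
    feature_maps = np.asarray(feature_maps, dtype=float)
    weights = np.asarray(fc_weights, dtype=float)[class_idx]
    cam = np.tensordot(weights, feature_maps, axes=1)   # sum_k w_k * A_k
    cam = cam - cam.min()
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam
```

In practice the resulting map would be upsampled to the X-ray resolution and overlaid on the image so that the highlighted regions can be compared with the radiologist's findings.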

Visualization using CAM for the DCNN models allows medical specialists and radiology experts to localize the areas suspected of COVID-19 infection, as shown in Figs. 9 and 10, which present the results for COVID-19 detection in X-ray images. Figure 9 shows the outcome for a case marked as 'Covid-19' by the radiologist; the DCNN-WOA model predicts the same class and highlights the discriminative area for its decision. Figure 10 shows the outcome for a 'normal' case, where the two compared models emphasize different regions for their prediction of the 'normal' class. Medical specialists and radiology experts can then choose the network architecture based on these decisions. This kind of CAD visualization would provide a useful second opinion to medical specialists and radiology experts and also improve their understanding of deep learning models.

In this paper, the WOA was proposed to design an accurate DCNN model for detecting positive COVID-19 cases from X-ray images. The designed detector was benchmarked on the COVID-Xray-5k dataset, and the results were evaluated by a comparative study with classic DCNN, DUICM, and MSAD. The results indicated that the designed detector could present very competitive results compared to these benchmark models. The concept of the Class Activation Map (CAM) was also applied to detect the regions potentially infected by the virus. It was found to correlate with clinical results, as confirmed by experts. A few research directions can be proposed for future work with the DCNN-WOA, such as underwater sonar target detection and classification. Also, adapting WOA to tackle multi-objective optimization problems can be recommended as a potential contribution. Investigating the effectiveness of chaotic maps in improving the performance of the DCNN-WOA can be another research direction. Although the results were promising, further investigation is needed on a larger dataset of COVID-19 images to have a more comprehensive evaluation of accuracy rates.

Funding Not Applicable.

The resource images can be downloaded using the following link and reference [72]: https://github.com/ieee8023/covid-chestxray-dataset, 2020.

The source code of the models is available upon request.

The authors declare that there is no conflict of interest regarding the publication of this paper. 

The impact of mortality salience on quantified self behavior during the COVID-19 pandemic

A hybrid COVID-19 detection model using an improved marine predators algorithm and a ranking-based diversity reduction strategy

Osteopontin as a multifaceted driver of bone metastasis and drug resistance

Intelligent diagnostic prediction and classification system for chronic kidney disease

Multi-layer security of medical data through watermarking and chaotic encryption for tele-health applications

A novel model for evaluation Hospital medical care systems based on plithogenic sets

Cosine similarity measures of bipolar neutrosophic set for diagnosis of bipolar disorder diseases

Real-time detection of cole diseases and insect pests in wireless sensor networks

Transport of intensity phase retrieval and computational imaging for partially coherent fields: The phase space perspective

High-resolution transport-of-intensity quantitative phase microscopy with annular illumination

Effective features to classify ovarian cancer data in internet of medical things

Optimal bilateral filter and convolutional neural network based denoising method of medical image measurements

An evolutionary lion optimization algorithm-based image compression technique for biomedical applications

Highly reliable and low-complexity image compression scheme using neighborhood correlation sequence algorithm in WSN

COCO enhances the efficiency of photoreceptor precursor differentiation in early human embryonic stem cell-derived retinal organoids

Evolving deep convolutional neutral network by hybrid sine-cosine and extreme learning machine for real-time COVID19 diagnosis from X-ray images

Multi-object detection and tracking (MODT) machine learning model for realtime video surveillance systems

Optimal feature level fusion based ANFIS classifier for brain MRI image classification

The research on 220GHz multicarrier high-speed communication system

DTCNNMI: A deep twin convolutional neural networks with multi-domain inputs for strongly noisy diesel engine misfire detection

Feedback convolutional network for intelligent data fusion based on near-infrared collaborative IoT technology

Deep learning with LSTM based distributed data mining model for energy efficient wireless sensor networks

A framework for big data analysis in smart cities

Evolving deep learning convolutional neural networks for early COVID-19 detection in chest X-ray images. Mathematics, 9

Realtime COVID-19 diagnosis from X-Ray images using deep CNN and extreme learning machines stabilized by chimp optimization algorithm

An efficient radix trie-based semantic visual indexing model for large-scale image retrieval in cloud environment. Software Practice and Experience

Application of neural network algorithm in fault diagnosis of mechanical intelligence

Cryptographic keys exchange model for smart city applications

Deep learning model for real-time image compression in Internet of Underwater Things (IoUT)

Computer network security evaluation simulation model based on neural network

Extending self-organizing network availability using genetic algorithm

Recognizing human activity in mobile crowdsensing environment using optimized k-NN algorithm

Bioenergetic crosstalk between mesenchymal stem cells and various ocular cells through the intercellular trafficking of mitochondria

Semantic-k-NN algorithm: An enhanced version of traditional k-NN algorithm

Numerical study on hysteretic behaviour of horizontal-connection and energy-dissipation structures developed for prefabricated shear walls

On optimization methods for deep learning

Unsupervised model for detecting plagiarism in internet-based handwritten Arabic documents

Development of pressure-impulse models and residual capacity assessment of RC columns using high fidelity Arbitrary Lagrangian-Eulerian simulation. Engineering Structures

Development of 340-GHz Transceiver Front End Based on GaAs Monolithic Integration Technology for THz Active Imaging Array

Deep learning via Hessian-free optimization

A hybrid artificial intelligence and internet of things model for generation of renewable resource of energy

Multi-objective feature selection for microarray data via distributed parallel algorithms

MOF-BC: A memory optimized and flexible blockchain for large scale networks

Automatic removal of complex shadows from indoor videos using transfer learning and dynamic thresholding

Experimental and numerical investigation on the complex behaviour of the localised seismic response in a multi-storey plan-asymmetric structure

Dual watermarking framework for privacy protection and content authentication of multimedia

Trust based cluster head election of secure message transmission in MANET using multi secure protocol with TDES

NoSQL injection attack detection in web applications using RESTful service

Efficient fire detection for uncertain surveillance environment

An efficient hierarchical clustering protocol for multihop Internet of vehicles communication

A fast learning algorithm for deep belief nets

Region-based scalable smart system for anomaly detection in pedestrian walkways

Chimp optimization algorithm. Expert Systems with Applications

Loan portfolio optimization using genetic algorithm: a case of credit constraints

Hybridization of firefly and improved multi-objective particle swarm optimization algorithm for energy efficient load balancing in cloud computing environments

Extended genetic algorithm for solving open-shop scheduling problem

A novel 220-GHz GaN diode onchip tripler with high driven power

The genetic convolutional neural network model based on random sample

Optimizing robot path in dynamic environments using genetic algorithm and bezier curve

Fine-tuning convolutional neural networks using Harmony Search

Unsupervised person re-identification: Clustering and fine-tuning

Automatically designing CNN architectures using the genetic algorithm for image classification

Social-spider optimization algorithm for improving ANFIS to predict biochar yield

A multi-objective transportation model under neutrosophic environment

Improved feature selection model for big data analytics

Intelligent firefly-based algorithm with Levy distribution (FF-L) for multicast routing in vehicular communications

Application of discrete transforms with selective coefficients for blind image watermarking

Reliable data transmission model for mobile ad hoc network using signcryption technique

Optimal tuning of decentralized fractional order PID controllers for TITO process using equivalent transfer function

An enhanced whale optimization algorithm for vehicular communication networks

COVIDetectioNet: COVID-19 diagnosis system based on X-ray images using features selected from pre-learned deep features ensemble

The whale optimization algorithm

Deep-COVID: Predicting COVID-19 from chest X-ray images using deep transfer learning

Covid-19 image data collection: Prospective predictions are the future

CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison. 33rd AAAI Conf. Artif. Intell. AAAI 2019, 31st Innov. Appl. Artif. Intell. Conf. IAAI 2019 9th AAAI Symp

Self-maintenance model for wireless sensor networks

Trust-based secure clustering in WSN-based intelligent transportation systems

Four-hundred gigahertz broadband multi-branch waveguide coupler

A mechanical reliability study of 3dB waveguide hybrid couplers in the submillimeter and terahertz band

Training RBF NN using sine-cosine algorithm for sonar target classification

Maximum undeformed equivalent chip thickness for ductile-brittle transition of zirconia ceramics under different lubrication conditions

Classification of anti-submarine warfare sonar targets using a deep neural network. Ocean

DUICM deep underwater image classification mobdel using convolutional neural networks

Underwater unexploded ordnance (UXO) classification using a matched subspace classifier with adaptive dictionaries

Walnut fruit processing equipment: academic insights and perspectives

Experimental evaluation of the lubrication properties of the wheel/workpiece interface in minimum quantity lubrication (MQL) grinding using different types of vegetable oils

CovXNet: A multi-dilation convolutional neural network for automatic COVID-19 and other pneumonia detection from chest X-ray images with transferable multi-receptive feature optimization