key: cord-0941816-pqcsaipv
authors: Sadik, Farhan; Dastider, Ankan Ghosh; Fattah, Shaikh Anowarul
title: SpecMEn-DL: spectral mask enhancement with deep learning models to predict COVID-19 from lung ultrasound videos
date: 2021-07-09
journal: Health Inf Sci Syst
DOI: 10.1007/s13755-021-00154-8
sha: 121bd0d4d5f7fa0687f74dafb3d88c73fdb2ce53
doc_id: 941816
cord_uid: pqcsaipv

Lung Ultrasound (LUS) images are considered to be effective for detecting Coronavirus Disease (COVID-19) as an alternative to the existing reverse transcription-polymerase chain reaction (RT-PCR)-based detection scheme. However, the recent literature exhibits a shortage of works dealing with LUS image-based COVID-19 detection. In this paper, a spectral mask enhancement (SpecMEn) scheme is introduced along with a histogram equalization pre-processing stage to reduce the noise effect in LUS images prior to utilizing them for feature extraction. To detect the COVID-19 cases, we propose to utilize the SpecMEn pre-processed LUS images in deep learning (DL) models (the SpecMEn-DL method), which offers a better representation of some characteristic features in LUS images and results in very satisfactory classification performance. The performance of the proposed SpecMEn-DL technique is appraised by implementing some state-of-the-art DL models and comparing the results with related studies. It is found that the use of the SpecMEn scheme in DL techniques offers an average increase in accuracy and F1 score of 11% and 11.75%, respectively, at the video-level. Comprehensive analysis and visualization of the intermediate steps show a very satisfactory detection performance, creating a flexible and safe option for clinicians to get assistance while obtaining an immediate evaluation of patients.

In December 2019, the world discovered a novel type of coronavirus causing a viral pneumonia outbreak that quickly reached the global stage, and the World Health Organization (WHO) declared this Coronavirus Disease (COVID-19) a global pandemic [1]. The rapid spread of this disease has created a worldwide shortfall in medical capacity, demanding an efficient complementary scheme to detect COVID-19 at the earliest stage and thereby curtail its spread. The reverse transcription-polymerase chain reaction (RT-PCR) test, the gold standard for detecting COVID-19, is of limited capacity, time-consuming, and strictly dependent on swab-collection techniques [2]. Complementary attempts are aimed at using computed tomography (CT) scan, X-Ray, and Lung Ultrasound (LUS) images [3] [4] [5]. Considering the radiation hazards, cost, and flexibility, LUS is preferable to CT scans or X-Rays, and in some cases it even performs better [6]. Therefore, introducing LUS imaging techniques in COVID-19 diagnosis by accurately separating it from pneumonia or regular healthy cases can be a vital step to fight the current pandemic by ensuring rapid care for the patients.

Most of the machine learning (ML)-based works on COVID-19 detection are devoted to analyzing the LUS images through classification into certain categories [7, 8], sometimes followed by a supervised or unsupervised segmentation step. Supervised segmentation needs properly annotated data, which is a mammoth task, and publicly available annotated LUS data related to COVID-19 are quite inadequate. Video-based grading and frame-based disease severity score prediction are other ways to deal with the LUS images [9]. However, investigations on COVID-19 detection through LUS images are somewhat limited compared to the studies on other relevant imaging-based diagnostic fields. In [7], a simple VGG16-based classification network, namely POCOVID-Net, is implemented with moderate classification performance. There, the authors presented an open-source collection of LUS data on COVID-19, namely the POCUS dataset, which is growing day by day. In [10], the same research group demonstrated explainable LUS image analysis on raw recordings for accelerating COVID-19 differential diagnosis. However, these attempts are quite straightforward and leave scope for improvement in this field. In [8], a comparative analysis of COVID-19 detection performance is presented on CT scan, X-Ray, and ultrasound datasets, and it is concluded that the LUS images yield the best disease detection accuracy of the three. There, the authors utilized the same LUS dataset that was used in [7, 10]. It is to be noted that these studies were conducted considering only a selected portion of LUS images from the large dataset. A classification scheme with effective detection performance under source-independent conditions is therefore highly desirable to predict the disease class accurately and rapidly.

In this paper, an automatic scheme is proposed for classifying the LUS images into COVID-19, pneumonia, and regular/healthy categories. The main idea here is to develop an efficient LUS image enhancement scheme and utilize the resulting enhanced LUS images in deep learning (DL)-based classification networks for achieving better classification performance. The pipeline of the proposed method is presented in Fig. 1. For the purpose of LUS image enhancement, contrast-limited adaptive histogram equalization (CLAHE) pre-processing is performed first. A spectral mask enhancement (SpecMEn) scheme is then proposed that generates a mask from the CLAHE-enhanced images; the mask is applied to each image plane to further reduce the effect of noise. The 3-channel pre-processed LUS images are then fed to the DL classification network. A thorough evaluation of the proposed scheme through both frame-level and video-level results, together with a comparison with related studies, demonstrates its capability of enhancing the prediction performance of the classification networks.

In this paper, the point-of-care ultrasound (POCUS) dataset [7], which is publicly available and open-source, is utilized. It comprises various types of videos from COVID-19, pneumonia, and regular/healthy cases. The COVID-19 class includes some subclasses, such as pregnant cases, dialytic cases, and some unlabelled cases. Similarly, the pneumonia class includes viral pneumonia and unlabelled pneumonia cases. In this paper, all three classes, namely COVID-19, pneumonia, and healthy cases, are considered (excluding the unlabelled data). The dataset used in this study consists of 123 LUS videos. After extracting the frames from each video, a total of 41,529 images are obtained. Among them, 74 videos (60%) with 27,920 frames are placed in the training set, and the other 49 videos (40%) with 13,609 frames are placed in the testing set. The detailed distribution of these training and testing sets for each of the three classes is presented in Table 1.
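As a quick check of the split described above, the short sketch below recomputes the totals and shares from the reported counts. The variable names are hypothetical; only the four counts come from the text.

```python
# Counts reported in the text for the training and testing sets.
split = {
    "train": {"videos": 74, "frames": 27_920},
    "test": {"videos": 49, "frames": 13_609},
}

total_videos = sum(part["videos"] for part in split.values())
total_frames = sum(part["frames"] for part in split.values())

# Share of each subset, measured per video and per frame.
video_share = {name: part["videos"] / total_videos for name, part in split.items()}
frame_share = {name: part["frames"] / total_frames for name, part in split.items()}

print(total_videos, total_frames)
for name in split:
    print(f"{name}: {video_share[name]:.1%} of videos, {frame_share[name]:.1%} of frames")
```

The 60%/40% figures refer to videos. Because videos differ in length, the corresponding frame-level shares come out at roughly 67% and 33%.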

It is well known that noise can be introduced in ultrasound images during the data acquisition, transmission, storage, and retrieval processes. The presence of noise generally tends to reduce the image resolution and contrast, thereby reducing its diagnostic capability. In particular, the low-intensity regions of ultrasound images with very low contrast may make it difficult to extract resolvable details for differentiating various classes. In order to reduce the effect of noise in ultrasound images, contrast-limited adaptive histogram equalization (CLAHE) is employed, which is found to be very effective in enhancing ultrasound images [8, 11, 12]. The CLAHE method obtains better equalization in terms of maximum entropy while limiting the contrast of an image [13]. Here, the boundaries between neighboring blocks are eliminated using bilinear interpolation. The traditional adaptive histogram equalization (AHE) method over-amplifies the contrast in comparatively homogeneous regions of the image, resulting in an increased amount of noise as well [14]. Although the CLAHE method performs better than the AHE method, over-enhancement is still observed in many cases, and this over-enhancement may boost the noise. In order to demonstrate the performance of the CLAHE pre-processing technique, Fig. 2 shows LUS images of three different classes (COVID-19, pneumonia, and normal) for three cases: without any pre-processing (raw images), using histogram equalization, and using CLAHE. It is observed from the figure that after applying the CLAHE pre-processing technique, the exposure and contrast of the images are increased, which makes the darker portions of the images more visible. The CLAHE method shows a better performance in this regard, but some boost to the noise still remains. In order to overcome this problem, a spectral-domain enhancement scheme is introduced in the proposed method.
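To make the histogram-based pre-processing above more concrete, the following is a minimal NumPy sketch of plain (global) histogram equalization, the baseline that Fig. 2 compares against CLAHE. It is not the authors' implementation: the function name `equalize_histogram` and the synthetic dark frame are assumptions for illustration. CLAHE additionally works on local tiles, clips each tile histogram, and blends neighboring tiles by bilinear interpolation; library routines such as OpenCV's `cv2.createCLAHE` are commonly used for that step.

```python
import numpy as np

def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Global histogram equalization of an 8-bit grayscale image (illustrative)."""
    if gray.dtype != np.uint8:
        raise ValueError("expected a uint8 image")
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]      # CDF value at the darkest occupied gray level
    total = gray.size
    if total == cdf_min:           # constant image: nothing to stretch
        return gray.copy()
    # Map every gray level through the normalized CDF onto the range 0..255.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]

# Synthetic low-contrast frame standing in for a dark ultrasound image (assumption).
rng = np.random.default_rng(0)
dark = rng.integers(0, 64, size=(128, 128), dtype=np.uint8)
equalized = equalize_histogram(dark)
print(dark.min(), dark.max(), "->", equalized.min(), equalized.max())
```

Global equalization stretches the whole image with a single mapping, which is why locally adaptive variants such as AHE and CLAHE are preferred when dark regions carry the diagnostic detail.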

In the proposed method, a spectral-domain LUS image enhancement approach is introduced prior to the classification stage, so that better detection performance is achieved. As discussed before, the CLAHE-based pre-processing stage may enhance the noise along with the significant features. The effect of noise should therefore be suppressed while preserving the major features.

The strength of the Fourier transform in analyzing ultrasound image characteristics is widely known [15]. For this purpose, 2D spectral analysis is performed using the discrete Fourier transform (DFT). For a 2D signal g(x, y) of size M × N, the DFT is defined as [16]:

$$G(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} g(x, y)\, e^{-j 2\pi \left( \frac{ux}{M} + \frac{vy}{N} \right)}$$

From a recent study on COVID-19 and pneumonia affected ultrasound images, it is experimentally found that COVID-19 affected ultrasound images exhibit some distinguishable sonographic features, such as thickening, blurring, and discontinuities in the pleural lines [17, 18]. The spectral domain representation of an image obtained by using the DFT follows the geometric structure in the spatial domain. For example, the high-frequency components arise due to the sharp intensity changes in the border regions. The Fourier spectrum of an ultrasound image is expected to exhibit bright rays emanating from the central frequency, reflecting the directions of dominant discontinuities in edges and other geometric textures [15]. Hence, the spectral analysis of the ultrasound images under consideration can extract some key information and present it through some frequency components [16, 19]. By investigating the 2D spectral masks of several ultrasound images of three different classes (healthy, pneumonia, and COVID-19), it is found that the central region of the spectral mask exhibits significant differences among the three classes, as expected. More energy concentration is also observed in the central regions of the spectrum.
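As a numerical illustration of the 2D DFT referred to above, the sketch below evaluates the transform directly from the standard unnormalized forward definition (an assumed convention, which is also the one NumPy uses) and compares it with `numpy.fft.fft2`. The function name `dft2_direct` and the small random test array are hypothetical choices for illustration.

```python
import numpy as np

def dft2_direct(g: np.ndarray) -> np.ndarray:
    """Direct 2D DFT: G[u, v] = sum_x sum_y g[x, y] * exp(-j*2*pi*(u*x/M + v*y/N))."""
    M, N = g.shape
    x = np.arange(M)
    y = np.arange(N)
    # Separable kernels: W_M[u, x] = exp(-j*2*pi*u*x/M), W_N[v, y] = exp(-j*2*pi*v*y/N).
    W_M = np.exp(-2j * np.pi * np.outer(x, x) / M)
    W_N = np.exp(-2j * np.pi * np.outer(y, y) / N)
    return W_M @ g @ W_N.T

rng = np.random.default_rng(1)
g = rng.random((8, 6))

G_direct = dft2_direct(g)
G_fft = np.fft.fft2(g)
print(np.allclose(G_direct, G_fft))
```

The zero-frequency term G(0, 0) equals the sum of all pixel values, which is one reason the centre of the shifted magnitude spectrum is typically its brightest point.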

The pre-processed LUS images, following the CLAHE-based image enhancement, are resized to 128 × 128 × 3 (RGB channels) and converted into grayscale images. For efficient implementation, the 2D fast Fourier transform (FFT) is applied to the pre-processed image. In the resulting magnitude spectrum, the low-frequency components exhibit a higher magnitude than the high-frequency components, so a brighter area is found near the central region. A rectangular window covering that low-frequency region is used to adjust the magnitudes near the central region, as shown in Fig. 3. Such a magnitude scaling operation helps to further control the contrast. This scaled magnitude spectrum is used along with the phase spectrum to construct the spectral mask. The enhanced image is reconstructed from the spectral mask through the inverse fast Fourier transform (IFFT), which yields a single-channel grayscale image; its normalized version is then multiplied with the 3-channel CLAHE pre-processed LUS image. The resulting spectral mask enhanced 3-channel images are then employed in the DL models to perform the classification task. The proposed spectral mask enhancement stage followed by the DL models to classify the LUS images is termed the SpecMEn-DL method.
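The steps described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the text does not give the window size, the scaling factor, the grayscale conversion, or the normalization, so `half_window`, `scale`, the channel-mean grayscale, and min-max normalization are all hypothetical choices. The input is assumed to be an already CLAHE-processed frame resized to 128 × 128 × 3 with values in [0, 1].

```python
import numpy as np

def specmen_enhance(rgb: np.ndarray, half_window: int = 8, scale: float = 0.5) -> np.ndarray:
    """Sketch of a spectral mask enhancement step (parameters are assumptions)."""
    gray = rgb.mean(axis=2)                          # simple grayscale conversion (assumption)
    spectrum = np.fft.fftshift(np.fft.fft2(gray))    # zero frequency moved to the centre
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # Rectangular window around the centre: rescale the low-frequency magnitudes.
    h, w = gray.shape
    cy, cx = h // 2, w // 2
    magnitude[cy - half_window:cy + half_window + 1,
              cx - half_window:cx + half_window + 1] *= scale

    # Recombine the scaled magnitude with the original phase and invert the transform.
    masked = magnitude * np.exp(1j * phase)
    recon = np.fft.ifft2(np.fft.ifftshift(masked)).real

    # Min-max normalize to [0, 1] (assumption) and apply to every colour plane.
    recon = recon - recon.min()
    peak = recon.max()
    mask = recon / peak if peak > 0 else np.ones_like(recon)
    return rgb * mask[..., None]

# Synthetic stand-in for a CLAHE-processed 128 x 128 x 3 frame.
rng = np.random.default_rng(0)
frame = rng.random((128, 128, 3))
enhanced = specmen_enhance(frame)
print(enhanced.shape, float(enhanced.min()), float(enhanced.max()))
```

Because the window is symmetric about the zero-frequency bin, the scaled spectrum keeps its conjugate symmetry and the inverse transform stays real up to rounding. Whether the central magnitudes should be attenuated or boosted, and by how much, is a tuning choice that the text leaves open.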

From the LUS images obtained after the SpecMEn stage, the target is now to extract effective features for classifying the images into three classes: COVID-19, pneumonia, and normal/regular. Instead of using the conventional ML-based techniques, in the proposed scheme, a deep convolutional neural network (CNN) architecture is employed. The CNN is capable of automatically pulling out multi-variant features and learning the spatial hierarchies of features using multiple building blocks, such as convolution layers, pooling layers, and fully connected layers. Back-propagation calibrates the weights of a neural network based on the error rate and helps to minimize the cost function in each iteration [20] .

There are various efficient deep CNN architectures available in the literature. The objective of this study is not to design a new deep CNN architecture but to demonstrate the effectiveness of the proposed spectral mask enhancement (SpecMEn) scheme in classifying the LUS images into three classes (i.e., COVID-19, pneumonia, and normal) using state-of-the-art efficient deep CNN models (the overall SpecMEn-DL method). For this purpose, DenseNet-201 [21], ResNet-152V2 [22], Xception [23], VGG19 [24], and NasNetMobile [25] architectures pre-trained with ImageNet [26] are employed. The VGG19 architecture is a very large neural network with three fully connected layers at the end of the network, followed by a softmax function for classification. Due to its depth, it is troublesome to train. Images are passed through a stack of convolutional layers whose kernels have a very small receptive field, which takes a lot of time to compute. The NasNetMobile or NasNet-A architecture, a member of the NasNet family that utilizes the NAS framework, is used for both frame-based and video-level detection due to its high performance. Neural Architecture Search (NAS) is a data-driven approach in which the network building block is learned from data through reinforcement learning instead of being designed through manual experiments.

The proposed method is implemented to classify the LUS images into three classes, and results are reported at both the frame level and the video level. Following the frame-level stage, analysis is carried out at the video level as well. Each LUS video contains a large number of frames but carries a single class label among the three possible classes, and predicting that video class is the ultimate target of a classification network.

In the LUS video dataset, only video-level labels are available. As a result, all the frames of a particular video are assigned the class label of that video. For example, if a COVID-19 labeled video contains 300 frames, all 300 frames are considered as COVID-19. In real-life LUS imaging, however, not every frame of a COVID-19 labeled video necessarily exhibits the characteristics of COVID-19: some frames may show normal or pneumonia characteristics, so a COVID-19 labeled video may contain frames that depict a healthy condition while the rest show infection. In this study, all the frames of a video are nevertheless treated as members of the video's class, because the dataset annotates whole videos rather than individual frames. During the testing phase, individual video-level cases are considered, and for each video a decision-tree approach is followed to predict the class label. The analysis is performed in two steps. In the first step, a thresholding approach determines whether the video is healthy: if the fraction of frames predicted as healthy exceeds a threshold, the video is termed a normal or healthy case. Otherwise, in the second step, the decision between COVID-19 and pneumonia is made by analyzing the frame predictions for those two classes. The process is repeated for various threshold values and the results are presented for each threshold.
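The two-step video-level decision described above can be sketched as follows. The text does not specify how the second step chooses between COVID-19 and pneumonia, so the majority vote and the tie-break used here are assumptions, as are the function name `predict_video_label` and the label strings.

```python
from collections import Counter

def predict_video_label(frame_predictions, healthy_threshold=0.6):
    """Aggregate per-frame class predictions into one video-level label (illustrative)."""
    if not frame_predictions:
        raise ValueError("a video needs at least one frame prediction")
    counts = Counter(frame_predictions)

    # Step 1: call the video healthy if enough frames are predicted healthy.
    healthy_fraction = counts["healthy"] / len(frame_predictions)
    if healthy_fraction > healthy_threshold:
        return "healthy"

    # Step 2 (assumption): majority vote between the two disease classes,
    # with ties resolved in favour of COVID-19.
    return "covid" if counts["covid"] >= counts["pneumonia"] else "pneumonia"

# Hypothetical frame-level outputs for two videos.
video_a = ["healthy"] * 70 + ["covid"] * 20 + ["pneumonia"] * 10
video_b = ["healthy"] * 50 + ["covid"] * 35 + ["pneumonia"] * 15

for threshold in (0.6, 0.55, 0.5):
    print(threshold, predict_video_label(video_a, threshold), predict_video_label(video_b, threshold))
```

Sweeping `healthy_threshold` over several values, as in the loop, mirrors the repeated evaluation over thresholds mentioned in the text.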

The deep learning models used in the proposed scheme are trained with a learning rate of 0.002, batch size 64, and the number of epochs 30. The Adam Optimizer [27] is used in each of the stages as an optimizing function. Various types of data augmentation are utilized at the training phase including rotation, horizontal and vertical shifts, scaling, and flips. The categorical cross-entropy loss function [28] is applied to calculate loss between ground truth labels and predicted results.
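For reference, the categorical cross-entropy mentioned above can be written in a few lines of NumPy. This is a framework-independent sketch rather than the training code used in the study: the function name, the clipping constant `eps`, and the toy labels are assumptions, while the values in `TRAIN_CONFIG` are the hyperparameters reported in the text.

```python
import numpy as np

# Hyperparameters reported in the text.
TRAIN_CONFIG = {"learning_rate": 0.002, "batch_size": 64, "epochs": 30}

def categorical_cross_entropy(y_true: np.ndarray, y_prob: np.ndarray, eps: float = 1e-7) -> float:
    """Mean categorical cross-entropy between one-hot labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0)      # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))

# Toy batch with three classes in the order (COVID-19, pneumonia, healthy).
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
confident = np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.05, 0.05, 0.9]])
uniform = np.full((3, 3), 1.0 / 3.0)

print(categorical_cross_entropy(y_true, confident))
print(categorical_cross_entropy(y_true, uniform))
```

A model that assigns equal probability to all three classes scores ln 3 ≈ 1.0986, which gives a useful baseline when reading training curves.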

The performance of the proposed method is evaluated on a test set consisting of 13,609 frames acquired from 49 LUS videos available in [7]. The models trained on 27,920 frames from 74 LUS videos are used to classify the frames into one of the three classes: COVID-19, pneumonia, and regular or healthy cases. Standard statistical measures, namely accuracy, sensitivity, specificity, and F1 score, are used to evaluate the performance of the proposed method. Five deep learning architectures, namely DenseNet-201, VGG16, Xception, ResNet152V2, and NasNetMobile, are trained through the proposed strategy and applied on the test set. Detailed results obtained for each of these five models are presented in Table 2 for two cases: with and without the proposed spectral mask enhancement scheme (SpecMEn). As can be seen from the table, the increase in COVID-19 detection accuracy ranges up to 4%. For example, in the case of the Xception model, for all three classes, each performance measure exhibits higher values when the proposed SpecMEn is applied, except for a slightly lower specificity value in the regular class (0.952 and 0.949). A similar scenario is observed for NasNetMobile, with relatively low accuracy in comparison to that achieved with the Xception model. It can be concluded that by using the proposed SpecMEn scheme, a significant increase is obtained in most of the performance measures.
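The four measures named above can be computed per class in a one-vs-rest fashion, as in the sketch below. Treating each class against the other two is an assumption about how the per-class values are obtained; the function name `one_vs_rest_metrics` and the toy labels are hypothetical.

```python
import numpy as np

def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity, specificity and F1 score for one class against the rest."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == positive) & (y_pred == positive)))
    tn = int(np.sum((y_true != positive) & (y_pred != positive)))
    fp = int(np.sum((y_true != positive) & (y_pred == positive)))
    fn = int(np.sum((y_true == positive) & (y_pred != positive)))

    def ratio(num, den):
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)     # true positive rate (recall)
    specificity = ratio(tn, tn + fp)     # true negative rate
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }

# Hypothetical frame-level labels and predictions.
y_true = ["covid", "covid", "covid", "pneumonia", "pneumonia", "healthy", "healthy", "healthy"]
y_pred = ["covid", "covid", "pneumonia", "pneumonia", "covid", "healthy", "healthy", "covid"]

for cls in ("covid", "pneumonia", "healthy"):
    print(cls, one_vs_rest_metrics(y_true, y_pred, cls))
```

Reporting sensitivity and specificity side by side is useful here because the classes are imbalanced, so accuracy alone can hide poor detection of the smaller classes.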

The models are also trained to perform two-class classification by categorizing the images into healthy and diseased (COVID-19 and pneumonia) classes. The overall accuracy, weighted sensitivity, specificity, and F1 score are presented in Table 3. For all five models, the proposed technique improves every evaluation measure consistently, as is evident from the results.

The ultimate goal of a classification network is to predict the class of an individual video through a decision based on the frame-based results. In order to achieve this goal, the proposed classification scheme is applied to the videos separately to check its efficacy at the individual video level as well. For this purpose, the same 49 videos used for the frame-based results are tested by following the methodology presented in the "Decision tree scheme for video-level detection" section. It is observed from the results in both Tables 2 and 3 that NasNetMobile alone provides the result closest to that of the proposed technique, which improves only a few parameters slightly. Hence, the NasNetMobile model is chosen to illustrate the improvement offered by the proposed method at the video level.

The results for each of the thresholds are shown in Table 4. It is evident from the table that the best results are achieved for the threshold of 60%, i.e., when an individual video is predicted as normal if more than 60% of its frames are predicted as normal. At this threshold, a consistent improvement is observed, with increases of 8.1%, 6.4%, and 8.6% in accuracy, specificity, and F1 score, respectively. The improvements in accuracy, sensitivity, and specificity for the three classes separately are shown in Fig. 4. The COVID-19 cases are predicted with an accuracy of 85.26 ± 0.9% by the proposed technique, compared with 75.28 ± 1.46% for NasNetMobile alone. The average increase in sensitivity and specificity for COVID-19 prediction is 7.64% and 11%, respectively, compared with the NasNetMobile model alone. Similarly, pneumonia and regular cases are predicted with accuracies of 87.76% and 89.80%, which are respectively 2% and 8% higher than those achieved by NasNetMobile. Beyond the threshold of 0.6, the results converge: the same result is achieved for both the 0.55 and 0.50 thresholds. The consistent improvement is noticeable in the visualization of the performance measures in Fig. 4. Across the various thresholds, the classification accuracy, specificity, and F1 score increase by an average of 11%, 7.96%, and 11.75%, respectively, over the NasNetMobile model alone.

It is to be noted that the proposed technique is evaluated on a large amount of data, with all 49 test videos comprising a total of 13,609 frames. To the best of our knowledge, this is the first study to train and test on such a large number of LUS frames. Among the videos, 3 COVID-19 videos are falsely predicted by both NasNetMobile and NasNetMobile+SpecMEn as pneumonia and 2 videos as healthy. The results deviate from the usual pattern in these unique cases, where most of the frames are predicted wrongly. Reshaping the test set by eliminating certain frames from these unique sources greatly increases both individual and overall accuracy. However, the proposed technique is evaluated on the full test set regardless, to convey the true picture of the efficacy of this model.

Studies on automatic prediction based on LUS datasets are limited so far. In [8], a relatively small dataset is used to consider the LUS cases through two different experiments: (1) 226 normal vs. 235 COVID-19 and 220 pneumonia cases, and (2) 235 COVID-19 vs. 220 pneumonia cases. A VGG19 model is trained to classify the images into two classes each time; three-class classification is not performed there. The train-test split in that study is not clearly stated. Although the dataset used in this study can hardly be compared with theirs, a relative comparison for classifying the LUS images into healthy and unhealthy (COVID-19 and pneumonia) is provided in Table 5. In this study, 3662 COVID-19, 1485 pneumonia, and 8036 normal images are utilized in the testing set, which is unseen during the training phase. For the same task, the amount of data in [8] was about 5% of ours, with 235 COVID-19, 220 pneumonia, and 225 normal images.

Both [7] and [10] utilized a selected portion of the POCUS dataset. In [7], the authors utilized 654 COVID-19, 277 bacterial pneumonia, and 172 healthy images from 64 videos, whereas in [10], they utilized 693 COVID-19, 377 bacterial pneumonia, and 295 healthy images from 86 videos and 28 images. In both works, the images were gathered through manual processing, with a maximum of 30 frames per video. It is apparent from our analysis in the "Video-level results" section that neglecting a portion of the dataset can inflate the overall performance considerably.

A novel coronavirus outbreak of global health concern. The Lancet

Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases

COVID-19 control by computer vision approaches: a survey

Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT

Is there a role for lung ultrasound during the COVID-19 pandemic?

Diagnostic use of lung ultrasound compared to chest radiograph for suspected pneumonia in a resource-limited setting

automatic detection of COVID-19 from a new lung ultrasound imaging dataset (POCUS)

COVID-19 detection through transfer learning using multimodal imaging data

Deep learning for classification and localization of COVID-19 markers in point-of-care lung ultrasound

Accelerating COVID-19 differential diagnosis with explainable ultrasound image analysis

Realization of the contrast limited adaptive histogram equalization (CLAHE) for real-time image enhancement

Feature enhancement in medical ultrasound videos using contrast-limited adaptive histogram equalization

Ultrasound imaging: signal acquisition, new advanced processing for biomedical and industrial applications

Adaptive histogram equalization and its variations

Evaluation of machine learning methods with Fourier Transform features for classifying ovarian tumors based on ultrasound images

Biomedical signal and image processing

Point-of-care lung ultrasound in the assessment of patients with COVID-19: a tutorial

Use of lung ultrasound to differentiate coronavirus disease 2019 (COVID-19) pneumonia from community-acquired pneumonia

The scientist and engineer's guide to digital signal processing

Deep learning

Densely connected convolutional networks

Identity mappings in deep residual networks

Xception: deep learning with depthwise separable convolutions

Very deep convolutional networks for large-scale image recognition

Learning transferable architectures for scalable image recognition

The authors would like to acknowledge the Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology (BUET) for providing constant support throughout this study.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

The authors declare that they have no conflict of interest to disclose. 

In this paper, a spectral-domain enhancement scheme along with a histogram equalization pre-processing technique is implemented to obtain noise-reduced LUS images, which are used in DL-based classification networks. Instead of directly using the given LUS images, the proposed SpecMEn-DL scheme utilizes noise-reduced LUS images, which helps in extracting better features for the classification networks and enhances the classification performance by a significant margin. For example, at the frame-level evaluation, the proposed SpecMEn-DL scheme can enhance the COVID-19 and pneumonia detection accuracy by up to 4-6% in both the 3-class and 2-class problems. At the video level, where a single prediction is made on a particular patient's video, the detection accuracy, specificity, and F1 score improve by an average of 11%, 7.96%, and 11.75%, respectively, in comparison to the results obtained by the traditional DL model. Rigorous analysis with five established DL models under source-independent conditions is presented to appraise the capability of the proposed technique. The consistently promising performance in both frame-level and video-level results demonstrates the superior ability of the proposed scheme in automatic COVID-19 detection from LUS data, which can be a vital tool in this ongoing pandemic.