key: cord-0903422-szlje3za authors: Owais, Muhammad; Baek, Na Rae; Park, Kang Ryoung title: DMDF-Net: Dual Multiscale Dilated Fusion Network for Accurate Segmentation of Lesions Related to COVID-19 in Lung Radiographic Scans date: 2022-05-02 journal: Expert Syst Appl DOI: 10.1016/j.eswa.2022.117360 sha: 98ca653755f45e6df424690cb22271cc1a9956e2 doc_id: 903422 cord_uid: szlje3za

The recent disaster of COVID-19 has brought the whole world to the verge of devastation because of its highly transmissible nature. In this pandemic, radiographic imaging modalities, particularly, computed tomography (CT), have shown remarkable performance for the effective diagnosis of this virus. However, the diagnostic assessment of CT data is a human-dependent process that requires sufficient time by expert radiologists. Recent developments in artificial intelligence have substituted several personal diagnostic procedures with computer-aided diagnosis (CAD) methods that can make an effective diagnosis, even in real time. In response to COVID-19, various CAD methods have been developed in the literature, which can detect and localize infectious regions in chest CT images. However, most existing methods do not provide cross-data analysis, which is an essential measure for assessing the generality of a CAD method. A few studies have performed cross-data analysis in their methods. Nevertheless, these methods show limited results in real-world scenarios without addressing generality issues. Therefore, in this study, we attempt to address generality issues and propose a deep learning–based CAD solution for the diagnosis of COVID-19 lesions from chest CT images. We propose a dual multiscale dilated fusion network (DMDF-Net) for the robust segmentation of small lesions in a given CT image. The proposed network mainly utilizes the strength of multiscale deep features fusion inside the encoder and decoder modules in a mutually beneficial manner to achieve superior segmentation performance. Additional pre- and post-processing steps are introduced in the proposed method to address the generality issues and further improve the diagnostic performance. Mainly, the concept of post-region of interest (ROI) fusion is introduced in the post-processing step, which reduces the number of false-positives and provides a way to accurately quantify the infected area of lung. Consequently, the proposed framework outperforms various state-of-the-art methods by accomplishing superior infection segmentation results with an average Dice similarity coefficient of 75.7%, Intersection over Union of 67.22%, Average Precision of 69.92%, Sensitivity of 72.78%, Specificity of 99.79%, Enhance-Alignment Measure of 91.11%, and Mean Absolute Error of 0.026.
Recently, the global disaster of coronavirus disease 2019 (COVID-19) has afflicted millions of people and triggered a socioeconomic crisis worldwide. According to figures from the WHO (World Health Organization, 2021. Coronavirus Disease (COVID-19) Dashboard, accessed on 18 June 2021), as of June 18, 2021, approximately 176,693,988 positive cases of the COVID-19 virus, including 3,830,304 deaths (a 2.17% mortality rate), had been reported globally. Additionally, the advent of different variants of COVID-19 has further caused an alarming situation worldwide owing to their more contagious nature. Regarding the treatment of COVID-19, different experimental vaccines (Kim et al., 2021) have completed clinical assessments and been authorized by the European Medicines Agency (EMA) and/or the Food and Drug Administration (FDA). However, mass production and worldwide distribution of COVID-19 vaccines remain challenging and time-consuming tasks. Until now, preventive measures and early diagnosis have been the only means to prevent further spread of this deadly virus. In the context of diagnosis, molecular tests, such as the nucleic acid amplification test (NAAT) and reverse transcription polymerase chain reaction (RT-PCR), are performed to identify positive cases (Ai et al., 2020). However, these subjective evaluations are performed under strict clinical conditions, which can limit the use of these testing methods in outbreak regions. Recent studies (Ai et al., 2020; Fang et al., 2020) have found that chest computed tomography (CT) is a cost-effective diagnostic tool for identifying infection. Figure 1 shows a few CT images of different patients infected with the COVID-19 virus; the infected regions are indicated by red boundary lines. The quantitative results presented in (Ai et al., 2020) show that expert evaluation of lung CT images for COVID-19 infection reached 97% sensitivity relative to RT-PCR testing results. Similar results in (Fang et al., 2020) demonstrated the diagnostic potential of chest radiography in the initial evaluation of the COVID-19 virus.
Moreover, the quantitative evaluation of infection progress inside the lung lobes is an important measure for medical treatment. Therefore, accurate segmentation of the infected regions is an important pre-processing step for assessing the severity of COVID-19 infection. However, manual evaluation of a large volume of CT scans is a time-consuming task and increases the workload of healthcare professionals. Recent advances in artificial intelligence, especially in the field of medical diagnostics (Hou et al., 2021; Mhiri et al., 2020; Xu et al., 2020), have substituted several human-dependent diagnostic approaches with computer-aided diagnosis (CAD) tools. In the present outbreak of COVID-19, these CAD techniques can also support healthcare professionals in making timely and efficient diagnostic decisions using chest CT images. Generally, a CAD method applies a set of artificial intelligence algorithms to analyze the given data, such as CT images, and provides diagnostic results. Recently, a new family of artificial intelligence algorithms called deep learning has emerged, which has significantly enhanced the diagnostic capabilities of numerous CAD techniques. These state-of-the-art algorithms can emulate the diagnostic capability of healthcare experts and make effective diagnostic decisions. In particular, convolutional neural networks (CNNs), a widely used class of deep learning algorithms, have attracted special attention in the development of CAD tools for various medical domains. However, such CNN-based diagnostic methods need to be trained through supervised learning, which requires a large-scale annotated dataset. In the medical domain, data annotation is performed by healthcare experts, which requires considerable time and resources. To reduce the requirement for a large-scale training dataset, transfer learning (Krizhevsky et al., 2012) is commonly adopted to train a CNN-based CAD method. In this training approach, a pre-trained CNN (trained on a huge collection of natural images, such as ImageNet (Deng et al., 2009)) can be reused in the medical domain. The internal structure of a CNN model consists of a series of convolutional and fully-connected (FC) layers, along with other layers such as batch normalization (BN), softmax, classification, and rectified linear unit (ReLU) layers (detailed in (Heaton, 2015)). The convolutional and FC layers contain learnable parameters that are learned from the training dataset. In response to the current pandemic, various types of CAD methods exist in the literature. These existing methods mainly use chest X-ray and/or CT images to diagnose COVID-19. Initially, most of the existing methods (Minaee et al., 2020; Rathod et al., 2021) used deep classification models to make diagnostic decisions. These methods can only classify positive and negative patients without highlighting the infectious regions in a given radiographic scan. Later, new methods were proposed based on deep segmentation networks, which localize the infected regions in a given radiographic scan. However, most of the existing methods lack cross-data analysis, which is a prime indicator for assessing the effectiveness of a CAD method under real conditions. Limited studies (Ma et al., 2021) have been conducted in which a cross-data analysis of the methods is performed. However, these methods showed limited results in cross-data analysis.
Consequently, in this study, we address the limitations of the existing studies and develop a high-performance CAD method for the efficient and well-localized detection of COVID-19 related findings in chest CT images. The main contributions of our method are as follows.
-We propose a dual multiscale dilated fusion network (DMDF-Net) for the robust segmentation of small lesions in chest CT images. Our designed model utilizes the strength of grouped convolution and multiscale deep features fusion inside the encoder and decoder modules using multiscale dilated convolution to achieve better segmentation results with a reduced number of training parameters.
-Additional pre- and post-processing steps are introduced in the proposed method to address the generality issues and obtain superior performance in a real-world setting. Moreover, the post-processing step also provides a way to accurately estimate the proportion of the infected area of the lung (PIAL), which is an essential measure for quantifying the severity of COVID-19 infection.
-Our proposed method achieves state-of-the-art results in the case of cross-data analysis and outperforms various existing methods and recent deep segmentation networks.
-Finally, we make the proposed framework (including the implementation of DMDF-Net and the pre- and post-processing steps) openly available through (https://github.com/Owais786786/DMDF-Net.git, accessed on 18 January 2022) for fair comparisons by other researchers.
The remainder of this article is structured as follows. In Section 2, we briefly review the various existing CAD methods for diagnosing COVID-19 infection using chest radiographic scans. Section 3 presents our selected datasets and the proposed method. The training/validation settings and quantitative results are provided in Section 4. Finally, a brief discussion and the conclusions of the proposed framework are presented in Sections 5 and 6, respectively. In the recent literature, various types of diagnostic methods have been proposed to automatically diagnose COVID-19 from chest radiographic scans. These methods primarily rely on CNN-based classification and segmentation models to make diagnostic decisions. In (Minaee et al., 2020), the authors proposed CNN-based CAD frameworks that mainly classify a given radiographic scan as either positive or negative. Additionally, different training schemes were proposed to perform optimal training of these models using a limited number of training samples. However, these methods were trained to classify only positive and negative cases of COVID-19 without detecting and localizing the lesion regions in a given radiographic image. In contrast, semantic segmentation networks perform well in finding regions infected with COVID-19 in each radiographic image. However, pixel-level annotated ground truths are required to properly train and validate these segmentation networks. Such data annotation is performed by healthcare professionals, which requires considerable time and resources. To alleviate the constraints of large-scale datasets, different semi-supervised learning and data synthesis methods were proposed in the literature (Fan et al., 2020; Jiang et al., 2021). These methods can effectively train deep networks with limited training data.
For example, (Jiang et al., 2021) proposed an image synthesis framework based on a conditional generative adversarial network (C-GAN) that can generate radiographic data samples (including both COVID-19 positive and negative CT images) for adequate training of deep networks. In addition, the conventional U-Net segmentation model was trained with and without the synthesized data to demonstrate the efficiency of their data synthesis approach. A subsequent study presented a new version of C-GAN, called CoSinGAN, that can synthesize high-quality CT images by learning from a single data sample. The experimental results show superior segmentation performance for 2D and 3D U-Net compared to previous reference methods when trained on the data synthesized by CoSinGAN. In addition, (Fan et al., 2020) proposed a semi-supervised training scheme that effectively trains their deep segmentation model (Inf-Net) using unlabeled data. A novel randomly selected propagation algorithm was adopted to train Inf-Net using both labeled and unlabeled training data. Moreover, the aggregation of high-level features was performed inside Inf-Net to exploit the diverse representations of the lesion regions. Later, (Ma et al., 2021) presented benchmarks for lung lobe and infection segmentation using two radiographic datasets comprising CT images. Different segmentation models were trained and evaluated to achieve the best results, and 3D U-Net was ranked as the best model among the different reference models. In a comparative study, (Oulefki et al., 2021) presented a detailed analysis of traditional machine learning techniques for the automated diagnosis of COVID-19. Based on a limited number of data samples, the first-ranked machine learning method showed results comparable to a deep CNN model. However, recent comparative studies (Jiang et al., 2021; Li et al., 2020) have shown that deep learning models outperform traditional machine learning methods on multi-source radiographic datasets. Furthermore, (El-Bana et al., 2020) developed a multitasking CAD method that comprises a classification and a segmentation model to identify and segment certain types of infections in a given CT image. Initially, a pre-trained CNN model was configured to recognize the positive and negative cases of COVID-19. Subsequently, a deep segmentation network (DeepLabV3+) was included to segment the infectious regions in a given CT image. Similarly, (Zheng et al., 2020) presented a multiscale discriminative network (MSD-Net) to segment multiclass lesions of different sizes. In a recent study, (Abdel-Basset et al., 2021) presented a novel segmentation network, FSS-2019-nCov, to alleviate the constraints of large-scale training datasets. FSS-2019-nCov contains a dual-path encoder-decoder design that mainly extracts high-level features without changing the channel information. A pre-trained residual network (ResNet34) was configured as the encoder. Later, (Selvaraj et al., 2021) developed a CAD framework based on the joint connectivity of a classification and a segmentation network, similar to (El-Bana et al., 2020). Additional handcrafted features (i.e., lesion texture and structure information) were also used to efficiently train both networks. Subsequently, further studies, including (Zhou et al., 2021), proposed segmentation-based CAD solutions for the effective detection of minor infectious regions caused by the COVID-19 virus in CT images. To deal with multi-plane CT data,
(Kesavan et al., 2021) applied a pre-trained Res-UNet model to identify COVID-19 related lesion regions in lung CT images with various 2D planes (such as axial, coronal, and sagittal orientations). In another study, (Munusamy et al., 2021) proposed a novel CNN model (FractalCovNet) for detecting COVID-19 infection from heterogeneous radiographic data (i.e., X-ray and CT images). The proposed model was configured to perform the following two tasks: 1) classifying lung X-ray images into COVID-19 positive and negative cases; and 2) recognizing COVID-19 related infectious regions in lung CT images. Subsequently, (Voulodimos et al., 2021) performed a comparative analysis of two well-known segmentation models (U-Net and fully convolutional networks (FCNs)) using CT data from COVID-19 patients. The comparative results indicate the following distinctive aspects of FCNs over U-Nets: 1) they achieve accurate segmentation despite the class imbalance in the dataset; and 2) they perform well even in the case of annotation errors on the boundaries of symptom manifestation areas. (Zheng et al., 2021) performed volumetric segmentation of whole 3D chest CT scans using an enhanced version of U-Net named 3D CU-Net. An attention mechanism was included in the encoder part of the proposed 3D CU-Net to obtain different levels of feature representation. Additionally, a pyramid fusion module with expanded convolutions was introduced at the end of the encoder to combine multiscale context information from high-level features. Similarly, (Zhao et al., 2021) proposed a dilated dual attention U-Net (D2A U-Net) for accurate detection of COVID-19 related lesion regions in chest CT images. The proposed D2A U-Net utilizes a dual attention strategy to refine feature maps and decrease the semantic gap between different levels of feature maps. Additionally, hybrid dilated convolutions are included in the decoder part to achieve larger receptive fields, which improves the decoding process. Finally, Table 1 presents a comparative summary of our proposed method and various existing methods to highlight the superior aspects and limitations of each study. For example, FractalCovNet (Munusamy et al., 2021) was evaluated on 473 images and detects COVID-19 cases using both chest X-ray and CT images but lacks an ablation study and cross-data analysis; U-Net and FCNs (Voulodimos et al., 2021) were evaluated on 939 images (10 patients) and overcome the effect of class imbalance and annotation errors but lack a comparison with state-of-the-art models and use a limited dataset; and the improved 3D CU-Net (Zheng et al., 2021) was evaluated on 5,569 images. A total of two openly available datasets, MosMed (Morozov et al., 2020) and COVID-19-CT-Seg (Jun et al., 2021; Ma et al., 2021), were selected to assess the performance of the proposed DMDF-Net and various baseline networks for a fair comparison. The infection masks were annotated by junior data annotators and validated by three medical professionals. Figure 2 presents a few CT images and their corresponding ground truths for both datasets. MATLAB (version R2020b), a well-known computing environment, was used to implement and simulate the proposed DMDF-Net and the other baseline models. All experiments were performed on a personal desktop computer with an Intel Core i7 CPU, an Nvidia GeForce GTX 1070 GPU, 16 GB of RAM, and the Windows 10 operating system. As shown in Figure 3, the proposed CAD framework consists of the following four stages: 1) a data pre-processing step; 2) a lung segmentation network (DMDF-Net-1); 3) an infection segmentation network (DMDF-Net-2); and 4) a post-processing step.
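Before detailing each stage, the following minimal sketch may help to fix the overall data flow (the paper's implementation is in MATLAB; this Python/NumPy sketch is purely illustrative, and `reinhard_transform`, `dmdf_net_1`, and `dmdf_net_2` are hypothetical callables standing in for the actual trained components):

```python
import numpy as np

def diagnose(ct_image, reinhard_transform, dmdf_net_1, dmdf_net_2):
    # Stage 1: pre-processing (color/contrast matched to the training data).
    enhanced = reinhard_transform(ct_image)

    # Stages 2-3: binary masks from the two independent segmentation networks.
    lung_mask = dmdf_net_1(enhanced) > 0.5       # 1 = lung region, 0 = background
    infection_mask = dmdf_net_2(enhanced) > 0.5  # 1 = infectious region

    # Stage 4: post-ROI fusion -- keep only infection pixels that fall inside
    # the lung ROI, which suppresses false positives outside the lungs.
    refined_infection = np.logical_and(infection_mask, lung_mask)

    # PIAL: proportion of the infected area of the lung (severity measure).
    pial = 100.0 * refined_infection.sum() / max(int(lung_mask.sum()), 1)
    return refined_infection, pial
```

The `max(..., 1)` guard simply avoids division by zero on slices where no lung tissue is detected.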
In the first stage, the color and contrast of the input CT image are adjusted according to the training dataset by applying a simple Reinhard transformation (RT) (Reinhard et al., 2001). Mathematically, each testing image x is transformed into an enhanced image x′ by applying the transformation x′ = T(x; μ), where T(·) represents the RT as a mapping function and μ is the mapping parameter that incorporates the color and contrast information of the training images. Subsequently, the second and third stages process the enhanced image x′ (obtained after pre-processing) using two independent DMDF-Nets and generate the segmented images of the lung region of interest (ROI) and the infectious regions, respectively. The proposed DMDF-Nets (named DMDF-Net-1 and DMDF-Net-2 in Figure 3) perform semantic segmentation and classify each pixel of the input CT image as either black '0' or white '1'. In the output of DMDF-Net-1, the white '1' pixels represent the "lung region" and the black '0' pixels correspond to the "background." Similarly, the output of DMDF-Net-2 presents the "infectious region" and the "normal/background region" as white '1' and black '0' pixels, respectively. Finally, the post-processing stage further refines the output of DMDF-Net-2 and generates the final output by performing post-ROI fusion of the outputs of both networks, as shown in Figure 3. The final output provides well-localized information about the infectious regions inside the lung lobes, which can be further used for the severity assessment of COVID-19 infection. The addition of the post-processing stage reduces the number of false-positive pixels in the output of DMDF-Net-2 and further provides a way to accurately quantify the severity of COVID-19 infection in terms of the PIAL score. The PIAL score is calculated by dividing the area of the infected region (i.e., the total number of red pixels in the final output image) by the total area of the lung lobes (i.e., the total number of red and green pixels in the final output image), as shown in Figure 3. The subsequent sections present the detailed design, workflow, and selected training loss of the proposed DMDF-Net. The architecture of our proposed DMDF-Net is designed to meet the following objectives: 1) efficient memory consumption; 2) a low number of trainable parameters; and 3) minimal degradation of segmentation performance. To accomplish these milestones, we primarily utilize the strengths of grouped-convolutional (G-Conv) and dilated convolutional (D-Conv) layers to develop the overall structure of the proposed network. The use of G-Conv layers results in efficient memory consumption and fast processing speed owing to the decreased number of learnable parameters (Heaton, 2015). In detail, a conventional convolutional layer (Heaton, 2015) processes an input tensor X ∈ ℛ^(w×h×c) and generates an output tensor Y ∈ ℛ^(w×h×c′) by employing a kernel of size k×k×c×c′. The entire process requires a total processing cost of w×h×c×k×k×c′ (Heaton, 2015). However, a G-Conv layer requires a total processing cost of w×h×c×(k²+c′) for a similar operation and thus reduces the processing cost by a factor of approximately k². In our network design, most of the G-Conv layers contain a kernel size of 3×3 (i.e., k = 3). Consequently, the average processing cost of a G-Conv layer is approximately eight to nine times lower than that of a conventional convolutional layer.
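As a worked check of this claim (taking c′ = 256 as an illustrative output depth; actual layer widths in the network vary), the cost ratio between a conventional convolutional layer and a G-Conv layer is

(w×h×c×k²×c′) / (w×h×c×(k²+c′)) = k²c′ / (k² + c′) = (9 × 256) / (9 + 256) ≈ 8.7,

which matches the stated eight- to nine-fold saving for k = 3.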
Additionally, the D-Conv layers result in better segmentation performance owing to their ability to exploit multiscale deep features without substantially affecting the computational cost. The complete layer-wise design of the proposed DMDF-Net is shown in Figure 4. The network design mainly comprises an encoder part followed by a decoder module. The encoder exploits the multiscale deep features of the given image and represents them as a 3D tensor that encodes the main features. Subsequently, the decoder module upsamples this 3D tensor (the encoder output) and generates a binary image as the final output. The following subsections provide a detailed explanation of the encoder/decoder structure and workflow. To achieve efficient memory utilization and a low number of trainable parameters, we used the basic structural units of MobileNetV2 (Sandler et al., 2018) (labeled as A-Block and B-Block in Figure 4 and Table 2) to develop an efficient encoder design. In addition, a set of four multiscale D-Conv layers (labeled as C-Block in Figure 4 and Table 2) was included to exploit and fuse a more diversified representation of the input image. The encoder structure includes a total of four A-Blocks, three B-Blocks, one C-Block, and several other layers, as indicated in Figure 4. Both the A- and B-Blocks consist of the following three layers. 1) Expansion layer: a 1×1 convolutional layer that expands the depth of the input tensor T ∈ ℛ^(w×h×c) by a factor of 6 and generates an output tensor T_e ∈ ℛ^(w×h×6c). 2) Feature extraction layer: a 3×3 G-Conv layer that exploits the deep features of T_e and produces an output in ℛ^(w×h×6c), or in ℛ^(w/2×h/2×6c) if the stride is 2. 3) Projection layer: a 1×1 convolutional layer that reduces the depth by a factor of 6 and generates the final output tensor in ℛ^(w×h×c′) or ℛ^(w/2×h/2×c′) (depending on the stride value in the preceding layer). In addition, a residual connection is included in the B-Block, which differentiates it from the A-Block and prevents the vanishing gradient problem during the training process (Sandler et al., 2018). Subsequently, the C-Block mainly comprises four parallel D-Conv layers with dilation rate (DR) factors of 1, 6, 12, and 18 (one per layer). For effective computation, each D-Conv layer is followed by a projection layer (a 1×1 convolutional layer) that projects the depth of its output tensor from 320 to 256 channels. To exploit multiscale features, the four D-Conv layers process the input tensor T ∈ ℛ^(w×h×320) and generate four output tensors T_1, T_2, T_3, and T_4. Consequently, four projection layers further reduce the depth of these intermediate outputs and generate new output tensors T′_1, T′_2, T′_3, and T′_4. Ultimately, a depth concatenation layer performs multiscale deep features fusion by combining these four output tensors and provides the final output tensor T_o ∈ ℛ^(w×h×1024). Mathematically, an input tensor T undergoes the following transformations after passing through these structural blocks:

f_A(T) = conv(gconv(conv(T))), f_B(T) = T + conv(gconv(conv(T))) (1)

f_C(T) = concat(conv(dconv_1(T)), conv(dconv_6(T)), conv(dconv_12(T)), conv(dconv_18(T))) (2)

where f_A(·), f_B(·), and f_C(·) represent the operations of the A-, B-, and C-Blocks as transfer functions, respectively. In Eqs. (1) and (2), conv(·), gconv(·), and dconv_DR(·) denote the standard, grouped, and dilated convolution operations, respectively. For DR = 1, the dilated convolution dconv_1(·) performs similarly to the standard convolution conv(·). (In Table 2, * indicates that the output tensors of these layers are fed to the depth concatenation layer; ** denotes the dilation rate (DR); #Par. denotes the total number of parameters; and '-' means not applicable.) The complete configuration and parametric details of the encoder module are listed in Table 2.
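To make the C-Block concrete, the following is a minimal PyTorch sketch of the structure described above (an illustrative reconstruction, not the authors' MATLAB code; the 320 input channels and 256-channel projections follow the text, while the batch-normalization and ReLU placement inside each branch is an assumption):

```python
import torch
import torch.nn as nn

class CBlock(nn.Module):
    """Multiscale dilated fusion block: four parallel 3x3 dilated convolutions
    (rates 1, 6, 12, 18), each followed by a 1x1 projection layer, fused by
    depth concatenation."""
    def __init__(self, in_ch: int = 320, proj_ch: int = 256):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # For a 3x3 kernel, padding equal to the dilation rate
                # preserves the spatial size of the input tensor.
                nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=rate, dilation=rate),
                nn.BatchNorm2d(in_ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_ch, proj_ch, kernel_size=1),  # projection: 320 -> 256
            )
            for rate in (1, 6, 12, 18)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # e.g., x: (N, 320, 18, 22) -> fused output: (N, 1024, 18, 22)
        return torch.cat([branch(x) for branch in self.branches], dim=1)
```

For an 18×22×320 encoder tensor, this block yields the 18×22×1024 fused representation described above.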
Initially, the input image (obtained after pre-processing) is processed through a stack of multiple layers (including convolutional, BN, and ReLU layers) and transformed into a 3D tensor of size 18×22×256. In detail, the first 3×3 convolutional layer (labeled Conv 1 in Table 2) explores the input image in both the horizontal and vertical directions and converts it into an output tensor of size 144×176×32. Subsequently, the second and third convolutional layers (labeled G-Conv 1 and Conv 2 in Table 2) process the output of the previous layer and transform the output of Conv 1 into a tensor of size 144×176×16. Consequently, a stack of seven structural blocks (labeled A-Blocks 1-4 and B-Blocks 1-3 in Table 2) consecutively processes the output of the previous layer/block to obtain a more diverse representation of the input image as a high-level abstraction. Eventually, these seven structural blocks convert the output tensor of Conv 2 into an output of size 18×22×320. Additionally, C-Block 1 applies the strength of multiscale dilated convolution, further explores the output of A-Block 4 at four different scales (with DR factors of 1, 6, 12, and 18), and provides diversified multiscale feature maps of size 18×22×1024 after performing multiscale deep features fusion. For efficient computation on the decoder side, a projection layer (labeled Conv 3 in Table 2) further transforms these high-level features (the output of C-Block 1) into a low-dimensional space. In detail, the Conv 3 layer reduces the depth of the C-Block 1 output by a factor of 4 and gives a final output tensor of size 18×22×256, which contains diverse semantic information. The decoder part of the proposed DMDF-Net mainly includes two transposed convolutional (TP-Conv) layers, one C-Block (labeled C-Block 2 in Figure 4 and Table 2), and several other layers (named Conv, G-Conv, softmax, and pixel classification in Table 2). Our main contribution in the decoder part is the addition of the multiscale D-Conv layers (labeled C-Block 2 in Figure 4 and Table 2), which perform multiscale deep features fusion during decoding. The detailed layer-wise configuration of the decoder module is presented in Table 2. Initially, an 8×8 TP-Conv layer (labeled TP-Conv 1 in Table 2) bilinearly upsamples the final encoder output (the tensor of size 18×22×256 after the Conv 3 layer) by an upsampling factor of 4 and generates an upsampled tensor of size 72×88×256. Subsequently, a depth concatenation layer combines a residual tensor of size 72×88×48 (obtained from B-Block 1 and further processed by Conv 4) with the output tensor of TP-Conv 1 and gives a concatenated tensor of size 72×88×304. Furthermore, a total of four convolutional layers (labeled G-Conv 2, Conv 5, G-Conv 3, and Conv 6 in Table 2) further transform the output of the previous layer into a new output tensor of size 72×88×320. Consequently, C-Block 2 applies the strength of multiscale dilated convolution, further processes the output of the preceding layer (Conv 6) at four different scales (with DR factors of 1, 6, 12, and 18), and provides diversified multiscale feature maps of size 72×88×1024 after performing multiscale deep features fusion. These feature maps are further projected into a low-dimensional space, from 72×88×1024 to 72×88×2, by passing through two convolutional layers (labeled Conv 7 and Conv 8 in Table 2). Next, a second 8×8 TP-Conv layer (labeled TP-Conv 2 in Table 2) further upsamples the output of Conv 8 by an upsampling factor of 4 and gives a final output tensor of size 288×352×2.
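The decoder walkthrough above can be summarized in a shape-level PyTorch sketch (again an illustrative reconstruction rather than the authors' MATLAB implementation; the 24-channel width of the B-Block 1 skip tensor is an assumption based on the MobileNetV2 design, and several intermediate layers are collapsed into single stand-ins):

```python
import torch
import torch.nn as nn

class DecoderSketch(nn.Module):
    """Shape-level sketch of the decoder: TP-Conv 1 (x4 upsampling), skip
    concatenation, C-Block 2 fusion, projection to 2 classes, and TP-Conv 2
    (x4 upsampling). Channel sizes follow the walkthrough in the text."""
    def __init__(self, cblock2: nn.Module):
        super().__init__()
        # 8x8 transposed conv, stride 4, padding 2: 18x22x256 -> 72x88x256.
        self.tp_conv1 = nn.ConvTranspose2d(256, 256, kernel_size=8, stride=4, padding=2)
        self.conv4 = nn.Conv2d(24, 48, kernel_size=1)  # skip projection (Conv 4)
        # Stand-in for G-Conv 2, Conv 5, G-Conv 3, and Conv 6: 304 -> 320 channels.
        self.mix = nn.Conv2d(304, 320, kernel_size=3, padding=1)
        self.cblock2 = cblock2                            # 320 -> 1024 (multiscale fusion)
        self.project = nn.Conv2d(1024, 2, kernel_size=1)  # stand-in for Conv 7 and Conv 8
        # 8x8 transposed conv, stride 4, padding 2: 72x88x2 -> 288x352x2.
        self.tp_conv2 = nn.ConvTranspose2d(2, 2, kernel_size=8, stride=4, padding=2)

    def forward(self, encoded: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        up = self.tp_conv1(encoded)                       # (N, 256, 72, 88)
        fused = torch.cat([up, self.conv4(skip)], dim=1)  # (N, 304, 72, 88)
        feats = self.cblock2(self.mix(fused))             # (N, 1024, 72, 88)
        return self.tp_conv2(self.project(feats))         # (N, 2, 288, 352)
```

With cblock2 = CBlock(320, 256) from the sketch above, a (1, 256, 18, 22) encoder output and a (1, 24, 72, 88) skip tensor yield the expected (1, 2, 288, 352) logits.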
Finally, a pixel classification layer in conjunction with the softmax layer generates the pixel-wise prediction for the given input image as the final output of our model. The softmax layer applies a softmax function (Heaton, 2015) that transforms the output of the TP-Conv 2 layer into probabilities. Subsequently, the pixel classification layer assigns a class label (either black '0' or white '1') to each pixel of the input CT image and generates a binary image as the final output. To better segment minor lesion regions, a balanced cross-entropy (BCE) loss was selected for training the proposed DMDF-Net. BCE shows better performance than the conventional cross-entropy (CE) loss (Heaton, 2015; Jadon, 2020), mainly in the case of small segmentation objects or lesion regions (Li et al., 2021; Ni et al., 2020; Roth et al., 2021). Additionally, we took advantage of transfer learning (Krizhevsky et al., 2012) to perform timely and efficient training of the proposed network. The basic structural units of MobileNetV2 (labeled as A-Block and B-Block in Figure 4) were used to develop the encoder design. Therefore, the initial training parameters of our encoder module (backbone network) were obtained from the MobileNetV2 network, which was originally trained on the ImageNet dataset (Deng et al., 2009) using the conventional CE loss function (Heaton, 2015; Jadon, 2020) and the stochastic gradient descent (SGD) optimization method (Li et al., 2018). Accordingly, a related variant of the conventional CE loss, named BCE, was selected to sufficiently train the proposed DMDF-Net for the target domain. The mathematical interpretation of our selected BCE loss function is given as follows:

L_BCE(θ) = −(1/N) Σ_{i=1..N} Σ_p [ β ĝ_{i,p} log f(x_i; θ)_p + (1 − β)(1 − ĝ_{i,p}) log(1 − f(x_i; θ)_p) ]

where x_i and ĝ_i are a training data sample and its ground-truth mask, respectively. Subsequently, f(·), N, and θ represent the proposed DMDF-Net as a transfer function, the total number of data samples, and the initial training parameters, respectively. Finally, β is the class balancing factor between black '0' and white '1' pixels and is calculated as the fraction of dominant pixels (i.e., black '0' pixels) in the entire training dataset (Jadon, 2020). In this section, we present a detailed explanation of the training, validation, and quantitative results of our method, including a detailed ablation study. Finally, we compare the performance of the proposed DMDF-Net (including both DMDF-Net-1 and DMDF-Net-2) with various state-of-the-art methods. Based on existing studies (Kandel and Castelli, 2020; Prabowo and Herwanto, 2019), an SGD optimizer with a small learning rate factor of 0.001 was selected to efficiently train the proposed model. Generally, a small learning rate can reach the global minimum; however, it requires many epochs to sufficiently train a segmentation network (Johnson and Zhang, 2013). Conversely, a large learning rate can skip the global minimum (Johnson and Zhang, 2013). Therefore, a small learning rate factor of 0.001 was selected to achieve optimal convergence of the proposed DMDF-Net. Additionally, we used the default settings provided by MATLAB R2020b for the other hyperparameters. The overall training procedure of the proposed DMDF-Net is given in Algorithm 1 as pseudo-code. In the MosMed dataset, the ground truths for lung segmentation are not given; therefore, we did not perform cross-validation in Exp#2 (i.e., using MosMed in training and COVID-19-CT-Seg in testing).
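For illustration, the balanced loss above can be written compactly as follows (a PyTorch sketch under the stated definitions; the paper's implementation uses MATLAB, and the tensor layout here is assumed):

```python
import torch

def balanced_cross_entropy(pred: torch.Tensor, target: torch.Tensor,
                           beta: float) -> torch.Tensor:
    """Balanced cross-entropy: `pred` holds foreground probabilities in (0, 1),
    `target` is the binary ground-truth mask, and `beta` is the fraction of
    dominant (black '0') pixels in the training set, so the rare foreground
    class receives the larger weight."""
    eps = 1e-7  # numerical guard against log(0)
    pred = pred.clamp(eps, 1.0 - eps)
    loss = -(beta * target * torch.log(pred)
             + (1.0 - beta) * (1.0 - target) * torch.log(1.0 - pred))
    return loss.mean()

# Optimizer settings from the text: SGD with a learning rate of 0.001
# (other hyperparameters were left at the MATLAB R2020b defaults).
# optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
```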
Figure 5 shows the training/validation accuracies and losses of the proposed DMDF-Net for lung segmentation (Exp#1) and COVID-19 infection segmentation (Exp#2). To avoid overfitting, we included independent validation datasets in the training procedure for both Exp#1 and Exp#2. Consequently, we selected the best models based on the maximum validation accuracies for Exp#1 and Exp#2; thus, the training of both DMDF-Net-1 and DMDF-Net-2 was stopped at the point of maximum validation accuracy. Our proposed framework mainly includes a lung segmentation network (DMDF-Net-1) as its first network. Among the significant performance gains of our model, the segmentation results on the "both lung" dataset are higher than those on the "left lung" and "right lung" datasets. This performance difference occurs because of the similar shape and texture patterns of the two lung lobes in the "left lung" and "right lung" datasets. Therefore, in the case of the "left lung" and "right lung" datasets, it was more challenging for a CNN model to distinguish the left and right lung lobes, which have similar shape and texture patterns. However, in the case of the "both lung" dataset, segmenting both lung lobes together was comparatively simple for a CNN model. Consequently, the performance of our model on the "both lung" dataset is higher than the individual results on the "left lung" and "right lung" datasets. The encoder design (backbone network) of the proposed DMDF-Net includes the basic structural units of MobileNetV2. Therefore, we also compared the performance of the proposed encoder design with that of the original MobileNetV2 as a backbone network for lung segmentation (Exp#1). The ablation results show that transfer learning, pre-processing, post-processing, and multiscale deep features fusion applying multiscale dilated convolution (C-Blocks) work in a mutually beneficial way to enhance the overall performance of the proposed diagnostic framework. Additionally, Figure 6 shows the visual outputs of the proposed framework with and without the pre- and post-processing stages for COVID-19 infection segmentation (Exp#2). It can be observed (Figure 6) that both the pre- and post-processing stages mutually contribute to reducing the number of false-positive and false-negative pixels and to correctly segmenting the lesion regions in a given CT image. Figure 6. Visual output results of the proposed framework with and without the pre- and post-processing stages for COVID-19 infection segmentation (Exp#2). We also compared the performance of the proposed encoder design (in DMDF-Net-2) with that of the original MobileNetV2 as a backbone network for COVID-19 infection segmentation (Exp#2). Moreover, the selected BCE loss addressed the class imbalance problem in our selected dataset and resulted in additional performance gains for the COVID-19 infection segmentation task (Exp#2). In the post-processing step for Exp#2, the lung ROI (the output of DMDF-Net-1) was applied over the output of DMDF-Net-2 to reduce the number of false-positive pixels, which is referred to as post-ROI fusion in this work. To highlight the significance of our post-ROI fusion-based post-processing step, we also evaluated the performance of its counterpart, which is named pre-ROI fusion. In this pre-ROI fusion-based post-processing step, the lung ROI mask (the output of DMDF-Net-1) was applied over the input CT image to obtain the lung ROI image. The lung ROI image was then further processed by DMDF-Net-2 to segment the infection regions; the two fusion orders are contrasted in the sketch below.
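The two fusion orders differ only in where the lung mask is applied. A minimal NumPy sketch (with hypothetical net1/net2 callables returning per-pixel foreground probabilities) makes the contrast explicit:

```python
import numpy as np

def post_roi_fusion(ct, net1, net2, thr=0.5):
    """Proposed order: segment infection on the full image, then intersect
    with the lung ROI, suppressing false positives outside the lungs."""
    lung = net1(ct) > thr
    infection = net2(ct) > thr
    return np.logical_and(infection, lung)

def pre_roi_fusion(ct, net1, net2, thr=0.5):
    """Counterpart order: mask the input with the lung ROI first, then
    segment infection on the masked image."""
    lung = net1(ct) > thr
    return net2(ct * lung) > thr
```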
Table 7 shows the quantitative performance comparison of pre-ROI fusion versus post-ROI fusion with and without applying the pre-processing step for Exp#2. After training the proposed DMDF-Net-2 through transfer learning (without including the data pre-processing stage), post-ROI fusion outperforms pre-ROI fusion (Table 7). Similar disparities can be observed in Table 7 after training the proposed DMDF-Net-2 from scratch (i.e., without transfer learning). Performance differences of our DMDF-Net-2 with no pre- and post-processing (the 1st and 7th rows of Table 7), only pre-processing (the 2nd and 8th rows of Table 7), and combined pre- and post-processing (the 6th and 12th rows of Table 7) versus only post-processing (the 4th and 10th rows of Table 7) can be observed in Table 7. As shown in this table, we confirm that combined pre- and post-processing, only pre-processing, only post-processing, and no pre- and post-processing show the first through fourth highest accuracies, respectively. Subsequently, Figure 7 presents the visual output results of pre-ROI fusion versus post-ROI fusion (for Exp#2) after applying the pre-processing step and training DMDF-Net-2 through transfer learning. It can be observed (Figure 7) that the post-ROI fusion-based post-processing step effectively contributes to reducing the number of false-positive and false-negative pixels and to correctly segmenting the lesion regions in a given CT image. We further analyzed the effect of the pre- and post-processing stages on the same dataset (COVID-19-CT-Seg) to show the comparative results for two different data distributions. In this experiment, we evaluated the average performance of our proposed DMDF-Net-2 on the COVID-19-CT-Seg dataset by performing five-fold cross-validation. Table 8 shows these comparative results with and without applying the pre- and post-processing steps. It can be observed (Table 8) that the addition of the pre- and post-processing stages gave marginal gains of 0.48%, 0.5%, 0.55%, and 0.06% in average DICE, IoU, AP, and SPE, respectively, in the case of the same dataset (COVID-19-CT-Seg). In comparison, the effect of the pre- and post-processing stages is significantly higher in the case of the cross-dataset (MosMed), with average gains of 2.79%, 2.44%, 1.54%, and 11.71% in average DICE, IoU, AP, and SEN, respectively. These comparative results (Table 8) show the significant contribution of the pre- and post-processing stages in the cross-dataset setting (with different data distributions) and validate the generality of our proposed solution. Our first dataset, MosMed, comprises a total of 2,049 images (785 infected and 1,264 non-infected slices). The second dataset, COVID-19-CT-Seg, includes a total of 3,520 images (1,843 infected and 1,677 non-infected slices). Owing to the small number of data samples, the ratio of infected versus non-infected slices influenced the training of our proposed model by causing an under-fitting problem. To address this problem, we utilized the strength of transfer learning (Krizhevsky et al., 2012) to perform timely and efficient training of our model using a small dataset such as COVID-19-CT-Seg in all the experiments. Tables 3 and 5 show the significant gains of transfer learning in Exp#1 and Exp#2, respectively. In addition, we also observed a class imbalance problem (particularly in the case of the MosMed dataset) owing to the small proportion of infected lung regions in each infected slice, which ultimately resulted in poor testing results (Table 6).
This problem was further addressed by using the BCE loss in all the experiments. Tables 4 and 6 show the comparative performance gains of our BCE loss over the conventional CE loss function in Exp#1 and Exp#2, respectively. In this section, we perform a detailed comparative analysis of the proposed method against state-of-the-art segmentation networks proposed for COVID-19 (Ma et al., 2021) and for the general image segmentation domain (Chen et al., 2018; Long et al., 2015; Ronneberger et al., 2015; Sandler et al., 2018). The proposed diagnostic framework includes DMDF-Net-1 and DMDF-Net-2 to extract the lung ROI (Exp#1) and to segment the infectious regions (Exp#2) in a given CT image, respectively. Consequently, the performance of DMDF-Net-1 and DMDF-Net-2 is compared separately with the different baseline models. In (Ma et al., 2021), the authors used the same datasets as those selected in our method; therefore, we directly compared our method with the results given in (Ma et al., 2021) for both lung segmentation (Exp#1) and COVID-19 infection segmentation (Exp#2). In the case of the other segmentation networks proposed for general image segmentation applications (Chen et al., 2018; Long et al., 2015; Ronneberger et al., 2015; Sandler et al., 2018), a direct comparison was not possible. Therefore, to make a fair comparison, we evaluated the segmentation results of these models using the same datasets as those selected in this study. These baseline models include U-Net (Ronneberger et al., 2015), DeepLabV3+ (based on ResNet) (Chen et al., 2018), MobileNetV2 (Sandler et al., 2018), SegNet (based on VGG16 and VGG19), and FCNs (Long et al., 2015). Table 9 shows the comparative results of the proposed DMDF-Net-1 and all these baseline models for the lung segmentation task (Exp#1). It can be observed (Table 9) that the proposed DMDF-Net-1 shows superior results (in terms of average DICE and IoU scores) with a lower number of training parameters compared to the other models. DAL-Net (Owais et al., 2021) was ranked as the second-best network based on the second-highest DICE and IoU scores among all the baseline methods. The proposed DMDF-Net-1 outperforms this second-best network, yielding average gains in DICE, IoU, AP, SEN, SPE, E_φ, and MAE (Table 9); these results are the average gains over the "left lung", "right lung", and "both lung" datasets. Additionally, in a t-test analysis (proposed versus DAL-Net), we obtain an average p-value of less than 0.05 (specifically, a p-value of 0.013), which distinguishes our model from DAL-Net with a 95% confidence score. In addition, the number of training parameters of the proposed network is lower than that of DAL-Net. To be specific, our DMDF-Net-1 includes approximately 0.8 million fewer parameters than (Owais et al., 2021) (i.e., 5.85 million [proposed] vs. 6.65 million). Figure 8 shows the final segmentation results of the proposed lung segmentation model (DMDF-Net-1) versus different state-of-the-art segmentation networks (Chen et al., 2018; Long et al., 2015; Ronneberger et al., 2015; Sandler et al., 2018). It can be observed in Figure 8 that our proposed DMDF-Net-1 and the method of (Sandler et al., 2018) show comparable visual results and outperform the other three baseline methods (e.g., Long et al., 2015; Ronneberger et al., 2015). However, the overall quantitative performance of our model remains the highest (Table 9). Furthermore, Table 10 shows the comparative results of the proposed DMDF-Net-2 versus the different baseline models for the infection segmentation task (Exp#2).
It can be observed from Table 10 that the proposed DMDF-Net-2 also provides superior performance with a lower number of training parameters compared to the other methods. In contrast, DeepLabV3+ (MobileNetV2) is ranked as the second-best network among the other networks. With the addition of only the pre-processing step, the proposed DMDF-Net-2 outperforms this second-best network. In a t-test analysis (proposed versus DeepLabV3+ (MobileNetV2)), we obtained an average p-value of less than 0.01 (specifically, a p-value of 0.0072), which distinguishes our model from the second-best network with a 99% confidence score. Similarly, the number of training parameters of the proposed network is also lower. To be specific, our DMDF-Net-2 includes approximately 1.9 million fewer parameters (i.e., 11.7 million [proposed] vs. 13.56 million). Consequently, these results (Tables 9 and 10) highlight the superior performance of our model. Table 10 also includes the FLOPs and execution speed of our DMDF-Net-2 and the other baseline models. After including both the pre- and post-processing steps, our final infection segmentation framework (DMDF-Net-1 and DMDF-Net-2) requires 74.9 Giga FLOPs and has an average execution speed of 7.35 frames per second. Figure 9 presents the visual comparative results of our proposed framework against the other state-of-the-art deep segmentation models. Figure 9a presents the comparative results without the pre- and post-processing stages. Figure 9b shows the visual outputs of all the methods with only the pre-processing stage. Finally, Figure 9c visualizes the comparative performance with both the pre- and post-processing stages (applying the same lung segmentation network). It can be observed (Figure 9) that the proposed network generates well-localized segmentation outputs for the input CT images. However, several reference models (Chen et al., 2018; Long et al., 2015; Ronneberger et al., 2015) generate inadequate segmentation results, which are marked as false-positive (i.e., normal regions incorrectly recognized as infectious regions) and false-negative (i.e., infected regions not recognized) pixels in Figure 9. Nevertheless, the stronger baselines showed better performance than (Chen et al., 2018; Long et al., 2015; Ronneberger et al., 2015). However, the average segmentation results (Tables 9 and 10) show a higher performance of our method than that of these baselines. Primarily, the superior performance of our method is attained through the addition of C-Blocks in both the encoder and decoder modules, which exploit diverse representations of lung/lesion patterns from the given data by performing multiscale deep features fusion. Moreover, a residual connection (extracted from B-Block 1 of the encoder module) further contributes low-level contextual information to the decoding part to refine the edge information of the final output. This section describes the distinctive aspects of the proposed method, along with possible limitations that can influence the diagnostic performance of our system. Finally, it includes a brief roadmap for future work to address these constraints and improve the overall performance. This study leveraged the strengths of recent deep learning techniques in chest CT image analysis to identify lung lesions associated with COVID-19 infection. The proposed framework mainly includes a lung segmentation network (DMDF-Net-1) and an infection segmentation network (DMDF-Net-2) to extract the lung ROIs and infected areas from a CT image.
The output of DMDF-Net-1 is mainly used in a later post-processing step to improve the infection segmentation results of DMDF-Net-2 and to provide a quantitative evaluation of the infected area in the CT image. Accurate detection and quantification of infected lung regions are essential for measuring infection severity in individual lung lobes and for finding suitable personalized treatments. Figure 10 presents the infection quantification results of our proposed diagnostic framework for some typical CT images, including both positive (Figure 10a) and negative (Figure 10b) data samples. Additionally, the intermediate outputs (in Figure 10, after pre-processing, DMDF-Net-1, DMDF-Net-2, and post-processing) further illustrate the diagnostic workflow of the proposed framework. In Figure 10, the PIAL score represents the quantification of the infectious regions in each CT image and is calculated by dividing the area of the infected region by the total area of the lung lobes (i.e., PIAL = 100 × (infected lung area / total lung area)). In terms of processing speed, our infection segmentation network alone (DMDF-Net-2 in Table 10) processes 20.41 frames per second. The average execution time (in terms of the number of frames processed per second) was determined using the computing environment described in Section 2.1. Consequently, the optimal design of our model achieves state-of-the-art performance and utilizes low-cost hardware resources without compromising the overall diagnostic performance. To visualize the internal workflow of the proposed DMDF-Net, we also show the multiscale class activation maps (CAMs) (Zhou et al., 2016) extracted from five different layers (labeled Conv 2, B-Block 1, B-Block 2, Conv 3, and Conv 8 in Table 2) of the network. The input image of size 288×352 is downsampled into four spatial sizes (i.e., 144×176, 72×88, 36×44, and 18×22) while passing through the encoder module. Subsequently, the encoded output of size 18×22 is upsampled into two spatial sizes (i.e., 72×88 and 288×352) while passing through the decoder module. The decoded output of size 288×352 is the final output of the proposed network. Therefore, a total of five layers were selected for multiscale CAM visualization based on the distinctive spatial sizes of their outputs inside the encoder and decoder modules. Figure 11 shows the multiscale CAM visualization of the proposed DMDF-Net-1 (lung segmentation network) and DMDF-Net-2 (infection segmentation network) for testing CT images. It can be observed (Figure 11) that the class-specific regions (lung ROIs or infectious regions) become increasingly discriminative after passing through successive layers. Finally, a binary image is obtained as the final output, which presents the "lung/infectious region" and the "normal/background region" as white '1' and black '0' pixels, respectively. Although several online datasets related to COVID-19 are available, most are intended for the classification problem. A few segmentation datasets related to COVID-19 are publicly available, but they only include the segmentation masks for infectious regions. This study presents a CAD framework for automatic detection and quantification of COVID-19 related findings in lung CT scans. The proposed method includes an additional post-processing step that also requires a lung segmentation mask for accurate segmentation and quantification of the infected lung areas. Therefore, we selected the COVID-19-CT-Seg dataset, which includes the ground truths for both the lung and infection regions of each slice. Secondly, we aimed to develop a CAD tool that efficiently segments trivial infected regions in the lung.
Therefore, we selected the MosMed dataset, which includes trivial lung tissue abnormalities related to COVID-19 (pulmonary parenchymal involvement ≤ 25%) (Morozov et al., 2020). In our future work, we will explore additional segmentation datasets related to the detection and quantification of COVID-19 related findings and develop a more efficient CAD solution. Despite the promising results of the proposed method compared to existing methods, the current research still has some limitations. First, the cross-dataset performance is still limited and can be further improved. Therefore, in future work, we will strive to increase the cross-data performance of the method, including multi-source CT data. Second, the proposed network can only segment lesions associated with COVID-19. In future work, we will collect more datasets, including those for multiple diseases, and propose a new CAD method that can detect and distinguish between COVID-19 and different types of diseases, such as other viral and bacterial infections. In this study, we proposed a fully automated CAD framework for the effective recognition and quantification of COVID-19 related findings in chest CT images. We mainly proposed a deep segmentation network (named DMDF-Net) complemented by additional pre- and post-processing steps for the accurate segmentation of infectious regions in CT images. The pre-processing step was included to address the generality issues arising in real-world scenarios. The post-processing step generates a well-localized ROI of the infectious regions and further provides the quantification of the lesion regions in terms of the PIAL score. In detail, our designed network utilizes the strength of grouped convolution and multiscale deep features fusion using multiscale dilated convolution to achieve better segmentation results with a reduced number of learnable parameters (specifically, 5.85 million). The optimal size of our model utilizes low-cost hardware resources and provides effective diagnostic results.

References
- FSS-2019-nCov: A deep learning architecture for semi-supervised few-shot segmentation of COVID-19 infection. Knowledge-Based Syst.
- Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (COVID-19) in China: A report of 1014 cases.
- SegNet: A deep convolutional encoder-decoder architecture for image segmentation.
- Encoder-decoder with atrous separable convolution for semantic image segmentation.
- ImageNet: A large-scale hierarchical image database.
- A multi-task pipeline with specialized streams for classification and segmentation of infection manifestations in COVID-19 scans.
- Enhanced-alignment measure for binary foreground map evaluation.
- Inf-Net: Automatic COVID-19 lung infection segmentation from CT images.
- Cognitive vision inspired object segmentation metric and loss function.
- Sensitivity of chest CT for COVID-19: Comparison to RT-PCR.
- Artificial Intelligence for Humans: Deep learning and neural networks.
- Early neoplasia identification in Barrett's esophagus via attentive hierarchical aggregation and self-distillation.
- A survey of loss functions for semantic segmentation.
- COVID-19 CT image synthesis with a conditional generative adversarial network.
- Accelerating stochastic gradient descent using predictive variance reduction.
- COVID-19 CT lung and infection segmentation dataset (Version 1.0) [Data set].
- The effect of batch size on the generalizability of the convolutional neural networks on a histopathology dataset.
- Res-UNet supported segmentation and evaluation of COVID-19 lesion in lung CT.
- Looking beyond COVID-19 vaccine phase 3 trials.
- ImageNet classification with deep convolutional neural networks.
- Transformation-consistent self-ensembling model for semi-supervised medical image segmentation.
- Preconditioned stochastic gradient descent.
- Efficient and effective training of COVID-19 classification networks with self-supervised dual-track learning to rank.
- Fully convolutional networks for semantic segmentation.
- Towards data-efficient learning: A benchmark for COVID-19 CT lung and infection segmentation.
- How to evaluate foreground maps.
- Brain graph super-resolution for boosting neurological disorder diagnosis using unsupervised multi-topology connectional brain template learning.
- Deep-COVID: Predicting COVID-19 from chest X-ray images using deep transfer learning.
- MosMedData: Chest CT scans with COVID-19 related findings dataset.
- FractalCovNet architecture for COVID-19 chest X-ray image classification and CT-scan image segmentation.
- GC-Net: Global context network for medical image segmentation.
- Automatic COVID-19 lung infected region segmentation and measurement using CT-scans images.
- Domain-adaptive artificial intelligence-based model for personalized diagnosis of trivial lesions related to COVID-19 in chest computed tomography scans.
- Light-weighted ensemble network with multilevel activation visualization for robust diagnosis of COVID-19 pneumonia from large-scale chest radiographic database.
- Duplicate question detection in question answer website using convolutional neural network.
- Automatic segmentation of COVID-19 pneumonia lesions and its classification from CT images: A survey.
- Color transfer between images.
- U-Net: Convolutional networks for biomedical image segmentation.
- An application of cascaded 3D fully convolutional networks for medical image segmentation.
- MobileNetV2: Inverted residuals and linear bottlenecks.
- An integrated feature framework for automated segmentation of COVID-19 infection from lung CT images.
- Deep learning models for COVID-19 infected area segmentation in CT images.
- World Health Organization, WHO Coronavirus Disease (COVID-19) Dashboard.
- MSCS-DeepLN: Evaluating lung nodule malignancy using multi-scale cost-sensitive neural networks.
- Relationship of chest CT score with clinical characteristics of 108 patients hospitalized with COVID-19 in Wuhan.
- CoSinGAN: Learning COVID-19 infection segmentation from a single radiological image.
- D2A U-Net: Automatic segmentation of COVID-19 CT slices based on dual attention and hybrid dilated convolution.
- Image segmentation evaluation: A survey of methods.
- MSD-Net: Multi-scale discriminative network for COVID-19 lung infection segmentation on CT.
- Improved 3D U-Net for COVID-19 chest CT image segmentation.
- Learning deep features for discriminative localization.
- Automatic COVID-19 CT segmentation using U-Net integrated spatial and channel attention mechanism.

The authors declare no conflict of interest.