key: cord-0205028-a1evpetq authors: Jiang, Mingfeng; Zhi, Minghao; Wei, Liying; Yang, Xiaocheng; Zhang, Jucheng; Li, Yongming; Wang, Pin; Huang, Jiahao; Yang, Guang title: FA-GAN: Fused Attentive Generative Adversarial Networks for MRI Image Super-Resolution date: 2021-08-09 journal: nan DOI: nan sha: 3d7cd8506cf3db2c2ff62e2749d9d948951212a2 doc_id: 205028 cord_uid: a1evpetq

High-resolution magnetic resonance images can provide fine-grained anatomical information, but acquiring such data requires a long scanning time. In this paper, a framework called the Fused Attentive Generative Adversarial Network (FA-GAN) is proposed to generate super-resolution MR images from low-resolution magnetic resonance images, which can effectively reduce the scanning time while retaining high image resolution. In the FA-GAN framework, the local fusion feature block, consisting of a three-pass network using different convolution kernels, is proposed to extract image features at different scales. The global feature fusion module, including the channel attention module, the self-attention module, and the fusion operation, is designed to enhance the important features of the MR image. Moreover, spectral normalization is introduced to stabilize the discriminator network. 40 sets of 3D magnetic resonance images (each set containing 256 slices) are used to train the network, and 10 sets of images are used to test the proposed method. The experimental results show that the PSNR and SSIM values of the super-resolution magnetic resonance images generated by the proposed FA-GAN method are higher than those of state-of-the-art reconstruction methods.

Image super-resolution refers to the reconstruction of high-resolution images from low-resolution images [1]. High resolution means that the pixels in the image are denser and can display finer details [2] [3].
These details are very useful in practical applications, such as satellite imaging and medical imaging, where targets can be better identified and important features found in high-resolution images [4] [5] [6]. High-resolution (HR) MRI images can provide fine anatomical information, which is helpful for clinical diagnosis and accurate decision-making [7] [8]. However, acquiring such data not only requires expensive equipment but also a long scanning time, which brings challenges to image data acquisition. Therefore, further applications are limited by slow data acquisition and imaging speed [9] [10]. Super-resolution (SR) is a technique to generate a high-resolution (HR) image from a single or a group of low-resolution (LR) images, which can improve the visibility of image details or restore them [11] [12] [13]. Without changing hardware or scanning components, SR methods can significantly improve the spatial resolution of MRI [14] [15]. Generally, there are three approaches to image SR in MRI: interpolation-based, reconstruction-based, and machine learning-based [16] [17]. The interpolation-based SR techniques assume that an area in the LR image can be extended to the corresponding area by using a polynomial or an interpolation function with a priori smoothness [18] [19]. The advantages of the interpolation-based super-resolution reconstruction algorithm are simplicity and high real-time performance; the disadvantage is that it is too simple to make full use of the prior information of MR images. In particular, the super-resolution reconstruction algorithm based on a single MR image has an obvious shortcoming: it results in a blurred version of the corresponding HR reference image [20] [21].
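As a concrete illustration of the interpolation-based approach described above, the following minimal sketch upsamples a 2-D slice with cubic spline interpolation using SciPy. The array contents and scale factor are illustrative assumptions, not the paper's data:

```python
import numpy as np
from scipy.ndimage import zoom

def interpolate_sr(lr_slice: np.ndarray, scale: int = 2) -> np.ndarray:
    """Upscale a 2-D LR slice by `scale` with cubic spline interpolation.

    Simple and fast, but it cannot recover high-frequency detail that was
    lost during acquisition -- the limitation noted above.
    """
    return zoom(lr_slice, scale, order=3)  # order=3 -> cubic spline

lr = np.random.rand(64, 64)    # stand-in for a low-resolution MR slice
sr = interpolate_sr(lr, scale=4)
print(sr.shape)                # (256, 256)
```

Because interpolation uses no prior information beyond smoothness, it serves as the baseline the learning-based methods below improve upon.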
The reconstruction-based SR methods solve an optimization problem incorporating two terms: the fidelity term, which penalizes the difference between a degraded SR image and an observed LR image, and the regularization term, which promotes sparsity and the inherent characteristics of the recovered SR signal [22] [23]. The performance of these techniques becomes suboptimal, especially in the high-frequency region, when the input data become too sparse or the model becomes even slightly inaccurate [24] [25] [26]. These shortcomings limit the effectiveness of reconstruction-based SR methods at large magnification factors, although they may work well for small magnification factors of less than 4. Machine learning techniques, particularly deep learning (DL)-based SR approaches, have recently attracted considerable attention because of their state-of-the-art performance in SR for natural images. Most recent algorithms rely on data-driven deep learning models to reconstruct the required details for accurate super-resolution [27] [28]. Deep learning-based methods aim to automatically learn the relationship between input and output directly from the training samples [29] [30]. At the same time, deep learning has also played a vital role in CT/PET image reconstruction, such as PET image reconstruction from the sinogram domain [31] [32] [33]. With the development of deep learning, the standard super-resolution GAN (SRGAN) framework, built on the Generative Adversarial Network, was proposed for generating super-resolution images [34]. Most GAN-based image generation models are constructed using convolutional layers. However, convolutions process information in local neighborhoods, so using only convolutional layers is inefficient for establishing long-range dependencies in images [35] [36]. It is difficult to learn the dependencies between distant image regions using a small convolution kernel, yet if the convolution kernel is too large, the model's performance suffers.
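To make the fidelity-plus-regularization formulation above concrete, the following sketch minimizes ||D(x) - y||²/2 + λ/2·||x||² by gradient descent. The forward model D (2×2 average pooling) and the Tikhonov regularizer are simplifying assumptions chosen for illustration; real reconstruction-based SR methods use more realistic degradation models and sparsity-promoting priors:

```python
import numpy as np

def down(x):
    """Assumed forward model D: 2x2 average-pooling downsampler."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def down_T(r):
    """Adjoint D^T of the average-pooling operator."""
    return np.kron(r, np.ones((2, 2))) / 4.0

def reconstruct(y, lam=1e-3, lr=1.0, iters=300):
    """Minimize ||D(x) - y||^2 / 2 + lam/2 * ||x||^2 by gradient descent.

    The fidelity term penalizes disagreement with the observed LR image y;
    the regularization term stands in for a prior on the SR signal.
    """
    x = np.kron(y, np.ones((2, 2)))           # nearest-neighbor initial guess
    for _ in range(iters):
        grad = down_T(down(x) - y) + lam * x  # gradient of both terms
        x -= lr * grad
    return x

y = np.random.rand(32, 32)        # observed LR image (illustrative)
x = reconstruct(y)
print(np.abs(down(x) - y).max())  # small: the fidelity term is honored
```

With a small λ the solution nearly reproduces y when downsampled; increasing λ trades fidelity for regularity, which is exactly the balance the two terms negotiate.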
Besides, increasing the size of the convolution kernel can also expand the receptive field, but it inevitably increases the complexity of the model [37] [38]. Zhang et al. proposed the Self-Attention Generative Adversarial Network (SAGAN), with attention-driven, long-range dependency modeling, for image generation tasks [39]. Previous work on reconstruction problems exhibits two major issues [40]. Firstly, each channel-wise feature is treated equally, although different feature maps contribute differently to the reconstruction task. Secondly, the limited receptive field of a convolutional layer may cause the loss of contextual information from the original images, especially the high-frequency components that contain valuable detail such as edges and texture. Therefore, the channel attention module is designed to filter out useless features and to enhance informative ones, so that the model parameters in shallower layers are updated mainly for the features relevant to a given task. To the best of our knowledge, this is the first work to apply channel-wise attention to the MRI reconstruction problem [41] [42]. Combining the ideas of MR reconstruction and image super-resolution, some researchers work on recovering HR images directly from low-resolution under-sampled k-space data [43] [44] [45] [46]. In this paper, a fused attentive generative adversarial network (FA-GAN) is proposed for generating super-resolution MR images from low-resolution ones.
The novelty of this work can be summarized as follows: 1) The local fusion feature block, consisting of a three-pass network using different convolution kernels, is proposed to extract image features at different scales, so as to improve the reconstruction performance of SR images; 2) The global feature fusion module, including the channel attention module, the self-attention module, and the fusion operation, is designed to enhance the important features of the MRI image, so that the super-resolution image is more realistic and closer to the original image; 3) Spectral normalization (SN) is introduced to the discriminator network, which can not only smooth and accelerate the training process of the deep neural network but also improve the model's generalization performance.

The proposed neural network model is designed to learn the image features first, and then inversely map the LR image to the reference HR image [47] [48]. This model only takes LR images as input to generate SR images. The operation can be defined as

$$I_{LR} = f_{\downarrow}(I_{HR}),$$

where $I_{LR} \in \mathbb{R}^{m \times n}$ and $I_{HR} \in \mathbb{R}^{M \times N}$ are the LR and HR MRI images of size $m \times n$ and $M \times N$ respectively, and $f_{\downarrow}: \mathbb{R}^{M \times N} \rightarrow \mathbb{R}^{m \times n}$ denotes the down-sampling process that creates an LR counterpart from an HR image. The network output is passed through a series of upsampling stages, where each stage doubles the input image size. The output is then passed through a convolution stage to obtain the resolved image. Depending upon the desired scaling, the number of upsampling stages can be changed. The adversarial min-max problem is defined by

$$\min_G \max_D \; \mathbb{E}_{I_{HR} \sim p_{\mathrm{train}}(I_{HR})}\big[\log D(I_{HR})\big] + \mathbb{E}_{I_{LR} \sim p_G(I_{LR})}\big[\log\big(1 - D(G(I_{LR}))\big)\big]. \quad (3)$$

Different from previous work [49], the local fusion feature block consists of three-pass networks using different convolution kernels, as shown in Figure 2. In this way, the information flows between these bypasses can be shared with each other, which allows our network to extract image features at different scales.
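A minimal PyTorch sketch of a three-pass multi-scale block of the kind described above is given below. The kernel sizes (3/5/7), the concatenation-plus-1×1 fusion, and the residual connection are illustrative assumptions; the paper's exact block layout is shown in its Figure 2:

```python
import torch
import torch.nn as nn

class LocalFusionFeatureBlock(nn.Module):
    """Three parallel convolution passes with different kernel sizes,
    fused by a 1x1 convolution (a sketch, not the paper's exact block)."""
    def __init__(self, channels: int):
        super().__init__()
        self.pass3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.pass5 = nn.Conv2d(channels, channels, 5, padding=2)
        self.pass7 = nn.Conv2d(channels, channels, 7, padding=3)
        self.fuse = nn.Conv2d(3 * channels, channels, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Each pass sees a different receptive field, i.e. a different scale.
        multi_scale = torch.cat(
            [self.act(self.pass3(x)),
             self.act(self.pass5(x)),
             self.act(self.pass7(x))], dim=1)
        return self.fuse(multi_scale) + x   # residual connection

block = LocalFusionFeatureBlock(32)
out = block(torch.randn(1, 32, 64, 64))
print(out.shape)   # torch.Size([1, 32, 64, 64])
```

The 1×1 fusion convolution is what lets information from the three bypasses mix, mirroring the sharing between passes described in the text.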
The outputs of the three passes are fused to form the block output. The global feature fusion module includes three parts, namely the channel attention module, the self-attention module, and the fusion operation. Through these modules, the important features of the MRI image can be enhanced, so that the super-resolution image is more realistic and closer to the original image. In this paper, a lightweight channel attention mechanism is introduced, which selectively emphasizes informative features and restrains less useful ones via a one-dimensional vector computed from global information. As illustrated in Figure 4, a global average pooling is first used to extract the global information across the spatial dimensions H×W. It is then followed by a dimension reduction layer with a reduction ratio of r, a ReLU activation, a dimension increase layer, and a sigmoid activation that generates the channel-wise attention weights. The two dimension-scaling layers are implemented as fully connected layers. The final recalibrated output is obtained by rescaling the input features with these weights.

(2) Self-Attention Module

The role of the self-attention module is to replace the traditional convolutional feature map with a self-attention feature map:

$$\beta_{j,i} = \frac{\exp(s_{ij})}{\sum_{i=1}^{N} \exp(s_{ij})}, \qquad s_{ij} = f(x_i)^{T} g(x_j),$$

where $\beta_{j,i}$ indicates the extent to which the model attends to the $i$-th location when synthesizing the $j$-th region. Here, C is the number of channels and N is the number of feature locations of the features from the previous hidden layer. The output of the attention layer is $o = (o_1, o_2, \ldots, o_N)$, where

$$o_j = W_v \left( \sum_{i=1}^{N} \beta_{j,i}\, h(x_i) \right), \qquad f(x) = W_f x,\; g(x) = W_g x,\; h(x) = W_h x.$$

In the above formulation, $W_f$, $W_g$, $W_h$, and $W_v$ are the learned weight matrices, which are implemented as 1×1 convolutions. Besides, we further multiply the output of the attention layer by a scale parameter and add back the input feature map. Therefore, the final output is given by

$$y_i = \gamma\, o_i + x_i,$$

where γ is a learnable scalar initialized to 0. Introducing a learnable γ allows the network to first rely on the information of the local neighborhood, and then gradually learn to assign more weight to non-local information.

A. Direct Connection.
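The two attention modules described above can be sketched in PyTorch as follows. The channel attention follows the squeeze-and-excitation recipe (global pooling, reduction by r, ReLU, expansion, sigmoid, rescale) and the self-attention follows the SAGAN recipe with 1×1 convolutions and a learnable γ initialized to 0; the channel count and reduction ratio below are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Global average pool -> reduce by r -> ReLU -> expand -> sigmoid,
    then rescale the input features with the resulting channel weights."""
    def __init__(self, channels: int, r: int = 8):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // r)
        self.fc2 = nn.Linear(channels // r, channels)

    def forward(self, x):
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))                            # squeeze: (b, c)
        s = torch.sigmoid(self.fc2(F.relu(self.fc1(s))))  # excitation
        return x * s.view(b, c, 1, 1)                     # recalibrate

class SelfAttention(nn.Module):
    """SAGAN-style self-attention: f, g, h, v are 1x1 convolutions and
    gamma is a learnable scalar initialized to 0 (local features first)."""
    def __init__(self, channels: int):
        super().__init__()
        self.f = nn.Conv2d(channels, channels // 8, 1)
        self.g = nn.Conv2d(channels, channels // 8, 1)
        self.h = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, hgt, wid = x.shape
        n = hgt * wid
        q = self.f(x).view(b, -1, n)                          # (b, c/8, N)
        k = self.g(x).view(b, -1, n)
        val = self.h(x).view(b, c, n)
        beta = torch.softmax(q.transpose(1, 2) @ k, dim=-1)   # (b, N, N)
        o = self.v((val @ beta.transpose(1, 2)).view(b, c, hgt, wid))
        return self.gamma * o + x

x = torch.randn(1, 32, 16, 16)
out = SelfAttention(32)(ChannelAttention(32)(x))
print(out.shape)   # torch.Size([1, 32, 16, 16])
```

At initialization γ = 0, so the self-attention branch is an identity map and the network relies purely on local convolutional features, exactly as the text describes.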
The direct connection function can be implemented by adding the two terms directly as follows:

$$F_i = \alpha R_i + \beta Y_i,$$

where $i$ is the index of a feature, $R$ represents the output of the channel attention module, and $Y$ represents the output of the self-attention module. Both α and β are set to 0.5 as the preset value.

B. Weighted Connection. Compared to the direct connection, the weighted connection introduces competition between R and Y. Besides, it can be easily extended to a softmax form, which is more robust and less sensitive to trivial features. Both α and β are set to 0.5 as the preset value. To avoid introducing extra parameters, we calculate the weights using R and Y themselves, so the weighted connection function is represented as a softmax-normalized combination of R and Y.

The loss function estimates the difference between the value generated or fitted by the model and the real value, that is, the difference between the reconstructed MRI and the original MRI. The smaller the loss, the better the model. In order to improve the quality of the reconstruction, we propose to use perceptual loss, pixel loss, and adversarial loss as the combined loss function of the generator. Perceptual loss mimics human visual differences, and pixel loss is the difference between pixels in the image domain:

$$l^{SR} = l_x^{SR} + 10^{-3}\, l_{Gen}^{SR} \quad (14)$$

In the following we describe possible choices for the content loss $l_x^{SR}$ and the adversarial loss $l_{Gen}^{SR}$. This paper uses the Euclidean distance between VGG features, which is more relevant to human perception, as the content loss:

$$l_{VGG/i,j}^{SR} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \Big(\phi_{i,j}\big(I^{HR}\big)_{x,y} - \phi_{i,j}\big(G\big(I^{LR}\big)\big)_{x,y}\Big)^2 \quad (15)$$

where $\phi_{i,j}$ denotes the feature map of the $j$-th convolutional layer before the $i$-th max-pooling layer, and $W_{i,j}$ and $H_{i,j}$ are the dimensions of that feature map. The adversarial loss function is the average discriminator probability value of the samples generated by the generator.
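The combined generator loss in (14)-(15) can be sketched as below. The VGG feature maps are assumed to be precomputed elsewhere (extracting them from a pretrained VGG is not shown), and the toy tensors at the bottom are purely for illustration:

```python
import torch

def generator_loss(vgg_feat_sr, vgg_feat_hr, disc_prob_sr):
    """Combined generator loss: content term plus 1e-3 times the
    adversarial term, mirroring l^SR = l_x^SR + 1e-3 * l_Gen^SR.

    vgg_feat_sr / vgg_feat_hr: VGG feature maps of the SR and HR images
    (assumed precomputed); disc_prob_sr: discriminator outputs D(G(I_LR)).
    """
    content = torch.mean((vgg_feat_hr - vgg_feat_sr) ** 2)      # VGG MSE
    adversarial = torch.mean(-torch.log(disc_prob_sr + 1e-8))   # -log D
    return content + 1e-3 * adversarial

# Toy tensors purely for illustration:
loss = generator_loss(torch.zeros(4, 4), torch.ones(4, 4),
                      torch.full((8,), 0.5))
print(round(loss.item(), 4))   # 1.0007
```

The 10⁻³ factor keeps the adversarial term from overwhelming the perceptually motivated content term early in training.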
The formula is as follows:

$$l_{Gen}^{SR} = \sum_{n=1}^{N} -\log D\big(G\big(I^{LR}\big)\big).$$

All the experiments use a Tesla V100-SXM2 GPU and four different MRI datasets. The SSIM is defined as

$$SSIM(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \quad (18)$$

where $\mu_x$ and $\mu_y$ represent the means of the images $x$ and $y$ respectively, $\sigma_x^2$ and $\sigma_y^2$ represent the variances, $\sigma_{xy}$ is the covariance of $x$ and $y$, and $c_1$, $c_2$ are constants used to maintain stability. The expression of FID is

$$FID = \|\mu_r - \mu_g\|^2 + \mathrm{Tr}\Big(\Sigma_r + \Sigma_g - 2\big(\Sigma_r \Sigma_g\big)^{1/2}\Big),$$

where Tr represents the sum of the elements on the diagonal of the matrix, μ is the mean, Σ is the covariance, and the subscripts r and g denote a real image and a generated image respectively.

In the experiment, the comparison methods are configured with their optimal parameters in order to compare the best reconstruction performance. The PSNR and SSIM values of 3200 two-dimensional MRI slices reconstructed by the different algorithms are reported. Table 1 presents the cardiac super-resolution MRI reconstruction performance of the different methods, Table 2 shows the brain results, Table 3 provides the knee results, and Table 4 the results on the fourth dataset. Table 5 illustrates the super-resolution MRI reconstruction performance of different GAN-based methods in terms of FID. As shown in Table 5, the FA-GAN can effectively reduce the FID. A lower FID means that the reconstructed SR MR images are closer to the real high-resolution MR images, i.e., the quality of the reconstructed SR MR images is higher. One possible reason is that fusing hierarchical features improves the information flow and eases the difficulty of training. According to the results of the ablation experiment in Table 6, the CA and LFFB modules together play the most important role in super-resolution MR image reconstruction and affect the reconstruction performance markedly, whereas the effect of the SA module is relatively small, its removal causing only a slight drop in reconstruction quality.
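The FID expression above can be computed directly from two sets of feature vectors, as in this sketch. In practice the features come from an Inception network; the random vectors below are an illustrative stand-in:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_gen):
    """FID between two sets of feature vectors (rows = samples):
    ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^{1/2})."""
    mu_r, mu_g = feats_real.mean(0), feats_gen.mean(0)
    s_r = np.cov(feats_real, rowvar=False)
    s_g = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(s_r @ s_g)
    if np.iscomplexobj(covmean):    # numerical noise can produce
        covmean = covmean.real      # tiny imaginary parts
    return float(np.sum((mu_r - mu_g) ** 2)
                 + np.trace(s_r + s_g - 2 * covmean))

rng = np.random.default_rng(0)
same = rng.normal(size=(500, 8))
print(fid(same, same))   # ~0: identical distributions give FID near zero
```

Identical feature sets yield an FID of (numerically) zero, which is why a lower FID indicates that the generated SR images are distributionally closer to the real HR images.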
Table 7 illustrates the reconstruction effect under different connection modes. It can be clearly seen that the weighted connection achieves better results; thus, we use the weighted connection in our method. For the selection of the parameters α and β, we conducted three sets of comparative experiments. As shown in Table 8, the experimental results show that the optimal values are α = 0.5 and β = 0.5. In this paper, spectral normalization (SN) is introduced to the discriminator network, which stabilizes the training of the GAN without requiring additional hyper-parameter adjustments (the spectral norm of all weight layers is set to 1). Figure 11 shows the effect of SN on FA-GAN: it makes the loss value drop steadily and makes the whole training process more stable. Figure 12 illustrates the loss value during the training process of SR image reconstruction using four different GAN-based methods at ×4 magnification. It can be found that the loss value of the FA-GAN method decreases monotonically as the iterations increase, while the other methods decrease in waves, which indicates that the proposed FA-GAN method combined with spectral normalization makes the training more stable.
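Applying spectral normalization to every weight layer of a discriminator is a one-line wrapper per layer in PyTorch, as this sketch shows. The layer sizes are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

# A minimal discriminator fragment with spectral normalization applied
# to every weight layer, constraining each layer's spectral norm to 1.
disc = nn.Sequential(
    nn.utils.spectral_norm(nn.Conv2d(1, 64, 3, stride=1, padding=1)),
    nn.LeakyReLU(0.2),
    nn.utils.spectral_norm(nn.Conv2d(64, 64, 3, stride=2, padding=1)),
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    nn.utils.spectral_norm(nn.Linear(64 * 32 * 32, 1)),
    nn.Sigmoid(),
)

out = disc(torch.randn(2, 1, 64, 64))
print(out.shape)   # torch.Size([2, 1])
```

Because SN only rescales each weight matrix by its largest singular value (estimated by power iteration inside the wrapper), it adds no hyper-parameters to tune, matching the observation above.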
References

Image super-resolution using deep convolutional networks
A large-scale database and a CNN model for attention-based glaucoma detection
Robust super-resolution volume reconstruction from slice acquisitions: application to fetal brain MRI
A Comparative Study of CNN-Based Super-Resolution Methods in MRI Reconstruction
Exploiting deep residual network for fast parallel MR imaging with complex convolution
DIMENSION: Dynamic MR imaging with both k-space and spatial prior knowledge obtained via multi-supervised network training
Learning Joint-Sparse Codes for Calibration-Free Parallel MR Imaging
MRI upsampling using feature-based nonlocal means approach
Single-image super-resolution of brain MR images using overcomplete dictionaries
An efficient total variation algorithm for super-resolution in fetal brain MRI with adaptive regularization
Image super-resolution using deep convolutional networks
Image super-resolution via wide residual networks with fixed skip connection
Interpolation-based super-resolution reconstruction: effects of slice thickness
Fast single image super-resolution using estimated low-frequency k-space data in MRI
Super-resolution reconstruction of MR image with a novel residual learning network algorithm
Improving magnetic resonance resolution with supervised learning
Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network
Generative adversarial networks for reconstructing natural images from brain activity
Densely connected convolutional networks
Retrospective correction of Rigid and Non-Rigid MR motion artifacts using GANs
Generative adversarial nets
Fast single image super-resolution using estimated low-frequency k-space data in MRI
Compressed sensing MRI reconstruction using a generative adversarial network with a cyclic loss
Photo-realistic single image super-resolution using a generative adversarial network
Densely connected convolutional networks
Learning deconvolutional deep neural network for high resolution medical image reconstruction
Automating motion correction in multishot MRI using generative adversarial networks
Learning a deep convolutional network for image super-resolution
Self-attention generative adversarial networks
Deep learning based framework for direct reconstruction of PET images
Direct PET image reconstruction based on the Wasserstein generative adversarial network (IEEE Transactions on Radiation and Plasma Medical Sciences)
A deep encoder-decoder network for directly solving the PET image reconstruction inverse problem
Searching for Activation Functions
Spectral normalization for generative adversarial networks
FireCaffe: near-linear acceleration of deep neural network training on compute clusters
Compressed sensing MRI reconstruction using a generative adversarial network with a cyclic loss
Photo-realistic single image super-resolution using a generative adversarial network
Learning deconvolutional deep neural network for high resolution medical image reconstruction
Automating motion correction in multishot MRI using generative adversarial networks
Learning a deep convolutional network for image super-resolution
Pyramid feature attention network for saliency detection
Super-resolution reconstruction of MR image with a novel residual learning network algorithm
Direct delineation of myocardial infarction without contrast agents using a joint motion feature learning architecture
A Super-Resolution-Involved Reconstruction Method for High Resolution MR Imaging
Single MR image super-resolution via mixed self-similarity attention network
An empirical study of spatial attention mechanisms in deep networks
Attention augmented convolutional networks
Dual attention network for scene segmentation

This work is supported in part by the National Natural Science Foundation of China.