title: A Lightweight Structure Aimed to Utilize Spatial Correlation for Sparse-View CT Reconstruction
authors: Liu, Yitong; Deng, Ken; Sun, Chang; Yang, Hongwen
date: 2021-01-19

Abstract

Sparse-view computed tomography (CT) is a widely used approach to reducing radiation dose while accelerating imaging through fewer projection views and the associated computation. However, the severe imaging noise and streaking artifacts it introduces remain a major issue in the low-dose protocol. In this paper, we propose a dual-domain deep learning-based method that breaks through the limitation of currently prevailing algorithms that merely process single image slices. Since the scanned object usually exhibits a high degree of spatial continuity, the consecutive imaging slices obtained embody rich information that is largely unexplored. Therefore, we establish a cascade model named LS-AAE to tackle this problem. In addition, to suit the trend toward lightweight, mobile medical care, our model adopts the inverted residual with linear bottleneck in its module design, making it mobile and lightweight (reducing model parameters to one-eighth of the original) without sacrificing performance. In our experiments, sparse sampling is conducted at intervals of 4°, 8° and 16°, a challenging sparsity that few scholars have attempted before. Nevertheless, our method remains robust and achieves state-of-the-art performance, reaching a PSNR of 40.305 and an SSIM of 0.948, while ensuring high model mobility. In particular, it still exceeds other current methods when its sampling rate is one-fourth of theirs, thereby demonstrating its remarkable superiority.

1 Introduction

... approach to solve inverse problems [13, 14, 15, 16]. With the advent of compressed sensing [17] and its related regularizers, the quality of reconstructed images has improved to a certain extent. One of the most typical regularizers is the total variation (TV) method; algorithms based on it include TV-POCS [18], the TGV method [19], SART [13] and SART-TV [20], among others. In addition, dictionary learning is also commonly used as a regularizer. For example, [21] constructs a global dictionary and an iterative adaptive dictionary to solve the problem of low-dose CT reconstruction. In recent years, with the improvement of computing power, deep learning has grown rapidly [22]. Neural networks have subsequently been widely applied to image analysis tasks such as image classification [23] and image segmentation [24, 25, 26], and especially to inverse problems in image reconstruction such as artifact reduction [27, 28], denoising [29] and inpainting [30]. Since GANs (Generative Adversarial Networks) were introduced by Goodfellow in 2014 [31], they have been adopted in many image processing tasks owing to their prominent ability to realistically predict image details. GANs have therefore also been applied naturally to improving the quality of low-dose CT images [32, 33, 34]. In addition, Ye et al. explored the relationship between deep learning and classical signal processing methods in [35], explained why deep learning can be employed for imaging inverse problems, and provided a theoretical basis for the application of deep learning to low-dose CT reconstruction.
Some researchers adopt deep learning-based architectures to complement and restore limited-view Radon data [36, 37, 32, 38, 39, 40]. Dong et al. [36] used U-Net [25] to predict the missing Radon data, then reconstructed the result into an image through FBP [41]. Jian Fu et al. [37] built a network that tightly couples a deep neural network with the DPC-CT (differential phase-contrast CT) reconstruction algorithm in the domain of DPC projection sinograms; the estimated result is a complete phase-contrast projection sinogram. Rushil Anirudh et al. established CTNet [38], a system of 1D and 2D convolutional neural networks that operates on the limited-view sinogram to predict the full-view sinogram, which is then fed to standard analytical and iterative reconstruction algorithms to obtain the final result. Other researchers carried out post-processing on reconstructed images with deep learning models, so as to remove artifacts and noise and upgrade the quality of these images [42, 43, 44, 33, 45, 46, 47]. In 2016, a deep convolutional neural network [44] was proposed to learn an end-to-end mapping between FBP and artifact-free images. In 2018, Yoseob Han and Jong Chul Ye designed dual-frame and tight-frame U-Nets [42], which satisfy the frame condition and perform better at recovering high-frequency edges in sparse-view CT. In 2019, Xie et al. [33] built an end-to-end cGAN model with a joint loss, used to remove artifacts from limited-angle CT reconstruction images. In 2020, Wang et al. [45] developed a limited-angle TCT image reconstruction algorithm based on U-Net, which could suppress artifacts and preserve structures. Experiments have shown that U-Net-like structures are efficacious for image artifact removal and texture restoration [35, 36, 42, 45, 47]. Since neural networks are capable of predicting unknown data in both the Radon and image domains, a natural idea is to combine these two domains [48, 49, 34, 50, 51, 52] to acquire better restoration results. Specifically, such a method first complements the Radon data, and then removes the residual artifacts and noise in the images converted from the full-view Radon data. In 2018, Zhao et al. proposed SVGAN [34], an artifact reduction method for low-dose and sparse-view CT via a single model trained as a GAN. In 2019, Liang et al. [49] proposed a comprehensive network combining the projection and image domains: the projection estimation network is based on a Res-CNN structure, and the image-domain network takes advantage of U-Net. In 2020, Zhu et al. designed ADAPTIVE-NET [50] to conduct joint denoising on the acquired sinogram and the reconstructed CT image, while reconstructing the CT image in an end-to-end manner. Over the past three years, experiments have proved that this sort of two-stage algorithm is quite conducive to improving image quality. All the current mainstream methods mentioned above process each CT image on its own, neglecting the solid fact that the scanned object is always highly continuous. Consequently, abundant spatial information lies in the consecutive CT images obtained, and it is largely left to be exploited. This enlightens us to propose a novel cascade model called LS-AAE (Lightweight Spatial Adversarial Autoencoder) that focuses on effectively utilizing the spatial information between highly correlated images.
Our experiments show that this structural design efficaciously removes streaking artifacts from sparse-view CT images and outperforms other prevailing methods by a remarkable margin. Making healthcare mobile and portable is now a social trend. In many deep learning-based methods, however, scholars improve accuracy at the expense of computing resources, and such computational complexity usually exceeds the capabilities of mobile and embedded applications. This paper adopts the inverted residual with linear bottleneck [53] in its module design to propose a mobile structure that reduces model parameters to one-eighth of the original without sacrificing performance. Although increasing the sparsity of sparse-view CT brings the benefits of accelerated scanning and reduced computation, it causes additional imaging damage; balancing image quality against the X-ray dose level is a well-known trade-off problem. Thus, in order to explore the limit of sparsity in sparse-view CT reconstruction, we conduct sparse sampling at intervals of 4°, 8° and, most importantly, 16°. Even under such sampling sparsity, our model still exhibits remarkable robustness and state-of-the-art performance. We introduce our proposed method in Section 2, describe the experimental results and corresponding discussion in Section 3, and state the conclusion in Section 4.

2 Method

Consecutive CT images usually exhibit high spatial coherence, since the scanned object is usually spatially continuous. On that account, we can regard these CT images as adjacent frames of a video, which together contain much more information than a still image. This high correlation within the sequence of images can improve artifact removal in two ways. Firstly, extending the search region from a two-dimensional image neighborhood to a three-dimensional spatial neighborhood provides extra information that can be used to denoise the reference image. Secondly, using spatial neighbors helps to reduce streaking artifacts, as the residual error in each image is correlated. We also cannot help noticing that the task of artifact removal across consecutive images is similar to video denoising. After investigating a large body of research on video denoising [54, 55, 56, 57, 58, 59, 60, 61, 62, 63], we find that current state-of-the-art methods place great emphasis on motion estimation, owing to the strong redundancy along motion trajectories. In conclusion, to remove streaking artifacts from sparse-view CT images more effectively, we need to design a structure that can not only look into the three-dimensional spatial neighborhood but also capture the motion between consecutive images. In recent years, much research has been invested in tuning deep neural networks to achieve an optimal balance between efficiency and performance. Among the resulting techniques, depthwise separable convolutions [64] exhibit extraordinary capability and have gradually become an essential building block for numerous lightweight neural networks [64, 65, 66]. The idea is to decompose the standard convolutional layer into two separate layers, namely a depthwise convolutional layer and a pointwise convolutional layer.
The former performs lightweight filtering by applying a single convolutional filter per input channel; the latter conducts a 1 × 1 convolution to construct new features by computing linear combinations of the input channels. For a standard convolutional layer with input tensor size (c_in, h, w), kernel size (c_out, c_in, k, k) and output tensor size (c_out, h, w), the computational cost equals c_in · h · w · k² · c_out. In depthwise separable convolutions, by contrast, the depthwise convolutional layer has a computational cost of c_in · h · w · k², since it operates on each input channel separately, and the pointwise convolutional layer has a computational cost of c_in · h · w · c_out. Depthwise separable convolutions therefore cost only h · w · c_in · (k² + c_out), almost one-ninth (k equals 3 in our case) of the standard convolution. Most importantly, depthwise separable convolutions lower the computational complexity to a large extent without sacrificing accuracy, which makes them a perfect fit for our module design.
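To make this cost comparison concrete, the following sketch (our illustration in PyTorch, not the authors' released code) builds a standard 3 × 3 convolution and a depthwise separable counterpart with the same input and output shapes, then counts their parameters; the ratio matches k² · c_out / (k² + c_out) ≈ 7.9 for k = 3 and c_out = 64.

```python
import torch
import torch.nn as nn

c_in, c_out, k = 64, 64, 3

# Standard convolution: one k x k x c_in kernel per output channel.
standard = nn.Conv2d(c_in, c_out, kernel_size=k, padding=k // 2, bias=False)

# Depthwise separable convolution: a per-channel k x k filter (groups=c_in),
# followed by a 1 x 1 pointwise convolution that mixes channels.
depthwise_separable = nn.Sequential(
    nn.Conv2d(c_in, c_in, kernel_size=k, padding=k // 2, groups=c_in, bias=False),
    nn.Conv2d(c_in, c_out, kernel_size=1, bias=False),
)

def num_params(m):
    return sum(p.numel() for p in m.parameters())

x = torch.randn(1, c_in, 128, 128)
assert standard(x).shape == depthwise_separable(x).shape

# Per-pixel cost ratio: k^2 * c_out  versus  k^2 + c_out.
print(num_params(standard))             # 36864  (= 3*3*64*64)
print(num_params(depthwise_separable))  # 4672   (= 3*3*64 + 64*64)
print(num_params(standard) / num_params(depthwise_separable))  # ~7.9
```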
Figure 1: Structure overview. The sparse-view Radon data X is first sent to the neural network F for completion; the restored full-view Radon data X′ is then converted to the image Y, which is fed into the neural network G for artifact removal, finally yielding the ideal high-quality image Y′.

We know from the universal approximation theorem [67] that multilayer feedforward networks are capable of approximating a broad class of continuous functions. This suggests that neural networks can be used to learn complex mappings that are difficult to derive through mathematical analysis. Thus, in this paper, we utilize a deep learning-based structure that combines the Radon domain and the image domain (Figure 1) to solve the task of sparse-view CT reconstruction and inpainting. Firstly, we want to make full use of the prior information in the Radon domain by converting the sparse-view Radon data X into the full-view Radon data X′, so as to complement the missing data at the unscanned angles. This process can be represented by the mapping f: X → X′, where, by the universal approximation theorem, the function f can be approximated by our proposed neural network F. After we obtain the full-view Radon data X′, we transform it into the image Y through FBP. Although the first stage alleviates the severe imaging damage of the original sparse-view CT image, many streaking artifacts still exist in Y and need to be removed to acquire the high-quality restored result Y′. We represent this restoration process as the mapping g: Y → Y′, where the function g can be approximated by our proposed neural network G. Through this two-stage structure combining the Radon domain with the image domain, we finally obtain the ideal restored results.

We first adopt linear interpolation to convert the original sparse-view Radon data to full-view size, so as to satisfy the structural requirement of our proposed neural network that the input and output images have the same resolution. Then we build a lightweight adversarial autoencoder (L-AAE), shown in Figure 2, to restore the Radon data.

Figure 2: The diagram of our proposed L-AAE, which is composed of an L-AE and a discriminator that helps restore the image texture.

The structure of its autoencoder (L-AE) can be seen in Figure 3 and Table 1; it is composed of an encoder and a decoder that are highly symmetrical. We perform four downsampling operations in the encoder to obtain high-level semantic features of the input image: the input is initially downsampled by the conv1 layer with a stride of 2, and each subsequent downsampling is accomplished by the first building block of each unit. Each downsampling halves the height and width of the activation map and doubles the number of channels. In the decoder, we correspondingly conduct four upsampling operations to restore the texture of the input image. Deconvolution is adopted for upsampling, with kernel size and stride both equal to 2, so that each upsampling doubles the height and width of the activation map and halves the number of channels. In addition, we add skip connections [68] between encoder and decoder feature maps of the same resolution. Since the final feature map of the encoder has a relatively low resolution after multiple downsampling steps, relying on it alone would hamper the restoration of image texture in the decoder; the skip connections incorporate low-level encoder features, which have high resolution and contain abundant detail that helps to accurately restore the image texture. Such multi-scale, U-Net-like architectures have proved effective in processing medical images. The detailed structure of our building block is shown in Figure 4. It adopts the inverted residual with linear bottleneck, following [53]; each block is composed of three convolutional layers. The first layer expands a low-dimensional compressed representation to a higher dimension (characterized by the expansion factor exp) with a kernel size of 1 × 1. The intermediate expansion layer adopts the lightweight depthwise convolutions mentioned above, significantly decreasing the number of operations (ops) and the memory needed while sustaining the same performance. The last layer projects the features back to a low-dimensional representation with a linear 1 × 1 convolution. All these layers are followed by batch normalization [69] and a ReLU [70] non-linearity, except for the last layer, which is followed only by batch normalization. In Figure 4, IC and OC stand for the input and output channels of a building block respectively. All convolutional layers in all building blocks have a stride of 1, except for Block2_1, Block3_1 and Block4_1, which have a stride of 2 to conduct downsampling. The expansion factor exp is 1 for Block1, Block7 and Block8 to avoid large ops and memory cost; we set exp to 3 for Block5_1 and Block6_1, and every block except those mentioned above has an exp of 6. Besides, shortcut connections are implemented in blocks whose input and output feature maps have the same resolution, to enhance information flow and improve the ability of gradients to propagate across multiple layers. We adopt a 1 × 1 convolution in the shortcut when the numbers of channels in the input and output feature maps differ.
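Below is a minimal sketch of such a building block, following the MobileNetV2 pattern of [53]. The channel counts, the expansion factor and the 1 × 1 shortcut projection follow the description above, but the exact per-block configuration of Table 1 is not reproduced here, so treat this as an illustration rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Expand (1x1) -> depthwise (3x3) -> linear project (1x1), as in [53].

    The final 1x1 convolution is "linear": it is followed by batch norm
    only, with no ReLU, so information in the low-dimensional space is not
    destroyed by the non-linearity.
    """

    def __init__(self, in_ch, out_ch, stride=1, exp=6):
        super().__init__()
        hidden = in_ch * exp
        self.block = nn.Sequential(
            # 1x1 expansion to a high-dimensional representation.
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            # 3x3 depthwise convolution (one filter per channel).
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            # Linear 1x1 projection back to a low-dimensional representation.
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # Shortcut only when the spatial resolution is unchanged; a 1x1
        # convolution matches channel counts when they differ.
        self.use_shortcut = stride == 1
        self.proj = (nn.Conv2d(in_ch, out_ch, 1, bias=False)
                     if self.use_shortcut and in_ch != out_ch
                     else nn.Identity())

    def forward(self, x):
        out = self.block(x)
        if self.use_shortcut:
            out = out + self.proj(x)
        return out

x = torch.randn(2, 32, 64, 64)
print(InvertedResidual(32, 64, stride=1, exp=6)(x).shape)  # [2, 64, 64, 64]
```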
The discriminator in our L-AAE aims to strengthen the model's ability to restore detailed image texture. Its structure is almost the same as the encoder above, except that its Block4_3 and Block4_4 have an OC of 64 and 1 respectively. The output of Block4_4 is flattened and passed through a sigmoid function for probability prediction, and we average these values to obtain the final output, which represents the probability that the input image is real. This novel lightweight AAE enables us to acquire well-restored Radon data that are complete at every scanning angle, at a computational cost about 8 times smaller than that of standard convolutions and without sacrificing accuracy.

After stage one, we transform the acquired full-view Radon data into images and find that we have successfully enriched the information in the Radon domain and alleviated the streaking artifacts of the original sparse-view CT imaging. In stage two, we mainly focus on removing the remaining artifacts and restoring the image to an ideal level. As mentioned above, we need a neural network that not only looks into the three-dimensional spatial neighborhood but also captures the motion between consecutive images, so as to efficaciously exploit the abundant spatial information between consecutive images when removing artifacts from the input image. Generally speaking, motion estimation brings an additional degree of complexity that hinders a model's deployment in practice, so we need a structure that handles motion without much resource cost. We refer to [71], whose general structure is a cascaded two-step architecture that inherently embeds the motion of objects. Inspired by this, we propose a model named Lightweight Spatial Adversarial Autoencoder (LS-AAE), shown in Figure 5. It slightly modifies the L-AE of Figure 3 to serve as its inpainting block; details are shown in Table 1. Replacing the 2D convolutions with 3D convolutions enables our model to look into the three-dimensional spatial neighborhood for extra information. As shown in Figure 5, five consecutive images {I_(i-2), I_(i-1), I_i, I_(i+1), I_(i+2)} are sent into the LS-AAE to restore the middle one. We first group these inputs into three overlapping triplets of consecutive images ({I_(i-2), I_(i-1), I_i}, {I_(i-1), I_i, I_(i+1)} and {I_i, I_(i+1), I_(i+2)}), then enter them into the Inpainting Blocks 1. We then combine the outputs of these blocks into a new triplet, which is sent into Inpainting Block 2 to acquire the final estimate I''_i corresponding to the central image I_i. The LS-AAE digs deep into the three-dimensional space and, by virtue of its architecture, implicitly handles motion without any explicit motion compensation stage. Besides, the three Inpainting Blocks in step one share the same weights so as to limit memory cost. We also add a discriminator in stage two to better restore the image texture: the predicted image I''_i and its corresponding ground truth (the full-view CT image) I_i^GT are both sent into this discriminator, whose structure is exactly the same as in stage one.

Figure 5: The diagram of our proposed LS-AAE. It combines an autoencoder that fully utilizes the spatial correlation between consecutive CT images with a discriminator that helps refine image details.
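A minimal sketch of this two-step cascade is given below. ToyInpaintingBlock is a stand-in for the modified L-AE of Table 1, and the tensor interfaces are our assumptions rather than the authors' exact implementation; the point is the FastDVDnet-style [71] cascade with a shared first-step block.

```python
import torch
import torch.nn as nn

class ToyInpaintingBlock(nn.Module):
    """Stand-in for the 3D-convolutional inpainting block of Table 1.

    Consumes a triplet of slices as a depth-3 volume and returns a single
    slice; the real block is the modified L-AE, not reproduced here.
    """

    def __init__(self, feat=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, feat, kernel_size=3, padding=(0, 1, 1)),  # depth 3 -> 1
            nn.ReLU(inplace=True),
            nn.Conv3d(feat, 1, kernel_size=(1, 3, 3), padding=(0, 1, 1)),
        )

    def forward(self, triplet):            # (B, 1, 3, H, W)
        return self.net(triplet)[:, :, 0]  # (B, 1, H, W)

class CascadedLSAAE(nn.Module):
    """Two-step cascade over five consecutive slices, in the spirit of [71]."""

    def __init__(self):
        super().__init__()
        self.block1 = ToyInpaintingBlock()  # one block, shared by all triplets
        self.block2 = ToyInpaintingBlock()

    def forward(self, slices):             # (B, 5, H, W)
        # Overlapping triplets {i-2,i-1,i}, {i-1,i,i+1}, {i,i+1,i+2}.
        triplets = [slices[:, i:i + 3].unsqueeze(1) for i in range(3)]
        # Step one: the shared block estimates the middle slice of each triplet.
        mids = [self.block1(t) for t in triplets]              # 3 x (B, 1, H, W)
        # Step two: the intermediate estimates form a new triplet to fuse.
        stacked = torch.stack([m[:, 0] for m in mids], dim=1)  # (B, 3, H, W)
        return self.block2(stacked.unsqueeze(1))               # (B, 1, H, W)

x = torch.randn(2, 5, 64, 64)
print(CascadedLSAAE()(x).shape)  # torch.Size([2, 1, 64, 64])
```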
Stage one and stage two are trained separately. For the autoencoders in these two stages, we employ the multi-loss function below, which consists of three parts, l_MSE, l_Adv and l_Reg, with respective hyperparameters α_1, α_2 and α_3. l_MSE calculates the mean square error between the restored image and its corresponding ground truth; it is widely used in image inpainting tasks since it provides an intuitive evaluation of the model's prediction. The expression of l_MSE is given in Equation (2), where the function G_AE stands for the autoencoder, I_Input and I_GT are the input image and its corresponding ground truth, and W and H are the width and height of the input image respectively. l_Adv is the adversarial loss: the autoencoder tries to fool the discriminator by making its prediction as close to the ground truth as possible, so as to achieve the ideal image restoration outcome. Its expression is given in Equation (3), where the functions D and G_AE stand for the discriminator and the autoencoder respectively, and I_Input is the model's input image. l_Reg is the regularization term of our multi-loss function. Since noise has a detrimental effect on the restoration result, we add a regularization term to maintain the smoothness of the image and also prevent overfitting. The TV loss is widely used in image analysis tasks; it reduces the variation between adjacent pixels to a certain extent. Its expression is given in Equation (4), where G_AE represents the autoencoder, I_Input is the model's input image, W and H are the width and height of the input image respectively, ∇ computes the gradient, and ||·|| takes the norm. To optimize the discriminators of the two stages, their loss function should enable them to better distinguish between real and fake inputs. This loss function l_Dis is shown in Equation (5), where D and G_AE stand for the discriminator and the autoencoder respectively, and I_Input and I_GT are the input image and its corresponding ground truth. The discriminator outputs a scalar between 0 and 1 that represents the probability that the input image is real. Therefore, minimizing 1 − D(I_GT) (i.e., maximizing D(I_GT)) trains the discriminator to recognize real images, while minimizing D(G_AE(I_Input)) trains it to identify the fake images generated by the autoencoder.
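Equations (2)–(5) did not survive extraction; the block below is our reconstruction of plausible forms from the verbal descriptions above, not the authors' exact formulas.

```latex
% Reconstructed from the verbal descriptions; plausible forms only,
% not the paper's exact Equations (2)-(5).
\begin{align}
\ell_{\mathrm{MSE}} &= \frac{1}{WH}
  \bigl\| G_{AE}(I_{\mathrm{Input}}) - I_{GT} \bigr\|_2^2 \\
\ell_{\mathrm{Adv}} &= -\log D\bigl(G_{AE}(I_{\mathrm{Input}})\bigr) \\
\ell_{\mathrm{Reg}} &= \frac{1}{WH}\sum_{x=1}^{W}\sum_{y=1}^{H}
  \bigl\| \nabla G_{AE}(I_{\mathrm{Input}})_{x,y} \bigr\| \\
\ell_{\mathrm{Dis}} &= \bigl(1 - D(I_{GT})\bigr)
  + D\bigl(G_{AE}(I_{\mathrm{Input}})\bigr) \\
\ell_{AE} &= \alpha_1\,\ell_{\mathrm{MSE}}
  + \alpha_2\,\ell_{\mathrm{Adv}}
  + \alpha_3\,\ell_{\mathrm{Reg}}
\end{align}
```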
3 Experiments

During training we adopt the Adam algorithm [72] for optimization, with the learning rate initially set to 1e-4. For the multi-loss function, α_1, α_2 and α_3 are set to 1, 1e-3 and 2e-8 respectively. We implement the whole structure in PyTorch [73] on two GeForce RTX 2080 Ti GPUs. We adopt LIDC-IDRI [74] as our dataset, which includes 1018 cases and approximately 240,000 DCM files of corresponding CT images. Cases 1 to 100 form the test set, cases 101 to 200 form the validation set, and the rest form the training set. Such a large amount of data allows us to train our models from scratch without overfitting. We use NumPy to read these DCM files and conduct sparse sampling at intervals of 4, 8 and 16 (the corresponding full-view Radon data has 180 projections). We first analyze our overall structure through a series of ablation studies, and then compare our experimental results with other current methods to demonstrate its superiority and robustness.

Given the innovations in our overall structural design, it is appropriate to conduct corresponding ablation studies to prove their necessity. In this part, unless otherwise stated, all experimental results are obtained from sparse-view CT data with an interval of 4. U-Net is well known for its extraordinary performance in numerous medical image processing tasks; [42] implemented it for sparse-view CT image restoration and obtained outstanding results. To verify that our proposed autoencoder achieves a good balance between performance and mobility, we replace it with U-Net in the first stage and compare the restoration results and model parameters of this stage with ours, as shown in Table 3. The images mentioned in Table 3 are reconstructed from the Radon data restored by stage one. As we can see from Table 3, whether in the Radon domain or in the image domain, the L-AE has competitive performance compared with U-Net. Moreover, it significantly reduces the number of model parameters, making it suitable for situations where computational resources are extremely limited. This exhibits our model's ability to restore CT images efficiently, in keeping with the trend toward portable medical devices.

We establish discriminators in both stages, hoping to further improve our model's performance in restoring sparse-view CT data through adversarial learning between the autoencoders and the discriminators. To verify this, we feed the test set into a variant of stage one that contains only the autoencoder and compare its restoration results with ours, as shown in Table 4. The images mentioned in Table 4 are reconstructed from the Radon data restored by stage one. The table shows the significance of our proposed discriminator: it indeed helps our model achieve a better level of restoration as evaluated by PSNR and SSIM. Its compact structure (see Section 2) also ensures a high degree of mobility, which keeps our overall structure portable and accurate at the same time.

As we state in Section 2, this sort of cascaded two-step structure inherently embeds the motion of objects, which greatly helps to remove image artifacts owing to the strong redundancy between consecutive images. Consequently, we design an experiment with reference to [71] to verify this view. In the second stage, instead of sending five consecutive images into the two-step LS-AAE, we directly input them into a single Inpainting Block (SIB) whose three-dimensional convolution part is slightly modified to handle five images; that is, we adopt a stride of 2 in the Conv1_1 layer (see Table 1). The experimental results are shown in Table 5. Since the SIB no longer has the built-in cascade structure to implicitly conduct motion estimation, it suffers an obvious drop in PSNR and SSIM. We can therefore conclude that the LS-AAE effectively improves the model's capability to restore CT images through its cascaded two-step architecture, which inherently captures the motion between consecutive images.

We mention in Section 2 that extending the search region from a two-dimensional image neighborhood to a three-dimensional spatial neighborhood provides extra information for image restoration, and that extracting spatial features is conducive to removing streaking artifacts since the residual error in each image is correlated. To realize this extension of the search region, three-dimensional convolution is employed in every Inpainting Block of the LS-AAE. To verify how crucial these three-dimensional convolutions are, we conduct an experiment in which the 3D convolutions are replaced with 2D convolutions, where the number of input images is treated as the number of input channels (see Table 2). The inpainting results of the two models are shown in Table 6.
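The swap corresponds roughly to the difference sketched below (our illustration, not the exact Table 2 configuration): the 2D variant consumes the five slices as input channels and loses the explicit slice axis, while the 3D variant keeps them on a depth axis that the kernel slides along.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 5, 64, 64)  # five consecutive slices

# 2D variant: slices are just input channels, so one kernel mixes all of
# them at once and has no explicit notion of slice order.
conv2d = nn.Conv2d(5, 16, kernel_size=3, padding=1)
y2d = conv2d(x)                # (2, 16, 64, 64)

# 3D variant: slices live on a depth axis, and the 3x3x3 kernel slides
# along it, extracting features from the local spatial neighborhood.
conv3d = nn.Conv3d(1, 16, kernel_size=3, padding=1)
y3d = conv3d(x.unsqueeze(1))   # (2, 16, 5, 64, 64)

print(y2d.shape, y3d.shape)
```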
We can see that the inpainting outcome suffers a drop of about 0.9 dB in PSNR, proving that the three-dimensional convolutions help the model restore CT images to a certain extent without significantly consuming computational resources.

In all the experiments above, we set the interval between the input consecutive CT images of the LS-AAE to the default value of 1. However, we cannot help but wonder whether increasing the interval can help the model obtain more spatial information, thereby enhancing its ability to remove image artifacts. In the following experiment, we set this image interval T to 1, 2, 3, 4 and 5 respectively; the corresponding results are shown in Table 7. Table 7 shows that this hyperparameter T does not have much impact on the final restoration result. Spatial correlation appears to be well utilized when the image interval is set to 1, which makes it a decent default choice.

In this paper, we adopt a two-stage structure that combines the Radon domain and the image domain to obtain high-quality sparse-view CT images. Since each stage conducts restoration in its own domain and both remarkably upgrade the restoration results, this leads us to ask: what role does each domain play? Accordingly, we feed our test set into three structures: the L-AAE of stage one, which concentrates on the Radon domain; the LS-AAE of stage two, which focuses on the image domain; and our overall structure, which contains both stages. The quantitative inpainting results of these three structures are reported in Table 8, and the qualitative outcome can be seen in Figure 6. Restoration in each domain has its pros and cons. The Radon domain demonstrates superiority in enhancing the structural similarity of images and thus performs well under the SSIM metric, while the image domain exhibits great ability in alleviating distortion and thus performs relatively well under the PSNR metric. Naturally, we acquire extraordinary restoration results when combining the two domains to merge their respective advantages. Besides, we utilize spatial correlation only in the image domain, owing to our observation that the spatial information between continuous Radon slices has little impact on the final inpainting outcome. We suppose this is because the texture of Radon slices bears little similarity to that of CT images and thus cannot be restored in this way.

After verifying the rationality of our overall structural design, we test its robustness by applying it to sparse-view CT data of higher sparsity, that is, sparse sampling at intervals of 4, 8 and even 16 (the corresponding full-view Radon data has 180 projections). In addition, we compare our method with other current methods to prove its prominent capability of restoring sparse-view CT images and removing streaking artifacts. The experimental results are shown in Table 9, and the qualitative outcome can be seen in Figure 7. As we can see, our method exhibits extraordinary capability in restoring sparse-view CT imaging, effectively removes streaking artifacts and outperforms the other methods by a large margin. It can also be applied at extreme sparsity while still obtaining a prominent inpainting outcome. In particular, our method still exceeds the others when its sampling rate is one-fourth of theirs, thereby demonstrating its remarkable robustness and superiority.
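To summarize the dual-domain flow, here is a minimal inference-time sketch. stage_one and stage_two stand for the trained L-AAE and LS-AAE, skimage's iradon serves as a stand-in for the FBP step, and all interfaces are assumptions on our part (in particular, the real stage two consumes five consecutive slices rather than one).

```python
import numpy as np
import torch
from skimage.transform import iradon

def reconstruct(sparse_sinogram, sparse_angles, full_angles,
                stage_one, stage_two):
    """Dual-domain inference: Radon completion, FBP, then artifact removal.

    Hypothetical interfaces: stage_one / stage_two are the trained L-AAE
    and LS-AAE; skimage's iradon stands in for the FBP step.
    """
    # Linear interpolation brings the sparse sinogram to full-view size,
    # matching the network's equal input/output resolution requirement.
    upsampled = np.stack([
        np.interp(full_angles, sparse_angles, sparse_sinogram[r])
        for r in range(sparse_sinogram.shape[0])
    ])
    # Stage one: complete the Radon data.
    with torch.no_grad():
        full_sino = stage_one(torch.from_numpy(upsampled).float()[None, None])
    full_sino = full_sino[0, 0].numpy()
    # FBP converts the restored sinogram back to the image domain.
    image = iradon(full_sino, theta=full_angles, filter_name="ramp")
    # Stage two: remove residual streaking artifacts (shown here on a
    # single slice for brevity).
    with torch.no_grad():
        restored = stage_two(torch.from_numpy(image).float()[None, None])
    return restored[0, 0].numpy()
```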
Figure 7: The qualitative restoration results of various methods (ground truth, U-Net, FBP, SART-TV) at different sampling intervals (8 and 16).

4 Conclusion

In this paper, we propose a lightweight structure that efficaciously restores sparse-view CT with a two-stage architecture combining the Radon domain and the image domain. Most importantly, we exploit the abundant spatial information that exists between consecutive CT images, achieving a remarkable restoration outcome even under extreme sparsity. In the first stage, a mobile model named L-AAE is proposed to complement the original sparse-view CT data in the Radon domain; it adopts the inverted residual with linear bottleneck to significantly reduce computational resource requirements while maintaining outstanding performance. In the second stage, after reconstructing the restored full-view Radon data into images through FBP, we establish a lightweight model called LS-AAE. It is designed to implicitly conduct motion estimation and dig into the three-dimensional spatial neighborhood at a relatively low memory cost, so it concentrates on fully utilizing the strong spatial correlation between continuous CT images, productively removes streaking artifacts and finally acquires high-quality restoration results. Eventually, for sparse-view CT with a sampling interval of 4, we achieve a PSNR of 40.305 and an SSIM of 0.948, a remarkable restoration result that effectively eliminates image artifacts. In addition, our method also performs well at extreme sparsity (sampling intervals of 8 and even 16), exhibiting its prominent robustness.

References

[1] Representation of a function by its line integrals, with some radiological applications. II
[2] Computerized transverse axial scanning (tomography): I. Description of system
[3] An outlook on X-ray CT research and development
[4] Effectiveness of a staged US and CT protocol for the diagnosis of pediatric appendicitis: Reducing radiation exposure in the age of ALARA
[5] CT and computed radiography: The pictures are great, but is the radiation dose greater than required?
[6] Strategies for reducing radiation dose in CT
[7] CT dose reduction and dose management tools: Overview of available options
[8] Low-dose versus standard-dose CT protocol in patients with clinically suspected renal colic
[9] Dose reduction in multidetector CT using attenuation-based online tube current modulation
[10] Evaluation of sparse-view reconstruction from flat-panel-detector cone-beam CT
[11] Optimization-based image reconstruction from sparse-view data in offset-detector CBCT
[12] Deep convolutional neural network for inverse problems in imaging
[13] Simultaneous algebraic reconstruction technique (SART): A superior implementation of the ART algorithm
[14] Iterative low-dose CT reconstruction with priors trained by artificial neural network
[15] An improved statistical iterative algorithm for sparse-view and limited-angle CT image reconstruction
[16] JSR-Net: A deep network for joint spatial-Radon domain CT reconstruction from incomplete data
[17] Robust uncertainty principles: Exact signal frequency information
[18] Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization
[19] Sparse-view X-ray CT reconstruction via total generalized variation regularization
[20] Accurate image reconstruction from few-views and limited-angle data in divergent-beam CT
[21] Low-dose X-ray CT reconstruction via dictionary learning
[22] Deep learning
[23] ImageNet large scale visual recognition challenge
[24] Fully convolutional networks for semantic segmentation
[25] U-Net: Convolutional networks for biomedical image segmentation
[26] Three dimensional root CT segmentation using multi-resolution encoder-decoder networks
[27] Compression artifacts reduction by a deep convolutional network
[28] Building dual-domain representations for compression artifacts reduction
[29] Image denoising and inpainting with deep neural networks
[30] ReconNet: Non-iterative reconstruction of images from compressively sensed measurements
[31] Generative adversarial nets
[32] Limited-view CT reconstruction based on autoencoder-like generative adversarial networks with joint loss
[33] Artifact removal using GAN network for limited-angle CT reconstruction
[34] Sparse-view CT reconstruction via generative adversarial networks
[35] Deep convolutional framelets: A general deep learning for inverse problems
[36] A deep learning reconstruction framework for X-ray computed tomography with incomplete data
[37] A deep learning reconstruction framework for differential phase-contrast computed tomography with incomplete data
[38] Lose the views: Limited angle CT reconstruction via implicit sinogram completion
[39] Limited-view cone-beam CT reconstruction based on an adversarial autoencoder network with joint loss
[40] Deep learning-based sinogram completion for low-dose CT
[41] Theoretically exact filtered backprojection-type inversion algorithm for spiral CT
[42] Framing U-Net via deep convolutional framelets: Application to sparse-view CT
[43] A sparse-view CT reconstruction method based on combination of DenseNet and deconvolution
[44] Image prediction for limited-angle tomography via deep learning with convolutional neural network
[45] Deep learning based image reconstruction algorithm for limited-angle translational computed tomography
[46] Low dose abdominal CT image reconstruction: An unsupervised learning based approach
[47] Fully dense UNet for 2-D sparse photoacoustic tomography artifact removal
[48] High quality imaging from sparsely sampled computed tomography data with deep learning and wavelet transform in various domains
[49] Comparison of projection domain, image domain, and comprehensive deep learning for sparse-view X-ray CT image reconstruction
[50] Low-dose CT reconstruction with simultaneous sinogram and image domain denoising by deep neural network
[51] A deep learning architecture for limited-angle computed tomography reconstruction
[52] Artifact removal using a hybrid-domain convolutional neural network for limited-angle computed tomography imaging
[53] MobileNetV2: Inverted residuals and linear bottlenecks
[54] On the difficulty of training recurrent neural networks
[55] Real-time video super-resolution with spatio-temporal networks and motion compensation
[56] Video denoising, deblocking, and enhancement through separable 4-D nonlocal spatiotemporal transforms
[57] Video denoising via empirical Bayesian estimation of space-time patches
[58] Denoising with kernel prediction and asymmetric loss functions
[59] Model-blind video denoising via frame-to-frame training
[60] ViDeNN: Deep blind video denoising
[61] Non-local video denoising by CNN
[62] DVDnet: A fast network for deep video denoising
[63] Deep RNNs for video denoising
[64] MobileNets: Efficient convolutional neural networks for mobile vision applications
[65] Xception: Deep learning with depthwise separable convolutions
[66] ShuffleNet: An extremely efficient convolutional neural network for mobile devices
[67] Multilayer feedforward networks are universal approximators
[68] Identity mappings in deep residual networks
[69] Batch normalization: Accelerating deep network training by reducing internal covariate shift
[70] Deep sparse rectifier neural networks
[71] FastDVDnet: Towards real-time deep video denoising without flow estimation
[72] Adam: A method for stochastic optimization
[73] Automatic differentiation in PyTorch
[74] The lung image database consortium (LIDC) and image database resource initiative (IDRI): A completed reference database of lung nodules on CT scans