key: cord-0508438-f1j9hb28
authors: Wang, Ce; Zhang, Haimiao; Li, Qian; Shang, Kun; Lyu, Yuanyuan; Dong, Bin; Zhou, S. Kevin
title: Improving Generalizability in Limited-Angle CT Reconstruction with Sinogram Extrapolation
date: 2021-03-09
journal: nan
DOI: nan
sha: 776e8e38db9605e7ea6ffbb503c2ac164d44e6b5
doc_id: 508438
cord_uid: f1j9hb28

Computed tomography (CT) reconstruction from X-ray projections acquired within a limited angle range is challenging, especially when the angle range is extremely small. Both analytical and iterative models need more projections for effective modeling. Deep learning methods have gained prevalence due to their excellent reconstruction performance, but such success is mainly limited to the training dataset and does not generalize across datasets with different distributions. We propose ExtraPolationNetwork for limited-angle CT reconstruction, which introduces a theoretically justified sinogram extrapolation module. The module supplies additional sinogram information and boosts model generalizability. Extensive experimental results show that our reconstruction model achieves state-of-the-art performance on the NIH-AAPM dataset, comparable to existing approaches. More importantly, we show that using such a sinogram extrapolation module significantly improves the generalization capability of the model on unseen datasets (e.g., COVID-19 and LIDC datasets) when compared to existing approaches.

arXiv:2103.05255v4 [eess.IV] 17 Nov 2021

In healthcare, computed tomography (CT) based on X-ray projections is an indispensable imaging modality for clinical diagnosis. Limited-angle (LA) CT is a common type of acquisition in many scenarios, for example when the radiation dose must be reduced in low-dose CT or when projections can only be taken within a restricted range of angles, as in C-arm CT [17] and dental CT. However, the lack of projection angles poses a significant challenge for image reconstruction and may lead to severe artifacts in the reconstructed images.

Many CT image reconstruction algorithms have been proposed in the literature to improve image quality; they can be categorized as model-based and deep-learning-based methods. For example, Filtered Back Projection (FBP) [20], a representative analytical method, is widely used for reconstructing a high-quality image efficiently. However, FBP prefers acquisitions with full-range views, which makes it sub-optimal for LACT. The (sometimes extreme) reduction in the range of projection angles decreases the effectiveness of commercial CT reconstruction algorithms. To overcome this challenge, iterative regularization-based algorithms [6, 12, 15, 18, 21, 27] have been proposed to leverage prior knowledge about the image to be reconstructed and achieve better reconstruction performance for LACT. Note that these iterative algorithms are often computationally expensive and require careful case-by-case hyperparameter tuning.

Currently, deep learning (DL) techniques have been widely adopted in CT and demonstrate promising reconstruction performance [2, 8, 24, 28, 32, 33]. By further combining iterative algorithms with DL, a series of iterative frameworks with accordingly designed neural-network-based modules have been proposed [1, 3, 5, 7, 13, 19, 29]. ADMMNet [25] introduces a neural-network-based module into the reconstruction problem and achieves remarkable performance. Furthermore, DuDoNet [10, 11], ADMM-CSNet [26] and LEARN++ [31] improve reconstruction results with an enhancement module in the projection domain, which inspires us to fuse dual-domain learning into our model design.

Although deep-learning-based algorithms have achieved state-of-the-art performance, they are also known to over-fit easily on training data, which is undesirable in practice. MetaInvNet [30] was proposed to improve reconstruction performance with sparse-view projections and demonstrates good model generalizability. Its authors use a U-Net [16] to find a better initialization for an iterative HQS-CG [6] model and achieve better generalization performance in such scenarios. However, they still focus on the case with a large range of acquired projections, which limits the application of their model in practice. Obtaining a highly generalizable model when learning from practical data remains difficult.

To retain model generalizability in LACT reconstruction, we propose a model, called ExtraPolationNetwork (EPNet), for recovering high-quality CT images. In this model, we utilize dual-domain learning to emphasize data consistency between the image domain and the projection domain, and introduce an extrapolation module. The proposed extrapolation module helps complement missing information in the projection domain and provides extra details for reconstruction. Extensive experimental results show that the model achieves state-of-the-art performance on the NIH-AAPM dataset [14]. Furthermore, we also achieve better generalization performance on additional datasets, COVID-19 and LIDC [4]. This empirically verifies the effectiveness of the proposed extrapolation module. We make our implementation available at https://github.com/mars11121/EPNet.

CT reconstruction aims to reconstruct a clean image $u$ from the projection data $Y$ corrupted by unknown noise $n$, whose mathematical formulation is:

$$Y = Au + n, \tag{1}$$

where $A$ is the Radon transform. For LACT, the projection data $Y$ is incomplete because the angular range is reduced (the view angle $\alpha \in [0, \alpha_{\max}]$ with $\alpha_{\max} < 180^\circ$). The reduced sinogram information limits the performance of current reconstruction methods. Therefore, estimating a more complete sinogram $\tilde{Y}$ is necessary to enhance reconstruction performance. To yield an accurate estimate, consistency between the incomplete projection $Y$ and the complete projection $\tilde{Y}$ is crucial. We assume $Y$ is obtained from $\tilde{Y}$ by some operation (e.g., downsampling). Besides, $\tilde{Y}$ and the clean image $u$ should also be consistent under the corresponding transformation matrix $\tilde{A}$. Consequently, we propose the following constraints:

$$P\tilde{Y} = Y, \qquad \tilde{A}u = \tilde{Y},$$

where P is the downsampling matrix. In this way, the final model becomes the following optimization problem:

where R(·) is a regularization term incorporating image priors.
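To make the role of the downsampling matrix P concrete, the following is a minimal sketch in which P is read as a row selection that keeps only the acquired views of a more complete sinogram. The helper names `selection_matrix` and `apply_views`, the toy sizes, and the choice of keeping the first views are assumptions made for illustration, not details taken from the paper.

```python
def selection_matrix(n_full, n_kept):
    """Return P (n_kept x n_full) that keeps the first n_kept views (assumed layout)."""
    return [[1.0 if col == row else 0.0 for col in range(n_full)]
            for row in range(n_kept)]


def apply_views(P, sinogram):
    """Compute the matrix product P @ sinogram; each sinogram row is one view angle."""
    n_det = len(sinogram[0])
    return [[sum(P[i][k] * sinogram[k][j] for k in range(len(sinogram)))
             for j in range(n_det)]
            for i in range(len(P))]


# Toy "complete" sinogram: 6 views x 3 detector bins (arbitrary values).
full = [[float(10 * v + d) for d in range(3)] for v in range(6)]
P = selection_matrix(n_full=6, n_kept=4)

# The limited-angle sinogram is the complete one restricted to the kept views.
limited = apply_views(P, full)
```

Under this reading, a candidate complete sinogram is consistent with the measurement whenever applying P to it reproduces the measured views.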

In this section, we introduce full details on the proposed method, which is depicted in Fig. 1. Our model is built by unrolling the HQS-CG [6] algorithm with N iterations. The HQS-CG algorithm is briefly introduced in Section 3.1. Specifically, we utilize the Init-CNN module [30] to search for a better initialization for the Conjugate Gradient (CG) algorithm in each iteration. The input of the module is composed of reconstructed images from the image domain and the projection domain. In the image domain, we retrain the basic HQS-CG model and use the CG module for reconstruction. In the projection domain, we first use our proposed Extrapolation Layer (EPL) to estimate extra sinograms. Then, we use the Sinogram Enhancement Network (SENet) to inpaint the extrapolated sinograms and reconstruct them with the Radon Inversion Layer (RIL) [10], which is capable of backpropagating gradients to the previous layer. Section 3.2 introduces the details of the involved modules.
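Since the unrolled model searches for a better initialization of the CG solver in each iteration, it may help to see where that initialization enters. The following is a generic sketch of the textbook conjugate gradient method for a symmetric positive-definite system, not the authors' implementation. The 2x2 matrix `M`, the helper `matvec`, and the zero starting point are toy assumptions; in the actual model the linear system would come from the HQS sub-problem and `x0` from Init-CNN.

```python
def conjugate_gradient(matvec, b, x0, n_iters=50, tol=1e-10):
    """Solve M x = b for symmetric positive-definite M, starting from x0."""
    x = list(x0)
    r = [bi - mi for bi, mi in zip(b, matvec(x))]  # initial residual b - M x0
    p = list(r)                                    # initial search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(n_iters):
        if rs_old < tol:                           # squared residual norm is small enough
            break
        Mp = matvec(p)
        alpha = rs_old / sum(pi * mpi for pi, mpi in zip(p, Mp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * mpi for ri, mpi in zip(r, Mp)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Toy symmetric positive-definite system (illustrative values only).
M = [[4.0, 1.0], [1.0, 3.0]]


def matvec(v):
    """Multiply the toy matrix M by the vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]


b = [1.0, 2.0]
x = conjugate_gradient(matvec, b, x0=[0.0, 0.0])
```

The starting point `x0` only affects how many iterations are needed, which is why a learned initialization can be useful when the number of unrolled CG steps is fixed and small.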

[Figure: diagram labels SENet_k, SENet_N, Radon Inversion, ExtraPolation Layer]

Traditionally, there exist many effective algorithms to solve objective (2). One such algorithm is Half Quadratic Splitting (HQS) [6], which solves the following:

The operator $W$ is chosen as the high-pass components of the piecewise linear tight wavelet frame transform. With alternating optimization among the complete sinogram $\tilde{Y}$, $u$, and $z$, the final closed-form solution can be derived as follows:

Fig. 3: The architecture of SENet.

Init-CNN and RIL. We realize the Init-CNN module with a heavy U-Net architecture with skip connections, which stabilizes training. Besides, the heavy U-Net shares parameters across the unrolled steps, which proves more effective for the final reconstruction. The Radon Inversion Layer (RIL) was first introduced in DuDoNet [10], which builds dual-domain learning for reconstruction. Here we use the module to obtain the reconstructed image from the projection domain.

EPL. As introduced, the reduced angle range is the main bottleneck in the limited-angle scenario, and the usual interpolation techniques are not suitable in this case. Few researchers, however, have considered extrapolating sinograms with CNNs, which can provide more image details because sinograms contain both spatial and temporal (or view-angle) information about the corresponding images. Whereas images from different data distributions differ in the image domain, their sinograms still share similarities in the temporal dimension. To exploit this property, we propose a module called "Extrapolation Layer (EPL)" to extrapolate sinograms before SENet. As shown in Fig. 2, the EPL module is composed of three parallel convolutional neural networks: the left and right networks predict the neighboring sinograms on their respective sides, and the middle one denoises the input. The outputs of the three networks are then concatenated and supervised with the loss defined as follows:

where $Y_{out}$ is the predicted sinogram, $Y_{gt}$ is the corresponding ground truth, and mask is a binary matrix that emphasizes the bilateral prediction. Here, we utilize RIL to enforce dual-domain consistency for the prediction, which makes the module's estimation more accurate when it is embedded into the whole model.

SENet. With the extrapolated sinograms, we first use SENet, designed as a light CNN as shown in Fig. 3, to enhance the quality of the sinograms. The enhanced sinograms are then mapped to the image domain via RIL, which helps reconcile the differing optimization directions in our dual-domain learning. The objective for SENet is as follows:

where $Y_{se}$ is the enhanced sinogram, and $Y_{gt}$ and $u_{gt}$ are the corresponding ground-truth sinogram and image, respectively.

Loss function. With the above modules, the full objective function of EPNet is defined by:

where $N$ is the total number of iterations of the unrolled backbone HQS-CG model, $\{u_i\}_{i=1}^{N}$ are the reconstructed images of each iteration, and $L_{ssim}$ is the SSIM loss.
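The EPL description (three parallel outputs concatenated along the view axis, supervised with a binary mask that emphasizes the two extrapolated sides) can be illustrated with a small sketch. The exact loss used in the paper is not recoverable from the text here, so the weighting `(1 + weight * mask)` and the squared-error form below are assumptions, as are the helper names and the toy values.

```python
def concat_views(left, middle, right):
    """Stack three blocks of views (lists of rows) along the angle axis."""
    return left + middle + right


def bilateral_mask(n_left, n_middle, n_right, n_det):
    """Binary mask: 1 on the extrapolated side views, 0 on the measured views."""
    return ([[1.0] * n_det for _ in range(n_left)]
            + [[0.0] * n_det for _ in range(n_middle)]
            + [[1.0] * n_det for _ in range(n_right)])


def masked_l2(pred, target, mask, weight=1.0):
    """Mean squared error with extra weight where mask == 1 (assumed form)."""
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            total += (1.0 + weight * m) * (p - t) ** 2
            count += 1
    return total / count


# Toy outputs of the left, middle, and right branches (2 detector bins each).
y_out = concat_views([[1.0, 1.0]], [[2.0, 2.0], [3.0, 3.0]], [[4.0, 4.0]])
y_gt = [[0.0, 0.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
mask = bilateral_mask(n_left=1, n_middle=2, n_right=1, n_det=2)
loss = masked_l2(y_out, y_gt, mask)
```

In this toy case only the left extrapolated view is wrong, so the whole loss comes from entries where the mask is 1, which is the kind of emphasis the mask is meant to provide.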

Datasets. We first train and test models on the "2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge" dataset [14]. Specifically, we choose 1,746 slices of five patients for training and 1,716 slices of another five patients for testing. To further show our models' generalization capability, we test our models on 1,958 slices of four patients chosen from the COVID-19 dataset, and 1,635 slices of six patients from the LIDC dataset [4]. The two datasets are also composed of chest CT images but come from different scenarios and machines, which makes them good choices for testing generalization capability. For the generalizability experiments, we add an additional HU value shift to the latter two datasets, which brings about a similar distribution deformation. All experiments are conducted with fan-beam geometry, and the number of detector elements is set to 800. Besides, we add mixed noise, composed of 5% Gaussian noise and Poisson noise with an intensity of $5 \times 10^6$, to all simulated sinograms [30].

Implementations and Training Settings. All the compared models are trained and tested with the corresponding number of angles (15, 30, 60, 90), except MetaInvNet_ori, which is trained with 180 angles, following Zhang et al. [30]. Our models are implemented using the PyTorch framework. We use the Adam optimizer [9] with $(\beta_1, \beta_2) = (0.9, 0.999)$ to train these models. The learning rate starts from 0.0001. All models are trained on an Nvidia 3090 GPU card for 10 epochs with a batch size of 1.

Evaluation Metric. Quantitative results are measured by the multi-scale structural similarity index (SSIM) (with level=5, Gaussian kernel size=11, and standard deviation=1.5) [23] and peak signal-to-noise ratio (PSNR) [22].
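For reference, PSNR follows the standard definition $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The sketch below implements that definition on nested lists; the function name `psnr`, the `data_range` argument, and the toy images are illustrative assumptions, and the intensity range used in the paper's evaluation is not stated in the text.

```python
import math


def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    n = 0
    sq_err = 0.0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            sq_err += (r - e) ** 2
            n += 1
    mse = sq_err / n
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


# Toy 2x2 images with intensities assumed to lie in [0, 1].
ref = [[0.0, 1.0], [1.0, 0.0]]
est = [[0.1, 0.9], [1.0, 0.0]]
value = psnr(ref, est, data_range=1.0)
```

Because PSNR depends on the assumed peak value, scores are only comparable across methods when the same `data_range` and intensity normalization are used.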

To investigate the effectiveness of different modules and the hyperparameters used for the models, we first conduct an ablation study with the following configurations, where the number of input sinogram angles is fixed to $\alpha_{\max} = 60$:

... module in DuDoNet [10], the reconstruction performance drops a lot, but the module also improves the generalization result by about 1.5 dB.

Quantitative Results Comparison. We then quantitatively compare our models with model-based and data-driven models. Results on the AAPM-test set show that our models and the retrained MetaInvNet [30] perform best. Besides, MetaInvNet with its original training setting achieves better generalization performance on the COVID-test and LIDC-test sets, but it needs more projections for training. Our models also achieve better generalizability than it, except when $\alpha_{\max} = 15$, which is due to the extremely limited sinogram information fed into the extrapolation layer. On the other hand, HQS-CG keeps its performance across different data distributions; however, its prior-knowledge modeling limits its reconstruction performance on the AAPM-test set, and its tuning and computation are too expensive.

Qualitative Results Comparison. We also visualize the reconstruction results of these methods on the AAPM-test and COVID-test datasets. As shown in the first three rows of Fig. 4, the images reconstructed by our model and the retrained MetaInvNet show the best visual quality on the AAPM-test set across different numbers of angles. Besides, our results show sharper details thanks to the additional use of $L_{SE}$ in the projection domain. On the COVID-test set, our result also gives sharper details but with more artifacts, since the data distribution is very different. Although HQS-CG achieves better quantitative results on the COVID-test dataset, its reconstructed image in the fourth row is even smoother than that of FBP.

We propose EPNet, a novel model for limited-angle CT image reconstruction that achieves strong generalization performance. We utilize dual-domain learning for data consistency between the two domains and propose an EPL module to estimate extra sinograms, which provide useful information for the final reconstruction. Quantitative and qualitative comparisons with competing methods verify the reconstruction performance and the generalizability of our model. This effectiveness encourages us to explore a better architecture for EPL in the future.

[1] Learned primal-dual reconstruction

[2] Low-dose CT with a residual encoder-decoder convolutional neural network

[3] Learned full-sampling reconstruction from incomplete data

[4] The cancer imaging archive (TCIA): maintaining and operating a public information repository

[5] Low-dose CT with deep learning regularization via proximal forward backward splitting

[6] Nonlinear image recovery with half-quadratic regularization

[7] CNN-based projected gradient descent for consistent CT image reconstruction

[8] Deep convolutional neural network for inverse problems in imaging

[9] Adam: A method for stochastic optimization

[10] DuDoNet: Dual domain network for CT metal artifact reduction

[11] Encoding metal mask projection for metal artifact reduction in computed tomography

[12] Adaptive graph-based total variation for tomographic reconstructions

[13] Neural proximal gradient descent for compressive imaging

[14] TU-FG-207A-04: Overview of the low dose CT grand challenge

[15] Wavelet-based reconstruction for limited-angle X-ray tomography

[16] U-Net: Convolutional networks for biomedical image segmentation

[17] Mobile C-arm cone-beam CT for guidance of spine surgery: Image quality, radiation dose, and integration with interventional guidance

[18] Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization

[19] Deep unfolded robust PCA with application to clutter suppression in ultrasound

[20] Machine learning for tomographic imaging

[21] Reweighted anisotropic total variation minimization for limited-angle CT reconstruction

[22] Image quality assessment: from error visibility to structural similarity

[23] Multiscale structural similarity for image quality assessment

[24] Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss

[25] Deep ADMM-Net for compressive sensing MRI

[26] ADMM-CSNet: A deep learning approach for image compressive sensing

[27] Spectral CT image restoration via an average image-induced nonlocal means filter

[28] A review on deep learning in medical image reconstruction

[29] JSR-Net: a deep network for joint spatial-radon domain CT reconstruction from incomplete data

[30] MetaInv-Net: Meta inversion network for sparse view CT image reconstruction

[31] LEARN++: Recurrent dual-domain reconstruction network for compressed sensing CT

[32] A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises

[33] Handbook of Medical Image Computing and Computer Assisted Intervention