Distributed optimization for nonrigid nano-tomography

Viktor Nikitin, Vincent De Andrade, Azat Slyamov, Benjamin J. Gould, Yuepeng Zhang, Vandana Sampathkumar, Narayanan Kasthuri, Doga Gursoy, Francesco De Carlo

Date: 2020-07-11

Abstract: Resolution level and reconstruction quality in nano-computed tomography (nano-CT) are in part limited by the stability of microscopes, since the magnitude of mechanical vibrations during scanning becomes comparable to the imaging resolution, and by the ability of the samples to resist beam damage during data acquisition. In such cases, there is no incentive to recover the sample state at different time steps as in time-resolved reconstruction methods; instead, the goal is to retrieve a single reconstruction at the highest possible spatial resolution and without imaging artifacts. Here we propose a joint solver for imaging samples at the nanoscale with projection alignment, unwarping, and regularization. Projection data consistency is regulated by dense optical flow estimated by Farneback's algorithm, leading to sharp sample reconstructions with fewer artifacts. Synthetic data tests show robustness of the method to Poisson and low-frequency background noise. Applicability of the method is demonstrated on two large-scale nano-imaging experimental data sets.

Current X-ray imaging instruments demonstrate sub-100 nm resolution in a wide range of fields, including biology, medicine, geology, and material sciences [1-6]. To reach this or even higher resolution levels one needs to address various limitations and instabilities of mechanical, detector, and optic components. At the same time, the structure of a sample may change during X-ray data acquisition, leading to inconsistent tomographic data and further increasing the complexity of the sample reconstruction. This is caused not only by deliberately designed in situ studies of dynamic processes under specific environmental conditions, but also by uncontrolled sample deformation due, for example, to radiation damage. Cryogenic systems like a cryojet only partially reduce sample deformation, while the cold air stream induces vibrations. In-vacuum cryostages, on the other hand, typically maintain samples at lower temperatures than cryostreams, more efficiently preventing sample shrinkage without adding vibrations. But such systems compatible with tomographic acquisition are complex and impractical, and impose stringent experimental constraints that rule out most in situ experiments [7-9].

To address sample deformation due to radiation damage, a number of time-resolved methods have been adopted during the last two decades. They demonstrate significant quality improvement for time-evolving samples compared to the conventional approach; however, most of them are based on additional a priori knowledge about the sample structure and motion, and require demanding computational resources for reconstructing experimental data in a reasonable time. For instance, in [10] we proposed a multi-GPU implementation of a method for suppressing motion artifacts by using time-domain decomposition of functions with a basis chosen with respect to the motion structure. The method operates with a low number of decomposition coefficients that can be used to determine the object state continuously in time.
Next, the approach proposed in [11] is based on estimating local structural correlations over multiple time frames and finding inner object edges which remain constant in time, followed by patch-based regularization according to the object structure. There are also methods built upon the concept of compressed sensing, which employ sparsity-promoting algorithms, where the prior knowledge is given in terms of spatial-temporal total variation regularization for detecting sharp object changes [12, 13]. Finally, sample evolution can be estimated by using deformation vector fields (DVFs) constructed according to the motion model obtained from projection data [14-19]. DVF estimation in 4D is time-consuming and may admit many possible solutions; therefore, there have been attempts to optimize it. For instance, a recent advanced method proposed in [20] reformulates the standard DVF estimation problem in order to decrease the number of possible solutions. The DVF evolution is iteratively estimated for the whole 3D object at different temporal states. This process is time-consuming even though it was implemented on GPUs. The authors estimate the DVF for the states corresponding to the first and last projection angles in each 180-degree interval and linearly interpolate the DVF between these states. For processing experimental data, they also apply 4 × 4 data binning to obtain faster and more robust DVF estimation, and turn off the DVF estimation engine after a certain number of iterations, continuing to solve the problem with a fixed DVF.

In some experiments, there is no requirement to reconstruct the object state at different time steps as proposed by computationally demanding time-resolved reconstruction methods; instead, the goal is to obtain a single reconstruction of high quality and resolution. These experiments include tomography of biological and medical samples, where X-ray dose deposition causes uncontrolled deformation as a result of radiation damage or thermal load [6, 21, 22]. Another case is tomography of geological samples affected by external cooling or heating systems, where the sample structure changes depending on the rotation angle and the corresponding distance to the external device [23]. Even mechanical limitations of the rotation stage, such as inaccurate roll/pitch angle alignment or vibrations, in fact create a time-resolved tomography problem where only one state needs to be recovered. The consistency of projections for different angles can be improved by directly shifting projections before reconstruction [6, 24-27], or by using iterative re-projection methods based on a joint estimation of alignment errors and the object function [28-30]. These methods work especially well for the interlaced angular scanning protocol, in which a full tomographic scan is acquired with multiple sample rotations whose projection angles are distinct mod 2π and uniformly cover the interval [0, 2π) [19, 31]. With this protocol, a significant time gap between the acquisition of two nearby angles, which should give similar data in a static case, allows one to determine the corresponding projection shifts by using standard registration methods like cross-correlation.
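For illustration, a minimal sketch of one way such an interlaced angular schedule can be generated; the function name and the uniform-offset construction are assumptions for illustration, not the actual beamline scan code:

```python
import numpy as np

def interlaced_angles(n_per_rotation, n_rotations):
    """Interlaced schedule: each rotation covers [0, 2*pi) on a coarse grid;
    successive rotations are offset so the union is a uniform fine grid mod 2*pi."""
    fine_step = 2 * np.pi / (n_per_rotation * n_rotations)
    angles = [2 * np.pi * np.arange(n_per_rotation) / n_per_rotation + r * fine_step
              for r in range(n_rotations)]
    return np.concatenate(angles)  # acquisition order: rotation by rotation

theta = interlaced_angles(192, 2)  # e.g. 384 angles acquired over 2 rotations
# all angles are distinct mod 2*pi and together uniformly cover [0, 2*pi)
assert np.unique(np.round(theta % (2 * np.pi), 9)).size == theta.size
```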
In this work, we propose a generic distributed optimization framework based on the alternating direction method of multipliers (ADMM) [32] for solving the tomography problem jointly with projection alignment and regularization. Projection alignment is done jointly with reconstruction and is based on estimating optical flow, commonly used in computer vision for motion tracking algorithms [33-35]. Optical flow is defined as the pattern of apparent motion of image objects between two consecutive frames caused by the object movement. In our proposed ADMM framework, we break the whole reconstruction problem into three sub-problems: tomographic reconstruction, estimation of deformation, and regularization of the solution. On each ADMM iteration, the sub-problems are solved independently and coordinated through dual variables. We show that in many cases this scheme, with projection data consistency regulated by optical flow, results in recovering an object function with fewer artifacts. To our knowledge, a formal optimization framework has never been used for reconstruction problems with alignment. In contrast to conventional re-projection methods relying on empirical studies of each sub-problem, the ADMM formulation directly connects all sub-problems and is built upon formal convergence theory; see [32] and references therein. Our contribution also includes a new scheme for accurate estimation of dense optical flow between projections by using Farneback's algorithm [34]. The algorithm can track points at multiple levels of resolution by generating image pyramids, where each level has a lower resolution compared to the next (higher) level. In the proposed ADMM model, we increase the optical flow resolution with iterations and use the flow from the previous iteration as the initial guess for the next iteration. With this scheme, the processing time for the alignment sub-problem is comparable to that of the tomography sub-problem. This strategy, combined with our multi-GPU code acceleration, allows processing large experimental data sets in a reasonable time.

This paper is organized as follows. In Section 2, we formulate the joint tomography problem with alignment and regularization, and show how this optimization problem can be solved with the ADMM by splitting the whole problem into sub-problems. In Section 3, we validate our approach on synthetic data and compare reconstruction results to the conventional method. Implementation details, optimization strategies, convergence results, and resolution estimations are given in Supplement 1. Section 4 demonstrates reconstruction results for nano-tomography experimental data sets. Our conclusions and outlook are given in Section 5.

In this section, we present the augmented Lagrangian formulation of the tomography reconstruction problem with alignment and show how this problem is solved by using ADMM [32] with splitting of the whole problem into local sub-problems. Let $u$ be an object in 3D space and $d$ its projection data. The projection operator is given in terms of the Radon transform $R$, and as an example of the sparsity prior in this work we consider the total variation (TV) regularization term $\alpha\|\nabla u\|_1$, where parameter $\alpha$ controls the trade-off between the fidelity and regularization terms. TV regularization is a generic tool for suppressing noise and data incompleteness artifacts in reconstructions [36], although other sparsity priors can be adopted as well. Let $D_f$ be the deformation operator that remaps functions to new coordinates given with respect to optical flow $f$.
Then the tomographic reconstruction problem of recovering object $u$ from data $d$ can be solved by considering the following minimization,

$$\min_{u,f}\;\|D_f(Ru) - d\|_2^2 + \alpha\|\nabla u\|_1.$$

By introducing auxiliary variables $\psi_1$ and $\psi_2$, an equivalent problem can be written as follows,

$$\min_{u,f,\psi_1,\psi_2}\;\|D_f\psi_1 - d\|_2^2 + \alpha\|\psi_2\|_1 \quad \text{s.t.}\quad Ru = \psi_1,\;\nabla u = \psi_2. \tag{1}$$

The augmented Lagrangian for the problem (1) is defined for real variables and reads as

$$L_{\rho_1,\rho_2}(u,f,\psi_1,\psi_2,\lambda_1,\lambda_2) = \|D_f\psi_1 - d\|_2^2 + 2\lambda_1^T(Ru - \psi_1) + \rho_1\|Ru - \psi_1\|_2^2 + 2\lambda_2^T(\nabla u - \psi_2) + \rho_2\|\nabla u - \psi_2\|_2^2 + \alpha\|\psi_2\|_1,$$

where $\rho_1, \rho_2 > 0$ are penalty parameters and $\lambda_1, \lambda_2$ denote dual variables. The ADMM breaks the minimization problem of the augmented Lagrangian into local sub-problems with respect to $u$, $f$, $\psi_1$, $\psi_2$. The sub-problems are then coordinated through variables $\lambda_1, \lambda_2$ to find a solution for the original problem. Specifically, the following steps are performed at every iteration $k$,

$$u^{k+1} = \operatorname*{arg\,min}_u L_{\rho_1,\rho_2}(u, f^k, \psi_1^k, \psi_2^k, \lambda_1^k, \lambda_2^k), \tag{2}$$

$$\psi_1^{k+1}, f^{k+1} = \operatorname*{arg\,min}_{\psi_1, f} L_{\rho_1,\rho_2}(u^{k+1}, f, \psi_1, \psi_2^k, \lambda_1^k, \lambda_2^k), \tag{3}$$

$$\psi_2^{k+1} = \operatorname*{arg\,min}_{\psi_2} L_{\rho_1,\rho_2}(u^{k+1}, f^{k+1}, \psi_1^{k+1}, \psi_2, \lambda_1^k, \lambda_2^k), \tag{4}$$

$$\lambda_1^{k+1} = \lambda_1^k + \rho_1(Ru^{k+1} - \psi_1^{k+1}), \qquad \lambda_2^{k+1} = \lambda_2^k + \rho_2(\nabla u^{k+1} - \psi_2^{k+1}),$$

with zeros or some adequate initial guess at $k = 0$. On every iteration, $L_{\rho_1,\rho_2}$ is minimized over each variable independently by using the most recent updates of the other variables. The dual variables are then computed with respect to the scaled sum of the consensus errors. In what follows, we show our solution approach to the sub-problems in Eqs. (2), (3), and (4). We refer to them as the tomography, alignment, and regularization sub-problems, respectively.

We express the minimization function for the tomography problem in Eq. (2) as

$$F(u) = \rho_1\|Ru - \psi_1^k + \lambda_1^k/\rho_1\|_2^2 + \rho_2\|\nabla u - \psi_2^k + \lambda_2^k/\rho_2\|_2^2.$$

The problem is solved by considering the steepest ascent direction $\nabla_u F(u)$ computed as follows,

$$\nabla_u F(u) = 2\rho_1 R^*(Ru - \psi_1^k + \lambda_1^k/\rho_1) - 2\rho_2\,\operatorname{div}(\nabla u - \psi_2^k + \lambda_2^k/\rho_2),$$

where the negative divergence operator $-\operatorname{div}$ is the adjoint to $\nabla$ and $R^*$ is the adjoint of $R$. With the steepest ascent direction, we can construct iterative schemes for solving (2) by using methods with different convergence rates. Here we employ the Conjugate Gradient (CG) method for its faster convergence rate at the expense of memory requirements. CG iterations are given as $u^{m+1} = u^m + \gamma_m\eta_m$, where $\gamma_m$ is a step length computed by a line-search procedure [37] and $\eta_m$ is the search direction that we compute by using the Dai-Yuan formula [38]. Alternatively, the tomography sub-problem can be solved by using second-order optimization methods with favorable convergence behavior. However, in practice these methods require heavy computation and memory usage, which is unfavorable for the complex ADMM scheme when processing large data sets. On each ADMM iteration, we propose solving the tomography sub-problem approximately, using only a few CG iterations, since in practice early termination substantially speeds up ADMM; see the review in [32] and Supplement 1.

The minimization functional for the alignment sub-problem in Eq. (3) can be expressed as

$$G(\psi_1, f) = \|D_f\psi_1 - d\|_2^2 + \rho_1\|Ru^{k+1} - \psi_1 + \lambda_1^k/\rho_1\|_2^2.$$

As for the tomography sub-problem, we aim at solving this problem approximately on each ADMM iteration. Therefore, we decided not to increase the problem complexity by introducing additional coordinating variables with respect to $\psi_1$ and $f$. Instead, we propose sequential updates of these variables while solving the alignment sub-problem. To find the optical flow we fix variable $\psi_1$ and solve

$$\tilde f = \operatorname*{arg\,min}_f \|D_f\psi_1 - d\|_2^2.$$

Sparse and dense optical flow estimation methods have been proposed by many authors, including Lucas and Kanade [33], Farneback [34], and Brox [35]. In this work we adopted Farneback's algorithm because it gives accurate dense flow estimation, where the density level is controlled through parameters. Moreover, the algorithm allows solving the problem approximately (for a few iterations), which is favorable for the proposed ADMM scheme and consistent with the tomography sub-problem.
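As an illustration, the sketch below estimates dense flow between a measured projection and a re-projection with OpenCV's Farneback implementation, reusing the previous flow as a warm start and running only a few iterations, as in the proposed scheme; the helper names, parameter values, window-size schedule, and warping convention are illustrative assumptions rather than the authors' actual code:

```python
import numpy as np
import cv2

def to8bit(img):
    # OpenCV's Farneback expects 8-bit single-channel images
    return cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

def estimate_flow(d_proj, psi1_proj, flow, winsize):
    # flow such that d(y, x) ~ psi1(y + flow[..., 1], x + flow[..., 0]);
    # the flow from the previous call is reused as an initial guess
    flags = cv2.OPTFLOW_USE_INITIAL_FLOW if flow is not None else 0
    return cv2.calcOpticalFlowFarneback(
        to8bit(d_proj), to8bit(psi1_proj), flow,
        pyr_scale=0.5, levels=1, winsize=winsize,
        iterations=4, poly_n=5, poly_sigma=1.1, flags=flags)

def apply_flow(psi1_proj, flow):
    # deformation operator D_f: warp psi1 so that D_f psi1 ~ d
    h, w = psi1_proj.shape
    gy, gx = np.mgrid[0:h, 0:w].astype(np.float32)
    return cv2.remap(psi1_proj.astype(np.float32),
                     gx + flow[..., 0], gy + flow[..., 1], cv2.INTER_LINEAR)

# smoke test: recover a known 3-pixel horizontal shift on a smooth image
psi1 = cv2.GaussianBlur(np.random.rand(256, 256), (0, 0), 3)
d = np.roll(psi1, 3, axis=1)
flow = None
for winsize in (256, 128, 64, 32):   # shrink the window across iterations
    flow = estimate_flow(d, psi1, flow, winsize)
print(flow[..., 0].mean())           # approximately -3 for this example
```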
At the same time, the choice of the method generally depends on the application, and other optical flow estimation algorithms can be applied as well. Farneback's algorithm generates an image pyramid, where each level has a lower resolution compared to the next (higher) level. The scheme can track points at multiple levels of resolution, starting at the lowest level, typically given by a window size parameter that is close to the image size. Increasing the number of pyramid levels enables the algorithm to handle larger displacements of points between frames; however, the number of computations also increases. In the proposed reconstruction scheme there is no need to consider many pyramid levels for tracking deformation. Variable $\psi_1$ is updated on each ADMM iteration using the current object approximation $u^{k+1}$ and dual variable $\lambda_1^k$, which are typically not very close to the final results on the first ADMM iterations. Thus at initial iterations it makes sense to find the optical flow at low resolution with a large window size, and gradually increase resolution by decreasing the window size with iterations. Moreover, since Farneback's algorithm is an iterative procedure, the optical flow on each ADMM iteration can be used as an initial guess for the flow in the next iteration. Therefore, the number of iterations in the algorithm can be kept low.

The problem with respect to variable $\psi_1$ with fixed $f$ is treated as a standard $L_2$ minimization problem with linear operators. For a consistent approach with the tomography problem, we employ the CG method to find a solution.

The minimization functional for the regularization problem (4) is given by

$$H(\psi_2) = \rho_2\|\nabla u^{k+1} - \psi_2 + \lambda_2^k/\rho_2\|_2^2 + \alpha\|\psi_2\|_1.$$

The problem can be solved explicitly by using the soft-thresholding operator [39],

$$\psi_2^{k+1} = \frac{\tilde\psi_2}{|\tilde\psi_2|}\max\!\left(|\tilde\psi_2| - \frac{\alpha}{2\rho_2},\,0\right), \qquad \tilde\psi_2 = \nabla u^{k+1} + \lambda_2^k/\rho_2,$$

where, by abuse of notation, we assume element-wise operations for dividing, taking the absolute value, and comparing with 0. We have considered a sparse prior defined in terms of the total variation, which is a common method of choice for reconstructing data from noisy measurements and from an incomplete number of measurements. Although we use TV for regularization, the proposed framework allows the use of any other regularization approach based on the experimental conditions. For instance, one can use $\|\nabla u\|_2$ for decreasing the gradient sparsity level.
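Putting the three sub-problems together, the following toy example mirrors the structure of Eqs. (2)-(4) in 2D. It is a minimal sketch assuming identity stand-ins for $R$ and $D_f$ (so step (2) uses plain gradient steps instead of CG with the Radon transform, and step (3) skips the flow estimation), not the multi-GPU solver released with the paper:

```python
import numpy as np

def grad(u):                      # forward-difference gradient with wrap-around
    return np.stack([np.roll(u, -1, 0) - u, np.roll(u, -1, 1) - u])

def neg_div(g):                   # adjoint of grad, i.e. -div
    return (np.roll(g[0], 1, 0) - g[0]) + (np.roll(g[1], 1, 1) - g[1])

def admm(d, alpha=0.1, rho1=1.0, rho2=1.0, n_outer=64, n_inner=4):
    u = np.zeros_like(d)
    psi1, lam1 = np.zeros_like(d), np.zeros_like(d)
    psi2, lam2 = grad(u), np.zeros_like(grad(u))
    for _ in range(n_outer):
        # (2) tomography sub-problem: a few gradient steps on F(u)
        for _ in range(n_inner):
            g = 2 * rho1 * (u - psi1 + lam1 / rho1) \
                + 2 * rho2 * neg_div(grad(u) - psi2 + lam2 / rho2)
            u = u - g / (2 * rho1 + 16 * rho2)   # step at 1/Lipschitz for these operators
        # (3) alignment sub-problem: flow estimation is skipped (D_f = I),
        #     leaving an exact least-squares update for psi1
        psi1 = (d + lam1 + rho1 * u) / (1 + rho1)
        # (4) regularization sub-problem: isotropic soft-thresholding
        w = grad(u) + lam2 / rho2
        a = np.maximum(np.sqrt((w ** 2).sum(0)), 1e-12)
        psi2 = w / a * np.maximum(a - alpha / (2 * rho2), 0)
        # dual updates with the consensus errors
        lam1 = lam1 + rho1 * (u - psi1)
        lam2 = lam2 + rho2 * (grad(u) - psi2)
    return u

u = admm(np.random.rand(64, 64))  # smoke test: TV-denoise a random image
```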
A multi-GPU implementation of the proposed scheme is available at https://github.com/nikitinvv/tomoalign; implementation details, convergence results, and performance tests are provided in Supplement 1.

Fig. 1. Reconstruction of a deforming synthetic object by the Conjugate Gradient method with pre-alignment before reconstruction (pCG), and by the proposed ADMM scheme with optical flow (OF): a) initial object with an example of projection data, b) deformation speed plot, c) orthogonal slices through the initial object and reconstructions, d) application of the deformation operator $D_f$ with the recovered optical flow $f$ to the re-projection data $\psi_1$: misalignment and fidelity terms with respect to the data $d$ acquired at the object states marked by colored dots in the plot from b).

In this section, we validate our approach through simulations. We compare reconstructions by the standard Conjugate Gradient solver with pre-aligned projections, referred to as pCG, and by the proposed ADMM scheme with optical flow, referred to as OF. Robustness to noise and an additional background is analyzed for both methods with the TV regularization term added. The TV-regularized versions of both methods, pCG and OF, are referred to as pCGTV and OFTV, respectively.

For validation we generated projection data for a continuously deforming synthetic object represented by a set of tube-like structures of different sizes and orientations, see Figure 1a. The amount of deformation for a normalized scanning time $t \in [0, 1]$ is proportional to $1 - \exp(-3t)$, see Figure 1b. The plot approximates the slowing down of the object deformation caused by radiation damage during X-ray scanning. The deformation was simulated as a smooth random field with a maximal displacement of 20 pixels (8% of the object size in one dimension). In total, 384 projections through the 256 × 256 × 256 object were acquired in the interlaced scanning mode with 2 object rotations.

Figure 1c compares reconstruction results for the pCG and OF methods. In all presented orthogonal slices through the reconstructed object, the OF method suppresses the deformation artifacts that appear in pCG reconstructions as blurred and corrupted ellipses. Note that the OF method recovers not the initial object state, but the state which yields a higher consistency level of the projection data. Projections in the pCG method were pre-aligned with respect to the projections from the first rotation. Instead of the conventional cross-correlation approach for alignment, we employed Farneback's algorithm with a non-dense optical flow. This approach also gives one common (x, y)-shift for all pixels in a projection, but it is more robust to noise than the cross-correlation approach. Alternatively, one could consider a dense optical flow for pre-alignment; however, this results in even worse reconstructions because interlaced projections were acquired for different angles and thus are not supposed to be the same up to small features. For the OF method, in turn, projections were aligned with respect to the re-projection estimate $\psi_1$, see Eq. (1). In Figure 1d we show how the recovered estimates $\psi_1$ differ from the data $d$ acquired at different time steps marked by colored dots in Figure 1b. We also show that application of the estimated optical flow $f$ to the re-projection $\psi_1$ minimizes the data fidelity error $\|d - D_f\psi_1\|_2$. Visualization of the optical flow is done by using a colored map where colors define flow directions, and color intensities show the amount of deformation. The iterative scheme for the pCG method was run for 128 iterations, whereas the OF scheme had 64 outer ADMM iterations and 4 inner iterations for each of the tomography and alignment sub-problems (early termination is a common practice to speed up ADMM). The optical flow density level was gradually increased with iterations by reducing window sizes in Farneback's algorithm from 256 to 32 pixels.
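The pre-alignment used for pCG can be sketched as follows: a dense Farneback flow between the reference and the current projection is reduced to one common shift by averaging. The function name and parameter values here are illustrative assumptions:

```python
import numpy as np
import cv2
from scipy import ndimage

def rigid_prealign(ref, proj):
    """Register proj onto ref with one common (x, y)-shift taken as the mean
    of a dense Farneback flow (more noise-robust than cross-correlation)."""
    def u8(a):  # OpenCV's Farneback expects 8-bit grayscale input
        return cv2.normalize(a, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    flow = cv2.calcOpticalFlowFarneback(
        u8(ref), u8(proj), None, pyr_scale=0.5, levels=4, winsize=128,
        iterations=4, poly_n=5, poly_sigma=1.1, flags=0)
    sx, sy = flow[..., 0].mean(), flow[..., 1].mean()
    # ref(y, x) ~ proj(y + sy, x + sx); shift proj back accordingly
    return ndimage.shift(proj, (-sy, -sx), order=1, mode='nearest')
```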
To check the noise robustness and the stability of the proposed method to background changes, we modified the synthetic object from Figure 1a by varying intensities of the tube-like structures, as well as adding Poisson noise and low-frequency background noise to the projection data. We also considered two levels of deformation simulated as smooth random fields with maximal displacements of 20 and 30 pixels (8% and 12% of the object size), and analyzed how the number of angular rotations in the interlaced scan affects reconstruction quality. Results are given in Figure 2, where for clarity we present cropped z-slices through the reconstructed object. There are several observations from this figure. The OF method is more robust to Poisson noise and low-frequency background noise than the pCG method. Additional TV regularization reduces high-frequency noise in OFTV reconstructions, whereas pCGTV reconstructions show undesirable cartoon effects. Next, a lower number of projections per rotation in the interlaced scanning mode is favorable for the proposed method but not always for the conventional method. One can observe straight-line artifacts in pCG reconstructions for the case with 96 projections per rotation. The artifacts appear because the alignment was done between interlaced projections that are considerably more dissimilar than in the cases with 192 and 384 projections. We also note that the proposed method partially resolved deformation issues for non-interlaced scanning (1 rotation with 384 angles). In practice, lowering the number of projections per rotation introduces additional overhead for rotating the stage, which leads to longer acquisition times and to skipping some of the deforming sample states. Although the proposed method significantly suppresses motion artifacts, the trade-off between the rotation speed and reconstruction quality depends on the sample dynamics and needs to be determined experimentally.

In the following, we show reconstructions of projection data from two nano-tomography experiments conducted with the Transmission X-ray Microscope (TXM) of sector 32-ID at the Advanced Photon Source, Argonne National Laboratory [1]. The first experimental data set consists of fragments of a mouse somatosensory (S1) cortex containing myelinated axons. The sample was stained with heavy metals (osmium, lead) [40] to increase X-ray absorption contrast in the projections. After staining, the sample was embedded in petropoxy resin to make it more X-ray resistant. In order to mitigate sample deformation caused by radiation damage, we employed the interlaced scanning protocol with low exposure times per projection. Projections were acquired in the step-scan mode at 8 keV with 0.2 s exposure time per projection. In total we measured 2880 projections over an angular range of 720 degrees. The detector assembly comprises a LuAG:Ce scintillator, a 5× long working distance Mitutoyo objective lens, and a FLIR CCD. The CCD chip consists of 2448 × 2048 pixels with a 3.45 µm pixel size. The X-ray objective lens of the TXM is a Fresnel zone plate with a 150 µm diameter and a 50 nm outermost zone width. With the CCD located 3.4 m downstream of the sample, it offers a magnification of 56 and an effective pixel size of 22.3 nm. In order to increase the statistics for each voxel, 2 × 2 binning was applied to the projections, leading to a voxel size of 44.6 nm, a value slightly below the resolving power of the Fresnel zone plate.

In Figure 3 we provide reconstruction results for the mouse brain data set by the conventional CG solver with pre-aligned projections (pCG), and by the proposed ADMM scheme with dense optical flow (OF dense) and non-dense optical flow (OF non-dense), the latter being equivalent to standard (x, y)-shifts of projections. In Figure 3a we show examples of interlaced projections, as well as the misalignment between them. Figure 3b presents orthogonal slices through reconstructions by the different methods. One can observe that the proposed OF method significantly improves reconstruction quality compared to the pCG method. At the same time, reconstruction with the dense optical flow gives sharper results than the one with the non-dense optical flow, resulting in the separation of even smaller features.
We also note that the recovered non-dense flow in fact approximately equals the average value of the dense flow, see Figure 3c. Resolution levels were estimated by computing the Fourier shell correlation (FSC) [41] between reconstructions obtained from two independent sets of 1440 projections. According to the 1/2-bit criterion, the pCG, OF non-dense, and OF dense methods demonstrate 195, 152, and 127 nm resolution levels, respectively. The corresponding FSC plots are given in Supplement 1. Projections in the OF method with dense and non-dense optical flows were aligned jointly with tomographic reconstruction. Resolution for the dense optical flow, as well as for the recovered object, was gradually increased during iterations. Specifically, we performed 96 ADMM iterations with 8 × 8 data binning, followed by 48 iterations with 4 × 4 binning, and 24 iterations with 2 × 2 binning, where ADMM variables on each binning level were extrapolated to the sizes of the next level. The optical flow was computed starting with 256 × 256, 128 × 128, and 64 × 64 window sizes in Farneback's algorithm for the three binning levels, respectively, with these window sizes decreasing by 2 on each ADMM iteration. The final iteration for the object of size 1024 × 1024 × 1024 was thus computed with projections aligned using a 12 × 12 window in Farneback's algorithm. With the proposed scheme, the total time for OF reconstruction on 4 Tesla P100 GPUs was approximately 3 hours, whereas pCG reconstruction with 128 iterations and 2 × 2 binning took 2 hours.
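For reference, a minimal sketch of how an FSC curve of this kind can be computed between two independent reconstructions; the shell binning, names, and the resolution conversion in the comment are assumptions for illustration, and the 1/2-bit threshold curve itself is omitted:

```python
import numpy as np

def fourier_shell_correlation(vol1, vol2, n_shells=None):
    """FSC between two cubic volumes of side n; shell s corresponds to the
    spatial frequency fraction s / n_shells of Nyquist."""
    n = vol1.shape[0]
    n_shells = n_shells or n // 2
    F1 = np.fft.fftshift(np.fft.fftn(vol1))
    F2 = np.fft.fftshift(np.fft.fftn(vol2))
    z, y, x = np.mgrid[:n, :n, :n] - n // 2
    shell = (np.sqrt(x**2 + y**2 + z**2) / (n // 2) * n_shells).astype(int)
    fsc = np.zeros(n_shells)
    for s in range(n_shells):
        m = shell == s
        num = np.abs(np.sum(F1[m] * np.conj(F2[m])))
        den = np.sqrt(np.sum(np.abs(F1[m])**2) * np.sum(np.abs(F2[m])**2))
        fsc[s] = num / max(den, 1e-12)
    return fsc

# Resolution is read off at the shell s_cross where fsc crosses the chosen
# criterion; one common convention: resolution = 2 * voxel_size * n_shells / s_cross
```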
The second experimental data set is related to the manufacturing of a reusable N95-grade filter medium for medical masks and respirators. Current N95 filter materials contain coarse micrometer-size fibers and rely on electrostatic charge to remove harmful contaminants from the air. Any washing of this material would destroy the electrostatic charge and thereby compromise the filter's protective nature [42]. However, during an epidemic or pandemic (like the COVID-19 pandemic), there can be a significant shortfall of medical masks. Thus, the design and manufacture of reusable filter media and masks would mark a significant advancement in protecting first-line medical personnel. Nanofiber-based media fabricated using electrospinning technology can be a good candidate in this regard due to their finer, nanometer-size fibers [43]. Nano-tomography is used to characterize this fine electrospun medium in 3D in order to understand the influence of the medium microstructure on the filtration efficiency, and to eventually optimize the microstructural configuration to achieve a high efficiency of >95%. The studied material is exclusively composed of polymer fibers showing low attenuation coefficients with hard X-rays. Therefore, Zernike phase contrast [44-46] was employed to obtain sufficient imaging contrast on this weakly absorbing material. It necessitates the use of an optical mechanism consisting of a phase ring placed in the back focal plane of the Fresnel zone plate. It translates minute variations in phase into corresponding changes in amplitude, which can be visualized as differences in image contrast. With this technique, samples with low absorption contrast can be observed and recorded in high phase contrast with sharp clarity of minute specimen detail. Projections were acquired in the interlaced step-scan mode at 8 keV with 1 s exposure time per projection. In total we measured 700 projections of size 2448 × 2048 pixels over an angular range of 2520 degrees. In order to increase the statistics for each voxel, 2 × 2 binning was applied to the projections, leading to a voxel size of 44.6 nm. Unlike with absorption contrast, the fiber material appears darker than the surrounding air because of the positive phase contrast effect, where the phase ring selectively advances the phase of the non-diffracted beam by π/2.

In Figure 4 we provide reconstruction results for the medical mask data set by the conventional CG solver with pre-alignment of projections (pCG), and by the proposed ADMM scheme with dense optical flow (OF dense). Since the total number of projection angles is low, in order to obtain noise-free results we also equipped OF reconstructions with TV regularization, yielding the OFTV dense results. Figure 4a shows examples of interlaced projections, as well as the significant misalignment between them. Figure 4b presents orthogonal slices through reconstructions by the different methods. We observe that the conventional method in this case produces completely distorted and unusable results, whereas the proposed OF method makes it possible to recover an object state acceptable for further data analysis. The additional TV regularization during reconstruction helps suppress high-frequency noise and potentially simplifies further segmentation procedures. The method uses the TV term $\alpha\|\nabla u\|_1$ (see Eq. (1)) with α = 3e-6 adjusted experimentally. Examples of recovered optical flow for different projections in Figure 4c confirm the non-rigid misalignment of the acquired projections. Although the number of projections is low, to demonstrate quantitative quality improvement we also estimated resolution levels by computing the FSC between reconstructions obtained from two independent sets of 350 projections. In this case we skipped the TV-regularization step in reconstruction to avoid correlation of high frequencies. According to the 1/2-bit criterion, the OF dense method demonstrated a 160 nm resolution level, while no frequency correlation was observed for the pCG method even at resolution levels > 500 nm, see Supplement 1. The procedure for gradually increasing resolution during iterations is the same as for the brain data set. The total time for OF reconstruction on 4 Tesla P100 GPUs was approximately 1.8 hours, whereas pCG reconstruction with 128 iterations took 1.5 hours.

In this paper, we demonstrated that projection data consistency regulated by optical flow inside the ADMM scheme can result in reconstructing one object state with fewer artifacts. In comparison to time-resolved methods, the proposed scheme does not require any prior knowledge about the object structure and motion. Moreover, the method has lower computational complexity and takes comparable time to the conventional CG solver for processing large-scale experimental data. Interlaced scanning with a low number of angles per rotation is favorable for the proposed method; however, the number of angles in real experiments should be adjusted according to the overhead of rotating the stage. The ADMM framework is easily extendable with new sub-problems and additional data processing procedures. A list of procedures that are common in experimental data processing workflows includes, for instance, data deblurring [47], and denoising and quality enhancement with machine learning [48]. Technically, implementation of new sub-problems is straightforward by adding new auxiliary variables as in Eq. (1).
Recently, we developed a joint ptycho-tomography solver where an ADMM scheme relaxes existing requirements for probe overlap and the number of angles [49]. As a follow-up step, we plan to merge the formulations and construct a generic framework for 3D ptychography that will also include alignment procedures. We also plan to develop a multi-GPU implementation on several computing nodes to further extend the computational efficiency of the framework.

This document provides supplementary information to "Distributed optimization for nonrigid nano-tomography". The supplementary material includes optimization aspects for implementing the proposed tomographic reconstruction method with alignment by using the ADMM scheme. We analyze the convergence behavior of the method on synthetic data with and without noise, for different numbers of projections per rotation, and for different numbers of inner ADMM iterations for solving the tomography and alignment sub-problems. We also provide a performance table with computational times on GPUs, and Fourier shell correlation plots for estimating the resolution of the experimental data reconstructions shown in the paper.

Convergence behavior and performance of the proposed ADMM-based reconstruction scheme were optimized with the following aspects. First, on each ADMM iteration we propose solving the tomography and alignment sub-problems approximately, using only a small number of inner iterations. In practice, this procedure yields more favorable convergence behavior than the one with many inner iterations and significantly speeds up the whole ADMM scheme [1, 2]. In Figure S1 we present convergence results for different numbers of inner CG iterations in the tomography and alignment sub-problems. The figure demonstrates convergence plots of the augmented Lagrangian function for schemes having the same computational complexity level, in the sense that the product of the total number of ADMM iterations and the number of inner CG iterations is the same for all schemes. The tests were performed for the synthetic 10% deformed data used in the paper. According to this figure, solving the problem with a low number of inner iterations (specifically with 4) demonstrates more favorable convergence speed than the other cases.

Second, we work with normalized and computationally optimized operators defined in the paper for constructing the ADMM framework. This facilitates the initialization of the ADMM weighting factors ρ1 and ρ2 in the augmented Lagrangian and optimizes line-search computations in the CG method for the tomography and alignment sub-problems. Additionally, we follow [3] and vary the penalties ρ1, ρ2 on each iteration to improve the convergence rate. We also optimize the evaluation of the Radon transform R, since in the proposed framework this operator requires a relatively large amount of computational resources. For data measured with the interlaced angular scanning protocol, the number of angles typically exceeds the object size. Therefore, instead of direct summation over lines in computing the Radon transform, we employ the Fourier-based method [4]. For $N_\theta$ projection angles and an object of size $N$ pixels in one dimension, the computational complexity $O(N^3 \log N)$ of the Fourier-based method is substantially lower than the complexity $O(N^3 N_\theta)$ of direct summation over lines.
Alternatively, the complexity level can be decreased by employing hierarchical decomposition [5] or the log-polar-based method [6].

Third, in combination with gradually increasing the optical flow resolution with iterations, we also propose a gradual increase of the recovered object resolution. Specifically, we perform a certain number of iterations for binned data and extrapolate the recovered object, together with the optical flow and additional variables, to a twice denser grid. The result is then used as an initial guess for the next set of iterations, where the data is less binned. The final object and optical flow are both found on dense grids without binning. We employed this scheme for reconstructing the mouse brain and medical mask data sets used in the paper. In Figure S2 we demonstrate Fourier shell correlation plots for reconstructions by the different methods.

The last important aspect involves the use of high-performance computing resources. Both the alignment and tomography problems are data intensive and can be parallelized with different data-partitioning methods. Specifically, data for the alignment problem can be partitioned based on the rotation angles, and then each angle/partition can be processed independently by using the deformation operator. The tomography problem shows similar properties, where parallelization can be accomplished by slicing the object across the beam direction. In this work, we use GPUs with Nvidia CUDA technology for accelerating computations. The cuFFT and NPP libraries from CUDA Toolkit 10.1 were used for computing Fourier transforms and applying the optical flow deformation with high accuracy. For optical flow estimation by using Farneback's algorithm we utilized the GPU-accelerated OpenCV library. Linear algebra operations in the ADMM schemes were implemented in Python using the GPU-accelerated CuPy library.

In Table S1 we provide computational times of 1 ADMM iteration for processing data sets of different sizes. To satisfy the Nyquist sampling criterion for tomography, we chose the number of angles as 3/2 of the object size in one dimension. There are several observations from this table. First, it confirms the computational complexities $O(N_\theta N^2)$ and $O(N^3 \log N)$ for the alignment and tomography sub-problems, respectively, thus allowing users to estimate the whole processing time for other data sizes. Second, the tomographic part of the solver is more time-consuming than the alignment part and should be used as a key factor for determining processing times. However, for the interlaced scanning protocol, where the number of angles is typically much higher than the object size in one dimension, the alignment part may become more computationally demanding and crucial in estimating processing times. Finally, we observe that the performance gain from increasing the number of GPUs is not linear; specifically, it does not exceed a factor of 2 when using 4 GPUs compared to 1 GPU. This is caused by a large number of semi-sequential CPU-GPU data transfers. We plan to address this problem, as well as perform other code optimizations, in future work.

Table S1. Computational times (in seconds) of 1 ADMM iteration with 4 inner CG iterations for the tomography and alignment sub-problems. The GPU system for testing is based on 4× NVIDIA Tesla P100.
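As a schematic of the partitioning described above, with CPU threads standing in for GPUs and hypothetical placeholder workers (process_angle, process_slab are not functions from the released code):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def process_angle(proj):   # e.g. apply the deformation operator to one projection
    return proj            # placeholder work

def process_slab(slab):    # e.g. reconstruct a chunk of slices across the beam
    return slab            # placeholder work

def run_partitioned(data, volume, n_workers=4):
    with ThreadPoolExecutor(n_workers) as ex:
        # alignment: independent per projection angle
        aligned = list(ex.map(process_angle, data))
        # tomography: independent per slab of slices along the rotation axis
        slabs = np.array_split(volume, n_workers, axis=0)
        recon = np.concatenate(list(ex.map(process_slab, slabs)), axis=0)
    return np.stack(aligned), recon
```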
References

Nanoscale 3D imaging at the advanced photon source
High-resolution non-destructive three-dimensional imaging of integrated circuits
Design, characterization, and performance of a hard x-ray transmission microscope at the National Synchrotron Light Source II 18-ID beamline
OMNY - a tOMography nano crYo stage
Quantification and modeling of mechanical degradation in lithium-ion batteries based on nanoscale imaging
Three-dimensional alteration of neurites in schizophrenia
Soft X-ray microscopy with a cryo scanning transmission X-ray microscope: II. Tomography
MISTRAL: a transmission soft X-ray microscopy beamline for cryo nano-tomography of biological samples and magnetic domains imaging
Latest developments in microtomography and nanotomography at PETRA III
Four-dimensional tomographic reconstruction by time domain decomposition
4D-CT reconstruction with unified spatial-temporal patch-based regularization
Spatial-temporal total variation regularization (STTVR) for 4D-CT reconstruction
Iterative 4D cardiac micro-CT image reconstruction using an adaptive spatiotemporal sparsity prior
Dynamic X-ray computed tomography
Evaluation of 4D-CT lung registration
Four-dimensional material movies: High speed phase-contrast tomography by backprojection along dynamically curved paths
MoVIT: a tomographic reconstruction framework for 4D-CT
Motion compensated micro-CT reconstruction for in-situ analysis of dynamic processes
Space-time tomography for continuously deforming objects
Ab initio nonrigid x-ray nanotomography
Quantifying mesoscale neuroanatomy using x-ray microtomography
Relative merits and limiting factors for x-ray and electron microscopy of thick, hydrated organic materials
Dynamic in-situ imaging of methane hydrate formation and self-preservation in porous media
Towards automatic electron tomography II. Implementation of autofocus and low-dose procedures
Fiducial-less alignment of cryo-sections
Alignator: a GPU powered software package for robust fiducial-less alignment of cryo tilt-series
Automatic coarse-alignment for TEM tilt series of rod-shaped specimens collected with a full angular range
Rapid alignment of nanotomography data using joint iterative reconstruction and reprojection
Automated angular and translational tomographic alignment and application to phase-contrast imaging
Alignment methods for nanotomography with deep subpixel accuracy
TIMBIR: A method for time-space reconstruction from interlaced views
Distributed optimization and statistical learning via the alternating direction method of multipliers
An iterative image registration technique with an application to stereo vision
Two-frame motion estimation based on polynomial expansion
Large displacement optical flow: descriptor matching in variational motion estimation
An introduction to continuous optimization for imaging
Numerical optimization
A nonlinear conjugate gradient method with a strong global convergence property
De-noising by soft-thresholding
Large-volume en-bloc staining for electron microscopy-based connectomics
Fourier shell correlation threshold criteria
Effect of decontamination on the filtration efficiency of two filtering facepiece respirator models
Electrospinning and Electrospun Nanofibers: Methods, Materials, and Applications
Das Phasenkontrastverfahren bei der mikroskopischen Beobachtung
Amplitude and phase contrast in x-ray microscopy
X-ray nanocomputed tomography in Zernike phase contrast for studying 3D morphology of Li-O2 battery electrode
Deconvolution methods for image deblurring in optical coherence tomography
TomoGAN: low-dose synchrotron x-ray tomography with generative adversarial networks: discussion
Photon-limited ptychography of 3D objects via Bayesian reconstruction
Joint ptycho-tomography reconstruction through alternating direction method of multipliers

Supplement 1 references

Photon-limited ptychography of 3D objects via Bayesian reconstruction
Distributed optimization and statistical learning via the alternating direction method of multipliers
On the fast Fourier transform of functions with singularities
Fast tomographic reconstruction via rotation-based hierarchical backprojection
Fast algorithms and efficient GPU implementations for the Radon transform and the backprojection operator represented as convolution operators

The authors declare no conflicts of interest. See Supplement 1 for supporting content.