key: cord-0058360-emzuggzw
title: Deep Learning Based Registration Using Spatial Gradients and Noisy Segmentation Labels
authors: Estienne, Théo; Vakalopoulou, Maria; Battistella, Enzo; Carré, Alexandre; Henry, Théophraste; Lerousseau, Marvin; Robert, Charlotte; Paragios, Nikos; Deutsch, Eric
date: 2021-02-23
journal: Segmentation, Classification, and Registration of Multi-modality Medical Imaging Data
DOI: 10.1007/978-3-030-71827-5_11

Abstract. Image registration is one of the most challenging problems in medical image analysis. In recent years, deep learning based approaches have become quite popular, providing fast and well-performing registration strategies. In this short paper, we summarise our work presented at the Learn2Reg challenge 2020. The main contributions of our work rely on (i) a symmetric formulation, predicting the transformations from source to target and from target to source simultaneously, enforcing the trained representations to be similar, and (ii) the integration of a variety of publicly available datasets, used both for pretraining and for augmenting segmentation labels. Our method reports a mean Dice of 0.64 for task 3 and 0.85 for task 4 on the test sets, taking third place in the challenge. Our code and models are publicly available at https://github.com/TheoEst/abdominal_registration and https://github.com/TheoEst/hippocampus_registration.

In the medical field, the problem of deformable image registration has been studied heavily for many years. The problem relies on establishing the best dense voxel-wise transformation (Φ) to warp one volume (source or moving, M) so that it matches another volume (reference or fixed, F) as well as possible. Traditionally, many different formulations and approaches have been proposed to address the problem [17].
However, with the recent advances of deep learning, learning based methods have become very popular, providing efficient, state-of-the-art performance [9]. Even though there is a lot of work in the field of image registration, many challenges remain to be addressed. In order to address these challenges and provide common datasets for benchmarking learning based [5, 19] and traditional methods [1, 10], the Learn2Reg challenge was organised [4]. Four tasks were proposed by the organisers, covering different organs and modalities. In this work, we focused on two of them: CT abdominal registration (task 3) and MRI hippocampus registration (task 4). We propose a learning based method that learns how to obtain spatial gradients, in a similar way to [6, 18]. The main contributions of this work rely on (i) enforcing the same network to predict both the Φ_{M→F} and Φ_{F→M} deformations using the same encoding, implicitly enforcing it to be symmetric, and (ii) integrating noisy labels from different organs during training, to fully exploit publicly available datasets. In the following sections, we briefly summarise these two contributions and present the results that earned our method third place in the Learn2Reg challenge 2020 (second for task 3 and third for task 4).

An overview of our proposed framework is presented in Fig. 1. Our method uses as backbone a 3D U-Net [3] based architecture, whose encoder part (E) consists of 4 blocks with 64, 128, 256 and 512 channels. Each block consists of a normalisation layer, a Leaky ReLU activation, 3D convolutions with a kernel size of 3 × 3 × 3, and a convolution with kernel size and stride 2 to reduce the spatial resolution. Each of the F, M volumes passes independently through the encoder part of the network.
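As a rough illustration of the encoder described above, the feature-map shapes for an input patch can be traced block by block; the channel widths and the stride-2 downsampling come from the text, while everything else (channel-first layout, exact padding behaviour) is a simplifying assumption of this sketch:

```python
# Sketch: trace feature-map shapes through the 4-block encoder described
# in the text (64, 128, 256, 512 channels; each block ends with a stride-2
# convolution that halves the spatial resolution). Shape bookkeeping only.

def encoder_shapes(in_shape, widths=(64, 128, 256, 512)):
    """Return the (channels, depth, height, width) after each encoder block."""
    d, h, w = in_shape
    shapes = []
    for c in widths:
        d, h, w = d // 2, h // 2, w // 2  # stride-2 conv halves each axis
        shapes.append((c, d, h, w))
    return shapes

if __name__ == "__main__":
    for i, s in enumerate(encoder_shapes((144, 144, 144)), start=1):
        print(f"block {i}: {s}")
```

For the 144 × 144 × 144 patches used in task 3, the deepest encoder features end up at 9 × 9 × 9 with 512 channels.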
Their encodings are then merged using a subtraction operation before passing through the decoder (D) part, which predicts the optimal spatial gradients ∇Φ of the deformation field. We obtain the deformation field Φ from its gradients by integration, which we approximate with a cumulative summation. Φ is then used to obtain the deformed volume, together with its segmentation mask, by warping: M_warp = W(M, Φ_{M→F}). Finally, we apply deep supervision to train our network, in a way similar to [13].

Symmetric Training. Even if our grid formulation constrains the spatial gradients to avoid self-crossings along each of the x, y, z axes, our formulation is not diffeomorphic. This means that we cannot calculate the inverse transformation of Φ_{M→F}. To deal with this problem, we predict both Φ_{M→F} and Φ_{F→M} and use both for the optimisation of our network. Other methods, such as [8, 12], explore similar concepts, however using a different network for each deformation. Thanks to our fusion strategy in the encoding part, our approach is able to learn both transformations with fewer parameters. In particular, our spatial gradients are obtained by:

∇Φ_{M→F} = D(E(M) − E(F)) and ∇Φ_{F→M} = D(E(F) − E(M)).

Pretraining and Noisy Labels. Supervision has been shown to boost the performance of learning based registration methods by integrating implicit anatomical knowledge during the training procedure. For this reason, in this study we investigate ways to use publicly available datasets to boost performance. We exploit information from the publicly available KiTS 19 [11], Medical Segmentation Decathlon (sub-cohorts Liver, Spleen, Pancreas, Hepatic Lesion and Colon) [16] and TCIA Pancreas [7, 15] datasets. In particular, we trained a 3D U-Net segmentation network on 11 different organs (spleen, right and left kidney, liver, stomach, pancreas, gallbladder, aorta, inferior vena cava, portal vein and oesophagus).
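The integration-by-cumulative-summation step, and the resulting warp, can be sketched in NumPy for a single axis. This is an illustration, not the authors' implementation: how the decoder output is mapped to positive gradients, and the nearest-neighbour sampling used here, are both simplifying assumptions.

```python
import numpy as np

def integrate_gradients(grad):
    """Approximate Phi as the integral of its spatial gradient via
    cumulative summation. If grad > 0 everywhere, the resulting mapping
    is strictly increasing, so it cannot self-cross along this axis."""
    return np.cumsum(grad)

def warp_1d(moving, phi):
    """Nearest-neighbour warp: sample the moving signal at positions phi,
    clipped to the valid index range."""
    idx = np.clip(np.round(phi).astype(int), 0, len(moving) - 1)
    return moving[idx]

# Positive spatial gradients (assumed to come from the decoder output
# through some positivity-enforcing mapping, e.g. softplus).
grad = np.array([0.5, 0.5, 2.0, 2.0, 1.0])
phi = integrate_gradients(grad)  # -> [0.5, 1.0, 3.0, 5.0, 6.0]
moving = np.array([10, 20, 30, 40, 50])
warped = warp_1d(moving, phi)
```

A gradient equal to 1 everywhere would reproduce the identity grid; values above or below 1 locally stretch or compress the volume.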
To harmonise the information at our disposal for each dataset, we optimised the Dice loss only on the organs available in that dataset. The network was then used to provide labels for the 11 organs for approximately 600 abdominal scans. These segmentation masks were further used for pretraining our registration network for task 3. Furthermore, we explored pretraining the registration networks on large domain-specific datasets. In particular, for task 3, the ensemble of publicly available datasets, together with their noisy segmentation masks, was used to pretrain our registration network, after a short preprocessing that included an affine registration step using Advanced Normalization Tools (ANTs) [2] and isotropic resampling to 2 mm voxel spacing. Moreover, for task 4, we performed an unsupervised pretraining using approximately 750 T1 MRIs from the OASIS 3 dataset [14], without segmentations. For both tasks, the pretraining was performed for 300 epochs.

To train our network, we used a combination of multiple loss functions. The first was a reconstruction loss L_sim, optimising a similarity function over the intensity values of the medical volumes; depending on the experiment, we used either the mean square error or the normalised cross-correlation between the warped image M_warp and the fixed image F. The second loss integrated anatomical knowledge by optimising the Dice coefficient between the warped segmentation and the segmentation of the fixed volume: L_sup = Dice(M_warp_seg, F_seg). Finally, a regularisation loss was integrated to enforce smoothness of the displacement field by keeping it close to zero deformation: L_smo = ||∇Φ_{M→F}||. These losses composed our final optimisation objective, L = α L_sim + β L_sup + γ L_smo, calculated for both ∇Φ_{M→F} and ∇Φ_{F→M}, where α, β and γ are manually defined weights. The network was optimised using the Adam optimiser with a learning rate of 1e-4.
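A minimal NumPy version of this objective can be sketched as follows; the MSE similarity matches one of the options described above, while expressing the supervision as 1 − Dice, the Dice smoothing constant, and the mean-absolute form of the smoothness term are assumptions made for this sketch:

```python
import numpy as np

def sim_loss(warped, fixed):
    """L_sim: mean squared error between warped moving and fixed volumes."""
    return np.mean((warped - fixed) ** 2)

def dice_loss(warped_seg, fixed_seg, eps=1e-6):
    """L_sup expressed as a loss: 1 - Dice on binary masks (eps avoids 0/0)."""
    inter = np.sum(warped_seg * fixed_seg)
    return 1.0 - (2.0 * inter + eps) / (warped_seg.sum() + fixed_seg.sum() + eps)

def smooth_loss(grad_phi):
    """L_smo: keep the predicted spatial gradients close to zero deformation."""
    return np.mean(np.abs(grad_phi))

def total_loss(warped, fixed, warped_seg, fixed_seg, grad_phi,
               alpha=1.0, beta=1.0, gamma=0.01):
    """Weighted sum of the three terms, with the task 3 weights as defaults."""
    return (alpha * sim_loss(warped, fixed)
            + beta * dice_loss(warped_seg, fixed_seg)
            + gamma * smooth_loss(grad_phi))
```

With identical images, perfectly overlapping masks and zero gradients, the total loss is (up to eps) zero, as expected for a converged registration.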
Regarding implementation details, for task 3 we used a batch size of 2 with a patch size of 144 × 144 × 144, due to memory limitations. Our normalisation strategy included the extraction of three CT windows, all of which are used as additional channels, followed by min-max normalisation to the range (0, 1). For these experiments we did not use any data augmentation, and we set α = 1, β = 1 and γ = 0.01. The network was trained on 2 Nvidia Tesla V100 GPUs with 16 GB memory, for 300 epochs, taking ≈12 h. For task 4, the batch size was set to 6, with patches of size 64 × 64 × 64, and data augmentation was performed with random flips, rotations and translations. Our normalisation strategy in this case included N(0, 1) normalisation, clipping of values outside the range [−5, 5], and min-max normalisation to the range (0, 1). The weights were set to α = 1, β = 1 and γ = 0.1, and the network was trained on 2 Nvidia GeForce GTX 1080 GPUs with 12 GB memory, for 600 epochs, taking ≈20 h. The segmentation network used to produce the noisy segmentations was a 3D U-Net trained with batch size 6, learning rate 1e-4, Leaky ReLU activations, instance normalisation layers and random crops of size 144 × 144 × 144. During inference, we kept the ground-truth segmentations of the organs when available, applied a normalisation based on connected components, and checked each segmentation manually to remove outlier results.

For each task, we performed an ablation study to evaluate the contribution of each component, and for task 3 we performed a supplementary experiment integrating the noisy labels during pretraining. The evaluation was performed in terms of Dice score, the 30% lowest Dice scores, Hausdorff distance and the standard deviation of the log Jacobian. These metrics evaluate the accuracy and robustness of the method, as well as the smoothness of the deformation. Our results are summarised in Table 1, while some qualitative results are presented in Fig. 2.
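The three-window CT normalisation used for task 3 can be sketched as follows; the specific Hounsfield-unit window ranges are illustrative placeholders, since the text does not say which three windows were extracted:

```python
import numpy as np

# Hypothetical Hounsfield-unit windows (lo, hi); the paper does not specify
# which three windows were used, so these are placeholders for illustration.
WINDOWS = [(-150, 250), (-50, 150), (100, 1000)]

def ct_to_channels(volume, windows=WINDOWS):
    """Clip the CT volume to each intensity window and min-max rescale it
    to (0, 1), stacking the results as channels (channel-first layout)."""
    channels = []
    for lo, hi in windows:
        clipped = np.clip(volume, lo, hi)
        channels.append((clipped - lo) / (hi - lo))
    return np.stack(channels, axis=0)
```

Each window emphasises a different intensity range of the same scan, so the network receives three complementary views of the anatomy as input channels.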
For inference on the test set, we used our model trained on both the training and validation datasets. Concerning computational time, our approach needs 6.21 s and 1.43 s per inference for tasks 3 and 4 respectively. This is slower than other participants in the challenge, probably due to the size of our deep network, which has around 20 million parameters. Concerning task 3, one can observe a significant performance boost when the pretraining with noisy labels is integrated. Due to the challenging nature of this registration problem, the impact of the symmetric training was not as high on any of the metrics. On the other hand, for task 4, the symmetric component combined with the pretraining boosted the robustness of the method, while the pretraining had a lower impact than on task 3. One possible explanation is that, for this task, the number of provided volumes in combination with the nature of the problem was sufficient to train a learning based registration method.

In this work, we summarised our method, which took 3rd place in the Learn2Reg challenge, participating in tasks 3 and 4. Our formulation is based on spatial gradients and explores the impact of symmetry, pretraining and the integration of publicly available datasets. In the future, we aim to further explore symmetry in our method and to investigate ways in which our formulation could hold diffeomorphic properties. Finally, we also want to explore adversarial training, in order to deal with multimodal registration.
References

1. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain
2. Advanced normalization tools (ANTS)
3. 3D U-Net: learning dense volumetric segmentation from sparse annotation
4. Learn2Reg - the challenge
5. Unsupervised learning for fast probabilistic diffeomorphic registration
6. Deep learning-based concurrent brain registration and tumor segmentation
7. Automatic multi-organ segmentation on abdominal CT with dense V-networks
8. End-to-end unsupervised cycle-consistent fully convolutional network for 3D pelvic CT-MR deformable registration
9. Deep learning in medical image registration: a survey
10. MRF-based deformable registration and ventilation estimation of lung CT
11. The KiTS19 challenge data: 300 kidney tumor cases with clinical context, CT semantic segmentations, and surgical outcomes
12. Unsupervised deformable image registration using cycle-consistent CNN
13. Learning a probabilistic model for diffeomorphic registration
14. Open access series of imaging studies: longitudinal MRI data in nondemented and demented older adults
15. Data from Pancreas-CT. The Cancer Imaging Archive
16. A large annotated medical image dataset for the development and evaluation of segmentation algorithms
17. Deformable medical image registration: a survey
18. Linear and deformable image registration with 3D convolutional neural networks
19. A deep learning framework for unsupervised affine and deformable image registration