key: cord-0058322-cvx8mefi authors: Shusharina, Nadya; Bortfeld, Thomas; Cardenas, Carlos; De, Brian; Diao, Kevin; Hernandez, Soleil; Liu, Yufei; Maroongroge, Sean; Söderberg, Jonas; Soliman, Moaaz title: Cross-Modality Brain Structures Image Segmentation for the Radiotherapy Target Definition and Plan Optimization date: 2021-02-23 journal: Segmentation, Classification, and Registration of Multi-modality Medical Imaging Data DOI: 10.1007/978-3-030-71827-5_1 sha: a6737e67b7e523534a8a15d8fafb1ba73850857d doc_id: 58322 cord_uid: cvx8mefi This paper summarizes the results of the International Challenge "Anatomical Brain Barriers to Cancer Spread: Segmentation from CT and MR Images" (ABCs), organized in conjunction with the MICCAI 2020 conference. Eighteen segmentation algorithms were trained on a set of 45 CT, T1-weighted MR, and T2-weighted FLAIR MR post-operative images of glioblastoma and low-grade glioma patients. Manual delineations were provided for the brain structures falx cerebri, tentorium cerebelli, transverse and sagittal brain sinuses, ventricles, and cerebellum (Task 1), and for the brainstem, the structures of the visual pathway (optic chiasm, optic nerves, and eyes), the structures of the auditory pathway (cochlea), and the lacrimal glands (Task 2). The algorithms were tested on a set of 15 cases and received their final score for predicting segmentations on a separate 15-case image set. Multi-rater delineations with seven raters were obtained for three cases. The results suggest that neural-network-based algorithms have become a successful technique for brain structure segmentation and closely approach human performance in segmenting specific brain structures. With evolving technology, the delivery of radiation dose to the treatment target has become very conformal, with uncertainties as small as 1 mm. Therefore, the definition of the target itself, and its consistency, have become critical.
Radiation oncologists define the target for irradiation as the gross disease visible on medical images, plus a margin that accounts for microscopic "invisible" spread into surrounding tissues. This margin defines the boundary of the clinical target volume (CTV). Recent studies suggested that in some very conformal treatments, a higher risk of treatment failure could be attributed to an inadequately or inaccurately defined CTV [1]. While the CTV cannot be defined by thresholding image intensity, its boundary can be determined by anatomical structures that are natural barriers to tumor spread. For gliomas, the most common brain tumors, the falx cerebri, tentorium cerebelli, brain sinuses, and ventricles are known to restrict this spread; these structures therefore define the CTV boundary [2]. In current practice, the CTV is manually delineated on the CT image acquired for radiotherapy treatment planning to deliver a high curative radiation dose. Correct placement of the CTV boundary can spare a measurable amount of brain tissue from radiation [3], which in turn reduces the risk of post-treatment neurocognitive deficit [4]. Identifying neuroanatomy on CT is challenging because of the low soft-tissue image contrast and the complex 3D shapes of brain structures. This makes target delineation guidelines difficult to follow and often results in unnecessary radiation exposure of tumor-free tissue. Undoubtedly, accurate segmentation of the barrier structures will improve the definition of the CTV boundary, leading to better treatment outcomes. The quality of the treatment plan is determined not only by the precise placement of the CTV boundary but also by the accurate delineation of healthy structures that must be spared from receiving the radiation dose.
The plans for glioma patients must include delineations of the brainstem, the structures of the visual pathway (optic chiasm, optic nerves, eyes, and lenses), the structures of the auditory pathway (cochlea), and the lacrimal glands. These structures are routinely outlined manually for each treatment plan; automation will therefore improve both the efficiency of the plan-creation workflow and the consistency of the structure definitions. In current clinical practice, in addition to the CT scan used for treatment planning, each patient diagnosed with glioma and prescribed radiotherapy undergoes diagnostic post-Gadolinium T1-weighted MR and T2-weighted FLAIR MR imaging. Each of these modalities captures different physical properties and therefore contributes complementary information relevant to image segmentation. To address the challenges of anatomy segmentation, multiple groups have proposed combining CT images with information from MR scans [5, 6]. Chartsias et al. [7] used a generative adversarial network (GAN) approach to produce synthetic MR images from CT images from an unpaired training set of CT and MR images. The synthetic images were then used for data augmentation when training a segmentation algorithm. The GAN approach was also used in [8] to develop a cross-modality deep learning segmentation algorithm, where GAN-generated pseudo-MR images were used for segmentation in conjunction with real CT. In these papers, the CT and MR data were unpaired, reflecting the difficulty of obtaining matched CT and MR images for a substantial number of cases. Robust segmentation algorithms that gracefully handle missing imaging modalities have been demonstrated, using an abstraction layer [9] or channel dropout during training [10]. In conjunction with the international conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), we organized the challenge "Anatomical Brain Barriers to Cancer Spread: Segmentation from CT and MR Images" (ABCs).
The goal of the challenge was to identify the best methods of segmenting the brain structures that serve as barriers to the spread of brain cancers and the structures to be spared from irradiation, for use in computer-assisted target definition for glioma and in radiotherapy plan optimization. For the challenge, we compiled a large image dataset acquired for patients diagnosed with glioblastoma and low-grade glioma who underwent surgery and radiotherapy treatment at Massachusetts General Hospital. Ground truth manual delineations of the anatomical brain structures were provided to developers to train their algorithms. The algorithms were optimized through testing against unseen manual delineations during the challenge, and the final predictions were submitted as a one-time inference on a separate, also unseen, set of cases. This paper summarizes the results of the ABCs challenge. The imaging data consisted of 75 cases of glioblastoma and low-grade glioma patients treated with surgery and adjuvant radiotherapy at Massachusetts General Hospital. The patients underwent routine post-surgical MRI examination with two MR sequences, contrast-enhanced 3D T1 and 2D multislice T2 FLAIR, required to define target volumes for radiotherapy treatment. CT scans were acquired after diagnostic imaging for use in radiotherapy treatment planning. The within-slice resolution (in mm) of the CT images was 0.7 × 0.7 (63%), 0.6 × 0.6 (27%), or 0.8 × 0.8, 0.5 × 0.5, or 1.0 × 1.0 (10%), with a slice thickness of 2.5 mm. The resolution of the T1 MR images was 1.0 × 1.0 (50%), 0.5 × 0.5 (25%), 0.9 × 0.9 (22%), or 1.1 × 1.1 mm (3%). The slice thickness for the majority of these images was 1.0 mm (75%), while the rest were between 0.5 and 2.0 mm. For the majority of the T2 MR images, the resolution was 0.9 × 0.9 mm (74%) and varied between 0.4 × 0.4 mm and 1.0 × 1.0 mm for the rest. The slice thickness was typically 6 mm (92%), although several images had thicknesses of 1.0, 5.0, 6.5, or 7.0 mm.
The challenge was divided into two tasks: Task 1 required segmentation of the brain structures relevant for radiotherapy target definition, and Task 2 required segmentation of the structures to be included in treatment plan optimization to minimize radiation dose to adjacent healthy organs. For both tasks, all case image sets were manually delineated to create a set of non-overlapping structures. For Task 1, the segmentation was done by one annotator and approved by a neuro-anatomist. For Task 2, multiple annotators, following the same annotation protocol, performed the segmentation. For Task 1, ground truth labeling was obtained by manual delineation of the brain structures: falx cerebri, tentorium cerebelli, transverse and sagittal brain sinuses, ventricles, and cerebellum. As these structures are best seen on MR images (see Fig. 1), delineation was performed on the CT and T1-weighted MR image fusion using MIM Maestro software v.6 (MIM Software Inc, Cleveland, OH, USA). We ensured that adjacent structures did not overlap and that all structures were continuous, without holes, in each of the three planes. For Task 2, the brainstem, optic chiasm, optic nerves, eyes, lacrimal glands, and cochlea had previously been manually delineated by certified clinical personnel and approved by the radiation oncologist for treatment plan optimization (see Fig. 2). These structures were manually outlined on the planning CT scan used to create the plan. For each case, the two MR images were co-registered, since they were acquired sequentially during the same short imaging session. The planning CT and diagnostic MR images were aligned using rigid registration with 6 degrees of freedom. All images were resampled to a resolution of 1.2 mm in each direction and cropped to a size of 164 × 194 × 142 voxels. The cropped volume was centred at the point at the top of the brainstem.
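The resampling and cropping steps described above can be sketched in a few lines. The following is a minimal illustration using NumPy/SciPy, not the organizers' actual pipeline; the helper names `resample_isotropic` and `crop_around` are hypothetical, and registration is assumed to have been done beforehand.

```python
import numpy as np
from scipy.ndimage import zoom

def resample_isotropic(vol, spacing, target=1.2):
    """Resample a volume to isotropic voxels (1.2 mm in the challenge)."""
    factors = [s / target for s in spacing]   # old spacing / new spacing
    return zoom(vol, factors, order=1)        # linear interpolation

def crop_around(vol, center, size=(164, 194, 142)):
    """Crop a fixed-size box centred on a point (here, the top of the
    brainstem), zero-padding where the box extends past the volume."""
    padded = np.pad(vol, [(s, s) for s in size], mode="constant")
    start = [c + s - s // 2 for c, s in zip(center, size)]
    sl = tuple(slice(st, st + s) for st, s in zip(start, size))
    return padded[sl]
```

Padding by the full box size on each axis before slicing guarantees the fixed-size crop never runs off the array, whichever centre point is chosen.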
To assess the accuracy of auto-delineation we used two metrics, the volumetric Dice Similarity Coefficient (DSC) and the surface Dice Similarity Coefficient (SDSC). The volumetric DSC is a voxel-wise measure of the overlap of two binary image regions; it normalizes the size of the overlap to the average size of the two structures:

DSC = 2 |V_m ∩ V_a| / (|V_m| + |V_a|),

where V_m is the set of voxels of structure m (manual ground truth) and V_a is the set of voxels of structure a (automated segmentation). The SDSC metric assesses the distance between two surfaces relative to a given tolerance τ, see [11], providing a measure of agreement between the borders of the manually and automatically segmented structures:

SDSC = (|S_m ∩ B_a^τ| + |S_a ∩ B_m^τ|) / (|S_m| + |S_a|),

where S_m and S_a are the surfaces of structures m and a, B_m^τ and B_a^τ are the border regions of thickness τ around the surfaces of structures m and a, and S_m ∩ B_a^τ is the part of the surface S_m in which every voxel is no farther than τ from S_a. SDSC ranges from 0 to 1; its complement, 1 − SDSC, represents the fraction of the structure border that has to be manually corrected because it deviates from the ground truth by more than the acceptable distance defined by the tolerance τ. The performance of the algorithms was evaluated using SDSC with a tolerance τ = 2 mm. For the challenge participants, the submission score was calculated as the mean of the DSC and SDSC of the predictions over all structures and all cases. Participants were solicited through the MICCAI 2020 satellite events announcements. They were required to register on the challenge website (https://abcs.mgh.harvard.edu) to download the manually annotated training data. After a training period of 8 weeks, the test data was released, consisting of 15 imaging sets for the participants to tune their algorithms. During the testing period of three months, 18 teams submitted their predictions to be listed on the public leaderboard.
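The two metrics can be approximated voxel-wise in a few lines of NumPy/SciPy. This is a sketch, not the evaluation code used in the challenge (which followed the surface-DSC definition of [11]); surfaces are approximated here as the set of voxels removed by a one-voxel erosion.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dsc(v_m, v_a):
    """Volumetric Dice: 2|Vm ∩ Va| / (|Vm| + |Va|)."""
    v_m, v_a = v_m.astype(bool), v_a.astype(bool)
    denom = v_m.sum() + v_a.sum()
    return 2.0 * (v_m & v_a).sum() / denom if denom else 1.0

def sdsc(v_m, v_a, spacing, tol=2.0):
    """Surface Dice at tolerance tol (mm): the fraction of each surface
    lying within tol of the other structure's surface."""
    v_m, v_a = v_m.astype(bool), v_a.astype(bool)
    s_m = v_m & ~binary_erosion(v_m)   # surface voxels of m
    s_a = v_a & ~binary_erosion(v_a)   # surface voxels of a
    # distance (in mm, via the voxel spacing) to the nearest surface voxel
    d_m = distance_transform_edt(~s_m, sampling=spacing)
    d_a = distance_transform_edt(~s_a, sampling=spacing)
    close = (d_a[s_m] <= tol).sum() + (d_m[s_a] <= tol).sum()
    denom = s_m.sum() + s_a.sum()
    return close / denom if denom else 1.0
```

With this formulation a one-voxel misplacement of a border still scores SDSC = 1 at τ = 2 mm and 1 mm voxels, while the volumetric DSC is already penalized, which is exactly why the challenge reported both metrics.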
To run the challenge, we created a computational platform for automated evaluation of the predictions submitted by the challenge participants. The automated Python-scripted workflow consisted of calculating the algorithm performance metrics, scoring, and ranking the predictions. Two weeks prior to the MICCAI conference, the final test set of 15 unseen image sets was released. Of the eighteen teams that submitted the first test results, ten teams submitted their final predictions to be ranked competitively. Participants entering the final ranking were asked to submit a six-page summary paper describing their algorithms; of the ten submissions, five were invited to publish their full-length papers in these post-conference proceedings.

Fig. 3. Accuracy metrics, DSC and SDSC at the tolerance τ = 2 mm, of inter-rater variation characterizing agreement between each of the seven raters and the majority vote contour.

In order to benchmark the accuracy of the algorithms, we conducted an inter-institutional inter-rater variability study for manual delineation of all 15 structures on three randomly selected cases. Seven raters from Massachusetts General Hospital and MD Anderson Cancer Center were involved: five clinicians, one medical dosimetrist, and one radiographer. The raters were provided with guidelines for contouring each of the 15 structures on the 3 sample cases, along with manual contours for review prior to contouring. The seven contours per structure were collected and processed to perform pair-wise comparison to each other and to the voxel-wise majority vote delineation using DSC and SDSC at the tolerance τ = 2 mm. Variability among the raters was measured by the mean of the DSC and SDSC between each rater and the majority vote delineation over all cases and all structures (see Fig. 3 for the two tasks). The range of DSC was (0.78, 0.881) for Task The accuracy metrics obtained by pairwise analysis, comparing the mean agreement among all raters for all cases for each individual structure (see Fig.
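The voxel-wise majority vote used as the inter-rater reference, and the overall submission score, are both simple to express. A sketch under the assumption that rater masks and per-structure metric results are NumPy arrays; the helper names are hypothetical and this is not the challenge platform's code.

```python
import numpy as np

def majority_vote(masks):
    """Voxel-wise majority vote over binary rater masks: a voxel is
    foreground when a strict majority of raters marked it (4 of 7
    in the inter-rater study)."""
    stack = np.stack([m.astype(bool) for m in masks])
    return stack.sum(axis=0) >= (len(masks) + 1) // 2

def submission_score(dsc_table, sdsc_table):
    """Challenge score: mean of DSC and SDSC over all structures and
    all cases; tables have shape (n_cases, n_structures)."""
    return 0.5 * (np.mean(dsc_table) + np.mean(sdsc_table))
```

Averaging over structures and cases first, then over the two metrics, weights every structure equally regardless of its size, so small structures such as the optic chiasm influence the leaderboard as much as the cerebellum.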
4 for the two tasks) provided a baseline for automated segmentation algorithm performance. Over 160 segmentation predictions were collected from the 18 participating teams during the three-month algorithm optimization period. The submitted results were compared according to the mean of the DSC and SDSC between predictions and manual delineations. The results are compiled in Table 2 and presented in Figs. 5 and 6, where the results of the final test are also shown. In this paper we presented the results of the ABCs challenge organized in conjunction with the MICCAI 2020 conference, which was set up to identify the best algorithms for automated segmentation of the brain structures used to define the clinical target volume (CTV) for radiotherapy treatment of glioma patients and the structures used for treatment plan optimization. Accurate placement of the CTV boundary is key to the success of the treatment. For highly conformal dose delivery techniques, the treatment setup uncertainties are as low as 1.7 mm and do not exceed 3.5 mm [22]. The uncertainty of defining the CTV boundary should not be larger than the targeting uncertainties. Therefore, the clinically acceptable accuracy of auto-segmentation, defined by the SDSC, should be at least 0.95 at the tolerance τ = 2 mm. The challenge showed that deep learning algorithms with multi-modality inputs can generate high-quality segmentations. Specifically, four of the five brain structures, tentorium cerebelli (0.95), brain sinuses (0.96), ventricles (0.96), and cerebellum (0.98), were segmented with clinically acceptable accuracy. For the falx cerebri (0.93), the accuracy is very close to that threshold. As the degree of automation in routine clinical practice increases, rigorous evaluation of algorithm performance and extensive discussions between developers, clinicians, and regulatory authorities will become more important [23].
Although specific approval criteria are still under development, it is likely that they will include algorithmic segmentations being statistically indistinguishable from those of human experts. In the ABCs challenge, the algorithms for Task 1 demonstrated remarkable consistency of DSC on the final test (Fig. 5). For the ventricles and sinuses, the leading algorithms showed a mean and variance of the DSC indistinguishable from those of the seven human experts. For the falx cerebri, tentorium cerebelli, and cerebellum, the algorithms were extremely consistent, unlike the human experts. As the ground truth segmentations for the Task 1 structures were performed by a single expert, one cannot exclude that the algorithms could have learned and imitated this person's approach to segmentation. However, a similar pattern was also observed in Task 2 (Fig. 6), where ground truth segmentations were made by several independent experts (different from those in the inter-rater study) in routine clinical practice. Automated segmentations of the brainstem and optic chiasm were extremely consistent and showed much lower variance than the seven human experts. For most of the other structures, the average DSC over the 15 final test cases (red squares) fell within the range of the variance between the human experts (cyan boxes). That said, further work is needed to quantify the variation in human expert segmentations for a larger number of cases and a broader representation of experts from different institutions. Taken together, the results of the ABCs challenge suggest that neural-network-based algorithms have become a successful technique for brain structure segmentation and closely approach human performance in segmenting specific brain structures.
References

1. Conformal arc radiotherapy for prostate cancer: increased biochemical failure in patients with distended rectum on the planning computed tomogram despite image guidance by implanted markers
2. ESTRO-ACROP guideline "target delineation of glioblastomas"
3. NRG brain tumor specialists consensus guidelines for glioblastoma contouring
4. Dose-dependent cortical thinning after partial brain irradiation in high-grade glioma
5. Multi-modal learning from unpaired images: application to multi-organ segmentation in CT and MRI
6. Unsupervised domain adaptation via disentangled representations: application to cross-modality liver segmentation
7. Adversarial image synthesis for unpaired multi-modal cardiac data
8. Integrating cross-modality hallucinated MRI with CT to aid mediastinal lung tumor segmentation
9. HeMIS: hetero-modal image segmentation
10. Brain tumor segmentation on MRI with missing modalities
11. Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy
12. U-Net: convolutional networks for biomedical image segmentation
13. nnU-Net: self-adapting framework for U-Net-based medical image segmentation
14. Deep residual learning for image recognition
15. Harnessing 2D networks and 3D features for automated pancreas segmentation from volumetric CT images
16. 3D U-Net: learning dense volumetric segmentation from sparse annotation
17. C2FNAS: coarse-to-fine neural architecture search for 3D medical image segmentation
18. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv e-prints
19. Attention U-Net: learning where to look for the pancreas
20. V-Net: fully convolutional neural networks for volumetric medical image segmentation
21. Inception-v4, Inception-ResNet and the impact of residual connections on learning
22. Analysis of the setup uncertainty and margin of the daily ExacTrac 6D image guide system for patients with brain tumors
23. Advances in auto-segmentation