key: cord-0058342-z2xsfjme
authors: Gay, Skylar S.; Yu, Cenji; Rhee, Dong Joo; Sjogreen, Carlos; Mumme, Raymond P.; Nguyen, Callistus M.; Netherton, Tucker J.; Cardenas, Carlos E.; Court, Laurence E.
title: A Bi-directional, Multi-modality Framework for Segmentation of Brain Structures
date: 2021-02-23
journal: Segmentation, Classification, and Registration of Multi-modality Medical Imaging Data
DOI: 10.1007/978-3-030-71827-5_6
sha: 57f90714e6577b7d8681240db333c9ef301ac9b1
doc_id: 58342
cord_uid: z2xsfjme

Careful delineation of normal-tissue organs-at-risk is essential for brain tumor radiotherapy. However, this process is time-consuming and subject to variability. In this work, we propose a multi-modality framework that automatically segments eleven structures. Large structures used for defining the clinical target volume (CTV), such as the cerebellum, are directly segmented from T1-weighted and T2-weighted MR images. Smaller structures used in radiotherapy plan optimization are more difficult to segment; for these, a region of interest is first identified and cropped by a classification model, and the structures are then segmented from the cropped volume. This bi-directional framework allows for rapid segmentation and good performance on a standardized challenge dataset when evaluated with volumetric and surface metrics.

Successful radiotherapy of brain tumors requires careful identification and delineation of normal organs-at-risk (OARs). Standard clinical practice involves manual delineation, a time-consuming process that is subject to intra- and inter-observer variability [1, 2]. Existing automatic segmentation approaches generally fall into four categories: atlas-based, machine learning-based, deformable-model-based, and region-based. Atlas-based approaches require extensive registration, a computationally expensive process whose quality also depends on anatomical similarity to the atlas. Machine learning-based techniques lack a global understanding of shape, especially for complex structures with inhomogeneous textures. Deformable methods are sensitive to initialization, whereas region-based methods often fail at low-contrast boundaries [3]. Small structures are particularly difficult to segment because they occupy very few voxels relative to the background class. In this work, we therefore apply a two-stage approach to segmenting brain structures: large structures are directly segmented, while smaller ones are first localized by a classification algorithm and then segmented. This approach showed good results in the ABCs challenge [8], a satellite event of the Medical Image Computing and Computer Assisted Intervention (MICCAI) 2020 annual conference.

This approach segments structures from the ABCs dataset by combining predictions from a bi-directional framework. Of the provided T1-weighted MR, T2-weighted MR, and CT images, only the T1w and T2w MR images are used to segment large structures. Challenge segmentations are divided into two tasks: Task 1 segments brain structures (falx cerebri, tentorium cerebelli, sagittal and transverse brain sinuses, cerebellum, and ventricles) critical for defining clinical target volumes (CTVs) for brain radiotherapy, while Task 2 segments structures (brainstem, optic chiasm, cochleas, eyes, lacrimal glands, and optic nerves) used for optimizing the radiotherapy treatment plan. Large structures, defined as all Task 1 structures together with the brainstem and optic chiasm from Task 2, are directly segmented using MR images alone.
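To make the routing concrete, the following is a minimal sketch of the two-stage inference flow, not code from this work. The structure lists follow the task definitions above; the callables stand in for the trained models; and both the channel stacking of the T1w/T2w inputs and the independent localization of every small structure are simplifying assumptions (the framework described later reuses the eye crop for several structures).

```python
"""Minimal sketch of the bi-directional inference flow (hypothetical names)."""
from typing import Callable, Dict, Tuple

import numpy as np

LARGE_STRUCTURES = ("falx_cerebri", "tentorium_cerebelli", "brain_sinuses",
                    "cerebellum", "ventricles", "brainstem", "optic_chiasm")
SMALL_STRUCTURES = ("eyes", "optic_nerves", "lacrimal_glands", "cochleas")

# A segmentation model maps stacked (T1w, T2w) channels to a binary mask;
# a locator maps the CT volume to an inclusive axial slice range.
SegModel = Callable[[np.ndarray], np.ndarray]
Locator = Callable[[np.ndarray], Tuple[int, int]]


def segment_case(ct: np.ndarray, t1w: np.ndarray, t2w: np.ndarray,
                 direct_models: Dict[str, SegModel],
                 locators: Dict[str, Locator],
                 voi_models: Dict[str, SegModel]) -> Dict[str, np.ndarray]:
    """Run the two-stage framework on one co-registered CT/T1w/T2w case."""
    mr = np.stack([t1w, t2w])  # assumed channel layout: (2, z, y, x)
    preds: Dict[str, np.ndarray] = {}

    # Large structures: segmented directly from MR, one model per structure.
    for name in LARGE_STRUCTURES:
        preds[name] = direct_models[name](mr)

    # Small structures: localize an axial slice range on the CT, then segment
    # the co-registered MR crop with a dedicated model.
    for name in SMALL_STRUCTURES:
        lo, hi = locators[name](ct)
        preds[name] = voi_models[name](mr[:, lo:hi + 1])

    return preds
```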
For small structures (eyes, optic nerves, lacrimal glands, and cochleas), Inception-ResNet-v2 is used to localize each volume of interest (VOI) on the CT, and these structures are then segmented from the co-registered T1w and T2w VOIs.

Figure caption: Bi-directional framework. Task 1 uses the T1 and T2 images to directly segment large structures. For Task 2, the CT is input to an Inception-ResNet-v2 that outputs the range of slices containing each small structure (one model per structure); these coordinates are used to select VOIs from the T1 and T2 images, which are subsequently segmented for Task 2. In the network diagrams, black circles are convolutional blocks, black circles with down/up arrows are down-/up-sampling, cyan circles are 1 × 1 convolutional blocks, green circles are convolutional blocks with ReLU pre-activations and sigmoid post-activations, and pink squares in the Attention U-Net are attention gates.

Data from the ABCs challenge [8] were used to train the segmentation CNNs. The dataset included glioblastoma and low-grade glioma patients, with each case consisting of CT, T1-weighted MR, and T2-weighted MR images of the post-operative brain. All scans were rescaled to the same spatial dimensions. The dataset provided separate training and testing sets of 45 and 15 patients, respectively, and the training set was further split 70%/30% into training and validation subsets. At the conclusion of the challenge, a previously unseen dataset of 15 additional patients was provided and used for the final scoring stage.

After preliminary evaluations of various deep learning models, a modified Attention U-Net [9] with batch normalization and a 3 × 3 × 3 convolutional kernel was selected for all Task 1 segmentation tasks. The Dice similarity coefficient (DSC) was used as the loss function, as proposed by Milletari et al. [10]. Adam [11] was used as the optimizer; the learning rate was set to 0.001 and reduced by 15% when the loss did not decrease for 70 epochs. Training was performed on a 16 GB NVIDIA V100 GPU with a batch size of 2, and each model was trained for approximately 1,000 iterations with early stopping. A dedicated model was trained for each of the five Task 1 structures, and the individual structure predictions were combined to create the final prediction.

During training, a body mask was obtained by thresholding the CT at values of -600 HU or higher. T1w and T2w intensities within this mask were normalized by subtracting the mean and dividing by the standard deviation ("z-score normalization"), and values outside the mask were reset to the minimum value of each scan. Only the T1w and T2w images were then used by the model for training. All data were augmented on the fly during training with random combinations of rotations in the x-y plane (±15°), translations of (±10, ±10, ±5) along the x, y, and z axes, respectively, zooms in the x-y plane (±15%), and warps. Additionally, before training, all data were flipped left-right to increase the training set size.

The structures in Task 2, apart from the brainstem, are significantly smaller than those in Task 1, and the approach used for Task 1 was not optimal for these small structures. The relative volume of these structures compared with the input image can be increased by reducing the field of view of the CT scan. To do so, a CNN-based classifier was applied to the CT scan to detect the location of each structure, and the images were cropped around its center of mass. Training the segmentation models on the cropped images (i.e., narrow field-of-view CT scans) improved their accuracy.
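The localization-and-cropping step just described can be illustrated with a short sketch. It assumes the 2D slice classifier yields one organ-presence probability per axial CT slice, and it interprets "averaging the classification results" as a probability-weighted mean slice index; the 20-slice margin on each side matches the procedure detailed in the Classification paragraph below. The function name and usage are illustrative only, not code released with this work.

```python
"""Sketch of the classifier-driven VOI localization (illustrative names)."""
from typing import Tuple

import numpy as np


def locate_voi(slice_probs: np.ndarray, margin: int = 20) -> Tuple[int, int]:
    """Return the inclusive (lo, hi) axial slice range around the organ's
    cranial-caudal center of mass, taken here as the probability-weighted
    mean slice index of the classifier outputs."""
    z = np.arange(len(slice_probs))
    center = int(round(np.average(z, weights=slice_probs)))
    return max(center - margin, 0), center + margin


# The slice range found on the CT is applied to the co-registered MR volumes
# before Task 2 segmentation, e.g.:
#   lo, hi = locate_voi(probs)
#   t1_voi, t2_voi = t1w[lo:hi + 1], t2w[lo:hi + 1]
```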
Classification. Inception-ResNet-v2 [12], a CNN-based 2D classification architecture, was trained to classify whether an organ of interest was present in each CT slice. The classification model was trained to detect the eyes and the cochleas in each CT slice. The center of mass of each organ in the cranial-caudal direction was determined by averaging the classification results across slices, and 20 slices above and 20 slices below the central slice were then extracted from the original CT scan to cover the entire organ with a margin. These 40 slices in total were passed to the segmentation stage. The slices cropped for the eyes were also used to segment the optic nerves, optic chiasm, and lacrimal glands, as these structures all lie within the region cropped for the eyes.

Segmentation. After preliminary evaluations of various deep learning models, two models were selected for Task 2 segmentation based on individual structure segmentation performance. A dedicated model was trained for each of the six Task 2 structures, and the individual structure predictions were combined to create the final prediction. Training was performed on a 32 GB NVIDIA V100 GPU for all models. The brainstem, eyes, lacrimal glands, and optic nerves were segmented with a modified Attention U-Net [9] with batch normalization and a 3 × 3 × 3 convolutional kernel. DSC was used as the loss function, as proposed by Milletari et al. [10]. Adam [11] was used as the optimizer; the learning rate was set to 0.001 and reduced by 15% when the loss did not decrease for 70 epochs. The batch size was set to 5, and each model was trained for approximately 1,000 iterations with early stopping. For the remaining Task 2 structures, the optic chiasm and the cochleas, a modified V-Net [10] with batch normalization and a 3 × 3 × 3 convolutional kernel was selected. The batch size was set to 2, and all other training hyperparameters followed those described above. This architecture was chosen for these two structures, instead of the Attention U-Net used for all other Task 2 segmentations, based on visual inspection of preliminary results, which showed better performance for these structures.

During training, a body mask was obtained by thresholding the CT at values of -600 HU or higher. T1w and T2w intensities within this mask were normalized by subtracting the mean and dividing by the standard deviation ("z-score normalization"), and values outside the mask were reset to the minimum value of each scan. Only the T1w and T2w images were then used by the model for training. All data were augmented on the fly during training with random combinations of rotations in the x-y plane (±15°), translations of (±10, ±10, ±5) along the x, y, and z axes, respectively, zooms in the x-y plane (±15%), and warps. Additionally, before training, all data were flipped left-right to increase the training set size.

All challenge submissions were evaluated with two metrics: the Dice similarity coefficient (DSC) and the surface Dice similarity coefficient (SDSC) with a tolerance of 2 mm [13]. DSC and SDSC scores were computed separately for each structure and then averaged to obtain the overall Task 1 and Task 2 scores. Finally, the overall challenge ranking was based on the unweighted average of the Task 1 and Task 2 scores [8].
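For reference, the volumetric DSC used for evaluation, and in its soft form as the training loss following Milletari et al. [10], can be computed as in the minimal NumPy sketch below; the surface DSC additionally requires surface-distance computations with a 2 mm tolerance [13] and is not sketched here. The function names are illustrative.

```python
"""Minimal illustration of the volumetric Dice similarity coefficient."""
import numpy as np


def dice(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    """DSC between a binary prediction and a binary ground-truth mask."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps))


def soft_dice_loss(prob: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    """1 - soft Dice on predicted probabilities, using the squared-term
    denominator of Milletari et al.'s formulation."""
    truth = truth.astype(np.float32)
    numerator = 2.0 * (prob * truth).sum()
    denominator = (prob ** 2).sum() + (truth ** 2).sum() + eps
    return float(1.0 - numerator / denominator)
```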
In the final challenge submission, relatively good results were achieved for the large structure segmentations, with mean DSC above 0.9 for three of the seven large structures and above 0.7 for three of the remaining structures. Similarly, mean SDSC with a 2 mm tolerance was above 0.9 for six of these structures. For both metrics, segmentation of the sagittal and transverse brain sinuses scored lowest, at 0.64 and 0.89 for mean DSC and mean SDSC, respectively. Conversely, small structure scores were low in the final submission set. This was due to an error in the inference and data conversion scripts, which assigned the left structure class to right structures and vice versa. Post-challenge, these errors were corrected, with significant improvement to model performance (Tables 3 and 4).

Differences in modality make direct comparison between the approach of this study and others difficult. In particular, published results often focus exclusively on MR [14, 15] or CT [13, 16, 17] rather than the multi-modality approach described here. Nevertheless, our results generally compare favorably to those published in the literature. For example, this work finds mean brainstem DSC and SDSC of 0.999 and 0.993, respectively, which rank consistently higher than other reported values [13-17]. Similar improvements in DSC are observed for the cerebellum, optic chiasm, and left lacrimal gland [13-15]. However, SDSC scores for the cochleas, lacrimal glands, and optic nerves were often lower than reported values [13].

There are a few limitations associated with this work. The DSC and SDSC reported for small structures are often, though not always, lower than those obtained by CT-only approaches. This may be due to the poorer visibility of these structures in the MR images, which were the only inputs to the segmentation models in this work; future research will incorporate CT images into the segmentation models for additional context. In addition, the relatively small test set (n = 15) may be considered a limitation; however, the similarity of our results to those of studies with significantly larger test sets indicates that the size of the test set did not significantly affect the findings.

In this work, we developed a multi-modality framework for rapid auto-segmentation of brain structures as part of an international segmentation challenge. The results showed good agreement with, or improvement upon, similar values reported in the literature. Future work will include incorporating CT images directly into the segmentation models for the challenging small structures. This approach could contribute to the Radiation Planning Assistant [18], a fully automated treatment planning tool aimed at improving access to high-quality radiation therapy across the globe.
References
The reasons for discrepancies in target volume delineation: a SASRO study on head-and-neck and prostate cancers
3D variation in delineation of head and neck organs at risk
A review on brain structures segmentation in magnetic resonance imaging
Advances in Auto-Segmentation
U-Net: convolutional networks for biomedical image segmentation
Deep neural networks for anatomical brain segmentation
Knowing What You Know in Brain Segmentation Using Bayesian Deep Neural Networks
MICCAI 2020 ABCs Challenge
Attention U-Net: Learning where to look for the pancreas
V-Net: Fully convolutional neural networks for volumetric medical image segmentation
Inception-v4, Inception-ResNet and the impact of residual connections on learning
Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy (arXiv)
Evaluation of a deep learning approach for the segmentation of brain tissues and white matter hyperintensities of presumed vascular origin in MRI
3D Patchwise U-Net with transition layers for MR brain segmentation
Organ at risk segmentation in head and neck CT images using a two-stage segmentation framework based on 3D U-Net
Automatic detection of contouring errors using convolutional neural networks
Radiation Planning Assistant - a streamlined, fully automated radiotherapy treatment planning system

Acknowledgements. The authors acknowledge the support of the High Performance Computing facility at the University of Texas MD Anderson Cancer Center and the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing computational resources that have contributed to the research results reported in this paper.