CTSpine1K: A Large-Scale Dataset for Spinal Vertebrae Segmentation in Computed Tomography
Deng, Yang; Wang, Ce; Hui, Yuan; Li, Qian; Li, Jun; Luo, Shiwei; Sun, Mengke; Quan, Quan; Yang, Shuxin; Hao, You; Liu, Pengbo; Xiao, Honghu; Zhao, Chunpeng; Wu, Xinbao; Zhou, S. Kevin
Date: 2021-05-31

Spine-related diseases have high morbidity and impose a heavy social and economic burden. Spine imaging is an essential tool for noninvasively visualizing and assessing spinal pathology. Segmenting vertebrae in computed tomography (CT) images is the basis of quantitative medical image analysis for the clinical diagnosis and surgical planning of spine diseases. Currently available public annotated datasets of spinal vertebrae are small, and this lack of a large-scale annotated spine image dataset heavily restricts the mainstream deep learning-based segmentation methods, which are data-driven. In this paper, we introduce a large-scale spine CT dataset for vertebra segmentation, called CTSpine1K, curated from multiple sources; it contains 1,005 CT volumes with over 11,100 labeled vertebrae covering different spinal conditions. Based on this dataset, we conduct several spinal vertebrae segmentation experiments to set the first benchmark. We believe that this large-scale dataset will facilitate further research in many spine-related image analysis tasks, including but not limited to vertebrae segmentation, labeling, 3D spine reconstruction from biplanar radiographs, image super-resolution, and enhancement.

Spinal or vertebral image segmentation in CT is a crucial step in all applications concerning automated quantification of spinal morphology and pathology. In recent years, deep learning has achieved remarkable success in various medical imaging applications 6,7, and many automated spine image segmentation approaches have been proposed 8,9. However, all of these approaches are data-dependent and have been validated only on private datasets or small public datasets. SpineWeb 1, a popular archive for multi-modal spine data, lists only two CT datasets, CSI2014 10 and xVertSeg 11, both of which contain only dozens of CT scans. These approaches are therefore heavily restricted. To address the concern of large-scale data availability, Sekuboyina et al. 2 organized the Large Scale Vertebrae Segmentation Benchmark (VerSe) as a challenge in conjunction with the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2019 and MICCAI 2020. With VerSe'19 2, they released into the public domain a diverse dataset of 160 spine multi-detector CT scans with 1,735 vertebrae (120 seen CT scans and 40 hidden CT scans). For VerSe'20 3, the upgraded version of VerSe'19, they released 300 CT scans, the largest public spine CT dataset to date. Both datasets provide per-vertebra ground truth and are currently the most widely used datasets for vertebrae segmentation. Nonetheless, these datasets are still small. Further, most CT scans in the VerSe'19 and VerSe'20 datasets are 'cropped', containing only a small region around the spine and discarding information about the surrounding organs. To advance research in spinal image analysis, we hereby present a large-scale and comprehensive dataset: CTSpine1K.
We collect and annotate a large-scale spinal vertebrae CT dataset from multiple domains and different manufacturers, totalling 1,005 CT volumes (over 500,000 labeled slices and over 11,000 vertebrae) with diverse appearance variations. We carefully design a unified annotation pipeline to ensure the quality of the annotations. To the best of our knowledge, our CTSpine1K dataset is the largest publicly available annotated spine CT dataset. We evaluate the quality of the dataset by carrying out benchmark experiments for vertebrae segmentation.

To build a comprehensive spine image dataset that replicates practical appearance variations, we curate a large-scale CT dataset of spinal vertebrae from the following four open sources.

COLONOG. This sub-dataset comes from the CT COLONOGRAPHY dataset of a CT colonography trial 12. Each patient was scanned in two positions that carry similar information; we randomly select one position per patient (we will open-source the selection code). This yields 825 CT scans in Digital Imaging and Communications in Medicine (DICOM) format.

HNSCC-3DCT-RT. This sub-dataset contains three-dimensional (3D) high-resolution fan-beam CT scans collected during pre-treatment, mid-treatment, and post-treatment using a Siemens 16-slice CT scanner with a standard clinical protocol for 31 head-and-neck squamous cell carcinoma (HNSCC) patients 13. These images are in DICOM format.

MSD T10. This sub-dataset comes from the ten-task Medical Segmentation Decathlon 14. To obtain more slices containing the spine, we select the task03_liver dataset, which consists of 201 cases. These images are in Neuroimaging Informatics Technology Initiative (NIfTI) format (https://nifti.nimh.nih.gov/nifti-1).

COVID-19. This sub-dataset consists of non-enhanced chest CT scans from 632 patients with COVID-19 infections. The images were acquired at the point of care in an outbreak setting from patients with Reverse Transcription Polymerase Chain Reaction (RT-PCR) confirmation of SARS-CoV-2 15. We pick 40 scans, stored in NIfTI format.

We reformat all DICOM images to NIfTI to simplify data processing and to de-identify the images, meeting the institutional review board (IRB) policies of the contributing sites. More details on these sub-datasets can be found in [12-15]. All existing sub-datasets are released under the Creative Commons license CC BY-NC-SA, and we keep this license unchanged. Note that for the task03_liver and COVID-19 sub-datasets we use only a subset of the data, and across all sources we exclude cases of very low quality. An overview of our dataset and a thorough comparison with the VerSe challenge datasets are given in Table 1.
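As a concrete illustration of the DICOM-to-NIfTI reformatting step described above, the following is a minimal sketch of the conversion using SimpleITK. The paths and the function name are hypothetical, and the paper does not specify the actual conversion tooling; writing only the pixel data to NIfTI also drops the DICOM header tags, which supports de-identification.

```python
# Minimal sketch of DICOM-series-to-NIfTI conversion with SimpleITK.
# Paths are placeholders; the authors' actual conversion script may differ.
import SimpleITK as sitk

def dicom_series_to_nifti(dicom_dir: str, out_path: str) -> None:
    reader = sitk.ImageSeriesReader()
    # Collect the sorted slice file names of the series in this folder.
    series_files = reader.GetGDCMSeriesFileNames(dicom_dir)
    reader.SetFileNames(series_files)
    image = reader.Execute()  # 3D volume with spacing, origin, and direction set
    # A .nii.gz extension yields a compressed NIfTI file; DICOM header
    # metadata (including patient identifiers) is not carried over.
    sitk.WriteImage(image, out_path)

dicom_series_to_nifti("case0001_dicom/", "case0001.nii.gz")
```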
Considering that medical image annotation is a highly time-consuming and subjective task, we design a unified and rigorous labeling standard and pipeline before annotation; the annotation pipeline is shown in Fig. 2. To reduce the annotation workload, we first use the public data from the VerSe'19 and VerSe'20 challenges to train a segmentation network with the nnUnet algorithm 16. As mentioned earlier, most VerSe challenge samples are 'cropped', discarding information about surrounding structures such as organs. We therefore select only the cases with complete (uncropped) CT images whose image spacing matches that of the corresponding ground truth, yielding 41 usable cases.

Then, for an image to be annotated, we invoke the trained segmentation model to predict a segmentation mask and invite junior annotators to refine the labels based on the prediction. All labels refined by the junior annotators are checked by two senior annotators for further refinement. If the senior annotators find annotations difficult to determine, the data are sent to one of the trained spine surgeons, whose image-reading experience averages 12 years. Finally, all annotated labels undergo a random double-check by coordinators to ensure the final quality of the annotations; any errors found during double-checking are corrected by the annotators. The human-corrected annotations and their corresponding images are then added to the training data to retrain a more powerful model. To speed up the annotation process, we update the database and retrain the deep learning model every 100 cases. The process iterates until the annotation task is finished.

Figure 2. The proposed annotation pipeline. Junior annotators, senior annotators, and medical experts are involved to ensure annotation quality.

The whole annotation process is carried out with the software ITK-SNAP 17, and segmentation masks are also saved in NIfTI format. In total, we have annotations for 1,005 CT volumes. The dataset is available on Figshare; the permanent address is https://github.com/ICT-MIRACLE-lab/CTSpine1K. (Due to the limitation of free storage, we upload only some of the images, but all of the annotations, for the peer-review process. Because all the image data come from open-source sub-datasets, we only need to provide the annotations.)

The CTSpine1K dataset consists of three subfolders, two xlsx files, and a readme file, namely trainset, test_public, test_private, metadata_colonog.xlsx, metadata_neck.xlsx, and readme.txt. Each subfolder contains two subfolders named data and gt, where the CT images and the corresponding ground truth are stored. The metadata files record detailed information about this dataset, such as patient ID, manufacturer, patient gender, and tumor location (if applicable); the COVID-19 and MSD T10 sub-datasets have no metadata file. The readme.txt file records the IDs of the pathological cases. All CT volumes and ground truth are in NIfTI format. The dataset structure is as follows (a loading sketch is given below):

    CTSpine1K/
        trainset/
            data/   (CT volumes)
            gt/     (ground-truth masks)
        test_public/
            data/
            gt/
        test_private/
            data/
            gt/
        metadata_colonog.xlsx
        metadata_neck.xlsx
        readme.txt
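Given this layout, a CT volume and its ground-truth mask can be loaded and inspected with a few lines of Python. The sketch below is a hypothetical usage example: the file name is illustrative, and nibabel is only one of several libraries that read NIfTI.

```python
# Minimal sketch: load one CT volume and its mask from the CTSpine1K layout.
# File names are illustrative; any NIfTI reader (e.g., SimpleITK) works too.
import numpy as np
import nibabel as nib

img = nib.load("trainset/data/case0001.nii.gz")
msk = nib.load("trainset/gt/case0001.nii.gz")

volume = img.get_fdata()                   # CT intensities (HU)
labels = msk.get_fdata().astype(np.int16)  # 0 = background

print("volume shape:", volume.shape)
print("voxel spacing (mm):", img.header.get_zooms())
# Vertebrae are labeled with integers 1..25 from C1 to L6.
print("annotated vertebrae:", np.unique(labels)[1:])
```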
Based on CTSpine1K, we train a deep network for spinal vertebrae segmentation in a fully supervised manner to establish a benchmark. In recent years, the nnUnet model has achieved better results than other methods on many medical image segmentation tasks and has become the acknowledged baseline in medical image segmentation 16. nnUnet is essentially a U-Net 18, but with the network architecture, design parameters, and training parameters self-adapted to the characteristics of the dataset, together with powerful data augmentation. We therefore choose nnUnet as the benchmark model for vertebrae segmentation. Because our dataset contains a huge number of high-resolution 3D images, we use the 3D full-resolution U-Net architecture; more details on the nnUnet model can be found in reference 16.

Data Split. Our dataset contains 1,005 3D CT volumes (on average, each scan has 504 slices and 11 labeled vertebrae) with over 500,000 labeled slices (of size 512×512). The CTSpine1K dataset is separated into a training set (610 subjects), a public test set (197 subjects), and a private test set (198 subjects). More details about the data split can be seen in Table 1. To assess the domain differences between our annotated dataset and the VerSe challenge datasets, we use the 41 public cases from the VerSe challenges as Test_VerSe.

Evaluation Metrics. Every vertebra (from C1 to L6) is labeled with an integer value from 1 to 25. We use two metrics that are ubiquitous in the medical image segmentation domain: (i) the Dice coefficient (DSC), computed per label as 2|A ∩ B| / (|A| + |B|), where A is the set of ground-truth foreground voxels of a given label and B is the corresponding set of predicted voxels; and (ii) the 95th-percentile Hausdorff distance (HD95) in mm, which measures the 95th percentile of the distances between the two surfaces constructed from the ground-truth and predicted segmentation maps.
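To make the DSC definition concrete, a per-label Dice computation can be written in a few lines of NumPy, as sketched below (the function name is ours, not from the paper's code release; HD95 is typically computed with a dedicated library such as MedPy and is omitted here):

```python
# Per-label Dice coefficient, DSC = 2|A ∩ B| / (|A| + |B|).
import numpy as np

def dice_per_label(gt: np.ndarray, pred: np.ndarray, num_labels: int = 25) -> dict:
    """Return {label: DSC} for every vertebra label present in gt or pred."""
    scores = {}
    for label in range(1, num_labels + 1):
        a = gt == label               # ground-truth voxels of this vertebra
        b = pred == label             # predicted voxels of this vertebra
        denom = a.sum() + b.sum()
        if denom == 0:                # vertebra absent in both masks: skip
            continue
        scores[label] = 2.0 * np.logical_and(a, b).sum() / denom
    return scores
```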
Implementation Details. We train the nnUnet model for 1,000 epochs, keeping the training configuration (learning rate, data augmentation, etc.) the same as the original settings in reference 16. Due to limited computing resources, we train a single model on all the training data rather than running five-fold cross-validation. The experiments are implemented in PyTorch on an RTX 3090 GPU; training takes around 15 days.

We compute the two metrics for each vertebra and report the results in Table 2. On the one hand, our experimental results are close to those reported in reference 2 with the same model (nnUnet), verifying the high quality of our annotations. On the other hand, Table 2 shows that diseased vertebrae are difficult to segment (the DSC of L6 is almost 0). Specifically, the presence of an L6 vertebra confuses the model, resulting in prediction dislocations (see the last row of Fig. 4). Our labeled dataset, which contains many L6 cases, is therefore very valuable for diseased-vertebrae segmentation (the cases that were hard to annotate are listed in the readme.txt file). Table 2 also shows that the model trained with our annotations achieves good performance on our CTSpine1K dataset but much worse performance on the VerSe challenge datasets, which indicates an obvious domain gap between our annotated dataset and the public datasets. We conjecture that this is because the COLONOG scans are acquired with an empty stomach and colon, so the images contain less contextual information than scans with a full stomach and colon (see Fig. 3). Our annotations are therefore a good complement to the existing datasets.

Figure 3. Example slices from the VerSe' and COLONOG datasets.

Some visualization results are presented in Fig. 4, where we can observe that the baseline model achieves excellent segmentation results. Nevertheless, some failed predictions occur in the presence of spinal diseases, especially sacral lumbarization and lumbar sacralization. Besides, the image resolution along the Z direction is closely related to the results: a lower resolution leads to worse results. How to maintain reasonable performance at low resolution is a research challenge; image super-resolution 19 might be worth exploring.

The data within this work are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). For their own research purposes, investigators can freely download and use the data, but we kindly ask them to cite this paper in their publications. The data are suitable for visualization in a variety of software, including 3D Slicer 20 and ITK-SNAP 17.

We provide the Python scripts and our trained model on GitHub (https://github.com/ICT-MIRACLE-lab/CTSpine1K), which can serve as a starting point for the community to build future developments on our CTSpine1K dataset.

Figure 4. Qualitative segmentation results: CT images, ground truth (GT), and predictions.

References
VerSe: a vertebrae labelling and segmentation benchmark
Vertebral compression fracture after spine stereotactic body radiation therapy: a review of the pathophysiology and risk factors
Spinal and spinal cord infection
Spinal Imaging and Image Analysis
A review of deep learning in medical imaging: imaging traits, technology trends, case studies with progress highlights, and future promises
Handbook of Medical Image Computing and Computer Assisted Intervention
Iterative fully convolutional neural networks for automatic vertebra segmentation and identification
Coarse to fine vertebrae localization and segmentation with SpatialConfiguration-Net and U-Net
Detection of vertebral body fractures based on cortical shell unwrapping
A framework for automated spine and vertebrae interpolation-based detection and model-based segmentation
Accuracy of CT colonography for detection of large adenomas and cancers
Head-and-neck squamous cell carcinoma patients with CT taken during pre-treatment, mid-treatment, and post-treatment (dataset). The Cancer Imaging Archive
A large annotated medical image dataset for the development and evaluation of segmentation algorithms
Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets
nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation
User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability
U-Net: convolutional networks for biomedical image segmentation
SAINT: spatially aware interpolation network for medical slice synthesis
GBM volumetry using the 3D Slicer medical image computing platform

We would like to acknowledge Beijing Jishuitan Hospital for its support of this work. The authors declare no competing interests.