key: cord-0511974-2wal73wq authors: Zhang, Minghui; Yu, Xin; Zhang, Hanxiao; Zheng, Hao; Yu, Weihao; Pan, Hong; Cai, Xiangran; Gu, Yun title: FDA: Feature Decomposition and Aggregation for Robust Airway Segmentation date: 2021-09-07 journal: nan DOI: nan sha: 79b130eff4c70edc9378677e86640417bfdbfab5 doc_id: 511974 cord_uid: 2wal73wq

3D Convolutional Neural Networks (CNNs) have been widely adopted for airway segmentation. The performance of 3D CNNs depends heavily on the training data, yet the public airway datasets consist mainly of clean CT scans with coarse annotation, so models trained on them generalize poorly to noisy CT scans (e.g. COVID-19 CT scans). In this work, we propose a new dual-stream network to address the variability between the clean domain and the noisy domain, which utilizes the clean CT scans and a small number of labeled noisy CT scans for airway segmentation. We design two different encoders to extract the transferable clean features and the unique noisy features separately, followed by two independent decoders. Furthermore, the transferable features are refined by channel-wise feature recalibration and Signed Distance Map (SDM) regression. The feature recalibration module emphasizes critical features and the SDM pays more attention to the bronchi, which helps extract transferable topological features that are robust to the coarse labels. Extensive experimental results demonstrate a clear improvement from the proposed method. Compared to other state-of-the-art transfer learning methods, our method accurately segmented more bronchi in the noisy CT scans.

The novel coronavirus 2019 (COVID-19) has turned into a pandemic, infecting humans all over the world. To relieve the burden on clinicians, many researchers take advantage of deep learning methods for automated COVID-19 diagnosis and infection measurement from imaging data (e.g., CT scans, chest X-ray).
Current studies mainly focus on designing discriminative or robust models for distinguishing COVID-19 from other types of pneumonia [11, 17], for lesion localization [18], and for lesion segmentation [16]. In this work, we tackle another challenging problem: airway segmentation of COVID-19 CT scans. Accurate segmentation enables quantitative measurement of airway dimensions and wall thickness, which can reveal abnormalities in patients with pulmonary disease, and the extraction of a patient-specific airway model from CT images is required for navigation in bronchoscopic-assisted surgery. It also assists sputum suction for COVID-19 patients. However, due to the fine-grained pulmonary airway structure, manual annotation is time-consuming, error-prone, and highly reliant on the expertise of clinicians. Moreover, COVID-19 CT scans show ground-glass opacities in the early stage and pulmonary consolidation in the late stage [3], which add further difficulty to annotation.

Even though fully convolutional networks (FCNs) can segment the airway automatically, the following challenges remain. First, FCNs are data-driven methods, while there are few public airway datasets with annotation and their size is limited. The public airway datasets, including the EXACT'09 dataset [8] and the Binary Airway Segmentation (BAS) dataset [12], focus on cases in which abnormalities of the airway structure are mainly caused by chronic obstructive pulmonary disease (COPD). These cases are relatively clean, and we term their distribution the Clean Domain; in contrast, we term the distribution of COVID-19 CT scans the Noisy Domain. Fig. 1 shows that FCN methods [5, 12] trained on the clean domain do not generalize well to the noisy domain. Although this challenge can be addressed by collecting and labeling new cases, it is impractical for novel diseases, e.g.
COVID-19, for which neither the scale of datasets nor the quality of annotation can be guaranteed. Second, transfer learning methods (e.g. domain adaptation [2, 13], feature alignment [2, 14]) can improve the performance on target domains by transferring the knowledge contained in source domains or by learning domain-invariant features. However, these methods are inadequate in our scenario because the target noisy domain contains specific features (e.g. patterns of shadow patches) which cannot be learned from the source domain. Third, the annotation of the airway is extremely hard, as airways are elongated fine structures with plentiful peripheral bronchi of quite different sizes and orientations. The annotations in the EXACT'09 dataset [8] and the BAS dataset [12] are overall coarse and unsatisfactory. Deep learning methods tend to fit these coarse labels, which makes it difficult for them to learn robust features for airway representation.

To alleviate such challenges, we propose a dual-stream network to extract robust and transferable features from the clean CT scans (clean domain) and a few labeled COVID-19 CT scans (noisy domain). Our contributions are threefold:

• We hypothesize that COVID-19 CT scans contain both general features and specific features for airway segmentation. The general features (e.g. the topological structure) can likely be learned from other, clean CT scans, while the specific features (e.g. patterns of shadow patches) should be extracted independently. Therefore, we designed a dual-stream network, which takes both the clean CT scans and a few labeled COVID-19 CT scans as input to synergistically learn general features and independently learn specific features for airway segmentation.

• We introduce the feature recalibration module and the Signed Distance Map (SDM) for the clean CT scans with coarse labels, so that robust features can be obtained when extracting the general features.
• In extensive experiments on the clean CT scans and the COVID-19 CT scans, our method extracted more transferable and robust features and improved on other methods in terms of the tree length detected rate and the branch detected rate.

A new dual-stream network is proposed, which simultaneously processes the clean CT scans and a few noisy COVID-19 CT scans to learn robust and transferable features for airway segmentation.

In this section, we detail the architecture of the proposed dual-stream network, which is illustrated in Fig. 2. COVID-19 CT scans share a similar airway topological structure with the clean CT scans, but also introduce unique patterns, e.g., multifocal patchy shadowing and ground-glass opacities, which are not common in clean CT scans. Since the number of clean CT scans is relatively large and their airway structure is clearer, we aim to adapt the knowledge from clean CT scans to improve the airway segmentation of COVID-19 CT scans. Therefore, a dual-stream network is designed to synergistically learn transferable features from both COVID-19 CT scans and clean CT scans, and to independently learn specific features only from COVID-19 CT scans.

As illustrated in Fig. 2, let $X_{clean}$ denote the input sub-volumes from the clean CT scans and $X_{noisy}$ those from the noisy COVID-19 CT scans. $Encoder_{clean}$ and $Encoder_{noisy}$ are encoder blocks for feature extraction. $Decoder_{clean}$ and $Decoder_{clean\&noisy}$ are decoder blocks that generate the segmentation results from the encoder features. $Output_{SDM}$ represents the output for clean CT scans and $Output_{PM}$ the output for COVID-19 CT scans; they can be briefly defined as follows:

$$Output_{SDM} = Decoder_{clean}\big(Encoder_{clean}(X_{clean})\big),$$
$$Output_{PM} = Decoder_{clean\&noisy}\big(Encoder_{clean}(X_{noisy}) + Encoder_{noisy}(X_{noisy})\big),$$

where we omit the details of the 1 × 1 × 1 convolution and the Squeeze&Excitation (SE) module for a straightforward explanation of the overall workflow.
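The data flow of the two streams can be sketched in a few lines of NumPy. This is only an illustration under strong assumptions: each encoder and decoder is replaced by a hypothetical toy block (a random 1 × 1 × 1 channel mix), the helper `make_block` and all tensor sizes are invented for the example, and the SE module and skip connections are left out. What it does show is the routing: clean sub-volumes pass through the shared encoder and the clean decoder, while noisy sub-volumes pass through both encoders, whose features are summed before the second decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_block(in_ch, out_ch, relu=True):
    """Hypothetical stand-in for an encoder/decoder: a random 1x1x1 channel mix."""
    w = rng.standard_normal((out_ch, in_ch)) * 0.1
    def block(x):                                   # x has shape (C, D, H, W)
        y = np.einsum("oc,cdhw->odhw", w, x)        # mix channels at every voxel
        return np.maximum(y, 0.0) if relu else y
    return block

encoder_clean = make_block(1, 8)                    # shared, transferable features
encoder_noisy = make_block(1, 8)                    # features specific to noisy scans
decoder_clean = make_block(8, 1, relu=False)        # produces the SDM output
decoder_clean_noisy = make_block(8, 1, relu=False)  # produces the output for noisy scans

def forward(x_clean, x_noisy):
    # Clean stream: shared encoder, then the clean decoder.
    out_sdm = decoder_clean(encoder_clean(x_clean))
    # Noisy stream: decompose into shared and specific features,
    # aggregate them by summation, then decode.
    shared = encoder_clean(x_noisy)
    specific = encoder_noisy(x_noisy)
    out_pm = decoder_clean_noisy(shared + specific)
    return out_sdm, out_pm

x_clean = rng.standard_normal((1, 4, 4, 4))
x_noisy = rng.standard_normal((1, 4, 4, 4))
out_sdm, out_pm = forward(x_clean, x_noisy)
print(out_sdm.shape, out_pm.shape)
```

Note that `encoder_clean` is applied to both inputs, which is what lets the clean scans shape the features later reused for the noisy scans.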
In this case, the features of $X_{noisy}$ are decomposed into two parts: $Encoder_{clean}$ aims to extract high-level, semantic and transferable features from both clean CT scans and COVID-19 CT scans, while $Encoder_{noisy}$ is designed to obtain the specific features which belong to the COVID-19 samples. The features of clean CT scans extracted by $Encoder_{clean}$ are fed into $Decoder_{clean}$. For COVID-19 CT images, the decomposed features are aggregated again via channel-wise summation and fed into $Decoder_{clean\&noisy}$ to reconstruct the volumetric airway structures.

As mentioned before, the annotation of the public airway datasets is overall coarse and unsatisfactory. Having determined which features to transfer, we further refine the transferable features to be more robust through feature recalibration and the signed distance map.

Feature Recalibration: The 3D Channel SE (cSE) module [20] is designed to investigate channel-wise attention. We embed this module between $Encoder_{clean}$ and $Decoder_{clean}$, aiming to refine the transferable features. Take $U$ as input and $\hat{U}$ as output, with $U, \hat{U} \in \mathbb{R}^{F \times D \times H \times W}$, where $F$ is the number of channels, $D$ the depth, $H$ the height, and $W$ the width. 3D cSE first compresses the spatial domain (written here as global average pooling, as in [4]) and then obtains the channel-wise dependencies $Z$, formulated as follows:

$$\bar{u}_f = \frac{1}{D H W} \sum_{d=1}^{D} \sum_{h=1}^{H} \sum_{w=1}^{W} U_{f,d,h,w}, \qquad Z = \sigma\big(W_2\, \delta(W_1 \bar{u})\big),$$

where $\delta(\cdot)$ denotes the ReLU function, $\sigma(\cdot)$ refers to the sigmoid activation, $W_1 \in \mathbb{R}^{\frac{F}{r_c} \times F}$, and $W_2 \in \mathbb{R}^{F \times \frac{F}{r_c}}$. $r_c$ represents the reduction factor in the channel domain, similar to [4]. The output of 3D cSE is obtained by channel-wise multiplication: $\hat{U} = U \odot Z$.

Signed Distance Map: In recent years, introducing the distance transform map into CNNs has proven effective in medical image segmentation tasks [6, 10, 19], owing to its attention to global structural information. The manual annotation of the plentiful tenuous bronchi is error-prone, and bronchi are often labeled thinner or thicker than they are.
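For reference, the 3D cSE recalibration described above can be sketched in NumPy. This is a minimal sketch rather than the authors' implementation: it assumes global average pooling for the spatial compression, omits bias terms, and uses random matrices for $W_1$ and $W_2$; the function name `channel_se_3d` and the sizes `F = 8`, `r_c = 2` are chosen only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_se_3d(u, w1, w2):
    """Channel-wise recalibration of a feature map u with shape (F, D, H, W).

    w1 has shape (F // r_c, F) and w2 has shape (F, F // r_c).
    """
    squeezed = u.mean(axis=(1, 2, 3))          # compress the spatial domain -> (F,)
    hidden = np.maximum(w1 @ squeezed, 0.0)    # ReLU on the reduced representation
    z = sigmoid(w2 @ hidden)                   # channel weights in (0, 1), shape (F,)
    u_hat = u * z[:, None, None, None]         # rescale every channel by its weight
    return u_hat, z

rng = np.random.default_rng(0)
F, r_c = 8, 2                                  # illustrative sizes (assumed)
u = rng.standard_normal((F, 4, 4, 4))
w1 = rng.standard_normal((F // r_c, F))
w2 = rng.standard_normal((F, F // r_c))
u_hat, z = channel_se_3d(u, w1, w2)
print(u_hat.shape, z.round(3))
```

Each channel is scaled by a single scalar, so the module changes the relative importance of channels without altering the spatial pattern inside a channel.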
3D FCNs trained with common loss functions treat all labeled foreground voxels equally and tend to fit such coarse labels, which makes it difficult to extract robust features. Even though the thickness of the annotated bronchi is uncertain, experienced radiologists can avoid breakage or leakage in the annotation. Therefore, the overall topology is correctly delineated, and we can use the topological structure instead of the coarse label as the supervision signal. Besides, the intra-class imbalance problem in airway segmentation is severe. The distance transform map is used to rebalance the distribution of trachea, main bronchi, lobar bronchi, and distal segmental bronchi. We use the signed distance map transform as a voxel-wise reweighting method, combined with a regression loss that focuses on the relatively small values (such as the lobar bronchi and distal segmental bronchi) by having larger gradient magnitudes there. Given the airway as the target structure and each voxel $x$ in the volume set $X$, we construct the Signed Distance Map (SDM) function $\phi(x)$, defined as:

$$\phi(x) = \begin{cases} 0, & x \in \text{airway and } x \in C, \\ -\inf_{z \in C} \lVert x - z \rVert_2, & x \in \text{airway and } x \notin C, \\ +\inf_{z \in C} \lVert x - z \rVert_2, & x \notin \text{airway}, \end{cases}$$

where $C$ represents the surface of the airway. We further normalize the SDM into $[-1, +1]$. We then transform the segmentation task on clean CT scans into an SDM regression problem and introduce a loss function that penalizes the predicted SDM for having the wrong sign and forces the 3D CNNs to learn more robust features that contain topological features of the airway. Denote $y_x$ as the ground truth of the SDM and $f_x$ as the prediction of the SDM; the loss function for the regression problem, $L_{reg}$, is defined on the difference between $f_x$ and $y_x$ using the $L_1$ norm $\lVert \cdot \rVert_1$, together with the sign penalty described above.
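The SDM construction and a regression loss of this kind can be sketched as follows. Several parts are assumptions rather than the paper's stated method: the surface $C$ is taken to be the foreground voxels with at least one background 6-neighbour, the normalization divides the inside and outside parts by their own maxima, and the loss uses the product-based form from the shape-aware SDM work [19], $\lVert f_x - y_x \rVert_1 - \frac{y_x f_x}{y_x f_x + y_x^2 + f_x^2}$, averaged over voxels. The brute-force distance search is only suitable for tiny volumes; a Euclidean distance transform such as `scipy.ndimage.distance_transform_edt` would normally replace it.

```python
import numpy as np

def surface_mask(mask):
    """Foreground voxels with at least one background 6-neighbour (assumed surface C)."""
    padded = np.pad(mask, 1, constant_values=False)
    core = (slice(1, -1),) * mask.ndim
    interior = np.ones_like(mask, dtype=bool)
    for axis in range(mask.ndim):
        for shift in (-1, 1):
            interior &= np.roll(padded, shift, axis=axis)[core]
    return mask & ~interior

def signed_distance_map(mask):
    """Normalized SDM: negative inside the airway, zero on C, positive outside.

    Assumes the mask contains at least one foreground voxel.
    """
    mask = mask.astype(bool)
    surface_pts = np.argwhere(surface_mask(mask))           # (S, 3)
    all_pts = np.argwhere(np.ones_like(mask))               # (N, 3), C order
    diff = all_pts[:, None, :] - surface_pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1)).min(axis=1).reshape(mask.shape)
    sdm = np.where(mask, -dist, dist)
    # Assumed normalization: scale inside and outside parts separately to [-1, 1].
    inside_scale = max(-sdm.min(), 1e-8)
    outside_scale = max(sdm.max(), 1e-8)
    return np.where(sdm < 0, sdm / inside_scale, sdm / outside_scale)

def sdm_regression_loss(pred, target, eps=1e-8):
    """L1 term minus a product term that rewards sign agreement (assumed form, after [19])."""
    l1 = np.abs(pred - target).mean()
    product = (target * pred) / (target * pred + target ** 2 + pred ** 2 + eps)
    return l1 - product.mean()

mask = np.zeros((7, 7, 7), dtype=bool)
mask[2:5, 2:5, 2:5] = True                                   # toy "airway": a 3x3x3 cube
sdm = signed_distance_map(mask)
perfect = sdm_regression_loss(sdm, sdm)
flipped = sdm_regression_loss(-sdm, sdm)
print(sdm[3, 3, 3], sdm[2, 2, 2], sdm[0, 0, 0], perfect < flipped)
```

A prediction with the wrong sign everywhere is penalized by both terms, which is the behaviour the text attributes to the regression loss.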
The training loss consists of two parts: the first is $L_{reg}$ for the clean CT scans, and the second is $L_{seg}$ for the noisy CT scans. We combine the Dice [9] and Focal [7] losses, computed between $g_x$ and $p_x$, to construct $L_{seg}$, where $g_x$ is the binary ground truth and $p_x$ is the prediction. The total loss is defined as $L_{total} = L_{seg} + L_{reg}$.

Dataset: We used two datasets to evaluate our method.

• Clean Domain: Binary Airway Segmentation (BAS) dataset [12]. It contains 90 CT scans (70 CT scans from LIDC [1] and 20 CT scans from the training set of the EXACT'09 dataset [8]). The spatial resolution ranges from 0.5 to 0.82 mm and the slice thickness ranges from 0.5 to 1.0 mm. We randomly split the 90 CT scans into a training set (50 scans), a validation set (20 scans), and a test set (20 scans).

• Noisy Domain: COVID-19 dataset. We collected CT scans of 58 COVID-19 patients from three hospitals, and the airway ground truth of each COVID-19 CT scan was corrected by three experienced radiologists. The spatial resolution of the COVID-19 dataset ranges from 0.58 to 0.84 mm and the slice thickness varies from 0.5 to 1.0 mm. The COVID-19 dataset is randomly divided into 10 scans for training and 48 scans for testing.

Table 1 (excerpt). Method: Length (%), Branch (%), DSC (%)
Feature Alignment [2]: 87.9 ± 4.9, 85.5 ± 4.8, 95.5 ± 1.6
Domain Adaptation (TPAMI, 2018) [13]: 87.0 ± 4.6, 84.9 ± 4.0, 96.0 ± 1.

Network Configuration and Implementation Details: As shown in Fig. 2, each block in the encoder or decoder contains two convolutional layers followed by pReLU and Instance Normalization [15]. The initial number of channels is set to 32, thus {F 1 , F 2 , F 3 . In the testing phase, we performed sliding-window prediction with a stride of 48. All the models were implemented in the PyTorch framework on a single NVIDIA GeForce RTX 3090 GPU (24 GB of graphics memory).

Evaluation Metrics: We adopted three metrics to evaluate the methods: a) tree length detected rate (Length) [8], b) branch detected rate (Branch) [8], and c) Dice similarity coefficient (DSC).
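Of the three metrics, the DSC is the simplest to state in code. The sketch below assumes binary masks and uses the standard definition $2|P \cap G| / (|P| + |G|)$; the function name `dice_score` and the toy masks are illustrative, and the tree length and branch detected rates, which need a skeleton and branch labeling of the reference airway tree, are not covered here.

```python
import numpy as np

def dice_score(pred, gt):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:                      # both masks empty: treat as perfect agreement
        return 1.0
    return 2.0 * np.logical_and(pred, gt).sum() / denom

gt = np.zeros((4, 4, 4), dtype=bool)
gt[1:3, 1:3, 1:3] = True                # 8 reference voxels
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:4] = True              # 12 predicted voxels, 8 of them correct
print(dice_score(pred, gt))             # 2 * 8 / (12 + 8) = 0.8
```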
All metrics are evaluated on the largest connected component of each airway segmentation result.

Quantitative Results: Training on the BAS dataset and then evaluating on the COVID-19 dataset performed worst, as expected. Training merely on the COVID-19 dataset performed better than training on both the BAS dataset and the COVID-19 dataset, which implies the necessity of transfer learning rather than merely pooling different datasets. 3D UNet with cSE [20] was trained on the COVID-19 dataset and showed no significant improvement. For comparison, three commonly used transfer learning methods were reimplemented for our task: Fine-tuning (pre-trained on the BAS dataset, fine-tuned on the COVID-19 dataset), Feature Alignment (FA) [2] through adversarial training, and Domain Adaptation (DA) by sharing weights [13]. The results in Table 1 demonstrate that the proposed method is superior to these methods, achieving the highest performance on all metrics: Length (92.1%), Branch (87.8%), and DSC (96.8%). We also conducted an ablation study to investigate the effectiveness of each component of the proposed method (Table 1).

Qualitative Results: The visualization of segmentation results is presented in Fig. 3. Compared to other methods, the proposed method improves on both the severe and mild cases of the COVID-19 dataset, accurately detecting more bronchi surrounded by the multifocal patchy shadowing of COVID-19.

This paper proposed a novel dual-stream network that learns transferable and robust features from clean CT scans and applies them to noisy CT scans for airway segmentation. Our method extracts not only the transferable clean features but also the unique noisy features separately, and the transferable features are further refined by the cSE module and SDM. Extensive experimental results showed that our proposed method accurately segmented more bronchi than other methods.
[1] The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): a completed reference database of lung nodules on CT scans
[2] Unsupervised bidirectional cross-modality adaptation via deeply synergistic image and feature alignment for medical image segmentation
[3] CT imaging features of 2019 novel coronavirus
[4] Squeeze-and-excitation networks
[5] Automatic airway segmentation in chest CT using convolutional neural networks
[6] Reducing the Hausdorff distance in medical image segmentation with convolutional neural networks
[7] Focal loss for dense object detection
[8] Extraction of airways from CT (EXACT'09)
[9] V-Net: Fully convolutional neural networks for volumetric medical image segmentation
[10] Shape-aware complementary-task learning for multi-organ segmentation
[11] Dual-sampling attention network for diagnosis of COVID-19 from community acquired pneumonia
[12] AirwayNet: a voxel-connectivity aware approach for accurate airway segmentation using convolutional neural networks
[13] Beyond sharing weights for deep domain adaptation
[14] Deep CORAL: Correlation alignment for deep domain adaptation
[15] Instance normalization: The missing ingredient for fast stylization
[16] A noise-robust framework for automatic segmentation of COVID-19 pneumonia lesions from CT images
[17] Prior-attention residual learning for more discriminative COVID-19 screening in CT images
[18] A weakly-supervised framework for COVID-19 classification and lesion localization from chest CT
[19] Shape-aware organ segmentation by predicting signed distance maps
[20] AnatomyNet: Deep learning for fast and fully automated whole-volume segmentation of head and neck anatomy

Acknowledgements. This work was partly supported by National Natural Sci-