title: A Teacher-Student Framework with Fourier Augmentation for COVID-19 Infection Segmentation in CT Images
authors: Chen, Han; Jiang, Yifan; Ko, Hanseok; Loew, Murray
date: 2021-10-13

Automatic segmentation of infected regions in computed tomography (CT) images is necessary for the initial diagnosis of COVID-19. Deep-learning-based methods have the potential to automate this task but require a large amount of data with pixel-level annotations. Training a deep network with annotated lung cancer CT images, which are easier to obtain, can alleviate this problem to some extent. However, this approach may suffer from a reduction in performance when applied to unseen COVID-19 images during the testing phase due to the domain shift. In this paper, we propose a novel unsupervised method for COVID-19 infection segmentation that aims to learn the domain-invariant features from lung cancer and COVID-19 images to improve the generalization ability of the segmentation network for use with COVID-19 CT images. To overcome the intensity shift, our method first transforms annotated lung cancer data into the style of unlabeled COVID-19 data using an effective augmentation approach via a Fourier transform. Furthermore, to reduce the distribution shift, we design a teacher-student network to learn rotation-invariant features for segmentation. Experiments demonstrate that even without access to the annotations of COVID-19 CT images during training, the proposed network can achieve state-of-the-art segmentation performance on COVID-19 images. The pandemic caused by the novel coronavirus disease that emerged at the start of 2020 has had significant worldwide medical and economic impacts [1, 2]. A recent report by the World Health Organization found that there had been more than 198 million confirmed cases and more than 4 million deaths globally by July 2021 [3]. To diagnose COVID-19, real-time reverse transcription-polymerase chain reaction (RT-PCR) tests and radiological imaging techniques, such as computed tomography (CT) and X-rays, are widely used. In particular, CT imaging plays a critical role in the early diagnosis and evaluation of COVID-19 [4, 5], with the segmentation of infected regions in CT scans providing essential information that can be used in the quantitative assessment of the progression of the disease [6, 7]. Recently, deep-learning-based automatic segmentation methods [8, 9, 10, 11, 12] have been proposed for use in COVID-19 CT image analysis, and these have achieved excellent results. Despite the promising results, however, these approaches all rely on large datasets with pixel-level annotations, and it is time-consuming and laborious to collect a sufficient number of COVID-19 CT images with annotations due to concerns over patient privacy and the lack of experts [13, 14, 15]. In contrast, collecting lung cancer datasets is relatively easy. Therefore, it is possible to utilize publicly available lung cancer databases to train a deep network for the detection of COVID-19 infections. For example, Jin et al. [16] developed a system using a large dataset from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) [17] to achieve multi-class classification diagnoses. Chen et al.
[18] also proposed a contrastive learning method, training an encoder to capture expressive feature representations from large lung datasets; they then employed a prototypical network for COVID-19 classification. However, these approaches were not able to produce any significant improvement in segmentation performance for COVID-19 infections, and few other studies have attempted to utilize lung cancer datasets for this purpose. Even though COVID-19 infections and lung cancer nodules exhibit similar manifestations to some degree in CT scans, models trained using CT images with lung nodules do not perform well when tested on COVID-19 CT images due to the domain shift between the two. The difference between pulmonary nodules and COVID-19 infections in CT scans is presented in Fig.1 and can be categorized as follows: (1) In terms of distribution, COVID-19 presents as a bilateral, patchy infection, while early-stage lung cancer is unilateral and oval in shape [19]; (2) In terms of intensity, there is also a clear difference due to the use of different scanners, scanning protocols, and subject populations [20]. In this paper, we consider infection segmentation in the context of the wide availability of lung cancer CT images with annotations, the availability of only a limited number of unlabeled COVID-19 CT images, and the domain shift between the two. We hypothesize that the features learned from pulmonary nodules in lung cancer CT images can improve the segmentation performance for COVID-19 diagnosis through alignment. For our segmentation network, we design a novel data augmentation method and training scheme. In order to address the intensity shift, we transform the lung cancer CT images into the style of COVID-19 CT images using an effective augmentation approach via a Fourier transform, which replaces the low-frequency components of the lung cancer CT images with those of COVID-19 CT images. Because the lung cancer CT images are labeled at the pixel level, the augmented images and corresponding annotations are used to train the end-to-end infection segmentation network. To overcome the distribution shift, we introduce a teacher-student learning paradigm to achieve robust feature learning. We treat our base network as a student network and introduce another teacher network. We then impose the same type of image transformation (e.g., rotation and affine transformation) on the input to the student network and on the output of the teacher network, respectively. The output predictions of the two networks are forced to be consistent under these transformations. We validate the effectiveness of our proposed method using public COVID-19 CT images. Experimentally, it outperforms various competing state-of-the-art approaches. The contributions of our work can be summarized as follows: • We propose a novel unsupervised COVID-19 infection segmentation network to distinguish the infected regions in COVID-19 CT images, whose training process requires only a large-scale labeled lung cancer CT dataset and unlabeled COVID-19 CT images. • We propose an effective data augmentation method to overcome the intensity shift between the lung cancer data and COVID-19 data using a Fourier transform and its inverse. • We build a segmentation framework that collaborates with a teacher-student network based on transformation-consistency learning, thus alleviating the distribution shift and allowing the network to learn robust features.
• The experimental results show that our proposed method achieves state-of-the-art performance on COVID-19 CT images. We also provide a comprehensive analysis of our approach. COVID-19 Infection Segmentation. Deep-learning-based segmentation methods have played an essential role in the fight against COVID-19. For example, Oulefki et al. [21] presented a multilevel thresholding process based on Kapur entropy to improve the COVID-19 segmentation performance. Zhou et al. [22] proposed to use spatial and channel attention to improve the representation ability of the network. Ouyang et al. [23] developed a 3D CNN network for COVID-19 infection segmentation and proposed a dual-sampling attention mechanism. To address the scarcity of well-labeled data, some studies constructed new networks suitable for small-scale or point-level labeled data. Fan et al. [8] presented a semi-supervised segmentation method based on a random selection propagation strategy, which required only a few labeled images and primarily utilized unlabeled data. Laradji et al. [24] proposed a new COVID-19 segmentation model using point-level rather than full pixel-level annotations, which overcame the labeling issue to some extent. However, all of these methods still rely on labeled COVID-19 data for training without considering the use of lung cancer data, which are easier to access. Domain Adaptation for Medical Segmentation. Deep-learning models often suffer from poor generalization across different domains, making them difficult to employ in a wide variety of clinical settings [25]. Domain adaptation has gained significant attention as a way to overcome this limitation. It seeks to transfer knowledge from a labeled data domain to a related but unlabeled domain in either image space or feature space, with the aim of improving the performance of the model when applied to multiple different datasets [26, 27]. Some studies [28, 29, 30, 31] have utilized self-learning to minimize discrepancies between the feature spaces. Other studies attempted adversarial training [32, 33, 34, 35, 36, 37], in which a discriminator is employed to align the distributions of the two domains. Strategies originally introduced for semi-supervised learning have also been explored in the context of domain adaptation. For example, in [38, 39, 40], teacher-student learning was used to minimize the discrepancy between the predictions from the inputs of the two domains. Teacher-Student Learning. The concept of teacher-student learning is based on distilling the knowledge of an ensemble of deep networks into a single network [41]. It employs unlabeled data to produce consistent predictions under perturbations [42] and has been widely used in segmentation tasks. Xu et al. [43] proposed a self-ensembling attention network consisting of a student model that acts as a base network and a teacher model serving as the ensembling network; the former learns from the output of the latter and tends to produce accurate predictions. Choi et al. [44] utilized a GAN to transform the labeled images into the same style as the unlabeled images and then introduced a teacher network to enhance performance in the segmentation of the unlabeled data. Yu et al. [39] proposed training a teacher-student model in which the teacher model produced noisy labels and corresponding uncertainty maps, while the student model was trained using the labels from the teacher model in consideration of label uncertainty.
Our method is designed to compensate for the scarcity of COVID-19 data by utilizing a large-scale lung cancer dataset. Specifically, we propose an unsupervised COVID-19 infection segmentation network with robust invariant-feature learning. We propose to transform the lung cancer data into the style of COVID-19 CT images for data augmentation to overcome the intensity shift. By constructing a teacher-student network with consistency learning and entropy minimization, our network can then learn those features that are robust to transformations. As a consequence, the distribution shift is alleviated to some degree. In this section, we provide an overview of our proposed method. We then illustrate our Fourier-transform-based data augmentation method. Finally, we describe our teacher-student training strategy and optimization objective. Fig.2 overviews our unsupervised COVID-19 infection segmentation network. The labeled lung cancer image and unlabeled COVID-19 CT image are denoted as {X_S, Y_S} and {X_T}, respectively. We first augment the input data by replacing the low-frequency information of F(X_S) with that of F(X_T). After the use of the inverse Fourier transformation, we obtain {X_{S→T}, Y_S}, which retains the semantic information of the lung cancer image but in the COVID-19 style. A teacher and a student network are employed in our framework, with the former acting as an ensemble network and the latter acting as the base network. After data augmentation, we feed X_{S→T} into the student network and obtain the pixel-wise segmentation prediction. X_T is sent to both the student and teacher networks. Here, we apply the same transformation to the input of the student network and to the output of the teacher network, then align the two-stream outputs with the consistency loss. Back-propagation only occurs for the student network, and the weights of the teacher network are updated using the exponential moving average (EMA) of the weights of the student network. To address the intensity shift between the lung cancer and COVID-19 CT images, our solution employs a Fourier transform and replaces the low-frequency spectrum information of the lung cancer images with that of the COVID-19 images. In this way, we make it possible to disentangle the low-level distribution (i.e., the intensity information) from the high-level semantic information (i.e., the object content) of an image and transfer the former to another image.

Figure 2: Overview of the proposed method. It consists of two components, a student network and a teacher network, which share the same architecture. The dashed lines and solid lines represent the data flow for the lung cancer and COVID-19 images, respectively. The student network is trained by the weighted combination of the dice loss, consistency loss, and entropy loss. The weights of the teacher network are the exponential moving average of those of the student network. τ(·) represents an elastic transformation operation with the same parameters in every iteration.

Specifically, given an image X ∈ R^{H×W×C} (C = 1 for a single-channel input), we can calculate its Fourier transform using the FFT algorithm as

F(X)(u, v) = Σ_{h=0}^{H−1} Σ_{w=0}^{W−1} X(h, w) e^{−j2π(hu/H + wv/W)}.   (1)

We further decompose F(X) into an amplitude spectrum A(X) ∈ R^{H×W×C} and a phase spectrum P(X) ∈ R^{H×W×C}, which represent the intensity distribution and the semantic content of the image, respectively. We then denote a binary mask M_α = 1_{(h,w) ∈ [−αH:αH, −αW:αW]}, whose value is 1 in the central low-frequency region (taking the center of the spectrum as (0, 0)) and 0 elsewhere; it controls the scale of the amplitude spectrum to be replaced.
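To make the augmentation concrete, the following is a minimal NumPy sketch of the procedure, anticipating the amplitude swap formalized in Eq. (2) below. The function name, the fftshift-centered mask construction, and the default α value are our own illustrative choices rather than the authors' released code.

```python
import numpy as np

def fourier_augment(x_src, x_tgt, alpha=0.005):
    """Transfer the low-frequency amplitude of x_tgt into x_src.

    x_src, x_tgt: 2-D float arrays of the same shape (H, W).
    alpha: fraction of the spectrum to swap (0.005 is assumed here,
    matching the hyper-parameter reported later in the paper).
    """
    # Forward FFT; shift the zero-frequency component to the center
    # so that the low frequencies form a central block.
    f_src = np.fft.fftshift(np.fft.fft2(x_src))
    f_tgt = np.fft.fftshift(np.fft.fft2(x_tgt))

    # Decompose into amplitude and phase spectra.
    a_src, p_src = np.abs(f_src), np.angle(f_src)
    a_tgt = np.abs(f_tgt)

    # Binary mask M_alpha: replace the central low-frequency block
    # of the source amplitude with that of the target amplitude.
    h, w = x_src.shape
    ch, cw = h // 2, w // 2
    bh, bw = max(1, int(alpha * h)), max(1, int(alpha * w))
    a_src[ch - bh:ch + bh, cw - bw:cw + bw] = \
        a_tgt[ch - bh:ch + bh, cw - bw:cw + bw]

    # Recombine the swapped amplitude with the source phase and invert.
    f_mix = a_src * np.exp(1j * p_src)
    x_aug = np.fft.ifft2(np.fft.ifftshift(f_mix))
    return np.real(x_aug)
```

Because only the amplitude is modified, the augmented image keeps the source phase, and hence the source semantics, while adopting the target's global intensity statistics.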
Given A(X_S) from the lung cancer image and A(X_T) from the COVID-19 image, we apply the mask M_α to them and generate a new amplitude spectrum:

A(X_{S→T}) = M_α ∘ A(X_T) + (1 − M_α) ∘ A(X_S),   (2)

where ∘ denotes element-wise multiplication. After obtaining the transformed amplitude spectrum A(X_{S→T}), we combine it with the phase spectrum P(X_S) of the lung cancer image X_S and conduct the inverse Fourier transform F^{−1} to generate the augmented image X_{S→T}:

X_{S→T} = F^{−1}([A(X_{S→T}), P(X_S)]),   (3)

where the semantic content of X_{S→T} is the same as that of X_S but follows the intensity distribution of X_T. In this way, we obtain the augmented lung cancer data {X_{S→T}, Y_S} with a similar intensity to that of the COVID-19 images; this augmented data is utilized for further training and can effectively alleviate the intensity shift between the two domains. Even though the augmented data X_{S→T} have a similar style to X_T, a difference in distribution remains. To account for this distribution shift between the lung cancer nodules and the COVID-19 infections, we build a teacher-student network with a transformation-consistency learning strategy. The teacher and student networks follow the same U-Net architecture [45] but are updated in different ways. Specifically, the augmented image X_{S→T} for lung cancer is only sent to the student network, which outputs the pixel-wise segmentation result Ŷ_S; this result is then utilized to calculate the following dice loss [46]:

L_{Dice} = 1 − (2 Σ_n p_n g_n + ε) / (Σ_n p_n + Σ_n g_n + ε),   (4)

where g_n and p_n represent the ground truth for the input and the predicted probabilistic map, respectively, with a background class probability of 1 − p_n. The term ε is used to avoid division by zero. Updating the base student network with the segmentation loss in Equation (4) leads to good segmentation performance for lung cancer images. However, when testing on the COVID-19 images, the performance will be significantly lower due to the distribution shift. To resolve this problem, we introduce a teacher network and utilize the unlabeled COVID-19 CT images to guide the student network to learn robust features. As shown in Fig.2, we first impose an elastic transformation τ(·) on X_T; the student network then takes the transformed τ(X_T) as input and produces the predicted segmentation map. For the teacher network, we apply the same transformation τ(·) to its prediction map. We then define the consistency loss for the predictions:

L_{con} = ‖ f_s(τ(X_T)) − τ(f_t(X_T)) ‖²,   (5)

where f_s(·) and f_t(·) denote the predictions of the student and teacher networks, respectively. To more firmly bridge the gap between the labeled lung cancer image and the COVID-19 image, we introduce entropy minimization to force the model's decision boundary toward a high prediction certainty for the teacher network's prediction from the COVID-19 input. Given a COVID-19 CT image X_T, the entropy loss is calculated as follows:

L_{ent} = −(1/(HW)) Σ_{h,w} [ p^{(h,w)} log p^{(h,w)} + (1 − p^{(h,w)}) log(1 − p^{(h,w)}) ],   (6)

where p^{(h,w)} is the predicted infection probability at pixel (h, w). We update the student network with a combination of the dice loss, consistency loss, and entropy loss. The optimization objective can thus be formulated as follows:

L = L_{Dice} + λ L_{con} + L_{ent},   (7)

where λ is the hyper-parameter acting as a weight for the consistency loss. Unlike the student network, the teacher network does not participate in the backpropagation and is updated via the exponential moving average (EMA) of the weights of the current student network at each step:

θ_{t,i} = β θ_{t,i−1} + (1 − β) θ_{s,i},   (8)

where θ_{t,i} and θ_{s,i} represent the weights of the teacher and student networks at training step i, respectively, and β is a hyper-parameter for the exponential moving average decay. The ensemble of the teacher and student networks makes it possible to train the network with the unlabeled COVID-19 CT images. Moreover, through the regularization of the consistency loss, the student network can learn from the teacher network output.
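The training step described above can be summarized in the following PyTorch-style sketch. It is a simplified single-step illustration under our own assumptions: binary (sigmoid) outputs, a mean-squared-error form for the consistency loss of Eq. (5), and the entropy term of Eq. (6) computed on the student's target-domain prediction so that gradients can flow (the text attributes it to the teacher's prediction, which carries no gradient here); `tau` stands for the elastic transformation τ(·).

```python
import torch
import torch.nn.functional as F

def ema_update(teacher, student, beta=0.99):
    # Eq. (8): theta_t <- beta * theta_t + (1 - beta) * theta_s
    with torch.no_grad():
        for p_t, p_s in zip(teacher.parameters(), student.parameters()):
            p_t.mul_(beta).add_(p_s, alpha=1.0 - beta)

def training_step(student, teacher, optimizer, x_aug, y_src, x_tgt, tau, lam):
    """One optimization step of the teacher-student framework (sketch).

    x_aug, y_src: Fourier-augmented lung cancer image and its mask.
    x_tgt: unlabeled COVID-19 image; tau: a fixed spatial transform;
    lam: the ramped-up consistency weight (lambda in Eq. (7)).
    """
    eps = 1e-6

    # Dice loss (Eq. (4)) on the labeled, augmented source image.
    p_src = torch.sigmoid(student(x_aug))
    inter = (p_src * y_src).sum()
    dice_loss = 1 - (2 * inter + eps) / (p_src.sum() + y_src.sum() + eps)

    # Transformation consistency (Eq. (5)): the student sees tau(x),
    # while tau is applied to the teacher's prediction instead.
    p_stu = torch.sigmoid(student(tau(x_tgt)))
    with torch.no_grad():
        p_tea = tau(torch.sigmoid(teacher(x_tgt)))
    con_loss = F.mse_loss(p_stu, p_tea)

    # Entropy minimization (Eq. (6)), computed here on the student's
    # prediction so that it contributes gradients to the student.
    ent_loss = -(p_stu * torch.log(p_stu + eps)
                 + (1 - p_stu) * torch.log(1 - p_stu + eps)).mean()

    # Combined objective (Eq. (7)); only the student back-propagates.
    loss = dice_loss + lam * con_loss + ent_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # The teacher follows the student via EMA instead of gradients.
    ema_update(teacher, student, beta=0.99)
    return loss.item()
```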
The networks are then regularized to be transformation-consistent, thus increasing the generalization capacity and robustness to the distribution shift between the lung cancer images and the unlabeled COVID-19 images. Datasets. The lung cancer data come from the LIDC-IDRI lung cancer dataset [17], which is currently the largest CT dataset for pulmonary nodule detection. It provides a large number of chest CT images that share similarities with COVID-19 CT images. The LIDC-IDRI dataset contains 1018 cases, each of which includes images from a clinical thoracic CT scan along with pixel-wise annotations from experienced radiologists. The COVID-19 CT images come from http://medicalsegmentation.com/covid19/ and were collected by the Italian Society of Medical and Interventional Radiology. The collection contains 26 CT volumes from confirmed COVID-19 patients, and each volume contains ∼200 slices. The data processing for the above two datasets is detailed as follows. For LIDC-IDRI, we select subjects with lung nodules and generate a corresponding ground-truth mask based on each patient's XML file. For robust training, we exclude the scans in which every nodule occupies fewer than 200 pixels. This leaves a total of 2,438 annotated slices from lung cancer CT images that are used as source data for training. For the COVID-19 CT images, we reformat all of the 3D volumes into 2D slices with a size of 512×512 to produce a total of 1,616 slices. We employ 70% of these slices as the unlabeled target data for training, while the remaining 30% are used to test segmentation performance. We follow a patient-level split rule when separating the target data into the training and test sets. Implementation details. Our network is implemented using PyTorch on an NVIDIA RTX 2080Ti GPU and an Intel(R) Core i7-9700K CPU. The batch size is set to 1. We use an Adam optimizer with an initial learning rate of 6e-4 and a weight decay of 0.0005. The hyper-parameters are set at α = 0.005 and β = 0.99. The consistency weight is updated using a sigmoid ramp-up of λ = 1.5·e^{−5(1−p)²}, where p is the progress of the training epochs normalized to a range of 0 to 1. The structure of the student and teacher networks follows the U-Net architecture [45]. Evaluation metrics. For quantitative evaluation, we adopt the three most commonly used metrics in medical imaging analysis: the dice similarity coefficient (Dice), sensitivity (Sen), and specificity (Spe) [47, 46]. The dice similarity coefficient is an overlap index that indicates the similarity between the prediction and the ground truth. Sensitivity and specificity are two statistical measures of the performance of binary medical image segmentation tasks. The former measures the percentage of actual positive pixels correctly predicted to be positive, while the latter measures the proportion of actual negative pixels correctly predicted to be negative. These metrics are defined as follows:

Dice = 2TP / (2TP + FP + FN),   (9)
Sen = TP / (TP + FN),   (10)
Spe = TN / (TN + FP),   (11)

where TP, FP, TN, and FN represent the number of true positive, false positive, true negative, and false negative pixels in the prediction, respectively. In this section, we compare the COVID-19 infection segmentation performance of the proposed method with the source-only (baseline) model and the state-of-the-art segmentation methods FDA [31], MinEnt [37], and AdvEnt [37]. Quantitative results. Table 1 shows the quantitative results for each method, reported as the mean ± error interval (calculated based on a 95% confidence interval).
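For reference, the sketch below computes the three metrics of Eqs. (9)-(11) from pixel counts, together with the sigmoid ramp-up for the consistency weight quoted above; the function names and the small epsilon guard are our own additions.

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Dice, sensitivity, and specificity for binary masks (Eqs. (9)-(11)).

    pred, gt: boolean NumPy arrays of the same shape.
    """
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    eps = 1e-8  # guards against division by zero on empty masks
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    sen = tp / (tp + fn + eps)
    spe = tn / (tn + fp + eps)
    return dice, sen, spe

def consistency_weight(epoch, total_epochs, w_max=1.5):
    """Sigmoid ramp-up: lambda = 1.5 * exp(-5 * (1 - p)^2)."""
    p = epoch / total_epochs  # training progress normalized to [0, 1]
    return w_max * np.exp(-5.0 * (1.0 - p) ** 2)
```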
Our proposed method outperforms the other methods across most metrics. For example, our method produces a 9.54% improvement in the dice similarity coefficient over the source-only method, which trains the U-Net using lung cancer data in a supervised manner; this confirms the effectiveness of our data augmentation and consistency learning. FDA [31] also transforms lung cancer images into the style of COVID-19 images, which helps to reduce the intensity shift to some degree, but it still suffers from poor generalization because it does not consider the distribution shift. Unlike the entropy-based methods MinEnt [37] and AdvEnt [37], our method attempts to learn domain-invariant features for segmentation. Fig.3 shows that our proposed method produces the most impressive results overall, proving that the teacher-student structure can yield a more robust segmentation performance. Qualitative results. Fig.4 presents the segmentation results (red regions) for our method and other approaches as a comparison. The source-only method is able to distinguish small infected regions but suffers from poor generalization for large-scale infections, such as case (b), because it is only trained with the LIDC-IDRI lung cancer dataset, in which the pulmonary nodules are relatively small and oval in shape. Similar to our method, FDA [31] also utilizes unlabeled data and transforms the labeled images into the style of the unlabeled images for supervised training, thus overcoming the intensity shift. However, it cannot account for the distribution shift, and thus performs poorly for case (d). MinEnt [37] utilizes entropy minimization with the segmentation map to overcome the shift between the labeled and unlabeled data but fails to capture the fine-grained details in case (b). AdvEnt [37] relies on adversarial training with the entropy map and is unable to handle the large-scale infections in cases (b) and (d). Overall, our proposed method performs better than the other methods and is consistently closer to the ground-truth COVID-19 infected region, demonstrating that our method effectively utilizes the unlabeled COVID-19 images and improves segmentation performance.

Figure 4: Qualitative results for the segmentation task. Column 1 presents the input COVID-19 CT images with the ground truth marked in red, while columns 2 to 6 show the segmentation results for the source-only approach, FDA [31], MinEnt [37], AdvEnt [37], and our proposed method.

This section describes the ablation analysis, the purpose of which is to assess the importance of each component and to visualize the feature distribution learned by our method. Contribution of each component. We validate the effect of each component of our method by analyzing the performance of the following setups: (1) Source-only, which includes only the student network and is trained with the lung cancer data in a supervised manner; (2) w/o aug., in which the data augmentation process that is essential for overcoming the intensity shift is removed; (3) w/o con., in which the teacher network is removed, corresponding to λ = 0 and the consistency loss not being used to update the student network; (4) w/o ent., in which the entropy minimization loss is not used.
As shown in Table 2, the exclusion of any of the components, especially the teacher network, leads to a drop in performance, thus confirming that these components play an important role in the performance of our proposed method. In particular, the data augmentation lays a foundation for alleviating the intensity shift, while the teacher-student training scheme allows robust transformation-invariant features to be effectively exploited. Visualization of feature distributions via t-SNE. We analyze our proposed method by visualizing the feature representations using t-SNE [48]. We input the lung cancer and COVID-19 images to the trained baseline model and to our proposed network and visualize the output feature maps for these two groups of data. As shown in Fig.5, for the baseline model, the feature distributions of the lung cancer images are separated from those of the COVID-19 images because supervision only occurs with the lung cancer data. Thus, the features learned by the baseline cannot improve the segmentation performance for COVID-19 images. In contrast, our method projects the feature distributions of the two datasets into an overlapping space, illustrating the effectiveness of our scheme in aligning the two distributions. This indicates that robust domain-invariant features can be learned by our proposed network.

Figure 5: Visualization of the feature distributions via t-SNE for the two datasets. The blue and red dots represent the high-dimensional feature representations of the lung cancer and COVID-19 CT images, respectively.

In this paper, we proposed a novel teacher-student-based framework for unsupervised COVID-19 infection segmentation in CT images. We attempted to address a challenging situation in which there are no annotations for COVID-19 CT images, but annotations for lung CT with pulmonary nodules are available. Given the differences between pulmonary nodules and COVID-19 infections, we introduced a Fourier-transform-based augmentation method to alleviate the intensity shift. We further constructed a teacher-student network that utilizes the consistency loss and entropy loss to allow the network to learn robust features, thus overcoming the distribution shift to some extent. Experiments on the COVID-19 CT dataset demonstrated that the proposed method produces a competitive performance when compared with state-of-the-art methods.
References
A novel coronavirus outbreak of global health concern
Review of artificial intelligence techniques in imaging data acquisition, segmentation, and diagnosis for COVID-19
CT imaging of the 2019 novel coronavirus (2019-nCoV) pneumonia
Serial quantitative chest CT assessment of COVID-19: a deep learning approach
Longitudinal assessment of COVID-19 using a deep learning-based quantitative CT pipeline: illustration of two cases
Inf-Net: Automatic COVID-19 lung infection segmentation from CT images
A noise-robust framework for automatic segmentation of COVID-19 pneumonia lesions from CT images
Dual-branch combination network (DCN): Towards accurate diagnosis and lesion segmentation of COVID-19 using CT images
A weakly supervised consistency-based learning method for COVID-19 segmentation in CT images
Anam-Net: Anamorphic depth embedding-based lightweight CNN for segmentation of anomalies in COVID-19 chest CT images
Sharing clinical data electronically: a critical challenge for fixing the health care system
Preserving patient privacy while training a predictive model of in-hospital mortality
Transfer learning from partial annotations for whole brain segmentation, in: Domain Adaptation and Representation Transfer and Medical Image Learning with Less Labels and Imperfect Data
Development and evaluation of an artificial intelligence system for COVID-19 diagnosis
The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): a completed reference database of lung nodules on CT scans
Momentum contrastive learning for few-shot COVID-19 diagnosis from chest CT images
COVID-19 and early-stage lung cancer both featuring ground-glass opacities: a propensity score-matched study
Domain adaptation for medical image analysis: a survey
Automatic COVID-19 lung infected region segmentation and measurement using CT-scans images
An automatic COVID-19 CT segmentation based on U-Net with attention mechanism
Dual-sampling attention network for diagnosis of COVID-19 from community acquired pneumonia
A weakly supervised consistency-based learning method for COVID-19 segmentation in CT images
Domain adaptation for segmentation of critical structures for prostate cancer therapy
Advancing medical imaging informatics by deep learning-based domain adaptation
Few-shot learning for CT scan based COVID-19 diagnosis
Self-learning to detect and segment cysts in lung CT images without manual annotation
Self-learning AI framework for skin lesion image segmentation and classification
Robust pancreatic ductal adenocarcinoma segmentation with multi-institutional multi-phase partially-annotated CT scans
FDA: Fourier domain adaptation for semantic segmentation
Adversarial training and dilated convolutions for brain MRI segmentation, in: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support
Deep adversarial training for multi-organ nuclei segmentation in histopathology images
Shape constrained fully convolutional DenseNet with adversarial training for multi-organ segmentation on head and neck CT and low-field MR images
Tumor-aware, adversarial domain adaptation from CT to MRI for lung cancer segmentation
Domain adaptation based COVID-19 CT lung infections segmentation network
AdvEnt: Adversarial entropy minimization for domain adaptation in semantic segmentation
Semi-supervised brain lesion segmentation with an adapted mean teacher model
Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation
Local and global structure-aware entropy regularized mean teacher model for 3D left atrium segmentation
Model compression
There are many consistent explanations of unlabeled data
Self-ensembling attention networks: Addressing domain shift for semantic segmentation
Self-ensembling with GAN-based data augmentation for domain adaptation in semantic segmentation
U-Net: Convolutional networks for biomedical image segmentation
V-Net: Fully convolutional neural networks for volumetric medical image segmentation
Evaluation of segmentation algorithms for medical imaging
Visualizing data using t-SNE