title: Exploiting Shared Knowledge From Non-COVID Lesions for Annotation-Efficient COVID-19 CT Lung Infection Segmentation
date: 2021-08-20
journal: IEEE J Biomed Health Inform
DOI: 10.1109/jbhi.2021.3106341

The novel coronavirus disease (COVID-19) is highly contagious and has spread all over the world, posing an extremely serious threat to all countries. Automatic lung infection segmentation from computed tomography (CT) plays an important role in the quantitative analysis of COVID-19. However, the major challenge lies in the inadequacy of annotated COVID-19 datasets. Currently, there are several public non-COVID lung lesion segmentation datasets, providing the potential for generalizing useful information to the related COVID-19 segmentation task. In this paper, we propose a novel relation-driven collaborative learning model to exploit shared knowledge from non-COVID lesions for annotation-efficient COVID-19 CT lung infection segmentation. The model consists of a general encoder to capture general lung lesion features based on multiple non-COVID lesions, and a target encoder to focus on task-specific features of COVID-19 infections. We develop a collaborative learning scheme to regularize feature-level relation consistency of given input and encourage the model to learn more general and discriminative representations of COVID-19 infections. Extensive experiments demonstrate that, trained with limited COVID-19 data, exploiting shared knowledge from non-COVID lesions can further improve state-of-the-art performance by up to 3.0% in dice similarity coefficient and 4.2% in normalized surface dice. In addition, experimental results on a large-scale 2D dataset of CT slices show that our method significantly outperforms cutting-edge segmentation methods on all metrics. Our method promotes new insights into annotation-efficient deep learning and illustrates strong potential for real-world applications in the global fight against COVID-19 in the absence of sufficient high-quality annotations.

Since the beginning of 2020, the novel coronavirus disease (COVID-19) has rapidly spread worldwide, posing an extremely serious threat and challenge to all countries. This severe disease has been declared a public health emergency of international concern by the World Health Organization (WHO) and has caused more than 2,600,000 deaths as of 9 March 2021, according to the statistics of the Johns Hopkins Coronavirus Resource Center.1 As one of the most commonly used imaging methods, computed tomography (CT) plays an important role in the fight against COVID-19 [1]-[3]. Researchers have shown that CT images can capture typical features such as ground-glass opacities and bilateral patchy shadows of affected patients [4], and CT has proven more sensitive than standard viral nucleic acid detection using real-time polymerase chain reaction (RT-PCR) for the early diagnosis of COVID-19 infection [5]. Besides, CT images can provide visual evaluation of the extent of lung abnormalities and assist in prognosis [6]. In clinical practice, the segmentation of lung infections from CT images is an important component to assist in further assessment and quantification of the disease [7].
Since manual contour delineation is time-consuming and laborious, and suffers from inter- and intra-observer variability [8], it is of great significance to develop artificial intelligence-based approaches to assist in the automatic segmentation of COVID-19 infections. Recently, the unprecedented development of deep learning has shown significant improvements and achieved state-of-the-art performance in many medical image segmentation tasks [9]-[12], and deep neural networks have been widely applied in the global fight against COVID-19 [13]-[16]. However, the success of deep learning methods mainly relies on large amounts of high-quality annotated data, while it is impractical to collect such data in real clinical practice, especially when radiologists are busy fighting the coronavirus disease. Additionally, as shown in Fig. 1, the large variations in shape, size and position of lung infections and the large inter-case variations pose great challenges for the segmentation task [17]. Therefore, exploring annotation-efficient COVID-19 lung infection segmentation methods with limited labeled data has become an urgent need, especially in the current situation.

Currently, there are several public non-COVID lung lesion datasets arising from other clinical practices, such as MSD Lung for segmentation of lung tumors and NSCLC Pleural Effusion for segmentation of pleural effusion. These non-COVID datasets may offer the potential to generalize useful information to the related COVID-19 infection segmentation task. Wang et al. [18] have shown that pre-training on non-COVID datasets can improve COVID-19 infection segmentation performance. However, the improvement from transfer learning is not stable when encountering large domain differences between datasets, and shared knowledge between COVID-19 and non-COVID lung lesions cannot be fully exploited.

To address these challenges, we propose a novel relation-driven collaborative learning model for annotation-efficient COVID-19 CT lung infection segmentation by exploiting shared knowledge from non-COVID lesions. The network consists of two encoders with the same architecture and a shared decoder. The general encoder is adopted to capture general lung lesion features based on multiple non-COVID lesions, while the target encoder is adopted to focus on task-specific features of COVID-19 infections. Features extracted from the two parallel encoders are concatenated for the subsequent decoder part. To exploit shared knowledge between COVID and non-COVID lesions, we develop a collaborative learning scheme to regularize the relation consistency between extracted features of given input. Our method enforces the consistency of feature relations among extracted features and encourages the model to explore semantic information from both COVID-19 and non-COVID cases. Besides, the scheme can also be extended to utilize unlabeled COVID-19 data for feature relation regularization and achieve more consistent and robust learning.

The contributions of this work are summarized as follows:
- We propose a novel relation-driven collaborative learning model for annotation-efficient segmentation of COVID-19 lung infections from CT images by leveraging shared knowledge from non-COVID lesions to improve the segmentation performance of COVID-19 infections with limited training data.
- We present a collaborative learning scheme to explore general semantic information from both COVID-19 and non-COVID cases by regularizing feature-level relation consistency of given input, so as to encourage the model to learn more general and discriminative representations of COVID-19 infections for better segmentation performance. The scheme can also be extended to utilize unlabeled COVID-19 data for the regularization to achieve more consistent and robust learning.
- We have conducted extensive experiments on two COVID-19 datasets and two non-COVID lung lesion datasets for 2D and 3D segmentation tasks. The results show that our method achieves superior segmentation performance compared with other methods in the absence of sufficient high-quality COVID-19 data.

In this section, we briefly review the research related to our work. We first review works on annotation-efficient deep learning for medical image segmentation. Then we review existing works on COVID-19 segmentation and transfer learning approaches for COVID-19.

Compared with natural images, annotations of medical images are much harder and more expensive to acquire due to the following problems: 1) annotating medical images heavily relies on the professional diagnostic knowledge of radiologists; 2) most modalities of medical images, like CT, are 3D volumes, which take much more time and labor to annotate. To alleviate annotation scarcity, annotation-efficient methods have received great attention in the medical image analysis community [19], [20]. For example, semi-supervised learning aims at learning from a limited amount of labeled data and a large amount of unlabeled data, which is an effective way to explore knowledge from the unlabeled data [21]. Weakly supervised learning explores the use of weak annotations like noisy annotations and sparse annotations [22]. Besides, some approaches also aim at integrating multiple related datasets to learn general knowledge [23], [24]. To address the problem of limited labeled COVID-19 data, in this work we aim at utilizing existing non-COVID lung lesion datasets to generalize useful information to the related COVID-19 task, so as to achieve better segmentation performance with limited in-domain training data.

Automatic segmentation of COVID-19 infections from CT images is a crucial step for quantifying disease progression. Recently, several approaches have been proposed for COVID-19 lung infection segmentation.

Fig. 2. The overview of our proposed relation-driven collaborative learning model, where green and blue represent the data flow of the general encoder and target encoder for COVID-19 infection segmentation, respectively. Extracted features from these two parallel encoders are concatenated as the input of the shared decoder. To exploit shared knowledge from non-COVID cases, an additional data flow in orange is adopted. By regularizing feature-level relation consistency of given input, the model is encouraged to explore semantic information from both COVID-19 and non-COVID cases. Since the general encoder is applied to utilize non-COVID data and assist in the learning of COVID-19 segmentation as an auxiliary branch, we only employ the skip connections between the target encoder and shared decoder for the fusion of multi-scale features, as shown in the top right corner.

Shan et al. [25] propose a deep learning-based system for automatic segmentation and quantification of infection regions. Amyar et al.
[26] propose to improve segmentation performance with a multi-task learning approach. Xie et al. [27] propose a relational approach that leverages structured relationships by introducing a novel non-local neural network module to learn both visual and geometric relationships. Zhou et al. [28] propose a U-Net based segmentation network that incorporates spatial and channel attention for better feature representation. Other than fully supervised learning, Zheng et al. [29] develop a weakly-supervised approach to investigate the potential for automatic detection of COVID-19 based on patient-level labels. Fan et al. [30] present a lung infection segmentation network for 2D CT slices with a semi-supervised strategy. Wang et al. [31] propose a noise-robust framework to learn from noisy labels for the pneumonia lesion segmentation task. Yao et al. [32] use a set of operations to synthesize lesion-like appearances for label-free segmentation. Ma et al. [33] propose an active contour regularized framework using region-scalable fitting to regularize and refine pseudo labels for semi-supervised infection segmentation.

Transfer learning aims to leverage knowledge and latent features from other datasets by pre-training models on large datasets and fine-tuning the trained models on downstream tasks. Due to the problem of limited COVID-19 data, several transfer learning methods have been applied. For example, Chouhan et al. [34] propose an ensemble model to combine outputs from five models pre-trained on ImageNet. Majeed et al. [35] adopt a transfer learning procedure and propose a simple CNN architecture with a small number of parameters to distinguish COVID-19 from normal X-rays. Misra et al. [36] propose a multi-channel pre-trained ResNet architecture to facilitate the diagnosis of COVID-19. For segmentation of COVID-19 infections, Wang et al. [18] evaluate different transfer learning methods and reveal the benefits of transferring knowledge from non-COVID lung lesions. However, transfer learning only takes advantage of existing models, and non-COVID cases are not utilized in the training procedure of the downstream COVID-19 segmentation task. Different from these existing methods, our method aims at learning from COVID-19 and non-COVID lung lesions collaboratively to exploit shared semantic information.

In this section, we first introduce the overview of our proposed method. Then we provide details of our relation-driven collaborative learning scheme and the overall training procedure.

An overview of our proposed framework is shown in Fig. 2. Following the design of the standard U-Net [37], [38], our network consists of two encoders with the same architecture and a shared decoder. Since the encoder serves as a contracting path to extract image contextual features, the upper one, named the general encoder (G), is adopted to capture general lung lesion features based on multiple non-COVID lesions, and the lower one, named the target encoder (T), is adopted to focus on task-specific features of the target COVID-19 infection segmentation task. After that, extracted features from these two parallel encoders are concatenated as the input of the decoder. The shared decoder (D) serves as a symmetric expanding path to recover the spatial information of the extracted features. Since our motivation is to use non-COVID lesions to assist in the segmentation of COVID-19 infections, the general encoder can be seen as an auxiliary branch to extract shared knowledge. Therefore, we only employ the skip connections between the target encoder and shared decoder for the fusion of multi-scale features, as shown in the top right corner of Fig. 2.
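To make the architecture concrete, the following is a minimal PyTorch sketch of the dual-encoder design described above. The tiny convolutional blocks stand in for the nnU-Net-planned U-Net backbones used in the paper, and all class and variable names are illustrative assumptions rather than the authors' released implementation.

```python
# Minimal sketch of the dual-encoder network: general encoder G, target
# encoder T, shared decoder D; only T's skip connections reach the decoder.
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Two-stage toy encoder returning a bottleneck map and one skip feature."""
    def __init__(self, in_ch=1, ch=16):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(ch, 2 * ch, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        skip = self.stage1(x)
        bottleneck = self.stage2(skip)
        return bottleneck, skip

class TinyDecoder(nn.Module):
    """Upsamples the fused bottleneck and fuses the target encoder's skip."""
    def __init__(self, ch=16, n_classes=2):
        super().__init__()
        self.up = nn.ConvTranspose2d(4 * ch, ch, 2, stride=2)  # 4*ch: concatenated G and T
        self.head = nn.Conv2d(2 * ch, n_classes, 1)            # 2*ch: upsampled + skip

    def forward(self, fused, skip):
        x = self.up(fused)
        return self.head(torch.cat([x, skip], dim=1))

class DualEncoderSegNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder_g = TinyEncoder()  # general encoder G (non-COVID lesions)
        self.encoder_t = TinyEncoder()  # target encoder T (COVID-19 infections)
        self.decoder = TinyDecoder()    # shared decoder D

    def forward(self, x):
        f_g, _ = self.encoder_g(x)       # general features; G's skips are unused
        f_t, skip_t = self.encoder_t(x)  # task-specific features + skip connection
        fused = torch.cat([f_g, f_t], dim=1)
        return self.decoder(fused, skip_t)

# Example: a 2D CT slice batch of shape (B, 1, H, W).
logits = DualEncoderSegNet()(torch.randn(2, 1, 64, 64))  # -> (2, 2, 64, 64)
```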
We are given a set of samples (X_n, Y_n) from the non-COVID dataset D_n and (X_c, Y_c) from the COVID-19 dataset D_c, where X and Y denote the CT image and the corresponding annotation of lung lesions. For the segmentation workflow, the general encoder is applied to extract general features, while the target encoder is applied to extract task-specific features. These extracted features are concatenated and then fed into the decoder part to obtain the final segmentation results. To address the problem of limited COVID-19 training data, instead of transferring pre-trained models to the downstream learning task of X_c, we aim to involve X_n collaboratively in the training procedure of COVID-19 to exploit shared knowledge from non-COVID cases, which can be used as guidance for the learning of target COVID-19 infection segmentation. Specifically, a relation-driven collaborative learning scheme is applied to regularize the relation consistency between extracted features of given input and encourage the model to explore semantic information.

Inspired by a recent study on data-level regularization with sample relation [39], to exploit shared knowledge between non-COVID and COVID-19 cases for collaborative learning, we propose to regularize feature-level relation consistency of given input, so as to facilitate the learning procedure of COVID-19 lung infection segmentation. Based on the assumption that the general encoder captures general features of lung lesions while the target encoder focuses on task-specific features of COVID-19 infections, we enforce the relation of features extracted from these two encoders as guidance for the collaborative learning approach.

To estimate the relation of extracted features, we model the feature relation with the channel-wise Gram matrix [40]. For each input batch with B samples, we average the features within the batch to get the mean representation. We denote the extracted feature maps of an encoder as $F \in \mathbb{R}^{C \times H \times W \times D}$ for 3D segmentation networks or $F \in \mathbb{R}^{C \times H \times W}$ for 2D segmentation networks, where H, W and D represent the spatial dimensions of the feature maps and C represents the channel number. To obtain the channel-wise feature relation, we reshape the feature maps into $A \in \mathbb{R}^{C \times HWD}$ or $A \in \mathbb{R}^{C \times HW}$. After that, we compute the channel-wise Gram matrix as

$$G_{mn} = A^{(m)} \cdot \big(A^{(n)}\big)^{\top}, \qquad (1)$$

where the value $G_{mn}$ in the m-th row and n-th column is the inner product between the vectorized activation maps $A^{(m)}$ and $A^{(n)}$, representing the similarity between the m-th and n-th channels of the extracted features. The final feature relation matrix R is obtained by conducting L2 normalization on each row of G:

$$R_{m} = \frac{G_{m}}{\left\| G_{m} \right\|_{2}}. \qquad (2)$$

After modeling the feature relation, our method encourages the network to learn more general and discriminative representations of COVID-19 infections by regularizing the feature relation consistency among given input, so as to explore semantic information from both COVID-19 and non-COVID cases. For explicit learning, the network is optimized based on the supervised segmentation loss $\mathcal{L}_{seg}$ between the output $\hat{Y}_c$ and the corresponding ground truth $Y_c$. We use the combination of the dice loss $\mathcal{L}_{dice}$ and the cross entropy loss $\mathcal{L}_{ce}$ as the supervised segmentation loss, and deep supervision [41] is applied to obtain multi-scale supervision at different scales.
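Before turning to the loss functions, note that the relation matrix of Eqs. (1) and (2) is straightforward to compute. Below is a small PyTorch sketch under the stated definitions (batch-averaged features, channel-wise Gram matrix, row-wise L2 normalization); the function name is ours and is reused in the later training sketch.

```python
import torch
import torch.nn.functional as F

def relation_matrix(feats: torch.Tensor) -> torch.Tensor:
    """Channel-wise relation matrix R from a batch of feature maps.

    feats: (B, C, H, W) for 2D networks or (B, C, H, W, D) for 3D networks.
    Features are averaged over the batch, reshaped to A in R^{C x HW(D)},
    and the Gram matrix G = A A^T is L2-normalized row by row.
    """
    a = feats.mean(dim=0)            # average within the batch -> (C, ...)
    a = a.flatten(start_dim=1)       # A in R^{C x HW(D)}
    g = a @ a.t()                    # G_mn = <A^(m), A^(n)>, Eq. (1)
    return F.normalize(g, p=2, dim=1)  # row-wise L2 normalization, Eq. (2)
```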
The supervised segmentation loss can be summarized as

$$\mathcal{L}_{seg} = \mathcal{L}_{dice} + \mathcal{L}_{ce}. \qquad (3)$$

Besides, to utilize the feature relation for collaborative learning, the non-COVID cases are additionally fed into the general encoder to explore the general feature representation and its corresponding feature relation matrix $R_G(X_n)$. To ensure that the general encoder captures general features of lung lesions, our proposed scheme requires the generated feature relation matrices of the general encoder to be stable, using the general relation consistency loss $\mathcal{L}^{G}_{rc}$ defined as

$$\mathcal{L}^{G}_{rc} = \left\| R_G(X_c) - R_G(X_n) \right\|_{2}^{2}. \qquad (4)$$

For the target encoder, we enforce the extracted relation matrices of task-specific features to be more discriminative compared with the general encoder, using the target relation consistency loss $\mathcal{L}^{T}_{rc}$ defined as

$$\mathcal{L}^{T}_{rc} = -\left\| R_T(X_c) - R_G(X_c) \right\|_{2}^{2}, \qquad (5)$$

where $R_G(X_c)$ and $R_T(X_c)$ denote the feature relation matrices of COVID-19 cases extracted from the general encoder and target encoder, respectively. $\lambda_G$ and $\lambda_T$ are ramp-up weighting coefficients that control the trade-off between the segmentation loss and the consistency losses, so as to mitigate the disturbance of the consistency losses at the early training stage. Since the network is supervised by limited COVID-19 cases, the training may become unstable and generalize poorly. By minimizing the feature relation consistency losses $\mathcal{L}^{G}_{rc}$ and $\mathcal{L}^{T}_{rc}$ during the training procedure, the general encoder and target encoder are enhanced to capture more general and discriminative representations, thereby exploring useful shared knowledge from adequate non-COVID data for better segmentation performance.

Algorithm 1 presents the detailed training procedure of our framework. For the optimization of the network, we update the target encoder and decoder based on the supervised segmentation loss $\mathcal{L}_{seg}$. Besides, the relation consistency losses $\mathcal{L}^{G}_{rc}$ and $\mathcal{L}^{T}_{rc}$ are used to update the general encoder and target encoder, respectively. The collaborative learning scheme allows the two parallel encoders to benefit from each other's guidance, encouraging the model to explore semantic information from both COVID-19 and non-COVID cases.

Algorithm 1: Training Procedure of Our Proposed Framework.
Input: A batch of (X_c, Y_c) from COVID-19 dataset D_c and (X_n, Y_n) from non-COVID dataset D_n.
Output: Trained network N with parameters θ_G, θ_T, θ_D
1: while not converged do
2:   (X_c, Y_c), (X_n, Y_n) ← sampled from D_c and D_n
3:   Generate features of general encoder F_G(X_c) and target encoder F_T(X_c)
4:   Generate general feature representation F_G(X_n)
5:   Calculate feature relation matrices R_G(X_c), R_T(X_c) and R_G(X_n) as Eq. (1) and (2)
6:   Generate segmentation output Ŷ_c
7:   Calculate segmentation loss L_seg as Eq. (3)
8:   Calculate consistency losses L^G_rc, L^T_rc as Eq. (4) and (5)
9:   Ramp up the weighting coefficients λ_G and λ_T
10:  Update θ_T, θ_D based on L_seg
11:  Update θ_G based on λ_G · L^G_rc
12:  Update θ_T based on λ_T · L^T_rc
13: end while
14: return Trained network N

Since our proposed general relation consistency loss $\mathcal{L}^{G}_{rc}$ and target relation consistency loss $\mathcal{L}^{T}_{rc}$ do not require segmentation labels, our method can be straightforwardly extended to utilize unlabeled COVID-19 data for feature relation regularization. Specifically, we only activate the supervised segmentation loss $\mathcal{L}_{seg}$ for labeled data, while computing the relation consistency losses $\mathcal{L}^{G}_{rc}$ and $\mathcal{L}^{T}_{rc}$ for all the training data. In this way, unlabeled data can be leveraged for the regularization to achieve more consistent and robust learning.
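As a rough illustration of Algorithm 1, the training step below combines Eqs. (3)-(5) using the `relation_matrix` helper sketched earlier. It makes several assumptions the text does not pin down: a single optimizer over all parameters, `detach()` calls to confine each loss to the components the text says it updates, and the negative-MSE reading of Eq. (5); `dice_loss` and the model structure are placeholders.

```python
import torch
import torch.nn.functional as F

def train_step(model, batch_c, batch_n, optimizer, lam):
    """One collaborative training step (hypothetical helper, cf. Algorithm 1)."""
    x_c, y_c = batch_c                       # COVID-19 images and labels
    x_n, _ = batch_n                         # non-COVID lung lesion images

    f_g_c, _ = model.encoder_g(x_c)          # step 3: features from both encoders
    f_t_c, skip_t = model.encoder_t(x_c)
    f_g_n, _ = model.encoder_g(x_n)          # step 4: general representation

    r_g_c = relation_matrix(f_g_c)           # step 5: Eqs. (1)-(2)
    r_t_c = relation_matrix(f_t_c)
    r_g_n = relation_matrix(f_g_n)

    # Steps 6-7: L_seg updates the target encoder and decoder only, so the
    # general features are detached before fusion (assumption from the text).
    y_hat = model.decoder(torch.cat([f_g_c.detach(), f_t_c], dim=1), skip_t)
    l_seg = dice_loss(y_hat, y_c) + F.cross_entropy(y_hat, y_c)   # Eq. (3)

    # Step 8: consistency losses. For unlabeled COVID-19 scans in the
    # semi-supervised extension, only these two terms would be active.
    l_g = ((r_g_c - r_g_n) ** 2).sum()                # Eq. (4): stability of G
    l_t = -((r_t_c - r_g_c.detach()) ** 2).sum()      # Eq. (5): discriminative T

    optimizer.zero_grad()
    (l_seg + lam * l_g + lam * l_t).backward()        # step 9: ramped-up weight
    optimizer.step()                                  # steps 10-12
```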
A. Dataset Introduction

1) COVID-19 Dataset: We select two public COVID-19 lung infection segmentation datasets for our experiments. The first dataset contains 20 CT volumes with over 1800 annotated slices, released by the Coronacases Initiative and Radiopaedia, and publicly available online.2 The infections are annotated by two radiologists and verified by an experienced radiologist, as described by Ma et al. [42]. The second dataset is the COVID-19 CT Segmentation dataset3 collected by the Italian Society of Medical and Interventional Radiology, which contains 100 2D axial CT slices from different COVID-19 patients. A radiologist segmented the CT images using different labels for identifying lung infections.

2) Non-COVID Lung Lesion Datasets: In order to explore relevant information from non-COVID lung lesions to promote the annotation-efficient training of COVID-19 cases, we select two public non-COVID lung lesion segmentation datasets for the following experiments. The first dataset is the MSD Lung Tumor Dataset of the Medical Segmentation Decathlon (MSD) Challenge [43] in MICCAI 2018.4 This dataset comprises patients with non-small cell lung cancer from Stanford University (Palo Alto, CA, USA), publicly available through TCIA. The tumors are annotated by an expert thoracic radiologist, and 63 labeled CT volumes are used. The second dataset is the NSCLC Pleural Effusion Dataset.5 This dataset contains 78 CT volumes with annotations of pleural effusion. To exploit general features of lung lesions, we combine the MSD and NSCLC datasets to form a non-COVID multi-lesion dataset in the following experiments.

2 [Online]. Available: https://zenodo.org/record/3757476#.X4ABeYvivid
3 [Online]. Available: http://medicalsegmentation.com/covid19/

1) 3D Experiments on CT Volumes: To make a fair comparison, we follow the task settings of the COVID-19 benchmarks in [42]. For the COVID-19 dataset, we use the same 5-fold cross validation based on the pre-defined dataset split. Each fold contains 4 scans (20%) for training and 16 scans (80%) for testing. For the non-COVID lung lesion datasets, we randomly select 80% of the data for training and the remaining 20% for validation. We use 3D U-Net [38] as the backbone network. Details of the network architecture are shown in Table I. The input patch size is set to 56×160×192 with a batch size of 2. A stochastic gradient descent (SGD) optimizer is used for training with an initial learning rate of 0.01 and momentum of 0.99.

2) 2D Experiments on Axial CT Slices: To compare our method with state-of-the-art methods for 2D medical image segmentation, we conduct comparison experiments on 2D COVID-19 CT slices. Following the same task settings as [30], we use the same 50 images for training and validation, and the remaining 50 slices for testing. For the non-COVID lung lesion datasets, we randomly select 100 2D slices with lung lesions from different CT scans. Besides, the unlabeled training set of the COVID-SemiSeg Dataset [30] is used to evaluate the effectiveness of our proposed method in utilizing unlabeled COVID-19 data. We use 2D U-Net [37] as the backbone network. Details of the network architecture are shown in Table II. The input patch size is set to 448×384 with a batch size of 2. A stochastic gradient descent (SGD) optimizer is used for training with an initial learning rate of 0.01 and momentum of 0.99.

All the experiments in our work are implemented in PyTorch [44] and trained on NVIDIA Tesla V100 GPUs. Our backbone network is based on nnU-Net [45], which achieved state-of-the-art results in 23 segmentation challenges with U-Net architectures automatically designed according to dataset properties.
To unify the setting for our collaborative learning approach, we use the network architectures planned for the COVID-19 infection segmentation task for our framework; note that the general encoder and target encoder share the same architecture, as shown in the left column. Following [46], we use a Gaussian ramp-up function

$$\lambda(T) = 0.1 \cdot e^{-5(1 - T/T_{max})^{2}}$$

to control the balance between the supervised loss and the consistency losses, where T represents the current training step and T_max represents the maximum training step.

Motivated by the evaluation methods of the Medical Segmentation Decathlon [43], we employ two complementary metrics to evaluate segmentation performance. The Dice Similarity Coefficient (DSC), a region-based measure, is used to measure the region mismatch, and the Normalized Surface Dice (NSD), a boundary-based measure, is used to evaluate how close the segmentation and ground truth surfaces are to each other. Both metrics take values in [0,1], and higher scores represent better segmentation performance. Let G and S denote the ground truth and the segmentation result, respectively. The two metrics are defined as follows:

$$\mathrm{DSC}(G,S) = \frac{2|G \cap S|}{|G| + |S|},$$

$$\mathrm{NSD}(G,S) = \frac{|\partial G \cap B^{(\tau)}_{\partial S}| + |\partial S \cap B^{(\tau)}_{\partial G}|}{|\partial G| + |\partial S|},$$

where $B^{(\tau)}_{\partial G}$ and $B^{(\tau)}_{\partial S}$ denote the border regions of the ground truth and segmentation surfaces at a threshold τ to tolerate the inter-rater variability of the annotators. We set τ = 3 mm for the evaluation of segmentation results in the following experiments. Besides, we also consider three other evaluation metrics in the 2D experiments. Sensitivity (Sen) denotes the percentage of positive instances correctly identified. Specificity (Spec) denotes the percentage of negative instances correctly identified. Mean Absolute Error (MAE) measures the pixel-wise error between the segmentation output and the corresponding ground truth.
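For reference, minimal implementations of the ramp-up schedule and the region-based DSC might look as follows; the function names are ours, and the boundary-based NSD, which additionally requires surface extraction and distance computation (e.g., via the evaluation code accompanying [43]), is omitted here.

```python
import math
import numpy as np

def ramp_up(step: int, max_steps: int, base: float = 0.1) -> float:
    """Gaussian ramp-up lambda(T) = 0.1 * exp(-5 * (1 - T/T_max)^2), as in [46]."""
    t = min(step, max_steps) / max_steps
    return base * math.exp(-5.0 * (1.0 - t) ** 2)

def dice_similarity(gt: np.ndarray, seg: np.ndarray, eps: float = 1e-8) -> float:
    """Region-based DSC = 2|G intersect S| / (|G| + |S|) on binary masks."""
    gt, seg = gt.astype(bool), seg.astype(bool)
    return (2.0 * np.logical_and(gt, seg).sum() + eps) / (gt.sum() + seg.sum() + eps)
```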
To evaluate the effectiveness of the key components in our framework, we conduct ablation studies by removing the feature relation consistency losses. As shown in Table III, all our methods achieve better performance on all metrics compared with the fully supervised baseline, showing the effectiveness of our method. Besides, the usage of $\mathcal{L}^{G}_{rc}$ and $\mathcal{L}^{T}_{rc}$ can each further improve the segmentation performance compared with the baseline. When removing the target relation consistency, the average segmentation performance over five folds is degraded by 0.4% and 0.2% in DSC and NSD, respectively. This result shows that the target relation consistency loss $\mathcal{L}^{T}_{rc}$ enforces the target encoder to be more discriminative, so as to improve the segmentation performance. However, the improvement is susceptible to the domain difference. Besides, we also conduct experiments with our backbone by removing the general relation consistency loss $\mathcal{L}^{G}_{rc}$. In this case, the general encoder is frozen and not updated during the training procedure, which means that knowledge transfer is not available. The experimental results demonstrate that the average segmentation performance is degraded by 0.6% and 0.3% in DSC and NSD, showing the importance of knowledge transfer in our collaborative learning scheme.

Some segmentation results of our method and 3D nnU-Net are illustrated in Fig. 3 for visual comparison. As shown in the figure, our method generates segmentation results with more accurate boundaries in Fig. 3(a)(b), and fewer segmentation mistakes in small infection areas in Fig. 3(c)(d)(e). These results demonstrate that the collaborative learning approach can better exploit shared knowledge from non-COVID cases, leading to better performance when generalizing to test data.

To demonstrate the effectiveness of our method, we conduct extensive comparison experiments with other state-of-the-art methods. To ensure a fair comparison, all methods are experimented with the same network backbone and experimental settings. Segmentation models trained from scratch with only COVID-19 cases serve as our baseline results. Besides, as a simple and intuitive approach, pre-training segmentation models on non-COVID cases and fine-tuning on COVID-19 cases is used as a comparison method for learning from both COVID-19 and non-COVID cases. The quantitative experimental results are shown in Table IV, together with the results of [18] where additional non-COVID datasets are used; the best results are shown in red font and the second-best results in blue font.

From the results, we observe that transferring pre-trained models to the COVID-19 infection segmentation task generally improves over training from scratch with only COVID-19 cases in most experiments, and using multi-lesion data is superior to single-lesion data because more general representations can be utilized to help the COVID-19 infection task, with 2.3% and 1.7% improvements in DSC and NSD, respectively. However, these transfer learning methods show instability under different data distributions in the five-fold cross validation experiments. The rationale is that the transfer ability largely depends on the domain difference between datasets. When there exists a large domain distance between non-COVID and the limited COVID-19 training cases, transfer learning may mislead the learning procedure.

In [18], the authors propose a multi-encoder architecture that freezes the non-COVID pre-trained encoder as an additional feature extractor for the training of COVID-19 cases. Features from the frozen adapted-encoder and the reinitialized self-encoder are concatenated for the subsequent decoder. However, their workflow is still based on transfer learning, i.e., training a network first on non-COVID cases and then on COVID-19 cases initialized with the pre-trained parameters. The main limitation is that the learning procedures of the two tasks are separate. Therefore, the shared knowledge of non-COVID and COVID-19 cases cannot be fully exploited. In contrast, our method takes advantage of collaborative learning between the two encoders and interactively improves the overall learning procedure. As a consequence, our method achieves higher segmentation performance with an average DSC of 70.3% and an average NSD of 74.2%. Compared with training from scratch, exploiting shared knowledge from non-COVID lesions achieves further improvements of up to 3.0% in DSC and 4.2% in NSD. A paired t-test shows that the improvements are statistically significant at p < 0.05, validating the effectiveness of our proposed method.

In this subsection, we compare our method with state-of-the-art methods for 2D medical image segmentation, including U-Net [37], U-Net++ [50], Dense-UNet [49], Attention-UNet [47], Gated-UNet [48], Inf-Net and Semi-Inf-Net [30]. Quantitative results are shown in Table V. As can be observed, our proposed method outperforms all comparison methods on all evaluation metrics by a large margin, validating the effectiveness of our framework. A paired t-test shows that the improvements are statistically significant at p < 0.05.
Besides, we visualize some segmentation results of our method in Fig. 4. These results indicate that our segmentation results are closer to the ground truth, with fewer mis-segmented areas, and outperform other methods significantly. For the semi-supervised setting, we additionally integrate unlabeled COVID-19 cases of the COVID-SemiSeg Dataset into the relation-driven training in our framework. These unlabeled cases can be utilized for the regularization of feature relation to achieve more consistent and robust learning, and further improve the segmentation performance slightly.

To better demonstrate the effectiveness of our proposed feature relation-driven learning, we conduct extensive experiments on several lesion segmentation datasets of medical images with different degrees of relation to COVID-19 infections. In addition to the lung tumor and pleural effusion datasets introduced before, we use the LiTS dataset [9] with liver tumor annotations in abdominal CT volumes as non-COVID lesions in our method for comparison. Besides, to compare intra-disease and inter-disease relations, we use another multi-national CT dataset with labeled ground glass opacities [51] as an out-of-domain dataset for the learning of the general branch, which is more relevant, with similar appearance, to the target dataset in our framework. In these experiments, we follow the settings of our 2D experiments with the same network backbone and implementation details. To make quantitative comparisons, we select the same number of 60 cases from these different datasets for the general branch, and 10 cases from the COVID-19 segmentation dataset for the target branch. The experimental results are shown in Table VI. It can be observed from the table that using the intra-disease dataset that is more related to the target dataset achieves better performance compared with other datasets under the same conditions, which shows that more similar appearance can lead to more significant improvement by exploiting shared knowledge. Notably, we observe that using non-lung lesions can also obtain comparable results compared with experiments using non-COVID lung lesions like lung tumor and pleural effusion.

To visualize the learning procedure of our method, we show some examples of general and target feature relation matrices at different epochs during network training in Fig. 5 and Fig. 6. The absolute differences of these two matrices are shown in the right column in red to clearly visualize the alignment of the matrices. It can be observed in Fig. 5 that as the training goes on, the general encoder gradually produces relation matrices with higher responses at the same channels. Meanwhile, the absolute differences of the feature relation matrices of non-COVID and COVID-19 cases extracted from the general encoder gradually decrease, indicating that the general encoder learns more general and robust representations of lung lesions. Besides, as observed in Fig. 6, the absolute differences of the general and target feature relation matrices of COVID-19 cases gradually increase and tend to be stable as the training goes on, indicating that the target encoder is gradually enforced to focus on task-specific features and learn more discriminative representations compared with the general encoder.

With the outbreak of COVID-19 all over the world, designing effective automated tools for fighting against COVID-19 is highly demanded to improve the efficiency of clinical approaches and reduce the tedious workload of clinicians and radiologists.
However, accurate segmentation of COVID-19 lung infections is a challenging task due to the large appearance variance of COVID-19 lesions across patients at different severity levels, and existing data-driven segmentation methods mainly rely on large amounts of well annotated data. To mitigate the insufficiency of labeled COVID-19 CT scans, it is essential and meaningful to develop annotation-efficient segmentation methods for the COVID-19 lung infection segmentation task.

Considering that there are several public non-COVID lung lesion segmentation datasets arising from other clinical practices, these datasets may offer the potential to generalize useful information to assist the related COVID-19 infection segmentation task. Some previous studies also highlight the usage of non-COVID lung lesions [18], [42]. However, these existing approaches merely focus on investigating transferability in COVID-19 infection segmentation. Although their results reveal the benefits of pre-training on non-COVID datasets, the improvement is limited when shared knowledge between COVID-19 and non-COVID lung lesions cannot be fully utilized. Our experiments reveal that the proposed collaborative learning scheme can effectively exploit shared semantic information by regularizing the consistency between extracted features and promote the training procedure in the absence of sufficient high-quality COVID-19 data. In addition, our scheme can be extended to utilize unlabeled COVID-19 data for feature relation regularization. Experimental results show that even without annotations, our method can use unlabeled scans to explore feature relations and achieve more consistent and robust learning.

Fig. 7 presents an example of challenging cases for COVID-19 lung infection segmentation. Although our method achieves significant improvement by exploiting knowledge from non-COVID lesions, limitations still exist. We observe that, compared with the ground truth, there are still some mis-segmented areas when encountering challenging cases with multiple irregular infections. As near future work, we intend to explore how to achieve more robust and reliable knowledge transfer. In addition, we also plan to extend our method to other medical image segmentation tasks to explore the usage of out-of-domain datasets for annotation-efficient deep learning, thus enhancing the applicability of these methods in real-world applications.

In this paper, we propose a novel relation-driven collaborative learning model to exploit shared knowledge from non-COVID lesions for annotation-efficient COVID-19 CT lung infection segmentation. Specifically, the model consists of two encoders with the same architecture and a shared decoder. The general encoder is adopted to capture general lung lesion features based on multiple non-COVID lesions, and the target encoder is adopted to focus on task-specific features of COVID-19 infections. To exploit shared knowledge from non-COVID lesions, we develop a collaborative learning scheme to regularize the relation between extracted features of given input during training. We present a set of experiments on 2D slices and 3D volumes based on three COVID-19 datasets and two non-COVID datasets. Experimental results reveal clear benefits of utilizing non-COVID lesions in the absence of sufficient COVID-19 annotations to train a robust segmentation model.
Moreover, we provide a semi-supervised learning solution that utilizes unlabeled COVID-19 cases for feature relation regularization and achieves further performance improvements. Among all comparison experiments, our proposed method outperforms state-of-the-art methods and illustrates strong potential for real-world applications in the global fight against COVID-19.

References
[1] Management of lung nodules and lung cancer screening during the COVID-19 pandemic: CHEST expert panel report.
[2] Diagnosis, prevention, and treatment of thromboembolic complications in COVID-19: Report of the national institute for public health of the Netherlands.
[3] Coronavirus disease 2019 (COVID-19): Role of chest CT in diagnosis and management.
[4] CT imaging features of 2019 novel coronavirus (2019-nCoV).
[5] Sensitivity of chest CT for COVID-19: Comparison to RT-PCR.
[6] CT image visual quantitative evaluation and clinical classification of coronavirus disease (COVID-19).
[7] Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for COVID-19.
[8] Inter-observer variability of manual contour delineation of structures in CT.
[9] The liver tumor segmentation benchmark (LiTS).
[10] The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: Results of the KiTS19 challenge.
[11] Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: Is the problem solved?
[12] AbdomenCT-1K: Is abdominal organ segmentation a solved problem?
[13] Serial quantitative chest CT assessment of COVID-19: Deep-learning approach.
[14] Adaptive feature selection guided deep forest for COVID-19 classification with chest CT.
[15] Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography.
[16] Contrastive cross-site learning with redesigned net for COVID-19 CT classification.
[17] Dual-sampling attention network for diagnosis of COVID-19 from community acquired pneumonia.
[18] Does non-COVID19 lung lesion help? Investigating transferability in COVID-19 CT image segmentation.
[19] Not-so-supervised: A survey of semi-supervised, multi-instance, and transfer learning in medical image analysis.
[20] Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation.
[21] A survey on semi-supervised learning.
[22] Scribble-based hierarchical weakly supervised learning for brain tumor segmentation.
[23] Multi-organ segmentation via co-training weight-averaged models from few-organ datasets.
[24] DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datasets, in Proc. Comput. Vision and Pattern Recognition.
[25] Lung infection quantification of COVID-19 in CT images with deep learning.
[26] Multi-task deep learning based CT imaging analysis for COVID-19 pneumonia: Classification and segmentation.
[27] Relational modeling for robust and efficient pulmonary lobe segmentation in CT scans.
[28] An automatic COVID-19 CT segmentation network using spatial and channel attention mechanism.
[29] Deep learning-based detection for COVID-19 from chest CT using weak label.
[30] Inf-Net: Automatic COVID-19 lung infection segmentation from CT images.
[31] A noise-robust framework for automatic segmentation of COVID-19 pneumonia lesions from CT images.
[32] Label-free segmentation of COVID-19 lesions in lung CT.
[33] Active contour regularized semi-supervised learning for COVID-19 CT infection segmentation with limited annotations.
[34] A novel transfer learning based approach for pneumonia detection in chest X-ray images.
[35] COVID-19 detection using CNN transfer learning from X-ray images.
[36] Multichannel transfer learning of chest X-ray images for screening of COVID-19.
[37] U-Net: Convolutional networks for biomedical image segmentation.
[38] 3D U-Net: Learning dense volumetric segmentation from sparse annotation.
[39] Semi-supervised medical image classification with relation-driven self-ensembling model.
[40] A neural algorithm of artistic style.
[41] 3D deeply supervised network for automatic liver segmentation from CT volumes.
[42] Towards data-efficient learning: A benchmark for COVID-19 CT lung and infection segmentation.
[43] A large annotated medical image dataset for the development and evaluation of segmentation algorithms.
[44] PyTorch: An imperative style, high-performance deep learning library.
[45] nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation.
[46] Temporal ensembling for semi-supervised learning.
[47] Attention U-Net: Learning where to look for the pancreas.
[48] Attention gated networks: Learning to leverage salient regions in medical images.
[49] H-DenseUNet: Hybrid densely connected UNet for liver and tumor segmentation from CT volumes.
[50] UNet++: Redesigning skip connections to exploit multiscale features in image segmentation.
[51] Rapid artificial intelligence solutions in a pandemic - the COVID-19-20 lung CT lesion segmentation challenge.