title: Does non-COVID-19 lung lesion help? Investigating transferability in COVID-19 CT image segmentation
authors: Wang, Yixin; Zhang, Yao; Liu, Yang; Tian, Jiang; Zhong, Cheng; Shi, Zhongchao; Zhang, Yang; He, Zhiqiang
date: 2021-02-23
journal: Comput Methods Programs Biomed
DOI: 10.1016/j.cmpb.2021.106004
E-mail addresses: wangyixin19@mails.ucas.ac.cn (Yixin Wang), hezq@lenovo.com (Zhiqiang He)

Background and Objective: Coronavirus disease 2019 (COVID-19) is a highly contagious disease spreading all around the world. Deep learning has been adopted as an effective technique to aid COVID-19 detection and segmentation from computed tomography (CT) images. The major challenge lies in the inadequate public COVID-19 datasets. Recently, transfer learning has become a widely used technique that leverages the knowledge gained while solving one problem and applies it to a different but related problem. However, it remains unclear whether various non-COVID19 lung lesions could contribute to segmenting COVID-19 infection areas and how to better conduct this transfer procedure. This paper provides a way to understand the transferability of non-COVID19 lung lesions and a better strategy to train a robust deep learning model for COVID-19 infection segmentation.

Methods: Based on a publicly available COVID-19 CT dataset and three public non-COVID19 datasets, we evaluate four transfer learning methods using 3D U-Net as a standard encoder-decoder network. i) We introduce a multi-task learning method to obtain a multi-lesion pre-trained model for COVID-19 infection. ii) We propose and compare four transfer learning strategies with various performance gains and training time costs. Our proposed Hybrid-encoder Learning strategy introduces a Dedicated-encoder and an Adapted-encoder to extract COVID-19 infection features and general lung lesion features, respectively. An attention-based Selective Fusion unit is designed for dynamic feature selection and aggregation.

Results: Experiments show that, trained with limited data, the proposed Hybrid-encoder strategy based on the multi-lesion pre-trained model achieves a mean DSC, NSD, Sensitivity, F1-score, Accuracy and MCC of 0.704, 0.735, 0.682, 0.707, 0.994 and 0.716, respectively, with better generalization and lower over-fitting risk for segmenting COVID-19 infection.

Conclusions: The results reveal the benefits of transferring knowledge from non-COVID19 lung lesions, and show that learning from multiple lung lesion datasets can extract more general features, leading to accurate and robust pre-trained models. We further show the capability of the encoder to learn feature representations of lung lesions, which improves segmentation accuracy and facilitates training convergence. In addition, our proposed Hybrid-encoder learning method incorporates transferred lung lesion features from non-COVID19 datasets effectively and achieves significant improvement. These findings provide new insights into transfer learning for COVID-19 CT image segmentation, which can also be generalized to other medical tasks.

• Evaluating four transfer learning methods on a publicly available COVID-19 CT dataset and three public non-COVID19 datasets.
• Proposing a novel transfer learning strategy for COVID-19 dynamic feature selection and aggregation.
• Transferability of the COVID-19 model is crucial and helpful for doctors to make further assessment and quantification.

Does Non-COVID19 Lung Lesion Help? Investigating Transferability in COVID-19 CT Image Segmentation
In December 2019, the coronavirus disease 2019 (COVID-19) broke out and has become a global challenge, declared a pandemic by the World Health Organization (WHO). A gold standard method to screen COVID-19 patients is the real-time reverse transcription-polymerase chain reaction (rRT-PCR) [1]. However, it reports results within a few hours to 2 days and requires repeated tests [2, 3]. Moreover, such gold standards have been proven to have a high false negative rate due to practical issues in sample collection and transportation as well as the performance of testing kits. In contrast, computed tomography (CT) images can not only detect most of the cases found positive by RT-PCR, but also detect many more cases, especially for patients in the early stage [4, 5]. They have also shown a strong ability to capture ground-glass opacities and bilateral patchy shadows, which are typical CT features in affected patients [6, 7, 8]. Thus, chest CT has been adopted as a major diagnostic modality to confirm positive COVID-19, to effectively help the diagnosis of COVID-19 and to distinguish patterns and features in patients.

Given that traditional CT imaging analysis methods are time-consuming and laborious, it is of great importance to develop artificial intelligence (AI) systems to aid COVID-19 diagnosis [9]. Segmentation in CT slices is a key component of the diagnostic work-up for patients with COVID-19 infection in clinical practice [10]. It provides more detailed information related to the pathology and supports quantitative measurement of lesion size and the extent or severity of lung involvement, which may have prognostic implications. Therefore, many recent works focus on better segmentation methods for COVID-19 infections from CT images [4, 11, 12, 13, 14, 15]. Recently, deep learning with CNNs has shown significant performance improvements in the automatic detection and extraction of essential features from CT images related to the diagnosis of COVID-19.

Though deep learning has made great progress in medical image segmentation [16, 17], it remains a challenging task [18] in the field of COVID-19 lung infection, as existing public datasets on COVID-19 are relatively small and weakly labelled. Thus, training deep networks from scratch with inadequate data on a task-specific problem such as COVID-19 infection may lead to over-fitting and poor generalization. Transfer learning is an effective method to alleviate this problem: it leverages knowledge and latent features from other datasets and avoids over-parameterization. In transfer learning, successful deep learning models such as ResNet, DenseNet and GoogLeNet have been trained on large datasets such as ImageNet. These pre-trained models have shown impressive performance on natural image downstream tasks, and have even been used recently for skin disease diagnosis from photographs [19, 20, 21, 22, 23]. However, there exist no comparably large annotated medical image datasets, since data acquisition is difficult and high-quality annotations are expensive. Recent research [24] shows that transfer learning from natural image datasets to medical tasks produces very limited performance gain.
In particular, even with large medical datasets for pre-training, transfer learning for COVID-19 infection segmentation remains difficult: 1) the shape, texture and position of COVID-19 infections vary widely; 2) existing large medical CT datasets differ in domain from COVID-19 datasets. Being closer in domain, a model pre-trained on lung lesions may share more knowledge with COVID-19 infection and learn general-purpose visual representations of lung lesions. Evidence shows that larger datasets are not necessarily better for pre-training and that the diversity of datasets is extremely important [25]. In the medical domain, pre-training data from medical datasets, especially chest CT datasets, tends to be more homogeneous with the target than non-medical data or data from other medical areas. Thus, non-COVID19 lung lesion CT imaging manifestations may potentially benefit COVID-19 segmentation. Existing work has shown that multi-task training by simply fusing different lesions with COVID-19 degrades the model's representation ability for COVID-19 infection segmentation [26]. Therefore, with limited COVID-19 datasets: 1) Do these non-COVID19 lesions help, and to what extent can they contribute to COVID-19 infection segmentation? 2) How can we train a better pre-trained model using these non-COVID19 datasets for transfer learning? 3) In what manner can pre-trained models fitted on non-COVID19 lesions be effectively transferred to COVID-19?

In this paper, we aim to answer the above questions, which are significant for COVID-19 segmentation. To the best of our knowledge, this is the first study to explore the transferability of non-COVID19 datasets for COVID-19 CT image segmentation. Our contributions are as follows.

• We experimentally assess the extent of contributions from non-COVID19 to COVID-19 infection segmentation. We find that despite the disparity between non-COVID19 lung lesion images and COVID-19 infection images, pre-training on large, well-annotated lung lesion datasets can still be transferred to benefit COVID-19 recognition and segmentation.

• We conduct extensive experiments using various non-COVID19 lung lesion datasets. Although pre-training on a single non-COVID19 dataset is unstable across transfer strategies, learning from different non-COVID19 lesions demonstrates promising performance, since such a multi-task learning process can share knowledge across related tasks and discover common, general representations of lung lesions.

• We design four transfer learning strategies with different performance gains and training time costs: Continual Learning, Body Fine-tuning, Pre-trained Lesion Representations and Hybrid-encoder Learning. The Hybrid-encoder strategy effectively combines both non-COVID19 and COVID-19 features and shows the best performance. We further conclude that it is possible to freeze the encoder and train only the decoder on COVID-19 tasks during fine-tuning, which provides significant performance gains and fast convergence.

In this section, we review recent related research on COVID-19 CT images, COVID-19 segmentation and transfer learning for COVID-19. CT is widely used in the screening and diagnosis of COVID-19, since ground-glass opacities and bilateral patchy shadows are the most relevant imaging features of pneumonia associated with the infection. Recent research on CT-based diagnosis for COVID-19 has reported strong performance.
Compared with traditional CT image processing, artificial intelligence (AI) serves as a core technology enabling more accurate and efficient solutions. In the work of [27], machine learning-based CT radiomics models such as random forest and logistic regression showed feasibility and accuracy for predicting hospital stay in affected patients. A machine learning method was also adopted by Tang et al. [28] to realize automatic severity assessment (non-severe or severe) of COVID-19 based on chest CT images and to explore severity-related features from the resulting assessment model. Gozes et al. [29] presented a system that utilized robust 2D and 3D deep learning models, modifying and adapting existing AI models and combining them with clinical understanding. Huang et al. [30] monitored disease progression and quantitatively characterized the temporal evolution of COVID-19 from serial CT scans using an automated deep learning method.

Segmentation is an essential step in AI-based image processing and analysis [10]. In particular, segmenting the regions of interest (ROIs) of COVID-19 infections is crucial and helpful for doctors to make further assessment and quantification. However, manual contouring of these infections is time-consuming and tedious. Although plenty of methods have been explored for COVID-19 diagnosis and classification, there are very few works on the segmentation of COVID-19 infection due to its great annotation challenges. Shan et al. [13] developed a DL-based system for automatic segmentation and quantification of infection regions and adopted a human-in-the-loop (HITL) strategy to accelerate the manual delineation of training CT scans. Zheng et al. [15] designed a weakly-supervised deep learning algorithm to investigate the potential of a deep learning-based model for automatic COVID-19 detection on chest CT volumes using weak patient-level labels. Based on semi-supervised learning, Fan et al. [12] presented a COVID-19 Lung Infection Segmentation Deep Network (Inf-Net) for CT slices based on a randomly selected propagation strategy. Yan et al. [14] introduced a feature variation block that adaptively adjusts the global properties of the features for segmenting COVID-19 infection.

In transfer learning, deep models are first trained on large datasets such as ImageNet, and these pre-trained models are then fine-tuned on different downstream tasks. Several studies [25], [31], [32] have investigated transfer learning methodologies for deep neural networks applied to medical image analysis tasks. Many works used networks pre-trained on natural images to extract features, followed by another classifier [33], [34]. Carneiro et al. [34] replaced the fully connected layers of a pre-trained model with a new logistic layer and only trained the appended layer, yielding promising results for classification of unregistered multi-view mammograms. Other studies performed layer fine-tuning on the pre-trained networks to adapt the learned features to the target domain. In [35], CNNs were pre-trained as a feature generator for chest pathology identification. Gao et al. [36] fine-tuned all layers of a pre-trained CNN to classify interstitial lung diseases. Ghafoorian et al. [37] trained a CNN on legacy MR images of the brain and evaluated the performance of the domain-adapted network on the same task with images from a different domain. Due to the limited labeled COVID-19 data, several transfer learning methods have been applied to address this problem. Chouhan et al.
[38] proposed an ensemble approach of transfer learning using pre-trained models trained on ImageNet; they used five different pre-trained models and analyzed their performance on chest X-ray images. In the work of [39], several deep CNNs were employed for automatic COVID-19 infection detection from X-ray images through parameter tuning. Majeed et al. [40] presented a critical analysis of 12 off-the-shelf CNN models and proposed a simple CNN architecture with a small number of parameters that performed well in distinguishing COVID-19 from normal X-rays. By combining three models fine-tuned on three datasets, Misra et al. [41] designed a multi-channel ensemble transfer learning method based on ResNet-18 such that the model could extract more relevant features for each class and identify COVID-19 features more accurately from X-ray images.

Even though the above studies on transfer learning present inspiring achievements for COVID-19 research, they have several limitations: 1) they only focus on ensembles of existing CNNs but ignore the contribution of the various datasets used for pre-training; 2) they are limited to X-ray datasets and dedicated only to COVID-19 detection and classification; 3) they lack an in-depth study of the transferability of different transfer strategies for COVID-19 infections. Our work addresses the much more difficult task of semantic segmentation in COVID-19 CT images. We explore transferability from the perspective of transferring knowledge from various non-COVID19 lung lesions. Moreover, we investigate better transfer methods to assist COVID-19 segmentation.

In this section, we briefly describe our backbone network in 3.1, then introduce the multi-task learning method used to obtain a multi-lesion pre-trained model in 3.2. We further give a detailed illustration of the four transfer strategies employed in our work in 3.3-3.5. An overall comparison of the four strategies is presented in Table 1.

The U-Net is a commonly used network for medical semantic segmentation. It has a U-shaped structure with an encoding and a decoding path. The encoder serves as a contracting path to capture semantic, contextual image features. The decoder is a symmetric expanding path that recovers spatial information. The two paths are connected by skip connections at each level, which reintroduce essential high-resolution features from the encoding path. In this work, to better explore the transferability of COVID-19 segmentation, we build a strong 3D U-Net network as the baseline following nnU-Net [42], which has surpassed most existing approaches on 23 public segmentation datasets. Instead of a complex architecture, nnU-Net builds directly on the original U-Net architecture and automatically adapts itself to the specifics of the COVID-19 dataset. Therefore, it is much more convenient to re-implement and adopt as a basis for exploring transferability. The original batch normalization and ReLU are replaced by instance normalization and leaky ReLU. Moreover, a deep supervision loss [43] is aggregated to obtain multi-level deep supervision and facilitate the training process.

Multi-task learning is an effective method to share knowledge among different related tasks. For segmentation tasks, the performance of each task highly depends on the similarity among these tasks.
Due to the large domain distance among the existing non-COVID19 lung lesion datasets, fusing these lesions to train a multi-task segmentation model tends to underperform on each single task. However, such multi-task training can exploit the shared knowledge that is essential for learning general-purpose visual representations of lung lesions. Therefore, in our work, besides separately training segmentation models on each non-COVID19 dataset as pre-trained models for transfer learning, we provide a multi-lesion model that learns from multiple lung lesions. Compared with learning from separate tasks, this multi-task strategy leads to a more robust pre-trained model across all lesion tasks and empowers the downstream COVID-19 task.

Continual Learning (CL) aims to learn from an endless stream of tasks [44]. It is built on the idea of learning continuously and adaptively about the external world, enabling the autonomous, incremental learning of more complex skills and knowledge. This paradigm is capable of learning consecutive tasks without forgetting how to perform previously trained tasks. This is challenging because the training process tends to lose knowledge from previous tasks as it absorbs information relevant to the current task. To avoid this, we adopt a training schedule to pre-train the model. During pre-training on the upstream task, the model is trained with a high initial learning rate, allowing the network to obtain optimal weights. The initial learning rate is set to 0.01 and decays throughout training following the 'poly' learning rate policy $(1 - \mathrm{epoch}/\mathrm{epoch}_{\max})^{0.9}$. When the model is trained to convergence, the learning rate has become much smaller; this small value is then used as the initial learning rate when training the downstream COVID-19 task, to prevent significant changes in the network parameters. In the second, downstream phase on COVID-19 infections, with the weights of the pre-trained model and the small learning rate as a starting point, the model is trained following the same decay policy. In this way, the learning rate decreases continuously, so that the weight parameters obtained from the non-COVID19 lesion tasks continue along their training trajectory while being only slightly updated by the current COVID-19 infection task. This continual learning strategy is expected to smoothly update the prediction model to take into account new tasks and data distributions (COVID-19) while still re-using and retaining useful knowledge and skills from the pre-trained model (non-COVID19). A minimal code sketch of this two-phase schedule is given below.

Fine-tuning is the most standard strategy to transfer knowledge from domains where data are abundant. In general, it is conducted by copying the weights from a pre-trained network and tuning them on the downstream task. Recent work [45] shows that fine-tuning can achieve better performance on small datasets. It is of additional interest to assess the relative contribution of the encoder and the decoder to learning COVID-19 knowledge. While progressive reinitializations could demonstrate the incremental effect of each layer, it is unnecessary to probe the extent of localized reinitialization because our encoder-decoder network is not very deep. Therefore, we adopt two strategies for tuning a non-COVID19 lesion model on the COVID-19 downstream task.
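As referenced above, the two-phase 'poly' learning-rate schedule used by the Continual Learning strategy can be summarized by the following minimal Python sketch. It is a hedged illustration under the settings stated in the text (initial rate 0.01, exponent 0.9); the epoch counts and loop bodies are placeholders, not values or code from the paper.

```python
def poly_lr(initial_lr: float, epoch: int, max_epochs: int, exponent: float = 0.9) -> float:
    """'Poly' decay: lr = initial_lr * (1 - epoch / max_epochs) ** exponent."""
    return initial_lr * (1 - epoch / max_epochs) ** exponent


# Phase 1: upstream pre-training on non-COVID19 lung lesions, starting from a high rate.
upstream_epochs = 1000            # illustrative placeholder
lr = 0.01
for epoch in range(upstream_epochs):
    lr = poly_lr(0.01, epoch, upstream_epochs)
    # ... train one epoch of the (multi-)lesion model at learning rate `lr` ...

# Phase 2: the small learning rate reached at convergence becomes the *initial*
# learning rate for the downstream COVID-19 task, so the transferred weights are
# only gently updated while the same decay policy continues.
downstream_initial_lr = lr
downstream_epochs = 1000          # illustrative placeholder
for epoch in range(downstream_epochs):
    lr = poly_lr(downstream_initial_lr, epoch, downstream_epochs)
    # ... fine-tune on COVID-19 infection data at learning rate `lr` ...
```

Carrying the decayed rate over, rather than restarting from 0.01, is what keeps the downstream updates small relative to the pre-trained weights.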
Due to the large domain difference between the pre-trained tasks and the downstream task, the safest fine-tuning method is Body Fine-tuning, in which all parameters of the pre-trained model are used as initial values for training. When we train COVID-19 infection networks, all parameters are initialized from the pre-trained models on non-COVID19 lesions. In this fine-tuning strategy, the update of the parameters largely depends on the COVID-19 infection training process itself. Thus, it is a conservative approach in which the COVID-19 infection training task is not overly constrained by the upstream pre-trained non-COVID19 models.

Our segmentation model is an encoder-decoder architecture, and the encoder serves as a series of convolution operations that encode image features into contextual representations. These representations are trained on large related datasets from upstream tasks and fed as features to downstream ones. In natural language processing, features extracted from internal representations of sequence language models are used as pre-trained text representations [44], [45]. In this fine-tuning strategy, we aim to learn general lung lesion features, which we call Pre-trained Lesion Representations. We first train models on non-COVID19 lung lesion datasets. The encoders of these models are capable of encoding lesion features; in other words, the encoders' parameters exhibit some transferability. Thus, in the subsequent fine-tuning on the COVID-19 dataset, we preserve and freeze them while only fine-tuning and re-training the decoding part.

When performing the above fine-tuning strategies, we face two challenges: 1) Body Fine-tuning easily falls into over-fitting because the downstream COVID-19 infection dataset is much smaller; 2) utilizing the encoder of pre-trained models to capture feature representations is unstable, because the label spaces and losses for the upstream and downstream tasks inevitably differ. For example, though they are all lung lesions, they differ in appearance and shape. Therefore, we present a new transfer learning strategy for COVID-19 that incorporates three key properties:

• It leverages the transferred knowledge from non-COVID19 lung lesions.
• It gains stable performance improvement over both training from scratch and the other transfer learning methods.
• It shows no obvious increase in training time.

To achieve these properties, we propose a Hybrid-encoder architecture. As shown in Fig. 1, we enhance the standard U-Net network by equipping it with two encoders of the same architecture: a Dedicated-encoder and an Adapted-encoder. Furthermore, an attention-based Selective Fusion unit is developed to aggregate information from both encoders by determining two sets of learnable weights. The Dedicated-encoder is a task-specific feature extractor focusing on segmenting COVID-19 infection, with all parameters reinitialized. These parameters $\Theta_{de}$ are all COVID19-specific and are continually updated during training. The Adapted-encoder is an auxiliary feature extractor aiming to learn general lung lesion features. Based on the pre-trained multi-lesion learning in 3.2, we pre-train our 3D U-Net network's encoder to obtain dense representations of general lung lesions. When transferred to the target COVID-19 task, this pre-trained encoder serves as the Adapted-encoder, with its pre-trained parameters $\Theta_{ad}$ completely frozen.
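To make the Selective Fusion unit concrete, the following is a minimal PyTorch-style sketch of the attention-based fusion described above and formalized by Eqns. (2)-(6) below. It assumes 5-D feature maps of shape (batch, channels, D, H, W); the module, layer and variable names, the bottleneck nonlinearity and the tensor shapes in the usage example are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelectiveFusion(nn.Module):
    """Attention-based fusion of Dedicated-encoder and Adapted-encoder features.

    Sketch of the scheme described in the text: element-wise summation, global
    average pooling, a bottleneck fully-connected layer with reduction ratio r,
    and a softmax over the two branches for each channel.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.fc = nn.Linear(channels, hidden)     # dimensionality reduction (Eqn. 4)
        self.fc_de = nn.Linear(hidden, channels)  # logits for the Dedicated branch
        self.fc_ad = nn.Linear(hidden, channels)  # logits for the Adapted branch

    def forward(self, x_de: torch.Tensor, x_ad: torch.Tensor) -> torch.Tensor:
        # x_de, x_ad: (B, C, D, H, W) feature maps from the two encoders
        x_f = x_de + x_ad                          # element-wise summation (Eqn. 2)
        s = x_f.mean(dim=(2, 3, 4))                # global average pooling (Eqn. 3) -> (B, C)
        z = F.leaky_relu(self.fc(s))               # channel bottleneck (Eqn. 4; activation assumed)
        logits = torch.stack([self.fc_de(z), self.fc_ad(z)], dim=0)  # (2, B, C)
        a, b = torch.softmax(logits, dim=0)        # soft attention across branches (Eqn. 5)
        a = a.view(*a.shape, 1, 1, 1)              # broadcast to (B, C, 1, 1, 1)
        b = b.view(*b.shape, 1, 1, 1)
        return a * x_de + b * x_ad                 # weighted aggregation (Eqn. 6)


# Usage: fuse bottleneck features from the two encoders. The Adapted-encoder stays
# frozen during COVID-19 training, e.g.
#   for p in adapted_encoder.parameters():
#       p.requires_grad = False
fusion = SelectiveFusion(channels=320, reduction=16)
x_de = torch.randn(2, 320, 4, 8, 8)   # Dedicated-encoder output (illustrative shape)
x_ad = torch.randn(2, 320, 4, 8, 8)   # frozen Adapted-encoder output
fused = fusion(x_de, x_ad)            # same shape as the inputs
```

The sum-then-softmax aggregation follows the selective-kernel style the authors cite [46]; the exact layer widths inside the real model are not specified here.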
Given a COVID-19 input volume as the $s$-th sample $X_s \in \mathbb{R}^{D \times H \times W}$, it is passed through the two parallel encoders. Each encoder follows the same 3D U-Net design, consisting of stacked convolution layers and pooling layers. More specifically, let $F_{de}^{l}$ and $F_{ad}^{l}$ be the outputs of the $l$-th layer of the Dedicated-encoder and the Adapted-encoder, respectively. They are obtained from the outputs of the previous layer, $F_{de}^{l-1}$ and $F_{ad}^{l-1}$, by the mapping

$F_{de}^{l} = \delta(\mathcal{IN}(W_{de}^{l} * F_{de}^{l-1})), \qquad F_{ad}^{l} = \delta(\mathcal{IN}(W_{ad}^{l} * F_{ad}^{l-1})) \quad (1)$

where $W$ represents the weight matrix, $*$ denotes the convolution operation, and $\mathcal{IN}(\cdot)$ and $\delta(\cdot)$ represent instance normalization and leaky ReLU, respectively. At the end of the encoding phase, the Dedicated-encoder and Adapted-encoder representations with $C$ channels containing rich semantic information are learned separately from COVID-19 and non-COVID19 data, denoted as $\mathbf{X}_{de} = \mathcal{F}_{de}(X_s; \Theta_{de})$ and $\mathbf{X}_{ad} = \mathcal{F}_{ad}(X_s; \Theta_{ad})$, where the two encoders are parameterized by $\Theta_{de}$ and $\Theta_{ad}$, respectively.

Inspired by [46], we design a Selective Fusion operation to combine and aggregate the information from both encoders into a global and comprehensive representation for the decoding phase. To achieve this, we first fuse $\mathbf{X}_{de}$ and $\mathbf{X}_{ad}$ using the element-wise summation operation $\oplus$:

$\mathbf{X}_{f} = \mathbf{X}_{de} \oplus \mathbf{X}_{ad} \quad (2)$

We then apply global average pooling to shrink $\mathbf{X}_{f}$ over its 3D spatial dimensions $D \times H \times W$. The $c$-th element of the channel-wise statistics $\mathbf{s} \in \mathbb{R}^{C}$ is calculated by:

$s_{c} = \frac{1}{D \times H \times W} \sum_{i=1}^{D} \sum_{j=1}^{H} \sum_{k=1}^{W} \mathbf{X}_{f}^{c}(i, j, k) \quad (3)$

The output $\mathbf{s}$ can be interpreted as a set of integrated local descriptors that provide selection guidance. In order to exploit channel-wise dependencies, we apply a fully connected (fc) layer with weights $W \in \mathbb{R}^{d \times C}$ to reduce the dimension to $\mathbf{z} \in \mathbb{R}^{d \times 1}$ with a reduction ratio $r$:

$\mathbf{z} = \delta(W \mathbf{s}) \quad (4)$

To adaptively select information from the two encoders, we use a softmax operation to obtain soft attention across channels:

$a_{c} = \frac{e^{A_{c} \mathbf{z}}}{e^{A_{c} \mathbf{z}} + e^{B_{c} \mathbf{z}}}, \qquad b_{c} = \frac{e^{B_{c} \mathbf{z}}}{e^{A_{c} \mathbf{z}} + e^{B_{c} \mathbf{z}}} \quad (5)$

where $A, B \in \mathbb{R}^{C \times d}$ and $\mathbf{a}, \mathbf{b}$ represent the soft attention vectors for $\mathbf{X}_{de}$ and $\mathbf{X}_{ad}$, respectively. By applying these soft attentions using Eqn. 6, features from the two parallel encoders are dynamically selected as the incorporated COVID-19 feature representation and then fed into the decoding part:

$\mathbf{V} = \mathbf{a} \cdot \mathbf{X}_{de} \oplus \mathbf{b} \cdot \mathbf{X}_{ad} \quad (6)$

In this section, we describe in detail the datasets, experimental setup and results of our investigation.

The COVID-19 dataset is released by the Coronacases Initiative and Radiopaedia. It is a publicly available COVID-19 volume dataset containing 20 COVID-19 CT scans. In the works of [26], [47], the left lung, right lung and infections were labelled by two radiologists and verified by an experienced radiologist. Thus, with over 1800 annotated slices, this dataset serves as the downstream COVID-19 infection segmentation dataset for transfer learning in our work.

To better explore the transferability from various non-COVID19 lung lesions to COVID-19 infections, the following relationships should hold among the datasets: 1) the sizes of the different lesion datasets are similar; 2) the shape, size and location of the different lesion areas are relatively distinct. Therefore, in this paper, we introduce three public datasets.

The MSD lung tumor dataset was used in a crowd-sourced challenge of generalized semantic segmentation algorithms, the Medical Segmentation Decathlon (MSD), held at MICCAI 2018. It includes patients with non-small cell lung cancer from Stanford University (Palo Alto, CA, USA), publicly available through TCIA and previously utilized to create a radiogenomic signature [48], [49], [50]. The tumor regions were delineated by an expert thoracic radiologist on a representative CT cross section using OsiriX [51].
63 3D CT scans with corresponding tumor segmentation masks are used in our paper.

The StructSeg organ dataset is a collection of 3D organ CT scans along with segmentation ground truth from the 2019 MICCAI StructSeg challenge. It covers two types of cancers, nasopharyngeal cancer and lung cancer. We adopt the gross target volume segmentation of lung cancer from 50 patients. Each CT scan is annotated by an experienced oncologist and verified by another.

The NSCLC dataset is developed from a Non-Small Cell Lung Cancer (NSCLC) cohort of 211 subjects. 78 cases with pleural effusion are selected together with their segmentation masks.

All experiments are implemented in PyTorch and trained on an NVIDIA Tesla V100 32GB GPU. For fair comparison, we follow the settings of the COVID-19 dataset benchmark in [26]. For the COVID-19 dataset, we use 5-fold cross validation based on a pre-defined dataset split file. Each fold contains 4 scans (20%) for training and 16 (80%) for testing; training with fewer data is more suitable for exploring the contribution of transfer learning. For the non-COVID19 lung lesion datasets, we randomly select 80% of the data for training and the remaining 20% for validation. The pre-trained models based on these non-COVID19 lesions are all trained from scratch with randomly initialized parameters using the 3D U-Net network. Due to the limited number of lesion cases, we follow the data pre-processing of nnU-Net [42]. The input patch size is set to 50×160×192 and the batch size to 2, which should be chosen carefully [52]. A stochastic gradient descent optimizer with an initial learning rate of 0.01 and a Nesterov momentum of 0.99 is used for non-COVID19 pre-training. The reduction ratio $r$ is set to 16, following [46]. We adopt the sum of the Dice loss and the cross-entropy loss as the loss function.

Diagnostic evaluation is often used in clinical practice for disease diagnosis, patient follow-up and efficacy monitoring. Whether the results of a diagnostic evaluation are true, reliable and practical largely determines a reasonable medical decision. In this work, we introduce six evaluation metrics for the exploration of transferability.

The Dice similarity coefficient (DSC) measures the volumetric overlap between segmentation results and annotations. It is computed by

$DSC = \frac{2\,|A \cap B|}{|A| + |B|}$

where $A$ is the set of foreground voxels in the annotation and $B$ is the corresponding set of foreground voxels in the segmentation result.

The normalized surface distance (NSD) [53] serves as a distance-based measure of performance. It is computed by

$NSD = \frac{|\partial G \cap B_{\partial S}^{(\tau)}| + |\partial S \cap B_{\partial G}^{(\tau)}|}{|\partial G| + |\partial S|}$

where $B_{\partial G}^{(\tau)}$ and $B_{\partial S}^{(\tau)}$ represent the border regions of the ground-truth and segmentation masks, which use a tolerance $\tau$ to account for the inter-rater variability of the annotators. We set $\tau = 3$ for COVID-19 infection. In contrast to the DSC, which measures the overlap of volumes, the NSD measures the overlap of two surfaces.

We also consider four other evaluation metrics. Accuracy denotes the rate of correct predictions over both positive and negative samples. Sensitivity is the percentage of positive instances correctly identified as positive. F1-score is the weighted harmonic mean of Precision and Sensitivity, which provides an effective and comprehensive evaluation. MCC (Matthews correlation coefficient) is a more reliable statistical rate that produces a high score only if the prediction obtains good results in all four confusion matrix categories (true positives, false positives, true negatives and false negatives).
Therefore, MCC is well-suited for our experiments on highly unbalanced binary label classes.

All four pre-trained models (MSD, StructSeg, NSCLC, Multi-lesion) are used to investigate the ability of transfer learning on the COVID-19 dataset. From the extensive experiments in Table 2, we observe that different pre-trained models show different transferability. The best scores of the models corresponding to each transfer learning strategy are highlighted, along with a detailed comparative analysis of the 5 validation folds. Among the pre-trained models on each single lesion dataset (MSD, StructSeg, NSCLC), compared with training on the COVID-19 dataset from scratch, the MSD tumor pre-trained model improves the segmentation by up to 3.0% DSC and 3.2% NSD. Meanwhile, the StructSeg and NSCLC lung lesion pre-trained models are unstable under different transfer strategies. As shown in Table 2, with the Continual Learning and Body Fine-tuning strategies, the StructSeg and NSCLC pre-trained models achieve promising improvements in most cases.

Table 2: Results of 5-fold cross validation of different transfer learning strategies under different non-COVID19 lung lesion pre-trained models. The best results are shown in red font.

As shown in Table 2, we further notice that among all transfer learning strategies, the multi-lesion pre-trained model not only achieves high DSC and NSD values, but also performs the most stably among all pre-trained models, with average DSC values of 0.696, 0.696, 0.693 and 0.704, respectively. This shows the robustness of a multi-lesion pre-trained model for transfer learning to COVID-19 infection. Table 3 further verifies this observation using Sensitivity, F1-score, Accuracy and MCC for the different transfer strategies. Transferring from the multi-lesion pre-trained model outperforms training from scratch on all these evaluation metrics for all strategies. Notably, for Pre-trained Lesion Representations, where the encoder of the pre-trained model is completely frozen and serves as a non-COVID19 lesion feature extractor, the multi-lesion pre-trained model still performs well. This confirms the effectiveness of multi-task training on multiple lung lesions, which generates more robust, general-purpose representations that help the COVID-19 infection task. Fig. 2(a)-(e) show examples of segmentation results of the above pre-trained models. Compared with training from scratch, pre-training on a single non-COVID19 dataset can recover the massive structures of COVID-19 infection more accurately but is unsatisfyingly unstable, whereas pre-training on the multi-lesion model shows high precision and smooth boundaries close to the manual annotations.

Based on the above conclusion that the multi-lesion pre-trained model brings consistently higher accuracy and robustness, we further analyze the performance of the different transfer strategies adopted in this paper. Table 4 shows the results of COVID-19 infection segmentation using the same multi-lesion pre-trained model under different transfer strategies. These results suggest that all transfer strategies improve over training from scratch in average DSC, NSD and Sensitivity. In particular, in fold 2, they improve the segmentation by as much as 6.4% DSC and 6.6% NSD.
The strategies of Continual Learning and Body Fine-tuning obtain similarly promising results, both improving the segmentation by 2.3% DSC, 1.8% NSD and 3.4% Sensitivity on average.

Table 3: Results of average Sensitivity, F1-score, Accuracy and MCC of different transfer learning strategies under different non-COVID19 lung lesion pre-trained models. The best results are shown in red font.

An interesting observation is that, compared with Continual Learning and Body Fine-tuning, where all parameters are updated on the COVID-19 dataset, the Pre-trained Lesion Representations strategy still achieves competitive performance with an entirely frozen encoder and inherited weights from the pre-trained non-COVID19 models. It improves the DSC from 0.673 to 0.693, the NSD from 0.700 to 0.716 and the Sensitivity from 0.643 to 0.662. In terms of training cost, Table 4 reports the training time (per epoch) for each transfer strategy. The Pre-trained Lesion Representations strategy spends just 148 s per epoch on average, much less than training from scratch and the other transfer strategies. Due to the frozen encoder, it cuts the number of parameters that need to be updated nearly in half. Thus, it is promising to adopt this strategy for COVID-19 transfer learning to save training cost and obtain fast convergence.

It is also observed in Table 4 that our proposed Hybrid-encoder transfer learning strategy exhibits significantly better segmentation performance than the other strategies using the multi-lesion pre-trained model. It improves the average DSC from 0.673 to 0.704 and the NSD from 0.700 to 0.735, which is also the best among all pre-trained models in Table 2. In terms of Sensitivity, F1-score, Accuracy and MCC in Table 3, the proposed Hybrid-encoder learning outperforms the other strategies and raises the values to 0.6818, 0.7069, 0.9943 and 0.7162, respectively. The transferability of the Hybrid-encoder is also confirmed by Fig. 3. Compared with training from scratch, Hybrid-encoder learning yields segmentation results with more accurate boundaries in Fig. 3(b)(c)(e) and identifies some minor COVID-19 infection areas in Fig. 3(a)(d)(f). The success of the proposed Hybrid-encoder learning strategy is owed to the designed parallel encoders, where both COVID-19 and non-COVID19 lesions are employed for encoding feature representations, leading to better generalization and lower over-fitting risk.

A valuable finding is that a multi-lesion pre-trained model can make the most of multiple lung lesion representations and advance the generalization and robustness of pre-trained models. The rationale for transferability from non-COVID19 to COVID-19 lies in the feature and texture similarity of their CT images. Therefore, instead of starting the learning process from scratch, the CNN first learns how to extract features during pre-training with a diverse and large multi-lesion dataset, and its parameters acquire appropriate values. Since the datasets share common features, the model can pre-learn shared knowledge about the shape, color and edges of lung lesions. When a new dataset (i.e., the inadequate COVID-19 infection data) is given, the pre-trained CNN can start from patterns learned before and then specialize to COVID-19 infections while training on the downstream task. There is value in recognizing that the CT appearance of these different lung lesions shares some similarity.
Thus, with more kinds of lung lesion datasets incorporated to pre-train a model, we could achieve better performance. This exploration is an important contribution, enabling more research on transfer learning for COVID-19 infection from the perspective of utilizing non-COVID19 lung lesions. Moreover, this paper examines a series of transfer learning strategies, including Continual Learning, Body Fine-tuning, Pre-trained Lesion Representations and the proposed Hybrid-encoder Learning. We observe segmentation improvement on all performance metrics. It is also noted that the Pre-trained Lesion Representations strategy with a frozen encoder enhances performance as well. This gives more insight into the significant transferability exhibited by the encoding parameters. Though the encoding layers do not contain any explicit knowledge of COVID-19 infection, their parameters still enable the optimizer to reach higher performance during fine-tuning. The rationale is that the encoded multi-lesion representations contain high-level and abundant information about the medically relevant lung lesions observed in CT images. By combining the multi-lesion representations and COVID-19 infection features, the proposed Hybrid-encoder achieves significant improvement. These observations and explorations are important not only for COVID-19 transfer learning but also for the general medical domain, because feature reuse from pre-training on out-of-domain datasets yields significant improvements in task performance and training convergence.

In this paper, we investigate transferability in COVID-19 CT segmentation. We present a set of experiments to better understand how different non-COVID19 lung lesions influence the performance of COVID-19 infection segmentation and how their transferability differs under different transfer learning strategies. Our results reveal clear benefits of pre-training on non-COVID19 lung lesion datasets when public labelled COVID-19 datasets are inadequate to train a robust deep learning model. Among all the strategies, our proposed Hybrid-encoder Learning method based on the multi-lesion pre-trained model effectively utilizes transferred non-COVID19 lung lesion knowledge and gains significant improvement. Future research directions include utilizing a wider variety of non-COVID19 lung lesion datasets and investigating better transfer learning methods, so that non-COVID19 lung lesions can be effectively used to improve the quality of COVID-19 infection segmentation in the absence of sufficient high-quality COVID-19 datasets.
[1] Improved molecular diagnosis of covid-19 by the novel, highly sensitive and specific covid-19-rdrp/hel real-time reverse transcription-pcr assay validated in vitro and with clinical specimens
[2] Coronet: A deep neural network for detection and diagnosis of covid-19 from chest x-ray images
[3] Covid-19 identification in chest x-ray images on flat and hierarchical classification scenarios
[4] A rapid, accurate and machine-agnostic segmentation and quantification method for ct-based covid-19 diagnosis
[5] Correlation of chest ct and rt-pcr testing for coronavirus disease 2019 (covid-19) in china: A report of 1014 cases
[6] Ct imaging features of 2019 novel coronavirus (2019-ncov)
[7] Clinically applicable ai system for accurate diagnosis, quantitative measurements, and prognosis of covid-19 pneumonia using computed tomography
[8] Using artificial intelligence to detect covid-19 and community-acquired pneumonia based on pulmonary ct: Evaluation of the diagnostic accuracy
[9] Ai-assisted ct imaging analysis for covid-19 screening: Building and deploying a medical ai system in four weeks
[10] Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for covid-19
[11] A noise-robust framework for automatic segmentation of covid-19 pneumonia lesions from ct images
[12] Inf-net: Automatic covid-19 lung infection segmentation from ct images
[13] Lung infection quantification of covid-19 in ct images with deep learning
[14] COVID-19 Chest CT Image Segmentation - A Deep Convolutional Neural Network Solution
[15] Deep learning-based detection for covid-19 from chest ct using weak label
[16] Capsnet topology to classify tumours from brain images and comparative evaluation
[17] Diagnosis of alzheimer's disease with sobolev gradient based optimization and 3d convolutional neural network
[18] Challenges and recent solutions for image segmentation in the era of deep learning
[19] Deep learning based classification of facial dermatological disorders
[20] Convolutional neural network based desktop applications to classify dermatological diseases
[21] Comparative evaluations of cnn based networks for skin lesion classification
[22] Analysis of deep networks with residual blocks and different activation functions: Classification of skin diseases
[23] Skin disease diagnosis from photographs using deep learning
[24] Transfusion: Understanding transfer learning for medical imaging
[25] Not-so-supervised: A survey of semi-supervised, multi-instance, and transfer learning in medical image analysis
[26] Towards data-efficient learning: A benchmark for covid-19 ct lung and infection segmentation
[27] Machine learning-based ct radiomics model for predicting hospital stay in patients with pneumonia associated with sars-cov-2 infection: A multicenter study
[28] Severity assessment of coronavirus disease 2019 (covid-19) using quantitative features from chest ct images
[29] Rapid ai development cycle for the coronavirus (covid-19) pandemic: Initial results for automated detection & patient monitoring using deep learning ct image analysis
[30] Serial quantitative chest ct assessment of covid-19: Deep-learning approach
[31] Convolutional neural networks for medical image analysis: Full training or fine tuning?
[32] Deep convolutional neural networks for computer-aided detection: Cnn architectures, dataset characteristics and transfer learning
[33] Transfer learning for multicenter classification of chronic obstructive pulmonary disease
[34] Unregistered multiview mammogram analysis with pre-trained deep learning models
[35] Chest pathology identification using deep feature selection with non-medical training
[36] Holistic classification of ct attenuation patterns for interstitial lung diseases via deep convolutional neural networks
[37] Transfer learning for domain adaptation in mri: Application in brain lesion segmentation
[38] A novel transfer learning based approach for pneumonia detection in chest x-ray images
[39] Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks
[40] Covid-19 detection using cnn transfer learning from x-ray images
[41] Multi-Channel Transfer Learning of Chest X-ray Images for Screening of COVID-19
[42] nnu-net: Breaking the spell on successful medical image segmentation
[43] 3d deeply supervised network for automatic liver segmentation from ct volumes
[44] Parameter-efficient transfer learning for nlp
[45] How to fine-tune bert for text classification
[46] Selective kernel networks
[47] COVID-19 CT Lung and Infection Segmentation Dataset
[48] Nsclc radiogenomics: Initial stanford study of 26 cases
[49] A radiogenomic dataset of non-small cell lung cancer
[50] Non-small cell lung cancer: identifying prognostic imaging biomarkers by leveraging public gene expression microarray data - methods and preliminary results
[51] Osirix: an open-source software for navigating in multidimensional dicom images
[52] On the importance of batch size for deep learning
[53] Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy

The authors have no conflict of interest to disclose.