Title: Label-free coronavirus disease 2019 lesion segmentation based on synthetic healthy lung image subtraction
Authors: Fang, Chengyijue; Liu, Yingao; Liu, Ying; Liu, Mengqiu; Qiu, Xiaohui; Li, Yang; Wen, Jie; Yang, Yidong
Date: 2022-04-22
Journal: Med Phys
DOI: 10.1002/mp.15661

ABSTRACT

PURPOSE: Coronavirus disease 2019 (COVID-19) has become a global pandemic and still poses a severe health risk to the public. Accurate and efficient segmentation of pneumonia lesions in computed tomography (CT) scans is vital for treatment decision-making. We propose a novel unsupervised approach using a cycle consistent generative adversarial network (cycle-GAN) that automates and accelerates lesion delineation.

METHOD: The workflow includes lung volume segmentation, healthy lung image synthesis, subtraction of infected and healthy images, and binary lesion mask generation. The lung volume was first delineated using a pre-trained U-net and served as the input for the subsequent network. A cycle-GAN was trained to generate synthetic healthy lung CT images from infected lung images. The pneumonia lesions were then extracted by subtracting the synthetic healthy lung CT images from the infected lung CT images. A median filter and k-means clustering were applied to contour the lesions. The automatic segmentation approach was validated on three different datasets.

RESULTS: The average Dice coefficient reached 0.666 ± 0.178 across the three datasets. In particular, the Dice coefficient reached 0.748 ± 0.121 and 0.730 ± 0.095, respectively, on the two public datasets Coronacases and Radiopedia. The average precision and sensitivity for lesion segmentation on the three datasets were 0.679 ± 0.244 and 0.756 ± 0.162. The performance is comparable to that of existing supervised segmentation networks and outperforms unsupervised ones.

CONCLUSION: The proposed label-free segmentation method achieved high accuracy and efficiency in automatic COVID-19 lesion delineation. The segmentation result can serve as a baseline for further manual modification and as a quality assurance tool for lesion diagnosis. Furthermore, due to its unsupervised nature, the result is not influenced by physicians' experience, which is otherwise crucial for supervised methods.

INTRODUCTION

Coronavirus disease 2019 (COVID-19) has become a global public health problem and still affects billions of people's lives. According to the World Health Organization, the pandemic had caused over 2 million deaths by April 2021.1 Typical symptoms of COVID-19 include cough, fever, and pneumonia after infection.2 Clinically, computed tomography (CT) scans are commonly used to evaluate the progress and severity of pneumonia3,4 owing to their high three-dimensional resolution and broad availability compared with other imaging modalities. Accurate delineation of pneumonia lesions is vital for evaluating disease progression and assessing the severity of infection, both of which are crucial for treatment decision making.5 However, manual segmentation is time-consuming and labor-intensive; automatic segmentation methods are therefore in high demand.

In the past decade, deep learning has shown tremendous power and potential in various radiological applications, including image segmentation,6,7 disease classification, and synthetic image generation.8-11
Since the beginning of the COVID-19 pandemic, deep learning, incorporating various image modalities such as CT, X-ray, and ultrasound,4,12-15 has been applied to clinical diagnosis, disease progression prediction,16 pneumonia type classification, and infection severity assessment.4,17 However, existing methods are mostly based on supervised learning, which requires substantial data labeling by radiologists as training references. For example, U-net networks have been used for the classification and segmentation of COVID-19 lesions in CT scans.18-20 The results vary greatly among studies, partially due to the inter- and intra-observer variations in the training lesion labels produced by different radiologists.18

Compared to supervised learning, unsupervised learning does not require training labels and hence avoids both the burden of manual lesion delineation and the inter- and intra-observer inconsistency. For example, Yao et al.21 proposed a label-free pneumonia lesion segmentation method that employed an unsupervised statistical method to simulate infected lungs from healthy ones. Zhang et al.22 developed an unsupervised method, based on a conditional generative adversarial network (GAN), to augment lung images for subsequent segmentation training. However, most of these methods focused on data augmentation with unsupervised networks and still relied on supervised networks to train the lesion segmentation process.6,21

Cycle consistent GAN (cycle-GAN) is an unsupervised network that has been widely used in medical image analysis, such as synthetic CT generation23-25 and image transformation between different MRI sequences.9,24 Inspired by this, we propose a cycle-GAN-based unsupervised framework for COVID-19 lesion segmentation. The cycle-GAN is used to convert infected lung slices to healthy lung slices by transforming pneumonia lesions into normal lung tissues. The lesion is then retrieved by subtracting the simulated "healthy" lung from the original image. The network does not require any image pairing or manual training labels, and hence can improve efficiency and eliminate the inter- and intra-observer inconsistency otherwise present in supervised networks.

MATERIALS AND METHODS

In this study, CT scans of 77 COVID-19 patients with positive reverse transcription-polymerase chain reaction results, collected between December 2019 and January 2020 in the First Affiliated Hospital of the University of Science and Technology of China, form the USTC-1 dataset. The data were anonymized before any analysis. The patient and CT scan information is listed in Table 1. All CT images were resampled to 1 × 1 × 1 mm3 spatial resolution and cropped to 256 × 256 pixels per slice. The image window level was set to [-800, 100] HU, and all images were normalized to [-1, 1] with a zero background before being fed into network training. We selected 1264 healthy CT slices and 1272 slices with pneumonia lesions from the 77 CT scans as the training dataset.

Lung volumes were extracted from the CT images using a pre-trained 2D U-net lung segmentation network, which was trained on 285 patients from the public LUNA-16 dataset.26,30 The average Dice similarity of this lung segmentation approach exceeded 0.98 in a previous study.26
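For concreteness, the preprocessing described above (isotropic resampling, HU windowing, and intensity normalization) can be sketched as follows. This is a minimal illustration assuming SimpleITK and NumPy; the function name and the -1024 HU padding value are our own choices, not taken from the paper:

```python
import SimpleITK as sitk
import numpy as np

def preprocess_ct(path):
    """Resample a CT volume to 1 x 1 x 1 mm, window to [-800, 100] HU,
    and normalize intensities to [-1, 1] (hypothetical helper)."""
    img = sitk.ReadImage(path)

    # Resample to 1 mm isotropic voxels with linear interpolation;
    # -1024 HU (air) is an assumed padding value for out-of-field voxels.
    new_size = [int(round(sz * sp))
                for sz, sp in zip(img.GetSize(), img.GetSpacing())]
    img = sitk.Resample(img, new_size, sitk.Transform(), sitk.sitkLinear,
                        img.GetOrigin(), (1.0, 1.0, 1.0), img.GetDirection(),
                        -1024.0, img.GetPixelID())

    # Window to [-800, 100] HU, then rescale linearly to [-1, 1].
    vol = sitk.GetArrayFromImage(img).astype(np.float32)
    vol = np.clip(vol, -800.0, 100.0)
    return 2.0 * (vol + 800.0) / 900.0 - 1.0
```

Cropping each slice to 256 × 256 pixels and zeroing the background outside the lung mask would follow the same array-level pattern.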
The extracted lung images were then used as the input of the cycle-GAN. Another dataset comprising 14 patients from the First Affiliated Hospital of the University of Science and Technology of China (USTC-2) was used as the validation set. Three more datasets, including two public databases (Coronacases and Radiopedia) and one obtained from a published study21 (UESTC), were used as the test set. The details of these datasets are listed in Table 1.

TABLE 1 Datasets used for training, validation, and testing

Dataset       Patients   Usage        Labels
USTC-1        77         Training     CT images without labeling
USTC-2        14         Validation   CT images with labeling
Coronacases   10         Test         CT images with labeling
Radiopedia    8          Test         CT images with labeling
UESTC         50         Test         CT images with labeling

In the training stage, we first used a cycle-GAN to generate synthetic healthy lung CT slices. The network architecture is illustrated in Figure 1. We denote infected lung CT slices by domain X and healthy lung CT slices by domain Y; the probability distributions of the two domains are referred to as P_x and P_y, respectively. The generator G_XY denotes the mapping from domain X to domain Y, and G_YX denotes the mapping from domain Y to domain X. X̃ and Ỹ are synthetic "infected" and "healthy" lung slices. Two adversarial discriminators, D_X and D_Y, are used to distinguish real input images from synthesized images.

FIGURE 1 The scheme of the proposed method. G_XY and G_YX are the generators producing synthetic "healthy" and "infected" images, respectively. L_GAN, L_cycle, and L_identity are the three loss functions used for training. D_X and D_Y are the two discriminators that distinguish real from synthetic images.

The generator is a U-net variant consisting of eight stages, as shown in Figure 2. Unlike in the original U-net,7 instance normalization, which better preserves image details during image generation, is applied immediately after each convolutional layer except the last one. All convolution filters in the generator have a size of 3 × 3 pixels. The channel number of the first block is set to 64. In the encoder, the width and height of the feature map are halved using convolutions with a stride of 2 instead of max pooling. In the first four stages, the channel number is doubled after the feature map passes each layer, while in the last four stages it is fixed at 512. All feature maps in the encoder are concatenated with their counterparts in the decoder. The encoder and decoder parts are symmetric.

FIGURE 2 The structure of the U-net generator. The input and output images are 256 × 256. The encoder and decoder each have eight stages, with a skip connection applied at each stage.

The discriminators D_X and D_Y are implemented as 70 × 70 Patch-GANs.27 The architecture of the discriminator is illustrated in Figure 3. The stride is set to 2 in the first three convolution layers and to 1 in the last two, and the padding is 1 in all convolution layers. Leaky ReLU activation with a slope of 0.2 is applied after each convolution layer except the last one. The first convolution layer generates a feature map with 64 channels; thereafter, the channel number is doubled after the feature map passes each layer. The last layer reduces the output to a single channel.

FIGURE 3 The structure of the discriminator. The first layer applies convolution and leaky ReLU; the second to fourth layers apply convolution, instance normalization, and leaky ReLU; the last layer applies a single convolution.

The generators and discriminators are trained by solving a min-max problem:

$$\min_{G_{XY},\,G_{YX}}\ \max_{D_X,\,D_Y}\ \mathcal{L}_{\text{total}} = \mathcal{L}_{\text{GAN}} + \lambda_{\text{cycle}}\,\mathcal{L}_{\text{cycle}} + \lambda_{\text{identity}}\,\mathcal{L}_{\text{identity}} \tag{1}$$

where L_total is the total loss for learning the mapping functions between the source and target domains. The weights λ_cycle and λ_identity in (1) balance the importance of the three losses; after optimization we set λ_cycle = 10 and λ_identity = 5. L_GAN is the discriminator loss measuring the difference between synthetic "healthy" slices and real healthy slices. To maintain stability during training, we choose the L2 loss of LSGAN28 as our loss function instead of the sigmoid cross-entropy used in regular GANs.29 L_GAN is defined as:

$$\mathcal{L}_{\text{GAN}} = \mathbb{E}_{y\sim P_y}\big[(D_Y(y)-1)^2\big] + \mathbb{E}_{x\sim P_x}\big[D_Y(G_{XY}(x))^2\big] \tag{2}$$

where x ∼ P_x denotes sampling from the distribution of domain X.
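As a concrete reading of Equation (2), the following minimal PyTorch sketch shows the least-squares adversarial objective for one generator-discriminator pair. The network and variable names (netG_XY, netD_Y) are our own assumptions, and the symmetric terms for G_YX and D_X would follow the same pattern:

```python
import torch

def lsgan_losses(netG_XY, netD_Y, real_x, real_y):
    """Least-squares GAN losses in the style of Equation (2).
    netG_XY maps infected slices to synthetic 'healthy' slices;
    netD_Y scores patches of real vs. synthetic healthy slices."""
    fake_y = netG_XY(real_x)  # synthetic "healthy" slice

    # Discriminator loss: real patches are pushed toward 1, synthetic toward 0.
    d_loss = ((netD_Y(real_y) - 1.0) ** 2).mean() \
           + (netD_Y(fake_y.detach()) ** 2).mean()

    # Generator loss: synthetic patches are pushed toward 1 to fool D_Y.
    g_loss = ((netD_Y(fake_y) - 1.0) ** 2).mean()
    return d_loss, g_loss
```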
L_cycle is used to keep the two generators G_XY and G_YX consistent with each other and is defined as:

$$\mathcal{L}_{\text{cycle}} = \mathbb{E}_{x\sim P_x}\big[\|G_{YX}(G_{XY}(x)) - x\|_1\big] + \mathbb{E}_{y\sim P_y}\big[\|G_{XY}(G_{YX}(y)) - y\|_1\big] \tag{3}$$

where ‖·‖_1 is the L1 norm. Since we only want to convert unhealthy lung CT slices into healthy ones, the identity loss L_identity in (1) is designed to preserve the image features when a healthy slice is fed into the generator. The identity loss is defined as:

$$\mathcal{L}_{\text{identity}} = \mathbb{E}_{y\sim P_y}\big[\|G_{XY}(y) - y\|_1\big] \tag{4}$$

We used the ADAM optimization method to train all the networks, with β1 = 0.5 and β2 = 0.999. Kernels were initialized randomly from a Gaussian distribution. The generators and the discriminators were updated at each iteration. The input image slices were randomly cropped to patches of 256 × 256 pixels. The mini-batch size was 1, and the number of epochs was 100. The learning rate was initially set to 0.0002 and linearly decreased to 0 over the last 50 epochs. The training was stopped at the 85th epoch, which rendered a stabilized, optimal Dice similarity coefficient (DSC). All hyperparameters were validated on the validation set (USTC-2). The training was conducted on a Linux system.

The post-processing steps are illustrated in Figure 4. The synthetic healthy image was subtracted from its corresponding real infected image to obtain a difference image. The lung CT slices from each patient were reshaped into a 3D volumetric image. Median filtering was applied to suppress noise and remove small islands. K-means clustering was then used to separate the lesion from the low-intensity background. A 5 × 5 Gaussian kernel was applied, slice by slice, to smooth the lesion edges. Finally, erosion and dilation with a radius of 1 pixel were performed to further remove small, isolated regions. All post-processing steps, except the Gaussian filtering, were implemented on the 3D volumetric image. The post-processing was done in MATLAB 2018a and took less than 2 min per patient. We also compared the k-means clustering method with Otsu thresholding, which is commonly used for threshold-based segmentation.

To summarize the lesion segmentation process: a U-net previously trained on the publicly available LUNA-16 dataset30 was first applied to segment the lung volume. A cycle-GAN was then trained and employed to generate synthetic healthy lung images from the infected lung images in the USTC-1 dataset. Finally, the lesion was obtained by subtracting the synthetic healthy lung images from the corresponding infected lung images. The subtraction images containing lesions were combined into a 3D volumetric image, on which the post-processing steps were performed to fine-tune the segmented lesion volume. The whole process is automated and does not require manual lesion labels as network training references.

In this study, the DSC, volume precision (vPSC), and volume sensitivity (vSEN), defined in Equations (5)-(7), were used to evaluate the performance of the proposed segmentation method, where V_pre and V_gt denote the predicted and ground truth lesion volumes:

$$\text{DSC} = \frac{2\,|V_{\text{pre}} \cap V_{\text{gt}}|}{|V_{\text{pre}}| + |V_{\text{gt}}|} \tag{5}$$

$$\text{vPSC} = \frac{|V_{\text{pre}} \cap V_{\text{gt}}|}{|V_{\text{pre}}|} \tag{6}$$

$$\text{vSEN} = \frac{|V_{\text{pre}} \cap V_{\text{gt}}|}{|V_{\text{gt}}|} \tag{7}$$
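These volume metrics reduce to a few array operations on binary masks. The following NumPy sketch, under our own naming conventions (it is not the authors' evaluation code), makes the computation explicit:

```python
import numpy as np

def volume_metrics(v_pre, v_gt):
    """Compute DSC, vPSC, and vSEN (Equations 5-7) from two binary
    3D lesion masks of identical shape (hypothetical helper)."""
    v_pre = v_pre.astype(bool)
    v_gt = v_gt.astype(bool)
    overlap = np.logical_and(v_pre, v_gt).sum()

    dsc = 2.0 * overlap / (v_pre.sum() + v_gt.sum())  # Equation (5)
    vpsc = overlap / v_pre.sum()                      # Equation (6)
    vsen = overlap / v_gt.sum()                       # Equation (7)
    return dsc, vpsc, vsen
```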
When it comes to diagnosis, physicians are more interested in whether a lesion exists and where it is than in how accurately it is delineated. We therefore propose an evaluation approach that counts whether any lesion appears in different subvolumes of the lung. To mimic this process, the lung volume was divided into 12 equal subvolumes, as illustrated in Figure 5. Each subvolume was counted as positive if it contained at least one lesion, and negative otherwise.

FIGURE 5 The 12 subvolumes in the lung volume division. The lung volume is divided equally into 12 regions.

Three additional metrics were used for comparison between the proposed segmentation method and other methods. The detection accuracy (ACC), precision (PSC), and sensitivity (SEN) are defined, respectively, as:

$$\text{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$

$$\text{PSC} = \frac{TP}{TP + FP} \tag{9}$$

$$\text{SEN} = \frac{TP}{TP + FN} \tag{10}$$

where TP, TN, FP, and FN are the numbers of true positive, true negative, false positive, and false negative subvolumes, respectively.

The paired t-test was used to statistically compare the proposed method with other methods, that is, nnUNet-2D, nnUNet-3D,31 Cople-Net,18 Inf-Net,32 and NormNet.21 For the comparison methods, the nnUNet-2D/nnUNet-3D networks were trained and validated on any two of the three datasets (Coronacases, Radiopedia, and UESTC) and tested on the remaining one using 5-fold cross-validation; for all other networks, the network parameters were borrowed directly from the previous publications to build the networks tested on the three datasets.

RESULTS

Table 2 shows the comparisons between the proposed method and several existing supervised and unsupervised methods. Our method reaches Dice coefficients of 74.8 ± 12.1% and 73.0 ± 9.5% on the Coronacases and Radiopedia datasets, respectively. The volume precision and volume sensitivity are 81.3 ± 8.8% and 73.5 ± 20.5% on Coronacases, and 77.3 ± 17.8% and 72.6 ± 11.1% on Radiopedia. As shown in Table 2, our approach is comparable with the supervised Cople-Net and outperforms the semi-supervised Inf-Net on these two datasets. On the UESTC database, our method reaches a Dice coefficient of 64.4 ± 19.2%, which is still higher than that of Inf-Net. We also compared our method with a state-of-the-art unsupervised label-free method21 (denoted "NormNet" in Table 2). Our method scores higher on most indices and is more robust than NormNet across the different datasets. The results also indicate that k-means clustering performs better than Otsu thresholding in image post-processing.

Paired t-test results are shown in Table 3. The proposed method is comparable with the supervised methods in most cases. For example, it is comparable to nnUNet-2D, nnUNet-3D, and Cople-Net on the Coronacases dataset, and to nnUNet-3D, Cople-Net, and Inf-Net on the Radiopedia dataset. On UESTC, however, our method renders an inferior outcome compared with the supervised methods but still outperforms the unsupervised NormNet.

The proposed method performs well even in small lesion segmentation. Figure 6a shows that a small lesion only 2 mm wide is correctly delineated. As shown in Figure 6b, our method can also readily separate lesions from the chest wall. Interestingly, our method catches some low-contrast lesions that were missed by radiologists during manual segmentation, as shown in Figure 6c.

Figure 7 demonstrates the performance of our method compared with existing supervised methods on lesions of varied shape, size, and position. As shown in Figure 7a, our method localized and delineated the lesion close to the trachea, which the other methods missed.
Our method performs better than Inf-Net and Cople-Net on large lesion segmentation, as shown in Figure 7b, and is comparable to all existing supervised methods in segmenting small lesions isolated in the lung volume or near the chest wall, as shown in Figure 7c,d.

As shown in Table 4, the average detection accuracy of our method on the three datasets reaches 79.1%, which is comparable to the supervised methods. Our method outperforms the semi-supervised Inf-Net in detection accuracy and precision, though not in sensitivity. The paired t-test results on detection accuracy are shown in Table 5. As shown in Tables 4 and 5, there is no significant difference between our method and nnUNet-2D, and a similar lesion diagnosis capability can be observed between our method and all supervised methods on the Coronacases and Radiopedia databases.

DISCUSSION

Due to its unsupervised nature, the proposed method does not depend on physicians' experience, which is otherwise crucial in supervised learning approaches. It can work as an efficient and independent automatic segmentation method or provide a starting point for physicians' follow-up refinement.

There are still some limitations to the proposed unsupervised method. First, as shown in Figure 8a, the method may miss some small and low-contrast lesions. Second, the lung CT slices delineated by the U-net were used as the input, which may introduce false positive lesions misclassified from the top part of the liver; on the three test sets, the top of the liver was misclassified as a lesion in three patients of a total of 68 (4.34%), one example of which is shown in Figure 8b. Moreover, the sample size in this study might not be sufficiently large, particularly lacking patients with lesions at the top and bottom of the lung volume, potentially resulting in mis-delineation as shown in Figure 8c. In addition, the network was trained with 2D images due to hardware limitations, a strategy that does not make full use of the three-dimensional nature of CT images. Future studies using 3D image input may improve segmentation accuracy, particularly the contour continuity along the image thickness direction.

FIGURE 8 Segmentation limitations of the proposed method. The green contours are manual labels representing the ground truth, and the red contours are the automatic segmentation results of the proposed method.

Despite these limitations, the proposed unsupervised method still achieved decent segmentation results, with a Dice value of 74.8% and an accuracy of 90.0% on the Coronacases public database. In the future, the unsupervised method can be combined with more sophisticated post-processing methods, such as texture analysis or even additional deep learning networks, to further improve the segmentation results.

CONCLUSION

In this work, we propose a label-free approach that accurately and efficiently delineates COVID-19 lesions in CT scans automatically. The training process of the unsupervised network does not rely on any labeled data. The automatic segmentation can provide a starting point for further manual refinement and can work as a quality assurance tool for lesion diagnosis. Due to its unsupervised nature, the result is not influenced by physicians' experience, which is otherwise crucial for supervised methods.

REFERENCES

1. WHO coronavirus (COVID-19) dashboard with vaccination data.
2. Coronavirus disease 2019 (COVID-19).
3. CT manifestations of coronavirus disease-2019: a retrospective analysis of 73 cases by disease severity.
4. Chest CT severity score: an imaging tool for assessing severe COVID-19. Radiol Cardiothorac Imaging.
5. Diagnosis and treatment of coronavirus disease 2019 (COVID-19): laboratory, PCR, and chest CT imaging findings.
6. 3D U-net: learning dense volumetric segmentation from sparse annotation.
7. U-net: convolutional networks for biomedical image segmentation.
8. Deep residual learning for image recognition.
9. MRI-only based synthetic CT generation using dense cycle consistent generative adversarial networks.
10. Generative adversarial networks.
11. MR-based synthetic CT generation using a deep convolutional neural network method.
12. Chest CT findings of coronavirus disease 2019 (COVID-19).
13. CovXNet: a multi-dilation convolutional neural network for automatic COVID-19 and other pneumonia detection from chest X-ray images with transferable multi-receptive feature optimization.
14. Sonographic signs and patterns of COVID-19 pneumonia.
15. Is there a role for lung ultrasound during the COVID-19 pandemic?
16. Chest CT findings in coronavirus disease 2019 (COVID-19): relationship to duration of infection.
17. Severity assessment of COVID-19 using CT image features and laboratory indices.
18. A noise-robust framework for automatic segmentation of COVID-19 pneumonia lesions from CT images.
19. Review of artificial intelligence techniques in imaging data acquisition, segmentation, and diagnosis for COVID-19.
20. Segmentation of COVID-19 infections on CT: comparison of four UNet-based networks.
21. Label-free segmentation of COVID-19 lesions in lung CT.
22. CoSinGAN: learning COVID-19 infection segmentation from a single radiological image.
23. Paired cycle-GAN-based image correction for quantitative cone-beam computed tomography.
24. Liver synthetic CT generation based on a dense-CycleGAN for MRI-only treatment planning.
25. CBCT-based synthetic CT generation using deep-attention cycleGAN for pancreatic adaptive radiotherapy.
26. Automated assessment of disease severity of COVID-19 using artificial intelligence with synthetic chest CT.
27. Unpaired image-to-image translation using cycle-consistent adversarial networks.
28. Least squares generative adversarial networks.
29. Generative adversarial nets.
30. LUNA16 (LUng Nodule Analysis 16), ISBI 2016 challenge. Academic Torrents.
31. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation.
32. Inf-Net: automatic COVID-19 lung infection segmentation from CT images.

CONFLICT OF INTEREST

The authors declare no conflict of interest.