key: cord-0884589-95libakm authors: Zhao, Shixuan; Li, Zhidan; Chen, Yang; Zhao, Wei; Xie, Xingzhi; Liu, Jun; Zhao, Di; Li, Yongjie title: SCOAT-Net: A novel network for segmenting COVID-19 lung opacification from CT images date: 2021-06-10 journal: Pattern Recognit DOI: 10.1016/j.patcog.2021.108109 sha: c2ca01f12643a88e059e81619fb971fa61de3971 doc_id: 884589 cord_uid: 95libakm Automatic segmentation of lung opacification from computed tomography (CT) images shows excellent potential for quickly and accurately quantifying the infection of Coronavirus disease 2019 (COVID-19) and judging disease development and treatment response. However, several challenges remain, including the complex and variable features of the opacity regions, the small difference between infected and healthy tissues, and the noise of CT images. Due to limited medical resources, it is impractical to obtain a large amount of data in a short time, which further hinders the training of deep learning models. To address these challenges, we propose a novel spatial- and channel-wise coarse-to-fine attention network (SCOAT-Net), inspired by the biological vision mechanism, for the segmentation of COVID-19 lung opacification from CT images. With UNet++ as the basic structure, our SCOAT-Net introduces specially designed spatial-wise and channel-wise attention modules, which collaboratively boost the attention learning of the network and extract efficient features of the infected opacification regions at the pixel and channel levels. Experiments show that our proposed SCOAT-Net achieves better results than several state-of-the-art image segmentation networks and has acceptable generalization ability. The coronavirus disease 2019 (COVID-19), which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become an ongoing pandemic [1, 2].
As of 9 September 2020, there have been outbreaks in 212 countries, with a total of 27,486,960 diagnosed cases and 894,983 deaths, and the number of infected people continues to increase [3]. Clinically, reverse transcription-polymerase chain reaction (RT-PCR) is the gold standard for diagnosing COVID-19 [4], but it has the disadvantages of a high false-negative rate [5, 6] and an inability to provide information about the patient's condition. COVID-19 has certain typical visible imaging features, such as lung opacification caused by ground-glass opacities (GGO), consolidation, and pulmonary fibrosis, which can be observed in thoracic computed tomography (CT) images [6, 7]. Therefore, CT can be used as an essential tool for clinical diagnosis. CT can also directly reflect changes in lung inflammation during the treatment process and is a crucial indicator for evaluating the treatment effect [4]. However, in the course of treatment, the need for repeated inspections leads to a sharp increase in the workload of radiologists. In addition, the assessment of inflammation requires a comparison of the lesion regions before and after treatment. Quantitative diagnosis by radiologists is inefficient and subjective, and is difficult to promote widely. Artificial intelligence (AI) technology may gradually come to play an important role in CT evaluation of COVID-19 by enabling the evaluation to be carried out more quickly and accurately. AI can also enable a rapid response by integrating multiple functionalities, such as diagnosis [8, 9], segmentation [10, 11], and quantitative analysis [12, 13], assisting doctors in rapid screening, differential diagnosis, disease course tracking, and efficacy evaluation to improve the capacity to handle COVID-19. In this study, we focus on the segmentation of COVID-19 lung opacification from CT images.
Benefiting from the rapid development of deep learning [14], many excellent convolutional neural networks (CNNs) have been applied to medical image analysis tasks and have achieved state-of-the-art performance [8, 15]. CNNs can be applied to various image segmentation tasks due to their excellent expressive ability and data-driven adaptive feature extraction. However, the success of any CNN depends on accurate manual labeling of a large number of training images by medical personnel, so CNNs are not suitable for all tasks. COVID-19 lung opacification segmentation from CT images is an arduous task with the following difficulties. First, in the emergency of the COVID-19 outbreak, it is difficult to obtain enough accurately labeled data to train deep learning models in a short time due to limited medical resources. Second, the infection areas in a CT slice vary in size, position, and texture and have no distinct boundary, which increases the difficulty of segmentation. Third, due to the complexity of medical images, the lung opacity area is quite similar to other lung tissues and structures, making it challenging to identify. Several works [16, 17, 18] have tried to address these challenges by reducing manual delineation time, using noisy labels, and implementing semi-supervised learning, and have achieved promising results. Our approach in this study is derived from the attention learning mechanism: it exploits the inherent attention ability of CNNs to generate attention maps during training and to weight the spatial-domain and channel-domain features with the learned attention vectors. We will show that the spatial- and channel-domain features activated by the network characterize the target area more accurately.
The attention mechanism stems from the study of biological vision mechanisms [19], particularly selective attention, a characteristic of human vision. The feature integration theory proposed by Treisman and Gelade [20] uses a spotlight to metaphorically describe the spatial selectivity of attention. This model posits that visual processing is divided into two stages. In the first stage, visual processing quickly and spontaneously performs low-level feature extraction, including orientation, brightness, and color, from the visual input in various dimensions in a parallel manner. In the second stage, visual processing locates objects based on the features of the previous stage, generates a map of locations, and dynamically assembles the low-level features of each dimension of the activation area into high-level features. Generally speaking, essential areas attract the attention of the visual system more strongly. Wolfe et al. [21] argue that the attention mechanism uses not only the bottom-up information of the image but also the top-down information of the high-level visual organization structure, and that the high-level information can effectively filter out a large amount of irrelevant information. In our attention-inspired model, we first use a traditional CNN to extract local image features spontaneously. After that, we generate an attention map based on the low-level features of the previous stage to activate the spatial response of the feature, then calculate an attention vector based on the feature interdependence of the activation area to activate the channel response of the feature, and finally reorganize the high-level features. The attention map and attention vector contain top-down information fed back to the current local features in the form of gating. This coarse-to-fine attention process is thus a hybrid-domain attention mode that includes spatial-wise and channel-wise attention modules.
The attention learning method proposed above is specially designed to tackle the challenges of COVID-19 lung opacification segmentation. The lung CT slices of patients with pneumonia contain tissue structures easily confused with inflammation areas, such as the trachea, blood vessels, and emphysematous background, and the existing CNN-based methods perform segmentation mainly based on local information, which inevitably leads to overfitting to irrelevant information. In contrast, our spatial-wise module generates attention maps during feature extraction, suppressing irrelevant information and enhancing essential information in the spatial domain. Given the large intra-class differences between opacity regions, our channel-wise module selects and reorganizes the spatial-domain features. On the whole, we use a CNN with strong generalization ability to capture all the salient areas of lung CT images and then gradually enhance relevant and suppress irrelevant spatial- and channel-domain features. This resembles the process by which radiologists search for a target area, i.e., first finding the approximate search range through the relevant tissue structures and then checking one-by-one whether each salient area belongs to the target [22]. Our method is thus more in line with the diagnostic experience of radiologists. Our experimental results will show that, compared with traditional CNNs, our spatial- and channel-wise coarse-to-fine attention network (SCOAT-Net) recognizes the opacity area better when segmenting COVID-19 lung opacification. The contributions of this paper are threefold: • A novel coarse-to-fine attention network is proposed for segmentation of COVID-19 lung opacification from CT images, which utilizes embedded spatial-wise and channel-wise attention modules and achieves state-of-the-art performance (i.e., an average Dice similarity coefficient, or DSC, of 0.8899).
• We use the attention mechanism so that the neural network can generate attention maps without external region-of-interest (ROI) supervision. We use these attention maps to understand the training process of the network by observing the areas that the network focuses on at different stages, increasing the interpretability of the neural network. • The generalization ability and compatibility of the proposed SCOAT-Net are validated on two external datasets, showing that the proposed model has a degree of data migration capability and can quantitatively assess pulmonary involvement, a difficult task for radiologists. Deep neural networks (DNNs) have shown excellent performance in many automatic image segmentation tasks. Zhao et al. [23] proposed the pyramid scene parsing network (PSPNet), which introduces global pyramid pooling into the fully convolutional network (FCN) so that global and local information jointly act on the prediction target. DeeplabV3 [24] proposed the atrous spatial pyramid pooling (ASPP) module to make the segmentation model perform better on multi-scale objects. U-Net [10], introduced by Ronneberger et al. and based on the encoder-decoder structure, is widely used in medical image segmentation due to its excellent performance. It uses skip connections to connect the high-level low-resolution semantic feature maps and the low-level high-resolution structural feature maps of the encoder and decoder so that the network output has better spatial resolution. Oktay et al. [25] proposed the attention gate model and applied it to the U-Net model, which improved the sensitivity and prediction accuracy of the model without increasing the computational cost.
UNet++ [26] uses a series of nested and dense skip paths to connect the encoder and decoder sub-networks based on the U-Net framework, which further narrows the semantic gap between the encoder and decoder and achieves better performance in liver segmentation tasks. The segmentation of lung opacification based on CT images is an integral part of COVID-19 image processing, and there are many related works on this topic. Using the lungs and pulmonary opacities manually segmented by experts as standards, Oulefki et al. [12] developed a CT image prediction model based on CNNs to monitor COVID-19 disease development, and it showed excellent potential for the quantification of lung involvement. Some studies [27, 28, 29] trained segmentation or detection models with CT scans and segmentation templates of abnormal lung cases, which can extract the areas related to lung diseases, making the subsequent learning of pneumonia type classification easier. Deep learning models rely on large amounts of training data, and it is impractical to collect a large amount of professionally labeled data in a short time. Some studies [30, 31] use contrastive learning as an entry point, applying self-supervised contrastive learning to obtain transformation-invariant representation features from limited samples and effectively diagnose COVID-19. Several research groups [16, 17, 18] attempted to solve this challenge by reducing manual delineation time, using noisy labels, and implementing semi-supervised learning. VB-Net [16] performs very well on the segmentation of COVID-19 infection regions: the mean percentage of infection (POI) estimation error between automatic and manual segmentation on the validation set is only 0.3%. In particular, it adopts a human-in-the-loop strategy to significantly reduce the time of manual delineation. Wang et al.
[17] proposed a noise-robust Dice loss and applied it in COPLE-Net, which surpasses other anti-noise training methods in learning COVID-19 pneumonia lesion segmentation from noisy labels. Inf-Net [18] uses a parallel partial decoder to aggregate high-level features and generate a global map to enhance the boundary area. It also uses a semi-supervised segmentation framework to achieve excellent performance in lung infection area segmentation. More and more attempts have focused on the combination of deep learning and visual attention mechanisms, which can be roughly divided into two categories: exogenous-attention mechanisms and endogenous-attention mechanisms. An exogenous-attention mechanism lets the network learn to generate an attention map during training by imposing external ROI supervision so that the region activated by the network can accurately diagnose disease changes. Ouyang et al. [32] applied this mechanism to the diagnosis of COVID-19 and glaucoma, respectively, and the sensitivity was greatly improved. In contrast, an endogenous-attention mechanism does not rely on exogenous ROI supervision but rather exploits the intrinsic attention ability of the CNN. Endogenous attention consists of two parts: spatial-wise attention [25, 33, 34], which redistributes the network's attention at the pixel level of the feature map to achieve more precise localization, and channel-wise attention [35], which redistributes attention at the channel level to guide the network in selecting useful features. In Lei et al. [36] and Fu et al. [37], spatial- and channel-dimension attention were combined in a parallel mode to jointly guide network training, capturing rich contextual dependencies to address the segmentation task. In contrast, our serial, coarse-to-fine combination is consistent with the feature integration theory [20], which suggests that the attributes of a certain object are processed in sequence, i.e., in the pre-attentive and the focused attention stages.
For the segmentation of COVID-19 lung opacification, the spatial pre-attention in our SCOAT-Net significantly reduces the irrelevant area features and hence decreases the difficulty of optimizing the channel attention for local feature extraction. Another recent model, proposed in Mahmud et al. [41], not only includes spatial- and channel-level attention but also introduces pixel-level attention to supplement the low-level features, which adds more model parameters. In contrast, to integrate context features of various levels, our SCOAT-Net introduces skip connections that fuse the features of the lower level with those of the current level without introducing additional parameters. These integrated features are then used to effectively calculate the interdependence between the channel-wise attention modules and adaptively recalibrate the response. UNet++ is an excellent image segmentation network that has achieved high-grade performance in medical imaging tasks [26]. It contains dense connections that closely relate the contextual information of different scales. However, although this complicated connection method improves the generalization ability of the model, it also causes information redundancy and weak convergence of the loss function on small datasets. Medical images are highly complex and noisy, which causes model overfitting when the amount of training data is insufficient. The SCOAT-Net proposed in this work redesigns the connection structure of UNet++ and introduces a more biologically plausible attention learning mechanism. It extracts the spatial and channel features from coarse to fine with only a few added parameters and obtains more accurate segmentation results. The proposed channel-wise attention module calculates the interdependence between the channels and adaptively recalibrates the information response of each channel.
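As a rough illustration of how such spatial- and channel-wise attention gates a feature map, the following numpy sketch uses hypothetical shapes and randomly initialized weights; batch normalization, residual blocks, and the multi-resolution skip-connection fusion of the actual SCOAT-Net are omitted:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(x, w):
    """Spatial-wise attention: a 1x1 convolution (here a per-pixel linear map
    over channels, weights w of shape (C,)) collapses the feature map x of
    shape (H, W, C) into a single-channel attention map in (0, 1), which then
    reweights every pixel of x."""
    att = sigmoid(np.tensordot(x, w, axes=([2], [0])))  # shape (H, W)
    return x * att[..., None], att

def channel_attention(x, w1, w2):
    """Channel-wise attention in the SENet style: global average pooling,
    two fully connected layers with reduction ratio r, and a sigmoid gate
    that rescales each feature channel."""
    g = x.mean(axis=(0, 1))                  # (C,) channel statistics
    v = sigmoid(w2 @ np.maximum(w1 @ g, 0))  # attention vector in (0, 1)
    return x * v[None, None, :], v

# Toy example: H = W = 8, C = 16, reduction ratio r = 4 (illustrative values).
rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 16, 4
x = rng.normal(size=(H, W, C))
xs, att_map = spatial_attention(x, rng.normal(size=C))
xc, att_vec = channel_attention(xs, rng.normal(size=(C // r, C)),
                                rng.normal(size=(C, C // r)))
```

The serial ordering (spatial first, then channel) mirrors the coarse-to-fine design: the channel statistics are computed on the spatially gated features rather than on the raw feature map.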
Additionally, in each convolution module, we use a residual block to train our network. The proposed spatial-wise attention module emphasizes attention at the pixel level, making the network pay attention to the key information and ignore irrelevant information. Normally, in a CNN, the features extracted by the network change from simple low-level features to complex high-level features as the convolutional layers deepen. When calculating the attention map, we not only use the information of single-layer features but also combine the upper and lower features of different resolutions. The final output of this module is expressed as x_s ∈ R^(H_u×W_u×C_u) and is given by Eqs. (1) and (2), where the function H_R(·) stands for a convolution of size 1×1 followed by batch normalization, and x_M ∈ R^(H_u×W_u×1) is the attention map generated by this module, which uses the saliency information of the spatial positions to weight the input features and complete the redistribution of feature attention at the pixel level. The attention map generated by the sigmoid function is normalized between 0 and 1, and the output response is weakened in irrelevant regions. The input x_c ∈ R^(H_u×W_u×C_m) of the proposed channel-wise attention module is obtained by concatenating the spatial-wise attention module's output x_s with the feature map of the same layer, as in (3), where [·] represents concatenation. x_g ∈ R^(1×1×C_m) is the channel-wise statistical information calculated from x_c through a global average pooling layer, as in (4), which reflects the response degree of each feature map. We want the module to adaptively learn the feature channels that require more attention, and we also want it to learn the interdependence between channels.
Inspired by SENet [35], we pass x_g through two fully connected (FC) layers with parameters ω_1 and ω_2 to obtain the attention vector x_V ∈ R^(1×1×C_m) of the channel, as in (5), where ρ(·) refers to the ReLU activation function and σ(·) refers to the sigmoid activation function. A structure containing two fully connected layers is adopted here, which reduces the complexity and improves the generalization ability of the model. The fully connected layer with parameter ω_1 ∈ R^((C_m/r)×C_m) reduces the dimension of the feature channels with reduction ratio r (r = 16 in this experiment). In contrast, the fully connected layer with parameter ω_2 ∈ R^(C_m×(C_m/r)) recombines the feature channels to restore the dimension to C_m. The attention vector x_V finally weights the input feature map x_c, and after the convolution operation completes the feature extraction, the result is added to itself to obtain the final output x_{i,j+1} ∈ R^(H_u×W_u×C_u), as in (6), where H²_R(·) represents the two-layer convolution for feature extraction. By combining binary cross-entropy (BCE) loss and Dice coefficient loss [42], we use a hybrid loss function for segmentation, as in (7), where Y = {Y_1, Y_2, ..., Y_N} denotes the ground truths, Ŷ denotes the predicted probabilities, N indicates the batch size, and σ(·) corresponds to the sigmoid activation function. This hybrid loss includes pixel-level and batch-level information, which helps the network parameters to be better optimized. To evaluate the performance of lung opacification segmentation, we measure the Dice similarity coefficient (DSC), sensitivity (SEN), positive predicted value (PPV), volume accuracy (VA), regional-level precision (RLP), regional-level recall (RLR), and 95% HD between the segmentation results and the ground truth in 3D space, defined in (8), where V_a and V_b refer to the volumes segmented by the model and the ground truth, respectively.
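A hybrid BCE-Dice loss of this kind can be sketched in numpy as follows; the α-weighted combination and the smoothing constant are assumptions for illustration rather than the paper's exact Eq. (7):

```python
import numpy as np

def bce_dice_loss(y_true, logits, alpha=0.5, eps=1e-7):
    """Hybrid segmentation loss: alpha * BCE + (1 - alpha) * Dice loss.
    y_true holds binary ground-truth masks; logits are raw network outputs."""
    p = 1.0 / (1.0 + np.exp(-logits))        # sigmoid probabilities
    p = np.clip(p, eps, 1.0 - eps)           # avoid log(0)
    # Pixel-level term: mean binary cross-entropy over all voxels.
    bce = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    # Batch-level term: soft Dice coefficient over the whole batch.
    inter = np.sum(y_true * p)
    dice = (2 * inter + eps) / (np.sum(y_true) + np.sum(p) + eps)
    return alpha * bce + (1 - alpha) * (1 - dice)
```

A confident, correct prediction drives both terms toward zero, while the Dice term keeps the gradient informative even when the foreground occupies only a tiny fraction of the volume.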
In addition to the above voxel-level evaluation indicators, we also design the regional-level evaluation indicators RLP and RLR, as in (9), where N_a denotes the total number of connected regions in the model's prediction, N_p denotes the number of predicted connected regions that hit real opacity regions, N_b denotes the total number of real opacity regions, and N_t denotes the number of real opacity regions predicted by the model. If the center of a connected area predicted by the model lies in a real opacity region, then we accept the predicted connected area as correct. The center of a connected area is calculated as in (10), where U represents the point set of a single connected area of the prediction result and V represents the point set of its edge. We use the 95% HD (Hausdorff distance) to measure the boundary accuracy of the segmentation results. The HD is calculated as in (11) [17], where S_p and S_g represent the surface point sets of the segmentation result and the ground-truth VOIs, respectively. For the 95% HD, the 95th percentile in (11) is taken. This study and its procedures were approved by the local ethics committees. All methods were performed in accordance with the relevant guidelines and regulations. The entire experiment followed the Helsinki Declaration. Informed consent was not required for this retrospective study (i.e., for patients discharged or who died). The aim of this experiment was to evaluate the performance of our proposed SCOAT-Net with different loss functions for lung opacification segmentation. We used six different loss functions, namely MSE, IOU [44], BCE, Dice [42], Focal [45], and BCE-Dice, to train the proposed network with the same strategy and hyper-parameters (data: https://www.kaggle.com/c/covid-segmentation/data; code: https://github.com/Phanzsx/SCOAT-Net), and the quantitative comparison is listed in Table 1. The Dice loss makes the SEN better, but it also causes the PPV and RLP performance to decline because it yields more false-positive predictions.
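The voxel-level overlap metrics and the 95% HD can be sketched as follows; this is a minimal numpy version using the common definitions (the paper's VA, RLP, and RLR formulas, and its exact HD convention, are not reproduced):

```python
import numpy as np

def voxel_metrics(pred, gt):
    """DSC, SEN, and PPV between two non-empty binary segmentation volumes."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)   # true-positive voxels
    fp = np.sum(pred & ~gt)  # false-positive voxels
    fn = np.sum(~pred & gt)  # false-negative voxels
    dsc = 2 * tp / (2 * tp + fp + fn)
    return dsc, tp / (tp + fn), tp / (tp + fp)

def hd95(sp, sg):
    """95% Hausdorff distance between two surface point sets of shape (N, 3):
    the 95th percentile of the pooled directed point-to-set distances,
    computed brute force (suitable only for small point sets)."""
    d = np.linalg.norm(sp[:, None, :] - sg[None, :, :], axis=-1)
    return np.percentile(np.concatenate([d.min(axis=1), d.min(axis=0)]), 95)
```

For identical prediction and ground truth, DSC, SEN, and PPV all equal 1 and the 95% HD is 0; the metrics diverge as over- or under-segmentation grows.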
The hybrid loss function combining BCE and Dice with parameter α (empirically set to 0.5 in the experiments) produced the best results: except for SEN and RLP, which were slightly lower than with Dice alone, all other indicators were the best. The box plot in Fig. 3 demonstrates the performance of our proposed network with the BCE-Dice loss function. Over the 19 cases, the proposed model exhibited excellent performance: the medians of DSC, SEN, PPV and RLP were all higher than 0.9, and the median of VA was higher than 0.95, even though one or two cases did not achieve excellent results. We compared our proposed SCOAT-Net with other popular segmentation algorithms for lung opacification segmentation. The BCE-Dice loss function was used to train these networks, and their quantitative evaluation was calculated by cross-validation, as shown in Table 2. PSPNet had excellent PPV and RLP but the lowest SEN; although most of the predicted regions were correct, the voxel predictions could not cover all the opacity regions. ESPNetv2 had good PPV and RLR, but its RLP was extremely low, which shows that light-weight models cannot achieve excellent region-level segmentation results on complex medical image segmentation tasks. DeepLabV3+ achieved an excellent result in Table 2, which perhaps stems from the good adaptability of its atrous spatial pyramid pooling module designed for semantic segmentation. U-Net, which performs excellently in many medical image segmentation tasks, achieved only average results in this work. Compared with U-Net, the more complex and more densely connected UNet++ had slightly improved RLR performance but dropped significantly in the other indexes. This indicates that its dense connections improved the model's generality but did not achieve excellent results on the relatively small dataset used in this work. Our proposed SCOAT-Net achieved the best performance among the compared networks.
In particular, our model identified and segmented the pulmonary opacities more effectively by using the spatial- and channel-wise attention modules. Fig. 4 shows a visual comparison of the results of each network. In cases #1 to #4, SCOAT-Net had the best segmentation performance, not only effectively hitting the target opacity region but also producing the smallest difference between the segmented area and the ground truth. However, SCOAT-Net also returned some unsatisfactory segmentation results, as shown in case #5 of Fig. 4: all the models, including ours, failed to predict this tiny opacity region. In this experiment, we verified the performance of the attention module on the lung opacification segmentation task. Our SCOAT-Net uses a total of six spatial-wise attention modules, as shown by the green circles in Fig. 1. These modules adaptively generate attention maps, and the attention map x_M further refines and suppresses the attention area of the network. As the training phase progressed, the attention regions of SCOAT-Net gradually became smaller. Additionally, for the opacity region that UNet++ did not recognize (the region indicated by the yellow arrow), SCOAT-Net adequately identified the target area, and on all the attention maps much attention focused on the target area. The attention module we designed not only effectively weights the feature maps but also helps us understand the training process of the neural network, which improves its interpretability. Furthermore, we also introduced attention modules from other studies [25, 34] into UNet and UNet++ and compared the results with that of our SCOAT (spatial- and channel-wise coarse-to-fine attention) method, as shown in Table 3. A1 imitates the connection structure of Attention UNet [25], and A2 uses the pyramid attention module of [34].
Compared with the U-Net, we found that the models with the A1 or A2 attention modules improved the segmentation results, while our SCOAT method performed best (Table 3). First, we used an external dataset from another center, i.e., the WUHAN dataset introduced above, to test the robustness and compatibility of the proposed SCOAT-Net. The scans in this dataset differ from the scans used for training. Fig. 6 presents the lung CT scans of two cases under treatment. COVID-19 is clinically divided into several types according to severity. Furthermore, we evaluated our model on the public KAGGLE dataset mentioned earlier, which includes 9 axial volumetric CT scans whose infected areas were segmented by a radiologist. In this experiment, we directly applied the model trained on our own dataset, and the results on the 9 cases are shown in Table 4. In summary, our proposed SCOAT-Net was validated on two different external datasets, proving that it can provide an objective assessment of pulmonary involvement and therapy response in COVID-19. Table 5 indicates that, compared with the others, the proposed method achieves a certain level of performance on fine-grained opacity area segmentation, especially a higher IOU for consolidation, but it is not ideal for GGO segmentation. On the one hand, fine-grained segmentation of lung opacification is still challenging due to the slight difference in imaging manifestation between GGO and consolidation. On the other hand, our SCOAT-Net is not specifically designed for this task but for the segmentation of abnormal areas of the lungs. In the future, to obtain better fine-grained segmentation performance, we will attempt to design a customized attention module that uses differences in the shape and density of various types of lung opacification to increase the distance between classes in the feature domain. CNNs have been widely used in various medical image segmentation tasks due to their excellent performance [10, 26, 25, 44].
Some networks have been improved from the perspective of connection structure (e.g., U-Net [10]), and others from the perspective of combining multi-scale features (e.g., PSPNet [23]). These improvements have enhanced the expressive ability of the models to a certain extent. However, due to the particularity of medical image-related tasks, only a small amount of applicable data can be obtained, making it impossible for conventional DNNs to converge during training, which is a common problem. In addition to augmenting the limited data, our network can evaluate images at different time-points during treatment. It provides a quantitative assessment of pulmonary involvement, which is a difficult task for radiologists but is essential to the clinical follow-up of patient disease development and treatment response. Despite the superiority mentioned above, our network still has shortcomings, e.g., failure to predict certain tiny opacity regions, as shown in case #5 of Fig. 4. This suggests that we can continue to enhance our network's recognition of targets of different scales by using multi-scale feature fusion or cascading convolutions with different receptive field sizes. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
[1] Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
[2] Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72 314 cases from the Chinese Center for Disease Control and Prevention
[3] Weekly operational update on coronavirus disease 2019
[4] Coronavirus disease 2019 (COVID-19): a perspective from China
[5] A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster
[6] Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases
[7] CT imaging features of 2019 novel coronavirus (2019-nCoV)
[8] Dermatologist-level classification of skin cancer with deep neural networks
[9] Deep-COVID: predicting COVID-19 from chest X-ray images using deep transfer learning
[10] U-Net: convolutional networks for biomedical image segmentation
[11] Automatic ischemic stroke lesion segmentation from computed tomography perfusion images by image synthesis and attention-based deep neural networks
[12] Automatic COVID-19 lung infected region segmentation and measurement using CT-scans images
[13] Automated quantitative tumour response assessment of MRI in neuro-oncology with artificial neural networks: a multicentre, retrospective study
[14] Deep learning
[15] Knowledge-based collaborative deep learning for benign-malignant lung nodule classification on chest CT
[16] Abnormal lung quantification in chest CT images of COVID-19 patients with deep learning and its application to severity prediction
[17] A noise-robust framework for automatic segmentation of COVID-19 pneumonia lesions from CT images
[18] Inf-Net: automatic COVID-19 lung infection segmentation from CT images
[19] A saliency-based search mechanism for overt and covert shifts of visual attention
[20] A feature-integration theory of attention
[21] Visual search in scenes involves selective and nonselective pathways
[22] Eye movements in medical image perception: a selective review of past, present and future
[23] Pyramid scene parsing network
[24] Encoder-decoder with atrous separable convolution for semantic image segmentation
[25] Attention U-Net: learning where to look for the pancreas
[26] UNet++: a nested U-Net architecture for medical image segmentation
[27] Dual-branch combination network (DCN): towards accurate diagnosis and lesion segmentation of COVID-19 using CT images
[28] Deep learning-based triage and analysis of lesion burden for COVID-19: a retrospective study with external validation
[29] Synergistic learning of lung lobe segmentation and hierarchical multi-instance classification for automated severity assessment of COVID-19 in CT images
[30] Momentum contrastive learning for few-shot COVID-19 diagnosis from chest CT images
[31] Multi-task contrastive learning for automatic CT and X-ray diagnosis of COVID-19
[32] Dual-sampling attention network for diagnosis of COVID-19 from community acquired pneumonia
[33] Attention convolutional neural network for accurate segmentation and quantification of lesions in ischemic stroke disease
[34] Salient object detection with pyramid attention and salient edges
[35] Squeeze-and-excitation networks
[36] Self-co-attention neural network for anatomy segmentation in whole breast ultrasound
[37] Dual attention network for scene segmentation
[38] Attention residual learning for skin lesion classification
[39] Automatic COVID-19 CT segmentation using U-Net integrated spatial and channel attention mechanism
[40] D2A U-Net: automatic segmentation of COVID-19 lesions from CT slices with dilated convolution and dual attention mechanism
[41] CovTANet: a hybrid tri-level attention based network for lesion segmentation, diagnosis, and severity prediction of COVID-19 chest CT scans
[42] V-Net: fully convolutional neural networks for volumetric medical image segmentation
[43] A closer look at deep learning heuristics: learning rate restarts, warmup and distillation
[44] UNet 3+: a full-scale connected UNet for medical image segmentation
[45] Focal loss for dense object detection
[46] ESPNetv2: a light-weight, power efficient, and general purpose convolutional neural network
[47] DenseASPP for semantic segmentation in street scenes
[48] CE-Net: context encoder network for 2D medical image segmentation
[49] Guideline for medical imaging in auxiliary diagnosis of coronavirus disease 2019
[50] Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography
[51] KiSeg: a three-stage segmentation framework for multi-level acceleration of chest CT scans from COVID-19 patients
[52] ENet: a deep neural network architecture for real-time semantic segmentation
[53] Data augmentation using learned transformations for one-shot medical image segmentation
We would also like to thank LetPub for its linguistic assistance during the preparation of this manuscript.