key: cord-0719775-zzfn3sui authors: nan title: Drr4covid: Learning Automated COVID-19 Infection Segmentation From Digitally Reconstructed Radiographs date: 2020-11-16 journal: IEEE Access DOI: 10.1109/access.2020.3038279 sha: 8aeb91350c9d6fcacccf612f7cf04a8cf4c5a2ce doc_id: 719775 cord_uid: zzfn3sui

Automated infection measurement and COVID-19 diagnosis based on Chest X-ray (CXR) imaging is important for faster examination, where infection segmentation is an essential step for assessment and quantification. However, due to the heterogeneity of X-ray imaging and the difficulty of annotating infected regions precisely, learning automated infection segmentation on CXRs remains a challenging task. We propose a novel approach, called DRR4Covid, to learn COVID-19 infection segmentation on CXRs from digitally reconstructed radiographs (DRRs). DRR4Covid consists of an infection-aware DRR generator, a segmentation network, and a domain adaptation module. Given a labeled Computed Tomography scan, the infection-aware DRR generator can produce infection-aware DRRs with pixel-level annotations of infected regions for training the segmentation network. The domain adaptation module is designed to enable the segmentation network trained on DRRs to generalize to CXRs. The statistical analyses made on the experiment results have indicated that our infection-aware DRRs are significantly better than standard DRRs in learning COVID-19 infection segmentation (p < 0.05) and that the domain adaptation module can improve the infection segmentation performance on CXRs significantly (p < 0.05). Without using any annotations of CXRs, our network has achieved a classification score of (Accuracy: 0.949, AUC: 0.987, F1-score: 0.947) and a segmentation score of (Accuracy: 0.956, AUC: 0.980, F1-score: 0.955) on a test set with 558 normal cases and 558 positive cases. Besides, by adjusting the strength of the radiological signs of COVID-19 infection in infection-aware DRRs, we estimate the detection limit of X-ray imaging in detecting COVID-19 infection. The estimated detection limit, measured by the percent volume of the lung that is infected by COVID-19, is 19.43% ± 16.29%, and the estimated lower bound of the infected voxel contribution rate for significant radiological signs of COVID-19 infection is 20.0%. Our code is made publicly available at https://github.com/PengyiZhang/DRR4Covid.

complex, and meanwhile, CT examinations are costly. As the number of infected patients rapidly increases, the routine use of CT places a heavy burden on the radiology department [6]. In contrast, CXR examination is much easier, faster and less costly, and provides high-resolution 2D images of the lungs that can detect a variety of lung conditions such as pneumonia, emphysema and cancer. CXRs are the typical first-line imaging modality used for patients under investigation of COVID-19 [7]. Therefore, automated infection measurement and COVID-19 diagnosis based on CXRs is important for faster examination, where infection segmentation is an essential step for assessment and quantification. Many approaches have been proposed for automated COVID-19 diagnosis based on CXRs, and have claimed notable accuracy in detecting COVID-19 infection. However, due to the lack of sufficient CXRs with pixel-level annotations of infected regions, the majority of these approaches are designed using classification models rather than segmentation models.
The projective nature of X-ray imaging causes large overlapping of anatomies, fuzzy object boundaries and complex texture patterns, making it extremely difficult to delineate infected regions precisely on CXRs even for experienced clinicians [8]. As an alternative, some researchers have leveraged the interpretability of classification models (e.g., saliency maps or attention maps) to locate infected regions roughly. However, such methods are unable to produce accurate COVID-19 infection segmentation for further assessment and quantification. Currently, to the best of our knowledge, no effective approaches have been developed for automated COVID-19 infection segmentation on CXRs, as reviewed by Shen et al. [7]. A digitally reconstructed radiograph (DRR) [9]-[12] is a synthetic X-ray image generated by simulating the passage of X-rays through a 3D CT volume in a specific pose (position and orientation) within a virtual imaging system. CXR findings of COVID-19 infection reflect those described for CT [13], such as bilateral, peripheral consolidation and/or ground-glass opacities (GGOs) [7], [14], [15]. Besides, delineating infected regions in 3D CT scans is much easier than in 2D CXRs, because CT provides accurate 3D images of the lungs rather than heterogeneous 2D projections. Thus, we propose to learn automated COVID-19 infection segmentation on CXRs from labeled DRRs by leveraging publicly available CT scans with voxel-level annotations of infected regions and the correlation between DRRs and CXRs. To this end, we propose a novel approach, called DRR4Covid, which can learn automated COVID-19 infection segmentation on CXRs from labeled DRRs. We design DRR4Covid with a modular framework, which consists of an infection-aware DRR generator, a deep segmentation network, and a domain adaptation module. Given a CT volume with voxel-level infection annotations, our infection-aware DRR generator can produce DRRs with adjustable radiological signs of COVID-19 infection, and generate pixel-level annotations of infected regions that match the DRRs accurately. Although such synthetic DRRs are photo-realistic, there is still a gap between synthetic DRRs and real CXRs, which may lead to poor segmentation performance on real CXRs. Therefore, we introduce a domain adaptation module to train networks on labeled DRRs and unlabeled CXRs together. In this article, we provide a simple but effective implementation of DRR4Covid by using a domain adaptation module based on Maximum Mean Discrepancy (MMD), and an FCN-based [16] network with a classification header and a segmentation header. Extensive experiment results have confirmed the efficacy of our method; specifically, without using any annotations of CXRs, our network has achieved a classification score of (Accuracy: 0.949, AUC: 0.987, F1-score: 0.947) and a segmentation score of (Accuracy: 0.956, AUC: 0.980, F1-score: 0.955) on a test set with 558 normal cases and 558 positive cases. Besides, by adjusting the strength of the radiological signs of COVID-19 infection in synthetic DRRs, we estimate the detection limit of X-ray imaging in detecting COVID-19 infection. The estimated detection limit, measured by the percent volume of the lung that is infected by COVID-19, is 19.43% ± 16.29%, and the estimated lower bound of the contribution rate of infected voxels for significant radiological signs of COVID-19 infection is 20.0%.
The novelties and contributions of our study come mainly from four major aspects: 1) We propose a novel approach, i.e., DRR4Covid, to learn automated COVID-19 infection segmentation on CXRs. To the best of our knowledge, this is the first attempt to learn automated COVID-19 infection segmentation on CXRs by using labeled DRRs generated from chest CT scans. Owing to its modular framework, DRR4Covid can be implemented flexibly with off-the-shelf segmentation networks and domain adaptation algorithms. Moreover, DRR4Covid is a unified approach that can be applied to the segmentation of other lesions (e.g., lung nodules and tumors) on X-ray images; 2) We design an infection-aware DRR generator to synthesize infection-aware DRRs with pixel-level annotations of infected regions for training segmentation networks. The statistical analyses made on the experiment results have confirmed that our infection-aware DRRs are significantly better than standard DRRs in learning COVID-19 infection segmentation (p < 0.05); 3) We provide a simple but effective implementation of DRR4Covid by using a domain adaptation module based on Maximum Mean Discrepancy (MMD), and an FCN-based network with a classification header and a segmentation header. The statistical analyses made on the experiment results have confirmed that the domain adaptation module can improve the infection segmentation performance on CXRs significantly (p < 0.05); 4) We estimate the detection limit of X-ray imaging in detecting COVID-19 infection for the first time, which is of great significance for the severity assessment of COVID-19 infection based on X-ray imaging. In this section, we review the related work from three aspects: DRRs, domain adaptation, and CXR-based screening of COVID-19 in light of infection segmentation. A digitally reconstructed radiograph (DRR) [9]-[12] is a synthetic X-ray image generated by simulating the passage of X-rays through a 3D CT volume in a specific pose (position and orientation) within a virtual imaging system. DRRs are generally used as reference images by intensity-based 2D-3D image registration algorithms to verify the correct setup position of a patient for many radiotherapy treatments [17]-[19]. Each DRR pixel value is obtained by calculating the radiological path length (RPL) [20], i.e., the summation of the length travelled by the ray in each voxel, multiplied by the relative CT intensity of the voxel measured in Hounsfield units (HUs). Thus, with a complexity of O(n³), the synthesis of DRRs is computationally intensive by nature [21]. Meanwhile, in the iterative optimization of 2D-3D image registration algorithms, the synthesis of DRRs is usually performed many times to calculate the similarity measure [19], which greatly limits the running speed of 2D-3D image registration algorithms [12]. Therefore, the majority of previous studies have focused on this problem and have proposed plenty of improved approaches to accelerate the synthesis of DRRs [11], [12], [19]-[23]. In contrast, we are more concerned with the consistency between DRRs and the infection annotation masks. Thus, we directly design our infection-aware DRR generator based on SiddonGpuPy [24], which combines the serial algorithm proposed by Jacob [11] to improve the original Siddon's algorithm [9] with the parallel implementation proposed by Greef et al. [20]. The most closely related work is TD-GAN [8] and DeepDRR [25], [26].
TD-GAN aims to learn automatic parsing of anatomical objects in X-ray images from labeled 3D CT scans by using synthetic labeled DRRs. The pixel-level annotations of anatomical objects are obtained by projecting 3D CT labels along the same trajectories used in the synthesis of DRRs. TD-GAN adopts the CycleGAN architecture to perform unpaired image-to-image translation and unsupervised domain adaptation to enable segmentation models trained on DRRs to generalize to real X-ray images. A similar strategy is also used by X2CT-GAN [27] to reduce the gap between synthetic DRRs and real X-ray images. Unlike TD-GAN and X2CT-GAN, DeepDRR attempts to produce more realistic radiographs and fluoroscopy from 3D CT scans, so that machine learning models trained directly on DeepDRRs generalize to clinical data without the need for domain adaptation. DeepDRR has been used for anatomical landmark detection in pelvic X-rays and to simulate X-rays of the femur during insertion of dexterous manipulators in orthopedic surgery. Both TD-GAN and DeepDRR focus more on anatomical structures than on lesion regions. Given a CT scan with COVID-19 infection, the existing DRR generators may produce a DRR with no findings due to the heterogeneity of DRRs. It is difficult to maintain the consistency between standard DRRs and annotation masks of lesion regions by using existing DRR generators. Therefore, we design a new infection-aware DRR generator to solve this problem through a category-weighted projection and an RPL-threshold method. Domain adaptation aims at rectifying the distribution discrepancy between the training samples (source domain) and the test samples (target domain) [28] and tuning the model toward better generalization onto the target domain in a supervised or unsupervised manner. Numerous domain adaptation methods have recently been proposed based on deep models, as deep networks can learn more transferable features for domain adaptation and achieve better performance [29]-[31]. The main insight behind these approaches is to extract domain-invariant representations by embedding domain adaptation modules in the deep learning pipeline [28], [32]-[38]. Existing deep domain adaptation methods align the distributions of the source and target domains mainly from three perspectives. The first stream is image alignment, where image-to-image translation models are typically used to reduce the gap between source-domain images and target-domain images [8]. The second stream is feature alignment [32]-[37], which is the mainstream approach and aims to learn domain-invariant deep features. The last stream is output alignment, which is often used to learn semantic segmentation of urban scenes from synthetic data [28], [38]. Moreover, there are two main approaches to feature alignment: the adversarial approach [34], [35], [39]-[41] and the non-adversarial approach [31], [33], [36], [42]-[44]. The adversarial approach motivates deep models to extract domain-invariant features through adversarial training: task-specific deep models are trained to minimize the task-specific loss and the adversarial loss simultaneously, thereby fooling a domain discriminator so that deep features from the source domain are classified as coming from the target domain.
The non-adversarial approach is the statistic-moment-matching approach, involving maximum mean discrepancy (MMD) [33], [42], [43], central moment discrepancy (CMD) [44] and second-order statistics matching [36]. The statistic-moment-matching approach encourages deep models to extract domain-invariant deep features by minimizing the distance between the statistic moments of deep features from the source domain and from the target domain. MMD [45] is the most representative method, and has been widely used to measure the discrepancy between the source-domain and target-domain distributions [31]. Compared with the adversarial approaches, MMD-based methods are simple, stable and easy to implement, and thus help verify the efficacy of DRR4Covid quickly. In our implementation of DRR4Covid, we directly use an off-the-shelf MMD-based domain adaptation approach, i.e., the LMMD proposed by Zhu et al. [31], to enable deep models trained on DRRs to generalize to real CXRs. Segmentation is an essential step in automated infection measurement and COVID-19 diagnosis, as it provides the delineation of the regions of interest (ROIs), e.g., infected regions, in CXRs for further assessment and quantification. Many approaches have been proposed for automated COVID-19 diagnosis based on CXRs. However, for the aforementioned reasons, the majority of these approaches are based on classification models rather than segmentation models, as reviewed by Shen et al. in [7]. Some researchers have leveraged the interpretability of deep classification models to highlight the infected regions rather than accurately segmenting them. Specifically, Oh et al. [6] introduce a probabilistic Grad-CAM saliency map to indicate the multifocal lesions within CXRs in their local patch-based deep classification models for COVID-19 diagnosis. This method is derived from a well-known explanation technique, i.e., the gradient-weighted class activation map (Grad-CAM), and can effectively locate the radiological signs of COVID-19 infection, such as multifocal ground-glass opacification and consolidations. Similarly, Karim et al. [46] use a revised Grad-CAM, i.e., Grad-CAM++, and layer-wise relevance propagation (LRP) [47] in classifying CXRs as Normal, Pneumonia and COVID-19 to indicate the class-discriminating regions in CXRs. Besides, Tabik et al. [48] adopt multiple explanation techniques, including occlusion [49], saliency [50], input X gradient [51], guided backpropagation [52], integrated gradients [53], and DeepLIFT [54], to investigate the interpretability of deep classification models and to highlight the relevant infected regions of pneumonia and COVID-19 separately. To sum up, these approaches based on explanation techniques are mainly used for the inspection of deep models' decisions, and may not be suitable for further assessment and quantification. In comparison, our DRR4Covid is able to train deep segmentation models for precise infection segmentation directly, without the need for pixel-level infection annotations of real CXRs. In this section, we describe the modular framework of the proposed DRR4Covid and analyze the critical elements in its design, followed by an introduction of our implementation of DRR4Covid. Given CT scans with voxel-level infection annotations and unlabeled CXRs, we aim to learn deep models to perform automated COVID-19 infection segmentation on CXRs. We design DRR4Covid with a modular framework as shown in Fig. 1.
DRR4Covid consists of three key components, i.e., an infection-aware DRR generator, a deep classification and/or segmentation model, and a domain adaptation module. The basic workflow of DRR4Covid involves generating DRRs with pixel-level infection annotations from CT scans, and training deep models on synthetic labeled DRRs and unlabeled CXRs by using the domain adaptation module. The DRR generator is responsible for synthesizing photo-realistic DRRs that resemble real CXRs as closely as possible and for producing pixel-level infection annotations that match the DRRs precisely by projecting 3D CT annotations along the same trajectories used in synthesizing the DRRs. High-quality labeled DRRs in the context of this article can be defined by two conditions: one is a good consistency between DRRs and infection annotation masks; the other is a good correlation between the radiological signs of COVID-19 infection in DRRs and in real CXRs. As CXRs are typically considered less sensitive than 3D CT scans [7], it may happen that a CT examination detects an abnormality whereas X-ray screening of the same patient reports no findings. DRRs also suffer from this problem, which will lead to inconsistency between DRRs and infection annotation masks. This is the first key point in designing a high-quality DRR generator. The second key point is the correlation between the radiological signs of COVID-19 infection in real CXRs and in synthetic DRRs. Note that the synthetic DRRs and infection annotation masks are later used to train deep segmentation models. Thus, a large gap between the radiological signs of COVID-19 infection in real CXRs and in synthetic DRRs will make deep models trained on DRRs fail to generalize to real CXRs even if the domain adaptation module is applied. Although synthetic DRRs are photo-realistic, there is still a gap between DRRs and real CXRs. Thus, we introduce the domain adaptation module into the framework of DRR4Covid. According to the quality of the synthetic labeled DRRs, the problem of training deep models on labeled DRRs and unlabeled CXRs for infection segmentation on CXRs by using the domain adaptation module can be divided into two categories. One is deep domain adaptation with fully supervised learning in the source domain (i.e., synthetic DRRs) and unsupervised learning in the target domain (i.e., real CXRs); the other is deep domain adaptation with weakly supervised learning in the source domain and unsupervised learning in the target domain. The condition for the first category is a good consistency between DRRs and infection annotation masks. If this condition is not well satisfied, the problem turns into the second category due to the inaccurate synthetic infection annotations. Compared with the second category, the first category of problem is well defined and has been extensively studied. In this article, we mainly focus on solving the first category of problem. Thus, we first implement a high-quality DRR generator, i.e., the infection-aware DRR generator. We design the infection-aware DRR generator to produce high-quality DRRs as defined in Section III-A. A standard DRR generator takes a CT volume or an infection annotation volume in a specific pose (position and orientation) as input and outputs a DRR or an infection mask. In contrast, our DRR generator takes both a CT volume and its infection annotation volume as input and produces a labeled DRR, as illustrated in Fig. 2.
A ray is cast from the X-ray source through the labeled CT volume to the center of each pixel of the DRR. Each pixel value of the DRR is obtained by calculating the class-weighted RPL [20], i.e., the class-weighted summation of the length travelled by this ray within each voxel, multiplied by the relative CT intensity of the voxel measured in HUs. The calculation of the d-th pixel of the DRR, p_d, is formulated as

p_d = (1 / |Ω_d|) · Σ_{(i,j,k) ∈ Ω_d} l_(i,j,k) · ρ_(i,j,k) · w_(i,j,k),    (1)

where Ω_d is the 3D index set of the voxels in the X-ray direction, |Ω_d| is the number of voxels in Ω_d, l_(i,j,k) represents the normalized length travelled by the ray within the (i,j,k)-th voxel, and ρ_(i,j,k) and w_(i,j,k) denote the CT value and the weight of the (i,j,k)-th voxel, respectively. The weight of the (i,j,k)-th voxel is defined as

w_(i,j,k) = w_{m_(i,j,k)},    (2)

where m_(i,j,k) ∈ {0, 1, 2} is the category of the (i,j,k)-th voxel, and 0, 1 and 2 represent the background, lungs and COVID-19 infection, respectively. Note that the infection-aware DRR generator will produce standard DRRs when the weights of all categories are equal. On the other hand, the d-th pixel of the annotation mask, y_d, is obtained by thresholding the per-category contribution rates,

y_d = max{ c | π^c_d ≥ T_c },    (3)

where π^c_d denotes the contribution rate of the voxels of category c in calculating p_d, and T_c represents the contribution threshold of category c. Specifically, π^c_d is defined as

π^c_d = ( Σ_{(i,j,k) ∈ Ω^c_d} l_(i,j,k) · ρ_(i,j,k) · w_(i,j,k) ) / ( Σ_{(i,j,k) ∈ Ω_d} l_(i,j,k) · ρ_(i,j,k) · w_(i,j,k) ),    (4)

where Ω^c_d denotes the 3D index set of the voxels of category c in the X-ray direction. The strength of the radiological signs of COVID-19 infection in CXRs and DRRs depends on the contribution rate of infected voxels (CRIV) due to the projective nature of X-ray imaging. A higher CRIV means that a larger number of infected voxels appear in the X-ray direction, so that the radiological signs of COVID-19 infection, e.g., GGOs, will be more significant. This property of X-ray imaging is well modeled by (1) and (4) in our infection-aware DRR generator: increasing the weight of infected voxels will increase the CRIV, and vice versa. Accordingly, our infection-aware DRR generator can produce DRRs with different strengths of radiological signs of COVID-19 infection simply by adjusting the weight of infected voxels. The synthetic pixel-level annotations of COVID-19 infection are also computed based on the CRIV. Therefore, our infection-aware DRR generator can easily maintain the consistency between synthetic DRRs and infection annotation masks by increasing the weight of infected voxels when the CRIV is too small. To sum up, our infection-aware DRR generator has the following advantages: 1) by setting the weight of infected voxels to a very small value, it is able to produce DRRs with no findings, which are essential for training deep classification models for COVID-19 diagnosis; 2) by setting the weight of infected voxels to a relatively large value, it can generate high-quality DRRs with pixel-level annotations of infected regions, which are essential for training deep segmentation models for precise COVID-19 infection segmentation; and 3) by adjusting the weight of infected voxels from small values to large values, it can synthesize a series of labeled DRRs with different strengths of the radiological signs of COVID-19 infection. Such DRRs can be used to estimate the detection limit of X-ray imaging in detecting COVID-19 infection.
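To make the projection concrete, the following NumPy sketch illustrates the class-weighted projection of (1) and the CRIV-based mask generation of (3)-(4) under simplifying assumptions: parallel rays along one volume axis and unit path lengths per voxel, rather than the exact point-source ray tracing of Siddon's algorithm used by our generator. All names and default values here are illustrative, not the released implementation.

import numpy as np

def infection_aware_projection(ct_hu, labels, weights=(1.0, 1.0, 3.0), t_infect=0.2):
    # Illustrative sketch of Eqs. (1)-(4); not the exact generator code.
    # ct_hu:  (Z, H, W) CT volume in HUs; rays approximated by columns along axis 0.
    # labels: (Z, H, W) voxel categories (0 background, 1 lung, 2 infection).
    rel = np.clip(ct_hu + 1024.0, 0.0, None)            # non-negative relative intensity (assumed shift)
    w = np.asarray(weights, dtype=np.float64)[labels]   # per-voxel class weight, Eq. (2)
    contrib = w * rel                                   # weighted intensity along each ray
    total = contrib.sum(axis=0)                         # class-weighted RPL per pixel, Eq. (1)
    drr = total / max(total.max(), 1e-8)                # normalized DRR image
    criv = ((labels == 2) * contrib).sum(axis=0) / np.maximum(total, 1e-8)  # CRIV, Eq. (4)
    mask = np.where(criv >= t_infect, 2, 0).astype(np.uint8)  # infection pixels, Eq. (3); lung/background thresholds omitted
    return drr, mask

Increasing the infection weight w_2 in this sketch raises the CRIV of the pixels whose rays traverse infected voxels, which is exactly how the generator strengthens the radiological signs of infection. We design an FCN-based network as depicted in Fig. 3. It consists of a backbone network, a classification header and a segmentation header. Compared with FCN [16], our model has an auxiliary classification header.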
The classification header is designed for two purposes. One is to enable our model to perform both the classification task and the segmentation task for automated infection measurement and COVID-19 diagnosis. The other is to facilitate the use of MMD-based methods for domain adaptation. The backbone network is responsible for extracting deep features by performing convolution and spatial pooling operations on DRRs and CXRs. The extracted deep features are then fed into the classification header and the segmentation header separately. In the classification branch, we adopt a very simple structure with a global average pooling (GAP) layer and a fully connected (FC) layer. In the segmentation branch, we use two convolutional layers followed by an up-sampling layer to generate the segmentation output with the same size as the input DRRs and CXRs. As a nonparametric distance estimate between two distributions, MMD [45] has been widely used in domain adaptation algorithms to measure the discrepancy between the source and target distributions. In our implementation, we adopt an off-the-shelf MMD-based domain adaptation approach, i.e., the LMMD loss proposed by Zhu et al. [31]. LMMD can measure the discrepancy of local distributions by taking the correlations of the relevant subdomains into consideration. By minimizing the LMMD loss during the training of deep models, the distributions of relevant subdomains within the same category in the source domain and target domain are drawn close. As the LMMD method was proposed in the context of object recognition and digit classification tasks, we apply it to the classification header directly by aligning the deep features from the GAP layer. The effect of feature alignment can be propagated to the segmentation branch implicitly through the input of the GAP layer. The experiment results have verified the efficacy of our design, which will be detailed in Section IV-D. The training of our model is performed by minimizing the classification loss l_cls, the segmentation loss l_seg, and the LMMD loss l_mmd simultaneously. The total loss is computed as

l_total = λ_cls · l_cls + λ_seg · l_seg + λ_mmd · l_mmd,    (5)

where λ_cls, λ_seg, and λ_mmd denote the weights of the classification loss, segmentation loss, and LMMD loss, respectively.
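A minimal PyTorch sketch of this two-header design and the combined loss (5) is given below. The channel widths, class counts, and the single-kernel Gaussian MMD are simplifying assumptions for illustration; the actual implementation uses the subdomain-weighted LMMD loss of [31] rather than this plain MMD stand-in.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class FCNTwoHeads(nn.Module):
    # Shared backbone with a GAP+FC classification header and a
    # two-convolution + upsampling segmentation header (cf. Fig. 3).
    def __init__(self, n_cls=2, n_seg=3):
        super().__init__()
        resnet = torchvision.models.resnet18(pretrained=True)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # (N, 512, H/32, W/32)
        self.fc = nn.Linear(512, n_cls)
        self.seg = nn.Sequential(
            nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, n_seg, 1))

    def forward(self, x):
        feat = self.backbone(x)
        pooled = F.adaptive_avg_pool2d(feat, 1).flatten(1)  # GAP features to be aligned across domains
        cls_out = self.fc(pooled)
        seg_out = F.interpolate(self.seg(feat), size=x.shape[-2:],
                                mode='bilinear', align_corners=False)
        return cls_out, seg_out, pooled

def mmd_loss(src, tgt, sigma=1.0):
    # Plain (global) squared-MMD estimate with one Gaussian kernel;
    # a stand-in for the subdomain-weighted LMMD of [31].
    k = lambda a, b: torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(src, src).mean() + k(tgt, tgt).mean() - 2 * k(src, tgt).mean()

def total_loss(cls_out, seg_out, feat_src, feat_tgt, y_cls, y_seg,
               lam=(1.0, 1.0, 0.3)):
    # Eq. (5): weighted sum of classification, segmentation and MMD losses.
    l_cls = F.cross_entropy(cls_out, y_cls)  # two-class CE here; the paper uses binary cross-entropy
    class_w = torch.tensor([0.1, 1.0, 5.0], device=seg_out.device)  # background/lung/infection
    l_seg = F.cross_entropy(seg_out, y_seg, weight=class_w)
    return lam[0] * l_cls + lam[1] * l_seg + lam[2] * mmd_loss(feat_src, feat_tgt)

We use the public COVID-19-CT-Seg dataset [55], which consists of 20 public COVID-19 CT cases with voxel-level annotations of the left lung, right lung and COVID-19 infection. The annotations, first labeled by junior annotators, were refined by two radiologists with five years of experience, and further verified and refined by a senior radiologist with more than ten years of experience in chest radiology. Among these 20 CT volumes, the voxel values of 10 volumes have been normalized to [0, 255], so their CT values measured in HUs are not accessible. We discard these 10 cases and use the other 10 CT cases for the synthesis of DRRs in our experiments. For each CT case, we obtain 40 front-view DRRs and 40 lateral-view DRRs with pixel-level annotations of infected regions by using our infection-aware DRR generator, which will be detailed in Section IV-B. Thus, we build a training set in the source domain with these 800 DRRs, as shown in Table 1. We use two public COVID-19 CXR collections [56], [57], which are constructed upon Radiopaedia [58], the COVID-19 image data collection [59], Chest X-Ray Images (Pneumonia) [60], SIRM [61], the Twitter COVID-19 CXR dataset [62], and the Hannover Medical School dataset [63]. The composition of these CXR collections is summarized in Table 1.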
We perform 4-fold cross validation (75% of patients for training and 25% for testing) on the training-validation set and perform independent testing on the test set. Note that these CXRs have no pixel-level expert annotations of infected regions; we can only use the image tags (i.e., positive or negative) to evaluate the classification and segmentation results. Therefore, we also introduce another CXR dataset, i.e., the BIMCV COVID-19+ dataset [64], where a sub-set of 10 CXRs is annotated with ROIs of the COVID-19 findings (e.g., consolidation/GGOs) by a team of eight radiologists from the Hospital Universitario de San Juan de Alicante for the first iteration. To the best of our knowledge, this is currently the only COVID-19 CXR dataset that provides pixel-level annotations of infected regions. Although this sub-set is very small, we use it to provide preliminary insights into the real infection segmentation performance of our method. DRRs with no findings are important for training deep classification and segmentation models for COVID-19 diagnosis. Our infection-aware generator is able to generate such DRRs with no findings by setting the weight of infected voxels to a relatively small value to reduce the CRIV in the ray-casting process. In our experiment, we empirically set the weights of background, lung, and COVID-19 infection as w_0 = 24.0, w_1 = 24.0, and w_2 = 1.0. Several synthetic normal DRRs are depicted in Fig. 4. It is easy to generate multiple DRRs from a single CT volume by adjusting the pose (position and orientation) of the CT volume within a virtual imaging system. In our experiment, we randomly shift each CT volume between −100 and 100, and rotate it between −45° and 45° in the 3D directions. Several DRRs generated from a single CT volume are illustrated in Fig. 5. For each CT volume, we first generate 40 normal DRRs, including 20 front-view DRRs and 20 lateral-view DRRs, by randomly adjusting its pose. Next, in the same way, we generate 40 DRRs and the corresponding pixel-level annotations of infected regions with a given (w_0, w_1, w_2) and T_2. By this means, we build a training set in the source domain with 800 DRRs, as shown in Table 1. Finally, given the 63 different combinations of (w_0, w_1, w_2) and T_2, we obtain 63 training sets in the source domain in total. This article aims at learning automated COVID-19 infection segmentation on CXRs from DRRs. To this end, we propose DRR4Covid, which consists of an infection-aware DRR generator, an FCN-based network and an MMD-based domain adaptation module. To verify the efficacy of our method, we conduct experiments from four aspects: 1) standard DRRs versus infection-aware DRRs; 2) using domain adaptation versus not using domain adaptation; 3) estimating the detection limit of X-ray imaging in detecting COVID-19 infection by searching for the best parameters (w_0, w_1, w_2) and T_2; and 4) evaluation of the segmentation performance on 10 CXRs. Accordingly, in each fold, we first train the FCN-based network on each of the 63 source-domain training sets without using the domain adaptation module. Next, we train the same network on each of the 63 source-domain training sets together with the target-domain training set by using the MMD-based domain adaptation module. All of the trained models are finally evaluated on the same validation set and test set. We report the results of the 4-fold cross validation in the format of Mean ± Standard Deviation.
Note that the annotations of CXRs in the target domain are always kept unseen in all training tasks, and our infection-aware DRR generator produces standard DRRs when (w_0, w_1, w_2) equals (1.0, 1.0, 1.0). ResNet-18 is adopted as the backbone of the FCN-based network in our experiments. We train the network for 100 epochs by using the Adam optimizer with the parameters β_1 = 0.9 and β_2 = 0.999. We use a mini-batch size of 16 and an initial learning rate of 0.0001 that is linearly decayed by 2% each epoch after 50 epochs. We initialize the backbone network with the weights of ResNet-18 pre-trained on ImageNet. Data augmentation, involving random cropping, horizontal flipping, vertical flipping and random rotation, is performed. The input image size of our network is 256 × 256 × 3. Besides, the category-weighted cross-entropy loss is adopted as the segmentation loss to emphasize the optimization of COVID-19 infection segmentation, where the weights of background, lung and COVID-19 infection are set to 0.1, 1.0 and 5.0. Binary cross-entropy loss is used as the classification loss. The weights of the classification loss, segmentation loss and LMMD loss are set as λ_cls = 1.0, λ_seg = 1.0, and λ_mmd = 0.3, respectively. We use the PyTorch 1.4 framework to build the deep models. The infection-aware DRR generator is implemented using CUDA 10.2, Python 3.6, and Cython. All models are trained and evaluated on a Linux server equipped with four NVIDIA GTX 1080 Ti GPU cards. For the classification output of our model, we adopt the commonly used classification metrics, including accuracy, F1-score and the area under the precision-recall curve (AUC of the PR-curve). As the pixel-level annotations of infected regions are not available for the validation and test sets in the target domain (CXRs), we are unable to use segmentation evaluation metrics directly. To evaluate the quality of the segmentation output of our model, we convert the segmentation output into a classification output by determining whether infected regions exist in the segmentation output, and then adopt the same three classification metrics. As for the sub-set of 10 CXRs in the BIMCV COVID-19+ dataset, we directly adopt the commonly used segmentation metric, i.e., the Dice similarity coefficient (DSC), to evaluate the segmentation results. Statistical tests are conducted to determine the significance of the performance differences in learning infection segmentation from DRRs, both between standard DRRs and our infection-aware DRRs and between using our domain adaptation module and not using domain adaptation. According to the experiment design, we perform a paired-samples t-test to compare the means of the scores. Specifically, we remove the outliers of the scores and apply the Shapiro-Wilk test for normality. If the variables violate the assumption of normality, we perform the Wilcoxon signed-rank test instead of the paired-samples t-test. The SciPy package is used in these analyses.
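This testing procedure can be sketched with SciPy as follows; the function name is illustrative, and outlier removal is assumed to have been done beforehand.

from scipy import stats

def compare_paired(scores_a, scores_b, alpha=0.05):
    # Test the paired differences for normality with Shapiro-Wilk; if
    # normality holds, use the paired-samples t-test, otherwise fall
    # back to the Wilcoxon signed-rank test (outliers removed beforehand).
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    _, p_normal = stats.shapiro(diffs)
    if p_normal > alpha:
        return stats.ttest_rel(scores_a, scores_b)
    return stats.wilcoxon(scores_a, scores_b)

We report the evaluation results of our model trained on the 63 training sets with and without the MMD-based domain adaptation module in Tables 4-27 of the appendix, and analyze these results from the four perspectives introduced in the experiment design. Firstly, we make a qualitative comparison in Fig. 7, whose first column shows the infection masks generated with the contribution threshold T_1 = T_2 = 0.00; we use such infection masks to indicate the infected pixels of DRRs whose corresponding X-rays pass through the infected voxels of the CT volume.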
As can be seen, many infected pixels in standard DRRs indicated by the infection masks present no findings due to the low contribution rate of infected voxels in the X-ray casting. This observation is consistent with the heterogeneous nature of X-ray imaging, and indicates that X-ray imaging has a lower sensitivity than CT imaging. We notice that the radiological signs of COVID-19 infection in standard DRRs are indistinct. As shown in Fig. 6, the radiological signs of COVID-19 infection gradually become more significant as the weight of infected voxels w_2 increases. This ability of our infection-aware DRR generator makes it possible to take full advantage of the publicly available CT volumes and to determine precise infection annotation masks for training infection segmentation models. Note that the visual findings of COVID-19 infection in DRRs become unrealistic when the value of w_2 is too large, e.g., w_2 = 12.0, which may break the correlation between real CXRs and synthetic DRRs. Secondly, we analyze the classification and segmentation results on the validation and test sets without using domain adaptation in Tables 16-27 of the appendix. To avoid the influence of the subjective choice of contribution threshold, we average the performance scores over the contribution thresholds of infected voxels (CTIV), and compare the average scores of standard DRRs and infection-aware DRRs visually in Fig. 8 and Fig. 9. Our infection-aware DRRs achieve significantly higher average scores on both the validation and test sets in the target domain than the standard DRRs. Such results indicate that the gap between infection-aware DRRs (e.g., w_2 = 3.0) and real CXRs is smaller than the gap between standard DRRs (w_2 = 1.0) and real CXRs, and thus verify the efficacy of our infection-aware DRR generator without using domain adaptation. Next, we analyze the classification and segmentation results on the validation and test sets with domain adaptation in Tables 4-15 of the appendix. We compare the average results of standard DRRs and infection-aware DRRs visually in Fig. 10 and Fig. 11. Similarly, the infection-aware DRRs surpass the standard DRRs by a large margin on both the validation and test sets in the target domain. Such results strongly demonstrate the effectiveness of our infection-aware DRR generator when the domain adaptation module is used. Finally, we perform statistical tests to compare the means of the scores in Fig. 8, Fig. 9, Fig. 10 and Fig. 11 between standard DRRs and infection-aware DRRs. We observe statistically significant differences (p < 0.05) in all of these 96 comparison items, which indicates that our infection-aware DRRs are significantly better than the standard DRRs in learning automated COVID-19 infection segmentation on CXRs from DRRs. To highlight the efficacy of our domain adaptation module, we compare the average scores with and without domain adaptation on the validation and test sets in Fig. 12 and Fig. 13. This intuitive comparison shows that using our domain adaptation module improves the classification and segmentation scores of infection-aware DRRs significantly and consistently, which verifies the efficacy of our domain adaptation module. Besides, as seen from Fig. 8 and Fig. 9, we notice that the average scores of infection-aware DRRs first increase and then decrease as the weight of infected voxels w_2 increases from 1.0 to 3.0 and then to 12.0. The peak of the average scores of infection-aware DRRs appears at w_2 = 3.0.
It suggests that an excessively large weight of infected voxels may make the infected regions in DRRs unrealistic, thus leading to a decrease in performance scores when the domain adaptation module is not used. In contrast, there is no significant decrease in the average scores of infection-aware DRRs with our domain adaptation module, as shown in Fig. 10 and Fig. 11, when the weight of infected voxels w_2 increases from 3.0 to 6.0 and then to 12.0. This implies that the domain adaptation module still works well even when the infected regions in DRRs become slightly unrealistic. On the other hand, we observe that the segmentation scores are relatively lower than the classification scores when the domain adaptation module is not applied. For instance, in the case of infection-aware DRRs with w_2 = 3.0, the average segmentation scores on the test set in the target domain, including the accuracy, AUC and F1-score, are 0.581, 0.889, and 0.694, respectively, whereas the corresponding classification scores are 0.869, 0.934, and 0.874. Such results indicate that the segmentation header is much more sensitive than the classification header to the domain discrepancy between DRRs and real CXRs. By using the domain adaptation module, both the segmentation scores and the classification scores are greatly improved; specifically, the improvement in segmentation scores is much more significant than the improvement in classification scores. For instance, in the case of infection-aware DRRs with w_2 = 3.0, the average segmentation scores on the test set are 0.925 (+0.344↑), 0.969 (+0.080↑), and 0.917 (+0.223↑), respectively, whereas the corresponding classification scores are 0.948 (+0.079↑), 0.987 (+0.053↑), and 0.947 (+0.073↑). Such results indicate that our domain adaptation module works well not only for the classification task but also for the segmentation task, thus confirming our claim that the effect of feature alignment applied in the classification branch can be propagated to the segmentation branch implicitly. Finally, we perform statistical tests to compare the means of the scores in Fig. 12 and Fig. 13 between using our domain adaptation module and not using domain adaptation. We observe statistically significant differences (p < 0.05) in all of these 60 comparison items except for 4 items, i.e., the standard DRRs' classification AUC on the validation set, and the standard DRRs' classification AUC, segmentation accuracy and segmentation F1-score on the test set. This indicates that our domain adaptation module is able to improve the performance of learning infection segmentation on CXRs from DRRs significantly, and thus confirms its efficacy. We specifically use the case of infection-aware DRRs with w_2 = 3.0 and T_2 = 0.20 as an example to show the COVID-19 infection segmentation results. The segmentation scores, including accuracy, AUC, and F1-score, on the validation and test sets are (0.919, 0.977, 0.910) and (0.956, 0.980, 0.959), respectively, as listed in Tables 7, 8, 9, 13, 14, and 15 of the appendix. Next, we visualize the infection segmentation results from the first fold. The confusion matrices of the segmentation results on the corresponding validation and test sets are shown in Fig. 14. We visualize several true positive and true negative cases in Fig. 15.
Compared with previous studies that roughly highlight the infected regions by leveraging the interpretability of deep classification models, our segmentation model trained on the infection-aware DRRs is able to segment the infected regions in CXRs directly and accurately. Besides, we present several failure (false positive and false negative) cases in Fig. 16. As mentioned earlier, CXRs are generally considered less sensitive than 3D CT scans [7]: it may happen that a CT examination detects an abnormality whereas X-ray screening of the same patient reports no findings. The significance of the radiological signs of COVID-19 infection in X-ray images depends on the severity of COVID-19 infection, which is typically assessed by the percent volume of the lung that is infected by COVID-19 (PIV for short). Only when the PIV reaches a certain level (i.e., the detection limit) can X-ray imaging effectively detect COVID-19 infection. Therefore, we propose to leverage our infection-aware DRR generator to estimate this detection limit by searching for the best parameters (w_0, w_1, w_2) and T_2. The insight behind this estimation method is that the DRRs generated with the best parameters have the smallest gap with real positive X-ray images, and thus will achieve the highest classification and segmentation scores no matter whether the domain adaptation module is applied or not. Accordingly, we average the corresponding items in Tables 4-27 of the appendix to obtain the total average (T-Avg.) scores of the 63 training sets in order to search for the best parameters. Meanwhile, we compute the equivalent average PIV (EAPIV) of the 10 CT cases that are used to generate the infection-aware DRRs. As can be seen from Table 2, the peak of the T-Avg. scores in each row appears consistently at w_2 = 3.0, and the corresponding EAPIV is 19.43% ± 16.29% (Mean ± Standard Deviation). When the EAPIV is less than 19.43% (w_2 = 3.0), e.g., 11.77% (w_2 = 1.5) and 8.51% (w_2 = 1.0), the corresponding T-Avg. scores drop below 81.0% no matter what the CTIV T_2 is. Such results imply that DRRs generated from a CT case whose PIV is less than 19.43% cannot be easily distinguished from normal DRRs. Therefore, we conclude that the detection limit of X-ray imaging, measured by the percent volume of the lung that is infected by COVID-19, is 19.43% ± 16.29%. Moreover, to examine the function of the CTIV, we plot the histograms of the CRIV of the pixels in infection-aware DRRs in Fig. 17. Note that only the pixels whose corresponding X-rays pass through the infected voxels of the CT volume are counted. These histograms show the effectiveness of our infection-aware DRR generator in changing the distribution of the CRIV of the pixels in infection-aware DRRs. For the best parameters w_0 = w_1 = 1.0 and w_2 = 3.0 (the 8th column of Table 2), the peak of the T-Avg. scores (87.9%) appears at T_2 = 0.20. Increasing the CTIV T_2 from 0.2 to 0.4 causes a large number of pixels (more than 600,000) whose CRIVs are between 0.2 and 0.4 to be treated as negative pixels, thus leading to a significant drop in the T-Avg. score (−6.9%↓). Meanwhile, decreasing the CTIV T_2 from 0.2 to 0.15 causes the pixels (fewer than 200,000) whose CRIVs are between 0.15 and 0.2 to be treated as positive pixels, thus leading to a minor drop in the T-Avg. score (−0.9%↓).
Therefore, we conclude that the estimated lower bound of the CRIV for significant radiological signs of COVID-19 infection in DRRs is 20.0%. This means that pixels whose CRIVs are lower than 20.0% cannot be easily distinguished from the pixels of the lungs in CXRs. We simply use the 4-fold cross-validation models trained with our domain adaptation module to segment the 10 CXRs. The average DSC scores of these segmentation results are reported in Table 3. As can be seen, our infection-aware DRRs (w_2 = 3.0 or w_2 = 6.0) have achieved an average DSC score of ∼40%, which is more than 20% higher than that of the standard DRRs. Besides, we also observe the same pattern as in Table 2: the peak of the average DSC scores of our infection-aware DRRs appears at w_2 = 3.0. This provides further evidence for the validity of the estimated detection limit of X-ray imaging in detecting COVID-19 infection. This study still has several limitations. Firstly, the synthetic annotation masks of infected regions for DRRs are only investigated experimentally, and are not evaluated by radiologists. Expert annotations of infected regions for DRRs may provide evidence to determine a more accurate contribution threshold of infected voxels. Secondly, this study uses publicly available CT and CXR datasets from various sources. The variability in the expert annotations of CT scans and CXRs is not assessed, which may introduce implicit biases into the training and evaluation. Thirdly, the evaluation of the segmentation performance of our method in this study is incomplete due to the lack of sufficient CXRs with pixel-level annotations of infected regions. The evaluation results on ten CXRs may be biased and unable to give guidance for optimizing our method. A comparison between learning infection segmentation from CXRs directly and learning from DRRs may provide key insights into how to realize high-quality automated COVID-19 infection segmentation on CXRs efficiently. Lastly, this study uses only ten COVID-19-positive CT cases for synthesizing DRRs. The performance of our DRR4Covid and the validity of the estimated detection limit of X-ray imaging in detecting COVID-19 infection can be improved by involving more CT scans from patients in various stages of COVID-19 infection. Nevertheless, the findings of this article provide promising results that encourage the use of DRR4Covid for learning automated COVID-19 infection segmentation on CXRs from DRRs without the need for any expert annotations of CXRs. We propose a novel approach, called DRR4Covid, to learn automated COVID-19 infection segmentation on CXRs from DRRs. DRR4Covid consists of three key components, including an infection-aware DRR generator, a classification and segmentation network, and a domain adaptation module. The infection-aware DRR generator is able to produce DRRs with an adjustable strength of radiological signs of COVID-19 infection, and to generate pixel-level infection annotations that match the DRRs precisely, thus enabling deep segmentation networks to be trained directly for automated infection segmentation. The domain adaptation module is introduced to reduce the domain discrepancy between DRRs and CXRs by training networks on unlabeled real CXRs and labeled DRRs together. We provide a simple but effective implementation of DRR4Covid by using a domain adaptation module based on Maximum Mean Discrepancy (MMD), and an FCN-based network with a classification header and a segmentation header.
Extensive experiment results have confirmed the efficacy of our method; specifically, without using any annotations of real CXRs, our network has achieved a classification score of (Accuracy: 0.949, AUC: 0.987, F1-score: 0.947) and a segmentation score of (Accuracy: 0.956, AUC: 0.980, F1-score: 0.955) on a test set with 558 normal cases and 558 positive cases. Besides, we estimate the detection limit of X-ray imaging in detecting COVID-19 infection by adjusting the strength of the radiological signs of COVID-19 infection in synthetic DRRs. The estimated detection limit, measured by the percent volume of the lung that is infected by COVID-19, is 19.43% ± 16.29%, and the estimated lower bound of the contribution rate of infected voxels for significant radiological signs of COVID-19 infection is 20.0%. To the best of our knowledge, this is the first attempt to realize automated COVID-19 infection segmentation based on CXRs by using labeled DRRs generated from chest CT scans. Future work can be carried out by extending DRR4Covid to DRR4Lesion to enable the segmentation of multiple lung lesions on CXRs. APPENDIX: See Tables 4-27. YUNXIN ZHONG received the bachelor's degree in biomedical engineering from the Beijing Institute of Technology (BIT), Beijing, China, in 2018, where she is currently pursuing the master's degree with the School of Life Science. Her research interests include the cross-cutting field of medical image processing and computer science. Ms. Zhong met the EAD challenge co-located with the 16th ISBI and was awarded the ''EAD2019 Detection Award.'' YULIN DENG received the bachelor's and master's degrees from the Beijing Institute of Technology, in 1983 and 1986, respectively, and the Ph.D. degree from the Nagoya Institute of Technology, Japan, in 1997. He spent the next seven years as an Assistant Professor with the Analytical Chemistry Laboratory. In 1993, he had the honor of obtaining a national scholarship from the Ministry of Education of Japan. Subsequently, he was awarded a two-year postdoctoral fellowship from the Health Service Utilization and Research Commission of Canada. He has been involved with the Neuropsychiatry Research Unit, University of Saskatchewan, Canada, since 1997. After his postdoctoral training, he was a Senior Scientist with the Neurology Research Unit, College of Medicine, University of Saskatchewan. He has been a Full Professor with the Beijing Institute of Technology, China, where he has been in charge of the School of Life Science and Technology, since 2002. In 2013, he was nominated as a Corresponding Member of the International Academy of Astronautics. He has authored or coauthored more than 300 publications, including research articles, books, and review articles. In the past 10 years, he has been involved in a variety of projects, including molecular biochemistry, proteomics, and space biology. His research interests include the pathogenesis of Parkinson's disease. He also focuses on endogenous neurotoxins and their involvement in neurodegeneration. XIAOYING TANG was a Teaching Assistant, a Lecturer, and an Associate Professor with the Department of Electronic Engineering, Beijing Institute of Technology (BIT), from 1983 to 2002. She has since been an Associate Professor, a Full Professor, an Associate Dean, and a Responsible Professor of biomedical engineering with the School of Life Sciences, BIT. She is engaged in research on biomedical signal detection and processing, medical image processing, and key technologies of MRI equipment.
She has published more than 60 articles in the fields of biomedical signal extraction and processing, medical image processing, and nuclear magnetic resonance imaging, including 7 SCI articles and 20 EI articles in the past five years. She also wrote two monographs as an associate editor and participated in the compilation of three textbooks. Dr. Li received the Outstanding Scientific and Technological Worker award of the Chinese Institute of Electronics in 2017.

REFERENCES
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: A modelling study
Pathological findings of COVID-19 associated with acute respiratory distress syndrome
Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: A descriptive study
WHO Coronavirus Disease (COVID-19) Dashboard. Who.int
Towards contactless patient positioning
Deep learning COVID-19 features on CXR using limited training data sets
Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for COVID-19
Task driven generative modeling for unsupervised domain adaptation: Application to X-ray image segmentation
Fast calculation of the exact radiological path for a three-dimensional CT array
Computation of digitally reconstructed radiographs for use in radiotherapy treatment design
A fast algorithm to calculate the exact radiological path through a pixel or voxel space
Fast generation of digitally reconstructed radiographs using attenuation fields with application to 2D-3D image registration
Frequency and distribution of chest radiographic findings in COVID-19 positive patients
Correlation of Chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: A report of 1014 cases
Sensitivity of Chest CT for COVID-19: Comparison to RT-PCR
Fully convolutional networks for semantic segmentation
CT imaging based digitally reconstructed radiographs and their application in brachytherapy
Comparison of digitally reconstructed radiographs generated from axial and helical CT scanning modes: A phantom study
GPU-accelerated digitally reconstructed radiographs
Accelerated ray tracing for radiotherapy dose calculations on a GPU
A fast DRR generation scheme for 3D-2D image registration based on the block projection method
Fast calculation of digitally reconstructed radiographs using light fields
Fast DRR generation for 2D/3D registration
How does the hip joint move? Techniques and applications
DeepDRR: A catalyst for machine learning in fluoroscopy-guided procedures
Enabling machine learning in X-ray-based procedures via realistic simulation of image formation
X2CT-GAN: Reconstructing CT from biplanar X-rays with generative adversarial networks
A curriculum domain adaptation approach to the semantic segmentation of urban scenes
DeCAF: A deep convolutional activation feature for generic visual recognition
How transferable are features in deep neural networks?
Deep subdomain adaptation network for image classification
Unsupervised domain adaptation by backpropagation
Learning transferable features with deep adaptation networks
Conditional adversarial domain adaptation
Multi-adversarial domain adaptation
Deep CORAL: Correlation alignment for deep domain adaptation
Multi-representation adaptation network for cross-domain image classification
Learning to adapt structured output space for semantic segmentation
Domain-adversarial training of neural networks
Adversarial discriminative domain adaptation
CyCADA: Cycle-consistent adversarial domain adaptation
Deep transfer learning with joint adaptation networks
Aligning domain-specific distribution and classifier for cross-domain classification from multiple sources
Central moment discrepancy (CMD) for domain-invariant representation learning
A kernel two-sample test
DeepCOVIDExplainer: Explainable COVID-19 diagnosis based on chest X-ray images
On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation
COVIDGR dataset and COVID-SDNet methodology for predicting COVID-19 based on chest X-ray images
Visualizing and understanding convolutional networks
Deep inside convolutional networks: Visualising image classification models and saliency maps
Investigating the influence of noise and distractors on the interpretation of neural networks
Salient deconvolutional networks
Axiomatic attribution for deep networks
Learning important features through propagating activation differences
Towards efficient COVID-19 CT annotation: A benchmark for lung and infection segmentation
Can AI help in screening viral and COVID-19 pneumonia?
COVID-CXNet: Detecting COVID-19 in frontal chest X-ray images using deep learning
COVID-19 image data collection: Prospective predictions are the future
Chest X-Ray Images (Pneumonia)
COVID-19 Database. Sirm.org
Covid-19 Image Repository
BIMCV COVID-19+: A large annotated dataset of RX and CT images from COVID-19 patients

PENGYI ZHANG is currently pursuing the Ph.D. degree with the School of Life Science, Beijing Institute of Technology. He has authored the article SlimYOLOv3: Narrower, Faster and Better for Real-Time UAV Applications (ICCV 2019). His research interests include the cross-cutting field of medical image processing and computer science. Mr. Zhang met the Endoscopic Artefact Detection (EAD) challenge collocated with the 16th ISBI.

The authors would like to thank the providers of Radiopaedia, the COVID-19 Image Data Collection, Chest X-Ray Images (Pneumonia), SIRM, the Twitter COVID-19 CXR dataset, and the Hannover Medical School dataset.