key: cord-0437314-8ndkasy8
authors: Zhou, Jieli; Jing, Baoyu; Wang, Zeya
title: SODA: Detecting Covid-19 in Chest X-rays with Semi-supervised Open Set Domain Adaptation
date: 2020-05-22
journal: nan
DOI: nan
sha: 5db8ce783cc6960bfb2865a53f4202e63100139c
doc_id: 437314
cord_uid: 8ndkasy8

The global pandemic of COVID-19 has infected millions of people since its first outbreak in December 2019. A key challenge for preventing and controlling COVID-19 is how to quickly, widely, and effectively implement testing for the disease, because testing is the first step to break the chains of transmission. To assist the diagnosis of the disease, radiology imaging is used to complement the screening process and triage patients into different risk levels. Deep learning methods have taken a more active role in automatically detecting COVID-19 in chest x-ray images, as witnessed in many recent works. Most of these works first train a CNN on an existing large-scale chest x-ray image dataset and then fine-tune it on a much smaller COVID-19 dataset. However, direct transfer across datasets from different domains may lead to poor performance due to visual domain shift. In addition, the small scale of the COVID-19 dataset in the target domain makes training prone to overfitting. To address these problems and fully exploit the available large-scale chest x-ray image dataset, we formulate COVID-19 chest x-ray image classification as a semi-supervised open set domain adaptation problem, which motivates us to reduce the domain shift and avoid overfitting when training on a very small COVID-19 dataset. To address this problem, we propose a novel Semi-supervised Open set Domain Adversarial network (SODA), which is able to align the data distributions across different domains in a general domain space and also in a common subspace of source and target data.
In our experiments, SODA achieves leading classification performance compared with recent state-of-the-art models, and effectively separates COVID-19 from common pneumonia.

Since the Coronavirus disease 2019 (COVID-19) was first declared a Public Health Emergency of International Concern (PHEIC) on January 30, 2020, it has quickly evolved from a local outbreak in Wuhan, China to a global pandemic, causing heavy loss of life and dire economic losses worldwide. In the US, the total number of COVID-19 cases grew from just one confirmed case on January 21, 2020 to over 1 million on April 28, 2020, a span of 3 months. Despite drastic actions like shelter-in-place and contact tracing, the total number of cases in the US kept increasing at an alarming daily rate of 20,000 to 30,000 throughout April 2020. A key challenge for preventing and controlling COVID-19 right now is the ability to quickly, widely and effectively test for the disease, since testing is usually the first in a series of actions to break the chains of transmission and curb the spread of the disease.

COVID-19 is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). So far, it is most reliably diagnosed through Reverse Transcription Polymerase Chain Reaction (RT-PCR), in which a sample is taken from the back of the throat or nose of the patient and tested for viral RNA. While the sample is being taken, aerosol pathogens can be released, which puts healthcare workers at risk. Furthermore, once the sample is collected, the testing process usually takes several hours, and a recent study reports that the sensitivity of RT-PCR is around 60-70% [1], which suggests that many people who test negative for the virus may actually carry it and could thus infect more people without knowing it. On the other hand, the sensitivity of chest radiology imaging for COVID-19 was much higher, at 97%, as reported by [1, 8].
Due to the shortage of viral testing kits, the long wait for results, and the low sensitivity of RT-PCR, radiology imaging has been used as a complementary screening process to assist the diagnosis of COVID-19 and triage patients into different risk levels. Unlike RT-PCR, imaging is readily available in most healthcare facilities around the world, and the whole process can be done rapidly.

In recent years, with the rapid advancement of deep learning and computer vision, many breakthroughs have been made in using Artificial Intelligence (AI) for medical imaging analysis, especially disease detection [14, 32, 33] and report generation [4, 16, 17, 21], and some AI models achieve expert radiologist-level performance [19]. Right now, with most healthcare workers busy at the front lines saving lives, the scalability advantage of AI-based medical imaging systems stands out more than ever. Some AI-based chest imaging systems have already been deployed in hospitals to quickly inform healthcare workers so that they can take corresponding actions.

Annotated datasets are required for training AI-based methods, and a small chest x-ray dataset with COVID-19 cases was recently collected: COVID-ChestXray [6]. In the last few weeks, several works [2, 20, 22, 23] have applied Convolutional Neural Networks (CNN) and transfer learning to detect COVID-19 cases from chest x-ray images. They first train a CNN on a large dataset like CheXpert [14] or ChestXray14 [32], and then fine-tune the model on the small COVID-19 dataset. So far, due to the lack of large-scale open COVID-19 chest x-ray imaging datasets, most works have used only a very small number of positive COVID-19 imaging samples [6]. While the reported metrics like accuracy and AUC are high, it is likely that these models overfit this small dataset and may not achieve the reported performance on a larger COVID-19 x-ray dataset.
Besides, these methods suffer from label domain shift: the newly trained models lose the ability to detect common thoracic diseases like "Effusion" and "Nodule", since these labels do not appear in the new dataset. Moreover, they also ignore the visual domain shift between the two datasets. On the one hand, large-scale datasets like ChestXray14 [32] and CheXpert [14] are collected from top U.S. health institutes like the National Institutes of Health (NIH) Clinical Center and Stanford University, and are well-annotated and carefully processed. On the other hand, COVID-ChestXray [6] is collected from a very diverse set of hospitals around the world, and its images are of very different quality and follow different standards in terms of viewpoint, aspect ratio and lighting. In addition, COVID-ChestXray contains not only chest x-ray images but also CT scan images.

In order to fully exploit the limited but valuable annotated COVID-19 chest x-ray images and the large-scale chest x-ray image dataset at hand, and to avoid the above-mentioned drawbacks of fine-tuning based methods, we define the problem of learning a classifier for COVID-19 from the perspective of open set domain adaptation (Definition 1) [25]. Unlike traditional unsupervised domain adaptation, which requires the label sets of the source and target domains to be the same, open set domain adaptation allows different domains to have different label sets. This is more suitable for our problem because COVID-19 is a new disease which is not included in the ChestXray14 or CheXpert dataset. However, since our task is to train a new classifier for the COVID-19 dataset, we have to use some annotated samples. Therefore, we further propose to view the problem as a Semi-Supervised Open Set Domain Adaptation problem (Definition 2).
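The label-set mismatch described above can be made concrete with a few lines of Python. The sketch below uses the 14 ChestXray14 labels and the two COVID-ChestXray labels named in the dataset description of this paper; the function name `split_label_sets` and the constants are illustrative assumptions, not part of the authors' code.

```python
# Label sets as named in the dataset description of this paper.
SOURCE_LABELS = frozenset({
    "Atelectasis", "Consolidation", "Infiltration", "Pneumothorax",
    "Edema", "Emphysema", "Fibrosis", "Effusion", "Pneumonia",
    "Pleural thickening", "Cardiomegaly", "Nodule", "Mass", "Hernia",
})
TARGET_LABELS = frozenset({"COVID-19", "Pneumonia"})


def split_label_sets(source, target):
    """Return (common, source_only, target_only) label sets.

    Hypothetical helper: common labels are shared by both domains,
    the other two sets are the domain-specific labels.
    """
    common = source & target
    return common, source - common, target - common


common, source_only, target_only = split_label_sets(SOURCE_LABELS, TARGET_LABELS)
print(sorted(common))       # ['Pneumonia']
print(sorted(target_only))  # ['COVID-19']
print(len(source_only))     # 13
```

In this setting the only common label is "Pneumonia", while "COVID-19" is specific to the target domain, which is why the label sets cannot be assumed identical.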
Under the given problem setting, we propose a novel Semi-supervised Open set Domain Adversarial network (SODA) comprised of four major components: a feature extractor $G_f$, a multi-label classifier $G_y$, domain discriminators $D_g$ and $D_c$, and a common label recognizer $R$. SODA learns domain-invariant features through a two-level alignment, namely at the domain level and at the common label level. The general domain discriminator $D_g$ is responsible for guiding the feature extractor $G_f$ to extract domain-invariant features. However, it has been argued that a general domain discriminator $D_g$ might lead to false alignment and even negative transfer [26, 34]. For example, it is possible that the feature extractor $G_f$ maps images with "Pneumonia" in the target domain and images with "Cardiomegaly" in the source domain to similar positions, which might result in misclassification by $G_y$. In order to solve this problem, we propose a novel common label discriminator $D_c$ to guide the model to align images with common labels across domains. For labeled images, $D_c$ is only activated when the input image is associated with a common label. For unlabeled images, we propose a common label recognizer $R$ to predict their probabilities of having a common label.

The main contributions of the paper are summarized as follows:

• To the best of our knowledge, we are the first to tackle the problem of COVID-19 chest x-ray image classification from the perspective of domain adaptation.

Definition 1 (Unsupervised Open Set Domain Adaptation). Let $D_s$ be a source domain with labeled samples and $D_t = \{x^t_n\}_{n=1}^{N_t}$ be a target domain with $N_t$ unlabeled samples, where the underlying label set $L_t$ of the target domain might be different from the label set $L_s$ of the source domain. Define $L_c = L_s \cap L_t$ as the set of common labels shared across domains, and let $\bar{L}_s = L_s \setminus L_c$ and $\bar{L}_t = L_t \setminus L_c$ be the sets of domain-specific labels which only appear in the source and the target domain, respectively.
The task of Unsupervised Open Set Domain Adaptation is to build a model which can accurately assign common labels in $L_c$ to samples $x^t_n$ in the target domain, as well as distinguish those $x^t_n$ belonging to $\bar{L}_t$.

Definition 2 (Semi-supervised Open Set Domain Adaptation). Given a source domain $D_s$ with labeled samples and a target domain consisting of $D_t$ with $N_t$ unlabeled samples and $D_{t'}$ with labeled samples, the task of Semi-supervised Open Set Domain Adaptation is to build a model to assign labels from $L_t$ to unlabeled samples in $D_t$.

We summarize the symbols used in the paper and their descriptions in Table 1.

Table 1: Symbols and their descriptions.

Symbol                     Description
$G_y$                      multi-label classifier for $L$
$G_{y_l}$                  binary classifier for label $l$ (part of $G_y$)
$R$                        common label recognizer
$D_c$                      domain discriminator for common labels $L_c$
$D_g$                      general domain discriminator
$\mathcal{L}_{G_y}$        loss of multi-label classification over the entire dataset
$\mathcal{L}_R$            loss of $R$ over the entire dataset
$\mathcal{L}_{D_g}$        loss of $D_g$ over the entire dataset
$\mathcal{L}_{D_c}$        loss of $D_c$ over the entire dataset
$\lambda$                  the coefficient of losses
$x$                        input image
$h$                        hidden features
$y$                        ground-truth label
$\hat{y}$                  predicted probability
$\hat{d}$                  predicted probability that $x$ belongs to the source domain
$\hat{r}$                  predicted probability that $x$ has common labels

An overview of the proposed Semi-supervised Open Set Domain Adversarial network (SODA) is shown in Fig. 1. An input image $x$ is first fed into a feature extractor $G_f$, which is a Convolutional Neural Network (CNN), to obtain its hidden feature $h$ (green part). The binary classifier $G_{y_l}$ (part of the multi-label classifier $G_y$) takes $h$ as input and predicts the probability $\hat{y}_l$ for the label $l \in L$ (blue part).

We propose a novel two-level alignment strategy for extracting domain-invariant features across the source and target domains. On the one hand, we perform domain alignment (Section 3.2), which leverages a general domain discriminator $D_g$ to minimize the domain-level feature discrepancy. On the other hand, we emphasize the alignment of common labels $L_c$ (Section 3.3) by introducing another domain discriminator $D_c$ for images associated with common labels.
For labeled images in $D_s$ and $D_{t'}$, we compute the loss for $D_c$ and conduct back-propagation only if the input image $x$ is associated with a common label $l \in L_c$. As for unlabeled data in $D_t$, we propose a common label recognizer $R$ to predict the probability $\hat{r}$ that an image $x$ has a common label, and use $\hat{r}$ as a weight in the losses of $D_c$ and $D_g$.

Domain adversarial training [10] is the most popular method for helping the feature extractor $G_f$ learn domain-invariant features, such that a model trained on the source domain can be easily applied to the target domain. The objective function of the domain discriminator $D_g$ (Equation 1) is a domain-classification loss, in which $\hat{d}_g$ denotes the predicted probability that the input image belongs to the source domain. In SODA, we use a Multi-Layer Perceptron (MLP) as the general domain discriminator $D_g$.

In the field of adversarial domain adaptation, most existing methods only leverage a general domain discriminator $D_g$ to minimize the discrepancy between the source and target domains. Such a practice ignores the label structure across domains, which can result in false alignment and even negative transfer [26, 34]. If we only use a general domain discriminator $D_g$ in the open set domain adaptation setting (Definition 1 and Definition 2), it is possible that the feature extractor $G_f$ will map target domain images with a common label $l \in L_c$, say "Pneumonia", and source domain images with a domain-specific label $l \in \bar{L}_s$, say "Cardiomegaly", to similar positions in the hidden space, which might lead the classifier to misclassify a "Pneumonia" image in the target domain as "Cardiomegaly".

To address this mismatch between the common and domain-specific label sets, we propose a domain discriminator $D_c$ to distinguish the domains of images with a common label.
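The weighting scheme described above (a hard gate for labeled images, a soft weight $\hat{r}$ for unlabeled ones) can be sketched in plain Python. This is a minimal illustration that assumes a per-sample binary cross-entropy domain loss; the helper names `common_label_weight` and `weighted_domain_bce` and the toy batch are hypothetical and are not taken from the authors' implementation.

```python
import math


def common_label_weight(labels, r_hat, common_labels):
    """Weight of one sample in the common-label discriminator loss.

    labels: set of ground-truth labels, or None for an unlabeled sample.
    r_hat: recognizer output in [0, 1]; only used when labels is None.
    """
    if labels is None:
        return r_hat  # unlabeled: soft weight predicted by the recognizer R
    return 1.0 if labels & common_labels else 0.0  # labeled: hard gate


def weighted_domain_bce(d_hat, is_source, weight, eps=1e-12):
    """Weighted binary cross-entropy for one domain prediction d_hat,
    where d_hat is the predicted probability of the source domain."""
    p = d_hat if is_source else 1.0 - d_hat
    return -weight * math.log(max(p, eps))


COMMON = {"Pneumonia"}
batch = [
    # (labels, r_hat, d_hat, is_source); values are made up for illustration
    ({"Pneumonia"}, None, 0.9, True),      # labeled source, common label
    ({"Cardiomegaly"}, None, 0.8, True),   # labeled source, no common label
    (None, 0.25, 0.4, False),              # unlabeled target, weighted by r_hat
]
losses = [
    weighted_domain_bce(d, s, common_label_weight(l, r, COMMON))
    for l, r, d, s in batch
]
print(losses)
```

The second sample contributes nothing to the loss because it carries no common label. In an automatic-differentiation framework, $\hat{r}$ would additionally be treated as a constant, since the paper states that gradients from the discriminator losses are not passed through $\hat{r}$.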
For the labeled data from the source domain $D_s$ and the target domain $D_{t'}$, we know whether an image $x$ has a common label or not, and we only calculate the loss $\mathcal{L}^{label}_{D_c}$ for $D_c$ on the samples with common labels (Equation 2), where $\hat{d}_c$ denotes the probability, predicted by $D_c$, that an input image associated with a common label belongs to the source domain. However, a large number of images in the target domain are unlabeled, and thus extra effort is required to determine whether an unlabeled image is associated with a common label. To address this problem, we propose a novel common label recognizer $R$ to predict the probability $\hat{r}$ that an unlabeled image has at least one common label. The probability $\hat{r}$ is used as a weight in the loss function of $D_c$ on unlabeled samples (Equation 3).

Figure 1: Architecture overview of the proposed SODA model. Given an input image $x$, the feature extractor $G_f$ extracts its hidden features $h$ (green part), which are fed into a multi-label classifier $G_y$ (blue part), a common label recognizer $R$ (yellow part) and a domain discriminator $D$ (red part) to predict the probability $\hat{y}$ of disease labels, the probability $\hat{r}$ that $x$ is associated with a common label, and the probability $\hat{d}$ that $x$ belongs to the source domain. $\mathcal{L}_y$, $\mathcal{L}_r$ and $\mathcal{L}_d$ denote the losses of image classification, common label classification and domain classification. $D_g$ is the general domain discriminator, and $D_c$ is the domain discriminator for images associated with a common label. $G_{y_1}$ denotes the image classifier for the first label in the label set of the entire dataset $L = L_s \cup L_t$. Note that the gradients from $\mathcal{L}_{d_c}$ and $\mathcal{L}_{d_g}$ are not allowed to pass through $\hat{r}$ (grey arrows).
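The discriminator losses referred to as Equations 1 to 3 can be illustrated as follows. This is only a sketch: it assumes the usual binary cross-entropy form of domain-adversarial training [10], sums over samples without normalization, and an indicator $\mathbb{1}_c(x)$ (a symbol introduced here for illustration) that equals 1 when $x$ carries at least one common label. The exact notation and normalization used by the authors may differ.

```latex
\begin{align*}
% Eq. (1): general domain discriminator D_g, source vs. target
\mathcal{L}_{D_g} &= -\sum_{x \in D_s} \log \hat{d}_g(x)
  - \sum_{x \in D_{t'} \cup D_t} \log\bigl(1 - \hat{d}_g(x)\bigr) \\
% Eq. (2): D_c on labeled samples, gated by the common-label indicator
\mathcal{L}^{label}_{D_c} &= -\sum_{x \in D_s} \mathbb{1}_c(x)\,\log \hat{d}_c(x)
  - \sum_{x \in D_{t'}} \mathbb{1}_c(x)\,\log\bigl(1 - \hat{d}_c(x)\bigr) \\
% Eq. (3): D_c on unlabeled target samples, weighted by the recognizer output
\mathcal{L}^{un}_{D_c} &= -\sum_{x \in D_t} \hat{r}(x)\,\log\bigl(1 - \hat{d}_c(x)\bigr)
\end{align*}
```

Under these assumptions, a labeled sample without any common label drops out of the $D_c$ loss entirely, while an unlabeled sample contributes in proportion to how likely $R$ considers it to carry a common label.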
In addition, we also use $\hat{r}$ to re-weight the unlabeled samples in the loss of $D_g$ (Equation 1) to further emphasize the alignment of common labels; the re-weighted loss is given in Equation 4. Finally, the recognizer $R$ is trained on the labeled set $D_s \cup D_{t'}$ via a cross-entropy loss $\mathcal{L}_R$ (Equation 5).

The overall objective function of SODA (Equation 6) is a min-max game between the classifiers $G_y$, $R$ and the discriminators $D_g$, $D_c$. It combines $\mathcal{L}_{G_y}$, the cross-entropy loss for multi-label classification, with $\mathcal{L}_R$, $\mathcal{L}_{D_g}$, $\mathcal{L}^{label}_{D_c}$ and $\mathcal{L}^{un}_{D_c}$, which are respectively defined in Equations 5, 4, 2 and 3; $\lambda$ denotes the coefficient for each loss function.

Source Domain. We use ChestXray14 [32] as the source domain dataset. This dataset is comprised of 112,120 anonymized chest x-ray images from the National Institutes of Health (NIH) Clinical Center. The dataset contains 14 common thoracic disease labels: "Atelectasis", "Consolidation", "Infiltration", "Pneumothorax", "Edema", "Emphysema", "Fibrosis", "Effusion", "Pneumonia", "Pleural thickening", "Cardiomegaly", "Nodule", "Mass" and "Hernia".

Target Domain. The newly collected COVID-ChestXray [6] is adopted as the target domain dataset. This dataset contains images collected from various public sources and different hospitals around the world. At the time of writing, it contains 328 chest x-ray images, of which 253 are labeled positive for the new disease "COVID-19", whereas 61 are labeled as other, well-studied "Pneumonia".

We evaluate our model from four different perspectives. First, to test the classification performance, following the semi-supervised protocol, we randomly split the 328 x-ray images in COVID-ChestXray into a 40% labeled set and a 60% unlabeled set. We run each model 3 times and report the average AUC-ROC score. Second, we compute the Proxy A-Distance (PAD) [3] to evaluate the models' ability to minimize the feature discrepancy across domains. Third, we use t-SNE to visualize the feature distributions of the target domain.
Finally, we also qualitatively evaluate the models by visualizing their saliency maps.

We compare the proposed SODA with two types of baseline methods: fine-tuning based transfer learning models and domain adaptation models. For the fine-tuning based models, we select the two most popular CNN models, DenseNet121 [13] and ResNet50 [12], as our baselines. These models are first trained on the ChestXray14 dataset and then fine-tuned on the COVID-ChestXray dataset. As for the domain adaptation models, we compare our model with two classic models, Domain Adversarial Neural Networks (DANN) [10] and Partial Adversarial Domain Adaptation (PADA) [5]. Note that DANN and PADA were designed for unsupervised domain adaptation, so we implement semi-supervised versions of them.

We use DenseNet121 [13], pretrained on the ChestXray14 dataset [32], as the feature extractor $G_f$ for SODA. The multi-label classifier $G_y$ is a one-layer neural network with a sigmoid activation. We use the same architecture for $D_g$, $D_c$ and $R$: an MLP containing two hidden layers with ReLU [24] activation and an output layer. The hidden dimension for all of the modules $G_y$, $D_g$, $D_c$ and $R$ is 1024. For a fair comparison, we use the same settings of $G_f$, $G_y$ and $D_g$ for DANN [10] and PADA [5]. All of the models are trained with the Adam optimizer [18] at a learning rate of $10^{-4}$.

To investigate the effects of domain adaptation and demonstrate the performance improvement of the proposed SODA, we present the average AUC-ROC scores for all models in Table 2. Comparing the results for ResNet50 and DenseNet121, we observe that deeper and more complex models achieve better classification performance. Regarding the effects of domain adaptation, the domain adaptation methods (DANN, PADA, and SODA) outperform the fine-tuning based transfer learning methods (ResNet50 and DenseNet121).
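The evaluation protocol (a random 40%/60% labeled/unlabeled split of the 328 target images, and AUC-ROC averaged over 3 runs) can be sketched as follows. The sketch uses only the Python standard library; the function names, the seed handling, the rounding of the split size and the toy scores are assumptions made for illustration, not details reported in the paper.

```python
import random


def split_labeled_unlabeled(n_samples, labeled_fraction=0.4, seed=0):
    """Randomly split sample indices into a labeled and an unlabeled part."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_labeled = int(n_samples * labeled_fraction)  # rounding down is an assumption
    return indices[:n_labeled], indices[n_labeled:]


def auc_roc(scores, labels):
    """AUC-ROC for one label via pairwise ranking; ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(runs):
    """Average AUC over several runs; each run is a (scores, labels) pair."""
    return sum(auc_roc(s, y) for s, y in runs) / len(runs)


labeled_idx, unlabeled_idx = split_labeled_unlabeled(328)
print(len(labeled_idx), len(unlabeled_idx))  # 131 197

# Three toy runs for a single label (made-up predictions, not paper results).
toy_runs = [
    ([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]),
    ([0.9, 0.2, 0.7, 0.1], [1, 1, 0, 0]),
    ([0.6, 0.6, 0.6, 0.1], [1, 0, 1, 0]),
]
print(mean_auc(toy_runs))
```

For the multi-label setting, `auc_roc` would be applied separately to each label of interest, here "COVID-19" and "Pneumonia".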
Furthermore, the proposed SODA achieves higher AUC scores on both COVID-19 and Pneumonia than DANN and PADA, demonstrating the effectiveness of the proposed two-level alignment.

Proxy A-Distance (PAD) [3] has been widely used in domain adaptation for measuring the feature distribution discrepancy between the source and target domains. PAD is defined by $d_A = 2(1 - 2\epsilon)$, where $\epsilon$ is the domain classification error (e.g. mean absolute error) of a classifier (e.g. linear SVM [7]). Following [10], we train SVM models with different values of $C$ and use the minimum error to calculate PAD. In general, a lower $d_A$ means a better ability to extract domain-invariant features. As shown in Fig. 2, SODA has a lower PAD than the baseline methods, which indicates the effectiveness of the proposed two-level alignment strategy.

We use t-SNE to project the high-dimensional hidden features $h$ extracted by DANN, PADA, and SODA to a low-dimensional space. The 2-dimensional visualization of the features in the target domain is presented in Fig. 3, where the red data points are image features of "Pneumonia" and the blue data points are image features of "COVID-19". It can be observed from Fig. 3 that SODA performs best at separating "COVID-19" from "Pneumonia", which demonstrates the effectiveness of the proposed common label recognizer $R$ as well as the domain discriminator for common labels $D_c$.

Grad-CAM [28] is used to visualize the features extracted by all compared models. Fig. 4 shows the Grad-CAM results on seven different COVID-19 positive chest x-rays. These seven images have annotations (small arrows and a box) indicating the pathology locations. We observe that ResNet50 and DenseNet121 can focus wrongly on irrelevant locations like the dark corners and edges. In contrast, the domain adaptation models localize better in general, and our SODA model gives more focused and accurate pathological locations than the other models compared.
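As a small illustration of the Proxy A-Distance used in this evaluation, the sketch below applies the standard form $d_A = 2(1 - 2\epsilon)$ from the domain adaptation literature [3, 10] to a list of domain-classification errors, one per SVM setting. Training the SVMs is left out; `proxy_a_distance` and the example error values are hypothetical and are not results from the paper.

```python
def proxy_a_distance(errors):
    """Proxy A-Distance from domain-classification errors.

    errors: one error in [0, 1] per classifier setting (e.g. per SVM C value).
    The smallest error is used, giving d_A = 2 * (1 - 2 * eps).
    """
    if not errors:
        raise ValueError("at least one error value is required")
    if any(not 0.0 <= e <= 1.0 for e in errors):
        raise ValueError("classification errors must lie in [0, 1]")
    eps = min(errors)
    return 2.0 * (1.0 - 2.0 * eps)


# Made-up errors for three values of C; the best classifier has error 0.25.
print(proxy_a_distance([0.3, 0.25, 0.4]))  # 1.0
```

An error of 0.5 (the domains cannot be told apart) gives a distance of 0, while a perfect domain classifier gives the maximum of 2, which is why a lower value indicates better-aligned features.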
In addition, we consulted a professional radiologist with over 15 years of clinical experience from Wuxi People's Hospital and received positive feedback on the pathological locations indicated by the Grad-CAM of SODA. In the future, we plan to conduct a more rigorous evaluation study with more input from radiologists. We believe the features extracted by SODA can help radiologists pinpoint suspected COVID-19 pathological locations faster and more accurately.

Domain adaptation is an important application of transfer learning that attempts to generalize models from source domains to unseen target domains [9, 10, 15, 29, 30, 31, 36]. Deep domain adaptation approaches are usually implemented through discrepancy minimization [30] or adversarial training [9, 10, 29]. Adversarial training, inspired by the success of generative adversarial modeling [11], has been widely applied to promote the learning of transferable features in image classification. It takes advantage of a domain discriminator that classifies whether an image is from the source or the target domain. On top of these methods, several works explore the high-level structure of the label space, aiming either to further improve domain adaptation performance for multi-class image classification [34] or to handle the case where the label sets of the source and target domains differ [36]. To address the latter, more and more researchers have started to study the open set domain adaptation problem, in which the target domain contains images that do not come from the classes in the source domain [25, 36]. Universal domain adaptation [36] is a recent method that addresses this problem using an adversarial domain discriminator together with a non-adversarial domain discriminator.
Although domain adaptation has been well explored, its application in medical imaging analysis, such as domain adaptation for chest x-ray images, is still under-explored.

Semi-supervised learning is an important approach to image classification that makes use of both labeled and unlabeled data at the same time [27]. Recently it has been used to solve image classification problems on a very large set (1 billion) of unlabeled images [35]. Although much progress has been made with unsupervised domain adaptation methods, domain adaptation with semi-supervised learning has not yet been fully explored.

There has been substantial progress in constructing publicly available databases of chest x-ray images, as well as a related line of work on identifying lung diseases using these images. The largest public datasets of chest x-ray images are CheXpert [14] and ChestXray14 [32], which include more than 200,000 and 100,000 chest x-ray images collected by Stanford University and the National Institutes of Health, respectively. The creation of these datasets has also motivated and promoted multi-label chest x-ray classification to help the screening and diagnosis of various lung diseases. The problems of disease detection [14, 32, 33] and report generation using chest x-rays [4, 16, 17, 21] have been extensively investigated, with much-improved results in recent years. However, there have been very few attempts to study domain adaptation for multi-label image classification using chest x-rays.

In this paper, in order to assist and complement the screening and diagnosis of COVID-19, we formulate the problem of COVID-19 chest x-ray image classification in a semi-supervised open set domain adaptation framework.
Accordingly, we propose a novel deep domain adversarial neural network, the Semi-supervised Open set Domain Adversarial network (SODA), which is able to align the data distributions across different domains at both the domain level and the common label level. In our evaluation of classification performance, SODA achieves better AUC-ROC scores than recent state-of-the-art models. We further demonstrate that the features extracted by SODA are more tightly related to the lung pathology locations, and we received initial positive feedback from an experienced radiologist. In practice, SODA can be generalized to any semi-supervised open set domain adaptation setting where there are a large well-annotated dataset and a small newly available dataset. In conclusion, SODA can serve as a pilot study in applying techniques and methods from domain adaptation to radiology imaging classification problems.

[1] Ziyong Sun, and Liming Xia. 2020. Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases
[2] Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks
[3] Analysis of representations for domain adaptation
[4] Clinical Report Auto-completion
[5] Partial adversarial domain adaptation
[6] COVID-19 image data collection. arXiv
[7] Support-vector networks
[8] Sensitivity of chest CT for COVID-19: comparison to RT-PCR
[9] Unsupervised Domain Adaptation by Backpropagation
[10] Domain-adversarial training of neural networks
[11] Generative adversarial nets
[12] Deep residual learning for image recognition
[13] Densely connected convolutional networks
[14] Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison
[15] Cross-Domain Labeled LDA for Cross-Domain Text Classification
[16] Show, Describe and Conclude: On Exploiting the Structure Information of Chest X-ray Reports
[17] On the automatic generation of medical imaging reports
[18] Adam: A method for stochastic optimization
[19] Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks
[20] Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT
[21] Hybrid retrieval-generation reinforced agent for medical image report generation
[22] COVID-Net: A Tailored Deep Convolutional Neural Network Design for Detection of COVID-19 Cases from
[23] Deep-covid: Predicting covid-19 from chest x-ray images using deep transfer learning
[24] Rectified linear units improve restricted boltzmann machines
[25] Open set domain adaptation
[26] Multi-adversarial domain adaptation
[27] Semi-supervised domain adaptation via minimax entropy
[28] Grad-cam: Visual explanations from deep networks via gradient-based localization
[29] Adversarial Discriminative Domain Adaptation
[30] Deep domain confusion: Maximizing for domain invariance
[31] Coarse Alignment of Topic and Sentiment: A Unified Model for Cross-Lingual Sentiment Classification
[32] Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases
[33] Tienet: Text-image embedding network for common thorax disease classification and reporting in chest x-rays
[34] Adversarial Domain Adaptation Being Aware of Class Relationships
[35] Billion-scale semi-supervised learning for image classification
[36] Universal domain adaptation