key: cord-0991144-isk8fuki authors: Qi, Xiao; Nosher, John L.; Foran, David J.; Hacihaliloglu, Ilker title: Multi-Feature Semi-Supervised Learning for COVID-19 Diagnosis from Chest X-ray Images date: 2021-04-04 journal: 12th International Workshop on Machine Learning in Medical Imaging, MLMI 2021, held in conjunction with 24th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2021 DOI: 10.1007/978-3-030-87589-3_16 sha: ca775592e0b2d6b4c5fc42f925db33ea874e9140 doc_id: 991144 cord_uid: isk8fuki

Computed tomography (CT) and chest X-ray (CXR) have been the two dominant imaging modalities deployed for improved management of Coronavirus disease 2019 (COVID-19). Due to faster imaging, lower radiation exposure, and lower cost, CXR is preferred over CT. However, the interpretation of CXR images, compared to CT, is more challenging due to lower image resolution and COVID-19 image features being similar to those of regular pneumonia. Computer-aided diagnosis via deep learning has been investigated to help mitigate these problems and assist clinicians during the decision-making process. The requirement for a large amount of labeled data is one of the major problems of deep learning methods when deployed in the medical domain. To provide a solution to this, in this work we propose a semi-supervised learning (SSL) approach using minimal data for training. We integrate local-phase CXR image features into a multi-feature convolutional neural network architecture in which the SSL method is trained with a teacher/student paradigm. Quantitative evaluation is performed on 8,851 normal (healthy), 6,045 pneumonia, and 3,795 COVID-19 CXR scans. By using only 7.06% labeled and 16.48% unlabeled data for training and 5.53% for validation, our method achieves 93.61% mean accuracy on large-scale (70.93%) test data. We provide comparison results against fully supervised and SSL methods. Code: https://github.com/endiqq/Multi-Feature-Semi-Supervised-Learning-for-COVID-19-CXR-Images

Diagnostics is a key tool for improved management of Coronavirus disease 2019 (COVID-19), permitting healthcare workers to rapidly triage patients. Currently, the gold standard diagnosis is based on reverse-transcription polymerase chain reaction (RT-PCR) tests. To improve the management of COVID-19, radiological assessment, based on computed tomography (CT) and chest X-ray (CXR), has also been incorporated into the decision-making process. Compared to CT, CXR provides additional advantages such as fast screening, portability, and ease of setup (it can be set up in isolation rooms). However, compared to CT, the interpretation of CXR images by expert radiologists is difficult, as the visual cues of the disease can be subtle or similar to those of regular pneumonia. As such, computer-aided diagnostic systems that can aid in the decision-making process have been investigated [19, 20, 25]. Computer-aided diagnosis via deep supervised learning has achieved strong performance when provided with a large labeled data set [19, 20, 25]. However, the expert knowledge required for the labeling process is costly. The scarcity of available data, in particular for a new disease, also affects the success of the developed methods. To mitigate this problem, methods based on semi-supervised learning (SSL) have been developed for diagnosing lung disease from CXR images. The following provides a brief overview of prior work on lung disease classification from CXR data using SSL.
Consistency regularization and pseudo labeling have been the two most dominant methods investigated for SSL. In [24], multi-class abnormality detection from CXR data was proposed. Using 35% labeled and 35% unlabeled data as the training set, an AUROC close to 0.82 was reported on 20% test data. In [10], a disentangled stochastic latent space was used to improve self-ensembling for semi-supervised CXR classification. Using 3.12% labeled and 73.2% unlabeled data for training, an AUROC value of 0.66 was obtained on 22% test data. In [9], the same group extended their prior work [10] by training the SSL network on linear mixing of labeled and unlabeled data, at the input and latent space, to improve network regularization. A 9.6% improvement in AUROC over [10] was reported using the same evaluation strategy as [10]. In [3], the authors proposed a graph-based label propagation method for semi-supervised CXR classification and achieved an AUROC of 0.78 using only 20% of the labeled data. The unlabeled training set size and the test set size were not reported in [3]. [3] was recently extended for SSL-based classification of COVID-19 from CXR data [4]. Using 30% labeled and 70% unlabeled data for training, an average accuracy of 94.6% was reported on 2.14% test data. The training and testing data included only 200 and 100 COVID-19 scans, respectively.

Deep learning methods can automatically learn the features of the data without the need for data preprocessing. Nonetheless, this automation requires 1- deeper and more complex networks, and 2- large annotated training data. Preprocessing can provide solutions to some of these challenges. Most recently, local phase-based image processing was proposed in [20] for improved representation of lung CXR data. Quantitative results demonstrated the importance of local phase image features for improved diagnosis of COVID-19 disease from CXR scans. Motivated by this, in this work we propose a local phase-based SSL method for accurate diagnosis of COVID-19 from minimal training data. Our proposed method achieves accuracy similar to that of the fully supervised baseline architecture while using only 7.06% labeled and 16.48% unlabeled data for training and 5.53% for validation, evaluated on 70.93% testing data. Our contributions are as follows: 1- We introduce a novel multi-feature SSL model which outperforms baseline mono-feature-based SSL using minimal training data. 2- We perform ablation studies to show the effect of local-phase features for training and testing. 3- We evaluate our technique on a data set of 18,691 CXR scans of healthy (8,851), regular pneumonia (6,045), and COVID-19 (3,795) subjects. This is the largest COVID-19 evaluation study reported for SSL. We provide a performance comparison of the presented method against the supervised baseline and several SSL methods.

Multi-feature Images: Enhancement of CXR images (CXR(x,y)) is based on the extraction of local phase image features using bandpass quadrature filters and an L1 norm-based contextual regularization method [20]. The monogenic filter and α-scale space derivative quadrature filters (ASSD) are used as bandpass quadrature filters. Three different local phase CXR(x,y) image features are extracted: 1- the local weighted mean phase angle (LwPA(x,y)), 2- the LwPA(x,y)-weighted local phase energy (LPE(x,y)), and 3- the enhanced local energy attenuation image (ELEA(x,y)). During this work, we used the same filter parameters as explained in [20].
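To illustrate how local phase maps of this kind can be computed, the following is a minimal sketch based on a multi-scale monogenic signal (log-Gabor bandpass filtering plus a Riesz transform). The filter family, wavelengths, and normalization here are assumptions for illustration only; they do not reproduce the ASSD filters or the L1 norm-based contextual regularization used for ELEA(x,y) in [20].

```python
import numpy as np


def monogenic_local_phase(img, wavelengths=(32, 64, 128), sigma_on_f=0.55):
    """Sketch: multi-scale monogenic-signal local phase features.

    Returns approximate LwPA- and LPE-like maps normalized to [0, 1].
    Filter parameters are illustrative, not those of the referenced paper.
    """
    rows, cols = img.shape
    # Frequency grids in standard FFT ordering (DC at index [0, 0])
    U, V = np.meshgrid(np.fft.fftfreq(cols), np.fft.fftfreq(rows))
    radius = np.sqrt(U**2 + V**2)
    radius[0, 0] = 1.0                      # avoid division by zero at DC

    # Riesz transform transfer functions (vector-valued odd filters)
    H1 = 1j * U / radius
    H2 = 1j * V / radius

    IMG = np.fft.fft2(img.astype(np.float64))
    even_sum = np.zeros((rows, cols))
    odd1_sum = np.zeros((rows, cols))
    odd2_sum = np.zeros((rows, cols))
    energy_sum = np.zeros((rows, cols))

    for wl in wavelengths:
        f0 = 1.0 / wl
        # Log-Gabor radial bandpass filter centered at frequency f0
        log_gabor = np.exp(-(np.log(radius / f0) ** 2)
                           / (2 * np.log(sigma_on_f) ** 2))
        log_gabor[0, 0] = 0.0               # remove the DC component

        band = IMG * log_gabor
        even = np.real(np.fft.ifft2(band))           # bandpassed (even) image
        odd1 = np.real(np.fft.ifft2(band * H1))      # Riesz component 1
        odd2 = np.real(np.fft.ifft2(band * H2))      # Riesz component 2

        even_sum += even
        odd1_sum += odd1
        odd2_sum += odd2
        energy_sum += np.sqrt(even**2 + odd1**2 + odd2**2)

    def normalize(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-12)

    # Local weighted mean phase angle over scales, and summed local energy
    lwpa = np.arctan2(even_sum, np.sqrt(odd1_sum**2 + odd2_sum**2) + 1e-6)
    return normalize(lwpa), normalize(energy_sum)
```

A third channel such as ELEA(x,y) would additionally require the scatter/attenuation modeling and L1 norm-based contextual regularization described in [20], which is omitted from this sketch.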
A multi-feature image, denoted MF(x,y), is created by combining these three types of local phase images as a three-channel input. Since all enhanced images are grayscale, the combination results in an image of shape (w × h × 3), where w and h correspond to the width and height of the image. Qualitative results corresponding to the MF(x,y) images are displayed in Figure 1. Green arrows point to diffuse irregular patchy consolidations; the COVID-19 image shows bilateral, peripherally distributed opacities (green arrows). Investigating Figure 1, we observe that the local phase and MF(x,y) images of COVID-19 show enhanced opacity features compared to the original CXR(x,y) image. Although the LPE(x,y) image did not improve opacity detection in the COVID-19 image, it could complement the other two local phase enhanced CXR images. A similar enhancement of consolidations is observed for the bacterial pneumonia image (middle row). The MF(x,y) image and the CXR(x,y) image are used as inputs to train our proposed multi-feature teacher/student SSL model, which is explained in the next section.

Semi-supervised learning pipeline: Our proposed method consists of five steps, as illustrated in Figure 2. The pipeline is an extension of [26], and the algorithm leverages the desirable property that learning is tolerant to a certain degree of noise [17]. Our goal is to use a multi-feature convolutional neural network (CNN) architecture for the teacher model to limit labeling noise on the unlabeled dataset, thus forming a better teacher model without additional training samples. The same multi-feature CNN architecture is also used for the student model. Our multi-feature CNN architecture consists of two mono-feature network streams for processing the CXR(x,y) images and the enhanced MF(x,y) images, respectively. In [1, 20], three different feature fusion strategies were investigated, and the authors showed that late fusion outperformed early and mid-level fusion. Therefore, we adopt the late-fusion strategy in this work. In our design, predictions are made based on high-level features from both network streams (Fig. 2, Steps 1 and 4). We utilize ResNet50 [11] as the encoder network for both streams. Prior SSL methods used AlexNet [9] as their main architecture; however, in [20] ResNet50 outperformed the pre-trained AlexNet for classifying COVID-19 from CXR(x,y) images and achieved slightly higher mean accuracy than SonoNet64 [5], XNet (Xception) [6], InceptionV4 (Inception-ResNet-V2) [21], and EfficientNetB4 [22].

We denote labeled data as $x_{ld}$ and the corresponding label as $y_{ld}$. Unlabeled data is denoted as $x_{uld}$ and the generated pseudo label as $\hat{y}_{uld}$. The teacher model, denoted as $T$, after fine-tuning on the labeled data $x_{ld}$, performs a forward pass on the unlabeled images $x_{uld}$ to obtain the class distribution $P(\cdot \mid x_{uld}; \theta_T)$, where $\theta_T$ denotes the parameters of $T$. From this distribution, the trained teacher model predicts the pseudo label $\hat{y}_{uld}$ for each image according to the softmax prediction vector. Once the teacher model $T$ generates pseudo labels for the unlabeled data, the top $K$ percent of images in each class are retained as new positive training samples for that class. The ranking is based on the corresponding class classification score. Optimization of the hyperparameter $K$ is performed using 10% of the labeled data and the validation data.
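The two-stream, late-fusion design described above can be sketched as follows. This is a minimal illustration under assumed design details (feature concatenation as the fusion operation, a single linear classification head, and hypothetical class/module names), not the authors' released implementation.

```python
import torch
import torch.nn as nn
from torchvision import models


class LateFusionCovidNet(nn.Module):
    """Sketch of a two-stream, late-fusion CXR/MF classifier (assumed design).

    One ResNet50 stream encodes the original CXR(x,y) image and the other the
    three-channel MF(x,y) image; pooled high-level features are concatenated
    and passed to a joint classifier.
    """

    def __init__(self, num_classes=3):
        super().__init__()

        def make_stream():
            # ImageNet-pretrained weights could be loaded here, depending on
            # the torchvision version (weights=... / pretrained=...).
            backbone = models.resnet50()
            feat_dim = backbone.fc.in_features   # 2048 for ResNet50
            backbone.fc = nn.Identity()          # expose pooled features
            return backbone, feat_dim

        self.cxr_stream, dim_cxr = make_stream()
        self.mf_stream, dim_mf = make_stream()
        self.classifier = nn.Linear(dim_cxr + dim_mf, num_classes)

    def forward(self, cxr, mf):
        f_cxr = self.cxr_stream(cxr)              # (B, 2048) CXR(x,y) features
        f_mf = self.mf_stream(mf)                 # (B, 2048) MF(x,y) features
        fused = torch.cat([f_cxr, f_mf], dim=1)   # late fusion by concatenation
        return self.classifier(fused)


# Dummy forward pass: two 224x224 three-channel inputs, three output classes
if __name__ == "__main__":
    model = LateFusionCovidNet(num_classes=3)
    logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 3])
```

The same architecture can serve as both teacher and student; only the data it is trained on (labeled vs. pseudo-labeled) changes between steps of the pipeline.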
Regarding the selection of $K$, a small $K$ provides simple and clean images with little labeling noise for each class, while a larger $K$ yields less obvious and noisier images, introducing many false positives. There is therefore a significant trade-off in $K$. In our experiments, classification accuracy drops significantly when $K > 0.35$; $K = 0.25$ gives the best performance and corresponds to 75% of the available unlabeled training dataset.

The paired $(x_{uld}, \hat{y}_{uld})$ is then shown to the student model, denoted as $S$, to optimize its parameters $\theta_S$. The optimization is based on the gradient computed by back-propagation of the cross-entropy loss using an SGD optimizer:

$\theta_S^{(t+1)} = \theta_S^{(t)} - \eta \nabla_{\theta_S}\, \ell(x_{uld}, \hat{y}_{uld}; \theta_S^{(t)})$   (1)

where $\eta$ is the learning rate and $\ell(x, y; \theta)$ denotes the cross-entropy loss computed between input $x_{uld}$ and label $\hat{y}_{uld}$ with parameters $\theta$. After the student model updates its parameters using Equation 1, the parameters $\theta_S^{(t+1)}$ are further optimized on labeled samples $(x_{ld}, y_{ld})$, again minimizing the cross-entropy loss:

$\theta_S^{(t+2)} = \theta_S^{(t+1)} - \eta \nabla_{\theta_S}\, \ell(x_{ld}, y_{ld}; \theta_S^{(t+1)})$   (2)

Clearly, $\ell(x_{ld}, y_{ld}; \theta_S^{(t+1)})$ depends on $\theta_S^{(t+1)}$, which in turn depends on the pseudo label $\hat{y}_{uld}$.

Dataset: We evaluate the performance of the proposed method on the following datasets: BIMCV [12], COVIDx [25], COVID-19-AR [8], and MIDRC-RICORD-1c [23]. COVID-19-AR and MIDRC-RICORD-1c were downloaded from the Cancer Imaging Archive (TCIA) Public Access [7] and are thus denoted as the TCIA dataset. The total data collection consists of 18,691 CXR scans from 16,817 subjects. The detailed data distribution is shown in Table 1. A subset of 5,433 scans with a balanced class distribution, consisting of 1- all COVID-19 images from COVIDx, 2- a portion of the normal and pneumonia images from COVIDx, and 3- a portion of the COVID-19 images from BIMCV, was randomly selected as our evaluation dataset (Table 2). This data was split into a training dataset (4,400 scans), validation dataset (544 scans), and early stopping set (489 scans). We keep our validation set to a realistic size, as heavy hyperparameter tuning on a large validation set may have limited real-world applicability [18]. All random splits were repeated five times and average results are reported. During the random splits, the evaluation and testing data did not include scans from the same patients. We generate small labeled data by using 10%, 20%, and 30% of the training dataset and treat the rest of the training data as unlabeled examples. We would like to reiterate that 10%, 20%, and 30% of our training data correspond to 2.35%, 4.7%, and 7.06% of the full labeled data (18,691 CXR scans). Two test datasets were generated to show the robustness of our proposed approach. The remaining COVID-19 images in BIMCV plus the same number of normal and pneumonia images from COVIDx form our Test-1 dataset. All remaining images in COVIDx and all images in the TCIA dataset form our Test-2 dataset. The combined test data corresponds to 70.93% of the full dataset (18,691 CXR scans).

To evaluate the performance of our multi-feature SSL, which uses CXR(x,y) and MF(x,y) simultaneously to guide both the teacher and student models (denoted MF-TS), we compared it against three types of SSL methods: 1- SSL guided by CXR(x,y), denoted CXR-TS (this is also the baseline Billion-Scale SSL [26]); 2- SSL guided by MF(x,y), denoted Enh-TS; and 3- SSL with the teacher model guided by both CXR(x,y) and MF(x,y) and the student model guided by CXR(x,y) only, denoted MF-T.
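To make the teacher/student steps above concrete, the following is a minimal PyTorch sketch of pseudo-label generation with per-class top-$K$ retention, followed by a cross-entropy SGD step of the form of Equations 1 and 2. The loader format (yielding sample indices and CXR/MF image pairs), variable names, and helper functions are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def pseudo_label_topk(teacher, unlabeled_loader, k=0.25, device="cpu"):
    """Teacher step: predict pseudo labels and keep the top-k fraction per
    class, ranked by softmax confidence. Assumes the loader yields
    (sample_indices, (cxr_batch, mf_batch))."""
    teacher.eval()
    scores, labels, indices = [], [], []
    with torch.no_grad():
        for idx, (cxr, mf) in unlabeled_loader:
            probs = F.softmax(teacher(cxr.to(device), mf.to(device)), dim=1)
            conf, pred = probs.max(dim=1)
            scores.append(conf.cpu())
            labels.append(pred.cpu())
            indices.append(idx)
    scores, labels, indices = map(torch.cat, (scores, labels, indices))

    keep_idx, keep_lbl = [], []
    for c in labels.unique():
        mask = labels == c
        n_keep = int(k * mask.sum().item())            # top-k percent per class
        order = scores[mask].argsort(descending=True)[:n_keep]
        keep_idx.append(indices[mask][order])
        keep_lbl.append(labels[mask][order])
    return torch.cat(keep_idx), torch.cat(keep_lbl)    # pseudo-labeled subset


def student_step(student, optimizer, batch, device="cpu"):
    """One SGD step on the cross-entropy loss, applied first with the
    pseudo-labeled pairs (Eq. 1) and then with the labeled pairs (Eq. 2)."""
    student.train()
    (cxr, mf), target = batch
    loss = F.cross_entropy(student(cxr.to(device), mf.to(device)),
                           target.to(device))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```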
We also compare our method against previously developed SSL methods: Temporal Ensembling [13] and Pseudo Labeling [15]. In addition, we compare our proposed SSL method against supervised learning (SL). Three leading network architectures are incorporated: ResNet50 [11], which is trained with varying percentages of labeled CXR samples (10%, 20%, 30%, and 100% of the training data), and XNet (Xception) [6] and InceptionV4 (Inception-ResNet-V2) [21], which are trained on 100% of the labeled training data only. During the comparative evaluation, we use the same number of validation samples to tune the hyperparameters of the investigated baselines. We evaluate the proposed and all baseline methods by reporting the mean accuracy, precision, recall, and F1-score values. All proposed networks were trained for 50 epochs using early stopping [16] to avoid overfitting, with an initial learning rate of 0.001, a learning rate decay of 0.1 every 15 epochs, and mini-batches of size 32. All images were normalized to zero mean and unit variance and resized to the appropriate input size for each network during training. All techniques were implemented in Python using the PyTorch framework.

First, we notice that all the investigated methods perform better across all metrics on the Test-1 dataset than on the Test-2 dataset (Table 3). Test-2 is a more challenging dataset: it has a significantly larger number of samples than Test-1, and its COVID-19 cases come from different populations and regions. Using only 30% labeled training data, the proposed MF-TS model achieves performance equivalent to ResNet50 [11] and outperforms XNet [6] and InceptionV4 [21] on both Test-1 and Test-2 data (Table 3). For Test-1 data, ResNet50 [11] accuracy drops to 86.46% when trained on 10% labeled training data, while the proposed MF-TS achieves a significantly higher mean accuracy of 92.43% (p < 0.05, paired t-test). For Test-2 data, we observe a similarly significant (p < 0.05, paired t-test) improvement of our proposed MF-TS network architecture over ResNet50 [11] (84.29% vs. 90.08% mean accuracy). We also compared our multi-feature guided method with three SSL techniques. From Table 3, we observe that MF-TS offers a substantial improvement over Temporal Ensembling [14] for all metrics at every labeled-data ratio (Table 3). To further support the assertion that features from enhanced images are beneficial for SSL, we list two additional models, Enh-TS and MF-T. Both slightly outperform CXR-TS, as shown in Table 3; however, only MF-T performs better across all metrics compared with CXR-TS. The improvement mainly comes from pseudo-label generation: our fusion model produces more precise predictions for unlabeled samples, which benefits the training of a new student model.

We presented a novel multi-feature SSL method for classifying COVID-19 disease from CXR scans. Most prior work that evaluates SSL methods uses a subset of labeled data and a large amount of unlabeled data (70%) for training and a small subset (20%) for testing. To simulate a more realistic scenario, often found in medical imaging applications where both data acquisition and labeling are expensive, we opted to use a relatively small labeled dataset (16.48%) and validation dataset (5.53%) for training.
We show, using a small training set (23.54%) and a large testing dataset (>70%), that the proposed multi-feature SSL provides improved classification results for diagnosing COVID-19 from CXR scans compared to prior SSL and SL methods. Our results suggest the feasibility of using local-phase CXR image features for improving the success rate of SSL methods and provide a strong foundation for future developments. Future work will include more extensive evaluation and investigation of the proposed method for classifying COVID-19 disease from CT and ultrasound data using SSL methods.

References:
[1] Automatic segmentation of bone surfaces from ultrasound using a filter-layer-guided CNN
[2] Pseudo-labeling and confirmation bias in deep semi-supervised learning
[3] GraphX-Net: Chest X-ray classification under extreme minimal supervision
[4] GraphXCOVID: Explainable deep graph diffusion pseudo-labelling for identifying COVID-19 on chest X-rays
[5] SonoNet: Real-time detection and localisation of fetal standard scan planes in freehand ultrasound
[6] Xception: Deep learning with depthwise separable convolutions
[7] The Cancer Imaging Archive (TCIA): Maintaining and operating a public information repository
[8] Chest imaging representing a COVID-19 positive rural U.S. population
[9] Semi-supervised medical image classification with global latent mixing
[10] Semi-supervised learning by disentangling and self-ensembling over stochastic latent space
[11] Deep residual learning for image recognition
[12] BIMCV COVID-19+: A large annotated dataset of RX and CT images from COVID-19 patients
[13] Temporal ensembling for semi-supervised learning
[14] Temporal ensembling for semi-supervised learning
[15] Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks
[16] Early stopping without a validation set
[17] Learning with noisy labels
[18] Realistic evaluation of deep semi-supervised learning algorithms
[19] Automated detection of COVID-19 cases using deep neural networks with X-ray images
[20] Chest X-ray image phase features for improved diagnosis of COVID-19 using convolutional neural network
[21] Inception-v4, Inception-ResNet and the impact of residual connections on learning
[22] EfficientNet: Rethinking model scaling for convolutional neural networks
[23] The RSNA International COVID-19 Open Annotated Radiology Database (RICORD)
[24] Semi-supervised classification of diagnostic radiographs with NoTeacher: A teacher that is not mean
[25] COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images
[26] Billion-scale semi-supervised learning for image classification