title: Pneumonia Detection on Chest X-ray using Radiomic Features and Contrastive Learning
authors: Han, Yan; Chen, Chongyan; Tewfik, Ahmed H; Ding, Ying; Peng, Yifan
date: 2021-01-12

Chest X-ray has become one of the most common medical imaging examinations due to its noninvasiveness. The number of chest X-ray images has skyrocketed, but reading chest X-rays is still performed manually by radiologists, which creates heavy workloads, burnout, and delays. Radiomics, a subfield of radiology that extracts a large number of quantitative features from medical images, demonstrated its potential to facilitate medical imaging diagnosis before the deep learning era. With the rise of deep learning, however, the decision-making of deep neural networks on chest X-ray diagnosis remains opaque. In this study, we propose a novel framework that leverages radiomic features and contrastive learning to detect pneumonia in chest X-rays. Experiments on the RSNA Pneumonia Detection Challenge dataset show that our model achieves superior results to several state-of-the-art models (>10% in F1 score) and increases the model's interpretability.

Pneumonia is the leading cause of hospitalization in the US [1]. It requires timely and accurate diagnosis for immediate treatment. As one of the most ubiquitous diagnostic imaging tests in medical practice, chest X-ray plays a crucial role in pneumonia diagnosis in clinical care and epidemiological studies [2]. However, rapid pneumonia detection on chest X-rays is not always available, particularly in low-resource settings where there are not enough trained radiologists to interpret them. There is, therefore, a critical need to develop an automated, fast, and reliable method to detect pneumonia on chest X-rays.

With the great success of deep learning in various fields, deep neural networks (DNNs) have proven to be powerful tools that can detect pneumonia to augment radiologists [3, 4, 5, 6]. However, most DNNs lack explainability due to their black-box nature, so researchers still have a limited understanding of their decision-making process.

One method of increasing the explainability of DNNs on chest radiographs is to leverage radiomics. Radiomics is a feature transformation method for extracting clinically relevant features from radiological imaging data that are difficult for the human eye to perceive. It has proven to be a highly explainable and robust technique because it is tied to a specific region of interest (ROI) of the chest X-ray [7]. However, directly combining radiomic features and hidden image features provides only marginal benefits, mostly due to the lack of correlations at a "mid-level": it can be challenging to relate raw pixels to radiomic features. To make more efficient use of multimodal data, several recent studies have shown promising results with contrastive representation learning [8, 9]. But, to the best of our knowledge, no studies have exploited the naturally occurring pairing of images and radiomic data. In this study, we propose a framework that leverages radiomic features and contrastive learning to detect pneumonia in chest X-rays.
Our framework improves chest X-ray representations by maximizing the agreement between true image-radiomics pairs versus random pairs via a bidirectional contrastive objective between the image features and human-crafted radiomic features. Experiments on the RSNA Pneumonia Detection Challenge dataset [10] show that our method can fully utilize unlabeled data, provide a more accurate pneumonia diagnosis, and improve the transparency of the black-box model.

Our contribution in this work is three-fold: (1) We introduce a framework for pneumonia detection that combines expert radiographic knowledge (radiomic features) with deep learning. (2) We improve chest X-ray representations by exploring the use of contrastive learning; our model thus has the advantage of utilizing paired radiomic features while requiring no additional radiologist input. (3) We find that our models significantly outperform baselines in pneumonia detection with improved model explainability.

Pneumonia detection is a binary classification task that classifies a chest radiograph as either pneumonia or normal. Popular pneumonia detection datasets include the RSNA Pneumonia Detection Challenge [10] and a pediatric pneumonia diagnosis dataset (https://data.mendeley.com/datasets/rscbjbr9sj/3). Traditionally, non-image features (e.g., patient age, gender, and body temperature) and radiomic features [11] have been used for automatic chest disease classification. In recent years, many studies have explored deep neural networks (DNNs) for this task [12, 13, 14]. For instance, Rajpurkar et al. introduced CheXNet, a deep CNN trained to predict 14 diseases on chest X-rays [13]. Liang and Zheng used a Residual Neural Network (ResNet-18) [15] pre-trained on the NIH ChestX-ray14 dataset and fine-tuned on a pediatric chest X-ray dataset for pediatric pneumonia diagnosis [14].

For medical image classification, semi-supervised and unsupervised learning methods are highly beneficial because preparing annotated corpora is generally time-consuming and expensive; it also requires domain expertise and significant effort to ensure accuracy and consistency. One way to alleviate this problem is to utilize unlabeled image data. For example, Tang et al. [16] introduced a task-oriented unsupervised adversarial network, a cyclic image-to-image (I2I) translation framework, evaluated on the RSNA Pneumonia Detection Challenge and a pediatric pneumonia diagnosis dataset. Another popular trend, especially in recent years, is contrastive representation learning [17, 18, 9]. Nevertheless, directly applying these visual contrastive learning methods to medical images may be no more beneficial than pre-training models on ImageNet and fine-tuning them on medical images, mainly because medical images have high inter-class similarity [8]. Zhang et al. [8] therefore proposed to use contrastive learning to learn visual representations from radiology images and text reports by maximizing the agreement between image-text representation pairs. Different from these works, we study contrastive learning between radiomic features and convolutional neural network (CNN) features to obtain medical visual representations. As a result, our model does not require radiology text reports, which are usually not publicly available. We deem that our framework is simple yet scalable when coupled with large-scale medical image datasets.
Inspired by recent contrastive learning algorithms [8], our model learns representations by maximizing agreement between radiomic features related to the pneumonia ROI of the chest X-ray and the image features extracted by an attention-based convolutional neural network (CNN), via a contrastive loss in the latent space. Since radiomic features can be considered quantified prior knowledge of radiologists, we deem our model more interpretable than others. As illustrated in Figure 1, our framework consists of three phases: contrastive training, supervised fine-tuning, and testing.

Contrastive training. The model is given two inputs, $x_u$ and $x_v$: $x_u$ is the original chest X-ray without a paired bounding box, and $x_v$ is the original chest X-ray with an additional paired bounding box. For normal chest X-rays, we take the whole image as the bounding box. For $x_u$, we utilize an attention-based CNN, the Residual Attention Network (ResNet-18Att) [19] pre-trained on CIFAR-10 [20], as the backbone of the network. We replace the last fully-connected layer with a multilayer perceptron (MLP) to generate a 128-dimensional image feature vector $u$. For $x_v$, we apply PyRadiomics to extract 102 quantitative radiomic features; [21] details these features and the extraction process. We then use an MLP to map these features to a 128-dimensional radiomics feature vector $v$.

At each training epoch, we sample a mini-batch of $N$ input pairs $(X_u, X_v)$ from the training data and compute their image-radiomics feature pairs $(U, V)$. We use $(u_i, v_i)$ to denote the $i$-th pair. The training loss function is divided into two parts. The first part is a contrastive image-to-radiomics loss:

$$\ell_i^{(u \to v)} = -\log \frac{\exp(-\langle u_i, v_i \rangle / \tau)}{\sum_{k=1}^{N} \exp(-\langle u_i, v_k \rangle / \tau)},$$

where $\langle u_i, v_j \rangle$ represents the pairwise distance $\| u_i - v_j \|_p = \big[ \textstyle\sum (u_i - v_j)^p \big]^{1/p}$, $p$ represents the norm degree (e.g., $p = 1$ and $p = 2$ give the taxicab and Euclidean norms, respectively), and $\tau \in \mathbb{R}^{+}$ represents a temperature parameter; the negative sign converts the distance into an agreement score, so that minimizing the loss pulls true pairs together. In our model, we set $p$ to 2 and $\tau$ to 0.1. Like previous work [8], which uses a contrastive loss between inputs of different modalities, our image-to-radiomics contrastive loss is asymmetric for each input modality. We thus define a similar radiomics-to-image contrastive loss as:

$$\ell_i^{(v \to u)} = -\log \frac{\exp(-\langle v_i, u_i \rangle / \tau)}{\sum_{k=1}^{N} \exp(-\langle v_k, u_i \rangle / \tau)}.$$

Our final loss is then computed as a weighted combination of the two losses averaged over all pairs in each mini-batch:

$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \left( \lambda \, \ell_i^{(u \to v)} + (1 - \lambda) \, \ell_i^{(v \to u)} \right),$$

where $\lambda \in [0, 1]$ is a scalar weight.

Supervised fine-tuning. We follow the work of Zhang et al. [8] by fine-tuning the CNN weights and the MLP blocks together, which closely resembles how pre-trained CNN weights are used in practical applications. In this process, the loss function is the cross-entropy loss

$$\mathcal{L}_{ce} = -\sum_i y_i \log \hat{y}_i,$$

where $y$ and $\hat{y}$ represent the true and predicted disease labels, respectively.

Testing. The model is given only one input, the original chest X-ray $x_u$ without a paired bounding box. Image features are extracted and then mapped into the 128-dimensional feature representation $u$. Finally, the predicted output is calculated based on $u$.
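To make the contrastive objective concrete, the following is a minimal PyTorch sketch of the bidirectional loss defined above. It assumes mini-batch feature matrices u and v of shape (N, 128) produced by the image backbone's MLP head and the radiomics MLP, respectively; the function name, the default value of lam (the weight λ), and the use of torch.cdist are our illustrative choices, not details specified in the paper.

```python
import torch
import torch.nn.functional as F

def bidirectional_contrastive_loss(u, v, tau=0.1, lam=0.5, p=2):
    """Contrastive pre-training loss over a mini-batch of N paired
    features: u (image, N x 128) and v (radiomics, N x 128).
    True image-radiomics pairs sit on the diagonal of the
    pairwise-distance matrix."""
    # <u_i, v_j>: pairwise p-norm distance between every u_i and v_j.
    dist = torch.cdist(u, v, p=p)                       # shape (N, N)
    # Negate so that *small* distances (high agreement) get high logits.
    logits = -dist / tau
    targets = torch.arange(u.size(0), device=u.device)  # diagonal = true pairs
    loss_u2v = F.cross_entropy(logits, targets)         # image -> radiomics
    loss_v2u = F.cross_entropy(logits.t(), targets)     # radiomics -> image
    return lam * loss_u2v + (1.0 - lam) * loss_v2u
```

During contrastive training, the CNN backbone and both MLP projection heads would be optimized jointly against this loss before the supervised fine-tuning stage begins.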
To evaluate the performance of our proposed model, we conducted experiments on a public Kaggle dataset, the RSNA Pneumonia Detection Challenge [10]. It contains 30,227 frontal-view images, of which 9,783 have pneumonia with a corresponding bounding box. We used 75% of the images for training and fine-tuning and 25% for testing. We used SGD as our optimizer with an initial learning rate of 0.1. We iterated the training and fine-tuning process for 200 epochs with a batch size of 64 and stopped early if the loss did not decrease. We report accuracy, F1 score, and the area under the receiver operating characteristic curve (AUC).

We compared four models: (1) ResNet-18; (2) ResNet-18 with radiomic features (ResNet-18Radi); (3) ResNet-18 with the attention mechanism (ResNet-18Att); and (4) ResNet-18Att with radiomic features (ResNet-18AttRadi). Experimental results are shown in Table 1. Compared with the baseline models (ResNet-18 and ResNet-18Att), our radiomics-based models (ResNet-18Radi and ResNet-18AttRadi) achieved better performance on the pneumonia/normal binary classification task. This suggests that radiomic features provide additional strengths over the image features extracted by the CNN model. Comparing ResNet-18Att with ResNet-18 and ResNet-18AttRadi with ResNet-18Radi, we observed that the attention mechanism effectively boosts classification accuracy. This supports our hypothesis that pneumonia is often related to specific ROIs of chest X-rays; the attention mechanism makes it easier for the CNN model to focus on those regions.

Figure 2 shows the training and fine-tuning loss convergence of the ResNet-18AttRadi model on the training set. We find that the loss drops rapidly within just a few epochs of the pre-training stage, revealing that contrastive learning enables the model to learn to extract image features quickly and effectively.

To fairly evaluate the impact of radiomic features without ROI information, we conducted additional experiments using the whole image as the bounding box when extracting radiomic features, denoted ResNet-18FairRadi and ResNet-18AttFairRadi. Table 2 shows that even without ROIs, radiomic features improve the performance of the deep learning model by 5% in F1 score. This observation further demonstrates the value of combining radiomic features with a deep learning model for reading chest X-rays.

Figure 3 shows an original chest X-ray with a bounding box and the attention maps of the final attention layers of ResNet-18Att and ResNet-18AttRadi, respectively. These examples suggest that our ResNet-18AttRadi model focuses on a more accurate area of the chest X-ray, whereas ResNet-18Att attends to almost the whole image and contains plenty of attention noise. This illustrates that contrastive learning can help the model learn to attend to the clinically relevant regions.
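For reference, here is a short sketch of how the three reported metrics could be computed with scikit-learn. It is an illustration assuming a two-way classification head and a standard PyTorch test DataLoader; the function name, the 0.5 decision threshold, and the loader interface are our assumptions rather than details from the paper.

```python
import numpy as np
import torch
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def evaluate(model, test_loader, device="cuda"):
    """Compute accuracy, F1 score, and AUC for a binary
    pneumonia/normal classifier over a test DataLoader."""
    model.eval()
    all_probs, all_labels = [], []
    with torch.no_grad():
        for images, labels in test_loader:
            logits = model(images.to(device))           # (B, 2) class logits
            probs = torch.softmax(logits, dim=1)[:, 1]  # P(pneumonia)
            all_probs.append(probs.cpu().numpy())
            all_labels.append(labels.numpy())
    probs = np.concatenate(all_probs)
    labels = np.concatenate(all_labels)
    preds = (probs >= 0.5).astype(int)                  # 0.5 decision threshold
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds),
        "auc": roc_auc_score(labels, probs),            # AUC uses raw scores
    }
```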
In this work, we present a novel framework that combines radiomic features and contrastive learning to detect pneumonia from chest X-rays. Experimental results showed that our proposed models achieve superior performance to baselines. We also observed that our model benefits from the attention mechanism, which highlights the ROIs of chest X-rays. There are two limitations to this work. First, we evaluated our framework on only one deep learning model (ResNet); we plan to assess the effect of radiomic features on other DNNs in the future. Second, our model relies on bounding box annotations during the training phase; we plan to leverage weakly supervised learning to automatically generate bounding boxes on large-scale datasets to ease the expert annotation process. In addition, we will compare contrastive learning with multitask learning to further exploit the integration of radiomics with deep learning.

While our work only scratches the surface of contrastive learning using radiomics knowledge in the medical domain, we hope it will shed light on the development of explainable models that can efficiently use domain knowledge for medical image understanding.

This research study was conducted retrospectively using publicly available human subject data [10].

This project was supported by the National Library of Medicine under award number 4R00LM013001 and an Amazon Machine Learning Grant. Ying Ding receives research support from Amazon. Yifan Peng is a co-inventor on patents awarded and pending.

References
[1] Community-acquired pneumonia requiring hospitalization among US adults.
[2] Automated abnormality classification of chest radiographs using deep convolutional neural networks.
[3] ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases.
[4] Identifying pneumonia in chest X-rays: A deep learning approach.
[5] COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images.
[6] Identifying medical diagnoses and treatable diseases by image-based deep learning.
[7] Development and clinical application of radiomics in lung cancer.
[8] Contrastive learning of medical visual representations from paired images and text.
[9] A simple framework for contrastive learning of visual representations.
[10] Augmenting the National Institutes of Health chest radiograph dataset with expert annotations of possible pneumonia.
[11] Radiomics: Images are more than pictures, they are data.
[12] Intelligent pneumonia identification from chest X-rays: A systematic literature review.
[13] CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning.
[14] A transfer learning method with deep residual network for pediatric pneumonia diagnosis.
[15] Deep residual learning for image recognition.
[16] TUNA-Net: Task-oriented unsupervised adversarial network for disease recognition in cross-domain chest X-rays.
[17] Data-efficient image recognition with contrastive predictive coding.
[18] Momentum contrast for unsupervised visual representation learning.
[19] Residual attention network for image classification.
[20] Learning multiple layers of features from tiny images.
[21] Computational radiomics system to decode the radiographic phenotype.