key: cord-0448309-i5vaveci
authors: Bhimireddy, Ananth Reddy; Burns, John Lee; Purkayastha, Saptarshi; Gichoya, Judy Wawira
title: Few-Shot Transfer Learning to improve Chest X-Ray pathology detection using limited triplets
date: 2022-04-16
journal: nan
DOI: nan
sha: e46a6984e9c33c60165ed6d44d84496be3fd7a6a
doc_id: 448309
cord_uid: i5vaveci

Deep learning approaches applied to medical imaging have reached near-human or better-than-human performance on many diagnostic tasks. For instance, the CheXpert competition on detecting pathologies in chest x-rays has shown excellent multi-class classification performance. However, training and validating deep learning models requires extensive collections of images, and the resulting models still produce false inferences, as identified by a human-in-the-loop. In this paper, we introduce a practical approach to improve the predictions of a pre-trained model through Few-Shot Learning (FSL). After training and validating a model, a small number of false inference images are collected to retrain the model using Image Triplets: a false positive or false negative, a true positive, and a true negative. The retrained FSL model produces considerable gains in performance with only a few epochs and few images. In addition, FSL opens rapid retraining opportunities for human-in-the-loop systems, where a radiologist can relabel false inferences, and the model can be quickly retrained. We compare our retrained model performance with existing FSL approaches in medical imaging that train and evaluate models at once.

Improvements in deep learning algorithms and the availability of large annotated datasets have been critical to the gains achieved in computer vision tasks, including segmentation and classification. However, applying these algorithms to medical imaging remains a challenge, as computer vision algorithms usually require large, well-annotated datasets Rajpurkar et al. (2017).
In medical practice, annotated datasets are expensive to curate Willemink et al. (2020), limited by HIPAA/GDPR and other regulations Armstrong et al. (2005), and often focused on a specific medical condition, which limits generalizability. Deep learning approaches that perform classification and segmentation tasks require large annotated datasets. Usually, a deep neural network passes over this large dataset multiple times (each pass is called an epoch), continuously adjusting the weights of the network nodes during training. An alternative to this traditional (big-shot) approach, which many researchers pursue under the terms few-shot learning (FSL), one-shot learning, and zero-shot learning Peng et al. (2015), is to use fewer epochs or fewer images in neural network training Ravi and Larochelle (2016). In medical imaging, the big-shot approach is tedious, labor-intensive, and thus expensive because radiologists' time is costly Chartrand et al. (2017).

Few-Shot Learning on label-limited datasets in medical imaging is not new Prabhu et al. (2019); Medela et al. (2019); Paul et al. (2021b). In our proposed approach, we work with limited data and inter-annotator differences. We evaluate our approach using the CheXpert Irvin et al. (2019) chest X-ray dataset. Our approach of blending an FSL algorithm with image triplets, which we call Triplet Few-Shot Learning (TFSL), is practical and novel. An Image Triplet is a set of three images: a false positive (FP) or false negative (FN) generated by the model, a true positive (TP), and a true negative (TN).

FSL is a form of meta-learning that allows generalization to new classes from small datasets. FSL has been applied in medical imaging for many tasks. In an ISBI 2019 paper, Medela et al. (2019) applied FSL to reduce the need for labeled biological datasets in histopathology. Ali et al. (2020) applied it to classify endoscopy images.
More recently, FSL has been applied to interactive training during segmentation, similar to our vision of using it in human-in-the-loop situations Feng et al. (2021), to COVID-19 diagnosis from CT scans, and to detecting rare pathologies in fundus images that were collected for a different purpose, such as Diabetic Retinopathy screening Quellec et al. (2020). FSL can perform better than other frameworks, such as transfer learning and Siamese networks, at detecting rare conditions that are usually represented by few images Quellec et al. (2020); Feyjie et al. (2020); Kim et al. (2017).

The TFSL algorithm is designed using MarginRankingLoss to reduce the number of false inferences made by the model. It is built on the best-performing pre-trained model submitted to the CheXpert competition. The pre-trained model used in the experiments in this paper is trained on CheXpert data and is available at https://github.com/gaetandi/cheXpert. Running inference with this model provides the baseline.

The CheXpert dataset is used to train and evaluate the TFSL approach on image triplets. The first image in each image triplet is randomly selected from the failed inference images of the baseline model. The inference failure can be either an FP or an FN. The second image in the image triplet is a TP and the third image is a TN. While collecting the training image triplets (Fig 1), a checking label is also collected. The checking label is -1 if the first image of the triplet is an FN and 1 if it is an FP. The choice of -1 and 1 as labels is explained in Section 3.4. We tested sets of 50, 100, and 150 image triplets; the fine-tuned model improved on the baseline model at 150 image triplets. The 150 randomly selected image triplets were used to train the TFSL algorithm, and all remaining failed inference images (those not used for training) were collected and used to validate the algorithm.
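The triplet collection described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' released code: the `Prediction` record, the `build_triplets` helper, and the choice to sample the TP and TN images uniformly at random are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    """One baseline inference for a single pathology (hypothetical record)."""
    image_id: str
    label: int      # ground truth: 1 = pathology present, 0 = absent
    predicted: int  # thresholded baseline output: 1 or 0


def outcome(p: Prediction) -> str:
    """Classify a prediction as TP, TN, FP, or FN."""
    if p.label == 1:
        return "TP" if p.predicted == 1 else "FN"
    return "FP" if p.predicted == 1 else "TN"


def build_triplets(
    preds: List[Prediction], n_triplets: int, seed: int = 0
) -> Tuple[List[Tuple[str, str, str, int]], List[Prediction]]:
    """Return (triplets, held_out_failures).

    Each triplet is (failed_image, tp_image, tn_image, checking_label), where
    the checking label is 1 for an FP and -1 for an FN, as in the text.
    Failures not used for training are returned for validation.
    """
    rng = random.Random(seed)
    groups = {"TP": [], "TN": [], "FP": [], "FN": []}
    for p in preds:
        groups[outcome(p)].append(p)

    failures = groups["FP"] + groups["FN"]
    rng.shuffle(failures)  # random selection of failed inferences
    chosen, held_out = failures[:n_triplets], failures[n_triplets:]

    triplets = []
    for failed in chosen:
        check = 1 if outcome(failed) == "FP" else -1
        tp = rng.choice(groups["TP"])  # assumption: uniform random TP
        tn = rng.choice(groups["TN"])  # assumption: uniform random TN
        triplets.append((failed.image_id, tp.image_id, tn.image_id, check))
    return triplets, held_out
```

Calling `build_triplets(preds, n_triplets=150)` would mirror the 150-triplet setting reported above, with the second return value playing the role of the validation set of remaining failed inferences.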
False inference images are randomly selected from all the failed inference images.

The pre-trained model uses the DenseNet121 architecture to classify pathologies from an image. Before training, images are converted to RGB, resized to 320x320, and then cropped to 224x224. The image data is loaded through a PyTorch DataLoader. The Adam optimizer and BCELoss (Binary Cross-Entropy Loss) are used to build the pre-trained model. Inference (evaluation) of the pre-trained model was used as the baseline model in this paper. We also implemented the FSL algorithm by following the guidelines outlined by Cermelli et al. (2020) and refer to it as Incremental FSL Training in this paper.

The pre-trained classification model was modified by replacing the final layer (a Linear layer and Sigmoid activation function) with a Linear layer of 128 units and a PReLU (Parametric Rectified Linear Unit) He et al. (2015) activation function to create 128-dimensional vectors for every image in the image triplets. The architecture of dataset creation and modeling for the TFSL algorithm is summarized in Fig 3.

Each image in the image triplet was transformed into a 128-dimensional vector. These 128-dimensional vectors are used to calculate the distance between the images. If the false inference image is an FN, then the image should be closer to the TP image from the triplet in the n-dimensional space; conversely, if the false inference is an FP, then the image should be closer to the TN image from the triplet. We use the pre-trained classification model to create the n-dimensional vectors. The TFSL model is trained for five epochs using the Adam optimizer, a learning rate of 0.0001, and a weight decay of 1e-5. Positive Predictive Value (PPV) and Negative Predictive Value (NPV) are used as evaluation metrics. Margin Ranking Loss is used as the loss function.
The training approach is slightly different from the Incremental FSL approach, in which the model was trained and validated on all 14 pathologies at once.

The Margin Ranking Loss function takes two inputs x1, x2 and a label y, where y = 1 indicates that the first input should be ranked higher (take the larger value) and y = -1 indicates that the first input should be ranked lower. We chose -1 and 1 as labels while creating training image triplets to match this convention. The Euclidean distance between the first image vector and the second image vector is x1, and the Euclidean distance between the first image vector and the third image vector is x2. The mathematical form of the Margin Ranking Loss function is given in (1), where x1, x2 are the loss inputs and y is the label tensor.

$$\mathrm{loss}(x_1, x_2, y) = \max\bigl(0,\ -y \cdot (x_1 - x_2) + \mathrm{margin}\bigr) \tag{1}$$

PPV and NPV are chosen as evaluation metrics for the FSL because they help identify the decrease in False Positives and False Negatives, respectively. Since PPV = TP / (TP + FP), PPV increases as FPs are converted to TNs; since NPV = TN / (TN + FN), NPV increases as FNs are converted to TPs. All the models designed, trained, and evaluated in this paper use PyTorch. We open-sourced the dataset creation, training, and validation code.

We compare the baseline DenseNet model to the TFSL model and the Incremental FSL model. The results are provided in Table 1. The baseline model has high PPV and low NPV values. The TFSL model reduced FP and FN outputs, as indicated by an increase in PPV and NPV, respectively. A statistical test showed that both the TFSL and Incremental FSL results improved over the baseline results. A similar test comparing the TFSL and Incremental FSL algorithms showed that their PPV values did not differ significantly. The Incremental FSL NPV was higher than the TFSL NPV, with a statistical significance value (p-value) of 0.007. We used the dependent (paired) t-test from the stats module of the SciPy package to perform the above statistical tests.
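To make Equation (1) and the two metrics concrete, the following plain-Python sketch evaluates the loss on hand-picked distances and computes PPV and NPV from confusion counts. The distance values, the margin of 0.5, and the counts are illustrative assumptions; the paper does not state the margin it used. Averaging over the batch mirrors the default reduction of PyTorch's `nn.MarginRankingLoss`.

```python
from typing import Sequence


def margin_ranking_loss(
    x1: Sequence[float], x2: Sequence[float], y: Sequence[int], margin: float = 0.0
) -> float:
    """Mean over the batch of max(0, -y * (x1 - x2) + margin), as in Eq. (1)."""
    terms = [
        max(0.0, -label * (a - b) + margin)
        for a, b, label in zip(x1, x2, y)
    ]
    return sum(terms) / len(terms)


def ppv(tp: int, fp: int) -> float:
    """Positive Predictive Value: TP / (TP + FP); NaN if nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else float("nan")


def npv(tn: int, fn: int) -> float:
    """Negative Predictive Value: TN / (TN + FN); NaN if nothing was predicted negative."""
    return tn / (tn + fn) if (tn + fn) else float("nan")


# x1 = distance(failed image, TP image), x2 = distance(failed image, TN image).
# An FN (label -1) already closer to the TP image incurs no loss:
fn_good = margin_ranking_loss([0.2], [0.9], [-1], margin=0.5)
# An FN that sits closer to the TN image is penalised:
fn_bad = margin_ranking_loss([0.9], [0.2], [-1], margin=0.5)
# An FP (label 1) already closer to the TN image incurs no loss:
fp_good = margin_ranking_loss([0.9], [0.2], [1], margin=0.5)
```

With these illustrative numbers, only the misplaced FN contributes a positive loss, which is the behaviour the checking labels -1 and 1 are meant to produce.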
The time taken to train and validate the TFSL model on any pathology is around 8-9 minutes on an NVIDIA Quadro P6000 GPU with 24 GB of memory. This allows us to label and retrain the model rapidly. The Incremental FSL algorithm took the same amount of time (8-9 minutes) to train and validate on any pathology.

FSL architectures are growing within medical imaging Paul et al. (2021a). This work builds on FSL architecture through the use of triplets, as shown in Figure 2. TFSL requires less training data and time compared to previous approaches. While a two-step process of using a saliency-based classifier with a discriminative autoencoder ensemble Paul et al. (2021a) has better performance in FSL compared to our approach, the simplicity and speed of our approach are important advantages. Our approach for fine-tuning models can be taken to edge devices, as has been shown in the non-medical imaging domain Lungu et al. (2020).

Additionally, all previous FSL architectures consider a single ground truth label for each image. Image labeling can vary, particularly among radiologists annotating studies Cabitza et al.; Saha et al. Our approach can use triplets determined by the annotations of the human-in-the-loop and train a model that is specific to these new labels. Thus, the TFSL approach can handle the rapid retraining required on false inference images as determined by the user radiologist.

The baseline model frequently failed to identify true negative inference images in pathologies such as Enlarged Cardiomediastinum, Lung Lesion, Consolidation, Pleural Other, and Fracture, and at other times failed to identify true positives. Fine-tuning in TFSL and Incremental FSL substantially improved the identification of TP and TN inference images. A TFSL human-in-the-loop system can be retrained easily through transfer learning, which other approaches make difficult.
In this paper, we presented a comparison of results between the baseline and fine-tuned models, providing evidence that the TFSL algorithm outperformed the baseline model. The use of Margin Ranking Loss as the loss function, performance gains on limited datasets with quick retraining, and the simplicity of our approach are important characteristics of the TFSL approach. In the future, we plan to test this approach on different modalities and on non-medical/natural image datasets.

In summary, the major contributions of our paper are as follows:
1. We present a modified Few-Shot Learning algorithm to effectively improve the results of predicting pathologies on images whose inferences failed.
2. We present a comparison of results between a fine-tuned model, our Few-Shot Learning model, and a previously published few-shot learning algorithm trained in an incremental fashion. Our model outperforms the fine-tuned model and achieves a higher NPV in all classes, with performance close to that of the Incremental FSL. We demonstrate that good performance can be obtained at lower computation.
3. We experiment with the MarginRankingLoss and TripletMarginLoss functions as loss functions. Despite the assumption that TripletMarginLoss would perform better for image triplets, we found that MarginRankingLoss is more appropriate for our use case. This is not described in any of the previous works.
4. Previous works have evaluated Few-Shot Learning on fewer classes within one or multiple datasets. We present our experiments using the CheXpert dataset to improve the predictions on 14 pathologies on chest X-rays.
Institutional Review Board (IRB): The research work performed in this paper does not require IRB approval as the data is an open-source dataset.

Figure 1: Pneumothorax detection - triplets

References
Additive angular margin for few shot learning to classify clinical endoscopy images
Potential Impact of the HIPAA Privacy Rule on Data Collection in a Registry of Patients With Acute Coronary Syndrome
Bridging the "last mile" gap between AI implementation and operation: "data awareness" that matters
A few guidelines for incremental few-shot segmentation
Umapada Pal, and David Doermann. Face recognition - a one-shot learning perspective
Deep learning: a primer for radiologists. Radiographics
Self-supervised learning for few-shot image classification
Momentum contrastive learning for few-shot COVID-19 diagnosis from chest CT images
An overview of multi-modal medical image fusion
Interactive few-shot learning: Limited supervision, better medical image segmentation
Semi-supervised few-shot learning for medical image segmentation
Fully convolutional structured LSTM networks for joint 4D medical image segmentation
Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification
Leveraging the feature distribution in transfer-based few-shot learning
CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison
Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation
Few-shot learning using a small-sized dataset of high-resolution fundus images for glaucoma diagnosis
Siamese networks for few-shot learning on edge embedded devices
Few-shot learning with global class representations
Few shot learning in histopathological images: reducing the need of labeled data on biological datasets
Exploring uncertainty measures in deep networks for multiple sclerosis lesion detection and segmentation
Discriminative ensemble learning for few-shot chest x-ray diagnosis
Discriminative ensemble learning for few-shot chest x-ray diagnosis
Learning deep object detectors from 3D models
Few-shot learning for dermatological disease diagnosis
Automatic detection of rare pathologies in fundus photographs using few-shot learning
CheXNet: Radiologist-level pneumonia detection on chest x-rays with deep learning
Optimization as a model for few-shot learning
Breast cancer MRI radiomics: An overview of algorithmic features and impact of inter-reader variability in annotating tumors
f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks
Preparing medical imaging data for machine learning