key: cord-0563564-3g5z8s8p
authors: Karnes, Michael; Perera, Shehan; Adhikari, Srikar; Yilmaz, Alper
title: Adaptive Few-Shot Learning PoC Ultrasound COVID-19 Diagnostic System
date: 2021-09-08 journal: nan DOI: nan
sha: f7010fadcec96735b66f2ccacf85a0ccc40848aa doc_id: 563564 cord_uid: 3g5z8s8p

This paper presents a novel ultrasound imaging point-of-care (PoC) COVID-19 diagnostic system. The adaptive visual diagnostics utilize few-shot learning (FSL) to generate encoded disease state models that are stored and classified using a dictionary of knowns. The novel vocabulary-based feature processing of the pipeline adapts the knowledge of a pretrained deep neural network to compress the ultrasound images into discriminative descriptions. The computational efficiency of the FSL approach enables high diagnostic deep learning performance in PoC settings, where training data is limited and the annotation process is not strictly controlled. The algorithm's performance is evaluated on the open-source COVID-19 POCUS Dataset to validate the system's ability to distinguish COVID-19, pneumonia, and healthy disease states. The results of the empirical analyses demonstrate the efficiency and accuracy appropriate for scalable PoC use. The code for this work will be made publicly available on GitHub upon acceptance.

Coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has rapidly become a global health emergency [1]. As of December 2020, the virus had spread to every country, infecting more than 69 million people and resulting in 1.5 million deaths worldwide [2]. Insufficient medical resources have become a major challenge, especially in low-income countries. There is a critical need for fast, accessible, and low-cost diagnostic tests in point-of-care (PoC) settings to stratify risk and efficiently allocate limited healthcare resources.

SARS-CoV-2 reverse transcriptase-polymerase chain reaction (RT-PCR) is the current diagnostic gold standard worldwide [3]. It has an estimated sensitivity of 75% and can take several days to return results [4], [5]. While chest X-ray is more widely available, its utility is limited by its low sensitivity [6]. Computed tomography (CT) has been viewed as an alternative for the diagnosis of COVID-19 [7]. However, due to CT's ionizing radiation and limited availability outside of large hospitals, it is not an optimal screening tool. A rapid, accurate, and inexpensive screening tool is required for appropriate triage and diagnosis of patients with suspected COVID-19.

The use of bedside lung ultrasound (LUS) is a common practice in a wide variety of clinical settings, including emergency departments and intensive care units [8]. There have been several studies evaluating the use of LUS in patients with suspected COVID-19 infection, with one reporting a sensitivity of 90%, higher than that of X-ray [9], [10]. LUS can help identify patients with COVID-19 who have been incorrectly diagnosed by RT-PCR and prevent further spread. In addition to screening, LUS has been shown to be an effective imaging modality for predicting the course, stratifying the risk, and monitoring the COVID-19 disease state [11]. The characteristic LUS findings of COVID-19 (a thickened or irregular pleural line, confluent B-lines, sub-pleural consolidations, and pleural effusions) demonstrate promise in trending clinical progression from onset to resolution [12].
Thus, LUS is a reliable, cost-effective, and easy-to-use tool for rapid triage, diagnosis, and early risk stratification of COVID-19.

The primary limitation of LUS diagnostics is the extensive training, experience, and expertise required for the accurate identification of disease characteristics [13], [14]. The ability to accurately interpret LUS images requires recognition of normal sonographic anatomy, normal variants, and pathology. As a result, LUS diagnostics are kept from their full use in point-of-care (PoC) settings. This creates a need for additional technologies to aid healthcare providers in interpreting LUS images.

Machine learning (ML) algorithms are one such technology. Originating from the field of pattern recognition, ML provides the framework for extracting coherent patterns from high-dimensional noisy data. This is especially true for deep neural networks (DNN). In 2012, the power of the DNN was established with the record-breaking performance of AlexNet, which achieved a top-1 accuracy of 63.3% on a 1000-class classification problem [15]. This breakthrough was made possible by the collection of a large annotated dataset, ImageNet, with millions of images, and by developments in computational power. Shortly after, DNNs became larger and more complex, improving their performance each year, with a current record of 88.6% top-1 accuracy [16]. These results solidified the position of the DNN as a top approach for visual classification.

The high performance of DNN visual classification comes with a critical caveat: the training set must sufficiently represent the scenarios seen during testing. This means large annotated training sets, limited applicability, and unpredictable errors [17]. In response, there has been an effort to reduce training set sizes and improve generalization by transferring the knowledge of a DNN pretrained on a large dataset to novel applications, commonly referred to as transfer learning [18]. The primary advantage of this approach is that the parameters of the DNN are frozen while adaptive layers are trained, which significantly reduces the number of trained parameters and the number of required training examples. This is important for applying ML to ultrasound datasets. The availability of annotated ultrasound datasets is increasing, with some reaching tens of thousands of images. However, the majority of ultrasound datasets contain fewer than 300 images [19].

The contribution of this work is the introduction of a novel LUS diagnostic system built on a few-shot learning (FSL) visual classification algorithm. The proposed system has low training requirements, needing as few as 8 images per class, while traditional DNN approaches require thousands. The results of this study demonstrate the ability of the FSL-based system to extend the accessibility of rapid LUS diagnostics to resource-limited clinics.

The proposed ultrasound COVID-19 diagnostic system is based on a state-of-the-art (SOTA) DNN visual classification algorithm that significantly reduces the training requirements associated with traditional deep learning. DNNs are large regressed embedding transforms. In the classification problem, the DNN is trained to project the image space to the latent space that minimizes classification error. The learned transform filters within the network possess the network's knowledge domain. Many approaches have been taken to adapt this learned knowledge domain to novel tasks (a minimal sketch of the frozen-backbone transfer pattern follows).
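The sketch below illustrates the general transfer learning pattern described above: a pretrained backbone is frozen and only a small adaptive head is trained. It assumes TensorFlow/Keras and the ImageNet-pretrained MobileNet weights, and it is an illustration only, not the proposed system, which avoids DNN training entirely (see the training description below).

```python
import tensorflow as tf

# Frozen-backbone transfer learning sketch: the class count and input
# size are illustrative assumptions, not values from the paper.
backbone = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3), include_top=False, pooling="avg",
    weights="imagenet")
backbone.trainable = False  # freeze all pretrained parameters

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(3, activation="softmax"),  # the only trained layer
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)
```

Freezing the backbone leaves only the final dense layer's parameters to train, which is what makes small annotated datasets tractable for this family of approaches.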
These approaches can take the form of fine tuning [20]-[22], where the learned state is used to initialize the DNN and only the final layers of the network are trained [23], or of direct transformations of the DNN knowledge domain to a novel task [24], [25]. The algorithm within the proposed system falls in the domain adaptation category, producing a direct transform of the DNN latent space to a targeted discriminative feature space.

FSL provides a framework for leveraging the knowledge domain of pretrained networks for novel tasks. In its basic form, high-dimensional images are encoded into a metric feature space and then classified by their relations to learned reference points, as shown in Figure 1.

Fig. 1. The few-shot visual diagnostic task classifies a query image with respect to a small annotated set of reference images. The ultrasound images are imported from the handheld probe to the computer, encoded, and then classified by their distances to a dictionary of knowns. The clinician is presented with a report of each disease state probability, the distances from the reference points, and an attention heat map highlighting the regions of interest.

FSL follows a long line of metric-based learning, with many recent works focusing on incorporating DNNs. One early example is the Siamese network architecture [26]. This was further developed by the Matching network [27], where the DNN was trained to estimate the set-to-set probability. In 2017, Prototypical-Net [22] trained a DNN to directly generate a discriminative feature embedding space.

The application of ML in ultrasound diagnostics has been growing rapidly, but it remains limited when compared to other imaging modalities. These pioneering studies address such tasks as tumor detection, fetal health monitoring, and cardiac monitoring [19], [28]. The onset of COVID-19 created a push for rapid pulmonary diagnostics. These developments have successfully utilized MRI, CT, and LUS images to detect COVID-19 characteristics in patients' lungs. To the best of our knowledge, there have been only four studies applying ML to LUS diagnostics [29], [30], and only two focused on the detection of COVID-19 [31], [32].

This work differs from the current SOTA in three major ways due to the direct consideration of clinical usage during development, primarily the ability to adapt to individual practices and to provide information in a way that is easily incorporated into the greater corpus of information used in the diagnostic process. These differences are: 1) we present a full ML LUS diagnostic system; 2) we incorporate a vocabulary-based FSL pipeline enabling a significant reduction in training requirements; and 3) our system generates intuitively understandable distance-based classifications.

This section presents the methodology behind the proposed approach. The flow chart of the algorithm is shown in Figure 2. First, the theory and problem formulation of FSL are presented. This is followed by an explanation of the feature extraction and classification processes, and the section concludes with the algorithm training process.

Consider a dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^{d_{img}}$ with labels $y_i \in \{1, \dots, C\}$. The FSL approach uses a sub-sample $D_\tau \subseteq D$ to create a set of support (reference) images, $S_\tau$, and a set of query images to be classified, $Q_\tau$. FSL requires a few assumptions on the relationships between the sets $D$, $D_\tau$, $S_\tau$, and $Q_\tau$: $D_\tau$ must be a sub-sample of classes, each with a specified number of samples, $k$, and $S_\tau$ and $Q_\tau$ must be split such that the classes in the query set are represented in the support set. A minimal episode-sampling sketch is given below.
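The following is a minimal NumPy sketch of this episodic split; the function name, array layout, and the n_query parameter are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def sample_episode(images, labels, classes, k, n_query, rng=None):
    """Sample a support set S_tau (k images per class) and a query set
    Q_tau whose classes all appear in the support set."""
    if rng is None:
        rng = np.random.default_rng()
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))
        support.append(idx[:k])            # k reference images of class c
        query.append(idx[k:k + n_query])   # held-out queries of class c
    s_idx, q_idx = np.concatenate(support), np.concatenate(query)
    return (images[s_idx], labels[s_idx]), (images[q_idx], labels[q_idx])
```

For example, `sample_episode(images, labels, classes=[0, 1], k=8, n_query=20)` mirrors the smallest 8-shot training budget evaluated in the experiments.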
The FSL objective is to find a function $f(S_\tau, Q_\tau)$ that best estimates the query set labels, $y^*$, with best being defined as the minimizer of the expected query classification error:

$$f^* = \arg\min_f \; \mathbb{E}_{(x_i, y_i) \in Q_\tau} \left[ \mathcal{L}\big(f(S_\tau, x_i),\, y_i\big) \right]. \quad (1)$$

Deep neural image classification networks are trained to estimate the class probabilities of a given image using a final softmax() output layer. This process generates learned feature-extracting filters corresponding to the most discriminative features of the training set. Assuming the learned filters are sufficiently generalized, a query image can be effectively encoded in the network's latent feature space.

The class probabilities can also be viewed as a Gaussian mixture model (GMM) of class characteristics in the latent manifold [24]. This view treats the image as an instance of characteristics drawn from a GMM source. The DNN performs a series of linear kernel transformations, and from the central limit theorem it is known that a linear combination of Gaussian distributions generates a Gaussian distribution. Therefore, the DNN embedded features can be viewed as a GMM produced from the GMM of visual characteristics, with a class probability defined by:

$$P(y = c \mid x) \propto \sum_{p} \pi_{c,p}\, \mathcal{N}\big(\Phi(x);\, \mu_{c,p},\, \Sigma_c\big). \quad (2)$$

The proposed algorithm performs a series of linear transforms on the GMM of the latent manifold, preserving the mixture model throughout the process. The principal component analysis (PCA) calculates the dominant directions of the manifold through the eigenvectors of the covariance matrix. The result is a manifold oriented by decreasing variance and therefore decreasing entropy. Trimming the eigenvectors with the smallest eigenvalues reduces the GMM to the most informative distributions with the greatest entropy. The k-means cluster vocabulary organizes the distribution structures into 'words' across classes according to the prominent feature clusters seen in the latent features of the support set. Interpreting the latent manifold with the calculated vocabulary combines distributions into semantically relevant features based on their similarity to the 'words' in the generated support set vocabulary. The Mahalanobis distance transforms the GMM according to its relation to the learned class signatures. This process centers the GMM around the code word and scales it with the covariance matrix. Therefore, the Mahalanobis distances can themselves be considered instances of a GMM and can be interpreted as the probability of the query image originating from the GMM of the reference class. The final classification is completed using linear discriminant analysis (LDA), which projects the class probabilities to the optimally discriminative space, maximizing the interclass variance while minimizing the intraclass variance.

a) Feature Extraction: The proposed algorithm extracts activation features from the latent space of a pretrained DNN, MobileNet [33]. Let $\Phi$ be the feature embedding function of the DNN. The feature embedding, $a_i$, of the image $x_i$ is generated by a forward pass through the network, shown in Equation 3, where $a_i$ are the activations within the network's latent space:

$$a_i = \Phi(x_i). \quad (3)$$

The embedded features are then compressed to the image feature vector $r_i$ using Equation 4. The latent features are reduced by the PCA transform, $P$, and interpreted by the vocabulary, $V$, calculated from $S_\tau$ using k-means clustering. The resulting mean vector of the features is then normalized, creating the image feature vector $r_i$:

$$r_i = \frac{\overline{V(P a_i)}}{\big\lVert \overline{V(P a_i)} \big\rVert}. \quad (4)$$

A minimal sketch of this feature-extraction step follows.
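Below is a minimal scikit-learn sketch of Equations 3 and 4, assuming the MobileNet activations a_i are precomputed and treating each image's latent activations as a set of local feature vectors. The component and word counts, and the use of centroid distances as the vocabulary response, are illustrative assumptions; the authors' exact pooling may differ.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def fit_transforms(support_acts, n_components=64, n_words=16):
    """Fit the PCA reduction P and the k-means vocabulary V on the
    pooled local activations of the support set."""
    flat = np.vstack(support_acts)                 # all local features, stacked
    P = PCA(n_components=n_components).fit(flat)   # dominant manifold directions
    V = KMeans(n_clusters=n_words, n_init=10).fit(P.transform(flat))
    return P, V

def encode(acts, P, V):
    """Compress one image's activations a_i to the feature vector r_i (Eq. 4)."""
    z = P.transform(acts)          # P a_i: reduced latent features
    d = V.transform(z)             # distance of each feature to each 'word'
    r = d.mean(axis=0)             # mean vector over the image's features
    return r / np.linalg.norm(r)   # normalized image feature vector r_i
```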
b) Dictionary Generation: The class signatures (a.k.a. representative appearance models) of the dictionary are calculated from the image feature vectors, $r(x)$, in the support set $S_\tau$. The support set provides $k$ examples for each class, giving multiple representations per class, $r_{c,k}$. The covariance of the class, $\Sigma_c$, is calculated from $S_{\tau_c}$. Class sub-representations are calculated from the k-means of the $r_c$ manifold, producing $p$ clusters. The hierarchical code word representations are generated from the centroids of each cluster, $\mu_{c,p}$.

c) Classification: The class of a query image is predicted by the distances of its signature to those in the dictionary. The distances are calculated as Mahalanobis distances from the class signatures, $\mu_{c,p}$. The final classification decision is made by linear discriminant analysis (LDA) of the query image's distances to each dictionary signature.

d) Training: The proposed approach is unique in that it requires no DNN training. Instead, a series of linear transforms is trained on a small sample of reference images to project the MobileNet DNN embedded features to an optimally discriminative space. These transforms include the PCA reduction, the k-means vocabulary, the dictionary, and the LDA separation. The PCA is pretrained on an unlabeled random sub-sample of $D$, serving as a general context transform of the DNN response to a lower-dimensional space, trimming low-activation neurons. The k-means vocabulary, dictionary, and LDA are trained on the sub-sampled support set of reference images, $S_\tau$. A sketch of the dictionary and distance-based classification steps is given below.
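The following is a minimal SciPy/scikit-learn sketch of the dictionary generation and Mahalanobis/LDA classification described above, assuming the encoded support vectors r and their labels are available from the previous sketch. The cluster count p and the pseudo-inverse covariance (to tolerate small support sets) are illustrative choices.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def build_dictionary(r_support, y_support, p=4):
    """Per class c: inverse covariance Sigma_c^-1 and p code words mu_{c,p}
    from k-means over that class's support features."""
    dictionary = {}
    for c in np.unique(y_support):
        r_c = r_support[y_support == c]
        sigma_inv = np.linalg.pinv(np.cov(r_c, rowvar=False))  # robust to few samples
        mu = KMeans(n_clusters=p, n_init=10).fit(r_c).cluster_centers_
        dictionary[c] = (mu, sigma_inv)
    return dictionary

def signature_distances(r, dictionary):
    """Mahalanobis distance of one query vector r to every code word."""
    return np.array([mahalanobis(r, mu, sigma_inv)
                     for mu_set, sigma_inv in dictionary.values()
                     for mu in mu_set])

# LDA makes the final decision on the distance vectors, e.g.:
# d_sup = np.stack([signature_distances(r, dictionary) for r in r_support])
# lda = LinearDiscriminantAnalysis().fit(d_sup, y_support)
# y_hat = lda.predict(signature_distances(r_query, dictionary)[None, :])
```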
Performance evaluations are conducted on the COVID-19 POCUS Dataset [32], the largest publicly available dataset of its type, comprising PoC LUS images from COVID-19, pneumonia, and healthy patients. The dataset is split into LUS clips produced by linear and convex probes. The clips were collected from several sources, the primary ones being grepmed.com, thepocusatlas.com, butterflynetwork.com, and radiopaedia.org. The dataset is heterogeneous, originating from varying institutions and devices. The data is de-identified, and no additional metadata, such as vitals or demographics, are provided. All image annotations are verified by medical professionals. In total, the linear clips comprise 1,457 normal, 315 pneumonia, and 445 COVID-19 frames. The convex LUS clips comprise 11,646 normal, 4,585 pneumonia, and 8,188 COVID-19 frames.

The proposed algorithm was evaluated on test data randomly selected and sequestered using a 20% split. Each image was normalized and resized to (224, 224). The objective of the experiment is to analyze the training requirements and classification performance of the system. This is done by analyzing the algorithm's classification performance with a varying number of reference images, from 8 to 64 per class. Three binary classification scenarios are considered: healthy v. COVID-19, healthy v. pneumonia, and pneumonia v. COVID-19.

The algorithm was implemented in Python on a Linux OS with open-source libraries. All experiments were run on an Intel(R) Core(TM) i5-8600K CPU with 16 GB of RAM. The longest experimental case (using 64 samples) took 15 seconds to process. All evaluation metrics are calculated over 10 trials, each containing randomly selected training and test sets. The experimental performance was evaluated using receiver operating characteristic (ROC) curves, which show the system's sensitivity over its selectivity. Note the ROC curves are plotted using 1-Specificity for easier reading. Only results for linear ultrasound images are shown due to limited space.

Figure 3 shows the mean ROC curves for each experimental case. The plots are organized first by classification scenario (healthy v. COVID-19, healthy v. pneumonia, or pneumonia v. COVID-19), and then by the number of training samples per class (8, 16, 32, 64). The strongest trend is the increase in specificity with the number of training examples. A performance saturation is seen at 64 samples for all scenarios. The highest performance is seen in the healthy v. pneumonia case, which achieves a high sensitivity with just 8 training samples. This is followed by pneumonia v. COVID-19 and then healthy v. COVID-19. These results show that detecting COVID-19 is a more challenging task than detecting pneumonia, but it can still be achieved with 64 samples per class.

In pursuit of increased decision understanding, the system generates an attention heat map to highlight the image regions that correspond to the algorithm's disease state decision; a tiling sketch is given at the end of this section. Only qualitative assessment is possible due to the lack of segmentation annotations. Figure 4 shows an ultrasound image of a COVID-19 infected lung with the attention heat map. The image is sampled in a grid of image tiles. The distance of each tile to the learned COVID-19 signature is denoted by its color, with red being the highest. This heat map highlights a subpleural consolidation.

The purpose of these experiments is to assess the potential effectiveness of the COVID-19 ultrasound diagnostic system, evaluated by its ability to accurately predict disease state and to efficiently learn from limited samples. In medical decision making, the risks of type one and type two errors must be considered. The ROC curves display the performance tradeoffs between higher sensitivity and higher specificity. The results of the experiments show that the algorithm is capable of reliably detecting COVID-19 symptoms in the lungs. The results also show that the algorithm is capable of reliably distinguishing COVID-19 symptoms from pneumonia. The results further demonstrate a significant reduction in training requirements, with the capability of learning disease models with as few as 8 training samples per class in some scenarios. A high classification performance was seen in all scenarios with 64 samples per class. This capability opens the door for clinicians to adapt the algorithm to their environmental factors, such as differences in patient demographics, equipment, and operators.

The value of the algorithm's diagnostic performance depends on its ability to be incorporated into the larger clinical diagnostic process. This requires that the algorithm's diagnostic predictions be presented in an intuitively understandable manner. Qualitative assessment of the attention heat maps demonstrates the capability of highlighting relevant regions of interest. The combination of the high prediction performance and the intuitive displays of the system delivers the aid of deep learning in a clinically viable way.
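As a rough illustration of the tiling procedure behind Figure 4, the sketch below scores each image tile by its distance to the learned class signature, reusing the encoder and dictionary from the earlier sketches. The tile size, the patch-level reuse of the encoder, and the color mapping are assumptions, not the authors' exact rendering.

```python
import numpy as np

def attention_map(image, encode_fn, dist_fn, tile=32):
    """Score each tile of an ultrasound frame by its distance to the
    learned disease signature, for color-mapped overlay as a heat map."""
    h, w = image.shape[:2]
    grid = np.zeros((h // tile, w // tile))
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            patch = image[i * tile:(i + 1) * tile, j * tile:(j + 1) * tile]
            r = encode_fn(patch)           # embed + PCA + vocabulary (as above)
            grid[i, j] = dist_fn(r).min()  # nearest code word of the class
    return grid  # overlay, e.g., plt.imshow(grid, cmap="jet", alpha=0.5)
```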
Rapid, accurate, and inexpensive COVID-19 detection is critically needed. This paper presents an adaptive PoC ultrasound COVID-19 diagnostic system based on a novel FSL visual classification algorithm. The system was designed with a specific focus on its incorporation into the clinical diagnostic process, requiring understandable outputs, adaptability, and reliability. The system takes less than 15 seconds to train on an Intel(R) Core(TM) i5-8600K CPU. The generated disease state models are compact, each requiring less than 1 MB of memory. The distance-based classification provides an intuitive interpretation of the system's predictions. The attention heat maps highlight the regions of the ultrasound images that are most responsible for the classification. The results show that the system is highly capable of accurately diagnosing COVID-19 and pneumonia disease states with as few as 64 training images per disease.

REFERENCES

[1] Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2): a global pandemic and treatment strategies.
[2] COVID-19 situation update worldwide, as of 10.
[3] SARS-CoV-2 and the COVID-19 disease: a mini review on diagnostic methods.
[4] False Negative Tests for SARS-CoV-2 Infection - Challenges and Implications.
[5] A case report of COVID-19 with false negative RT-PCR test: necessity of chest CT.
[6] Accuracy of Emergency Department Clinical Findings for Diagnosis of Coronavirus Disease.
[7] Diagnostic value and key features of computed tomography in Coronavirus Disease.
[8] Lung ultrasound for the emergency diagnosis of pneumonia, acute heart failure, and exacerbations of chronic obstructive pulmonary disease/asthma in adults: A systematic review and meta-analysis.
[9] Diagnostic accuracy of point-of-care lung ultrasound in COVID-19.
[10] Application of Lung Ultrasound During the Coronavirus Disease 2019 Pandemic: A Narrative Review.
[11] Lung Ultrasound Findings in Patients with COVID-19.
[12] ACR Recommendations for the use of Chest Radiography and Computed Tomography (CT) for Suspected COVID-19 Infection.
[13] Accuracy of ultrasonography in the diagnosis of acute calculous cholecystitis: review of the literature.
[14] A critical evaluation in the delivery of the ultrasound practice: the point of view of the radiologist.
[15] ImageNet classification with deep convolutional neural networks.
[16] Sharpness-aware minimization for efficiently improving generalization.
[17] One pixel attack for fooling deep neural networks.
[18] A survey on deep transfer learning.
[19] Machine learning for medical ultrasound: status, methods, and future opportunities.
[20] Convolutional neural networks for medical image analysis: Full training or fine tuning?
[21] Learning to compare: Relation network for few-shot learning.
[22] Prototypical networks for few-shot learning.
[23] Incremental learning through deep adaptation.
[24] Improved few-shot visual classification.
[25] Fast and flexible multi-task classification using conditional neural adaptive processes.
[26] Siamese neural networks for one-shot image recognition.
[27] Matching networks for one shot learning.
[28] Deep learning in medical ultrasound analysis: A review.
[29] Automated B-line scoring on thoracic sonography.
[30] Detection of abnormalities in ultrasound lung image using multi-level RVM classification.
[31] Accelerating COVID-19 differential diagnosis with explainable ultrasound image analysis.
[32] POCOVID-Net: Automatic detection of COVID-19 from a new lung ultrasound imaging dataset (POCUS).
[33] MobileNets: Efficient convolutional neural networks for mobile vision applications.