key: cord-0486850-spyiwwvb authors: Baum, Zachary M C; Bonmati, Ester; Cristoni, Lorenzo; Walden, Andrew; Prados, Ferran; Kanber, Baris; Barratt, Dean C; Hawkes, David J; Parker, Geoffrey J M; Wheeler-Kingshott, Claudia A M Gandini; Hu, Yipeng title: Image quality assessment for closed-loop computer-assisted lung ultrasound date: 2020-08-20 journal: nan DOI: nan sha: 2a261cfaaa47b5b22eb53d8b830f17a1f210b482 doc_id: 486850 cord_uid: spyiwwvb

We describe a novel, two-stage computer assistance system for lung anomaly detection using ultrasound imaging in the intensive care setting to improve operator performance and patient stratification during coronavirus pandemics. The proposed system consists of two deep-learning-based models. A quality assessment module automates predictions of image quality, and a diagnosis assistance module determines the likelihood-of-anomaly in ultrasound images of sufficient quality. Our two-stage strategy uses a novelty detection algorithm to address the lack of control cases available for training a quality assessment classifier. The diagnosis assistance module can then be trained with data that are deemed of sufficient quality, guaranteed by the closed-loop feedback mechanism from the quality assessment module. Integrating the two modules yields accurate, fast, and practical acquisition guidance and diagnostic assistance for patients with suspected respiratory conditions at the point-of-care. Using more than 25,000 ultrasound images from 37 COVID-19-positive patients scanned at two hospitals, plus 12 control cases, this study demonstrates the feasibility of using the proposed machine learning approach. We report an accuracy of 86% when classifying between sufficient and insufficient quality images by the quality assessment module.
For data of sufficient quality, the mean classification accuracy in detecting COVID-19-positive cases was 95% on five holdout test data sets, unseen during the training of any networks within the proposed system.

Lung ultrasound (LUS) imaging can be used for point-of-care diagnosis and management of some pulmonary conditions in patients presenting with respiratory symptoms [1], such as the recently emerged COVID-19-related pneumonia [2]. When outbreaks of infectious respiratory diseases happen, bedside LUS imaging at point-of-care offers great accessibility and flexibility for healthcare provision while helping to manage infection control [3]. Indeed, studies have indicated that LUS may be useful for assessing suspected COVID-19-positive cases for triage during the early management of the disease [4]. A prospective study on 41 patients showed that LUS not only detects positive COVID-19 patients but also indicates prognosis, evidenced by the association with admission to intensive care units and death [5].

Despite positive results from various referral centers, and the widespread availability of US equipment, the use of LUS for assisting with the management of infections such as COVID-19 is severely limited, particularly in smaller hospitals and countries with limited healthcare infrastructure. We argue that this lack of adoption is likely due in part to the lack of healthcare professionals with the requisite training to acquire and correctly interpret diagnostic-quality US images, and to the practical challenges associated with providing this training [6].

Modern machine learning methods, often represented by deep learning, can potentially address these problems through automated LUS image classification. Born et al. [7] reported 89% and 92% classification accuracy using neural networks, at the frame and video level, respectively. Roy et al.
[8] proposed deep learning models for frame- and video-level severity score prediction, as well as pixel-level segmentation. However, such methods are often trained and tested using expert-acquired US data, and obtaining large quantities of such data is often infeasible in practice. Furthermore, there is a steep learning curve associated with high-quality US data acquisition, and substantial variation has been observed in acquisition and interpretation protocols between experts [9].

We propose a computer-assisted system for both US acquisition and interpretation, which aids clinical diagnosis and prognosis through prediction. In particular, we present a closed-loop design of an imaging acquisition protocol based on a novel image quality assessment algorithm. This design ensures that both downstream image interpretation tasks receive data of sufficient quality. We also propose a one-class novelty detection network to address the use of non-representative training data that have few or limited negative examples, which is a common issue in the context of a rapidly evolving disease epidemic or pandemic, such as the recent COVID-19 pandemic [10].

This work presents an application to detect COVID-19-positive patients, demonstrating the feasibility of two modules: a quality assessment module and a diagnosis assistance module. We conducted experiments using a set of retrospective LUS data from both a control group and a positive COVID-19 group, confirmed by swab tests using polymerase chain reaction (PCR). Our contributions are: 1) a closed-loop imaging acquisition assistance system developed based on image quality assessment; 2) a novel image quality assessment module that combines supervised classification with one-class novelty detection networks; and 3) a rigorous validation experiment using COVID-19 LUS data obtained from patients.

Figure 1 provides an overview of our proposed system. The quality assessment module classifies US images and provides user feedback.
When image quality is insufficient, users are instructed to repeat the acquisition at this location on the patient, following application-specific protocols. Although considered out of scope for this work, such a system may prompt users to repeat predefined protocols, for example making use of external assistance such as teleguidance or automated action suggestions [11, 12]. The diagnosis assistance module uses images determined to be of sufficient quality to perform predefined tasks, such as inferring the likelihood of specific conditions.

We consider two strategies for training convolutional neural networks to accept or reject diagnostic-quality LUS images. The first strategy uses a classification network that receives a LUS frame and predicts class probabilities for the individual label types: sufficient and insufficient quality. A binary classification network (QA bin) based on the Visual Geometry Group (VGG) network architecture [13] is used to discriminate images of sufficient quality from those of insufficient quality. Training QA bin requires manual labeling of the data as belonging to one of two classes, sufficient quality or insufficient quality.

The second strategy is to train an adversarial deep learning model for one-class novelty detection [14], which requires only COVID-19-positive examples. A common scenario encountered during a rapidly evolving epidemic is that control data from healthy subjects or patients with other conditions, obtained using the same protocol, are not readily available [10]. A quality assessment novelty detection (QA nd) approach is proposed to train a reconstructor network and a discriminator network in an adversarial manner [14]. The reconstructor network learns how to reconstruct the target class (images of sufficient quality, in this case), while it produces larger errors when reconstructing images outside of the target distribution, as assessed by the simultaneously trained discriminator network.
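
The closed-loop behaviour described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `closed_loop_step`, `GateResult`, the 0.5 quality threshold, and the two toy scoring functions are hypothetical names and values introduced here. In the proposed system, the two scoring functions would be the trained quality assessment and diagnosis assistance networks.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Frame = Sequence[float]  # stand-in for an ultrasound image (hypothetical)


@dataclass
class GateResult:
    accepted: bool                       # True if the frame passed quality assessment
    quality: float                       # predicted probability of sufficient quality
    anomaly_likelihood: Optional[float]  # None when the frame was rejected
    message: str                         # feedback shown to the operator


def closed_loop_step(
    frame: Frame,
    quality_fn: Callable[[Frame], float],
    diagnosis_fn: Callable[[Frame], float],
    quality_threshold: float = 0.5,      # assumed value, not taken from the paper
) -> GateResult:
    """Run one pass of the loop: assess quality first, diagnose only if accepted."""
    quality = quality_fn(frame)
    if quality < quality_threshold:
        # Insufficient quality: no diagnosis is attempted, the operator is asked to rescan.
        return GateResult(False, quality, None,
                          "Insufficient quality: repeat acquisition at this location.")
    return GateResult(True, quality, diagnosis_fn(frame),
                      "Sufficient quality: frame passed to diagnosis assistance.")


# Toy scoring functions standing in for the trained networks (assumptions).
def toy_quality(frame: Frame) -> float:
    return sum(frame) / len(frame)


def toy_diagnosis(frame: Frame) -> float:
    return max(frame)


rejected = closed_loop_step([0.1, 0.2, 0.3], toy_quality, toy_diagnosis)
accepted = closed_loop_step([0.6, 0.8, 0.9], toy_quality, toy_diagnosis)
print(rejected.message)
print(accepted.message, accepted.anomaly_likelihood)
```

The design point the sketch tries to capture is that the diagnosis function is never called on a rejected frame, so the diagnosis module only ever sees data that passed the quality gate.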
Additionally, we propose combining these two models using Bayesian model averaging. The third quality assessment method (QA bin+nd) assesses image quality by $p(y \mid x) = \pi_{bin}\, p_{bin}(y \mid x) + \pi_{nd}\, p_{nd}(y \mid x)$, where $p_{bin}(y \mid x)$ and $p_{nd}(y \mid x)$ are the class probabilities for quality class $y$ given image $x$, obtained from QA bin and QA nd, respectively. The prior probabilities, $\pi_{bin}$ and $\pi_{nd}$, are estimated from the proportion of training data used for each model. QA bin+nd serves as a simple mechanism to combine the two previous models, which may be trained with different data sets acquired at different stages of the epidemic. This may provide a method that does not require model re-training as new data become available or if there is restricted access to some data sets.

Another binary classification network (D bin) is trained, also based on the VGG network [13], to predict whether LUS image frames fall into the COVID-19-positive or COVID-19-negative class. A well-established network architecture, such as that used in D bin, is chosen in order to investigate the feasibility of the feedback mechanism enabled by the quality assessment module. Therefore, in addition to the classification performance of D bin using all test data, the overall classification error, sensitivity, and specificity at various threshold values are also reported using images that were previously classified by the quality assessment module as being of sufficient quality. Further work is needed to test this strategy with video-level or subject-level classification tasks and for more fine-grained severity scores or prognostic clinical outcomes.

Figure 1. Illustration of the proposed closed-loop lung ultrasound system described in Section 2.

Patient images were obtained in two hospitals by two clinicians, both using a Butterfly iQ US probe (Butterfly Inc., Guilford, CT, USA). In total, 25800 LUS images were acquired at 10 frames per second from 37 COVID-19-positive patients (11495 images from 10 positive cases at Site A and 14305 images from 27 positive patients at Site B).
Additionally, 16627 images from 12 control patients, who tested negative for COVID-19, were acquired at Site A. All COVID-19 diagnoses were confirmed by PCR tests. Image quality was manually labeled as sufficient or insufficient by an ultrasound imaging researcher with over 5 years of experience in clinical US imaging. This resulted in 699 images from Site A and 238 images from Site B being labeled as of insufficient quality. Example images are provided in Figure 2.

Due to challenges with control data availability across imaging sites, this work adopts a two-way split into training and test sets, without a validation set. This precludes network fine-tuning through methods such as hyperparameter tuning. The proposed quality assessment networks, QA bin, QA nd, and QA bin+nd, were trained on data from Site B, containing only COVID-19 patient images. QA bin was trained on all images, whilst QA nd was trained on images of sufficient quality. The diagnosis assistance module network, D bin, was trained five times, with five-fold cross-validation. Site A data, containing COVID-19 and control images, were partitioned, at the patient level, into five cross-validation training and test sets. Each fold contained COVID-19 and control images.

Three experiments are presented to evaluate the quality assessment module. The five Site A test sets were used because these data had not previously been seen by any networks and contained COVID-19 and control images. Before diagnosis classification, each fold of data was "screened" independently by each quality assessment network, yielding three "screened" versions of each Site A test set. Each version was then classified by D bin. Testing in this manner gives three diagnosis assistance module results on three different "screened" versions of the Site A test sets. This facilitates validation of the entire proposed system using unseen data (i.e., data not available during training).
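
To make the screening step concrete, the sketch below combines two per-frame quality probabilities with the model-averaging rule for QA bin+nd and keeps only the frames judged to be of sufficient quality. It is an illustration under assumptions rather than the authors' code: the function names and the 0.5 acceptance threshold are hypothetical, the per-frame probabilities are made up, and "training data size" is interpreted here as the number of training images (all 14305 Site B images for QA bin, and the 14305 - 238 = 14067 sufficient-quality Site B images for QA nd).

```python
from typing import List, Sequence, Tuple


def model_priors(n_bin: int, n_nd: int) -> Tuple[float, float]:
    """Prior weight of each model, proportional to its training set size (assumed reading)."""
    total = n_bin + n_nd
    return n_bin / total, n_nd / total


def combine_quality(p_bin: float, p_nd: float,
                    prior_bin: float, prior_nd: float) -> float:
    """Model-averaged probability that a frame is of sufficient quality."""
    return prior_bin * p_bin + prior_nd * p_nd


def screen_frames(probs: Sequence[Tuple[float, float]],
                  prior_bin: float, prior_nd: float,
                  threshold: float = 0.5) -> List[int]:
    """Return indices of frames whose combined quality probability meets the threshold."""
    return [
        i for i, (p_bin, p_nd) in enumerate(probs)
        if combine_quality(p_bin, p_nd, prior_bin, prior_nd) >= threshold
    ]


# Training set sizes derived from the Site B counts reported in the text.
prior_bin, prior_nd = model_priors(n_bin=14305, n_nd=14305 - 238)

# Made-up (QA bin, QA nd) probabilities of sufficient quality for three frames.
frame_probs = [(0.9, 0.8), (0.2, 0.4), (0.6, 0.7)]
kept = screen_frames(frame_probs, prior_bin, prior_nd)
print(round(prior_bin, 4), round(prior_nd, 4), kept)
```

Because the two training sets differ by only 238 images, the priors under this reading are close to one half each, so the combined score is nearly the plain average of the two model outputs.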
All neural networks were implemented in TensorFlow 2. Reference-quality open-source code was adopted, where possible, for reproducibility. The quality assessment and diagnosis assistance modules were trained on an NVIDIA DGX-1 using a Tesla V100 GPU for approximately 90 and 60 minutes, respectively.

Figure 2 shows qualitative examples of predictions from both modules. Guided gradient-weighted class activation maps [15] illustrate how regions of anatomical and pathological interest are highlighted, indicating that the networks have learned meaningful, human-interpretable LUS features, which may be effectively used for diagnosis assistance.

The classification accuracy of the proposed quality assessment module is 0.85 using QA bin or QA nd alone, and 0.86 using the combined QA bin+nd. The classification accuracy for the diagnosis assistance module, D bin, without any quality assessment is 0.95, with specificity reaching 1.00, 0.98, and 0.98 when sensitivity is 0.80, 0.90, and 0.95, respectively. Classification accuracy of the diagnosis assistance module, D bin, following quality assessment and rejection of images of insufficient quality, is 0.95, 0.97, and 0.95 when using QA bin, QA nd, and QA bin+nd, respectively. However, it is important to note that a decrease in classification accuracy to 0.86 was observed when testing on data of insufficient quality.

The employed dataset has far fewer negative examples for training and testing, which is an over-optimistic scenario compared with what would be expected in real-life clinical LUS procedures. Nevertheless, the improvements seen when training with more data of insufficient quality are likely to be greater and more impactful for less experienced users. Similar constraints on the diagnosis assistance module need to be considered when interpreting the results reported in this work.
Although the experimental data were acquired by experienced clinicians, a significant proportion of frames still decreased diagnostic accuracy when no prior quality assessment was applied.

This work is the first, to our knowledge, to provide machine learning assistance for acquisition and diagnostic classification using point-of-care LUS, validated on clinically acquired data from COVID-19-positive patients. Our system addresses an often overlooked issue in LUS: ensuring images are of sufficient quality prior to machine-learning-assisted diagnosis. We have shown the feasibility of using LUS for diagnosis assistance in distinguishing positive COVID-19 cases from healthy controls. Our novel method combines a supervised classifier with a novelty detection algorithm to form a quality assessment module, which is validated with a diagnosis assistance module used to predict abnormalities in LUS images. This system has the potential to be extended to other clinical tasks for diagnosis, staging, and predicting clinical outcomes during LUS procedures, and will be validated prospectively by providing feedback to inexperienced clinicians.

[1] International evidence-based recommendations for point-of-care lung ultrasound. Intensive Care Medicine.
[2] Point-of-care lung ultrasound in patients with COVID-19 - a narrative review.
[3] COVID-19 outbreak: less stethoscope, more ultrasound. The Lancet Respiratory Medicine.
[4] Lung ultrasonography for early management of patients with respiratory symptoms during COVID-19 pandemic.
[5] Lung ultrasound findings are associated with mortality and need of intensive care admission in COVID-19 patients evaluated in the Emergency Department.
[6] Machine learning for medical ultrasound: status, methods, and future opportunities.
[7] automatic detection of COVID-19 from a new lung ultrasound imaging dataset (POCUS).
[8] Deep learning for classification and localization of COVID-19 markers in point-of-care lung ultrasound.
[9] Lung ultrasound training: a systematic review of published literature in clinical lung ultrasound training.
[10] The challenges of deploying artificial intelligence models in a rapidly evolving pandemic.
[11] Directing ultrasound probe placement for image guided prostate radiotherapy.
[12] Automatic probe movement guidance for freehand obstetric ultrasound.
[13] Very deep convolutional networks for large-scale image recognition.
[14] Adversarially learned one-class classifier for novelty detection.
[15] Grad-CAM: visual explanations from deep networks via gradient-based localization.