Insights into Data through Model Behaviour: An Explainability-driven Strategy for Data Auditing for Responsible Computer Vision Applications

Alexander Wong, Adam Dorfman, Paul McInnis, Hayden Gunraj

June 16, 2021

In this study, we take a departure and explore an explainability-driven strategy to data auditing, where actionable insights into the data at hand are discovered through the eyes of quantitative explainability on the behaviour of a dummy model prototype when exposed to the data. We demonstrate this strategy by auditing two popular medical benchmark datasets, and discover hidden data quality issues that lead deep learning models to make predictions for the wrong reasons. The actionable insights gained from this explainability-driven data auditing strategy are then leveraged to address the discovered issues and enable the creation of high-performing deep learning models with appropriate prediction behaviour. The hope is that such an explainability-driven strategy can be complementary to data-driven strategies to facilitate more responsible development of machine learning algorithms for computer vision applications.

The rise of open source benchmark datasets [10, 6, 3, 13, 16, 11, 14] has led to significant progress in machine learning for computer vision. A common assumption made when leveraging benchmark datasets is that they are curated in a way that is free of data quality issues. Therefore, such datasets are often used 'as is' and sight unseen in practice to train new models, relying on scalar performance metrics to judge a model's efficacy. However, data audits in recent studies [2, 9, 12] have unveiled hidden biases and data quality issues in well-known benchmark datasets, which can negatively affect the real-world performance of models trained on such datasets. Most data auditing strategies consider only data characteristics and not model behaviour; as a result, such strategies are subjective, based largely on human intuition, and can miss hidden issues that negatively affect model behaviour. In this study, we take a different approach and explore an explainability-driven strategy to data auditing, where actionable insights into data are discovered through the eyes of explainability based on a model's behaviour.

This paper is organized as follows. Section 2 describes the underlying methodology behind the explainability-driven strategy to data auditing for the development of responsible computer vision applications, along with the experiments conducted on two popular medical benchmark datasets. Section 3 presents the results in terms of the hidden data quality issues that were discovered during the auditing process, as well as the actionable insights and steps taken as a result of those insights. Finally, conclusions are drawn in Section 4.

In this study, we aim to explore the efficacy of an explainability-driven strategy to data auditing for the development of responsible computer vision applications, which is a conceptual departure from the direction taken by existing data-driven strategies in the research literature. More specifically, explainability-driven data auditing is conducted as follows (see Figure 1). First, a dummy model prototype is constructed and trained with the data under investigation.
Second, the data is fed back into the trained dummy model prototype, and a quantitative explainability technique is leveraged to identify the critical factors driving the behaviour of the prototype across the data. Third, the identified critical factors are studied to discover hidden data quality issues. A unique aspect of this explainability-driven strategy to data auditing is that hidden data quality issues are discovered based on prediction behaviour through the eyes of explainability, and as such the strategy has the potential to complement data-driven strategies by uncovering hidden issues that are not identified by considering data characteristics alone.

Figure 1. Overview of the explainability-driven data auditing workflow. A dummy model prototype is trained using the data under investigation. A quantitative explainability technique is used to identify critical factors in the data that drive the prediction behaviour of the prototype. The critical factors are audited to discover hidden data quality issues.

We demonstrate this strategy by auditing two popular benchmark datasets (the OSIC Pulmonary Fibrosis Progression dataset [8] and the CNCB COVID-19 CT dataset [17]) using dummy deep CNN regression and classification model prototypes, respectively. For explainability, we leverage GSInquire [7], which was demonstrated to better reflect a model's decision-making process when compared to state-of-the-art approaches, and which is one of the only approaches that can be used on deep learning regression models. In particular, the OSIC dataset is part of a popular Kaggle challenge, with the winning solution [1] using the data largely 'as is' without considering the CT modality used. These are good use cases given the importance of responsible computer vision in healthcare.
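To make the three auditing steps above concrete, the following is a minimal sketch of the workflow in PyTorch. GSInquire is not assumed to be available here, so a simple occlusion-sensitivity map stands in for the quantitative explainability step; the dummy prototype architecture, placeholder data, and hyperparameters are illustrative assumptions rather than the setup used in this study.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


class DummyPrototype(nn.Module):
    """A small CNN classifier used only to probe the data, not for deployment."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))


def train_dummy(model, loader, epochs=3, lr=1e-3):
    """Step 1: fit the dummy prototype on the data under investigation."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            opt.step()
    return model


@torch.no_grad()
def occlusion_map(model, image, label, patch=8):
    """Step 2: crude stand-in for quantitative explainability -- measure how much
    the predicted score for `label` drops when each patch is blanked out."""
    model.eval()
    base = F.softmax(model(image.unsqueeze(0)), dim=1)[0, label]
    _, h, w = image.shape
    heat = torch.zeros(h // patch, w // patch)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.clone()
            occluded[:, i:i + patch, j:j + patch] = 0.0
            score = F.softmax(model(occluded.unsqueeze(0)), dim=1)[0, label]
            heat[i // patch, j // patch] = (base - score).clamp(min=0)
    return heat


if __name__ == "__main__":
    # Placeholder data: random "CT slices"; replace with the dataset being audited.
    images = torch.randn(64, 1, 64, 64)
    labels = torch.randint(0, 2, (64,))
    loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

    model = train_dummy(DummyPrototype(), loader)

    # Step 3: collect critical regions per sample for manual review by an auditor.
    for idx in range(4):
        heat = occlusion_map(model, images[idx], int(labels[idx]))
        print(f"sample {idx}: most influential patch index {int(heat.flatten().argmax())}")
```

In practice, the auditor would overlay such heat maps on the original images and inspect whether the highlighted regions correspond to clinically relevant structures or to acquisition and curation artifacts.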
In this section, we discuss the hidden data quality issues that were discovered using the proposed explainability-driven strategy for data auditing, as well as the steps taken to address these issues based on the actionable insights gained from the data auditing strategy.

The explainability-driven data auditing led to the discovery of several hidden data quality issues that caused the dummy model prototypes to make predictions for the wrong reasons, even when performance is high based on scalar metrics. These include: 1) incorrect calibration metadata leading to the data dynamic range being erroneously used by the model prototype to make predictions, 2) synthetic padding (Fig. 3a) introduced during data curation being used to erroneously guide predictions, 3) circular artifacts (Fig. 3b) being used by the model to erroneously guide predictions, and 4) patient tables (Fig. 3c) being used by the model to make predictions. The discovered data quality issues led to the following actionable insights: 1) incorrect calibration data removal, 2-3) domain-specific artifact mitigation, and 4) automatic table removal (see Figure 2).

Figure 2. Actionable insights gained from the explainability-driven data auditing process can be used to address discovered data quality issues.

By taking the above actions on the dataset to address the data quality issues uncovered via the aforementioned explainability-driven strategy for data auditing, the resulting deep learning models not only achieved significantly higher performance [15, 5], but also made predictions based on the right visual cues. For example, in the case of the OSIC Pulmonary Fibrosis Progression dataset, addressing the discovered data quality issues led to the creation of a deep CNN regression model [15] with state-of-the-art performance exceeding the winning solution of the OSIC Kaggle Challenge [1], which learned to leverage relevant visual anomalies such as honeycombing in the lungs (see Fig. 4 for example CT images from the OSIC Pulmonary Fibrosis Progression dataset and the corresponding identified critical factors). In the case of the CNCB COVID-19 CT dataset, addressing the discovered data quality issues led to the creation of deep CNN classification models [5, 4] with state-of-the-art performance (exceeding 98% accuracy) that learned to leverage relevant visual anomalies in the lungs such as ground-glass opacities and bilateral patchy opacities.

Figure 4. Example CT images from the OSIC Pulmonary Fibrosis Progression dataset, with the highlighted areas corresponding to the critical factors used by a state-of-the-art deep CNN regression model for predicting fibrosis progression, as identified via quantitative explainability. It can be observed that, by addressing the discovered data quality issues identified via explainability-driven data auditing …
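As an illustration of what the "automatic table removal" action could look like in practice, the following is a minimal sketch, assuming the patient body can be isolated as the largest above-air connected component in each CT slice; the Hounsfield-unit threshold, the morphology choices, and the synthetic test slice are illustrative assumptions and not the exact preprocessing pipeline used in this study.

```python
import numpy as np
from scipy import ndimage


def remove_table(slice_hu: np.ndarray, air_threshold: float = -500.0) -> np.ndarray:
    """Mask out everything outside the largest above-air connected component.

    slice_hu: 2D CT slice in Hounsfield units.
    Returns the slice with non-body pixels set to the air value (-1000 HU).
    """
    # Binary mask of everything denser than air (body, scanner table, cables, ...).
    body_or_table = slice_hu > air_threshold

    # Light morphological cleanup so thin structures do not bridge components.
    body_or_table = ndimage.binary_opening(body_or_table, iterations=2)

    # Label connected components and keep the largest one (assumed to be the body).
    labels, n = ndimage.label(body_or_table)
    if n == 0:
        return slice_hu
    sizes = ndimage.sum(body_or_table, labels, index=range(1, n + 1))
    body_label = int(np.argmax(sizes)) + 1
    body_mask = ndimage.binary_fill_holes(labels == body_label)

    return np.where(body_mask, slice_hu, -1000.0)


if __name__ == "__main__":
    # Synthetic example: a circular "body" plus a thin horizontal "table" below it.
    yy, xx = np.mgrid[0:256, 0:256]
    slice_hu = np.full((256, 256), -1000.0)
    slice_hu[(yy - 110) ** 2 + (xx - 128) ** 2 < 80 ** 2] = 40.0   # soft tissue
    slice_hu[200:206, 40:216] = 200.0                              # scanner table

    cleaned = remove_table(slice_hu)
    print("table pixels before:", int((slice_hu[200:206] > -500).sum()))
    print("table pixels after: ", int((cleaned[200:206] > -500).sum()))
```

A similar connected-component or dynamic-range check can flag slices affected by the synthetic padding, circular artifacts, and calibration issues noted above for manual review.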
In this study, an explainability-driven strategy for data auditing was explored and conducted on two different popular medical benchmark datasets. The proposed data auditing strategy led to the discovery of critical hidden data quality issues that caused incorrect prediction behaviour in deep learning models, and the actionable insights gained were leveraged to address these issues and enable the creation of deep learning models that not only achieved higher performance but also made predictions for the right reasons. The hope is that this explainability-driven strategy can complement data-driven strategies to facilitate more responsible machine learning-driven computer vision development.

References

OSIC Pulmonary Fibrosis Progression 1st place solution
Auditing ImageNet: Towards a model-driven framework for annotating demographic attributes of large-scale image datasets
The PASCAL Visual Object Classes (VOC) challenge. IJCV
COVID-Net CT-2: Enhanced deep neural networks for detection of COVID-19 from chest CT images through bigger, more diverse learning
COVIDNet-CT: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest CT images
Do explanations reflect decisions? A machine-centric strategy to quantify the performance of explainability algorithms
OSIC. OSIC Pulmonary Fibrosis Progression
Large image datasets: A pyrrhic win for computer vision?
COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest x-ray images
Balanced datasets are not enough: Estimating and mitigating gender bias in deep image representations
ChestX-Ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases
Towards computer-aided severity assessment via training and validation of deep neural networks for geographic extent and opacity extent scoring of chest x-rays for SARS-CoV-2 lung disease severity
Fibrosis-Net: A tailored deep convolutional neural network design for prediction of pulmonary fibrosis progression from chest CT images
MedMNIST classification decathlon: A lightweight AutoML benchmark for medical image analysis
Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography