title: COVIDNet-CT: A Tailored Deep Convolutional Neural Network Design for Detection of COVID-19 Cases from Chest CT Images
authors: Gunraj, Hayden; Wang, Linda; Wong, Alexander
date: 2020-09-08

The coronavirus disease 2019 (COVID-19) pandemic continues to have a tremendous impact on patients and healthcare systems around the world. In the fight against this novel disease, there is a pressing need for rapid and effective screening tools to identify patients infected with COVID-19. To this end, CT imaging has been proposed as one of the key screening methods which may be used as a complement to RT-PCR testing, particularly for patients who undergo routine CT scans for non-COVID-19 related reasons, patients with worsening respiratory status or developing complications that require expedited care, and patients who are suspected to be COVID-19-positive but have negative RT-PCR test results. Motivated by this, in this study we introduce COVIDNet-CT, a deep convolutional neural network architecture that is tailored for detection of COVID-19 cases from chest CT images via a machine-driven design exploration approach. Additionally, we introduce COVIDx-CT, a benchmark CT image dataset derived from CT imaging data collected by the China National Center for Bioinformation comprising 104,009 images across 1,489 patient cases. Furthermore, in the interest of reliability and transparency, we leverage an explainability-driven performance validation strategy to investigate the decision-making behaviour of COVIDNet-CT, and in doing so ensure that COVIDNet-CT makes predictions based on relevant indicators in CT images. Both COVIDNet-CT and the COVIDx-CT dataset are available to the general public in an open-source and open access manner as part of the COVID-Net initiative.
While COVIDNet-CT is not yet a production-ready screening solution, we hope that releasing the model and dataset will encourage researchers, clinicians, and citizen data scientists alike to leverage and build upon them.

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), continues to have a tremendous impact on patients and healthcare systems around the world. In the fight against this novel disease, there is a pressing need for fast and effective screening tools to identify patients infected with COVID-19 in order to ensure timely isolation and treatment. Currently, real-time reverse transcription polymerase chain reaction (RT-PCR) testing is the primary means of screening for COVID-19, as it can detect SARS-CoV-2 ribonucleic acid (RNA) in sputum samples collected from the upper respiratory tract [1]. While RT-PCR testing for COVID-19 is highly specific, its sensitivity is variable depending on sampling method and time since onset of symptoms [2, 3, 4], and some studies have reported relatively low COVID-19 sensitivity [5, 3]. Moreover, RT-PCR testing is a time-consuming process which is in high demand, leading to possible delays in obtaining test results.

Chest computed tomography (CT) imaging has been proposed as an alternative screening tool for COVID-19 infection due to its high sensitivity, and may be particularly effective when used as a complement to RT-PCR testing [5, 4, 6]. CT imaging saw extensive use during the early stages of the COVID-19 pandemic, particularly in Asia. While cost and resource constraints limit routine CT screening specifically for COVID-19 detection [7], CT imaging can be especially useful as a screening tool for COVID-19 infection in situations where:

• patients are undergoing routine CT examinations for non-COVID-19 related reasons. For example, CT examinations may be conducted for routine cancer screening, monitoring for elective surgical procedures [8], and neurological examinations [9]. Since such CT examinations are being conducted as a routine procedure regardless of COVID-19, there are no additional cost or resource constraints associated with leveraging such examinations for COVID-19 screening as well.
• patients have worsening respiratory status or developing complications that require expedited care [10]. In such scenarios, immediate treatment may be necessary, and thus CT imaging is used to screen the patient for COVID-19 infection while waiting for RT-PCR testing to confirm the infection.
• patients are suspected to be COVID-19-positive but their RT-PCR tests are negative. For example, patients who have had close contact with confirmed COVID-19 cases and are exhibiting symptoms of the disease are highly suspect, but may have negative RT-PCR results. In these cases, CT imaging may be used to confirm COVID-19 infection pending positive RT-PCR results.

In early studies, it was found that certain abnormalities in chest CT images are indicative of COVID-19 infection, with ground-glass opacities, patchy shadows, crazy-paving pattern, and consolidation being some of the most commonly reported abnormalities, typically with bilateral involvement [11, 12, 13, 14, 5, 4, 6]. Moreover, some studies have found that abnormalities in a patient's chest CT scan due to COVID-19 infection may be present despite a negative RT-PCR test [5, 4, 6]. However, as illustrated in Figure 1, these imaging abnormalities may not be specific to COVID-19 infection, and the visual differences between COVID-19-related abnormalities and other abnormalities can be quite subtle. As a result, the performance of radiologists in distinguishing COVID-19-related abnormalities from abnormalities of other etiology may vary considerably [15, 16].
For radiologists, visual analysis of CT scans is also a time-consuming manual task, particularly when patient volume is high or in large studies. In this study, we introduce COVIDNet-CT, a deep convolutional neural network architecture tailored specifically for detection of COVID-19 cases from chest CT images via a machine-driven design exploration approach. We also introduce COVIDx-CT, a benchmark CT image dataset derived from CT imaging data collected by the China National Center for Bioinformation (CNCB) [17] comprising 104,009 images across 1,489 patient cases. Additionally, to investigate the decision-making behaviour of COVIDNet-CT, we perform an explainability-driven performance validation and analysis of its predictions, allowing us to explore the critical visual factors associated with COVID-19 infection while also auditing COVIDNet-CT to ensure that its decisions are based on relevant CT image features. In an effort to encourage continued research and development, COVIDNet-CT and the COVIDx-CT dataset are available to the general public in an open-source and open access manner as part of the COVID-Net [18, 19] initiative, a global open initiative for accelerating collaborative advancement of artificial intelligence for assisting in the fight against the COVID-19 pandemic. The paper is organized as follows. We first discuss related work on deep learning systems for CT-based COVID-19 detection in Section 2. Next, we discuss the construction of the COVIDx-CT dataset, the design strategy used to build COVIDNet-CT, the architecture design of COVIDNet-CT, and the explainability-driven performance validation strategy leveraged to audit COVIDNet-CT in Section 3. 
Following this, in Section 4, we present and discuss the results of our experiments to evaluate the efficacy and decision-making behaviour of the proposed COVIDNet-CT, as well as a comparison of COVIDNet-CT to existing deep neural network architectures for the task of COVID-19 detection on chest CT images. Finally, we draw conclusions and discuss future directions in Section 5. A number of studies have proposed deep learning systems based on chest CT imaging to distinguish COVID-19 cases from non-COVID-19 cases (which may include both normal and abnormal cases) [16, 20, 21, 22, 23, 24, 25, 26, 27, 17, 28, 29, 30] . Many of the proposed systems further identify non-COVID-19 cases as normal [20, 17, 28, 29] , non-COVID-19 pneumonia (e.g., bacterial pneumonia, viral pneumonia, community-acquired pneumonia (CAP), etc.) [21, 22, 23, 20, 17, 29, 30] , or non-pneumonia [22] . Additionally, some of the proposed systems require lung and/or lung lesion segmentation [21, 16, 25, 26, 17] , which necessitates either a segmentation stage in the proposed systems or manual segmentation by radiologists. To the best of the authors' knowledge, the proposed COVIDNet-CT deep neural network architecture is the first to be built using a machine-driven design exploration strategy specifically for COVID-19 detection from chest CT images. Explainability methods have been leveraged in some studies to investigate the relationship between imaging features and network predictions. Bai et al. [21] and Jin et al. [28] visualized importance variations in chest CT images using Gradient-weighted Class Activation Mapping (Grad-CAM) [31] . Similarly, Mei et al. [16] created heatmaps of COVID-19 infection probabilities within receptive fields by upsampling their network's predictions to match chest CT image dimensions. Zhang et al. [17] examined the correlation between key clinical parameters and segmented lung lesion features in chest CT images. 
To the best of the authors' knowledge, this is the first study to perform explainability-driven performance validation on deep neural networks for COVID-19 detection from chest CT images using an explainability method geared towards identifying specific critical factors as opposed to more general heatmaps that illustrate importance variations within an image. In this section, we will first describe the methodology behind the construction of the COVIDx-CT open access benchmark dataset with which we train and evaluate the proposed COVIDNet-CT deep neural network architecture. Next, we describe the methodology behind the creation of the proposed COVIDNet-CT deep neural network architecture via a machine-driven design exploration strategy, as well as the resulting deep neural network architecture. Furthermore, we discuss the implementation details as well as training strategy used to train COVIDNet-CT. Finally, we describe in detail the strategy for explainability-driven performance validation of COVIDNet-CT. To build the proposed COVIDNet-CT, we constructed a dataset of 104,009 chest CT images across 1,489 patient cases, which we refer to as COVIDx-CT. To generate the COVIDx-CT dataset, we leverage the CT imaging data collected by the CNCB [17] , which is comprised of chest CT examinations from different hospital cohorts across China as part of the China Consortium of Chest CT Image Investigation (CC-CCII). More specifically, the CT imaging data consists of chest CT volumes across three different infection types: novel coronavirus pneumonia due to SARS-CoV-2 viral infection (NCP), common pneumonia (CP), and normal controls. Figure 2 shows example CT images for each of the infection types from the constructed COVIDx-CT dataset. For NCP and CP CT volumes, slices marked as containing lung abnormalities were leveraged. 
Additionally, we excluded CT volumes where the background had been removed to leave segmented lung regions, as the contrast between the background and segmented lung regions can lead to model biases. Finally, we split the COVIDx-CT dataset into training, validation, and test sets, using an approximate 60%-20%-20% split for training, validation, and test, respectively. These sets were constructed such that each patient belongs to a single set. Figure 3 shows the distribution of patient cases and images in the COVIDx-CT dataset amongst the different infection types and dataset splits. Inspired by [18] , a machine-driven design exploration strategy was leveraged to create the proposed COVIDNet-CT. More specifically, machine-driven design exploration involves the automatic exploration of possible network architecture designs and identifies the optimal microarchitecture and macroarchitecture patterns with which to build the deep neural network architecture. As discussed in [18] , the use of machine-driven design exploration allows for greater flexibility and granularity in the design process as compared to manual architecture design, and ensures that the resulting architecture satisfies the given operational requirements. As such, a machine-driven design exploration approach would enable the creation of a tailored deep convolutional neural network catered specifically for the purpose of COVID-19 detection from chest CT images in a way that satisfies sensitivity and positive predictive value (PPV) requirements while minimizing computational and architectural complexity to enable widespread adoption in clinical environments where computing resources may be limited. 
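To make the sensitivity and PPV requirements concrete, the following is a minimal sketch (not the authors' evaluation code) of how both metrics can be computed per class from an image-level confusion matrix. The function names, the class labels, and the counts in the example are hypothetical and chosen only for illustration; the thresholds are passed in as parameters rather than fixed.

```python
from typing import Dict, List

# Hypothetical class order for illustration only.
CLASSES = ["normal", "non-COVID-19 pneumonia", "COVID-19"]


def per_class_metrics(confusion: List[List[int]]) -> Dict[str, Dict[str, float]]:
    """Compute sensitivity and PPV for each class.

    confusion[i][j] is the number of images whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    metrics: Dict[str, Dict[str, float]] = {}
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                         # true class k, predicted otherwise
        fp = sum(confusion[i][k] for i in range(n)) - tp    # predicted k, true class otherwise
        metrics[CLASSES[k]] = {
            "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
            "ppv": tp / (tp + fp) if (tp + fp) else 0.0,
        }
    return metrics


def meets_requirements(metrics: Dict[str, Dict[str, float]], cls: str,
                       min_sensitivity: float, min_ppv: float) -> bool:
    """Check whether one class satisfies both operational thresholds."""
    m = metrics[cls]
    return m["sensitivity"] >= min_sensitivity and m["ppv"] >= min_ppv


# Made-up counts, NOT results from the paper.
example = [
    [90, 8, 2],
    [5, 85, 10],
    [1, 3, 96],
]
result = per_class_metrics(example)
# COVID-19 sensitivity is 96/100 = 0.96 and PPV is 96/108 (about 0.889),
# so a 0.95 / 0.95 requirement is not met for this made-up matrix.
covid_ok = meets_requirements(result, "COVID-19", 0.95, 0.95)
```

In this made-up example the sensitivity threshold is satisfied but the PPV threshold is not, which illustrates why the two requirements have to be checked jointly: a model can miss few positive cases while still producing too many false positives.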
More specifically, in this study we leverage the concept of generative synthesis [32] as our machine-driven design exploration strategy, where the problem of identifying a tailored deep neural network architecture for the task and data at hand is formulated as a constrained optimization problem based on a universal performance function U (e.g., [33]) and a set of quantitative constraints based on operational requirements related to the task and data at hand. This constrained optimization problem is then solved via an iterative strategy, initialized with the data at hand, an initial network design prototype, and the set of quantitative constraints. Here, we specify two key operational requirements as quantitative constraints during the machine-driven design exploration process: (i) COVID-19 sensitivity ≥ 95% on the COVIDx-CT validation dataset, and (ii) COVID-19 PPV ≥ 95% on the COVIDx-CT validation dataset. These operational requirements were specified in order to ensure low false-negative and false-positive rates, respectively. For the initial network design prototype, we leveraged residual architecture design principles [34, 35], as they have been shown to enable reliable deep architectures which are easier to train to high performance. Furthermore, the output of the initial network design prototype is a softmax layer corresponding to the following prediction categories: (i) no infection (normal), (ii) non-COVID-19 pneumonia, and (iii) COVID-19 viral pneumonia.

The proposed COVIDNet-CT architecture is shown in Figure 4, and is publicly available at https://github.com/haydengunraj/COVIDNet-CT.
As can be seen, the network architecture produced via a machine-driven design exploration strategy exhibits high architectural diversity as evident by the heterogeneous composition of conventional spatial convolution layers, pointwise convolutional layers, and depthwise convolution layers in a way that strikes a balance between accuracy and architectural and computational complexity. Further evidence of the high architectural diversity of the COVIDNet-CT architecture is the large microarchitecture design variances within each layer of the network (as seen by the tensor configurations of the individual layers shown in Figure 4) . Furthermore, the machine-driven design exploration strategy made heavy use of unstrided and strided projection-replication-projection-expansion design patterns (which we denote as PRPE and PRPE-S for unstrided and strided patterns, respectively) consisting of a projection to lower channel dimensionality via pointwise convolutions, a replication of the projections to increase channel dimensionality efficiently, an efficient spatial feature representation via depthwise convolutions (unstrided and strided for PRPE and PRPE-S, respectively), a projection to lower channel dimensionality via pointwise convolutions, and finally an expansion of channel dimensionality conducted by pointwise convolutions. The use of lightweight design patterns such as PRPE and PRPE-S enables COVIDNet-CT to achieve high computational efficiency while maintaining high representational capacity. While these design patterns may be difficult and time-consuming to design manually, machine-driven design allows for these fine-grained design patterns to be rapidly and automatically discovered. Finally, selective long-range connectivity can be observed in the proposed COVIDNet-CT architecture, which enables greater representational capabilities in a more efficient manner than densely-connected deep neural network architectures. 
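To illustrate why a PRPE-style pattern is lightweight, the sketch below counts the weights of the five stages described above and compares them with a single conventional spatial convolution of the same input and output width. This is only a back-of-the-envelope illustration under stated assumptions: the channel widths and replication factor are hypothetical (the actual per-layer configurations are those shown in Figure 4), replication is assumed to be a parameter-free copy of channels, and biases and normalization parameters are ignored.

```python
def pointwise_params(c_in: int, c_out: int) -> int:
    """Weights in a 1x1 (pointwise) convolution, ignoring biases."""
    return c_in * c_out


def depthwise_params(channels: int, k: int) -> int:
    """Weights in a k x k depthwise convolution (one filter per channel)."""
    return channels * k * k


def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a conventional k x k spatial convolution, ignoring biases."""
    return c_in * c_out * k * k


def prpe_params(c_in: int, c_proj1: int, replication: int,
                c_proj2: int, c_out: int, k: int = 3) -> int:
    """Weight count of a projection-replication-projection-expansion block.

    Assumes replication copies the projected channels `replication` times
    without any learnable weights (an assumption made for this sketch).
    """
    c_rep = c_proj1 * replication
    return (
        pointwise_params(c_in, c_proj1)     # projection to fewer channels
        + depthwise_params(c_rep, k)        # spatial features on replicated channels
        + pointwise_params(c_rep, c_proj2)  # second projection
        + pointwise_params(c_proj2, c_out)  # expansion to the output width
    )


# Hypothetical widths, chosen only to show the order of magnitude.
prpe = prpe_params(c_in=256, c_proj1=32, replication=4, c_proj2=64, c_out=256)
dense = standard_conv_params(c_in=256, c_out=256, k=3)
# prpe == 33920 and dense == 589824, i.e. roughly 17 times fewer weights here.
```

The strided variant (PRPE-S) has the same weight count under these assumptions, since striding the depthwise convolution changes the spatial resolution and the FLOPs but not the number of weights.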
Notable characteristics include high architectural diversity, selective long-range connectivity, and lightweight design patterns (e.g., PRPE and PRPE-S patterns). The proposed COVIDNet-CT was pre-trained on the ImageNet [36] dataset and then trained on the COVIDx-CT dataset via stochastic gradient descent with momentum [37] . The hyperparameters used for training are as follows: learning rate=5e-3, momentum=0.9, number of epochs=17, batch size=8. Data augmentation was applied with the following augmentation types: cropping box jitter, rotation, horizontal and vertical shear, horizontal flip, and intensity shift and scaling. In initial experiments, it was found via explainability-driven performance validation (see Section 3.5 for more details on the methodology) that erroneous indicators in the CT images (e.g., patient tables of the CT scanners, imaging artifacts, etc.) were being leveraged by the network to make predictions. To help prevent this behaviour, we introduce an additional augmentation which removes any visual indicators which lie outside of the patient's body, as illustrated in Figure 5 . Finally, we adopt a batch re-balancing strategy similar to that employed in [18] to ensure a balanced distribution of each infection type at the batch level. The proposed COVIDNet-CT was implemented, trained, and evaluated using the TensorFlow deep learning library [38] . While scalar performance metrics are a valuable quantitative method for evaluating deep neural networks, they are incapable of explaining a network's decision-making behaviour. In clinical applications, the ability to understand how a deep neural network makes decisions is critical, as these decisions may ultimately affect the health of patients. Motivated by this, we audit COVIDNet-CT via an explainability-driven performance analysis strategy in order to better understand which CT imaging features are critical to its detection decisions. 
Moreover, by leveraging explainability, we can ensure that COVIDNet-CT is making decisions based on relevant information in CT images rather than erroneously basing its decisions on irrelevant factors (as we have seen in initial experiments as described in Section 3.4). In this study, we leverage GSInquire [39] as the explainability method of choice for explainability-driven performance validation to visualize critical factors in CT images. GSInquire leverages the generative synthesis strategy [32] that was employed for machine-driven design exploration, and was previously shown quantitatively to provide explanations that better reflect the decision-making process of deep neural networks when compared to other state-of-the-art explainability methods [39] . Unlike approaches that generate heatmaps pertaining to importance variations within an image, GSInquire can identify specific critical factors within an image that have the greatest impact on the decision-making process. The proposed COVIDNet-CT was analysed both quantitatively and qualitatively to evaluate its detection performance and decision-making behaviour. In particular, we report quantitative performance metrics for the COVIDx-CT test set and qualitatively examine critical factors in the CT images as identified by GSInquire for explainability-driven performance validation. We quantitatively evaluate the performance of the proposed COVIDNet-CT on the COVIDx-CT dataset. For this dataset, we computed the test accuracy as well as sensitivity and PPV for each infection type at the image level. The test accuracy, architectural complexity (in terms of number of parameters), and computational complexity (in terms of number of floating-point operations (FLOPs)) of COVIDNet-CT are shown in Table 1 . As shown, COVIDNet-CT achieves a relatively high test accuracy of 99.1% while having relatively low architectural and computational complexity. 
This highlights one of the benefits of leveraging machine-driven design exploration for identifying the optimal macroarchitecture and microarchitecture designs for building a deep neural network architecture tailored for the task and data at hand. In the case of COVIDNet-CT, the result is a highly accurate yet highly efficient deep neural network architecture that is suitable for scenarios where computational resources are a limiting factor. In clinical scenarios, such architectures may also be suitable for use in embedded devices. We next examine the sensitivity and PPV for each infection type in Table 2 and Table 3 respectively, as well as how these statistics could impact the efficacy of COVIDNet-CT in a clinical environment. In Table 2 , we observe that COVIDNet-CT achieves good COVID-19 sensitivity (97.3%), which ensures that a low proportion of COVID-19 cases are incorrectly classified as non-COVID-19 pneumonia or normal cases. Moreover, given that RT-PCR testing is highly specific, we want to ensure that COVIDNet-CT has high sensitivity in order to effectively complement RT-PCR testing. Next, in Table 3 , we observe that COVIDNet-CT also achieves a high COVID-19 PPV, thereby ensuring a low proportion of false-positive predictions which could cause an unnecessary burden on the healthcare system in the form of isolation, testing, and treatment. Examining Figure 6 , we observe that COVIDNet-CT is extremely effective at distinguishing COVID-19 cases from normal control cases, and is capable of distinguishing non-COVID-19 pneumonia cases from COVID-19 cases for the vast majority of these cases. Interestingly, while some COVID-19 cases are incorrectly classified as non-COVID-19 pneumonia cases, far fewer non-COVID-19 cases are misclassified as COVID-19 cases. Based on these results, it is shown that COVIDNet-CT could be used as an effective standalone screening tool for COVID-19 patients, and could also be used effectively in conjunction with RT-PCR testing. 
However, we note that COVIDNet-CT is trained on images from a single data collection [17], and although this collection is comprised of scans from several institutions, the ability of COVIDNet-CT to generalize to images from other countries, institutions, or CT imaging systems has not been evaluated. As such, COVIDNet-CT could be improved via additional training on a more diverse dataset.

Next, we compare the performance of the proposed COVIDNet-CT with existing deep neural network architectures for the task of COVID-19 detection from chest CT images. More specifically, we compare it with the deep residual network architecture proposed in [35] (referred to here as ResNet-50), which is capable of achieving high accuracy, sensitivity, and PPV on the proposed COVIDx-CT benchmark dataset. It can be observed from Table 1 that COVIDNet-CT achieves a test accuracy 0.4% higher than that achieved with the ResNet-50 architecture while having 94.1% fewer parameters and 90.2% fewer FLOPs. Moreover, as shown in Table 2 and Table 3 respectively, COVIDNet-CT achieves higher sensitivity and PPV than the ResNet-50 architecture across all infection types. These results highlight the benefits of leveraging machine-driven design exploration to create deep neural network architectures tailored to the task, data, and operational requirements. This is particularly relevant in clinical scenarios, as the ability to rapidly build and evaluate new deep neural network architectures is critical in order to adapt to changing data dynamics and operational requirements.

In this study, we leveraged GSInquire [39] to perform explainability-driven performance validation of COVIDNet-CT in order to better understand its decision-making behaviour, and to ensure that its decisions are based on diagnostically-relevant imaging features rather than irrelevant visual indicators. Figure 7 shows the critical factors identified by GSInquire in three chest CT images of patients with COVID-19 pneumonia.
Examining these visual interpretations, we observe that COVIDNet-CT primarily leverages abnormalities within the lungs in the chest CT images to identify COVID-19 cases, as well as to differentiate these cases from non-COVID-19 pneumonia cases. As previously mentioned, our initial experiments yielded deep neural networks that were found via explainability-driven performance validation to be basing their detection decisions on irrelevant indicators such as patient tables and imaging artifacts, which highlights the importance of leveraging explainability methods when building and evaluating deep neural networks for clinical applications. Furthermore, the ability to interpret how COVIDNet-CT detects COVID-19 cases may help clinicians trust its predictions, and may also help clinicians discover novel visual indicators of COVID-19 infection which could be leveraged in manual screening via CT imaging.

In this study, we introduced COVIDNet-CT, a deep convolutional neural network architecture tailored for detection of COVID-19 cases from chest CT images via machine-driven design exploration. Additionally, we introduced COVIDx-CT, a benchmark CT image dataset consisting of 104,009 chest CT images across 1,489 patients. We quantitatively evaluated COVIDNet-CT using the COVIDx-CT test dataset in terms of accuracy, sensitivity, and PPV. Furthermore, we analysed the predictions of COVIDNet-CT via explainability-driven performance validation to ensure that its predictions are based on relevant image features and to better understand the CT image features associated with COVID-19 infection, which may aid clinicians in CT-based screening. In our analyses, we observed that COVIDNet-CT is highly performant when tested on the COVIDx-CT test dataset, and that abnormalities in the lungs are leveraged by COVIDNet-CT in its decision-making process.
While COVIDNet-CT is not yet suitable for clinical use, we publicly released COVIDNet-CT and instructions for constructing the COVIDx-CT dataset as part of the COVID-Net open initiative in order to encourage broad usage and improvement by the research community. In the future, the performance and generalizability of COVIDNet-CT may be improved by expanding and diversifying the COVIDx-CT dataset, and COVIDNet-CT may also be extended to additional clinical tasks such as mortality risk stratification, lung function analysis, COVID-19 case triaging, and treatment planning. However, the ability to build solutions for these tasks is contingent on the availability of high-quality datasets. Finally, additional analysis of the explainability results may be performed in the future to identify key patterns in the CT images which may aid clinicians in manual screening.

Detection of SARS-CoV-2 in different types of clinical specimens
Evaluating the accuracy of different respiratory specimens in the laboratory diagnosis and monitoring the viral shedding of 2019-nCoV infections
Stability issues of RT-PCR testing of SARS-CoV-2 for hospitalized patients clinically diagnosed with COVID-19
Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (COVID-19) in China: A report of 1014 cases
Sensitivity of chest CT for COVID-19: Comparison to RT-PCR
Chest CT for typical coronavirus disease 2019 (COVID-19) pneumonia: Relationship to negative RT-PCR testing
ACR recommendations for the use of chest radiography and computed tomography (CT) for suspected COVID-19 infection
Pulmonary pathology of early-phase 2019 novel coronavirus (COVID-19) pneumonia in two patients with lung cancer
The role of chest computed tomography in asymptomatic patients of positive coronavirus disease 2019: A case and literature review
Chest CT imaging of an early Canadian case of COVID-19 in a 28-year-old man
Clinical characteristics of coronavirus disease 2019 in China
Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China
CT imaging features of 2019 novel coronavirus (2019-nCoV)
Time course of lung changes at chest CT during recovery from coronavirus disease 2019 (COVID-19)
Performance of radiologists in differentiating COVID-19 from non-COVID-19 viral pneumonia at chest CT
Artificial intelligence-enabled rapid diagnosis of patients with COVID-19
Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography
COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images
COVIDNet-S: Towards computer-aided severity assessment via training and validation of deep neural networks for geographic extent and opacity extent scoring of chest X-rays for SARS-CoV-2 lung disease severity
AI augmentation of radiologist performance in distinguishing COVID-19 from pneumonia of other etiology on chest CT
Using artificial intelligence to detect COVID-19 and community-acquired pneumonia based on pulmonary CT: Evaluation of the diagnostic accuracy
Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: Results of 10 convolutional neural networks
Diagnosis of COVID-19 using CT scan images and deep learning techniques
Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution
Deep learning-based detection for COVID-19 from chest CT using weak label
AI-assisted CT imaging analysis for COVID-19 screening: Building and deploying a medical AI system in four weeks
Development and evaluation of an AI system for COVID-19 diagnosis
Deep learning enables accurate diagnosis of novel coronavirus (COVID-19) with CT images
A deep learning algorithm using CT images to screen for corona virus disease
Grad-CAM: Visual explanations from deep networks via gradient-based localization
FermiNets: Learning generative machines to generate efficient neural networks via generative synthesis
NetScore: Towards universal metrics for large-scale performance analysis of deep neural networks for practical usage
Deep residual learning for image recognition
Identity mappings in deep residual networks
ImageNet: A large-scale hierarchical image database
On the momentum term in gradient descent learning algorithms
TensorFlow: Large-scale machine learning on heterogeneous systems
Do explanations reflect decisions? A machine-centric strategy to quantify the performance of explainability algorithms

We would like to thank the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada Research Chairs program, CIFAR, DarwinAI Corp., NVIDIA Corp., and Hewlett Packard Enterprise Co.

Author contributions statement: H.G. and A.W. conceived the experiment; H.G., L.W., and A.W. conducted the experiment; H.G. and A.W. analysed the results. All authors reviewed the manuscript.

Competing interests: L.W. and A.W. are affiliated with DarwinAI Corp.

Ethics Approval: The study has received ethics clearance from the University of Waterloo (42235).