TorchEsegeta: Framework for Interpretability and Explainability of Image-based Deep Learning Models
Chatterjee, Soumick; Das, Arnab; Mandal, Chirag; Mukhopadhyay, Budhaditya; Vipinraj, Manish; Shukla, Aniruddh; Rao, Rajatha Nagaraja; Sarasaen, Chompunuch; Speck, Oliver; Nürnberger, Andreas
2021-10-16, arXiv:2110.08429v2 [cs.CV]

Clinicians are often very sceptical about applying automatic image processing approaches, especially deep learning based methods, in practice. One main reason for this is the black-box nature of these approaches and the inherent problem of missing insight into the automatically derived decisions. In order to increase trust in these methods, this paper presents approaches that help to interpret and explain the results of deep learning algorithms by depicting the anatomical areas that influence the decision of the algorithm most. Moreover, this research presents a unified framework, TorchEsegeta, for applying various interpretability and explainability techniques to deep learning models and generating visual interpretations and explanations that clinicians can use to corroborate their clinical findings. In addition, this will aid in gaining confidence in such methods. The framework builds on existing interpretability and explainability techniques that currently focus on classification models, extending them to segmentation tasks. In addition, these methods have been adapted to 3D models for volumetric analysis. The proposed framework provides methods to quantitatively compare visual explanations using infidelity and sensitivity metrics. This framework can be used by data scientists to perform post-hoc interpretations and explanations of their models, develop more explainable tools and present the findings to clinicians to increase their faith in such models. The proposed framework was evaluated based on a use case scenario of vessel segmentation models trained on Time-of-Flight (TOF) Magnetic Resonance Angiogram (MRA) images of the human brain. Quantitative and qualitative results of a comparative study of different models and interpretability methods are presented. Furthermore, this paper provides an extensive overview of several existing interpretability and explainability methods.

The use of artificial intelligence is widely prevalent today in medical image analysis. It is, however, imperative to do away with the black-box nature of deep learning techniques to gain the trust of radiologists and clinicians, as a model's erroneous output might have a high impact in the medical domain. Interpretability vs Explainability: Interpretability and explainability methods aim to unravel the black-box nature of decision-making in machine learning and deep learning models. Explainability focuses on explaining the internal working mechanisms of the model. In contrast, interpretability focuses on observing the effects of changes in model parameters and inputs on the model prediction, hence attributing properties to the input; it also requires a context, considering the audience who is to interpret the properties and characteristics of the model outcomes [1, 2]. It is to be noted that earlier work failed to draw a clear line of distinction between interpretability and explainability, as these terms are subjective and pertain to the stakeholders, considering the audience who must understand the model and its outcomes.
Interpretability and explainability nurture a sense of human-machine trust, as they help the users of machine/deep-learning models understand how certain decisions are made by the model, and they are not limited to statistical metric-based approaches such as accuracy or precision. This, in turn, allows employing such models in mission-critical problems such as medicine, autonomous driving, legal systems, banking, etc. The basis of measuring model transparency is said to comprise simulatability, decomposability, and algorithmic transparency [2]; and model functionality, which consists of textual descriptions of the model's output and visualisations of the model's parameters. The goals to be attained through interpretability of models [1] are trust, reliability, robustness, fairness, privacy, and causality. Explainability can be formulated as the explanation of the decisions made internally by the model, which in turn generate the observable, external conclusions arrived at by the model. This promotes human understanding of the model's internal mechanisms and rationale, which in turn helps build user trust in the model [3]. The evaluation criteria for model explainability include comprehensibility by humans, fidelity and scalability of the extracted representations, and scalability and generality of the method [4]. Some models are transparent and hence inherently understandable, such as Decision Trees, K-Nearest Neighbours, Rule-based and Bayesian models. There are even some deep learning based models, like GP-UNet [5] and CA-Net [6], which inherently try to provide an explanation concerning their outcome. However, other models are opaque and require post-hoc explanations. A post-hoc model explanation can be obtained through model-agnostic and model-specific techniques that aim to explain a black-box model in human-understandable terms. In 2016, the European Union also incorporated transparency and accountability of models as criteria in the ethical guidelines for trustworthy AI [7, 8]. Through interpretability and explainability, to some degree, the following goals can be achieved: trustworthiness, causality, confidence, fairness, informativeness, transferability and interactivity, which in turn can help improve the model [7]. Data scientists can generate interpretability-explainability results for their opaque models and present the model's reasoning or the model's inner mechanism to the domain experts (e.g. clinicians). If the reasoning is the same as a human domain expert would have applied, then the experts can have more faith in that model, which in turn implies that the model can be incorporated into real-life workflows. Apart from building trust in the models, interpretability and explainability techniques can be used by the data scientists to improve their models as well, by discussing the outcomes with the domain experts and improving the model based on expert feedback. Moreover, accurate (expert-verified) interpretability-explainability results can be used as part of automatic or semi-automatic teaching programmes for trainees. For classification models, there are multiple interpretability and explainability techniques supported by various libraries, such as Captum, TorchRay [9] and the CNN Visualization library, and many of these techniques are introduced in the following section of this paper. However, this is more challenging for segmentation, as the output is more complex than just a simple class prediction.
In this contribution, various interpretability and explainability techniques used for classification models have been adapted to work with segmentation models. By applying the interpretability techniques, the essential features or areas of the input image on which the model's output is critically based can be visualised. By applying explainability techniques, a better understanding of the knowledge represented in the model's parameters can be achieved. This paper proposes a unified, flexible and scalable interpretability-explainability pipeline for PyTorch, TorchEsegeta, which leverages post-hoc interpretability and explainability methods and can be applied to 2D or 3D deep learning models working with images. Apart from implementing the existing methods for classification models, this research extends them for segmentation models. This pipeline can be applied to trained models with little to no modification, using either a graphical user interface for easy access or directly using a Python script. It also provides an easy platform for incorporating new interpretability or explainability techniques. Moreover, this pipeline provides features to evaluate the interpretability and explainability methods in two different ways: first, using cascading randomisation of the model's weights, which can help evaluate how much the results depend on the learned weights; and second, using quantitative metrics: infidelity and sensitivity. Finally, the pipeline was applied to models trained to segment vessels from magnetic resonance angiograms (MRA), and the interpretability results are shown here. Furthermore, apart from the technical contributions with the TorchEsegeta pipeline, this paper also provides a comprehensive overview of several post-hoc interpretability and explainability techniques. This paper presents the TorchEsegeta framework, which integrates various interpretability and explainability techniques available in different libraries and extends these techniques for segmentation models. It is noteworthy that the development of this pipeline started with the exploration of various interpretability techniques for classifying COVID-19 and other types of pneumonia [10]. An initial pipeline was developed under that project for classification models, but only for 2D images. This research further extends that to 3D volumetric images, as well as to segmentation models. Apart from incorporating these features, the original pipeline was further streamlined and improved during this research, to create the first version of TorchEsegeta. The following interpretability and explainability libraries have been explored in this research work: Captum, CNN Visualisation, TorchRay, DeepDream, Lucent, LIME and SHAP. Captum is a library built on PyTorch and is used to provide the interpretability of machine/deep-learning models. It provides many algorithms that evaluate the contribution of different features to a model's prediction and thus helps in improving the model. The CNN Visualisations library [11] provides implementations of different interpretability techniques and visualisations for CNN-based model architectures. TorchRay is a package that provides several visualisation methods for convolutional neural network architectures using PyTorch. It focuses on interpretability, wherein it attempts to determine which regions of the input image influence the final prediction made by the model [9].
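To give a concrete impression of how such attribution libraries are typically invoked, the following minimal sketch applies Captum's Integrated Gradients to a small placeholder PyTorch classification model; the model, input size and target class are illustrative assumptions and not part of the TorchEsegeta code.

```python
import torch
from captum.attr import IntegratedGradients

# Placeholder classification model: any torch.nn.Module returning (N, num_classes)
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(64 * 64, 2),
)
model.eval()

inputs = torch.rand(1, 1, 64, 64, requires_grad=True)   # dummy 2D image
baseline = torch.zeros_like(inputs)                      # all-zero reference image

ig = IntegratedGradients(model)
# Attribution of every input pixel towards the score of class 1
attributions, delta = ig.attribute(
    inputs, baselines=baseline, target=1,
    n_steps=50, return_convergence_delta=True,
)
print(attributions.shape, delta)
```

Within TorchEsegeta, calls of this kind are wrapped behind the configurable pipeline described later, so that the same attribution method can be applied to 2D or 3D models without writing such boilerplate by hand.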
LIME (Local Interpretable Model-agnostic Explanations) is a technique that provides post-hoc explanations for the decisions of a deep learning model, and due to its model-agnostic nature, LIME can flexibly explain any unknown model [7, 8, 12]. SHAP (SHapley Additive exPlanation) is a post-hoc, game-theoretical approach that computes Shapley values in order to explain the prediction made by the deep learning model [13]. Lucent is the PyTorch implementation of Lucid for the explainability of deep learning models. It aims to explain the decision made by the deep neural network by explaining what is being learnt by the various layers of the network. DeepDream is also an explainability technique that is used to visualise the parameters learnt by the convolutional neural network. Interpretability techniques help to understand the focus area of a model and can thereby help to understand the reasoning done by the model. These techniques can be categorised into two groups: model attribution and layer attribution. Model attribution techniques assess the contribution of each attribute to the prediction of the model. A long list of methods is available in the literature under the rubric of model attribution techniques; hence, for better understanding, they are further divided here into two groups: feature-based and gradient-based. Feature-based Techniques: Methods under this group bring into play the input and/or output feature space to compute local or global model attributions. (a) Feature Permutation - This is a perturbation based technique [14, 15] in which the value of an input or group of inputs is changed by randomly permuting values within a batch of inputs and calculating the corresponding change in output. Hence, meaningful feature attributions are calculated only when a batch of inputs is provided. As all possible permutations are considered, this technique is computationally expensive; this can be overcome by sampling the permutations and averaging their marginal contributions instead of considering all possible permutations, as in the ApproShapley sampling technique [16]. (c) Feature Ablation - This method works by replacing an input, or a group of inputs, with another value defined by a range or reference value, after which the feature attribution of the input or group of inputs (such as a segment) is computed. This method is also perturbation based. (d) Occlusion - Similar to the Feature Ablation method, Occlusion [17] works as a perturbation based approach wherein the continuous inputs in a rectangular area are replaced by a value defined by a range or a reference value. Using this approach, the change in the corresponding outputs is calculated in order to find the attribution of the feature. (e) RISE (Randomised Input Sampling for Explanation of Black-box Models) - It generates an importance map indicating how salient each pixel is for the model's prediction [18]. RISE works on black-box models since it does not consider gradients while making the computations. In this approach, the model's outputs are tested by masking the inputs randomly and calculating the importance of the features. (f) Extremal Perturbations - These are regions of an image that maximally affect the activation of a certain neuron in a neural network for a given area in an image [9]. Extremal Perturbations lead to the largest change in the prediction of the deep neural network when compared to other perturbations of the chosen area a.
The paper also introduces an area loss to enforce the restrictions while choosing the perturbations. (g) Score-weighted Class Activation (Score CAM) - It is a gradient-independent interpretability technique based on class activation mapping [19]. Activation maps are first extracted, each activation map then works as a mask on the original image, and its forward-passing score on the target class is obtained. Finally, the result is generated by the linear combination of score-based weights and activation maps. Given a convolutional layer l, class c, number of channels k and activations A, the result is L^c_Score-CAM = ReLU(Σ_k α^c_k A^k_l), where α^c_k is the score for class c obtained by forward-passing the input masked with the activation map A^k_l. Gradient-based Techniques: This group of methods mainly uses the model's parameter space to generate the attribution maps, which are calculated with the help of the gradients. (a) Saliency - It was initially designed for visualising the image classification done by convolutional networks and the saliency map for a specific image [20]. It is used to check the influence of each pixel of the input image in assigning a final class score to the image I using the linear score model S_c(I) = w_c^T I + b_c for weights w_c and bias b_c. (b) Guided Backpropagation - In this approach, during the backpropagation, the gradients of the outputs are computed with respect to the inputs [21]. The ReLU activation function is applied to the input gradients, and direct backpropagation is performed, ensuring that negative gradients are not backpropagated. (c) Deconvolution - This approach is similar to the guided backpropagation approach, but it applies the ReLU to the output gradients instead of the input gradients and performs direct backpropagation [17]. Similarly, the ReLU backpropagation is overridden to allow only non-negative gradients to be backpropagated. (d) Input X Gradient - It extends the saliency approach such that the contribution of each input to the final prediction is checked by multiplying the gradients of the outputs with their corresponding inputs, in the setting of a system of linear equations AX = B, where A represents the gradients and B the calculated final contribution of the input X [22]. (e) Integrated Gradients - This technique [23] calculates the path integral of the gradients along the straight-line path from the baseline x' to the input x. It satisfies the axioms of completeness, i.e. the attributions must account for the difference in output between the baseline x' and the input x; sensitivity, i.e. a non-zero attribution must be provided even to inputs that flatten the prediction function where the input differs from the baseline; and implementation invariance of gradients, i.e. two functionally equivalent networks must have identical feature attributions. It requires no instrumentation of the deep neural network for its application. All the gradients along the straight-line path from x' to x are integrated along the i-th dimension as IntegratedGrads_i(x) = (x_i − x'_i) · ∫_{α=0}^{1} ∂F(x' + α(x − x')) / ∂x_i dα. (f) Grad Times Image - In this technique [24], the gradients are multiplied with the image itself. It is a predecessor of the DeepLift method, in which the activation of each neuron for a given input is compared to its reference activation, and contribution scores are assigned according to the difference for each neuron. (g) DeepLift - It is a method [25] that considers not only the positive but also the negative contribution scores of each neuron, based on its activation with respect to its reference activation. The difference in the output of the activation function of the neuron t under observation is calculated based on the difference in input with respect to a reference input.
Contribution scores C_{Δx_i Δt} are assigned to each of the preceding neurons x_1, ..., x_n that influence t, and they sum up to the difference in t's activation: Σ_i C_{Δx_i Δt} = Δt. (h) DeepLiftShap - This method extends the DeepLift method and computes the SHAP values on an equivalent, linear approximation of the model [13]. It assumes the independence of the input features. For a model f, the effective linearisation from SHAP is computed for each component. (i) GradientShap - This technique also assumes the independence of the inputs and computes the game-theoretic SHAP values of the gradients on the linear approximation of the model [13]. Gaussian noise is added to randomly selected points, and the gradients of their corresponding outputs are computed. (j) Guided GradCAM - The Gradient-weighted Class Activation Mapping approach provides a means to visualise the regions of an input image that are predominant in influencing the predictions made by the model. This is done by visualising the gradient information pertaining to a specific class in any layer of the user's choice. It can be applied to any CNN-based model architecture. The guided backpropagation and GradCAM approaches are combined by computing the element-wise product of the guided backpropagation attributions with the upsampled GradCAM attributions [26]. (k) Grad-CAM++ - Generalised Gradient-based Visual Explanations for Deep Convolutional Networks is a method [27] that claims to provide better visual explanations than Grad-CAM and other state-of-the-art approaches for object localisation and for explaining occurrences of multiple object instances in a single image. This technique uses a weighted combination of the positive partial derivatives of the last convolutional layer's feature maps with respect to a specific class score as weights to generate a visual explanation for the corresponding class label. The importance w^c_k of an activation map A^k for the class score Y^c is given by w^c_k = Σ_{i,j} α^{kc}_{ij} · ReLU(∂Y^c/∂A^k_{ij}), where α^{kc}_{ij} are pixel-wise weighting coefficients. Grad-CAM++ provides human-interpretable visual explanations for a given CNN architecture across multiple tasks, including classification, image caption generation and 3D action recognition. (l) Smooth Grad - In this method [19, 28], random Gaussian noise N(0, σ²) is added to the given input image, and the corresponding gradients are computed to obtain the average saliency map over n image samples: M̂_c(x) = (1/n) Σ_{i=1}^{n} M_c(x + g_i), with g_i ~ N(0, σ²). Vanilla and guided backpropagation techniques can be used to calculate the gradients. Layer Attribution methods evaluate the effect of each individual neuron in a particular layer on the model output. Various types of layer attribution methods have been explored as part of this research work. Perturbation-based approaches [19] perturb the original input and observe the change in the prediction of the model, and are highly time-consuming. 1. Inverted Representation - The aim of this technique is to reconstruct the given input image after a number of n target layers. An inverse of the given input image representation is computed [29] in order to find an image x whose representation Φ(x) best matches the representation Φ_0 of the input image x_0 while minimising a given loss l. 2. Layer Activation with Guided Backpropagation - This method [30] is quite similar to guided backpropagation, but instead of guiding the signal from the last layer and a specific target, it guides the signal from a specific layer and filter. The guided backpropagation method adds an additional guidance signal from the higher layers to the usual backpropagation.
3. Layer DeepLift - This is the DeepLift method as mentioned in the model attribution techniques [13], but applied with respect to the particular hidden layer in question. 4. Layer DeepLiftShap - This method is similar to the DeepLiftShap technique mentioned in the model attribution techniques [13], applied for a particular layer. The original distribution of baselines is taken, the attribution for each input-baseline pair is calculated using the Layer DeepLift method, and the resulting attributions are averaged per input example, assuming model linearity. 5. Layer GradCam - The GradCam attribution for a given layer is provided by this method [26]. The target output's gradients are computed with respect to the given layer. The resulting gradients are averaged for each output channel (dimension 2 of the output). The average gradient for each channel is then multiplied by the layer activations, and the results are summed over all the channels. For the class feature weights w, global average pooling is performed over the gradients of the feature maps A, such that w^c_k = (1/Z) Σ_{i,j} ∂Y^c/∂A^k_{ij}. 6. Layer GradientShap - This is analogous to the GradientShap [13] method as mentioned in the model attribution techniques, but applied for a particular layer. Layer GradientShap adds Gaussian noise to each input sample multiple times, selects a random point along the path between baseline and input, and computes the gradient of the output with respect to the identified layer. The final SHAP values approximate the expected value of gradients * (layer activation of inputs - layer activation of baselines). 7. Layer Conductance - This method [31] provides the conductance of all neurons of a particular hidden layer. The conductance of a particular hidden unit refers to the flow of Integrated Gradients attribution through this hidden unit. The main idea behind this method is to decompose the computation of the Integrated Gradients via the chain rule. One property that this method of conductance satisfies is completeness: the conductances of a particular layer add up to the prediction difference F(x) − F(x') for input x and baseline input x'. Other properties satisfied by this method are linearity and insensitivity. 8. Internal Influence - This method calculates the contribution of a layer to the final prediction of the model by integrating the gradients with respect to the particular layer under observation. The influence of an element j on the internal representation is defined by [32] for a slice s = ⟨g, h⟩ of the network, where the slice s is represented as a function of the tuple ⟨g, h⟩. It is similar to the integrated gradients approach, except that the gradients are integrated with respect to the layer instead of the input. 9. Contrastive Excitation Backpropagation/Excitation Backpropagation - This approach is used to generate and visualise task-specific attention maps. The Excitation Backprop method, as proposed by [33], passes top-down signals downwards in the network hierarchy via a probabilistic Winner-Take-All process, wherein the most relevant neurons in the network are identified for a given top-down signal. Both top-down and bottom-up information are integrated to compute the winning probability of each neuron, defined in [33] in terms of the input x and the weight matrix w. Contrastive excitation backpropagation is used to make the top-down attention maps more discriminative. 10. Layer Activation - It computes the activation of a particular layer for a particular input image [34].
It helps to understand how a given layer reacts to an input image. One can get a good idea of which parts or features of the image a particular layer looks at. 11. Linear Approximator - This is a technique to overcome inconsistencies of post-hoc model interpretation; the linear approximator combines a piecewise linear component and a nonlinear component [9, 35]. The piecewise linear component describes the explicit feature contributions by piecewise linear approximation, which increases the expressiveness of the deep neural network. The nonlinear component uses a multi-layer perceptron to capture feature interactions and implicit nonlinearity, which in turn increases the prediction performance. Here, interpretability is obtained, once the model is learned, in the form of feature shapes, while retaining high accuracy. 12. Layer Gradient X Activation - This method [36] computes the element-wise product of a given layer's gradient and activation. It is a combination of the gradient and activation methods of layer attribution. The output attributions are returned as a tuple if the layer input/output contains multiple tensors; otherwise, a single tensor is returned. Explainability techniques aim to unravel the internal working mechanism of a model; they try to explain the knowledge represented inside the model's parameters. As the network is trained using many examples, it is essential to check what has been learnt from the input image. Hence, visualising what has been learnt at different layers of the CNN based network gives rise to repetitive patterns at different levels of abstraction, enabling interpretation of what has been learnt by the layers. This visualisation can be realised using DeepDream. Given an arbitrary input image, any layer from the network can be picked, and the detected features at that layer can be enhanced. It is observed that the initial layers are sensitive to basic features in the input images, such as edges, while the deeper layers identify complex features in the input image. As an example, InceptionNet was trained on animal images, and it was observed that the different layers of the network could interpret an interesting remix of the learnt animal features in any given image. Local Interpretable Model-agnostic Explanations (LIME) is a technique for providing post-hoc model explanations. LIME constructs an approximated surrogate, a locally linear model or decision tree, of a given complex model that helps to explain the decisions of the original model, and due to its model-agnostic nature, it can be employed to explain any model even when its architecture is unknown [7, 8, 12]. LIME also performs a transformation of the input features to obtain a representation that is interpretable to humans [4]. In the original work [37], the authors propose LIME, an algorithm that provides explanations for a model f that are locally faithful in the locality Π_x, for an individual prediction of the model, such that users can ensure that they can trust the prediction before acting on it. LIME provides an explanation for f in the form of a model g ∈ G, where G is the set of all possible interpretable models, and Ω(g), the complexity of the interpretable model, is minimised. L(f, g, Π_x) is the fidelity function that measures the unfaithfulness of g in estimating f in the locality Π_x.
Hence, the explanation provided by LIME is [37]: ξ(x) = argmin_{g ∈ G} L(f, g, Π_x) + Ω(g). SHapley Additive exPlanation (SHAP) [13] is similar to LIME in that it approximates an interpretable explanation model g of the original, complex model f in order to explain a prediction f(x) made by the model. SHAP provides post-hoc model explanations for an individual output and is model-agnostic. It calculates the contribution of each feature in producing the final prediction and is based on principles of game theory such as Shapley values [4]. SHAP works on simplified, interpretable input data x', which is mapped to the original input data such that x = h_x(x'). SHAP must satisfy the consistency theorems [13]: local accuracy, such that g(x') matches f(x) when x = h_x(x'), and missingness, such that if an attribute's value is 0 in the simplified input, its corresponding contribution to the prediction is also 0. Hence, the contribution of each attribute can be calculated as follows [13]: φ_i(f, x) = Σ_{z' ⊆ x'} [ |z'|! (M − |z'| − 1)! / M! ] · [ f_x(z') − f_x(z' \ i) ], where z' ∈ {0, 1}^M, |z'| is the number of non-zero entries in z', and z' ⊆ x' represents all z' vectors whose non-zero entries are a subset of the non-zero entries in x'. Based on Lucid, Lucent is a library implemented in PyTorch for the explainability of deep learning models. The implementation provides optvis, which is the main framework for providing the visualisations of the parameters learnt by the different layers of the deep neural network. It can be used to visualise torchvision models with no setup overhead. Activation atlas methods, feature visualisation methods, building blocks and differentiable image parameterisations can be used to visualise the different features learnt by the network. Activation atlas methods comprise different methods that show the network activations for a particular class or the average activations in a grid cell of the image. Feature visualisation methods help to understand the crucial features for a neuron, an entire channel, or a layer. Building blocks help to visualise the activation vector and its components for a given image. Differentiable image parameterisations find the types of image generation processes that can be backpropagated through, which helps to perform appropriate preconditioning of the input image to improve the optimisation of the neural network. The pipeline architecture is shown in Figure 1. The pipeline is implemented in keeping with object-oriented methodology. One of the key features of the pipeline is that it is easily scalable according to the customised needs of the end-user because of the JSON-based configuration. The pipeline is plug-and-play software and can be easily used by the end-users. This pipeline is publicly available on GitHub. 3D input data is especially important for medical images like MRI and CT scans; hence, the framework has been built to support both 2D and 3D input data. One of the most important features of the pipeline is scalability: new methods can be added to the pipeline seamlessly by the users and used as per requirement. The JSON-based configuration allows the user to execute the pipeline according to their specific needs and is easy to use. The logging facility allows the users to track the pipeline execution and make changes in case of any error caused by wrong parameter usage. The timeout facility makes sure that no method execution exceeds a certain time threshold. That way, it saves both computation resources and valuable time for the end-user.
Another way of saving valuable time for the user is the multi-GPU support, which enables the parallel execution of multiple methods. The multi-threading feature enables the execution of multiple methods on the same GPU, allowing the users to make optimum use of resources. Patch-based models are supported in the pipeline, which helps in running computationally intensive methods; existing interpretability and explainability methods do not offer the option of being used directly on patch-based models. An Automatic Mixed Precision [38] facility is also incorporated in the framework, aiding faster execution of the code. Due to the use of this technique, the overall memory consumption while executing the methods is greatly reduced, and the execution time is reduced as a result. The interpretability techniques that are publicly available are essentially designed for classification problems. The framework extends them to the segmentation problem with the help of a task-specific wrapper functionality, offering two variants: 1. Pixel-wise multi-class classification and 2. Threshold-based pixel classification (a minimal illustrative sketch of the pixel-wise wrapper is shown further below). In the pixel-wise multi-class classification method, the pixel scores are summed up for all pixels predicted as each class. Two main steps are performed: a. Pixel-wise Class Assignment - for every pixel of the predicted image y, the argmax class is computed. b. Final Output Calculation - the pixel scores of all pixels predicted as each class are aggregated into one value per class: Out_m = count{ y_ij : y_ij ∈ class m } (16), where 0 ≤ m ≤ N_c and N_c is the number of classes. The threshold-based pixel classification method performs class identification by Otsu thresholding and then sums up the pixels for each class. This task is also performed in two steps: a. Normalisation - the input image is first normalised to obtain y_norm. b. Pixel-wise binarisation - the pixel-wise binarisation is performed with the help of Otsu thresholding, using the threshold th = otsu(y_norm). The output of both processes is a tensor with a single value for each class. A Graphical User Interface (GUI) has been created to provide the end-user with an intuitive graphical layout to select the parameters and interpretability methods according to their requirements. The first screen, shown in Figure 2, provides the user with a dialogue box for parameter selection. The user can choose the model nature, model name, dataset and many more run-time parameters. Once these selections are made, the 'Select Methods' button will display the following dialogue box, as shown in Figure 3. The users have a wide range of interpretability methods to choose from in this dialogue box. Once the interpretability methods are chosen, the method-specific parameter dialogue box is displayed to the user, as in Figure 4. The users can then choose the visualisation method, device id and many other parameters. On clicking the 'Next' button, the code will be executed in the back-end, and the output will be generated in the output path specified in Figure 2. In order to evaluate and compare the methods, both qualitative and quantitative aspects have been considered. For qualitative or visual evaluation, the cascading randomisation [39] technique was implemented, in which the model weights are randomised successively, from the top to the bottom layers. The learned weights of the layers are thereby destroyed from top to bottom. While doing so, the interpretability and explainability techniques are applied to each state of randomisation.
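As referenced above, the following is a minimal, hypothetical sketch of the pixel-wise multi-class wrapper idea: it reduces a segmentation output to one scalar score per class so that classification-oriented attribution methods can be applied unchanged. It is an illustration under stated assumptions (a generic PyTorch segmentation model and summed per-class scores), not the exact TorchEsegeta implementation.

```python
import torch
import torch.nn as nn

class PixelwiseClassScoreWrapper(nn.Module):
    """Sketch of a task-specific wrapper: turns a segmentation model with
    output shape (N, C, ...) into a 'classifier' with output shape (N, C),
    so that attribution methods built for classification can be reused."""

    def __init__(self, seg_model: nn.Module):
        super().__init__()
        self.seg_model = seg_model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.seg_model(x)                 # (N, C, H, W) or (N, C, D, H, W)
        # Step a: pixel-wise class assignment via argmax (detached mask)
        pred = logits.argmax(dim=1, keepdim=True)  # (N, 1, ...)
        class_scores = []
        for c in range(logits.shape[1]):
            mask = (pred == c).float()
            # Step b: aggregate the class-c scores over the pixels assigned to class c
            # (Eq. 16 counts these pixels; summing the scores instead keeps the
            #  output differentiable for gradient-based attribution methods)
            class_scores.append((logits[:, c:c + 1] * mask).flatten(1).sum(dim=1))
        return torch.stack(class_scores, dim=1)    # (N, C): one value per class

# Usage sketch (names are placeholders):
# wrapped = PixelwiseClassScoreWrapper(unet_mss)
# attributions = Saliency(wrapped).attribute(volume, target=1)  # e.g. class 1 = vessel
```

With such a wrapper in place, the model attribution and layer attribution methods listed earlier can be applied to segmentation networks in the same way as to classifiers.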
If the interpretability-explainability results depend upon the model's weights, as they are supposed to, then their quality will decrease as the amount of randomisation increases. If they do not, then the results will be unaffected by the randomisation, and hence the technique is either not accurate or unsuitable for the current model. For the quantitative evaluation, the uncertainty approach was chosen due to the unavailability of ground truth data for the attribution images. Two metrics have been used for computing the uncertainty values of the attribution methods: Infidelity [40] and Sensitivity [40]. These metrics can currently be used only on model attribution methods. It is to be noted that the quantitative metrics can be used on top of cascading randomisation as an additional level of evaluation. Infidelity represents the expected mean-squared error between the explanation multiplied by a meaningful input perturbation and the difference between the predictor function at its input and at the perturbed input. Sensitivity measures the extent of explanation change when the input is slightly perturbed. To evaluate the TorchEsegeta pipeline for segmentation models, a use case of vessel segmentation was chosen. The segmentation network models chosen for this use case are from the DS6 paper [41]. The models are UNet, UNet MSS and UNet MSS with Deformation. The difference between the UNetMSS and UNetMSS with Deformation is a change in the number of up-sampling and down-sampling layers and a modified activation function. UNetMSS performs downsampling five times using convolution striding and uses transposed convolution for upsampling, and it includes instance normalisation and Leaky ReLU in the convolution blocks. On the contrary, the modified version applies four down-sampling layers using max-pooling and performs upsampling using interpolation, combined with batch normalisation and ReLU in its convolution blocks. For UNetMSS with Deformation, a small amount of variable elastic deformation is added at training time to each input volume. The authors have given a comprehensive overview of the method, and based on their results, one can see that UNetMSS with Deformation is the best performing model. As a use case study, the previously mentioned interpretability and explainability techniques were explored while analysing the results of a vessel segmentation model trained on Time-of-Flight (TOF) Magnetic Resonance Angiogram (MRA) images of the human brain, called DS6 [41]. The model automatically segments vessels from 3D 7 Tesla TOF-MRA images. A study of the attribution outputs of the different methods across the three models was conducted. The zoomed-in portions of the attribution images show the model's focus areas even more closely. Figure 5 shows examples of interpretability techniques compared across the three models. Figure 6 portrays similar interpretability results from the U-Net MSS Deformation model. By observing the interpretability results, it can be understood where the network is focusing. In production, ground-truth segmentations are not available. By looking at the interpretability results, it might be possible to estimate in which regions the network might mispredict, by looking at the areas on which the network did not adequately focus. These interpretability results can assist radiologists in building trust in models that focus on the correct regions.
On the other hand, a model developer can benefit from looking at the wrong focus regions or regions with no focus, knowledge that can be further used to improve the model architecture or the training process. In all of these attribution maps, pixels highlighted in red represent high activation, indicating the most important pixels in the input according to the respective method, while the other regions represent low to no activation. In Figure 5, the CNN Visualization Layer Activation Guided Backpropagation attribution map shows a lot of attributions, most of which are on parts other than the brain vessels, i.e. outside the region of interest. The respective zoomed images also confirm this observation. On the contrary, methods such as Captum DeepLift and TorchRay Gradient provide fewer attributions and miss out on major portions of the brain vessels. TorchRay Deconvolution provides comparatively more attributions; however, most of the attributions are concentrated on certain areas of the brain. CNN Visualization Vanilla Backpropagation, Captum Deconvolution and Captum Saliency provide better results compared to the others, yielding more human-interpretable results. These methods mainly attribute to the region of interest. However, a look at the respective zoomed images reveals that for Captum Saliency, there are attributions even in the areas surrounding the brain vessels. For the other two, the attributions are mainly on the brain vessels only. Figure 6 shows a comparison of some of the better performing interpretability methods for the U-Net MSS Deformation model. In Figure 7 and Figure 8, the layer-wise attributions for the three models are shown for the methods Excitation Backpropagation, Layer Conductance, Layer DeepLift and Layer Gradient X Activation. For all the methods, the maximum attributions are shown in the Conv3 layer. Among the three models, U-Net MSS Deformation seems to focus better than the other two, as its attributions can be seen all over the brain, contrary to the other two (it is to be noted that the vessels are present all over the brain and are not concentrated in a specific region). The network focus shifts while going from the first to the last layer of the models. In the initial layers, the focus is more on selective regions of the brain. However, towards the deeper layers like Conv3, the focus is on almost the entire brain. A comparative study of the cascading randomisation technique is shown for the outputs of four interpretability methods for all the models in Figure 9. Moreover, to show the functionality of the quantitative evaluation part of the pipeline, a few methods were compared quantitatively using infidelity and sensitivity; the scores are shown in Tables 1 and 2. In this work, various interpretability and explainability methods were adopted for segmentation models and were used to interpret the networks. A pipeline, TorchEsegeta, comprising these interpretability and explainability methods, which can be applied to 2D or 3D image-based deep learning models for classification and segmentation, has been developed and made public on GitHub. The pipeline was experimented with using three models, UNet, UNetMSS and UNetMSS-Deform, for the task of vessel segmentation from MRAs, using interpretability methods. The evaluation of the methods has been done qualitatively using cascading randomisation and quantitatively using evaluation metrics.
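To illustrate how such quantitative scores can be obtained, the sketch below uses Captum's metrics module to compute infidelity and maximum sensitivity for a saliency explanation; the stand-in model, input size and perturbation scale are assumptions made for the example and do not reproduce the exact evaluation code of the pipeline.

```python
import torch
from captum.attr import Saliency
from captum.metrics import infidelity, sensitivity_max

# Stand-in for a wrapped segmentation model that returns one score per class
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(8 * 8 * 8, 2))
model.eval()

inputs = torch.rand(1, 1, 8, 8, 8, requires_grad=True)   # dummy 3D patch
saliency = Saliency(model)
attributions = saliency.attribute(inputs, target=1)

# Infidelity: compare attribution-weighted perturbations with the actual
# change of the model output under those perturbations.
def perturb_fn(inputs):
    noise = torch.randn_like(inputs) * 0.003
    return noise, inputs - noise

infid = infidelity(model, perturb_fn, inputs, attributions, target=1)

# Sensitivity: maximum change of the explanation under small random
# perturbations of the input.
sens = sensitivity_max(saliency.attribute, inputs, target=1)
print(infid.item(), sens.item())
```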
It is worth mentioning that several explainability methods are part of the pipeline, but they were not evaluated with any use-case scenario during this research. This pipeline can be used by data scientists to improve their models by tweaking them based on the shown interpretability and explainability results, to improve the reasoning of the models and, in turn, their performance. Moreover, they can use this pipeline to present the results to the model's users, so that they can have trust in the model while using it in high-risk situations. On the other hand, this pipeline can be used by expert decision-makers, like clinicians, as a decision support system: by understanding the reasoning done by the models, they can get assistance in decision making. It is noteworthy that the current pipeline can be extended for reconstruction purposes, and new interpretability and explainability methods can be added to the existing pipeline. Ground truth based pixel-wise interpretability for segmentation models can be implemented, which will add a new dimension to the existing work. Nevertheless, the real evaluation of the interpretations and explanations can only be done by the domain experts, in this case of vessel segmentation by the clinicians, who can judge whether these results are actually useful or not. This step of the evaluation was not performed under the scope of the current work and will be performed in the near future. The outputs of all models and methods show that the focus observed without randomisation fades through cascading randomisation steps 1 and 2. This implies that the output of these interpretability methods relies upon the weights of the network and is not just a random prediction. CNN Visualization Guided GradCAM shows the strongest dependency on the weights, as even with cascading step 1, all the attributions disappeared. Comparing the attributions for UNet against the other two models, it can be observed that the dependency on weights is stronger for the other models.

References
Interpretability and explainability: A machine learning zoo mini-tour
Interpretability of deep learning models: a survey of results. 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation
Explainable artificial intelligence and machine learning: A reality rooted perspective
Principles and practice of explainable machine learning
Lesion detection from weak labels with a 3D regression network. International Conference on Medical Image Computing and Computer-Assisted Intervention
Comprehensive attention convolutional neural networks for explainable medical image segmentation. IEEE Transactions on Medical Imaging, 2020
Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion
Visual analytics for explainable deep learning
Understanding deep networks via extremal perturbations and smooth masks
Exploration of interpretability techniques for deep COVID-19 classification using chest X-ray images
Toward interpretable machine learning
A unified approach to interpreting model predictions
Random forests
All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously
Polynomial calculation of the Shapley value based on sampling
Visualizing and understanding convolutional networks
RISE: Randomized Input Sampling for Explanation of Black-box Models
Score-CAM: Score-weighted visual explanations for convolutional neural networks
Deep inside convolutional networks: Visualising image classification models and saliency maps
Striving for simplicity: The all convolutional net
Not just a black box: Learning important features through propagating activation differences
Axiomatic attribution for deep networks
Not Just a Black Box: Learning Important Features Through Propagating Activation Differences
Learning important features through propagating activation differences
Grad-CAM: Visual explanations from deep networks via gradient-based localization
Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks
SmoothGrad: removing noise by adding noise
Understanding Deep Image Representations by Inverting Them
Striving for Simplicity: The All Convolutional Net
Influence-directed explanations for deep convolutional networks
Top-down neural attention by excitation backprop
Evolving normalization-activation layers
An interpretable neural network model through piecewise linear approximation
Towards better understanding of gradient-based attribution methods for deep neural networks
Explaining the predictions of any classifier
Sanity checks for saliency maps
On the (in)fidelity and sensitivity for explanations
Nürnberger, A. DS6, Deformation-aware Semi-supervised Learning: Application to Small Vessel Segmentation with Noisy Training Data

The authors declare no conflict of interest.