key: cord-0937720-8fxt3bxk
authors: Singh, Gurmail
title: Think positive: An interpretable neural network for image recognition
date: 2022-04-04
journal: Neural Netw
DOI: 10.1016/j.neunet.2022.03.034
sha: 4027d04cd1b4746688ce5a78d378aa753e91467c
doc_id: 937720
cord_uid: 8fxt3bxk

The COVID-19 pandemic is ongoing and is placing an additional burden on healthcare systems around the world. Timely and effective detection of the virus can help to reduce the spread of the disease. Although RT-PCR is still the gold standard for COVID-19 testing, deep learning models that identify the virus from medical images can also be helpful in certain circumstances, in particular when patients undergo routine X-ray and/or CT-scan tests but develop respiratory complications within a few days of such tests. Deep learning models can also be used for pre-screening prior to RT-PCR testing. However, the transparency/interpretability of the reasoning process behind the predictions made by such deep learning models is essential. In this paper, we propose an interpretable deep learning model that uses a positive reasoning process to make predictions. We trained and tested our model on a dataset of chest CT-scan images of COVID-19 patients, normal people and pneumonia patients. Our model achieves accuracy, precision, recall and F-score equal to 99.48%, 0.99, 0.99 and 0.99, respectively.

The COVID-19 pandemic is placing enormous strain on public health systems around the world and is severely affecting the economies of many countries. Although vaccination against the virus is under way, the number of variants of the virus is also increasing. New variants of the virus can reduce the effectiveness of the vaccines [48]. Therefore, along with vaccination, detection of the virus is important to reduce the spread of the disease and the development of mutants of the virus. In addition to the prevalent testing technique, reverse transcription polymerase chain reaction (RT-PCR), deep learning models can also be helpful in efforts to detect the virus. Most deep learning algorithms work as a black box because the reasoning process behind their predictions is not transparent/interpretable. However, interpretation of the reasoning process of a deep learning model is important when the model informs a high-stakes decision. There have been cases where erroneous data fed into black-box models went unnoticed, due to which wrongful long prison sentences were given (e.g., inmate Glen Rodriguez was denied parole because of a wrong COMPAS score) [26] and [47]. The lack of interpretability of the reasoning processes of such deep learning models has become a major issue for whether we can trust the predictions coming from these models. Therefore, we propose an interpretable deep learning model, quasi prototypical part network (Quasi-ProtoPNet), and we trained and tested the model on a dataset of chest CT images.

In this section, we first discuss those works that are related to our paper because of the interpretability of their reasoning process. Second, we provide a brief summary of the studies that are related to this study as they categorize medical images (chest CT-scan and X-ray images). The models in the second category attempt to distinguish medical images of COVID-19 patients from the medical images of pneumonia patients and normal people, but the models are not necessarily interpretable.

Several approaches have emerged to interpret convolutional neural networks, including posthoc interpretability analysis. Once a neural network performs the classification, posthoc analysis is used to interpret the neural network. Deconvolution [61], saliency visualization [38, 43, 44, 47] and activation maximization [11, 19, 25, 28, 38, 59] are a few examples of posthoc analysis techniques. However, these visualization approaches do not shed clear light on the reasoning process.

Attention-based interpretability is another technique to clarify the reasoning process of neural networks. Instances of this technique include part-based models [12, 14, 21, 34, 37, 45, 53, 62, 63, 65] and class activation maps (CAM) [64]. In this approach, the aim of a model is to show the patches of an input image that are the focus of its attention; nonetheless, these models do not provide prototypes that resemble the parts of an input image on which they focus. Recently, a CXR-specific model with class activation maps has also been developed to detect COVID-19 from medical images [32].

Case-based classification techniques that use prototypes [4, 31, 52] or k-nearest neighbours [30, 36, 46] are also related to our work. Throughout this paper, a prototype or a prototypical part will represent a patch of an image. Li et al. [26] developed a model that uses full image-sized prototypes and requires a decoder for visualizing prototypes. Chen et al. [6] developed a model, ProtoPNet, which significantly improved on the model developed in [26]. As shown in Figure 1, ProtoPNet is able to identify different parts of an input image that are similar to different prototypes, and it classifies an image based on the similarity scores. To classify an input image, ProtoPNet finds the Euclidean distance between each latent patch of the input image and the learned prototypes of images from different classes, where prototypes have spatial dimensions 1 × 1. The maximum of the inverted distances between a prototype and the patches of the input image is called the similarity score of the prototype. Note that the smaller the distance, the larger the reciprocal, and there is only one similarity score for each prototype. A weighted combination of similarity scores is used to determine the logits for different classes, and these logits are normalized using Softmax to determine the class of the input image. The weights for the correct class and the incorrect classes of a training image are set equal to 1 and −0.5, respectively. These weights are also called connections of the similarity scores with the classes. The negative weights are assigned to include the negative reasoning process, that is, to reject the incorrect classes. ProtoPNet tries to zero out the negative weights during the training process, and with this assumption of ProtoPNet, a theorem is proved [6, Theorem 2.1].
However, our experiments show that it is hardly possible to zero out the negative connections during the training process after making a negative connection between the similarity scores and incorrect classes.
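The weighted combination described above can be illustrated with a minimal sketch in plain Python. The function names, the two-prototypes-per-class layout and the similarity values are hypothetical and chosen only for illustration; the connection weights 1 and −0.5 are the ones stated in the text.

```python
import math

def protopnet_logits(similarity, proto_class, n_classes, neg_weight=-0.5):
    """Weighted sum of similarity scores: weight 1 for a prototype's own
    class and neg_weight for every other class."""
    logits = []
    for c in range(n_classes):
        total = 0.0
        for score, owner in zip(similarity, proto_class):
            total += score * (1.0 if owner == c else neg_weight)
        logits.append(total)
    return logits

def softmax(logits):
    """Numerically stable Softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    norm = sum(exps)
    return [e / norm for e in exps]

# Hypothetical similarity scores: 3 classes, 2 prototypes per class.
similarity = [2.0, 1.5, 0.4, 0.3, 0.6, 0.2]
proto_class = [0, 0, 1, 1, 2, 2]  # class that owns each prototype

logits = protopnet_logits(similarity, proto_class, n_classes=3)
probs = softmax(logits)
predicted = probs.index(max(probs))
```

Setting `neg_weight=0.0` in this sketch removes the negative reasoning term, which is the situation that Quasi-ProtoPNet keeps fixed throughout training.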

The models NP-ProtoPNet [42], Gen-ProtoPNet [40] and Ps-ProtoPNet [41] are variations of ProtoPNet, and we refer to these four models collectively as the ProtoPNet models or the series of ProtoPNet models. Gen-ProtoPNet uses a generalized version of the Euclidean distance function, NP-ProtoPNet considers the negative reasoning process and the positive reasoning process equally, and Ps-ProtoPNet uses the connections between logits and similarity scores suggested by [41, Theorem 1] together with the generalized version of the distance function. The theorem [41, Theorem 1] uses a more realistic assumption of fixed negative connections between similarity scores and incorrect classes to find the impact of a change in the negative connections on the logits. The impact on the logits arises from the projection of prototypes onto the actual patches of training images, that is, the replacement of the prototypes with patches of the training images. However, the use of fixed negative connections leads to a decrease in the logit of the correct class and an increase in the logits of the incorrect classes; consequently, the accuracy of Ps-ProtoPNet decreases after the projection of prototypes. In particular, the impact is more severe when the number of classes is small, see [41, Theorem 1]. In summary, each model of the series of ProtoPNet models uses the negative reasoning process along with the positive reasoning process, whereas our model Quasi-ProtoPNet uses only the positive reasoning process to categorize images.

In order to avoid the flaws of the ProtoPNet models, especially when the number of classes is small, Quasi-ProtoPNet uses only the positive reasoning process by placing zero connections between the similarity scores and incorrect classes. Quasi-ProtoPNet suspends the convex optimization of the last layer to keep the connections constant; here, the convex optimization of the last layer refers to the optimization of the last layer while all other layers are frozen, and Quasi-ProtoPNet does not perform this step. In addition to the positive reasoning process, Quasi-ProtoPNet uses prototypes of all types of spatial dimensions, that is, rectangular spatial dimensions and square spatial dimensions, whereas ProtoPNet uses prototypes with only square spatial dimensions 1 × 1. Prototypes with large spatial dimensions help our model to classify the images on the basis of objects instead of the backgrounds of the objects in the images. However, the optimum spatial dimensions need to be determined to get better accuracy.

To identify an image that they have not seen before, humans can compare patches of the image with patches of images of known objects. This type of reasoning is usually used in difficult identification tasks. For example, radiologists may compare suspicious tumors in an X-ray or a CT-scan image with prototype tumor images to diagnose cancer. This type of human reasoning inspired our model, in which the comparison of image parts with learned prototypes is an integral part of the reasoning process. Therefore, our model differentiates between CT-scan images of COVID-19 patients and CT-scan images of pneumonia patients based on the greater similarity between the learned prototypes and the patches of images.

Several non-interpretable networks have been proposed to distinguish chest CT-scan or X-ray images of COVID-19 patients from chest CT-scan or X-ray images of pneumonia patients and normal people, see [1, 2, 5, 7, 8, 9, 10, 15, 17, 22, 23, 24, 29, 33, 32, 60]. Some studies have surveyed the machine learning/deep learning models that classify chest CT-scan images or X-ray images of COVID-19 patients, pneumonia patients and normal people. A survey by Bhattacharya et al. [3] highlights the lack of sufficient and reliable medical image data related to COVID-19 patients for neural networks, even though a model's reliability depends on its data. However, we ran our experiments on the largest currently publicly available dataset of CT-scan images [16]. A few more studies [54, 55, 56, 57, 58] related to multi-view hashing and image retrieval are also worth mentioning.

We chose the dataset [16] of chest CT-scan images of COVID-19 patients, normal people and pneumonia patients to train and test our model. The dataset consists of 143778 training images and 25658 test images. We crop the images using the bounding box information provided with the dataset. Also, we use the information provided with the dataset to segregate the cropped images into three classes, Covid, Normal and Pneumonia, that contain the images of COVID-19 patients, normal people and pneumonia patients, respectively. We also call these classes first, second and third, and denote them by C, N and P, respectively. The classes C, N and P have 35996, 25496 and 82286 training images, and 12245, 7395 and 6018 test images, respectively. All images have been resized to the dimensions 224 × 224 as required by the base models.

The novelty of our model is that it uses a positive reasoning process together with prototypes that can have any type of spatial dimensions, that is, rectangular spatial dimensions and square spatial dimensions. Quasi-ProtoPNet uses an objective function different from the objective function used in the series of ProtoPNet models. The contributions of this paper are summarized below.

• Quasi-ProtoPNet uses only the positive reasoning process by maintaining zero connection between the similarity scores and incorrect classes. Quasi-ProtoPNet suspends the convex optimization of the last layer to keep the connections fixed. The suspension of the convex optimization also reduces the training time considerably.

• The architecture of Quasi-ProtoPNet helped us to prove a theorem, see Theorem 3.1. The theorem provides theoretical evidence for the improvement in the performance of our model over the other ProtoPNet models. We remark that the theorem is true not only for the distance function that we use for our model, but also for any positive-valued function that satisfies the triangle inequality and has an appropriate domain.

• Quasi-ProtoPNet uses prototypes with both types of spatial dimensions, that is, rectangular spatial dimensions and square spatial dimensions, whereas ProtoPNet model uses prototypes with only square spatial dimensions 1 × 1.

The rest of the paper is organized as follows. In Section 2, we provide detailed information about the architecture of our model, and we explain the training procedure and reasoning process of our model. In Section 3, we provide confusion matrices for our model with different base models, and we compare the performance of our model with the ProtoPNet models and the base models. Also, we show that the improvement in the accuracies given by our model over the accuracies given by the other ProtoPNet models is statistically significant. A graphical comparison of the accuracies is provided. In this section, we also prove a theorem that finds the bounds of the changes in logits due to the projection of prototypes onto the training images. In Section 4, we discuss the limitations of our model. In Section 5, a brief discussion of our model and the series of ProtoPNet models is provided. Finally, in Section 6, we conclude our work.

In this section, we introduce and explain the architecture and the training process of our model Quasi-ProtoPNet in the context of CT-scan images.

Quasi-ProtoPNet can be built on the convolutional layers of a state-of-the-art base model (baseline), such as VGG-19 [39], ResNet-34, ResNet-152 [18], DenseNet-121, or DenseNet-161 [20]. As shown in Figure 2, Quasi-ProtoPNet consists of the convolutional layers of a base model followed by two additional convolutional layers 2 × 1 and 1 × 1. These convolutional layers are collectively denoted by $L$, and they are followed by a generalized convolutional layer [13, 27] $p_t$ of prototypical parts. The layer $p_t$ is followed by a dense layer $w$ with no bias. The parameters of $L$ and the weight matrix of the dense layer are denoted by $L_{\mathrm{conv}}$ and $w_m$, respectively. The activation functions ReLU and Sigmoid are used for the second-last and the last additional convolutional layers, respectively. Note that the convolutional layers $L$ form the non-interpretable (black-box) part of our model, whereas the generalized convolutional layer $p_t$ forms the interpretable (transparent) part of our model. Although the convolutional layers of any of the base models can be used to construct our model, we explain Quasi-ProtoPNet as constructed over the convolutional layers of VGG-16. Let $x$ be an input image. Since the output of the convolutional layers of VGG-16 has depth 512 and spatial dimensions 7 × 7, $L(x)$ has depth 512 and spatial dimensions 6 × 6. Note that the layer $p_t$ is a vector of prototypical units, and each prototypical unit is a tensor of shape 512 × h × w, where 1 × 1 < h × w < 6 × 6, that is, h and w are neither both equal to 1 nor both equal to 6. Suppose $n$ and $m$ denote the total number of classes and the number of prototypes for each class, respectively. Let $P^c = \{p^c_l\}_{l=1}^{m}$ be the set of prototypes of a class $c$ and $P = \{P^c\}_{c=1}^{n}$ be the set of all prototypes. For our work $n = 3$, and we arbitrarily set the hyperparameter $m = 10$.

The shapes of $L(x)$ and of each prototypical unit of $p_t$ are 512 × 6 × 6 and 512 × h × w, where h and w lie between 1 and 6 but are neither both equal to 1 nor both equal to 6. Therefore, each prototype can be thought of as a part of $L(x)$. The model takes into account the spatial relationship between $L(x)$ and the prototypical parts, and upsamples the part of $L(x)$ that is at the smallest distance from a prototypical part to the input image $x$ to identify the patch of $x$ that resembles the prototype. The green rectangles in the source images mark the parts of the source images from which the prototypes are actually projected. The source images of the prototypes $p^1_1$, $p^2_1$ and $p^3_{10}$ are also shown in Figure 2. Similar to ProtoPNet (see Section 1.1), Quasi-ProtoPNet computes the similarity scores between an input image and the prototypes $p^1_1$ to $p^1_{10}$, $p^2_1$ to $p^2_{10}$ and $p^3_1$ to $p^3_{10}$, see Figure 2. The prototypes $p^1_1$, $p^2_1$ and $p^3_{10}$ have similarity scores 2.8001, 0.7889 and 1.0233, respectively, and the similarity score of $p^1_1$ is greater than the other two similarity scores. The complete list of similarity scores obtained from our experiments is given in the matrix $s_m$, see Section 2.3.

In the dense layer $w$, the matrices $w_m$ and $s_m$ are multiplied to obtain the logits. The logits for the classes C, N and P are 38.0688, 10.1137 and 11.1361, respectively. The interpretability/transparency of our model comes into play when an image is classified into a certain class. Our model is able to give the reason for the classification of the image into that class: the image has some patches that are more similar to certain learned prototypes of that class, and the model shows those learned prototypes. The learned prototypes are projected from the training images, so they are actual patches of the training images.

Quasi-ProtoPNet uses a generalized version $d$ of the Euclidean distance function, and in this section we show that $d$ is a generalization of the Euclidean distance function. Consider Quasi-ProtoPNet with base model VGG-16. Let $x$ be an input image. Then the shape of $L(x)$ is 512 × 6 × 6, as described in Section 2.1. Let $p$ be any prototype with shape 512 × h × w, where 1 ≤ h, w ≤ 6, and h and w are neither both equal to 1 nor both equal to 6. The output $O (= L(x))$ of the convolutional layers $L$ has $(7 - h)(7 - w)$ patches of dimensions h × w. Hence, the square of the distance $d(P_{ij}, p)$ between $p$ and the $(i, j)$ patch $P_{ij}$ (say) of $O$ is:

$$d^2(P_{ij}, p) = \sum_{r=1}^{h} \sum_{s=1}^{w} \sum_{k=1}^{512} \left( O_{(i+r-1)(j+s-1)k} - p_{rsk} \right)^2 .$$

Note that if $p$ has spatial dimensions 1 × 1, that is, h = w = 1, then

$$d^2(P_{ij}, p) = \sum_{k=1}^{512} \left( O_{ijk} - p_{11k} \right)^2 ,$$

which is the square of the Euclidean distance between $p$ and a patch of $O$, where $p_{11k} \simeq p_k$. Therefore, the function $d$ is a generalization of the Euclidean distance function. The prototypical unit $p_t$ calculates the following.

$$p_t(O) = \max_{1 \le i \le 7-h,\; 1 \le j \le 7-w} \log\left( \frac{d^2(P_{ij}, p) + 1}{d^2(P_{ij}, p) + \epsilon} \right) \tag{1}$$

That is,

$$p_t(L(x)) = \max_{P \in \mathrm{patches}(L(x))} \log\left( \frac{d^2(P, p) + 1}{d^2(P, p) + \epsilon} \right) \tag{2}$$

Equation (2) shows that a prototype $p$ is more similar to the image $x$ if the distance between $p$ and a latent patch of $x$ is smaller, that is, if the reciprocal of the distance is larger. Quasi-ProtoPNet is trained using the following two steps.
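Before the two training steps are described, the distance and similarity computations of this section can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the implementation used in the experiments: tensors are nested lists indexed as [depth][row][column], the function names are hypothetical, the value of `eps` is assumed, and a toy feature map of depth 2 stands in for the 512 × 6 × 6 output of $L$.

```python
import math

def squared_distance(patch, proto):
    """Sum of squared differences over depth x h x w."""
    return sum(
        (patch[k][r][s] - proto[k][r][s]) ** 2
        for k in range(len(proto))
        for r in range(len(proto[0]))
        for s in range(len(proto[0][0]))
    )

def patches(feature, h, w):
    """Yield every h x w spatial patch of a depth x H x W feature map."""
    height, width = len(feature[0]), len(feature[0][0])
    for i in range(height - h + 1):
        for j in range(width - w + 1):
            yield [[row[j:j + w] for row in channel[i:i + h]]
                   for channel in feature]

def similarity_score(feature, proto, eps=1e-4):
    """Maximum over all patches of log((d^2 + 1) / (d^2 + eps)).
    The default eps is an assumed value."""
    h, w = len(proto[0]), len(proto[0][0])
    return max(
        math.log((d2 + 1.0) / (d2 + eps))
        for d2 in (squared_distance(p, proto) for p in patches(feature, h, w))
    )

# Toy 2 x 6 x 6 feature map with distinct entries (hypothetical data).
feature = [[[float(k * 36 + i * 6 + j) for j in range(6)]
            for i in range(6)] for k in range(2)]
# A 2 x 3 prototype copied from the patch at row 1, column 2.
proto = [[row[2:5] for row in channel[1:3]] for channel in feature]

n_patches = len(list(patches(feature, 2, 3)))  # (7 - 2) * (7 - 3) patches
score = similarity_score(feature, proto)
```

Because the toy prototype coincides with one latent patch, its smallest squared distance is zero and the score attains its largest possible value for the chosen `eps`.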

Let $X = \{x_1, \ldots, x_n\}$ and $Y = \{y_1, \ldots, y_n\}$ be sets of images and associated labels, respectively, and $D = \{(x_i, y_i) : x_i \in X, y_i \in Y\}$. Then our objective function is:

where ClstCost is given by the equation

Equation (4) shows that a drop in the cluster cost (ClstCost) leads to the clustering of prototypes around their respective classes. The reduction in cross entropy leads to better classifications, see the objective function (3). The hyperparameter $\lambda$ is set equal to 0.7. Since $w_m$ is the weight matrix for the dense layer, $w_m^{(i,j)}$ is the weight assigned to the connection between the logit of the $i$th class and the similarity score of the $j$th prototype. Therefore, for a class $i$, we put $w_m^{(i,j)} = 1$ for all $j$ with $p_j \in P^i$, and $w_m^{(i,j)} = 0$ for all $j$ with $p_j \notin P^i$, that is, for every prototype that belongs to a class $c \neq i$. The non-negativity of the distance function and the optimization of all the layers before the last layer with the SGD optimizer help Quasi-ProtoPNet to learn important latent spaces.
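One form of the objective that is consistent with this description can be written out as a hedged sketch. It rests on two assumptions that the text does not confirm: that the objective contains only the cross-entropy term and the cluster cost weighted by $\lambda$, and that the cluster cost keeps the form used for ProtoPNet [6] with the generalized distance $d$ substituted. The symbol $\tilde{P}$ for a latent patch is introduced here only to avoid a clash with the prototype set $P$.

```latex
\min_{P,\, L_{\mathrm{conv}}} \; \frac{1}{n} \sum_{i=1}^{n}
  \mathrm{CrsEnt}\bigl( w \circ p_t \circ L(x_i),\, y_i \bigr)
  \;+\; \lambda \, \mathrm{ClstCost},
\qquad
\mathrm{ClstCost} \;=\; \frac{1}{n} \sum_{i=1}^{n} \;
  \min_{j \,:\, p_j \in P^{y_i}} \;
  \min_{\tilde{P} \in \mathrm{patches}(L(x_i))} d^2(\tilde{P}, p_j)
```

Under this reading, ClstCost is small when every training image has at least one latent patch close to some prototype of its own class, which matches the clustering behaviour described above.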

Let $x$ be an input image. At the second step, Quasi-ProtoPNet projects the prototypes onto the patches of $x$ that are most similar to the prototypes. That is, a patch of $x$ that is at a smaller distance from a prototype gets projected, and its inverted distance must be at least the 93rd percentile of all the inverted distances of the prototype from all the images. For this purpose, Quasi-ProtoPNet makes the following update: $p^c_j \leftarrow \arg\min_{\{P \,:\, P \in \mathrm{patches}(L(x_i))\ \forall i \text{ such that } y_i = c\}} d(P, p^c_j)$.
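The projection update can be sketched as a nearest-patch search over the latent feature maps of the training images of the prototype's class. The sketch below is a simplified illustration: the function names and the toy feature maps are hypothetical, tensors are nested lists indexed as [depth][row][column], and the 93rd-percentile condition is deliberately left out.

```python
def iter_patches(feature, h, w):
    """Yield every h x w spatial patch of a depth x H x W feature map."""
    height, width = len(feature[0]), len(feature[0][0])
    for i in range(height - h + 1):
        for j in range(width - w + 1):
            yield [[row[j:j + w] for row in channel[i:i + h]]
                   for channel in feature]

def squared_distance(patch, proto):
    """Sum of squared elementwise differences between two equal-shape tensors."""
    return sum(
        (a - b) ** 2
        for ch_a, ch_b in zip(patch, proto)
        for row_a, row_b in zip(ch_a, ch_b)
        for a, b in zip(row_a, row_b)
    )

def project_prototype(proto, class_features):
    """Return the closest latent patch among all feature maps of the
    prototype's class, together with its squared distance."""
    h, w = len(proto[0]), len(proto[0][0])
    best_patch, best_d2 = None, float("inf")
    for feature in class_features:
        for patch in iter_patches(feature, h, w):
            d2 = squared_distance(patch, proto)
            if d2 < best_d2:
                best_patch, best_d2 = patch, d2
    return best_patch, best_d2

# Two hypothetical 1 x 3 x 3 feature maps of the same class.
feature_a = [[[0.0, 0.0, 0.0], [0.0, 5.0, 5.0], [0.0, 5.0, 5.0]]]
feature_b = [[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 9.0]]]
proto = [[[4.5, 5.0], [5.0, 5.0]]]  # 1 x 2 x 2 prototype before projection

new_proto, d2 = project_prototype(proto, [feature_a, feature_b])
```

Minimizing the squared distance selects the same patch as minimizing $d$ itself, since $d$ is non-negative.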

In this section, we explain our model with an example of an input image as given in Figure 3. In Figure 3, the image in the first column belongs to the class Covid. In the second column of the figure, the green rectangles on the image enclose the patches of the image that give the highest similarity scores to the prototypes in the third column. In the fourth column, the rectangles enclose the patches on the source images of the prototypes, that is, the rectangles pinpoint the patches on the source images from which the prototypes are projected. In the fifth column, the similarity scores between the prototypes and patches of the test image are displayed. In the sixth column, the connections between similarity scores and the logits are given. Since the image belongs to the first class C, the similarity scores of the prototypes of the second and third classes are assigned zero weight. The entries of the seventh column are obtained by multiplying similarity scores and class connections, and the logit (38.0688) for the class C is obtained by adding the entries of the seventh column. The logit for the class C can also be computed by multiplying the first row of $w_m$ with the matrix $s_m$. The logits for the classes N and P are 10.1137 and 11.1361, respectively, and can be computed by multiplying the second and third rows of $w_m$ with the matrix $s_m$.
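The row-by-matrix computation of the logits can be sketched as follows. The similarity scores below are hypothetical placeholders (two prototypes per class instead of the ten used in the experiments), and the function names are chosen only for illustration; the zero/one connection pattern follows the description of $w_m$ in Section 2.2.

```python
def connection_matrix(n_classes, protos_per_class):
    """w_m[i][j] = 1 if prototype j belongs to class i, otherwise 0."""
    total = n_classes * protos_per_class
    return [
        [1.0 if j // protos_per_class == i else 0.0 for j in range(total)]
        for i in range(n_classes)
    ]

def class_logits(w_m, s_m):
    """Multiply each row of w_m with the vector of similarity scores s_m."""
    return [sum(w * s for w, s in zip(row, s_m)) for row in w_m]

# Hypothetical similarity scores for 3 classes x 2 prototypes.
s_m = [2.8, 1.9, 0.8, 0.5, 1.0, 0.3]
w_m = connection_matrix(n_classes=3, protos_per_class=2)
logits = class_logits(w_m, s_m)
predicted_class = logits.index(max(logits))
```

With zero connections to the other classes, each logit is simply the sum of the similarity scores of that class's own prototypes, so an image is assigned to the class whose prototypes it resembles most.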

The transpose of the weight matrix w m and similarity score matrix s m that we obtain from our experiments are as follows: 

Journal Pre-proof

In this section, we present the metrics given by our model and compare the performance of our model with the performance of the other models.

Suppose TP, TN, FP and FN denote the true positives, true negatives, false positives and false negatives for the Covid class. The metrics accuracy, precision, recall and F1-score are [49, 50, 51] :
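The standard definitions of these four metrics can be sketched in a few lines of Python. The counts used in the example are hypothetical and are not taken from the experiments; the function name is likewise chosen only for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score for one class,
    computed from its confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical confusion counts for the Covid class.
accuracy, precision, recall, f1 = classification_metrics(tp=90, tn=95, fp=5, fn=10)
```

The sketch assumes that the class has at least one predicted positive and one actual positive, so that no denominator is zero.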

In (5) and (6) 

The series of ProtoPNet models are constructed over the convolutional layers of the base models. Although the accuracies of the series of ProtoPNet models and the base models stabilize before 35 epochs (see Section 3.4), we trained and tested the models for 100 epochs.

The performance comparison in the metrics is provided in Table 1. We see from the third column of Table 1 that when we build our model on the convolutional layers of VGG-16, the accuracy, precision, recall and F1-score given by Quasi-ProtoPNet are 99.05, 0.98, 0.99 and 0.98, respectively. The accuracy, precision, recall and F1-score given by the models ProtoPNet, That is, our model exhibits some prototypes from the image class that are similar to some patches of the classified image. In other words, if an image is classified into a certain class, then it must have some patches similar to the prototypes of that class. The model also gives prototypes that can be manually compared with some patches of the classified image to understand why a certain class has been assigned to the image.

Since accuracy is the proportion of correctly classified images among all the test images, the test of hypothesis concerning two proportions can be applied to determine whether the differences between the accuracies are statistically significant. Let $n_d$ be the size of the test dataset. Let $x_1$ and $x_2$ be the numbers of images correctly classified by models 1 and 2, respectively. Let $\hat{p}_1 = x_1/n_d$ and $\hat{p}_2 = x_2/n_d$. The statistic for the test concerning the difference between two proportions (accuracies) is as follows [35]:

Suppose models 1 and 2 give the accuracies $p_1$ and $p_2$. Then our hypotheses are:

$H_0: p_1 - p_2 = 0$ (null hypothesis)

$H_a: p_1 - p_2 \neq 0$ (alternative hypothesis)

Let the significance level ($\alpha$) be 0.05. Therefore, to reject the null hypothesis, the p-value (the tail probability read from the standard normal table) must be less than $\alpha/2 = 0.025$ because the test is two-tailed. Suppose $p_1$ represents the accuracy given by Quasi-ProtoPNet and the accuracies given by the other models are represented by $p_2$. The values of the test statistic $Z$ are obtained by the above formula, see Equation (7). We use the standard normal table to obtain the associated p-values, and list the p-values in Table 2.
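The testing procedure can be sketched in Python as below. The sketch assumes the usual pooled-proportion form of the two-proportion Z statistic for two samples of equal size, which may differ in detail from Equation (7); the function name and the two counts of correctly classified images are hypothetical, while the test-set size 25658 is the one reported for the dataset.

```python
import math

def two_proportion_z_test(x1, x2, n):
    """Z statistic and one-tailed p-value for the difference between two
    proportions x1/n and x2/n, using the pooled proportion."""
    p1, p2 = x1 / n, x2 / n
    pooled = (x1 + x2) / (2 * n)
    std_err = math.sqrt(2 * pooled * (1 - pooled) / n)
    z = (p1 - p2) / std_err
    # Upper-tail probability of |z| under the standard normal distribution.
    one_tailed_p = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return z, one_tailed_p

n_d = 25658                 # size of the test dataset
x1, x2 = 25400, 25300       # hypothetical numbers of correct classifications
z, p_value = two_proportion_z_test(x1, x2, n_d)
reject_null = p_value < 0.025   # two-tailed test at alpha = 0.05
```

Comparing the one-tailed probability with $\alpha/2$ is equivalent to comparing the two-sided p-value with $\alpha$.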

In particular, when the convolutional layers of VGG-16 are used to construct the models, the p-values obtained by comparing the accuracy given by Quasi-ProtoPNet with the accuracies given by Ps-ProtoPNet, Gen-ProtoPNet, NP-ProtoPNet, ProtoPNet and VGG-16 are 0.00755, 0.00002, 0.00002, 0.00002 and 0.40905, respectively. The null hypothesis is rejected for all the p-values that correspond to the series of ProtoPNet models, because these p-values are less than 0.025, see Table 2. Therefore, the accuracies given by Quasi-ProtoPNet with different base models are statistically significantly (with 95% confidence) better than the accuracies given by the ProtoPNet models. However, the p-values given in the last column of Table 2, corresponding to the base models VGG-16, ResNet-34, ResNet-152, DenseNet-121 and DenseNet-161, are greater than 0.025. So, the accuracies given by these base models are not significantly different from the accuracies given by our model. In Figures 10-15, a graphical comparison of the accuracies given by Quasi-ProtoPNet and the other models is provided. Although the accuracies given by the models become stable before 35 epochs, the models are trained and tested for 100 epochs over the dataset [16], and the graphical comparisons of the accuracies are provided over 50 epochs.

Figure 10 provides a comparison of the accuracies given by the models when they are constructed over the convolutional layers of VGG-16. Although it is difficult to see the difference between the accuracies in Figures 10-15 once the models stabilize, the difference is clear before the models stabilize.

In this section, we prove a theorem similar to [6, Theorem 2.1]. The theorem [6, Theorem 2.1] assumes that the negative connections between similarity scores and incorrect classes can be made equal to zero during the training process. As mentioned in Section 1.1, our experiments show that it is hardly possible to make the negative connections zero during the training process. However, we do not need to make this assumption because our model uses only the positive reasoning process, and the suspension of the convex optimization of the last layer of our model keeps the connections between similarity scores and incorrect classes zero. Furthermore, [6, Theorem 2.1] is proved with the Euclidean distance function, whereas our theorem is restricted neither to the Euclidean distance function nor to its generalized version $d$: the distance function can be replaced with any positive-valued function that satisfies the triangle inequality and has an appropriate domain. However, we present the theorem with a hemimetric, a distance function more general than the distance function $d$.

and $\epsilon$ is given by $p_t(L(x)) = \max_{P \in \mathrm{patches}(L(x))} \log\left( \frac{f^2(P, p) + 1}{f^2(P, p) + \epsilon} \right)$;

Then after projection,

1. the output logit $\Delta_k$ (say) for the correct class $k$ can decrease at most by $m \log((1 + \delta)(2 - \delta))$;

2. the output logit $\Delta_{k'}$ (say) for incorrect classes $k'$ can increase at most by $m \log((1 + \delta)(2 - \delta))$.

denote the prototypes of class c. The connection between similarity scores and incorrect classes is zero, and the suspension of the convex optimization of the dense layer keeps these connections fixed. Therefore,

Let ∆ c be the difference between the output logit of class c after the projection and before the projection of prototypes.

) denote the logits after the projection and before the projection, respectively. Therefore, we have

Assume,

Therefore,

First, to prove 1, that is, to find the lower bound of ∆ k , assume c = k in the above equations (9) and (10) , where k is the correct class of x.

From the inequality given in the assumption 2, we have

Using the triangle inequality, we have

By the assumption 2, we have

Squaring inequality (13) and adding $\epsilon$ to the result, we obtain

On rearranging inequality (14), we have

By inequalities (12) and (15), we have

Therefore, by equations (11) and (16), we have

Hence, by the equations (8) and (17), we have

that is, $\Delta_k \ge -m \log((1 + \delta)(2 - \delta))$.

Second, to prove 2, that is, to find the upper bound of ∆ k ′ , assume c = k ′ in the above equations (9) and (10) , where k ′ is the incorrect class of x.

By the triangle inequality,

The assumption 1 gives:

By the inequality (19), we have

The inequality (20) gives:

From the inequalities (18) and (21), we have

Again, by the triangle inequality, we have

The assumption 1 implies

Therefore, by the inequality (23), we have


Again, by the assumption 1, we have

On simplifying the above inequality, we obtain

Therefore,

By inequality (25), we have

On combining the inequalities (24) and (26), we obtain

On combining the inequalities (23) and (27), we have

Therefore, by equation (10) and inequality (28), we have

$\log((1 + \delta)(2 - \delta)) \le m \log((1 + \delta)(2 - \delta))$.

Hence, $\Delta_{k'} \le m \log((1 + \delta)(2 - \delta))$.

As mentioned in Section 1.1, Quasi-ProtoPNet gives better performance than the series of ProtoPNet models when classification is to be made over only a few classes. As the number of classes grows, we cannot guarantee that the performance of our model will be better than the performance of ProtoPNet and Ps-ProtoPNet. However, there are many cases, similar to the case of CT-scan images discussed in this paper, in which images need to be classified over only a few classes. Therefore, our model can be useful in such situations.

Quasi-ProtoPNet suspends the convex optimization of the last layer to keep the connections constant, and it uses an objective function that accommodates only the positive reasoning process. Also, the suspension reduced the training time of our model. Quasi-ProtoPNet is closely related to the series of other ProtoPNet models, but it differs strikingly from them in the reasoning process behind its classifications. Quasi-ProtoPNet uses only the positive reasoning process, whereas the other ProtoPNet models use the negative reasoning process along with the positive reasoning process, which leads to a decrease in their accuracy, especially when the number of classes is small. In particular, our model can be useful during this pandemic, when deadly mutants of the coronavirus (e.g., the omicron variant) are being identified.

The use of the positive reasoning process together with prototypes of rectangular and square spatial dimensions helped our model to improve its performance over the series of the other ProtoPNet models. Moreover, as observed in Section 3.2, Quasi-ProtoPNet gives its highest accuracy (99.48%) when DenseNet-121 is used as the base model, and this highest accuracy is equal to the highest accuracy (99.48%) given by the non-interpretable model DenseNet-161.

• Quasi-ProtoPNet uses the positive reasoning process, which is inspired by the reasoning process that an intelligent student uses to solve a multiple-choice question.

• The performance of our model is on par with the performance of state-of-the-art non-interpretable models.

• The theorem provides evidence for the better performance of our model.

• Quasi-ProtoPNet uses prototypes with both types of spatial dimensions, that is, rectangular spatial dimensions and square spatial dimensions.

• Our model keeps constant connections between similarity scores and logits. The suspension of the convex optimization of the last layer also reduces the training time considerably.

Gurmail Singh (corresponding author), Post Doctoral Researcher, Faculty of Engineering and Applied Science, University of Regina

The author is grateful to the Faculty of Engineering and Applied Science at the University of Regina for arranging a deep learning server on which he could run his experiments.

Keywords: CT-scan, Prototypes, COVID-19, Pneumonia, Interpretable.