key: cord-0628576-sfiiarlk authors: Xu, Mengting; Zhang, Tao; Zhang, Daoqiang title: MedRDF: A Robust and Retrain-Less Diagnostic Framework for Medical Pretrained Models Against Adversarial Attack date: 2021-11-29 journal: nan DOI: nan sha: 03787067c4a012dc80fc3ed94ee5ebd8f410f301 doc_id: 628576 cord_uid: sfiiarlk

Deep neural networks have been found to be non-robust when attacked by imperceptible adversarial examples, which is dangerous when they are applied to medical diagnostic systems that require high reliability. However, defense methods that work well on natural images may not be suitable for medical diagnostic tasks. Preprocessing methods (e.g., random resizing, compression) may lead to the loss of small lesion features in medical images. Retraining the network on an augmented data set is also impractical for medical models that have already been deployed online. Accordingly, it is necessary to design an easy-to-deploy and effective defense framework for medical diagnostic tasks. In this paper, we propose a Robust and Retrain-Less Diagnostic Framework for Medical pretrained models against adversarial attack (i.e., MedRDF). It acts at the inference time of the pretrained medical model. Specifically, for each test image, MedRDF first creates a large number of noisy copies of it, and obtains the output labels of these copies from the pretrained medical diagnostic model. Then, based on the labels of these copies, MedRDF outputs the final robust diagnostic result by majority voting. In addition to the diagnostic result, MedRDF produces the Robust Metric (RM) as the confidence of the result. Therefore, it is convenient and reliable to utilize MedRDF to convert pre-trained non-robust diagnostic models into robust ones. The experimental results on the COVID-19 and DermaMNIST datasets verify the effectiveness of our MedRDF in improving the robustness of medical diagnostic models.

There are many impressive examples of deep neural networks achieving excellent performance on medical diagnostic tasks in radiology [1], dermatology [2], and ophthalmology [3], etc. However, recent studies have revealed that the robustness of state-of-the-art neural networks is poor, i.e., it is easy to craft a visually imperceptible adversarial example that misleads a well-trained network with high confidence [4]-[6]. The vulnerability to adversarial examples poses a huge threat to the deployment of these models in medical diagnostic tasks that require extremely high reliability [7]-[10]. For example, the misdiagnosis of COVID-19 may cause the disease to spread widely. Therefore, developing a robust model to defend against adversarial attacks is crucial in the medical image field.

There are many different defense strategies developed in the natural image field. One of the most successful empirical defenses to date is adversarial training [5], which can be regarded as a data augmentation technique that trains neural networks on adversarial examples. However, adversarial training on medical images is problematic, as it requires a large labeled training set, whereas medical data sets usually contain only a small number of labeled samples. To solve this problem, Li et al. [10] propose semi-supervised adversarial training (i.e., SSAT), which utilizes both labeled and unlabeled data to generate pseudo-labels.
However, the application of SSAT is also limited, because for most medical diagnostic tasks unlabeled data is also inaccessible, not to mention the heterogeneity between multi-site data sets acquired with different devices (i.e., data distribution differences) and the privacy of medical data. Moreover, Xue et al. [11] propose a defense mechanism which embeds an auto-encoder into the model structure and keeps high-level features invariant to general noise. However, such retraining mechanisms are not friendly to medical diagnostic models that have already been deployed online: it is time-consuming and laborious to go back through the deployment process.

Other pre-processing based defense methods have also shown effectiveness in the natural image field. For example, Xie et al. [12] use random resizing and padding (Random R-P) to pre-process the input images before feeding them into the models. Jia et al. propose ComDefend [13] to transform the adversarial image to its clean version by compression and reconstruction. However, these defense methods that work well on natural images may not be suitable for medical images. For natural images, there is strong similarity and relevance between neighboring pixels in the local structure, so random resizing and image compression can help reduce the redundant information of the image while retaining the dominant information. But in medical images, lesions often occupy only a few pixels. Random resizing and image compression may cause the loss of lesion features, thereby affecting the classification and defense effects. To make matters worse, there is still no effective confidence indicator for doctors to evaluate the diagnostic result of the model. Therefore, how to reliably improve the robustness of medical diagnostic models is still an open problem.

[Fig. 1 caption: I, each test medical image x is perturbed by isotropic noise η to produce noisy copies of x, which are then denoised by the pre-defined denoiser D. II, the denoised copies are input to the pre-trained medical diagnostic model h_θ to get the predictions. III, the robust diagnostic result g(x) on x and the Robust Metric (RM) of the result are obtained by majority voting on the prediction labels of the denoised copies.]

In this paper, we propose a novel Robust and Retrain-Less Diagnostic Framework for Medical Pretrained Models (i.e., MedRDF) to defend against adversarial attack. As shown in Fig. 1, our proposed MedRDF can easily convert a non-robust pre-trained model into a robust one at inference time without retraining. Specifically, firstly, for each queried medical image x, MedRDF produces a large number of copies (i.e., by adding common noise and denoising) around it. Secondly, the denoised copies are input into the pre-trained diagnostic model to get the prediction labels. Finally, MedRDF outputs the robust diagnostic result of the medical image x by majority voting on the prediction labels of the denoised copies. Moreover, MedRDF also produces the Robust Metric (RM) as the confidence of the result, which can be used to instruct the doctor to adopt the diagnostic result or re-evaluate it.

The main innovations of our MedRDF can be summarized as follows:

• A novel Robust and Retrain-Less Diagnostic Framework for medical pretrained models (i.e., MedRDF) has been proposed. MedRDF can be applied to all medical diagnostic tasks seamlessly without retraining diagnostic models, which is very convenient for diagnostic services that are already deployed online.
• A novel Robust Metric (i.e., RM) based on MedRDF has been proposed. It gives the confidence score of the diagnostic result produced by MedRDF, so as to guide the follow-up work of the doctor, such as adopting the result (with high RM) or re-evaluating the case (with low RM).

In this section, we first briefly introduce the latest developments of deep learning in the diagnosis of coronavirus disease (COVID-19) and common pigmented skin lesions. Then, recent adversarial attacks and defense methods on natural and medical images are reviewed.

In the past few years, high-performance deep classification models for disease diagnosis have emerged. Here, we introduce two successful applications of deep learning models in medical image analysis.

1) COVID-19: In recent years, the global outbreak of the coronavirus disease (COVID-19) has caused tens of thousands of deaths and infected millions of people around the world. This undoubtedly poses a huge threat to human lives and national public health systems. Any technical tool that can quickly screen for COVID-19 infection with high accuracy is vital to healthcare professionals. The main clinical tool currently used to diagnose COVID-19 is Reverse Transcription Polymerase Chain Reaction (RT-PCR), but it is expensive, less sensitive, and requires specialized medical personnel [14]. A clinical study of COVID-19 infected patients showed that most of these patients developed lung infections after being exposed to the virus [15]. Therefore, easy-to-use and low-cost X-ray (i.e., radiography) imaging has become an excellent alternative for COVID-19 diagnosis. Many automatic algorithms have been proposed to diagnose COVID-19 from chest X-ray images [16]-[18]. In particular, deep learning methods have been considered the best performing methods [19], including Generative Adversarial Networks (GANs) [20], Extreme Learning Machines (ELM) [21], and Long Short-Term Memory (LSTM) [22]. Besides, Jain et al. [15] compared the Inception V3 [23], Xception [24], and ResNeXt [25] models, which perform well on natural images, and examined their accuracy in the diagnosis of COVID-19. Moreover, Schlemper et al. [26] proposed the Attention-Gated Sononet (AG-Sononet) model, which is carefully designed for fetal ultrasound images. It can also be used for COVID-19 disease diagnosis.

2) Pigmented Skin Lesions: Skin cancer is one of the most commonly diagnosed cancers worldwide. According to the 2019 statistical report of the American Cancer Society, the number of new cases and deaths of skin cancer in the United States (excluding basal cell and squamous cell skin cancer) is as high as 104,350 and 11,650, respectively [27]. Among them, melanoma accounts for the largest proportion of all lesions, with an estimated 92.5% of new cases and 62.1% of deaths, respectively. However, skin cancer is highly treatable with early detection and diagnosis, thus reducing the mortality rate. Due to the importance of early detection, many deep learning methods are used to improve the accuracy of diagnosis and expand the scale of diagnosis. For example, Li et al. [28] proposed a framework consisting of multi-scale fully-convolutional residual networks and a lesion index calculation unit (LICU) to simultaneously address lesion segmentation and lesion classification. Yan et al.
[29] proposed an attention-based melanoma recognition method, which introduces an end-to-end trainable attention module with regularization for melanoma recognition.

Despite the high performance of deep neural networks in medical image diagnosis, Szegedy et al. [4] first discovered that deep networks are extremely vulnerable to adversarial examples. A so-called "adversarial example" is an original example with a carefully designed perturbation added to it; the perturbation is invisible to the human eye, yet it misleads the network into outputting a wrong prediction with high confidence. Even worse, due to the transferability of adversarial examples, a perturbation designed for one network can also be used to fool other networks. In recent years, adversarial attack methods for natural images have developed rapidly. Goodfellow et al. [30] proposed the Fast Gradient Sign Method (FGSM) to generate adversarial examples. It calculates the gradient of the loss function with respect to the pixels, and modifies the pixel values by a fixed step along the direction of the gradient. Based on this work, Madry et al. [5] proposed an iterative attack method, which starts from a random perturbation and updates the pixel values over multiple iterations along the direction of the gradient; it is called Projected Gradient Descent (PGD). In addition to these gradient-based methods, Carlini and Wagner (C&W attack) [6] explored the use of a maximum-margin loss and optimization to generate adversarial examples with a high fooling rate and small distortion with respect to the original image. In addition, more and more black-box attacks have been proposed. These so-called black-box attacks can successfully change the model prediction without knowing the parameters and structure of the attacked model. Uesato et al. [31] proposed the simultaneous perturbation stochastic approximation (SPSA) attack. It is a gradient-free, query-based attack method, which minimizes the margin between the logit of the true label and the largest logit among the remaining labels. Chen et al. [32] proposed the hard-label RayS attack, which only relies on the hard-label output of the target model and utilizes a fast check step to skip unnecessary searches. This significantly reduces the number of queries needed for the hard-label attack. Apart from the development of adversarial attacks in the field of natural images, the medical image domain has also paid more and more attention to this topic. Ma et al. [33] analyzed the different behaviors of medical images and natural images when attacked by adversarial examples, and concluded that medical images are more vulnerable to adversarial attacks. Other studies [7], [8], [34] evaluated the robustness of deep diagnostic models on different tasks under adversarial attacks.

Considering the importance of network robustness, many defense methods have been proposed [35]-[37]. Among them, Adversarial Training (AT) has been demonstrated to be one of the most effective defense methods. AT can be regarded as a data augmentation technique that trains the network on adversarial examples. After that, many methods were built on AT and showed superior performance. TRADES [38] trades adversarial robustness off against accuracy; its objective function is a linear combination of the natural loss and a regularization term. MART [39] differentiates misclassified examples from correctly classified examples during adversarial training and adopts a regularized adversarial loss involving both adversarial and natural examples to improve the robustness of models.
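To make the mechanics of the gradient-based attacks described above concrete, the following is a minimal L-infinity PGD sketch, assuming PyTorch; the names model, images, labels and the step size alpha are illustrative placeholders rather than the exact configuration used in this paper, and FGSM corresponds to a single step from the clean image without the random start.

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, images, labels, eps=8 / 255, alpha=2 / 255, steps=20):
        """L-infinity PGD: random start inside the eps-ball, then iterative
        signed-gradient ascent steps, projecting back into the ball each time."""
        x_orig = images.clone().detach()
        x_adv = x_orig + torch.empty_like(x_orig).uniform_(-eps, eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)

        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), labels)
            grad = torch.autograd.grad(loss, x_adv)[0]
            with torch.no_grad():
                x_adv = x_adv + alpha * grad.sign()                              # ascend the loss
                x_adv = torch.min(torch.max(x_adv, x_orig - eps), x_orig + eps)  # project to the eps-ball
                x_adv = torch.clamp(x_adv, 0.0, 1.0)                             # keep valid pixel range
        return x_adv.detach()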
In the medical image field, Liu et al. [40] propose an augmentation method that adds adversarial synthetic nodules and adversarial attack samples to the training data to improve the generalization and robustness of lung nodule detection systems. However, these methods require retraining the model, which is not friendly to medical diagnostic models that have already been deployed online. Besides adversarial training methods, many pre-processing based defense methods have been proposed. Xie et al. [12] use random resizing and padding (Random R-P) to pre-process the input images before feeding them into the models to make predictions. Jia et al. propose ComDefend [13], which consists of a compression convolutional neural network (ComCNN) and a reconstruction convolutional neural network (RecCNN) to transform the adversarial image into its clean version. However, the random resizing and compression operators may lose the lesion features of medical images.

In this section, we introduce in detail the datasets and pre-trained models used in our study. Two public datasets are used in this study:

1) COVID-19 Radiography Database [14]: It consists of chest X-ray images of size 224×224 of COVID-19 positive, normal, and viral pneumonia cases (i.e., a 3-class diagnostic task). In the current release, there are 1200 COVID-19 positive images, 1341 normal images, and 1345 viral pneumonia images. We split this dataset into training, validation, and test sets with ratio 8 : 1 : 1.

2) DermaMNIST [41]: It is based on HAM10000 [42], [43], which consists of 10,015 multi-source dermatoscopic images of common pigmented skin lesions. This dataset is labeled with 7 different categories (i.e., actinic keratoses, basal cell carcinoma, benign keratosis, dermatofibroma, melanocytic nevi, melanoma, vascular lesions), forming a 7-class classification task. We split the images into training, validation, and test sets with ratio 7 : 1 : 2. The source images of 3 × 600 × 450 are resized into 3 × 28 × 28.

In order to better explore the effect of MedRDF on different pretrained models, the base classifiers we use in experiments are the natural-image-based ResNet-18 and ResNet-50 [44] and the medical-image-based AG-Sononet-16 [26]. We directly train the networks on the COVID-19 and DermaMNIST datasets without fine-tuning. ResNet-18 and AG-Sononet-16 are trained for 100 epochs using stochastic gradient descent with momentum 0.9 and weight decay 1e-6. The initial learning rate is 1e-4 and is decayed by a factor of 0.1 at epochs 50 and 75. ResNet-50 is trained for 100 epochs using stochastic gradient descent with momentum 0.9 and weight decay 1e-4. The initial learning rate is 1e-3 and is decayed by a factor of 0.1 at epochs 50 and 75. The batch size is 10.

Let X ⊆ R^d denote the input space and Y = {1, · · · , K} be a finite set consisting of K possible class labels. D = {(x_1, y_1), · · · , (x_m, y_m)} is a training set with m labeled examples, where x_i ∈ X is the feature vector and y_i ∈ Y is the label of the i-th example. Given a medical diagnostic model h_θ with parameters θ, it outputs the class label h_θ(x_i) for each input image x_i ∈ X:

h_θ(x_i) = arg max_{k ∈ Y} p_k(x_i, θ), (1)

where p_k(x_i, θ) is the probability (softmax on logits) of x_i belonging to class k. We denote A_{h_θ} as the space of adversarial examples for the pre-trained model h_θ.
An adversarial example x' ∈ A_{h_θ} is supposed to be quasi-imperceptible to the human eye and misclassified by h_θ, i.e.,

h_θ(x') ≠ y, with d(x', x) ≤ ε, (2)

where d(·) is the distance function and ε is the maximum perturbation for the adversarial attack. Here we aim to design a robust diagnostic framework g that correctly classifies these adversarial examples x' ∈ A_{h_θ} with the pre-trained model h_θ.

Inspired by randomized smoothing [45], we construct a robust and retrain-less diagnostic framework (MedRDF) g for the medical pretrained model h_θ. Firstly, MedRDF adds isotropic noise η drawn from distribution µ to the test image to produce noisy copies, and then denoises the noisy copies with the pre-defined denoiser D. Secondly, MedRDF inputs these copies into the pre-trained medical diagnostic model h_θ to obtain the prediction labels. Thirdly, the final diagnostic result is obtained by majority voting on the labels of the denoised copies. MedRDF g is formulated as follows:

g(x) = arg max_{k ∈ Y} P_{η∼µ(0,σI)}(h_θ(x + η) = k), (3)

where h_θ(·) represents h_θ(D(·)) and D(·) is the pre-defined denoiser. An equivalent definition is that g(x) returns the class k whose pre-image {x + η ∈ R^d : h_θ(x + η) = k} has the largest probability measure under the distribution µ(x, σI). The level of the noise η is bounded by σ, and the noise level σ controls the tradeoff between robustness and accuracy, i.e., the robustness of MedRDF increases with increasing σ while its standard accuracy decreases. The details of our MedRDF are described as follows.

1) Isotropic Noise η: Recent studies [4], [46] show that the non-robustness of deep networks against attacks is caused by the high nonlinearity of deep networks.

[Algorithm 1 — Input: base classifier h_θ, diagnostic case x, noise distribution µ(0, σI), sampling number n, abstention threshold α, denoiser operator D. Initialization: array counts[0, · · · , n − 1].]

Kalimeris et al. [47] show that, as training of the network continues, a significant increase in the curvature of the decision boundary and the loss landscape occurs, and adversarial examples easily hide in these isolated regions of high curvature [48], as illustrated in Fig. 2. Based on this observation, we add common random noise η bounded by σ to the original image, which can reduce the impact of the adversarial example in the isolated area on the accuracy of the model. As shown in Fig. 2, in the noise area, with the original image x and the adversarial example x' as the centers and the maximum noise σ as the boundary, most examples can be correctly classified. The same holds for the adversarial example in the isolated area. Therefore, adding isotropic noise to the original image to generate noisy copies can effectively keep the network from being misled by adversarial examples. However, although neural networks have a certain robustness to common noise, too large a noise will still decrease the accuracy of h_θ, which will also affect the final prediction result of g based on h_θ. In the following subsection, we introduce the denoising operator to alleviate this decline in accuracy.

2) Pre-defined Denoiser: To alleviate the accuracy decline of the base classifier h_θ under large isotropic noise, a denoising operator has been adopted in our MedRDF. Instead of a CNN-based denoiser [49], we use Gaussian Smoothing (GS) and Median Filter (MF) as denoisers in our work, which have faster inference speed and lower GPU memory usage than CNN-based denoisers.

3) Prediction and Majority Voting: For notational convenience, we abbreviate the probability in Equation (3) as P_k = P(h_θ(x + η) = k). Let k̂_A = arg max_k P_k. Notice that, by definition, g(x) = k̂_A.
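As a minimal sketch of the copy-creation and denoising steps 1) and 2) just described, the following assumes a single grayscale image with values in [0, 1], Gaussian isotropic noise, and scipy's Gaussian and median filters standing in for the GS and MF denoisers; the function name and the filter parameters are illustrative assumptions, not the exact settings used in the paper.

    import numpy as np
    from scipy.ndimage import gaussian_filter, median_filter

    def make_denoised_copies(x, n=100, sigma=0.1, denoiser="MF", rng=None):
        """Step I of MedRDF for a single grayscale image x (H x W, values in [0, 1]):
        draw n isotropic Gaussian noise samples bounded by sigma, add them to x,
        and denoise each copy with Gaussian smoothing (GS) or a median filter (MF)."""
        rng = rng or np.random.default_rng()
        copies = []
        for _ in range(n):
            noisy = np.clip(x + rng.normal(0.0, sigma, size=x.shape), 0.0, 1.0)
            if denoiser == "GS":
                denoised = gaussian_filter(noisy, sigma=1.0)   # illustrative kernel width
            else:
                denoised = median_filter(noisy, size=3)        # 3x3 neighborhood
            copies.append(denoised)
        return np.stack(copies)                                # shape (n, H, W)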
We draw n noise samples from the distribution µ(0, σI) following the Monte Carlo principle, and query the base classifier h_θ(·) with the n noise-corrupted copies of x. This yields a vector of class counts {n_k}_{k∈Y}, distributed as Multinomial({P_k}_{k∈Y}, n). Let k_A = arg max_k n_k be the class whose count is largest, and let n_A and n_B be the largest and second-largest counts, respectively. If k_A appears much more often than any other class, then the prediction of MedRDF g returns k_A; otherwise, it abstains from making a prediction. Following Cohen et al. [45], we use the hypothesis test from Hung & Fithian [50] to calibrate the abstention threshold so as to bound by α the probability of returning an incorrect answer. The prediction of MedRDF g satisfies the following guarantee.

Proposition 1: With probability at least 1 − α over the randomness in the prediction of MedRDF, MedRDF returns k̂_A or abstains; equivalently, the probability that the prediction of MedRDF returns a class other than k̂_A is at most α, i.e.,

P(MedRDF returns a class k ≠ k̂_A) ≤ α. (4)

We use the p-value of the two-sided hypothesis test that n_A is drawn from the binomial distribution Binom(n_A + n_B, 1/2) to verify whether Equation (4) holds: if the p-value is less than α, then return k_A; else, abstain. That is, we adopt a two-sided hypothesis test with the binomial distribution (Binom) to account for the randomness in the prediction of MedRDF:

g(x) = k_A if BinomPValue(n_A, n_A + n_B, 1/2) < α; otherwise g(x) abstains. (5)

The proof is as follows.

Proof 1: MedRDF returns a class other than k̂_A if and only if (1) k_A ≠ k̂_A and (2) MedRDF does not abstain. We have:

P(MedRDF returns a class k ≠ k̂_A) = P(k_A ≠ k̂_A, MedRDF does not abstain). (6)

Recall that MedRDF does not abstain if and only if the p-value of the two-sided hypothesis test that n_A is drawn from Binom(n_A + n_B, 1/2) is less than α. Theorem 1 in Hung & Fithian [50] proves that the conditional probability of this event given that k_A ≠ k̂_A is exactly α. That is,

P(MedRDF does not abstain | k_A ≠ k̂_A) = α. (7)

Therefore, we have:

P(MedRDF returns a class k ≠ k̂_A) = P(MedRDF does not abstain | k_A ≠ k̂_A) · P(k_A ≠ k̂_A) ≤ α. (8)

When α is small, MedRDF abstains frequently but rarely returns the wrong class. When α is large, MedRDF usually makes a prediction, but may often return the wrong class. α = 0.001 and n = 1e4 are adopted in our framework. The complete prediction procedure of MedRDF g is described in Algorithm 1.

In medical diagnostic tasks, in addition to the diagnostic result output by the model, we also hope to obtain a confidence score for the result, so as to better guide the doctor's follow-up work, such as adopting the result or re-evaluating the case. Therefore, in this subsection, in order to provide doctors with an intuitive and effective indicator, we define a Robust Metric (RM) based on MedRDF. The formulation of RM is as follows:

RM = K (n_A − n_B) / n, (9)

where n_A and n_B denote the counts of the classes k_A and k_B with the most and second-most occurrences in g, respectively, K is the number of diagnosis categories, and n is the number of copies. Given a threshold on RM, when the RM output by MedRDF is greater than the threshold, the doctor can accept the diagnostic result; otherwise, the doctor should consider re-evaluating it. The effectiveness of RM is analyzed as follows. From Equation (9) we can obtain:

n_A / n ≥ 1/K + (K − 1) RM / K², (10)

i.e., a lower bound on the fraction of votes received by the predicted class k_A. Then, for different classification tasks, doctors can set different thresholds to make the vote share of the output label reach their expectations. Taking the 3-class diagnostic task as an example, we set the RM threshold to 1 to indicate whether the diagnostic result is robust or not; that is to say, k_A should receive at least a 5/9 share of the votes for the result to be regarded as robust. For the 7-class diagnostic task, when the RM threshold is set to 3, the vote share of class k_A is at least 0.51 (= 25/49).
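For illustration, the following is a hedged end-to-end sketch of the voting procedure (Algorithm 1) together with the RM score as reconstructed above. It reuses the hypothetical make_denoised_copies helper from the earlier sketch and scipy's binomial test for the two-sided hypothesis test; h_theta is assumed to map a batch of images to hard labels, and the batching is only there to keep memory bounded.

    import numpy as np
    from scipy.stats import binomtest

    def medrdf_predict(h_theta, x, num_classes, n=10_000, sigma=0.1,
                       alpha=0.001, denoiser="MF", batch=256):
        """Steps II-III of MedRDF: classify n denoised noisy copies of x with the
        pretrained model h_theta, majority-vote the labels, abstain when the top
        two counts are not separated significantly, and report the Robust Metric."""
        counts = np.zeros(num_classes, dtype=np.int64)
        remaining = n
        while remaining > 0:
            m = min(batch, remaining)
            copies = make_denoised_copies(x, n=m, sigma=sigma, denoiser=denoiser)
            labels = np.asarray(h_theta(copies), dtype=int)    # assumed to return m hard labels
            counts += np.bincount(labels, minlength=num_classes)
            remaining -= m

        top_two = counts.argsort()[::-1][:2]
        n_a, n_b = int(counts[top_two[0]]), int(counts[top_two[1]])

        # Two-sided test that n_a is drawn from Binom(n_a + n_b, 1/2);
        # abstain unless the p-value falls below alpha.
        p_value = binomtest(n_a, n_a + n_b, 0.5).pvalue
        if p_value >= alpha:
            return None, None                                  # abstain: refer the case to the doctor

        rm = num_classes * (n_a - n_b) / n                     # Robust Metric, Eq. (9)
        return int(top_two[0]), rm

The returned pair maps directly onto the workflow described above: a high RM supports adopting the result, while abstention or a low RM suggests re-evaluation by the doctor.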
In this section, we first introduce the experimental settings, including the common isotropic noises and adversarial attack methods used in this study. Second, we choose the best noise boundary σ and the number n of copies for our experiments through an ablation study. Then, we conduct a set of experiments to evaluate the robustness of our MedRDF under different adversarial attacks. Furthermore, we confirm the necessity and effectiveness of our RM indicator and visually present the robust diagnostic results for different cases. Finally, we present further comparisons of our MedRDF with other augmentation strategies and other defense methods.

A. Experimental Settings

1) Common Isotropic Noise: We evaluate the robustness of MedRDF under Gaussian noise, salt-and-pepper (s.p.) noise, and Poisson noise, and utilize Gaussian smoothing (GS) and the median filter (MF) as denoisers in the experiments.

2) Adversarial Attack: The adversarial examples are crafted by the most challenging "white-box" attacks (i.e., I-FGSM [51], PGD [5], and C&W [6]) and "black-box" attacks (i.e., SPSA [31] and RayS [32]). The "white-box" attacks are under maximum L∞ perturbation ε = 8/255.

1) The noise boundary σ: We first explore the influence of the common noise bound σ and the adversarial perturbation ε on original and robust accuracy. As shown in TABLE I, since we set ε = 8/255 for the adversarial attack, we choose σ = 0.1 as the boundary of the common noise for its high accuracy.

2) The number of copies: In this part, we explore the influence of different numbers of copies on the final robustness of MedRDF. The defense accuracy and test time of MedRDF for different numbers n of copies are recorded in TABLE II. As shown in TABLE II, on both the COVID-19 and DermaMNIST datasets, although the natural accuracy at n = 1e5 is higher than at n = 1e4, the test time per image at n = 1e5 is much longer than at n = 1e4, which is not conducive to the clinical application of MedRDF. For example, on the COVID-19 dataset, the natural accuracy is 91.4% at n = 1e5, which is slightly higher than 91.2% at n = 1e4. However, its test time per image is 87.6s, which is not easily tolerated compared with the test time of 3.8s at n = 1e4. Besides, in terms of defense accuracy, n = 1e4 yields better final defense accuracy for MedRDF than n = 1e3 and n = 1e5. For instance, with n = 1e4, the accuracy on DermaMNIST attacked by C&W is 66.0%, which is higher than the accuracies of 65.6% and 65.9% at n = 1e3 and n = 1e5, respectively. In summary, we choose n = 1e4 in our experiments.

White-box attack. The accuracy of the original models (i.e., h_θ = ResNet-18, ResNet-50, and AG-Sononet-16) and our MedRDF (i.e., g_θ based on ResNet-18, ResNet-50, and AG-Sononet-16) are recorded in TABLE III, TABLE IV, and TABLE V.

[TABLE V caption: Accuracy (%) of different defense mechanisms (rows) against white-box adversarial attacks with maximum L∞ perturbation ε = 8/255 (columns) on the COVID-19 and DermaMNIST datasets with AG-Sononet-16. The original accuracy of each defense is given in the column "Natural". GS: Gaussian smoothing, MF: median filter. The number after each attack method is the number of iteration steps.]

For example, MedRDF based on AG-Sononet-16 achieves an accuracy of 70.3% when attacked by PGD-7, which is much better than the accuracy of the base model AG-Sononet-16 (i.e., 17.7%), and is even comparable with the accuracy without any attack (i.e., natural accuracy 70.6%). These results indicate the effectiveness of our framework in converting non-robust models into robust ones.
Moreover, TABLE VI records the accuracy on COVID-19 with the original models ResNet-50 and AG-Sononet-16 under different noise settings. What we can observe from TABLE VI is that, since the original AG-Sononet-16 model is not robust to common noise (i.e., the natural accuracy is 28.2% after adding noise without a denoiser), MedRDF without a denoiser loses its discrimination ability (i.e., the accuracy is 28.2%, equivalent to random guessing, under all attacks in TABLE V). This result draws our attention to the fact that the robustness of the base model h_θ under common noise affects the final robustness of MedRDF under adversarial attack.

Black-box attack. For the SPSA attack, the gradient is estimated through queries to the model.

2) Visualization Results: In this part, we illustrate the superiority of our proposed MedRDF by visualizing the changes in the internal features of each model.

The change of attention maps. As shown in Fig. 3, in each subfigure, the first column shows the original image and its corresponding adversarial image attacked by PGD-100, with their labels. The second column shows the attention maps and output labels of the base model AG-Sononet-16 on the original image and the adversarial image, respectively. The third column shows the attention maps and output labels of MedRDF g on the original image and the adversarial image, respectively. From Fig. 3 we can observe that the attention maps of the base model on the original image (i.e., h(x)) and on the adversarial image (i.e., h(x')) are extremely different. From this we can infer that, due to the changes in the features the base model focuses on, the base model can easily be fooled by adversarial examples. On the contrary, we notice that MedRDF does not significantly change the attention map between the original image and the adversarial example, which shows that MedRDF is not easily misled by adversarial perturbations.

The change of feature maps. As shown in Fig. 4, in each subfigure, the first row shows the original image and the adversarial image attacked by C&W. The second row shows the feature maps of the original image and the adversarial image at the second "BasicBlock" layer of the base model ResNet-18, respectively. The feature maps in the third row are produced by MedRDF. From the second row of Fig. 4(a)-(c), one can observe that the learned features of the base model h for the clean image focus on semantically informative regions (shown in red), while the features of the adversarial images are activated globally (without any specific focus). However, this problem can be effectively solved by MedRDF. From the third row of each subfigure, we can see that the feature map of the adversarial image produced by MedRDF is consistent with that of the clean image. These visualization results indicate that our MedRDF is not susceptible to adversarial perturbations, thus improving robustness effectively.

2) Visualization Results: In order to illustrate the effectiveness of RM more intuitively, several cases are presented in Fig. 5, where the last two cases should be re-evaluated by the doctor due to the low RM of the result.

1) Comparison with Other Augmentation Strategies: For each test image, MedRDF first creates a large number of noisy copies. To illustrate the effectiveness of this operator, we compare our operator of creating noisy copies with other augmentation strategies. Specifically, we use random resizing and random rotating to replace the noise in this experiment. The resizing range is [200, 224], and the rotating angle range is [10, 100]. The experimental results can be found in TABLE X. From TABLE X we can see that, compared with random rotating and resizing, our proposed MedRDF with noisy copies achieves the best accuracy under all attacks.
2) Comparison with Other Defense Methods: To further verify the superior performance of our method, we compare MedRDF with other defense mechanisms in this section, including pre-processing based defenses (i.e., Random R-P [12] and ComDefend [13]) and retraining-based defenses (i.e., adversarial training (AT) [5], TRADES [38], and MART [39]). The accuracy and training time of each method can be found in TABLE XI. From TABLE XI we can see that, on COVID-19, whether the base model is ResNet-18 or ResNet-50, MedRDF not only has the highest defense accuracy (e.g., the accuracy of MedRDF based on ResNet-50 attacked by C&W is 91.2% while that of Random R-P is 51.0%), but also requires much less training time than other retraining-based defense methods (e.g., the training time of MedRDF based on ResNet-18 is 0.51 hrs while that of TRADES is 1.97 hrs). For DermaMNIST, MedRDF still maintains the best defense accuracy under many attacks (e.g., PGD-20, PGD-100, and C&W on ResNet-50). Compared with Random R-P, a pre-processing defense method, MedRDF has better defense accuracy on medical images. Besides, compared with retraining methods, MedRDF, which is employed in the inference phase, can greatly reduce the training time and training burden.

[TABLE X caption: Accuracy (%) of different augmentation strategies (rows) against white-box adversarial attacks with maximum L∞ perturbation ε = 8/255 (columns) on COVID-19 with ResNet-18. The original accuracy of each defense is given in the column "Natural". The denoiser is MF after adding s.p. noise. The number after each attack method is the number of iteration steps.]

The above analyses confirm that our MedRDF is effective and suitable for defending against adversarial attacks on medical diagnostic tasks.

We propose a Robust and Retrain-Less Diagnostic Framework for Medical pre-trained models against adversarial attack (i.e., MedRDF). MedRDF allows users to seamlessly convert a pre-trained non-robust medical diagnostic model into a robust one in the inference phase, which is very convenient for diagnostic services that are already deployed online. Moreover, we also propose an effective Robust Metric (RM) based on MedRDF, which gives the confidence score of the diagnostic result. Experimental results demonstrate the superior performance of MedRDF on the COVID-19 and DermaMNIST datasets in both white-box and black-box adversarial settings. In the future, we plan to study the robustness of base medical models to common noise, which plays an important role in our robust framework, as well as the trade-off between natural accuracy and defense accuracy. In addition, we will extend our research to the field of medical image segmentation.

References:
Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning
Dermatologist-level classification of skin cancer with deep neural networks
Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs
Intriguing properties of neural networks
Towards deep learning models resistant to adversarial attacks
Towards evaluating the robustness of neural networks
Generalizability vs. robustness: investigating medical imaging networks using adversarial examples
Towards evaluating the robustness of deep diagnostic models by adversarial attack
Impact of adversarial examples on deep learning models for biomedical image segmentation
Defending against adversarial attacks on medical imaging ai system, classification or detection
Improving robustness of medical image diagnosis with denoising convolutional neural networks
Mitigating adversarial effects through randomization
Comdefend: An efficient image compression model to defend adversarial examples
Can ai help in screening viral and covid-19 pneumonia?
Deep learning based detection and analysis of covid-19 on chest x-ray images
Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks
Automated detection of covid-19 cases using deep neural networks with x-ray images
Covidiagnosis-net: Deep bayes-squeezenet based diagnosis of the coronavirus disease 2019 (covid-19) from x-ray images
Artificial intelligence and covid-19: deep learning approaches for diagnosis and treatment
Generative adversarial networks: An overview
Extreme learning machine: theory and applications
Long short-term memory
Rethinking the inception architecture for computer vision
Xception: Deep learning with depthwise separable convolutions
Aggregated residual transformations for deep neural networks
Attention-gated networks for improving ultrasound scan plane detection
Skin lesion analysis towards melanoma detection using deep learning network
Melanoma recognition via visual attention
Explaining and harnessing adversarial examples
Adversarial risk and the dangers of evaluating against weak attacks
Rays: A ray searching method for hard-label adversarial attack
Understanding adversarial attacks on deep learning based medical image analysis systems
Vulnerability analysis of chest x-ray image classification against adversarial attacks, in Understanding and interpreting machine learning in medical image computing applications
Distillation as a defense to adversarial perturbations against deep neural networks
Thermometer encoding: One hot way to resist adversarial examples
Adversarial training for free
Theoretically principled trade-off between robustness and accuracy
Improving adversarial robustness requires revisiting misclassified examples
No surprises: Training robust lung nodule detection for low-dose ct scans by augmenting with adversarial attacks
Medmnist classification decathlon: A lightweight automl benchmark for medical image analysis
The ham10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions
Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic)
Deep residual learning for image recognition
Certified adversarial robustness via randomized smoothing
Mitigating evasion attacks to deep neural networks via region-based classification
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems
Robustness via curvature regularization, and vice versa
Denoised smoothing: A provable defense for pretrained classifiers
Rank verification for exponential families
Adversarial machine learning at scale