key: cord-0149127-wuv1yhma authors: Ma, Linhai; Liang, Liang title: Increasing-Margin Adversarial (IMA) Training to Improve Adversarial Robustness of Neural Networks date: 2020-05-19 journal: nan DOI: nan sha: 0fb733b10f343cf29af879dacbc429c6e6a43b5a doc_id: 149127 cord_uid: wuv1yhma

Deep neural networks (DNNs) are vulnerable to adversarial noises. By adding adversarial noises to training samples, adversarial training can improve the model's robustness against adversarial noises. However, adversarial training samples with excessive noise can harm standard accuracy, which may be unacceptable for many medical image analysis applications. This issue has been termed the trade-off between standard accuracy and adversarial robustness. In this paper, we hypothesize that this issue may be alleviated if the adversarial samples for training are placed right on the decision boundaries. Based on this hypothesis, we design an adaptive adversarial training method, named IMA. For each individual training sample, IMA makes a sample-wise estimation of the upper bound of the adversarial perturbation. In the training process, each of the sample-wise adversarial perturbations is gradually increased to match the margin. Once an equilibrium state is reached, the adversarial perturbations stop increasing. IMA is evaluated on publicly available datasets under two popular adversarial attacks, PGD and IFGSM. The results show that: (1) IMA significantly improves the adversarial robustness of DNN classifiers and achieves state-of-the-art performance; (2) IMA has a minimal reduction in clean accuracy among all competing defense methods; (3) IMA can be applied to pretrained models to reduce time cost; (4) IMA can be applied to state-of-the-art medical image segmentation networks, with outstanding performance. We hope our work may help to lift the trade-off between adversarial robustness and clean accuracy and facilitate the development of robust applications in the medical field. The source code will be released when this paper is published.

Deep neural networks (DNNs) have become the first choice for automated image analysis due to their superior performance. However, recent studies have shown that DNNs are very vulnerable to adversarial noises. Adversarial noise was first discovered by [28] and then explained by [12]. Adversarial noises can significantly affect the robustness of DNNs for a wide range of image classification applications [2, 10, 13, 20]. The COVID-19 pandemic has caused the death of millions of people [35]. A large-scale study shows that CT had higher sensitivity for the diagnosis of COVID-19 as compared with initial reverse-transcription polymerase chain reaction (RT-PCR) from swab samples [1]. As reviewed in [24], many DNN models for COVID-19 diagnosis from CT images have been developed and have achieved very high classification accuracy. However, none of these studies [24] considered DNN robustness against adversarial noises.

Figure 1. An example of a clean image and the same image with unperceivable adversarial noise: we modified a ResNet-18 model [14], trained it on a public COVID-19 CT image dataset [27], and then tested the model's robustness. The figure shows a CT image (denoted by x) of a lung infected by COVID-19 that is correctly classified as infected (left). After an unperceivable adversarial noise δ is added to the image x, the noisy image x + δ is classified as uninfected (right). On the test set, although the model achieved ≥ 95% accuracy on clean images, its accuracy dropped to zero at small noise levels (see 4.3).
Fig. 1 shows that the ResNet-18 model [14] is very vulnerable to unperceivable adversarial noise. Clearly, this non-robust model cannot be trusted in real clinical applications. Besides, adversarial noise can be seen as the worst-case random noise: an adversarially robust DNN model is also robust to the random noise that exists everywhere in the real world [11]. Last but not least, it may seem that adversarial noises are created by algorithms (e.g., PGD) and are therefore only a security issue caused by hackers. However, random imaging noises can also act as "adversarial" noises that lead to wrong classifications. For the COVID-19 application, we did an additional test and found that 2.75% of the noisy samples with uniform white noise at the level of 0.05 (L-inf norm) caused the ResNet-18 model to make wrong classifications. 2.75% is not a negligible number for this medical application, and adversarial robustness should be a built-in property of a model for this application. All of the DNN models in the previous COVID-19 studies [24] should be checked and enhanced for adversarial robustness before being deployed in clinics and hospitals. In short, the adversarial robustness problem of DNN models is critical and worth studying.

To improve the adversarial robustness of a DNN model, adversarial training is the most popular method. By generating adversarial training samples to train the model, adversarial training can improve the adversarial robustness of the DNN model. Standard adversarial training (also called vanilla adversarial training) [17, 19] generates adversarial training samples with a fixed and unified adversarial perturbation upper bound. Many advanced adversarial training methods have been proposed. TRADES [38] and MART [34] use loss regularization terms to make a trade-off between adversarial robustness and standard accuracy. DAT [33] applies convergence quality as a criterion to adjust adversarial training perturbations. ATES [26] and CAT [5] apply a curriculum strategy for adversarial training. A misclassification-aware strategy is used in methods including IAAT [3], FAT [39], Customized AT [6] and MMA [9], which adjust the perturbation upper bound adaptively in the training process. GAIRAT [40] applies sample-wise weights to the loss for robust training. None of these methods aims to preserve the standard accuracy during adversarial training. As a result, the generated adversarial training samples significantly harm the model's standard accuracy (on clean samples) [23, 29], which prevents them from being used in medical domains, such as medical image analysis. The focus of our paper is on how to generate proper adversarial training samples such that the standard accuracy is preserved as much as possible. Our major contributions are as follows. (1) We make the hypothesis that the optimal adversarial training samples should be just about to cross the decision boundaries, and we analyze the condition under which the equilibrium/optimal state exists. (2) Based on this hypothesis, we design an adaptive adversarial training method, named Increasing-Margin Adversarial (IMA) training: in each training epoch, IMA makes sample-wise estimations of the upper bound of the adversarial perturbations that will be used to generate adversarial samples in the next epoch.
In the training process, decision boundaries are gradually pushed away from the clean samples, which enhances robustness. Once an equilibrium state is reached, the adversarial perturbations stop increasing, which prevents adding too much noise that may hurt the model's standard accuracy. We evaluate IMA on public datasets that are widely used in the adversarial robustness research community. The results show that: (a) IMA can significantly improve the adversarial robustness of DNN models and achieves state-of-the-art performance; (b) IMA has the highest standard accuracy among all competing defense methods; (c) IMA can be applied to a model pretrained on clean data, which reduces training time cost. (3) We apply IMA to a clinical COVID-19 CT image classification task to show the significance of the robustness study in the medical domain. (4) We extend the IMA method to medical image segmentation tasks, for which adversarial robustness has rarely been studied before.

Figure 2. (a) Explanation of PGD (Eq. (3)): a solid blue arrow shows one PGD iteration to update x̂; a dashed blue arrow shows that when x̂ moves out of the ε-ball, it is projected back onto the ε-ball; the solid red arrow denotes the final adversarial perturbation. (b) Explanation of Algorithm 2: during the PGD iterations (Eq. (3)), x̂1 may cross the decision boundary in the next iteration, and x̂2 has just crossed the decision boundary. Then, binary search is performed between x̂1 and x̂2 to locate x̂ (also see Fig. 3 (d)).

In this paper, we only focus on untargeted adversarial attacks and defenses. Let (x, y) be a pair of a sample x and its true label y. The objective function of generating an adversarial sample is:

x̂ = argmax_x̂ Loss(f(x̂), y),   (1)

where Loss(.) is the loss function, x̂ is the adversarial sample to be generated, and f(.) is a DNN model. The objective function is under the constraint that:

||x̂ − x||_p ≤ ε,   (2)

where ε is the perturbation upper bound and ||.||_p is the Lp vector norm. Given a clean sample (x, y), there are many ways to solve Eq. (1) and (2) to obtain an adversarial sample x̂. One popular method is called Projected Gradient Descent (PGD) [17, 19], which leverages an iterative way to approximate the optimal x̂:

x̂^(k+1) = Π_ε( x̂^(k) + α · h(∇_x̂ Loss(f(x̂^(k)), y)) ),   (3)

where h(.) is the normalization function, α is the step size, x̂^(k) is the adversarial sample at iteration k, and Π_ε(.) is a projection operation that ensures the generated perturbation satisfies ||x̂^(k) − x||_p ≤ ε (the ε-ball). After N iterations (called N-PGD), we obtain the final adversarial sample x̂ = x̂^(N).

By taking into account both the standard accuracy and the adversarial robustness, the objective function of Standard Adversarial Training (SAT) [12, 17, 19] is:

min_θ E_(x,y) [ Loss(f_θ(x), y) + Loss(f_θ(x̂), y) ],   (4)

where f_θ(.) is the DNN model with the parameter θ. The objective function is under the constraint that:

x̂ = argmax_{||x̂ − x||_p ≤ ε} Loss(f_θ(x̂), y).   (5)

SAT uses both the clean sample {x, y} and the adversarial sample {x̂, y} to train the DNN model f_θ. In this way, the DNN model gains robustness against adversarial noise. For SAT, adversarial training samples are generated by using the PGD method (Eq. (3)) with a fixed perturbation upper bound ε for every {x, y}. The fixed and uniform perturbation upper bound is the weak point of SAT, which will be discussed in the next section.

From Fig. 3, the optimal location of an adversarial training sample x̂ should be very close to the decision boundary (Fig. 3 (d)), and the optimal adversarial perturbation upper bound should be the distance between x̂ and x. This distance is similar to the "margin" in Support Vector Machines (SVM) [8]. So, we will call this optimal adversarial perturbation upper bound the "margin" in this paper.
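To make Eq. (3) concrete, the following is a minimal sketch of an L2-norm PGD attack in PyTorch (the framework used in this paper); the function name, the default iteration count, and the omission of input-range clipping are our own simplifications, not the authors' exact implementation.

import torch
import torch.nn.functional as F

def pgd_attack_l2(model, x, y, eps, alpha, num_iter=20):
    """Generate x_hat with ||x_hat - x||_2 <= eps by iterating Eq. (3)."""
    x_hat = x.clone().detach()
    for _ in range(num_iter):
        x_hat.requires_grad_(True)
        loss = F.cross_entropy(model(x_hat), y)
        grad = torch.autograd.grad(loss, x_hat)[0]
        # h(.): normalize the gradient to unit L2 norm per sample
        g_norm = grad.flatten(1).norm(p=2, dim=1).clamp_min(1e-12)
        step = alpha * grad / g_norm.view(-1, *([1] * (grad.dim() - 1)))
        x_hat = x_hat.detach() + step
        # Pi_eps(.): project the perturbation back onto the eps-ball
        delta = x_hat - x
        d_norm = delta.flatten(1).norm(p=2, dim=1).clamp_min(1e-12)
        factor = (eps / d_norm).clamp(max=1.0)
        x_hat = (x + delta * factor.view(-1, *([1] * (delta.dim() - 1)))).detach()
    return x_hat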
In fact, the real margin should be the shortest distance between x and the corresponding decision boundary d. This means that for a true optimal x̂ that is about to cross the decision boundary, (x̂ − x) ⊥ d holds. However, for DNN models, such a true optimal x̂ can hardly be located precisely in the input space. So, the estimated margin is always larger than the real margin. This is an inevitable overestimation problem, and it exists in all adversarial training methods, such as MMA [9] (see Fig. 5).

Figure 3. (a) Adding adversarial noise to a sample/data point x pushes it along a direction in the input space. Once the data point goes across the decision boundary, the prediction from a DNN model is changed. Apparently, the closer x is to a decision boundary, the more likely it is to be pushed across the decision boundary. (b) Given a training set (X, Y), where X denotes the set of samples and Y denotes the set of true labels, ideally, adversarial training samples X̂ are placed close to the true/optimal decision boundary (Fig. 2 (a)). By training the model with both (X, Y) and (X̂, Y), the model decision boundary is pushed away from X and therefore adversarial robustness is improved. (c) However, the distribution of training samples is not always as ideal as that in (b). The margin of each training sample is different. A uniform and fixed adversarial training perturbation can lead to adversarial training samples that go across the true decision boundary, and a model trained on these adversarial samples will have low standard accuracy on clean data. (d) Apparently, the optimal adversarial training samples X̂ should be just about to go across the decision boundary. Training with too small a perturbation ||X̂ − X||_p is not effective enough. Training with too large a perturbation ||X̂ − X||_p leads to low standard accuracy on clean data.

Motivated by 2.1, 2.2 and 2.3, we design IMA, an adaptive adversarial training method. One epoch of the IMA training process is shown in Algorithm 1, which includes two sub-processes: (1) compute the loss and update the DNN model (Lines 2 to 12); (2) update the sample margin estimation (Lines 13 to 17). Here is a brief description of Algorithm 1. x is a clean training sample with true label y, and i is the unique ID of this sample (Line 1). After the clean sample is processed by the DNN model f(.), the loss L0 on the clean sample is obtained (Lines 2-3). If x is correctly classified (Line 4), Algorithm 2 will be used to generate an adversarial/noisy sample x̂ (Line 5). After the noisy sample is processed by the model, the loss L1 on the noisy sample is obtained (Lines 6-7). If x is not correctly classified, Algorithm 2 will not be used (Lines 8-9). Then, the model f(.) is updated by backpropagation from the combined loss (Lines 11-12). If both the clean sample x and the noisy sample x̂ are correctly classified, which means the sample margin is not large enough to reach the decision boundary, the margin E(i) for this training sample will be expanded (Lines 13-14). Otherwise, the sample margin E(i) is too large and a perturbation of this magnitude has already pushed x̂ across the decision boundary, and therefore E(i) should be reduced (Line 16). E(i) should always be smaller than the maximum perturbation ε, which is ensured by a clip operation (Line 19).
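The margin bookkeeping described above can be sketched in PyTorch as follows. The expansion rule (grow E(i) by ∆ε when both the clean and the noisy sample are correctly classified, clipped at ε) follows Algorithm 1; the shrink rule and the exact form of the combined loss are not fully specified in the text, so the choices below (shrinking E(i) to the size of the refined perturbation, summing the two mean losses) are our assumptions. Here, ids is a tensor of unique sample IDs, E maps each sample ID to its estimated margin (initialized to delta_eps), and bpgd is assumed to implement Algorithm 2 and to return a detached tensor.

import torch
import torch.nn.functional as F

def ima_step(model, optimizer, x, y, ids, E, delta_eps, eps_max, bpgd):
    """One IMA mini-batch step (a sketch of Algorithm 1): combined clean/noisy
    loss, model update, then per-sample margin update."""
    logits = model(x)
    correct_clean = logits.argmax(dim=1) == y
    loss = F.cross_entropy(logits, y)                      # L0: loss on clean samples

    correct_noisy = torch.zeros_like(correct_clean)
    pert_norm = torch.zeros(x.size(0), device=x.device)
    if correct_clean.any():
        idx = correct_clean.nonzero(as_tuple=True)[0]
        eps_i = torch.tensor([E[int(ids[k])] for k in idx], device=x.device)
        x_hat = bpgd(model, x[idx], y[idx], eps_i)          # Algorithm 2 (BPGD)
        logits_hat = model(x_hat)
        loss = loss + F.cross_entropy(logits_hat, y[idx])   # L1: loss on noisy samples
        correct_noisy[idx] = logits_hat.argmax(dim=1) == y[idx]
        pert_norm[idx] = (x_hat - x[idx]).flatten(1).norm(p=2, dim=1).detach()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Margin update (Algorithm 1, Lines 13-19).
    for k in range(x.size(0)):
        i = int(ids[k])
        if correct_clean[k] and correct_noisy[k]:
            # Both predictions correct: the margin has not reached the boundary yet.
            E[i] = min(E[i] + delta_eps, eps_max)           # expand, then clip at eps_max
        elif correct_clean[k] and not correct_noisy[k]:
            # The perturbation pushed x_hat across the boundary, so E(i) is reduced;
            # falling back to the refined perturbation size is our assumption.
            E[i] = min(float(pert_norm[k]), eps_max)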
Algorithm 1 (IMA): one training epoch.
Input: the training set S; the CNN model f(.); g(.), the function that transforms the output of f(.) to a predicted class label, e.g., argmax; Loss, the loss function for training the model f; E, the array of the estimated sample margins, where E(i) is the margin of the sample indexed by the unique ID i, and every E(i) is initialized to ∆ε.
Parameters: ∆ε, the expansion step size (a positive scalar); ε, the adversarial training perturbation upper bound.
Output: the updated model f after this training epoch.
Process: run the model f on the clean sample; generate a noisy sample using Algorithm 2; run the model f on the noisy sample: ẑ ← f(x̂); ...

Here is a brief description of Algorithm 2 (BPGD). In each iteration (Line 1), x is modified to obtain x̂ (Line 2, see Eq. (3)). If x̂ is misclassified by the current model f(.) (Line 3), then x̂ plays the role of x̂2 and x plays the role of x̂1 in Fig. 2. So, a binary search is applied to find the new x̂ in Fig. 2 (Line 4), which is just about to cross the decision boundary (Line 5). If the misclassification in Line 3 does not happen, then the algorithm moves on to the next iteration (Lines 7-8).

Algorithm 2 (Binary-PGD, BPGD): generate noisy samples.
Input: a training sample (x, y); the current estimated margin ε of this sample; the CNN model f(.); g(.), the function that transforms the output of f(.) to a predicted class label, e.g., argmax; the loss function L.
Parameters: maximum PGD iteration number K ← 20; PGD step size α ← (4 × ε)/K.
Output: the generated noisy sample x̂.
Function BPGD(x, y, ε, f, L): ... if g(f(x̂)) ≠ y then x̂2 ← x̂; ...
Notes: h(.) is the normalization function; ||.||_p denotes the vector Lp norm; the number of binary search steps N is always set to 10, which is constant, so the binary search will not increase the time complexity of the whole algorithm; Π_ε(.) ensures that ||x̂ − x||_p ≤ ε.

We can show that under certain conditions, an equilibrium state (Fig. 4 (a)) exists. The equilibrium state means that the overall loss is minimized and the adversarial training samples are placed near the decision boundary (Fig. 3 (d)). To simplify the discussion, we assume there are three classes and three decision boundaries between classes (Fig. 4 (b)). The softmax output of the neural network model f has three components: p1, p2 and p3, corresponding to the three classes. If a data point x̂ is about to cross the decision boundary B_ij between class-i (c_i) and class-j (c_j) (i.e., x̂ is still on the class-y side of the decision boundary), then p_i(x̂) = p_j(x̂). The mathematical expectation of the cross-entropy loss of the generated adversarial training samples (i.e., L1 in Algorithm 1, when x is correctly classified) is:

E = E_{x̂∈c1}(−log(p1(x̂))) + E_{x̂∈c2}(−log(p2(x̂))) + E_{x̂∈c3}(−log(p3(x̂))).

The IMA method pushes the generated adversarial training samples towards the decision boundaries. Then, the x̂ ∈ c_i are split into two parts: those pushed to B_ij, denoted by x̂ ∈ c_i, B_ij, and those pushed to B_ik, denoted by x̂ ∈ c_i, B_ik. So, we can get (taking class 2 as an example):

E_{x̂∈c2}(−log(p2(x̂))) = E_{x̂∈c2,B12}(−log(p2(x̂))) + E_{x̂∈c2,B23}(−log(p2(x̂))).

If the generated adversarial training samples (random variables) x̂ ∈ c_i and x̂ ∈ c_j have the same spatial distribution on the decision boundary B_ij between the two classes, then the two corresponding terms combine into:

E_{x̂∈c_i,B_ij}(−log(p_i(x̂))) + E_{x̂∈c_j,B_ij}(−log(p_j(x̂))) = E_{x̂∈B_ij}(−log(p_i(x̂)) − log(p_j(x̂))).

As a result, E reaches the minimum when p_i(x̂) = p_j(x̂). In the above analysis, the loss on clean training samples is ignored because, in experiments, we find that the loss on clean training samples converges very quickly. We have shown that an equilibrium state can be achieved when the generated adversarial training samples have the same spatial distribution on the (final) decision boundary.
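To see why the condition p_i(x̂) = p_j(x̂) minimizes the loss at a given boundary point, consider a single point x̂ on B_ij and write s = p_i(x̂) + p_j(x̂) (our notation; this assumes the two classes contribute equally many adversarial samples at x̂). The per-point contribution to E is

−log(p_i(x̂)) − log(p_j(x̂)) = −log(p_i(x̂) · p_j(x̂)) ≥ −log((s/2)^2) = 2 log(2/s),

where the inequality follows from the AM-GM inequality p_i(x̂) · p_j(x̂) ≤ (s/2)^2, with equality if and only if p_i(x̂) = p_j(x̂) = s/2.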
Here, we outline what will happen if the spatial distributions of the generated adversarial training samples in different classes are not the same on the current decision boundary. We note that our IMA method will actively generate and push the adversarial training samples towards ("close to", due to numerical precision) the current decision boundary of the neural network model. The training is a dynamic process that adjusts the decision boundary of the model. Let us focus on the following two terms:

F_i = E_{x̂∈c_i,B_ij}(−log(p_i(x̂))) = ∫_{B_ij} q_i(x̂) (−log(p_i(x̂))) dx̂,
F_j = E_{x̂∈c_j,B_ij}(−log(p_j(x̂))) = ∫_{B_ij} q_j(x̂) (−log(p_j(x̂))) dx̂,

where q_i(x̂) and q_j(x̂) are the distributions (i.e., densities) of the adversarial training samples on the current decision boundary between the two classes, and q_i(x̂) and q_j(x̂) may not be equal to each other. In fact, F_i and F_j can be interpreted as two forces that try to expand the margins of the samples in the two classes against each other. By dividing the decision boundary into small regions (i.e., linear segments), the two integrals can be evaluated in the individual regions. In a region, if q_i(x̂) > q_j(x̂) (i.e., there are more samples from class-i), then the current state is not in equilibrium: after updating the model using these noisy samples, the noisy samples in class-i will be correctly classified and the noisy samples in class-j will be incorrectly classified (this is a simple result of classification with imbalanced data in the region), which means the decision boundary will shift towards the samples in class-j, and therefore the margins of the corresponding samples in class-i will expand and the margins of the corresponding samples in class-j will shrink. Thus, the decision boundary may shift locally towards the samples in one of the classes. Obviously, the decision boundary will stop shifting when the local densities of noisy samples in different classes are the same along the decision boundary, i.e., when q_i(x̂) becomes equal to q_j(x̂), which means an equilibrium state is reached. We defer a mathematically rigorous analysis to our future work. The above analysis provides the rationale for why our IMA puts the noisy/adversarial samples close to the decision boundaries, which is significantly different from the theory of the MMA method [9].

Data augmentation (e.g., flipping an image) is widely used to improve DNN standard accuracy on clean data (e.g., on CIFAR10). This causes a problem for Algorithm 2: the optimal x̂ may not be between x̂1 and x̂2. To resolve this issue, Algorithm 2 is replaced with Algorithm 3 in case of data augmentation. In Algorithm 3, the binary search is conducted between x and x̂.

In this section, we discuss the difference between our IMA and other existing defense methods. Please read the references before reading this section, or skip this section. TRADES [38] optimizes a loss with a KL-divergence-based regularization term to consider both adversarial robustness and standard accuracy. MART [34] uses a similar KL-divergence-based regularization term for robustness training. DAT [33] applies convergence quality as the criterion for generating adversarial training samples. CAT [5] uses adversarial training perturbations of different strengths for every training sample. GAIRAT [40] applies sample-wise weights in the loss, but the weights cannot prevent adversarial training samples from going across the decision boundary. So, GAIRAT is very likely to generate overly large adversarial training perturbations. As shown in the experiments, GAIRAT's standard accuracy is much worse than that of IMA (see 4.1 and 4.3). These methods are obviously very different from IMA. The FAT method [39] applies early-stop PGD to generate adversarial training samples with a parameter τ.
When τ = 0, the generated adversarial training sample is x̂1 in Fig. 2 (b); when τ > 0, the generated adversarial training sample will go across the decision boundary, which may hurt the model's standard accuracy. This is against the idea of IMA. The IAAT method [3] uses a sample-wise adversarial training perturbation to train a model, but the sample-wise perturbation is based on heuristics, which may not be optimal and is completely different from IMA. The MMA method [9] is based on sample-wise margins directly estimated by PGD [19]. However, MMA has the margin overestimation problem (mentioned in 2.3), as illustrated in Fig. 5. With the gradual expansion mechanism, the sample margins estimated by IMA can be much closer to the real margins, as illustrated in Fig. 5. The gradual expansion mechanism also helps to reach an equilibrium state (2.5). In short, IMA is different from MMA.

PyTorch 1.9.0 [22] is used for model implementation and evaluation. Nvidia V100 GPUs are used for model training and testing. We observe that adversarial perturbations have inner structures and patterns (Fig. 1 in [12], Fig. 1 in [37], Fig. 2 in [2], etc.), which are completely ignored by the L-inf norm. This means L-inf is not a good measure to describe adversarial noise. As a result, in this paper, all of the adversarial training perturbations and adversarial noises are measured in the L2 norm. The PGD attack [19] with 100 iterations and the IFGSM attack [18] with 10 iterations are used for evaluation. For all experiments, three noise levels are used. The largest noise level always makes STD's accuracy drop to almost 0, which means it is strong enough. In the experiments, the model trained only on clean data is named "STD"; the model trained with SAT with adversarial training perturbation upper bound ε is named "SAT-ε"; the model trained with FAT [39] with τ = 0 is named "FAT(τ = 0)"; the model trained with FAT with dynamic τ (in keeping with the configuration in [39]) is named "FAT". Other competing methods include TRADES [38] (a previous state-of-the-art method), GAIRAT [40] (a sample-wise adaptive method) and MMA [9] (a sample-wise adaptive method). To make the comparison fair, a unified adversarial training perturbation upper bound ε is used for all competing methods in each of the classification experiments. In the CIFAR 10 experiment, the ε is 3 (L2 norm), in keeping with that in the MMA paper [9]. Then, we calculate the average pixel-wise perturbation upper bound ε_p:

ε_p = ε / √(32 × 32 × 3) = 3 / √(32 × 32 × 3).

By applying this ε_p to the SVHN and COVID-19 experiments, we can calculate the adversarial training perturbation upper bounds for these two experiments. For the SVHN experiment, ε = √(32 × 32 × 3) × ε_p = 3. For the COVID-19 experiment, ε = √(224 × 224 × 1) × ε_p ≈ 12. Also, to make a fair comparison, in the training process, the number of PGD iterations is 20 for all methods. All the evaluation results are the average of three runs with different random seeds.

In this experiment, we use the CIFAR 10 dataset [16] to evaluate our method. CIFAR 10 contains 60000 32×32×3 color images in 10 classes. There are 6000 images per class, with 5000 training and 1000 testing images per class. We apply all competing methods to WideResNet-28-4 (WRN-28-4), in keeping with the MMA paper [9]. To make the comparison fair, for all methods, the batch size is 128, the number of training epochs is 300, and the adversarial training perturbation upper bound ε is 3.0 (see 3.1). Data augmentation is used. For IMA, the ∆ε is 0.03. To reduce time cost, IMA training is applied after 200 epochs of standard training. Other configurations of the competing methods are in keeping with those in the corresponding papers.
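For reference, the scaling rule of Section 3.1 can be checked with a few lines of arithmetic (a quick sketch; the image shapes are those stated in this paper):

import math

eps_cifar = 3.0                                   # L2 bound for CIFAR 10, from the MMA paper
eps_pixel = eps_cifar / math.sqrt(32 * 32 * 3)    # average pixel-wise bound, about 0.054
eps_svhn  = math.sqrt(32 * 32 * 3) * eps_pixel    # = 3.0 (same image shape as CIFAR 10)
eps_covid = math.sqrt(224 * 224 * 1) * eps_pixel  # about 12.12, rounded to 12 in the experiments
print(round(eps_svhn, 2), round(eps_covid, 2))    # 3.0 12.12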
In this experiment, we use the SVHN dataset [21] to evaluate our method. The SVHN dataset contains 32 × 32 × 3 color images of the digits 0-9, with 73257 digits for training and 26032 digits for testing. We apply all competing methods to our self-designed DNN model. The network structure is COV (3, 32, 3, 1, 1, ...). To make the comparison fair, for all methods, the batch size is 128, the number of training epochs is 60, and the adversarial training perturbation upper bound ε is 3.0 (see 3.1). Besides adversarial training, we did not use any other data augmentation (e.g., cropping). For IMA, the ∆ε is 0.04. For GAIRAT, the "begin epoch" was 30 (half of the total number of epochs). Other configurations of the competing methods are in keeping with those in the corresponding papers.

In this experiment, we show that IMA can be applied to a practical medical image classification task. We used a public COVID-19 CT image dataset [27]. It was collected from patients in hospitals in Sao Paulo, Brazil. It contains 1252 CT scans (2D images) that are positive for COVID-19 infection and 1230 CT scans (2D images) from uninfected patients, 2482 CT scans in total. From the infected cases, we randomly selected 200 samples for testing, 30 for validation, and 1022 for training. From the uninfected cases, we randomly selected 200 for testing, 30 for validation, and 1000 for training. Each image is resized to 224 × 224 as input. We modified the output layer of the ResNet-18 model [14] for this binary classification task: uninfected (label 0) vs. infected (label 1). We also replaced batch normalization with instance normalization because it is known that batch normalization is not stable for small batch sizes [36]. As shown in previous studies [24], infected regions in the images have a special pattern called ground-glass opacity. To make the comparison fair, for all methods, the batch size is 32, the number of training epochs is 100, and the adversarial training perturbation upper bound ε is 12 (see 3.1). The Adam optimizer was used with default parameters. For STD, a weight decay of 0.01 is applied with Adam (i.e., AdamW with default parameters). For IMA, the ∆ε is 2.0. Other configurations of the competing methods are in keeping with those in the corresponding papers.

In this experiment, we show that IMA can also be applied to practical medical image segmentation tasks. Most of the well-known adaptive adversarial training methods are designed under the assumption that the task is classification, so applying them to regression-based tasks, such as segmentation, is not straightforward. Besides, there are few general and effective adaptive adversarial training methods designed especially for segmentation tasks. As a result, only SAT with different ε values is compared in this experiment. To apply IMA, we reformulate a segmentation task as a pseudo classification task. Since the Dice index is often used to evaluate segmentation performance, a segmentation can be considered "correct" if its Dice score is above a threshold and "wrong" otherwise, which is a binary classification of the segmentation output. In the experiments, this Dice threshold is set to 60% for all of the datasets because a Dice score higher than 60% is considered "good" for medical applications [4, 7, 30, 31]. For non-medical images, there is no consensus about what a good Dice score is, which is the reason we used medical images.
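This pseudo-classification rule can be sketched as follows; the function names and tensor shapes are our own illustration, with the 60% Dice threshold used in the experiments. The resulting boolean outcome plays the role of the correct/incorrect check in IMA's margin update.

import torch

def dice_score(pred_mask: torch.Tensor, true_mask: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Dice = 2*TP / (2*TP + FP + FN) for binary masks of shape (B, H, W)."""
    pred = pred_mask.float().flatten(1)
    true = true_mask.float().flatten(1)
    tp = (pred * true).sum(dim=1)
    return (2 * tp + eps) / (pred.sum(dim=1) + true.sum(dim=1) + eps)

def is_correct(pred_mask: torch.Tensor, true_mask: torch.Tensor, threshold: float = 0.6) -> torch.Tensor:
    """Treat a segmentation as 'correct' if its Dice score exceeds the threshold."""
    return dice_score(pred_mask, true_mask) > threshold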
We apply SAT and IMA to a self-configuring DNN, nnUnet [15]. nnUnet can automatically configure itself, including preprocessing, network architecture, training, and post-processing, for the dataset. The inputs of nnUnet are 2D slices of 3D images. The "Average Dice Index (ADI)" is used as the metric, whose formula is:

ADI = (1/n) Σ_{i=1}^{n} 2·TP_i / (2·TP_i + FP_i + FN_i).

Here, n is the number of samples in the test set. For sample i, TP_i is the number of pixels in the true-positive area, FP_i is the number of pixels in the false-positive area, and FN_i is the number of pixels in the false-negative area. Three public datasets are used in this experiment. The ∆ε and ε of IMA are determined by a grid search on the validation set.

The Heart MRI dataset [25] has 20 labeled 3D images: 16 for training, 1 for validation, and 3 for testing. The median shape of each 3D image is 115 × 320 × 320, of which 115 is the number of slices. In this experiment, only 2D segmentation is considered, so the input of the model is one slice. The batch size (40) and the input image size (320 × 256) are self-configured by nnUnet for this dataset. The model is trained for 50 epochs, where each epoch contains 50 iterations. Other training settings are the same as those in [15]. For SAT, we tried three different noise levels (5, 15, 25). For IMA, the step size ∆ε is 5 and the adversarial training perturbation upper bound ε is 20.

The Hippocampus MRI dataset [25] has 260 labeled 3D images: 208 for training, 17 for validation, and 35 for testing. The median shape of each 3D image is 36 × 50 × 35, where 36 is the number of slices. The batch size (366), the input image size (56 × 40), and the network structure are self-configured by nnUnet for this dataset. The model is trained for 50 epochs, where each epoch has 50 iterations. Other training settings are the same as those in [15]. For SAT, we tried three different noise levels (5, 10, 15). For IMA, the step size ∆ε is 2.0 and the ε is 15.

The Prostate MRI dataset [25] has 32 labeled 3D images: 25 for training, 2 for validation, and 5 for testing. The median shape of each 3D image is 20 × 320 × 319, where 20 is the number of slices. The batch size (32), the input image size (320 × 320), and the network structure are self-configured by nnUnet for this dataset. The model is trained for 50 epochs, where each epoch has 50 iterations. Other training settings are the same as those in [15]. For SAT, we tried three different noise levels (10, 20, 40). For IMA, the step size ∆ε is 10 and the ε is 40.

For the CIFAR 10 experiment, ε = 3 is a relatively large but reasonable training perturbation upper bound, which is taken from MMA [9]. If an adversarial training method's standard accuracy drops significantly under a large ε, it means its algorithm cannot generate proper adversarial training samples. From Table 1 and Table 2, the following conclusions can be drawn: (1) apart from SAT and STD, IMA outperforms all of the other competing methods in both standard accuracy and robustness; (2) FAT (τ = 0) has better standard accuracy than FAT, which supports our discussion in 2.7 about FAT; (3) GAIRAT's standard accuracy is not good on CIFAR 10, and GAIRAT cannot converge on SVHN, which supports our discussion in 2.7 that GAIRAT's weight-based criterion cannot guarantee good adversarial training samples. As shown in Table 3, only one-third of the total epochs of IMA training can make the model robust enough, which also suggests that IMA can be applied to a pretrained model. SVHN is not as challenging as CIFAR 10, so the evaluation results of the different methods are closer to each other than those on CIFAR 10.
On SVHN (Table 2), SAT has even higher standard accuracy than STD, but lower noisy accuracy than STD. Fig. 6 shows that as ε grows, the standard accuracy of SAT first drops and then increases. By contrast, the noisy accuracy first increases and then drops significantly. This means that a training perturbation that is too large may harm not only the standard accuracy but also the noisy accuracy. This contradicts the common belief in the adversarial robustness community and will be studied in future work.

Because of the page limit, the sensitivity study is only provided for CIFAR 10. The evaluation results in this section are obtained on the validation set. Table 4 shows that IMA is not significantly sensitive to the parameter ε. This is because the gradual expansion mechanism constrains the expansion of the estimated margins, which reduces IMA's sensitivity to the upper bound ε. By comparison, MMA and SAT are very sensitive to ε. Fig. 7 shows how the expansion step size ∆ε affects the performance of IMA. As ∆ε grows, the standard accuracy decreases quickly at the beginning and then tends to be stable. The noisy accuracy (at noise level 0.3) increases quickly at the beginning and then tends to be stable. In general, IMA is not very sensitive to ∆ε, except when ∆ε is too small. This is because a ∆ε that is too small cannot make IMA expand the estimated margins enough under a fixed number of training epochs.

Figure 7. The effect of ∆ε on IMA on the CIFAR 10 test set (ε = 3.0). The first line shows the standard accuracy under different ∆ε; the second line shows the noisy accuracy under noise level 0.5; the third line is the geometric mean of the first two lines, which shows the trade-off between standard accuracy and robustness.

COVID-19 is a binary classification task, which is very simple, so the evaluation results of the different methods are closer to each other than those on CIFAR 10 and SVHN. All the conclusions in 4.1 still hold in Table 5. The COVID-19 dataset is much sparser than CIFAR 10 and SVHN (larger individual image size but smaller training set size). This means that minor decision boundaries (see the small holes in Fig. 5 (c)) are more likely to exist, and they are very likely to be ignored by FAT. So, the standard accuracy of FAT (τ = 0) is poor. SAT has even higher standard accuracy than STD, but lower noisy accuracy than STD. Fig. 8 shows that as ε grows, in the beginning the standard accuracy of SAT drops while the noisy accuracy increases, which is reasonable. However, as ε keeps increasing, the performance goes back and forth between standard accuracy and robustness. As in 4.1.1, a training perturbation that is too large may harm not only the standard accuracy but also the noisy accuracy. This contradicts the common belief in the adversarial robustness community and will be studied in future work.

Figure 8. The effect of ε on SAT on the COVID-19 test set.

From the results in Tables 6-8, the following conclusions can be made: (1) IMA significantly outperforms SAT with different ε values, which means the extension of IMA to image segmentation is successful; (2) IMA's standard accuracy (the Dice score) outperforms that of STD on all three datasets, which means adversarial training may increase the standard accuracy in some cases. We think this is because the training sets are so sparse that they can hardly form a general enough distribution; in other words, serious overfitting occurs.
The adversarial training examples generated by IMA are of good quality, so their effect of reducing overfitting outweighs their effect of degrading the standard accuracy. What is more, because the training set is sparse and the margins are large, it is more likely that proper adversarial samples can be generated.

From the experiments, we can see that: (1) IMA has the best performance on noisy data; (2) at the same time, IMA has a minimal reduction in standard accuracy. This supports our hypothesis in Section 2. The limitations of this research include the following. (1) IMA has good performance on both standard and noisy accuracy, and IMA's time complexity is the same as that of MMA, which means the scalability is the same; however, IMA has more operations, which leads to a higher time cost for each training epoch. This can be alleviated because IMA can be used on a pretrained model to save time, but there should still be room to improve the method to make it more time-efficient. (2) IMA largely reduces the overestimation problem (see 2.3, 2.7) that exists in MMA, but the overestimation still exists. Methods can be designed for better margin estimation in future work.

In this paper, we give the hypothesis that, in the space of the training set, the optimal adversarial training samples should be just about to cross the decision boundaries. Based on this hypothesis, we design an adaptive adversarial training method, named IMA. In each epoch, IMA combines binary search with the PGD algorithm to obtain an accurate sample-wise estimation of the upper bound of the training adversarial perturbation for the next epoch. In the training process, such a sample-wise estimation is gradually expanded to push away minor decision boundaries and approximate major decision boundaries. Finally, an equilibrium state will be reached, which is a sign of the distribution of optimal adversarial training samples. IMA can also significantly reduce the overestimation of the adversarial perturbation upper bound. IMA is evaluated on publicly available datasets under two popular adversarial attacks, PGD and IFGSM. The results show that: (1) IMA significantly improves the adversarial robustness of DNN classifiers and achieves state-of-the-art performance; (2) IMA has a minimal reduction in standard accuracy among all competing defense methods; (3) IMA can be applied to pretrained standard-trained models to reduce time cost; (4) IMA can also be applied to state-of-the-art medical image segmentation frameworks, with outstanding performance. We hope our work may draw more attention to standard accuracy in the adversarial robustness community and facilitate the development of robust applications in the medical field.

References
[1] Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases. Radiology.
[2] Threat of adversarial attacks on deep learning in computer vision: A survey.
[3] Instance adaptive adversarial training: Improved accuracy tradeoffs in neural nets.
[4] Measurement and reliability: Statistical thinking considerations.
[5] Curriculum adversarial training.
[6] CAT: Customized adversarial training for improved robustness.
[7] Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology.
[8] Support-vector networks.
[9] MMA training: Direct input space margin maximization through adversarial training.
[10] Robust physical-world attacks on deep learning visual classification.
[11] Robustness of classifiers: From adversarial to random noise.
[12] Explaining and harnessing adversarial examples.
[13] Assessing threat of adversarial examples on deep neural networks.
[14] Deep residual learning for image recognition.
[15] nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation.
[16] Learning multiple layers of features from tiny images.
[17] Adversarial examples in the physical world.
[18] Adversarial examples in the physical world.
[19] Towards deep learning models resistant to adversarial attacks.
[20] Soft biometric privacy: Retaining biometric utility of face images while perturbing gender.
[21] Reading digits in natural images with unsupervised feature learning.
[22] Automatic differentiation in PyTorch.
[23] Adversarial training can hurt generalization.
[24] Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for COVID-19.
[25] A large annotated medical image dataset for the development and evaluation of segmentation algorithms.
[26] Improving adversarial robustness through progressive hardening.
[27] SARS-CoV-2 CT-scan dataset: A large dataset of real patients CT scans for SARS-CoV-2 identification. medRxiv.
[28] Intriguing properties of neural networks.
[29] Robustness may be at odds with accuracy.
[30] Inter-rater agreement in glioma segmentations on longitudinal MRI.
[31] Accurate MR image registration to anatomical reference space for diffuse glioma.
[32] An experimental study on the noise properties of X-ray CT sinogram data in Radon space.
[33] On the convergence and robustness of adversarial training.
[34] Improving adversarial robustness requires revisiting misclassified examples.
[35] WHO. Coronavirus disease (COVID-19) dashboard.
[36] Group normalization.
[37] Miss the point: Targeted adversarial attack on multiple landmark detection.
[38] Theoretically principled trade-off between robustness and accuracy.
[39] Attacks which do not kill training make adversarial learning stronger.
[40] Geometry-aware instance-reweighted adversarial training.