BOSS: Bidirectional One-Shot Synthesis of Adversarial Examples

Ismail Alkhouri, Member, IEEE, Alvaro Velasquez, Senior Member, IEEE, and George Atia, Fellow, IEEE

I. Alkhouri and G. Atia are with the Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL 32825 USA (e-mails: ialkhouri@knights.ucf.edu, george.atia@ucf.edu). A. Velasquez is with the Information Directorate, Air Force Research Laboratory, Rome, NY 13441 USA (e-mail: alvaro.velasquez.1@us.af.mil).

Abstract-The design of additive imperceptible perturbations to the inputs of deep classifiers to maximize their misclassification rates is a central focus of adversarial machine learning. An alternative approach is to synthesize adversarial examples from scratch using GAN-like structures, albeit with the use of large amounts of training data. By contrast, this paper considers one-shot synthesis of adversarial examples; the inputs are synthesized from scratch to induce arbitrary soft predictions at the output of pre-trained models, while simultaneously maintaining high similarity to specified inputs. To this end, we present a problem that encodes objectives on the distance between the desired and output distributions of the trained model and the similarity between such inputs and the synthesized examples. We prove that the formulated problem is NP-complete. Then, we advance a generative approach to the solution in which the adversarial examples are obtained as the output of a generative network whose parameters are iteratively updated by optimizing surrogate loss functions for the dual-objective. We demonstrate the generality and versatility of the framework and approach proposed through applications to the design of targeted adversarial attacks, generation of decision boundary samples, and synthesis of low confidence classification inputs. The approach is further extended to an ensemble of models with different soft output specifications.
The experimental results verify that the targeted and confidence reduction attack methods developed perform on par with state-of-the-art algorithms.

Given a single instance of an object, the human brain is capable of learning the concept of the underlying phenomenon and recognizing similar objects in future experience. This remarkable capacity for cognition is a holy grail of machine learning models for classification tasks and beyond. Naturally, this raises questions for the development of classification models with respect to their ability to (i) learn concepts from a single observed instance and (ii) robustly classify similar objects with similar confidence for a given class. The former is being addressed under the area of one-shot learning [1], where prior knowledge, often in the form of a pre-trained model, is leveraged in order to learn a new concept from a single datum. The latter problem of robustness is being assessed in adversarial machine learning via additive perturbations to data and the synthesis of adversarial examples, which are often used to test the robustness of a given model. In this paper, we reconcile these notions of one-shot learning and the synthesis of adversarial examples for the first time in what we call one-shot synthesis. In particular, given a pre-trained model p(·; θ) parameterized by θ and a datum x_d, we propose a synthesis procedure that generates a new datum x similar to x_d such that it approximately induces a user-defined output distribution p_d as the inference result p(x; θ) of the pre-trained model. We do not restrict the given distribution p_d to target a specific class. By controlling the induced output distribution, our approach generalizes traditional notions of targeted, non-targeted [2], and confidence reduction attacks (where the goal is to lower the confidence level of the true label to cause ambiguity [3]), which are special cases of the proposed framework.

The proposed Bidirectional One-Shot Synthesis (BOSS) solution differs from existing synthesis and adversarial machine learning methods in a number of key ways. First is the capacity of BOSS to handle user-defined constraints on both the input and output directions of the given pre-trained classifier p(·; θ), where the output constraint is not limited to targeting a specific class. These constraints come in the form of distance bounds between the given datum x_d (distribution p_d) and the synthesized input datum x (output inference p(x; θ)). The capacity to handle constraints in both the input and output directions of the model has the additional benefit of producing specific outcomes. For example, a malicious attacker may want to cause a classifier trained to detect stop signs to instead classify these as yield or speed limit signs with equal confidence, which can lead to dire consequences in autonomous driving scenarios.
Such inherent vulnerabilities in machine learning models are difficult to test without accounting for both the input (e.g., the synthesized datum x must look like a stop sign) and output (e.g., p_'yield'(x; θ) ≈ p_'speed 70'(x; θ) ≈ 0.5) directions of the model simultaneously. See Figure 1 for examples. Second, our one-shot synthesis approach requires only a single datum, which mitigates the excessive data requirements of popular methods based on Generative Adversarial Networks (GANs) [4]. Finally, it is worth noting that the use of additive perturbations to data has been widely explored in the literature to induce misclassifications and test the robustness of learning models [3, 5, 6, 7, 8]. Though such approaches also often require only a single datum, they function by perturbing said datum to enact the attack. Our approach differs in that we synthesize a datum from scratch, thereby enabling the exploration and synthesis of data that exists outside the confines of typical perturbation bounds.

Fig. 1: From left to right, samples are picked from the MNIST digits [9], MNIST Fashion [10], CIFAR-10 [11], GTSRB [12], and COVID-19 Chest X-ray [13] databases, respectively. Details about the trained classifiers are given in the supplementary material.

In this paper, we present the BOSS problem to synthesize feature vectors that follow some desired input and output specifications. We prove that BOSS is NP-complete. An algorithmic solution is proposed based on generative networks and the back-propagation algorithm [14] to produce (from scratch) these examples in a white-box setting. We present methods to select the specifications for the applications of targeted adversarial attacks, confidence reduction attacks, and decision boundary samples. We show the capabilities of our algorithm in extensive experimental results, where we also compare with state-of-the-art attack methods.

The rest of the paper is organized as follows. Section II presents related work in the areas of generative models and adversarial machine learning. Section III follows with the mathematical formulation of the problem we are trying to solve. The proposed algorithmic solution is presented in Section IV. In Section V, we present instances of the proposed approach in the context of various applications. The experimental results are presented in Section VI, followed by the conclusion in Section VII.

Our approach is related to generative modeling using GANs in that we utilize a similar structure and layer configuration to the generator side, but there are key differences. Underlying GANs is the training of two sub-models: a generator model trained to generate fake examples resembling ones from an original dataset, and a discriminator model trained to classify examples as either real (i.e., from the domain) or fake (generated) [4]. The process of training a GAN requires a large amount of training data. Here, the underlying task, the training process, and its requirements are altogether different: given a trained model, we seek to generate an example that induces a predefined Probability Mass Function (PMF) at its output and to simultaneously enforce similarity to a desired example, without access to any training dataset. This is accomplished in our implementation without the use of a discriminator. Additionally, the generator and the discriminator of a GAN are trained together in a zero-sum game until the discriminator is fooled about 50% of the time [4].
In our case, we update the parameters of the generator using loss functions defined with respect to (w.r.t.) given specifications for every feature vector. Variants of GANs have also been developed in prior work to perform individual and universal attacks. Examples include [15], in which a GAN architecture is trained along with an attacker model in a three-way game for enhanced adversarial training and faster convergence; the Conditional GANs (CGANs) in [16], where the generator, discriminator, and classifier are trained to generate additive perturbations; and the Auxiliary Classifier GAN (AC-GAN) [17], in which an auxiliary classifier is used instead of the discriminator. More closely related to our paper are the works in [18] and [19], which use generative networks without discriminators to synthesize additive and non-additive adversarial samples, respectively. However, there are two major distinctions. First, unlike these works, we do not require a labeled dataset to train our system to synthesize adversarial samples at inference time. Second, the methods in [18] and [19] are only implemented for targeted and non-targeted attacks. Our approach, on the other hand, is unifying in that it offers full control over the desired output PMF of the trained classifier (while maintaining similarity to specific input features), making it applicable to a wide range of scenarios, including targeted/non-targeted attacks, confidence reduction attacks, and synthesis of decision boundary examples.

Similar to BOSS, optimization-based individual attack techniques, such as the CW attack [5], the L-BFGS attack [6], DeepFool [7], the saliency map attack [3], and NewtonFool [8], do not require extensive training in order to cause damage at inference time. These methods, however, largely optimize a cost function expressed in terms of an additive perturbation norm and/or the model's loss subject to misclassifications of the adversarial examples. In sharp contrast, we use a generative approach in which adversarial samples are synthesized from scratch (rather than generating imperceptible perturbations).

Our approach can also be used to synthesize near-decision-boundary samples that can be utilized in adversarial training for robust decision-making. The authors in [20] proposed a technique for adversarial training using such samples built on GANs. The idea is to fine-tune the model parameters by augmenting the training dataset with generated near-boundary samples. During training, two loss functions are minimized: the CGAN loss and the Kullback-Leibler (KL) divergence between the output of the trained classifier and the uniform distribution. Our method is used to generate near-boundary examples, but does not require extensive training with a full dataset in order to produce such examples. We remark that decision boundary examples produced by our algorithm could also be utilized in other techniques such as Knowledge Distillation (KD), a method used to enhance the training of a 'student' network based on a trained 'teacher' network [21].

Suppose we have some trained model p with parameters θ (e.g., a trained neural network). An input x ∈ R^N induces a probability distribution p(x; θ) ∈ Δ_M over the M classes, where Δ_M denotes the probability simplex over the label set [M].

Definition 1 (BOSS Problem). Given a trained model p(·; θ), a tensor x_d ∈ R^N, and a probability distribution p_d ∈ Δ_M, find x ∈ R^N such that d(x, x_d) ≤ δ_s and D(p(x; θ), p_d) ≤ δ_c, where upper bounds δ_s and δ_c and loss functions d and D are given.

Therefore, according to Definition 1, in BOSS we seek to synthesize an input x that is close (within bound δ_s) to a desired feature vector x_d w.r.t. the distance function d, and simultaneously ensure that the output distribution p(x; θ) it induces is sufficiently close (within bound δ_c) to a desired distribution p_d w.r.t. the loss function D.
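To make the two constraints concrete, the following minimal sketch (our illustration, not code from the paper) checks the feasibility of a candidate x under Definition 1. It uses the MSE for d, as in the NP-hardness proof below, and the Jensen-Shannon divergence for D, as in the experiments; p_model is an assumed callable returning the classifier's softmax output.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    # Jensen-Shannon divergence with base-2 logs, so values lie in [0, 1].
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log2(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def is_feasible(x, x_d, p_model, p_d, delta_s, delta_c):
    # Definition 1: both the input-side and output-side constraints
    # must hold. Here d is the MSE used in the proof of Theorem 1;
    # the image experiments later use d = 1 - SSIM instead.
    d = float(np.mean((np.asarray(x) - np.asarray(x_d)) ** 2))
    D = js_divergence(p_model(x), p_d)
    return d <= delta_s and D <= delta_c
```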
First, we prove that the BOSS problem is NP-complete. This establishes that, in general, there is no polynomial-time solution to the BOSS problem unless P = NP. In Section IV, we develop a generative approach to obtain an approximate solution to the BOSS problem.

Definition 2 (CLIQUE Problem). Given an undirected graph G = (U, E) and an integer k, find a fully connected sub-graph induced by U′ ⊆ U such that |U′| = k.

Theorem 1. The BOSS problem is NP-complete.

Proof. It is easy to verify that the BOSS problem is in NP since, given a tensor x, one can check whether the input and output constraints d(x, x_d) ≤ δ_s and D(p(x; θ), p_d) ≤ δ_c are satisfied in polynomial time. It remains to be shown that the BOSS problem is NP-hard. We will establish this result via a reduction from the CLIQUE problem in Definition 2. Given a CLIQUE instance ⟨G = (U, E), k⟩ with |U| = n and |E| = m, we construct its corresponding BOSS instance ⟨p(·; θ), x_d, p_d, δ_s, δ_c⟩ as follows. Let x_d = 0 denote the all-zeroes vector and let p_d be defined as the desired output distribution

p_d = (1 / (e^{kε} + e^{(k(k−1)/2)ε})) [e^{kε}, e^{(k(k−1)/2)ε}]^T,

where ε = 1 − √(1 − 1/(k + 1)). Finally, let δ_s = k/n and δ_c = 0. We choose the mean square error (MSE) loss function to compute d(x, x_d). The choice of loss function for computing D(p(x; θ), p_d) is superfluous since we have chosen δ_c = 0.

For the given trained model p(·; θ), we define its connectivity and parameters θ as follows. The input layer consists of n entries given by the solution x ∈ [0, 1]^n. There is one hidden layer consisting of n + m ReLU functions σ_1, ..., σ_{n+m} such that the first n ReLU functions have a bias term of ε − 1 and the next m ReLU functions have a bias term of ε − 2. Finally, there is an output layer with two softmax output activation functions p_1(x; θ) and p_2(x; θ). Let θ^h_{ij} denote the weight of the connection between the i-th input x_i and the j-th ReLU activation σ_j in the hidden layer. For each u_i ∈ U in the given CLIQUE instance, we have θ^h_{ii} = 1. The outputs of these n ReLU activation functions are fully connected to the softmax output activation function p_1(x; θ), each with a corresponding weight of 1. For each edge e_r = (u_i, u_j) ∈ E, we set θ^h_{i,n+r} = θ^h_{j,n+r} = 1. This defines the input connectivity of ReLU functions σ_{n+1} to σ_{n+m}. The outputs of these are then fully connected to the second softmax function p_2(x; θ), each with weight 1. See Figure 2 for an example.
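As an illustration of the reduction, the sketch below (a hypothetical helper, not part of the paper) assembles the two-layer network for a given CLIQUE instance, with vertices indexed 0..n−1 and edges given as index pairs.

```python
import numpy as np

def clique_to_boss_network(n, edges, k):
    # Build the reduced BOSS network of Theorem 1 for a CLIQUE instance.
    eps = 1.0 - np.sqrt(1.0 - 1.0 / (k + 1))
    m = len(edges)
    W = np.zeros((n, n + m))              # input -> hidden weights
    b = np.empty(n + m)                   # hidden ReLU biases
    W[np.arange(n), np.arange(n)] = 1.0   # theta^h_ii = 1 (vertex units)
    b[:n] = eps - 1.0
    for r, (i, j) in enumerate(edges):    # edge unit r sees x_i + x_j
        W[i, n + r] = W[j, n + r] = 1.0
        b[n + r] = eps - 2.0
    V = np.zeros((n + m, 2))              # hidden -> output, unit weights
    V[:n, 0] = 1.0                        # vertex units feed p_1
    V[n:, 1] = 1.0                        # edge units feed p_2
    return W, b, V, eps

def forward(x, W, b, V):
    h = np.maximum(x @ W + b, 0.0)        # hidden ReLU layer
    z = h @ V
    e = np.exp(z - z.max())
    return e / e.sum()                    # p(x; theta) over the two classes
```

Feeding in the indicator vector of a k-clique reproduces the pre-softmax values (kε, (k(k−1)/2)ε) used in the proof.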
We now prove that there is a clique of size k in G if and only if there is a feasible solution x to the reduced BOSS instance.

(⟹) Assume there is a clique of size k in G. We can derive a feasible solution x to the reduced BOSS instance as follows. For every vertex u_i in the clique, let x_i = 1 and let all other values of x be 0. The corresponding MSE loss is d(x, x_d) = k/n, thereby satisfying the input constraint defined by δ_s. The solution x induces an output of σ_i(x_i + (ε − 1)) = ε for each entry of x corresponding to a vertex u_i in the clique and an output of 0 for all other entries. Thus, we have k inputs of value ε into the first softmax output function. Now, let us consider the edges induced by this clique. For each edge e_r = (u_i, u_j) in the clique, we have σ_{n+r}(x_i + x_j + (ε − 2)) = ε and an output of 0 for all other edges. Since there are k(k−1)/2 edges in a clique of size k, this yields k(k−1)/2 inputs of value ε into the second softmax output function. Thus, we have p(x; θ) = p_d and the constraint D(p(x; θ), p_d) ≤ 0 = δ_c is satisfied.

(⟸) We prove the contrapositive. That is, if there is no clique of size k in G, then the reduced BOSS instance is infeasible. We proceed by showing that there must be exactly k non-zero entries in x in order to satisfy the constraints d(x, x_d) ≤ k/n and D(p(x; θ), p_d) ≤ 0, and that, if there is no clique of size k, then there is no choice of k non-zero entries in x that will satisfy D(p(x; θ), p_d) ≤ 0. Note that there must be at least k entries in x with value strictly greater than (1 − ε) in order to yield an input of kε into the first softmax output function and satisfy the first entry in p_d. Let us consider the minimum MSE loss for a solution x with more than k non-zero entries. For k + 1 entries of value strictly greater than (1 − ε), we have d(x, x_d) > (k + 1)(1 − ε)^2/n. With some algebraic manipulation, we have that, for any value of ε ≤ 1 − √(1 − 1/(k + 1)), (k + 1)(1 − ε)^2/n ≥ k/n, thereby violating the constraint d(x, x_d) ≤ δ_s. Thus, there must be exactly k non-zero entries in x. Now, let us consider the second softmax output function, which requires an input of (k(k−1)/2)ε. Since there is no clique of size k in G, any choice of k vertices in G will induce a set of edges whose cardinality is strictly less than k(k−1)/2. Therefore, the output of the second softmax function will be strictly less than the second entry in p_d. This violates the constraint D(p(x; θ), p_d) ≤ 0. ∎

Note that, for a given CLIQUE instance, the reduction in the proof of Theorem 1 is such that, if there existed a polynomial-time solution to the BOSS problem, we could use it to solve the CLIQUE problem in polynomial time. This would imply that P = NP. We therefore conjecture that a polynomial-time solution to the BOSS problem is not likely to exist.

To obtain a solution to the BOSS problem in Definition 1, we take a generative approach in which x is obtained as the output of a generative network g(·; φ): R^Q → R^N with parameters φ, i.e., g(z; φ) = x, where z ∈ R^Q is a random input to the generative network. We utilize the adjustable parameters of network g for the objectives of BOSS. Therefore, we define the combined network h(·; ψ): R^Q → Δ_M, whose layers are the concatenation of the layers of g and p, where ψ = {φ, θ}. In other words, h(z; ψ) = p(x; θ) = p(g(z; φ); θ). We augment a repeated version of the vector z to create a small training dataset and utilize the back-propagation algorithm [14]. Given the two objectives of BOSS and the adjustable parameters φ of network h, we introduce the surrogate losses L_h(p(g(z; φ); θ), p_d) and L_g(g(z; φ), x_d), and use the back-propagation algorithm to optimize φ based on the minimization

min_φ L_g(g(z; φ), x_d) + λ L_h(p(g(z; φ); θ), p_d), (2)

where λ is a loss weight. It is important to note that (2) is used to update the parameters φ while the trained classifier parameters θ remain unchanged. Due to the use of network h, the surrogate loss functions L_g and L_h are selected as the MSE and the categorical cross-entropy loss, respectively. The only assumption needed is that the outputs p_m(x; θ), ∀m ∈ [M], of the classifier of interest are differentiable w.r.t. the adjustable parameters of network h. As such, our proposed solution is implementable with any parameterized classifier (not necessarily a NN) with differentiable outputs (e.g., support vector machines [22]).
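A minimal PyTorch-style sketch of one gradient step on (2) is given below (our illustration; names are hypothetical). It assumes classifier returns logits, its parameters θ have been frozen beforehand via requires_grad_(False), and opt is an optimizer built over generator.parameters() only (e.g., Adam, as in the experiments).

```python
import torch
import torch.nn.functional as F

def boss_step(generator, classifier, z, x_d, p_d, lam, opt):
    # One gradient step on (2): L_g + lambda * L_h, updating phi only.
    opt.zero_grad()
    x = generator(z)                                 # x = g(z; phi)
    log_p = F.log_softmax(classifier(x), dim=-1)     # log p(g(z; phi); theta)
    L_h = -(p_d * log_p).sum()                       # categorical cross-entropy to p_d
    L_g = F.mse_loss(x, x_d)                         # similarity of x to x_d
    loss = L_g + lam * L_h
    loss.backward()                                  # gradients reach phi; theta is frozen
    opt.step()
    return x.detach(), loss.item()
```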
In the following, we present an algorithmic approach to solve BOSS by iteratively optimizing (2). At every iteration, the adjustable parameters φ of the generator model g are updated to satisfy the two objectives of small PMF distance from p_d and high similarity of the generated example to x_d. We define an exit criterion if a maximum number of iterations/steps T is reached, or if a feasible solution per Definition 1 is found given x_d, p_d, δ_s, and δ_c.

The parameter λ in (2) weighs the relative importance of each loss function, to both avoid over-fitting and handle situations in which the solver converges for one loss function prior to the other [23]. We propose an iterative approach to select the value of λ at every iteration. This dynamic update depends on the distance D between the desired specification p_d and the actual output p(g(z; φ); θ) at iteration τ. Specifically, letting D_τ := D(p(g(z; φ); θ), p_d), we update λ as

λ(τ+1) = σ( λ(τ) + λ_0 · sign(D_τ − δ_c) · (D_τ / δ_c) ). (3)

As such, it is required that the distance function D return values in the range [0, 1]. Here, we utilize the Jensen-Shannon (JS) divergence [24], which returns 0 for two equivalent PMFs and is upper bounded by 1. The updates are also a function of the initial selection of λ, denoted λ_0. Since in this paper we focus on the task of image classification, we set d = 1 − I, where I is the Structural Similarity Index (SSIM) [25], which is equal to 1 for two identical images. The signum function sign(·) determines whether to increase or decrease λ, based on the gap between the actual and desired specifications, while their ratio regulates the amount of change. The ReLU function σ(·) prevents λ from becoming negative; this occurs when the desired p_d is easily attained in early steps of the algorithm. The procedure is presented in Algorithm 1, and Figure 3 (left) shows a block diagram of the proposed method.

Algorithm 1 BOSS
Input: z, p(·; θ), g, x_d, p_d, δ_c, δ_s
Output: x
1: Initialize x, φ, λ
2: while D(p(x; θ), p_d) ≥ δ_c or d(x, x_d) ≥ δ_s do
3:   get φ as the minimizer of (2) with λ
4:   x = g(z; φ)
5:   update λ using (3)
6: return x

In this section, we extend our solution presented in Algorithm 1 to an ensemble of V models. Given trained models p_v(·; θ_v) parameterized by θ_v, ∀v ∈ [V], a tensor x_d ∈ R^N, and probability distributions p_d,v ∈ Δ_M, ∀v ∈ [V], we now seek x ∈ R^N such that d(x, x_d) ≤ δ_s and D_v(p_v(x; θ_v), p_d,v) ≤ δ_c,v, ∀v ∈ [V], where upper bounds δ_s and δ_c,v and loss functions d and D_v are given. In this case, Algorithm 2 presents the procedure, where we update φ to solve the minimization

min_φ L_g(g(z; φ), x_d) + Σ_{v ∈ [V]} λ_v L_h(p_v(g(z; φ); θ_v), p_d,v), (4)

where λ_v is the weight of the loss function L_h for the v-th model. The parameter λ_v of the v-th classifier is dynamically updated in (5), the per-model counterpart of (3), based on its initial value λ_0^v, the threshold δ_c,v, and the distance D_v(p_v(g(z; φ); θ_v), p_d,v).

Algorithm 2 BOSS for an Ensemble of Models
Input: z, p_v(·; θ_v) ∀v ∈ [V], g, x_d, p_d,v, δ_c,v, δ_s
Output: x
1: Initialize x, φ, λ_v
2: while D_v(p_v(x; θ_v), p_d,v) ≥ δ_c,v for some v ∈ [V], or d(x, x_d) ≥ δ_s do
3:   get φ as the minimizer of (4) with λ_v
4:   x = g(z; φ)
5:   update λ_v using (5)
6: return x
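For the ensemble case, a short sketch of our reading of the surrogate objective (4) follows (ours, not code from the paper): one shared similarity term plus a per-model cross-entropy term, each weighted by its own λ_v. As before, the classifiers are assumed frozen and to return logits.

```python
import torch
import torch.nn.functional as F

def ensemble_boss_loss(generator, classifiers, z, x_d, p_d_list, lambdas):
    # Surrogate for (4): L_g shared across models, plus a weighted
    # cross-entropy term L_h per model v with weight lambda_v.
    x = generator(z)
    loss = F.mse_loss(x, x_d)
    for clf, p_d_v, lam_v in zip(classifiers, p_d_list, lambdas):
        log_p_v = F.log_softmax(clf(x), dim=-1)
        loss = loss + lam_v * (-(p_d_v * log_p_v).sum())
    return loss
```

Calling loss.backward() on the returned value and stepping an optimizer over generator.parameters() gives one iteration of Algorithm 2's inner minimization.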
In this section, we present applications of our proposed framework along with the designation of the desired specifications. We remark that the choice of x_d is not exclusive to data from the same distribution as that used to train the target model p. For example, BOSS can be used to generate an image that looks like a 'shirt' but gets classified as a '5' (with high confidence) by a classifier trained on MNIST digits.

Inspired by the defense mechanism presented in [20], we can use our method to generate boundary samples, i.e., examples that fall near the decision boundary of a pre-trained classifier. Such examples can be used for adversarial training to enhance robustness or for knowledge distillation (e.g., see [20, 21]). Here, we focus on synthesizing such examples regardless of the application of interest. Unlike [20], we achieve this without the need to train a CGAN with a full dataset, and with the additional flexibility of choosing a desired soft output p_d (e.g., uniform over only a subset of the classes). The first four synthesized samples in the bottom row of Figure 1 are all examples of near-decision-boundary samples. To specify p_d, we define a subset B ⊆ [M] of labels associated with the desired boundaries near which the examples are to be generated, so that

p_d(l) = 1/|B| if l ∈ B, and p_d(l) = 0 otherwise. (6)

We term this application BOSS Boundary examples (BOSS-B), which is a function of the set B. As a special case of (6), setting B = [M] yields a uniform distribution over all classes. We call this case BOSS near Uniform (BOSS-U) boundary example generation.

Our framework can be used to implement targeted and non-targeted adversarial attacks. Since non-targeted attacks can be considered a special case of targeted ones [5], we present selections for the latter. Given a target label t ∈ [M], the goal is to generate x such that argmax_{l ∈ [M]} p_l(x; θ) = t. The selection of p_d is simply the indicator function p_d(l) = 1{l = t}, which takes the value 1 if l = t and 0 otherwise. This representation is also known as the one-hot encoding of target label t [23]. We call this attack BOSS Targeted attack (BOSS-T). The last example of Figure 1 presents original and synthesized images as an instance of BOSS-T.

Confidence reduction attacks can cause classification mistrust or ambiguity, which could have dire consequences in safety-critical systems [3]. To synthesize confidence reduction examples using our framework, let c_d ∈ [0, 1) denote the desired confidence for the predicted label f*(x_d) := argmax_{l ∈ [M]} p_l(x_d; θ). Then, we define p_d such that p_d(f*(x_d)) = c_d, with the remaining probability mass 1 − c_d distributed over the other labels. We term this setting the BOSS Confidence reduction attack (BOSS-C).

Our approach can also be used to generate a sample x that is classified as one target class t by V models. This is accomplished by enforcing p_d,1 = ··· = p_d,V = p_d and expressing p_d as in BOSS-T. In another setting, the p_d,v, ∀v ∈ [V], can be set differently by selecting different target classes t_v, ∀v ∈ [V].
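The three specifications amount to simple constructions of p_d. A small sketch (ours; helper names are hypothetical):

```python
import numpy as np

def p_d_boundary(M, B):
    # BOSS-B / BOSS-U, eq. (6): uniform over the boundary classes in B.
    p = np.zeros(M)
    p[list(B)] = 1.0 / len(B)
    return p

def p_d_targeted(M, t):
    # BOSS-T: one-hot encoding of the target label t.
    p = np.zeros(M)
    p[t] = 1.0
    return p

def p_d_confidence(M, predicted_label, c_d):
    # BOSS-C: mass c_d on the predicted label f*(x_d); here the
    # remaining 1 - c_d is spread uniformly over the other labels
    # (one natural choice; the text only fixes the c_d entry).
    p = np.full(M, (1.0 - c_d) / (M - 1))
    p[predicted_label] = c_d
    return p
```

For instance, p_d_boundary(10, range(10)) recovers the BOSS-U uniform target over ten classes.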
We use D as the JS distance to compare PMFs (desired and actual) and the SSIM index I as a measure of similarity between examples. We define σ_JS and σ_s as the average of D and I, respectively, over the set of observations X. In addition to the aforementioned metrics, for BOSS-T we utilize the attack success rate α := N_s/|X|, where N_s is the number of times a generated adversarial sample is classified as the predefined target label. For BOSS-C, we compute σ_con, defined as the average confidence of the true-label prediction over the set of interest X.

We use the MNIST digits dataset [26] in the evaluation of the different applications described in Section V, and present additional examples from other datasets, namely MNIST Fashion [10], CIFAR-10 [11], GTSRB [12], and COVID-19 Radiography [13], in Figure 1. More examples using these datasets are also included in the supplementary document, showing similar performance trends. We compare BOSS-T and BOSS-C to the CW and NewtonFool attacks, respectively. We use the l_2 variant of CW with a maximum of 3000 iterations, confidence 0.9, learning rate 0.01, and 10 binary search steps. For NewtonFool, we use 50 iterations and set the small-perturbations parameter to η = 0.01. The details of the architectures of the pre-trained classifiers and the generative networks used are provided in the supplementary document. The initial loss weights are chosen as λ_0 = λ_0^v = 0.001. The random vector z of dimension Q = 100 is generated from a uniform distribution over the interval [0, 1]. The parameters are updated using the ADAM optimizer [27] with step size 0.025.

First, we show the performance of the BOSS algorithm as a function of the similarity (represented by d(x, x_d)) and the PMF distance (represented by D(p(x; θ), p_d)) w.r.t. iteration τ ∈ [T]. Results for BOSS-C are shown in Figure 4. We use c_d = 0.6 for the desired confidence level. The top curve shows the dynamic assignment of the loss weight λ as a function of the measures used (d and D) and the accepted thresholds for the two objectives. At each iteration τ, the update takes place according to (3). Hence, the results presented in the figure are obtained without the use of the exit criterion of Algorithm 1. As observed, as τ increases, d and D decrease. As expected, after step 5, where d and D satisfy the requirements (as represented by δ_s and δ_c), λ decreases.

The first and second plots of Figure 5 show the similarity I(x, x_d) and its Cumulative Distribution Function (CDF), and the third and fourth plots show D(p(x; θ), p_d) and its Complementary CDF (CCDF). For the similarity I, we observe that BOSS-C achieves the best performance, while the other methods return nearly similar results. We remark that BOSS-T yields the largest PMF distance D, since the requirement of inducing misclassifications to the target label (i.e., ensuring the mode of the distribution occurs at the target class) can still be attained by relaxing the distance upper bound δ_c to a relatively large value. BOSS-U yields the smallest JS distance, since the outputs p_m(x; θ), ∀m ∈ [M], of all nodes need to be equal (uniform distribution). Similar observations hold for BOSS-C and BOSS-B. This experiment underscores the flexibility of our framework in achieving the goal of each application as specified by x_d, p_d, δ_c, and δ_s.

For BOSS-C, Table II presents the results for σ_con, σ_s, and σ_JS. For an average confidence of σ_con ≈ 67%, BOSS-C (with c_d = 0.6) and NewtonFool are successful in reducing the average confidence of the model from the original value σ_con = 99.1%. This is accomplished with very high similarity levels of σ_s ≈ 88% and σ_s ≈ 97% for BOSS-C and NewtonFool, respectively. While the NewtonFool attack produces examples with higher σ_s, it fails to maintain the classification accuracy (CA), which drops from 98.1% to 75.5%, and yields a large distance σ_JS from the desired PMF. Results on the confidence scores of BOSS-C and NewtonFool are provided in the supplementary material. Figure 6 illustrates the confidence scores without attacks and after BOSS-C and NewtonFool are applied for 100 feature vectors. The curves reflect the strength of BOSS-C and NewtonFool in reducing the confidence level to the desired score c_d = 0.6.

Fig. 7: Comparison between the output PMF and the desired p_d in (6), in terms of the l_2 and l_∞ norms, for 100 boundary examples generated using our proposed method (top), and their corresponding values w_1(x) and w_2(x) (bottom).

The performance of BOSS-T against the CW attack [5] is presented in Table I for an attack success rate of α = 100%.
BOSS-T outperforms CW in terms of imperceptibility (σ_s = 86.83% for BOSS-T versus σ_s = 83.83% for CW) and the average execution time per feature vector. The CW attack scores higher in terms of the average distance σ_JS and the average classification confidence σ_con. For a fair comparison, we use an average-case setting in evaluating targeted attacks, in which the target labels are chosen uniformly at random [5]. A comparison between BOSS-T and CW in terms of the SSIM and the CDF of the SSIM between the original and synthesized examples is shown in Figure 5 (first) and Figure 5 (second), respectively. As shown, on average BOSS-T achieves similarity performance comparable to CW.

Samples of every application are presented in Figure 9; CW and NewtonFool samples are included for comparison purposes. Samples of BOSS-C and NewtonFool are given in the second and third rows, respectively. While BOSS-C succeeds in maintaining the correct classification of these samples, NewtonFool fails to do so, as seen from the images of digits '3', '5', '8', and '9'. The rounded confidence scores (in percentage) are given at the bottom of each synthesized image in Figure 9. Figure 10 shows samples from the CIFAR-10 dataset. Figure 11 shows samples from the Chest X-ray dataset, where labels 0-3 represent 'Normal Lung', 'COVID-19', 'Ovine Pulmonary Adenocarcinoma' (OPA), and 'Pneumonia'.

We conduct two experiments to demonstrate the capability of our proposed algorithm in targeting two trained classifiers for the MNIST dataset, p_1(·; θ_1) and p_2(·; θ_2). The goal of the first experiment is to generate an example x (using Algorithm 2) such that the classification of both trained models is altered to the same target label t. In the second experiment, the goal is to alter the classification to two different labels. Samples from these experiments are presented in Figure 13. The columns represent the ground-truth labels and the rows show the original example, synthesized samples for the experiment with the same targets, and synthesized samples for the experiment with different targets, respectively. The target labels of both models, shown as pairs at the bottom of the synthesized images, were chosen at random. Figure 12 presents the CDF of I (left) and the CCDF of D (right) computed over 100 trials. For both experiments, it is observed that Algorithm 2 successfully achieves both objectives, as indicated by the high similarity and low JS measures. Additionally, we notice that the 'same targets' experiment returns slightly higher SSIM in comparison to the 'different targets' experiment, while reporting similar performance in terms of the PMF distances.

Fig. 12: CDF of I representing the similarity measure for the experiments of targeting an ensemble of two models p_1(x; θ_1) and p_2(x; θ_2) (left). CCDF of D measuring the PMF distances for the cases of same targets (p_d,1 = p_d,2) and different targets (p_d,1 ≠ p_d,2) (right). The average-case scenario is used to select the target labels.

Fig. 13: Samples from each class of the MNIST dataset (columns) for the targeted-attack scenario against an ensemble of two models p_1(·; θ_1) and p_2(·; θ_2), where the target classes are the same (second row) and different (third row). The SSIM is displayed on top, while the predicted labels w.r.t. p_1(·; θ_1) and p_2(·; θ_2), respectively, are shown at the bottom.
This is due to the fact that the latter setting requires each model to match a different target, which adds more constraints to the process of tuning the parameters φ.

We introduced BOSS, a framework for one-shot synthesis of adversarial samples that satisfy input and output specifications for pre-trained classifiers. We formulated the BOSS problem and developed an approximate solution using a generative approach. We also presented an extension of the proposed approach to an ensemble of trained models. The flexibility of BOSS is demonstrated through various applications, including the synthesis of boundary examples, targeted attacks, and confidence reduction samples. A set of experiments verifies that BOSS performs on par with state-of-the-art methods, without the need for extensive training and with the additional flexibility it affords on output specifications and the number of targeted models.

[1] Generalizing from a few examples: A survey on few-shot learning.
[2] Adversarial machine learning in image classification: A survey towards the defender's perspective.
[3] The limitations of deep learning in adversarial settings.
[4] Generative adversarial nets.
[5] Towards evaluating the robustness of neural networks.
[6] Intriguing properties of neural networks.
[7] DeepFool: a simple and accurate method to fool deep neural networks.
[8] Objective metrics and gradient descent algorithms for adversarial examples in machine learning.
[9] MNIST handwritten digit database.
[10] Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms.
[11] Learning multiple layers of features from tiny images.
[12] Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition.
[13] Can AI help in screening viral and COVID-19 pneumonia?
[14] A direct adaptive method for faster backpropagation learning: The RPROP algorithm.
[15] Rob-GAN: Generator, discriminator, and adversarial attacker.
[16] Generating adversarial examples with conditional generative adversarial net.
[17] Constructing unrestricted adversarial examples with generative models.
[18] Generative adversarial perturbations.
[19] Adversarial transformation networks: Learning to generate adversarial examples.
[20] Enhancing the robustness of deep neural networks by boundary conditional GAN.
[21] Knowledge distillation with adversarial samples supporting decision boundary.
[22] Support vector machines.
[23] Deep learning.
[24] Divergence measures based on the Shannon entropy.
[25] Image quality assessment: from error visibility to structural similarity.
[26] Gradient-based learning applied to document recognition.
[27] Adam: A method for stochastic optimization.