Reversible Attack based on Local Visual Adversarial Perturbation
Li Chen, Shaowei Zhu, Zhaoxia Yin
2021-10-06

Deep learning achieves outstanding performance in many tasks such as autonomous driving and face recognition, but it is also challenged by different kinds of attacks. Adding perturbations that are imperceptible to human vision to an image can mislead a neural network model into producing wrong results with high confidence. Adversarial examples are images to which specific noise has been added in order to mislead a deep neural network model. However, adding noise to images destroys the original data, making the examples useless in digital forensics and other fields. To prevent illegal or unauthorized access to image data such as human faces, while ensuring that legal use is not affected, the reversible adversarial attack technique has arisen. The original image can be recovered from its reversible adversarial example. However, the existing reversible adversarial example generation strategies are all designed for the traditional imperceptible adversarial perturbation. How can reversibility be achieved for locally visible adversarial perturbation? In this paper, we propose a new method for generating reversible adversarial examples based on local visual adversarial perturbation. The information needed for image recovery is embedded into the area beyond the adversarial patch by a reversible data hiding technique. To reduce image distortion and improve visual quality, lossless compression and the B-R-G embedding principle are adopted. Experiments on the ImageNet dataset show that our method can restore the original images error-free while ensuring the attack performance.

In this section, we emphasize the research significance of our work from the following four aspects: (1) research background and the great research value of reversible adversarial examples; (2) research progress of adversarial attack; (3) familiarization with reversible information hiding and analysis of the research status of reversible adversarial attack based on reversible information hiding; (4) motivation and contributions of the proposed work.

Recently, deep learning [1] has become more and more important in various tasks, such as autonomous driving [2], face recognition [3], and image classification [4]. However, researchers [5] have found that well-designed adversarial examples pose a potential threat to the security of deep learning systems. Adversarial examples [6, 7] are created by adding specific noise to input images that humans cannot perceive but machines can, causing the model to misclassify with high confidence. Adversarial examples interfere with the neural network's analysis of input data, and their appearance has brought significant challenges to the security of intelligent systems [8]. Some researchers have used adversarial examples for positive applications [9] in recent years. To protect the privacy of images, adversarial perturbation can prevent models from recognizing or retrieving them without affecting the user's normal recognition of the image content [10, 11]. For example, video conferencing has become the norm since COVID-19. To ensure that the meeting content is not recognized by the artificial intelligence system of third-party conferencing software, users can take advantage of adversarial noise to achieve the purpose of privacy protection [12].
However, the data loses its value in digital forensics, medical treatment, and other fields once it has been processed in this way. Therefore, it is of great significance to study examples that are both adversarial and reversible. Reversible adversarial attack [13, 14] refers to embedding the information needed to restore the original image into the adversarial example through reversible techniques, generating examples that are both adversarial and reversible. This type of example is known as a Reversible Adversarial Example (RAE) [13]. On the one hand, RAEs can play an adversarial role and attack unauthorized or malicious artificial intelligence systems, thus achieving the purpose of protecting image data. On the other hand, authorized legal systems can restore the original images from RAEs without any distortion. The emergence of RAEs provides new insights for the study of adversarial examples.

In recent years, adversarial attack has become an important issue, and an increasing number of researchers have become interested in its study. In this section, we review the research status of adversarial attack. In 2013, the reference [7] first proposed the concept of adversarial attack. Generally, adversarial attack methods can be divided into white-box attacks [15] and black-box attacks [16]. In the white-box setting, attackers are assumed to have complete knowledge of the target model and generate adversarial examples based on its gradient, such as the Fast Gradient Sign Method (FGSM) [5] (a short sketch is given below) and the Carlini and Wagner Attacks (C&W) [17]. In the black-box setting, attackers do not know the architecture of the target model and generate adversarial examples only through the input and output of the model, such as the One Pixel Attack [18]. Besides image-specific adversarial perturbation, the reference [19] proved the existence of universal adversarial perturbation: the same universal perturbation can be added to different images and causes most of the images in the dataset to be misclassified by the classification model.

Apart from the above imperceptible adversarial perturbation, the references [20] and [21] have studied an alternative way of generating adversarial examples in which the attacker limits the perturbation to a small region of the image but does not limit its amplitude. We call this an adversarial patch or adversarial sticker. Compared with the traditional adversarial perturbation, which is imperceptible, the adversarial patch is not completely imperceptible, but it does not affect human cognition or classification semantically. More importantly, it has the advantage of being independent of the scene and the input. As shown in Fig. 1, the two images on the left are traffic signs with graffiti in the real world, while the two on the right are traffic signs with adversarial patches. We can see that the adversarial patch looks more like natural corrosion of the image than adversarial noise. Therefore, the adversary can easily attack real-world deep learning systems [22]. So far, there have been many adversarial patch generation methods. Brown et al. [20] proposed a method to create a universal adversarial image patch that can attack any image. The patch can be printed, pasted on any image, photographed, and presented to the image classifier. Karmon et al. [21] showed that a patch made by modifying 2% of the pixels can attack the advanced InceptionV3 model. They used an optimization-based method and a modified loss function to generate local adversarial perturbation.
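Before moving on, as a concrete illustration of the gradient-based white-box attacks mentioned above (FGSM [5] in particular), the following is a minimal PyTorch sketch. The model, input range, and epsilon value are illustrative assumptions rather than settings used in any of the referenced works.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=8 / 255):
    """One-step FGSM: perturb x in the direction of the sign of the loss gradient.
    Assumes model outputs logits for inputs in [0, 1]; x: (N, 3, H, W), y: (N,)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # x_adv = x + eps * sign(grad_x L(model(x), y)), clipped back to the valid range
    x_adv = (x + eps * x.grad.sign()).clamp(0.0, 1.0)
    return x_adv.detach()
```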
In order to improve visual fidelity and attack performance, Liu et al. [23] proposed the PS-GAN framework. First, through an adversarial process, it converts any input patch into an adversarial patch that is highly correlated with the attacked image. Second, it introduces an attention mechanism to predict the key attack area and further enhance the attack capability.

Adversarial attack technology modifies the input image so that it is misclassified by the model while having no effect on human cognition of its semantics. A closely related technology is information hiding [24-26], which hides secret information inside publicly available media so that it is difficult for people to perceive its existence. Reversible information hiding, also called recoverable information camouflage, is mainly implemented in two ways: reversible image transformation [27] and reversible data embedding [24]. Reversible image transformation refers to reversibly transforming the original image into an arbitrarily selected target image of the same size, obtaining a camouflage image that is almost indistinguishable from the target image. Reversible data embedding means that the image is modified by specific rules to embed secret data, and the original image can be restored after data extraction. Both adversarial attack and reversible information hiding achieve their purpose (attack or data hiding) by modifying the signal of the input image without affecting its semantics. Is it possible to create images that are both adversarial and reversible by combining adversarial attack and reversible information hiding technology?

In 2018, the reference [13] proposed the concept of reversible adversarial examples. It embeds the signal error between the original image and the adversarial example into the corresponding adversarial example using reversible information hiding, obtaining a reversible adversarial example that still causes the model to misclassify. At the same time, the adversarial perturbation can be extracted from the reversible adversarial example and subtracted from it to recover the original image. When generating adversarial examples, the attack cannot succeed if the perturbation amplitude is too small; therefore, to ensure the attack success rate, a large perturbation amplitude is required. As the amplitude of the adversarial perturbation increases, three problems arise: (1) reversible data embedding cannot fully embed the noise, so the original image cannot be restored; (2) the reversible adversarial image is severely distorted, which leads to unsatisfactory image quality; (3) due to the increased distortion of the RAE, its attack ability decreases accordingly. To solve these problems, the reference [14] proposes to use reversible transformation instead of reversible data embedding to construct reversible adversarial examples. By adopting the Reversible Image Transformation (RIT) algorithm [27], it directly disguises the original image as its adversarial example to obtain the reversible adversarial example. The "reversibility" of this scheme is not limited by the amplitude of the adversarial perturbation. Therefore, while ensuring the visual quality of the reversible adversarial example, it can achieve a reversible attack with a higher attack success rate.
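To make the embedding-based construction of [13] described above concrete, the following is a minimal conceptual sketch of the round trip. The reversible data hiding codec is passed in as a pair of caller-supplied functions (`rdh_embed`, `rdh_extract`); these are hypothetical placeholders standing in for any reversible data embedding scheme, and compression of the perturbation payload is omitted.

```python
import numpy as np

def generate_rae(original, adversarial, rdh_embed):
    """Conceptual flow of [13]: hide the signal error (adversarial - original)
    inside the adversarial example itself with a reversible data hiding codec."""
    perturbation = adversarial.astype(np.int16) - original.astype(np.int16)
    payload = perturbation.tobytes()          # in practice the payload is compressed first
    return rdh_embed(adversarial, payload)    # rdh_embed must be lossless/invertible

def recover_original(rae, shape, rdh_extract):
    """Extract the perturbation from the RAE and subtract it to restore the original."""
    adversarial, payload = rdh_extract(rae)   # returns the exact carrier and the payload
    perturbation = np.frombuffer(payload, dtype=np.int16).reshape(shape)
    restored = adversarial.astype(np.int16) - perturbation
    return np.clip(restored, 0, 255).astype(np.uint8)
```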
As mentioned above, to generate reversible adversarial examples, the reference [13] adopts reversible data embedding technology to embed the signal error between original images and adversarial examples into the corresponding adversarial examples, while the reference [14] uses RIT technology to disguise the original image as its adversarial example. However, both approaches are aimed at traditional adversarial noise that is imperceptible to people and do not take locally visible adversarial perturbation [28] into account. In fact, visible adversarial perturbation has a greater impact on image content and usability, making reversibility even more necessary. Thus, we conducted related experiments. To get the adversarial example, we first train the adversarial patch and then determine the position of the patch in the image. Finally, we use the RIT algorithm to generate reversible adversarial examples and input them into the model to test the attack success rate. Analyzing the experimental results, we found the following problems: (1) the impact of embedding the auxiliary information on the initial adversarial perturbation is ignored during the reversible image transformation process, leading to a significant decline in the attack success rate of the reversible adversarial examples; (2) the amount of auxiliary information required by the reversible image transformation technology is relatively stable and does not become smaller as the perturbation shrinks, resulting in serious distortion of the reversible adversarial examples and degraded visual quality.

To solve these problems, we propose a method for generating reversible adversarial examples based on local visual adversarial perturbation. To get reversible adversarial examples, we first train the adversarial patch and then optimize the patch location in the image. Then the information required to restore the original image is embedded into the adversarial example. Finally, we compare our method with the RIT-based technology of the reference [14]. Experiments show that the proposed method can solve the above problems and generate reversible adversarial examples with good attack performance. The rest of this paper is organized as follows. In Section 2, we introduce the generation process of the proposed reversible adversarial examples in detail. Section 3 presents the experiments and analysis. Section 4 gives conclusions and prospects.

In this section, we describe the method to generate reversible adversarial examples based on local visual adversarial perturbation. The overall framework of our method is shown in Fig. 2. To generate reversible adversarial examples, we must first generate adversarial examples. Image-specific [15] and universal [19] adversarial perturbations are two different types of adversarial perturbation. The former is generated for a single image; the perturbation cannot achieve the attack effect on a new example and must be generated again. The perturbation generated by the latter can attack any image. Because the universal adversarial perturbation has better generalization performance [19], we carry out experiments based on universal adversarial perturbation and use the algorithm of the reference [20] to generate adversarial examples.

The attack algorithm is briefly described below. Given a patch $p$, an image $x$, a patch location $l$, a patch transformation $t$, and a target class $\hat{y}$, define a patch application operator $O(p, x, l, t)$ which first applies the transformation $t$ to the patch $p$ and then pastes the transformed patch at location $l$ of the image $x$. To obtain the trained patch $\hat{p}$, the attacker uses a variant of the Expectation over Transformation (EoT) framework [29] and optimizes the objective
$$\hat{p} = \arg\max_{p} \; \mathbb{E}_{x \sim X,\, t \sim T,\, l \sim L}\left[\log \Pr\left(\hat{y} \mid O(p, x, l, t)\right)\right], \tag{1}$$
where $X$ denotes the dataset, $T$ is a set of transformations including rotations, scaling, and so on, and $L$ is a distribution over locations in the image. Since the location of the patch in the image affects the attack's effectiveness, finding a particularly "vulnerable" place can significantly boost the performance of the attack [30]. Therefore, we employ the Basin Hopping Evolution (BHE) algorithm [31] to discover the patch's ideal position when applying it to the image. The BHE algorithm combines the Basin Hopping (BH) algorithm with an evolutionary algorithm. It first initializes the population and then begins the iterative process. In each iteration, the BH algorithm is used to develop a better series of candidate solutions, and then crossover and selection operations are performed to choose the next-generation population. To maintain the diversity of solutions, the BHE algorithm uses multiple starting points and crossover operations to approach the global optimal solution.
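A minimal PyTorch sketch of this patch-training loop under objective (1) is given below. It is an illustrative simplification rather than the exact implementation of [20] or [31]: the EoT transformations and the BHE location search are reduced to random placement, and the classifier and data loader are assumed to be supplied by the caller.

```python
import torch
import torch.nn.functional as F

def train_patch(model, loader, target_class, patch_size=38, steps=1000, lr=0.05, device="cpu"):
    """Optimize a universal patch p to maximize log Pr(target | O(p, x, l, t)).
    Random placement stands in for the location distribution L; EoT rotations/scalings
    and the BHE location search are omitted for brevity."""
    model = model.to(device).eval()
    patch = torch.rand(3, patch_size, patch_size, device=device, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    data = iter(loader)
    for _ in range(steps):
        try:
            x, _ = next(data)
        except StopIteration:
            data = iter(loader)
            x, _ = next(data)
        x = x.to(device)                                   # images in [0, 1], shape (N, 3, H, W)
        r = torch.randint(0, x.shape[2] - patch_size + 1, (1,)).item()
        c = torch.randint(0, x.shape[3] - patch_size + 1, (1,)).item()
        patched = x.clone()
        patched[:, :, r:r + patch_size, c:c + patch_size] = patch
        logits = model(patched)
        target = torch.full((x.shape[0],), target_class, dtype=torch.long, device=device)
        loss = F.cross_entropy(logits, target)             # minimizing this maximizes log Pr(target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            patch.clamp_(0, 1)                             # keep the patch a valid image
    return patch.detach()
```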
After obtaining adversarial examples, reversible data embedding technology is used to generate reversible adversarial examples. Specifically, we use the adversarial example as the carrier image and utilize reversible data embedding to embed the pixel values of the original image covered by the patch into the carrier image. Next, we introduce the generation process of the reversible adversarial examples in detail. For color images, we first divide them into three channels and use the same embedding algorithm for each channel. Since each channel has a different influence on human vision, we adopt the B-R-G embedding principle to reduce the impact on the visual quality of color images [32]. A flag bit and a threshold are assigned to each channel. The flag bit indicates whether the channel is embedded with data: 0 means no embedded data and 1 indicates embedded data. Different embedding capacities correspond to different thresholds. As the size of the adversarial patch increases, the amount of data to embed also rises. In order to ensure that the information is embedded completely, we first use WebP lossless compression to compress the image region to be embedded, and then use Prediction Error Expansion (PEE) [33], a reversible data embedding approach with large embedding capacity that takes advantage of the correlation among adjacent pixels. During data embedding, the information is embedded outside the patch to reduce the impact on the original adversarial perturbation. Finally, we take the coordinates of the adversarial patch and the flag and threshold corresponding to each channel as auxiliary information, and use the same data embedding method to embed the auxiliary information in the upper left corner of the image. In this step, the threshold is set to a fixed value.

The embedding process of PEE can be summarized in the following two steps.

Step 1, computing the prediction error. According to the pixel value $x$ and the predicted value $\hat{x}$, the prediction error is calculated as
$$e = x - \hat{x}. \tag{2}$$
In contrast to Difference Expansion (DE), this method creates the feature elements for expansion embedding using a predictor instead of a difference operator. The predictor estimates the pixel value from the neighborhood of a given pixel, exploiting the inherent correlation within the pixel neighborhood.

Step 2, data embedding. For a prediction error within the threshold range, the prediction error after embedding a bit $i$ can be calculated as
$$e' = 2e + i, \tag{3}$$
while prediction errors outside the threshold range are shifted by the threshold so that the embedding remains invertible.
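To make the two steps above concrete, here is a minimal single-channel sketch of threshold-based prediction-error expansion. It is a simplified stand-in for the algorithm of [33] rather than the exact variant used in our implementation: a rhombus (four-neighbor average) predictor is used, data is embedded into only one checkerboard layer so that the predictor context stays unmodified, and pixel overflow/underflow handling (location map) is omitted.

```python
import numpy as np

def pee_embed(img, bits, T=2):
    """Embed a bit sequence into a grayscale image by prediction-error expansion.
    Errors in [-T, T) are expanded to carry one bit; errors outside are shifted by T
    so the mapping stays invertible. Returns the marked image (int32, overflow not handled)."""
    out = img.astype(np.int32).copy()
    h, w = out.shape
    k = 0
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            if (i + j) % 2 or k == len(bits):           # one layer only; stop when payload is done
                continue
            pred = (out[i - 1, j] + out[i + 1, j] + out[i, j - 1] + out[i, j + 1]) // 4
            e = out[i, j] - pred
            if -T <= e < T:
                out[i, j] = pred + 2 * e + int(bits[k])  # expanded error carries one bit
                k += 1
            elif e >= T:
                out[i, j] = pred + e + T                 # shift right, no payload
            else:
                out[i, j] = pred + e - T                 # shift left, no payload
    assert k == len(bits), "payload exceeds embedding capacity"
    return out

def pee_extract(marked, n_bits, T=2):
    """Inverse of pee_embed: recover the payload bits and the original pixel values."""
    rec = marked.astype(np.int32).copy()
    h, w = rec.shape
    bits = []
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            if (i + j) % 2 or len(bits) == n_bits:
                continue
            pred = (rec[i - 1, j] + rec[i + 1, j] + rec[i, j - 1] + rec[i, j + 1]) // 4
            e = rec[i, j] - pred
            if -2 * T <= e < 2 * T:                      # expanded error: extract the bit
                b = e % 2
                bits.append(int(b))
                rec[i, j] = pred + (e - b) // 2
            elif e >= 2 * T:
                rec[i, j] = pred + e - T                 # undo right shift
            else:
                rec[i, j] = pred + e + T                 # undo left shift
    return bits, rec
```

Because the four-neighbor predictor only uses pixels of the opposite checkerboard parity, which are never modified, the predictions at embedding and extraction coincide and the recovery is exact.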
The process of data extraction is the reverse of data embedding. When an authorized model accesses the reversible adversarial example, the embedded information can be extracted using the data extraction algorithm and the original image can be restored without any distortion. Fig. 3 shows the original image, adversarial example, reversible adversarial example, and restored image in the experiments. The restoration procedure is as follows.

Step 1, extracting the auxiliary information. Using the fixed threshold value, the data extraction algorithm extracts the auxiliary information from the upper left corner, including the patch coordinates and the flags and thresholds corresponding to the three channels.

Step 2, extracting the image information. First, the image is cropped and reorganized according to the patch coordinates. Then, based on the thresholds and flags extracted in the first step, the same extraction algorithm extracts the data from the three channels.

Step 3, restoring the original image. The extracted data is decompressed, and the recovered region is written back over the patched area according to the patch coordinates, thereby restoring the original image without any distortion.
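To tie these steps together, the sketch below outlines one possible single-channel pipeline for generating and recovering a reversible adversarial example, reusing the `pee_embed`/`pee_extract` sketch given earlier. It is an illustrative assumption of how the pieces fit rather than our exact implementation: the auxiliary information, B-R-G channel order, and flag/threshold bookkeeping are omitted, WebP is replaced by zlib so the example stays dependency-free, and embedding is not restricted to the area outside the patch as it is in the paper.

```python
import zlib
import numpy as np

# pee_embed / pee_extract refer to the single-channel PEE sketch given earlier.

def bytes_to_bits(data):
    """Turn a byte string into a flat list of 0/1 ints."""
    return np.unpackbits(np.frombuffer(data, dtype=np.uint8)).tolist()

def bits_to_bytes(bits):
    """Inverse of bytes_to_bits (the bit count is a multiple of 8 here)."""
    return np.packbits(np.array(bits, dtype=np.uint8)).tobytes()

def make_rae_channel(adv, orig, top, left, size, T=2):
    """Hide the original pixels covered by the size x size patch inside the
    adversarial channel (losslessly compressed first). Returns (marked, n_bits)."""
    region = orig[top:top + size, left:left + size]
    bits = bytes_to_bits(zlib.compress(region.tobytes()))
    return pee_embed(adv, bits, T), len(bits)

def recover_channel(rae, top, left, size, n_bits, T=2):
    """Undo the embedding, decompress the hidden region, and paste it back over the
    patch area; the adversarial channel differs from the original only inside the patch,
    so the result is the original channel."""
    bits, restored = pee_extract(rae, n_bits, T)
    region = np.frombuffer(zlib.decompress(bits_to_bytes(bits)), dtype=np.uint8)
    restored = restored.astype(np.uint8)
    restored[top:top + size, left:left + size] = region.reshape(size, size)
    return restored
```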
In this part, we illustrate the superiority of our method in the task of image classification attack. First, we introduce the experimental settings. Second, we conduct comparative experiments from two aspects: attack success rate and image visual quality. Finally, we discuss and analyze the experimental results.

We choose the ImageNet [34] dataset for the experiments. In order to train the adversarial patch, we choose a training set of 2000 images. The patch sizes are 38×38, 42×42, 48×48, and 54×54, none of which exceeds 6% of the image size. During the testing phase, 600 images are selected randomly as the test set and ResNet50 [35] is chosen as the target model. We compare the attack performance and image visual quality of our method with those of the RIT-based method [14].

To prove the superiority of the RAEs we generate in terms of attack performance, this part compares our method with the latest state-of-the-art RIT-based method [14]. The experimental results are shown in Tab. 1. The second row shows the attack success rates of the generated adversarial examples [20]. The attack success rates of the reversible adversarial examples generated by the RIT-based method [14] and by our proposed method are shown in the third and fourth rows, respectively. The attack success rates of our reversible adversarial examples are 86.96%, 87.79%, 89.13%, and 93.48% when the patch sizes are 38×38, 42×42, 48×48, and 54×54, respectively. In the same situation, the reversible adversarial examples generated by the RIT-based method have attack success rates of 77.09%, 80.60%, 83.61%, and 87.29%, respectively. The experiments show that the attack performance improves as the size of the adversarial patch grows. Moreover, the attack success rates of the RAEs generated by our proposed algorithm are much greater than those generated by the RIT-based method under the same circumstances. This indicates that the RAEs generated by our method have a high attack success rate.

We use the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity (SSIM) index to evaluate the visual quality of the reversible adversarial examples. PSNR is defined as
$$\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right),$$
where MAX represents the maximum value of the image pixel color (255 for 8-bit images) and MSE is the mean squared error between corresponding pixels of the two images. SSIM is defined as
$$\mathrm{SSIM}(X, Y) = l(X, Y)\, c(X, Y)\, s(X, Y),$$
with
$$l(X, Y) = \frac{2\mu_X \mu_Y + C_1}{\mu_X^2 + \mu_Y^2 + C_1}, \quad c(X, Y) = \frac{2\sigma_X \sigma_Y + C_2}{\sigma_X^2 + \sigma_Y^2 + C_2}, \quad s(X, Y) = \frac{\sigma_{XY} + C_3}{\sigma_X \sigma_Y + C_3},$$
where $C_1$, $C_2$, and $C_3$ are constants that are all greater than 0. In the above formulas, $\mu_X$ and $\mu_Y$ are the mean pixel values of the image blocks $X$ and $Y$; $\sigma_X$ and $\sigma_Y$ are the standard deviations of the pixel values; and $\sigma_{XY}$ is the covariance of $X$ and $Y$. The calculation of PSNR is based on the error between corresponding pixels, while SSIM measures image similarity in terms of brightness, contrast, and structure. The value of SSIM ranges from 0 to 1: it equals 1 when the two images are identical, and the larger the value, the more similar the two images are. The results are shown in Tab. 3. The SSIM value of the RIT-based RAEs remains at 0.95, while the SSIM of our RAEs is above 0.95 and close to 1 when the patch is small. We can see from the results in Tab. 2 that, except for the largest patch size, the PSNR of the reversible adversarial examples generated by our method is higher than that of the RIT-based method.
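For reference, the two metrics above can be computed as in the following sketch. A plain-numpy global SSIM is shown for clarity; standard implementations (and typical evaluation code) compute SSIM over local windows and average the results.

```python
import numpy as np

def psnr(x, y, max_val=255.0):
    """Peak signal-to-noise ratio between two images of the same shape."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(x, y, max_val=255.0, k1=0.01, k2=0.03):
    """Simplified global SSIM (single window over the whole image),
    using the conventional constants C1, C2 and C3 = C2 / 2."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (k1 * max_val) ** 2, (k2 * max_val) ** 2
    c3 = c2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - mu_x) * (y - mu_y)).mean()
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sx * sy + c2) / (sx ** 2 + sy ** 2 + c2)
    structure = (sxy + c3) / (sx * sy + c3)
    return luminance * contrast * structure
```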
In this part, we analyze the experimental results of the proposed method from the perspective of image visual quality and attack success rate. According to the PSNR and SSIM results, the PSNR value of the RIT-based method stays at around 34 and its SSIM value remains at 0.95. This is because the amount of auxiliary information embedded by that method is essentially constant, so the image distortion is severe even when the patch size decreases. In contrast, as the patch size decreases, the amount of information embedded by our method becomes smaller and the region outside the patch expands, so the PSNR of the image rises and the SSIM value even approaches 0.99 in our proposed method. Therefore, the image distortion of the reversible adversarial examples generated by our method is smaller than that of the RIT-based method, and the unique structure of the original adversarial perturbation is better preserved. The results in Tab. 1 show that our attack performance is better than that of the RIT-based method precisely because this structure is better preserved. The adversarial patch in the image is more salient than other objects; hence the network focuses on the adversarial patch and ignores other targets in the image [18], resulting in misclassification. The RIT-based method directly converts the original image into the target image when generating reversible adversarial examples and does not consider the impact on the initial adversarial perturbation when embedding auxiliary information. The method described in this paper avoids the region where the patch is located when embedding the information, thereby reducing the impact on the initial adversarial perturbation. The image quality evaluation also shows that the reversible adversarial examples generated by our method preserve the perturbation structure of the adversarial examples better. When the patch size is 54×54, our PSNR value is lower than that of the baseline method, but our attack success rate is still relatively high because the structure of the original adversarial perturbation is better preserved, and our SSIM value is not lower than that of the baseline. Therefore, the attack success rate of the reversible adversarial examples generated by our proposed method is better than that of the RIT-based method.

In this paper, we explored the reversibility of adversarial examples based on locally visible adversarial perturbation and proposed a reversible adversarial example generation method that embeds the recovery information in the area beyond the patch to preserve the adversarial capability and achieve image reversibility. To guarantee the visual quality of the generated adversarial example images, we have to minimize the amount of data that has to be embedded for original image recovery; thus, lossless compression is adopted. Compared with the RIT-based method, the proposed method achieves both complete reversibility and state-of-the-art attack performance. As is well known, for image blocks of the same size, the smoother the image area, the higher the lossless compression efficiency and the smaller the amount of compressed data. Therefore, in the future, we plan to apply the adversarial patch to smooth areas as much as possible to further enhance performance.

References:
[1] Deep learning
[2] Robustness analysis of behavioral cloning-based deep learning models for obstacle mitigation in autonomous vehicles
[3] Scale fusion light cnn for hyperspectral face recognition with knowledge distillation and attention mechanism
[4] Enhancing adversarial robustness for image classification by regularizing class level feature distribution
[5] Explaining and harnessing adversarial examples
[6] Threat of adversarial attacks on deep learning in computer vision: A survey
[7] Intriguing properties of neural networks
[8] Adversarial examples: Attacks and defenses for deep learning
[9] Benign adversarial attack: Tricking algorithm for goodness
[10] Scene privacy protection
[11] On the (im)practicality of adversarial perturbation for image privacy
[12] Motion-excited sampler: Video adversarial attack with sparked prior
[13] Reversible adversarial examples
[14] Reversible adversarial attack based on reversible image transformation
[15] DeepFool: a simple and accurate method to fool deep neural networks
[16] Delving into transferable adversarial examples and black-box attacks
[17] Towards evaluating the robustness of neural networks
[18] One pixel attack for fooling deep neural networks
[19] Universal adversarial perturbations
[20] Adversarial patch
[21] LaVAN: Localized and visible adversarial noise
[22] Adversarial examples in the physical world
[23] Perceptual-sensitive GAN for generating adversarial patches
[24] Reversible data embedding using a difference expansion
[25] Reversible data hiding based on reducing invalid shifting of pixels in histogram shifting
[26] Reversible data hiding
[27] Reversible visual transformation via exploring the correlations within color images
[28] On visible adversarial perturbations & digital watermarking
[29] Synthesizing robust adversarial examples
[30] Adversarial training against location-optimized adversarial patches
[31] Adv-watermark: A novel watermark perturbation for adversarial examples
[32] A high visual quality color image reversible data hiding scheme based on BRG embedding principle and CIEDE2000 assessment metric
[33] Expansion embedding techniques for reversible watermarking
[34] ImageNet large scale visual recognition challenge
[35] Deep residual learning for image recognition

The authors thank the anonymous referees for their valuable comments and suggestions. We express our heartfelt thanks to the National Natural Science Foundation of China (62172001, Reversible Adversarial Examples) for funding this study. The authors declare that they have no conflicts of interest to report regarding the present study.