SoK: Anti-Facial Recognition Technology
Emily Wenger, Shawn Shan, Haitao Zheng, Ben Y. Zhao
December 8, 2021

Abstract: The rapid adoption of facial recognition (FR) technology by both government and commercial entities in recent years has raised concerns about civil liberties and privacy. In response, a broad suite of so-called "anti-facial recognition" (AFR) tools has been developed to help users avoid unwanted facial recognition. The set of AFR tools proposed in the last few years is wide-ranging and rapidly evolving, necessitating a step back to consider the broader design space of AFR systems and long-term challenges. This paper aims to fill that gap and provides the first comprehensive analysis of the AFR research landscape. Using the operational stages of FR systems as a starting point, we create a systematic framework for analyzing the benefits and tradeoffs of different AFR approaches. We then consider both technical and social challenges facing AFR tools and propose directions for future research in this field.

In recent years, facial recognition systems have accelerated their growth in scale and reach, becoming an increasingly ubiquitous part of our daily lives. The majority of citizens in the world's most populous countries are enrolled in one or more facial recognition systems, whether they know it or not. In the United States, nearly 200 million Americans are enrolled in the FBI facial recognition database, which leverages access to driver license photos from most states [1]. In China, a well-known surveillance system uses facial recognition to monitor civilian behavior and enforce the social credit score system [2], [3]. In Russia, authorities acquired 100,000+ cameras in Moscow to build a facial recognition-based COVID quarantine enforcement system [4]. Beyond government use cases, facial recognition systems are now regularly used for myriad purposes, including authenticating travelers at airports and employees entering corporate offices.

The advancements that paved the way to these facial recognition systems have also opened the door to their potential misuse and abuse. With moderate resources, an individual or institution, public or private, can now extract training data from social media and online sources to build facial recognition models capable of recognizing large groups of users. In 2020, New York Times journalist Kashmir Hill demonstrated the potential for facial recognition misuse when she profiled Clearview.AI, a private for-profit company that scraped over 3 billion images from "public sources" to build a facial recognition system that recognized hundreds of millions of private citizens [5], without their knowledge or consent. Clearview and companies like it could enable surveillance and tracking by anyone willing to pay. In addition to images shared online, other reports have detailed how photos taken in unexpected places - airports, city streets, government buildings, schools, corporate offices - can end up in facial recognition systems without subjects' knowledge or consent (e.g., [1], [7], [8], [9], [10], [11]). Despite backlash against intrusive facial recognition systems [12], [13], [14], [15], there are few tools available to protect users against them.
While big tech has begun to self-regulate [16] and has openly called for legislation [13], [12], legislative efforts to regulate facial recognition remain scarce. In their place, a cottage industry of anti-facial recognition (AFR) tools has emerged. These AFR tools are designed to target different parts of facial recognition systems, from data collection and model training to inference, with the unified goal of preventing successful recognition by unwanted or unauthorized models. In the last 12 months, more than a dozen AFR tools have been proposed (e.g., [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32]). While most are constrained to research prototypes, a few of these tools have produced public software releases and gained significant media attention [19], [22], [33].

Proposals in the rapidly growing collection of AFR tools differ widely in their assumptions and techniques and target different pieces of the facial recognition pipeline. There is a need to better understand their commonalities, to highlight performance tradeoffs, and to identify unexplored areas for future development. In this paper, we address this need through the lens of a common framework for analyzing a wide range of AFR systems. More specifically, we make the following contributions:
• Taxonomization of targets in facial recognition: AFR systems target a wide range of components in the facial recognition process. Using a generalized version of the facial recognition data pipeline, we provide the first framework to reason broadly about existing and future work in this space.
• Categorization and analysis of AFR systems: We categorize and analyze the current body of work on AFR systems using our proposed framework.
• Mapping design space based on desired properties: We identify a core set of key properties that future AFR systems might optimize for in their design, and provide a design roadmap by discussing how and if such properties can be achieved by AFR systems that target each stage in our design framework.
• Open challenges: We use our framework to identify significant challenges facing current AFR systems, as well as directions for potential solutions.

Fig. 1. The workflow of how facial recognition systems recognize a human face in an input image, along with the corresponding terminology. (a): A query image, after being submitted to the system, is passed to the feature extractor to produce a feature vector; (b): this feature vector is used to query a reference database of labeled feature vectors; (c): if the query feature vector matches a labeled feature vector in the database, the label is used to find a reference image, and the system outputs the reference image and the identity (i.e., Alice Smith in this example).

In the rest of the paper, we begin with a detailed description of real-world facial recognition systems (§II), including real-world deployment scenarios and key technical components. We then present the motivation and threat model of AFR systems (§III), and our systemization of existing AFR tools by examining the five overarching stages of facial recognition systems that AFR tools could target (§IV). We discuss the key attack methods used by existing AFR proposals targeting each stage, i.e., data collection (§V), data processing (§VI), feature extractor training (§VII), identity creation (§VIII), and query matching (§IX).
We then identify key desirable properties for future AFR systems, and map them to points in the design space ( §X). Finally, we discuss open challenges and potential directions for future AFR research ( §XI). The broad deployment of facial recognition systems (and by extension, AFR systems) is fraught with ethical challenges and implications, not the least of which are significant biases against women and people of color [34] . While we discuss ethical tensions surrounding AFR systems in §XI-B, we do not make assertions in this paper on how (and whether) AFR tools should be used. Development and adoption of AFR tools are driven by backlash against biases in and misuse of facial recognition systems. Even as we continue to struggle with their legal and ethical implications, we recognize that AFR tools are here to stay, and an analysis of their strengths and limitations is crucial to advancing the ongoing debate about both their use and the place of facial recognition in our world. As context for later discussions, we now provide an overview of facial recognition (FR) systems and their realworld implementations. FR systems identify people by their facial characteristics, generally by comparing an unidentified human face in an image or a video against a database of facial images with known identities. While there are many design variants [35] , we focus on the state-of-the-art and widely adopted FR systems, which employ deep neural networks (DNNs) to perform recognition on digital face images. We note the distinction between facial recognition systems, the main target of AFR systems and the subject domain of this work, versus facial verification systems. Facial verification is used widely to authenticate users on mobile devices (e.g. FaceID on iPhones), by checking the similarity of a user's facial features against the stored feature vector matching the authorized user. The large majority of AFR systems focus only on facial recognition, and as such, we do not consider facial verification or its disruption in this work. Below, we begin by presenting the run-time workflow of facial recognition. We then propose a breakdown of the FR workflow into five operational stages, a framework that we will revisit and use for analyzing AFR systems in Section IV. Finally, we give an overview of real-world deployments of FR. Figure 1 summarizes the run-time workflow of how FR systems identify a face from an input image. First, a query image, i.e. a face image to be identified, is fed through a feature extractor, a DNN that converts the image into a feature vector (or a mathematical representation of the person's facial features). Next, this feature vector is used to query a reference database, a collection of face images of known identities. This query search is done by comparing the input feature vector against the reference feature vectors stored in the database to find the closest match. Finally, if the query search finds a reference feature vector in the database sufficiently similar to the input, the FR system declares that a match has been found and outputs the corresponding identity and the associated reference image (i.e. Alice Smith in Figure 1 ). It is worth noting that the terminology used to describe a FR system can vary across the literature, and some alternative terms are listed in Figure 1 . For example, query images are sometimes called "probe images" or "test images," while feature vectors are referred to elsewhere as "face templates" or "faceprints". 
Reference images are also known as "identified images" or "gallery images". The terms we choose to use in this paper are, we believe, most familiar to the security research community.

We now examine the FR operational pipeline and divide it into a set of operational stages that will frame our discussion of FR and AFR tools. These operational stages correspond to specific subtasks in FR, which together encompass the five critical points of direct interaction between users and FR systems. Figure 2 depicts the five operational stages of FR. We discuss each stage below, and will revisit them as a framework to analyze anti-facial recognition tools in §IV.

1 Image collection. Face images primarily come from two sources: online image scraping [61] or physically taking a photo of a person [1], [8]. We discuss sources of face images for FR systems in further detail in §II-C.

2 Image preprocessing. Raw images from stage 1 are often poorly structured (e.g., varying face sizes, bystanders in the background). To make downstream tasks easier, the FR system often preprocesses images by applying face detection (e.g., an automated face cropper [62]) to remove the background and extract each individual face, followed by a data normalization process [63], [64], [65].

3 Training feature extractor. The crucial element of DNN-based FR systems is the feature extractor used to compute facial features from an image. To achieve accurate recognition, the computed feature vectors must be highly similar for photos of the same person, but sufficiently dissimilar across photos of different people. To enable this behavior, most existing FR systems adopt the training methodology proposed by [65] in 2015: adding an extra loss function during model training to directly optimize for large separations between different faces in the feature space. Follow-up works explore alternative loss functions and model architectures to further improve the accuracy of FR systems (e.g., [63], [64], [66]). To maximize efficacy, the feature extractor is generally trained on millions of labeled face images. Extensive resources are required both to collect and label a large face dataset and to actually train the model. As a result, many FR practitioners, including large companies [67] and government agencies [68], [69], opt to purchase or license a well-trained feature extractor from tech companies (e.g., [...]).

4 Creating the reference database. FR systems need a large database of known (or labeled) faces in order to match unknown (unlabeled) faces to their true identities. As a result, FR systems build a reference database of people they want to recognize, by first collecting and preprocessing labeled face images of these individuals, and then passing them to the feature extractor to obtain feature vectors. The reference database stores the corresponding feature vector and identity pairs [78], [61], [79].

5 Query matching. At run-time, the FR system takes in an unidentified face image, extracts its feature vector, then uses it to query the reference database to locate a match (if any exists). If the feature-space distance (e.g., L2 or cosine) of the query image is close enough to an entry in the database, the system outputs a match.

In recent years, large corporations and government agencies across the globe have adopted FR for various applications. This wide adoption was triggered by significant accuracy improvements of FR systems, largely due to new training methods [65] and more powerful neural network architectures [80]. Below, we present some commonly known FR use cases and discuss their impact on users.
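As a concrete aside before those use cases, the stage-5 matching step above can be sketched in a few lines of code. This is an illustration only, assuming a generic 512-dimensional feature extractor and a simple cosine-similarity threshold; the names (extract_features, the 0.6 cutoff) are placeholders rather than any deployed system's parameters.

```python
import numpy as np

# Hypothetical enrolled reference database: identity -> feature vector.
# In a deployed FR system, each vector is produced by the trained feature
# extractor from one or more labeled reference images (stage 4).
reference_db = {
    "alice_smith": np.random.randn(512),
    "bob_jones": np.random.randn(512),
}

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_query(query_vec, db, threshold=0.6):
    """Stage 5: return the closest enrolled identity, or None if nothing is similar enough."""
    best_id, best_sim = None, -1.0
    for identity, ref_vec in db.items():
        sim = cosine_similarity(query_vec, ref_vec)
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return (best_id, best_sim) if best_sim >= threshold else (None, best_sim)

# query_vec would come from the same feature extractor applied to the query image:
# query_vec = extract_features(query_image)
query_vec = np.random.randn(512)
print(match_query(query_vec, reference_db))
```

Most AFR tools discussed below attack one of the inputs to this loop: the images that populate the reference database, the extractor that produces the vectors, or the query image itself.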
Government use cases. Government agencies around the globe use FR for a variety of purposes. For example, the US government uses FR systems for law enforcement purposes such as border control [50] and police operations [81]. The Chinese government employs FR to monitor specific subpopulations [2], [82], track video game use [83], and enforce COVID lockdowns [51]. Table I lists more examples of government uses of FR. For a broader exploration of this topic, we refer the reader to [84].

Commercial use cases. Many corporations have integrated FR into their security and commerce pipelines. The most common FR use cases are enhancing store or office security. For example, companies like Apple, Macy's, and Lowe's have begun using FR to catch shoplifters in their stores [54]. Other companies have employed FR to monitor corporate facility access [55], [56]. Product-based applications have emerged as well, such as car companies like Subaru using FR to track driver fatigue [58] or airlines using FR to streamline passenger check-ins [59], [60].

Sources of face images. The definitive source of images for deployed FR models is often unknown. Based on government reports and media articles, we outline some known sources of training, reference, and query images used by today's FR systems. Training images (used to train feature extractors) often come from a mix of academic training datasets (e.g., [85], [86], [87], [88]), proprietary data, and public data scraped from social media accounts, according to a report of the US Government Accountability Office [68]. Reference images used to create the reference database generally come from the Internet (e.g., social media) or government databases (e.g., passport and driver license photos). A list of known reference image sources for some well-known FR operators is shown in Table II. Finally, query images can come from both online and physical sources, including social media, police body cams, mug shots, corporate surveillance systems, state identification images, passport photos, and others [69]. After identification, query images are often fed back into the reference database, either to enhance existing feature vectors or create new ones. For example, US Customs and Border Protection states that images of non-US travelers collected at US entry points are fed back into a larger DHS database as reference images. Similar techniques are used by several Chinese companies [89], [10].

Fig. 3. Overview of our proposed stage-based framework for analyzing existing AFR proposals. We list the five critical stages of facial recognition as discussed in §II-B and present AFR strategies per stage by the attack target, action, and desired effect.

In this section, we discuss factors driving the development of anti-facial recognition (AFR) tools, the threat model of those AFR tools, and its practical implications.

Numerous forces have coalesced to drive the recent trend in AFR tool development. First, reports about the provenance of images used in commercial FR systems have raised significant privacy concerns. The most infamous examples are Clearview.ai and PimEyes - both companies have scraped over 3 billion images from social media sites to use in their FR systems [90] without user knowledge or consent. Second, increased government use of FR systems has caught the eye of citizens, who have raised significant concerns about the long-term effects of FR on privacy and freedom of expression [15], [93].
Third, multiple editorials have highlighted and discussed the demographic bias of existing FR systems, calling for a moratorium on (or at least regulation of) FR technology [13], [94], [95]. Consequently, public sentiment about FR is mixed and, especially in western countries, trending negative [96], [97], [98], [99]. This shift in public opinion, combined with the concerns and forces noted above, has motivated researchers to create various AFR tools to counteract unwanted FR systems.

AFR tools are used by a person P to combat a FR system or service F. In this context, P takes the role of an attacker and acts against F. Development of AFR tools generally makes the following assumptions about each party:
• P has no special access to or authority over the target FR system F, but wishes to evade unwanted facial recognition by modifying or otherwise controlling their own face images.
• F's goal is to either create or maintain an accurate facial recognition operation. Furthermore, F operates at scale and does not specifically target P for identification.

Assumptions and Implications. The above threat model relies on several key assumptions. We now discuss their implications.

1) Assuming P uses image-based AFR tools: Our study focuses exclusively on image-based AFR tools that a user P can deploy themself. These image-based designs dominate the current set of AFR proposals. Yet a user P may, depending on their context, be able to use other means (e.g., legal action) to fight unwanted facial recognition. We discuss potential non-image-based AFR methods later in §XI.

2) Assuming F does not specifically target P for recognition: We note that existing AFR tools are designed to fight large-scale FR systems. This is because, from a practical standpoint, if system F wishes to specifically recognize a user P, there are much more efficient options than using a general, large-scale FR system. Therefore, most AFR tools are not designed to withstand this level of scrutiny.

Known data sources for face images (table): photos taken in an academic research study; photos taken at a public event with a signed release; photos posted online by the user on personal social media; photos posted online by the user's friends; images sold by companies without user knowledge; photos obtained from surveillance cameras in public spaces; photos from government databases.

We now discuss and analyze existing AFR proposals. To do so, we propose and use a stage-based framework to categorize AFR strategies, which encompasses the five critical points of direct interaction between users and FR systems. AFR tools can operate at these points, where FR systems interface with the broader world. As shown in Figure 3, each of these critical points corresponds to a key operational stage of FR systems, i.e., the stages 1 - 5 described in §II. With this in mind, we now summarize the "attack" strategies used by AFR tools to disrupt the operation of each FR stage and taxonomize current AFR proposals. In the next few sections, we discuss in detail the AFR proposals targeting each individual stage (§V-§IX), before discussing the goals and tradeoffs of AFR tools (§X). Finally, we consider broad challenges facing AFR development and discuss potential future directions (§XI). The overall structure of our analysis is shown in Table IV.

Since the five FR stages 1 - 5 encompass the points of direct interaction between P and F, they naturally cover the points of attack employed by existing AFR proposals. Next, we briefly describe the general strategies used by AFR tools targeting each FR stage.
In the image collection stage, labeled and/or unlabeled images are collected for use by F , either by physically taking photos or scraping online images. When targeting this stage, AFR tools focus on disrupting the data collection process to prevent F from acquiring usable face images of P. This second stage pre-processes collected face images using a series of digital transformations, e.g., face detection, background cropping, and normalization. AFR tools deployed at this stage seek to render the processed images unusable, either by breaking the preprocessing functions (e.g., preventing faces from being detected), injecting noise and artifacts onto the images, or removing P's identity information from the images. Since stage 3 is dedicated to training face feature extractors, AFR tools targeting this stage seek to degrade the accuracy of the extractor by poisoning its training images. To create the reference database, labeled reference images are passed through the feature extractor to create their feature vectors. AFR tools targeting this stage attempt to corrupt the feature vectors created for P's reference images so that the database holds a "wrong" feature vector of P. In the query matching stage, AFR tools seek to prevent accurate matching between a query image's feature vector (of P) and P's feature vectors stored in F 's reference database. This is generally achieved by perturbing (or modifying) the query image to change its feature vector. Using our stage-based analysis framework, we now present a comprehensive taxonomy of existing AFR proposals in Table V . In this list, we categorize existing AFR proposals by the year of release, the individual FR stage they target, and the attack scenario. We further break down the attack scenario by P's knowledge of F (white box or black box 3 ), the AFR deployment context (physical or digital), whether the attack is targeted or untargeted 4 , whether the AFR tool has been tested against real-world FR systems, and any unique or notable features of the AFR tool. We note a significant imbalance of AFR tools targeting different stages. Stage 2 and 5 have attracted the most number of AFR proposals, likely due to the popularity of adversarial perturbation-based research. We also notice that 7 out of 30 proposals assume a "white-box" access to F 's FR pipeline, which is often unrealistic in practice. Finally, only 12 out of the 30 proposals have tested the AFR effectiveness against at least one real-world FR system. Overall, Table V serves as a comprehensive summary of current AFR proposals, which we will refer to throughout the paper. In the next five sections, we discuss in greater detail how existing AFR proposals attack each of the five stages. In each section, we first describe the goals of F and P in the corresponding stage and then discuss specific AFR proposals that allow P to disrupt F . This section focuses on methods that allow P to attack F by disrupting the process of face data collection (stage 1 ). • F 's goal is to obtain usable face images from online or physical sources. In many scenarios, F aims to collect high quality images of millions or billions of people (e.g., Clearview.ai [90] ). • P's goal is to prevent their face images from being collected for use in face recognition systems. They use online or physical evasion/disruption techniques to thwart image collection. Face images can come from two sources: scraping online images or physically capturing faces using cameras. 
Thus we divide existing AFR tools acting at this stage into two subcategories: preventing scraping of online images and preventing image capture by cameras. A large portion of face images used to build today's FR systems are scraped from online social media platforms. Thus an effective way to stop F is to prevent web scraping. While each single user can try their best to limit their online footprint, most of the AFR methods require the help of others or an online platform (e.g., Flickr). Anti-scraping techniques have been widely studied in the security community [115] , [116] , [117] , [118] , [119] . Techniques such as rate limits, data limits, ML-based scraping detection are already used by online platforms [120] . However, a significant portion of scraping still goes undetected as scrapers develop more sophisticated tools to bypass detection [120] . Data leverage by users. P could try to prevent F from collecting their online images by withholding them. Recent works propose the concept of "data leverage" where users of online platforms work collectively to withhold data or control how their data is used by tech companies [100] , [32] , [121] . While not specifically aimed at facial recognition, these proposals offer alternative models for online engagement while protecting user data. Ordinary civilians can already use smartphones to take high-quality photos of anyone at any moment. These photos could be collected and used by facial recognition systems like PimEyes [91] . Furthermore, face photos taken by on-street surveillance cameras are increasingly used by commercial or government facial recognition systems [56] , [53] , [122] , [1] , [9] , especially in major metropolitan areas and inside stores. Today's proposals for avoiding image capture come from both research community and activists (e.g. protesters and artists) concerned about surveillance. They fall into two broad categories: hiding faces from cameras and disrupting camera operation. Face hiding. People can wear clothes, hats, masks, or move their head to prevent (usable) facial image being captured by cameras. Notably, during the June 2020 wave of protests in the US, nonprofit organizations compiled a "tech toolkit" to help privacy-conscious protesters obfuscate their faces from cameras and avoid identification [123] ; in late 2020, a Chinese artist used a map of on-street surveillance cameras to successfully guide others to evade identification by positioning their head/body "away" from those cameras [124] . Without physically breaking cameras, human users can prevent cameras from capturing (usable) images by simply shining laser lights at them [93] . Other commonplace methods include covering cameras with fabric or stickers. Stage 2 processes raw face images using a series of digital transformations to facilitate further operations in stages 3 , 4 , and 5 . AFR proposals targeting this stage seek to disrupt the digital transformation process such that the processed face images are "unusable" by subsequent stages. • F 's goal is to obtain well-structured face images from a large number of raw images. • P's goal is to either prevent their face being detected/extracted from raw images or to anonymize their face in these images. Face detection extracts well-centered head shots from raw images. The commonly used face detection systems [62] rely on DNNs to accurately infer the location of faces in an image. To disrupt face detection, existing AFR tools leverage the concept of "adversarial perturbations" against DNN models. 
Adversarial perturbations are a well-studied phenomenon in the field of adversarial machine learning. These carefully crafted, pixel-based perturbations, when added to an image, can cause DNNs to produce wrong classification results(e.g., [125] , [126] , [127] , [128] ). Typically, the perturbations are generated using an iterative optimization procedure that maximizes the likelihood of model misbehavior while minimizing perturbation visibility. The generation procedure varies depending on P's knowledge on F (e.g. white-box vs black-box, see Table V) . AFR tools using adversarial perturbations can be further divided into two types, based on how the perturbation is added to images. They can be directly added to digital images if P has direct access to these images or fabricated as physical objects that P can carry (e.g., an adversarial T-shirt) or place on cameras. Directly modifying digital images. Using AFR tools, users who post images online can directly add adversarial perturbations to these images before posting them (e.g., [28] , [25] ). In this way, users can ensure that those properly perturbed images cannot be used by FR systems to extract any face information. Wearing custom designed physical objects. Often users do not have access to face images to modify them. An alternative way to "inject" adversarial perturbations into images is to carry or wear a physical object so that any camera taking a photo of the user will also capture a version of the adversarial perturbation. Along these lines, prior works have successfully translated face-detection-evading adversarial perturbations into makeup [33] , [123] , t-shirts [26] , [24] , or stickers. Placing a sticker on cameras. An orthogonal approach involves transforming the adversarial perturbation into a translucent sticker that can be placed over a camera lens. This sticker imperceptibly modifies images taken by the camera to prevent people and faces from being detected in those images [101] . P can also anonymize their face images to remove identity information. Physical anonymization can be easily achieved by wearing masks, hats, makeup, etc, which overlaps with "avoiding image capture" in 1 discussed in §V-B. Thus our discussion below focuses on digital anonymization techniques applied to online face images. To anonymize face images, the leading proposals use generative adversarial networks (GANs) [129] and differential privacy [130] . Several proposals use GANs to first transform face images into latent space vectors, modify those vectors to remove identity information, and reconstruct the images from the modified vectors [102] , [17] , [103] . The modified faces still look human but are anonymized to prevent accurate identification. Another proposal, IdentityDP [18] , uses similar techniques but goes a step further by providing provably differentially private identity protection. A side effect of anonymization is that the anonymized faces generally do not resemble the original face but carry significant changes in shape, skin tone, hair color, or other properties. All FR systems require an effective feature extractor to distinguish between faces of different people. AFR proposals attacking stage 3 focus on manipulating or corrupting the process of training feature extractors. • F 's goal is to train a high-quality feature extractor using available data. • P's goal is to prevent their photos from being used to train an effective feature extractor. 
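Most of the perturbation-based AFR techniques discussed in this and the following sections (evading face detection, cloaking reference images, evading query matching) are built on the same basic recipe: iterative gradient steps on pixel values, bounded so the change stays largely imperceptible. The sketch below is a generic, white-box, PGD-style illustration of that recipe and is not the method of any specific AFR proposal; feature_extractor stands in for whatever model a tool targets, and the budget values are arbitrary examples.

```python
import torch

def craft_perturbation(image, feature_extractor, steps=50, epsilon=8/255, alpha=2/255):
    """Generic sketch: find a small (L-infinity bounded) perturbation that pushes
    the image's feature vector away from its original location, so that downstream
    detection or matching misbehaves. Assumes white-box gradient access."""
    original_feat = feature_extractor(image).detach()

    # Start from a random point inside the allowed perturbation ball.
    delta = ((torch.rand_like(image) * 2 - 1) * epsilon).requires_grad_(True)

    for _ in range(steps):
        feat = feature_extractor(image + delta)
        # Loss grows as the perturbed features drift away from the original ones.
        loss = -torch.nn.functional.cosine_similarity(feat, original_feat, dim=-1).mean()
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()                 # gradient ascent on the loss
            delta.clamp_(-epsilon, epsilon)                    # keep the change small
            delta.copy_((image + delta).clamp(0, 1) - image)   # keep pixel values valid
        delta.grad.zero_()

    return (image + delta).detach()
```

Black-box variants of this recipe replace the exact gradient with gradients from surrogate models or query feedback, which is one reason several proposals in Table V jointly optimize against multiple feature extractors.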
Data poisoning is a well-studied technique in the field of adversarial machine learning. By manipulating the training data of a DNN model, an external party can negatively impact the model's training [131], [132], [133], [134], [135]. Poisoned models can exhibit a variety of (mis)behaviors, from incorrect classification of specific inputs to complete model failure. Existing AFR proposals focus on the latter.

Making training data unlearnable. By injecting specially crafted noise into training data, Huang et al. [20] render the data "unlearnable" by a DNN model. This noise misleads the model into thinking that the data have already been learned, thwarting necessary parameter updates. When a user submits their "unlearnable" face images as training images for the FR feature extractor, the extractor will not learn anything from them to improve its performance. Since training an effective face feature extractor requires millions or even billions of face images [63], [64], [65], once the number of unlearnable training images becomes large enough, the trained feature extractor will not meet the accuracy level required for practical deployment. A related proposal from Evtimov et al. [104] injects adversarial shortcuts into the dataset. Models trained on this data overfit to the shortcut and fail to learn the meaningful semantic features of the data. Because the trained extractor has a distorted understanding of the feature space, it cannot produce the high-quality feature vectors required for accurate face recognition.

In stage 4, with a trained extractor in hand, F creates a reference database of labeled face feature vectors to facilitate identification of unidentified faces. AFR tools targeting this stage seek to fill the reference database with incorrect face/label mappings, so that P cannot be accurately recognized from their query images.
• F's goal is to create a database against which they can run facial recognition searches. This database should contain feature vectors of the people F wishes to recognize.
• P's goal is to disrupt the feature vector creation process. This prevents F from creating an accurate feature vector which can be matched against query images of P's face.

Existing AFR proposals in this category focus on poisoning feature vectors before they are stored in the reference database. The specific poisoning techniques depend on the underlying assumptions about how F compares run-time query images to the feature vectors stored in the database.

Assuming classification-based query matching. A recent AFR proposal, Fawkes [19], assumes that F produces run-time facial recognition results by adding a shallow classification layer on top of the feature extractor. Fawkes seeks to corrupt the final classification output by "cloaking" (or poisoning) reference images of P, i.e., shifting their feature vectors away from the correct representation by adding imperceptible perturbations to P's reference images [19]. F's shallow classification models trained on these shifted feature vectors will learn to associate incorrect regions of the feature space with P's identity, producing wrong matches for P's (uncloaked) query images at run-time. An earlier work, FishyFace [136], also proposes to disrupt face verification by poisoning the training data used to train a one-class SVM model. Since FishyFace targets per-user face verification, rather than large-scale FR systems, we exclude it from Table V and our analysis.
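To make the cloaking idea concrete, below is a simplified sketch in the spirit of Fawkes (not its actual implementation or parameters): the perturbation pulls P's reference photo toward the feature-space region of a different, decoy identity before the photo is shared, so any reference vector later derived from it is misleading.

```python
import torch

def cloak_reference_image(image, decoy_image, feature_extractor,
                          steps=100, epsilon=8/255, alpha=1/255):
    """Simplified feature-space cloaking sketch: perturb P's photo so that its
    features move toward those of a decoy identity, while the pixel change stays
    within a small L-infinity budget. An FR operator that enrolls the cloaked
    photo stores a misleading reference vector for P."""
    with torch.no_grad():
        decoy_feat = feature_extractor(decoy_image)

    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        feat = feature_extractor(image + delta)
        loss = ((feat - decoy_feat) ** 2).mean()     # distance to the decoy's features
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()       # descend: pull features toward the decoy
            delta.clamp_(-epsilon, epsilon)          # keep the edit visually minor
            delta.copy_((image + delta).clamp(0, 1) - image)
        delta.grad.zero_()
    return (image + delta).detach()
```

The run-time effect depends on how F matches queries: a classifier or nearest-neighbor index built from cloaked vectors associates P's identity with the wrong feature region, so P's unmodified query photos no longer match.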
Two other AFR proposals, LowKey [22] and FoggySight [21] , assume a K-nearest neighbors approach to query/database matching. LowKey [22] adds digital adversarial perturbations to change the feature representation of P's reference images (similar to Fawkes). These perturbed images create a reference feature vector for P that is different from those of P's run-time query images, thus preventing matching. FoggySight [21] takes a community-driven approach, where users modify their images to protect others. These collective modifications flood the top-K matching set for a specific user with incorrect feature vectors, drowning out the correct feature vector and preventing query image matching. The final set of AFR tools aims to prevent run-time query image identification. These methods can provide one-time protection for users who believe their images are already enrolled in a reference database. Furthermore, since labeled query images can also be added to the reference database, using these AFR tools at run-time can also help poison the reference feature vectors (see §VIII). However, current AFR proposals targeting this stage focus strictly on evasion and do not consider this joint evasion and poisoning possibility. • F 's goal is to identify the individual in the query image. • P's goal is to alter their query image so it doesn't match their database feature vector and thus cannot be identified. The assumption here is that F 's reference database contains accurate feature vectors of P. Adversarial perturbations have been the dominant method for evading DNN classification and consequently are relevant for evading FR. Due to the extremely high number of these techniques, we restrict our discussion to proposals explicitly designed to evade FR systems at run-time. We organize these proposals by their operational context: physical and digital. Physical evasion techniques. The first group of proposals injects adversarial perturbations into face images by having P wear them as physical objects. While these methods echo those described in §VI-A, they focus on thwarting image recognition or classification rather than face detection. Earlier proposals [106] , [105] use adversarial makeup and eyeglasses to cause incorrect classification by FR models. More recent proposals consider two other directions, either using larger but input-independent adversarial patches to boost the effectiveness of evasion [30] , or making the perturbation digitally controllable and/or much less perceivable by human eyes by projecting visible/infrared light onto user faces [112] , [108] , [29] . Here P digitally modifies their unlabeled (online) face images to prevent them from being accurately recognized by FR systems. Most proposals in this category apply traditional adversarial perturbation generation techniques to create minimally visible perturbations that cause F 's feature extractor to produce misleading feature vectors. Their generation process varies depending on the assumption of feature matching process: a shallow classification on the feature vector or nearest neighbor based vector matching [110] , [107] , [109] , [113] . More recent proposals propose methods designed to be more robust to real-world FR systems (i.e. joint optimization on multiple feature extractors, etc) [27] , [23] , [114] . Another recent proposal [111] uses a GAN to generate adversarial perturbations rather than applying the above mentioned optimization techniques. 
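A toy example helps illustrate why the reference-database attacks above work against nearest-neighbor matching. The numbers below are synthetic 128-dimensional vectors rather than real face features, and the "decoys" stand in (in greatly simplified form) for the coordinated protector images of FoggySight-style flooding.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k_labels(query, vectors, labels, k=5):
    """Labels of the k reference vectors closest (L2) to the query."""
    order = np.argsort(np.linalg.norm(vectors - query, axis=1))
    return [labels[i] for i in order[:k]]

# Synthetic reference database: P's true enrolled vector plus unrelated identities.
p_true = rng.normal(size=128)
others = rng.normal(size=(50, 128))
vectors = np.vstack([p_true, others])
labels = ["P"] + [f"other_{i}" for i in range(50)]

query = p_true + rng.normal(scale=0.05, size=128)   # a run-time photo of P
print("before flooding:", top_k_labels(query, vectors, labels))

# Flooding (FoggySight-style, simplified): cooperating users contribute decoy
# vectors that sit near P's feature region but carry other labels. The top-k
# neighborhood becomes dominated by decoys, so a matcher that aggregates over
# its nearest neighbors no longer returns a confident match for P.
decoys = p_true + rng.normal(scale=0.01, size=(20, 128))
vectors = np.vstack([vectors, decoys])
labels += [f"decoy_{i}" for i in range(20)]
print("after flooding: ", top_k_labels(query, vectors, labels))
```

LowKey instead perturbs P's own reference images so that the enrolled vectors sit far from P's run-time features; both approaches corrupt what the database "knows" about P rather than the query itself.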
In our discussion of current AFR tools, we consider the design space of AFR tools through the lens of the specific FR stages they disrupt. To date, all existing AFR proposals we analyzed have focused their design around disrupting a single stage in this framework. Assuming an AFR tool must disrupt some portion of the FR pipeline to be effective, we can map out and explore the design space of AFR tools using this framework. For researchers and practitioners in the AFR community, perhaps the most critical question is: "What are the benefits and limitations of AFR tools that target each specific stage in the framework?" Or, an alternative form of the question might be: "Given a set of prioritized properties for an AFR system, can I find the best stage(s) to disrupt in order to achieve them?" We attempt to answer these questions here, by first identifying a set of high-level properties that AFR tools can potentially optimize for, and then, for each property, discussing how targeting a given stage affects an AFR tool's ability to achieve it. Ultimately, we hope to provide a high-level roadmap that can guide the design of AFR tools with specific properties in mind. Note that while we consider each stage in isolation, it might be possible for an AFR tool to target multiple stages, possibly gaining a combination of benefits (and limitations).

When considering properties to guide the design of AFR tools, we assume that efficacy is a given priority. Our list of 5 properties targets additional considerations beyond basic efficacy, and includes desirable properties for efficacy (#1 and #2) and for minimizing dependencies and cost (#3, #4, #5):
1) Long-term robustness against evolving FR systems
2) Broad protection coverage: efficacy even for users with unprotected face images online
3) No reliance on 3rd parties: does strong protection require assistance from service providers or other users?
4) Minimal friction for user P: minimizing the cost for the user to deploy the AFR tool on a consistent basis
5) Minimal impact on other users: minimizing potential risks to non-users of the AFR tool

Next, we discuss the above properties in turn, and consider how easily each property can be achieved by AFR tools that target different operational stages in our framework. For each combination of property and target stage, we "quantify" how easily the desirable property can be achieved by an AFR tool designed to disrupt that stage: "achieved" means that the property has already been achieved by current AFR proposals targeting this stage; "promising" means that the property has good potential to be achieved by AFR designs targeting this stage; and "?" indicates that significant progress may be required to achieve this property by targeting this stage, and the likelihood of success is unknown.

Table VI: Evaluating AFR tools using five properties, where the tools are grouped by the FR stage they target.

Table VI provides an overview of our conclusions. For easy notation, we will use AFRk to refer to the group of AFR proposals that target FR stage k.

1) Long-term robustness against evolving FR systems. An effective AFR tool should provide strong and lasting protection against unwanted facial recognition. That is, it should protect a user P from unwanted FR from initial use, and extending into the future, even as FR systems continue to advance.

Achieved: None. While this principle is the main goal of AFR, none of the existing AFR tools (targeting any stage) is able to achieve this property. No current system provides strong protection against ever-evolving FR systems.
Promising: AFR1, AFR2, AFR4. Conceptually, P can achieve long-term robustness by consistently undermining the face data pipeline of F. AFR1 and AFR2 can both prevent any face image of P from being included in F's pipeline. AFR4 can corrupt F's understanding of any face images in the reference database. While promising, existing AFR tools fail to consistently prevent the inclusion of, or corrupt, all of P's images from both online and physical sources. It remains unclear if these groups of AFR tools can provide long-term robustness. AFR3 could be overcome over time as F switches to newer and different feature extractors. AFR5 offers only one-time protection, and does not address the scenario where query images get added to the reference database.

2) Broad protection coverage. Many of us already have an online presence, e.g., face photos posted years ago without AFR protection. An effective AFR proposal would ideally provide protection under the challenging but realistic scenario where P already has unprotected face images online.

Achieved: AFR5. AFR tools that rely on run-time evasion are not impacted by the existence of unprotected images online.

The presence of unprotected images complicates the protection of AFR4, since F has some ground-truth information about the correct features of P's face. However, the addition of protected images can slowly move P's stored features away from the correct features, and thus achieve protection. Moreover, several AFR tools [19], [21] propose a "group cloaking" idea where multiple users coordinate to achieve better protection for those with an existing online presence.

?: AFR1, AFR2, AFR3. These three groups of AFR tools focus on disrupting the (training) data pipeline of FR. As a result, they cannot protect P against an F who has already obtained unprotected images of P.

3) No reliance on 3rd parties. Ideally, an AFR tool can be operated by a user P alone, achieving strong protection without assistance or participation from a third party, whether a central content provider like Facebook or a friendly user willing to cooperate to help P. This is an abstract measure of the entity-level complexity required to operate the tool. Achieving this property has the added benefit of limiting exposure of potentially sensitive user photos or personal data to any 3rd party, i.e., the AFR tool is also privacy-preserving.

Achieved: AFR2, AFR4, AFR5. AFR tools in these three groups all rely on adding certain perturbations to face images, which can be done by P without assistance from other parties.

?: AFR1, AFR3. AFR1 tools seeking to prevent online data scraping rely on the assistance of image-sharing platforms. Similarly, disrupting the training of a feature extractor requires a coordinated effort across many users, since P contributes only a very limited subset of the training data.

4) Minimal friction for user P. This usability-related property measures what P needs to sacrifice in order to consistently apply the AFR tool. This property is motivated by the well-known finding that users prefer, and are more likely to use, protection solutions that introduce minimal friction into their daily life [137], [138].

AFR1, AFR2, AFR3, AFR4, AFR5: So far, existing AFR tools all introduce some level of "disruption" to P, whether it is adding visual noise, perturbations, or transformations to P's online photos that undermine their original purpose; requiring P to always wear odd makeup, clothes, or accessories; or purchasing more powerful computing hardware/services to implement the AFR tool against a continually evolving F.
More research efforts are needed to limit the amount and type of disruption to users.

5) Minimal impact on other users. This final property examines how the outcome of P's AFR protection would affect other users. Intuitively, P can protect themselves by forcing F to fail (give a null or uninformative result), or by intentionally tricking F into recognizing them as another person P'. Depending on the context, the latter may negatively affect P', producing potential social risks (see §XI-B for detailed discussions of social challenges facing AFR).

Achieved: AFR1, AFR2. These two groups of AFR tools focus on disrupting the data pipeline of F and thus have no impact on other users.

AFR3, AFR4, AFR5: These three groups of AFR tools seek to intentionally misclassify P's face as another user, and as a result, could potentially impact other users included in F's reference database.

In this section, we describe what we see as the major technical and broader social/ethical challenges facing future AFR development. Each challenge spans multiple properties and stages laid out in this paper. For each challenge, we provide context for why the challenge exists and, where possible, suggest ways to address it. Like §X, the challenges described here represent our best efforts to understand and systematize the AFR space. They are not exhaustive, and are meant as signposts rather than a comprehensive map for future research.

Our analysis shows that the majority of AFR proposals, especially those targeting stages 2-5, employ adversarial perturbations, which do not yet provide provable protection guarantees. In practice, the success rate of adversarial perturbations may drop significantly when P's knowledge of F is imperfect [139]. Many adversarial perturbation-based protections can also be circumvented by more advanced FR systems. For example, F could adversarially train the feature extractor [140], [141] to be more robust against adversarial examples, thus defeating AFR tools against stages 3 or 4. F could also remove adversarial perturbations from face images before processing them or adding them to the reference database [142], circumventing AFR tools that target stages 2 or 5. Improving adversarial perturbation generation methods may help increase the short-term efficacy of those AFR tools. However, the lack of provable, ongoing protection is a much tougher barrier to overcome. In order to provide reliable, ongoing protection, developers of AFR tools can consider two possible paths: (i) integrate provable guarantees into the perturbation generation process, or (ii) consider an alternative that provides guaranteed protection. For (ii), there are two potential directions. The first is to focus on attacking stage 1, where defeating FR does not require evading or poisoning a feature extractor. The second is to switch from "misleading" the feature extractor with "minor" image modifications to completely disabling the feature extraction and/or matching process.

Some AFR proposals (especially those targeting stage 4) implicitly or explicitly assume that users can start "from scratch" to protect their online persona. In practice, most Internet users today already have face images online, posted by themselves or others, and at least some of those images are already captured by FR databases. Over 1.8 billion photos are uploaded to online platforms daily [143], making it likely that one or more unmodified photos of a user P will end up online, with or without P's knowledge.
Given the widespread use of web scraping to collect FR reference images [91], [5], it is likely that at least one of these photos is already in a FR system reference database. This stark reality has two implications for future AFR research. First, AFR tools should be evaluated under the practical scenario where the FR system has access to both protected and unprotected online photos of P. While several AFR tools have provided such measurements (e.g., [19], [21]), many others have not. Second, we believe that AFR tools managed by online platforms will offer better protection of online footprints against FR systems than those executed by individual users. These platforms can protect photos of an individual posted by them or others, and are overall better positioned to deploy more powerful protection mechanisms. For example, online platforms could employ the group cloaking techniques proposed in Fawkes [19] or FoggySight [21] to corrupt reference databases composed of images from their sites. After images are scraped, online platforms could use provenance-tracking to re-identify stolen images, e.g., in the training dataset of a feature extractor, and enable exposure/prosecution of photo thieves [144], [145], [146]. All these methods ought to be accompanied by enhanced anti-scraping techniques to prevent large-scale scraping of face images, e.g., stricter rate limiting, access permissions, and scraping detection heuristics, to make it safer for individuals to have online footprints.

A related but distinct challenge faced by AFR systems is the permanence of face data. For better or for worse, most people have the same face their whole adult life (major plastic or reconstructive surgery excepted). Our faces may age, but they remain recognizable as uniquely "us" to most humans and FR systems [147]. The slow rate at which faces change is a major challenge for AFR tools. To be effective over the long term, these tools must conceal the same piece of static data (a face) from numerous adversaries over many years. Once F obtains P's protected face photo, they can try as many times as they want to break the protection [141]. If F ever succeeds, whether in 1 month or 1 year, they "win" and P loses, because modern FR systems only need one clean picture in the reference database to identify a person [63]. For example, Clearview.ai identified a person based on a single reference image in which the person's reflection appeared faintly in a mirror [5]. Clearly, the issue of face data permanence poses a significant challenge for AFR tool development.

One final technical challenge faced by AFR tool developers is the lack of transparency on how proprietary FR systems work in practice. This hampers AFR tool development and testing. Without access to proprietary FR systems, AFR researchers must do their best to glean a generic understanding of how FR systems work from public documents and academic papers, e.g., [65], [68]. While this may be sufficient to develop AFR tools that work well in the lab, it is likely impossible for researchers to perform comprehensive efficacy tests against proprietary systems. Furthermore, AFR tool developers have no knowledge of how or whether FR systems are actively working to overcome AFR systems. The 2020 global FR market was valued at 3.86 billion US dollars [148], so FR stakeholders have ample resources and personnel to quickly deploy changes as new AFR systems emerge.
Even passive improvements to FR systems, such as the arrival of new training methods or architectures, can overcome AFR protection and compromise user privacy [141]. Altogether, this lack of transparency means that AFR tools face an uphill battle in the fight against unwanted FR.

In addition to these technical challenges, AFR tools face broader social and ethical considerations. These stem from a variety of factors, including a lack of regulation, benefits of FR for the public good, and demographic disparities in FR systems.

Today, FR systems are generally unregulated and easy to deploy. Practically anyone with a powerful laptop and access to an image dataset could create a FR system. This democratization of FR has allowed 3rd-party FR systems like Clearview.ai, which rely on unauthorized data use [61], to flourish. As a result, it is extremely difficult (if not impossible) for individuals to know when and where FR systems are deployed and what they are capable of. This laissez-faire climate creates significant ambiguity as to when AFR tools can or should be deployed. For example, around the world, photos taken for official government purposes (e.g., driver's license and passport photos) are used as reference images in government FR systems aiding law enforcement officers, border control agents, and others [1], [48], [50], [3]. This government-sponsored FR may be unwanted but is not (necessarily) unauthorized under the status quo, and the legality of using AFR tools to thwart downstream FR when official driver's license photos are taken is ambiguous. Adding to the confusion, systems like Clearview are used by law enforcement [5], further blurring the distinction between unauthorized and unwanted FR and the appropriate use of AFR tools. As FR and AFR use increases, a clash over this issue seems almost inevitable.

Both privacy-sensitive citizens and criminals can use AFR tools. Law enforcement's use of facial recognition can benefit society in multiple ways, such as tracking and locating wanted criminals or lost children [149], [150]. Consequently, AFR tools applied by bad actors could ultimately harm the public good. The debate between privacy and national security plays out in numerous other tech domains, such as end-to-end encryption [151]. Legitimate claims can be made by both sides. AFR researchers must be mindful of this tension and the potential consequences of their work.

One ethical tension not yet explored in the current literature is the social effect of misidentifications caused by AFR tools. For example, if U uses an AFR tool and is misidentified by a recognition system as P, what outcome might this have for P? If U is engaging in illegal activity but P is arrested instead, the AFR tool could cause serious harm, both to P and to U's victim(s). The well-known bias of FR systems heightens this tension. Police departments routinely make rushed identification decisions based on partial results from facial recognition systems [81]. Furthermore, facial recognition systems misidentify people of color at higher rates [34], [152]. Recent work has found that AFR tools exhibit these same biases [153], [154]. The social impact of AFR misclassification requires urgent study.

As facial recognition (FR) continues to grow in scale and ubiquity, we expect anti-facial recognition tools to rise in popularity. There is an urgent need to think longitudinally about AFR tools, analyzing both their limits and their potential.
Our paper aims to fill this gap by providing both a framework for discussing AFR proposals and an assessment of the current state of AFR research. We find that current AFR tools possess some, but not all, of the traits needed to successfully defeat unwanted FR in the real world. Many existing proposals leverage adversarial perturbations to evade FR models, either in the preprocessing 2 or classification 5 stages. Such perturbations, while often effective in the short-term, lack long-term guarantees, and cannot fundamentally change FR system behavior in the future. Future AFR proposals may benefit from more exploration of designs that target stages 1 and 4 , which could provide wider-reaching protection. The perpetual lineup Twelve days in xinjiang: How china's surveillance state overwhelms daily life One month, 500,000 face scans: How china is using ai to profile a minority How russia is using facial recognition to police its coronavirus lockdown The secretative company that may end privacy as we know it Clearview ai's facial recognition app called illegal in canada Collection of biometric data from aliens upon entry to and departure from the united states Ongoing face recognition vendor test (frvt) America under watch Face recognition at the sales office Fears for children's privacy as delhi schools install facial recognition Amazon extends moratorium on police use of facial recognition technology Ibm ceo's letter to congress on racial justice reform Ban dangerous facial recognition technology that amplifies racist policing Facebook plans to shut down its facial recognition program Deepblur: A simple and effective method for natural image obfuscation Identitydp: Differential private identification protection for face images Fawkes: Protecting privacy against unauthorized deep learning models Unlearnable examples: Making personal data unexploitable Foggysight: a scheme for facial lookup privacy Lowkey: Leveraging adversarial attacks to protect social media users from facial recognition Preventing personal data theft in images with adversarial ml Making an invisibility cloak: Real world adversarial attacks on object detectors Fashion-guided adversarial attack on person segmentation Adversarial t-shirt! evading person detectors in a physical world Face-off: Adversarial face obfuscation Socialguard: An adversarial example based privacy-preserving technique for social images Adversarial light projection attacks on face recognition systems: A feasibility study Advhat: Real-world adversarial attack on arcface face id system Camera adversaria Can "conscious data contribution" help users to exert "data leverage" against technology companies? 
REFERENCES

The perpetual lineup
Twelve days in Xinjiang: How China's surveillance state overwhelms daily life
One month, 500,000 face scans: How China is using AI to profile a minority
How Russia is using facial recognition to police its coronavirus lockdown
The secretive company that may end privacy as we know it
Clearview AI's facial recognition app called illegal in Canada
Collection of biometric data from aliens upon entry to and departure from the United States
Ongoing face recognition vendor test (FRVT)
America under watch
Face recognition at the sales office
Fears for children's privacy as Delhi schools install facial recognition
Amazon extends moratorium on police use of facial recognition technology
IBM CEO's letter to Congress on racial justice reform
Ban dangerous facial recognition technology that amplifies racist policing
Facebook plans to shut down its facial recognition program
DeepBlur: A simple and effective method for natural image obfuscation
IdentityDP: Differential private identification protection for face images
Fawkes: Protecting privacy against unauthorized deep learning models
Unlearnable examples: Making personal data unexploitable
FoggySight: A scheme for facial lookup privacy
LowKey: Leveraging adversarial attacks to protect social media users from facial recognition
Preventing personal data theft in images with adversarial ML
Making an invisibility cloak: Real world adversarial attacks on object detectors
Fashion-guided adversarial attack on person segmentation
Adversarial T-shirt! Evading person detectors in a physical world
Face-Off: Adversarial face obfuscation
SocialGuard: An adversarial example based privacy-preserving technique for social images
Adversarial light projection attacks on face recognition systems: A feasibility study
AdvHat: Real-world adversarial attack on ArcFace Face ID system
Camera adversaria
Can "conscious data contribution" help users to exert "data leverage" against technology companies?
CV Dazzle: Camouflage from face detection
Gender shades: Intersectional accuracy disparities in commercial gender classification
Face recognition: Past, present and future (a review)
Surveillance state: How Gulf governments keep watch on us
How facial recognition is taking over a French city
Kenyan police launch facial recognition on urban CCTV network
Police facial recognition use in Belarus, Greece, Myanmar raises rights, data privacy concerns
Independent report on the London Metropolitan Police Service's trial of live facial recognition technology
China exports facial ID technology to Zimbabwe
Buenos Aires is using facial recognition system that tracks child suspects, rights group says
We are hurtling towards a surveillance state: The rise of facial recognition technology
Inside China's dystopian dreams: A.I., shame and lots of cameras
Malaysian police adopt Chinese AI surveillance technology
Facial recognition in schools: Systems deployed in Europe and the US amid privacy concerns
Facial recognition technology in schools: Critical questions and concerns
This Israeli face-recognition startup is secretly tracking Palestinians
System soon to identify risky goods
Biometric breakthrough: How CBP is meeting its mandate and keeping America safe
Facial recognition tech fights coronavirus in Chinese city
Resisting the rise of facial recognition
Apple sued in nightmare case involving teen wrongly accused of shoplifting, driver's permit used by impostor, and unreliable facial-rec tech
The retail stores you probably shop at that use facial-recognition technology
We went inside Alibaba's global headquarters. Here's what we saw
Major tech company using facial recognition to ID workers
In-car biometric technology for human interaction
Subaru Forester is first mainstream model to offer facial recognition technology
JetBlue will test facial recognition for boarding
Could facial recognition be the future of airport security? Delta Air Lines is testing it out
The world's scariest facial recognition company, explained
Joint face detection and alignment using multitask cascaded convolutional networks
ArcFace: Additive angular margin loss for deep face recognition
CosFace: Large margin cosine loss for deep face recognition
FaceNet: A unified embedding for face recognition and clustering
MagFace: A universal representation for face recognition and quality assessment
Facial recognition technology: Privacy and accuracy issues related to commercial uses
Facial recognition technology: Federal law enforcement agencies should better assess privacy and other risks
Azure face recognition
Amazon Rekognition
SenseTime
Build your own face recognition service using Amazon Rekognition
Inside the creepy and impressive startup funded by the Chinese government that is developing AI that can recognize anyone, anywhere
Inception-v4, Inception-ResNet and the impact of residual connections on learning
Garbage in, garbage out: Face recognition on flawed data
Huawei/Megvii Uyghur alarms
Game over: Chinese company deploys facial recognition to limit youths' play
Mapped: The state of facial recognition around the world
MS-Celeb-1M: A dataset and benchmark for large-scale face recognition
VGGFace2: A dataset for recognising faces across pose and age
A data-driven approach to cleaning large face datasets
Learning face representation from scratch
China state TV exposes wide illegal use of facial recognition cameras in commercial properties
The secretive company that might end privacy as we know it
PimEyes
Because there were cameras, I didn't ask any questions
Hong Kong protesters are using lasers to distract and confuse. Police are shining lights right back
Here's a way forward on facial recognition
Defund facial recognition
More than half of U.S. adults trust law enforcement to use facial recognition responsibly
Beyond face value: Public attitudes to facial recognition technology
Facial recognition: A cross-national survey on public acceptance, privacy, and discrimination
Has facial recognition technology been misused? A user perception model of facial recognition scenarios
Data leverage: A framework for empowering the public in its relationship with technology companies
The translucent patch: A physical and universal attack on object detectors
DeepPrivacy: A generative adversarial network for face anonymization
A systematical solution for face de-identification
Disrupting model training with adversarial shortcuts
Facilitating fashion camouflage art
Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition
Fast geometrically-perturbed adversarial faces
Invisible mask: Practical attacks on face recognition with infrared
Efficient decision-based black-box adversarial attacks on face recognition
Generating adversarial examples by makeup attacks on face recognition
AdvFaces: Adversarial face synthesis
VLA: A practical visible light-based attack on face recognition systems in physical world
On brightness agnostic adversarial examples against face recognition systems
Towards face encryption by generating adversarial identity masks
You are how you click: Clickstream analysis for Sybil detection
Robots welcome: Ethical and legal considerations for web crawling and scraping
Detection of web scraping using machine learning
Detection of web API content scraping: An empirical study of machine learning algorithms
Anti-scraping application development
Scraping by the numbers
"Data strikes": Evaluating the effectiveness of a new form of collective action against technology companies
Walmart's use of sci-fi tech to spot shoplifters raises privacy questions
How to protect your phone and identity at protests
How to 'disappear' on Happiness Avenue in Beijing
Towards deep learning models resistant to adversarial attacks
Towards evaluating the robustness of neural networks
EAD: Elastic-net attacks to deep neural networks via adversarial examples
Sparse adversarial attack to object detection
Generative adversarial nets
The algorithmic foundations of differential privacy
BadNets: Identifying vulnerabilities in the machine learning model supply chain
Targeted backdoor attacks on deep learning systems using data poisoning
Trojaning attack on neural networks
Transferable clean-label poisoning attacks on deep neural nets
Poison frogs! Targeted clean-label poisoning attacks on neural networks
Fishy faces: Crafting adversarial images to poison face authentication
Usage patterns of privacy-enhancing technologies
Privacy tradeoffs: Myth or reality
Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks
Oriole: Thwarting privacy against trustworthy deep learning models
Data poisoning won't save you from facial recognition
FaceGuard: A self-supervised defense against adversarial face images
A photo used to be worth a thousand words, but thanks to social media photos have lost their value
ML-Leaks: Model and data independent membership inference attacks and defenses on machine learning models
Membership inference attacks against machine learning models
Radioactive data: Tracing through training
Face verification across age progression using discriminative methods
Facial recognition market size, share and trends report
Pinellas County Sheriff's Office facial recognition program
International statement: End-to-end encryption and public safety
Robustness disparities in commercial face detection
Fairness properties of face recognition and obfuscation systems
Bias and fairness of evasion attacks in image perturbation