Fun Selfie Filters in Face Recognition: Impact Assessment and Removal
Cristian Botezatu, Mathias Ibsen, Christian Rathgeb, Christoph Busch
2022-02-12

This work investigates the impact of fun selfie filters, which are frequently used to modify selfies, on face recognition systems. Based on a qualitative assessment and classification of freely available mobile applications, ten relevant fun selfie filters are selected to create a database. To this end, the selected filters are automatically applied to face images of public face image databases. Different state-of-the-art methods are used to evaluate the influence of fun selfie filters on the performance of face detection (using dlib, RetinaFace, and a COTS method), sample quality estimation (using FaceQNet and MagFace), and recognition accuracy (employing ArcFace and a COTS algorithm). The obtained results indicate that selfie filters negatively affect face recognition modules, especially if they cover a large region of the face, i.e., the mouth, nose, and eyes. To mitigate such unwanted effects, a GAN-based selfie filter removal algorithm is proposed, consisting of a segmentation module, a perceptual network, and a generation module. In a cross-database experiment, the application of the presented selfie filter removal technique has been shown to significantly improve the biometric performance of the underlying face recognition systems.

In the recent past, the use of deep convolutional neural networks has achieved remarkable improvements in face recognition (FR) accuracy, surpassing human-level performance [1], [2]. Due to these breakthrough advances, FR technologies have become an essential tool for identity management systems and forensic investigations worldwide. In the latter application scenario, public content plays an important role, especially facial images from social media [3], [4], [5]. However, before sharing their face images on social media platforms, e.g., Facebook or Instagram, users frequently edit them to achieve a desired impact. Common editing tools include beautification filters, which may apply significant alterations to the facial shape and texture, e.g., by enlarging the eyes or smoothing the skin. Furthermore, so-called fun selfie filters are frequently applied by users for added amusement, as illustrated in Fig. 1, inducing severe alterations and occlusions in face images.

In a FR system, fun selfie filters applied to face images are expected to represent a challenge for various processing stages [6], [7]. For instance, a large coverage of the facial region by a fun selfie filter may hamper face detection or face sample quality estimation. In addition, biometric comparison scenarios where one of the face images to be compared has been altered with a fun selfie filter are expected to be challenging. However, to the best of the authors' knowledge, the impact of fun selfie filters on the performance of state-of-the-art FR systems has not been investigated yet.

Recently, deep learning techniques have been applied for the purpose of image inpainting. In particular, methods based on generative adversarial networks (GANs) have shown impressive results for removing facial occlusions [8], e.g., those caused by medical masks [9].
In order to perform well, such techniques usually require a large amount of realistic training data containing face images with and without the targeted occlusions. To the best of the authors' knowledge, the feasibility of GAN-based removal of fun selfie filters has not been investigated in the scientific literature so far.

In order to assess the impact of fun selfie filters on FR systems and mitigate their potentially negative effects, this work makes the following contributions:

• A qualitative assessment of fun selfie filters available in mobile application stores. Based on this assessment, ten highly relevant filters are identified and classified w.r.t. the induced face image alterations.
• The automated creation of a dataset generated from images of more than 1,000 subjects of the public FRGCv2 and FERET face image databases.
• A comprehensive evaluation of the impact of fun selfie filters on the performance of face detection, face sample quality, and FR, along with a detailed discussion of the obtained results.
• A selfie filter removal algorithm, created by adapting existing network architectures for segmentation and inpainting [9], [10], along with a detailed evaluation of FR performance before and after selfie filter removal.

The remainder of this paper is organised as follows: related works are briefly summarised in Sect. II. The creation of the selfie filter face image dataset is described in detail in Sect. III. Subsequently, the impact of selfie filters on different FR sub-systems is evaluated and discussed in Sect. IV. Sect. V introduces the novel GAN-based selfie filter removal, which is applied to the created dataset to evaluate the extent to which it mitigates the effects of selfie filters. Sect. VI concludes the work.

Facial occlusions challenge FR systems, which cope with the occlusion problem in three main ways [8]:

• Occlusion Robustness: apply a patch-based or learning-based feature extraction strategy to describe a feature space that is less affected by facial occlusions (e.g., [11]-[24]).

Many works have reported negative impacts of facial occlusions on FR, e.g., caused by sunglasses or face masks [8]. The common facial occlusions that challenge current state-of-the-art FR systems are listed in Tab. I. Similar to occlusions, strong makeup [25], [26], [27] or even facial tattoos and paintings [28] have been shown to negatively influence FR systems, especially in cases where significant parts of a face are covered with tattoos or paint. Additionally, due to the recent COVID-19 pandemic, the trend of wearing facial masks in public is growing all over the world. Some people wear masks to guard themselves from certain viruses or pollution, or simply to hide their face and emotions from the public. However, in some cases facial masks are used intentionally to deceive FR systems. Hence, many research activities focus on algorithms that increase FR performance when dealing with masks covering a large area of an individual's face [29], [30].

Focusing on selfie filters, the impact of beautification filters was first analysed in [31]. It was shown that the performance of a FR system may drop significantly if a beautification filter drastically alters the facial appearance. More recent works have shown that FR systems can be robust to moderate alterations resulting from the use of beautification applications, e.g., [32], [33], [34].
With respect to fun selfie filters, the Specs on Faces (SoF) dataset was introduced in [35] for the purpose of evaluating various tasks, e.g., face detection and gender prediction, in challenging environmental scenarios. This face database contains face images of 112 subjects to which two fun selfie filters have been applied. However, its small size and the fact that the facial images of said database were mostly captured in a single session make the SoF dataset less suitable for FR performance evaluations.

The large number of facial occlusion variations, as well as their possible random placement on the face, makes FR under occlusion a still unresolved issue. So far, fun selfie filters have not been considered as potential occlusions a FR system has to deal with. Hence, the impact of selfie filters on FR performance is assessed both in direct face comparison and following an occlusion recovery approach, using inpainting techniques to treat the occluded face as an image repair problem.

For an effective occlusion removal, it is important to accurately detect and segment the occluded facial regions. In recent years, various deep learning-based object detection and segmentation methods have been shown to obtain outstanding performance [36], [37]. Regions with convolutional neural networks (R-CNNs) [38], Fast R-CNNs [39], and Faster R-CNNs [40] are well-known for their state-of-the-art object detection performance. R-CNNs and Fast R-CNNs use selective search algorithms to extract candidate regions from an image, while Faster R-CNNs replace selective search with a learned region proposal network; the regions are fed to a CNN to produce a feature vector for each processed region. Subsequently, machine learning-based classifiers, e.g., support vector machines (SVMs), analyse the features extracted from each candidate region to determine the presence of the object. Despite their competitive detection performance, such approaches are computationally expensive.

The fully convolutional network (FCN) auto-encoder framework [41] is a well-performing approach for training an image segmentation network. The need for more robust object boundaries led to the development of U-Net [42], one of the most widely used end-to-end FCNs in image segmentation. The U-Net encoder uses a series of convolutions with max pooling layers, while the decoder uses transposed convolutions to upsample the encoded information. The encoder and decoder feature maps are concatenated to better learn the contextual information. For an accurate selfie filter segmentation, the current work adopts the idea presented in [9], using a U-Net architecture supplemented with a squeeze-and-excitation (SE) [43] block at the output of the first three layers of the encoder.

Deep learning-based algorithms have been effectively used to reconstruct occluded facial regions, e.g., [22]. Inpainting techniques, on the other hand, focus on reconstructing the occluded elements of the image, leaving FR out of consideration. Recovering details of facial features consistent with high-level image semantics is a challenging task, relevant to many FR scenarios, such as when a subject wears sunglasses [24] or a facial mask [9], or when there are other external facial occlusions [44], [10]. The purpose of inpainting is to reconstruct missing information in an image. Inpainting methods usually consider information from the whole image (i.e., low-level texture information and high-level semantic information). Traditional inpainting methods rely on low-level information to find the best corresponding patches from the unaltered regions of the same image [45], [46].
These methods work well for background completion and repetitive texture patterns. However, as a face image consists of many unique components, low-level features are of limited use for face inpainting tasks. Thus, the inpainting process needs to be carried out with high semantic confidence [47]. Facial inpainting (also referred to as face completion) methods have been found to improve FR performance on occluded face images [48]. Rapid progress in deep learning, in particular GANs, has inspired numerous studies on facial inpainting [49], [9]. Here, GANs are proposed to deal with both the low-level textural features and the high-level semantic features utilised for removing facial occlusions. In [9], several GAN-based image inpainting models, i.e., [50], [51], [52], [44], are benchmarked on real-world images, showcasing significant reconstruction capability.

To create the fun selfie filter database used in this work, a qualitative assessment of popular mobile applications for adding fun selfie filters was conducted. In addition, various filter styles that focus on occluding different facial regions were considered. Subsequently, the selected selfie filters were applied to 1,441 face images of the FRGCv2 [53] dataset. For this purpose, automated software emulating the chosen mobile applications was used, as illustrated in Fig. 2. The used subset of the FRGCv2 dataset contains good-quality face images, which allows analysing the sole impact of fun selfie filters on FR modules in the absence of quality-related factors [54], e.g., variations in pose or illumination.

To create an appropriate database of facial images with selfie filters, a total of ten selfie filters were selected from five different fun selfie filter mobile applications. The mobile applications were selected by performing a ranking based on the criteria in Tab. II. The scores were assigned based on available user reviews, as well as the authors' experience while using the applications. Tab. III shows the five selfie filter mobile applications that received the highest rankings and are used in this work. When investigating the impact of fun selfie filters on FR systems, it is of particular interest how the selfie filter coverage and placement affect the performance of the tested systems. The chosen selfie filters are depicted in Fig. 2. Following the criteria presented in Tab. II, a categorisation of fun selfie filters was performed based on facial coverage and placement of the selfie filter.

Coverage: The selfie filter coverage is quantified by focusing on the facial region polygon that is used as a mask on the original image, cropping the facial part of the image as illustrated in Fig. 3. This information is subsequently used to investigate the impact of the selfie filter based on its facial alteration, as well as to focus on specific elements that drive the eventual decrease in facial recognition performance. Eq. 1 defines the coverage score as the average pixel intensity variation due to the selfie filter, which is a stable and accurate way of estimating the significance of the filter. Using this metric, transparent selfie filters achieve a lower score than corresponding solid-colour selfie filters covering the same area. Additionally, smoothing, compression, and other effects which do not occlude facial attributes have little impact on the calculated coverage intensity score.
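Since Eq. 1 itself was lost in the text extraction, the following minimal Python sketch illustrates one way the described coverage score could be computed, assuming it is the mean absolute per-pixel difference between the filtered and original image within the facial region polygon, normalised to [0, 1]; function and variable names are illustrative, not the authors' implementation.

```python
import numpy as np

def coverage_score(original, filtered, face_mask):
    """Average pixel intensity variation caused by a selfie filter.

    original, filtered: uint8 RGB images of shape (H, W, 3).
    face_mask: boolean array of shape (H, W), True inside the facial
    region polygon used to crop the face (cf. Fig. 3).
    """
    diff = np.abs(filtered.astype(np.float32) - original.astype(np.float32))
    # Mean absolute difference over the facial region, across all channels,
    # normalised so that the score lies in [0, 1].
    return float(diff[face_mask].mean() / 255.0)
```

Under this formulation, a transparent filter changes pixel values only slightly and thus scores lower than a solid-colour filter covering the same area, matching the behaviour described above.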
After trying a wide variety of selfie filters provided by various mobile applications, the visual complexity of recognising the identity behind the selfie filter was categorised following the thresholds presented in Tab. IV. It is expected that the coverage score correlates with the actual difficulty of recognising the original face once the selfie filter is applied. Tab. IV highlights the main selfie filter groups, and Fig. 4 presents the scores for each of the ten samples.

The impact of fun selfie filters on face detection and sample quality is estimated for scenarios with different facial coverage measures. In the experiments on recognition performance, the most relevant scenario, where either one of the face images to be compared has been modified using a fun selfie filter, is considered. For the evaluations on recognition performance, the placement of fun selfie filters is additionally considered. In all evaluations, a comparison with a baseline of unaltered face images is made.

Fig. 6 presents the min-max normalized face detection scores, while Tab. V reports the actual scores, whose ranges differ across the used algorithms (in our case, the range of detection scores is [0; 4] for dlib, [0; 1] for RetinaFace, and [0; 5.4] for COTS). The min-max normalization is applied to ensure an equal scale when comparing the performance of the various algorithms, as defined in Eq. 2:

X'_i = R_min + (X_i - X_min)(R_max - R_min) / (X_max - X_min),    (2)

where R_min and R_max cover the desired range of the normalized data (i.e., in this case R_min = 0 and R_max = 1), X_i refers to the detection score of sample i, and X_min and X_max denote the minimum and maximum detection scores of the respective algorithm.
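As a small worked example of Eq. 2, the following sketch rescales raw detection scores to a common range; all names are illustrative.

```python
import numpy as np

def min_max_normalize(scores, r_min=0.0, r_max=1.0):
    """Min-max normalization of detection scores (Eq. 2)."""
    scores = np.asarray(scores, dtype=np.float64)
    x_min, x_max = scores.min(), scores.max()
    return r_min + (scores - x_min) * (r_max - r_min) / (x_max - x_min)

# Scores from algorithms with different native ranges, e.g., dlib in
# [0; 4] and COTS in [0; 5.4], become directly comparable:
print(min_max_normalize([0.0, 1.5, 4.0]))  # -> [0.0, 0.375, 1.0]
```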
As shown in Tab. V and Fig. 6, for dlib [60], RetinaFace [61], and a COTS method, the confidence scores of detected faces for selfie filtered images generally degrade as the selfie filter coverage increases. The used COTS method is particularly prone to detection errors when the face is covered at a higher intensity or the eye region is occluded. Such recognition systems are designed for constrained environments, where the subject is required to follow certain rules to increase the accuracy of the FR pipeline (e.g., in the border control scenario, every person follows a well-defined FR protocol and no unnecessary object is allowed to occlude the face).

For face sample quality assessment based on FaceQNet [62] and MagFace [63], results in the interval [0; 1] are shown in Fig. 7. In addition to the case where the face is fully altered by the selfie filter, medium coverage selfie filters also have a significant effect on face sample quality. Both FaceQNet and MagFace return consistently lower image quality scores as the selfie filter coverage increases. For MagFace, the comparatively high variance for high coverage selfie filters is mainly caused by relatively good face quality scores being attributed to face images to which the joker mask is applied. Hence, if the selfie filter highlights certain facial characteristics, the magnitude of the facial embedding increases while the face may still be significantly occluded.

Biometric recognition performance is measured in terms of the false non-match rate (FNMR) at a certain false match rate (FMR) [64], [65]. In addition, the failure-to-enrol rate (FTE) and the equal error rate (EER) are reported. For the ArcFace [66] and COTS systems, a higher selfie filter facial coverage results in a higher FNMR and EER; see the detection error tradeoff (DET) curves in Fig. 8 and Tab. VII. With respect to the placement of fun selfie filters, the impact of the altered facial region differs, but is especially challenging for the mouth region. Due to its more constrained target environment, the COTS system performs poorly in terms of FTE when the eyes are covered.

The facial coverage resulting from the application of fun selfie filters can range from almost non-existent to the extreme case of full coverage (cf. Fig. 4). As shown in the previous section, FR systems perform well in low facial coverage scenarios, while they are rather challenged by selfie filters that produce high facial coverage. This section introduces the proposed fun selfie filter removal method, which is an adaptation of the inpainting technique of [10]. As suggested in [9], the architecture is supplemented with a perceptual network [67], in the form of a fixed pre-trained VGG-19 network, to encourage the generator output to have feature representations similar to those of the ground truth. The segmentation module and perceptual network are inspired by [47]. As proposed in [47], a good way of improving GAN-based image inpainting accuracy is to separate the segmentation of the occluded image from the actual inpainting.

An overview of the architecture of the segmentation algorithm is depicted in Fig. 10. The output of the segmentation module is a binary map indicating the pixels covered by the selfie filter. The generator of the segmentation map is a modified version of the U-Net architecture [42] consisting of a CNN-based encoder and decoder: the encoder consists of five blocks, each comprising a convolution and a squeeze-and-excitation layer, followed by down-sampling of the input along its spatial dimensions via max pooling with kernel size 2 and stride 2; the decoder resembles the encoder architecture, except that max pooling is replaced by up-sampling layers and the convolution layers by deconvolution layers, with the last decoder layer using a sigmoid activation function. Local information is combined with global information by concatenating the output of the deconvolution layers with the encoder feature maps at the same level. As the loss function, the cross-entropy between the predicted binary map and the corresponding target map is used, with a post-processing step applying the image processing operations of erosion and dilation.

The goal of the inpainting module is to remove the selfie filter and reconstruct the facial characteristics covered by it in a way that is both structurally and appearance-wise consistent with the ground truth image. The main building blocks of the image inpainting module are the generator, the discriminator, and the perceptual network.

Generator: The generator has the same encoder and decoder architecture as the generator of the segmentation map, with the addition of gated convolutions, which provide a dynamic feature selection mechanism for each channel and spatial location. Fig. 13 depicts the used architecture, where each convolution is distinctively marked based on its type (e.g., gated, dilated gated, or normal convolution). Overall, the GAN model takes as input the original or selfie filtered image together with the corresponding selfie filter binary segmentation and passes it to the first generator network. Once the coarse output is derived, it is passed through a refinement network for improved inpainting.
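The two characteristic building blocks named above, the squeeze-and-excitation layer in the segmentation U-Net and the gated convolution in the inpainting generator, can be sketched in PyTorch as follows. This is a minimal illustration assuming the standard formulations of SE [43] and gated convolution [10]; layer sizes are illustrative rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation [43]: channel-wise feature recalibration."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = x.mean(dim=(2, 3))             # squeeze: global average pooling
        w = self.fc(w)[:, :, None, None]   # excitation: per-channel weights
        return x * w                       # recalibrate the feature maps

class GatedConv2d(nn.Module):
    """Gated convolution [10]: a learned soft mask selects features
    dynamically for each channel and spatial location."""
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        pad = dilation * (kernel_size // 2)
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size,
                                 padding=pad, dilation=dilation)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=pad, dilation=dilation)

    def forward(self, x):
        return torch.relu(self.feature(x)) * torch.sigmoid(self.gate(x))
```

In the encoder described above, each of the five blocks would pair a convolution with an SEBlock before max pooling; in the inpainting generator, GatedConv2d takes the place of the normal and dilated convolutions.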
The refined inpainting together with the selfie filter binary segmentation is the input to the fully convolutional discriminator and to the perceptual network. As a result, the GAN loss is computed and the training proceeds until the targeted number of iterations is reached.

Discriminator: For training the free-form image inpainting network, a fully convolutional discriminator architecture is used. As indicated earlier, the network is inspired by global and local GANs [50], MarkovianGANs [68], and perceptual loss [69]. A network of six strided convolutions with kernel size 5 and stride 2 is used as the discriminator. Additionally, a GAN loss is applied to each element of the resulting feature map, effectively formulating h × w × c GANs focusing on different locations and different semantics of the input image.

Perceptual network: The third component of the image inpainting module presented in Fig. 13 is a perceptual network, in the form of a fixed pre-trained VGG-19 network [67] with a perceptual loss [69], which penalizes outputs that are perceptually not reasonable by defining a feature-level distance measure between the intermediate feature maps of the reconstructed image and its original counterpart. The purpose of this network is to encourage the generator's output to be similar to the original image. As an optimization, [9] suggests exploiting the intermediate convolution layer feature maps of the VGG-19 network to obtain rich structural information. This is expected to help in recovering a plausible structure of the face semantics.

The overall generator loss function, L, is defined as:

L = λ_rc_coarse · L_rc(coarse) + λ_rc_refined · L_rc(refined) + λ_perc · L_perc + λ_G · L_G,

where λ_rc_coarse = 30, λ_rc_refined = 70, λ_perc = 50, and λ_G = 0.7. L_rc = L_H + L_SSIM is calculated based on the coarse and refined outputs, as presented in Fig. 13. L_H uses the mean squared error (MSE) if the absolute element-wise error falls below one, and the l1-distance otherwise. Its combination with L_SSIM ensures that the resulting image resembles its target, being also similar in terms of the structural similarity index (SSIM). L_perc refers to the perceptual loss, while L_G, the adversarial generator loss, captures the MSE loss given the discriminator's output for the refined image and the target image. Additionally, the discriminator loss function, L_D, is based on L_MSE, the MSE between the input and target tensor.

For both the generator and the discriminator, the Adam optimizer [70] is applied with an initial learning rate of 0.001 that is adjusted by a factor of 0.1 every 50,000 training iterations. To assess the GAN model's generalizability, seven selfie filters are chosen for training and validation, while the remaining three are used for testing, as shown in Fig. 11.
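The loss terms and optimizer settings described above can be combined as in the following PyTorch sketch. It is a hedged illustration: the SSIM implementation (pytorch_msssim), the VGG-19 cut point, and the LSGAN-style form of the adversarial term L_G are assumptions, not the authors' exact code.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim            # pip install pytorch-msssim
from torchvision.models import vgg19

# Fixed, pre-trained VGG-19 feature extractor for the perceptual loss
# [67], [69]; cutting at an intermediate layer is an assumption.
_vgg = vgg19(weights="IMAGENET1K_V1").features[:21].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def reconstruction_loss(output, target):
    """L_rc = L_H + L_SSIM: a Huber term (MSE below an absolute
    element-wise error of one, l1 otherwise) plus an SSIM term."""
    l_h = F.smooth_l1_loss(output, target)
    l_ssim = 1.0 - ssim(output, target, data_range=1.0)
    return l_h + l_ssim

def perceptual_loss(output, target):
    """Feature-level distance on intermediate VGG-19 feature maps."""
    return F.l1_loss(_vgg(output), _vgg(target))

def generator_loss(coarse, refined, target, d_refined):
    """Overall generator loss L with the weights stated in the text."""
    return (30.0 * reconstruction_loss(coarse, target)     # lambda_rc_coarse
            + 70.0 * reconstruction_loss(refined, target)  # lambda_rc_refined
            + 50.0 * perceptual_loss(refined, target)      # lambda_perc
            + 0.7 * F.mse_loss(d_refined,                  # lambda_G
                               torch.ones_like(d_refined)))

# Adam with initial lr 0.001, decayed by a factor of 0.1 every
# 50,000 iterations, as stated above:
# opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
# sched = torch.optim.lr_scheduler.StepLR(opt, step_size=50_000, gamma=0.1)
```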
The overall image inpainting process is directly related to the quality of the selfie filter segmentation. Hence, for enhanced accuracy and generalizability, the training of the segmentation module should be done on a wide variety of selfie filters. Given the limited number of considered popular selfie filters (Fig. 11), it becomes rather challenging to generalize well to unknown selfie filters occluding unknown facial images. Therefore, data augmentation was applied. The implemented data augmentation method, for which examples are shown in Fig. 12, consists of three steps: 1) identify the facial region based on its landmarks; 2) divide the identified region into the desired number of non-overlapping subregions; 3) place random shapes (of random colour and intensity) on a subset of subregions, such that only one shape is attributed to a subregion and its size fits the target subregion exactly (a minimal sketch is given below).
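The following sketch of the three augmentation steps assumes facial landmarks are already available (e.g., from dlib's 68-point detector) and simplifies the random shapes to filled rectangles and ellipses; all names are illustrative.

```python
import random
import numpy as np
import cv2

def augment_with_shapes(image, landmarks, grid=(4, 4), n_shapes=5):
    """Three-step augmentation: facial region from landmarks,
    non-overlapping subregions, one fitted random shape per subregion."""
    # 1) Identify the facial region via the bounding box of its landmarks.
    x, y, w, h = cv2.boundingRect(np.asarray(landmarks, dtype=np.int32))
    # 2) Divide the region into a grid of non-overlapping subregions.
    rows, cols = grid
    cells = [(x + c * w // cols, y + r * h // rows, w // cols, h // rows)
             for r in range(rows) for c in range(cols)]
    out = image.copy()
    # 3) Place one random shape (random colour and intensity), sized to
    #    fit exactly, on each subregion of a random subset.
    for cx, cy, cw, ch in random.sample(cells, n_shapes):
        colour = tuple(int(v) for v in np.random.randint(0, 256, 3))
        if random.random() < 0.5:
            cv2.rectangle(out, (cx, cy), (cx + cw, cy + ch), colour, -1)
        else:
            cv2.ellipse(out, (cx + cw // 2, cy + ch // 2),
                        (cw // 2, ch // 2), 0, 0, 360, colour, -1)
    return out
```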
In a cross-database experiment, the training is performed on the FERET dataset, while the selfie filter removal is applied to the FRGC dataset. The training data comprise selfie filtered images (Fig. 11) based on the FERET references and 7,870 semi-synthetically created images with shapes (Fig. 12) based on the FERET probes. The model is trained for 70 epochs (a good trade-off between training time and performance), using the Adam optimizer as a replacement for plain stochastic gradient descent. Fig. 14 presents the evolution of the GAN-based selfie filter removal training, where samples exceeding 500,000 iterations do not differ much from the performance attained just after 500,000 iterations. However, fine-tuning the generator and the discriminator over more iterations was seen to be very important when testing the selfie filter removal. Fig. 15 highlights the selfie filter removal performance on unseen selfie filters over a set of pre-trained weights.

Having trained the GAN-based selfie filter removal model for at least 500,000 iterations, following the train and test split presented in Fig. 11, a comparison with the original unaltered counterparts is performed. Based on the Peak Signal-to-Noise Ratio (PSNR) [71] and Mean Structural Similarity Index (MSSIM) [72] scores, Tab. IX indicates an overall higher similarity between the FRGC reference and the image from which the selfie filter has been removed than between the reference and the selfie filtered counterpart. PSNR captures the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the fidelity of its representation. MSSIM is a perception-based model that considers image degradation as a perceived change in structural information, where pixels have strong inter-dependencies, especially when they are spatially close. A higher PSNR or MSSIM score indicates a higher degree of similarity between the compared images.

To quantify the benefit of applying the proposed removal method, the FR performance is re-evaluated after removing the selfie filters. The corresponding DET curves are plotted for selfie filters of various facial coverages (Fig. 4) and affected facial regions (Fig. 5). Comparing the results in Fig. 16 and Tab. X with their selfie filtered counterparts, the EER has mostly decreased, improving FR performance across facial coverages and placements. The only exception are the low coverage selfie filters, where, due to some facial artefacts remaining after removal, the EER is slightly higher than for the selfie filtered counterpart, while the overall FR performance remains superior. The high coverage selfie filters have benefited the most from the proposed selfie filter removal approach, achieving a 15.30 percentage point lower EER for ArcFace (Fig. 16a) and a 16.10 percentage point lower EER for COTS (Fig. 16b) compared to the selfie filtered variant (Fig. 18). For medium facial coverage selfie filters, despite a realistic reconstruction of the face, the challenge of approximating facial elements, of the mouth region in particular, has not allowed for a significant gain in FR performance (Fig. 19).

In the case of low coverage selfie filters, facial images before and after selfie filter removal maintain a high visibility of the original facial characteristics (Fig. 19). Despite the relatively accurate selfie filter removal, the resulting images may vary in brightness or still contain selfie filter related artefacts. In the case of total facial coverage, only some facial elements can be approximated, while the removal still leads to enhanced FR performance. The application of the selfie filter removal when the entire face is occluded yields the highest FR performance enhancement, but it can also improve performance in other scenarios, as illustrated, for instance, in Fig. 20 and Tab. XIII. In addition to reducing the EER, the selfie filter removal improves the FTE. Given that the COTS system has been shown to be vulnerable to selfie filters covering the eye region, the selfie filter removal has reduced the corresponding FTE by 19.87 percentage points. Furthermore, the selfie filter removal has reduced the FTE on …

Fun selfie filters are frequently used to modify selfies, e.g., prior to sharing them on social media. Alterations and occlusions that are added to face images by applying such fun selfie filters represent a challenge for FR systems. The results obtained in this work have shown that fun selfie filters may negatively impact commercial and open-source FR modules. Across face detection, sample quality estimation, and FR, this is especially the case for facial images with high selfie filter facial coverage and for fun selfie filters that cover the mouth and nose. Furthermore, for the used COTS system, eye coverage has a high correlation with an increased FTE. To tackle the above challenge, a selfie filter removal algorithm has been proposed. The proposed GAN-based method was shown to reduce the negative effects caused by the selfie filter when removing it prior to FR.

References

DeepFace: Closing the Gap to Human-Level Performance in Face Verification
A survey on deep learning based face recognition
Police embrace social media as crime-fighting tool
Detroit police department weekly report on facial recognition
How police monitor social media to find crime and track suspects
Handbook of Digital Face Manipulation and Detection: From DeepFakes to Morphing Attacks
Digital face manipulation in biometric systems, in: Handbook of Digital Face Manipulation and Detection: From DeepFakes to Morphing Attacks, ser. Advances in Computer Vision and Pattern Recognition
A survey of face recognition techniques under occlusion
A novel GAN-based network for unmasking of masked face
Free-form image inpainting with gated convolution
Face recognition with local binary patterns
Learning multi-scale block local binary patterns for face recognition
Local Gabor binary pattern histogram sequence (LGBPHS): a novel non-statistical model for face representation and recognition
A robust elastic and partial matching metric for face recognition
Deep face recognition: A survey
Face occlusion detection based on multi-task convolution neural network
Development of partial face recognition framework
Multiscale representation for partial face recognition under near infrared illumination
Partial face recognition: Alignment-free approach
Glasses removal from facial image using recursive error compensation
Robust face recognition via sparse representation
Robust LSTM-autoencoders for face de-occlusion in the wild
Recognizing faces with partial occlusion using inpainting
Unsupervised eyeglasses removal in the wild
Can facial cosmetics affect the matching accuracy of face recognition systems?
Impact of facial cosmetics on automatic gender and age estimation algorithms
Impact and detection of facial beautification in face recognition: An overview
Impact of facial tattoos and paintings on face recognition systems
Masked face recognition challenge: The InsightFace track report
MFR 2021: Masked face recognition competition
On the effects of image alterations on face recognition accuracy
Detecting facial retouching using supervised deep learning
PRNU-based detection of facial retouching
Differential detection of facial retouching: A multi-biometric approach
AFIF4: Deep gender classification based on AdaBoost-based fusion of isolated facial features and foggy faces
A survey of deep learning-based object detection
Image segmentation using deep learning: A survey
Rich feature hierarchies for accurate object detection and semantic segmentation
Fast R-CNN
Faster R-CNN: Towards real-time object detection with region proposal networks
Fully convolutional networks for semantic segmentation
U-Net: Convolutional networks for biomedical image segmentation
Squeeze-and-excitation networks
Interactive removal of microphone object in facial images
Simultaneous structure and texture image inpainting
PatchMatch: A randomized correspondence algorithm for structural image editing
An improved method for semantic image inpainting with GANs: Progressive inpainting
Does generative face completion help face recognition?
Occlusion-aware face inpainting via generative adversarial networks
Globally and locally consistent image completion
Generative image inpainting with contextual attention
EdgeConnect: Structure guided image inpainting using edge prediction
Overview of the face recognition grand challenge
Face image quality assessment: A literature survey
Sweet Face Camera
YouCam Fun
Bloom Camera
Dlib-ml: A machine learning toolkit
RetinaFace: Single-shot multi-level face localisation in the wild
FaceQNet: Quality assessment for face recognition based on deep learning
MagFace: A universal representation for face recognition and quality assessment
ISO/IEC 2382-37:2012 Information Technology - Vocabulary - Part 37: Biometrics, Int'l Organization for Standardization
ISO/IEC 19795-1 Information Technology - Biometric Performance Testing and Reporting - Part 1: Principles and Framework, Int'l Organization for Standardization
ArcFace: Additive angular margin loss for deep face recognition
Very deep convolutional networks for large-scale image recognition
Image-to-image translation with conditional adversarial networks
Perceptual losses for real-time style transfer and super-resolution
Adam: A method for stochastic optimization
Hardware-constrained hybrid coding of video imagery
Image quality assessment: from error visibility to structural similarity

Acknowledgments

This research work has been partially funded by the German Federal Ministry of Education and Research and the Hessian Ministry of Higher Education, Research, Science and the Arts within their joint support of the National Research Center for Applied Cybersecurity ATHENE, and the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 860813 - TReSPAsS-ETN.