key: cord-1012746-gzpfbauk
authors: Hosni Mahmoud, Hanan A.; Mengash, Hanan Abdullah
title: A novel technique for automated concealed face detection in surveillance videos
date: 2020-06-12
journal: Pers Ubiquitous Comput
DOI: 10.1007/s00779-020-01419-x
sha: 91c28587a88a9a81ff1b3198217d82ee38f4f348
doc_id: 1012746
cord_uid: gzpfbauk

Face detection is of great importance in surveillance and security applications. Face recognition is the technique used to identify a person after face detection. Extensive research has been done on these topics. Another important research problem is the detection of concealed faces, especially in high-security places like airports or crowded places like concerts and shopping centres, where they may pose a security threat. Also, to help prevent the spread of Coronavirus, people should wear masks during the pandemic, especially at the entrances to hospitals and medical facilities. Surveillance systems in medical facilities should issue warnings against unmasked people. This paper presents a novel technique for concealed face detection based on complexion detection to challenge a concealed face assumption. The proposed algorithm first determines the existence of a human being in the surveillance scene. The head and shoulder contour is then detected, and the face is clustered into patches. The presence or absence of human skin is then determined. We propose a hybrid approach that combines the normalized RGB (rgb) and YCbCr color spaces. The technique is tested on two datasets: the first contains 650 images of skin patches, and the second contains 800 face images. The algorithm achieves an average detection rate of 97.51% for concealed faces. It also achieves a run time comparable with existing state-of-the-art concealed face detection systems that run in real time.

In this research, we are concerned with two concepts: the first is face detection in surveillance videos and the second is concealed face identification. Face detection is concerned with discovering faces in a video or in video frames and, if a face is discovered, noting its location in the image. The challenges of face detection techniques are numerous. Face pose in a video frame can vary because it depends on the position of the camera relative to the face. The camera view can be frontal, inclined, or profile. Also, faces can be concealed partially or totally due to innocent causes such as beards or glasses, or due to threatening ones such as a mask. Another problem that face detection systems may encounter is occlusion, where faces are partially concealed by other objects in the video frame. Lighting conditions and camera characteristics can also distort the appearance of a face. To handle these complications, researchers have proposed different techniques. Robust face recognition from multi-view videos is proposed by Du, Sankaranarayanan, and Chellappa [10]. Advanced face detection techniques can handle adverse conditions such as lighting settings and profile angles. Nowadays, techniques utilize neural networks and skin color identification. Skin detection using a color processing mechanism was proposed by Wu et al. [43]. Skin color detection utilizing a neural network was proposed by Kim, Hwang, and Cho [24]. This algorithm achieved a fast face detection time. A shearlet neural network for face detection using one sample per person was presented by Borgi et al. [3]. Ejbali, Zaied, and Ben Amar [11] implemented a face recognition model based on elastic graph matching and skin segmentation. A literature survey of face recognition is presented by Zhao, Chellappa, Phillips, and Rosenfeld [49]. Filali et al. [12] introduced texture classification of melanoma skin cancer utilizing an efficient convolutional neural network. Chai, Shan, Chen, and Gao [5] utilized locally linear regression for pose-invariant face recognition. Khan and Khan [23] produced pioneering algorithms with high reliability for face localization in complex images.

Identification of the skin area is one of the well-known techniques for first identifying faces in images or video frames. Distinguishing the skin area reduces the time complexity of face detection algorithms. In contrast, we want to detect covered faces, so we propose an exclusion algorithm: first, we identify faces in frames by applying head and shoulder identification; second, if we can detect movement in the video, we can locate the face; we then exclude the concealed face assumption if skin complexion is detected.

Our research presents a technique for face detection based on skin color segmentation using a hybrid color model that combines the normalized RGB and YCbCr models. This paper combines skin segmentation and facial features to detect faces in images. We found that using skin segmentation improves the accuracy of the algorithm and gives better results than other approaches. This paper is organized as follows: Section 2 presents background on face detection in general and in constrained environments. Section 3 describes the proposed technique. Section 4 describes the proposed hybrid approach for skin detection. Section 5 gives experimental results, while Section 6 summarizes the conclusions.

A face shape prototype with local adaptive morphology handling is presented by Liu, Guo, Liu, Lee, and Yao [29] as an alignment standard to overcome geometric distortion artifacts due to various poses. Du, Hu, Qiao, and Pitas [9] proposed a robust face recognition system utilizing low-rank sparse representation. Zaman, Shafie, and Mustafah [46] presented a facial recognition system that is robust against different expressions and partial occlusions. Jin, McCann, Froustey, and Unser [21] proposed a deep convolutional neural network to resolve ill-posed inverse imaging problems.

Zhu, Mai, and Shao [50] utilized a color attenuation prior for haze removal from hazy images. They introduced a linear model for scene depth, trained with a supervised learning method, from which the depth information can be recovered. Yu, Bampis, Gupta, and Bovik [45] introduced a two-step image quality estimation methodology, which is important in any image prediction system. According to Fu, Xu, Li, Liu, Ye, and Zhu [13], crowd density estimation can be carried out using convolutional neural networks, which is very helpful in surveillance systems for detecting and estimating the number of people in crowds. Wang, Wang, Wang, Zhang, and Qiao [42] proposed scene recognition utilizing local patches. Wu, Lin, Dong, Yan, Bian, and Yang [44] utilized a one-example methodology for person re-identification through progressive learning.

Chen, Papandreou, Kokkinos, Murphy, and Yuille [6] presented semantic image segmentation utilizing deep convolutional nets; the proposed methodology performed very well in image segmentation that utilizes semantic information. Zhang K., Zhang Z., Li Z., and Qiao Y. [48] proposed a framework built on a cascaded architecture of deep convolutional networks. They reported high performance in predicting face location in a coarse-to-fine fashion. Badrinarayanan, Kendall, and Cipolla [2] also proposed an image segmentation technique using a neural network. Li et al. [26] likewise utilized semantic information to solve the salient object identification problem; they modelled the semantic attributes of salient objects and utilized a convolutional neural network with raw images as input and saliency maps as output. Romera, Álvarez, Bergasa, and Arroyo [36] also focused on extracting semantic information to achieve semantic segmentation in real time. Although Gao, Li, Woo, and Tian [14] focused on image segmentation of thermography imaging using genetic algorithms, their proposal can be extended to normal images. Liu, Xiao, and Yang [28] focused on edge detection with a coastline detection algorithm that mixes region-based and edge-based active contour methods.

Real-time operation is crucial to the success of surveillance systems. Awais M. et al. [1] proposed a video surveillance system with enhanced accuracy and lower computational complexity. Their system comprises face localization and recognition, and it takes real-time videos of faces. The system then extracts key frames and compares them with stored facial images. It uses histogram of oriented gradients (HOG) features. Their simulation results show a success rate of almost 92%, which is comparable with deep learning approaches, while deep learning techniques have a higher computational cost. Ullah H. et al. [40] presented a new real-time face recognition technique that handles occlusion. The system utilized 68 points to detect the face in the input image. Linear discriminant techniques are then utilized to extract face features. At the last stage, a classifier with nearest-center measures is used. Experimental results show that the system runs in real time. Haq M. et al. [17] proposed a novel technique to boost the performance of low-resolution face recognition. Many other articles have studied high-performance face recognition, such as Zhang J. et al. [47], who designed a high-performance face recognition system that utilizes edge computing.

Although we surveyed many face detection and recognition techniques, the systems most relevant to our research are those that recognize masked faces or faces under occlusion. Qezavati H., Majidi B., and Manzuri M. T. [34] introduced a methodology for the detection of partially covered faces. It is applied to surveillance videos containing partially concealed faces, including faces with headscarves and eyeglasses. The methodology combines Haar and binary histogram features for face classification.

Rajeshwari, Karibasappa, and GopalKrishna [35] surveyed face detection based on skin detection. Liao, Jain, and Li [27] addressed problems in face detection with no prior constraints. They utilized normalized pixel difference image features, which are motivated by experimental psychology. Bu W. et al. [4] presented a novel cascaded CNN (convolutional neural network) framework to detect masked faces. They also constructed a dataset of masked faces. Ge S. et al. [15] proposed LLE-CNNs for occluded face detection. Pre-trained CNNs are utilized to extract facial regions from the image and express them with descriptors. These descriptors are converted into similarity-based measures. They tested the system on a large pool of synthesized faces, occluded faces, and non-faces. Ghiasi G. and Fowlkes C.C. [16] presented a hierarchical deformable face detection model. They represented occlusions in a structured model. They also enhanced the training data with synthetically occluded face images. Nair A. and Potgantwar A. [32] proposed automated masked person detection with reduced processing time.

In a recent study, Ud Din N. et al. [39] proposed mask object removal in face images. They faced challenges because facial masks usually cover a large part of the face, and because of the lack of training datasets of face images with and without masks. They introduced a solution for mask detection. They also utilized a generative adversarial network (GAN) with two discriminators.

The proposed technique is utilized in surveillance systems. It aims to detect concealed faces in surveillance images or videos. It comprises several algorithms, starting with an algorithm that takes images of the surveillance scene under different conditions. New objects that enter the scene under surveillance are detected. An object is identified as a human being using height and width measurement extraction. If a human being is detected, the face and shoulder areas are extracted utilizing patterns learned in a training phase. The face is clustered into patches, and each patch is determined to be either a skin patch or a concealed face patch. A human complexion detector using a hybrid technique combining the normalized RGB and YCbCr models is proposed. An overview of the proposed technique is depicted in Fig. 1. The head and shoulder detection algorithm is depicted in Fig. 2. Algorithm 1 depicts the training phase for head and shoulder detection. Algorithm 2 is utilized to determine the head and shoulder for an unknown image. Algorithm 3 depicts the face detection algorithm. The RGB color space is mainly utilized for digital images, as explained by Cheng, Liu, and Haifeng [7]. There is a high correlation between the RGB components, which creates sensitivity. To solve the sensitivity issue, each component of the RGB model should undergo a normalization process, giving the normalized RGB color space described by Loesdau et al. [30]. The normalization in Eqs. 1, 2, and 3 helps to reduce the dependency between the RGB components.
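Eqs. 1, 2, and 3 are not reproduced in this text. As a minimal sketch, assuming they follow the common definition of normalized RGB (each channel divided by the sum of the three channels), the normalization could look as follows. The function name `normalize_rgb` and the handling of pure black pixels are illustrative assumptions, not taken from the paper.

```python
def normalize_rgb(R, G, B):
    """Return chromaticity coordinates (r, g, b), which sum to 1.

    Assumes the common definition r = R / (R + G + B), and likewise
    for g and b; the paper's Eqs. 1-3 may differ in detail.
    """
    total = R + G + B
    if total == 0:
        # Pure black has no defined chromaticity; equal shares are an
        # arbitrary choice made for this sketch.
        return (1.0 / 3, 1.0 / 3, 1.0 / 3)
    return (R / total, G / total, B / total)
```

Under this definition, a pixel and a uniformly darker or brighter copy of it map to the same (r, g, b), which is one way to see how normalization reduces the dependency on overall intensity.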

The HSV model (hue, saturation, value) is described by Jang and Ra [20] as a perceptual model that separates luminance and chrominance components. The H, S, and V components are given in Eqs. 4, 5, and 6.
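Eqs. 4, 5, and 6 are not reproduced in this text. As a hedged illustration of the HSV conversion, the sketch below uses `colorsys.rgb_to_hsv` from the Python standard library, which may differ in scaling from the paper's equations. The wrapper name `rgb8_to_hsv` and the choice to report hue in degrees are assumptions made for this example.

```python
import colorsys


def rgb8_to_hsv(R, G, B):
    """Convert 8-bit RGB values to (hue in degrees, saturation, value).

    colorsys works on floats in [0, 1] and returns h, s, v in [0, 1];
    hue is rescaled to degrees here for readability.
    """
    h, s, v = colorsys.rgb_to_hsv(R / 255.0, G / 255.0, B / 255.0)
    return (h * 360.0, s, v)
```

In this representation, only the value component changes when a pixel is uniformly darkened, while hue and saturation stay the same.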

Another model for encoding RGB images is YCbCr, as introduced by Lei et al. [25]. This model utilizes a linear transform to separate illumination and chrominance components. It is very effective for the detection of human skin. YCbCr is calculated as follows in Eqs. 7, 8, and 9.

$$Y = 0.299\,R + 0.587\,G + 0.114\,B \tag{7}$$
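Eqs. 8 and 9 are not reproduced in this text. The following sketch therefore assumes the widely used full-range ITU-R BT.601 (JFIF) coefficients with an offset of 128 on the chroma channels; the paper's exact constants may differ. The function name `rgb_to_ycbcr` is illustrative.

```python
def rgb_to_ycbcr(R, G, B):
    """Convert 8-bit RGB to (Y, Cb, Cr) with full-range BT.601 weights.

    The chroma coefficients and the +128 offset follow the JFIF
    convention; they are an assumption, not the paper's Eqs. 8-9.
    """
    Y = 0.299 * R + 0.587 * G + 0.114 * B
    Cb = 128.0 - 0.168736 * R - 0.331264 * G + 0.5 * B
    Cr = 128.0 + 0.5 * R - 0.418688 * G - 0.081312 * B
    return (Y, Cb, Cr)
```

With these weights, a neutral gray pixel has both chroma values at the 128 midpoint, so only Y carries its brightness. This is the separation of illumination from chrominance that the model is used for.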

We simulated the three models using 100 labelled images of skin and non-skin patches; 70 patches were skin and 30 patches were non-skin. The computed confusion matrices are given in Tables 1, 2, and 3. The results are not convincing enough: the best model was YCbCr, which gives only 71.4% true-positive detection and 53.3% true-negative detection. We therefore decided to utilize a hybrid model of the normalized RGB and YCbCr models, as follows in Eqs. 10, 11, and 12.
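As a quick arithmetic check, the reported YCbCr rates are consistent with 50 of the 70 skin patches and 16 of the 30 non-skin patches being classified correctly. These counts are inferred from the percentages and are not taken from the tables, which are not reproduced here. The helper name `rate` is illustrative.

```python
def rate(correct, total):
    """Percentage of correctly classified patches, to one decimal place."""
    return round(100.0 * correct / total, 1)


# Counts inferred from the reported percentages (an assumption).
true_positive_rate = rate(50, 70)   # skin patches detected as skin
true_negative_rate = rate(16, 30)   # non-skin patches detected as non-skin
print(true_positive_rate, true_negative_rate)  # 71.4 53.3
```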

$$Y = 0.299\,r + 0.587\,g + 0.114\,b \tag{10}$$
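The hybrid idea can be sketched as a YCbCr-style transform applied to the normalized components r, g, and b instead of the raw R, G, and B. Eqs. 11 and 12 are not reproduced in this text, so the chroma coefficients below (full-range BT.601, without an offset) are an assumption, as is the function name `hybrid_features`.

```python
def hybrid_features(R, G, B):
    """Map an RGB pixel to (Y, Cb, Cr) computed on normalized rgb.

    Step 1 normalizes the channels so they sum to 1; step 2 applies a
    YCbCr-style linear transform. The chroma weights are assumed.
    """
    total = R + G + B
    if total == 0:
        r = g = b = 1.0 / 3  # arbitrary choice for pure black
    else:
        r, g, b = R / total, G / total, B / total
    Y = 0.299 * r + 0.587 * g + 0.114 * b
    Cb = -0.168736 * r - 0.331264 * g + 0.5 * b
    Cr = 0.5 * r - 0.418688 * g - 0.081312 * b
    return (Y, Cb, Cr)
```

Because the input is normalized first, uniformly scaling a pixel's intensity leaves all three outputs unchanged in this sketch.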

The hybrid model takes advantage of the normalization process of the RGB model to reduce the dependency between the RGB components, and of the YCbCr model to encode RGB images and to separate illumination and chrominance components. The confusion matrices of the normalized RGB model, HSV model, YCbCr model, and our proposed hybrid model are shown in Tables 1, 2, 3, and 4. In the experiments, we used 2000 patches; 1000 are human skin patches and 1000 are masked human skin patches. The hybrid model gives 97.3% true-positive detection and 98.2% true-negative detection, outperforming the other three models. The true-negative percentage is very important for our proposed system: if we are sure that an area is a face and no skin is detected in it, we can assume the face is covered. The proposed algorithm is depicted in Algorithm 4. Identification of the covered face is depicted in Algorithm 5. The whole system is depicted in Fig. 3a and

The identification of a covered face in a given cluster (Clus(i)) is as follows:

Start
1. Convert the cluster into the same hybrid model that was utilized in the training phase.
2. Classify the cluster using the skin classifier as either a skin or a non-skin patch.
3. If the cluster is classified as a skin patch, the face is not covered.
End
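The covered-face check can be sketched as a loop over the clusters of a detected face region. This is an illustration under stated assumptions, not the authors' implementation. The names `is_face_covered`, `to_hybrid`, and `is_skin` are hypothetical, the trained skin classifier is passed in as a callable because it is not specified in this text, and treating the face as covered only when no cluster is classified as skin is an assumed aggregation rule.

```python
def is_face_covered(clusters, to_hybrid, is_skin):
    """Return True when no cluster of the face region is classified as skin.

    clusters:  iterable of clusters, each a list of pixels
    to_hybrid: callable converting one pixel to the hybrid color model
    is_skin:   callable classifying a converted cluster as skin (True/False)
    """
    for cluster in clusters:
        converted = [to_hybrid(pixel) for pixel in cluster]  # convert
        if is_skin(converted):                               # classify
            return False                                     # skin found
    return True


# Toy stand-ins, for illustration only: pixels are single numbers and a
# cluster counts as "skin" when their mean exceeds 0.5.
toy_convert = lambda pixel: pixel
toy_is_skin = lambda values: sum(values) / len(values) > 0.5
print(is_face_covered([[0.1, 0.2], [0.3, 0.1]], toy_convert, toy_is_skin))  # True
print(is_face_covered([[0.1, 0.2], [0.9, 0.8]], toy_convert, toy_is_skin))  # False
```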

We tested the hybrid model utilizing a set of training images selected from two databases: the first (TAN) is presented by Tan et al. [38], and the second (FvNF) is presented by Nanni and Lumini [33]. The first image database includes 650 skin and non-skin patches. Its color images are obtained from various sources and under different illumination settings. The second dataset, FvNF (face vs. non-face), is composed of 800 face images. This dataset was collected and used by Nanni and Lumini [33] to evaluate the capability of a skin detection method to detect the presence of a face, based on the number of pixels classified as skin. For our experiments, we created a synthetic dataset by occluding parts of the skin in 450 images in FvNF and labelling them as the concealed human set, while the rest of the images were left unoccluded.

For classification of the skin patches, we applied the FMeasure as stated in Eq. 13. Table 5 shows that the proposed hybrid technique has a high detection rate compared with existing models. Metrics such as FMeasure and specificity are utilized for comparison between the proposed algorithm (Algorithms 4 and 5) and other algorithms in the literature. Specificity is the true-negative rate, that is, the ratio of actual negatives that are correctly detected as such. FMeasure is a metric of the accuracy of the experiment. It combines precision and recall. The metrics recall, false-positive ratio (FPR), and false-negative ratio (FNR) are also utilized in the experiments, and the comparison results are shown in Tables 5 and 6 and in Fig. 6. All these metrics are defined in Eqs. 13-18.

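$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

where TP is the number of true positives and FN is the number of false negatives.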
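Eqs. 13-18 are not fully reproduced in this text. The sketch below computes the metrics named above from confusion-matrix counts using their standard definitions, which are assumed to match the paper's equations. The function name `detection_metrics` and the dictionary keys are illustrative, and the sketch assumes every denominator is non-zero.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts.

    tp, fp, tn, fn: true positives, false positives, true negatives,
    false negatives. All denominators are assumed to be non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # true-positive rate
    specificity = tn / (tn + fp)       # true-negative rate
    fpr = fp / (fp + tn)               # equals 1 - specificity
    fnr = fn / (fn + tp)               # equals 1 - recall
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "FPR": fpr,
        "FNR": fnr,
        "FMeasure": f_measure,
    }
```

FMeasure is the harmonic mean of precision and recall, so it is high only when both are high.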

Some of the skin patches used, their clustering, and the skin patches after concealment (used in the actual experiments) are depicted in Fig. 4 (dataset TAN) and Fig. 5 (dataset FvNF). The proposed hybrid approach achieved high concealed face detection performance for frontal faces. Performance degrades when images contain non-frontal faces, and for dark skin images if concealed with a brown cover. The hybrid approach achieves an average specificity of 96.8%, which is an enhancement in specificity of 28%, and an average detection rate of 97.5%, which is an enhancement of 42.12% over the second-best algorithm, the YCbCr model (Fig. 6). For FMeasure, which combines precision and recall, the proposed hybrid approach achieves an enhancement of 30%. In Table 7, we compare some existing systems for concealed face detection. Different features are examined and compared with our proposed system. The systems that we compared are as follows:

2. Qezavati H., Majidi B., and Manzuri M. T. [34] detected partially covered faces with headscarves.
3. LLE: Ge S. et al. [15] succeeded in detecting masked faces with LLE-CNNs.
4. Occlusion coherence: Ghiasi G. and Fowlkes C.C. [16] localized occluded faces utilizing a hierarchical deformable model.
5. Viola: Nair A. and Potgantwar A. [32] proposed masked face detection using the Viola algorithm.

We carried out experiments to determine the runtime of our proposed approach. It is crucial that our algorithm runs in real time. We compared our implementation with Viola (Nair A. and Potgantwar A. [32]) and GAN (Ud Din N. et al. [39]). The Viola system detects people's faces and determines whether they are masked in a video-based setting. It comprises several phases: it estimates the distance from the camera, identifies the eye line and the face, and finally detects whether the face is masked. With GAN, we compared part of our proposed system (masked face detection) with the first phase of GAN, which detects masks in images. We added the detection of face and shoulder to GAN. Both systems are known for their real-time occluded face detection.

In Fig. 7, we show the average runtime comparison combined with the average detection rate. We used 450 images from the FvNF database with synthetic masks.

As shown in Fig. 7, Viola has a slightly lower average runtime than our proposed system but also a lower detection rate. Our system has a 12% higher detection rate while its average runtime is 8% higher, which is still considered real time. GAN has a higher average runtime than our proposed system with a comparable detection rate; our system is 30% faster than GAN. Figure 8 presents the average runtime of Viola and GAN versus our proposed system for 11 different masked face images.

The head and shoulder contour is detected. The face is clustered into patches. The presence or absence of human skin is then determined. We proposed a hybrid approach that combines the normalized RGB and YCbCr color spaces. This technique is tested on two datasets; the first one contains 650 skin patches, and the second one contains 800 face images. Masks are synthesized on 60% of the images in the datasets. The algorithm achieves an average masked face detection rate of 97.51% in real time.

[1] Real-time surveillance through face recognition using HOG and feedforward neural networks

[2] SegNet: A deep convolutional encoder-decoder architecture for image segmentation

[3] Regularized shearlet network for face recognition using single sample per person

[4] A cascade framework for masked face detection

[5] Locally linear regression for pose-invariant face recognition

[6] DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs

[7] Uniform color space for color storage

[8] Face detection and facial feature extraction based on a fusion of knowledge based method and morphological image processing

[9] Robust face recognition via low-rank sparse representation-based classification

[10] Robust face recognition from multi-view videos

[11] Face recognition based on beta 2D elastic bunch graph matching

[12] Texture classification of skin lesion using convolutional neural network

[13] Fast crowd density estimation with convolutional neural networks

[14] Physics-based image segmentation using first order statistical properties and genetic algorithm for inductive thermography imaging

[15] Detecting masked faces in the wild with LLE-CNNs

[16] Occlusion coherence: localizing occluded faces with a hierarchical deformable part model

[17] Boosting the face recognition performance of ensemble based LDA for pose, non-uniform illuminations, and low-resolution images. KSII Transactions on Internet and Information Systems

[18] An overview of face detection methods in angular positions

[19] Labeled faces in the wild: a database for studying face recognition in unconstrained environments

[20] Pseudo-color image fusion based on intensity-hue-saturation color space

[21] Deep convolutional neural network for inverse problems in imaging

[22] Deep unified model for face recognition based on convolution neural network and edge computing

[23] Towards reliable face localization in complex images

[24] Convolutional neural networks and training strategies for skin detection

[25] Image retrieval based on YCbCr color histogram

[26] DeepSaliency: Multi-task deep neural network model for salient object detection

[27] A fast and accurate unconstrained face detector

[28] A coastline detection method in polarimetric SAR images mixing the region-based and edge-based active contour models

[29] Panoramic face recognition

[30] Chromatic indices in the normalized rgb color space

[31] Appearance-based facial detection for recognition

[32] Masked face detection using the Viola algorithm: a progressive approach for less time consumption

[33] FvNF, Skin detection for reducing false positive in face detection

[34] Partially covered face detection in presence of headscarf for surveillance applications

[35] Survey on skin based face detection on different illumination poses and occlusion

[36] ERFNet: efficient residual factorized ConvNet for real-time semantic segmentation

[37] Face recognition based on scale invariant feature transform and spatial pyramid representation

[38] Image and skin database

[39] A novel GAN-based network for unmasking of masked face

[40] A robust face recognition method for occluded and low-resolution images

[41] Face detection based on template matching and neural network

[42] Weakly supervised patchnets: describing and aggregating local patches for scene recognition

[43] Skin detection using color processing mechanism inspired by the visual system

[44] Progressive learning for person re-identification with one example

[45] Predicting the quality of images compressed after distortion in two steps

[46] Robust face recognition against expressions and partial occlusions

[47] Design and implementation of a face recognition system based on edge computing

[48] Joint face detection and alignment using multitask cascaded convolutional networks

[49] Face recognition: a literature survey

[50] A fast single image haze removal algorithm using color attenuation prior
