key: cord-1020105-9b7t9l7h
authors: Kumar, Manish; Raju, Kota Solomon; Kumar, Dinesh; Goyal, Nitin; Verma, Sahil; Singh, Aman
title: An efficient framework using visual recognition for IoT based smart city surveillance
date: 2021-01-20
journal: Multimed Tools Appl
DOI: 10.1007/s11042-020-10471-x
sha: 40cbf2c7ef4876f8929a59d434ff879ef9082360
doc_id: 1020105
cord_uid: 9b7t9l7h

Smart city surveillance systems are battery-operated, lightweight Internet of Things (IoT) devices. In such devices, automatic face recognition requires a low-power, memory-efficient visual computing system. For these real-time applications in smart cities, efficient visual recognition systems are the need of the hour. In this manuscript, an efficient fast subspace decomposition over the Chi Square transformation is proposed for IoT based smart city surveillance systems. The proposed technique extracts features for visual recognition using the local binary pattern histogram. Redundant features are discarded by applying fast subspace decomposition to the Gaussian-distributed Local Binary Pattern (LBP) features. This redundancy is a major contributor to memory and time consumption in battery-based surveillance systems. Owing to its higher dimension reduction, the proposed technique is suitable for all visual recognition applications deployed on IoT based surveillance devices. The proposed technique is validated on well-known databases and shows significant results for all of them when implemented on a Raspberry Pi. A comparison with existing techniques reported for similar applications is provided. The proposed technique achieves the least error rate with maximum feature reduction in minimum time for all the standard databases. Therefore, the proposed algorithm is useful for real-time visual recognition in smart city surveillance.

The real-time implementation of a computer vision system on IoT based surveillance systems is the need of the hour for contemporary society. Pattern recognition is one of the ground-breaking recognition techniques, serving major applications such as biometric security, forensic investigation, Quick Response (QR) codes and smart door locking systems [19]. The major challenges in developing a feature recognition system for IoT applications are computational efficiency, accuracy, power consumption, and portability. Many pattern recognition techniques have been developed by various researchers, including the local binary pattern (LBP) and its variants, principal component analysis (PCA), and linear discriminant analysis (LDA). Among these, LBP is the most popular and most thoroughly investigated because of its tolerance to illumination changes, ease of implementation, computational simplicity, and fast response [31]. LBP and its variants have been investigated for classification problems [2]. The face image is divided into 8 × 8 or 16 × 16 regions, and then LBP feature distributions are extracted. The histogram of these features is computed region-wise, and a global concatenated histogram is used as the face descriptor. The performance of this method is evaluated under different challenges [14]. The main idea of the Electric Virtual Binary Pattern (EVBP) descriptor is based on the Virtual Electric Field (VEF): the authors combine LBP with the VEF. The neighbourhood of each pixel is treated as a grid of virtual electric charges that are electrostatically balanced. The LBP concept is applied to this neighbourhood to generate the EVBP based representation of the face. This representation is computed for all four directions using the corresponding four electrical interactions [9].
A novel face feature extraction approach based on LBP and Two Dimensional Locality Preserving Projections (2DLPP) has been explored. This approach aims to enhance the texture features without disturbing the spatial structure of a face image. LBP suppresses variation due to illumination and noise, so the detailed texture characteristics of face images are enhanced. 2DLPP is then applied to keep prominent features and decrease the feature size. In this mechanism, the Nearest Neighbourhood Classifier (NNC) is used to classify the faces [45]. A new approach named Two Directional Multi-level Threshold-LBP Fusion (2D-MTLBP-F) has been proposed for illumination-invariant face recognition. It investigates the Threshold Local Binary Pattern (TLBP) combined with the Discrete Cosine Transform (DCT). LBP with different thresholds and neighbourhoods generates information that can be used to enhance the recognition rate. In this method, face images are normalized using the DCT normalization technique, the resulting images are transformed into 61 levels of TLBP with different thresholds, the normalized DCT image is fused into these TLBP layers, and face recognition is performed using the sparse sensing classifier (SRC) [3]. A novel technique called Weber Local Binary Image Cosine Transform (WLBI-CT) merges, in the frequency domain, the frequency components of images obtained through the Weber local descriptor and the local binary descriptor [15]. These frequency components are invariant to multi-scale and multi-orientation facial images for facial expressions. Selecting a significant and prominent feature set is the key to highly accurate face recognition, texture classification [16, 18] and scene classification [35, 42]. Despite the attractive properties and applications of LBP, its extracted features are very sensitive to image noise: small variations in an image may drastically modify the LBP features [22].
The number of distinct LBP codes is very large, so infrequent features are difficult to estimate reliably from individual histogram bins, and compact features of the image are difficult to compute and become almost incomprehensible for the system. Uniform LBP is therefore used for dimension reduction [12]. Binary codes that contain fewer than three transitions from 1 to 0 or vice versa are called uniform patterns. It has been observed that uniform patterns account for just under 90% of all patterns in the (8, 1) neighbourhood and about 70% in the (16, 2) neighbourhood, but further reduction of the dimensions of the image still poses a serious challenge. To achieve a significant dimensional reduction of the LBP descriptor, many subspace approaches are reported in the literature [25]. The Principal Component Analysis (PCA) approach is reported to remove co-occurrence features [11]. However, PCA is highly sensitive to noise, its suitability is restricted to small data sets, and its recognition rate remains insufficient [4]. LDA is another useful method for reducing the feature dimensions, but its computational complexity and rotation-variant nature limit its use [28]. To reduce computational complexity and load, other prominent techniques have been reported, namely the Power Method [33], QR factorization [21] and subspace iteration methods [8], but these approaches suffer from slow convergence in situations such as low signal-to-noise ratio and unknown subspace dimensions.
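The uniform-pattern criterion described above can be illustrated with a short sketch. It assumes the usual circular definition, in which the code is treated as a ring of bits and a pattern is uniform when it has at most two 0/1 transitions; the function names are illustrative and not taken from the paper.

```python
def transitions(code: int, bits: int = 8) -> int:
    """Count circular 0/1 transitions in a `bits`-wide binary code."""
    # Rotate the code right by one position so each bit meets its neighbour.
    rotated = (code >> 1) | ((code & 1) << (bits - 1))
    # Bits that differ between the code and its rotation mark a transition.
    return bin(code ^ rotated).count("1")


def is_uniform(code: int, bits: int = 8) -> bool:
    """A pattern is uniform when it has at most two circular transitions."""
    return transitions(code, bits) <= 2


# Enumerate all 8-bit codes and keep the uniform ones.
uniform_codes = [c for c in range(256) if is_uniform(c)]
print(len(uniform_codes))
```

For eight neighbours this leaves 58 uniform codes out of 256, so a histogram with one extra bin for all non-uniform codes has 59 bins instead of 256.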

The algorithms and their variants shown in Table 1 achieve optimum accuracy. For IoT based surveillance systems, however, these methods are too complex in terms of computation time and run-time memory requirements. A multimodal biometric identification approach has been proposed for human verification based on the fusion of voice and face recognition, but a voice recognition module is difficult to implement in a surveillance system [1]. Reducing the effect of illumination noise on face databases has been proposed for face recognition [6]. A hybrid feature extraction (HFE) technique has been proposed to overcome the effect of ageing on face recognition. The results of the algorithm are demonstrated on different databases, but its training and testing complexity is not suitable for IoT based fast recognition systems [30]. A multi-feature fusion framework combining Gabor and deep features has been proposed for small-sample face recognition; its accuracy and performance are up to the mark, but the feature extraction process is lengthy and time consuming [46]. There are also various deep learning algorithms involving neural networks, such as the Convolutional Neural Network (CNN) and its variants [13, 29], which have been explored by various researchers in the past few years. These algorithms are computationally expensive and require specialized hardware such as GPUs for their development and deployment [32]. They are not suitable for real-time IoT applications where limited resources and power are available. Thus, there is a need for an approach that addresses the above-mentioned issues efficiently and is also suitable for feature dimensionality reduction in a short span of time.

Therefore, a low-power and computationally light method needs to be developed for IoT based surveillance systems. Some authors have proposed fast subspace decomposition [37, 38] to perform feature dimensional reduction. It is useful for optimum compact feature extraction. Face and texture datasets can be used for the validation of these methods. Recently, many authors have used the Raspberry Pi board for IoT based surveillance systems, as it is available at an affordable price for prototyping. The major contributions of the paper are:

1. An efficient framework is proposed for surveillance systems for smart cities using IoT devices. The proposed framework works well for real-time applications in industries for employee identification and for surveillance systems in smart cities.

2. Efficient fast subspace decomposition over the Chi Square transformation is proposed. This transformation has yielded better recognition rates over various datasets.

The paper is divided into four sections. The first section has already introduced the research problem and presented the literature survey. The second section explains the algorithm for real-time face recognition, including face detection, face image enhancement, feature extraction, dimensional reduction, face classification and other recognition applications. The penultimate section presents the experimental setup, the results, and their discussion, and the last section concludes the paper.

The proposed algorithm is elucidated in Fig. 1. For an image captured by the camera, the features are extracted and reduced in real time for recognition. If the face is recognized from the database, the name and identity are displayed; otherwise the face may be registered in the database for future recognition. The authors tested this system on a Raspberry Pi board, but it can be extended to other IoT devices.

Pseudo code of the proposed algorithm is as follows:

Training Phase:

Step 1: Normalize the dataset by subtracting the mean image and dividing by the variance. Also divide each image into 8 × 8 bins for the computation of Local Binary Patterns.

Step 2: Compute the Local Binary Pattern of each image using uniform LBP.

Step 3: Create the histogram of each bin and concatenate the histograms to get the global histogram.

Step 4: Apply the Chi Square transformation to the resultant image to achieve a Gaussian distribution.

Step 5: Store the transformed data in the requisite format in a csv/xml file.
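The training steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the basic 3 × 3 LBP operator with 256 bins instead of the uniform LBP of radius 2, it skips the Chi Square transformation of Step 4, it writes the CSV to an in-memory buffer, and the random images and all function names are hypothetical stand-ins.

```python
import io
import numpy as np


def lbp_codes(img):
    """Basic 3x3 LBP codes (8 neighbours, radius 1) for interior pixels."""
    h, w = img.shape
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the image holding one neighbour per interior pixel.
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neighbour >= center).astype(np.int64) << bit
    return codes


def block_histogram(codes, grid=8, bins=256):
    """Concatenate normalized histograms over a grid x grid partition."""
    feats = []
    for band in np.array_split(codes, grid, axis=0):
        for block in np.array_split(band, grid, axis=1):
            hist = np.bincount(block.ravel(), minlength=bins).astype(float)
            feats.append(hist / hist.sum())
    return np.concatenate(feats)


rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 66, 66)).astype(float)  # stand-in data

# Step 1: subtract the mean image and divide by the variance.
normalized = (images - images.mean(axis=0)) / (images.var() + 1e-8)

# Steps 2-3: LBP codes, per-block histograms, global concatenated histogram.
features = np.stack([block_histogram(lbp_codes(im)) for im in normalized])

# Step 5: store one feature row per training image as CSV.
buffer = io.StringIO()
np.savetxt(buffer, features, delimiter=",")
print(features.shape)
```

With 66 × 66 inputs the interior is 64 × 64, so each of the 64 blocks covers 8 × 8 pixels and every image yields a 64 × 256 = 16384-dimensional descriptor before any reduction.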

Testing/Deployment Phase:

Step 1: Read the stored csv/xml file.

Step 2: Input image from Camera/DataSet for recognition

Step 3: Apply LBP and the Chi Square transformation, as in training, to this single image.

Step 4: Compute the Chi Square distance of the test image to all the images in the dataset and classify the image to the minimum-error class.

Testing/deployment can be done on a local machine or on IoT devices. In the case of an IoT device, the csv/xml file and the trained model need to be deployed on the device.
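Step 4 of the testing phase amounts to a nearest-neighbour search under the Chi Square distance. The sketch below assumes the stored training features have already been loaded into an array; the three-bin histograms and the labels are hypothetical values chosen only for illustration.

```python
import numpy as np


def chi_square_distance(a, b, eps=1e-10):
    """Chi Square distance between two histograms (eps avoids 0/0)."""
    return float(np.sum((a - b) ** 2 / (a + b + eps)))


def classify(query, train_feats, train_labels):
    """Return the label of the training histogram closest to the query."""
    dists = [chi_square_distance(query, f) for f in train_feats]
    best = int(np.argmin(dists))
    return train_labels[best], dists[best]


# Hypothetical stored features (one row per enrolled image) and labels.
train_feats = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.2, 0.7]])
train_labels = ["person_a", "person_b"]

label, dist = classify(np.array([0.6, 0.3, 0.1]), train_feats, train_labels)
print(label, round(dist, 4))
```

In a deployment, a threshold on the returned distance could be used to decide whether the face is unknown and should be registered, as described for Fig. 1; that threshold is not specified in the text.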

The proposed Chi-square transformed fast subspace LBP algorithm is described as follows. Initially, the uniform LBP operator LBP^{u2}_{8,2} is used to extract the features of a query image, where the subscript 8,2 denotes eight neighbours at a distance of 2. The superscript u2 stands for using separate codes for uniform patterns and one code for all other patterns. The central pixel is denoted as (x_c, y_c), P denotes the number of sampling points on a circle of radius R, and i_c and i_p denote the gray-scale values of the central pixel and of the p-th sampling point, respectively [3]. The thresholding function S(a) may be defined as S(a) = 1 for a ≥ 0 and S(a) = 0 otherwise.
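For reference, the standard form of the LBP operator can be written with the symbols introduced above, where S is the thresholding function. This is the usual textbook formulation, given as a sketch rather than as a quotation of the source equation.

```latex
\mathrm{LBP}_{P,R}(x_c, y_c) \;=\; \sum_{p=0}^{P-1} S\!\left(i_p - i_c\right)\, 2^{p}
```

Each of the P neighbours contributes one bit, so for P = 8 the code of a pixel lies between 0 and 255.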

Here h_LBP is the feature histogram calculated by the standard LBP algorithm. Further, the Chi-Square transformation is performed to make the probability density function (PDF) of the LBP features approximately Gaussian, so that the extracted LBP features are used optimally. The Chi Square transformation is performed by taking two samples of LBP features, denoted 'a' and 'b'. These samples introduce another feature vector x = {x_1, x_2, x_3, …, x_d}, where each element x_i is represented as:
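A minimal sketch of one such transformation is given below. It assumes the element-wise form x_i = (a_i − b_i) / sqrt(a_i + b_i), chosen because the squared Euclidean norm of x then equals the Chi Square distance between 'a' and 'b'. This form is an assumption rather than a confirmed reproduction of the authors' definition, and the subsequent normalization step is not reproduced.

```python
import numpy as np


def chi_square_transform(a, b, eps=1e-10):
    """Assumed element-wise transform: x_i = (a_i - b_i) / sqrt(a_i + b_i)."""
    return (a - b) / np.sqrt(a + b + eps)


# Two hypothetical LBP histograms.
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.4, 0.4, 0.2])

x = chi_square_transform(a, b)

# The Chi Square distance between a and b, computed directly.
chi2 = np.sum((a - b) ** 2 / (a + b + 1e-10))

# The squared Euclidean norm of x reproduces the Chi Square distance.
print(np.isclose(np.dot(x, x), chi2))
```

Under this assumption, Euclidean-geometry tools such as covariance-based subspace methods can be applied to x while preserving Chi Square distances.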

To evaluate the Chi squared distance the normalization of 'x' is performed as:

Now, fast subspace decomposition is applied to the input LBP feature 'x' for dimensional reduction as:

where s(t) is the LBP feature histogram, A(θ) is the subspace span with dimension 'd', n(t) represents additive noise and x(t) is the array output observed at time t = 1, …, N. In order to remove the unreliable features, the covariance matrix of the signal x(t) is calculated as:

where W_s is the covariance matrix of the signal. The decomposition of W_x for a finite number of features 'N' can be written as:

Ŵ_x defines the signal subspace, and its dimension is calculated from the 'd' eigenvectors {e_1, …, e_d} of Ŵ_x. The task is now to calculate the optimal value of 'd' so that the non-repeated features of the LBP histogram can be extracted. The value of 'd' can be evaluated using the non-repeated eigenvalues of the covariance matrix [37]. This optimal value of 'd' is calculated by taking the new statistics reported in [37] into consideration. The extracted features of the training data set are reduced by the signal subspace vector. Thereafter, the training data set is stored in the system memory, and the signal subspace vector extracts the optimal features for all the testing samples.
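For reference, the quantities named in the preceding derivation can be written in the standard form used in the fast subspace decomposition literature [37, 38]. This is a hedged reconstruction rather than a quotation: the white-noise term σ²I (noise variance σ² and identity matrix I) and the conjugate-transpose superscript H are assumptions not named in the text.

```latex
\begin{align}
x(t) &= A(\theta)\, s(t) + n(t), \qquad t = 1, \dots, N \\
W_x &= E\{x(t)\, x^{H}(t)\} = A(\theta)\, W_s\, A^{H}(\theta) + \sigma^{2} I \\
\widehat{W}_x &= \frac{1}{N} \sum_{t=1}^{N} x(t)\, x^{H}(t)
\end{align}
```

Under this model the smallest eigenvalue σ² of W_x is repeated, while the d eigenvalues associated with the signal are larger, which is why counting the non-repeated eigenvalues gives the subspace dimension d.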

The proposed approach reduces the histogram features of the training data set stored in the system memory. For recognition of a given query image, the reduced feature histogram is computed. Thereafter, the minimum distance between the features is calculated using the Chi square distance.
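The reduction step can be sketched as a projection onto the leading eigenvectors of the sample covariance matrix. Note the substitution: this sketch uses a full eigendecomposition from NumPy in place of the fast subspace decomposition of [37, 38], and the value of d is fixed by hand rather than estimated from the eigenvalue statistics. The data and names are hypothetical.

```python
import numpy as np


def signal_subspace(features, d):
    """Top-d eigenvectors of the sample covariance (1/N) * sum x x^T."""
    n_samples = features.shape[0]
    cov = features.T @ features / n_samples
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:d]       # indices of the d largest
    return eigvecs[:, order]


rng = np.random.default_rng(0)
train = rng.random((20, 16))        # 20 hypothetical feature vectors, 16-dim

basis = signal_subspace(train, d=4)
reduced_train = train @ basis       # stored in place of the full features

query = rng.random(16)
reduced_query = query @ basis       # same projection applied at test time

print(reduced_train.shape, reduced_query.shape)
```

Only the basis and the reduced training features need to be kept on the device, which is where the memory saving described above would come from.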

Various techniques for approximating feature similarity between the test image feature histogram and the stored training image feature histograms, such as log-likelihood, Euclidean distance, histogram intersection and Chi square distance, are probed. In the proposed work, the Chi square distance is used for recognition. Further, the authors substantiate that applying weights to the unique features of the image gives better results in terms of accuracy and time complexity. The extracted feature image and the histogram vector are shown in Fig. 2.
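The measures named above can be compared side by side with a short sketch. The formulas are the common textbook definitions for histograms; the weight vector in the weighted Chi Square distance is a hypothetical placeholder, since the weighting scheme is not specified in the text.

```python
import numpy as np


def euclidean(a, b):
    """Euclidean distance (smaller means more similar)."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


def histogram_intersection(a, b):
    """Histogram intersection (larger means more similar)."""
    return float(np.sum(np.minimum(a, b)))


def log_likelihood(a, b, eps=1e-10):
    """Log-likelihood statistic -sum(a * log b) (smaller means more similar)."""
    return float(-np.sum(a * np.log(b + eps)))


def weighted_chi_square(a, b, w, eps=1e-10):
    """Chi Square distance with a per-bin weight w (smaller is more similar)."""
    return float(np.sum(w * (a - b) ** 2 / (a + b + eps)))


a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.3, 0.5])
w = np.array([2.0, 1.0, 1.0])   # hypothetical weights emphasising bin 0

for name, value in [("euclidean", euclidean(a, b)),
                    ("intersection", histogram_intersection(a, b)),
                    ("log-likelihood", log_likelihood(a, b)),
                    ("weighted chi-square", weighted_chi_square(a, b, w))]:
    print(name, round(value, 4))
```

Histogram intersection is a similarity while the other three are dissimilarities, so a nearest-neighbour rule must maximise the former and minimise the latter.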

The proposed architecture is shown in Fig. 3. The high-end server in the architecture is used to store the dataset and to train the model. The trained model and the computed features are then stored in the common dataset. This common dataset resides either in the cloud or inside the IoT device memory. The IoT gadget is used to deploy the model in real time. The gadget is also connected to a CCTV/web camera for real-time input. The trained model inside the gadget works as the identification module for applications such as employee identification using faces, security surveillance in industries and security devices for vehicles. The computed decision can be communicated to a mobile device for further action.

The experiments are performed and validated on a desktop machine and a Raspberry Pi. The desktop machine has an octa-core i5 processor at 2.7 GHz, 4 GB DDR3 RAM, and the Linux (Ubuntu 16.04) operating system with OpenCV (version 3.2.0). The Raspberry Pi board has a quad-core 1.2 GHz Broadcom BCM2837 64-bit CPU and 1 GB RAM, with an 8 GB memory card. The proposed approach is validated on four different databases using cross-validation (Table 2). It is observed that the performance of the proposed algorithm is better than that of the existing approaches except PmSVM-Chi2 and PmSVM-HI, as both of these are error free on the given dataset. In the LFW dataset (Fig. 6), all the face images are collected from the internet and showcase variation in expression, posture and illumination. The high-dimensional LBP feature gives more robust performance than the baseline LBP and baseline HOG features [40]. A comparison of the proposed algorithm with various existing approaches on the basis of percentage error rate is shown in Table 2. It is observed that the proposed approach performs better than extant subspace approaches, although memory consumption and computation cost are quite high for this data set.

(d) Analysis of DynTex++ database for dynamic texture recognition.

The DynTex++ database [20] contains 36 classes, and every class has 100 sequences of size 50 × 50 × 50. This dataset is widely used for dynamic texture recognition. It has a larger dimension than the face databases, so it consumes relatively more memory and its computation cost is also higher. For validation, the test bench uses five-run averaged validation: 80 sequences per class form the training set and the remaining 20 sequences the test set. The same experiment is repeated 5 times and the average results are taken into consideration. A comparative analysis of the proposed algorithm with existing approaches is shown in Table 2. It is clear that the proposed approach performs better than the existing approaches.

Fig. 6 LFW dataset [41]

In order to get better physical insight into the proposed technique, the error rate percentage is analysed with respect to the reduced feature percentage, and the error rate percentage is compared with that of standard algorithms. Using the proposed feature reduction technique, the percentage change in error rate with respect to the percentage reduction of features is studied for the standard databases, as represented in Fig. 7. An error rate of less than 3% is achieved with a 27% reduction of features for all the tested databases except LFW, for which it is less than 10%. In comparison with the existing recognition and detection algorithms, the proposed technique exhibits the least error rate with maximum feature reduction in minimum time for all the standard databases. Moreover, the proposed technique is dynamic in nature.
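The DynTex++ evaluation protocol described above (80 training and 20 test sequences per class, repeated five times and averaged) can be sketched as follows. The random per-class split and the `evaluate` callback, which is expected to train on the first index set and return the error rate on the second, are assumptions made for illustration.

```python
import numpy as np


def split_per_class(n_classes=36, per_class=100, n_train=80, seed=0):
    """Randomly split each class into training and test sequence indices."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for c in range(n_classes):
        # Global indices of the sequences of class c, in random order.
        idx = c * per_class + rng.permutation(per_class)
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.array(train), np.array(test)


def average_error(evaluate, repeats=5):
    """Average the error rate returned by `evaluate` over repeated splits."""
    errors = [evaluate(*split_per_class(seed=s)) for s in range(repeats)]
    return float(np.mean(errors))


train_idx, test_idx = split_per_class()
print(len(train_idx), len(test_idx))
```

With 36 classes this gives 2880 training and 720 test sequences per run.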

Further, the error rate performance of the existing algorithms for the standard databases has been compared to the proposed algorithm as represented in Fig. 8 . It is observed that the percentage error rate is lowest for the proposed algorithm for all the standard databases as compared to existing algorithms.

Further, the proposed method is also compared with different algorithms on the standard datasets. The precision, sensitivity and F-measure of the proposed technique are compared with those of other existing algorithms for the standard datasets, as represented in Fig. 9a-d. It is observed that the precision of the proposed algorithm is comparable to that of the existing algorithms, whereas its sensitivity and F-measure are much larger, which demonstrates the efficacy of the features retrieved by the proposed technique. Therefore, by applying the proposed technique, the computation time for recognition as well as the memory usage is reduced significantly. This makes the proposed algorithm suitable for real-time applications and memory-constrained devices such as the Raspberry Pi.
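For reference, the three measures compared in Fig. 9 follow from the counts of true positives, false positives and false negatives. The sketch below uses the standard definitions; the counts in the example are hypothetical and are not results from the paper.

```python
def precision_sensitivity_f(tp: int, fp: int, fn: int):
    """Return (precision, sensitivity, F-measure) from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # also called recall
    denom = precision + sensitivity
    # F-measure is the harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return precision, sensitivity, f_measure


# Hypothetical counts: 8 correct matches, 2 false matches, 2 missed matches.
precision, sensitivity, f_measure = precision_sensitivity_f(tp=8, fp=2, fn=2)
print(precision, sensitivity, f_measure)
```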

False acceptance rate (FAR) and true acceptance rate (TAR) are significant parameters for all surveillance-related applications. As face recognition is nowadays gaining popularity in surveillance environments, a comparison of FAR and TAR with the existing algorithms has been performed for all readily available datasets, as shown in Fig. 10a-d. It is observed that the TAR of the proposed algorithm is higher than that of the existing recognition techniques for varying values of FAR, which makes it highly efficacious for potential security and forensic investigation applications. For verification of performance and accuracy, the proposed algorithm is compared with existing algorithms on the standard databases, as represented in Fig. 11. It is noticed that the proposed technique is as accurate as the other algorithms even after feature reduction. For further verification of the performance of the suggested method, its feature dimensionality is compared with that of the existing techniques, as illustrated in Table 3. It is noticed that the proposed algorithm exhibits the maximum dimensionality reduction compared with the existing algorithms. Therefore, the proposed technique is capable of performing visual recognition efficiently with minimum feature size in a minimum time span. This capability makes the suggested approach suitable for real-time implementation on the Raspberry Pi board for potential uses in IoT applications such as forensic applications, identification in the banking sector, the AADHAR database and texture recognition applications. The power consumption of the board is optimum due to the better efficiency of the algorithm.
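The relationship between FAR and TAR can be illustrated with a distance threshold: a comparison is accepted when its distance is at or below the threshold. The sketch below assumes that distances for genuine pairs (same identity) and impostor pairs (different identities) are available; the values are hypothetical.

```python
import numpy as np


def far_tar(genuine, impostor, threshold):
    """Return (FAR, TAR) when pairs with distance <= threshold are accepted."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    tar = float(np.mean(genuine <= threshold))    # genuine pairs accepted
    far = float(np.mean(impostor <= threshold))   # impostor pairs accepted
    return far, tar


# Hypothetical Chi Square distances for genuine and impostor comparisons.
genuine = [0.1, 0.2, 0.3, 0.6]
impostor = [0.4, 0.7, 0.8, 0.9]

far, tar = far_tar(genuine, impostor, threshold=0.5)
print(far, tar)
```

Sweeping the threshold over a range of values and plotting TAR against FAR gives curves of the kind compared in Fig. 10.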

The proposed algorithm is validated through the experimental results shown in the results section. The features have been reduced effectively so that deployment of the algorithm on IoT devices is achieved in real time. For real-time applications, it can be implemented on suitable IoT devices to prototype the vision system. The proposed system is implemented using the open source library OpenCV in C on a Raspberry Pi running Ubuntu with a USB camera. The reduction of unreliable features frees system memory and reduces the response time of the system, which is desirable for IoT applications. The proposed algorithm is verified and validated on the sample face of the author himself using the Raspberry Pi as the hardware development kit. The same steps can be implemented on other IoT devices such as Arduino, RoboCV, etc. The proposed algorithm exhibits minimum error rate with maximum feature reduction in minimum time for all the standard databases, while maintaining accuracy comparable to that of the existing techniques. These characteristics make the proposed scheme useful for real-time implementation of face and other recognition tasks in IoT based surveillance systems. In future, this method can be explored further in combination with potential deep learning techniques for real-time IoT applications. The same architecture and algorithm can be deployed and tested for any visual recognition problem. As shown in the results section, the proposed architecture and algorithm are generic enough to work well on face as well as texture recognition. The real-time speed for problems like highway surveillance may be a bottleneck and may need further investigation. Further improvement can be investigated in three areas. First, the computational complexity can be reduced further so that the frames per second of the system can be increased. Secondly, the power consumption needs to be investigated and reported for the proposed architecture.
In future, the proposed architecture can be extended to datasets in which human faces are covered by face masks, for person identification in the post COVID-19 era. This work can also be utilized for automatic attendance during online sessions, as during the pandemic. Furthermore, this scheme can be explored for fingerprint and iris recognition, for complete biometric verification in banking or other high-security services.

Acknowledgements The authors would like to thank the Cyber Physical System Group, CSIR-Central Electronics Engineering Research Institute, Pilani, Rajasthan (India) for providing infrastructural facilities to carry out the research work. 

Multimodal biometric scheme for human authentication technique based on voice and face recognition fusion

Face description with local binary patterns: application to face recognition

Face recognition against illuminations using two directional multi-level threshold-LBP and DCT

Face recognition based on haar wavelet transform and principal component analysis via levenberg-marquardt back-propagation neural network

The AR face database abstract

An experimental study for the effects of noise on face recognition algorithms under varying illumination

Blessing of dimensionality: high-dimensional feature and its efficient compression for face verification

Tracking a few extreme singular values and vectors in signal processing

Face description using electric virtual binary pattern (EVBP): application to face recognition

Maximum margin distance learning for dynamic texture recognition ECCV

Evaluation of face recognition techniques using PCA, wavelets and SVM

A completed modeling of local binary pattern

Transfer learning with deep convolutional neural network for constitution classification with face image

A comprehensive comparative study of handcrafted methods for face recognition LBP-like and non LBP operators

Reliable facial expression recognition for multi-scale images using weber local binary image based cosine transform features

Scale and rotation-invariant local binary pattern using scale-adaptive texton and sub uniform-based circular shift

Compound dictionary learning based classification method with a novel virtual sample generation Technology for Face Recognition

Dominant local binary patterns for texture classification

Optimized coefficient vector and sparse representation-based classification method for face recognition

DynTex: a comprehensive database of dynamic textures pattern

A real-time high-resolution technique for angle of-arrival estimation

Noise-resistant local binary pattern with an embedded error-correction mechanism

A complete and fully automated face verification system on mobile devices

Dynamic texture recognition using enhanced LBP features ICASSP

A chi-squared-transformed subspace of LBP histogram for visual recognition

Learning LBP structure by maximizing the conditional mutual information

LBP-based edge-texture features for object recognition

Linear representation of intra-class discriminant features for small-sample face recognition

Voxel-based 3D face reconstruction and its application to face recognition using sequential deep learning

A hybrid features extraction on face for efficient face recognition

Enhanced Local Texture Feature Sets for Face Recognition Under Difficult Lighting Conditions

Implementation of multimodal biometric recognition via multi-feature deep learning networks and feature fusion

Simple, effective computation of principal eigenvectors and their eigenvalues and application to high resolution estimation of frequencies

Power mean SVM for large scale visual classification

CENTRIST: a visual descriptor for scene categorization

Towards good practices for action video encoding proceedings of the IEEE computer society

Fast subspace decomposition of data matrices

Fast subspace decomposition

Dynamic texture classification using dynamic fractal analysis

Robust head-shoulder detection by PCA-based multilevel HOG-LBP detector for people counting

Fine-grained LFW database

Combining weighted adaptive CS-LBP and local linear discriminant projection for gait recognition

Dynamic texture recognition using local binary patterns with an application to facial expressions

Rotation-invariant image and video description with local binary pattern features

Face feature extraction and recognition via local binary pattern and two-dimensional locality preserving projection

A new approach for small sample face recognition with pose variation by fusing Gabor encoding features and deep features
