title: A novel periocular biometrics solution for authentication during Covid-19 pandemic situation
authors: Kumari, Punam; Seeja, K. R.
date: 2021-01-03
journal: J Ambient Intell Humaniz Comput
DOI: 10.1007/s12652-020-02814-1

The outbreak of the novel coronavirus in 2019 shook the whole world and quickly evolved into a global pandemic. Considering its long-term effects on day-to-day life, the necessity of wearing face masks and maintaining social distancing highlights the need for contactless biometric systems in all future authentication applications. One solution is periocular biometrics: unlike fingerprint biometrics it requires no physical contact, and it can identify even people wearing face masks. Since the periocular region is a small area compared to the face, extracting a sufficient number of features from that small region is the major challenge in making the system highly robust. This research proposes a feature fusion approach that combines handcrafted HOG features, non-handcrafted features extracted using pretrained CNN models, and gender-related features extracted using a five-layer CNN model. The proposed feature fusion approach is evaluated using a multiclass SVM classifier on three benchmark databases, UBIPr, Color FERET and Ethnic Ocular, as well as under three non-ideal scenarios: the effect of eyeglasses, the effect of eye occlusion, and pose variation. The proposed approach shows remarkable improvement in performance over pre-existing approaches.

The present COVID-19 pandemic has challenged the world in all dimensions, and no one can predict when humankind will be completely rid of this virus. Considering its long-term effects, we must understand and learn to live with it; to combat the situation and make our daily lives safer, we can rely on technology, primarily on contactless technology. Under the COVID-19 crisis, biometric systems, which are considered the backbone of any country's security infrastructure, face many challenges. Almost all organizations have stopped using contact-based biometric systems, which can be a major vector for spreading the coronavirus. Fingerprint authentication systems, deployed everywhere from small organizations to installations where security is the top priority, will no longer be considered a safe option for authentication. Face and iris biometric systems are possible alternatives, but they face challenges of their own: the face mask that has now become an essential part of our daily lives is a major hurdle for face biometrics, as it can deteriorate system performance, while iris biometrics requires a high degree of user cooperation. Under these circumstances, the periocular region is a better biometric trait than the alternatives. The periocular region refers to the periphery of the eye, containing the eye, eyebrow and pre-eye-orbital region, as illustrated in Fig. 1. Periocular recognition works well even if the subject's face is occluded by a face mask or veil, and even if the subject is uncooperative. Figure 2 shows some real-life examples where periocular authentication is extremely useful. The periocular region, covering only about 20 to 30% of the face area, is itself a small region of interest.
Partial occlusion of such a small region can further deteriorate system performance; the occlusion may be caused by eyeglasses, partial closure of the eyes, or pose variation. In the literature, most researchers extracted either handcrafted features (Raja et al. 2014; Karahan et al. 2014; Mahalingam and Ricanek 2013; Yadav et al. 2015; Bakshi and Manjhi 2015) or non-handcrafted features (Zhao and Kumar 2016; Kandaswamy et al. 2017) from the small periocular region of interest for matching, whereas the use of semantic information about the subject (gender, ethnicity, race, etc.) was almost negligible. In the proposed work, we give equal weight to all three feature types (handcrafted, non-handcrafted and semantic information) and combine them into a single feature vector, which is fed to a multiclass support vector machine classifier for classification. The significant contributions of this research are:

1. A feature fusion approach combining handcrafted features, non-handcrafted features and semantic information for extracting features from the small periocular region.
2. A highly robust contactless biometric system whose performance is not affected even when the person is wearing a mask.
3. An evaluation of the proposed system under three non-ideal conditions:
• the subject is wearing glasses,
• the eye portion is masked or hidden,
• pose variation.

Park et al. (2009) proposed the periocular region as a biometric trait for person identification and provided a new direction for periocular biometric systems, either as a standalone modality or as a supporting trait for other biometric systems. Later (Park et al. 2011), they extended this concept to examine the effectiveness of the periocular region in non-ideal scenarios such as pose variation, masking of critical eye components, and inclusion of the eyebrow in the periocular region of interest. The outcome of their experiments provided strong support for developing effective periocular biometric systems for scenarios where other biometric traits, such as face and iris, are not fully usable. In recent years, continuous research efforts (Alonso-Fernandez and Bigun 2016; Kumari and Seeja 2019; Nigam et al. 2015) have been made to establish the periocular region as a strong biometric trait.

In the early days, researchers used handcrafted features for periocular image matching. Handcrafted features are broadly classified into two categories: (1) global feature descriptors and (2) local feature descriptors. A global feature descriptor considers the image as a whole and creates a single feature vector for the entire image, whereas a local feature descriptor divides the image into patches (groups of pixels), creates a feature vector for every patch, and finally combines them into a single feature vector. Global feature descriptors can be further classified into three categories: (1) texture-based, (2) color-based and (3) shape-based descriptors. Raja et al. (2014) and Mahalingam and Ricanek (2013) used texture-based descriptors, Binarized Statistical Image Features (BSIF) and Local Binary Patterns (LBP) respectively, for matching. The primary advantage of texture-based descriptors is that they are easy to implement at very low computational cost, but they are highly sensitive to noise, blurring and rotation. Woodard et al.
(2010) used a color-based feature descriptor for periocular image matching and obtained noticeable recognition accuracy. The primary limitation of color-based descriptors is that they work well only if the input image has a uniform color distribution. Proenca et al. (2014) and Le et al. (2014) considered shape-based descriptors, the shape of the eyelid and the shape of the eyebrow respectively, for periocular image matching. Shape-based features are easy to visualize and implement, and are very helpful when the contour of a shape matters more than its inner content, but they are highly sensitive to noise and shape changes. Another disadvantage of shape-based features is that it is sometimes very hard to distinguish the contour of a shape from the background color; for dark-skinned subjects, for example, it is almost impossible to extract the eyebrow shape from face images.

The other category of handcrafted features is local feature descriptors. Hollingsworth et al. (2011) and Rattani et al. (2017) used Histograms of Oriented Gradients (HOG), whereas Karahan et al. (2014) implemented Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF) with their variants as local feature descriptors; both obtained remarkable recognition accuracy. Later it was found (Alonso-Fernandez and Bigun 2014; Karahan et al. 2016; Chen et al. 2017) that the presence of non-ideal images can degrade overall system performance. These studies analysed various non-ideal cases, such as occlusion caused by glasses, shadows on the face, or accidental closure of the eyelids due to natural eye movement, and concluded that occluding even 5 to 10 percent of the small periocular area can reduce the final performance of the system. To handle pose variation, some researchers (Park et al. 2011; Castrillón-Santana 2016) used different handcrafted features and obtained noticeable results, but also concluded that their methods need further improvement.

With the emergence of deep learning, researchers' focus shifted to non-handcrafted features (Zhao and Kumar 2016, 2018; Tiong et al. 2019), yielding visible improvements in the performance of periocular biometric systems. Tiong et al. (2019) proposed a handcrafted and non-handcrafted feature fusion approach and created a dual-stream CNN in which each stream contains eight convolutional layers, two fully connected layers and a late fusion layer with shared weights and parameters. In their fusion approach, they fed a combination of RGB data and OCLBP features to each stream for periocular recognition and demonstrated the effectiveness of fusing multiple features instead of using only raw data as input. The drawback is that the approach has limited effectiveness for subjects wearing glasses. Raffei et al. (2019) implemented a score fusion approach, using rotation-invariant LBP to extract textural features and color moments for color feature extraction. Using chi-square and Euclidean distance to match the templates generated from the texture and color features respectively, they produced two matching scores, which were then fused to obtain the final matching accuracy. The drawback of this approach is that the use of color moments requires mapping the input images from RGB to HSV space, which introduces the problem of non-removable singularities.
Moreover, the HSV color space also has low sensitivity to small color changes, which may remove micro-features of images and degrade system performance. Kumar et al. (2019) used the well-known handcrafted Local Binary Pattern feature to match periocular images extracted from images of the same subject at different ages. To improve performance, they implemented an image enhancement and denoising technique, the self-enhancement quotient with discrete wavelet transform. At first glance the method looks very simple and the results obtained are good, but the paper lacks a discussion of the complexity of implementing the DWT, which requires more memory and processing cycles and can make the whole process computationally very costly. Bakshi et al. (2018) implemented a feature reduction approach to extract Phase Intensive Local Patterns (PILP) with a 20 percent reduction in extracted features, known as R-PILP. With a nearest-neighbour classifier they obtained remarkable recognition accuracy with less computation time. The method works well but requires a very careful approach during feature reduction, since reducing the wrong features may lead to severe degradation in final accuracy.

Periocular biometrics has also played a major role in soft biometric classification; several researchers used periocular features to identify a subject's gender (Castrillón-Santana 2016), ethnicity or race (Chen et al. 2017). Zhao and Kumar (2016) proposed the novel idea of using semantic information in recognition: they added an additional CNN branch trained with gender information to an existing CNN, and the resulting model obtained a noticeable recognition accuracy of ~92% in periocular recognition. As an extension of this work (Zhao and Kumar 2018), they created a novel deep learning architecture incorporating an attention model, exploiting visual attention to emphasize critical components of the periocular region such as the shape of the eyebrow and the shape of the eye. Their model performed better and achieved improved recognition accuracy compared to other works in the literature. From their results they inferred that the eyebrow and eye shape are highly feature-rich regions whose inclusion in the periocular region of interest may improve the discriminating power of the model.

Motivated by the above facts (degradation in accuracy under non-ideal conditions) and by the popularity of handcrafted, non-handcrafted and semantic information in biometric authentication, this paper proposes an ensemble feature vector approach, combining handcrafted HOG features, non-handcrafted features extracted using pretrained CNN models, and gender-related features extracted using a CNN, to create a highly robust periocular biometric system for authenticating images captured in various non-ideal scenarios.

In this study, three benchmark databases were chosen to assess the effectiveness of the proposed feature fusion approach: the UBIPr database created by Padole and Proenca (2012), the Ethnic Ocular database created by Tiong et al. (2019), and the Color FERET database created by Phillips et al. (2000). They are used to analyse the effect of glasses, a masked eye portion, and pose variation. Figure 3 shows example images from all three databases.
The UBIPr database contains a total of 10,252 RGB periocular images from 344 subjects, in .bmp format at a resolution of 501 × 401 pixels. Images were captured at distances from 4 m to 8 m to include distance variability, but in a controlled environment. The dataset also contains a metadata file for each image, which includes the coordinates of the canthus points, the center of the iris, and the end points and midpoints of the eyebrow, as well as information about gender, level of pigmentation, angle of gaze, etc. Some images suffer from occlusion by hair or eyeglasses and from pose variation due to a tilted head.

The Ethnic Ocular database consists of two folders, Train_data and Test_data. The Train_data folder contains 70,142 RGB periocular images from 623 subjects and the Test_data folder contains 15,262 RGB periocular images from 411 subjects. All images are stored in .JPG format at a resolution of 80 × 80 pixels. The folders also contain metadata files with information about each subject's name, nationality, ethnicity, gender, profession, and the number of image pairs available. All images were collected in the wild, with variations in pose, appearance and illumination.

The key issue in implementing a periocular-region-based biometric authentication system is extracting a good region of interest. The general approach (Park et al. 2009, 2011) is to use the coordinates of the iris center as a reference point for extracting the periocular region of interest. In the proposed study, the same strategy (use of the iris center) is applied to the Color FERET database, whose metadata provides the iris center coordinates for each image. Taking the iris center as the center of the periocular region of interest, a rectangular region of 64 × 64 pixels is extracted from the face images for the left and right eye respectively. Although this method is widely used in the periocular literature, it does not work well if the subject's eyes are fully or partially closed. Hence, an algorithm suggested by Liu et al. (2017) is used to extract the ROI from images in the UBIPr and Ethnic Ocular datasets. This algorithm uses the canthus points as reference points (Fig. 4). The coordinates of the canthus points are provided in the UBIPr metadata but not in the Ethnic Ocular dataset. The procedure is summarized in Algorithm 1.

Algorithm 1
Input: (x1, y1), the coordinates of the medial canthus point, and (x2, y2), the coordinates of the lateral canthus point.
Step 1: Calculate the Euclidean distance between the medial and lateral canthus points: d = sqrt((x2 − x1)² + (y2 − y1)²).
Step 2: Compute Lp = (Lpx, Lpy), the midpoint of the line connecting the medial and lateral canthus points: Lpx = (x1 + x2)/2, Lpy = (y1 + y2)/2.
Step 3: Compute the top-left corner (x3, y3) and bottom-right corner (x4, y4) of the rectangular ROI, symmetric about Lp and scaled by d: x3 = Lpx − a·d, y3 = Lpy − b·d, x4 = Lpx + a·d, y4 = Lpy + b·d, where a and b are constants whose values are chosen so that the ROI covers the required periocular region.
Step 4: Use the calculated points (x3, y3) and (x4, y4) to extract the rectangular ROI.
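Below is an illustrative Python sketch of Algorithm 1 (the paper's implementation is in MATLAB). The corner formula in Step 3 is a reconstruction, since the published equations did not survive extraction; the function name and the clamping to image bounds are our own additions.

```python
import math

def extract_periocular_roi(image, medial, lateral, a, b):
    """Crop a rectangular periocular ROI given the two canthus points
    (Algorithm 1). `image` is an H x W (x C) NumPy array; `medial` and
    `lateral` are (x, y) canthus coordinates; `a` and `b` are the scale
    constants (the paper reports a=1.2, b=0.8 for Ethnic Ocular and
    a=0.9, b=0.4 for UBIPr)."""
    x1, y1 = medial
    x2, y2 = lateral
    # Step 1: Euclidean distance between the canthus points.
    d = math.hypot(x2 - x1, y2 - y1)
    # Step 2: midpoint of the line joining the canthus points.
    lpx, lpy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Step 3: ROI corners symmetric about the midpoint, scaled by d
    # (reconstructed formula -- the original equations are not in the text).
    x3, y3 = int(lpx - a * d), int(lpy - b * d)
    x4, y4 = int(lpx + a * d), int(lpy + b * d)
    # Step 4: crop, clamping the rectangle to the image bounds.
    h, w = image.shape[:2]
    return image[max(y3, 0):min(y4, h), max(x3, 0):min(x4, w)]
```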
To make the proposed approach capable of handling poor-quality images (images with low illumination and contrast, or captured in dim lighting conditions), the input images are first pre-processed using CLAHE (Zuiderveld 1994), an advancement of the Adaptive Histogram Equalization (AHE) proposed by Pizer et al. (1987). There are two reasons to use CLAHE rather than other histogram equalization approaches:

1. Simple histogram equalization enhances the contrast of an image globally, whereas CLAHE divides the image into a number of tiles and enhances local contrast, highlighting the edges and boundaries in the image.
2. Simple histogram equalization methods generally suffer from over-amplification of noise, whereas CLAHE limits this over-amplification by imposing a clip limit on the image histogram.

To implement CLAHE, two important hyperparameters must be tuned: the clip limit and the number of tiles. The CLAHE algorithm is summarised in Algorithm 2.

Algorithm 2
Step 1: Divide the input image into N × N sub-images (N: number of tiles).
Step 2: Calculate the histogram of each sub-image and its peak intensity value.
Step 3: Choose a threshold between 0 and the peak intensity value to use as the clip limit.
Step 4: For each bin in the histogram: if (histogram bin > clip limit) {clip the histogram, collect and redistribute all the clipped pixels uniformly across all bins, and obtain the renormalized clipped histogram}.
Step 5: Obtain the normalized histogram for each sub-image with enhanced contrast, then combine all neighbouring sub-images using bilinear interpolation to obtain the contrast-enhanced image.
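For illustration, a minimal CLAHE preprocessing step using OpenCV. The paper's pipeline is in MATLAB, whose adapthisteq-style clip limits (0.05/0.04) do not map one-to-one onto OpenCV's clipLimit parameter, so the default value below is only an assumed rough equivalent; the 4 × 4 tile grid follows the paper.

```python
import cv2

def preprocess_clahe(bgr_image, clip_limit=2.0, tiles=(4, 4)):
    """Apply CLAHE to the luminance channel of a colour image.
    `clip_limit` is OpenCV-style and only loosely corresponds to the
    0.05/0.04 MATLAB-style limits reported in the text."""
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tiles)
    l_eq = clahe.apply(l)  # local, clip-limited histogram equalization
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)
```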
In the proposed approach, HOG is used to extract the handcrafted features. HOG is a feature descriptor with very low computational complexity that is robust to illumination changes and small local deformations (since it operates on gradient orientations). HOG was selected in this study for its ability to automatically handle some non-ideal situations. The feature extraction process using HOG is summarized in Algorithm 3.

Algorithm 3
Step 1: Consider an image I from the dataset.
Step 2: Convert the image to grayscale.
Step 3: Calculate the horizontal and vertical gradient values for every pixel of the image using the kernels shown in Fig. 5.
Step 4: Divide the image into adjacent, non-overlapping cells of p × p pixels (p = 4), compute the histogram of gradient orientations for each cell, and bin it into B bins (B = 9).
Step 5: Some pixels may have orientation values close to a bin boundary, so they might contribute to a neighbouring bin as well. To handle this, HOG uses weighted voting with bilinear interpolation, making a fraction of the pixel's gradient magnitude contribute to each of the two bins.
Step 6: Group the cells into blocks containing the gradient histograms.
Step 7: Normalize each block's feature vector to compute the HOG feature vector for the block.
Step 8: Concatenate the features obtained from all the blocks to compute the HOG feature vector for the image.
Step 9: Repeat steps 1 to 8 for all images in the dataset.
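An illustrative extraction of the HOG descriptor with scikit-image, using the cell size (4 × 4 pixels) and bin count (9) from Algorithm 3. The block geometry is an assumption: with 2 × 2-cell blocks and scikit-image's default one-cell block stride, a 64 × 64 ROI yields a 15 × 15 × 2 × 2 × 9 = 8100-component vector, which matches the feature size reported later in the experiments.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog

def extract_hog(image_rgb):
    """HOG feature vector for one periocular ROI (Algorithm 3)."""
    gray = rgb2gray(image_rgb)               # Step 2: grayscale conversion
    return hog(gray,
               orientations=9,               # B = 9 bins (Step 4)
               pixels_per_cell=(4, 4),       # p = 4 (Step 4)
               cells_per_block=(2, 2),       # assumed block geometry
               block_norm='L2-Hys')          # Steps 6-8: block normalisation

# Example: a 64x64 ROI gives an 8100-dimensional descriptor.
roi = np.random.rand(64, 64, 3)              # stand-in for a real ROI
print(extract_hog(roi).shape)                # (8100,)
```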
Deep learning is a technique for obtaining feature sets directly from the input images. In this study, deeper layers of seven off-the-shelf CNN models are used as feature extractors. A brief description of the CNN models used in the proposed study is given below:

AlexNet (Krizhevsky et al. 2012): this CNN is 25 layers deep and consists of 5 convolutional layers.

GoogLeNet (Szegedy et al. 2015): a 144-layer network, also known as Inception V1, consisting of 9 inception modules and 22 convolutional layers. Inception modules are combinations of filters of different sizes connected at the same level; the different filter sizes allow multi-scale feature extraction at a single level.

ResNet-50 and ResNet-101 (He et al. 2016): ResNet-50 is 177 layers deep with 50 convolutional layers, whereas ResNet-101 is 347 layers deep with 101 convolutional layers. They are built from residual blocks that use 'skip connections', bypassing intermediate layers to avoid the vanishing gradient problem.

VGG16 and VGG19 (Simonyan and Zisserman 2014): VGG16 is 41 layers deep with 16 convolutional layers, whereas VGG19 is 47 layers deep with 19 convolutional layers. VGG Net came as an improvement over AlexNet: the large filters of AlexNet were replaced by small filters, making the network deeper and capable of learning more complex features from images.

Inception V3 (Szegedy et al. 2016): consists of 48 layers and is based on factorized convolutions, a technique that reduces the number of parameters without affecting network efficiency and guards the network against overfitting.

In the proposed feature-fusion approach, alongside the handcrafted and non-handcrafted features, the model is explicitly assisted with gender information. To extract gender information, a five-layer CNN classifier is proposed for binary classification (male/female). The detailed layer configuration and sequencing are shown in Fig. 6. This CNN is trained using the gender of the subject rather than the subject's identity; the training labels for UBIPr, Ethnic Ocular and Color FERET are taken from their metadata, which includes gender information. The layer configuration of the proposed CNN architecture is:

1. The first two convolutional layers (Conv 1 and Conv 2) each contain 16 filters of size 3 × 3 with a stride of one pixel, each followed by a max pooling layer (Max Pooling Layer 1 and Max Pooling Layer 2) with a stride of 2 pixels.
2. The next two convolutional layers (Conv 3 and Conv 4) each contain 32 filters of size 3 × 3 with a stride of one pixel, each followed by a max pooling layer (Max Pooling Layer 3 and Max Pooling Layer 4) with a stride of 2 pixels.
3. The last convolutional layer (Conv 5) contains 64 filters of size 3 × 3 with a stride of one pixel, followed by a max pooling layer (Max Pooling Layer 5) with a stride of 2 pixels.

With ReLU as the activation function, the architecture also contains one fully connected layer (with output size 2), one softmax layer and a classification layer to classify images by gender (male/female). The features extracted from convolutional layer 5 (Conv 5) are used as the gender information.

In the proposed work, a multiclass support vector machine classifier (Zhang 2012) is implemented for classification. The basic idea behind the SVM is to find a hyperplane in n-dimensional space that separates the data points into their potential classes; the data points nearest to the hyperplane are known as support vectors. To a great extent, the efficiency of an SVM classifier depends on the kernel function, whose task is to map the input space into a higher-dimensional space. For the proposed study, we used the RBF (Radial Basis Function) kernel.

The methodology followed in this research is illustrated in Fig. 7. Input images from the three databases UBIPr, Ethnic Ocular and Color FERET are used for both training and testing. The first step of the proposed approach is the extraction of the optimal periocular region of interest: the UBIPr and Ethnic Ocular datasets consist of periocular images, whereas the Color FERET dataset contains face images. CLAHE is then used to improve the quality of low-illumination and/or low-contrast images. Three different types of features, handcrafted (HOG), non-handcrafted (using pretrained CNNs) and gender-related (using the five-layer CNN), are extracted from the periocular region. To combine them, the three feature vectors are concatenated to create an ensemble feature vector. Let F_h, F_n and F_g denote the feature vectors corresponding to the handcrafted, non-handcrafted and gender features. The ensemble feature vector obtained by concatenating the three feature vectors is

F_ensemble = concat[F_h, F_n, F_g]    (6)

The final feature vector is fed into the support vector machine classifier to classify the images and predict the class labels.

This work is implemented in MATLAB R2019a. The system used is an Intel Core i7-8750H processor @ 2.2 GHz with 8 GB DDR4 RAM, running Windows 10. Three different experiments were conducted on the three datasets; the dataset statistics are provided in Table 1. The periocular ROIs are extracted using the algorithms described in Sect. 3.2. The values of the constants a and b in Algorithm 1 were determined through empirical evaluation so that the selected periocular region covers all relevant periocular information and excludes irrelevant regions such as the nose and forehead: for the Ethnic Ocular dataset, a = 1.2 and b = 0.8, and for the UBIPr dataset, a = 0.9 and b = 0.4. The images are then preprocessed using the CLAHE algorithm, whose two hyperparameters, the clip limit and the number of tiles, were also tuned empirically: the number of tiles used for all three databases is 4 × 4, while the clip limit is 0.05 for UBIPr and Ethnic Ocular and 0.04 for Color FERET. Figure 8 illustrates a comparison of CLAHE and simple AHE on example images from the three databases. As shown in Fig. 8, CLAHE prominently highlights the micro-features of the images, which automatically enhances the discrimination ability of the proposed approach.

After preprocessing, the three types of features are extracted. For handcrafted feature extraction, HOG is implemented, yielding a handcrafted feature vector of 8100 components for each input image. For non-handcrafted feature extraction, seven pretrained CNNs are used; after analysing the features extracted from various initial, middle and final layers of the pretrained networks, the features from the penultimate layers were selected, as the penultimate layers were found to contain the most relevant information for distinguishing class labels. Table 2 shows the non-handcrafted feature vector sizes obtained using the different pretrained networks. For semantic (gender-related) feature extraction, the five-layer CNN proposed in Sect. 3.6 is implemented and trained on images from the three databases. After examining features from different layers of this CNN architecture, the features extracted from convolutional layer 5 (Conv 5) were selected as the gender information for the proposed feature fusion work.
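A PyTorch sketch of the five-layer gender CNN described above (the authors worked in MATLAB). The convolution padding and the pooling window size are assumptions, chosen so that for a 64 × 64 input the Conv 5 activation flattens to 64 · 4 · 4 = 1024 values, consistent with the 1024-component gender feature reported below.

```python
import torch
import torch.nn as nn

class GenderCNN(nn.Module):
    """Five-layer CNN for male/female classification; the Conv 5
    activations double as the gender feature vector."""
    def __init__(self):
        super().__init__()
        def block(cin, cout):
            # 3x3 conv with stride 1 (per the paper); 'same' padding and
            # the 2x2 pooling window are assumptions, pool stride 2 per
            # the paper.
            return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=1, padding=1),
                                 nn.ReLU(), nn.MaxPool2d(2, stride=2))
        self.conv1, self.conv2 = block(3, 16), block(16, 16)
        self.conv3, self.conv4 = block(16, 32), block(32, 32)
        self.conv5 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=1, padding=1),
                                   nn.ReLU())
        self.pool5 = nn.MaxPool2d(2, stride=2)
        self.fc = nn.Linear(64 * 2 * 2, 2)   # fully connected, output size 2

    def forward(self, x):                    # x: (N, 3, 64, 64)
        f5 = self.conv5(self.conv4(self.conv3(self.conv2(self.conv1(x)))))
        gender_features = f5.flatten(1)      # (N, 1024) Conv 5 features
        logits = self.fc(self.pool5(f5).flatten(1))
        return logits, gender_features

model = GenderCNN()
logits, feats = model(torch.randn(2, 3, 64, 64))
print(feats.shape)                           # torch.Size([2, 1024])
```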
This gender feature vector contains 1024 components for each input image. The performance of the proposed CNN architecture for gender classification on the three databases is provided in Table 3. After evaluating many hyperparameter combinations, including learning rates (0.0001, 0.0003, 0.0005 and 0.0008), mini-batch sizes (10 to 64) and epochs (2 to 10), the final parameter specification used to train the proposed CNN model for gender-related feature extraction, together with the hyperparameters used to train the pretrained CNN models, is given in Table 4. SGDM is used as the learning algorithm: it is computationally fast and, because it updates parameters frequently, it converges quickly. The momentum value lies between 0 and 1 and helps the optimizer escape local minima. A large momentum provides speed and quick convergence, but then the learning rate should be small; accordingly, good results were obtained in this study with small learning rates such as 0.0001, 0.0003 and 0.0008. If both the momentum and the learning rate are large, the algorithm can overshoot minima and its performance degrades. L2 regularization, also known as weight decay, reduces the possibility of overfitting by constraining the magnitudes of the weights and biases.

All three feature vectors (handcrafted, non-handcrafted and gender) are then concatenated to create the final feature vector; the final feature vector sizes after concatenation are provided in Table 5. To evaluate the efficiency of the proposed ensemble feature vector, the multiclass SVM classifier is applied; the best classification results were obtained with the RBF kernel.

Since the proposed biometric system is a closed-set identification system, its performance is evaluated using two appropriate metrics: Rank-1 recognition accuracy and the Cumulative Match Characteristic (CMC) curve. Rank-1 accuracy measures how often the label predicted by the system matches the ground-truth label, and is defined as

Rank-1 recognition accuracy = sum(predicted class label = true class label) / total class labels × 100    (7)

The CMC curve is used to assess the probability of identification by the system at various ranks; it plots the number of correct identifications occurring within the top k matches (ranks).
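To make the fusion and evaluation steps concrete, the sketch below concatenates the three feature blocks (Eq. 6), trains an RBF-kernel SVM, and scores Rank-1 accuracy and the CMC curve (Eq. 7). It uses random stand-in feature arrays with the dimensionalities discussed above so that it runs standalone; in the real pipeline these would come from HOG, a pretrained CNN's penultimate layer, and the gender CNN.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_tr, n_te, n_id = 200, 50, 10               # stand-in dataset sizes

# Stand-ins for the three feature blocks (HOG: 8100-D, pretrained CNN
# penultimate layer, e.g. 2048-D for ResNet-50, gender CNN: 1024-D).
def fake(n, dim):
    return rng.normal(size=(n, dim))
F_h_tr, F_n_tr, F_g_tr = fake(n_tr, 8100), fake(n_tr, 2048), fake(n_tr, 1024)
F_h_te, F_n_te, F_g_te = fake(n_te, 8100), fake(n_te, 2048), fake(n_te, 1024)
y_tr = rng.integers(0, n_id, n_tr)
y_te = rng.integers(0, n_id, n_te)

def fuse(fh, fn, fg):
    # Eq. (6): ensemble feature vector = concat[F_h, F_n, F_g]
    return np.concatenate([fh, fn, fg], axis=1)

svm = SVC(kernel='rbf', probability=True)    # multiclass RBF-kernel SVM
svm.fit(fuse(F_h_tr, F_n_tr, F_g_tr), y_tr)

# Rank-k identification rates for the CMC curve; cmc[0] is Eq. (7).
proba = svm.predict_proba(fuse(F_h_te, F_n_te, F_g_te))
order = np.argsort(-proba, axis=1)           # classes, best match first
true_col = np.searchsorted(svm.classes_, y_te)
hits = order == true_col[:, None]            # True where the true class sits
cmc = hits.cumsum(axis=1).mean(axis=0)
print('Rank-1 accuracy: %.2f%%' % (100 * cmc[0]))
```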
Nowadays, a large part of the population wears eyeglasses, which may hide part of the periocular region. Considering this fact, the first experiment assesses the accuracy of the proposed approach when the subject is wearing eyeglasses. The Rank-1 recognition accuracy obtained by the proposed approach on the UBIPr, Ethnic Ocular and Color FERET datasets is shown in Table 6, with the best results highlighted in bold italics; a comparison of Rank-1 to Rank-10 recognition accuracy using the CMC curve is illustrated in Fig. 9.

The second experiment analyses the accuracy of a 'true' periocular biometric system that considers only the pre-eye-orbital region and disregards all eye components. This is of interest when the subject's eyes are closed or intentionally hidden. To evaluate this, the experiment uses images in which the eye portion, including the sclera, iris and eye shape, is manually masked. The Rank-1 recognition accuracy obtained on the UBIPr, Ethnic Ocular and Color FERET datasets is shown in Table 7, with the best results highlighted in bold italics; a comparison of Rank-1 to Rank-10 recognition accuracy using the CMC curve is illustrated in Fig. 10.

The third experiment assesses the efficiency of the proposed approach when the images suffer from pose variation. Here, training features are extracted only from frontal images (0 degrees of pose variation), whereas testing uses images with various pose variations. The recognition accuracy obtained by the proposed approach on the UBIPr and Color FERET datasets is shown in Table 8, with the best results highlighted in bold italics; a comparison of Rank-1 to Rank-10 recognition accuracy using the CMC curve is illustrated in Fig. 11.

For a fair comparison with pre-existing approaches, the recognition accuracy of the proposed feature-fusion approach is also evaluated on the complete UBIPr, Ethnic Ocular and Color FERET databases, and is shown in Table 9 with the best results highlighted in bold italics; the corresponding CMC curves from Rank 1 to Rank 10 are illustrated in Fig. 12. Table 10 compares the proposed model with state-of-the-art models, using the best results obtained by the proposed feature-fusion approach on the complete databases. From the results in Table 10, it is observed that the proposed approach outperforms the state-of-the-art models in periocular authentication. This suggests that combining handcrafted and non-handcrafted features with semantic information contributes useful information and improves the overall discriminative capability of the periocular system.

Some important findings from the experiments are enumerated below:

1. The recognition accuracy in the first two experiments reveals that the proposed approach performs worse on the Ethnic Ocular and Color FERET datasets than on UBIPr. The likely reasons are the high degree of randomness in the Ethnic Ocular images and the high pose variation and varying ages of subjects in the Color FERET images.
2. For periocular matching, the combination of handcrafted features, non-handcrafted features and semantic information obtains remarkable recognition accuracy in all non-ideal scenarios and with all pretrained CNNs.
3. Eye shape, corner points and similar structures are critical components for periocular matching, since masking the eye region significantly degrades recognition accuracy, whereas the presence of eyeglasses has little effect on system performance.
4. The combination of handcrafted and non-handcrafted features neutralizes the effect of pose variation to a great extent and yields good results compared with other methods in the literature.
5. From Figs. 9, 10, 11 and 12, it can be seen that the proposed fusion approach outperforms both handcrafted and non-handcrafted features alone in every scenario, from Rank-1 to Rank-10 recognition accuracy.

To cope with the COVID-19 pandemic, this research proposes a robust periocular-region-based biometric authentication system. The proposed system uses handcrafted and non-handcrafted features assisted with semantic information for image matching, and achieves remarkable improvement in recognition accuracy for images captured in three non-ideal scenarios: subjects wearing eyeglasses (which can partially occlude the periocular region), masking of the eye region (the effect of partial or full eye closure), and pose variation (images with a tilted head).
The experimental results provide strong support for the proposed feature-level fusion approach which, to the best of our knowledge, is the first of its kind in the area of periocular biometrics. Future work aims to improve the performance of the proposed approach by adding further semantic information and binding it with an attention mechanism over critical components of the periocular region within a single learning model.

References:
Eye detection by complex filtering for periocular recognition
A survey on periocular biometrics research
A novel phase-intensive local pattern for periocular recognition under visible spectrum
Fast periocular authentication in handheld devices with reduced phase intensive local pattern
On using periocular biometric for gender classification in the wild
A novel race classification method based on periocular features fusion
Deep residual learning for image recognition
Human and machine performance on periocular biometrics under near-infrared light and visible light
Multisource deep transfer learning for cross-sensor biometrics
On identification from periocular region utilizing SIFT and SURF
How image degradations affect deep CNN based face recognition
ImageNet classification with deep convolutional neural networks
Periocular region-based age-invariant face recognition using local binary pattern
Periocular biometrics: a survey
A novel eyebrow segmentation and eyebrow shape-based identification
Ocular recognition for blinking eyes
LBP-based periocular recognition on challenging face datasets
Ocular biometrics: a survey of modalities and fusion approaches
Periocular recognition: analysis of performance degradation factors
Periocular biometrics in the visible spectrum: a feasibility study
Periocular biometrics in the visible spectrum
The FERET evaluation methodology for face recognition algorithms
Adaptive histogram equalization and its variations
Segmenting the periocular region using a hierarchical graphical model fed by texture/shape information and geometrical constraints
Fusion iris and periocular recognitions in noncooperative environment
Binarized statistical features for improved iris and periocular recognition in visible spectrum
Gender prediction from mobile ocular images: a feasibility study
Very deep convolutional networks for large-scale image recognition
Going deeper with convolutions
Rethinking the inception architecture for computer vision
Periocular recognition in the wild with orthogonal combination of local binary coded pattern in dual-stream convolutional neural network
Periocular region appearance cues for biometric identification
Multiresolution local binary pattern variants-based texture feature extraction techniques for efficient classification of microscopic images of hardwood species
Support vector machine classification algorithm and its application
Accurate periocular recognition under less constrained environment using semantics-assisted convolutional neural network
Improving periocular recognition by explicit attention to critical regions in deep neural network
Contrast limited adaptive histogram equalization. Graphics Gems IV