key: cord-1023231-6ksm9c95
authors: Krithika, L. B.; Priya, G. G. Lakshmi
title: Graph based feature extraction and hybrid classification approach for facial expression recognition
date: 2020-07-14
journal: J Ambient Intell Humaniz Comput
DOI: 10.1007/s12652-020-02311-5
sha: 6185e577407f5bb21e1709bfbae75539bd927c52
doc_id: 1023231
cord_uid: 6ksm9c95

Abstract

Facial expression recognition currently attracts considerable research attention in image analysis. Several algorithms have been proposed for recognizing facial expressions, but they suffer from issues such as inaccurate recognition. To overcome these issues, a Graph-based Feature Extraction and Hybrid Classification Approach (GFE-HCA) is proposed for recognizing facial expressions. The main motive of this work is to recognize human emotions effectively. Initially, the face is detected using the Viola-Jones algorithm. Subsequently, the facial parts, namely the right eye, left eye, nose and mouth, are extracted from the detected face image. The edge-based invariant transform is used to extract features from these facial parts. The dimensions of the edge-based invariant features are then optimized using a Weighted Visibility Graph, which produces graph-based features. In addition, shape appearance-based features are extracted from the facial parts. From these extracted features, facial expressions are recognized and classified using a Self-Organizing Map based Neural Network Classifier. The performance of the GFE-HCA approach is evaluated and compared with existing techniques, and the superiority of the proposed approach is demonstrated by its increased recognition rate.

1 Introduction

Emotion is a distinctive human trait; facial expressions are widely used in non-verbal communication and play a significant role in recognizing emotions (Kim et al. 2019). After the tone of voice, facial expressions are the most important channel of emotional communication. They allow a person to express their emotional state and serve as an indicator of feelings. Therefore, information from facial expressions can be used in automatic emotion recognition systems. Depending on the facial expressions, emotional states are categorized into six classes: joy, anger, surprise, disgust, neutral and fear (Ratliff and Patterson 2008). Human-Machine Interaction (HMI) systems have become increasingly important owing to the significance of facial expression recognition, and various machine learning and computer vision algorithms have been introduced for designing HMI (Fuentes et al. 2017; Samara et al. 2019). Moreover, there are many annotated face databases with basic expressions portrayed by human subjects, and some faces are captured in uncontrolled settings. Facial expression recognition approaches help to classify the faces in a given image or sequence of images (Mollahosseini et al. 2016). Even though several conventional techniques are used for detecting and classifying facial expressions, they still fail to do so in a flexible manner (Mollahosseini et al. 2016). A major reason is that assembling an accurate training dataset, particularly for emotions such as fear or sadness, remains a challenge.
Based on the problems identified above, this research paper has the following objectives. To effectively detect and classify facial expressions, a Graph-based Feature Extraction and Hybrid Classification Approach (GFE-HCA) is proposed. The Viola-Jones algorithm is used to detect the face image (Jensen 2008). The features of the facial parts are extracted using the edge-based invariant feature transform method, and the dimensions of the extracted features are then optimized by applying the weighted visibility graph (Xie and Hu 2017). Similarly, shape-based features are extracted from the facial shape appearances. Finally, the facial expression is recognized and classified by the proposed Self-Organizing Map based neural network classifier.

The subsequent sections of this work are organized as follows: Sect. 2 analyses the traditional methods used for detecting and recognizing facial expressions. Section 3 elucidates the proposed method for accurate facial expression recognition. Section 4 assesses the results of both traditional and proposed systems with various performance metrics. Finally, Sect. 5 concludes the work.

2 Related work

This section discusses the traditional approaches used for recognizing facial expressions and analyses the merits and demerits of the existing techniques. It is organized into three sub-divisions: pre-processing, feature extraction and classification. The techniques and algorithms used in each sub-division are discussed as follows.

2.1 Pre-processing

The nature of axis-symmetric faces was considered, and a framework was designed for producing an approximately axis-symmetrical virtual dictionary to increase the precision of face recognition (Xu et al. 2016). The system was simple to implement and mathematically tractable, and it offered better results than other pre-processing approaches. A simple solution for emotion recognition was proposed using an ensemble of convolutional neural networks together with image pre-processing techniques (Lopes et al. 2017); it was usable in real-time applications and required little time. A novel system for recognizing facial expressions, based on ensembles of descriptors that used various pre-processing approaches, was proposed. The performance of this system was evaluated on two distinct datasets, FERET and LFW; it performed well on both and attained increased performance rates with higher accuracy. The stationary wavelet transform was used to extract features for recognizing facial expressions (Qayyum et al. 2017). Here, pre-processing was done using normalization and histogram equalization to detect the face more accurately. With this pre-processing step, the facial parts were determined effectively, which helped to extract features efficiently; classification was then carried out to categorize the facial expressions. The results showed an increased average recognition rate on different datasets and better recognition accuracy than traditional techniques. A novel ensemble of descriptors was also introduced to recognize facial expressions based on the Patterns of Oriented Edge Magnitudes (POEM) descriptor.
Here, various ensembles were developed using the pre-processing approaches. This approach was evaluated on different datasets, such as LFW and FERET, and performed well on both with accurate results. In summary, this sub-section discussed different traditional pre-processing techniques for different facial datasets. Their main benefits are a high average recognition rate and low processing time, but improving the accuracy remains a constant challenge.

2.2 Feature extraction

A novel approach was adopted for analyzing facial expressions by discovering common information shared by several expressions (Zhong et al. 2014). The overall expressions and the specific expressions were distinguished through common and specific patches. Two-stage multi-task sparse learning was used to locate the distinct patches: dominant patches were found for all the expressions, and specific patches were identified in individual expressions by combining the related facial expression recognition and face verification tasks. The main advantage of this method was its patch learning, which provided better performance. An approach based on the multi-level Haar wavelet was used to extract appearance features from the salient face regions at different scales (Goyani and Patel 2017). Initially, the approach segmented the most informative geometric elements, such as the eyes, eyebrows and mouth, using the Viola-Jones cascade object detector. The Haar features of the segmented elements were then computed, and classification was performed using a one-vs-all logistic regression model. The advantages of Haar features are easy computation and an effective low-dimensional representation of the signals; however, it was difficult to recognize expressions in dynamic images. A new feature extraction framework was presented that represents the variations in facial expression as a linear mixture of localized basis functions (Sariyanidi et al. 2017). Here, a sparse linear model was trained with Gabor phase shifts computed from facial video, yielding the linear basis functions of the proposed framework. This framework addressed the difficulties of generalization by recognizing both posed and spontaneous micro-expressions with the same learned parameters. A novel approach was introduced for extracting the salient areas of the face with a Self-Organizing Map based Neural Network classifier (Liu et al. 2017). First, salient areas at similar positions on the face were normalized. On these salient areas, features such as LBP and HOG were extracted, and Principal Component Analysis was used to reduce the dimension of the fused features. Peak expression frames were selected with the help of a salient-area definitude method, and the facial expressions were classified using numerous classifiers. The normalization of salient regions aligned the specific areas across several expressions. However, recognition failed with inaccurate landmarks, and there was no improvement in recognition with limited image data. A novel approach was presented to recognize facial expressions in three steps: feature extraction, feature optimization and emotion recognition (Mistry et al. 2016).
The face representations were initially extracted using modified Local Binary Patterns. The important and discriminative features were then selected using an embedded Particle Swarm Optimization (PSO) algorithm, and different classifiers were used for recognizing the emotions happy, sad, anger, surprise, fear, disgust and neutral. This approach was assessed using the extended Cohn-Kanade and MMI databases; its advantages are low computational cost and fast convergence. A novel approach was introduced to discover the specificity of expression variation in the face. The specificity of the expression was represented through triplet-wise expression recognition based on Action Units (AUs) and patch weight optimization. To acquire better generalization, the active AUs present in testing samples were detected using a sparse representation-based approach. The accuracy was enhanced in this approach, and the cross-database performance was better; however, in real-time applications it was difficult for cross-database recognition to acquire good results. In summary, this sub-section discussed different traditional feature extraction techniques. Their drawbacks include difficulty in acquiring better results, no improvement in recognition with limited image data, and difficulty in recognizing expressions in dynamic images.

2.3 Classification

An automatic facial expression system was developed for extracting salient data from video sequences (Fang et al. 2014). The framework explored the parametric space and was tested with six machine learning techniques; an entire collection of machine learning approaches was applied to the specific task of expression recognition, and these approaches performed well with accurate results. A face identification system was proposed based on a set of three face detectors (Yu and Zhang 2015). The Face Expression Recognition (FER) challenge offered a large dataset for pre-training, each CNN was initialized randomly, and the multiple-CNN model included two schemes for learning the ensemble weights of the network responses. A dataset for standard facial expressions was built by leveraging social images, and a deep model was trained on this standard database (Peng et al. 2016). Specific keywords were used for image search over socially labelled images, mislabeled images were removed by junk-image cleaning, and the spontaneous expressions were recognized by a deep Convolutional Neural Network. For emotion recognition, a novel technique was proposed in which recognition is performed on a single image frame using a combination of geometric and appearance features with Support Vector Machine classification (Ghimire et al. 2017a, b). Appearance features for facial expression recognition were calculated by dividing the face region into a regular grid, and geometric features were extracted from the corresponding areas; the technique was evaluated on the Cohn-Kanade dataset. The Facial Expression Recognition and Analysis challenge (FERA) was introduced to detect Action Units and their intensity on a larger corpus of data, for which a broad set of videos comprising nine views was generated (Valstar et al. 2017).
A FACS-annotated dataset was used for estimating the expressions across several camera views, including challenging head poses. The problems of estimating expression intensity and of detecting head poses were addressed in this method, and the geometric features used for the baseline results improved performance on these specific challenges. A real-time facial expression recognition system on smartphones was also presented (Suk and Prabhakaran 2014). The system returns the recognized expression from SVM classifiers operating on dynamic features, and the results show that the system was effective on mobile devices in terms of both speed and accuracy.

The above study conveys that the existing image processing techniques used for emotion recognition suffer from problems such as inaccurate recognition of the emotion and low efficiency, which degrade the performance in categorizing facial expressions. To solve these issues, the proposed work aims to introduce an efficient facial expression recognition system using improved feature extraction and classification techniques.

3 Proposed methodology

This section describes the overall flow of the GFE-HCA system for identifying facial images from a given video sequence. Figure 1 represents the flow of the GFE-HCA system: from the input facial image, the foreground (face) region is detected using the Viola-Jones algorithm (Jensen 2008), which detects the face region more accurately and faster than other techniques. The Viola-Jones algorithm is therefore used for extracting the facial regions, followed by the extraction of facial parts such as the right eye, left eye, nose and mouth, as depicted in Fig. 2.

Fig. 1 Overall flow of the proposed system

The steps performed after the detection of facial parts are as follows.

• The shape appearance features are extracted with respect to the shape of the facial parts, and edge-based feature extraction is performed by means of the edge-based invariant transform.
• Using the extracted edge-based invariant transform features, an eigenvalue-based Weighted Visibility Graph is constructed. From this graph, the following features are extracted: link density, average closeness centrality, graph entropy, average distribution weight of the graph and average degree of the graph.
• The extracted shape-based and graph-based features are classified with the help of the Self-Organizing Map based Neural Network Classifier. Finally, a performance analysis is carried out to prove the efficiency of the proposed method.

3.1 Edge-based invariant transform

An edge is a point in a digital image at which the intensity changes abruptly. The parts of the facial image are processed using the edge-based invariant transform. The invariant features are extracted here to capture the uniqueness among the emotions while suppressing the information common to all images. Algorithm I summarizes the edge-based invariant transform. First, the edge regions of the face parts are predicted by applying the Canny edge detector (Ramadass et al. 2018). The Canny edge detector is one of the most prevalent algorithms for edge detection: it requires only minimal numerical computation and satisfies specifications such as good detection, low sensitivity to noise, good localization, speed and efficiency. The filtered coefficients of each part are then estimated, and the corners are predicted from these coefficients for each individual face part. The invariant features are estimated from the detected corners, from which the area, pixels and points are finally identified. The key points in the image are computed by Harris point detection. The invariant transform depends on the appearance of the object at a specified point of interest and is invariant to image scaling and rotation; the invariant features capture the variations among different emotions.
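As an illustration only (the authors implemented their system in MATLAB, and this is not the paper's Algorithm I), the following Python sketch approximates the detection and edge/corner steps described above with OpenCV: Viola-Jones cascades for the face and eyes (nose and mouth cascades are not bundled with OpenCV and are omitted), a Canny edge map, and Harris corner key points. All thresholds and parameters are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: Viola-Jones face/eye detection followed by Canny edges and
# Harris corners, approximating the pre-processing and edge-based steps.
import cv2
import numpy as np

def detect_parts_and_edges(image_path):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Viola-Jones detectors shipped with OpenCV.
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    eye_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_eye.xml")

    results = []
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        face = gray[y:y + h, x:x + w]
        eyes = eye_cascade.detectMultiScale(face)

        # Canny edge map of the facial region (thresholds are illustrative).
        edges = cv2.Canny(face, 100, 200)

        # Harris corner response used as the key-point detector.
        corners = cv2.cornerHarris(np.float32(face), 2, 3, 0.04)
        keypoints = np.argwhere(corners > 0.01 * corners.max())

        results.append({"face_box": (x, y, w, h), "eyes": eyes,
                        "edges": edges, "keypoints": keypoints})
    return results
```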
3.2 Weighted visibility graph construction

The dimensions of the extracted edge-based invariant features are optimized using a weighted visibility graph. The visibility graph can characterize the edge-based invariant features as graph-based features, since it inherits the dynamical properties of the feature data from which it is created. The resulting network is then used to acquire valuable information about the features, i.e., from the graph one can obtain graph-based features such as link density, average closeness centrality, graph entropy, average distribution weight of the graph and average degree of the graph. The weighted visibility graph for the edge-based invariant features is constructed by considering a graph

$G = (N_{ij}, E_{ij})$    (1)

where $N_{ij}$ represents the feature data, considered as nodes, and $E_{ij}$ represents the edges formed between the feature data. The natural visibility graph algorithm is used for finding the links among the nodes of the constructed weighted visibility graph (Zhu et al. 2015). This visibility graph is based on the idea of a Euclidean plane in which every vertex denotes the location of a point, and a connection between two nodes is possible only if there is visibility between them, as depicted in Fig. 3. An edge (and its weight) between a pair of nodes exists if the visibility condition given in Eq. (2) is satisfied:

$p_{t_z} < p_{t_y} + (p_{t_x} - p_{t_y})\,\dfrac{t_y - t_z}{t_y - t_x}$    (2)

where $p_x = p_{t_x}$ and $p_y = p_{t_y}$ are the sample data points, and $x < z < y$ are indices of the feature values. Here $p_{t_x}$ denotes the feature value at location $t_x$, $p_{t_y}$ the feature value at location $t_y$, and $p_{t_z}$ the feature value at location $t_z$; $t_x$ and $t_y$ denote time events, and $t_z$ is some event between them, i.e., $t_x < t_z < t_y$.

After this, the edge weight for each link formed between two nodes is determined. Preserving the weight information helps to obtain robust results from a complex network, and the weighted network plays an important role in distinguishing weak and potentially less significant edges. The edge-invariant data can thus be represented by the weighted visibility graph, denoted $G(V_{ij}, E_{ij}, w_{ij})$, where $w_{ij}: E_{ij} \to \mathbb{R}$ is the weight function. Every edge of the graph is directional, as a connection between node $n_x = p_x$ and node $n_y = p_y$ is considered to have direction from $n_x$ to $n_y$, where $x < y$. The absolute value of the edge weight is calculated as

$w_{xy} = \left|\arctan\!\left(\dfrac{p_{t_y} - p_{t_x}}{t_y - t_x}\right)\right|$    (3)

where $w_{xy}$ denotes the weight of the edge between nodes $n_x$ and $n_y$. Every edge weight is expressed in radians, and the eigenvalue transforms the weight into linear form. The arctan is an inverse trigonometric function that helps to capture variations in the data. Eigenvalues are the special set of values associated with a linear system, where eigenvectors and eigenvalues are used to transform a given matrix; here, eigenvalues are used for weight updating.
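A minimal sketch of this construction, assuming Python with networkx and treating the feature series as values sampled at consecutive integer indices, follows; the visibility test implements Eq. (2) and the arctan weight Eq. (3) as reconstructed above.

```python
# Weighted natural visibility graph over a 1-D feature series.
import math
import networkx as nx

def weighted_visibility_graph(p):
    """p: list of feature values p[t] sampled at integer 'times' t."""
    G = nx.DiGraph()
    G.add_nodes_from(range(len(p)))
    for x in range(len(p)):
        for y in range(x + 1, len(p)):
            # Natural visibility: every intermediate sample must lie below
            # the straight line joining (x, p[x]) and (y, p[y]) -- Eq. (2).
            visible = all(
                p[z] < p[y] + (p[x] - p[y]) * (y - z) / (y - x)
                for z in range(x + 1, y))
            if visible:
                # Absolute arctangent of the slope as edge weight -- Eq. (3).
                w = abs(math.atan((p[y] - p[x]) / (y - x)))
                G.add_edge(x, y, weight=w)
    return G
```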
3.3 Graph-based feature extraction

Next, the features of the graph, namely link density, average closeness centrality, graph entropy, average distribution weight and average degree of the graph, are extracted. Algorithm II summarizes this graph-based feature extraction.

3.4 Shape-based feature extraction

The graph-based features were discussed in the previous sub-section. Next, the shape-based features are extracted from the shape appearance of the facial parts, namely the right eye, left eye, nose and mouth. Shape-based extraction mainly focuses on the height and width of each facial part. Initially, the height and width of each facial part are calculated over the total number of face parts in the facial image, and the shape of the image is thereby extracted. Algorithm III summarizes this shape-based feature extraction method.

3.5 Classification using the Self-Organizing Map based Neural Network

After feature extraction, the features are classified using the Self-Organizing Map based Neural Network Classifier. A neural network is generally composed of highly interconnected components that process information through their dynamic state responses to external inputs. It is widely used in pattern recognition because of its capability to generalize and to respond to unexpected input patterns. The Self-Organizing Map is a feedforward structure with a single computational layer of neurons arranged in rows and columns. In the training phase, the mean threshold for each feature of a class is first estimated, and the features belonging to the same class are processed. The pairwise distance is computed for each feature set, after which the density of the class, depending on the distance among the features, is evaluated. Based on the estimated distance, a rank for each class is identified; for that updated rank, the gradient is computed, and based on the gradient, the centroid computations are made. In the testing phase, the test features are processed by the same threshold, pairwise distance and centroid computations, and the positions are updated based on the centroid. In the validation phase, the features updated in the training and testing phases are applied, and the expected class for the test features is predicted by the trained network. Algorithm IV summarizes this classification procedure.
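The following sketch computes the five graph features named above from the weighted visibility graph built earlier, using networkx. The degree-distribution entropy and the interpretation of "average distribution weight" as the mean edge weight are assumptions, since the paper's Algorithm II is not reproduced here.

```python
# Graph-based features from the weighted visibility graph (assumptions noted).
import math
import networkx as nx

def graph_features(G):
    U = G.to_undirected()
    n, m = U.number_of_nodes(), U.number_of_edges()

    # Link density: fraction of possible edges that are present.
    link_density = 2.0 * m / (n * (n - 1))

    # Average closeness centrality over all nodes.
    closeness = nx.closeness_centrality(U)
    avg_closeness = sum(closeness.values()) / n

    # Average degree of the graph.
    avg_degree = 2.0 * m / n

    # Degree-distribution (Shannon) entropy -- one common choice, assumed.
    degs = [d for _, d in U.degree()]
    probs = [degs.count(k) / n for k in set(degs)]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)

    # Mean edge weight, standing in for "average distribution weight".
    weights = [d["weight"] for _, _, d in U.edges(data=True)]
    avg_weight = sum(weights) / m if m else 0.0

    return [link_density, avg_closeness, entropy, avg_weight, avg_degree]
```

Similarly, since the threshold/rank/centroid steps of Algorithm IV are not fully specified here, the sketch below shows only a generic SOM-based classifier using the third-party MiniSom package: each map neuron is labelled with the majority class of the training samples it wins, and a test sample takes the label of its winning neuron. This approximates, rather than reproduces, the authors' classifier.

```python
# Generic SOM classifier sketch (MiniSom assumed installed: pip install minisom).
import numpy as np
from collections import Counter, defaultdict
from minisom import MiniSom

class SOMClassifier:
    def __init__(self, grid=7, n_iter=2000):
        self.grid, self.n_iter = grid, n_iter

    def fit(self, X, y):
        self.som = MiniSom(self.grid, self.grid, X.shape[1],
                           sigma=1.0, learning_rate=0.5, random_seed=0)
        self.som.train_random(X, self.n_iter)
        # Label each neuron with the majority class of the samples it wins
        # (a centroid-style class assignment, assumed here).
        votes = defaultdict(Counter)
        for xi, yi in zip(X, y):
            votes[self.som.winner(xi)][yi] += 1
        self.labels = {k: v.most_common(1)[0][0] for k, v in votes.items()}
        return self

    def predict(self, X):
        # Unlabelled neurons fall back to -1 ("unknown") in this sketch.
        return np.array([self.labels.get(self.som.winner(xi), -1)
                         for xi in X])
```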
3.6 Performance measures

Measures such as sensitivity and specificity are used for evaluating the classification performance. Sensitivity defines the capability of the classifier to correctly recognize positive cases, while specificity defines its capability to correctly reject negative cases. They are given by

$\text{Sensitivity} = \dfrac{TP}{TP + FN}$    (4)

$\text{Specificity} = \dfrac{TN}{TN + FP}$    (5)

where True Positive (TP) is the number of positive cases (emotions) that are detected correctly, True Negative (TN) is the number of negative cases (emotions) that are correctly rejected, False Positive (FP) is the number of negative cases that are falsely detected as positive, and False Negative (FN) is the number of positive cases that are falsely rejected. Depending on these quantities, the accuracy of the emotion recognition system is determined. Accuracy is defined as the closeness of a measured value to the standard value and is calculated as

$\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$    (6)

Precision is defined as the proportion of true positive outcomes among all positive predictions:

$\text{Precision} = \dfrac{TP}{TP + FP}$    (7)

4 Results and discussion

The implementation outcomes of the GFE-HCA system are presented using measures such as sensitivity, specificity, accuracy, precision and recall. The proposed method is evaluated on the MMI dataset (MMI Facial Expression Database 2016), which consists of 2900 videos containing high-resolution images of 75 subjects with different facial expressions; the dataset was developed to address the challenges of analyzing automatic human behaviour. From the MMI dataset, 175 facial expression sequences covering the emotions of 33 subjects were selected, each labelled with one of the emotions anger, disgust, fear, sad and happy. Furthermore, the improvement of the proposed method is demonstrated by comparing it with the existing emotion recognition technique considered in this paper (Ghimire et al. 2017a, b). The proposed work is implemented using MATLAB R2017b on an Intel Core i5-4200U system configured with 8 GB RAM and a 2.5 GHz clock speed; this tool was chosen for its wide array of functional libraries.

The performance metrics sensitivity, specificity, precision, recall and accuracy are used for assessing the proposed methodology on the MMI dataset. The confusion matrix of the proposed system for the five emotions is enumerated in Table 1, and Fig. 4 shows that the proposed approach offers improved results over the existing approach. Cross-validation is used in the performance analysis to ensure that overfitting does not happen. K-fold cross-validation is employed, where the data set is partitioned into K equal parts. During the first run, the first part is kept untouched and the remaining K - 1 parts are used for model training; testing is then done with the untouched part. Similarly, in each of the K runs one partition is kept untouched and the other parts are used for training, after which the untouched part is used for evaluation. This type of validation ensures the entire data set is equally validated. The average accuracy of our proposed work is above 96%.

Fig. 5 a Cross-validation of sensitivity using 10 folds. b Cross-validation of specificity using 10 folds. c Cross-validation of precision using 10 folds. d Cross-validation of recall using 10 folds. e Cross-validation of accuracy using 10 folds

From Fig. 4, the comparative results show that the proposed approach offers increased sensitivity for the 'anger', 'happy' and 'sad' emotions, whereas it offers lower sensitivity for the 'disgust' emotion. Likewise, the proposed approach offers increased accuracy for the emotions 'fear' and 'sad', with an overall accuracy above 94%. Similarly, the proposed method offers increased specificity for the 'anger', 'happy' and 'sad' emotions, whereas it gives lower specificity for the 'fear' emotion. From the results, it is observed that the proposed method offers increased precision for the emotions 'happy' and 'sad', whereas it gives lower precision for the 'fear' emotion.
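As a hedged sketch of the K-fold protocol and the metrics of Eqs. (4)-(7), the following uses scikit-learn; any estimator with fit/predict (for instance the SOM classifier sketched earlier) can stand in for the paper's classifier, and X, y are assumed to be NumPy arrays of extracted features and emotion labels.

```python
# Stratified tenfold cross-validation with per-class TP/FP/FN/TN metrics.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import confusion_matrix

def cross_validate(clf, X, y, n_splits=10):
    accs = []
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(X, y):
        clf.fit(X[train_idx], y[train_idx])
        pred = clf.predict(X[test_idx])
        cm = confusion_matrix(y[test_idx], pred)

        # Per-class counts derived from the multi-class confusion matrix.
        tp = np.diag(cm)
        fp = cm.sum(axis=0) - tp
        fn = cm.sum(axis=1) - tp
        tn = cm.sum() - (tp + fp + fn)

        sensitivity = tp / (tp + fn)          # Eq. (4)
        specificity = tn / (tn + fp)          # Eq. (5)
        accuracy = (tp + tn) / cm.sum()       # Eq. (6), per class
        precision = tp / (tp + fp)            # Eq. (7)
        accs.append(accuracy.mean())
    return np.mean(accs)
```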
The recognition rate of the proposed technique is calculated using the various performance measures separately for the five listed emotions (anger, disgust, fear, sad, happy) under tenfold cross-validation. The performance of the proposed method, analyzed in terms of sensitivity, specificity, precision, recall and accuracy under tenfold cross-validation, is presented in Fig. 5a-e. From the assessment, it is apparent that the proposed technique offers a better recognition rate than the existing methods. The MMI dataset is complex compared with other datasets; the base paper (Ghimire et al. 2017a, b) describes MMI as challenging compared with the CK+ and MUG datasets. Here the performance is validated and compared with the existing methods (Ghimire et al. 2017a, b; Khan et al. 2018; Gogić et al. 2018). The Weber Local Binary Image Cosine Transform (WLBI-CT) (Khan et al. 2018) extracts and integrates the frequency components of images obtained through the Weber local descriptor and the local binary descriptor; the confusion matrix of the WLBI-CT method is given in Table 2. As in Gogić et al. (2018), LBF-NN uses a combination of gentle-boost decision trees and a neural network on distinct facial landmarks optimized across all expressions under study. This combined feature optimization improves the recognition rate for complex expressions. The confusion matrix of the LBF-NN method in the Person Independent (PI) scenario is given in Table 3. However, in the Person Dependent (PD) scenario, this method yields far better results than the methods discussed in this paper. The comparison of the proposed method with the existing methods (Ghimire et al. 2017a, b; Khan et al. 2018; Gogić et al. 2018) is depicted in Table 4.

Table 3 Confusion matrix of the existing work (Gogić et al. 2018) in the PI scenario

5 Conclusion and future work

This work aims to recognize human emotion effectively by proposing a novel method. This paper introduces GFE-HCA, a Graph-based Feature Extraction and Hybrid Classification Approach, for recognizing the facial expression in a given input image. Initially, the facial regions are detected in the foreground detection step, and then the facial parts are extracted. Features are extracted from the detected facial parts using the edge-based invariant transform, and the weighted visibility graph is constructed to optimize the dimensions of the edge-based invariant features. The graph features link density, average closeness centrality, graph entropy, average distribution weight and average degree of the graph are then extracted, and the shape appearance-based features of the facial parts are also extracted. Finally, the Self-Organizing Map based Neural Network Classifier is introduced to recognize and classify the facial expressions. The approach is compared with existing techniques and verified with several performance metrics: sensitivity, specificity, precision, recall and accuracy. The evaluation results show that the proposed approach offers better results, with an increased emotion recognition rate, compared with various existing works. Recently, online learning platforms, IoT and autonomous vehicles have been trending among researchers, and future work will concentrate on the following.

• In our work, the facial image and the camera plane are parallel; recent advances in online learning suggest the work needs to be extended to at least three different angles: 15°, 45° and 60°.
• Improving the model to adapt and give accurate results on resource-constrained devices, as is the case for most IoT devices.
• Image stabilization is not guaranteed in moving autonomous vehicles; future work should concentrate on finding a mechanism that still gives an accurate result.

Our future goal for this research is to add a feedback-based continuous-training module that caters to continuous training and layer addition, so the model can adapt to changes as they arise over time. One such example is using the same model with an added layer to give good results on facial emotion recognition for COVID-19 mask-covered facial images.

Table 4 Comparison of the proposed method with the existing methods on the MMI dataset

Method                                               Accuracy (%)
Proposed work with MMI and 5 labels                  96.47
Ghimire et al. (2017a, b) with MMI dataset           77.2
WLBI-CT (Khan et al. 2018) with MMI and 5 labels     94.44
LBF-NN (Gogić et al. 2018) in PI with MMI dataset    75.09

References

Facial expression recognition in dynamic sequences: an integrated approach
A systematic literature review about technologies for self-reporting emotional information
Facial expression recognition based on local region specific features and support vector machines
Recognition of facial expressions based on salient geometric features and support vector machines
Fast facial expression recognition using local binary features and shallow neural networks
Multi-level Haar wavelet based facial expression recognition using logistic regression
Reliable facial expression recognition for multi-scale images using Weber local binary image based cosine transform features
A framework for IoT-enabled virtual emotion detection in advanced smart cities
Implementing the Viola-Jones face detection algorithm
Facial expression recognition with fusion features extracted from salient facial areas
Facial expression recognition with convolutional neural networks: coping with few data and the training sample order
Ensemble of texture descriptors and classifiers for face recognition
A micro-GA embedded PSO feature selection approach to intelligent facial emotion recognition
Going deeper in facial expression recognition using deep neural networks
Ensemble of texture descriptors for face recognition obtained by varying feature transforms and preprocessing approaches
Towards facial expression recognition in the wild: a new database and deep recognition system
Facial expression recognition using stationary wavelet transform features
Local directional ternary pattern for facial expression recognition
Affective state detection via facial expression analysis within a human-computer interaction context
Learning bases of activity for facial expression recognition
Real-time mobile facial expression recognition system: a case study
FERA 2017: addressing head pose in the third facial expression recognition and analysis challenge
Facial expression recognition with FRR-CNN
Active AU based patch weighting for facial expression recognition
Approximately symmetrical face images for image preprocessing in face recognition and sparse representation based classification
Image based static facial expression recognition with multiple deep network learning
Learning multiscale active facial patches for expression analysis
High-fidelity pose and expression normalization for face recognition in the wild