key: cord-0152215-dr5nidgl
title: Region attention and graph embedding network for occlusion objective class-based micro-expression recognition
authors: Mao, Qirong; Zhou, Ling; Zheng, Wenming; Shao, Xiuyan; Huang, Xiaohua
date: 2021-07-13
journal: nan
DOI: nan
sha: d076d3cca6ea445c9a9d2f2f49aa4267858f65ac
doc_id: 152215
cord_uid: dr5nidgl

Micro-expression recognition (MER) has attracted the attention of many researchers over the past decade. However, occlusion can occur in real-world MER scenarios. This paper investigates an interesting but unexplored challenging issue in MER, i.e., occlusion MER. First, to study MER under real-world occlusion, synthetic occluded micro-expression databases are created using various masks for the community. Second, to suppress the influence of occlusion, a Region-inspired Relation Reasoning Network (RRRN) is proposed to model relations between various facial regions. RRRN consists of a backbone network, a Region-Inspired (RI) module, and a Relation Reasoning (RR) module. More specifically, the backbone network extracts feature representations from different facial regions; the RI module computes an adaptive weight for each region, based on an attention mechanism, with respect to its unobstructedness and importance, to suppress the influence of occlusion; and the RR module exploits the progressive interactions among these regions by performing graph convolutions. Experiments are conducted on the holdout-database evaluation and composite database evaluation tasks of the MEGC 2018 protocol. Experimental results show that RRRN can effectively exploit the importance of facial regions and capture the cooperative complementary relationships of facial regions for MER. The results also demonstrate that RRRN outperforms state-of-the-art approaches, especially under occlusion, and that RRRN is more robust to occlusion.

Micro-expressions, different from macro-expressions or subtle expressions, are hidden emotions with low intensity and fragmental facial action units, where only part of the facial muscles of fully stretched facial expressions are presented [1]. Micro-expression recognition (MER) aims at revealing the hidden emotions of humans and understanding people's deceitful behaviors when micro-expressions occur [2, 3]. As it holds tremendous potential impact on a wide range of applications including psychology, medicine, and police case diagnosis [3, 4], MER has attracted significant interest from psychologists and computer scientists in recent years. Although many MER approaches have been proposed, all of them are built on samples captured in laboratory-controlled environments, such as CASME II [5] and SAMM [6]. Unfortunately, in many real-world scenarios, the human face may be partially occluded by accessories. For example, sunglasses or a virtual reality mask occlude the eyes, while a scarf or medical mask occludes the mouth and nose, especially during the challenging Coronavirus epidemic prevention period. Occlusion has thus become a challenging problem in face analysis tasks, since it introduces noise into feature descriptors. Note that the occlusion problem has received attention in face identity recognition [7, 8] and macro-expression recognition [9, 10]. However, little attention has been paid to occlusion for MER, due to the lack of occluded micro-expression databases and efficient occlusion MER methods.
In the last decade, most MER methods have focused on micro-expressions without occlusion. Our empirical results in Section IV-C show that these methods perform unsatisfactorily under partial occlusion. Therefore, it is worthwhile to investigate the influence of occlusion on MER and to propose a more robust MER method that resolves the occlusion problem. To achieve these goals, we first construct six occluded micro-expression databases derived from CASME II and SAMM by considering different occlusion cases, namely Mask-CASME II, Glass-CASME II, Random Mask CASME II, Mask-SAMM, Glass-SAMM, and Random Mask SAMM. The occlusion databases are manually synthesized with occlusion types of masks, glasses, and random region masks applied to micro-expression sequences. Second, motivated by [11-13] in image understanding, we propose the Region-inspired Relation Reasoning Network (RRRN) for MER to highlight the important facial regions and model region relationships, enhancing the ability to resist occlusion. It consists of a Backbone network, a Relation Reasoning (RR) module, and a Region-Inspired (RI) module. Briefly, the Backbone network obtains coarse region features, the RI module produces weighted region features for the RR module, and the RR module learns a global feature by building relationships over the complementary region features. Moreover, with a specific data augmentation strategy for micro-expression samples, experimental results demonstrate that our proposed RRRN substantially suppresses the influence of occlusion on MER and outperforms existing state-of-the-art approaches in both partially occluded and non-occluded situations. The four contributions of our work are summarized as follows.
• We synthesize a number of occluded micro-expression databases. To the best of our knowledge, this is the first work to build occluded micro-expression databases for video-based MER.
• We present a Region-inspired Relation Reasoning Network to resolve the occlusion problem of MER by incorporating weighted region feature learning and region-based relation reasoning into embedding learning in a deep network.
• We propose a Region-Inspired module to compute an adaptive weight from the region itself according to the unobstructedness and importance of each region. We also introduce a novel region graph representation to capture relationships between attended parts in a single micro-expression sample. Graph convolutional network-based relation reasoning is then performed on this graph, leading to the complementary Relation Reasoning module.
• Experimental results demonstrate that the proposed RRRN outperforms other state-of-the-art methods on non-occluded and synthesized partially occluded micro-expression databases under the MEGC 2018 settings.
Since the occlusion MER problem has not yet been investigated, in this section we review previous works closely related to ours, i.e., features in MER and occluded facial expression recognition. Most early MER methods are based on handcrafted features. According to [14], they are divided into appearance-based and geometric-based features. The most widely used representation is Local Binary Pattern from Three Orthogonal Planes (LBP-TOP) [15].
Considering its low computational complexity, many LBP-TOP variants have been proposed, e.g., Spatiotemporal Completed Local Quantized Patterns (STCLQP) [16], hierarchical spatiotemporal descriptors [17], discriminative spatiotemporal LBP with revisited integral projection (DiSTLBP-RIP) [18], and others [19-23]. Besides the LBP family, 3D Histograms of Oriented Gradients (3DHOG) [24, 25] focuses on counting occurrences of gradient orientations in localized portions of the image sequence. Different from appearance-based features, geometric-based features aim to represent micro-expression samples in terms of face geometry, e.g., shapes and locations of facial landmarks. These representations include the Delaunay-based Temporal Coding Model (DTCM) [26], Main Directional Mean Optical Flow (MDMO) [27], Facial Dynamics Map (FDM) [28], and Bi-Weighted Oriented Optical Flow (Bi-WOOF) [29]. Besides handcrafted features, many feature-learning-based methods have been proposed for MER [30-37]. For example, to better represent the subtle changes in micro-expressions, Li et al. [37] proposed a joint feature learning architecture coupling local and global information for MER. Khor et al. [38] adopted an Enriched Long-term Recurrent Convolutional Network based on optical flow features, which contains channel-wise spatial enrichment and feature-wise temporal enrichment. Meanwhile, Li et al. [31] proposed a three-stream 3D flow convolutional neural network, and Peng et al. [39] leveraged a Dual Temporal Scale Convolutional Neural Network (DTSCNN) for MER. DTSCNN was the first work in MER to utilize a shallow two-stream neural network with optical-flow sequences as inputs. Subsequently, several shallow networks were proposed for MER [40-48], avoiding the over-fitting problem caused by the scarcity of micro-expression data. Different from those models, which used shallow networks to alleviate over-fitting, Peng et al. [49] adopted a pre-trained ResNet10 [50] as the backbone and introduced a transfer learning strategy on macro-expression databases to handle the over-fitting issue. Recently, motivated by the attention strategy and the transfer learning mechanism, Zhou et al. [51] utilized ResNet with apex frames as input for MER. More recently, to explore the semantic relationships between Action Units and emotion classes, Lo et al. [52] and Xie et al. [47] applied graph convolutional networks (GCN) [53] to MER. Compared with other deep methods, both works [33, 35] indicate that Eulerian Video Magnification (EVM) contributes to increasing recognition accuracy. However, it is hard to control the magnification factor of EVM in real-world scenarios. Xia et al. [33] proposed an effective data augmentation method to avoid the over-fitting problem and a balanced loss function to tackle the data imbalance issue in MER. Additionally, Yu et al. [54] proposed a novel Identity-aware and Capsule-Enhanced Generative Adversarial Network to improve MER performance in an end-to-end way. From the above review of feature learning in MER, it is clear that existing methods all focus on un-occluded MER and ignore the different importance of facial region features, let alone occlusion MER. Our work differs from these approaches in that it explicitly learns weighted facial region features and captures the cooperative complementary relationships of facial regions for occlusion MER.
Facial expression recognition (FER) research on handling occlusion has been studied over the past decade. A number of deep learning methods have been proposed to resolve partial occlusions for FER [9, 10, 55-59]. For example, Houshmand et al. leveraged a transfer learning method to obtain more discriminative features in the presence of occluded facial regions, e.g., when the user is wearing a head-mounted display in a VR setting [56]. Hu et al. tackled partial occlusion in FER by using face inpainting and region features [55]. More specifically, symmetric SURF and face inpainting with mirror transition are applied to detect the occluded parts, reducing the influence of occluded areas on performance. Subsequently, a FER network based on heterogeneous soft blocks is leveraged to weight the importance of each area. Li et al. presented a convolutional neural network with an attention mechanism to account for the contributions of different facial regions [10]. More recently, Ding et al. proposed an Occlusion-Adaptive Deep Network (OADN) with a landmark-guided attention branch and a facial region branch [57]. In that work, the landmark-guided attention branch discards feature elements that have been corrupted by occlusions, while the facial region branch learns robust features with complementary context information. Wang et al. proposed a Region Attention Network (RAN) to learn robust representations for occlusion FER by exploring the combination of local and global features [9]. RAN adaptively captures the importance of facial regions for occlusion and pose-variant FER by leveraging the self-attention mechanism, and aggregates the weighted region features into a global representation for final prediction. It is worth mentioning that RAN is the work most closely related to ours, but our work is distinct from RAN [9]. The differences are as follows: (1) We propose a new method to learn region features. Wang et al. [9] used a simple attention mechanism to obtain the region features and did not consider information decay. By contrast, we propose a Region-Inspired (RI) module with region attention to better capture the region information. (2) We propose a different strategy to obtain the global representation. In [9], the global feature is formed by concatenating the refined region features with a relation-attention module, ignoring the complementary relationships of regions. Different from [9], we leverage the RR module to capture appearance relationships among the weighted facial regions by GCN reasoning. The outputs of these GCNs are updated node features (with each node representing an attended facial region), which are further used to learn an embedding into the semantic space and to recover missing features for occlusion MER. (3) Besides the region biased loss used in [9] to capture the importance of regions, a correlation loss is introduced in our work to enhance the robustness of the global feature by balancing the relationship between local and global representations. In this section, we first overview the proposed RRRN, then describe its architecture in detail, and finally elaborate on the loss functions used for RRRN. The RRRN architecture mainly consists of three important modules: the Backbone network, the Region-Inspired (RI) module, and the Relation Reasoning (RR) module.
To suppress the influence of occlusion on MER, RRRN adaptively estimates the importance of facial regions in the RI module and models complementary relationships between different facial regions for learning robust features in the RR module. All the modules are jointly trained with the objective function (Eqn. 9), which is composed of a region biased loss (Eqn. 7) [9], a cross-entropy loss (Eqn. 6), and an introduced correlation loss (Eqn. 8) [13]. In the pre-processing, for each micro-expression sequence, we calculate the TV-L1 [60] optical flow from the onset and apex frames. Motivated by [9], given the horizontal and vertical components of each optical flow, we first crop them into a number of regions with fixed-position cropping. Subsequently, the cropped regions, along with the original optical flow region, are fed into the Backbone, whose outputs are region features. Next, each region is assigned an attention weight by the RI module. Afterward, the RR module transforms the region features into relational region features by reasoning about the relationships among individual region features, further capturing a content-aware global graph embedding. Lastly, we leverage the weighted region features and the global graph representation to predict the micro-expressions.

As a motion feature, optical flow is extensively used for micro-expression recognition [40, 41, 61]. More specifically, the optical flow of each sample is extracted from the onset and apex frames, where the onset and apex frames denote the frame with a neutral expression and the frame with the highest expression intensity, respectively. For RRRN, the TV-L1 optical flow method [60] is utilized to obtain motion features from the onset and apex frames of each micro-expression video. To preserve more motion information, two optical flow components are extracted to represent the facial change along the horizontal and vertical directions. Then, to capture different region features, we use the fixed-position cropping method, where k represents the k-th facial region, and 0 ≤ k ≤ K, K = 5.

C. Backbone network

The K facial regions are fed into a CNN module with two ResNet18 networks and a concatenation layer. In detail, given the input X, the vertical and horizontal feature maps are extracted by the CNN module and denoted as Z_1(x_v^k) and Z_2(x_h^k), respectively. Then, the two component feature maps corresponding to the k-th facial region are concatenated into one feature map

p_k = [Z_1(x_v^k); Z_2(x_h^k)]    (1)

to represent the k-th region. According to [62], FACS demonstrates that relationships exist between facial muscles. To imitate the relationships between facial regions, we develop the RR module based on GCN to model the relationships between facial regions. The model ablation experiments validate that our RR module helps RRRN boost performance. In detail, we first construct a region graph Γ ∈ R^{K×K} with the K region features as its K nodes. The dot-product is leveraged to calculate the pairwise similarity:

Γ_{ik} = f_i^T f_k,  with f_k = p_k / ||p_k||_2,    (2)

where f_k is the L2-normalized k-th region feature, and p_i and p_k denote the i-th and k-th region features, respectively. As the dot-product calculation is equivalent to the cosine similarity metric and the graph has self-connections, the degree matrix D of Γ is calculated as follows:

D_{ii} = Σ_k Γ_{ik}.    (3)

Given Γ, we leverage the GCN method to perform reasoning on this graph. In GCN, graph convolution allows each target node in the graph to aggregate features from all neighbor nodes according to the edge weights between them.
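As a concrete illustration of Eqns. (2) and (3), the sketch below builds the region graph in PyTorch, assuming each region feature p_k has already been pooled to a C-dimensional vector; the values K = 6 (full face plus five crops) and C = 512 are illustrative assumptions rather than the paper's exact configuration.

import torch
import torch.nn.functional as F

def build_region_graph(P):
    # P: (K, C) stacked region features.
    f = F.normalize(P, p=2, dim=1)    # L2-normalize each region feature
    gamma = f @ f.t()                 # Eqn. (2): pairwise cosine similarities, self-loops on the diagonal
    D = torch.diag(gamma.sum(dim=1))  # Eqn. (3): degree matrix of the graph
    return gamma, D

P = torch.randn(6, 512)               # assumed: K = 6 nodes, C = 512 channels
gamma, D = build_region_graph(P)
print(gamma.shape, D.shape)           # torch.Size([6, 6]) torch.Size([6, 6])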
This means that messages can be passed inside the graph to update each region node's feature. Therefore, the GCN outputs can be used to update the relational features of each region node. Formally, one GCN layer is represented as

P^{(l+1)} = σ(D^{-1/2} Γ D^{-1/2} P^{(l)} W^{(l)}),    (4)

where σ(·) denotes the ReLU activation function in our work, P^{(0)} ∈ R^{K×C} denotes the stacked K region features, C is their dimension, and W^{(l)} is a learnable weight matrix. Subsequently, the K region features in P^{(2)} ∈ R^{K×C} are updated by the GCNs. Lastly, the K region features are aggregated for classification.

Although the RR module efficiently learns the relationships between facial regions, it struggles to address occlusion when it occurs. Therefore, an additional module is expected to suppress the noise caused by occlusion for the RR module. With this in mind, we propose the Region-Inspired (RI) module, which aims at automatically learning low weights for occluded regions and high weights for un-occluded, discriminative regions. It consists of region-squeeze and concatenation, region attention, and weighted region feature stages. The first stage encodes the feature maps of each input region with an independent region-squeeze block to preserve more information when learning region-specific patterns. The second stage adaptively learns the weights of different facial regions according to their contributions to recognition. The last stage obtains the weighted region feature maps by multiplying the region features with their corresponding learned weights. Region feature maps are first fed into a convolution layer without decreasing the spatial resolution; then pooling along the channel axis is applied, followed by another convolution layer. We use max-pooling and average-pooling along the channel axis and concatenate both to obtain feature descriptors. Mathematically speaking, suppose that p_k ∈ R^{H×W×C} denotes the input feature maps of the k-th region; the k-th region-squeeze block takes the feature maps p_k as input and learns the squeezed region feature maps µ_k ∈ R^{H×W×1}:

µ_k = F_2([MaxPool(F_1(p_k)); AvgPool(F_1(p_k))]),    (5)

where F_1 and F_2 denote the convolution layers before and after pooling, respectively, and [·;·] is the feature concatenation operation. Afterwards, the K sets of squeezed region feature maps are concatenated into one set of feature maps ψ ∈ R^{H×W×K}, with H, W, and K being its height, width, and the number of regions, respectively. With the weighted feature f_k, we can re-formulate Eqns. (2) and (4) by replacing p_k with f_k. Lastly, the K relational region features in F^{(2)} ∈ R^{K×C} updated by the GCNs and the K weighted region features f_k (0 ≤ k ≤ K) undergo an element-wise addition. In this way, RI is embedded into the RR module, assisting it in suppressing the influence of occlusion for MER.
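A minimal PyTorch sketch of one region-squeeze block (Eqn. 5) follows. The kernel sizes and the pooling-plus-sigmoid attention head are our assumptions for illustration; the paper does not state these hyper-parameters.

import torch
import torch.nn as nn

class RegionSqueeze(nn.Module):
    # One region-squeeze block (Eqn. 5): F1 convolution, channel-wise max/avg
    # pooling, concatenation, then the F2 convolution. Kernel sizes assumed.
    def __init__(self, in_ch):
        super().__init__()
        self.f1 = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1)  # keeps spatial resolution
        self.f2 = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, p_k):                      # p_k: (B, C, H, W)
        z = self.f1(p_k)
        max_map, _ = z.max(dim=1, keepdim=True)  # max-pooling along the channel axis
        avg_map = z.mean(dim=1, keepdim=True)    # average-pooling along the channel axis
        return self.f2(torch.cat([max_map, avg_map], dim=1))  # mu_k: (B, 1, H, W)

class RegionAttention(nn.Module):
    # Maps the concatenated squeezed maps psi (B, K, H, W) to K scalar region
    # weights; global pooling + linear + sigmoid is our assumed attention head.
    def __init__(self, k):
        super().__init__()
        self.fc = nn.Linear(k, k)

    def forward(self, psi):
        pooled = psi.mean(dim=(2, 3))            # (B, K)
        return torch.sigmoid(self.fc(pooled))    # alpha: (B, K), one weight per region

Each weight α_k then scales its region feature to give the weighted feature f_k that re-enters Eqns. (2) and (4).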
Before aggregating features for classification, the weighted region features and relational region features are concatenated to generate micro-expression-level representations for final classification. The objective of RRRN is to correctly classify the micro-expression samples into their corresponding labels y. The standard cross-entropy loss is employed for micro-expression prediction:

L_CE = -(1/N) Σ_{i=1}^{N} log Φ_i(y_i),    (6)

where N is the number of training samples, y_i is the objective class label of the i-th training instance, and Φ_i denotes the output of the last fully connected layer in the model activated by a softmax unit. As aforementioned, RI is a local feature learning stage while RR is a global feature learning stage. More specifically, RI aims to learn region features with different weights according to their contributions, perceiving the occluded facial regions, and RR aims at learning a more effective representation by capturing the cooperative complementary relationships of facial regions. To learn better region features and a more robust global feature, two additional losses are introduced in the two modules.

1) Region Biased Loss: According to [63], different facial expressions are mainly defined by facial action units. It is therefore encouraged to assign a high attention weight to the most important regions. The Region Biased Loss (RB-Loss) [9] is used in our RI module to impose a straightforward constraint on the attention weights: following [9], the maximum attention weight of the facial crops should be larger than that of the original face image. In other words, RB-Loss agrees with our goal. The RB-Loss is formulated as:

L_RB = max(0, β − (α_max − α_0)),    (7)

where β is a hyper-parameter, α_max is the maximum attention weight over all the K − 1 facial crops, and α_0 denotes the attention weight of the original face region. RB-Loss constrains the relationships among region features based on prior knowledge.

2) Correlation Loss: The Bag-of-Words model [11] indicates that an aggregated feature obtains a higher prediction probability than a single region feature. Motivated by [11], we embed the Correlation loss (Cor-Loss) into the RI and RR modules of RRRN to constrain the relationship between region features and global features. Importantly, Cor-Loss guarantees that the prediction probability of the final combined feature is greater than that of any single region feature. It is formulated as follows:

L_Cor = Σ_k max(0, P(f_k) − P(f_a)),    (8)

where f_k denotes the feature of the k-th facial region, P is the confidence function reflecting the probability of classification into the correct category, and f_a is the aggregation of the weighted region features and relational region features. The final loss function is defined as follows:

L = L_CE + λ_1 L_RB + λ_2 L_Cor,    (9)

where λ_1 and λ_2 balance these losses.
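A sketch of the three losses in PyTorch, under our reading of Eqns. (7)-(9); the hinge forms of RB-Loss and Cor-Loss are reconstructed from the constraints stated above, and the margin and balance values are placeholders, not the paper's settings.

import torch
import torch.nn.functional as F

def rb_loss(alpha, beta=0.02):
    # Eqn. (7): alpha is (B, K+1) with alpha[:, 0] the full-face weight and the
    # rest the crop weights; beta is a margin hyper-parameter (value assumed).
    alpha_max = alpha[:, 1:].max(dim=1).values
    return F.relu(beta - (alpha_max - alpha[:, 0])).mean()

def cor_loss(region_logits, agg_logits, labels):
    # Eqn. (8): the aggregated feature's probability for the true class should
    # dominate each single region's. region_logits: (B, K, num_classes),
    # agg_logits: (B, num_classes), labels: (B,) long.
    p_agg = F.softmax(agg_logits, dim=-1).gather(1, labels[:, None])      # (B, 1)
    idx = labels[:, None, None].expand(-1, region_logits.size(1), 1)
    p_reg = F.softmax(region_logits, dim=-1).gather(2, idx).squeeze(-1)   # (B, K)
    return F.relu(p_reg - p_agg).sum(dim=1).mean()

def total_loss(ce, rb, cor, lam1=1.0, lam2=1.0):
    # Eqn. (9); the lambda values are placeholders.
    return ce + lam1 * rb + lam2 * cor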
We focus on objective class-based MER under the MEGC 2018 protocol in our experiments. In this section, we first describe the experimental settings, including the two tasks of objective class-based MER, the evaluation metrics, and our implementation details. Then, we compare our method with state-of-the-art objective class-based MER methods on un-occluded and synthesized occluded databases. Finally, feature maps and attention weights of the facial regions in the RI module are visualized for analysis. In this paper, we focus on the MEGC 2018 setting [64] for micro-expression recognition on objective classes. Specifically, the objective classes are defined in [65] according to the Facial Action Coding System (FACS) [62]. The relationship between action units and objective classes I-V is shown in Table I. The MEGC 2018 setting combines the CASME II [5] and SAMM [6] datasets into a composite database. It primarily focuses on the first five classes, i.e., objective classes I to V. The composite database contains 185 samples of 26 subjects from CASME II and 68 samples of 21 subjects from SAMM. Table II summarizes the composite database and its sample distribution over each objective class.

Table I: Objective classes and their corresponding Action Units.
Class I: AU6, AU12, AU6+AU12, AU6+AU7+AU12, AU7+AU12
Class II: AU1+AU2, AU5, AU25, AU1+AU2+AU25, AU25+AU26, AU5+AU24
Class III: AU23, AU4, AU4+AU7, AU4+AU5, AU4+AU5+AU7, AU17+AU24, AU4+AU6+AU7, AU4+AU38
Class IV: AU10, AU9, AU4+AU9, AU4+AU40, AU4+AU5+AU40, AU4+AU7+AU9, AU4+AU9+AU17, AU4+AU7+AU10, AU4+AU5+AU7+AU9, AU7+AU10
Class V: AU1, AU15, AU1+AU4, AU6+AU15, AU15+AU17

2) Synthesis of occluded micro-expression databases: As no occluded micro-expression database is available so far, we synthesized various occlusion cases on the CASME II and SAMM databases for our experiments. More specifically, we synthesized occlusion types of masks, glasses, and random region masks (5%-50% partial occlusions) on un-occluded micro-expression sequences. Six occluded micro-expression databases were thus derived, namely Mask-CASME II, Glass-CASME II, Random Mask CASME II (RMask-CASME II), Mask-SAMM, Glass-SAMM, and Random Mask SAMM (RMask-SAMM). In daily life, head movements and facial changes usually lead to movements of accessories, especially masks and glasses. For example, a tilted head causes the mask to lean at the same angle, and expressions related to frowning may cause a slight shift in the position of the sunglasses. To make the accessories move synchronously with the face, the occlusion patch is fixed to a specific region according to the facial landmarks in each frame. Specifically, we manually collected ten masks and ten pairs of sunglasses from the web as occlusion patches. Subsequently, we generated the locations of masks, sunglasses, and random region masks for each frame in a micro-expression sequence according to the detected landmarks.

(1) Tasks of MEGC 2018
MEGC 2018 contains two tasks: the holdout-database evaluation (HDE) and composite database evaluation (CDE) tasks. The HDE task uses 2-fold cross-validation, i.e., training on CASME II while testing on SAMM (CASME II→SAMM), and vice versa (SAMM→CASME II). Unweighted average recall (UAR) and weighted average recall (WAR) [66] are used in this task to measure the performance of different approaches; the average WAR and UAR are reported. In the CDE task, the Leave-One-Subject-Out (LOSO) cross-validation protocol is used. All samples from the CASME II and SAMM databases are combined into a single composite database. The samples of each subject are held out in turn as the testing set while the rest are used for training. The F1 score and Weighted F1 score (WF1) are used to measure the performance of the various methods. Here, F1 is the average of the class-specific F1 scores across all classes (macro-averaging), and WF1 weights each class by its number of samples before averaging [64]. These metrics are calculated as follows:

F1 = (1/C) Σ_{c=1}^{C} 2·TP_c / (2·TP_c + FP_c + FN_c),
WF1 = Σ_{c=1}^{C} (N_c/N) · 2·TP_c / (2·TP_c + FP_c + FN_c),

where C is the number of classes, c ≤ C, N_c is the number of samples of the c-th class, and N is the total number of samples. TP_c, FP_c, and FN_c are the true positives, false positives, and false negatives of the c-th class, respectively.
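These two metrics are straightforward to compute from per-class counts; a small NumPy sketch with toy numbers (not results from the paper):

import numpy as np

def f1_scores(tp, fp, fn, n_c):
    # tp/fp/fn/n_c: length-C arrays of per-class true positives, false
    # positives, false negatives, and sample counts.
    per_class = 2 * tp / (2 * tp + fp + fn)    # class-specific F1
    f1 = per_class.mean()                      # macro-averaged F1
    wf1 = (n_c / n_c.sum() * per_class).sum()  # weighted by class frequency
    return f1, wf1

# toy example with C = 5 objective classes
tp = np.array([20, 15, 30, 10, 5], dtype=float)
fp = np.array([5, 4, 6, 3, 2], dtype=float)
fn = np.array([4, 6, 5, 2, 3], dtype=float)
n_c = tp + fn                                  # per-class sample counts
print(f1_scores(tp, fp, fn, n_c))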
(2) Implementation details
In the pre-processing step, we utilize a face recognition algorithm to obtain the facial area of each frame, and then extract TV-L1 optical flow [60] features from the onset and apex frames. Moreover, the two optical flow component images are resized to 224 × 224 pixels.
Data augmentation: Due to the limited samples in micro-expression databases, data augmentation is applied in our proposed network. For each micro-expression video sequence, the positions of the onset, apex, and offset frames are denoted as

To evaluate the robustness of our proposed method and of existing micro-expression features to the micro-expression occlusion problem, we use three representative handcrafted descriptors and two deep learning networks for comparison. For the three hand-engineered features, SVM serves as the classifier. The results are presented in Tables III and IV.
(1) Comparison with handcrafted features
We compare our RRRN with LBP-TOP [15], LBP-SIP [67], and 3DHOG [24] on the synthesized occluded samples. Their parameter setups are the same as those in the experiments on un-occluded micro-expression databases, and SVM serves as the classifier. Tables III and IV report the comparative results of the three handcrafted features and our RRRN on the HDE and CDE tasks, respectively. In Table III, WAR and UAR are used, and in Table IV, F1 and WF1. As shown in these two tables, some handcrafted features have very limited categorization capabilities; e.g., on the CASME II→SAMM fold of the HDE task, LBP-TOP and LBP-SIP categorize almost all the samples into the dominant class, where the WAR and UAR are 0.294 and 0.200, respectively. It can also be seen that our RRRN achieves significant performance increases over these features in all the occlusion experiments (mask, glass, and random occlusion). For example, RRRN obtains average WAR / UAR of 0.538 / 0.480 and 0.642 / 0.509 on the CASME II→SAMM and SAMM→CASME II folds, which are much higher than the results of LBP-TOP (0.293 / 0.199 and 0.334 / 0.217) on the two folds of the HDE task, respectively. Additionally, a considerable improvement is observed when comparing RRRN and LBP-TOP on the CDE task. The experimental results indicate that RRRN can learn more robust representations than the three handcrafted features for MER. Moreover, the results demonstrate the effectiveness of deep learning methods.
(2) Comparison with deep learning methods
The gains of RRRN, especially when compared with ResNet18, are due to the significant roles of the RI and RR modules in RRRN. As the RI module explores the weighted region features by using the attention mechanism, RRRN is capable of capturing the subtle muscle motions in facial regions and discovering the micro-expression-related regions.

In this paper, we delve into a more challenging task in micro-expression recognition (MER), i.e., occlusion micro-expression recognition. We propose a novel approach for occlusion MER, named the Region-inspired Relation Reasoning Network (RRRN), which involves three key components: the Backbone network, the Region-Inspired (RI) module, and the Relation Reasoning (RR) module. The proposed method can automatically capture the subtle movements of micro-expressions in facial regions and learn the importance of different facial regions according to their contributions to the final classification or the presence of occlusion. By reasoning about the relationships among the weighted region features, the model achieves a robust representation through the aggregation of complementary region features. Experimental results under non-occlusion and occlusion scenarios on the two widely used CASME II and SAMM databases and our synthesized versions demonstrate that the proposed approach achieves high recognition performance under different occlusions. In the future, we will study an end-to-end approach for occlusion MER, as our model is based on pre-computed optical flow.
Moreover, we will explore more effective attention methods for subtle micro-expression movements and introduce AU relationships in micro-expressions for occlusion MER.

References
Reading between the lies
Motion profiles for deception detection using visual cues (Proceedings of ECCV)
Telling Lies: Clues to Deceit in the Marketplace, Politics, and Marriage
Nonverbal leakage and clues to deception
CASME II: An improved spontaneous micro-expression database and the baseline evaluation
SAMM: A spontaneous micro-facial movement dataset
Nuclear norm based matrix regression with applications to face recognition with occlusion and illumination changes
On improving the generalization of face recognition in the presence of occlusions
Region attention networks for pose and occlusion robust facial expression recognition
Occlusion aware facial expression recognition using CNN with attention mechanism
Visual categorization with bags of keypoints
Attentive region embedding network for zero-shot learning
Graph-propagation based correlation learning for weakly supervised fine-grained image classification
A survey of automatic facial micro-expression analysis: Databases, methods and challenges
Dynamic texture recognition using local binary patterns with an application to facial expressions
Spontaneous facial micro-expression analysis using spatiotemporal completed local quantized patterns
Domain regeneration for cross-database micro-expression recognition
Discriminative spatiotemporal local binary pattern with revisited integral projection for spontaneous facial micro-expression recognition
Micro-expression recognition using color spaces
Micro-expression recognition using dynamic textures on tensor independent color space
Sparse tensor canonical correlation analysis for micro-expression recognition
Efficient spatio-temporal local binary patterns for spontaneous facial micro-expression recognition
Facial micro-expression recognition using spatiotemporal local binary pattern with integral projection
Facial micro-expressions recognition using high speed camera and 3d-gradient descriptor
Facial micro-expression detection in hi-speed video based on facial action coding system (FACS)
A Delaunay-based temporal coding model for micro-expression recognition
A main directional mean optical flow feature for spontaneous micro-expression recognition
Micro-expression identification and categorization using a facial dynamics map
Less is more: Micro-expression recognition from video using apex frame
Micro-expression recognition with expression-state constrained spatio-temporal feature representations
Micro-expression recognition based on 3d flow convolutional neural network
Spontaneous facial micro-expression recognition using 3d spatiotemporal convolutional neural networks
Spatiotemporal recurrent convolutional networks for recognizing spontaneous micro-expressions
LEARNet: Dynamic imaging network for micro expression recognition
Eulerian motion based 3DCNN architecture for facial micro-expression recognition
Nonlinearities improve OrigiNet based on active imaging for micro expression recognition
Joint local and global information learning with single apex frame detection for micro-expression recognition
Enriched long-term recurrent convolutional network for facial micro-expression recognition
Dual temporal scale convolutional neural network for micro-expression recognition
Off-ApexNet on micro-expression recognition system
Shallow triple stream three-dimensional CNN (STSTNet) for micro-expression recognition
Dual-inception network for cross-database micro-expression recognition
A neural micro-expression recognizer
CapsuleNet for micro-expression recognition
A shallow triple stream three-dimensional CNN (STSTNet) for micro-expression recognition system
Revealing the invisible with model and data shrinking for composite-database micro-expression recognition
AU-assisted graph attention convolutional network for micro-expression recognition
A novel Graph-TCN with a graph structured representation for micro-expression recognition
From macro to micro expression recognition: Deep learning on small datasets using transfer learning
ImageNet pre-trained models with batch normalization
Cross-database micro-expression recognition: A style aggregated and attention transfer approach
MER-GCN: Micro-expression recognition based on relation modeling with graph convolutional networks
Graph interaction networks for relation transfer in human activity videos
ICE-GAN: Identity-aware and capsule-enhanced GAN for micro-expression recognition and synthesis
Rapid facial expression recognition under part occlusion based on symmetric SURF and heterogeneous soft partition network
Facial expression recognition under partial occlusion from virtual reality headsets based on transfer learning
Occlusion-adaptive deep network for robust facial expression recognition
Teacher-student training and triplet loss for facial expression recognition under occlusion
Occlusion expression recognition based on non-convex low-rank double dictionaries and occlusion error model
A duality based approach for realtime TV-L1 optical flow
Evaluation of the spatio-temporal features and GAN for micro-expression recognition system
Facial Action Coding System (FACS): A technique for the measurement of facial action
Facial areas and emotional information
Facial micro-expressions grand challenge 2018 summary
Objective classes for micro-facial expression recognition
Cross-corpus acoustic emotion recognition: Variances and strategies
LBP with six intersection points: Reducing redundant information in LBP-TOP for micro-expression recognition
ImageNet: A large-scale hierarchical image database
Very deep convolutional networks for large-scale image recognition
Deep residual learning for image recognition