key: cord-0060976-kt2zxmet
authors: Gao, Ya; Wang, Ran; Xue, Chen; Gao, Yalan; Qiao, Yifei; Jia, Chengchong; Jiang, Xianwei
title: Chinese Fingerspelling Recognition via Hu Moment Invariant and RBF Support Vector Machine
date: 2020-06-13
journal: Multimedia Technology and Enhanced Learning
DOI: 10.1007/978-3-030-51103-6_34
sha: c201f39b5a9dba0c2c15963cc353871523957c66
doc_id: 60976
cord_uid: kt2zxmet

Sign language plays a significant role in smooth communication between hearing-impaired and hearing people. Chinese fingerspelling is an important component of Chinese sign language. It is suitable for denoting terminology and serves as a basis for learning gesture sign language. We propose a Chinese fingerspelling recognition approach based on Hu moment invariants and an RBF support vector machine. Hu moment invariants were employed to extract image features, and an RBF-SVM was employed for classification. Meanwhile, 10-fold cross validation was used to guard against overfitting. Our method, HMI-RBF-SVM, achieved an overall accuracy of 86.47 ± 1.15% and was superior to three state-of-the-art approaches.

Around the world, hundreds of thousands of people suffer from hearing impairment [1]. Deafness affects many aspects of people's lives, because communication between individuals relies on language. Continuous advances in mobile technology and new forms of user interaction now make it possible to overcome many communication problems between deaf and hearing people. Even so, sign language remains necessary. Sign language (SL) is the communication tool used in the deaf community to express specific meanings, following certain grammatical rules, through the gestures, movements, positions, and orientations of the hands. Chinese sign language (CSL) can generally be divided into gesture sign language and fingerspelling language [2]. Gesture sign language is a major supplement to spoken language and plays an auxiliary role in communicating information and feelings with the outside world. It often uses both hands at the same time, supplemented by facial expressions, and three main features can be extracted from its gestures: the position of the hand, the orientation of the hand, and the shape of the hand [3]. Because it depends heavily on context, it is relatively complex. Fingerspelling language, by contrast, is based on 30 basic finger signs, comprising 26 basic pinyin letters and 4 "raised-tongue" letters, and expresses pinyin or certain special meanings through their combination. Fingerspelling language is therefore relatively simple and well defined. In addition, gesture sign language has a long history and has passed through three main stages: its emergence in ancient times, a period of disunity in modern times, and its refinement and unification in contemporary times. Fingerspelling language originated in the West and, through introduction and study, was gradually adapted to Chinese practice.
Currently, refinement and standardization are two basic tasks in the development of Chinese sign language.

Sign language recognition (SLR) uses computer technology to translate sign language into text, natural language, audio, and other forms of information to facilitate understanding and communication [4]. SLR is also a significant part of intelligent human-computer interaction. According to how the input data are acquired, SLR technology can be divided into two categories: sensor-based SLR and computer vision-based SLR. The former employs wearable devices such as data gloves and EMG armbands, while the latter mainly uses depth cameras. For practical reasons, computer vision-based SLR is more popular. The features most commonly used in SLR are hand shape, orientation, position, and movement trajectory, and most experimental studies have revolved around these features.

There are many methods of sign language recognition. The first uses statistical analysis techniques to derive feature vectors from a sample and then classify it; the hidden Markov model (HMM) [5] is a typical representative. The second employs template matching: a template is first constructed, the input data are then matched against it, and the similarity is used to complete the identification. A third approach builds a neural network that exploits self-learning and self-organizing capabilities. However, these methods have shortcomings. HMM relies on a complex initialization process and a heavy computational workload, and it requires that the gesture region and gesture movement be detected successfully, which determines the robustness of the algorithm. The accuracy of traditional template matching methods is not satisfactory. Newer neural network techniques require large amounts of data and longer training. To address these problems, a Chinese sign language recognition method based on wavelet entropy and a support vector machine was proposed [6]. A gray-level co-occurrence matrix (GLCM) was also employed to identify sign language [7]. In addition, many researchers have tried to use the depth information of gestures to improve recognition accuracy. A Kinect-based Taiwanese sign language recognition system extracted three main features from sign language gestures, namely gesture position, gesture orientation, and gesture shape [8]. The position of the hand was obtained from the input sensor, an HMM was introduced to determine the orientation of the gesture, and a trained SVM was employed to identify the shape of the gesture; the recognition rate of this system reached 85.14% [9]. Convolutional neural networks (CNNs) and their variants have also been tested for fingerspelling recognition [10]. For hand posture recognition, the hand contour is usually adopted because it simplifies real-time processing.
All of the current related work can be divided broadly into software-based and hardware-based solutions.

In this study, we propose a Chinese fingerspelling recognition method based on Hu moment invariants and an RBF support vector machine. Hu moment invariants were employed to extract image features and to accelerate training. An RBF kernel was employed to enhance classification because of its strong performance. Meanwhile, the experiment was carried out with 10-fold cross-validation to prevent overfitting.

We acquired 1320 Chinese fingerspelling samples from 44 volunteers to construct the experimental materials. Each volunteer performed each of the 30 Chinese fingerspelling categories once, giving 44 × 30 = 1320 images in total. All images were preprocessed with software and normalized to a size of 256 × 256. Figure 1 shows a sample from one volunteer.

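Hu moments were proposed by M. K. Hu in 1962 and are invariant to translation, rotation, and scaling. For an image with gray-level distribution f(x, y), the ordinary moment of order (p + q) is defined as:

$$M_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^{p} y^{q} f(x, y)\, dx\, dy, \qquad p, q = 0, 1, 2, \ldots$$

The central moment of order (p + q) is:

$$C_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - x_1)^{p} (y - y_1)^{q} f(x, y)\, dx\, dy$$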

where the moment center (x_1, y_1) is the centroid of the image:

$$x_1 = \frac{M_{10}}{M_{00}}, \qquad y_1 = \frac{M_{01}}{M_{00}}$$

For a digital image, the integrals become sums over the pixel grid, so the (p + q)-order ordinary moment and central moment of f(x, y) are:

$$M_{pq} = \sum_{x}\sum_{y} x^{p} y^{q} f(x, y)$$

$$C_{pq} = \sum_{x}\sum_{y} (x - x_1)^{p} (y - y_1)^{q} f(x, y)$$
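The discrete definitions translate directly into code. The following is a minimal pure-Python sketch, assuming a grayscale image stored as a list of rows so that img[y][x] holds f(x, y); the helper names and the toy L-shaped image are hypothetical, chosen only for illustration. It shows that shifting the image changes the ordinary moments but leaves the central moments unchanged.

```python
from typing import List, Tuple

Image = List[List[float]]  # img[y][x] holds the gray value f(x, y)


def raw_moment(img: Image, p: int, q: int) -> float:
    """Ordinary moment M_pq = sum over pixels of x^p * y^q * f(x, y)."""
    return sum(
        (x ** p) * (y ** q) * v
        for y, row in enumerate(img)
        for x, v in enumerate(row)
    )


def centroid(img: Image) -> Tuple[float, float]:
    """Moment center (x1, y1) = (M10 / M00, M01 / M00)."""
    m00 = raw_moment(img, 0, 0)
    return raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00


def central_moment(img: Image, p: int, q: int) -> float:
    """Central moment C_pq = sum over pixels of (x - x1)^p * (y - y1)^q * f(x, y)."""
    x1, y1 = centroid(img)
    return sum(
        ((x - x1) ** p) * ((y - y1) ** q) * v
        for y, row in enumerate(img)
        for x, v in enumerate(row)
    )


def shift(img: Image, dx: int, dy: int) -> Image:
    """Translate the content by (dx, dy) pixels by padding zeros on the left and top."""
    width = len(img[0]) + dx
    return [[0.0] * width for _ in range(dy)] + [[0.0] * dx + list(row) for row in img]


shape = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [1, 1, 0, 0]]
moved = shift(shape, dx=5, dy=3)
print(raw_moment(shape, 1, 0), raw_moment(moved, 1, 0))          # ordinary moments differ
print(central_moment(shape, 2, 0), central_moment(moved, 2, 0))  # central moments agree
```

Because the central moments are measured from the centroid, any translation cancels out; this is the translation invariance the text attributes to C_pq.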

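When the image is transformed, M_pq changes as well. C_pq is translation invariant, but it is still sensitive to scaling and rotation [11] [12] [13]. To remove the dependence on scale, we normalize the central moments:

$$N_{pq} = \frac{C_{pq}}{C_{00}^{\,r}}, \qquad r = \frac{p + q}{2} + 1, \qquad p + q = 2, 3, \ldots$$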

Features represented directly by ordinary or central moments cannot be invariant to translation, rotation, and scaling at the same time. If the normalized central moments are used, the features are invariant to both translation and scale.
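A short derivation, for the continuous case, shows why the exponent r = (p + q)/2 + 1 in the normalization N_pq = C_pq / C_00^r removes the dependence on scale. Assume the image is scaled uniformly by a factor s > 0, so the scaled image is f'(x', y') = f(x'/s, y'/s) with x' = sx, y' = sy, and the centroid scales the same way (x'_1 = s x_1, y'_1 = s y_1). For digital images the result holds only approximately, because resampling changes the pixel values.

$$
\begin{aligned}
C'_{pq} &= \iint (x' - x'_1)^{p} (y' - y'_1)^{q} f\!\left(\tfrac{x'}{s}, \tfrac{y'}{s}\right) dx'\, dy' \\
&= \iint s^{p}(x - x_1)^{p}\, s^{q}(y - y_1)^{q} f(x, y)\, s^{2}\, dx\, dy = s^{p+q+2}\, C_{pq}, \\
C'_{00} &= s^{2}\, C_{00}, \\
N'_{pq} &= \frac{C'_{pq}}{(C'_{00})^{r}} = \frac{s^{p+q+2}\, C_{pq}}{s^{2r}\, C_{00}^{r}} = s^{\,p+q+2-2r}\, N_{pq}.
\end{aligned}
$$

Hence N'_pq = N_pq for every s exactly when p + q + 2 - 2r = 0, that is, r = (p + q)/2 + 1.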

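Hu constructed seven invariant moments from the normalized second- and third-order central moments. For continuous images they remain unchanged under translation, scaling, and rotation, which makes them compact and efficient shape descriptors. The seven Hu invariant moments are:

$$
\begin{aligned}
\phi_1 &= N_{20} + N_{02} \\
\phi_2 &= (N_{20} - N_{02})^2 + 4N_{11}^2 \\
\phi_3 &= (N_{30} - 3N_{12})^2 + (3N_{21} - N_{03})^2 \\
\phi_4 &= (N_{30} + N_{12})^2 + (N_{21} + N_{03})^2 \\
\phi_5 &= (N_{30} - 3N_{12})(N_{30} + N_{12})\left[(N_{30} + N_{12})^2 - 3(N_{21} + N_{03})^2\right] \\
&\quad + (3N_{21} - N_{03})(N_{21} + N_{03})\left[3(N_{30} + N_{12})^2 - (N_{21} + N_{03})^2\right] \\
\phi_6 &= (N_{20} - N_{02})\left[(N_{30} + N_{12})^2 - (N_{21} + N_{03})^2\right] + 4N_{11}(N_{30} + N_{12})(N_{21} + N_{03}) \\
\phi_7 &= (3N_{21} - N_{03})(N_{30} + N_{12})\left[(N_{30} + N_{12})^2 - 3(N_{21} + N_{03})^2\right] \\
&\quad - (N_{30} - 3N_{12})(N_{21} + N_{03})\left[3(N_{30} + N_{12})^2 - (N_{21} + N_{03})^2\right]
\end{aligned}
$$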

In short, the Hu moments are computed as follows: the centroid (x_1, y_1) of the image is obtained from the ordinary moments M_pq; the central moments C_pq are defined relative to that centroid; the central moments are normalized to obtain N_pq; and the normalized second- and third-order central moments are combined to form the seven invariant moments.
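The full pipeline above can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation (the paper does not say which software computed the moments): the image is a list of rows with img[y][x] = f(x, y), and the function names and the toy image are hypothetical. The check at the end rotates the pixel grid by exactly 90 degrees and translates it, two transformations under which the seven values should agree up to floating-point rounding; arbitrary-angle rotation or rescaling of a digital image involves resampling, so there the invariance is only approximate.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]  # img[y][x] holds the gray value f(x, y)


def central_moments(img: Image, max_order: int = 3) -> Dict[Tuple[int, int], float]:
    """Central moments C_pq for all p + q <= max_order."""
    pixels = [(x, y, v) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pixels)
    x1 = sum(x * v for x, _, v in pixels) / m00  # x1 = M10 / M00
    y1 = sum(y * v for _, y, v in pixels) / m00  # y1 = M01 / M00
    return {
        (p, q): sum(((x - x1) ** p) * ((y - y1) ** q) * v for x, y, v in pixels)
        for p in range(max_order + 1)
        for q in range(max_order + 1 - p)
    }


def normalized_moments(img: Image) -> Dict[Tuple[int, int], float]:
    """N_pq = C_pq / C_00^r with r = (p + q) / 2 + 1, for orders p + q = 2 and 3."""
    c = central_moments(img)
    return {
        (p, q): c[(p, q)] / c[(0, 0)] ** ((p + q) / 2 + 1)
        for (p, q) in c
        if p + q in (2, 3)
    }


def hu_moments(img: Image) -> List[float]:
    """Seven Hu invariants phi_1..phi_7 from normalized 2nd- and 3rd-order moments."""
    n = normalized_moments(img)
    n20, n02, n11 = n[(2, 0)], n[(0, 2)], n[(1, 1)]
    n30, n03, n21, n12 = n[(3, 0)], n[(0, 3)], n[(2, 1)], n[(1, 2)]
    t1, t2 = n30 + n12, n21 + n03  # sums that recur in phi_4..phi_7
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        t1 ** 2 + t2 ** 2,
        (n30 - 3 * n12) * t1 * (t1 ** 2 - 3 * t2 ** 2)
        + (3 * n21 - n03) * t2 * (3 * t1 ** 2 - t2 ** 2),
        (n20 - n02) * (t1 ** 2 - t2 ** 2) + 4 * n11 * t1 * t2,
        (3 * n21 - n03) * t1 * (t1 ** 2 - 3 * t2 ** 2)
        - (n30 - 3 * n12) * t2 * (3 * t1 ** 2 - t2 ** 2),
    ]


def rotate90(img: Image) -> Image:
    """Rotate the pixel grid by 90 degrees (an exact permutation, no interpolation)."""
    return [list(row) for row in zip(*img[::-1])]


def shift(img: Image, dx: int, dy: int) -> Image:
    """Translate the content by (dx, dy) pixels by padding zeros on the left and top."""
    width = len(img[0]) + dx
    return [[0] * width for _ in range(dy)] + [[0] * dx + list(row) for row in img]


hand = [[0, 1, 1, 1, 0],
        [0, 1, 0, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 1, 0, 0, 0]]  # toy binary "hand shape"
for name, img in [("original", hand), ("rotated", rotate90(hand)), ("shifted", shift(hand, 4, 2))]:
    print(name, [round(h, 6) for h in hu_moments(img)])
```

In a classifier, the seven values returned by hu_moments would presumably form the feature vector for one fingerspelling image.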

Among supervised classification methods, the support vector machine (SVM) [14] is a classification method grounded in statistical learning theory. However, the basic SVM is a linear classifier that uses a single hyperplane to divide samples in D-dimensional space into two categories, so it fits data with complex distributions poorly [15] [16] [17] [18] [19]. To improve the fit, kernel SVM was proposed. This extension of SVM addresses the inability of a linear SVM to separate practical data with complex distributions in supervised learning. Kernel SVM extends the original linear SVM to a nonlinear classifier, exploiting the fact that many machine learning algorithms can be written entirely in terms of dot products between samples [20] [21] [22] [23] [24].

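Based on the above theory, the linear function of SVM can be rewritten as follows:

$$f(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x} + b = b + \sum_{i} c_i\, \mathbf{x}^{T}\mathbf{x}^{(i)}$$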

Here, c_i are the elements of the coefficient vector c, and x^(i) denotes the i-th training sample. We can then replace x with the output of a feature function φ(x) and replace the dot product with a kernel function:

$$k(\mathbf{x}, \mathbf{x}^{(i)}) = \phi(\mathbf{x}) \cdot \phi(\mathbf{x}^{(i)})$$

where · denotes the dot product. The substituted function can then be used for prediction:

$$f(\mathbf{x}) = b + \sum_{i} c_i\, k(\mathbf{x}, \mathbf{x}^{(i)})$$
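As a small illustration of the prediction rule f(x) = b + Σ c_i k(x, x^(i)), the sketch below evaluates it for an arbitrary kernel and checks that, with the plain dot product as the kernel, it reduces to the linear form w^T x + b with w = Σ c_i x^(i). The coefficients, bias, and training points are hypothetical placeholders, not trained values, and the two-class sign rule is shown only for the binary case.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product u . v (the linear kernel)."""
    return sum(a * b for a, b in zip(u, v))


def decision_function(x: Vector, train_x: List[Vector], coeffs: List[float],
                      bias: float, kernel: Kernel) -> float:
    """f(x) = b + sum_i c_i * k(x, x^(i)); for two classes, the sign of f is the prediction."""
    return bias + sum(c * kernel(x, xi) for c, xi in zip(coeffs, train_x))


# Hypothetical training samples, coefficients c_i and bias b.
train_x = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
coeffs = [0.4, -0.2, 1.0]
bias = -0.3

# With the linear kernel, the same function equals w^T x + b with w = sum_i c_i x^(i).
w = [sum(c * xi[d] for c, xi in zip(coeffs, train_x)) for d in range(2)]
x = [2.0, -1.0]
print(decision_function(x, train_x, coeffs, bias, dot), dot(w, x) + bias)
```

Swapping dot for any other kernel function changes only the kernel argument; the rest of the prediction rule stays the same.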

The function f(x) is nonlinear in x, yet it is exactly equivalent to preprocessing the inputs with φ(x) and then learning a linear model in the transformed space. After this transformation, the kernel SVM fits the maximum-margin hyperplane in the transformed feature space. The transformation can be nonlinear and the transformed space can be high-dimensional, so although the classifier is a hyperplane in the high-dimensional feature space, it may be nonlinear in the original input space.

The kernel trick has two advantages: it guarantees the effective convergence of optimization techniques when learning nonlinear models, and evaluating the kernel function k(x, x^(i)) is often much more efficient than constructing φ(x) explicitly and then computing the dot product. We therefore introduced the kernel function into the kernel SVM.
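The equivalence between evaluating k(x, x^(i)) directly and computing φ(x) · φ(x^(i)) can be checked numerically. The paper uses the RBF kernel, whose feature space is infinite-dimensional, so φ cannot be written out for it; the sketch below instead uses the degree-2 polynomial kernel k(x, z) = (x · z)^2 on 2-D inputs, chosen here only because its feature map φ(x) = (x_1^2, x_2^2, √2 x_1 x_2) is finite and explicit.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def quadratic_kernel(x: Vector, z: Vector) -> float:
    """k(x, z) = (x . z)^2, computed directly in the 2-D input space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x: Vector) -> Tuple[float, float, float]:
    """Explicit feature map of the quadratic kernel for 2-D inputs."""
    return (x[0] ** 2, x[1] ** 2, math.sqrt(2) * x[0] * x[1])


def explicit_dot(x: Vector, z: Vector) -> float:
    """phi(x) . phi(z): map both points to 3-D first, then take the dot product."""
    return sum(a * b for a, b in zip(phi(x), phi(z)))


for x, z in [((1.0, 2.0), (3.0, -0.5)), ((-2.0, 0.5), (4.0, 1.5))]:
    print(quadratic_kernel(x, z), explicit_dot(x, z))  # equal up to rounding
```

The kernel route costs one 2-D dot product and a square, whereas the explicit route builds two 3-D vectors first; the gap grows quickly with the input dimension and the polynomial degree.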

Among all kernel functions, the Gaussian kernel is the most widely used; it has also been used, for example, to solve two-dimensional time-independent convection-diffusion-reaction equations with inhomogeneous boundary conditions. It is also called the radial basis function (RBF) kernel [25] and can be expressed as follows:

$$k(\mathbf{x}, \mathbf{x}^{(i)}) = \mathcal{N}\left(\mathbf{x} - \mathbf{x}^{(i)};\, \mathbf{0},\, \sigma^{2}\mathbf{I}\right)$$

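Here 𝒩 denotes the normal density. The kernel is called a radial basis function because its value decreases along lines radiating outward from x in x^(i) space. Up to a constant factor, the RBF kernel can be written more explicitly as:

$$k(\mathbf{x}, \mathbf{x}^{(i)}) = \exp\left(-b\,\lVert \mathbf{x} - \mathbf{x}^{(i)} \rVert^{2}\right)$$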

where b > 0 is a parameter that needs to be tuned. To conclude, kernel SVM has many advantages: (1) it guarantees the effective convergence of optimization techniques for learning nonlinear models; (2) evaluating the kernel function is much more efficient than constructing φ(x) and computing the dot product; (3) it trains and classifies quickly, and its memory requirement is relatively low; (4) it has few adjustable parameters; (5) training is a convex quadratic optimization problem, which provides a global and unique solution and avoids convergence to a local minimum. We do not use deep learning in this paper, since our dataset is small and not suitable for training a deep neural network model [26] [27] [28] [29] [30].
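A minimal sketch of the RBF kernel, assuming the common form k(x, x^(i)) = exp(-b‖x - x^(i)‖²); the paper's explicit formula is not preserved in this text, so this form is an assumption. With b = 1/(2σ²), it equals the normal density 𝒩(x - x^(i); 0, σ²I) multiplied by the constant (2πσ²)^(D/2), where D is the input dimension. The function names, σ value, and sample vectors are illustrative only; in this paper's setting, x would presumably be the vector of Hu moments of one image.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, z: Vector, b: float) -> float:
    """Assumed RBF form: k(x, z) = exp(-b * ||x - z||^2), with tuning parameter b > 0."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-b * sq_dist)


def normal_density(d: Vector, sigma: float) -> float:
    """Isotropic normal density N(d; 0, sigma^2 I) in len(d) dimensions."""
    dim = len(d)
    sq_norm = sum(t * t for t in d)
    return math.exp(-sq_norm / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2) ** (dim / 2)


sigma = 0.8
b = 1 / (2 * sigma ** 2)  # links the two parameterizations
x, z = [0.2, 1.0, -0.5], [0.0, 0.4, 0.1]
diff = [xi - zi for xi, zi in zip(x, z)]
scale = (2 * math.pi * sigma ** 2) ** (len(x) / 2)
print(rbf_kernel(x, z, b), scale * normal_density(diff, sigma))  # equal up to rounding
print(rbf_kernel(x, x, b))  # 1.0: a point is maximally similar to itself
```

A larger b makes the kernel decay faster with distance, so each training sample influences a smaller neighborhood; this is the parameter that trial-and-error tuning would adjust.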

We carried out the experiment on a personal computer running the Windows 10 operating system with a Core i7 CPU and 16 GB of memory. Overall accuracy (OA) was used to evaluate the results; it is the number of correct predictions over all test sets divided by the total number of samples.

To obtain the optimal parameters of the kernel SVM, we used a trial-and-error search, which is a common approach. Meanwhile, 10-fold cross validation [31] was introduced to validate the performance. Cross validation is a practical method that partitions the data samples into smaller subsets. In 10-fold cross validation, the data samples are randomly divided into 10 equal-sized subsamples. One subsample is retained as validation data for testing the model, and the remaining 9 subsamples are used as training data. This process is repeated 10 times, so that each subsample is used exactly once as test data. The average of the 10 results is used as an estimate of the accuracy of the algorithm (see Fig. 2). The advantage of 10-fold cross validation is that all data samples are used for both training and validation and each subset is validated only once, so the resulting accuracy estimate is more reliable than one from a single train/test split.
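The splitting and scoring procedure can be sketched as follows, assuming plain random (unstratified) folds as described above. The fit_predict callback is a hypothetical stand-in for training the RBF-SVM on the training indices and predicting the test indices; no SVM is trained here, and the majority-class baseline exists only to make the example runnable. Overall accuracy pools correct predictions over all test folds and divides by the total number of samples.

```python
import random
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_splits(n: int, k: int = 10, seed: int = 0) -> List[Split]:
    """Shuffle indices 0..n-1 and cut them into k near-equal folds; return (train, test) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for m, fold in enumerate(folds) if m != i for j in fold], test)
        for i, test in enumerate(folds)
    ]


def overall_accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """OA = number of correct predictions / total number of samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validated_oa(labels: Sequence, fit_predict: Callable[[List[int], List[int]], List],
                       k: int = 10, seed: int = 0) -> float:
    """Run k-fold CV once: each sample is predicted exactly once, by a model that never saw it."""
    preds = [None] * len(labels)
    for train, test in k_fold_splits(len(labels), k, seed):
        for j, p in zip(test, fit_predict(train, test)):
            preds[j] = p
    return overall_accuracy(labels, preds)


# Hypothetical usage: 1320 samples in 30 classes, scored by a trivial majority-class baseline.
labels = [i % 30 for i in range(1320)]


def majority_baseline(train: List[int], test: List[int]) -> List[int]:
    """Predict the most frequent training label for every test sample."""
    counts = {}
    for j in train:
        counts[labels[j]] = counts.get(labels[j], 0) + 1
    top = max(counts, key=counts.get)
    return [top] * len(test)


print(cross_validated_oa(labels, majority_baseline))
```

With 1320 images and k = 10, every fold holds 132 test images and 1188 training images. Repeating the run with different seeds gives the 10 × 10-fold protocol.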

In this experiment, we used the proposed method, called HMI-RBF-SVM, which combines Hu moment invariants with a radial basis function support vector machine. The results of 10 × 10-fold cross validation are listed in Table 1. The highest overall accuracy in the Total column is 88.79%, highlighted in bold. The other highlighted number is the highest accuracy of a single fold, 93.18%. Finally, the mean and standard deviation over the 10 runs are 86.47 ± 1.15%; the small standard deviation indicates that the results are stable across runs.
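The summary statistics reported above can be computed from per-fold counts with a few lines of Python. This sketch assumes that each run's overall accuracy pools the correct predictions of its 10 folds and that the reported SD is the sample standard deviation (n - 1 denominator); the paper does not state which SD it used. The numbers in the usage line are hypothetical, not the values of Table 1.

```python
import statistics
from typing import List, Sequence, Tuple


def run_oa(fold_correct: Sequence[int], fold_sizes: Sequence[int]) -> float:
    """Overall accuracy of one 10-fold run: pooled correct predictions / pooled test samples."""
    return sum(fold_correct) / sum(fold_sizes)


def mean_sd(run_accuracies: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over repeated runs."""
    return statistics.mean(run_accuracies), statistics.stdev(run_accuracies)


def format_mean_sd(run_accuracies: List[float]) -> str:
    """Format as 'mean ± SD%' with two decimals, as in the text."""
    m, s = mean_sd(run_accuracies)
    return f"{100 * m:.2f} ± {100 * s:.2f}%"


# Hypothetical per-run accuracies from three repetitions (not the paper's data).
print(format_mean_sd([0.80, 0.90, 1.00]))  # 90.00 ± 10.00%
```

As an illustration of the fold granularity, if each fold holds 132 of the 1320 images, then 123 correct predictions in one fold give 123/132 ≈ 93.18%.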

We compared RBF-SVM with traditional SVM and polynomial SVM (PSVM). All parameters were chosen by trial and error. As can be seen from Table 2, the mean ± SD accuracies of traditional SVM, PSVM, and RBF-SVM are 82.39 ± 1.13%, 85.50 ± 1.09%, and 86.47 ± 1.15%, respectively. In terms of accuracy, RBF-SVM is about four percentage points ahead of traditional SVM (4.08 points) and about one point ahead of PSVM (0.97 points). This suggests that the higher accuracy benefits from the radial basis function kernel, which guarantees effective convergence and speeds up training.

In this experiment, three state-of-the-art approaches were compared with our HMI-RBF-SVM method. As can be seen from Table 3, WE-SVM [6], GLCM-MGSVM [7], and HMM-SVM [9] achieved overall accuracies of 85.69 ± 0.59%, 85.3%, and 85.14%, respectively. These four approaches employ different feature extraction methods: wavelet entropy, gray-level co-occurrence matrix, hidden Markov model, and Hu moment invariants. Kernel SVMs were applied in GLCM-MGSVM and HMI-RBF-SVM, while the other two approaches used traditional SVM. Our method outperforms the others by about one percentage point (0.78 to 1.33 points), which suggests that Hu moment invariants preserve the relevant image features during extraction and that the radial basis function kernel provides effective convergence, enhancing classification performance.

In our study, three techniques were applied: Hu moment invariants, RBF-SVM, and 10-fold cross validation. Because each image is identified by a feature vector composed of Hu moment invariants, recognition is fast; the disadvantage is that the recognition rate may be relatively lower. Hu moment invariants are therefore generally used to identify large objects in an image: they describe the shape of an object well, but the texture of the image must not be too complicated. They are thus relatively well suited to identifying preprocessed hand shapes. Because of the excellent performance of RBF, the Gaussian kernel was chosen. It needs few parameters to tune and is trained by convex quadratic optimization. In particular, kernel SVM provides a unique, global solution and avoids convergence to local minima, and its superiority over traditional SVM is clear in our comparison. We applied 10-fold cross-validation because it is easy to use and uses all data for both training and validation. It does not improve the accuracy of the final classifier, but it makes the accuracy estimate reliable, which gives more confidence that the classifier will generalize to other independent data sets. All these techniques contribute to improving performance.

This study proposed a novel Chinese fingerspelling recognition method based on Hu moment invariants and RBF-SVM, evaluated with 10-fold cross validation. Hu moment invariants are well suited to feature extraction from fingerspelling images and accelerate recognition. Based on the radial basis function, the kernel SVM guarantees effective convergence and improves classification. 10-fold cross validation makes full use of the data samples for validation, yielding a more reliable error estimate. This approach achieved an overall accuracy of 86.47 ± 1.15%, the highest among the four state-of-the-art approaches compared.

In future work, we shall try the following: (1) other advanced methods, such as particle swarm optimization, principal component analysis, deep neural networks, and transfer learning, may be applied to this task; (2) we shall try to apply our approach to other fields.

Table 3. Comparison with state-of-the-art approaches

Method                Overall accuracy (%)
WE-SVM [6]            85.69 ± 0.59
GLCM-MGSVM [7]        85.3
HMM-SVM [9]           85.14
HMI-RBF-SVM (Ours)    86.47 ± 1.15

[1] Hearing impairment, loneliness, social isolation, and cognitive function: longitudinal analysis using English Longitudinal Study on Ageing

[2] Language contact across time: Classical Chinese on modern public signs

[3] An intelligent Arabic sign language recognition system using a pair of LMCs with GMM based classification

[4] 3D sign language recognition with joint distance and angular coded color topographical descriptor on a 2-stream CNN

[5] Adaptive cooperation of multi-swarm particle swarm optimizer-based hidden Markov model

[6] Chinese sign language identification via wavelet entropy and support vector machine

[7] Isolated Chinese sign language recognition using gray-level co-occurrence matrix and parameter-optimized medium Gaussian support vector machine

[8] A position and rotation invariant framework for sign language recognition (SLR) using Kinect

[9] Kinect-based Taiwanese sign-language recognition system

[10] Chinese sign language fingerspelling recognition via six-layer convolutional neural network with leaky rectified linear units for therapy and rehabilitation

[11] Pathological brain detection based on wavelet entropy and Hu moment invariants

[12] Pathological brain detection in MRI scanning via Hu moment invariants and machine learning

[13] Alcoholism detection by medical robots based on Hu moment invariants and predator-prey adaptive-inertia chaotic particle swarm optimization

[14] Parameter investigation of support vector machine classifier with kernel functions

[15] Identification of green, Oolong and black teas in China via wavelet packet entropy and fuzzy support vector machine

[16] Magnetic resonance brain image classification based on weighted-type fractional Fourier transform and nonparallel support vector machine

[17] Pathological brain detection in MRI scanning by wavelet packet Tsallis entropy and fuzzy support vector machine

[18] Dual-tree complex wavelet transform and twin support vector machine for pathological brain detection

[19] Morphological analysis of dendrites and spines by hybridization of ridge detection with twin support vector machine

[20] Comparison of machine learning methods for stationary wavelet entropy-based multiple sclerosis detection: decision tree, k-nearest neighbors, and support vector machine

[21] Wavelet entropy and directed acyclic graph support vector machine for detection of patients with unilateral hearing loss in MRI scanning

[22] Facial emotion recognition based on biorthogonal wavelet entropy, fuzzy support vector machine, and stratified cross validation

[23] Detection of dendritic spines using wavelet packet entropy and fuzzy support vector machine

[24] Pathological brain detection by wavelet-energy and fuzzy support vector machine

[25] An interpretation of radial basis function networks as zero-mean Gaussian process emulators in cluster space

[26] Teeth category classification via seven-layer deep convolutional neural network with max pooling and global average pooling

[27] Image based fruit category classification by 13-layer deep convolutional neural network and data augmentation

[28] Alcoholism identification via convolutional neural network based on parametric ReLU, dropout, and batch normalization

[29] Polarimetric synthetic aperture radar image segmentation by convolutional neural network using graphical processing units

[30] Multiple sclerosis identification by 14-layer convolutional neural network with batch normalization, dropout, and stochastic pooling

[31] Automated and reliable brain radiology with texture analysis of magnetic resonance imaging and cross datasets validation