key: cord-0911065-n8tg1iwy
authors: Chitti, Sridevi; Kumar, J. Tarun; Kumar, V. Sandeep
title: EEG Signal Feature Selection Algorithm and Support Vector Machine Model in Patient's Fatigue Recognition
date: 2021-10-05
journal: Arab J Sci Eng
DOI: 10.1007/s13369-021-06206-1
sha: 1e8bda915180ef50abc07ba78282611e43b67279
doc_id: 911065
cord_uid: n8tg1iwy

In March 2020, a cohort of 26 critically ill hospitalized SARS-CoV-2-infected patients received EEGs to assess unexplained altered mental status, loss of consciousness, or poor arousal and responsiveness. The objective of the present work is to develop a method that automatically determines the mental state of vigilance, i.e., a person's state of alertness. Such a task is relevant to diverse domains in which a person is expected or required to be in a particular state of mind. Addressing EEG feature selection and classification for the identification of fatigue driving, a discretization algorithm based on rough set theory is proposed to select the channels and EEG signal feature quantities. The support vector machine (SVM) is selected as the fatigue driving recognition model, and the risk of fatigue misjudgment is introduced as an SVM model parameter for model optimization. The experimental results show that, compared with the principal component method, the rough set discretization algorithm selects fewer features: with a compatibility threshold of 0.8, only 2 to 4 of the 208 candidate features are selected. The features selected differ between subjects, which affects the establishment of the support vector machine recognition model. The fatigue misjudgment risk control parameter can adjust the misjudgment risk of the support vector machine recognition model. Although the present approach is costly in computation time, it allows constructing a decision rule that provides an accurate and fast prediction of the alertness state of an unseen individual.

With the popularization of automobiles and the rapid development of road transportation, a large number of traffic accidents have caused huge property losses and casualties in countries all over the world. Driver fatigue is one of the main factors leading to serious traffic accidents [1]. Effective real-time detection of the driver's fatigue state can therefore reduce the occurrence of accidents and improve safety, and research on driver fatigue monitoring methods has become a current research topic. Electroencephalography (EEG) directly reflects the state of the brain, can represent the driver's alertness level [2], and has good real-time performance, so it has become one of the important directions for studying fatigue driving detection. Early research methods mainly divided fatigue assessment into subjective evaluation and objective analysis. Subjective evaluation relies on questionnaires to obtain each person's fatigue state. It is a reliable way of assessing a person's fatigue state, but it is susceptible to individual differences, which makes the evaluation criteria insufficiently uniform. EEG (electroencephalogram) has become a gold standard for monitoring fatigue. Brain electrical signals reflect the electrical activity of neurons in the brain and their functional states, and are an extremely important means of understanding how the brain processes information during cognitive activities such as exercise and mood. Combining a virtual reality environment with brain electrical signals likewise provides strong technical support for the study of visuospatial working memory [1, 2]. In a study of the encoding and retrieval processes of memory, the results show that the activity of the brain electrical signal increased during working memory encoding and that the energy of the α wave during memory retrieval was higher than during encoding.
In the literature [3], the brain areas most obviously changed by watching 3D TV were identified from the changes in the EEG signal recorded with eyes closed before and after viewing, and an EEG electrode layout dedicated to a 3D TV health assessment system was studied. The relative energy of three bands of the EEG signal of each channel (α wave, β wave, and θ wave) was compared. The results show that the changes in the visual area are the most obvious, and the changes in the frontal area are also significant. Another study treated children with attention-deficit/hyperactivity disorder (ADHD) using EEG feedback together with a purpose-designed attention enhancement system, and found that the combination of the two not only significantly reduced behaviors such as inattention and mood swings, but also effectively improved the children's academic performance and IQ. At the same time, early research has shown that the four characteristic waves (α wave, β wave, θ wave, and δ wave) in the EEG signal are closely related to brain activity and that the energy of the characteristic waves varies with the degree of fatigue [2]. In [3], the concept of the fatigue factor R value was proposed in a study of driving fatigue; it was found that the larger the R value of the EEG signal, the higher the degree of fatigue. In summary, brain fatigue caused by virtual reality experience urgently needs attention, but there is at present no unified and effective method of evaluating it.

In this study, brain fatigue caused by immersive virtual reality experience was studied by recording EEG signals. The α wave, β wave, θ wave, δ wave, R value, center of gravity frequency and other EEG signal parameters, together with the signal changes across 30 EEG channels, were compared and analyzed. Subjective questionnaire analysis was combined with objective EEG signal analysis to evaluate the fatigue of the subjects, to find a more effective method of monitoring the brain fatigue brought about by immersive virtual reality experience, and then to establish an evaluation index of brain fatigue. The results provide a reference for the design improvement and healthy use of virtual reality equipment.

The key to detecting the driver's fatigue state from EEG signals is the study of feature extraction and recognition methods that effectively reflect the alertness of the brain. There are currently two general approaches to feature extraction. One uses a certain type of feature in a specific brain region; for example, the literature [3] [4] [5] [6] selects δ and θ wave power spectrum feature quantities at electrode positions such as Oz, O1, Fz and T8. The other selects several feature quantities from multiple channels. The feature space formed by these quantities generally has a very high dimensionality, so dimensionality reduction is then performed. For example, the literature [7, 8] applies linear correlation analysis to multi-channel EEG signals and extracts the power spectrum components most highly correlated with alertness as model input variables; the literature [9, 10] uses principal component analysis (PCA) to compress the high-dimensional feature space composed of multi-channel EEG power spectra and then builds a linear discriminant analysis (LDA) classification model on the lower-dimensional space spanned by the principal components; the literature [11] extracts 26 principal components from 62-channel EEG signals through PCA and combines them with the Fisher criterion to select the features that contribute most to classification; the literature [12, 13] likewise mainly uses PCA to extract power spectrum features and selects the components most relevant to brain alertness to establish a model; and the literature [14, 15] uses the random forest algorithm to select feature quantities and reduce the number of model input variables. Statistical methods such as PCA used in the above literature are effective for normally distributed populations but have great limitations for non-normal populations.
Research on identification and classification models for fatigue driving mainly includes the linear classification model [6, 8, 9], Bayesian classification model [5], random field model [12], linear dynamic random model [13], neural network model [4, 6, 10], fuzzy model [7] and support vector machine model [6, 10, 11]. Among these, the support vector machine model has attracted attention for its good generalization ability and its suitability for small-sample learning. Its classic form, however, treats awake and fatigued samples equally, without distinction.

In order to make feature selection adapt better to changes in subject and over time, and to establish a better fatigue driving recognition model on this basis, this research proposes a feature selection algorithm based on rough set discretization and uses the probability that the fatigue state is misjudged as the awake state as a control parameter of the SVM model.

Existing research results show that human brain electrical signals have different rhythm distributions in different brain regions during waking and drowsiness. Therefore, the rhythm energy or power spectrum of brain electrical signals has been widely used as the characteristic quantity describing the fatigued state [3] [4] [5] [7] [8] [9] [10] [11] [12] [13] [14] [15]. This study also uses power spectrum-based quantities as the characteristic quantities for fatigue driving identification.

The rhythm energy is calculated by integrating the power spectral density of the EEG signal over the corresponding frequency band:

$$E = \int_{f_1}^{f_2} P_s(f)\,\mathrm{d}f$$

where $P_s(f)$ is the power spectral density function and the integration limits $f_1$ and $f_2$ are the lower and upper limits of the rhythm wave frequency band. Generally, the periodogram method is used to calculate the power spectrum in the digital domain. Typical rhythm waves of EEG signals include δ, θ, α and β; below, for the sake of brevity, these symbols are also used to represent the corresponding frequency band energy.
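A minimal sketch of the band-energy calculation may help make the definition concrete. It estimates the power spectral density with a plain periodogram and sums it over a band. The function names, the 128 Hz test signal and the 8 to 13 Hz and 13 to 30 Hz band limits in the usage lines are illustrative assumptions, not values taken from this study.

```python
import cmath
import math

def periodogram(signal, fs):
    """One-sided periodogram estimate of the power spectral density.

    Returns (freqs, psd) for the frequencies 0 .. fs/2. A plain O(N^2) DFT
    keeps the sketch free of third-party dependencies.
    """
    n = len(signal)
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        # Double every bin except DC (and Nyquist for even n) to fold in
        # the power of the negative frequencies.
        scale = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        freqs.append(k * fs / n)
        psd.append(scale * abs(coeff) ** 2 / (fs * n))
    return freqs, psd

def band_energy(freqs, psd, f1, f2):
    """Approximate the integral of the PSD over [f1, f2) by a rectangle sum."""
    df = freqs[1] - freqs[0]
    return sum(p * df for f, p in zip(freqs, psd) if f1 <= f < f2)

# Usage: a unit-amplitude 10 Hz sine sampled at 128 Hz for 2 s.
fs = 128
x = [math.sin(2 * math.pi * 10 * i / fs) for i in range(2 * fs)]
freqs, psd = periodogram(x, fs)
alpha_like = band_energy(freqs, psd, 8.0, 13.0)   # hypothetical band limits
beta_like = band_energy(freqs, psd, 13.0, 30.0)   # hypothetical band limits
```

In this toy case almost all of the signal power falls in the band that contains 10 Hz, which is the behavior the integral is meant to capture.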

When a person is fatigued or awake, changes in the rhythm waves of the brain electrical signal are often manifested as changes in their distribution over the cerebral cortex. Therefore, selecting the electrode positions and the corresponding rhythm wave characteristics is the basis for correct recognition of the brain state. To make the selection more objective and effective, the author uses the discretization algorithm of rough set theory to select the spatial locations of the collected EEG and the corresponding EEG characteristics.

Assume that the logical system corresponding to the selected feature quantities is T, made up of sample points whose class label is 1 or −1. The goal is to establish a classification interface for T. To this end, a set of polylines is fitted to the interface; the vertices of these polylines form a set of points on the coordinate axes of the input variables. The input space is therefore divided into different areas by selecting points on the input variable coordinates, such that each area contains, as far as possible, only samples of the same type (see Fig. 1). Selecting the best feature quantities is thus regarded as choosing the feature quantities and division points that maximize the probability that each area of the feature space contains samples of a single class.

To realize feature selection along these lines, a data discretization algorithm from rough set theory is used. There are many kinds of rough set discretization algorithms; for reasons of computational efficiency, an algorithm based on the importance of sub-points (also called breakpoints) is used [16]. Assume that the sample point set S has been divided into the sets X_1, X_2, ..., X_m for a certain input feature value. Suppose that, for a point c, the sample point set has been divided into a set X_k, in which the numbers of samples belonging to classes 1 and −1 are l_1(k) and l_2(k), and the samples

Then, the feature selection algorithm includes five steps:

Step 1: Set the selected feature point set P to be an empty set and construct a candidate feature point set C;

Step 2: Calculate the importance of each sub-point in C; if all are equal to zero, go to step 5;

Step 3: Select the most important point c to add to the set P, and remove the point c from C;

Step 4: Calculate the sets into which all sample points are divided by the points in P. If each set contains only sample points of the same type, go to step 5; otherwise go to step 2;

Step 5: Output the selected feature set and its point set, end.
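The five steps can be sketched as a greedy loop. The importance formula itself is not recoverable from the text here, so this sketch assumes a commonly used breakpoint importance: the number of opposite-class sample pairs that lie in the same current set and are separated by the candidate point. The function names, the midpoint construction of candidate points and the toy data are likewise hypothetical.

```python
from collections import defaultdict

def candidate_cuts(samples):
    """Step 1 helper: midpoints between consecutive distinct values of each feature."""
    cuts = []
    for j in range(len(samples[0])):
        values = sorted({x[j] for x in samples})
        cuts += [(j, (a + b) / 2) for a, b in zip(values, values[1:])]
    return cuts

def blocks(samples, selected):
    """Partition sample indices by their side of every selected cut point."""
    groups = defaultdict(list)
    for i, x in enumerate(samples):
        groups[tuple(x[j] > c for j, c in selected)].append(i)
    return list(groups.values())

def importance(cut, samples, labels, partition):
    """Assumed measure: opposite-class pairs inside one block that the cut separates."""
    j, c = cut
    total = 0
    for block in partition:
        l1 = sum(1 for i in block if samples[i][j] <= c and labels[i] == 1)
        l2 = sum(1 for i in block if samples[i][j] <= c and labels[i] == -1)
        r1 = sum(1 for i in block if samples[i][j] > c and labels[i] == 1)
        r2 = sum(1 for i in block if samples[i][j] > c and labels[i] == -1)
        total += l1 * r2 + l2 * r1
    return total

def select_features(samples, labels):
    selected = []                               # Step 1: P starts empty
    candidates = candidate_cuts(samples)        # Step 1: candidate set C
    while candidates:
        partition = blocks(samples, selected)
        scores = [importance(c, samples, labels, partition)
                  for c in candidates]          # Step 2
        if max(scores) == 0:
            break                               # all zero: go to step 5
        best = candidates.pop(scores.index(max(scores)))   # Step 3
        selected.append(best)
        partition = blocks(samples, selected)   # Step 4
        if all(len({labels[i] for i in b}) == 1 for b in partition):
            break
    features = sorted({j for j, _ in selected}) # Step 5
    return features, selected

# Toy usage: only the second feature (index 1) separates the two classes.
toy_x = [[0.3, 1.0, 5.0], [0.9, 1.2, 4.0], [0.1, 3.0, 4.5], [0.7, 3.5, 5.5]]
toy_y = [1, 1, -1, -1]
features, cuts = select_features(toy_x, toy_y)
```

Because a feature is kept only if at least one of its points is selected, the same loop performs both discretization and feature (channel) selection.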

Suppose the feature quantity points selected by the above algorithm divide the sample point set S into the sets X_1, X_2, ..., X_m. The compatibility of the logical system determined by the sample points is then

$$C_P = \frac{1}{|S|}\sum_{k=1}^{m} P(X_k)$$

In the formula, $P(X_k) = |X_k|$ when the sample points in set $X_k$ all belong to the same class, and $P(X_k) = 0$ otherwise, where $|X|$ represents the number of elements in set $X$. $C_P = 0$ means that no sample point can be identified, and $C_P = 1$ means that all sample points are correctly identified; obviously $0 \le C_P \le 1$.

It is also possible to terminate the algorithm once the compatibility degree C_P reaches a certain threshold, i.e., calculate C_P in step 4 and go to step 5 when C_P is greater than or equal to the set threshold C_P0, which helps to improve the robustness of the selected feature quantities to interference.
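As a hedged illustration of the threshold rule, the sketch below computes the compatibility degree as the fraction of samples that fall in single-class sets, which is what the definition of P(X_k) and the range 0 ≤ C_P ≤ 1 suggest. The function names and the toy partition are assumptions made for the example.

```python
def compatibility(partition, labels):
    """Fraction of samples lying in sets whose members all share one class."""
    total = sum(len(block) for block in partition)
    pure = sum(len(block) for block in partition
               if len({labels[i] for i in block}) == 1)
    return pure / total

def should_stop(partition, labels, threshold):
    """Early-termination test for step 4: stop once C_P reaches the threshold."""
    return compatibility(partition, labels) >= threshold

# Toy usage: five samples, two candidate partitions of their indices.
toy_labels = [1, 1, -1, -1, 1]
mixed = [[0, 1], [2, 3, 4]]    # second set mixes both classes
clean = [[0, 1, 4], [2, 3]]    # every set is single-class
cp_mixed = compatibility(mixed, toy_labels)
cp_clean = compatibility(clean, toy_labels)
```

With a threshold below 1 the loop accepts a few mixed sets in exchange for fewer selected points, which is the trade-off the text describes.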

After the above process is completed, the selected feature quantity is used as the input variable of the fatigue detection model.

The sample data of the feature variables selected by the above process are used to establish a fatigue detection model. A support vector machine (SVM), a machine learning model based on the structural risk minimization criterion, is adopted. The classic support vector machine model controls the risk of misjudgment equally for both classes. In the classification of wakefulness and fatigue, however, the consequences of misjudging awake as fatigue are much smaller than those of misjudging fatigue as awake. To meet this special need, a modification of the classic support vector machine model is proposed. For building a linear classification model, there are

In the formula, r is the control parameter, which satisfies −1 < r < 1.

The coefficient |y_k − r| weights the model classification error ξ_k. When r is close to 1, the linear classification model allows a large deviation for samples with y_k = 1, while the deviation for samples with y_k = −1 is small; when r is close to −1, the situation is the opposite; if r = 0, the above model is the classic support vector machine model. As in the classic support vector machine, C is the penalty factor for model error.
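The effect of the weight |y_k − r| can be illustrated numerically. The sketch below assumes that the weight multiplies the hinge slack in the objective, i.e. 0.5·‖w‖² + C·Σ|y_k − r|·ξ_k, which is one formulation consistent with the description above rather than a confirmed form of the model. Labels follow the decision rule in the text (+1 awake, −1 fatigue); the function names and toy values are hypothetical.

```python
def risk_weight(y, r):
    """Weight |y - r| applied to the slack of a sample with label y."""
    return abs(y - r)

def weighted_objective(w, b, samples, labels, C, r):
    """Assumed objective: 0.5*||w||^2 + C * sum_k |y_k - r| * xi_k.

    xi_k = max(0, 1 - y_k * (w . x_k + b)) is the usual hinge slack.
    """
    reg = 0.5 * sum(wi * wi for wi in w)
    penalty = 0.0
    for x, y in zip(samples, labels):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        penalty += risk_weight(y, r) * max(0.0, 1.0 - margin)
    return reg + C * penalty

# Toy usage: the same boundary misclassifies one awake or one fatigue sample.
w, b, C = [1.0], 0.0, 2.0
cost_awake_wrong = weighted_objective(w, b, [[-0.5]], [1], C, r=0.5)
cost_fatigue_wrong = weighted_objective(w, b, [[0.5]], [-1], C, r=0.5)
cost_classic = weighted_objective(w, b, [[0.5]], [-1], C, r=0.0)
```

With r = 0.5 the misclassified fatigue sample costs more than the misclassified awake sample, so an optimizer is pushed toward boundaries that avoid judging fatigue as awake; with r = 0 both errors cost the same.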

The dual problem of the above model is

For non-linear classification models, there are

In the formula, K(·) is a kernel function, as in the classic support vector machine. There are generally many choices; here the Gaussian radial basis kernel function is used.
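For orientation, one formulation that matches the verbal description (a weight |y_k − r| on the slack ξ_k, reduction to the classic model at r = 0, penalty factor C, kernel parameter γ) is sketched below. The placement of the weight in the objective is an assumption; the dual and the kernel form then follow from the standard Lagrangian derivation for soft-margin support vector machines.

```latex
% Assumed primal (linear case): slack of sample k weighted by |y_k - r|
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{k=1}^{N}\lvert y_k - r\rvert\,\xi_k
\quad\text{s.t.}\quad y_k\,(w^{\top}x_k + b)\ \ge\ 1-\xi_k,\qquad \xi_k \ge 0 .

% Resulting dual: only the box constraint changes relative to the classic SVM
\max_{\alpha}\ \sum_{k=1}^{N}\alpha_k
 - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j\,y_i y_j\,x_i^{\top}x_j
\quad\text{s.t.}\quad \sum_{k=1}^{N}\alpha_k y_k = 0,\qquad 0 \le \alpha_k \le C\,\lvert y_k - r\rvert .

% Non-linear case: replace the inner product by a kernel
y(x) = \sum_{k=1}^{N}\alpha_k\,y_k\,K(x_k, x) + b,
\qquad K(x, z) = \exp\!\bigl(-\gamma\,\lVert x - z\rVert^{2}\bigr).
```

Under this assumption the risk control amounts to a per-sample upper bound on the multipliers, so samples with y_k = −1 (fatigue) may receive larger multipliers than samples with y_k = 1 when r > 0.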

For a given input x, the current moment is classified as awake when y ≥ 1 and as fatigue when y ≤ −1. The range −1 < y < 1 is interpreted as the transition from being awake to fatigue, for which a fatigue index η is defined:

η close to 1 means a higher level of wakefulness; η close to 0 means a higher degree of fatigue.

To verify the above algorithms and models experimentally while avoiding or reducing interference factors as far as possible, the entire experiment and the acquisition of EEG signal data were carried out in the laboratory. The experiment was carried out on five volunteers. The subjects were forbidden to eat spicy foods such as peppers and garlic on the day before the experiment and were asked to ensure a good night's sleep. During the experiment, the environment was kept quiet and 3D driving school software was used to simulate the actual driving process. Subjects used the corresponding buttons to operate and adjust the driving state according to the road conditions and scenes displayed on the monitor, and video was recorded synchronously while the EEG signals were collected. At the end of the experiment, the fatigue state of the subjects in each period was determined from the correct rate of the button operations and the video information, and used to label the fatigue degree of the collected EEG signals. The algorithm and model address binary (two-class) classification problems; therefore, only the EEG data recorded in the fully awake and fully fatigued states were selected for numerical calculation.

EEG signals were collected in accordance with the international 10-20 system standard. Electrodes were placed over the frontal, central and occipital areas of the brain, i.e., 16 channels of EEG data were collected at the positions Fp1, Fp2, F3, Fz, F4, C3, Cz, C4, P7, P3, Pz, P4, P8, O1, Oz and O2. The sampling rate is 512 Hz and the filter pass band is 0.5 to 30 Hz. In the following EEG feature calculation, the interval is 2 s and the data length is 4 s; that is, adjacent data segments overlap by 2 s.
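The segmentation described above (4 s segments taken every 2 s at 512 Hz) can be sketched as follows. The function name and the synthetic 10 s signal are illustrative assumptions.

```python
def sliding_windows(signal, fs=512, length_s=4.0, step_s=2.0):
    """Split one channel into fixed-length segments that start every step_s seconds."""
    size = int(length_s * fs)   # 2048 samples per segment at 512 Hz
    step = int(step_s * fs)     # 1024 samples between segment starts
    return [signal[start:start + size]
            for start in range(0, len(signal) - size + 1, step)]

# Usage on a synthetic 10 s recording (sample index used as the value).
recording = list(range(10 * 512))
segments = sliding_windows(recording)
```

Each segment shares its last 2 s with the first 2 s of the next one, so a 10 s recording yields four segments.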

For each participant's 16 channels of EEG signal data, the candidate features are composed of the following 13 features of each channel: the total average energy (E), the 4 rhythm energies (δ, θ, α, β), the 4 normalized rhythm energies (δ/E, θ/E, α/E, β/E) and the 4 rhythm energy ratios (θ/β, (θ+α)/(β+α), (θ+α)/β, β/α). With 16 channels in total, this gives 16 × 13 = 208 candidate features. In the calculation, the candidate feature data of each subject are divided into two roughly equal groups, each containing data for both the awake and fatigue states; one group is used for feature selection and SVM modeling, and the other for model checking or testing. For each subject, the above rough set feature selection algorithm is run with three compatibility thresholds: 0.8, 0.9 and 1.
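A short sketch shows how the 208 candidate features could be assembled for one data segment. The total energy E is passed in rather than derived, since its exact definition is not given here; the function names, the feature labels and the toy energy values are hypothetical.

```python
CHANNELS = ["Fp1", "Fp2", "F3", "Fz", "F4", "C3", "Cz", "C4",
            "P7", "P3", "Pz", "P4", "P8", "O1", "Oz", "O2"]

def channel_features(total, delta, theta, alpha, beta):
    """The 13 candidate features of one channel, keyed by a readable label."""
    return {
        "E": total,
        "delta": delta, "theta": theta, "alpha": alpha, "beta": beta,
        "delta/E": delta / total, "theta/E": theta / total,
        "alpha/E": alpha / total, "beta/E": beta / total,
        "theta/beta": theta / beta,
        "(theta+alpha)/(beta+alpha)": (theta + alpha) / (beta + alpha),
        "(theta+alpha)/beta": (theta + alpha) / beta,
        "beta/alpha": beta / alpha,
    }

def candidate_vector(energies):
    """energies maps channel name -> (E, delta, theta, alpha, beta)."""
    features = {}
    for ch in CHANNELS:
        for name, value in channel_features(*energies[ch]).items():
            features[f"{ch}:{name}"] = value
    return features

# Toy usage: identical made-up energies on every channel.
toy_energies = {ch: (10.0, 4.0, 2.0, 3.0, 1.0) for ch in CHANNELS}
vector = candidate_vector(toy_energies)
```

Keeping the channel name in each key preserves the link between a selected feature and its electrode, which the rough set selection relies on.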

This study combines two different video viewing methods, traditional flat video and virtual reality, and the design considers 30 EEG subjects. The power spectrum energy changes before and after brain fatigue are compared and combined with the subjective fatigue feelings of the subjects to analyze the brain fatigue problems caused by watching virtual reality videos.

The value of C_P and the corresponding number of selected features are shown in Table 1. It is observed that when the C_P value is greater than 0.8, the contribution of each newly added feature to the increase of C_P drops rapidly. Therefore, to improve the generalization ability of the fatigue recognition model in application, the compatibility threshold is set to 0.8, so that only a few channel features are required to build the model. The channels and feature components corresponding to the features selected with a C_P threshold of 0.8 are shown in Table 2.

For comparison, the same data were processed by compressing the feature space dimension with principal component analysis (PCA), as used in the current literature. The numbers of features retained under three cumulative percentage-of-variance thresholds are listed in Table 3.
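The PCA criterion used for this comparison can be sketched as counting how many principal components are needed to reach a given cumulative share of variance. The function name and the eigenvalues below are made-up illustrative values, not data from this study.

```python
def components_for_threshold(eigenvalues, threshold):
    """Smallest number of leading components whose variance share reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(ordered)

# Toy usage: covariance eigenvalues of a hypothetical 4-feature data set.
toy_eigenvalues = [1.0, 5.0, 3.0, 1.0]
n_80 = components_for_threshold(toy_eigenvalues, 0.8)
n_90 = components_for_threshold(toy_eigenvalues, 0.9)
n_100 = components_for_threshold(toy_eigenvalues, 1.0)
```

Each retained component is a linear mix of all input features, which is why this count cannot be translated into a reduced set of electrodes.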

Comparing Tables 1 and 3, it is observed that the number of features selected by the above rough set discretization algorithm is much smaller than the number of features retained by PCA compression. At the same time, the selected features identify the corresponding electrode positions; that is, only the EEG signals at these electrodes need to be acquired. The features obtained after PCA compression cannot be mapped to specific electrode positions, and the dimensionality of the feature space is compressed only at the cost of losing some information.

Furthermore, the statistical differences between traditional flat video and virtual reality video in the different wavebands and EEG signal parameters at each electrode position indicate that, compared with watching traditional flat video, watching virtual reality video affects the right frontal lobe area, the left temporal lobe area and the right occipital lobe area more, of which the left temporal lobe area is the most obvious. Therefore, the decrease in the relative energy of the alpha band, the increase in the relative energy of the delta band and the decrease in the center of gravity frequency in the temporal lobe area are used to analyze the impact of virtual reality video on human fatigue.

Next, the EEG signal data that change with time while watching traditional flat video and virtual reality video are compared and analyzed. It is seen that watching virtual reality video affects the α and β bands of the EEG signal more. The electrode positions at which these two bands change significantly further indicate that the right prefrontal lobe, the left and right posterior temporal lobe areas, and the parietal lobe area to the occipital area are more affected while watching virtual reality videos. In these areas, the relative energy of the α wave fluctuates and decreases with time, and the relative energy of the β wave first increases and then decreases with time. These trends are used to analyze the effect of viewing time on fatigue.

In this study, by comparing the differences in the EEG signal data at each electrode position between watching traditional flat video and virtual reality video, it is concluded that watching virtual reality video affects the frontal, temporal and occipital areas more. The occipital lobe is the visual processing area of the brain, the temporal lobe is the auditory center of the brain, and the frontal lobe is the main brain area for cognitive functions such as memory, analysis, thinking, and judgment. This shows that, compared with watching traditional flat videos, watching virtual reality videos requires more related brain areas to participate in spatial image and sound localization and in the processing of spatial visual information. Because watching virtual reality videos requires the left and right eyes to view separate images, the three-dimensional image is synthesized by the brain. Watching virtual reality videos therefore requires more visual information to be processed; the brain functions are more active and the load is greater, and when the brain is in a high load state for a long time, it is more prone to fatigue. In [15], it is proposed that more energy is needed for spatial visual information processing when watching 3D video, and that long-term viewing increases the burden on the relevant brain areas, leading to brain fatigue. The possible physiological mechanism by which watching virtual reality videos is more fatiguing than watching traditional flat videos is therefore that the amount of information to be processed is larger and richer, and the activity of brain neurons is intense. The rapid and continuous discharge of brain neurons leads to high-frequency, fast EEG signals. However, long-term discharge of neurons increases brain energy consumption. Excessive consumption of energy inhibits the discharge of neurons, and the EEG signals tend to become flat, which corresponds to the fatigue state.

The features retained after the feature selection algorithm are used to establish a support vector machine fatigue detection model. In order to study the influence of the risk control parameter r in the proposed model on the modeling, models were built for several values of the parameter: r = 0, 0.2, 0.4, 0.6.

The model parameters, i.e., the learning parameters C and γ in formula (12), are determined by cross-validation. The support vector machine decision functions are established using the features in Table 2 that correspond to the data of the five subjects. To test the performance of these decision functions, a decision function value y ≥ 0 is used as the criterion for the awake state and y < 0 as the criterion for fatigue. The changes in the awake and fatigue recognition error rates for different risk control parameters r are calculated, as shown in Figs. 2, 3. It is seen that as the value of r increases from 0 to 1, the risk of the decision function misjudging the fatigue state as awake does indeed decrease.
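The test criterion above can be sketched as a per-class error-rate calculation. The function name and the decision values are made-up illustrative inputs; labels are +1 for awake and −1 for fatigue, and both classes are assumed to be present in the test set.

```python
def error_rates(decision_values, labels):
    """Return (awake_error, fatigue_error) using y >= 0 -> awake, y < 0 -> fatigue."""
    predicted = [1 if y >= 0 else -1 for y in decision_values]

    def rate(cls):
        # Share of samples of class cls that received the other label.
        idx = [i for i, t in enumerate(labels) if t == cls]
        wrong = sum(1 for i in idx if predicted[i] != cls)
        return wrong / len(idx)

    return rate(1), rate(-1)

# Toy usage: three awake and two fatigue test samples.
toy_decisions = [0.5, -0.2, 0.0, -1.3, 0.4]
toy_truth = [1, 1, 1, -1, -1]
awake_error, fatigue_error = error_rates(toy_decisions, toy_truth)
```

Reporting the two rates separately makes the effect of r visible: the fatigue error is the share of fatigue samples judged awake, which is the risk the control parameter is meant to reduce.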

In the study of fatigue driving recognition based on EEG signal feature detection, it is very meaningful to select as few EEG electrodes as possible together with effective scalp electrode positions and features. The author proposes a feature variable selection algorithm based on the rough set discretization algorithm. The calculation on the data of 5 subjects shows that when the compatibility threshold is 0.8, the number of features selected is 2 to 4, whereas the number of principal components selected by the PCA method when the cumulative percentage of variance is 80% is 7 to 9, higher than with this method. For the EEG signal data of different subjects, the selected features differ to some extent. The simulation over values of the risk preference control parameter r shows that the support vector machine model with risk preference control can adjust the probability of misjudging the fatigue state as awake. In general, for a given degree of compatibility, the number of EEG signal feature variables selected by the rough set discretization algorithm is minimal, and the SVM model established on this basis can achieve a high recognition rate. Since the features obtained from the sample data that meet the same degree of compatibility are generally not unique, this algorithm obtains only one minimal solution. How to obtain the solution most effective for modeling quality is a problem that needs further research. In addition, the fatigue of a person driving a car develops gradually, so the change in alertness characterized by EEG signals should also be gradual. The present method can only solve two-class classification problems, so it has certain limitations. How to extract EEG signal features that capture continuous changes in alertness needs further research.

Traffic accidents involving fatigue driving and their extent of casualties

Intrinsic connectivity networks, alpha oscillations, and tonic alertness: a simultaneous electroencephalography/functional magnetic resonance imaging study

A real-time wireless brain-computer interface system for drowsiness detection

EEG signal analysis for the assessment and quantification of driver's fatigue

Drowsiness monitoring based on driver and driving data fusion

Driver drowsiness classification using fuzzy wavelet-packet-based feature-extraction algorithm

Adaptive EEG-based alertness estimation system by using ICA-based fuzzy neural networks

Detecting frontal EEG activities with forehead electrodes

Detecting behavioral microsleeps from EEG power spectra

Driver's cognitive state classification toward brain computer interface via using a generalized and supervised technology

EEG-based vigilance analysis by using fisher score and PCA algorithm

Dynamic clustering for vigilance analysis based on EEG

Off-line and on-line vigilance estimation based on linear dynamical system and manifold learning

Vigilance analysis based on fractal features of EEG signals

A comparison of parallel large-scale knowledge acquisition using rough set theory on different MapReduce runtime systems

Feature selection in mixed data: A method using a novel fuzzy rough set-based information entropy