key: cord-0060974-jf2i8d0o authors: Wang, Liying; Xu, Zhiqiang title: Hearing Loss Identification via Fractional Fourier Entropy and Direct Acyclic Graph Support Vector Machine date: 2020-06-13 journal: Multimedia Technology and Enhanced Learning DOI: 10.1007/978-3-030-51103-6_24 sha: 7161ee6a23b85fb7307329253e97c135642be5de doc_id: 60974 cord_uid: jf2i8d0o

With digital devices increasingly popular, the risk of hearing loss is higher than before, and it has become more urgent to identify sensorineural hearing loss from changes in internal brain structure. Using a balanced dataset of 180 brain MRI images in three hearing categories, a scheme combining fractional Fourier transform entropy with a direct acyclic graph support vector machine is proposed to extract features and predict the hearing loss category. The experiments show that this scheme is promising when the dataset is not large: the overall accuracy reaches 94.06 ± 1.08%, which is higher than that of several previous traditional machine learning methods.

Hearing Loss (HL) is a general term for decreased auditory sensitivity, increased hearing threshold, hearing impairment and even deafness. Ahead of World Hearing Day (3 March), the World Health Organization (WHO) estimated that nearly 50% of people aged 12-35 years, or 1.1 billion young people, are at risk of hearing loss due to prolonged and excessive exposure to loud sounds, including music they listen to through personal audio devices. It is estimated that by 2050 over 0.9 billion people, or 10% of people, will have disabling hearing loss. Moreover, permanent hearing loss can lead to changes in brain structure and function, such as brain signal deterioration, auditory cortex degeneration, loss of neurons and branches of neurons, and reduction of overall brain volume.
These structural and functional changes may affect the brain's ability to process and perceive sound, and may lead to cognitive decline. They can be captured by Magnetic Resonance Imaging (MRI), but it is still challenging for humans to recognize such slight changes without computer-aided classification. Therefore, this paper focuses on the identification of Sensorineural Hearing Loss (SHL), whose lesions occur in the cochlea, auditory nerve or auditory center.

Since MRI is the key medical modality for examining and analyzing brain structure, researchers have proposed various machine learning algorithms in recent years to identify SHL from MRI data. Because medical datasets are relatively small, directly applying the currently popular deep neural networks is not suitable, and most work relies on traditional machine learning methods. These works mainly address the three sub-problems of a machine learning algorithm: feature extraction, learning model construction and optimization. For example, Refs. [1] [2] [3] extracted image features by applying, respectively, the Fractional Fourier Transform (FRFT) to obtain a 25-dimensional vector, Wavelet Entropy (WE) to obtain a 10-dimensional vector, and FRFT to obtain a 12-dimensional vector. As learning models they adopted a Single-Hidden-Layer Feedforward Neural Network (SHL-FNN) or the statistical model known as the Support Vector Machine (SVM). To search for the optimal parameters of the learning models, they used not only traditional back-propagation but also the Levenberg-Marquardt algorithm or a fitness-scaling adaptive Genetic Algorithm (GA). These methods achieved average overall accuracies of 95%, 95.1% and 95.51%. Their strength is that they used medium-dimensional feature vectors and still obtained acceptably good classification results. Later, Wang et al. used the discrete wavelet transform [4] and the dual-tree wavelet transform [5] to extract entropy features.
The overall classification accuracy then reached 95.31% and 96.17 ± 2.49%, respectively. Ref. [6] used Principal Component Analysis (PCA) and an SVM to classify hearing loss, with an overall accuracy of 95.71%. In addition, Nayak [7] fed stationary wavelet entropy into an SHL-FNN classifier to detect unilateral hearing loss; this outperformed the biorthogonal wavelet transform and gave accuracies of 96.94%, 97.14% and 97.35% for HC, LHL and RHL, respectively. Bao, Nakamura [8] combined Wavelet Entropy (WE) with particle swarm optimization (PSO). Tang, Lee [9] proposed a novel method that combines Tabu search (TS) and PSO. Nayeem [10] used wavelet entropy and a genetic algorithm to detect hearing loss, with sensitivities of 81.25 ± 4.91% for HC, 80.42 ± 5.57% for left-sided hearing loss and 81.67 ± 6.86% for right-sided hearing loss, and an overall accuracy of 81.11 ± 1.34%. Gao, Liu [11] used wavelet entropy and Cat Swarm Optimization (CSO) to identify hearing loss, achieving an overall accuracy of 84.50 ± 0.81%.

From this review, we found that the works with higher accuracy need higher-dimensional feature representations. Meanwhile, optimization algorithms need to be designed carefully to avoid local minima. In addition, most of the above studies are based on the same brain MRI database of 49 images. Since the training dataset is so small, overfitting may be inevitable. To enlarge the dataset, Jia [12] augmented it from 49 images in total to 420 images per category and designed a deep Stacked Sparse Autoencoder to identify images of unilateral hearing loss. Although this method takes longer than the others, its overall accuracy reached 99.5%.
This reveals a trend: deep learning is exciting and promising for future medical classification, although it is less interpretable or even uninterpretable.

To explore and improve the identification of hearing loss on a clinical MRI dataset, we increase the dimension of the extracted features and rearrange the learning model for the classification of unilateral hearing loss. First, Fractional Fourier Entropy is applied to extract time-frequency domain features that can distinguish hearing loss images from healthy hearing images. The resulting 36-dimensional feature vector is then input to the Direct Acyclic Graph Support Vector Machine (DAG-SVM) to predict the category label of an MRI image, that is, whether it shows unilateral hearing loss. The contributions of this paper are: (i) we applied FRFT to extract features from brain images; (ii) we used an advanced SVM to build the classifier; (iii) our system compares favorably with several state-of-the-art approaches. The rest of this paper is organized as follows. Section 2 introduces the methodology and the dataset used in the experiments. Section 3 presents the experiments and results in detail. Section 4 discusses the performance and gives the conclusion.

We use a dataset of 180 MRI images: 60 subjects in each of three categories, namely left-sided hearing loss (LHL), right-sided hearing loss (RHL) and age-, sex- and education-matched healthy controls (HC). The small size of our dataset cannot provide sufficient data for deep learning [13] [14] [15] [16] [17] [18] [19] [20]. The dataset contains the same number of images in each category because a balanced dataset favors the learning model. All subjects gave formal written consent, and the study was approved by the Ethics Committee of Zhongda Hospital, which is affiliated with Southeast University.
To label the hearing loss category, each subject was examined by pure tone audiometry at six octave frequencies (0.25, 0.5, 1, 2, 4 and 8 kHz) to evaluate the Pure Tone Average (PTA). No subject used a hearing aid on the impaired ear during pure tone audiometry. The difference between the PTA and the average of the ISO standard at the corresponding frequencies reflects hearing performance; a difference of more than 20 dB indicates hearing loss. The MRI images are processed through the pipeline below. Distinct slice images from the three categories are shown in Fig. 2.

There are many feature descriptors in the field of signal processing, such as image statistical measures, moments [21, 22], color features [23], wavelet features [24] [25] [26] [27] [28], texture features [29], etc. Fractional Fourier entropy (FRFE) is an effective feature descriptor that can achieve excellent performance. The Fractional Fourier Transform (FRFT) is a generalization of the traditional Fourier Transform (FT), defined in formula (1) with a parameter $a$ in the range $[0, 1]$. When $a = 1$, the FRFT reduces to the traditional FT of the original signal $x$. Writing $a = 2\alpha/\pi$ introduces a new parameter $\alpha$, the rotation angle, whose range is $[0, \pi/2]$. To compute the FRFT, a kernel function $K(t, u, \alpha)$ with three parameters, time $t$, frequency $u$ and rotation angle $\alpha$, is defined as below.

$$F_{\alpha}(u) = \int_{-\infty}^{\infty} K(t, u, \alpha)\, x(t)\, dt \qquad (1)$$

where

$$K(t, u, \alpha) = \begin{cases} \delta(t - u), & \alpha = 0 \\ \sqrt{1 - j\cot\alpha}\; \exp\!\left( j\pi \left( (t^2 + u^2)\cot\alpha - 2ut\csc\alpha \right) \right), & \text{otherwise} \end{cases}$$

and $\delta(t - u)$ is the Dirac delta function at $t - u$.

FRFT has the advantage that it can transform a signal into any intermediate domain between the time domain and the frequency domain, which is called the unified time-frequency domain [30]. Since it was proposed in 1980, FRFT has been applied in many fields, such as signal and image processing, detection, recognition and classification [31].

To extend the one-dimensional FRFT above to two dimensions, two parameters $a_x$ and $a_y$ are introduced to represent the transformation along each dimension. The 2D FRFT is therefore denoted as $F(x, y, u, a_x, a_y) = F(x, y, u, \alpha_x, \alpha_y)$, in which $\alpha_x$ and $\alpha_y$ are the rotation angles along the $x$ and $y$ axes, respectively. In our work the 2D FRFT is calculated in the image space. We choose 6 angles each for $\alpha_x$ and $\alpha_y$ in the range $0$ to $\pi/2$ with a step of $\pi/10$, which is equivalent to setting $a_x$ and $a_y$ from 0 to 1 with a step of 0.2. There are therefore 36 FRFT images for each MRI image. The pixel value of an FRFT image is the modulus of the complex FRFT coefficient. For each of the 36 FRFT images, the FRFT Entropy (FRFE) feature is calculated to represent the spatial frequency spectrum energy of that image, denoted as formula (2). In this study, we use the FRFT programs provided by the computer science department of KU Leuven, which can be downloaded from the website [32]. The 2D FRFT images of an MRI slice of one subject are shown in Fig. 3, and the corresponding FRFE feature values are shown in Table 1. Next, we organize the 36 FRFE values into a vector that represents the key features of the brain MRI of one subject. This vector and its category label are input into the Direct Acyclic Graph support vector machine (DAG-SVM) as training data.

The Support Vector Machine is a binary classification model that combines empirical loss and structural risk minimization. It has been used successfully in many tasks such as text classification and signal prediction [33]. Besides, SVMs have been reported to achieve better performance than other traditional classifiers [34, 35]. For an N-class classification problem, there are two ways to make use of the binary classification model.
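The 6 × 6 grid of transform orders and the 36-dimensional entropy feature vector described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 2D FRFT itself is left to a caller-supplied function (`frft2d_magnitude` is a hypothetical placeholder for a routine such as the programs cited as [32]), and the entropy is assumed to be the Shannon entropy of the normalized magnitudes, which may differ in detail from formula (2).

```python
import math
from itertools import product


def frft_orders(n_steps=5):
    """Orders a = 0, 0.2, ..., 1.0 (six values when n_steps is 5)."""
    return [i / n_steps for i in range(n_steps + 1)]


def order_to_angle(a):
    """Rotation angle alpha = a * pi / 2, so a in [0, 1] maps to [0, pi/2]."""
    return a * math.pi / 2


def shannon_entropy(magnitudes):
    """Entropy in bits of non-negative values normalized to sum to one.

    Assumption: this stands in for the FRFE definition of formula (2).
    """
    total = sum(magnitudes)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for m in magnitudes:
        if m > 0:
            p = m / total
            entropy -= p * math.log2(p)
    return entropy


def frfe_vector(image, frft2d_magnitude):
    """Return the 36 entropy features of one image.

    `frft2d_magnitude(image, a_x, a_y)` is a hypothetical callable that
    returns the modulus of the 2D FRFT as a list of rows.
    """
    orders = frft_orders()
    features = []
    # Every (a_x, a_y) pair gives one FRFT image and one entropy value.
    for a_x, a_y in product(orders, orders):
        magnitude = frft2d_magnitude(image, a_x, a_y)
        pixels = [value for row in magnitude for value in row]
        features.append(shannon_entropy(pixels))
    return features
```

Passing an identity stand-in such as `lambda img, a_x, a_y: img` only demonstrates the shape of the output (36 values per image); a real FRFT routine is needed to obtain meaningful features.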
The first, one-versus-rest, constructs N binary classifiers, each separating one class from the rest. The second, one-versus-one, constructs N(N-1)/2 binary classifiers, each separating one class from one other class. The former can produce errors such as overlapping or unknown labels, while the latter gives a definite label. In our work, we use the latter to construct a three-class classification model, using the Direct Acyclic Graph shown in Fig. 4 to organize three soft-margin binary SVM classifiers. The soft-margin binary SVM is described by formula (3). In that formula, the input data are the FRFE features of the MRI images and the hearing loss labels (HC, LHL or RHL) of the subjects in the dataset, used to train three binary SVM models.

K-fold cross-validation is one of the most common methods for evaluating the performance of a classification model. Especially for a relatively small dataset, it helps guard against overfitting by randomly splitting the dataset into K equal sets and, in turn, choosing one as the test set and the others as the training set. In addition, the accuracy can be estimated. We repeat K-fold cross-validation 10 times to alleviate random effects; more repetitions would increase the computational burden. Because there are 180 images in our dataset, the default 10-fold setting would leave only 18 images in each fold, which is too few. We therefore use 6 folds of 30 images each. Figure 5 shows the indices of the 6-fold cross-validation, which is repeated 10 times with different initializations.

To measure the identification performance of a learning model, the sensitivity of each category and the overall accuracy are used. The sensitivity of a category is the percentage of correct predictions within that category on the test set. After one K-fold cross-validation run, we obtain the predicted category of every test sample. We calculate the sensitivity for the categories LHL, HC and RHL.
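The decision procedure and evaluation protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions: the three pairwise classifiers are passed in as plain callables (hypothetical stand-ins for trained soft-margin SVMs), the order in which the graph compares classes is assumed rather than taken from Fig. 4, and the folds are split at random without stratification because the text does not state whether stratified folds were used. Overall accuracy is computed here as the fraction of all test samples predicted correctly.

```python
import random

CLASSES = ("HC", "LHL", "RHL")


def dag_predict(x, pairwise):
    """Walk a three-class decision DAG and return one definite label.

    `pairwise` maps a class pair such as ("HC", "RHL") to a callable
    that returns the winning label of that pair for the sample `x`.
    """
    candidates = list(CLASSES)
    # Each node compares the first and last remaining candidates and
    # eliminates the loser, so N classes need N - 1 evaluations.
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        winner = pairwise[(first, last)](x)
        if winner == first:
            candidates.pop()       # `last` is eliminated
        else:
            candidates.pop(0)      # `first` is eliminated
    return candidates[0]


def repeated_kfold(n_samples, k=6, repeats=10, seed=0):
    """Yield (run, train_indices, test_indices) for shuffled K-fold runs."""
    rng = random.Random(seed)
    for run in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)       # a different random split per run
        folds = [indices[i::k] for i in range(k)]
        for f in range(k):
            test = folds[f]
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield run, train, test


def sensitivity_and_accuracy(true_labels, predicted_labels):
    """Per-class sensitivity and overall accuracy as fractions in [0, 1]."""
    pairs = list(zip(true_labels, predicted_labels))
    sensitivity = {}
    for c in CLASSES:
        total = sum(1 for t, _ in pairs if t == c)
        correct = sum(1 for t, p in pairs if t == c and p == c)
        sensitivity[c] = correct / total if total else float("nan")
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return sensitivity, accuracy
```

With 180 samples, `repeated_kfold(180)` yields 60 train/test splits (10 runs of 6 folds), each with 150 training and 30 test indices; averaging the metrics within each run and then over the 10 runs gives mean ± standard deviation figures of the kind reported in the paper.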
The overall accuracy is the percentage of correct predictions over all categories of the test set. After 10 runs, the average sensitivity and overall accuracy are calculated. Table 2 shows the pseudo code of the proposed algorithm.

We carry out 10 runs of 6-fold cross-validation on the 180-image MRI dataset to identify the three hearing categories. The confusion matrix of the three categories is used to calculate the sensitivity of each category and the overall accuracy. We report the statistical results on this small dataset and compare them with related traditional machine learning methods. As shown in Table 3, the mean and standard deviation of the sensitivity are 93.83 ± 1.77%, 94.17 ± 2.12% and 94.17 ± 2.75% for HC, LHL and RHL, respectively. The mean and standard deviation of the overall accuracy are 94.06 ± 1.08%.

To validate the effectiveness of our method, we ran experiments with the above-mentioned methods WE+PSO [8], TS-PSO [9] and CSO [11] for comparison. Table 4 and Fig. 6 show that the overall accuracy of our method is up to 8% higher than that of these previous methods. This indicates that the proposed method is more stable than the others. It also shows that FRFE can capture more of the differences in the texture features of brain structure in MRI of sensorineural hearing loss. Traditional machine learning is sufficient to train an acceptable and interpretable model on a small dataset.

This paper proposes a scheme that combines fractional Fourier transform entropy with a direct acyclic graph support vector machine to identify the features of hearing loss in brain MRI.
Based on a balanced dataset of 180 brain MRI images in three hearing categories, the experiments show that the scheme, although it belongs to traditional machine learning, is still a promising direction when the dataset is not large: the overall accuracy reaches 94.06 ± 1.08%, which is higher than that of several previous traditional machine learning methods. In the future, deep learning methods such as transfer learning and autoencoders will be explored to find a more efficient solution that balances time and labor costs.

[Figure: Error bar of approach comparison]

[1] Detection of left-sided and right-sided hearing loss via fractional Fourier transform
[2] Wavelet entropy and directed acyclic graph support vector machine for detection of patients with unilateral hearing loss in MRI scanning
[3] Texture analysis method based on fractional Fourier entropy and fitness-scaling adaptive genetic algorithm for detecting left-sided and right-sided sensorineural hearing loss
[4] Hearing loss detection in medical multimedia data by discrete wavelet packet entropy and single-hidden layer neural network trained by adaptive learning-rate back propagation
[5] Preliminary study on unilateral sensorineural hearing loss identification via dual-tree complex wavelet transform and multinomial logistic regression
[6] Sensorineural hearing loss detection via discrete wavelet transform and principal component analysis combined with generalized eigenvalue proximal support vector machine and Tikhonov regularization
[7] Detection of unilateral hearing loss by Stationary Wavelet Entropy
[8] Hearing Loss via wavelet entropy and particle swarm optimized trained support vector machine
[9] Hearing loss identification via wavelet entropy and combination of Tabu search and particle swarm optimization
[10] Hearing loss detection based on wavelet entropy and genetic algorithm
[11] Hearing loss identification by wavelet entropy and cat swarm optimization
[12] Three-category classification of magnetic resonance hearing loss images based on deep autoencoder
[13] Alcoholism identification based on an AlexNet transfer learning model
[14] Cerebral micro-bleeding detection based on densely connected neural network
[15] Image based fruit category classification by 13-layer deep convolutional neural network and data augmentation
[16] Polarimetric synthetic aperture radar image segmentation by convolutional neural network using graphical processing units
[17] Multiple sclerosis identification by convolutional neural network with dropout and parametric ReLU
[18] Classification of Alzheimer's Disease via Eight-layer convolutional neural network with batch normalization and dropout techniques
[19] Alcoholism identification via convolutional neural network based on parametric ReLU, dropout, and batch normalization
[20] Cerebral micro-bleeding identification based on a nine-layer convolutional neural network with stochastic pooling
[21] Alcoholism detection by medical robots based on Hu moment invariants and predator-prey adaptive-inertia chaotic particle swarm optimization
[22] Pathological brain detection in MRI scanning via Hu moment invariants and machine learning
[23] RGB-D image-based detection of stairs, pedestrian crosswalks and traffic signs
[24] Combination of stationary wavelet transform and kernel support vector machines for pathological brain detection
[25] Application of stationary wavelet entropy in pathological brain detection
[26] Tea category identification based on optimal wavelet entropy and weighted k-Nearest Neighbors algorithm
[27] Multiple sclerosis detection based on biorthogonal wavelet transform, RBF kernel principal component analysis, and logistic regression
[28] Wavelet energy entropy and linear regression classifier for detecting abnormal breasts
[29] A pathological brain detection system based on kernel based ELM
[30] A heuristic neural network structure relying on fuzzy logic for images scoring
[31] Comprehensive survey on fractional Fourier transform
[32] Calculation of the Fractional Fourier Transform
[33] Machine-learning methods for integrated renewable power generation: a comparative study of artificial neural networks, support vector regression, and Gaussian Process Regression
[34] A Feature-free 30-disease pathological brain detection system by linear regression classifier
[35] A gingivitis identification method based on contrast-limited adaptive histogram equalization, gray-level co-occurrence matrix, and extreme learning machine

Acknowledgement. The paper is supported by the educational science plan foundation "in 12th