key: cord-0044815-l4b4bjyz authors: Guimarães, Augusto Junio; de Campos Souza, Paulo Vitor; Lughofer, Edwin title: Hybrid Model for Parkinson’s Disease Prediction date: 2020-05-15 journal: Information Processing and Management of Uncertainty in Knowledge-Based Systems DOI: 10.1007/978-3-030-50143-3_49 sha: cc7406dc188ddb26d0414b1e249ae2cf971e0a0c doc_id: 44815 cord_uid: l4b4bjyz

Parkinson’s is a chronic, progressive neurological disease of unknown cause that affects the central nervous system of older people and compromises their movement. Because the disorder can impair many aspects of daily life, identifying it early helps in choosing treatments that reduce its impact on the patient’s routine. This work aims to identify traces of Parkinson’s in a database of replicated voice recordings by applying a fuzzy neural network that recognizes their patterns and enables the extraction of knowledge about the situations present in the data collected from patients. The results obtained by the hybrid model were superior to the state of the art for this problem, showing that hybrid models can both extract knowledge and classify the behavioral patterns of Parkinson’s disease with high accuracy.

Parkinson's disease (PD) [12] progressively affects specific areas of the central nervous system, which comprises the brain and the spinal cord. It is caused by the intense loss of nerve cells in a part of the basal ganglia known as the substantia nigra (the "black substance"), located in a small region of the brain mass. The neurotransmitters produced by these cells allow the body's voluntary movements to be carried out automatically, that is, movements that require no conscious thought, which the muscles perform thanks to the presence of this substance in the brain. Dopamine is one of the main neurotransmitters in the basal ganglia, and its primary function is to intensify nerve impulses to the muscles [12].
In its absence, the individual loses control of movement, which causes the characteristic signs and symptoms of the disease. The initial phase presents subtle symptoms, such as changes in speech, as well as more noticeable ones, such as extensive, rhythmic tremors in the hands and stiffness in the muscles and joints that makes movement difficult; balance and posture are also gradually compromised. PD is the second most common degenerative disease of the central nervous system, after Alzheimer's disease [6], so an early diagnosis can significantly improve the patient's quality of life and help control the progression of PD.

To diagnose patients efficiently and less invasively, methods that evaluate voice recordings are used to perform acoustic analysis of speech signals and tones and thereby identify various diseases [18, 21, 29, 33, 38]. Many works consider this recording technique useful for building intelligent systems or smart approaches to diagnose the disease, and these approaches can discriminate healthy people from people with PD [2, 3, 7, 29, 30, 36, 41]. Such systems support specialists in reaching the correct diagnosis, reducing the risk of errors and of false-positive diagnoses while taking into account the time the patient spends in medical follow-up. This classification is made possible by the database made available by Naranjo et al. [28], which contains 240 voice recordings from people with PD and from healthy people.

This work proposes the use of a hybrid model, a Fuzzy Neural Network (FNN), to extract fuzzy rules while maintaining high accuracy; this interpretability allows knowledge to be extracted from the database. The WEKA tool [14] provides implementations of the main machine learning techniques, which makes it possible to compare the results of the FNN with them.
In addition to this introduction, the article is organized as follows. Sect. 2 presents related works and the main concepts that guide this research. Sect. 3 presents the concepts of fuzzy neural networks and the architecture used in this paper. Sect. 4 presents the Parkinson's detection tests, and Sect. 5 presents the conclusions of the paper and future works that may extend it.

The cause of PD is considered uncertain, with more than one factor, genetic or environmental, involved in triggering the disease. Studies point to the abnormal accumulation of synuclein, a protein in the brain that helps nerve cells communicate. The resulting protein aggregates, called Lewy bodies, can accumulate in various regions of the brain, mainly in the substantia nigra, and consequently interfere with brain function. An early diagnosis reduces the impact on quality of life through monitoring and treatment by specialists [39]. Medication and physical activity are the main resources for slowing the progression of PD.

The treatment and early identification of Parkinson's disease have motivated several researchers to produce notable academic works. Naranjo et al. [28], who developed the database used in this paper, obtained excellent results using Bayesian models, and their study inspired other proposals and approaches based on the same database. A summary of these works is listed in Table 1.

Fuzzy neural networks are an example of hybrid models that combine the interpretability provided by fuzzy systems with the generalization ability provided by artificial neural networks [24].
These models can be seen as the union of a fuzzy inference system and a neural aggregation network, and they have been applied to problems of different natures, such as software engineering [8], astronomy [10], and time series prediction [9, 34]. They have also worked efficiently enough to become a reference for problems related to health and human behavior, such as immunotherapy [20], breast cancer [32], EEG-based alertness estimation [23], and autism detection [13], and they have helped to detect cognitive and motor problems in children and adolescents [35].

The FNN model presented in this paper has three layers and combines a fuzzification technique based on data density [17], training that follows the concepts of the Extreme Learning Machine [16], and pattern classification by a singleton neuron with a ReLU-type activation function [27]. Its architecture is shown in Fig. 1, and its layers and training methodology are explained below.

The first layer of the model is responsible for the fuzzification process and forms the Gaussian fuzzy neurons that make up the model's input structure. All inputs are fuzzified with a technique that clusters the data according to their density. The neurons formed in this layer therefore represent data clusters, which yields a more compact FNN architecture with neurons that are more significant to the problem. The clustering technique proposed by Hyde and Angelov [17] autonomously determines the clusters formed by the data, which allows a simple fuzzification process guided by the behavior of the problem data. It relies on concepts such as data distribution, data density, radii, and distances between reference points. The data density-based clustering (DDC) algorithm starts from the mean of all the data to be evaluated.
This mean is computed recursively, which reduces the complexity of the fuzzification approach [17] (Eq. 1), where $x_i$ is a sample of the problem and N is the number of samples [17]. The next step computes the sample density, also recursively [17] (Eq. 2). Eqs. 1 and 2 provide the fundamental calculations for building the clusters. The resulting density is likewise computed recursively [17]. Finally, the centers found in the grouping procedure are adjusted with an update rule [17] in which α is a learning parameter that can be set by the user or derived from the data. In this paper, the second option was chosen so that the clustering is autonomous and driven by the data themselves.

In short, the clustering algorithm in this paper takes the sample with the highest density as a center and, from it, groups the adjacent samples. The resulting C cluster centers are used to build the Gaussian neurons of the first layer, from the final centers found by the DDC approach and a user-defined sigma value. For each input variable $x_{ij}$, L neurons $A_{lj}$, l = 1, ..., L, are defined, whose activation functions are the membership functions of the corresponding neurons, with ϑ denoting the membership degree. The number of Gaussian neurons in the first layer equals the number of centers found by the fuzzification technique.

The second layer is composed of fuzzy neurons capable of expressing knowledge about the database through IF-THEN rules.
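The density-based fuzzification steps above can be illustrated with a short Python sketch. Because the equations of [17] are not reproduced here, the sketch rests on labeled assumptions: it uses the Cauchy-type density of recursive density estimation, $D(x) = 1 / (1 + \lVert x - \mu \rVert^2 + \Sigma - \lVert \mu \rVert^2)$, where μ is the mean of the samples and Σ the mean of their squared norms; it replaces the α-based center update with a single step of size `alpha` toward the cluster mean; and it assumes Gaussian membership functions $\exp(-(x - c)^2 / (2\sigma^2))$. The helper names (`recursive_stats`, `density`, `ddc_centers`, `gaussian_layer`) are hypothetical, and the code is not the authors' implementation.

```python
import numpy as np


def recursive_stats(X):
    """Recursively compute the mean and the mean squared norm of the samples.

    Updating the statistics one sample at a time gives the same result as the
    batch formulas, and the same update applies when samples arrive one by one.
    """
    mu = np.zeros(X.shape[1])
    sq = 0.0
    for k, x in enumerate(X, start=1):
        mu = (k - 1) / k * mu + x / k             # running mean
        sq = (k - 1) / k * sq + float(x @ x) / k  # running mean of ||x||^2
    return mu, sq


def density(X, mu, sq):
    """Cauchy-type density of every sample (assumed form, see lead-in)."""
    dist2 = np.sum((X - mu) ** 2, axis=1)
    return 1.0 / (1.0 + dist2 + sq - float(mu @ mu))


def ddc_centers(X, radius, alpha=1.0):
    """Simplified greedy density-based clustering (hypothetical helper).

    The densest remaining sample becomes a seed, samples within `radius`
    of the seed form its cluster, the center moves a fraction `alpha`
    toward the cluster mean (stand-in for the paper's update rule), and
    the clustered samples are removed before the next round.
    """
    remaining = np.asarray(X, dtype=float)
    centers = []
    while len(remaining) > 0:
        mu, sq = recursive_stats(remaining)
        seed = remaining[np.argmax(density(remaining, mu, sq))]
        members = np.linalg.norm(remaining - seed, axis=1) <= radius  # seed included
        centers.append(seed + alpha * (remaining[members].mean(axis=0) - seed))
        remaining = remaining[~members]
    return np.array(centers)


def gaussian_layer(X, centers, sigma):
    """Membership of feature j of sample i in Gaussian neuron l, shape (N, L, d)."""
    diff = X[:, None, :] - centers[None, :, :]
    return np.exp(-(diff ** 2) / (2.0 * sigma ** 2))


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
    C = ddc_centers(X, radius=1.0)
    A = gaussian_layer(X, C, sigma=1.9)
    print(C.shape, A.shape)  # (2, 2) (6, 2, 2)
```

For two well-separated groups of points, the sketch returns one center per group, and the (N, L, d) membership array is what the second layer aggregates into rules.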
The second-layer fuzzy logical neurons are constructed by aggregating the Gaussian neurons of the first layer. The calculations use operators such as a t-norm (product) and an s-norm (probabilistic sum) [5], combined in an aggregation operator that, in this paper, is the uninorm [40]. The uninorm [40] allows t-norm and s-norm functions to be used in the same context: depending on its identity element, the operator computes either with the product or with the probabilistic sum. This makes it possible to build more interpretable, contextual rules that represent the problem domain clearly and precisely. In the uninorm used in this paper, T is a t-norm, S is an s-norm, and o is the identity element of the fuzzy neuron, randomly set in [0, 1].

For the uninorm to act on fuzzy neurons, two preliminary steps are required. The first step combines each neuron input $a_i$ (in this paper, the output of a first-layer Gaussian neuron) with a weight $w_i$, drawn at random from [0, 1], into a single value (Eq. 7), for i = 1, ..., L. The second step aggregates all the values produced by the first step with the uninorm. The transformation used in the first step is

$p(w, a, o) = w a + \bar{w} o$ (8)

where $\bar{w} = 1 - w$ is the complement of w. The fuzzy neuron z that uses the uninorm to aggregate the weighted first-layer neurons is called a unineuron [22]. The z neurons aggregate the first layer to capture knowledge about the database and can therefore be interpreted as a set of IF-THEN fuzzy rules. The concept of certainty can also be extended to the rule consequent in Eq. 10. In binary problems, Eq. 11 can be seen as an auxiliary step for interpreting the fuzzy rules.
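The unineuron described above can be sketched in plain Python. The uninorm form is an assumption, since the paper's own definition is not reproduced here: it uses the common construction that rescales the product t-norm below the identity element o, rescales the probabilistic sum above it, and returns the minimum when the two arguments lie on opposite sides of o. `p` follows Eq. 8; `uninorm`, `unineuron`, and `rule_layer` are hypothetical names.

```python
from functools import reduce


def t_norm(a, b):
    """Product t-norm."""
    return a * b


def s_norm(a, b):
    """Probabilistic-sum s-norm."""
    return a + b - a * b


def uninorm(a, b, o):
    """Uninorm with identity element o, 0 < o < 1 (assumed construction)."""
    if a <= o and b <= o:   # both below o: rescaled product (AND-like)
        return o * t_norm(a / o, b / o)
    if a >= o and b >= o:   # both above o: rescaled probabilistic sum (OR-like)
        return o + (1 - o) * s_norm((a - o) / (1 - o), (b - o) / (1 - o))
    return min(a, b)        # one argument on each side of o: minimum (assumption)


def p(w, a, o):
    """Eq. 8: p(w, a, o) = w*a + w_bar*o, with w_bar = 1 - w."""
    return w * a + (1 - w) * o


def unineuron(a, w, o):
    """First step: weight each input with p; second step: fold the results with the uninorm."""
    if not 0 < o < 1:
        raise ValueError("the identity element o must lie strictly between 0 and 1")
    weighted = [p(wi, ai, o) for wi, ai in zip(w, a)]
    return reduce(lambda x, y: uninorm(x, y, o), weighted)


def rule_layer(A, W, o):
    """z[i][l]: unineuron over the memberships A[i][l] with the weights W[l] (hypothetical helper)."""
    return [[unineuron(a_il, w_l, o) for a_il, w_l in zip(a_i, W)] for a_i in A]


if __name__ == "__main__":
    print(unineuron([0.2, 0.9], [1.0, 0.0], 0.5))  # 0.2
```

Because p(0, a, o) = o and o is the identity of the uninorm, an input whose weight is 0 has no effect on the rule, which is why the example above returns 0.2, the value of the only weighted input.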
As a hypothetical example of certainty in a rule consequent, a rule whose consequent is Class 0 with certainty 0.8 has certainty 0.8 for Class 0 and 0.2 for Class 1. Because the z neurons are built from the fuzzification process and the density-based centers, the first two layers of the model can be seen as a fuzzy inference system based on the density of the evaluated data. The values v in the rules are defined by training the model and represent the weights that connect the fuzzy inference system to the third layer; v can also be seen as the weight of each rule in the context of the expected output.

The third layer of the model merges the fuzzy rules and provides the answers to the problem. It consists of a single artificial neuron (which can also be seen as a singleton) that receives the z neurons of the second layer as inputs and combines them with a set of weights v obtained analytically. The activation function of this neuron, responsible for computing the final responses of the model, is the leaky ReLU [25]. The output of the third-layer neuron is therefore given by Eq. 12, where ω is the leaky ReLU activation function and sign(·) is the signum function. The leaky ReLU introduces a factor β so that negative inputs still produce a small nonzero output, which prevents neurons from being discarded during the analysis of the problem. It is expressed by [25]:

$$\omega(s) = \begin{cases} s, & s > 0 \\ \beta s, & s \le 0 \end{cases} \quad (13)$$

where, in this paper, β = 0.001. The process carried out in the third layer can also be seen as the defuzzification stage of the model, which turns the fuzzy relations back into crisp values.

The training of the model is based on the concepts of the Extreme Learning Machine, in which the parameters of the hidden layer are defined randomly and the weights of the neural aggregation network (third layer) are calculated analytically through the Moore-Penrose pseudoinverse.
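A minimal NumPy sketch of the third layer and of the ELM-style training follows. It assumes that the rule activations are stacked in an N × L matrix Z, that the targets are coded as -1/+1 to match the sign function, and that the leaky ReLU is applied to the weighted sum of the rule activations before the sign; the exact form of the paper's output equation is not reproduced here, so that last point is an assumption. `np.linalg.pinv` computes the Moore-Penrose pseudoinverse; the function names are hypothetical.

```python
import numpy as np


def leaky_relu(s, beta=0.001):
    """Eq. 13: s where s > 0, beta * s elsewhere (beta = 0.001 in the paper)."""
    return np.where(s > 0, s, beta * s)


def train_output_weights(Z, y):
    """ELM-style training, Eq. 14: v = Z^+ y with the Moore-Penrose pseudoinverse."""
    return np.linalg.pinv(Z) @ y


def predict(Z, v, beta=0.001):
    """Third layer (assumed form): weighted sum of rule activations, leaky ReLU, then sign."""
    return np.sign(leaky_relu(Z @ v, beta))


if __name__ == "__main__":
    # Toy rule activations: rows are samples, columns are rules (z neurons).
    Z = np.array([[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])  # classes coded as +1 / -1
    v = train_output_weights(Z, y)
    print(v, predict(Z, v))
```

Only v is solved for, so training reduces to a single pseudoinverse, which is what makes ELM-style training fast; for a Z with full column rank, a least-squares solver such as `np.linalg.lstsq` gives the same v.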
Following the ELM approach, the weights that connect the fuzzy inference system to the neural aggregation network are obtained by

$v = Z^{+} y$ (14)

where $Z^{+}$ is the Moore-Penrose pseudoinverse of Z, which, when Z has full column rank, is given by

$Z^{+} = (Z^{T} Z)^{-1} Z^{T}$ (15)

The proposed model only needs the initial radii value for the fuzzification process and the sigma value of the first-layer neurons; the other values are defined by the training algorithm. This keeps the model simple and easy to adapt to the Parkinson's problem.

The fuzzification technique can generate redundant neurons. To avoid this problem, a resampling technique linked to the LARS [11] algorithm is used to select the best neurons, which are consequently the best rules. This technique, called bolasso [4], was proposed by Bach and has been widely used in hybrid models for this purpose. It runs LARS on several bootstrap replications of the data and assesses the relevance of each fuzzy rule to the model's expected outputs, which defines a subgroup of candidate fuzzy rules. Each replication may select a different number of candidate rules, and a consensus threshold defines the final selection. For example, if in 16 replications four rules are among the most significant in 50% of the replications (that is, 8), they are selected to compose the final model. This selection is made before the weights are computed, which guarantees that the weights connecting the fuzzy inference system to the model's output layer are generated only from the rules that are significant for the problem.

To verify the capacity of the proposed model, standard classification tests were performed with the database made available by Naranjo et al. [28]. The database originally has 48 features; for these experiments, the ID dimension, which identifies the people in the experiment, was discarded.
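The bolasso rule selection described earlier in this section can be sketched as follows. To keep the example dependency-free, a small cyclic coordinate-descent lasso stands in for LARS; both solve the same lasso problem, so at a given penalty the selected supports should agree. The penalty `lam`, the helper names, and the synthetic data are assumptions for illustration. Each bootstrap replication fits the lasso to the rule activations Z and records which rules receive nonzero weights; a rule is kept when it is selected in at least `threshold` × `n_boot` replications. With the settings used in the experiments (16 replications, 60% threshold), a rule must be selected in at least 10 replications, since 0.6 × 16 = 9.6.

```python
import numpy as np


def lasso_cd(Z, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent (stand-in for LARS).

    Minimizes (1 / 2n) * ||y - Z b||^2 + lam * ||b||_1 on centered data,
    so centering plays the role of an unpenalized intercept.
    """
    Z = Z - Z.mean(axis=0)
    y = y - y.mean()
    n, p = Z.shape
    col_sq = (Z ** 2).sum(axis=0) / n
    b = np.zeros(p)
    r = y.copy()                      # residual y - Z b, with b = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:      # constant column: leave its weight at zero
                continue
            r += Z[:, j] * b[j]       # remove rule j from the fit
            rho = Z[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft threshold
            r -= Z[:, j] * b[j]       # put rule j back with its new weight
    return b


def bolasso(Z, y, lam, n_boot=16, threshold=0.6, seed=0):
    """Return the rules (columns of Z) selected in >= threshold * n_boot bootstrap fits."""
    rng = np.random.default_rng(seed)
    n, p = Z.shape
    counts = np.zeros(p, dtype=int)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # bootstrap replication (with replacement)
        b = lasso_cd(Z[idx], y[idx], lam)
        counts += np.abs(b) > 1e-10           # rules with nonzero weight in this fit
    return np.flatnonzero(counts >= threshold * n_boot), counts


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Z = rng.uniform(size=(200, 5))            # synthetic rule activations
    y = 2.0 * Z[:, 0] - 1.5 * Z[:, 2] + 0.05 * rng.normal(size=200)
    selected, counts = bolasso(Z, y, lam=0.05)
    print(selected, counts)
```

In this synthetic setting only the first and third rules drive the output, so they are selected in every replication while the others are discarded; the surviving rules are then the only columns passed to the pseudoinverse step.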
The database contains 240 records and is balanced: 120 recordings come from people diagnosed with Parkinson's disease and 120 from people without the disease. Figure 2 shows some of the dimensions of the database, with healthy people in blue and people with Parkinson's in red. To assess the model's ability to identify people with Parkinson's disease, the dataset was split at random into 70% for training the model and 30% for evaluating the results, and all samples were normalized. To reduce the influence of any particular split on the results, 30 repetitions were performed for each model in the test.

Because the database has binary outputs, the model can be evaluated with criteria that are well known in the academic community. They measure the model's ability to produce the correct diagnosis while accounting for false positives and false negatives: accuracy $= \frac{TP + TN}{TP + TN + FP + FN}$, sensitivity $= \frac{TP}{TP + FN}$, and specificity $= \frac{TN}{TN + FP}$, together with the area under the ROC curve (AUC), where TP = true positives, TN = true negatives, FN = false negatives, and FP = false positives.

The proposed model used σ = 1.9 and radii = 0.08, chosen by a preliminary 10-fold cross-validation over σ ∈ {1.5, 1.6, 1.7, 1.8, 1.9, 2.0} and radii ∈ {0.05, 0.06, 0.07, 0.08, 0.09}. For bolasso, 16 replications and a consensus threshold of 60% were used, values defined after cross-validation and 10-fold tests on the training data.

The other models used in the test are:
SVM - Support Vector Machine [37]: finds a hyperplane in an N-dimensional space that distinctly classifies the data points.
MLP - Multilayer Perceptron [26]: trained with backpropagation and with one hidden layer.
NB - Naive Bayes [19]: a probabilistic classifier based on Bayes' theorem.
C4.5 - C4.5 decision tree [31]: generates a pruned or unpruned C4.5 decision tree.
RNT - Random Tree [1]: builds a tree that considers K randomly chosen attributes at each node.

The values in parentheses are standard deviations. Simulations were performed on a Core(TM) 2 Duo CPU at 2.27 GHz with 3 GB of RAM. Time is the sum of training and test time (in seconds) for each model, and Neurons is the number of the most representative neurons retained after pruning or regularization.

As Table 2 shows, the proposed model achieved a higher overall accuracy than the other models in the test and also surpassed the results originally reported for the dataset. All results were evaluated with an analysis of variance (ANOVA) [15]: at a 95% confidence level, the results marked with '*' are statistically equivalent with respect to the performance factors collected for the models analyzed. None of the ANOVA assumptions (normality of residuals, homoscedasticity, and independence) was violated. These results support the efficiency of fuzzy neural networks for this problem. Initially, about 46 fuzzy rules were generated; after regularization, 5 to 6 rules remained in each of the 30 executions performed in the tests. The Random Tree algorithm obtained results very close to those of the hybrid model and also stood out in AUC and specificity. The proposed model achieved the best sensitivity, that is, the ability of the diagnostic test to detect truly positive individuals (to correctly diagnose patients with Parkinson's disease), so among the models tested it is the best at identifying people who have the disease. The following fuzzy rule, which indicates a Parkinson's diagnosis, was extracted from the dataset in an experiment with a final accuracy of 84%:
IF Gender is female with certainty 0.33 AND/OR Jitterrel is very high with certainty 0.90 AND/OR Jitterabs is very small with certainty 0.12 AND/OR JitterRAP is very small with certainty 0.81 AND/OR JitterPPQ is very small with certainty 0.12 AND/OR Shimloc is small with certainty 0.15 AND/OR ShimdB is very high with certainty 0.94 AND/OR ShimAPQ3 is very small with certainty 0.04 AND/OR ShimAPQ5 is very small with certainty 0.74 AND/OR ShiAPQ11 is very small with certainty 0.73 AND/OR HNR05 is very small with certainty 0.72 AND/OR HNR15 is very small with certainty 0.90 AND/OR HNR25 is very small with certainty 0.15 AND/OR HNR35 is very small with certainty 0.83 AND/OR HNR38 is very small with certainty 0.04 AND/OR RPDE is medium with certainty 0.72 AND/OR DFA is medium with certainty 0.07 AND/OR PPE is medium with certainty 0.05 AND/OR GNE is very small with certainty 0.76 AND/OR MFCC0 is medium with certainty 0.87 AND/OR MFCC1 is medium with certainty 0.98 AND/OR MFCC2 is small with certainty 0.65 AND/OR MFCC3 is very small with certainty 0.32 AND/OR MFCC4 is very small with certainty 0.03 AND/OR MFCC5 is high with certainty 0.04 AND/OR MFCC6 is small with certainty 0.65 AND/OR MFCC7 is very high with certainty 0.45 AND/OR MFCC8 is very high with certainty 0.62 AND/OR MFCC9 is very small with certainty 0.05 AND/OR MFCC10 is small with certainty 0.08 AND/OR MFCC11 is very small with certainty 0.14 AND/OR MFCC12 is medium with certainty 0.11 AND/OR Delta0 is medium with certainty 0.10 AND/OR Delta1 is extremely high with certainty 0.64 AND/OR Delta2 is very high with certainty 0.77 AND/OR Delta3 is very high with certainty 0.65 AND/OR Delta4 is very small with certainty 0.75 AND/OR Delta5 is small with certainty 0.66 AND/OR Delta6 is very small with certainty 0.15 AND/OR Delta7 is medium with certainty 0.54 AND/OR Delta8 is medium with certainty 0.02 AND/OR Delta9 is extremely high with certainty 0.32 AND/OR Delta10 is very high with certainty 0.16 AND/OR Delta11
is very high with certainty 0.34 AND/OR Delta12 is extremely high with certainty 0.16 THEN Status is Parkinson with certainty 0.18.

The results of the tests confirm that the fuzzy neural network proposed in this paper can efficiently identify patients with Parkinson's disease. They also show that the model identifies intricate patterns in the database with high accuracy and that the fuzzification technique, driven by the data themselves, builds fuzzy neurons that efficiently represent the characteristics of the problem. This paper encourages new works on rules that are more representative of the Parkinson's problem, as well as the development of comparative techniques that use all dimensions of the problem. Future work is expected to extend the techniques used in hybrid models, for example with new fuzzification, training, and defuzzification techniques.

References:
The continuum random tree
Early diagnosis of Parkinson's disease from multiple voice recordings by simultaneous sample and feature selection
Developing a large scale population screening tool for the assessment of Parkinson's disease using telephone-quality voice
Bolasso: model consistent lasso estimation through the bootstrap
On the distributivity of implication operators over T and S norms
Parkinson disease, dementia, and Alzheimer disease: clinicopathological correlations
Speech disorders in Parkinson's disease: early diagnostics and effects of medication and brain stimulation
Incremental regularized data density-based clustering neural networks to aid in the construction of effort forecasting systems in software development
Regularized fuzzy neural network based on or neuron for time series forecasting
Pulsar detection for wavelets soda and regularized fuzzy neural networks based on and neuron and robust activation function
Least angle regression
Diagnostic criteria for Parkinson disease
A hybrid model based on fuzzy rules to act on the diagnosed of Autism in adults
The WEKA data mining software: an update
The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics
Extreme learning machine: theory and applications
Data density based clustering
A systematic review of depression and mental illness preceding Parkinson's disease
Estimating continuous distributions in Bayesian classifiers
Pruning fuzzy neural network applied to the construction of expert systems to aid in the diagnosis of the treatment of cryotherapy and immunotherapy. Big Data Cogn
Parkinson disease prediction using intrinsic mode function based features from speech signal
New uninorm-based neuron model and fuzzy neural networks
Adaptive EEG-based alertness estimation system by using ICA-based fuzzy neural networks
Neural Fuzzy Systems: A Neuro-fuzzy Synergism to Intelligent Systems
Rectifier nonlinearities improve neural network acoustic models
Rectified linear units improve restricted Boltzmann machines
Addressing voice recording replications for Parkinson's disease detection
A two-stage variable selection and classification approach for Parkinson's disease detection by using voice recording replications
An analytical method for diseases prediction using machine learning techniques
C4.5: programs for machine learning
Using resistin, glucose, age and BMI and pruning fuzzy neural network for the construction of expert systems in the prediction of breast cancer
Intensive voice treatment in Parkinson disease: laryngostroboscopic findings
Ensemble of evolving data clouds and fuzzy models for weather time series prediction
Using hybrid systems in the construction of expert systems in the identification of cognitive and motor problems in children and young people
Detecting Parkinson's disease from sustained phonation and speech signals
The Nature of Statistical Learning Theory
Complexity measures of voice recordings as a discriminative tool for Parkinson's disease
Understanding the molecular causes of Parkinson's disease
Uninorm aggregation operators
Classification of Parkinson's disease utilizing multi-edit nearest-neighbor and ensemble learning algorithms with speech samples

The authors acknowledge the support by the Austrian Science Fund (FWF): contract number P32272-N38, acronym IL-EFS.