key: cord-0071306-tu8tthf2
authors: Shoeibi, Afshin; Sadeghi, Delaram; Moridian, Parisa; Ghassemi, Navid; Heras, Jónathan; Alizadehsani, Roohallah; Khadem, Ali; Kong, Yinan; Nahavandi, Saeid; Zhang, Yu-Dong; Gorriz, Juan Manuel
title: Automatic Diagnosis of Schizophrenia in EEG Signals Using CNN-LSTM Models
date: 2021-11-25
journal: Front Neuroinform
DOI: 10.3389/fninf.2021.777977
sha: 754e560270895f4d73838728568a10e8547f7af0
doc_id: 71306
cord_uid: tu8tthf2

Schizophrenia (SZ) is a mental disorder in which, owing to the secretion of specific chemicals in the brain, the function of some brain regions is out of balance, leading to a lack of coordination between thoughts, actions, and emotions. This study provides various intelligent deep learning (DL)-based methods for automated SZ diagnosis via electroencephalography (EEG) signals. The obtained results are compared with those of conventional intelligent methods. To implement the proposed methods, the dataset of the Institute of Psychiatry and Neurology in Warsaw, Poland, has been used. First, EEG signals were divided into 25 s time frames and then normalized by z-score or L2 norm. In the classification step, two different approaches were considered for SZ diagnosis via EEG signals. First, the EEG signals were classified by conventional machine learning methods, e.g., support vector machine, k-nearest neighbors, decision tree, naïve Bayes, random forest, extremely randomized trees, and bagging. Then, various proposed DL models, namely, long short-term memories (LSTMs), one-dimensional convolutional networks (1D-CNNs), and 1D-CNN-LSTMs, were used. In this step, the DL models were implemented and compared with different activation functions. Among the proposed DL models, the CNN-LSTM architecture achieved the best performance. In this architecture, the ReLU activation function with the combined z-score and L2 normalization was used.
The proposed CNN-LSTM model achieved an accuracy of 99.25%, better than the results of most former studies in this field. It is worth mentioning that all simulations were performed using k-fold cross-validation with k = 5.

Schizophrenia (SZ) is one of the most important mental disorders, leading to disruption in brain growth (Lewis and Levitt, 2002; Schmitt et al., 2011). This disorder seriously damages thoughts, the expression of emotions, and individuals' perception of reality (Elvevag and Goldberg, 2000). The cause of SZ is not fully understood, though most research has demonstrated that structural and functional abnormalities of the brain play a role in its development (Qureshi et al., 2019). According to World Health Organization reports, nearly 21 million individuals worldwide suffer from this brain disorder. The disorder typically begins in youth, at an average age of 18 years in men and 25 years in women, and it is more prevalent among males (Sadeghi et al., 2021). Numerous methods have been provided for automated SZ diagnosis; among these techniques, neuroimaging-based methods have a special potential for specialist physicians (Li et al., 2021; Yan et al., 2021). Generally, neuroimaging methods include various structural or functional modalities (Steardo et al., 2020; Hu et al., 2021). Structural MRI and diffusion tensor imaging-MRI are among the most important modalities of structural neuroimaging, providing important information regarding brain structure to specialist physicians (Sui et al., 2013; Lee et al., 2018; Oh et al., 2020). In contrast, electroencephalography (EEG) (Boutros et al., 2008), magnetoencephalography (Fernández et al., 2011), functional MRI (Sartipi et al., 2020), and functional near-infrared spectroscopy (Chen et al., 2020) are the most important functional modalities of the brain.
These modalities provide vital information on brain function to specialist physicians. EEG is one of the most practical and inexpensive functional neuroimaging modalities, and it is of particular interest to specialist physicians. In this modality, the electrical activities of the brain are recorded from the head surface with a high temporal resolution and an appropriate spatial resolution, which is influential in SZ diagnosis (Murashko and Shmukler, 2019). Alongside these merits, EEG signals typically comprise many channels recorded over long periods (Murashko and Shmukler, 2019). In some cases, this makes SZ diagnosis via EEG signals seriously challenging for specialist physicians. In recent years, various investigations have provided automated SZ diagnosis via EEG signals using artificial intelligence (AI) methods (Prasad et al., 2013; Shim et al., 2016; Chu et al., 2017; Alimardani et al., 2018; Devia et al., 2019; Jahmunah et al., 2019; Li et al., 2019; Naira and Alamo, 2019; Oh et al., 2019; Phang et al., 2019a,b; Aristizabal et al., 2020; Luo et al., 2020; Prabhakar et al., 2020; Shalbaf et al., 2020; Siuly et al., 2020; Sharma et al., 2021; Singh et al., 2021; Sun et al., 2021). The AI investigations in this field include conventional machine learning (ML) and deep learning (DL) methods (Shoeibi et al., 2020, 2021). An AI-based SZ diagnosis algorithm includes preprocessing, feature extraction and selection, and, finally, classification. Feature extraction is the most important part of SZ diagnosis via EEG signals. In conventional ML, the features extracted from EEG signals are mainly categorized into four groups: time (Diykh et al., 2016), frequency (Faust et al., 2010), time-frequency (Madhavan et al., 2019), and non-linear (Gajic et al., 2015; Shoeibi et al., 2021a) domains. Siuly et al. (2020) used empirical mode decomposition (EMD) in the preprocessing step.
Next, various statistical features were extracted from EMD subbands, and the ensemble bagged tree method was used for classification. In another study, Jahmunah et al. (2019) used nonlinear features and a support vector machine (SVM) with a radial basis function kernel in the feature extraction and classification steps, respectively. Devia et al. (2019) provided an event-related field features-based SZ diagnosis method via EEG signals. Extremely randomized trees (ERT) features were extracted from EEG signals in this effort, and then linear discriminant analysis was used in the classification step. In Prabhakar et al. (2020), statistical features of steady-state visual evoked potentials were extracted, and classification was then performed by the k-nearest neighbors (KNN) method. Li et al. (2019) (Shim et al., 2016). This investigation used sensor-level and source-level features in the feature extraction step and then employed Fisher's score for feature selection. Ultimately, the SVM method was used in the classification step, and promising results were achieved. In conventional ML, selecting proper feature extraction algorithms for SZ diagnosis is a relatively demanding task, requiring a great deal of knowledge in signal processing and AI. To overcome this problem, DL-based methods have been provided in recent years for SZ diagnosis via EEG signals, in which feature extraction is carried out without supervision by deep layers (Shoeibi et al., 2021a). Shalbaf et al. (2020) defined a transfer learning model for SZ diagnosis via EEG signals. In this study, the ResNet-18 model was used for feature extraction from EEG signals, and SVM was used in the classification step. Some researchers have studied the use of other convolutional neural network (CNN) models in SZ diagnosis via EEG signals. CNN models have been used in Naira and Alamo (2019) and Oh et al.
(2019) for SZ diagnosis, with satisfactory results. CNN-recurrent neural network (RNN) models are an important group of DL networks and are popular for their ability to diagnose various brain diseases via EEG signals. In Aristizabal et al. (2020), Sharma et al. (2021), Singh et al. (2021), and Sun et al. (2021), CNN-long short-term memory (LSTM) models have been used for SZ diagnosis, and the researchers have been able to achieve promising results. In this paper, SZ diagnosis via EEG signals is investigated using various proposed DL and conventional ML-based methods. A summary of the proposed methods is depicted in Figure 1. In this study, the dataset of the Institute of Psychiatry and Neurology in Warsaw, Poland, is used (Olejarczyk and Jernajczyk, 2017). In the preprocessing step, the z-score and L2 normalization techniques are applied to EEG signals. Next, to classify EEG signals, various conventional ML methods and proposed DL-based models are used. The conventional ML methods employed include SVM (Cortes and Vapnik, 1995), KNN (Cover and Hart, 1967), decision tree (DT) (Rokach and Maimon, 2007), naïve Bayes (Zhang, 2004), random forest (RF) (Breiman, 2001), ERT (Geurts et al., 2006), and bagging (Friedman, 2001). The proposed DL networks include various one-dimensional (1D)-CNN, LSTM, and 1D-CNN-LSTM models that execute all steps from feature extraction to classification. Generally, nine LSTM-, 1D-CNN-, and 1D-CNN-LSTM-based DL methods will be investigated in this step. In section Materials and Methods, we describe our method in detail. In addition, we outline several baseline methods for comparison purposes in the same section. The statistical metrics to analyze and validate the proposed model are described in section Experiment Results.
Experiment results are provided in section Limitation of Study, and some limitations of the proposed method are provided in section Conclusion, Discussion, and Future Works. Finally, a discussion, the conclusion, and future works are presented.

This section discusses the proposed methods for SZ diagnosis via EEG signals, covering various conventional ML and DL models. First, the dataset is examined. Then, the preprocessing of the EEG signals is explained. Finally, the conventional ML and DL models for SZ diagnosis via EEG signals are introduced.

This dataset includes recorded EEG signals from 14 patients (females and males) with ages between 27.9 and 28.3 years. Besides, 14 normal individuals matched with the patients in terms of age and gender were recruited at this institution, and their data were recorded (Olejarczyk and Jernajczyk, 2017). For each case, a 15 min recording was made with the eyes closed. The EEG signals were recorded using the standard 10-20 system with a sampling frequency of 250 Hz (Olejarczyk and Jernajczyk, 2017). In this study, the electrodes used are Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, and O2. An example of EEG signals of SZ and normal cases is depicted in the corresponding figure.

Several steps are used to preprocess the EEG signals of this dataset. First, each 19-channel EEG recording was divided into non-overlapping 25 s frames, each of which includes 6,250 temporal samples (25 s × 250 Hz). Accordingly, each frame of EEG signals has 6,250 × 19 dimensions. Next, each EEG frame was normalized by the z-score and L2 methods. Normalizing the EEG signals helps improve the accuracy and performance of conventional ML and DL models.

The proposed conventional ML methods are introduced in this section as a baseline for comparison purposes.
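Before the classifiers are described, the framing and normalization steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function names (`frame_signal`, `zscore`, `l2_normalize`) are hypothetical, the recording is random noise standing in for real EEG, and normalizing each channel within each frame is an assumption, since the text does not state the axis along which the z-score and L2 normalizations were computed.

```python
import numpy as np

FS = 250            # sampling frequency in Hz (from the dataset description)
FRAME_SECONDS = 25  # frame length in seconds
N_CHANNELS = 19     # number of electrodes


def frame_signal(recording, fs=FS, seconds=FRAME_SECONDS):
    """Split a (samples, channels) recording into non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    Returns an array of shape (n_frames, fs * seconds, channels).
    """
    frame_len = fs * seconds
    n_frames = recording.shape[0] // frame_len
    trimmed = recording[: n_frames * frame_len]
    return trimmed.reshape(n_frames, frame_len, recording.shape[1])


def zscore(frames, eps=1e-12):
    """Z-score each channel of each frame (assumed axis: time)."""
    mean = frames.mean(axis=1, keepdims=True)
    std = frames.std(axis=1, keepdims=True)
    return (frames - mean) / (std + eps)


def l2_normalize(frames, eps=1e-12):
    """Scale each channel of each frame to unit L2 norm (assumed axis: time)."""
    norm = np.linalg.norm(frames, axis=1, keepdims=True)
    return frames / (norm + eps)


# Synthetic stand-in for one 15 min recording: 15 * 60 * 250 = 225,000 samples.
rng = np.random.default_rng(0)
recording = rng.normal(size=(15 * 60 * FS, N_CHANNELS))

frames = frame_signal(recording)      # 36 frames of 6,250 samples x 19 channels
frames_z = zscore(frames)             # z-score only
frames_z_l2 = l2_normalize(frames_z)  # z-score followed by L2
```

Under these assumptions a 15 min recording yields 36 frames, which is where the frame count per subject would come from; a different normalization axis would change only the `axis` arguments.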
The proposed algorithms include SVM (Cortes and Vapnik, 1995), KNN (Cover and Hart, 1967), DT (Rokach and Maimon, 2007), naïve Bayes (Zhang, 2004), RF (Breiman, 2001), ERT (Geurts et al., 2006), and bagging (Friedman, 2001). Each of these methods is briefly introduced below. Support vector machine (SVM) (Cortes and Vapnik, 1995) is an algorithm that constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (the so-called functional margin), since in general the larger the margin, the lower the generalization error of the classifier. k-nearest neighbor (KNN) (Cover and Hart, 1967) is a classification algorithm in which a small, fixed number (k) of nearest neighbors (based on a notion of distance) are located in the training set and used together to determine the class of the test instance through simple majority voting; that is, the test instance is assigned the class that has the most representatives among its k nearest neighbors. Decision tree (DT) (Rokach and Maimon, 2007) is an algorithm that creates a model that predicts the class of an instance by learning simple decision rules inferred from the data features. A DT model is represented as a binary tree in which each internal node represents a single input variable (X) and a split point on that variable, assuming the variable is numeric. The leaf nodes (also called terminal nodes) of the tree contain an output variable (y), which is used to make a prediction. Naive Bayes (Zhang, 2004) is a supervised learning algorithm based on applying Bayes' theorem with the "naive" assumption of conditional independence between every pair of features given the value of the class variable.
This means that we calculate P(data|class) for each input variable separately and multiply the results together, for example: P(class | X1, X2, ..., Xn) = P(X1|class) × P(X2|class) × ... × P(Xn|class) × P(class) / P(data), where P(A | B) represents the probability of A given B. Random forest (RF) (Breiman, 2001) is an extension of the bagging algorithm in which several DT classifiers are fit on various subsamples of the dataset and their predictions are averaged to improve predictive accuracy and control over-fitting. Unlike bagging, RF also involves selecting a subset of input features (columns or variables) at each split point in the construction of trees. By restricting the features that may be considered at each split point to a random subset, each DT in the ensemble is forced to be more different from the others. Extremely randomized trees (ERT) (Geurts et al., 2006), like RF, is an ensemble of several DT models. However, the ERT algorithm fits each DT on the whole training dataset instead of using a bootstrap sample. Like the RF algorithm, the ERT algorithm randomly samples the features at each split point of a DT; but instead of using a greedy algorithm to select an optimal split point, the ERT selects a split point at random. Bagging (Friedman, 2001) is an ensemble classifier that fits base classifiers on random subsets of the original dataset and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. More concretely, in bagging, several classifiers are created, each from a different bootstrap sample of the training dataset. A bootstrap sample is drawn from the training dataset with replacement, so a given instance may appear more than once in it.

This section presents the various 1D-CNN, LSTM, and 1D-CNN-LSTM models proposed for SZ diagnosis via EEG signals; each is examined in turn below.
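Before moving on to the deep models, the bootstrap-and-vote idea behind bagging can be made concrete with a small sketch. It uses only the Python standard library; the 1-nearest-neighbor base learner, the toy two-class data, and all function names are assumptions chosen for illustration, whereas the experiments reported in this paper rely on the scikit-learn implementations.

```python
import random
from collections import Counter


def bootstrap_sample(X, y, rng):
    """Draw len(X) training pairs with replacement (a bootstrap sample)."""
    idx = rng.choices(range(len(X)), k=len(X))
    return [X[i] for i in idx], [y[i] for i in idx]


def predict_1nn(X_train, y_train, x):
    """Base learner: label of the closest training point (squared Euclidean)."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    nearest = min(range(len(X_train)), key=lambda i: dist(X_train[i], x))
    return y_train[nearest]


def bagging_predict(X, y, x, n_estimators=25, seed=0):
    """Use one base learner per bootstrap sample and take a majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_estimators):
        Xb, yb = bootstrap_sample(X, y, rng)
        votes.append(predict_1nn(Xb, yb, x))
    return Counter(votes).most_common(1)[0][0]


# Toy data (hypothetical): class 0 clusters near (0, 0), class 1 near (5, 5).
X = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.3), (5.0, 5.1), (4.8, 5.2), (5.3, 4.9)]
y = [0, 0, 0, 1, 1, 1]

label_a = bagging_predict(X, y, (0.1, 0.0))
label_b = bagging_predict(X, y, (5.1, 5.0))
```

An odd number of estimators is used so that a two-class vote cannot tie. Random forest and ERT differ from this scheme mainly in how each tree chooses its splits, as described above.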
The strong performance of CNN models in machine vision has led to their use in time series processing, such as medical signals, with successful results (Chen et al., 2019; Mahmud et al., 2021). CNN models are built from convolutional, pooling, and fully connected (FC) layers (Niepert et al., 2016; Zhang et al., 2019). In 1D-CNN models, signal time can be considered a spatial dimension, e.g., the height or width of a 2D image (Goodfellow et al., 2016). 1D-CNN models are considered important rivals of RNN architectures in time series processing. Compared to RNN models, 1D-CNN architectures have lower computational costs (Goodfellow et al., 2016). In this section, the three proposed 1D-CNN-based models are provided for SZ diagnosis via EEG signals. The details of the first proposed 1D-CNN model are provided in Table 1. According to Table 1, this model includes nine different layers. The convolutional layers have 64 filters with 3 × 3 dimensions. In addition, various activation functions, e.g., ReLU, Leaky ReLU, and seLU, have been used in convolutional layers, and the related results will be compared in the Experiment Results section. Besides, a max-pooling layer has been used for decreasing dimensions, dropout layers with different rates for the prevention of overfitting, a flatten layer for converting a matrix to a vector, and, finally, dense layers for classification. The activation function of the final dense layer is sigmoid, used for binary classification. The architecture of the second proposed 1D-CNN model has three convolutional layers, whose number of filters, kernel size, and activation function are indicated in Table 2. In this model, a convolutional layer with a kernel size of 2 has been used. Moreover, this model has four dropout layers with different rates, one flatten layer, and two dense layers.
The activation function of the first dense layer is ReLU, and that of the final dense layer is sigmoid, for classification.
(C) The third version of 1D-CNN model
According to Table 3, the third proposed 1D-CNN model consists of two convolutional layers with a number of filters, kernel size, and activation functions similar to those of the previous networks. This model has a max-pooling layer with a kernel size of 2. In addition, it uses dropout with different rates. As in the previous models, a flatten layer is also used. This model has two dense layers, in which the activation functions of the first and second layers are ReLU and sigmoid, respectively.

Recurrent neural networks (RNNs) are a group of DL models employed in speech recognition (Ogunfunmi et al., 2019), natural language processing (Deng and Liu, 2018), and biomedical signal processing (Vicnesh et al., 2020; Baygin et al., 2021). CNN models are feed-forward networks, whereas RNNs have a feedback connection, in which the network output is fed back into the network along with the next input. Because they have internal memory, RNNs memorize their previous input and use it to process a sequence of inputs. Simple RNN, LSTM, and gated recurrent unit networks are three important groups of RNNs (Goodfellow et al., 2016). In this section, various LSTM models for SZ diagnosis via EEG signals are proposed. In Table 4, the details of the first proposed LSTM model, consisting of six layers, are presented. In this model, an LSTM layer with 100 units is employed. The remainder of the proposed LSTM architecture consists of two dropout layers with different rates and two dense layers. In the first and second dense layers, the ReLU and sigmoid activation functions are used, respectively. In Table 5, the details of the second proposed LSTM model, consisting of seven layers, are presented.
In this architecture, an LSTM layer with 50 units is added to the previous model. The reason behind this is to examine the effect of adding LSTM layers on the accuracy of SZ diagnosis via EEG signals.

In CNN-RNN models, the convolutional layers are used in the first layers of the model to extract the features and find the local patterns (Goodfellow et al., 2016). Then, their outputs are applied to RNN layers. Empirically, the convolutional layers extract the local and spatial patterns of EEG signals better than RNNs do. Besides, adding convolutional layers to an RNN allows a more accurate examination of the data. In this section, various CNN-LSTM models for SZ diagnosis are proposed. The first proposed CNN-LSTM model consists of 11 layers of the following types: convolutional, max-pooling, dropout, LSTM, flatten, and dense. The details of the proposed model are presented in Table 6. This architecture includes two convolutional layers, three dropout layers with different rates, one max-pooling layer, one flatten layer, one LSTM layer, and, finally, two dense layers with ReLU and sigmoid activation functions. Next, the second proposed CNN-LSTM model is introduced. This network includes 13 layers and, like the previous model, consists of CNN and LSTM layers, whose details are given in Table 7 and Figure 4. As can be seen in Table 7 and Figure 4, the first 10 layers of this model are identical to those of the previous CNN-LSTM model. A dense layer with 50 neurons and the ReLU activation function is used as the 11th layer of this architecture. The 12th layer is a dropout layer with a rate of 0.25. Ultimately, the 13th layer is a dense layer with a sigmoid activation function for classification.

The results of the proposed methods are presented in this section. First, the simulation results obtained from conventional ML techniques for SZ diagnosis via EEG signals are presented and discussed.
The original dataset was flattened to have only a vector per sample, and then we used the flattened dataset to train several classification algorithms using the scikit-learn library (Pedregosa et al., 2011). Namely, we studied the performance of KNNs, DTs, SVMs, and naive Bayes; and three ensemble algorithms (bagging, extremely randomized trees, and RF). The algorithms were trained using the default hyperparameters provided by the scikit-learn implementation. Moreover, we studied the impact of z-score normalization (Cheadle et al., 2003) on the performance of the models. All the experiments were conducted on an Intel(R) Core(TM) i7-4810MQ CPU at 2.80 GHz. Table 8 reports the results obtained from conventional classification algorithms for EEG signals that were either raw or normalized by z-score normalization. According to Table 8, the bagging classifier applied to z-score-normalized EEG signals achieved the maximum accuracy. Figure 5 shows the ROC curves for ML classification algorithms with different normalizations of EEG signals. The figure on the left shows the results of ML classification methods with z-score normalization; the ROC curves for ML classification algorithms with z-score + L2 normalization are presented in the figure on the right. We also employed several DL architectures based on CNNs and LSTMs (Goodfellow et al., 2016), and the combination of both convolutional and LSTM layers. Namely, three CNNs, two LSTMs, and two CNN-LSTM networks (see Tables 1-7 for the concrete architecture of these networks) were studied. We also analyzed the relevance of using three different activation functions (ReLU, Leaky ReLU, and seLU), and the impact of z-score normalization. To avoid overfitting, we applied two regularization techniques: dropout (Goodfellow et al., 2016) and weight regularization (Goodfellow et al., 2016).
In particular, dropout was applied after each convolutional and LSTM layer using a dropout rate of 0.5, and after dense layers using a dropout rate of 0.25. Weight regularization was employed in all the convolutional, LSTM, and dense layers of our architectures using L2 regularization with a coefficient of 0.01. The final selected values for batch size and hyperparameters of our networks are all available in Table 9. All the experiments were conducted using the Keras library (Gulli and Pal, 2017) on an NVIDIA RTX 2080 Ti GPU. The results obtained from the proposed DL methods for different activation functions are reported in Tables 10-12. First, the results obtained from the proposed DL methods with the Leaky ReLU activation function are given in Table 10. As indicated in Table 10, the second proposed CNN-LSTM model with the Leaky ReLU activation function and the combined z-score and L2 normalization obtained the maximum accuracy. Table 11 presents the results obtained from the proposed DL methods with the seLU activation function. Table 11 indicates that the second proposed LSTM method resulted in the maximum accuracy. The results of all proposed DL models with the ReLU activation function and the z-score and L2 normalizations are presented in Table 12. According to Table 12, compared with all classification methods with different activation functions, the second proposed CNN-LSTM model with the ReLU activation function and the combined z-score and L2 normalization led to the maximum accuracy. The ROC curves for the DL models with ReLU activation functions and the z-score and z-score + L2 normalization methods are drawn in Figure 6. On the left of Figure 6, the results of the DL algorithms with z-score + L2 normalization are presented. The ROC curves for the DL algorithms with z-score normalization of the EEG signals are shown on the right of Figure 6.
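The ROC curves referred to above plot the true-positive rate against the false-positive rate as the decision threshold on the classifier score is swept. A minimal sketch of that computation is given below; the function names and the toy labels and scores are illustrative assumptions rather than values from this study, and in practice a library routine would normally be used instead.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists obtained by sweeping the threshold downwards.

    y_true holds 0/1 labels and scores holds the predicted probability of
    class 1. Samples with tied scores cross the threshold together.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for k in range(1, len(fpr)):
        area += (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
    return area


# Toy example with made-up scores: labels 1 = SZ frame, 0 = normal frame.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
area = auc(fpr, tpr)
```

A classifier whose curve hugs the top-left corner has an area close to 1, which is the visual counterpart of the high accuracies reported in the tables.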
Furthermore, learning curves of the CNN-LSTM method with ReLU activation and z-score normalization, and with z-score + L2 normalization, are shown in Figures 7 and 8, respectively. The simulation results of the proposed models for SZ diagnosis via EEG signals were investigated in this section. Among all the DL and conventional ML methods, the 13-layer CNN-LSTM model has the highest accuracy and efficiency. The selection of the number of layers in this model and of the type of activation functions is presented in this research for the first time, which is the novelty of the article. Besides, the simultaneous use of z-score and L2 normalizations with the proposed CNN-LSTM model is another novelty of this article. Figure 9 shows the DL models with different activation functions and z-score normalization. Figure 10 displays the DL architectures with different activation functions and z-score and L2 normalization. According to Figures 9 and 10, the second version of CNN-LSTM with z-score and L2 normalization has the best performance compared to the other methods.

The limitations of the study are investigated in this section. The available EEG datasets for SZ diagnosis consist of a limited number of cases, which makes developing tools for SZ diagnosis via EEG signals and DL models challenging. The dataset in this research was not used to determine the severity of the disorder but to diagnose the disorder. This dataset is unsuitable for prognosis or early diagnosis, and other appropriate datasets must be gathered for these purposes. Another limitation of this study is that the classifiers are not separately designed and compared for different age and gender groups, and other suitable datasets must be gathered for this purpose. The classifiers are of the two-class type and can become multiclass by adding the classes of brain disorders with similar symptoms to SZ.
techniques and also DL models (Martinez-Murcia et al., 2019; Górriz et al., 2020; Gorriz et al., 2021; Jiménez-Mesa et al., 2021). The AI models for SZ diagnosis via EEG signals consist of the following steps: dataset selection, preprocessing, feature extraction and selection, and classification. In this study, the dataset consisted of EEG data of 14 normal individuals and 14 patients with SZ (Olejarczyk and Jernajczyk, 2017). The EEG signals of this dataset are of a 19-channel type and have a sampling frequency of 250 Hz (Olejarczyk and Jernajczyk, 2017). In the preprocessing step, the EEG signals were first divided into 25 s frames. Afterward, z-score and z-score-L2 were used for the normalization of EEG signals. Each frame of EEG signals had a dimension of 19 × 6,250. It should be noted that the preprocessing of EEG signals for the DL models included two normalization techniques, z-score and z-score-L2. Different conventional ML-based classification algorithms were used for SZ diagnosis via EEG signals. Here, the normalized EEG signals were considered as features to be applied in classification algorithms. The employed classification algorithms included the following methods: SVM (Cortes and Vapnik, 1995), KNN (Cover and Hart, 1967), DT (Rokach and Maimon, 2007), naïve Bayes (Zhang, 2004), RF (Breiman, 2001), ERT (Geurts et al., 2006), and bagging (Friedman, 2001). The bagging classifier applied to EEG signals normalized using z-score obtained an accuracy of 81.22 ± 1.74%, which is the highest accuracy among the conventional classification methods. Next, different DL methods of SZ diagnosis via EEG signals were employed. The proposed DL methods included three 1D-CNN architectures, two LSTM models, and two 1D-CNN-LSTM networks. Different activation functions, namely, Leaky ReLU, seLU, and ReLU, were used to implement the proposed DL models.
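For reference, the activation functions named above have the following standard definitions. The sketch uses only the Python standard library; the Leaky ReLU slope of 0.3 is an assumption (the text does not report the value used), the SELU constants are those of the original SELU formulation, and sigmoid is included because it is the output activation of every proposed model.

```python
import math

# Constants of the SELU activation (self-normalizing networks).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def leaky_relu(x, slope=0.3):
    """Leaky ReLU: identity for x > 0, small linear slope otherwise.

    The default slope is an assumed value, not one reported in the paper.
    """
    return x if x > 0 else slope * x


def selu(x):
    """Scaled exponential linear unit."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def sigmoid(x):
    """Logistic sigmoid, mapping a real score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


inputs = [-2.0, 0.0, 2.0]
table = {
    f.__name__: [f(x) for x in inputs]
    for f in (relu, leaky_relu, selu, sigmoid)
}
```

The three hidden-layer activations agree for positive inputs up to the SELU scale factor and differ only in how they treat negative inputs, which is the property the comparison in Tables 10-12 probes.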
Besides, in all models, the sigmoid activation function was used for classification. The results of the DL models for different normalization methods and activation functions are reported in Tables 10-12. Among the proposed DL models, the 1D-CNN-LSTM architecture consisting of 13 layers with the ReLU activation function and z-score + L2 normalization obtained an accuracy of 99.25 ± 0.25%. This model is presented for the first time in this research, as this article's novelty. The comparison between the proposed 1D-CNN-LSTM model and the models proposed in previous studies on SZ diagnosis via EEG signals is given in Table 13. As shown in Table 13, the model proposed in this research obtained higher accuracy than the vast majority of previous studies. The proposed model can be implemented on special software and hardware platforms for quick SZ diagnosis via EEG signals and may be employed as an assistant diagnosis method in hospitals. Some future investigations into SZ diagnosis via EEG signals are presented next. As the first future work, CNN-AE models can be employed for SZ diagnosis via EEG signals. Several researchers indicate that CNN-AE models are highly efficient in diagnosing neural disorders via EEG signals (Shoeibi et al., 2021a). As mentioned in the section on the limitations of the study, the dataset used in this study is for SZ disorder diagnosis. However, providing EEG datasets for SZ disorder diagnosis can be of paramount importance for future investigations. One of the future works is to provide DL-based classification models for different age and gender groups, which requires researchers to have access to relevant data. Another future work is to use a combination of conventional ML and DL models for SZ diagnosis, such that different nonlinear features are extracted from EEG signals first. Afterward, the features are extracted from raw EEG signals by DL models.
Ultimately, manual and DL features are combined, and the classification is carried out. DL-based graph models are an emerging field in the diagnosis of brain disorders. Accordingly, in future work, DL-based graph models may be suitable for SZ diagnosis via EEG signals (Cao et al., 2016).

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author/s.

Classification of bipolar disorder and schizophrenia using steady-state visual evoked potential based features
Handling of uncertainty in medical data using machine learning and probability theory techniques: a review of 30 years
Identification of children at risk of schizophrenia via deep learning and EEG responses
Automated ASD detection using hybrid deep lightweight features extracted from EEG signals
The status of spectral EEG abnormality as a diagnostic test for schizophrenia
Random forests
Deep neural networks for learning graph representations
Analysis of microarray data using Z score transformation
A deep learning framework for identifying children with ADHD using an EEG-based brain network
Classification of schizophrenia using general linear model and support vector machine via fNIRS
Individual recognition in schizophrenia using deep learning methods with random forest and voting classifiers: insights from resting state EEG streams
Support-vector networks
Nearest neighbor pattern classification
Deep Learning in Natural Language Processing
EEG classification during scene free-viewing for schizophrenia detection
EEG sleep stages classification based on time domain features and structural graph similarity
Cognitive impairment in schizophrenia is the core of the disorder
Automatic identification of epileptic and background EEG signals using frequency domain parameters
Lempel-Ziv complexity in schizophrenia: a MEG study
Greedy function approximation: a gradient boosting machine
Detection of epileptiform activity in EEG signals based on time-frequency and non-linear analysis
Extremely randomized trees
Deep Learning
Artificial intelligence within the interplay between natural and artificial computation: advances in data science, trends and applications
A connection between pattern classification by machine learning and statistical inference with the General Linear Model
Deep Learning With Keras
Structural and diffusion MRI based schizophrenia classification using 2D pretrained and 3D naive Convolutional Neural Networks
Automated detection of schizophrenia using nonlinear signal processing methods
Deep Learning in current Neuroimaging: a multivariate approach with power and type I error control but arguable generalization ability
Deep learning for neuroimaging-based diagnosis and rehabilitation of autism spectrum disorder: a review
Diagnostic value of structural and diffusion imaging measures in schizophrenia
Schizophrenia as a disorder of neurodevelopment
Differentiation of schizophrenia by combining the spatial EEG brain network patterns of rest and task P300
Deep learning based automatic diagnosis of first-episode psychosis, bipolar disorder and healthy controls
Biomarkers for prediction of schizophrenia: insights from resting-state EEG microstates
Time-frequency domain deep convolutional neural network for the classification of focal and non-focal EEG signals
Sleep apnea detection from variational mode decomposed EEG signal using a hybrid CNN-BiLSTM
Studying the manifold structure of Alzheimer's disease: a deep learning approach using convolutional autoencoders
EEG correlates of face recognition in patients with schizophrenia spectrum disorders: a systematic review
Classification of people who suffer schizophrenia and healthy people by EEG signals using deep learning
Learning convolutional neural networks for graphs
A primer on deep learning architectures and applications in speech processing
Identifying schizophrenia using structural MRI with a deep learning algorithm
Deep convolutional neural network model for automated diagnosis of schizophrenia using EEG signals
Graph-based analysis of brain connectivity in schizophrenia
Scikit-learn: machine learning in Python
A multi-domain connectome convolutional neural network for identifying schizophrenia from EEG connectivity patterns
Classification of EEG-based effective brain connectivity in schizophrenia using deep neural networks
A framework for schizophrenia EEG signal classification with nature inspired optimization algorithms
Single-trial EEG classification using logistic regression based on ensemble synchronization
3D-CNN based discrimination of schizophrenia using resting-state fMRI
Data Mining With Decision Trees: Theory and Applications
An overview on artificial intelligence techniques for diagnosis of schizophrenia based on magnetic resonance imaging modalities: methods, challenges, and future works
Diagnosis of schizophrenia from R-fMRI data using Ripplet transform and OLPP
Schizophrenia as a disorder of disconnectivity
Transfer learning with deep convolutional neural network for automated detection of schizophrenia from EEG signals
DepHNN: a novel hybrid neural network for electroencephalogram (EEG)-based screening of depression
Machine-learning-based diagnosis of schizophrenia using combined sensor-level and source-level EEG features
A comprehensive comparison of handcrafted features and convolutional autoencoders for epileptic seizures detection in EEG signals
Applications of epileptic seizures detection in neuroimaging modalities using deep learning techniques: methods, challenges, and future works
Automated detection and forecasting of covid-19 using deep learning techniques: a review
Epileptic seizures detection using deep learning techniques: a review
Spectral features based convolutional neural network for accurate and prompt identification of schizophrenic patients
A computerized method for automatic detection of schizophrenia using EEG signals
Application of support vector machine on fMRI data as biomarkers in schizophrenia diagnosis: a systematic review
Combination of resting state fMRI, DTI, and sMRI data to discriminate schizophrenia by N-way MCCA+jICA
A hybrid deep neural network for classification of schizophrenia using EEG Data
Autism spectrum disorder diagnostic system using HOS bispectrum with EEG signals
Mapping relationships among schizophrenia, bipolar and schizoaffective disorders: a deep classification and clustering framework using fMRI time series
The optimality of naive
Deeplob: deep convolutional neural networks for limit order books

This work was supported by the MCIN/AEI/10.13039/501100011033/ and FEDER "Una manera de hacer Europa" under the RTI2018-098913-B100 project, by the Consejeria de Economia, Innovacion, Ciencia y Empleo (Junta de Andalucia) and FEDER under CV20-45250, A-TIC-080-UGR18, B-TIC-586-UGR20, and P20-00525 projects.

AS, NG, and JH: methodology. AS, YK, and JH: software. AS and JH: validation. AS, RA, and JG: formal analysis. AS, DS, and PM: resources. AS and NG: writing-original draft preparation. AS, NG, and RA: writing-review and editing. AS, Y-DZ, SN, and YK: visualization. All authors contributed to the article and approved the submitted version.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.