key: cord-0016310-ths7re57
authors: Mansourian, Mahsa; Khademi, Sadaf; Marateb, Hamid Reza
title: A Comprehensive Review of Computer-Aided Diagnosis of Major Mental and Neurological Disorders and Suicide: A Biostatistical Perspective on Data Mining
date: 2021-02-25
journal: Diagnostics (Basel)
DOI: 10.3390/diagnostics11030393
sha: 07f3e93f675fccb92059934ddcb20cb99cb4a339
doc_id: 16310
cord_uid: ths7re57

The World Health Organization (WHO) suggests that mental disorders, neurological disorders, and suicide are growing causes of morbidity. Depressive disorders, schizophrenia, bipolar disorder, and Alzheimer’s disease and other dementias account for 1.84%, 0.60%, 0.33%, and 1.00% of total Disability Adjusted Life Years (DALYs), respectively. Furthermore, suicide, the 15th leading cause of death worldwide, could be linked to mental disorders. More than 68 computer-aided diagnosis (CAD) methods published in peer-reviewed journals from 2016 to 2021 were analyzed, among which 75% were published in 2018 or later. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol was adopted to select the relevant studies. In addition to the gold standard, the sample size, neuroimaging techniques or biomarkers, validation frameworks, classifiers, and performance indices were analyzed. We further discussed why various performance indices are essential from the biostatistical and data mining perspectives. Moreover, critical information related to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines was analyzed. We discussed how balancing the dataset and not using external validation could hinder the generalization of the CAD methods. We provided a list of the critical issues to consider in such studies.

Mental health is a state of successful cognitive function resulting in adapting to change and coping with the everyday stresses of life [1, 2].
Mental disorders refer to a wide range of conditions affecting mood, thinking, and behavior. They can be occasional or chronic [3]. Some major mental disorders include depression, bipolar disorder (BD), and schizophrenia (SZ) [4]. Mental illnesses are globally among the leading causes of disability in Disability Adjusted Life Years (DALYs) [5]. Figure 1 shows the composition of mental disorder DALYs by type of disorder for both sexes combined worldwide from 1990 to 2019 [6]. Depressive disorders (29.74%), anxiety disorders (22.86%), and schizophrenia (11.66%) are the top three contributors to mental disorder DALYs [6]. Among mental disorders, depressive disorders account for 1.84%, anxiety disorders for 1.13%, schizophrenia for 0.60%, and BD for 0.33% of total DALYs [6]. As shown in Figure 2 (Source: Institute for Health Metrics Evaluation. Used with permission. All rights reserved.), the countries with the highest age-standardized mental disorder DALY rates in 2019 were Portugal (2603.92), Greece (2510.55), Greenland (2486.44), Iran (2436.44), and Spain (2396.768) DALYs per 100,000 [6]. The World Health Organization (WHO) reported that over 450 million people worldwide suffer from mental disorders [7].

Due to the multiplicity of mental disorders and the importance of proper diagnosis and treatment, the need to classify these disorders has always existed and led to the publication of the Diagnostic and Statistical Manual of Mental Disorders (DSM). Its latest version, DSM-5, was released in 2013. The Structured Clinical Interview for DSM-5 (SCID-5) is a structured diagnostic interview for diagnosing mental disorders according to the DSM-5 criteria, and it should be administered by a trained clinician. The structure specifies the order of the questions, how the questions are worded, and how the subject's responses are classified. The primary diagnosis methods are summarized as follows [45].
SCID is considered to be the commonly used gold standard for depression diagnosis. Major depressive disorder (MDD) is a type of depression characterized by separate episodes of at least 14 days. Critical symptoms of MDD are depressed mood, loss of interest, weight loss or weight gain without any particular diet, insomnia or hypersomnia, frequent thoughts of death or suicide, decreased ability to concentrate and think, feelings of being worthless and guilty, psychomotor agitation or retardation, and feelings of energy loss and indecisiveness. Five or more of the above symptoms, at least one of which is among the first two, are required for a depression diagnosis [46].

2.1.2. Bipolar Disorder

SCID is used as the gold standard among diagnostic interviews, but its validity will not be known until the discovery of related biomarkers. At least one period of mania is necessary for a specific diagnosis of bipolar disorder I (BD-I), while one hypomanic episode and one major depressive episode without a manic episode are essential for a bipolar II (BD-II) diagnosis [47, 48].

Patients' descriptions of symptoms, mental state tests, and behavioral observations help psychiatrists diagnose schizophrenia based on DSM-5 criteria, which is the gold standard of diagnosis to date. The most important symptoms are delusions, hallucinations, disorganized speech, extremely catatonic behavior, and negative symptoms such as decreased emotional expression. Two or more of these symptoms, at least one of which is among the first three, are required for a schizophrenia diagnosis, and each of them should be present for a considerable period within a month [49, 50].

AD is a specific type of dementia. The gold standard hallmarks for definitive diagnosis of AD are cortical atrophy, amyloid-predominant neuritic plaques, and tau-predominant neurofibrillary tangles validated by postmortem histopathological examination.
Amyloid precursor protein (APP), presenilin 1 (PSEN1), and presenilin 2 (PSEN2) are known causative genes of AD, and genetic tests can detect their mutations in early-onset cases. Furthermore, amyloid-based diagnostic tests such as positron emission tomography (PET) scans and cerebrospinal fluid (CSF) analysis can be useful diagnostic tools [51].

2.1.5. Dementia

In DSM-5, major neurocognitive disorder (MCD) is considered an alternative term for dementia, the term used in previous versions. A significant decrease in the subject's cognitive performance, for example in learning and memory functions, followed by interference with independent daily activities, is a sign of dementia. The Clinical Dementia Rating (CDR) is a cognitive diagnostic assessment widely used as the gold standard for diagnosing dementia. The CDR test is a semi-structured interview with the patient and a reliable informant, consisting of 46 questions, that takes 30-90 min to complete and must be conducted by a trained clinician [52-54].

Validated questionnaires have been used in the literature to identify individuals at high risk of suicidal behaviors [55]. The Suicide Behaviors Questionnaire-Revised (SBQ-R) is a globalized test for identifying individuals at increased risk of suicidal behaviors, including ideation and attempts [56]. The SBQ-R test was designed based on the SBQ test, a 34-item questionnaire measuring suicide tendency. It is a self-report test distinguishing between suicidal and non-suicidal subjects. The SBQ-R test includes four Likert-type questions that measure the risk of suicide according to the subject's lifetime suicide ideation/attempt, the rate of suicidal ideation in the last year, the expression of suicidal thoughts to others, and the probability of suicidal behavior in the future. Each question has different points from 0 to 6 based on the subject's choice.
Two scoring criteria have been proposed so far to classify suicidal and non-suicidal individuals based on SBQ-R results: SBQ-R Item 1 and SBQ-R total score varying between 3 and 18. Clinical and non-clinical samples have an identical cutoff score of 2 in the SBQ-R Item 1. The SBQ-R total score's cutoff scores were 7 and 8 for clinical and non-clinical samples, respectively [42] . There are currently not enough biomarkers in psychiatry to classify disease state from the normal state, so diagnosis mostly depends on patient-physician interactions and questionnaires. Clinical observations based on patient self-reports are subjective and inaccurate even if they are based on DSM-5 criteria since they cannot identify false positives and recognize disorders from risks. This is where artificial intelligence (AI) comes in handy. AI is a general term in psychiatry that denotes the use of advanced computerized techniques and algorithms to diagnose, prevent, and treat mental disorders, such as automatic speech processing and machine learning algorithms applied on electronic medical databases and health records to assess a patient's mental state. AI-based interventions reduce false negative and positive diagnoses and annihilate the stigma associated with mental illness symptoms to the clinician. They are also affordable and have significant benefits for patients suffering from restricted movement due to their symptoms. AI-based methods are not replacing clinicians; they can complement human clinical decisions by providing more comprehensive information to empower the health care system [57, 58] . Here, we provided the literature review of the CAD systems for suicide, neurological disorders, and mental disorders focusing on the sample size, input features, classifiers, type of validations, and their performance indices. 
We reviewed the works focusing on the diagnosis and prediction of CAD methods proposed in the literature for suicide, neurological disorders, and mental disorders. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [59, 60] was proposed in the literature to enrich and standardize medical reviewer papers [61] . We adopted the PRISMA guideline to select the relevant studies. A literature search of the online database of PubMed between 2016 and 2021 was performed using the terms ("bipolar" OR "bipolar disorder" OR "schizophrenia" OR "suicide" OR "Alzheimer" OR "dementia" OR "major depressive disorder" OR "depression") AND ("machine learning" OR "deep learning") AND "accuracy". The reference lists of the identified publications were also reviewed. Peer-reviewed articles in English on Humans were analyzed. Published studies were included in the review if they met the following criteria: (1) at least a measure of the diagnostic accuracy was provided, (2) at least the classifier, the validation framework, or the validation type were provided. Figure 5 shows a flow diagram describing the study selection process. Among 563 records screened, 71 studies were excluded as irrelevant to the original research question. Among the remaining 492 studies, 424 studies did not meet the eligibility criteria. Thus, 68 studies were included in our analysis. The following characteristics were recorded for each study included in our analysis: publication reference (the first author's surname and the year of publication), the sample size, the case and control groups, input features, classifiers, internal or external validation, type of validation (holdout or resampling), and the diagnostic accuracy. The CAD methods for mental and neurological disorders are listed in Tables 1-7, while the CAD methods for suicide prediction are provided in Tables 8-11 . The validation framework is one of the critical issues in data mining approaches. 
In "holdout," the most straightforward cross-validation, the data set is randomly assigned to two sets: the training set and the test set. In addition to its inefficient use of the data, the method yields pessimistically biased error estimates [127, 128]. Moreover, this method does not guard against testing hypotheses suggested by the data (type III errors [129]), as the data may be permuted until an acceptable accuracy is reached on the training and test sets in a "holdout" setting. Therefore, other validation frameworks such as repeated holdout, leave-one-out validation, 0.632+ bootstrap, and cross-validation [130] are preferred. These issues are also addressed in the TRIPOD guideline from a clinical perspective [131]. Choi et al. [104] proposed a framework for early detection of dementia using holdout validation. Moreira et al. [105] presented a hybrid data mining model for the diagnosis of dementia using a holdout setting. Lin et al. [97] designed a convolutional neural network (CNN)-based approach to predict mild cognitive impairment to Alzheimer's disease (MCI-to-AD) conversion using MRI data with leave-one-out cross-validation (CV). Ding et al. [98] proposed a hybrid computational approach to classify AD with holdout validation and resampling by the synthetic minority oversampling technique (SMOTE). Aidos et al. [101] presented a new methodology to obtain an efficient CAD system for predicting AD using longitudinal information with holdout validation. Li et al. [132] developed a spectral CNN for reliable AD prediction with 10-fold CV. Sayed et al. [133] designed an automatic system for AD diagnosis with 7-fold CV. The other critical issue is using leave-one-subject-out cross-validation when there are repeated measurements for each subject [134]. Thus, all measurements of a subject must be held out of the training set, and the trained system's performance must be reported for that test subject.
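The subject-wise splitting described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the procedure of any reviewed study: the function name `subject_wise_folds`, the fold assignment by striding, and the toy subject labels are all assumptions made for the example. Setting `n_folds` to the number of subjects gives leave-one-subject-out validation.

```python
import random
from collections import defaultdict


def subject_wise_folds(subject_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no subject is split.

    subject_ids[i] is the (hypothetical) subject label of measurement i.
    All measurements of a subject land either in training or in test.
    """
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)

    subjects = sorted(by_subject)            # deterministic starting order
    random.Random(seed).shuffle(subjects)    # reproducible shuffle of subjects

    for k in range(n_folds):
        held_out = subjects[k::n_folds]      # every n_folds-th subject
        held_out_set = set(held_out)
        test_idx = [i for s in held_out for i in by_subject[s]]
        train_idx = [i for s in subjects if s not in held_out_set
                     for i in by_subject[s]]
        yield train_idx, test_idx


# Toy example: nine repeated measurements from four subjects.
ids = ["s1", "s1", "s1", "s2", "s2", "s3", "s3", "s3", "s4"]
for train_idx, test_idx in subject_wise_folds(ids, n_folds=4):
    print(sorted({ids[i] for i in test_idx}), "held out;", len(train_idx), "training rows")
```

Splitting on subjects rather than on rows is what keeps correlated repeated measurements of one person from appearing on both sides of the split.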
Otherwise, if other internal validation methods are used and the training and test sets are randomly permuted over individual measurements rather than subjects, it is highly probable that some measurements of a subject fall in the training set and others in the test set. If such repeated measurements are highly correlated, the accuracy of the diagnosis system is overestimated. To reduce estimation variance, subject-wise cross-validation with a larger test sample size is preferred over leave-one-subject-out cross-validation [135]. It is also essential to report various performance indices since they convey information that is critical in clinical systems. One of the most important formulas related to the posterior probability is the following [136]:

$$\mathrm{PPV} = P(D \mid E) = \frac{Se \times Prev}{Se \times Prev + (1 - Sp) \times (1 - Prev)} \qquad (1)$$

where Se is the sensitivity, Sp is the specificity, Prev is the prevalence of the disease, D is the positive condition event determined by the gold standard, and E is the positive test outcome event determined by the diagnosis system. The parameter PPV is the disease probability given that the patient's test result is positive, which is essential when the system is used in practice. The PPV drops significantly in imbalanced datasets, in which the prevalence of the disease is low. For example, when a CAD with Se and Sp of 80% and 95% is tested in practice where Prev is 10%, the expected PPV is 64%. A minimum sensitivity of 80% and specificity of 95% [137], a maximum False Discovery Rate (FDR = 1-PPV, Positive Predictive Value) of 5% [138], and a minimum Diagnostic Odds Ratio (DOR) of 100 [139] could be considered reasonable requirements of a reliable clinical diagnosis system. As a complementary condition, a minimum Negative Predictive Value (NPV) of 95% could be listed [136]. Some of the published works on mental health provided a variety of performance indices. For example, Lee et al.
[62] designed a diagnostic model using biomarkers in peripheral blood to diagnose BD-II with a 90% specificity and sensitivity of 85%. Ildiz et al. [73] obtained 94% sensitivity, specificity, and precision of their analytical model to diagnose SZ and BD. Alici et al. [63] proposed the utility of optical coherence tomography (OCT) data to distinguish BD-I patients from controls with a sensitivity of 87.5%, a specificity of 47.5%, positive predictive value (PPV) of 52.5%, and negative predictive value (NPV) of 79.2%. Fernandes et al. [66] reached a sensitivity of 88.29% and specificity of 71.11% for BD vs. control, a sensitivity of 84% and specificity of 81% for SZ vs. control, and sensitivity of 71% and specificity of 73% for BD and SZ. Achalia et al. [74] used multimodal neuroimaging and neurocognitive measures to differentiate BD patients from healthy controls and obtained a sensitivity of 82.3% and specificity of 92.7%. Li et al. [140] obtained a sensitivity of 80.6% and specificity of 86.3% in predicting AD with Actigraphy Data. Li et al. [132] showed that their spectral CNN could achieve a sensitivity of 88.24% and specificity of 95.45% in AD and normal control classification, a sensitivity of 92.86% and specificity of 77.78% in AD and MCI classification, and sensitivity of 84.38% and specificity of 92% in MCI and normal control classification. A machine learning approach was used by Bin-Hezam and Ward [102] to detect dementia and yielded a precision of 91.34%, a sensitivity of 91.53%, and F1 score of 91.41% for dementia vs. non-dementia, a precision of 76.76%, sensitivity of 77.00%, and F1 score of 76.35% for control normal (CN) vs. MCI vs. dementia. Choi et al. [104] proposed a novel framework for dementia identification with an F1 score of 78%, sensitivity of 93.43%, specificity of 89.66%, positive likelihood ratio of 9.0319, a negative likelihood ratio of 0.0732, PPV of 0.5064, and NPV of 0.9917. Chen et al. 
[117] used ensemble learning to predict suicide attempts/death following a visit to psychiatric specialty care. The sensitivity, specificity, PPV, and NPV of the 90-day prediction model were 47.2%, 96.6%, 34.9%, and 97.9%. Ensemble learning was also used by Naghavi et al. [42] for the prediction of suicide ideation/behavior. The proposed system had the sensitivity, specificity, PPV, and DOR of 81%, 98%, 94%, and 227, respectively. In such examples, various performance indices could provide valuable information about the designed systems' clinical reliability. Otherwise, it is not possible to judge the clinical applications of CAD systems. Following the STARD and TRIPOD guidelines, it is necessary to provide the confidence interval (CI) 95% of the performance indices [141, 142] . Such CI 95% values could identify the reliability of the performance indices estimation [143] . For example, in the study by Shang-Ming Zhou et al. [103] , effective predictors related to hospital admission of dementia patients such as blood glucose were found with a sensitivity of 0.758 (95% CI 0.731-0.785), specificity of 0.759 (95% CI 0.710-0.808), precision of 0.766 (95% CI 0.735-0.797), and negative predictive value of 0.751 (95% CI 0.741-0.761). Xuemei Ding et al. [98] achieved a multiclass accuracy of 0.8 (95% CI 0.67-0.89) to classify Alzheimer's disease severity. Kelvin KF Tsoi et al. [144] showed that the combination of drawing behavioral data and digital platform could be useful in early detection of dementia with a sensitivity of 0.742 (95% CI 0.702-0.779), specificity of 0.724 (95% CI 0.668-0.776), positive predictive value of 0.833 (95% CI 0.804-0.859), and negative predictive value of 0.601 (95% CI 0.562-0.640). Klaus Munkholm et al. 
[70] demonstrated that a composite marker containing different molecular levels and tissue data is an operational biomarker to discriminate bipolar disorder from healthy subjects with an Area Under the ROC Curve (AUC) of 0.826 (95% CI 0.749-0.904). Utilizing optical coherence tomography, Soner Alici et al. [63] reported an AUC of 0.688 (95% CI 0.604-0.771) in comparing bipolar disorder and healthy individuals. In 2016, Guoqing Zhao et al. [64] reported that plasma mBDNF and proBDNF levels were the best biomarkers for identifying bipolar disorder among patients in depressive episodes, with an AUC of 0.858 (95% CI 0.753-0.963). In the study by Noa Tsujii et al. [67], a high AUC of 0.917 (95% CI 0.849-0.985) was obtained based on hemodynamic response and mitochondrial dysfunction to distinguish bipolar disorder from major depressive disorder. Naghavi et al. [42] assessed the suicide ideation/behavior performance using different indices and their 95% CIs.

Various inputs were used in the literature for mental and neurological disorder diagnosis. They include, for example, Child Behavior Checklist [145], serum miRNA [62], blood serum Raman spectra [73], optical coherence tomography [63], blood samples [64, 65], immune and inflammatory biomarkers in peripheral blood and cognitive biomarkers [66], blood sample Nuclear Magnetic Resonance (NMR) [69], optical coherence tomography [64], MRI [76, 82], fMRI [103, 114, 118], rs-fMRI [72, 86], PET [96], EEG [79, 81], steady-state visual evoked potentials (SSVEP) [71], speech signal [86], demographics and medical history [102], or drawing behavior [144]. Moreover, demographic, socioeconomic, and medical records [109], fMRI [40], Weibo posts [109], questionnaire and web-based survey [40], and a Reddit social media dataset [126] were used to predict or diagnose suicide ideation, behavior, or death.
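The 95% confidence intervals quoted for the performance indices above depend on how many subjects stand behind each estimate. As a rough sketch of how such an interval can be obtained for a proportion such as sensitivity or specificity, the snippet below uses the Wilson score interval. This is an assumption for illustration only: the reviewed studies do not all state their interval method, the function name `wilson_ci` is hypothetical, and the counts are made up.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total
                         + z * z / (4.0 * total * total)) / denom
    return center - half, center + half


# Hypothetical counts: 80 true positives among 100 diseased subjects,
# then the same sensitivity estimated from ten times as many subjects.
for tp, n_pos in ((80, 100), (800, 1000)):
    lo, hi = wilson_ci(tp, n_pos)
    print(f"sensitivity={tp / n_pos:.2f}  95% CI=({lo:.3f}, {hi:.3f})  n={n_pos}")
```

The same point estimate comes with a much narrower interval at the larger sample size, which is why an index reported without its interval says little about the reliability of the estimate.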
Functional neuroimaging techniques, such as PET and fMRI, enable mapping the brain's physiology by measuring blood flow, receptor-ligand binding, and metabolism. Such techniques have recently been used in mental health, which has improved understanding of the underlying mechanisms [146]. Functional imaging is divided into resting-state studies (e.g., rs-fMRI) and studies in active conditions. On the other hand, structural neuroimaging, such as NMR and MRI, has been widely used to exclude organic brain disease in mental disorders. It was shown in the literature that structural brain imaging is clinically useful to discriminate mental disorders, including SZ, BD, depression (MDD), and AD [147]. Both functional and structural neuroimaging techniques (except CT scans) were shown to be useful for suicide diagnosis [148]. Both techniques have advantages and disadvantages (e.g., spatial versus temporal resolution) [149], and their combination, also known as multimodal neuroimaging, can yield important insights due to its complementary spatiotemporal resolution [150]. Lei et al. used the combination of MRI and rs-fMRI for diagnosing SZ patients. In this study, multimodal neuroimaging performed better than structural or functional neuroimaging separately [151].

A promising feature for BD-II diagnosis, serum miRNA, was introduced by Lee et al. [62]. In this study, serum expression levels of miR-7-5p, miR-23b-3p, miR-142-3p, miR-221-5p, and miR-370-3p were significantly lower in healthy controls than in BD-II patients (Figure 6). The diagnostic model with a support vector machine (SVM) reached good diagnostic accuracy (AUC: 0.907) when using the expression of miR-7-5p + miR-142-3p + miR-221-5p + miR-370-3p. Perhaps the most commonly used features for suicide ideation/attempt prediction are demographics, socioeconomic status (SES), and lifestyle variables. For example, Jung et al.
[113] designed a suicide prediction model for middle and high school students based on the multivariate logistic regression and reached the prediction accuracy of 77.9%. The selected significant features included gender, school grade, city type, academic achievement, living with parents, family SES, father's and mother's education, physical activity, and self-rated weight and health. A variety of classification methods were used in the literature to classify mental and neurological disorders. The support vector machine (SVM) was used to diagnose BD [62] . Partial least squares discriminant analysis (PLS-DA) [66] , k-nearest neighbor [71] , deep convolutional neural network (CNN) [78] , and Fisher linear discriminant (FLD) [86] were used for SZ classification. The multivariate logistic regression (MLR) [67] , deep integrated support vector machine (DISVM) [93] , CNN [94] , and SVM [96] were used to classify depression. The SVM, artificial neural network (ANN), decision tree [106] , and CNN [99] were used for AD/MCI diagnosis. Many classifiers were used for suicide ideation, behavior, or death prediction in the literature, including logistic regression with/without regularization [99] , deep neural networks (DNNs) [104, 125] , decision tree algorithm [99] , SVM [40] , random forests [104, 125] , Gaussian Naive Bayes (GNB) [40] , extreme gradient boosting (XGB) [40] , Cox regression [116] , ensemble learning [117] , elastic net [41] , and long short-term memory convolutional neural network (LSTM-CNN) [126] . Decision tree, or its ensemble extensions such as random forests were frequently used for mental health in the literature [42, [105] [106] [107] [108] 112, 118, 120, 122] . A decision tree is a rule-based system, wherein its simplest form is a clinically interpretable structure for clinicians used in clinical decision analysis [152] . Naghavi et al. 
[42] used the combination of stability feature selection and stacked ensembled decision trees (Figure 7) for suicide ideation/behavior diagnosis and reached an AUC of 0.9. In this study, a variety of questionnaires and demographic information was used. The classifiers used for mental health could be categorized into two main categories: traditional machine learning (e.g., DA and its variants, SVM, decision tree), and deep learning (LSTM, CNN). A deep neural network (DNN) is an artificial neural network with more than one hidden layer. Unlike many traditional classifiers such as linear discriminative analysis (LDA), SVM, or Decision Tree (DT), where few parameters must be estimated or tuned, DNNs have many tunable variables. Thus, they require massive amounts of data to estimate their parameters accurately. When the available data is limited, various issues must be considered to avoid overfitting [153] . Strategies such as early stopping criteria, data augmentation, dropouts, and regularization are used [154] . Moreover, when the dataset is imbalanced (e.g., the mental disorder classification) specific deep learning techniques must be taken into account [155] . Geometrical augmentation is usually used to increase the image sample size by random rotation, translation, and horizontal flipping. However, it was shown that such augmentations do not necessarily improve the predictive accuracy of the deep learning methods [156] . DNNs were used in the literature for multimodal neuroimaging classification in mental health [157] . Although DNNs are promising, they usually appear to be black boxes. The input is the raw data, and the output is the predicted class, and no internal interpretation is provided. It is problematic since clinicians require proper interpretation of abnormal brain regions, for example, in neuroimaging data [158] . There have been some attempts to visualize the black box of the DNNs in the literature [159] . 
Statistical models such as MLR and Cox regressions were used in mental health literature [67, 116] . MLR is an extension of the linear regression when the outcome is binary. It not only provides the probability that a sample belongs to an output class, but it also identifies the significant features in the model. Thus, it is also a feature selection method [160] . On the other hand, Cox regressions are time-to-event models where the event of interest (e.g., committing suicide) and the event's time (e.g., the time from the suicide attempt to the previous hospitalization) are essential. Such models are usually used in survival analysis. When a proper threshold is estimated, it is possible to dichotomize the model's continuous output risk for discrimination between output classes [161] . Unlike other classification methods, both MLR and Cox models support mixed-type input data, and no transformation is required to perform on nominal or ordinal data. Bayes' theorem (Equation (1)) was addressed in the literature as a confounding effect of the low prevalence of a disorder on the performance of the CAD systems [162] , even when the AUC is very high [163] . Events such as suicide attempt/death have a low prevalence in the population (e.g., 10.7 per 100,000 individuals [164] ). Other mental and neurological disorders have a relatively low prevalence (e.g., the global prevalence of 1% for SZ [165] ). Thus, they can only be reliably predicted using an extraordinary discrimination capability between higher and lower risk groups. Suppose that a CAD system has a Sensitivity of 90% and a Specificity of 95% based on the cross-validated confusion matrix, which is very good for an imbalanced dataset. The probability that the new subject has the disorder, subject to the positive CAD result, could be estimated using Equation (1) for different disease prevalence (Figure 8 ). For example, with the prevalence of 1% in such disorders, the PPV is only 15%. 
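The prevalence effect described above can be checked numerically. The sketch below evaluates the standard Bayes form of the PPV, PPV = Se·Prev / (Se·Prev + (1 − Sp)·(1 − Prev)), together with the analogous NPV expression, for the Se = 90% and Sp = 95% example in the text. The function names `ppv` and `npv` and the chosen prevalence grid are illustrative assumptions.

```python
def ppv(se, sp, prev):
    """Positive predictive value from sensitivity, specificity, prevalence."""
    true_pos = se * prev
    false_pos = (1.0 - sp) * (1.0 - prev)
    return true_pos / (true_pos + false_pos)


def npv(se, sp, prev):
    """Negative predictive value from sensitivity, specificity, prevalence."""
    true_neg = sp * (1.0 - prev)
    false_neg = (1.0 - se) * prev
    return true_neg / (true_neg + false_neg)


# Same classifier (Se=0.90, Sp=0.95) applied at different prevalences:
# 1% (rare disorder), 10%, and 50% (artificially balanced dataset).
for prev in (0.01, 0.10, 0.50):
    print(f"prevalence={prev:.2f}  "
          f"PPV={ppv(0.90, 0.95, prev):.3f}  "
          f"NPV={npv(0.90, 0.95, prev):.3f}")
```

At 1% prevalence the PPV comes out at roughly 15%, while on a balanced dataset the same sensitivity and specificity give roughly 95%, so an index computed on a balanced sample cannot be carried over to a low-prevalence population unchanged.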
If the dataset is balanced for the analysis (e.g., 3549 suicide-indicative posts versus 3652 non-suicidal posts in [126]), the PPV is 95% on the analyzed dataset. However, when the system is used in practice (a prevalence of 1%), the PPV drops to 15%. Thus, the analyzed dataset must resemble the population. This resemblance is only preserved when proper sampling and sample size calculation are performed.

Among the studies analyzed in Tables 1-11, some use the EEG signal for diagnosis. In such studies, the number of EEG channels is shown in the tables. It is also necessary to report discriminative features based on the traditional frequency bands as important clinical biomarkers in such studies. It is not enough to show whether the classification system has an acceptable accuracy, as these discriminative features are very important for clinicians. The spatial distribution of such features over the scalp must also be provided [166]. In EEG studies, either the resting state [166] or evoked or cognitive functions [167] were used for mental disorders. An example comparing schizophrenia patients and healthy subjects during cognitive functions is provided in Figure 9. It showed significantly lower power in the gamma, beta, theta, and alpha bands in healthy subjects than in schizophrenia patients. The difference extended over more or less the entire brain, in agreement with the theory that schizophrenia is not a lesion of one part of the brain but a disconnection syndrome. This disconnection would be expressed in a failure to modulate synchronous activity caused by disturbances in the dopaminergic mechanism [168]. It is hypothesized that information flow across larger cortical networks is projected by low-frequency brain oscillations, while local cortical information processing is represented by high-frequency oscillations [169].
Thus, the interaction between different high- and low-frequency bands, also known as cross-frequency coupling (CFC) (Figure 10), could provide valuable insights into brain functions [170] and mental disorder diagnosis [171]. Such a representation is currently used instead of a simple energy representation of different frequency bands. However, as the dimension increases, it is essential to select connected or disconnected regions of interest and representative interactions. The EEG amplitude modulation analysis (Figure 11) has been used to diagnose AD [172]. First, the full-band EEG signal was decomposed into five sub-bands (delta, theta, alpha, beta, and gamma). The Hilbert transform was used to extract the envelope of each sub-band signal. A second frequency decomposition was then used based on modulation filters to represent cross-frequency modulation interaction [173]. The m-delta modulation frequency content in the theta frequency band could discriminate between healthy normal subjects and mild and moderate AD patients (Figure 12).

Figure 10 (caption partially recovered). A single row for each condition is generated by merging data from all channels (reproduced with permission from [171]).

Figure 11. Signal processing steps used to compute resting EEG spectro-temporal modulation energy (reproduced with permission from [172]). The modulation frequency bands were shown as m-delta (0.5-4 Hz) or m-theta (4-8 Hz).

This review focused on the data mining methods proposed in the literature to classify major mental and neurological disorders, namely SZ, BD, MDD, AD, and suicide ideation, attempt, or death. More than 68 peer-reviewed journal papers published since 2016 were considered, among which 75% were published in 2018 or later. Alonso et al. [174] provided a systematic review of the major mental and neurological disorders. However, they analyzed papers published by 2017, and the data mining validation frameworks and methods focused on in our study were not covered in their study.
Moreover, other (systematic) reviews on this topic have been published in the literature [175]. Jo et al. [153] analyzed deep learning papers on AD diagnosis and prognosis that used neuroimaging data and were published between January 2013 and July 2018. Librenza-Garcia et al. [176] analyzed machine learning papers on BD diagnosis, personalized treatment, and prognosis published up to January 2017. de Filippis et al. [177] analyzed machine learning methods for SZ diagnosis based on structural and functional MRI published between 2012 and 2019. Castillo-Sánchez et al. [26] reviewed machine learning methods for suicide risk assessment on social networks from 2010 until December 2019. Although the classifiers, sample size, input features, and performance were taken into account in such studies, the validation type and framework were not directly analyzed. Together with not following the related clinical standards such as STARD and TRIPOD, these issues could hinder the widespread application of machine learning methods in practice.
Our study has some limitations. First, we only considered PubMed for the search strategy. Including other online databases such as ISI, Embase, Google Scholar, and the Cochrane Collaboration could improve our initial screening records. Second, we only focused on SZ, BD, depression (MDD), AD, dementia, and suicide; other significant disorders, including anxiety and headache, were not considered. Third, we mainly focused on the validation type and framework from the biostatistical perspective, although feature extraction, feature selection, and classifiers are also essential issues in machine learning.
In our study, the epidemiological information from the GBD was provided to identify the importance of such disorders, and the gold standard methods for their diagnosis were briefly reviewed.
The CAD systems were classified based on the classification goal, sample size, neuroimaging technique, number of channels (for EEG signals), type of validation in terms of internal and external (subject-based) methods, type of validation in terms of holdout, cross-validation, and resampling methods, and the performance index and its value. We also discussed the importance of reporting a variety of performance indices and their 95% CI. Some frequency-domain features used in the literature for major mental and neurological disorders were reviewed.
Some issues must be taken into account for better clinical application of CAD systems in this field [136]. The discrimination of the classification features over the recording electrodes and (or) their interactions must be presented using a simple and intuitive method. The system must be validated using proper performance indices and statistical tests. The clinical reliability of the proposed system must also be assessed in terms of Type I, II, and III errors. A clinical interpretation, for example using activity maps, must be provided. Rule-based systems or interaction networks are preferred over black-box methods to facilitate clinical interpretation and validation [178]. Standardization (e.g., of the brain frequency bands) and benchmark datasets could facilitate comparison with the state of the art and thus improve the effectiveness of CAD systems in diagnosing major mental disorders, neurological disorders, and suicide.
The following issues must be taken into account to improve the clinical application of CAD systems for mental health: The related standards, including STARD and TRIPOD, must be used. TRIPOD-Artificial Intelligence (AI) is now underway due to AI applications in CAD [179, 180]. Proper performance indices must be provided together with their interpretation. This is especially critical when the database is imbalanced, since some indices could be biased [136].
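The dependence of an index such as the PPV on class balance can be made concrete with Bayes' rule. The following is a minimal sketch rather than a reanalysis of any reviewed study: it uses the class counts quoted earlier for [126] and a 1% population prevalence, and it assumes a hypothetical sensitivity and specificity of 0.95 each. These two values and the function name `ppv` are illustrative assumptions, not figures reported in the source.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value obtained from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point: the 0.95 values are assumptions for
# illustration, not results reported for the classifier in [126].
SENSITIVITY = 0.95
SPECIFICITY = 0.95

# Class ratio of the nearly balanced dataset quoted in the text.
balanced_prevalence = 3549 / (3549 + 3652)
# Prevalence assumed when the system is used in practice.
population_prevalence = 0.01

if __name__ == "__main__":
    for label, prev in [("analyzed dataset", balanced_prevalence),
                        ("population", population_prevalence)]:
        value = ppv(SENSITIVITY, SPECIFICITY, prev)
        print(f"{label}: prevalence={prev:.3f}, PPV={value:.3f}")
```

Under these assumed values, the PPV is about 0.95 at the class ratio of the analyzed dataset and about 0.16 at a prevalence of 1%, which is of the same order as the drop described earlier in the text. Sensitivity and specificity stay fixed in this sketch, so the change comes from the prevalence alone.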
The 95% CI of the performance indices must be provided. It is especially critical for the AUC: if its 95% CI includes 0.5, the performance of the diagnostic method is not significantly better than that of a random generator. The prevalence of the disorder in the analyzed dataset must resemble its actual prevalence in the population. Otherwise, the performance of the method in practice, i.e., the PPV, deteriorates markedly. A proper validation framework must be used to avoid Type III error. External validation is the best method to improve the generalization of the designed CAD. The clinical interpretation of the input features, their ranking, and the classifier structure must be provided for clinicians.
The status of mental health promotion Mental Health in Australia: A Quick Guide Statewide Peer Network Development Program for Recovery and Resiliency Grants; Department of Health and Human Services Substance Abuse and Mental Health Services Administration The association between comorbid psychiatric diagnoses and hospitalization-related factors among individuals with schizophrenia The global burden of mental, neurological and substance use disorders: An analysis from the Global Burden of Disease Study Institute for Health Metrics and Evaluation (IHME). 
GBD Compare Data Visualization The Mental Healthcare Act, 2017: A Ray of Hope The size and burden of mental disorders and other disorders of the brain in Europe Mental health issues and challenges in India: A review Detection of mental disorders with the Patient Health Questionnaire in primary care settings in Nigeria Time to end the distinction between mental and neurological illnesses Global, regional, and national burden of neurological disorders during 1990-2015: A systematic analysis for the Global Burden of Disease Study Comparing 3 T and 1.5 T MRI for tracking Alzheimer's disease progression with tensor-based morphometry Trends in Alzheimer's disease and dementia in the asian-pacific region The neuropathological diagnosis of Alzheimer's disease World health organization's comprehensive mental health action plan 2013-2020 WHO Organization. Others Public Health Action for the Prevention of Suicide: A Framework Suicide and Youth: Risk Factors Suicide and suicidal behaviour Global Burden of Disease Self-Harm Collaborators Global, regional, and national burden of suicide mortality 1990 to 2016: Systematic analysis for the Global Burden of Disease Study Disease Control Priorities in Developing Countries The treatment gap in mental health care Suicide and poverty in low-income and middle-income countries: A systematic review A systematic literature review of technologies for suicidal behavior prevention Suicide risk assessment using machine learning and social networks: A scoping review The growing burden of neurological disorders in low-income and middle-income countries: Priorities for policy making Impairment in role functioning in mental and chronic medical disorders in the United States: Results from the National Comorbidity Survey Replication Impact of psychiatric disorders on health-related quality of life: General population survey Trends in sickness benefits in Great Britain and the contribution of mental disorders A population-based cohort study of the 
effect of common mental disorders on disability pension awards Primary care psychiatry: Pertinent Arabian perspectives Association between psychological distress and mortality: Individual participant pooled analysis of 10 prospective cohort studies Adolescent suicide and suicidal behavior Suicide and the media The lifetime risk of suicide in schizophrenia: A reexamination Effect of neurological screening on early dementia detection in southern Italy The role of neuroimaging in diagnosis and personalized medicine-Current position and likely future directions Machine learning and stress assessment: A review Machine learning of neural representations of suicide and emotion concepts identifies suicidal youth Predictors of suicide attempt in patients with obsessive-compulsive disorder: An exploratory study with machine learning analysis Accurate Diagnosis of Suicide Ideation/Behavior Using Robust Ensemble Machine Learning: A University Student Population in the Middle East and North Africa (MENA) Region. Diagnostics Medical Big Data: Neurological Diseases Diagnosis Through Medical Data Analysis Computer-Aided Diagnosis Systems for Brain Diseases in Magnetic Resonance Images Structured Clinical Interview for the DSM (SCID) Assessing the Accuracy of Diagnostic Tests The right services, at the right time, for the right people Prospective: Is bipolar disorder being overdiagnosed? Do we have any solid evidence of clinical utility about the pathophysiology of schizophrenia? World Psychiatry Looking for a "biological test" to diagnose "schizophrenia": Are we chasing red herrings? 
World Psychiatry Plasma p-tau181 accurately predicts Alzheimer's disease pathology at least 8 years prior to post-mortem and improves the clinical characterisation of cognitive decline Predicting Clinical Dementia Rating Using Blood RNA Levels Dementia=(MC)2: A 4-item screening test for mild cognitive impairment and dementia Reliability, and Validity of the Vietnamese Version of the Clinical Dementia Rating A systematic review and evaluation of measures for suicidal ideation and behaviors in population-based research The Suicidal Behaviors Questionnaire-Revised (SBQ-R): Validation with clinical and nonclinical samples Computer-Assisted Psychiatric Diagnosis Artificial Intelligence in Psychiatry How to write a review article? Turk PRISMA Group Preferred reporting items for systematic reviews and metaanalyses: The PRISMA statement Serum miRNA as a possible biomarker in the diagnosis of bipolar II disorder Optical coherence tomography findings in bipolar disorder: a preliminary receiver operating characteristic analysis on ganglion cell layer volume for diagnosis Ratio of mBDNF to proBDNF for Differential Diagnosis of Major Depressive Disorder and Bipolar Depression Towards a blood-based diagnostic panel for bipolar disorder Precision psychiatry with immunological and cognitive biomarkers: a multi-domain prediction for the diagnosis of bipolar disorder or schizophrenia using machine learning Mitochondrial DNA Copy Number Raises the Potential of Left Frontopolar Hemodynamic Response as a Diagnostic Marker for Distinguishing Bipolar Disorder From Major Depressive Disorder. 
Front Objective smartphone data as a potential diagnostic marker of bipolar disorder Peripheral biomarkers allow differential diagnosis between schizophrenia and bipolar disorder A multisystem composite biomarker as a preliminary diagnostic test in bipolar disorder Classification of bipolar disorder and schizophrenia using steady-state visual evoked potential based features Classification of Unmedicated Bipolar Disorder Using Whole-Brain Functional Activity and Connectivity: A Radiomics Analysis Auxiliary differential diagnosis of schizophrenia and phases of bipolar disorder based on the blood serum Raman spectra A proof of concept machine learning analysis using multimodal neuroimaging and neurocognitive measures as predictive biomarker in bipolar disorder Individualized identification of euthymic bipolar disorder using the Cambridge Neuropsychological Test Automated Battery (CANTAB) and machine learning Anatomical connectivity changes in bipolar disorder and schizophrenia investigated using whole-brain tract-based spatial statistics and machine learning approaches Multi-Site Diagnostic Classification of Schizophrenia Using Discriminant Deep Learning with Identifying Schizophrenia Using Structural MRI With a Deep Learning Algorithm. Front Automatic Detection of Schizophrenia by Applying Deep Learning over Spectrogram Images of EEG Signals Transfer learning with deep convolutional neural network for automated detection of schizophrenia from EEG signals Classification of People who Suffer Schizophrenia and Healthy People by EEG Signals using Deep Learning Multisite Machine Learning Analysis Provides a Robust Structural Imaging Signature of Schizophrenia Detectable Across Diverse Patient Populations and Within Individuals Machine-learning-based diagnosis of schizophrenia using combined sensor-level and source-level EEG features Can we accurately classify schizophrenia patients from healthy controls using magnetic resonance imaging and machine learning? 
A multi-method and multi-dataset study Combination of G72 Genetic Variation and G72 Protein Level to Detect Schizophrenia Generalizability of machine learning for classification of schizophrenia based on resting-state functional MRI data Multimodal Discrimination of Schizophrenia Using Hybrid Weighted Feature Concatenation of Brain Functional Connectivity and Anatomical Features with an Extreme Learning Machine. Front A novel fuzzy rough selection of non-linearly extracted features for schizophrenia diagnosis using fMRI Language in schizophrenia: relation with diagnosis, symptomatology and white matter tracts Deep Convolutional Neural Network Model for Automated Diagnosis of Schizophrenia Using EEG Signals A Computer-Aided Diagnosis System With EEG Based on the P3b Wave During an Auditory Odd-Ball Task in Schizophrenia Bi-objective approach for computer-aided diagnosis of schizophrenia patients using fMRI data A depression recognition method for college students using deep integrated support vector algorithm EEG-based mild depression recognition using convolutional neural network Automatic Interaction Detection Modeling for Predicting Depression in Multicultural Female Students The influence of the rs6295 gene polymorphism on serotonin-1A receptor distribution investigated with PET in patients with major depression applying machine learning Convolutional Neural Networks-Based MRI Image Analysis for the Alzheimer's Disease Prediction From Mild Cognitive Impairment A hybrid computational approach for efficient Alzheimer's disease classification based on heterogeneous data Alzheimer's Disease Neuroimaging Initiative Multiscale deep neural network based analysis of FDG-PET images for the early diagnosis of Alzheimer's disease Combining EEG signal processing with supervised methods for Alzheimer's patients classification For the Alzheimer's Disease Neuroimaging Initiative Discrimination of Alzheimer's Disease using longitudinal information Tomas A Machine Learning 
Approach towards Detecting Dementia based on its Modifiable Risk Factors Mining electronic health records to identify influential predictors associated with hospital admission of patients with dementia: an artificial intelligence approach Deep learning based low-cost high-accuracy diagnostic framework for dementia using comprehensive neuropsychological assessment profiles A hybrid data mining model for diagnosis of patients with clinical suspicion of dementia Quad-phased data mining modeling for dementia diagnosis Predicting Risk of Suicide Attempts Over Time Through Machine Learning Predicting suicide attempts in adolescents with longitudinal clinical data and machine learning Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study Classification of Suicide Attempts through a Machine Learning Algorithm Based on Multiple Systemic Psychiatric Scales Classification of suicide attempters in schizophrenia using sociocultural and clinical features: A machine learning approach Use of a Machine Learning Algorithm to Predict Individuals with Suicide Ideation in the General Population Prediction models for high risk of suicide in Korean adolescents using machine learning techniques Machine learning based suicide ideation prediction for military personnel Machine learning for suicide risk prediction in children and adolescents with electronic health records Ten-year prediction of suicide death using Cox regression and machine learning in a nationwide retrospective cohort study in South Korea Predicting suicide attempt or suicide death following a visit to psychiatric specialty care: A machine learning study using Swedish national registry data Machine Learning to Differentiate Risk of Suicide Attempt and Self-harm After General Medical Hospitalization of Women With Mental Illness Reaching Those at Highest Risk for Suicide: Development of a Model Using Machine Learning Methods for use With Native American Communities 
Detection of Suicide Attempters among Suicide Ideators Using Machine Learning Prospective prediction of suicide attempts in community adolescents and young adults, using regression methods and machine learning Detecting risk of suicide attempts among Chinese medical college students using a machine learning algorithm Assessing the predictive ability of the Suicide Crisis Inventory for near-term suicidal behavior using machine learning approaches A Feasibility Study Using a Machine Learning Suicide Risk Prediction Model Based on Open-Ended Interview Language in Adolescent Therapy Sessions Development of an early-warning system for high-risk patients for suicide attempt using deep learning and electronic health records Detection of Suicide Ideation in Social Media Forums Using Deep Learning Pattern Recognition: A Statistical Approach Statistical Pattern Recognition A k-Sample Slippage Test for an Extreme Population Others Pattern recognition Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration Predicting Clinical Outcomes of Alzheimer's Disease from Complex Brain Networks Alzheimer's Disease Diagnosis Based on Moth Flame Optimization The need to approximate the use-case in clinical machine learning Using and understanding crossvalidation strategies Rigorous performance assessment of computer-aided medical diagnosis and prognosis systems: a biostatistical perspective on data mining The essential guide to effect sizes: statistical power, meta-analysis, and the interpretation of research results An investigation of the false discovery rate and the misinterpretation of p-values Predicting Alzheimer's Disease with Actigraphy Data Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies Using the confidence interval confidently Machine 
Learning on Drawing Behavior for Dementia Screening Further Evidence of the Diagnostic Utility of the Child Behavior Checklist for Identifying Pediatric Bipolar I Disorder Functional neuroimaging in mental disorders Forty years of structural brain imaging in mental disorders: is it clinically useful or not? Structural and functional neuroimaging studies of the suicidal brain Using structural and functional brain imaging to uncover how the brain adapts to blindness General overview on the merits of multimodal neuroimaging data fusion Integrating machining learning and multimodal neuroimaging to detect schizophrenia at the level of the individual Clinical decision analysis: Incorporating the evidence with patient preferences Deep learning in Alzheimer's disease: Diagnostic classification and prognostic prediction using neuroimaging data The theory behind overfitting, cross validation, regularization, bagging, and boosting: Tutorial. arXiv 2019 Survey on deep learning with class imbalance The effectiveness of image augmentation in deep learning networks for detecting COVID-19: A geometric transformation perspective Alzheimer's Disease Neuroimaging Initiative. Multimodal and Multiscale Deep Neural Networks for the Early Diagnosis of Alzheimer's Disease using structural MR and FDG-PET images Deep learning in mental health outcome research: a scoping review Evaluating the visualization of what a deep neural network has learned Logistic regression for feature selection and soft classification of remote sensing data PARS risk charts: A 10-year study of risk assessment for cardiovascular diseases in Eastern Mediterranean Region Can machine-learning methods really help predict suicide? 
Risk assessment and receiver operating characteristic curves Epidemiology of suicide and the psychiatric perspective Schizophrenia: a concise overview of incidence, prevalence, and mortality EEG Frequency Bands in Psychiatric Disorders: A Review of Resting State Studies Abnormal Spontaneous Gamma Power Is Associated With Verbal Learning and Memory Dysfunction in Schizophrenia Oscillations and neuronal dynamics in schizophrenia: the search for basic symptoms and translational opportunities Cross-Frequency Coupling Based Neuromodulation for Treating Neurological Disorders Phase-amplitude cross-frequency coupling in the human nucleus accumbens tracks action monitoring during cognitive control Components of cross-frequency modulation in health and disease Characterizing Alzheimer's disease severity via resting-awake EEG amplitude modulation analysis Neuronal oscillations in cortical networks Data Mining algorithms and techniques in Mental Health: A systematic review Machine learning in mental health Passos, I.C. The impact of machine learning techniques in the study of bipolar disorder: A systematic review Machine learning techniques in a structural and functional MRI diagnostic approach in schizophrenia: a systematic review Computer-aided diagnosis of psychiatric distress in children and adolescents using deep interaction networks: The CASPIAN-IV study Reporting of artificial intelligence prediction models TRIPOD statement: a preliminary pre-post analysis of reporting and methods of prediction models
Funding: This research received no external funding.
The authors declare no conflict of interest.