key: cord-0821358-kc22wz3y
authors: Ghazisaeedi, Marjan; Mahmoodi, Hamed; Arpaci, Ibrahim; Mehrdar, Siavash; Barzegari, Saeed
title: Validity, Reliability, and Optimal Cut-off Scores of the WHO-5, PHQ-9, and PHQ-2 to Screen Depression Among University Students in Iran
date: 2021-01-20
journal: Int J Ment Health Addict
DOI: 10.1007/s11469-021-00483-5
sha: 9b580cc9d0aa98b2143e935d9c0403f41368c0d7
doc_id: 821358
cord_uid: kc22wz3y

This study aimed to investigate the validity, reliability, and optimal cut-off points of the Patient Health Questionnaire-2 (PHQ-2), Patient Health Questionnaire-9 (PHQ-9), and Well-being Index (WHO-5) for screening mild depression among 400 Iranian students who completed these tools and the Beck Depression Inventory (BDI-13). Further, a psychiatrist diagnosed depression using the “Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders.” The validity and internal consistency of the tools were assessed, and their accuracy was computed using the receiver operating characteristic (ROC) curve and the area under the curve (AUC). The internal consistency values of the PHQ-2, PHQ-9, and WHO-5 were .73, .88, and .94, respectively. The PHQ-2 (.53), PHQ-9 (.60), and WHO-5 (.54) were significantly correlated with the BDI. The PHQ-2, PHQ-9, and WHO-5 had optimal cut-off points of 2, 5, and 9, with AUCs of .809, .851, and .823, respectively. Based on these findings, the PHQ-9 is recommended for mild depression screening among medical university students in Iran because of its high sensitivity and specificity.

Spitzer 2002; Liu et al. 2011; Zuithoff et al. 2010). It can be administered faster than the BDI thanks to its brevity and ease of scoring. Each question is rated from 0 to 3 (0, not at all; 1, several days; 2, more than half of all the days; 3, nearly every day), and total scores range from 0 to 27, with 27 indicating the greatest severity of depressive symptoms. The optimal cut-off score of the PHQ-9 can differ between patients and the general community, and between screening and diagnostic purposes (Kendrick et al. 2009; Kroenke et al. 2010). The PHQ-9 was previously translated and validated in Iran for patients (Dadfar et al. 2018a; Khamseh et al. 2011; Omani-Samani et al. 2018); however, it has not been validated for university students.

The Patient Health Questionnaire-2 (PHQ-2) includes the first two items of the PHQ-9 and is usually used as the initial screening instrument for major depressive disorder (MDD). Total scores range from 0 to 6, with 6 indicating the greatest severity of depressive symptoms. The accuracy of the PHQ-2 has been examined in different studies (Dadfar et al. 2019; Jafari et al. 2014).
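The scoring rules described for the PHQ-9 and PHQ-2 can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names and the example responses are made up, and only the 0 to 3 item coding, the 0 to 27 and 0 to 6 ranges, and the fact that the PHQ-2 uses the first two PHQ-9 items come from the text.

```python
from typing import Sequence


def phq9_total(items: Sequence[int]) -> int:
    """Sum nine item responses, each coded 0-3, giving a total of 0-27."""
    if len(items) != 9:
        raise ValueError("the PHQ-9 needs exactly nine item responses")
    if any(r not in (0, 1, 2, 3) for r in items):
        raise ValueError("each response must be 0, 1, 2, or 3")
    return sum(items)


def phq2_total(items: Sequence[int]) -> int:
    """Score the PHQ-2 from a full PHQ-9 response set (first two items, 0-6)."""
    phq9_total(items)  # reuse the validation above
    return items[0] + items[1]


# Hypothetical responses from one student (not study data).
responses = [1, 2, 0, 1, 0, 0, 1, 0, 0]
print(phq9_total(responses))  # 5
print(phq2_total(responses))  # 3
```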

The WHO-5 Well-being Index is a widely used tool for depression screening consisting of five items rated on a 6-point Likert scale as follows: at no time (0), some of the time (1), less than half of the time (2), more than half of the time (3), most of the time (4), and all of the time (5). Total scores range from 0 (worst well-being) to 100 (best well-being). The validity of the WHO-5 was verified among Iranian outpatients (Dadfar et al. 2018b).
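The text gives item codes of 0 to 5 and a total range of 0 to 100 but does not spell out the conversion. The sketch below assumes the conventional WHO-5 scoring, in which the raw sum (0 to 25) is multiplied by 4; the function name and the example responses are hypothetical.

```python
from typing import Sequence, Tuple


def who5_scores(items: Sequence[int]) -> Tuple[int, int]:
    """Return (raw sum 0-25, percentage score 0-100) for five items coded 0-5."""
    if len(items) != 5:
        raise ValueError("the WHO-5 needs exactly five item responses")
    if any(r not in range(6) for r in items):
        raise ValueError("each response must be an integer from 0 to 5")
    raw = sum(items)
    # Assumption: conventional scoring multiplies the raw sum by 4.
    return raw, raw * 4


# Hypothetical responses from one student (not study data).
raw, percentage = who5_scores([3, 2, 4, 1, 3])
print(raw, percentage)  # 13 52
```

Whether a cut-off is applied to the raw sum or to the percentage score changes its meaning, so the scale in use should be stated alongside any threshold.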

The BDI-13 was developed to assess the severity of depression. It has 13 items rated on a four-point Likert scale from "0" to "3" in terms of intensity, and total scores range from 0 to 39, with 39 indicating the greatest severity of depressive symptoms. The BDI-13 has been validated and is widely used in Iran (Dadfar and Kalibatseva 2016).

SPSS 19, STATA, and AMOS were used for the statistical analysis. The characteristics of the participants and the mean depression score for each tool were determined. Normality of distribution was checked with the Shapiro-Wilk test. Further, the independent t test and the nonparametric Mann-Whitney test were used to compare mean depression scores across groups defined by gender, marital status, and academic grade. Concurrent validity was tested using Pearson's correlation (r) between the BDI-13 and the other tools. Internal consistency was measured using Cronbach's α coefficients. Construct validity was evaluated using confirmatory factor analysis (CFA) and fit indices including chi-square/df (χ²/df), Root Mean Square Error of Approximation (RMSEA), Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), Goodness of Fit Index (GFI), Incremental Fit Index (IFI), and Root Mean Square Residual (RMR). The accuracy of the questionnaires was compared against the psychiatrist's diagnosis using the receiver operating characteristic (ROC) curve and the area under the curve (AUC). The sensitivity, specificity, positive and negative predictive values, and optimal cut-off points were calculated for each screening tool.
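As an illustration of the internal consistency measure named above, Cronbach's α for k items is k/(k-1) × (1 − Σ item variances / variance of total scores). The sketch below implements that standard formula with the Python standard library; it is not the SPSS routine the authors ran, and the function name and data are hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in rows):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item column and of the per-respondent totals.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up responses: four respondents, three items coded 0-3 (not study data).
demo = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 2, 3]]
print(round(cronbach_alpha(demo), 3))
```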

The study recruited a total of 442 students; however, 400 participants (90.49%) completed the questionnaires and the SCID interview. Among the participants, 206 (51.5%) were male and 194 (48.5%) were female, and the mean age was 23.67 ± 5.37 years, with a range of 18 to 47 years. Demographic characteristics of the participants and mean differences between groups are presented in Table 1.

The scores of the PHQ-2, WHO-5, PHQ-9, and BDI-13 in males (1.92, SD = 1.6; 9.46, SD = 6.3; 6.72, SD = 5.5; 6.92, SD = 8.5, respectively) and females (1.64, SD = 1.5; 8.85, SD = 6.5; 5.54, SD = 5.1; 4.04, SD = 5.8, respectively) were significantly different for the BDI-13 (P = 0.028) and PHQ-9 (P < .001). The internal consistency of the PHQ-2, WHO-5, PHQ-9, and BDI-13 was .73, .94, .88, and .94, respectively. Factor loadings were greater than the threshold value of .40. The internal consistency results for each tool are shown in Table 2.

CFA was conducted for each tool to test its construct validity. Goodness of fit indices, including the normed fit index, relative fit index, incremental fit index, Tucker-Lewis index, comparative fit index, and root mean square error of approximation, were satisfactory. Table 3 shows the CFA results. Concurrent validity of the scales was assessed using Pearson correlation analysis. The PHQ-2 (r = 0.53), WHO-5 (r = 0.54), and PHQ-9 (r = 0.60) total scores were significantly correlated with the BDI-13 (P < 0.001). Correlations between the other tools were also statistically significant (PHQ-9 and PHQ-2, r = 0.86; PHQ-9 and WHO-5, r = 0.68; PHQ-2 and WHO-5, r = 0.66; P < 0.001).
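For readers who want to see what the concurrent validity coefficient measures, Pearson's r between two sets of total scores can be computed as below. This is a generic sketch of the standard formula, not the authors' SPSS analysis; the function name and the example scores are made up.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired scores x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical PHQ-9 and BDI-13 totals for six students (not study data).
phq9_scores = [2, 5, 9, 12, 4, 15]
bdi13_scores = [1, 6, 7, 14, 2, 20]
print(round(pearson_r(phq9_scores, bdi13_scores), 2))
```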

The areas under the ROC curve of the PHQ-9 (AUC: 0.851, 95% CI: 0.814-0.888), WHO-5 (AUC: 0.823, 95% CI: 0.782-0.863), and PHQ-2 (AUC: 0.809, 95% CI: 0.767-0.851) indicate that the PHQ-9 provided a significantly higher level of discrimination for mild depression (see Fig. 1).
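The AUC can be read as the probability that a randomly chosen depressed student scores higher on the tool than a randomly chosen non-depressed student, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It assumes that higher scores indicate more depressive symptoms, and the function name and scores are hypothetical rather than taken from the study.

```python
from typing import Sequence


def auc_pairwise(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the share of (depressed, non-depressed) pairs ranked correctly."""
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # depressed student scores higher: correct ranking
            elif p == n:
                wins += 0.5   # tie: counted as half
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up screening scores, grouped by a reference diagnosis (not study data).
depressed = [5, 7, 9]
not_depressed = [1, 5, 3]
print(round(auc_pairwise(depressed, not_depressed), 3))  # 0.944
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.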

The accuracy of the PHQ-2, PHQ-9, and WHO-5 at different cut-off points, including sensitivity and specificity, is presented in Table 4. The best cut-off points for mild depression were 2 for the PHQ-2 (sensitivity: 80.22%, specificity: 66.51%), 5 for the PHQ-9 (sensitivity: 84.62%, specificity: 70.18%), and 9 for the WHO-5 (sensitivity: 79.12%, specificity: 70.64%). The results in Table 4 indicate that the PHQ-9 has the highest accuracy and can effectively discriminate between students with and without mild depression.
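A cut-off analysis of this kind can be sketched as follows. The text does not say which criterion was used to pick the best cut-off; the example assumes Youden's index (sensitivity + specificity − 1), a common choice, and treats a score at or above the cut-off as a positive screen. All names and data are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def sens_spec(scores: Sequence[int], diagnosed: Sequence[bool],
              cutoff: int) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= cutoff counts as screen-positive."""
    tp = sum(1 for s, d in zip(scores, diagnosed) if d and s >= cutoff)
    fn = sum(1 for s, d in zip(scores, diagnosed) if d and s < cutoff)
    tn = sum(1 for s, d in zip(scores, diagnosed) if not d and s < cutoff)
    fp = sum(1 for s, d in zip(scores, diagnosed) if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores: Sequence[int], diagnosed: Sequence[bool],
                candidates: Iterable[int]) -> int:
    """Cut-off with the largest Youden index (an assumed selection criterion)."""
    return max(candidates, key=lambda c: sum(sens_spec(scores, diagnosed, c)) - 1)


# Made-up scores and reference diagnoses for eight students (not study data).
scores = [0, 1, 2, 3, 4, 5, 6, 7]
diagnosed = [False, False, False, False, True, False, True, True]

for c in range(1, 8):
    se, sp = sens_spec(scores, diagnosed, c)
    print(f"cut-off {c}: sensitivity {se:.2f}, specificity {sp:.2f}")
print(best_cutoff(scores, diagnosed, range(1, 8)))  # 4
```

Lowering the cut-off raises sensitivity at the cost of specificity, which is why a screening study reports both values at each candidate threshold.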

We aimed to validate the WHO-5, PHQ-9, and PHQ-2 screening tools for depression among Iranian medical sciences students. Consistent with previous studies in different populations (Arroll et al. 2010; Kroenke et al. 2001), the internal consistency of the tools was satisfactory. The concurrent validity results indicated that these tools are significantly correlated with the BDI-13, thereby confirming the results of previous studies (Cameron et al. 2011; Dum et al. 2008). We also examined the goodness of fit for the unidimensional structure of the PHQ-9 and WHO-5 and the three dimensions of the BDI-II (cognitive, somatic, and affective symptoms). In line with prior studies (Al-Turkait and Ohaeri 2010; Guðmundsdóttir et al. 2014; Keum et al. 2018), the results indicated satisfactory goodness of fit for the tools.

Incorporating sensitivity and specificity, the AUC was calculated for each tool to estimate the probability that the tool will correctly classify students as depressed or non-depressed (Hanley and McNeil 1982). The AUC values were greater than .80, indicating that the screening tools were successful (Holmes 1998). The validity of the WHO-5 (.823), PHQ-9 (.851), and PHQ-2 (.809) was thus supported by excellent discrimination. The cut-off point for mild depression on the PHQ-9 was originally recommended as five (Kroenke et al. 2001). The current study confirmed this result: five was the optimal cut-off point for screening mild depression among the participants, with a sensitivity of 84.62% and a specificity of 70.18%. These findings suggest that the PHQ-9 is a successful tool for screening depression among students. Further, the original cut-off point for the PHQ-2 was recommended as three (Kroenke et al. 2003). However, we found that a cut-off point of two gave optimal discrimination, with a sensitivity of 80.22% and a specificity of 66.51%. For the WHO-5, the optimal cut-off point for depression screening among adolescents was recommended as nine (Allgaier et al. 2012). The present study confirmed this result, and the sensitivity and specificity at this cut-off point were 79.12% and 70.64%, respectively.

In conclusion, it is important to note that the PHQ-9, PHQ-2, and WHO-5 are brief, easy to use, valid, and reliable tools in screening depression among Iranian medical university students. The cut-off points of two, five, and nine are recommended to identify students with minor depressive disorder using the PHQ-2, PHQ-9, and WHO-5, respectively. The PHQ-9 had the highest AUC value, and therefore, it is highly recommended to apply the PHQ-9 for screening and follow-up assessment.

The participants received the SCID reference standard assessment after the screening tests, which was one of the strengths of the study. However, the study has certain limitations. First, test-retest reliability was not assessed because follow-up data could not be collected after face-to-face interactions were stopped during the COVID-19 pandemic. Further, the participants were recruited by convenience sampling, and medical students cannot be considered representative of the entire student population in Iran. Therefore, further studies should be conducted with a larger sample recruited by random sampling methods.

Dimensional and hierarchical models of depression using the Beck depression inventory-II in an Arab college student sample

Validation of PHQ-2 and PHQ-9 to screen for major depression in the primary care population

Validation of the PHQ-9 in a psychiatric sample

Measuring depression severity in general practice: Discriminatory performance of the PHQ-9

Psychometric properties of the Persian version of the short Beck depression inventory with Iranian psychiatric outpatients

Reliability and validity of the Farsi version of the patient health questionnaire-9 (PHQ-9) with Iranian psychiatric outpatients. Trends in psychiatry and psychotherapy

Reliability, validity, and factorial structure of the World Health Organization-5 well-being index (WHO-5) in Iranian psychiatric outpatients. Trends in psychiatry and psychotherapy

Validation of the patient health questionnaire-2 with Iranian students

Comparing the BDI-II and the PHQ-9 with outpatient substance abusers

CBT delivered in a specialized depression clinic for college students with depressive symptoms

Brief assessment of depression: Psychometric properties of the Portuguese version of the patient health questionnaire (PHQ-9)

Screening and case-finding instruments for depression: A meta-analysis

A psychometric evaluation of the Icelandic version of the WHO-5

The meaning and use of the area under a receiver operating characteristic (ROC) curve

Enhancing depression screening to identify college students at risk for persistent depressive symptoms

A short, psychiatric, case-finding measure for HIV seropositive outpatients: Performance characteristics of the 5-item mental health subscale of the SF-20 in a male, seropositive sample

A systematic review of studies of depression prevalence in university students

Spiritual well-being and quality of life of Iranian adults with type 2 diabetes. Evidence-Based Complementary and alternative medicine

Management of depression in UK general practice in relation to scores on depression severity questionnaires: Analysis of medical record data

Testing the factor structure and measurement invariance of the PHQ-9 across racially diverse U.S. college students

Comparison of the CES-D and PHQ-9 depression scales in people with type 2 diabetes in Tehran

Emotion regulation and the transdiagnostic role of repetitive negative thinking in adolescents with social anxiety and depression

Detection of depression in low resource settings: Validation of the patient health questionnaire (PHQ-9) and cultural concepts of distress in Nepal

The PHQ-9: A new depression diagnostic and severity measure

The PHQ-9: Validity of a brief depression severity measure

The patient health questionnaire-2: Validity of a two-item depression screener

The patient health questionnaire somatic, anxiety, and depressive symptom scales: A systematic review

Prevalence of depression among Chinese University students: A meta-analysis

Validation of patient health questionnaire for depression screening among primary care patients in Taiwan

Optimal cut-off score for diagnosing depression with the patient health questionnaire (PHQ-9): A meta-analysis

Comparison of prevalence of depression among medical, dental, and engineering students in Patna using Beck's depression inventory II: A cross-sectional study

Prevalence of depression and its determinant factors among infertile patients in Iran based on the PHQ-9

Why do many psychiatric disorders emerge during adolescence?

Prevalence of depression among university students: A systematic review and meta-analysis study. Depression research and treatment

Validating the patient health Questionnaire-9 (PHQ-9) for screening of depression in Tanzania

Rethinking recommendations for screening for depression in primary care

Rapid screening of psychological well-being of patients with chronic illness: Reliability and validity test on WHO-5 and PHQ-9 scales

The patient health questionnaire-9 for detection of major depressive disorder in primary care: Consequences of current thresholds in a cross-sectional study

Conflict of Interest The authors declare that they have no conflict of interest.

Informed Consent The aims of the study were explained to the participants, and informed consent was obtained from the students.

Authors' Contribution All authors contributed to the design, implementation, analysis of the results and to the writing of the manuscript.