key: cord-0018727-bl6m6due
authors: Brañez-Condorena, Ana; Soriano-Moreno, David R.; Navarro-Flores, Alba; Solis-Chimoy, Blanca; Diaz-Barrera, Mario E.; Taype-Rondan, Alvaro
title: Accuracy of the Geriatric Depression Scale (GDS)-4 and GDS-5 for the screening of depression among older adults: A systematic review and meta-analysis
date: 2021-07-01
journal: PLoS One
DOI: 10.1371/journal.pone.0253899
sha: 623fc9606705132fcfef31ea17eb7cabe0247498
doc_id: 18727
cord_uid: bl6m6due

BACKGROUND: The Geriatric Depression Scale (GDS) is a widely used instrument to assess depression in older adults. The short GDS versions that have four (GDS-4) and five items (GDS-5) represent alternatives for depression screening in limited-resource settings. However, their accuracy remains uncertain. OBJECTIVE: To assess the accuracy of the GDS-4 and GDS-5 versions for depression screening in older adults. METHODS: We systematically searched PubMed, PsycINFO, Scopus, and Google Scholar until May 2020 for studies that assessed the sensitivity and specificity of GDS-4 and GDS-5 for depression screening in older adults. We conducted meta-analyses of the sensitivity and specificity of those studies that used the Diagnostic and Statistical Manual of Mental Disorders (DSM) or the International Classification of Diseases-10 (ICD-10) as reference standard. Study quality was assessed with the QUADAS-2 tool. We performed bivariate random-effects meta-analyses to calculate the pooled sensitivity and specificity with their 95% confidence intervals (95% CI) at each reported common cut-off. For the overall meta-analyses, we evaluated each GDS-4 version or GDS-5 version separately by each cut-off, and for investigations of heterogeneity, we assessed similar GDS versions together by each cut-off. We also assessed the certainty of evidence using the GRADE methodology. RESULTS: Twenty-three studies were included and meta-analyzed, assessing eleven different GDS versions. The number of participants included was 5048. When including all versions together, at cut-off 2, GDS-4 had a pooled sensitivity of 0.77 (95% CI: 0.70–0.82) and a pooled specificity of 0.75 (0.68–0.81), while GDS-5 had a pooled sensitivity of 0.85 (0.80–0.90) and a pooled specificity of 0.75 (0.69–0.81). We found results for more than one GDS-4 version at cut-off points 1, 2, and 3, and for more than one GDS-5 version at cut-off points 1, 2, 3, and 4.
Mostly, significant subgroup differences at different test thresholds across versions were found. The accuracy of the different GDS-4 and GDS-5 versions showed a high heterogeneity. There was high risk of bias in the index test domain. Also, the certainty of the evidence was low or very low for most of the GDS versions. CONCLUSIONS: We found several GDS-4 and GDS-5 versions that showed great heterogeneity in estimates of sensitivity and specificity, mostly with a low or very low certainty of the evidence. Altogether, our results indicate the need for more well-designed studies that compare different GDS versions.


Depression is a major global public health issue [1]. Older adults represent a vulnerable group, likely due to aging-related factors, such as loss of skills and decreased functional activity [2]. It is estimated that around 10% to 20% of older adults worldwide live with depression [3]. This condition increases the risk of suicide [4], the risk of complications of comorbidities [5], the use of health services and care costs, and overall mortality [4, 6]. Hence, it imposes a high burden not only on patients but also on healthcare systems.

In older adults, the somatic symptoms of depression are similar to those of other chronic health conditions [7], and mood changes are less prevalent and commonly replaced by physical discomfort [8, 9], which makes diagnosis challenging and delays access to treatment. Thus, some structured depression screening scales that focus on the elderly population have been developed [10]. There are several scales for screening for depression among older adults, such as the Geriatric Depression Scale (GDS) [11], the Center for Epidemiologic Studies Depression Scale (CES-D), and others. However, the GDS is one of the most used to identify depression among older adults. Among its strengths, the GDS may be easier to use in people with cognitive impairment because of its simple yes-no format, and it can be used in hospital and community settings [11].

Its full version contains 30 questions (GDS-30) and requires substantial time for assessment. Therefore, shorter GDS versions, which select some of the GDS-30 items [12, 13], have been proposed for rapid depression assessment in time-restricted scenarios, such as GDS versions with four items (called GDS-4) and GDS versions with five items (called GDS-5) [14-23].

The accuracy of these GDS-4 and GDS-5 versions remains unclear [24]. Although some previous systematic reviews have assessed this subject, they tend to pool different GDS-4 or GDS-5 versions in the same quantitative analysis, even though each version includes different questions [13, 25-27]. Thus, we performed a systematic review to assess the accuracy of the GDS-4 and GDS-5 versions for depression screening in older adults.

We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-Analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) guidelines [28]. The study protocol is registered in PROSPERO (CRD42020170864).

The inclusion criteria were as follows: 1) observational studies that reported the sensitivity and specificity of any of the GDS-4 and GDS-5 versions for the diagnosis of depression, using the DSM or ICD-10 diagnostic criteria as reference standard, since these provide a commonly used and accepted framework for depression diagnosis in clinical practice [29]; 2) studies that were conducted in older adults (at least 2/3 of the study participants must have been ≥ 55 years old); 3) studies that specified the items of the GDS-4 and GDS-5 versions; and 4) studies that provided enough data to construct a 2x2 contingency table to assess sensitivity and specificity. No restrictions on language, publication date, validation of language translation of the short GDS versions, or the mode of test assessment were applied.

We systematically searched the following databases: PubMed, PsycINFO, and Scopus, until April 24, 2020. Additionally, we searched the first 100 results retrieved in Google Scholar up to May 16, 2020. We used Google Scholar to identify grey literature and limited the search to the first 100 records, as systematic reviews usually do [30-32], because it is a large and unspecific source of grey literature that sorts results by relevance. The search strategy is available in S1 Table of the Supplementary Material. Later, we complemented the search by manually reviewing the reference lists of all the studies included in the data selection process, the lists of articles that cited each of these included studies (through Google Scholar), and the lists of studies included in previous systematic or narrative reviews on the subject, until May 2020 [13, 25-27, 33-39].

Initially, we removed all duplicated records using the EndNote software. Two authors (ANF and DRSM) independently screened all results for inclusion, first reviewing the titles and abstracts, and later performing a full-text assessment, through the EndNote software. Any disagreement during the selection process was discussed with a third party (ABC) and resolved by consensus.

Two authors (ANF and BSC) independently performed the data extraction from each included study using a standardized Microsoft Excel sheet. Differences were resolved by a third researcher (ABC). The following variables were extracted: first author, year of publication, country, population characteristics (number of participants, setting, sex, age), inclusion and exclusion criteria, prevalence of depression in the study according to the reference standard, funding, intervention (short GDS version, language of the test, mode of test assessment, GDS-4 or GDS-5 questions, cut-off used, number of true positives, false positives, true negatives, and false negatives), reference standard (International Classification of Diseases [ICD], the Diagnostic and Statistical Manual of Mental Disorders [DSM], structured interview, or others), type of depression evaluated, and numerical results of sensitivity and specificity. When there were doubts about any information reported in the studies, we emailed the authors for clarification.

Two researchers (DRSM and MEDB) independently assessed the risk of bias of the included studies using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool [40]. This tool has four domains: patient selection, index test, reference standard, and flow and timing. The reference standards considered appropriate for this assessment were any version of the DSM or the ICD-10. In case of disagreement, a consensus was achieved with a third researcher (ATR).

Additionally, we used the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) methodology to report the certainty of the evidence [41, 42]. Risk of bias, indirectness, inconsistency, imprecision, and publication bias were assessed. We downgraded the certainty of evidence when fewer than 70% of studies had at least 7 of 10 items at low risk according to QUADAS-2, when fewer than 70% of studies had components (population, index test, or reference standard) similar to the initial diagnostic question, when heterogeneity was moderate or high, when the confidence interval range was greater than or equal to 10%, and when fewer than 4 studies evaluated the outcome of interest.
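The downgrading rules above can be written as a small checklist. The following Python sketch is illustrative only: the class and field names are hypothetical, the mapping of each rule to a GRADE domain follows the order in which the text lists them, and the assumption that certainty starts at "high" and drops one level per reason comes from the general GRADE approach rather than from this section.

```python
from dataclasses import dataclass

# GRADE certainty levels, from best to worst.
LEVELS = ["high", "moderate", "low", "very low"]


@dataclass
class EvidenceBody:
    """Hypothetical summary of the studies behind one pooled estimate."""
    share_low_risk: float  # fraction of studies with >= 7 of 10 QUADAS-2 items at low risk
    share_direct: float    # fraction of studies matching the diagnostic question
    heterogeneity: str     # "low", "moderate" or "high"
    ci_lower: float        # lower 95% CI limit, as a proportion
    ci_upper: float        # upper 95% CI limit, as a proportion
    n_studies: int


def downgrade_reasons(body):
    """Return the list of reasons to downgrade, one per rule in the text."""
    reasons = []
    if body.share_low_risk < 0.70:
        reasons.append("risk of bias")
    if body.share_direct < 0.70:
        reasons.append("indirectness")
    if body.heterogeneity in ("moderate", "high"):
        reasons.append("inconsistency")
    # Rounding guards against floating-point noise at exactly 10 points.
    if round(body.ci_upper - body.ci_lower, 6) >= 0.10:
        reasons.append("imprecision")
    if body.n_studies < 4:
        reasons.append("few studies")
    return reasons


def certainty(body):
    """Assumption: start at 'high' and drop one level per reason."""
    steps = len(downgrade_reasons(body))
    return LEVELS[min(steps, len(LEVELS) - 1)]
```

For example, a hypothetical body of two highly heterogeneous studies with a confidence interval of 0.70 to 0.82 would collect several reasons at once and end at "very low" under these assumptions.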

We conducted meta-analyses of the sensitivity and specificity of each of the GDS-4 and GDS-5 versions whenever studies fulfilled the following condition: 1) There was more than one study that compared the same version of GDS-4 or GDS-5 at the same cut-off point. We performed the meta-analyses of GDS-4 and GDS-5 separately.

When at least four studies were available for a meta-analysis, we used bivariate mixed-effects models with random effects, which account for the correlation between sensitivity and specificity within each study, to provide estimates of effects [43]. When fewer than four studies were available, the mixed-effects model was not appropriate, so we performed meta-analyses of proportions using the exact binomial distribution. We calculated the pooled sensitivity and specificity with their 95% confidence intervals.
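As an illustration of the quantities involved, the Python sketch below computes sensitivity and specificity from a 2x2 table and an exact binomial (Clopper-Pearson) 95% confidence interval for a proportion. It is a simplified sketch, not the Stata procedure used in the review: the function names are hypothetical, and `pooled_sensitivity` naively sums counts across studies, which ignores between-study variation.

```python
from math import exp, lgamma, log


def sens_spec(tp, fp, fn, tn):
    """Sensitivity and specificity from the cells of a 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)


def _binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1 (log-space terms)."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_pmf)
    return min(total, 1.0)


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x events out of n, found by bisection."""
    def bisect(root_is_above):
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if root_is_above(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else bisect(
        lambda p: 1 - _binom_cdf(x - 1, n, p) < alpha / 2)
    # Upper limit: p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else bisect(
        lambda p: _binom_cdf(x, n, p) > alpha / 2)
    return lower, upper


def pooled_sensitivity(tables):
    """Naive pooling: sum TP and FN over (tp, fp, fn, tn) tuples."""
    tp = sum(t[0] for t in tables)
    fn = sum(t[2] for t in tables)
    return tp / (tp + fn), exact_ci(tp, tp + fn)
```

A call such as `sens_spec(45, 20, 5, 80)` uses made-up counts; real inputs would be the true positives, false positives, false negatives, and true negatives extracted from each study.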

In addition, we meta-analyzed altogether the results of the included studies that assessed the same cut-off point of any GDS-4 version. Likewise, we meta-analyzed altogether the results of the included studies that assessed the same cut-off point of any GDS-5 version.

Heterogeneity was assessed through visual assessment of forest plots. We also used visual assessment of forest plots to evaluate whether there were subgroup differences across GDS versions. All analyses were performed using the Stata v14.0 software.

Overall, 2,740 records were retrieved in the database systematic search. After removal of duplicates, 2,254 records were screened, and 71 records were full-text reviewed. From these, we excluded 52 records for not fulfilling the inclusion criteria. Reasons for exclusion are explained in S2 Table. Nineteen records were included in this initial process.

Additionally, we identified seven records that met our inclusion criteria after searching the reference lists of all included studies, the reference lists of previous reviews, and the lists of articles that cited each of the included studies (through Google Scholar), for a total of 26 included records.

The number of participants included was 5048. The number of participants in individual studies ranged from 60 to 586. Regarding the population, one study was performed only in people without dementia [48], and the rest of the studies were performed in both groups of patients [15-19, 21-23, 44-47, 49-61]. Regarding the gold standard used for depression, most studies used the DSM-IV [15, 18, 22, 44-47, 49, 52, 55, 56, 58, 59]. Other standards used were the DSM-III [17, 23], DSM-III-R [19, 51, 60], DSM-IV-TR [54], DSM-V [16, 48], and ICD-10 [21, 49, 53, 57, 61]. One study did not specify which DSM was evaluated [50]. Nine studies additionally used a structured interview to conduct their assessment, such as the Structured Clinical Interview for DSM (SCID) or the Composite International Diagnostic Interview (CIDI) [15, 23, 44-47, 49, 51, 54, 56]. The characteristics of the 23 studies are summarized in Table 1 and detailed in S3 [21], and Apostolo (n = 1) [16]. Table 2. The most assessed questions were number 1 (satisfied with life) and number 3 (life is empty).

Using the QUADAS-2 tool, we found a high risk of bias in most of the studies. There was high risk of bias in the index test domain. Specifically, the question about the lack of pre-specification of the cut-off points used was the most common flaw (Fig 2).

As stated before, we assessed the sensitivity and specificity of studies that used the DSM or ICD-10 diagnosis criteria as a reference standard, for all GDS-4 and GDS-5 versions. Thus, 23 studies were included in these quantitative analyses.

For the GDS-4 assessment, 14 studies with a total of 3266 participants were included. We obtained eleven sensitivity and specificity estimates, which gave information regarding six versions of GDS-4 at different cut-offs: D'Ath at cut-offs 1 and 2; Van Marwijk at cut-offs 1, 2 and 3; Cheng at cut-offs 1, 2, 3 and 4; Martinez at cut-off 2; and Galaria at cut-off 2 (Table 3).

When taken together, GDS-4 versions at cut-off 1 had a pooled sensitivity of 0.90 (95% CI: 0.85-0.93) and a pooled specificity of 0.57 (95% CI: 0.45-0.67), at cut-off 2 had a pooled sensitivity of 0.77 (95% CI: 0.70-0.82) and a pooled specificity of 0.75 (95% CI: 0.68-0.81), and at cut-off 3 had a pooled sensitivity of 0.63 (95% CI: 0.53-0.71) and a pooled specificity of 0.78 (95% CI: 0.69-0.84).

Among the GDS-4 versions, the results for those with the lower cut-off point tend to have a higher sensitivity and a lower specificity. When assessing the sensitivity and specificity estimates, the Galaria at cut-off 2 and the Cheng at cut-off 4 had the greatest balance, the first one favoring the sensitivity and the second one the specificity. We assessed and found differences in sensitivity and specificity estimates for the different GDS-4 versions, at each cut-off point used.
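The trade-off described above, where a lower cut-off gives higher sensitivity and lower specificity, follows directly from how a threshold classifies scores. The Python sketch below shows it on made-up data. The scores and diagnoses are hypothetical, and the convention that a score greater than or equal to the cut-off counts as a positive screen is an assumption, since this section does not state it.

```python
def accuracy_at_cutoff(scores, depressed, cutoff):
    """Sensitivity and specificity when score >= cutoff is a positive screen."""
    tp = sum(1 for s, d in zip(scores, depressed) if d and s >= cutoff)
    fn = sum(1 for s, d in zip(scores, depressed) if d and s < cutoff)
    tn = sum(1 for s, d in zip(scores, depressed) if not d and s < cutoff)
    fp = sum(1 for s, d in zip(scores, depressed) if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical GDS-4 scores (0 to 4) and reference-standard diagnoses.
scores = [0, 1, 1, 2, 3, 4, 0, 0, 1, 2]
depressed = [False, False, True, True, True, True, False, False, False, False]

for cutoff in (1, 2, 3):
    sens, spec = accuracy_at_cutoff(scores, depressed, cutoff)
    print(f"cut-off {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the cut-off can only move participants from screen-positive to screen-negative, so sensitivity cannot increase and specificity cannot decrease as the cut-off rises.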

For the GDS-5 assessment, 15 studies with a total of 3085 participants were included. We obtained thirteen sensitivity and specificity estimates, which gave information regarding five versions of GDS-5 at different cut-offs: De Dios or Ortega at cut-off 2; Hoyl at cut-offs 1, 2 and 3; Martinez at cut-off 2; Apostolo at cut-offs 1, 3, 4 and 5; and Heisel or Cheng at cut-offs 1, 2, 3 and 4 (Table 3).

Among the GDS-5 versions, the results for those with the lower cut-off point tend to have a higher sensitivity and a lower specificity. When assessing the sensitivity and specificity estimates, the De Dios or Ortega version at cut-off 2 had the greatest balance of sensitivity (0.98, 95% CI: 0.96-1.00) and specificity (0.83, 95% CI: 0.79-0.87).

We assessed and found differences in sensitivity and specificity estimates for the different GDS-5 versions, at each cut-off point used.

A summary of the sensitivity analysis and all the forest plots can be found in S1-S6 Figs.


We used GRADE summary of findings (SoF) tables to report the certainty of evidence (Table 3). Overall, the certainty of the evidence was very low, mostly due to concerns about the indirectness of the evidence, inconsistency, and imprecision of the results. However, the De Dios or Ortega GDS-5 version obtained a high certainty of evidence.

The first versions of the GDS-4 and GDS-5 were the D'Ath and Hoyl versions, respectively [14, 15]. However, many other versions have been created in recent years, mostly by testing which combination of GDS-30 items could perform better in terms of sensitivity and specificity [17-20, 22, 23]. In this systematic review, we found five different versions of the GDS-4 instrument and seven different GDS-5 versions. Previous systematic reviews have assessed the accuracy of these short GDS versions [13, 25-27]. These reviews included from two to ten studies for the GDS-4 assessment, and only one study for the GDS-5 assessment. In contrast, our systematic review included 23 studies: 15 that evaluated GDS-4 and 15 that evaluated GDS-5.

All previous meta-analyses pooled the results from studies using different GDS versions. However, our results suggest that different versions have different sensitivity and specificity estimates for the same cut-off point. Among the assessed GDS-4 versions, the balance between sensitivity and specificity was greater for the Galaria version at cut-off 2 (pooled analysis of two studies, very low certainty of the accuracy evidence), and for the Cheng version at cut-off 4 (one study, low certainty of the accuracy evidence). Among the assessed GDS-5 versions, the balance between sensitivity and specificity was greater for the "De Dios or Ortega" version at cut-off 2 (pooled analysis of two studies with high certainty of the evidence). Although this suggests that the "De Dios or Ortega" version at cut-off 2 may be a balanced option, with a high certainty that allows a more confident estimation of underdiagnosis and overdiagnosis rates, decision-makers must also consider other factors, such as applicability in their contexts or cultural variations in the manifestation of depression, before deciding which GDS version to use.

Subgroup analyses found that estimates differed across GDS-4 versions and across GDS-5 versions. While this suggests that some versions may perform better than others, the low certainty of these estimates prevents any solid conclusion. However, it seems sensible for future systematic reviews to evaluate each version separately.

Moreover, most of the meta-analyses for each version also had significant heterogeneity, which may be due to differences in risk of bias, population characteristics (such as dementia prevalence), study setting, or reference standard usage (DSM-III, DSM-IV, DSM-V, or ICD-10 criteria). In addition, some cultural differences in the construct of depression may cause heterogeneous results across different contexts [62]. Regrettably, the low number of studies per GDS version and their heterogeneous characteristics prevented us from identifying any predominant factor that could explain the heterogeneous results.

Certain limitations must be considered when interpreting the results: 1) The certainty of the evidence was low or very low for most of the results, mainly due to heterogeneity and risk of bias. 2) Most studies had a high risk of bias, mainly due to the selective reporting of cut-off points (some studies seemed to report only the cut-off with the highest sensitivity and specificity) and to the assessment of GDS-4 or GDS-5 accuracy by extracting items from a full GDS-30 interview (since the GDS-30 is a much longer survey, answering it is expected to be more exhausting than answering the GDS-4 or GDS-5 versions). 3) Studies had heterogeneous settings, population characteristics, and definitions of depression.

However, to the best of our knowledge, this is the most comprehensive systematic review performed to date regarding the accuracy of GDS-4 and GDS-5, including 23 studies, and it is the first systematic review that provides pooled estimates for each GDS-4 and GDS-5 version. Thus, our results may help guide clinical practice and the recommendations of clinical guidelines.

This study summarizes the sensitivity and specificity of GDS-4 and GDS-5 for depression screening in older adults. We found several GDS-4 and GDS-5 versions whose results showed great heterogeneity, which suggests that some versions may be more accurate than others. The certainty of the evidence was low or very low for almost all estimates. Altogether, our results indicate the need for more well-designed studies that compare different GDS versions.

Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990-2017: a systematic analysis for the Global Burden of Disease Study

Depression in the elderly: clinical features and risk factors

Prevalence of depressive disorders in the elderly

What are the causes of late-life depression?

Somatic symptoms of depression in elderly patients with medical comorbidities

Depression, anxiety and cognition in community-dwelling adults aged 70 years and over

Epidemiology of major depressive disorder in elderly Nigerians in the Ibadan Study of Ageing: a community-based survey

The prognosis of undetected depression in older general practice patients. A one year follow-up study

Clinical diagnosis of depression in primary care: a meta-analysis

Detecting Mood Disorder in Resource-Limited Primary Care Settings: Comparison of a self-administered screening tool to general practitioner assessment

Screening for Depression in Adults: An Updated Systematic Evidence Review for the US Preventive Services Task Force

Agency for Healthcare Research and Quality (US)

Diagnostic validity and added value of the Geriatric Depression Scale for depression in primary care: a meta-analysis of GDS30 and GDS15

Which version of the geriatric depression scale is most useful in medical settings and nursing homes? Diagnostic validity meta-analysis

Screening, detection and management of depression in elderly primary care attenders. I: The acceptability and performance of the 15 item Geriatric Depression Scale (GDS15) and the development of short versions

Development and testing of a five-item version of the Geriatric Depression Scale

Screening capacity of Geriatric Depression Scale with 10 and 5 items

Serie IV

A brief version of the geriatric depression scale for the chinese

Validación de una versión de cinco ítems de la Escala de Depresión Geriátrica de Yesavage en población española

Development of a shorter version of the geriatric depression scale for visually impaired older patients

Screening for suicide ideation among older primary care patients

Aproximación a versiones ultracortas del cuestionario de Yesavage para el cribado de la depresión

Validación de la versión española de 5 y 15 ítems de la Escala de Depresión Geriátrica en personas mayores en Atención Primaria

Evaluation of the feasibility, reliability and diagnostic value of shortened versions of the geriatric depression scale

Screening for depression in an elderly population living at home

The diagnostic accuracy of brief versions of the Geriatric Depression Scale: a systematic review and meta-analysis

Diagnostic accuracy of various forms of geriatric depression scale for screening of depression among older adults: Systematic review and meta-analysis

Comparison of diagnostic performance of Two-Question Screen and 15 depression screening instruments for older adults: systematic review and meta-analysis

Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement

A comparison of DSM and ICD classifications of mental disorder

Neurological and Musculoskeletal Features of COVID-19: A Systematic Review and Meta-Analysis

Learning curves of open and endoscopic fetal spina bifida closure: systematic review and meta-analysis

Global impact of tobacco control policies on smokeless tobacco use: a systematic review protocol

Assessing for depression and mood disturbance in later life

Rapid Depression Assessment in Geriatric Patients

Fiabilidad y validez de constructo del test MUNSH para medir felicidad, en población de adultos mayores chilenos

Which DSM validated tools for diagnosing depression are usable in primary care research? A systematic literature review

Screening older adults for depression in primary care settings

Screening accuracy for late-life depression in primary care: a systematic review

Choosing an appropriate depression assessment tool for chinese older adults: a review of 11 instruments. The best tools take into account cultural differences

QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies

GRADE Guidelines: 22. The GRADE approach for tests and strategies-from test accuracy to patient important outcomes and recommendations

Grading quality of evidence and strength of recommendations for diagnostic tests and strategies

Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews

Beside the Geriatric Depression Scale: the WHO-Five Well-being Index as a valid screening tool for depression in nursing homes

Validity of the Brazilian version of the Geriatric Depression Scale (GDS) among primary care patients

Escala de Depressão Geriátrica com quatro itens: um instrumento válido para rastrear depressão em idosos em nível primário de saúde

The evaluation and design of a short depression screening tool in Turkish older adults

Short versions of the geriatric depression scale: a study of their validity for the diagnosis of a major depressive episode according to ICD-10 and DSM-IV

A study on the validity of different short versions of the Geriatric Depression Scale

Validation of five short versions of the Geriatric Depression Scale in the elder population in Taiwan

Validación de la versión reducida de la escala de depresión geriátrica en el consultorio externo de geriatría del Hospital Nacional Cayetano Heredia

The validity of the hospital anxiety and depression scale and the geriatric depression scale-5 in home-dwelling old adults in Norway

Comparación de la sensibilidad y la especificidad entre diferentes versiones de la Escala de Depresión Geriátrica

Optimising the diagnostic performance of the Geriatric Depression Scale

Diagnostic accuracy of the original 30-item and shortened versions of the Geriatric Depression Scale in nursing home patients

The effectiveness of very short scales for depression screening in elderly medical patients

Validation of the five-item geriatric depression scale in elderly subjects in three different settings

Accuracy of 12 short versions of the Geriatric Depression Scale to detect depression in a prospective study of a high-risk population with different levels of cognition

Comparative performance of long and short forms of the Geriatric Depression Scale in mildly demented Chinese

The geriatric depression scale as a screening tool for depression and suicide ideation: a replication and extention

Cultural differences in the development and characteristics of depression

We would like to thank David Villarreal-Zegarra and Jessica Hanae Zafra-Tanaka for their valuable comments in the revision of the manuscript.