key: cord-1005650-g352voyl
authors: Guillot-Valdés, María; Guillén-Riquelme, Alejandro; Buela-Casal, Gualberto
title: Content Validity through Expert Judgment for the Depression Clinical Evaluation Test
date: 2022-02-04
journal: Int J Clin Health Psychol
DOI: 10.1016/j.ijchp.2022.100292
sha: 87ca992f28c6d59c70a39e601b23b7c8a3f6249f
doc_id: 1005650
cord_uid: g352voyl

BACKGROUND/OBJECTIVE: The evaluation of depression requires valid and reliable measuring instruments that cover the wide spectrum of symptoms this disorder displays, in order to carry out an accurate and differential diagnosis. The objective of this work is to construct the Depression Clinical Evaluation Test (DCET), which considers affective, somatic, cognitive, behavioral and interpersonal symptoms, and to analyze its content validity through expert judgment. METHOD: Based on different diagnostic classifications and manuals, a specification table for a depression test was established. Sixteen experts in Psychological Assessment, Psychometry and/or Psychopathology participated in its evaluation. A total of 300 items were created. The experts assessed the items according to the criteria of Content, Relevance, Clarity, Comprehension, Sensitivity, and Offensiveness. In addition, 50 adults evaluated the comprehension of the items. RESULTS: The degree of understanding for all the items was high and the expert judgment favoured the suppression of 104 items, thus obtaining a shorter measuring instrument with a total of 196 items for ease of application. CONCLUSIONS: The content validity of the test is adequate and fits the agreed definition of depression.

Keywords: Depression; Expert judgment; Evaluation; Content validity; Instrumental study

© 2022 The Author(s). Published by Elsevier España, S.L.U. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Depression is one of the most common psychological disorders. According to World Health Organization (WHO) data, its prevalence is around 5.2% in the general population, close to the figure of around 7.2% observed in other studies (Lim et al., 2018). The study of depression has aroused interest over the years, and there has recently been a proliferation of work on the prevalence (Bueno-Notivol et al., 2021) and analysis of depression symptoms due to the COVID-19 health crisis (Cecchini et al., 2021). However, the evaluation of depression remains complex even though a variety of instruments exist for its assessment and diagnosis (Guillot-Valdés et al., 2019), and even in primary care with short evaluations (Rezaeizadeha et al., 2021).

One difficulty of depression assessment lies in the fact that it is a disorder with wide and varied symptomatology. This range includes cognitive, behavioural and psychosomatic symptoms in addition to the main emotional symptoms of the disorder. No existing scale evaluates all of them with separate items for each type of symptom. One of the most classic and widely used questionnaires is the Beck Depression Inventory (BDI-II; Beck et al., 1988). One of its advantages is that it covers a wide spectrum of depression with very few items; however, it covers each facet with only one or two items. This makes it difficult to identify reliably the most affected areas in a specific case. Thus, it is common for evaluations to be complemented with various specific questionnaires in order to build a reliable clinical profile of each affected area. This approach yields results that are not easy to integrate, because each questionnaire evaluates a separate aspect of depression.

Last but not least, there is controversy about whether depression has a dimensional or a categorical character, the latter being the view that prevails in current classification systems for mental disorders. This approach influences the construction of instruments for the evaluation and diagnosis of the disorder (Chiesa et al., 2017). However, there are also contributions that emphasize the existence of an orthogonal structure between positive and negative affect, which would imply that obtaining high scores in positive affect would not necessarily lead to low scores in negative affect (Watson et al., 2011). Currently, very few questionnaires focus on the dimensional approach to depression; the Basic Depression Questionnaire and the State/Trait Depression Inventory are examples on which some recent studies have been based (Guillot-Valdés et al., 2019), although they do not cover the entire symptom picture of depressive disorder.

The task of constructing a test requires careful planning, a clear and concrete vision of what it intends to measure, and well-written items that include a representative sample of the possible behaviours to be assessed (Muñiz et al., 2013; Muñiz & Fonseca-Pedrero, 2019). In this case, the task is to operationalize a construct through concrete and tangible elements (items) (Carretero-Dios & Pérez, 2007; Muñiz & Fonseca-Pedrero, 2019). This requires a detailed process and a number of experts to help review the decisions made. One of the most widely used methods to establish the content validity of a questionnaire is expert judgment. Experts can either suggest which items the instrument should contain to define the construct to be measured or, as in this case, evaluate items already created according to a series of quantitative criteria (giving scores) or qualitative criteria, suggesting changes to the wording where they consider it necessary (Garrote & del Carmen Rojas, 2015). This procedure is widely used by researchers to analyze the content validity of newly created instruments (Leyton-Román et al., 2021) or of adaptations of existing instruments (Cervilla et al., 2021).

The first aim of this study is to establish a test specification table. For this, we expect to establish an integral model that evaluates the main components of depression, thus covering all of the related symptoms. We will then develop an item bank that covers this test specification table, including a proportional number of items for each factor and subfactor. The second aim is to estimate the content validity of this item pool, based on expert judgments of the Depression Clinical Evaluation Test (DCET). In addition, we intend to analyze the degree of understanding of the item bank to verify that the items are intelligible to the adult population.

The sample, selected by convenience, consisted of 16 experts, all of whom had PhD degrees in Psychology and years of expertise, and who voluntarily agreed to participate in the study. They were specialized in psychological evaluation, psychometry and/or clinical psychopathology and had extensive experience in the subject due to their academic training and work experience. Thus, they were able to provide adequate information, evidence, judgments and evaluations (Escobar-Pérez & Cuervo-Martínez, 2008).

The criteria proposed by different authors for the selection of judges were followed (Skjong & Wentworth, 2001; Urrutia et al., 2014; Varela-Ruiz et al., 2012). In addition to the expertise already mentioned, the impartiality, motivation to participate, adaptability and availability of the judges were taken into account. They were contacted by email, which explained the purpose of the project and requested their collaboration.

In parallel, and following the model of other authors (Fernández-Gómez et al., 2020; García-Cortés & Hernández, 2021; Luque-Vara et al., 2020), a pre-test of comprehension of the items was carried out. A total of 50 volunteers participated (M age = 38, SD age = 19.07, 56% women); as with the experts, each of them was sent part of the questionnaire (50 items) via email. Informed consent was obtained from each of them. The aspect to be evaluated was the degree of understanding of the item, posed as the question 'If the item was understood well', with responses ranging from bad (0) to perfect (10). The participants were also asked whether there were any words that they did not understand and, finally, whether they would express the item in another way and how. Subsequently, the mean of these scores was calculated to determine the degree of comprehension.

For the creation of the Depression Clinical Evaluation Test (DCET), the 'Standards for educational and psychological testing' (American Educational Research Association et al., 2014) and the guidelines of the International Test Commission (2016) were followed. In addition, several general articles on the creation and adaptation of tests were followed (Almanasreh et al., 2019).

In the first phase, from the documentary review carried out, a definition of depression was established: it was understood as a series of mood disorders characterized by having a common core symptomatology and that could vary in intensity, frequency or in the specific presence of symptoms among themselves. Derived from this definition and all the material consulted, the factors that composed it were established, collecting a logical grouping of the characteristics established in the manuals. Consequently, the symptoms were grouped into the following factors: affective, physiological/somatic, cognitive, behavioural and interpersonal. The number of symptoms considered in each one ranged from 3 to 8.

The symptoms' weights were established according to whether they appeared in the DSM-5 and/or in the ICD-10 and ICD-11, giving double weight to those that were included in all classifications (Table 1). These weights were percentages ranging between 8% and 33%.
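The weighting rule can be illustrated with a minimal sketch. It assumes a raw weight of 1 for a symptom that does not appear in all classifications, a raw weight of 2 for one that does, and normalisation to percentages within each factor; the function name, the symptom labels and the base value are illustrative assumptions, not the authors' procedure.

```python
def symptom_weights(classification_counts, n_classifications=3):
    """Return percentage weights for the symptoms of one factor.

    classification_counts maps each symptom to the number of classifications
    (DSM-5, ICD-10, ICD-11) in which it appears. A symptom present in all of
    them gets a raw weight of 2, any other symptom a raw weight of 1
    (assumed base value); raw weights are then normalised to sum to 100.
    """
    raw = {
        symptom: 2 if count == n_classifications else 1
        for symptom, count in classification_counts.items()
    }
    total = sum(raw.values())
    return {symptom: 100 * value / total for symptom, value in raw.items()}


# Hypothetical factor with three symptoms; only the first one is listed
# in all three classifications.
example = symptom_weights({"symptom_a": 3, "symptom_b": 2, "symptom_c": 1})
```

Under these assumptions, three symptoms that all appear in every classification would each receive one third of the factor's weight.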

Once the weights of the factors and sub-factors were established, this phase was confirmed by the experts. In addition, the response scale best suited to the proposed objective was selected, and the proposal was presented to the experts as a 'table of test specifications'. Two response scales were considered: one exclusively temporal, in which the evaluations marked the duration of the symptoms, and the other indicating the frequency of appearance of the symptoms at three temporal moments (last month, last year and always). All the experts agreed that the best alternative was the second modality.

From there, only one change was proposed in the affective factor. Originally it was composed of depressed mood, anhedonia, and undervaluation and guilt each with a value of 33%, but after this initial trial depressed mood changed to 50% and anhedonia as well as undervaluation and guilt each became 25%.

After that, a bank of 300 items was prepared, where writing double negatives, double verbs, complex phrases and complex vocabulary was avoided. These items were subjected to qualitative evaluation by consulting six experts who were asked to indicate the adequacy of the definitions that were given of depression and each of the facets as well as the components that formed them. They were also asked to evaluate the sufficiency of the percentage of importance given to each facet in a component (established according to appearance in the DSM, the ICD or both).

In the second phase, the second expert judgment coming from 13 judges (three of them also participated in the previous phase) was carried out. First, instructions were provided on the importance of this procedure and the tasks to be performed: 1) After the initial instructions, the general information of the test was presented so that the experts had all the necessary information to understand the complete final test and could provide their suggestions as to the general idea of the questionnaire and its objective. 2) Subsequently, the components and the facets of each of them were presented. Along with the definitions, the weight of the factor within the component was indicated (see Appendix A). 3) Then, the experts were asked to use the response scale.

In order to avoid the fatigue effect, the questionnaire was divided into six equal parts (50 items each). Each of these parts had the same number of items for each factor and subfactor, presented in shuffled order, to avoid both fatigue and acquiescent responding when evaluating all the items of the same factor. Some experts were sent all parts of the questionnaire (300 items) and others only one (50 items) or two of them (100 items). In all cases, the criteria to be evaluated were the following: content, relevance, clarity, comprehension, sensitivity and offensiveness. The qualitative observations of the experts were considered for each of the items that formed the original instrument. In total, five judgments were obtained for each of the parts into which the instrument was divided (50 items). Information was obtained from each of the experts individually (following the individual aggregate method) and confidentially, without the experts having contacted each other (Almenara & Cejudo, 2013). The data were collected in a Microsoft Excel 2010 sheet and then processed in the SPSS 25 statistical programme. This work was approved by the Ethics Committee of the University of Granada (Spain).
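The division of the item bank described above can be sketched as a stratified split. This is only an illustration under stated assumptions: the item identifiers, the number of subfactors (25 with 12 items each) and the function name are hypothetical, chosen so that 300 items divide evenly into six parts of 50.

```python
import random
from collections import defaultdict


def split_into_parts(items, n_parts=6, seed=0):
    """Split (item_id, subfactor) pairs into n_parts balanced, shuffled parts."""
    rng = random.Random(seed)
    by_subfactor = defaultdict(list)
    for item_id, subfactor in items:
        by_subfactor[subfactor].append(item_id)

    parts = [[] for _ in range(n_parts)]
    for ids in by_subfactor.values():
        rng.shuffle(ids)
        # Deal the items of this subfactor round-robin so every part
        # receives the same number of them (when the count divides evenly).
        for position, item_id in enumerate(ids):
            parts[position % n_parts].append(item_id)

    for part in parts:
        rng.shuffle(part)  # mix the subfactors within each part
    return parts


# Hypothetical bank: 300 items, 25 subfactors with 12 items each.
bank = [(f"item_{i:03d}", f"subfactor_{i % 25}") for i in range(300)]
parts = split_into_parts(bank)
```

Dealing items round-robin within each subfactor keeps the parts balanced, and the final shuffle avoids presenting all items of one subfactor consecutively.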

All of the items that met the established requirements were considered adequate. Those that were partially adequate and required some changes and the inadequate ones that were considered incongruous or problematic with the established criteria were eliminated.

First, the adequacy of the item content to the measured construct, in this case depression, was analyzed. All items with scores below 1.6 were eliminated (this scale ranges from 0 to 2). Following this criterion, 39 items were eliminated (13% of the total).

Then, items with clarity below 2.2 (scale from 0 to 3) were eliminated, which removed 17 items (6% of the total). The next criterion was relevance, for which a mean of 2.4 (scale from 0 to 3) across the five experts was taken as the cut-off point. Applying this criterion eliminated a further 42 items (14% of the total). Finally, some items had acceptable scores overall but did not reach the maximum score in several areas and also showed slight comprehension problems. Here, 8 items were removed (4% of the total). This process involved the suppression of 104 items. The wording of 10 items was corrected. All this made it possible to obtain a clearer and slightly shorter measuring instrument, with 196 items, which helped to reduce the application time and improve the objectivity of the response options. Table 2 shows the number of items that finally remained in each Factor and Subfactor.
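The sequential screening by cut-off points can be sketched as follows. The thresholds are those reported in the text; applying each of them to the mean rating across experts is stated explicitly only for relevance and is assumed here for content and clarity. The function name, data layout and ratings are made up for illustration.

```python
from statistics import mean

# Cut-offs reported in the text: an item is removed when its mean expert
# rating on a criterion falls below the threshold (assumed for all three).
CUTOFFS = [("content", 1.6), ("clarity", 2.2), ("relevance", 2.4)]


def screen_items(ratings):
    """Apply the cut-offs in order.

    ratings maps item_id -> {criterion: [scores from the experts]}.
    Returns (kept_item_ids, {criterion: [removed_item_ids]}).
    """
    kept = list(ratings)
    removed = {criterion: [] for criterion, _ in CUTOFFS}
    for criterion, threshold in CUTOFFS:
        still_kept = []
        for item_id in kept:
            if mean(ratings[item_id][criterion]) < threshold:
                removed[criterion].append(item_id)
            else:
                still_kept.append(item_id)
        kept = still_kept
    return kept, removed


# Made-up ratings from five experts for three hypothetical items.
ratings = {
    "item_a": {"content": [2, 2, 2, 2, 2], "clarity": [3, 3, 3, 3, 3], "relevance": [3, 3, 3, 2, 3]},
    "item_b": {"content": [1, 2, 1, 2, 1], "clarity": [3, 3, 3, 3, 3], "relevance": [3, 3, 3, 3, 3]},
    "item_c": {"content": [2, 2, 2, 2, 2], "clarity": [3, 3, 3, 3, 3], "relevance": [2, 2, 3, 2, 2]},
}
kept, removed = screen_items(ratings)
```

Because the criteria are applied in order, an item removed for content is never counted again under clarity or relevance.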

Table 1. Symptoms and their appearance in the classifications consulted (each X marks one classification in which the symptom appears).
Depressed mood: X X X
Loss of pleasure or interest in almost all activities: X X X
Significant weight gain: X X
Significant weight loss: X X X
Significant increase in appetite: X X
Significant loss of appetite: X X X
Insomnia: X X X
Hypersomnia: X X X
Psychomotor agitation: X X
Psychomotor slowing down: X X
Fatigue or loss of energy: X X X
Feeling worthless or excessive guilt: X X X
Decreased ability to concentrate: X X X
Decreased ability to think / make decisions: X
Recurring thoughts of death: X X
Suicidal plans or ideation: X X X
Suicide attempt: X X X
Reduced activity level: X
Decreased attention: X X
Loss of self-confidence: X
Feeling of inferiority: X
Grim perspective of the future: X X
Self-harm: X
Loss of reactivity to pleasant events and stimuli: X X
Loss of libido: X X

In addition to the expert judgment, the 300 items were subjected to a comprehension evaluation in an adult sample. The responses of the 50 people surveyed were taken into account (scoring their understanding on a scale of 0 to 10), with an average comprehension of 9.82 out of 10. There were no items with an understanding lower than 9, which indicated that all the items were easily understandable and, therefore, it was not necessary to delete or modify any item after the analysis.
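The comprehension check amounts to computing a mean score per item and flagging any item below a threshold. The following sketch assumes that the overall figure is the mean of the item means, which the text does not specify; the function name and the ratings are made up.

```python
from statistics import mean


def comprehension_summary(scores, threshold=9.0):
    """scores maps item_id -> list of 0-10 comprehension ratings.

    Returns the mean per item, the mean of those item means (assumed
    definition of the overall figure) and the items below the threshold.
    """
    item_means = {item_id: mean(values) for item_id, values in scores.items()}
    overall = mean(list(item_means.values()))
    flagged = sorted(item_id for item_id, m in item_means.items() if m < threshold)
    return item_means, overall, flagged


# Made-up ratings for two hypothetical items.
scores = {"item_a": [10, 10, 9, 10], "item_b": [9, 10, 10, 10]}
item_means, overall, flagged = comprehension_summary(scores)
```

An empty list of flagged items corresponds to the situation reported here, in which no item needed to be deleted or modified.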

The objective of this work was to propose a comprehensive model of depression in order to develop a test for its evaluation. Secondly, it was intended to estimate the content validity of the DCET, which included five dimensions of the disorder for adults, based on expert judgments. Finally, the authors wanted to evaluate the comprehension of the developed items. After the different analyses, a test specification table was developed which adequately described the clinical criteria. From it, a sensitive and valid bank of items was created after purification. In addition, the items were understandable. One of the strengths of this instrument is that it has been created with the intention of exhaustively evaluating the main, core and representative components of depression that are not present in cases of pure anxiety. This represents an advance over current questionnaires (e.g., BDI, Beck et al., 1988; CBD, Peñate, 2001; IDER, Spielberger et al., 2008). Likewise, it should be noted that the initial item bank that constituted the instrument was exhaustive enough to cover the entire symptomatic picture of depressive disorder, grouped into the following factors: affective, somatic, cognitive, behavioral and interpersonal. Also, various subfactors were considered within each one of them. This corresponds with current psychometric specifications (Muñiz et al., 2013).

The quality of this work was submitted to evaluation by experts. They evaluated the items based on various categories (relevance, representativeness, etc.), which makes this procedure an essential criterion for determining the quality of measurement of an instrument (Muñiz & Fonseca-Pedrero, 2019). Almenara and Cejudo (2013) pointed out, among the most outstanding benefits of this methodology, the level of depth it offers, the ease of using it, and the modest technical and human requirements for its application.

The present study selected 16 experts to respond to the proposed objectives, a number that was in the range recommended by various authors (Urrutia et al., 2014; Varela-Ruiz et al., 2012) . Experts in the field of clinical psychology were selected and it was determined that all of them had to have experience in research and treatment on emotional as well as depressive disorders and psychometrics.

In view of the results obtained, the instrument has adequate content validity to evaluate depression and its symptoms. Furthermore, the sub-factors that compose it are also adjusted to the theoretical definition of depression proposed. This will be essential when evaluating depression comprehensively and will help clinicians to identify the main affected areas for treatment (Mavranezouli et al., 2020; Pybis et al., 2017). Also, it is essential to have evaluation instruments with a dimensional and non-categorical approach. Currently, the ICD-11 (World Health Organization, 2019) recommends the use of these types of approaches as they can more appropriately address various disorders (e.g., personality disorders; Chiesa et al., 2017; Fowler et al., 2015; Waugh et al., 2017).

This work is not without its limitations. One of the most notable was the large number of items that the instrument initially covered, which meant dividing the questionnaire when presenting it to the experts. When evidence of construct validity is gathered in future work, exploratory analyses should also maintain an adequate number of items in each subfactor, taking special care with factors that have few items. The choice of the number of experts was also somewhat difficult, due to differences among authors. Some considered the ideal range to be between 7 and 30 (Urrutia et al., 2014). Most authors recommend consulting more than 10 experts (García-Martín et al., 2016; Juárez-Hernández & Tobón, 2018). Thus, for the present study, a total of 16 experts were chosen (6 for the first phase and 10 for the second) for their availability as well as their level of experience in the matter. Future research will focus on applying pertinent statistical analyses (EFA, CFA) that allow selecting the items that will finally constitute each of the factors and sub-factors with adequate statistical significance.

In any case, the authors of this work have managed to develop a pilot instrument to assess depression in a multidimensional way.

This study has been funded by Bursary FPU17/05262 for University Professor Training as part of the first author's thesis (Psychological Doctoral Programme B13 56 1; RD 99/2011). 

Evaluation of methods used for estimating content validity

La aplicación del juicio de experto como técnica de evaluación de las tecnologías de la información y comunicación (TIC) [Application of expert judgment as a technique for evaluating information

Standards for educational and psychological testing

An inventory for measuring clinical anxiety. Psychometric properties

Prevalence of depression during the COVID-19 outbreak: A meta-analysis of community-based studies

Standards for the development and review of instrumental studies: Considerations about test selection in psychological research

A longitudinal study on depressive symptoms and physical activity during the Spanish lockdown

Development of the Spanish short version of Negative Attitudes Toward Masturbation Inventory

Categorical and dimensional approaches in the evaluation of the relationship between attachment and personality disorders: An empirical study

Content validity and expert judgment: An approach to its use

Content validation through expert judgement of an instrument on the nutritional knowledge, beliefs, and habits of pregnant women

A dimensional approach to assessing personality functioning: Examining personality trait domains utilizing DSM-IV personality disorder criteria

Validation of instruments to evaluate the educational model and degree of progress according to the knowledge society

Analysis of the training process in expert players: instrument validation. Revista Internacional de Medicina y Ciencias de la Actividad Física del Deporte

La validación por juicio de expertos: dos investigaciones cualitativas en Lingüística aplicada. Revista Nebrija de Lingüística Aplicada a la Enseñanza de Lenguas

Reliability and validity of the Basic Depression Questionnaire

The international test commission guidelines on the security of tests, examinations, and other assessments: international test commission (ITC)

Analysis of the implicit elements in the content validation of a research instrument

Validation of the Spanish Healthy Lifestyle Questionnaire. International Journal of Clinical and Health Psychology, 21, Article 100228

Prevalence of Depression in the Community from 30 Countries between

Content Validation of an Instrument for the Assessment of School Teachers' Levels of Knowledge of Diabetes through Expert Judgment

Cost-effectiveness of psychological treatments for post-traumatic stress disorder in adults

Guidelines for the translation and adaptation of tests: second edition

Ten steps for the construction of a test

Presentation of a basic questionnaire to evaluate the genuine symptoms of depression. Introduction. Análisis y Modificación de Conducta

The comparative effectiveness and efficiency of cognitive behaviour therapy and generic counselling in the treatment of depression: evidence from the 2nd UK National Audit of psychological therapies

Depression screening and treatment among uninsured populations in Primary Care

Expert Judgement and risk perception

Optimal methods for determining content validity

Description and uses of the Delphi technique in research in health areas

Emotions and the emotional disorders: A quantitative hierarchical perspective

Psychological assessment with the DSM-5 Alternative Model for Personality Disorders: Tradition and innovation

ICD-11 for mortality and morbidity statistics

The authors want to thank all the experts, as well as the pilot sample, who collaborated selflessly on this work so that it could meet its objectives.

As you know, within the process of creating a questionnaire, an expert judgment is necessary to guarantee the validity of the content and that each of the items is adequate and representative of the construct. Listed below are 50 items belonging to a questionnaire for the evaluation of depression, which in turn consists of 300 items. You have to evaluate each item in terms of its content, relevance, clarity, comprehension, sensitivity and offensiveness, answering in each of the columns provided (Excel sheet). There is also a section for comments, where you can add any suggestions or clarifications.