key: cord-1042824-j39bxa0h
authors: Sinval, Jorge; Vazquez, Ana Claudia S.; Hutz, Claudio Simon; Schaufeli, Wilmar B.; Silva, Sílvia
title: Burnout Assessment Tool (BAT): Validity Evidence from Brazil and Portugal
date: 2022-01-25
journal: Int J Environ Res Public Health
DOI: 10.3390/ijerph19031344
sha: d4cc6fc4e0c4038b3e74a0bc9783c13ea76aef26
doc_id: 1042824
cord_uid: j39bxa0h

The Burnout Assessment Tool (BAT) has been gaining increased attention as a sound and innovative instrument in its conceptualization of burnout. BAT has been adapted for several countries, revealing promising validity evidence. This paper aims to present the psychometric properties of the Brazilian and Portuguese versions of the BAT in both the 23-item and 12-item versions. BAT's validity evidence based on the internal structure (dimensionality, reliability, and measurement invariance) and validity evidence based on the relations to other variables are the focus of this research. A cross-sectional study was conducted with two non-probabilistic convenience samples from two countries (N = 3103), one from Brazil (n = 2217) and one from Portugal (n = 886). BAT's original structure was confirmed, and it achieved measurement invariance across countries. Using both classical test theory and item response theory as frameworks, the BAT presented good validity evidence based on the internal structure. Furthermore, the BAT showed good convergent evidence with related constructs (i.e., work engagement, co-worker support, role clarity, work overload, and negative change). In conclusion, the psychometric properties of the BAT make this freely available instrument a promising way to measure and compare burnout levels of Portuguese and Brazilian workers.

Although the burnout syndrome was first described in the 1970s, it remains a global issue, to the point that the 11th revision of the International Classification of Diseases of the World Health Organization (ICD-11) defines it as an occupational phenomenon that carries a risk of harming health [1]. The definition of burnout adopted in the ICD-11 comprises three factors (exhaustion, cynicism, and reduced professional efficacy), following the framework proposed by Maslach et al. [2]. However, the conceptualization of burnout is somewhat controversial [3]; for example, a meta-analytical study on physicians' burnout found 142 unique definitions of burnout, with at least 47 unique definitions using the MBI. Some constructs, such as depression and fatigue, are conceptually linked to job burnout [4, 5]. These phenomena are potentially part of the process of long-term sick leave. At the core of burnout lies severe fatigue (i.e., exhaustion); however, persistently fatigued workers are not necessarily (by definition) in burnout, nor must burned-out workers necessarily report fatigue as the main complaint [5]. Occupational fatigue has been linked to an imbalance between the intensity, duration, and timing of work and recovery time [6]. Studies over decades have shown evidence that burnout syndrome predicts various negative consequences for individuals and organizations, such as cardiovascular diseases, hypercholesterolemia, type 2 diabetes, coronary heart disease, musculoskeletal disorders, prolonged fatigue, headaches, gastrointestinal issues, mood disturbance, depressive symptoms, absenteeism, poor performance, insomnia, and life and job dissatisfaction [7] [8] [9] [10] [11] [12] [13] [14] [15] [16].

Research shows that job demands (e.g., work overload) are more associated with job burnout, while job resources (e.g., co-worker support) are more related to job burnout's antipode, i.e., work engagement [17] . Nowadays, researchers claim the COVID-19 pandemic has posed strain and increased workload and job stress, particularly in healthcare workers, who have presented a higher risk of burnout than other occupations [18] [19] [20] . Going beyond the individual consequences of burnout, recent research has also investigated burnout in a large range of occupations, organizations, and countries [21] [22] [23] [24] [25] [26] . The literature has firmly established that burnout is not only detrimental for workers' health but also has negative effects at the organizational level.

The most widely used instrument to assess burnout is the Maslach Burnout Inventory (MBI) [27]. Despite the MBI's early contribution to establishing burnout as an important psychological state that deserves in-depth study, researchers are still discussing its theoretical framework, its psychometric basis, and its practical applicability [21, 28]. Schaufeli et al. [29] summarize the most important criticisms of the MBI as follows: (a) the questionable validity evidence for the constituting dimensions of burnout, (b) the lack of clinically established cut-off values, (c) the lack of representative national samples to ground its statistical norms, (d) the limitations of its practical usability, and (e) the inconsistent dimensionality found in cross-national studies on the MBI [30]. Finally, other burnout measures share similar problems and weaknesses, such as the Copenhagen Burnout Inventory [31], the Oldenburg Burnout Inventory [32], and, recently, the COVID-19 Burnout Scale [33].

Taken together, these criticisms call for an alternative instrument that assesses burnout and overcomes these flaws using a novel conceptualization of the matter, a need addressed by the development of the Burnout Assessment Tool (BAT) [34]. The BAT assumes that burnout is a syndrome assessed by core symptoms (exhaustion, mental distance, emotional impairment, and cognitive impairment) and secondary symptoms (psychological distress and psychosomatic complaints), which could be associated with depressed mood and other comorbidities. Therefore, BAT considers burnout a second-order factor that acts as a syndrome, meaning that all four components are connected and belong to the same higher-order construct, i.e., burnout [21]. Based on the Job Demands-Resources Model (JD-R) [17], the key components constituting the burnout process are the draining of energy, which leads to feeling exhausted and extremely tired, and, at the same time, mental distancing, which manifests itself as a lack of interest in and aversion to work [35]. In addition, in-depth interviews with experts revealed two significant dimensions of burnout that had not been recognized until then: emotional impairment and cognitive impairment. These dimensions affect one's self-regulation to deal adequately with daily working activities and to recover the energy linked to the motivational process [36].

Meanwhile, BAT has been widely investigated [37] [38] [39] [40] [41] and has demonstrated measurement invariance between seven countries in Europe and Japan [21]. As in the works of De Beer et al. [21] and Sakakibara et al. [28], the current study is based on the BAT reconceptualization of burnout as a work-related state of exhaustion, extreme tiredness with reduced ability to regulate cognitive and emotional processes, and mental distancing. It can be accompanied by depressed mood as well as non-specific psychological and psychosomatic complaints [34]. Despite using the raw scores of only one item from the MBI (i.e., "I feel exhausted at the end of the working day"), Schaufeli [24] found medium levels of burnout in Portugal, in comparison with a random sample of workers from thirty-five European countries (n = 43,675), using data from the 6th European Working Conditions Survey [42]. In Brazil, by contrast, there is no publication reporting burnout scores from a national-level survey conducted with a representative sample.

The current study focuses on the psychometric properties of BAT from a cross-national perspective (i.e., Brazil and Portugal). The main goal is to assess BAT's validity evidence based on the internal structure and based on the relations to other variables.

Following the recommendations of the Standards for Educational and Psychological Testing [43], this study aims to evaluate two types of validity evidence for both the Portuguese and the Brazilian versions of BAT: one related to the internal structure, and one based on the relations to other variables (i.e., work engagement, role clarity, co-worker support, work overload, and negative change). BAT's original structure was successfully confirmed in several countries in a study by De Beer et al. [21] with data from Austria, Belgium (Flanders), Finland, Germany, Ireland, Japan, and the Netherlands. The Japanese version of BAT was also confirmed in a different study [28], while the South Korean version maintained the hierarchical structure with four first-order dimensions, albeit with the removal of one item from the mental distance factor [44]. The Russian version also provided evidence indicative of the stability of the hierarchical structure [45]. Altogether, it is expected that the hypothesized hierarchical structures for BAT-23 (one second-order latent variable with four first-order factors, 23 indicators) and BAT-12 (one second-order factor with four first-order dimensions, 12 items) hold with a satisfactory fit to the data in both Brazil and Portugal (H1). The reliability of the scores is one of the key components of the internal structure of any psychometric instrument [46]. It can be analyzed using four different types of approaches: internal consistency, test-retest, parallel forms, and interrater agreement. Previous research showed good evidence of internal consistency using the ordinal α [47] for both the second-order factor and the first-order dimensions [21]. BAT also presented satisfactory test-retest reliability evidence [48]. The second hypothesis (H2) states that BAT presents satisfactory internal consistency estimates (≥0.80) [46].
Measurement invariance is another component of the validity evidence based on the internal structure; it must be established before any substantive group comparisons (e.g., between countries or sexes) can be made. BAT has shown measurement invariance between seven countries: Austria, Belgium (Flanders), Finland, Germany, Ireland, Japan, and the Netherlands [21]. Regarding sex, no study has yet investigated BAT's measurement invariance. However, sex might be an important factor regarding burnout [49]: females are expected to present higher levels of burnout [50], namely in terms of exhaustion [51, 52], although other studies did not reach definite conclusions [25].

Other instruments measuring burnout and related constructs have previously shown measurement invariance between workers from Brazil and Portugal [53] and between sexes within the two countries [54, 55], reinforcing the similarity of the measurement structure of psychometric instruments among workers from the two countries. It is hypothesized (H3) that BAT holds measurement invariance across countries (Brazil and Portugal) and sex.

Another important source of validity evidence is provided by the relationship of instrument scores to variables external to the instrument [43]. This source of evidence shows whether the scores can be interpreted as expected from the nomological network of constructs [56]. The JD-R model identifies possible antecedents of job burnout [57]. The central idea of the JD-R model is that working conditions, which are specific to every occupation, can generally be classified as either job demands or job resources, and those job characteristics will contribute to job burnout and work engagement [58]. The JD-R model suggests that work engagement is negatively related to burnout, since high job demands lead to a health impairment process (i.e., job burnout) and high resources lead to a motivational process, i.e., work engagement [59]. Several meta-analyses have supported the relationship between job demands and resources and burnout [60] [61] [62]. In these studies, several job demands and resources were identified; for instance, social support, workload, and role clarity have been found to be relevant demands and resources. As such, a positive association between burnout and job demands and a negative relation between burnout and job resources are expected [28, 39, 63]. It is anticipated that BAT's scores are negatively correlated with work engagement, role clarity, and co-worker support and positively correlated with work overload and negative change (H4).

In this cross-sectional survey, a non-probabilistic convenience sample was collected. The inclusion criteria consisted of participants being able to read Portuguese and having easy access to a smartphone, PC, or tablet where they could open a digital questionnaire. The authors invited workers from Brazil and Portugal to participate. BAT's model, with one second-order factor, four first-order dimensions, and 23 manifest variables, has a total of 226 degrees of freedom [64]. The null hypothesis assumed that the population RMSEA is not lower than 0.08 (i.e., ε0 = 0.08; H0: ε ≥ 0.08), since rejecting this hypothesis leads to the conclusion that the model fit is better than 0.08, the recommended cutoff for a reasonable fit [65]. Additionally, the true population RMSEA was considered to be ε = 0.064, based on the findings of De Beer et al.'s [21] study using a sample of 10,138 participants. Altogether, for α = 0.05 and β = 0.20 (i.e., power = 0.80), the required sample size was n = 171 [66].
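The 226 degrees of freedom can be checked by counting sample moments and free parameters. The sketch below is only an illustration: it assumes a continuous-indicator parameterization with marker-variable identification (one loading fixed per factor), which the text does not specify, and the function name is hypothetical.

```python
def second_order_cfa_df(items_per_factor):
    """Degrees of freedom of a second-order CFA (illustrative count).

    Assumes one fixed marker loading per first-order factor and one
    fixed second-order loading; these identification choices are
    assumptions made for this sketch.
    """
    p = sum(items_per_factor)          # number of observed indicators
    k = len(items_per_factor)          # number of first-order factors
    moments = p * (p + 1) // 2         # unique variances and covariances
    free_parameters = (
        (p - k)    # free first-order loadings
        + p        # indicator residual variances
        + (k - 1)  # free second-order loadings
        + k        # first-order disturbances
        + 1        # second-order factor variance
    )
    return moments - free_parameters


# BAT-23: exhaustion (8), mental distance (5), cognitive (5), emotional (5)
bat23_df = second_order_cfa_df([8, 5, 5, 5])
# BAT-12: three items per first-order factor
bat12_df = second_order_cfa_df([3, 3, 3, 3])
print(bat23_df, bat12_df)
```

Counting thresholds and polychoric correlations instead, as an ordinal estimator would, leads to the same difference for BAT-23, because each added threshold is both a statistic and a free parameter.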

All measures were administered in their versions adapted to the Brazilian and Portuguese contexts.

The BAT was used to assess burnout [29] through the development of two transculturally adapted versions: one for Brazil and one for Portugal (Table 1). The BAT-23 is a self-report psychometric instrument that comprises 23 items to be answered using a five-point rating scale (1-"Never"; 2-"Rarely"; 3-"Sometimes"; 4-"Often"; 5-"Always"). The BAT-23 version measures burnout's core symptoms; another version is available that also includes items to assess burnout's secondary symptoms. To develop the Portuguese version (Table 1), the BAT's English version was used [34], following the ITC Guidelines for Translating and Adapting Tests [67]. BAT-23 operationalizes burnout as a second-order construct with four first-order factors: exhaustion (eight items), mental distance (five items), cognitive impairment (five items), and emotional impairment (five items). From BAT-23's items, it is possible to extract a short version (BAT-12) with three items per first-order latent construct (Table 1).

Work engagement refers to a positive motivational state and is composed of vigor, dedication, and absorption [68]. This construct was measured with the ultra-short version of the Utrecht Work Engagement Scale (UWES-3) [69], which uses items from the short version (i.e., UWES-9). The UWES-3 items have previously been adapted successfully to Portugal and Brazil [70, 71]. The scale uses a seven-point ordinal response format (0-"Never"; 1-"Almost never"; 2-"Rarely"; 3-"Sometimes"; 4-"Often"; 5-"Very often"; 6-"Always"), with one item pertaining to each of the three dimensions. The UWES has shown good convergent evidence with burnout scores, since work engagement and burnout are moderately and negatively related [29, 72]. The 9-item version of the UWES has already presented measurement invariance between Brazil and Portugal [70]. One example of an item is: "At my work, I feel bursting with energy."

Co-worker support refers to the function and quality of social relationships at work, such as perceived availability of help from coworkers or support actually received [73] . To assess the perceptions of co-worker support, the co-worker support sub-scale (3 items) of the Energy Compass psychometric instrument [74] was used. The items were answered using an ordinal five-point scale (1-"Never"; 2-"Seldom"; 3-"Sometimes"; 4-"Often"; 5-"Always"). One example of an item is: "Can you count on your colleagues for help and support when needed?"

The clarity of the role assesses the extent to which the tasks to be performed are clearly defined and the expectations and responsibilities for the employee are clear [75] . This construct was assessed using the sub-scale role clarity of the Energy Compass psychometric instrument [74] . The three items of the sub-scale were answered using a five-point ordinal scale (1-"Never"; 2-"Seldom"; 3-"Sometimes"; 4-"Often"; 5-"Always"). One example of an item is: "Is it sufficiently clear what you need to do in your job?"

Work overload can be defined as the extent to which the employee has to deal with changes in job content, ICT systems, and leadership, as well as in the organization as a whole. Four items answered using a five-point ordinal scale (1-"Never"; 2-"Seldom"; 3-"Sometimes"; 4-"Often"; 5-"Always") were used, as suggested in Schaufeli et al. [76]. One example of an item is, "Do you have too much work to do?"

Negative change refers to the pessimistic views produced by the introduction of modifications at work, e.g., pace of work, interpersonal conflict, work-home conflict, and use of skills [77]. The negative change construct was assessed using the corresponding subscale (three items) from the Energy Compass psychometric instrument [74], answered on a five-point frequency scale (1-"Never"; 2-"Seldom"; 3-"Sometimes"; 4-"Often"; 5-"Always"). An example of an item is, "Do changes cause turmoil in your company?"

Data were collected simultaneously in Portugal and Brazil. The workers were invited to participate through social networks or e-mail. First, the participants were presented with the electronic informed consent, which they had to accept to participate in the study. The digital survey was deployed using LimeSurvey [78] and SurveyMonkey [79] and contained a group of psychometric instruments together with a group of sociodemographic and job questions. To check the feasibility of the research process, a pilot study was conducted with 15 workers, who provided feedback (e.g., on potential issues with the digital platforms where the survey was deployed, the clarity of the questions/items, and the mean completion time).

All the subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of the Federal University of Health Sciences of Porto Alegre, Brazil (CAAE 78617617.8.0000.5345; 25 October 2017).

The statistical analysis was conducted with the statistical programming language R [80] through the integrated development environment RStudio [81]. To estimate the adequate sample size for the confirmatory factor analysis, the MBESS package [82] was used. The skimr package [83] and the table1 package [84] were used to produce the descriptive statistics. The skewness (sk), using the "sample" method (i.e., sample skewness of the distribution), and the kurtosis (ku), using the "sample excess" method (i.e., sample kurtosis of the distribution with a value of 3 being subtracted), were calculated using the PerformanceAnalytics package [85]. The coefficient of variation (CV) was estimated with the sjstats package [86], and the standard error of the mean (SEM) was calculated with the plotrix package [87]. The mode was computed with the modeest package [88]. Absolute values of |sk| > 3 and |ku| > 7 were considered severe univariate normality violations [89, 90]. To evaluate multivariate normality, the psych package [91] was used to calculate Mardia's multivariate kurtosis [92].
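As an illustration of the univariate screening rule, the following sketch computes bias-adjusted sample skewness and sample excess kurtosis and flags severe violations. It assumes that the conventional adjusted formulas correspond to the "sample" and "sample excess" methods named above; the function names and the item responses are hypothetical.

```python
import math


def _mean_and_sd(x):
    """Mean and sample standard deviation (n - 1 denominator)."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return mean, sd


def sample_skewness(x):
    """Bias-adjusted sample skewness (needs n >= 3)."""
    n = len(x)
    mean, sd = _mean_and_sd(x)
    z3 = sum(((v - mean) / sd) ** 3 for v in x)
    return n / ((n - 1) * (n - 2)) * z3


def sample_excess_kurtosis(x):
    """Bias-adjusted sample excess kurtosis (needs n >= 4)."""
    n = len(x)
    mean, sd = _mean_and_sd(x)
    z4 = sum(((v - mean) / sd) ** 4 for v in x)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * z4 - correction


def severe_violation(x, sk_cut=3.0, ku_cut=7.0):
    """True if |sk| > 3 or |ku| > 7, the cutoffs used in the text."""
    return abs(sample_skewness(x)) > sk_cut or abs(sample_excess_kurtosis(x)) > ku_cut


# Hypothetical responses to one five-point item
item = [1, 2, 3, 4, 5]
sk = sample_skewness(item)
ku = sample_excess_kurtosis(item)
flag = severe_violation(item)
print(sk, ku, flag)
```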

To obtain evidence about the originally proposed dimensionality of the measurement models, confirmatory factor analysis (CFA) was used. The following goodness-of-fit indices were used: NFI (Normed Fit Index), TLI (Tucker-Lewis Index), CFI (Comparative Fit Index), RMSEA (Root Mean Square Error of Approximation), and SRMR (Standardized Root Mean Square Residual). Estimates above 0.95 were considered good for NFI, TLI, and CFI [93], while values below 0.08 were considered good for SRMR and RMSEA [94]. The lavaan package [95] was used to run the CFA using the Weighted Least Squares Means and Variances (WLSMV) estimator [96]. The WLSMV was chosen because it does not require multivariate normality as an assumption and because all items of the psychometric instruments used have an ordinal response scale.
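For readers unfamiliar with these indices, the sketch below shows how RMSEA, CFI, TLI, and NFI are commonly derived from the χ² statistics of the tested model and of the baseline (independence) model. It uses the textbook formulas with N − 1 in the RMSEA denominator; robust or scaled variants reported by SEM software under WLSMV may differ, and all input values are hypothetical.

```python
import math


def fit_indices(chi2_model, df_model, chi2_base, df_base, n):
    """Textbook goodness-of-fit indices from chi-square statistics."""
    d_model = max(chi2_model - df_model, 0.0)   # model noncentrality
    d_base = max(chi2_base - df_base, 0.0)      # baseline noncentrality
    rmsea = math.sqrt(d_model / (df_model * (n - 1)))
    denom = max(d_model, d_base)
    cfi = 1.0 if denom == 0 else 1.0 - d_model / denom
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    tli = (ratio_base - ratio_model) / (ratio_base - 1.0)
    nfi = (chi2_base - chi2_model) / chi2_base
    return {"RMSEA": rmsea, "CFI": cfi, "TLI": tli, "NFI": nfi}


# Hypothetical values: 23 items give 253 baseline degrees of freedom
fit = fit_indices(chi2_model=452.0, df_model=226,
                  chi2_base=10000.0, df_base=253, n=1001)

# Apply the cutoffs stated in the text
good_incremental = all(fit[name] > 0.95 for name in ("NFI", "TLI", "CFI"))
good_rmsea = fit["RMSEA"] < 0.08
print(fit, good_incremental, good_rmsea)
```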

The Average Variance Extracted (AVE) was estimated to test the evidence for convergent validity [97] . Satisfactory convergent validity evidence in terms of the internal structure was assumed for AVE ≥ 0.5 [98] .
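A minimal sketch of the AVE computation follows, assuming standardized loadings so that the AVE reduces to the mean squared loading of a factor's indicators. The loadings are hypothetical and do not come from the study.

```python
def average_variance_extracted(std_loadings):
    """AVE as the mean of squared standardized loadings."""
    return sum(l ** 2 for l in std_loadings) / len(std_loadings)


# Hypothetical standardized loadings for a three-item factor
loadings = [0.70, 0.80, 0.75]
ave = average_variance_extracted(loadings)
convergent_ok = ave >= 0.5   # cutoff used in the text
print(round(ave, 3), convergent_ok)
```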

Item response theory analysis was conducted using a multidimensional polytomous Rasch model [99] as a particular case of the multidimensional random coefficients multinomial logit model (MRCMLM) [100] . The TAM package [101] was used to conduct the multidimensional polytomous Rasch analysis. Wright maps (also known as item-person maps or item maps) were used to present the location of both items and respondents on the same scale [102, 103] . The WrightMap package [104] was used to produce the Wright Maps. Two mean square fit statistics (i.e., infit and outfit) were used to assess how well the data fit the model [105] . Considering the ordinal nature of the rating scale (i.e., 1-"Never" to 5-"Always"), the interval (0.6; 1.4) was considered as reasonable for the item mean square ranges for infit and outfit statistics [106] . Values above 1 suggest an increasing quantity of answers diverging from model's predictions, while values below 1 indicate answers with less heterogeneity than expected [107] .

To assess the evidence of reliability of the first-order factors, the following estimators of internal consistency were used: composite reliability (CR) [97], the ordinal α [47], and ω [108]. Values ≥ 0.8 on these estimators were considered indicative of acceptable reliability evidence [46, 98]. Internal consistency was also estimated for the second-order latent factor: the proportion of variance among first-order common factors that is attributable to the second-order factor (ω L2), the proportion of variance of a composite score calculated from the observed indicators that is attributable to the second-order factor (ω L1), and the proportion of observed variance explained by the second-order factor after partialling out the uniqueness from the first-order factors (ω partial L1). Both second-order and first-order internal consistency estimates were calculated using the semTools package [109]. In the item response theory framework, the MRCMLM provided the expected a posteriori (EAP) reliability index for each latent factor. The EAP reliability is defined as the ratio of the variance of the EAPs to the variance of the plausible values [110]. Values of EAP reliability ≥ 0.8 are preferable.
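The composite reliability estimator can be sketched as follows, assuming standardized loadings and uncorrelated errors, so that each error variance equals one minus the squared loading. The loadings are hypothetical; note that a factor can pass the AVE cutoff and still fall just short of the 0.8 reliability benchmark.

```python
def composite_reliability(std_loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    sum_loadings_sq = sum(std_loadings) ** 2
    error_variance = sum(1.0 - l ** 2 for l in std_loadings)
    return sum_loadings_sq / (sum_loadings_sq + error_variance)


# Hypothetical standardized loadings for a three-item factor
loadings = [0.70, 0.80, 0.75]
cr = composite_reliability(loadings)
meets_benchmark = cr >= 0.8   # benchmark used in the text
print(round(cr, 3), meets_benchmark)
```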

Using the theta-parameterization for categorical items through the semTools package [109], measurement invariance was evaluated by comparing a group of eight nested models [111]: (I) configural invariance, (II) thresholds of the indicators, (III) first-order factor loadings, (IV) structural weights, (V) intercepts of the first-order factors, (VI) latent means, (VII) disturbances of the first-order factors, and (VIII) residual variances of observed variables. The differences between the nested models were evaluated using two criteria: the ∆CFI criterion [112], which advocates the non-rejection of the null hypothesis of invariance if the CFI drops by 0.010 or less between nested models, and the ∆χ² criterion [113], which does not reject the null hypothesis of invariance if the robust χ² difference test is non-significant.
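The ∆CFI decision rule can be sketched as a walk through the sequence of nested models, stopping at the first step where the CFI drops by more than 0.010. This reads the criterion as a limit on the size of the CFI decrease; the model labels and CFI values below are hypothetical.

```python
def highest_invariance_level(models, max_drop=0.010):
    """Return the label of the last model retained under the delta-CFI rule.

    `models` is an ordered list of (label, cfi) pairs, from the least
    to the most constrained model.
    """
    retained = models[0][0]
    for (_, cfi_prev), (label, cfi_next) in zip(models, models[1:]):
        delta_cfi = cfi_next - cfi_prev
        if delta_cfi < -max_drop:      # CFI worsened by more than 0.010
            break
        retained = label
    return retained


# Hypothetical CFI values for four nested models
nested = [
    ("configural", 0.990),
    ("thresholds", 0.989),
    ("loadings", 0.985),
    ("intercepts", 0.970),
]
level = highest_invariance_level(nested)
print(level)
```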

The structural models were tested using the lavaan package [95] to test validity evidence based on relations to other variables. In the latent score means comparison, Cohen's d [114] was used as an effect size measure. The doBy package [115] was used to compute the raw score percentiles. A significance level of 5% was used (α = 0.05).

A merged sample of 3103 workers was collected (n Brazil = 2217; n Portugal = 886); 74.2% were female, and the mean age was 37.2 years (SD = 11.1). More than half of the workers (53.4%) were professionals according to the International Standard Classification of Occupations (ISCO-08) [116], and 72.5% held an undergraduate degree or a higher academic qualification. Table 2 presents the descriptive statistics for each country and for the merged sample.

This source of validity evidence investigates the dimensionality, reliability of the scores, and measurement invariance.

The distributional properties of BAT's 23 items are presented in Table 3 ; these were used to judge distributional properties and psychometric sensitivity. None of the items in both countries presented severe univariate normality violations [89, 90] . Mardia's multivariate kurtosis [92] for the data from Brazil was 101.637 (p < 0.001), while for the data from Portugal it was 60.063 (p < 0.001). All items in both countries had the maximum range of possible answers, and no outliers were removed. These items' distributional properties are indicative of appropriate psychometric sensitivity, as it would be expected that these items would follow an approximately normal distribution in the population under study. Despite these univariate normality indicators, the weighted least squares means and variances (WLSMV) [96] estimation method was used, taking into consideration the ordinal level of measurement of the items. 

[...], and 52% or more of the variance of its indicators for BAT-12, i.e., AVE i ≥ 0.52 [98].

From a Rasch perspective, the items match the workers' sample, since BAT's items garnered information about workers at all ranges of the burnout distribution. Figure 3 displays both items' scale values (in terms of location) and persons' burnout levels (in terms of their location) spaced along a common vertical axis marked with a logits scale [103, 117] .

The reliability values of the second-order factor were good for Portugal (ω L2 = 0.90; ω L1 = 0.86; ω partial L1 = 0.97) and Brazil (ω L2 = 0.90; ω L1 = 0.87; ω partial L1 = 0.97), as they were for the joint sample (ω L2 = 0.90; ω L1 = 0.87; ω partial L1 = 0.97). BAT-12 presented similar values for Portugal (ω L2 = 0.96; ω L1 = 0.84; ω partial L1 = 0.94), Brazil (ω L2 = 0.92; ω L1 = 0.84; ω partial L1 = 0.94), and the joint sample (ω L2 = 0.93; ω L1 = 0.84; ω partial L1 = 0.94). The internal consistency evidence of the second-order construct was very satisfactory both when using the individual data from each country and when using the joint data (H2). In the BAT-23, the EAP reliability estimate for the burnout latent score was 0.89 for Portugal, 0.74 for Brazil, and 0.88 for the joint sample. The BAT-12 model presented an EAP reliability estimate of 0.85 for Portugal, 0.84 for Brazil, and 0.85 for the joint sample. Regarding the internal consistency of the first-order factors, all estimators presented values indicative of satisfactory reliability evidence for both BAT versions (H2; α ord i ≥ 0.76; ω i ≥ 0.71; CR i ≥ 0.76; Table 4). For both BAT-23 and BAT-12, the EAP reliability estimates for the four latent first-order factors were satisfactory (EAP i ≥ 0.82; Table 4).

The measurement invariance across countries and sex was tested through a group of nested models with increasing constraints (Table 5). Full uniqueness measurement invariance (i.e., strict invariance) was achieved across countries and sex for BAT-23 (H3) considering the ∆CFI criterion [112]. Using the ∆χ² criterion [113], thresholds invariance was achieved across countries, and first-order factor loadings invariance was obtained across sex. However, the ∆χ² criterion is too restrictive [90]; consequently, the ∆CFI criterion was preferred. The fit of the data to the model was acceptable across countries and sex, as seen in the dimensionality analysis. The measurement of burnout using BAT works in a similar manner across countries and sex, allowing comparisons of scores to be established between the different groups.

BAT-12 presented scalar measurement invariance between workers from Brazil and Portugal (H3). To avoid a negative disturbance variance (which is not theoretically admissible) for the mental distance latent variable in the Portuguese sample, the disturbance of the mental distance first-order factor was constrained to 0.1 in three models (i.e., 4, 5, and 6). Full uniqueness measurement invariance was achieved across sex using the ∆CFI criterion (H3).

The Pearson correlations between the raw BAT-12 and BAT-23 scores were very strong and statistically significant for the total burnout score (r = 0.979, t(3101) = 267.096; p < 0.001) and for all first-order dimensions, i.e., exhaustion (r = 0.932, t(3101) = 143.472; p < 0.001), mental distance (r = 0.960, t(3101) = 191.928; p < 0.001), cognitive impairment (r = 0.948, t(3101) = 165.722; p < 0.001), and emotional impairment (r = 0.948, t(3101) = 166.541; p < 0.001). The raw mean scores per dimension and for the total burnout score are presented in Tables 6 and 7. The models' latent correlations were virtually the same whether using BAT-23 or BAT-12 within each country (Table 8). In the sample from Brazil, BAT's latent burnout score correlations ranged from −0.81 (work engagement) to 0.59 (work overload) when using BAT-23 and from −0.80 (work engagement) to 0.61 (work overload) when using BAT-12. For the Portuguese sample, the latent burnout scores' correlations ranged from −0.75 (work engagement) to 0.54 (negative change) for BAT-23 and from −0.73 (work engagement) to 0.54 (negative change) for BAT-12. All latent correlations with BAT's burnout score were statistically significant (Table 8) and presented a moderate to strong effect size (H4) [114].
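The reported t statistics follow from the standard significance test for a Pearson correlation, with n − 2 degrees of freedom (3103 − 2 = 3101). The sketch below recomputes the statistic from the rounded r for the total score; the result is close to, but not identical with, the reported value because r is rounded to three decimals. The function name is illustrative.

```python
import math


def t_from_r(r, n):
    """t statistic and degrees of freedom for testing a Pearson correlation."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df


# Total burnout score: r = 0.979 in the merged sample of 3103 workers
t_total, df_total = t_from_r(0.979, 3103)
print(round(t_total, 1), df_total)
```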

The data from Brazil and Portugal provided robust validity evidence for the BAT-23 and the BAT-12 using item response theory (i.e., the multidimensional Rasch model) and classical test theory (i.e., confirmatory factor analysis) in conjunction. Satisfactory evidence was obtained based on both the internal structure and the relations to other variables. The current study adds to the already available evidence about BAT's psychometric properties using classical test theory, e.g., [21], and item response theory [37]. The present study takes advantage of the benefits of the two measurement theories in conjunction while bringing some novelties, such as the second-order estimates of internal consistency (i.e., ω L2, ω L1, and ω partial L1) and the EAP reliability index. In terms of the Rasch model, the MRCMLM was used, in contrast with the unidimensional approach used in BAT's previous research [37]. The multidimensional measurement model is both substantively advantageous and technically appropriate in cases where unidimensionality is not expected [99]. The multidimensional approach considered BAT's four first-order dimensions and a second-order latent variable. This is also the first study to provide infit and outfit estimates for BAT's items; these two mean square statistics are useful for understanding how well the data fit the model [107].

The originally proposed dimensionality for BAT-23 and BAT-12 presented a satisfactory fit to the data for both countries without removing items (H1). Such findings are corroborated by samples from other American and European countries, namely Ecuador, using BAT-23 and BAT-12 [40], and Italy, using BAT-23 [38]. Currently, the cumulated evidence of BAT's dimensionality is consistent across countries from Asia, America, and Europe [21].

Globally, the evidence of the reliability of the scores in terms of internal consistency obtained by both samples was satisfactory both for second- and first-order dimensions (H2). In fact, only the mental distance dimension of the BAT-12 version with the Portuguese data presented estimates slightly below the desirable level; nevertheless, those values were acceptable (i.e., ≥0.71). BAT's mental distance was the first-order dimension that had the lowest α and ω in the Ecuadorian version [40], as it did in the Italian version [38]. However, samples from other countries showed that mental distance did not present the lowest internal consistency estimates of all first-order dimensions [21]. As expected, BAT-12's first-order internal consistency estimates were lower than those of BAT-23. Notwithstanding, BAT-12's internal consistency estimates were globally satisfactory [46], with both classical test theory (i.e., α ord, ω, and CR) and item response theory estimators (i.e., EAP).
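As an illustration of why shorter subscales tend to yield lower internal consistency estimates, the sketch below computes composite reliability (CR) from standardized factor loadings. It assumes the widely used formula CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)) with uncorrelated errors; the exact computation used in the study is not shown in this section, and the loadings are hypothetical rather than taken from the BAT results.

```python
def composite_reliability(loadings):
    """Composite reliability from standardized loadings, assuming uncorrelated errors."""
    sum_loadings_sq = sum(loadings) ** 2
    error_var = sum(1.0 - lam ** 2 for lam in loadings)
    return sum_loadings_sq / (sum_loadings_sq + error_var)


# Hypothetical standardized loadings: a 3-item versus a 5-item subscale
# with identical item quality.
cr_short = composite_reliability([0.7, 0.7, 0.7])
cr_long = composite_reliability([0.7, 0.7, 0.7, 0.7, 0.7])
print(f"CR (3 items) = {cr_short:.3f}, CR (5 items) = {cr_long:.3f}")
```

With equal loadings, adding items increases CR, which is in line with the pattern of slightly lower first-order estimates for the 12-item version.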

Both versions of BAT had measurement invariance (i.e., at least scalar) across countries and sex (H3), allowing mean comparisons for BAT between countries and between sexes. BAT-23 presented measurement invariance among seven countries in a previous study by De Beer et al. [21]. One of the novelties of the current study is the measurement invariance of BAT-23 across sex and the test of measurement invariance across countries and sex of BAT's short version (i.e., BAT-12).

The relations of BAT's scores to other variables presented convergent evidence (H4), since all latent correlations were statistically significant (moderate to strong effect sizes) and in the theoretically expected direction for each correlation pair. BAT's burnout latent scores were negatively correlated with work engagement, role clarity, and co-workers' support. Positive latent correlations were found between BAT's burnout latent scores and both work overload and negative change. The latent correlations' effect sizes were similar between countries. Burnout's correlation with work overload and burnout's correlation with role clarity presented the largest differences between countries. The observed latent associations between BAT's scores and job demands and job resources are in accordance with the findings from research regarding other BAT versions, for example, the Romanian [39] and the Japanese [28]. BAT's burnout latent scores' correlation with work engagement was the strongest negative correlation for both countries. Strong negative correlations between burnout and work engagement are in accordance with what is theoretically expected from these two constructs [58, 118]. Regarding the data from Brazil, the strongest positive correlation with BAT's burnout scores was achieved with work overload, while for Portugal the strongest positive correlation for BAT's burnout latent scores was observed with negative change. However, the absolute values were smaller than the ones observed for the correlation between burnout and work engagement. The data provided validity evidence based on the relations to other variables, consistently building on burnout's existing nomological network of constructs [56] and reinforcing BAT's psychometric properties.

Both BAT-23 and BAT-12 presented good validity evidence, and thus both can be used to measure burnout levels among workers from Brazil and from Portugal. The advantage of using BAT-23 concerns its finer-grained assessment of burnout (i.e., more items lead to capturing more content of the construct); however, if time is a constraint and other measures are being collected, BAT-12 can be a more parsimonious alternative. As the results showed, the obtained raw scores for BAT-12 and BAT-23 presented an almost perfect correlation (i.e., 0.98). Moreover, short versions of psychometric instruments are preferable as long as their validity evidence is not compromised by shortening the full-length version. The main goal of using a short version is to reduce the time burden of assessment. Usually, short versions have lower estimates of reliability than full versions. The BAT-12 results were (in some of the first-order dimensions) slightly lower than the BAT-23 ones, although with no meaningful losses in terms of its satisfactory validity evidence. Practitioners and researchers choosing between BAT-23 and BAT-12 will have to balance the time saved through brevity against construct content coverage and validity evidence [119]. The BAT-12 option seems to be the most balanced, since its validity evidence is equivalent to that of its longer counterpart, and longer instruments can present several problems, for example, boredom, fatigue, increasing dropout rates, and lack of attention [120].

The obtained non-probabilistic convenience sample introduces some degree of selection bias. However, probabilistic sampling (i.e., all units in the population have known and positive probabilities of inclusion) is only possible when there is a complete and up-to-date list of the members of the population being investigated [121, 122], which was not the case. Even with large samples, the representativeness of the samples cannot be assumed if the sampling method is not probabilistic. However, many valid conclusions can still be drawn from the current study. Future research should be conducted with samples from occupational groups that were underrepresented in the current study (e.g., craft and related trades workers or elementary occupations). The sex proportions and academic levels should also be more similar to the parameters of each country's working population. The current correlational study has a cross-sectional design. Longitudinal designs can strengthen the validity evidence of BAT, namely by allowing longitudinal measurement invariance to be tested, which will allow the stability of BAT's structure through time to be studied. The current paper only investigated two of the five sources of validity evidence [43]. One of them (i.e., validity evidence based on the relations to other variables) was only analyzed from a correlational perspective with five related constructs. Further research on the relations of BAT's scores to other variables should expand to other conceptually linked constructs such as fatigue, using, for example, the Portuguese adaptation of the Occupational Fatigue Exhaustion/Recovery scale [6] or the Brazilian adaptation of the Feeling of Fatigue scale [123]. Test-criterion relationships should be analyzed in future studies using a predictive or concurrent design. Future studies should also investigate other sources of validity evidence, e.g., the validity evidence based on the response processes.

The findings of the current study are based on large samples from two different countries. The obtained findings are promising in terms of the measurement of burnout's core symptoms. Future research should investigate the version of BAT that includes the secondary symptoms items for both countries, as well as compare BAT's psychometric properties directly with the Brazilian and Portuguese adaptations of other psychometric instruments that measure burnout (e.g., MBI, OLBI, CBI). It will also be useful to obtain cut-off values for different levels of burnout; for such a purpose, clinical samples will have to be investigated. Using the receiver operating characteristic (ROC) curve [124] will allow sensitivity (true positive rate) and specificity (true negative rate) to be estimated. However, it should be taken into consideration that BAT's scores by themselves will not be enough; a thorough clinical interview and complementary information will be required [29]. Another call that should be made for future studies is the incorporation of the increasing evidence (prior knowledge) about BAT's dimensionality to take advantage of the Bayesian approach [125], which is particularly useful with small samples and allows some potential problems of the frequentist approach to be avoided (e.g., non-convergence, negative variances).
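The kind of ROC-based cut-off analysis suggested above can be sketched as follows. The scores and clinical status labels are entirely hypothetical (no clinical BAT data are reported in this study), and the choice of Youden's J (sensitivity + specificity − 1) as the selection criterion is an assumption made for illustration; other criteria could be equally defensible.

```python
def sensitivity_specificity(scores, is_case, cutoff):
    """Sensitivity and specificity when scores >= cutoff are flagged as cases."""
    tp = sum(1 for s, c in zip(scores, is_case) if c and s >= cutoff)
    fn = sum(1 for s, c in zip(scores, is_case) if c and s < cutoff)
    tn = sum(1 for s, c in zip(scores, is_case) if not c and s < cutoff)
    fp = sum(1 for s, c in zip(scores, is_case) if not c and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, is_case):
    """Return (cutoff, J, sensitivity, specificity) maximising Youden's J."""
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, is_case, cutoff)
        j = sens + spec - 1.0
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Hypothetical mean burnout scores and clinical status (True = diagnosed case).
scores = [1.2, 1.8, 2.1, 2.4, 2.9, 3.0, 3.1, 3.4, 3.8, 4.2, 4.6]
is_case = [False, False, False, False, True, True, False, True, True, True, True]

cutoff, j, sens, spec = best_cutoff(scores, is_case)
print(f"cut-off = {cutoff}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Each candidate cut-off corresponds to one point on the ROC curve; in practice, the relative cost of false positives and false negatives would also inform the final choice.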

BAT-12 was shown to be virtually equal to BAT-23 in terms of scores, as well as in terms of validity evidence, representing an equally robust alternative to measure burnout. The decision to use BAT-23 or BAT-12 will be related to the level of detail that one intends to obtain regarding the burnout measurement. BAT's Brazilian and Portuguese versions are invariant in terms of measurement, allowing for comparisons of means between countries and between males and females. BAT's scores presented the expected associations with related measures. The quartiles and mean scores are also provided as a first reference in terms of burnout at the country and sex levels. BAT is a promising instrument and is a viable alternative for measuring burnout in workers from Brazil and Portugal.

The data of multi-occupational workers from Brazil and Portugal presented good validity evidence for both BAT-23 and BAT-12, supporting their use to measure and to compare burnout levels across sex and countries. BAT's scores provided support for the theoretical nomological network of constructs. Both samples' data fitted the original structure of BAT-23 and BAT-12 well, with good reliability evidence. This leads to the conclusion that BAT is a good instrument for practitioners and researchers to measure burnout among different occupations.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

The data presented in this study are available on reasonable request from the corresponding author.

World Health Organization. International Statistical Classification of Diseases and Related Health Problems

Job burnout

Is burn-out finally a disease or not?

The relationship between burnout, depression, and anxiety: A systematic review and meta-analysis

Fatigue, burnout, and chronic fatigue syndrome among employees on sick leave: Do attributions make the difference?

Psychometric properties of the Portuguese version of the occupational fatigue exhaustion/recovery (OFER) scale among industrial shift workers

Job burnout: The contribution of emotional stability and emotional self-efficacy beliefs

Associations between satisfaction with life, burnout-related emotional and physical exhaustion, and sleep complaints

Job strain, burnout, and depressive symptoms: A prospective study among dentists

The relationships between work intensity, workaholism, burnout, and self-reported musculoskeletal complaints

Do burnout and work engagement predict depressive symptoms and life satisfaction? A three-wave seven-year prospective study

Job burnout and job wornout as risk factors for long-term sickness absence

A simple model of stress, burnout and symptomatology in medical residents: A longitudinal study

Job burnout and affective wellbeing: A longitudinal study of burnout and job satisfaction among public child welfare workers

Physical, psychological and occupational consequences of job burnout: A systematic review of prospective studies

Burnout and risk of coronary heart disease: A prospective study of 8838 employees

Sanz-Vergel, A.I. Burnout and work engagement: The JD-R approach

How essential is to focus on physician's health and burnout in coronavirus (COVID-19) pandemic? Cureus 2020, 12, e7538

Factors contributing to healthcare professional burnout during the COVID-19 pandemic: A rapid turnaround global survey

The psychological impact of COVID-19 and other viral epidemics on frontline healthcare workers and ways to address it: A rapid systematic review

Measurement invariance of the Burnout Assessment Tool (BAT) across seven cross-national representative samples

How are changes in exposure to job demands and job resources related to burnout and engagement? A longitudinal study among Chinese nurses and police officers

How changes in job demands and resources predict burnout, work engagement, and sickness absenteeism

Burnout in Europe: Relations with National Economy

The role of psychosocial working conditions on burnout and its core component emotional exhaustion-A systematic review

Attachment styles and employee performance: The mediating role of burnout

Maslach Burnout Inventory Manual

Validation of the Japanese version of the burnout assessment tool

Burnout Assessment Tool (BAT)-Development, validity, and reliability

The Copenhagen burnout inventory: A new tool for the assessment of burnout

The convergent validity of two burnout instruments: A multitrait-multimethod analysis

COVID-19 burnout, COVID-19 stress and resilience: Initial psychometric properties of COVID-19 Burnout Scale

Unpublished internal report

The conceptualization and measurement of burnout: Common ground and worlds apart

Interventions for improving recovery from work

A Rasch analysis of the Burnout Assessment Tool (BAT)

The Burnout Assessment Tool (BAT): A contribution to Italian validation with teachers

Romanian short version of the Burnout Assessment Tool: Psychometric properties. Eval. Health Prof

The Ecuadorian version of the Burnout Assessment Tool (BAT): Adaptation and validation

Perfectionism and burnout during the COVID-19 crisis: A two-wave cross-lagged study

Sixth European Working Conditions Survey-Overview Report

American Psychological Association; National Council on Measurement in Education. Standards for Educational and Psychological Testing

A preliminary validation study for the Korean version of the Burnout Assessment Tool (K-BAT)

Personal resources and burnout: Evidence from a study among librarians of Moscow region

Ordinal versions of coefficients alpha and theta for Likert rating scales

Burnout Assessment Tool (BAT): Een nieuw instrument voor het meten van burn-out

Determinants and prevalence of burnout in emergency nurses: A systematic review of 25 years of research

Surgeon burnout: A systematic review

Gender differences in burnout: A meta-analysis

Exploring within-and between-gender differences in burnout: 8 different occupational groups

Short index of job satisfaction: Validity evidence from Portugal and Brazil

Openness toward organizational change scale (OTOCS): Validity evidence from Brazil and Portugal

The quality of work life scale: Validity evidence from Brazil and Portugal

Construct validity in psychological tests

The job demands-resources model of burnout

Job demands-resources theory: Taking stock and looking forward

A critical review of the job demands-resources model: Implications for improving work and health

A meta-analysis of burnout with job demands, resources, and attitudes

Sources of social support and burnout: A meta-analytic test of the conservation of resources model

The job demands-resources model: A meta-analytic review of longitudinal studies

Job demands, job resources, burnout, work engagement, and their relationships: An analysis across sectors

Calculating degrees of freedom for a structural equation model

Analytical power calculations for structural equation modeling: A tutorial and shiny app

Sample size planning for confirmatory factor models: Power and accuracy for effects of interest

International Test Commission. ITC Guidelines for Translating and Adapting Tests

Utrecht Work Engagement Scale

An ultra-short measure for work engagement

Brazil-Portugal transcultural adaptation of the UWES-9: Internal consistency, dimensionality, and measurement invariance

Adaptation and validation of the Brazilian version of the Utrecht Work Engagement Scale

Transcultural adaptation of the Oldenburg Burnout Inventory (OLBI) for Brazil and Portugal

Functional roles of social support within the stress and coping process: A theoretical and empirical overview

Applying the job demands-resources model

Role conflict and ambiguity in complex organizations

Workaholism among medical residents: It is the combination of working excessively and compulsively that counts

The impact of organizational changes on work stress, sleep, recovery and health

An Open Source Survey Tool

R: A Language and Environment for Statistical Computing

The MBESS R Package

Compact and Flexible Summaries of Data

Tables of Descriptive Statistics in HTML

Econometric Tools for Performance and Risk Analysis (R Package Version 2.0.4)

Statistical Functions for Regression Models (R Package Version 0.18.1) [Computer Software]

A Package in the Red Light District of R. R-News

Mode Estimation (R Package Version 2.4.0) [Computer Software]

Non-normal and categorical data in structural equation modeling

Fundamentos Teóricos, Software & Aplicações

Procedures for Psychological, Psychometric, and Personality Research

Measures of multivariate skewness and kurtosis with applications

Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives

Structural Equation Modeling with AMOS

lavaan: An R package for structural equation modeling

Latent variable structural equation modeling with categorical data

Evaluating structural equation models with unobservable variables and measurement error

Multivariate Data Analysis

An introduction to multidimensional measurement using Rasch models

The multidimensional random coefficients multinomial logit model

Test Analysis Modules

Best Test Design

Some notes on the term

WrightMap: IRT Item-Person Map with "ConQuest

Applying the Rasch model

What do infit and outfit mean-square and standardized mean? Rasch Meas

Test Theory: A Unified Treatment

Useful Tools for Structural Equation Modeling (R Package Version 0.5-5)

Reliability as a measurement design effect. Stud

Assessing factorial invariance in ordered-categorical measures

Evaluating goodness-of-fit indexes for testing measurement invariance

A scaled difference chi-square test statistic for moment structure analysis

Statistical Power Analysis for the Behavioral Sciences

Utilities (R Package Version 4.6-2)

International Standard Classification of Occupations; International Labour Organization

Constructing Measures: An Item Response Modeling Approach

A meta-analysis of work engagement: Relationships with burnout, demands, resources, and consequences

Creating short forms and screening measures

Psychological assessment-science and practice

Can't we make it any shorter? The limits of personality assessment and ways to overcome them

Non-probability sampling

The SAGE Handbook of Survey Methodology

Fatigue at work: Scale validation with airline pilots. BAR-Brazilian Adm

ROC Curves for Continuous Data

Bayesian structural equation models via parameter expansion

The authors would like to acknowledge all the international researchers in the broader context of the Burnout Assessment Tool (BAT) consortium for their continued support and contribution to this important area of research.

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.