key: cord-0187093-28ucja4x authors: Muhlbach, Nicolaj Sondergaard title: occupation2vec: A general approach to representing occupations date: 2021-11-03 journal: nan DOI: nan sha: a2cb23f7a35f2a89967e9f875f73f94aee776094 doc_id: 187093 cord_uid: 28ucja4x
We propose occupation2vec, a general approach to representing occupations, which can be used in matching, predictive and causal modeling, and other economic areas. In particular, we use it to score occupations on any definable characteristic of interest, say the degree of 'greenness'. Using more than 17,000 occupation-specific descriptors, we transform each occupation into a high-dimensional vector using natural language processing. Similarly, we assign a vector to the target characteristic and estimate the occupational degree of this characteristic as the correlation between the vectors. The main advantages of this approach are its universal applicability and verifiability, in contrast to existing ad-hoc approaches. We extensively validate our approach on several exercises and then use it to estimate the occupational degree of charisma and emotional intelligence (EQ). We find that occupations that score high on these tend to have higher educational requirements and projected employment growth. Turning to wages, highly charismatic occupations are found in either the lower or the upper tail of the wage distribution. This pattern does not hold for EQ, where higher levels of EQ are correlated with higher wages.
When designing optimal education and labor markets, policy-makers face the increasingly important yet difficult challenge of assessing critical aspects of occupations. For instance, Autor et al. (2003, 2006), Acemoglu and Autor (2011), Autor and Dorn (2013), and Autor and Handel (2013) find the occupational degree of analytical, routine, and manual job tasks to be important in explaining changes in the labor market; Goldin (2014) examines the role of various occupational characteristics in explaining the gender wage gap; Deming (2017) considers the degree of social skills in occupations; Frey and Osborne (2017) [...]; and Dingel and Neiman (2020) and Mongey et al. (2020) construct measures of the occupational feasibility of working at home and of exposure to social distancing, respectively, to study the economic impact of social distancing in the context of [...]. Common to these seminal studies is the need to measure fundamental characteristics of occupations. However, existing studies use case-by-case approaches that are difficult to generalize and validate. Most of them rely on detailed data from the Occupational Information Network (O*NET), which provides measures of hundreds of occupational features. Although the richness of the data certainly allows for novel contributions, it also carries an inherent risk of overfitting. That is, researchers might be tempted to select exactly those features of the data that make the analysis fit the narrative. The real difficulty, however, is that it may be nearly impossible to validate and verify truly novel measures of occupational characteristics. Another challenge is that every new measure requires a new approach, method, and study, which is laborious.
We propose occupation2vec, a general approach to representing occupations as high-dimensional vectors, which can then be used in matching, comparative studies, predictive and causal modeling, and other economic areas of interest. Specifically, we demonstrate how the high-dimensional occupation vectors can be used to score occupations on any definable target characteristic, for instance, the occupational degree of 'greenness'.
At its core, our approach transforms every occupation into a high-dimensional vector, hence the name occupation2vec. The only input needed is an objective and reliable textual definition of the target characteristic. For instance, the U.S. Bureau of Labor Statistics defines green jobs as follows.
Definition 1 (Green jobs). "(A) Jobs in businesses that produce goods or provide services that benefit the environment or conserve natural resources. (B) Jobs in which workers' duties involve making their establishment's production processes more environmentally friendly or use fewer natural resources." (U.S. Bureau of Labor Statistics, 2021)
Using O*NET, we rely on 244 occupational attributes, 873 occupation descriptions, and 16,804 occupation-specific tasks to learn a vector representation of each occupation, leveraging a state-of-the-art natural language processing (NLP) algorithm. Once all occupations are embedded as high-dimensional vectors, we consider a given target characteristic of interest and assign to it a vector in the same vector space as the occupations. This allows us to estimate the occupational degree of the target characteristic as the standardized correlation coefficient between its vector and any occupation vector. The advantage of this framework is that it is fully data-driven and universal, requiring only that the user provide a reliable and objective definition of the target characteristic. Hence, it extends easily to genuinely novel attributes of occupations. In principle, one could extend the framework to default to an encyclopedia, e.g., Wikipedia, in which case the approach would be fully automated. Another advantage is that, unlike existing approaches that do not consider any ground truth, our framework is easy to validate. We validate occupation2vec in two ways.
First, we visually inspect the occupation vectors by compressing them into two-dimensional vectors using principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE). We plot all occupations on the two dimensions and show that occupations cluster according to major occupational groups and educational requirements. This indicates that even after compressing the information from the high-dimensional vectors into two dimensions, the low-dimensional occupation vectors still capture meaningful differences between occupations. Second, we estimate the occupational degree of all of the 244 occupational attributes from O*NET and compare our estimates to the original O*NET scores. We find strong evidence that our estimates are significantly correlated with the original O*NET scores both between and within occupations, and we take this as evidence that occupation2vec produces high-quality occupational vectors that capture essential features of the occupations. These vectors can then be used to match occupations more precisely, to act as control covariates in regressions, and so on. Specifically, we demonstrate how the vectors can be used to estimate novel characteristics of occupations. Once the quality of the occupation vectors and of the framework in general has been scrutinized and confirmed, we showcase the framework in several applications, which we divide into estimating well-studied versus novel characteristics. First, we revisit the popular task measures (abstract, manual, and routine tasks, respectively) of Autor et al. (2003, 2006), Acemoglu and Autor (2011), Autor and Dorn (2013), and Autor and Handel (2013), as well as the innovative measures of exposure to artificial intelligence (AI) by Felten et al. (2018), Brynjolfsson et al. (2018), and Webb (2019). This illustrates how one could have used our framework had the other case-specific approaches not been invented.
Note that this is not another validation exercise, except under the assumption that the existing measures represent the ground truth; we feel more comfortable making this assumption for the original O*NET scores. We document that our estimates of each of the task measures align well with the original measures. In particular, we find that performing a high amount of routine tasks is on average associated with lower educational attainment, lower wages, and lower projected employment growth. These occupations are often found among office and administrative support occupations. We find the same pattern for occupations characterized by manual tasks, whereas the opposite holds for abstract tasks. Our estimates of AI exposure generally behave as a mix of the measures of Felten et al. (2018) and Webb (2019), with no clear relationship to that of Brynjolfsson et al. (2018). The close similarity of Felten et al. (2018) and Webb (2019), in contrast to Brynjolfsson et al. (2018), was first noted by Acemoglu et al. (2020). Occupations with a high degree of AI exposure have profiles similar to those that score high on abstract tasks; that is, these occupations generally have higher educational requirements and enjoy higher wages. Second, we investigate the occupational degree of two novel characteristics, namely charisma and emotional intelligence (EQ). We use definitions from psychological outlets, e.g., the American Psychological Association. Both characteristics are intrinsically interesting and have been found to play a fundamental role in leadership skills and social affability. At the outset, charisma and EQ seem conceptually closely related, and we do find similarities empirically. The occupations that score high on both charisma and EQ are often found within community and social service; educational instruction; and arts, design, entertainment, sports, and media occupations.
For charisma, another frequent occupational group is sales. The occupations with high degrees of charisma and EQ also share similar wage profiles. In particular, our estimates suggest that occupations in the upper part of the wage distribution score high on both charisma and EQ. One difference appears in the lower part of the wage distribution, where occupations tend to score high on charisma but not on EQ. The projected employment growth is substantial for these occupations. Another difference that we detect at the aggregate level concerns occupations that require a master's degree versus a doctoral degree: these occupations score similarly on charisma, whereas the level of EQ dips for occupations that require a doctorate.
The rest of the paper is organized as follows. Section II presents the data used in occupation2vec, and Section III introduces the framework and our NLP method of choice. Section IV validates our framework, and Section V considers our four applications on task measures, AI, charisma, and EQ, respectively. Section VI concludes. The appendices on data, method, and applications are found in Appendix A, Appendix B, and the appendices from Appendix C onward, respectively. Throughout the paper, we use the notation |A| for the cardinality of a generic set A, and [a, b] = {z ∈ ℤ | a ≤ z ≤ b} denotes the set of integers between a and b.
In this section, we describe the data used to construct the occupation embeddings in the occupation2vec framework. Starting with the data has two advantages. First, in our experience, the framework is easier to understand with a specific source of data in mind, especially for readers less versed in NLP. Second, although any detailed occupational database could be used, to our knowledge O*NET provides the most comprehensive information.
Specifically, the validation strategy requires the source of occupational information to contain both textual descriptions and numerical scores for numerous features of the entire universe of occupations, which we have not come across elsewhere (footnote 2). O*NET keeps track of hundreds of descriptors for each occupation, and we use the 873 unique occupations available. Each occupation is described textually by a general description (e.g., writers generally originate and prepare written material, such as scripts, stories, advertisements, and other material) and by detailed tasks (e.g., bakers place dough in pans, molds, or on sheets), and it is also measured numerically on a number of attributes (e.g., how important critical thinking is for surgeons). We refer to the three sources of occupational information, i.e., descriptions, tasks, and attributes, collectively as occupational descriptors. We highlight this distinction because most existing approaches are suited only for the numerical scores of the attributes, whereas our method easily incorporates purely textual data such as the tasks and descriptions as well (footnote 3). Note that attributes are also defined textually and that tasks are also associated with a weight. Thus, all occupational descriptors are both expressed as text and associated with an occupation-specific weight. For instance, the definitions of the attributes oral comprehension and depth perception are given in Table 1. These textual definitions apply to all occupations, but each occupation is also assigned a specific score between zero and one for each attribute. For instance, being an astronomer is associated with a score of 0.044 on oral comprehension and of 0.015 on depth perception. Similarly, Table 2 shows an example of an occupation description as well as a few tasks for astronomers. These descriptions are also associated with a weight. We will return to the example of astronomers in Section III.
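The occupation-specific weights described in this section come from O*NET rating scales (e.g., importance or level) that are standardized to range from 0 to 1 and, when a descriptor is rated on several scales, averaged uniformly. The Python sketch below illustrates that preprocessing under an explicit assumption: we read "standardized to range from 0 to 1" as min-max rescaling (O − L)/(H − L) of a rating O on a scale with minimum L and maximum H. The function names and the example ratings are hypothetical and are not taken from the paper's code.

```python
from statistics import fmean


def standardize(value: float, low: float, high: float) -> float:
    """Min-max rescale a rating on a [low, high] scale to [0, 1] (assumed rule)."""
    if high <= low:
        raise ValueError("scale maximum must exceed its minimum")
    return (value - low) / (high - low)


def descriptor_weight(ratings: list) -> float:
    """Uniform average of the standardized scales rated for one descriptor.

    Each element of `ratings` is a (rating, scale_min, scale_max) tuple."""
    return fmean(standardize(v, lo, hi) for v, lo, hi in ratings)


# Hypothetical example: an attribute rated 4.0 on a 1-5 importance scale
# and 3.5 on a 0-7 level scale.
w = descriptor_weight([(4.0, 1.0, 5.0), (3.5, 0.0, 7.0)])
print(round(w, 3))  # 0.625
```

Averaging after rescaling keeps scales with different ranges from dominating each other, which is the point of standardizing before combining them.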
The descriptors represent ten categories, namely description, tasks, abilities, interests, work values, work styles, skills, knowledge, work activities, and work context (footnote 4). The total number of occupational descriptors amounts to 17,048, which includes 873 occupation descriptions (one for each occupation), 16,804 unique tasks (some tasks are occupation-specific, while others are performed by a few occupations, leading to 20 tasks per occupation on average), and 244 attributes (common across occupations but with occupation-specific weights). Figure 1 shows the structure of the data. We are exhaustive in the selection of occupational descriptors in the sense that we include all those for which meaningful textual descriptions are available (footnote 5). The occupation-specific weight associated with each descriptor is constructed from at least one scale (e.g., importance or level), and the realized value of the scale differs across occupations. The definition of each scale, along with the set of categories to which each scale applies, is presented in Table A.1 in Appendix A. Each scale has a minimum and a maximum value, and, to enable comparison, all scales have been standardized to range from 0 to 1 following O*NET guidelines. In the case of multiple scales for a specific descriptor, we take a uniform average of the standardized scales.
[Figure 1: structure of the occupational descriptors from O*NET.]
Notes: This figure shows the occupational descriptors provided by O*NET that can be expressed meaningfully by both text and numeric scores. The 17,048 unique occupational descriptors consist of 873 occupation descriptions, 16,804 detailed tasks, and 244 attributes. Note that a small number of tasks may be core to some occupations while supplemental to others, meaning that the sum of core and supplemental tasks exceeds the total number of unique tasks. We divide the descriptors into ten categories, of which one category belongs to descriptions, one category belongs to tasks (core and supplemental tasks are grouped as one category), and eight categories belong to attributes.
Footnote 2: Specifically, the O*NET® Content Model. We use O*NET version 25.3.
Footnote 3: One important exception is Webb (2019), who uses the tasks but not the attributes. To the best of our knowledge, the approach of Webb (2019) cannot be generalized to the attributes.
Footnote 4: Technically, the ten categories further represent four broader types of information about work: worker characteristics, worker requirements, occupational requirements, and occupation-specific information. Specifically, worker characteristics include abilities, interests, work values, and work styles; worker requirements include skills and knowledge; occupational requirements include work activities and work context; and occupation-specific information includes description and tasks. In this paper, we focus on the ten categories and do not distinguish further between the four broader types of occupational information.
Footnote 5: For instance, the ability stamina is defined as the ability to exert yourself physically over long periods of time without getting winded or out of breath, whereas the technological skill electronic mail software (e.g., Microsoft Outlook) is not further defined.
In this section, we present our general approach to measuring any definable target characteristic of an occupation by quantifying the similarity between two vectors that represent the occupation and the target characteristic, respectively. The occupation vectors are text embeddings generated by our preferred NLP method
that rely heavily on the wealth of occupation-specific information provided by O*NET. Similarly, the vector of the target characteristic is a text embedding of its definition, which can in principle be provided by any objective and reliable source of information, e.g., international standards or dictionaries. Intuitively, our approach transforms every occupation into a high-dimensional vector, hence the name occupation2vec. The characteristic of interest, say the 'greenness' of occupations, is similarly transformed into a high-dimensional vector, which allows us to compute a measure of similarity between the two vectors: one representing a particular occupation and one representing the characteristic. We start by illustrating the core principle of our framework in Figure 2. Given a vector space produced by an NLP algorithm that takes as its input a large corpus of text (e.g., all pages on Wikipedia), the first step is to assign a corresponding vector in the space to each occupational descriptor (all attributes, descriptions, and tasks) from O*NET. Once each source of occupational information is assigned a vector, all kinds of mathematical operations are in principle possible. The second step aggregates all descriptor embeddings into one occupation-specific embedding that takes into account the occupational loadings on the descriptors, as expressed by the weights. Within each category, we use the weights provided by O*NET to aggregate the descriptor embeddings into category embeddings. The category embeddings are then further aggregated into occupational embeddings by a uniform average (within occupations). At this point, the occupation2vec framework has generated exactly one high-dimensional vector per occupation. Having learned one representation of each occupation, the third step quantifies the degree of the target characteristic of interest, for which a reliable definition exists.
Likewise, we use the NLP algorithm to assign a high-dimensional vector to the target characteristic. Once all occupations and the target characteristic are represented as embeddings, we measure the (semantic) similarity between each occupation and the target characteristic as the correlation coefficient between the two high-dimensional vectors. The last step standardizes the similarities to zero mean and unit variance. These standardized similarities then represent the degree of the characteristic in question, and they have an ordinal interpretation.
[Figure 2: Step 1: Descriptor-specific embedding; Step 2: Occupation-specific aggregation into the occupational embedding.]
Notes: This figure shows how the occupation2vec framework represents each occupation as a vector based on the rich occupational information from O*NET. → represents the text embedding by an NLP algorithm, × represents the multiplication between the vector and the occupation-category-specific weights, dotted lines represent the weighted average of descriptor embeddings within categories, and dashed lines represent a uniform average (with weights 1/10) across category embeddings.
As a specific example, imagine that we are interested in the job-level degree of oral comprehension, which covers the ability to listen to and understand information, and of depth perception, which covers the ability to judge the distance between objects (see Table 1 in Section II for the specific O*NET definitions). We choose these particular characteristics because they are included as ability attributes in O*NET. We compare two occupations from opposite tails of the attribute distribution, namely arbitrators and roof bolters. For arbitrators, a common task is to prepare written opinions or decisions regarding cases, whereas a common task for roof bolters is to drill bolt holes into roofs at specified distances from ribs or adjacent bolts. The estimated degrees of each ability are shown in Table 3.
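The last two steps of the framework (correlate, then standardize) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the occupation embeddings and the target embedding are already available as equal-length lists of floats, reads "correlation coefficient between the two vectors" as the Pearson correlation across the d coordinates, and then standardizes the resulting similarities across occupations to zero mean and unit variance. The toy three-dimensional vectors are made up for illustration.

```python
from statistics import fmean, pstdev


def pearson(u, v):
    """Pearson correlation between two equal-length vectors, across coordinates."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    mu_u, mu_v = fmean(u), fmean(v)
    du = [a - mu_u for a in u]
    dv = [b - mu_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = (sum(a * a for a in du) * sum(b * b for b in dv)) ** 0.5
    return num / den


def score_occupations(occupation_vectors, target_vector):
    """Standardized similarity of each occupation to the target characteristic."""
    sims = {name: pearson(vec, target_vector)
            for name, vec in occupation_vectors.items()}
    values = list(sims.values())
    mu, sd = fmean(values), pstdev(values)  # population SD gives unit variance
    return {name: (s - mu) / sd for name, s in sims.items()}


# Hypothetical three-dimensional toy vectors (real embeddings are far larger).
occupations = {
    "arbitrator": [0.9, 0.1, 0.3],
    "roof bolter": [0.1, 0.8, 0.2],
    "astronomer": [0.7, 0.2, 0.6],
}
target = [1.0, 0.0, 0.5]
ranked = sorted(score_occupations(occupations, target).items(), key=lambda kv: -kv[1])
for name, z in ranked:
    print(f"{name}: {z:+.2f}")
```

Because the scores are standardized across occupations, only their ranking and relative distances are meaningful, which matches the ordinal interpretation given in the text.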
The estimates in Table 3 closely follow our expectations for the chosen mix of occupations and attributes. In Section IV, we take additional steps to validate our framework.
Index occupations by $i$ for $i = 1, \dots, n$ and assume access to a collection of $K$ sets, $\mathcal{G} = \{\mathcal{G}_1, \dots, \mathcal{G}_K\}$. In the O*NET context, $\mathcal{G}$ represents the set of ten available categories. The distinction between categories allows us to assign different weights to descriptors within and between categories if desirable. For each $k = 1, \dots, K$, $\mathcal{G}_k$ contains occupation-specific sets, $\mathcal{G}_k = \{G_{1,k}, \dots, G_{i,k}, \dots, G_{n,k}\}$. We think of $G_{i,k}$ as containing occupation-specific information of the $k$th category, such as textual descriptions and numeric weights. All the occupation-specific sets are of the form
$$G_{i,k} = \{(W_{i,k,j}, T_{i,k,j}) : j = 1, \dots, N_{i,k}\},$$
where $j = 1, \dots, N_{i,k}$ indexes the $j$th occupational descriptor out of $N_{i,k}$ descriptors in category $k$, $W_{i,k,j} \in \mathbb{R}$ represents the weight, and $T_{i,k,j} \in \mathcal{T}$ is the textual definition of descriptor $j$ in category $k$ for occupation $i$, with $\mathcal{T}$ containing the universe of unique descriptor definitions. We scale the weights to sum to one within occupation $i$ and category $k$, that is,
$$\sum_{j=1}^{N_{i,k}} W_{i,k,j} = 1. \tag{1}$$
Note that within each category $k$, the textual information $T_{i,k,j}$ is allowed but not required to vary across occupations, which also explains why the number of definitions within a category, $N_{i,k}$, may vary with $i$. This is the case for the categories Description and Tasks, because occupation descriptions and tasks differ between occupations. For other categories, e.g., Abilities, $T_{i,k,j} = T_{i',k,j}$ for all $i, i' = 1, \dots, n$. The number of texts to be embedded is $N = |\mathcal{T}|$, which in our case equals 17,048. The goal of the NLP algorithm of choice is to turn each $T_{i,k,j}$ into a vector $X_{i,k,j}$ of dimensionality $d$, which is then used to generate the occupational embeddings $Y_i$ for $i = 1, \dots, n$ via (2), that is,
$$Y_i = \frac{1}{K} \sum_{k=1}^{K} \sum_{j=1}^{N_{i,k}} W_{i,k,j} X_{i,k,j}, \tag{2}$$
where $Y_i \in \mathbb{R}^d$ represents occupation $i$ as a $d$-dimensional vector.
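The occupational embedding is a weighted average of descriptor embeddings within each category (with the weights rescaled to sum to one) followed by a uniform average across the K categories. A minimal plain-Python sketch of that aggregation, assuming the descriptor embeddings for one occupation are already computed; the data layout (a dict from category name to a list of (weight, vector) pairs) and the function names are hypothetical choices for illustration.

```python
def weighted_mean(pairs):
    """Weighted average of vectors; the weights are rescaled to sum to one."""
    total = sum(w for w, _ in pairs)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    dim = len(pairs[0][1])
    out = [0.0] * dim
    for w, vec in pairs:
        for t in range(dim):
            out[t] += (w / total) * vec[t]
    return out


def occupation_embedding(categories):
    """Uniform average over categories of the within-category weighted means.

    `categories` maps a category name to a list of (weight, vector) pairs,
    one pair per descriptor of that category for a single occupation."""
    cat_vecs = [weighted_mean(pairs) for pairs in categories.values()]
    k = len(cat_vecs)
    return [sum(v[t] for v in cat_vecs) / k for t in range(len(cat_vecs[0]))]


# Hypothetical two-category, two-dimensional example.
example = {
    "Abilities": [(0.8, [1.0, 0.0]), (0.2, [0.0, 1.0])],
    "Tasks": [(1.0, [0.5, 0.5]), (3.0, [1.0, 1.0])],
}
print(occupation_embedding(example))  # approximately [0.8375, 0.5375]
```

Rescaling inside `weighted_mean` means raw weights can be passed in directly; the uniform outer average gives every category the same influence regardless of how many descriptors it contains.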
The embedding dimension $d$ is typically several hundred; we use $d = 1024$. We introduce the NLP algorithm of choice next.
A plethora of methods exist for embedding more than 17,000 textual descriptors into vector spaces. Common to all of them is the Distributional Hypothesis due to Harris (1954), which holds that words that occur in the same contexts tend to have similar meanings and which was popularized by Firth (1957) as "You shall know a word by the company it keeps". Natural choices include Word2Vec (Mikolov et al., 2013), its document version Doc2Vec/Paragraph2Vec (Le and Mikolov, 2014), GloVe (Pennington et al., 2014), Fasttext (Bojanowski et al., 2016; Joulin et al., 2016a,b), and BERT (Devlin et al., 2018). In this paper, we use an optimized version of BERT called RoBERTa (Liu et al., 2019), so our foundation is BERT. Our framework is not limited to specific embedding techniques, and our findings are robust to the choice of NLP method. We do not review all details of BERT/RoBERTa, as this is beyond our scope. Instead, we provide an intuitive explanation of BERT/RoBERTa and refer readers to Appendix B for more details or to Rogers et al. (2021) for an excellent review.
Let $T_m$ be a random piece of text (e.g., a sentence or a paragraph) from $\mathcal{T}$. Each $T_m$ is essentially a sequence of tokens (e.g., words or subwords), $t_{m,1}, \dots, t_{m,N_m}$. Further, let $X_m \in \mathbb{R}^d$. We assume that there is a function $f$ approximating the relationship between $X_m$ and $T_m$ as
$$X_m = f(t_{m,1}, \dots, t_{m,N_m}) + \varepsilon_m,$$
where $\varepsilon_m$ is noise. Conceptually, any text embedding model may be viewed as a nonparametric estimate $\hat{f}$ of the function $f$. The challenge is that the $X_m$ are latent variables that are not observed. The various methods consider different approaches to circumventing this.
BERT stands for Bidirectional Encoder Representations from Transformers and is an influential technique for learning general-purpose language representations, using a deep bidirectional Transformer encoder based on the original Transformer encoder-decoder of Vaswani et al. (2017). The bidirectional aspect of BERT means that it simultaneously learns a language representation from left to right and from right to left, in contrast to either a left-to-right model or the shallow concatenation of a left-to-right and a right-to-left model. Upon release, BERT achieved state-of-the-art results on a range of NLP tasks. Because the use of Transformers has become common, we omit an exhaustive description of the attention architecture and refer readers to Vaswani et al. (2017).
In general, BERT consists of two steps, namely a pre-training step and a fine-tuning step. First, the pre-training step encodes a large amount of semantic and syntactic information about the language by training the model on massive amounts of unlabeled textual data drawn from the web. The objective is to detect generic linguistic patterns in the text, and it is mathematically represented as the sum of two separate objectives: the Masked Language Model (MLM) objective and the Next Sentence Prediction (NSP) objective. The intuition behind MLM is that deep knowledge of a language is required to fill in the blanks in a sentence with missing words. This in turn requires reading the entire sentence, which is why bidirectional conditioning is important. The MLM is essentially a cloze procedure (Taylor, 1953) that masks a random sample of tokens by replacing the given token with the [MASK] token.
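The cloze-style masking can be made concrete with a short sketch. It follows the replacement rule reported for BERT (15% of tokens are selected; of those, 80% are replaced by [MASK], 10% are left unchanged, and 10% are replaced by a random vocabulary token) and pairs it with the per-token cross-entropy, −log p of the correct token, that the MLM objective evaluates at each selected position. This is a toy illustration under stated assumptions, not BERT's tokenizer or training code: the whitespace tokens, the tiny vocabulary, and the function names are hypothetical, and the sketch selects exactly round(0.15·N) positions rather than reproducing any particular implementation's sampling details.

```python
import math
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, rate=0.15, rng=None):
    """Corrupt a token sequence with the 80/10/10 rule described for BERT.

    Returns the corrupted copy and the sorted list of selected positions,
    i.e. the positions whose original tokens the model must predict."""
    rng = rng or random.Random()
    n_select = max(1, round(rate * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_select))
    corrupted = list(tokens)
    for pos in positions:
        u = rng.random()
        if u < 0.8:
            corrupted[pos] = MASK               # 80%: replace with [MASK]
        elif u < 0.9:
            pass                                # 10%: keep the original token
        else:
            corrupted[pos] = rng.choice(vocab)  # 10%: random vocabulary token
    return corrupted, positions


def cross_entropy(probs, target):
    """-sum_c y_c log p_c for a one-hot label y, i.e. -log p(target)."""
    return -math.log(probs[target])


# Hypothetical example built from the roof-bolter task quoted in the text.
tokens = "drill bolt holes into roofs at specified distances from ribs".split()
vocab = sorted(set(tokens))
corrupted, positions = mask_tokens(tokens, vocab, rng=random.Random(0))
print(corrupted, positions)
print(cross_entropy({"bolt": 0.5, "holes": 0.25, "roofs": 0.25}, "bolt"))  # about 0.693 (ln 2)
```

Keeping some selected tokens unchanged or randomly replaced means the model cannot assume that every unmasked token is correct, which is the usual rationale given for the 80/10/10 split.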
The MLM objective is then a cross-entropy loss on predicting the masked tokens, where the cross-entropy loss for a specific masked token, $t_0$, is
$$\ell(t_0) = -\sum_{c \in V} y_{t_0,c} \log p_{t_0,c},$$
where $y_{t_0,c}$ is a binary indicator that equals one if class $c$ is the correct token for $t_0$, and $p_{t_0,c}$ is the predicted probability that token $t_0$ is of class $c$ (footnote 8). In the original implementation, BERT performs the masking once, during data preprocessing, although in practice the masks are not the same for every sequence because the training data are duplicated with different masks. In contrast, NSP is used for understanding the relationship between sentences and is similarly a classification problem. Whereas MLM is a multiclass classification problem, however, NSP uses a binary classification loss for whether one sentence follows another in the original text. BERT optimizes the sum of the two loss functions using Adam (Kingma and Ba, 2015), and the parameters can be found in Table B.1 in Appendix B.
Second, fine-tuning is used to optimize performance for a specific downstream task, e.g., question answering, text summarization, or sentence embedding. BERT is thus an example of transfer learning: the language representation learned in the pre-training step can be used across many specific language tasks. Fine-tuning is often performed by adding one or more layers on top of the deep network obtained in pre-training. We follow Reimers and Gurevych (2019) and fine-tune BERT/RoBERTa to yield useful sentence embeddings by training the model on labeled sentence pairs and minimizing a mean-squared-error (MSE) loss between the cosine similarity of each pair's sentence embeddings and its label (footnote 9).
RoBERTa: A robustly optimized BERT
Liu et al. (2019) find that the original implementation of BERT was significantly undertrained and propose RoBERTa, a robustly optimized version of BERT. RoBERTa builds on the same Transformer model as BERT and also uses the MLM objective, learning to predict intentionally hidden pieces of text.
The main differences in training are that RoBERTa drops the NSP objective and makes the MLM masking dynamic, so that the masks change during training rather than being fixed once in preprocessing. In addition, RoBERTa is trained on more data and modifies some hyperparameters of the training process compared to BERT, e.g., larger batches and step sizes, leading to 4-5 times longer training times. The advantage is a 2-20% improvement over BERT on several benchmark tasks for NLP algorithms. In essence, RoBERTa outperforms BERT and could, at the time of release, match or exceed every NLP model published after BERT on all individual tasks of several benchmarks, which is why we choose RoBERTa as our preferred NLP algorithm. We do not repeat the implementation details of RoBERTa here and refer readers to Liu et al. (2019).
Footnote 8: BERT uniformly selects 15% of the input tokens and performs one of three replacements on the selected tokens: (1) 80% are replaced with [MASK], (2) 10% are left unchanged, and (3) 10% are replaced with a randomly selected token from V.
Footnote 9: In principle, the pre-training step is sufficient to construct the text embeddings, but Reimers and Gurevych (2019) demonstrate that these embeddings are not always meaningful. For this reason, we add the fine-tuning step.
In this section, we validate the occupation2vec framework through two exercises. The first exercise is exploratory and visual: we show that occupation vectors are well separated by major occupational group and required educational attainment. The second exercise is more formal and statistical: we use various metrics to assess the performance of our estimates of the same occupational attributes that O*NET provides. Note that if one is interested in characteristics that appear in the O*NET database (any of the 244 occupational attributes), there is no reason to use our approach; in this case, one should use the O*NET scores directly.
Our method is relevant when one considers novel characteristics that are not included in O*NET. Taking the O*NET scores as the ground truth simply allows for a transparent validation of our approach.
Having outlined how we quantify an occupation by transforming it into a high-dimensional vector, we illustrate the occupations in a two-dimensional space and color them by major occupational group and educational requirement in Figures 3 and 4, respectively. To reduce the 1024 dimensions of the embeddings to only two, we first use PCA to reduce the number of dimensions to 50. Then, we use t-SNE (van der Maaten and Hinton, 2008) to further reduce the dimensionality to two. Figure 3 shows that the 873 occupations can be well separated into the 22 major occupational groups. Note that the colors carry no interpretation in this figure, as groups are sorted alphabetically. For instance, Production Occupations in purple, Construction and Extraction Occupations in green, and Installation, Maintenance, and Repair Occupations in blue are all found in the lower right corner; despite the differences in color, these groups form a cluster of physical and manual occupations. Likewise, Figure 4 indicates that educational requirement is an important separator of occupations. In particular, the two extremes, No formal educational credential and Doctoral or professional degree, occupy two different positions in the two-dimensional space. Note that the colors do have an interpretation in this figure, because educational levels can be ordered. The two figures support the idea that occupations are quantifiable and separable, even though we compress all the information in the high-dimensional space (d = 1024) into only two dimensions. When comparing the figures, note that they would be identical without the coloring, because we perform the dimension reduction only once.
Recall that O*NET provides 244 occupational attributes for each of the 873 occupations. In principle, we can estimate the occupational degree of each of these attributes in the same way as for any target characteristic of interest. That is, we compare the occupational embeddings to each of the attribute embeddings, thereby quantifying the degree of each attribute. This gives us 244 attribute estimates for each of the 873 occupations, totaling 213,012 estimates that we can readily use to validate our framework, because we know the ground truth: the O*NET numerical scores that accompany each attribute's textual definition. Naturally, we cannot do this for the 873 occupation descriptions or the 16,804 occupation-specific tasks, because these descriptors are not comparable across occupations.
Between and within occupations
First, we assess the ability to correctly estimate an occupational attribute by considering correlations between and within occupations. We care about a high correlation between occupations because, for a specific attribute, we should be able to rank the occupations accordingly (e.g., how do judges compare to carpenters on critical thinking?). To assess between-occupations validity, we use the 213,012 attribute estimates and compute the Pearson correlation coefficient between the O*NET score and our estimate by attribute, yielding 244 correlation estimates. A high correlation within occupations is equally important because, for a specific occupation, we should be able to rank multiple attributes (e.g., how does static strength compare to dynamic strength for firefighters?). To assess within-occupations validity, we again use the 213,012 attribute estimates and compute the Pearson correlation coefficient, but this time by occupation, yielding 873 correlation estimates. With these two sets of correlations, we test whether each mean correlation differs significantly from zero using classic t-tests.
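Both validity checks can be sketched with the Python standard library. The sketch assumes the O*NET scores and our estimates are stored in two hypothetical dicts keyed by (occupation, attribute) pairs. Grouping by attribute gives one correlation per attribute across occupations (between-occupations validity); grouping by occupation gives one correlation per occupation across attributes (within-occupations validity). The one-sample statistic t = (mean − μ0)/(s/√m) tests whether the m correlations have mean μ0, with μ0 = 0 for the test of zero correlation; comparing t with a Student-t critical value on m − 1 degrees of freedom is left out here. All names and numbers are illustrative only.

```python
import math
from collections import defaultdict
from statistics import fmean, stdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def grouped_correlations(onet, est, by):
    """Correlate O*NET scores with our estimates group by group.

    Keys of `onet` and `est` are (occupation, attribute) pairs.
    by="attribute": one correlation per attribute, across occupations (between).
    by="occupation": one correlation per occupation, across attributes (within)."""
    if by not in ("attribute", "occupation"):
        raise ValueError("by must be 'attribute' or 'occupation'")
    idx = 1 if by == "attribute" else 0
    groups = defaultdict(lambda: ([], []))
    for key, score in onet.items():
        xs, ys = groups[key[idx]]
        xs.append(score)
        ys.append(est[key])
    return {g: pearson(xs, ys) for g, (xs, ys) in groups.items()}


def t_statistic(values, mu0=0.0):
    """One-sample t statistic for H0: the mean of `values` equals mu0."""
    values = list(values)
    return (fmean(values) - mu0) / (stdev(values) / math.sqrt(len(values)))


# Hypothetical scores for three occupations and three attributes.
onet = {
    ("judge", "critical thinking"): 0.9, ("judge", "static strength"): 0.2,
    ("judge", "oral comprehension"): 0.8, ("carpenter", "critical thinking"): 0.5,
    ("carpenter", "static strength"): 0.7, ("carpenter", "oral comprehension"): 0.4,
    ("firefighter", "critical thinking"): 0.6, ("firefighter", "static strength"): 0.9,
    ("firefighter", "oral comprehension"): 0.6,
}
est = {key: 0.5 * v + 0.1 * (i % 2) for i, (key, v) in enumerate(onet.items())}
between = grouped_correlations(onet, est, by="attribute")   # one r per attribute
within = grouped_correlations(onet, est, by="occupation")   # one r per occupation
print(t_statistic(between.values()), t_statistic(within.values()))
```

Passing a nonzero `mu0` to `t_statistic` gives the same test against any postulated mean correlation.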
In addition, we test whether each mean correlation differs significantly from hypothesized values ranging from 1% to 99% and report the first postulated correlation for which we fail to reject the null hypothesis of no difference at the 5% level. The results are reported in Table 4, where Table 4a shows the between-occupation tests and Table 4b shows the within-occupation tests. Both types of validity tests strongly reject the null hypothesis of zero correlation. In fact, we do not fail to reject the null hypothesis until the postulated correlation coefficient reaches about 50% (50% and 53% for between and within occupations, respectively). Note that, for the null hypothesis of zero correlation, the t-statistic of the within-occupation test is more than three times that of the between-occupation test. This indicates that our framework is relatively better at ranking attributes for a specific occupation than at ranking occupations for a specific attribute. Together, the validation exercises suggest a significant non-zero mean correlation of around 50% between the O*NET scores and our estimates.
Notes: This table shows the results from regressing the O*NET score on our estimate using eight specifications that each represent a column. Column (1) represents the baseline specification with no controls, whereas column (8) represents the full specification with occupation, descriptor, and category dummies included. Columns (2)-(7) represent the specifications between the baseline and the full specification, as detailed in rows 2-4. Coefficient estimates are shown in the first row alongside standard errors in parentheses and t-statistics in brackets. Superscripts ***, **, and * indicate statistical significance based on a (two-sided) t-test using heteroskedasticity-robust standard errors at significance levels 1%, 5%, and 10%, respectively.
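The eight regressions summarized in the table notes can be sketched with plain NumPy least squares, assuming long-format arrays of O*NET scores, estimates, and occupation, attribute, and category labels. The HC1 variant of the heteroskedasticity-robust variance is an assumption (the text only says "heteroskedasticity-robust"), and the function names are hypothetical. Collinear dummy sets, such as categories nested in attributes, are handled by the rank-deficient least-squares solve; the coefficient on the estimate stays identified as long as it is not a linear combination of the dummies.

```python
from itertools import combinations

import numpy as np


def dummies(labels: np.ndarray) -> np.ndarray:
    """One-hot encode labels, dropping the first level (absorbed by the constant)."""
    levels, idx = np.unique(labels, return_inverse=True)
    d = np.zeros((len(labels), len(levels)))
    d[np.arange(len(labels)), idx] = 1.0
    return d[:, 1:]


def ols_coef_robust(y, x, *fixed_effects):
    """Regress y on a constant, x and dummy sets; return (coef on x, HC1 standard error)."""
    X = np.column_stack([np.ones_like(x), x] + [dummies(fe) for fe in fixed_effects])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # tolerates collinear (nested) dummies
    resid = y - X @ beta
    n, k = len(y), np.linalg.matrix_rank(X)
    bread = np.linalg.pinv(X.T @ X)
    meat = (X * resid[:, None] ** 2).T @ X
    cov = bread @ meat @ bread * n / (n - k)  # HC1 sandwich estimator
    return float(beta[1]), float(np.sqrt(cov[1, 1]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    occ = np.repeat(np.arange(40), 30)
    att = np.tile(np.arange(30), 40)
    cat = att // 10  # hypothetical: attributes nested in three categories
    est = rng.normal(size=occ.size)
    onet = 0.5 * est + 0.1 * att + rng.normal(scale=0.5, size=occ.size)
    fe = {"occupation": occ, "attribute": att, "category": cat}
    for r in range(len(fe) + 1):  # 1 + 3 + 3 + 1 = 8 specifications
        for combo in combinations(fe, r):
            b, se = ols_coef_robust(onet, est, *(fe[c] for c in combo))
            print(combo or "baseline", round(b, 3), round(se, 3))
```

At the paper's scale (213,012 rows and more than 1,000 dummy columns) a dense design matrix is memory-heavy, so a sparse or fixed-effect-absorbing estimator would be the more practical choice; the toy version only illustrates the specification grid.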
Overall explainability
As a second validation exercise, we run various regressions of the O*NET scores on our estimates and a constant, using the entire set of 213,012 attribute estimates. The specifications we consider are the eight combinations that can be generated from occupation, attribute, and category dummies. That is, the baseline specification includes no control dummies, whereas the full specification includes all three sets of dummies. In Table 5, we report the results from all eight regressions. The first row shows the coefficient on our measure, the standard error in parentheses, and the t-statistic in brackets. Rows 2-4 detail the specification, and the last two rows show the adjusted R² and the number of observations, respectively. It follows from Table 5 that the coefficient under scrutiny is highly significant and robust across all specifications. In fact, the conditional correlation varies narrowly from 52.1% to 52.3%. This further supports the consistency, reliability, and validity of our proposed occupation2vec framework. In this section, we estimate the occupational degree of both some well-known characteristics (task measures and AI measures) and some novel characteristics (charisma and EQ). We show how our occupation2vec framework ranks occupations according to these characteristics and how our estimates correlate with various occupational statistics, e.g., educational requirements and wages. All occupational statistics are sourced from the U.S. Bureau of Labor Statistics. For the well-known characteristics, we also assess the similarity of our estimates to those already established in the literature.
A.1. The occupational degree of task measures
We revisit the seminal work of Autor et al. (2003, 2006), Acemoglu and Autor (2011), Autor and Dorn (2013), and Autor and Handel (2013) to study task measures, namely abstract, manual, and routine tasks.
In Table C.1 in Appendix C, we outline the individual O*NET attributes that comprise the three task measures, and we refer to Acemoglu and Autor (2011) for details on precisely how to construct them. We focus on the degree of routine tasks and refer to Appendix C for similar analyses of the degree of abstract and manual tasks. We start by defining routine tasks in Definition 2, relying heavily on Acemoglu and Autor (2011). Recall that we feed this definition into our NLP method, which returns an embedding in the same vector space as the occupations. We then measure the occupational degree of routine tasks by the standardized correlation coefficient between each occupational embedding and the embedding of routine tasks. Definition 2 (Routine tasks). "Routine tasks are cognitive or physical tasks, which follow closely prescribed sets of precise rules and well-defined procedures and are executed in a well-controlled environment. These routine tasks are increasingly codified in computer software and performed by machines or sent electronically to foreign worksites to be performed by comparatively low-wage workers. Routine tasks are characteristic of many middle-skilled cognitive and production activities, such as bookkeeping, clerical work and repetitive production tasks." In Table 6, we list the top and bottom 10 occupations on the degree of routine tasks. The results are as expected, and the NLP algorithm even picks up the specific examples in Definition 2, such as clerical work (e.g., rank 2) and bookkeeping (e.g., rank 4). All top 10 occupations are truly characterized by accomplishing codifiable tasks that can be specified as a series of instructions to be executed by a machine.
Notes: This table shows the top and bottom 10 occupations on degree of routine tasks estimated by the occupation2vec framework as the standardized correlation coefficient between occupation embeddings and the embedding of routine tasks.
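The scoring step can be sketched as follows, assuming each definition and each occupation has already been embedded as a NumPy vector. In this sketch the correlation is computed across the embedding dimensions, and standardization is a z-score across occupations using the population standard deviation (an assumption; the text does not specify the variant). Where a characteristic has several definitions, as for AI and charisma in the appendices, the definition embeddings are averaged first. All function names are hypothetical.

```python
import numpy as np


def characteristic_embedding(definition_vectors) -> np.ndarray:
    """Combine the embeddings of one or more definitions by a simple average."""
    return np.mean(np.asarray(definition_vectors, dtype=float), axis=0)


def correlation_with_target(occupation_vectors: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Pearson correlation, across embedding dimensions, of each occupation vector with the target."""
    occ = occupation_vectors - occupation_vectors.mean(axis=1, keepdims=True)
    tgt = target - target.mean()
    return (occ @ tgt) / (np.linalg.norm(occ, axis=1) * np.linalg.norm(tgt))


def standardize(scores: np.ndarray) -> np.ndarray:
    """z-score across occupations (population standard deviation)."""
    return (scores - scores.mean()) / scores.std()


def top_bottom(scores: np.ndarray, names, k: int = 10):
    """Names of the k highest- and k lowest-scoring occupations, each listed high to low."""
    order = np.argsort(scores)[::-1]
    return [names[i] for i in order[:k]], [names[i] for i in order[-k:]]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    occupations = rng.normal(size=(873, 1024))       # placeholder occupation embeddings
    names = [f"occupation_{i}" for i in range(873)]  # hypothetical labels
    definitions = rng.normal(size=(3, 1024))         # placeholder definition embeddings
    target = characteristic_embedding(definitions)
    degree = standardize(correlation_with_target(occupations, target))
    top, bottom = top_bottom(degree, names)
    print(top[:3], bottom[-3:])
```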
In stark contrast to these top-ranked occupations, the tasks of psychiatrists are by no means codifiable. Given the ability of the framework to separate occupations by major occupational group and educational requirement, we show the degree of routine tasks by these two attributes in Figures 5 and 6, respectively. Figure 5 confirms that occupations with a higher degree of routine tasks are found in office and administrative support occupations, whereas the opposite holds for community and social service occupations as well as educational instruction and library occupations. Figure 6 further shows that routine tasks are characteristic of occupations that require low-to-medium educational attainment, whereas occupations that require bachelor's, master's, or doctoral degrees rarely involve routine tasks. We next highlight how our estimates largely coincide with the original measures by Autor and Handel (2013). Specifically, for both measures of routine tasks, we estimate a smoothed polynomial regression of each occupation's measure against its rank in the wage distribution and against its rank in the employment growth distribution. The results are shown in Figures 7a and 7b, respectively. Overall, the two approaches agree nearly perfectly on the relationship between the degree of routine tasks and the wage and employment growth ranks. Figure 7a shows that occupations in the upper part of the wage distribution tend to be characterized by fewer routine tasks, whereas the opposite holds for the lower part of the wage distribution, where mainly occupations with many routine tasks reside. An identical pattern holds for employment growth: the occupations with the highest expected growth are not characterized by performing many routine tasks, a trend documented by many (see, e.g., Acemoglu and Autor (2011)). As a final comparison check, we compute various correlation coefficients between our estimates of abstract, manual, and routine tasks and the original ones.
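The three association measures reported in Table 7 (Pearson, Kendall, and Spearman) can be computed without an external statistics library, as sketched below. `kendall_tau_b` uses the tau-b tie correction, which is an assumption since the text does not state which Kendall variant was used; `scipy.stats.pearsonr`, `kendalltau`, and `spearmanr` are established alternatives. The function names here are hypothetical, and the O(n²) pair loop is adequate for roughly 900 occupations.

```python
import numpy as np


def average_ranks(a) -> np.ndarray:
    """1-based ranks; tied values share the mean of their positions."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        i = j + 1
    return ranks


def pearson(x, y) -> float:
    return float(np.corrcoef(x, y)[0, 1])


def spearman(x, y) -> float:
    """Spearman's rho = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y) -> float:
    """Kendall's tau-b via an O(n^2) loop over all pairs."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nc = nd = tx = ty = 0
    for i in range(len(x) - 1):
        dx = np.sign(x[i + 1:] - x[i])
        dy = np.sign(y[i + 1:] - y[i])
        nc += int(np.sum(dx * dy > 0))            # concordant pairs
        nd += int(np.sum(dx * dy < 0))            # discordant pairs
        tx += int(np.sum((dx == 0) & (dy != 0)))  # tied in x only
        ty += int(np.sum((dy == 0) & (dx != 0)))  # tied in y only
    return float((nc - nd) / np.sqrt((nc + nd + tx) * (nc + nd + ty)))


if __name__ == "__main__":
    ours = [0.2, 1.1, -0.3, 0.8, 1.5]     # placeholder standardized scores
    original = [0.1, 0.9, 0.0, 0.4, 1.2]  # placeholder original task measure
    print(pearson(ours, original), kendall_tau_b(ours, original), spearman(ours, original))
```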
We show the coefficients in Table 7, where we include Pearson's correlation as a default measure of linear association as well as two non-parametric alternatives, namely Kendall's and Spearman's rank correlations. For all three task measures (and all three correlation coefficients), the degree of association between the two approaches appears strong and significant, particularly for manual and routine tasks. Turning to the occupational degree of AI, we use Definitions 6 to 14 presented in Appendix D. Similarly, all figures and tables for this application are deferred to Appendix D. We start by considering the top and bottom 10 occupations on degree of AI from Table D.1. In the top 10, we find, e.g., data scientists and statisticians, which is in line with our expectations. These occupations involve tasks that require various forms of, e.g., predictive and prescriptive analytics, which are integral parts of AI. In the bottom 10, we find, e.g., manicurists, pedicurists, and stonemasons. We show the degree of AI by major occupational group and by educational requirement in Figures D.1 and D.2, respectively. As expected, the top major occupational group is computer and mathematical occupations, which coincides with the top occupations from Table D.1. The educational requirements tend to be higher for occupations with high degrees of AI, which makes sense as AI is a difficult skill to acquire, but the association does not increase monotonically with education. Next, we compare our estimates of the occupational degree of AI to the already established measures, namely those of Felten et al. (2018), Brynjolfsson et al. (2018), and Webb (2019). In Figure D.3, we repeat the previous analysis and estimate a smoothed polynomial of each measure for each occupation against its rank in the wage and employment growth distributions.
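The smoothed polynomial regression against wage rank used throughout this section can be sketched as follows. The sketch assumes unweighted occupations, percentile ranks from 1 to 100, and a cubic polynomial fitted with NumPy's `Polynomial.fit`; the paper does not report the polynomial degree or any employment weighting, so these are illustrative choices, and the function names are hypothetical. The same function applies to the employment growth ranking by passing projected growth rates instead of wages.

```python
import numpy as np


def percentile_rank(values) -> np.ndarray:
    """Map each occupation to its percentile rank (0, 100] in the distribution of `values`."""
    values = np.asarray(values, dtype=float)
    ranks = np.empty(len(values))
    ranks[np.argsort(values, kind="mergesort")] = np.arange(1, len(values) + 1)
    return 100.0 * ranks / len(values)


def smoothed_profile(measure, ranking_variable, degree: int = 3, grid=None):
    """Fit a polynomial of `measure` on percentile rank and evaluate it on a grid."""
    pct = percentile_rank(ranking_variable)
    if grid is None:
        grid = np.arange(0, 101)
    fit = np.polynomial.Polynomial.fit(pct, np.asarray(measure, dtype=float), degree)
    return grid, fit(grid)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wages = rng.lognormal(mean=10.5, sigma=0.5, size=873)  # placeholder median wages
    routine = rng.normal(size=873)  # placeholder standardized routine-task scores
    grid, curve = smoothed_profile(routine, wages)
    print(curve[[10, 50, 90]])  # fitted values at the 10th, 50th, and 90th percentiles
```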
For both labor market statistics, our measure of the occupational degree of AI lies between the measures of Felten et al. (2018) and Webb (2019), meaning that its relationship to occupational wage and employment growth is a mix of those of these well-established measures. Interestingly, our measure agrees relatively more with that of Felten et al. (2018) for high-wage occupations and relatively more with that of Webb (2019) for low-wage occupations. In fact, the correlation coefficients between our measure and these two measures are 0.38 and 0.28, respectively, as shown in Table D.2. Specifically, occupations receiving higher wages tend to have a higher degree of AI than occupations that receive less. The picture is less clear for the relationship between the degree of AI and employment growth, where no significant pattern appears. In contrast, our measure does not align well with that of Brynjolfsson et al. (2018), which seems to pick up something different. That the three established measures pick up different aspects of AI was already found by Acemoglu et al. (2020), who also find that the measures of
The second part of this section considers two novel characteristics of occupations, namely the degree of charisma and EQ. To economize on space, all figures and tables for the EQ application are deferred to Appendix F. The first systematic treatment of charisma is due to Weber (1947), and the concept was popularized by Dow (1969). Charisma is at the very center of effective leadership (Avolio and Yammarino, 2013) and continues to be studied (see, e.g., von Hippel et al. (2016)). It has been recognized as one of the main explanations for why certain leaders, i.e., charismatic leaders, develop emotional attachments with followers and other leaders that eventually foster performance surpassing expectations.
Likewise, charisma is associated with being considerate, inspirational, visionary, and intellectually stimulating (for a review, see, e.g., Spencer (1973) or Turner (2003)). One definition of charisma is given in Definition 3, which is one of the definitions we use to construct the charisma embedding. Definition 3 (Charisma). "The special quality of personality that enables an individual to attract and gain the confidence of large numbers of people. It is exemplified in outstanding political, social, and religious leaders." (American Psychological Association, 2021a) We begin our analysis by highlighting the top and bottom 10 occupations on the degree of charisma in Table 8. Actors, fundraisers, and sociologists are among the top occupations associated with charisma, because these occupations require both the ability to catch the attention of an audience and knowledge of group behavior and dynamics, societal trends, and influences. The same holds for teachers (particularly within special education and psychology), who must be knowledgeable about human behavior and capable of assessing individual differences in personality and interests. An interesting, if unsurprising, finding is that clergy are among the top occupations on charisma, as these formal religious leaders must be able to speak to the public and gain followers by presiding over specific rituals and teaching their religion's doctrines. Generally, occupations within arts, design, entertainment, sports, media, education, and sales score high on charisma. At the other end of the spectrum, we mostly find occupations within construction, installation, maintenance, and production. These occupations have less need to inspire and motivate large numbers of people. This finding is confirmed by Figure 8, which shows the occupational degree of charisma by major occupational group.
Interestingly, an almost monotonic pattern appears between educational requirements and degree of charisma, as shown in Figure 9. The extent of charisma is comparable across occupations that generally do not require higher degrees, whereas occupations requiring a bachelor's degree or higher are increasingly associated with charisma. Last, we examine how charisma is connected to wage and employment growth in Figure 10 by means of a smoothed polynomial regression. Figure 10a indicates a non-monotonic relationship between charisma and wage. In the lowest part of the wage distribution, occupations tend to have high degrees of charisma. These occupations include, e.g., actors (presumably non-Hollywood actors), teachers, and social workers. In contrast, the occupations with the lowest scores on charisma are on average found between the first and second quartiles of the wage distribution, whereas occupations in the upper part of the wage distribution are characterized by high degrees of charisma. The latter are occupations that require advanced degrees and leadership capabilities. Considering employment growth in Figure 10b, the relationship is almost linear and positive, meaning that the employment outlook for occupations with high degrees of charisma is bright. Although the term was popularized by Goleman (1995), the concept of EQ had been studied earlier, starting with Maslow (1950), Beldoch (1964), and Leuner (1966). Generally speaking, EQ refers to the ability to intelligently perceive, understand, and manage emotions, and people with high EQ use this ability to guide thinking and behavior. The occupational degree of EQ is particularly interesting as EQ has been shown to correlate positively with many desirable outcomes. In summary, a recent review by Mayer et al.
(2008) documents that higher EQ correlates with better social relations, better perceptions by others, better academic achievement, and better general well-being, e.g., higher life satisfaction. We begin the analysis with Table F.1, tabulating the top and bottom 10 occupations on EQ, and Figure F.1, showing EQ by occupational group. All top 10 occupations are therapists, counselors, psychologists, or teachers, typically within social service or health care, and these occupations truly require the ability to reason about emotions to support others in various needs. In contrast, the bottom 10 occupations, e.g., pipelayers, plumbers, or carpenters, require less emotional knowledge and more arm-hand steadiness and manual dexterity. This applies more generally to occupations within construction, maintenance, installation, and repair. An interesting tail effect appears when considering EQ by educational requirement in Figure F.2. We find a monotonic relationship from low education and low EQ to high education and high EQ, but this monotonicity stops at occupations that on average require a doctorate. This may indicate that occupations requiring a doctorate, rather than a master's degree, need highly specialized employees who focus deeply on very complex problems and may not need to engage as often with larger groups of coworkers. Last, we consider the usual smoothed regression against ranks in the wage and employment growth distributions and plot the results in Figure F.3. The EQ-wage relationship in Figure F.3a is U-shaped up to the median wage percentile and linear thereafter. This means that while occupations in the very low part of the wage distribution generally score higher on EQ than occupations around the first quartile, increasing levels of EQ are generally associated with higher-wage occupations.
That EQ is positively correlated with wage at the individual level is found by Rode et al. (2017); however, the full distribution shows that, in aggregate, relatively high levels of EQ are also found in very low-wage occupations. Recall that we find a fully U-shaped relationship between occupational charisma and wage in Figure 10a, but we cannot recover this pattern for EQ. This tentatively suggests that, in terms of wage returns, EQ is on average a more difficult characteristic to achieve than charisma. However, we emphasize that these relationships are mere correlations, and we do not attempt to draw any causal inference. The employment outlook for occupations with high degrees of EQ is bright, as documented by Figure F.3b. It is inherently interesting to study occupational characteristics and how they relate to descriptive statistics of the labor market, e.g., employment and wage growth, educational attainment, and labor force participation rates. Such studies inform policy-makers on how to optimally design education and labor markets. But estimating the occupational degree of certain target characteristics is challenging for many reasons. Most importantly, it may be very difficult to validate and verify novel measures of occupational characteristics once constructed. This implicitly leaves too much room for researcher creativity with regard to the selection of occupational features that enter the composition of new measures. To address this issue, we propose occupation2vec as a fully data-driven and general approach to quantifying occupations, which enables the measurement of any occupational characteristic of interest in a transparent and verifiable way. Using natural language processing, we embed more than 17,000 occupation-specific descriptors sourced from O*NET as vectors and combine them into unique high-dimensional occupation vectors.
Using an objective and reliable definition of a given target characteristic of interest, we also embed this definition as a high-dimensional vector and measure the occupational degree of the characteristic as the standardized correlation coefficient between the vectors. Using the actual O*NET scores on 244 attributes common to all occupations as the ground truth, we extensively validate our approach by comparing our estimates to those scores. We find that our estimates explain more than 50% of the variation in the original scores, and, jointly with other validation exercises, we take this as evidence that our framework is capable of producing high-quality occupation vectors that accurately capture detailed aspects of occupations. We consider four applications of occupation2vec, studying the occupational degree of two known characteristics as well as two novel ones. As known, or previously studied, characteristics, we consider the popular task measures (abstract, manual, and routine tasks) and exposure to AI. Our estimates of the task measures largely match the original ones. For instance, we also find that occupations involving many routine or manual tasks tend to appear at the bottom of the wage distribution, whereas the opposite holds for abstract tasks. Regarding exposure to AI, our measure also broadly agrees with the previously proposed measures and can in fact be seen as a mix of the two most widely used ones. As novel characteristics, we consider the occupational degree of charisma and EQ. We find that occupations that score high on these attributes tend to be found within community and social service occupations; arts, entertainment, sports, and media occupations; and educational instruction occupations. High scores occur most often for occupations that require advanced degrees and for those at the top of the wage distribution.
The most striking difference is found for occupations at the bottom of the wage distribution, where scores tend to be high on charisma but not on EQ. This suggests that wage returns to charisma and EQ could be very different. In summary, this paper proposes a data-driven and universal approach to representing occupations as mathematical objects. This has many interesting economic applications; in particular, we demonstrate how one may advance from purely qualitative definitions to quantitative scores at the occupational level. The occupation2vec framework therefore opens many doors for future analyses of how novel occupational characteristics that have previously been inaccessible to researchers matter for labor market outcomes.
A. Data appendix
Rating indicates the degree of importance a particular descriptor is to the occupation.
Knowledge, Skills, Abilities, and Work Activities: Rating indicates the degree to which a particular descriptor is required or needed to perform the occupation.
Relevance (Tasks): Rating indicates the degree to which a particular task is relevant to performing the occupation.
Frequency (Tasks): Rating indicates how frequently a particular task is performed within a given time period.
Occupational Interest: Rating indicates the degree to which a particular interest (Holland, 1985) matches the occupation.
Work Values: Rating indicates the degree to which a particular work value affects the nature of the occupation.
Work Context: Rating indicates the degree to which a particular work context influences the nature of the occupation.
Notes: This table shows the scales associated with each descriptor within categories. All scales are standardized to range from 0 to 1. See the O*NET site on scales, ratings, and standardized scores for more information.
Let L denote the number of layers (i.e., Transformer blocks), let d denote the hidden size (i.e., the dimensions of the embeddings), and let A denote the number of self-attention heads.
The model size we consider uses $L = 24$, $d = 1024$, and $A = 16$. All hyperparameters can be found in Table B.1. For $m = 1, \dots, N$, let $T_m$ be a piece of text (e.g., a sentence or a paragraph) from $T$, where we ignore the subscripts $i$, $k$, $j$. Each $T_m$ is a sequence of tokens (e.g., words or subwords), $t_{m,1}, \dots, t_{m,N_m}$, generated by WordPiece tokenization (Wu et al., 2016). BERT takes as input a concatenation of two sequences, $t_{m,1}, \dots, t_{m,N_m}$ and …, where $l(t, V)$ is a lookup function that returns the index of token $t$ in the vocabulary $V$ of all tokens, $p(t, \mathrm{tok}(T_m, T_m))$ is a position function that returns the index of token $t$ in the concatenated sequences $\mathrm{tok}(T_m, T_m)$, and $s(t, \mathrm{tok}(T_m, T_m))$ is a sequence function that returns the index of the sequence to which token $t$ belongs in $\mathrm{tok}(T_m, T_m)$. Essentially, each token is represented as three integers that determine the position of the token in the vocabulary and in the concatenated sequences, respectively, and the individual sequence to which the token belongs. Last, each integer (i.e., token, position, and sequence) is passed through an embedding layer $\phi_h: \mathbb{R} \to \mathbb{R}^d$ for $h = l, p, s$ that returns a $d$-dimensional embedding. The three embeddings are summed to generate the final input embedding $e_t \equiv e(t; V, T_m, T_m)$. That is,
$$e(t; V, T_m, T_m) = \phi_l\big(l(t, V)\big) + \phi_p\big(p(t, \mathrm{tok}(T_m, T_m))\big) + \phi_s\big(s(t, \mathrm{tok}(T_m, T_m))\big) \tag{B.5}$$
We show the full process of turning tokens into input embeddings in Figure B.1. These input embeddings are then passed through the layers of the deep Transformer model. During training, BERT considers two primary objectives, i.e., the MLM objective and the NSP objective, as described in the main text.
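A toy sketch of Eq. (B.5), together with the mean-pooling step the paper uses to turn token vectors into a sentence embedding, is given below. It assumes randomly initialized NumPy lookup tables in place of the learned embedding layers $\phi_l$, $\phi_p$, $\phi_s$ and skips the 24 Transformer blocks entirely; the vocabulary size, maximum sequence length, token ids, and function names are hypothetical. In the actual model, `mean_pool` would be applied to the final-layer token states rather than to the input embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 1024  # hidden size d used in the paper
VOCAB_SIZE, MAX_LEN, N_SEQUENCES = 1000, 512, 2  # hypothetical toy sizes

# Randomly initialized stand-ins for the learned embedding layers phi_l, phi_p, phi_s.
phi_l = rng.normal(scale=0.02, size=(VOCAB_SIZE, D))
phi_p = rng.normal(scale=0.02, size=(MAX_LEN, D))
phi_s = rng.normal(scale=0.02, size=(N_SEQUENCES, D))


def input_embeddings(token_ids: np.ndarray, sequence_ids: np.ndarray) -> np.ndarray:
    """Eq. (B.5): sum of token, position, and sequence embeddings for every token."""
    positions = np.arange(len(token_ids))  # index of each token in the concatenation
    return phi_l[token_ids] + phi_p[positions] + phi_s[sequence_ids]


def mean_pool(token_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Sentence embedding as the average of token vectors, ignoring padded positions."""
    mask = attention_mask[:, None].astype(float)
    return (token_states * mask).sum(axis=0) / mask.sum()


if __name__ == "__main__":
    tokens = np.array([1, 57, 300, 2, 0, 0])  # hypothetical ids; last two are padding
    seqs = np.zeros_like(tokens)
    mask = np.array([1, 1, 1, 1, 0, 0])
    e = input_embeddings(tokens, seqs)  # (6, 1024); the real model would run 24 layers here
    print(mean_pool(e, mask).shape)  # (1024,)
```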
In our framework, the object of interest is the sentence or paragraph embedding of $T_m$, and, generally, three strategies exist to compute these embeddings once the final word embeddings have been estimated: mean-pooling, which uses the average of all the word embeddings from $T_m$; max-pooling, which uses the element-wise maximum of all the word embeddings from $T_m$; and first-pooling, which uses the first of all the word embeddings from $T_m$ (i.e., the [CLS] token). We use mean-pooling to construct our sentence embeddings. Technically, we add a pooling operation to the output of BERT/RoBERTa to derive the sentence embeddings, as proposed by Reimers and Gurevych (2019). This is essentially a fine-tuning layer such that BERT/RoBERTa generates better sentence embeddings. For more details, we refer interested readers to Reimers and Gurevych (2019). The hyperparameters used for pre-training follow from Table B.1. Definition 4 (Abstract tasks). "Abstract tasks are problem-solving and managerial tasks, which require intuition, persuasion, and creativity and are unstructured and nonroutine. These abstract tasks are performed by workers who have high levels of education and analytical capability, and they benefit from computers that facilitate the transmission, organization, and processing of information. Abstract tasks are characteristic of professional, managerial, technical, and creative occupations, such as law, medicine, science, engineering, marketing, and design." Definition 5 (Manual tasks). "Manual tasks are innate tasks like dexterity, sightedness, and visual and language recognition, which require adaptability, flexibility, and in-person interactions. These tasks are typically nonpredictable but straightforward. Manual tasks are characteristic of construction and service occupations, such as truck drivers, janitors, and house-cleaners."
Definition of artificial intelligence
The following definitions jointly define artificial intelligence.
The definitions are from the IEEE Guide for Terms and Concepts in Intelligent Process Automation, IEEE Std 2755™-2017. Starting from Artificial Intelligence in Definition 7, we repeatedly include the definitions of all related terms until we arrive at these nine definitions. Using our NLP algorithm, we obtain one embedding for each definition, which we finally combine into one embedding by a simple average. Definition 6 (Artificial General Intelligence). "Complex, computational artificial intelligence capable of providing descriptive, discovery, predictive, prescriptive, and deductive analytics with relevance and accuracy equal to or exceeding human experts in multiple general knowledge domains. Artificial general intelligence includes artificial intelligence systems capable of interacting naturally with humans and machines in a way undetectable to expert observers and consistently passing the Turing Test for artificial intelligence." Definition 7 (Artificial Intelligence). "The combination of cognitive automation, machine learning, reasoning, hypothesis generation and analysis, natural language processing, and intentional algorithm mutation producing insights and analytics at or above human capability." Definition 8 (Machine Learning). "The combination of cognitive automation, machine learning, reasoning, hypothesis generation and analysis, natural language processing, and intentional algorithm mutation producing insights and analytics at or above human capability." Definition 9 (Cognitive Automation). "The identification, assessment, and application of available machine learning algorithms for the purpose of leveraging domain knowledge and reasoning to further automate the machine learning already present in a manner that may be thought of as cognitive.
With cognitive automation, the system performs corrective actions driven by knowledge of the underlying analytics tool itself, iterates its own automation approaches and algorithms for more expansive or more thorough analysis, and is thereby able to fulfill its purpose. The automation of the cognitive process refines itself and dynamically generates novel hypotheses that it can likewise assess against its existing corpus and other information resources." Definition 10 (Descriptive Analytics). "Insights, reporting, and information answering the question, "Why did something happen?" Descriptive analytics determines information useful to understanding the cause(s) of an event(s)." Definition 11 (Deductive Analytics). "Insights, reporting, and information answering the question, "What would likely happen IF...?" Deductive analytics evaluates causes and outcomes of possible future events." Definition 12 (Diagnostic Analytics). "Insights, reporting, and information answering the question, "Why did something happen?" Diagnostic analytics determines information useful to understanding the cause(s) of an event(s)." Definition 13 (Predictive Analytics). "Insights, reporting, and information answering the question, "What is likely to happen?" Predictive analytics support high confidence foretelling of future event(s)." Definition 14 (Prescriptive Analytics). "Insights, reporting, and information answering the question, "What should I do about it?" Prescriptive analytics determines information that provides high confidence actions necessary to recover from an event or fulfill a need."
Definition of charisma
The following definitions jointly define charisma. The definitions are from Psychology Today, Cambridge University Press, Oxford University Press, and Wikipedia, respectively. In addition, we use the American Psychological Association's Dictionary of Psychology (see Definition 3).
Using our NLP algorithm, we obtain one embedding for each definition, which we finally combine into one embedding by a simple average. Definition 15 (Charisma). "Charisma is an individual's ability to attract and influence other people. While it is often described as a mysterious quality that one either has or doesn't have, some experts argue that the skills of charismatic people can be learned and cultivated." (Psychology Today, 2021a) Definition 16 (Charisma). "A special power that some people have naturally that makes them able to influence other people and attract their attention and admiration." (Cambridge University Press, 2021a) Definition 17 (Charisma). "A quality possessed by some individuals that encourages others to listen and follow. Charismatic leaders tend to be self-confident, visionary, and change oriented, often with eccentric or unusual behavior." (Oxford University Press, 2021a) Definition 18 (Charisma). "Charisma is compelling attractiveness or charm that can inspire devotion in others. Scholars in sociology, political science, psychology, and management reserve the term for a type of leadership seen as extraordinary; in these fields, the term charisma is used to describe a particular type of leader who uses values-based, symbolic, and emotion-laden leader signaling." (Wikipedia, 2021a)
F. Degree of emotional intelligence
Definition of emotional intelligence
The following definitions jointly define emotional intelligence. The definitions are from the American Psychological Association, Psychology Today, Cambridge University Press, Oxford University Press, and Wikipedia, respectively. Using our NLP algorithm, we obtain one embedding for each definition, which we finally combine into one embedding by a simple average. Definition 19 (Emotional intelligence). "A type of intelligence that involves the ability to process emotional information and use it in reasoning and other cognitive activities, proposed by U.S. psychologists Peter Salovey and John D.
Mayer. According to Mayer and Salovey's 1997 model, it comprises four abilities: to perceive and appraise emotions accurately; to access and evoke emotions when they facilitate cognition; to comprehend emotional language and make use of emotional information; and to regulate one's own and others' emotions to promote growth and well-being. Their ideas were popularized in a best-selling book by U.S. psychologist and science journalist Daniel J. Goleman who also altered the definition to include many personality variables." (American Psychological Association, 2021b) Definition 20 (Emotional intelligence). "Emotional intelligence refers to the ability to identify and manage one's own emotions, as well as the emotions of others. Emotional intelligence is generally said to include a few skills: namely emotional awareness, or the ability to identify and name one's own emotions; the ability to harness those emotions and apply them to tasks like thinking and problem solving; and the ability to manage emotions, which includes both regulating one's own emotions when necessary and helping others to do the same." (Psychology Today, 2021b) Definition 21 (Emotional intelligence). "The ability to understand the way people feel and react and to use this skill to make good judgments and to avoid or solve problems." (Cambridge University Press, 2021b) Definition 22 (Emotional intelligence). "Ability to monitor one's own and other people's emotions, to discriminate between different emotions and label them appropriately, and to use emotional information to guide thinking and behavior. It encompasses four competencies: the ability to perceive, appraise, and express emotions accurately; the ability to access and evoke emotions when they facilitate cognition; the ability to comprehend emotional messages and to make use of emotional information; and the ability to regulate one's own emotions to promote growth and well-being. (. . .) 
Popularized interpretations of emotional intelligence include various other factors such as interpersonal skills and adaptability." (Oxford University Press, 2021b) Definition 23 (Emotional intelligence). "Emotional intelligence is most often defined as the ability to perceive, use, understand, manage, and handle emotions. People with high emotional intelligence can recognize their own emotions and those of others, use emotional information to guide thinking and behavior, discern between different feelings and label them appropriately, and adjust emotions to adapt to environments." (Wikipedia, 2021b) Top and bottom 10 occupations Notes: This figure shows a smoothed polynomial regression of our standardized measure of emotional intelligence in each 6-digit SOC occupation against its rank in the wage distribution (F.3a) and the employment growth distribution (F.3b). Skills, Tasks and Technologies: Implications for Employment and Earnings AI and Jobs: Evidence from Online Vacancies Definition of charisma Definition of emotional intelligence The growth of low-skill service jobs and the polarization of the U.S. labor market Putting Tasks to the Test: Human Capital, Job Tasks, and Wages The polarization of the U.S. labor market The Skill Content of Recent Technological Change: An Empirical Exploration* Transformational and Charismatic Leadership: The Road Ahead 10th Anniversary Edition Sensitivity to expression of emotional meaning in three modes of communication, The Communication of Emotional Meaning, Davitz The Growing Importance of Social Skills in the Labor Market* BERT: Pretraining of Deep Bidirectional Transformers for Language Understanding How Many Jobs Can be Done at Home? The Theory of Charisma A Method to Link Advances in Artificial Intelligence to Occupational Abilities A Synopsis of Linguistic Theory 1930-1955 The future of employment: How susceptible are jobs to computerisation? 
A Grand Gender Convergence: Its Last Chapter
Emotional intelligence
Making vocational choices: A theory of vocational personalities and work environments
FastText.zip: Compressing text classification models
Bag of Tricks for Efficient Text Classification
Adam: A Method for Stochastic Optimization
Distributed Representations of Sentences and Documents
Emotionale Intelligenz und Emanzipation. Eine psychodynamische Studie über die Frau. [Emotional intelligence and emancipation. A psychodynamic study on women]
RoBERTa: A Robustly Optimized BERT Pretraining Approach
A time-lagged study of emotional intelligence and salary
A Primer in BERTology: What We Know About How BERT Works
What Is Charisma?
"Cloze Procedure": A New Tool for Measuring Readability
Charisma Reconsidered
Visualizing Data using t-SNE
Quick Thinkers Are Smooth Talkers: Mental Speed Facilitates Charisma
The Impact of Artificial Intelligence on the Labor Market
The theory of social and economic organization
Article on emotional intelligence
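The construction used above for both target characteristics, with one embedding per definition combined by a simple average, can be combined with the scoring rule of the main text, where an occupation's degree of the characteristic is the correlation between its vector and the target vector. The following Python sketch illustrates this and also standardizes the scores across occupations, as in the standardized measures plotted in the figures. It assumes the embeddings are already available as equal-length lists of floats (the NLP model itself is not shown). It reads "correlation" as Pearson correlation computed across vector dimensions. The function names and toy vectors are hypothetical.

```python
# Sketch of the target-vector and scoring steps described in the text.
# ASSUMPTIONS (not from the paper): embeddings are given as equal-length
# lists of floats; "correlation" is read as Pearson correlation computed
# across vector dimensions; all names below are hypothetical.
from math import sqrt
from statistics import mean, pstdev


def average_embedding(embeddings: list[list[float]]) -> list[float]:
    """Element-wise simple average of the per-definition embeddings."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def standardize(scores: dict[str, float]) -> dict[str, float]:
    """Z-score raw scores across occupations (mean 0, population SD 1)."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize identical scores")
    return {occ: (v - mu) / sd for occ, v in scores.items()}


def score_occupations(
    occupation_vectors: dict[str, list[float]],
    definition_vectors: list[list[float]],
) -> dict[str, float]:
    """Standardized degree of the target characteristic for each occupation."""
    target = average_embedding(definition_vectors)
    raw = {occ: pearson(vec, target) for occ, vec in occupation_vectors.items()}
    return standardize(raw)


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real embeddings are far higher-dimensional.
    definitions = [[1.0, 0.0, 0.5], [0.8, 0.2, 0.4]]
    occupations = {
        "aligned": [0.9, 0.1, 0.45],
        "mixed": [0.2, 0.3, 0.9],
        "opposite": [0.1, 0.9, 0.55],
    }
    ranked = sorted(
        score_occupations(occupations, definitions).items(),
        key=lambda kv: kv[1],
        reverse=True,
    )
    for occ, z in ranked:
        print(f"{occ}: {z:+.2f}")
```

Averaging before scoring means every definition contributes equally to the target vector. Reading "correlation" as cosine similarity instead would change only `pearson`. This section does not state which of the two is used.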