What Technology Skills Do Developers Need? A Text Analysis of Job Listings in Library and Information Science (LIS) from Jobs.code4lib.org

Monica Maceli

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2015

ABSTRACT

Technology plays an indisputably vital role in library and information science (LIS) work; this rapidly moving landscape can create challenges for practitioners and educators seeking to keep pace with such change. To build our understanding of the technology competencies currently sought in developer-oriented positions within LIS, this paper reports the results of a text analysis of a large collection of job listings culled from the Code4lib jobs website. Beginning more than a decade ago as a popular mailing list covering the intersection of technology and library work, the Code4lib organization's current offerings include a website that collects and organizes LIS-related technology job listings. The results of the text analysis of this dataset suggest the currently vital technology skills and concepts that existing and aspiring practitioners may target in their continuing education as developers.

INTRODUCTION

For those seeking employment in a technology-intensive position within library and information science (LIS), the number and variation of technology skills required can be daunting. The need to understand common technology job requirements is relevant to current students positioning themselves to begin a career within LIS, those currently in the field who wish to enhance their technology skills, and LIS educators. The aim of this short paper is to highlight the skills and combinations of skills currently sought by LIS employers in North America through textual analysis of job listings.
Previous research in this area explored job listings through various perspectives, from categorizing titles to interviewing employers;1,2 the approach taken in this study contributes a new perspective to this ongoing and highly necessary work. This research report seeks a further understanding of the following research questions:

• What are the most common job titles and skills sought in technology-focused LIS positions?
• What technology skills are sought in combination?
• What implications do these findings have for aspiring and current LIS practitioners interested in developer positions?

Monica Maceli (mmaceli@pratt.edu) is Assistant Professor, School of Information and Library Science, Pratt Institute, New York.

WHAT TECHNOLOGY SKILLS DO DEVELOPERS NEED? | MACELI doi: 10.6017/ital.v34i3.5893

As detailed in the following research method section, this study addresses these questions through textual analysis of relevant job listings from a novel dataset: the job listings from the Code4lib jobs website (http://jobs.code4lib.org/). Code4lib began more than a decade ago as an electronic discussion list for topics around the intersection of libraries and technology.3 Over time, the Code4lib organization expanded to an annual conference in the United States, the Code4Lib Journal, and, most relevant to this work, an associated jobs website that highlights jobs culled from both the discussion list and other job-related sources. Figure 1 illustrates the home page of the Code4lib jobs website; the page presents job listings and associated tags, with the tags facilitating navigation and viewing of other related positions. Users may also view positions geographically or by employer.

Figure 1. Homepage of the code4lib Jobs Website, Displaying Most-Recently Posted Jobs and the Associated Tags.4

In addition to the visible user interface for job exploration, the website consists of software to gather the job listings from a variety of sources.
The website incorporates jobs posted to the Code4lib discussion list, American Library Association, Canadian Library Association, Australian Library and Information Association, HigherEd Jobs, Digital Koans, Idealist, and ArchivesGig. This broad incoming set of jobs provides a wide look into new technology-related postings. New job listings are automatically added to a queue to be assessed and tagged by human curators before posting. This allows manual intervention, in which a curator assesses whether the job is relevant to technology in the library domain and validates the job listing information and metadata (see figure 2). Curating is done on a volunteer basis, and curators are asked to assess whether the position is relevant to the Code4lib community and whether it is unique, and to ensure that it has an associated employer, set of tags, and descriptive text. Combining software processes and human intervention in the job assessment makes it possible to gather a large number of jobs of high relevance to the Code4lib community.

As mentioned earlier, Code4lib’s origins are in the area of software development and design as applied in LIS contexts. These foci mean that most jobs identified as relevant for inclusion in the Code4lib jobs dataset are oriented toward developer activities. The Code4lib jobs website therefore provides a useful and novel dataset within which to understand current employment opportunities relating to the intersection between technology (particularly developer work) and the LIS field.

Figure 2. Code4lib Job Curators Interface Where Job Data Is Validated and Tags Assigned.5

RESEARCH METHOD

To analyze the job listing data in greater depth, a textual analysis was conducted using the R statistical package, exploring job titles and descriptions.6 First, the job listing data from the most recent complete year (2014) were dumped from the database backend of the Code4lib jobs website; this dataset contained 1,135 positions in total. The dataset included the job titles, descriptions, location and employer information, as well as tags associated with the various positions. The text was then cleaned to remove any markup tags or special characters that remained from the scraping of listings. Finally, the tm (text mining) package in R was used to calculate term frequencies and correlations, generate plots, and cluster terms across both job titles and descriptions.7

RESULTS

Job Title Analysis

Of the full set of 1,135 positions, 30 percent were titled as a librarian position; popular specialties included systems librarian and various digital collections and curation-oriented librarian titles. Figures 3 and 4 detail the most common terms used in position titles across librarian and nonlibrarian positions.

Figure 3. Most Common Terms Used in Librarian Position Titles.

Top Title Terms - Librarian Positions (term, count): librarian 345; digital 89; systems 63; services 59; metadata 34; data 29; technologies 25; university 25; technology 23; web 21; electronic 20; resources 20; assistant 18; information 18; emerging 16; scholarship 14; collections 13; library 13; management 13; initiatives 12; sciences 12; cataloging 11; projects 11; research 11; professor 10.

Figure 4. Most Common Terms Used in Nonlibrarian Position Titles.

The most popular job title terms were then clustered using Ward’s agglomerative hierarchical method (dendrogram in figure 5).
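The agglomerative procedure behind Ward's method can be illustrated with a minimal sketch. It is written in Python using only the standard library and is not the author's R code; the sample titles, the function names (`term_vectors`, `ward_cost`, `ward_cluster`), and the choice of raw term counts per title as the feature vector are all assumptions made for illustration. Each term starts as its own cluster, and at every step the pair whose merger adds the least within-cluster sum of squares is joined.

```python
from itertools import combinations


def term_vectors(titles, terms):
    """Map each term to its count in every title (one dimension per title)."""
    tokenized = [title.lower().split() for title in titles]
    return {t: [float(tokens.count(t)) for tokens in tokenized] for t in terms}


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    (na, ca), (nb, cb) = a, b
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * sq_dist


def ward_cluster(vectors):
    """Merge single-term clusters until one remains; return the merge history."""
    # Each cluster is keyed by its member terms and stores (size, centroid).
    clusters = {(term,): (1, list(vec)) for term, vec in vectors.items()}
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters that is cheapest to merge.
        a, b = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        cost = ward_cost(clusters[a], clusters[b])
        (na, ca), (nb, cb) = clusters.pop(a), clusters.pop(b)
        # The merged centroid is the size-weighted mean of the two centroids.
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[a + b] = (na + nb, centroid)
        merges.append((a, b, cost))
    return merges


# Hypothetical job titles, invented for this example (not from the dataset).
titles = [
    "Digital Librarian",
    "Digital Initiatives Librarian",
    "Software Engineer",
    "Senior Software Engineer",
]
terms = ["digital", "librarian", "software", "engineer"]

merges = ward_cluster(term_vectors(titles, terms))
for left, right, cost in merges:
    print(left, "+", right, "cost:", cost)
```

With these toy titles, "digital" and "librarian" join first, then "engineer" and "software", and the two pairs merge last, which is the order a dendrogram would draw from the bottom up. Real title data would need a larger vocabulary and a far more efficient implementation than this cubic-time loop.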
Agglomerative hierarchical clustering, of which Ward’s method is a widely used variant, begins with single-item clusters, then identifies and joins similar clusters until the final stage, in which one large cluster is formed. Commonly used in text analysis, this approach allows the investigator to explore datasets in which the number of clusters is not known before the analysis. The dendrograms generated (e.g., figure 5) allow for visual identification and interpretation of closely related terms representing various common positions, e.g., digital librarian, software engineer, collections management, etc. Given that job titles in listings may include extraneous or infrequent words, such as the organization name, the cluster analysis can provide an additional, more generalized view into common job titles across the full dataset.

Top Title Terms - Non-Librarian Positions (data for figure 4; term, count): digital 182; developer 141; library 116; manager 90; specialist 86; software 68; web 65; archivist 59; services 59; technology 59; engineer 55; director 52; data 49; systems 49; analyst 40; coordinator 40; information 40; senior 40; metadata 38; administrator 35; lead 34; project 34; head 33; programmer 32; research 24.

Figure 5. Cluster Dendrogram of Terms Used in Job Titles Generated Using Ward's Agglomerative Hierarchical Method.

Tag Analysis

As described earlier, the Code4lib jobs website allows curators to validate and tag jobs before listing. The word cloud in figure 6 displays the most common tags associated with positions, with XML being the most popular tag (178 occurrences). Figure 7 contains the raw frequency counts of common tags observed.

Figure 6. Word Cloud of Most Frequent Tags Associated with Job Listings by Curators.

Figure 7. Frequency of Commonly Occurring Tags (frequency of fifty occurrences or more) in the 2014 Job Listings.

Frequency of Tags - 2014 Job Listings (tag, count): XML 178; JavaScript 155; PHP 152; Metadata 142; HTML 125; Archive 119; Cascading Style Sheets 114; Python 106; Integrated library system 101; Java 99; MySQL 90; Dublin Core 90; MARC standards 89; Encoded Archival Description 89; Ruby 86; Drupal 82; Project management 79; SQL 78; Metadata Object Description Standard 70; Data management 70; GNU/Linux 69; Digital preservation 69; Perl 66; Digital library 63; XSL Transformations 62; Resource Description and Access 54; Digital repository 53; World Wide Web 51; Management 51; DSpace 50; METS 50.

Job Description Analysis

The job description text was then analyzed to explore commonly co-occurring technology-related terms, focusing on frequent skills required by employers. Figures 8, 9, and 10 plot term correlations and interconnectedness. Terms with correlation coefficients of 0.3 or higher were chosen for plotting; this commonly used threshold broadly captured terms whose positive relationships ranged in strength from moderate to strong. Plots were created to express correlations around the top five terms identified from the tags: XML, JavaScript, PHP, metadata, and HTML (frequencies in figure 7). Any number of terms and frequencies can be plotted from such a dataset; to orient the findings closely around the job listing text, a focus on the top terms was chosen. These plots illustrate the broader set of skills related to these vital competencies represented in the job listings.

Figure 8. Job Listing Terms Correlated with “XML” (most popular tag).

Figure 9. Job Listing Terms Correlated with “JavaScript” (second most popular tag), including “PHP” and “HTML” (third and fifth most popular tags, respectively).

Figure 10. Job Listing Terms Correlated with “Metadata” (fourth most popular tag).

Finally, a series of general plots was created to visualize the broad set of skills necessary in fulfilling the positions of interest to the Code4lib community. As detailed in the title analysis (figures 3 and 4), apart from the generic term librarian, the two most common terms across all job titles were digital and developer. Correlation plots were created to detail the specific skills and requirements commonly sought in positions using such terms. Figure 11 illustrates the terms correlated with the general term developer, while figure 12 displays terms correlated with digital. The implications of these findings are discussed further in the following discussion section.

Figure 11. Job Listing Terms Correlated with “Developer.”

Figure 12. Job Listing Terms Correlated with “Digital.”

DISCUSSION

Taken as a whole, the job listing dataset covered a strikingly wide range of positions, from highly technical (e.g., senior-level software engineer or web developer) to managerial and leadership roles (e.g., director or department head roles centered on digital services or emerging technologies). These findings support the suggestions of earlier research,8 which advocated for LIS graduate programs to build their offerings not just in technology skills but also in technology management and decision-making. However, the Code4lib jobs dataset is a one-dimensional view into the employment process and is focused largely on the developer perspective. Additional contextual information, including whether suitable candidates were easily identified and whether the position was successfully filled, would provide a more complete view of the employment process.
Prior research has indicated that many technology-related positions in LIS are in fact difficult to fill with LIS graduates.9 While LIS graduate programs have made great strides in increasing the number of courses and topics covered that address technology, these improvements may not benefit those already in the field or wishing to shift toward a more technology-focused position.

In the common tags and terms analysis, experience with specific LIS applications was relatively infrequently required, with the Drupal content management system a notable exception. More generalizable programming languages or concepts, e.g., Python, relational databases, XML, etc., were favored. As with technology positions outside of the LIS domain, employers likely seek those with the ability to flexibly apply their skills across various tools and platforms. This may also relate to the above challenges in filling such positions with LIS graduates, with the goal of opening up the position to a larger technologist applicant base.

Common web technologies popular in the open-source software often favored by LIS organizations continued to dominate, with a clear preference for candidates well versed in HTML, CSS, JavaScript, and PHP. Relating to these skills, web development and design practices were often intertwined, with positions requesting both developer-oriented skillsets and interface design (e.g., figure 7). Technologies supporting modern web application development and workflow management were evident as well, e.g., common requirements for experience with versioning systems such as Git, popular JavaScript libraries, and development frameworks. Also striking was the richness of the terms correlated with metadata (figure 10), including mention of growing areas of expertise, such as linked data.

Interestingly, the general correlation plots expressing the common terms sought around “digital” and “developer” positions were quite varied.
While the developer plot (figure 11 above) provided a richly technical view into common technologies broadly applied in web and software development, the terms correlated around digital were notably less technical (figure 12 above). While there was a clear focus on digital preservation activities and common standards in this area, mention of terms such as “grant” indicated that these positions likely have a broad role. The term digital was frequently observed in librarian job titles, so these roles may be tasked with both technical and administrative work.

Finally, there are inherent difficulties in capturing all jobs relating to technology use in the LIS domain that introduce limitations into this study. While the incoming job feeds attempt to broadly capture recent job posts, it is possible that jobs are missed or overlooked by the job curators. Given the lack of one centralized job-posting source regardless of the field, this is a common challenge to research work attempting to assess every job posting. And as mentioned above, there is also a lack of corresponding data as to whether these jobs are successfully filled and what candidate backgrounds are ultimately chosen (i.e., from within or outside of LIS).

CONCLUSION

This assessment of the in-demand technology skills provides students, educators, and information professionals with useful direction in pursuing technology education or strengthening their existing skills. There are myriad technology skills, tools, and concepts in today’s information environments. Reorienting the pursuit of knowledge in this area around current employer requirements can be useful in professional development, new course creation, and course revision.
The constellations of correlated skills presented above (figures 8–12) and popular job tags (figure 7) describe key areas of technology competencies in the diverse areas of expertise presently needed, from web design and development to metadata and digital collection management. In addition to the results presented in this paper, the Code4lib job website provides a continuously current view into recent jobs and related tags; this data can help those in the LIS field orient professional and curricular development toward real employer needs.

ACKNOWLEDGEMENTS

The author would like to thank Ed Summers of the Maryland Institute for Technology in the Humanities for generously providing the jobs.code4lib.org dataset for analysis.

REFERENCES

1. Janie M. Mathews and Harold Pardue, “The Presence of IT Skill Sets in Librarian Position Announcements,” College & Research Libraries 70, no. 3 (2009): 250–57, http://dx.doi.org/10.5860/crl.70.3.250.
2. Vandana Singh and Bharat Mehra, “Strengths and Weaknesses of the Information Technology Curriculum in Library and Information Science Graduate Programs,” Journal of Librarianship & Information Science 45, no. 3 (2013): 219–31, http://dx.doi.org/10.1177/0961000612448206.
3. “About,” Code4lib, accessed January 6, 2014, http://jobs.code4lib.org/about/.
4. “code4lib jobs: all jobs,” Code4lib Jobs, accessed January 12, 2015, http://jobs.code4lib.org/.
5. “code4lib jobs: Curate,” Code4lib Jobs, accessed January 17, 2015, http://jobs.code4lib.org/curate/.
6. R Core Team, R: The R Project for Statistical Computing, 2014, http://www.R-project.org/.
7. Ingo Feinerer and Kurt Hornik, “tm: Text Mining Package,” 2014, http://CRAN.R-project.org/package=tm.
8. Meredith G. Farkas, “Training Librarians for the Future: Integrating Technology into LIS Education,” in Information Tomorrow: Reflections on Technology and the Future of Public & Academic Libraries, edited by Rachel Singer Gordon, 193–201 (Medford, NJ: Information Today, 2007).
9. Mathews and Pardue, “The Presence of IT Skill Sets in Librarian Position Announcements.”