key: cord-0135982-7f7jvhik authors: Berengueres, Jose; Nesterov, Pavel title: A Survey of H-index, Stress, Tenure&Reference Management software use in Academia date: 2020-10-01 journal: nan DOI: nan sha: 6ea506537599718e8fdd46950466d207ae88e8d6 doc_id: 135982 cord_uid: 7f7jvhik

We describe the findings of a survey that covered the topics of stress, citation tool use habits, subjective happiness, h-index, research topic and tenure among a sample of 2286 authors of arxiv.org. Ph.D. students report the lowest subjective happiness score among all academic roles, while tenured faculty report the highest. Tenured faculty report the lowest levels of stress. Undergraduate and graduate students report the highest levels of stress. Non-tenured faculty report stress similar to postdocs. No association between citation management tool usage and h-index was found. The average age at tenure start is 34.9 years. In addition, no significant association between stress levels and the research topic was found.

In the past four decades, AI has progressively found its way into a plethora of industries under various names, from the expert systems that helped us reduce a foundry's waste output in the 1980s [1] to the recommender systems popularized by the Netflix Prize competition of the 2000s [2]. Today, AI, and particularly its successful rebrand as Machine Learning (following the AI winter of the 1980s [3]), seems to be powering productivity gains across society in the form of self-driving cars, better fraud detection and analytics in general [4, 5]. Given all this, it is not without irony that the very same academics who helped create these AI/ML tools are also the very last ones to enjoy their fruits in their own work. In fact, we are hard pressed to name a blockbuster case where AI helped make the job of a researcher easier. Some notable exceptions follow. In 2019, some timid evidence of how AI could assist in research was published in [6].
It describes one of the first cases of NLP use to predict novel chemical compounds that did not exist three years earlier in the literature, the equivalent of an academic Oracle. This result was preceded a year earlier by a seminal paper [7] where the application of machine learning (predictive analytics) to research is foreseen. These, and subsequent papers, were initially circumscribed to the materials and chemical sciences. However, since 2020 we see a spillover to other fields such as life sciences [8] [9] [10] [38].

Another less conspicuous but also promising area primed to enhance the productivity of academics is reference management software. This field gained popularity with the London-based startup Mendeley, founded in 2008. Mendeley and other reference tools (RefMan, EndNote, Zotero) help to reduce paper writing time by managing the reference list, formatting and other overhead tasks. Reference management accounts for about 20% of the total time spent writing a paper according to Citationsy.

The average age at which a respondent got tenure is 34.9 years, SD 6.5, sample size: 291 respondents who self-identified as faculty.

As shown by [6] [7] [8], it is not difficult to foresee how these productivity tools might improve with the acceleration of AI [11]. In the particular case of reference management and discovery, a natural pathway is recommendation on what papers to read next. Interestingly, AI-driven recommendation systems similar to the Facebook feed are already live in the likes of Google Scholar™ and ResearchGate™. Given the controversial impact of these AI-driven feeds on mental health [29] [30] [31] [32] [33] [34] [35] [36] [37], it is worth considering not only their impact on research but also on the well-being of the person. Surprisingly, very little is known about the impact of these tools on academics. Are academics who use reference management tools more productive? Are they happier? A literature search yields very few results.
[12] evaluated ethical conflicts of interest found in Google Scholar algorithms; [13] compared the existing tools; [14] investigated how ResearchGate metrics can be used to infer the performance of a given researcher. However, none considers well-being. In what follows, we analyze a survey conducted by the reference tool provider Citationsy Ltd. in collaboration with UAE University during the summer of 2020. We hope the findings will help inform the impact of research tools on academics while also considering well-being.

Survey data were collected from 2020-07-09 to 2020-09-28. The data was generated by emailing authors who had authored or co-authored at least one paper on a popular preprint server (arXiv) in the past 3 years [15]. In total, 2286 responses were collected. The average age of the respondents is 36.4 years, SD = 11.42, sample size 200. The survey consisted of 10 questions. Table 1 shows the questions of the survey. Questions 4 to 7 (shaded) correspond to the four standard questions used to compute the subjective happiness scale by [16]. Table 2 shows data about the papers of the respondents. A surprising fact from an initial exploration of the data is that 12% of the respondents (authors) identify themselves as employees rather than academics. Regarding faculty, 293 out of 386 respondents report being tenured (76%). The average age at which tenure was obtained is 34.9 years, SD = 6.5 years, sample size 291 (see Fig. 1).

The most relevant data challenges encountered are highlighted below. (i) In question 9, due to the proactive nature of the participants, many decided to specify their job role beyond the faculty, Ph.D., postdoc and student prompts offered as defaults. This resulted in responses with a remarkable granularity that reflects the complexity of the academic world. All these unexpected roles (doctor, retired doctor, retired faculty, CEOs, professor emeritus …) have been mapped to five main roles (see 'roles' in Table 2).
The mapping of roles can be found in the file roles.txt in the Appendix. (ii) In the same vein, for the question on the age at which tenure was obtained, we received respondent feedback that not all countries have equivalent tenure track systems. However, we do not distinguish between different tenure systems. (iii) Data attrition. Due to technical and anonymization hurdles, only 1016 of the 2286 responses were matched to a research subject. Hence, some statistics related to disciplines are calculated on a sample size of 1016 or lower, depending on the completeness of responses or the subset considered. (iv) The topic of an authored paper was assigned as the first category out of the maximum of four category options available on arXiv. (v) Outliers such as very high self-reported h-index (>1000), self-evident typos and so on are excluded from the calculations (~12 cases).

This study received ethics approval from the ethics board at the Research Office at UAE University (id: ERS_2020_6162). Disclosures and conflicts of interest are as follows: the first author is an investor in the company that conducted the survey (Citationsy Ltd).

As we are analyzing multiple differences in means between disciplines, the increased chance of spurious correlations must be accounted for, as explained in depth in [17]. One way to do so is by using the Bonferroni correction method. We used the Python library statsmodels (v0.13.0.dev0 +36). A detailed statistical analysis is available in the Appendix.

Table notes: These questions correspond to the standard subjective happiness scale survey by [16]. b. Unless otherwise stated, the first figure is the mean and the second is the standard deviation. Not shared for anonymity purposes. Sample considered: 975. c. Percent of papers with this keyword in the title; sample: 1016.

III. RESULTS

No significant correlation is found between h-index, stress, subjective happiness and reference management software use frequency.
As expected, a moderate correlation of -0.45 is found between stress and subjective happiness. Below, we explain the findings by role, subject and e-mail domain of authors.

Globally, respondents scored on average a subjective happiness of 4.76. This figure is similar to scores reported by students [18] (4.8 and 4.9), but lower than scores reported by working professionals (health care, 5.2) [19]. Among academic roles, tenured faculty reported the highest subjective happiness at 4.96; Ph.D. students reported the lowest score of all academic groups at 4.5. Tenured faculty reported the lowest stress at work while the rest of the groups reported similar stress. Students (graduate and undergraduate) reported the most stress attributed to work. Table 3 shows a detailed breakdown. Faculty is the role least likely to use reference management software. Fig. 2 is a density plot of stress that visualizes why tenured faculty show lower stress than other groups. Among tenured faculty two subgroups exist. One reported stress similar to non-tenured faculty (5.0), while a second group reported stress near 2.0 (see the 'faculty tenured' label in Fig. 2). Fig. 3 shows a scatter plot of roles. We note that in terms of stress and subjective happiness, (i) postdocs, (ii) researchers and (iii) non-tenured faculty are clustered close together (the label 'non-tenured' includes postdocs, researchers and non-tenured faculty). In addition, Ph.D. students, undergraduates, graduates, employees, non-tenured faculty and postdocs show similar stress levels but varying degrees of subjective happiness.

Table 4 shows a breakdown by the research subject of the paper associated with the respondent. Only the top 10 are shown for brevity. This table is for illustration purposes and the aggregates are not statistically significant. Table 5 shows a summary of the Bonferroni-corrected analysis. It yields no significant association between research subject and stress.
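The Bonferroni adjustment behind this kind of multiple comparison can be illustrated with a minimal sketch. It is not the authors' notebook: the function name `bonferroni_adjust`, the subject labels and the raw p-values are hypothetical, chosen only to show how each p-value is multiplied by the number of comparisons before it is tested against the significance level.

```python
def bonferroni_adjust(p_values, alpha=0.05):
    """Return (adjusted p-values, reject flags) under the Bonferroni correction."""
    m = len(p_values)  # number of comparisons made
    # Each raw p-value is multiplied by m and capped at 1.0
    adjusted = [min(p * m, 1.0) for p in p_values]
    # A comparison is declared significant only if its adjusted p-value stays within alpha
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject


# Hypothetical raw p-values for pairwise comparisons of mean stress
# between research subjects (illustrative numbers, NOT survey results)
raw = {
    ("cs", "math"): 0.01,
    ("cs", "physics"): 0.04,
    ("math", "physics"): 0.30,
}

adjusted, reject = bonferroni_adjust(list(raw.values()))
for pair, p_adj, flag in zip(raw, adjusted, reject):
    print(pair, round(p_adj, 3), "significant" if flag else "not significant")
```

In the hypothetical input, the raw p-value of 0.04 would pass an uncorrected 0.05 threshold but no longer does once three comparisons are accounted for. The statsmodels function `multipletests` in `statsmodels.stats.multitest`, called with `method="bonferroni"`, performs the same adjustment. Applied to the survey, the correction leaves no comparison between research subjects significant (Table 5).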
In other words, no research discipline is happier than the others.

C. By e-mail domain

Fig. 4 shows a scatter plot of stress vs. subjective happiness. Cultural biases as well as different country-tenure systems might explain the difference between e-mail domains. The moderate association between subjective happiness and stress seems to hold at country level as well. Academics with e-mail addresses from Germany, the U.K. and Israel report being the least happy and report the most stress. Iran, Japan and India are on the opposite side of the spectrum.

No significant association between h-index and how often an author used reference management software tools was found. This somewhat counterintuitive fact could be partly explained by the fact that high h-index faculty tend to be older and hypothetically less adept at using these tools than younger groups such as millennials or Gen-Z students. In addition, an association between h-index and reference management software, if it exists, might only show up years later in one's career due to the multi-year dynamics of the h-index formula [25, 26].

The topic of the respondent's research was not found to be associated with stress. This came as a surprise, as we expected fields with high popularity and high employment prospects to be associated with lower stress. Nevertheless, results here agree with results found elsewhere in the literature that report that an individual's well-being depends on factors such as the environment and attitudes towards life [20, 21], seniority [27], and beliefs [28], among many other factors. The results here also support the view that subjective happiness is not associated with the research topic (as expected).

Half of the tenured faculty report a stress near 2.0, more than one SD lower than the average of 4.2 (SD = 1.64). Regarding subjective happiness, tenured and non-tenured faculty show a similar distribution.
This is despite the fact that tenured faculty tend to be older and that some surveys show that happiness declines with age [23, 24]. This seems to confirm the robustness of the subjective happiness scale [16].

Neither gender nor age was considered explicitly. Age is considered implicitly in the h-index and in the age at tenure, which only applies to faculty roles. Each paper can be classified by up to four subjects. However, only the first subject is used to associate an author's responses with their research subject. This might not be accurate for interdisciplinary authors. Users of preprint servers might not be representative of academia in general. This survey might have other unaccounted biases such as selection bias and country-specific bias.

The main conclusions are four: (i) Stress is correlated with the subjective happiness index at -0.45. (ii) Tenure is a factor in stress. (iii) No significant correlation was found between reference software usage and h-index. (iv) No association was found between the topic of the paper and the author's stress.

[1] Generic tasks in knowledge-based reasoning: High-level building blocks for expert system design
[2] The Netflix Prize
[3] Avoiding another AI winter
[4] Predictive analytics
[5] Competing on Analytics: Updated, with a New Introduction
[6] Unsupervised word embeddings capture latent knowledge from materials science literature
[7] Machine learning for molecular and materials science
[8] Knowledge synthesis from 100 million biomedical documents augments the deep expression profiling of coronavirus receptors
[9] A deep learning approach to antibiotic discovery
[10] How machine learning will transform biomedicine
[11] The pace of artificial intelligence innovations: Speed, talent, and trial-and-error
[12] Google Scholar: the pros and the cons. Online Information Review
[13] Managing information: evaluating and selecting citation management software, a look at EndNote, RefWorks, Mendeley and Zotero
[14] ResearchGate: An effective altmetric indicator for active researchers?
[15] Open journals that piggyback on arXiv gather momentum
[16] A measure of subjective happiness: Preliminary reliability and construct validation
[17] When to use the Bonferroni correction. Ophthalmic and Physiological Optics
[18] Adult playfulness, humor styles, and subjective happiness
[19] Emotional intelligence, life satisfaction and subjective happiness in female student health professionals: the mediating effect of perceived stress
[20] The inseparability of professionalism and personal satisfaction: Perspectives on values, integrity and happiness
[21] Is happiness a consequence or cause of career success
[22] Workplace happiness: Work engagement, career satisfaction, and subjective well-being
[23] Money, age and happiness: association of subjective wellbeing with socio-demographic variables
[24] A validity and reliability study of the subjective happiness scale in Mexico
[25] An h-index weighted by citation impact
[26] Reflections around 'the cautionary use' of the h-index: Response to Teixeira da Silva and Dobránszki
[27] New hires' job satisfaction time trajectory
[28] Looking for adolescents' well-being: Self-efficacy beliefs as determinants of positive thinking and happiness
[29] The Social Dilemma
[30] Ethics and social networking sites: a disclosive analysis of Facebook
[31] Social network determinants of depression
[32] Adolescent depression: social network and family climate - a case-control study
[33] Is social network site usage related to depression? A meta-analysis of Facebook-depression relations
[34] Upward social comparison and depression in social network settings
[35] Anxiety, depression, loneliness and social network in the elderly: Longitudinal associations from The Irish Longitudinal Study on Ageing (TILDA)
[36] Social disconnectedness, perceived isolation, and symptoms of depression and anxiety among older Americans (NSHAP): a longitudinal mediation analysis. The Lancet Public Health
[37] A critical consideration of social networking sites' addiction potential
[38] Effective Transfer Learning for Identifying Similar Questions: Matching User Questions. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

We would like to thank Cenk Dominic Özbakır for technical support and the participants in the survey, in particular those whose feedback helped improve its design.

APPENDIX

The anonymized dataset and Python notebook are available at https://www.kaggle.com/harriken/mentalhealth-academics/
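As a companion to the notebook, the following minimal sketch shows how a subjective happiness score could be computed from the four scale questions (questions 4 to 7 of the survey). It assumes the standard scoring of the scale in [16]: answers on a 1 to 7 scale, the fourth item reverse-coded, and the score taken as the mean of the four items. The function name `shs_score` and the sample answers are hypothetical, and the order of the items in the survey is an assumption.

```python
def shs_score(items):
    """Mean of four 1-7 answers, with the fourth item reverse-coded.

    Assumes the item order of the standard subjective happiness scale;
    the actual order of questions 4 to 7 in the survey is not confirmed here.
    """
    if len(items) != 4 or any(not 1 <= x <= 7 for x in items):
        raise ValueError("expected four answers on a 1-7 scale")
    q1, q2, q3, q4 = items
    # Reverse-code the fourth item: 1 becomes 7, 2 becomes 6, and so on
    return (q1 + q2 + q3 + (8 - q4)) / 4


# Hypothetical respondent, not taken from the dataset
print(shs_score([5, 5, 6, 3]))
```

A respondent-level score computed this way is what would then be averaged per role or per research subject.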