Neophilia Ranking of Scientific Journals

Mikko Packalen
University of Waterloo

Jay Bhattacharya
Stanford University
October 2015

Abstract

The ranking of scientific journals is important because of the signal it sends to scientists about what is considered most vital for scientific progress. Existing ranking systems focus on measuring the influence of a scientific paper (citations); these rankings do not reward journals for publishing innovative work that builds on new ideas. We propose an alternative ranking based on the proclivity of journals to publish papers that build on new ideas, and we implement this ranking via a text-based analysis of all published biomedical papers dating back to 1946. Our results show that our neophilia ranking is distinct from citation-based rankings. Prior theoretical work suggests an active role for our neophilia index in science policy. Absent an explicit incentive to pursue novel science, scientists underinvest in innovative work because of a coordination problem: for work on a new idea to flourish, many scientists must decide to adopt it in their work. Rankings that are based purely on influence thus do not provide sufficient incentives for publishing innovative work. By contrast, adoption of the neophilia index as part of journal-ranking procedures by funding agencies and university administrators would provide an explicit incentive for journals to publish innovative work, and would thus help solve the coordination problem by increasing scientists' incentives to pursue innovative work.

* We thank Bruce Weinberg, Vetle Torvik, Neil Smalheiser, Partha Bhattacharyya, Walter Schaeffer, Katy Borner, Robert Kaestner, Donna Ginther, Joel Blit and Joseph De Juan for comments. We also thank seminar participants at the University of Illinois at Chicago Institute of Government and Public Affairs, at the Research in Progress Seminar at Stanford Medical School, and at the National Bureau of Economic Research working group on Invention in an Aging Society for helpful feedback. Finally, we thank the National Institute on Aging for funding this research through grant P01-AG039347. We are solely responsible for the content and errors in the paper.
1. Introduction

The ranking of scientific journals is important because of the signal it sends to scientists about what is considered important in science. Through their editorial policies, the top-ranked journals set standards, and often also the agenda, for scientific investigation. Editors decide which papers to send out for review, which referees to ask for comments, what additional analyses to require, and of course which papers to ultimately publish. These decisions serve as a check on the correctness of submitted papers, but they also let other scientists, administrators, and funding agencies know what is considered novel, important, and worthy of study (e.g. Brown 2014; Frey and Rost 2010; Katerattanakul et al. 2005; Weingart 2005). Highly ranked journals thus exert considerable influence on the direction in which scientific disciplines move, as well as on the activity of scientists in each field.

Journal rankings are also important because they provide a filter for scientists in the face of a rapidly growing scientific literature (e.g. Bird 2008). Given the vast volume of published scientific work, it is impossible for scientists to read and independently evaluate every publication even in their own field. Since time is limited, as the number of scientific publications grows, the fraction of published papers that it is possible to read and carefully evaluate shrinks. Journal rankings provide a way to quickly identify the articles that other scientists in a field are most likely to be familiar with.

Existing rankings of journals (or individual scientists) almost exclusively rely upon citation counts to determine the pecking order (e.g. Abbott et al. 2010; Adam 2002; Chapron and Husté 2006; Egghe 2006; Engemann and Wall 2009; Frey and Rost 2010; Garfield 1972; Hirsch 2005; Moed 2008; Palacios-Huerta and Volij 2004, 2014; Tort et al. 2012). Citations, of course, are a good measure of the influence of any given paper; a highly cited paper, almost by definition, has influenced many other scientists. While this reliance on citations is sensible if the goal of a ranking system is to identify the most influential journals, there is a circularity in the logic. Because financial rewards and professional prestige are tied to publishing in highly cited journals, scientists have a strong incentive to pursue work that has the best chance of being published in such journals. Often, this entails work that builds upon and emulates other work that has already appeared in those journals. Highly cited journals may thus receive a high number of citations merely because scientists aim to publish in them. That a journal is highly cited need not tell us anything about what kind of science the journal promotes.

One important reason why rankings should also consider what kind of science is being pursued is that both individual scientists and journals face a coordination problem in moving to a new area of scientific investigation. As new ideas are often raw
when they are first born, they need revision and the attention of many scientists in order to mature (Kuhn 1962; Marshall 1920). Debate among an emerging community of scientists who build on a new idea is essential both for the idea to mature and for the idea to gain the attention of other scientists. If only one scientist, or only a few, try out a new idea in their work, no new area will open up to broader scientific attention (Kuhn 1962). The presence of this coordination problem, that is, the dependence of scientists on other scientists to productively engage with their work, implies that even if citations accurately reflect the ex post value of working in a given area, absent specific incentives that reward novel science, a suboptimal amount of work takes place in novel areas. Thus, a journal ranking system that rewards only influence will provide too little incentive for a scientist to work in a new area.[1]

[1] A formal model of coordination failure among scientists is provided by Besancenot and Vranceanu (2015). Using a global games model (e.g. Carlsson and van Damme 1993; Morris and Shin 2003; Sakovics and Steiner 2012), they show that when scientists' beliefs about the usefulness of a new idea differ even a little, too little novel science takes place in equilibrium. In related empirical work, Foster et al. (2015) show that while successful novel research yields more citations than successful conventional research, the difference is not enough to compensate for the risk associated with pursuing innovative work.

Reputable journals face a similar coordination problem; publishing a one-off paper in a new area is unlikely to generate many cites unless multiple journals publish papers in that new area. This exacerbates the coordination problem among scientists who are considering working in a new area, as they need their articles published in reputable journals to attract the attention of fellow scientists to the new area.[2]

[2] Coordination problems among scientists and among journals are not the only reasons why reliance on influence-based rankings alone does not provide sufficient incentives for high-impact journals to publish novel science. First, because disruptive science implies a decrease in citations to past breakthroughs, journals that have published those past breakthroughs face a disincentive in publishing disruptive science. Second, editors of high-impact journals are often people whose ideas disruptive science seeks to challenge.

Citation-based journal rankings thus provide scientists too little incentive to pursue work that builds on new ideas, and journals too little incentive to publish work that builds on new ideas.[3] Hence, the ranking of scientific journals should instead be based at least partly on measures of what type of science is being pursued.

[3] This view has become surprisingly common; even the editor-in-chief of the most highly cited scientific journal, Science, has warned that citation-based metrics block innovation and lead to me-too science (Alberts 2013). Moreover, the rise of citation-based metrics over the past three decades may already be changing how scientists work: evidence from biomedicine shows that during this time scientists have become less likely to pursue novel research paths (Foster et al. 2015).

In this paper, we construct a new journal ranking that measures the extent to which the articles published by a given journal build on new ideas. Our neophilia-based ranking is tied directly to an objective of science policy: journals are ranked higher if they publish articles that explore the scientific frontier. Our index is thus a useful complement to citation-based rankings, which fail to reward journals that promote innovative science.

To construct our neophilia ranking of journals, we must first select a set of journals to be ranked. We rank journals in medicine because of the substantive importance of medical science, because this focus builds on our existing work (e.g. Packalen and Bhattacharya 2015a), and because of the availability of a large database on publications in medicine (MEDLINE).

For our corpus of medical research papers, we must then determine which published papers build on new ideas and which build on older ideas. We determine the ideas that each paper builds upon from its textual content. To do so, we take advantage of a large and well-accepted thesaurus, the Unified Medical Language System ("UMLS"). We allow each term in this thesaurus to represent an idea, broadly interpreted. Hence, to determine which ideas each paper builds upon, we search each paper for all 5+ million terms that appear in the UMLS thesaurus. For each paper we then determine the vintage of each term that appears in it, based on the paper's publication year and the year in which the term first appeared in the published biomedical literature. Next, we determine for each paper the age of the newest term that appears in it. Based on this age, we then determine for each journal the degree to which it publishes innovative work, that is, papers that mention relatively new terms. This yields the neophilia index that we propose in this paper. One advantage of the UMLS thesaurus is that it reveals which terms are synonyms, allowing us to treat synonyms as representing the same idea when we construct our neophilia index.
However, we also show that the neophilia rankings change very little when we employ an alternative approach to constructing the index, one that does not take advantage of the UMLS thesaurus in any way. In this alternative approach, we construct the neophilia index by indexing all words and word sequences that appear in each paper, rather than only the words and word sequences that appear in the UMLS thesaurus. This sensitivity analysis shows that a neophilia ranking can be constructed also for areas of science for which no thesaurus is available.

Besides calculating the new ranking for each journal, we examine the relationship between the neophilia-based measure and the traditional citation-based impact factor rankings. We find that the impact factor ranking and our neophilia index are only weakly linked, which shows that our index captures a distinct aspect of each journal's role in promoting scientific progress.
2. Methods

In this section we first present the two sets of medical journals to which we apply the neophilia ranking procedure that we propose in this paper. We then explain how the neophilia index is constructed for each journal. Next, we discuss our approach for comparing the neophilia ranking against an influence-based ranking. The section concludes with the methods for four sets of sensitivity analyses.

2.1 Journals We Rank

We analyze two sets of medical journals. The first set is the 156 journals that are ranked annually by Thomson Reuters (TR) under the category General and Internal Medicine. Journals in this category are aimed at a general medical audience; the set does not include field journals, even highly ranked field journals, that are aimed at practitioners in a particular medical specialty. The use of this set is advantageous for two reasons. The general nature of these journals implies that the rankings will be relevant to a large audience. Moreover, reliance on a journal set used by TR allows us to examine the relationship between our neophilia index and the widely used citation-based impact factor ranking, which is published by TR.

While TR lists 156 journals in the General and Internal Medicine category, we calculate the neophilia index for only 126 of them. This is for several reasons. Four of the 156 journals are not indexed in MEDLINE. Some of the 156 journals are review journals (e.g. Cochrane Database of Systematic Reviews), whereas we rank only original research articles (and thereby exclude not just reviews but also editorials, commentaries, etc.). Moreover, for some journals MEDLINE has little or no information on article abstracts, whereas we rank only articles for which the database includes sufficient textual information.

The second set of journals that we analyze is the 119 journals listed as belonging to the Core Clinical Journals category by MEDLINE (this journal set is also referred to as the Abridged Index Medicus). The Core Clinical Journals set includes both general medicine journals and well-known field journals from different areas of medicine. This journal set allows us to examine whether journals aimed at the whole profession or specialized journals play a dominant role in promoting the trying out of new ideas in medicine.

2.2 Constructing the Neophilia Index for a Journal

The neophilia index that we propose in this paper measures the extent to which articles published in a given journal build on new ideas; the index reflects a journal's
propensity to publish innovative articles that try out new ideas. We construct this index based on the textual content of original research articles that appear in a journal.[4]

[4] An alternative approach to ours might measure the vintage of the ideas on which a paper is built by the vintage of the publications that the paper cites. The main disadvantage of this approach is that a citation is an ambiguous reference. Citations are sometimes signposts for a bundle of ideas that have appeared in a literature over a long period of time, rather than a pointer to a particular idea in a paper. Thus, it is problematic to infer that a paper builds on a novel idea simply because it cites recent papers. Additionally, a citation may instead reflect similarity in the aims of the citing and cited papers, rather than a reference to any particular idea. To the extent that this is the case, a high propensity to cite recent articles would merely reflect publishing papers in areas with many similar papers, rather than the authors' love of trying out new ideas. Citation-based indices are thus best viewed as measuring a journal's influence, useful for some purposes, and complementary to the neophilia-based approach we outline in this paper.

We determine the textual content of a journal from the MEDLINE database. MEDLINE is a comprehensive database of 20+ million biomedical scientific publications. Comprehensive coverage of the database begins in 1946. For articles published before 1975 the textual information generally includes the title but not the abstract of each article. For articles published since 1975 the data generally include both the title and the abstract. For this reason, in our baseline specification we calculate the neophilia index for a journal based on articles published in it during 1980-2013.

To determine which ideas each paper in MEDLINE builds upon, we use this database in conjunction with the Unified Medical Language System (UMLS) metathesaurus. The UMLS database is a comprehensive and widely used medical thesaurus that consists of over 5 million different terms (e.g. Chen et al. 2007; Xu et al. 2010). The UMLS database is referred to as a metathesaurus because it links the terms mentioned in over 100 separate medical vocabularies. Each term in the UMLS database is linked to one or more of 127 categories of terms. Further below we present the name of each of these categories and, for each category, a plethora of example terms. An additional curated feature of the UMLS metathesaurus is that terms that are considered synonyms are linked to one another.[5] This feature enables us to treat terms that are synonyms as representing the same idea. We thus avoid the mistake of assigning a high neophilia ranking to a journal that merely prefers to publish articles that use novel terminology for seasoned ideas.

[5] In UMLS, terms that are synonyms are mapped to one "concept ID". There are 2 million concept IDs and 5 million terms. Thus, each UMLS term has approximately 1.5 synonyms on average.

The construction of the neophilia index for a journal proceeds in four steps. In steps 1-3 we treat original research articles published in any journal the same; only in step 4 do we focus the analysis on the two journal sets mentioned in section 2.1.
Step 1. Determine when each term was new. For each term in the UMLS thesaurus, we first determine the earliest publication year among all articles in the MEDLINE database that mention the term (we search all 20+ million MEDLINE articles for each term). For terms that have no synonyms in the UMLS metathesaurus, we refer to this year of first appearance in MEDLINE as the term's cohort year. For a term that has synonyms, we find the earliest year in which either the term itself or any of its synonyms appeared in MEDLINE and assign that year as the cohort year of the term. Thus, all terms that are considered synonyms receive the same cohort year. Determining the cohort year of each term allows us to determine, in the next steps, which papers mention terms that are relatively new.

Step 2. Determine the age of the newest term mentioned in each article. For each original research paper in MEDLINE we then index which of the 5+ million terms in the UMLS database appear in the article. Having found which UMLS terms appear in each article, we determine the age of each such term by calculating the difference between the publication year of the MEDLINE article in question and the cohort year of the UMLS term. Next, we determine the identity and age of the newest terms mentioned in each paper (here we consider all terms in cohorts 1961-2013). This concludes Step 2.
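To make Steps 1 and 2 concrete, the following sketch shows one way the two calculations could be implemented. It is a minimal illustration, not our production code: the inputs `papers` (an iterable of article ID, publication year, and title/abstract text) and `umls_terms` (a mapping from each term to its UMLS concept ID, so that synonyms share an ID) are hypothetical, matching is naive lowercase substring search, and a real implementation over 20+ million articles and 5+ million terms would require a multi-pattern string-matching algorithm (such as Aho-Corasick) rather than a nested scan.

    def cohort_years(papers, umls_terms):
        # Step 1: a concept's cohort year is the earliest MEDLINE publication
        # year in which the concept's term, or any synonym, appears. Synonyms
        # share a concept ID and so automatically share a cohort year.
        first_seen = {}  # concept ID -> earliest publication year
        for article_id, year, text in papers:
            text = text.lower()
            for term, concept_id in umls_terms.items():
                if term in text and year < first_seen.get(concept_id, 9999):
                    first_seen[concept_id] = year
        return first_seen

    def age_of_newest_term(papers, umls_terms, first_seen, lo=1961, hi=2013):
        # Step 2: for each paper, the age of the newest term mentioned in it
        # (publication year minus cohort year), restricted to cohorts lo..hi.
        ages = {}  # article ID -> age of the paper's newest term
        for article_id, year, text in papers:
            text = text.lower()
            cohorts = [first_seen[cid] for term, cid in umls_terms.items()
                       if term in text and lo <= first_seen.get(cid, 0) <= hi]
            if cohorts:
                ages[article_id] = year - max(cohorts)
        return ages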
Before proceeding to Step 3, we pause to show lists of example terms in each category. For the sake of presenting these lists, and for the sake of several sensitivity analyses to be discussed further below (section 2.4), we have grouped the 127 UMLS term categories into 8 category groups that we constructed (the number in parentheses is the number of UMLS categories we assigned to the group): Clinical (21), Anatomy (8), Drug (4), Research Tools (3), Basic Science I (11), Basic Science II (31), Miscellaneous I (27), and Miscellaneous II (22). We constructed two basic science groups merely to limit the size of each list; the first basic science group includes processes and functions, the other everything else. The latter of the two miscellaneous groups includes many terms that one may argue do not represent idea inputs to scientific work in the traditional sense; in a sensitivity analysis we exclude the terms in this category group from the analysis.

By clicking on one of the following 8 links the reader can open an embedded document that shows example terms for each UMLS category in a given category group. The terms listed for each decade in a category group are those that are most often the newest UMLS term for some paper in the MEDLINE database. The purpose of this popularity ranking is merely expositional.[6]

[6] The popularity ranking allows us to limit the size of the embedded files (there are 449,783 UMLS terms in cohorts 1961-2013 that are at least once the newest term in a MEDLINE paper published during 1971-2013). A focus on less popular terms would put readers who do not work in the few research areas where such terms are used at a considerable disadvantage.

In constructing the neophilia index we treat
all UMLS terms the same, irrespective of how many times they are mentioned in the MEDLINE database and how many times they are the newest term for some paper. [The links do not access the internet; they open inside Adobe Acrobat and may not work inside a browser. The documents are also available on the first author's homepage.]

List 1. Clinical (click here to open an embedded PDF document)
List 2. Anatomy (click here to open an embedded PDF document)
List 3. Drug (click here to open an embedded PDF document)
List 4. Research Tools (click here to open an embedded PDF document)
List 5. Basic Science I (click here to open an embedded PDF document)
List 6. Basic Science II (click here to open an embedded PDF document)
List 7. Miscellaneous I (click here to open an embedded PDF document)
List 8. Miscellaneous II (click here to open an embedded PDF document)

We hope that browsing these lists makes two points evident to the reader, at least to a reader with some familiarity with the changes in biomedical science over the last 40 years. First, the terms captured by our approach represent ideas that have served as inputs to biomedical science in recent decades. Second, the cohort year of most terms is a reasonable reflection of the time period when the idea represented by the term was a new input to biomedical scientific work.

Step 3. Determine which papers mention relatively new terms. Having determined the age of the newest UMLS term that appears in each article, we next determine which articles mention relatively new terms. To achieve this, we first order all papers published in any given year based on the age of the newest UMLS term mentioned in each (as noted above, the analysis is limited to original research papers; we exclude editorials, reviews, etc.). Using this ordering we then construct a dummy variable, Top 20% by Age of Newest Idea Input, that is 1 for papers that are in the top 20% based on the age of the newest term that appears in them and 0 for all other papers. Thus, this dummy variable is 1 for papers that mention one or more relatively new terms and 0 for papers that mention only older terms. In our baseline specification the comparison group for each article is very broad: all other articles published in the same year. In sensitivity analyses, however, we employ much narrower comparison groups. Specifically, in these sensitivity analyses we compare articles to other articles published in the same research area in the same year (section 2.4.3). We selected the 20% cutoff to allow for such very strict comparison sets in the sensitivity analyses.[7] In our related previous work (Packalen and Bhattacharya 2015a) we have not found any meaningful differences owing to different cutoff percentiles.

[7] A 20% cutoff means the comparison set can be as small as 5 articles. A 1% cutoff would mean that the comparison set could be as small as 100 articles. When there are fewer than 5 articles in a comparison group, which occurs only in our sensitivity analyses, we assign top 20% status to the article at the top of the "age of the newest term" ordering.
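As an illustration, the following sketch computes the Top 20% by Age of Newest Idea Input dummy within each publication-year comparison group. The input dictionaries (`ages`, mapping article ID to the age of the paper's newest term, and `years`, mapping article ID to publication year) are hypothetical; the `max(1, ...)` floor mirrors the rule in footnote [7] that in a group too small for the cutoff, the top article alone is flagged. How ties at the cutoff are broken is left unspecified here, as it is in the text.

    from collections import defaultdict

    def top20_by_newest_idea_input(ages, years, cutoff=0.20):
        # Group papers into comparison groups: all papers from the same year.
        by_year = defaultdict(list)
        for article_id, age in ages.items():
            by_year[years[article_id]].append((age, article_id))
        dummy = {}
        for group in by_year.values():
            group.sort()  # smallest age first: these mention the newest terms
            k = max(1, int(cutoff * len(group)))  # floor of one paper per group
            for rank, (_age, article_id) in enumerate(group):
                dummy[article_id] = 1 if rank < k else 0
        return dummy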
Step 4. Construct the neophilia index for a journal. Having constructed for each article the dummy variable Top 20% by Age of Newest Idea Input, the variable that captures whether the paper mentions a relatively new term, we calculate the average value of this variable for each journal over the time period under consideration.[8] Next, we perform a normalization: we divide these journal-specific average values by the average value of the dummy variable for all journals in the General and Internal Medicine journal set. The resulting variable is our journal-specific neophilia index. Based on this index, we determine the neophilia ranking of each journal in a given journal set.

[8] In our baseline specification this time period is 1980-2013. We weight observations by decade so that the total weight of the observations for any given decade is the same as for any other decade.

The neophilia index is between 0 and 1 for journals that promote the trying out of new ideas less than the average article in the General and Internal Medicine journal set. For example, a neophilia index of 0.75 for a journal implies that articles in that journal mention a relatively new idea 25% less often than the average article in this journal set. The neophilia index is greater than 1 for journals that promote the trying out of new ideas more than the average article published in this journal set. For example, a neophilia index of 1.5 for a journal implies that articles in that journal mention a relatively new idea 50% more often than the average article published in this journal set.
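A sketch of Step 4 under these definitions follows. The inputs (`dummy` from the previous sketch, plus hypothetical mappings from article ID to journal and to publication year, and the list of reference journals in the General and Internal Medicine set) are again illustrative. One ambiguity we resolve by assumption: the reference average is computed by pooling all articles published in the reference journal set, with the same equal-decade weighting as the journal-level averages.

    from collections import defaultdict

    def neophilia_index(dummy, journal_of, year_of, reference_journals):
        def decade_weighted_mean(article_ids):
            # Footnote [8]: each decade contributes with equal total weight.
            by_decade = defaultdict(list)
            for a in article_ids:
                by_decade[year_of[a] // 10].append(dummy[a])
            return sum(sum(v) / len(v) for v in by_decade.values()) / len(by_decade)

        articles_by_journal = defaultdict(list)
        for a in dummy:
            articles_by_journal[journal_of[a]].append(a)

        reference_articles = [a for j in reference_journals
                              for a in articles_by_journal.get(j, [])]
        baseline = decade_weighted_mean(reference_articles)
        return {j: decade_weighted_mean(arts) / baseline
                for j, arts in articles_by_journal.items()}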
2.3 Comparison of the Neophilia Index and Citation Rankings

To compare our neophilia index against citation-based journal rankings, we make use of the impact factor rankings published by TR for the year 2013 for journals in the General and Internal Medicine journal set. Analysis of the relationship between our neophilia index and the citation ranking reveals whether our neophilia index captures an aspect of scientific progress that is distinct from the features of scientific progress that are captured by citation-based measures. If a journal with a higher citation ranking than another journal always also had a higher neophilia ranking, the neophilia index would be of little value. On the other hand, the neophilia index does have value as an input to science policy if the relationship between the neophilia index and impact factor rankings is not one-to-one.

2.4 Sensitivity Analyses

We perform four sets of sensitivity analyses.

2.4.1 Sensitivity Analysis I: Time Periods

In our baseline specification we calculate the neophilia index of a journal based on the 8+ million original research articles published during 1980-2013 (for our MEDLINE data, 2013 is the last year of comprehensive coverage). To examine how stable the neophilia index is over time, we also calculate the index separately for four time periods: the 1980s, the 1990s, the 2000s, and 2010-2013.

2.4.2 Sensitivity Analysis II: Subsets of UMLS Terms

In our baseline specification we construct the neophilia index based on all terms in the UMLS thesaurus. In one set of sensitivity analyses we calculate the neophilia index based on narrower sets of UMLS terms. First, we calculate the neophilia index after excluding mentions of terms in the category group "Miscellaneous II". This allows us to examine whether the neophilia ranking is robust to excluding terms that may not reflect traditional idea inputs to scientific work. Second, we calculate the neophilia index after excluding mentions of terms in the category groups "Miscellaneous II" and "Drug". This allows us to examine to what extent our baseline neophilia ranking is driven by research on novel pharmaceutical agents. Third, we calculate the neophilia index including only terms in the category groups "Clinical" and "Drug". This allows us to examine how different the neophilia rankings would be for a decision maker who is interested only in advancing applied clinical knowledge.

Thus, in each of these sensitivity analyses we exclude terms from some UMLS categories. However, because some UMLS terms appear in multiple categories, some terms that appear in the excluded categories will still be included in the analysis, provided they also appear in one or more of the still-included categories.
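The retention rule just described can be expressed compactly. In this sketch, `term_groups` is a hypothetical mapping from each UMLS term to the set of category groups it belongs to; a term is dropped only if all of its groups are excluded.

    def retained_terms(term_groups, excluded_groups):
        # Keep a term if it belongs to at least one still-included group.
        excluded = set(excluded_groups)
        return {term for term, groups in term_groups.items()
                if groups - excluded}

    # For example, the second sensitivity analysis would use:
    # keep = retained_terms(term_groups, {"Miscellaneous II", "Drug"})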
2.4.3 Sensitivity Analysis III: Narrower Comparison Groups

In our baseline specification we construct the neophilia index by comparing each article to all articles published in the same year. In one set of sensitivity analyses we address the fact that some journals may choose to publish articles on topics from a field where scientists are more inclined to try out new ideas, but may at the same time be less willing to publish articles that use novel terms, given the standards of the field. Specifically, in these sensitivity analyses we no longer compare a publication to all publications from the same year when we determine the publication's top 20% status based on the age of the newest term mentioned in it. Instead, we compare the publication to other publications published in the same research area in the same year.

For these analyses, we follow our earlier work (Packalen and Bhattacharya 2015a) and determine research areas based on the 6-digit Medical Subject Heading (MeSH) codes by which each MEDLINE publication is indexed. MeSH is a controlled medical vocabulary of over 27,000 terms. MeSH terms and the corresponding codes are affixed to each publication by professional coders with a biomedical degree. We consider papers marked with the same MeSH code to be in the same research area. In one analysis, we construct the research areas based on the MeSH Disease terms attached to each article; for our purposes these terms serve as a proxy for clinical research areas. In another analysis, we construct the research areas based on the MeSH Phenomena and Processes terms attached to each article; for our purposes these terms serve as a proxy for basic research areas. Having determined the comparison group (based on research area and year of publication) for each publication, we determine which papers in that comparison group are in the top 20% based on the age of the newest term mentioned in them. This dummy variable is then used to construct the neophilia index, analogously to the baseline specification.
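This sketch adapts the earlier top-20% calculation to (research area, year) comparison groups. The hypothetical input `mesh_codes` maps each article ID to the set of 6-digit MeSH codes attached to it. Because an article typically carries several MeSH codes, it belongs to several comparison groups; the text does not spell out how top 20% status is aggregated across them, so we assume here that an article is flagged if it is in the top 20% of at least one of its groups.

    from collections import defaultdict

    def top20_within_area(ages, years, mesh_codes, cutoff=0.20):
        cells = defaultdict(list)  # (MeSH code, year) -> [(age, article ID)]
        for article_id, age in ages.items():
            for code in mesh_codes[article_id]:
                cells[(code, years[article_id])].append((age, article_id))
        dummy = {article_id: 0 for article_id in ages}
        for group in cells.values():
            group.sort()  # newest ideas (smallest age) first
            k = max(1, int(cutoff * len(group)))  # footnote [7] floor again
            for _age, article_id in group[:k]:
                dummy[article_id] = 1  # flagged if top 20% in any of its cells
        return dummy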
2.4.4 Sensitivity Analysis IV: N-Gram Approach

In our baseline specification, we determine the ideas that each paper builds upon based on the vintage of the UMLS terms that appear in it. In one sensitivity analysis, we instead follow our earlier work (Packalen and Bhattacharya 2015a, 2015b, 2015c) and determine the ideas that each paper builds upon based on the vintage of the words and 2- and 3-word sequences that appear in it. In this alternative approach (the "n-gram approach") we first index, for each publication, all words and word sequences that appear in it. For each such "concept" that appears in MEDLINE, we then determine its cohort year as the earliest publication year among the papers that mention it. For each concept cohort we then determine the 100 most popular concepts in the cohort; the popularity of a concept is determined by the number of publications in which it has appeared since. For each cohort year during 1970-2013, we then cull through the list of the top 100 most popular concepts in the cohort and exclude concepts that likely do not represent idea inputs in the traditional sense. The remaining top concepts for each cohort are then used to determine the vintage of the idea inputs in any given publication, in exactly the same way that we employ the UMLS thesaurus in the baseline specification. The neophilia index for a journal is then calculated based on the vintage of the newest idea input in each paper. The only difference from the baseline specification is that the curated top 100 concept lists, one list for each concept cohort, are used in place of the terms that make up the UMLS thesaurus.

One advantage of constructing the neophilia index using the n-gram approach is that it does not depend on the availability of a thesaurus, which may not exist for all fields. One potential disadvantage of the n-gram approach, relative to the baseline specification that relies on the UMLS thesaurus, is that the n-gram approach may assign different cohort years to two words that are synonyms. To the extent that this occurs, journals that prefer newer terminology for old ideas would receive higher neophilia scores even though the work published in these journals is not innovative in any way that genuinely advances science.
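The indexing behind the n-gram approach might look like the following sketch. It assumes the same hypothetical `papers` iterable as before and omits the culling step, which in our procedure is applied to each cohort's top 100 list by hand.

    import re
    from collections import Counter

    def ngrams(text, n_max=3):
        # All single words and 2- and 3-word sequences in a title/abstract.
        words = re.findall(r"[a-z0-9\-]+", text.lower())
        for n in range(1, n_max + 1):
            for i in range(len(words) - n + 1):
                yield " ".join(words[i:i + n])

    def top_concepts_by_cohort(papers, top_k=100):
        first_seen = {}         # concept -> cohort year (earliest appearance)
        popularity = Counter()  # concept -> number of papers mentioning it
        for article_id, year, text in sorted(papers, key=lambda p: p[1]):
            for g in set(ngrams(text)):      # count each paper at most once
                first_seen.setdefault(g, year)  # earliest, as papers are sorted
                popularity[g] += 1
        by_cohort = {}
        for g, cohort in first_seen.items():
            by_cohort.setdefault(cohort, []).append(g)
        return {cohort: sorted(gs, key=lambda g: -popularity[g])[:top_k]
                for cohort, gs in by_cohort.items()}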
3. Results

Our results consist of four sets of results: neophilia rankings for 10 highly cited journals in the General and Internal Medicine journal set (Table 1), neophilia rankings for all journals in the same journal set (Table 2), a scatterplot and a regression line for the relationship between the neophilia index and the citation-based impact factor rankings for the same journal set (Figure 1), and neophilia rankings for the journal set Core Clinical Journals (Table 3). In each table, columns 1d and 1a, respectively, show the neophilia index and the corresponding neophilia ranking for the baseline specification. Column 1b shows the journal name (MEDLINE abbreviation) and column 1c shows the number of original research articles published during 1980-2013 on which the neophilia index in column 1d is based. Columns 2-5 show the results for the four sets of sensitivity analyses. Entries in each table are color coded, with reddish hues indicating a high propensity to publish articles that mention novel terms relative to the average paper and blue indicating the lowest propensity. We discuss each of these results in turn.

Table 1 shows the neophilia ranking for 10 highly cited general and internal medicine journals. To construct this table, we calculated the neophilia index for the 10 most cited journals that are both ranked by TR in the General and Internal Medicine journal category and covered by MEDLINE in sufficient detail to construct the neophilia index. The highly cited status is determined based on TR impact factors in 2013. These 10 journals[9] are arguably some of the most prestigious English-language medical journals.

[9] Two top 12 journals in the TR impact factor rankings are excluded from our analysis. Cochrane Database of Systematic Reviews is excluded because it does not publish sufficiently many original research articles; the focus of the journal is on reviews. Journal of Cachexia, Sarcopenia and Muscle is excluded because MEDLINE does not have sufficient textual information on this journal. Accordingly, the 10 highly cited journals in Table 1 are among the top 12 most cited journals in the General and Internal Medicine category.

Among these 10 highly cited medical journals, the New England Journal of Medicine (N Engl J Med) ranks at the top of our neophilia index. The number 1.81 in the top row of column 1d indicates that over the period 1980-2013, the New England Journal of Medicine was 81% more likely to publish articles that mention novel terms than the average article published in the General and Internal Medicine journal set. By contrast, out of these 10 journals, the British Medical Journal (BMJ) was the least likely to publish articles that mention new terms during this period.

Overall, several features stand out from the results reported in Table 1. First, these highly cited journals vary considerably in their propensity to publish articles that try out new ideas. For the two journals with the highest neophilia indices in column 1d, the New England Journal of Medicine and BMC Medicine (BMC Med), the neophilia index is more than twice as large as it is for either of the two journals with the lowest neophilia indices in column 1d, the British Medical Journal and the Canadian Medical Association Journal (CMAJ). Prestigious high-influence journals are not equal in terms of their ability to reward innovative science.

Second, while 8 of the 10 prestigious journals have a higher than average propensity to publish articles that try out new ideas (that is, for 8 journals in Table 1 the neophilia index in column 1d is above 1.0), 2 of these 10 prestigious journals
have a lower than average propensity to do so (the British Medical Journal and the Canadian Medical Association Journal). Being a prestigious high-influence journal does not automatically imply that the journal encourages innovative science.

Third, for most of these journals the neophilia index and the corresponding neophilia ranking remain relatively stable over time. This is shown by the time-period-specific neophilia indices reported in columns 2a-2d of Table 1. That said, some changes over time are apparent. For instance, the neophilia index for the New England Journal of Medicine increased substantially from the 1980s to the 2010s (from 1.54 to 2.06). On the other hand, for the Annals of Internal Medicine (Ann Intern Med) the neophilia index fell from well above average to merely average (from 1.81 to 1.04), and the neophilia indices for the British Medical Journal and the Canadian Medical Association Journal plummeted from average to well below average (from 1.01 to 0.70, and from 0.88 to 0.47, respectively). It is also interesting to note that one relatively new journal, BMC Medicine, fares very well in the rankings, while another, PLoS Medicine, appears to be struggling in recent years after initially succeeding in publishing innovative work.

Fourth, for most journals the neophilia index and the corresponding neophilia ranking are robust to the other sensitivity analyses, which are reported in columns 3a-3c, 4a-4b, and 5 of Table 1. The neophilia indices reported in columns 3a-3c rely on different subsets of UMLS terms, such as the set that excludes novel pharmaceutical terms (column 3b). The neophilia indices reported in columns 4a and 4b in turn control for the propensity to publish in hot clinical research areas or in hot basic science areas, respectively. While these adjustments shift the relative rankings of these 10 journals somewhat, they do not have a large effect. This consistency with our main results is not surprising, given that these general interest journals tend to publish papers from a broad set of areas, not just drug trials or particular hot clinical or basic science fields. Finally, the neophilia indices reported in column 5 show that the rankings are relatively robust to using the alternative n-gram based approach in place of the UMLS thesaurus approach of the baseline specification.

We now turn our attention to Table 2, which lists the neophilia index and the corresponding ranking for all 126 journals in the General and Internal Medicine category (for 126 of the 156 journals in this category, enough data is available in MEDLINE to construct the neophilia index). We have indicated in bold those journals that are also present in Table 1 (the table on the 10 highly cited journals).
The top-ranked journals in Table 2 are Current Medical Research and Opinion, the American Journal of Chinese Medicine, and Translational Research, none of which rank among the top 10 based on citations. This indicates that our neophilia rankings and the citation-based impact factor rankings capture
different aspects of science. The fact that the journals Translational Research and Journal of Investigative Medicine are highly ranked in our neophilia rankings (3rd and 13th, respectively) is reassuring, because these journals strive to promote the very thing that our measure seeks to capture: innovative science that builds on new ideas (the journals aim to translate new ideas in ways that benefit patient health).

Columns 2a-2d of Table 2 show that for this broad set of journals, too, the neophilia index remains relatively stable over time. This persistence in journal neophilia indices over time implies that the neophilia rankings of these journals during any given time period are not random; to a significant degree the rankings are the result of variation in editorial policies across journals. Columns 3a-3c of Table 2 in turn show that, with some exceptions, the neophilia rankings are also relatively independent of the set of UMLS terms included in the analysis. One such exception concerns the exclusion of terms in the "Drug" category group from the analysis (column 3b): unsurprisingly, this dramatically lowers the neophilia index for journals that focus mainly on research on the effects of new pharmaceutical agents; these journals include Current Medical Research and Opinion and the International Journal of Clinical Practice (rows 1 and 8, respectively). Columns 4a-4d of Table 2 show that the neophilia rankings are relatively stable to selecting narrower comparison groups in determining which articles build on new ideas. Finally, column 5 shows that the neophilia rankings remain relatively robust to constructing the neophilia index based on the appearance of new n-grams rather than the appearance of new UMLS terms.

We now turn to the results shown in Figure 1 on the link between our neophilia index and the traditional citation-based impact factor rankings. The scatterplot shows, for each journal in the General and Internal Medicine category, the journal's citation-based impact factor ranking in 2013 (horizontal axis) against the journal's neophilia index for the 1980-2013 period (vertical axis). The figure also shows the least squares regression line for these observations. The scatterplot and the regression line shown in Figure 1 demonstrate that more cited journals generally also have a higher neophilia index (p < 0.01). There is, however, considerable variation around this regression line, with some less cited journals faring very well on our neophilia index, and some highly cited journals being relatively averse to publishing papers that build on fresh ideas. Our earlier results showing the strong persistence in the neophilia index over time (Table 1 and Table 2) imply that to a significant degree this variation around the regression line reflects genuine, persistent differences in editorial policies across journals. That the relationship between the citation ranking and our neophilia index is not one-to-one implies that the neophilia index captures an aspect of scientific progress that is not captured by citations. The neophilia index proposed here thus has value as an additional input to science policy.
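For readers who wish to replicate this kind of comparison on their own journal-level data, a minimal sketch follows. It assumes two parallel hypothetical lists: `impact_rank` (a journal's 2013 impact factor ranking) and `neophilia` (its 1980-2013 neophilia index). A least squares fit mirrors the regression line in Figure 1, and a Spearman rank correlation provides a complementary summary of how far the two rankings are from one-to-one.

    from scipy import stats

    def compare_rankings(impact_rank, neophilia):
        # Least squares regression of the neophilia index on the citation
        # ranking, as in Figure 1.
        ols = stats.linregress(impact_rank, neophilia)
        # Rank correlation: 1.0 would mean the two rankings coincide.
        rho, p_rank = stats.spearmanr(impact_rank, neophilia)
        return ols.slope, ols.pvalue, rho, p_rank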
We next turn our attention to the results in Table 3, which reports neophilia rankings for the journal set Core Clinical Journals. This set includes both general medical journals and specialized field journals. We have again indicated in bold those journals that are also present in Table 1.[10] The most neophilic journals on this list are Blood, the Journal of Immunology, and the Medical Letter on Drugs and Therapeutics, showing that no field dominates the others in terms of the propensity to try out new ideas. The same observation is supported by scrolling further down the list; no field has an obvious dominance over the others in terms of having more journals near the top.

[10] In Table 3 each neophilia index is again normalized relative to the journals in the General and Internal Medicine journal set. This way, the neophilia index does not change from one table to the next for journals that also appear in Table 1 or Table 2. In principle, of course, in constructing a neophilia index the normalization can be performed relative to any set of journals.

In the rankings of Table 3, there are 17 specialized journals above the most neophilic general medical journal (the New England Journal of Medicine). And there are many more specialized journals above another highly cited general medical journal (the British Medical Journal, ranked 88th). These observations indicate that, while general medical journals are usually viewed as more prestigious, field journals too play an important role in promoting the trying out of new ideas in medicine. Neither field journals nor general medical journals have a monopoly in this regard.

The results across the different columns of Table 3 follow the pattern familiar from Tables 1 and 2. First, there is a lot of variation in the neophilia index across journals. Second, the neophilia index is relatively stable over time, though some variation exists. The journal Hospital Practice (row 83) is an extreme outlier in this regard. But the sudden change in its neophilia index is not unexpected, as the journal published no articles during 2002-2008; when it was brought back to life, it likely followed very different editorial practices than its previous incarnation. Third, the neophilia index is generally robust to employing a different set of UMLS terms in the analysis. One exception to this robustness is that excluding terms in the "Drug" category group leads journals such as the Medical Letter on Drugs and Therapeutics and Anesthesia and Analgesia (rows 3 and 33, respectively) to fall quite dramatically in the rankings. Because these journals focus on research on new drug compounds, this is not a surprising finding. In fact, it serves as another validity test for our methods. Fourth, the neophilia index is relatively insensitive to choosing narrower comparison sets and to employing the n-gram approach in place of the UMLS thesaurus approach.
4. Discussion

Our primary finding is that, on average, highly cited prestige journals in biomedicine do a good job of promoting innovative science. This is surprising in one regard: one might think that lower ranked journals would attempt to distinguish themselves by seeking novelty. One possible explanation for this finding is our focus on medicine rather than other scientific disciplines. By focusing on medicine, we have selected the area of science that may be most disciplined by the practical usefulness of its findings. This discipline may lead prestige journals to be less influenced by citation-oriented rankings and to seek out innovative work that will affect the treatment of patients. Hence, when our neophilia index is exported to other fields, we might expect different results. Furthermore, we should be careful about what to expect given the nature of the coordination problem. The problem causes journals to publish less innovative science than they would absent the problem; it does not necessarily make less influential journals more likely to publish innovative work.

Nevertheless, knowing the impact factor alone does not automatically predict a journal's position in the neophilia-based index; there are high-impact journals with a low neophilia score and lower-impact journals with a high neophilia score. While the link between citation-based rankings and the neophilia index is positive, it is not a one-to-one relationship. For example, we found that some prestigious, highly cited medical journals have even a below-average neophilia score. One implication of these results is that focusing on the impact factor alone does not provide appropriate incentives for journals to publish innovative work in biomedicine. Furthermore, lower ranked journals appear to play an important role in science by serving as an outlet for innovative work that, for whatever reason, is not poised to draw many citations from others in a field. A complementary finding of ours is that neither general medical journals nor specialized field journals dominate one another in terms of publishing innovative work; both types of journals play an important role in advancing science in this regard.

5. Potential Limitations

One possible critique of taking the neophilia index seriously is that it might lead a journal to publish work that builds on new ideas simply for the sake of improving its neophilia score, even when the editors do not view the innovative work as particularly
important in the field. Propagating the neophilia index, under this reasoning, may create incentives for journals to game the index by distorting publication decisions in order to improve their position. In our view, this is a benefit of the neophilia index rather than an unintended harm. We want journals to compete to publish work that elaborates on newer ideas because such competition makes science healthier: prior theoretical work suggests that absent such an incentive, scientists underinvest in innovative science. Furthermore, one can tweak the index in many ways depending on the purpose; for instance, one can construct the index based only on ideas that have stood the test of time, or only on ideas that exceed some popularity threshold.

Of course, as with citation-based rankings, a novelty-based ranking too can have unintended consequences. For example, scientists and journals may be tempted to merely mention new ideas rather than actually incorporate them in their work. For most individuals and journals the potential reputational costs should prevent this. Moreover, algorithms will be developed to detect such behavior, as will new, more robust versions of the ranking. These developments will mirror the proliferation of various citation-based indexes.

6. Conclusion

For science to advance, it is important that journals publish articles that are at the frontier of science. At the same time, papers at the frontier (papers that explore new ideas or new areas within a field) are sometimes difficult to get published because there is no existing community of scholars to evaluate the ideas and develop them further. This coordination problem leads to a suboptimal rate of publishing at the frontier. Journals can play an important role in combating this problem by publishing papers that try out new ideas, but they will be less willing to do so if they are not rewarded for it. A citation-based ranking system alone will not provide appropriate incentives, because it is tied only to the influence that the papers published in a journal have, rather than directly to the innovativeness of the published papers. By contrast, the neophilia-based index proposed in this paper captures the proximity of each journal to the scientific frontier.

Publishing the neophilia ranking for medicine and other fields can directly lead to more innovative science. Because the ranking provides a visible signal to the scientific community that a journal with a high ranking values innovation, and scientists long for the recognition of other scientists, the new ranking should make the decision to try out innovative but risky ideas easier. Once scientists start paying attention to the new
rankings, journals will do the same. A positive feedback loop encouraging innovative experimentation will result. Adoption of the neophilia ranking as part of tenure, promotion, and granting decisions by university administrators and grant agencies will reinforce this positive feedback loop.

We hope that the journal ranking method proposed in this paper opens an empirical conversation on how novelty should be measured. As argued in the previous section, other versions of the neophilia index can and should be designed for different purposes. What should not be controversial, in our view, is the idea that novelty, like impact, can and should be quantified. In the age of relentless quantification, scientists can ill afford to hide behind the excuse that the ingenuity of their own work cannot be measured. The issue also seems urgent: exploration in science may be on the decline (Foster et al. 2015), and the reliance on impact factors may hinder not just exploration (e.g. Alberts 2013) but also the desire to become a scientist in the first place (Osterloh and Frey 2015). In this paper, we have proposed the neophilia ranking as a constructive way to start addressing these issues.

We close with a proposed agenda for future research in this area. In our view, what is needed is a suite of indices that are tied to the aspects of science that we want scientific work to exhibit. Trying out new ideas is one important aspect of a healthy science. Citation-based indexes too will continue to have their place; scientific impact is still important. One could easily list other aspects, such as the presence of work that exchanges ideas across fields, or papers that affect real-world decisions and outcomes (such as patient mortality). Theoretical and quantitative work to develop these metrics is an agenda that is important for effective science policy.

References

Abbott, A., Cyranoski, D., Jones, N., Maher, B., Schiermeier, Q. and R. Van Noorden, 2010, Metrics: Do metrics matter? Nature, 465, pp. 860-2.

Adam, D., 2002, Citations: The counting house, Nature, 415, pp. 726-9.

Alberts, B., 2013, Impact Factor Distortions, Science, 340, p. 787.

Bird, S. B., 2008, Journal Impact Factors, h Indices, and Citation Analyses in Toxicology, Journal of Medical Toxicology, 4(4), pp. 261-74.

Brown, J. D., 2014, Citation searching for tenure and promotion: an overview of issues and tools, Reference Services Review, 42(1), pp. 70-89.

Carlsson, H. and E. van Damme, 1993, Global Games and Equilibrium Selection, Econometrica, 61(5), pp. 989-1018.

Chapron, G. and A. Husté, 2006, Open, Fair, and Free Journal Ranking for Researchers, Bioscience, 56(7), pp. 558-9.

Chen, Y., Perl, Y., Geller, J. and J. J. Cimino, 2007, Analysis of a Study of Users, Uses, and Future Agenda of the UMLS, Journal of the American Medical Informatics Association, 14(2), pp. 221-31.

Egghe, L., 2006, Theory and practice of the g-index, Scientometrics, 69(1), pp. 131-52.

Engemann, K. M. and H. J. Wall, 2009, A Journal Ranking for the Ambitious Economist, Federal Reserve Bank of St. Louis Review, 91(3), pp. 127-39.

Foster, J. G., Rzhetsky, A. and J. A. Evans, 2015, Tradition and Innovation in Scientists' Research Strategies, American Sociological Review, 80(5), pp. 875-908.

Frey, B. S. and K. Rost, 2010, Do rankings reflect research quality? Journal of Applied Economics, 13(1), pp. 1-38.

Garfield, E., 1972, Citation analysis as a tool in journal evaluation, Science, 178, pp. 471-9.

Hirsch, J. E., 2005, An index to quantify an individual's scientific research output, Proceedings of the National Academy of Sciences, 102, pp. 16569-72.
Katerattanakul, P., Razi, M. A., Han, B. T. and H.-J. Kam, 2005, Consistency and Concern on IS Journal Rankings, Journal of Information Technology Theory and Application (JITTA), 7(2), pp. 1-20.

Kuhn, T. S., 1962, The Structure of Scientific Revolutions, University of Chicago Press.

Marshall, A., 1920, Principles of Economics, 8th ed., London: Macmillan and Co.

Moed, H. F., 2008, UK research assessment exercises: Informed judgments on research quality or quantity? Scientometrics, 74(1), pp. 153-61.

Morris, S. and H. S. Shin, 2003, Global Games: Theory and Applications, in Dewatripont, M., Hansen, L. and S. Turnovsky (eds.), Advances in Economics and Econometrics, Cambridge University Press.

Osterloh, M. and B. S. Frey, 2015, Ranking Games, Evaluation Review, 39(1), pp. 102-29.
Packalen, M. and J. Bhattacharya, 2015a, Age and the Trying Out of New Ideas, NBER Working Paper No. 20920.

Packalen, M. and J. Bhattacharya, 2015b, New Ideas in Invention, NBER Working Paper No. 20922.

Packalen, M. and J. Bhattacharya, 2015c, Cities and Ideas, NBER Working Paper No. 20921.

Palacios-Huerta, I. and O. Volij, 2004, The Measurement of Intellectual Influence, Econometrica, 72(3), pp. 963-77.

Palacios-Huerta, I. and O. Volij, 2014, Axiomatic measures of intellectual influence, International Journal of Industrial Organization, 34, pp. 85-90.

Sakovics, J. and J. Steiner, 2012, Who Matters in Coordination Problems? American Economic Review, 102(7), pp. 3439-61.

Tort, A. B., Targino, Z. H. and O. B. Amaral, 2012, Rising publication delays inflate journal impact factors, PLoS One, 7(12), e53374.

Weingart, P., 2005, Impact of bibliometrics upon the science system: Inadvertent consequences? Scientometrics, 62(1), pp. 117-31.

Xu, R., Musen, M. A. and N. Shah, 2010, A Comprehensive Analysis of Five Million UMLS Metathesaurus Terms Using Eighteen Million MEDLINE Citations, AMIA Annual Symposium Proceedings, pp. 907-11.