key: cord-0880903-hxvagvqw
authors: Ioannidis, J.; Bendavid, E.; Salholz-Hillel, M.; Boyack, K. W.; Baas, J.
title: Massive covidization of research citations and the citation elite
date: 2022-01-25
DOI: 10.1101/2022.01.24.22269775
sha: 67efac588e8f3bfd81779b622c2e070f9981ece9
doc_id: 880903
cord_uid: hxvagvqw

Massive scientific productivity accompanied the COVID-19 pandemic. We evaluated the citation impact of COVID-19 publications relative to all scientific work published in 2020-2021 and assessed the impact on scientist citation profiles. Using Scopus data until August 1, 2021, COVID-19 items accounted for 4% of papers published, 20% of citations received by papers published in 2020-2021, and >30% of citations received in 36 of the 174 disciplines of science (up to 79.3% in General and Internal Medicine). Across science, 98 of the 100 most-cited papers published in 2020-2021 were related to COVID-19. A total of 110 scientists received >=10,000 citations for COVID-19 work, but none received >=10,000 citations for non-COVID-19 work published in 2020-2021. For many scientists, citations to their COVID-19 work already accounted for more than half of their total career citation count. Overall, these data show a strong covidization of research citations across science, with a major impact on shaping the citation elite.

The COVID-19 pandemic resulted in a massive mobilization of researchers across science to address a new major challenge [1]. It is estimated that approximately 4% of the scientific literature published in 2020-2021 was related to COVID-19 [2]: over 720,000 different scientists published over 210,000 relevant publications, based on items indexed in Scopus as of August 1, 2021 [2]. By the end of 2021, COVID-19-related published items exceeded 440,000 according to the WHO database [3].
This shift of the research enterprise and the massive production of COVID-19-related publications ("covidization") may have had implications for citations to recent scientific work. In most scientific disciplines, most papers receive few, if any, citations in their first year; citations accrue gradually over many years, with citation half-lives that typically exceed 5 years and may exceed 10 years in some fields [4] [5] [6]. The citation half-life of COVID-19 work is still unknown, given the short follow-up available for published COVID-19 papers. However, the hundreds of thousands of COVID-19 publications have likely drawn citations largely from other very recently published COVID-19 work. Conversely, for non-COVID-19 work, citations from very recent papers (<1-2 years old) are expected to have been a minority. Therefore, a large share of the citations to very recent work in 2020 and 2021 likely reflects citations to COVID-19 papers. The extent and distribution of such a COVID-19-enriched pattern of recent citations are worth studying for their implications for understanding evolving cultural norms. Citations of more recent papers may represent reliance on less vetted, more tentative knowledge. Reliance on less mature knowledge may be more susceptible to reversal, and a number of high-profile retractions have unnerved the scientific world in the COVID-19 era [7].

medRxiv preprint doi: https://doi.org/10.1101/2022.01.24.22269775; this version posted January 25, 2022. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.

Moreover, the massive COVID-19 literature and its citations may have had a major impact on the careers of many scientists.
The possibility of receiving a large number of citations could be highly appealing to researchers whose careers are influenced by reputation and citation metrics. If covidization of research heralds a new approach to receiving citations, it may change the incentives of scientists motivated by the lure of such scientific rewards. This, in turn, may shift the work of young scientists away from more "gradualist" fields towards COVID-19. In other words, the appeal of working on COVID-19 may extend beyond its health challenges, distorting the important alignment between the burden of disease and scientists' interest. Here, we compare scientists' acquisition of citations for COVID-19 work and for similarly recent non-COVID-19 work; characterize the profiles of scientists who had extraordinary boosts to their citation profiles; and assess whether COVID-19 citations correlated with overall career impact or had an independent impact in generating a new citation elite. We addressed these questions using comprehensive Scopus data [8] covering 2020-2021.

From January 1, 2020 until August 1, 2021, a total of 5,728,015 items were published and indexed in Scopus, including 210,183 (4%) items related to COVID-19. By August 1, 2021, these items had received a total of 9,174,336 citations, of which 1,832,477 (20%) went to the items related to COVID-19. Therefore, even though COVID-19 items were a small minority, their share of the citations received by very recently published items was about 5 times larger than their share of the items. Table 1 shows the 36 scientific disciplines (out of 174 fields across all science) in which more than 30% of the citations received in 2020-2021 by work published in these two years went to COVID-19 work.
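The "about 5 times larger" share follows directly from the totals above. A minimal Python sketch (hypothetical helper names, not the authors' code) recomputes the item share, the citation share and their ratio, which comes to about 5.4 before rounding, and shows how disciplines could be screened against the 30% cutoff used for Table 1; the discipline counts in the example are made up.

```python
from typing import Dict, List, Tuple


def share(part: int, total: int) -> float:
    """Fraction of `total` accounted for by `part`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return part / total


# Totals reported in the text (Scopus items published 2020-01-01 to 2021-08-01).
ITEMS_ALL, ITEMS_COVID = 5_728_015, 210_183
CITES_ALL, CITES_COVID = 9_174_336, 1_832_477

item_share = share(ITEMS_COVID, ITEMS_ALL)      # roughly 0.037
citation_share = share(CITES_COVID, CITES_ALL)  # roughly 0.200
enrichment = citation_share / item_share        # roughly 5.4


def disciplines_above(cites: Dict[str, Tuple[int, int]], cutoff: float = 0.30) -> List[str]:
    """Return disciplines whose COVID-19 share of citations exceeds `cutoff`.

    `cites` maps a discipline name to (covid_citations, all_citations);
    this input format is a hypothetical illustration.
    """
    return sorted(
        name
        for name, (covid, total) in cites.items()
        if total > 0 and share(covid, total) > cutoff
    )


if __name__ == "__main__":
    print(f"item share {item_share:.1%}, citation share {citation_share:.1%}, "
          f"enrichment {enrichment:.1f}x")
    toy = {"Field A": (793, 1000), "Field B": (250, 1000)}  # made-up counts
    print(disciplines_above(toy))
```

The strict `>` comparison mirrors the "more than 30%" wording, so a discipline sitting exactly at 30% is not flagged.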
For 3 scientific fields, more than two-thirds of the citations received to […]

As shown in Figure 1, the proportion of papers receiving very high numbers of citations by August 1 of the next calendar year increased very slightly between 2017 and 2019. However, papers published in 2020 showed a major shift, with much larger proportions of papers receiving very high numbers of citations. The shift was entirely attributable to COVID-19-related publications. Of the 3,183,277 publications in 2020, the 96,351 COVID-19-related publications received on average 8.4-fold more citations per paper than the non-COVID-19 publications. The difference was 20.9-fold for General and Internal Medicine, i.e., on average a COVID-19-related paper in that field received more than 20 times as many citations as a non-COVID-19 paper. The average number of citations per paper was higher for COVID-19 papers than for non-COVID-19 papers in 128 of the 129 scientific disciplines that published more than 50 COVID-19-related papers in 2020, the exception being Computational Theory and Mathematics (Supplementary Table 1).

Scientists with high numbers of citations to their 2020-2021 published work

A total of 84,757 scientists had received >=100 citations to their work published in 2020-2021 by August 1, 2021 (among a total of 4,183,909 Scopus IDs that had published at least one paper in that period and 5 or more papers in their entire career). Among these 84,757 scientists, 35,358 had received >=300 citations, 5,773 had received >=1,000 citations, 240 had received >=5,000 citations, and 110 had received >=10,000 citations for such very recent work.
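The fold differences reported here (8.4-fold overall, 20.9-fold in General and Internal Medicine) are ratios of mean citations per paper. The sketch below, with hypothetical helper names and made-up counts, shows that computation for a single discipline or for all of science; it is an illustration, not the authors' pipeline.

```python
from statistics import mean
from typing import Iterable, List, Sequence, Tuple


def split_by_flag(papers: Iterable[Tuple[bool, int]]) -> Tuple[List[int], List[int]]:
    """Split (is_covid, citations) pairs into COVID-19 and non-COVID-19 citation lists."""
    covid: List[int] = []
    other: List[int] = []
    for is_covid, cites in papers:
        (covid if is_covid else other).append(cites)
    return covid, other


def fold_difference(covid: Sequence[int], other: Sequence[int]) -> float:
    """Mean citations per COVID-19 paper divided by mean citations per other paper."""
    if not covid or not other:
        raise ValueError("both groups need at least one paper")
    other_mean = mean(other)
    if other_mean == 0:
        raise ZeroDivisionError("non-COVID-19 papers have no citations")
    return mean(covid) / other_mean


if __name__ == "__main__":
    # Made-up papers for one discipline: (is_covid, citations by the cutoff date)
    toy = [(True, 40), (True, 2), (False, 3), (False, 0), (False, 2), (False, 1)]
    covid, other = split_by_flag(toy)
    print(fold_difference(covid, other))  # 14.0
```

Because both numerator and denominator are means, a handful of extremely highly cited COVID-19 papers can drive much of the ratio; a median-based ratio would be more robust, although it is not the quantity the text reports.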
Of the 84,757 scientists with >=100 citations to very recent work, 53% had published at least some COVID-19 papers; of the 5,773 scientists with >=1,000 citations to very recent work, 65% had published some COVID-19 work. Table 2 shows the number of scientists who had received high numbers of citations to very recent work overall, to COVID-19-related work, and to non-COVID-19 work. The number of authors who received >=100 citations to very recent work was almost twice as large for COVID-19 work as for non-COVID-19 work, but the difference disappeared at the >=1,000 citations threshold.

Among the 84,757 scientists with >=100 citations to their very recent work (published in 2020-2021), for n=11,767 scientists the citations to their COVID-19 work already accounted for more than half of their total career citation count. By comparison, for n=5,071 of the 84,757 scientists the citations to their non-COVID-19 work published in 2020-2021 already accounted for more than half of their total career citation count. Using a composite citation indicator to rank the citation impact of scientists, among the top-300 ranked scientists for their COVID-19 work, 117 were among the top-100,000 ranked science-wide for their entire career impact as of August 1, 2021, and 54 were among the top-20,000. Figure 2 shows the trajectory of the science-wide ranking of these 54 scientists according to the composite indicator calculated from the citations received in a single year to all work published during their career.
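Figure 2 tracks science-wide ranks across single years. A minimal sketch (hypothetical names, not the authors' code) of how a fold improvement and an "improved or worsened by a third or more" label could be derived from two ranks follows. The paper does not state the exact rule, so reading "a third" as a relative change of at least one third in the rank number, in either direction, is an assumption. For the rank pair 48,045 (2019) and 362 (2020) reported in the text, the fold improvement is 48,045 / 362, roughly 132.7.

```python
from fractions import Fraction
from typing import Literal

RankChange = Literal["improved", "worsened", "stable"]


def fold_improvement(old_rank: int, new_rank: int) -> float:
    """Ratio old/new of two ranks (1 = best); values above 1 mean moving up the list."""
    if old_rank < 1 or new_rank < 1:
        raise ValueError("ranks start at 1")
    return old_rank / new_rank


def classify(old_rank: int, new_rank: int, margin: Fraction = Fraction(1, 3)) -> RankChange:
    """Label a rank change by its relative change in rank number (an assumed rule)."""
    if old_rank < 1 or new_rank < 1:
        raise ValueError("ranks start at 1")
    # Exact arithmetic avoids float edge cases at exactly one third.
    change = Fraction(new_rank - old_rank, old_rank)  # negative means moving up
    if change <= -margin:
        return "improved"
    if change >= margin:
        return "worsened"
    return "stable"


if __name__ == "__main__":
    print(round(fold_improvement(48045, 362), 1))  # 132.7
    print(classify(48045, 362))  # improved
```

Using `Fraction` keeps boundary cases exact: a move from rank 900 to rank 600 counts as an improvement of exactly one third.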
As shown in Figure 2, in 2019 versus 2017, improvements in ranking were as common as worsening: 13/54 scientists improved their ranking by a third or more and 10/54 worsened their ranking by as much. In contrast, in the 2020 versus 2019 comparison, 47/54 scientists improved their ranking by a third or more, while none worsened their ranking by this margin. Five scientists improved their ranking more than 6-fold; the most impressive improvement was for a long-time coronavirus expert who went from rank 48,045 in 2019 to rank 362 in 2020, a roughly 133-fold improvement. Overall, 143 of the top-200 ranked scientists across science, based on the composite citation indicator calculated specifically for their work published in 2020-2021, had published at least 1 COVID-19-related paper.

Correlation between metrics of impact: career impact and 2020-2021 work

[…] work in 2020-2021 (r=-0.14). The lack of correlation between performance metrics for COVID-19 work and performance for the entire career was also seen for all other metrics besides citations. Figure 3 shows the strong relationship (r=0.59) between the hm-index for the entire career and the hm-index for very recent work overall, but the weak relationship (r=0.11) between the hm-index for the entire career and the hm-index for COVID-19 work.

The present analysis shows a massive covidization of research citations during 2020-2021. A large share of the citations to papers published in 2020-2021 has gone to COVID-19-related items. This pattern is seen across many scientific disciplines, with the highest rates in […]
[…] resources and the commitment of large amounts of funding to COVID-19 work and related fields may induce a more lasting impact on the scientific literature and its citation footprint.

Citation impact does not necessarily indicate high quality or validity of the cited work. […] papers reflect topics that are debated or even refuted, such as editorials about the origin of the new coronavirus and early reports claiming effectiveness for interventions such as hydroxychloroquine that were not subsequently validated for major outcomes (e.g., mortality). High rates of non-replication and refutation for many of the most highly cited papers have also been reported in the pre-COVID-19 scientific literature [19, 20]. Beyond COVID-19, there is debate in other fields on the extent to which citations are influenced by quality [21] [22] [23] [24] and on the relative contribution of rigor and relevance in attracting citations [25] [26] [27].

There are several limitations to our work. First, the classification into COVID-19 versus non-COVID-19 work may not be perfect. However, a border zone of difficult-to-classify papers and papers misclassified by our search algorithm are unlikely to change the big picture of the results. Second, some scientists may have their publications split into two or more Scopus ID files, and some Scopus ID files may include papers by more than one author. Nevertheless, Scopus author profiles have high precision and recall (98.1% and 94.4%, respectively) [4], so this is unlikely to be a source of major error. Third, we could not […]

By design, we focused on authors with at least 5 full papers, conference papers, or reviews by August 1, 2021. This choice was also adopted in previous work [2].
This design probably excluded from the evaluation a substantial number of early-career scientists who have not yet published that many papers, but who may have already co-authored COVID-19 work that gathered many citations. Therefore, the number of authors who more than doubled their total career citations would probably be much larger than our estimates if these authors with few papers were added. However, many Scopus author ID files with few items are fragments that belong to larger profiles, and considering these ID files would have added spurious noise to the analysis. Moreover, many of the author ID files with few papers may represent people who have an auxiliary role in the research process rather than being key investigators.

Allowing for these caveats, our analysis shows a massive covidization of research citations. Citations are a main currency in decisions on funding and career advancement in both academia and the wider scientific community, and they are widely deemed highly desirable [28, 29]. Other investigators have expressed concerns about the covidization of research [30, 31]. The duration and evolution of the phenomenon are unknown, but they warrant careful monitoring. The ultrafast generation of the broad COVID-19 research community is most welcome to the extent that it serves the needs of scientific investigation and its translation to useful medical and public health interventions and policy. Conversely, if that community grows disproportionately large and/or remains pervasive even after the pandemic dissipates, challenges may arise.
Evidence from evaluations of citation patterns in very large scientific fields [32] suggests that when scientific fields grow very large, the list of most-cited papers ossifies into a canon that slows disruption and real progress. COVID-19 offers a unique example of a scientific field that grew to extremely large dimensions extremely fast. The pandemic has shown the great ability of the scientific workforce to shift attention to an acute problem. It is unknown whether this versatility also works in reverse, shifting attention away from that problem once it is no longer an acute and major threat and/or refocusing on different priorities, if such priorities arise. Regardless, given the strong value that publications and citations have for scientists' funding and career prospects, COVID-19 may continue to have a dominant presence in the scientific literature and its citations well beyond the end of the pandemic.

We used a copy of the Scopus database [4]. Scopus citations to all publications in 2020-2021, to COVID-19 publications in 2020-2021, and to non-COVID-19 publications in 2020-2021 were counted as of August 1, 2021. As many publications related to COVID-19 were first disseminated on preprint servers, we also included preprint publications from arXiv, SSRN, bioRxiv, ChemRxiv and medRxiv [2]. Citations from or to preprints were not included in any of the counts. Publications in 2020-2021 were assigned to a discipline based on their journal of publication, according to the Science-Metrix classification of science, a standard mapping of all science into 21 main fields and 174 subfield disciplines [33, 34].
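The counting rules just described can be sketched as follows, under a hypothetical data model rather than the actual Scopus schema. A citation link is dropped if either end is a preprint; only cited items published in 2020-2021 are counted; and each cited paper inherits the discipline of its journal through a journal-to-subfield lookup. The `Paper` record and the `journal_to_field` dictionary are assumptions that stand in for the Science-Metrix mapping.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Paper:
    pid: str
    journal: Optional[str]  # None for preprints
    is_preprint: bool
    is_covid: bool
    year: int


def citations_by_discipline(
    papers: Dict[str, Paper],
    links: Iterable[Tuple[str, str]],
    journal_to_field: Dict[str, str],
    years: Tuple[int, ...] = (2020, 2021),
) -> Dict[str, Tuple[int, int]]:
    """Return {discipline: (citations to COVID-19 papers, all citations)}.

    `links` holds (citing_id, cited_id) pairs. A link is skipped when either
    end is unknown or a preprint, or when the cited paper was not published
    in `years`. The cited paper's discipline comes from its journal.
    """
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for citing_id, cited_id in links:
        citing, cited = papers.get(citing_id), papers.get(cited_id)
        if citing is None or cited is None:
            continue
        if citing.is_preprint or cited.is_preprint:
            continue  # citations from or to preprints are not counted
        if cited.year not in years:
            continue
        field = journal_to_field.get(cited.journal) if cited.journal else None
        if field is None:
            continue  # journal missing from the (hypothetical) mapping
        counts[field][1] += 1
        if cited.is_covid:
            counts[field][0] += 1
    return {field: (covid, total) for field, (covid, total) in counts.items()}
```

The resulting (covid, total) pairs per discipline are exactly what is needed to compute the discipline-level COVID-19 citation shares summarized in Table 1.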
[…] publications over mean citations to non-COVID-19 publications. This analysis is slightly biased in favor of non-COVID-19 publications: very few COVID-19 publications appeared in the first two months of 2020, so non-COVID-19 publications had, on average, slightly more time to be cited.

We identified all authors who had received at least 100 citations by August 1, 2021 to their work published in 2020-2021. We noted how many of them had published at least 1 COVID-19-related publication. We also noted how many authors had received at least 100 citations by August 1, 2021 to their COVID-19 versus non-COVID-19 work published in 2020-2021. Numbers of authors passing higher citation thresholds (>=500, >=1,000, >=5,000, >=10,000) in these categories were also noted. We examined the country of the authors at the highest citation levels. For all analyses of authors, similar to prior work [2], we only considered those who had published at least 5 papers (articles, conference papers, or reviews) in their career. This excludes authors with limited presence in the scientific literature and author IDs that may represent split fragments of the publication record of more prolific authors.

[…] single/first, single/first/last author, we generated a ranking of scientists based on their 2020-2021 work alone. We did the same calculations and generated the respective metrics and rankings limited to COVID-19 work published in 2020-2021.
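The author-level rules described above can be sketched as follows. The `Author` record and its field names are hypothetical stand-ins for Scopus author profiles. The code applies the 5-paper eligibility rule and the >=100 citation entry threshold. It then counts authors at or above higher thresholds, where each count includes everyone above it, matching the nested counts in the Results. Finally, it counts authors whose COVID-19 citations exceed half of their career total, using a strict "more than half" test.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Author:
    author_id: str
    career_papers: int       # articles, conference papers and reviews
    cites_recent_covid: int  # citations to COVID-19 work published in 2020-2021
    cites_recent_other: int  # citations to non-COVID-19 work published in 2020-2021
    cites_career: int        # citations to all career work (includes the above)

    @property
    def cites_recent(self) -> int:
        return self.cites_recent_covid + self.cites_recent_other


def eligible(authors: Sequence[Author], min_papers: int = 5, min_recent: int = 100) -> List[Author]:
    """Keep authors with enough career papers and enough citations to 2020-2021 work."""
    return [a for a in authors
            if a.career_papers >= min_papers and a.cites_recent >= min_recent]


def threshold_counts(values: Sequence[int], thresholds: Sequence[int]) -> Dict[int, int]:
    """Count values at or above each threshold; counts are cumulative, not binned."""
    return {t: sum(1 for v in values if v >= t) for t in thresholds}


def covid_majority(authors: Sequence[Author]) -> int:
    """Number of authors whose COVID-19 citations exceed half of their career citations."""
    return sum(1 for a in authors if 2 * a.cites_recent_covid > a.cites_career)
```

Comparing `2 * covid > career` avoids floating-point division. Swapping `cites_recent_covid` for `cites_recent_other` gives the corresponding non-COVID-19 count.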
For each author with >=100 citations to their work published in 2020-2021, we also calculated the same citation metrics and overall ranking across all science as of August 1, 2021 for the work published during their entire career [40]. We evaluated for how many authors their COVID-19 work accounted for at least half of the citations received in their entire career, and for how many authors their non-COVID-19 work published in 2020-2021 accounted for more than half of the citations received in their entire career. We had previously generated [2] a list of the top-300 ranked scientists for their COVID-19 work based on the composite citation indicator. We investigated how many of those were also among the top-100,000 ranked science-wide for their entire career impact as of August 1, 2021, and how many were among the top-20,000. For the scientists among the top-20,000 ranked science-wide for their career impact, we also noted their science-wide ranking for their annual citation impact in the single years 2017, 2019, and 2020; data were extracted from previously published, publicly available datasets that use the composite citation indicator for the ranking [37] [38] [39]. This allowed us to assess the trajectory of the ranking of these scientists before and during the pandemic. The annual assessments consider all the citations received in a single year to all work published in the scientist's career; therefore, they reflect recent attention not only to recent work but also to all past work.

Finally, we calculated Pearson correlation coefficients between the productivity and citation metrics of the scientists for their entire career as of August 1, 2021 and the respective metrics for
COVID-19 work, and non-COVID-19 work published in 2020-2021. This allowed us to evaluate whether career impact tracked with recent COVID-19 work, non-COVID-19 work, or both. All calculations throughout the paper include self-citations. No statistical tests were used and no p-values are reported, since the analyses are descriptive.

Table 1. Scientific disciplines where COVID-19 work received >30% of the citations given to papers published in 2020-2021 (until August 1, 2021)
How a torrent of COVID science changed research publishing - in seven charts (2021)
The rapid, massive growth of COVID-19 authors in the scientific literature
Approach for using Journal Citation Reports in determining the dynamics of half-life indicators of journals
The obsolescence of cited and citing journals: half-lives and their connection to other bibliometric indicators
Cited half-life of the journal literature
Inconsistent and incomplete retraction of published research: A cross-sectional study on Covid-19 retractions and recommendations to
Towards understanding the relation between citations and research quality in software engineering studies
Characteristics of highly cited papers
Citation rates and perceptions of scientific contribution
What do citation counts measure? A review of studies on citing behavior
Evaluation by citation: trends in publication behavior, evaluation criteria, and the strive for high impact publications
Citation gaming induced by bibliometric evaluation: A country-level comparative analysis
Covidization of research: what are the risks?
Scientists fear that 'covidization' is distorting research
Slowed canonical progress in large fields of science
Towards a multilingual, comprehensive and open scientific journal ontology
Character-level convolutional networks for text classification