key: cord-0838028-5adwp5li
authors: Cunningham, Eoghan; Smyth, Barry; Greene, Derek
title: Collaboration in the Time of COVID: A Scientometric Analysis of Multidisciplinary SARS-CoV-2 Research
date: 2021-08-30
journal: Humanities & Social Sciences Communications
DOI: 10.1057/s41599-021-00922-7
sha: ce2de4bfaeb7998ac7f8c7601de2401e83131f93
doc_id: 838028
cord_uid: 5adwp5li
The novel coronavirus SARS-CoV-2 and the COVID-19 illness it causes have inspired unprecedented levels of multidisciplinary research in an effort to address a generational public health challenge. In this work we conduct a scientometric analysis of COVID-19 research, paying particular attention to the nature of collaboration that this pandemic has fostered among different disciplines. Increased multidisciplinary collaboration has been shown to produce greater scientific impact, albeit with higher co-ordination costs. As such, we consider a collection of over 166,000 COVID-19-related articles to assess the scale and diversity of collaboration in COVID-19 research, which we compare to non-COVID-19 controls before and during the pandemic. We show that COVID-19 research teams are not only significantly smaller than their non-COVID-19 counterparts, but they are also more diverse. Furthermore, we find that COVID-19 research has increased the multidisciplinarity of authors across most scientific fields of study, indicating that COVID-19 has helped to remove some of the barriers that usually exist between disparate disciplines. Finally, we highlight a number of interesting areas of multidisciplinary research during COVID-19, and propose methodologies for visualising the nature of multidisciplinary collaboration, which may have application beyond this pandemic.
The scientific response to the SARS-CoV-2 pandemic has been unprecedented, with researchers from several surprising fields (e.g. artificial intelligence 1, economics 2, and particle physics 3) contributing to solving the many and varied clinical and societal challenges arising from the pandemic. As a result, by January 2021, The Allen Institute for AI 4 and the World Health Organisation 5 had identified over 166,000 research papers relating to SARS-CoV-2 and the COVID-19 illness it causes, highlighting an unprecedented period of scientific productivity. In this study we analyse this body of work to better understand the scale and nature of the collaboration and fields of study that have defined this research.
The benefits of collaboration during scientific research are well documented and widely accepted, and in recent years there has been steady growth in research team size across all scientific disciplines 6, 7, which has been shown to correlate positively with research impact 8, 9. Moreover, multidisciplinary science, which brings together researchers from many disparate subject areas, has been shown to be among the most successful scientific endeavours 10, 11. Indeed, multidisciplinary research has been highlighted as a key enabler when it comes to addressing some of the most complex challenges facing the world today 6. Not surprisingly then, there have been numerous attempts to encourage and promote collaboration and cooperation in the fight against COVID-19: the World Health Organisation maintains a COVID-19 global research database; scientific journals have published explicit calls for teamwork and cooperation 12, 13; in many cases COVID-19 related research has been made freely available to the public and the scientific community; comprehensive datasets have been created and shared; and reports from the International Chamber of Commerce (ICC) and the Organisation for Economic Co-operation and Development (OECD) have argued for international and multidisciplinary collaboration in the response to the pandemic.
Although early studies have found that the pandemic has generated a significant degree of novel collaboration 14, 15, other research has suggested that COVID-19 research has been less internationally collaborative than expected, compared with recent research from the years immediately prior to the pandemic 14, 16. There is also some evidence that COVID-19 teams have been smaller than their pre-2020 counterparts 16, 17. Thus, despite calls for greater collaboration, the evidence points to less collaboration in COVID-19 related research, perhaps because of the startup and coordination costs associated with multidisciplinary research 14, 16, 17, combined with the urgency of the pandemic response.
In this study we evaluate the scale and nature of collaboration in COVID-19 research during 2020, using scientometric analysis techniques to analyse COVID and non-COVID publications before (non-COVID) and during (COVID and non-COVID) the pandemic. We determine the nature of collaboration in these datasets using three different collaboration measures: (i) the Collaboration Index (CI) 7, to estimate the degree of collaboration in a body of research; (ii) author multidisciplinarity, to estimate the rate at which authors publish in different disciplines; and (iii) team multidisciplinarity, to estimate subject diversity across research teams. We find a lower CI for COVID-related research teams, despite an increasing CI trend for non-COVID work before and during the pandemic, but COVID-related research is associated with higher author multidisciplinarity and more diverse research teams. This research can help us to better understand the nature of the research that has been conducted under pandemic conditions, which may be useful when it comes to coordinating similar large-scale initiatives in the future.
Moreover, we develop a number of techniques for exploring the nature of collaborative research, which we believe will be of general interest to academics, research institutions, and funding agencies.
In this section we describe our methods for evaluating scientific collaboration in COVID-19 research. We describe the data that we use throughout our analysis, and we outline three approaches used to evaluate collaboration activity.
The COVID-19 Open Research Dataset (CORD-19) 18 comprises more than 400,000 scholarly articles, including over 150,000 with full text, all related to COVID-19, SARS-CoV-2, and similar coronaviruses. CORD-19 papers are sourced from PubMed, PubMed Central, bioRxiv, medRxiv, arXiv, and the World Health Organisation's COVID-19 database. We generate a set of COVID-19-related research by excluding articles dated prior to 2020; the resulting dataset contains CORD-19 metadata for 166,356 research papers containing the terms "COVID", "COVID-19", "Coronavirus", "Corona virus", "2019-nCoV", "SARS-CoV", "MERS-CoV", "Severe Acute Respiratory Syndrome" or "Middle East Respiratory Syndrome". We supplement this metadata with bibliographic information from the Microsoft Academic Graph (MAG) 19. Notably, we use the MAG fields of study (FoS) to categorise research papers. The MAG uses hierarchical topic modelling to identify and assign research topics to individual papers, each of which represents a specific field of study. To date, this approach has identified a hierarchy of over 700,000 topics within the Microsoft Academic Knowledge corpus. In our dataset of 166,356 COVID-19 research articles, the average paper is associated with 9 FoS from different levels in this hierarchy and, in total, 65,427 unique fields are represented.
To produce a more useful categorisation of articles, we first reduce the number of topics by replacing each field with its parent and then consider topics at two levels in the FoS hierarchy: (i) the 19 FoS at level 0, which we refer to as 'disciplines', and (ii) the 292 FoS at level 1, which we refer to as 'sub-disciplines'. In this way, each article is associated with a set of disciplines (e.g. 'Medicine', 'Physics', 'Engineering') and sub-disciplines (e.g. 'Virology', 'Particle Physics', 'Electronic Engineering'), which are identified by traversing the FoS hierarchy from the fields originally assigned to the paper.
We further extend this dataset with any additional research published by the authors in the COVID-related dataset. Thus, for each author, we include MAG metadata from any available articles dated after 2015. The final dataset consists of metadata for 5,389,445 research papers, which we divide into three distinct groups as follows; see Table 1 with further detail provided in the supplementary materials that accompany this article (Supplementary Table 1
3. 2020-non-COVID research: 1,205,434 non-COVID related articles published during the pandemic period and which are not in the CORD dataset.
The Annual Collaboration Index (CI) is defined, for a body of work, as the ratio of the number of authors of co-authored articles to the total number of co-authored articles 7. Since larger (more collaborative) teams have been shown to be more successful than smaller teams 6, 8, 20, we can use CI to compare COVID-related research to non-COVID baselines. However, CI is sensitive to the total number of articles in the corpus. Therefore, to facilitate comparison across our COVID and non-COVID baselines, we generate a CI distribution for each dataset by drawing 1,000 samples of 50,000 papers, each without replacement, from each year, and we calculate the sample distribution of the resulting CI values for each year in our dataset.
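The Collaboration Index and its re-sampled distribution can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names, the toy author counts, and the choice of treating any paper with two or more authors as co-authored are assumptions made here.

```python
import random
from statistics import mean, stdev


def collaboration_index(author_counts):
    """CI = total authors on co-authored papers / number of co-authored papers.

    `author_counts` holds one integer per paper (its number of authors).
    Assumption: a paper is co-authored when it has at least two authors.
    """
    coauthored = [n for n in author_counts if n >= 2]
    if not coauthored:
        raise ValueError("sample contains no co-authored papers")
    return sum(coauthored) / len(coauthored)


def resampled_ci(author_counts, sample_size, n_samples, seed=0):
    """CI values for `n_samples` samples, each drawn without replacement."""
    rng = random.Random(seed)
    return [
        collaboration_index(rng.sample(author_counts, sample_size))
        for _ in range(n_samples)
    ]


# Toy data standing in for one year of papers (hypothetical values).
toy_year = [1, 2, 2, 3, 4, 4, 5, 6, 8, 12]
distribution = resampled_ci(toy_year, sample_size=6, n_samples=200)
print(f"mean CI = {mean(distribution):.2f}, sd = {stdev(distribution):.2f}")
```

With the settings described above, `sample_size` would be 50,000 and `n_samples` 1,000 for each year; fixing the sample size is what makes the CI values comparable across corpora of different sizes.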
To evaluate the multidisciplinarity of individual authors, we consider the extent to which they publish across multiple disciplines, based on a network representation of their publications. An un-weighted bipartite network, populated by research fields and authors, links researchers to subjects (that is, based on the subjects of their publications). A projection of this network produces a dense graph of the 292 sub-disciplines at level 1 in the MAG FoS hierarchy, in which two sub-disciplines are linked if an author has published work in both. We refer to this projection as a field of study network. In such a network, the edges between fields are weighted according to the number of authors publishing in both fields. Due to the large number of researchers, and the relatively small number of sub-disciplines, the resulting graph is almost fully connected. Thus, the edge weights are an important way to distinguish between edges.
Using the MAG FoS hierarchy, we divide the network nodes into 19 overlapping "communities", based on their assignment to level 0 fields of study. This facilitates the characterization of the edges in the graph: an edge within a community represents an author publishing in two sub-disciplines within the same parent discipline, while an edge between communities represents an author publishing in two sub-disciplines from different parent disciplines. For example, if an author publishes research in 'Machine Learning' and 'Databases', the resulting edge is considered to be within the community/discipline of 'Computer Science'. Conversely, if an author publishes in 'Machine Learning' and 'Radiography', the resulting edge is considered to be between the 'Medicine' and 'Computer Science' communities. An edge between disciplines may represent either a single piece of interdisciplinary research or an author publishing separate pieces of research in two different disciplines.
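The projection and the within/between distinction can be illustrated with a small sketch. The author names, the toy field assignments, and the rule that an edge counts as within a community when the two sub-disciplines share at least one parent discipline are assumptions made for this example; the source does not spell out how overlapping communities are resolved.

```python
from collections import Counter
from itertools import combinations


def field_of_study_network(author_fields):
    """Project the author/field bipartite network onto sub-disciplines.

    Returns a Counter mapping each unordered pair of sub-disciplines to the
    number of authors who have published in both.
    """
    weights = Counter()
    for fields in author_fields.values():
        for a, b in combinations(sorted(set(fields)), 2):
            weights[(a, b)] += 1
    return weights


def between_community_share(weights, parents):
    """Share of total edge weight that links different parent disciplines.

    Assumption: an edge is 'within' a community if the two sub-disciplines
    share at least one level 0 parent, and 'between' otherwise.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    between = sum(
        w for (a, b), w in weights.items() if not (parents[a] & parents[b])
    )
    return between / total


# Hypothetical authors and the sub-disciplines they have published in.
author_fields = {
    "author_1": {"Machine Learning", "Databases"},
    "author_2": {"Machine Learning", "Radiography"},
    "author_3": {"Machine Learning", "Databases", "Radiography"},
}
parents = {
    "Machine Learning": {"Computer Science"},
    "Databases": {"Computer Science"},
    "Radiography": {"Medicine"},
}

network = field_of_study_network(author_fields)
print(between_community_share(network, parents))
```

Computing this share for one network per year gives a simple yearly measure of how much author activity crosses discipline boundaries.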
To evaluate the effect of COVID-19 on author multidisciplinarity, we produce a field of study network for each year in our dataset and calculate the proportion of the total edge weights that exist between communities. In the special case of 2020 we also report the odds ratio achieved when we compare 2019 with non-COVID research in 2020 (i.e., after we remove COVID-19 research from the graph).
In addition to author multidisciplinarity, we also consider the multidisciplinarity of the research teams, by calculating their disciplinary diversity. To do this we compare the research backgrounds of different authors using publication vectors based on the proportions of a researcher's work published across different fields 21. Specifically, we construct publication vectors for authors in our dataset using the 19 MAG disciplines. Thus, an author's publication vector is a 19-dimensional vector, with each value indicating the proportion of the author's research published in the corresponding domain. For example, an author who has 50 publications classified under 'Computer Science', 30 publications classified under 'Mathematics', and 20 publications classified under 'Biology' would have a publication vector with values {0.5, 0.3, 0.2} for the entries corresponding to these disciplines respectively, and zeros elsewhere. By using publication vectors to represent an individual's research profile, we can quantify the disciplinary diversity of a research team using Equation 1 from 21. Note, in Equation 1 |p| refers to the size of the research team and S_ij is the cosine similarity of the publication vectors for authors i and j. The team research similarity score for an article is a normalized sum of the pairwise cosine similarities for all authors of the article. To evaluate research team disciplinary diversity, we compute the teams' disciplinary similarity based on publication vectors from pre-2020 research, and we report 1 − S_team as the teams' diversity.
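A minimal sketch of the publication-vector and team-diversity calculation is given below. The equation itself is not reproduced in this text, so the normalisation is an assumption: here the pairwise cosine similarities are averaged over all |p|(|p| − 1)/2 author pairs. The three-discipline list is a shortened stand-in for the 19 MAG disciplines, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Shortened stand-in for the 19 level 0 MAG disciplines.
DISCIPLINES = ["Computer Science", "Mathematics", "Biology"]


def publication_vector(counts, disciplines=DISCIPLINES):
    """Proportion of an author's publications in each discipline."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("author has no publications to build a vector from")
    return [counts.get(d, 0) / total for d in disciplines]


def cosine_similarity(u, v):
    """Cosine similarity S_ij between two publication vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm


def team_diversity(vectors):
    """Return 1 - S_team for a team of at least two authors.

    Assumption: S_team is the mean of S_ij over all distinct author pairs.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("team diversity needs at least two authors")
    s_team = sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)
    return 1.0 - s_team


# The worked example from the text: 50, 30 and 20 publications.
author_a = publication_vector(
    {"Computer Science": 50, "Mathematics": 30, "Biology": 20}
)
author_b = publication_vector({"Biology": 10})
print(author_a, team_diversity([author_a, author_b]))
```

Under this normalisation a team whose members share an identical profile scores 0, and a team whose members have no disciplines in common scores 1.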
The year of the paper is excluded from the publication vector to avoid reducing team diversity with the common publication. As such, team disciplinary diversities for COVID-related research (and non-COVID research from 2020) are calculated from publication vectors which exclude work from 2020. We compare these scores with disciplinary diversity scores for research in 2019 when, similarly, the publication vectors exclude work from 2019 and 2020. As the potential for disciplinary diversity in research teams is limited by the number of team members, we compare diversity by team size.
The field of study network structure used to calculate author multidisciplinarity encodes relationships between fields of study, with respect to the authors who publish in them. Since these relationships are altered in COVID-related research, we propose a modified network structure to explore the changes to these relationships visually, and to highlight interesting case studies of multidisciplinary research in the COVID-19 literature. In this modified network structure, COVID-related research articles contribute directed edges (SD_A, SD_B) to the graph, for all sub-disciplines SD_A in which the authors publish in their pre-2020 work, and all sub-disciplines SD_B which relate to the article. For example, an edge between the pair of sub-disciplines 'Machine Learning' and 'Radiology' represents an author who published in the field of 'Machine Learning' in their pre-2020 work (2016-2019), publishing COVID-19 research in the field of 'Radiology'. We produce networks of this structure from different subsets of COVID-related research articles, which we will visualise using flow diagrams, where the pre-2020 sub-disciplines are on the left and the COVID-related disciplines are on the right.
Figure 1 reports the mean Collaboration Index for the samples of 50,000 research papers taken from each year in the dataset. Mean values for samples of COVID-19 research articles are also included.
The Collaboration Index increases year-on-year, indicating a move towards larger research teams. This trend has been noted across many disciplines of academic research 6, 8, 9. COVID-19 research, however, presents with a very different CI (approximately 5.6), indicating that COVID-19 research teams are significantly smaller than expected for research conducted by the same authors in 2020. This result is robust with respect to re-sampling size: in the supplementary materials that accompany this article (see Supplementary Figure 1) we report comparable results using sample sizes n = 10,000 and n = 100,000.
1,000 samples are taken from each year (2016-2020). Collaboration index increases annually, r² = 0.94, and the CI for COVID-19 articles is significantly less than the CI associated with non-COVID 2020 research; in fact the mean COVID-19 CI is 25 standard deviations below the mean of non-COVID samples taken from 2020. Thus, research teams publishing COVID-19 research are significantly smaller than expected for research teams in 2020 containing the same authors.
Figure 2. Odds ratio effect sizes for the proportion of links between disciplines when compared with the previous year. A score of 1 indicates that authors are no more likely to publish in other disciplines than they were in the previous year.
We quantify author multidisciplinarity in a year of research by measuring the proportion of the total number of edges in an author-FoS network that are between communities (i.e., disciplines). We find that this proportion is increasing slowly over time when we produce FoS networks for each year in our data (slope = 0.2%, r² = 0.98). Figure 2 reports the odds ratio effect size when the proportion of the edges that are between communities in a given year is compared with that of the previous year. These scores are reported for each community.
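The year-on-year comparison can be sketched with the standard odds ratio for two proportions. This is an illustrative reading of the measure, not a confirmed reproduction of the authors' computation: the function names and the toy edge weights are assumptions.

```python
def odds(between_weight, within_weight):
    """Odds that a unit of edge weight links two different disciplines."""
    if within_weight <= 0:
        raise ValueError("within-community weight must be positive")
    return between_weight / within_weight


def odds_ratio(current, previous):
    """Odds ratio of between-discipline edges: current year vs previous year.

    Each argument is a (between_weight, within_weight) pair for one
    discipline's edges in one year. A value of 1 means no change.
    """
    return odds(*current) / odds(*previous)


# Hypothetical edge weights for one discipline in two consecutive years.
previous_year = (40, 60)  # 40% of edge weight crosses disciplines
current_year = (50, 50)   # 50% of edge weight crosses disciplines
print(odds_ratio(current_year, previous_year))
```

A value above 1 indicates that edges involving the discipline were more likely to cross discipline boundaries than in the previous year.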
In the case of 2020 we also report the odds ratio achieved when we compare 2019 with 2020-non-COVID research, i.e., after we remove COVID-19 research from the graph. Figure 2 shows an increase in multidisciplinary publication in 2020 across almost all disciplines. The increase in author multidisciplinarity is much greater when we include COVID-19 research in the graph. Despite representing less than 20% of the work published in 2020, COVID-19 research contributes greatly to the proportion of inter-disciplinary edges in the FoS network.
When we compare authors by their publication backgrounds, encoded as publication vectors, we find COVID-19 research teams to be more diverse than equivalently-sized research teams who published before 2020. Figure 3 presents the relative increase in mean research team disciplinary diversity for different team sizes, when research teams from 2020 are compared with teams from 2019. We divide 2020 research into two sets, (i) 2020-COVID-related and (ii) 2020-non-COVID research, and report relative increases in team diversity for each set. Independent t-tests show COVID-19 research teams to be significantly more diverse than both pre-2020 and 2020-non-COVID research teams of the same size (p < 0.01, see Supplementary Table 4).
Despite the recent trend towards larger, more collaborative research teams 6, 8, 9, 21, COVID-19 research appears to have significantly fewer authors than other publications by the same researchers during 2020. This may be a concerning finding amid evidence that larger teams produce more impactful scientific research 8 publications relating to the coronavirus. Yet, the majority of COVID-19 research papers (53%) have 4 or fewer authors. We find no evidence that the reduced Collaboration Index of COVID-19 research is due to working conditions and restrictions during the pandemic. Despite a global shift towards remote work, research in 2020 continues the recent trend of increasing collaboration.
The preference for smaller research teams appears to be specific to COVID-19 research and not simply a factor of research during COVID-19.
The prevalence of smaller research teams is an important feature of COVID-19 research. Smaller teams have been shown to play a different role to larger teams in both research and technology 22. In an analysis of research collaborations, Wu et al. show that small research teams can disrupt science and technology by exploring and amplifying promising ideas from older and less-popular work, while large teams develop on recent successes by solving acknowledged problems 22. The definition by Wu et al. of disruptive articles relates closely to the metric of betweenness centrality for citation networks. That is, disruptive papers can connect otherwise separate communities in a research network. We find some evidence that COVID-19 research may be increasing the connectivity between disciplines, as authors are more likely to publish across multiple fields and research teams are more diverse.
A trend towards greater levels of multidisciplinary collaboration has been identified in many scientific disciplines 9. This trend is evident in the non-COVID-19 portions of our dataset. Research teams of fewer than 10 members publishing in 2020 exhibit greater disciplinary diversity than similarly-sized teams publishing in 2019, for example. Likewise, the number of authors publishing in multiple disciplines is increasing steadily year-on-year. In COVID-19 research, the increase in multidisciplinarity (of both teams and individuals) exceeds the established trend. This may be evidence of the disruptive nature of COVID-related research.
Below, we use flow diagrams to explore author multidisciplinarity in specific topics in the COVID-related research dataset. Figures 4, 5, 6 and 7 present four selected case studies of author multidisciplinarity in COVID-related research in 2020.
To provide a clear visualisation of the strongest trends that exist, each FoS network shows only the 50 edges with the greatest weights. We choose Virology as a case study because it is the largest subset in COVID-related research, while Computer Science and Materials Science were chosen because they show considerable increases in author multidisciplinarity in 2020 (see Figure 2), and Development Economics presents with a very diverse set of contributing disciplines.
For example, Figure 4 shows the intersection between Medicine, Biology and Chemistry in COVID-19 research relating to Virology. The sub-disciplines Molecular Biology, Biochemistry, Immunology, and Virology all appear closely related in this graph. They are strongly interconnected, indicating many instances of authors publishing between disciplines, and each acts as both a source and a destination in the network, as authors who publish in any of these sub-disciplines prior to COVID-19 are likely to publish in the others during COVID-19.
Figure 5 illustrates the multidisciplinary nature of Computer Science research in COVID-19. Unlike the Virology graph in Figure 4, there are only two destinations in this network: Computer Science and Medicine. Computer Science research in the COVID-19 dataset is primarily focused on Machine Learning solutions to automating COVID-19 detection from medical images 1 (see Supplementary Table 6(a)). This effort is evident in the graph, as Computer Science research in COVID-19 is most commonly characterised within the sub-disciplines Machine Learning, Artificial Intelligence, Pathology, Surgery and Algorithm. Also evident is the multidisciplinary nature of the effort, as researchers with backgrounds in many of the STEM fields are shown to contribute.
Figure 6 reports the FoS network for COVID-19 research relating to Materials Science.
The graph illustrates an intersection between the fields of Physics, Chemistry, Engineering and Materials Science, as researchers from each of these disciplines contribute to coronavirus research. Many of the most cited articles in this subset relate to airborne particles and the efficacy of face masks 3, along with the use of electrochemical biosensors for pathogen detection 23 (see Supplementary Table 7(a)).
Figure 7 presents the FoS network for the COVID-19 related research papers in the field of Development Economics. Some of the most cited articles in this subset concern studies of the socio-economic implications and effects of the pandemic globally 2, 24, and of health inequity in low- and middle-income countries 25, 26 (see Supplementary Table 8(a)). Research in this subset is characterised by the diverse set of sub-disciplines shown on the left of the figure, as authors with backgrounds in social science, social psychology, medicine, statistics, economics, and biology are all found to contribute.
The methods outlined in this work could be applied in future scientometric analyses to assess and visualise multidisciplinarity in a body of research. This may be of interest to researchers seeking to understand the evolution of their own field of study, or to funding agencies who recognise the established benefits of multidisciplinary collaboration. In the case of this work, we show COVID-19 research teams to be smaller yet more multidisciplinary than non-COVID-19 teams. It is suggested in early work that authors publishing COVID-19 research favoured smaller, less international collaborations in order to reduce co-ordination costs and contribute to the public health effort sooner 16. We would like to elaborate on this characterisation of collaboration in COVID-19 research, adding that authors sought to minimise the limitations of working in smaller teams by collaborating with scientists from diverse research backgrounds.
That is to say, in the urgency of the pandemic, scientists favoured smaller, more multidisciplinary research teams in order to collaborate more efficiently.
Table 1. Summary of 2020-COVID-related research analysed in this work, in terms of the number of research articles and mean number of authors per paper for each discipline. In the case of each discipline, we include any articles that are assigned to fields of study that are within the discipline, i.e., the discipline is a parent of any of the smaller fields of study to which the paper is assigned. We also report the number of articles that are explicitly assigned to that discipline, i.e., the discipline is included in the original set of fields of study attributed to the paper.
Table 3. Summary of pre-2020 research analysed in this work, in terms of the number of research articles and mean number of authors per paper for each discipline.
Table 4. Independent t-tests comparing research team diversities. We compare diversity scores for teams of equal sizes with the null hypothesis that the mean diversity in each distribution is the same. Each test is one-tailed with the alternative hypothesis stated in the sub-caption. Values significant at p < 0.01 are marked with a *. COVID research teams of all sizes are shown to be significantly more diverse than non-COVID baselines (see Tables 4(a) and 4(b)).
Artificial intelligence in the battle against coronavirus (COVID-19): A survey and future research directions
The socio-economic implications of the coronavirus pandemic (COVID-19): A review
Effectiveness of common fabrics to block aqueous aerosols of COVID virus-like nanoparticles
From sole investigator to team scientist: Trends in the practice and study of research collaboration
A bibliometric analysis of the interdisciplinary field of cultural evolution
Team size matters: Collaboration and scientific impact since 1900
Is science becoming more interdisciplinary? Measuring and mapping six research fields over time
Interdisciplinarity revisited: evidence for research impact and dynamism
Long-distance interdisciplinarity leads to higher scientific impact
Extensive partnership, collaboration, and teamwork is required to stop the COVID-19 outbreak
Communication, collaboration and cooperation can stop the 2019 coronavirus
Can pandemics transform scientific novelty? Evidence from COVID-19
This research was supported by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289_P2.
The data used in our study can be reproduced from the set of Microsoft Academic Graph article IDs, which will be made available upon request.