key: cord-1030151-t2c7dl50
authors: Gai, Nan; Aoyama, Kazuyoshi; Faraoni, David; Goldenberg, Neil M.; Levin, David N.; Maynes, Jason T.; McVey, Mark J.; Munshey, Farrukh; Siddiqui, Asad; Switzer, Timothy; Steinberg, Benjamin E.
title: General medical publications during COVID-19 show increased dissemination despite lower validation
date: 2021-02-02
journal: PLoS One
DOI: 10.1371/journal.pone.0246427
sha: 72186c93e21622807d938b1af82532898aa3e19f
doc_id: 1030151
cord_uid: t2c7dl50

BACKGROUND: The COVID-19 pandemic has yielded an unprecedented quantity of new publications, contributing to an overwhelming quantity of information and leading to the rapid dissemination of less stringently validated information. Yet, a formal analysis of how the medical literature has changed during the pandemic is lacking. In this analysis, we aimed to quantify how scientific publications changed at the outset of the COVID-19 pandemic. METHODS: We performed a cross-sectional bibliometric study of published studies in four high-impact medical journals to identify differences in the characteristics of COVID-19 related publications compared to non-pandemic studies. Original investigations related to SARS-CoV-2 and COVID-19 published in March and April 2020 were identified and compared to non-COVID-19 research publications over the same two-month period in 2019 and 2020. Extracted data included publication characteristics, study characteristics, author characteristics, and impact metrics. Our primary measure was principal component analysis (PCA) of publication characteristics and impact metrics across groups. RESULTS: We identified 402 publications that met inclusion criteria: 76 were related to COVID-19; 154 and 172 were non-COVID publications over the same period in 2020 and 2019, respectively. PCA utilizing the collected bibliometric data revealed segregation of the COVID-19 literature subset from both groups of non-COVID literature (2019 and 2020). COVID-19 publications were more likely to describe prospective observational (31.6%) or case series (41.8%) studies without industry funding as compared with non-COVID articles, which were represented primarily by randomized controlled trials (32.5% and 36.6% in the non-COVID literature from 2020 and 2019, respectively). 
CONCLUSIONS: In this cross-sectional study of publications in four general medical journals, COVID-related articles were significantly different from non-COVID articles based on article characteristics and impact metrics. COVID-related studies were generally shorter articles reporting observational studies with less literature cited and fewer study sites, suggestive of more limited scientific support. They nevertheless had much higher dissemination.


The coronavirus disease 2019 (COVID-19) pandemic has given rise to an unprecedented quantity of publications in a short period of time as researchers worldwide attempt to report their experiences to better understand this new disease and identify promising treatments [1]. This has contributed to a COVID-19 "infodemic": an overwhelming quantity of information, leading to the rapid dissemination of less stringently validated information [2].

Given the devastating severity of COVID-19, there is an understandable urgency to disseminate new findings. However, the rush to publish may have compromised scientific integrity [3]. This has led to advocacy for quality over quantity, with cautions that a crisis is no excuse for lowering scientific standards [3-5]. Yet, the COVID-19 pandemic has magnified traditional problems of "uninformative" clinical trials, those whose results are not useful to patients, clinicians, researchers, or policy makers [6, 7].

While specific concerns about COVID-19-related publications have been expressed [8], a formal analysis of the extent to which the medical literature has shifted during the pandemic is lacking. In this analysis, we aimed to quantify how scientific publications changed at the outset of the COVID-19 pandemic by performing a cross-sectional bibliometric study of published studies in four high-impact medical journals to identify differences in the characteristics of COVID-19-related publications compared to non-pandemic-related studies.

This is a cross-sectional bibliometric study of original COVID-19-related research publications in the four general medical journals with the highest impact factors [9]: The Journal of the American Medical Association (JAMA), New England Journal of Medicine (NEJM), The Lancet, and Nature Medicine. This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guidelines [10].

We searched for original investigations related to SARS-CoV-2 and COVID-19 published in March and April 2020 through MEDLINE. MEDLINE alone was used because it contained entries for all publications within our four journals of interest. Accordingly, other databases were not consulted. As comparison groups, we retrieved all non-COVID-19 research publications over the same two-month period in 2019 and 2020. We included original scientific research, and excluded opinion, news, and educational pieces. Two reviewers verified studies for inclusion and two reviewers audited extracted data. Any discrepancies in eligibility assessment and data collection were resolved by consensus. Extracted data included publication characteristics, study characteristics, author characteristics, and impact metrics. Impact metrics (numbers of reads, citations, and tweets) were not normalized to the time since publication.

Categorical data are presented as counts and percentages and continuous data as medians and interquartile ranges (IQRs). Our primary measure was principal component analysis (PCA) of publication characteristics and impact metrics across groups. In our study, we sought to discover any differences in multiple article metrics between the 2020 COVID period and historical controls. Principal component analysis allows for the determination of the largest contributors to the variance in the data across all article metrics, in an unsupervised fashion without biasing data segregation [11] . Using PCA allows us to identify the most important features that capture the maximum information about the dataset, reducing dimensionality without any significant loss of information. Comparisons between groups were conducted using Chi-square or Fisher's exact tests for proportions and non-parametric Kruskal-Wallis tests with Dunn's multiple comparison for continuous data. Data for each journal were aggregated for analysis. P values less than 0.05 were considered statistically significant. Analyses were performed using GraphPad PRISM software version 7.0 and RStudio version 1.3.1056.
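As an illustration of the non-parametric group comparison described above, the following is a minimal sketch of the Kruskal-Wallis H statistic in plain Python. It is not the authors' analysis code (the study used GraphPad PRISM and RStudio); the function names and the toy word counts are assumptions made for this example, and Dunn's post hoc comparisons and the p value calculation are omitted.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    weighted_rank_sums = 0.0
    offset = 0
    for group in groups:
        rank_sum = sum(ranks[offset:offset + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        offset += len(group)
    h = 12.0 / (n * (n + 1)) * weighted_rank_sums - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    tie_counts = {}
    for value in pooled:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical word counts for three groups (illustrative values, not study data).
covid_2020 = [1500, 1800, 2100, 1700]
non_covid_2020 = [3200, 3500, 2900, 3800]
non_covid_2019 = [3100, 3600, 3000, 3900]
print(kruskal_wallis_h([covid_2020, non_covid_2020, non_covid_2019]))
```

Under the null hypothesis, H is compared against a chi-square distribution with (number of groups - 1) degrees of freedom; a statistics package would normally handle that step and the subsequent pairwise comparisons.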

The initial MEDLINE literature search identified 1,119 total articles for consideration (262 COVID-related). We identified 402 publications that met inclusion criteria: 76 were related to COVID-19; 154 and 172 were non-COVID publications over the same period in 2020 and 2019, respectively (data available in S1 Dataset). Principal component analysis utilizing the collected bibliometric data revealed segregation of the COVID-19 literature subset from both groups of non-COVID literature (2019 and 2020), verifying that the bibliometric characteristics capture a change in publication metrics (Fig 1). The most significant contributions to the PCA came from metrics representing article dissemination: reads, tweets, and citations contributed 57%, 54%, and 43%, respectively, towards the first principal component (PC1). The two non-COVID subsets nearly overlap in the PCA, indicating strong consistency between the two years analyzed and emphasizing the uniqueness of the COVID-related literature.
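To make concrete how per-metric contributions to PC1 can be read off a PCA, the sketch below standardizes a small table of article metrics, builds the correlation matrix, and extracts the first principal component by power iteration. This is an illustrative implementation in plain Python, not the procedure used in the study; the toy rows, column order, and function names are assumptions, and how the reported percentages were derived is not specified in the text.

```python
import math


def standardize(rows):
    """Z-score each column (population SD) so no metric dominates by scale."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized rows."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]


def first_component(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit eigenvector (loadings)."""
    p = len(matrix)
    # Asymmetric start vector, to reduce the chance of starting orthogonal to PC1.
    v = [float(i + 1) for i in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    return eigenvalue, v


# Hypothetical articles: reads, tweets, citations, word count (not study data).
articles = [
    [1200, 300, 15, 3200],
    [900, 150, 10, 3500],
    [15000, 4000, 120, 1800],
    [22000, 6500, 200, 1500],
    [700, 90, 8, 3900],
    [18000, 5200, 150, 2000],
]
eigenvalue, loadings = first_component(correlation_matrix(standardize(articles)))
print("share of variance on PC1:", eigenvalue / len(loadings))
print("PC1 loadings (sign is arbitrary):", loadings)
```

The magnitude of each loading indicates how strongly that metric drives PC1; the sign of the whole vector is arbitrary, so only relative signs between metrics are meaningful.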

To further evaluate how the published COVID-19 research literature differed from non-COVID-19 investigations, we first compared their publication characteristics (Table 1). Publication characteristics segregated by individual journal are provided in S1 Table. COVID-19 publications were more likely to describe prospective observational (31.6%) or case series (41.8%) studies without industry funding as compared with non-COVID articles, which were represented primarily by randomized controlled trials (32.5% and 36.6% in the non-COVID literature from 2020 and 2019, respectively). Moreover, COVID-related publications had lower word counts and cited less of the medical literature. While the number of authors was unchanged, the number of author affiliations was lower, suggesting fewer collaborative or multi-institutional studies. There was no observed difference in the proportion of female first or corresponding authors. For Nature Medicine, the only evaluated journal to report submission dates, COVID-related submissions were published much more quickly (35.1 days versus 288.3 and 305.3 days for 2020 and 2019 non-COVID publications, respectively).
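Comparisons of proportions such as study design by group rely on a contingency-table test. The following is a minimal sketch of the Pearson chi-square statistic in plain Python, assuming a table of observed counts with groups as columns and categories as rows. The counts shown are hypothetical and are not taken from Table 1, and the function name is an assumption for this example.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows are study designs, columns are the three groups.
toy_table = [
    [5, 40, 45],   # randomized controlled trials
    [45, 60, 65],  # all other designs
]
statistic, dof = chi_square_statistic(toy_table)
print(statistic, dof)
```

When expected counts are small, Fisher's exact test is the usual alternative, as noted in the statistical methods.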

The observed differences in publication characteristics presumably represent the initial effort to quickly provide clinicians and policymakers with information in the early phase of the pandemic, regardless of quality. To objectively evaluate the extent to which the COVID-19 literature was disseminated, we analyzed the number of accesses, tweets, and citations within our bibliometric dataset. Publications related to COVID had an order of magnitude more accesses, tweets, and citations compared with non-COVID publications from the same period (Table 1). This absolute difference does not account for the greater time since publication of articles from 2019 and therefore may conservatively underestimate the unparalleled rate at which observational data spread across the international medical community.
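The point about time since publication can be illustrated with a simple per-day rate. The sketch below is not part of the study, which reported unnormalized counts; the function name, the measurement date, and the counts are all hypothetical and serve only to show why equal absolute counts imply a higher rate for a more recent article.

```python
from datetime import date


def per_day_rate(count, published, measured):
    """Metric count divided by days elapsed between publication and measurement."""
    days = (measured - published).days
    if days <= 0:
        raise ValueError("measurement date must follow publication date")
    return count / days


# Hypothetical measurement date and counts (illustrative only).
measured_on = date(2020, 6, 1)
older = per_day_rate(100, date(2019, 3, 15), measured_on)   # 444 days elapsed
recent = per_day_rate(100, date(2020, 3, 15), measured_on)  # 78 days elapsed
print(older, recent, recent / older)
```

With identical counts, the article published in 2020 accrues them over far fewer days, so a comparison of raw counts understates its rate of dissemination relative to the 2019 article.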

Fig 1. Unbiased analysis suggests COVID-related publications differ from both concurrent and historic non-COVID publications. https://doi.org/10.1371/journal.pone.0246427.g001

Using an unbiased approach, our PCA suggests that published pandemic-related studies have different article characteristics and impact metrics compared with non-COVID studies. They generally consist of shorter articles reporting observational studies with less literature cited and fewer study sites, suggestive of more limited scientific support. Yet, pandemic-related research is associated with greater reach in terms of readership, citations, and tweets, which speaks to the strong appetite for pandemic-related findings. The publication characteristics described in our analysis reflect the urgency with which the medical, scientific, and lay communities sought information as the pandemic evolved. This ongoing need, however, should be tempered with scientific and ethical oversight that is at least as rigorous as in normal times, with a focus on well-designed trials and not rapid dissemination of low-quality data. The potential harms of producing multiple iterations of lower-quality studies have been identified, including wasting of resources, lapses in the ethical standard of scientific reporting, delaying the conduct of higher-level evidence trials, diluting the quality of available evidence, and endangering the ethical responsibility to patients who enroll in trials with the expectation of assisting in medical and scientific advancement [6, 12, 13]. Researchers should endeavour to maintain high-quality research methods by increasing collaboration across multiple centres, helping to overcome limitations that may exist from single-centre efforts [3, 14]. International teams working in concert and not in competition on well-designed studies would greatly improve the capacity to detect clinically meaningful effects to inform the international health system's efforts against COVID-19. For example, research consortia could establish research priorities and promote the implementation of master protocols with adaptive platforms [15-17].
This type of approach is designed for the perpetual investigation of multiple interventions with timely adaptation, an ideal framework for our evolving COVID-19 health crisis that would facilitate wider collaboration and mitigate against the production of low-quality evidence and poor scientific reporting.

Efforts have also focused on the expanding COVID-19 literature itself using both manual and automated methods. Content experts have been vetting the published literature to provide health care workers and policymakers with curated digital compendiums of high-quality research papers, such as the 2019 Novel Coronavirus Research Compendium [18] . Computational approaches are being used to mine the published COVID-19 literature to answer key questions related to the pandemic [19] . As these resources continue to grow, increasing effort will be required to ensure that the medical, scientific, and lay communities can engage with the resulting data and analyses in a meaningful way.

Our analysis, however, has limitations. We focus on the earliest phase of the pandemic in order to capture how the medical community first pivoted to acquire and disseminate COVID-19-related knowledge. This potentially biases our results towards observational studies, as there would have been limited time to advance and report more rigorous study designs, such as randomized controlled trials. Moreover, to efficiently disseminate medical knowledge, the included journals made pandemic-related content freely available, which may have contributed to the observed increase in impact metrics. Lastly, our bibliometric analysis does not consider the root cause of the disparity between COVID and non-COVID publications. This is likely multifactorial but could, in part, reflect the feasibility of timely study completion, variable adherence to reporting standards, and a strained peer review system. Ongoing evaluations of the publication process over the entirety of the pandemic will inform how the scientific community can most effectively, safely, and ethically disseminate valuable medical knowledge in a time of acute crisis.

COVID-19 led to a significant change in the characteristics of research studies across high-impact general medical journals. During this pandemic, the rapid and broad dissemination of research findings, regardless of underlying quality, was amplified and potentially contributed to the infodemic of misinformation at a time when best evidence needs to be emphasized. Ultimately, relaxing the rigorous standards for scientific research, although tempting for many altruistic reasons during a pandemic, may not actually achieve the objective of producing a solid evidence-based foundation upon which patients, clinicians, and policymakers can make meaningful decisions. The scientific and medical communities must strongly advocate for the thoughtful selection of high-quality research that will ensure the generation of meaningful knowledge and that participants of scientific trials who volunteer their health experience do not do so in vain.

Supporting information S1 Dataset. The dataset used for the analyses in this study. (XLSX) S1 

Flattening the curve of new publications on COVID-19

Framework for Managing the COVID-19 Infodemic: Methods and Results of an Online, Crowdsourced WHO Technical Consultation

Against pandemic research exceptionalism

Randomized Clinical Trials and COVID-19: Managing Expectations

Preserving Clinical Trial Integrity During the Coronavirus Pandemic

Harms From Uninformative Clinical Trials

Characteristics and Strength of Evidence of COVID-19 Studies Registered on ClinicalTrials.gov

Setting Expectations for Clinical Research During the COVID-19 Pandemic

The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies

Principal component analysis: a review and recent developments

Editorial Concern-Possible Reporting of the Same Patients With COVID-19 in Different Reports

Weighing the Benefits and Risks of Proliferating Observational Treatment Assessments: Observational Cacophony, Randomized Harmony

Generating randomized trial evidence to optimize treatment in the COVID-19 pandemic

Adaptive platform trials: definition, design, conduct and reporting considerations

Creating a Framework for Conducting Randomized Clinical Trials during Disease Outbreaks

Master Protocols to Study Multiple Therapies, Multiple Diseases, or Both

Novel Coronavirus Research Compendium (NCRC)

CORD-19: The COVID-19 Open Research Dataset

Conceptualization: Nan Gai, David Faraoni, Jason T. Maynes, Benjamin E. Steinberg.