key: cord-0183488-ptry4bnc authors: Stegmann, Johannes title: MeSH descriptors indicate the knowledge growth in the SARS-CoV-2/COVID-19 pandemic date: 2020-05-13 journal: nan DOI: nan sha: 0b2b9baffc3066455d3b2c2a982506647f6e26e0 doc_id: 183488 cord_uid: ptry4bnc

* Member of the Ernst-Reuter-Gesellschaft der Freunde, Förderer und Ehemaligen der Freien Universität Berlin e.V.
† Former (now retired) employee of the Medical Library of the Free University Berlin and the Charité Berlin.
‡ Radebeul, Germany, johannes.stegmann@fu-berlin.de

The scientific papers dealing with the novel betacoronavirus SARS-CoV-2 and the coronavirus disease 2019 (COVID-19) caused by this virus, published in 2020 and recorded in the database PUBMED, were retrieved on April 27, 2020. About 20% of the records contain Medical Subject Headings (MeSH), keywords assigned to records in the course of the indexing process in order to summarise the articles' contents. The temporal sequence of the first occurrences of the keywords was determined, thus giving insight into the growth of the knowledge base of the pandemic.

The rapid worldwide spread of the new epidemic COVID-19, caused by the virus SARS-CoV-2, with now more than 3.4 million confirmed cases of the disease and a confirmed death rate of almost 7% (World Health Organization, 2020), requires fast and comprehensive efforts of states and societies to combat the disease effectively by means of practical and appropriate medical, administrative and economic actions. Moreover, the scientific community has the responsibility to bundle resources and manpower to develop tests, drugs and vaccines in order to gain control over the virus and the disease as quickly as possible. An important research tool is immediate and unlimited access to the scientific literature. For the biomedical specialties, the freely available database PUBMED/MEDLINE* is indispensable for a comprehensive retrieval of the published scientific papers on biomedical research questions.
Besides the bibliographic metadata (such as author name(s), publication year, journal name and volume, etc.), PUBMED records are indexed with a controlled vocabulary of many thousands of descriptors, the Medical Subject Headings (MeSH†). In addition to the words contained in titles and abstracts of indexed papers, the MeSH descriptors assigned to PUBMED records are significant for a thorough analysis of the papers' content. In the study presented here, the publications on SARS-CoV-2 and COVID-19 were retrieved and downloaded from PUBMED. The MeSH descriptors were extracted from the records already annotated with MeSH. The keywords were ordered chronologically according to the publication date of their associated papers, and their first occurrences were determined.

* provided by the U.S. National Center for Biotechnology Information, NCBI, www.ncbi.nlm.nih.gov/pubmed/
† PUBMED's hierarchical thesaurus, the U.S. National Library of Medicine's controlled vocabulary, www.ncbi.nlm.nih.gov/mesh

Papers published in 2020 were retrieved and downloaded from PUBMED on April 27, 2020, using the following search profile:

new coronavirus* OR novel coronavirus* OR ncov OR sars-cov OR covid* OR cov-2 OR cov-19

(the truncation asterisk "*" retrieves all terms with that word stem). The Medical Subject Headings (MeSH) assigned to PUBMED records are contained in the MH fields. Controlled vocabulary is also contained in the RN fields. The contents of both fields were extracted from the records. In addition, the unique record numbers and the database indexing dates were extracted from the PMID and MHDA fields, respectively. Extraction of record field contents, clustering, data analysis, calculations and visualisation were done using custom programs and scripts written in Perl (version 5.26.1) and the software package R version 3.4.4 (R Core Team, 2018). All operations were done on a commercial PC running Ubuntu version 18.04 LTS.
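The field extraction described above can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the author's program, which is not shown in the paper. It assumes the records were saved in PubMed's MEDLINE text format (a field tag padded to four characters, then "- ", with continuation lines indented by six spaces, and a blank line between records). The function names `parse_medline` and `extract_fields` and the sample record are hypothetical and invented for illustration.

```python
import re
from collections import defaultdict

# Sample records in PubMed's MEDLINE display format. The PMIDs, the date
# and all field values are invented for illustration only.
SAMPLE = """\
PMID- 00000001
MHDA- 2020/02/15 06:00
TI  - An invented title that is long enough to wrap onto
      a continuation line.
MH  - Betacoronavirus
MH  - Coronavirus Infections/*epidemiology
RN  - 0 (Antiviral Agents)

PMID- 00000002
TI  - A second invented record without MeSH terms yet.
"""

# A tagged line: 2-4 capital letters, optional padding, "- ", then the value.
TAG_LINE = re.compile(r"^([A-Z]{2,4})\s*- (.*)$")


def parse_medline(text):
    """Split MEDLINE-format text into records (dicts of tag -> list of values)."""
    records, current, last_tag = [], None, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record.
            if current:
                records.append(current)
            current, last_tag = None, None
            continue
        match = TAG_LINE.match(line)
        if match:
            tag, value = match.group(1), match.group(2).strip()
            if current is None:
                current = defaultdict(list)
            current[tag].append(value)
            last_tag = tag
        elif line.startswith("      ") and current is not None and last_tag:
            # Continuation line: append to the most recent field value.
            current[last_tag][-1] += " " + line.strip()
    if current:
        records.append(current)
    return records


def extract_fields(record):
    """Keep only the four fields named in the text: PMID, MHDA, MH and RN."""
    mhda = record.get("MHDA")
    return {
        "pmid": record.get("PMID", [None])[0],
        # MHDA carries a date and a time; only the date part is kept here.
        "mhda": mhda[0].split()[0] if mhda else None,
        "mh": list(record.get("MH", [])),
        "rn": list(record.get("RN", [])),
    }


if __name__ == "__main__":
    for rec in parse_medline(SAMPLE):
        print(extract_fields(rec))
```

Records without MH fields, like the second sample record, simply yield an empty list, which makes it easy to separate the papers that already carry MeSH terms from those that do not.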
The search profile given in the Methods paragraph retrieved 7366 publications for the period from January to April 27, 2020 (Table 1). The daily distribution of the items is shown in Figure 1. In January and February 2020 few papers appeared, followed by a considerable publication boost in March and April. Similar data were published by Kousha and Thelwall (2020) and Torres-Salinas (2020). About 20% of the papers have already been assigned MeSH terms (Table 1). The daily distribution of papers with MeSH terms (Figure 2) is broadly similar to that of all papers (compare Figure 2 and Figure 1). Figure 3 shows that the cumulation of MeSH terms parallels the cumulation of MeSH papers. Both Figure 2 and Figure 3 indicate that the MeSH terms used to classify SARS-CoV-2/COVID-19 papers may reflect, to some extent, the accumulation and development of knowledge around the pandemic.

Tables 2 and 3 show the addition of (new) MeSH terms to indexed papers by day. Table 2 shows the numbers, Table 3 examples of the terms. Table 2 lists the dates and the numbers of publications indexed with MeSH as well as the numbers of new MeSH terms, i.e. MeSH terms which are not contained in the papers of the preceding dates. In Table 3, selected Medical Subject Headings are listed according to the sequence of their appearance from February to April 2020. The MeSH terms assigned to papers in the first half of February 2020 indicate the knowledge of a disease outbreak in China of pandemic proportions, caused by a betacoronavirus, and both disease and virus are already named (Table 3). The concomitant (possibly life-threatening) implications of the new disease, disease-spreading mechanisms, necessary diagnostic tools, assessment of especially vulnerable age groups, problems of health care systems, possible drug therapy schemes and other therapy approaches become evident from the MeSH terms assigned to papers published in the subsequent days, weeks and months.
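The counting behind Tables 2 and 3 (papers per date, terms not seen on any preceding date, and the running totals of both) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the author's Perl/R code: the function name `first_occurrences` is hypothetical, the input data are invented, dates are assumed to be strings that sort chronologically, and terms are compared exactly as given (whether subheadings were stripped before comparison is not stated in the text).

```python
def first_occurrences(papers):
    """Tabulate papers and newly appearing terms per date.

    `papers` is an iterable of (date, terms) pairs, one pair per paper.
    Dates must sort chronologically as strings (e.g. "2020-02-03").
    """
    # Group the term sets of all papers by date.
    by_date = {}
    for date, terms in papers:
        by_date.setdefault(date, []).append(set(terms))

    seen = set()        # all terms encountered on earlier dates
    cum_papers = 0
    rows = []
    for date in sorted(by_date):
        day_terms = set().union(*by_date[date])
        new_terms = day_terms - seen   # first occurrences on this date
        seen |= new_terms
        cum_papers += len(by_date[date])
        rows.append({
            "date": date,
            "papers": len(by_date[date]),
            "cum_papers": cum_papers,
            "new_terms": sorted(new_terms),
            "cum_terms": len(seen),
        })
    return rows


if __name__ == "__main__":
    # Invented example data: three papers on two dates.
    example = [
        ("2020-02-03", {"Betacoronavirus", "Disease Outbreaks"}),
        ("2020-02-03", {"Betacoronavirus", "China"}),
        ("2020-02-05", {"China", "Pneumonia, Viral"}),
    ]
    for row in first_occurrences(example):
        print(row["date"], row["papers"], row["cum_papers"],
              len(row["new_terms"]), row["cum_terms"], row["new_terms"])
```

In this layout the counts per row correspond to the kind of numbers reported in Table 2, while the `new_terms` lists correspond to the kind of examples shown in Table 3.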
Although the fraction of papers with assigned MeSH terms is relatively low (see Table 1), the whole set of already more than 1700 MeSH terms (at the download date, see e.g. Table 2) may greatly benefit medical experts and others. The short analysis of SARS-CoV-2/COVID-19 publications presented here shows that careful inspection of the assigned Medical Subject Headings is worthwhile and mirrors the growth of the knowledge base of the pandemic.

Kousha and Thelwall (2020). COVID-19 publications: Database coverage, citations, readers, tweets, news, Facebook walls
R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing
Torres-Salinas (2020). Daily growth rate of scientific production on Covid-19. Analysis in databases and open access repositories. El profesional de la información
World Health Organization (2020). WHO Coronavirus Disease (COVID-19) Dashboard

Table 2 (excerpt). Papers with MeSH terms and new MeSH terms by date, with running totals.

Date        Papers with MeSH  Cumulative papers  New MeSH terms  Cumulative MeSH terms
...
2020-03-04                 1                 23               2                     84
2020-03-07                 8                 31              18                    102
2020-03-10                 3                 34               6                    108
2020-03-11                 3                 37              11                    119
2020-03-13                 4                 41               9                    128
2020-03-14                 5                 46              12                    140
2020-03-17                60                106             137                    277
2020-03-18                28                134              53                    330
2020-03-19                96                230             155                    485
2020-03-20                37                267              62                    547
...
2020-04-11                93                855              79                   1236
2020-04-14                56                911              49                   1285
2020-04-15                69                980              60                   1345
2020-04-16                58               1038              61                   1406
2020-04-17                54               1092              56                   1462
2020-04-18                53               1145              43                   1505
2020-04-21                59               1204              37                   1542
2020-04-22                66               1270              44                   1586
2020-04-23                44               1314              37                   1623
2020-04-24               108               1422              76                   1699
2020-04-25                82               1504              70                   1769
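The figures quoted in the text can be cross-checked against the numeric rows at the end of this section. The sketch below is a Python illustration only. It assumes that each row reads: date, papers with MeSH on that date, cumulative papers, new MeSH terms, cumulative MeSH terms. This column interpretation is inferred from the running sums and is not stated in the recovered text. The last four rows are used as an example, and the names `ROWS`, `TOTAL_RETRIEVED` and `running_totals_consistent` are hypothetical.

```python
# Last four numeric rows of the section, under the assumed column order:
# (date, papers, cumulative papers, new terms, cumulative terms)
ROWS = [
    ("2020-04-22", 66, 1270, 44, 1586),
    ("2020-04-23", 44, 1314, 37, 1623),
    ("2020-04-24", 108, 1422, 76, 1699),
    ("2020-04-25", 82, 1504, 70, 1769),
]

# Total number of publications retrieved, as stated in the Results text.
TOTAL_RETRIEVED = 7366


def running_totals_consistent(rows):
    """Check that each cumulative value equals the previous one plus the daily count."""
    for prev, cur in zip(rows, rows[1:]):
        if prev[2] + cur[1] != cur[2]:
            return False
        if prev[4] + cur[3] != cur[4]:
            return False
    return True


# Share of retrieved papers that carry MeSH terms at the last listed date.
share_with_mesh = ROWS[-1][2] / TOTAL_RETRIEVED

if __name__ == "__main__":
    print("running totals consistent:", running_totals_consistent(ROWS))
    print("share of papers with MeSH: {:.1%}".format(share_with_mesh))
    print("cumulative MeSH terms:", ROWS[-1][4])
```

Under these assumptions the final cumulative values agree with the statements in the text that about 20% of the papers carry MeSH terms and that more than 1700 distinct terms had been assigned by the download date.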