key: cord-0987712-eav5gr3y authors: Tran, B. X.; Ha, G. H.; Nguyen, L. H.; Vu, G. T.; Phan, H. T.; Le, H. T.; Latkin, C. A.; Ho, C. S. H.; Ho, R. C. M. title: Studies of Novel Coronavirus Disease 19 (COVID-19) Pandemic: A Global Analysis of Literature date: 2020-05-08 journal: nan DOI: 10.1101/2020.05.05.20092635 sha: 9f63e1215cb5d500cc3ce241560777a142e87aeb doc_id: 987712 cord_uid: eav5gr3y An exponential growth of literature about novel coronavirus disease 19 (COVID-19) has been observed in the last few months. This textual analysis of 5,780 publications extracted from the Web of Science, Medline, and Scopus databases was performed to explore the current research focuses and propose further research agenda. The Latent Dirichlet allocation was used for topic modeling. Regression analysis was conducted to examine country variations in the research focuses. Results indicated that publications were mainly contributed by the United States, China, and European countries. Guidelines for emergency care and surgical, viral pathogenesis, and global responses in the COVID-19 pandemic were the most common topics. There was variation in the research approaches to mitigate COVID-19 problems in countries with different income and transmission levels. Findings highlighted the need for global research collaboration among high- and low/middle-income countries in the different stages of prevention and control the pandemic. Novel coronavirus disease 19 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is currently threatening millions of lives in the world. After being introduced at the end of 2019, this disease was officially declared as a global pandemic by the World Health Organization (WHO) on March 11, 2020. 1 Until April 30, 2020, 185 countries/territories reported 3·2 million confirmed cases with 227,847 total deaths, 2 and the highest-burden has been placed in European and American countries. 1 Serious health, social and economic consequences of COVID-19 have been well-recognized, [3] [4] [5] [6] [7] especially among the elderly with comorbidities, homeless individuals, and also residents who face financial, mental, and physical hardships due to social distancing policies. 8 Given that COVID-19 is a new threat without any antiviral therapies or vaccines, current measures to mitigate this crisis depend heavily on the national and regional preparedness and responses. 9 However, optimal strategies to cope with the complexity of this pandemic demands substantial scientific evidence. Recently, the WHO has issued technical guidance for countries/regions and research institutions, as well as worked closely with global researchers to update the empirical evidence. 10,11 Efforts have been made around the globe to enhancethe understanding of the dynamic transmission of COVID-19, appropriate vaccine research strategies effective treatments, as well as the impacts of current responses on population health and well-being. 12 As a result, in the last four months, the number of publications about COVID-19 has been increased dramatically, including articles, reviews, letters to editors, or preprint documents. 13 These contributions have proven the importance of scientific research in pandemic preparedness as well as helping the governments to respond rapidly and effectively to the crisis. 14 . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. . https://doi.org/10.1101/2020.05.05.20092635 doi: medRxiv preprint Cumulatively, current research evidence has partly shaped our knowledge about COVID-19, but there has been still raising more questions to address , which require sharing information and providing scientific expertise and leadership of all countries to accelerate research efforts. 15 Yet, there has been a the lack of studies attempted to identify the research focus in different countries and regions. Previous systematic reviews have been conducted to explore in detail the clinical characteristics of COVID-19, 16, 17 or the effectiveness of specific COVID-19 treatment, 18,19 and policies. 20 However, these studies were conducted on a small volume of articles. One potential approach to delineate the focus of the academic community is bibliometric analysis. This method has been previously applied to identify the rapid development of research output, the most prolific countries, journal information, and major concerns of research, but not provide a deep analysis of publications. [21] [22] [23] In this study, by using text visualization and topic modeling approaches as a part of natural language processing and machine learning, we aimed to explore the research focus in general and in countries with different levels of income and COVID-19 transmission features. Findings might provide a knowledge gap in scientific research related to COVID-19 and future direction to inform research focus and policy responses better. Information on COVID-19 and SARS-CoV-2-related documents published until 23 April 2020 were extracted from the Medline, Scopus, and Web of Science (WoS) databases. The search terms and search queries for each online database were developed according to the WHO naming process for the virus and the disease it causes, 24 and are presented in the Supplementary 1. Any Englishlanguage publications containing COVID-19 disease or SARS-CoV-2 virus published from December 2019 to 23 April 2020 were included. Document types such as corrections, data papers, . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 8, 2020. . https://doi.org/10.1101/2020.05.05.20092635 doi: medRxiv preprint reprints, or conference papers were excluded because they might be duplicated to peer-review papers. Datasets of three databases were merged, and duplications were screened independently and removed by two researchers. A final dataset of 5,780 papers was used for further analysis. The searching process was presented in Figure 1 . In this paper, we extracted the documents' title, abstract, keywords, citation, and affiliation of authors for analysis. As a document could be authored by scholars from different countries, we considered that all these countries contributed to the document preparation. Moreover, we decided to include both documents with and without abstracts for text analysis since the title of the document could partly reflect the document's topic. We first descriptively analyzed the number of publications in each country and presented these data by using Microsoft Excel's Map function. Then, we exported the top ten most cited publications for a detailed analysis of the content of the most cited papers. We used the VOSviewer textual analysis software tool (version 1·6·15, Centre for Science and Technology Studies, Leiden University, the Netherlands) to measure the co-occurrence of keywords and most frequent terms in title/abstract. 25,26 Then, we employed Latent Dirichlet allocation (LDA) to discover fifteen latent topics from the titles and abstracts of documents. This Bayesian model treats each document as a set of topics, and topics are probability distributed over a set of words and their co-occurrence. 27 Thus, the LDA technique can produce two outputs: 1) Probability distributions of different topics per document (to acknowledge how many topics are created based on the given publications), and 2) Probability distributions of unique words per topic (to define the topics). 27 Because each title/abstract may contain a mixture of topics, the LDA outputs may not reflect a specific research field or discipline. However, experience for previous . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 8, 2020. . https://doi.org/10.1101/2020.05.05.20092635 doi: medRxiv preprint work suggested that documents that focused on particular themes would be more likely to be categorized in the same group. To ensure robust results in naming the topic, we also checked at least ten documents per topic to ensure that the theme could fit the content of documents. Multivariable linear regression models were performed to examine the research focus of countries with different income classification (low, low-middle, high-middle, and high incomeaccording to World Bank classification), 28 and different transmission classification (Pending, Sporadic case, Clusters of cases, Community transmissionaccording to WHO). 29 The dependent variable was the share of publications in specific topic out of total publications in each country (%), while the independent variables were income classifications and transmission classifications. The models were adjusted to the natural logarithm of gross domestic product (GDP) per capita, the number of COVID-19 cases, and the number of COVID-19 deaths per country. The latest data on GDP per capita and income classifications were collected from the World Bank databases, while data on cases deaths were extracted from WHO reports on April 24, 2020. A p-value of less than 0.05 was used to detect statistical significance. Research is supported by Vingroup Innovation Foundation (VINIF) in project code VINIF.2020.COVID-19.DA03 Figure 2 shows the research productivity of each country. 115 countries produced 5,780 publications in the searching period. It appears that scientific publications were mainly driven by the research hubs such as China, the United States, Canada, France, Italy, the United Kingdom, and India, which were also heavily hit by COVID-19. In contrast, the majority of African countries had no more than ten studies about COVID-19. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 8, 2020. The list of ten most cited publications about SARS-CoV-2 and COVID-19 and their main findings are presented in Table 1 . Reports on the clinical and laboratory characteristics of the confirmed cases are of most interest, with six out of ten papers in the list. The most cited paper was a descriptive study about epidemiological and clinical features of 99 cases from Wuhan, China, which was believed to be the genesisof SARS-CoV-2. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 8, 2020. The research focuses on countries with different income level and epidemic characteristics are shown in Table 3 . High-income countries (HICs) were significantly more likely to report fewer studies focused on research in epidemiological characteristics and interventions of Psychological disorders in COVID-19 pandemic (Topic 6) compared with countries with other income levels. Low-middle income countries were found to publish less on diagnostic values of SARS-CoV-2 tests and improvement strategies (Topic 10). Treatment interventions for COVID-19 (Topic 15) attracted the interest of scientists among countries at all income levels, especially in HICs. Regarding transmission classification, comorbidities in patients with COVID-19 (Topic 8) were found to receive less attention among countries with sporadic cases in comparison with countries having "pending" transmission classification. Treatment interventions were significantly less likely to be the research focused in countries having sporadic cases, a cluster of cases, and community transmission compared with those with "pending" transmission classification. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. . https://doi.org/10.1101/2020.05.05.20092635 doi: medRxiv preprint By using the natural language processing approach with the Latent Dirichlet allocation, this study was able to capture the focus of COVID-19 related publications in different settings. This paper informed the rapid growth of research publications, the global variation in research productivity, and research interests. Moreover, the findings of this study suggest research gaps. In this study, we found a greater number of publications regarding COVID-19 and SARS-CoV-2 in comparison with previous bibliometric studies. [21] [22] [23] For example, Lou et al. used the Medline database and only found 183 publications through February 29, 2020. 22 This disparity could be justified that our search was far more comprehensive than these studies by usingthree major databases, including the Medline, Scopus, and Web of Science. In addition, we included other document types such as letters, commentaries, or notes rather than concentrating only on original articles. As original papers require a long period for peer-reviewed, 30 scientists tended to publish their ideas in those document types first for receiving rapid feedbacks from others, 31 Therefore, we believed that our approach was appropriate given that these documents might partly reflect the research focus in each country. The thematic maps of authors' keywords and terms reveal that major research themes included virological and molecular analysis of the virus; clinical, laboratory and radiology examinations; and global and public health responses. Our findings are in line with a previous bibliometric study, which showed that virology, clinical characteristics, and epidemiology of COVID-19 were found to be the research focus with the highest volume of papers. 22 Indeed, these research areas are essential components for controlling the pandemic. While understanding the biology of SARS-CoV-2 is critical for the development of effective and safe screening tests, drugs and vaccines.Investigations into clinical characteristics of COVID-19, along with results of laboratory and radiology, could inform a fundamental for appropriate patient management in the clinical . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. The results of topic modeling offer more penetrating insights into the emerging research themes. Of all identified topics, clinical aspects, particularly guidelines for emergency care and surgical management during the COVID-19 pandemic (Topic 11), were most frequent. Along with the rapid increase in the number of confirmed cases, the heavy demand for health facilities and health workers, as well as the lack of effective treatment regimens, place a heavy burden and prevent the healthcare systems from operating efficiently. Without guidelines for prompt responses in emergency care, the burden caused by COVID-19 would go beyond the capacity of most health systems, especially for ICU care. 35 In addition, a number of SARS-CoV-2 infections emerged from operations were reported in China, suggesting the risk of virus exposure despite strict hygienic requirements and aseptic techniques during the surgical process. 36 Research for clinical guidelines, therefore, plays a critical role in mitigating the impact of COVID-19 on the healthcare system. The origin and pathophysiology of the virus have attracted a great deal of attention since the beginning of the outbreak. 37-39 The interest on this topic has continued to rise as the virus has gone beyond China, where the first infection was reported, and positive cases have been found in most countries and territories. 40 On the other hand, the information that SARS-CoV-2 is a laboratory derived virus, albeit has been confirmed to be a false claim, gave rise to considerable controversy and also facilitated research on the nature of the virus 41 . Another topic that should be mentioned is national public health responses and actions against COVID-19, especially at the beginning of the pandemic, when there was a wide difference in policies introduced by different governments. In particular, some countries have been advocating achieving herd immunity, whereas low-and middle-income countries (LMICs) have implemented strict actions, including quarantine, . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. . https://doi.org/10.1101/2020.05.05.20092635 doi: medRxiv preprint isolation, social distancing, and community containment as soon as the outbreak occurred 42-45 . Although such measures have sdemonstrated their effectiveness, for optimal public health as well as economic outcomes, further investigation into their implementation within specific contextual factors should be prioritized. 46 is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. To our knowledge, this is the first analysis using text mining and text modeling to investigate the topics of the worldwide COVID-19 publications. However, some limitations should be notedd. The restriction of the search strategy to the English language might not reflect globalized practices and the research priority of a country, such as articles in the local language. Analyses of keywords, titles, and abstracts may not fully reflect the content of articles. However, with the combination of three large datasets and various techniques of text mining, this study is useful for an overview of the research direction. This study showed that COVID-19 related publications were primarily contributed by major research hubs such as the United States, China, and European countries. Global researchers were currently focusing on clinical management, viral pathogenesis, and public health responses in combating against COVID-19. We alsofound country variation in research priorities. Findings suggest the need for global research collaboration among high-and low/middle-income countries in the different stages of prevention and control the pandemic. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. The authors declare no conflict of interest. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. . https://doi.org/10.1101/2020.05.05.20092635 doi: medRxiv preprint 42. Pen CL. The Theory of Herd Immunity, or the Ayatollahs of Public Health. 2020. https://www.institutmontaigne.org/en/blog/theory-herd-immunity-or-ayatollahs-public-health28/4/2020). 43. Ellyatt H. Sweden resisted a lockdown, and its capital Stockholm is expected to reach 'herd immunity' in weeks. 2020. https://www.cnbc.com/2020/04/22/no-lockdown-in-sweden-butstockholm-could-see-herd-immunity-in-weeks.html (accessed 28 April 2020). . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. April, 2020. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. • Fevers, cough, myalgia, sore throat and malaise were also observed. • No neonatal asphyxia was observed in newborn babies. • All the symptomatic patients had increased Creactive protein levels and reduced lymphocyte counts. • The coronavirus may have been transmitted by the asymptomatic carrier. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 8, 2020. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 8, 2020. WHO Declares COVID-19 a Pandemic World Health Organization. WHO Director-General's opening remarks at the media briefing on COVID-19-11 The many estimates of the COVID-19 case fatality rate. The Lancet Infectious Diseases 34·5) -1·1 (-9·1; 6·9) 3 (-3·8; 9·7) 6·7 (-0·7; 14·1) Topic 2 3·4 (-12·2; 18·9) -7·1 (-29·8; 15·5) -10·4 (-44·4; 23·6) -5·3 (-18·6; 7·9) -3 (-14·2 6·4) Topic 10 -1·5 (-2·9; -0·1)* -1·9 (-3·9; 0·1) -2·7 (-5·6; 0·3) -0·3 (-1·5; 0·8) 0·4 (-0·6; 1·4) 0 (-1·1; 1·0) Topic 11 -10·2 (-20·9; 0·5) -13·7 (-29·3; 1·8) -19·6 (-42·9; 3·8) -2·2 (-11·3; 6·9) -0·2 (-7·9; 7·6) 1·7 (-6·8; 10·1) Topic 12 -1·5 (-4·3; 1·4) -3·2 (-7·3; 1·0) -3·1 (-9·2; 3·1) -0·6 (-3·0; 1·8) -1·1 (-3·1; 1·0) -0·8 (-3·0 The model was adjusted to natural logarithm of GDP per capita, number of cases, number of deaths, and WHO COVID-19 transmission classification Compared to Pending classification. The model was adjusted to natural logarithm of GDP per capita, number of cases, number of deaths, and World Bank Income Classification