title: Visible Insights of the Invisible Pandemic: A Scientometric, Altmetric and Topic Trend Analysis
authors: Bhattacharya, Sujit; Singh, Shubham
date: 2020-04-22

The recent SARS-CoV-2 outbreak has created an unprecedented global health crisis. The disease is showing alarming trends: the number of people infected, the number of new cases and the death rate all highlight the need to control it at the earliest. The strategy for governments around the globe is now to limit the spread of the virus until the research community develops a treatment, drug or vaccine against it. The outbreak has unsurprisingly led to a huge volume of research on the disease within a short period of time. It has also led to intense social media activity on Twitter, Facebook, dedicated blogs, news reports and other online sites discussing various aspects of the disease. It is a useful and challenging exercise to draw from this huge volume of research the key papers that form the research front, their influence on the research community, and other important research insights. Similarly, it is important to discern the key issues concerning this disease that influence society. This paper is motivated by these needs. It attempts to identify the most influential papers, the key knowledge base and the major topics of COVID-19 research. It further attempts to capture society's perception by discerning key topics that are trending online. The study concludes by highlighting the implications of these findings.

Coronaviruses are viruses that circulate among animals and are named for the crown-like protein spikes that protrude from their surface, resembling the sun's corona.
The first transmission of this type of virus from animals to humans happened in 2002 in the Guangdong province of China and resulted in SARS (Severe Acute Respiratory Syndrome). Bats were thought to be the potential source of this virus. A novel coronavirus (nCoV) was identified in early January 2020 and traced to a severe pneumonia outbreak of unknown cause in early December 2019 in the city of Wuhan, China. As with SARS, bats are seen as the potential source from which this virus spread to humans. Because the virus spread globally within a short time, resulting in health emergencies in a number of countries, WHO declared a pandemic on 11 March 2020. The virus was initially named 2019-nCoV but was later named SARS-CoV-2 due to its "genetic relationship to the SARS-COV-1 virus" (Medscape, 2020). The entire human population is potentially at risk because, the virus being new, nobody has prior immunity to it. There is no vaccine and no specific treatment for the disease, and it is highly transmissible. The current epidemiological estimate is that, on average, one infected person will infect two to three other people. The virus spreads through respiratory droplets and contaminated surfaces. The world is facing a common challenge: how to control the spread of the virus, and which interventions can effectively reduce mortality. Early examples from China suggested that the most effective way to control the spread is through lockdowns and social distancing measures. The example of South Korea suggested testing as the major component of mitigation. Some studies pointed out the importance of hand hygiene and face masks in controlling the virus. With new hotspots emerging, the number of new cases and of patients who do not recover raises new concerns every day. The risk and uncertainty surrounding control of this disease have generated global concern for health, the economy, and people at large.
The alarming spread of the virus has shocked people across the world and pushed researchers, among others, to understand the virus (its structure, transmission, replication mechanism, latency, etc.) and the promising interventions that can effectively control it. Extensive global efforts are under way to develop vaccines and drugs. This is unsurprisingly leading to a huge volume of research activity within a short period of time, increasing at an exponential rate. As a recent editorial in The Lancet highlights, "The whole-genome sequence of SARS-CoV-2 had been obtained and shared widely by mid-January, a feat not possible at such speed in previous infectious disease outbreaks". The editorial points out the need to develop effective diagnostics, therapeutics and vaccines for the virus. A search of the Dimensions database as of 22 April found that 1633 clinical trials are being conducted on the virus and 171 policy documents have been published so far. The number of research papers and clinical trials at different phases within such a short period is unprecedented and shows the intensive efforts of the global research community to understand and address the different aspects of this disease. Seven patents have also been granted. It is important to capture insights into influential research and innovation from this ongoing activity so that policy makers and research scholars from cross-disciplinary areas can build further on this valuable repository. Societal impact, and the aspects that concern people at large, are difficult to capture. One useful approach is to examine online trends surrounding the disease, which indicate to some extent the key issues influencing society at large. The present study is motivated by this and applies the tools and techniques of scientometrics to uncover insights from research papers.
Scientometrics applies various mathematical and statistical techniques to capture insights into research activity from research papers, patents and other published sources, including online sources (altmetrics), by constructing various types of indicators. Citation-based analysis is a prominent method for capturing academically significant and theoretically relevant material (see for example Glanzel, 2003). Because research activity in this area started primarily with the outbreak of the disease, impact captured through citations would not give a correct picture, as citations take time to accrue. This is true for research paper citations as well as patent citations. Citations that influence current research activity would, however, be useful for constructing the present knowledge base. One useful method for doing so is co-citation analysis, which captures the frequency with which two documents are cited together (Small, 1973). Co-citation establishes an intellectual relationship with earlier literature in a field, subfield or area of research; the strength of the relationship is based on the frequency of co-citation pairs. The rise of social networking websites such as Twitter and Facebook gives researchers wider scope to share their scholarly publications. Altmetrics makes it possible to track and capture the online impact of scholarly research and thus broadly indicates the papers that are influencing the research community. To put it in perspective, one can borrow from William (2017): "Altmetrics are measurements of how people interact with a given scholarly work". Altmetrics, or article-level metrics, according to Das and Mishra (2014), is a "new trendsetter" to measure "impact of scientific publication and their social outreach to intended audience". It reflects "a scholarly article's popularity, usage, acceptance and availability" by using an altmetric score.
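An attention score of this kind can be read as a weighted count of online mentions, with each user counted once per source. The following is a minimal sketch under stated assumptions: the weights in `SOURCE_WEIGHTS`, the function name `attention_score` and the sample mentions are all hypothetical and chosen only for illustration. It is not the scoring procedure of any particular provider.

```python
# Illustrative weights per source type. These values are assumptions made
# for this sketch; only their ordering (news > blog > tweet) matters here.
SOURCE_WEIGHTS = {"news": 8.0, "blog": 5.0, "tweet": 1.0, "facebook": 0.25}


def attention_score(mentions):
    """Return a weighted count of mentions for one paper.

    `mentions` is an iterable of (user_id, source_type) tuples.
    Repeated mentions by the same user on the same source count once.
    Unknown source types contribute nothing.
    """
    unique_mentions = set(mentions)
    return sum(SOURCE_WEIGHTS.get(source, 0.0) for _, source in unique_mentions)


# Hypothetical mentions of a single paper.
mentions = [
    ("u1", "tweet"),
    ("u1", "tweet"),  # repeat by the same user: ignored
    ("u2", "tweet"),
    ("n1", "news"),
    ("b1", "blog"),
]
score = attention_score(mentions)
print(score)
```

With these assumed weights, a single news story outweighs several tweets, which is why a paper with modest Twitter activity but wide news coverage can still rank highly.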
Google Trends, which was launched in 2006, primarily shows how frequently a particular search term is entered in comparison with all other search terms in different regions and languages (Google, 2017). In Google Trends, the level of interest in a topic is approximated using Google search volume. Sullivan (2016) estimated that Google handled at least 2 trillion searches per year as of 2016. This makes it one of the most significant sources of data, provided it is properly analysed. One of the most influential studies was by Ginsberg et al. (2006), which showed that Google search query data traced and predicted the spread of influenza earlier than the Centers for Disease Control and Prevention. Jun et al. (2018) provide a good assessment of research studies from the past decade that have used Google Trends. They highlight the diverse fields in which it has been used, from merely describing and diagnosing research trends to forecasting changes. According to Mavragani et al. (2018), "Google Trends shows the changes in online interest for time series in any selected term in any country or region over a selected time period, for example, a specific year, several years, 3 weeks, 4 months, 30 days, 7 days, 4 hours, 1 hour, or a specified time-frame." They argue that as internet penetration increases, web-based search activity has become a valid indicator of public behaviour. This paper positions itself in this direction, applying various tools and techniques of scientometrics, altmetrics and Google Trends to draw meaning from the huge volume of research papers and online activity surrounding this pandemic. The study attempts to answer the following research questions:
• What are the key papers that capture the most relevant research, areas and topics on COVID-19?
• What is the knowledge base that influences current research on this pandemic?
• What are the key aspects of this pandemic that are influencing society at large?
The study used various types of data sets and analytical techniques, as described below, to capture research trends and to assess the influence of this disease on society. The Dimensions database (www.dimensions.ai) was used for this study. This database has several unique features that make it very useful for capturing various aspects of research activity. It provides a dynamic Altmetrics score for each article. Unlike the source-based (journal-level) classification used to index articles in the SCI and Scopus databases, Dimensions uses article-level classification. Only when an article cannot be classified individually due to lack of information does it use the Fields of Research (FOR) classification system. FOR has three hierarchical levels: Divisions (each representing a broad subject area or research discipline), followed by Groups and Fields, which represent increasingly detailed subsets of these categories. FOR has 22 Divisions, 157 Groups and 1238 Fields. Dimensions has incorporated only the Groups in its classification system. Article-level classification thus provides a more informed assessment of the topic covered by an article than journal-level classification, which is a macro-level classification. These features motivated us to use this database for this study. The articles on this virus were extracted from this database on April 12, 2020 using the search string "Covid-19" OR "SARS-CoV-2" OR "SARS-CoV2" OR "2019-nCoV". The final search string was developed by reviewing contemporary studies and deleting search keywords that introduced noise. For example, it was found that nCoV, which some studies have used, also retrieves papers that cover MERS (Middle East respiratory syndrome). MERS was first reported in 2012 and was initially called novel coronavirus, or nCoV, as it was a new species of coronavirus. Many studies applied the search string without a hyphen, which also retrieves papers that do not cover this disease.
The search string applied to the publications database of Dimensions returned 9146 papers, comprising 7332 articles and 1814 pre-prints. This data set of 9146 papers was used for further analysis. Influential papers were distinguished using the Altmetric score, which is a weighted count of all the online attention a research paper receives. The altmetrics data was captured from the Dimensions database, which draws data from Altmetric (altmetric.com), a service that captures online activity around research papers on Facebook, Twitter, blog posts, news reports, etc. The score changes as the number of people mentioning the paper increases (only one mention per user is considered). Each category of mention carries a different base amount, so a news article contributes more to the final score than a blog post, which in turn contributes more than a tweet. Country-wise analysis showed that around 78 percent of the total papers were contributed by ten countries. The research activity of these ten countries was analysed further using altmetric and citation analysis. A word cloud provides a highly visual representation of the concepts that papers have frequently applied. It is based on the Burst algorithm, which captures a sudden rise in the usage of a word. Mane and Borner (2004) highlighted the usefulness of burst words as, according to them, "it helps humans mentally organize and electronically access and manage large complex information spaces". The word cloud was constructed from the keywords of the data set using R programming tools. Words with a higher frequency in the overall corpus of papers (here 9146) have a larger font size and occupy more space in the visualisation. The word cloud was used to visualise the 70 most frequent words; the number of words was limited by the clarity of the visualisation. Co-citation analysis helps to capture papers that are cited together in a large number of papers. The highly co-cited papers are seen as the core knowledge base of a research area at a particular period.
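The co-citation count described above can be illustrated with a small sketch. It assumes each citing paper is available as a list of the document identifiers it references; the function names `cocitation_counts` and `trim`, the identifiers and the threshold are hypothetical. It shows only the counting idea and is not the procedure implemented in VOSviewer or Pajek.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of documents appears in the same reference list."""
    counts = Counter()
    for refs in reference_lists:
        # sorted(set(...)) drops duplicate references and gives every
        # pair a canonical order, so (A, B) and (B, A) are the same key.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


def trim(counts, min_cocitations):
    """Keep only the pairs co-cited at least `min_cocitations` times."""
    return {pair: n for pair, n in counts.items() if n >= min_cocitations}


# Hypothetical reference lists of four citing papers.
citing_papers = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C"],
    ["A", "B", "D"],
]
counts = cocitation_counts(citing_papers)
strong_pairs = trim(counts, min_cocitations=2)
print(strong_pairs)
```

Raising the threshold passed to `trim` leaves progressively fewer, more strongly linked pairs, which is the intuition behind examining a co-citation network at several cut-off levels.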
This analysis was undertaken to identify the key knowledge base behind the identified papers. The Dimensions database was used to extract a bibliographic mapping file for the papers. Two software packages, Pajek and VOSviewer, were used for co-citation analysis. The bibliographic mapping file was first run in VOSviewer to identify the most co-cited papers. The co-cited papers were identified at four levels (trim levels) to gain deeper insight into the core knowledge base: Level 1 identified 51 papers co-cited 77 or more times; Level 2 identified 26 papers co-cited 126 or more times; Level 3 identified 10 papers co-cited 277 or more times; and Level 4 identified 5 papers co-cited 463 or more times. For each trim level a network file was obtained from VOSviewer. The network file was then run in Pajek to create a refined co-citation network map that avoids overlapping nodes. The final visualisation of the refined map was done in VOSviewer. Data on the policy documents referencing the top ten co-cited papers was obtained by accessing Altmetric (altmetric.com) directly from the Dimensions database. Another question the study explored is the impact of this virus on society, primarily which aspects of and related to this disease have influenced society at large. Google Trends analysis of key topics was undertaken to capture this aspect. The Google Trends website (https://trends.google.com/trends/?geo=US) was first accessed on 10 April 2020. The topics were chosen by closely monitoring news items and then selecting from a large set of candidate topics. For example, 'pandemic' showed an initial burst but declined quickly. 'Vaccine' was trending highly, but we found a lot of noise in this term. The final six topics chosen were "Social distancing", "Quarantine", "COVID-19", "Coronavirus", "Face Mask" and "Hydroxychloroquine".
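When reading the Google Trends data for these topics, it helps to remember that the tool reports relative interest rather than raw search counts. The sketch below assumes the commonly described normalisation, in which a term's share of all searches in each period is rescaled so that the peak period equals 100. The function name `interest_over_time` and the weekly counts are hypothetical, and Google's actual pipeline (sampling, rounding, privacy thresholds) is not reproduced here.

```python
def interest_over_time(term_counts, total_counts):
    """Rescale a term's share of all searches so the peak period equals 100.

    `term_counts[i]` is the number of searches for the term in period i and
    `total_counts[i]` is the number of all searches in that period.
    """
    if len(term_counts) != len(total_counts):
        raise ValueError("term_counts and total_counts must have equal length")
    # Share of all searches per period; periods with no searches get 0.
    shares = [t / n if n else 0.0 for t, n in zip(term_counts, total_counts)]
    peak = max(shares, default=0.0)
    if peak == 0:
        return [0 for _ in shares]
    return [round(100 * share / peak) for share in shares]


# Hypothetical weekly counts for one search term and for all searches.
weekly_term = [10, 40, 200, 150]
weekly_total = [1000, 1000, 2000, 3000]
print(interest_over_time(weekly_term, weekly_total))
```

In this toy series the last week has more raw searches for the term than the second week, yet its rescaled interest is only modestly higher because total search volume also grew. The same caution applies when comparing countries with very different search volumes.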
Data for each of the topics was finally taken on 16 April 2020. For country-specific comparison, data was also obtained for the five countries with the most COVID-19 cases, namely USA, Italy, Spain, France and Germany, and for two emerging economies, India and Brazil. Hydroxychloroquine was not used in the country-specific search as it was visibly trending in only three of the seven chosen countries. Google Trends analysis was not done for China as access to Google is heavily restricted in that country.

One of the first important observations is the intensity with which research on COVID-19 and related aspects is proceeding globally. Searches conducted on two dates, 28 March and 12 April, returned 2172 and 9146 papers respectively, a growth of about 320 percent in such a short time. The insights that we draw from our analysis of the 9146 papers are presented in the sections below. The ten most popular research papers among the 9146 COVID-19 papers were extracted on the basis of their Altmetrics score on April 12, 2020. Table 1 highlights these influential papers.

2020), the study most popular on social media platforms (with more than three times as many tweets as the next most popular paper), commented that "SARS-COV-2 is the seventh coronavirus to infect humans". The study found that SARS-CoV-2 is not a product of purposeful manipulation and is most likely the result of natural selection on a human or human-like ACE2 receptor. The study also found that the SARS-CoV-2 spike protein has a high affinity for binding to the human ACE2 receptor. The study also estimated that the transmission rate of undocumented infections was 55% of that of documented infections, yet undocumented infections were the source of 79% of documented cases.
The suggestion of this study that, for undocumented infections, "isolation and identification is necessary to fully control the virus" is very important, and the spread of this virus may be seen as a consequence of this. This study was also cited in policy documents. Leung et al. (2020) explored "the importance of respiratory droplet and aerosol route of transmission" by quantifying the "amount of respiratory virus in exhaled breath of participants" who had acute respiratory virus illness (ARI). The 246 participants were divided into two groups, one wearing surgical face masks and the other not wearing face masks. The study found that surgical face masks can efficaciously reduce the emission of influenza virus particles in respiratory droplets but not in aerosols. They also found that surgical face masks can be used by ill COVID-19 patients to reduce "onward transmission". Face masks are receiving increasing attention and are now being incorporated as an essential guideline in the health policies of different countries. Table 2 points to some interesting aspects of research activity in this area. These ten countries account for almost 78 percent of total papers, with China and USA accounting for 45 percent of the total. China, USA and UK are actively collaborating among themselves and also with other countries. This is a good indication, as global collaborative efforts that pool each other's resources are required to meet the challenges posed by this disease. A few leading universities that are actively involved in this research can be discerned. The popularity of a paper also appears to be influenced by the journal: papers with high altmetrics scores strongly correlate with journals that have a high reputation in the field (high impact factor, leading journal of the community). Table 3 highlights the areas covered in the COVID-19 papers. The table provides a broad indication of the intensity of research in different fields.
Figure 1 presents a word cloud of the most frequently used terms in COVID-19 papers. The word cloud shows key aspects that have been part of many studies and maps the topics of research surrounding this disease. For example, the two keywords "Pandemics" and "China" occur most often in the papers, as indicated by their large font size, which shows that these two aspects were discussed in many papers. China was where the infection originated, and WHO declared the disease a pandemic. The frequent mention of these two keywords in research is thus not surprising. Other themes are also prominently visible in the word cloud: that coronaviruses primarily affect animals, that SARS resulted from transmission of a coronavirus from animals to humans, and that travel has contributed most to the spread of this disease. Examination of the word cloud is thus useful for obtaining a broad view of the key areas of research in the 9146 papers. Figure 2 shows co-citation networks at four trim levels. It can be observed from Figure 2 that trim level 4, which contains the top 5 co-cited papers, is a complete cluster. A complete cluster, according to Gmür (2003), is when "each reference is connected to other references and there is no dominant document within the cluster". Table 4 gives the details of the co-cited papers at trim level 3, which identifies the top 10 co-cited papers. It also includes the top 5 co-cited papers at trim level 4 (refer to the methodology for details).

Pandemic's Influence on the Society

Figure 3 provides global Google Trends data for six topics (refer to the methodology for details) currently discussed extensively on social media, in news reports and in general public discussion. It can be observed that the otherwise flat line of COVID-19 started seeing spikes in late February 2020. This is because the term came into existence when WHO, on Feb 11, named the disease caused by the virus COVID-19 (Coronavirus Disease 2019).
It can also be observed that the term reached its maximum level of interest at the end of March, as cases started to increase significantly in countries such as USA and India. Measures like quarantine have a strong societal influence, and it is thus useful to look at the trend in this topic. Another topic, "Pandemic", saw maximum interest around March 11, as WHO announced the COVID-19 pandemic on that day. It can also be seen that interest in this topic fell shortly thereafter. As discussed in the methodology, this topic was therefore not examined further in the Google Trends analysis. The term Hydroxychloroquine has seen a great amount of interest from the middle of March 2020. This can be traced to the study by the reputed French physician and microbiologist Didier Raoult, who highlighted the use of this antimalarial drug in the treatment of this disease. It led to the French President and the US President endorsing this line of treatment, which created favourable public opinion towards this drug in many countries. Liu et al. (2020), whose work has also attracted high altmetrics attention, found the drug to be effective in "inhibiting SARS-COV-2 in vitro". This line of treatment, and the robustness of the methodology and findings, have also generated critical comments; see for example Grens (2020). India has become a key source of this medicine and has already exported it to a number of countries. A high degree of activity on Google, as seen through Google Trends, can thus be due to various factors, positive as well as adverse reactions. Similar public opinion generated by studies and endorsements can be seen in YouTube searches for Face Mask.

Source: Google Trends

Figure 4 provides a comparison of Google Trends data for "Social Distancing", "COVID-19", "Quarantine", "Lockdown" and "Face Mask" from 16 Feb to 14 April 2020.
Figure 4 (a) presents a global picture of these topics and Figure 4 (b) shows data for the five countries with the most COVID-19 cases, namely USA, Italy, Spain, France and Germany, and for two emerging economies, India and Brazil. Three of the five topics chosen restrict people's movement. The first is social distancing, which basically means keeping a safe distance of around 6 feet from others and avoiding places where this kind of distance cannot be maintained, such as schools, workplaces, a sports game or a temple. The second is quarantine, which applies to a person who has been exposed to the coronavirus or to patients who have it. The person has to avoid contact with other people for the specified incubation period of the virus to see whether symptoms develop. The third is lockdown, a term mainly used to describe the confinement of prisoners to their cells, which has taken on a changed definition during this outbreak. Under a lockdown, people are not allowed to leave their local area or building, and it is used as a control measure to prevent transmission of COVID-19. It can be seen from Figure 4 (b) that "Lockdown" is the most popular topic worldwide as well as at the country level. According to the World Economic Forum, 2.6 billion people, i.e. one third of the world population, are under some kind of lockdown. This figure includes India's 1.3 billion people, the largest lockdown in the world. It is thus unsurprising to see the popularity of this topic over others in India. These control measures have major economic, psychological and social impacts and in turn affect the lives of all the people involved. A worrying trend is the low comparative interest in Social Distancing in most of the countries and the almost negligible comparative interest in countries like India and Brazil. The stand taken by Brazil through its President, who favoured opening up and has been critical of measures like social distancing and lockdown, may have contributed to this trend.
Proximal Origins of SARS-COV-2
A Trial of Lopinavir-Ritonavir in Adults Hospitalized with Severe Covid-19
Challenges of coronavirus disease 2019
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
Genesis of altmetrics or article-level metrics for measuring efficacy of scholarly communications: Current perspectives
The cognitive paradigm: cognitive science, a newly explored approach to the study of cognition applied in the psychology of scientific knowledge and of education in science. RUG. Faculteit Psychologische en Pedagogische Wetenschappen
Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial
Detecting influenza epidemics using search engine query data
Co-citation analysis and the search for invisible colleges: A methodological evaluation
Bibliometrics as a research field: A course on theory and application of bibliometric indicators
Journal Publisher Concerned over Hydroxychloroquine Study
Clinical features of patients infected with 2019 novel coronavirus in Wuhan
Ten years of research change using Google Trends: From the perspective of big data utilizations and applications
Hydroxychloroquine, a less toxic derivative of chloroquine, is effective in inhibiting SARS-CoV-2 infection in vitro
Mapping topics and topic bursts in PNAS
Google Trends in Infodemiology and Infoveillance: Methodology Framework
Analysis of the Capacity of Google Trends to Measure Interest in Conservation Topics and the Role of Online News
Coronavirus Disease 2019 (COVID-19): A Global Crisis
Co-citation in the scientific literature: A new measure of the relationship between two documents
Google now handles at least 2 trillion searches per year
The donut and Altmetric Attention Score: An at-a-glance indicator of the volume and type of attention a research output has received
Document co-citation analysis to enhance transdisciplinary research
Altmetrics: an overview and evaluation
Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China

The authors thank Dr Vivek Singh, Professor, Department of Computer Science, Banaras Hindu University, for providing access to the Dimensions database, which helped in framing the pilot study for this research. We are grateful to Dimensions for providing us direct access, and are glad to see the initiatives they have taken in promoting COVID-19 research.

Note: Comparative interest over time in Social Distancing for India, and in Face Mask and Social Distancing for Brazil, was negligible, so these terms have been removed from the graphs for these countries.

The COVID-19 disease took the world by surprise. Its alarming spread, the challenge of controlling it, its grave health consequences and the lack of a vaccine or effective drug, among other factors, have prompted researchers to work actively on various aspects of this disease. This has led to a huge volume of research output within a short period of time. The various aspects surrounding this disease have also affected society's perception of it. Drawing insights from this huge volume of research output is a challenge as well as an exercise of significant importance for policy makers and the research community. Scientometrics provides various tools and techniques to uncover insights from research papers. Altmetrics and Google Trends are novel approaches to tracking online impact. These tools and techniques were used in this study to analyse 9146 papers covering COVID-19 that were retrieved from the Dimensions database for the period up to 12 April 2020. Content analysis of a set of ten key papers was also undertaken to draw qualitative insights from these influential papers. Some of the insights from this study reveal the key areas in which research has progressed. Their visible impact can be seen in their altmetrics scores, and their more direct impact in their citation in key policy documents.
Key policy decisions, such as the period of quarantine, the type of treatment, the use of face masks and the identification of populations more vulnerable to the disease, can be traced to the influential papers discerned in this study. It would, however, be a fallacy of generalisation to say that these papers were the only influential factors behind the policy decisions. It is also interesting to see how papers attracted attention from different online sources such as Twitter, news, blogs and Facebook. The importance of these sources in shaping research impact calls for researchers to use these channels actively to disseminate their findings. Another important insight comes from the active collaboration seen in research papers. China and USA drive this research globally and are also actively engaging with other countries in research. This finding is drawn from the top ten active countries, which account for almost 78% of the total research output as visible from research papers. The word cloud showed the influential topics of research surrounding COVID-19. This type of visual map provides a good indication of the key topics that constitute research activity in an area at a given time. Co-citation analysis identified the knowledge base that influences current research on this pandemic. It was observed that all of the top co-cited papers were published in high impact factor journals. It was also observed that most of the studies were driven by the epidemiology and clinical characteristics of the disease. Google Trends analysis showed how the disease shapes public opinion on certain topics. Global trending topics incorporated interest over time in web, news, Google Shopping and YouTube searches. A sudden rise in interest in a topic could be traced to various exogenous factors.
The trend in Hydroxychloroquine, for example, reflects the interplay of various forces surrounding the use of this antimalarial drug in the treatment of this disease: the impact of a research paper, political endorsement, and critical questioning by the research community, doctors and others of the effectiveness of this line of treatment. The trends observed in measures like lockdown, social distancing and quarantine at the global and country level showed society's increasing concern with these aspects. The findings of this study suggest how research and public interest have been shaped around this disease. With so much information surrounding this disease, the study provides a space for understanding its various aspects. The study is, however, limited as it has not examined patents, clinical studies and policy documents. These may provide indications of implementable aspects that draw from research. Future research intends to examine this.