key: cord-1011785-tywxndui authors: Murillo, Julieth; Villegas, Lina María; Ulloa-Murillo, Leidy Marcela; Rodríguez, Alejandra Rocío title: Recent Trends on Omics and Bioinformatics Approaches to Study SARS-CoV-2: A Bibliometric Analysis and mini-review date: 2020-12-03 journal: Comput Biol Med DOI: 10.1016/j.compbiomed.2020.104162 sha: 0b5b68edb004cdeb628c7429d5d2c6a7787b07f5 doc_id: 1011785 cord_uid: tywxndui BACKGROUND: The successful sequencing of SARS-CoV-2 cleared the way for the use of omics technologies and integrative biology research for combating the COVID-19 pandemic. Currently, many research groups have slowed down their respective projects to concentrate efforts on the study of the biology of SARS-CoV-2. In this bibliometric analysis and mini-review, we aimed to describe how computational methods or omics approaches were used during the first months of the COVID-19 pandemic. METHODS: We analyzed bibliometric data from Scopus, BioRxiv, and MedRxiv (dated June 19th, 2020) using quantitative and knowledge mapping approaches. We complemented our analysis with a manual process of carefully reading the selected articles to identify the omics or bioinformatic tools used and their purpose. RESULTS: From a total of 184 articles, we found that metagenomics and transcriptomics were the main sources of data to perform phylogenetic analysis aimed at corroborating zoonotic transmission and identifying the animal origin and taxonomic allocation of SARS-CoV-2. Protein sequence analysis, immunoinformatics and molecular docking were used to give insights about SARS-CoV-2 targets for drug and vaccine development. Most of the publications were from China and the USA. However, China, Italy and India accounted for the top 10 most cited papers on this topic. CONCLUSION: We found an abundance of publications using omics and bioinformatics approaches to establish the taxonomy and animal origin of SARS-CoV-2. 
We encourage the growing community of researchers to explore other lesser-known aspects of COVID-19 such as virus-host interactions and host response.
stomach epithelial cells, and kidney proximal tubules (4-6). Accordingly, there is a broad set of symptoms that ranges from asymptomatic infection, mild respiratory symptoms, severe pneumonia, acute kidney injury (7), and digestive and circulatory system involvement, to fatality (1). It is becoming common knowledge that the surface-anchored spike protein mediates coronavirus entry by binding to ACE2; however, it is less widely appreciated that coronaviruses have also been reported to exploit many other surface molecules, depending on the cell type, to make internalization more efficient (8). The surface virus-host interaction triggers a subsequent series of biological events including the formation of vesicles, enzyme activation or repression, recruitment of host molecules, and synthesis of viral components, among others. Integrated multi-omics studies offer an unbiased approach to studying host-virus interactomics, which can ultimately lead to the detection of therapeutic targets for this novel infection; such targets are pivotal for drug repurposing and for developing new drugs and vaccines in a precise and efficient manner (9). Consequently, emerging omics-scale studies on this viral infection offer great potential for studying the pathobiology of the infection and ways forward for diagnostic and therapeutic innovation (9). Since SARS-CoV-2 is highly contagious, provisions for handling clinical samples in omics research facilities are often restricted, making it a challenge to implement systems-level molecular studies. Given this limitation, it would be useful for scientists in this field to be aware of trends in omics approaches and computational methods to address COVID-19 related issues. 
Hence, in this bibliometric analysis and mini-review, we aimed to describe how computational methods or omics approaches were used during the first months of the COVID-19 pandemic. We hope that our work will serve as a roadmap for identifying key knowledge gaps and research priorities. Furthermore, we hope that in future pandemics or infectious outbreaks, this document may help researchers quickly understand how omics data and computational methods can be used to address the first unknowns that arise in these scenarios.
Searches were conducted in Scopus and preprint servers (BioRxiv and MedRxiv) on June 19, 2020 (search formula in supplemental material 1) and were not constrained by language, publication stage, or time. Nevertheless, document type was restricted to journal articles. Ethical approval was not required for this study because no human subjects were enrolled. We analyzed the bibliographic data through a quantitative analysis approach and a knowledge mapping technique. For the quantitative analysis, the information was sourced from Scopus. Knowledge mapping was performed using VOSviewer (V1.6.14) (10), focusing on the "link strength" of networks based on author keywords and on the text corpus (title and abstract). To identify emerging terms, we manually edited the thesaurus list to exclude expected terms such as COVID-19 and SARS-CoV-2 and to avoid irrelevant terms (e.g., mean and order). We used the full
in some cases, we were required to review the methodology section in the publication; 122 fulfilled the inclusion criteria in MedRxiv and BioRxiv. 
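The keyword mapping step described in the methods can be illustrated with a minimal Python sketch. It assumes full counting, where the link strength between two keywords is the number of articles in which both appear and the total link strength of a keyword is the sum of its link strengths. The records, the exclusion set, and the function names are hypothetical and are not data or code from this study; VOSviewer itself applies further normalization, clustering, and layout steps that are not reproduced here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author-keyword lists, one per article (not data from this study).
records = [
    ["SARS-CoV-2", "phylogenetic analysis", "bat"],
    ["COVID-19", "molecular docking", "ACE2"],
    ["SARS-CoV-2", "ACE2", "phylogenetic analysis"],
    ["COVID-19", "ACE2", "molecular docking", "mean"],
]

# Assumed stand-in for the edited thesaurus: expected and irrelevant terms to drop.
EXCLUDED = {"covid-19", "sars-cov-2", "mean", "order"}


def cooccurrence(records, excluded=EXCLUDED):
    """Count keyword occurrences and pairwise co-occurrences (full counting)."""
    occurrences = Counter()
    links = Counter()
    for keywords in records:
        # Normalize case, deduplicate within an article, and apply the exclusions.
        terms = sorted({k.lower() for k in keywords} - excluded)
        occurrences.update(terms)
        # Each unordered pair of terms in the same article adds 1 to its link.
        links.update(combinations(terms, 2))
    return occurrences, links


def total_link_strength(links):
    """Sum the strength of every link attached to each keyword."""
    strength = Counter()
    for (a, b), weight in links.items():
        strength[a] += weight
        strength[b] += weight
    return strength


occurrences, links = cooccurrence(records)
strength = total_link_strength(links)

for term, value in strength.most_common():
    print(f"{term}: occurrences={occurrences[term]}, total link strength={value}")
```

Sorting the terms before pairing keeps each pair in a single canonical order, so ("ace2", "molecular docking") and its reverse are counted as the same link.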
The most active countries publishing on COVID-19 using omics or computational approaches are China, the USA, India, and the United Kingdom; Figure 1 shows the number of published articles per country, distinguishing those from Scopus from preprint articles at MedRxiv and BioRxiv. Table 1 shows bibliometric information for journals that contributed more than 2 articles within the defined research fields of omics
Docking (cluster 4). As a complement, in the corpus co-occurrence network for Scopus (Figure 3B), divided into three clusters, the most frequent terms were bat, receptor binding, and phylogenetic analysis (cluster 1); receptor, model, and prediction (cluster 2); and ACE2, interaction, and expression (cluster 3). These frequent keywords show that, in indexed publications, omics and bioinformatics tools are used to investigate urgent questions such as the animal origin of the virus, the taxonomy of the new infectious agent, and therapeutic strategies. We also constructed a keyword co-occurrence network based on corpus text using the preprint databases MedRxiv and BioRxiv (Figure 3). The network, formed by 76 nodes, is divided into 3 clusters; cluster 1 has terms with similar frequencies, such as disease, protein and cell; cluster 2 has 20 terms including immune response, detection, and
The outbreak of COVID-19 has caused more than 1.1 million deaths worldwide, primarily among older people, according to the World Health Organization as of October 25, 2020 (18), and has prompted a rapidly increasing number of publications. Since computational approaches could play a crucial role in different aspects of the pandemic, we aimed to analyze the SARS-CoV-2 literature published by authors who used bioinformatics and omics data to help make informed and urgent decisions. 
According to our search on June 19th, 2020, China, the USA, the United Kingdom, and India accounted for the highest proportion of published research. These results are consistent with previous bibliometric analyses of scientific production in bioinformatics around the world, which is led by the wealthiest countries (19). The effort by the Indian government in developing bioinformatics infrastructure and human resources is remarkable and has shown its results during this pandemic. India is not only among the most productive countries but, together with China, is also the origin of the most cited articles during this short and active period (20). In terms of international collaborations, we found that only a few of the studies involved researchers from other countries. This is consistent with the fact that most of the studies worked with stored sequences. The genome sequence of the SARS-CoV-2 is
Early in the pandemic, it was established that SARS-CoV-2 is a zoonotic virus that jumped from an animal to a human, presumably in the "wet market" in Wuhan, China.
Out of the 62 articles related to omics that we found, 13 used proteomics data in primary analysis or in secondary analysis combined with bioinformatics tools. Two of the main purposes of the authors who used proteomics were to elucidate the mechanism of host-pathogen interactions and its association with the severity of COVID-19, and to find possible therapeutic targets. Khan and Khan (26)
None declared
New understanding of the damage of SARS-CoV-2 infection outside the respiratory system
Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus
Functional assessment of cell entry and receptor usage for SARS-CoV-2 and other lineage B betacoronaviruses. Nat Microbiol
Single cell RNA sequencing of 13 human tissues identify cell types and receptors of human coronaviruses
SARS-CoV-2 entry factors are highly expressed in nasal epithelial cells together with innate immune genes
COVID-19 Pandemic: Hopes from Proteomics and Multiomics Research. Omi A
Software survey: VOSviewer, a computer program for bibliometric mapping
A new coronavirus associated with human respiratory disease in China
Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan
RNA based mNGS approach identifies a novel human coronavirus from two individual pneumonia cases in 2019 Wuhan outbreak
Preliminary identification of potential vaccine targets for the COVID-19 Coronavirus (SARS-CoV-2) Based on SARS-CoV Immunological Studies
Development of epitope-based peptide vaccine against novel coronavirus 2019 (SARS-COV-2): Immunoinformatics approach
Epidemiological, clinical and virological characteristics of 74 cases of coronavirus-infected disease 2019 (COVID-19) with gastrointestinal symptoms
Transcriptomic characteristics of bronchoalveolar lavage fluid and peripheral blood mononuclear cells in COVID-19 patients
World Health Organization. Weekly epidemiological update -27
The bioinformatics wealth of nations
Advancing India's bioinformatics education and research: an assessment and outlook
A Scientometric Analysis of Global Health Research
Abstracts from the 3rd International Genomic Medicine Conference
Protein Structure and Sequence Reanalysis of 2019-nCoV Genome Refutes Snakes as Its Intermediate Host and the Unique Similarity between Its Spike Protein Insertions and HIV-1
An Extensive Meta-Metagenomic Search Identifies SARS-CoV-2-Homologous Sequences in Pangolin Lung Viromes
coronavirus main proteases enzyme
Withanone and caffeic acid phenethyl ester are predicted to interact with main protease (Mpro) of SARS-CoV-2 and inhibit its activity
The Prediction of miRNAs in SARS-CoV-2 Genomes: hsa-miR Databases Identify 7 Key miRs Linked to Host Responses and Virus Pathogenicity-Related KEGG Pathways Significant for Comorbidities. Viruses
Exploring potential effect of Shengjiang San on SARS-CoV-2 based on network pharmacology and molecular docking
Identification of new anti-nCoV drug chemical compounds from Indian spices exploiting SARS-CoV-2 main protease as target
Evolutionary trajectory for the emergence of novel coronavirus SARS-CoV-2
Emerging novel coronavirus (2019-nCoV)-current scenario, evolutionary perspective based on genome analysis and recent developments
Structural genomics of SARS-COV-2 indicates evolutionary conserved functional regions of viral proteins