key: cord-1044746-0cjissln
authors: Song, Shuhui; Ma, Lina; Zou, Dong; Tian, Dongmei; Li, Cuiping; Zhu, Junwei; Chen, Meili; Wang, Anke; Ma, Yingke; Li, Mengwei; Teng, Xufei; Cui, Ying; Duan, Guangya; Zhang, Mochen; Jin, Tong; Shi, Chengmin; Du, Zhenglin; Zhang, Yadong; Liu, Chuandong; Li, Rujiao; Zeng, Jingyao; Hao, Lili; Jiang, Shuai; Chen, Hua; Han, Dali; Xiao, Jingfa; Zhang, Zhang; Zhao, Wenming; Xue, Yongbiao; Bao, Yiming
title: The Global Landscape of SARS-CoV-2 Genomes, Variants, and Haplotypes in 2019nCoVR
date: 2020-12-28
journal: Genomics Proteomics Bioinformatics
DOI: 10.1016/j.gpb.2020.09.001
sha: 3f591183961eb0113ed578ee4151635efe399bc2
doc_id: 1044746
cord_uid: 0cjissln

On January 22, 2020, China National Center for Bioinformation (CNCB) released the 2019 Novel Coronavirus Resource (2019nCoVR), an open-access information resource for the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). 2019nCoVR features a comprehensive integration of sequence and clinical information for all publicly available SARS-CoV-2 isolates, which are manually curated with value-added annotations and quality-evaluated by an automated in-house pipeline. Of particular note, 2019nCoVR offers systematic analyses that generate a dynamic, global landscape of SARS-CoV-2 genomic variations. It provides all identified variants and their detailed statistics for each virus isolate, and aggregates the quality score, functional annotation, and population frequency for each variant. Spatiotemporal changes in each variant can be visualized, and historical viral haplotype network maps covering the course of the outbreak are generated from all available complete, high-quality genomes. Moreover, 2019nCoVR provides a full collection of literature on coronavirus disease 2019 (COVID-19), including published papers from PubMed as well as preprints from services such as bioRxiv and medRxiv, retrieved through Europe PMC.
Furthermore, by linking with relevant databases in CNCB, 2019nCoVR offers data submission services for raw sequence reads and assembled genomes, as well as data sharing with NCBI. Collectively, 2019nCoVR is updated daily to collect the latest information on genome sequences, variants, haplotypes, and literature, so that it reflects new data in a timely manner, making it a valuable resource for the global research community. 2019nCoVR is accessible at https://bigd.big.ac.cn/ncov/.

[…] Global Initiative on Sharing All Influenza Data (GISAID) [4] repository and NCBI GenBank [5]. Many sequences are available in multiple repositories, but their updates are not synchronized. This makes it extremely challenging for users worldwide to retrieve a non-redundant, up-to-date set of sequence data, and to respond collaboratively and rapidly to this global pandemic. Toward this end, we constructed the 2019 Novel Coronavirus Resource (2019nCoVR, https://bigd.big.ac.cn/ncov/) [6]. Through comprehensive integration and value-added annotation and analysis, we provide public, free, and rapid access […]

All genome sequences and their related metadata were integrated from SARS-CoV-2 resources worldwide, including GISAID [4], NCBI [5], National Genomics Data Center (NGDC) [7], National Microbiology Data Center (NMDC) [8], and China National GeneBank (CNGB) [9]. To provide a non-redundant dataset, duplicated records from different databases were identified and merged. […] and source, is provided in Table S1.

To provide high-quality genome sequences that are criti[…]

[…] [14], and also to provide more user-friendly interfaces and online tools in support of research activities worldwide.

[…] GenBank, GISAID, and NMDC resources. We acknowledge the sample providers and data submitters listed in Table S1.
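The integration step above states only that duplicated records from different databases were identified and merged into a non-redundant dataset; the matching rule is not given in this excerpt. The Python sketch below shows one plausible approach, assuming (hypothetically) that two records describe the same isolate when they share a strain name after a trivial normalization, or when one record cross-references the other's accession. A union-find structure is used so that chains of matches (A matches B, B matches C) collapse into a single entry. The `Record` fields, `normalize_strain`, and `merge_duplicates` are illustrative names, not part of 2019nCoVR.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical metadata record for one isolate in one source database."""
    source: str                                   # e.g. "GISAID", "GenBank", "NGDC"
    accession: str                                # accession within that source
    strain: str                                   # isolate name as submitted
    xrefs: set[str] = field(default_factory=set)  # accessions of the same isolate elsewhere


def normalize_strain(name: str) -> str:
    """Assumed normalization rule: trim surrounding whitespace and lowercase."""
    return name.strip().lower()


def merge_duplicates(records: list[Record]) -> list[set[str]]:
    """Group records that share a normalized strain name or an accession.

    Accessions are assumed to be unique across all sources. Returns one set of
    "source:accession" labels per merged, non-redundant entry.
    """
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    first_seen: dict[str, int] = {}  # matching key -> first record index carrying it
    for i, rec in enumerate(records):
        keys = {"strain:" + normalize_strain(rec.strain), "acc:" + rec.accession}
        keys |= {"acc:" + x for x in rec.xrefs}
        for key in keys:
            if key in first_seen:
                union(i, first_seen[key])  # shared key: treat as the same isolate
            else:
                first_seen[key] = i

    groups: dict[int, set[str]] = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), set()).add(f"{rec.source}:{rec.accession}")
    return list(groups.values())


# Usage with made-up accessions: the first two records are linked by a
# cross-reference, the last two by the same (normalized) strain name.
records = [
    Record("GISAID", "EPI_A", "Sample/001/2020", {"GB_A"}),
    Record("GenBank", "GB_A", "sample-001"),
    Record("NGDC", "NG_B", "Sample/002/2020"),
    Record("CNGB", "CN_B", " sample/002/2020"),
]
merged = merge_duplicates(records)
print(merged)
```

Real strain names differ between repositories (for example, GISAID prefixes names with `hCoV-19/`), so a production pipeline would need a more careful normalization than the lowercase-and-trim rule assumed here, and would also have to avoid merging distinct isolates that merely have identical sequences.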
Coronaviridae Study Group of the International Committee on Taxonomy of Viruses. The species severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2.
A new coronavirus associated with human respiratory disease in China.
The elements of data sharing.
GISAID: global initiative on sharing all influenza data - from vision to reality.
Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.
The 2019 novel coronavirus resource.
National Genomics Data Center Members and Partners. Database resources of the National Genomics Data […]
gcMeta: a global catalogue of metagenomics platform to support the archiving, standardization and analysis of microbiome data.
Increased interactivity and improvements to the GigaScience database, GigaDB.
MUSCLE: multiple sequence alignment with high accuracy and high throughput.
The Ensembl Variant Effect Predictor.
3Dmol.js: molecular visualization with WebGL.
An online coronavirus analysis platform from the […]
On the origin and continuing evolution of SARS-CoV-2.
Spike mutation pipeline reveals the emergence of a more transmissible form of SARS-CoV-2.
Median-joining networks for inferring intraspecific phylogenies.
The application of genomics to tracing bacterial pathogen transmission.
Bias and incorrect rooting make phylogenetic network tracing of SARS-CoV-2 infections unreliable.
Diverse sources of C. difficile infection identified on whole-genome sequencing.
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology.
The architecture of SARS-CoV-2 transcriptome.

The authors have declared no competing interests.