key: cord-0961490-2lr4xara
authors: Resende, Paola Cristina; Delatorre, Edson; Gräf, Tiago; Mir, Daiana; Motta, Fernando do Couto; Appolinario, Luciana Reis; da Paixão, Anna Carolina Dias; Ogrzewalska, Maria; Caetano, Braulia; dos Santos, Mirleide Cordeiro; de Almeida Ferreira, Jessylene; Santos Junior, Edivaldo Costa; da Silva, Sandro Patroca; Fernandes, Sandra Bianchini; Vianna, Lucas A; da Costa Souza, Larissa; Ferro, Jean F G; Nardy, Vanessa B; Croda, Júlio; Oliveira, Wanderson K; Abreu, André; Bello, Gonzalo; Siqueira, Marilda M
title: Genomic surveillance of SARS-CoV-2 reveals community transmission of a major lineage during the early pandemic phase in Brazil
date: 2020-06-18
journal: bioRxiv
DOI: 10.1101/2020.06.17.158006
sha: b3266275a9974ba471b69fa017a2120ccdd0816f
doc_id: 961490
cord_uid: 2lr4xara

Despite all efforts to control the spread of COVID-19, SARS-CoV-2 reached South America within three months of its first detection in China, and Brazil became one of the world's COVID-19 hotspots. Several SARS-CoV-2 lineages have been identified, and some local clusters have been described during this early pandemic phase in Western countries. Here we investigated the genetic diversity of SARS-CoV-2 during the early phase (late February to late April) of the epidemic in Brazil. Phylogenetic analyses revealed multiple introductions of SARS-CoV-2 into Brazil and the community transmission of a major B.1.1 lineage defined by two amino acid substitutions, in the Nucleocapsid and ORF6. This Brazilian SARS-CoV-2 lineage was probably established during February 2020 and spread rapidly through the country, reaching different Brazilian regions by mid-March 2020. Our study also supports occasional exportations of this Brazilian B.1.1 lineage to neighboring South American countries and to more distant countries before international air travel restrictions were implemented in Brazil.
Introduction

COVID-19, the disease caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), is leading to high rates of acute respiratory syndrome, hospitalization, and death [...]

[...] genomes (> 10% ambiguous positions), we obtained a final dataset of 7,674 sequences. Because most of the recovered sequences (75%) were from the United Kingdom (UK), we generated a "non-redundant", globally balanced dataset by removing very closely related UK sequences (genetic similarity > 99.99%). To this end, UK sequences were grouped by similarity with the CD-HIT program [27], and one sequence per cluster was selected. This sampling procedure yielded a balanced global reference B.1.1 dataset of 3,764 sequences, which was aligned with the new B.1.1 Brazilian sequences generated in this study using MAFFT v7.467 [28] and then subjected to maximum-likelihood (ML) phylogenetic analyses. The ML phylogenetic tree was inferred using IQ-TREE v1.6.12 [29] [...]

[...] (MC) and compared to a null hypothesis generated by tip randomization. Results were considered significant at P < 0.01.

The time to the most recent common ancestor (TMRCA) and the spatial diffusion pattern of the [...]

[...] Brazilian SARS-CoV-2 sequences obtained here were classified as clade B.1 (95%, n = 90), most of them within sub-clade B.1.1 (92%, n = 87) (Fig. 1B). The prevalence of sub-clade B.1.1 in our sample (92%) was much higher than that observed in other Brazilian sequences available in GISAID (36%) (Fig. 1C). [...] phylogenetic tree, consistent with the hypothesis of multiple independent introductions (Fig. 2). [...] (Fig. 2). We also detected two other well-supported (SH-aLRT > 80%) monophyletic clades of small size (n = 2-11) composed mostly of Brazilian sequences (Supplementary Fig. 1).

In addition to sharing the three nucleotide mutations (G28881A, G28882A, G28883C) characteristic of clade B.[...] Fig. 2).
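The down-sampling step above clusters UK sequences at > 99.99% similarity with CD-HIT and keeps one sequence per cluster. As a rough illustration of the greedy incremental idea behind CD-HIT (visit sequences in order, assign each to the first representative it matches above the threshold, otherwise start a new cluster), here is a minimal Python sketch. It is not CD-HIT and not the authors' pipeline. It assumes pre-aligned sequences of equal length, scores identity only over unambiguous A/C/G/T sites, and orders sequences by their number of unambiguous bases as a stand-in for CD-HIT's longest-first order. The function names are hypothetical.

```python
# Simplified sketch of greedy incremental clustering, in the spirit of CD-HIT.
# Assumptions (not from the paper): sequences are already aligned to equal
# length, and identity is scored only over sites where both sequences carry
# an unambiguous base. Real CD-HIT adds short-word filtering and its own
# alignment step; nothing here reproduces its command-line interface.

UNAMBIGUOUS = set("ACGT")


def identity(a: str, b: str) -> float:
    """Fraction of identical bases over sites informative in both sequences."""
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in UNAMBIGUOUS and y in UNAMBIGUOUS:
            compared += 1
            matches += x == y
    return matches / compared if compared else 0.0


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Group sequences; return {representative_id: [member_ids]}.

    Hypothetical ordering rule: sequences with more unambiguous bases are
    visited first (standing in for CD-HIT's longest-first order). Each
    sequence joins the first representative it matches above `threshold`,
    otherwise it founds a new cluster.
    """
    def n_informative(seq_id: str) -> int:
        return sum(base in UNAMBIGUOUS for base in seqs[seq_id].upper())

    clusters: dict[str, list[str]] = {}
    for seq_id in sorted(seqs, key=n_informative, reverse=True):
        for rep_id, members in clusters.items():
            if identity(seqs[seq_id], seqs[rep_id]) > threshold:
                members.append(seq_id)
                break
        else:  # no existing representative was close enough
            clusters[seq_id] = [seq_id]
    return clusters


if __name__ == "__main__":
    # Toy 10-site "genomes": uk_2 is identical to uk_1, uk_3 differs at one site.
    uk = {"uk_1": "ACGTACGTAC", "uk_2": "ACGTACGTAC", "uk_3": "ACGTACGTTC"}
    clusters = greedy_cluster(uk, threshold=0.9999)
    print(list(clusters))  # ['uk_1', 'uk_3'], one representative per cluster
```

Under this sketch's strict `>` comparison, a 99.99% threshold over roughly 30,000 compared sites admits at most two differences between a member and its representative. Each retained UK genome therefore stands in for a set of near-identical ones.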
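The methods compare an association statistic against a null distribution generated by tip randomization, with significance at P < 0.01. The statistic is presumably the maximum monophyletic clade (MC) statistic used in the BaTS approach cited in the references ("Correlating viral phenotypes with phylogeny"). The Python sketch below illustrates that idea on a single fixed tree encoded as nested tuples. It shows the technique under these assumptions and is not the authors' analysis: BaTS integrates over a posterior sample of trees, the add-one P-value estimator is a choice made here, and every name is hypothetical.

```python
# Sketch of a tip-randomization test for phylogeny-trait association using the
# maximum monophyletic clade (MC) statistic. Assumptions: one fixed tree given
# as nested tuples of tip names; BaTS-style analyses instead average over a
# posterior sample of trees. All names here are hypothetical.
import random
from typing import Union

Tree = Union[str, tuple]  # a tip name, or a tuple of child subtrees


def leaves(tree: Tree) -> list[str]:
    """Tip names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [tip for child in tree for tip in leaves(child)]


def mc_statistic(tree: Tree, traits: dict[str, str], state: str) -> int:
    """Size of the largest clade whose tips all carry `state`."""
    best = 0

    def walk(node: Tree) -> tuple[bool, int]:
        # Returns (every tip below `node` has `state`, number of tips below).
        nonlocal best
        if isinstance(node, str):
            pure, size = traits[node] == state, 1
        else:
            results = [walk(child) for child in node]  # visit every child
            pure = all(p for p, _ in results)
            size = sum(n for _, n in results)
        if pure:
            best = max(best, size)
        return pure, size

    walk(tree)
    return best


def tip_randomization_test(tree: Tree, traits: dict[str, str], state: str,
                           n_reps: int = 1000, seed: int = 1) -> tuple[int, float]:
    """Observed MC and a one-sided permutation P value.

    The null distribution comes from shuffling trait labels across tips while
    keeping the tree fixed. P = (k + 1) / (n_reps + 1), where k counts shuffles
    with MC >= observed (an add-one estimator chosen for this sketch).
    """
    rng = random.Random(seed)
    tips = leaves(tree)
    labels = [traits[tip] for tip in tips]
    observed = mc_statistic(tree, traits, state)
    k = 0
    for _ in range(n_reps):
        rng.shuffle(labels)
        if mc_statistic(tree, dict(zip(tips, labels)), state) >= observed:
            k += 1
    return observed, (k + 1) / (n_reps + 1)


if __name__ == "__main__":
    # Toy tree: five Brazilian (BR) tips form one clade; the rest do not.
    tree = ((("BR1", "BR2"), ("BR3", ("BR4", "BR5"))),
            (("UK1", "UK2"), (("ES1", "US1"), "IT1")))
    traits = {tip: "BR" if tip.startswith("BR") else "other" for tip in leaves(tree)}
    observed, p = tip_randomization_test(tree, traits, "BR")
    print(observed, p)
```

In the toy tree, only 2 of the C(10, 5) = 252 equally likely arrangements of the five BR labels place them all in one five-tip clade. Shuffled trees therefore rarely reach the observed MC of 5, and the P value comes out small. The add-one estimator keeps P above zero even when no shuffle matches the observed value.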
Despite the low genetic diversity, analyses of geographic structure rejected the null hypothesis of a panmictic population (Supplementary [...] January to 20th February) and its dissemination to Brazil on 19th February (95% HPD: 4th February to 28th February) (Fig. 3A). [...] from Western Europe into Brazil before 2nd February, and that the synapomorphic mutations T29148C and T27299C were fixed in sequential steps during subsequent local spread of the virus (Fig. 4B).

Acknowledgments

The authors wish to thank all the health care workers and scientists who have worked hard to deal with this pandemic threat, the GISAID team, and all the submitters to the database.

References

1. A Novel Coronavirus from Patients with Pneumonia in China.
2. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2.
3. An interactive web-based dashboard to track COVID-19 in real time.
4. Brazilian Ministry of Health. Brasil confirma primeiro caso da doença - COVID-19 [Brazil confirms first case of the disease - COVID-19].
5. Brazilian Ministry of Health.
6. Nextstrain: real-time tracking of pathogen evolution.
7. A dynamic nomenclature proposal for SARS-CoV-2 to assist genomic epidemiology. bioRxiv.
8. Global initiative on sharing all influenza data - from vision to reality.
9. Revealing COVID-19 Transmission by SARS-CoV-2 Genome Sequencing and Agent Based Modelling. bioRxiv.
10. Tracking the COVID-19 pandemic in Australia using genomics.
11. A phylodynamic workflow to rapidly gain insights into the dispersal history and dynamics of SARS-CoV-2 lineages. bioRxiv.
12. SARS-CoV-2 Transmission Chains from Genetic Data: A Danish Case Study. bioRxiv.
13. Introductions and early spread of SARS-CoV-2 in France. bioRxiv.
14. Spread of SARS-CoV-2 in the Icelandic Population.
15. Full genome viral sequences inform patterns of SARS-CoV-2 spread into and within Israel. medRxiv.
16. Rapid SARS-CoV-2 whole genome sequencing for informed public health decision making in the Netherlands. bioRxiv.
17. Phylodynamics of SARS-CoV-2 transmission in Spain. bioRxiv.
18. The emergence of SARS-CoV-2 in Europe and the US. bioRxiv.
19. Introductions and early spread of SARS-CoV-2 in the New [...]
20. Genomic surveillance reveals multiple introductions of SARS-CoV-2 into Northern California.
21. The ongoing COVID-19 epidemic in Minas Gerais, Brazil: insights from epidemiological data and SARS-CoV-2 whole genome sequencing. medRxiv.
22. Importation and early local transmission of COVID-19 in Brazil.
23. Genomic and phylogenetic characterization of an imported case of SARS-CoV-2 in [...]
24. Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR.
25. SARS-CoV-2 genomes recovered by long amplicon tiling multiplex approach using nanopore sequencing and applicable to other sequencing platforms.
26. SeqinR 1.0-2: A Contributed Package to the R Project for Statistical Computing Devoted to Biological Sequences Retrieval and Analysis. Structural Approaches to Sequence Evolution.
27. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences.
28. MAFFT multiple sequence alignment software version 7: improvements in performance and usability.
29. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies.
30. ModelFinder: fast model selection for accurate phylogenetic estimates.
31. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen).
32. Correlating viral phenotypes with phylogeny: accounting for phylogenetic uncertainty.
33. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10.
34. [...] Improved Performance, Scaling, and Usability [...] Performance Computing Library for Statistical Phylogenetics.
35. Bayesian coalescent inference of past population dynamics from molecular sequences.
36. Bayesian phylogeography finds its roots.