key: cord-1006297-sfww0xcc
authors: Xavier, J.; Giovanetti, M.; Adelino, T.; Fonseca, V.; Vitor Barbosa da Costa, A.; Aparecida Ribeiro, A.; Nascimento Felicio, K.; Guerra Duarte, C.; Vinicius Ferreira Silva, M.; Salgado, A.; Teixeira Lima, M.; de Jesus, R.; Fabri, A.; Franco Soares Zoboli, C.; Gutemberg Souza Santos, T.; Iani, F.; Maria Bispo de Filippis, A.; Agudo Mendonca Teixeira de Siqueira, M.; Luiz de Abreu, A.; de Azevedo, V.; Brock Ramalho, D.; F. Campelo de Albuquerque, C.; de Oliveira, T.; Holmes, E. C.; Lourenco, J.; Alcantara, L. C. J.; Aparecida Assuncao Oliveira, M.
title: The ongoing COVID-19 epidemic in Minas Gerais, Brazil: insights from epidemiological data and SARS-CoV-2 whole genome sequencing.
date: 2020-05-11
journal: nan
DOI: 10.1101/2020.05.05.20091611
sha: 67fb1261c84d51ce97baa592c10feff6855eb347
doc_id: 1006297
cord_uid: sfww0xcc

The recent emergence of a previously unknown coronavirus (SARS-CoV-2), first confirmed in the city of Wuhan in China in December 2019, has caused serious public health and economic issues due to its rapid dissemination worldwide. Although 61,888 confirmed cases had been reported in Brazil by 28 April 2020, little was known about the SARS-CoV-2 epidemic in the country. To better understand the recent epidemic in the second most populous state in southeast Brazil (Minas Gerais, MG), we looked at existing epidemiological data from 3 states and sequenced 40 complete genomes from MG cases using Nanopore. We found evidence of multiple independent introductions from outside MG, both from genome analyses and the overly dispersed distribution of reported cases and deaths. Epidemiological estimates of the reproductive number using different data sources and theoretical assumptions all suggest a reduction in transmission potential since the first reported case, but potential for sustained transmission in the near future.
The estimated date of introduction in Brazil was consistent with epidemiological data from the first case of a traveler returning from Lombardy, Italy. These findings highlight the unique reality of MG's epidemic and reinforce the need for real-time and continued genomic surveillance strategies as a way of understanding and therefore preparing against the epidemic spread of emerging viral pathogens.

medRxiv preprint, version posted May 11, 2020 (not certified by peer review). https://doi.org/10.1101/2020.05.05.20091611. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

To date, more than 3.5 million cases of the disease caused by SARS-CoV-2, termed COVID-19, have been reported around the world [8, 9]. On 11 March 2020, the WHO declared COVID-19 a pandemic, prompting a dramatic increase in international concern and response [10]. On 26 February 2020, the first confirmed case of COVID-19 was reported in São Paulo (SP) state, Brazil [11]. Two months later (28 April 2020), 61,888 cases and 4,205 deaths attributed to COVID-19 had been reported in Brazil [12]. Meanwhile, preliminary phylogenetic analysis using the first two SARS-CoV-2 complete genomes isolated in São Paulo from travelers returning from Italy revealed two independent introductions into the country, relative to the analyzed dataset available at that time [13].

The state of Minas Gerais (MG) is the second largest Brazilian state in terms of population size, estimated at approximately 21 million people, and is located near the state of São Paulo [14]. Due to its large population size and its well-connected and active neighboring states such as São Paulo and Rio de Janeiro, the state of MG is likely to be highly affected by the COVID-19 pandemic.

Genetic analyses and surveillance allow the characterization of circulating viral lineages, the inference of introduction events and the reconstruction of transmission patterns [15]. Together with
epidemiological data, they are powerful tools to assist public health initiatives and preparedness. In this study, we present a summary of epidemiological data and the generation and analysis of 40 new SARS-CoV-2 genome sequences isolated from clinical samples of confirmed cases from MG, with the aim of providing a preliminary epidemiological overview of the circulation and introduction events of the virus in that state.

After the WHO declared the outbreak of SARS-CoV-2 a Public Health Emergency of International Concern (PHEIC) on 30 January 2020, the Brazilian government declared a Public Health Emergency of National Importance on 3 February 2020, enabling the introduction of measures to prevent and control spread [16]. Twenty-three days later, the first confirmed case in Brazil was reported in the city of São Paulo, related to a traveler returning from Lombardy, Italy (Fig 1) [11]. By 28 April 2020, 61,888 COVID-19 cases had been confirmed in Brazil, 1,578 of which were from MG (Fig 2A)

We used the mortality time series (MTS) from MG, SP and RJ to project the cumulative number of infections, making two main simplifying assumptions: first, that the infection fatality ratio (IFR) of SARS-CoV-2 would be similar in the Brazilian states to that reported elsewhere; and second, that the number of cumulative deaths in each state was well reported. We considered the IFR estimated
When using geographic information from reported cases in each state (Fig 2C and S13

Typically, incidence (cases, deaths) would be normalized per 100K individuals, taking into account the total population size of each state. Because of the very different spatial dispersion of cases and deaths in MG when compared to SP and RJ, we decided to also calculate the effective population size, defined as the sum of the population sizes of all municipalities with reports. When using reported cases, we found that the effective population sizes were ~100%, ~100% and 64% of the total population sizes of RJ, SP and MG, respectively. When using reported deaths, the effective population sizes were ~95%, ~92%, and 35% of the total population sizes of RJ, SP and MG, respectively. Overall, these numbers suggest that in MG cases and deaths have been reported only in a subset of the overall population, while in the other states SARS-CoV-2 appears widely dispersed. Incidence of reported cases per 100K using the effective population size was ~60 in SP, ~51 in RJ and ~7.

coverage of 82.5% related to the reference genome NC_045512.3 (S1 Table). All sequences generated in this study have been submitted to the GISAID Initiative following the WHO guidelines on the importance of sharing genomic data during situations of public health emergency of international concern [25].

Of the 17 (42.5%, n=40) sequenced cases with available travel history information, 14 cases (82.35%, n=17) reported international travel and three reported domestic travel. Two among the
latter visited the city of São Paulo and one the city of Rio de Janeiro (Table 1). Of the international travel-related cases, seven (50%) were linked to travel to European countries (Portugal, Spain, Italy, Switzerland, Austria, England, Belgium, Germany, Czech Republic, and Hungary), while six reported travel to countries in the Americas (USA, Colombia, Jamaica, Cayman Islands, Panama, Chile, and Peru). One case reported travel to Israel.

To explore the history of the virus in MG, we performed a maximum likelihood (ML) phylogenetic analysis on the dataset containing the 40 new sequences plus 3,062 other sequences deposited in GISAID up to 15 April 2020. Our estimated ML phylogeny identified two major clades branching at the root of the tree (Fig 3). These two clades were named lineages A and B, following a recently proposed SARS-CoV-2 lineage nomenclature [26].

According to this nomenclature scheme, two main SARS-CoV-2 lineages could be identified as

other countries such as Australia, China, Canada, Malaysia, and USA [28]. Moreover, two sequences were assigned to lineage B.2 (isolates CV22 and CV36), and one sequence to lineage A (isolate CV7) (see S2 Table for full results).

Slightly different from the lineage assignment approach mentioned before, in our ML phylogeny most of MG's new sequences (n=37, 92.5%) were placed in a descendant lineage we named B.1, which also included other sequences from GISAID sampled worldwide. Of these 37 sequences from MG within lineage B.1 (Fig 3), 11 are isolates from cases that reported travel to European countries
(isolates CV2, CV3, CV11, CV13, CV17) or the Americas (isolates CV4, CV6, CV12, CV20, CV28, CV35), in addition to the isolate CV1 from a traveler who returned from Israel. Two MG sequences (CV22 and CV9) fell into lineage B, one of which (CV22) came from a case that reported travel to Germany. The only sequence from MG that fell into lineage A refers to a case (CV7) that reported travel to European countries (Fig 3 and Table 1).

To assess these lineages in more detail and in time, we performed Bayesian time-measured phylogenetic analysis using a molecular clock model. We analyzed three sub-datasets (named subset A, subset B and subset B.1) extracted from each lineage from the ML tree that included Brazilian sequences. Our maximum clade credibility (MCC) trees showed that most of MG's sequences were interspersed with other isolates sampled from other countries (Fig 4b, c, d). This pattern, similar to that observed in other countries [28-30], is also in accordance with our ML tree and with the epidemiological data, indicating that these isolates were linked to travel exposure rather than community transmission, and reinforcing the idea that multiple independent introductions from abroad have occurred in MG.

Despite the observed dispersed distribution, some sequences from MG grouped together, forming clusters that also included sequences from Brazil and other countries. The subset B.1 tree shows these clusters containing more than one sequence from MG (Fig 4d). However, these clusters have very low posterior probability support, because of the low genetic diversity of SARS-CoV-2 genomes currently available worldwide [31-33]. Nonetheless, four clusters, each consisting of only two MG sequences, showed posterior probabilities of >80%.
One of these clusters (Fig 4d), with a posterior probability of 100%, was formed by isolates CV34 and CV36, referring to cases of seemingly local transmission from contacts with a COVID-19 confirmed and suspected case, respectively.

the epidemiological data from the first case confirmed in SP, regarding a traveler returning from Lombardy, Italy, on 21 February 2020 [11, 13].

Despite the grouping of some MG sequences, we cannot infer a close relationship between these sequences with certainty at this stage, because of the small sample size, which covers only about 30 days of the epidemic in MG. That is, this dataset cannot fully represent the genetic diversity of SARS-CoV-2 strains circulating. Moreover, the low genetic diversity of sequences available so far limits conclusions about SARS-CoV-2 directionality and spread based solely on genetic data. As observed in another study [32], due to the described limitations of the available genomic data, the phylogenetic results presented should be approached with caution and considered as hypothesis-generating on the transmission events of SARS-CoV-2 in a local setting.

In conclusion, at the end of April 2020, the COVID-19 epidemic in the state of MG was still expanding (R>1) and was highly dispersed, with cases and deaths reported mostly away from the capital city and with only approximately 64% and 35% of the total population being represented in reported case and death data, respectively. Genomic data and other epidemiological information from travel-related cases allowed us to identify several introduction events that occurred independently in MG, further helping to explain the geographical patchiness of reported cases and deaths.
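To make the coverage percentages quoted in this section concrete, the sketch below computes an effective population size (the sum of the population sizes of all municipalities with reports) and the resulting incidence per 100K. It is a minimal illustration only: the municipality names, population sizes and counts are invented, and the function names `effective_population` and `incidence_per_100k` are hypothetical, not taken from the study's code.

```python
def effective_population(populations, reports):
    """Sum the populations of municipalities with at least one report."""
    return sum(
        size
        for name, size in populations.items()
        if reports.get(name, 0) > 0
    )


def incidence_per_100k(n_reports, population):
    """Number of reports per 100,000 individuals for a given denominator."""
    return 100_000 * n_reports / population


# Hypothetical municipalities and counts, for illustration only.
populations = {"A": 500_000, "B": 300_000, "C": 200_000}
cases = {"A": 40, "B": 10, "C": 0}
deaths = {"A": 2, "B": 0, "C": 0}

total = sum(populations.values())
eff_cases = effective_population(populations, cases)
eff_deaths = effective_population(populations, deaths)

# Share of the total population living in municipalities with reports.
case_coverage = eff_cases / total
death_coverage = eff_deaths / total

n_cases = sum(cases.values())
print(f"population covered by case reports:  {case_coverage:.0%}")
print(f"population covered by death reports: {death_coverage:.0%}")
print(f"cases per 100K (total population):     {incidence_per_100k(n_cases, total):.2f}")
print(f"cases per 100K (effective population): {incidence_per_100k(n_cases, eff_cases):.2f}")
```

In this toy example the incidence computed on the effective population is higher than the one computed on the total population, because municipalities without reports are excluded from the denominator.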
These initial insights, based on the restricted data that are available, show that transmission is likely to continue in the near future and suggest room to improve reporting. Increasing COVID-19 testing and SARS-CoV-2 genomic sequencing would help to better understand how the virus is spreading and would thus inform better control of the COVID-19 epidemic in Brazil.

Anonymized samples processed in this study were sent to the Central Public Health Laboratory/Octávio Magalhães Institute (IOM) of the Ezequiel Dias Foundation (FUNED), which belongs to the public laboratory network of the Brazilian Ministry of Health (BMoH). They were previously obtained by the local health services for routine diagnosis of SARS-CoV-2 and epidemiological surveillance. The availability of these samples for research purposes during outbreaks of international concern is allowed under the terms of the 510/2016 Resolution of the National Ethical Committee for

Samples were selected based on a Ct value ≤ 32. Epidemiological data, such as symptoms, travel history and municipality of residency, were collected from medical records accompanying the collected samples provided by IOM/FUNED.

For the complementary DNA synthesis stage, the SuperScript IV Reverse Transcriptase kit (Invitrogen) was used following the manufacturer's instructions.
The generated cDNA was subjected to multiplex PCR for sequencing using Q5 High Fidelity Hot-Start DNA Polymerase (New England Biolabs) and a set of specific primers designed by the ARTIC Network (https://github.com/artic-network/artic-ncov2019/tree/master/primer_schemes/nCoV-2019/V1) for sequencing the complete genome of SARS-CoV-2 [34]. PCR conditions have been previously reported [34]. All experiments were performed in a biosafety level 2 cabinet.

Amplified PCR products were purified using 1x AMPure XP Beads (Beckman Coulter) following a previously published protocol [35]. Purified PCR products were quantified using the Qubit® dsDNA HS Assay Kit (Invitrogen), following the manufacturer's instructions. Of the 48 samples, only 40 contained enough DNA (≥ 2 ng/µL) to proceed to library preparation. Sequencing libraries were prepared using the Oxford Nanopore Ligation Sequencing Kit (SQK-LSK109) following a previously published protocol [35]. Before pooling all samples, each sample was barcoded using the Native Barcoding Expansion kits (NBD104 and EXP-NBD114). After barcoding adaptor ligation, sequencing libraries were loaded on a flow cell (FLO-MIN106) for subsequent MinION sequencing, programmed to run for six hours. Reads were basecalled using Guppy and barcode demultiplexing was performed using qcat. Consensus sequences were generated by de novo assembly using Genome Detective and Coronavirus Typing Tool [36, 37].

Public SARS-CoV-2 complete genome sequences available up to 15 April 2020 were retrieved from GISAID. Sequences were aligned using MAFFT (FFT-NS-2 algorithm) with default parameters [38].
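As an illustration of the alignment step, the sketch below wraps a MAFFT call from Python. The text states only that MAFFT was run with the FFT-NS-2 strategy and default parameters; the file names, the wrapper functions `build_mafft_command` and `run_mafft`, and the thread count are assumptions made for this example. The flags `--retree 2 --maxiterate 0` are the ones the MAFFT manual lists for FFT-NS-2, and the `mafft` executable is assumed to be on the PATH.

```python
import subprocess


def build_mafft_command(input_fasta, threads=1):
    """Return the MAFFT command line for the FFT-NS-2 progressive strategy."""
    return [
        "mafft",
        "--retree", "2",      # two rounds of guide-tree construction
        "--maxiterate", "0",  # no iterative refinement
        "--thread", str(threads),
        str(input_fasta),
    ]


def run_mafft(input_fasta, output_fasta, threads=1):
    """Align input_fasta and write the alignment to output_fasta."""
    command = build_mafft_command(input_fasta, threads)
    # MAFFT writes the alignment to standard output.
    with open(output_fasta, "w") as handle:
        subprocess.run(command, stdout=handle, check=True)


# Example usage (hypothetical file names; requires MAFFT to be installed):
# run_mafft("gisaid_plus_mg.fasta", "gisaid_plus_mg.aln.fasta", threads=4)
```

Keeping the command construction separate from its execution makes it easy to log or inspect the exact parameters used for a given alignment.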
The alignment was manually curated to remove artifacts at the ends and within the alignment using AliView [39]. Phylogenetic analysis of these genome sequences was performed using IQ-TREE (version 1.6.10) under the best-fit model according to the Bayesian Information Criterion (BIC), as indicated by the ModelFinder application implemented in IQ-TREE [40]. The statistical robustness of individual nodes was determined using 1000 bootstrap replicates.

Lineage assessment was conducted using the Phylogenetic Assignment of Named Global Outbreak LINeages tool available at https://github.com/hCoV-2019/pangolin [27]. Four complete or near-complete SARS-CoV-2 genome datasets were generated. Dataset 1 (n = 3,102) comprised the data

The authors have declared that no competing interests exist.
Novel Coronavirus (2019-nCoV) Situation Report -1, 21
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
A new coronavirus associated with human respiratory disease in China
A novel coronavirus from patients with pneumonia in China
A pneumonia outbreak associated with a new coronavirus of probable bat origin
Pangolin homology associated with 2019-nCoV
The proximal origin of SARS-CoV-2
Coronavirus disease (COVID-19) Situation Report-106
The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2
Boletim Epidemiológico Especial-14 / SE 18-26 de abril de 2020 COE-COVID19
Coronavirus disease (COVID-19) Situation Report - 103
First cases of coronavirus disease (COVID-19) in Brazil, South America (2 genomes
Brasileiro de Geografia e Estatística - IBGE. Cidades e Estados: Minas Gerais. 2020
Towards a genomics-informed, real-time, global pathogen surveillance system
Secretaria-Geral da Presidência da República - Imprensa Nacional: Portaria Nº 188, de 3 de fevereiro de 2020
Boletim Epidemiológico COVID-19: Doença causada pelo coronavírus -19 28 de abril de 2020
Brasileiro de Geografia e Estatística - IBGE. Cidades e Estados - Belo Horizonte. 2020
Estimating case fatality rates of COVID-19. The Lancet. Infectious diseases
Estimating the infection and case fatality ratio for coronavirus disease (COVID-19) using age-adjusted data from the outbreak on the Diamond Princess cruise ship
Estimates of the severity of coronavirus disease 2019: a model-based analysis
Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China
Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries
Data sharing during the novel coronavirus public health emergency of international concern
A dynamic nomenclature proposal for SARS-CoV-2 to assist genomic epidemiology
Phylogenetic Assignment of Named Global Outbreak LINeages
Genomic epidemiology of novel coronavirus - Global subsampling
CoV-2 whole genome sequencing for informed public health decision making in the Netherlands
Genomic epidemiology of SARS-CoV-2 in Guangdong Province, China. medRxiv
Regaining perspective on SARS-CoV-2 molecular tracing and its implications
A snapshot of SARS-CoV-2 genome availability up to 30th March, 2020 and its implications
Phylodynamic Analysis | 176 genomes | 6
nCoV-2019 sequencing protocol. protocols.io; 2020. doi:dx
Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples
Genome Detective Coronavirus Typing Tool for rapid identification and characterization of novel coronavirus genomes
An automated system for virus identification from high-throughput sequencing data
MAFFT online service: Multiple sequence alignment, interactive sequence choice and visualization
AliView: a fast and lightweight alignment viewer and editor for large data sets
IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies
Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol
Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol
Sampling theory for neutral alleles in a varying environment
Posterior summarization in Bayesian phylogenetics using Tracer 1.7
Confirmed cases and deaths of COVID-19 in Brazil, at municipal (city) level