key: cord-0835975-lrvs4w4j
authors: Sant'Anna, F. H.; Varela, A. P. M.; Prichula, J.; Comerlato, J.; Comerlato, C. B.; Roglio, V. S.; Pereira, G. F. M.; Moreno, F.; Seixas, A.; Wendland, E. M.
title: Emergence of the novel SARS-CoV-2 lineage P.4.1 and massive spread of P.2 in South Brazil
date: 2021-04-20
journal: nan
DOI: 10.1101/2021.04.14.21255429
sha: a21604594f89a976d36b82787701385f62e8347a
doc_id: 835975
cord_uid: lrvs4w4j

South Brazil has been the new epicenter of Coronavirus Disease 2019 (COVID-19) in 2021, accounting for the greatest number of cumulative cases and deaths (per 100 thousand inhabitants in a week) worldwide. In this study, we analyzed 340 whole SARS-CoV-2 genomes sampled between April and November 2020 in 33 cities in South Brazil. We demonstrated the circulation of two novel emergent lineages, described here as P.4 and P.4.1 (provisionally termed VUI-NP13L), and seven lineages that had already been assigned (B.1.1.33, B.1.1.28, P.2, B.1.91, B.1.1.94, B.1.195 and B.1.212). P.2 and P.4.1 spread massively from approximately September/October 2020. Constant and consistent genomic surveillance is crucial to identify newly emerging SARS-CoV-2 lineages in Brazil and to guide decision making in the Brazilian Public Healthcare System.

Since the emergence of SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) in China at the end of 2019, COVID-19 (coronavirus disease 2019) has been responsible for more than 2.7 million deaths worldwide [1-3]. Brazil is second in terms of the number of COVID-19 deaths, behind only the USA, registering more than 303,462 deaths and 12,320,169 cumulative cases as of 26 March 2021. Rio Grande do Sul (RS), the southernmost state of Brazil, borders Argentina and Uruguay and had its first COVID-19 case confirmed at the end of February 2020.
More than one year later, RS reached a peak of infections that led to a collapse of the health system [4,5], with 14,957 deaths and 742,866 cases [6,7]. Even after one year of pandemic and the worldwide introduction of SARS-CoV-2 vaccination, there is no forecast end to the COVID-19 pandemic. Additionally, resurgence of COVID-19 after the first wave and after reaching high seroprevalence may drive positive selection of new lineages [8,9].

Constant epidemiological genomic surveillance through large-scale pathogen genome sequencing has played a major role in detecting SARS-CoV-2 lineages and tracing their spatiotemporal distribution. These data allow quasi-real-time tracking of viral dynamics around the globe [10-15]. Despite the volume of COVID-19 cases in Brazil, a task force created by the scientific community was able to sequence only approximately 0.03% of all positive SARS-CoV-2 cases during the pandemic's first year [16,17]. To date, 59 different lineages have been reported in Brazil, most of them sequenced in São Paulo, Rio de Janeiro, Rio Grande do Sul, and Amazonas [16]. The lineages most frequently identified and currently in circulation are B.1 (the first ancestral lineage introduced in Brazil), B.1.1.212, B.1.1.33, B.1.1.74, B.1.1.28, B.1.1.143, and B.1.1.94, together with three recently assigned lineages, P.1, P.2, and N.9, derived from B.1.1.28 and B.1.1.33 [16,18].

All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This version posted April 20, 2021. https://doi.org/10.1101/2021.04.14.21255429

From November 2020 until now, P.1 and P.2

Despite sequencing efforts, there is no accurate knowledge about the frequency of SARS-CoV-2 lineage distribution in Brazil, mainly due to sampling gaps in spatiotemporal strata and the low number of genomes [19,20].
At the time of writing (26 March 2021), approximately 4,500 SARS-CoV-2 genomes from Brazil were available in GISAID.

This study aimed to reconstruct the spatiotemporal pattern of SARS-CoV-2 spread in the first year of the pandemic in South Brazil and to search for the emergence of novel lineages. Whole-genome sequencing was performed for 340 SARS-CoV-2 genomes, the largest temporal monitoring (April to November 2020) in Brazil to date.

Between 24 April and 30 November, 2020, 13,700

Our sequencing effort yielded a more homogeneous distribution of samples across the temporal window from April to November 2020 than a dataset available for the same period (Supplementary Fig. 2). Our findings more than double the number of SARS-CoV-2 sequences from South Brazil available in GISAID (from 232 to 572).

Genomes reported in this study were assigned to 12 lineages based on the proposed dynamic nomenclature of Phylogenetic Assignment of Named Global Outbreak Lineages (PANGOLIN) [21,22]

2), and it was termed the P.4 lineage. Inside the P.4 clade, we found a distinct cluster of sequences, named P.4.1, presenting four additional unique amino acid changes in ORF1a (P22875S, V2588F, L3027F, Q3777H) and two synonymous mutations in the ORF1a gene (C1288T and G10870T) (Fig. 1B; Fig. 2).
Considering the other lineages, it is worth noting that B.1.91 presents two amino acid changes in the spike protein. In turn, all P.2 samples carry the E484K, V1176F, and D614G mutations in the spike protein, and all those from Rio Grande do Sul also present the synonymous mutation T3766C (Fig. 1B).

The nine assigned lineages were investigated regarding their introduction in RS in a patient who was exposed in Goiás (Brazil/SP-HIAE-ID37/2020), followed by a sequence obtained in São Paulo (Brazil/SP-1091

The geographic distribution of lineages was evaluated for all RS-sampled regions since the first detection of P.4.1 (Fig. 4)

This is the largest genomic study of SARS-CoV-2 in Brazil considering the number of sequences and the temporal monitoring window. We characterized two new lineages, P.4 and P.4.1, and identified the massive spread of the P.2 and P.4.1 lineages in South Brazil since October 2020. The main strength of our study was the analysis of a substantial number of genomes, representing a geographic region of 33 cities over 8 months.
This approach was decisive for evaluating lineage prevalence across the sampled regions and for monitoring the emergence of new variants of interest. We optimized the sequencing library preparation protocol, improving SARS-CoV-2 genome amplification and enhancing sequencing performance. Although we investigated different regions of RS, including the metropolitan region, the sampling was done in only one state, which does not allow extrapolation of the findings to the rest of Brazil. Sequences evaluated in our study are restricted to 2020; therefore, we cannot determine whether the lineages are still circulating in southern Brazil.

Of the seven main circulating lineages, B. Brazil [16,18,20,23]. This lineage was also associated with secondary outbreaks in Argentina and Uruguay, countries bordering RS [18,24].

Our phylogenomic analyses provided evidence that B.1.1.28 is a paraphyletic group composed of several subclades of distinct evolutionary origins and unique genetic signatures. Therefore, we found that the B.

Our results also indicated a sharp increase in the P.

names for descendants of the B.1.1.28 lineage cannot exceed three sublevels in the current classification system, thus giving P.4 and P.4.1 [21,25]. Although P.4.1 probably emerged in Goiás or São Paulo around June-July 2020, this lineage was only identified in South Brazil at the beginning of October 2020.
According to an independent phylodynamic reconstruction, P.4.1 rapidly arrived in the southeastern and northeastern regions of Brazil and seems to have been exported to Japan, the Netherlands and England [23]. Beyond the important spike protein mutations (V1176F, D614G) that it shares with the P.2 and P.1 lineages, the successful spread of P.4.1 could be explained by its unique amino acid changes in ORF1a [16,26,27]. This gene is known to encode a polyprotein involved in the replication complex [28]. Previous studies concerning SARS-CoV and MERS indicated a role for ORF1a in survival and adaptation to the host [28,29]. Further, Forni et al. (2016) suggested that positive selection on ORF1a might contribute to host shifts or immune evasion [29]. However, we cannot rule out that the increase in frequency of P.4.1 was due to random causes such as a "founder effect," regardless of the effects of the ORF1a mutations on SARS-CoV-2 fitness [27].

Our study reinforces the importance of consistent and continuous genomic surveillance for evaluating the genomic background of SARS-CoV-2 in a given spatiotemporal setting. These data are fundamental for inferring SARS-CoV-2 outbreaks and revealing the signatures, activity and origins of the lineages. Furthermore, genomic surveillance is an invaluable resource to guide decision making in the Brazilian Public Healthcare System. Immunization campaigns could be especially affected by the emergence of novel lineages that evade antibodies generated by current vaccines. Future studies are needed to assess the fate of P.2 and P.4.1 over time and, if these lineages are still observed, to evaluate the impact of their amino acid changes on fitness.
Total nucleic acids from positive samples of the Epiclin Laboratory diagnostic tests were selected following a spatiotemporal design. Strata were defined using two criteria: epidemiological week and residence of the participant. The selected period comprises epidemiological weeks 17 (April 2020) to 49 (November 2020), and the residences cover four regions (Novo Hamburgo, Taquara, Canoas, and Porto Alegre). In each of the 132 strata (33 weeks and 4 regions), samples were selected randomly for sequencing, totaling 353 RNA samples (Supplementary Methods).

Libraries were prepared using the QIAseq SARS-CoV-2 Primer Panel (Qiagen) and

Raw paired-end reads were processed using a previously described bioinformatic pipeline with some modifications [31,32]. The reads were mapped with BWA 0.7.17 [33] to the Wuhan-Hu-1 reference genome (NC_045512.2) and converted to BAM format using samtools v1.7 [34]. The primer sequences were trimmed with iVar v1.2.3, and consensus sequences were generated with a minimum Phred quality score of 20, with N called at positions with a coverage depth below 10. The quality of the genome sequences was assessed using Nextclade v0.14.1. Sequences classified as "bad" in the overall quality evaluation were discarded from further analyses.
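The consensus rule described above (bases below Phred 20 ignored, N written where fewer than 10 bases remain) can be illustrated with a minimal sketch. This is not iVar's implementation: the function name `call_consensus`, the toy pileup, the simple majority vote, and the choice to count depth after quality filtering are all assumptions made for illustration.

```python
from collections import Counter

MIN_QUAL = 20   # minimum Phred quality score, as stated in the text
MIN_DEPTH = 10  # positions with lower depth are written as N, as stated in the text


def call_consensus(pileup, min_qual=MIN_QUAL, min_depth=MIN_DEPTH):
    """Return a consensus string from a toy pileup.

    pileup: one list per reference position, each holding (base, phred) tuples.
    Assumption: depth is counted after the quality filter, and the consensus
    base is the most common usable base (a plain majority vote).
    """
    consensus = []
    for observations in pileup:
        # Keep only bases that meet the quality threshold.
        usable = [base for base, qual in observations if qual >= min_qual]
        if len(usable) < min_depth:
            consensus.append("N")  # insufficient coverage: mask the position
        else:
            consensus.append(Counter(usable).most_common(1)[0][0])
    return "".join(consensus)


# Hypothetical four-position pileup, for illustration only.
toy_pileup = [
    [("A", 30)] * 12,                      # deep, high quality
    [("C", 35)] * 5 + [("C", 10)] * 7,     # 12 reads, only 5 pass the quality filter
    [("G", 30)] * 9,                       # depth 9, below the threshold
    [("T", 20)] * 10 + [("C", 40)] * 3,    # majority T at sufficient depth
]
toy_consensus = call_consensus(toy_pileup)
print(toy_consensus)  # prints ANNT
```

Masking low-coverage positions with N, rather than falling back to the reference base, keeps unsupported sites from being read as agreement with Wuhan-Hu-1 in downstream lineage assignment.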
Viral genomes were deposited in GISAID, and the accession numbers are available in Supplementary Data 1 [33].

Genome sequences obtained in this study were concatenated with global sequences according to the Pangolin lineage. Sequences were filtered using the Augur filter subcommand with a minimum length of 27,000 nucleotides and subsequently aligned with MAFFT (FFT-NS-2 option, default parameters). The phylogenetic tree was built with IQ-TREE under the GTR model with ultrafast bootstrapping (1,000 replicates) and subsequently visualized in iTOL v6 [35].

6. Phylogenomic, phylogeographic, and phylodynamics analyses

Phylogeographic and phylodynamic analyses were carried out using Nextstrain, a suite of tools that includes subsampling, alignment, phylogenetic reconstruction, geographic and ancestral trait reconstruction, and inference of transmission events [36]. For these analyses, our dataset of 340 local genome sequences was concatenated with the Nextstrain South America dataset (build 2021-03-03) (Supplementary Data 2 and 3). All 3,965 sequences were included in the pipeline (the subsampling step was bypassed). Samples were classified according to their exposure location into four categories: samples from Rio Grande do Sul; samples from other regions of Brazil; samples from South America, excluding Brazilian sequences; and samples from other continents. Trait reconstruction was performed considering this custom geographic trait. The kernel density of the lineages across the epidemiological weeks was computed using a script written in Python (Seaborn library).
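To make the kernel density step concrete, the sketch below computes a Gaussian kernel density estimate of sampling weeks per lineage using only the Python standard library. It does not reproduce the authors' script, which used Seaborn (presumably a function such as `seaborn.kdeplot`); the lineage labels, week values, grid, and bandwidth of 2.0 weeks are hypothetical and chosen only for illustration.

```python
import math


def gaussian_kde(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


# Hypothetical epidemiological weeks of sampling for two placeholder lineages.
weeks_by_lineage = {
    "lineage_A": [18, 20, 21, 23, 25, 26, 30],
    "lineage_B": [40, 41, 41, 42, 44, 45, 47],
}

STEP = 0.1
grid = [i * STEP for i in range(0, 701)]  # weeks 0.0 to 70.0

# One density curve per lineage; bandwidth (in weeks) is an assumed value.
densities = {
    name: gaussian_kde(weeks, grid, bandwidth=2.0)
    for name, weeks in weeks_by_lineage.items()
}

# Week at which each density curve peaks.
peak_week = {
    name: grid[dens.index(max(dens))] for name, dens in densities.items()
}
for name in weeks_by_lineage:
    print(name, round(peak_week[name], 1))
```

Each curve integrates to approximately 1, so curves for lineages with very different sample counts are comparable in shape but not in absolute frequency; plotting counts or proportions per week would be needed to compare prevalence.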
The 340 SARS-CoV-2 sequences obtained in this study were submitted to the GISAID portal and are available in Supplementary Data 1.

Figure 1A legend.

References

COVID19: an announced pandemic
A Novel Coronavirus from Patients with Pneumonia in China
Coronaviridae Study Group of The International Committee on Taxonomy of Viruses. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2
An interactive web-based dashboard to track COVID-19 in real time
Virus population dynamics during infection
Resurgence of COVID-19 in Manaus, Brazil, despite high seroprevalence
Genomics and epidemiological surveillance
Genome-wide analysis of SARS-CoV-2 virus strains circulating worldwide implicates heterogeneity
Genomic surveillance of COVID-19 cases in Beijing
Genomic epidemiology reveals transmission patterns and dynamics of SARS-CoV-2 in Aotearoa New Zealand
Genomic epidemiology of the early stages of the SARS-CoV-2 outbreak in Russia
Mutation hotspots, geographical and temporal distribution of SARS-CoV-2 lineages in Brazil
Pervasive transmission of E484K and emergence of VUI-NP13L with evidence of SARS-CoV-2 co-infection events by two different lineages in Rio Grande do Sul, Brazil
A potential SARS-CoV-2 variant of interest (VOI) harboring mutation E484K in the Spike protein was identified within lineage B.1.1.33 circulating in Brazil
Genomic Epidemiology of SARS-CoV-2 in Esteio
Genomics and epidemiology of a novel SARS-CoV-2 lineage in Manaus
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
Genomic surveillance of SARS-CoV-2 tracks early interstate transmission of P.1 lineage and diversification within P.2 clade in Brazil
Recurrent dissemination of SARS-CoV-2 through the Uruguayan-Brazilian border
Sixteen novel lineages of SARS-CoV-2 in South Africa
SARS-CoV-2 spike D614G change enhances replication and transmission
Unique and conserved features of genome and proteome of SARS-coronavirus, an early split-off from the coronavirus group 2 lineage
Extensive Positive Selection Drives the Evolution of Nonstructural Proteins in Lineage C Betacoronaviruses
Genomic analysis of SARS-CoV-2 reveals local viral evolution in Ghana
Fast and accurate short read alignment with Burrows-Wheeler transform
The Sequence Alignment/Map format and SAMtools

The authors declare no competing interests.
The Brazilian Ministry of Health, along with Moinhos de Vento Hospital, financed the study through the Program for Supporting the Institutional Development of Public Health System (PROADI-SUS). We thank the Federal University of Health Sciences of Porto Alegre (UFCSPA), which provided postdoctoral financial support to APMV. We also thank Dr. Vanessa Mattevi, Dr.

FHS helped with study design and sequencing and performed the bioinformatic analysis, data interpretation, and writing; APMV and JP helped with study design and performed the sequencing, data interpretation, figure drawing, and writing; JC and CBC helped with study design, data interpretation and writing; VSR helped with sampling design; AS, FM and GPA revised the manuscript; EMW conceived the study and revised the manuscript. All authors contributed to the article and approved the submitted version.