key: cord-0767566-04usqbsp authors: Franceschi, V. B.; Caldana, G. D.; Perin, C.; Horn, A.; Peter, C.; Cybis, G. B.; Ferrareze, P. A. G.; Rotta, L. N.; Cadegiani, F. A.; Zimerman, R. A.; Thompson, C. E. title: Predominance of the SARS-CoV-2 lineage P.1 and its sublineage P.1.2 in patients from the metropolitan region of Porto Alegre, Southern Brazil in March 2021: a phylogenomic analysis date: 2021-05-22 journal: nan DOI: 10.1101/2021.05.18.21257420 sha: ec0f3eb19dcb22ebe00d3e1e03f3eb2c789c7705 doc_id: 767566 cord_uid: 04usqbsp

Almost a year after the COVID-19 pandemic began, the United Kingdom, South Africa, and Brazil became the epicenters of new lineages, the Variants of Concern (VOCs) B.1.1.7, B.1.351, and P.1, respectively. These VOCs are increasingly associated with enhanced transmissibility, immune evasion, and mortality. The previously most prevalent lineages in the state of Rio Grande do Sul (Brazil), B.1.1.28 and B.1.1.33, were rapidly replaced by P.1 and P.2, two B.1.1.28-derived lineages harboring the E484K mutation. To perform a genomic characterization of SARS-CoV-2 in the metropolitan region of Porto Alegre (Rio Grande do Sul, Southern Brazil) during this second pandemic wave, we sequenced viral samples from COVID-19 patients of this region to: (i) identify the prevalence of SARS-CoV-2 lineages in the region, the state and bordering countries/states, (ii) characterize the mutation spectra, and (iii) hypothesize possible viral dispersal routes by using phylogenetic and phylogeographic approaches. We not only confirmed that 96.4% of the samples belonged to the P.1 lineage but also found that approximately 20% of these could be assigned to the newer P.1.2 (a P.1-derived sublineage harboring new signature substitutions, recently described and present in other Brazilian states and foreign countries).
Moreover, P.1 sequences from this study were allocated to several distinct branches (four clades and five clusters) of the P.1 phylogeny, suggesting multiple introductions of P.1 into Rio Grande do Sul as early as 2020 and placing this state as a potential core of diffusion and emergence of P.1-derived clades. It is still uncertain whether the emergence of P.1.2 and other P.1 clades is related to further virological, clinical, or epidemiological consequences. However, the clear signs of viral molecular diversification from recently introduced P.1 warrant further genomic surveillance.

Manaus (Amazonas, Brazil), the probable place of origin of the P.1 lineage, faced a major second wave of COVID-19. An explosive resurgence of cases and deaths became evident in mid-December 2020. Since the P.1 variant carries multiple mutations of potential biological significance (especially E484K, K417T, and N501Y in the receptor-binding domain [RBD] of the spike protein), (i) many key substitutions may lead to immune evasion, (ii) higher transmissibility compared with pre-existing lineages has been characterized, and (iii) this VOC has been the focus of increased surveillance and warrants more detailed study (12). After this outbreak, almost all Brazilian states experienced increases in the number of cases, hospitalizations, intensive care unit (ICU) admissions, and deaths, resulting in a reemergence of the public health crisis previously experienced in the first wave of COVID-19 (13). The diversity of SARS-CoV-2 during the first epidemic wave in Brazil was mainly composed of the B.1.1.28 and B.1.1.33 lineages (2, 14), although the very low sequencing rate across the country has limited these estimates (14). However, from late 2020 to early 2021 these previous lineages were rapidly replaced by P.1 and P.2, both derived from the common ancestor B.1.1.28 and harboring concerning mutations in the spike protein (e.g., E484K and N501Y) (14, 15).
In the RS state, the most common lineages identified up to May 2021 are still B.1.1.33 (n=290) and B.1.1.28 (n=238). Nevertheless, P.1 has emerged as the most prevalent lineage sequenced in more recent samples (16). Recently, newer mutations were detected in addition to the original set present in P.1, giving rise to the sublineage P.1.2 (17). P.2 probably emerged in the Rio de Janeiro state (Southeast) (18), but was also found in several municipalities of the RS state as of October 2020 (19, 20). The first P.1 infection in the state was once thought to have occurred in a patient from the city of Gramado in February 2021 (21). However, a more recent study detected an earlier P.1 infection on November 30, in a patient with comorbidities from the city of Campo Bom who was reinfected by the P.2 lineage on March 11 (22). Even though RS was one of the least affected Brazilian states in the first epidemic wave, it suffered a pronounced increase in cases in late 2020 (13). In February 2021, the progressive increases in cases and hospitalizations (3.8-fold) led to the collapse of the state healthcare system. Since recent findings have confirmed the widespread dissemination of the SARS-CoV-2 lineage P.1 in Brazil, we sequenced samples from patients from the metropolitan region of Porto Alegre to (i) identify the prevalence of SARS-CoV-2 lineages in the region, the state and bordering countries/states, (ii) characterize the mutation spectra, and (iii) hypothesize possible viral dispersal routes by using phylogenetic and phylogeographic approaches.

This preprint (medRxiv; version posted May 22, 2021; https://doi.org/10.1101/2021.05.18.21257420) is made available under a CC-BY-NC-ND 4.0 International license. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The study was performed in accordance with the Declaration of Helsinki. Patients included in the clinical trial approved by CONEP were informed in detail about the study and gave written informed consent to participate. All samples from Hospital da Brigada Militar patients that yielded positive RT-qPCR results had their laboratory electronic records reviewed to compile metadata such as date of collection, sex, age, symptoms, exposure history, and clinical status, when available. Samples were anonymized before being received by the study investigators, following Brazilian and international ethical standards.

Samples were obtained from Hospital da Brigada Militar patients, either admitted or visiting the emergency ward, in Porto Alegre, RS, Brazil. Nasopharyngeal swabs were collected and placed in saline solution. Samples were transported to the clinical laboratory (Laboratório Exame) and tested on the same day for SARS-CoV-2 using real-time reverse-transcriptase polymerase chain reaction (Charité RT-qPCR assays). The RT-qPCR assay used primers and probes recommended by the World Health Organization targeting the Nucleocapsid (N1 and N2) genes (23). Remnant samples were stored at -20°C. All samples from Hospital da Brigada Militar patients that were routinely tested by the clinical laboratory between March 9 and March 17 and yielded positive RT-qPCR results were selected. Subsequently, these positive clinical samples were submitted to a second RT-qPCR performed by BiomeHub (Florianópolis, Santa Catarina, Brazil), using the same protocol (charite-berlin). Only samples with a quantification cycle (Cq) below 30 for at least one primer were submitted to SARS-CoV-2 genome sequencing.
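The Cq-based selection rule can be illustrated with a minimal sketch. The sample names and Cq values below are invented for illustration, and the function name `eligible_for_sequencing` is hypothetical; only the cutoff (Cq below 30 for at least one primer) comes from the text.

```python
from typing import Dict, Optional

CQ_THRESHOLD = 30.0  # cutoff stated in the text


def eligible_for_sequencing(
    cq_by_target: Dict[str, Optional[float]],
    threshold: float = CQ_THRESHOLD,
) -> bool:
    """Return True if at least one target amplified with Cq strictly below the threshold.

    A value of None stands for a target with no amplification (assumed encoding).
    """
    return any(cq is not None and cq < threshold for cq in cq_by_target.values())


# Invented example values, not data from the study.
samples = {
    "sample_A": {"N1": 24.1, "N2": 25.3},
    "sample_B": {"N1": 31.2, "N2": 29.4},
    "sample_C": {"N1": 33.0, "N2": None},
}

selected = [name for name, cqs in samples.items() if eligible_for_sequencing(cqs)]
print(selected)  # ['sample_A', 'sample_B']
```

Under this reading, a sample is kept as soon as one target passes the cutoff, so `sample_B` qualifies even though its N1 value is above 30.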
In total, 56 patients who presented symptoms such as fever, cough, sore throat, dyspnea, anosmia, fatigue, diarrhea, and vomiting (moderate and severe clinical status) (24) were included in the study. Total RNAs were prepared as in the reference protocol (25).

Quality control, reference mapping, and consensus calling were performed using an in-house pipeline developed by BiomeHub (Florianópolis, Santa Catarina, Brazil). Briefly, adapters were removed and reads were trimmed by size=150. Reads were mapped to the reference SARS-CoV-2 genome (GenBank accession number NC_045512.2) using Bowtie v2.4.2 (end-to-end and very-sensitive parameters) (26). Mapping coverage and depth were retrieved using samtools v1.11 (27) ≤ 1000) combined with bcftools filter (DP>50) and bcftools consensus v1.11 (28). Coverage values for each genome were plotted using the karyoploteR v1.12.4 R package (29). Finally, we assessed the quality of the consensus sequences using Nextclade v0.14.2 (https://clades.nextstrain.org/).

Single nucleotide polymorphisms (SNPs) and insertions/deletions in each sample were identified using the snippy variant calling and core genome alignment pipeline v4.6.0 (https://github.com/tseemann/snippy), which uses FreeBayes v1.3.2 (30) to call variants and snpEff v5.0 (31) to annotate them and predict their effects on genes and proteins. The genome map and SNP histogram were generated after running a MAFFT v7.475 (32) alignment, using the msastats.py script and the plotAlignment and plotSNPHist functions (33). Sequence positions refer to GenBank
We identified global virus lineages using the dynamic nomenclature implemented in Pangolin v2.3.8 (26; https://github.com/cov-lineages/pangolin) and global clades and mutations using Nextclade v0.14.2 (https://clades.nextstrain.org/). For an initial exploration of mutations and lineages across time and geography, we also used Pathogenwatch (https://pathogen.watch/) and Microreact (35).

All available SARS-CoV-2 genomes (1,048,519 sequences) were obtained from GISAID on April 26, 2021 and combined with our 56 sequences to obtain a globally representative dataset. These sequences were analyzed with the Nextstrain ncov pipeline (27; https://github.com/nextstrain/ncov). In this workflow, sequences were aligned using nextalign v0.1.6 (https://github.com/neherlab/nextalign). In the initial filtering step, short and low-quality sequences and those with incomplete sampling dates were excluded. Uninformative sites and the alignment ends (100 positions at the beginning, 50 at the end) were also masked. Genomes genetically closely related to our focal subset were selected, prioritizing sequences geographically closer to the Brazilian state of RS. The maximum likelihood (ML) phylogenetic tree was built using IQ-TREE v2.1.2 (37), employing the general time-reversible (GTR) model with unequal rates and base frequencies (38). The tree was rooted between lineages A and B (Wuhan/WH01/2019 and Wuhan/Hu-1/2019 representatives), and sequences that deviated by more than four interquartile ranges from the root-to-tip regression of genetic distances against sampling dates were excluded from the analysis. A time-scaled ML tree was generated with TreeTime v0.8.1 (39) under a strict clock and a skyline coalescent prior, with a mean rate of 8×10⁻⁴ substitutions per site per year.
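The root-to-tip outlier filter can be sketched as follows. This is a simplified illustration, not the code used by the Nextstrain workflow or TreeTime: it fits an ordinary least-squares line of root-to-tip distance against sampling date and flags tips whose residual exceeds four interquartile ranges of the residuals. The function name `clock_outliers` and the toy data are assumptions made for the example.

```python
from statistics import quantiles
from typing import List


def clock_outliers(dates: List[float], distances: List[float], n_iqr: float = 4.0) -> List[int]:
    """Return indices of tips whose regression residual exceeds n_iqr interquartile ranges.

    dates: sampling dates as decimal years; distances: root-to-tip genetic distances.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx  # estimated clock rate (substitutions/site/year)
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(dates, distances)]
    q1, _, q3 = quantiles(residuals, n=4)  # quartiles of the residuals
    iqr = q3 - q1
    return [i for i, r in enumerate(residuals) if abs(r) > n_iqr * iqr]


# Toy data: a clock-like signal at 8e-4 subst/site/year with small alternating noise,
# plus one tip (index 5) that is far too divergent for its sampling date.
dates = [2020.0 + 0.1 * i for i in range(11)]
distances = [8e-4 * (d - 2019.9) + (2e-5 if i % 2 == 0 else -2e-5) for i, d in enumerate(dates)]
distances[5] += 1e-3

print(clock_outliers(dates, distances))  # [5]
```

A least-squares fit is itself pulled by extreme tips, so production tools typically iterate or use more robust estimates; the sketch only shows the thresholding idea.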
Finally, clades and mutations were assigned and geographic movements inferred. The results were exported to JSON format to enable interactive visualization through Auspice.

Additionally, as our dataset consists mostly of P.1 sequences, we downloaded all complete and high-quality global genomes assigned to the P.1 PANGO lineage (4,499 sequences) submitted until April 26, 2021. These sequences were aligned using MAFFT v7.475, the ends of the alignment (300 positions at the beginning and 500 at the end) were masked, and the ML tree was built with IQ-TREE v2.0.3 using the GTR+F+R3 nucleotide substitution model as selected by ModelFinder (40). Branch support was calculated using the Shimodaira-Hasegawa approximate likelihood ratio test (SH-aLRT) (41) with 1,000 replicates. Local sequences were classified according to the following scheme: monophyletic clades composed of one local genome were classified as "isolated", clades composed of 2 < genomes < 4 were considered "clusters", and those with ≥ 4 local genomes were designated "clades". ML trees were inspected in TempEst v1.5.3 (42). The MCMC chains were run in duplicate for at least 50 million generations, and convergence was checked using Tracer v1.7.1 (50). Log and tree files were combined using
LogCombiner v1.10.4 to ensure stationarity and good mixing (43) after removing 10% as burn-in. The maximum clade credibility (MCC) tree was generated using TreeAnnotator v1.10.4 (45). Viral migrations were reconstructed using a reversible discrete asymmetric phylogeographic model (51) to infer the locations of the internal nodes of the tree. A discretization scheme with a resolution of different Brazilian states and other countries was applied. Location diffusion rates were estimated using the Bayesian stochastic search variable selection (BSSVS) (51) procedure, employing Bayes factors to identify well-supported rates. Geographical maps and other plots were generated using R v3.6.1 (52) and the ggplot2 v3.3.2 (53), geobr v1.4 (54), and sf v0.9.8 (55) packages. For the discrete phylogeographic analysis, SpreaD3 v0.9.7.1 (56) was used to map the spatiotemporal information embedded in the MCC trees.

Figure S2C). Table 2). The only P.1-defining replacement not found at high frequency in our study was the deletion in ORF1ab (del:11288:9), called in only four genomes. This result is due to the stringent coverage depth filter (DP>50) applied when calling genomic positions in the consensus sequences. After comparing the frequency of mutations between the recently sequenced samples and the Brazilian P.1 genomes, we observed a combination of mutations that stood out in a significant
proportion (n=11; 19.6%) compared with previous P.1 sequences from Brazil. This combination was previously described (17) and gave rise to the P.1.2 lineage, which harbors three ORF1ab replacements (synC1912T, D762G, T1820I), one in ORF3a (D155Y), and one in the N protein (synC28789T) (Table 2). Additionally, two of these genomes (18.2%) carry the T11296G (ORF1ab nsp6: F3677L) substitution and eight (72.7%) harbor G25641T (ORF3a: L83F). Another cluster, made up of four genomes and subsequently named Clade 2, was also detected; it possesses three defining mutations (ORF1ab nsp4: V2862L, synC10507T, ORF3a: M260K). This cluster does not fall into a lineage designation at this moment but deserves further monitoring (Table 2, Figure S1).

Considering PANGO lineages, 54 genomes (96.4%) were designated as P.1, one (1.8%) as P.2, and one (1.8%) as B.1.1.28. Although not yet classified in the most updated version of pango-designation, the P.1.2 lineage was present in 11/54 (20.4%) of the P.1 sequences (https://github.com/cov-lineages/pango-designation/issues/56) (Figure S1). The RS state shares borders with Argentina in its west (Figure S2 Figure S1). After running the Nextstrain workflow using quality control and subsampling approaches, October-November 2020, followed by the rise and establishment of P.2 and P.1, respectively (Figure 3A).

To get a more detailed understanding of the P.1 diffusion throughout Rio Grande do Sul, other Brazilian regions, and other countries, we built an ML tree of 4,499 genomes belonging to this lineage (Supplementary file 4). P.1 sequences from this study were allocated to several distinct branches, suggesting multiple introductions and the formation of different P.1-derived clades and clusters.
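The size-based labels used for local monophyletic groups (described in the methods as "isolated", "cluster", and "clade") can be written as a small function. This is a sketch that applies the thresholds literally as printed; the function name `classify_local_group` and the group names in the example are hypothetical. Note that the printed inequality (2 < genomes < 4) does not cover groups of exactly two local genomes, so the sketch reports those as "unclassified" rather than guessing the intended rule.

```python
from collections import Counter


def classify_local_group(n_local: int) -> str:
    """Label a monophyletic group by its number of local genomes, following the printed scheme."""
    if n_local < 1:
        raise ValueError("a group must contain at least one local genome")
    if n_local == 1:
        return "isolated"
    if 2 < n_local < 4:  # as printed: only groups of three satisfy this
        return "cluster"
    if n_local >= 4:
        return "clade"
    return "unclassified"  # n_local == 2 is not covered by the scheme as printed


# Group sizes taken from the text where stated (clade 1: 11, clade 2: 4, cluster 5: 3);
# the single-genome entry is a made-up placeholder.
group_sizes = {"clade 1": 11, "clade 2": 4, "cluster 5": 3, "example singleton": 1}
labels = {name: classify_local_group(size) for name, size in group_sizes.items()}
print(labels)
print(Counter(labels.values()))
```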
We identified four clades, five clusters, and 13 isolated sequences (Figure 3B, Supplementary file 5). Most importantly, clade 1 was composed of 11 sequences originating in this study that shared five lineage-defining mutations as previously described ( synC1471T, A1049V [nsp3]) shared mutations. Among all identified clusters, the most diverse was cluster 5, which contains three samples from this study and has five defining mutations: four in ORF1ab (synT4705C, synC11095T, syn11518, T5541I [helicase]) and one in ORF7a (E16D). Moreover, two sequences share one distinct mutation (ORF1ab: F3677L [nsp6]).

To date the most recent common ancestor (TMRCA) and reconstruct the diffusion of the four P.1 clades identified in our ML analysis, we used coalescent and phylogeographic methods. another Brazilian state but its earlier representatives were not sampled. This is a strong hypothesis since this sequence is associated with community transmission after contact with tourists in a city of RS (Gramado) that receives numerous visitors annually (21).

For clade 2, we estimated a median evolutionary rate of 5.×10⁻⁴ (95% HPD: 4.18×10⁻⁴ to 7.71×10⁻⁴ subst/site/year), and the TMRCA was dated to November 30, 2020 (95% HPD: November 2 to December 21, 2020). This clade includes sequences from 11 Brazilian states from all five regions and 9 other countries.
We were able to detect at least five introductions from Amazonas, where this clade probably emerged. These introductions ranged from December 28, 2020 (95% HPD: December 28, 2020 to January 5, 2021) to January 28, 2021 (95% HPD: January 28 to March 7, 2021). Importantly, we identified a well-supported subclade (PP = 1) of four genomes from this study (Figure 5A). to January 19, 2021) (Figure 5C).

Phylogenetic and molecular clock approaches suggest the wide circulation of the VOC P.1 both nationally and internationally between late 2020 and early 2021. This lineage has already diversified into some clades that bear characteristic mutations, although they exhibit similar evolutionary rates. We have inferred that P.1 (and its derived clades) was introduced multiple times into the southernmost Brazilian state (RS) as early as 2020, probably in December. Remarkably, this date is close to the first P.1 detection in Manaus, which is located ~4,000 kilometers away. These early introductions led to the formation of local subclades that could be identified even using a reduced set of sequenced samples.

In most RNA viruses, the accuracy of RNA replication is low, leading to frequent nucleotide substitutions. Coronaviruses behave differently in this regard, since their very long RNA genome requires higher-fidelity replication to maintain genome integrity. In SARS-CoV-2, for example, a proofreading mechanism increases the fidelity of RNA synthesis 100- to 1,000-fold through the activity of an exonuclease present in NSP14.
This enzyme corrects nucleotide misincorporation during RNA duplication by the error-prone RNA-dependent RNA polymerase (57) (58) (59). As a result, coronaviruses have lower mutation rates than other RNA viruses that lack proofreading activity (58).

After almost one year of relatively slow SARS-CoV-2 evolution, the emergence of multiple and convergent lineages harboring a constellation of mutations in the spike protein raised concern in the scientific community. This protein is present on the viral surface of SARS-CoV-2 and is composed of two subunits, S1 and S2. S1 contains the receptor-binding domain (RBD), responsible for virus attachment at the cell surface, the N-terminal domain (NTD), also critical for binding properties, and the more conserved CTS1. S2 is an alpha-helix-rich subunit that contains HR1 and HR2, critical for viral-cell membrane fusion (58). The spike protein, therefore, is responsible for mediating interaction with the human angiotensin-converting enzyme 2 receptor (hACE2) and is a primary target of neutralizing antibodies and vaccine development (60). The variants harboring different mutational signatures, including spike protein substitutions, were classified as VOCs and

The presence of common substitutions in different SARS-CoV-2 lineages suggests co-evolutionary and convergent mutational processes (7-9,63). D614G is a substitution of aspartic acid by glycine at amino acid position 614 in the CTS1 segment of the S1 subunit. It emerged at the end of January 2020, was first reported in April 2020, and quickly became prevalent in several B.1-derived lineages, including most VOCs and VOIs described thus far (64) (65) (66) (67).
This mutation is interesting, since it does not occur near the RBD and thus does not directly modify the binding affinity for hACE2. Instead, it disrupts important hydrogen bonds with the neighboring S2, resulting in altered interprotomeric configurations. As a consequence, the active "one RBD up and two RBD down" conformation is favored. This allows more effective binding to ACE2, leading to higher replication rates and viral loads (65). However, G614 mutants are similarly (or even more) susceptible to immune neutralization than the original D614 variant (65, 68).

The substitution of asparagine by tyrosine at position 501 (N501Y) first appeared in the UK in September 2020. This position is one of the six key contact residues within the RBD interacting with ACE2, and the substitution has been associated with increased binding affinity to human and murine ACE2 due to the formation of an extra hydrogen bond with the ACE2 receptor (69) (70) (71). Studies suggest that this mutation could enhance SARS-CoV-2 transmissibility and mortality (72) (73) (74).

The substitution of glutamic acid by lysine at RBD position 484 (E484K) creates a strong ionic interaction with amino acid 75 of ACE2 due to the electrostatic change from a negatively charged to a positively charged amino acid (75). E484K can also lead to immune evasion since it is located in a flexible loop that is irrelevant for receptor binding when the original glutamic acid is in place. This may explain the enhanced dissemination and improved infectivity of SARS-CoV-2 (76, 77). In fact, as of May 17, 2021, 66,405 sequences with the E484K mutation have been detected (62).
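The substitution notation used above (reference residue, position, new residue) and the charge argument made for E484K can be illustrated with a short sketch. The helper names are hypothetical, and the charge table is a deliberate simplification: it assigns -1 to aspartate and glutamate, +1 to lysine and arginine, and treats every other residue, including histidine, as neutral.

```python
import re
from typing import NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_SUBSTITUTION = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")

# Simplified side-chain charges near neutral pH (assumption: histidine counted as neutral).
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


class Substitution(NamedTuple):
    ref: str       # residue in the reference sequence
    position: int  # 1-based amino acid position
    alt: str       # residue in the variant


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'E484K' into its three components."""
    match = _SUBSTITUTION.match(label.strip())
    if match is None:
        raise ValueError(f"not a single amino acid substitution: {label!r}")
    ref, position, alt = match.groups()
    return Substitution(ref, int(position), alt)


def charge_change(sub: Substitution) -> int:
    """Net change in side-chain charge caused by the substitution."""
    return SIDE_CHAIN_CHARGE.get(sub.alt, 0) - SIDE_CHAIN_CHARGE.get(sub.ref, 0)


for label in ("E484K", "N501Y", "D614G", "K417T"):
    sub = parse_substitution(label)
    print(label, sub.position, charge_change(sub))
```

In this simplified table, E484K is the only one of the four substitutions that swaps a negative side chain for a positive one (a change of +2), which matches the electrostatic argument in the text.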
Recent reinfection cases of E484K-containing SARS-CoV-2 (78,79) and the recent proof of its fixation in different lineages (80) suggest that this mutation must be investigated in more detail.

Although residue 452 does not directly contact the ACE2 receptor, L452 together with residues F490 and L492 forms a hydrophobic patch on the surface of the spike RBD. This stabilizes the interaction between the spike protein and the ACE2 receptor and promotes increased virus entry into the cell. Moreover, lineages carrying this mutation seem to present a moderate resistance to neutralization by antibodies elicited by prior infection or vaccination (61).

The significance of the P681H mutation, in turn, has not been completely elucidated. Its position adjacent to the furin cleavage site (five sites upstream of the arginine residues), which is needed for cell membrane fusion, can potentially affect viral infectivity. Furthermore, its independent appearance in different lineages suggests convergent evolution and a possible adaptive advantage, since the acquisition of a multibasic S1/S2 cleavage site was essential for SARS-CoV-2 infection in humans (81, 82).

In the present study, we noticed that B. (85). Since the end of 2020, these two lineages have led the diversity of SARS-CoV-2 in Brazil (14) and have caused concern in other countries after several introductions.

The emergence of a B.1.1.28-derived lineage carrying the S:E484K mutation (P.2) was dated, in a retrospective study, to late February 2020 in the Southeast (São Paulo and Rio de Janeiro), followed by transmission to the South (especially RS). Since then, multiple dispersion routes were observed between Brazilian states, especially in late 2020 and early 2021 (15).
However, this lineage went unreported until October 2020, when it was first detected in the state of Rio de Janeiro (18) and in the small municipality of Esteio in RS (19). The increased frequency of B.1.1.28 and derived lineages was corroborated by another study that included samples from several municipalities of RS in November 2020. This study found that 86% of the genomes could be classified as B.1.1.28 and that ~50% of these, in fact, belong to the new lineage P.2 (20). Nonetheless, our current study suggests that P.2 has already been nearly entirely replaced by the P.1 lineage or is not particularly well represented among the analyzed patients seeking emergency consultation or requiring hospitalization.

Between June and October 2020, an extremely high seroprevalence (44-76%) was observed in Manaus (Amazonas, Brazil) in a study of blood donors (11). However, despite these numbers, Manaus faced a resurgence of cases and a 6-fold increase in hospitalizations between December 2020 and January 2021. The most plausible hypotheses that would explain this situation are: (i) the previous overestimation of seroprevalence in Manaus, (ii) the immune evasion property of some SARS-CoV-2 mutations found in VOCs, and (iii) higher transmissibility and pathogenicity of SARS-CoV-2 lineages circulating in the second wave compared with pre-existing lineages (12). A genomic epidemiology study that used 250 SARS-CoV-2 genomes from 25 different municipalities of Amazonas, sampled between March 2020 and January 2021, showed that the first exponential phase in the state was driven mainly by the spread of lineage B.1.195, which was gradually replaced by B.1.1.28. The second wave coincided with the emergence of P.1 in November, which rapidly replaced the parental lineage (<2 months) (10) and whose emergence
was preceded by a period of rapid molecular evolution (9). Importantly, rapid accumulation of mutations over short timeframes has been reported in chronically infected or immunocompromised hosts (86, 87). However, preliminary findings pointed to the existence of P.

Regarding infectiousness, transmissibility, and case fatality, the viral load was ~10-fold higher in P.1 infections than in non-P.1 infections (10). However, another study points to uncertainties regarding viral load and duration of infection after accounting for confounding effects (9). Moreover, P.1 was estimated to be 1.7-2.4-fold more transmissible, raising the probability that reinfections are caused more frequently by P.1 than by older lineages. Remarkably, infections were 1.2-1.9 times more likely to result in death in the period following the emergence of P.1 compared to previous time frames (9). These findings support the view that successive lineage replacements in Amazonas were driven by a complex combination of factors, including the emergence of the more transmissible VOC P.1 virus (10).

A study conducted in RS described a P.1 lineage infection on November 30 followed by a P.2 lineage reinfection on March 11 in a patient with comorbidities. This report was the first detection of P.1 in the state (22). Other analyses suggest that the P.1 lineage presumably emerged in Manaus, Brazil, in mid-November 2020 (9,10). Therefore, the P.1 lineage was present in Southern Brazil only about 15 days after its emergence in Brazil. Our molecular clock analysis supported this scenario. Another study, once thought to be the first P.1 report in RS, documented local transmission of P.1 from a person who had close contact with tourists and tested positive for COVID-19 in early February 2021 (21).
This happened in the city of Gramado, a town in the mountains that receives around 6.5 million tourists every year and belongs to the Caxias do Sul intermediate region. Interestingly, this sample from Gramado was the earliest representative of a new P.1-derived lineage (P.1.2), described in 11 patients from this study and found in transmission clusters from the RJ state in Southeastern Brazil, the USA, and the Netherlands. Remarkably, these local sequences are more similar to genomes from other countries than to the RJ cluster, which acquired at least four additional mutations (including S:A262S) (17).

Whether P.1.2 has worse clinical outcomes than its parental variant (P.1) is unknown. However, as described above, the missense mutations characteristic of the new sublineage are located in nsp2 and nsp3 (ORF1ab), ORF3a, and the Nucleocapsid. These sites are known for their interaction with the human proteome, potentially influencing the immunological and inflammatory response against SARS-CoV-2 infection (89). The ORF3a:D155Y substitution is located near SARS-CoV caveolin-binding Domain IV. The binding interaction of the viral ORF3a protein with host caveolin-1 is essential for entry and endomembrane trafficking of SARS-CoV-2. Since this mutation breaks the salt bridge between Asp155 and Arg134, it can interfere with the binding affinity of ORF3a for host caveolin-1 and change virulence properties. Most importantly, this disrupted interaction may be associated with improved viral fitness, since it can avoid the induction of host cell apoptosis or extend the asymptomatic phase of infection (90).
We hypothesize that these new substitutions could, therefore, influence epidemiological and clinical outcomes, favouring P.1.2 evolution. This remains speculative at this time, however, and further characterization of the sublineage is needed to establish its real relevance.

Some limitations should be considered. Firstly, the sample size is low and not necessarily representative of the RS state. Furthermore, publicly available genomes are the result of episodic sequencing efforts, especially in Brazil. This scenario restricts more precise inferences about introductions and diffusion processes in regional and worldwide contexts, since samples are not geographically and temporally well distributed. Therefore, more research and surveillance are essential to achieve a more precise genomic characterization of SARS-CoV-2 in Brazil and to identify novel variants promptly, so as to better respond to and control their spread.

In summary, our study corroborates the virtually complete replacement of previous lineages by P.1 in Southern Brazil in COVID-19 cases sequenced in March 2021. Moreover, we confirmed various cases caused by the novel P.1.2 sublineage and placed its origin in the State of Rio Grande do Sul. The continuous evolution of the VOC P.1 is worrisome, considering its clinical and epidemiological impact, and warrants enhanced genomic surveillance.
Full tables acknowledging the authors and corresponding labs submitting sequencing data used in this study can be found in Supplementary Files 3 and 4. Consensus genomes generated in this study were deposited in the GISAID database under Accession IDs EPI_ISL_2139494 to EPI_ISL_2139549. Additional information used and/or analysed during the current study is available from the corresponding author on reasonable request.

The authors declare no competing interests.

directly contributed to this study.
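For readers who want to retrieve the deposited genomes, the accession range quoted in the data availability statement can be expanded into an explicit list of identifiers. The snippet below is a minimal sketch only: it assumes the range is contiguous and inclusive, and the helper name `expand_accessions` is hypothetical and not part of any GISAID tool.

```python
from typing import List


def expand_accessions(first: str, last: str) -> List[str]:
    """Expand an inclusive accession range such as EPI_ISL_100 .. EPI_ISL_105.

    Assumption: both IDs share the same prefix and differ only in the
    trailing integer, and every integer in between is a valid accession.
    """
    prefix_first, _, number_first = first.rpartition("_")
    prefix_last, _, number_last = last.rpartition("_")
    if prefix_first != prefix_last:
        raise ValueError("accession prefixes differ")
    start, end = int(number_first), int(number_last)
    if end < start:
        raise ValueError("range is reversed")
    return [f"{prefix_first}_{n}" for n in range(start, end + 1)]


# Range as quoted in the data availability statement.
accessions = expand_accessions("EPI_ISL_2139494", "EPI_ISL_2139549")
print(len(accessions), accessions[0], accessions[-1])
```

Under the contiguity assumption the list holds 56 identifiers; whether every identifier in the range belongs to this study should be checked against the Supplementary Files.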
Root-to-tip regression is depicted on the left of the tree and sequences from "Other continents" were dropped to improve visualization.

(A) Root-to-tip regression of genetic distances and sampling dates for Clade 1. Correlation coefficient and R squared are depicted above the graph.
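A root-to-tip regression of the kind named in these captions relates each sequence's genetic distance from the tree root to its sampling date. Under a strict molecular clock, the slope approximates the evolutionary rate and the x-intercept approximates the time of the most recent common ancestor. The following is a minimal sketch of that calculation, not the authors' pipeline. The function name `root_to_tip_regression` and the toy dates and distances are hypothetical, idealised values chosen only for illustration.

```python
from typing import List, Tuple


def root_to_tip_regression(
    dates: List[float], distances: List[float]
) -> Tuple[float, float, float, float]:
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    dates:     sampling dates as decimal years
    distances: root-to-tip genetic distances (substitutions per site)
    Returns (slope, intercept, r, r_squared).
    """
    n = len(dates)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two (date, distance) pairs")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0 or syy == 0:
        raise ValueError("dates and distances must both vary")
    slope = sxy / sxx                    # rate, substitutions/site/year
    intercept = mean_y - slope * mean_x  # distance at year 0 (not meaningful alone)
    r = sxy / (sxx * syy) ** 0.5         # Pearson correlation coefficient
    return slope, intercept, r, r * r


# Hypothetical, perfectly clock-like toy data (NOT taken from the study).
toy_dates = [2020.95, 2021.00, 2021.05, 2021.10, 2021.15, 2021.20]
toy_distances = [0.00010, 0.00014, 0.00018, 0.00022, 0.00026, 0.00030]

slope, intercept, r, r_squared = root_to_tip_regression(toy_dates, toy_distances)
tmrca = -intercept / slope  # x-intercept: date at which the fitted distance is zero
print(f"rate={slope:.2e} subs/site/year, tMRCA={tmrca:.3f}, r={r:.3f}, R2={r_squared:.3f}")
```

Real data are noisier than this toy set, so the correlation coefficient and R squared serve as a rough check of temporal signal before a Bayesian molecular clock analysis is attempted.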
World Health Organization. WHO Director-General's opening remarks at the media briefing on COVID-19 - 11
Evolution and epidemic spread of SARS-CoV-2 in Brazil
Rio Grande do Sul - Cidades e Estados
Regiões Geográficas
Governança e Gestão - Governo do Estado do Rio Grande do Sul. Cogestão Regional - Distanciamento Controlado
Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations
Detection of a SARS-CoV-2 variant of concern in South Africa
Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil
COVID-19 epidemic in the Brazilian state of Amazonas was driven by long-term persistence of endemic SARS-CoV-2 lineages and the recent emergence of the new Variant of Concern P
Three-quarters attack rate of SARS-CoV-2 in the Brazilian Amazon during a largely unmitigated epidemic
Resurgence of COVID-19 in Manaus, Brazil, despite high seroprevalence. The Lancet
Brazilian Ministry of Health. Painel Coronavírus Brasil
Mutation hotspots, geographical and temporal distribution of SARS-CoV-2 lineages in Brazil
Genomic surveillance of SARS-CoV-2 tracks early interstate transmission of P.1 lineage and diversification within P.2 clade in Brazil. medRxiv
Genomic Surveillance of SARS-CoV-2 in the State of Rio de Janeiro, Brazil: technical briefing - SARS-CoV-2 coronavirus / nCoV-2019 Genomic Epidemiology
Genomic characterization of a novel SARS-CoV-2 lineage from Rio de Janeiro, Brazil
Genomic Epidemiology of SARS-CoV-2 in Esteio
Pervasive transmission of E484K and emergence of VUI-NP13L with evidence of SARS-CoV-2 co-infection events by two different lineages in Rio Grande do Sul, Brazil. Virus Res
Epidemiological investigation reveals local transmission of SARS-CoV-2 lineage P.1 in Southern Brazil
Early detection of SARS-CoV-2 P.1 variant in Southern Brazil and reinfection of the same patient by P.2 [Internet]
Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR. Eurosurveillance
World Health Organization. COVID-19 Clinical management: living guidance
Genome Sequencing Using Long Pooled Amplicons on Illumina Platforms
Fast gapped-read alignment with Bowtie 2
The Sequence Alignment/Map format and SAMtools
A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data
karyoploteR: an R/Bioconductor package to plot customizable genomes displaying arbitrary data. Bioinforma Oxf Engl
Haplotype-based variant detection from short-read sequencing. ArXiv12073907 Q-Bio
A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly (Austin)
MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability
Du Plessis L. laduplessis/SARS-CoV-2_Guangdong_genomic_epidemiology: Initial release
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
Microreact: visualizing and sharing data for genomic epidemiology and phylogeography. Microb Genomics
Nextstrain: real-time tracking of pathogen evolution
IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies
Some probabilistic and statistical problems in the analysis of DNA sequences
TreeTime: Maximum-likelihood phylodynamic analysis
ModelFinder: fast model selection for accurate phylogenetic estimates
New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0. Syst Biol
Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen)
ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data
Comparative Analyses of Phylogenetics and Evolution in R
Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol [Internet]
BEAGLE: An Application Programming Interface and High-Performance Computing Library for Statistical Phylogenetics
Dating of the human-ape splitting by a molecular clock of mitochondrial DNA
Bayesian analysis of elapsed times in continuous-time Markov chains
Improving Bayesian Population Dynamics Inference: A Coalescent-Based Model for Multiple Loci
Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7
Bayesian Phylogeography Finds Its Roots
R: A Language and Environment for Statistical Computing. Austria: R Foundation for Statistical Computing
Elegant Graphics for Data Analysis
Loads Shapefiles of Official Spatial Data Sets of Brazil
Simple Features for R: Standardized Support for Spatial Vector Data
Interactive Visualization of Spatiotemporal History and Trait Evolutionary Processes
A new coronavirus associated with human respiratory disease in China
Structural basis and functional analysis of the SARS coronavirus nsp14-nsp10 complex
The Curious Case of the Nidovirus Exoribonuclease: Its Role in RNA Synthesis and Replication Fidelity. Front Microbiol
Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein
Transmission, infectivity, and neutralization of a spike L452R SARS-CoV-2 variant
The emergence and ongoing convergent evolution of the N501Y lineages coincides with a major global shift in the SARS-CoV-2 selective landscape. medRxiv
The Genetic Variant of SARS-CoV-2: would It Matter for Controlling the Devastating Pandemic?
The D614G mutations in the SARS-CoV-2 spike protein: Implications for viral infectivity, disease severity and vaccine design
Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus
Characterizations of SARS-CoV-2 mutational profile, spike protein stability and viral transmission
Functional importance of the D614G mutation in the SARS-CoV-2 spike protein
Higher infectivity of the SARS-CoV-2 new variants is associated with K417N/T, E484K, and N501Y mutants: An insight from structural data
Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding
Early transmissibility assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom
Emergence of a new SARS-CoV-2 variant in the UK
Adaptation of SARS-CoV-2 in BALB/c mice for testing vaccine efficacy
The new SARS-CoV-2 strain shows a stronger binding affinity to ACE2 due to N501Y mutant
Molecular dynamic simulation reveals E484K mutation enhances spike RBD-ACE2 affinity and the combination of E484K, K417N and N501Y mutations (501Y.V2 variant) induces conformational change greater than N501Y mutant alone, potentially resulting in an escape mutant. bioRxiv
Antibody cocktail to SARS-CoV-2 spike protein prevents rapid mutational escape seen with individual antibodies
Complete mapping of mutations to the SARS-CoV-2 spike receptor-binding domain that escape antibody recognition. Cell Host Microbe
Genomic Evidence of SARS-CoV-2 Reinfection Involving E484K Spike Mutation, Brazil. Emerg Infect Dis
Severe Acute Respiratory Syndrome Coronavirus 2 P.2 Lineage Associated with Reinfection Case, Brazil
E484K as an innovative phylogenetic event for viral evolution: Genomic analysis of the E484K spike mutation in SARS-CoV-2 lineages from Brazil. bioRxiv
A tale of three SARS-CoV-2 variants with independently acquired P681H mutations in New York State. medRxiv
A Multibasic Cleavage Site in the Spike Protein of SARS-CoV-2 Is Essential for Infection of Human Lung Cells
Evolutionary Dynamics and Dissemination Pattern of the SARS-CoV-2 Lineage B.1.1.33 During the Early Pandemic Phase in Brazil
Recurrent dissemination of SARS-CoV-2 through the Uruguayan-Brazilian border. medRxiv
Phylogenetic relationship of SARS-CoV-2 sequences from Amazonas with emerging Brazilian variants harboring mutations E484K and N501Y in the Spike protein
Persistence and Evolution of SARS-CoV-2 in an Immunocompromised Host
SARS-CoV-2 evolution during treatment of chronic infection
Identification of SARS-CoV-2 P.1-related lineages in Brazil provides new insights about the mechanisms of emergence of Variants of Concern - SARS-CoV-2 coronavirus / nCoV-2019 Genomic Epidemiology
The variant gambit: COVID-19's next move
D155Y Substitution of SARS-CoV-2 ORF3a Weakens Binding with Caveolin-1: An in silico Study. bioRxiv

We thank the administrators of the GISAID database and research groups across the world for supporting the rapid and transparent sharing of genomic data during the COVID-19 pandemic. We also thank the staff of Hospital da Brigada Militar, Laboratório Exame, Beppler & Puppi Advogados, Smellbox Produtos de Higiene Ltda., Dr. Leonardo Mestre Negri, Florense Brands, and Biome that