key: cord-1006337-u0tu0zvg
authors: Franceschi, Vinícius Bonetti; Caldana, Gabriel Dickin; Perin, Christiano; Horn, Alexandre; Peter, Camila; Cybis, Gabriela Bettella; Ferrareze, Patrícia Aline Gröhs; Rotta, Liane Nanci; Cadegiani, Flávio Adsuara; Zimerman, Ricardo Ariel; Thompson, Claudia Elizabeth
title: Predominance of the SARS-CoV-2 Lineage P.1 and Its Sublineage P.1.2 in Patients from the Metropolitan Region of Porto Alegre, Southern Brazil in March 2021
date: 2021-08-05
journal: Pathogens
DOI: 10.3390/pathogens10080988
sha: ce5ad62a0e5f99f9622ceceec60ca0700371e8fe
doc_id: 1006337
cord_uid: u0tu0zvg

Almost a year after the COVID-19 pandemic had begun, new lineages (B.1.1.7, B.1.351, P.1, and B.1.617.2) associated with enhanced transmissibility, immunity evasion, and mortality were identified in the United Kingdom, South Africa, and Brazil. The previous most prevalent lineages in the state of Rio Grande do Sul (RS, Southern Brazil), B.1.1.28 and B.1.1.33, were rapidly replaced by P.1 and P.2, two B.1.1.28-derived lineages harboring the E484K mutation. To perform a genomic characterization from the metropolitan region of Porto Alegre, we sequenced viral samples to: (i) identify the prevalence of SARS-CoV-2 lineages in the region, the state, and bordering countries/regions; (ii) characterize the mutation spectra; (iii) hypothesize viral dispersal routes by using phylogenetic and phylogeographic approaches. We found that 96.4% of the samples belonged to the P.1 lineage and approximately 20% of them were assigned as the novel P.1.2, a P.1-derived sublineage harboring signature substitutions recently described in other Brazilian states and foreign countries. Moreover, sequences from this study were allocated in distinct branches of the P.1 phylogeny, suggesting multiple introductions in RS and placing this state as a potential diffusion core of P.1-derived clades and the emergence of P.1.2. It is uncertain whether the emergence of P.1.2 and other P.1 clades is related to clinical or epidemiological consequences. However, the clear signs of molecular diversity from the recently introduced P.1 warrant further genomic surveillance.

After its initial emergence in Wuhan (China) in late 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spread rapidly around the world leading to the February 2021 [25] . However, in a more recent study, the actual first P.1 was detected on 30 November 2020. This happened in a patient with comorbidities from Campo Bom city, who was reinfected by the P.2 lineage on 11 March 2021 [26] .

Even though RS was one of the least affected Brazilian states in the first epidemic wave, it suffered a pronounced increase in cases in late 2020 [16] . In February 2021, the progressive increases in cases and hospitalizations (3.8-fold) led to the collapse of the local state healthcare system. Since recent findings of the widespread dissemination of the SARS-CoV-2 lineage P.1 in Brazil have been confirmed, we sequenced samples from patients from the metropolitan region of Porto Alegre to: (i) identify the prevalence of SARS-CoV-2 lineages in the region, the state, and bordering countries/regions; (ii) characterize the mutation spectra; (iii) hypothesize possible viral dispersal routes by using phylogenetic and phylogeographic approaches.

Of the 56 samples of hospitalized patients between March 9 and 17 2021, 75.0% (n = 42) of them were male, and the mean age was 37.2 years (interquartile range (IQR): 13.5 years). The mean cycle threshold (Ct) value for the first RT-qPCR conducted at Laboratório Exame was 19.12 cycles (median: 18.00; IQR: 6.00 cycles). Forty-seven (83.9%) had contact with a confirmed or suspected case. The majority of them were from the RS state capital, Porto Alegre (n = 32; 57.1%). In total, 51 (91.1%) were from the intermediate geographic region of Porto Alegre and 5 (8.9%) from the intermediate region of Santa Maria (Table 1 and Figure S2C ).

Consensus SARS-CoV-2 genomes were obtained with an average coverage depth of 813.2× (median: 820.6×; IQR: 184.7×) ( Figure S6 ). We detected 175 different mutations comprising all samples ( Figure 1A ). The ORF1ab carried 102 (58.3%) replacements followed by spike (n = 24; 13.7%), nucleocapsid (n = 18; 10.3%), ORF3a (n = 14, 8.0%), ORF7a (n = 6; 3.43%), ORF8 (n = 5; 2.86%), and membrane (n = 3; 1.7%) genes. Remarkably, 50% of the spike substitutions occurred in only one genome, and, of these, nine (75.0%) were missense (Table S1 ). Fifty-nine (33.7%) mutations were identified in two or more sequences. From these, 36 (61.0%) are non-synonymous (missense), 21 (35.6%) are synonymous, 1 (1.7%) is intergenic at 5 Untranslated Region (UTR), and 1 (1.7%) is an inframe deletion. Highly frequent (≥10 genomes) mutations were found in 34 genomic positions, 24 (70.5%) being missense and 9 (26.5%) synonymous. Fifteen substitutions (10 in the spike protein: L18F, T20N, P26S, D138Y, R190S, K417T, E484K, N501Y, H655Y, and T1027I) are P.1 lineagedefining mutations ( Figure 1B and Table 2 ). The only P.1 defining replacement not found at high frequency in our study was the deletion in ORF1ab (del 11288:11296), called in only four genomes. Deletions overlapping annealing sites of amplicon primers are associated with a strong decrease in the PCR efficiency prior to sequencing, leading to low genomic coverage [27] . Then, after applying a stringent coverage depth filter (DP > 50) for calling the genomic positions in the consensus sequences, this deleted region was flagged as low coverage.

Most importantly, other positions presenting single nucleotide polymorphisms (SNPs) reached the appropriate threshold, since a point mutation is generally unable to cause dropouts.

After comparing the frequency of mutations from the recently sequenced samples and the Brazilian P.1 genomes, we observed a combination of mutations that stood out in a significant proportion (n = 11; 19.6%) compared with previous P.1 sequences from Brazil. This combination was previously described [21] and gave rise to the P.1.2 lineage, which harbors three ORF1ab replacements (synC1912T, D762G, and T1820I), one in ORF3a (D155Y), and one in N protein (synC28789T) ( Table 2) . Additionally, two of these genomes (18.2%) carry T11296G (ORF1ab nsp6: F3677L) and eight (72.7%) harbor G25641T (ORF3a: L83F) substitutions. Another cluster, made of four local genomes and subsequently named Clade 2, was also detected. This clade possesses three defining mutations (ORF1ab nsp4: V2862L, synC10507T, and ORF3a: M260K), but it does not fall into a lineage designation at this moment but deserves further monitoring ( Figure S1 ). found at high frequency in our study was the deletion in ORF1ab (del 11288:11296), called in only four genomes. Deletions overlapping annealing sites of amplicon primers are associated with a strong decrease in the PCR efficiency prior to sequencing, leading to low genomic coverage [27] . Then, after applying a stringent coverage depth filter (DP > 50) for calling the genomic positions in the consensus sequences, this deleted region was flagged as low coverage. Considering PANGO lineages, 54 genomes (96.4%) were designated as P.1, one (1.8%) as P.2, and one (1.8%) as B.1.1.28. Even without being classified according to the Pangodesignation's most updated version, the P.1.2 lineage was present in 11/54 (20.4%) of the P.1 sequences (https://github.com/cov-lineages/pango-designation/issues/56; accessed on 6 May 2021) ( Figure S1 ). (Figure 2A (Porto Alegre, Canoas, São Sebastião do Caí, and Santa Maria), demonstrating the possible diversification of P.1 and its spread within RS ( Figure 2C and Figure S1 ). After dividing RS into the intermediate regions proposed by IBGE ( Figure S2 ), it was possible to gain insights into the dynamics of the lineages in the state, despite the low sample size of some regions ( Figure 2C ). In most regions, the lineages B.1.1.28 and B.1.1.33 were more prevalent, but P.2 was also detected. In fact, in the Caxias do Sul region, more P.2 (n = 31; 49.2%) were sequenced in relation to other lineages. Since the Porto Alegre region has a larger sample size, we divided its results by year to check the most recent (2021) evolutionary abundance. In 2020, B.1.1.33 (n = 229; 49.3%) and B.1.1.28 (n = 137, 29.5%) were the most abundant, followed by P.2 (n = 37; 8.0%). In 2021, P.1 (n = 26, 68.4%) and P.2 (n = 6, 15.8%) have already outperformed the other lineages ( Figure 2C ). In our study from March 2021, 96.4% of the samples were classified as P.1. We were able to identify a new P.1 sublineage (P.1.2) in 11 (20.4%) genomes from four different municipalities (Porto Alegre, Canoas, São Sebastião do Caí, and Santa Maria), demonstrating the possible diversification of P.1 and its spread within RS ( Figure 2C and Figure S1 ).

After running the Nextstrain workflow using quality control and subsampling approaches, we obtained a dataset of 8635 time-and geographical-representative genomes. From these, 861 were from Africa, 1370 from Asia, 2219 from Europe, 481 from North America, 218 from Oceania, and 3486 from South America. Brazil was represented by 2608 sequences and RS state by 730 sequences (56 from this study and 674 available in GISAID) ( Table S2) .

The time-resolved ML phylogenetic tree confirmed the PANGO lineages assigned, since 54 genomes (96.4%) grouped with P.1 representatives, 1 (1.8%) with B.1.1.28, and 1 (1.8%) with P.2 sequences. We also observed a strong correlation between genetic distances and sampling dates (R 2 = 0.71). The P.1 sequences were grouped above the regression line, showing higher evolutionary rates than the other lineages in the SARS-CoV-2 phylogeny, as observed in other studies [11, 14] . We highlighted the most abundant global lineages present in RS state that passed the quality control criteria (B.1.1 (n = 32), P.1 (n = 83), P.2 (n = 83), B.1.1.28 (n = 203), and B.1.1.33 (n = 286)). We also noticed the high abundance of B.1.1.28 and B.1.1.33 lineages until October and November 2020, followed by the rise and establishment of P.2 and P.1, respectively ( Figure 3A ).

The only B.1.1.28 sequence identified in this study (RS-HBM-39491) branched in a clade represented by 30 sequences, mostly represented by Southeastern Brazilian (n = 17; 56.6%) genomes. This clade is supported by the ORF1ab:synC15810T mutation and includes a subclade characterized by the ORF1ab:L4182F mutation, where the local sequence is placed together with four samples from São Paulo (SP), one from Portugal, and one from Chile ( Figure S4 ). Most importantly, this local genome harbors seven other mutations: ORF1ab:T2087I (nsp3), D3022N (nsp4), N3970S (nsp8), V4436A (RNA-dependent RNA polymerase), synC13724T, synG18973A, and intergenic:G29736T. Additionally, the only P.2 sequence from this study (RS-HBM-39486) formed a separate clade composed of 20 sequences from several Brazilian states (10 from RS, 1 from Paraná, 1 from SP, and 1 from Maranhão), 5 from Argentina, 1 from USA, and 1 from Norway ( Figure S5 ). This clade is characterized by the ORF1ab:synT6218C (nsp3) mutation. Moreover, this local sequence accrued seven specific mutations: ORF1ab:synA7201G, S2926F (nsp4), V6871A (2 -O-ribose methyltransferase), S:G1251V, ORF7a:G38V, N:synC28333T, and intergenic:G29688T.

To get a more detailed understanding of the P.1 diffusion throughout Rio Grande do Sul, other Brazilian regions, and worldwide, we built an ML tree of 4499 genomes belonging to this lineage (Table S3) . P.1 sequences from this study were allocated into several distinct branches, suggesting multiple introductions and the formation of different P.1-derived clades and clusters.

Pathogens 2021, 10, x FOR PEER REVIEW 10 of 23 synG10096A, G3676S (nsp6), F3677L (nsp6)) and M:synT26861C. This clade is mainly found in SP (n = 11), RS (n = 7), Santa Catarina (n = 5), BA (n = 4), and Goiás (n = 4), as well as other countries (mainly USA, Chile, and England). Methods) . Root-to-tip regression is depicted on the left of the tree and sequences from "other continents" were dropped to improve visualization.

We identified 4 clades, 5 clusters, and 13 isolated sequences ( Figure 3B and Figure S7 ). Most importantly, Clade 1 had high branch-support (SH-aLRT = 76.3) and was composed of 11 sequences originated in this study that shared five lineage-defining mutations as previously described (Table 2 ) and were recently attributed to the P.1.2 sublineage (https: //github.com/cov-lineages/pango-designation/issues/56; accessed on 6 May 2021). As of April 26, 2021, this sublineage is already distributed worldwide in 93 sequences (the Netherlands, Spain, England, and USA) and in other Brazilian states (Rio de Janeiro (RJ) and SP) [21] . Clade 2 sequences harbored two mutations in ORF1ab:V2862L (nsp4) and synC10507T and one in ORF3a:M260K, and it comprised 81 genomes. Four samples are from this study. The majority are from Amazonas (n = 15), São Paulo (n = 11), RS (n = 8), and Bahia (BA) (n = 4), and worldwide sequences are mainly represented by French Guiana, USA, Spain, Japan, and Jordan. Clade 3 is represented by three ORF1ab mutations (synC1420T, D1600N [nsp3], and synT8392A) in three of the seven local genomes. It is composed of sequences from RS (n = 25), SP (n = 15), Maranhão (n = 10), and RJ (n = 8), as well as other countries (mainly Spain, French Guiana, and USA). Clade 4 is characterized by two ORF1ab substitutions (G400S (nsp2) and S6822I (2 -O-ribose methyltransferase)), one N:synT26861C in three genomes, and carries other additional mutations (ORF1ab: synG10096A, G3676S (nsp6), F3677L (nsp6)) and M:synT26861C. This clade is mainly found in SP (n = 11), RS (n = 7), Santa Catarina (n = 5), BA (n = 4), and Goiás (n = 4), as well as other countries (mainly USA, Chile, and England).

Clusters 1 and 3 have, respectively, one (ORF1ab: G3676S (nsp6)) and two (ORF1ab: synC1471T and A1049V (nsp3)) shared mutations. Among all identified clusters, the most diverse was Cluster 5, which contains three samples from this study and has five defining mutations: four in ORF1ab (synT4705C, synC11095T, syn11518, and T5541I (helicase)) and one in ORF7a: E16D. Moreover, two sequences share one distinct mutation (ORF1ab: F3677L (nsp6)).

To date the time of the most recent common ancestor (TMRCA) and the diffusion of the four P.1 clades identified in our ML analysis, we used coalescent and phylogeographic methods. For Clade 1, which is correspondent to the recently labeled P.1.2 lineage, sequences showed a moderate correlation of genetic distances and sampling dates (correlation coefficient: 0.59, R 2 = 0.34) ( Figure 4A ). We estimated a median evolutionary rate of 7.68 × 10 −4 (95% highest posterior density interval [HPD]: 4.18 × 10 −4 to 1.14 × 10 −3 subst/site/year) and the TMRCA on 18 December 2020 (95% HPD: 29 October 2020 to 31 January 2021). Interestingly, the tree's root was placed in RS, between a sequence from RS (the oldest sequence from this clade: EPI_ISL_983865) and a subclade from USA. The divergence of these subclades was dated on 15 January 2021 (95% HPD: 15 January to 26 March 2021). The subclade composed of the RS sequences formed two separate clusters, one with three sequences from this study and one Australian genome and another composed of sequences from RS, SP, UK, Portugal, USA, and transmission clusters from RJ and Netherlands ( Figure 4B ). The emergence of an important cluster in RJ carrying additional mutations [21] was dated here on 11 March 2021 (95% HPD: 11 March to 6 April 2021). As American sequences formed a separate subclade, local transmission is probably occurring in the country. The divergence of the American subclade was For Clade 2, we estimated a median evolutionary rate of 5.85 × 10 −4 (95% HPD: 4.18 × 10 −4 to 7.71 × 10 −4 subst/site/year), and the TMRCA was dated November 30, 2020 (95% HPD: November 2 to December 21, 2020). This clade includes sequences from 11 Brazilian states from all 5 regions and 9 other countries. We were able to detect at least five introductions from Amazonas, where this clade probably emerged. These introductions However, it is possible that this lineage emerged in another Brazilian state, but its earlier representatives were not sampled. This is a strong hypothesis since this sequence is associated with community transmission after contact with tourists in a city of RS (Gramado) that receives numerous visitors annually [25] .

For Clade 2, we estimated a median evolutionary rate of 5.85 × 10 −4 (95% HPD: 4.18 × 10 −4 to 7.71 × 10 −4 subst/site/year), and the TMRCA was dated 30 November 2020 (95% HPD: 2 November to 21 December 2020). This clade includes sequences from 11 Brazilian states from all 5 regions and 9 other countries. We were able to detect at least five introductions from Amazonas, where this clade probably emerged. These introductions ranged from December 28, 2020 (95% HPD: 28 December 2020 to 5 January 2021) to 28 January 2021 (95% HPD: 28 January to 7 March 2021). Importantly, we identified a well-supported subclade (PP = 1) of four genomes from this study ( Figure 5A ). 

In this study, the analysis of 56 samples from the state of Rio Grande do Sul (RS), Southern Brazil, confirmed that the P.1 lineage was already highly prevalent. Interestingly, we demonstrated that P.1 is already showing signs of diversification and has originated a new sublineage (P.1.2). Herein, we indicate the likely origin and the first clusters For Clade 3, the TMRCA was estimated on 20 December 2020 (95% HPD: November 25 to 29 December 2020) and the median evolutionary rate was 7.85 × 10 −4 (95% HPD: 6.06 × 10 −4 to 1.02 × 10 −3 subst/site/year). This clade harbors sequences from 9 Brazilian states and 10 other countries. Amazonas is the most probable source of its emergence. From then onwards, multiple transmission clusters were established in foreign countries (e.g., Spain, Portugal, and USA) and Brazilian states (especially Maranhão, SP, and RS). This clade was introduced at least 5 times in RS, leading to 2 major subclades represented by 18 and 4 sequences, respectively. The major subclade (n = 18, PP = 0.98) was dated 11 January 2021 (95% HPD: 11 January to 1 February 2021) ( Figure 5B ).

For Clade 4, the TMRCA was dated on 2 December 2020 (95% HPD: 7 October 2020 to 3 January 2021), and the median evolutionary rate was 6.26 × 10 −4 (95% HPD: 3.51 × 10 −4 to 1.01 × 10 −3 ). This clade comprises nine Brazilian states and five foreign countries. After its initial emergence and spread in Amazonas, it had already formed transmission clusters in SP, BA, United Kingdom, and USA. Most importantly, a subclade containing sequences from two neighboring states from Southern Brazil (seven from RS and five from Santa Catarina (SC)) indicates its diffusion from RS to SC, probably leading to two separate introductions. The divergence of this subclade was estimated on 16 December 2020 (95% HPD: 16 December 2020 to 19 January 2021) ( Figure 5C ).

Phylogenetic and molecular clock approaches suggest the wide circulation of the VOC P.1 both nationally and internationally between late 2020 and early 2021. This lineage has already diversified into some clades that bear characteristic mutations, although they exhibit similar evolutionary rates. We inferred that P.1 (and its derived clades) was introduced multiple times in the southernmost Brazilian state (RS) between mid-December 2020 and January 2021. Remarkably, this date is close to the first P.1 detection in Manaus, which is located~4000 km away. These early introductions led to the formation of local subclades that could be identified even using a reduced set of sequenced samples.

In this study, the analysis of 56 samples from the state of Rio Grande do Sul (RS), Southern Brazil, confirmed that the P.1 lineage was already highly prevalent. Interestingly, we demonstrated that P.1 is already showing signs of diversification and has originated a new sublineage (P.1.2). Herein, we indicate the likely origin and the first clusters of this novel lineage. This sublineage was detected in three Brazilian states, and other countries, and its most recent common ancestor was dated on mid-December, 2020 (95% HPD: 29 October 2020 to 31 January 2021). In accordance with the majority of the states from Brazil, this state experienced significant increases in hospitalizations in early 2021. This scenario was related to the emergence and rapid spread of the P.1 variant across the country.

After almost one year of relatively slow SARS-CoV-2 evolution, the emergence of multiple and convergent lineages harboring a constellation of mutations in the spike protein raised concern in the scientific community. This protein is responsible for mediating interaction with the human Angiotensin-Converting Enzyme 2 receptor (hACE2) and is a primary target of neutralizing antibodies and vaccines [28] . The variants harboring different mutational signatures, including spike protein substitutions, were classified as VOCs and "variants of interest" (VOIs), depending on their growing relevance in the current pandemic. The first three VOCs emerged in England (B.1.1.7) [8] , South Africa (B.1.351) [9] , and Brazil (P.1) [10] . More recently, B.1.617.2 (India) [12, 13] also was characterized as a VOC. By May 2020, B.1.427/429 [29] , B.1.526 (New York, USA), B.1.617 (India), and P.2 (Brazil) [22] were categorized as VOIs. B.1.1.7, B.1.351, and P.1, the most studied VOCs, have the D614G and N501Y mutations in common. B.1.351 and P.1 share a mutation in the K417 site (K417N and K417T, respectively) and the E484K replacement, which is also observed in the P.2 lineage. Additionally, B.1.1.7 carries the P681H substitution in the furin-cleavage site and multiple VOIs bear the L452R substitution [7] . The presence of common substitutions in different SARS-CoV-2 lineages suggests co-evolutionary and convergent mutational processes [8] [9] [10] 30 ].

In the present study, we noticed that B.1.1.33 and B.1.1.28 lineages, detected at the beginning of the pandemic in Brazil [2] , had been similarly prevalent in different regions until September 2020, before the appearance of P.2 (in October) and P.1 (in December 2020). The B.1.1.33 lineage shows variable abundance in different Brazilian states (ranging from 2% in Pernambuco to 80% in Rio de Janeiro), with moderate prevalence in South American countries (5-18%) . Surprisingly, this lineage was firstly detected in early March 2020 in other American countries (e.g., Argentina and USA). Apparently, an intermediate strain probably emerged in Europe and subsequently spread to Brazil, where its spread gave rise to B.1.1.33 [31] and possibly triggered secondary outbreaks in Argentina and Uruguay [31, 32] . We found that N.3 and N.5, both derived from B.1.1.33, represented an important proportion of the sequences from Argentina from May to December 2020, when it was replaced by the P.2 lineage, which probably emerged in Rio de Janeiro (Southeastern Brazil). The B.1.1.28 lineage, despite apparently being less abundant than B.1.1.33 in several Brazilian regions, quickly diversified into two variants: VOC P.1 and VOI P.2 [33] . Since the end of 2020, these two lineages have lead the diversity of SARS-CoV-2 in Brazil [17] and have caused concern in other countries after several introductions. Regarding the distribution of sequenced samples across RS state, the cumulative frequency of B.1.1.28 and B.1.1.33 was higher until mid-April 2021 [16] . However, since the end of 2020 and beginning of 2021, a rise in P.1 and P.2 sequences was observed. Our study supported that P.1 outperformed other lineages in RS as of March 2021, although the collection of samples in hospitalized patients and low geographic representativeness does not allow the extrapolation of these findings.

The emergence of a B.1.1.28 derived lineage carrying the S:E484K mutation (P.2) was dated, in a retrospective study, late February 2020 in the Southeast (São Paulo and Rio de Janeiro), followed by transmission to the South (especially RS). Since then, multiple dispersion routes were observed between Brazilian states, especially in late 2020 and early 2021 [18] . However, this lineage went unreported until October 2020, when it was first detected in the state of Rio de Janeiro [22] and in the small municipality of Esteio in RS [23] . The increased frequency of B.1.1.28 and derived lineages was corroborated by another study that included samples from several municipalities of RS in November 2020. This study found that 86% of the genomes could be classified as B.1.1.28 and~50% of these, in fact, belong to the new lineage P.2 [24] . Nonetheless, our current study suggests that P.2 has already been nearly entirely replaced by the P.1 lineage or is not particularly well represented among the analyzed patients seeking emergency consultation or requiring hospitalization.

Between June and October 2020, an extremely high seroprevalence (44-76%) was observed in Manaus (Amazonas, Brazil) in a study from blood donors [14] . However, despite these numbers, Manaus faced a resurgence of cases and a six-fold increase in hospitalizations between December 2020 and January 2021. The most plausible hypotheses that would justify this condition are: (i) the previous overestimation of seroprevalence in Manaus; (ii) the immune evasion property of some SARS-CoV-2 mutations found in VOCs; (iii) higher transmissibility and pathogenicity of SARS-CoV-2 lineages circulating in the second wave compared with pre-existing lineages [15] .

A genomic epidemiology study that used 250 SARS-CoV-2 genomes from 25 different municipalities from Amazonas sampled between March 2020 and January 2021 shows that the first exponential phase in the state was driven mainly by the spread of lineage B.1.195, which was gradually replaced by B.1.1.28. The second wave coincided with the emergence of P.1 in November, which rapidly replaced the parental lineage (<2 months) [11] and whose emergence was preceded by a period of rapid molecular evolution [10] . Importantly, rapid accumulation of mutations over short timeframes have been reported in chronically infected or immunocompromised hosts [34, 35] . However, preliminary findings pointed to the existence of P.1 intermediate lineages, suggesting that the constellation of mutations defining P.1 were acquired at sequential steps during multiple rounds of infections instead of within a single long-term infected individual [36] . The VOC P.1 carries three deletions, four synonymous substitutions, a four base-pair nucleotide insertion, and at least 17 other lineage-defining replacements, including 10 missense mutations in the spike protein (L18F,  T20N, P26S, D138Y, R190S, K417T, E484K, N501Y, H655Y, and T1027I) , 8 of which are subjected to positive selection [10] .

Regarding infectiousness, transmissibility, and case fatality, the viral load was~10-fold higher in P.1 infections than in non-P.1 infections [11] . Although another study points to uncertainties regarding viral load and duration of infection after accounting for confounding effects [10] . Moreover, it was estimated to be 1.7-2.4-fold more transmissible, raising the probability that reinfections would be caused more frequently in hosts infected by P.1 rather than by older lineages. Remarkably, infections were 1.2-1.9 times more likely to result in death in the period following the emergence of P.1 compared to previous time frames [10] . These findings support that successive lineage replacements in Amazonas were driven by a complex combination of factors, including the emergence of the more transmissible VOC P.1 virus [11] .

A study conducted in RS described a P.1 lineage infection on 30 November 2020 followed by a P.2 lineage reinfection on 11 March 2021 in a patient with comorbidities. This report was the first detected P.1 in the state [26] . Other analyses suggest that the P.1 lineage presumably emerged in Manaus, Brazil, in mid-November 2020 [10, 11] . Therefore, the P.1 lineage was present in Southern Brazil about a few days after its emergence in Manaus, Northern Brazil. Our molecular clock analysis supported this scenario. Another study, once thought to be the first P.1 report in RS, documented local transmission of P.1 from a person who had close contact with tourists and was positive for COVID-19 in early February 2021 [25] . This happened in the city of Gramado, a town in the mountains that receives around 6.5 million tourists every year and belongs to the Caxias do Sul intermediate region. Interestingly, this sample from Gramado was the earliest representative of a new P.1-derived lineage (P.1.2), described in 11 patients from our study and found in transmission clusters from the RJ state in Southeastern Brazil, USA, and the Netherlands. Remarkably, our local sequences are more similar to genomes from other countries compared to the RJ cluster, which acquired at least four additional mutations (including S:A262S) [21] .

Whether P.1.2 has worse clinical outcomes than its prior variant (P.1) is unknown. However, as described above, the missense mutations characteristic of the new sublineage are located at nsp2 and nsp3 (ORF1ab), ORF3a, and nucleocapsid. These sites are known for their interaction with human proteome, potentially influencing the immunological and inflammatory response against SARS-CoV-2 infection [37] . The ORF3a:D155Y substitution is located near SARS-CoV caveolin-binding Domain IV. The binding interaction of viral ORF3a protein to host caveolin-1 is essential for entry and endomembrane trafficking of SARS-CoV-2. Since this mutation breaks the salt bridge formation between Asp155 and Arg134, it can interfere with the binding affinity of ORF3a to host caveolin-1 and change virulence properties. Most importantly, this disrupted interaction may be associated with improved viral fitness, since it can avoid the induction of host cell apoptosis or extend the asymptomatic phase of infection [38] . We hypothesize that these new substitutions could, therefore, influence epidemiological and clinical outcomes favoring P.1.2 evolution. This is elusive at best at this time, however, and further sublineage characterization is needed to further explore its real relevance.

Some limitations should be considered. Firstly, the sample size is low and not necessarily representative of RS state. Considering the number of sequences from each intermediate region in RS available in GISAID, it is very likely that the distribution seen on the map ( Figure 2C ) is a consequence of sampling at different times in these localities or simple randomness. Thus, it should not be assumed as a true representation of the spatial diversity in the state. Since publicly available genomes are a result of episodic sequencing efforts, especially in Brazil, more precise inferences about introductions and diffusion processes in regional and worldwide contexts are restricted due to the lack of proper geographical and temporal distribution of the samples. Therefore, more research and surveillance are essential to unravel a more precise genomic characterization of SARS-CoV-2 in Brazil, promptly identifying novel variants to better respond and control its spread.

In summary, our study corroborates the total virtual substitution of previous lineages by P.1 in Southern Brazil in COVID-19 cases sequenced in March 2020. Moreover, we confirmed various cases caused by the novel P.1.2 sublineage and placed its origin in the state of Rio Grande do Sul. The continuous evolution of the VOC P.1 is concerning, considering its clinical and epidemiological impact, and warrants enhanced genomic surveillance.

Samples were obtained from Hospital da Brigada Militar patients, both admitted or visiting the emergency ward, from Porto Alegre, RS, Brazil. Nasopharyngeal swabs were collected and placed in saline solution. Samples were transported to the clinical laboratory (Laboratório Exame) and tested on the same day for SARS-CoV-2 using Real-Time Reversetranscriptase Polymerase Chain Reaction (Charité RT-qPCR assays). The RTq-PCR assay used primers and probes recommended by the World Health Organization targeting the nucleocapsid (N1 and N2) genes [39] . Remnant samples were stored at −20 • C.

Between 9 March and 17 March 2021, all routinely tested samples of the clinical laboratory provenient of the Hospital da Brigada Militar patients and yielded positive RT-qPCR were selected. Subsequently, those positive clinical samples were submitted to a second RT-qPCR performed by BiomeHub (Florianópolis, Santa Catarina, Brazil), using the same protocol (charite-berlin). Only samples with quantification cycle (Cq) below 30 for at least one primer were submitted to the SARS-CoV-2 genome sequencing. In total, 56 patients who presented symptoms such as fever, cough, sore throat, dyspnea, anosmia, fatigue, diarrhea, and vomiting (moderate and severe clinical status) [40] were included in the study.

Total RNAs were prepared as in the reference protocol [41] using SuperScript IV (Invitrogen, Carlsbad, CA, USA) for cDNA synthesis and Platinum Taq High Fidelity (Invitrogen, Carlsbad, CA, USA) for specific viral amplicons. Subsequently, cDNA was used for the library preparation with Nextera Flex (Illumina, San Diego, CA, USA) and quantified with Picogreen and Collibri Library Quantification Kit (Invitrogen, Carlsbad, CA, USA). The sequencing was performed on the Illumina MiSeq (Illumina, San Diego, CA, USA) 150 × 150 runs with 500xSARS-CoV-2 coverage (50-100 thousand reads/sample).

Quality control, reference mapping, and consensus calling were performed using an in-house pipeline developed by BiomeHub (Florianópolis, Santa Catarina, Brazil). Briefly, adapters were removed, and reads were trimmed by size = 150. Reads were mapped to the reference SARS-CoV-2 genome (GenBank accession number NC_045512.2) using Bowtie v2.4.2 (end-to-end and very-sensitive parameters) [42] . Mapping coverage and depth were retrieved using samtools v1.11 [43] (minimum base quality per base (Q) ≥ 30). Consensus sequences were generated using bcftools mpileup (Q ≥ 30; depth (d) ≤ 1000) combined with bcftools filter (DP > 50) and bcftools consensus v1.11 [44] . Coverage values for each genome were plotted using the karyoploteR v1.12.4 R package [45] . Finally, we assessed the consensus sequences quality using Nextclade v0.14.2 (https://clades.nextstrain.org/; accessed on 4 May 2021).

SNPs and insertions/deletions in each sample were identified using snippy variant calling and core genome alignment pipeline v4.6.0 (https://github.com/tseemann/snippy; accessed on 4 May 2021), which uses FreeBayes v1.3.2 [46] to call variants and snpEff v5.0 [47] to annotate and predict their effects on genes and proteins. Genome map and SNP histogram were generated after running MAFFT v7.475 [48] alignment using the msastats.py script, and plotAlignment and plotSNPHist functions [49] . Sequence positions refer to GenBank RefSeq sequence (NC_045512.2), isolated and sequenced from an early case from Wuhan (China) in 2019.

We identified global virus lineages using the dynamic nomenclature implemented in Pangolin v2.3.8 [50] (https://github.com/cov-lineages/pangolin; accessed on 4 May 2021) and global clades and mutations using Nextclade v0.14.2 (https://clades.nextstrain.org/; accessed on 4 May 2021). We also used Pathogenwatch (https://pathogen.watch/; accessed on 4 May 2021) and Microreact [51] to explore mutations and lineages across time and geography initially.

All available SARS-CoV-2 genomes (1,048,519 sequences) were obtained from GISAID on April 26, 2021 and combined with our 56 sequences to obtain a global representative dataset. These sequences were subjected to analysis inside the NextStrain ncov pipeline [52] (https://github.com/nextstrain/ncov; accessed on 4 May 2021). In this workflow, sequences were aligned using Nextalign v0.1.6 (https://github.com/neherlab/nextalign; accessed on 4 May 2021). In the initial filtering step, short and low-quality sequences or those with incomplete sampling dates were excluded. Uninformative sites and ends (100 positions in the beginning and 50 in the end) were also masked from the alignment. Genetically closely related genomes to our focal subset were selected, prioritizing sequences geographically closer to Brazil's RS state. The maximum likelihood (ML) phylogenetic tree was built using IQ-TREE v2.1.2 [53] , employing the general time-reversible (GTR) model with unequal rates and base frequencies [54] . The tree's root was placed between lineage A and B (Wuhan/WH01/2019 and Wuhan/Hu-1/2019 representatives), and sequences that deviate more than four interquartile ranges from the root-to-tip regression of genetic distances against sampling dates were removed from the analysis. A time-scaled ML tree was generated with TreeTime v0.8.1 [55] under a strict clock and a skyline coalescent prior with a mean rate of 8 × 10 −4 substitutions per site per year. Finally, clades and mutations were assigned and geographic movements inferred. The results were exported to JSON format to enable interactive visualization through Auspice.

Additionally, as P.1 sequences mostly represent our dataset, we downloaded all complete and high-quality global genomes assigned to P.1 PANGO lineage (4499 sequences) submitted until 26 April 2021. These sequences were aligned using MAFFT v7.475, the ends of the alignment (300 in the beginning and 500 in the end) were masked, and the ML tree was built with IQ-TREE v2.0.3 using the GTR + F + R3 nucleotide substitution model as selected by the ModelFinder [56] . Branch support was calculated using the Shimodaira-Hasegawa approximate likelihood ratio test (SH-aLRT) [57] with 1000 replicates.

Local sequences were classified according to the following scheme: monophyletic clades composed by one local genome were classified as "isolated", while clades composed by 2 < genomes < 4 were considered "clusters", and, if ≥ 4 local genomes were represented, we assigned a "clade" designation.

ML trees were inspected in TempEst v1.5.3 [58] to investigate the temporal signal through regression of root-to-tip genetic divergence against sampling dates. For the P.1 ML tree, samples with missing days of the collection were filled with the 15th day of the month. ML and time-resolved trees were visualized using FigTree v1.4.4 (http: //tree.bio.ed.ac.uk/software/figtree/; accessed on 4 May 2021) and ggtree R package v2.0.4 [59] .

Considering the four identified clades composed of ≥4 sequences from this study, we extracted the clade members using the caper R package v1.0.1 [60] . Clade-specific ML trees and root-to-tip regression assignments were generated as described above. Evolutionary parameter estimates and spatial diffusion were assessed separately for each clade using a Bayesian Markov Chain Monte-Carlo (MCMC) approach implemented in BEAST v10.4 [61] . The BEAGLE library [62] was used to enhance computational time. Time-stamped Bayesian trees were generated using the HKY + Γ nucleotide model [63] , a strict molecular clock model with a Continuous Time Markov Chain (CTMC) rate reference prior [64] (mean rate = 8 × 10 −4 ) and a non-parametric skygrid tree prior [65] with grid points defined by the approximate number of weeks spanned by the duration of the phylogeny.

The MCMC chains were run in duplicates for at least 50 million generations, and convergence was checked using Tracer v1.7.1 [66] . Log and tree files were combined using LogCombiner v1.10.4 to ensure stationarity and good mixing after removing 10% as burn-in. Maximum clade credibility (MCC) was generated using TreeAnnotator v1.10.4 [61] . Viral migrations were reconstructed using a reversible discrete asymmetric phylogeographic model [67] to infer the locations of the internal nodes of the tree. A discretization scheme with a resolution of different Brazilian states and other countries was applied. Location diffusion rates were estimated using the Bayesian stochastic search variable selection (BSSVS) [67] procedure employing Bayes factors to identify well-supported rates.

Geographical maps and general plots were generated using R v3.6.1 [68] , and the ggplot2 v3.3.2 [69] , geobr v.1.4 [70] , and sf v0.9.8 [71] packages. For the discrete phylogeographic analysis, SpreaD3 v0.9.7.1 software [72] was used to map spatiotemporal information embedded in MCC trees.

Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/pathogens10080988/s1. Figure S1 . Spatiotemporal distribution of the 56 sequenced samples from RS. Figure S2 . Neighbouring countries, Brazilian divisions and RS intermediate regions. Figure S3 . Proportion of the 10 most frequent lineages of SARS-CoV-2 across time in four Brazilian regions. Figure S4 . ML tree augmenting the B.1.1.28 clade where the local sequence (RS-HBM-39491) was placed. Figure S5 . ML tree augmenting the P.2 clade where the local sequence (RS-HBM-39486) was placed. Figure S6 . Coverage depth plots for each genome sequenced. Figure S7 . Expanded ML tree built using 4499 P.1 sequences deposited in GISAID until 26 April 2021. Tips represented by sequences sequenced in this study are augmented to enable visualization of the different introductions, clusters, and clades reported. Table S1 . Collection of all mutations (n = 175) found in the 56 sequenced genomes, including effects on SARS-CoV-2 genes and proteins. The table is ordered by genomic position. Table S2 . GISAID acknowledgment table of the 8635 global sequences used in the Nextstrain workflow. Table S3 . GISAID acknowledgment table of the 4499 P.1 sequences analyzed. Supplementary File S1. Data and code used to reproduce the results presented. Institutional Review Board Statement: Ethical approval was obtained from the Comitê de Ética em Pesquisa em Seres Humanos da Universidade Federal de Ciências da Saúde de Porto Alegre (CEP-UFCSPA) under process number CAAE 35083220.2.0000.5345. The study was performed in accordance with the Declaration of Helsinki. All samples belonging to the Hospital da Brigada Militar patients that yielded positive RT-qPCR had their laboratory electronic records reviewed to compile metadata such as date of collection, sex, age, symptoms, exposure history, and clinical status, when available. Samples were anonymized before being received by the study investigators, following Brazilian and international ethical standards.

Informed Consent Statement: This study obtained a waiver of informed consent approved by Comitê de Ética em Pesquisa em Seres Humanos da Universidade Federal de Ciências da Saúde de Porto Alegre (CEP-UFCSPA) under process number CAAE 35083220.2.0000.5345. Data Availability Statement: Full tables acknowledging the authors and corresponding labs submitting sequencing data used in this study can be found in Files S3 and S4. Consensus genomes generated in this study were deposited in the GISAID database under Accession IDs: EPI_ISL_2139494 to EPI_ISL_2139549. Data and code used to reproduce the results presented are available in Supplementary Materials.

World Health Organization. WHO Director-General's Opening Remarks at the Media Briefing on COVID-19-11

Evolution and Epidemic Spread of SARS-CoV-2 in Brazil

IBGE (Brazilian Institute of Geography and Statistics) Regiões Geográficas

Governança e Gestão-Governo do Estado do Rio Grande do Sul Cogestão Regional-Distanciamento Controlado

Preliminary Genomic Characterisation of an Emergent SARS-CoV-2 Lineage in the UK Defined by a Novel Set of Spike Mutations

Detection of a SARS-CoV-2 Variant of Concern in South Africa

COVID-19 Epidemic in the Brazilian State of Amazonas Was Driven by Long-Term Persistence of Endemic SARS-CoV-2 Lineages and the Recent Emergence of the New Variant of Concern P

Pathogens 2021

Genomic Characterization and Epidemiology of an Emerging SARS-CoV-2 Variant in Delhi

The SARS-CoV-2 Variants Associated with Infections in India, B.1.617, Show Enhanced Spike Cleavage by Furin

Three-Quarters Attack Rate of SARS-CoV-2 in the Brazilian Amazon during a Largely Unmitigated Epidemic

Resurgence of COVID-19 in Manaus, Brazil, despite High Seroprevalence

Brazilian Ministry of Health Painel Coronavírus Brasil

Mutation Hotspots, Geographical and Temporal Distribution of SARS-CoV-2 Lineages in Brazil

Genomic Surveillance of SARS-CoV-2 Tracks Early Interstate Transmission of P.1 Lineage and Diversification within P.2 Clade in Brazil

Global Initiative on Sharing All Influenza Data-From Vision to Reality

Genomic Surveillance of SARS-CoV-2 in the State of Rio de Janeiro, Brazil: Technical Briefing-SARS-CoV-2 Coronavirus/NCoV-2019 Genomic Epidemiology

Genomic Characterization of a Novel SARS-CoV-2 Lineage from Rio de Janeiro, Brazil

Genomic Epidemiology of SARS-CoV-2 in Esteio

Pervasive Transmission of E484K and Emergence of VUI-NP13L with Evidence of SARS-CoV-2 Co-Infection Events by Two Different Lineages

Epidemiological Investigation Reveals Local Transmission of SARS-CoV-2 Lineage P.1 in Southern Brazil

Early Detection of SARS-CoV-2 P.1 Variant in Southern Brazil and Reinfection of the Same Patient by P.2

Recommendations for Accurate Genotyping of SARS-CoV-2 Using Amplicon-Based Sequencing of Clinical Samples

Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein

Transmission, Infectivity, and Neutralization of a Spike L452R SARS-CoV-2 Variant

The Emergence and Ongoing Convergent Evolution of the N501Y Lineages Coincides with a Major Global Shift in the SARS-CoV-2 Selective Landscape

Evolutionary Dynamics and Dissemination Pattern of the SARS-CoV-2 Lineage B.1.1.33 During the Early Pandemic Phase in Brazil

Recurrent Dissemination of SARS-CoV-2 through the Uruguayan-Brazilian Border

Phylogenetic Relationship of SARS-CoV-2 Sequences from Amazonas with Emerging Brazilian Variants Harboring Mutations E484K and N501Y in the Spike Protein

Persistence and Evolution of SARS-CoV-2 in an Immunocompromised Host

SARS-CoV-2 Evolution during Treatment of Chronic Infection

Identification of SARS-CoV-2 P.1-Related Lineages in Brazil Provides New Insights about the Mechanisms of Emergence of Variants of Concern-SARS-CoV-2 Coronavirus / NCoV-2019 Genomic Epidemiology

The Variant Gambit: COVID-19 s next Move

D155Y Substitution of SARS-CoV-2 ORF3a Weakens Binding with Caveolin-1: An in Silico Study

Detection of 2019 Novel Coronavirus (2019-NCoV) by Real-Time RT-PCR. Eurosurveillance 2020

World Health Organization COVID-19 Clinical Management: Living Guidance

SARS-CoV-2 Genome Sequencing Using Long Pooled Amplicons on Illumina Platforms

Fast Gapped-Read Alignment with Bowtie 2

The Sequence Alignment/Map Format and SAMtools

A Statistical Framework for SNP Calling, Mutation Discovery, Association Mapping and Population Genetical Parameter Estimation from Sequencing Data

KaryoploteR: An R/Bioconductor Package to Plot Customizable Genomes Displaying Arbitrary Data

Haplotype-Based Variant Detection from Short-Read Sequencing

A Program for Annotating and Predicting the Effects of Single Nucleotide Polymorphisms

MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability

Laduplessis/SARS-CoV-2_Guangdong_Genomic_Epidemiology

A Dynamic Nomenclature Proposal for SARS-CoV-2 Lineages to Assist Genomic Epidemiology

Visualizing and Sharing Data for Genomic Epidemiology and Phylogeography

Nextstrain: Real-Time Tracking of Pathogen Evolution

IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies

Some Probabilistic and Statistical Problems in the Analysis of DNA Sequences

Maximum-Likelihood Phylodynamic Analysis

Fast Model Selection for Accurate Phylogenetic Estimates

New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0

Exploring the Temporal Structure of Heterochronous Sequences Using TempEst (Formerly Path-O-Gen)

Ggtree: An r Package for Visualization and Annotation of Phylogenetic Trees with Their Covariates and Other Associated Data

Caper: Comparative Analyses of Phylogenetics and Evolution in R (R Package Version 1.0.1)

Bayesian Phylogenetic and Phylodynamic Data Integration Using BEAST 1.10

BEAGLE: An Application Programming Interface and High-Performance Computing Library for Statistical Phylogenetics

Dating of the Human-Ape Splitting by a Molecular Clock of Mitochondrial DNA

Bayesian Analysis of Elapsed Times in Continuous-Time Markov Chains

Improving Bayesian Population Dynamics Inference: A Coalescent-Based Model for Multiple Loci

Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7

Bayesian Phylogeography Finds Its Roots

R: A Language and Environment for Statistical Computing

R Foundation for Statistical Computing

Elegant Graphics for Data Analysis

Loads Shapefiles of Official Spatial Data Sets of Brazil

Simple Features for R: Standardized Support for Spatial Vector Data

Interactive Visualization of Spatiotemporal History and Trait Evolutionary Processes

We thank the administrators of the GISAID database and research groups across the world for supporting the rapid and transparent sharing of genomic data during the COVID-19 pandemic. We also thank the staff of Hospital da Brigada Militar, Laboratório Exame, Beppler & Puppi Advogados, Smellbox Produtos de Higiene Ltd.a., Leonardo Mestre Negri, and BiomeHub Pesquisa e Desenvolvimento who directly contributed to this study.

Dr. Cadegiani has served as a clinical director for Applied Biology, Inc. The other authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.