key: cord-0764884-hsx53gfm authors: Paz, Mercedes; Aldunate, Fabián; Arce, Rodrigo; Ferreiro, Irene; Cristina, Juan title: An evolutionary insight into Severe Acute Respiratory Syndrome Coronavirus 2 Omicron variant of concern. date: 2022-03-22 journal: Virus Res DOI: 10.1016/j.virusres.2022.198753 sha: 6890239aa6a542543478dae1e3189814699376b7 doc_id: 764884 cord_uid: hsx53gfm
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a novel virus that belongs to the family Coronaviridae. This virus produces a respiratory illness known as coronavirus disease 2019 (COVID-19) and is responsible for the COVID-19 pandemic. Due to its massive circulation around the world and the capacity of this virus to mutate, genomic studies are much needed in order to reveal new variants of concern (VOCs). On November 26th, 2021, the WHO announced that a new SARS-CoV-2 VOC, named Omicron, had emerged. In order to gain insight into the emergence, spread and evolution of Omicron SARS-CoV-2 variants, a comprehensive phylogenetic study was performed. The results of these studies revealed significant differences in codon usage among the S genes of SARS-CoV-2 VOCs Alpha, Beta, Gamma, Delta and Omicron, which can be linked to SARS-CoV-2 genotypes. The Omicron variant did not evolve out of one of the early VOCs; instead, it belongs to a completely different genetic lineage from previous ones. Strains classified as Omicron variants evolved from ancestors that existed around May 15th, 2020, suggesting that this VOC may have been circulating undetected for a period of time until its emergence was observed in South Africa. A rate of evolution of 5.61 × 10^-4 substitutions/site/year was found for Omicron strains enrolled in these analyses. The results of these studies demonstrate that S genes have suitable genetic information for clear assignment of emerging VOCs to their specific genotypes.
In December, 2019, a pandemic of coronavirus disease China .
This pandemic is caused by a virus known as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and infection by this virus leads to a severe respiratory pneumonia (Gorbalenya et al., 2020). As of December 14th, 2021, there have been more than 270 million confirmed cases worldwide, and global deaths from SARS-CoV-2 disease surpass 5 million people (WHO, 2020b). SARS-CoV-2 possesses a single-stranded, positive-sense RNA genome of approximately 30 kilobases in length, which encodes multiple structural and non-structural proteins. The structural proteins include the spike (S) protein, the envelope (E) protein, the membrane (M) protein, and the nucleocapsid (N) protein. The replication cycle of SARS-CoV-2 starts when it infects epithelial cells, using the viral S protein to bind host angiotensin-converting enzyme 2 (ACE2) and thereby fusing with the cell membrane to gain cell entry (Hoffmann et al., 2020). Since the beginning of this pandemic, several therapies and preventive measures have been developed, such as clinically applied monoclonal antibodies (Weinreich et al., 2021) and vaccines (Wang et al., 2021), and both have been successfully used to neutralize the virus. However, the emergence of variants of concern (VOCs) with substitutions in the S protein may reduce the efficacy of these therapies and vaccines by allowing the virus to escape anti-SARS-CoV-2 antibodies (Davies et al., 2021). Previous studies have identified four VOCs currently circulating in the human population: VOC Alpha (B.1.1.7, first identified in the United Kingdom); VOC Beta (B.1.351, first identified in South Africa); VOC Gamma (P.1, first identified in Brazil); and VOC Delta (B.1.617.2, first isolated in India). By November 24th, 2021, a new SARS-CoV-2 VOC, now known as VOC Omicron (B.1.1.529), had been identified in South Africa (Wang & Chen, 2021).
The emergence of the Omicron VOC raised concern that this variant may reduce the efficacy of anti-SARS-CoV-2 induced antibodies or be more transmissible (Callaway, 2021; Zhang et al., 2021). In order to better understand the emergence, spread and evolution of Omicron SARS-CoV-2 variants, a comprehensive phylogenetic study was performed.
Available and comparable complete S gene sequences of 159 Omicron SARS-CoV-2 strains isolated from November 13th to December 2nd, 2021, in South Africa, Ghana, Singapore, Portugal, Netherlands, Mexico, Malaysia, Japan, Hong Kong, Germany, Canada, Botswana, Israel, Ireland, Belgium, Austria, Australia, South Korea, USA, Sri Lanka and Switzerland, were used throughout these studies. These sequences were aligned with corresponding sequences from 97 Alpha, Beta, Gamma and Delta SARS-CoV-2 strains isolated elsewhere. Sequences were obtained from the Global Initiative on Sharing Avian Influenza Data (GISAID) database. For accession numbers, country of origin and date of isolation, see Supplementary Material Table 1. Sequences were aligned using the MAFFT version 7 program (Katoh et al., 2019). In order to capture local and global patterns of virus genetic diversity in a timely and coherent manner, we employed Pangolin COVID-19 genetic lineage strain assignment (Rambaut et al., 2020).
Nucleotide frequencies and codon usage of the S genes from SARS-CoV-2 variants were calculated using the program CodonW (written by John Peden) as implemented in the Galaxy server version 1.4.4 (Afgan et al., 2018). The relationship between compositional variables and samples was obtained using Principal Component Analysis (PCA). The singular value decomposition (SVD) method was used to compute the principal components, and unit variance was used as the scaling method. This means that all variables are scaled so that they are equally important (variance = 1) when finding the components.
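The PCA procedure described above (unit variance scaling followed by SVD) can be illustrated with a minimal sketch. This is not the code used by the analysis tool cited in this work; the function name `pca_unit_variance` and the toy matrix of four hypothetical strains by three hypothetical codon frequencies are assumptions introduced only for illustration.

```python
import numpy as np

def pca_unit_variance(X, n_components=2):
    """PCA of a samples-by-variables matrix via SVD after unit variance scaling."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    # Constant columns have zero variance and cannot be scaled, so drop them.
    keep = std > 0
    Z = (X[:, keep] - mean[keep]) / std[keep]   # each column: mean 0, variance 1
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates
    explained = (S ** 2) / np.sum(S ** 2)             # variance fraction per component
    return scores, Vt[:n_components], explained[:n_components]

# Hypothetical toy data: rows are strains, columns are codon frequencies.
X = [[0.10, 0.30, 0.60],
     [0.12, 0.28, 0.60],
     [0.30, 0.10, 0.60],
     [0.31, 0.12, 0.57]]
scores, loadings, explained = pca_unit_variance(X)
print(scores)
print(explained)
```

Because every column is rescaled to variance 1 before the decomposition, a rarely used codon contributes as much to the components as a frequently used one, which is the effect the scaling choice is meant to achieve.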
By the same approach, heatmaps were also constructed; a heatmap is a data matrix in which the values in the dataset are visualized by means of a color gradient. Rows and/or columns of the matrix are clustered so that sets of rows or columns, rather than individual ones, can be interpreted. PCA and heatmap analyses were done using the ClustVis program (Metsalu and Vilo, 2015).
To reconstruct the evolutionary history of Omicron SARS-CoV-2 strains, a Bayesian Markov Chain Monte Carlo (MCMC) approach was used as implemented in the BEAST package v2.5.2 (Bouckaert et al., 2019). First, the evolutionary model that best fit the sequence dataset was determined using the IQ-TREE program (Trifinopoulos et al., 2016). The Bayesian information criterion (BIC), the Akaike information criterion (AIC), and the log of the likelihood (LnL) were used to identify the best model. Both strict and relaxed molecular clock models were used to test different dynamic models (constant population size, exponential population growth, Bayesian Skyline and Birth-Death Skyline Serial). Statistical uncertainty in the data was reflected by the 95% highest probability density (HPD) values. Results were examined using the TRACER v1.7.2 program (available from http://beast.bio.ed.ac.uk/Tracer). Convergence was assessed by effective sample sizes (ESS) above 200. Models were compared by AICM from the likelihood output of each of the models. Maximum clade credibility trees were generated using the TreeAnnotator program from the BEAST package. Visualization of the annotated trees was done using the FigTree program v1.4.4 (available at: http://tree.bio.ed.ac.uk).
In order to gain insight into the trends of evolution of the S protein, codon usage frequencies of 256 S genes from SARS-CoV-2 strains belonging to VOCs Alpha, Beta, Gamma, Delta and Omicron were determined, and PCA and heatmap analyses were performed (for strains included in these analyses, see Supplementary Material Table 1).
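As an illustration of what a codon usage frequency table contains, the following minimal sketch counts codons in an in-frame coding sequence and also tabulates nucleotide frequencies by codon position. It is not the CodonW implementation and does not reproduce its output format; the function names and the five-codon toy sequence are hypothetical.

```python
from collections import Counter

def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Skip codons that contain ambiguous bases such as N.
    codons = [c for c in codons if set(c) <= set("ACGT")]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

def position_nucleotide_frequencies(cds):
    """Nucleotide frequencies at codon positions 1, 2 and 3."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    result = {}
    for pos in range(3):
        bases = [b for b in cds[pos:usable:3] if b in "ACGT"]
        counts = Counter(bases)
        n = len(bases) or 1                   # avoid division by zero on empty input
        result[pos + 1] = {b: counts[b] / n for b in "ACGT"}
    return result

toy = "ATGTTTGTTTTTCTT"  # hypothetical 5-codon fragment, not a real S gene
freqs = codon_frequencies(toy)
by_pos = position_nucleotide_frequencies(toy)
print(freqs)
print(by_pos)
```

Stacking one such frequency vector per strain gives a strains-by-codons matrix, which is the kind of input a PCA or a clustered heatmap operates on.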
The codon usage analysis revealed significant differences in codon usage among the S genes of SARS-CoV-2 VOCs, which can be linked to SARS-CoV-2 genotypes (see Fig. 1A and B). Average linkage clustering suggests that VOCs Alpha, Beta, Gamma and Delta have a closer genetic relation among themselves and a more distant genetic relation with Omicron variants (see Fig. 1B, upper part). Moreover, average linkage also shows that Omicron isolates are not identical and that heterogeneity can be observed (see Fig. 1B). In order to study whether these trends of evolution of the VOCs' S genes are due to differences in nucleotide composition, nucleotide frequencies at the first, second and third codon positions were determined for the complete S genes from the SARS-CoV-2 VOCs included in these analyses, and PCA and heatmap analyses were performed (see Fig. 2). A significant bias in nucleotide composition frequencies was found among the S genes of the VOCs.
In order to reconstruct the evolutionary history of the Omicron SARS-CoV-2 population, a Bayesian MCMC approach was employed (Bouckaert et al., 2019) using 89 full-length genomes that were available and comparable as of December 2nd, 2021 (for isolates included in these analyses, see Supplementary Material Table 1). The results shown in Table 1 are the outcome of 20 million steps of the MCMC, using the HKY+I nucleotide model, a strict molecular clock and the Birth-Death Skyline Serial population model. The results of these studies suggest that strains classified as Omicron variants evolved from ancestors that existed around May 15th, 2020. A mean rate of evolution of 5.61 × 10^-4 substitutions/site/year (s/s/y) was found for Omicron strains enrolled in these analyses (95% highest probability density values of 3.051 × 10^-4 to 9.014 × 10^-4 s/s/y).
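The 95% HPD values reported above summarize the posterior sample of a parameter as the shortest interval that contains 95% of the sampled values. A minimal sketch of that calculation is given below, assuming a plain list of posterior samples. The function name `hpd_interval` and the toy samples are hypothetical, and this is not the implementation used by the BEAST or Tracer programs.

```python
import math

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing a fraction `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("no samples")
    # Number of samples the interval must contain (small tolerance guards
    # against floating-point noise in mass * n).
    k = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]

# Hypothetical toy "posterior": nine clustered values and one outlier.
toy_samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
low, high = hpd_interval(toy_samples, mass=0.9)
print(low, high)
```

Unlike an equal-tailed interval, the shortest-window definition excludes the isolated outlier in the toy data, which is why HPD values are commonly preferred for skewed posterior distributions such as those of evolutionary rates.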
The estimated rate is in agreement with previous estimates of the rate of evolution of SARS-CoV-2 populations (6.57 × 10^-4 s/s/y, Castells et al., 2020; 7.80 × 10^-4 s/s/y, Lai et al., 2020; 9.90 × 10^-4 s/s/y, Nie et al., 2020; 3.0 × 10^-4 s/s/y, Simmonds, 2020; 1.60 × 10^-3 s/s/y, Bai et al., 2020; 1.19-1.31 × 10^-3 s/s/y). To study the phylogenetic relations among Omicron variants, maximum clade credibility trees were generated using software from the BEAST package (Rambaut et al., 2020). The results of these studies are shown in Fig. 3. Strains isolated in South Africa belong to two different genetic clades, revealing that Omicron variants have diversified into two distinct genetic groups in that country. Strains isolated in Malaysia and India form another genetic group (see Fig. 3). Within the main cluster of South African strains, strains isolated in Singapore, Hong Kong, Europe, Brazil and the USA are observed, suggesting a rapid spread of this lineage across different continents.
In these studies, PCA and heatmap analyses revealed a correlation between codon usage and genotypes in the S genes of VOC strains (see Fig. 1). These results demonstrate that S genes have suitable genetic information for clear assignment of emerging VOCs to their specific genotypes (Fig. 1). These differences are also related to biases in nucleotide composition among VOC strains (Fig. 2). These findings highlight the latent diversity of SARS-CoV-2 that has yet to be fully explored. This is also in agreement with similar results found in other members of the family Coronaviridae (Kumar et al., 2021). Bayesian coalescent analysis revealed that Omicron strains evolved from ancestors that existed by May 15th, 2020. This result suggests that this VOC may have been circulating undetected for a period of time until its emergence was observed in South Africa (Petersen et al., 2021).
This is in agreement with recent results suggesting that Omicron variants have been circulating much longer than previously anticipated (Kandeel et al., 2021). The earliest known case of Omicron in South Africa was a patient diagnosed with COVID-19 on November 9th, 2021 (Karim & Karim, 2021). The results of these studies revealed that South African Omicron strains evolved from ancestors that circulated around October 10th, 2021, just before the variant emerged in the South African population (see Table 1). Nevertheless, strains isolated in Singapore and Hong Kong can be traced to one of the main genetic lineages of Omicron, suggesting that although Omicron strains were first detected in South Africa, they may have also been circulating undetected in other regions of the world (see Fig. 3). In addition, Omicron strains isolated in South Africa can be assigned to two different genetic groups (see Fig. 3). This is in agreement with recent results suggesting that Omicron strains diversified into two different sub-lineages in that country (Wang & Chen, 2021). The results of these studies revealed that the Omicron variant did not evolve out of one of the early VOCs. Instead, it belongs to a completely different genetic lineage from previous VOCs. These results highlight the possibility that Omicron (or other VOCs currently unknown) may change the course of the pandemic. Whether the virus could have been circulating undetected in countries with little surveillance and sequencing capacity is currently unknown (Kupferschmidt, 2021). More studies will be needed to address these important questions.
While this manuscript was in the review process, very recent studies revealed that Omicron SARS-CoV-2 viruses have diversified into three different genetic lineages: BA.1, composed of the first emerging viruses isolated in South Africa, Botswana, and elsewhere, which are the ones enrolled in the analyses shown in this work; and two more genetic lineages, BA.2, which has remained in the minority in several countries although it became predominant in Denmark (Houhamdi et al., 2022), and BA.3. In order to gain insight into this recent evolutionary process, the same strains enrolled in these studies were aligned with comparable BA.2 and BA.3 strains recently isolated in different countries (for strains, accession numbers, date and country of isolation, see Supplementary Material Table 2). Then, the same Bayesian MCMC approach described in these studies was performed, and the results are shown in Supplementary Material Fig. 1. As can be seen in the figure, the same conclusions about the origin and phylogenetic relations of BA.1 strains were obtained. Moreover, BA.2 and BA.3 strains belong to different genetic lineages, in agreement with recent results. More studies will be needed in order to establish the origin and evolution of BA.2 and BA.3 strains.
The results of these studies revealed significant differences in codon usage among the S genes of SARS-CoV-2 VOCs, which can be linked to SARS-CoV-2 genotypes. These results demonstrate that S genes have suitable genetic information for clear assignment of emerging VOCs to their specific genotypes. VOCs Alpha, Beta, Gamma and Delta have a closer genetic relation among themselves and a more distant genetic relation with Omicron variants. Omicron variants did not evolve out of one of the early VOCs; instead, they belong to a completely different genetic lineage from previous VOCs. A significant bias in nucleotide composition frequencies was found among the S genes of the VOCs.
Omicron variants evolved from ancestors that existed around May 15th, 2020 suggesting that this VOC may have been circulating undetected for a period of time until its emergence was observed in South Africa. Strains isolated in South Africa belong to two different genetic clades, suggesting that Omicron variants have diversified into two distinct genetic groups in that country.
Author individual contributions: Mercedes Paz, Fabián Aldunate, Rodrigo Arce, Irene Ferreiro were involved in the research, data curation and visualization of the results obtained. Juan Cristina was involved in conceptualization, methodology, data curation and wrote the original draft of the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
The Galaxy platform for accessible, reproducible and collaborative biomedical analyses
Comprehensive evolution and molecular characteristics of a large number of SARS-CoV-2 genomes reveal its epidemic trends
BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis
Beyond Omicron: what's next for COVID's viral evolution
Evidence of increasing diversification of emerging Severe Acute Respiratory Syndrome Coronavirus 2 strains
Emerging coronaviruses: genome structure, replication, and pathogenesis
First cases of infection with the 21L/BA.2 Omicron variant in Marseille, France
Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England
Bayesian coalescent inference of past population dynamics from molecular sequences
Tracking of variants
Severe acute respiratory syndrome-related coronavirus: The species and its viruses -a statement of the Coronavirus Study Group
Theory and Applications of Correspondence Analysis
A Multibasic Cleavage Site in the Spike Protein of SARS-CoV-2 Is Essential for Infection of Human Lung Cells
Characteristics of the first 1119 SARS-CoV-2 Omicron variant cases
Omicron variant genome evolution and phylogenetics
Omicron SARS-CoV-2 variant: a new chapter in the COVID-19 pandemic
MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization
Where did "weird" Omicron come from?
Evolutionary Signatures Governing the Codon Usage Bias in Coronaviruses and Their Implications for Viruses Infecting Various Bat Species
Evolutionary history, potential intermediate animal host, and cross-species analyses of SARS-CoV-2
Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
Clustvis: a web tool for visualizing clustering of multivariate data using Principal Component Analysis and heatmap
Phylogenetic and phylodynamic analyses of SARS-CoV-2
Emergence of new SARS-CoV-2 Variant of Concern Omicron (B.1.1.529) -highlights Africa's research capabilities, but exposes major knowledge gaps, inequities of vaccine distribution, inadequacies in global COVID-19 response and control efforts
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
Rampant C→U Hypermutation in the Genomes of SARS-CoV-2 and Other Coronaviruses: Causes and Consequences for Their Short-and Long-Term Evolutionary Trajectories
Sequence analysis of the Emerging Sars-CoV-2 Variant Omicron in South Africa
World Health Organization. 2021a. Coronavirus disease 2019 (COVID-19) Weekly epidemiological update on COVID-19
World Health Organization. 2021b. WHO 2nd Global consultation on assessing the impact of SARS-CoV-2 variants of concern on Public health interventions
Domains and Functions of Spike Protein in SARS-Cov-2 in the Context of Vaccine Design
The significant immune escape of pseudotyped SARS-CoV-2 Variant Omicron
This research was funded by Agencia Nacional de Investigación e Innovación and PEDECIBA, Uruguay.
We acknowledge Comisión Sectorial de Investigación Científica, Universidad de la República, Uruguay, for support through Grupos I + D grant. We gratefully acknowledge the Originating and Submitting Laboratories for sharing newly identified coronavirus sequences through GISAID. We thank Drs. Pilar Moreno and Gonzalo Moratorio for critical reading and support.