key: cord-1029891-0ae8elfk authors: Pereson, Matías J.; Flichman, Diego M.; Martínez, Alfredo P.; Baré, Patricia; Garcia, Gabriel H.; Di Lello, Federico A. title: Evolutionary analysis of SARS‐CoV‐2 spike protein for its different clades date: 2021-02-09 journal: J Med Virol DOI: 10.1002/jmv.26834 sha: a48ec40e3c026380b86e81bbef1da5aa9e1478ad doc_id: 1029891 cord_uid: 0ae8elfk

The spike protein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has become the main target for antiviral and vaccine development. Despite its relevance, information about its evolutionary traces is scarce. The aim of this study was to investigate the diversification patterns of the spike for each clade of SARS-CoV-2 through different approaches. Two thousand one hundred sequences representing the seven clades of SARS-CoV-2 were included. Patterns of genetic diversification and the nucleotide evolutionary rate were estimated for the spike genomic region. The haplotype networks showed a star shape, where multiple haplotypes with few nucleotide differences diverge from a common ancestor. Four hundred seventy-nine different haplotypes were defined in the seven analyzed clades. The main haplotype, named Hap-1, was the most frequent for clades G (54%), GH (54%), and GR (56%), and a different haplotype (named Hap-252) was the most frequent for clades L (63.3%), O (39.7%), S (51.7%), and V (70%). The evolutionary rate for the spike protein was estimated as 1.08 × 10⁻³ nucleotide substitutions/site/year. Moreover, the nucleotide evolutionary rate after nine months of the pandemic was similar for each clade. In conclusion, the present evolutionary analysis is relevant as the spike protein of SARS-CoV-2 is the target for most therapeutic candidates; besides, changes in this protein could have consequences on viral transmission, response to antivirals, and efficacy of vaccines.
Moreover, the evolutionary characterization of clades improves knowledge of SARS-CoV-2 and deserves to be assessed in more detail, as re-infection by different phylogenetic clades has been reported.

Patterns of genetic diversification for both genomic regions (S and RBD) were analyzed for each clade using the median-joining reconstruction method in PopART v1.7.2. 16 Haplotypes shared among all clades were analyzed in Arlequin v3.5.2.2. 17 Polymorphism indices were calculated separately for each clade with DnaSP v6.12.01. 18 The nucleotide evolutionary rate for the entire S-coding region datasets was estimated with the BEAST v1.8.4 program package 19 at the CIPRES Science Gateway server. 20 Temporal calibration was based on the sampling date of each sequence. The best nucleotide substitution model was selected according to the Bayesian information criterion in IQ-TREE v1.6.12. 21 The analysis was performed under a relaxed (uncorrelated lognormal) molecular clock model, as recommended previously by Duchene et al., 22 with an exponential demographic model. 23 Analyses were run for 8 × 10⁶ generations and sampled every 8 × 10⁵ steps. The convergence of the "meanRate" and "allMus" parameters (effective sample size [ESS] ≥ 200, burn-in 10%) was verified with Tracer v1.7.1. 24 The obtained substitution rate was tested against 10 independent replicates of the analysis in which the time calibration information (date of sampling) was randomized, as described by Rieux and Khatchikian. 25

3 | RESULTS

Three hundred sequences were randomly selected for each clade. In total, 2100 sequences were curated and selected for the analysis. Table 1 shows the SARS-CoV-2 sequences included for every month and clade.
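The polymorphism indices named in the methods (haplotype diversity, Hd, and nucleotide diversity, π) can be illustrated with a minimal Python sketch. It assumes the standard Nei estimators, Hd = n/(n − 1) × (1 − Σ pᵢ²) and π as the mean number of pairwise differences per site, applied to an aligned, gap-free set of sequences. It is not DnaSP's implementation, which handles gaps and ambiguous bases in its own way, and the function names and the four-sequence toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: Hd = n/(n-1) * (1 - sum(p_i^2))."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    counts = Counter(seqs)  # identical sequences form one haplotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # Count mismatching positions over every unordered pair of sequences.
    total_diff = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diff / (n_pairs * length)


# Hypothetical toy alignment (not data from the study).
toy_alignment = ["ACGT", "ACGT", "ACGA", "TCGA"]
print("Hd =", round(haplotype_diversity(toy_alignment), 4))
print("pi =", round(nucleotide_diversity(toy_alignment), 4))
```

A star-shaped data set, in which many sequences are identical and the rest differ from them at one or two sites, gives a moderate to high Hd together with a low π, which is the pattern the diversity indices in this study describe.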
The haplotype networks (Figure 1) reflect the diversity indices results as a star shape with multiple haplotypes with a few [...]

TABLE 1 Number of SARS-CoV-2 sequences from the GISAID database in September 2020, by month and clade, as per the selection criteria (temporal structure)

Table 3 shows the frequency of each haplotype with amino acid changes. The haplotype diversity was moderate to high in every clade, ranging from Hd = 0.507 to 0.793 (Table 2). In contrast, nucleotide diversity was relatively low for each clade, ranging between π = 0.0018 for V and π = 0.0040 for O [...] (Table 4). A date-randomization analysis showed no overlap between the 95% HPD substitution-rate intervals obtained from the real data and from the date-randomized datasets for all clades (Figure 2). The data set for clade L did not reach convergence (ESS < 200). To verify the reliability of the result, 10 independent runs were performed. All of them converged to a similar posterior distribution. Likewise, convergence was not achieved for many of the date-randomized datasets (ESS between 100 and 200). For those datasets that did not reach convergence, two independent runs were carried out and concatenated. 26 When the evolutionary rate was analyzed according to the emergence of each clade, the founding clades (L, O, S, and V) tended to present slightly slower evolutionary rates than the more recent clades (G, GH, and GR) (p = .157).

The evolutionary characterization of the spike genomic region of SARS-CoV-2 is crucial to estimate the course that re-infections, vaccines, and therapeutics would have in the pandemic's future. In [...]

At the beginning of the pandemic, the most prevalent clades were L, O, V, and S. Later, with the appearance of the D614G mutation in the S protein, clade G emerged and remained at a high and stable prevalence. After this initial step, the GR clade emerged and grew until it became the most prevalent.
Finally, the GH clade peaked at 30% in May 2020 and then began to decrease. 3 In this sense, it is important to highlight that clades with the D614G mutation in the S protein (clades G, GH, and GR) have been suggested to present a higher transmission efficiency, although they would not be associated with more severe pathogenesis. 27 Therefore, to describe the evolution of the S protein variants, haplotype networks were studied in all seven clades and for both regions (S and RBD alone). This analysis showed several identical sequences grouped together, resulting in a star-shaped network, which is characteristic of viral outbreaks. 28 [...] were associated with the binding affinity of RBD. 30, 31 Additionally, the mutation L5F in the signal peptide was present in 3.3% of members belonging to clade V. 27 Other changes associated with relevant functions, 27, 30 such as H49Y in clade L (associated with monomer stability), A829T in clade S (fusion peptide), D936Y in clade GH (heptad repeat 1 [HR1], associated with monomer stability), and P1263 in clade G (present in the cytoplasmic tail), were also detected in 1%-3.4%. The evolutionary characterization of the wide spectrum of haplotypes contributes to determining the significance of each haplotype and its association with disease severity, response to antivirals, development of vaccines, and host genetic factors. The evolutionary rate of the S protein estimated for all clades together was significantly higher than that previously reported by analyzing the entire genome. 14,28 This is expected as the complete [...]

FIGURE 2 Test of temporal structure. Comparison of the evolutionary rates estimated for the original data set versus the date-randomized ones. This analysis was performed for the spike-coding region (3822 nt) of each clade. s.s.y, substitutions/site/year
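The temporal-structure test summarized in Figure 2 can be sketched in a few lines of Python. This is only an illustration of the acceptance criterion stated in the results (the 95% HPD interval of the rate from the real data must not overlap the interval from any date-randomized replicate). It is not the tipdatingbeast package or the BEAST workflow used in the study, and the function names, tip labels, dates, and interval values below are hypothetical.

```python
import random


def randomize_dates(tip_dates, seed=None):
    """Return a copy of {tip: sampling_date} with the dates shuffled among tips."""
    rng = random.Random(seed)
    tips = list(tip_dates)
    dates = [tip_dates[t] for t in tips]
    rng.shuffle(dates)  # permute dates, keep the tip labels fixed
    return dict(zip(tips, dates))


def intervals_overlap(a, b):
    """True if the closed intervals a = (low, high) and b = (low, high) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def passes_date_randomization(real_hpd, randomized_hpds):
    """True if the real-data HPD overlaps none of the randomized-replicate HPDs."""
    return not any(intervals_overlap(real_hpd, r) for r in randomized_hpds)


# Hypothetical sampling dates (decimal years) for three tips.
tip_dates = {"seq_A": 2020.10, "seq_B": 2020.35, "seq_C": 2020.60}
print(randomize_dates(tip_dates, seed=1))

# Hypothetical 95% HPD intervals in substitutions/site/year.
real_hpd = (8.0e-4, 1.4e-3)
randomized_hpds = [(1.0e-5, 2.0e-4), (5.0e-6, 3.0e-4)]
print(passes_date_randomization(real_hpd, randomized_hpds))
```

In practice each shuffled date set would be passed to a full Bayesian run to obtain its own rate interval; the sketch covers only the shuffling step and the final comparison.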
TABLE 3 (fragment; column layout lost in extraction): Hap-254 (H49Y); Hap-90 (A522S, E780C); 5 (1.7); Hap-320; Hap-91 (E780C); Hap-105 (D936Y); Hap-384 (D614A); Hap-415 (A829T); 5 (1.7); Hap-226 (T478I); 8 (2.6); Hap-437 (A846S); Total

1. A novel coronavirus from patients with pneumonia in China
2. World Health Organization. Coronavirus disease (COVID-19) Weekly Operational Update on COVID-19
3. Geographical and temporal distribution of SARS-CoV-2 clades in the WHO European Region
4. Asymptomatic reinfection in two healthcare workers from India with genetically distinct SARS-CoV-2
5. COVID-19 re-infection by a phylogenetically distinct SARS-coronavirus-2 strain confirmed by whole genome sequencing
6. Symptomatic SARS-CoV-2 reinfection by a phylogenetically distinct strain
7. Characteristics of SARS-CoV-2 and COVID-19
8. World Health Organization. Draft landscape of COVID-19 candidate vaccines
9. Quinolines-based SARS-CoV-2 3CLpro and RdRp inhibitors and spike-RBD-ACE2 inhibitor for drug-repurposing against COVID-19: an in silico analysis
10. Discovery of clioquinol and analogues as novel inhibitors of severe acute respiratory syndrome coronavirus 2 infection, ACE2 and ACE2-Spike protein interaction in vitro. bioRxiv: the preprint server for biology
11. Cell entry mechanisms of SARS-CoV-2
12. An integrated drug repurposing strategy for the rapid identification of potential SARS-CoV-2 viral inhibitors
13. The first two cases of 2019-nCoV in Italy: where they come from
14. Emergence of genomic diversity and recurrent mutations in SARS-CoV-2
15. Phylogenetic analysis of SARS-CoV-2 in the first few months since its emergence
16. POPART: full-feature software for haplotype network construction
17. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows
18. DNA sequence polymorphism analysis of large data sets
19. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10
20. Creating the CIPRES Science Gateway for inference of large phylogenetic trees
21. ModelFinder: fast model selection for accurate phylogenetic estimates
22. Temporal signal and the phylodynamic threshold of SARS-CoV-2
23. Mathematical models of infectious disease transmission
24. Posterior summarization in Bayesian phylogenetics using tracer 1.7
25. tipdatingbeast: an r package to assist the implementation of phylogenetic tip-dating tests using beast
26. The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing
27. Tracking changes in SARS-CoV-2 spike: evidence that D614G increases infectivity of the COVID-19 virus
28. Population genetics of SARS-CoV-2: disentangling effects of sampling bias and infection clusters
29. Geographic and genomic distribution of SARS-CoV-2 mutations
30. Systemic effects of missense mutations on SARS-CoV-2 spike glycoprotein stability and receptor-binding affinity
31. Key residues of the receptor binding motif in the spike protein of SARS-CoV-2 that interact with ACE2 and neutralizing antibodies