key: cord-1038693-do3l91v3 authors: Hassan, Sk. Sarif; Lundstrom, Kenneth; Choudhury, Pabitra Pal; Palu, Giorgio; Uhal, Bruce D.; Kandimalla, Ramesh; Seyran, Murat; Lal, Amos; Sherchan, Samendra P.; Azad, Gajendra Kumar; Aljabali, Alaa A. A.; Brufsky, Adam M.; Serrano-Aroca, Ángel; Adadi, Parise; Abd El-Aziz, Tarek Mohamed; Redwan, Elrashdy M.; Takayama, Kazuo; Barh, Debmalya; Rezaei, Nima; Tambuwala, Murtaza; Uversky, Vladimir N. title: Implications Derived from S-Protein Variants of SARS-CoV-2 from Six Continents date: 2021-05-18 journal: bioRxiv DOI: 10.1101/2021.05.18.444675 sha: 5910f284fd98f3337eaf7642a8fc811397cceed0 doc_id: 1038693 cord_uid: do3l91v3 Spike (S) proteins of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are critical determinants of the infectivity and antigenicity of the virus. Several mutations in the spike protein of SARS-CoV-2 have already been detected, and their effects on immune system evasion and enhanced transmission, as causes of increased morbidity and mortality, are being investigated. From pathogenic and epidemiological perspectives, spike proteins are of prime interest to researchers. This study focused on the unique variants of S proteins from six continents: Asia, Africa, Europe, Oceania, South America, and North America. Compared with the other five continents, Africa had the highest percentage (29.065%) of unique S proteins. Notably, North America alone accounted for 87% (14046) of the total (16143) unique S proteins available in the NCBI database (across all continents). Based on the amino acid frequency distributions in the S protein variants from all the continents, the phylogenetic relationship implies that unique S proteins from North America were significantly different from those of the other five continents. Over time, the unique variants originating from North America are most likely to spread to the other geographic locations through international travel or to arise there naturally through emerging mutations. 
Hence it is suggested that restriction of international travel be considered, along with massive vaccination, as an utmost measure to combat the spread of the COVID-19 pandemic. It is further suggested that the efficacy of existing vaccines and future vaccine development be reviewed with careful scrutiny and, if needed, re-engineered based on requirements dictated by newly emerging S protein variants.

The world is experiencing a health emergency due to coronavirus disease 2019 (COVID-19), caused by a deadly enveloped positive-sense single-stranded RNA virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1, 2, 3, 4, 5, 6]. The spike (S) protein is a homotrimer present on the surface of SARS-CoV-2 that recognizes the human host cell surface receptor angiotensin-converting enzyme-2 (ACE2) [7, 8, 9, 10]. Since the beginning of the second wave of COVID-19 infection, various SARS-CoV-2 variants have emerged, raising concern about enhanced transmission and mortality of the virus and reduced efficacy of vaccine protection [11, 12]. Some studies disputed the perception of SARS-CoV-2 mutations as defining distinctive pathogenic variants and questioned the increased rate of transmissibility [13, 14]. However, the D614G mutation in the spike protein, judged by the frequency of the mutant strains carrying it within the SARS-CoV-2 population, clearly plays a role in enabling the virus to spread more effectively and rapidly [15]. Epidemiologists have been constantly monitoring the evolution of SARS-CoV-2 with a particular focus on the spike protein and other interacting proteins of the virus [15, 16]. The D614G mutation in the S protein, discovered in early 2020, enables the virus to spread more effectively and rapidly [17]. The D614G mutation has been found to be associated with high viral loads in infected patients and a high rate of infection, but not with increased disease severity [18]. 
Various mutations in the S protein make SARS-CoV-2 more complex, and hence it is more difficult to characterize its severity, its infectivity, and the efficacy of vaccines designed to target the S protein. Not all mutations are advantageous to the virus, but several mutations or a set of mutations may increase the transmission potential through an increase in receptor binding or through the ability to evade the host immune response by altering the surface structures recognized by antibodies [19, 20, 21]. To contain the spread of COVID-19, it is of high interest to detect and identify the various unique emerging variants of S proteins. Additionally, it is worth investigating the impact of new S protein variants on viral infectivity and on the potential to spread rapidly, as well as ascertaining the origin of the spread of the new variants with respect to spike protein variability. Accordingly, it might be possible to segregate the set of new variants with respect to individual characteristics of SARS-CoV-2, which would help policy makers to form strategies to contain the spread of the virus. A large number of different SARS-CoV-2 S protein mutant sequences are currently available in the NCBI virus database. In this study, all available S protein sequences from six continents (Asia, Africa, Europe, North America, South America, and Oceania) were analyzed for their uniqueness and variability. An inter-linkage among the unique S proteins available from the six continents was then established.

S protein sequences from all six continents (Asia, Africa, Europe, Oceania, South America, and North America) were downloaded in FASTA format (on May 7, 2021) from the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov/). The FASTA files were then processed in Matlab-2021a to extract the unique S protein sequences for each continent. 
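The extraction of unique S protein sequences from a FASTA file can be illustrated with a minimal Python sketch. This is not the Matlab-2021a routine used in the study; the function names `parse_fasta` and `unique_sequences`, the toy accessions, and the toy sequences are all hypothetical, and uniqueness is assumed here to mean exact string identity of the amino acid sequence.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_sequences(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each distinct sequence to the header of its first occurrence."""
    first_seen: Dict[str, str] = {}
    for header, seq in records:
        # Assumption: two records are "the same" only if identical as strings.
        first_seen.setdefault(seq.upper(), header)
    return first_seen


# Toy FASTA text (made-up accessions and sequences, for illustration only).
toy_fasta = """>seq1
MFVFLVLLPL
>seq2
MFVFLVLLPL
>seq3
MFVFFVLLPL
""".splitlines()

unique = unique_sequences(parse_fasta(toy_fasta))
percent_unique = 100.0 * len(unique) / 3  # share of unique sequences, in percent
```

Keeping the first header seen for each sequence gives one representative accession per unique variant, which is the kind of per-continent list that the percentages of unique sequences are computed from.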
Any protein sequence is composed of twenty different amino acids with various frequencies starting from zero. The probability of occurrence of each amino acid $A_i$ is determined by the formula $\frac{f(A_i)}{l}$, where $f(A_i)$ denotes the frequency of occurrence of the amino acid $A_i$ in a primary sequence, and $l$ is the length of an S protein [22]. Hence, for each S protein, a twenty-dimensional vector of the frequency probabilities of the twenty amino acids can be obtained. This frequency probability reveals which amino acids dominate a given protein. The variability of the amino acid compositions of the unique S-proteins from each continent was evaluated using the web-based tool Composition Profiler (http://www.cprofiler.org/), which automates detection of enrichment or depletion patterns of individual amino acids or groups of amino acids in query proteins [23]. In this analysis, we used sets of unique S-proteins from each continent as query samples and the amino acid composition of the original S-protein (UniProt ID: P0DTC2) as a reference sample that provides the background amino acid distribution. Composition Profiler generates a bar chart composed of twenty data points (one for each amino acid), where bar heights indicate normalized enrichment or depletion of a given residue. The normalized enrichment/depletion is calculated as $\frac{C_{continent} - C_{original}}{C_{original}}$, where $C_{continent}$ is the content of a given residue in the query set of S-proteins from a given continent and $C_{original}$ is the content of the same residue in the original S-protein. For comparison, we generated the composition profile of disordered proteins, where the normalized composition was evaluated as $\frac{C_{DisProt} - C_{PDB}}{C_{PDB}}$ ($C_{DisProt}$ = content of a given amino acid in the set of intrinsically disordered proteins in the DisProt database [24]; $C_{PDB}$ = content of the given residue in the dataset of fully ordered proteins, PDB Select 25 [23]). 
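The frequency-probability vector and the normalized enrichment/depletion described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Composition Profiler implementation: the residue content of a query set is assumed to be the pooled count over all query sequences divided by their pooled length, the enrichment is assumed to follow the same (query minus reference) over reference pattern as the DisProt/PDB formula, and the function names and toy sequences are hypothetical.

```python
from typing import Dict, Iterable, Optional

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # the twenty standard residues


def frequency_probabilities(seq: str) -> Dict[str, float]:
    """Return f(A_i) / l for each of the twenty amino acids (seq must be non-empty)."""
    length = len(seq)
    return {aa: seq.count(aa) / length for aa in AMINO_ACIDS}


def normalized_enrichment(
    query_seqs: Iterable[str], reference_seq: str
) -> Dict[str, Optional[float]]:
    """Return (C_query - C_reference) / C_reference per residue.

    Assumption: the content of a residue in the query set is computed from
    the pooled query sequences. Residues absent from the reference have no
    defined ratio and are reported as None.
    """
    c_query = frequency_probabilities("".join(query_seqs))
    c_ref = frequency_probabilities(reference_seq)
    return {
        aa: (c_query[aa] - c_ref[aa]) / c_ref[aa] if c_ref[aa] > 0 else None
        for aa in AMINO_ACIDS
    }


# Toy example with made-up four-residue "proteins".
vector = frequency_probabilities("AAL")
profile = normalized_enrichment(["AAAL"], "AALL")
```

Positive entries of `profile` correspond to enrichment relative to the reference and negative entries to depletion, matching the sign convention of the bar charts.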
In these analyses, the positive and negative values produced by Composition Profiler indicate enrichment or depletion of the indicated residue, respectively. How conserved or disordered the arrangement of amino acids is over an S protein is quantified by the information-theoretic measure known as Shannon entropy (SE). For a given amino acid sequence of length $l$, the Shannon entropy of amino acid conservation is calculated as follows [25, 26]: $SE = -\sum_{i=1}^{20} p_{s_i} \log_{20}(p_{s_i})$, where $p_{s_i} = \frac{k_i}{l}$ and $k_i$ represents the number of occurrences of an amino acid $s_i$ in the given sequence [27]. The isoelectric point (IP or pI) is the pH at which a molecule carries no net electrical charge, or is electrically neutral in the statistical mean. The theoretical pI is calculated from the pKa values of the amino acids by summing the net charge across the protein at a given pH (the default is the typical intracellular pH of 7.2) and searching for the pH at which the net charge is zero [28]. Note that the isoelectric point of a protein sequence was computed here using the standard routine of Matlab-2021a. This parameter was used to characterize the unique S protein sequences quantitatively.

We first determined the set of unique S protein sequences from each continent. Further, every unique S protein from a continent was compared with the unique S proteins from the other five continents, and the resulting lists are presented in Tables 12-17. The variability of the S proteins from each continent was also shown using Shannon entropy and isoelectric point. Table 1 presents the number of total sequences, the number of unique sequences, and the corresponding percentages. Note that a complete list of unique S protein accessions and their names (continent-wise) is available in supplementary file-1. 
Note that each sequence accession is renamed as Ck, where C stands for the continent code (Asia: AS, Africa: AF, Oceania: O, Europe: U, South America: SA, and North America: NA) and k denotes the serial number. The highest percentage (29.065%) of unique S proteins was found in Africa, although the total number of available sequences there is significantly lower than that from the other continents. Nearly the same percentages of unique S sequence variations were found in Asia and Europe. Among the total of 127760 S proteins embedded in SARS-CoV-2 genomes, only 16143 (12%) unique S proteins have been detected so far, and notably most of the unique variants (87%) were found in North America. For each continent, the unique spike (S) proteins were matched with the unique proteins from the other five continents, and the total numbers of such identical pairs are presented in matrix form (Table 2). From Table 2, it was observed that each continent still has a significant percentage of unique spike variations that are not shared with any of the other continents. These percentages of unique variations of S proteins in Asia, Africa, Europe, Oceania, South America, and North America were 41%, 55%, 28%, 92%, 7%, and 97%, respectively. The lists of pairs of identical S proteins of SARS-CoV-2 originating from the six continents are presented in Tables 9-11 (Appendix-I). The lists of unique S proteins (from a particular continent) that were found to be identical to some unique spike proteins from the other five continents are presented in Tables 12-17 (Appendix-II). The frequency and percentage of invariant residue positions, where no amino acid change has been detected so far in the unique S proteins available in each continent, are presented in Table 3. 
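The cross-continent matching summarized in Table 2 can be sketched as set intersections over the unique sequences of each continent. The following Python sketch is illustrative only: the function names `shared_counts` and `private_percentage` are hypothetical, the four-letter strings stand in for full S protein sequences, and two proteins are assumed to match only when their sequences are identical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def shared_counts(unique_by_continent: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Count identical unique sequences for every pair of continents."""
    return {
        (a, b): len(unique_by_continent[a] & unique_by_continent[b])
        for a, b in combinations(sorted(unique_by_continent), 2)
    }


def private_percentage(unique_by_continent: Dict[str, Set[str]], continent: str) -> float:
    """Percentage of a continent's unique sequences found on no other continent."""
    own = unique_by_continent[continent]
    others = set().union(
        *(seqs for name, seqs in unique_by_continent.items() if name != continent)
    )
    return 100.0 * len(own - others) / len(own)


# Toy data: made-up stand-ins for unique S protein sequences.
toy = {
    "Asia": {"MFVA", "MFVL", "MFVG"},
    "Europe": {"MFVL", "MFVS"},
    "Oceania": {"MFVG", "MFVL"},
}
pairs = shared_counts(toy)
asia_private = private_percentage(toy, "Asia")
```

In this toy example one of the three Asian sequences is private to Asia, so `asia_private` is one third expressed as a percentage; the continent-level percentages quoted in the text are the same quantity computed on the real data.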
The highest number of mutations (lowest percentage of invariant residue positions, 6.99%; Table 3) was detected in the unique S proteins from North America, where 12.42% of the S protein sequences were unique, as mentioned in Table 1. Likewise, the lowest number of mutations (15.95%) in unique S proteins was observed in South America, where 15.3% of the S sequences were unique. In Africa, only 29.14% of the 1273 residues in the unique S proteins were mutated, although Africa had the highest percentage (29.065%) of unique sequences among the six continents. The unique S proteins from Europe possessed only 25.5% mutations, whereas 45.5% mutations were detected in the unique S proteins from Asia, although the same percentage (18.5%) of unique spike proteins was found in both continents (Tables 1 and 3). Further, it was observed that the unique S proteins from Oceania (11.3% of its sequences) possessed 42.58% mutations. Additional information on the variability of the amino acid compositions of the unique S-proteins from each continent relative to the composition of the original S-protein from Wuhan was retrieved using the web-based tool Composition Profiler (http://www.cprofiler.org/). The results of this analysis are shown in Figure 1A, which clearly shows noticeable amino acid composition variability among unique S-proteins from different continents. Since individual S proteins differ from each other and from the original S-protein in only a very limited number of residues, the range of changes in the normalized enrichment/depletion of a given residue is rather limited (compare the scales of the Y axis in Figures 1A and 1B, where a composition profile of the intrinsically disordered proteins is shown for comparison). On average, unique S-proteins from Oceania were found to have the most variability in terms of normalized amino acid composition, followed by the unique S-proteins from North America. 
Curiously, Figure 1A shows that although the normalized content of individual residues in the unique S-proteins from Oceania is always below that of the original S-protein, S-proteins from other continents might have a relative excess of some residues. For example, some unique S-proteins from almost all continents can be enriched in glycine or histidine residues, whereas some European S-proteins can also be relatively enriched in cysteine, isoleucine, tyrosine, phenylalanine, and lysine residues (see positive green bars in Figure 1A). Another interesting observation is that the different sets of S-proteins are typically characterized by rather noticeable variability in the normalized content of most residues. The notable exception is aspartate, whose depletion is almost uniform across the unique S-proteins from all the continents.

We quantitatively determined the variations in the unique S proteins from the six continents. The variations were captured through the frequency distribution of the amino acids present, the Shannon entropy (the amount of conservation of amino acids in a given sequence), and the molecular weights and isoelectric points of a given protein sequence. The frequency of each amino acid was computed for each unique S protein available in the six continents (Supplementary file-2). Maximum and minimum frequencies of amino acids present in the unique S proteins from different continents are presented in Table 4. 
Table 4: Maximum and minimum frequencies of amino acids in the unique S proteins from each continent

Continent       Stat    A    R    N    D    C    Q    E    G    H    I    L    K    M    F    P    S    T    W    Y    V
Africa          Max    80   44   89   62   41   63   49   84   19   79  109   62   15   78   60  101   98   13   56   98
Africa          Min    73   40   85   58   38   59   45   78   14   73  102   57   13   72   55   94   90   11   49   93
Asia            Max    80   44   89   63   41   63   49   84   19   78  110   62   15   79   59  101  101   13   57   98
Asia            Min    73   39   80   55   36   56   45   76   15   72  100   55   13   68   52   90   90   11   49   90
Europe          Max    80   43   89   63   41   63   49   84   19   79  110   62   15   79   59  101   98   13   57   99
Europe          Min    75   38   84   59   39   59   46   79   16   74  102   58   13   74   54   96   90   11   50   93
Oceania         Max    81   43   90   62   41   63   49   84   18   78  109   62   15   79   59  100   98   12   56   99
Oceania         Min    72   37   81   58   36   57   44   74   15   71   97   56   13   71   52   92   88   10   43   89
North America   Max    82   44   91   63   42   64   49   85   20   79  111   64   15   80   60  102   99   13   58  100
North America   Min    60   32   63   46   32   39   34   63   11   55   82   43    9   55   43   76   77    8   36   82
South America   Max    80   43   89   62   41   63   48   83   18   78  109   62   14   79   58  101   98   12   57   98
South America   Min    75   38   82   57   37   59   45   79   16   73  105   57   13   73   57   92   93   11   50   92

All S protein sequences are rich in leucine (L) and serine (S). Tryptophan (W) and methionine (M) were present at the lowest frequencies (Table 4). The widest variation in the frequency distributions of the twenty amino acids over the unique S proteins was found in North America. To quantify the variations in the unique S proteins available in each continent, the differences between the maximum and minimum vectors (20 dimensions) were obtained (Table 5), and then the Euclidean distances between the difference vectors were calculated (Table 6). Based on the distance matrix, a phylogenetic relationship was derived among the continents (Figure 2). Variations based on the frequency distribution of amino acids present in the S proteins make North America (which belongs to the rightmost branch of the tree) distant from the other five continents (Figure 2). Variations among the unique spike proteins from Asia and Oceania turned out to be similar, and the two continents belong to the same level of leaves of the far left branch of the tree. 
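The step from maximum and minimum frequency vectors to a distance matrix can be illustrated with a short Python sketch. It uses the Table 4 rows for Asia, Europe, and North America only, and it stops at the pairwise Euclidean distances; the tree-building step is not reproduced because the text does not state which method was used. The variable names are hypothetical.

```python
import math
from itertools import combinations

# (maximum, minimum) amino acid frequency vectors taken from the Table 4 rows
# for three of the six continents; each vector has 20 entries.
TABLE4 = {
    "Asia": (
        [80, 44, 89, 63, 41, 63, 49, 84, 19, 78, 110, 62, 15, 79, 59, 101, 101, 13, 57, 98],
        [73, 39, 80, 55, 36, 56, 45, 76, 15, 72, 100, 55, 13, 68, 52, 90, 90, 11, 49, 90],
    ),
    "Europe": (
        [80, 43, 89, 63, 41, 63, 49, 84, 19, 79, 110, 62, 15, 79, 59, 101, 98, 13, 57, 99],
        [75, 38, 84, 59, 39, 59, 46, 79, 16, 74, 102, 58, 13, 74, 54, 96, 90, 11, 50, 93],
    ),
    "North America": (
        [82, 44, 91, 63, 42, 64, 49, 85, 20, 79, 111, 64, 15, 80, 60, 102, 99, 13, 58, 100],
        [60, 32, 63, 46, 32, 39, 34, 63, 11, 55, 82, 43, 9, 55, 43, 76, 77, 8, 36, 82],
    ),
}

# Difference vector per continent: element-wise maximum minus minimum.
difference = {
    name: [hi - lo for hi, lo in zip(maxima, minima)]
    for name, (maxima, minima) in TABLE4.items()
}

# Euclidean distance between the difference vectors of each pair of continents.
distance = {
    (a, b): math.dist(difference[a], difference[b])
    for a, b in combinations(TABLE4, 2)
}
```

On these three rows the Asia to Europe distance is much smaller than either distance involving North America, which is consistent with North America sitting on its own branch of the tree.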
Africa and Europe were found to be the closest in terms of variations based on the frequency distribution of amino acids over the unique spike proteins from each continent. The variability of spike proteins from South America bears a distant resemblance to that of Africa/Europe, as estimated in the phylogeny. The frequencies of amino acid distribution in each unique S protein from each continent were plotted (Figure 3). The widest variations in the frequency distribution of amino acids present in S proteins were observed in North America, as shown by the wide band in Figure 4. The individual frequency distributions of amino acids in Asia and Oceania appear very close, as was also observed in the phylogeny (Figure 2). In principle, for a random amino acid sequence, the Shannon entropy (SE) is one. Here, the Shannon entropy of each S protein sequence was computed using the formula stated in section 2.2 (Supplementary file-2). It was found that the highest and lowest SEs of S proteins from all continents were 0.9643 and 0.9594, respectively. That is, the length of the largest interval is 0.005, which is sufficiently small. Also note that the length of the smallest interval was 0.001, which occurred in the SEs of S proteins from South America. Within this range, the widest variation of SEs was noticed among the unique S proteins of North America. The SE intervals (from lowest to highest) of the unique S proteins from the other four continents, Africa, Asia, Oceania, and Europe, were all contained in the interval of North America, and each contains that of South America. Among the $20^{1273}$ possible sequences of length 1273 over the twenty amino acids, nature has selected only a fraction to make S proteins of SARS-CoV-2, and interestingly their SEs are kept within a very small interval. Since the SEs are close to 1, the S protein sequences are expected to be pseudo-random. The variation of SEs for all unique S proteins from each continent is shown in Figures 5 and 6. 
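The Shannon entropy values discussed above can be illustrated with a minimal Python sketch. It assumes that the logarithm is taken to base 20, an assumption chosen to agree with the statement that a random amino acid sequence has an SE of one; the function name `shannon_entropy` and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq: str, alphabet_size: int = 20) -> float:
    """Return -sum(p * log_base(p)) over the residues present in seq.

    p is k / l, the number of occurrences of a residue divided by the
    sequence length. Assumption: the base equals the alphabet size (20),
    so a sequence using all twenty residues equally often scores 1.
    """
    length = len(seq)
    counts = Counter(seq)
    return -sum(
        (k / length) * math.log(k / length, alphabet_size) for k in counts.values()
    )


# Toy sequences: every residue exactly once, and a single repeated residue.
uniform = shannon_entropy("ARNDCQEGHILKMFPSTWYV")
homopolymer = shannon_entropy("LLLLLLLLLL")
```

The two toy cases mark the ends of the scale: equal use of all twenty residues gives the maximum, and a sequence built from one residue gives zero. Real S proteins fall close to the upper end because their composition is only mildly skewed.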
The conservation of amino acids over each S protein differs from one protein to another within each continent, as depicted by the zig-zag nature of the SE plots (Figures 5 and 6). For each S protein sequence from each continent, the isoelectric point (IP) was computed (Supplementary file-3). The intervals (from minimum to maximum) of the IPs of unique spike proteins from each continent are tabulated in Table 8. It was noticed that the IPs of all the unique S proteins from the six continents were distributed between 5.61 and 7.79. The largest interval of IPs was found for the unique S proteins from North America. Therefore, the widest variety of unique S proteins was found in North America. The degree of non-linearity of the plots of IPs for each protein from each continent shows wide variations among the unique S proteins (Figures 7 and 8).

Various mutations in S proteins lead to the evolution of new variants of SARS-CoV-2 [29]. Naturally, our attention was drawn to characterizing the unique S protein variants embedded in the SARS-CoV-2 genomes infecting millions of people worldwide [30]. As of May 7, 2021, the database held 127760 S protein sequences from SARS-CoV-2 genomes with 16143 unique S protein variants, which are undoubtedly well organized in terms of amino acid composition and conservation, as depicted by Shannon entropy and isoelectric point. Among the unique spike proteins present in a continent, many are common to other continents as well (Table 2). On the other hand, there is still a handful of unique spike protein variants residing in each continent. Considering the nature and biological implications of the new variants of SARS-CoV-2 caused by different mutations in S proteins, the appearance of several unique S variants in SARS-CoV-2 is certainly a worrying event [31]. 
There are still many unique S protein variants in all continents that may spread from person to person within close communities or arise by spontaneous mutation, a situation that may become alarming. We observed that unique S proteins from North America have mutations in almost every amino acid residue position (1184 out of 1273), while unique spike variants from the other continents only have mutations in 16 to 20% of residues. So, even if international travel is limited, S proteins from these five continents will likely acquire, through natural evolution, mutations at other residue positions where mutations have already been found in the specific variants from North America. Based on the amino acid frequency distributions in the S protein variants from all the continents, a phylogenetic relationship among the continents was drawn. The phylogenetic relationship implies that unique S proteins from North America are significantly different from those of the other five continents. Therefore, the possibility of the unique variants originating from North America spreading to the other geographic locations by means of international travel is high, and numerous mutations have already been detected in the unique variants from North America. Of note, the infection/herd immunity status of South America may be summarized by the example of Manaus (the capital of Amazonas state in northern Brazil), where from June 2020 to October 2020 SARS-CoV-2 prevalence among the population increased from >60% to >70%, a condition which may mirror acquisition of herd immunity [32]. By January 2021, Manaus had a huge resurgence in cases due to the emergence of a new variant known as P.1, which was responsible for nearly 100% of the new cases [33]. Although the population may by then have reached a high herd immunity threshold, there is still a risk of resurgence through new immunity-escape variants, which raises important questions. 
For example: 1. Is post-infection herd immunity not enough for protection, and should it be combined with vaccination? 2. Will the crucial viral variants (mutations) be listed by the WHO and recommended for inclusion in "next generation vaccines" [34, 35]? In addition, we cannot yet exclude the possibility of serious mutations in the viral RBD emerging in India and the USA [34]. Hence, in the near future, we can expect to experience more new SARS-CoV-2 variants, which might cause third, fourth, fifth, and further waves of COVID-19. Therefore, massive vaccination is necessary to combat COVID-19, and existing vaccines must be reviewed and, if needed, re-engineered based on newly emerging S protein variants.

Table 9: List of pairs of identical spike proteins of SARS-CoV-2 originated from six continents
Table 10: List of pairs of identical spike proteins of SARS-CoV-2 originated from different continents

Table 12: List of spike proteins from Asia, which were found to be identical with spike proteins from other five continents
Spike proteins (Asia) which were found to be identical with spike proteins from other five continents
A1 A71 A115 A171 A207 A239 A280 A344 A388
A8 A76 A121 A173 A210 A244 A282 A345 A391
A12 A77 A122 A174 A211 A245 A283 A348 A394
A14 A78 A126 A175 A212 A247 A284 A351 A395
A15 A85 A127 A177 A213 A249 A286 A354 A396
A19 A89 A128 A178 A214 A253 A291 A356 A399

Table 13: List of spike proteins from Africa, which were found to be identical with spike proteins from other five continents
Spike proteins (Africa) which were found to be identical with spike proteins from other five continents
AF47 AF71 AF88 AF105 AF120 AF133 AF149 AF179 AF231 AF278
AF9 AF48 AF72 AF90 AF108 AF121 AF134 AF151 AF195 AF247 AF283
AF19 AF50 AF73 AF92 AF114 AF123 AF137 AF152 AF196 AF248
AF31 AF51 AF76 AF99 AF115 AF125 AF138 AF154 AF223 AF254

Table 15: List of spike proteins from North America, which were found to be identical with spike proteins from 
other five continents
Spike proteins (North America) which were found to be identical with spike proteins from other five continents
NA7 NA3911 NA4837 NA5595 NA6161 NA6510 NA6810 NA7300 NA8703 NA9792 NA13390
NA231 NA3986 NA4861 NA5606 NA6178 NA6515 NA6816 NA7312 NA8787 NA9834 NA13404
NA377 NA3988 NA4897 NA5627 NA6185 NA6527 NA6848 NA7355 NA8817 NA9891 NA13414
NA389 NA4024 NA4989 NA5644 NA6193 NA6540 NA6857 NA7375 NA8824 NA9910 NA13438
NA390 NA4028 NA5001 NA5645 NA6240 NA6550 NA6862 NA7402 NA9075 NA10257 NA13444
NA402 NA4051 NA5011 NA5666 NA6244 NA6553 NA6903 NA7430 NA9091 NA10276 NA13465
NA902 NA4061 NA5022 NA5687 NA6258 NA6566 NA6916 NA7431 NA9180 NA10312 NA13478
NA928 NA4117 NA5041 NA5693 NA6276 NA6577 NA6936 NA7453 NA9189 NA10342 NA13551
NA992 NA4169

[1] Exploring the genomic and proteomic variations of SARS-CoV-2 spike glycoprotein: a computational biology approach, Infection
[2] Carbon-based nanomaterials: Promising antiviral agents to combat COVID-19 in the microbial resistant era
[3] Possible transmission flow of SARS-CoV-2 based on ACE2 features
[4] Protective face mask filter capable of inactivating SARS-CoV-2, and methicillin-resistant Staphylococcus aureus and Staphylococcus epidermidis
[5] Notable sequence homology of the ORF10 protein introspects the architecture of SARS-CoV-2
[6] A unique view of SARS-CoV-2 through the lens of ORF8 protein
[7] SARS-CoV-2 spike-protein D614G mutation increases virion spike density and infectivity
[8] Human SARS CoV-2 spike protein mutations
[9] Controlling the SARS-CoV-2 spike glycoprotein conformation
[10] The structural basis of accelerated host cell entry by SARS-CoV-2
[11] Emergence in late 2020 of multiple lineages of SARS-CoV-2 spike protein variants affecting amino acid position
[12] Structures and distributions of SARS-CoV-2 spike proteins on intact virions
[13] No evidence for distinct types in the evolution of SARS-CoV-2
[14] Emergence of genomic diversity and recurrent mutations in SARS-CoV-2, Infection
[15] Structural impact on SARS-CoV-2 spike protein by D614G substitution
[16] Epidemiology, virology, and clinical features of severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2; coronavirus disease-19)
[17] The coronavirus is mutating-does it matter?
[18] Tracking changes in SARS-CoV-2 spike: evidence that D614G increases infectivity of the COVID-19 virus
[19] Evaluating the effects of SARS-CoV-2 spike mutation D614G on transmissibility and pathogenicity
[20] SARS-CoV-2 evolution and vaccines: cause for concern?
[21] Emergence of a SARS-CoV-2 variant of concern with mutations in spike glycoprotein
[22] Evolution of amino acid frequencies in proteins over deep time: inferred order of introduction of amino acids into the genetic code
[23] Composition Profiler: a tool for discovery and visualization of amino acid composition differences
[24] DisProt: the database of disordered proteins
[25] Pathogenetic perspective of missense mutations of ORF3a protein of SARS-CoV-2
[26] Missense mutations in SARS-CoV2 genomes from Indian patients
[27] The Shannon information entropy of protein sequences
[28] Determination of the isoelectric point of proteins by capillary isoelectric focusing
[29] Antibody cocktail to SARS-CoV-2 spike protein prevents rapid mutational escape seen with individual antibodies
[30] Identification of SARS-CoV-2 spike mutations that attenuate monoclonal and serum antibody neutralization
[31] A SARS-CoV-2 vaccine candidate would likely match all currently circulating variants
[32] Three-quarters attack rate of SARS-CoV-2 in the Brazilian Amazon during a largely unmitigated epidemic
[33] Five reasons why COVID herd immunity is probably impossible
[34] Will SARS-CoV-2 variants of concern affect the promise of vaccines?
[35] COVID-19 pandemic and vaccination build herd immunity

U92, NA6723) (U3, O5) (NA992, O5) (NA6751, O398) (NA3313, SA1)
(U93, NA6775) (U26, O43) (NA3873, O28) (NA6962, O400) (NA4550, SA5)
(U94, NA6862) (U30, O58) (NA4024, O36) (NA7060, O401) (NA4720, SA7)
(U98, NA7057) (U52, O201) (NA4243, O43) (NA7090, O402) (NA4989, SA11)
(U99, NA7090) (U63, O377) (NA4508, O58) (NA7230, O404) (NA5595, SA13)
(U100, NA7129) (U80, O390) (NA4756, O65) (NA7355, O415) (NA5687, SA18)
(U103, NA7199) (U99, O402) (NA4861, O83) (NA7402, O419) (NA6101, SA19)
(U104, NA7312) (U118, O1032) (NA5011, O105) (NA7510, O422) (NA6146, SA20)
(U106, NA7431) (U181, O1104) (NA5041, O114) (NA7811, O625) (NA6161, SA21)
(U107, NA7557) (NA5188, O148) (NA7832, O631) (NA6185, SA22)
(U111, NA7679) Spike: Europe-South America (NA5194, O201) (NA7845, O633) (NA6299, SA25)
(U112, NA7884) (U9, SA1) (NA5200, O225) (NA7901, O645) (NA6373, SA27)
(U113, NA7914) (U41, SA11) (NA5205, O238) (NA8514, O751) (NA6395, SA28)
(U114, NA9075) (U63, SA13) (NA5372, O368) (NA8646, O770) (NA6396, SA29)
(U116, NA9180) (U80, SA32) (NA5538, O370) (NA8703, O798) (NA6406, SA30)
(U117, NA9189

Figure 7: Isoelectric point of unique S proteins from different continents

The authors have no conflict of interest to declare.