key: cord-0827968-bs8ohz1g authors: Roy, Chayan; Mandal, Santi M.; Mondal, Suresh K.; Mukherjee, Shriparna; Mapder, Tarunendu; Ghosh, Wriddhiman; Chakraborty, Ranadhir title: Trends of mutation accumulation across global SARS-CoV-2 genomes: Implications for the evolution of the novel coronavirus date: 2020-11-05 journal: Genomics DOI: 10.1016/j.ygeno.2020.11.003 sha: c8eead3f72bc539f59bccd0eb8f1cea209e9907b doc_id: 827968 cord_uid: bs8ohz1g To understand SARS-CoV-2 microevolution, this study explored the genome-wide frequency, gene-wise distribution, and molecular nature of all point-mutations detected across its 71,703 RNA-genomes deposited in GISAID till 21 August 2020. Globally, nsp1/nsp2 and orf7a/orf3a were the most mutation-ridden non-structural and structural genes respectively. Phylogeny of 4618 spatiotemporally-representative genomes revealed that entities belonging to the early lineages are mostly spread over Asian countries, including India, whereas the recently-derived lineages are more globally distributed. Of the total 20,163 instances of polymorphism detected across global genomes, 12,594 and 7569 involved transitions and transversions, predominated by cytidine-to-uridine and guanosine-to-uridine conversions, respectively. Positive selection of nonsynonymous mutations (dN/dS >1) in most of the structural, but not the non-structural, genes indicated that SARS-CoV-2 has already harmonized its replication/transcription machineries with the host metabolism, while it is still redefining virulence/transmissibility strategies at the molecular level. Mechanistic bases and evolutionary/pathogenicity-related implications are discussed for the predominant mutation-types. 
Of the 83475 SARS-CoV-2 whole-genome sequences available in the repository of the Global Initiative on Sharing All Influenza Data (GISAID) on 21 August 2020, 42.22% were from the UK, while the rest were from 107 other countries. All these sequences were downloaded together with their metadata, and the dataset was filtered using the Augur tool kit [12] to eliminate undesired sequences. 11723 entries were removed based on the minimum 29000 nucleotide length cut-off (the cut-off was based on the genome size of the reference Wuhan strain NC_045512.2), and another 49 were removed because they originated from non-human sources. In this way, 71703 GISAID entries remained in the final dataset used for further study. For all the present analyses, the 29903 nucleotide long complete whole-genome of the earliest-sequenced SARS-CoV-2 strain from Wuhan (NC_045512.2) was used as the reference sequence. The software package MicroGMT (Microbial Genomics Mutation Tracker) [13] was used to identify modifications in the genome sequences analyzed. This package essentially uses Minimap2 [14] and Bcftools [15] to map individual genomes against the reference and store the results in a Variant Call Format (VCF) table. It further utilizes the SnpEff tool [16] to characterize all the detected mutations at the level of the nucleotide as well as the amino acid in the translated sequence. Although MicroGMT also reports instances of insertion and deletion in the sequences compared, the current study focused only on the point-mutation data, which were further verified as follows. The software MAFFT [17] was used with default options to align all the whole-genome sequences included in the dataset. Polymorphisms (base substitutions) were identified in the individual genomes using the software SNP-sites [18], which specifically identifies single nucleotide polymorphisms (SNPs) from aligned multi-fasta sequence files.
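The verification step described above, in which base substitutions are read off an alignment against the reference, can be illustrated with a minimal Python sketch. It is not the SNP-sites algorithm or its interface: the function names and the toy sequences are hypothetical, and skipping gaps and ambiguous bases is an assumption made for the illustration.

```python
VALID_BASES = set("ACGT")  # assumption: only unambiguous bases are compared


def point_substitutions(reference, genome):
    """Return (position, ref_base, alt_base) tuples for one aligned genome.

    Positions are 1-based alignment columns. Columns holding a gap or an
    ambiguous base in either sequence are skipped.
    """
    if len(reference) != len(genome):
        raise ValueError("aligned sequences must have equal length")
    substitutions = []
    pairs = zip(reference.upper(), genome.upper())
    for position, (ref_base, alt_base) in enumerate(pairs, start=1):
        if ref_base not in VALID_BASES or alt_base not in VALID_BASES:
            continue
        if ref_base != alt_base:
            substitutions.append((position, ref_base, alt_base))
    return substitutions


def polymorphic_positions(reference, genomes):
    """Return the sorted alignment columns that vary in at least one genome."""
    positions = set()
    for genome in genomes:
        for position, _, _ in point_substitutions(reference, genome):
            positions.add(position)
    return sorted(positions)


# Toy alignment (hypothetical sequences, not real SARS-CoV-2 data)
toy_reference = "ACGTACGT"
toy_genomes = ["ACTTACGT", "ACGTACGA"]
print(polymorphic_positions(toy_reference, toy_genomes))  # [3, 8]
```

The number of polymorphic columns found this way corresponds to the count of polymorphism instances that enters the frequency calculation later in the methods.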
Subsequently, the VCF file generated from the SNP-sites analysis was processed using the software VCFtools [19] to enumerate all transition and transversion events within the dataset of aligned whole-genome sequences. Frequency of point mutations (M_f) in the SARS-CoV-2 pan-genome, or a given segment (locus) of the pan-genome, was calculated as M_f = P_i / (L_n × N_s), where P_i is the number of instances of polymorphism detected within the genome/locus, L_n is the nucleotide length of the genome/locus, and N_s is the number of sequenced entities present in the dataset. dN/dS (also known as ω or Ka/Ks), which is the ratio between the rates of nonsynonymous (dN) and synonymous (dS) mutations, was determined for all the individual genes of SARS-CoV-2, based on likelihood analysis using the software package HyPhy [20]. Sequence similarities between SARS-CoV-2 genome pairs were computed using the software FastANI, which uses a high throughput method for average nucleotide identity analysis [21]. The evolutionary relationship between the existing SARS-CoV-2 lineages was inferred from a phylogenetic tree constructed based on a subset of the 71703 whole-genome sequences used for studying mutation accumulation trends. Sub-sampling was necessary because it is not possible to meaningfully display 71703 sequences in a single phylogenetic tree. This sub-dataset, comprising 4618 complete whole-genome sequences, was created using the software package Augur [12], by including (in an unbiased way) 150 genomes per geographical region (continent) per month since the first Wuhan strain was sequenced (NC_045512). Multiple sequence alignment was also created using the Augur tool kit of the Nextstrain package.
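The point-mutation frequency defined earlier in this section (polymorphism count divided by the product of locus length and number of genomes) can be checked numerically. The following is a minimal Python sketch; the function name is hypothetical, and the input counts are the pan-genome and 3' UTR values listed in the table rows reproduced at the end of this document.

```python
def mutation_frequency(polymorphisms, locus_length, n_sequences):
    """Frequency of point mutations: P_i / (L_n x N_s)."""
    if locus_length <= 0 or n_sequences <= 0:
        raise ValueError("locus length and sequence count must be positive")
    return polymorphisms / (locus_length * n_sequences)


N_SEQUENCES = 71703  # genomes retained in the final dataset

# (P_i, L_n) pairs taken from the table rows that survive in this document
pangenome_mf = mutation_frequency(20163, 29903, N_SEQUENCES)
utr3_mf = mutation_frequency(279, 229, N_SEQUENCES)

print(f"pan-genome M_f = {pangenome_mf:.2e}")  # 9.40e-06
print(f"3' UTR M_f     = {utr3_mf:.2e}")       # 1.70e-05
```

Both results agree with the tabulated frequencies (9.4 × 10^-6 and 1.70 × 10^-5), which suggests that the table normalizes each locus by its own length and by the full set of 71703 genomes.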
The alignment was then processed using the software IQ-TREE 2, which uses the maximum likelihood method for tree construction [22]; the Generalised Time Reversible (GTR) model was followed to construct the phylogenetic tree, which was finally visualized in the software Auspice (https://auspice.us). For the labeling of clades in the phylogenetic tree, type-defining marker mutations were downloaded from the Nextstrain GitHub repository, which comes as a package within the Nextstrain tool (https://github.com/nextstrain/ncov). The rules of clade-labeling followed were those mentioned in the website located at https://nextstrain.github.io/ncov/naming_clades.html. Thus, clades were labeled based on the geographical origin of the sequences, plus three different concepts of clade nomenclature that are in use for COVID-19, namely (i) the dynamic clade nomenclature system PANGOLIN [23], (ii) the Year-Letter nomenclature system proposed by Hodcroft et al. (https://nextstrain.org/blog/2020-06-02-SARSCoV2-clade-naming), and (iii) the system proposed by Tang et al. [24], and followed by GISAID, which names major clades based on nine distinct marker mutations spread over 95% of the known SARS-CoV-2 diversity. In order to elucidate the biogeography and microevolution of SARS-CoV-2 in India, an India-focused (GISAID-derived) dataset was analyzed using the same methodology as the one described above for the global phylogenetic tree, following which the Indian sequences were mapped as per their clade affiliation and indicated using the GISAID and Year-Letter clade nomenclature systems.
Average nucleotide identity (ANI, for a k-mer size of 16, over a fragment size of 1000 nucleotides) and sequence length coverage, computed for all the pairwise alignments possible between the 11189 complete whole-genome sequences available simultaneously in the GISAID and NCBI SARS-CoV-2 databases (https://www.ncbi.nlm.nih.gov/sars-cov-2/) on 21 August 2020, showed that in all cases both identity and coverage were between 99 and 100% (notably, ANI calculation was not possible for all the 71703 GISAID genomes retrieved on 21 August 2020). Whilst individual SARS-CoV-2 genomes differed only by a few nucleotides, the small sequence divergences across geographies indicated that within the short time span of the current pandemic the pan-genome has diversified, and the quasispecies reservoir has expanded, rapidly for this novel coronavirus. This holds major implications for the adaptation of the virus within human hosts, and in doing so has serious consequences for the resultant pathogenesis, disease complications, and control [25]. The overall evolutionary paths traced thus far by SARS-CoV-2 were delineated by labeling the 4618 global (GISAID) sequences on the phylogenetic tree using three different concepts of clade nomenclature defined in the web-based resource https://nextstrain.github.io/ncov/ (Figures 1A-1C). Information regarding the geographical origin of the sequences analyzed was also used to label the tree (Figure 1D). Figure 1A, where the tree topology was labeled according to the dynamic clade nomenclature system [23] called Phylogenetic Assignment
Consistent with the above phylogenetic interpretations, labeling of the tree with the third clade-nomenclature convention proposed by Tang et al.
[24] and also followed by GISAID, indicated that the two original lineages, named as S and L (essentially equivalent to 19A and 19B of the Year-Letter nomenclature system), have diversified and thus far given rise to a total of seven clades, based on nine distinct marker mutations spread over 95% of the known SARS-CoV-2 diversity (Figure 1C). As per the data available till 21 August 2020, Clade L is apparently more populous than Clade S, and has diversified further into V and G, with G
Figure 1D). India being the latest hotspot of the COVID-19 pandemic, recording >50000 cases of infection and >700 cases of fatality daily between late July and mid-October 2020 (https://www.worldometers.info/coronavirus/country/india/), the phylogeny and biogeography of Indian SARS-CoV-2 isolates were analyzed using the specialized (GISAID-derived) dataset encompassing 1148 and 4630 genome sequences of Indian and global origins respectively. The phylogenetic tree topology obtained with this India-focused dataset (Figures 1E and 1F) was essentially congruent with that obtained for the global dataset of 4618 GISAID sequences (Figures 1A-1D). Mapping of the Indian sequences on this tree topology using the GISAID (Figure 1E) and Year-Letter (Figure 1F) clade nomenclature systems showed that all the mutation-types which epitomize the major clades of global SARS-CoV-2 evolution are also present in India, albeit at potentially different frequencies of distribution within the country's viral population. For instance, the relatively lower number of sequences populating the two emerging lineages 20A/20268G and 20A/15324T can be clearly seen in Figure 1F which, in turn, corroborated the hypothesis that in the Asian countries the ancestral lineages are still more prevalent than the recently-derived mutational groups.
Table S1). This distribution showed that 53.5% (i.e.
16002 / 29903) of the SARS-CoV-2 pan-genome has developed polymorphism via generation of small but definite mutations across the plethora of strains disseminated globally since the COVID-19 outbreak in December 2019.
genes SARS-CoV-2 has experienced strong selection pressure over a short period of time. For animal viruses, in general, forces of selection (fitness constraints) emanate from host immunogenic responses, and also arise during replication and transmission between hosts. Evolutionarily fit (selected) strains develop tropism, infect different cell-types or tissues of the host, reproduce within them, and in turn give rise to a variety of new strains having diverse chronic to acute infectious characteristics [26, 27]. Genomic data can reveal where, when, and (sometimes) how viral pathogens have responded to various forces of natural selection. In the context of codon models, natural selection of any genetic locus is typically measured using the parameter dN/dS, which represents the ratio between the global rates of nonsynonymous (dN) and synonymous (dS) mutation accumulation in that locus.
to infect host cells via evading the immune system (specifically, the innate immune system), and eventually induce apoptotic pathways [29-35]. Consequently, brisk amino acid changes in these protein sequences may well be instrumental in allowing the virus to innovate newer techniques to fulfil its pathogenic objectives. From a holistic evolutionary perspective based on the above considerations, SARS-CoV-2 seems to have already succeeded in stably synchronizing its replication and transcription machineries with the host's metabolic environment (as its non-structural genes are clearly recruiting fewer missense mutations).
The virus, however, by means of actively recruiting more missense mutations in its structural genes, is still testing newer biophysical options to increase the efficiency of its molecular contrivances for virulence and transmissibility (pathogenicity). Of the total 12594 transition mutations encountered, the largest share, i.e. 3783, featured C→U conversion, which was 30% of the total transition count (Table 1). Individually, again, most of the SARS-CoV-2 genes were found to have C→U conversion as the predominant transition type across the global genomes analyzed; only in nsp16, gene E, orf6 and orf8 was U→C most prevalent (Table 1). Of the 7569 transversions detected across global SARS-CoV-2 genomes, the largest share, 2414 (31.9%), featured G→U conversion (Table 1). Individually, all the SARS-CoV-2 genes had G→U conversion as the most predominant transversion type. Since RNA viruses encode their own genome replication machineries (and do not depend on the hosts' replication systems as the DNA viruses do), they can optimize their mutation rates to achieve evolutionary fitness. This leads to an unrelenting generation of genomic variants for any RNA virus, alongside a rivalry among the extant variants, including the more advanced ones that are added to the viro-diversity over time [36]. Consequently, all active genomic variants maintained within global/local RNA virus populations (quasispecies) come to possess equal abilities to replicate and complete the infection cycle [36]. In this context, the divergence of several lineages and sub-lineages of SARS-CoV-2 since the December-2019 outbreak (via generation of small mutations across its world-wide strains), alongside the more or less efficient circulation of its two original major lineages (clades indicated as S and L in Figure 1) across distinct geographies, reflects the equivalent pathological and evolutionary fitness of all its extant quasispecies.
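The transition and transversion figures quoted above rest on a simple classification: a substitution within the purines (A, G) or within the pyrimidines (C, U) is a transition, and any purine-pyrimidine exchange is a transversion. The Python sketch below illustrates that rule and recomputes the two reported shares. The function names are hypothetical and the code does not reproduce the VCFtools procedure used in the study.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CU")


def classify_substitution(ref, alt):
    """Label a single-base change as 'transition' or 'transversion'.

    DNA-style input is accepted: T is treated as U.
    """
    ref = ref.upper().replace("T", "U")
    alt = alt.upper().replace("T", "U")
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def share(count, total):
    """Percentage of `total` contributed by `count`."""
    return 100.0 * count / total


# Counts quoted in the text
c_to_u_share = share(3783, 12594)  # C>U among all transitions
g_to_u_share = share(2414, 7569)   # G>U among all transversions

print(classify_substitution("C", "U"), round(c_to_u_share, 1))  # transition 30.0
print(classify_substitution("G", "U"), round(g_to_u_share, 1))  # transversion 31.9
```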
This rich stock of genotypic, and therefore potentially phenotypic, variants is likely to hold major implications for potential multifaceted adaptations of this novel coronavirus within human hosts, and in doing so has serious consequences for the resultant pathogenesis, disease complications and control [25]. Viruses that have evolved to survive via changing their hosts are extremely skilled molecular manipulators; the key to their ecological fitness is attributed to their ability to subvert host defense systems to ensure survival, replication and proliferation [37]. Coronavirus-encoded accessory proteins, in general, play critical roles in virus-host interactions and modulation of host-immune responses, thereby contributing to their pathogenicity [38, 39]. nsp1 and nsp2 are the most mutation-prone non-structural genes of SARS-CoV-2, as they have the highest M_f values among all such genes. Nsp1 is known to inhibit translation by binding to the host's 40S ribosomal subunit, and also to inhibit IFN signaling, while Nsp2 inhibits the two host proteins prohibitin 1 and prohibitin 2 to disrupt the cellular environment [33]. Copious mutations in these two genes, therefore, can help the virus innovate novel molecular routes to evade host immunogenic response. With regard to the 16 non-structural genes of SARS-CoV-2 it is remarkable that only nsp11 has a dN/dS value >1. The exact function of Nsp11 is not known. However, in Arterivirus, this protein has been characterized as a Nidoviral uridylate-specific endoribonuclease (NendoU) that is associated with RNA processing [29]. So, a dN/dS value >1
Orf3a of SARS-CoVs has pro-apoptotic activity [41]; very recent studies further implicated this protein of SARS-CoV-2 in inducing the extrinsic apoptotic pathway through a unique membrane-anchoring strategy [34]. In view of these key roles of Orf3a in SARS-CoV-2 pathogenicity, and
Figure S1).
In all SARS-CoVs, the type I membrane protein encoded by this gene is known to interact with bone marrow stromal antigen-2 (BST-2) and may play a role in viral assembly or budding events unique to SARS-CoVs [33]. Budding events are central to the transmissibility of SARS-CoV-2, so recruitment of copious mutations, especially nonsynonymous ones, in this structural gene (Figure 3) affords novel molecular options to increase the efficiency of virulence (pathogenicity) of the virus. In view of the predominance of C→U transitions and G→U transversions in the global mutation spectrum of SARS-CoV-2 (as compared to all other transition and transversion mutations respectively) it seems likely that in the ecological context of this novel coronavirus some physicochemical and/or biochemical mutagen is more instrumental in bringing about this selective change, over and above the general replication error-induced mechanism of mutagenesis. Cytosine can convert to uracil through processes akin to hydrolytic deamination under the action of ultra-violet (UV) irradiation, which is well established in the context of DNA [42]. C→U conversion is also possible chemically under the mediation of bisulfite reagents [43] that are frequently used as disinfectants, antioxidants and preservative agents. Incidentally, several control techniques involving heating, sterilization, ultraviolet germicidal irradiation (UVGI) [44] and/or chemical disinfectants [45] are being used currently to reduce the risk of viral infection from contaminated surfaces. Of these, intense UV-C irradiation is at the forefront of our fight against COVID-19, so indiscriminate use of the same may well accelerate the incidence of C→U mutations in global SARS-CoV-2 genomes.
Furthermore, UV's specificity for targeting two adjacent pyrimidine nucleotides is long known [46], while in the context of DNA, UV-induced signature mutations collated from existing data on cells exposed to UVC, UVB, UVA or solar simulator light have been confirmed as C→T in ≥ 60% dipyrimidine sites, of which again ≥ 5% is CC→TT [47]. In consideration of the above facts, it seems likely that
prone to mistranslation [48]. It is therefore conceivable that SARS-CoV-2, in addition to classical mutations acquired from error-prone replication at the genomic level, uses the mistranslated replication-cum-transcription complex for the development of diverged genomic lineages [49, 50]. In other words, when the infecting virus discharges its positive-sense RNA genome into the host cell, errors in the RdRP crop up via mistranslation [51, 52]; the consequent blend of wild-type and altered RdRP enzymes, through its replication activities, gives rise to a range of viral genome variants or quasispecies, even within a single transmission event [50]. Those variants which have the best viral fitness eventually endure and become predominant in the population. In this context, it is further noteworthy that both tautomeric and anionic Watson-Crick (W-C)-like mismatches can increase the recruitment of replication and translation errors [53, 54]. A sequence-dependent kinetic network connects G•T/U wobbles with three particular W-C-like mismatches comprising two rapidly exchanging tautomeric species (G^enol•T/U ⇌ G•T^enol/U^enol, population <0.4%) and one anionic species (G•T^−/U^−, population ≈0.001% at neutral pH) [55]. The array of highly glycosylated spike (S) proteins present on the surface of SARS-CoV-2 binds to the host cell receptor called angiotensin-converting enzyme 2 (hACE2), and upon activation by a Type II transmembrane serine protease located on the host cell membrane, facilitates viral entry into the cell [56].
Owing to its crucial role in SARS-CoV-2 infection the spike constitutes a key target for vaccine and drug development against COVID-19 [57-59], and for the same
Like most other SARS-CoV-2 genes, C→U and G→U were also found to be the most dominant transition and transversion types across global S gene homologs (Table S2). In all the other structural and nonstructural genes as well, nonsynonymous amino acid substitution-yielding C→U mutations exhibited a global propensity for replacing proline with serine or leucine residues (Supplementary File 2, Table S2). Furthermore, a significant positive correlation (R = 0.97; P = 0.00001) was observed between the number of missense C→U mutations accumulating in a gene and the number of proline replacements occurring in the corresponding translated sequence. In a protein sequence, when proline is replaced by serine or leucine a strong helix breaker is removed and replaced by a residue indifferent to helix formation [63]. In the process the conformational freedom of the protein is increased, which in the context of the spike could imply more versatile infectivity. The core of the SARS-CoV-2 receptor-binding domain (RBD) consists of a five-stranded antiparallel β sheet (β1, β2, β3, β4 and β7) connected by short helices and loops. The receptor-binding motif (RBM), which mediates the contact with ACE2, lies between the β4 and β7 strands of the RBD core [64]. There are
Table S3). Lysine and arginine have similar size and charge, so their interchange may cause minimal secondary structure rearrangement [68]; but how such changes eventually influence the salt-bridge interaction with Asp30 of hACE2 is still unclear. Likewise, how the Lys417Asn substitution alters the spike-hACE2 interaction paradigm as a whole is also completely unknown.
Whereas the jury is still out on the biophysical significance of the global array of missense mutations in the spike, they surely pose matters of concern for drug designers and vaccine developers worldwide. The current investigation of 71703 complete whole-genome sequences of SARS-CoV-2 isolates from across the world brought to the fore a number of remarkable aspects of the microevolution of this novel coronavirus. Phylogenomic analysis illustrated that the two major-
Highlights or key findings of the paper
- SARS-CoV-2 microevolution was studied alongside the global trends of point-mutation in a universal dataset of 71,703 genomes.
- Globally, nsp1/nsp2 and orf7a/orf3a were the most mutation-ridden non-structural and structural genes respectively.
- Whole-genome phylogeny revealed that entities belonging to the early lineages are mostly spread over Asian countries (including India) whereas the recently-derived lineages are more globally distributed.
- A transition:transversion ratio of 2.66 characterized the nucleotide substitution bias of SARS-CoV-2, with cytidine-to-uridine and guanosine-to-uridine conversions being the predominant transition and transversion types respectively.
- In the pan-genome, cytidine-to-uridine mutations yielding nonsynonymous amino acid replacements have a propensity for changing hydrophilic residues to hydrophobic ones.
- Nonsynonymous mutations are under positive selection in most of the structural, but not non-structural, genes.
[1] Li Wenliang. The Lancet
[2] A new coronavirus associated with human respiratory disease in China
[3] Discovery of a novel coronavirus associated with the recent pneumonia outbreak in humans and its potential bat origin
[4] Full genome evolutionary analysis of the novel corona virus (2019-nCoV) rejects the hypothesis of emergence as a result of a recent recombination event
[5] Viral metagenomics revealed Sendai Virus and coronavirus infection of Malayan Pangolins (Manis javanica)
[6] Genomic characterization and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
[7] Safety and immunogenicity of the ChAdOx1 nCoV-19 vaccine against SARS-CoV-2: a preliminary report of a phase 1/2, single-blind, randomised controlled trial
[8] Evolutionary analysis of SARS-CoV-2: How mutation of non-structural protein 6 (NSP6) could affect viral autophagy
[9] Spike mutation pipeline reveals the emergence of a more transmissible form of SARS-CoV-2
[10] Mutations in spike protein of SARS-CoV-2 modulate receptor binding, membrane fusion and immunogenicity: An Insight into viral tropism and pathogenesis of COVID-19
[11] Structural and functional basis of SARS-CoV-2 entry by using human ACE2
[12] Nextstrain: real-time tracking of pathogen evolution
[13] MicroGMT: A mutation tracker for SARS-CoV-2 and other microbial genome sequences
[14] Minimap2: pairwise alignment for nucleotide sequences
[15] A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data
[16] A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2
[17] MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform
[18] SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments
[19] The variant call format and VCFtools
[20] HyPhy: hypothesis testing using phylogenies
[21] High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries
[22] IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era
[23] A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
[24] On the origin and continuing evolution of SARS-CoV-2
[25] Complications of RNA heterogeneity for the engineering of virus vaccines and antiviral agents
[26] Human immunodeficiency virus type 1 cellular entry and exit in the T lymphocytic and monocytic compartments: mechanisms and target opportunities during viral disease
[27] Evolution of viral genomes: interplay between selection, recombination, and other forces
[28] The population genetics of dN/dS
[29] Structural biology of the arterivirus nsp11 endoribonucleases
[30] Coronavirus envelope protein: current knowledge
[31] Biochemical characterization of SARS-CoV-2 nucleocapsid protein
[32] The ORF6, ORF8 and nucleocapsid proteins of SARS-CoV-2 inhibit type I interferon signaling pathway
[33] The Proteins of Severe Acute Respiratory Syndrome Coronavirus-2 (SARS CoV-2 or n-COV19), the Cause of COVID-19
[34] The ORF3a protein of SARS-CoV-2 induces apoptosis in cells
[35] Coding potential and sequence conservation of SARS-CoV-2 and related animal viruses
[36] Molecular quasispecies
[37] Viral evasion and subversion of pattern-recognition receptor signaling
[38] SARS coronavirus accessory proteins
[39] SARS-CoV-2 Inflammatory syndrome. Clinical features and rationale for immunological treatment
[40] Apoptosis in animal models of virus-induced disease
[41] Severe acute respiratory syndrome coronavirus 3a protein activates the mitochondrial death pathway through p38 MAP kinase activation
[42] Accelerated deamination of cytosine residues in UV-induced cyclobutane pyrimidine dimers leads to CC→TT transitions
[43] Discovery of bisulfite-mediated cytosine conversion to uracil, the key reaction for DNA methylation analysis: A personal account
[44] Inactivation of viruses on surfaces by ultraviolet germicidal irradiation
[45] Effect of the GC content of DNA on the distribution of UVB-induced bipyrimidine photoproducts
[46] Mutagenic specificity of ultraviolet light
[47] UV signature mutations
[48] Translational fidelity and mistranslation in the cellular response to stress
[49] Protein mistranslation: friend or foe?
[50] Errors in translational decoding: tRNA wobbling or misincorporation?
[51] Organisms with alternative genetic codes resolve unassigned codons via mistranslation and ribosomal rescue
[52] Mitovirus UGA(Trp) codon usage parallels that of host mitochondria
[53] The spontaneous replication error and the mismatch discrimination mechanisms of human DNA polymerase β
[54] New structural insights into translational miscoding
[55] Dynamic basis for dG•dT misincorporation via tautomerization and ionization
[56] Functional assessment of cell entry and receptor usage for SARS-CoV-2 and other lineage B betacoronaviruses
[57] The SARS-CoV-2 vaccine pipeline: an overview
[58] Potential rapid diagnostics, vaccine and therapeutics for 2019 novel coronavirus (2019-nCoV): a systematic review
[59] SARS-CoV-2 spike protein: an optimal immunological target for vaccines
[60] Confounded cytosine! Tinkering and the evolution of DNA
[61] Residue mutations and their impact on protein structure and function: detecting beneficial and pathogenic changes
[62] Mutation patterns of human SARS-CoV-2 and Bat RaTG13 coronavirus genomes are strongly biased toward C>U transitions, indicating rapid evolution in their hosts
[63] Alpha-helical, but not beta-sheet, propensity of proline is determined by peptide environment
[64] Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor
[65] Structural basis of receptor recognition by SARS-CoV-2
[66] Conformational perturbation of SARS-CoV-2 spike protein using N-acetyl cysteine, a molecular scissor: A probable strategy to combat COVID-19
[67] Role of changes in SARS-CoV-2 spike protein in the interaction with the human ACE2 receptor: An in silico analysis
[68] Amino acid properties and consequences of substitutions
Ranadhir Chakraborty: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Supervision, Validation, Visualization, Writing - original draft preparation
Data curation, Formal analysis, Investigation, Methodology, Supervision, Validation, Visualization, Writing - original draft preparation.
Chayan Roy: Data curation
Mandal: Data curation, Formal analysis, Investigation, Validation.
Suresh K. Mondal: Formal analysis, Investigation, Validation.
Shriparna Mukherjee: Formal analysis, Investigation, Validation
The authors declare that they have no conflict of interest.
ND = not determined
NA = not applicable
dN = rate of missense (non-synonymous) mutation accumulation (ratio between the number of non-synonymous mutations and non-synonymous sites)
dS = rate of synonymous mutation accumulation (ratio between the number of synonymous mutations and synonymous sites)
Locus (length in nucleotides): four transition counts (total transitions); eight transversion counts (total transversions); total polymorphisms; M_f
(117): 10, 8, 16, 14 (48); 6, 3, 1, 3, 2, 3, 9, 2 (29); 77; 9.18 × 10^-6
3' UTR (229): 37, 29, 34, 30 (130); 19, 15, 17, 12, 9, 21, 43, 13 (149); 279; 1.70 × 10^-5
Pangenome (29903): 3174, 2350, 3783, 3287 (12594); 969, 877, 967, 865, 256, 520, 2414, 701 (7569); 20163; 9.4 × 10^-6
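The dN and dS definitions in the legend above (mutations per available site, for non-synonymous and synonymous changes respectively) can be turned into a small counting-based sketch. This is a simplification for illustration only: the study itself estimated dN/dS by likelihood analysis in HyPhy, and the function names and example counts below are hypothetical, not values from the paper.

```python
from typing import Optional


def rate(mutations: int, sites: float) -> float:
    """Mutations per available site, following the legend's definition."""
    if sites <= 0:
        raise ValueError("number of sites must be positive")
    return mutations / sites


def dn_ds(nonsyn_mutations: int, nonsyn_sites: float,
          syn_mutations: int, syn_sites: float) -> Optional[float]:
    """Return dN/dS, or None when no synonymous change is observed."""
    dn = rate(nonsyn_mutations, nonsyn_sites)
    ds = rate(syn_mutations, syn_sites)
    if ds == 0:
        return None  # ratio undefined; would be reported as not determined
    return dn / ds


def selection_label(omega: Optional[float]) -> str:
    """Conventional reading of dN/dS relative to 1."""
    if omega is None:
        return "not determined"
    if omega > 1:
        return "positive selection"
    if omega < 1:
        return "purifying selection"
    return "neutral"


# Hypothetical counts for an illustrative gene (not taken from the paper)
omega = dn_ds(nonsyn_mutations=60, nonsyn_sites=300.0,
              syn_mutations=10, syn_sites=100.0)
print(omega, selection_label(omega))
```

Counting-based ratios of this kind ignore multiple hits and phylogenetic structure, which is one reason likelihood-based estimates may differ from them.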