key: cord-0805254-rlw166s1 authors: Mohammadi, Elmira; Shafiee, Fatemeh; Shahzamani, Kiana; Ranjbar, Mohammad Mehdi; Alibakhshi, Abbas; Ahangarzadeh, Shahrzad; Beikmohammadi, Leila; Shariati, Laleh; Hooshmandi, Soodeh; Ataei, Behrooz; Javanmard, Shaghayegh Haghjooy title: Novel and emerging mutations of SARS-CoV-2: Biomedical implications date: 2021-04-23 journal: Biomed Pharmacother DOI: 10.1016/j.biopha.2021.111599 sha: 59b64bd02be09bef9ebb17acae962c979265d4df doc_id: 805254 cord_uid: rlw166s1
Coronavirus disease-19 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). SARS-CoV-2 strains show geographical diversity associated with differences in severity, mortality rate, and response to treatment, as characterized by phylogenetic network analysis of SARS-CoV-2 genomes. Although there is no explicit and integrative explanation for these variations, the genetic arrangement and stability of SARS-CoV-2 are basic contributing factors to its virulence and pathogenesis. Hence, understanding these features can help predict the future transmission dynamics of SARS-CoV-2 infection and support drug and vaccine development. In this review, we discuss the most recent findings on mutations in SARS-CoV-2, which provide valuable information on its genetic diversity, especially for DNA-based diagnosis, antivirals, and vaccine development for COVID-19.
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is an enveloped RNA virus that belongs to the Betacoronavirus genus. This genus includes zoonotic RNA viruses that have led to recent important epidemics: SARS (Severe Acute Respiratory Syndrome) in 2002 and MERS [...]
Genome size seems to have a negative correlation with mutation rate. The high mutation rate of RNA viruses is related to the short length of the genome. Some DNA viruses with larger genome sizes encode a DNA repair protein.
RNA-dependent RNA polymerase (RdRp) is the other factor behind the higher mutation rate of RNA viruses. Unlike many polymerases involved in the replication of DNA viruses, RdRp does not have proofreading activity. This enzyme cannot correct mistakes that occur during viral replication because it lacks a 3' exonuclease domain. The reverse transcriptase (RT) enzyme of retroviruses has the same property, which permits them to mutate and evolve very fast (2). Unlike most RNA viruses, members of the order Nidovirales, including Roniviruses, Toroviruses, and Coronaviruses, have an RdRp-independent proofreading activity. They encode a 3' exonuclease in addition to RdRp and thus show lower mutation rates. This feature seems to be a critical factor in explaining how these viruses maintain larger genomes (26 kb) than other RNA viruses. It should be noted that, besides viral factors, host-encoded factors such as double-stranded RNA-dependent adenosine deaminases (ADARs), apolipoprotein B mRNA editing catalytic polypeptide-like enzymes (APOBEC), and uracil DNA glycosylases (UNG) strongly influence the viral mutation rate (12). Finally, the viral mutation rate is determined by multiple viral and host mechanisms, including proofreading activity, environmental changes, genome size, polymerase fidelity, replication mechanisms, post-replicative repair, sequence background, template secondary structure, spontaneous nucleic acid damage, editing by host-encoded deaminases, and imbalances in nucleotide pools (2).
Wu et al. reported the first genomic sequence of SARS-CoV-2 from a worker of the Wuhan market on 26 December 2019 (46). This sequence has been available in the National Center for Biotechnology Information (NCBI) GenBank (47) since January 2020 under accession number NC_045512.
The arrangement of this positive single-stranded RNA genome sequence is [...] shutoff factor, which suppresses innate immunity in the host, binds to human ribosomal complexes containing the 40S subunit, including the 43S pre-initiation complex and the non-translating 80S ribosome (49). Nsp2 and S protein have shown a significant stimulatory effect on type I interferon (IFN) induction (50). Two proteases are encoded in the SARS-CoV-2 genome. One of them is a papain-like protease (PLpro) encoded in nsp3, which cleaves nsp1, nsp2, and nsp3. The other is the 3-chymotrypsin-like "main" protease (3CLpro) encoded in nsp5, which processes 13 other non-structural proteins after production (51). Nsp7 and nsp13 have been linked to the T-cell immune response (52). Nsps 12-16, which encode RNA-dependent RNA polymerase (RdRp), helicase, 3′-to-5′ exonuclease, endoRNase, and 2′-O-ribose methyltransferase, respectively, are linked with virus replication and transcription. The structural region is another part of the SARS-CoV-2 genome and encodes four major structural proteins: Spike (S), Membrane (M), Envelope (E), and Nucleocapsid (N). The S protein binds to its receptor, angiotensin-converting enzyme 2 (ACE2), and mediates fusion with the host cell membrane (53). ORF3a, located after S in the SARS-CoV-2 genome, encodes an essential accessory protein. This protein can induce apoptosis, but the pro-apoptotic activity of ORF3a in SARS-CoV-2 is weaker than that of SARS-CoV ORF3a. This is probably associated with mild symptoms or asymptomatic infection during the early stages, which allows this virus to spread more widely (54). The E protein is the least understood of the four structural proteins and plays a crucial role in ER-Golgi localization and tight junction disruption (55). The M protein is the most abundant structural protein in coronaviruses; it spans the virion membrane and, together with the E protein, has an important role in virus assembly.
ORF6, ORF8, and nucleocapsid (N) proteins are other proteins that inhibit the type I interferon signaling pathway. This interferon is a critical factor in the host's innate antiviral immune response (56, 57).
Since the inception of the COVID-19 pandemic, exploration of SARS-CoV-2 genetic variation has been a notable subject globally, promoting the development of vaccines and diagnostics (58). A comprehensive study analyzed 12,343 SARS-CoV-2 genome sequences isolated from around the world and also investigated the correlation of variants with fatality rates in several countries. Using hierarchical clustering on 16 common amino acid mutations, 28 countries were classified into three clusters (62) [...] For protein M, A2S, V70I, and T175M were reported in 6 genomes, and EL37R and P71L were reported in just 3 genomes (63). According to another study, the D614G variant was reported as a highly frequent mutation in several European countries as well as in Turkey and Iran (64-67). Studies have shown that D614G in the S protein has become dominant during the pandemic and is associated with greater infectivity, transmission, and viral load (68). Some studies report that patients infected with the spike 614G variant show higher mortality or clinical severity, and that 614G is associated with higher viral load, especially in younger patients (69). Other studies show that this mutation results in an open conformation of the spike receptor-binding domain (RBD) and increases binding affinity to ACE2 and fusion with the host cell, which in turn increases SARS-CoV-2 transmissibility and infectivity. Moreover, the lower ACE2 expression reported in European, North American, and African populations compared with Asian populations is associated with enhanced transmission efficiency. This indicates positive selection for the D614G mutation (70).
However, this mutation is fortunately located outside the RBD, so it probably does not affect vaccine design for the current viral lineage (67). Some mutations in the receptor-binding domain, such as V367F and D364Y, appear to improve the stability of the spike protein structure and lead to more [...]
Besides mutations in structural proteins, mutations in RdRp, which is responsible for replication (such as P4715L), can affect virus proliferation (60). On the other hand, the mutagenic capability of the virus is related to the fidelity of RdRp, which shows the significance of mutations in RdRp (1). RNA viruses such as SARS-CoV-2 usually show a high mutation rate, which may be due to the lack of proofreading ability of RNA-dependent RNA polymerase (RdRp) [...] (Table 2). Based on the most frequent missense mutations, the variants were classified into clades. Generally, the most common nucleotide change in the SARS-CoV-2 genome is C > T (a transition mutation), accounting for 55.1% of all observed viral mutations worldwide (82). Surprisingly, for mutations that occurred in the SARS-CoV-2 genome, the transition versus transversion ratio was calculated as about 7:3. C > T transitions might be mediated by cytosine deaminases (60). The next most common change, A > G, showed a rate of 14.8%, especially in Africa, Europe, and the Americas. The third most common event worldwide, G > T, is the most common transversion; it showed a rate of 12.0% (42,408 occurrences) and may result from oxo-guanine lesions caused by reactive oxygen species (82). To summarize the most prevalent variants of the SARS-CoV-2 genome, 28144T > C in ORF8, which leads to the missense mutation L84S, was recognized as a common mutation in the virus genome (61). Another study reported this mutation in 1,669 of 10,022 genome sequences (60). In the Khailany et al.
study, this variant was considered the second most prevalent mutation after 8782 C > T (86). This mutation was first found on 5 January 2020, co-observed with 8782 C > T, and variants carrying both mutations are called the S type (S clade), in contrast to the native L clade of the genome (87), which leads to improved replication fidelity in this RNA virus. However, the frequency of these two mutations has gradually declined over time (87). The transition 8782 C > T in the NSP4 gene, which co-appeared with 28144 T > C, was considered an earlier mutation whose frequency has decreased over time (87). Newer mutations such as 14408C > T, found especially in Europe, which leads to the missense mutation P323L in the RdRp gene, occurred with a higher frequency (6,319 samples) (60). This mutation is characteristic of the G clade and resulted in higher transmissibility of the virus (87). Variants containing the 23403 A > G mutation, which substitutes glycine for aspartic acid in the S protein, are the other most prevalent variants, especially in European samples, and created a more transmissible form of SARS-CoV-2 (11). This mutation is one of the four prevalent mutations that co-occur in the G clade, as is the synonymous mutation 3037C > T, which showed the highest prevalence (above 6,300 of over 10,000 samples) and was also very common in the Russian samples (60, 88). Substitution of guanine with thymine or cytosine at nucleotide 11083 of the NSP6 gene was not related to the other frequent mutations and was found especially in European samples (10). However, the Koyama study showed that this missense mutation occurred in about 1,100 samples (60). Another common point mutation was 26144 G > T in ORF3a, which results in G251V at the protein level and was more prevalent in the surveyed European samples (87).
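As a hedged illustration of the nucleotide notation used above, the following Python sketch parses labels such as 23403 A > G, classifies each substitution as a transition or a transversion, and converts a genomic position into a codon number. The gene coordinates are assumptions taken from the NC_045512 GenBank annotation rather than from this review, only S, ORF3a, and ORF8 are covered, and all function names are hypothetical.

```python
import re

# A transition stays within purines (A, G) or within pyrimidines (C, T).
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Assumed 1-based coding-region coordinates on NC_045512 (not given in the
# review itself); ORF1ab is left out because of its ribosomal frameshift.
CDS = {"S": (21563, 25384), "ORF3a": (25393, 26220), "ORF8": (27894, 28259)}


def parse_substitution(label):
    """Split a label such as '23403 A > G' into (position, ref, alt)."""
    match = re.fullmatch(r"\s*(\d+)\s*([ACGT])\s*>\s*([ACGT])\s*", label)
    if match is None:
        raise ValueError(f"unrecognized substitution: {label!r}")
    pos, ref, alt = match.groups()
    if ref == alt:
        raise ValueError("reference and alternate base are identical")
    return int(pos), ref, alt


def substitution_class(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    pair = {ref, alt}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def residue_number(position):
    """Map a genomic position to (gene, codon number), or None if uncovered."""
    for gene, (start, end) in CDS.items():
        if start <= position <= end:
            return gene, (position - start) // 3 + 1
    return None


def annotate(label):
    """Combine parsing, classification, and codon lookup for one label."""
    pos, ref, alt = parse_substitution(label)
    return {
        "position": pos,
        "class": substitution_class(ref, alt),
        "codon": residue_number(pos),
    }


if __name__ == "__main__":
    for label in ["28144T > C", "23403 A > G", "26144 G > T"]:
        print(label, annotate(label))
```

Under these assumed coordinates, positions 28144, 23403, and 26144 fall in codons 84 of ORF8, 614 of S, and 251 of ORF3a, which agrees with the L84S, D614G, and G251V substitutions named in the text.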
According to the Yin et al. study, the 26144 G > T mutation was considered the third most prevalent variant after 8782 C > T and 28144 T > C, as a single mutation (80). Finally, one study reported the single point mutation 21707 C > T, which corresponds to the H49Y amino acid substitution in the S protein and occurred with a frequency of 0.4% among 4,533 sequences surveyed up to 6 May 2020. This variant was estimated to have first arisen on 12 January 2020, and it emerged again in all the sequences obtained from Mexico. Since then, this mutation has appeared as a singleton in various virus variants worldwide (13). However, the final effect of this variant has not yet been fully studied (89). Given that SARS-CoV-2 originated from China, it can be said that the studied samples from [...] showed a frequency of more than 0.1% according to Koyama's study (60, 68).
Due to the importance of the spike protein in pathogenicity and immunogenicity, we retrieved all Spike sequences submitted to NCBI (until 10 January 2021) for each continent. After removing duplicate sequences and performing multiple alignment, we obtained a Shannon entropy plot for each continent using BioEdit software (Fig. 2).
Whether a mutation changes immune evasion, infectivity, pathogenicity, or some combination of these is yet to be determined. In some cases, a mutation may be transmitted together with other mutations, for example mutations that affect the RNA-dependent RNA polymerase, with implications for replication efficiency, proofreading, and the emergence of resistant phenotypes (1). Answering these questions depends on worldwide studies that provide comprehensive data on infection and mortality rates, as well as continuous sampling of the genotypes of circulating isolates all around the world. So far, we lack these integrated data and are not able to correlate molecular findings with clinical and population-level consequences.
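The per-position Shannon entropy used for the Spike alignments above can be sketched as follows. This is a minimal illustration, not the BioEdit implementation: it assumes the natural logarithm, treats the gap character as an ordinary symbol, and uses a toy alignment with hypothetical function names.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy H = -sum(p * ln p) over the symbols in one column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def entropy_profile(aligned_sequences):
    """Return one entropy value per column of an equal-length alignment."""
    lengths = {len(seq) for seq in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*...) walks the alignment column by column.
    return [column_entropy(column) for column in zip(*aligned_sequences)]


if __name__ == "__main__":
    # Toy alignment: the middle column is split evenly between D and G.
    toy_alignment = ["QDV", "QGV", "QDV", "QGV"]
    for index, value in enumerate(entropy_profile(toy_alignment), start=1):
        print(index, round(value, 3))
```

A fully conserved column gives an entropy of zero, while a column split evenly between two residues gives ln 2, so peaks in the profile mark the most variable alignment positions.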
These points are crucial for recombinant vaccines, such as mRNA, DNA, subunit, and viral vector-based vaccines, which mostly use only the full-length S protein or the truncated S1 subunit in their engineered constructs. A shift from monovalent to bivalent or trivalent vaccines has been used for Coronaviridae family members for decades and shows nearly 100% protection (106, 107). These points also deserve attention for killed (inactivated) vaccines, which mainly stimulate the humoral immune system and antibody production. Moreover, given the high frequency of the 14408 (P323L) mutation in RdRp, which increases the possibility of mutations and changes across the entire virus genome (especially in the S protein), it is worthwhile to include at least one strain with this mutation in live attenuated and killed vaccine mixtures. This strategy enriches the pool and stores the antigen of the vaccine seeds during passages before formulation. Studies have also shown that, among recombinant coronavirus vaccines, multivalent vaccines, for example those containing the S, N, and M proteins, are more effective than monovalent vaccines that contain only the S protein (108-110). In this regard, immunoinformatics and antibody-antigen simulations, with their wide range of capabilities, could accelerate the selection of vaccine strains in combination with laboratory tests (106, 111). Furthermore, because of mutations in the cleavage region between S1 and S2, this region should be considered when designing recombinant S1-based vaccines that use only S1 in their construct. Besides considering the immunologic effects of mutations, the selection of vaccine strains could also be guided by the GISAID nomenclature system (https://www.gisaid.org/). In the GISAID database, sequenced SARS-CoV-2 genomes are clustered into six main clades: G, GH, GR, L, S, and V.
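To make the clade grouping concrete, the sketch below assigns one of the six clades from a set of observed substitutions. It is only an illustration under stated assumptions: the S and G markers follow mutations named earlier in this review, whereas the GH, GR, and V marker sets and the fallback to L are assumptions based on commonly cited GISAID marker definitions and are not taken from the text. The function name is hypothetical.

```python
# Marker substitutions per clade, checked in order so that the more specific
# GH and GR clades are tested before their parent clade G.
# S and G markers appear in the review; GH, GR, and V markers are assumptions.
CLADE_MARKERS = [
    ("GH", {"23403A>G", "25563G>T"}),
    ("GR", {"23403A>G", "28882G>A"}),
    ("G", {"23403A>G"}),
    ("S", {"8782C>T", "28144T>C"}),
    ("V", {"11083G>T", "26144G>T"}),
]


def assign_clade(substitutions):
    """Return the first clade whose markers are all present, else 'L'."""
    # Normalize labels such as '23403 A > G' to '23403A>G'.
    observed = {label.replace(" ", "") for label in substitutions}
    for clade, markers in CLADE_MARKERS:
        if markers <= observed:
            return clade
    # Assumption: genomes without any marker set are treated as reference-like.
    return "L"


if __name__ == "__main__":
    print(assign_clade(["3037C>T", "14408C>T", "23403 A > G"]))
    print(assign_clade(["8782 C > T", "28144 T > C"]))
```

Checking the most specific marker sets first keeps a genome that carries both the G marker and an additional GH or GR marker from being reported as plain G.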
Similarly, most of the points mentioned above are useful for plasma therapy and therapeutic mAbs. This is because mutations at certain sites in the virus cause neutralizing antibodies to fail to inhibit it by decreasing the affinity of antibodies for spike glycoproteins (100). It could therefore also be recommended that plasma recovered from individuals be pooled before use, rather than transferred person to person, to reduce the risk of plasma therapy failure. Also, in preparing horse serum for serotherapy, three or more strains should be used for immunization to obtain a more strongly neutralizing serum, which improves the success rate of serum therapy. The use of drugs can lead to drug resistance due to mutations. For instance, the mutations F480L, V557L, and D484Y in the RdRp protein may lead to resistance to remdesivir (112, 113). Therefore, treatment may require two to three different drugs (combination therapy) against different protein targets to prevent this phenomenon (114, 115).
Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant Mechanisms of viral mutation. Cellular and molecular life sciences Virus evolution and transmission in an ever more connected world Deciphering the role of host genetics in susceptibility to severe COVID-19 RNA virus mutations and fitness for survival. Annual review of microbiology Viruses at the edge of adaptation An 81 nucleotide deletion in SARS-CoV-2 ORF7a identified from sentinel surveillance in Arizona Molecular characterization of SARS-CoV-2 in the first COVID-19 cluster in France reveals an amino acid deletion in nsp2 (Asp268del) Genetic diversity and evolution of SARS-CoV-2. Infection Genetic spectrum and distinct evolution patterns of SARS-CoV-2 Genome-wide analysis of SARS-CoV-2 virus strains circulating worldwide implicates heterogeneity Why are RNA virus mutation rates so damn high?
Complexities of viral mutation rates WIN 52035-dependent human rhinovirus 16: assembly deficiency caused by mutations near the canyon surface Genetic and molecular analyses of spontaneous mutants of human rhinovirus 14 that are resistant to an antiviral compound High frequency of single-base transitions and extreme frequency of precise multiple-base reversion mutations in poliovirus Mutation rates among RNA viruses Very high frequency of reversion to guanidine resistance in clonal pools of guanidine-dependent type 1 poliovirus Evolutionary histories of coxsackievirus B5 and swine vesicular disease virus reconstructed by phylodynamic and sequence variation analyses Human norovirus hypermutation revealed by ultra-deep sequencing Effect of ribavirin on the mutation rate and spectrum of hepatitis C virus in vivo Quantifying the diversification of hepatitis C virus (HCV) during primary infection: estimates of the in vivo mutation rate How did Zika virus emerge in the Pacific Islands and Latin America? Highly selective transmission success of dengue virus type 1 lineages in a dynamic virus population: An evolutionary and fitness perspective. 
iScience Evidence of the recombinant origin of a bat severe acute respiratory syndrome (SARS)-like coronavirus and its implications on the direct ancestor of SARS coronavirus Mosaic structure of human coronavirus NL63, one thousand years of evolution A comparative epidemiologic analysis of SARS in Hong Kong, Beijing and Taiwan Variation in RNA virus mutation rates across host cells The evolutionary dynamics of canid and mongoose rabies virus in Southern Africa Measurement of the mutation rates of animal viruses: influenza A virus and poliovirus type 1 Comparison of the mutation rates of human influenza A and B viruses Genetic diversity and evolution of human metapneumovirus fusion protein over twenty years The evolutionary and epidemiological dynamics of the paramyxoviridae Determination of spontaneous mutation frequencies in measles virus under nonselective conditions Replacement of previously circulating respiratory syncytial virus subtype B strains with the BA genotype in South Africa Ancient common ancestry of Crimean-Congo hemorrhagic fever virus. Molecular phylogenetics and evolution Reassortment of human rotavirus gene segments into G11 rotavirus strains. Emerging infectious diseases Determination of the mutation rate of a retrovirus In vivo analysis of human T-cell leukemia virus type 1 reverse transcription accuracy The interaction of vpr with uracil DNA glycosylase modulates the human immunodeficiency virus type 1 In vivo mutation rate Unselected mutations in the human immunodeficiency virus type 1 genome are mostly nonsynonymous and often deleterious Identification and genetic diversity of two human parvovirus B19 genotype 3 subtypes JC virus evolution and its association with human populations On the mutation rate of herpes simplex virus type 1 Postreplicative mismatch repair. 
Cold Spring Harbor perspectives in biology A new coronavirus associated with human respiratory disease in China Genotype and phenotype of COVID-19: Their roles in pathogenesis SARS-CoV-2 Nsp1 binds the ribosomal mRNA channel to inhibit translation Activation and evasion of type I interferon responses by SARS-CoV-2 Mechanism and inhibition of the papain-like protease, PLpro, of SARS-CoV-2. The EMBO journal SARS-CoV-2-specific T cell immunity in cases of COVID-19 and SARS, and uninfected controls Structural and functional properties of SARS-CoV-2 spike protein: potential antivirus drug development for COVID-19 The ORF3a protein of SARS-CoV-2 induces apoptosis in cells Enhanced binding of SARS-CoV-2 Envelope protein to tight junction-associated PALS1 could play a key role in COVID-19 pathogenesis Indicates Evolutionary Conserved Functional Regions of Viral Proteins SARS-CoV-2 Orf9b suppresses type I interferon responses by targeting TOM70 Mapping genome variation of SARS-CoV-2 worldwide highlights the impact of COVID-19 super-spreaders SARS-CoV-2 genomic variations associated with mortality rate of COVID-19 Variant analysis of SARS-CoV-2 genomes Phylogenetic network analysis of SARS-CoV-2 genomes SARS-CoV-2 genomic variations associated with mortality rate of COVID-19 Evolutionary Analysis of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Reveals Genomic Divergence with Implications for Universal Vaccine Efficacy Genetic grouping of SARS-CoV-2 coronavirus sequences using informative subtype markers for pandemic spread visualization The emergence of SARS-CoV-2 in Europe and North America The origin of SARS-CoV-2 in Istanbul: Sequencing findings from the epicenter of the pandemic in Turkey Two independent introductions of SARS-CoV-2 into the Iranian outbreak Tracking changes in SARS-CoV-2 Spike: evidence that D614G increases infectivity of the COVID-19 virus Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity 
Impact of Genetic Variability in ACE2 Expression on the Evolutionary Dynamics of SARS-CoV-2 Spike D614G Mutation Analysis of therapeutic targets for SARS-CoV-2 and discovery of potential drugs by computational methods. Acta Pharmaceutica Sinica B. 2020. 72. Organization PAHOWH. Occurrence of variants of SARS-CoV-2 in the Americas First detection of SARS-CoV-2 spike protein N501 mutation in Italy in August, 2020. The Lancet Infectious Diseases Key residues of the receptor binding motif in the spike protein of SARS-CoV-2 that interact with ACE2 and neutralizing antibodies Genomic characterization of a novel SARS-CoV-2 lineage from Rio de Janeiro, Brazil Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa Coronavirus variants and mutations The New York Genotyping coronavirus SARS-CoV-2: methods and implications The establishment of reference sequence for SARS-CoV-2 and variation analysis Geographic and Genomic Distribution of SARS-CoV-2 Mutations Variant analysis of 1,040 SARS-CoV-2 genomes Genome-wide analysis of SARS-CoV-2 virus strains circulating worldwide implicates heterogeneity International expansion of a novel SARS-CoV-2 mutant. medRxiv Genomic characterization of a novel SARS-CoV-2 Genomic, geographic and temporal distributions of SARS-CoV-2 mutations Isolation and phylogenetic analysis of SARS-CoV-2 variants collected in Russia during the COVID-19 outbreak Cryo-electron microscopy structures of the SARS-CoV spike glycoprotein reveal a prerequisite conformational state for receptor binding Spike Protein of SARS-CoV-2: Impact of Single Amino Acid Mutation and Effect of Drug Binding to the Variant-in Silico Analysis Genomic analysis of early SARS-CoV-2 strains introduced in Mexico. bioRxiv. 2020. 
92 Variant analysis of SARS-CoV-2 genomes in the Middle East The N501Y and K417N mutations in the spike protein of SARS-CoV-2 alter the interactions with both hACE2 and human derived antibody: A Free energy of perturbation study Design of Specific Primer Sets for the Detection of B. 1.1. 7, B. 1.351 and P. 1 SARS-CoV-2 Variants using Deep Learning E484K as an innovative phylogenetic event for viral evolution: Genomic analysis of the E484K spike mutation in SARS-CoV-2 lineages from Brazil Mining of epitopes on spike protein of SARS-CoV-2 from COVID-19 patients A highly conserved cryptic epitope in the receptor binding domains of SARS-CoV-2 and SARS-CoV Neutralization of SARS-CoV-2 by destruction of the prefusion Spike Landscape analysis of escape variants identifies SARS-CoV-2 spike mutations that attenuate monoclonal and serum antibody neutralization. Available at SSRN 3725763 A key linear epitope for a potent neutralizing antibody to SARS-CoV-2 S-RBD. bioRxiv Antibody cocktail to SARS-CoV-2 spike protein prevents rapid mutational escape seen with individual antibodies Escape from neutralizing antibodies by SARS-CoV-2 spike protein variants Designs of antigen structure and composition for improved protein-based vaccine efficacy Mutations in SARS-CoV-2 Leading to Antigenic Variations in Spike Protein: A Challenge in Vaccine Development Sequencing and In Silico Multi-aspect Analysis of S1 Glycoprotein in 793/B Serotype of Infectious Bronchitis Virus Isolated From Iran in Infectious bronchitis in laying hens: the relationship between haemagglutination inhibition antibody levels and resistance to experimental challenge The immunoreactivity of a chimeric multiepitope DNA vaccine against IBV in chickens. 
Biochemical and biophysical research communications Multivalent DNA vaccine enhanced protection efficacy against infectious bronchitis virus in chickens Oral and nasal DNA vaccines delivered by attenuated Salmonella enterica serovar Typhimurium induce a protective immune response against infectious bronchitis in chickens Novel Applications of Immuno-bioinformatics in Vaccine and Bio-product Developments at Research Institutes. Archives of Razi Institute Remdesivir failure with SARS-CoV-2 RNA-dependent RNA-polymerase mutation in a B-cell immunodeficient patient with protracted Covid-19 Coronavirus susceptibility to the antiviral remdesivir (GS-5734) is mediated by the viral polymerase and the proofreading exoribonuclease Editorial overview: antivirals and resistance: advances and challenges ahead. Current opinion in virology Comparison of antiviral resistance across acute and chronic viral infections