key: cord-0699257-zqc74vom
authors: Peacock, Thomas P.; Penrice-Randal, Rebekah; Hiscox, Julian A.; Barclay, Wendy S.
title: SARS-CoV-2 one year on: evidence for ongoing viral adaptation
date: 2021-04-15
journal: J Gen Virol
DOI: 10.1099/jgv.0.001584
sha: 85684f9e2bc99657877b6bcefaab6125209635ae
doc_id: 699257
cord_uid: zqc74vom

SARS-CoV-2 is thought to have originated in the human population from a zoonotic spillover event. Infection in humans results in a variety of outcomes ranging from asymptomatic cases to the disease COVID-19, which can have significant morbidity and mortality, with over two million confirmed deaths worldwide as of January 2021. Over a year into the pandemic, sequencing analysis has shown that variants of SARS-CoV-2 are being selected as the virus continues to circulate widely within the human population. The predominant drivers of genetic variation within SARS-CoV-2 are single nucleotide polymorphisms (SNPs) caused by polymerase error, potential host factor driven RNA modification, and insertion/deletions (indels) resulting from the discontinuous nature of viral RNA synthesis. While many mutations represent neutral 'genetic drift' or have quickly died out, a subset may be affecting viral traits such as transmissibility, pathogenicity, host range, and antigenicity of the virus. In this review, we summarise the current extent of genetic change in SARS-CoV-2, particularly recently emerging variants of concern, and consider the phenotypic consequences of this viral evolution that may impact the future trajectory of the pandemic.

Towards the end of 2019, reports began of an unknown respiratory illness in the Chinese city of Wuhan. Within several weeks, it became clear these infections were being caused by a SARS-like coronavirus, which was termed SARS-CoV-2, with the associated disease called COVID-19. In severe cases this results in extensive immunopathology in the lungs [1].
By early March 2020, the virus had entered many countries across the world and the WHO declared a pandemic on 11 March [2]. In the months since, countries have enacted different pandemic response plans, ranging from recurrent lockdowns, mask mandates and social distancing rules to uncontrolled circulation in the hope of acquiring herd immunity. In areas with elevated SARS-CoV-2 prevalence, high levels of morbidity and excess mortality, particularly in the elderly, have resulted. As of 13 March 2021, there have been approximately 120 million confirmed cases of COVID-19 globally with over 2.6 million confirmed deaths [3].

SARS-CoV-2 is a betacoronavirus, containing a ~30 kb positive-sense RNA genome, among the largest of any RNA virus (Fig. 1). Coronaviruses, such as SARS-CoV-2, avoid error catastrophe by encoding an exoribonuclease (nsp14) that confers a unique proofreading mechanism during viral RNA synthesis [4, 5]. Genome sequencing of SARS-CoV-2 throughout the course of the outbreak has revealed a nucleotide substitution rate of ~1×10⁻³ substitutions per site per year [6]. This is comparable to the substitution rate observed for Ebola virus (1.42×10⁻³) during the 2013-2016 West African outbreak [7]. However, SNPs are not the only genetic variation seen commonly in coronaviruses.

Replication of the coronavirus genome and transcription of viral subgenomic mRNAs (sgmRNAs) are complex processes. The genome is roughly organised into two regions. The first two thirds of the genome is immediately translated and proteolytically processed in the host cell cytoplasm to generate the viral polymerase/transcriptase complex and other viral proteins. The remaining one third of the genome is expressed and translated through a nested set of sgmRNAs; this region includes the spike glycoprotein and other structural and accessory proteins.
These sgmRNAs are 5′ and 3′ co-terminal with the genome; the 5′ end contains a leader sequence that is present on the 5′ end of the genome. Along the genome, preceding each ORF is a transcription regulatory sequence (TRS). The prevailing thought is that an integral part of the transcription mechanism in coronaviruses for the synthesis of viral sgmRNAs involves a discontinuous step. The easiest way to visualise this is that the polymerase/transcriptase complex binds to the 3′ end of the positive strand and proceeds along the genome in a 3′ to 5′ direction synthesizing a negative strand. When the polymerase/transcriptase complex reaches a TRS, the newly synthesized negative strand can translocate to the 5′ leader sequence of the genome where it is then copied. This forms a negative sense sgmRNA that is then copied into the positive sense sgmRNA [8]. This discontinuous nature has the consequence of a high degree of recombination, resulting in the insertion of viral and non-viral sequences into, or frequent deletions of viral sequence from, the genome. This can result in the formation of viable genomes as well as defective interfering RNAs.

Therefore, both SNPs and indels are likely to be the major processes allowing coronaviruses to rapidly switch host range or change their pathogenicity and/or virulence. For example, in cats infected by feline enteric coronavirus (FECV), variants can be generated within an infected animal by deletion of a key furin cleavage site in the spike protein. This results in feline infectious peritonitis virus (FIPV), which causes a systemic fatal disease [9]. Recombination between different coronaviruses has been hypothesised to have given rise to both the genetically divergent receptor binding domain of SARS-CoV-2 spike [10, 11], as well as the insertion of the S1/S2 furin (polybasic) cleavage site [12]. MERS-CoV is also thought to have had a major recombination event in recent evolutionary history [13].
Furthermore, deletions in the genome of the porcine coronavirus transmissible gastroenteritis virus (TGEV) gave rise to a new virus called porcine respiratory coronavirus (PRCV) [14]. The human seasonal coronaviruses HCoV-OC43 and -HKU1 are thought to have acquired a hemagglutinin esterase (HE) gene following recombination between a progenitor coronavirus and an influenza C-like virus [15]. Variants of OC43 and HKU1 HE have been shown to lose their sialic acid binding activity through progressive deletions in their lectin domains [16]. Finally, the N-terminal domain (NTD) of coronavirus spike proteins shares a number of structural similarities with eukaryotic galectins, leading some to hypothesise that the precursor to coronaviruses may have incorporated a portion of the host gene in the distant past [17]. Host RNA may also be a source for the polybasic cleavage site, similar to the proposed mechanism for generating highly pathogenic avian influenza viruses [18]. Studies on a recombinant attenuated SARS-CoV lacking the envelope (E) gene or the PDZ-binding motif (generated as a potential vaccine candidate) showed the virus could revert to virulence by partially duplicating a viral sequence (from ORF8a) which restored E function [19].

Spike is the major glycoprotein responsible for SARS-CoV-2 entry, as well as the primary antigen and target of most SARS-CoV-2 vaccines currently in use and in development (Fig. 2). SARS-CoV-2 virions contain approximately 23 spike trimers on their surfaces [20]. The SARS-CoV-2 spike glycoprotein is synthesised as a single precursor polypeptide that forms trimers. Spike is subsequently cleaved into two major subunits, S1 and S2, by endogenous cellular furin [21].
The S1 subunit is composed of two further subdomains: an N-terminal domain (NTD) and a C-terminal receptor binding domain (RBD). The function of the NTD is poorly described for SARS-CoV-2, but it can act as a receptor binding domain in some coronaviruses and as a potential glycan shield against antibody-mediated immunity. The RBD of SARS-CoV-2 (as with SARS-CoV and seasonal HCoV-NL63) binds human angiotensin-converting enzyme 2 (ACE2) as its cognate cell surface receptor [22]. Spike glycoprotein shifts between two separate conformations: an 'open' or 'up' conformation able to effectively bind ACE2, and a 'closed' or 'down' conformation, with its receptor binding interface packed down into the top of the spike trimer [23, 24]. Different trimers may have one, two or three spike glycoproteins in either conformation. It has been suggested that the closed conformation may allow for viral escape from RBD-binding neutralising antibodies.

The S2 subunit contains the spike fusion peptide, a transmembrane domain and a short cytoplasmic tail. This short cytoplasmic tail contains a signal sequence that retains the spike in the endoplasmic reticulum from where, after particle assembly, virions are able to bud into the endoplasmic reticulum-Golgi intermediate compartment (ERGIC) [25]. Immediately adjacent to the fusion peptide is a second protease cleavage site termed the S2' cleavage site. Upon both S1/S2 cleavage and receptor binding by the RBD, the S1 subunit dissociates from S2, exposing the S2' site and enabling its cleavage by cellular proteases such as TMPRSS2 or Cathepsin L [26-28]. S2' cleavage results in immediate activation of the fusion peptide and subsequent spike-mediated membrane fusion [26].

[Figure legend] Mutations shown in red, ACE2 shown in yellow, spike monomer in RBD 'up' conformation shown in green, spike monomers in RBD 'down' conformation shown in pink and blue. Structure made using PyMOL using PDBID 7A94 [24].
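The review refers to spike changes in a compact shorthand: substitutions such as D614G (ancestral residue, position, variant residue) and deletions such as Δ69-70 or Δ144. The sketch below shows one way this shorthand could be turned into structured records for tabulating variants. The names `SpikeChange`, `parse_change` and `deleted_nucleotides` are hypothetical helpers introduced here for illustration and are not taken from the review or from any published tool; the parser only covers the two label shapes used in the text.

```python
import re
from typing import NamedTuple, Optional


class SpikeChange(NamedTuple):
    """Hypothetical record for one amino acid change (illustration only)."""
    kind: str            # "substitution" or "deletion"
    start: int           # first affected residue, 1-based
    end: int             # last affected residue, inclusive
    ref: Optional[str]   # ancestral amino acid (substitutions only)
    alt: Optional[str]   # variant amino acid (substitutions only)


# One-letter amino acid, position, one-letter amino acid, e.g. "D614G".
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")
# Delta sign (either common Unicode form) or "del", then a residue or range.
_DELETION = re.compile(r"^(?:Δ|∆|del)(\d+)(?:-(\d+))?$")


def parse_change(label: str) -> SpikeChange:
    """Parse labels such as 'D614G', 'Δ69-70' or 'Δ144'."""
    label = label.strip()
    match = _SUBSTITUTION.match(label)
    if match:
        position = int(match.group(2))
        return SpikeChange("substitution", position, position,
                           match.group(1), match.group(3))
    match = _DELETION.match(label)
    if match:
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        if end < start:
            raise ValueError(f"deletion range is reversed: {label!r}")
        return SpikeChange("deletion", start, end, None, None)
    raise ValueError(f"unrecognised mutation label: {label!r}")


def deleted_nucleotides(change: SpikeChange) -> int:
    """In-frame deletions remove three nucleotides per deleted residue."""
    if change.kind != "deletion":
        return 0
    return 3 * (change.end - change.start + 1)
```

For example, `parse_change("Δ69-70")` yields a two-residue deletion, and `deleted_nucleotides` reports the corresponding six nucleotides, assuming the deletion is in frame.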
Because the spike glycoprotein is the major viral antigen and the RBD/ACE2 interaction is a major host range determinant, there is considerable selection pressure placed on this region of the viral genome. Generally, the S1 subunit is thought to be the major inducer of a protective antibody response, and variation in this region can result in antigenic drift, either against immunity from previous infection by other variants or against immunity induced by vaccination [29].

The best characterised of the polymorphisms seen in SARS-CoV-2 since its emergence is the spike glycoprotein mutation D614G. Viruses with D614G were first detected in February 2020 and by May, around 80% of sequences globally were found to contain this mutation [30-32]. The rapid replacement of previously circulating SARS-CoV-2 strains is likely due to this virus being slightly more transmissible than the previous strains, combined with a strong founder effect as the virus exponentially expanded in a first pandemic wave across Europe and the Americas [30, 33]. Notably, the major clade containing D614G (Pango lineage B.1 and its sub-lineages) also contained several other genetically linked mutations, including one in the main polymerase subunit NSP12, P323L, that may also have contributed to its dominance by exerting a fitness advantage. On the other hand, there are several examples of independent acquisition of D614G (but not P323L), such as the A.19 and A.2.4 lineages, which continue to circulate [34]. D614G has now been shown by a multitude of independent studies to enhance entry into human ACE2 expressing cells in pseudovirus based assays in vitro [30, 32, 35-37]. In addition, several studies linked D614G containing viruses to lower Ct values in clinical SARS-CoV-2 diagnostic PCR tests, indicating the virus replicated more efficiently in the human respiratory tract, although without any link to higher pathogenicity or severe clinical outcomes [30, 34, 38-40].
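To make the link between Ct values and viral load concrete: in an idealised PCR the amount of template roughly doubles each cycle, so a sample that crosses the detection threshold one cycle earlier started with about twice as much target RNA. The sketch below applies that textbook relationship. It is an illustration under stated assumptions, not the analysis used in the cited studies: the function name and the `efficiency` parameter are hypothetical, the example Ct values are invented, and real comparisons also depend on sampling site, timing after infection and the assay used.

```python
def fold_change_from_ct(ct_reference: float, ct_variant: float,
                        efficiency: float = 1.0) -> float:
    """Approximate fold difference in starting template between two samples.

    efficiency = 1.0 assumes perfect doubling per cycle (an idealisation);
    lower values model less efficient amplification.
    A result above 1 means the variant sample had more starting template.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    # Each cycle multiplies the template by (1 + efficiency).
    return (1.0 + efficiency) ** (ct_reference - ct_variant)


# Hypothetical example values, not data from the review:
# a Ct three cycles lower corresponds to roughly eight-fold more template.
example = fold_change_from_ct(ct_reference=25.0, ct_variant=22.0)
```

Under these assumptions, even the "marginally lower" Ct values reported for some variants would translate into a multiplicative, not additive, difference in the amount of viral RNA in the swab.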
Several groups have also shown that D614G containing viruses (either recombinant or naturally occurring strains) had enhanced growth in primary human airway cells, replicated with greater efficiency in animal models such as hamster, ferret or the human ACE2-expressing mouse, and transmitted more efficiently in a hamster model, suggesting D614G alone is sufficient to confer this advantage in the absence of P323L [41-45].

Several non-mutually exclusive mechanisms have been proposed to explain how D614G enhances entry, replication, and transmission. The best described is that the polymorphism weakens an interaction at the trimer interface, leading to a greater proportion of spike RBD in the 'open' conformation and subsequently allowing enhanced ACE2 binding [24, 46-50]. An alternative proposed mechanism, though not necessarily incompatible with that previously described, is that D614G stabilises the pre-fusion structure of the spike trimer, preventing premature shedding of the S1 subdomain which can occur after furin cleavage [32, 47, 49, 51]. One additional proposed mechanism, again not necessarily incompatible with the others, is that D614G results in changes in the conformation of the S1/S2 cleavage site loop, allowing more efficient access by furin and therefore more efficient S1/S2 cleavage [42, 49, 51, 52].

Once it became clear D614G containing variants were rapidly expanding globally, a major concern was that this might affect the efficacy of vaccines that were being developed, which universally contained spike antigens with the ancestral D614. This concern has been allayed by the repeated finding that the D614G variants are equally, if not more readily, neutralised by antisera raised against D614 containing virus or vaccines, as well as by therapeutic monoclonal antibodies [30, 32, 37, 41, 44, 46, 48]. At present, global sequencing surveillance suggests that viruses without D614G are almost non-existent.
One exception is a lineage of viruses identified in Uganda (A.23) that, as of March 2021, continued to contain D614. This lineage, however, does contain a nearby spike mutation, Q613H. Although Q613H is currently uncharacterised, it may play a similar role to the D614G substitution, allowing this variant to continue co-circulating [53, 54] . Ferrets, which are members of the Mustelidae family, have traditionally been models for influenza virus transmission and infection and were quickly utilised in a similar manner for SARS-CoV-2 research, showing a dose dependent response to SARS-CoV-2 and protection from reinfection [55, 56] . Mink are closely related to ferrets and are farmed in many countries for their fur. It became apparent by the middle of 2020 that mink, like ferrets, were highly susceptible to reverse-zoonotic SARS-CoV-2 infection [57, 58] . Although mink (and ferrets) could be readily infected, several spike glycoprotein mutations rapidly and repeatedly arose in these hosts, both in the field and under laboratory conditions, most commonly Y453F and N501T in the RBD (Figs 1 and 2) [59] [60] [61] . Both Y453F and N501T have been shown to allow stronger binding of the spike RBD to human ACE2. Moreover, from analysis of the interaction between the spike glycoprotein and ACE2, it is apparent that the Y453F mutation may optimise an interaction with Y34 present in mink and ferret ACE2 [62] [63] [64] . Further alarm was raised when a large cluster of human cases were detected in Denmark and the Netherlands which harboured these mutations. In particular, the Y453F mutation was detected in Northern Denmark in combination with several other spike mutations including the NTD deletion Δ69-70 [60, 61] . This virus variant, known as 'Cluster 5' , was shown to partially escape neutralisation by convalescent antisera [61, 65, 66] . 
This led to the culling of nearly 17 million mink [67], with several other countries making plans to close their own mink farms or carry out mass culling as a precaution. It is possible that by adapting its suboptimal interaction with the ACE2 protein found in mustelids, the virus may have inadvertently selected for stronger receptor binding to human ACE2, and this may account for the loss of neutralisation. It is well described for both human and avian influenza viruses that increases in receptor binding can allow non-specific antibody escape. A stronger interaction between the virus glycoprotein and host receptor may better outcompete weaker competitive binding between antibody and virus glycoprotein [68-70]. Additionally, Y453F has been reported in a single case from an immunocompromised patient, potentially as an adaptation to human ACE2 (see Table 1) [62, 71].

Vero E6 cells have been widely used for isolation and growth of SARS-CoV-2 stocks as they are readily available, easy to use and highly permissive to the virus [72]. However, during propagation of SARS-CoV-2 isolates in Vero cells, deletions incorporating, or flanking, the furin cleavage site between the S1 and S2 spike subunits are often reported [43, 73-78]. Similar deletions have been detected in clinical samples at very low frequency, including from human autopsy samples [79-81]. The deletion of the furin cleavage site adapts the virus to higher replication in cells lacking TMPRSS2, such as Vero cells, but attenuates the virus in TMPRSS2-expressing cells such as primary human airway cells [51, 80, 82-85]. Furthermore, furin cleavage site deletions result in lower pathogenicity in animal models and attenuated virus transmission in hamster and ferret models [75, 80, 81, 84, 86]. This has implications for the source virus used in infection and challenge studies: virus stocks should be sequenced prior to use to ensure the furin cleavage site is intact.
We and others have suggested the furin cleavage site allows rapid TMPRSS2-dependent cell entry at the cell surface or early endosome, allowing the virus to evade highly restrictive endosomal IFITM proteins (such as IFITM2 or IFITM3). In contrast, in the absence of TMPRSS2 the virus must enter via the endosome/lysosome to be activated by cathepsins. In the harsh conditions of the acidifying endosome or the lysosome, having a pre-cleaved S1/S2 site may be disadvantageous as it results in instability of the spike glycoprotein and premature S1 shedding [80, 85].

In December 2020, a cluster of COVID-19 cases (known variously as B.1.1.7, 20B/501Y.V1, or VOC/202012/01) was detected in South East England [87]. This cluster showed evidence of higher transmissibility in the community compared to contemporary strains [88], significantly higher case-fatality rates [89, 90], and, depending on the study, lower Ct values from diagnostic PCR tests on clinical swabs [91-94]. In the UK, B.1.1.7 is now the predominant lineage, accounting for >90% of infections [95]. In many countries, initially imported B.1.1.7 is rapidly outcompeting local strains and becoming the major circulating strain [96-99]. B.1.1.7 contains seven non-synonymous mutations in the spike glycoprotein and a total of 23 mutations across the whole genome (see Table 2). Notable mutations in the spike glycoprotein include N501Y, Δ69-70, Δ144, and P681H in S1. N501Y lies in the RBD and has been described as increasing human ACE2 binding as well as enabling binding to mouse ACE2 [62, 100-102].
The P681H polymorphism lies adjacent to the S1/S2 furin cleavage site and we have shown this mutation alone, or in the B.1.1.7 spike, enhances the efficiency of furin cleavage [52]. The Δ69-70 and Δ144 deletions lie in the NTD region of spike and may modulate antigenicity [103-105]. Additionally, the B.1.1.7 lineage contains a premature stop codon in the accessory protein ORF8 and a three amino acid deletion in NSP6 (both described in more detail later in this review). The apparent long phylogenetic branch length and pattern of mutations has led to the hypothesis that this virus may have emerged from long-term infection in an immunocompromised patient, before spilling back into the general population [87]. There is growing evidence that B.

Persistent infection in immunocompromised patients may allow viruses to rapidly generate diversity under prolonged selection pressures that are absent in typical SARS-CoV-2 infections that transmit within days and resolve within weeks. Such infections have been proposed to be a potential mechanism for rapid antigenic evolution in influenza [114]. Various NTD deletions in the SARS-CoV-2 spike glycoprotein are commonly observed in immune-suppressed patients with a long-term infection, supporting the idea of intra-host evolution (see Table 1) [71, 100, 103, 104, 115-118]. A recurrent NTD deletion in the 140-145 region has been found in nine separate chronically infected patients, indicating this may be a signature mutation of these long-term infections [71, 100, 104, 115-118]. This mutation is also found in the B. Furthermore, the Δ69-70 deletion in the NTD has also arisen multiple times independently, both in healthy and immunocompromised humans, as well as in mink, and often in combination with RBD interface mutants as described earlier in this review [61, 71, 103, 104, 120].
It has been hypothesised that this deletion could act as a 'permissive' mutation, somehow allowing or compensating for receptor binding mutations (such as N439K, Y453F or N501Y) that alone may be deleterious to virus fitness, due to a currently undescribed effect on spike stability or similar [120]. A mechanism for this relationship between NTD and RBD is unclear at present, as residues 69-70 of the NTD are distal to the RBD in both the open and closed spike trimer conformations [24]. The 69-70 deletion removes six nucleotides that are part of the probe target sequence in one of the commonly used RT-PCR tests used to screen swabs for diagnosis of COVID-19. The resulting 'S gene target failure' has been a fortuitous way to easily monitor the growth of lineages carrying this deletion, such as the UK B.1.1.7 variant of concern [91]. Fortunately, the diagnosis of cases has not been compromised because of the redundancy built into the diagnostic platforms, which use several different primer-probe sets across the SARS-CoV-2 genome.

In recent months, several independent lineages of viruses containing the spike glycoprotein mutation E484K have been detected worldwide: once in South Africa (B.1.351 or 20B/501Y.V2) and at least twice independently in Brazil (P.1 or 20B/501Y.V3, and P.2) [121-123]. This mutation is of particular concern as independent studies have suggested E484K is a bona fide escape mutant to many convalescent antisera [108, 119, 124]. Both Brazil (particularly the Amazonas region) and South Africa experienced high disease burdens in 2020 and likely have high seroprevalence, which may have driven emergence of these antigenic variants [125, 126]. This is further reinforced by several case studies showing E484K containing variants reinfecting healthcare workers in Brazil and a high rate of reinfection of seropositive individuals in the placebo arm of a vaccine trial in South Africa [122, 126-129].
Concerningly, recent evidence suggests that these E484K variants likely partially or fully escape vaccine- or natural immunity-derived antisera [66, 108, 111, 130]. These E484K lineages represent three independent emergences of the same mutation: E484K alone in the P. 2 [112, 132]. As well as the three main variants of concern described above (B. [136], seen in the UK and related to travel to Antigua, and finally the P.3 lineage described first in the Philippines [137, 138]. Although there is not currently strong evidence of more rapid transmission of most of these variants, they share a number of molecular characteristics with the major variants of concern, such as combinations of receptor avidity-enhancing mutations (N501Y, E484K and/or S477N), furin cleavage site adjacent mutations (P681R/H, Q677H, H655Y) and genomic deletions (spike ~Δ140 deletions, ~Δ243 deletions, and NSP6 Δ106-108, see Table 2). The similarity between many of these variants suggests a remarkable degree of convergent evolution.

Table 2. Substitutions and deletions seen in currently circulating variants of concern and variant of concern-like viruses

Since March 2020, the spike mutation N439K has arisen multiple times (all alongside D614G), independently in Europe and the USA. N439K lies directly within the RBD/ACE2 binding interface. Subsequent binding studies have shown this variant has a modest increase in ACE2 binding, and clinical data indicate marginally lower Ct values in clinical diagnostic PCR tests, indicative of higher replication [40]. Furthermore, N439K moderately alters antigenicity, with some human convalescent antisera and monoclonal antibodies less able to bind and neutralise the variant virus or pseudovirus, although these G changes during replication [172]. For influenza virus, we have previously shown that apparent RNA editing can result in the rapid emergence of antigenic variants with multiple concurrent amino acid changes [173].
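A footprint of host RNA editing is usually sought by tallying which substitution types occur when sampled genomes are compared with a reference: an excess of one type, such as C->U, points towards a directional process and away from random polymerase error alone. The sketch below is a minimal illustration of that tally, assuming two pre-aligned sequences of equal length; the function names are hypothetical, the example sequences are invented, and a real analysis would work on many aligned genomes and correct for base composition.

```python
from collections import Counter


def substitution_spectrum(reference: str, sample: str) -> Counter:
    """Count substitution types (e.g. 'C>U') between two aligned sequences.

    Assumes the sequences are already aligned and of equal length.
    DNA-style 'T' is read as RNA 'U'; gaps and ambiguous bases are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    ref_rna = reference.upper().replace("T", "U")
    alt_rna = sample.upper().replace("T", "U")
    valid = set("ACGU")
    counts: Counter = Counter()
    for ref_base, alt_base in zip(ref_rna, alt_rna):
        if ref_base in valid and alt_base in valid and ref_base != alt_base:
            counts[f"{ref_base}>{alt_base}"] += 1
    return counts


def c_to_u_fraction(spectrum: Counter) -> float:
    """Share of all observed substitutions that are C>U (0.0 if none seen)."""
    total = sum(spectrum.values())
    return spectrum["C>U"] / total if total else 0.0


# Invented toy sequences, for illustration only.
toy_reference = "ACCGUCACG"
toy_sample = "AUCGUUAUA"
toy_spectrum = substitution_spectrum(toy_reference, toy_sample)
```

In this toy pair three of the four differences are C>U, so `c_to_u_fraction(toy_spectrum)` is 0.75; with twelve possible substitution types, a share that large would be the kind of skew that prompts an editing hypothesis.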
Studies have shown a bias in C->U mutations within the SARS-CoV-2 genome, suggesting editing by APOBEC [168, 171, 174-179]. RNA editing is enriched in putative RNA loop regions, presumably because these are more exposed than other parts of the genome [177]. Because of this C->U bias, it has been suggested that cytidine rich regions should be avoided during the design of diagnostic tests [178]. Furthermore, it has also been shown that virus-derived RNA sequences enriched for uracil correlate with enhanced production of pro-inflammatory cytokines when compared with the sequence of a reference virus. Based upon previous studies showing U-rich ssRNA stimulating the innate immune response through TLR7 signalling [180, 181], Kosuge et al. (2020) investigated the impact of C->U point mutations on the host response, showing an increase in TNF-α and IL-6 production in immune cell lines [176].

Although SARS-CoV-2 has only circulated in humans for a little over a year, an unprecedented sequencing effort has led to the description of many variants. Until recently, the only robust evidence for a genotypic change that had a strong phenotype was the spike mutation D614G, which has been strongly selected for in the human population. D614G has been shown to enhance virus entry and replication in the human respiratory tract. It is now necessary to understand which adaptive mutations are enhancing transmission and driving the increase of new variant of concern lineages such as B.
Several studies have examined potential antigenic variants using one of two approaches. 'Reverse genetics' approaches mostly generate libraries of pseudotypes with naturally occurring changes or changes in predicted antigenic regions. 'Forward genetics' approaches use authentic virus, replication competent chimeric viruses (generally vesicular stomatitis virus with its native glycoprotein replaced with SARS-CoV-2 spike), or phage/yeast display screens, and select with antibodies to drive the emergence of variants in either naturally occurring or mutagenesis-derived quasi-species or mutant libraries [35, 119, 124, 182, 183]. These approaches are a key part of our ability to predict the antigenic effect of mutations. However, although a variety of mutants have been shown to escape neutralisation or binding from monoclonal antibodies, and occasionally convalescent antisera, it is still unclear whether these mutations would come with a fitness cost in the context of infectious viruses that would make them less likely to arise in the field. Furthermore, although the majority of approved vaccines specifically target the humoral immune response against the spike protein, further work is needed to understand the potential role of cellular immunity in natural infection and how this could be used to optimise future vaccines.

As the level of natural immunity increases and global mass vaccination intensifies, it becomes ever more important to continuously sample, sequence and antigenically characterise novel virus variants, particularly from reinfections or from those who have been vaccinated [126]. This will allow for rapid detection of antigenic variants that could lead to potential vaccine failure, and for rapid vaccine updates where required, in a similar manner to seasonal influenza. The possibility of antigenic drift is something vaccine designers, regulators and manufacturers should prepare for in the coming months and years.
A likely scenario based on the development of animal coronavirus vaccines is that future SARS-CoV-2 vaccines may have to be multivalent to protect against multiple circulating antigenic variants, similar to vaccines against influenza or the avian gammacoronavirus infectious bronchitis virus in poultry [184].

While changes in the spike protein are the most important antigenically, genome alterations that change expression of viral accessory proteins are also expected and may influence transmission and pathogenicity. During its relatively brief human circulation, SARS-CoV ORF8 quickly gained a deletion leading to the creation of ORF8a and ORF8b [147]. Multiple SARS-CoV-2 isolates with accessory protein deletions and truncations are already described; these alterations can also occasionally lead to the expression of 'fusion ORFs' encompassing the N-terminus of one protein and the C-terminus of another. These variants remain rare and none has yet spread rapidly in the human population; most have quickly died out, with the exception of the truncated ORF8 seen in the globally emerging B.1.1.7 lineage. However, this clearly remains an area that should be closely monitored during surveillance, and performing whole genome sequencing (rather than just spike sequencing) remains imperative, since this type of genetic change might have an impact on transmission or clinical outcomes.

In addition, surveillance for reverse zoonoses and for mutations in chronically ill people with COVID-19 should be reinforced and intensified. Certain companion and farmed animals are clearly highly susceptible to SARS-CoV-2 and their infections could drive the selection of variant viruses with different receptor binding or antigenic properties that could cross back into humans.
Similarly, the rapidly spreading B.1.1.7 lineage in the UK, which is hypothesised to have gained multiple mutations in a chronically ill patient, is only one of several variants reported from such individuals displaying multiple markers of potentially altered antigenic and receptor binding properties, as well as a higher infectivity than previously circulating strains [100, 103]. Constant monitoring and enhanced biosecurity measures in these groups are essential to prevent novel virus variants from emerging.

It is important to note that, whilst consensus genomes are reported in global databases, infected individuals harbour a population of viruses [167], manifesting as a consensus genome and minor variants, and all of these genomes are subject to selection pressure. Individual variations can be selected if they are advantageous, through founder effect, or through a mixture of both. Where significant deletions in proteins, or stop codons, are reported at the consensus sequence level, these may be balanced within the virus population by the presence of functional proteins at the minor variant level, similar to that observed in Ebola virus infection in humans [185]. Deep sequencing of clinical isolates can reveal important cooperative interactions between members of the virus population within a single host, and a better understanding of transmission bottlenecks will allow us to understand whether such interactions are perpetuated in transmission chains.

To summarise, there is an urgent need to continue to perform in-depth surveillance and sequencing of SARS-CoV-2 isolates over the coming months and years, coupled with detailed downstream phenotypic analysis of the impact of mutations in near real time.
This analysis should include both traditional techniques with mutant virus isolates and closely related controls which can be performed fairly rapidly, as well as more modern, but slower, techniques such as reverse genetics [186], which are vital to disentangle the phenotypes of mutations that occur across multiple genes.

1. Tissue-specific immunopathology in fatal COVID-19
2. Coronavirus confirmed as pandemic by World Health Organization (BBC)
3. World Health Organisation. WHO Coronavirus Disease (COVID-19) Dashboard
4. Discovery of an RNA virus 3'->5' exoribonuclease that is critically involved in coronavirus RNA synthesis
5. The enzymatic activity of the nsp14 exoribonuclease is critical for replication of MERS-CoV and SARS-CoV-2
6. Temporal signal and the phylodynamic threshold of SARS-CoV-2
7. Temporal and spatial analysis of the 2014-2015 Ebola virus outbreak in West Africa
8. Coronavirus biology and replication: implications for SARS-CoV-2
9. Mutation in spike protein cleavage site and pathogenesis of feline coronavirus
10. Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins
11. Emergence of SARS-CoV-2 through recombination and strong purifying selection
12. A novel bat coronavirus closely related to SARS-CoV-2 contains natural insertions at the S1/S2 cleavage site of the spike protein
13. Evolutionary dynamics of MERS-CoV: potential recombination, positive selection and transmission
14. Decline of transmissible gastroenteritis virus and its complex evolutionary relationship with porcine respiratory coronavirus in the United States
15. The hemagglutinin/esterase gene of human coronavirus strain OC43: phylogenetic relationships to bovine and murine coronaviruses and influenza C virus
16. Betacoronavirus adaptation to humans involved progressive loss of hemagglutinin-esterase lectin activity
17. Receptor recognition mechanisms of coronaviruses: a decade of structural studies
18. Genetic predisposition to acquire a polybasic cleavage site for highly pathogenic avian influenza virus hemagglutinin
19. Identification of the mechanisms causing reversion to virulence in an attenuated SARS-CoV for the design of a genetically stable vaccine
20. Structures and distributions of SARS-CoV-2 spike proteins on intact virions
21. Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein
22. A pneumonia outbreak associated with a new coronavirus of probable bat origin
23. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
24. Receptor binding and priming of the spike protein of SARS-CoV-2 for membrane fusion
25. Intracellular targeting signals contribute to localization of coronavirus spike proteins near the virus assembly site
26. Characterization of a highly conserved domain within the severe acute respiratory syndrome coronavirus spike protein S2 domain with characteristics of a viral fusion peptide
27. Host cell entry of Middle East respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein
28. Activation of the SARS coronavirus spike protein via sequential proteolytic cleavage at two distinct sites
29. A human coronavirus evolves antigenically to escape antibody immunity
30. Tracking changes in SARS-CoV-2 spike: evidence that D614G increases infectivity of the COVID-19 virus
31. Analysis of genomic distributions of SARS-CoV-2 reveals a dominant strain type with strong allelic associations
32. SARS-CoV-2 spike-protein D614G mutation increases virion spike density and infectivity
33. No evidence for increased transmissibility from recurrent mutations in SARS-CoV-2
34. Evaluating the effects of SARS-CoV-2 spike mutation D614G on transmissibility and pathogenicity
35. The impact of mutations in SARS-CoV-2 spike on viral infectivity and antigenicity
36. The spike D614G mutation increases SARS-CoV-2 infection of multiple human cell types
37. SARS-CoV-2 D614G spike mutation increases entry efficiency with enhanced ACE2-binding affinity
38. A clade of SARS-CoV-2 viruses associated with lower viral loads in patient upper airways
39. Massive dissemination of a SARS-CoV-2 spike Y839 variant in Portugal
40. Circulating SARS-CoV-2 spike N439K variants maintain fitness while evading antibody-mediated immunity
41. SARS-CoV-2 D614G variant exhibits efficient replication ex vivo and transmission in vivo
42. SARS-CoV-2 spike D614G variant exhibits highly efficient replication and transmission in hamsters
43. Distinct phenotypes of SARS-CoV-2 isolates reveal viral traits critical for replication in primary human respiratory cells
44. Spike mutation D614G alters SARS-CoV-2 fitness
45. SARS-CoV-2 spike D614G change enhances replication and transmission
46. Structural and functional analysis of the D614G SARS-CoV-2 spike protein variant
47. Structural impact on SARS-CoV-2 spike protein by D614G substitution
48. D614G spike mutation increases SARS CoV-2 susceptibility to neutralization
49. D614G mutation alters SARS-CoV-2 spike conformation and enhances protease cleavage at the S1/S2 junction
50. The effect of the D614G substitution on the structure of the spike glycoprotein of SARS-CoV-2
51. Spike glycoprotein and host cell determinants of SARS-CoV-2 entry and cytopathic effects
52. Increased transmission of SARS-CoV-2 lineage B.1.1.7 (VOC 2020212/01) is not accounted for by a replicative advantage in primary airway cells or antibody escape
53. Entebbe SARS-CoV-2 Sequencing Group.
SARS-CoV-2 diversity in Uganda A SARS-CoV-2 lineage A variant (A.23.1) with altered spike has emerged and is dominating the current Uganda epidemic Infection and rapid transmission of SARS-CoV-2 in ferrets Dose-dependent response to infection with SARS-CoV-2 in the ferret model and evidence of protective immunity Coronavirus rips through Dutch mink farms, triggering culls Detection and molecular characterisation of SARS-CoV-2 in farmed mink (Neovison vison) in Poland SARS-CoV-2 is transmitted via contact and via the air between ferrets Transmission of SARS-CoV-2 on mink farms between humans and mink and back to humans SARS-CoV-2 spike mutations arising in Danish mink and their spread to humans Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding Possible host-adaptation of SARS-CoV-2 due to improved ACE2 receptor binding in mink The SARS-CoV-2 spike protein has a broad tropism for mammalian ACE2 proteins SARS-CoV-2 mutations acquired in mink reduce antibody-mediated neutralization Circulating SARS-CoV-2 variants escape neutralization by vaccine-induced humoral immunity COVID-19: all mink in Denmark must be culled Association of increased receptor-binding avidity of influenza A(H9N2) viruses with escape from antibody-based immunity and enhanced zoonotic potential Hemagglutinin receptor binding avidity drives influenza A virus antigenic drift The molecular basis of antigenic variation among A(H9N2) avian influenza viruses Emergence of Y453F and Δ69-70HV mutations in a lymphoma patient with long-term COVID-19. 
virological Comparative tropism, replication kinetics, and cell damage profiling of SARS-CoV-2 and SARS-CoV with implications for clinical manifestations, transmissibility, and laboratory studies of COVID-19: an observational study Characterisation of the transcriptome and proteome of SARS-CoV-2 reveals a cell passage induced in-frame deletion of the furin-like cleavage site from the spike glycoprotein Identification of common deletions in the spike protein of SARS-CoV-2 Attenuated SARS-CoV-2 variants with deletions at the S1/S2 junction SARS-coronavirus-2 replication in Vero E6 cells: replication kinetics, rapid adaptation and cytopathology Naturally occurring SARS-CoV-2 gene deletions close to the spike S1/S2 cleavage site in the viral quasispecies of COVID19 patients SARS-CoV-2 growth, furin-cleavage-site adaptation and neutralization using serum from acutely infected hospitalized COVID-19 patients Identification of common deletions in the spike protein of severe acute respiratory syndrome coronavirus 2 The furin cleavage site of SARS-CoV-2 spike protein is a key determinant for transmission due to enhanced replication in airway cells Natural transmission of bat-like SARS-CoV-2PRRA variants in COVID-19 patients SARS-CoV-2 entry into human airway organoids is serine protease-mediated and facilitated by the multibasic cleavage site SARS-CoV-2 variants with mutations at the S1/S2 cleavage site are generated in vitro during propagation in TMPRSS2-deficient cells A genome-wide CRISPR screen identifies host factors that regulate SARS-CoV-2 entry The polybasic cleavage site in the SARS-CoV-2 spike modulates viral sensitivity to Type I interferon and IFITM2 Loss of furin cleavage site attenuates SARS-CoV-2 pathogenesis Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations. 
virological Transmission of SARS-CoV-2 lineage B.1.1.7 in England: insights from linking epidemiological and genetic data Increased hazard of mortality in cases compatible with SARS-CoV-2 variant of concern 202012/1 - a matched cohort study Increased hazard of death in community-tested cases of SARS-CoV-2 variant of concern S-variant SARS-CoV-2 is associated with significantly higher viral loads in samples tested by ThermoFisher TaqPath RT-PCR Estimated transmissibility and severity of novel SARS-CoV-2 variant of concern 202012/01 in England Early analysis of a potential link between viral load and the N501Y mutation in the SARS-CoV-2 spike protein Increased infections, but not viral burden, with a new SARS-CoV-2 variant Coronavirus (COVID-19) infection survey Emergence and fast spread of B.1.1.7 lineage in Lebanon Rapid SARS-CoV-2 variants spread detected in France using specific RT-PCR testing Genomic epidemiology identifies emergence and rapid transmission of SARS-CoV-2 B.1.1.7 in the United States Rapid rise of S-gene target failure and the UK variant B.1.1.7 among COVID-19 isolates in the greater Toronto area Persistence and evolution of SARS-CoV-2 in an immunocompromised host Adaptation of SARS-CoV-2 in BALB/c mice for testing vaccine efficacy SARS-CoV-2 RBD in vitro evolution follows contagious mutation spread, yet generates an able infection inhibitor SARS-CoV-2 evolution during treatment of chronic infection Recurrent deletions in the SARS-CoV-2 spike glycoprotein drive antibody escape N-terminal domain antigenic mapping reveals a site of vulnerability for SARS-CoV-2 Neutralization of N501Y mutant SARS-CoV-2 by BNT162b2 vaccine-elicited sera The impact of spike mutations on SARS-CoV-2 neutralization mRNA vaccine-elicited antibodies to SARS-CoV-2 and circulating variants Neutralization of SARS-CoV-2 lineage B.1.1.7 pseudovirus by BNT162b2 vaccine-elicited human sera The N501Y mutation in SARS-CoV-2 spike leads to morbidity in obese and aged mice and is 
neutralized by convalescent and post-vaccination human sera Emerging SARS-CoV-2 variants reduce neutralization sensitivity to convalescent sera and monoclonal antibodies Sensitivity of SARS-CoV-2 B.1.1.7 to mRNA vaccine-elicited antibodies PHE monitoring of the effectiveness of COVID-19 vaccination Within-host evolution of human influenza virus Case study: prolonged infectious SARS-CoV-2 shedding from an asymptomatic immunocompromised individual with cancer Emergence of multiple SARS-CoV-2 mutations in an immunocompromised host Long-term evolution of SARS-CoV-2 in an immunocompromised patient with non-Hodgkin lymphoma Persistent SARS-CoV-2 infection and increasing viral variants in children and young adults with impaired humoral immunity SARS-CoV-2 escape in vitro from a highly neutralizing COVID-19 convalescent plasma Recurrent emergence and transmission of a SARS-CoV-2 spike deletion ΔH69/V70. bioRxiv Phylogenetic relationship of SARS-CoV-2 sequences from Amazonas with emerging Brazilian variants harboring mutations E484K and N501Y in the spike protein Genomic characterisation of an emergent SARS-CoV-2 lineage in Manaus: preliminary findings Emergence of a SARS-CoV-2 variant of concern with mutations in spike glycoprotein Comprehensive mapping of mutations in the SARS-CoV-2 receptor-binding domain that affect recognition by polyclonal human plasma antibodies Three-quarters attack rate of SARS-CoV-2 in the Brazilian Amazon during a largely unmitigated epidemic Preliminary efficacy of the NVX-CoV2373 Covid-19 vaccine against the B.1.351 variant Teixeira de Vasconcelos RH, Arantes I, Appolinario L. 
Spike E484K mutation in the first SARS-CoV-2 reinfection case confirmed in Brazil SARS-CoV-2 reinfection by the new variant of concern (VOC) P.1 in Amazonas, Brazil Genomic evidence of SARS-CoV-2 reinfection involving E484K spike mutation SARS-CoV-2 501Y.V2 escapes neutralization by South African COVID-19 donor plasma Evasion of type I interferon by SARS-CoV-2 Covid-19: the E484K mutation and the risks it poses A novel SARS-CoV-2 variant of concern, B.1.526, identified in New Resurgence of SARS-CoV-2 19B clade corresponds with possible convergent evolution Proposal for lineage within B.1.324 with N501Y, P681H and others Genome sequencing and analysis of an emergent SARS-CoV-2 variant characterized by multiple spike protein mutations detected from the central Visayas region of the Philippines Variants of concern or under investigation: data up to 10 Emergence of a novel SARS-CoV-2 variant in Southern California Identification of SARS-CoV-2 spike mutations that attenuate monoclonal and serum antibody neutralization Emergence and spread of a SARS-CoV-2 variant through Europe in the summer of The ORF8 protein of SARS-CoV-2 mediates immune evasion through potently downregulating MHC-I. 
bioRxiv SARS-CoV-2 genomic surveillance in Taiwan revealed novel ORF8-deletion mutant and clade possibly associated with infections in Middle East Discovery and genomic characterization of a 382-nucleotide deletion in ORF7b and ORF8 during the early evolution of SARS-CoV-2 Effects of a major deletion in the SARS-CoV-2 genome on the severity of infection and the inflammatory response: an observational cohort study Infection of human nasal epithelial cells with SARS-CoV-2 and a 382-nt deletion isolate lacking ORF8 reveals similar viral kinetics and host transcriptional profiles The 29-nucleotide deletion present in human but not in animal severe acute respiratory syndrome coronaviruses disrupts the functional expression of open reading frame 8 Attenuation of replication by a 29 nucleotide deletion in SARS-coronavirus acquired during the early stages of human-to-human transmission 7A protein of severe acute respiratory syndrome coronavirus inhibits cellular protein synthesis and activates p38 mitogen-activated protein kinase Severe acute respiratory syndrome coronavirus ORF7a inhibits bone marrow stromal antigen 2 virion tethering through a novel mechanism of glycosylation interference SARS-CoV-2 spike downregulates tetherin to enhance viral spread An 81-Nucleotide deletion in SARS-CoV-2 ORF7a identified from sentinel surveillance in Arizona Genomic surveillance of SARS-CoV-2 in Thailand reveals mixed imported populations, a local lineage expansion and a virus with truncated ORF7a Identification of multiple large deletions in ORF7a resulting in in-frame gene fusions in clinical SARS-CoV-2 isolates Identification of eight SARS-CoV-2 ORF7a deletion variants in 2,726 clinical specimens SARS-CoV-2 genomic surveillance identifies naturally occurring truncations of ORF7a that limit immune suppression SARS-CoV-2 Orf6 hijacks Nup98 to block STAT nuclear import and antagonize interferon signaling A rare deletion in SARS-CoV-2 Orf6 dramatically alters the predicted 
three-dimensional structure of the resultant protein Characterization of SARS-CoV-2 ORF6 deletion variants detected in a nosocomial cluster during routine genomic surveillance Isolation of SARS-CoV-2 strains carrying a nucleotide mutation, leading to a stop codon in the ORF 6 protein Enisamium is a small molecule inhibitor of the influenza A virus and SARS-CoV-2 RNA polymerases SARS-CoV-2 Nsp1 binds the ribosomal mRNA channel to inhibit translation Emerging of a SARS-CoV-2 viral strain with a deletion in NSP1 A SARS-CoV-2 variant with the 12-bp deletion at E gene Deletion in the C-terminal region of the envelope glycoprotein in some of the Indian SARS-CoV-2 genome A severe acute respiratory syndrome coronavirus that lacks the E gene is attenuated in vitro and in vivo Amplicon-based detection and sequencing of SARS-CoV-2 in nasopharyngeal swabs from patients with COVID-19 and identification of deletions in the viral genome that encode proteins involved in interferon antagonism Host-directed editing of the SARS-CoV-2 genome The APOBEC protein family: United by structure, divergent in function Modeling the embrace of a mutator: APOBEC selection of nucleic acid ligands Evidence for host-dependent RNA editing in the transcriptome of SARS-CoV-2 A left-handed RNA double helix bound by the Z alpha domain of the RNA-editing enzyme ADAR1 Immune escape variants of H9N2 influenza viruses containing deletions at the hemagglutinin receptor binding site retain fitness in vivo and display enhanced zoonotic characteristics Host immune response driving SARS-CoV-2 evolution Rampant C→U hypermutation in the genomes of SARS-CoV-2 and other coronaviruses: causes and consequences for their short- and long-term evolutionary trajectories Point mutation bias in SARS-CoV-2 variants results in increased ability to stimulate inflammatory responses Similarity between mutation spectra in hypermutated genomes of rubella virus and in SARS-CoV-2 genomes accumulated during the COVID-19 pandemic 
Mutations on COVID-19 diagnostic targets Mutational signatures and heterogeneous host response revealed via large-scale characterization of SARS-CoV-2 genomic diversity Toll-like receptors in innate immunity Species-specific recognition of single-stranded RNA via toll-like receptor 7 and 8 Antibody cocktail to SARS-CoV-2 spike protein prevents rapid mutational escape seen with individual antibodies Prospective mapping of viral mutations that escape antibodies used to treat COVID-19 Vaccination against infectious bronchitis virus: a continuous challenge Variation around the dominant viral genome sequence contributes to viral load and outcome in patients with Ebola virus disease Rapid reconstruction of SARS-CoV-2 using a synthetic genomics platform

The authors would like to acknowledge Daniel Goldhill, Niluka Goonawardane, Rebecca Frise, Carol Sheppard, and Maya Moshe for their useful comments and help proofreading this review. The authors declare that there are no conflicts of interest.