key: cord-0868523-a3su05vc authors: Pereira, Filipe title: SARS-CoV-2 variants combining spike mutations and the absence of ORF8 may be more transmissible and require close monitoring date: 2021-02-25 journal: Biochem Biophys Res Commun DOI: 10.1016/j.bbrc.2021.02.080 sha: f7b52d325437cf2cef3be46df9d6c333f8ccaf46 doc_id: 868523 cord_uid: a3su05vc
The SARS-CoV-2 Variant of Concern 202012/01 (VOC-202012/01) emerged in southeast England and rapidly spread worldwide. This variant is believed to be more transmissible, with all attention being given to its spike mutations. However, VOC-202012/01 also has a mutation (Q27stop) that truncates ORF8, a likely immune evasion protein. Removal of ORF8 changes the clinical outset of the disease, which may affect the virus transmissibility. Here I provide a detailed analysis of all reported ORF8-deficient lineages found in the background of relevant spike mutations, identified among 231,433 SARS-CoV-2 genomes. I found 19 ORF8 nonsense mutations, most of them occurring in the 5' half of the gene. The ORF8-deficient lineages were rare, representing 0.67% of sequenced genomes. Nevertheless, I identified two clusters of related sequences that emerged recently and spread in different countries. The widespread D614G spike mutation was found in most ORF8-deficient lineages. Although less frequent, the HV69-70del and L5F spike mutations occurred in the background of six different ORF8 nonsense mutations. I also confirmed that VOC-202012/01 is the ORF8-deficient variant with the most spike mutations reported to date, although other variants could have up to six spike mutations, some of putative biological relevance. Overall, these results suggest that monitoring ORF8-deficient lineages is important for the progression of the COVID-19 pandemic, particularly when associated with relevant spike mutations.
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) triggered a major health crisis worldwide by causing the coronavirus disease 2019 (COVID-19). The virus spread rapidly from the original outbreak in Hubei province, China [1]. Several mutations have been described along the SARS-CoV-2 genome, as expected in a fast-expanding population of a rapidly evolving RNA virus [2]. In November 2020, a variant named Variant of Concern 202012/01 (VOC-202012/01), originally termed VUI-202012/01, emerged in southeast England and rapidly spread locally and to other countries [3] [4] [5]. This variant belongs to lineage B.1.1.7 and attracted particular interest because of its fast spread and its 14 non-synonymous mutations and 3 deletions. Focus has been placed on the mutations located in the spike protein due to their likely phenotypic effect. In particular, mutation N501Y has been shown to provide a more favourable interaction with the angiotensin-converting enzyme 2 (ACE2) in mice [6], P681H is located immediately adjacent to the furin cleavage site, which is important for infection and transmission, and the HV 69-70 deletion has been described in the context of evasion of the human immune response [7]. However, VOC-202012/01 also has a mutation that truncates the ORF8 accessory gene. It is now well recognised that SARS-CoV-2 can persist without a functional ORF8 protein, either by the accumulation of nonsense mutations or by large deletions that remove or significantly change the ORF8 protein [8] [9] [10]. Moreover, substantial evidence has accumulated over the last months to suggest that removal of ORF8 changes the clinical outset of the disease, with likely consequences for the transmissibility of the virus. A SARS-CoV-2 variant with a 382-nucleotide deletion in ORF8 results in a milder infection with less systemic release of proinflammatory cytokines, with patients having a longer duration of symptoms [10].
There are currently no data showing whether VOC-202012/01 results in greater or lesser severity of disease in infected patients. However, preliminary data seem to indicate that this lineage is more transmissible than pre-existing variants of SARS-CoV-2 [4, 11]. The co-occurrence of relevant spike mutations with the truncation of ORF8 raises the possibility that both loci are contributing to the increased transmissibility. The spike mutations may be affecting the receptor binding affinity of the spike protein, enhancing the transmissibility of the virus, while the longer duration of symptoms resulting from the absence of ORF8 may increase the opportunity for transmission. It is therefore important to identify other SARS-CoV-2 variants that combine the absence of a functional ORF8 with spike mutations with a putative effect on transmissibility and infection. If still in circulation, such lineages may pose a risk of increased transmissibility, as observed with VOC-202012/01. Here I provide a detailed characterization of such variants, providing the basis for their future monitoring.

SARS-CoV-2 sequences were obtained from the GISAID Initiative (https://www.gisaid.org/) using the filters "complete", "high coverage" and "low coverage excl" (all together), accessed on January 4, 2021. These filters excluded sequences with >1% Ns and with insertions/deletions not verified by the submitter. A total of 231,433 genomes were downloaded and analyzed for ORF8 and spike mutations. The sequences were aligned using the MAFFT version 7 online service [12], with the light-weight option for MSA of full-length SARS-CoV-2 genomes. The SARS-CoV-2 genome with accession number NC_045512.2 was used as reference.
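The screening steps described above (excluding sequences with >1% Ns, then scanning ORF8 in reference coordinates for premature stop codons) can be illustrated with a minimal Python sketch. This is not the pipeline used in this study; it only shows one way such a scan could be written. All function names are hypothetical, the ORF8 coordinates (27894-28259 in NC_045512.2) are taken from the reference annotation rather than from the text, and the sequences in the usage example are toy strings, not real genomes.

```python
"""Illustrative sketch: N-content filter and ORF8 premature-stop scan."""

# Assumption: ORF8 coordinates (1-based, inclusive) in NC_045512.2,
# taken from the reference annotation, not stated in the text.
ORF8_START, ORF8_END = 27894, 28259

# Standard genetic code, built from the usual TCAG-ordered amino acid string.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODONS = [a + b + c for a in _BASES for b in _BASES for c in _BASES]
CODON_TABLE = dict(zip(_CODONS, _AMINO))


def translate_codon(codon):
    """Translate one codon; gaps or ambiguous bases give 'X'."""
    return CODON_TABLE.get(codon.upper(), "X")


def n_fraction(seq):
    """Fraction of positions called as N (an empty sequence counts as all N)."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def passes_n_filter(seq, max_n=0.01):
    """Keep a genome only if at most 1% of its bases are N."""
    return n_fraction(seq) <= max_n


def slice_by_reference(aligned_ref, aligned_query, start, end):
    """Query bases aligned to reference positions start..end (1-based).

    Columns where the reference has a gap (insertions in the query) are
    dropped so that the result stays in reference coordinates.
    """
    out = []
    pos = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r == "-":
            continue
        pos += 1
        if pos > end:
            break
        if pos >= start:
            out.append(q)
    return "".join(out).upper()


def premature_stops(ref_cds, query_cds):
    """Return labels such as 'Q27stop' for every premature stop codon.

    The last codon is skipped because it is the natural stop codon
    when a full-length coding sequence is supplied.
    """
    labels = []
    usable = min(len(ref_cds), len(query_cds))
    for i in range(0, usable - 3, 3):
        ref_aa = translate_codon(ref_cds[i:i + 3])
        query_aa = translate_codon(query_cds[i:i + 3])
        if query_aa == "*" and ref_aa != "*":
            labels.append(f"{ref_aa}{i // 3 + 1}stop")
    return labels


if __name__ == "__main__":
    # Toy alignment: a 15-nt coding region at reference positions 3..17,
    # with one insertion column and a C>T change creating a stop codon.
    ref = "AA-ATGCAAGAATGGTAACC"
    qry = "AAGATGTAAGAATGGTAACC"
    ref_cds = slice_by_reference(ref, ref, 3, 17)
    qry_cds = slice_by_reference(ref, qry, 3, 17)
    print(premature_stops(ref_cds, qry_cds))
```

For real data, `slice_by_reference` would be called with `ORF8_START` and `ORF8_END` on each genome aligned to NC_045512.2. Returning every premature stop, rather than only the first, allows co-occurring nonsense mutations in the same genome to be counted separately.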
The list of ORF8 nonsense mutations was obtained from a previous work [13], which used data from the China National Center for Bioinformation (CNCB) [14] and the CoV-GLUE database [15]. The selection of spike mutations was based on the changes observed in the VOC-202012/01 variant and in two publications: 1) Korber et al. [16], which proposed an analysis pipeline to facilitate real-time tracking of spike mutations that may confer selective advantages in transmission or resistance to interventions, and 2) Li et al. [17], which investigated SARS-CoV-2 spike mutation variants for their infectivity and reactivity to a panel of neutralizing antibodies and sera from convalescent patients. All sequences have been classified into clades using the Nextclade tool Table 1 ). Most of these mutations (n = 14) occurred in the 5' half of the ORF8 gene, resulting in short truncated proteins that are most likely non-functional (Fig. 1A). Overall, premature ORF8 stop codons were found in 1,540 out of 231,433 genomes (0.67%). The number of observed ORF8 nonsense mutations (n = 1,788) was higher than the number of genomes due to the co-occurrence of more than one mutation in the same Overall, these results support the previous claim that ORF8 has the highest coefficient of premature stop codons among all SARS-CoV-2 genes [18]. Moreover, the data suggest that more than one premature stop codon can occur in ORF8, perhaps due to genomes, the rate of premature termination at ORF8 proves that the virus can persist without a functional protein, as suggested by the ORF8-deleted variants [19, 20] and the now widespread VOC-202012/01 [4, 11]. When considered independently, the most frequent ORF8 nonsense mutations were Q18stop (n = 534), E64stop (n = 348), E110stop (n = 306), Q27stop (n = 158) and E106stop (n = 123) (Table 1). In all cases, the same mutation was found occurring six of those submitting sequences to GISAID [21].
Therefore, the distribution of lineages is highly influenced by the sequencing efforts of these countries, and many cases certainly remain undetected. Nevertheless, some ORF8-deficient variants are found in several countries, which is not correlated with the number of reported sequences. All SARS-CoV-2 variants without a functional ORF8 had at least one relevant spike mutation (Table 1) (Table 1). There is a clear gap between VOC-202012/01 and other variants in terms of the number of spike mutations (Supplementary Figure S3). The next ORF8-deficient variant (E64stop) had six spike mutations (I68del, HV69-70del, L189F, N439K, D614G and V772I), two of them of putative biological relevance (HV69-70del and D614G). The sequences form a small cluster in clade 20A (Fig. 2), being found in Denmark (n = 8), Germany (n = 2), Switzerland (n = 2) and the USA (n = 1). The branch where these sequences occurred is predominantly found in central Europe, suggesting that it originated in this area. Several ORF8-deficient variants were found with five spike mutations. For example, a variant with five spike mutations, three of relevance (L5F, HV69-70del and D614G), was found associated with E59stop in a sample from Poland (EPI_ISL_732828). Ten sequences with five spike mutations were found associated with Q18stop, in three different combinations and branches of the SARS-CoV-2 phylogeny (Fig. 1C). The most divergent sequences within the Q18stop / E110stop cluster had five spike mutations and were detected in England.

The ORF8 accessory gene is not crucial for the replication of SARS-CoV-2, as shown by the several variants lacking this gene reported worldwide and studied here. The gene is most likely involved in modulation of the infected host cell metabolism and innate immunity evasion, although the subject is still under investigation [8, 9, 23].
Removal of ORF8 results in less severe disease with a likely prolonged infection period [10], which may increase the opportunities for contagion and virus transmissibility. Curiously, the middle and late phases of the SARS epidemic were characterized by the spread of SARS-CoV with either partial or complete deletions of the ORF8 gene [24, 25]. The same may be happening in SARS-CoV-2, as more transmissible lineages may become dominant as the pandemic progresses. Overall, I found that ORF8 nonsense mutations occur recurrently in the SARS-CoV-2 pandemic. It is difficult to estimate whether such variants had any increased transmissibility, as genomes deposited in GISAID are a biased sample of circulating variants. Moreover, the spread of a variant depends on many factors beyond the viral genetics, such as restrictions on host mobility, rates of diagnostic testing and clinical management. In any case, I identified two clusters of ORF8-deficient variants that emerged recently in the 20B clade, which did not include the emerging VOC-202012/01 variant. The association of these ORF8-deficient variants with spike mutations of interest (as in VOC-202012/01) should raise concern, as they may be more transmissible. The recent origin of these clusters and the high number of variants detected in several countries should be further investigated. In any case, VOC-202012/01 is clearly the ORF8-deficient variant with the most associated spike mutations detected so far. It has been speculated that this variant may have emerged in immunodeficient or immunosuppressed patients who are chronically infected with SARS-CoV-2 [5]. Because the ORF8 protein is believed to suppress immune responses, it is possible that the pressure to retain this protein is
None to declare.
E59stop      67     2     0     2     0     0     0    65     0     0     3
L60stop       2     0     0     0     0     0     0     2     0     0     1
C61stop       3     0     0     0     0     0     0     3     0     0     1
E64stop     348     3     0    14     0     0     0   345     0     0     3
K68stop      16     0     0     5     4     0     5    16     5     0     5
E106stop    123     0     1     4     0     0     0   117     0     0     3
E110stop    306    11     0     0    15     2     0   302     0     1     5
Total      1788    32     6    62    48     3    25  1704    23     2
Nº of ORF8 nonsense mutations with spike mutations
                    6     4     6     4     2     3    19     3     2

[1] The emergence of SARS-CoV-2 in Europe and North America
[2] We shouldn't worry when a virus mutates during disease outbreaks
[3] Investigation of novel SARS-COV-2 variant: Variant of Concern 202012/01
[4] Estimated transmissibility and severity of novel SARS-CoV-2 Variant of Concern 202012/01 in England, medRxiv
[5] Preliminary genomic characterisation of an emergent SARS lineage in the UK defined by a novel set of spike mutations
[6] Adaptation of SARS-CoV-2 in BALB/c mice for testing vaccine efficacy
[7] Natural deletions in the SARS-CoV-2 spike glycoprotein drive antibody escape, bioRxiv
[8] Lost in deletion: The enigmatic ORF8 protein of SARS-CoV-2
[9] Evolutionary dynamics of the SARS-CoV-2 ORF8 accessory gene
[10] Effects of a major deletion in the SARS-CoV-2 genome on the severity of infection and the inflammatory response: an observational cohort study
[11] Transmission of SARS-CoV-2 Lineage B.1.1.7 in England: Insights from linking epidemiological and genetic data, medRxiv
[12] MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization
[13] SARS-CoV-2 variants lacking a functional ORF8 may reduce accuracy of serological testing
[14] The 2019 novel coronavirus resource
[15] CoV-GLUE: A Web Application for Tracking SARS-CoV-2 Genomic Variation
[16] Spike mutation pipeline reveals the emergence of a more transmissible form of SARS-CoV-2, bioRxiv
[17] The impact of mutations in SARS-CoV-2 spike on viral infectivity and antigenicity
[18] The SARS-CoV-2 ORF10 is not essential in vitro or in vivo in humans
[19] SARS-CoV-2 genomic surveillance in Taiwan revealed novel ORF8-deletion mutant and clade possibly associated with infections in Middle East
[20] Discovery and Genomic Characterization of a 382-Nucleotide Deletion in ORF7b and ORF8 during the Early Evolution of SARS-CoV-2, mBio
[21] Genomic Sequencing Effort for SARS-CoV-2 by Country during the Pandemic
[22] Evaluating the effects of SARS-CoV-2 Spike mutation D614G on transmissibility and pathogenicity
[23] Structure of SARS-CoV-2 ORF8, a rapidly evolving immune evasion protein
[24] Molecular evolution of the SARS coronavirus during the course of the SARS epidemic in China
[25] Isolation and characterization of viruses related to the SARS coronavirus from animals in southern China
[26] Emergence of Y453F and Δ69-70HV mutations in a lymphoma patient with long-term COVID-19