key: cord-0976887-wdxb1tqp
authors: Rocheleau, Lynda; Laroche, Geneviève; Fu, Kathy; Stewart, Corina M; Mohamud, Abdulhamid O; Côté, Marceline; Giguère, Patrick M; Langlois, Marc-André; Pelchat, Martin
title: Identification of a High-frequency Intra-host SARS-CoV-2 spike Variant with Enhanced Cytopathic and Fusogenic Effect
date: 2021-03-17
journal: bioRxiv
DOI: 10.1101/2020.12.03.409714
sha: 512000b2aebacb29c3ef7a8ee2e8f05409719276
doc_id: 976887
cord_uid: wdxb1tqp

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is continuously evolving. Although its RNA-dependent RNA polymerase exhibits some exonuclease proofreading activity, viral sequence diversity can arise from replication errors and host factors. A diversity of genetic variants can be observed in the intra-host viral population structure of infected individuals. Most mutations follow neutral molecular evolution and do not contribute significantly to variation within and between infected hosts. Herein, we profiled the intra-sample genetic diversity of SARS-CoV-2 variants using 15,289 high-throughput sequencing datasets from infected individuals and infected cell lines. Most of the genetic variations observed, including C->U and G->U, were consistent with errors due to heat-induced DNA damage during sample processing and/or sequencing protocols. Despite this high mutational background, we identified recurrent intra-variable positions in the samples analyzed, including several positions at the end of the gene encoding the viral Spike (S) protein. Strikingly, we observed a high-frequency C->A nonsense mutation resulting in an S protein lacking the last 20 amino acids (SΔ20). We found that this truncated S protein undergoes increased processing and induces increased syncytia formation, presumably because it escapes M protein retention in intracellular compartments.
Our findings suggest the emergence of a high-frequency viral sublineage that is not horizontally transmitted but is potentially involved in intra-host cytopathic effects.

IMPORTANCE The mutation rate and evolution of RNA viruses correlate with viral adaptation. While most mutations do not contribute significantly to viral molecular evolution, some are naturally selected and spread through positive selection. Several SARS-CoV-2 variants have recently been described that show phenotypic selection toward more infectious viruses. Our study describes another type of variant, one that does not contribute to inter-host heterogeneity but instead reflects phenotypic selection toward variants that might have increased cytopathic effects. We found that a C-terminal truncation of the Spike protein removes an important ER-retention signal; as a result, this Spike variant readily travels through the Golgi toward the plasma membrane in a pre-activated conformation, leading to increased syncytia formation.

First observed in 2019, SARS-CoV-2 and its associated disease, COVID-19, have caused significant worldwide mortality and unprecedented economic burdens. SARS-CoV-2 is an enveloped virus with a nonsegmented, positive-sense, single-stranded RNA (vRNA) genome of ~30,000 nucleotides (1, 2). The virus is composed of four main structural proteins, encoded in the 3'-terminal third of the viral genome: the spike glycoprotein (S), membrane (M), envelope (E) and nucleocapsid (N) (3-5). Attachment to the host receptor angiotensin-converting enzyme 2 (ACE2) is mediated by the S protein expressed on the surface of the virion (6). Following receptor binding, the S protein is cleaved into two separate polypeptides (S1 and S2), which triggers the fusion of the viral particle with the cellular membrane (7, 8).
Once inside a cell, the viral RNA-dependent RNA polymerase (RdRp), which is encoded in the first open reading frame of the viral genome (9), carries out transcription and replication of the vRNA genome. In addition, the mRNAs coding for the structural proteins (i.e., S, M, E and N) are expressed from subgenomic RNAs (9). Once translated, the S, M and E proteins localize and accumulate at the CoV budding site in the endoplasmic reticulum-Golgi intermediate compartment (ERGIC) (10). A distinctive aspect of CoV biology is that virions bud into the lumen of the secretory pathway at the ERGIC and must then traffic through the Golgi complex and the anterograde system to be efficiently released from host cells (11). The S protein possesses an endoplasmic reticulum retrieval signal (ERRS) at its carboxy terminus, which is required for trafficking through the ERGIC (12). At this location, the spike protein interacts with the M protein, which has been shown to be essential for its accumulation at the ERGIC. The N protein then associates with the viral genome and assembles into virions, which are transported along the endosomal network and released by exocytosis (9). If not retained at the ERGIC, the S protein traffics through the Golgi and is pre-activated by resident proteases before reaching the plasma membrane. There, it can mediate fusion between adjacent cells, resulting in the production of multinucleated cells, or syncytia (8, 13, 14). Genomic sequencing of SARS-CoV-2 vRNA from infected populations has demonstrated genetic heterogeneity (15-21). Several recurrent mutations have been identified in consensus sequences, and the geographical distribution of clades has been established. Because these mutations are predominantly missense rather than synonymous or nonsense, it was suggested that regions of the SARS-CoV-2 genome are actively evolving and might contribute to pandemic spread (21).
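Whether a point mutation is synonymous, missense or nonsense depends only on the codon it falls in and on the genetic code. The sketch below is a generic, hypothetical helper (it is not taken from the cited studies): codons are written in RNA form, positions within a codon are 0-based, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code in RNA form; bases ordered U, C, A, G with the first
# codon position varying slowest. "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def substitution_effect(codon: str, index: int, new_base: str) -> str:
    """Classify a single-base change at 0-based `index` of one codon."""
    before = CODON_TABLE[codon]
    after = CODON_TABLE[codon[:index] + new_base + codon[index + 1:]]
    if before == after:
        return "synonymous"
    if before == "*":
        return "stop-loss"
    if after == "*":
        return "nonsense"
    return "missense"


if __name__ == "__main__":
    print(substitution_effect("UGC", 2, "A"))  # nonsense (Cys UGC -> stop UGA)
    print(substitution_effect("GAU", 0, "A"))  # missense (Asp -> Asn)
```

For instance, a C->A change at the third position of a UGC (Cys) codon produces UGA, a stop codon; the specific codon is chosen only for illustration.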
Variations were observed to consist mainly of transitions (purine->purine or pyrimidine->pyrimidine), with a prevalence of C->U transitions that might occur within a sequence context reminiscent of APOBEC-mediated deamination (i.e., [AU]C[AU]; (22, 23)). Consequently, it was proposed that host editing enzymes might be involved in coronavirus genome editing (24, 25). Consensus mutations are only part of the genetic landscape of RNA viruses. Replication of RNA viruses typically produces quasispecies, in which viral RNA genomes exist not as a single sequence entity but as a population of genetic variants (26). These variants arise mostly from the error-prone nature of viral RdRps, as well as from host RNA-editing enzymes such as APOBECs and ADARs (27). However, the RdRp complexes of large RNA viruses, such as coronaviruses, can possess exonuclease proofreading activity and consequently have lower error rates (26, 28). Members of a quasispecies may exhibit diminished replicative fitness or carry deleterious mutations, and can play roles that are not directly linked to viral genome propagation (29). Mutations that form the intra-host genetic spectrum have been shown to help viruses evade cytotoxic T cell recognition and neutralizing antibodies, and to render them more resistant to antiviral drugs (29). These mutations can also modulate the virulence and transmissibility of the quasispecies (29).

In this study, we focused on assessing intra-host genetic variations of SARS-CoV-2. We analyzed high-throughput sequencing datasets to profile the sequence diversity of SARS-CoV-2 variants within distinct sample populations and observed high genetic intra-variability of the viral genome.
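The transition/transversion classification and the [AU]C[AU] context described above, together with the composition control this study applies later (comparing the bases flanking variable sites with the genome's overall A/U content), can be sketched as follows. This is a minimal, hypothetical illustration, not the pipeline of the cited studies or of this one: the function names are ours, sequences are assumed to be RNA strings using U with 0-based positions, and an observed/expected ratio stands in for a proper statistical test.

```python
import re
from collections import Counter

PURINES, PYRIMIDINES = {"A", "G"}, {"C", "U"}
BASES = "ACGU"
# Zero-width lookahead so that overlapping contexts (e.g. "ACUCA") are all found.
APOBEC_LIKE = re.compile(r"(?=[AU]C[AU])")


def substitution_class(ref: str, alt: str) -> str:
    """Transition: purine<->purine or pyrimidine<->pyrimidine; else transversion."""
    if ref == alt or not {ref, alt} <= set(BASES):
        raise ValueError(f"not a substitution: {ref}->{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def apobec_like_sites(rna: str) -> list[int]:
    """0-based positions of every C lying in an [AU]C[AU] context."""
    # Each match starts at the 5' flanking base, so the C is one position later.
    return [m.start() + 1 for m in APOBEC_LIKE.finditer(rna.upper())]


def composition(seq: str) -> dict[str, float]:
    """Fraction of A, C, G and U in seq (other characters are ignored)."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in BASES)
    if total == 0:
        raise ValueError("no A/C/G/U in sequence")
    return {b: counts[b] / total for b in BASES}


def flank_enrichment(genome: str, sites: list[int], flank: int = 2) -> dict[int, dict[str, float]]:
    """Observed/expected ratio of each base at offsets -flank..+flank (0 excluded).

    A ratio near 1.0 at every offset means the flanking bases simply mirror
    the genome's overall composition, i.e. no motif is enriched.
    """
    background = composition(genome)
    ratios = {}
    for offset in range(-flank, flank + 1):
        if offset == 0:
            continue
        bases = "".join(genome[s + offset] for s in sites if 0 <= s + offset < len(genome))
        if bases:
            observed = composition(bases)
            ratios[offset] = {b: observed[b] / background[b] for b in BASES if background[b] > 0}
    return ratios
```

For example, `apobec_like_sites("GACUCAUCG")` returns `[2, 4]`. For the toy genome `"AUAUCGAUAU"` (80% A/U) with one variable C at position 4, `flank_enrichment(genome, [4], flank=1)` reports a U ratio of 2.5 at offset -1; with so few sites such ratios are noisy, so the comparison only becomes informative over many variable positions.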
By comparing variation profiles between samples from different donors and cell lines, we identified highly conserved subspecies that arose independently and recurrently in different datasets and, therefore, in different individuals. We further analyzed the dominant variant, SΔ20, in functional assays and demonstrate that this truncated S protein escapes inhibition by the M protein and enhances syncytium formation. We provide evidence for the existence of a consistently emerging variant, identified across geographical regions, that may influence intra-host SARS-CoV-2 pathogenicity.

with SARS-CoV-2 S or SARS-CoV-2 SΔ20 and M or pCAGGS using JetPRIME at a 1:1 ratio. The following day, cells were washed once with cold PBS and lysed in cold lysis buffer (1% Triton X-100, 0.1% IGEPAL CA-630, 150 mM NaCl, 50 mM Tris-HCl, pH 7.5) containing protease and phosphatase inhibitors (Cell Signaling). Proteins in cell lysates were resolved on 4-12% gradient SDS-polyacrylamide gels (NuPage, Invitrogen) and transferred to polyvinylidene difluoride (PVDF) membranes. Membranes were blocked for 1 h at room temperature with blocking buffer (5% skim milk powder dissolved in 25 mM Tris, pH 7.5, 150 mM NaCl and 0.1% Tween-20 [TBST]). Spike processing was detected by immunoblotting using an anti-S1 antibody (SARS-CoV/SARS-CoV-2 spike protein S1 polyclonal, Invitrogen) and an anti-S2 antibody (SARS-CoV/SARS-CoV-2 spike protein S2 monoclonal, Invitrogen). M overexpression was also detected by immunoblotting using an anti-M antibody (Rabbit anti-SARS Membrane protein, NOVUS). Membranes were incubated overnight at 4°C with the appropriate primary antibody in blocking buffer. Blots were then washed in TBST and incubated with HRP-conjugated secondary antibodies (anti-mouse HRP and anti-rabbit HRP, Cell Signaling) for 1 h at room temperature.
Membranes were washed, incubated in chemiluminescence substrate (SuperSignal West Femto Maximum Sensitivity Substrate, ThermoFisher Scientific) and imaged using the ChemiDoc XRS+ imaging system (Bio-Rad). In some instances, the same membrane was stripped and re-probed for actin (Monoclonal Anti-β-Actin, Millipore Sigma).

To assess the extent of SARS-CoV-2 intra-sample genetic variability, we analyzed 15,224 publicly available high-throughput sequencing datasets from infected individuals. The raw sequencing reads were mapped to the SARS-CoV-2 isolate Wuhan-Hu-1 reference genome, and the composition of individuals. Analysis of the types of nucleotide changes within samples revealed that 52.2% were transitions (purine->purine or pyrimidine->pyrimidine) and 47.8% were transversions (purine->pyrimidine or pyrimidine->purine). Notably, the most frequent nucleotide variations were C->U transitions (43.5%), followed by G->U transversions (28.1%; Fig. 1B). Because the SARS-CoV-2 genome is 62% A/U, the As and Us observed around variation sites are most likely explained by the A/U content of the viral genome rather than by an enriched sequence motif, suggesting that these intra-genetic variations are unlikely to originate from host editing enzymes.

To identify biologically relevant intra-genetic variations, we examined variable positions that recur in the samples analyzed. The variable positions were tabulated for each sample, and recurrence was calculated as the percentage of samples containing a variation at each position. Most variations are distributed homogeneously along the viral genome and are poorly shared among samples (Fig. 1C and 1D). However, our analysis reveals 15 recurrent intra-variations shared by at least 5% of the samples analyzed (Fig. 1C, samples) and corresponds to a C->A transversion producing a nonsense mutation at amino acid 1,254 of the S protein (Fig.
1C and 1D, red line; Fig. 2B, red rectangle). The resulting S protein lacks the last 20 amino acids (SΔ20), including the ERRS motif at its carboxy terminus (Fig. 2B, white letters on a black background). Among the samples with this intra-genetic variation, this C->A transversion represents from 2.9% to 42.4% of the subspecies identified (mean of 8.2 ± 2.9%; Fig. 2C and Table 1).

To further investigate variations in a more controlled system and to determine whether host proteins are involved in SARS-CoV-2 genome editing, we used 65 high-throughput sequencing datasets generated in a recent transcription profiling study of several cell lines infected with SARS-CoV-enzyme expression. For all cell lines, normalized counts for mRNAs corresponding to most host modifying enzymes were very low or undetected (Fig. 3), suggesting that these cell lines poorly expressed these host editing proteins. As above, the raw sequencing reads from infected cells were mapped to the SARS-CoV-2 genome sequence, the composition of each nucleotide at each position of the viral genome was determined, and nucleotide variations relative to the respective consensus sequences were calculated. Because the sequencing depths of these samples were low, we considered positions covered by at least 20 reads and having at least 2 reads with a variation relative to the sample consensus. In the samples derived from infected cells, we

SARS-CoV-2 entry into cells is triggered by the interaction between the S glycoprotein and its cellular receptor ACE2. While the complete mechanism of viral entry is not fully understood, S is known to undergo different processing steps by cell-surface and endosomal proteases. For several coronaviruses, the S protein mediates not only virion fusion but also syncytia formation (8, 13, 14). The presence of dysmorphic pneumocytes forming syncytial elements is a well-described feature of severe COVID-19 (37).
One particularity of SARS-CoV-2 compared to SARS-CoV is the presence of an additional furin-like cleavage site at the S1/S2 interface. As a consequence, SARS-CoV-2-infected cells have a higher propensity to express activated S at their surface, which can fuse with other cells expressing the ACE2 receptor and form syncytia (37). The normal route of S trafficking involves accumulation at the ERGIC, which is known to depend, at least in part, on the interaction of the cytoplasmic portion of S with the SARS-CoV-2 M protein. This interaction allows complex formation, leading to virion formation at the ERGIC. The discovery of the SΔ20 variant, which lacks part of the C-terminus, prompted us to investigate its effect on cell fusion using a syncytia assay in the presence of the M protein. HEK293T cells stably expressing human ACE2 were co-transfected with plasmids encoding GFP, the M protein and the WT or SΔ20 S protein. Consistent with previous findings (8), we observed syncytia formation in the presence of S WT and SΔ20, indicating induction of cell-to-cell fusion (Fig. 5A). We also observed larger syncytia with SΔ20 than with S WT, indicating increased fusogenic activity of this truncated variant. As expected, co-expression of the M protein with WT S completely abolished syncytia formation, as a consequence of S being retained in the ERGIC. Strikingly, the M protein failed to inhibit syncytia formation in the presence of SΔ20 (Fig. 5A). To evaluate the effect of the Δ20 truncation on spike protein processing, we co-expressed the M protein with WT or SΔ20 S protein in HEK293T cells in the absence of ACE2, to avoid cell fusion. Cells were lysed 24 hours post-transfection, and spike processing was assessed by immunoblotting for the SARS-CoV-2 S1 and S2 subunits. As seen in Fig. 5B, SΔ20 undergoes increased processing, as shown by the presence of more S1 and S2 compared to S WT (Fig. 5B; lane 2 vs lane 4).
Finally, co-expression of the M protein reduces processing of the S WT protein without affecting SΔ20 processing, as shown by a reduction of the S1 fragment only for S WT (Fig. 5B; lane 3). Taken together, our results indicate that SΔ20 displays increased processing and syncytia formation compared to the wild-type S protein, and that the truncation removes an important regulatory domain involving the M protein.

Previous analyses of SARS-CoV-2 nucleotide variations indicated a high prevalence of C->U transitions, suggesting that the viral genome was actively evolving and that host editing enzymes, such as APOBECs and ADARs, might be involved in this process (24, 25). Although instructive regarding the role of the host in SARS-CoV-2 genome evolution, these studies were performed on consensus sequences (i.e., one per sample) and explore only part of the genetic landscape (22-25). Here, we investigated nucleotide compositions at each variation site and observed a high number of As and Us around all variation types and sites. However, since the SARS-CoV-2 genome is 62% A/U and similar percentages of As and Us were observed around all variations, we concluded that no motifs are enriched around these variations in the viral subspecies analyzed. Consequently, our results do not support host editing enzymes as a major source of these intra-sample variations. Although host RNA-editing enzymes may be responsible for some variations, C->U transitions and G->U transversions are also generally associated with nucleotide deamination and oxidation, respectively (38-45). It is common practice to thermally inactivate SARS-CoV-2 samples before performing RNA extractions, RT-PCR and sequencing (46).
However, heating samples can generate reactive oxygen species and oxidative lesions such as 8-hydroxy-2'-deoxyguanosine (8-oxo-dG), which can cause high levels of C->A and G->U mutations, and can promote the hydrolytic deamination of C to U (38-41, 43, 45, 47, 48). These types of mutations were previously reported to occur at low frequency, to be detected mostly when sequencing is performed on only one DNA strand, and to be highly variable across independent experiments (40, 42). Consequently, most transversions observed in our analysis are likely due to heat-induced damage, RNA extraction, storage, shearing and/or RT-PCR amplification errors. However, we identified several positions with intra-sample variability that recur in several independent samples, both from infected individuals and from infected cells. They were detected at moderate to high frequencies, ranging from 2.5% to 39.3% per sample (Tables 1 and 2), and most (90.7%) were derived from paired-end sequencing, in which both strands of a DNA duplex were considered. Thus, these variations are likely genuine and represent hot spots for SARS-CoV-2 intra-sample genome variability. Most of the variable positions identified in infected cells are located in the 3'-terminal third of the viral genome. These cells were infected with a large number of viruses (i.e., a high multiplicity of infection; MOI) for 24 h (30). The presence of several variations in the region coding for the main structural proteins likely reflects increased transcriptional activity in this region, since their mRNAs must be produced from subgenomic negative-sense RNAs (9). Interestingly, a cluster of variations located at the 3' end of the S gene was observed in the two datasets analyzed. They correspond to four transversions located at the 3' end of the S incorporation into virions (12).
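The filters discussed in this section can be combined into a small sketch: a minimum depth of 20 reads and at least 2 variant reads (the thresholds stated earlier for the cell-line samples), a check that the variant is carried by reads from both strands, and recurrence across samples with the 5% cut-off used for infected individuals. The `SiteCounts` layout, the function names and the both-strand rule are hypothetical; the strand check is only inspired by the observation that damage-induced artifacts tend to be detected on a single strand and is not a criterion stated by the authors.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Reads at one genome position in one sample (hypothetical layout).

    Reads carrying a third base are ignored in this sketch.
    """
    ref_fwd: int  # consensus-base reads mapped to the forward strand
    ref_rev: int  # consensus-base reads mapped to the reverse strand
    alt_fwd: int  # variant reads on the forward strand
    alt_rev: int  # variant reads on the reverse strand

    @property
    def depth(self) -> int:
        return self.ref_fwd + self.ref_rev + self.alt_fwd + self.alt_rev

    @property
    def alt_fraction(self) -> float:
        """Share of reads carrying the variant (its per-sample frequency)."""
        return (self.alt_fwd + self.alt_rev) / self.depth if self.depth else 0.0


def keep_site(c: SiteCounts, min_depth: int = 20, min_alt: int = 2,
              require_both_strands: bool = True) -> bool:
    """Apply the depth and variant-read thresholds, then the optional strand check."""
    if c.depth < min_depth or c.alt_fwd + c.alt_rev < min_alt:
        return False
    if require_both_strands and (c.alt_fwd == 0 or c.alt_rev == 0):
        return False  # seen on one strand only: a possible damage artifact
    return True


def recurrent_sites(samples: list[dict[int, SiteCounts]],
                    min_share: float = 0.05) -> dict[int, float]:
    """Map each retained position to the fraction of samples in which it varies."""
    n = len(samples)
    hits = Counter(pos for sample in samples
                   for pos, c in sample.items() if keep_site(c))
    return {pos: k / n for pos, k in hits.items() if k / n >= min_share}
```

For example, with four samples in which position 100 passes every filter in two of them, `recurrent_sites(samples, min_share=0.5)` returns `{100: 0.5}`. Keeping the strand rule behind a flag makes it easy to compare results with and without it.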
While the mechanism is not completely understood, mutation of the ERRS motif on S resulted in a failure to interact with the M protein at the ERGIC and instead in trafficking of S to the cell surface. Deletion of this motif might cause the S protein of SARS-CoV-2 to accumulate at the plasma membrane and increase the formation of large multinucleated cells known as syncytia. Consistent with these observations, our results show larger syncytia with SΔ20 than with the complete S protein. Moreover, unlike its effect on WT S, the M protein failed to prevent SΔ20-induced syncytia formation, which is consistent with the role of the M protein in interacting with Spike and retaining it in the ERGIC. Similar mutants (SΔ18, SΔ19 and SΔ21) were recently reported to increase both the infectivity and replication of vesicular stomatitis virus (VSV) and human immunodeficiency virus (HIV) pseudotyped with the SARS-CoV-2 S protein in cultured cells (49-52). Because these viruses bud from the plasma membrane (53, 54), increased S localization at this site would explain the selection of these deletion mutants in pseudotyped virions. However, such variants would be unlikely to be transmitted horizontally in naturally occurring CoVs, whose budding site is the ERGIC (10).

Our findings indicate the presence of consistent intra-sample genetic variants of SARS-CoV-2, including a recurrent subpopulation of SΔ20 variants with elevated fusogenic properties. It is tempting to suggest a link between SARS-CoV-2 pathogenesis and the presence of SΔ20, since severe cases of the disease were recently linked to considerable lung damage and the occurrence of syncytia (37, 55). Clearly, more investigation is required to better define the extent of SARS-CoV-2 variability in infected hosts and to assess the role of these subspecies in the life cycle of this virus.
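The link between a stop codon at position 1,254 and the loss of the spike ERRS discussed above can be checked with simple arithmetic and a pattern search. This sketch assumes a 1,273-residue Wuhan-Hu-1 spike whose last 20 residues are CKFDEDDSEPVLKGVKLHYT, and models the ERRS as a C-terminal KxHxx signal of the kind reported for the SARS-CoV spike tail (12); both the sequence and the motif definition are assumptions to verify against the reference record and that study, and the names are hypothetical.

```python
import re
from typing import Optional

# Assumed reference values; verify against the Wuhan-Hu-1 spike protein record.
SPIKE_LENGTH = 1273
TAIL_1254_1273 = "CKFDEDDSEPVLKGVKLHYT"  # the 20 residues missing from SΔ20

# KxHxx-type ER retrieval signal at the very C-terminus (x = any residue).
ERRS = re.compile(r"K.H..$")


def length_after_stop(stop_codon_number: int) -> int:
    """A premature stop at codon n leaves a protein of n - 1 residues."""
    return stop_codon_number - 1


def errs_start(protein_tail: str, first_residue: int) -> Optional[int]:
    """Residue number where a C-terminal KxHxx motif begins, or None if absent."""
    match = ERRS.search(protein_tail)
    return first_residue + match.start() if match else None


if __name__ == "__main__":
    print(SPIKE_LENGTH - length_after_stop(1254))  # 20 residues removed
    print(errs_start(TAIL_1254_1273, 1254))        # 1269 (K1269-L-H1271-Y-T)
```

Under these assumptions the motif starts at residue 1269, so it lies entirely within the segment removed by a stop codon at position 1,254.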
More importantly, further studies on the presence of SΔ20 and its link with viral pathogenicity could lead to better diagnostic strategies and to the design of treatments for COVID-19.

shown on the right panel. (B) Processing of spike was detected by anti-S1 and anti-S2 immunoblotting of lysates from HEK293T cells transfected with empty vector (pCAGGS) or a vector expressing SARS-CoV-2 S or SARS-CoV-2 SΔ20, in the presence or absence of M protein.

Characteristics of SARS-CoV-2 and COVID-19
A Novel Coronavirus from Patients with Pneumonia in China
Coronaviruses: An Overview of Their Replication and Pathogenesis
The molecular virology of coronaviruses
A Structural View of SARS-CoV-2 RNA Replication Machinery: RNA Synthesis, Proofreading and Final Capping
SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor
SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor
A Multibasic Cleavage Site in the Spike Protein of SARS-CoV-2 Is Essential for Infection of Human Lung Cells
Coronavirus biology and replication: implications for SARS-CoV-2
The Cytoplasmic Tail of the Severe Acute Respiratory Syndrome Coronavirus Spike Protein Contains a Novel Endoplasmic Reticulum Retrieval Signal That Binds COPI and Promotes Interaction with Membrane Protein
The Infectious Bronchitis Coronavirus Envelope Protein Alters Golgi pH To Protect the Spike Protein and Promote the Release of Infectious Virus
Intracellular targeting signals contribute to localization of coronavirus spike proteins near the virus assembly site
Role of the Spike Glycoprotein of Human Middle East Respiratory Syndrome Coronavirus (MERS-CoV) in Virus Entry and Syncytia Formation
Efficient Activation of the Severe Acute Respiratory Syndrome Coronavirus Spike Protein by the Transmembrane Protease TMPRSS2
Spread of SARS-CoV-2 in the Icelandic Population
Genome-Wide Identification and Characterization of Point Mutations in the SARS-CoV-2 Genome
Genetic diversity and evolution of SARS-CoV-2. Infection, Genetics and Evolution
Emergence of genomic diversity and recurrent mutations in SARS-CoV-2. Infection, Genetics and Evolution
Overwhelming mutations or SNPs of SARS-CoV-2: A point of caution
Genome Availability up to April 2020 and its Implications: Data Analysis. JMIR Public Health and Surveillance
Insights on early mutational events in SARS-CoV-2 virus reveal founder effects across geographical regions
Transcriptome-wide sequencing reveals numerous APOBEC1 mRNA-editing targets in transcript 3′ UTRs
RNA Editors, Cofactors, and mRNA Targets: An Overview of the C-to-U RNA Editing Machinery and Its Implication in Human Disease
Evidence for host-dependent RNA editing in the transcriptome of SARS-CoV-2
Rampant C→U Hypermutation in the Genomes of SARS-CoV-2 and Other Coronaviruses: Causes and Consequences for Their Short- and Long-Term Evolutionary Trajectories. mSphere
Viral Mutation Rates
Mutation rates among RNA viruses
Viral quasispecies
Imbalanced Host Response to SARS-CoV-2 Drives Development of COVID-19
Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype
A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data
The Sequence Alignment/Map format and SAMtools
ggseqlogo: a versatile R package for drawing sequence logos
HTSeq - a Python framework to work with high-throughput sequencing data
Receptor Binding and Low pH Coactivate Oncogenic Retrovirus Envelope-Mediated Fusion
Persistence of viral RNA, pneumocyte syncytia and thrombosis are hallmarks of advanced COVID-19 pathology
Oxidized, deaminated cytosines are a source of C→T transitions in vivo
Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation
DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification
Reactive oxygen species, heat stress and oxidative-induced mitochondrial damage. A review
Detection of Low-Frequency Mutations and Identification of Heat-Induced Artifactual Mutations Using Duplex Sequencing
Artifactual mutations resulting from DNA lesions limit detection levels in ultrasensitive sequencing applications
Detection and quantification of rare mutations with massively parallel sequencing
8-Hydroxyguanine, an abundant form of oxidative DNA damage, causes G→T and A→C substitutions
Laboratory management for SARS-CoV-2 detection: a user-friendly combination of the heat treatment approach and rt-Real-time PCR testing. Emerging Microbes & Infections
Heat-induced formation of reactive oxygen species and 8-oxoguanine, a biomarker of damage to DNA
Cytosine deamination and the precipitous decline of spontaneous mutation during Earth's history
Characterization of spike glycoprotein of SARS-CoV-2 on virus entry and its immune cross-reactivity with SARS-CoV
A Replication-Competent Vesicular Stomatitis Virus for Studies of SARS-CoV-2 Spike-Mediated Cell Entry and Its Inhibition
Measuring SARS-CoV-2 neutralizing antibody activity using pseudotyped and chimeric viruses
Neutralizing Antibody and Soluble ACE2 Inhibition of a Replication-Competent VSV-SARS-CoV-2 and a Clinical Isolate of SARS-CoV-2
Assembly of animal viruses at cellular membranes. Annual Review of Microbiology
HIV-1 assembly, release and maturation
Syncytia formation by SARS-CoV-2 infected cells