key: cord-0833550-sq4lthew
authors: Sadykov, Mukhtar; Mourier, Tobias; Guan, Qingtian; Pain, Arnab
title: Short sequence motif dynamics in the SARS-CoV-2 genome suggest a role for cytosine deamination in CpG reduction
date: 2021-02-25
journal: J Mol Cell Biol
DOI: 10.1093/jmcb/mjab011
sha: 9d92b1108e36be4dab3e3b659d77d196e8d46884
doc_id: 833550
cord_uid: sq4lthew

[...] brief time scale during the COVID-19 pandemic. Here, we demonstrate progressive C>U substitutions in the SARS-CoV-2 genome within a timeframe of five months. We highlight the role of C>U substitutions in the reduction of 5'-UCG-3' motifs and hypothesize that this progressive decrease is driven by host APOBEC activity.

We aligned 22164 SARS-CoV-2 genomes from the GISAID database to the reference genome and observed a total of 9210 single nucleotide changes, with C>U being the most abundant (Figure 1A; Supplementary Text, Figures S1 and S2, and Table S1). Over a period of five months, we found a steady and substantial increase in C>U substitutions (Figure 1B), almost half of which were synonymous (Supplementary Text and Figure S3). No comparable increase was observed for other changes (Supplementary Figure S4). One potential driver behind the increase in C>U changes could be the recently proposed APOBEC-mediated viral RNA editing (Di Giorgio et al., 2020; Simmonds, 2020; Supplementary Text). Since APOBEC3 family members prefer RNA in an open conformation over RNA that forms secondary structures (McDaniel et al., 2020), we calculated the folding potential of all genomic sites that include C>U substitutions (Figure 1C). Positions with C>U changes are more often located in regions with low potential for forming secondary RNA structures. These observations are in agreement with the notion that members of the APOBEC family are the main drivers of cytosine deamination in SARS-CoV-2 (Di Giorgio et al., 2020; Simmonds, 2020).
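The tally of single nucleotide changes described above can be illustrated with a minimal sketch. It assumes that each genome has already been aligned to the reference so that both strings have the same length, and that DNA-style T is read as U. The function name `count_substitutions` and the toy sequences are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter


def count_substitutions(reference, sample):
    """Tally single-nucleotide changes between two aligned sequences.

    Assumption: both sequences are already aligned to equal length.
    """
    # Normalise case and read DNA-style T as RNA-style U.
    ref = reference.upper().replace("T", "U")
    alt = sample.upper().replace("T", "U")
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")

    valid = set("ACGU")
    counts = Counter()
    for ref_base, alt_base in zip(ref, alt):
        # Skip gaps, ambiguous bases (e.g. N), and unchanged positions.
        if ref_base in valid and alt_base in valid and ref_base != alt_base:
            counts[f"{ref_base}>{alt_base}"] += 1
    return counts


# Toy example: two C>U changes relative to the reference.
print(count_substitutions("AUCGUCAACG", "AUUGUUAACG"))
```

Summing such counters over all genomes sampled on a given day would give per-day totals of the kind plotted against sampling date, although the exact aggregation used in the study may differ.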
We searched for possible APOBEC genetic footprints (5'-UC-3' > 5'-UU-3') in viral dinucleotide frequencies (Supplementary Figure S5). Among all dinucleotides, UpC showed the largest decrease, while UpU showed the largest increase, which is consistent with APOBEC activity (Supplementary Text). When analyzing the context of genomic sites undergoing C>U changes, we noticed an enrichment for 5'-UCG-3' motifs (Supplementary Table S2). To assess the contribution of C>U changes to CpG loss, we examined the dynamics of [A/C/G/U]CG trinucleotides over time (Figure 1D). The progressive change (~1% over a 5-month period) of 5'-UCG-3' to 5'-UUG-3' is most striking when supported by a larger number of genomes (days 70-115), whereas no such pattern is observed for the other trinucleotides (Figure 1D). The association between cytosine deamination and CpG loss is further underlined by the rapid, progressive increase in 5'-UCG-3' > 5'-UUG-3' changes compared to other 5'-UC[A/C/U]-3' motifs (Supplementary Figure S6). The genomic region with the highest percentage of 5'-UCG-3' loss is located in ORF1 (Supplementary Text and Figure S7). No apparent progression of 5'-UCG-3' over time is observed on the negative strand, suggesting that the action of APOBEC on the negative strand of SARS-CoV-2 is limited compared to that on the positive strand (Supplementary Figure S8).

The zinc-finger antiviral protein (ZAP) selectively binds to viral CpG regions, resulting in viral RNA degradation (Takata et al., 2017). Previous studies reported that the reduced number of CpG motifs in HIV and other viruses plays an important role in viral replication inside the host cell, allowing the virus to escape ZAP activity (Takata et al., 2017). Similarly, a stronger suppression of CpGs is observed in SARS-CoV-2 compared to other coronaviruses (Digard et al., 2020).
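The [A/C/G/U]CG trinucleotide tracking described above can be sketched for a single reference/sample pair as follows. This is an illustration under stated assumptions: the sequences are pre-aligned and of equal length, only positive-strand sites are considered, and the function name `xcg_to_xug_fraction` is hypothetical rather than the authors' implementation.

```python
from collections import defaultdict


def xcg_to_xug_fraction(reference, sample):
    """For each 5' neighbour X, return the fraction of reference XCG
    sites that read XUG in the aligned sample.

    Assumption: both sequences are already aligned to equal length.
    """
    ref = reference.upper().replace("T", "U")
    alt = sample.upper().replace("T", "U")
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")

    total = defaultdict(int)    # reference XCG sites per motif
    changed = defaultdict(int)  # sites where only the C became U

    for i in range(1, len(ref) - 1):
        # A CpG site with an unambiguous 5' neighbour in the reference.
        if ref[i] == "C" and ref[i + 1] == "G" and ref[i - 1] in "ACGU":
            motif = ref[i - 1] + "CG"
            total[motif] += 1
            # Count the site only if the flanking bases are unchanged.
            if alt[i] == "U" and alt[i - 1] == ref[i - 1] and alt[i + 1] == "G":
                changed[motif] += 1

    return {motif: changed[motif] / total[motif] for motif in total}


# Toy example: one of two UCG sites is converted to UUG; the ACG site is intact.
print(xcg_to_xug_fraction("AUCGAACGUUCG", "AUUGAACGUUCG"))
```

Averaging these per-genome fractions by sampling day would give a time series comparable in spirit to the trinucleotide dynamics discussed in the text.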
Given the high expression levels of APOBEC and ZAP genes in COVID-19 patients (Blanco-Melo et al., 2020), the direct interaction of APOBEC with viral RNA (Schmidt et al., 2020), and our observations, we hypothesize that, as a consequence of APOBEC-mediated RNA editing, the SARS-CoV-2 genome may escape host cell ZAP activity. Both APOBEC and ZAP are interferon-induced genes that act preferentially on ssRNA in open conformation (Luo et al., 2020; McDaniel et al., 2020). Initially, APOBEC and ZAP may have overlapping preferred target motifs (Figure 1E). The catalytic activity of APOBEC on 5'-UC-3' leads to cytosine deamination, which destroys ZAP's specific target site (5'-CG-3'). The conversion of C>U thus allows viral RNA to escape ZAP-mediated destruction. Therefore, C>U edits are more likely to become fixed at UCG positions because of the selective advantage they confer in evading ZAP-mediated degradation. A recent study hypothesized that both ZAP and APOBEC provide selective pressure that drives the adaptation of SARS-CoV-2 to its host (Wei et al., 2020). Here, we provide one potential mechanism that contributes to CpG reduction in SARS-CoV-2.

In summary, our phylogeny-free approach, together with other recent studies, strongly supports the proposed model, which merits future experimental validation. To our knowledge, this is the first study linking the dynamics of viral genome mutation to two known host molecular defense mechanisms, the APOBEC and ZAP proteins.

References
APOBEC-1-mediated RNA editing
Imbalanced Host Response to SARS-CoV-2 Drives Development of COVID-19
Intra-genome variability in the dinucleotide composition of SARS-CoV-2
Evidence for host-dependent RNA editing in the transcriptome of SARS-CoV-2
Molecular Mechanism of RNA Recognition by Zinc-Finger Antiviral Protein. Cell Rep
Deamination hotspots among APOBEC3 family members are defined by both target site sequence context and ssDNA secondary structure
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
Evidence for Strong Mutation Bias toward, and Selection against, U Content in SARS-CoV-2: Implications for Vaccine Design
Modeling the Embrace of a Mutator: APOBEC Selection of Nucleic Acid Ligands
The SARS-CoV-2 RNA-protein interactome in infected human cells
Rampant C→U Hypermutation in the Genomes of SARS-CoV-2 and Other Coronaviruses: Causes and Consequences for Their Short- and Long-Term Evolutionary Trajectories. mSphere
CG dinucleotide suppression enables antiviral defence targeting non-self RNA
Coronavirus genomes carry the signatures of their habitats

Figure 1 (B) The number of C>U substitutions across sample dates. The average number of substitutions for each sampling day is plotted (blue line, left y-axis) with plus/minus one standard deviation as error bars. The number of samples for each day is shown as red bars (right y-axis). (C) Folding potential of positions with C>U changes (Supplementary Text). P-values from Fisher's exact test are shown above bars. (D) The fraction of