key: cord-0893480-noq1p2al authors: Daron, J.; Bravo, I.G. title: Variability in codon usage in Coronaviruses is mainly driven by mutational bias and selective constraints on CpG dinucleotide date: 2021-01-26 journal: bioRxiv DOI: 10.1101/2021.01.26.428296 sha: faff33bc5c821011d43f69250d1737418b6b952c doc_id: 893480 cord_uid: noq1p2al

The Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the third member of the Orthocoronavirinae to cause an emergent infectious disease in humans, the ongoing coronavirus disease 2019 (COVID-19) pandemic. Because of the high zoonotic potential of these viruses, it is critical to unravel their evolutionary history of host-species shifts, adaptation and emergence. Only such knowledge can guide virus discovery, surveillance and research efforts to identify viruses posing a pandemic risk to humans. We present a comprehensive analysis of the composition and codon usage bias of 82 Orthocoronavirinae members infecting 47 different avian and mammalian hosts. Our results clearly establish that synonymous codon usage varies widely among viruses and is only weakly dependent on the type of host they infect. Instead, we identify mutational bias towards AT-enrichment and selection against CpG dinucleotides as the main factors responsible for variation in codon usage bias. Further analysis of the mutational equilibrium within Orthocoronavirinae revealed that most coronavirus genomes are close to their neutral equilibrium, the exception being the three recently emerged human coronaviruses, which lie further away from the mutational equilibrium than their endemic human coronavirus counterparts. Finally, our results suggest that, while replicating in humans, SARS-CoV-2 is slowly becoming AT-richer, likely until it reaches a new mutational equilibrium.
The Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the cause of the respiratory disease COVID-19, which occasionally leads to acute respiratory distress syndrome and eventually death (Zhu et al. 2020). With neither antiviral drugs nor vaccines, and with the presence of asymptomatic carriers, the COVID-19 outbreak turned into a public health emergency of international concern. Before the 2019 zoonotic spillover, SARS-CoV- [...]

Besides being driven by constraints on tRNA abundance, CUB in viruses is also shaped by dinucleotide abundance. It has long been established that many RNA viruses [...] has been formally demonstrated that the severe attenuation of virus replication is attributable to the increase in CpG and UpA dinucleotide frequencies rather than to the use of rare codons (Tulloch et al. 2014). Viruses with high CpG/UpA frequencies may be more likely to be recognized by innate immune pathogen sensors, which prevents replication initiation (Kumagai et al. 2008). Functionally, the inhibition of viral replication and the degradation of the viral genome have been attributed to the zinc finger antiviral protein (ZAP), a powerful restriction factor that specifically binds to CpG and UpA motifs (Odon et al. 2019). It is hence critical, when investigating CUB, to disentangle the confounding effects of CUB and dinucleotide abundance.

Finally, non-adaptive evolutionary forces can also affect viral genome composition. Because the fitness differences associated with individual codons are very small, very large population sizes (e.g. highly productive and highly prevalent viral infections) are required for natural selection to act and to have a significant impact on global genomic CUB. This expectation is borne out in large organisms with small population sizes, such as most mammals, in which natural selection on CUB is weak (Duret 2002; Chamary et al. 2006) and CUB is instead primarily shaped by mutational biases [...] and GC-biased gene conversion (Duret and Galtier 2009) [...] (Table 1).

First, we explored the variation in the frequencies of the 59 synonymous codons through a Principal Component Analysis (PCA) (Figure 1A and Supplementary Figure 1A). The PCA efficiently reduced the dimensionality of the data: the first two components capture 34.9% and 20.1% of the variance, respectively, and the first four dimensions together explain over 70% of the variance. Interestingly, the 82 CoVs were scattered along the first two PCA axes without any obvious stratification according to either viral taxonomy or host. This result contradicts our initial hypothesis of a host-specific CUB signature in CoVs and suggests that translational selection for convergence towards the host's CUB is weak.

The PCA results also provide information on the distribution of the individual codons and on their contribution to the total variance (Supplementary Figure 1B). The first PCA axis sharply splits the codons according to the nucleotide at their third position (the basis of the GC3 content), with the exception of the UUG (Leu) codon, the only G-ending codon positioned among the A- or U-ending codons. Strikingly, variation in genomic GC3 content was tightly correlated with variation along the first PCA axis (R² adjusted = 0.94, p-value < 1e-50; Figure 1B). Note that variation in GC3 did not correlate with any of the other main PCA axes (Supplementary Figure 2). In-depth analysis of the frequency patterns of the 18 synonymous codon families showed that A- or U-ending codons are systematically preferred over G- or C-ending ones (Figure 1C).
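The GC3 measure and one row of the 82 x 59 synonymous codon table can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes GC3 is computed over the 59 synonymous codons of a concatenated, in-frame coding sequence (sometimes written GC3s), since the exact definition is not given in this section, and the function names are hypothetical.

```python
from collections import Counter

# Standard genetic code, codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# 59 synonymous codons: drop the stop codons (*), Met (ATG) and Trp (TGG).
SYN_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa not in "*MW")
SYN_SET = set(SYN_CODONS)


def codon_counts(cds: str) -> Counter:
    """Count synonymous codons read in frame from a concatenated coding sequence."""
    seq = cds.upper().replace("U", "T")
    counts = Counter()
    for i in range(0, len(seq) - 2, 3):  # a trailing incomplete codon is ignored
        codon = seq[i:i + 3]
        if codon in SYN_SET:  # skips Met, Trp, stops and ambiguous codons
            counts[codon] += 1
    return counts


def codon_frequencies(counts: Counter) -> dict:
    """Relative frequency of each of the 59 synonymous codons (one matrix row)."""
    total = sum(counts[c] for c in SYN_CODONS)
    return {c: counts[c] / total for c in SYN_CODONS}


def gc3(counts: Counter) -> float:
    """Fraction of synonymous codons whose third position is G or C."""
    total = sum(counts.values())
    return sum(n for c, n in counts.items() if c[2] in "GC") / total


counts = codon_counts("AAAAAGTTTTTCATGTGG")  # toy sequence, not viral data
print(gc3(counts))  # 0.5: AAG and TTC out of AAA, AAG, TTT, TTC
```

Stacking the `codon_frequencies` rows of all genomes gives the matrix that a PCA (FactoMineR in the original study) would take as input.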
This trend was especially marked for amino acids with multiplicity two, but it also held for amino acids with multiplicity four: U-ending codons were systematically the most used of the four codons, while A-ending codons were always preferred over the G-ending one. For amino acids with multiplicity six, the overall pattern corresponds to the combined patterns of a family of multiplicity four and a family of multiplicity two. Altogether, our observations show that variation in GC3 is the main explanatory factor for variation in CUB between CoVs, with CoV genomes being universally enriched in AU-ending over GC-ending synonymous codons.

We next aimed to quantify the proportion of the global variability in CUB explained by host and by virus taxonomic stratification. Through a correspondence analysis, we grouped the 82 viruses of our codon usage table into blocks corresponding to the 13 host taxonomic orders, which allowed us to split the total variability into between-category and within-category variability. The top ten eigenvalues of this decomposition are represented in Figure 1D, showing that within-host differences in CUB explain four times more of the overall variability than between-host differences (80.3% vs 19.7%, respectively), with the between-host fraction exceeding the random expectation under a homogeneous distribution (7.59%, p-value < 9e-4). A similar analysis was performed grouping the viruses into blocks corresponding to the four viral genera within Coronavirinae. Within-genus differences in CUB explain over five times more of the overall variability than between-genera differences (85.6% vs 14.4%, respectively), with the between-genera fraction again exceeding the random expectation under a homogeneous distribution (3.72%, p-value < 9e-4).
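The between/within split can be sketched with NumPy. The sketch relies on a standard property of correspondence analysis: with rows weighted by their masses, the between-class inertia equals the inertia of the table obtained by summing the rows of each class, and the within-class part is the remainder. The authors used ade4 in R; the function names below are hypothetical, the permutation test is only one plausible way to obtain a random expectation and p-value like those quoted above, and the toy counts are invented.

```python
import numpy as np


def ca_inertia(table: np.ndarray) -> float:
    """Total inertia of a contingency table: Pearson chi-square / grand total."""
    p = table / table.sum()
    row_mass = p.sum(axis=1, keepdims=True)
    col_mass = p.sum(axis=0, keepdims=True)
    expected = row_mass @ col_mass
    return float(((p - expected) ** 2 / expected).sum())


def between_fraction(counts: np.ndarray, labels) -> float:
    """Share of total CA inertia explained by grouping rows (viruses) into classes."""
    labels = np.asarray(labels)
    # Between-class inertia equals the inertia of the class-aggregated table.
    aggregated = np.vstack([counts[labels == g].sum(axis=0) for g in np.unique(labels)])
    return ca_inertia(aggregated) / ca_inertia(counts)


def permutation_test(counts, labels, n_perm=999, seed=0):
    """Observed between-class share, its mean under shuffled labels, and a p-value."""
    rng = np.random.default_rng(seed)
    observed = between_fraction(counts, labels)
    null = np.array([between_fraction(counts, rng.permutation(labels))
                     for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed - 1e-12)) / (n_perm + 1)
    return observed, float(null.mean()), float(p_value)


# Toy table: 4 "viruses" x 2 "codons", two hypothetical host classes.
counts = np.array([[90, 10], [85, 15], [10, 90], [15, 85]], dtype=float)
observed, random_mean, p = permutation_test(counts, ["A", "A", "B", "B"], n_perm=199)
print(round(observed, 4), 1 - round(observed, 4))  # between vs within shares
```

With only four rows the toy p-value cannot be small; the point is the decomposition, where `observed` and `1 - observed` play the roles of the between-host and within-host percentages.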
Together, these results are consistent with our initial PCA and suggest that neither host nor virus taxonomy is the main factor driving variation in CUB within Orthocoronavirinae.

The mutational spectrum in Orthocoronavirinae is AU-biased

Since variation in GC3 content is the main individual driver of variation in CUB between CoVs, we next investigated whether mutational biases are the main underlying force driving nucleotide content. Population genetics studies show that, to explore mutational biases independently of the effect of natural selection, one should work over shallow, short-term evolutionary periods, in which natural selection has less power to shape nucleotide polymorphism patterns (Hershberg and Petrov 2010). Consequently, we [...]

[...] AU-enriching substitutions occur at a much higher rate than GC-enriching ones in both the synonymous and non-synonymous compartments (mean fold change = 2.49, p-value < 7.6e-6; and mean fold change = 1.38, p-value < 7.6e-6, respectively; paired Wilcoxon signed rank test; Figure 2B).

Finally, we tested whether the nucleotide content of each viral metapopulation is at equilibrium in each compartment, by contrasting the observed GC content with the GC content expected if the metapopulation lay at its inferred mutational equilibrium (GCeq) (Figure 2C). We first note the very good correlation between observed and expected GC composition for both synonymous and non-synonymous SNPs (R² adjusted = 0.93 and 0.86; p-values < 2e-10 and 4e-8, respectively). Notably, CoVs infecting the same host displayed substantial differences in GC3 content in the synonymous compartment; among hCoVs, it ranged from 18.7% (HKU1) to 30.6% (229E). This variation again indicates that the host is not the main factor governing the strength of the mutational bias.
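The observed-versus-GCeq contrast can be made concrete with a small sketch. It assumes the classical mutation-only equilibrium: if u is the per-site rate of A/T->G/C changes and v the per-site rate of G/C->A/T changes, GC content settles where u(1 - GC) = v GC, i.e. GCeq = u / (u + v). This section does not state the authors' exact estimator, and the substitution counts and site numbers below are invented for illustration.

```python
def substitution_rates(at_to_gc, gc_to_at, at_sites, gc_sites):
    """Per-site rates of GC-enriching (u) and AU-enriching (v) substitutions."""
    u = at_to_gc / at_sites  # A/T -> G/C changes per A/T site
    v = gc_to_at / gc_sites  # G/C -> A/T changes per G/C site
    return u, v


def gc_equilibrium(u, v):
    """GC content at which mutation alone is balanced: u * (1 - GC) == v * GC."""
    return u / (u + v)


# Invented counts for one hypothetical metapopulation and compartment.
u, v = substitution_rates(at_to_gc=10, gc_to_at=20, at_sites=6000, gc_sites=4000)
fold_change = v / u           # AU-enriching rate relative to GC-enriching rate
gc_eq = gc_equilibrium(u, v)  # equivalently 1 / (1 + fold_change)
print(f"fold change = {fold_change:.2f}, GCeq = {gc_eq:.2f}")
# fold change = 3.00, GCeq = 0.25
```

Under this formula, any fold change above 1 pulls GCeq below 50%, which is the direction expected from the AU-biased spectrum reported above; comparing `gc_eq` with the GC content observed in the same compartment is the kind of contrast plotted in Figure 2C.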
Altogether, our results suggest that the mutational spectrum in CoVs is biased towards AU-enrichment and that the nucleotide content of viral genomes is primarily determined by mutational biases.

[...] populations at demographic equilibrium, the GC content converges towards the mutational equilibrium, yielding a perfect U-shape in the AT-GC allele frequency distribution and hence a symmetric folded SFS. Deviations from the U-shaped distribution would thus reflect that the genomic GC content is not at its mutational equilibrium, a negative (or positive) skewness indicating a higher (or lower) GC content than expected under the mutational equilibrium. We applied this framework, computing the folded SFS for the seven hCoV lineages (Figure 3A) and quantifying the deviation from the expected U-shaped distribution by the skewness of each folded SFS, used as a proxy for the departure of the GC content from the expected mutational equilibrium (Figure 3B). For all four endemic hCoVs, the observed skewness did not differ from the null expectation, whereas for the three recently emerged hCoVs (SARS-CoV-1, MERS-CoV and SARS-CoV-2) we observed a significant departure from the expected GC content at equilibrium. Both MERS-CoV and SARS-CoV-2 displayed a negative skewness, meaning that their genomes were slightly GC-richer than the expected mutational equilibrium, while SARS-CoV-1 displayed a positive skewness, reflecting an AT-richer genomic content than expected at mutational equilibrium. Finally, benefiting from the wealth of sequence data generated for SARS-CoV-2, we investigated the dynamics of GC content over the spread of the COVID-19 pandemic (Figure 3C).
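The skewness statistic can be sketched as below. This rests on an assumption: the "folded SFS" is read here as the distribution of GC-allele frequencies at A/T vs G/C polymorphic sites, without polarisation by ancestral state, which is the reading under which a signed skewness maps onto GC excess or deficit as described above. The authors' exact binning and significance test are not given in this section, and the function names are hypothetical.

```python
WEAK, STRONG = set("AT"), set("GC")


def gc_allele_frequencies(alignment: list[str]) -> list[float]:
    """GC-allele frequency at every biallelic site with one A/T and one G/C allele.

    `alignment` is a list of equal-length, aligned, uppercase sequences; columns
    with gaps, ambiguous bases or more than two alleles are skipped."""
    n = len(alignment)
    freqs = []
    for column in zip(*alignment):
        alleles = set(column)
        if len(alleles) != 2 or not alleles <= WEAK | STRONG:
            continue
        if len(alleles & WEAK) == 1 and len(alleles & STRONG) == 1:
            freqs.append(sum(base in STRONG for base in column) / n)
    return freqs


def skewness(values: list[float]) -> float:
    """Fisher-Pearson coefficient of skewness, m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Toy alignment: only the first column is an A/T vs G/C polymorphism.
print(gc_allele_frequencies(["ACT", "GCT", "GCA", "GCA"]))  # [0.75]
print(skewness([0.9, 0.9, 0.9, 0.1]) < 0)                   # True
```

Under this reading, a negative skewness means GC alleles sit mostly at high frequency, so the rare, recent alleles are A/T: the genome is GC-richer than its mutational equilibrium, matching the interpretation given for MERS-CoV and SARS-CoV-2.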
We observed that the GC- [...]

Our initial characterization showed that variation in GC3 content is the prevailing force shaping CUB among CoVs, explaining one-third of the total variation in CUB. We therefore sought to identify other evolutionary forces further shaping CUB in CoVs. Previous work has shown that in several RNA viruses infecting humans certain dinucleotides, notably CpG and UpA, are under-represented, and that this low dinucleotide frequency has a strong impact on codon-pair bias (Kunec and Osterrieder 2016). We thus investigated the ratio of observed to expected dinucleotide frequencies in CoV coding sequences. Figure 4A shows that the observed/expected abundance of the CpG dinucleotide in all CoV lineages is significantly lower than the null expectation based on individual nucleotide frequencies, whereas this is not the case for the UpA dinucleotide. To a lesser extent, CpA and UpG were marginally more frequent than expected, which can be linked to the strong depletion of CpG, as they correspond respectively to the transitions CpG->CpA and CpG->UpG. We further found that the observed/expected ratios for CpG and UpA correlated well with the second and third dimensions of our PCA, respectively (R² adjusted = 0.67 and 0.44; Figure 4B and C). Note that these dinucleotide frequencies were estimated over the complete viral coding sequence and are not limited to codon-internal positions, i.e. we also counted CpG and UpA dinucleotides spanning codon boundaries, so that this effect is not simply related to a higher frequency of CG-rich or AT-rich codons. This specific point is addressed below. Overall, our results show that variation in CUB between CoVs is associated with variation in CpG and UpA dinucleotide content.
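The observed/expected dinucleotide ratio can be sketched directly from the definition used in the Methods, rXY = fXY / (fX fY), together with the 0.78 and 1.25 thresholds. This is an illustrative sketch with hypothetical function names; it counts every adjacent pair along the sequence, so dinucleotides spanning codon boundaries are included, as stated above.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"


def dinucleotide_odds_ratios(seq: str) -> dict:
    """r_XY = f_XY / (f_X * f_Y) for the 16 dinucleotides, counted along the
    whole sequence (pairs spanning codon boundaries included)."""
    seq = seq.upper().replace("U", "T")
    mono = Counter(b for b in seq if b in NUCLEOTIDES)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in NUCLEOTIDES and seq[i + 1] in NUCLEOTIDES
    )
    n_mono, n_di = sum(mono.values()), sum(di.values())
    ratios = {}
    for x, y in product(NUCLEOTIDES, repeat=2):
        if mono[x] and mono[y]:  # the ratio is undefined if a nucleotide is absent
            f_xy = di[x + y] / n_di
            ratios[x + y] = f_xy / ((mono[x] / n_mono) * (mono[y] / n_mono))
    return ratios


def classify(r: float, low: float = 0.78, high: float = 1.25) -> str:
    """Label a ratio using the bounds quoted for long (>20 kb) random sequences."""
    if r < low:
        return "under-represented"
    if r > high:
        return "over-represented"
    return "within expectation"


r = dinucleotide_odds_ratios("ACGTACGTACGT")  # toy sequence, far shorter than 20 kb
print(classify(r["CG"]), classify(r["GC"]))  # over-represented under-represented
```

On a real genome the function would be applied to the concatenated coding sequence; on a toy sequence like this one the 0.78/1.25 bounds carry no statistical meaning, since they were derived for sequences longer than 20 kb.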
Next, to formally demonstrate the impact of CpG and UpA dinucleotide frequency on CUB, we compared synonymous codon frequencies among codon families either containing or lacking a CpG- or UpA-ending codon. Having previously established that mutational bias modulates frequencies among synonymous codons, we accounted for this confounding effect by calculating an expected ratio of synonymous codon frequencies for codon pairs of the form XXA/XXG. To this end, we defined as our reference set the three codon families with multiplicity two that lack CpG and UpA dinucleotides and end in either A or G (i.e. Gln (CAA/G), Lys (AAA/G) and Glu (GAA/G), indicated as "XXpA/XXpG" in Figure 4D). For this reference set, we calculated an overall XXA/XXG fold-change ratio of 1.26, consistent with the AT-enriching mutational bias described above. For amino acids encoded by CpG-ending codons (Ser4, Pro, Thr, Ala, indicated as "XCpA/XCpG" in Figure 4D), we observed an XCA/XCG ratio of 5.33, significantly higher than that of the reference set (within-genome paired Wilcoxon signed rank test, p-value < 1e-14). This difference indicates that, regardless of the amino acid encoded, CpA-ending codons are systematically preferred over their CpG-ending synonymous counterparts, to a greater extent than expected from the AT-enriching mutational bias alone. In the case of UpA-ending codons (Leu2, Leu4, Val), we observed an overall XUA/XUG ratio of 1.04, slightly but significantly lower than that of the reference set (within-genome paired Wilcoxon signed rank test, p-value < 1e-6). We thus interpret that [...]

[...] Here we report that the between-host variability of CUB in CoVs accounts for 19.7% of the total variability in CUB, a fraction more than twice the random expectation.
Although statistically significant, this contribution does not necessarily prove that differential adaptation to the translational machinery of the different hosts shapes CUB in CoVs. Our analyses suggest instead that significant differences in GC3 and UpA occur between [...]

In order to investigate mutational bias in CoVs and to disentangle it from the effect of natural selection, we analyzed mutations occurring during recent evolution in 18 viral metapopulations at different host and viral taxonomic levels. With this approach we aimed to compare mutational biases and equilibria while minimizing the impact of natural selection. Nevertheless, for all metapopulations the substitution rate in the synonymous compartment was systematically higher than in the corresponding non-synonymous compartment (Figure 2A-B). This result suggests that natural selection is still able to prune deleterious mutations at shallow levels in CoVs. We hypothesize that such an effect of natural selection may be linked to a viral life cycle that allows within-host competition between newly generated variants, and/or to the presence of bottlenecks or of differential variant success during within-host and between-host transmission. This verbal argument can only be addressed through observational data, quantifying within-host viral diversity during the course of infection and its connection with the observed between-host viral diversity, or through experimental data, for instance using mutation accumulation lines.

This metapopulation approach has also allowed us to address the extent to which each of the 18 CoVs may have reached a mutational compositional equilibrium.
Our results for all metapopulations show a very good correlation between the observed and expected GC3 contents in both the synonymous and non-synonymous compartments, indicating that all CoVs were at their GC-content equilibrium (Figure 2C). However, when considering the more accurate folded site-frequency spectrum at the population level, we observe that endemic hCoVs are closer to an equilibrium in the AT-GC allele distribution than zoonotic, recently acquired hCoVs (Figure 3A-B). This suggests that endemic hCoVs may have undergone a compositional drift towards a new equilibrium under the novel mutational pressures encountered in the human host after the host switch. Indeed, over the short time scale that our analyses can explore for the SARS-CoV-2 epidemic in humans, we detect a small but significant trend towards GC reduction (Figure 3C). Future work will benefit from access to a sufficient number of non-human SARS-CoV-2 strains collected from different hosts (bats and pangolins in the first instance, given current knowledge) to assess the population-level direction and intensity of mutational bias in endemic host species. Additionally, endemic hCoVs exhibit large variation in observed synonymous GC3 content, ranging from 18.7% to 30.6%, which suggests that the strength of the mutational bias causing C-to-U changes differs among hCoVs. We hypothesize that such differences in mutational signature could be related to differences in the repertoires of host APOBEC3 mutators, which vary between hosts but also between cell types within an organism, so that changes in host tropism would affect the actual mutational intensity and direction experienced by the viral genomes.
SARS-CoV-2 is the third Coronavirinae zoonotic spillover, and the associated [...]

For intra-species diversity, we downloaded strain sequences from the Virus Pathogen Resource database (ViPR). Unusually long or short sequences (>130% or <70% of the median length of the reference sequence of a species) were filtered out. In addition, complete genomic sequences of SARS-CoV-2 isolates were obtained from GISAID (available at https://www.gisaid.org/epiflu-applications/nexthcov-19-app/), accessed twice: on 2020/04/26 to collect the 9774 sequences for our metapopulation analysis, and on 2020/08/18 to investigate GC-content dynamics over the spread of the pandemic. Detailed accession IDs for both datasets are provided in Supplementary Table 1 and Supplementary Table 2.

We concatenated the coding sequences of each CoV genome in order to compute total synonymous codon frequencies and GC content. This yielded a matrix of 82 (CoVs) x 59 (synonymous codons), which served as input for both our PCA (FactoMineR) and correspondence analysis (ade4). In parallel, we calculated dinucleotide observed/expected abundances in coding sequences as the ratio rXY = fXY / (fX fY), where fXY denotes the observed frequency of the dinucleotide XY and fX fY the product of the individual frequencies of nucleotides X and Y in a sequence. We used as significance lower and upper boundaries the thresholds of 0.78 and 1.25, corresponding to p < 0.001 for sufficiently long (>20 kb) random sequences (Burge et al. 1992).

Prior to SNP calling, we identified and removed recombining sequences from each dataset using fastGear (Mostowy et al. 2017). Short recombination segments (<200 bp) identified by fastGear were not removed, but those regions were masked in the downstream analysis.
Next, so that population structure would not interfere with our downstream analysis, for each species we generated a phylogenetic tree rooted with an outgroup and selected the monophyletic group containing the largest number of sequences for our downstream analysis. After these preprocessing steps, we finally performed [...]

Comparison of the GC3 content, CpG, and UpA dinucleotide relative abundances across CoV hosts. Variance analysis of normally distributed data was done using ANOVA followed by a post hoc Tukey test showing further individual pairwise differences. Non-normal data were processed using a Kruskal-Wallis test followed by pairwise Wilcoxon rank tests with a Bonferroni correction (p-value codes: * < 0.05, ** < 0.01, *** < 0.001).

Figure 8: Comparison of the GC3 content, CpG, and UpA dinucleotide relative abundances across CoV genera. Statistical significance was assessed as above.

References
Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy
Exploiting tRNAs to Boost Virulence
Molecular epidemiology and evolutionary histories of human coronavirus OC43 and HKU1 among patients with upper respiratory tract infections in Kuala Lumpur
The influence of CpG and UpA dinucleotide frequencies on RNA virus replication and characterization of the innate cellular pathways underlying virus attenuation and enhanced replication
Viral adaptation to host: a proteome-based analysis of codon usage and amino acid preferences
Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic
Over- and under-representation of short oligonucleotides in DNA sequences
Chemical differentiation along metaphase chromosomes
Hearing silence: non-neutral evolution at synonymous sites in mammals
RNase L Targets Distinct Sites in Influenza A Virus RNAs
Origin and evolution of pathogenic coronaviruses
Fast algorithms for large-scale genome alignment and comparison
A genome-wide view of Caenorhabditis elegans base-substitution mutation processes
SARS-CoV-2 jumping the species barrier: Zoonotic lessons from SARS, MERS and recent advances to combat this pandemic virus
Evidence for host-dependent RNA editing in the transcriptome of SARS-CoV-2
Codon Usage and Phenotypic Divergences of SARS-CoV-2 Genes
Mistranslation-Induced Protein Misfolding as a Dominant Constraint on Coding-Sequence Evolution
Evolution of synonymous codon usage in metazoans
Biased Gene Conversion and the Evolution of Mammalian Genomic Landscapes
Warts, or Asymptomatic Infections: Clinical Presentation Matches Codon Usage Preferences in Human Papillomaviruses
The fitness landscape of the codon space across environments
Feline panleukopenia virus (FPV) codon bias analysis reveals a progressive adaptation to the new niche after the host jump
CpG and UpA dinucleotides in both coding and non-coding regions of echovirus 7 inhibit replication initiation post-entry
Usage Bias in Animals: Disentangling the Effects of Natural Selection Size, and GC-Biased Gene Conversion
Recurrent Loss of APOBEC3H Activity during Primate Evolution
Quantification of GC-biased gene conversion in the human genome
Codon bias analysis may be insufficient for identifying host(s) of a novel virus
Causes and Effects of N-Terminal Codon Bias in Bacterial Genes
Patterns of Evolution and Host Gene Mimicry in Influenza and Other RNA Viruses
Multivariate analyses of codon usage of SARS-CoV-2 and other betacoronaviruses. Virus Evol
Non-neutral processes drive the nucleotide composition of non-coding sequences in Drosophila
Evolutionary Paradigms from Ancient and Ongoing Conflicts between the Lentiviral Vif Protein and Mammalian APOBEC3 Enzymes
Differential Evolution of Antiretroviral Restriction Factors in Pteropid Bats as Revealed by APOBEC3 Gene Complexity
Selection on Codon Bias
Evidence That Mutation Is Universally Biased towards AT in Bacteria
Evolution of chromosome bands: molecular ecology of noncoding DNA
Evidence Supporting a Zoonotic Origin of Human Coronavirus Strain NL63
A functional investigation of the suppression of CpG and UpA dinucleotide frequencies in plant RNA virus genomes
Codon usage and tRNA content in unicellular and multicellular organisms
High-Resolution Analysis of Coronavirus Gene Expression by RNA Sequencing and Ribosome Profiling. PLoS Pathog
Retroviruses drive the rapid evolution of mammalian APOBEC3 genes
Capturing the mutational landscape of the beta-lactamase TEM-1
Six reference-quality genomes reveal evolution of bat adaptations
Cross-species transmission of the newly identified coronavirus 2019-nCoV
Codon Usage and tRNA Genes in Eukaryotes: Correlation of Codon Usage Diversity with Translation Efficiency and with CG-Dinucleotide Usage as Assessed by Multivariate Analysis
Development of a new oral poliovirus vaccine for the eradication end game using codon deoptimization
Coding-Sequence Determinants of Gene Expression in Escherichia coli
TLR9 as a key receptor for the recognition of DNA
Codon Pair Bias Is a Direct Consequence of Dinucleotide Bias
Codon usage determines the mutational robustness, evolutionary capacity, and virulence of an RNA virus
Rationalizing the development of live attenuated virus vaccines
Genome Landscapes and Bacteriophage Codon Usage
Rate, molecular spectrum, and consequences of human mutation
Synonymous Virus Genome Recoding as a Tool to Impact Viral Fitness
Attenuation of RNA viruses by redirecting their evolution in sequence space
Efficient Inference of Recent and Ancestral Recombination within Bacterial Populations
An ancient history of gene duplications, fusions and losses in the evolution of APOBEC3 mutators in mammals
A role for gorilla APOBEC3G in shaping lentivirus evolution including transmission to humans
The role of ZAP and OAS3/RNAseL pathways in the attenuation of an RNA virus with elevated frequencies of CpG and UpA dinucleotides
Host and viral traits predict zoonotic spillover from mammals
The Rate and Molecular Spectrum of Spontaneous Mutations in Arabidopsis thaliana
Synonymous mutations in CFTR exon 12 affect splicing and are not neutral in evolution
Distribution of fitness effects caused by single-nucleotide substitutions in bacteriophage f1
Patterns of nucleotide substitution in Drosophila and mammalian genomes
Syndrome Coronavirus and Close Relatives of Human Coronavirus 229E in Bats, Ghana
Synonymous but not the same: the causes and consequences of codon bias
Codon Optimality Is a Major Determinant of mRNA Stability
Analysis of codon usage bias of Crimean-Congo hemorrhagic fever virus and its adaptation to hosts
Detecting positive selection within genomes: the problem of biased gene conversion
Estimating Translational Selection in Eukaryotic Genomes
Evidence for Strong Mutation Bias toward, and Selection against, U Content in SARS-CoV-2: Implications for Vaccine Design
Codon usage bias from tRNA's point of view: redundancy, specialization, and efficient decoding for translation optimization
The distribution of fitness effects caused by single-nucleotide substitutions in an RNA virus
Rampant C→U Hypermutation in the Genomes of SARS-CoV-2 and Other Coronaviruses: Causes and Consequences for Their Short- and Long-Term Evolutionary Trajectories. mSphere 5
Modelling mutational and selection pressures on dinucleotides in eukaryotic phyla: selection against CpG and UpA in cytoplasmically expressed RNA and in RNA viruses
Host influence in the genomic composition of flaviviruses: A multivariate approach
Codon usage determines translation rate in Escherichia coli
RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies
Extensive genomic recoding by codon-pair deoptimization selective for mammals is a flexible tool to generate attenuated vaccine candidates for dengue virus 2
CG dinucleotide suppression enables antiviral defence targeting non-self RNA
The adaptation of codon usage of +ssRNA viruses to their hosts
A comprehensive analysis of genome composition and codon usage patterns of emerging coronaviruses
RNA virus attenuation by codon pair deoptimisation is an artefact of increases in CpG/UpA dinucleotide frequencies. eLife
Complete Genomic Sequence of Human Coronavirus OC43: Molecular Clock Analysis Suggests a Relatively Recent Zoonotic Coronavirus Transmission Event
Measuring the distribution of fitness effects in somatic evolution by combining clonal dynamics with dN/dS ratios
Codon usage bias and the evolution of influenza A viruses. Codon Usage Biases of Influenza Virus
Retrocopying expands the functional repertoire of APOBEC3 antiviral proteins in primates
Relationship of SARS-CoV to other pathogenic RNA viruses explored by tetranucleotide usage profiling
Codon usage is an important determinant of gene expression levels largely through its effects on transcription
A Novel Coronavirus from Patients with Pneumonia in China