key: cord-0905582-7pflfznh authors: Zang, Minghui; He, Wanting; Du, Fanshu; Wu, Gongjian; Wu, Bohao; Zhou, Zhenlei title: Analysis of the codon usage of the ORF2 gene of feline calicivirus date: 2017-06-16 journal: Infect Genet Evol DOI: 10.1016/j.meegid.2017.06.013 sha: a63c60615fbc8b63dd710aa2ae23c6f2d6d25fa5 doc_id: 905582 cord_uid: 7pflfznh

Feline calicivirus (FCV) is a highly prevalent pathogen of the domestic cat that causes acute infections of the oral and upper respiratory tract. The E region of the ORF2 protein is responsible for the induction of virus-neutralizing antibodies, thus it is important to understand the codon usage of this gene. Here, we analysed 90 coding sequences of ORF2 and show that the gene has a low codon usage bias. In addition, although mutational bias is one of the factors shaping the codon usage bias of this gene, natural selection plays a more significant role. Our results reveal part of the mechanisms driving FCV evolution and lay a foundation for further research on FCV.

Feline calicivirus (FCV) is a highly prevalent pathogen of the domestic cat, with widespread distribution across the world (Radford et al., 2007). It has been suggested that almost all members of the Felidae, such as cats, tigers and cheetahs, are susceptible to the virus. FCV has also been isolated from the faeces of dogs (Gabriel et al., 1996). FCV strains have been shown to circulate among different animal species. FCV belongs to the genus Vesivirus of the family Caliciviridae. It has a small, non-enveloped, positive-sense, single-stranded RNA genome of approximately 7700 nucleotides, of which the 5′ end is covalently linked to the VPg protein and the 3′ end is linked to poly(A). The genome contains three open reading frames (ORFs), referred to as ORF1, ORF2 and ORF3 (Prikhodko et al., 2014).
ORF2 encodes the capsid precursor protein, which is processed by the viral protease to release a small 124 amino acid protein called the leader of the capsid (LC) and the mature capsid protein (VP1). Comparative analysis of ORF2 sequences has been used to elucidate phylogenetic relationships among different FCV isolates (Prikhodko et al., 2014).

Codons that encode the same amino acid are referred to as synonymous codons. Although synonymous codons encode the same amino acid, their corresponding tRNAs may differ in relative abundance in the cell as well as in the speed at which they are recognized by the ribosome, thus affecting codon usage. The usage of synonymous codons is a non-random process, with some codons being used more often than others (Marín et al., 1989). This phenomenon, called 'codon usage bias', can be found in numerous organisms, including prokaryotes, eukaryotes and viruses (Liu et al., 2011). Codon usage is influenced by two major factors, natural selection and mutation bias (Gu et al., 2004). The relationship between the codon usage of a virus and that of its host affects the overall survival of the virus, its ability to evade the host immune system and its evolution (Moratorio et al., 2013). Thus, understanding the codon usage of viruses can provide information about viral evolution and expand our understanding of the regulation of viral gene expression based on codon adaptation. This can aid rational vaccine design to achieve efficient viral protein expression and induce long-lasting immunity.

Because different FCV strains cause disease with a wide range of clinical signs, it is important to characterize the genetic variation, evolution and codon usage pattern of FCV to understand how these viral strains cause disease. The aim of this study was to describe the genetic features of the ORF2 gene of FCV. To this end, we analysed in detail the genetic evolution, the codon usage pattern and the evolutionary characteristics of the codon usage pattern of FCV.

2.1.
Sequence data

A total of 90 coding sequences (CDS) of ORF2 of FCV strains were included in this study. They were retrieved from the National Center for Biotechnology Information (NCBI) GenBank database (https://www.ncbi.nlm.nih.gov/nucleotide/). The details of the sequences analysed, including accession number, time of collection and geographical distribution, are shown in Table S1.

Infection, Genetics and Evolution 54 (2017)

The frequency of each nucleotide (A%, U%, G% and C%) was calculated using BioEdit. The nucleotide composition at the third synonymous codon position (A3s, T3s, G3s, C3s) was calculated using the CodonW package. The G + C content at the first (GC1s), second (GC2s) and third codon positions (GC3s) was calculated using the same program, as was the G + C content at the first and second positions (GC12s).

The RSCU value of each codon (except for Met, Trp and termination codons) was calculated to reflect codon usage characteristics directly, excluding the influence of amino acid composition and sequence length, as first proposed by Sharp and Li (1986). The RSCU value of a codon is the ratio of its observed frequency to its expected frequency, assuming that all codons for a particular amino acid are used evenly (Peden, 1999). It was calculated using the following equation:

$$\mathrm{RSCU}_{ij} = \frac{g_{ij}}{\sum_{j}^{n_i} g_{ij}} \, n_i$$

where $g_{ij}$ represents the observed number of the $i$th codon for the $j$th amino acid, which has $n_i$ kinds of synonymous codons (Nasrullah et al., 2015). Normally, a high RSCU value is considered to reflect a strong codon usage bias. RSCU values of < 1.0, 1.0 and > 1.0 stand for negative codon usage bias, no bias and positive codon usage bias, respectively (Chen et al., 2014a).
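The RSCU definition above can be illustrated with a minimal Python sketch. This is not the CodonW implementation used in the study; the function name `rscu` and the toy sequence in the usage note are assumptions made for illustration, and the standard genetic code is assumed.

```python
from collections import Counter, defaultdict

# Standard genetic code, written with RNA bases in UCAG order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds):
    """Return {codon: RSCU} for one in-frame coding sequence (DNA or RNA letters)."""
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by amino acid, skipping Met, Trp and stop codons.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "MW*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the sequence: RSCU is undefined
        for c in family:
            # observed count divided by the count expected under even usage
            values[c] = counts[c] * len(family) / total
    return values
```

For a hypothetical fragment `GCTGCTGCTGCC` (four Ala codons), `rscu` gives 3.0 for GCU, 1.0 for GCC and 0.0 for GCA and GCG, so the four values sum to the family size.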
The effective number of codons (ENC) measures the magnitude of the codon usage bias of a single gene (Wright, 1990). The ENC value is not influenced by amino acid composition or gene length (Morla et al., 2016). The expected ENC value was calculated using the formula given below:

$$\mathrm{ENC} = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}}$$

where $s$ stands for the GC3s value (Chen et al., 2014a). The ENC value ranges from 20 (only one of the possible synonymous codons is used for the corresponding amino acid) to 61 (all possible synonymous codons are used equally for the corresponding amino acid) (Wright, 1990). In contrast to the RSCU value, the smaller the ENC value, the greater the extent of codon usage bias. An ENC value of 35 or below is considered to be a sign of strong codon bias (Comeron and Aguadé, 1998).

ENC plots were drawn to determine the factors (especially mutation pressure) that influence the codon usage bias (Wright, 1990), with the GC3s values on the x axis and the ENC values on the y axis. Under the null model, if codon usage is constrained only by G + C content, the ENC values would lie on or around the standard curve (Jiang et al., 2007). Otherwise, if the ENC values lie far below the standard curve, other factors such as natural selection play a major role in shaping codon usage bias.

PCA, a widely used multivariate statistical approach to determine the major trends in codon usage variation among genes, was performed using GraphPad Prism 6.0 (Gupta and Ghosh, 2001). The RSCU values of each gene, excluding Met, Trp and termination codons, are represented as a vector in a 59-dimensional space and transformed into a smaller number of uncorrelated factors (Lu et al., 2013).

To determine whether natural selection shapes codon usage bias, the Gravy and Aroma scores were calculated. Both indices were obtained from CodonW and reflect the frequencies of hydrophobic and aromatic amino acids, respectively (Kyte and Doolittle, 1982). A higher Gravy or Aroma value suggests a more hydrophobic or aromatic amino acid product.
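The null-model curve of the ENC plot described above can be sketched in a few lines of Python. The function names `expected_enc` and `enc_gap` are hypothetical, and GC3s is assumed to be given as a fraction between 0 and 1 rather than a percentage.

```python
def expected_enc(gc3s):
    """Expected ENC when codon usage is constrained only by GC3s (fraction, 0..1)."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    s = gc3s
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def enc_gap(observed_enc, gc3s):
    """Distance of an observed ENC value below the standard curve.

    A positive gap means the point lies below the curve, which is read as
    a sign that factors other than G + C composition shape codon usage.
    """
    return expected_enc(gc3s) - observed_enc
```

As a rough check, a third-position G + C fraction of 0.383 gives an expected ENC of about 57.4, so an observed ENC of 53.70 would lie below the curve by roughly 3.7.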
Neutrality analysis was used to determine the roles of mutational bias and natural selection in shaping codon usage bias. In neutrality plots, GC12s values are plotted against GC3s values, with each dot representing an independent FCV strain. A regression slope close to zero indicates that codon usage bias is constrained only by natural selection, while a slope close to one represents complete neutrality (Sueoka, 1988). Correlation analysis was performed using GraphPad Prism 6.0.

The nucleotide compositions of the 90 coding sequences of FCV ORF2 were calculated. The mean values of A%, C%, G% and U% were 26.48%, 21.23%, 23.20% and 29.10%, with standard deviations (SD) of 0.53, 0.73, 0.44 and 0.62, respectively. This indicates that U and A were more abundant than C and G, and that U was the most abundant nucleotide. The nucleotide compositions at the third codon position (A3, U3, G3, C3, GC3) revealed that the mean U3% (43.57%) was the highest among the four nucleotides, which is consistent with the overall nucleotide content of the FCV ORF2 gene (Table S2). The GC3 values ranged from 32.8% to 44.3% (mean 38.3%), indicating that A/U-terminated codons are preferred over G/C-terminated codons.

The RSCU values of all 61 codons were calculated and are displayed in Table 1. Among the 18 most frequently employed synonymous codons, 12 preferred codons ended with U (GCU for Ala, UGU for Cys, GAU for Asp, UUU for Phe, GGU for Gly, AUU for Ile, CUU for Leu, CCU for Pro, UCU for Ser, ACU for Thr, GUU for Val, UAU for Tyr), 3 ended with A (GAA for Glu, AAA for Lys, CAA for Gln), 2 ended with C (CAC for His, AAC for Asn) and 1 ended with G (AGG for Arg). Codons ending in U were thus the most frequently used. This is in accordance with the fact that U was the most abundant nucleotide, demonstrating that codon usage is influenced by compositional constraints.
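The position-specific G + C values and the neutrality regression described in this section can be sketched as follows. This is an illustrative Python sketch, not the CodonW or GraphPad Prism workflow used in the study: it counts G + C at every codon position of an in-frame sequence, whereas the CodonW "s" indices are assumed to be restricted to synonymous positions, and the function names are hypothetical.

```python
def gc_by_position(cds):
    """Percent G + C at codon positions 1, 2 and 3, plus the GC1/GC2 mean."""
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence must contain at least one codon")
    result = {}
    for pos in range(3):
        bases = seq[pos:usable:3]  # every third base, starting at this position
        result[f"GC{pos + 1}"] = 100 * sum(b in "GC" for b in bases) / len(bases)
    result["GC12"] = (result["GC1"] + result["GC2"]) / 2
    return result


def neutrality_fit(gc3, gc12):
    """Least-squares fit of GC12 on GC3: returns (slope, intercept, pearson_r)."""
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    syy = sum((y - mean_y) ** 2 for y in gc12)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    if sxx == 0 or syy == 0:
        raise ValueError("GC3 and GC12 must both vary across sequences")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r
```

Applying `gc_by_position` to each strain and passing the resulting GC3 and GC12 lists to `neutrality_fit` yields the slope that is read against the zero (selection only) and one (complete neutrality) limits. With hypothetical values GC3 = [30, 35, 40, 45] and GC12 = [44, 45, 46, 47], the fitted slope is 0.2.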
The ENC values ranged from 49.38 to 57.55 (average ± SD of 53.70 ± 1.639), indicating fluctuation among the 90 FCV strains. The high ENC values (ENC > 45) indicate a low codon usage bias. ENC plots were drawn with ENC values plotted against GC3s values according to the geographical distribution of the strains used in this study (Fig. 1). All the strains, represented in different colours for each continent, were located below the theoretical curve (Fig. 1A). Additionally, strains isolated from the same continent did not cluster together. In particular, all 90 strains collected from 9 countries were distributed widely (Fig. 1B). This indicates that mutational pressure combined with other factors contributes to the codon usage bias of the FCV ORF2 gene (Sueoka, 1988).

In addition, there was a correlation between the nucleotide composition (A%, U%, G%, C%, GC%) and the codon contents (A3s, T3s, C3s, G3s, GC3s) (p < 0.05), except between A3s and T. Furthermore, there was a significant correlation between the ENC values and the nucleotide compositions (p < 0.01), which indicates that mutational bias influences the synonymous codon usage pattern of the ORF2 gene of FCV.

PCA, a multivariate method, was employed to examine the variation in synonymous codon usage (Singh et al., 2016). We found that the first four principal axes accounted for 54.86% of the total variation, with the first, second, third and fourth principal axes accounting for 20.81%, 13.8%, 10.83% and 9.42%, respectively (Fig. 2). This suggests that the first and second axes were the major contributors to the variation in RSCU of synonymous codons. PCA was performed based on the continent and country of isolation (Fig. 3).
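The share of variation carried by a principal axis, as reported above, can be illustrated with a small pure-Python sketch. It is not the GraphPad Prism procedure used in the study: it extracts only the first axis by power iteration on the covariance matrix, and the function name `first_principal_axis` and its input layout (one RSCU vector per strain) are assumptions.

```python
def first_principal_axis(rows, iterations=200):
    """Return (axis, explained_fraction) for the first principal axis.

    rows: list of equal-length numeric vectors, one per strain.
    """
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("at least two observations are needed")
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centred data.
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    total_variance = sum(cov[j][j] for j in range(d))
    if total_variance == 0:
        raise ValueError("data show no variation")
    # Non-constant start vector: RSCU values of one amino acid sum to a
    # constant, so a constant vector can lie in the covariance null space.
    v = [float(j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            raise ValueError("start vector is orthogonal to the data variation")
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the largest eigenvalue.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
    return v, eigenvalue / total_variance
```

The returned fraction corresponds to the percentage of total variation attributed to the first axis; further axes would require deflation or a full eigendecomposition, which this sketch omits.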
Based on the distribution of different strains on the first two axes, we found that the distribution of Asian strains, especially strains collected from China, was more widespread than the distribution of strains isolated from Oceania, Europe and North America. Moreover, most of the strains isolated from North America were located near the origin, indicating that mutational pressure contributed to the codon usage of the FCV ORF2 gene.

Natural selection is normally considered to contribute to some extent to codon usage bias. We therefore evaluated the correlation between the Gravy and Aroma values and the codon contents (A3s, G3s, C3s, U3s and GC3s) (Table 2). We found a correlation between the Aroma values and U3s, G3s and GC3s (p < 0.05), confirming that natural selection influences the codon usage bias of the FCV ORF2 gene.

3.6. Natural selection plays a more important role than mutation pressure in shaping the codon usage of FCV

We found that both mutation pressure and natural selection contribute to the codon usage bias of the ORF2 gene of FCV. Thus, to understand which one plays a more important role, the GC12s values (the mean value of GC1s and GC2s) were plotted against the GC3s values (Fig. 4). We found a correlation between GC12s and GC3s (p < 0.05) with a correlation coefficient of 0.22, indicating that relative neutrality was 22% or, conversely, that natural selection was 78%. Thus, natural selection plays a major role in shaping the codon usage bias of the ORF2 gene of FCV compared to mutational pressure.

RNA zoonotic viruses, such as influenza viruses and coronaviruses, are highly susceptible to recombination and cross-species transmission (Su et al., 2015, 2016). FCV is an RNA virus and as such it has experienced a high evolution rate since its emergence. Previous studies on FCV have mostly focused on infectivity (Pesavento et al., 2004) and prevalence (Knowles et al., 1989). However, there are no studies on the codon usage bias of FCV.
ORF2 is one of the three ORFs of the FCV genome and encodes the major capsid protein VP1. Therefore, the codon usage of ORF2 was the first to be studied. Previous studies on other RNA viruses showed a stronger codon usage bias. For example, analysis of the G gene of Rabies virus (RABV) showed ENC values ranging from 44.40 to 51.40 (Zhao et al., 2016); Foot and Mouth Disease Virus (FMDV) displayed an ENC value of 51.42 (Zhou et al., 2013); Porcine Epidemic Diarrhea Virus (PEDV) of 47.91 (Chen et al., 2014b); and Severe Acute Respiratory Syndrome (SARS) coronavirus of 48.99 (Zhao et al., 2008). However, the mean ENC value of ORF2 of FCV reported here was 53.70 (SD ± 1.639); thus, in comparison with the above viruses, the degree of codon usage bias of FCV is lower.

Codon usage bias is mainly influenced by natural selection (Romero et al., 2003) and mutation pressure (Jenkins et al., 2001). Here, we used ENC plots (Fig. 1) and PCA (Fig. 3) according to the geographical distribution to investigate the major factors shaping the codon usage bias of the FCV ORF2 gene. We found only one strain isolated from the USA, one from Japan and one from Australia to lie near the standard curve, suggesting that mutation pressure contributed to the codon usage bias of these three strains. PCA revealed that mutation pressure is the dominant force shaping the codon usage of sequences isolated from North America (Fig. 3). Furthermore, analysis of the relationships between nucleotide composition and the codon contents at the third base position suggested that mutation pressure is one of the factors shaping the codon usage of FCV ORF2. Correlation analysis of the Gravy and Aroma values and codon content (A3s, G3s, C3s, U3s and GC3s) showed a correlation between Aroma and U3s, G3s and GC3s, confirming that natural selection contributes to the codon usage of the ORF2 gene of FCV.
Since both mutation pressure and natural selection are important in driving the codon usage of ORF2, we performed neutrality analysis to understand which of the two forces has a bigger impact. We found that natural selection is the major force driving the codon usage of ORF2. This is the first study analysing the codon usage of the FCV ORF2 gene and describing the forces that drive FCV evolution. In the future, more epidemiological surveys and sequence analyses are required to examine the factors that resulted in FCV evolution.

Characterization of the porcine epidemic diarrhea virus codon usage bias
Characterization of the porcine epidemic diarrhea virus codon usage bias
An evaluation of measures of synonymous codon usage bias
Isolation of a calicivirus antigenically related to feline caliciviruses from feces of a dog with diarrhea
Analysis of synonymous codon usage in SARS coronavirus and other viruses in the Nidovirales
Gene expressivity is the main factor in dictating the codon usage variation among the genes in Pseudomonas aeruginosa
Evolution of base composition and codon usage bias in the genus Flavivirus
Analysis of synonymous codon usage in Aeropyrum pernix K1 and other Crenarchaeota microorganisms
Prevalence of feline calicivirus, feline leukaemia virus and antibodies to FIV in cats with chronic stomatitis
A simple method for displaying the hydropathic character of a protein
The characteristics of the synonymous codon usage in enterovirus 71 virus and the effects of host on the virus in codon usage pattern
Genetic analysis of the PB1-F2 gene of equine influenza virus
Variation in G + C-content and codon choice: differences among synonymous codon groups in vertebrate genes
A detailed comparative analysis on the overall codon usage patterns in West Nile virus
Synonymous codon usage pattern in glycoprotein gene of rabies virus
Genomic analysis of codon usage shows influence of mutation pressure, natural selection, and host features on Marburg virus evolution
Analysis of Codon Usage
Pathologic, immunohistochemical, and electron microscopic findings in naturally occurring virulent systemic feline calicivirus infection in cats
Genetic characterization of feline calicivirus strains associated with varying disease manifestations during an outbreak season in Missouri
Feline calicivirus
The influence of translational selection on codon usage in fishes from the family Cyprinidae
An evolutionary perspective on synonymous codon usage in unicellular organisms
Characterization of codon usage pattern and influencing factors in Japanese encephalitis virus
Directional mutation pressure and neutral molecular evolution
Epidemiology, Evolution, and Recent Outbreaks of Avian Influenza Virus in China
Epidemiology, Genetic Recombination, and Pathogenesis of Coronaviruses
The 'effective number of codons' used in a gene
Analysis of synonymous codon usage in 11 Human Bocavirus isolates
Analysis of codon usage bias of envelope glycoprotein genes in nuclear polyhedrosis virus (NPV) and its relation to evolution
The analysis of codon bias of foot-and-mouth disease virus and the adaptation of this virus to the hosts

This paper was supported in part by the Priority Academic Program Development of Jiangsu Higher Education Institutions.

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.meegid.2017.06.013.