key: cord-0934983-w7pz83s1 authors: Xu, Xin; Li, Pengfei; Zhang, Yating; Wang, Xianhe; Xu, Jiaxin; Wu, Xuening; Shen, Yujiang; Guo, Dexuan; Li, Yuchang; Yao, Lili; Li, Liyang; Song, Baifen; Ma, Jinzhu; Liu, Xinyang; Xu, Shuyan; Zhang, Hua; Wu, Zhijun; Cao, Hongwei title: Comprehensive analysis of synonymous codon usage patterns in orf3 gene of porcine epidemic diarrhea virus in China date: 2019-12-31 journal: Research in Veterinary Science DOI: 10.1016/j.rvsc.2019.09.012 sha: 4b63f28146f06de4615b9e698cc5f7c6c7c86816 doc_id: 934983 cord_uid: w7pz83s1 Abstract The ORF3 protein of porcine epidemic diarrhea virus (PEDV) functions as an ion channel that influences virus virulence and production. Given the importance of the PEDV orf3 gene, we performed a comprehensive analysis of its synonymous codon usage patterns. Base composition analysis showed that PEDV orf3 genes are A/T rich and G/C poor, with T the most abundant nucleotide. The relative synonymous codon usage value of each codon revealed that codon usage bias exists. The mean ENC value of the genes was 48.75, indicating a low codon usage bias as well as relatively unstable variation among PEDV orf3 genes. Correlation analysis between base composition and codon usage bias indicated that mutational bias has an impact on PEDV codon usage bias. Neutrality analysis suggested that natural selection pressure has a greater influence than mutational bias in shaping codon usage bias. Moreover, other factors, including hydrophobicity and aromaticity, were also found to influence codon usage variation among PEDV orf3 genes. This study not only represents the most systematic analysis of codon usage patterns in PEDV orf3 genes, but also provides a basic account of the mechanisms shaping the codon usage bias.
As a highly contagious and acute enteric viral disease, porcine epidemic diarrhea (PED) is characterized by vomiting, watery diarrhea and severe dehydration, resulting in > 80% mortality in neonatal piglets (Song et al., 2015). The first PED outbreak was recognized in England in the early 1970s, and the disease has since been reported continually in other European, American and Asian countries, including China (Song and Park, 2012; Sun et al., 2012).
The causative agent of PED is porcine epidemic diarrhea virus (PEDV), which belongs to the genus Alphacoronavirus (subfamily Coronavirinae, family Coronaviridae), a genus that also includes other swine, bat and human coronaviruses (Chen et al., 2008). PEDV is a large, enveloped, single-stranded positive-sense RNA virus whose genome of approximately 28 knt comprises a 5′ untranslated region (5′-UTR), at least seven open reading frames (ORF1a, ORF1b, and ORF2-6), and a 3′ polyadenylated tail (Lee et al., 2015). The replicase proteins are encoded by ORF1a and ORF1b, and the other viral proteins are encoded by the next five ORFs: the spike protein (S), the ORF3 protein (ORF3), the small membrane protein (E), the membrane protein (M), and the nucleocapsid protein (N) (Chen et al., 2014). The orf3 gene is an important viral gene: its product is the only accessory protein of PEDV and functions as an ion channel that influences virus virulence and production (Song et al., 2003; Wang et al., 2012). Because it is highly conserved in the majority of PEDV strains, the orf3 gene is widely used for the diagnosis of PEDV infection (Wang et al., 2016). Differences in the orf3 gene between attenuated and wild-type strains can also serve as a marker of viral adaptation to the host and as a potential tool for studying molecular evolution. Previous studies of PEDV orf3 genes have been mainly limited to phylogenetic analysis (Huang et al., 2013), and few synonymous codon usage analyses have been performed (Chen et al., 2014).
Except for tryptophan and methionine, each amino acid is encoded by two to six codons, because there are fewer amino acids than sense codons in the genetic code. This phenomenon is defined as synonymous codon usage (Chen et al., 2017).
It is well known that the synonymous codons for each amino acid are not used randomly in the genomes of organisms; some codons are used more frequently than others, which is referred to as synonymous codon usage bias (Marín et al., 1989). Many studies have characterized codon usage bias in viruses, bacteria, fungi, and other organisms (D'Andrea et al., 2011). For example, rotavirus and rubella virus show strong codon usage bias in their genomes, and the degree of deviation depends on the identity of the virus (Belalov and Lukashev, 2013). In contrast, other viruses display weak codon usage bias, such as classical swine fever virus (CSFV) (Tao et al., 2009), enterovirus 71 (EV71), and Newcastle disease virus. To date, codon usage in RNA viruses has also been shown to be related to mutation bias, translational selection, dinucleotide bias, and other factors (Zhou et al., 2005; Sharp et al., 2010; Hussain et al., 2019). Elucidating the extent and causes of codon usage bias helps in understanding viral molecular evolution (Shackelton et al., 2006). Considering the highly contagious nature of PEDV and the significance of the orf3 gene, it is necessary to analyze the codon usage patterns of the PEDV orf3 gene during its evolution. Such an analysis can provide important information about virus evolution and the regulation of gene expression and protein synthesis, and can further aid vaccine design, which may require high levels of viral antigen expression to produce immunity (Butt et al., 2014).
In the present study, a total of 518 coding sequences (CDS) of the orf3 gene (sequences with > 99% identity excluded) of PEDV strains isolated from China were retrieved from the GenBank database (https://www.ncbi.nlm.nih.gov/nucleotide/). The Clustal X software (Thompson et al., 1997) was used to align the orf3 gene sequences.
The codonW program (version 1.4.2) (http://codonw.sourceforge.net//) was used to calculate the effective number of codons (ENC), the total G + C content, and the G + C content at the first, second and third codon positions. Detailed information on the 518 orf3 gene sequences is provided in the supplemental data.
Base composition analysis showed that T (38.22% ± 0.25%) was the most abundant base, followed by A (23.77% ± 0.17%), G (19.86% ± 0.23%) and C (17.09% ± 0.33%). The average GC content of all PEDV orf3 genes was 36.95% (range 36.16% to 37.95%, SD 0.29%), and the average GC3s content was 33.21% (range 31.36% to 35.91%, SD 0.65%), indicating that all of the PEDV orf3 genes were A/T rich and G/C poor.
Sharp and Li (1986) first proposed that the relative synonymous codon usage (RSCU) value of each codon can be calculated to directly reflect the characteristics of codon usage. An RSCU value of 1.0 indicates no bias for that codon. A value above 1.0 indicates positive codon usage bias (the codon is used more often than expected), and a value below 1.0 indicates negative bias (Ma et al., 2002). To gain insight into the characteristics of synonymous codon usage in PEDV orf3 genes, RSCU values were calculated using the program GCUA (version 1.2) (ftp://ftp.nhm.ac.uk/pub/gcua), and the RSCU values of all 61 codons are displayed in Table 1. The preferentially used codons comprised 11 U-ended, 4 C-ended, 3 A-ended, and 3 G-ended codons. Notably, the predominance of U-ended codons among the preferred synonymous codons is consistent with T being the most abundant base.
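The RSCU calculation described above can be illustrated with a minimal Python sketch. It is not the GCUA implementation used in the study. It assumes the standard genetic code and the usual definition of RSCU: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The function name `rscu` and the toy sequence in the usage note are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds):
    """Return {codon: RSCU} for one coding sequence (illustrative helper, not GCUA)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group sense codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if len(family) < 2 or total == 0:
            continue  # Met/Trp have a single codon; absent amino acids are undefined
        for codon in family:
            # observed count / (family total / number of synonymous codons)
            values[codon] = counts[codon] * len(family) / total
    return values
```

For a toy sequence such as `rscu("TTTTTTTTC")`, the two phenylalanine codons receive values of 4/3 (TTT) and 2/3 (TTC), so the values within a synonymous family always sum to the family size. Averaging or pooling counts across the 518 sequences before applying the same ratio would be a separate design choice that the text does not specify.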
These results support the observation that T was the most abundant base and was the most preferred of the four nucleotides at the third codon position, suggesting that codon usage bias exists in the synonymous codon usage pattern of the PEDV orf3 gene and is influenced by compositional constraints.
The ENC value of a gene is commonly used to determine the extent of codon usage bias. ENC values range from 20 to 61: a value of 20 indicates an extremely biased gene in which only one codon is used for each amino acid, whereas a value of 61 indicates an unbiased gene (Comeron and Aguade, 1998). To investigate the variation of codon usage bias in PEDV orf3 genes, the ENC values of the 518 genes were calculated. The ENC values varied from 45.44 to 56.37, with an average ± SD of 48.75 ± 1.29, which represented a relatively low codon usage bias and unstable variation. In addition, we performed the same analysis on a total of 294 coding sequences (CDS) of the M gene of PEDV strains collected from China. The ENC values of the M gene varied from 47.45 to 60.47, with an average ± SD of 56.29 ± 1.74, which represented comparatively stable variation and a lower codon usage bias than in orf3.
Mutational pressure and translational selection are thought to be the two major factors influencing codon usage variation in RNA virus genomes (Belalov and Lukashev, 2013). The plot of ENC versus GC3s can be used to analyze the synonymous codon usage bias of viral genes (Wright, 1990). Genes, represented by points in the ENC-GC3s plot, will lie on or just below the predicted curve when codon usage is constrained only by a G + C mutational bias. As shown in Fig. 1A, most points lay below the predicted curve, suggesting that the G + C mutational bias might play a major role in PEDV orf3 codon usage.
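For reference, the predicted curve in an ENC-GC3s plot is usually drawn from Wright's (1990) approximation, ENC = 2 + s + 29 / (s² + (1 − s)²), where s is the GC3s fraction. The sketch below assumes that this standard formula was the one used for Fig. 1A, which the text does not state explicitly. The function names are hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENC when codon usage is constrained only by GC3s (Wright, 1990).

    gc3s is a fraction between 0 and 1, not a percentage.
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def position_vs_curve(enc, gc3s):
    """Classify an observed (GC3s, ENC) point relative to the expected curve."""
    return "below" if enc < expected_enc(gc3s) else "on or above"
```

Under this formula, the reported mean GC3s of 33.21% gives an expected ENC of roughly 54.5, so the reported mean ENC of 48.75 falls below the curve, in line with the description of Fig. 1A.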
Some points, however, were located above the expected curve, suggesting that codon bias is also related to translational selection combined with other factors.
Table 1 note: The preferentially used codons and RSCU values for orf3 gene of the PEDV are in bold and italic. AA Amino acids, N number of codons, RSCU cumulative relative synonymous codon usage.
X. Xu, et al. Research in Veterinary Science 127 (2019) 42-46
Subsequently, we performed a correspondence analysis (COA) to investigate the trends in usage variation of the 59 synonymous codons among PEDV orf3 genes, following a previously described method (Chen et al., 2014). Based on the relative and cumulative inertia of the first 20 factors, we used the Origin software (version 8.0) to display the distribution of each vector. The first principal axis accounted for 21.7% of the total variation. The next three axes accounted for 15.38%, 14.58%, and 14.50% of the variation, respectively, so that the first four axes together accounted for 66.16% of the total variation (Fig. 1B). COA was also carried out on the RSCU values of each gene, and the distribution of the genes in the plane defined by the first two principal axes is displayed in Fig. 1C. The vast majority of virulent genes were distributed around the origin of the coordinate axes and were not far from one another. Some genes, however, were dispersed and located far from the origin. These strains, mainly collected from southern China, were more widely distributed than those from other regions of China. In addition, most of the studied strains were isolated from southern China, and their ENC values (51.73 ± 2.94) were higher than the average ENC and than those of strains from other regions of China.
These data reflected the relatively low codon usage bias among different strains, indicating that mutational bias might contribute to the codon usage bias of the PEDV genome.
The above results revealed that both mutation pressure and natural selection contribute to the codon usage bias of the orf3 gene of PEDV. To distinguish which of the two plays the more important role in shaping codon usage bias, the GC3s values were plotted against the GC12s values (Chen et al., 2014). The resulting neutrality plot shows the balance between directional mutation pressure and natural selection in shaping codon usage in the orf3 gene of PEDV (Fig. 1D). GC3s was significantly correlated with GC12s (r = −0.442, P < .01), with a regression coefficient (slope) of −0.2368, indicating that relative neutrality was 23.68% and, conversely, that the contribution of natural selection was 76.32%. These results demonstrated that, compared with mutational pressure, natural selection plays the major role in influencing the codon usage bias of the orf3 gene of PEDV.
To further analyze the possible effects of mutational pressure on the codon usage bias in PEDV orf3 genes, we performed a correlation analysis among the nucleotide compositions (A%, T%, G%, C%, and GC%), the codon compositions (A3s, T3s, G3s, C3s, and GC3s) and the ENC values. Furthermore, correlation and regression analyses were performed using the values of the first two axes of the COA (Chen et al., 2014) and Spearman's rank correlation method (Tsai et al., 2007). All statistical analyses were conducted with SPSS (version 17.0). The nucleotide compositions were correlated with most of the codon compositions (Fig. 1E). There was also a significant correlation between the ENC values and most of the nucleotide compositions, with all P values < 0.01, which indicated that mutational bias shapes the synonymous codon usage pattern of the PEDV orf3 gene.
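The arithmetic behind the neutrality analysis above can be sketched as follows. This is an illustration only: the study used SPSS, and the helper names here are hypothetical. The sketch assumes the conventional neutrality plot, with GC3 on the x-axis and GC12 on the y-axis, and it computes plain third-position GC content over all codons rather than GC3s restricted to synonymous codons.

```python
def gc_by_position(cds):
    """Return (GC12, GC3) as fractions for one coding sequence.

    GC12 is the mean of the GC fractions at codon positions 1 and 2.
    GC3 here covers all codons, a simplification of GC3s.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence must contain at least one complete codon")
    per_position = [cds[i:usable:3] for i in range(3)]
    gc = [sum(base in "GC" for base in column) / len(column) for column in per_position]
    return (gc[0] + gc[1]) / 2, gc[2]


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def neutrality_split(slope):
    """Convert a regression slope into (relative neutrality %, natural selection %)."""
    neutrality = abs(slope) * 100.0
    return neutrality, 100.0 - neutrality
```

Applied to the slope reported in the text, `neutrality_split(-0.2368)` reproduces the stated split of about 23.68% relative neutrality and 76.32% natural selection. In practice, `gc_by_position` would be run on each of the 518 sequences and the resulting pairs passed to `ols_slope`.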
Finally, we evaluated the correlation between the Gravy and Aroma values and the codon contents. The Gravy value was correlated with A3s, G3s, C3s, U3s, GC3s, GC12s and ENC. The Aroma value was correlated with A3s, G3s, C3s, GC3s, GC12s and ENC. Both the Gravy value and the Aroma value were correlated with Axis 1 and Axis 2, indicating that natural selection influences the codon usage bias of PEDV orf3 genes.
In conclusion, the codon usage bias of the PEDV orf3 gene is comparatively low. Two main factors, mutational bias and natural selection pressure, contribute to the codon usage pattern, with the latter playing the more critical role. Moreover, other factors, such as hydrophobicity and aromaticity, also influence codon usage bias. This study not only represents the most comprehensive analysis of PEDV orf3 codon usage patterns, but also provides a basic understanding of the mechanisms underlying codon usage bias. However, this study applies only to PEDV isolates from China, and our future work will focus on comparison with PEDV isolates from other parts of the world to examine more extensively the factors that drive the outbreak and evolution of this virus.
There is no conflict of interest among the contributors of this paper.
(B) The first 20 axes are used to display the tendency of codon usage bias of PEDV orf3 genes. The plot is drawn according to the relative and cumulative inertia of the first 20 factors, respectively. The relative inertia is represented by the bar chart and the cumulative inertia is indicated by the curve chart based on principal component analysis. (C) Positions of the PEDV orf3 genes in the plot of the first two major axes by COA of RSCU values. The first and second axes account for 21.7% and 15.38% of the total variation, respectively. (D) Neutrality analysis in relation to GC3s and GC12s displays the relative roles of mutational pressure and natural selection.
(E) Summary of correlation analysis among nucleotide composition, Axis 1, Axis 2, Gravy, Aroma, GC3s, GC12s and ENC. * P value ≤ .05; ** P value ≤ .01.
Causes and implications of codon usage bias in RNA viruses
Genome-wide analysis of codon usage and influencing factors in Chikungunya viruses
Analysis of synonymous codon usage in Newcastle disease virus hemagglutinin-neuraminidase (HN) gene and fusion protein (F) gene. VirusDiseases 25
Molecular characterization and phylogenetic analysis of membrane protein genes of porcine epidemic diarrhea virus isolates in China
Characterization of the porcine epidemic diarrhea virus codon usage bias
Comprehensive analysis of the codon usage patterns in the envelope glycoprotein E2 gene of the classical swine fever virus
An evaluation of measures of synonymous codon usage bias
A detailed comparative analysis on the overall codon usage patterns in hepatitis A virus
Origin, evolution, and genotyping of emergent porcine epidemic diarrhea virus strains in the United States
A detailed analysis of synonymous codon usage in human bocavirus
Isolation and characterization of a Korean porcine epidemic diarrhea virus strain KNU-141112
Cluster analysis of the codon use frequency of MHC genes from different species
Variation in G + C-content and codon choice: differences among synonymous codon groups in vertebrate genes
Evolutionary basis of codon usage and nucleotide composition bias in vertebrate DNA viruses
Codon usage in regulatory genes in Escherichia coli does not reflect selection for rare codons
Forces that influence the evolution of codon bias
Porcine epidemic diarrhea virus: a comprehensive review of molecular epidemiology, diagnosis, and vaccines
Differentiation of a Vero cell adapted porcine epidemic diarrhea virus from Korean field strains by restriction fragment length polymorphism analysis of ORF 3
Porcine epidemic diarrhea: a review of current epidemiology and available vaccines
Outbreak of porcine epidemic diarrhea in suckling piglets
Analysis of synonymous codon usage in classical swine fever virus
The CLUSTAL_X Windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools
Analysis of codon usage bias and base compositional constraints in Iridovirus genomes
PEDV ORF3 encodes an ion channel protein and regulates virus production
Molecular characterization of the ORF3 and S1 genes of porcine epidemic diarrhea virus non S-INDEL strains in seven regions of China
The effective number of codons used in a gene
Analysis of synonymous codon usage in enterovirus 71
Analysis of synonymous codon usage in H5N1 virus and other influenza A viruses
Supplementary data to this article can be found online at https://doi.org/10.1016/j.rvsc.2019.09.012.