key: cord-0995452-8790q2in
authors: Liu, Yong-sheng; Zhou, Jian-hua; Chen, Hao-tai; Ma, Li-na; Ding, Yao-zhong; Wang, Meng; Zhang, Jie
title: Analysis of synonymous codon usage in porcine reproductive and respiratory syndrome virus
date: 2010-05-01
journal: Infect Genet Evol
DOI: 10.1016/j.meegid.2010.04.010
sha: 26029ce6e9b908af320d79f5386380f45578fb82
doc_id: 995452
cord_uid: 8790q2in

In this study, we calculated the relative synonymous codon usage (RSCU) values and codon usage bias (CUB) values to implement a comparative analysis of codon usage pattern of open reading frames (ORFs) which belong to the two main genotypes of porcine reproductive and respiratory syndrome virus (PRRSV). By analysis of synonymous codon usage values in each ORF of PRRSV, the optimal codons for most amino acids were all C or G-ended codons except GAU for Asp, CAU for His, UUU for Phe and CCU for Pro. The synonymous codon usage patterns in different ORFs of PRRSV were different and genetically conserved. Among them, ORF1a, ORF4, ORF5 and ORF7 could cluster these strains into the two main serotypes (EU and US). Due to mutational pressure, compositional constraint played an important role in shaping the synonymous codon usage pattern in different ORFs, and the synonymous codon usage diversity in ORFs was correlated with gene function. The degree of CUB for some particular amino acids under strong selection pressure probably served as a potential genetic marker for each ORF in PRRSV. However, gene length and translational selection in nature had no effect on the synonymous codon usage pattern in PRRSV. These conclusions could not only offer an insight into the synonymous codon usage pattern and differentiation of gene function, but also assist in understanding the discrepancy of evolution among ORFs in PRRSV.

It is well known that the genetic code chooses 64 codons to represent 20 standard amino acids and stop signals.
These alternative codons for the same amino acid are termed synonymous codons. Synonymous mutations tend to occur at the third base position, and synonymous codons can be interchanged without altering the primary sequence of the protein product. Several reports indicate that synonymous codons are not used equally or randomly, either within or between genomes (Dittmar et al., 2006; Grantham et al., 1980; Lloyd and Sharp, 1992; Martin et al., 1989; Xie et al., 1998). In general, natural translational selection and compositional constraints under mutational pressure are thought to be the two major factors accounting for codon usage variation among genomes in various organisms (Gu et al., 2004; Karlin and Mrázek, 1996; Lesnik et al., 2000; Zhong et al., 2007; Zhou et al., 2005, 2006). In some RNA viruses, mutation pressure, rather than natural translational selection, plays an important role in shaping the synonymous codon usage pattern (Gu et al., 2004; Levin and Whittome, 2000; Jenkins and Holmes, 2003).

Porcine reproductive and respiratory syndrome virus (PRRSV) is an enveloped, single-stranded positive-sense RNA virus classified in the family Arteriviridae of the order Nidovirales (Cavanagh, 1997). Based on serological characteristics, there are two main serotypes of PRRSV, namely the North American isolate (US) and the European isolate (EU) (Bautista et al., 1993; Collins et al., 1992; Meng et al., 1995; Wensvoort et al., 1991). In addition to the differences between the two serotypes, there is obvious genetic variation within PRRSVs, as confirmed by comparison of the nucleotide and amino acid sequences of the viruses. The PRRSV genome contains ORF1a, encoding papain-like cysteine protease; ORF1b, encoding RNA-dependent RNA polymerase; ORF2-6, encoding envelope proteins; and ORF7, encoding the nucleocapsid protein (Conzelmann et al., 1993; Meulenberg et al., 1993).
PRRSV infects swine populations and leads to a series of clinical signs, including high mortality, reproductive failure, post-weaning pneumonia and growth reduction (Keffaber, 1989; Loula, 1991). PRRSV is reported to give rise to prolonged viremia, to replicate in macrophages, and to lead to persistent infections (Plagemann and Moennig, 1992). However, little information is available about the codon usage pattern of the PRRSV genome, including the relative synonymous codon usage (RSCU) and codon usage bias (CUB), over the course of its evolution. In this study, the key genetic determinants of the codon usage indices in PRRSV were examined.

The 13 complete RNA sequences of PRRSV were downloaded from the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/Genbank/), and detailed information about the viruses is listed in Table 1. The nucleotide content of the ORFs of each PRRSV strain was analyzed with DNAStar 7.0 for Windows. To investigate the characteristics of synonymous codon usage without the confounding influence of amino acid composition among different sequences, the RSCU values of the codons in each ORF were calculated. The RSCU value of the ith codon for the jth amino acid was calculated according to the published equation (Sharp and Li, 1986):

$$\mathrm{RSCU}_{ij} = \frac{g_{ij}}{\sum_{k=1}^{n_i} g_{kj}}\, n_i$$

where g_ij is the observed number of the ith codon for the jth amino acid, which has n_i kinds of synonymous codons, and the sum runs over those synonymous codons. Codons with RSCU values above 1.0 have positive CUB, whereas codons with values below 1.0 have negative CUB. An RSCU value of 1.0 means that the codon is used equally and randomly. To calculate CUB, statistically equal and random usage of all available synonymous codons was taken as the "neutral point" (RSCU_0 = 1.00) for the development of serotype-specific codon usage (Zhou et al., 2010).
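As a rough illustration of the RSCU calculation described above, the following Python sketch counts codons in a single in-frame coding sequence and divides each count by the mean count of its synonymous family. It is not the software used in this study; the function name `rscu`, the way the codon table is built and the toy sequence are assumptions made only for illustration.

```python
from collections import Counter

# Standard genetic code written for RNA codons (base order U, C, A, G).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def rscu(sequence):
    """Return {codon: RSCU} for the 59 sense codons that have synonyms."""
    seq = sequence.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

    # Group codons by amino acid, skipping stops (*), Met (AUG) and Trp (UGG).
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Observed count divided by the count expected under equal usage.
            # Amino acids absent from the sequence get NaN (an assumption).
            values[c] = counts[c] * len(codons) / total if total else float("nan")
    return values


# Toy example (hypothetical sequence): Asp encoded three times by GAU, once by GAC.
example = rscu("GAUGAUGAUGAC")
print(example["GAU"], example["GAC"])  # prints 1.5 0.5
```

Under equal usage of all synonymous codons every value equals 1.0, which corresponds to the neutral point RSCU_0 used for CUB.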
CUB:

$$\mathrm{CUB} = \frac{\sum_{i=1}^{n} \left(\mathrm{RSCU}_{ij} - \mathrm{RSCU}_0\right)}{n}$$

More simply, CUB is the average difference between RSCU_ij and RSCU_0 at each position of the target region, where n represents all codons appearing at this position. When all RSCU values at a particular position in the target region equal RSCU_0, CUB is equal to 0, which means that there are few preferential or non-preferential codons at this position. In contrast, when the CUB value deviates strongly from 0, particular codons are preferentially chosen at that position.

Principal component analysis, a commonly used multivariate statistical method (Jolliffe, 2002; Mardia et al., 1979), was carried out to analyze the major trends in codon usage pattern among different strains. Each strain was represented as a 59-dimensional vector in which each dimension corresponded to the RSCU value of one sense codon; only codons with synonymous alternatives were included, so AUG, UGG and the three stop codons were excluded. A two-dimensional coordinate system was set up from the first principal component (f′1) and the second principal component (f′2) to show the genetic relationship among all strains. Correlation analysis was used to identify the relationship between nucleotide composition and synonymous codon usage pattern in PRRSV (Ewens and Grant, 2001), based on Spearman's rank correlation. All statistical analyses were carried out with SPSS 11.5 for Windows.

The nucleotide contents in each ORF of the 13 PRRSV isolates were analyzed. Comparison of the values of A3%, U3%, C3% and G3% indicated that the content of A3 was always the lowest, whereas the others fluctuated similarly. The (C3 + G3)% in the ORFs of PRRSV ranged from 38.13% to 59.68% (Table 2). The overall RSCU values of the 59 sense codons in PRRSV are listed in Table 3.
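One way to read the CUB definition given in the methods is as a per-position average across aligned coding sequences. The sketch below follows that reading and should be taken as an interpretation, not as the procedure actually used in this study. The function name `cub_per_position`, its inputs, and the choice to count the codon of every sequence at a position (rather than each distinct codon once) are assumptions.

```python
RSCU_0 = 1.0  # neutral point: equal and random usage of synonymous codons


def cub_per_position(aligned_orfs, rscu_table):
    """Average (RSCU - RSCU_0) over the codons observed at each codon position.

    aligned_orfs: in-frame coding sequences of the same region, one per strain
                  (hypothetical input; gaps are not handled).
    rscu_table:   {codon: RSCU}; codons missing from the table, such as AUG,
                  UGG and stop codons, are skipped.
    """
    length = min(len(s) for s in aligned_orfs)
    result = []
    for start in range(0, length - 2, 3):
        codons = [s[start:start + 3].upper().replace("T", "U")
                  for s in aligned_orfs]
        deviations = [rscu_table[c] - RSCU_0 for c in codons if c in rscu_table]
        # A position with no scorable codon is reported as 0.0 (an assumption).
        result.append(sum(deviations) / len(deviations) if deviations else 0.0)
    return result
```

With this reading, a position at which every strain uses codons with RSCU close to 1.0 gives a CUB near 0, whereas a position dominated by strongly preferred codons gives a clearly positive value and one dominated by rarely used codons gives a clearly negative value.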
Most optimal codons among the strains were C-ended or G-ended codons; interestingly, however, the four U-ended codons GAU, CAU, UUU and CCU were chosen preferentially for their corresponding amino acids Asp, His, Phe and Pro. Most A-ended codons were used weakly, except CCA and UCA, which showed slight preferential usage for Pro and Ser (Table 3). These results suggested that compositional constraint played an integral role in the codon usage pattern of PRRSV.

Table 2 Identified ORFs (length > 250 bps) in the PRRSV (13 isolates) genome.

Principal component analysis was carried out for the identified ORFs of all samples. The method detected one major trend in the first axis (f′1), which accounted for 14.59% of the total synonymous codon usage variation, and another major trend in the second axis (f′2) (Fig. S1). The plot appeared somewhat complex, with some overlap among the plots of ORF1a to ORF7. The plots of ORF1a and ORF1b, which encode the viral nonstructural proteins, clustered tightly, whereas the plots of the remaining ORFs were scattered to different extents (Fig. S1). The plots of ORF4-7 were clearly far from each other, implying that the functions of the different viral products played a role in codon usage pattern. ORF1a, ORF4, ORF5 and ORF7 could clearly distinguish the US and EU serotypes, and these plots reflected the codon usage pattern corresponding to each ORF (Fig. S2). This probably suggests that, during the evolution of PRRSV, the functions of viral proteins shaped the characteristics of the US and EU serotypes. Compared with ORF4, 5 and 7, ORF1a reflected a more conserved pattern of synonymous codon usage between the two serotypes.
This phenomenon indicated that the synonymous codon usage pattern of ORF1a was relatively conserved owing to the function of the nonstructural protein of PRRSV, whereas the codon usage patterns of ORF4, 5 and 7, which encode structural proteins, were relatively variable. However, the scattered plots of ORF3 showed that the two main serotypes shared a common synonymous codon usage pattern; that is, the plots failed to distinguish the two serotypes clearly. This suggested that the viral product encoded by ORF3 had only a slight ability to discriminate between the US and EU serotypes.

The nucleotide contents, including those at the synonymous third codon position (A3%, U3%, C3%, G3% and (C3 + G3)%), were measured and listed in Table 2. In an overlay scatter plot, the content of each nucleotide at the synonymous third position of sense codons did not follow the total content of the corresponding nucleotide, especially for A3%, C3% and G3% (Fig. S3). This implies that the pattern of synonymous codon usage may not be directly and simply correlated with nucleotide content, but rather with selection pressure (e.g. mutation pressure and gene function). However, the relationship between (C3 + G3)% and (C + G)% did not reveal any special feature; namely, it did not reflect the real variation of C3% or G3%. Thus, the variation of (C3 + G3)% may have only a slight correlation with the codon usage pattern of PRRSV. To analyze whether the codon usage pattern was regulated by natural selection or mutation pressure, A%, U%, C%, G% and (C + G)% were compared with A3%, U3%, C3%, G3% and (C3 + G3)%, respectively. An interesting and complex correlation was observed. In detail, a positive correlation clearly existed between the total content of each nucleotide and its content at the third position (e.g., A% and A3%) (p < 0.01); however, three kinds of relationship, namely positive correlation, negative correlation and no correlation, were found among the different nucleotides.
Further, A3% has no correlation with C% or G%, suggesting that nucleotide constraint can influence the usage of A at the synonymous position; the fluctuation of U3% can be affected by A%, C% and G%, and C3% is not related to G%, which does not indicate any special feature; however, G3% has a strong positive correlation with A% and no correlation with C%, indicating that (C + G)% and (C3 + G3)% may not reflect the true features of synonymous codon usage either (Table 4). In addition, the correlation between the first two principal axes (f′1 and f′2) and the nucleotide contents is summarized in Table 5. Although the codon usage patterns in different strains appeared to be related to (C3 + G3)% only to a slight extent, the correlation analysis found significant correlations between nucleotide compositions and synonymous codon usage to a certain extent. In addition, there is no obvious relationship between the codon usage indices (f′1 values) and the length of different ORFs; for example, there is no significant difference among the codon usage indices for ORF1a, ORF1b and ORF2 (p > 0.05), implying that gene length did not account for the codon usage variation of ORFs in PRRSV. Taken together, these analyses indicate that nucleotide composition plays a role in the pattern of codon usage. Furthermore, mutational pressure is the main factor responsible for the variation of synonymous codon usage among samples.

There was a seemingly random variation in CUB between amino acids and ORFs. Each ORF contained several synonymous codons with strongly divergent usage.
In detail, in ORF2, GGU for Gly, UUG for Leu and CCA for Pro are chosen preferentially; in ORF3, GAC for Asp, GGG for Gly and GUU for Val are chosen preferentially; in ORF4, AGG for Arg, CAU for His and CCC for Pro are used preferentially; in ORF5, CAA for Gln, AAA for Lys, CCG for Pro and AGC for Ser are used preferentially; in ORF6, GAA for Glu, CAC for Thr and ACA for Thr are used preferentially; and in ORF7, CGC for Arg, UGC for Cys, CUG for Leu, AAG for Lys, AGU for Ser, ACU for Thr and GUC for Val are used preferentially. In ORF1a and ORF1b, there is no preferential codon (Fig. S4). These results may suggest that, over the course of PRRSV evolution, the discrepancy of synonymous codon usage was formed by the accumulation of mutations. To analyze whether the evolution of CUB was controlled by mutation pressure or by natural translational selection, the CUB values were calculated from the data listed in Table 3. This table displayed a numerical representation of the translational machinery. The distribution of CUB values is illustrated in Fig. S5. The transition from maximum-negative to maximum-positive values was smooth and there was no obvious or unambiguous border between the so-called dominant and prohibited codons; that is, all possible codons were used. This result indicated that natural translational selection has no effect on the pattern of synonymous codon usage or on the evolutionary pattern of PRRSV.

Previous reports indicate that many viruses, including foot-and-mouth disease virus, influenza A virus subtype H5N1, severe acute respiratory syndrome coronavirus (SARS-CoV) and human bocavirus, preferentially use C- and G-ended codons (Zhao et al., 2008; Zhong et al., 2007; Zhou et al., 2005). To date, it has been unclear how the EU and US serotypes of PRRSV influence codon usage pattern.
In this study, the preferentially used codons of PRRSV were found to be C-, G- and U-ended codons; in particular, four amino acids (Asp, His, Phe and Pro) preferentially used U-ended codons. One possible explanation for why PRRSV has three types of optimal codons, in contrast to the other RNA viruses mentioned above, is that more types of optimal codon are advantageous to PRRSV, which needs to replicate efficiently in host cells with potentially distinct codon preferences. Although PRRSV could not be classified into US and EU serotypes by the RSCU values of all ORFs, some reports on the genetic analysis of ORF5 help to clarify the genetic relationships among different isolates (Cha et al., 2006; Chen et al., 2006; Mateu et al., 2006). ORF5 encodes the major envelope glycoprotein (GP5), which is one of the main immunogenic proteins of PRRSV and the leading target for the development of genetically engineered vaccines against PRRS. However, owing to the striking genetic and antigenic variability among different PRRSV isolates, GP5 of one strain has a negative effect on the cross-protection efficiency against heterologous PRRSV strains (Kim and Yoon, 2008; Meng, 2000). The synonymous codon usage pattern of ORF1a or ORF5 has the ability to distinguish between US and EU serotypes. To some degree, the codon usage variation of ORF1a seems to distinguish strain types better than that of ORF5, and the latter shows remarkable genetic variability both within and between serotypes. However, the synonymous codon usage pattern of ORF3 does not clearly group strains into the US or EU serotype, possibly suggesting that GP3 has a common biological function in both serotypes. This result is consistent with some viewpoints about GP3 of PRRSV (Meulenberg et al., 1995). One possible explanation is that the biological function of different viral products is shaped by mutational pressure rather than natural selection. A relationship between the geographical origin of samples and the genetic diversity of PRRSV has also been reported (Stadejek et al., 2006). In this study, although synonymous codon usage variation fails to reflect a geographical factor, the index has an obvious effect on codon usage pattern. This result is in agreement with the report by Pesch et al. (2005). For RNA viruses, the major factor shaping codon usage patterns appears to be mutation pressure rather than natural selection (Zhao et al., 2008; Zhong et al., 2007; Zhou et al., 2005).

Table 3 notes: The preferentially used codons for each amino acid are shown in bold. a AA is the abbreviation of amino acid. b RSCU values are mean values. c The preferentially used codon is a U-ended codon.

Table 4 Summary of correlation analysis between the A, U, C, G contents and A3, U3, C3, G3 contents in all selected samples.

            A3%             U3%             C3%             G3%             (C3 + G3)%
A%          r = 0.378 **    r = -0.612 **   r = -0.433 **   r = 0.738 **    r = 0.249 **
U%          r = -0.201 *    r = 0.635 **    r = 0.170 NS    r = -0.613 **   r = -0.458 **
C%          r = -0.124 NS   r = -0.125 NS   r = 0.368 **    r = -0.189 NS   r = 0.239 *
G%          r = -0.131 NS   r = -0.192 NS   r = 0.108 NS    r = 0.332 **    r = 0.306 **
(C + G)%    r = -0.006 NS   r = -0.367 **   r = 0.198 *     r = 0.126 NS    r = 0.375 **

NS means non-significant (p > 0.05). * Means 0.01 < p < 0.05. ** Means p < 0.01.
In a search for the main force driving codon usage variation, codon usage bias was found to be strongly correlated with overall genomic (G + C) content, implying that it results from compositional constraint under mutational pressure rather than natural selection for specific coding triplets (Shackelton et al., 2006). Naya et al. (2001) found that in the Chlamydomonas reinhardtii genome, which has a high (G + C) content, there was no evidence that compositional constraint played a role in shaping codon usage pattern. These results indicated that two main forces, namely natural selection and mutational pressure, act in the process of evolution. In this study, the base composition at the third position of synonymous codons was always correlated with the other base compositions (Table 4). The fact that (G + C) content varies in a similar way at all codon positions is usually assumed to be the result of mutational bias. A general mutational pressure that affects the whole genome would certainly account for the majority of the codon usage variation, as has been reported among some RNA viruses. Since the mutation rates of RNA viruses are much higher than those of DNA viruses (Jenkins and Holmes, 2003), it is understandable that mutation pressure plays a key role in shaping the synonymous codon usage pattern in different ORFs of the 13 PRRSV strains included in this study. In addition, the general association between codon usage indices and compositional constraint shows that mutational pressure plays an important role in determining the codon usage variation of PRRSV, which is supported by the highly significant correlation between the codon usage index (f′1) and A3%, U3%, G3% and C3% (Table 5).
Although the codon usage index (f′1) is strongly correlated with compositional constraint, (C3 + G3)% has a stronger correlation with f′2 than with f′1, implying that it has no real correlation with the codon usage indices. The fluctuation of (C3 + G3)% does not reflect that of C3% and G3%, and (C3 + G3)% may not always represent the degree of compositional constraint, so the fluctuation of each nucleotide needs to be analyzed further. Sequence analysis indicated that PRRSV evolves through random mutation and intragenic recombination (Kapur et al., 1996; Meng et al., 1995; Murtaugh et al., 1995; Nelsen et al., 1999). Taken together, mutational pressure is one of the main factors responsible for the variation of synonymous codon usage among ORF coding sequences in PRRSV.

The codon usage bias for synonymous codons showed different features in the different ORFs of PRRSV in this study. Each ORF includes some amino acids with strong selective discrepancy, for which particular synonymous codons are used preferentially. The codon usage discrepancy for some synonymous codons may therefore serve as a potential genetic marker. Nielsen et al. (2001) examined synonymous and non-synonymous changes in ORF1 of PRRSV and pointed out that there was a stronger selective pressure for amino acid conservation during spread in host animals. Storgaard et al. (1999) found that non-synonymous nucleotide mutations in ORF7 of PRRSV were no stronger than those in ORF5. These findings suggest that gene function and mutational pressure likely affect codon usage variation. In addition, no effect of natural translational selection on synonymous codon usage was observed in this study, which is consistent with previous reports (Biro, 2008; Gu et al., 2004; Levin and Whittome, 2000; Zhou et al., 2005). These results should assist in understanding the various factors influencing the evolution of PRRSV.
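The principal component and Spearman rank correlation analyses discussed above were run in SPSS. The NumPy sketch below shows the same kind of analysis under stated assumptions and is not the authors' procedure. `pca_scores` centers a matrix of RSCU vectors (one row per ORF or strain, 59 columns) without rescaling the columns, which may differ from the SPSS settings, and `spearman_r` is the Pearson correlation of average ranks. All function names and inputs are hypothetical, and significance testing is omitted.

```python
import numpy as np


def pca_scores(rscu_matrix, n_axes=2):
    """Project samples (rows) onto the first principal axes of their RSCU vectors.

    Returns the scores on each axis and the fraction of variance each explains.
    Rows containing NaN should be cleaned beforehand (not handled here).
    """
    x = np.asarray(rscu_matrix, dtype=float)
    x = x - x.mean(axis=0)                       # center each codon column
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)          # variance fraction per axis
    return u[:, :n_axes] * s[:n_axes], explained[:n_axes]


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_r(x, y):
    """Spearman rank correlation coefficient of two equally long sequences."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])
```

For instance, if `scores` holds the output of `pca_scores` and `a3_percent` is a hypothetical array of A3% values in the same sample order, `spearman_r(scores[:, 0], a3_percent)` would give a coefficient of the kind reported for f′1 in Table 5. The sign of a principal axis is arbitrary, so only the magnitude of such a coefficient is comparable between implementations.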
Table 5 Summary of correlation analysis between the first two principal axes and nucleotide contents in samples.

Base compositions   f′1 (15.083%)   f′2 (11.171%)
A3%                 r = -0.449 **   r = 0.441 **
U3%                 r = 0.529 **    r = -0.219 **
C3%                 r = 0.519 **    r = 0.256 **
G3%                 r = -0.695 **   r = -0.454 **
(C3 + G3)%          r = -0.128 NS   r = -0.194 **

NS means non-significant. ** Means p < 0.01.

Comparison of porcine alveolar macrophages and CL2621 for the detection of porcine reproductive and respiratory syndrome (PRRS) virus and anti-PRRS antibody
Characterization of swine infertility and respiratory syndrome (SIRS) virus (isolate ATCC VR-2332)
Does codon bias have an evolutionary origin? Theor
Nidovirales: a new order comprising Coronaviridae and Arteriviridae
Molecular characterization of recent Korean porcine reproductive and respiratory syndrome (PRRS) viruses and comparison to other Asian PRRS viruses
Genetic variation of Chinese PRRSV strains based on ORF 5 sequence
Isolation of swine infertility and respiratory syndrome virus (isolated ATCC-2332) in North American and experimental reproduction of the disease in gnotobiotic pigs
Molecular characterization of porcine reproductive and respiratory syndrome virus, a member of the Arterivirus group
Tissue-specific differences in human transfer RNA expression
Codon catalog usage and the genome hypothesis
Analysis of synonymous codon usage in SARS coronavirus and other viruses in the Nidovirales
The extent of codon usage bias in human RNA virus and its evolutionary origin
Principal Component Analysis
Genetic variation in porcine reproductive and respiratory syndrome virus isolates in the Midwestern United States
What drives codon choices in human genes?
Reproductive failure of unknown etiology
Molecular assessment of the role of envelope-associated structural proteins in cross neutralization among different PRRS viruses
Ribosome traffic in E. coli and regulation of gene expression
Codon usage in nucleopolyhedroviruses
Evolution of codon usage patterns: the extent and nature of divergence between Candida albicans and Saccharomyces cerevisiae
Mystery pig disease
Multivariate Analysis
Variation in G + C content and codon choice: differences among synonymous codon groups in vertebrate genes
Evolution of ORF5 of Spanish porcine reproductive and respiratory syndrome virus strains from
Heterogeneity of porcine reproductive and respiratory syndrome virus: implications for current vaccine efficacy and future vaccine development
Phylogenetic analysis of the putative M (ORF6) and N (ORF7) genes of porcine reproductive and respiratory syndrome virus (PRRSV): implication for the existence of two genotypes of PRRSV in the USA
Lelystad virus, the causative agent of porcine epidemic abortion and respiratory syndrome (PEARS) is related to LDV and EAV
Characterization of proteins encoded by ORFs 2 to 7 of Lelystad virus
Comparison of the structural protein coding sequences of the VR-2332 and Lelystad virus strains of PRRS virus
Translational selection shapes codon usage in the GC-rich genome of Chlamydomonas reinhardtii
Porcine reproductive and respiratory syndrome virus comparison: divergent evolution of two continents
Reversion of a live porcine reproductive and respiratory syndrome virus vaccine investigated by parallel mutations
New insights into the genetic diversity of European porcine reproductive and respiratory syndrome virus (PRRSV)
Lactate dehydrogenase-elevating virus, equine arteritis virus, and simian hemorrhagic fever virus: a new group of positive-stranded RNA viruses
Evolutionary basis of codon usage and nucleotide composition bias in vertebrate DNA viruses
Porcine reproductive and respiratory syndrome virus strains of exceptional diversity in Eastern Europe support the definition of new genetic subtypes
Codon usage in regulatory genes in Escherichia coli does not reflect selection for 'rare' codon
Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes
Examination of the selective pressures on a live PRRS vaccine virus
The relationship between synonymous codon usage and protein structure
Mystery swine disease in the Netherlands; the isolation of lelystad virus
Analysis of synonymous codon usage in 11 Human Bocavirus isolates
Mutation pressure shapes codon usage in the GC-rich genome of foot-and-mouth disease virus
Analysis of synonymous codon usage in H5N1 virus and other influenza A viruses
Synonymous codon usage in environmental Chlamydia UWE25 reflects an evolution divergence from pathogenic chlamydiae
Characteristics of codon usage bias in two regions downstream of the initiation codons of foot-and-mouth disease virus

This work was supported in part by grants from the International Science & Technology Cooperation Program of China (No. 2010DFA32640) and the Science and Technology Key Project of Gansu Province (No. 0801NKDA034). This study was also supported by the National Natural Science Foundation of China (No. 30700597).

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.meegid.2010.04.010.