key: cord-0005009-gtzpz1yb
authors: Wang, Meng; Zhang, Jie; Zhou, Jian-hua; Chen, Hao-tai; Ma, Li-na; Ding, Yao-zhong; Liu, Wen-qian; Liu, Yong-sheng
title: Analysis of codon usage in bovine viral diarrhea virus
date: 2010-11-11
journal: Arch Virol
DOI: 10.1007/s00705-010-0848-0
sha: 9817c7e2740f87abc0e932253a2162c413df69b5
doc_id: 5009
cord_uid: gtzpz1yb
Bovine viral diarrhea virus (BVDV) is a widespread virus in beef and dairy herds. BVDV has been grouped into two genotypes, genotype 1 and genotype 2. In this study, the relative synonymous codon usage (RSCU) values, effective number of codons (ENC) values and nucleotide content were investigated, and a comparative analysis of codon usage patterns for open reading frames (ORFs) of 22 BVDV genomes, including 14 of genotype 1 and 8 of genotype 2, was carried out. A high A+U content and low codon bias were found in BVDV genomes. Based on the RSCU data, codon usage bias varied significantly between the two genotypes, and a geographic factor was evident only in BVDV genotype 1. The RSCU data have a negative correlation with general average hydrophobicity (GRAVY), aromaticity and nucleotide content. Furthermore, the overall abundance of C and U has no effect on the synonymous codon usage patterns. In contrast, the A and G content showed a significant correlation with the nucleotide content at the third position. In addition, the codon usage patterns of BVDV are similar to those of 22 conserved genes of Bos taurus. Taken together, the genetic characteristics of BVDV possibly result from interactions between natural selection and mutation pressure. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s00705-010-0848-0) contains supplementary material, which is available to authorized users.
For 18 of the 20 amino acids (all except Met and Trp), the degeneracy of the genetic code allows multiple codons to encode the same amino acid, resulting in codon usage bias in genes [7, 24]. Codon usage analysis has been applied to prokaryotes and eukaryotes, such as Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Caenorhabditis elegans and human beings [4, 16, 25, 27]. Some reports have shown that codon usage bias correlates strongly with tRNA abundance, GC content, mRNA secondary structure, exon splicing constraints, translation rate and gene expression level [12, 18, 26]. The study of codon usage can provide evidence about the molecular evolution of viruses. Analyzing codon usage patterns can also enrich our understanding of the relationship between viruses and their hosts.
BVDV is a member of the genus Pestivirus within the family Flaviviridae. The genus also includes classical swine fever virus (CSFV) and Border disease virus (BDV) of sheep [3, 20]. Based on a comparison of the 5' untranslated region (UTR) and the Npro- and E2-encoding sequences [23, 30], BVDV can be divided into two genotypes: BVDV-1 and BVDV-2 [21, 22]. The genome of each genotype is a single positive-stranded RNA of approximately 12.3 kb, consisting of a single large open reading frame (ORF) flanked by 5' and 3' untranslated regions [6, 8]. BVDV strains can grow in epithelial cell cultures with cytopathic (CP) or noncytopathic (NCP) effect [17]. BVDV is highly genetically variable, yet little information about synonymous codon usage patterns of BVDV genomes has been acquired to date [13, 29]. To our knowledge, this is the first report of codon usage analysis of BVDV. In this study, we analyzed the codon usage data and base composition of 22 available complete ORFs of BVDV to obtain some clues to the features of genetic evolution of this virus.
A total of 22 BVDV genomes, consisting of 14 strains of genotype 1 and 8 strains of genotype 2, were used to analyze the factors relevant to synonymous codon usage patterns and nucleotide content in this study. The genotype, phenotype, country of isolation and GenBank accession numbers of these strains are listed in Table 1. In addition, 22 different well-conserved genes of Bos taurus were selected to examine the relationship between codon preferences in the host and the viruses (Table 2). All of the abovementioned coding sequences were downloaded from NCBI (http://www.ncbi.nlm.nih.gov/Genbank/).
To investigate the patterns of relative synonymous codon usage (RSCU) without the confounding influence of amino acid composition among all BVDV samples, the RSCU values of codons in the ORFs of BVDV were calculated according to a formula described in previous reports [25, 32]:
$$\mathrm{RSCU} = \frac{g_{ij}}{\sum_{j}^{n_i} g_{ij}} \, n_i$$
where $g_{ij}$ is the observed number of the ith codon for the jth amino acid, which has $n_i$ types of synonymous codons. A codon with an RSCU value of more than 1.0 has a positive codon usage bias, while a value of less than 1.0 indicates a negative codon usage bias. An RSCU value equal to 1.0 means that the codon is chosen equally and randomly.
The effective number of codons
The effective number of codons (ENC) is used to measure deviation from expected random codon usage of BVDV and is independent of hypotheses involving natural selection [5]. ENC values range from 20 to 61. The stronger the codon preference in a gene, the smaller the ENC value. In an extremely biased gene where only one codon is used for each amino acid, this value would be 20, and if all codons were used equally, it would be 61 [28, 31]. The formulas for ENC are as follows:
$$F = \frac{n \sum_{i=1}^{k} P_i^{2} - 1}{n - 1}$$
$$\mathrm{ENC} = 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3} + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6}$$
Here n is the observed number of codons used, k is the number of synonymous codons, $P_i$ is the usage frequency of the ith codon ($n_i/n$), and $\bar{F}_k$ is the mean of F over the amino acids that have k synonymous codons. ENC is influenced by the amino acid content of the gene and its length.
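The RSCU and ENC measures described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (they report using Codon W); the function names `count_codons`, `rscu` and `enc` are hypothetical, the ENC expression follows the standard formulation of the cited report [31], and skipping families whose F value is not positive is a simplifying assumption of this sketch.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in the RNA alphabet (base order U, C, A, G at each position).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous families: 18 amino acids, 59 codons (Met, Trp and stop codons excluded).
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa not in "MW*":
        FAMILIES[aa].append(codon)


def count_codons(orf):
    """Count the in-frame codons of an ORF given as DNA or RNA text."""
    rna = orf.upper().replace("T", "U")
    triplets = (rna[i:i + 3] for i in range(0, len(rna) - 2, 3))
    return Counter(t for t in triplets if t in CODON_TABLE)


def rscu(counts):
    """RSCU of a codon = observed count * family size / total count of its family."""
    values = {}
    for family in FAMILIES.values():
        total = sum(counts[c] for c in family)
        for c in family:
            # RSCU is undefined when the amino acid is absent; None marks that case.
            values[c] = counts[c] * len(family) / total if total else None
    return values


def enc(counts):
    """Effective number of codons: 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61."""
    f_by_degeneracy = defaultdict(list)
    for family in FAMILIES.values():
        n = sum(counts[c] for c in family)
        if n > 1:
            f = (n * sum((counts[c] / n) ** 2 for c in family) - 1) / (n - 1)
            if f > 0:  # assumption of this sketch: skip families with non-positive F
                f_by_degeneracy[len(family)].append(f)
    if set(f_by_degeneracy) != {2, 3, 4, 6}:
        raise ValueError("sequence too short: a degeneracy class has no usable family")
    mean_f = {k: sum(v) / len(v) for k, v in f_by_degeneracy.items()}
    value = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(61.0, value)
```

For a toy input such as `rscu(count_codons("GCTGCTGCTGCC"))`, the Ala codon GCU (3 of 4 occurrences in a four-codon family) receives an RSCU of 3.0, and a sequence that uses exactly one codon per amino acid gives an ENC of 20, matching the lower bound stated in the text.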
The fraction of each codon within its synonymous family
Codon frequency normalizes the codon observations to a fraction for each codon within its synonymous family [1]. To examine the degree of similarity in codon usage between BVDV and its host animal (Bos taurus), the fraction of each codon within its synonymous family was compared between the 22 ORFs of BVDV and the 22 genes of Bos taurus (a total of 59 standard codons, excluding the single codons AUG [Met] and UGG [Trp] and the three termination codons).
Principal component analysis (PCA) was conducted to analyze the major trend in codon usage pattern among BVDV samples. This is a statistical method that performs a linear mapping to extract optimal features from an input distribution in the mean squared error sense and can be used by self-organizing neural networks to form unsupervised neural preprocessing modules for classification problems [15]. In order to minimize the effect of amino acid composition on codon usage, each ORF is represented as a 59-dimensional vector, in which each dimension corresponds to the RSCU value of one sense codon, excluding AUG (Met), UGG (Trp) and the three stop codons.
Spearman's rank correlation analysis was used to identify relationships among nucleotide content, RSCU and principal component factors of BVDV. A linear least-squares regression was conducted to evaluate the correlation between the fraction of synonymous codons in BVDV and that in the genes of Bos taurus. General average hydrophobicity (GRAVY) and aromaticity scores were used to investigate hydrophobic properties of the targeted proteins. Both scores of each protein were obtained using the software Codon W 1.2.4.
The characteristics of synonymous codon usage in BVDV
In order to investigate the extent of codon usage bias in BVDV, all RSCU values of different codons in 22 BVDV strains were calculated.
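The virus-host comparison described in the methods above (the fraction of each codon within its synonymous family, followed by a linear least-squares regression) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: `codon_fractions` and `linear_fit` are hypothetical names, and codon counts are pooled over all input ORFs, whereas the paper may have averaged per-sequence proportions.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the RNA alphabet (base order U, C, A, G at each position).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_fractions(orfs):
    """Pool codon counts over several ORFs; return each codon's share of its family."""
    counts = Counter()
    for orf in orfs:
        rna = orf.upper().replace("T", "U")
        counts.update(rna[i:i + 3] for i in range(0, len(rna) - 2, 3))
    family_totals = Counter()
    for codon, aa in CODON_TABLE.items():
        family_totals[aa] += counts[codon]
    # Keep the 59 codons with synonymous alternatives; drop absent amino acids.
    return {codon: counts[codon] / family_totals[aa]
            for codon, aa in CODON_TABLE.items()
            if aa not in "MW*" and family_totals[aa] > 0}


def linear_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy ** 2 / (sxx * syy)


def compare_usage(virus_orfs, host_orfs):
    """Regress virus codon fractions on host codon fractions over shared codons."""
    virus, host = codon_fractions(virus_orfs), codon_fractions(host_orfs)
    shared = sorted(set(virus) & set(host))
    return linear_fit([host[c] for c in shared], [virus[c] for c in shared])
```

With real data, `compare_usage` would be called with the BVDV ORFs and the Bos taurus coding sequences as lists of strings; the third returned value is the coefficient of determination of the fitted line.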
There is only one preferred codon, AGU, with U at the third position; all of the remaining preferred codons end with A, C or G (Table 3).
Genetic relationship based on synonymous codon usage
Principal component analysis was carried out to identify the codon usage bias among ORFs. From this, we could detect one major trend in the first axis (f′₁), which accounted for 26.51% of the total variation, and another major trend in the second axis (f′₂), which accounted for 13.02% of the total variation. A plot of f′₁ and f′₂ of each gene is shown in Supplementary Fig. 1. Compared with the scattered BVDV genotype 1 strains, the BVDV genotype 2 strains clustered more tightly. Interestingly, there appears to be a clear geographical demarcation among the BVDV-1 strains.
Compositional properties of all BVDV genomes
Natural selection and mutation pressure are thought to be the main factors that account for codon usage variation in different organisms. The A%, U%, C%, G% and (C+G)% values were compared with the A₃%, C₃%, G₃%, U₃% and (G+C)₃% values, respectively. An interesting and complex correlation was observed. In detail, the (C+G)₃% values have highly significant correlations with the A%, U%, C%, G% and (C+G)% values, indicating that (C+G)₃% may reflect an interaction between mutation pressure and natural selection. In contrast, the U% and C% values did not correlate with the A₃%, U₃%, G₃% and C₃% values (Table 5). Both cases suggest that nucleotide constraints possibly influence synonymous codon usage in BVDV. Correlation analysis was used to analyze the relationships among ENC values, (G+C)₃% values and (C+G)% values. A highly significant correlation was observed between ENC and (C+G)% (Spearman r = 0.765, p < 0.01), while a significant correlation was also observed between ENC and (G+C)₃% (Spearman r = 0.534, 0.01 < p < 0.05), indicating that codon usage bias is influenced by nucleotide constraints.
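The compositional correlations reported above rest on two ingredients: nucleotide content (overall and at the third codon position) and Spearman's rank correlation. A minimal pure-Python sketch of both is given below. The function names are hypothetical, contents are returned as fractions rather than percentages, all in-frame third positions (including any stop codon present in the input) are counted, and no p-value is computed.

```python
import math


def gc_content(orf):
    """Fraction of G or C over all positions of the sequence."""
    seq = orf.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(orf):
    """Fraction of G or C at the third position of each complete in-frame codon."""
    thirds = orf.upper()[2::3]
    return sum(base in "GC" for base in thirds) / len(thirds)


def ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = average_rank
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))
```

In use, one value per strain would be collected for each variable, for example a list of ENC values and a list of `gc3_content` values over the 22 ORFs, and passed to `spearman`.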
Table 3 footnotes: The preferentially used codons for each amino acid are indicated in bold. a Amino acid. b The RSCU value is a mean value of each codon for a particular amino acid. c The preferentially used codon ends in U.
In addition, the correlation between the f′₁ value and the A%, C%, G%, U%, A₃%, C₃%, G₃%, U₃%, (G+C)% and (G+C)₃% values of each strain was also analyzed. A significant correlation was found between nucleotide composition and synonymous codon usage to some extent (Table 6). The analysis revealed that most of the codon usage bias among ORFs of BVDV strains was directly related to base composition. We found that f … where s represents the given (G+C)₃% value [31]. However, all of the points with low ENC values lie below the expected curve, which suggests that although codon usage bias is influenced by mutational pressure, certain other factors must also influence the variation of codon usage in these genes. Therefore, we performed another correlation analysis between f′₁ from the principal component analysis and the GRAVY and aromaticity scores of each protein (Table 6).
A plot of the average proportion of each codon within its synonymous family in BVDV (excluding strain no. 14, which was isolated from swine) against that in Bos taurus was constructed to explore the relationship between BVDV and its host in codon usage. When both values are less than or equal to 0.15, the codon is defined as having a low frequency of usage; when one value is at least twice the other, the codon is considered to differ greatly in frequency. The plot showed a clear linear relationship between BVDV and Bos taurus, indicating that the virus and host have very similar patterns of codon usage (r² = 0.697). The patterns indicate that the least frequently used codons in the host were also the nonpreferred codons of the viruses, such as UCG (Ser), CCG (Pro), ACG (Thr), CGU, CGC, CGA, CGG (Arg) and GCG (Ala), whereas some codons were highly scattered, including CUA (Leu), AGG (Arg), AUA and AUU (Ile).
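The expected curve referred to above, with s denoting the (G+C)₃% value, is commonly computed from the expression given in the cited report on the effective number of codons [31]: ENC_expected = 2 + s + 29 / (s² + (1 − s)²). The sketch below assumes that this is the expression the text refers to; the function names are hypothetical and s is passed as a fraction between 0 and 1 rather than as a percentage.

```python
def expected_enc(s):
    """Expected ENC when codon usage is shaped by third-position G+C content alone."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be a fraction between 0 and 1")
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


def enc_shortfall(observed_enc, s):
    """Positive when the observed ENC lies below the expected curve at this s."""
    return expected_enc(s) - observed_enc
```

A strain whose observed ENC gives a clearly positive `enc_shortfall` sits below the curve, which is the pattern the text interprets as evidence that factors other than mutational pressure also shape codon usage.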
Linear regression analysis was also performed to investigate the relationship of codon usage patterns between strain 14 and the other BVDV strains. There was no significant difference between the two patterns (P > 0.05).
Natural selection is a phenomenon that alters the behavior and fitness of living organisms within a given environment. It is the driving force of evolution. Mutation pressure is the change in some gene frequencies due to the repeated occurrence of the same mutations. There are not many biologically realistic situations where mutation pressure is the most important evolutionary process. However, for RNA viruses, the mutation rate is sometimes high enough that mutation pressure needs to be considered. It is well established that synonymous codon usage reveals genetic information about some viral genomes [10, 14]. In this study, the evidence suggests that the synonymous codon usage bias in BVDV genes is low (mean ENC = 51.43, greater than 40). Together with published data on codon usage bias in other RNA viruses, such as influenza A H5N1 virus and SARS coronavirus, with mean ENC values of 50.91 and 48.99, respectively [10, 33], this indicates that low codon usage bias is, to some degree, a shared feature of RNA viruses. Bahir et al. also reported that there is a strong resemblance in codon usage between viruses and their host cells [2]. This suggests that low codon bias may help BVDV to replicate efficiently in host cells. The general association between codon usage indices and composition constraints shows that mutation pressure plays an important role in determining codon usage variation in BVDV. This is supported by the highly significant correlation between the codon usage index (f′₁) and the A%, U%, G%, C%, A₃%, U₃%, G₃% and C₃% values (Table 6). The relationship between the actual ENC values and (G+C)₃% is weaker than that of the expected values (Fig. 2).
We suggest that mutation pressure is one of the main factors responsible for the variation of synonymous codon usage in genomes of BVDV. Further analysis showed that the C₃% values of BVDV isolates were low, with an average … (Table 3).
Table 5 Correlation analysis between the A, U, C, G content and the A₃, U₃, C₃, G₃ content in all ORFs
Meanwhile, the U₃% value is higher than the C₃% value (mean U₃%: mean C₃% = 19.97:17.47), but only one U-ended codon, AGU, is preferentially used. This indicates that natural selection is possibly involved in the patterns of synonymous codon usage. No correlation was found between C% or U% and A₃%, U₃%, G₃% or C₃% (Table 5), suggesting that nucleotide constraints are involved in codon usage patterns due to low U% and C% values. Aromaticity is one of the factors in variations in amino acid usage [19]. The f′₁ values had a negative correlation with the aromaticity of each protein (Table 5). In this study, the degree of aromaticity had a negative correlation with codon usage bias of BVDV, suggesting that natural selection may be involved in BVDV evolution.
BVDV was first reported in 1946 [11], and the scattered distribution of all 14 BVDV-1 strains may imply that diversity among BVDV-1 strains has increased as the virus has evolved (Supplementary Fig. 1). Three BVDV-1 strains isolated from Asia were different from other BVDV-1 strains, implying that the strains isolated from Asia were distantly related to American or European strains. However, the strains from America were more closely related to those from Europe than to those from Asia. The low diversity in BVDV-2 might result from the limited number of samples. It is most likely that the codon usage bias in BVDV is related to genotype and geographic factors.
The remarkable similarity in the codon usage patterns between the viruses and Bos taurus suggests that natural selection pressure gives BVDV higher adaptability to its host. This adaptability makes it possible for the virus to survive in the host cell and to use the components of the cell to produce more of itself. However, there is no evidence that the viruses are generally adapted to the codon usage patterns of their host (AUU, CUA, AGG, and AUA), and this is consistent with mutational bias theory [1]. Although it has been reported that isolate 14 was first found in swine, its nucleotide content is similar to that of strains originating from cattle, suggesting that strain 14 is also a possible cattle-origin virus.
In this study, our analysis reveals that codon usage bias in BVDV is low, and mutation pressure is the main factor that affects codon usage variation in BVDV. Other factors, including base composition, genotype, geography, GRAVY, and even aromaticity, may also significantly influence codon usage bias. Although our study provides a basic understanding of the codon usage patterns of BVDV and the roles played by mutation pressure and natural selection, a more comprehensive analysis is needed to reveal more information about codon usage bias variation within BVDV and the other responsible factors.
[1] Codon usage bias amongst plant viruses
[2] Viral adaptation to host: a proteome-based analysis of codon usage and amino acid preferences
[3] Genetic diversity of pestiviruses: identification of novel groups and implications for classification
[4] Codon usage and intragenic position
[5] Genome evolution and developmental constraint in Caenorhabditis elegans
[6] Molecular cloning and nucleotide sequence of the pestivirus bovine viral diarrhea virus
[7] Selection intensity on preferred codons correlates with overall codon usage bias in Caenorhabditis remanei
[8] Molecular cloning and nucleotide sequence of a pestivirus genome, noncytopathic bovine viral diarrhea virus strain SD-1
[9] Evolution of synonymous codon usage in metazoans
[10] Analysis of synonymous codon usage in SARS Coronavirus and other viruses in the Nidovirales
[11] Bovine spongiform encephalopathy: current status and possible impacts
[12] Codon usage and tRNA content in unicellular and multicellular organisms
[13] The extended genetic diversity of BVDV-1: typing of BVDV isolates from France
[14] The extent of codon usage bias in human RNA viruses and its evolutionary origin
[15] Analysis of codon usage diversity of bacterial genes with a self-organizing map (SOM): characterization of horizontally transferred genes with emphasis on the E. coli O157 genome
[16] What drives codon choices in human genes?
[17] Distribution of viral antigen and development of lesions after experimental infection of calves with a BVDV 2 strain of low virulence
[18] Analysis of synonymous codon usage in porcine reproductive and respiratory syndrome virus
[19] Mutational bias and gene expression level shape codon usage in Thermobifida fusca YX
[20] A proposed division of the pestivirus genus using monoclonal antibodies, supported by cross-neutralisation assays and genetic sequencing
[21] The nucleotide sequence of the 5'-untranslated region of bovine viral diarrhoea virus: its use as a probe in rapid detection of bovine viral diarrhoea viruses and border disease viruses
[22] Segregation of bovine viral diarrhea virus into genotypes
[23] Multiple outbreaks of severe acute BVDV in North America occurring between 1993 and 1995 linked to the same BVDV2 strain
[24] DNA sequence evolution: the sounds of silence
[25] Codon usage in regulatory genes in Escherichia coli does not reflect selection for 'rare' codons
[26] Codon usage determines translation rate in Escherichia coli
[27] Codon usage in Caenorhabditis elegans: delineation of translational selection and mutational biases
[28] Analysis of synonymous codon usage in classical swine fever virus
[29] A survey for BVDV antibodies in cattle farms in Slovakia and genetic typing of BVDV isolates from imported animals
[30] Storage of bovine viral diarrhoea virus samples on filter paper and detection of viral RNA by a RT-PCR method
[31] The 'effective number of codons' used in a gene
[32] Analysis of synonymous codon usage in foot-and-mouth disease virus
[33] Analysis of synonymous codon usage in H5N1 virus and other influenza A viruses