key: cord-0812682-80eurxt6 authors: Wang, Meng; Zhang, Jie; Zhou, Jian-hua; Chen, Hao-tai; Ma, Li-na; Ding, Yao-zhong; Liu, Wen-qian; Gu, Yuan-xing; Zhao, Feng; Liu, Yong-sheng title: Analysis of codon usage in type 1 and the new genotypes of duck hepatitis virus date: 2011-06-17 journal: Biosystems DOI: 10.1016/j.biosystems.2011.06.005 sha: 7cb70b41e55103d9c002cf25bb675c244a62e44c doc_id: 812682 cord_uid: 80eurxt6

In this study, a high (A + U)% and a low codon bias were revealed in duck hepatitis virus type 1 (DHV-1) and in the new serotype strains isolated from Taiwan, South Korea and Mainland China (DHV-N). The general correlation between base composition and codon usage bias suggests that mutational pressure, rather than natural selection, is the main factor that determines the codon usage bias in these samples. By comparative analysis of the codon usage patterns of 40 ORFs of DHV, we found that all DHV-1 strains grouped in genotype C, the DHV-N strains isolated in South Korea and China clustered into genotype B, and the DHV-N strains isolated from Taiwan clustered into genotype A. The findings revealed that more than one subtype of DHV-1 circulated in East Asia. Furthermore, the phylogenetic analyses based on RSCU values and on the Clustal W method gave clearly congruent results. This suggests that a high degree of genome consistency may exist in DHV in nature and that phylogenetic analysis based on RSCU values may be a good method for classifying genotypes of the virus. Our work might give some clues to the features and evolutionary history of DHV.

Duck hepatitis virus (DHV) is the causative agent of duck viral hepatitis, an acute and fatal disease of young ducklings, characterized primarily by hepatitis. Three different serotypes (DHV-1, DHV-2, and DHV-3) have been described (Gough et al., 1985; Haider and Calnek, 1979).
DHV-1 was provisionally classified as an enterovirus (Harvala et al., 2005; Wang et al., 2008), and DHV-3 was classified as a probable picornavirus. DHV-2 is now classified as an astrovirus (Gough et al., 1984, 1987). DHV-1 is distributed worldwide and is one of the most economically important pathogens for duck farms because of its high potential for mortality, while DHV-2 and DHV-3 have only been reported in the UK and the USA, respectively (Ding and Zhang, 2007). The genome of DHV-1 is a single-stranded, polyadenylated, positive-sense RNA of approximately 7800 nucleotides with a single, long open reading frame (ORF) encoding a polyprotein covalently linked to the 5′ end of the genome.

Synonymous codon usage bias is an important evolutionary phenomenon that is well known to exist in a wide range of biological systems, from prokaryotes to eukaryotes (Archetti, 2004; Liu et al., 2010). Many previous analyses of codon usage have suggested that many different biological factors are related to synonymous codon usage bias, but codon usage variation is represented by two major paradigms (Guo and Yuan, 2009): mutational bias, selection, or both determine codon usage (Zhou and Li, 2009). The observed patterns in synonymous codon usage vary among genes within a genome and among genomes. It has been reported that codon usage is an important driving force in the evolution of astroviruses and small DNA viruses (Karlin et al., 1990). Clearly, a better knowledge of codon usage bias in viruses is essential to understanding the processes governing their evolution, particularly the overall role played by mutation pressure. Recent analyses of codon composition and codon usage in the Picornaviridae focused primarily on foot-and-mouth disease virus (Zhong et al., 2007) and hepatitis A virus (Jenkins and Holmes, 2003). However, little information about the synonymous codon usage patterns of DHV has been acquired to date.
In addition, some new serotypes of DHV that share some features with DHV-1, but show no antigenic relationship with DHV-1 in cross-neutralization tests, have been reported (Kim et al., 2007). In this study, codon usage indices such as the relative synonymous codon usage (RSCU) values and the effective number of codons (ENC) were used to reveal the relationship between DHV-1 and DHV-N.

Previously reported picornavirus sequences, including the DHV-N strains isolated in Taiwan, Mainland China (Pan et al.) and South Korea (Kim et al., 2007), the two strains from ATCC (Kim et al., 2006) and the other deposited DHV-1 genomes used in the comparisons, were retrieved from NCBI (http://www.ncbi.nlm.nih.gov/Genbank/). A total of 40 DHV genomes were used in this study. The serial number (SN), genotype, region of isolation, GenBank accession number, and other details of these strains are listed in Table 1. Each of these DHV genomes contains only a single long ORF.

To investigate the extent of codon usage bias in DHV-1 and DHV-N, the RSCU values of all codons in the 40 ORFs of the DHV strains were calculated as a measure of synonymous codon usage (Sharp and Li, 1986; Zhou et al., 2010). The RSCU values of the codons in each DHV ORF were calculated according to the formula of our previous reports (Wang et al., 2011a,b):

$$\mathrm{RSCU} = \frac{g_{ij}}{\sum_{j}^{n_i} g_{ij}} \, n_i$$

where $g_{ij}$ is the observed number of the ith codon for the jth amino acid, which has $n_i$ types of synonymous codons. A codon with an RSCU value above 1.0 shows positive codon usage bias, whereas a codon with a value below 1.0 shows negative codon usage bias. An RSCU value of 1.0 means that the codon is chosen equally and randomly among its synonymous codons.

ENC is the best overall estimator of absolute synonymous codon usage bias, and it was used to quantify the codon usage bias of each ORF of DHV. The predicted values of ENC and the GC content at the third codon position (GC3) provide a useful display of the main features of the codon usage patterns of DHV.
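The RSCU definition given above can be illustrated with a short Python sketch. It is not the authors' software: the function name `rscu`, the use of the standard genetic code, and the choice to skip stop codons and the single-codon amino acids (Met, Trp) are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code in UCAG order (assumption: standard code applies).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(orf):
    """Return {codon: RSCU} for one in-frame coding sequence (hypothetical helper)."""
    seq = orf.upper().replace("T", "U")
    # Split into complete codons, ignoring a trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by the amino acid they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # stop codons, Met and Trp carry no synonymous choice
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for codon in family:
            # observed count * number of synonymous codons / family total
            values[codon] = counts[codon] * len(family) / total
    return values
```

For a toy sequence such as `rscu("GAGGAGGAAAAA")`, Glu is encoded twice by GAG and once by GAA, so GAG receives a value above 1.0 and GAA a value below 1.0, which matches the interpretation of positive and negative bias described in the text.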
All codon usage indices were calculated by the methods described previously (Wang et al., 2011b).

The 40 ORFs of DHV were aligned using the Clustal W method (Thompson et al., 1994) implemented in the MegAlign program (DNAStar). Phylogenetic trees were produced using the neighbor-joining method implemented in the program MEGA 4.0 (Tamura et al., 2007).

[Note to Table 2: The preferentially used codons for each amino acid are shown in bold. AA, amino acid; RSCU, relative synonymous codon usage value.]

Principal component analysis (PCA) was carried out to analyze the major trends in the codon usage patterns of the ORFs of the 40 DHV genomes. PCA is a statistical method that performs linear mapping to extract optimal features from an input distribution in the mean squared error sense and can be used by self-organizing neural networks to form unsupervised neural preprocessing modules for classification problems (Kanaya et al., 2001).

Correlation analysis was used to identify the relationship between codon usage bias and the synonymous codon usage patterns of DHV-1 and DHV-N. This analysis was based on Spearman's rank correlation. All statistical analyses were carried out using the statistical analysis software SPSS Version 17.0.

The comparative analysis of RSCU values indicated that only two preferred codons, GAG and UUG, have G at the third position; all other preferred codons end with A or U (Table 2). The ENC values of these samples are very similar, varying from 49.837 to 52.440 with a mean value of 50.750 and an S.D. of 0.124 (Table 3), suggesting that codon usage in the DHV-1 and DHV-N genomes is only weakly biased (mean ENC > 40) and remains at a stable level.
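The Spearman rank correlation named in the methods can be sketched in a few lines of plain Python. The study itself used SPSS; the helper names `ranks`, `pearson` and `spearman` below are hypothetical, and the sketch returns only the coefficient, not a P value.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))
```

In this setting, `x` and `y` would hold one value per ORF, for example a composition measure and a codon usage index for each of the 40 strains. The function assumes both inputs have the same length and are not constant.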
In the codon usage pattern of DHV, codons ending with A or U were favored, indicating that the content of A and U at different positions of the sense codons may reflect some important characteristics of the codon usage pattern of DHV. Firstly, the (A + U) content at the first and second codon positions, (A + U)12%, was compared with that at the third position, (A + U)3%, and a highly significant correlation was observed (Spearman r = 0.475, P < 0.01). Secondly, the correlation between Axis 1 (calculated by PCA), which represents the largest trend in codon usage among these genomes, and A%, C%, G%, U%, A3%, C3%, G3%, U3%, (A + U)% and (A + U)3% of each strain was also analyzed. A significant correlation was found between nucleotide composition and synonymous codon usage to some extent (Table 4). Finally, the ENC-plot [ENC plotted against (G + C)3%] was used as part of a general strategy to investigate patterns of synonymous codon usage. All of the points cluster together below the expected curve, indicating that the codon usage bias of these samples shows no apparent difference and implying that the codon bias can be explained mainly by an uneven base composition, in other words, by mutation pressure rather than natural selection (Fig. 1).

From the result of PCA, we could detect one major trend in the first axis (Axis 1), which accounts for 33.451% of the total variation, and another major trend in the second axis (Axis 2), which accounts for 11.928% of the total variation. A plot of Axis 1 against Axis 2 for each ORF of DHV is shown in Fig. 2. Clearly, because of their different codon usage patterns, the 40 ORFs of DHV gathered mainly at three places. We termed these DHV types A, B, and C. Furthermore, we found that all DHV-1 strains grouped in genotype C, the DHV-N strains isolated in South Korea and China clustered into genotype B, and the DHV-N strains isolated from Taiwan clustered into genotype A.
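The expected curve in an ENC-plot is commonly taken from Wright's approximation, ENC = 2 + s + 29 / (s² + (1 − s)²), where s is the G + C fraction at the third codon position. The source does not state which expression was used for Fig. 1, so the sketch below rests on that assumption; the function names are hypothetical, and `third_position_gc` counts every third position rather than synonymous sites only.

```python
def third_position_gc(orf):
    """Fraction of G or C at the third position of each complete codon."""
    seq = orf.upper().replace("T", "U")
    thirds = seq[2::3]  # third base of every complete codon
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def expected_enc(gc3):
    """Expected ENC under composition alone (Wright's approximation, assumed)."""
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)


def below_expected_curve(observed_enc, gc3):
    """True when an observed ENC value lies under the expected curve."""
    return observed_enc < expected_enc(gc3)
```

A point for one ORF would be placed by pairing its observed ENC with its third-position G + C fraction; comparing that ENC with `expected_enc` shows on which side of the curve the point falls.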
Comparisons between members of the same lineage showed that the range of nucleotide identity was 92.7-99.2% in lineage C, 92.7-98.6% in lineage B, and 99.6% in lineage A. The nucleotide identities between members of different lineages were also calculated: 78.1-78.5% between lineage A and lineage B, 72.9-74.1% between lineage B and lineage C, and 72.8-73.4% between lineage A and lineage C. In addition, phylogenetic analysis of the 40 ORFs of the DHV strains was done to verify the result of the principal component analysis. The same classification into lineage A, lineage B and lineage C was obtained as with the principal component analysis (Fig. 3). As shown in Fig. 4, the global codon usage pattern was very similar among the coding regions of the three DHV lineages, except for the codons coding for Ala, Arg, Glu and Gly.

Synonymous codon usage has been well established as a means of revealing genetic information about some viral genomes (Bai et al., 2004; Jenkins and Holmes, 2003). As in other viruses, such as SARS-CoV (mean ENC = 48.99) (Gu et al., 2004) and human bocavirus (mean ENC = 44.45) (Zhao et al., 2008), the synonymous codon usage bias in DHV was low (mean ENC = 50.750, higher than 40). A low codon usage bias may be advantageous for efficient replication in vertebrate host cells, which have potentially distinct codon preferences (Das et al., 2006; Zhong et al., 2007). In this study, the highly significant correlation between (A + U)12% and (A + U)3% (Spearman r = 0.475, P < 0.01) suggests that mutational pressure, rather than natural selection, was the main factor that determined codon usage bias, since the effects are present at all codon positions (Tao et al., 2009; Zhong et al., 2007). This is also supported by the ENC-plot [ENC plotted against (G + C)3%] (Fig. 1) and by the highly significant correlation between the codon usage index (Axis 1) and A%, U%, G%, C%, (A + U)%, A3%, U3%, G3%, C3% and (A + U)3% (Table 4).

A phylogenetic tree is a branching diagram showing the inferred evolutionary relationships among biological species or other entities, based upon similarities and differences in their physical and/or genetic characteristics, and it is usually generated from the result of a sequence alignment. Similar clustering has been reported using different methods. In detail, Wang et al. (2008) reported that DHV could be classified into three genotypes according to phylogenetic trees based on the nucleotide sequences of VP1, VP0, VP3, and partial 3D. All Chinese strains in that paper were clustered into one genotype. In contrast, our study is the first to report that more than one genotype of DHV circulates in Mainland China, and phylogenetic analyses based on RSCU values may provide more detailed information for classifying genotypes of DHV. Lineage A of DHV contains only strains 90D and 04G, which were isolated from Taiwan. Recently, Tseng and Tsai reported that strains 90D and 04G were antigenically unrelated to DHV-1 by in vitro cross-neutralization assay, and that phylogenetic and evolutionary analysis revealed that the two strains belong to a novel genus in the Picornaviridae family. Moreover, the strains isolated from China and Korea have two different codon usage patterns; this finding reconfirms that more than one genotype of DHV circulated in Korea and China (Kim et al., 2007). The study also implies that differences in codon usage patterns might be used as a reference for classifying genotypes of a virus (Wang et al., 2011b).

To our knowledge, our work is the first report of codon usage analysis of DHV-1 and DHV-N.
It revealed that codon usage bias in DHV was low and that mutational pressure was the main factor affecting codon usage variation in DHV. The DHV-1 and DHV-N strains of this study were classified into three genotypes based on codon usage analysis. All three codon usage patterns existed in DHV strains isolated from East Asia. However, because sequence data and detailed information about these isolates are limited, a more comprehensive analysis is needed to reveal more information about other responsible factors within DHV.

Codon usage bias and mutation constraints reduce the level of error minimization of the genetic code
Analysis of codon usage in potato and its application in the modification of t-PA gene
Synonymous codon usage in adenoviruses: influence of mutation, selection and protein hydropathy
Molecular analysis of duck hepatitis virus type 1
Molecular detection and typing of duck hepatitis A virus directly from clinical specimens
An outbreak of duck hepatitis type II in commercial ducks
An outbreak of duck virus enteritis in commercial ducks and geese in East Anglia
Analysis of synonymous codon usage in SARS Coronavirus and other viruses in the Nidovirales
Codon usages of genes on chromosome, and surprisingly, genes in plasmid are primarily affected by strand-specific mutational biases in Lawsonia intracellularis
In vitro isolation, propagation, and characterization of duck hepatitis virus type III
Tissue tropism of recombinant coxsackieviruses in an adult mouse model
The extent of codon usage bias in human RNA viruses and its evolutionary origin
Analysis of codon usage diversity of bacterial genes with a self-organizing map (SOM): characterization of horizontally transferred genes with emphasis on the E. coli O157 genome
Contrasts in codon usage of latent versus productive genes of Epstein-Barr virus: data and hypotheses
Recent Korean isolates of duck hepatitis virus reveal the presence of a new geno- and serotype when compared to duck hepatitis virus type 1 type strains
Molecular analysis of duck hepatitis virus type 1 reveals a novel lineage close to the genus Parechovirus in the family Picornaviridae
Codon usage bias and recombination events for neuraminidase and hemagglutinin genes in Chinese isolates of influenza A virus subtype H9N2
Characteristics of the 3′ end sequence in genome of a new serotype of duck hepatitis virus isolated from China
Codon usage in regulatory genes in Escherichia coli does not reflect selection for 'rare' codons
Genomic sequence of a new serotype duck hepatitis virus
MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0
Analysis of synonymous codon usage in classical swine fever virus
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice
Molecular analysis of duck hepatitis virus type 1 indicates that it should be assigned to a new genus
Molecular characterization of a new serotype of duck hepatitis virus
Classification of duck hepatitis virus into three genotypes based on molecular evolutionary analysis
Analysis of codon usage in Newcastle disease virus
Analysis of codon usage in bovine viral diarrhea virus
Analysis of synonymous codon usage in 11 human bocavirus isolates
Mutation pressure shapes codon usage in the GC-Rich genome of foot-and-mouth disease virus
Analysis of synonymous codon usage in foot-and-mouth disease virus
Analysis of synonymous codon usage patterns in different plant mitochondrial genomes

This work was supported in part by grants from the National Science & Technology Key Project (2009ZX08007-006B), the International Science & Technology Cooperation Program of China (No. 2010DFA32640) and the Science and Technology Key Project of Gansu Province (No. 0801NKDA034). This study was also supported by the National Natural Science Foundation of China (No. 30700597 and No. 31072143).