key: cord-0000872-dncy6tdc authors: Tian, Xiao-ting; Li, Bao-yu; Zhang, Liang; Jiao, Wen-qiang; Liu, Ji-xing title: Bioinformatics analysis of rabbit haemorrhagic disease virus genome date: 2011-11-01 journal: Virol J DOI: 10.1186/1743-422x-8-494 sha: eff26d8739498efca2d32fe2e66cdbebf0569c50 doc_id: 872 cord_uid: dncy6tdc

BACKGROUND: Rabbit haemorrhagic disease virus (RHDV), the pathogen of rabbit haemorrhagic disease, causes a highly infectious and often fatal disease that affects only wild and domestic rabbits. Recent research has revealed that RHDV, a member of the Caliciviridae, has distinctive features in its genome, its replication and other respects. RESULTS: In this report, we analyzed for the first time its genome and its two open reading frames (ORFs) from the perspective of codon usage bias. Our results indicated that mutation pressure rather than natural selection is the most important determinant of the high codon bias in RHDV, and that the codon usage bias of ORF1 is nearly opposite to that of ORF2, which may be one of the factors regulating the expression of VP60 (encoded by ORF1) and VP10 (encoded by ORF2). Furthermore, negative selective constraints on the whole RHDV genome implied that VP10 plays an important role in the RHDV life cycle. CONCLUSIONS: We conjecture that VP10 might benefit the replication, the release or both of the virus by inducing the apoptosis of infected cells that RHDV initiates. According to the results of the principal component analysis of RSCU for ORF2, we separated, for the first time, 30 RHDV isolates into two genotypes, and the ENC values indicated that ORF1 and ORF2 evolved independently in RHDV.

Synonymous codons are not used randomly [1]. The variation of codon usage among ORFs in different organisms is attributed to two main factors, mutational pressure and translational selection [2, 3]. Knowledge of the levels and causes of codon usage bias helps in understanding viral evolution and the interplay between viruses and the immune response [4].
Thus, codon usage bias and nucleotide composition have been studied in great detail in many organisms, such as bacteria, yeast, Drosophila and mammals [5]. However, comparable studies in viruses, especially in animal viruses, are fewer. It has been observed that codon usage bias in human RNA viruses is related to mutational pressure, G+C content, the segmented nature of the genome and the route of transmission of the virus [6]. For some vertebrate DNA viruses, genome-wide mutational pressure, rather than natural selection for specific coding triplets, is regarded as the main determinant of codon usage [4]. Analysis of the bovine papillomavirus type 1 (BPV1) late genes has revealed a relationship between codon usage and tRNA availability [7]. In the mammalian papillomaviruses, it has been proposed that differences from the average codon usage frequencies in the host genome strongly influence both viral replication and gene expression [8]. Codon usage may play a key role in regulating latent versus productive infection in Epstein-Barr virus [9]. Recently, it was reported that codon usage is an important driving force in the evolution of astroviruses and small DNA viruses [10, 11]. Clearly, studies of synonymous codon usage in viruses can reveal much about the molecular evolution of viruses or of individual genes. Such information would be relevant to understanding the regulation of viral gene expression.

Up to now, little codon usage analysis has been performed on Rabbit haemorrhagic disease virus (RHDV), the pathogen causing rabbit haemorrhagic disease (RHD), also known as rabbit calicivirus disease (RCD) or viral haemorrhagic disease (VHD), a highly infectious and often fatal disease that affects wild and domestic rabbits. Although the virus infects only rabbits, RHD continues to cause serious problems in different parts of the world.
RHDV is a non-enveloped, single-stranded, positive-sense RNA virus. Its genome contains two open reading frames (ORFs), which separately encode a predicted polyprotein and a minor structural protein named VP10 [12]. Through cleavage by the virus-encoded 3C-like cysteine protease, the polyprotein is finally processed into 8 cleavage products, comprising 7 nonstructural proteins and 1 structural protein named VP60 [13, 14]. Studies on the phylogenetic relationships of RHDVs showed that only one serotype had been isolated, and no genotyping of RHDV has been reported. It has been reported that VP10 is translated with an efficiency of 20% of that of the preceding ORF1 [15]. To better understand the characteristics of the RHDV genome and to reveal more information about it, we analyzed its codon usage and dinucleotide composition. In this report, we sought to address the following issues concerning codon usage in RHDV: (i) the extent and causes of codon bias in RHDV; (ii) a possible genotyping of RHDV; (iii) codon usage bias as a factor reducing the expression of VP10; and (iv) the evolution of the ORFs.

The 30 available complete RNA sequences of RHDV were obtained from GenBank randomly in January 2011. The serial number (SN), collection dates, isolated areas and GenBank accession numbers are listed in Table 1.

To investigate the characteristics of synonymous codon usage without the influence of amino acid composition, the RSCU value of each codon in each ORF of RHDV was calculated according to a previous report (Sharp, Tuohy et al. 1986 [2]) with the following formula:

$$\mathrm{RSCU}_{ij} = \frac{g_{ij}}{\sum_{k=1}^{n_i} g_{ik}} \, n_i$$

where $g_{ij}$ is the observed number of the $j$th codon for the $i$th amino acid, which has $n_i$ types of synonymous codons. Codons with an RSCU value higher than 1.0 have a positive codon usage bias, while codons with a value lower than 1.0 have a relatively negative codon usage bias. An RSCU value close to 1.0 means that the codon is chosen equally and randomly among its synonyms.
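The RSCU definition above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `rscu`, the use of the standard genetic code, and the toy sequence in the usage note are all hypothetical choices made here.

```python
from collections import Counter

# Standard genetic code with bases ordered T, C, A, G ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(orf):
    """Return {codon: RSCU} for the 59 codons that have synonyms.

    `orf` is an in-frame coding sequence (RNA or DNA letters).
    Met, Trp and stop codons are excluded, as in the text.
    """
    seq = orf.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this ORF: RSCU undefined
        for codon in family:
            # g_ij / (sum over synonyms of g_ik) * n_i
            values[codon] = counts[codon] * len(family) / total
    return values
```

For a hypothetical toy ORF `"GCUGCUGCUGCC"` (four Ala codons), `rscu` gives 3.0 for GCT, 1.0 for GCC and 0.0 for GCA and GCG, so the values within one amino acid always sum to the number of its synonymous codons.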
The index GC3s is the fraction of the nucleotides G+C at the synonymous third codon position, excluding Met, Trp and the termination codons. The ENC, as the best estimator of absolute synonymous codon usage bias [16], was calculated to quantify the codon usage bias of each ORF [17]. The predicted values of ENC were calculated as

$$\mathrm{ENC} = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}}$$

where $s$ represents the given (G+C)3% value. The values of ENC can also be obtained with the EMBOSS CHIPS program [18].

Analyses were conducted with the Nei-Gojobori model [19], involving 30 nucleotide sequences. All positions containing gaps and missing data were eliminated. The values of dn, ds and ω (dn/ds) were calculated in MEGA4.0 [20].

Multivariate statistical analysis can be used to explore the relationships between variables and samples. In this study, correspondence analysis (COA) was used to investigate the major trend in codon usage variation among ORFs. The complete coding region of each ORF was represented as a 59-dimensional vector, in which each dimension corresponds to the RSCU value of one sense codon (excluding Met, Trp and the termination codons) [21].

Correlation analysis was used to identify the relationship between nucleotide composition and synonymous codon usage pattern [22]. This analysis was based on Spearman's rank correlation. All statistical analyses were carried out with SPSS 17.0 for Windows.

The nucleotide contents in the complete coding regions of all 30 RHDV genomes were analyzed and are listed in Table 2 and Table 3. Evidently, the (C+G)% content of ORF1 ranged from 50.889 to 51.557 with a mean value of 51.14557, and the (C+G)% content of ORF2 ranged from 35.593 to 40.113 with a mean value of 37.6624, indicating that nucleotides A and U were the major components of ORF2, in contrast to ORF1.
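As a rough illustration of the two quantities defined in the methods, GC3s and the predicted ENC, the following Python sketch may help. It assumes the standard genetic code and an in-frame sequence; the function names `gc3s` and `expected_enc` are hypothetical, and the study itself reports using the EMBOSS CHIPS program rather than code like this.

```python
def gc3s(orf):
    """Fraction of G+C at the third codon position.

    Met (ATG), Trp (TGG) and the three stop codons are excluded,
    following the definition given in the text.
    """
    seq = orf.upper().replace("U", "T")
    excluded = {"ATG", "TGG", "TAA", "TAG", "TGA"}
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Keep only unambiguous codons that have synonymous alternatives.
    used = [c for c in codons if c not in excluded and set(c) <= set("ACGT")]
    if not used:
        raise ValueError("no codons with synonymous alternatives found")
    return sum(c[2] in "GC" for c in used) / len(used)


def expected_enc(s):
    """Predicted ENC when codon usage is shaped only by GC3s composition.

    `s` is GC3s expressed as a fraction between 0 and 1.
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be a fraction between 0 and 1")
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)
```

With these definitions, `expected_enc(0.5)` evaluates to 60.5, the top of the expected curve, and the value falls towards 31 and 32 as `s` approaches 0 and 1. An ORF whose observed ENC lies well below `expected_enc(gc3s(orf))` shows more bias than its GC3s composition alone would predict.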
Comparing the values of A3%, U3%, C3% and G3%, it is clear that C3% was distinctly high and A3% was the lowest in ORF1 of RHDV, while U3% was distinctly high and C3% was the lowest in ORF2 of RHDV.

Table 2 Identified nucleotide contents in complete coding region (length > 250 bps) in the ORF1 of RHDV (30 isolates) genome

The most preferentially used codons in ORF1 were C-ended or G-ended codons, except for Ala, Pro and Ser, whereas A-ended or G-ended codons were preferred in ORF2 (Table 4). In addition, the dn, ds and ω (dn/ds) values of ORF1 were 0.014, 0.338 and 0.041, and those of ORF2 were 0.034, 0.103 and 0.034, respectively. The ω values of the two ORFs in the RHDV genome are generally low, indicating that the whole RHDV genome is subject to relatively strong selective constraints.

COA was used to investigate the major trend in codon usage variation between the two ORFs of all 30 RHDV isolates selected for this study. COA of the RHDV genome revealed one major trend in the first axis (f'1), which accounted for 42.967% of the total variation, and another in the second axis (f'2), which accounted for 3.632% of the total variation. The coordinates of the complete coding region of each ORF, defined by the first and second principal axes, were plotted in Figure 1. It is clear that the coordinates of each ORF are relatively isolated. Interestingly, we found that the relatively isolated spots from ORF2 tend to cluster into two groups: the ordinate value of one group (marked as Group 1) is positive and that of the other (marked as Group 2) is negative.

To estimate whether the evolution of codon usage in the RHDV genome was governed by mutation pressure or natural selection, the A%, U%, C%, G% and (C+G)% were compared with A3%, U3%, C3%, G3% and (C3+G3)%, respectively (Table 5). There is a complex correlation among nucleotide compositions.
In detail, A3%, U3%, C3% and G3% have a significant negative correlation with G%, C%, U% and A%, and a positive correlation with A%, U%, C% and G%, respectively. This suggests that nucleotide constraint may influence synonymous codon usage patterns. However, A3% has no correlation with U% and C%, and U3% has no correlation with A% and G%, which does not indicate any peculiarity of synonymous codon usage. Furthermore, C3% and G3% have no correlation with A% and G% and with U% and C%, respectively, indicating that these data do not reflect the true features of synonymous codon usage either. Therefore, linear regression analysis was implemented to analyze the correlation between synonymous codon usage bias and nucleotide composition. Details of the correlation analysis between the first two principal axes (f'1 and f'2) of each RHDV genome in COA and the nucleotide contents are listed in Table 6. Surprisingly, only the f'2 values are closely related to the A and G contents at the third codon position, suggesting that A and G content is a factor influencing the synonymous codon usage pattern of the RHDV genome. However, the f'1 value has no correlation with the nucleotide contents at the third codon position, which clearly suggests that codon usage patterns in RHDV were probably also influenced by other factors, such as the secondary structure of the viral genome and host constraints. Nevertheless, compositional constraint is a factor shaping the pattern of synonymous codon usage in the RHDV genome.

Figure 1 A plot of the values of the first and second axes of the RHDV genome in COA. The first axis (f'1) accounts for 42.967% of the total variation, and the second axis (f'2) accounts for 3.632% of the total variation.
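The rank correlation underlying the comparisons of nucleotide contents and COA axes can be sketched in plain Python. The authors used SPSS 17.0; the version below is only an illustrative stand-in, with hypothetical function names (`average_ranks`, `spearman_rho`), and it returns the coefficient alone without a p value.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

In use, `x` could hold, for example, the A3% value of each isolate and `y` the corresponding f'2 coordinate; a coefficient near +1 or -1 indicates a strong monotonic relationship, while a value near 0 indicates none.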
Table 5 Summary of correlation analysis between the A, U, C, G contents and A3, U3, C3, G3 contents in all selected samples

A growing number of features have been found to be unique to RHDV within the family Caliciviridae, including its single host tropism, its genome and its VP10, a structural protein with unknown function. After analyzing synonymous codon usage in RHDV (Table 2), we reached several conclusions and conjectures, as follows.

4.1 Mutational bias as a main factor leading to synonymous codon usage variation

The ENC-plot, as a general strategy, was used to investigate patterns of synonymous codon usage. The ENC-plots of ORFs constrained only by C3+G3 composition will lie on or just below the curve of the predicted values [18]. The ENC values of the RHDV genomes were plotted against their corresponding (C3+G3)%. All of the spots lie below the curve of the predicted values, as shown in Figure 2, suggesting that the codon usage bias in all 30 RHDV genomes is principally influenced by mutational bias.

Figure 2 Effective number of codons used in each ORF plotted against the GC3s. The continuous curve plots the relationship between GC3s and ENC in the absence of selection. All of the spots lie below the expected curve.

As is well known, the efficiency of gene expression is influenced by regulatory sequences or elements and by codon usage bias. It has been reported that the RNA sequence of the 3'-terminal 84 nucleotides of ORF1, rather than the encoded peptide, is crucial for VP10 expression. VP10, encoded by ORF2, has been reported to be a structural protein expressed at a low level compared with VP60, encoded by ORF1 [5], and its translation efficiency is only 20% of that of VP60. The results shown in Table 4 revealed differences in the codon usage patterns of the two ORFs, which is a possible factor reducing the expression of VP10.

Although VP10, a minor structural protein with unknown functions encoded by ORF2, has been described by Liu as nonessential for virus infectivity, the ω value of ORF2 suggests that VP10 plays an important role at a certain stage of the RHDV life cycle. Taking the low expression and the ω value of VP10 together, we conjectured that VP10 might benefit the replication, the release or both of the virus by inducing the apoptosis of infected cells that RHDV initiates. This mechanism has been confirmed in various positive-strand RNA viruses, including coxsackievirus, dengue virus, equine arterivirus, foot-and-mouth disease virus, hepatitis C virus, poliovirus, rhinovirus and severe acute respiratory syndrome coronavirus [23] [24] [25] [26] [27] [28] [29], although the details remain elusive.

As described above, ENC reflects the evolution of codon usage variation and nucleotide composition to some degree. In the correlation analysis of the ENC values of ORF1 and ORF2 (Table 7), the correlation coefficient is 0.230 and the p value is 0.222, which is greater than 0.05. These data revealed no correlation between the ENC values of the two ORFs, indicating that the codon usage patterns and the evolution of the two ORFs are independent of each other. This may also help explain why the RSCU and ENC values of the two ORFs are quite different.

Interestingly, we found that the relatively isolated spots from ORF2 tend to cluster into two groups: the ordinate value of one group (marked as Group 1) is positive and that of the other (marked as Group 2) is negative. All of the strains isolated before 2000 belonged to Group 2, including Italy-90, RHDV-V351, RHDV-FRG, BS89, RHDV-SD and M67473.1. Although RHDV has been reported to have only one type, this may serve as a reference for dividing it into two genotypes.

In this report, we analyzed for the first time its genome and its two open reading frames (ORFs) from the perspective of codon usage bias.
Our results indicated that mutation pressure rather than natural selection is the most important determinant of the high codon bias in RHDV, and that the codon usage bias of ORF1 is nearly opposite to that of ORF2, which may be one of the factors regulating the expression of VP60 (encoded by ORF1) and VP10 (encoded by ORF2). Furthermore, negative selective constraints on the whole RHDV genome implied that VP10 plays an important role in the RHDV life cycle. We conjecture that VP10 might benefit the replication, the release or both of the virus by inducing the apoptosis of infected cells that RHDV initiates. According to the results of the principal component analysis of RSCU for ORF2, we separated, for the first time, 30 RHDV isolates into two genotypes, and the ENC values indicated that ORF1 and ORF2 evolved independently in RHDV. These results will serve as a reference for further research on RHDV.

1. What drives codon choices in human genes?
2. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes
3. Ribosome traffic in E. coli and regulation of gene expression
4. Evolutionary basis of codon usage and nucleotide composition bias in vertebrate DNA viruses
5. The evolution of base composition and phylogenetic inference
6. The extent of codon usage bias in human RNA viruses and its evolutionary origin
7. Papillomavirus capsid protein expression level depends on the match between codon usage and tRNA availability
8. Codon usage bias and A+T content variation in human papillomavirus genomes
9. Contrasts in codon usage of latent versus productive genes of Epstein-Barr virus: data and hypotheses
10. Compositional bias and size of genomes of human DNA viruses
11. Host-related nucleotide composition and codon usage as driving forces in the recent evolution of the Astroviridae
12. Genetic map of the calicivirus rabbit hemorrhagic disease virus as deduced from in vitro translation studies
13. 3C-like protease of rabbit hemorrhagic disease virus: identification of cleavage sites in the ORF1 polyprotein and analysis of cleavage specificity
14. Rabbit hemorrhagic disease virus: genome organization and polyprotein processing of a calicivirus studied after transient expression of cDNA constructs
15. Translation of the minor capsid protein of a calicivirus is initiated by a novel termination-dependent reinitiation mechanism
16. An evaluation of measures of synonymous codon usage bias
17. The "effective number of codons" used in a gene
18. Analysis of Synonymous Codon Usage Bias in Chlamydia
19. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions
20. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0
21. Multivariate analysis. New York
22. Statistical Methods in Bioinformatics. New York
23. Expression of Hepatitis C virus proteins induced distinct membrane alterations including a candidate viral replication complex
24. Involvement of autophagy in viral infections: antiviral function and subversion by viruses
25. Subversion of cellular autophagosomal machinery by RNA viruses
26. Autophagic machinery activated by dengue virus enhances virus replication
27. Open reading frame 1a-encoded subunits of the arterivirus replicase induce endoplasmic reticulum derived double-membrane vesicles which carry the viral replication complex
28. Aggresomes and autophagy generate sites for virus replication
29. Autophagosome supports Coxsackievirus B3 replication in host cells

This work was supported by the fund of Special Social Commonweal Research Programs for Research Institutions (2005DIB4J041, China). XTT and BYL contributed equally to the original draft of the manuscript, and approved the final version. ZL and WQJ contributed to conception and design of the manuscript, and revised the manuscript. LJX is the corresponding author. All authors have read and approved the final manuscript. The authors declare that they have no competing interests.