key: cord-284866-66azyje4
authors: D’ Andrea, Lucía; Pintó, Rosa M.; Bosch, Albert; Musto, Héctor; Cristina, Juan
title: A detailed comparative analysis on the overall codon usage patterns in Hepatitis A virus
date: 2011-02-04
journal: Virus Res
DOI: 10.1016/j.virusres.2011.01.012
sha: 
doc_id: 284866
cord_uid: 66azyje4

Hepatitis A virus (HAV) is a hepatotropic member of the family Picornaviridae. HAV has several unique biological characteristics that distinguish it from other members of this family. Recent and previous studies revealed that codon usage plays a key role in HAV replication and evolution. In this study, the patterns of synonymous codon usage in HAV have been studied through multivariate statistical methods on 30 complete open reading frames (ORFs) from the available 30 full-length HAV sequences. Effective number of codons (ENC) indicates that the overall extent of codon usage bias in HAV genomes is significant. The relative dinucleotide abundances suggest that codon usage in HAV can also be strongly influenced by underlying biases in dinucleotide frequencies. These factors strongly correlated with the first major axis of correspondence analysis (COA) on relative synonymous codon usage (RSCU). The distribution of the HAV ORFs along the plane defined by the first two major axes in COA showed that different genotypes are located at different places in the plane, suggesting that HAV codon usage is also reflecting an evolutionary process. It has been very recently described that fine-tuning translation kinetics selection also contributes to codon usage bias of HAV. The results of these studies suggest that HAV genomic biases are the result of the co-evolution of genome composition, controlled translation kinetics and probably the ability to escape the antiviral cell responses.

Due to the degeneracy of the genetic code, most amino acids are coded by more than one codon (synonymous codon usage). These synonymous codons are not used randomly. Rather, there are some codons that are used more frequently than others. Mutational pressure and translational selection are thought to be among the main factors that account for codon usage variation among genes in different organisms (Sharp and Li, 1986a; Karlin and Mrazek, 1996; Lesnik et al., 2000) .

Understanding the extent and causes of biases in codon usage is essential to the comprehension of viral evolution, particularly the interplay between viruses and the immune response (Shackelton et al., 2006) .

Hepatitis A virus (HAV) is a hepatotropic member of the family Picornaviridae (Wimmer and Murdin, 1991), and its genome consists of a 7.5-kilobase (kb), positive-stranded RNA with a single open reading frame (ORF). The ORF, which encodes 2227 amino acids, is organized into three functional regions termed P1, P2 and P3. P1 encodes the capsid polypeptides VP1 to VP4, whereas P2 and P3 encode non-structural polypeptides. The ORF is preceded by a 5′ untranslated region (UTR) and followed by a 3′ UTR with a short poly(A) tail (Hollinger and Emerson, 2001).

The structure of HAV, its tissue tropism, and genetic distance from other members of the family Picornaviridae indicate that HAV is unique within this family (Martin and Lemon, 2006; Cristina and Costa-Mattioli, 2007) .

HAV has been shown to possess a single conserved immunogenic neutralization site, and isolates from different parts of the world belong to a single serotype (Stapleton and Lemon, 1987; Hollinger and Emerson, 2001) . Nevertheless, the study of the HAV evolution in cell culture revealed the presence of some antigenic variants in the mutant spectra that were generated even in the absence of immune selection (Sanchez et al., 2003) . Furthermore, several escape mutants, representing antigenic variants, have been selected for their resistance to different monoclonal antibodies (MAbs), suggesting the occurrence of severe structural constraints in the HAV capsid that prevent the more extensive substitutions necessary for the emergence of a new serotype (Nainan et al., 1992; Ping and Lemon, 1992) .

Very recent in vitro studies have shown that highly conserved clusters of rare codons occur in the HAV capsid-coding region and that substitutions in these clusters are negatively selected, suggesting that the need to maintain such clusters plays a role in the low antigenic variability of HAV (Aragones et al., 2008). Moreover, recent studies suggest that fine-tuning translation kinetics selection also underlies codon usage bias in this genome region (Aragones et al., 2010). These results reveal that codon usage plays a key role in HAV replication and evolution. However, little is known about the other factors that also shape synonymous codon usage bias and nucleotide composition in human HAV in vivo. To gain insight into these matters, we analyzed the codon usage and base composition of all available ORFs from 30 human HAV isolates and investigated the possible key evolutionary determinants of codon usage bias.

Full-length ORF nucleotide sequences (corresponding to 2227 amino acids) were obtained for 30 human HAV isolates by means of ARSA at the DDBJ database (available at: http://arsa.ddbj.nig.ac.jp/) and the EMBL database (available at: http://www.ebi.ac.uk/embl/Access/index.html). For strain names, accession numbers, geographic location of isolation and genotypes, see Supplementary Material Table 1.

To investigate the extent of codon usage bias in HAV, we first aligned the complete ORF coding sequences of the HAV strains using the MUSCLE program (Edgar, 2004). Once aligned, the relative synonymous codon usage (RSCU) value of each codon was determined in order to measure synonymous codon usage (Sharp and Li, 1986b). This was done using the CodonW program (available at: http://mobyle.pasteur.fr). The RSCU is the observed frequency of a codon divided by the frequency expected if all synonymous codons for that amino acid were used equally; an RSCU value close to 1.0 indicates a lack of bias (Tsai et al., 2007). RSCU values are largely independent of amino acid composition and are particularly useful for comparing codon usage between genes that differ in size and amino acid composition. The RSCU values of the HAV ORFs were compared with the corresponding values for human cells (International Human Genome Sequencing Consortium, 2001). The effective number of codons (ENC) and GC3S (the frequency of G+C at synonymous third codon positions, excluding Met, Trp and termination codons) were also calculated with CodonW. ENC, one of the best overall estimators of absolute synonymous codon usage bias (Comeron and Aguade, 1998), was used to quantify the codon usage bias of each ORF (Wright, 1990). ENC values range from 20 to 61: the stronger the codon bias in a gene, the smaller the ENC value. In an extremely biased gene where only one codon is used for each amino acid, the value would be 20; in an unbiased gene, it would be 61. Similarly, the G+C content at the first and second codon positions (GC12) was calculated, also with CodonW. The relative frequencies of dinucleotides were calculated using the same program as implemented in the Mobyle server (http://mobyle.pasteur.fr).
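In this study the indices above were computed with CodonW. Purely as an illustration, the following Python sketch shows how RSCU, GC3S and GC12 can be derived from a single ORF. It assumes an in-frame RNA (or DNA) string starting at the first codon, uses the standard genetic code, skips Met, Trp and stop codons as the text describes, and uses hypothetical helper names (`rscu`, `gc3s`, `gc12`); CodonW's handling of edge cases may differ.

```python
from collections import Counter
from itertools import product

# Standard genetic code; codons enumerated in UCAG order (first base varies slowest).
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AA)}

# Group codons into synonymous families (one list of codons per amino acid).
FAMILIES = {}
for codon, aa in CODE.items():
    FAMILIES.setdefault(aa, []).append(codon)


def codons(seq):
    """Split an in-frame ORF into codons (DNA input is converted to RNA)."""
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def rscu(seq):
    """RSCU = observed codon count / count expected under equal synonymous use."""
    counts = Counter(codons(seq))
    values = {}
    for aa, family in FAMILIES.items():
        if aa in "*MW":  # skip stop codons and the single-codon amino acids
            continue
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the ORF: RSCU undefined
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


def gc3s(seq):
    """G+C at synonymous third positions (Met, Trp and stop codons excluded)."""
    thirds = [c[2] for c in codons(seq) if CODE.get(c, "*") not in "*MW"]
    return sum(b in "GC" for b in thirds) / len(thirds)


def gc12(seq):
    """G+C at the first and second positions of all sense codons."""
    sense = [c for c in codons(seq) if CODE.get(c, "*") != "*"]
    return sum(b in "GC" for c in sense for b in c[:2]) / (2 * len(sense))
```

For example, in the toy ORF `AUGUUUUUUUUCGCCGCA`, two of the three Phe codons are UUU, so `rscu(...)["UUU"]` is 2 × 2 / 3 ≈ 1.33, above the unbiased value of 1.0.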

Correspondence analysis (COA) is an ordination technique that identifies the major trends in the variation of the data and distributes genes along continuous axes in accordance with these trends. COA creates a series of orthogonal axes to identify trends that explain the data variation, with each subsequent axis explaining a decreasing amount of the variation (Greenacre, 1984). Each ORF is represented as a 59-dimensional vector, each dimension corresponding to the RSCU value of one triplet (excluding AUG, UGG and the three stop codons). This analysis was also done using the CodonW program.
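CodonW performs the COA internally. For readers who want to reproduce the ordination elsewhere, the sketch below applies textbook correspondence analysis (Greenacre, 1984) to an ORF × codon table, such as the 30 × 59 RSCU matrix described above, through a singular value decomposition with NumPy. The function name is hypothetical, rows with a zero total are assumed not to occur, and scaling conventions may differ from CodonW's implementation.

```python
import numpy as np


def correspondence_analysis(table):
    """Correspondence analysis of a non-negative (rows x columns) table.

    Returns (row_coords, col_coords, inertia_fraction): principal coordinates
    of rows (ORFs) and of the retained columns (codons) on every axis, with
    axes ordered by decreasing share of the total inertia (variation).
    """
    N = np.asarray(table, dtype=float)
    N = N[:, N.sum(axis=0) > 0]             # drop codon columns that are always zero
    P = N / N.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses (ORFs)
    c = P.sum(axis=0)                        # column masses (codons)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = U * sv / np.sqrt(r)[:, None]
    col_coords = Vt.T * sv / np.sqrt(c)[:, None]
    inertia = sv ** 2
    return row_coords, col_coords, inertia / inertia.sum()
```

Passing the RSCU matrix, the first two columns of `row_coords` give an ORF map of the kind shown in Fig. 1, the entries of `inertia_fraction` give the share of variation per axis, and the extreme values in each column of `col_coords` identify the most divergent codons on that axis.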

Correlation analysis was carried out using Spearman's rank correlation method (Wessa, 2010; available at: www.wessa.net).
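The correlations were computed with the Wessa.net service. As an offline illustration only, Spearman's rank correlation coefficient can be obtained as the Pearson correlation of average ranks, which handles tied values. The helper names below are hypothetical, and the sketch returns only the coefficient, not the p-value on which the significance statements in the text rely.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

In this setting, `x` would hold the first-axis COA coordinates of the 30 ORFs and `y` their GC3S (or GC12, or dinucleotide abundance) values.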

In order to study the extent of codon bias in HAV ORFs, the average codon usage values for all triplets were calculated. The results of these studies are shown in Table 1 .

Interestingly, codon usage frequencies in HAV ORFs differ significantly from those of human cells. In particular, extremely biased codon frequencies were found for Phe, His, Asn, Asp, Cys and Arg, and almost all strongly preferred codons were U-ended (see Table 1). Since almost all ENC values are below 40 (Table 2), the results obtained for the HAV ORFs studied reveal that codon usage in HAV is biased.

To investigate the patterns of synonymous codon usage, we analyzed the correlations between the positions of the ORFs along the first principal axis generated by the COA and the respective GC3S and GC12 values of each strain. The first principal axis of the COA accounts for 45.34% of the total variation, while the next three principal axes account for 10.14%, 8.63% and 6.56% of the variability, respectively. The first axis is highly correlated with the GC3S and GC12 values of the HAV ORFs, revealing that nucleotide composition plays a key role in the codon usage bias observed in HAV ORFs (see Table 2).

To detect possible variation in codon usage among HAV genomes, the HAV ORFs were grouped according to genotype (IA, IB, IIA, IIB, IIIA and IIIB). COA was performed on the RSCU values of each HAV ORF, and the distribution of the six genotypes along the first two principal axes was determined. The results are shown in Fig. 1.

Surprisingly, the distribution of the six genetic groups in the plane defined by the first two major axes showed that different genotypes were located at different places, suggesting that different HAV genotypes exhibit differences in their codon usage patterns (see Fig. 1 ).

To gain insight into these findings, the average codon usage values for all codons were calculated for the genotype IA and IIIA strains included in this study, comprising 24,416 and 31,108 codons, respectively (see Supplementary Material Table 3). Interestingly, the two genotypes differ significantly in the frequencies of the CCA (Pro) and CGC (Arg) codons.
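The text does not state how codon counts were pooled per genotype or which test was used to call the IA versus IIIA differences significant. As an illustrative sketch only, the code below pools in-frame codon counts over the ORFs of one group, expresses them per thousand codons, and computes a Pearson chi-square statistic for one synonymous family (for example, the Pro codons CCU, CCC, CCA and CCG) between two groups. The chi-square test is an assumed stand-in rather than the study's method, and all function names are hypothetical.

```python
from collections import Counter


def pooled_codon_counts(orfs):
    """Pool in-frame codon counts over a list of ORF sequences."""
    total = Counter()
    for seq in orfs:
        seq = seq.upper().replace("T", "U")
        total.update(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return total


def per_thousand(counts):
    """Codon frequencies per 1000 codons."""
    n = sum(counts.values())
    return {c: 1000 * k / n for c, k in counts.items()}


def family_chi_square(counts_a, counts_b, family):
    """Pearson chi-square for a 2 x k table (group x synonymous codon).

    Returns the statistic and its degrees of freedom; codons unobserved in
    both groups are left out of the degrees of freedom.
    """
    rows = [[counts_a[c] for c in family], [counts_b[c] for c in family]]
    row_tot = [sum(r) for r in rows]
    col_tot = [a + b for a, b in zip(*rows)]
    grand = sum(row_tot)
    chi2 = 0.0
    for i, r in enumerate(rows):
        for j, obs in enumerate(r):
            exp = row_tot[i] * col_tot[j] / grand
            if exp > 0:
                chi2 += (obs - exp) ** 2 / exp
    return chi2, sum(t > 0 for t in col_tot) - 1
```

Comparing within a synonymous family, rather than across all 64 codons, keeps differences in amino acid composition between genotypes from masquerading as codon preference.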

To determine whether codon usage frequencies differ among HAV genome regions, the same analyses were repeated for the structural (P1) and non-structural (P2 + P3) regions of the HAV genome (see Supplementary Material Table 4). Roughly similar values were obtained for both regions for most codons, although significant differences were found for the Arg codons CGC and AGG.

It has been suggested that dinucleotide biases can affect codon bias (Tao et al., 2009) . To study the possible effect of dinucleotide composition on codon usage of the HAV ORFs, the relative abundances of the 16 dinucleotides in the ORFs of the 30 HAV strains were established. The results of these analyses are shown in Table 3 .

Dinucleotides are not randomly distributed, and none of them occurs at its expected frequency (Table 3). The relative abundance of CpG deviated strongly from the "normal range" (mean ± SD = 0.063 ± 0.009), being markedly underrepresented. On the other hand, UpU was present above its expected frequency (mean ± SD = 1.891 ± 0.051) (Table 3). Among the 16 dinucleotides, 14 are highly correlated with the first-axis values in COA (Table 4). These observations indicate that dinucleotide composition also plays a key role in the variation of synonymous codon usage among HAV ORFs.
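The relative abundance of a dinucleotide XY is usually computed as the odds ratio ρXY = fXY / (fX fY), where fXY is the frequency of XY among all adjacent base pairs and fX, fY are mononucleotide frequencies; ρ near 1 means that XY occurs as often as base composition alone predicts. The Python sketch below computes ρ for all 16 dinucleotides on one RNA strand. It is an illustration, not the program used in the study, and the cut-offs in `classify` (0.78 and 1.23) are an assumption based on commonly used limits for the "normal range", since the text does not state them.

```python
from collections import Counter
from itertools import product

BASES = "ACGU"


def dinucleotide_odds_ratios(seq):
    """rho_XY = f(XY) / (f(X) * f(Y)) for the 16 dinucleotides of one strand."""
    seq = seq.upper().replace("T", "U")
    mono = Counter(b for b in seq if b in BASES)        # ambiguous bases ignored
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono = sum(mono.values())
    n_di = sum(di[x + y] for x, y in product(BASES, repeat=2))
    rho = {}
    for x, y in product(BASES, repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono)
        rho[x + y] = (di[x + y] / n_di) / expected if expected else float("nan")
    return rho


def classify(rho, low=0.78, high=1.23):
    """Label each dinucleotide as under-, over- or normally represented (assumed cut-offs)."""
    return {d: "under" if v < low else "over" if v > high else "normal"
            for d, v in rho.items()}
```

With these definitions, a CpG value such as the reported mean of 0.063 falls far below the lower cut-off, whereas the reported UpU mean of 1.891 lies above the upper one.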

To study the possible effects of CpG under-representation on codon usage bias in HAV ORFs, the RSCU values of the eight codons that contain CpG (CCG, GCG, UCG, ACG, CGC, CGG, CGU and CGA) were analyzed. All eight codons were markedly suppressed: CCG (mean RSCU 0.05), GCG (0.03), UCG (0.11), ACG (0.10), CGC (0.09), CGG (0.03), CGU (0.23) and CGA (0.13).
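Which codons carry an internal CpG can be checked mechanically: a codon contains CpG when C is immediately followed by G at positions 1-2 or 2-3. The short Python sketch below enumerates these codons and averages one codon's RSCU over several ORFs. It assumes per-ORF RSCU values are available as dictionaries (for instance, exported from CodonW), and `mean_rscu` is a hypothetical helper. CpG dinucleotides that span two adjacent codons (a third-position C followed by a first-position G) are not captured by this codon-level view.

```python
from itertools import product

# All 64 codons, written in the RNA alphabet used in the text.
CODONS = ["".join(c) for c in product("UCAG", repeat=3)]

# A codon contains CpG if "CG" appears at positions 1-2 or 2-3.
CPG_CODONS = [c for c in CODONS if "CG" in c]


def mean_rscu(rscu_tables, codon):
    """Average RSCU of one codon over several ORFs (dicts mapping codon -> RSCU)."""
    vals = [t[codon] for t in rscu_tables if codon in t]
    return sum(vals) / len(vals) if vals else float("nan")
```

The enumeration yields exactly the eight codons listed above: four of the form NCG and four of the form CGN, with no overlap between the two sets.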

In addition, the position of each codon on each of the first four major axes of the COA was determined for the 30 HAV ORFs. Table 5 shows the codons with the maximum and minimum values on each axis (i.e., the most divergent codon values), indicating bias in their use by HAV. As can be seen in the table, most of the divergent codons are triplets coding for Arg.

To determine whether dinucleotide frequencies vary among genotypes, the same analyses were repeated using the genotype IA and IIIA strains (see Supplementary Material Table 5). No significant differences were observed between the two genotypes, or relative to the full set of 30 HAV strains representing all known HAV genotypes. A similar comparison of dinucleotide frequencies in the structural and non-structural regions of the HAV genome also found no significant differences among regions (see Supplementary Material Table 6).

The results of these studies revealed that codon usage in HAV ORFs is quite different from that of human genes (see Table 1). This is in agreement with previous results for the capsid structural region of HAV (Sanchez et al., 2003). In other members of the family Picornaviridae, such as poliovirus or foot-and-mouth disease virus (FMDV), codon usage is very similar to that of their hosts, implying competition for tRNAs between virus and host (Sanchez et al., 2003). In these viruses, competition is avoided by inducing a cellular shutoff of protein synthesis through cleavage of translation initiation factor 4G (eIF4G) by the 2A and L proteases, respectively (Racaniello, 2001). HAV lacks mechanisms for inducing cellular shutoff and needs an intact eIF4G factor for the initiation of translation (Racaniello, 2001; Ali et al., 2001). Moreover, HAV has a very inefficient IRES (Whetter et al., 1994). For these reasons, HAV may synthesize its proteins by adapting its codon usage to the less commonly used cellular tRNAs. This may also account for its low replicative rate (Pinto et al., 2007; Moratorio et al., 2007).

In this study, we analyzed synonymous codon usage and nucleotide compositional constraints in HAV ORFs. Interestingly, in contrast to previous results for other viruses such as H5N1 influenza A virus (mean ENC = 50.91) (Ahn et al., 2006; Zhou et al., 2005), SARS (mean ENC = 48.99) (Zhao et al., 2008), foot-and-mouth disease virus (mean ENC = 51.42) (Zhong et al., 2007), classical swine fever virus (mean ENC = 51.7) (Tao et al., 2009) and duck enteritis virus (mean ENC = 52.17) (Jia et al., 2009), the ENC values found for human HAV are comparatively low (mean ENC = 39.78), indicating that the overall extent of codon usage bias in HAV is significant. This is in agreement with recent in vitro studies on HAV capsid variability constraints (Aragones et al., 2008).
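ENC values in this study were obtained with CodonW. To make the index concrete, the Python sketch below follows Wright's (1990) formulation. For each amino acid with n codons observed, a homozygosity F = (n Σp² - 1)/(n - 1) is computed from its codon proportions p. F is then averaged within the 2-, 3-, 4- and 6-fold degenerate classes of the standard code, and ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61. When Ile (the only 3-fold family) is absent, F3 is taken as the mean of F2 and F4. `expected_enc` gives Wright's expected ENC under compositional constraint alone, 2 + s + 29/(s² + (1 - s)²) with s = GC3S, which is the reference used when asking whether factors beyond mutational bias contribute. Both helpers are hypothetical and omit CodonW's refinements.

```python
from collections import Counter
from itertools import product

# Standard genetic code; codons enumerated in UCAG order.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AA)}
FAMILIES = {}
for codon, aa in CODE.items():
    if aa != "*":  # stop codons are not part of any synonymous family here
        FAMILIES.setdefault(aa, []).append(codon)


def enc(seq):
    """Effective number of codons (Wright, 1990) of one in-frame ORF."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    per_class = {}  # degeneracy class (2, 3, 4 or 6) -> list of F values
    for family in FAMILIES.values():
        k, n = len(family), sum(counts[c] for c in family)
        if k == 1 or n < 2:
            continue  # Met/Trp carry no information; F needs n >= 2
        p2 = sum((counts[c] / n) ** 2 for c in family)
        per_class.setdefault(k, []).append((n * p2 - 1) / (n - 1))
    F = {k: sum(v) / len(v) for k, v in per_class.items()}
    if 3 not in F and 2 in F and 4 in F:
        F[3] = (F[2] + F[4]) / 2  # Ile missing: average of the 2- and 4-fold classes
    if any(F.get(k, 0) <= 0 for k in (2, 3, 4, 6)):
        raise ValueError("ORF too short or too degenerate to estimate every F")
    nc = 2 + 9 / F[2] + 1 / F[3] + 5 / F[4] + 3 / F[6]
    return min(nc, 61.0)


def expected_enc(gc3s):
    """Wright's expected ENC if codon usage depended on GC3S alone."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)
```

A gene using a single codon per amino acid gives ENC = 20, and evenly used synonymous codons push ENC to the cap of 61, matching the range stated in the methods. Because `expected_enc` is at most about 61 across the whole GC3S range, an observed mean near 40 lies well below the curve for any composition.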

A general correlation between codon usage bias and base composition was observed in these studies. Moreover, highly significant correlations between the first axis of COA and the GC3S and GC12 values were obtained for all HAV ORFs studied. These results suggest that mutational pressure contributes significantly to codon usage bias in HAV strains. Nevertheless, as previously suggested for other viral systems, when there is a significant distance between the expected and actual ENC values, factors other than mutational bias may also be contributing to codon usage bias (Shackelton et al., 2006). This is in agreement with very recent studies on HAV populations adapted to propagate in cells with impaired protein synthesis, in which fine-tuning translation kinetics selection, rather than translation selection, was identified as the underlying mechanism of codon usage bias in the capsid-coding region (Aragones et al., 2010). Thus, both mutation pressure and selection pressure for correct protein folding play a critical role in shaping HAV codon usage, indicating that HAV genomic bias is multifactorial.

Fig. 1. Positions of the 30 HAV ORFs in the plane defined by the first two major axes of the correspondence analysis (COA) of relative synonymous codon usage (RSCU) values. The first and second axes account for 45.34% and 10.14% of the total variation, respectively. The HAV ORFs are grouped by genotype: genotype IA strains are indicated by white circles, IB by white squares, IIA by black circles, IIB by black squares, IIIA by black diamonds and IIIB by black triangles.

To detect possible variation in codon usage among different genomes, the HAV ORFs were divided according to their genotype.
Unexpectedly, the distribution of the six genetic groups along the first two major axes of the COA showed that different genotypes are located far apart in the plane defined by these axes (Fig. 1). Moreover, the codon usage frequencies of genotypes IA and IIIA differ significantly for Pro and Arg codons (see Supplementary Material Table 3). Since species with a close genetic relationship always present a similar codon usage pattern (Sharp et al., 1988), these findings suggest that HAV codon usage is also undergoing an evolutionary process, probably reflecting a dynamic process of mutation and selection that re-adapts its codon usage to different environments (see Fig. 1 and Supplementary Material Table 3).

The structural and non-structural regions of the genome share roughly the same codon usage frequencies, except for some of the Arg codons (see Supplementary Material Table 4). This is in agreement with the COA, where most of the divergent codons were triplets coding for Arg (see Table 5). This reveals that the use of Arg codons also plays a role in the evolution and variability observed among HAV strains.

Dinucleotide frequencies were not randomly distributed, and most dinucleotides did not occur at their expected frequencies in HAV ORFs (Table 3). The high correlation between the first axis of COA and the relative dinucleotide abundances (Table 4) suggests that codon usage in HAV ORFs can also be strongly influenced by underlying biases in dinucleotide frequencies. All CpG-containing codons are markedly suppressed (Table 3) in the 30 HAV strains included in the study, confirming what has been very recently noted. Marked CpG deficiency has also been observed in coronaviruses (Woo et al., 2007), vertebrate-infecting members of the family Flaviviridae (Lobo et al., 2009), poliovirus (Rothberg and Wimmer, 1981) and other RNA viruses (Karlin et al., 1994). Moreover, polioviruses synthetically deoptimized by either codon deoptimization or codon pair deoptimization are generally marked by a higher content of CpG (and also UpA) dinucleotides (Burns et al., 2006, 2009; Mueller et al., 2006; Coleman et al., 2008), indicating that polioviruses have naturally evolved to eliminate these dinucleotides. CpG deficiency has been proposed to be related to the immunostimulatory properties of unmethylated CpG, which is recognized by the host's innate immune system as a pathogen signature (Shackelton et al., 2006; Woo et al., 2007). Escaping the host antiviral response may act as another selective pressure contributing to the multifactorial shaping of codon usage (Vetsigian and Goldenfeld, 2009).

Thus, the results of these studies suggest that HAV genomic biases result from the co-evolution of genome composition, the need for controlled translation kinetics and probably the need to escape antiviral cell responses. This model is in agreement with the genome rhetoric theory proposed by Vetsigian and Goldenfeld (2009), in which genome biases emerge from the need to increase communication with the ever-changing cell environment without changing the message.

Genomic analysis of Influenza A viruses, including Avian Flu (H5N1) strains

Activity of the Hepatitis A virus IRES requires association between the Cap-binding translation initiation factor (eIF4E) and eIF4G

Fine-tuning translation kinetics selection as the driving force of codon usage bias in the Hepatitis A virus capsid

Hepatitis A virus mutant spectra under the selective pressure of monoclonal antibodies: codon usage constraints limit capsid variability

Coding biases and viral fitness

Genetic inactivation of poliovirus infectivity by increasing the frequencies of CpG and UpA dinucleotides within and across synonymous capsid region codons

Modulation of poliovirus replicative fitness in HeLa cells by deoptimization of synonymous codon usage in the capsid region

Virus attenuation by genome-scale changes in codon pair bias

An evaluation of measures of synonymous codon usage bias

Genetic variability and molecular evolution of Hepatitis A virus

MUSCLE: a multiple sequence alignment method with reduced time and space complexity

Theory and Applications of Correspondence Analysis

Initial sequencing and analysis of the human genome

Analysis of synonymous codon usage in the UL24 gene of duck enteritis virus

What drives codon choices in human genes?

Why is CpG suppressed in the genomes of virtually all small eukaryotic viruses but not in those of large eukaryotic viruses?

Ribosome traffic in E. coli and regulation of gene expression

Virus-host coevolution: common patterns of nucleotide motif usage in Flaviviridae and their hosts

Hepatitis A virus: from discovery to vaccines

Bayesian coalescent inference of Hepatitis A virus populations: evolutionary rates and patterns

Reduction of the rate of poliovirus protein synthesis through large-scale codon deoptimization causes attenuation of viral virulence by lowering specific infectivity

Identification of amino acids located in the antibody binding sites of human hepatitis A virus

Antigenic structure of human hepatitis A virus defined by analysis of escape mutants selected against murine monoclonal antibodies

Codon usage and replicative strategies of hepatitis A virus

Picornaviridae: the viruses and their replication

Mononucleotide and dinucleotide frequencies, and codon usage in poliovirus RNA

Evidence for quasispecies distributions in the human Hepatitis A virus genome

Evolutionary basis of codon usage and nucleotide composition bias in vertebrate DNA viruses

Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens: a review of the considerable within-species diversity

Codon usage in regulatory genes in Escherichia coli does not reflect selection for "rare" codons

An evolutionary perspective on synonymous codon usage in unicellular organisms

Neutralization escape mutants define a dominant immunogenic neutralization site on Hepatitis A virus

Analysis of synonymous codon usage in classical swine fever virus

Analysis of codon usage bias and base compositional constraints in iridovirus genomes

Genome rhetoric and the emergence of compositional bias

Free Statistics Software, Office for Research Development and Education, version 1.1.23-r5

Low efficiency of the 5′ nontranslated region of hepatitis A virus RNA in directing cap-independent translation in permissive monkey kidney cells

Hepatitis A and the molecular biology of picornaviruses: a case for a new genus of the family Picornaviridae

Cytosine deamination and selection of CpG suppressed clones are the two major independent biological forces that shape codon usage bias in Coronaviruses

The "effective number of codons" used in a gene

Analysis of synonymous codon usage in 11 Human Bocavirus isolates

Mutation pressure shapes codon usage in the GC-rich genome of foot-and-mouth disease virus

Analysis of synonymous codon usage in H5N1 virus and other Influenza A viruses

Authors acknowledge support by PEDECIBA and Agencia Nacional de Investigación e Innovación (ANII, FCE 2007 722), Uruguay. We acknowledge anonymous reviewers for important suggestions regarding this work.

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.virusres.2011.01.012.