key: cord-0766133-4afj5us1
authors: Gómez, Mariela Martínez; Tort, Luis Fernando Lopez; de Mello Volotao, Eduardo; Recarey, Ricardo; Moratorio, Gonzalo; Musto, Héctor; Leite, José Paulo G.; Cristina, Juan
title: Analysis of human P[4]G2 rotavirus strains isolated in Brazil reveals codon usage bias and strong compositional constraints
date: 2011-04-30
journal: Infection, Genetics and Evolution
DOI: 10.1016/j.meegid.2011.01.006
sha: b4673cf125af9990a6e8dc146eaec13b3d25b015
doc_id: 766133
cord_uid: 4afj5us1

Abstract The Rotavirus genus belongs to the family Reoviridae, and its genome consists of 11 segments of double-stranded RNA. Group A rotaviruses (RV-A) are the main etiological agent of acute viral gastroenteritis in infants and young children worldwide. Understanding the extent and causes of biases in codon usage is essential to the understanding of viral evolution. However, the factors shaping synonymous codon usage bias and nucleotide composition in human RV-A are currently unknown. To gain insight into these matters, we analyzed the codon usage and base composition constraints on the two genes that encode the outer capsid proteins (VP4 [VP8*] and VP7) of 58 P[4]G2 RV-A strains isolated in Brazil and investigated the possible key evolutionary determinants of codon usage bias. The results revealed that the frequencies of codon usage in both RV-A proteins studied differ significantly from those of human cells. To determine whether similar trends of codon usage are found when complete RV-A genomes are considered, we compared these results with those obtained from a dataset of 10 reference strains for which the complete sequences of all 11 segments are known. Similar results were obtained using capsid proteins or complete genomes. The general correlations found between the position of each sequence on the first axis generated by correspondence analysis and the relative dinucleotide abundances indicate that codon usage in RV-A can also be strongly influenced by underlying biases in dinucleotide frequencies. CpG- and GpC-containing codons are markedly suppressed. Thus, the results of this study suggest that RV-A genomic biases are the result of the evolution of genome composition in relation to host adaptation and the ability to escape antiviral cell responses.

Group A rotaviruses (RV-A) are the main etiological agent of acute viral gastroenteritis in infants and young children worldwide (Aoki et al., 2009; CDC, 2008). The Rotavirus genus belongs to the family Reoviridae, and its genome consists of 11 double-stranded RNA (dsRNA) gene segments encoding six structural (VP) and six non-structural proteins (NSP) (Estes and Kapikian, 2007). Based on the two genes that encode the outer neutralizing capsid proteins, VP4 and VP7, a widely used binary classification system was established for RV-A that defines G (from VP7, glycoprotein) and P (from VP4, protease-cleaved protein) genotypes (Estes and Kapikian, 2007). To date, at least 25 G and 32 P genotypes have been identified (Matthijnssens et al., 2008, 2009; Collins et al., 2010; Abe et al., 2009; Ursu et al., 2009; Esona et al., 2010). Five RV-A G genotypes (G1-G4 and G9) and two P genotypes (P[8] and P[4]) are prevalent worldwide (Santos and Hoshino, 2005; Leite et al., 2008; Iturriza-Gómara et al., 2009). Surveillance studies of RV-A-positive samples have shown that genotype P[4]G2 reemerged in Brazil in 2005 and has since become predominant in the country (Carvalho-Costa et al., 2006; Gurgel et al., 2007; de Oliveira et al., 2008; Leite et al., 2008; Nakagomi et al., 2008; Mascarenhas et al., 2010).

Due to the degeneracy of the genetic code, most amino acids are encoded by more than one codon. Synonymous codons are not used randomly, and in several organisms natural selection appears to bias codon usage toward a subset of optimal codons, mainly in highly expressed genes (Stoletzki and Eyre-Walker, 2007).

Two major models have been proposed to explain codon usage: the translation-related model and the mutational model (Wong et al., 2010). Translational efficiency or accuracy bias may arise from the relationship between local tRNA abundance and major codon preference, whereby the codon of an amino acid family that pairs most optimally with the most abundant tRNA is preferred (Ikemura, 1982). Discrepancies in codon usage could also be due to genome compositional constraints and mutational biases. Understanding the extent and causes of biases in codon usage is essential to comprehend the interplay between viruses and the immune response (Shackelton et al., 2006). However, the factors shaping synonymous codon usage bias, such as mutational pressure, nucleotide composition or translational selection, are currently unknown for human RV-A.

© 2011 Elsevier B.V.

To gain insight into these matters, we analyzed the codon usage and base composition constraints of VP4 [VP8*] and VP7 gene sequences of 72 P[4]G2 RV-A strains isolated in Brazil and investigated the possible key evolutionary determinants of codon usage bias. To determine whether similar trends of codon usage are found when complete RV-A genomes are considered, we compared these results with those obtained from a dataset of reference strains for which the complete sequences of all 11 segments are known. The results revealed significant codon usage bias and compositional constraints in the human RV-A strains studied.

A total of 72 diarrheic stool specimens were collected from 1996 to 2009 from children up to 5 years old hospitalized with acute diarrhea. The children came from the States of Acre (AC), Alagoas (AL), Bahia (BA), Espirito Santo (ES), Maranhão (MA), Mato Grosso do Sul (MS), Minas Gerais (MG), Pernambuco (PE), Rio de Janeiro (RJ), Rio Grande do Sul (RS) and Sergipe (SE), and the samples were genotyped as P[4]G2 as previously described (Fischer et al., 2000; Das et al., 1994). The viral dsRNA was extracted by the glass powder method (Boom et al., 1990). The dsRNA was reverse transcribed (RT) and amplified by polymerase chain reaction (PCR) using a pair of consensus primers corresponding to a conserved nucleotide sequence of the VP7 (Gouvea et al., 1990; Das et al., 1994) or VP4 (VP8*) (Gentsch et al., 1992; Gómez et al., 2010) genes. PCR amplifications were performed under the temperature and time conditions originally described (Gouvea et al., 1990; Gentsch et al., 1992). Distilled Milli-Q water was used as a negative control in all steps, and the recommended precautions for PCR procedures were followed to avoid false-positive results.

DNA sequencing was performed with an ABI Prism Big Dye Terminator Cycle Sequencing Ready Reaction Kit 1 and an ABI Prism 3730 Genetic Analyzer (both from Applied Biosystems, Foster City, CA, USA). Sequences of the VP4 [VP8*] and VP7 genes were obtained using the same set of primers used in the RT-PCR. For strain names and accession numbers, see Supplementary Material, Table 1. From the initial 72 stool samples, a total of 58 VP4 [VP8*] and 60 VP7 sequences, 818 and 978 nucleotides in length, respectively, were obtained.

The relative synonymous codon usage (RSCU) value of each codon in each gene (VP8* or VP7) was determined to measure synonymous codon usage bias (Sharp and Li, 1986). This was done using the CodonW program (available at: http://mobyle.pasteur.fr). The RSCU values of the P[4]G2 RV-A VP8* and VP7 genes were compared with the corresponding values of human cells (International Human Genome Sequencing Consortium, 2001). The effective number of codons (ENC) and the frequency of G+C at synonymous third codon positions (GC3S) (excluding Met, Trp and termination codons) were also calculated with CodonW. ENC was used to quantify the codon usage bias of an ORF (Wright, 1990; Comeron and Aguade, 1998). Similarly, the fraction of G+C nucleotides not included in GC3S, i.e. at the first and second codon positions (GC12), was calculated with CodonW. Relative dinucleotide frequencies were also calculated using this program as implemented in the Mobyle server (http://mobyle.pasteur.fr).
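The RSCU and GC3S indices described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CodonW implementation: it assumes the standard genetic code and an in-frame ORF given as an RNA or DNA string, and it skips amino acids that do not occur in the ORF (their RSCU is undefined). The function names `codon_counts`, `rscu` and `gc3s` are hypothetical.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code (NCBI translation table 1); the first codon base varies slowest.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODE = dict(zip(CODONS, AAS))


def codon_counts(orf: str) -> Counter:
    """Count in-frame codons of an ORF (T is converted to U; incomplete codons ignored)."""
    seq = orf.upper().replace("T", "U")
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] in CODE)


def rscu(counts: Counter) -> dict:
    """RSCU = observed count / (family total / family size), after Sharp and Li (1986)."""
    families = {}
    for codon, aa in CODE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for members in families.values():
        total = sum(counts[c] for c in members)
        if total == 0:
            continue  # amino acid absent from this ORF: RSCU undefined, so skipped
        expected = total / len(members)  # count expected under equal synonymous usage
        for c in members:
            values[c] = counts[c] / expected
    return values


def gc3s(counts: Counter) -> float:
    """G+C fraction at synonymous third positions, excluding Met, Trp and stop codons."""
    syn = {c: n for c, n in counts.items() if CODE[c] not in "MW*"}
    total = sum(syn.values())
    gc = sum(n for c, n in syn.items() if c[2] in "GC")
    return gc / total if total else float("nan")


counts = codon_counts("UUUUUUUUCAAAAAG")  # UUU x2, UUC, AAA, AAG
print(rscu(counts)["UUU"], gc3s(counts))
```

An RSCU of 1.0 means a codon is used exactly as often as expected if all its synonyms were used equally; values above or below 1.0 indicate preference or avoidance, which is how the Table 1 values are read.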

The relationships between variables and samples can be explored using multivariate statistical analysis. Correspondence analysis (COA) is a type of multivariate analysis that allows a geometrical representation of the sets of rows and columns in a dataset (Wong et al., 2010; Greenacre, 1984). Each ORF is represented as a 59-dimensional vector, and each dimension corresponds to the RSCU value of one codon (excluding AUG, UGG and the three stop codons). Major trends within a dataset can be determined using measures of relative inertia, and genes can be ordered according to their position along the axis of major inertia (Tao et al., 2009). COA was performed on the RSCU values of the ORFs studied using the CodonW program.
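The COA step can be illustrated with a small NumPy sketch of classical correspondence analysis, computed through a singular value decomposition (SVD) of the matrix of standardized residuals. It is an illustration under assumptions, not CodonW's code: the input is assumed to be a genes × codons matrix of RSCU values with no all-zero rows, codon columns that are zero in every gene (such as a codon never used) are dropped because their column mass would be zero, and the function name is hypothetical.

```python
import numpy as np


def correspondence_analysis(table: np.ndarray, n_axes: int = 4):
    """Principal row coordinates and relative inertia of a correspondence analysis.

    `table` is a non-negative genes x codons matrix (e.g. RSCU values of the
    59 informative codons). All-zero columns are removed before the analysis.
    """
    X = np.asarray(table, dtype=float)
    X = X[:, X.sum(axis=0) > 0]
    P = X / X.sum()              # correspondence matrix
    r = P.sum(axis=1)            # row (gene) masses
    c = P.sum(axis=0)            # column (codon) masses
    expected = np.outer(r, c)
    # Standardized residuals: D_r^-1/2 (P - r c^T) D_c^-1/2
    S = (P - expected) / np.sqrt(expected)
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    inertia = sigma ** 2                      # inertia of each axis
    rel_inertia = inertia / inertia.sum()     # fraction of total inertia
    rows = (U * sigma) / np.sqrt(r)[:, None]  # principal coordinates: D_r^-1/2 U Sigma
    return rows[:, :n_axes], rel_inertia[:n_axes]


rng = np.random.default_rng(0)
coords, rel = correspondence_analysis(rng.uniform(0.1, 2.0, size=(6, 10)), n_axes=3)
print(coords[:, 0])  # position of each hypothetical gene on the first axis
print(rel)           # share of total inertia for axes 1-3
```

Sorting genes by `coords[:, 0]` orders them along the axis of major inertia, and `rel` corresponds to the percentages of variation attributed to each axis. The sign of each axis returned by an SVD is arbitrary.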

Correlation analysis was carried out using Spearman's rank correlation method (Wessa, 2010; available at: www.wessa.net).
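Spearman's rank correlation is Pearson's correlation computed on ranks, with tied values receiving the average of the ranks they occupy. The pure-Python sketch below shows that calculation, for instance for correlating first-axis COA coordinates with GC3S values. It is not the Wessa (2010) implementation, it returns only the coefficient (a P value would additionally require a t-distribution approximation or a permutation test), and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks.

    Undefined (division by zero) if either input is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


print(average_ranks([1, 2, 2, 3]))  # [1.0, 2.5, 2.5, 4.0]
```

Because only ranks enter the calculation, any monotonic relationship (linear or not) gives a coefficient of ±1, which suits axis coordinates whose scale has no direct biological meaning.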

Sequences were aligned using the MUSCLE program (Edgar, 2004).

To determine whether the codon usage bias found in the outer capsid proteins of P[4]G2 RV-A strains isolated in Brazil is also found in other genome regions, or in the complete genomes of human RV-A strains of different genotypes isolated elsewhere, a new dataset was constructed comprising 10 human RV-A reference strains for which the complete sequences of all 11 genome segments are known. For strain names, genotypes, accession numbers and genomic constellations, see Supplementary Material Table 3.

To study the extent of codon usage bias in P[4]G2 RV-A isolated in Brazil, the RSCU values of the codons in the VP4 [VP8*] and VP7 ORFs were calculated for datasets of 58 and 60 sequences, respectively; the results are shown in Table 1.

The codon usage frequencies in both the VP4 [VP8*] and VP7 ORFs of P[4]G2 RV-A differ significantly from those of human cells. In particular, extremely biased frequencies were found for UUU (Phe), UUA (Leu), GUU and GUA (Val), UCA (Ser), CCA (Pro), GCU (Ala), UAU (Tyr), CAU (His), CAA (Gln), AAU (Asn), AAA (Lys), GAA (Glu), UGU (Cys), AGA (Arg) and GGA (Gly) in both ORFs (see Table 1). All of these highly preferred codons end in U or A, which strongly suggests that mutational bias is the main force shaping codon usage in these two genes. Notably, CGC (Arg) is not used in either ORF.
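The observation that the preferred codons are U/A-ending can be checked mechanically: for each amino acid, take the synonymous codon with the highest RSCU value and tally its third-position base. The sketch below assumes the standard genetic code and a dictionary of mean RSCU values per codon (for example, one column of Table 1). The RSCU values in the example are invented for illustration and are not taken from Table 1; the function names are hypothetical.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code (NCBI translation table 1); the first codon base varies slowest.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))


def preferred_codons(rscu: dict) -> dict:
    """Map each amino acid (Met, Trp and stops excluded) to its highest-RSCU codon."""
    best = {}
    for codon, value in rscu.items():
        aa = CODE[codon]
        if aa in "MW*":
            continue  # no synonymous choice for Met/Trp; stops are not amino acids
        if aa not in best or value > rscu[best[aa]]:
            best[aa] = codon
    return best


def third_base_profile(rscu: dict) -> Counter:
    """Count the third-position bases of the preferred codons."""
    return Counter(codon[2] for codon in preferred_codons(rscu).values())


# Invented RSCU values for three amino acids (illustration only).
example = {"UUU": 1.5, "UUC": 0.5, "AAA": 1.8, "AAG": 0.2,
           "GCU": 2.0, "GCC": 0.5, "GCA": 1.2, "GCG": 0.3}
print(preferred_codons(example))    # {'F': 'UUU', 'K': 'AAA', 'A': 'GCU'}
print(third_base_profile(example))  # Counter({'U': 2, 'A': 1})
```

A profile containing only U and A, as reported for both ORFs, is the pattern expected under an A/U-biased mutational pressure.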

In order to investigate if these P

An ENC-GC3S plot (ENC plotted against GC3S) can be used to examine how far the codon usage of a gene departs from equal usage of synonymous codons (Wright, 1990). In Fig. 1, the curve represents the expected ENC values if codon usage were determined only by the GC content at the third codon position. In other words, if GC3S were the only factor shaping the codon usage pattern, the ENC values would fall on this curve, which represents random codon usage (Jiang et al., 2007). If a G+C compositional constraint influences codon usage, the ENC-GC3S points will lie on or below the expected curve (Tsai et al., 2007). Otherwise, the codon usage bias of genes may be affected by other factors, such as translational selection.
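The ENC-GC3S comparison can be sketched as follows. `expected_enc` implements Wright's (1990) expected curve, ENC = 2 + s + 29/(s² + (1 − s)²) with s = GC3S, and `enc` is a simplified version of Wright's estimator: codon homozygosity F is averaged within the 2-, 3-, 4- and 6-fold degeneracy classes of the standard code and combined as 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. Assumptions, which CodonW may handle differently: in-frame ORF, amino acids observed fewer than twice are skipped, a missing Ile class is estimated as the mean of the 2- and 4-fold classes, a class with no usable amino acids is simply omitted, and each class term is capped at its maximum possible value. Function names are hypothetical.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code (NCBI translation table 1); the first codon base varies slowest.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))
# Number of amino acids in each degeneracy class of the standard code (Met/Trp excluded).
CLASS_SIZE = {2: 9, 3: 1, 4: 5, 6: 3}


def expected_enc(s: float) -> float:
    """Wright's expected ENC when codon bias is due only to the GC3S composition s."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def enc(orf: str) -> float:
    """Simplified effective number of codons (after Wright, 1990), capped at 61."""
    seq = orf.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    families = {}
    for codon, aa in CODE.items():
        if aa not in "MW*":
            families.setdefault(aa, []).append(codon)
    f_by_class = {k: [] for k in CLASS_SIZE}
    for members in families.values():
        n = sum(counts[c] for c in members)
        if n < 2:
            continue  # homozygosity F is undefined for fewer than two observations
        p_sq = sum((counts[c] / n) ** 2 for c in members)
        f_by_class[len(members)].append((n * p_sq - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items() if v}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # Wright's substitute when Ile is absent
    # Each class contributes CLASS_SIZE[k] / F; flooring F at 1/k keeps the term at most
    # the number of codons in the class and avoids division by zero.
    nc = 2 + sum(CLASS_SIZE[k] / max(f, 1 / k) for k, f in mean_f.items())
    return min(nc, 61.0)


print(expected_enc(0.5))  # 60.5
```

ENC ranges from 20 (one codon per amino acid) to 61 (all synonymous codons used equally); a gene plotted well below `expected_enc(gc3s)` is more biased than its GC3S alone predicts.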

When the GC3S values were calculated for the VP4 [VP8*] and VP7 ORFs and the ENC-GC3S plots were constructed (for the ENC and GC3S values of the Brazilian strains included in this study, see Supplementary Material Table 2), all points lay below, and roughly parallel to, the expected curve for both ORFs, indicating that codon usage bias may be influenced by G+C compositional constraints (see Fig. 1).

Since codon usage is by its very nature multivariate, the data must be analyzed with multivariate statistical techniques (i.e. COA) to confirm these findings. COA is an ordination technique that identifies the major trends in the variation of the data and distributes genes along continuous axes in accordance with these trends. Moreover, it does not assume that the data fall into discrete clusters and can therefore represent continuous variation accurately (Greenacre, 1984). COA creates a series of orthogonal axes to identify trends that explain the data variation, with each subsequent axis explaining a decreasing amount of the variation (Greenacre, 1984). The correlation between the position of each sequence on the first COA axis and its GC3S value was analyzed for both the VP4 [VP8*] and VP7 ORFs. The positions of the sequences on the first COA axis are significantly correlated with their GC3S values in both ORFs (r = 0.625, P < 0.0001 and r = −0.469, P < 0.001 for VP4 [VP8*] and VP7, respectively; the sign of a COA axis is arbitrary, so the opposite signs of the two coefficients are not contradictory). Taken together, these results reveal that most of the codon usage bias is directly related to nucleotide composition. Nevertheless, other factors may also be acting to shape codon usage bias.

To determine whether the codon usage biases reported above are also found in other genome regions or in complete genome sequences, a new dataset was constructed comprising 10 human RV-A reference strains for which the complete sequences of all 11 segments are known. For strain names, genotypes, accession numbers and genomic constellations, see Supplementary Material Table 3. By concatenating the ORF sequences of different genome segments, the RSCU values of the codons were calculated for different virus regions (outer capsid shell proteins, OC, VP4+VP7; intermediate protein shell, IM, VP6; inner capsid shell proteins, IC, VP1+VP2+VP3; non-structural proteins, NSP, NSP1+NSP2+NSP3+NSP4+NSP5; and full genome, VP4+VP7+VP6+VP1+VP2+VP3+NSP1+NSP2+NSP3+NSP4+NSP5, which accounts for a total of 54,318 codons). The results are shown in Table 2.

Again, the codon usage frequencies found in the different genomic regions and in the complete RV-A genomes differ significantly from those of human cells (see Tables 1 and 2). Highly biased frequencies were also found for the same amino acids in all genomic regions and in the full genomes (Table 2), in agreement with the previous results obtained using the outer capsid proteins of P[4]G2 RV-A strains isolated in Brazil. The correlation between the position of each sequence on the first COA axis and its GC3S value was also analyzed for the complete genome dataset. A strong and significant correlation between the position of the sequences on the first COA axis and their GC3S values (r = −0.9879, P < 0.01) was also found using complete genomes.

It has been suggested that dinucleotide biases can affect codon bias (Tao et al., 2009). To study this possibility, the relative abundances of the 16 dinucleotides in the VP8* and VP7 ORFs were determined. The results are shown in Table 3. The occurrences of dinucleotides are not random, and no dinucleotide is present at its expected frequency.
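Relative dinucleotide abundance is commonly computed as Karlin's odds ratio, ρXY = fXY / (fX · fY), where fXY is the frequency of dinucleotide XY among all overlapping dinucleotides and fX, fY are mononucleotide frequencies; a value of 1.0 is the frequency expected from base composition alone. The sketch below assumes this definition (the exact formula used by the software is not stated here), an ungapped sequence containing only A, C, G and U (or T), and a hypothetical function name.

```python
from collections import Counter
from itertools import product


def dinucleotide_odds(seq: str) -> dict:
    """Relative abundance rho_XY = f_XY / (f_X * f_Y) for all 16 dinucleotides."""
    s = seq.upper().replace("T", "U")
    if len(s) < 2 or set(s) - set("ACGU"):
        raise ValueError("expected an ungapped A/C/G/U sequence of length >= 2")
    mono = Counter(s)
    di = Counter(s[i:i + 2] for i in range(len(s) - 1))  # overlapping dinucleotides
    n_mono, n_di = len(s), len(s) - 1
    rho = {}
    for x, y in product("ACGU", repeat=2):
        f_x, f_y = mono[x] / n_mono, mono[y] / n_mono
        # nan when a base is absent, because the expected frequency is then zero
        rho[x + y] = (di[x + y] / n_di) / (f_x * f_y) if f_x and f_y else float("nan")
    return rho


rho = dinucleotide_odds("ACGUACGU")
print(rho["CG"], rho["AA"])
```

Applied to each ORF, CpG and GpC values well below 1.0 (such as those in Table 3) indicate suppression, and ApA or ApU values near 2.0 indicate roughly twofold over-representation.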

In the case of the VP4 [VP8*] protein, the relative abundances of CpG and GpC deviated strongly from the expected value (i.e. 1.0) (mean ± S.D. = 0.230 ± 0.035 and 0.282 ± 0.009, respectively), and these dinucleotides were markedly underrepresented. On the other hand, ApU and ApA were markedly over-represented (mean ± S.D. = 1.951 ± 0.033 and 1.979 ± 0.04, respectively) (Table 3). Among the 16 dinucleotides, 10 are correlated with the position of the sequences on the first COA axis (P values < 0.01, Table 3).

In the case of the VP7 protein, the relative abundances of CpG and GpC again deviated strongly from the expected value (mean ± S.D. = 0.397 ± 0.014 and 0.330 ± 0.018, respectively), and these dinucleotides were underrepresented. The relative abundances of ApU and ApA also deviated sharply from the expected value, and these dinucleotides were again markedly over-represented (mean ± S.D. = 2.056 ± 0.029 and 1.948 ± 0.038, respectively) (Table 3). Among the 16 dinucleotides, seven are correlated with the position of the sequences along the first COA axis (P values < 0.01, Table 3).

These results indicate that dinucleotide composition also determines the variation in synonymous codon usage among P[4]G2 RV-A VP7 ORFs. The RSCU values in the VP7 protein of the 14 codons that contain CpG or GpC revealed that six [GCG (mean 0.28), CGC (mean 0.00), GCC (mean 0.13), GCG (mean 0.28), AGC (mean 0.26) and GGC (mean 0.24)] were markedly suppressed and five [CCG (mean 0.73), UCG (mean 0.59), CGG (mean 0.64), CGU (mean 0.64) and UGC (mean 0.75)] were slightly suppressed. In addition, the position of each codon on each of the four major COA axes was determined for both proteins. For the VP4 [VP8*] ORFs, the first major axis accounted for 28.67% of the observed variation, while the second, third and fourth axes accounted for 21.57%, 18.56% and 12.39%, respectively. For the VP7 ORFs, the first major axis accounted for 66.00% of the observed variation, and the second, third and fourth axes accounted for 14.82%, 8.09% and 2.40%, respectively. Table 4 shows the codons with the maximum and minimum values on each of the axes studied (i.e. the most divergent codon values), indicating a strong bias in their use in both the VP4 and VP7 proteins. The most divergent triplets tend to be GC-rich (considering the two ORFs, G or C occupies 19 of the 24 positions of these codons). Again, this can be explained by a strong mutational bias.
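The count of 14 codons containing CpG or GpC can be verified by enumeration: a codon qualifies if either of its two internal dinucleotides (positions 1-2 or 2-3) is CG or GC. The short check below enumerates them. It does not consider dinucleotides spanning adjacent codons (position 3 of one codon and position 1 of the next), which also contribute to the genome-level CpG and GpC counts; the function name is hypothetical.

```python
from itertools import product


def codons_with(dinucleotides=("CG", "GC")):
    """All 64 codons (RNA alphabet) containing any given dinucleotide within the codon."""
    hits = []
    for a, b, c in product("UCAG", repeat=3):
        codon = a + b + c
        if codon[:2] in dinucleotides or codon[1:] in dinucleotides:
            hits.append(codon)
    return hits


cpg_gpc = codons_with()
print(len(cpg_gpc))  # 14
print(sorted(cpg_gpc))
```

CGC and GCG each contain both dinucleotides, which is why 16 candidate positions collapse to 14 distinct codons; none of them is a stop codon, so all 14 have RSCU values.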

To determine whether the results obtained with the outer capsid proteins of P[4]G2 RV-A strains are also found in complete genomes, the same analyses were repeated using the complete genome dataset (for strains, accession numbers and genomic constellations, see Supplementary Material Table 3). The results are shown in Supplementary Material Table 4. Again, the relative abundances of CpG and GpC deviated strongly from the expected value (i.e. 1.0) (mean ± S.D. = 0.360 ± 0.021 and 0.468 ± 0.038, respectively), and these dinucleotides were markedly underrepresented. The relative abundances of ApU and ApA also deviated sharply from the expected value, and these dinucleotides were markedly over-represented (mean ± S.D. = 1.907 ± 0.069 and 2.089 ± 0.048, respectively). Among the 16 dinucleotides, seven are correlated with the position of the sequences along the first COA axis (P values < 0.01, Supplementary Material Table 4). Taken together, these results show that dinucleotide composition also determines the variation in synonymous codon usage in the complete genomes of human RV-A.

The results of these studies revealed that codon usage in the VP4 [VP8*] and VP7 genes of P[4]G2 RV-A is quite different from that of human genes (see Table 1). This is also observed in all the different genome regions and in complete genomes (see Table 2). This agrees with results found for other viruses, such as human immunodeficiency virus 1 (HIV-1) (Grantham and Perrin, 1986; Kypr and Mrazek, 1987) and hepatitis A virus (Aragones et al., 2008). In other RNA viruses, like poliovirus or foot-and-mouth disease virus (FMDV), the codon usage is very (Sanchez et al., 2003). In these cases, competition is avoided by the induction of cellular shutoff of protein synthesis through carboxy cleavage of translation initiation factor 4G (eIF4G) by the 2A and L proteases, respectively (Racaniello, 2001). Early in infection, RV-A also takes over the host translation machinery, causing a shutoff of cellular protein synthesis, although by a mechanism different from that of picornaviruses. After RV-A infection, translation initiation factor 2α (eIF2α) becomes phosphorylated and remains so throughout the virus replication cycle, leading to further inhibition of cellular protein synthesis (Montero et al., 2008). However, recent studies have shown that under these restrictive conditions, the viral proteins and some cellular proteins are efficiently translated (Montero et al., 2008). Whether the markedly different codon usage of RV-A and human cells is related to this is currently unknown, but it might allow RV-A to compete successfully for translation of viral RNAs.

We analyzed synonymous codon usage and nucleotide compositional constraints in the VP4 [VP8*] and VP7 genes of P[4]G2 RV-A and compared the results with a dataset of RV-A reference strains for which the complete sequences of all 11 segments were previously known. In contrast to previous results for other viruses, such as H5N1 influenza A virus (mean ENC = 50.91) (Ahn et al., 2006; Zhou et al., 2005), SARS (mean ENC = 48.99) (Zhao et al., 2008), FMDV (mean ENC = 51.42) (Zhong et al., 2007), classical swine fever virus (mean ENC = 51.7) (Tao et al., 2009) and duck enteritis virus (mean ENC = 52.17) (Jia et al., 2009), the ENC values found for human P[4]G2 RV-A are comparatively low (mean ENC values of 37.36 and 40.56 for VP8* and VP7, respectively). Moreover, when the complete genomes are studied (accounting for 54,318 codons), the mean ENC value obtained is 41.60. Because ENC ranges from 20 (only one codon used per amino acid) to 61 (all synonymous codons used equally) (Wright, 1990), these values indicate that the overall extent of codon usage bias in RV-A genomes is significant.

A general correlation between codon usage bias and base composition was observed, since all points in the ENC-GC3S plot lie below the curve of expected values (Fig. 1). Highly significant correlations between the first COA axis and GC3S values were obtained for both outer capsid protein ORFs. Moreover, the concatenated complete sequences of the 11 segments of 10 reference human RV-A strains also show this significant correlation. All these results strongly suggest that mutational pressure is an important factor in determining codon usage bias in human RV-A. Nevertheless, other factors that may also contribute to codon usage bias cannot be completely ruled out.

The frequencies of dinucleotides were not random, and no dinucleotide was present at its expected frequency in either ORF studied (VP8* and VP7, see Table 3). The same results were found using the complete genome dataset (Supplementary Material Table 4). CpG- and GpC-containing codons are markedly suppressed (see Tables 1 and 2). Marked CpG deficiency has also been observed in coronaviruses (Woo et al., 2007), vertebrate-infecting members of the family Flaviviridae (Lobo et al., 2009), poliovirus (Rothberg and Wimmer, 1981) and other RNA viruses (Karlin et al., 1994). CpG deficiency has been proposed to be related to the immunostimulatory properties of unmethylated CpG, which is recognized by the host's innate immune system as a pathogen signature (Shackelton et al., 2006; Woo et al., 2007). This recognition is now known to be mediated by the intracellular pattern recognition receptor (PRR) Toll-like receptor 9 (TLR9), which recognizes unmethylated CpG DNA and triggers several immune response pathways (Dorn and Kippenberger, 2008). Since the vertebrate immune system relies on the recognition of unmethylated CpG in DNA molecules as a sign of infection, and CpG under-representation in RNA viruses is observed exclusively in vertebrate viruses (Lobo et al., 2009), it is reasonable to suggest that a TLR9-like mechanism exists in the vertebrate immune system that recognizes CpG in an RNA context (such as in the genomes of RNA viruses) and triggers immune responses (Lobo et al., 2009). Moreover, recent studies have shown that influenza A viruses, which originated from an avian reservoir and have been infecting human hosts since 1918, have been under strong selective pressure to reduce the frequency of CpG in their genomes (Greenbaum et al., 2008).

The results of this work provide basic knowledge of the mechanisms that give rise to codon usage bias in human RV-A and are also useful for understanding the processes involved in RV-A evolution. Further studies will be needed to learn more about the RV-A genome.

Molecular epidemiology of rotaviruses among healthy calves in Japan: isolation of a novel bovine rotavirus bearing new P and G genotypes

Genomic analysis of influenza A viruses, including avian flu (H5N1) strains

Structure of rotavirus outer-layer protein VP7 bound with a neutralizing Fab

Hepatitis A virus spectra under the selective pressure of monoclonal antibodies: codon usage constraints limit capsid variability

Rapid and simple method for purification of nucleic acids

Detection and molecular characterization of group A rotavirus from hospitalized children in Rio de Janeiro, Brazil

Rotavirus surveillance-worldwide

Identification of a G2-like porcine rotavirus bearing a novel VP4 type

An evaluation of measures of synonymous codon usage bias

Theory and Applications of Correspondence Analysis

Characterization of rotavirus strains from newborns in

Reemergence of G2 rotavirus serotypes in Northern Brazil reflects a natural changing pattern over time

Rotavirus International Symposium. Istanbul, Turkey Abstracts

Clinical application of CpG-, non-CpG, and antisense oligodeoxynucleotides as immunomodulators

MUSCLE: a multiple sequence alignment method with reduced time and space complexity

Reassortant group A rotavirus from straw-colored fruit bat (Eidolon helvum)

Rotaviruses

Genotype profiles of rotavirus strains from children in a suburban

Identification of group A rotavirus gene 4 types by polymerase chain reaction

Polymerase chain reaction amplification and typing of rotavirus nucleic acid from stool specimens

AIDS virus and HTLV-I differ in codon choices

Patterns of evolution and host gene mimicry in influenza and other RNA viruses

Predominance of rotavirus P[4]G2 in a vaccinated population

Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes. Differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs

web-enabled reporting and real-time analysis of genotyping and epidemiological data

Analysis of synonymous codon usage in the UL24 gene of duck enteritis virus

Analysis of synonymous codon usage in Aeropyrum pernix K1 and other Crenarchaeota microorganisms

Why is CpG suppressed in the genomes of virtually all small eukaryotic viruses but not in those of large eukaryotic viruses?

Unusual codon usage in HIV

Group A rotavirus genotypes and the ongoing Brazilian experience-a review

Virus-host coevolution: common patterns of nucleotide motif usage in Flaviviridae and their hosts

Identification of two sublineages of genotype G2 rotavirus among diarrheic children in Parauapebas, southern Pará state, Brazil

Full genome-based classification of rotaviruses reveals a common origin between human Wa-like and porcine rotavirus strains and human DS-1-like and bovine rotavirus strains

Rotavirus infection induces the phosphorylation of eIF2α but prevents the formation of stress granules

Apparent extinction of non-G2 rotavirus strains from circulation in Recife, Brazil, after the introduction of rotavirus vaccine

Picornaviridae: the viruses and their replication

Mononucleotide and dinucleotide frequencies, and codon usage in poliovirus RNA

Genome variability and capsid structural constraints of hepatitis A virus

Global distribution of rotavirus serotypes/genotypes and its implication for the development and implementation of an effective rotavirus vaccine

Evolutionary basis of codon usage and nucleotide composition bias in vertebrate DNA viruses

Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes

An evolutionary perspective on synonymous codon usage in unicellular organisms

Synonymous codon usage in Escherichia coli: selection for translational accuracy

Analysis of synonymous codon usage in classical swine fever virus

Analysis of codon usage bias and base compositional constraints in iridovirus genomes

Molecular analysis of the VP7 gene of pheasant rotaviruses identifies a new genotype, designated G23

Free Statistics Software. Office for Research Development and Education, version 1.1.23-r5

Cytosine deamination and selection of CpG suppressed clones are the two major independent biological forces that shape codon usage bias in Coronaviruses

Codon usage bias and the evolution of influenza A virus. Codon usage biases of Influenza virus

The "effective number of codons" used in a gene

Analysis of synonymous codon usage in 11 human bocavirus isolates

Mutation pressure shapes codon usage in the GC-rich genome of foot-and-mouth disease virus

Analysis of synonymous codon usage in H5N1 virus and other influenza A viruses

We acknowledge support by the Brazilian Federal Agency for Support and Evaluation of Graduate Education (CAPES, Brazil) and Universidad de la República (Uruguay) through Project CAPES/UDELAR 006/08. We also acknowledge support by PEDECIBA (Uruguay), Agencia Nacional de Investigación e Innovación (ANII) (Uruguay) through a Ph.D. and a M.Sc. fellowship to GM and RR, respectively, and project Fondo Clemente Estable FCE2007_722. We acknowledge support by the National Council for Scientific and Technological Development (CNPq, Brazil). We thank anonymous reviewers of previous versions of this work for important comments.

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.meegid.2011.01.006.