key: cord-0007587-je6oe1h4
authors: Auewarakul, Prasert
title: Composition bias and genome polarity of RNA viruses
date: 2004-11-18
journal: Virus Res
DOI: 10.1016/j.virusres.2004.10.004
sha: 405178382d43ce2210f9573baba85368387e44f9
doc_id: 7587
cord_uid: je6oe1h4

I have observed a relationship between the GC content of coding sequences of RNA viruses and their genome polarity. Positive-stranded RNA viruses have significantly higher GC contents than negative-stranded RNA viruses. Coding sequences of all negative-stranded RNA viruses are biased toward high A in coding strands (high T in genomes), while two distinct patterns were observed among positive-stranded RNA genomes. This finding suggests that RNA viruses with different genome polarity are under different mutational pressure, which may be a consequence of the difference in their strategies of viral genome expression and replication. The GC content directly affects the viral codon adaptation index calculated with highly expressed human genes as the reference set, which may theoretically predict the efficiency of viral gene expression in human cells.

Genome composition of living organisms can vary widely. This is considered to be the result of a directional mutational bias toward GC or AT (Lobry and Sueoka, 2002; Sueoka, 1988; Sueoka, 1993). In viruses, this directional mutational bias could theoretically be due to a bias in the copying errors of the viral RNA polymerase, selection pressure, or editing by host RNA-editing enzymes. Certain types of hypermutation have been described in a number of viruses (Cattaneo et al., 1988; Vartanian et al., 1994, 2002), and may also contribute to viral genome composition. Genome composition bias in viruses has not been systematically analyzed. A global view of the pattern of viral genome composition bias may give us an insight into the complex evolutionary history of viruses and virus-host interactions.
The GC content of a genome has been shown to be a major contributing factor to codon usage bias, which could affect expression efficiency (Aota and Ikemura, 1986; Chen et al., 2004; Francino and Ochman, 1999; Ikemura and Wada, 1991; Kanaya et al., 2001). It is interesting to see how GC content interacts with genome polarity and codon usage bias in RNA viruses. Genome composition and codon usage bias are particularly interesting in RNA viruses because the same RNA may be used as mRNA, genome, or anti-genome. The replication of an RNA genome is also very different from DNA replication of the host: it uses different polymerase enzymes and takes place in a different environment, which may contribute to the mutational bias that drives the genome composition. RNA viruses with positive- and negative-stranded genomes are very different in their strategies of genome expression and replication, which may contribute to mutational bias and selection pressure.

To do the analyses, I retrieved the compositions of coding sequences of 50 viruses from a codon usage database available at Kazusa Research Institute, Japan (http://www.kazusa.or.jp/codon/cgi-bin/). The viruses were chosen to cover most viral families that cause diseases in humans. When a distinct separation between human and animal strains could be made, only human strains were included in the analyses, for example human influenza virus A (H3N2). The names of the viruses and their codon composition are shown in Table 1. There is a significant difference between the GC contents of positive-stranded viruses and negative-stranded viruses (p < 0.01, t-test) (Fig. 1). The positive-stranded viruses have a mean GC content of 49.8% in their coding sequences, while that of the negative-stranded viruses is 40.4%. I excluded retroviruses from the positive-stranded viruses in these analyses because their replication strategies are very different.
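
As an illustration of the comparison described above, the following is a minimal sketch of how a GC content can be derived from a codon count table and how two groups of GC contents can be compared. The function names `gc_content` and `welch_t` are hypothetical, the toy counts are not the data of Table 1, and the choice of Welch's unequal-variance form is an assumption, since the text states only that a t-test was used.

```python
import math
from statistics import mean, variance


def gc_content(codon_counts):
    """Return the GC fraction of coding sequence given a {codon: count} table."""
    gc = total = 0
    for codon, n in codon_counts.items():
        # Each codon contributes its count times the number of G/C bases it holds.
        gc += n * sum(base in "GC" for base in codon.upper())
        total += n * len(codon)
    return gc / total


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom (assumed test form)."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical toy table: 10 GCC codons and 10 AAA codons.
toy_counts = {"GCC": 10, "AAA": 10}
print(gc_content(toy_counts))
```

The p-value would then be read from a t distribution with `df` degrees of freedom; that step is left out here because it needs a statistics library beyond the Python standard library.
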
If the strategies of genome replication and expression could affect mutational pressure or exert a selection pressure on codon usage, these effects are likely to differ between retroviruses and other positive-stranded RNA viruses. For retroviruses, two distinct patterns were observed: HIV has a lower GC content, while HTLV has a high GC content (Fig. 1). This is in agreement with a previous observation, but the reason for this difference is unclear (Berkhout et al., 2002).

The codon usage bias of many human viruses does not match the pattern for efficient expression in humans and has been shown to be driven mainly by the GC contents of their genomes (Jenkins and Holmes, 2003). Expression of viral genes can be restricted by codon usage bias (Haas et al., 1996), and codon optimization can enhance expression of viral genes and has been used in the development of DNA vaccines (Andre et al., 1998). To study the codon bias in relation to predicted translational efficiency in human cells, I calculated the codon adaptation index (CAI) using highly expressed human genes as the reference set (Haas et al., 1996). This highly expressed codon set has been used successfully for codon optimization of viral genes for efficient expression in human cells (Andre et al., 1998; Haas et al., 1996). The CAI was designed for predicting the level of expression of a gene and for assessing the adaptation of viral genes to their hosts. A higher CAI value indicates a better codon adaptation. Genes with well-adapted codons for efficient translation generally have CAIs of > 0.6 (Sharp and Li, 1987). The CAI was calculated on a server of the evolving code group at the University of Maryland (http://www.evolvingcode.net/codon/CAI Calculator.php). In this set of RNA viruses, GC contents correlated with CAIs with a Pearson correlation coefficient of 0.959 (p < 0.01) (Fig. 2a). CAIs varied widely among viruses, ranging from 0.288 for parainfluenza virus to 0.612 for rubella virus, with an average of 0.369.
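
The CAI calculation can be sketched as follows, in the general form given by Sharp and Li (1987): each codon receives a relative adaptiveness w (its count in the reference set divided by the count of the most used synonymous codon), and the CAI of a gene is the geometric mean of w over its codons. This is a sketch under stated assumptions, not the code of the server used here: the names `relative_adaptiveness` and `cai` are hypothetical, the reference counts are toy values rather than the highly expressed human gene set, and the pseudocount of 0.5 for unobserved codons is an assumed convention.

```python
import math
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def relative_adaptiveness(ref_counts, pseudocount=0.5):
    """Map each codon to w = count / count of the top synonymous codon."""
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    w = {}
    for aa, codons in families.items():
        # Stop codons and single-codon amino acids (Met, Trp) carry no choice.
        if aa == "*" or len(codons) == 1:
            continue
        # Assumption: unobserved codons get a small pseudocount to avoid log(0).
        counts = {c: ref_counts.get(c, 0) or pseudocount for c in codons}
        top = max(counts.values())
        for c, n in counts.items():
            w[c] = n / top
    return w


def cai(seq, w):
    """Geometric mean of w over the scored codons of a coding sequence."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    logs = [math.log(w[c]) for c in codons if c in w]
    if not logs:
        raise ValueError("sequence contains no scorable codons")
    return math.exp(sum(logs) / len(logs))


# Hypothetical toy reference counts for alanine and lysine codons only.
toy_ref = {"GCC": 40, "GCT": 20, "GCA": 10, "GCG": 10, "AAG": 30, "AAA": 10}
w = relative_adaptiveness(toy_ref)
print(cai("GCCAAG", w), cai("GCTAAG", w))
```

With a toy reference like this one, codon families that are absent from the reference all receive w = 1, so a realistic calculation needs counts for every family.
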
The strong correlation between GC content and CAI confirmed that the codon bias of RNA viruses is driven mainly by GC content, and consequently the positive-stranded viruses have a higher CAI than the negative-stranded viruses (0.403 versus 0.325, p < 0.001, t-test).

Since codons contain different numbers of G and C, amino acid content can be biased by GC content. To determine the influence of GC content on amino acid choice, I counted the amino acids glycine, alanine, arginine, and proline (GARP), whose codons are GC-rich. The GARP contents in this set of viruses show a Pearson correlation coefficient of 0.959 (p < 0.01) with GC content (Fig. 2b). This indicates that amino acid contents in the viral proteins are determined mainly by the GC contents of their genomes.

I further analyzed the nucleotide counts of the coding sequences of these viruses. Most positive-stranded viruses, HIVs, and all negative-stranded viruses have high A and low C (in the positive strands), although the positive-stranded viruses show a relatively lower level of bias. Some of the positive-stranded viruses and HTLVs, on the other hand, have low A and high C (Fig. 3). The reason for these two opposite patterns of bias is not clear. These patterns of nucleotide bias were similar when the first, second, and third positions of codons were analyzed separately (data not shown). This suggests that selection pressure on codon preference is not likely to be the cause of the nucleotide bias. Because a similar pattern (high A and low C) was observed in both positive- and negative-stranded viruses on the same plus strand, i.e. the genome of positive-stranded viruses and the anti-genome of negative-stranded viruses, the mechanism underlying the bias may be similar and act in a strand-specific manner. Because copying of both strands uses the same viral RNA polymerase and takes place in a similar intracellular environment, an intrinsic copying error of the enzyme is unlikely to cause the strand-specific nucleotide bias.
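
The quantities used in this part of the analysis (GARP content, nucleotide composition at each codon position, and the Pearson correlation coefficient) can be sketched as below. The function names are hypothetical and the inputs are toy sequences, not the viral data analyzed here.

```python
import math
from collections import Counter


def garp_fraction(protein):
    """Fraction of residues that are Gly, Ala, Arg, or Pro (one-letter codes)."""
    protein = protein.upper()
    return sum(protein.count(aa) for aa in "GARP") / len(protein)


def composition_by_position(cds):
    """Base frequencies at codon positions 1, 2, and 3 of a coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    result = []
    for pos in range(3):
        counts = Counter(cds[pos:usable:3])
        n = sum(counts.values())
        result.append({base: counts[base] / n for base in "ACGT"})
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy inputs for illustration only.
print(garp_fraction("GARPKK"))
print(composition_by_position("ATGAAA"))
```

Comparing the three dictionaries returned by `composition_by_position` shows whether a bias such as high A and low C is present at all codon positions or only at the third, largely synonymous, position.
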
Recently, a mechanism responsible for G to A hypermutation in HIV-1 by a host innate defense has been discovered (Mangeat et al., 2003; Shindo et al., 2003). Other types of RNA editing, some of which target double-stranded RNA, have also been reported in some RNA viruses (Galinski et al., 1992; Polson et al., 1996). If host responses are also responsible for mutational bias in other RNA viruses, it is possible that they are less effective against positive-stranded RNA genomes, as these might be recognized as self mRNA. It is also possible that the minus-strand RNA is the main target of the host RNA-editing mechanism. This would explain the strand-specific pattern of nucleotide bias. It might also explain the difference in nucleotide bias between viruses with different genome polarity, because positive-stranded viruses produce only limited amounts of minus-strand anti-genome, which may be well protected in their replication complex. Negative-stranded viruses, on the other hand, must produce large amounts of minus-strand RNA. While this explanation awaits further exploration, my analysis gives an initial clue to an interaction between strategies of genome replication (genome polarity) and mutational bias (GC content) of RNA viruses.
Andre et al., 1998. Increased immune response elicited by DNA vaccination with a synthetic gp120 sequence with optimized codon usage
Aota and Ikemura, 1986. Diversity in G + C content at the third position of codons in vertebrate genes and its cause
Berkhout et al., 2002. Codon and amino acid usage in retroviral genomes is consistent with virus-specific nucleotide pressure
Cattaneo et al., 1988. Biased hypermutation and other genetic changes in defective measles viruses in human brain infections
Chen et al., 2004. Codon usage between genomes is constrained by genome-wide mutational processes
Francino and Ochman, 1999. Isochores result from mutation not selection
Galinski et al., 1992. RNA editing in the phosphoprotein gene of the human parainfluenza virus type 3
Haas et al., 1996. Codon usage limitation in the expression of HIV-1 envelope glycoprotein
Ikemura and Wada, 1991. Evident diversity of codon usage patterns of human genes with respect to chromosome banding patterns and chromosome numbers; relation between nucleotide sequence data and cytogenetic data
Jenkins and Holmes, 2003. The extent of codon usage bias in human RNA viruses and its evolutionary origin
Kanaya et al., 2001. Codon usage and tRNA genes in eukaryotes: correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis
Lobry and Sueoka, 2002. Asymmetric directional mutation pressures in bacteria
Mangeat et al., 2003. Broad antiretroviral defence by human APOBEC3G through lethal editing of nascent reverse transcripts
Polson et al., 1996. RNA editing of hepatitis delta virus antigenome by dsRNA-adenosine deaminase
Sharp and Li, 1987. The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications
Shindo et al., 2003. The enzymatic activity of CEM15/Apobec-3G is essential for the regulation of the infectivity of HIV-1 virion but not a sole determinant of its antiviral activity
Sueoka, 1988. Directional mutation pressure and neutral molecular evolution
Sueoka, 1993. Directional mutation pressure, mutator mutations, and dynamics of molecular evolution
Vartanian et al., 2002. Sustained G → A hypermutation during reverse transcription of an entire human immunodeficiency virus type 1 strain Vau group O genome
Vartanian et al., 1994. G → A hypermutation of the human immunodeficiency virus type 1 genome: evidence for dCTP pool imbalance during reverse transcription

I would like to acknowledge support from the Ellison Foundation and the Thailand Research Fund.