key: cord-0753361-h6fwi0qn authors: Li, Q.-G.; Lindman, K.; Wadell, G. title: Hydropathic characteristics of adenovirus hexons date: 1997-07-01 journal: Arch Virol DOI: 10.1007/s007050050162 sha: 4acc93d79bf6e7caf784e9eba57cce742b32a3f1 doc_id: 753361 cord_uid: h6fwi0qn

The complete nucleotide sequence and the predicted amino acid sequence of the adenovirus type 7 hexon gene were determined. The hydropathy of the hexon proteins from human adenovirus types 2, 3, 4, 5, 7, 12, 16, 40, 41, and 48, bovine adenovirus type 3, murine adenovirus type 1, and avian adenovirus types 1 and 10 was analysed. The presence of purines and pyrimidines in the second position of the codons was correlated to hydrophilicity and hydrophobicity, respectively. Comparison of the hydrophilicity plots of eight hexons showed seven hypervariable regions to be distributed on the surface. A large portion of the hypervariable regions manifests hydrophilicity. The strength of the surface charge accumulated on the hydrophilic and hydrophobic regions correlated to the tissue tropism of the different adenovirus types. Analysis of codon usage for adenovirus hexons showed that among synonymous codons those with cytidine in the third position were preferably used to a great extent. Analysis of the nucleotide and amino acid sequence pair distances and the phylogenetic tree of 14 hexon proteins showed members of subgenera B, D and E to be closely related, especially Ad4 and Ad16, and subgenus A to be closely related to subgenus F.

The family Adenoviridae comprises two genera Mastadenovirus and Aviadenovirus. The genus Mastadenovirus consists of 101 species that have been isolated from nine different host species [24]. The most extensively studied group is that of the human adenoviruses. So far, 49 human adenovirus (Ad) serotypes have been identified [27] and divided into six different subgenera, A to F, which differ from each other in various characteristics such as tropism.
Adenovirus type 7 (Ad7) is the serotype most frequently associated with severe diseases [17].

Adenoviruses are non-enveloped icosahedral viruses. The virion contains at least eleven different structural polypeptides. The hexon is the most abundantly produced protein. Each virion contains 240 hexons which form the facets of the icosahedron. The nine central hexon capsomers in each facet are cemented together by polypeptide IX to form a group-of-nine [7]. The hexon is a homotrimer consisting of three identical polypeptide chains. The monomer is an unusually large structural protein with an Mr ranging from 102 to 109 k (Table 2). As several entire or partial hexon sequences of different serotypes have been published (references cited in Table 2), the hexons of several different serotypes have been compared at the nucleotide and predicted amino acid sequence levels [9, 14, 31, 36]. The biochemical and immunological properties and even the three-dimensional structure of adenoviruses have been extensively studied. The trimeric hexon molecule has a pseudo-hexagonal base with a large central cavity and a triangular top. The base contains two pedestal domains, P1 and P2. The top contains three long loops, L1, L2 and L4 [3].

However, the hydropathic character of the adenovirus hexons and their codon usage have not been analysed. The hydropathy of a protein is very important for predicting putative antigenic regions of the protein. The antigenic determinants can be deduced by searching the amino acid sequence for the areas of greatest local hydrophilicity. Generally, the highest peak of hydrophilicity correctly predicts an antigenic site [11]. Codon usage bias for codons with thymidine (T) or cytidine (C) at second positions in mitochondrial DNA has been shown to be correlated with hydrophobicity [21].
Having analysed nine adenovirus hexon genes, we found the presence of purine and pyrimidine at the second codon position to be correlated to hydrophilicity and hydrophobicity, respectively [19]. This finding was recently confirmed by analysis of further data obtained from GenBank. Here, we report the hydropathy analysis of 14 adenovirus hexon sequences predicted from a newly determined Ad7 hexon DNA sequence and thirteen published hexon sequences of Ad2, Ad3, Ad4, Ad5, Ad12, Ad16, Ad40, Ad41, Ad48, Bav3, Mav1, Fav1 and Fav10. There is strong correlation between hydrophilicity and the codons with purine in the second position (CPUSs), and between hydrophobicity and the codons with pyrimidine in the second position (CPYSs) in adenovirus hexons. Comparison of the surface charge on these 14 hexons suggests the strength of the surface charge accumulated in hydrophilic regions or hydrophobic regions to be correlated with the tissue tropism of the different adenovirus types.

The Ad7 prototype strain Gomen, originally from the American Type Culture Collection (ATCC), was obtained from Dr. G. von Zeipel, Stockholm County Council Central Microbiological Laboratory, Stockholm, Sweden [34]. This strain was propagated in A549 cells. The viral DNA of Ad7 strain Gomen was prepared as described [18]. The purified DNA was used as the template for PCR to amplify five fragments which covered the entire hexon gene. The PCR products were cloned into pT7 Blue vector. Ligation, transformation and rapid screening were performed using the pT7 Blue T-vector Kit protocols (Novagen, Madison, WI, USA). The recombinant pT7-Blue plasmid DNA was prepared for sequencing.
As the PCR method occasionally gives rise to errors when the complementary DNA strand is synthesised using Taq polymerase, the sequence so obtained was confirmed with the following procedure. The Ad7 hexon DNA Bam HI restriction fragments A and E, which contain the hexon gene, were cloned into plasmid vector pBR322 and multiplied in E. coli strain HB101 using conventional methods described previously [25]. These recombinant plasmid DNAs were used as the sequencing templates. The nucleotide sequences were determined for both strands with the dideoxynucleotide chain termination method, following procedure C of the AutoRead Sequencing Kit (Pharmacia LKB Biotech, Sweden) and the manufacturer's instructions for the ALF DNA Sequencer (Pharmacia).

All the sequence data were analysed with the LaserGene software (DNAStar Inc., Madison, WI) programs EditSeq, MegAlign and Protean. Every hexon DNA sequence was translated to a protein sequence with the program EditSeq-Translation. The codon usage data were obtained by opening the protein file in EditSeq. The hydrophilicity plot (Kyte-Doolittle method) was produced with the program Protean. All eight hydrophilicity plots were assembled with the program Microsoft PowerPoint (Fig. 1). The isoelectric point and the net charge of a protein were obtained with the program Protean-Composition (Table 2). Tabular data of hydrophilicity and hydrophobicity for each amino acid were obtained with the program Protean-TabularData. The accumulated charge of all hydrophilic regions of a protein could be obtained by manually deleting all the hydrophobic regions according to the tabular data. An easier way was, first, to delete the hydrophobic regions within the DNA sequence; second, to translate this DNA sequence to an amino acid sequence; and then to obtain the accumulated charge of the hydrophilic regions with the program Protean-Composition. The accumulated charge of the hydrophobic regions was obtained in the same way (Table 2).
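The splitting of a protein into hydrophilic and hydrophobic residues and the accumulation of charge over each part, as described above, can be sketched in a few lines of Python. This is a minimal illustration and not a reproduction of the Protean programs: the window length of 9, the sign convention (negative window mean taken as hydrophilic), the integer side-chain charges near pH 7 (His and the chain termini ignored) and the function names are all assumptions made for the example. Only the Kyte-Doolittle index values are taken from the published scale [15].

```python
# Kyte-Doolittle hydropathy index; positive values are hydrophobic.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# ASSUMPTION: simplified integer charges near pH 7; His and termini ignored.
# The charge model used by Protean-Composition is not reproduced here.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def hydropathy_profile(seq, window=9):
    """Mean Kyte-Doolittle value over a window centred on each residue.

    Windows are truncated at the ends of the chain.
    """
    half = window // 2
    profile = []
    for i in range(len(seq)):
        chunk = seq[max(0, i - half): i + half + 1]
        profile.append(sum(KD[aa] for aa in chunk) / len(chunk))
    return profile


def accumulated_charges(seq, window=9):
    """Return (hydrophilic_charge, hydrophobic_charge) for a protein sequence.

    A residue counts as hydrophilic when its window mean is negative
    (ASSUMPTION: threshold of zero on the Kyte-Doolittle scale).
    """
    philic = phobic = 0
    for aa, score in zip(seq, hydropathy_profile(seq, window)):
        q = CHARGE.get(aa, 0)
        if score < 0:
            philic += q
        else:
            phobic += q
    return philic, phobic


if __name__ == "__main__":
    toy = "MDEEKRDSLLIVVAFLIDG"  # hypothetical peptide, not a hexon sequence
    print(accumulated_charges(toy))
```

Charges computed this way will differ in detail from the values in Table 2, which depend on the pKa model of the original software; the sketch only shows how the two accumulated charges are separated.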
The data of codon usage of accumulated hydrophilic and hydrophobic regions from 14 different serotypes were analysed with the computer program Microsoft Excel (Tables 1 and 5).

The complete hexon gene DNA sequence of the Ad7 prototype strain, Gomen, was determined. It consists of 2814 nucleotides. This DNA sequence, together with two short flanking regions at each end, has been entered in the European Molecular Biology Laboratory database (EMBL, accession number: Z48571). The 5′ flanking region has a splice acceptor site. The 3′ flanking region ends just before the start codon of proteinase. The sequence of the predicted protein, consisting of 937 amino acids, was obtained with the LaserGene software program EditSeq.

[Figure caption fragment: "... Fig. 3. The figures around the boxes denote the amino acid numbers at the start and the end of each hypervariable region"]

The hydropathy data of hexon proteins from human adenovirus types 2, 3, 4, 5, 7, 12, 16, 40, 41, and 48, bovine adenovirus type 3, murine adenovirus type 1, and avian adenovirus types 1 and 10 were derived using the prediction method of Kyte-Doolittle in the LaserGene computer program Protean. Analysis of the codon usage for the codons corresponding to hydrophilic and hydrophobic regions showed the presence of CPUSs and CPYSs to be strongly correlated with hydrophilicity and hydrophobicity, respectively. Statistical analysis using the chi-square test showed a highly significant difference (χ² = 192.79; P < 0.001) for the total numbers of CPUSs and CPYSs in hydrophilic and hydrophobic regions (Table 1).

The multiple sequence alignments of the 14 complete nucleotide sequences and the amino acid sequences of adenovirus hexons were determined with the program MegAlign (data not shown). The nucleotide and amino acid sequence pair distances and the phylogenetic tree of 14 hexon proteins showed serotypes of subgenera B, D and E to be closely related (Table 3 and Fig. 2). They manifest 82.0–93.9% amino acid sequence homology.
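The chi-square comparison of CPUS and CPYS totals in hydrophilic and hydrophobic regions reported above can be illustrated with a small sketch. It assumes that the test was a Pearson chi-square on a 2 × 2 table (region type by codon class, one degree of freedom, no continuity correction); the text does not state the exact layout. The counts below are hypothetical placeholders and are not the values of Table 1.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom


# Critical value of the chi-square distribution for 1 degree of freedom at P = 0.001.
CRITICAL_P001_DF1 = 10.828

if __name__ == "__main__":
    # HYPOTHETICAL counts (rows: hydrophilic, hydrophobic; columns: CPUS, CPYS).
    cpus_philic, cpys_philic = 300, 150
    cpus_phobic, cpys_phobic = 120, 280
    chi2 = chi_square_2x2(cpus_philic, cpys_philic, cpus_phobic, cpys_phobic)
    print(f"chi-square = {chi2:.2f}; significant at P < 0.001: {chi2 > CRITICAL_P001_DF1}")
```

Any statistic above 10.828 corresponds to P < 0.001 at one degree of freedom, so the reported value of 192.79 lies far beyond that threshold under this assumed layout.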
Ad4 (subgenus E) is very similar to Ad16 (subgenus B), the amino acid sequence homology between these two serotypes reaching 93.4%. This is in agreement with the cross neutralisation seen between antihexon sera specific for Ad4 and Ad16 [22]. The alignment data also showed Ad12 (subgenus A) to be closely related to Ad40 and Ad41 (subgenus F), the level of sequence homology being 81.5–87.1%. In contrast, the sequence similarity of E3a proteins between Ad12 and Ad41 was very low, only 30% in E3 RL1 and 34% in E3 RL2 [37]. The alignment of the 14 complete sequences of adenovirus hexons showed nucleotide sequence homology to be 77–94% within a subgenus of human adenoviruses; 65–77% between members of different subgenera, with the exception of Ad4 and Ad16 (86.2%); 64–94% between adenoviruses from the same host species; 35–59% between adenoviruses from different host species; and 35–43% between the two adenovirus genera Mastadenovirus and Aviadenovirus.

The alignment of the 14 complete amino acid sequences of adenovirus hexons revealed the existence of seven hypervariable regions (Fig. 3). Hydrophilicity plots derived from the hexon sequences of eight serotypes representing eight subgenera of Mastadenovirus were compared (Fig. 1). Seven hypervariable regions were demonstrated (in boxes, corresponding to the boxes in Fig. 3). The first four hypervariable regions, A1 to A4, covered most of the external loop 1 (L1) of the hexon. They were separated by three short, relatively conserved sequences which may stabilise the outer shell structure of the hexon. L2 contains two hypervariable regions, B1 and B2. All of the hypervariable regions in L1 and L2 contained longer or shorter deletions/insertions. The amino acid sequences of these regions were serotype specific. The hypervariable region D of Mastadenovirus, consisting of 13–14 amino acids, was located in L4. Amino acid sequence homology showed region D to be subgenus specific.
Sequence homology was 93–100% for pairs of serotypes within a subgenus, and less than 79% between serotypes belonging to different subgenera. Interestingly, the major portion of the hypervariable regions manifests hydrophilicity. According to the ribbon representation of the Ad2 hexon subunit [3], the major portion of the hydrophilic regions in hypervariable areas was located at the surface of the hexon molecule.

Codons with cytidine in the third position are highly preferred

Analysis of codon usage for the 14 serotypes showed that among the synonymous codons those with cytidine in the third position (NNC) were highly preferred (Table 1). Among CPUSs the bias for NNC codons was strong. However, although the codon preference for NNC among CPYSs was generally manifest, this was not the case for the NNC codons for the hydrophobic amino acids Ile, Leu and Val.

The isoelectric point and the surface charge for hexons of the 14 serotypes were deduced with the program LaserGene-Protean-Composition (Table 2). At pH 7, the charges of the hexon proteins from different subgenera varied widely. The hexons of Ad2 and Ad5, belonging to subgenus C, carried the highest negative charge, −26.73 and −22.57, respectively. These charges were about 2.5 times greater than those of hexons of subgenus F, Bav3, Mav1 and Fav10. More interestingly, the strength of the accumulated charge of hydrophilic or hydrophobic regions correlated with tissue tropism. The prototypes of Ad3, Ad7, Ad2 and Ad5 were isolated from respiratory tract specimens. The prototype of Ad3 was isolated from nasal washing, Ad7 from throat washing, and both Ad2 and Ad5 from adenoid tissue cultures [6]. These adenoviruses were frequently isolated from patients with respiratory diseases [26]. The hexons of these four serotypes were characterised by strong accumulated charges of −12.3 to −20.97 in hydrophilic regions, but manifestly weaker charges of only −3.62 to −5.85 in hydrophobic regions.
The prototypes of Ad16 and Bav3 were isolated from conjunctival scrapings and conjunctiva, respectively [6]; the hydrophobic regions of their hexons manifest charges of −11.62 and −10.51. Ad4 can be isolated both from the eye and the respiratory tract [16]. The negative charges of the hydrophilic and the hydrophobic regions of the Ad4 hexon were similar, −8.16 and −9.48, respectively. The prototypes of Ad12, Ad40, Ad41 and Fav10 were isolated from faeces or rectal swabs [6, 8, 32]. All four of these enteric adenoviruses were characterised by lower negative charges of −3.81 to −7.78 in both hydrophilic and hydrophobic regions of their hexons. Mav1, isolated from spleen, and Fav1, isolated from allantois [6], manifested individually unique patterns of surface charge.

The hydropathic type and value of an amino acid is dependent on its codon usage and its position in a protein structure. Scales for evaluating the hydropathic characteristics of amino acids have been developed by many different research groups. However, the hydrophilicity and hydrophobicity values obtained differ substantially [10, 11, 15]. The most frequently used scale, introduced by Kyte and Doolittle, was used in the present study. As ranked on hydropathy scales [15], the 20 common amino acids can be divided into three different classes. The first is the hydrophobic class including Ile, Val, Leu, Phe and Cys. Of these five amino acids, only Cys is encoded by CPUSs, the other four being encoded by CPYSs. The second is the hydrophilic class consisting of His, Glu, Gln, Asp, Asn, Lys and Arg, all of

In general, an A+G-rich region (A, adenosine; G, guanine) in a nucleotide sequence contains more CPUS codons, and a C+T-rich region contains more CPYS codons. Therefore, the part of a protein which is encoded by an A+G-rich region usually manifests hydrophilicity, and a peptide encoded by a C+T-rich region usually shows hydrophobic characteristics.
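The grouping of codons by the base in the second position can be checked directly against the standard genetic code. The sketch below builds the standard codon table (DNA alphabet, T in place of U) and lists, for each amino acid, whether its codons are CPUSs, CPYSs or both. The function and variable names are introduced here for illustration only.

```python
from collections import defaultdict

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_class(codon):
    """CPUS if the second base is a purine (A or G), otherwise CPYS."""
    return "CPUS" if codon[1] in "AG" else "CPYS"


def classes_by_amino_acid():
    """Map each amino acid (one-letter code) to the set of codon classes encoding it."""
    out = defaultdict(set)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # skip stop codons
            out[aa].add(codon_class(codon))
    return dict(out)


if __name__ == "__main__":
    for aa, classes in sorted(classes_by_amino_acid().items()):
        print(aa, sorted(classes))
```

Under the standard code, Ile, Val, Leu and Phe are encoded only by CPYSs and Cys only by CPUSs, in line with the statement above; Ser is the one amino acid encoded by both classes.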
The hydropathy value of an amino acid in a protein chain is dependent on the protein conformation. The residue side-chains can protrude at the interior or exterior portion of a protein chain. The hydropathy value of an amino acid was determined by averaging over a window which contains several consecutive residues surrounding the amino acid in question [15]. Therefore, the strength and even the type of the hydropathy of an individual amino acid may vary according to its location within a peptide chain. In particular, the hydropathy of the amino acids in the intermediate class was found to vary more frequently and in a more pronounced way (Table 1).

The major portion of hydrophobic sequences in a protein will be found in the interior of the native structure, and the major portion of hydrophilic sequences will be found on the exterior [15]. Branden and Tooze [5] found that the hydropathy plots (Kyte-Doolittle method) agree with the crystal structure data in a polypeptide of R. sphaeroides. In this study, we found the major portion of the hypervariable areas to manifest hydrophilicity. The major portion of the hydrophilic regions in hypervariable areas is located at the surface of the crystal structure of the hexon.

The NNC codons in adenovirus hexons manifested high usage. The result is compatible with that derived from analysis of 28 different genes of Ad2 (except for Phe) [33]. Among the synonymous codons for Phe, UUC was preferred over UUU in the 14 hexon genes (Table 1). However, the reverse was true of the 28 different Ad2 genes, UUC accounting for 1.44% and UUU for 2.07%. Phe, with its aromatic side chain, is highly hydrophobic, nonpolar and chemically stable. The variability of the Phe residue is one of the lowest during divergence of homologous proteins [10]. The patterns of codon usage for Phe in hexon genes were shown to be subgenus specific (Table 4).
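The codon tallies discussed above (the share of NNC codons and the UUC/UUU choice for Phe) can be obtained from any coding sequence with a short script. This is a minimal sketch: it assumes a DNA or RNA coding sequence read in frame from its first base, the helper names are hypothetical, and the toy sequence is not taken from a hexon gene. The share computed here is over all codons and is therefore a cruder measure than the per-amino-acid comparison of synonymous codons in Tables 1 and 5.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons of a coding sequence (frame 0; U is read as T)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def third_position_share(counts, base="C"):
    """Fraction of all counted codons whose third base is `base`."""
    total = sum(counts.values())
    hits = sum(n for codon, n in counts.items() if codon[2] == base)
    return hits / total if total else 0.0


def phe_usage(counts):
    """Return (UUC count, UUU count), using the DNA spellings TTC and TTT."""
    return counts.get("TTC", 0), counts.get("TTT", 0)


if __name__ == "__main__":
    toy = "TTCTTCTTTGACGAT"  # HYPOTHETICAL sequence: Phe Phe Phe Asp Asp
    counts = codon_counts(toy)
    print("NNC share:", third_position_share(counts))
    print("UUC, UUU:", phe_usage(counts))
```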
The codon usage for Phe in the hexon genes of the species of each subgenus is highly similar within subgenera B, C and F. The codon preferably used for Phe in hexon genes of members of subgenera A, C and F is UUU, whereas UUC is preferably used in hexon genes of the species of subgenera B, D and E. These findings corroborate the relatedness of the VA RNA 1 genes [13], and also the relatedness of the hexon genes of human adenoviruses [4].

Table 4 (Phe codon counts in the 14 hexon genes; the serotype column headers were lost in extraction):
UUC 12 30 28 30 20 19 38 36 21 17 18 5 31 40
UUU 37 17 20 19 25 28 13 11 29 30 29 46 15 9

We found the NNC codons in adenovirus hexon genes to be highly preferred. Analysis of the data obtained from GenBank [33]

The C+G content of different organisms is used as one of the criteria in taxonomy. Among various eubacteria the C+G content of the genome DNA ranges from 25% to 75% [20]. In Adenoviridae the DNA C+G content ranges from 48 to 61% for mastadenoviruses and from 54 to 55% for aviadenoviruses [24]. In the six different subgenera of human adenoviruses the C+G content of genome DNA ranges from 48% to 58% [35]. The C+G content of all 14 serotypes reveals the existence of two groups: a higher C+G content group (54.59–58.23%) including Ad48, Ad4, Bav3, Fav1 and Fav10, and a lower C+G content group (47.48–50.78%) including the remaining nine serotypes (Table 2). The C+G content of human adenovirus hexons was consistent with the C+G content of whole adenovirus genome DNA, with the exception of subgenus C (Ad2 and Ad5), which has a higher C+G content (58%) in genome DNA.

There are 15 amino acids encoded by synonymous codons which contain NNC (Table 5). Of these 15 amino acids, nine manifested high NNC usage, three (Asp, Ile and Pro) showed the same level of NNC usage, and only three (Cys, Leu and Val) showed low NNC usage. Cys is a rare amino acid, and there are only four cysteines in the Ad7 hexon. Among the synonymous codons for Val, seven GUCs appeared in the hydrophobic region.
We therefore conclude that the NNC codons in the Ad7 hexon were frequently used, although Ad7 is one of the serotypes with a lower C+G content in hexon DNA.

1. Fav1 (CELO) genes encoding for major core, hexon-associated and hexon proteins
2. The gene for the adenovirus 2 hexon polypeptide
3. The refined crystal structure of hexon, the major coat protein of adenovirus type 2, at 2.9 Å resolution
4. Phylogenetic relationships among adenovirus serotypes
5. Introduction to protein structure
6. Adenoviruses. In: American Type Culture Collection (ATCC) Catalogue of Animal & Antisera, Chlamydiae & Rickettsiae, 6th ed. American Type Culture Collection
7. The structure of the adenovirus capsid II. The packing symmetry of hexon and its implications for viral architecture
8. Adenoviruses of chickens: serologic groups
9. Analysis of 15 adenovirus hexon proteins reveals the location and structure of seven hypervariable regions containing serotype-specific residues
10. Reconstructing evolution from contemporary sequences (chapter 3.3) and The properties of liquid water and the characteristics of noncovalent interactions in this solvent
11. Prediction of protein antigenic determinants from amino acid sequences
12. Sequence homology between bovine and human adenoviruses
13. Human and simian adenoviruses: phylogenetic inferences from analysis of VA RNA genes
14. Adenovirus hexon: Sequence comparison of subgroup C serotypes 2 and 5
15. A simple method for displaying the hydropathic character of a protein
16. Nosocomial conjunctivitis caused by adenovirus type 4
17. Analysis of 15 different genome types of adenovirus type 7 isolated on five continents
18. Genetic relationship between thirteen genome types of adenovirus 11, 34 and 35 with different tropisms
19. Hydropathic character analysis of nine adenovirus hexon sequences
20. The guanine and cytosine content of genomic DNA and bacterial evolution
21. Hydrophobicity and phylogeny
22. Immunological relationships between hexons of certain human adenoviruses
23. Sequence characterisation and comparison of human adenovirus subgenus B and E hexons
24. Adenoviridae
25. Molecular cloning
26. World-wide epidemiology of human adenovirus infections
27. Two new candidate adenovirus serotypes
28. Genomic mapping and sequence analysis of the fowl adenovirus serotype 10 hexon gene
29. Nucleotide sequence of human adenovirus type 12 DNA: comparative functional analysis
30. DNA sequence of the adenovirus type 41 hexon gene and predicted structure of the protein
31. The adenovirus type 40 hexon: sequence, predicted structure and relationship to other adenovirus hexons
32. Isolation and classification of avian enteric cytopathogenic agents
33. Codon usage tabulated from the GenBank genetic sequence data
34. Demonstration of three different subtypes of adenovirus type 7 by DNA restriction site mapping
35. Adenoviruses
36. Sequence and structural analysis of murine adenovirus type 1 hexon
37. Genetic organization, size, and complete sequence of eight region 3 genes of human adenovirus type 41

Authors' address: Dr. G. Wadell, Department of Virology, Umeå University, S-901 85 Umeå, Sweden.

Received October 25, 1996