key: cord-0785450-ehhotvqc authors: Huang, Ping; Yu, ShouYi; Ke, ChangWen title: Stepwise prediction and statistical screening: B-cell epitopes on neuraminidase of human avian H(5)N(1) virus date: 2008-11-28 journal: Chin Sci Bull DOI: 10.1007/s11434-008-0505-0 sha: a6759196de3f0e7751cc6ba28512370671fad309 doc_id: 785450 cord_uid: ehhotvqc The B-cell epitopes of a virus are relevant to antiviral drug and vaccine screening. After sequencing the neuraminidase (NA) gene of strain GD-01-06, we predicted the α-helix and β-fold structures and the flexible regions of the NA secondary structure, together with the hydrophilicity plot (Kyte-Doolittle), the surface probability plot (Emini) and the antigenic index (Jameson-Wolf), and then statistically screened these parameters for B-cell epitope prediction using hierarchical clustering, bivariate correlation and quartiles in SPSS 13.0. The impact of amino acid variation in NA on its epitopes was analyzed. The predictions were evaluated with Wu's antigenic index and SWISS-MODEL. The most probable epitopes on NA were located within or near residues (numbered from the N-terminus) 120–137, 81–84, 408–415, 273–282, 429–432, 356–368, 46–55, 146–155, 341–350 and 198–209, which were the dominant epitope regions of NA. Peptide 120–137, which includes the glycoprotein domain (NGT(126–128)), was the first-choice B-cell epitope on NA. NA of H(5)N(1) strains isolated after 2003 lacked amino acid 53 (I), which increased the surface flexible region of NA in GD-01-06 and enlarged its epitope region (VEP(46–48) → VEPISNTNFL(46–55)). In conclusion, predicting B-cell epitopes on NA from multiple parameters is useful for research on molecular immunology, drug screening and immunoprophylaxis. The deletion of amino acid 53 (I) in NA of strain GD-01-06 might increase its antigenicity.
Infection with human avian H5N1 virus causes acute respiratory disease in humans, and the H5N1 subtype is considered the most virulent of all influenza A viruses [1]. Neuraminidase (NA) is one of the main proteins of the influenza virion that induce protective antibody against the virus. As a tetramer of identical subunits of 50 kD, NA cleaves terminal sialic acid from glycoconjugates, such as those on the viral glycoproteins and on the surface of target cells in the respiratory tract. NA is a receptor-destroying enzyme: it removes sialic acid from carbohydrate chains attached to NA and releases the viruses from infected cells. Antibody to NA can therefore inhibit the release of viruses from infected cells. The NA gene has 1350 nucleotides encoding 449 amino acids (AA).
We isolated strain A/Guangdong/1/2006 (GD-01-06) in March 2006 and characterized it as a human influenza H5N1 strain. Taking NA of A/Hong Kong/482/1997 (HK-482-97, H5N1) as a reference, there were two significant differences in NA of all global H5N1 strains during 2003–2006 (including GD-01-06): (i) a deletion of AA 53 (isoleucine, I); (ii) a substitution at AA 49 (C49I) [2].
Focusing on B-cell epitopes of A/Memphis/31/98 (H3N2), Gulati et al. [3] found that sequence changes occurred at amino acid 198, 199, 220 or 221 in two loops of NA. A change at amino acid 198 reduced NA activity, but mutations at residue 199, 220 or 221 did not alter it. Parida et al. [4] predicted the B-cell epitopes of an H5N1 strain with the Kyte-Doolittle hydrophobicity plot. Many methods are currently available for predicting B-cell epitopes, and their results differ greatly from one method to another. Bui et al. [5] analyzed the published papers on epitopes in influenza viruses and suggested that research in this field be strengthened.
We sequenced the NA gene of GD-01-06, predicted its secondary structure and screened the predictive parameters for B-cell epitopes with molecular-biology and statistical software, in order to support research on pathogenesis, drug screening and immunoprophylaxis against avian and human H5N1 strains.
Primers (P NA-F, 5′-TATTGGTCTCAGGGAGCGAAAGCAGGAGT-3′; P NA-R, 5′-ATATGGTCTCGTATTAGTAGAAACAAGGAGTTTTTT-3′) were designed and synthesized according to the gene sequences of human avian H5N1 strains emerging in Southeast Asia during 2004–2006. The QIAamp Viral RNA Mini Kit (QIAGEN, Germany) was used to extract RNA from GD-01-06, and the gene fragment was then amplified by RT-PCR with QIAGEN Sensiscript Reverse Transcriptase and TaKaRa PyroBest Tag. The PCR products were purified with the QIAGEN Gel Extraction Kit and sequenced with the ABI PRISM BigDye Terminator V3.0 Ready Reaction Cycle Sequencing Kit on an ABI PRISM 3100 Genetic Analyzer [6]. In parallel, the same fragment was amplified with different primers to verify the correctness of the sequence.
The NA protein sequence of GD-01-06 was analyzed with Protean (molecular-biology software) [7]: the α-helix with Garnier-Robson (AG) and Chou-Fasman (AC); the β-fold with Garnier-Robson (BG) and Chou-Fasman (BC); the β-turn with Garnier-Robson (TG) and Chou-Fasman (TC); the coil with Garnier-Robson (CG); the flexible regions with Karplus-Schulz (FK); the hydrophilicity plot with Kyte-Doolittle (HPK); the surface probability plot with Emini (SPE); and the antigenic index with Jameson-Wolf (AIJ). The data obtained from the protein software were analyzed with SPSS 13.0 [8].
For the statistical analysis, each structural prediction was coded per residue as a binary variable: "A" as 1 and "no A" as 0 for α-helix, "B" as 1 and "no B" as 0 for β-fold, "T" as 1 and "no T" as 0 for β-turn, "C" as 1 and "no C" as 0 for coil, and "F" as 1 and "no F" as 0 for FK. The SPE, HPK and AIJ data were analyzed with normality tests and quartiles. All parameters were clustered and correlated with SPSS 13.0. Three or more consecutive amino acids were treated as a group in prediction. Based on α-helix, β-fold, flexible region, hydrophilicity plot, surface probability plot and antigenic index (AI), a screening procedure was established to predict and screen B-cell epitopes of NA. The predicted epitopes were evaluated with Wu's antigenic index (Wu's AI) [9]:
Average AI = Total AI / Number of amino acids
The amino acid variation sites in NA were compared before and after mutation to explore their influence on the NA protein. In addition, a 3D structural model of NA was built with SWISS-MODEL and evaluated with Swiss-PdbViewer [10, 11].
The NA gene had 1350 nucleotides encoding 449 amino acids; the percentages of nucleotides A, C, G and T were 29.5, 18.1, 25.5 and 26.9, respectively, and the C+G content was 43.6%. Compared with NA of HK-482-97 (1997), NA of GD-01-06 had a deletion of AA 53 (isoleucine, I) and a substitution at AA 49 (C49I). The isoelectric point of NA of GD-01-06 was pH 6.3, and its strongly basic amino acids (K, R), strongly acidic amino acids (D, E), hydrophobic amino acids (A, I, L, F, W, V) and polar amino acids (N, C, Q, S, T, Y) accounted for 9.1% (41/449), 8.0% (36/449), 30.0% (135/449) and 33.6% (151/449), respectively.
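The binary coding of the structural predictions, the grouping of three or more consecutive residues, and the average AI formula described in the methods can be illustrated with a minimal Python sketch. The function names (`encode`, `runs_of_ones`, `average_ai`), the example track and the AI values are hypothetical; they are not output from Protean or SPSS and only show the arithmetic.

```python
from itertools import groupby


def encode(track, symbol):
    """Code a per-residue prediction track as 1 (symbol present) or 0."""
    return [1 if state == symbol else 0 for state in track]


def runs_of_ones(flags, min_len=3):
    """Return 1-based inclusive (start, end) spans of >= min_len consecutive 1s."""
    spans = []
    position = 1
    for value, group in groupby(flags):
        length = len(list(group))
        if value == 1 and length >= min_len:
            spans.append((position, position + length - 1))
        position += length
    return spans


def average_ai(ai_values):
    """Average AI = total AI / number of amino acids in the segment."""
    return sum(ai_values) / len(ai_values)


# Hypothetical 13-residue beta-turn track: 'T' = turn predicted, '-' = none.
turn_track = "--TTT-TT-TTTT"
turn_flags = encode(turn_track, "T")
turn_groups = runs_of_ones(turn_flags)  # the two-residue run at 7-8 is dropped

# Made-up per-residue AI values for a three-residue segment.
segment_ai = average_ai([0.05, 0.10, 0.075])
```

Residue positions are reported 1-based so that spans can be compared directly with the residue numbers used in the text.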
With the Garnier-Robson and Chou-Fasman predictions, α-helix in NA of GD-01-06 accounted for 1.1% (5/449, AG) and 8.7% (39/449, AC), respectively; β-fold for 54.3% (244/449, BG) and 35.9% (161/449, BC); β-turn for 25.6% (115/449, TG) and 39.2% (176/449, TC); and coil for 19.4% (87/449, CG).
Hierarchical clustering of the 11 NA parameters showed that AG and AC belonged to one cluster; BG and BC to one cluster; TG, TC and CG to one cluster; and AIJ, SPE and HPK to another cluster (Figure 1). In terms of feature similarity, AG and AC both describe α-helix and may act as a cluster; BG and BC both describe β-fold and may act as a cluster; TG, TC and CG describe flexible regions and may act as a cluster; SPE, HPK and AIJ describe protein surface features and may be treated as a cluster. In addition, FK also describes flexible regions and was classified into the same cluster as TG, TC and CG. α-helix and β-fold are rigid secondary structures; they were independent of (or incompatible with) each other but complementary across regions.
Spearman bivariate correlations among the eleven parameters are shown in Table 1. Significant positive correlations (r > 0.50) included FK and SPE (0.558), HPK and AIJ (0.663), HPK and SPE (0.765), and AIJ and SPE (0.700); significant negative correlations (r < −0.50) included BG and TG (−0.631), BG and AIJ (−0.620), BC and TC (−0.598), and BC and AIJ (−0.509). The two β-fold predictions were correlated with each other (BG and BC, r = 0.444), as were the two α-helix predictions (AG and AC, r = 0.344). The correlation coefficient between the two β-turn predictions was 0.184. There were significant positive correlations among HPK, SPE and AIJ (r > 0.50).
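The correlations above were computed in SPSS 13.0. As an illustration only, Spearman's coefficient can be reproduced with the Python standard library by ranking both variables (tied values share their mean rank, which matters for the 0/1 structural codes) and taking the Pearson correlation of the ranks. The function names and the two short example vectors below are hypothetical and do not come from the NA data.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: a binary beta-fold code and an antigenic-index track.
fold_code = [1, 1, 0, 0, 1, 0]
antigenic = [-0.3, -0.1, 0.9, 1.2, 0.0, 0.7]
rho = spearman(fold_code, antigenic)  # negative: fold residues score lower
```

In this toy example the sign of `rho` is negative, mirroring the direction (not the magnitude) of the BG/BC versus AIJ correlations reported in Table 1.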
Moreover, BG and TC had a significant negative correlation (r = −0.631), as did BC and TC (r = −0.598), suggesting that BG and TC were complementary, as were BC and TC.
For NA, the maximum of HPK was 2.76, the minimum was −2.86 and the mean was 0.240; the 25%, 50% and 75% quartiles were −0.423, 0.265 and 0.940, respectively. The skewness coefficient of HPK and its standard error were −0.313 and 0.115, respectively, giving Z = −2.72; its kurtosis coefficient and standard error were 0.290 and 0.230, respectively, giving Z = 1.26. The HPK data were statistically treated as normally distributed. For NA, the maximum of SPE was 4.66, the minimum was 0.04 and the mean was 0.89; the 25%, 50% and 75% quartiles were 0.330, 0.640 and 1.15, respectively. The SPE data showed a skewed distribution. For NA, the maximum of AIJ was 3.4, the minimum was −0.6 and the mean was 0.69; the 25%, 50% and 75% quartiles were −0.20, 0.45 and 1.37, respectively. The AIJ data also showed a skewed distribution.
[Table 1: Spearman bivariate correlations among the eleven parameters; r, correlation coefficient; P, probability.]
B-cell epitopes are usually located in flexible regions. The predictive parameters for B-cell epitopes of NA comprised the following 8 groups: (i) AG and AC; (ii) BG and BC; (iii) TG and TC; (iv) CG; (v) FK; (vi) HPK; (vii) SPE; (viii) AIJ. According to their molecular characteristics, classification and correlation, the procedure for screening the predictive parameters comprised the following 3 steps: (i) Exclusive requirements. The epitope region contained no α-helix predicted by the two methods (AG and AC) and no β-fold predicted by the two methods (BG and BC) (α-helix and β-fold are usually located in different residue regions). (ii) Selective requirements. The selective requirements involved 3 parameters: HPK, SPE and AIJ.
The index values that met either of the following criteria were selected: (1) any one of the three index values was equal to or greater than the second quartile of its parameter (0.265 for HPK, 0.640 for SPE and 0.450 for AIJ in these data); or (2) any two of the three index values were equal to or greater than the second quartile, with the additional requirement that two of the three flexibility parameters TG (or TC), CG and FK were positive. (iii) Chosen residues. Groups of three or more consecutive amino acids were chosen as epitope candidates, and the preliminary prediction was carried out. The comprehensive prediction was conducted according to the above 3 steps.
The predicted B-cell epitopes are shown in Table 2. The average AI of the B-cell epitopes in NA was assessed according to Wu's antigenic index (Wu's AI) [9] (Table 2). According to the above methods, VEP(46–48), with a Wu's AI of 0.075, was selected as an epitope before the deletion of I53 (as in HK-485-97); by comparison, VEPISNTNFL(46–55), with a Wu's AI of 0.029, was selected as an epitope after the deletion of I53 (as in GD-01-06).
The substitution at amino acid 49 (C49I) changed the secondary structure of NA in GD-01-06. Its impacts were as follows: (i) a change in characteristics at amino acids 42–54; (ii) a decrease of β-turn (TG) at amino acids 51–52, a decrease of β-turn (TC) at amino acids 49 and 50, and an increase of β-turn (TC) at amino acids 53 and 54; (iii) an increase of coil (CG) at amino acids 42, 44, 50 and 51; (iv) a decrease of the hydrophilicity plot from 8.08 to 6.08 (a mean of 0.22 per amino acid) over amino acids 45–53; (v) a decrease of the antigenic index from 15.00 to 11.15 (a mean of 0.296 per amino acid) over amino acids 42–54; (vi) an increase of the surface probability plot from 4.18 to 5.46 (a mean of 0.213 per amino acid) over amino acids 46–51.
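The three-step screening procedure (exclusive requirements, selective requirements, runs of three or more residues) can be sketched in Python as follows. This is an interpretation under stated assumptions, not the authors' implementation: the `Residue` record, the function names and the example residues are hypothetical; the helix and fold flags are assumed to be pre-combined across the two prediction methods; and criterion (1) is read here as requiring all three indices to reach their second quartile, because a literal "at least one" reading would make criterion (2) redundant.

```python
from dataclasses import dataclass


@dataclass
class Residue:
    helix: bool     # alpha-helix flag (assumed combined from AG and AC)
    fold: bool      # beta-fold flag (assumed combined from BG and BC)
    turn: bool      # beta-turn predicted by TG or TC
    coil: bool      # coil predicted by CG
    flexible: bool  # flexible region predicted by FK
    hpk: float      # Kyte-Doolittle hydrophilicity
    spe: float      # Emini surface probability
    aij: float      # Jameson-Wolf antigenic index


# Second quartiles (medians) reported in the text for the NA data.
MEDIANS = {"hpk": 0.265, "spe": 0.640, "aij": 0.450}


def passes_screen(res):
    """Apply steps (i) and (ii) to one residue (assumed interpretation)."""
    # Step (i): exclude residues in predicted alpha-helix or beta-fold.
    if res.helix or res.fold:
        return False
    # Step (ii): count indices at or above their second quartile.
    high = sum((res.hpk >= MEDIANS["hpk"],
                res.spe >= MEDIANS["spe"],
                res.aij >= MEDIANS["aij"]))
    flexible = sum((res.turn, res.coil, res.flexible))
    return high == 3 or (high >= 2 and flexible >= 2)


def candidate_epitopes(residues, min_len=3):
    """Step (iii): 1-based spans of >= min_len consecutive passing residues."""
    spans, start = [], None
    for pos, res in enumerate(residues, start=1):
        if passes_screen(res):
            if start is None:
                start = pos
        else:
            if start is not None and pos - start >= min_len:
                spans.append((start, pos - 1))
            start = None
    if start is not None and len(residues) - start + 1 >= min_len:
        spans.append((start, len(residues)))
    return spans


# Hypothetical residues, for illustration only.
good = Residue(False, False, True, True, False, 0.5, 0.7, 0.5)
partial = Residue(False, False, True, False, True, 0.5, 0.1, 0.5)
weak = Residue(False, False, True, True, True, 0.0, 0.1, 0.5)
helical = Residue(True, False, True, True, True, 0.5, 0.7, 0.5)
spans = candidate_epitopes([weak, good, partial, good, helical, good, good])
```

With these made-up residues, only positions 2 to 4 form a run long enough to be kept; the final two passing residues are discarded because the run is shorter than three.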
It was deduced that the C49I substitution of NA increased the SPE value but decreased the AIJ value.
The 3D structure of NA of GD-01-06 was modeled with SWISS-MODEL [10]. The N1 structural model covered amino acid residues 63 to 447 with 98% identity, and the X-ray resolution was 2.5 Å. Non-rigid structure accounted for 50.1% (196/385) of the NA model, while 76.1% (108/142) of the residues in Table 2 had non-rigid structure in the N1 3D model. The 3D structure of NA (GD-01-06) was displayed with Swiss-PdbViewer (Figure 2).
In general, α-helix and β-fold are rigid structures that favor the stability of a protein, whereas β-turn and coil tend to act as epitopes. Flexible regions, which may depend on hydrophilicity, surface probability and antigenic index, also tend to act as epitopes. An accurate epitope map is critical for research on pathogenesis, immunoprophylaxis and immunotherapy. Predicting B-cell epitopes and synthesizing peptides for experimental confirmation according to the characteristics of B-cell epitopes is both economical and effective [9, 12].
The glycoprotein region of NA (LNDKHSNGTVKDRSPHRT(120–137)) in H5N1 strains during 1997–2006 had a relatively high Wu's AI (AI = 0.053) and possibly acts as a B-cell epitope [6]. The deletion of I53 of NA changed nine of the eleven parameters (all except α-helix) and increased the flexible region, and the candidate epitope VEP(46–48) enlarged to VEPISNTNFL(46–55), which indicated a change in antigenicity. The C49I substitution of NA in GD-01-06 changed the characteristics of NA in eleven parameters. Taken together, the deletion of I53 changed the antigenicity of NA, whereas the C49I substitution changed the features of the protein, with some of the 11 parameters increasing and others decreasing. Basler et al.
[13] studied the Cys residues with Cys-Gly substitutions and found that C49 did not form a disulfide bond in NA. In this research, the C49I substitution in NA increased the surface probability and decreased the antigenic index, so it was considered to have little effect on the antigenicity of NA.
As the designers of SWISS-MODEL warn, the result of any modeling procedure is non-experimental and must be considered with care. The N1 three-dimensional structure of NA of GD-01-06 built with SWISS-MODEL did not mark the epitopes of the N1 protein but indicated its rigid structure. For the top ten epitopes shown in Table 2, except for residues 81–84 and 341–350, 86.2% (75/87) of the amino acids in the remaining regions had non-rigid structure in the N1 3D model, indicating good agreement between our predictions and the SWISS-MODEL results, both of which are valuable references.
Russell et al. [14] established 3D models of two groups (G1, G2), of which G1 included the NA of H5N1 strains (N1 protein). Their research differed from ours: the former addressed the domain on NA targeted by oseltamivir (Tamiflu) against viral infection, whereas the latter focused on B-cell epitopes on NA in viral infection. A few similarities between the two papers were as follows: (i) peptide 120–137 in this research was the most active epitope (Wu's AI = 0.053), while Russell et al. found a cavity region in the 3D structure of the N1 protein in the G1 group, centered at residues 147–152, as a structural domain related to oseltamivir (Tamiflu), an anti-N1 drug; (ii) Russell et al. found that there were Val149, Asp151, Arg156, etc.
related to the cavity region, while this research showed that V129, D131 and R136 fell within the LNDKHSNGTVKDRSPHRT(120–137) epitope; the residue numbers on the N1 model in Russell's paper were 20 higher than those in this paper.
This research focused only on standardized, objective prediction and screening of the epitopes, and the final conclusions need to be confirmed by experiments. Although many predictive methods are available at present, each molecular-biology analysis program has its own merits and demerits, so a comprehensive analysis is necessary. The main predictive parameters are the hydrophilicity plot, the surface probability plot and the flexible regions, and the assisting parameters are α-helix, β-fold, β-turn and coil; the predictions were evaluated by Wu's AI and SWISS-MODEL. Agreement among multiple parameters is the guiding principle of prediction and helps to avoid the shortcomings of any single method. The appropriateness of the requirements remains to be verified by further experimental research.
Characteristics and evolutions on neuraminidase genes of human avian H5N1 influenza strains (in Chinese)
Antibody epitopes on the neuraminidase of a recent H3N2 influenza virus (A/Memphis/31/98)
Computational analysis of proteome of H5N1 avian influenza virus to define T cell epitopes with vaccine potential
Ab and T cell epitopes of influenza A virus, knowledge and opportunities
Variation and evolution on NP genes of human avian H5N1 virus strains
Prediction for secondary structure and B-cell epitopes of fusion region in EWSFLI1 protein of Ewing's sarcoma (in Chinese)
Application of Statistic Software SPSS 13.0 (in Chinese)
A new approach for B-cell epitope prediction in viral proteins
The SWISS-MODEL Workspace: A web-based environment for protein structure homology modeling
An environment for comparative protein modeling
Prediction of the B-cell epitope for the S protein of SARS coronavirus (in Chinese)
Mutation of neuraminidase cysteine residues yields temperature-sensitive influenza viruses
The structure of H5N1 avian influenza neuraminidase suggests new opportunities for drug design