key: cord-0997158-7t4lizde authors: Li, ZhiLiang; Wu, ShiRong; Chen, ZeCong; Ye, Nancy; Yang, ShengXi; Liao, ChunYang; Zhang, MengJun; Yang, Li; Mei, Hu; Yang, Yan; Zhao, Na; Zhou, Yuan; Zhou, Ping; Xiong, Qing; Xu, Hong; Liu, ShuShen; Ling, ZiHua; Chen, Gang; Li, GenRong title: Structural parameterization and functional prediction of antigenic polypeptome sequences with biological activity through quantitative sequence-activity models (QSAM) by molecular electronegativity edge-distance vector (VMED) date: 2007 journal: Sci China C Life Sci DOI: 10.1007/s11427-007-0080-7 sha: db70edfb7b8a15860bbd8e8f73942f404933798b doc_id: 997158 cord_uid: 7t4lizde
Using only the primary structures of peptides, a new set of descriptors called the molecular electronegativity edge-distance vector (VMED) was proposed and applied to describing and characterizing the molecular structures of oligopeptides and polypeptides. The descriptors are based on the electronegativity of each atom, or the electronic charge index (ECI) of atomic clusters, and on the bonding distance between atom pairs. Here, the molecular structures of antigenic polypeptides were expressed in this way in order to develop an automated technique for the computerized identification of helper T lymphocyte (Th) epitopes. Furthermore, a modified MED vector was proposed from the primary structures of polypeptides, based on the ECI and the relative bonding distance of the fundamental skeleton groups, with the side chain of each amino acid treated as a pseudo-atom. The developed VMED was easy to calculate and performed well. A quantitative model was established for 28 immunogenic or antigenic polypeptides (AGPP), with 14 A(d)-restricted sequences (1–14) and 14 sequences with other restrictions assigned activities of “1” (+) and “0” (−), respectively. The latter comprised 6 A(b) (15–20), 3 A(k) (21–23), 3 E(k) (24–26) and 2 H-2(k) (27 and 28) restricted sequences. 
Good results were obtained, with 90% correct classification (only 2 of 20 training samples wrong) and 100% correct prediction (none of 8 testing samples wrong); by contrast, the reference method gave 100% correct classification (none of 20 training samples wrong) but 88% correct prediction (1 of 8 testing samples wrong). Both stochastic sampling and cross validation were performed to demonstrate good performance. The described method may also be suitable for estimating and predicting class I and class II major histocompatibility antigen (MHC) epitopes in humans. It will be useful in immune identification and recognition of proteins and genes and in the design and development of subunit vaccines. Several quantitative structure-activity relationship (QSAR) models were developed by the multiple linear regression (MLR) method for various oligopeptides and polypeptides, including 58 dipeptides and 31 pentapeptides with angiotensin converting enzyme (ACE) inhibition. To demonstrate the ability of VMED to characterize the molecular structure of polypeptides, a QSAR modeling investigation was performed for functional prediction of polypeptide sequences with antigenic activity and heptapeptide sequences with tachykinin activity through quantitative sequence-activity models (QSAMs). The results showed that VMED exhibited both excellent structural selectivity and good activity prediction. Moreover, VMED behaved quite well for both QSAR and QSAM of poly- and oligopeptides, with estimation ability and prediction power equal to or better than those reported in previous references. Finally, a preliminary conclusion was drawn: both the classical and modified MED vectors are very useful structural descriptors. Some suggestions were proposed for further studies on QSAR/QSAM of proteins in various fields. 
In modern science and technology, including the life sciences and biotechnology, there is a trend from qualitative description toward quantitative regularity. Quantitative structure-activity relationship (QSAR) research has long been a very active field, especially in recent studies of biological macromolecules such as polypeptides, proteins, genes and nucleic acids. Great progress has been achieved with the completion of the human genome project (HGP) [1] [2] [3] . How to establish quantitative structure-activity relationships (QSARs) between biological sequences and functional activities, i.e. quantitative sequence-activity models (QSAM), has drawn much attention in biological, medical, pharmaceutical and related fields [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] , and great achievements have been obtained in three-dimensional (3D) structural prediction. In 1961, Anfinsen et al. [21] proposed that the primary structure of a protein completely determines its higher-order (3D) structure, which became one of the most important topics in the field. In 1993, Martin et al. [22] further demonstrated that the higher structural information is entirely contained in the primary structures of proteins. There are many success stories [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] , although extreme difficulties remain. Recent comprehensive approaches [22, 33] have been proposed and preliminary achievements obtained; among them, many descriptor variables are based on the amino acid side chains. As constituent segments of different proteins, peptides are very important in all living systems. They act as hormones, enzyme inhibitors, antibodies, olfactory and taste receptors, and antimicrobial agents, and perform other biological functions. Hence, they have attracted considerable pharmacological interest in recent years. 
With the development of peptide libraries, thousands of different peptides have been designed, synthesized and then subjected to a range of screening procedures and biological assays. To use a peptide library effectively, the biological data can be analyzed with multivariate quantitative structure-activity relationships. For peptides, a precise amino acid sequence is required for a particular function or biological activity. A QSAR model then indicates how a change in peptide sequence correlates with the variation in biological activity and how to modify the sequence to improve the activity. The basic assumption in QSAR is that the biological activity within a set of compounds is related to their structural variation, i.e., biological activity can be modeled as a function of molecular structure. In this context, quantitative amino acid descriptors have proven valuable. Since the pioneering work of Sneath [23] , who derived amino acid descriptors from semiqualitative physicochemical data for the 20 coded amino acids and used them in a quantitative sequence-activity model analysis of oxytocin-vasopressin analogues, many amino acid descriptors have been proposed for the 20 coded amino acids [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] . A notable development in QSAR is the use of amino acid "z scores" obtained by principal components analysis (PCA) of 29 physicochemical variables of the 20 coded amino acids [30] [31] [32] . The three resulting principal components (PCs), the so-called principal properties, are linear combinations of the original parameters and primarily represent the hydrophobicity, side-chain bulk, and electronic effects of the amino acids. The z scores have proven useful for modeling some biological activities of small peptides. By using only 12 physicochemical variables, Hellberg et al. [31] took a first step toward expanding these scales to encompass 35 noncoded amino acids. 
More recently, the same approach was expanded to more parameters for a larger set of amino acids (20 coded + 67 noncoded). Application of PCA resulted in a set of 5 orthogonal variables termed zz scores, of which the first three corresponded to the original z scores. The zz scores were applied, with good results [30, 32] , to two peptide data sets: elastase substrates and neurotensin analogues. However, all the amino acid descriptors mentioned above are derived by PCA from a data matrix comprising hydrophobic, steric, and electronic properties of amino acids. Thus each principal component is still a linear combination of different properties, which limits its physicochemical interpretation. In 1985, Kidera et al. [30] collected 188 properties of the 20 natural amino acids and applied factor analysis to obtain the 10 orthogonal factors that are most important for determining the three-dimensional structure of proteins. In 1987, Hellberg et al. [31] developed principal properties (PP), or z-scores, for 20 natural and more than 110 unnatural amino acids. The z-scores were extracted through PCA from collected experimental data on various peptides, such as HPLC retention times, pKa values, NMR-derived properties, and other measurable variables related to hydrophobicity, size, and electronic features. By using z-scores and multivariate statistics, some good regression models were generated by PLS for peptide QSARs on oxytocin, bradykinin and substance P receptors and on sweetener peptides [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] . In 1995, Collantes et al. [34] established good 3D-QSAR models by using two three-dimensional descriptors, the Isotropic Surface Area (ISA) and the Electronic Charge Index (ECI). In 1999, Zaliani et al. [35] performed QSAR studies on dipeptides, with good results, using steric and electrostatic 3D properties of the natural amino acids extracted and condensed from 36 statistical indices. In particular, Raychaudhury et al. 
[36] constructed a descriptor from the primary structure of polypeptides and used it to build a well-performing QSAR model. But most of these successful reports involved complex calculations for the structural characterization of the peptides. In our laboratories [37] [38] [39] [40] , a new set of descriptors called the molecular electronegativity distance/edge vector (VMED [39] [40] [41] [42] [43] [44] [45] [46] [47] [48] /VMEE [49, 50] ) was proposed to describe the molecular structure of peptides. It is derived only from the primary structure of peptides and is based on both the electronegativity of each atom and the distance between atoms. Several good quantitative structure-activity relationship models were built by multiple linear regression (MLR) for the biological activities of 58 angiotensin converting enzyme (ACE) inhibitors, 48 bitter tasting dipeptides (BTD), 31 bradykinin-potentiating pentapeptides (BPP), 24 tachykinin heptapeptide sequences (TAH) with rabbit pulmonary artery (RPA) activity, and 152 antigenic nonapeptides (AGN) with binding affinities related to HLA-A*0201-restricted CTL epitopes [52] [53] [54] [55] [56] [57] [58] ; here, polypeptides are taken to be decapeptides or longer. To demonstrate the ability of VMED to characterize the molecular structure of polypeptides, QSAR modeling was performed for functional prediction of polypeptide sequences with antigenic activity through quantitative sequence-activity models. The results showed that VMED exhibited both excellent structural selectivity and excellent activity prediction. In addition, the molecular structure of antigenic polypeptides needs to be well expressed in order to develop an automated technique for the computerized identification and/or recognition of helper T lymphocyte (HTL, Th) epitopes. 
Furthermore, a modified MED vector was proposed from the primary structure of polypeptides, based on both the electronic charge index and the relative bonding distance of the fundamental skeleton groups, with the side chain of each amino acid treated as a pseudo-atom. The developed VMED should be very useful in structural characterization and activity prediction of biological molecules, including HTL polypeptide sequences with major histocompatibility antigen (MHC) activity, because it is easy to calculate and performs well. A quantitative model was established for 28 HLA-A*0201-restricted CTL epitopes or immunogenic or antigenic polypeptides (AGPP), with 14 A^d-restricted sequences (1-14) and 14 sequences with other restrictions assigned activities of "1" (+) and "0" (−), respectively, the latter covering 6 A^b (15-20), 3 A^k (21-23), 3 E^k (24-26) and 2 H-2^k (27 and 28) restricted sequences. Stochastic sampling and cross validation were performed to demonstrate good performance. The proposed method may be suitable for estimating and predicting both class I and class II major histocompatibility antigen epitopes. It should be useful in the immune recognition of proteins and helpful in the design and development of subunit vaccines. In addition, the results showed that VMED behaved quite well for QSAR and QSAM of poly- and oligopeptides, with estimation ability and prediction capability equal to or better than those reported in previous references [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] .
It is a general rule in chemistry and physics that molecular structure determines properties and that properties reflect structure. For biological molecules in chemobiology, chemogenetics, chemoimmunology and chemopharmacy, bioactivities are likewise determined by molecular structure. A suitable characterization of biological molecular structure is therefore one of the most important fundamental elements of quantitative structure-bioactivity relationships (QSBR). 
The bioactivities and properties of compounds depend on the types of both the constituent atoms and the bonds connecting them, and reflect the combined result of all micro-interactions between atoms, mainly electronic interactions. By analogy with point charges in physics, the interaction between two atoms can be defined by the following equation: where q_i and q_j refer to the relative Pauling electronegativity (X_P) of the ith and jth atoms versus the carbon atom, d_ij refers to the bond-conjunction distance between the ith and jth atoms, obtained by adding up the number of bonds between them, and ε stands for the dielectric constant. Pauling's electronegativity is one of the most important and useful concepts in chemistry and related fields. It is not defined in detail here because it is widely applied and appears in many textbooks and monographs (see, for example, (a) Pauling, L, J Am Chem Soc. 1932, 54: 3570-3582; (b) Pauling, L. The Nature of the Chemical Bond, 3rd ed. Ithaca, New York: Cornell University, 1960; and references therein). Only the values for the relevant atoms are given: 2.55 for C, 3.04 for N and 3.44 for O, which are all the non-hydrogen atom types in peptides. When atoms i and j are both carbon atoms, q_i and q_j are both unity: q_i = q_j = X_P(C)/X_P(C) = 2.55/2.55 = 1. When atom i or j is not a carbon atom, the corresponding q is not unity: for a nitrogen atom (N), q = X_P(N)/X_P(C) = 3.04/2.55 = 1.192, and for an oxygen atom (O), q = X_P(O)/X_P(C) = 3.44/2.55 = 1.349. All interactions between atoms at the same bond-conjunction distance are then added up: the interactions between atoms whose bond-conjunction distance is one are summed, then those between atoms whose bond-conjunction distance is two, and so on. Thus a new set of parameters, VMEE (ν for short), has been proposed to characterize the molecular structure of peptides: where ν_k refers to the kth descriptor belonging to the ν vector for d_ij = k. 
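The relative electronegativities quoted above follow directly from the three Pauling values. A minimal sketch of that arithmetic is given here; the dictionary and function names are illustrative choices, not part of the original MED-LCBMP program.

```python
# Pauling electronegativities quoted in the text for the non-hydrogen atoms of peptides.
PAULING = {"C": 2.55, "N": 3.04, "O": 3.44}


def relative_electronegativity(element, reference="C"):
    """Return X_P(element) / X_P(reference), i.e. q relative to carbon."""
    return PAULING[element] / PAULING[reference]


# Round to three decimals, as in the text.
q = {el: round(relative_electronegativity(el), 3) for el in PAULING}
print(q)  # {'C': 1.0, 'N': 1.192, 'O': 1.349}
```

Carbon is the reference atom, so every carbon-carbon pair contributes a product q_i q_j of exactly 1 before any distance weighting is applied.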
Generally, the farther apart two atoms are, the weaker the interaction between them. So it seems sufficient to select ten elements, up to a bond-conjunction distance of ten, to characterize the molecular structure. For oligo- and polypeptide structures, the procedure for creating the molecular electronegativity edge vector [21] [22] [23] is illustrated briefly with the dipeptide AG (molecular graph omitted), as follows. Although the distance matrix of the non-hydrogen atoms in the AG molecule is not shown, the number of atom pairs at each distance can be counted directly. In the sample molecule AG, for the first element of the ν vector there are altogether 9 groups of interactions between adjacent atoms: 3 carbon-carbon interactions (1st-2nd, 2nd-4th, 7th-8th atoms), 3 carbon-nitrogen interactions (2nd-3rd, 4th-6th, 6th-7th atoms) and 3 carbon-oxygen interactions (4th-5th, 8th-9th, 8th-10th atoms). Next, for the second element of the ν vector, there are altogether 11 groups of interactions between atoms at a bond-conjunction distance of two: 2 carbon-carbon interactions (1st-4th, 4th-7th atoms), 4 carbon-nitrogen interactions (1st-3rd, 2nd-6th, 3rd-4th, 6th-8th atoms), 3 carbon-oxygen interactions (2nd-5th, 7th-9th, 7th-10th atoms), one nitrogen-oxygen interaction (5th-6th atoms) and one oxygen-oxygen interaction (9th-10th atoms, both bonded to the 8th atom). The third to sixth elements, which are non-zero, are calculated in the same way, and the seventh to tenth elements are all zero because the molecule contains no paths of length seven through ten. So all elements of the ν vector, from the first to the tenth, can be calculated as ν_1 = 12.6235; ν_2 = 4.3109; ν_3 = 1.9641; ν_4 = 0.9990; ν_5 = 0.5368; ν_6 = 0.1556; ν_7 = ν_8 = ν_9 = ν_10 = 0. Therefore the ν vector of the sample molecule AG is: ν = (12.6235, 4.3109, 1.9641, 0.9990, 0.5368, 0.1556, 0, 0, 0, 0). 
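The pair counting in the AG example can be checked mechanically. The sketch below rebuilds the molecular graph from the nine bonded pairs listed in the text, finds bond-conjunction distances by breadth-first search, and groups atom pairs by distance. The `kernel` function is an assumption: the exact interaction equation did not survive in the text, so an inverse-square Coulomb-like weighting is used purely for illustration, and the weighted sums are not claimed to reproduce the published ν values. All function and variable names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Non-hydrogen atoms of Ala-Gly, numbered as in the text (atom 1 = Ala methyl carbon).
ELEMENT = {1: "C", 2: "C", 3: "N", 4: "C", 5: "O",
           6: "N", 7: "C", 8: "C", 9: "O", 10: "O"}
# The nine bonded pairs listed in the text for the first element of the vector.
BONDS = [(1, 2), (2, 4), (7, 8), (2, 3), (4, 6), (6, 7), (4, 5), (8, 9), (8, 10)]
# Relative Pauling electronegativities versus carbon, as given in the text.
Q = {"C": 1.0, "N": 3.04 / 2.55, "O": 3.44 / 2.55}


def bond_distances(start, bonds):
    """Breadth-first search: number of bonds from `start` to every reachable atom."""
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nxt in neighbours[atom]:
            if nxt not in dist:
                dist[nxt] = dist[atom] + 1
                queue.append(nxt)
    return dist


def kernel(qi, qj, d):
    # ASSUMPTION: inverse-square Coulomb-like weighting; not confirmed by the text.
    return qi * qj / d ** 2


def distance_vector(elements, bonds, length=10):
    """Return (pair counts, summed interactions) for distances 1..length."""
    counts = [0] * length
    sums = [0.0] * length
    dist_from = {a: bond_distances(a, bonds) for a in elements}
    for i, j in combinations(sorted(elements), 2):
        d = dist_from[i][j]
        if d <= length:
            counts[d - 1] += 1
            sums[d - 1] += kernel(Q[elements[i]], Q[elements[j]], d)
    return counts, sums


counts, sums = distance_vector(ELEMENT, BONDS)
print(counts)  # [9, 11, 9, 6, 6, 4, 0, 0, 0, 0]
```

The counts of 9 and 11 for distances one and two agree with the enumeration in the text, and the longest path in AG spans six bonds, which is why the seventh to tenth elements vanish whatever weighting is chosen.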
The element values of the ν vector of any other peptide can be obtained in a similar way.
As one of the most complicated and diverse immune systems, the MHC is also called the human leukocyte antigen (HLA) system. The HLA system possesses 4 types; among them, type I consists of HLA-A, HLA-B and HLA-C and exists widely in various tissue cells. Type I MHC molecules are expressed in almost all eukaryotic cells, including cytotoxic T lymphocytes (CTL) and cells infected by foreign microorganisms. To demonstrate the ability of VMED to characterize the molecular structure of polypeptides, molecular modeling was further performed for functional prediction of polypeptide sequences with antigenic activity through QSAMs, using an extended VMED. The results showed that VMED exhibited both excellent structural selectivity and excellent activity prediction. In addition, the molecular structure of antigenic polypeptides needs to be well expressed in order to develop an automated technique for the computerized identification and/or recognition of HTL (Th) epitopes. Furthermore, by treating the side chain of each amino acid as a pseudo-atom, a modified MED vector was proposed from the primary structure of polypeptides, based on the ECI and the relative bonding distance (RBD) of the fundamental skeleton groups. The developed VMED should be very useful in structural characterization and activity prediction of biological molecules because it is easy to calculate and performs well. A quantitative model was established for 28 AGPP, with 14 A^d-restricted sequences (1-14) and 14 sequences with other restrictions assigned activities of "1" (+) and "0" (−), respectively, the latter covering 6 A^b (15-20), 3 A^k (21-23), 3 E^k (24-26) and 2 H-2^k (27 and 28) restricted sequences. VMED is now extended so that amino acid side chains take the place of the non-hydrogen atoms i and j, and the corresponding distance d_ij is taken as the length of both side chains. 
The electric charge of non-hydrogen atom i is replaced by its ECI [31] [32] [33] [34] [35] [36] [37] [38] [39] .
For various antigenic polypeptides i (i=1,2,...,n), the biological activity measured under unrelated conditions, y(i), can be described as a linear combination of the elements of the descriptor vector x(i,k) (k=1,2,...,m), which express different structural features:

$$y(i) = \sum_{k=1}^{m} b(k)\,x(i,k) + e(i)$$

where e(i) is the statistical residual or measurement noise, and the descriptor vector x(i,k) (k=1,2,...,m) is precisely the above-mentioned ν vector. In this case, the descriptor matrix X holds the independent descriptive variables and the biological activity matrix Y the dependent variables. The calibration parameters or combination coefficients, b(k), are usually obtained by indirect calibration methods, i.e., a calibration set consisting of n samples with known measured activities Y (matrix or vector) and descriptors X (matrix) is used to build the calibration model. Multiple linear regression (MLR) is the most frequently applied direct calibration method for this purpose. Stepwise multiple regression (SMR) is an alternative method. In MLR, a direct regression of X against Y is built. MLR solves the above equation to give the calibration model (4) and the prediction model (5):

$$B = (X^{T}X)^{-1}X^{T}Y \qquad (4)$$

$$Y_{un} = X_{un}B \qquad (5)$$

where B contains the calibration modeling coefficients, and Y_un and X_un are the unknown biological activities and the calculated descriptor vector or matrix, respectively.
All 28 selected AGPP, from undecapeptides through docosapeptides, are taken from refs. [31] [32] [33] [34] [35] [36] [37] [38] [39] (see Table 1 for details). There are 14 A^d-restricted sequences (1-14) and 14 other, non-A^d, restricted sequences, assigned activities of "1" (+) and "0" (−), respectively, the latter covering 6 A^b (15-20), 3 A^k (21-23), 3 E^k (24-26) and 2 H-2^k (27 and 28) restricted sequences. 
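The calibration and prediction steps of MLR described above can be sketched in a few lines. The example below is a minimal pure-Python illustration that solves the normal equations by Gauss-Jordan elimination; it is not the authors' MED-LCBMP code. The toy descriptor matrix, the leading column of ones used as an intercept, and all function names are assumptions made for illustration only.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_mlr(X, Y):
    """Calibration step: solve (X^T X) B = X^T Y for the coefficients B."""
    Xt = transpose(X)
    return solve(matmul(Xt, X), matmul(Xt, Y))


def predict(X_un, B):
    """Prediction step: Y_un = X_un B."""
    return matmul(X_un, B)


# Hypothetical toy data (not from Table 1): intercept column plus two descriptors.
X = [[1, 0.0, 1.0], [1, 1.0, 0.0], [1, 2.0, 1.0], [1, 3.0, 0.0]]
Y = [[1.0], [0.0], [1.0], [0.0]]
B = fit_mlr(X, Y)
Y_un = predict([[1, 1.5, 1.0]], B)
print(B, Y_un)
```

In this toy set the activity equals the second descriptor exactly, so the fitted coefficients are close to (0, 0, 1) and the unknown sample is predicted as active. With real descriptor vectors the columns of X can be strongly correlated, in which case the normal equations become ill-conditioned; that is one reason stepwise selection of descriptors is used as an alternative.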
Calculation of VMED for QSAMs was done on a personal computer with a program called MED-LCBMP, written in-house in Turbo C or Visual Basic. To demonstrate the ability of VMED to characterize the molecular structure of polypeptides, QSAM modeling was performed for functional prediction of polypeptide sequences with antigenic activity. Before the VMED ν was employed to establish a QSAM equation or QSAR model by the multiple linear regression (MLR) technique, all biological activity data were coded as the discrete values "1" and "0", because the original activities are given in the literature [35] [36] [37] only as "+" for active and "−" for inactive. In the regression computations, the immunogenic polypeptides were estimated and/or predicted through VMED as active or inactive when the calculated values were near "1" or "0", respectively (see Table 1 for the results). The results show that VMED exhibits both excellent structural selectivity and good activity prediction. In addition, the molecular structure of antigenic polypeptides was well expressed, which supports the development of an automated technique for the computerized identification and/or recognition of HTL (Th) epitopes. Furthermore, a modified MED vector was proposed from the primary structure of polypeptides, based on the ECI and RBD of the fundamental skeleton groups and/or the side chain of each amino acid treated as a pseudo-atom. The developed VMED should be very useful in structural characterization and activity prediction of biological molecules because it is easy to calculate and performs well. Quantitative models were established and tested by both stochastic sampling and cross validation in order to demonstrate good modeling performance. Stochastic sampling validation was done by arbitrarily selecting 8 samples (marked with the "*" symbol in Table 1) as the testing prediction set and the remaining 20 samples as the training calibration set from all 28 antigenic polypeptides [22] . 
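The two validation schemes mentioned above, a random hold-out split and leave-one-out cross validation, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: "near 1" and "near 0" are read as a 0.5 cutoff, which the text does not specify; a single-descriptor least-squares line stands in for the full MLR model; and the data are hypothetical, not taken from Table 1.

```python
import random


def split_holdout(n_samples, n_test, seed=0):
    """Randomly choose n_test sample indices for testing; the rest are for training."""
    rng = random.Random(seed)
    test = sorted(rng.sample(range(n_samples), n_test))
    train = [i for i in range(n_samples) if i not in test]
    return train, test


def percent_correct(n_wrong, n_total):
    return 100.0 * (n_total - n_wrong) / n_total


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single descriptor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def leave_one_out(xs, ys, cutoff=0.5):
    """Refit with each sample held out in turn; return the held-out class predictions."""
    predicted = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        # ASSUMPTION: calculated values at or above the cutoff count as "1" (active).
        predicted.append(1 if a + b * xs[i] >= cutoff else 0)
    return predicted


train, test = split_holdout(28, 8)
print(len(train), len(test))   # 20 8
print(percent_correct(2, 20))  # 90.0
print(percent_correct(1, 8))   # 87.5

# Hypothetical one-descriptor data, not taken from Table 1.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
pred = leave_one_out(xs, ys)
n_wrong = sum(p != y for p, y in zip(pred, ys))
print(n_wrong)
```

The percentages show where the reported figures come from: 2 errors in 20 training samples is 90%, and 1 error in 8 testing samples is 87.5%, which the paper rounds to 88%.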
The estimated and predicted results are shown in the 5th and 11th columns of Table 1 . Only 2 samples (Nos. 5 and 24) were wrongly classified when 20 samples were taken as the training set, which fully agrees with the case in which all 28 samples were taken as the training set and indicates that the developed model possesses good estimation stability. In addition, all 8 remaining samples in the testing set were correctly predicted, which indicates that the developed model possesses good prediction capability. In ref. [36] , although all 20 training samples were correctly classified, one of the 8 remaining testing samples was mistakenly predicted, which indicates that the referenced model has lower predictive ability. To further evaluate validation performance, cross validation with the leave-one-out procedure was carried out with the proposed method, with quite good results (see the 5th and 17th columns in Table 1 for details). All these results illustrate that the created model (F) possesses both satisfactory estimation stability and excellent prediction power. Additionally, the classification of Th epitopes shows that the active and inactive classes are not mutually exclusive. In other words, E^k restriction of an antigenic peptide does not necessarily exclude A^d restriction. Indeed, in refs. [52] [53] [54] [55] [56] [57] [58] , various subtypes of MHC (DR1, DR7, DR5) can identify the same antigenic peptide, i.e. some epitopes can have both one restriction (E^k) and another (A^d). Therefore, the sample (No. 24) "wrongly" classified here, an epitope with E^k restriction, may also show A^d restriction. Of course, this needs further experimental validation. So this does not mean that the classification in the reference is unsound; on the contrary, it reflects a characteristic of such epitopes. The described method, with good results (Y_model, Y_test) very close to those reported in refs. 
[31] [32] [33] [34] [35] [36] [37] obtained through a much simpler method than the ones in refs. [31] [32] [33] [34] [35] [36] [37] , may be very suitable for both estimation and prediction of human MHC class I and class II epitopes. It may also be useful for immune identification and antigenic recognition of both proteins and genes and for the design and development of various subunit vaccines. Moreover, the results show that VMED behaves quite well for both QSAR and QSAM of poly- and oligopeptides, with estimation ability and prediction power equal to or better than those reported in previous references. Finally, a preliminary conclusion may be drawn: both the classical and modified MED vectors are very useful structural descriptors. Some suggestions were proposed for further studies on QSAR/QSAM of proteins and nucleic acids in various fields.
Using only the primary structure of peptides, and based on Pauling's electronegativity of each atom and the distance between atoms, a new set of descriptors, VMEE, was proposed in our laboratories. Several QSAR models were built for the biological activities of 58 ACE inhibitors, 48 BTD dipeptides, 31 BPP agents, and 24 tachykinin (TAH) heptapeptides, using various simple but effective modeling methods such as MLR, stepwise multivariate regression (SMR), principal component regression (PCR) and so on. To demonstrate the ability of the descriptors to characterize the molecular structure of peptides, a further investigation was carried out on modeling the quantitative structure-activity relationship of 152 CTL epitopes (antigenic oligopeptides, nonapeptides). Here, the main factors were extracted on the basis of the standard regression coefficients of each element, and the results were close to or better than those in the literature (see Table 2 and Figure 1 , also see refs. [20] [21] [22] [23] [24] [25] [26] ). 
At the same time, some information on higher-order structure can be found from the main influencing factors extracted on the basis of the standard regression coefficients. In addition, the novel VMED ν has excellent structural selectivity and good activity estimation. Because it can be calculated easily from the primary structure alone, without requiring other knowledge of electrostatic or electronic, geometric-steric or stereoscopic, hydrophobic or lipophilic parameters for amino acid residues, this molecular electronegativity edge vector will be useful in structural characterization and activity prediction of biological macromolecules [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] , such as proteins and nucleic acids, owing to its high structural selectivity, good activity correlation and good computational performance. The related work is in progress. Furthermore, for polypeptides, atom-based construction becomes very heavy and complicated because of the large number of atoms, so a novel VMED ν method based on residue-level construction was developed. Certainly, some open problems require further work: 1) the novel skeleton- and residue-based VMED ν methods need to be investigated in more depth for molecular structure expression; especially for cases with multifunctional groups, the approach described here needs further improvement, and local characterization through a given skeleton is worth further examination; 2) the developed MED vectors seem suitable for structure expression and QSAR study of oligopeptides based on their primary structure, with accurate prediction. 
However, prediction of protein epitopes or antigenic determinants is involved in molecular immunology and chemical biology, research and development of vaccines, protection against and control of fatal diseases, and some other important problems [46] [47] [48] [49] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] [60] , including the design and preparation of vaccines to prevent and control SARS (atypical pneumonia) [60] and AIV (bird flu). Further approaches are required to really resolve these difficult QSAR/QSAM problems [60] [61] [62] [63] [64] [65] [66] [67] [68] by quantitative molecular modeling.
Table notes: a) R²_cu stands for the cumulative correlation coefficient of molecular modeling in the calibration set (n=58); b) Q²_cu stands for the cumulative correlation coefficient of cross validation in the prediction set (n-1); c) E_RMS refers to the root mean square error; d) n refers to samples, m to variables, l to latent variables, and nd means not determined; e) Mi stands for QSAM results obtained by SMR-MLR.
are thanked for addressing robustness problems and for helpful discussions
Human genome-Development of energy on the map A new strategy for genome sequencing International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome Computer-assisted Drug Design-Principle, Method and Application (in Chinese) Quantitative Drug Design: A Critical Introduction Chemical pattern recognition and multivariate analysis for QSAR studies Assessing the accuracy of protein secondary structure How many fold types of protein are there in nature? Influence of substrates on in vitro dephosphorylation of glycogen phosphorylase a by protein phosphatase-1 How good is the prediction of protein structural classes by the component-coupled method? 