key: cord-0005271-rx0os4ke authors: Wu, G.; Yan, S. title: Prediction of mutations engineered by randomness in H5N1 hemagglutinins of influenza A virus date: 2007-11-02 journal: Amino Acids DOI: 10.1007/s00726-007-0602-4 sha: 1fd124821f7eedea3ffbf092113a33cb8c3e58e0 doc_id: 5271 cord_uid: rx0os4ke
This is the continuation of our studies on the prediction of mutations engineered by randomness in proteins from influenza A virus. In our previous studies, we demonstrated that randomness plays a role in engineering mutations, because the measures of randomness in a protein differ before and after mutation. We therefore built a cause-mutation relationship to count the mutations engineered by randomness, and conducted several concept-initiated studies to predict the mutations in proteins from influenza A virus, which demonstrated that prediction along this line of thought is possible. These concept-initiated studies also indicated directions for enhancing predictability, one of which is to use a neural network instead of the logistic regression used in those studies. In this proof-of-concept study, we attempt to apply the neural network to modeling the cause-mutation relationship in order to predict the possible mutation positions, and then we use the amino acid mutating probability to predict the would-be-mutated amino acids at the predicted positions. The results confirm that the internal cause-mutation relationship, modeled with a neural network, can be used to predict the mutation positions, and that the amino acid mutating probability can be used to predict the would-be-mutated amino acids.
The unpredictable mutations in the proteins from influenza A virus not only threaten the world with possible pandemics/epidemics, but also raise the issue of how to predict the mutations accurately, precisely and reliably.
Generally speaking, the simplest and best way to predict mutations is to find their cause. We could then build either a qualitative or a quantitative cause-mutation relationship, by which we could predict the mutations. Nevertheless, the current research on prediction of mutations is going along this line of thought. However, many causes that historically led to mutations might never have left any clue because of the great changes in environments. Therefore we would probably have a detailed record of mutations, but a poor record of mutation causes. Moreover, because of the evolution of influenza A virus, the current versions of its proteins might no longer be subject to the causes that led to the mutations in the past. The third difficulty is that we cannot determine the historical micro-environment under which the mutations occurred.
However, randomness should play a role in engineering mutations, not only because pure chance is now considered to lie at the very heart of nature (Everitt, 1999) and the occurrence of mutation is generally considered a random event (Fitch et al., 1997), but more importantly because our previous studies show that the randomness is different before and after mutation (Wu and Yan, 2001b, d, e, 2002a, b, 2003a-h, 2004a-c, f, 2005a) when our methods are used to quantify the randomness within a protein. In practice, randomness simply means that an amino acid with a larger mutation probability would mutate more easily than an amino acid with a smaller mutation probability. Hence, we can establish a cause-mutation relationship, because we have quantified randomness as a partial cause, and we obtain the occurrence or non-occurrence of mutations by comparing parent and daughter proteins along a branch of the evolutionary tree determined by phylogenetics. In addition, we can code the occurrence and non-occurrence of mutations as unity and zero, respectively.
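The coding of mutation occurrence as unity and zero can be illustrated with a minimal sketch. It assumes that the parent and daughter sequences are already aligned and of equal length; the function name and the two short sequences are hypothetical and are not taken from the hemagglutinin data.

```python
def mutation_targets(parent: str, daughter: str) -> list[int]:
    """Mark each position 1 if parent and daughter residues differ, else 0."""
    if len(parent) != len(daughter):
        # Assumption of this sketch: the two sequences are pre-aligned.
        raise ValueError("sequences must be aligned to the same length")
    return [int(p != d) for p, d in zip(parent, daughter)]


# Hypothetical toy sequences that differ only at the last position.
print(mutation_targets("MEKIVLLF", "MEKIVLLL"))  # [0, 0, 0, 0, 0, 0, 0, 1]
```

The resulting list of zeros and ones is the binary target that a classifier would be asked to reproduce.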
This is very suggestive, because such a cause-mutation relationship can be recast as a classification problem, which can be solved using either logistic regression in statistics (Draper and Smith, 1981; Hosmer and Lemeshow, 2000) or a neural network model (Demuth and Beale, 2001). Still, we need to solve the problem of predicting the would-be-mutated amino acids, that is, to which type of amino acid will a given amino acid mutate? This problem remains because our cause-mutation relationship at this moment deals with binary events, namely the occurrence or non-occurrence of mutations. Here we face a more complicated problem, because at least 20 types of amino acids need to be taken into account, which makes the classification method and other methods too difficult to apply. All this implies that we need at least two steps for accurate, precise and reliable prediction of mutations: (i) the prediction of mutation positions and (ii) the prediction of would-be-mutated amino acids at the predicted positions.
Within this two-step frame, we have recently applied logistic regression to predicting the mutation positions (Wu and Yan, 2006e, f, 2007a-c) and then applied the amino acid mutating probability (Wu and Yan, 2005g, 2006a) to predicting the would-be-mutated amino acids at the predicted positions in proteins from influenza A virus. The results show that our logic is convincing. This leads us to consider using a more powerful classification method, the neural network, to enhance the predictability of mutation positions and to further confirm our logic on the cause-mutation relationship before large-scale and fully detailed studies.
In this proof-of-concept study, we attempt to use the neural network to predict the mutation positions and then apply the amino acid mutating probability to predict the would-be-mutated amino acids at the predicted positions in H5N1 hemagglutinin from influenza A virus, because the hemagglutinin is the major surface antigen of influenza virus, against which neutralizing antibodies are elicited during virus infection and vaccination (Wiley and Skehel, 1987). The hemagglutinins include many subtypes, of which the H5N1 hemagglutinin is the one currently threatening humans.
The amino acid sequences and corresponding RNA sequences of 339 H5N1 hemagglutinins from influenza A virus from 1996 to 2005 were obtained from the influenza virus resources (Influenza virus resources, 2006). As our approach is not yet familiar to most researchers, we describe the methods in more detail.
As the cause-mutation relationship couples three types of quantified randomness developed by us with the occurrence and non-occurrence of mutation, we would expect the model to have three inputs and one output. After exploration, we finally use the feedforward backpropagation neural network as the prediction model (MathWorks Inc., 2001), whose network structure is 3-6-1 (Fig. 1), i.e. the first layer contains three neurons corresponding to three inputs (or three elements of input in neural network terminology), the second layer contains six neurons, and the last layer contains one neuron corresponding to the target (output). The transfer functions for the three layers are tan-sigmoid, tan-sigmoid and log-sigmoid, respectively. The training algorithm is resilient backpropagation, which is the fastest algorithm on pattern recognition (Demuth and Beale, 2001).
Fig. 1. The 3-6-1 feedforward backpropagation neural network. Each circle represents a neuron. IW{1} is the input weights, LW{2,1} is the layer weights to the second layer from the first layer, and LW{3,2} is the layer weights to the third layer from the second layer. b{1}, b{2} and b{3} are the biases related to each neuron at the first, second, and third layers
The first quantification, the amino acid pair predictability, is calculated according to permutation, and we have used it to study various proteins (Wu, 1999, 2000a; Wu and Yan, 2000a-c, 2001a-c, 2002a-d, 2003a-h, 2004a-e, 2005a-d, f, 2006b, d-f, 2007a). In general, this amino acid pair predictability is very sensitive to changes in neighboring amino acids, and answers why a type of amino acid is adjacent to a certain type of amino acid but not to the others. Besides, the reason for using the amino acid pair is that a good signature pattern of a protein must be as short as possible, but the conserved sequence is not longer than four or five residues (Prosite, 2002). The simplest calculations are as follows. According to the permutation, for example, there are 45 serines (S) and 48 leucines (L) in the 2004 chicken H5N1 hemagglutinin (accession number AY653200), so the predicted frequency of the amino acid pair "SL" is 4 (45/568 × 48/567 × 567 = 3.803), that is, "SL" would appear four times in this hemagglutinin, which is also the reference for comparison. We actually do find 4 "SL", so the amino acid pair "SL" is predictable and the difference between its actual and predicted frequencies is 0. Again, there are 30 alanines (A) and 39 isoleucines (I) in AY653200 hemagglutinin, and the frequency of random presence of "AI" is 2 (30/568 × 39/567 × 567 = 2.060), i.e. there would be two "AI" in the hemagglutinin. But "AI" appears seven times in reality, so the difference between its actual and predicted frequencies is 5. After such calculations, each amino acid pair has its difference between actual and predicted frequencies.
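The "SL" and "AI" calculations above can be reproduced with a short sketch. The function names are hypothetical; the sketch covers only pairs of two different amino acid types, as in the two worked examples, and rounds the predicted frequency to the nearest integer before taking the difference, as the text does.

```python
def predicted_pair_frequency(n_first: int, n_second: int, length: int) -> float:
    """Predicted count of an ordered pair of two different amino acid types.

    Follows the permutation calculation in the text:
    n_first/length * n_second/(length - 1) * (length - 1).
    """
    pairs = length - 1  # number of adjacent pairs in the sequence
    return n_first / length * n_second / pairs * pairs


def pair_difference(actual: int, n_first: int, n_second: int, length: int) -> int:
    """Difference between actual and (rounded) predicted pair frequencies."""
    return actual - round(predicted_pair_frequency(n_first, n_second, length))


# Values quoted in the text for AY653200 hemagglutinin (568 residues).
print(round(predicted_pair_frequency(45, 48, 568), 3))  # "SL": 3.803
print(round(predicted_pair_frequency(30, 39, 568), 3))  # "AI": 2.06
print(pair_difference(4, 45, 48, 568))  # "SL": 0
print(pair_difference(7, 30, 39, 568))  # "AI": 5
```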
As a point mutation is relevant to a single amino acid, which, except for a terminal one, connects with two neighboring amino acids and so forms two amino acid pairs, we assign to each amino acid the sum of the differences between actual and predicted frequencies of its two neighboring amino acid pairs.
The second quantification, the amino acid distribution probability, is calculated according to the occupancy of subpopulations and partitions (Feller, 1968), and we have used it to study various proteins (Wu and Yan, 2000d, 2001d, e, 2002c-f, 2004f, 2005d, e, 2006c, f, 2007a; Gao et al., 2006). In general, this quantification is mainly subject to any change in the position of amino acids, and answers why the majority of amino acids cluster in some regions rather than distribute homogeneously along the primary structure of a protein. The quantification is developed along the following line of thought. For example, there are two methionines (M) among 142 amino acids in the human hemoglobin α-chain (Wu and Yan, 2000d). With regard to their random distribution, our intuition may suggest that there would be one M in the first half of the chain and another M in the second half, which is true in the real-life case. In fact, there are only three possible distributions of Ms in the human hemoglobin α-chain, i.e. (i) both Ms are in the first half, (ii) one M is in each half and (iii) both Ms are in the second half. If we do not distinguish the first half from the second half but simply ask whether the two Ms lie one in each half or both in the same half, we have the probability of 1/2 for each distribution. If we are interested in the distribution probability of three amino acids in a protein, we naturally imagine grouping the protein into three parts, and our intuition may suggest that each part contains an amino acid. If we do not distinguish the first, second and third parts, there are actually three types of distributions in total, i.e. (i) each part contains an amino acid, (ii) two amino acids are in a part and an amino acid is in another part, and (iii) three amino acids are in a part. However, the distribution probabilities are different for these three types of distributions, namely 0.2222 for (i), 0.6667 for (ii) and 0.1111 for (iii). Clearly the protein can only adopt one type of distribution for these three amino acids, whose probability is the actual distribution probability, and we may guess that distribution (ii) is more likely to happen because it has the largest probability, which is the predicted distribution probability. For four amino acids, we have five distributions, i.e. (i) each part contains an amino acid, (ii) a part contains two amino acids and two parts contain an amino acid each, (iii) two parts contain two amino acids each, (iv) a part contains an amino acid and a part contains three amino acids, and (v) a part contains four amino acids. Their distribution probabilities are 0.0938 for (i), 0.5625 for (ii), 0.1406 for (iii), 0.1875 for (iv) and 0.0156 for (v). Further, we have seven distributions for five amino acids, 11 distributions for six amino acids, 15 distributions for seven amino acids, and so on. So we view the positions of each kind of amino acid in a protein as a certain distribution, whose probability can be calculated according to the equation $\frac{r!}{q_0!\, q_1! \cdots q_n!} \times \frac{r!}{r_1!\, r_2! \cdots r_n!} \times n^{-r}$ (Feller, 1968), where $r$ is the number of amino acids, $n$ is the number of parts and is equal to $r$ in our case, $r_n$ is the number of amino acids in the $n$-th part, $q_n$ is the number of parts with the same number of amino acids, and ! is the factorial function. In fact, this distribution probability can be referred to statistical mechanics, which classifies the distribution of elementary particles in energy states according to three assumptions about whether each particle and energy state is distinguished, i.e. the Maxwell-Boltzmann, Fermi-Dirac and Bose-Einstein assumptions (Feller, 1968).
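The distribution probabilities quoted above for three and four amino acids can be checked with a short sketch of the occupancy calculation. It assumes the standard occupancy result in Feller (1968): the number of ways to choose which parts hold which occupancy numbers, times the number of ways to place r distinguishable residues with those occupancy numbers, divided by n to the power r. The function names are hypothetical, and the first factor is written with n!, which equals r! here because n = r.

```python
from collections import Counter
from math import factorial


def partitions(r, largest=None):
    """Yield every occupancy pattern of r residues as a tuple, largest part first."""
    if largest is None:
        largest = r
    if r == 0:
        yield ()
        return
    for first in range(min(r, largest), 0, -1):
        for rest in partitions(r - first, first):
            yield (first,) + rest


def distribution_probability(occupancy, n):
    """Probability of one occupancy pattern when r residues fall randomly into n parts."""
    r = sum(occupancy)
    counts = list(occupancy) + [0] * (n - len(occupancy))  # pad with empty parts
    q = Counter(counts)  # q[k] = number of parts that hold exactly k residues
    ways_parts = factorial(n)
    for m in q.values():
        ways_parts //= factorial(m)
    ways_residues = factorial(r)
    for k in counts:
        ways_residues //= factorial(k)
    return ways_parts * ways_residues / n ** r


for r in (3, 4):
    for pattern in partitions(r):
        print(r, pattern, round(distribution_probability(pattern, r), 4))
# r = 3: (3,) 0.1111, (2, 1) 0.6667, (1, 1, 1) 0.2222
# r = 4: (4,) 0.0156, (3, 1) 0.1875, (2, 2) 0.1406, (2, 1, 1) 0.5625, (1, 1, 1, 1) 0.0938
```

Counting the patterns produced by `partitions` also reproduces the numbers of distributions stated in the text (7 for five, 11 for six and 15 for seven amino acids).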
In plain words, this distribution probability is the probability we would have if we received seven letters in a week and the letters were distributed randomly over the days. With respect to the hemagglutinins in this study, for instance, there are 20 glutamines (Q) in AY653200 hemagglutinin. Their predicted and actual distribution probabilities are 0.0965 and 0.0128, so the ratio of predicted versus actual distribution probabilities is 7.539, whose natural logarithm is 2.0201, which can be assigned to each Q in the sequence.
The third quantification, the future composition of amino acids, is calculated according to the translation probability between RNA codons and translated amino acids (Wu and Yan, 2005g, 2006a), and we have used it to study various proteins (Wu and Yan, 2005g, 2006a, e, f, 2007a). In general, this quantification is mainly subject to the future mutation trend, and answers with what probability an amino acid mutates to another type of amino acid. This quantification is developed along the following line of thought. For example, we are interested in the amino acid threonine and the amino acids it can mutate to, with their mutating probabilities. As the RNA codons have an unambiguous relationship with their translated amino acids, we can extend this question to the RNA level, that is, a point mutation in an RNA codon leads to the mutation at the amino acid level. Threonine is related to the RNA codons ACU, ACC, ACA and ACG. A mutation at the first position of ACU can change ACU to CCU, GCU and UCU, which corresponds to threonine mutating to proline, alanine and serine at the amino acid level. Similarly, a mutation at the second position of ACU can lead threonine to mutate to isoleucine, asparagine and serine, while a mutation at the third position of ACU leaves threonine unchanged in all three cases. Taking the four RNA codons together, threonine would mutate as follows: 4 alanines + 2 arginines + 2 asparagines + 3 isoleucines + 2 lysines + 1 methionine + 4 prolines + 6 serines + 12 threonines.
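The threonine tally above can be reproduced by enumerating every single-base change of each codon. The sketch below assumes the standard genetic code; the names `CODON_TABLE` and `mutation_counts` are hypothetical. How the paper's Table 1 treats changes to a stop codon is not stated, so this sketch simply keeps them as a separate category marked `*`, which does not affect threonine.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code listed in UCAG order for each codon position; '*' is a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def mutation_counts(amino_acid: str) -> Counter:
    """Count the amino acids reached by every single-base change of every codon."""
    counts = Counter()
    for codon, translated in CODON_TABLE.items():
        if translated != amino_acid:
            continue
        for pos in range(3):
            for base in BASES:
                if base != codon[pos]:
                    mutant = codon[:pos] + base + codon[pos + 1:]
                    counts[CODON_TABLE[mutant]] += 1
    return counts


thr = mutation_counts("T")
print(sorted(thr.items()))
# [('A', 4), ('I', 3), ('K', 2), ('M', 1), ('N', 2), ('P', 4), ('R', 2), ('S', 6), ('T', 12)]
print(sum(thr.values()))  # 36
```

Dividing each count by the total of 36 gives the threonine mutating probabilities.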
Thus we have the threonine mutating probability to these amino acids, namely 4/36 to alanine, 2/36 to arginine, 2/36 to asparagine, 3/36 to isoleucine, 2/36 to lysine, 1/36 to methionine, 4/36 to proline, 6/36 to serine and 12/36 to threonine. For all 20 types of amino acids, we have the amino acid mutating probability in Table 1.
For the calculation of the future composition of amino acids, we have the following steps: (i) we would expect that "A" has a 12/36 chance of mutating to "A" (line 2 in Table 1), "R" and "N" have no chance of mutating to "A" (lines 3 and 4 in Table 1), "D" has a 2/18 chance (line 5 in Table 1), "C" has no chance (line 6 in Table 1), "E" has a 2/18 chance, and so on. (ii) Meanwhile, we know that there are 30 "A", 24 "R", 47 "N", 26 "D", 15 "C", 40 "E", and so on in AY653200 hemagglutinin. (iii) So we can estimate how many "A" can result from mutation using 30 × 12/36 + 24 × 0 + 47 × 0 + 26 × 2/18 + 15 × 0 + 40 × 2/18 + ..., and so on. In total, this is the future composition of amino acid "A". (iv) After calculating all 20 kinds of amino acids, "A" contributes 5.9077% of the future composition in the hemagglutinin. (v) On the other hand, "A" contributes 5.2817% (30/568) of the current composition in AY653200 hemagglutinin. (vi) Thus, we have the ratio of future versus current compositions, for example, the ratio for "A" is 1.1185 (5.9077%/5.2817%), which can be assigned to each "A" in AY653200 hemagglutinin.
Phylogenetics analyzes the evolutionary process of the hemagglutinins in question. Along the same branch of the evolutionary tree, we can compare the parent and daughter hemagglutinins: a difference between them indicates the occurrence of mutation, which we mark as unity, whereas no difference between them indicates the non-occurrence of mutation, which we mark as zero.
Currently, we have no explicit idea of how to build a cause-mutation relationship between an original amino acid and its mutated amino acids.
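Steps (i) to (vi) can be sketched in code as follows. The sketch assumes the standard genetic code and takes the denominator of each mutating probability as all single-base changes of the source amino acid's codons, which agrees with the 12/36 and 2/18 values quoted above; the treatment of stop codons in Table 1 is not stated and is an assumption here. All function names are hypothetical. Only the six residue counts quoted in step (ii) are used, so the printed sum is a partial contribution to the future count of "A", not the full figure behind 5.9077%.

```python
BASES = "UCAG"
# Standard genetic code listed in UCAG order for each codon position; '*' is a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transition_probability(source: str, target: str) -> float:
    """Fraction of single-base changes in codons of `source` that translate to `target`."""
    hits = total = 0
    for codon, translated in CODON_TABLE.items():
        if translated != source:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                hits += CODON_TABLE[mutant] == target
    return hits / total


def expected_future_count(composition: dict, target: str) -> float:
    """Step (iii): sum over source amino acids of count times mutating probability."""
    return sum(n * transition_probability(src, target) for src, n in composition.items())


# Step (ii): only the six counts quoted in the text for AY653200 hemagglutinin.
partial = {"A": 30, "R": 24, "N": 47, "D": 26, "C": 15, "E": 40}
print(round(expected_future_count(partial, "A"), 4))  # 17.3333 (partial sum)

# Step (vi): ratio of future versus current composition for "A", using the quoted percentages.
print(round(5.9077 / 5.2817, 4))  # 1.1185
```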
However, we can make the estimation according to the amino acid mutating probability based on the translation probability between RNA codons and translated amino acids (Wu and Yan, 2005g, 2006a) in Table 1. For instance, if we predict that the possible mutation position is 196, which houses amino acid "H", we know from Table 1 that "H" has the largest chance of mutating to "Q", and an equal chance of mutating to seven other amino acids. In this manner, we make the prediction.
The MatLab software (MathWorks Inc., 2001) is used for the model development and prediction. The outlier (3SD) is detected according to Healy (1979). The prediction sensitivity, specificity and total correct rate are calculated according to the published method (Systat Software Inc., 2004).
Perhaps we could stratify the model development into the following steps: establishing the model, finding the model parameters, and determining whether the model can explain or capture the data. With respect to the neural network, the model parameters are the weights and biases, which need to be determined.
Table 1. Amino-acid mutating probability. Columns: Amino acid; Mutated amino acid with its translation probability. A Alanine; R arginine; N asparagine; D aspartic acid; C cysteine; E glutamic acid; Q glutamine; G glycine; H histidine; I isoleucine; L leucine; K lysine; M methionine; F phenylalanine; P proline; S serine; T threonine; W tryptophan; Y tyrosine; V valine
After a huge amount of calculation, we have three inputs and one target for each amino acid in all parent hemagglutinins. Table 2 shows a fraction of one such hemagglutinin, where each amino acid is associated with three inputs and one target determined by comparing two 2004 chicken hemagglutinins (AY653200 and DQ080022). This format is used as the computer input for training the neural network.
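To make the data flow concrete, the forward pass of a 3-6-1 network with tan-sigmoid, tan-sigmoid and log-sigmoid transfer functions can be sketched in plain Python. This is not the MatLab toolbox code used in the study: the uniform random initialization in [-1, 1], the function names and the example input row are assumptions made for illustration, and training by resilient backpropagation is not shown.

```python
import math
import random

# Three input elements feed layers of 3, 6 and 1 neurons (the 3-6-1 structure).
LAYER_SIZES = [3, 3, 6, 1]


def tansig(x: float) -> float:
    """Tan-sigmoid transfer function, equal to the hyperbolic tangent."""
    return math.tanh(x)


def logsig(x: float) -> float:
    """Log-sigmoid transfer function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


TRANSFER = [tansig, tansig, logsig]


def init_network(seed: int = 0):
    """Randomly initialize weights and biases (assumed uniform in [-1, 1])."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-1, 1) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def predict(layers, inputs):
    """Propagate one row of three inputs through the network; returns a value in (0, 1)."""
    activation = list(inputs)
    for (weights, biases), transfer in zip(layers, TRANSFER):
        activation = [
            transfer(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation[0]


network = init_network(seed=1)
# Hypothetical input row: pair-frequency difference, log distribution ratio, composition ratio.
print(predict(network, [5.0, 2.0201, 1.1185]))
```

Because the last transfer function is log-sigmoid, the output always lies between 0 and 1 and can be read as a mutation probability for the position.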
After trying different neural network models with different numbers of layers, neurons, transfer functions and training algorithms, the 3-6-1 feedforward backpropagation neural network appears to be a suitable model that does not compromise predictability (Fig. 1), with tan-sigmoid, tan-sigmoid and log-sigmoid as suitable transfer functions and resilient backpropagation as a suitable training algorithm. In principle, the cause-mutation relationship exists between the three inputs and the target, and we hope the neural network can model this implicit relationship.
When a pharmacokinetic model is used to fit a drug concentration-time curve, the initial model parameters can be determined through various methods. However, we have to use the random initialization function to initiate the neural network weights and biases, because no historical data on the initial weights and biases are available for our neural network. The question raised here is whether the neural network can converge during its training within a limited number of epochs. Figure 2 shows the convergence of the mean squared error performance function with 100 different initial weights and biases generated by the random initialization function, using DQ334760 hemagglutinin. As seen, the neural network converges during its training within 250 epochs although the initial weights and biases were randomly generated by the initialization function. Hence, we can use the random initialization function to train the neural network to find suitable weights and biases.
In order to determine whether the neural network model can capture the cause-mutation relationship, we compare the predicted with the actual mutation positions by classifying the predicted mutation positions as positives, false positives, negatives and false negatives. Then we calculate the prediction sensitivity, specificity and total correct rate (Fig. 3).
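The three rates can be computed as in the sketch below, which follows the definitions given in the caption of Fig. 3 and the cut-off mutation probability of 0.5 used in the paper. It assumes that "predicted positives" and "predicted negatives" in those definitions mean the correctly predicted ones; the function name and the ten-position example are hypothetical.

```python
def prediction_rates(predicted_probability, actual, cutoff=0.5):
    """Return (sensitivity, specificity, total correct rate) in percent.

    `actual` holds 1 for a mutated position and 0 otherwise; a position is
    predicted as mutated when its probability is larger than `cutoff`.
    """
    predicted = [int(p > cutoff) for p in predicted_probability]
    true_pos = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    true_neg = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    n_mutations = sum(actual)
    n_non_mutations = len(actual) - n_mutations
    sensitivity = 100.0 * true_pos / n_mutations if n_mutations else float("nan")
    specificity = 100.0 * true_neg / n_non_mutations if n_non_mutations else float("nan")
    total_correct = 100.0 * (true_pos + true_neg) / len(actual)
    return sensitivity, specificity, total_correct


# Hypothetical ten-position example with two actual mutations.
probabilities = [0.1, 0.7, 0.2, 0.1, 0.6, 0.3, 0.1, 0.2, 0.4, 0.1]
actual = [0, 1, 0, 0, 0, 0, 0, 1, 0, 0]
print(prediction_rates(probabilities, actual))  # (50.0, 87.5, 80.0)
```

The example also shows why specificity and total correct rate can stay high while sensitivity is low: non-mutated positions vastly outnumber mutated ones, so they dominate the two former rates.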
As seen, the prediction specificity and total correct rate are quite high while the prediction sensitivity is low. Up to this point, we have been approaching, step by step, the possibility of using the neural network to predict the mutation positions.
With this possibility in mind, we used the trained weights and biases to predict the mutation positions, and then predicted the would-be-mutated amino acids. Figure 4 shows this two-step prediction in DQ497705 hemagglutinin, A/duck/Vietnam/283/2005 (H5N1). The solid line in the lower panel is the mutation probability predicted by the neural network, and the dash-dotted line is the cut-off mutation probability of 0.5, that is, an amino acid whose mutation probability is larger than 0.5 risks mutation. For this hemagglutinin, there are four positions whose mutation probability is larger than 0.5. At these four positions, the would-be-mutated amino acids are predicted using the amino acid mutating probability in Table 1, as shown in the upper panel of Fig. 4.
Fig. 2. Convergence of mean squared error performance function with 100 different initial weights and biases generated by random initialization function in using DQ334760 hemagglutinin
Fig. 3. Prediction sensitivity, specificity and total correct rate for the self-validation. The data are presented as mean ± SD (n = 110). The sensitivity is equal to the predicted positives/the actual mutations (%), the specificity is equal to the predicted negatives/the actual non-mutations (%), and the total correct rate is equal to (predicted positives + predicted negatives)/length of hemagglutinin (%)
Preparedness for possible influenza pandemics is currently being pursued along various approaches, among which modeling is playing its role in the battle against influenza A virus. A prominent approach, the development of inhibitors, is conducted at several levels.
At the receptor protein level, modeling helps to determine the "binding pocket" of the receptor protein with its ligands (Chou et al., 1997, 1999, 2000, 2003; Chou, 2004a-e, 2005a, b, 2006; Li et al., 2007; Wang et al., 2007a, c). At the "cleavage-site" level, modeling tries to find the target residue for mutagenesis (Poorman et al., 1991; Chou, 1993a-c, 1996; Elhammer et al., 1993; Thompson et al., 1995). As it is generally possible to find the target residues at the two levels above, the next level of study is directed to mutagenesis and the design of effective inhibitors (Althaus et al., 1993a-c; Chou et al., 1994; Du et al., 2005a, b, 2007a; Gan et al., 2006; Gao et al., 2007; Wei et al., 2007). The fourth level of modeling is the determination of the 3D structure of the binding interaction in proteins of interest (Wei et al., 2006a, b; Wang et al., 2007b). In this approach, an important concept is the "binding pocket", which is the cornerstone for modeling. According to Chou et al. (1999), the binding pocket was defined by those residues that have at least one heavy atom (i.e. an atom other than hydrogen) within a distance of 5 Å from a heavy atom of the ligand. Such a definition has been widely and successfully used for investigating various protein-ligand interactions (see, e.g. Chou et al., 2000; Chou, 2004a, d, 2005a; Sirois et al., 2004; Du et al., 2005a, b, 2007b; Wei et al., 2005, 2006a, b, 2007; Zhang et al., 2006; Gao et al., 2007; Li et al., 2007; Wang et al., 2007a, c).
However, it is highly likely that the random power plays a continuous role, because randomness suggests the maximal probability of occurrence, by which a protein would be constructed with the least consumption of time and energy and could thus keep pace with rapidly changing environments, although nature can deliberately spend more time and energy to construct an absolutely necessary structure. Hence, our quantifications at least describe the random power engineering mutations.
With respect to the prediction of mutation positions, the following issues need to be addressed in future. (1) How can we measure whether the model captures a cause-mutation relationship? Generally we use the correlation coefficient in linear regression between measured and predicted data to make the judgment, which is suitable when measured and predicted data are paired. However, this is not the case for actual and predicted mutation positions because, for example, the actual mutation position is 499 in AF102674 hemagglutinin, while the predicted mutation position is 500. To the best of our knowledge, we cannot pair them, which leads to the asymmetry between actual and predicted positions and to the difficulty of using the correlation coefficient of linear regression to evaluate the prediction performance. (2) Although the low sensitivity in Fig. 3 suggests several possibilities, such as the small number of mutations, external causes, sampling strategy, etc., the essential problem is that we have no method to measure the performance when the actual position is 499 whereas the predicted position is 500. This distance might be tolerable for proteins as long as hemagglutinin, but might not be so for proteins as short as the human hemoglobin α-chain. (3) Another issue related to the measurement of performance is that the number of predicted mutation positions is not equal to the number of actual mutation positions.
For example, there are three mutations in AF102674 hemagglutinin, while the model captures two mutation positions. At this stage, we have yet to develop a method to measure them. Traditionally, the dataset is divided into training, test and validation sets in neural network modeling; however, we consider such a division premature because we do not yet have a method to measure the performance regarding actual and predicted positions.
However, our approach is promising because it is based on the kinetics that drive mutations, while the current methods, which search for similar patterns, sequences, signatures, etc., in various databases, are more or less based on phenomenological laws. Phenomenon observation is very important, because with it we can build a dynamic model, as Kepler's laws describe the dynamics of planetary motion. On the other hand, kinetic deduction is also very important, because with it we can build a kinetic model, as Newton's laws describe the kinetics of planetary motion. Moreover, the dynamic model based on phenomenon observation is more suitable for dealing with repeated events, but is less powerful when dealing with an evolutionary process, which in general cannot be reversed. By contrast, the kinetic model can deal with both repeated events and evolutionary processes if we can properly define the driving force behind them. Hence, our approach has the advantage not only of quantifying proteins but also of kinetic modeling. Also, the predicted number of mutation positions in Fig. 4 is reasonable, because four mutations are similar to the prediction we made using the fast Fourier transform to time the mutations (Wu and Yan, 2005f).
Steady-state kinetic studies with the non-nucleoside HIV-1 reverse transcriptase inhibitor U-87201E Kinetic studies with the nonnucleoside HIV-1 reverse transcriptase inhibitor U-88204E The quinoline U-78036 is a potent inhibitor of HIV-1 reverse transcriptase A formulation for correlating properties of peptides and its application to predicting human immunodeficiency virus proteasecleavable sites in proteins Predicting cleavability of peptide sequences by HIV protease via correlation-angle approach A vectorized sequence-coupling model for predicting HIV protease cleavage sites in proteins Prediction of HIV protease cleavage sites in proteins Insights from modelling the 3D structure of the extracellular domain of alpha7 nicotinic acetylcholine receptor Insights from modelling the tertiary structure of BACE2 Molecular therapeutic target for type-2 diabetes Structural bioinformatics and its impact to biomedical science Modelling extracellular domains of GABA-A receptors: subtypes 1, 2, 3, and 5 Coupling interaction between thromboxane A2 receptor and alpha-13 subunit of guanine nucleotide-binding protein Modeling the tertiary structure of human cathepsin-E Prediction of the tertiary structure and substrate binding site of caspase-8 Steady-state inhibition kinetics of processive nucleic acid polymerases and nucleases Prediction of the tertiary structure of a caspase-9=inhibitor complex A model of the complex between cyclin-dependent kinase 5(Cdk5) and the activation domain of neuronal Cdk5 activator Progress in computational approach to drug development against SARS Binding mechanism of coronavirus main proteinase with ligands and its implication to drug design against SARS Neural Network Toolbox for Use with MatLab Molecular modelling and chemical modification for finding peptide inhibitor against SARS CoV Mpro Application of bioinformatics in search for cleavable peptides of SARS-CoV Mpro and chemical modification of octapeptides Inhibitor design for SARS 