key: cord-0005208-wocuomb1
authors: Wu, G.; Yan, S.
title: Prediction of mutations engineered by randomness in H5N1 neuraminidases from influenza A virus
date: 2007-08-28
journal: Amino Acids
DOI: 10.1007/s00726-007-0579-z
sha: bcd5d18666b54f5e428b11a08e81e4f7dc960c61
doc_id: 5208
cord_uid: wocuomb1

In this proof-of-concept study, we attempt to determine whether the cause-mutation relationship defined by randomness is protein dependent by predicting mutations in H5N1 neuraminidases from influenza A virus, because we have recently conducted several concept-initiated studies on the prediction of mutations in hemagglutinins from influenza A virus. In our concept-initiated studies, we defined the randomness as a cause for mutation, upon which we built a cause-mutation relationship, which is then switched into the classification problem because the occurrence and non-occurrence of mutations can be classified as unity and zero. Thereafter, we used the logistic regression and neural network to solve this classification problem to predict the mutation positions in hemagglutinins, and then used the amino acid mutating probability to predict the would-be-mutated amino acids. As the previous results were promising, we extend this approach to other proteins, such as H5N1 neuraminidase in this study, and further address various issues raised during the development of this approach. The result of this study confirms that we can use this cause-mutation relationship to predict the mutations in H5N1 neuraminidases.

In preparation for the possible epidemics and pandemics of influenza, an important issue is the prediction of mutated proteins of influenza A virus, because the unpredictable mutations leave humans with little immunity against this deadly disease. Among various subtypes of influenza viruses, the H5N1 viruses are highly pathogenic (Lee et al., 2005; Chen et al., 2006), and their mutations mainly occur in the RNA genes coding for ten virus proteins (Hilleman, 2002).

Neuraminidase is a sialidase (Gottschalk, 1957) that prevents virion aggregation by removing cell and virion surface sialic acid (Paulson, 1985), is the major antigen for neutralizing antibodies and is involved in the binding of virus particles to receptors on host cells (Zambon, 1999). Moreover, neuraminidase is the target of several anti-influenza drugs (Hochgürtel et al., 2002; Garman and Laver, 2004; Oxford et al., 2004). Among the subtypes, H5N1 neuraminidase is important because the H5N1 virus is currently threatening humans and mutations in neuraminidase may render anti-influenza drugs ineffective.

The preparedness is currently conducted along various approaches, among which modeling plays its role in the battle against influenza A virus. A prominent approach in developing inhibitors is conducted at several levels. At the receptor protein level, modeling helps to determine the ''binding pocket'' of the receptor protein with its ligands (Chou, 2004a-e, 2005; Chou et al., 1997, 1999, 2000, 2003, 2006; Li et al., 2007; Wang et al., 2007a, c). At the ''cleavage-site'' level, modeling tries to find the target residue for mutagenesis (Poorman et al., 1991; Elhammer et al., 1993; Chou, 1993a, b, 1996; Thompson et al., 1995). Once these two levels have identified the target residues, the next level of study is directed to mutagenesis and the design of effective inhibitors (Althaus et al., 1993a-c; Chou et al., 1994; Du et al., 2005, 2007; Gan et al., 2006; Gao et al., 2007; Wei et al., 2007). The fourth level of modeling is the determination of the 3D structure of binding interactions in proteins of interest (Wang et al., 2007b).

Recently we have tried to use the modeling approach to predict mutations in proteins from influenza A virus, which is related to several levels too, say, the prediction of mutation positions, the prediction of would-be-mutated amino acids at predicted positions, the timing of mutations, and the prediction of new functions resulting from the mutations. The first three types of predictions are relevant to the primary structure of proteins, while the last one is relevant to 3D structure of proteins.

There is no doubt that the best way to predict mutations is to find their cause, so that we can build a cause-mutation relationship and predict the occurrence of a mutation when its cause appears. This approach is quite straightforward; however, it is challenged by three facts. First, many causes that led to mutations in the past might never have left any trace because of the huge changes in environments, so we would have a relatively detailed record of mutations but a poor record of their causes. Thus, we could not establish a one-to-one relationship between causes and mutations. Consequently, we could not even define the scale of causes for monitoring. Second, because of evolution, the current version of a protein might not be subject to the causes that led to the historic mutations; multi-drug resistance could be an example from the evolution of bacteria. Third, it is difficult to find the historical macro- and micro-environmental surroundings under which historic causes triggered the mutations.

As the search for each historically instant cause appears difficult, we might need to direct our effort toward the search for constant causes, because the protein constantly evolves although its evolutionary speed is not constant. Randomness should be one of such constant causes, which engineers mutations through generations, not only because pure chance is now considered to lie at the very heart of nature (Everitt, 1999) and the occurrence of mutation is generally considered a random event (Fitch et al., 1997), but also because randomness suggests that an event does not occur deliberately, but naturally. This further suggests that the event with a bigger probability would occur more easily than the event with a smaller probability. Although nature should deliberately construct the absolutely necessary structure for a protein with more time and energy, there must be some structures that can be explained by a random mechanism, not only because nature follows parsimony, but also because nature cannot predict the future by constructing structures for the future that are currently useless.

Once we could measure and quantify this randomness, we could compare the quantified randomness before and after mutations to determine whether randomness plays a role. If so, we could build a quantitative cause-mutation relationship to predict the mutations engineered by randomness.

Although it is difficult to measure and quantify the randomness in nature, we could measure and quantify the randomness in a protein, which mirrors the randomness in nature. Since 1999, we have developed three methods to quantify the randomness in protein, and find the quantified randomness sensitive to mutations. This means that the randomness does play an important role in engineering mutations or we can use the random mechanism to explain some mutations.

Furthermore, this also means that we can build a cause-mutation relationship accounting for the mutations engineered by randomness. This is possible, because we can classify the occurrence and non-occurrence of mutation as unity and zero. This way, the cause-mutation relationship is switched into the classification problem, which can be solved either using logistic regression or neural network. However, the occurrence and non-occurrence of mutation is a binary event, which means that we can only use this cause-mutation relationship to predict the mutation positions rather than the would-be-mutated amino acids at predicted positions.

For prediction of would-be-mutated amino acids, it is more difficult to build a deterministic relationship or classification model. However, there are several common ways (Dayhoff et al., 1978; Feng et al., 1985; Karlin and Ghandour, 1985; Müller et al., 2002) as well as the amino acid mutating probability developed by us (Wu and Yan, 2005g, 2006a) to solve this issue.

All these indicate that the prediction of mutations includes at least two steps, say, the prediction of mutation position and the prediction of would-be-mutated amino acids at predicted positions. Along this two-step frame, we very recently conducted several concept-initiated studies to test whether we can apply this cause-mutation relationship with logistic regression as well as neural network to predicting the mutation positions, and then apply the amino acid mutating probability to predicting the would-be-mutated amino acids at predicted mutation positions in hemagglutinins from influenza A virus (Wu and Yan, 2006e, f, 2007a, c, d).

As the results of these concept-initiated studies appear promising, we need to conduct many more proof-of-concept studies to determine whether this approach depends on the protein, subtype, etc., and to refine the approach and clarify the related issues. Hence, we attempt to apply this approach to predicting the mutations in H5N1 neuraminidase from influenza A virus in this study. As our approaches are not familiar to most researchers, we will explain them in detail.

In our two-step frame, the prediction model is only related to the cause-mutation relationship, which is switched to the classification problem. Thus, we use the logistic regression, whose output ranges between zero and unity, $P(y) = \frac{1}{1 + e^{b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5 + b_6 x_6 + b_7 x_7}}$, where $x_i$ are the independents, $y$ is the dependent, and $b_i$ are the model parameters. As our previous studies show that seven independents work better than six independents (Wu and Yan, 2006f, 2007a, d), we will use seven independents in modeling of neuraminidase in this study.

This quantification is calculated according to permutation, and we have used it to study various proteins (Wu, 1999, 2000a; Wu and Yan, 2000a-c, 2001a-c, 2002a-d, 2003a-h, 2004a-e, 2005a-d, f, 2006b, d-f, 2007a). Its rationale includes: (i) this is the simplest way to quantify the randomness in a protein, (ii) the counting of amino-acid pairs was inspired by modern encryption technology, which counts the frequency of basic units in an unknown language, and (iii) a good signature pattern of a protein must be as short as possible, but the conserved sequence is not longer than four or five residues (PROSITE, 2002), while our previous studies show the amino-acid pair to be the best for our aim. The practical meanings are that this amino-acid pair predictability is very sensitive to the change in neighboring amino acids, and answers why a type of amino acid is adjacent to a certain type of amino acid but not to the others.

The simplest calculations are as follows: according to the permutation, for example, there are 44 glycines (G) and 34 isoleucines (I) in 2005 AB239126 neuraminidase, so the randomly predicted frequency of amino-acid pair ''GI'' is 3 (44/449 × 34/448 × 448 = 3.3318), that is, ''GI'' would appear three times in this neuraminidase, which is the predicted frequency and is the reference for comparison. Actually we do find 3 ''GI'', so ''GI'' is predictable and the difference between its actual and predicted frequency is 0. Again, there are 28 threonines (T) in AB239126 neuraminidase, and the randomly predicted frequency of ''TI'' is 2 (28/449 × 34/448 × 448 = 2.1203), i.e. there would be two ''TI'' in the neuraminidase. But ''TI'' appears five times in reality, so the difference between its actual and predicted frequency is 3. After such calculations, each amino-acid pair has its difference between actual and predicted frequency. As a point mutation is related to a single amino acid, which connects with two neighboring amino acids (except for the terminal ones) and so constructs two amino-acid pairs, each amino acid can have the sum of difference between actual and predicted frequency in two neighboring amino-acid pairs (SDAPF).
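The arithmetic of this worked example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' software: the function names are hypothetical, the predicted frequency is rounded to an integer as the text does for 3.3318 and 2.1203 (Python's built-in rounding is used), and pairs of identical residues are assumed to draw the second residue from the remaining count minus one, a case the text does not spell out.

```python
from collections import Counter


def predicted_pair_frequency(count_first, count_second, length, same_residue=False):
    """Randomly predicted count of an ordered amino-acid pair, following the worked example."""
    # Assumption: for a pair of identical residues the second draw is made
    # from the remaining (count - 1) residues.
    second = count_second - 1 if same_residue else count_second
    return count_first / length * second / (length - 1) * (length - 1)


def sdapf(sequence):
    """Sum of (actual - predicted) pair frequencies over the pairs touching each position."""
    n = len(sequence)
    composition = Counter(sequence)
    actual = Counter(sequence[i:i + 2] for i in range(n - 1))

    def difference(pair):
        first, second = pair
        predicted = predicted_pair_frequency(
            composition[first], composition[second], n, same_residue=(first == second)
        )
        # Assumption: the predicted frequency is rounded to an integer before comparison.
        return actual[pair] - round(predicted)

    scores = []
    for i in range(n):
        total = 0
        if i > 0:          # pair formed with the left neighbour
            total += difference(sequence[i - 1:i + 1])
        if i < n - 1:      # pair formed with the right neighbour
            total += difference(sequence[i:i + 2])
        scores.append(total)
    return scores


print(round(predicted_pair_frequency(44, 34, 449), 4))  # GI example from the text
print(sdapf("GIGI"))                                     # toy sequence, not real data
```

Terminal residues receive the difference of their single pair only, which matches the remark that the terminal amino acid has one neighbor.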

This is a derived quantification, because our previous studies show that the mutation minimizes the difference between actual and predicted frequency, and the bigger the difference is, the more vulnerable the amino-acid pair is to mutation (Wu and Yan, 2002a-c, f, 2003a-h, 2004a). As each amino acid in neuraminidase has its SDAPF, which generally ranges from −5 to 9, we count how many amino acids have SDAPF of 1, SDAPF of 2, and so on, then calculate their percentage with respect to all amino acids, and each amino acid has this percentage.

This quantification is based on the common consideration in regression, that is, the first-order interaction between independents is frequently included in regression analysis (Draper and Smith, 1981; Hosmer and Lemeshow, 2000). In our case, SDAPF and its percentage are closely related to one another, and our previous studies suggest that the interaction significantly enhances the predictability (Wu and Yan, 2006e, f, 2007a, d). We therefore assign the first-order interaction to each amino acid.
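How the percentage and the first-order interaction might be attached to each amino acid can be sketched as follows. The function name is hypothetical, the percentage is expressed on a 0 to 100 scale, and the interaction is taken as the plain product of the two independents, which is the usual meaning of a first-order interaction term but is an assumption here because the text does not give the formula.

```python
from collections import Counter


def percentage_and_interaction(values):
    """Return (value, percentage of positions sharing that value, value * percentage) per position."""
    counts = Counter(values)
    n = len(values)
    rows = []
    for value in values:
        percentage = 100.0 * counts[value] / n
        # Assumption: the first-order interaction is the product of the two independents.
        rows.append((value, percentage, value * percentage))
    return rows


# Toy SDAPF values for four positions (illustrative only, not real data).
for row in percentage_and_interaction([1, 1, 2, -1]):
    print(row)
```

The same helper would apply unchanged to any other per-residue quantity for which a percentage and an interaction are needed.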

This quantification is calculated according to the occupancy of subpopulations and partitions (Feller, 1968), and we have used this quantification to study various proteins (Gao et al., 2006; Wu and Yan, 2000d, 2001d, e, 2002c-f, 2004f, 2005d, e, 2006c-f, 2007a).

The quantification is developed along the following line of thought. For example, there are two methionines (M) among 141 amino acids in human hemoglobin α-chain (Wu and Yan, 2000d). With regard to their random distribution, our intuition may suggest that there would be one ''M'' in the first half of the chain and another ''M'' in the second half, which is true in the real-life case. In fact, there are only three possible distributions of ''M''s in human hemoglobin α-chain, i.e. (i) both ''M''s are in the first half, (ii) one ''M'' is in each half and (iii) both ''M''s are in the second half. If we do not distinguish the first half from the second half but are simply interested in whether the two ''M''s are in different halves or in the same half, each of these two distributions has the probability of 1/2.

If we are interested in the distribution probability of three amino acids in a protein sequence, we naturally imagine to group the protein into three parts, and our intuition may suggest that each part contains an amino acid. If we do not distinguish the first, second and third part, actually there are three types of distributions, i.e. (i) each part contains an amino acid, (ii) two amino acids are in a part and an amino acid in another part, and (iii) three amino acids are in a part. However, the distribution probabilities are different for them, say, 0.2222 for (i), 0.6667 for (ii) and 0.1111 for (iii). Clearly the protein can only adopt one type of distribution for these three amino acids, which is the actual distribution probability, and we may guess that the distribution (ii) is more likely to happen because of its biggest probability, which is the predicted distribution probability and is the reference for comparison.

For four amino acids, we will have five distribution probabilities, i.e. (i) each part contains an amino acid, (ii) a part contains two amino acids and two parts contain an amino acid each, (iii) two parts contain two amino acids each, (iv) a part contains an amino acid and a part contains three amino acids, and (v) a part contains four amino acids. Their distribution probabilities are 0.0938, 0.5625, 0.1406, 0.1875, 0.0156, respectively. Further, we have seven distributions for five amino acids, we have 11 distributions for six amino acids, we have 15 distributions for seven amino acids, and so on.

So we view the positions of each kind of amino acid in a protein as a certain distribution, whose probability can be calculated according to the equation $\frac{r!}{q_0! \times q_1! \times \cdots \times q_n!} \times \frac{r!}{r_1! \times r_2! \times \cdots \times r_n!} \times n^{-r}$ (Feller, 1968), where ! is the factorial function, r is the number of a kind of amino acid, q is the number of parts with the same number of amino acids and n is the number of grouped parts in the protein for a kind of amino acid. In fact, this distribution probability can be referred to statistical mechanics, which classifies the distribution of elementary particles in energy states according to three assumptions on whether or not each particle and energy state is distinguished, i.e. the Maxwell-Boltzmann, Fermi-Dirac and Bose-Einstein assumptions (Feller, 1968). In plain words, this is the kind of probability that describes how seven letters received in a week would be distributed over the seven days if they arrived randomly.
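The occupancy probabilities quoted above for two, three and four amino acids can be reproduced with a short Python sketch of this equation. The function name and the tuple representation of a distribution are illustrative assumptions. The first factor is written with n!, as in the general occupancy formula; it coincides with the r! of the equation in the text because the protein is grouped into as many parts as there are amino acids of the kind in question (n = r).

```python
from collections import Counter
from math import factorial, prod


def distribution_probability(occupancy):
    """Probability of an occupancy pattern, e.g. (2, 1, 0) = two residues in one part, one in another."""
    n = len(occupancy)           # number of grouped parts
    r = sum(occupancy)           # number of amino acids of this kind
    q = Counter(occupancy)       # how many parts hold 0, 1, 2, ... amino acids
    arrangements = factorial(n) // prod(factorial(c) for c in q.values())
    assignments = factorial(r) // prod(factorial(k) for k in occupancy)
    return arrangements * assignments / n ** r


# Three amino acids in three parts, then four amino acids in four parts.
for pattern in [(1, 1, 1), (2, 1, 0), (3, 0, 0)]:
    print(pattern, round(distribution_probability(pattern), 4))
for pattern in [(1, 1, 1, 1), (2, 1, 1, 0), (2, 2, 0, 0), (3, 1, 0, 0), (4, 0, 0, 0)]:
    print(pattern, round(distribution_probability(pattern), 4))
```

Because the parts are not distinguished, the order of the entries in a pattern does not matter: (2, 1, 0) and (0, 1, 2) give the same probability.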

The practical meanings are that this quantification is mainly subject to any change in the position of an amino acid, and answers why the majority of amino acids cluster in some regions rather than distribute homogeneously along the primary structure of a protein.

With respect to neuraminidases in this study, for instance, there are 18 cysteines (C) in AB239126 neuraminidase. Its predicted and actual distribution probabilities are 0.1246 and 0.0138, so the ratio of predicted versus actual distribution probabilities is 9, whose natural logarithm is 2.1972 (LRPADP). In this way, each amino acid has its LRPADP.

With the similar consideration of independents II and III, we give the percentage of LRPADP and the first order interaction between LRPADP and its percentage to each amino acid.

This quantification is calculated according to the translation probability between RNA codons and translated amino acids (Wu and Yan, 2005g, 2006a), and we have used this quantification to study various proteins (Wu and Yan, 2005g, 2006a, f, 2007a).

This quantification is developed along the following line of thought. Suppose we are interested in the amino acid threonine and its mutated amino acids with their mutating probability. As the RNA codons have an unambiguous relationship with their translated amino acids, we can extend this question to the RNA level, that is, a point mutation in an RNA codon leads to the mutation at amino acid level.

Threonine is related to RNA codons ACU, ACC, ACA, and ACG. A mutation at the first position of ACU can lead ACU to mutate to CCU, GCU, and UCU, which correspond to threonine mutating to proline, alanine, and serine at amino acid level. Similarly, a mutation at the second position of ACU can result in isoleucine, asparagine, and serine, and a mutation at the third position of ACU can result in threonine, threonine and threonine. Taking the four RNA codons together, threonine would mutate in such a way, say, 4 alanines + 2 arginines + 2 asparagines + 3 isoleucines + 2 lysines + 1 methionine + 4 prolines + 6 serines + 12 threonines. Thus we have the threonine mutating probability to these amino acids, say, 4/36 to alanine, 2/36 to arginine, 2/36 to asparagine, 3/36 to isoleucine, 2/36 to lysine, 1/36 to methionine, 4/36 to proline, 6/36 to serine and 12/36 to threonine.
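The counting over single-base substitutions can be checked with a small Python sketch that enumerates all point mutations of the codons of one amino acid under the standard genetic code. The names `CODON_TABLE` and `mutation_counts` are hypothetical, and substitutions that produce a stop codon are tallied under '*', which is an assumption because the text does not say how such cases are handled (none occur for threonine).

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def mutation_counts(amino_acid):
    """Count the products of every single-base substitution in every codon of amino_acid."""
    counts = Counter()
    for codon, translated in CODON_TABLE.items():
        if translated != amino_acid:
            continue
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue
                mutant = codon[:position] + base + codon[position + 1:]
                counts[CODON_TABLE[mutant]] += 1
    return counts


threonine = mutation_counts("T")
total = sum(threonine.values())
for target, count in sorted(threonine.items()):
    print(f"T -> {target}: {count}/{total}")
```

Each codon has nine single-base neighbors, so an amino acid with four codons yields 36 substitutions in total and one with two codons yields 18.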

For all 20 kinds of amino acids, we have the amino acid mutating probability in Table 1 .

For the calculation of future composition of amino acids, we have the following steps: (i) We would expect that ''A'' has the 12/36 chance of mutating to ''A'' (line 2 in Table 1), ''R'' and ''N'' have no chance of mutating to ''A'' (lines 3 and 4 in Table 1), ''D'' has 2/18 chance (line 5 in Table 1), ''C'' has no chance (line 6 in Table 1), ''E'' has 2/18 chance, and so on. (ii) Meanwhile, AB239126 neuraminidase has 18 ''A'', 16 ''R'', 29 ''N'', 21 ''D'', 18 ''C'', 20 ''E'', and so on. (iii) So we can estimate how many ''A'' can be produced by mutation using 18 × 12/36 + 16 × 0 + 29 × 0 + 21 × 2/18 + 18 × 0 + 20 × 2/18 + ..., and so on. In total, this is the future composition of amino acid ''A''. (iv) After calculating all 20 kinds of amino acids, ''A'' contributes 6.3374% to the future composition of neuraminidase, which is the predicted composition and is the reference for comparison. (v) On the other hand, ''A'' contributes 4% (18/450) to the current composition of AB239126 neuraminidase. (vi) Thus, we have the ratio of future versus current compositions, for example, the ratio of ''A'' is 1.5844 (6.3374%/4%), which can be assigned to each ''A'' in AB239126 neuraminidase.
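Steps (i) to (vi) can be summarized in compact notation. The symbols are introduced here for illustration only and do not appear in the original: $n_j$ is the number of amino acids of kind $j$ in the sequence, $N$ is the sequence length, and $p_{j \to a}$ is the mutating probability from $j$ to $a$ read from Table 1. Normalizing by the sum over all 20 kinds is an assumption based on step (iv).

```latex
F_a = \sum_{j=1}^{20} n_j \, p_{j \to a}, \qquad
c_a^{\mathrm{future}} = \frac{F_a}{\sum_{k=1}^{20} F_k}, \qquad
c_a^{\mathrm{current}} = \frac{n_a}{N}, \qquad
\mathrm{ratio}_a = \frac{c_a^{\mathrm{future}}}{c_a^{\mathrm{current}}}
```

For ''A'' in AB239126 neuraminidase the text gives $c_A^{\mathrm{future}} = 6.3374\%$ and $c_A^{\mathrm{current}} = 4\%$, hence $\mathrm{ratio}_A = 1.5844$.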

The practical meanings are that this quantification is mainly subject to the future mutation trend, and answers with what probability an amino acid mutates to another type of amino acid.

Phylogenetic analysis traces the evolutionary process of the neuraminidases in question. Along the same branch of the evolutionary tree, we can compare the parent and daughter neuraminidases: a difference between them at a position indicates the occurrence of mutation, which is marked as unity, whereas no difference indicates the non-occurrence of mutation, which is marked as zero.
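The marking of the dependent can be sketched as a position-by-position comparison. The function name and the two short sequences are hypothetical, and the sketch assumes that parent and daughter are already aligned to the same length, a step the text does not describe.

```python
def mutation_labels(parent, daughter):
    """Return 1 where the daughter differs from the parent and 0 elsewhere."""
    if len(parent) != len(daughter):
        raise ValueError("sequences must be aligned to the same length")
    return [int(p != d) for p, d in zip(parent, daughter)]


# Toy fragments, not taken from any real neuraminidase.
print(mutation_labels("MNPNQ", "MNPKQ"))
```

The resulting list of zeros and ones is what the logistic regression takes as its dependent, one value per amino acid of the parent sequence.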

To predict the would-be-mutated amino acids at predicted positions, we can also use Table 1 to make the estimation. For example, if we would like to know which type of amino acid ''T'' would mutate to, we find in Table 1 that ''T'' has the highest mutating probability (12/36); however, this is only the case in which ''T'' mutates to ''T'', so the amino acid with the next highest probability is the one most likely to result from the mutation, that is, ''S'' with a 6/36 probability of occurrence. This way, we can approximately predict the would-be-mutated amino acids at the predicted positions.

Table 1. Amino acid mutating probability based on the translation probability between RNA codons and translated amino acids (columns: amino acid; mutated amino acid with its translation probability)

A alanine; R arginine; N asparagine; D aspartic acid; C cysteine; E glutamic acid; Q glutamine; G glycine; H histidine; I isoleucine; L leucine; K lysine; M methionine; F phenylalanine; P proline; S serine; T threonine; W tryptophan; Y tyrosine; V valine
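The selection rule described for ''T'' (take the highest mutating probability after excluding the amino acid itself) can be sketched as follows. The function name is hypothetical; the counts are the threonine tallies given earlier in the text (4 alanines, 2 arginines, and so on, out of 36). When several candidates tie, this sketch simply returns the first one encountered, which is an arbitrary choice.

```python
from fractions import Fraction

# Threonine tallies from the text, keyed by one-letter code (total 36).
THREONINE_COUNTS = {"A": 4, "R": 2, "N": 2, "I": 3, "K": 2, "M": 1, "P": 4, "S": 6, "T": 12}


def most_likely_replacement(original, counts):
    """Return the most probable mutated amino acid other than the original, with its probability."""
    total = sum(counts.values())
    candidates = {aa: Fraction(c, total) for aa, c in counts.items() if aa != original}
    best = max(candidates, key=candidates.get)
    return best, candidates[best]


replacement, probability = most_likely_replacement("T", THREONINE_COUNTS)
print(replacement, probability)
```

`Fraction` reduces 6/36 to 1/6, so the printed probability is the reduced form of the value quoted in the text.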

SigmaStat (SPSS 1992-2003) and Systat (Systat Software 2004) are used to conduct all the logistic regressions. The outlier (3SD) is calculated according to Healy (1979). The prediction sensitivity, specificity and total correct rate are calculated according to the method mentioned in Systat software (Systat Software 2004). The Chi-square test is performed for comparison.

After all the calculations, each parent neuraminidase has seven independents and one dependent for each amino acid of its sequence, for example, Table 2 shows a fraction of a neuraminidase after the calculation, where each amino acid is associated with seven independents and one dependent, which is determined by comparing 1996 AAD51926 and 1997 AAK38299 neuraminidases. Thus, we can input this format of data into the logistic regression to obtain the model parameters.

In modeling, we use the so-called population estimates to make predictions, and we have technically two ways to obtain the population estimates, either by calculating mean ± SD of all obtained model parameters or by pooling the data into a representative. We use the second method in this study because the logistic regression does not appear powerful enough to capture the mutation in each sequence, which is particularly related to the ratio of number of mutations to the length of sequence, although neuraminidase is generally longer than hemagglutinin and the logistic regression functions better.

In our previous studies (Wu and Yan, 2006e , f), we used the linear regression to evaluate the prediction performance, which is a very traditional method for evaluation of prediction performance. However, we soon realized the limitation of linear regression in context with the prediction of mutation. This is because we generally have the paired datasets for linear regression, for example, we might have the measured and predicted blood drug concentrations at certain time points, and then we can use the linear regression to regress them and get the correlation coefficient. However, this is not suitable for the prediction of mutation, for example, we might have five actual mutation positions, but four predicted mutation positions. In such a case, it would be difficult to use the linear regression because of unpaired datasets. Still, we also cannot use the linear regression for evaluation of would-be-mutated amino acids, because an amino acid can mutate to several different types of amino acids, which cannot be considered as paired cases.

To overcome this difficulty, we used the percentage of captured positions for evaluation (Wu and Yan, 2007a, d) , and more recently we use the prediction sensitivity, specificity and total correct rate (Wu and Yan, 2007c) according to the method mentioned in Systat software (Systat Software 2004) because we can classify the predicted mutation positions as the positives, false positives, negatives and false negatives when comparing the predicted with the actual mutation positions. Thus, the percentage of captured positions in our previous studies (Wu and Yan, 2007a, d) is in fact equal to the total correct rate.
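The three performance measures can be sketched in Python using the definitions given in the caption of Fig. 1: sensitivity is the correctly predicted positives over the actual mutations, specificity is the correctly predicted negatives over the actual non-mutations, and total correct rate is their sum over the sequence length. The function name and the toy vectors are hypothetical, and this is not the Systat implementation.

```python
def prediction_performance(actual, predicted):
    """Return (sensitivity, specificity, total correct rate) in percent for 0/1 label lists."""
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    true_positives = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    true_negatives = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
    mutations = sum(actual)
    non_mutations = len(actual) - mutations
    # Undefined ratios (no mutations or no non-mutations) are reported as NaN.
    sensitivity = 100.0 * true_positives / mutations if mutations else float("nan")
    specificity = 100.0 * true_negatives / non_mutations if non_mutations else float("nan")
    total_correct = 100.0 * (true_positives + true_negatives) / len(actual)
    return sensitivity, specificity, total_correct


# Toy labels for five positions (illustrative only).
print(prediction_performance([1, 0, 0, 1, 0], [1, 0, 1, 0, 0]))
```

Unlike linear regression on paired values, these measures remain defined when the numbers of actual and predicted mutation positions differ, because every position of the sequence is compared.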

As can be seen in Fig. 1, the prediction pattern of H5N1 neuraminidase is similar to the prediction patterns of the hemagglutinins, although we can find statistical differences. However, the statistical difference is mainly found between the prediction in hemagglutinins with distinguishing arginine, leucine and serine and the others. This is very suggestive because it implies our research direction in the near future, say, to conduct the prediction at RNA codon level, as a single mutation at RNA codon level may not lead to a mutation at amino acid level; for example, ''A'' has a 12/36 chance of mutating to ''A'' in Table 1. Fig. 1 suggests that the cause-mutation relationship defined is independent of subtypes of protein as well as of proteins, at least for hemagglutinins and neuraminidases. In this case, it means that the randomness does play a mutating role not only in hemagglutinins but also in neuraminidases, although we need to conduct more proof-of-concept studies to further determine this issue.

This way, we can obtain the population estimates from regressing historical data. Thereafter, we can input seven independents of recent neuraminidases, whose mutations are not yet known, into the logistic regression with population estimates and get the output, which ranges from 0 to 1 in each position. For example, the human H5N1 neuraminidase (AB239126) is a relatively new sequence, which can serve for our prediction. For this neuraminidase, we have seven independents (Table 3), and then we put them into $P(y) = \frac{1}{1 + e^{0.42 - 0.214 x_1 - 0.061 x_2 - 0.156 x_3 - 0.021 x_4 + 0.11 x_5 + 0.016 x_6 + 0.035 x_7}}$, which is based on the population of 90 neuraminidases from 2000 to 2004. Figure 2 displays the prediction of mutation in AB239126 H5N1 neuraminidase according to our two-step frame. The solid line in the lower panel is the predicted mutation probability with respect to each position, and the dash-dotted line is the cut-off mutation probability of 0.5, that is, the amino acid whose mutation probability is larger than 0.5 risks mutation. The pie picture in the upper panel shows how to predict the would-be-mutated amino acid from serine at position 319 according to the amino acid mutating probability in Table 1.
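Applying the population estimates to one position can be sketched as below. The coefficients are the ones printed in the equation above, and the exponent keeps the sign convention of that equation exactly as it appears in the text; if the original typeset equation carried a leading minus sign in the exponent, the signs would need to be flipped. The function names are hypothetical and the input vector is a placeholder, not a row of Table 3.

```python
from math import exp

INTERCEPT = 0.42
COEFFICIENTS = (-0.214, -0.061, -0.156, -0.021, 0.11, 0.016, 0.035)
CUTOFF = 0.5  # cut-off mutation probability used in Fig. 2


def mutation_probability(independents):
    """Evaluate P(y) = 1 / (1 + e^(b0 + b1*x1 + ... + b7*x7)) for one amino acid position."""
    if len(independents) != len(COEFFICIENTS):
        raise ValueError("exactly seven independents are expected")
    exponent = INTERCEPT + sum(b * x for b, x in zip(COEFFICIENTS, independents))
    return 1.0 / (1.0 + exp(exponent))


def risks_mutation(independents):
    """A position risks mutation when its predicted probability exceeds the cut-off."""
    return mutation_probability(independents) > CUTOFF


placeholder = (0.0,) * 7  # hypothetical input, chosen only to exercise the function
print(mutation_probability(placeholder), risks_mutation(placeholder))
```

Running this over every position of a sequence would give the kind of per-position probability curve described for the lower panel of Figure 2.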

With the population estimates as model parameters for prediction, an important issue is the sampling strategy (Wu et al., 1995, 1996; Wu, 1997), that is, from which population we get the population estimates, not only because there are many subtypes in neuraminidases (Air et al., 1985; Schreier et al., 1988; Harley et al., 1989; Liu et al., 2003; Campitelli et al., 2004; Suzuki et al., 2004; Bragstad et al., 2005) but also because the migration of wild birds is different one from another (Donis et al., 1989; Rohm et al., 1995; Hoffmann et al., 2000; Guan et al., 2004; Krauss et al., 2004; Wu and Yan, 2005e). This implies that the population estimates obtained from Asian wild birds may not be suited for the prediction of mutation in wild birds in North America, which nevertheless needs more studies. This suggests that we may have many different population estimates on which to base the predictions, which of course needs more studies.

Fig. 1. Prediction performance in studied proteins. The sensitivity is equal to the predicted positives/the actual mutations (%), the specificity is equal to the predicted negatives/the actual non-mutations (%), and the total correct rate is equal to (predicted positives + predicted negatives)/length of hemagglutinin (%). 241 H5N1 HA six is the predictions using 241 H5N1 hemagglutinins with six independents; 333 H5N1 HA seven is the predictions using 333 H5N1 hemagglutinins with seven independents; 333 H5N1 HA seven with RrLlSs is the predictions using 333 H5N1 hemagglutinins with seven independents with distinguishing arginine, leucine and serine; 482 H3N2 HA seven is the predictions using 482 H3N2 hemagglutinins with seven independents; 429 H5N1 NA seven is the predictions using 429 H5N1 neuraminidases with seven independents. The Chi-square test indicates the statistically significant difference in sensitivity between 241 H5N1 HA six and 333 H5N1 HA seven with RrLlSs, between 333 H5N1 HA seven and 333 H5N1 HA seven with RrLlSs, between 333 H5N1 HA seven with RrLlSs and 482 H3N2 HA seven, between 333 H5N1 HA seven with RrLlSs and 429 H5N1 NA seven; the statistically significant difference in specificity between 241 H5N1 HA six and 333 H5N1 HA seven with RrLlSs, between 333 H5N1 HA seven and 429 H5N1 NA seven, between 333 H5N1 HA seven with RrLlSs and 482 H3N2 HA seven, between 333 H5N1 HA seven with RrLlSs and 429 H5N1 NA seven; the statistically significant difference in total correct rate between 241 H5N1 HA six and 333 H5N1 HA seven with RrLlSs, between 333 H5N1 HA seven with RrLlSs and 482 H3N2 HA seven, between 333 H5N1 HA seven with RrLlSs and 429 H5N1 NA seven.

Gene and protein sequence of an influenza neuraminidase with hemagglutinin activity

Steady-state kinetic studies with the non-nucleoside HIV-1 reverse transcriptase inhibitor U-87201E

Kinetic studies with the nonnucleoside HIV-1 reverse transcriptase inhibitor U-88204E

The quinoline U-78036 is a potent inhibitor of HIV-1 reverse transcriptase

New avian influenza A virus subtype combination H5N7 identified in Danish mallard ducks

Interspecies transmission of an H7N3 influenza virus from wild birds to intensively reared domestic poultry in Italy

Establishment of multiple sublineages of H5N1 influenza virus in Asia: implications for pandemic control

A vectorized sequence-coupling model for predicting HIV protease cleavage sites in proteins

A formulation for correlating properties of peptides and its application to predicting human immunodeficiency virus proteasecleavable sites in proteins

Predicting cleavability of peptide sequences by HIV protease via correlation-angle approach

Prediction of HIV protease cleavage sites in proteins

Insights from modelling the 3D structure of the extracellular domain of alpha7 nicotinic acetylcholine receptor

Insights from modelling the tertiary structure of BACE2

Molecular therapeutic target for type-2 diabetes

Fig. 2. Prediction of mutations in AB239126 neuraminidase based on logistic regression (lower panel) and amino acid mutating probability

Structural bioinformatics and its impact to biomedical science

Modelling extracellular domains of GABA-A receptors: subtypes 1, 2, 3, and 5

Modeling the tertiary structure of human cathepsin-E

Prediction of the tertiary structure and substrate binding site of caspase-8

Steady-state inhibition kinetics of processive nucleic acid polymerases and nucleases

Prediction of the tertiary structure of a caspase-9/inhibitor complex

A model of the complex between cyclin-dependent kinase 5(Cdk5) and the activation domain of neuronal Cdk5 activator

Progress in computational approach to drug development against SARS

Binding mechanism of coronavirus main proteinase with ligands and its implication to drug design against SARS

A model of evolutionary change in protein

Distinct lineages of influenza virus H4 hemagglutinin genes in different regions of the world

Heuristic molecular lipophilicity potential (HMLP): a 2D-QSAR study to LADH of molecular family pyrazole and derivatives

Inhibitor design for SARS coronavirus main protease based on "distorted key theory"

Feller W (1968) An introduction to probability theory and its applications

Long term trends in the evolution of H(3) HA1 human influenza type A

Controlling influenza by inhibiting the virus's neuraminidase

Pattern of positions sensitive to mutations in human haemoglobin α-chain

Agaritine and its derivatives are potential inhibitors against HIV proteases

The specific enzyme of influenza virus and Vibrio cholerae

H5N1 influenza: a protean pandemic threat

Molecular cloning and analysis of the N5 neuraminidase subtype from an avian influenza virus

Outliers in clinical chemistry quality-control schemes

Realities and enigmas of human viral influenza: pathogenesis, epidemiology and control

Characterization of the influenza A virus gene pool in avian species in southern China: was H6N1 a derivative or a precursor of H5N1?

Target-induced formation of neuraminidase inhibitors from in vitro virtual combinatorial libraries

Applied logistic regression

Multiple-alphabet amino acid sequence comparisons of the immunoglobulin kappa-chain constant domain

Influenza A viruses of migrating wild aquatic birds in North America

Characterization of highly pathogenic H5N1 avian influenza A viruses isolated from South Korea

Computational studies of the binding mechanism of calmodulin with chrysin

The influenza virus gene pool in a poultry market in South central China

Estimating amino acid substitution models: a comparison of Dayhoff's estimator, the resolvent approach and a maximum likelihood method

A new millennium conundrum: how to use a powerful class of influenza anti-neuraminidase drugs (NAIs) in the community

Interactions of animal viruses with cell surface receptors

Do hemagglutinin genes of highly pathogenic avian influenza viruses constitute unique phylogenetic lineages?

Complete nucleotide sequence of the neuraminidase gene of the human influenza virus A/Chile/1/83 (H1N1)

Evolutional analysis of human influenza A virus N2 neuraminidase genes based on the transition of the low-pH stability of sialidase activity

Neural network prediction of the HIV-1 protease cleavage sites

3D structure modeling of cytochrome P450 2C19 and its implication for personalized drug design

Insights from modeling the 3D structure of NAD(P)H-dependent D-xylose reductase of Pichia stipitis and its binding interactions with NAD and NADP

Study of drug resistance of chicken influenza A virus (H5N1) from homology-modeled 3D structures of neuraminidases

Insights from modeling the 3D structure of H5N1 influenza virus neuraminidase and its binding interactions with ligands

Molecular insights of SAH enzyme catalysis and their implication for inhibitor design

An explanation for failure to predict cyclosporine area under the curve using a limited sampling strategy: a beginner's second note

The first and second order Markov chain analysis on amino acids sequence of human haemoglobin α-chain and its three variants with low O2 affinity

Frequency and Markov chain analysis of amino-acid sequence of human glutathione reductase

Frequency and Markov chain analysis of amino-acid sequence of human tumor necrosis factor

Frequency and Markov chain analysis of amino-acid sequences of mouse p53

Frequency and Markov chain analysis of the amino acid sequence of human alcohol dehydrogenase α-chain

Frequency and Markov chain analysis of the amino-acid sequence of sheep p53 protein

The first, second and third order Markov chain analysis on amino acids sequence of human tyrosine aminotransferase and its variant causing tyrosinemia type II

The first, second, third and fourth order Markov chain analysis on amino acids sequence of human dopamine β-hydroxylase

Effects of different sampling strategies on predictions of blood cyclosporine concentrations in haematological patients with multidrug resistance by Bayesian and non-linear least squares methods

Prediction of blood cyclosporine concentrations in haematological patients with multidrug resistance by one-, two- and three-compartment models using Bayesian and nonlinear least squares methods

Frequency and Markov chain analysis of amino-acids sequence of human platelet-activating factor acetylhydrolase α-subunit and its variant causing the lissencephaly syndrome

Prediction of two- and three-amino acid sequence of human acute myeloid leukemia 1 protein from its amino acid composition

Prediction of two- and three-amino-acid sequences of Citrobacter Freundii β-lactamase from its amino acid composition

Prediction of distributions of amino acids and amino acid pairs in human haemoglobin α-chain and its seven variants causing α-thalassemia from their occurrences according to the random mechanism

Frequency and Markov chain analysis of amino-acid sequences of human connective tissue growth factor

Prediction of presence and absence of two- and three-amino-acid sequence of human monoamine oxidase B from its amino acid composition according to the random mechanism

Prediction of presence and absence of two- and three-amino-acid sequence of human tyrosinase from their amino acid composition and related changes in human tyrosinase variant causing oculocutaneous albinism

Analysis of distributions of amino acids, amino acid pairs and triplets in human insulin precursor and four variants from their occurrences according to the random mechanism

Analysis of distributions of amino acids and amino acid pairs in human tumor necrosis factor precursor and its eight variants according to random mechanism

Determination of amino acid pairs sensitive to variants in human low-density lipoprotein receptor precursor by means of a random approach

Estimation of amino acid pairs sensitive to variants in human phenylalanine hydroxylase protein by means of a random approach

Random analysis of presence and absence of two- and three-amino-acid sequences and distributions of amino acids, two- and three-amino-acid sequences in bovine p53 protein

Analysis of distributions of amino acids in the primary structure of apoptosis regulator Bcl-2 family according to the random mechanism

Analysis of distributions of amino acids in the primary structure of tumor suppressor p53 family according to the random mechanism

Randomness in the primary structure of protein: methods and implications

Analysis of amino acid pairs sensitive to variants in human collagen α5(IV) chain precursor by means of a random approach

Determination of amino acid pairs in human haemoglobulin α-chain sensitive to variants by means of a random approach

Determination of amino acid pairs in human p53 protein sensitive to mutations/variants by means of a random approach

Determination of amino acid pairs in Von Hippel-Lindau disease tumour suppressor (G7 protein) sensitive to variants by means of a random approach

Determination of amino acid pairs sensitive to variants in human β-glucocerebrosidase by means of a random approach

Determination of amino acid pairs sensitive to variants in human Bruton's tyrosine kinase by means of a random approach

Determination of amino acid pairs sensitive to variants in human coagulation factor IX precursor by means of a random approach

Prediction of amino acid pairs sensitive to mutations in the spike protein from SARS related coronavirus

Amino acid pairs sensitive to variants in human collagen α1(I) chain precursor

Determination of amino acid pairs sensitive to variants in human copper-transporting ATPase 2

Fate of 130 hemagglutinins from different influenza A viruses

Potential targets for anti-SARS drugs in the structural proteins from SARS related coronavirus

Susceptible amino acid pairs in variants of human collagen α1(III) chain precursor

Determination of sensitive positions to mutations in human p53 protein

Amino acid pairs susceptible to variants in human protein C precursor

Mutation features of 215 polymerase proteins from different influenza A viruses

Reasoning of spike glycoproteins being more vulnerable to mutations among 158 coronavirus proteins from different species

Timing of mutation in hemagglutinins from influenza A virus by means of unpredictable portion of amino-acid pair and fast Fourier transform

Searching of main cause leading to severe influenza A virus mutations and consequently to influenza pandemics/epidemics

Prediction of mutation trend in hemagglutinins and neuraminidases from influenza A viruses by means of cross-impact analysis

Determination of mutation trend in proteins by means of translation probability between RNA codes and mutated amino acids

Fate of influenza A virus proteins

Mutation trend of hemagglutinin of influenza A virus: a review from computational mutation viewpoint

Timing of mutation in hemagglutinins from influenza A virus by means of amino-acid distribution rank and fast Fourier transform

Determination of mutation trend in hemagglutinins by means of translation probability between RNA codons and mutated amino acids

Prediction of possible mutations in H5N1 hemagglutinins of influenza A virus by means of logistic regression

Prediction of mutations in H5N1 hemagglutinins from influenza A virus

Improvement of model for prediction of hemagglutinin mutations in H5N1 influenza viruses with distinguishing of arginine, leucine and serine

Translation probability between RNA codons and translated amino acids, and its applications to protein mutations. In: Ostrovskiy MH (ed) Leading-edge messenger RNA research communications

Prediction of mutations initiated by internal power in H3N2 hemagglutinins of influenza A virus from North America

Epidemiology and pathogenesis of influenza