key: cord-0005287-vk8n8s2q authors: González-Díaz, Humberto; Dea-Ayuela, María A.; Pérez-Montoto, Lázaro G.; Prado-Prado, Francisco J.; Agüero-Chapín, Guillermín; Bolas-Fernández, Francisco; Vazquez-Padrón, Roberto I.; Ubeira, Florencio M. title: QSAR for RNases and theoretic–experimental study of molecular diversity on peptide mass fingerprints of a new Leishmania infantum protein date: 2009-07-04 journal: Mol Divers DOI: 10.1007/s11030-009-9178-0 sha: 1b03e0640b73b78829a3d1da25d94210f0a62014 doc_id: 5287 cord_uid: vk8n8s2q The toxicity and low success of current treatments for leishmaniosis motivate the search for new peptide drugs and/or molecular targets in Leishmania pathogen species (L. infantum and L. major). For example, ribonucleases (RNases) are enzymes relevant to several biological processes; thus, a theoretical and experimental study of the molecular diversity of Peptide Mass Fingerprints (PMFs) of RNases is useful for drug design. This study introduces a methodology that combines QSAR models, 2D-Electrophoresis (2D-E), MALDI-TOF Mass Spectroscopy (MS), BLAST alignment, and Molecular Dynamics (MD) to explore PMFs of RNases. We illustrate this approach by investigating for the first time the PMFs of a new protein of L. infantum. Here we report and compare new versus old predictive models for RNases based on Topological Indices (TIs) of Markov Pseudo-Folding Lattices. This group of indices, called Pseudo-folding Lattice 2D-TIs, includes: spectral moments $\pi_k(x,y)$, mean electrostatic potentials $\xi_k(x,y)$, and entropy measures $\theta_k(x,y)$. The accuracy of the models (training/cross-validation) was as follows: $\xi_k(x,y)$-model (96.0%/91.7%) > $\pi_k(x,y)$-model (84.7/83.3) > $\theta_k(x,y)$-model (66.0/66.7). We also carried out a 2D-E analysis of biological samples of L. infantum promastigotes, focusing on a 2D-E gel spot of one unknown protein with M < 20,100 and pI < 7.
MASCOT search identified 20 proteins with Mowse score >30, but none >52 (threshold value); the highest value, 42, was for a probable DNA-directed RNA polymerase. However, we experimentally determined the sequences of more than 140 peptides. We used QSAR models to predict RNase scores for these peptides and BLAST alignment to confirm some results. We also calculated 3D-folding TIs based on MD experiments and compared 2D versus 3D-TIs on molecular phylogenetic analysis of the molecular diversity of these peptides. This combined strategy may be of interest in drug development or target identification. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11030-009-9178-0) contains supplementary material, which is available to authorized users.

Keywords QSAR · Topological indices · Markov models · Protein folding · HP Lattice model · Ribonucleases · Leishmania · MALDI-TOF Mass Spectroscopy · 2D-Electrophoresis · Sequence alignment · Molecular dynamics

Introduction

Ribonucleases (RNases) are enzymes that usually make staggered cuts in both strands of a double helical RNA, although in some cases they cleave once in a single-stranded bulge in the helix. This fact makes the exploration of the molecular diversity of RNases (or of their peptide fragments that retain RNase activity) an interesting source of drug or drug-target candidates for drug development. For instance, Kimberly and Rosenberg [1] have recently reviewed and discussed the molecular diversity of the RNase A super-family, which includes an extensive network of distinct and divergent gene lineages. Although all RNases of this super-family share invariant structural and catalytic elements and some degree of enzymatic activity, the primary sequences have diverged significantly, ostensibly to promote novel functions. The authors reviewed the literature on the evolution and biology of the RNase A lineages that have been characterized, specifically those involved in host defense, including: (1) RNases 2 and RNases 3, also known as the eosinophil ribonucleases, which are rapidly evolving cationic proteins released from eosinophilic leukocytes, (2) RNase 7, an anti-pathogen ribonuclease identified in human skin, and (3) RNase 5, also known as angiogenin, another rapidly evolving RNase known to promote blood vessel growth, with recently discovered antibacterial activity.
Interestingly, some of the characterized anti-pathogen activities do not depend on RNase activity per se. The authors also discussed the ways in which the anti-pathogen activities characterized in vitro might translate into experimental confirmation in vivo. They then considered the possibility that other RNases, such as the dimeric bovine seminal RNase and the frog oocyte RNase, may have host defense functions and therapeutic value that remain unexplored. This therapeutic value was demonstrated by Onconase, an RNase derived from the frog (Rana pipiens). However, this is the first and only RNase currently evaluated in clinical trials [2]. Conjugation or fusion of RNases to tumor-specific antibodies is a promising approach to further boost the tumor cell killing of these compounds. In addition, Dicer and Drosha are type III RNases responsible for the generation of short interfering RNAs (siRNAs) from long double-stranded RNAs during RNA interference (RNAi). Both RNases are thus involved in several important biological processes with high biological and molecular diversity. For instance, Dicer acts on the vascular system by regulating embryonic angiogenesis, probably by processing miRNAs, which regulate the expression levels of some critical angiogenic regulators in the cell [3]. The cellular processing of shRNAs shares common features with the biogenesis of naturally occurring miRNA, such as the cleavage by the nuclear RNase Drosha, the translocation from the nucleus, processing by the cytoplasmic RNase Dicer, and the incorporation into the RNA-induced silencing complex (RISC). Each step has a crucial influence on the efficiency of RNAi, and their consideration should be part of a standard experimental design. The possible use of RNAi in the treatment of spinocerebellar ataxia or amyotrophic lateral sclerosis, with its advantages and pitfalls and possible extensions to other diseases, has been discussed before [4].
More recently, a new RNase that inhibits tobacco mosaic virus was isolated and purified from Bacillus cereus ZH14. The inhibitory activity of the RNase against tobacco mosaic virus was tested during the purification process, and the percentage inhibition of the purified RNase (48 U/mL) reached 90% [5]. All the above-mentioned aspects make the isolation and prediction of new RNases (or peptides with RNase activity) a goal of major importance for drug development and/or drug-target prediction. One way to study molecular diversity is the use of proteomics techniques. For instance, authors often use a combination of 2D-Electrophoresis (2D-E) and Mass Spectroscopy (MS) to isolate and characterize new sequences from biological samples [6]. Obtaining the peptide mass fingerprint (PMF) of a protein is a very useful procedure in this sense [7] and also for clinical purposes [8, 9]. In these cases, we employ informatics tools, such as Sequest or MASCOT, to obtain the MS outcomes for some of the more important peptides of the most similar proteins [10, 11]. That is, MASCOT may provide a collection of MS signals and the corresponding sequences of peptides present in known proteins that match our MS input. In order to rank and select the best protein/peptide candidates, MASCOT uses the Mowse score [12]. If a template protein in the database has a high Mowse score (>52), this protein has a PMF very similar to the PMF of our query protein, and we can detect a high sequence homology and perform the function annotation. However, there is still another situation that often appears in proteome research and does not coincide exactly with the situations mentioned previously.
We refer to the case in which one identifies a new protein, performs the MS analysis of its PMF, submits it to MASCOT (or another MS and sequence database), and the software identifies some template candidates with an appreciable Mowse score (>40) that is nevertheless not sufficiently high to accurately annotate the query protein. A previous study has reported an alternative to Mowse scoring with MASCOT and discussed the limits of accurate scoring [13]. Nevertheless, if this kind of situation persists, one has neither the sequence of the query protein nor the sequence of a template protein with high homology, but one does have the PMFs of both the query and the template. We call this situation the query-sequence-missing and Low-Mowse scoring case. Independently of the possibility of function annotation of Low-Mowse proteins, PMFs of this kind are, in our opinion, ideal sources from which to fish interesting peptides with bioinformatics and/or data mining computational methods. Many studies have indicated that computational modeling and various automated prediction methods developed recently [14], such as structural bioinformatics [15, 16], molecular docking [17] [18] [19], molecular packing [20, 21], pharmacophore modelling [22, 23], the Monte Carlo simulated annealing approach [24], diffusion-controlled reaction simulation [25], identification of membrane proteins and their types [26], identification of enzymes and their functional classes [27], identification of GPCRs and their types [28, 29], identification of proteases and their types [30, 31], protein cleavage site prediction [32] [33] [34], and signal peptide prediction [35, 36], can provide timely and very useful information and insights for both basic research and drug design. In general, the bioinformatics approaches used to annotate biological functions of nucleic acids and proteins, predict protein secondary structure, and explore molecular diversity are based on sequence alignment procedures [37] [38] [39] [40].
However, it has been noted that such procedures perform poorly in cases of low sequence homology between the query and the template sequences deposited in the database. Alignment techniques are also of no use when there is high query-template homology but the function of the template sequence deposited in the database is unknown [41]. One alternative is the application of alignment-free Machine Learning methods to predict protein functional class and explore molecular diversity based on structural parameters, independently of sequence-sequence similarity [42] [43] [44] [45] [46]. One example is the so-called pseudo-amino acid (PseAA) composition, or PseAAC, indices introduced by Chou to improve the prediction quality for protein subcellular localization and membrane protein type [47], as well as for enzyme functional class irrespective of sequence similarity [48]. The PseAA composition can be used to represent a protein sequence with a discrete model without completely losing its sequence-order information. Ever since the concept of Chou's PseAA composition was introduced, a variety of PseAAC approaches have been developed to enhance the prediction quality of different protein features [30, 49–57]. Using graphic approaches to study biological systems can also provide useful insights, as indicated by many previous studies on a series of important biological topics, such as enzyme-catalyzed reactions [58] [59] [60] [61] [62] [63] [64], protein folding kinetics [65], inhibition kinetics of processive nucleic acid polymerases and nucleases [66] [67] [68], analysis of codon usage [69, 70], and base frequencies in the anti-sense strands [71]. Moreover, graphical methods have been introduced for QSAR study [72] [73] [74] as well as utilized to deal with complicated network systems [75, 76].
Recently, the "cellular automaton image" [77, 78] has also been applied to study hepatitis B viral infections [79], HBV virus gene missense mutation [80], and visual analysis of SARS-CoV [8, 9], as well as to represent complicated biological sequences [81] and to help identify protein attributes [29, 82, 83]. Authors such as Randic, Nandy, Liao, and others have introduced 2D or higher dimension graph representations of sequences prior to the calculation of numerical parameters, sometimes called Topological Indices (TIs). This constitutes an important step in order to uncover useful higher-order information not encoded by 1D sequence parameters [84] [85] [86] [87] [88] [89] [90] [91] [92] [93] [94] [95] [96] [97]. Finally, these TIs or other types of parameters may be used as inputs to develop Quantitative Structure-Activity Relationship (QSAR) models in order to predict protein function and explore protein molecular diversity [98] [99] [100] [101]. The idea behind this type of QSAR-like approach to protein molecular diversity is essentially the same as that reported by other authors in QSAR/QSPR studies of low-weight molecules, e.g., the important works of Roy et al. [101] [102] [103] [104] [105] [106] [107] [108]. In fact, QSAR is one of the more important tools to explore molecular diversity nowadays [109] [110] [111] [112] [113] [114] [115] [116] [117] [118] [119]. In particular, for the case of proteins, the idea of describing them as networks is very interesting and has important advantages over computationally expensive methods (see, for instance, the interesting studies of Krishnan, Zibilut, and Giuliani et al. [120] [121] [122] [123] [124] [125]). Specifically, different computational schemes, such as those of Zibilut and Giuliani et al., have used charge and hydrophobicity patterning along the sequence to predict the folding mechanism and aggregation of proteins in proteome research [126].
Recently, our group has introduced Hydrophobicity-Polarity (HP) 2D Cartesian or lattice-like network representations for proteins [127]. We can use Markov Chains theory in order to calculate TIs of these lattices, which allow us to numerically encode higher-order sequence information. The method consists of the following steps, which can be applied to many different problems and have been covered in recent reviews [98, 99, 128]. First, we derive the lattice-like representations (also called maps or graphs) for protein sequences. Next, we calculate the TI values to characterize the protein sequence. Finally, we use these pseudo-folding TIs as inputs for QSAR or Clustering algorithms [95]. On the other hand, Molecular Dynamics (MD) of peptides and proteins is central to drug and target discovery. Since the pioneering article entitled "The Biological Functions of Low-Frequency Phonons" [129] was published in 1977, a series of investigations into biomacromolecules by means of dynamic approaches have been stimulated. These studies have suggested that low-frequency (or terahertz frequency) collective motions do exist in proteins and DNA and that they hold a very high potential to reveal the profound dynamic mechanisms of many marvelous biological functions in biological systems (see, e.g., [130] [131] [132] [133] [134] [135] [136] [137] [138] [139] [140] [141] [142] [143] and a comprehensive review [144]). Such inferences were later confirmed by NMR observations [145] and applied in medical treatments [146, 147]. In view of this, to really understand the mechanism of action of drugs on their receptors, we should consider not only their static structures but also their dynamics, by simulating their interactions as a dynamic process. Thus, MD has become the foremost computational technique to investigate the structure and function of peptides [148] [149] [150] [151] [152] [153].
Consequently, we can use the 3D folded structures of the peptides obtained by MD to calculate 3D-TIs instead of pseudo-folding 2D-TIs. The present study aims to develop a powerful computational approach for studying Peptide Mass Fingerprints of Ribonucleases by combining QSAR models, 2D-Electrophoresis (2D-E), MALDI-TOF Mass Spectroscopy (MS), BLAST alignment, and Molecular Dynamics (MD), in the hope that it may become a useful tool for drug development. We report two different experiments in order to introduce new Sequence and MD pseudo-folding TIs for the study of the molecular diversity of PMFs. We also report new QSAR and Clustering analysis models based on these indices. In the first experiment (Experiment 1), we use an experimental example to show how 2D-Lattice electrostatic parameters can numerically characterize protein sequences, and we seek a model to predict RNase III function without relying on alignment. Different classes of 2D graph representations of DNA, RNA, protein sequences, or proteomic maps have been used by other researchers [87, 91, 92, 154–164]. We subsequently developed three different classifiers (one for each type of TI) to connect protein sequence information (represented by TI values) with the classification of sequences as RNase III or not. In general, different kinds of classifiers have been used to derive protein sequence QSAR models [165, 166]. We selected Linear Discriminant Analysis (LDA), which is a simple but powerful technique [167]. In the other experiment (Experiment 2), we compared phylogenetic analyses of peptides based on both folding 3D-TIs and pseudo-folding 2D-TIs. In both experiments, we illustrate the use of the new models in a practical example based on the analysis of the PMF of a new protein.
As a result of this work, we could characterize the PMF of the new protein and, at the same time, introduce new QSAR and phylogenetic algorithms of general use for other peptides or proteins.

The MARCH-INSIDE approach is used to calculate the Pseudo-Folding TIs of sequences. First, each aminoacid in the sequence is placed in a Cartesian 2D space $r_2 = (x, y)$, starting with the first monomer at the (0, 0) coordinates. The coordinates of the successive aminoacids are calculated as follows, in a manner similar to that used for DNA [127]: (a) the x-axis coordinate increases by +1 for an acidic aminoacid (rightwards step), (b) the x-axis coordinate decreases by 1 for a basic aminoacid (leftwards step), (c) the y-axis coordinate increases by +1 for a polar aminoacid (upwards step), and (d) the y-axis coordinate decreases by 1 for a non-polar aminoacid (downwards step). Second, the method uses the Markov matrix $^1\Pi$, which is a square matrix that characterizes electrostatic interactions between aminoacids in the folded protein. Note that the number of nodes (n) in the graph may be equal to or even smaller than the number of aminoacids. The matrix $^1\Pi$ contains the probabilities $^1p_{ij}(r_2)$ of direct electrostatic interaction between two nodes placed at distance $k = 1$ within the lattice in $r_2$. The formula for the $^1p_{ij}(r_2)$ values is the following: where $Q_j$ is the charge of the node $n_j$ (equal to the sum of the charges of all aminoacids projected onto the node), $d_{ij}$ is the Euclidean distance between the nodes i and j, and $\alpha_{ij}$ equals 1 if the nodes $n_i$ and $n_j$ are adjacent in the graph and 0 otherwise. Afterward, we can calculate sequence pseudo-folding TIs in the form of different invariants of this matrix.
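The lattice walk described above can be sketched in a few lines of Python. This is only an illustrative sketch: the assignment of residues to the acid, basic, polar, and non-polar classes and the unit charges in `ASSUMED_CHARGE` are assumptions made for the example, not the parameters used by MARCH-INSIDE, and the function name `pseudo_folding_lattice` is hypothetical.

```python
from collections import defaultdict

# Assumed residue classes (illustrative only, not taken from the paper).
ACID = set("DE")        # rightwards step: x + 1
BASIC = set("KRH")      # leftwards step:  x - 1
POLAR = set("STNQCYG")  # upwards step:    y + 1
# Any other residue is treated as non-polar: downwards step, y - 1.

# Assumed unit charges (illustrative only); unlisted residues count as 0.
ASSUMED_CHARGE = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0, "H": 1.0}


def step(residue):
    """Return the (dx, dy) lattice move associated with one residue."""
    if residue in ACID:
        return (1, 0)
    if residue in BASIC:
        return (-1, 0)
    if residue in POLAR:
        return (0, 1)
    return (0, -1)


def pseudo_folding_lattice(sequence):
    """Place residues on the 2D lattice and sum the charges per node.

    The first residue sits at (0, 0); every following residue moves the
    walker one step according to its own class. Residues that land on the
    same coordinates share a node, so the node charge is the sum of the
    charges of all residues projected onto that node.
    """
    x, y = 0, 0
    coords = [(0, 0)]
    for residue in sequence[1:]:
        dx, dy = step(residue)
        x, y = x + dx, y + dy
        coords.append((x, y))
    node_charge = defaultdict(float)
    for residue, xy in zip(sequence, coords):
        node_charge[xy] += ASSUMED_CHARGE.get(residue, 0.0)
    return coords, dict(node_charge)


coords, node_charge = pseudo_folding_lattice("ADKL")
print(coords)       # lattice position of each residue
print(node_charge)  # total charge of each distinct node
```

For the four-residue toy sequence used here, the third residue steps back onto the origin, so the lattice has three nodes for four aminoacids, which illustrates why the number of nodes can be smaller than the sequence length.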
In this study, we consider three different classes of pseudo-folding electrostatic TIs: spectral moments $\pi_k(x,y)$, entropy values $\theta_k(x,y)$, and average electrostatic potentials $\xi_k(x,y)$. Using the Markov chain theory, we can calculate the values of these parameters for all nodes placed at a topological distance k > 1: where Tr is called the trace and denotes the sum of all the values in the main diagonal of the matrices $^k\Pi = (^1\Pi)^k$, calculated as natural powers of $^1\Pi$. The present 2D-TIs encode in a stochastic manner the interactions of charged nodes (one or more amino acids) placed at different distances not in the sequence (1D space), but in the 2D lattice embedded in $r_2$. Note that in Eqs. 3 and 4, we used the absolute probabilities $^kp_j(r_2)$ of interaction for a node with any other node placed at distance k instead of directly using the interaction probabilities $^kp_{ij}(r_2)$. In protein QSAR, this kind of pseudo-folding lattice in $r_2 = (x, y)$ may become an alternative, in terms of computational cost, to real folded structures in $r_3 = (x, y, z)$. Figure 1 depicts both the pseudo-folding lattice network for a protein in $r_2$ and the aminoacid-aminoacid contact map network for the same protein in $r_3$. The calculation of the $^kp_j(r_2)$ values has already been explained in detail in the literature; therefore, we do not cover it here [127, 168]. This theoretical description contains the essential elements needed to understand the work, and the reader may also consult recent reviews that explain in detail the theory and applications of the MARCH-INSIDE approach [98, 99, 128].

Linear Discriminant Analysis (LDA) was used to construct the QSAR classifier. LDA forward stepwise analysis was carried out for variable selection to build up the model [167]. All of the variables included in the model were standardized in order to bring them onto the same scale.
Subsequently, a standardized linear discriminant equation that allows comparison of the coefficients was obtained [169]. The square of the canonical regression coefficient (Rc) and Wilks' statistic (U) were examined in order to assess the discriminatory power of the model (U = 0 indicates perfect discrimination, with 0 < U < 1), and the separation of the two groups of proteins was statistically verified by the Fisher ratio (F) test with error level p < 0.05.

The Molecular Dynamics Trajectories (MDTs), or energetic profiles, of all the starting peptide structures were also obtained by means of the Monte Carlo (MC) method, using the HyperChem package [170, 171]. The AMBER94 force field [172] was used with a distance-dependent dielectric constant (scale factor 1), default electrostatic and van der Waals values, and shifted cutoffs with an outer radius of 14 Å (see Fig. 2). All the components of the force field were included, and the atom types were recalculated keeping their current charges. Prior to MC simulation, the geometry of all the peptide structures was optimized with this same force field. Finally, the simulation was executed in vacuo at 300 K with 100 optimization steps, obtaining MDTs with 100 potential energy values $dE_j$ (j = 1, 2, 3, ..., 100) each. We obtained 22 MDTs for 19 peptides. In order to obtain realistic MDTs, there is an additional parameter that we monitor in MD algorithms, known as the acceptance ratio (ACCR). It appears as ACCR in the list of possible selections in the MC Averages dialog box of HyperChem (see Fig. 2). The ACCR is a running average of the ratio of the number of accepted moves to attempted moves. Varying the step size can have a large effect on the ACCR value. The step size ($\Delta r_3$) is the maximum allowed atomic displacement used in the generation of trial configurations. The default value of $\Delta r_3$ in HyperChem is 0.05 Å [170].
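The dependence of the acceptance ratio on the step size can be illustrated with a toy Metropolis Monte Carlo run. The sketch below is not the HyperChem procedure and does not use the AMBER94 force field: it samples a single coordinate in an assumed harmonic potential $E(x) = x^2/2$ with reduced units ($kT = 1$), and the function name `acceptance_ratio` and its parameters are hypothetical.

```python
import math
import random


def acceptance_ratio(step_size, n_moves=20000, kT=1.0, seed=0):
    """Metropolis sampling of the toy potential E(x) = x**2 / 2.

    Each trial move displaces x by at most `step_size` in either
    direction. Returns the ratio of accepted moves to attempted moves.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    x = 0.0
    accepted = 0
    for _ in range(n_moves):
        trial = x + rng.uniform(-step_size, step_size)
        delta_e = 0.5 * trial * trial - 0.5 * x * x
        # Downhill moves are always accepted; uphill moves are accepted
        # with the Boltzmann probability exp(-delta_e / kT).
        if delta_e <= 0.0 or rng.random() < math.exp(-delta_e / kT):
            x = trial
            accepted += 1
    return accepted / n_moves


for size in (0.05, 0.5, 5.0):
    print(size, acceptance_ratio(size))
```

In this toy system, small displacements are almost always accepted but explore the coordinate slowly, whereas large displacements are rejected more often, which is the trade-off that the acceptance ratio is meant to monitor.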
For most organic molecules, the default step size results in an ACCR of about 0.5, which means that about 50% of all moves are accepted. Increasing the size of the trial displacements may lead to more complete searching of configuration space, but the acceptance ratio will, in general, decrease. Smaller displacements generally lead to higher acceptance ratios but result in more limited sampling. There has been little research to date on what the optimum value of the acceptance ratio should be.

The method may also use the Markov matrix $^1\Pi$, which is a square matrix that characterizes electrostatic interactions between aminoacids in the folded 3D structure of the peptide obtained by MD. The matrix $^1\Pi$ contains the probabilities $^1p_{ij}$ of direct electrostatic interaction between two nodes placed at a distance lower than the cut-off within the 3D space of coordinates $r_3 = (x, y, z)$: where $Q_j$ is the charge of the node $n_j$ (equal to the sum of the charges of all aminoacids projected onto the node), $d_{ij}$ is the Euclidean distance between the nodes i and j, and $\alpha_{ij}$ equals 1 if the nodes $n_i$ and $n_j$ are adjacent in the graph and 0 otherwise. Afterward, we can calculate folding TIs in the form of different invariants of this matrix. In this study, we consider three different classes of real folding 3D-TIs: spectral moments $\pi_k(r_3)$, entropy values $\theta_k(r_3)$, and average electrostatic potentials $\xi_k(r_3)$. Using the Markov chain theory, we can calculate the values of these parameters for all nodes placed at a topological distance k > 1:

2D versus 3D-TIs phylogenetic analysis of PMFs

In principle, we can use different distance functions; here, we select only the Euclidean distance, owing to the Euclidean nature of both the Cartesian space used to derive the pseudo-folding lattices, $r_2$, and the real folding space, $r_3$.
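As a minimal illustration of how TI vectors can be compared with the Euclidean distance, the sketch below computes spectral moments as the trace of successive powers of a Markov matrix, following the definition of Tr given for the 2D case, and then measures the distance between two such vectors. The two 2 × 2 matrices are made-up toy examples rather than matrices derived from any peptide, the entropy and potential indices are not covered, and the function names are hypothetical.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(p1, k_max):
    """Return [Tr(P^1), Tr(P^2), ..., Tr(P^k_max)] for a square matrix P."""
    power = [row[:] for row in p1]
    moments = []
    for _ in range(k_max):
        moments.append(sum(power[i][i] for i in range(len(power))))
        power = mat_mul(power, p1)
    return moments


def euclidean_distance(u, v):
    """Euclidean distance between two equally long vectors of TIs."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy row-stochastic matrices (assumed for illustration only).
P_A = [[0.5, 0.5], [0.5, 0.5]]
P_B = [[0.0, 1.0], [1.0, 0.0]]

ti_a = spectral_moments(P_A, 3)
ti_b = spectral_moments(P_B, 3)
print(ti_a, ti_b, euclidean_distance(ti_a, ti_b))
```

A matrix of such pairwise distances, computed over all peptides, is the kind of input that a tree-building cluster analysis operates on.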
Using the Tree Joining Cluster (TJC) analysis algorithm implemented in the software Statistica, we were able to construct, visualize, and compare the phylogenetic trees based on both 2D and 3D-TIs. The molecules used in this study were the same 19 peptides found in the PMF of the new protein. In general, in the phylogenetic analysis, we can calculate (3 types of indices) × (2 types of graphs) = 6 different Euclidean distances. In order to give a general notation for all these equations, we use the symbol $^pTI_k(r_d)$, where TI = $\theta$, $\xi$, or $\pi$ and the dimension of the space is d = 2 for $r_2 = (x, y)$ or d = 3 for $r_3 = (x, y, z)$. The same general formula may be used to calculate the six types of Euclidean distances mentioned above; alternatively, we can group all the TIs of the same $r_d$:

Promastigotes of the Leishmania strain LEM75 were grown in Schneider medium supplemented to a final concentration of 0.4 g/L NaHCO3, 4 g/L HEPES, 100 mg/L penicillin and streptomycin, and 10% fetal bovine serum (Gibco), at pH 6.8 and 26 °C. Mid-log promastigotes were recovered on day 7 post-inoculum (p.i.), and the parasites were centrifuged at 3,000 rpm for 10 min at 4 °C. The resulting pellet was washed five times with Tris-HCl pH 7.8 and resuspended in 0.1 mL of this same buffer. The sample was sonicated for 10 s with a virsonic 5 (virTis, NY, USA) set at 70% output power in an ice bath. The homogenate was extracted in 5 mM Tris-HCl buffer pH 7.8 containing 1 mM phenylmethylsulfonyl fluoride (PMSF) as a protease inhibitor, at 4 °C overnight, and subsequently centrifuged at 10,000g for 1 h at 4 °C (Biofuge 17RS: Heraeus Sepatech GmbH, Osterode, Germany). The supernatant was dialyzed overnight at 4 °C in 0.5 mM Tris-HCl buffer. Proteins were precipitated for 1 h at −20 °C with 20% TCA (trichloroacetic acid) in acetone containing 20 mM DTT, added 1:1 to the homogenate.
Then, the sample was centrifuged at 10,000 rpm for 15 min and the pellet was washed with cold acetone containing 20 mM DTT. Residual acetone was removed by air drying. In order to achieve a well-focused first-dimension separation, sample proteins must be completely disaggregated and fully solubilized; the pellet was therefore dissolved in a sample buffer containing 7 M urea, 2 M thiourea, 4% CHAPS, DeStreak buffer (Amersham Bioscience), 5 mM K2CO3, and 2% IPG buffer (Amersham Bioscience), and incubated at room temperature for 30 min. Following clarification by centrifugation at room temperature (12,000 rpm, 10 min), the supernatant was stored frozen. In total, 340 µL of rehydration buffer (7 M urea, 2 M thiourea, 2% CHAPS, 0.75% IPG buffer 4-7, and bromophenol blue) was added to the solubilized promastigote extracts, which were immediately adsorbed onto 18 cm immobilized pH 4-7 gradient (IPG) strips (Amersham Biosciences) [173]. Optimal IEF was carried out at 20 °C, with an active rehydration step of 12 h (50 V); the strips were then focused on an IPGphor IEF unit (Amersham Biosciences) using the following program: 150 V for 2 h, 500 V for 1 h, 1,000 V for 1 h, 1,000-2,000 V for 1 h, and 8,000 V for 6 h. After focusing, IPG strips were equilibrated for 15 min in 10 mL of 50 mM Tris-HCl, pH 8.8, 6 M urea, 30% v/v glycerol, 2% w/v SDS, and traces of bromophenol blue, containing 100 mg of DTT, and further incubated for 25 min in the same buffer with DTT replaced by 300 mg of iodoacetamide. After equilibration, the IPG strips were placed onto 12.5% SDS-polyacrylamide gels and sealed with 0.5% (w/v) agarose. SDS-PAGE was run at 15 mA/gel. The 2D gels were stained using a mass spectrometry-compatible silver staining protocol. Briefly, the gels were fixed in 40% (v/v) ethanol, 10% (v/v) acetic acid overnight, then sensitized with 0.68% (w/v) sodium acetate and 0.05% sodium thiosulfate for 30 min, and washed with deionized water three times for 5 min. The gels were incubated in 0.25% (w/v) silver nitrate for 30 min.
After incubation, the gels were rinsed with deionized water twice for 50 s, followed by addition of the developing solution, which contained 2.5% (w/v) sodium carbonate with 0.04% (v/v) formaldehyde, until the desired intensity was reached. Development was terminated by adding 1.5% (w/v) EDTA. Spots of interest were manually excised from silver-stained 2D-E gels after being destained as described by Gharahdaghi et al. [174]. Then, gel pieces were incubated with 12.5 ng/µL sequencing grade trypsin (Roche Molecular Biochemicals) in 25 mM AMBIC overnight at 4 °C. After digestion, the supernatants (crude extracts) were separated. Peptides were extracted from the gel pieces first into 50% ACN, 1% trifluoroacetic acid and then into 100% ACN. Then, 1 µL of each sample and 0.4 µL of 3 mg/mL α-cyano-4-hydroxycinnamic acid matrix (Sigma) in 50% ACN, 0.01% trifluoroacetic acid were spotted onto a MALDI target. MALDI-TOF MS analyses were performed on a Voyager-DE STR mass spectrometer (PerSeptive Biosystems, Framingham, MA, USA). The following parameters were used: cysteine as S-carbamidomethyl derivative and methionine in oxidized form. Spectra were acquired over the m/z range of 700-4500 Da. Tryptic, monoisotopic peptide mass lists were generated and used for database searching. MS/MS sequencing analyses were carried out using the MALDI-tandem time-of-flight mass spectrometer 4700 Proteomics Analyzer (Applied Biosystems, Framingham, MA). The MS study was performed at the University Complutense de Madrid Proteome Facility platform. The peptide mass fingerprinting data obtained from MALDI-TOF analyses were used to search for protein candidates using the MASCOT software program [10]. The more relevant peptide fragments of the new protein were submitted to BLASTP to show graphically the similarity of the sequence compared to other RNases [175].
The BLAST procedure was carried out against the nonredundant NCBI database, allowing BLAST to search for conserved domains through the CD-search tool [176].

The search for tools to explore molecular diversity that complement or improve classical alignment tools like BLAST with information from gene ontology, RNA secondary structure prediction, partial ordering, or other sources constitutes a goal of major importance [177-180]. In particular, different structural parameters have been used to mine the molecular diversity of peptides. For instance, Jacchieri has investigated structural propensities, co-localization of peptide fragments in protein sequences, interactions between peptide fragments in close structural proximity, and the participation of physical chemical profiles in the distribution of structural motifs among peptide fragments in the Protein Data Bank (PDB) and the SwissProt databases [181]. In this study, we calculated three families of TIs that can be used as inputs for the QSAR study of the molecular diversity of RNase proteins and peptides. We selected TIs instead of other indices because of their fast calculation and the high accuracy they have demonstrated in QSAR studies of molecular diversity [116, 182-185]. This calculation was carried out for two groups of protein sequences, one made up of RNase-like enzymes and the other formed by heterogeneous proteins.

A simple LDA was developed to classify a novel sequence as RNase or not, using the above-mentioned parameters as inputs. The best equation found (Eq. 10) is characterized by the following statistical parameters: the Canonical Regression Coefficient (R), Wilks' statistic (U), the Fisher ratio (F), and the error level (p-level), which has to be <0.05 [186]. In this equation, as well as in the two other QSAR equations (see below), the variables S(TI) = S(ξ), S(π), or S(θ) are the outputs of the models.
These outputs are real-valued scores that quantify the propensity with which a given protein is predicted to be an RNase. This discriminant function gave excellent results both in training and in an external cross-validation series carried out with an external set made up of RNase proteins and diverse non-RNase proteins not used to train the model (see Table 1).

In statistical prediction, the following three cross-validation methods are often used to examine the effectiveness of a predictor in practical application: the independent dataset test, the subsampling test, and the jackknife test [187]. However, as elucidated in [188] and demonstrated in [189], among the three cross-validation methods the jackknife test is deemed the most objective, because it always yields a unique result for a given benchmark data set, and hence it has been increasingly used by investigators to examine the accuracy of various prediction models (see, e.g., [30, 49-52, 190, 191]). In the current study, to reduce computational time as done by many other investigators, we used the independent data set test for cross-validation. Its results are remarkable in comparison to results obtained by other researchers using the LDA method in QSAR studies [192].

In order to compare the previous model with other methodologies based on MM, we developed two additional MARCH-INSIDE models. These models were based on spectral moments and entropy invariants and are given by Eqs. 11 and 12. Both equations perform a statistically significant separation of the two groups of proteins (p < 0.05). The equation based on πk is essentially the same model that was previously reported by our group, but we incorporate it here in order to perform a comparative study [193]. However, the accuracy of these two models is notably lower than that of model 1 (Eq. 10).
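The difference between the independent data set test used here and the jackknife test mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the data are synthetic two-feature points (not the TIs of this study), and a nearest-centroid rule stands in for the LDA classifier; all function names are hypothetical.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def predict(train, x):
    """Nearest-centroid rule (an assumed, simplified stand-in for LDA)."""
    best_label, best_dist = None, float("inf")
    for label in sorted({lab for _, lab in train}):
        c = centroid([feat for feat, lab in train if lab == label])
        dist = sum((a - b) ** 2 for a, b in zip(x, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def independent_test(train, test):
    """Fit on `train` once, report accuracy on a held-out `test` set."""
    hits = sum(predict(train, x) == y for x, y in test)
    return hits / len(test)


def jackknife(data):
    """Leave each case out in turn, refit on the rest, predict the left-out case."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += predict(rest, x) == y
    return hits / len(data)


def make_group(rng, n, mu, label):
    """Synthetic 2-feature cases drawn around a class mean (toy data only)."""
    return [([rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)], label) for _ in range(n)]


rng = random.Random(0)
data = make_group(rng, 30, 0.0, "non-RNase") + make_group(rng, 30, 3.0, "RNase")
rng.shuffle(data)

train, test = data[:45], data[45:]
acc_independent = independent_test(train, test)  # depends on how the split was drawn
acc_jackknife = jackknife(data)                  # unique for a given data set
```

The independent test requires one model fit, whereas the jackknife requires one fit per case, which is the computational cost the text refers to; in exchange, the jackknife result does not depend on an arbitrary train/test split.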
Note that the values of the Canonical Regression coefficients follow the order R model 1 > R model 2 (Eq. 11) > R model 3 (Eq. 12) and, correspondingly, the inverse tendency is observed for the Wilks' statistics of group separation (U model 1 < U model 2 < U model 3). Detailed information on the classification performance of these models is reported in Table 1. From these results, we can expect that models based on different families of indices will differ in prediction accuracy. In this case, we should select the ξ-model represented by Eq. 10 as the better option with respect to the π-model and the θ-model. These results are consistent with those obtained in our previous reports, in which we used 2D pseudo-folding electrostatic parameters as sequence descriptors for function annotation of other classes of proteins [127].

In this section, we present a comparative study of molecular phylogenetic trees, useful for molecular diversity characterization, which are based on Pseudo-folding lattice 2D-TIs versus other trees that use Folding 3D-TI values. We illustrate the comparison with a practical case: the comparison of peptides found in the PMF of a new query protein reported here. Fig. 3 gives an overall view of the 2D-E map obtained from the L. infantum promastigote homogenate. In this figure, we have zoomed in on the lower-left corner to highlight an area with a high density of spots, which apparently corresponds to protein fragments of low MW and low pI. Our interest in this area derived from the fact that these spots remained unchanged across gel repetitions and might correspond to relevant proteins of this parasite. To begin investigating the nature of these proteins, we marked the spot of interest with an arrow and encircled it in the zoomed image of this area (see Fig. 3).
The protein contained in each spot was submitted to in-gel trypsin digestion, and the mass of the resulting PMF, which is an expression of the molecular diversity of the parasite protein, was obtained from MALDI-TOF MS analysis. We have previously studied other proteins in the same region [194]. However, in this study we focus our attention on the protein corresponding to one spot not investigated before. Once we had obtained the data from the MALDI-TOF MS analysis for this spot, the most relevant MS signals were introduced into the MASCOT search engine [195, 196]. In MASCOT, we selected the L. major database of annotated proteins with MS recorded because of its similarity to L. infantum [197]. The MASCOT search of MS signals did not match any template hit with Ms higher than 51 (p < 0.05) (see Table 2). However, we found a relatively high score of Ms = 42 for an RNase I with MASCOT accession code CHR16-22_tmp.17 and molecular weight Mw = 108,096. The next two matches found (Ms = 40 and Ms = 39) correspond to template proteins CHR16-22_tmp.27 and L344.4, with Mw = 30,867 and 52,863, but unknown function. In any case, almost all relatively interesting matches found have also been recorded as unknown function or hypothetical proteins. These aspects make it difficult to assign a sequence and function to the new protein but, at the same time, increase our interest in the PMF of this new query protein, which does not match known templates. As we mentioned in the introduction of this report, the PMF of this type of protein may be of high interest.

In Table 3, we give detailed information on the results of the MS analysis of the PMF of the new protein using the MALDI-TOF technique and the MASCOT search engine. Similar combinations have been successfully used in the past to study Trichinella antigens [173] and possible Leishmania dynein proteins [194]. In this table, we show only the 22 most interesting peptides matching the MS of other proteins in the MASCOT search.
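The screening of MASCOT hits against the significance threshold described above can be sketched as follows. The accession codes, scores, and molecular weights are the ones quoted in the text (the pairing of the second and third Mw values with their accessions follows the order in which they are listed); the class and function names are hypothetical. The conversion from score to probability assumes the usual MASCOT convention that the reported score equals -10·log10(P), where P is the probability that the match is a random event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    accession: str
    score: int  # MASCOT Mowse-type score (Ms)
    mw: int     # molecular weight of the template protein


# Scores above this value are significant at p < 0.05 (value taken from the text).
THRESHOLD = 51

hits = [
    Hit("CHR16-22_tmp.17", 42, 108096),
    Hit("CHR16-22_tmp.27", 40, 30867),
    Hit("L344.4", 39, 52863),
]


def significant(candidates, threshold=THRESHOLD):
    """Return the hits whose score exceeds the significance threshold."""
    return [h for h in candidates if h.score > threshold]


def random_match_probability(score):
    """Invert score = -10*log10(P) (assumed MASCOT convention) to recover P."""
    return 10 ** (-score / 10)


ranked = sorted(hits, key=lambda h: h.score, reverse=True)
for h in ranked:
    print(h.accession, h.score, f"P = {random_match_probability(h.score):.1e}")
print("significant hits:", significant(hits))
```

With the three hits quoted in the text, `significant(hits)` is empty, which is the situation that motivates the alignment-free QSAR analysis of the individual peptides.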
We calculated the three types of pseudo-folding lattice 2D-TIs for these peptides. In Table 4, we summarize the results obtained after the QSAR-based exploration of the molecular diversity of the PMF of the new protein. In this table, we depict the pseudo-folding lattices for some of the peptides with higher Mw. We also predicted the contribution to RNase activity (see the score values in Table 4) using the two best QSAR models reported in this experiment (previous section). Both QSAR models agree very well in the prediction of RNase scores for the new peptides. We found a regression coefficient of R = 0.88 between the RNase score of the QSAR model based on ξk(r2) values and that of the model based on θk(r2) indices. The QSAR study predicted the highest RNase scores for peptides P07, P08, P09, and P14. The first three peptides protein for the design of new RNases, we decided to confirm the predicted scores with a BLAST alignment search. In Table 5, we summarize the result of this search. The BLAST score was adjusted considering that we use here short peptide chains of <20 aa in length and not full protein sequences. We selected this approach because BLAST-like methods, such as PSI-BLAST, and other methods have been used before to confirm and/or complement predictive algorithms [39]. Table 5 shows that both QSAR and BLAST in fact predict a positive RNase score for these peptides. This may be relevant, as we are using alternative methods that complement each other (QSAR is alignment-free, whereas BLAST relies on alignment) [127, 198-201].

It can be noted in Table 4 that in this type of representation some amino acids (aa) overlap on the same nodes, so that the number of aa is higher than the number of nodes in the lattice (see Experiment 1). This aspect, together with the pseudo-folding procedure used to obtain the lattices (not real folding), raises the question of structural accuracy versus computational cost when 2D-TIs are compared with 3D-TIs.
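The node overlap noted above can be made concrete with a minimal sketch of a 2D pseudo-folding walk. The rule used here is an assumption made purely for illustration: each residue moves one step on a square lattice in a direction set by its class (polar, non-polar, acid, basic), and the class assignment and step directions below are hypothetical, not the parameters of the MARCH-INSIDE method. The example sequence is likewise invented.

```python
# Hypothetical mapping from residue class to a unit step on the 2D lattice.
STEP = {"polar": (1, 0), "nonpolar": (-1, 0), "acid": (0, 1), "basic": (0, -1)}

# Hypothetical assignment of the 20 standard amino acids to the four classes.
CLASS = {}
CLASS.update({aa: "acid" for aa in "DE"})
CLASS.update({aa: "basic" for aa in "KRH"})
CLASS.update({aa: "polar" for aa in "STNQCYG"})
CLASS.update({aa: "nonpolar" for aa in "AVLIMFWP"})


def pseudo_fold(seq):
    """Walk the sequence over the lattice; return {node: [residue positions]}."""
    x = y = 0
    nodes = {}
    for i, aa in enumerate(seq):
        dx, dy = STEP[CLASS[aa]]
        x, y = x + dx, y + dy
        # Residues that land on an already visited node share that node.
        nodes.setdefault((x, y), []).append(i)
    return nodes


lattice = pseudo_fold("ALKDSA")  # invented 6-residue example
for node, residues in sorted(lattice.items()):
    print(node, residues)
print(len("ALKDSA"), "residues on", len(lattice), "nodes")
```

For this invented sequence, six residues collapse onto three nodes, because the walk repeatedly revisits the same coordinates. This is why a lattice of this kind is cheap to build but cannot be read as a real folded structure.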
The trade-off between structural accuracy and computational cost is relevant not only to lattice 2D-TIs but to any kind of 2D-TIs [202]. We therefore decided to investigate to what extent the pseudo-folding lattice 2D-TIs are able to capture the information present in the 3D structure. To do so, we first need the 3D structures of the peptides in order to calculate the 3D folding versions of the same type of pseudo-folding TIs. Then, we need to compare the higher dimension πk(r3), θk(r3), and ξk(r3) values with the lower dimension πk(r2), θk(r2), and ξk(r2) indices. For this study, we used the same 19 peptides found on the PMF of the new protein. Unfortunately, we have only the sequences of the peptides and not their 3D structures. Consequently, we first obtained the optimal 3D folded structures of the 19 peptides using an MD search (see Fig. 2).

In Table 6, we summarize the results of the MD simulation of these peptides. In this table, we report the initial energy (E0) and energy gradient (δ0) based on the starting structure constructed with standard parameters for α-helices (bond distances, angles, and dihedral angles) set as default in the sequence editor of Hyperchem [170, 171]. We also report the energy (E1) and energy gradient (δ1) obtained after optimization of the structure with the AMBER force field, using the MC method applied to the MD simulation. Finally, we report in Supplementary material file sm3 the ACCR values for the MDT of the 19 peptides. In MD studies, most researchers aim for an average ACCR value of around 0.5; smaller values may be appropriate when longer runs are acceptable and more extensive sampling is necessary. In the present study, all the ACCR values were lower than 5.0; consequently, we can accept the MD results as valid [170, 171].

Using information about the distribution of amino acids in the sequence of the protein has been the major tendency in molecular phylogenetic analysis [203].
In the introduction, we discussed the importance of new molecular phylogenetic approaches for proteins based on other types of molecular structure information. In materials and methods, we outlined the possibility of constructing a phylogenetic tree for the PMFs of the new protein using TIs based on the folded r3 structure or on pseudo-folded structures in r2. In the previous section, we recalled that the first type of TIs gives a more realistic picture of the protein structure, but the second type is easier to calculate, which is important for scaling the method up to large databases [202]. It is therefore important to compare the different TIs and the phylogenetic trees subsequently generated. To do so, we first calculated the TIk(rd) values for the 19 peptides and then the peptide-peptide distance using Eq. 9. We calculated only the TIk(rd) that have some relevance for RNase activity according to the QSAR Eqs. 10, 11, and 12. That is, we calculated the pseudo-folding indices ξ1(r2), ξ2(r2), π0(r2), π2(r2), and θ0(r2). In Table 6, we report the values of all these TIk(rd) for the 19 peptides.

In Fig. 4, a two-way joining analysis illustrates that the indices calculated at different structural levels have typical values and form structural clusters. In fact, two-way joining analysis can automatically detect the 2D pseudo-folding cluster and the cluster for 3D-folding TIs. This demonstrates that the results of the method vary with the level of detail selected to describe the protein structure. To confirm this, we also calculated the TIs from the 3D-folded structure considering all atoms in the protein, and not only the Cα atoms as many researchers do.

Table 6 Some ξk(r2), θk(r2), and πk(r2) values for 19 peptides found on the PMF of the new protein (columns: Sequence, θ0(r2), π0(r2), π2(r2), ξ1(r2), ξ2(r2))
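The step from TI values to peptide-peptide distances, which is the input of the tree construction described above, can be sketched as follows. Eq. 9 is not reproduced in this section, so a plain Euclidean distance over the TI vector of each peptide is assumed here; the peptide names and TI values are invented placeholders, not data from Table 6.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two TI vectors of equal length (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(ti):
    """Return {(p, q): distance} for every ordered pair of peptides in `ti`."""
    names = sorted(ti)
    return {(p, q): euclidean(ti[p], ti[q]) for p in names for q in names}


def closest_pair(ti):
    """Pair of distinct peptides with the smallest distance (first tree merge)."""
    d = distance_matrix(ti)
    pairs = [(p, q) for (p, q) in d if p < q]
    return min(pairs, key=lambda pq: d[pq])


# Invented placeholder TI vectors for three hypothetical peptides.
ti_values = {
    "pepA": [0.0, 0.0],
    "pepB": [3.0, 4.0],
    "pepC": [0.5, 0.0],
}

d = distance_matrix(ti_values)
print(d[("pepA", "pepB")])      # distance between pepA and pepB
print(closest_pair(ti_values))  # the two peptides that would be joined first
```

Running the same function once with r2-based vectors and once with r3-based vectors gives two distance matrices whose trees can then be compared, which is the comparison carried out in this section.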
The results of this all-atom calculation show that a certain hierarchy can be detected in the cluster organization of the indices (see Fig. 4). However, the cluster analysis (see Table 6) makes it clear that, even though the three classes of indices have different values and form different clusters, the overall variability of all the indices is very similar within each peptide and to some extent peptide specific. This means that peptide-to-peptide variations are more notable than structural level variations. In fact, the results of the phylogenetic tree analysis demonstrated relatively larger variations in the alternative clustering of the 19 peptides than in the alternative clustering of TIs using r2, r3 for Cα only, or all-atom r3 TIs. In Table 7, we depict the final results obtained for the phylogenetic tree analysis of either peptides or TIs. These results show that, in principle, the distance TIDpq(rd) between a peptide p and another peptide q based on TIk(r2) is structurally sensitive and encodes sufficient structural information with respect to the more detailed structural level. Indeed, inspection of a simple correlation matrix demonstrated that all the TIs calculated have correlations that are significant at p < 0.05, except for πk(r3) based on all atoms, which seems to be the most structurally sensitive TI calculated in this study. We can conclude that pseudo-folding TIk(r2) phylogenetic algorithms may become a fast and efficient alternative to TIk(r3) methods, as well as a more structurally detailed complement to traditional sequence-only methods.

Fig. 4 Two-way joining study of folding TIs for different structural levels

In this study, we demonstrate that it is possible to develop and compare alignment-free QSAR models using sequence pseudo-folding TIs (based on Markov matrices). In addition, we compared these indices with similar indices based on 3D structures obtained by MD simulation.
We also show, with a practical example, the use of these QSAR and Molecular Phylogenetic models to predict RNase activity and explore the molecular diversity of peptides found on the PMFs of the new query protein isolated here for the first time from L. infantum.

The RNase a superfamily: generation of diversity and innate host defense Targeted therapeutic RNases (ImmunoRNases) The nuclear RNase III Drosha initiates microRNA processing Design of shRNAs for RNAi-a lesson from pre-miRNA processing: possible clinical applications Purification and some properties of an extracellular ribonuclease with antiviral activity against tobacco mosaic virus from Bacillus cereus An iterative calibration method with prediction of post-translational modifications for the construction of a two-dimensional electrophoresis database of mouse mammary gland proteins Analysis of the cytosolic proteome of Halobacterium salinarum and its implication for genome annotation A novel fingerprint map for detecting SARS-CoV A new nucleotide-composition based fingerprint of SARS-CoV with visualization analysis Probability-based protein identification by searching sequence databases using mass spectrometry data Improving reproducibility and sensitivity in identifying human proteins by shotgun proteomics Proteomics-grade de novo sequencing approach New database-independent, sequence tag-based scoring of peptide MS/MS data validates Mowse scores, recovers below threshold data, singles out modified peptides, and assesses the quality of MS/MS techniques Automated prediction of protein attributes and its impact to biomedicine and drug discovery Structural bioinformatics and its impact to biomedical science Molecular therapeutic target for type-2 diabetes Binding mechanism of coronavirus main proteinase with ligands and its implication to drug design against SARS Computational approach to drug design for oxazolidinones as antibacterial agents Molecular modeling of two CYP2C19 SNPs and its implications for
personalized drug design Energetic approach to packing of a-helices: 2. General treatment of nonequivalent and nonregular helices Energetics of the structure of the four-alpha-helix bundle in proteins Virtual screening for SARS-CoV protease based on KZ7088 pharmacophore points Progress in computational approach to drug development against SARS Energy-optimized structure of antifreeze protein and its binding mechanism Role of the protein outside active site on the diffusion-controlled reaction of enzyme MemType-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM EzyPred: a top-down approach for predicting enzyme functional classes and subclasses Prediction of G-protein-coupled receptor classes GPCR-CA: a cellular automaton image approach for predicting G-protein-coupled receptor functional classes ProtIdent: a web server for identifying proteases and their types by fusing functional domain and sequential evolution information Identification of proteases and their types A vectorized sequence-coupling model for predicting HIV protease cleavage sites in proteins Prediction of human immunodeficiency virus protease cleavage sites in proteins HIVcleave: a web-server for predicting HIV protease cleavage sites in proteins Signal-CF: a subsite-coupled and window-fusing approach for predicting signal peptides Signal-3L: a 3-layer approach for predicting signal peptides Molecular evolution of toxin genes in Elapidae snakes Cellulose membrane supported peptide arrays for deciphering protein-protein interaction sites: the case of PIN, a protein with multiple natural partners Prediction of cis/trans isomerization in proteins using PSI-BLAST profiles and secondary structure information Fungal BLAST and model organism BLASTP best hits: new comparison resources at the Saccharomyces Genome Database (SGD) Recent progresses in the application of machine learning approach for predicting protein functional class 
independent of sequence similarity Prediction of the functional class of lipid binding proteins from sequence-derived properties irrespective of sequence similarity Prediction of transporter family from protein sequence by support vector machine approach Predicting functional family of novel enzymes irrespective of sequence similarity: a statistical learning approach Prediction of functional class of novel viral proteins by a statistical learning method irrespective of sequence similarity Use of alignment-free molecular descriptors in diversity analysis and optimal sampling of molecular libraries Prediction of protein cellular attributes using pseudo-amino acid composition Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes Large-scale plant protein subcellular location prediction Predicting the cofactors of oxidoreductases based on amino acid composition distribution and Chou's amphiphilic pseudo amino acid composition The modified Mahalanobis discriminant for predicting outer membrane proteins by using Chou's pseudo amino acid composition Using the concept of Chou's pseudo amino acid composition to predict apoptosis proteins subcellular location: an approach by approximate entropy Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization Using Chou's amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes Predicting lipase types by improved Chou's pseudo-amino acid composition Predicting subcellular localization of mycobacterial proteins by using Chou's pseudo amino acid composition Using Chou's pseudo amino acid composition to predict subcellular localization of apoptosis proteins: an approach with immune genetic algorithm-based ensemble classifier Graph theory of enzyme kinetics: 1. 
Steady-state reaction system Graphical rules for enzyme-catalysed rate laws Two new schematic rules for rate laws of enzyme-catalysed reactions An extension of Chou's graphical rules for deriving enzyme kinetic equations to system involving parallel reaction pathways Microcomputer tools for steady-state enzyme kinetics Kinetic plasticity and the determination of product ratios for kinetic schemes leading to multiple products without rate laws: new methods based on directed graphs Graphic rules in steady and non-steady state enzyme kinetics Applications of graph theory to enzyme kinetics and protein folding kinetics. Steady and non-steady-state systems Steady-state kinetic studies with the non-nucleoside HIV-1 reverse transcriptase inhibitor U-87201E The quinoline U-78036 is a potent inhibitor of HIV-1 reverse transcriptase Steady-state inhibition kinetics of processive nucleic acid polymerases and nucleases Diagrammatization of codon usage in 339 human immunodeficiency virus proteins and its biological implication A graphic approach to analyzing codon usage in 1562 Escherichia coli protein coding sequences Do "antisense proteins" exist? 3D-QSAR study for DNA cleavage proteins with a potential anti-tumor ATCUN-like motif Unified QSAR approach to antimicrobials. 
Part 3: first-tasking QSAR model for input-coded prediction, structural back-projection, and complex networks clustering of antiprotozoal compounds ANN-QSAR model for selection of anticancer leads from structurally heterogeneous series of compounds Medicinal chemistry and bioinformatics-current trends in drugs discovery with networks topological indices Cellular automation as models of complexity A new kind of science A probability cellular automaton model for hepatitis B viral infections An application of gene comparative image for predicting the effect on replication ratio by HBV virus gene missense mutation Using cellular automata to generate image representation for biological sequences Using cellular automata images and pseudo amino acid composition to predict protein subcellular location Digital coding of amino acids based on hydrophobic index Graphical approach to analyzing DNA sequences Analysis of similarity/dissimilarity of DNA sequences based on nonoverlapping triplets of nucleotide bases New 2D graphical representation of DNA sequences Coronavirus phylogeny based on 2D graphical representation of DNA sequence A 2D graphical representation of RNA secondary structures and the analysis of similarity/dissimilarity based on it A 3D Graphical representation of RNA secondary structure On a six-dimensional representation of RNA secondary structures On a seven-dimensional representation of RNA secondary structures RNA secondary structure 2D graphical representation without degeneracy A condensed 3D graphical representation of RNA secondary structures On the similarity of DNA primary sequences Novel 2D maps and coupling numbers for protein sequences. 
The first QSAR study of polygalacturonases: isolation and prediction of a novel sequence from Psidium guajava L On 3-D graphical representation of DNA primary sequences and their numerical characterization Two-dimensional graphical representation of DNA sequences and intron-exon discrimination in intron-rich sequences Medicinal chemistry and bioinformatics: current trends in drugs discovery with networks topological indices Prediction of protein structural classes using hybrid properties Multiple field three dimensional quantitative structure-activity relationship (MF-3D-QSAR) QSAR by LFER model of HIV protease inhibitor mannitol derivatives using FA-MLR, PCRA, and PLS techniques QSAR analyses of 3-(4-benzylpiperidin-1-yl)-N-phenylpropylamine derivatives as potent CCR5 antagonists QSAR of adenosine A3 receptor antagonist 1,2,4-triazolo[4,3-a]quinoxalin-1-one derivatives using chemometric tools Exploring 3D-QSAR of thiazole and thiadiazole derivatives as potent and selective human adenosine A3 receptor antagonists + Topological descriptors in drug design and modeling studies Predictive QSAR modeling of CCR5 antagonist piperidine derivatives using chemometric tools Heuristic molecular lipophilicity potential (HMLP): a 2D-QSAR study to LADH of molecular family pyrazole and derivatives Semiempirical QSAR study and ligand receptor interaction of estrogens Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection Comparison of electrotopological-state indices versus atomic charge and superdelocalisability indices in a QSAR study of the receptor binding properties of halogenated estradiol derivatives Comparison of binary and 2D QSAR analyses using inhibitors of human carbonic anhydrase II as a test case Creating molecular diversity from antioxidants in Brazilian propolis. 
Combination of TOPS-MODE QSAR and virtual structure generation Artificial neural networks: non-linear QSAR studies of HEPT derivatives as HIV-1 reverse transcriptase inhibitors Virtual generation of agents against Mycobacterium tuberculosis: a QSAR study QSAR study using topological indices for inhibition of carbonic anhydrase II by sulfanilamides and Schiff bases QSAR study on phosphoramidothioate (Ace) toxicities in housefly A novel simple QSAR model for the prediction of anti-HIV activity using multiple linear regression analysis Fragment-based quantitative structure-activity relationship (FB-QSAR) for fragment-based drug design Implications from a network-based topological analysis of ubiquitin unfolding simulations Proteins as networks: usefulness of graph theory in protein science Network scaling invariants help to elucidate basic topological principles of proteins Indeterminacy of reverse engineering of Gene Regulatory Networks: the curse of gene elasticity Essentiality is an emergent property of metabolic network wiring Metabolic pathways variability and sequence/networks comparisons Charge and hydrophobicity patterning along the sequence predicts the folding mechanism and aggregation of proteins: a computational approach Novel 2D maps and coupling numbers for protein sequences. The first QSAR study of polygalacturonases: isolation and prediction of a novel sequence from Psidium guajava L Predicting antimicrobial drugs and targets with the MARCH-INSIDE approach The biological functions of low-frequency phonons The biological functions of low-frequency phonons. 2. Cooperative effects Low-frequency vibrations of helical structures in protein molecules Identification of low-frequency modes in protein molecules Biological functions of low-frequency vibrations (phonons). III. Helical structures and microenvironment The biological functions of low-frequency vibrations (phonons). 4. 
Resonance effects and allosteric transition Low-frequency vibrations of DNA molecules Low-frequency motions in protein molecules. Beta-sheet and beta-barrel The biological functions of low-frequency vibrations (phonons). VI. A possible dynamic mechanism of allosteric transition in antibody molecules Collective motion in DNA and its role in drug intercalation Low-frequency resonance and cooperativity of hemoglobin Quasi-continuum models of twist-like and accordion-like low-frequency motions in DNA Biophysical aspects of neutron scattering from vibrational modes of proteins Solitary wave dynamics as a mechanism for explaining the internal motion during microtubule growth Soliton/exciton transport in proteins Low-frequency collective motion in biomacromolecules and its biological functions Solution structure of Ca 2+ -calmodulin reveals flexible hand-like properties of its domains Designed electromagnetic pulsed therapy: clinical applications Extrinsic electromagnetic fields, low frequency (phonon) vibrations, and control of cell function: a non-linear resonance system Dynamics of folded proteins Molecular dynamics simulations of biomolecules Internal motions of antibody molecules Solution NMR structure of a D, L-alternating oligonorleucine as a model of betahelix Conformational and structural analysis of the equilibrium between single-and doublestrand beta-helix of a D, L-alternating oligonorleucine Solution structure of a D, L-alternating oligonorleucine as a model of double-stranded antiparallel beta-helix Detection of secondary structure elements in proteins by hydrophobic cluster analysis 2-D graphical representation of proteins based on virtual genetic code On representation of proteins by star-like graphs Quantitative characterizations of proteome: dependence on the number of proteins considered Algorithm for coding DNA sequences into "spectrum-like" and "zigzag" representations Canonical labeling of proteome maps Order from chaos: observing hormesis at the 
proteome level On invariants of a 2-D proteome map derived from neighborhood graphs On characterization of dose variations of 2-D proteomics maps by matrix invariants A 4D representation of DNA sequences and its application A 2D graphical representation of DNA sequence Support vector machine approach for protein subcellular localization prediction Prediction of protein signal sequences A QSAR model for in silico screening of MAO-A inhibitors. Prediction, synthesis, and biological assay of novel coumarins 2D-RNA-coupling numbers: a new computational chemistry approach to link secondary structure topology with biological function Standardized multiple regression model HyperChem: a software package for computational chemistry and molecular modeling Exploratory studies of ab initio protein structure prediction: multiple copy simulated annealing, AMBER energy functions, and a generalized born/solvent accessibility solvation model Two-dimensional electrophoresis and mass spectrometry for the identification of species-specific Trichinella antigens Mass spectrometric identification of proteins from silver-stained polyacrylamide gel: a method for the removal of silver ions to enhance sensitivity Gapped BLAST and PSI-BLAST: a new generation of protein database search programs CD-Search: protein domain annotations on the fly Automated methods of predicting the function of biological sequences using GO and BLAST OntoBlast function: from sequence similarities directly to potential functional annotations by ontology terms Structure-dependent sequence alignment for remotely related proteins Multiple sequence alignment using partial order graphs Mining combinatorial data in protein sequences and structures On an aspect of calculated molecular descriptors in QSAR studies of quinolone antibacterials A topological substructural molecular design to predict soil sorption coefficients for pesticides Valence topological charge-transfer indices for dipole moments Chemometric methods in 
molecular design Prediction of protein structural classes Cell-PLoc: a package of web-servers for predicting subcellular localization of proteins in various organisms Recent progress in protein subcellular location prediction Euk-mPLoc: a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites Predicting protein subcellular location using Chou's pseudo amino acid composition and improved hybrid approach Protein linear indices of the 'macromolecular pseudograph alpha-carbon atom adjacency matrix' in bioinformatics. Part 1: prediction of protein stability effects of a complete set of alanine substitutions in Arc repressor MMM-QSAR recognition of ribonucleases without alignment: comparison with HMM model and isolation from Schizosaccharomyces pombe, prediction, and experimental assay of a new sequence HP-Lattice QSAR for dynein proteins: experimental proteomics (2D-electrophoresis, mass spectrometry) and theoretic study of a Leishmania infantum sequence A two-dimensional electrophoresis proteomic reference map and systematic identification of 1367 proteins from a cell suspension culture of the model legume Medicago truncatula Genomebased peptide fingerprint scanning Application of machine learning to structural molecular biology Recent progresses in the application of machine learning approach for predicting protein functional class independent of sequence similarity 2D RNA-QSAR: assigning ACC oxidase family membership with stochastic molecular descriptors; isolation and prediction of a sequence from Psidium guajava L Comparative study of topological indices of macro/supramolecular RNA complex networks Computational chemistry comparison of stable/nonstable protein mutants classification models based on 3D and topological indices Molecular phylogenetics of the Pectinidae (Mollusca: Bivalvia) and effect of increased taxon sampling and outgroup selection on tree topology