key: cord-312757-58p5b2vw authors: Pérez-Montoto, Lázaro G.; Santana, Lourdes; González-Díaz, Humberto title: Scoring function for DNA–drug docking of anticancer and antiparasitic compounds based on spectral moments of 2D lattice graphs for molecular dynamics trajectories date: 2009-11-30 journal: European Journal of Medicinal Chemistry DOI: 10.1016/j.ejmech.2009.06.011 sha: doc_id: 312757 cord_uid: 58p5b2vw Abstract We introduce here a new class of invariants for MD trajectories based on the spectral moments πk (L) of the Markov matrix associated to lattice network-like (LN) graph representations of Molecular Dynamics (MD) trajectories. The procedure embeds the MD energy profiles on a 2D Cartesian coordinates system using simple heuristic rules. At the same time, we associate the LN with a Markov matrix that describes the probabilities of passing from one state to other in the new 2D space. We construct this type of LNs for 422 MD trajectories obtained in DNA–drug docking experiments of 57 furocoumarins. The combined use of psoralens+ultraviolet light (UVA) radiation is known as PUVA therapy. PUVA is effective in the treatment of skin diseases such as psoriasis and mycosis fungoides. PUVA is also useful to treat human platelet (PTL) concentrates in order to eliminate Leishmania spp. and Trypanosoma cruzi. Both are parasites that cause Leishmaniosis (a dangerous skin and visceral disease) and Chagas disease, respectively; and may circulate in blood products collected from infected donors. We included in this study both lineal (psoralens) and angular (angelicins) furocoumarins. In the study, we grouped the LNs on two sets; set1: DNA–drug complex MD trajectories for active compounds and set2: MD trajectories of non-active compounds or no-optimal MD trajectories of active compounds. We calculated the respective πk (L) values for all these LNs and used them as inputs to train a new classifier that discriminate set1 from set2 cases. In training series the model correctly classifies 79 out of 80 (specificity=98.75%) set1 and 226 out of 238 (Sensitivity=94.96%) set2 trajectories. In independent validation series the model correctly classifies 26 out of 26 (specificity=100%) set1 and 75 out of 78 (sensitivity=96.15%) set2 trajectories. We propose this new model as a scoring function to guide DNA-docking studies in the drug design of new coumarins for anticancer or antiparasitic PUVA therapy. Quantitative Structure-Activity Relationship (QSAR) studies unravel structural and physicochemical requirements for biological activity in a great variety of compounds [1] . The classic QSAR studies connect information of the chemical structure of the molecule, expressed by means of numbers, with the biological activity [2] . However, QSAR-like procedures are not restricted to drugs and biological activity but other systems and properties, such as allergenic character of proteins, may be predicted [3] [4] [5] [6] . One special class of indices used in QSAR called Topological Indices (TIs) are based on the concept of molecular graph; which indicates the presence of vertices or nodes (atoms) and connections or edges between nodes (chemical bonds) [7] [8] [9] [10] . In the same way, the field of application of TIs is, of course, not restricted to the chemistry of low-molecular-weight compounds and extends to other branches of sciences. In general, TIs of different types of graph representations or networks such as protein structure, gene polymorphisms, metabolic networks, food webs or host-parasite networks, internet, or social networks may be used. In these networks, amino acids, nucleotides, enzymes, microorganisms, cerebral cortex regions, web pages, social groups.etc, may play the role of nodes and electrostatic interactions, mutations, metabolic reactions, host-parasite relationships, brain region co-activations, links, disease propagations.etc may play the role of edges [11] [12] [13] [14] [15] [16] [17] . For instance, the pseudo amino acid (PseAA) composition or PseAAC method for calculation of protein TIs was originally introduced by Chou to improve the prediction quality for protein subcellular localization and membrane protein type [18] , as well as for enzyme functional class [19] . The PseAA composition can be used to represent a protein sequence with a discrete model yet without completely losing its sequence-order information. Ever since the concept of Chou's pseudo amino acid composition was introduced, various PseAAC approaches have been stimulated for enhancing the prediction quality of various protein features [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] . Owing to its wide usage, recently a very flexible pseudo amino acid composition generator, called ''PseAAC'' [30] , was established at the website http://chou.med.harvard.edu/bioinf/PseAAC/, by which users can generate 63 different kinds of PseAA composition, including the dipeptide components. The publication in the area has steadily increased and, consequently, in the last years have appeared in-depth reviews that could be useful for the readers of the present manuscript [11, [31] [32] [33] [34] [35] [36] [37] [38] . Many authors prefer to use the term Quantitative Structure-Binding affinity Relationship (QSBR) when one uses QSAR-like procedures to predict drug-target binding affinity and 3D structural information [39] . Anyhow, the term QSBR has to be used carefully to avoid confusion with Quantitative Structure-Biodegradability Relationships analysis [40, 41] . In this work, we use QSBR in the first sense. In any case, both approaches QSAR and QSBR diverge in some degree in the type of measure (activity or binding) and sometimes in how detailed manner we need to know the chemical structure (2D or 3D) but both use essentially the same algorithm. In addition, predicting drug activity we can use 3D drugtarget QSAR/QSBR models as scoring function to guide the search of optimal drug-target interaction geometries in drug-target docking studies [42] [43] [44] . Almost all QSAR/QSBR or other types of docking scoring functions are aimed to predict protein-drug interactions. For instance, Wang et al. [45] reported a comparative study of eleven whereas Ferrara et al. [46] studied nine different docking scoring functions all for protein-drug interactions. Conversely, DNA-drug and RNA-drug dockings are generally less investigated. In particular, we did not find a QSBR scoring function for DNA-Furocoumarin docking. The furocoumarins are a class of natural or synthetic compounds with very interesting pharmacological properties [47] , commonly used in the treatment of skin diseases such as psoriasis and mycosis fungoides [48] . This treatment called PUVA consists in a therapy that combines the use of both chemicals and long-wave ultraviolet light (UVA) [49] . The molecular basis of PUVA is connected with the highly specific photo-damage in DNA of epidermal cells. This damage interferes with the DNA replication, producing an inhibition of DNA synthesis which reduces or blocks the cell duplication [50] . Although the lineal furocoumarins (psoralens) are able to form the three adduct types, the geometry of the angular ones (angelicins) only allows them to form mono-adducts with the DNA. It is well known that the side effects observed in PUVA therapy, such as skin photo-toxicity and risk of skin cancer are strictly connected with the bi-functional lesions in DNA [51] . Recently, Eastman et al. demonstrated the effectiveness of PUVA treatment with the psoralen analogue called amotosalen to inactivate the parasite Leishmania spp. in human platelet (PLT) concentrates intended for transfusion. Leishmania spp. are protozoans that cause skin and visceral diseases. Leishmania spp. are obligate intracellular parasites of mononuclear phagocytes and have been documented to be transmitted by blood transfusion. Both metacylic promastigotes and amastigotes were extremely susceptible to photochemical inactivation by PUVA. Promastigotes represent the infectious from the sand fly vector and amastigotes are the form that grow in mononuclear phagocytes. Thus, the PUVA of PLT concentrates inactivates both forms of Leishmania that would be expected to circulate in blood products collected from infected donors [52] . In addition, Gottlieb et al. reported the inactivation of the blood-borne parasite Trypanosoma cruzi (T. cruzi) by PUVA with 4 0 -aminomethyl-4,5 0 ,8-trimethylpsoralen (AMT) in PLT concentrates [53] . The infectivity of the parasite is eliminated at 4.2 J/cm 2 . The trypomastigote motility continues for at least 16 h-post-treatment and is inhibited only after much higher light doses. Isolation of total DNA from the parasite cells after treatment in the presence of 3H-AMT indicated that at the lethal UVA influence about 0.5 AMT adducts per kilobase pairs occurred. These results suggest that this PUVA methodology may eliminate blood-borne T. cruzi, the causative agent of Chagas disease. More recently, Castro and Girones demonstrated that the pathogen reduction system based on PUVA with amotosalen presents a robust efficacy for inactivation of high doses of three different strains of T. cruzi and offers the potential to make the PLT supply safer [54] . The biological activity of these compounds is normally studied by evaluating their capacity of forming an intercalated complex with DNA and their ability of photo-binding through mono or bi-functional addition to the same macromolecule [55] . A traditional procedure to determine the photo-biological and antiproliferative activity of furocoumarins measures ID 50 , the UVA dose that reduces to 50% of the DNA synthesis in Ehrlich Ascites tumor cells (EATC) in presence of tested compound at certain concentration (18-20 mM). The protocols used in the activity determination are heterogeneous, however the use of the 8-MOP as reference to express the activity is very common [56] [57] [58] . These facts point to the stability of DNA-drug complex as a central factor in the activity of anticancer drugs in general including furocoumarins. At the light of these facts Molecular Dynamics (MD) of the DNA-drug complexes is central for drug design towards PUVA therapy. Since the advent of bioscience with the studies of Karplus et al. MD has become the by foremost wellestablished, computational technique to investigate structure and function of bio-molecules and their respective complexes and interactions [59] [60] [61] . In addition, after a pioneer paper entitled 'The Biological Functions of Low-Frequency Phonons' [62] published in 1977, a series of investigations into biopolymers from MD point of view have been stimulated. These studies have suggested that lowfrequency (or terahertz frequency) collective motions do exist in proteins and DNA that hold a very high potential to reveal the profound dynamic mechanisms of many marvelous biological functions in biological systems (see, e.g., [63] [64] [65] [66] [67] [68] [69] [70] [71] [72] [73] [74] [75] [76] and a comprehensive review [77] ). This kind of inferences has been later observed by NMR [78] , and been further used for medical treatments [79, 80] . In view of this, to understand really the interaction mechanism of drugs with proteins or DNA, we should consider not only the static structures concerned but also the dynamical information obtained by simulating their interactions through a dynamic process. In this sense, it is of high relevance taking into account that the previous non-covalent binding (in dark) between drug and DNA has a strong influence on the subsequent photoreaction and therefore on their biological activity [81] . Consequently, MD studies of the DNA-drug complexes in furocoumarins and anticancer drugs in general are of the major relevance. In this sense, it would be very interesting to work with invariants that encode information about the MD trajectories for the intercalation complexes of furocoumarins and anticancer drugs in general. The analysis of the MD trajectories resulting from the integration of the equations of motions in MD remains, however, the greatest challenge and requires a great deal of insight, experience, and effort. In a recent and very important work, Hamacher [82] proposed a new, theoretically sound, and versatile analysis procedure that provides scientists with a semi-quantitative invariant measure to compare various scenarios of their respective simulations. However, using graphic approaches to study biological systems can provide useful insights, as indicated by many previous studies on a series of important biological topics. Graphs have been used to study enzyme-catalyzed reactions [83] [84] [85] [86] [87] [88] [89] [90] [91] [92] , protein folding kinetics [36] , inhibition kinetics of processive nucleic acid polymerases and nucleases [93] [94] [95] [96] [97] [98] [99] , codon usage [100] [101] [102] , base frequencies in the anti-sense strands [103] , and analysis of DNA sequence [104] . Moreover, graphical methods have been introduced for QSAR study [2, 11, 105, 106] as well as utilized to deal with complicated network systems [11, 38, 107] . Recently, the ''cellular automaton image'' [108, 109] has also been applied to study hepatitis B viral infections [110] , HBV virus gene miss-sense mutation [111] , and visual analysis of SARS-CoV [112, 113] , as well as representing complicated biological sequences [114] and helping to identify protein attributes [115, 116] . Several authors have used pseudo-folding lattice Hydrophobicity-Polarity (HP) models to simulate protein folding making simulations to optimize the lattice structure and resemble real folding [117] [118] [119] [120] [121] [122] [123] [124] . However, we can choose notably simpler pseudo-folding rules to avoid optimization procedures and speed up notably the construction of the lattice. In this sense, useful graph representations of DNA, RNA and/or protein sequences have been introduced by Nandy et al. [125] [126] [127] [128] [129] [130] [131] , Gates [132] , Leong and Morgenthaler [133] , Randic et al. [134] based on 2D coordinate systems. We call these graph representations as sequence pseudofolding lattice networks (LNs) because they look like lattice structures and in fact, we are forcing a sequence to fold in a way that not necessarily occurs in nature. In general, these LNs (as for other graphs) can be numerically characterized with TIs. These TIs describe the distribution of amino acids or nucleotides along the sequence but also encode higher order information. Thus lattice pseudo-folding TIs can be used in protein QSAR. Our group, have used different MARCH-INSIDE TIs of pseudo-folding lattice-like networks to predict diverse protein or DNA/RNA functions. For instance, we have used stochastic pseudo-folding spectral moments to predict ribonucleases [135] and dyneins [136] . In other works, we used Markov chain pseudo-folding electrostatic potentials to predict polygalacturonases [137] or human colon and breast cancer biomarkers [138] . All these MARCH-INSIDE pseudo-folding TIs can be calculated when we sum the respective indices for each node of the graph. All the above-mentioned values were used recently to predict mycobacterium promoters and compare entropies, spectral moments, and pseudo-folding electrostatic potentials [139] . The readers may see three recent reviews discussing the applications ranging from graph of small molecules to graph or network representation of protein sequences and 3D structure, DNA sequences, RNA secondary structure, or human blood proteome mass spectroscopy outcomes [11, 38, 140] . In any case, if we understand sequence as a type of input data we need not limit the applications of the pseudo-folding lattice network method to proteins, DNA or RNA sequences. Elaborating this line of thinking, we have proposed pseudo-folding lattice network representations of mass spectroscopy outcomes typical of blood proteome samples containing many proteins. For instance we have constructed lattice network representations for mass spectroscopy results obtained from blood proteome samples typical of drugs causing cardiotoxicity [141] . After calculation of the sum of the TIs of each sample we used them to seek a new type of classifier. The model connects TI values of the mass spectra of the blood proteome with the probability of appearance of drug cardiotoxicity. This new type of model was called Quantitative Proteome-Property Relationships (QPPR) in analogy to QSAR or QSPR [142] . We have used these lattice network TIs also to predict human prostate cancer [143] . The success of this strategy encouraged us to consider other classes of sequence data and solve different problems. For instance, the MD trajectories referred in previous paragraphs are time series obtained from simulation runs that constitute another type of sequential data. In any case, even if spectral moments of different graphs have been successfully used in QSAR before [144] [145] [146] [147] we can see that spectral moments of an LN for MD trajectories have not been explored. Consequently, we decided to study here these indices to describe MD trajectories. In the present paper, we introduce an LN representation for the study of MD trajectories. We also obtain quantitative models able to differentiate furocoumarin derivatives according to their antiproliferative activity and the stability of the DNA-drug complex. The new model is also QSBR that has potential applications as scoring function for DNA-furocoumarin docking studies. For our study we used the decanucleotide of sequence d(CCGCTAGCGG) and the software application HyperChem [148] , a fragment of DNA with double helix in B form and sugars in 2 0 -endo form. This decanucleotide sequence has been used in different studies concerning psoralens intercalation [149] . The structure of all the compounds selected for DNA-drug interaction studies were optimized using the interactive model building package of Hyper-Chem [148] . The optimization of their geometries was carried out by the semi-empirical quantum mechanics calculations with method PM3 [150] using the Polak-Ribiere algorithm and the options implemented by default in the mentioned package. Thus, the minimized molecular structures were intercalated by hand approach in the DNA fragment, using the HyperChem package and taking into account the following experimentally demonstrated statements: 1. In the dark, the poly[dA-dT] poly[dA-dT] sequence in DNA is the most favorable site for intercalation since the further photoreaction takes place mainly on the 5,6 double bond of the thymine [151] . So, the optimized molecules were inserted among the thymine units in a parallel plane to the bases and, according to our decision, in a halfway position ( Fig. 1. Left ). 2. The furocoumarins have two reactive sites, but after photoreaction, different types of cycloadducts can be formed: mono (furan-side or pyrone-side) and di-adducts (the cross-link) [152] . Although psoralens are able to form all the cycloadduct types, angelicins form only mono-adducts owing to their angular molecular structure. Keeping this in mind, for each lineal molecule we modeled only one starting conformation, for which the cycloadduct formation by either one or other reactive site (furan or pyrone-side) is equally feasible from a geometric point of view. For each angular molecule, we decided to model two starting conformations, one for each mono-adduct formation (for the furan-side that we named as j-conformation and for the pyrone-side that we named as c-conformation). 3. The stereochemistry of the furocoumarin adducts is cis-syn [153, 154] . Consequently, the molecules were oriented in such a way that the intercalation complex favors mainly the formation of cycloadducts with this stereochemistry. In the case of the furan-side, the stereochemistry syn means that the furan O 1 0 and the pyrimidine N 1 are going to be on the adjacent corners of the future cyclo-butane ring. For the pyrone-side, the stereochemistry syn is defined as having the carbonyl-carbon of the pyrone ring and the N 1 of the pyrimidine on the adjacent corners of the future cyclo-butane ring (Fig. 1, right) . On the other hand, some of the studied angular molecules present ramifications in the C3 carbon that hindered us to model appropriately their j-conformation, due to steric problems with the thymine ring. We also found steric impediments in the backbone of the DNA when these ramifications are much bigger. In all these cases, we decided to model several alternative starting conformations for which the steric effects were eliminated. For the majority of the cases we just varied the insertion degree of molecule in the DNA; in the most critical cases we also had to rotate the molecule clockwise, see Figs. 1 and 2 and Table 1 for details. Both, the displacement outwards DNA and the molecule rotation were carried out in the halfway and parallel plane to the nitrogen bases. In this sense, the geometric criterion used was the relative distance (in the plane projection) between the geometric centers of the double bonds (j or c bond for furocoumarins and 5,6 bond for the thymine) that will take part in the photo-addition and the relative angle between them. In both Table 1 and Fig. 2 , the variations of these geometric parameters used to model the j-conformations are represented in a simplified way. Taking these aspects into consideration the notation of an MD trajectory is given here as: m-[Bond/Dist./Ang.], where, m is the number of the compound in Table 2 or Table 3 , Bond ¼ j, c, or j and c are the chemical bonds susceptible to photo-addition in this position; whereas Dist. and Ang. are the distance and angular intercalation parameters, respectively (see Table 1 ). The DNA-drug docking molecular dynamics trajectories or energetic profiles of all the starting intercalation complexes were obtained by means of the Monte Carlo [155] method, using the HyperChem package. In this sense, the force field AMBER94 of molecular mechanics was used with distant-dependent dielectric constant (scale factor 1), electrostatic and Van der Waals values by default and cutoffs shifted with outer radius of 14 Å. All the components of the force field were included and the atom type was recalculated keeping their current charges. Finally, the simulation was executed in vacuo at 300 K and 100 optimization steps obtaining MD trajectories with 100 potential energy d E j (j ¼ 1, 2, 3, .100) values each. We obtained 21 MD trajectories for psoralens and 154 MD trajectories for the 36 different angelicins. We also analyzed 36 averaged MD trajectories for each angelicin taking the Secondly, the method uses the matrix 1 P, which is a squared matrix to characterize the MD profile embedded into the latticelike graph. Please, note that the number of nodes (n) in the graph may be equal or even smaller than the number of steps given to obtain the MD profile. The same happens for amino acids or DNA bases in the polymeric chain. Accordingly, the matrix 1 P contains the probabilities 1 p ij to reach a node n i moving throughout a walk of length k ¼ 1 from other node n j [139, 141] : where, d E j is the sum of all energy values of the steps d E s that overlap on the same node j. The parameter a ij equals to 1 if the nodes n i and n j are adjacent in the graph and equals to 0 otherwise. The value D 0j gives the geometric location of the node and represents the Euclidean distance between the node and the center of coordinates. In Fig. 1 (bottom part) we depicted an example of LN for an MD trajectory. Afterwards, it is straightforward to realize the calculation of p k (L) values: Using MCM we can calculate these p k (d) values associated to the electronic distribution in molecule of the drug (d). The theoretic foundations of the method have been given in previous works, so we do not detail it here but refer the reader to these works [156] [157] [158] . We can use p k (d) values in addition to the p k (L) values to describe only the drug (not the MD trajectory). The p k (d) values are spectral moments of the classic electronic Markov matrix ( 1 P). These values have been used in QSAR before and depend only on connections (chemical bonds) between node (atoms) in the molecular graph and the electronegativity (c j ) of these atoms. The values p k (d) are referred to atoms (nodes) in molecular graphs. These vectors are elements of a Markov chain based on the stochastic matrix 1 P, which contains elements that describe the probabilities of transition of electrons p 1 (i,j) from node (atom) i-th to, j-th. In order to ensure that the p 1 (i,j) values describe the probabilities of transition of electrons from node (atom) i-th to, j-th we use atmoic electronegativity values. At following, we give the formula for both the transition probabilities (elements of the matrix) and the atoms set entropy centrality measures. In this study, we selected different furocoumarins and some of their aza-analogues, whose antiproliferative activities in Ehrlich Ascites tumor cells have been determined (Tables 2 and 3 ). We obtained in total 422 MD trajectories for these compounds. We constructed 422 LNs (one for each MD trajectory) transformed them in a vector of 11 p k (d) values for the compound and 11 p k (L) values for the MD trajectory (see previous sections). We grouped all these 422 MD trajectories into two sets, one composed of MD trajectories of complexes between DNA and active compounds and the other composed of trajectories of active compounds with no-optimal MD trajectories and/or trajectories of non-active compounds. In general, compounds such as 4 0 -MAP and the 4-MBAP, with activities (ID 50 relative to 8-MOP) of 0.13 and 0.14 are considered as poorly active [57, 58] . Keeping in mind all the abovementioned aspects, we classified the 57 compounds, compiled for our dataset into two observed activity groups: 0 for the inactive compounds (LD 50 0.1) and 1 for the active ones (LD 50 > 0.1). QSBR studies were carried out to obtain models that allow us to classify the furocoumarin derivatives in one of these two activity groups. We selected Linear Discriminant Analysis (LDA) [159, 160] to fit the discriminant function as implemented in the LDA module of the STATISTICA 6.0 software package [161] . We used forwardstepwise algorithm for variable selection [162] [163] [164] . The strength of the correlation was determined by the Canonical Regression Coefficient (Rc) and the statistical significance of the LDA model was determined with U-statistics (U) and the respective p-level (p). We standardized all the variables included in the model in order to bring it into the same scale. Subsequently, a standardized linear discriminant equation that allows to compare their coefficients is obtained [165] . We also inspected the percentage of good classification, cases/variables' ratios (r parameter), and number of variables to be explored to avoid over-fitting or chance correlation [162, 163] . The general form of this model is: Computational approaches, such as structural bioinformatics [76, 77] , molecular docking [66, 78, 79] , Monte Carlo simulated annealing approach [80] and QSAR [50, 81, 82] can timely provide very The output of the model, MD-score, is a real value variable that scores the predicted goodness of fit for one MD trajectory. In statistical prediction, the following three cross-validation methods are often used to examine a predictor for its effectiveness in practical application: independent dataset test, sub-sampling test, and jackknife test [166] . However, as elucidated by [167] and demonstrated in [168] , among the three cross-validation methods, the jackknife test is deemed the most objective that can always yield a unique result for a given benchmark dataset, and hence has been increasingly and widely used by investigators to examine the accuracy of various predictors (see, e.g., [20] [21] [22] [23] [24] [25] 27, [169] [170] [171] [172] ). In the current study, because the jackknife test would take a lot of computational time, we choose to use the independent dataset test to examine the prediction accuracy. The model was trained with a training series and later validated with an external validation series. In training series the model correctly classifies 79 out of 80 (specificity ¼ 98.75%) optimal and 226 out of 238 (sensitivity ¼ 94.96%) no-optimal MD trajectories. In external validation series the model correctly classifies 26 out of 26 (specificity ¼ 100%) optimal and 75 out of 78 (sensitivity ¼ 96.15%) nooptimal MD trajectories. These results represent total accuracy ¼ 95.91% and 97.12% in training and validation respectively. Previous QSAR works that use LDA as the classification technique accept this level of sensitivity, specificity, and accuracy as indicative of high quality of the model [106,173-178þ ]. We can obtain new types of 2D graph theoretical representation for Molecular Dynamics (MD) trajectories that resemble LNs used for DNA and protein sequences. At the same time, it is possible to calculate new classes of invariants for MD trajectories based on the spectral moments p k (L) of the Markov matrix associated to these LNs. The p k (L) values can be used as inputs to train new classifiers in order to discriminate between optimal and no-optimal intercalation modes relevant to the biological activity. The new models can be used as scoring functions to guide DNA-docking studies in the drug design of new coumarins for PUVA therapy. Handbook of Graphs and Complex Networks: From the Genome to the Internet The Science of Photomedicine A New Kind of Science Hyperchem Software. Release 7.5 for Windows, Molecular Modeling System A Handbook of Computational Chemistry Discriminant analysis for activity prediction STATISTICA. 6.0 for Windows Pattern recognition in chemistry Chemometric Methods in Molecular Design Standardized multiple regression model, Applied Linear Statistical Models The authors sincerely thank kind attention and useful comments from Editor Prof. Dr. Antonio Monge-Vega and two reviewers. Gonzá lez-Díaz H acknowledges financial support of Program Isidro Parga Pondal funded by Xunta de Galicia and European Social Fund (E.S.F.). Supplementary data associated with this article can be found in the online version, at doi:10.1016/j.ejmech.2009.06.011.