key: cord-0003520-0ec2xunt
authors: Plewczyński, Dariusz; Ginalski, Krzysztof
title: The interactome: Predicting the protein-protein interactions in cells
date: 2008-10-06
journal: Cell Mol Biol Lett
DOI: 10.2478/s11658-008-0024-7
sha: 18d1ff1667652b4e383a0a1ea258db97ac74735c
doc_id: 3520
cord_uid: 0ec2xunt

The term Interactome describes the set of all molecular interactions in cells, especially in the context of protein-protein interactions. These interactions are crucial for most cellular processes, so the full representation of the interaction repertoire is needed to understand the cell's molecular machinery at the systems biology level. In this short review, we compare various methods for predicting protein-protein interactions using sequence and structure information. The ultimate goal of those approaches is to present the complete methodology for the automatic selection of interaction partners using their amino acid sequences and/or three-dimensional structures, if known. Apart from a description of each method, details of the software or web interface needed for high-throughput prediction on the whole genome scale are also provided. The proposed validation of the theoretical methods using experimental data would be a better assessment of their accuracy.

Interactions between proteins are crucial for most of the molecular processes in cells. Therefore, identifying the full interaction repertoire, the so-called protein interaction network, will yield a better understanding of the cellular machinery on the molecular level. A combination of experimental and theoretical approaches is needed to accomplish this goal. Large-scale experiments allow massive amounts of data to be gathered and interacting proteins to be characterized, even if the data is partial or inaccurate. It is essential to carefully select the most important pairs of interacting proteins to properly interpret such experimental results, i.e. observed protein interaction networks. The term Interactome describes the set of all molecular interactions in a cell, especially in the context of protein-protein interactions. The whole interactome is often presented as a graph with nodes denoting interacting partners and edges representing interactions. In order to properly understand the molecular interactions, one has to start by analyzing interacting pairs of proteins in terms of their structure and sequence. Recent advances in bioinformatics tools and resources allow for a wider view of a protein and its biological context or physico-chemical properties. Thanks to the structural genomic initiative of the last decade, more and more protein sequences and their three-dimensional structures are known, giving rise to large publicly available databases. Furthermore, many experimental laboratories are performing more detailed experimental analyses of protein functions, yielding small but specialized databases. The most important goal of bioinformatics is integrating these experimental resources, developing computational methods based on these databases, and providing access to this converted information to a wider research audience. 
The integration of various resources into a single, publicly available metadatabase would allow for the easy extraction of valuable biological information from the massive amount of accumulated data. Moreover, the recent advances in theoretical methodology have been a crucial step toward a systematic understanding of the molecular machinery. Most methods for the prediction of protein-protein interactions use sequence information to train various supervised machine learning algorithms. Three-dimensional structures acquired from the Protein Data Bank (PDB) [1] are also used to improve the accuracy of predictions. Further advances are possible when the genomic context and the predicted or known functional annotations of the interaction partners are taken into account. The number of crystallographically solved (known) structures is much smaller than the number of protein sequences. Therefore, most protein-protein interaction prediction methods neglect the three-dimensional details, focusing instead on sequence analysis. Major advances in the field of protein structure prediction allow the introduction of a complementary approach that will use predicted 3D models together with sequence features in order to correctly pair potentially interacting proteins. For instance, the MetaBASIC algorithm developed by Ginalski et al. [2] compares the sequence profiles of two proteins enriched by their predicted secondary structures. Such alignments of query sequences with proteins of known structure allow medium-resolution 3D models to be created and optimized quickly. The prediction of protein-protein interactions is a difficult problem when an analysis of both the protein sequences and known three-dimensional structures is needed. There are at least two reasons for that. Firstly, the number of possible interactions that have to be taken into account is extremely large. 
In yeast cells, one would have to analyze a possible 18,000,000 interactions between 6,000 proteins encoded in its genome, not counting multiple variants of gene products coming from alternative splicing or post-translational modifications. Only a small portion of those pairs is actually present in cells [3, 4]. Secondly, there are several types of possible interactions in living cells, from stable complexes up to temporary, functional pairings (for example, during the phosphorylation process in response to external stimuli). Protein complexes are better preserved during the evolution process than single proteins, so some computational methods focus on predicting or searching for complexes that are common to several species [5-12]. Those methods use available information about experimentally verified interactions between proteins, orthologies, and comparison of protein sequences [13]. This review outlines recent advances in the field of protein-protein interaction prediction, and introduces some ideas about the application of structural modeling to improve the accuracy of these approaches. Such rapid methods allow the whole proteomes of various organisms to be scanned. By improving selection accuracy, computational methods help in the discovery of true interactions within the experimental data, which contains many false positives. Thus, the ultimate goal of systems biology, i.e. to draw and understand the real interactome, is close to being achieved.
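The figure of roughly 18,000,000 candidate interactions quoted above follows from counting unordered pairs of distinct proteins, n(n - 1)/2. A minimal sketch of that arithmetic, assuming exactly 6,000 proteins and ignoring self-interactions and splice variants (the function name is illustrative only):

```python
from math import comb


def candidate_pairs(n_proteins: int) -> int:
    """Number of unordered pairs of distinct proteins: n * (n - 1) / 2."""
    return comb(n_proteins, 2)


# Approximate number of proteins encoded in the yeast genome, as quoted in the text.
n = 6000
print(candidate_pairs(n))  # 17997000, i.e. about 18 million
```

The quadratic growth is the point: doubling the proteome size roughly quadruples the number of pairs that a prediction method has to score.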

The main goal for scientists of post-genomic biology is understanding the complex network of interacting proteins, DNA, RNA and small chemical molecules in living cells. Proteins are crucial parts of those networks, and information about their structure, amino acid sequence and functional context is important for gaining a better insight into the whole cell. Furthermore, knowledge about the interaction partners of a selected protein helps to obtain a more detailed description or even prediction of that protein's function [14, 15]. The gathered information about proteins for various species can be presented as complex graphs. Each edge can be described by a weight estimating the likelihood of a certain interaction. Two types of interactions can be distinguished: physical coupling (where stable or meta-stable complexes are formed) and functional coupling (where the pair of proteins is connected due to the mutual or directed influence of one on another during a chemical reaction). Both types of interaction are very different in terms of the molecular details, but they can be integrated into the same complex graph. For example, the signaling cascades can be displayed as linear graphs of interacting proteins, where each node modifies the next one to transmit a signal. Such chains represent simple paths in the complex networks of interactions that also contain the stable protein complexes [16]. Several sources of biological information on protein sequences, structures and their interactions are available online. Those resources can be roughly divided into two groups: sequence and structural. The first group of experimentally confirmed protein-protein pairings involves transient interactions, and the second focuses on complexes, i.e. stable interactions. Most of the databases use their own format for the data, so their current level of integration is limited. 
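The graph representation described above, with proteins as nodes and weighted edges for physical or functional coupling, can be sketched as a small adjacency structure. This is only an illustration: the class name, the protein identifiers and the weights are hypothetical, and the weight is assumed to be a likelihood in the range 0 to 1.

```python
from collections import defaultdict


class Interactome:
    """Undirected graph: nodes are proteins, edges carry a likelihood and a type."""

    def __init__(self):
        # protein -> {partner: (weight, kind)}
        self._adj = defaultdict(dict)

    def add_interaction(self, a, b, weight, kind="physical"):
        """Store an edge in both directions; weight is an assumed likelihood in [0, 1]."""
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must lie between 0 and 1")
        self._adj[a][b] = (weight, kind)
        self._adj[b][a] = (weight, kind)

    def partners(self, protein, min_weight=0.0):
        """Return the partners of a protein whose edge weight is at least min_weight."""
        edges = self._adj.get(protein, {})
        return sorted(p for p, (w, _) in edges.items() if w >= min_weight)


# Hypothetical identifiers and weights, for illustration only.
net = Interactome()
net.add_interaction("P1", "P2", 0.9, kind="physical")
net.add_interaction("P1", "P3", 0.2, kind="functional")
print(net.partners("P1", min_weight=0.5))  # ['P2']
```

Keeping the interaction type on the edge lets physical and functional couplings share one graph, as the text suggests, while still allowing them to be filtered apart.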
The theoretical analysis of interactions depends on heterogeneous sources of biological information, such as sequence and structural databases, the literature, and experimental data. The main databases containing experimental information about protein-protein interactions are: the Database of Interacting Proteins (DIP) [17], the Biomolecular Interaction Network Database (BIND) [18], the Molecular Interaction Database (MINT) [19], INTACT [20, 21], and the Human Protein Reference Database (HPRD) [22]. The literature data on selected protein sequences is available from the iHOP [23] and STRING [24] databases. In the Protein Data Bank (PDB) [1], one can find the three-dimensional structures of protein complexes, while SCOP provides protein domains and PFAM [25] provides protein families. It is possible to find the proteins homologous to each of a given protein's interacting partners using tools like PSI-BLAST [26]. PSI-PRED [27] allows the secondary structure of a given protein to be predicted using only sequence information, and can be used to enrich the sequence data with some structural features. In a similar way, one can assign the likely molecular function through functional categories from COG [28], GO annotations [29], the GOA database [30, 31], or the metabolic classification KO [32]. In order to enrich this information, it is useful to include information about the participation of a query protein in the metabolic or signaling pathways using the standard toolbox from the KEGG (http://www.genome.jp/kegg/) [33] and GO (http://www.geneontology.org/) [29] consortia. The whole set of available sources of biological information on interacting protein sequences, and experimental or predicted three-dimensional complexes could be integrated into a single meta-database. Meta-databases allow for significant advances in the analysis of the statistical properties of protein interaction networks. The yeast analysis by Jeong et al. 
[34] or Sprinzak et al. [35] can be used as an example. This organism is well described in terms of experimental data, which allows for a detailed analysis of transient and stable interactions between the proteins of the yeast proteome. Another example of such a meta-database is OPHID (the Online Predicted Human Interaction Database) [36]. This resource only contains human proteins compiled from BIND, HPRD and MINT, along with predictions based on interaction networks from yeast, fruit fly, mouse and other species, with the underlying assumption that orthologous proteins have similar interaction partners. Such integrated data was used to propose a novel theoretical method of protein-protein interaction prediction using logistic regression on a selected set of global protein attributes. This example clearly shows that the availability of high fidelity data is of crucial importance for the further development of theoretical methods.

Detailed experimental data is needed for a better understanding of the functional rules governing cellular life on the molecular level. Over the last decade, there has been significant progress in the experimental techniques for the identification of interactions between proteins. Several types of experimental assay, such as the yeast two-hybrid assay [37-40] or tandem affinity purification [41], allow the high efficiency experimental analysis of protein-protein interactions on the whole proteome scale (for a review of the experimental methods, please refer to [11, 12, 42-52]). Those advances in experimental methodology quickly led to progress in theoretical approaches, such as those based on homology [53], protein pathways analysis [54], multimeric threading [55], or the prediction of interaction sites by docking methods [56]. The latter depends on previously acquired three-dimensional protein structures [57]. New data from high-throughput methods of structural genomics allow for further advances in this field. The interacting sites on protein surfaces are often hydrophobic [58, 59], with evolutionarily conserved polar residues, so-called "hot spots" [60-62]. An important factor influencing the accuracy of interaction predictions is the proper representation of the proteins. 
There are two classes of input data used in most theoretical methods: sequence profiles (phylogenetic profiles [63], mRNA expression levels [64], the presence of protein domains in analyzed sequences [65], and other approaches [66-68]); and three-dimensional structures of interacting partners (prediction methods tested every year in independent computational experiments in the Critical Assessment of PRediction of Interactions, CAPRI [57], methods based on the features of protein surfaces that allow for typing of the protein complexes [58, 69-71], the selection of the most important residues and their features on protein interfaces [59, 72-74], and other representations of experimental data [6, 75-77]). If the complex structure is known, it is possible to select interacting interfaces on the protein surfaces, find the most important residues, and analyze their evolutionary conservation or physico-chemical features [6, 58, 78, 79]. Those methods do not depend strongly on the details of the protein structures, because the evolutionary information of each amino acid is projected from multiple sequence alignments mapped onto a selected, single protein structure [6, 80, 81]. Unfortunately, such an approach has some inherent limits, as described recently [82, 83]. The accuracy of these methods is limited to a few experimentally characterized protein families. We are aware of only a few experiments that have confirmed those predictions. Lately, new methods emerged utilizing the three-dimensional structures of single protein crystals compared to their structures in the crystals of complexes [84]. Both spatial descriptors of the solved proteins are analyzed in terms of some structural features (such as solvent accessibility) that are selected by machine learning algorithms in order to build representations that clearly distinguish interacting and non-interacting partners. 
The current theoretical and experimental approaches have different characteristics of systematic errors, and focus on different parts of the whole interactome. In most cases, one can observe only a small intersection between the various data sources, with no single method providing the perfect accuracy and good selectivity of predicted interaction networks. It is also important to stress that interactions confirmed by two or more methods are much more likely to be true than those identified by only a single method [3, 4] . This observation gives rise to the whole family of meta-predictors, i.e. methods that use several independent algorithms and experimental data sets in order to predict interactions. For data analysis, most of these use various machine learning methods, such as the support vector machine [66, 83, 85] , naïve Bayesian classification [86] , and decision trees [87, 88] . All those methods use only sequence information and single machine learning methods, and do not build consensus between various machine learning algorithms. The methods use sequences of proteins or their known three-dimensional structures in order to predict interactions. In most cases, the three-dimensional structures of the interacting molecules are not known. Therefore, most of the existing methods focus on sequence-based protein-protein interaction predictions, and use interacting sequences as their training sets. Knowledge about complex structures helps in the selection of positives and negatives for training, which is of crucial importance for machine learning methods. Zhou and Shan [89] predicted interactions between proteins using neural networks trained on the sequence profiles of interacting proteins for 615 pairs of non-homologous proteins and predicted solvent accessibility residues. In independent tests of 129 pairs of proteins, their method predicts that 70% of 11,004 residues take part in the formation of the complex. 
The list of neighbor residues on a protein chain and solvent accessibility are not dependent on structural changes during complex formation. Therefore, the accuracy of the method is not worse when single protein crystals are used instead of three-dimensional complex structures. A similar approach was proposed by Fariselli et al. [6] , who focused on the selection of important structural features from known protein complexes. Their neural network is able to predict 73% of the interaction residues on the independent benchmark of 226 structures. This work confirms that the use of physico-chemical features of interaction patches and sequence profiles allows for the proper selection of residues important for interactions.
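The observation made earlier in this section, that interactions confirmed by two or more methods are much more likely to be true, can be sketched as a simple support count across data sets. This is a toy illustration rather than any published meta-predictor: the data set names and protein identifiers are hypothetical, and each source is assumed to report plain unordered pairs.

```python
from collections import Counter


def normalise(pair):
    """Treat (A, B) and (B, A) as the same unordered interaction."""
    return tuple(sorted(pair))


def support_counts(datasets):
    """Count in how many independent data sets each interaction is reported."""
    counts = Counter()
    for pairs in datasets.values():
        # A set ensures that one source contributes at most one vote per pair.
        for pair in {normalise(p) for p in pairs}:
            counts[pair] += 1
    return counts


def confirmed(datasets, min_support=2):
    """Keep only the interactions reported by at least min_support sources."""
    return {pair for pair, n in support_counts(datasets).items() if n >= min_support}


# Hypothetical sources and identifiers, for illustration only.
datasets = {
    "two_hybrid": [("A", "B"), ("B", "C")],
    "affinity_purification": [("B", "A"), ("C", "D")],
    "prediction": [("A", "B"), ("D", "C")],
}
print(sorted(confirmed(datasets)))  # [('A', 'B'), ('C', 'D')]
```

Raising `min_support` trades coverage for reliability, which mirrors the small intersection between data sources noted in the text.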

Interacting residues tend to build clusters in protein sequences, so the support vector machine can easily be applied [83] . Ofran and Rost [68] proposed a simple neural network that can predict which protein fragments interact. The 94% high fidelity predictions for 34 of 333 proteins were confirmed experimentally. In the case of good predictions, almost 70% are confirmed experimentally, and at least one interaction patch is predicted correctly in 20% of cases, i.e. in 66 of 333 protein complexes. Those results show that effective prediction using only sequence information is possible. The inclusion of evolutionary information and structural descriptors improves those methods.

Recent advances in systems biology, especially in the context of high-throughput DNA sequencing (genomics), gene expression (transcriptomics), metabolite and ion analysis (metabolomics/ionomics) and protein analysis (proteomics), carry with them the challenge of processing and interpreting the accumulating data sets [90-95]. Publicly accessible databases and bioinformatic tools are employed to mine this data in order to filter relevant correlations and create models describing physiological states [90]. Reconstructing the networks of interactions of the various cellular components, such as enzyme activities and complexes, gene expression, metabolite pools or pathway flux modes, is therefore possible. However, capturing the interactions of network elements requires experimental setups with a variety of conditions [90, 95-99]. The ultimate goal of systems biology in the context of plant research is to understand the molecular principles governing plant responses and consistently explain plant physiology [90, 100, 101]. These approaches were described in detail in this very journal in the 2001-2008 period. Recent studies of protein-protein interactions focused on using some global sequence or structural features. For example, Sprinzak et al. [35] presented a linear regression method trained on nine global protein attributes, such as domain signature, fold type, gene fusion, phylogenetic profile, gene context, conservation of neighboring genes, protein localization, type of molecular pathway, mRNA coexpression, or transcription coregulation. A protein could then be represented as a point in nine-dimensional space using those features, and linear regression was applied. The problem is very complicated and non-trivial. Proper selection of the protein domain is necessary [102-108]. 
In addition to pure chemical data [109-116] in the context of drug discovery [117-127], there is also a need for some knowledge on protein-protein interactions, the high quality structural prediction of proteins [2, 128-136] and their inhibitors, and a detailed understanding of how those inhibitors affect the molecular recognition between proteins. The development of theoretical methods for function annotation clearly shows that a detailed analysis of local characteristics of the protein chain can significantly improve accuracy. Recently published papers [137-142] use local sequence description with some structural features (for example, relative solvent accessibility, RSA) for much more efficient training of machine learning methods. In the work of Meller et al. [84], the authors tested the whole set of machine learning methods (neural networks, support vector machines and linear discrimination). Those methods were trained on the stable protein complexes from the Protein Data Bank (PDB). This work strongly supports the use of machine learning methods and the local representation of proteins. The authors obtained almost 70% accuracy of protein-protein interaction prediction, provided the structures of the interaction partners were known. The next step in the development of computational methods is based on several structural features of both interacting partners predicted from a sequence. The sequences can be represented as sets of short fragments with a known homology profile (built for the whole proteins). Such an approach was first used by Ofran and Rost in 2003 [68], and later by Fernandez-Ballester and Serrano [143] as position-specific multiple-sequence alignment matrices. 
Both broad types of interactions (stable and transient) are described as interactions between some key amino acids of both partners. The proper selection of important interacting residues allows for more reliable predictions. Therefore, the analysis of contact maps in protein complexes with predicted local structural conformation of the main chain enriched by homology profiles will allow the collection of a more natural training set for machine learning methods. The proper representation of both the sequence and structure of the interacting partners is of crucial importance for the further development of bioinformatics computational methods for protein-protein interaction prediction [144, 145]. The sequence-based methods typically search for homologues of both interacting proteins with tools such as PSI-BLAST [26] or RPS-BLAST. If some interacting partners are present in both sets, the protein pair is likely to interact. Those methods are not yet sensitive enough in the case of distant homology. Some proteins from large and diverse superfamilies are dissimilar in terms of sequence: only a fold and a few important residues in an active site are preserved [146]. The lack of clear sequence similarity makes new family identification or function prediction more difficult. Protein-protein interaction prediction is also more complex when no close interacting homologs are known. Structure-based methods have a very limited area of usefulness due to their dependence on known structures of protein-protein complexes. Therefore, more sensitive sequence (e.g. Meta-BASIC at http://basic.bioinfo.pl/ [2]) or structure prediction methods (like those coupled in the Protein Structure Prediction Meta Server http://BioInfo.PL/Meta/ [133, 147, 148]) will be a significant improvement over standard sequence-based methods, even without knowledge of the exact three-dimensional structures of the interacting partners. 
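The homology-based rule described above (collect the homologues of both query proteins, then check whether any pair of homologues is already known to interact) can be sketched as follows. This is a minimal illustration under stated assumptions: in practice the homologue sets would come from a search tool such as PSI-BLAST, whereas here they are supplied as a precomputed dictionary, and all identifiers are hypothetical.

```python
def predict_by_homology(query_a, query_b, homologs, known_interactions):
    """Return the known interacting homolog pairs that support query_a-query_b.

    homologs: dict mapping a query protein to an iterable of its homologs
              (assumed to be precomputed by a sequence search tool).
    known_interactions: iterable of experimentally known pairs.
    """
    # Unordered pairs, so that (X, Y) and (Y, X) are the same interaction.
    known = {frozenset(pair) for pair in known_interactions}
    evidence = []
    for ha in sorted(homologs.get(query_a, ())):
        for hb in sorted(homologs.get(query_b, ())):
            if frozenset((ha, hb)) in known:
                evidence.append((ha, hb))
    return evidence


# Hypothetical identifiers, for illustration only.
homologs = {"QA": ["HA1", "HA2"], "QB": ["HB1"]}
known = [("HB1", "HA2"), ("HA1", "HX")]
support = predict_by_homology("QA", "QB", homologs, known)
print(support)        # [('HA2', 'HB1')]
print(bool(support))  # True: the pair QA-QB is predicted to interact
```

An empty result does not mean the pair cannot interact; as the text notes, distant homologues may simply be missed by the sequence search.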
Other types of approach focus on predicting protein-protein interactions using protein-protein interaction network topology [149, 150]. In one such approach, a spectral method derived from graph theory is used to select the topological substructures of protein-protein interaction networks that are biologically relevant functional groups. The function of uncharacterized proteins is then assigned based on the classification of known proteins within topological structures [149]. The clusters can also be identified by an eigenmode analysis of the connectivity matrix of the protein-protein interaction network. Such functional clustering allows for further prediction of new protein interactions [150]. Structural matching is also used to recognize protein-protein interaction sites in protein structures [151]. Prism (http://gordion.hpc.eng.ku.edu.tr/prism) uses protein interface structures derived from the Protein Data Bank (PDB) for the automated prediction of protein-protein interactions. Both structure and sequence conservation in protein interfaces are employed, providing additional insight into the protein-protein interaction predictions [152]. The InterPreTS server (http://www.russell.embl.de/cgi-bin/interprets2), given a pair of query sequences, searches for homologs in a database of interacting domains (DBID) of known three-dimensional complex structures. Pairs of sequences homologous to a known interacting pair are then scored for how well they preserve the atomic contacts at the interaction interface [153-155].

Up to now, all the computational algorithms have only used single machine learning methods for the analysis and prediction of protein-protein interactions [156-160], or the statistical analysis of interacting patches of protein surfaces [75, 149, 161, 162]. Our experience clearly supports the idea that each machine learning algorithm performs better for selected types of training data [163, 164]. Some have very high specificity, others focus more on sensitivity. Sometimes one can have a very large number of positives in training, but for some specific types of experiments it is also common that only a few confirmed instances are known. In most cases, the proper selection of negatives is not trivial. In the case of protein-protein interactions, one should use a rich variety of input data for training, such as sequences, short sequence motifs, evolutionary information, genomic context, enzymatic classification, or the known or predicted local or global structure of interacting proteins. Using this data, one can apply various types of machine learning methods trained on the same set of positives and negatives, for example neural networks, the support vector machine, the random forest, decision trees, or rough sets. The crucial step of meta-prediction is building a consensus between those various prediction methods. Since systematic errors of multiple methods are usually randomly distributed, the consensus approach can be used to select a common prediction, probably the most accurate one [165]. Thanks to its easy parallelization, a consensus method can improve the accuracy of any single machine learning method without extending the time of prediction (the time needed is equal to that of the slowest machine learning algorithm used). The combination of various approaches done by Sen and Kloczkowski [166] provides a solid justification for this statement. 
They combine four different methods: data mining using Support Vector Machines, threading through protein structures, prediction of conserved residues on the protein surface by analysis of phylogenetic trees, and the Conservatism of Conservatism method of Mirny and Shakhnovich [167-169]. Their consensus method predicts protein-protein interface residues by combining sequence and structure-based methods. Therefore, we hypothesize that consensus approaches are the main tools to handle the prediction of protein-protein interactions on the whole proteome level, the ultimate goal of systems biology.
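The consensus step discussed in this section can be illustrated with a simple majority vote over several independent predictors. This is a sketch of the general idea only, not the scheme of Sen and Kloczkowski or of any other cited method: the three toy predictors stand in for trained classifiers, and their names and outputs are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

Pair = Tuple[str, str]
Predictor = Callable[[Pair], bool]


def consensus_predict(pair: Pair,
                      predictors: Sequence[Predictor],
                      min_votes: Optional[int] = None) -> bool:
    """Predict an interaction if at least min_votes predictors agree.

    By default a strict majority of the predictors is required.
    """
    if not predictors:
        raise ValueError("at least one predictor is required")
    if min_votes is None:
        min_votes = len(predictors) // 2 + 1
    votes = sum(1 for predict in predictors if predict(pair))
    return votes >= min_votes


# Toy stand-ins for trained classifiers (hypothetical outputs).
def svm_like(pair: Pair) -> bool:
    return pair in {("A", "B"), ("C", "D")}


def network_like(pair: Pair) -> bool:
    return pair in {("A", "B")}


def forest_like(pair: Pair) -> bool:
    return pair in {("C", "D"), ("E", "F")}


methods = [svm_like, network_like, forest_like]
print(consensus_predict(("A", "B"), methods))  # True: two of three agree
print(consensus_predict(("E", "F"), methods))  # False: only one vote
```

Because each predictor is called independently, the calls can be run in parallel, which is the property the text relies on when it argues that a consensus need not take longer than its slowest member.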

The Protein Data Bank

Detecting distant homology with Meta-BASIC

How reliable are experimental protein-protein interaction data?

Comparative assessment of large-scale data sets of protein-protein interactions

Protein-protein docking using 3D-Dock in rounds 3, 4, and 5 of CAPRI

Prediction of protein-protein interaction sites in heterocomplexes with neural networks

An algorithm for predicting protein-protein interaction sites: Abnormally exposed amino acid residues and secondary structure elements

Coevolutionary analysis of domains in interacting proteins reveals insights into domain-domain interactions mediating protein-protein interactions

Transcriptional regulation of protein complexes within and across species

Principles of protein-protein interactions

Interactome: gateway into systems biology

Co-evolutionary analysis reveals insights into protein-protein interactions

Identification of protein complexes by comparative analysis of yeast and bacterial protein interaction data

CIS: compound importance sampling method for protein-DNA binding site p-value estimation

Conserved patterns of protein interaction in multiple species

Conserved pathways within bacteria and yeast as revealed by global protein network alignment

The Database of Interacting Proteins: 2004 update

The Biomolecular Interaction Network Database and related tools 2005 update

MINT: the Molecular INTeraction database

IntAct: an open source molecular interaction database

IntAct - open source resource for molecular interaction data

Implementing the iHOP concept for navigation of biomedical literature

STRING: known and predicted protein-protein associations, integrated and transferred across organisms

Pfam: clans, web tools and services

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs

Protein secondary structure prediction based on position-specific scoring matrices

The COG database: an updated version includes eukaryotes

Gene ontology: tool for the unification of biology. The Gene Ontology Consortium

The Gene Ontology Annotation (GOA) Database - an integrated resource of GO annotations to the UniProt Knowledgebase

The Gene Ontology Annotation (GOA) project: implementation of GO in SWISS-PROT, TrEMBL, and InterPro

Automated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary

The KEGG resource for deciphering the genome

Lethality and centrality in protein networks

Characterization and prediction of protein-protein interactions within and between complexes

Online predicted human interaction database

High-throughput screening for protein-protein interactions using two-hybrid assay

A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae

A comprehensive two-hybrid analysis to explore the yeast protein interactome

Exploring the protein interactome using comprehensive two-hybrid projects

A generic protein purification method for protein complex characterization and proteome exploration

Analyzing yeast protein-protein interaction data obtained from different sources

Algorithms for identifying protein cross-links via tandem mass spectrometry

Roles for the two-hybrid system in exploration of the yeast protein interactome

Functional annotation from predicted protein interaction networks

A lock-and-key model for protein-protein interactions

Microarrays to characterize protein interactions on a whole-proteome scale

A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules

Yeast two-hybrid systems and protein interaction mapping projects for yeast and worm

Monitoring regulated protein-protein interactions using split TEV

Prediction of yeast protein-protein interaction network: insights from the Gene Ontology and annotations

Advances in proteomic technologies

Detecting protein function and protein-protein interactions from genome sequences

A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)

MULTIPROSPECTOR: an algorithm for the prediction of protein-protein interactions by multimeric threading

Prediction of protein-protein interactions by docking methods

Prediction of protein-protein interactions: the CAPRI experiment, its evaluation and implications

Analysis of protein-protein interaction sites using surface patches

The atomic structure of protein-protein recognition sites

Residue frequencies and pairing preferences at protein-protein interfaces

Conservation of polar residues as hot spots at protein interfaces

Unraveling hot spots in binding interfaces: progress and challenges

A fast algorithm for genome-wide analysis of proteins with repeated sequences

Cluster analysis and display of genome-wide expression patterns

Correlated sequence-signatures as markers of protein-protein interaction

Predicting protein-protein interactions from primary structure

A fast method to predict protein interaction sites from sequences

Predicted protein-protein interaction sites from local sequence information

Principles of protein-protein interactions

Diversity of protein-protein interactions

Structural characterisation and functional significance of transient protein-protein interactions

A dissection of specific and non-specific protein-protein interfaces

Analysing six types of protein-protein interfaces

Interresidue contacts in proteins and protein-protein interfaces and their use in characterizing the homodimeric interface

Statistical analysis and prediction of protein-protein interfaces

ProMate: a structure based prediction program to identify the location of protein-protein binding sites

Exploiting sequence and structure homologs to identify protein-protein binding sites

Protein-protein interfaces: analysis of amino acid conservation in homodimers

An accurate, sensitive, and scalable method to identify functional sites in protein structures

Automated structure-based prediction of functional sites in proteins: applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking

ConSeq: the identification of functionally and structurally important residues in protein sequences

Are protein-protein interfaces more conserved in sequence than the rest of the protein surface?

A two-stage classifier for identification of protein-protein interface residues

Prediction-based fingerprints of protein-protein interactions

Prediction of protein-protein interaction sites using support vector machines

A Bayesian networks approach for predicting protein-protein interactions from genomic data

Prediction of protein secondary structure based on residue pairs

Predicting co-complexed protein pairs using genomic and proteomic data integration

Prediction of protein interaction sites from sequence profile and residue neighbor list

On the way to understand biological complexity in plants: S-nutrition as a case study for systems biology

Polymorphisms of the uridine-diphosphoglucuronosyltransferase 1A1 gene and coronary artery disease

A proteomics study of the mung bean epicotyl regulated by brassinosteroids under conditions of chilling stress

Uncharacterized DUF1574 leptospira proteins are SGNH hydrolases

Cell electrophoresis - a method for cell separation and research into cell surface properties

Molecular cloning and characterization of a novel human gene containing 4 ankyrin repeat domains

The inhibition of in vivo tumorigenesis of osteosarcoma (OS)-732 cells by antisense human osteopontin RNA

Is a genetic defect in Fkbp6 a common cause of azoospermia in humans?

Is a fluid-mosaic model of biological membranes fully relevant? Studies on lipid organization in model and biological membranes

Regulation of bacterial protease activity

Molecular characterisation of the SAND protein family: a study based on comparative genomics, structural bioinformatics and phylogeny

Molecular mechanisms of retinoid action

Graph-representation of oxidative folding pathways

Application of a simple likelihood ratio approximant to protein sequence classification

Application of compression-based distance measures to protein sequence classification: a methodological study

The SBASE domain sequence resource, release 12: prediction of protein domain-architecture using support vector machines

The SBASE domain sequence library, release 10: domain architecture prediction

PDB-UF: database of predicted enzymatic functions for unannotated protein structures from structural genomics

ProteinSplit: splitting of multi-domain proteins using prediction of ordered and disordered regions in protein sequences for virtual structural genomics

DFT study on hydroxy acid-lactone interconversion of statins: The case of fluvastatin

Syn- and anti-conformations of 5'-deoxy- and 5'-O-methyl-uridine 2',3'-cyclic monophosphate

Modeling of purine derivatives transport across cell membranes based on their partition coefficient determination and quantum chemical calculations

Quantum chemical study of the mechanism of ethylene elimination in silylative coupling of olefins

New type of bonding formed from an overlap between pi aromatic and pi C=O molecular orbitals stabilizes the coexistence in one molecule of the ionic and neutral meso-ionic forms of imidazopyridine

Effects of substituting a OH group by a F atom in D-glucose. Ab initio and DFT analysis

Mechanism of activation of an immunosuppressive drug: azathioprine. Quantum chemical study on the reaction of azathioprine with cysteine

Relationship between structure and photoinitiating abilities of selected bromide salts of 2-oxo-2,3-dihydro-1H-imidazo[1,2-a]pyridine (IMP): influence of the solvent and the substitution in benzaldehyde on the course of its reaction with IMP

Three dimensional model of severe acute respiratory syndrome coronavirus helicase ATPase catalytic domain and molecular design of severe acute respiratory syndrome coronavirus helicase inhibitors

Three clinical variants of gastroesophageal reflux disease form two distinct gene expression signatures

Cooperative binding of the hnRNP K three KH domains to mRNA targets

The binding activity of yeast RNAs to yeast Hek2p and mammalian hnRNP K proteins, determined using the three-hybrid system

Ligand.Info small-molecule Meta-Database

Ligand-Info, searching for similar small compounds using index profiles

mRNA cap-1 methyltransferase in the SARS genome

Herpes glycoprotein gL is distantly related to chemokine receptor ligands

Plant nitric oxide synthase: a never-ending story?

In silico prediction of SARS protease inhibitors by virtual high throughput screening

Modelling of potentially promising SARS protease inhibitors

Molecular phylogenetics of the RrmJ/fibrillarin superfamily of ribose 2'-O-methyltransferases

ORFeus: Detection of distant homology using sequence profiles and predicted secondary structure

Mitochondria-associated satellite I RNA binds to hnRNP K protein

Characterization of hnRNP K protein-RNA interactions

Structure prediction, evolution and ligand interaction of CHASE domain

Application of 3D-Jury, GRDB, and Verify3D in fold recognition

Predicting protein structures accurately

How unique is the rice transcriptome?

Molecular modeling of phosphorylation sites in proteins using a database of local structure segments

A support vector machine approach to the identification of phosphorylation sites

Support-vector-machine classification of linear functional motifs in proteins

AutoMotif server: prediction of single residue post-translational modifications in proteins

AutoMotif Server for prediction of phosphorylation sites in proteins using support vector machine

The RPSP: Web server for prediction of signal peptides

Prediction of protein-protein interaction based on structure

Comparison of proteins based on segments structural similarity

Integrated web service for improving alignment quality based on segments comparison

Identification of novel restriction endonuclease-like fold families among hypothetical proteins

3D-Jury: a simple approach to improve protein structure predictions

Detection of reliable and unexpected protein fold predictions using 3D-Jury

Topological structure analysis of the protein-protein interaction network in budding yeast

Functional clustering of yeast proteins from the protein-protein interaction network

PRISM: protein interactions by structural matching

Prediction of protein-protein interactions by combining structure and sequence conservation in protein interfaces

Structure-based assembly of protein complexes in yeast

Interrogating protein interaction networks through structural biology

InterPreTS: protein interaction prediction through tertiary structure

Kernel methods for predicting protein-protein interactions

Choosing negative examples for the prediction of protein-protein interactions

Learning to predict protein-protein interactions from protein sequences

An ensemble of K-local hyperplanes for predicting protein-protein interactions

Faster and more accurate global protein function assignment from protein interaction networks using the MFGO algorithm

Large-scale prediction of protein geometry and stability changes for arbitrary single point mutations

The interactome as a tree - an attempt to visualize the protein-protein interaction network in yeast

Assessing different classification methods for virtual screening

Target specific compound identification using a support vector machine

Predicting binding sites of hydrolase-inhibitor complexes by combining several methods

CoC: a database of universally conserved residues in protein folds

How evolution makes proteins fold quickly

Universally conserved positions in protein folds: reading evolutionary signals about stability, folding kinetics and function

Acknowledgements. This study was supported by the EC BioSapiens