key: cord-022494-d66rz6dc
authors: Webb, B.; Eswar, N.; Fan, H.; Khuri, N.; Pieper, U.; Dong, G.Q.; Sali, A.
title: Comparative Modeling of Drug Target Proteins
date: 2014-10-01
journal: Reference Module in Chemistry, Molecular Sciences and Chemical Engineering
DOI: 10.1016/b978-0-12-409547-2.11133-3
sha: 
doc_id: 22494
cord_uid: d66rz6dc

In this perspective, we begin by describing the comparative protein structure modeling technique and the accuracy of the corresponding models. We then discuss the significant role that comparative prediction plays in drug discovery. We focus on virtual ligand screening against comparative models and illustrate the state-of-the-art by a number of specific examples.

Structure-based or rational drug discovery has already resulted in a number of drugs on the market and many more in the development pipeline. [1] [2] [3] [4] Structure-based methods are now routinely used in almost all stages of drug development, from target identification to lead optimization. [5] [6] [7] [8] Central to all structure-based discovery approaches is the knowledge of the threedimensional (3D) structure of the target protein or complex because the structure and dynamics of the target determine which ligands it binds. The 3D structures of the target proteins are best determined by experimental methods that yield solutions at atomic resolution, such as X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. 9 While developments in the techniques of experimental structure determination have enhanced the applicability, accuracy, and speed of these structural studies, 10, 11 structural characterization of sequences remains an expensive and time-consuming task.

The publicly available Protein Data Bank (PDB) 12 currently contains $ 92 000 structures and grows at a rate of approximately 40% every 2 years. On the other hand, the various genome-sequencing projects have resulted in over 40 million sequences, including the complete genetic blueprints of humans and hundreds of other organisms. 13, 14 This achievement has resulted in a vast collection of sequence information about possible target proteins with little or no structural information. Current statistics show that the structures available in the PDB account for less than 1% of the sequences in the UniProt database. 13 Moreover, the rate of growth of the sequence information is more than twice that of the structures, and is expected to accelerate even more with the advent of readily available next-generation sequencing technologies. Due to this wide sequence-structure gap, reliance on experimentally determined structures limits the number of proteins that can be targeted by structure-based drug discovery.

Fortunately, domains in protein sequences are gradually evolving entities that can be clustered into a relatively small number of families with similar sequences and structures. 15, 16 For instance, 75-80% of the sequences in the UniProt database have been grouped into fewer than 15 000 domain families. 17, 18 Similarly, all the structures in the PDB have been classified into about 1000 distinct folds. 19, 20 Computational protein structure prediction methods, such as threading 21 and comparative protein structure modeling, 22, 23 strive to bridge the sequence-structure gap by utilizing these evolutionary relationships. The speed, low cost, and relative accuracy of these computational methods have led to the use of predicted 3D structures in the drug discovery process. 24, 25 The other class of prediction methods, de novo or ab initio methods, attempts to predict the structure from sequence alone, without reliance on evolutionary relationships. However, despite progress in these methods, [26] [27] [28] especially for small proteins with fewer than 100 amino acid residues, comparative modeling remains the most reliable method of predicting the 3D structure of a protein, with an accuracy that can be comparable to a low-resolution, experimentally determined structure. 9 The Basis of Comparative Modeling

The primary requirement for reliable comparative modeling is a detectable similarity between the sequence of interest (target sequence) and a known structure (template). As early as 1986, Chothia and Lesk 29 showed that there is a strong correlation between sequence and structural similarities. This correlation provides the basis of comparative modeling, allows a coarse assessment of model errors, and also highlights one of its major challenges: modeling the structural differences between the template and target structures 30 (Figure 1 ).

Comparative modeling stands to benefit greatly from the structural genomics initiative. 31 Structural genomics aims to achieve significant structural coverage of the sequence space with an efficient combination of experimental and prediction methods. 32 This goal is pursued by careful selection of target proteins for structure determination by X-ray crystallography and NMR spectroscopy, such that most other sequences are within 'modeling distance' (e.g., >30% sequence identity) of a known structure. 15, 16, 31, 33 The expectation is that the determination of these structures combined with comparative modeling will yield useful structural information for the largest possible fraction of sequences in the shortest possible timeframe. The impact of structural genomics is illustrated by comparative modeling based on the structures determined by the New York Structural Genomics Research Consortium. For each new structure without a close homolog in the PDB, on average, 3500 protein sequences without any prior structural characterization could be modeled at least at the level of the fold. 34 Thus, the structures of most proteins will eventually be predicted by computation, not determined by experiment.

In this review, we begin by describing the various steps involved in comparative modeling. Next, we emphasize two aspects of model refinement, loop modeling and side-chain modeling, due to their relevance in ligand docking and rational drug discovery. We then discuss the errors in comparative models. Finally, we describe the role of comparative modeling in drug discovery, focusing on ligand docking against comparative models. We compare successes of docking against models and X-ray structures, and illustrate the computational docking against models with a number of examples. We conclude with a summary of topics that will impact on the future utility of comparative modeling in drug discovery, including an automation and integration of resources required for comparative modeling and ligand docking.

Comparative modeling consists of four main steps 23 (Figure 2 (a)): (1) fold assignment that identifies similarity between the target sequence of interest and at least one known protein structure (the template); (2) alignment of the target sequence and the template(s); (3) building a model based on the alignment with the chosen template(s); and (4) predicting model errors.

Although fold assignment and sequence-structure alignment are logically two distinct steps in the process of comparative modeling, in practice almost all fold assignment methods also provide sequence-structure alignments. In the past, fold assignment methods were optimized for better sensitivity in detecting remotely related homologs, often at the cost of alignment accuracy. However, recent methods simultaneously optimize both the sensitivity and alignment accuracy. Therefore, in the following discussion, we will treat fold assignment and sequence-structure alignment as a single protocol, explaining the differences as needed.

As mentioned earlier, the primary requirement for comparative modeling is the identification of one or more known template structures with detectable similarity to the target sequence. The identification of suitable templates is achieved by scanning structure databases, such as PDB, 12 SCOP, 19 DALI, 36 and CATH, 20 with the target sequence as the query. The detected similarity is usually quantified in terms of sequence identity or statistical measures, such as E-value or z-score, depending on the method used.

Sequence-structure relationships are coarsely classified into three different regimes in the sequence similarity spectrum: (1) the easily detected relationships characterized by >30% sequence identity; (2) the 'twilight zone,' 37 corresponding to relationships Figure 1 Average model accuracy as a function of sequence identity. 30 As the sequence identity between the target sequence and the template structure decreases, the average structural similarity between the template and the target also decreases (dashed line, triangles). 29 Structural overlap is defined as the fraction of equivalent C a atoms. For the comparison of the model with the actual structure (filled circles), two C a atoms were considered equivalent if they belonged to the same residue and were within 3.5 Å of each other after least-squares superposition. For comparisons between the template structure and the actual target structure (triangles), two C a atoms were considered equivalent if they were within 3.5 Å of each other after alignment and rigid-body superposition. The difference between the model and the actual target structure is a combination of the target-template differences (green area) and the alignment errors (red area). The figure was constructed by calculating 3993 comparative models based on a single template of varying similarity to the targets. All targets had known (experimentally determined) structures. 30 with statistically significant sequence similarity in the 10-30% range; and (3) the 'midnight zone,' 37 corresponding to statistically insignificant sequence similarity.

For closely related protein sequences with identities higher than 30-40%, the alignments produced by all methods are almost always largely correct. The quickest way to search for suitable templates in this regime is to use simple pairwise sequence alignment methods such as SSEARCH, 38 BLAST, 39 and FASTA. 38 Brenner et al. showed that these methods detect only $18% of the homologous pairs at less than 40% sequence identity, while they identify more than 90% of the relationships when sequence identity is between 30% and 40%. 40 Another benchmark, based on 200 reference structural alignments with 0-40% sequence identity, indicated that BLAST is able to correctly align only 26% of the residue positions. 41 

The sensitivity of the search and accuracy of the alignment become progressively difficult as the relationships move into the twilight zone. 37,42 A significant improvement in this area was the introduction of profile methods by Gribskov and co-workers. 43 The profile of a sequence is derived from a multiple sequence alignment and specifies residue-type occurrences for each alignment position. The information in a multiple sequence alignment is most often encoded as either a position-specific scoring matrix (PSSM) 39, 44, 45 or as a hidden Markov model (HMM). 46, 47 To identify suitable templates for comparative modeling, the profile of the target sequence is used to search against a database of template sequences. The profile-sequence methods are more sensitive in detecting related structures in the twilight zone than the pairwise sequence-based methods; they detect approximately twice the number of homologs under 40% sequence identity. 41, 48, 49 The resulting profile-sequence alignments correctly align approximately 43-48% of residues in the 0-40% sequence identity range; 41,50 this number is almost twice as large as that of the pairwise sequence methods. Frequently used programs for profile-sequence alignment are PSI-BLAST, 39 SAM, 51 HMMER, 46 HHsearch, 52 HHBlits, 53 and BUILD_PROFILE. 54

As a natural extension, the profile-sequence alignment methods have led to profile-profile alignment methods that search for suitable template structures by scanning the profile of the target sequence against a database of template profiles, as opposed to a database of template sequences. These methods have proven to include the most sensitive and accurate fold assignment and Figure 2 Comparative protein structure modeling. (a) A flowchart illustrating the steps in the construction of a comparative model. 23 (b) Description of comparative modeling by extraction of spatial restraints as implemented in Modeller. 35 By default, spatial restraints in Modeller include: (1) homology-derived restraints from the aligned template structures; (2) statistical restraints derived from all known protein structures; and (3) stereochemical restraints from the CHARMM-22 molecular mechanics force field. These restraints are combined into an objective function that is then optimized to calculate the final 3D model of the target sequence. alignment protocols to date. 50, [55] [56] [57] Profile-profile methods detect $28% more relationships at the superfamily level and improve the alignment accuracy by 15-20% compared to profile-sequence methods. 50, 58 There are a number of variants of profile-profile alignment methods that differ in the scoring functions they use. 50, 55, [58] [59] [60] [61] [62] [63] [64] However, several analyses have shown that the overall performances of these methods are comparable. 50, [55] [56] [57] Some of the programs that can be used to detect suitable templates are FFAS, 65 SP3, 58 SALIGN, 50 HHBlits, 53 HHsearch, 52 and PPSCAN. 54 Sequence-structure threading methods As the sequence identity drops below the threshold of the twilight zone, there is usually insufficient signal in the sequences or their profiles for the sequence-based methods discussed above to detect true relationships. 48 Sequence-structure threading methods are most useful in this regime as they can sometimes recognize common folds, even in the absence of any statistically significant sequence similarity. 21 These methods achieve higher sensitivity by using structural information derived from the templates. The accuracy of a sequence-structure match is assessed by the score of a corresponding coarse model and not by sequence similarity, as in sequence comparison methods. 21 The scoring scheme used to evaluate the accuracy is either based on residue substitution tables dependent on structural features such as solvent exposure, secondary structure type, and hydrogen bonding properties, 58,66-68 or on statistical potentials for residue interactions implied by the alignment. [69] [70] [71] [72] [73] The use of structural data does not have to be restricted to the structure side of the aligned sequence-structure pair. For example, SAM-T08 makes use of the predicted local structure for the target sequence to enhance homolog detection and alignment accuracy. 74 Commonly used threading programs are GenTHREADER, 66,75 3D-PSSM, 76 FUGUE, 68 SP3, 58 SAM-T08 multitrack HMM, 67, 74, 77 and MUSTER. 78 Iterative sequence-structure alignment Yet another strategy is to optimize the alignment by iterating over the process of calculating alignments, building models, and evaluating models. Such a protocol can sample alignments that are not statistically significant and identify the alignment that yields the best model. Although this procedure can be time-consuming, it can significantly improve the accuracy of the resulting comparative models in difficult cases. 79 

Regardless of the method used, searching in the twilight and midnight zones of the sequence-structure relationship often results in false negatives, false positives, or alignments that contain an increasingly large number of gaps and alignment errors. Improving the performance and accuracy of methods in this regime remains one of the main tasks of comparative modeling today. 80 It is imperative to calculate an accurate alignment between the target-template pair. Although some progress has been made recently, 81 comparative modeling can rarely recover from an alignment error. 82 

After a list of all related protein structures and their alignments with the target sequence have been obtained, template structures are prioritized depending on the purpose of the comparative model. Template structures may be chosen purely based on the targettemplate sequence identity or a combination of several other criteria, such as experimental accuracy of the structures (resolution of X-ray structures, number of restraints per residue for NMR structures), conservation of active-site residues, holo-structures that have bound ligands of interest, and prior biological information that pertains to the solvent, pH, and quaternary contacts. It is not necessary to select only one template. In fact, the use of several templates approximately equidistant from the target sequence generally increases the model accuracy. 83, 84 Model Building

Once an initial target-template alignment is built, a variety of methods can be used to construct a 3D model for the target protein. 23, 82, [85] [86] [87] [88] The original and still widely used method is modeling by rigid-body assembly. 86, 87, 89 This method constructs the model from a few core regions, and from loops and side chains that are obtained by dissecting related structures. Commonly used programs that implement this method are COMPOSER, 90-93 3D-JIGSAW, 94 RosettaCM, 81 and SWISS-MODEL. 95 Another family of methods, modeling by segment matching, relies on the approximate positions of conserved atoms from the templates to calculate the coordinates of other atoms. [96] [97] [98] [99] [100] An instance of this approach is implemented in SegMod. 99 The third group of methods, modeling by satisfaction of spatial restraints, uses either distance geometry or optimization techniques to satisfy spatial restraints obtained from the alignment of the target sequences with the template structures. 35, [101] [102] [103] [104] Specifically, Modeller, 35,105,106 our own program for comparative modeling, belongs to this group of methods.

Modeller implements comparative protein structure modeling by the satisfaction of spatial restraints that include: (1) homologyderived restraints on the distances and dihedral angles in the target sequence, extracted from its alignment with the template structures; 35 (2) stereochemical restraints such as bond length and bond angle preferences, obtained from the CHARMM-22 molecular mechanics force field; 107 (3) statistical preferences for dihedral angles and nonbonded interatomic distances, obtained from a representative set of known protein structures; 108 and (4) optional manually curated restraints, such as those from NMR spectroscopy, rules of secondary structure packing, cross-linking experiments, fluorescence spectroscopy, image reconstruction from electron microscopy, site-directed mutagenesis, and intuition ( Figure 2(b) ). The spatial restraints, expressed as probability density functions, are combined into an objective function that is optimized by a combination of conjugate gradients and molecular dynamics with simulated annealing. This model-building procedure is similar to structure determination by NMR spectroscopy.

Accuracies of the various model-building methods are relatively similar when used optimally. 109, 110 Other factors, such as template selection and alignment accuracy, usually have a larger impact on the model accuracy, especially for models based on less than 30% sequence identity to the templates. However, it is important that a modeling method allows a degree of flexibility and automation to obtain better models more easily and rapidly. For example, a method should allow for an easy recalculation of a model when a change is made in the alignment; it should be straightforward to calculate models based on several templates; and the method should provide tools for incorporation of prior knowledge about the target (e.g., cross-linking restraints and predicted secondary structure).

Protein sequences evolve through a series of amino acid residue substitutions, insertions, and deletions. While substitutions can occur throughout the length of the sequence, insertions and deletions mostly occur on the surface of proteins in segments that connect regular secondary structure segments (i.e., loops). While the template structures are helpful in the modeling of the aligned target backbone segments, they are generally less valuable for the modeling of side chains and irrelevant for the modeling of insertions such as loops. The loops and side chains of comparative models are especially important for ligand docking; thus, we discuss them in the following two sections.

Loop modeling is an especially important aspect of comparative modeling in the range from 30% to 50% sequence identity. In this range of overall similarity, loops among the homologs vary while the core regions are still relatively conserved and aligned accurately. Loops often play an important role in defining the functional specificity of a given protein, forming the active and binding sites. Loop modeling can be seen as a mini protein folding problem because the correct conformation of a given segment of a polypeptide chain has to be calculated mainly from the sequence of the segment itself. However, loops are generally too short to provide sufficient information about their local fold. Even identical decapeptides in different proteins do not always have the same conformation. 111, 112 Some additional restraints are provided by the core anchor regions that span the loop and by the structure of the rest of the protein that cradles the loop. Although many loop-modeling methods have been described, it is still challenging to correctly and confidently model loops longer than approximately 10-12 residues. 105, 113, 114 Two classes of methods There are two main classes of loop-modeling methods: (1) database search approaches that scan a database of all known protein structures to find segments fitting the anchor core regions 98, 115 ; and (2) conformational search approaches that rely on optimizing a scoring function. [116] [117] [118] There are also methods that combine these two approaches. 119, 120 Database-based loop modeling The database search approach to loop modeling is accurate and efficient when a database of specific loops is created to address the modeling of the same class of loops, such as b-hairpins, 121 or loops on a specific fold, such as the hypervariable regions in the immunoglobulin fold. 115, 122 There are attempts to classify loop conformations into more general categories, thus extending the applicability of the database search approach. [123] [124] [125] However, the database methods are limited because the number of possible conformations increases exponentially with the length of a loop, and until the late 1990s only loops up to 7 residues long could be modeled using the database of known protein structures. 126, 127 However, the growth of the PDB in recent years has largely eliminated this problem. 128 Optimization-based methods There are many optimization-based methods, exploiting different protein representations, objective functions, and optimization or enumeration algorithms. The search algorithms include the minimum perturbation method, 129 dihedral angle search through a rotamer library, 114,130 molecular dynamics simulations, 119, 131 genetic algorithms, 132 Monte Carlo and simulated annealing, [133] [134] [135] multiple-copy simultaneous search, 136 self-consistent field optimization, 137 robotics-inspired kinematic closure 138 and enumeration based on graph theory. 139 The accuracy of loop predictions can be further improved by clustering the sampled loop conformations and partially accounting for the entropic contribution to the free energy. 140 Another way of improving the accuracy of loop predictions is to consider the solvent effects. Improvements in implicit solvation models, such as the Generalized Born solvation model, motivated their use in loop modeling. The solvent contribution to the free energy can be added to the scoring function for optimization, or it can be used to rank the sampled loop conformations after they are generated with a scoring function that does not include the solvent terms. 105, [141] [142] [143] Side-Chain Modeling

Two simplifications are frequently applied in the modeling of side-chain conformations. 144 First, amino acid residue replacements often leave the backbone structure almost unchanged, 145 allowing us to fix the backbone during the search for the best side-chain conformations. Second, most side chains in high-resolution crystallographic structures can be represented by a limited number of conformers that comply with stereochemical and energetic constraints. 146 This observation motivated Ponder and Richards 147 to develop the first library of side-chain rotamers for the 17 types of residues with dihedral angle degrees of freedom in their side chains, based on 10 high-resolution protein structures determined by X-ray crystallography. Subsequently, a number of additional libraries have been derived. [148] [149] [150] [151] [152] [153] [154] [155] Rotamers Rotamers on a fixed backbone are often used when all the side chains need to be modeled on a given backbone. This approach reduces the combinatorial explosion associated with a full conformational search of all the side chains, and is applied by some comparative modeling 86 and protein design approaches. 156 However, $ 15% of the side chains cannot be represented well by these libraries. 157 In addition, it has been shown that the accuracy of side-chain modeling on a fixed backbone decreases rapidly when the backbone errors are larger than 0.5 Å . 158 

Earlier methods for side-chain modeling often put less emphasis on the energy or scoring function. The function was usually greatly simplified, and consisted of the empirical rotamer preferences and simple repulsion terms for nonbonded contacts. 151 Nevertheless, these approaches have been justified by their performance. For example, a method based on a rotamer library compared favorably with that based on a molecular mechanics force field, 159 and new methods continue to be based on the rotamer library approach. [160] [161] [162] The various optimization approaches include a Monte Carlo simulation, 163 simulated annealing, 164 a combination of Monte Carlo and simulated annealing, 165 the dead-end elimination theorem, 166, 167 genetic algorithms, 155 neural network with simulated annealing, 168 mean field optimization, 169 and combinatorial searches. 151, 170, 171 Several studies focused on the testing of more sophisticated potential functions for conformational search 171, 172 and development of new scoring functions for side-chain modeling, 173 reporting higher accuracy than earlier studies.

The major sources of error in comparative modeling are discussed in the relevant sections above. The following is a summary of these errors, dividing them into five categories (Figure 3) .

This error is a potential problem when distantly related proteins are used as templates (i.e., less than 30% sequence identity). Distinguishing between a model based on an incorrect template and a model based on an incorrect alignment with a correct template is difficult. In both cases, the evaluation methods (below) will predict an unreliable model. The conservation of the key functional or structural residues in the target sequence increases the confidence in a given fold assignment.

The single source of errors with the largest impact on comparative modeling is misalignments, especially when the target-template sequence identity decreases below 30%. Alignment errors can be minimized in two ways. Using the profile-based methods discussed above usually results in more accurate alignments than those from pairwise sequence alignment methods. Another way of improving the alignment is to modify those regions in the alignment that correspond to predicted errors in the model. 83 

Segments of the target sequence that have no equivalent region in the template structure (i.e., insertions or loops) are one of the most difficult regions to model. Again, when the target and template are distantly related, errors in the alignment can lead to incorrect positions of the insertions. Using alignment methods that incorporate structural information can often correct such errors. Once a reliable alignment is obtained, various modeling protocols can predict the loop conformation, for insertions of fewer than 8-10 residues. 105, 113, 119, 174 Distortions and Shifts in Correctly Aligned Regions

As a consequence of sequence divergence, the main-chain conformation changes, even if the overall fold remains the same. Therefore, it is possible that in some correctly aligned segments of a model, the template is locally different (<3 Å ) from the target, resulting in errors in that region. The structural differences are sometimes not due to differences in sequence, but are a consequence of artifacts in structure determination or structure determination in different environments (e.g., packing of subunits in a crystal). The simultaneous use of several templates can minimize this kind of error. 83,84 Figure 3 Typical errors in comparative modeling. 23 Shown are the typical sources of errors encountered in comparative models. Two of the major sources of errors in comparative modeling are due to incorrect templates or incorrect alignments with the correct templates. The modeling procedure can rarely recover from such errors. The next significant source of errors arises from regions in the target with no corresponding region in the template, i.e., insertions or loops. Other sources of errors, which occur even with an accurate alignment, are due to rigid-body shifts, distortions in the backbone, and errors in the packing of side chains.

As the sequences diverge, the packing of the atoms in the protein core changes. Sometimes even the conformation of identical side chains is not conserved -a pitfall for many comparative modeling methods. Side-chain errors are critical if they occur in regions that are involved in protein function, such as active sites and ligand-binding sites.

The accuracy of the predicted model determines the information that can be extracted from it. Thus, estimating the accuracy of a model in the absence of the known structure is essential for interpreting it.

As discussed earlier, a model calculated using a template structure that shares more than 30% sequence identity is indicative of an overall accurate structure. However, when the sequence identity is lower, the first aspect of model evaluation is to confirm whether or not a correct template was used for modeling. It is often the case, when operating in this regime, that the fold assignment step produces only false positives. A further complication is that at such low similarities the alignment generally contains many errors, making it difficult to distinguish between an incorrect template on one hand and an incorrect alignment with a correct template on the other hand. There are several methods that use 3D profiles and statistical potentials, 70, 175, 176 which assess the compatibility between the sequence and modeled structure by evaluating the environment of each residue in a model with respect to the expected environment, as found in native high-resolution experimental structures. These methods can be used to assess whether or not the correct template was used for the modeling. They include VERIFY3D, 175 184 and TSVMod. 185 Even when the model is based on alignments that have >30% sequence identity, other factors, including the environment, can strongly influence the accuracy of a model. For instance, some calcium-binding proteins undergo large conformational changes when bound to calcium. If a calcium-free template is used to model the calcium-bound state of the target, it is likely that the model will be incorrect, irrespective of the target-template similarity or accuracy of the template structure. 186 

The model should also be subjected to evaluations of self-consistency to ensure that it satisfies the restraints used to calculate it. Additionally, the stereochemistry of the model (e.g., bond lengths, bond angles, backbone torsion angles, and nonbonded contacts) may be evaluated using programs such as PROCHECK 187 and WHATCHECK. 188 Although errors in stereochemistry are rare and less informative than errors detected by statistical potentials, a cluster of stereochemical errors may indicate that there are larger errors (e.g., alignment errors) in that region.

When multiple models are calculated for the target based on a single template or when multiple loops are built for a single or multiple models, it is practical to select a subset of models or loops that are judged to be most suitable for subsequent docking calculations. If some known ligands or other information for the desired model is available, model selection should be guided by this known information. 189 If this extra information is not available, model selection should aim to select the most accurate model. While models or loops can be selected by the energy function used for guiding the building of comparative models or the sampling of loop configurations, using a separate statistical potential for selecting the most accurate models or loops is often more successful. 181, 182, 190, 191 

It is crucial for method developers and users alike to assess the accuracy of their methods. An attempt to address this problem has been made by the Critical Assessment of Techniques for Proteins Structure Prediction (CASP) 192 and in the past by the Critical Assessment of Fully Automated Structure Prediction (CAFASP) experiments, 193 which is now integrated into CASP. However, CASP assesses methods only over a limited number of target protein sequences, and is conducted only every 2 years. 109, 194 To overcome this limitation, the new CAMEO web server continuously evaluates the accuracy and reliability of a number of comparative protein structure prediction servers, in a fully automated manner. 195 Every week, CAMEO provides each tested server with the prerelease sequences of structures that are to be shortly released by the PDB. Each server then has 4 days to build and return a 3D model of these sequences. When PDB releases the structures, CAMEO compares the models against the experimentally determined structures, and presents the results on its web site. This enables developers, non-expert users, and reviewers to determine the performance of the tested prediction servers. CAMEO is similar in concept to two prior such continuous testing servers, LiveBench 194 and EVA. 196, 197 

There is a wide range of applications of protein structure models ( Figure 4) . 1, [198] [199] [200] [201] [202] [203] [204] For example, high-and medium-accuracy comparative models are frequently helpful in refining functional predictions that have been based on a sequence match alone because ligand binding is more directly determined by the structure of the binding site than by its sequence. It is often possible to predict correctly features of the target protein that do not occur in the template structure. 205, 206 For example, the size of a ligand may be predicted from the volume of the binding site cleft and the location of a binding site for a charged ligand can be predicted from a cluster of charged residues on the protein. Fortunately, errors in the functionally important regions in comparative models are many times relatively low because the functional regions, such as active sites, tend to be more conserved in evolution than the rest of the fold. Even low-accuracy comparative models may be useful, for example, for assigning the fold of a protein. Fold Figure 4 Accuracy and applications of protein structure models. 9 Shown are the different ranges of applicability of comparative protein structure modeling, threading, and de novo structure prediction, their corresponding accuracies, and their sample applications. assignment can be very helpful in drug discovery, because it can shortcut the search for leads by pointing to compounds that have been previously developed for other members of the same family. 207, 208 

The remainder of this review focuses on the use of comparative models for ligand docking. [209] [210] [211] Comparative protein structure modeling extends the applicability of virtual screening beyond the atomic structures determined by X-ray crystallography or NMR spectroscopy. In fact, comparative models have been used in virtual screening to detect novel ligands for many protein targets, 201 including the G-protein coupled receptors (GPCR), 210, [212] [213] [214] [215] [216] [217] [218] [219] [220] [221] [222] [223] protein kinases, [224] [225] [226] [227] nuclear hormone receptors, and many different enzymes. [228] [229] [230] [231] [232] [233] [234] [235] [236] [237] [238] [239] [240] [241] Nevertheless, the relative utility of comparative models versus experimentally determined structures has only been relatively sparsely assessed. 212, 224, 225, [242] [243] [244] The utility of comparative models for molecular docking screens in ligand discovery has been documented 245 with the aid of 38 protein targets selected from the 'directory of useful decoys' (DUD). 246 For each target sequence, templates for comparative modeling were obtained from the PDB, including at least one holo (ligand bound) and one apo (ligand free) template structure for each of the eight 10% sequence identity ranges from 20% to 100%. In total, 222 models were generated based on 222 templates for the 38 test proteins using Modeller 9v2. 35 DUD ligands and decoys (98 266 molecules) were screened against the holo X-ray structure, the apo X-ray structure, and the comparative models of each target using DOCK 3.5.54. 247 The accuracy of virtual screening was evaluated by the overall ligand enrichment that was calculated by integrating the area under the enrichment plot (logAUC). A key result was that, for 63% and 79% of the targets, at least one comparative model yielded ligand enrichment better or comparable to that of the corresponding holo and apo X-ray structure. 245 This result indicates that comparative models can be useful docking targets when multiple templates are available. However, it was not possible to predict which model, out of all those used, led to the highest enrichment. Therefore, a 'consensus' enrichment score was computed by ranking each library compound by its best docking score against all comparative models and/or templates. For 47% and 70% of the targets, the consensus enrichment for multiple models was better or comparable to that of the holo and apo X-ray structures, respectively, suggesting that multiple comparative models can be useful for virtual screening.

Despite problems with comparative modeling and ligand docking, comparative models have been successfully used in practice in conjunction with virtual screening to identify novel inhibitors. We briefly review a few of these success stories to highlight the potential of the combined comparative modeling and ligand-docking approach to drug discovery.

Comparative models have been employed to aid rational drug design against parasites for more than 20 years. 132, 231, 232, 240 As early as 1993, Ring et al. 132 used comparative models for computational docking studies that identified low micromolar nonpeptidic inhibitors of proteases in malarial and schistosome parasite lifecycles. Li et al. 231 subsequently used similar methods to develop nanomolar inhibitors of falcipain that are active against chloroquine-resistant strains of malaria. In a study by Selzer et al. 232 comparative models were used to predict new nonpeptide inhibitors of cathepsin L-like cysteine proteases in Leishmania major. Sixty-nine compounds were selected by DOCK 3.5 as strong binders to a comparative model of protein cpB, and of these, 21 had experimental IC 50 values below 100 mmol l À1 . Finally, in a study by Que et al., 240 comparative models were used to rationalize ligand-binding affinities of cysteine proteases in Entamoeba histolytica. Specifically, this work provided an explanation for why proteases ACP1 and ACP2 had substrate specificity similar to that of cathepsin B, although their overall structure is more similar to that of cathepsin D.

Enyedy et al. 248 discovered 15 new inhibitors of matriptase by docking against its comparative model. The comparative model employed thrombin as the template, sharing only 34% sequence identity with the target sequence. Moreover, some residues in the binding site are significantly different; a trio of charged Asp residues in matriptase correspond to 1 Tyr and 2 Trp residues in thrombin. Thrombin was chosen as the template, in part because it prefers substrates with positively charged residues at the P1 position, as does matriptase. The National Cancer Institute database was used for virtual screening that targeted the S1 site with the DOCK program. The 2000 best-scoring compounds were manually inspected to identify positively charged ligands (the S1 site is negatively charged), and 69 compounds were experimentally screened for inhibition, identifying the 15 inhibitors. One of them, hexamidine, was used as a lead to identify additional compounds selective for matriptase relative to thrombin. The Wang group has also used similar methods to discover seven new, low-micromolar inhibitors of Bcl-2, using a comparative model based on the NMR solution structure of Bcl-X L . 233 Schapira et al. 249 discovered a novel inhibitor of a retinoic acid receptor by virtual screening using a comparative model. In this case, the target (RAR-a) and template (RAR-g) are very closely related; only three residues in the binding site are not conserved. The ICM program was used for virtual screening of ligands from the Available Chemicals Directory (ACD). The 5364 high-scoring compounds identified in the first round were subsequently docked into a full atom representation of the receptor with flexible side chains to obtain a final set of 300 good-scoring hits. These compounds were then manually inspected to choose the final 30 for testing. Two novel agonists were identified, with 50-nanomolar activity.

Zuccotto et al. 250 identified novel inhibitors of dihydrofolate reductase (DHFR) in Trypanosoma cruzi (the parasite that causes Chagas disease) by docking into a comparative model based on $50% sequence identity to DHFR in L. major, a related parasite. The virtual screening procedure used DOCK for rigid docking of over 50 000 selected compounds from the Cambridge Structural Database (CSD). Visual inspection of the top 100 hits was used to select 36 compounds for experimental testing. This work identified several novel scaffolds with micromolar IC 50 values. The authors report attempting to use virtual screening results to identify compounds with greater affinity for T. cruzi DHFR than human DHFR, but it is not clear how successful they were.

Following the outbreak of the severe acute respiratory syndrome (SARS) in 2003, Anand et al. 251 used the experimentally determined structures of the main protease from human coronavirus (M PRO ) and an inhibitor complex of porcine coronavirus (transmissible gastroenteritis virus, TGEV) M PRO to calculate a comparative model of the SARS coronavirus M PRO . This model then provided a basis for the design of anti-SARS drugs. In particular, a comparison of the active site residues in these and other related structures suggested that the AG7088 inhibitor of the human rhinovirus type 2 3C protease is a good starting point for design of anticoronaviral drugs. 252 Comparative models of protein kinases combined with virtual screening have also been intensely used for drug discovery. 224, 225, [253] [254] [255] The >500 kinases in the human genome, the relatively small number of experimental structures available, and the high level of conservation around the important adenosine triphosphate-binding site make comparative modeling an attractive approach toward structure-based drug discovery.

G protein-coupled receptors are another interesting class of proteins that in principle allow drug discovery through comparative modeling. 212, [256] [257] [258] [259] Approximately 40% of current drug targets belong to this class of proteins. However, these proteins have been extremely difficult to crystallize and most comparative modeling has been based on the atomic resolution structure of the bovine rhodopsin. 260 Despite this limitation, a rather extensive test of docking methods with rhodopsin-based comparative models shows encouraging results.

The applicability of structure-based modeling and virtual screening has recently been expanded to membrane proteins that transport solutes, such as ions, metabolites, peptides, and drugs. In humans, these transporters contribute to the absorption, distribution, metabolism, and excretion of drugs, and often, mediate drug-drug interactions. Additionally, several transporters can be targeted directly by small molecules. For instance, methylphenidate (Ritalin) inhibiting the norepinephrine transporter (NET) and, consequently, inhibiting the reuptake of norepinephrine, is used in the treatment of attention-deficit hyperactivity disorder (ADHD). 261 Schlessinger et al. 262 predicted 18 putative ligands of human NET by docking 6436 drugs from the Kyoto Encyclopedia of Genes (KEGG DRUG) into a comparative model based on $25% sequence identity to leucine transporter (LeuT) from Aquifex aeolicus. Of these 18 predicted ligands, ten were validated by cis-inhibition experiments; five of them were chemically novel. Close examination of the predicted primary binding site helped rationalize interactions of NET with its primary substrate, norepineprhine, as well as positive and negative pharmacological effects of other NET ligands. Subsequently, Schlessinger et al. 263 modeled two different conformations of the human GABA transporter 2 (GAT-2), using the LeuT structures in occluded-outward-facing and outward-facing conformations. Enrichment calculations were used to assess the quality of the models in molecular dynamics simulations and side-chain refinements. The key residue, Glu48, interacting with the substrate was identified during the refinement of the models and validated by site-directed mutagenesis. Docking against two conformations of the transporter enriches for different physicochemical properties of ligands. For example, top-scoring ligands found by docking against the outward-facing model were bulkier and more hydrophobic than those predicted using the occluded-outward-facing model. Among twelve ligands validated in cis-inhibition assays, six were chemically novel (e.g., homotaurine). Based on the validation experiments, GAT-2 is likely to be a high-selectivity/low-affinity transporter. Following these two studies, a combination of comparative modeling, ligand docking, and experimental validation was used to rationalize toxicity of an anti-cancer agent, acivicin. 264 The toxic sideeffects are thought to be facilitated by the active transport of acivicin through the blood-brain-barrier (BBB) via the large-neutral amino acid Transporter 1 (LAT-1). In addition, four small-molecule ligands of LAT-1 were identified by docking against a comparative model based on two templates, the structures of the outward-occluded arginine-bound arginine/agmatine transporter AdiC from E. coli 265 and the inward-apo conformation of the amino acid, polyamine, and organo-cation transporter ApcT from Methanococcus jannaschii. 266 Two of the four hits, acivicin and fenclonine, were confirmed as substrates by a trans-stimulation assay. These studies clearly illustrate the applicability of combined comparative modeling and virtual screening to ligand discovery for transporters.

Although reports of successful virtual screening against comparative models are encouraging, such efforts are not yet a routine part of rational drug design. Even the successful efforts appear to rely strongly on visual inspection of the docking results. Much work remains to be done to improve the accuracy, efficiency, and robustness of docking against comparative models. Despite assessments of relative successes of docking against comparative models and native X-ray structures, 225, 244 relatively little has been done to compare the accuracy achievable by different approaches to comparative modeling and to identify the specific structural reasons why comparative models generally produce less accurate virtual screening results than the holo structures. Among the many issues that deserve consideration are the following:

• The inclusion of cofactors and bound water molecules in protein receptors is often critical for success of virtual screening; however, cofactors are not routinely included in comparative models • Most docking programs currently retain the protein receptor in a rigid conformation. While this approach is appropriate for 'lock-and-key' binding modes, it does not work when the ligand induces conformational changes in the receptor upon binding. A flexible receptor approach is necessary to address such induced-fit cases 267, 268 • The accuracy of comparative models is frequently judged by the C a root mean square error or other similar measures of backbone accuracy. For virtual screening, however, the precise positioning of side chains in the binding site is likely to be critical; measures of accuracy for binding sites are needed to help evaluate the suitability of comparative modeling algorithms for constructing models for docking • Knowledge of known inhibitors, either for the target protein or the template, should help to evaluate and improve virtual screening against comparative models. For example, comparative models constructed from holo template structures implicitly preserve some information about the ligand-bound receptor conformation • Improvement in the accuracy of models produced by comparative modeling will require methods that finely sample protein conformational space using a free energy or scoring function that has sufficient accuracy to distinguish the native structure from the nonnative conformations. Despite many years of development of molecular simulation methods, attempts to refine models that are already relatively close to the native structure have met with relatively little success. This failure is likely to be due in part to inaccuracies in the scoring functions used in the simulations, particularly in the treatment of electrostatics and solvation effects. A combination of physics-based energy function with the statistical information extracted from known protein structures may provide a route to the development of improved scoring functions • Improvements in sampling strategies are also likely to be necessary, for both comparative modeling and flexible docking

Given the increasing number of target sequences for which no experimentally determined structures are available, drug discovery stands to gain immensely from comparative modeling and other in silico methods. Despite unsolved problems in virtually every step of comparative modeling and ligand docking, it is highly desirable to automate the whole process, starting with the target sequence and ending with a ranked list of its putative ligands. Automation encourages development of better methods, improves their testing, allows application on a large scale, and makes the technology more accessible to both experts and non-specialists alike. Through large-scale application, new questions, such as those about ligand-binding specificity, can in principle be addressed. Enabling a wider community to use the methods provides useful feedback and resources toward the development of the next generation of methods.

There are a number of servers for automated comparative modeling (Table 1) . However, in spite of automation, the process of calculating a model for a given sequence, refining its structure, as well as visualizing and analyzing its family members in the sequence and structure spaces can involve the use of scripts, local programs, and servers scattered across the internet and not necessarily interconnected. In addition, manual intervention is generally still needed to maximize the accuracy of the models in the difficult cases. The main repository for precomputed comparative models, the Protein Model Portal, 195, 198, 279 begins to address these deficiencies by serving models from several modeling groups, including the SWISS-MODEL 95 and ModBase 34 databases. It provides access to web-based comparative modeling tools, cross-links to other sequence and structure databases, and annotations of sequences and their models.

A number of databases containing comparative models and web servers for computing comparative models are publicly available. The Protein Model Portal (PMP) 195, 198, 279 centralizes access to these models created by different methodologies. The PMP is being developed as a module of the Protein Structure Initiative Knowledgebase (PSI KB) 316 and functions as a meta server for comparative models from external databases, including SWISS-MODEL 95 and ModBase, 34 additionally to being a repository for comparative models that are derived from structures determined by the PSI centers. It provides quality estimations of the deposited models, access to web-based comparative modeling tools, cross-links to other sequence and structure databases, annotations of sequences and their models, and detailed tutorials on comparative modeling and the use of their tools. The PMP currently contains 19.5 million comparative models for 4.4 million UniProt sequences (August 2013).

A schematic of our own attempt at integrating several useful tools for comparative modeling is shown in Figure 5 . 34, 291 ModBase is a database that currently contains $29 million predicted models for domains in approximately 4.7 million unique sequences from UniProt, Ensembl, 269 GenBank, 14 and private sequence datasets. The models were calculated using ModPipe 30, 291 and Modeller. 35 The web interface to the database allows flexible querying for fold assignments, sequence-structure alignments, models, and model assessments. An integrated sequence-structure viewer, Chimera, 304 allows inspection and analysis of the query results. Models can also be calculated using ModWeb, 291,309 a web interface to ModPipe, and stored in ModBase, which also makes them accessible through the PMP. Other resources associated with ModBase include a comprehensive database of multiple protein structure alignments (DBALI), 281 structurally defined ligand-binding sites, 319 structurally defined binary domain interfaces (PIBASE), 320,321 predictions of ligand-binding sites, interactions between yeast proteins, and functional consequences of human nsSNPs (LS-SNP). 199, 322, 323 A number of associated web services handle modeling of loops in protein structures (ModLoop), 324, 325 evaluation of models (ModEval), fitting of models against Small Angle X-ray Scattering (SAXS) profiles (FoXS), 326-328 modeling of ligand-induced protein dynamics such as allostery (AllosMod), 329, 330 prediction of the ensemble of conformations that best fit a given SAXS profile (AllosMod-FoXS), 331 prediction of cryptic binding sites, 332 scoring protein-ligand complexes based on a 335, 336 Compared to protein structure prediction, the attempts at automation and integration of resources in the field of docking for virtual screening are still in their nascent stages. One of the successful efforts in this direction is ZINC, 317,318 a publicly available database of commercially available drug-like compounds, developed in the laboratory of Brian Shoichet. ZINC contains more than 21 million 'ready-to-dock' compounds organized in several subsets and allows the user to query the compounds by molecular properties and constitution. The Shoichet group also provides a DOCKBLASTER service 337 that enables end-users to dock the ZINC compounds against their target structures using DOCK. 247, 338 In the future, we will no doubt see efforts to improve the accuracy of comparative modeling and ligand docking. But perhaps as importantly, the two techniques will be integrated into a single protocol for more accurate and automated docking of ligands against sequences without known structures. As a result, the number and variety of applications of both comparative modeling and ligand docking will continue to increase. CAMEO 195 http://cameo3d.org/ CASP 315 http://predictioncenter.llnl.gov Figure 5 An integrated set of resources for comparative modeling. 34 Various databases and programs required for comparative modeling and docking are usually scattered over the internet, and require manual intervention or a good deal of expertise to be useful. Automation and integration of these resources are efficient ways to put these resources in the hands of experts and non-specialists alike. We have outlined a comprehensive interconnected set of resources for comparative modeling and hope to integrate it with a similar effort in the area of ligand docking made by the Shoichet group. 317, 318 

Drug Disc

Comparative Protein Structure Modeling

Modeller 9

Comparative Protein Structure Modeling and Its Applications to Drug Discovery

Comparative Protein Structure Modeling

This article is partially based on papers by Jacobson and Sali, 201 Fiser and Sali, 339 and Madhusudhan et al. 340 We also acknowledge the funds from Sandler Family Supporting Foundation, NIH R01 GM54762, P01 GM71790, P01 A135707, and U54 GM62529, as well as Sun, IBM, and Intel for hardware gifts.