key: cord-0880473-73yhfraw
authors: Rosell, Mireia; Fernández-Recio, Juan
title: Docking approaches for modeling multi-molecular assemblies
date: 2020-06-29
journal: Curr Opin Struct Biol
DOI: 10.1016/j.sbi.2020.05.016
sha: e1448d80b4a4b1797167499f539694e158ee1331
doc_id: 880473
cord_uid: 73yhfraw

Computational docking approaches aim to overcome the limited availability of experimental structural data on protein–protein interactions, which are key in biology. The field is rapidly moving from the traditional docking methodologies for modeling of binary complexes to more integrative approaches using template-based, data-driven modeling of multi-molecular assemblies. We will review here the predictive capabilities of current docking methods in blind conditions, based on the results from the most recent community-wide blind experiments. Integration of template-based and ab initio docking approaches is emerging as the optimal strategy for modeling protein complexes and multimolecular assemblies. We will also review the new methodological advances on ab initio docking and integrative modeling.


Protein-protein interactions are key for the majority of biological functions. Proteins can form highly specific transient or permanent complexes that range from binary pairs to multi-molecular assemblies, often involving other biomolecules. Detailed structural knowledge of such complexes at the atomic level would improve our understanding of biological processes and facilitate intervention for biomedical and biotechnological purposes. For example, recently reported structural data on the dynamic assembly formed by the SARS-CoV-2 trimeric spike protein and the cell receptor ACE2 are key to understanding the molecular mechanisms of virus infectivity and can be essential for the development of new vaccines and therapeutic candidates against COVID-19 [1,2,3]. However, structural data are available for only a small fraction of the protein interactome. For instance, the total number of protein-protein interactions in human is estimated to range from 130 000 [4] to 650 000 [5], but fewer than 7000 of these interactions have an available 3D structure (Interactome3D, 2019_1 version) [6]. In this context, computational docking approaches aim to overcome the limited availability of experimental structural data. Since the first reported protein-protein docking algorithms in the early 1990s, based on Fast Fourier Transform (FFT) sampling [7], methodological developments have mostly focused on ab initio docking of binary complexes, starting from the structure of the unbound components. With the increasing availability of complex structures, attention in recent years has shifted to template-based structural modeling of complexes, based on the standard principles of homology-based modeling. The term template-based docking (as opposed to ab initio docking) is specifically used when a model is built by superimposing the structures (or models) of the unbound subunits onto the corresponding subunits of a template complex structure [18].
One advantage is that template-based modeling can be applied to multi-molecular complexes, not just to binary complexes as in ab initio docking. In addition, it has been suggested that templates are available for the large majority of cases in which interacting subunits have structural information [19]. However, the general availability of good-quality templates that could be reliably used for template-based predictions seems much lower [20]. Actually, for the majority of known interactions, only templates with remote homology are available [4], for which direct application of template-based methods leads to poor predictions [21]. Modeling multi-molecular assemblies implies additional challenges. For instance, some of the interfaces might not have available templates, in which case we could model them by ab initio docking, in combination with restraints from evolutionary data or from available experimental information. Another challenge is to identify the relevant oligomerization state of the assembly when it is different from that in the template [22], in which case alternative orientations provided by ab initio docking can be very helpful. Modeling the conformational variability of the assembly components imposes an additional difficulty. Indeed, directly taking the structure of a given subunit in another context (e.g. unbound state, different assembly or alternative oligomerization state) might lead to inaccurate models. For this, protein-protein docking and associated procedures, such as energy scoring, minimization, or flexible refinement, can be useful.

We will review here the predictive capabilities of current protein-protein docking methods in blind conditions, based on the results from the most recent CASP [23] and CAPRI [24] experiments. These tests show that a combination of template-based and ab initio docking approaches is emerging as the optimal strategy for modeling protein complexes and multimolecular assemblies. We will also review the most recent methodological novelties on ab initio docking, and new approaches for the inclusion of experimental information and integrative modeling.

Predictive capabilities of computational docking: the state-of-the-art

Traditionally, CASP has focused on the prediction of the structure of individual proteins. However, proteins are very often found as oligomeric assemblies, which adds complexity to the modeling effort. To evaluate the applicability of docking methodologies for the prediction of protein oligomeric assemblies, the last three CASP editions included a CASP-CAPRI joint experiment focused on multimeric assemblies, which are independently evaluated by the CASP and CAPRI communities. The recent CASP13-CAPRI challenge comprised a total of 20 protein oligomeric assemblies, including 14 homo-complexes and 6 hetero-complexes, which could be classified into 15 dimers and 5 multimeric assemblies [23]. In the 9 'easy' targets, there were good structural templates for the (partial or full) assembly, while for some of the remaining 11 'difficult' targets, it was possible to find remote templates for part of the assembly. The availability of templates in each case is critical to explain the predictive success of the groups. Focusing on the results for the top 10 predictions (to facilitate comparison with the reported performances of different docking methods in the literature), the best-performing group submitted acceptable (or better) models for 13 targets (65% of the cases) (Figure 1). In the 'easy' targets, the best-performing group submitted acceptable models for all cases, while in the 'difficult' targets, the best-performing group submitted acceptable models for only 4 of these targets (36% of the cases). Regarding the quality of the models, high-quality models [23] were submitted by at least one group in 78% of the 'easy' targets (with template), but only in 9% of the 'difficult' targets (no template).

On the other hand, the recent 7th CAPRI edition showed more heterogeneity in its targets, comprising 8 protein-protein, 3 protein-peptide, and 5 protein-oligosaccharide complexes, all hetero-oligomers (except for a homo-decamer), which could be classified into 10 dimers and 6 multimeric assemblies [24]. The actual number of evaluated targets was 19, because some of the interfaces in these multimeric assemblies were considered as independent targets. There were structural templates for a total of 13 target interfaces (6 protein-protein, 2 protein-peptide, and 5 protein-saccharide). This was decisive for the overall predictive success of the groups as well as for the quality of the predicted models. Overall, the maximum number of target interfaces successfully predicted by a single group was 13 (i.e. success in 68% of the cases) (Figure 1). But in cases with no available template, the best-performing groups submitted acceptable models for only 2 target interfaces (i.e. success in 33% of the cases). Regarding the quality of the models, high-quality models [24] were submitted by at least one group in 31% of the 'easy' targets (with template) and in 17% of the 'difficult' targets (no template). The 7th CAPRI edition showed that ab initio docking in cases for which there is no available template is still highly challenging, and that progress is actually coming from efficient procedures to combine template-based modeling and other docking methodologies.
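The success rates quoted for the two experiments follow directly from the target counts given in the text. As a small worked check, the sketch below recomputes them; the function and dictionary names are illustrative only, and the counts are those stated above for the best-performing group(s).

```python
def success_rate(successes: int, targets: int) -> int:
    """Percentage of targets with an acceptable (or better) model, rounded."""
    return round(100 * successes / targets)


# (targets with acceptable-or-better models, total targets), as quoted in the
# text for the best-performing group(s) in each experiment.
counts = {
    "CASP13-CAPRI": {"all": (13, 20), "easy": (9, 9), "difficult": (4, 11)},
    "7th CAPRI": {"all": (13, 19), "no template": (2, 6)},
}

rates = {
    experiment: {subset: success_rate(*pair) for subset, pair in subsets.items()}
    for experiment, subsets in counts.items()
}

for experiment, subsets in rates.items():
    for subset, rate in subsets.items():
        print(f"{experiment:13s} {subset:12s} {rate:3d}%")
```

In both experiments the overall rate (65% and 68%) is roughly twice the rate obtained when no good template is available (36% and 33%).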

The CASP and CAPRI experiments show that template-based modeling approaches are clearly the tools of choice when one can use templates of sufficient quality. However, very often only remote templates are available, which might not be good enough to provide reliable models, as discussed above [21]. In unclear situations, a relevant question is which method to choose, or how to efficiently combine these protein-protein docking approaches depending on each specific case [20]. This is even more relevant when modeling multimeric complexes, in which some interfaces might be modeled based on homologous structures, while others would need ab initio docking, as mentioned above. An updated version of the InterEvDock2 server [27] can perform template-based docking or ab initio docking with evolutionary constraints, depending on the case. But the question of how to efficiently combine template-based and ab initio docking when the reliability of the template is unclear is still open. We can obtain some hints from the recent CASP and CAPRI experiments.

In the recent CASP13-CAPRI joint assembly prediction experiment, one of the most efficient approaches was that of Fernández-Recio, based on a combination of template-based and ab initio docking followed by pyDock scoring [23], which ranked 2nd and 1st among all the CAPRI predictor and scorer groups, respectively. Models for the subunits were built by CASP-hosted servers. Then, ab initio docking was applied in all cases, using appropriate symmetry constraints or interface restraints from the literature. Additionally, when reliable templates were found, template-based models were built by superimposing all possible models of the monomers onto them. After sorting all built models by pyDock scoring, the proportion of template-based and ab initio docking models in the final set of submitted models depended on the reliability of the templates (Figure 2). The difference with other methodologies was more evident in the 'difficult' cases for which no clear template was available. For instance, in T154 ab initio docking by pyDock produced the only acceptable models among all participants. In T157, pyDock also produced some of the few successful models of all groups.
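The selection step described above (pooling template-based and ab initio models, sorting them by score, and adjusting the share of each type in the submitted set) can be sketched as follows. This is a minimal illustration and not the pyDock pipeline: the `Model` class, the `select_models` function and the `n_template` parameter are hypothetical, the energies are made-up numbers, and the text does not specify how the proportion of each model type was actually set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    source: str    # "template" or "ab_initio"
    energy: float  # hypothetical docking score; lower is assumed to be better


def select_models(models, n_submit=10, n_template=5):
    """Pick up to n_submit models, reserving n_template slots for template-based ones.

    n_template is a stand-in for template reliability (an assumption of this
    sketch): a reliable template warrants more slots, an unreliable one fewer.
    """
    n_template = min(n_template, n_submit)
    ranked = sorted(models, key=lambda m: m.energy)
    template = [m for m in ranked if m.source == "template"][:n_template]
    ab_initio = [m for m in ranked if m.source == "ab_initio"]
    chosen = template + ab_initio[: n_submit - len(template)]
    # If one pool ran short, fill the remaining slots with the best leftovers.
    if len(chosen) < n_submit:
        leftovers = [m for m in ranked if m not in chosen]
        chosen += leftovers[: n_submit - len(chosen)]
    return sorted(chosen, key=lambda m: m.energy)


# Made-up example pool: three template-based and five ab initio models.
models = [
    Model("t1", "template", -50.0), Model("t2", "template", -40.0),
    Model("t3", "template", -30.0), Model("a1", "ab_initio", -60.0),
    Model("a2", "ab_initio", -45.0), Model("a3", "ab_initio", -35.0),
    Model("a4", "ab_initio", -20.0), Model("a5", "ab_initio", -10.0),
]
submitted = select_models(models, n_submit=4, n_template=2)
print([m.name for m in submitted])
```

Setting `n_template=0` reduces the procedure to pure ab initio docking, which corresponds to the 'difficult' targets with no usable template.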

For scorers, pyDock was used to evaluate all the proposed models, and in case of reliable templates, consistency between energy-based scoring and template-based data was sought.

In the 7th CAPRI edition, predictions using template information were in general successful. Indeed, failing to use available templates, as Fernández-Recio did in targets T122, T125 interface 1/4, and T133, led to much worse predictions (although, interestingly, this group was successful in the last of these targets using only ab initio docking). This shows that it is critical to choose the optimal docking approach for each case, depending on template availability. In the rest of the targets, templates were used indirectly. In the two protein-peptide targets with good templates (T134, T135), ab initio docking with pyDock with restraints from the available templates was successful. In the six protein-saccharide targets (T126-130), ab initio docking on the cavity identified from the available templates was also successful. These represent alternative strategies to combine ab initio docking with template information.

Finally, in the scorers experiment, pyDock got the best performance when considering top 10 predictions, which shows its capabilities to evaluate complex models derived from combined approaches (template-based, ab initio, refinement) [24] (Figure 1).

The most successful approach as predictor in CASP13-CAPRI was that of the Venclovas group. They basically used template-based models when reliable templates were found, and free docking with HEX [11] otherwise. One of the reasons for their success could be the use of VoroMQA [28] for the evaluation and selection of the final models. However, they were less efficient in the scorers experiment (ranked 7th), which might indicate that this function is mostly optimized for their own pipeline for generating template-based and docking models, while its application to models generated by other sources represents a challenge to be solved. Another successful approach was the use of CONSRANK [29,30] for the ranking of docking models. CONSRANK is based on the most frequent inter-residue contacts in the ensemble of decoys, and has been updated to Clust-CONSRANK with the addition of a recently developed clustering procedure [31]. The best-performing server in CASP13-CAPRI was HDOCK [32], from Huang's group, who developed a new pairwise shape-based scoring function (LSC) for protein-protein docking to take into account long-range interactions between protein atoms [33].
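The idea of ranking decoys by how frequent their inter-residue contacts are in the whole ensemble can be illustrated with a few lines of code. This is a simplified sketch of a consensus-contact score, not the published CONSRANK implementation: the function name, the input format (one set of residue-pair contacts per decoy) and the averaging scheme are assumptions made for illustration.

```python
from collections import Counter


def consensus_rank(decoys):
    """Rank decoys by the mean ensemble frequency of their contacts.

    decoys: dict mapping a decoy name to a set of inter-residue contacts,
            each contact being a (receptor_residue, ligand_residue) tuple.
    Returns a list of (name, score) tuples, best-scoring decoy first.
    """
    n_decoys = len(decoys)
    # How many decoys contain each contact.
    counts = Counter(c for contacts in decoys.values() for c in contacts)

    def score(contacts):
        if not contacts:
            return 0.0
        return sum(counts[c] / n_decoys for c in contacts) / len(contacts)

    scored = [(name, score(contacts)) for name, contacts in decoys.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up ensemble: d1 and d2 share an interface, d3 is an outlier.
decoys = {
    "d1": {("A10", "B5"), ("A11", "B6")},
    "d2": {("A10", "B5"), ("A11", "B6"), ("A12", "B7")},
    "d3": {("A50", "B90")},
}
ranking = consensus_rank(decoys)
print(ranking)
```

Decoys whose contacts recur across the ensemble rise to the top, while isolated poses such as `d3` are ranked last. A score of this kind depends on the composition of the decoy set, which is why the sketch takes the whole ensemble as input.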

An example of the combination of template-based, ab initio docking and external data for integrative modeling of complexes. The scheme is based on the strategy followed by our group (Fernández-Recio) as predictors in the recent CASP13-CAPRI and 7th CAPRI experiments.

of multi-protein complexes in combination with other evolutionary and physico-chemical properties [38] .

The identification of correct docking poses often fails due to intrinsic errors in current scoring functions, incorrect consideration of oligomerization states, or multiple interfaces that are not usually included in docking calculations. For all these reasons, the use of external information on a given complex is often critical for successful docking predictions. The pioneering HADDOCK [16], as well as other protein-protein docking methods, such as pyDock [39], ZDOCK [40] or LightDock [41], have developed procedures to include distance restraints to improve the docking calculations. In this line, evolutionary information can be a relevant source of information for docking [42]. Indeed, the most successful docking approach in the recent 7th CAPRI edition was that of the Andreani and Guerois group. The challenging cases of this CAPRI edition encouraged them to go beyond their traditional rigid-body and InterEvScore approach, so they applied different strategies for the inclusion of evolutionary constraints, such as template-based modeling with a RosettaCM-based protocol [43], identification of conserved anchoring interface motifs when only remote homologs were available, and covariation-based modeling of interacting subunits in cases in which traditional homology-based modeling would fail [44].
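As a generic illustration of how distance restraints can be used to evaluate docking poses, the sketch below computes the fraction of restraints satisfied by a given pose. It does not reproduce the restraint handling of HADDOCK, pyDock, ZDOCK or LightDock, which differ from each other and from this example; the function name, the one-point-per-residue representation and the 8 Å cutoff are all assumptions.

```python
import math


def satisfied_fraction(receptor, ligand, restraints, cutoff=8.0):
    """Fraction of distance restraints satisfied by a docking pose.

    receptor, ligand: dicts mapping a residue id to the (x, y, z) coordinates
                      of one representative atom, in angstroms.
    restraints: list of (receptor_residue, ligand_residue) pairs expected to
                be in contact (e.g. from experiments or evolutionary data).
    cutoff: maximum distance for a restraint to count as satisfied
            (hypothetical default).
    """
    if not restraints:
        return 0.0
    hits = sum(
        1 for rec_res, lig_res in restraints
        if math.dist(receptor[rec_res], ligand[lig_res]) <= cutoff
    )
    return hits / len(restraints)


# Made-up coordinates for one pose.
receptor = {"R10": (0.0, 0.0, 0.0), "R20": (10.0, 0.0, 0.0)}
ligand = {"L5": (3.0, 4.0, 0.0), "L7": (30.0, 0.0, 0.0)}
restraints = [("R10", "L5"), ("R20", "L7")]

fraction = satisfied_fraction(receptor, ligand, restraints)
print(fraction)
```

A value of this kind could be used either to filter poses after sampling or as an extra term added to the docking score; which of the two is appropriate depends on how reliable the external data are.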

In a broader sense, integrative computational approaches that aim to efficiently use experimental structural data and additional information from a variety of sources for the structural modeling of complexes are becoming increasingly popular [45]. One example is the integration of Small-Angle X-ray Scattering (SAXS) experimental data in ab initio docking methods such as pyDock [46,47,48], HADDOCK [49], PatchDock [50,51], ATTRACT [52] or ClusPro [53]. Chemical cross-linking data have also been integrated in protein docking methods such as ZDOCK [54]. In the 7th CAPRI experiment, the use of integrative modeling approaches was blindly evaluated. Targets T150 and T151 were the same complex as T149, a challenging multi-domain dimer, for which SAXS and chemical cross-linking data were provided, respectively. Interestingly, the inclusion of restraints from SAXS data improved the models submitted by pyDock for the original target (with few successful groups), and the cross-linking data further improved pyDock submissions [55].

The most recent community-wide blind tests on the structural prediction of multi-molecular assemblies and heteromeric protein complexes (including interaction with peptides and saccharides) clearly showed that template availability, as well as any additional information on the complex, are critical for the modeling success. Several groups are focusing their efforts on developing new procedures for efficient integration of template-based and evolutionary information with ab initio docking methods, which are producing more accurate and realistic models. Additional methodological developments on protein docking include improvement of scoring functions, and better treatment of conformational flexibility during docking search, but the field is clearly moving towards an integrative analysis and modeling of protein complexes.

Nothing declared.

On the basis of these results, they conclude that template-based methods yield more accurate predictions if good templates can be found, but generally fail without such templates. They argue that template-based docking for targets with good templates and free docking for targets with worse templates is likely to increase the success rates beyond 40%, which can be further improved by additional restraints from experimental information.

Round 46, the third joint CASP-CAPRI protein assembly prediction challenge. The Round comprised a total of 20 targets including 14 homo-oligomers and 6 hetero-complexes. Eight of the homo-oligomer targets and one heterodimer comprised proteins that could be readily modeled using templates from the Protein Data Bank, often available for the full assembly. The remaining 11 targets comprised 5 homodimers, 3 heterodimers, and two higher-order assemblies. These were more difficult to model, as their prediction mainly involved 'ab-initio' docking of subunit models derived from distantly related templates. A total of 30 CAPRI groups, including 9 automatic servers, submitted on average 2000 models per target. About 17 groups participated in the CAPRI scoring rounds, offered for most targets, submitting 170 models per target. The prediction performance, measured by the fraction of models of acceptable quality or higher submitted across all predictor groups, was very good to excellent for the nine easy targets. Poorer performance was achieved by predictors for the 11 difficult targets, with medium and high quality models submitted for only 3 of these targets. This experiment highlights yet again the unmet challenge of modeling the conformational changes of the protein components that occur upon binding or that must be accounted for in template-based modeling.

Lensink MF, Nadzirin N, Velankar S, Wodak SJ: Modeling protein-protein, protein-peptide, and protein-oligosaccharide complexes: CAPRI 7th edition. Proteins 2019. (in press) This manuscript describes a summary of the seventh Critical Assessment of Predicted Interactions (CAPRI) community-wide initiative. Performance was evaluated on the basis of 36 114 models of protein complexes submitted by 57 groups (including 13 automatic servers) in prediction rounds held during the years 2016-2019 for eight protein-protein, three protein-peptide, and five protein-oligosaccharide targets with different length ligands. Models of acceptable quality, or better, were obtained for a total of six protein-protein complexes, which included four of the challenging hetero-complexes and a homo-decamer. High accuracy models were obtained for two of the three protein-peptide targets, and for one of the protein-oligosaccharide targets. The remaining protein-sugar targets were predicted with medium accuracy. This analysis indicates that progress in predicting increasingly challenging and diverse types of targets is due to closer integration of template-based modeling techniques with docking, scoring, and model refinement procedures, and to significant incremental improvements in the underlying methodologies.

Quignot C, Rey J, Yu J, Tufféry P, Guerois R, Andreani J: InterEvDock2: an expanded server for protein docking using evolutionary and biological information from homology models and multimeric inputs. Nucleic Acids Res 2018, 46:W408-W416 This paper describes the InterEvDock2 server, a major evolution of the previous InterEvDock server, which performs ab initio protein docking based on rigid-body sampling followed by consensus scoring using physics-based and statistical potentials, including the InterEvScore function specifically developed to incorporate co-evolutionary information in docking. InterEvDock2 includes automatic template search and comparative modeling of the input proteins. This new server has been benchmarked on 812 complexes in the PPI4DOCK database for which structural models of the interacting partners can be built by homology and co-evolutionary information is available. The server identifies a correct model among the top 10 consensus models in 29% of the benchmark cases, a significant improvement with respect to the individual scoring functions.


This manuscript reveals the 3D crystal structure of the SARS-CoV-2 spike receptor-binding domain (RBD) bound to the cell receptor ACE2. The structural comparison with the SARS-CoV RBD-ACE2 complex helps to identify the critical residues for ACE2 binding, and brings new insights into convergent evolution between the SARS-CoV-2 and SARS-CoV RBDs for improved binding to ACE2. They also structurally analyze the epitopes of two SARS-CoV antibodies targeting the RBD.

Veesler D: Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein

This manuscript describes the cryo-EM structure of the SARS-CoV-2 spike ectodomain trimer in two different conformational states, closed and partially open (one S B domain open), providing new insights for the design of vaccines and inhibitors of viral entry. They found that the SARS-CoV-2 S glycoprotein harbors a furin cleavage site at the boundary between the S1/S2 subunits, which is not present in SARS-CoV and other SARS-related CoVs.

Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation

An empirical framework for binary interactome mapping

Estimating the size of the human interactome

Interactome3D: adding structural details to protein networks

Molecular surface recognition: determination of

VoroMQA: assessment of protein structure quality using interatomic contact areas

Ranking multiple docking solutions based on the conservation of inter-residue contacts

CONSRANK: a server for the analysis, comparison and ranking of docking models based on inter-residue contacts

Introducing a clustering step in a consensus approach for the scoring of protein-protein docking models

HDOCK: a web server for protein-protein and protein-DNA/RNA docking based on a hybrid strategy

Pushing the accuracy limit of shape complementarity for protein-protein docking

LightDock: a new multi-scale approach to protein-protein docking

Protein-protein interaction specificity is captured by contact preferences and interface composition

This manuscript describes a new contact propensity matrix for scoring of protein-protein docking decoys called CIPS (Combined Interface Propensity for decoy Scoring), which combines interface composition with residue-residue contact preferences. They successfully compare it with other residue statistical potentials. They propose CIPS as a fast, accurate and robust method for selecting millions of docking decoys, and discuss the possibility of using it in docking-drive search

Identification of protein-protein interaction sites from docking energy landscapes

Decrypting protein surfaces by combining evolution, geometry, and molecular docking

Efficient restraints for protein-protein docking by comparison of observed amino acid substitution patterns with those predicted from local environment

Integrating statistical pair potentials into protein complex prediction

Jimé nez-García B: LightDock goes information-driven

InterEvScore: a novel coarse-grained interface scoring function using a multi-body statistical potential coupled to evolution

High-resolution comparative modeling with RosettaCM. Structure

Docking proteins and peptides under evolutionary constraints in Critical Assessment of PRediction of Interactions rounds 38 to 45. Proteins 2019. (in press) This manuscript describes the participation of the Andreani/Guerois group in CAPRI 7th, where they had the best performance among all participants, using additional strategies to include evolutionary information in docking predictions beyond their standard InterEvDock pipeline. These strategies include template-based modeling with local adjustments for sequence identity templates above 30% and larger perturbations otherwise; covariation-based structure prediction for individual protein partners

Integrative modelling of biomolecular complexes

Structural characterization of protein-protein complexes by integrating computational docking with small-angle scattering data

Structural characterization of protein-protein interactions with pyDock SAXS

pyDockSAXS: protein-protein complex structure by SAXS and computational docking

On the usefulness of ion-mobility mass spectrometry and SAXS data in scoring docking decoys

Modeling structure and dynamics of protein complexes with SAXS profiles

Macromolecular docking restrained by a small angle X-ray scattering profile

SAXS data alone can generate high-quality models of protein-protein complexes

ClusPro FMFT-SAXS: ultrafast filtering using small-angle X-ray scattering data in protein docking

Integrating cross-linking experiments with ab initio protein-protein docking

Integrative modeling of protein-protein interactions with pyDock for the new docking challenges

This work was supported by grant BIO2016-79930-R from the Spanish 'Programa Estatal I+D+I', and EFA086/15 PIREPRED from the EU European Regional Development Fund (ERDF) Program Interreg V-A Spain-France-Andorra (POCTEFA).

This manuscript describes a new pairwise shape-based scoring function (LSC) for protein-protein docking, with an exponential form to take into account long-range interactions. The function is incorporated in their FFT-based docking program. They successfully compare the predictive performance with other FFT-based docking approaches. This scoring function is implemented in HDOCK, the best-performing server in CASP13-CAPRI, and one of the best ones in CAPRI 7th.

Marze NA, Roy Burman SS, Sheffler W, Gray JJ: Efficient flexible backbone protein-protein docking for challenging targets. Bioinformatics 2018, 34:3461-3469 This paper describes the new RosettaDock 4.0, based on a new backbone sampling algorithm called Adaptive Conformer Selection (ACS) that efficiently uses conformational ensembles of proteins, and a new low resolution scoring with Motif Dock Score (MDS). The reported performance is significantly improved over previous version and other docking methods, especially for flexible cases. For highly flexible proteins, the docking procedure is successful when a suitable conformer generation method exists.