key: cord-0309965-megexs7b
authors: Xu, Gang; Wang, Yilin; Wang, Qinghua; Ma, Jianpeng
title: Studying protein-protein interaction through side-chain modeling method OPUS-Mut
date: 2022-05-16
journal: bioRxiv
DOI: 10.1101/2022.05.15.492033
sha: ff0f6a98419866e40d7d32f9a3a39cc0de2244d2
doc_id: 309965
cord_uid: megexs7b

Protein side chains are vitally important to many biological processes, such as protein-protein interaction. In this study, we evaluate the performance of our previously released side-chain modeling method OPUS-Mut, together with several other methods, on three oligomer datasets: CASP14 (11), CAMEO-Homo (65), and CAMEO-Hetero (21). The results show that OPUS-Mut outperforms the other methods, measured either over all residues or over the interfacial residues only. We also apply our method to evaluating protein-protein docking poses on a dataset, Oligomer-Dock (75), created from the top 10 predictions of ZDOCK 3.0.2. Our scoring function correctly identifies the native pose as the top-1 in 45 out of 75 targets. Unlike traditional scoring functions, our method is based on the overall side-chain packing favorableness in accordance with the local packing environment. It emphasizes the significance of side chains and provides a new and effective scoring term for studying protein-protein interaction.

Protein-protein interaction is essential for many biological systems, and it is also important in designing peptidic drugs [1]. Many protein-protein interactions are mediated by amino-acid side chains, especially those of the interfacial residues [2]. Therefore, accurate side-chain modeling of the interfacial residues is crucial. In recent years, many successful backbone-dependent side-chain modeling methods have been proposed [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15].
Sampling-based methods select rotamers from a rotamer library according to their search schemes and keep, for each residue, the rotamer with the minimal score under their scoring functions. Such methods run fast and are suitable for side-chain modeling that must be applied repeatedly during a folding process; examples include SCWRL4 [12] and FASPR [14]. However, their performance is limited by the discrete rotamers in the rotamer library and by the accuracy of the scoring function [4]. With the help of deep learning techniques, several new methods have been developed that successfully capture the local environment of each residue and substantially improve the accuracy of side-chain modeling; examples include DLPacker [5] and OPUS-Mut [15]. Scoring protein-protein docking poses is another important task in studying protein-protein interaction. Various criteria [16] have been proposed, such as force-field-based criteria [17, 18], knowledge-based criteria [19, 20], and machine-learning-based criteria [21]. Since the side chains of residues, especially those located at the interface, are crucial for protein-protein interaction, a scoring term based mainly on side chains may help score protein-protein docking poses more effectively. In this paper, we use the structures of oligomers to study protein-protein interaction. We evaluate the side-chain modeling performance of our previously released method OPUS-Mut [15], along with several other methods, on three oligomer datasets collected in this study: CASP14 (11), CAMEO-Homo (65), and CAMEO-Hetero (21). The results show that OPUS-Mut outperforms the other methods, measured either over all residues or over the interfacial residues. To evaluate the performance of OPUS-Mut in scoring protein-protein docking poses, we create a protein-protein docking pose dataset, Oligomer-Dock (75), using the top 10 predictions from ZDOCK 3.0.2 [22].
The results show that our scoring function OPUS-Mut ($S_{\mathrm{all}}$), based on the summation of the predicted side-chain RMSD (pRMSD) over all residues, correctly identifies the native pose as the top-1 in 45 out of 75 targets and ranks the native pose among the top-3 poses in 67 out of 75 targets. This indicates the effectiveness of using the overall side-chain packing favorableness, in accordance with the local packing environment, to evaluate different docking poses. In addition, we verify the performance of OPUS-Mut in studying protein mutation on an oligomeric target, the SARS-CoV-2 NSP7-NSP8 complex. The results suggest that the use of OPUS-Mut in studying protein mutation may be generalized to oligomeric complexes.

Three oligomer datasets are collected in this study. CASP14 (11) contains 11 oligomers downloaded from the CASP14 website (https://predictioncenter.org/download_area/CASP14/targets). CAMEO-Homo (65) and CAMEO-Hetero (21) contain 65 homo-oligomers and 21 hetero-oligomers, respectively, labeled as such by the CAMEO website [23] and released between November 2021 and February 2022. Oligomers with more than 3000 residues are excluded from the datasets because of GPU memory limitations. A protein-protein docking pose dataset, Oligomer-Dock (75), is also created in this study. Of the 97 oligomers in the three oligomer datasets, we exclude those with more than 4 peptide chains and retain 75. For each oligomer, we use the top 10 poses generated by ZDOCK 3.0.2 [22] as its decoys. In the ZDOCK 3.0.2 calculation for each oligomer, the last peptide chain in the PDB file is defined as the "ligand", and the remaining peptide chains are defined as the "receptor".

OPUS-Mut is a backbone-dependent side-chain modeling method that we released recently [15]. It is mainly based on OPUS-Rota4 [4], with some improvements. It was shown to outperform several other methods, measured over all residues or over core residues only, on targets with a single peptide chain. In our previous study [15], we used OPUS-Mut to study protein mutation.
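The Oligomer-Dock filtering and receptor/ligand split described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `Oligomer` record and the helper names are hypothetical, chain IDs are assumed to be listed in PDB-file order, and running ZDOCK itself is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_CHAINS = 4  # oligomers with more than 4 peptide chains are excluded


@dataclass
class Oligomer:
    """Hypothetical record: a target name and its chain IDs in PDB-file order."""
    name: str
    chain_ids: List[str]


def select_docking_targets(oligomers: List[Oligomer]) -> List[Oligomer]:
    """Keep only oligomers with at most MAX_CHAINS peptide chains."""
    return [o for o in oligomers if len(o.chain_ids) <= MAX_CHAINS]


def split_receptor_ligand(oligomer: Oligomer) -> Tuple[List[str], str]:
    """Treat the last chain in the file as the ligand and the rest as the receptor."""
    if len(oligomer.chain_ids) < 2:
        raise ValueError("an oligomer needs at least two chains")
    *receptor, ligand = oligomer.chain_ids
    return receptor, ligand


targets = select_docking_targets([
    Oligomer("dimer", ["A", "B"]),
    Oligomer("hexamer", ["A", "B", "C", "D", "E", "F"]),
    Oligomer("tetramer", ["A", "B", "C", "D"]),
])
print([t.name for t in targets])          # ['dimer', 'tetramer']
print(split_receptor_ligand(targets[1]))  # (['A', 'B', 'C'], 'D')
```

Keeping the split as a separate function mirrors the text: the chain filter defines which targets enter Oligomer-Dock, while the ligand/receptor assignment only matters when preparing the docking run.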
Briefly, as shown in the lower green panel of Figure 1, by comparing the predicted wild-type side chains with the predicted mutant side chains, we can infer the extent of structural perturbation and identify the affected residues as those whose side chains shift significantly upon the mutation. From the extent of side-chain structural perturbation, we can also infer the minimally disturbing mutations, which may be used to construct a protein with relatively low sequence homology to the wild type but a similar structure. The affected residues may also be used to infer possible functional changes when those functions are related to particular residues. [ Figure 1 ] For each residue, besides the predicted side-chain conformation, OPUS-Mut also outputs the predicted root mean square deviation (pRMSD) of its side-chain prediction (upper green panel in Figure 1). To this end, a classification node is trained to learn, for each residue, the RMSD between the predicted side chain and its native counterpart. The pRMSD ranges from 0 to 1 and is divided into 20 bins, and cross-entropy loss is used for training. In addition, OPUS-Mut adopts a 3DCNN module [5] to capture the local environment of each residue, so it responds to changes in the local environment with high sensitivity. A lower pRMSD value means that OPUS-Mut predicts the side chain of that residue with higher confidence in accordance with its local environment. In this study, we use the summation of pRMSD as an indicator of the overall side-chain packing favorableness of a protein structure, i.e., the similarity of its local packing environment to the native packing environment. In studying docking poses, we denote the summation of pRMSD over all residues as $S_{\mathrm{all}}$, the summation over interfacial residues as $S_{\mathrm{inter}}$, and the summation over other residues as $S_{\mathrm{other}}$.
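As a rough illustration of the pRMSD classification target and the three summations, the sketch below bins a true RMSD into 20 classes, computes a cross-entropy loss, and sums per-residue pRMSD over all, interfacial, and other residues. Equal-width bins, clipping values above 1 into the last bin, and reading out a scalar pRMSD as the probability-weighted bin center are assumptions; the text does not specify how OPUS-Mut turns its class output into a single value. All function names and dictionary keys are hypothetical labels.

```python
import numpy as np

N_BINS = 20
BIN_WIDTH = 1.0 / N_BINS  # assumption: the 0-1 pRMSD range is split into equal-width bins
BIN_CENTERS = (np.arange(N_BINS) + 0.5) * BIN_WIDTH


def rmsd_to_bin(rmsd: np.ndarray) -> np.ndarray:
    """Class label (0..19) for each true side-chain RMSD; values >= 1 fall in the last bin."""
    idx = np.floor(np.clip(rmsd, 0.0, 1.0) / BIN_WIDTH).astype(int)
    return np.minimum(idx, N_BINS - 1)


def cross_entropy(probs: np.ndarray, labels: np.ndarray) -> float:
    """Mean cross-entropy between predicted bin probabilities (L, 20) and labels (L,)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))


def expected_prmsd(probs: np.ndarray) -> np.ndarray:
    """Hypothetical read-out: probability-weighted bin center for each residue."""
    return probs @ BIN_CENTERS


def prmsd_sums(prmsd: np.ndarray, is_interfacial: np.ndarray) -> dict:
    """Sum per-residue pRMSD over all, interfacial, and other residues."""
    return {
        "S_all": float(prmsd.sum()),
        "S_inter": float(prmsd[is_interfacial].sum()),
        "S_other": float(prmsd[~is_interfacial].sum()),
    }


labels = rmsd_to_bin(np.array([0.0, 0.12, 0.99, 1.7]))
print(labels)  # [ 0  2 19 19]
```

Because lower pRMSD means higher confidence, a smaller sum over a set of residues is read as a more native-like packing environment for that set.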
In our previous study, for targets with a single peptide chain, we demonstrated that OPUS-Mut outperforms several other backbone-dependent side-chain modeling methods, measured over all residues or over core residues only. In this study, we evaluate the performance of OPUS-Mut and several other backbone-dependent side-chain modeling methods on oligomeric targets. For comparison, we also use OPUS-Mut to model each peptide chain separately, i.e., to model the side-chain conformations without considering the effects of the other peptide chains. We name this single-chain approach OPUS-Mut-s. The code and pre-trained models of OPUS-Mut and the four datasets used in this paper can be found at http://github.com/OPUS-MaLab/opus_mut. They are freely available for academic use only. We compare the side-chain modeling performance of OPUS-Mut with that of SCWRL4 [12], OSCAR-star [13], and DLPacker [5] on the three oligomer datasets CASP14 (11), CAMEO-Homo (65), and CAMEO-Hetero (21). In terms of the residue-wise percentage of correct predictions with a tolerance of 20° for all side-chain dihedral angles (χ1 to χ4), OPUS-Mut outperforms the other methods, measured over all residues (Figure 2a) or over the residues located at the interfaces between different peptide chains (Figure 2b). In this study, residues with at least one nearby residue (Cα-Cα distance < 10 Å) located on another peptide chain are defined as interfacial residues, and the remaining residues are defined as other residues. [ Figure 2 ] As examples, Figure 3 shows two cases of OPUS-Mut side-chain modeling results on interfacial residues, together with the corresponding experimentally determined crystal structures. [ Figure 3 ] To evaluate the influence of the other peptide chains on a particular peptide chain, we use OPUS-Mut-s, which models each peptide chain separately without taking the other peptide chains into consideration.
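The interfacial-residue definition and the 20° accuracy criterion can be sketched as follows, assuming Cα coordinates in Å and χ angles in degrees, with NaN marking dihedrals a residue does not have. The helper names are hypothetical. The sketch uses periodic angle differences but does not treat symmetric side chains (for example χ2 of Phe or Tyr) specially, which many evaluation protocols do, and it computes all pairwise distances by brute force, which is adequate for moderately sized complexes.

```python
import numpy as np

CUTOFF_A = 10.0  # Cα-Cα distance cutoff in Å
TOL_DEG = 20.0   # tolerance for every χ angle


def interfacial_mask(ca_xyz: np.ndarray, chain_ids: np.ndarray, cutoff: float = CUTOFF_A) -> np.ndarray:
    """True for residues with at least one residue on another chain within `cutoff` (Cα-Cα)."""
    dist = np.linalg.norm(ca_xyz[:, None, :] - ca_xyz[None, :, :], axis=-1)
    other_chain = chain_ids[:, None] != chain_ids[None, :]
    return np.any((dist < cutoff) & other_chain, axis=1)


def angular_error(pred: np.ndarray, true: np.ndarray) -> np.ndarray:
    """Absolute angle difference in degrees, taking 360° periodicity into account."""
    d = np.abs(pred - true) % 360.0
    return np.minimum(d, 360.0 - d)


def percent_correct(pred_chis, true_chis, mask=None, tol: float = TOL_DEG) -> float:
    """Percentage of residues whose every defined χ angle is within `tol` of the native value.

    pred_chis, true_chis: (L, 4) arrays of χ1..χ4 in degrees, NaN where a χ angle does not exist.
    mask: optional boolean (L,) array, e.g. from interfacial_mask, restricting the residues counted.
    """
    pred_chis, true_chis = np.asarray(pred_chis, float), np.asarray(true_chis, float)
    defined = ~np.isnan(true_chis)
    within = angular_error(pred_chis, true_chis) <= tol
    correct = np.where(defined, within, True).all(axis=1)
    counted = defined.any(axis=1)  # residues without χ angles (Gly, Ala) are skipped
    if mask is not None:
        counted &= np.asarray(mask, bool)
    return 100.0 * float(correct[counted].mean())
```

Passing the interfacial mask to `percent_correct` gives the interfacial-residue score; passing its negation gives the score for the other residues.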
As shown in Table 1, OPUS-Mut outperforms OPUS-Mut-s measured over all residues. The performance on other residues is almost the same; the differences lie mainly in the interfacial residues, for which OPUS-Mut performs significantly better than OPUS-Mut-s. [ Table 1 ] For studying protein-protein docking poses, we first examine the effect of the partner peptide chain(s) in oligomers. On the three oligomer datasets, we compare the summation of pRMSD obtained by OPUS-Mut with that obtained by OPUS-Mut-s, which does not take the effects of other peptide chains into consideration. As shown in Table 2, the average pRMSD summation of the targets in each oligomer dataset obtained by OPUS-Mut is lower than that obtained by OPUS-Mut-s. [ Table 2 ] For further investigation, we examine the performance of OPUS-Mut in distinguishing the native docking pose from 10 predicted docking pose decoys for the targets in Oligomer-Dock (75). The results of ZRANK [18] are also listed for comparison. Before running ZRANK, we use addh in Chimera 1.14 [24] to add hydrogens to each PDB file, and then add a "TER" line between the ligand and receptor atom coordinates. As shown in Table 3, using the summation of pRMSD over all residues as a scoring function (OPUS-Mut ($S_{\mathrm{all}}$)) outperforms ZRANK both in identifying the native pose as the top-1 and in ranking it among the top-3 poses. Since the interfacial residues vary across docking poses, we therefore use

$$\Delta S_{\mathrm{inter}} = \frac{1}{N}\sum_{n=1}^{N}\left(\mathrm{pRMSD}_n^{1} - \mathrm{pRMSD}_n^{2}\right)$$

as a scoring function (OPUS-Mut ($\Delta S_{\mathrm{inter}}$)) to measure the improvement in interfacial side-chain packing favorableness upon docking. For each pose, $N$ is the number of interfacial residues, $\mathrm{pRMSD}_n^{1}$ denotes the pRMSD of residue $n$ predicted by OPUS-Mut-s, and $\mathrm{pRMSD}_n^{2}$ denotes the pRMSD of residue $n$ predicted by OPUS-Mut. We assume that a larger $\Delta S_{\mathrm{inter}}$ indicates a better docking pose. Using OPUS-Mut ($\Delta S_{\mathrm{inter}}$) as a scoring function, as shown in Table 3, our method correctly identifies the native pose as the top-1 in 47 out of 75 targets and ranks the native pose among the top-3 poses in 61 out of 75 targets. [ Table 3 ] As examples, we show docking pose evaluation results on target T1080o in Figure 4d.
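The interfacial score described above (the mean decrease in pRMSD over a pose's interfacial residues when the partner chains are included, i.e., OPUS-Mut-s minus OPUS-Mut) and the top-1/top-3 counting over the native pose plus its 10 decoys can be sketched as follows. Function names, the dictionary of pose scores, and the choice to give a pose with no interfacial residues a score of 0 are assumptions for illustration. For the all-residue pRMSD summation, lower is better, so the ranking helper takes a `higher_is_better` flag.

```python
from typing import Dict, Iterable

import numpy as np


def delta_s_inter(prmsd_single: np.ndarray, prmsd_complex: np.ndarray, is_interfacial: np.ndarray) -> float:
    """Mean pRMSD decrease over a pose's interfacial residues (single-chain minus complex).

    prmsd_single: per-residue pRMSD with each chain modeled alone (OPUS-Mut-s).
    prmsd_complex: per-residue pRMSD with the whole complex modeled (OPUS-Mut).
    A larger value is taken to mean a better docking pose.
    """
    if not is_interfacial.any():
        return 0.0  # assumption: a pose without interfacial residues gets a neutral score
    diff = prmsd_single[is_interfacial] - prmsd_complex[is_interfacial]
    return float(diff.mean())


def native_rank(scores: Dict[str, float], native: str = "native", higher_is_better: bool = True) -> int:
    """1-based rank of the native pose among all scored poses (native plus decoys)."""
    ordered = sorted(scores, key=lambda pose: scores[pose], reverse=higher_is_better)
    return ordered.index(native) + 1


def count_top_k(ranks: Iterable[int], k: int) -> int:
    """Number of targets whose native pose is ranked within the top k."""
    return sum(1 for r in ranks if r <= k)
```

Averaging over the N interfacial residues, rather than summing, keeps poses with larger or smaller interfaces on a comparable scale, which is the motivation the text gives for this score.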
Moreover, in Figure 4b, the docking pose is close to the native state (DockQ [25] score 0.948); ZRANK assigns this pose a score of -841.8, lower than its score of -793.2 for the native state, indicating that ZRANK does not identify the correct native pose in this case, whereas "Ours" does. [ Figure 4 ] Two conserved oligomer interfaces of NSP7 and NSP8 have been studied by Biswal et al. [26]. NSP7 and NSP8 belong to a complex of non-structural proteins (NSPs). On interface I, the mutations NSP7 F49A, NSP7 M52A, NSP7 L56A, and NSP8 F92A impair NSP7-NSP8 association; on interface II, the mutations NSP7 C8G, NSP7 V11A, NSP8 M90A, and NSP8 M94A impair hetero-tetramer formation. The results also show that interface II not only maintains the hetero-tetrameric assembly of NSP7-NSP8 but also helps stabilize the hetero-dimeric assembly of NSP7-NSP8. The mutation NSP7 N37V does not appreciably affect the stability of the NSP7-NSP8 hetero-tetramer, but it leads to a modest disruption of the NSP7-NSP8-NSP12 complex. According to Subissi et al. [27], the mutations NSP8 K82A and NSP8 S85A do not affect the NSP8-NSP12 interaction but lead to a loss of activity. In this study, we download the SARS-CoV-2 NSP7-NSP8 complex (PDB: 7JLT) [26]. We then substitute the residues according to the mutations mentioned above and reconstruct their side chains with OPUS-Mut. As in our previous study, we define affected residues as those whose mean absolute error of all predicted side-chain dihedral angles (χ1 to χ4) between the wild type and the mutant is greater than 5°. The side chains of the remaining residues are deemed relatively unshifted. The affected residues for each mutation are listed in Table 4. As shown in Table 4, the predictions for most mutations are consistent with the experimental observations; the exception is NSP7 N37V, which experimentally does not appreciably affect the stability of the NSP7-NSP8 hetero-tetramer, although it causes a modest disruption of the NSP7-NSP8-NSP12 complex.
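The affected-residue criterion above can be sketched as follows. It is an illustration under stated assumptions: χ angles are in degrees with NaN for undefined dihedrals, differences are taken modulo 360°, and only χ angles defined in both the wild-type and mutant predictions are averaged, so at the mutated position itself, where the residue type changes, only the shared χ angles contribute. The function name and residue labels such as "A:10" are hypothetical.

```python
import numpy as np

THRESHOLD_DEG = 5.0  # mean absolute χ change above which a residue counts as affected


def affected_residues(chi_wt, chi_mut, residue_ids, threshold: float = THRESHOLD_DEG):
    """Residues whose mean absolute change over predicted χ1..χ4 exceeds `threshold` degrees.

    chi_wt, chi_mut: (L, 4) arrays of predicted χ angles (degrees) for the wild type and the
    mutant, NaN where a χ angle does not exist. Only χ angles defined in both are averaged.
    """
    chi_wt, chi_mut = np.asarray(chi_wt, float), np.asarray(chi_mut, float)
    d = np.abs(chi_wt - chi_mut) % 360.0
    err = np.minimum(d, 360.0 - d)  # periodic difference (assumption)
    defined = ~np.isnan(err)
    counts = defined.sum(axis=1)
    sums = np.where(defined, err, 0.0).sum(axis=1)
    mae = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return [rid for rid, m in zip(residue_ids, mae) if m > threshold]
```

For example, a residue whose predicted χ1 and χ2 move by 2° and 10° has a mean change of 6° and is reported as affected, whereas one that moves by 2° in its only χ angle is not.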
Our predictions show that NSP7 N37V may cause significant shifts in several residues involved in interface I, interface II, and the NSP8-NSP12 interface, which implies that it could affect the stability of the NSP7-NSP8 hetero-tetramer and the formation of the NSP7-NSP8-NSP12 complex. [ Table 4 ] Protein side chains, especially those located at interfaces, are crucial for protein-protein interaction. In this study, we investigate protein-protein interaction through side-chain modeling. We evaluate the performance of several backbone-dependent side-chain modeling methods on three oligomer datasets. The results show that our previously released method OPUS-Mut [15] outperforms the other methods, measured over all residues or over interfacial residues (Figure 2), and its side-chain predictions for residues located at the interfaces between different peptide chains are very close to the experimentally determined crystal structures (Figure 3). When the influence of the partner peptide chain(s) in an oligomer is omitted and the side chains of each peptide chain are modeled separately, the modeling accuracy decreases, especially on interfacial residues (OPUS-Mut-s, Table 1). This result indicates that the side chains of interfacial residues may undergo conformational changes upon protein-protein association, and it also demonstrates the sensitivity of OPUS-Mut to changes in the local environment. OPUS-Mut outputs the predicted root mean square deviation (pRMSD) of its predicted side chain for each residue. For a particular residue, a lower pRMSD value indicates that OPUS-Mut predicts its side chain with higher confidence in accordance with its local environment. In this study, we use the summation of pRMSD as an indicator of the overall side-chain packing favorableness of a protein structure, i.e., the similarity of its local packing environment to the native packing state.
The results in Table 2 indicate that protein-protein interaction may bring a more favorable local packing environment to the interfacial residues. We compare the performance of identifying the native docking pose based on pRMSD from OPUS-Mut with that of ZRANK on Oligomer-Dock (75). Using the summation of pRMSD over all residues (OPUS-Mut ($S_{\mathrm{all}}$)) as a scoring function achieves better results than ZRANK, both in correctly identifying the native pose and in ranking the native pose among the top three poses (Table 3). Note that using the summation of pRMSD over interfacial residues as a scoring function (OPUS-Mut ($S_{\mathrm{inter}}$)) may introduce a bias, since the interfacial residues vary across docking poses. Therefore, we recommend OPUS-Mut ($S_{\mathrm{all}}$) for scoring different poses. Together with the examples shown in Figure 4, these results suggest that our scoring function OPUS-Mut ($S_{\mathrm{all}}$), which is based on the overall side-chain packing favorableness in accordance with the local packing environment, may be an effective term for scoring protein-protein docking poses. We also verify the performance of OPUS-Mut in studying protein mutation on an oligomeric target, the SARS-CoV-2 NSP7-NSP8 complex. As shown in Table 4, most of our results are consistent with the experimental results, which indicates that the use of OPUS-Mut in studying protein mutation may be generalized to oligomeric targets.

Figure 1. Two applications of the backbone-dependent side-chain modeling method OPUS-Mut. The application of OPUS-Mut to studying protein mutation on targets with a single peptide chain (lower green panel) has been evaluated in our previous work. In this paper, we focus on its application to studying protein-protein interaction (upper green panel) and to studying protein mutation on oligomeric complexes.
Figure 2. The residue-wise percentage of correct predictions with a tolerance of 20° for all side-chain dihedral angles (χ1 to χ4) of different methods on three oligomer datasets. (a) Results measured over all residues. (b) Results measured over interfacial residues.

Table 3. Number of Oligomer-Dock (75) targets whose native pose is ranked top-1 (TOP1) or among the top three poses (TOP3).

        OPUS-Mut ($S_{\mathrm{all}}$)   OPUS-Mut ($\Delta S_{\mathrm{inter}}$)   ZRANK
TOP1    45                              47                                       16
TOP3    67                              61                                       56

Table 4. Affected residues of different mutations predicted by OPUS-Mut, cataloged by their locations. Experimentally, on interface I, the mutations NSP7 F49A, NSP7 M52A, NSP7 L56A, and NSP8 F92A impair NSP7-NSP8 association; on interface II, the mutations NSP7 C8G, NSP7 V11A, NSP8 M90A, and NSP8 M94A impair NSP7-NSP8 hetero-tetramer formation. Interface II maintains the hetero-tetrameric assembly of NSP7-NSP8 and also stabilizes its hetero-dimeric assembly. The mutation NSP7 N37V does not significantly destabilize the NSP7-NSP8 hetero-tetramer but causes a modest disruption of the NSP7-NSP8-NSP12 complex. The mutations NSP8 K82A and NSP8 S85A do not affect the NSP8-NSP12 interaction but result in activity loss.
Imbalance Data Processing Strategy for Protein Interaction Sites Prediction
Progress and challenges in predicting protein interfaces
OPUS-Rota3: Improving Protein Side-Chain Modeling by Deep Neural Networks and Ensemble Methods
OPUS-Rota4: a gradient-based protein side-chain modeling framework assisted by deep learning-based predictors
Deep learning for prediction of amino acid side chain conformations in proteins
Improved side-chain modeling by coupling clash-detection guided iterative search with rotamer relaxation
FASPR: an open-source tool for fast and accurate protein side-chain packing
Protein side chain modeling with orientation-dependent atomic force fields derived by series expansions
OPUS-Rota: a fast and accurate method for side-chain modeling
RASP: rapid modeling of protein side chain conformations
SIDEpro: a novel machine learning approach for the fast and accurate prediction of side-chain conformations
Improved prediction of protein side-chain conformations with SCWRL4
Fast and accurate prediction of protein side-chain conformations
OPUS-Rota2: An Improved Fast and Accurate Side-Chain Modeling Method
OPUS-Mut: studying the effect of protein mutation through side-chain modeling
Protein-Protein Docking: Past, Present, and Future
HawkRank: a new scoring function for protein-protein docking based on weighted energy terms
ZRANK: reranking protein docking predictions with an optimized energy function
PIZSA): an empirical scoring scheme for evaluation of protein-protein interactions
An iterative knowledge-based scoring function for protein-protein recognition
IRaPPA: information retrieval based integration of biophysical models for protein assembly selection
Accelerating protein docking in ZDOCK using an advanced 3D convolution library
Continuous Automated Model EvaluatiOn (CAMEO) complementing the critical assessment of structure prediction in CASP12
UCSF Chimera - A visualization system for exploratory research and analysis
DockQ: A Quality Measure for Protein-Protein Docking Models
Two conserved oligomer interfaces of NSP7 and NSP8 underpin the dynamic assembly of SARS-CoV-2 RdRP
One severe acute respiratory syndrome coronavirus protein complex integrates processive RNA polymerase and exonuclease activities

Table 2. The average values of the summation of pRMSD on three oligomer datasets.

Figure 4. Docking pose evaluation results on T1080o. The structures of the native receptor are shown in blue, and the structures of the native ligand are shown in red. The ligand structures predicted by ZDOCK 3.0.2 are shown in yellow. We use "Ours" to denote the score from OPUS-Mut ($S_{\mathrm{all}}$). A pose with a lower score is closer to the native pose.

Table 1. The residue-wise percentage of correct predictions with a tolerance of 20° for all side-chain dihedral angles (χ1 to χ4) of OPUS-Mut and OPUS-Mut-s on three oligomer datasets. Columns include Interfacial residues and Other residues.