Space Layout of Low-entropy Hydration Shells Guides Protein Binding
Authors: Yang, Lin; Guo, Shuai; Hou, Chengyu; Liao, Chencheng; Li, Jiacheng; Shi, Liping; Ma, Xiaoliang; Jiang, Shenda; Zheng, Bing; Fang, Yi; Ye, Lin; He, Xiaodong
Date: 2022-02-22
Protein-protein binding enables orderly, rule-governed biological self-organization and has therefore been called a miracle of nature. It is steered by electrostatic forces, hydrogen bonding, van der Waals forces, and hydrophobic interactions. Among these physical forces, only hydrophobic interactions can act as long-range intermolecular attractions between proteins in intracellular and extracellular fluid. Low-entropy regions of the hydration shells around proteins drive the hydrophobic attraction between them, which essentially coordinates protein-protein docking in the rotational-conformational space of mutual orientations during the guidance stage of binding. Here, we developed a new method for identifying the low-entropy regions of the hydration shells of given proteins and found that the largest low-entropy regions of these hydration shells typically cover the binding sites. An analysis of experimentally determined protein complex structures reveals a regular pattern: the largest low-entropy hydration shell region of a protein matches in shape that of its partner at the binding sites. Protein-protein binding is thus found to be mainly guided by hydrophobic collapse between shape-matched low-entropy hydration shells, as supported by bioinformatics analyses of hundreds of protein complex structures. A simple algorithm was developed to predict protein binding sites precisely.
Proteins serve a variety of important functions in organisms.
A protein's intrinsic biological functions are normally expressed through precise binding with another protein (i.e., a ligand). This binding derives from the physical phenomenon of protein docking, by which a protein finds its partner protein and forms a functional complex. Protein-protein binding is a spontaneous physical contact of high specificity established between two specific protein molecules, and erroneous protein-protein binding is extremely rare in intracellular and extracellular fluid (1). The physical mechanism responsible for protein-protein binding can therefore be considered the most important mechanism of biological self-organization, functionalization, and diversity. Protein-protein binding is one of the miracles of nature that human technology finds difficult to reproduce, owing to the very large rotational-conformational space of mutual orientations that a pair of interacting proteins can potentially sample. Protein-protein docking is the prediction of the structure of a complex, given the structures of the individual proteins (2, 3). Research on protein-protein docking has grown in popularity because of its potential to predict protein-protein interactions (PPIs) (4, 5). In structural biology, protein docking research focuses on computationally simulating the molecular recognition process. A variety of conformational search strategies have been applied to protein docking (6, 7). Although search algorithms normally aim to find the complex conformation of a protein pair that minimizes the free energy of the overall system, sampling the conformational space in protein-protein docking remains challenging (8, 9). Protein-protein binding is mainly governed by electrostatic forces, hydrogen bonding, van der Waals forces, and hydrophobic interactions.
Among these physical forces, only hydrophobic interactions can be viewed as long-range intermolecular attractions between proteins in aqueous solution. The hydration shell (i.e., hydration layer) around a protein has been experimentally found to have dynamics distinct from those of bulk water up to a distance of about 1 nm (10, 11). Water molecules slow down greatly when they enter the hydration shell of a protein, so the shell has a lower entropy than bulk water (10-13). Hydrophilic groups on the protein surface are normally hydrogen bonded with the surrounding, strongly polar water molecules of the hydration shell. This prevents the surface hydrogen bond donors of one protein from randomly hydrogen bonding with, or electrostatically attracting, the hydrogen bond acceptors of another protein; that is, it prevents erroneous protein-protein binding in unsaturated aqueous solution (11, 14-18). Protein-protein binding should therefore start from the long-range hydrophobic attraction between low-entropy regions of the proteins' hydration shells (19, 20). Despite this, few studies have attempted to identify the low-entropy regions of a protein's hydration shell accurately. Recent experimental evidence has shown that hydration dynamics are highly heterogeneous over the protein surface (21-23), which indicates that low-entropy regions of protein hydration shells exist. The distribution of hydrophilic and hydrophobic groups on a protein surface determines its surface hydration dynamics (21, 22). In experimentally determined protein structures, some hydrophilic backbone carbonyl oxygen atoms and backbone amide hydrogen atoms are always exposed at the protein surface. At the same time, some surface hydrophobic side-chains protrude outward into the surrounding water and may shield these hydrophilic backbone atoms.
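The roughly 1 nm extent of the hydration shell suggests a simple geometric way to pick out shell water from a structure or simulation snapshot. The sketch below is illustrative only: modelling the shell as a hard distance cutoff, and the helper name `hydration_shell_waters`, are assumptions, not the authors' procedure.

```python
import math


def hydration_shell_waters(protein_atoms, water_oxygens, cutoff_nm=1.0):
    """Return indices of water oxygens within `cutoff_nm` of any protein atom.

    Hypothetical helper: coordinates are (x, y, z) tuples in nanometres and
    the shell is modelled as a hard distance cutoff (an assumption).
    """
    shell = []
    for i, water in enumerate(water_oxygens):
        # Brute-force O(N*M) search; fine for a toy example, slow for real systems.
        if any(math.dist(water, atom) <= cutoff_nm for atom in protein_atoms):
            shell.append(i)
    return shell


protein = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
waters = [(1.2, 0.0, 0.0), (0.0, 2.0, 0.0), (1.6, 0.0, 0.0)]
print(hydration_shell_waters(protein, waters))  # [0]
```

For a full protein in explicit solvent, a spatial index such as a k-d tree would replace the brute-force loop; the cutoff logic stays the same.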
For example, isoleucine, valine, and leucine residues are highly hydrophobic on all hydrophobicity scales, even though they contain hydrophilic backbone carbonyl oxygen atoms and amide hydrogen atoms (see Fig.1a). The hydrophobic side-chains of isoleucine, valine, and leucine residues on protein surfaces push surrounding water molecules out to van der Waals operating distances (0.3 to 0.6 nm), forming low-entropy hydration shells (i.e., cages of ordered water molecules). These distances extend beyond the typical hydrogen bonding distance (about 0.3 nm). As a result, the ordered water molecules in the low-entropy hydration shells are inhibited from fluctuating and rearranging, and are therefore prevented from frequently hydrogen bonding with the backbone carbonyl oxygen atoms and amide hydrogen atoms, as shown in Fig.1a. This indicates that highly hydrophobic side-chains can shield the hydrophilicity of the backbone atoms to a certain extent. Consequently, the hydration shells covering the backbone carbonyl oxygen atoms and amide hydrogen atoms of these hydrophobic residues can be regarded as low-entropy hydration shells, because hydrogen-bond rearrangements between the backbone hydrophilic atoms and surrounding water molecules are inhibited (see Fig.1a). Notably, the side-chains of tyrosine, tryptophan, cysteine, methionine, phenylalanine, lysine, arginine, and alanine residues also contain highly hydrophobic structures (i.e., alkyl chains and benzene rings). The hydration shells surrounding the backbone carbonyl oxygen and amide hydrogen groups of these residues should also be considered low-entropy hydration shells (see Fig.1). Tryptophan and tyrosine are classified differently, as hydrophilic or hydrophobic, by different hydrophobicity scales of amino acid residues (24-27).
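The residue-level rule above can be encoded as a simple lookup. This is a minimal sketch, assuming three-letter residue codes; the set just lists the residues named in the text, and the names `SHIELDING_RESIDUES`, `backbone_shell_is_low_entropy`, and `backbone_low_entropy_mask` are hypothetical.

```python
# Residues named in the text whose side chains are taken to shield the
# backbone carbonyl O and amide H from water exchange (hypothetical encoding).
SHIELDING_RESIDUES = frozenset({
    "ILE", "VAL", "LEU",                # highly hydrophobic on all scales
    "TYR", "TRP", "CYS", "MET", "PHE",  # side chains with alkyl or ring parts
    "LYS", "ARG", "ALA",
})


def backbone_shell_is_low_entropy(resname: str) -> bool:
    """True if the shell over this residue's backbone C=O and N-H counts as low-entropy."""
    return resname.upper() in SHIELDING_RESIDUES


def backbone_low_entropy_mask(sequence):
    """Apply the rule residue by residue to a list of three-letter codes."""
    return [backbone_shell_is_low_entropy(res) for res in sequence]


print(backbone_low_entropy_mask(["GLY", "ILE", "SER", "leu", "ASP", "TRP"]))
# [False, True, False, True, False, True]
```

Here `False` only means "not covered by this rule"; the paragraph does not itself classify residues such as glycine or serine.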
It has previously been noted that tryptophan and tyrosine express their hydrophilicity only through a small polar group in their long side-chains (the indole NH of tryptophan and the phenolic OH of tyrosine), whereas the remaining portions of the side-chains are highly hydrophobic alkyl and benzene ring structures (see Fig.1b) (28). Water molecules in the hydration shell around hydrophobic groups form a hydrogen bonding network that is much more ordered than that of free liquid water; that is, their entropy is lower (29-33). Ordered water molecules are therefore fixed in low-entropy hydration shells around the highly hydrophobic alkyl and benzene ring structures and are expelled to van der Waals operating distances. The small polar group is most likely to hydrogen bond with the low-entropy water cages induced by the hydrophobic groups rather than disrupt the ordered water network (see Fig.1b). For instance, hydrogen-bond rearrangements between water molecules and the hydrophilic OH group of the tyrosine side-chain can be inhibited by the side-chain's hydrophobic benzene ring, because the water molecules neighboring the OH group are already fixed in the ordered network. This explains why tryptophan and tyrosine are categorized as hydrophobic residues in some hydrophobicity scales (25-27). Therefore, the hydration shells surrounding the side-chains of tryptophan, tyrosine, and lysine can be regarded as low-entropy hydration shells (see Fig.1b). When these side-chains are located at the surface of a protein, their hydration shells should be considered low-entropy as a whole. It is important to note that proteins normally contain an abundance of intramolecular hydrogen bonds. For example, protein secondary structures arise from the hydrogen bonds formed between the amide proton and the carbonyl oxygen of the polypeptide backbone.
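One hypothetical way to make the "low-entropy as a whole" judgement reproducible is to score each side-chain by the fraction of its heavy atoms that are non-polar carbons. The atom lists use standard PDB side-chain heavy-atom names; the 0.75 threshold and the scoring itself are assumptions chosen for illustration, not values from the paper.

```python
# Standard PDB side-chain heavy atoms for a few residues; the first letter of
# each name is the element (C, N, or O) for these residues.
SIDE_CHAIN_ATOMS = {
    "TYR": ["CB", "CG", "CD1", "CD2", "CE1", "CE2", "CZ", "OH"],
    "TRP": ["CB", "CG", "CD1", "CD2", "NE1", "CE2", "CE3", "CZ2", "CZ3", "CH2"],
    "LYS": ["CB", "CG", "CD", "CE", "NZ"],
    "ARG": ["CB", "CG", "CD", "NE", "CZ", "NH1", "NH2"],
    "SER": ["CB", "OG"],
    "ASP": ["CB", "CG", "OD1", "OD2"],
}


def nonpolar_fraction(resname):
    """Fraction of side-chain heavy atoms that are not N or O."""
    atoms = SIDE_CHAIN_ATOMS[resname]
    nonpolar = sum(1 for name in atoms if name[0] not in ("N", "O"))
    return nonpolar / len(atoms)


def side_chain_shell_is_low_entropy(resname, threshold=0.75):
    """Hypothetical rule: a mostly non-polar side-chain gets a low-entropy shell as a whole."""
    return nonpolar_fraction(resname) >= threshold


print(sorted(r for r in SIDE_CHAIN_ATOMS if side_chain_shell_is_low_entropy(r)))
# ['LYS', 'TRP', 'TYR']
```

With this cutoff the proxy happens to select the three residues named in the text (Tyr 7/8, Trp 9/10, Lys 4/5 non-polar), while arginine (4/7) falls below it; a different threshold would change that.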
To form these intramolecular hydrogen bonds, nascent unfolded polypeptide chains must escape from hydrogen bonding with the surrounding polar water molecules, which, according to the Gibbs free energy equation, requires entropy-enthalpy compensation during protein folding. This compensation is initially driven by lateral hydrophobic collapse among the side-chains of adjacent residues in the sequence of the unfolded chain (28). As a result, water molecules cannot break the intramolecular hydrogen bonds of proteins by competing with their donors and acceptors (28). In other words, intramolecular hydrogen bonds saturate the hydrogen-bonding capacity of many surface hydrophilic groups of proteins. We can therefore consider that these intramolecularly hydrogen-bonded hydrophilic groups cannot disrupt the surrounding low-entropy water network induced by hydrophobic groups, because their hydrophilicity has already been expressed through the intramolecular hydrogen bonds. On protein surfaces, the hydration shells covering these intramolecularly hydrogen-bonded hydrophilic groups can be considered low-entropy hydration shells, because hydrophobic alkyl groups always neighbor the hydrophilic groups in protein structures (see supplementary Fig.S1). In secondary structures, for instance, the hydration shells covering the hydrogen-bonded backbone amide protons and carbonyl oxygens should be regarded as low-entropy structures (see Fig.2). On this basis, we find that typical secondary structures are normally characterized by one face fully covered by low-entropy hydration shells. This enables secondary structures to collapse together hydrophobically to form the protein tertiary structure, as illustrated in Fig.2. Based on the above analysis, we can judge whether the hydration shells surrounding specific surface hydrophilic groups of proteins are low-entropy.
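For reference, the Gibbs relation invoked here, at constant temperature and pressure, is written out below. Splitting the entropy change into chain and water contributions is an illustrative bookkeeping choice, not a decomposition taken from the paper.

```latex
\Delta G = \Delta H - T\,\Delta S,
\qquad
\Delta S = \Delta S_{\text{chain}} + \Delta S_{\text{water}},
\qquad
\text{spontaneous} \iff \Delta G < 0 .
```

In this bookkeeping, if exchanging chain-water hydrogen bonds for intramolecular ones leaves $\Delta H$ unfavorable or close to zero, folding can still proceed when the entropy gained by releasing ordered water ($\Delta S_{\text{water}} > 0$) makes $T\,\Delta S$ large enough that $\Delta G < 0$.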
Consequently, we are able to map the low-entropy regions of the hydration shell of a given protein (see Fig.3). We mapped the low-entropy hydration shell regions of 100 protein pairs and found that binding sites are typically covered by the largest low-entropy regions of the proteins' hydration shells. All 100 protein complexes were randomly selected from the PDB. The shape of the largest low-entropy hydration shell region of a given protein can be readily obtained from its projection images (28). Surprisingly, shape matching between the largest low-entropy hydration shell region of a protein and that of its partner at the binding sites was found in all the tested protein complexes. For example, Fig.3 shows the shape and size matching of the largest low-entropy hydration shell regions at the binding sites of the spike protein of the Omicron variant of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and its receptor, angiotensin-converting enzyme 2 (ACE2). The shapes and sizes of the largest low-entropy regions of protein hydration shells can thus be used as parameters to predict protein docking. By analyzing the hydrophobic attraction between low-entropy regions of the hydration shells of hundreds of protein pairs, we find that the binding sites of a pair of proteins always obey two rules governing the space layout of the low-entropy hydration shell regions at the binding sites. First, the docking position maximizes the overlap between the largest low-entropy hydration shell region of a protein and that of its partner. Second, the binding sites of the two proteins must allow sufficient interfacial contact at the docking position of the complex.
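Extracting "the largest low-entropy region" from a projection image is essentially a connected-component problem. A minimal sketch, assuming the projection has already been rasterized into a hypothetical 2D grid in which truthy cells mark low-entropy shell pixels; 4-connectivity is an assumption (8-connectivity would also merge diagonal neighbors).

```python
from collections import deque


def largest_region(grid):
    """Return the set of (row, col) cells in the largest 4-connected region.

    `grid` is a hypothetical rasterized projection: truthy cells mark
    low-entropy hydration-shell pixels, falsy cells everything else.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen, best = set(), set()
    for r0 in range(rows):
        for c0 in range(cols):
            if not grid[r0][c0] or (r0, c0) in seen:
                continue
            # Breadth-first flood fill of the component containing (r0, c0).
            region = {(r0, c0)}
            seen.add((r0, c0))
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        region.add((nr, nc))
                        queue.append((nr, nc))
            if len(region) > len(best):
                best = region
    return best


projection = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
print(sorted(largest_region(projection)))
# [(1, 3), (2, 2), (2, 3), (3, 2), (3, 3)]
```

Each cell is visited once, so the cost is linear in the number of pixels; the region's cell count then serves as its size and the cell set as its shape.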
Ordered water molecules fixed in the water cages of the low-entropy hydration shell regions drive the hydrophobic collapse of these regions between proteins, which rearranges the ordered water molecules into free liquid water and thereby increases entropy. The binding affinity between two proteins initially derives from the long-range hydrophobic effect among the low-entropy hydration shell regions of the two proteins at the binding sites, which allows the shape-matched largest low-entropy regions to collapse fully between the two proteins (34-37). To test whether protein-protein binding is guided by hydrophobic attraction between the shape-matched largest low-entropy hydration shells, we predicted the binding sites of the 100 protein complexes using the two rules above (see Figure 4 and Supplementary). All the binding sites of the 100 protein complexes were successfully predicted with the two rules, which provides strong evidence for hydrophobic collapse between the shape-matched largest low-entropy regions of the hydration shells. With additional hydrogen-bond matching, protein-protein docking can be predicted accurately. As a typical spontaneous process, the guidance stage of protein-protein binding must release Gibbs free energy as it proceeds. The early steps of protein-protein binding should not be dominated by electrostatic interactions or hydrogen bonding between the two proteins, because of the shielding effect of the polar water molecules in the hydration shells. Protein-protein binding begins with a long-range hydrophobic attraction between the low-entropy regions of the hydration shells of the individual proteins. The entropy increase caused by this hydrophobic attraction guides the docking process and provides the binding affinity (28).
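The two rules can be read as a constrained search: maximize overlap of the largest low-entropy regions, subject to sufficient contact. The sketch below is a deliberately simplified, hypothetical version, not the authors' algorithm: each protein's largest low-entropy region and its full surface footprint are given as sets of 2D grid cells in a shared projection plane, only integer translations are searched (no rotations), and `min_contact` is an invented stand-in for "sufficient interfacial contact".

```python
def best_translation(region_a, region_b, footprint_a, footprint_b, min_contact):
    """Find the integer 2D shift of protein B that best satisfies the two rules.

    Rule 1: maximize overlap between the largest low-entropy regions.
    Rule 2: accept a shift only if the surface footprints share at least
            `min_contact` cells (hypothetical proxy for sufficient contact).
    Returns (shift, overlap), or (None, 0) if no shift passes rule 2.
    """
    # Only shifts that put some cell of region_b on some cell of region_a
    # can give a non-zero overlap, so enumerate just those.
    candidates = {(ra - rb, ca - cb)
                  for ra, ca in region_a for rb, cb in region_b}
    best_shift, best_overlap = None, 0
    for dr, dc in sorted(candidates):  # sorted for deterministic tie-breaking
        moved = {(r + dr, c + dc) for r, c in region_b}
        overlap = len(moved & region_a)
        if overlap <= best_overlap:
            continue
        moved_footprint = {(r + dr, c + dc) for r, c in footprint_b}
        if len(moved_footprint & footprint_a) >= min_contact:
            best_shift, best_overlap = (dr, dc), overlap
    return best_shift, best_overlap


square_a = {(0, 0), (0, 1), (1, 0), (1, 1)}
square_b = {(5, 5), (5, 6), (6, 5), (6, 6)}
print(best_translation(square_a, square_b, square_a, square_b, min_contact=4))
# ((-5, -5), 4)
print(best_translation(square_a, square_b, square_a, square_b, min_contact=5))
# (None, 0)
```

The second call shows rule 2 vetoing every placement when the required contact cannot be met. A real docking search would also sample rotations and work on 3D surfaces.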
Our analysis shows that the binding sites of all the examined protein pairs were covered by the shape-matched largest low-entropy regions of the hydration shells, enabling these regions to collapse fully at the protein-protein interfaces during docking. Protein-protein binding is mainly guided by hydrophobic collapse between shape-matched low-entropy regions of the hydration shells of the individual proteins. Although low-entropy hydration shells around a protein are difficult to identify experimentally, the theoretical approach allows us to identify the largest low-entropy hydration shell regions around proteins, which can be used to predict protein binding sites. The space layout of the low-entropy regions of a protein's hydration shell acts like a 'lock and key' that guides protein-protein binding precisely.
In this study, many experimentally determined native protein structures were used to study the protein-protein docking mechanism. All three-dimensional (3D) structures of protein molecules were obtained from the Protein Data Bank (PDB), and the PDB IDs of these proteins are marked in all the figures. To show the distribution of low-entropy hydration shells on protein surfaces at the binding sites, we used the structural biology visualization software PyMOL to display the low-entropy hydration shell regions. The detailed space layout of residue hydrophilicity can be readily identified from the partial atomic charges assigned by the CHARMM36 force field (see supplementary) (38).
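Reading hydrophilicity from force-field charges amounts, in its simplest form, to thresholding partial atomic charges. In this minimal sketch the 0.3 e cutoff and the helper name `hydrophilic_atoms` are hypothetical, and the example charges are illustrative backbone-like values; real values should be taken from the CHARMM36 topology files.

```python
def hydrophilic_atoms(partial_charges, threshold=0.3):
    """Return atom names whose |partial charge| (in units of e) meets the threshold.

    Hypothetical criterion: strongly charged atoms are treated as hydrophilic,
    weakly charged ones (e.g. aliphatic C and H) as hydrophobic.
    """
    return [name for name, q in partial_charges.items() if abs(q) >= threshold]


# Illustrative charges for an alanine-like residue; verify against CHARMM36.
example = {"N": -0.47, "HN": 0.31, "CA": 0.07, "HA": 0.09,
           "CB": -0.27, "C": 0.51, "O": -0.51}
print(hydrophilic_atoms(example))  # ['N', 'HN', 'C', 'O']
```

The carbonyl carbon is flagged along with N, HN, and O, which shows that a bare charge cutoff is only a crude proxy; grouping atoms into polar functional groups would be a natural refinement.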
References
1. Insights into Protein-Ligand Interactions: Mechanisms, Models, and Methods
2. Protein-Protein Docking: From Interaction to Interactome
3. So Much More to Know
4. Classification and prediction of protein-protein interaction interface using machine learning algorithm
5. Recent advances in the development of protein-protein interactions modulators: mechanisms and clinical trials
6. Development and validation of a genetic algorithm for flexible docking
7. Automated docking of flexible ligands: Applications of AutoDock
8. Protein docking along smooth association pathways
9. The ClusPro web server for protein-protein docking
10. Protein structure and dynamics in nonaqueous solvents: insights from molecular dynamics simulation studies
11. Mapping hydration dynamics around a protein surface
12. Biological water: A critique
13. Entropy and dynamics of water in hydration layers of a bilayer
14. Universal Initial Thermodynamic Metastable state of Unfolded Proteins
15. Water follows polar and nonpolar protein surface domains
16. Introduction to protein crystallization
17. Fluoroolefins as Peptide Mimetics: A Computational Study of Structure, Charge Distribution, Hydration, and Hydrogen Bonding
18. Regulation of protein-ligand binding affinity by hydrogen bond pairing
19. A Hydrophobic-Interaction-Based Mechanism Triggers Docking between the SARS-CoV-2 Spike and Angiotensin-Converting Enzyme 2
20. SARS-CoV-2 Variants, RBD Mutations, Binding Affinity, and Antibody Escape
21. Dynamics and mechanism of ultrafast water-protein interactions
22. Water Dynamics in the Hydration Shells of Biomolecules
23. Spatially Heterogeneous Surface Water Diffusivity around Structured Protein Surfaces at Equilibrium
24. A simple method for displaying the hydropathic character of a protein
25. Hydrophobicity of amino acid residues: Differential scanning calorimetry and synthesis of the aromatic analogues of the polypentapeptide of elastin
26. Cation-π Interactions Involving Aromatic Amino Acids
27. IMGT standardized criteria for statistical analysis of immunoglobulin V-REGION amino acid properties
28. Entropy-Enthalpy Compensations Fold Proteins in Precise Ways
29. Protein-spanning water networks and implications for prediction of protein-protein interactions mediated through hydrophobic effects
30. Molecular biology: genes to proteins
31. Origin of hydrophobicity and enhanced water hydrogen bond strength near purely hydrophobic solutes
32. The physical origin of hydrophobic effects
33. Water is an active matrix of life for cell and molecular biology
34. A Hydrophobic-Interaction-Based Mechanism Triggers Docking between the SARS-CoV-2 Spike and Angiotensin-Converting Enzyme 2
35. Hydrophobic complementarity in protein-protein docking
36. Recent progress in understanding hydrophobic interactions
37. Principles of protein-protein recognition
38. CHARMM: the biomolecular simulation program
The authors declare no competing financial interests.