key: cord-0468947-8nflpnmh authors: Li, Jiacheng; Ma, Xiaoliang; Zhang, Hongchi; Hou, Chengyu; Shi, Liping; Guo, Shuai; Liao, Chenchen; Zheng, Bing; Ye, Lin; Yang, Lin; He, Xiaodong title: The role of hydrophobic interactions in folding of β-sheets date: 2020-09-16 journal: nan DOI: nan sha: f0975ba62de933eb2c424b9d8fc5916049e444c6 doc_id: 468947 cord_uid: 8nflpnmh Exploring the protein-folding problem has been a long-standing challenge in molecular biology. Protein folding depends strongly on the folding of secondary structures, which paves the way for a native folding pathway. Here, we demonstrate that a large hydrophobic surface area covering most side-chains on one side or the other of adjacent β-strands of a β-sheet is prevalent in almost all experimentally determined β-sheets. This indicates that folding of β-sheets is most likely triggered by multistage hydrophobic interactions among neighboring side-chains of unfolded polypeptides, which enables β-sheets to fold reproducibly following explicit physical folding codes in aqueous environments. β-turns often contain five types of residues characterized by relatively small exposed hydrophobic proportions of their side-chains, which we explain by these residues blocking the hydrophobic effect among side-chains neighboring in sequence. The temperature dependence of β-sheet folding is thus attributed to the temperature dependence of the strength of hydrophobicity. The hydrophobic-effect-based mechanism for β-sheet folding is verified by bioinformatics analyses of thousands of experimentally determined structures. The folding codes in the amino acid sequence that dictate the formation of a β-hairpin can be deciphered by evaluating the hydrophobic interaction among side-chains of an unfolded polypeptide starting from a β-strand-like thermodynamic metastable state.
Proteins are the basis of life on Earth and serve nearly all the functions in the essential biochemistry of life. Each nascent protein exists as an unfolded polypeptide or random coil when it is translated from an mRNA sequence into a linear chain of residues by a ribosome. The intrinsic biological functions of a protein are determined by its native three-dimensional (3D) structure, which derives from the physical process of protein folding 1, by which a polypeptide folds into its characteristic and functional 3D structure in an expeditious and reproducible manner. Protein folding can thereby be considered the most important mechanism, principle, and motivation of biological existence, functionalization, diversity, and evolution [2] [3] [4]. Based on the complexity of protein folding, the protein-folding problem has been summarized in three unanswered questions 1: (i) What is the physical folding code in the amino acid sequence that dictates the particular native 3D structure? (ii) What is the folding mechanism that enables proteins to fold so quickly? (iii) Is it possible to devise a computer algorithm to effectively predict a protein's native structure from its amino acid sequence? A further essential question is why protein folding depends so strongly on the solvent (water or lipid bilayer) 5 and on the temperature 6. The protein-folding problem was brought to light over 60 years ago. In particular, since Anfinsen shared the 1972 Nobel Prize in Chemistry for his work revealing the connection between the amino acid sequence and the native conformation 7, understanding protein sequence-structure relationships has become the most fundamental task in molecular and structural biology 8. Protein folding is one of the miracles of nature that human technology finds quite difficult to follow, owing to the very large number of rotational degrees of freedom in an unfolded polypeptide chain.
In the 1960s, Cyrus Levinthal pointed out that the apparent contradiction between the astronomical number of possible conformations of a protein chain and the fact that proteins fold quickly into their native structures should be regarded as a paradox (Levinthal's paradox) 9, so there must be mechanisms that allow polypeptide chains to find the native states encoded in their sequences. As stated in Anfinsen's dogma, the well-defined native 3D structure of a small globular protein is uniquely encoded in its primary structure (the amino acid sequence), is kinetically reproducible and stable under a range of physiological conditions, and can therefore be considered a matter of certainty. For many proteins or protein domains, relatively rapid and efficient refolding can be observed in vitro, so proteins may be regarded as "folding themselves" along explicit folding pathways 1. Protein folding is considered a free-energy minimization or relaxation process that is guided mainly by the following physical factors: (i) formation of intramolecular hydrogen bonds, (ii) van der Waals interactions, (iii) electrostatic interactions, (iv) hydrophobic interactions, (v) chain entropy of the protein, and (vi) thermal motions 1, 10. Among them, the hydrophobic effect is normally thought to play a decisive role 11. Currently, the generally accepted hypothesis in the field conceives of protein folding in a funnel-shaped energy landscape, where every possible conformation is represented by a free-energy value. The rapid folding of proteins has been attributed to random thermal motions that cause conformational changes leading energetically downhill toward the native structure, which corresponds to the free-energy minimum under the solution conditions 1, 10. However, there are both enthalpic and entropic contributions to the free energy of a protein that change with temperature and so give rise to heat denaturation and, in some cases, cold denaturation 12.
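The enthalpic and entropic contributions mentioned above can be written with the standard Gibbs relation at constant temperature and pressure. The symbols are introduced here for illustration only and are not the authors' notation: $\Delta G_{\mathrm{fold}}$ is the free-energy change on folding, $\Delta H_{\mathrm{fold}}$ and $\Delta S_{\mathrm{fold}}$ are the corresponding enthalpy and entropy changes of protein plus solvent, and $T$ is the absolute temperature.

```latex
\Delta G_{\mathrm{fold}}(T) = \Delta H_{\mathrm{fold}}(T) - T\,\Delta S_{\mathrm{fold}}(T)
```

The folded state is favored where $\Delta G_{\mathrm{fold}}(T) < 0$. Because both terms vary with temperature, the sign can change at high temperature and, in some cases, also at low temperature, which corresponds to heat and cold denaturation respectively.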
So far, this hypothesis has not been able to decipher the folding code, and it is therefore generally not possible to read a sequence and predict what shape it will adopt. The layer of water interacting with the protein surface is often referred to as the protein hydration layer (also sometimes called the hydration shell) and is fundamental to the structural stability of proteins, because non-aqueous solvents in general denature proteins 13. The hydration layer around a protein has been found to have dynamics distinct from bulk water out to a distance of 1 nm, and water molecules slow down greatly when they encounter a protein 14. Thus, hydrophilic side-chains of proteins are normally hydrogen bonded to surrounding water molecules in aqueous environments, which prevents the surface hydrophilic side-chains from randomly hydrogen bonding with one another 14-16. This is the reason why proteins usually do not aggregate or crystallize in unsaturated aqueous solutions 17, even though the solvent-facing surface of proteins is usually composed of predominantly hydrophilic regions. Experiments have also shown that secondary structures of proteins (such as α-helices and β-sheets) are stabilized by hydrogen bonds between the N-H groups and C=O groups of the main chain 18, 19. This also indicates that the shielding effect of surrounding water molecules prevents hydrophilic side-chains from interfering with the formation of secondary structures during protein folding. Thus, water molecules should be able to saturate the hydrogen-bonding capacity of the hydrophilic side-chains and the main chain before protein folding 14-16, because water molecules are strongly polar 20, 21. This is the reason why intrinsically disordered proteins (IDPs) and regions (IDRs) can make up a significant part of the proteome 22.
Before the folding of secondary structures, the early steps of protein folding may not be directly dominated by the formation of intramolecular hydrogen bonds, owing to the shielding effect of surrounding water molecules. Thus, the problem may lie in our limited understanding of the hydrophobic interactions among neighboring side-chains of unfolded proteins at the early steps of folding, given the lack of awareness of the importance of the shielding effect of water. Almost all experimentally determined native tertiary structures of water-soluble proteins have a hydrophobic core in which hydrophobic side-chains are buried away from water 23-25. Polar residues, in contrast, interact favorably with water, so the solvent-facing surface of the peptide is usually composed of predominantly hydrophilic regions 26. Minimizing the number of hydrophobic side-chains exposed to water, namely hydrophobic collapse, has thus been regarded as one of the most important driving forces of the protein folding process 27. Experimental methods such as laser temperature-jump techniques and single-molecule techniques have revealed that protein folding first leads to the formation of secondary structures (α-helices and β-strands), and that the tertiary structure is then formed by the folding of these secondary structures 28. It is likely that the nascent polypeptide forms initial secondary structure by creating localized regions of predominantly hydrophobic residues owing to the hydrophobic effect 29. The secondary structures interact with water, which places thermodynamic pressure on these regions; they then aggregate or "collapse" into a tertiary conformation with a hydrophobic core 26.
Therefore, protein folding depends strongly on the folding of secondary structures, which hierarchically paves a native folding pathway that leads to the formation of correct tertiary structures and causes conformational changes leading energetically downhill toward the native globular structure with the minimum free energy. Thus, deciphering the folding codes in the amino acid sequence that dictate secondary-structure formation should be regarded as a key to cracking the protein-folding problem. Among the types of secondary structure in proteins, the β-sheet is the most prevalent. If the controlling mechanism for β-sheet folding can be revealed, it would remarkably advance the solution of the protein-folding problem. Currently, several hypotheses have been proposed to explain the folding mechanism of β-sheets. The hydrophobic zipper hypothesis proposes that a hairpin is formed first, after which hydrophobic contacts act as constraints that bring other contacts into spatial proximity 30. This leads to further constraints and causes the rest of the contacts to zip up. Munoz et al. proposed that the folding of a β-hairpin initiates at the turn and propagates toward the tails 31. In particular, they found that stabilization through hydrophobic contacts between residues and hydrogen-bonding interactions are important for the formation of the β-hairpin. Petrovich et al. 32 studied a 37-residue triple-stranded β-sheet protein via MD simulations. Their results indicate that a β-hairpin appears first, before the third strand joins to complete the β-sheet at the end of the folding process. They ascribe the folding mechanism of the β-sheet to a combination of initial hydrophobic collapse and a zipper mechanism, which serve to nucleate hairpin formation. Notably, all three mechanisms above suggest that the folding of a β-sheet is necessarily preceded by the occurrence of a β-turn. We are still missing a "folding mechanism" for β-sheets.
By mechanism, we mean a narrative that explains how the time evolution of β-sheet folding derives from the amino acid sequence and the solution conditions. β-sheet folding depends strongly on temperature 5, and β-sheets can form in as little as 1 microsecond after a temperature jump 33-35. β-sheets consist of β-strands connected laterally by at least three backbone hydrogen bonds, forming a generally pleated sheet. A β-strand is a stretch of polypeptide chain, typically 3 or more amino acids long, with the backbone in an extended conformation. It is most likely that the β-strands exist before the folding of β-sheets, because it is otherwise difficult to explain how the folding of a β-sheet (i.e., the lateral hydrogen bonding of segments of unfolded polypeptide) could be accompanied by the stretching of those segments into β-strands. There must be mechanisms that allow polypeptide chain segments to find the β-strand states encoded in their sequence. There must also be some physical effect providing the long-range attractive force among β-strands for β-sheet formation. Experimental evidence on the folding of unfolded proteins corroborates the hypothesis that folding initiation sites arise from hydrophobic interactions 11, 36. The folding of β-strands and β-sheets may be driven by hydrophobic interactions, as the nascent polypeptide may form initial secondary structure by creating localized regions of predominantly hydrophobic residues 29. The hydrophobic effect most likely contributes to the formation of β-sheets through multistage aggregation of neighboring hydrophobic groups of unfolded polypeptides, which leads to the formation of β-strands that consequently fold into β-sheets. A β-sheet is always amphipathic in nature; that is, it contains both hydrophilic and hydrophobic surface areas.
Note that hydrophobic attraction (due to the hydrophobic effect) among adjacent side-chains on one side or the other of a β-strand may be common in experimentally determined protein structures, which should be considered evidence that the hydrophobic effect dominates the formation of β-strands. It has previously been noted that many amino acid side-chains contain considerable nonpolar sections, even if they also contain polar or charged groups 36. That is, hydrophilic side-chains are not completely hydrophilic. The hydrophilicity of hydrophilic side-chains is normally expressed by the C=O or NH2 groups at their ends, and the other portions of these side-chains are hydrophobic, because the molecular structures of these portions are basically alkyl and benzene-ring structures, as shown in Figure 1. Folding initiation sites of β-strands might therefore contain not only the accepted "hydrophobic" amino acids but also larger hydrophilic side-chains 36. If the formation of β-strands is driven by hydrophobic interactions among neighboring side-chains of the unfolded polypeptide, we should be able to find experimental evidence of these interactions in the Protein Data Bank (PDB) archive, because hundreds of thousands of β-sheet structures have been experimentally determined. In an aqueous environment, water molecules tend to segregate around the "hydrophobic" side-chains of the nascent protein, creating hydration shells of ordered water molecules 37. The ordering of water molecules around a hydrophobic region increases order in the system and therefore contributes a negative change in entropy (less entropy in the system) 38. The water molecules are fixed in these water cages, which drives the hydrophobic collapse, or the aggregation of the hydrophobic groups. Thus, the hydrophobic interaction among side-chains neighboring in sequence can return entropy to the system by breaking their water cages, which frees the ordered water molecules 39.
If hydrophobic interactions among neighboring side-chains in the amino acid sequence provide the structural stability for β-strand formation, we should find that a large hydrophobic surface area covering one side or the other of a β-strand is prevalent in almost all experimentally determined β-sheets. If hydrophobic side-chains tend to cluster together on one side of adjacent β-strands in almost all experimentally determined β-sheets, we may conclude that the hydrophobic interaction among neighboring side-chains is responsible for initiating β-sheet folding. The capability of an amino acid residue to take part in hydrophobic attraction with residues neighboring in sequence can be evaluated from the exposed alkyl and benzene-ring structures of its side-chain, as shown in Fig. 1, in which the 20 kinds of amino acid residue are divided into four groups 40. Arginine-R, Histidine-H, and Lysine-K can take part in hydrophobic interactions with adjacent hydrophobic side-chains in sequence because their long hydrophilic side-chains contain long nonpolar alkyl structures, see Fig. 2C and 2D. Note that every β-strand is characterized by a large hydrophobic surface fully covering one side of the β-strand (the inner side), and that the hydrophobic interaction causes each side-chain to lie parallel to every other side-chain of the strand. The parallel arrangement of adjacent peptide planes of these β-strands also causes adjacent side-chains to lie on opposite sides of the main chain, and each carbonyl oxygen atom in a peptide plane tends to hydrogen bond with an amide hydrogen atom in an adjacent peptide plane owing to the electrostatic attraction between them, except for Proline-P 15.
The parallel arrangement of neighboring "hydrophobic" side-chains in a β-strand can effectively return entropy to the system by merging the water cages of the side-chains, which frees the ordered water molecules, see Fig. 2D. Thus, the β-strand should be considered a metastable state of the unfolded polypeptide that corresponds to a free-energy minimum under the solution conditions, creating localized regions of predominantly hydrophobic side-chains 15. We use another small protein (PDBID: 1OUR) as an example to demonstrate the role that hydrophobic interactions among neighboring side-chains play in the formation of β-strands, β-turns, and β-sheets, see Fig. 3. The protein is mainly composed of β-strands and 10 β-turns. Every β-strand of the protein is also characterized by a large hydrophobic surface fully covering one side or the other of the β-strand, see Fig. 3A. Aspartate-D, Asparagine-N, Serine-S, Proline-P, and Glycine-G most likely contribute to the formation of β-turns in protein folding, because the other neighboring side-chains in the amino acid sequence tend to attract each other hydrophobically by bypassing these residues (see Fig. 1D). Therefore, Aspartate-D, Asparagine-N, Serine-S, Proline-P, and Glycine-G can be classified as a hydrophobic blocking (RB) group. It is worth noting that almost all of the 10 β-turns of the protein contain two or more of the residues Aspartate-D, Asparagine-N, Serine-S, Proline-P, and Glycine-G, see Fig. 3A and 3B. This indicates that two or more adjacent RB residues can effectively block hydrophobic attraction among side-chains neighboring in sequence on both sides of a strand. We divide the protein structure into three parts according to three segments of the amino acid sequence to illustrate the hydrophobic collapse among β-strands neighboring in sequence, see Fig. 3B and 3C.
Hydrophobic interactions among these β-strands may drive them to collapse together by bending the unfolded polypeptide at the location of these RB residues, namely, by bypassing the RB residues at the turns to achieve the hydrophobic collapse. This also indicates that hydrophobic attraction among neighboring side-chains drives β-strand formation and then causes hydrophobic attraction among neighboring β-strands and the formation of β-sheets, because β-strand formation creates localized regions of predominantly hydrophobic residues and places thermodynamic pressure on these regions under the solution conditions. Formation of β-sheets also makes β-strands aggregate or "collapse" into a tertiary conformation with a hydrophobic core. We therefore speculate that folding of β-sheets is triggered by multistage hydrophobic interactions among neighboring side-chains of unfolded polypeptides, which enables β-sheets to fold reproducibly following explicit physical folding codes in aqueous environments. We use 1000 experimentally determined small protein structures to further demonstrate the hydrophobic-effect-based folding mechanism for β-sheets. All 1000 small proteins were randomly selected from the PDB. Using the PDB archive and the STRIDE software 42, 3235 β-strands can be identified in the 1000 protein structures. From analysis of all 3235 β-strands, we find that hydrophobic attraction (due to the hydrophobic effect) among adjacent side-chains on one side or the other of a β-strand, covering the length of the β-strand, is prevalent in all the experimentally determined β-strands (see Supplementary S5). This indicates that the hydrophobic interaction among neighboring side-chains is responsible for the formation of β-strands. Aspartate-D, Asparagine-N, Serine-S, Proline-P, and Glycine-G cannot effectively attract side-chains neighboring in sequence hydrophobically, see Fig. 1D.
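The strand analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: it assumes a per-residue secondary-structure string is already available (with `E` marking extended-strand residues, as in STRIDE's one-letter codes), it does not parse STRIDE output itself, and it uses "a face with no RB residue" as a crude stand-in for a continuous hydrophobic face. The function names and the toy sequence are hypothetical.

```python
from itertools import groupby

# Residues the text groups as "hydrophobic blocking" (RB): D, N, S, P, G.
RB_RESIDUES = frozenset("DNSPG")


def strand_segments(sequence, ss_string, min_len=3):
    """Return (start_index, strand_sequence) for each run of 'E' in ss_string.

    Assumption: ss_string holds one secondary-structure code per residue.
    """
    if len(sequence) != len(ss_string):
        raise ValueError("sequence and ss_string must have equal length")
    segments = []
    pos = 0
    for code, run in groupby(ss_string):
        length = len(list(run))
        if code == "E" and length >= min_len:
            segments.append((pos, sequence[pos:pos + length]))
        pos += length
    return segments


def faces(strand_sequence):
    """Split a strand into its two faces.

    Adjacent side-chains point to opposite sides of the main chain,
    so residues i, i+2, i+4, ... share one face.
    """
    return strand_sequence[0::2], strand_sequence[1::2]


def rb_free_faces(strand_sequence):
    """Return the faces that contain no RB residue (illustrative criterion only)."""
    return [f for f in faces(strand_sequence) if f and not (set(f) & RB_RESIDUES)]


if __name__ == "__main__":
    seq = "AAVKVTVDGKIYVAA"   # toy sequence, not taken from the paper
    ss = "CCEEEEETTEEEECC"    # toy per-residue assignment
    for start, strand in strand_segments(seq, ss):
        print(start, strand, rb_free_faces(strand))
```

A stricter criterion, for example scoring each face by the exposed nonpolar portion of its side-chains, could replace `rb_free_faces` without changing the segmentation step.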
Thus, Aspartate-D, Asparagine-N, Serine-S, Proline-P, and Glycine-G most likely lead to β-turn formation in protein folding, because the other neighboring side-chains in the amino acid sequence tend to attract each other hydrophobically by bypassing these residues. The β-turn is the third most important secondary structure after helices and β-strands. β-turns have been classified according to the values of the dihedral angles φ and ψ of the central residue. β-turns can be easily identified between β-strands or α-helices of the protein structures using the PDB archive and the STRIDE software 42. We identified 5776 β-turns in the 1000 protein structures, including about 1780 β-hairpin turns. We find that about 97.4% of the β-turns contain at least one Aspartate-D, Asparagine-N, Serine-S, Proline-P, or Glycine-G residue 43, as illustrated in Supplementary 2. Most of the remaining non-RB β-turns contain at least one Glutamate-E, Glutamine-Q, Threonine-T, or Alanine-A residue. This indicates that Glutamate-E, Glutamine-Q, Threonine-T, and Alanine-A may contribute to the formation of β-turns because their exposed hydrophobic proportions are relatively small. Moreover, about 99.3% of β-hairpin turns contain at least one Aspartate-D, Asparagine-N, Serine-S, Proline-P, or Glycine-G residue, see Supplementary 2. Two adjacent RB residues normally should not appear in the middle of a long straight β-strand, because the residues of the strand on both sides of the two RB residues tend to aggregate hydrophobically and would thus bend the strand at the two RB residues to achieve the hydrophobic interaction. However, by scanning the 1000 protein structures with the STRIDE software 42, we can still identify 29 long β-strands (each containing more than 12 residues) that have two adjacent RB residues located in the middle of the β-strand.
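The two counts described above (the share of turns containing an RB residue, and long strands with two adjacent RB residues in their middle) could be computed along the following lines. This is a hedged sketch rather than the authors' code: the turn and strand sequences are assumed to have been extracted already, the toy sequences are invented, and the `margin` parameter that defines "the middle" of a strand is an assumption, since the text does not give a numerical definition.

```python
# Residues the text groups as "hydrophobic blocking" (RB): D, N, S, P, G.
RB_RESIDUES = frozenset("DNSPG")


def contains_rb(segment):
    """True if the segment holds at least one RB residue."""
    return any(res in RB_RESIDUES for res in segment)


def rb_fraction(turn_sequences):
    """Fraction of turn sequences that contain at least one RB residue."""
    turns = list(turn_sequences)
    if not turns:
        raise ValueError("no turn sequences given")
    return sum(contains_rb(t) for t in turns) / len(turns)


def central_rb_pairs(strand, min_len=13, margin=3):
    """Indices i where strand[i] and strand[i + 1] are both RB residues.

    Only strands with more than 12 residues are considered (min_len=13).
    'margin' is a hypothetical choice: both residues of the pair must lie
    at least this many positions away from either end of the strand.
    """
    if len(strand) < min_len:
        return []
    last_start = len(strand) - margin - 2
    return [
        i for i in range(margin, last_start + 1)
        if strand[i] in RB_RESIDUES and strand[i + 1] in RB_RESIDUES
    ]


if __name__ == "__main__":
    toy_turns = ["NG", "DG", "TQA", "EKGG", "AAWC"]   # invented examples
    print(rb_fraction(toy_turns))
    print(central_rb_pairs("VKVTVYVDGKVTVYV"))        # invented strand
```

Applied to the real turn set, `rb_fraction` would correspond to the percentages quoted in the text; the numbers printed here describe only the toy input.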
By checking these long β-strands using the PyMOL software, we find that 24 of them actually curve exactly at their two RB residues in the amino acid sequence, demonstrating the capability of RB residues to cause β-turn formation, see Fig. 4. The other 5 long β-strands either have three or more adjacent RB residues or have RB residues located at one end of the strand, which extends the hydrophobic blocking region to the end of the β-strand and thus undermines the hydrophobic interaction between the two ends of the β-strand, see Supplementary S3. The long β-strand of the 1YV7 protein curves at the sequence segment threonine-threonine-serine-glutamate (TTSE), see Supplementary S3. This indicates that Glutamate-E, Glutamine-Q, Threonine-T, and Alanine-A may also contribute to the formation of β-turns because their exposed hydrophobic proportions are relatively small. The spike (S) protein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is of great concern because of the coronavirus disease 2019 (COVID-19) pandemic. The D614G mutation in SARS-CoV-2 has begun to receive widespread attention for its rising dominance worldwide. This mutation changes the amino acid at position 614 from D (aspartic acid) to G (glycine), so the initial D614 becomes the G614 variant. It is worth noting that the amino acid at position 614 is located at a β-turn in the tertiary structure of the spike. This is consistent with our theory that both D (aspartic acid) and G (glycine) can result in β-turn formation. The D614G mutation may accelerate the folding of the quaternary structure of the spike, because position 614 is located at the docking site between two tertiary structures of the protein and G614 most likely contributes more to the hydrophobic effect between them than D614 does (see Fig. 1D).
A typical β-hairpin structure contains two β-strands with hydrophobic attraction between each side-chain and every other side-chain on the strands. Thus, we might be able to predict β-hairpin structures by evaluating the hydrophobic attraction of each side-chain with every other side-chain in the primary structure of a protein. We may be able to predict a β-hairpin by identifying two neighboring sequences of residues in the polypeptide chain that are both characterized by hydrophobic attraction between each side-chain and every other side-chain and that have two RB residues between them. Using this method, we identified 553 samples with the characteristics above in the 1000 proteins. We find that 158 of the samples are β-hairpins, 36 are strand-turn-strand structures, 296 are strand-turn-helix structures, 23 are coil-turn-strand structures, 23 are coil-turn-coil structures, and 6 are α-helices. Thus, physical folding codes for β-hairpins and strand-turn-strand structures can be deciphered by evaluating the hydrophobic interaction among side-chains of an unfolded polypeptide. The results show that strand-turn-helix structures can also be predicted by the method. This indicates that folding of an α-helix may be initiated from a β-strand-like thermodynamic metastable state 15.
Many amino acid residues contain considerable nonpolar sections in their side-chains, even if they also contain polar or charged groups. This makes the hydrophobic interaction among side-chains neighboring in the amino acid sequence of a polypeptide an important driving force for the stabilization of the initial thermodynamic state of unfolded proteins. A large hydrophobic surface area covering most side-chains on one side or the other of adjacent β-strands of a β-sheet is prevalent in almost all experimentally determined β-sheets.
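The β-hairpin screening described in the results above (two neighboring segments with alternating hydrophobic attraction, separated by two RB residues) might be sketched as follows. This is a minimal illustration, not the authors' method: the flank length of four residues and the "RB-free face" criterion for a hydrophobically attracting segment are both assumptions introduced here, and the function names are hypothetical. The second part simply tallies the category counts quoted in the text.

```python
# Residues the text groups as "hydrophobic blocking" (RB): D, N, S, P, G.
RB_RESIDUES = frozenset("DNSPG")


def has_rb_free_face(segment):
    """True if at least one face (alternate residues) has no RB residue.

    Illustrative proxy for 'hydrophobic attraction between each side-chain
    and every other side-chain'; the paper does not give a formula.
    """
    faces = (segment[0::2], segment[1::2])
    return any(f and not (set(f) & RB_RESIDUES) for f in faces)


def hairpin_candidates(sequence, flank=4):
    """Indices i where sequence[i:i+2] is an RB pair flanked on both sides
    by 'flank' residues that each have an RB-free face (assumed criterion)."""
    hits = []
    for i in range(flank, len(sequence) - flank - 1):
        if sequence[i] in RB_RESIDUES and sequence[i + 1] in RB_RESIDUES:
            left = sequence[i - flank:i]
            right = sequence[i + 2:i + 2 + flank]
            if has_rb_free_face(left) and has_rb_free_face(right):
                hits.append(i)
    return hits


# Category counts as quoted in the text for the 553 identified samples.
REPORTED_TOTAL = 553
REPORTED_COUNTS = {
    "beta-hairpin": 158,
    "strand-turn-strand": 36,
    "strand-turn-helix": 296,
    "coil-turn-strand": 23,
    "coil-turn-coil": 23,
    "alpha-helix": 6,
}


def share(category):
    """Share of the reported total that falls in one category."""
    return REPORTED_COUNTS[category] / REPORTED_TOTAL


if __name__ == "__main__":
    print(hairpin_candidates("VKVTVDGKIYVA"))   # invented sequence
    print(sum(REPORTED_COUNTS.values()), REPORTED_TOTAL)
    print(round(share("beta-hairpin"), 3))
```

The six categories quoted in the text add up to 542, so the tally leaves 11 of the 553 reported samples without a listed category.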
Minimizing the exposure of the hydrophobic portions of adjacent side-chains to water should be regarded as the most important driving force for β-strand formation, and it causes each side-chain to lie parallel to every other side-chain on the strand. β-turns often contain Aspartate-D, Asparagine-N, Serine-S, Proline-P, and Glycine-G residues, whose side-chains have very small exposed hydrophobic proportions; we explain this by these residues blocking the hydrophobic effect among side-chains neighboring in sequence and thereby contributing to turn formation. The folding of β-sheets is most likely triggered by multistage hydrophobic interactions among neighboring side-chains of unfolded polypeptides, which enables β-sheets to fold reproducibly following explicit physical folding codes in aqueous environments. The temperature dependence of β-sheet folding is thus attributed to the temperature dependence of the strength of hydrophobicity. The hydrophobic collapse of β-strands into β-sheets most likely triggers enthalpy-entropy compensation in the unfolded polypeptide, enabling the main chains of the β-strands to shed their hydrogen-bonded water molecules and hydrogen bond laterally with each other. The folding codes in the amino acid sequence that dictate the formation of a β-hairpin can thus be deciphered by evaluating the hydrophobic interaction among side-chains of an unfolded polypeptide starting from a β-strand-like thermodynamic metastable state.
In this study, many experimentally determined native structures of proteins are used to study the folding mechanism of β-sheets. All three-dimensional (3D) structure data of the protein molecules were obtained from the PDB database. The PDB IDs of these proteins are marked in Fig. 2, Fig. 3, and Fig. 4.
In order to show the distribution of hydrophobic areas on the surface of β-strands and β-sheets in these figures, we used the structural biology visualization software PyMOL to display the hydrophobic surface areas of these secondary structures. Secondary structures of β-strands, β-turns, β-sheets and α-helices were identified in the 1000 proteins by using the STRIDE software 42 . We also used molecular 3D structure display software PyMOL to confirm the identification of secondary structures of proteins. Long β-strands (more than 12 residues) characterized with two adjacent RB residues located at the middle of the β-strands and curved exactly at their two RB residues in the amino acid sequences. 134L 1KP4 1Z3L 3C3G 4IC9 1FDD 1R75 2OAI 3OZZ 5HBP 155C 1KSM 1Z3M 3C4S 4IDL 1FE3 1R9H 2OJR 3P2X 5HBQ 1A0B 1KTH 1Z3P 3C5K 4IP1 1FE5 1RAQ 2OLI 3PAZ 5HDG 1A18 1KXI 1ZIA 3C7I 4IP6 1FER 1RBI 2ON8 3Q1D 5HJC 1A6F 1KXW 1ZIB 3C97 4IPF 1FES 1RBW 2OPY 3Q4Y 5HKN 1AA2 1KXX 1ZJ7 3CE8 4JHB 1FEV 1RBX 2OQK 3Q7Y 5HMB 1AB0 1KXY 5LAW 1B0T 1LXI 2B4A 3ETW 4LYO 1FOW 1RRI 2P8V 3SGP 5LAZ 1B1E 1LYO 2B8G 3EZM 4MDQ 1FOY 1RRY 2PAL 3STM 5LN2 1B1I 1LZ4 2B9D 3F3Q 4MJJ 1G2S 1RS2 2PK7 3SUL 5M9A 1B1J 1M1S 2BEZ 3F45 4ML2 1GBQ 1RSI 2PKT 3T1X 5MXY 1B1U 1M4A 2BFH 3F8C 4MZ2 1GD6 1RTU 2PNE 3T3J 5NGN 1B2O 1M4B 2BHK 3FAJ 4N0Z 1GDC 1RWJ 2PPI 3T8R 5NWX 1B6E 1M4M 2BHO 3FFY 4N5T 1GHJ 1RZY 2PPN 3T8T 5O3A 1BAS 1MB3 2BO1 3FRV 4N6J 1GKH 1S3P 2PVT 3UAF 5OC4 1BEA 1MG6 2BPP 3FWU 4NEJ 1GMG 1S7I 2PW5 3UB2 5OC8 1BEL 1MH7 2BQQ 3FYR 4NXR 1GOD 1SDZ 2PWS 3UB3 5OMD 1BFE 1MH8 2BRF 3FZ9 4OOH 1GP3 1SF6 2PYK 3UB4 5PAL 1BGI 1MK0 2BS5 3FZA 4OV1 1GS3 1SF7 2PZW 3UMD 5PAZ 1BHF 1MKU 2BTI 3G7C 4OXW 1GSW 1SFB 2Q1M 3UME 5RHN 1BKF 1ML8 2BWK 3GBQ 4OZL 1GV2 1SFG 2QAS 3UNN 5RNT 1BKV 1MLI 2BWL 3GK2 4P15 1GVP 1SKZ 2QDB 3V19 5TAB 1BM2 1N9N 2BZY 3GK4 4P2P 1GXT 1SNP 2QHE 3V1G 5U9U 1BMG 1N9O 2CDS 3GKY 4P7U 1H2P 1SNQ 2QHW 3VXW 5UEP 1BO0 1NEH 2COQ 3GLW 4P9E 1HD0 1SSC 2QIU 3VYA 5UER 1BPQ 1NEQ 2CW4 3GM2 4P9V 1HE7 1SV9 2QNW 3WIT 5UES 1BQK 1NKO 2CXY 3GM3 4PAZ 1HEH 
1T00 2QR3 3WRP 5UET 1BQR 1NLO 2D4L 3GQU 4PE7 1HEY 1T1D 2QVT 3WUN 5UEV 1BU3 1NN7 2D4M 3GSP 4PTA 1HFX 1T2J 2R2Y 3WVT 5UEY 1BXY 1NNX 2D58 3H33 4QYW 1HIK 1TAY 2R34 3WW5 5USV 1BYL 1NXT 2DB7 3H9W 4RLM 1HKF 1TCY 2R36 3WX4 5UVS 1C0B 1NXV 2DH1 3HAF 4RLN 1HME 1TDY 2R39 3ZEK 5UVY 1C0C 1O2E 2DP9 3HAK 4RUD 1HQ8 1TEN 2REA 3ZK0 5UVZ 1C76 1O3X 2DRT 3HEY 4TQN 1HQB 1TGJ 2REY 4AHG 5VEA 1C9H 1O41 2DU9 3HFF 4TS8 1HRC 1TNS 2RKN 4AHI 5VR3 1C9V 1O42 2DYQ 3HFN 4UNG 1HRQ 1TS9 2RLN 4AHN 5WOR 1CDP 1O43 6CE8 1DDV 1O4M 2FKL 3IRC 4ZC3 1IOR 1UID 2WOR 4DP2 6CEA 1DDW 1O4N 2FLS 3J4G 5AEF 1IOS 1UIE 2WOS 4DP4 6CEC 1DKJ 1O4O 2FLT 3JTE 5AFG 1IOT 1UIF 2WQ0 4E1P 6CED 1DMM 1O4Q 2FMB 3JZR 5AI2 1IR7 1UIG 2WQ3 4E1R 6CEE 1DMN 1O4R 2FO3 3K0X 5B1F 1IR8 1UKU 2WQH 4EC2 6CEF 1DMQ 1O5J 2FQL 3K63 5B1G 1IR9 1UPJ 2X44 4EES 6DXH 1DPY 1O7Z 2FTB 3K6D 5B3Q 1IRQ 1UPR 2X9C 4ET8 6DXR 1DQ7 1OAP 2GSP 3K9I 5B52 1IRW 1UVY 2XBD 4ET9 6EKB 1DYZ 1OB6 2GV2 3KD0 5B79 1ISU 1UW3 2XCZ 4ETA 6EVL 1DZ0 1OD7 2H2B 3KHQ 5BMH 1IYU 1UW7 2XDY 4ETB 6EVM 1E5B 1ODA 2H36 3KLU 5BPO 1IZA 1UWM 2XFE 4ETD 6FCE 1E6K 1OOI 2H46 3KMJ 5C39 1J2V 1V07 2XKH 4ETE 6FGL 1E6L 1OPY 2H8A 3KQ6 5C4P 1J4H 1V46 2XMU 4EWW 6FGT 1E6M 1OR5 2H8V 3KQI 5C68 1J4I 1VAT 2YHN 4EWZ 6FGU 1E97 1P0R 2H95 3KTP 5C6X 1J73 1VED 2YWZ 4EX0 6FH6 1EA2 1P65 2HB4 3KU7 5CB9 1J7Z 1VER 2YXY 4F1A 6FH7 1ECW 1P9G 2HC8 3KVT 5CO9 1J81 1VHF 2Z44 4F2E 6FXF 1ED1 1PAL 2HPL 3L1M 5CPV 1J82 1VIH 2Z7J 4F68 6GF0 1EFQ 1PAZ 2HWV 3L6W 5CUL 1JC7 1VMG 2ZGD 4F69 6H0K 1EIF 1PBI 2I9V 3LHC 5D53 1JDL 1W2L 2ZHH 4F6D 6H0L 1EIG 1PI8 2IGP 3LJM 5D54 1JER 1W6X 3A0D 4FC1 6HA4 1EMN 1PKS 2IIY 3LLH 5DKN 1JHC 1WAQ 3A0S 4FDX 6I3S 1EN2 1PO8 2INT 3LMF 5DM9 1JKD 1WHI 3A0V 4FE6 6I5A 1ENV 1PZ4 2J5A 3LR0 5DZD 1JOI 1WJG 3A7L 4G3O 6IFH 1EOQ 1PZA 2JIN 3LVE 5E0Z 1JON 1WJX 3ADY 4G4W 6INS 1EOT 1PZB 2JK7 3LYO 5E4P 1JPE 1WRP 3AZ5 4G4X 6IQC 1EOW 1PZC 2JP8 3LZ2 5E4X 1JPO 1WY9 3B7X 4GBC 6JQ4 1EQV 1Q4R 2JVH 3M4T 5EE2 1JZA 1XW3 3B84 4GBN 6MHN 1EV3 1Q4V 2KRI 3MAZ 5EH4 1K1Z 1XW4 3BF2 4GFY 6MPM 1F1W 1Q7O The Protein-Folding Problem, 50 Years On Amyloid 
Fibrils: the Eighth Wonder of the World in Protein Folding and Aggregation
Tong from the University of Sydney for their support and guidance. Lin Yang is grateful for his research experience in the Weizmann Institute of Science for inspiration. The authors acknowledge the financial support from the National Natural Science Foundation of China.
The authors declare no competing financial interests.
PDBID:1FLQ SKHAFSL CSKC REN LTDG YFDG PD YDED LDN NF TQA NTDG GILQ SRWW DGRTPGS DMLKM NLCN SSD DGNGMNAW KATD RGC PDBID:6INS PDBID:1FLY GERG TP PKG TSI LDN NF TQA NTDG GILQ SRWW DGRTPGS PDBID: IDT IDT IDT PDBID:1IRW PDBID:1T00 EKGG HKVGP LHGIFG SGQAEGY IPGT HMAG TDD LKN AAWC GDKI IDEN SIP PDBID:1JDL GG CMACH GPDA NLVGP LTGVID AGTAPGF PD PDBID:1UIA PDBID:1JKD DN NF TQA NTDG GILQ SRWW DGRTPGS MDG GY TRA AGDR GIFQ SRYW DGKTPGA NLCN SSD GN KGT RGC NACH QDN DPQ QNR VQGC PDBID:1UWM PDBID:1KH0 EHNG KPG AYEPNPAT DG FANG NG FANG NG PDBID:1WAQ PDBID:1KH8 WDDW PL EFPL DPEST DSAN DSSTS NLTKDR CKNG TGSS YP PDBID:1WHI PDBID:1KXI QE GSGR NIG TKRG RPDG RDDK AP LPFI PEGK KKFPL DNC SAL TDKC
Table S2. Amino acid sequences of β-turns of the 1000 protein samples.
Note S3. Amino acid sequences of long β-strands with two adjacent RB residues located in the middle segments of the β-strands.
Figure S1. Long β-strands in the 1000 proteins that do not curve at their two RB residues in the amino acid sequence.