key: cord-0429931-4nykshib authors: Chakravarty, Devlina; Porter, Lauren L. title: AlphaFold2 fails to predict protein fold switching date: 2022-03-24 journal: bioRxiv DOI: 10.1101/2022.03.08.483439 sha: 4117b002b5f348e4666bb6ffd1970ba7fe5b4d16 doc_id: 429931 cord_uid: 4nykshib AlphaFold2 has revolutionized protein structure prediction by leveraging sequence information to rapidly model protein folds with atomic-level accuracy. Nevertheless, previous work has shown that these predictions tend to be inaccurate for structurally heterogeneous proteins. To systematically assess factors that contribute to this inaccuracy, we tested AlphaFold2’s performance on 98 fold-switching proteins, which assume at least two distinct-yet-stable secondary and tertiary structures. Topological similarities were quantified between five predicted and two experimentally determined structures of each fold-switching protein. Overall, 94% of AlphaFold2 predictions captured one experimentally determined conformation but not the other. Despite these biased results, AlphaFold2’s estimated confidences were moderate-to-high for 74% of fold-switching residues, a result that contrasts with overall low confidences for intrinsically disordered proteins, which are also structurally heterogeneous. To investigate factors contributing to this disparity, we quantified sequence variation within the multiple sequence alignments used to generate AlphaFold2’s predictions of fold-switching and intrinsically disordered proteins. Unlike intrinsically disordered regions, whose sequence alignments show low conservation, fold-switching regions had conservation rates statistically similar to canonical single-fold proteins. Furthermore, intrinsically disordered regions had systematically lower prediction confidences than either fold-switching or single-fold proteins, regardless of sequence conservation. 
AlphaFold2’s high prediction confidences for one fold-switching conformation corroborate previous work showing that machine-learning-based structure predictors fail to capture other fundamental biophysical features of proteins such as their folding pathways. Our results emphasize the need to look at protein structure as an ensemble and suggest that systematic examination of fold-switching sequences may reveal propensities for multiple stable secondary and tertiary structures.
AlphaFold2 has revolutionized protein structure prediction by using sequence information to rapidly model protein folds with atomic-level accuracy 1; 2. Its predictions are generated by a deep neural network that identifies features of both multiple sequence alignments (MSAs) and experimentally determined protein structures. In other words, AlphaFold2 predictions are generated by a highly sophisticated deep-learning model that excels at recognizing correlations between protein sequence and structure. AlphaFold2's approach to protein structure prediction is rooted in a large training set of experimentally determined protein structures. Indeed, without the Protein Data Bank (PDB), a publicly available repository of nearly 200,000 protein structures 3, it would be impossible to predict protein structure through deep learning 4. Consequently, deep learning-based approaches are likely to miss protein properties that are not apparent from available experimentally determined structures. For example, the conformations accessible to structurally heterogeneous proteins, whose overall secondary and tertiary structures are either unstable or change in response to their environment, cannot be captured in a single protein structure. Thus, it is not surprising that AlphaFold2 often fails to accurately predict the conformations of intrinsically disordered proteins [5] [6] [7], whose structures are highly heterogeneous.
AlphaFold2 predictions cover 99% of sequences in the human proteome (https://alphafold.ebi.ac.uk/), but only 58% of residues are modelled with high confidence 5; 7. Many low-confidence predictions correspond to intrinsically disordered proteins/regions (IDPs/IDRs), often predicted to fold into long filaments 5; 6. Furthermore, it remains unknown how accurately high-confidence predictions capture the structures of uncharacterized proteins, especially those with few homologs in sequence databases or with sequences dissimilar to proteins represented in the PDB. Here, we systematically assess whether AlphaFold2 captures the structural heterogeneity of fold-switching proteins. In contrast to IDPs/IDRs, which are natively unstructured, fold-switching proteins have regions of 20 or more contiguous amino acids that either assume distinct stable secondary and tertiary structures under different cellular conditions or populate distinct stable secondary and tertiary structures at equilibrium [8] [9] [10]. Thus, the sequences of fold-switching protein regions encode more than one ordered state 9; 11. As AlphaFold2 maps primary structure (amino acid sequence) to three-dimensional structure, we compared its predictions with experimentally determined protein structures to explore whether it identifies the two stable structures encoded by fold-switching sequences, a single structure, or something else. A set of 98 proteins 9; 12 that assume at least two distinct-yet-stable secondary/tertiary structures (folds) was used for the analysis (Table S1A). This unique dataset contains protein pairs with extremely high levels of sequence identity (mean 99%/median 100%, Table S1B) but regions of 20 or more contiguous residues with different secondary structure configurations, quantified previously 9 by comparing aligned secondary structure annotations assigned by hydrogen bonding 13 and backbone torsion angles 14.
Out of the 98, 93 had the alternate fold solved in PDB (Tables S1A and B); 91/93 of these proteins have 90% aligned identity or higher; the other 2 are homologs with experimental evidence of fold switching. Similarly, the remaining 5/98 were homologs of fold switchers with only one solved structure in the PDB. These proteins are expected to switch folds because their closely-related homologs do (such as KaiBs from other strains of cyanobacteria with circadian clocks: 4ksoA, 1wwjA, 1r5pA), or were shown to switch folds by methods other than crystallography, e.g., NMR (Nuclear Magnetic Resonance), CD (circular dichroism), or cryo-EM (cryogenic electron microscopy 1f16A, 3k2sA) 12 . The sequences of these 5 proteins were used mainly for generating predictions, followed by analyzing prediction scores after modeling (Assessment of model quality) and also for conservation scores from the MSA (Multiple Sequence Alignment) generated during prediction (Conservation scores and rate of evolution using MSA), but were not used for structural comparisons using Template Model (TM)-scores and Root Mean Square Deviations (RMSD) (Assessment of model quality). The structural comparisons between the experimentally determined pairs along with their sequence identities can be seen in Table S1B . Mean/median TM-scores of fold-switching pairs are 0.58/0.63 and RMSDs are 12.6/9.2Å, demonstrating that the experimentally determined structures differ. A set of 99 proteins was randomly selected from DisProt (https://disprot.org/), a database of experimentally characterized intrinsically disordered proteins, with disordered regions manually curated from the literature 15 . The proteins chosen for the analysis (Table S2) had disordered regions ranging from 20-100 residues (to keep their average sizes similar to fold-switching regions); these regions were not located at termini. 
The set also included the three disordered proteins mentioned in previous work 6: histone acetyltransferase p300 (Uniprot: Q09472, DisProt: DP00633), CREB-binding protein (Uniprot: Q92793, DisProt: DP02004) and the RNA-binding protein FUS (Uniprot: P35637, DisProt: DP01102). The FASTA sequences of fold-switching proteins were extracted from PDB SEQRES records and used as input to the AlphaFold2 structure prediction model 1 Table S2 ). AlphaFold2's top-scoring models were ranked from 1 to 5 by predicted Local Distance Difference Test (pLDDT) scores (a per-residue estimate of the prediction confidence on a scale from 0 to 100), quantified by determining the fraction of predicted Cα distances that lie within their expected intervals. The values correspond to the model's predicted scores based on the lDDT-Cα metric, a local superposition-free score that assesses the atomic displacements of the residues in the model 1. Models ranked in the top 5 were compared to the original PDB structure using structural alignment as implemented in TM-align 16, an algorithm for sequence-independent protein structure comparisons. TM-align first generates an optimized residue-to-residue alignment based on secondary structure connections or topology using dynamic programming iterations. An optimal superposition of the two structures is then built on the resulting alignment, and the TM-score (ranging from 0 to 1) is reported as the measure of overall prediction accuracy for the models. A TM-score > 0.5 implies roughly the same fold 17, and a higher value indicates a better match. As an alternative measure of structural similarity, we aligned sequences, used the alignment to determine the least-squares superposition of backbone atoms (C, Cα, O and N), and calculated their RMSD (root-mean-square deviation) using ProFit (Martin, A. C. R., http://www.bioinf.org.uk/software/profit/).
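As a rough illustration of the two similarity measures described above, the following Python sketch computes a TM-score and an RMSD from per-residue Cα distances between residue pairs that are already aligned and superposed. It is not TM-align or ProFit: the alignment search and superposition performed by those programs are assumed to have been done elsewhere, the function names are hypothetical, and the 0.5 Å floor on d0 for short chains is an assumption of this sketch.

```python
import math


def tm_score(distances, target_length):
    """TM-score for a fixed alignment and superposition.

    distances: Cα-Cα distances (in Å) for each aligned residue pair.
    target_length: length of the structure used for normalization.
    """
    # Length-dependent distance scale d0 from the TM-score definition.
    # The 0.5 Å floor for short chains is an assumption of this sketch.
    if target_length > 15:
        d0 = max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)
    else:
        d0 = 0.5
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def rmsd(distances):
    """Root-mean-square deviation over the same aligned residue pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))
```

Because each aligned pair contributes at most 1 and the sum is divided by the full target length, unaligned residues lower the TM-score, whereas the RMSD is computed over aligned pairs only. This is one reason the two measures can disagree.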
These standards of structural similarity were also used by the authors of AlphaFold2 to assess the quality of their predictions 1. Conformations with higher TM-scores for at least 3/5 AlphaFold2 predictions were designated "Fold1." Two exceptional cases, 6z4u/5tpn, were designated "Fold1" because they had good/moderate TM-scores (>0.9/0.66) for 2/5 AlphaFold2 predictions, whereas the remaining 3 predictions had moderate/poor TM-scores (< 0.75/0.22) for both folds. This ordering was maintained for the RMSD analysis. with either RMSD < 5Å were also assigned to Cluster 2 ( Figure S4A ). TM-scores of fold-switching regions of proteins from Cluster 1 were determined by excising fold-switching regions from both experimentally determined structures and the 5 AlphaFold2 models and comparing them with TM-align. Orders of Fold1 and Fold2 were identical to those in Table S1C except for 5B3Z_A/5BMY_A, all of whose predictions were biased toward Fold2, and thus their ordering was switched. This result is not surprising given that the fold-switching region of this pair was small compared with the whole protein (29/403 residues). TM-scores for Fold2 were considered significantly larger if they exceeded the TM-scores of Fold1 by at least 0.05, ruling out cases where predictions were equally good for both folds but the TM-score was marginally better for Fold2. As reported previously 19, we divided the alignments into fold-switching and single-folding regions, since fold-switching proteins are typically composed of both 9. Sequences of fold-switching regions along with their lengths are reported in Table S1A. The multiple sequence alignments (MSAs) generated by AlphaFold2 to predict residue-residue distances and orientations were used to determine evolution rates, using Rate4Site (https://www.tau.ac.il/~itaymay/cp/rate4site.html) 20. The program requires an MSA file to compute a phylogenetic tree and then calculates the relative conservation score for each column in the MSA.
An empirical Bayesian method, which significantly improved the accuracy of conservation score estimates over the Maximum Likelihood method, was used to generate the rates 20. The scores are represented as grades ranging from 9 (conserved) to 1 (variable). The distributions of pLDDT scores and Rate4Site grades in fold-switching regions were compared to the rest of the protein (the single-folding regions) and with a set of 99 IDPs/IDRs (Table S2). Significance of differences in pLDDT distributions was calculated by employing the two-sample Kolmogorov-Smirnov and Epps-Singleton tests implemented in SciPy 21. As Rate4Site scores yield a discrete distribution, only the Epps-Singleton test was used. The scripts used for all analyses were written in Perl and Python; protein figures were generated in PyMOL 22 and plots in Matplotlib 23 and seaborn 24. The five top-scoring AlphaFold2 models of each fold-switching sequence were compared with two distinct, experimentally determined conformations using TM-align, which quantifies similarity of topology and connections between secondary structure elements by TM-score 16. This metric was used by the authors of AlphaFold2 to assess the accuracies of their models 1. We systematically termed the more accurately predicted conformation Fold1 and the less accurately predicted one Fold2. Generally, when three or more models out of five had higher TM-scores (i.e., were more similar) to one experimentally determined structure, we termed it Fold1 (Ordering conformations, Methods); the other experimentally determined structure was termed Fold2. To augment TM-scores, we performed RMSD comparisons as well (Figure S4). AlphaFold2 models were highly biased toward Fold1. A scatterplot of prediction accuracies, measured by TM-scores (Figure 1A), indicates that nearly 94% of predictions fall below the identity line and are thus more similar to Fold1 than Fold2.
The k-means algorithm was used to subdivide the scatterplot into three clusters, corresponding to the first local minimum of k-means inertia curvature with respect to the number of clusters (Figure 1A, Figure S1, Methods). To simplify discussion, clusters were ordered by prediction quality rather than size. Cluster 1 comprised ~33% of all predictions, which were the most accurate for both conformations (TM-scores ≥ 0.8). Cluster 2 comprised ~52% of all predictions and generally paired either one good prediction (TM-score ≥ 0.8) and one moderate prediction (0.6 ≤ TM-score < 0.8) or two moderate predictions. Cluster 3 comprised the remaining 15.5% of predictions, all of which had at least one poor prediction (TM-score < 0.6). Structural predictions tended to be conformationally homogeneous. Specifically, all 5 models were most similar to Fold1 in over 80% (75/93) of fold-switching sequences (Figure 1B). Additionally, TM-scores of Fold1 and Fold2 were very close (average difference in TM-scores 0.022 ± 0.017) in 13 of the remaining 18 cases, again indicating high levels of structural similarity among AlphaFold2-predicted models. The remaining 5 cases sample both conformations with moderate-to-good accuracy, and representatives are shown in Figure S2. Examples of fold-switching proteins from all three clusters are shown in Figure 1C. In Cluster 1, a short region of MacA, a bacterial cytochrome c peroxidase, switches folds during reductive activation 25. AlphaFold2 predicts that its fold-switching region assumes only the oxidized conformation in its 5 best models. Although all models in Cluster 1 had good TM-scores (≥0.8) for both conformations, they were more similar to Fold1 than Fold2. Good scores for Fold2 likely result from shorter lengths of fold-switching regions compared to the lengths of the remainder of the protein (Table S1A), which had good overall predictions except for the relatively short fold-switching regions.
Indeed, TM-scores comparing predicted and experimentally determined fold-switching regions of proteins in Cluster 1 were biased toward Fold1 (Fig. S3). TM-scores can sometimes mislead, especially when comparing mostly helical segments and/or highly dissimilar structures or sequences. To assess the accuracy of this metric, we used RMSD (root-mean-square deviation) to compare AlphaFold2 models with Fold1 and Fold2, whose ordering was kept identical to TM-score ordering. In other words, for the RMSD calculations, Fold1 and Fold2 were not reordered by accuracy but rather maintained the same ordering assigned by TM-score (Methods, Ordering conformations). This allows consistent comparison between RMSD and TM-score calculations. The authors of AlphaFold2 also combined TM-scores and RMSDs to assess the accuracy of their models 1. As with TM-scores, AlphaFold2 predictions tended to have better RMSDs from Fold1 than Fold2 (Figure S4A). Specifically, predictions where RMSD was ≤ 5Å for at least one structure were better for Fold1 in 83% of cases. Additionally, median/mean RMSDs were significantly lower for Fold1 (2.9/5.7Å) than Fold2 (9.6/11.9Å). TM-scores were plotted against sequence identities (calculated on the alignment generated by TM-align) between the protein and the prediction (Figure S4B). For ambiguous cases with sequence alignments and TM-scores <0.5, the RMSD values were also large (mostly >10 Å), as seen in the bar plot inset. Hence, structural deviations in these ambiguous cases are corroborated by high RMSD values. Finally, TM-scores and RMSDs of AlphaFold2 models vs. experimentally determined fold switchers were significantly correlated: Pearson R: -0.62, p < 3.3 × 10^-98 (assuming normal distribution, Figure S4C). Together, these results indicate that AlphaFold2 preferentially predicts one fold-switch conformation over another.
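The ordering rule and the identity-line comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' script: the function names and the labels "A" and "B" for the two experimentally determined structures are hypothetical, and pairs that do not meet the three-of-five rule return `None`, reflecting the exceptional cases that were ordered by inspection in the text.

```python
def order_folds(tm_vs_a, tm_vs_b, min_wins=3):
    """Decide which experimental structure is Fold1.

    tm_vs_a, tm_vs_b: TM-scores of the five models against structures A and B.
    Returns (fold1, fold2) labels, or None if neither structure has the
    higher TM-score in at least `min_wins` models.
    """
    wins_a = sum(a > b for a, b in zip(tm_vs_a, tm_vs_b))
    wins_b = sum(b > a for a, b in zip(tm_vs_a, tm_vs_b))
    if wins_a >= min_wins:
        return "A", "B"
    if wins_b >= min_wins:
        return "B", "A"
    return None


def fraction_below_identity(points):
    """Fraction of (TM vs Fold1, TM vs Fold2) points with Fold2 < Fold1,
    i.e. points that fall below the identity line of the scatterplot."""
    below = sum(1 for tm_fold1, tm_fold2 in points if tm_fold2 < tm_fold1)
    return below / len(points)
```

With made-up scores such as `order_folds([0.90, 0.88, 0.91, 0.60, 0.87], [0.50, 0.52, 0.49, 0.70, 0.50])`, structure A has the higher TM-score in four of five models and would be designated Fold1.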
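The correlation between TM-scores and RMSDs reported above is a standard Pearson coefficient. The sketch below shows the calculation in plain Python on hypothetical paired values; it does not reproduce the p-value, and the function name is an assumption of this example. A negative coefficient is expected because higher TM-scores and lower RMSDs both indicate closer agreement.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centered products and centered squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired values for illustration only (not data from this study).
example_tm_scores = [0.95, 0.80, 0.62, 0.45, 0.30]
example_rmsds = [1.2, 3.5, 6.0, 11.0, 15.5]
example_r = pearson_r(example_tm_scores, example_rmsds)
```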
Only a few fold switchers have either been shown to populate two folds simultaneously in solution or populate two distinct crystal forms 10; 26; 31; 32. Here, we found 7 (Table S4), all of which had only 1 conformation predicted by AlphaFold2. More typically, fold-switching proteins assume a more stable "ground" state and a less stable "excited" state 33. Thus, we classified the remaining 86 protein pairs into "ground" and "excited-state" conformations. We define ground state in three ways: first as isolated protein when the other conformation binds a ligand, second as a preferred conformation suggested by the literature, such as the ground state tetrameric conformation of KaiB 34, and third as one of two bound conformers (seven cases in Table S4). This third definition gives AlphaFold2 the benefit of the doubt when both structures are ligand-bound. One might argue that if AlphaFold2 captures the ground state, its predictions could reasonably be considered correct. It does so in 76% of cases, but not in the remaining 24%. For example, it predicts that KaiB assumes an "excited" thioredoxin fold 33 in 5/5 cases. Combining all 93 fold-switch pairs, AlphaFold2 captures the ground state conformation 70% of the time, but misses it in the remaining 30%. As with fold-switching proteins, AlphaFold2 frequently mispredicts the conformations of IDPs. AlphaFold2's prediction confidences were compared among fold-switching regions, single-folding regions within the fold-switching proteins, and intrinsically disordered regions in IDPs. AlphaFold2 was run on 99 IDP sequences randomly selected from the DisProt database 15 (Methods). The pLDDT scores of predicted IDPs were compared with those of the fold-switching and single-folding protein regions determined previously (Methods). Figure 2 shows that IDPs have lower average pLDDT scores (55 ± 24) than fold-switching (80 ± 20) and single-folding (87 ± 16) sequences.
Furthermore, 74%/87% of fold-switching/single-folding residues had good pLDDT scores (≥70), compared with only 30% for IDPs. Finally, the overall distributions of all three sets of sequences were statistically dissimilar (p ~ 0, Kolmogorov-Smirnov and Epps-Singleton tests). Together, these results demonstrate that, in contrast to IDPs, AlphaFold2 predictions of fold-switching sequences have relatively high confidences, though not quite as high as single-folding protein regions. AlphaFold2 does not capture the conformational heterogeneity of IDPs or fold switchers particularly well. While its prediction confidences are generally low for IDPs, they are higher for fold switchers. We investigated whether the low prediction confidences of IDPs resulted from their rapid rates of sequence evolution 35. This often confounds construction of statistically robust multiple sequence alignments, necessary inputs for generating accurate distance restraints for protein structure prediction 28. Thus, stronger conservation of fold-switching sequences could be a possible explanation for higher pLDDT scores. We calculated the conservation scores of the sequences from our set of 98 fold switchers and compared them with 99 IDPs. Specifically, we ran Rate4Site 20 on multiple sequence alignments generated and used by AlphaFold2 for predictions of fold-switching proteins and IDPs.
Figure 2. Distributions of AlphaFold2 predictions, measured by pLDDT scores, differ between fold-switching (blue), single-fold (gray), and intrinsically disordered (red) protein sequences. Lower pLDDT scores indicate lower prediction confidences. Thus, AlphaFold2 is generally less confident in its predictions of IDPs than fold-switching or single-folding proteins.
These results suggest that AlphaFold2 predictions of fold-switching proteins may have higher confidences because their sequences are more highly conserved than for IDPs.
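The per-residue confidence summaries used in this comparison (mean ± standard deviation, the fraction of residues with pLDDT ≥ 70, and a two-sample distribution comparison) can be sketched with the Python standard library. This is an illustrative sketch only: the analyses in this study used the SciPy implementations of the Kolmogorov-Smirnov and Epps-Singleton tests, whereas the function below computes only the Kolmogorov-Smirnov statistic and no p-value. The function names and the use of the sample standard deviation are assumptions of this example.

```python
import statistics
from bisect import bisect_right


def summarize_plddt(scores, good_cutoff=70.0):
    """Mean, sample standard deviation, and fraction of 'good' pLDDT scores."""
    return {
        "mean": statistics.mean(scores),
        "sd": statistics.stdev(scores),
        "fraction_good": sum(s >= good_cutoff for s in scores) / len(scores),
    }


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute
    difference between the two empirical cumulative distributions."""
    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    largest = 0.0
    # The maximum difference is attained at one of the observed values.
    for value in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, value) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, value) / len(b_sorted)
        largest = max(largest, abs(cdf_a - cdf_b))
    return largest
```

A statistic near 1 indicates almost non-overlapping distributions, and a statistic near 0 indicates nearly identical ones.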
Frequencies of good AlphaFold2 prediction confidences (pLDDT scores ≥ 70) increase with residue conservation for fold-switching, single-folding and disordered proteins (Figure 4A). Thus, poorly conserved sequences, specifically conservation scores of 1-3, are associated with lower prediction confidences. At least for IDPs, this is likely explained by poorly conserved sequences having shallower and/or poorly aligned MSAs 1. Nevertheless, shallow MSAs do not fully explain why AlphaFold2 prediction confidences are higher for fold switchers than for IDPs. Good pLDDT scores level off at around a conservation level of 4 for both fold switchers and single folders, whereas pLDDT scores continue to increase with conservation level for disordered proteins (Figure 4A). Furthermore, for all conservation levels, distributions of prediction confidences for disordered residues were skewed systematically lower than corresponding distributions of fold-switching and single-folding residues (Figure 4B). Finally, prediction confidences for fold switchers and MSA depth are uncorrelated, as evidenced by a Pearson correlation coefficient of 0.02 (Figure S5).
Figure 3. Conservation scores, as indicated by Rate4Site grades (1 = poorly conserved; 9 = well conserved), differ between fold-switching (blue), single-fold (gray), and intrinsically disordered (red) protein sequences. Sequences of fold-switching and single-fold proteins tend to be more conserved than IDP sequences.
Figure 4. Distributions of prediction confidences (quantified by pLDDT scores) are skewed lower for disordered proteins (red) than for single-fold (gray) and fold-switching proteins (blue) (B). Wider regions correspond to more populated prediction confidences. In both cases, conservation score was determined using Rate4Site; higher scores correspond to more conserved sequences. Gray/white backgrounds group protein regions with the same conservation score.
AlphaFold2 is a major advance in protein structure prediction 1, particularly for single-fold proteins. Nevertheless, its predictions often fail for proteins whose properties are not fully apparent from solved protein structures, such as IDPs [5] [6] [7]. Although AlphaFold2 can be used to predict alternative quaternary structures 38; 39, here we show that it consistently fails to predict the conformational diversity of fold switchers, proteins that assume multiple secondary structure configurations. Specifically, AlphaFold2 failed to predict fold switching in its top 5 models for 75/93 proteins. Instead, it consistently predicted that fold switchers assume one dominant fold. Since proteins are typically assumed to have one fold, this result is not surprising, especially because AlphaFold2's training set, the Protein Data Bank, contains relatively few fold-switching proteins 8; 9. It is notable, however, that AlphaFold2's predictions miss the ground state of fold switchers 30% of the time. This is further evidence that its predictions are primarily rooted in sophisticated pattern recognition, not protein biophysics 37. Unlike IDPs, prediction confidences for fold-switching sequences are relatively high (74% have pLDDT scores ≥ 70 compared with 30% for IDPs). This result, combined with the weak relationship between pLDDT distributions and conservation scores for fold-switching proteins, suggests that AlphaFold2 assumes that stably folded proteins assume one dominant structure. This assumption leads it to miss biologically relevant structural information for some proteins, despite high-confidence predictions. It also raises the question of how much of the full picture AlphaFold2's full-genome predictions 7 capture. The dramatic structural rearrangements of fold-switching proteins regulate biological processes 40 and are associated with numerous diseases, including COVID-19 27, cancer 41, Alzheimer's 42, and malaria 31. Thus, predicting fold-switching proteins is an important problem.
While some progress has been made 12; 43; 44, much work remains to identify features unique to fold-switching proteins. Furthermore, detailed biophysical characterization of fold-switching proteins 45; 46 is needed. These challenges present an opportunity to improve predictive methods and possibly identify fundamental biophysical principles that are not yet well understood. Such discoveries could help to advance the field of protein structure prediction from sophisticated pattern recognition to methods based fully on protein biophysics.
Fig. S3 AlphaFold2 predictions of fold-switching regions of proteins from Cluster 1, whose overall folds are predicted well (TM-scores ≥ 0.8, Figure 1A), are more accurate for Fold1 than Fold2. Prediction accuracies were quantified using TM-scores, and 41% of predictions were inaccurate (TM-score < 0.6) for both experimentally determined conformations of fold-switching regions.
Sequence identities are also included. (C) List of fold-switching protein pairs (PDBID and chain) used for the analysis; the first column corresponds to Fold1 and the second to Fold2, followed by TM-scores of the top 5 predicted models. Tables attached separately.
Table S3. List of pairs in which sequences are not identical in length and sequence, where structure prediction was performed using both sequences for the pair. Attached separately.
Table S4. Ground and excited states for all fold-switch pairs. Pairs likely to sample both folds at equilibrium are bold. ** denotes excited state. AlphaFold2 predictions of pairs with 2 excited states were considered to capture the ground state.
Highly accurate protein structure prediction with alphafold The breakthrough in protein structure prediction The protein data bank Reframing the protein folding problem: Entropy as organizer The alphafold database of protein structures: A biologist's guide Alphafold and implications for intrinsically disordered proteins Highly accurate protein structure prediction for the human proteome Extant fold-switching proteins are widespread Interconversion between two unrelated protein folds in the lymphotactin native state Proteins that switch folds A sequence-based method for predicting extant fold switchers that undergo alpha-helix <--> beta-strand transitions Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features The protein coil library: A structural database of nonhelix, nonstrand fragments derived from the pdb Disprot in 2022: Improved quality and accessibility of protein intrinsic disorder annotation Tm-align: A protein structure alignment algorithm based on the tmscore How significant is a protein structure similarity with tm-score = 0.5? 
Scikit-learn: Machine learning in python Inaccurate secondary structure predictions often indicate protein fold switching Comparison of site-specific rate-inference methods for protein sequences: Empirical bayesian methods are superior Scipy 1.0: Fundamental algorithms for scientific computing in python The pymol molecular graphics system, version 2.0 schrödinger, llc Matplotlib: A 2d graphics environment Seaborn: Statistical data visualization Maca is a second cytochrome c peroxidase of geobacter sulfurreducens An alpha helix to beta barrel domain switch transforms the transcription factor rfah into a translation factor Comparative host-coronavirus protein interaction networks reveal pan-viral disease mechanisms Assessing the utility of coevolution-based residueresidue contact predictions in a sequence-and structure-rich era Alphafold and the amyloid landscape Energy landscapes of protein aggregation and conformation switching in intrinsically disordered proteins Secondary structure reshuffling modulates glycosyltransferase function at the membrane The mad2 conformational dimer: Structure and implications for the spindle assembly checkpoint Circadian rhythms. 
A protein fold switch joins the circadian oscillator to clock output in cyanobacteria Structural basis of the day-night transition in a bacterial circadian clock Comparing models of evolution for ordered and disordered proteins The protein folding problem Current structure predictors are not learning the physics of protein folding Alphafold2 predicts the inward-facing conformation of the multidrug transporter lmrp Sampling alternative conformational states of transporters and receptors with alphafold2 Functional and regulatory roles of fold-switching proteins Clic1 promotes the progression of gastric cancer by regulating the mapk/akt pathways Clic1 function is required for beta-amyloid-induced generation of reactive oxygen species by microglia Sequence-based prediction of metamorphic behavior in proteins A high-throughput predictive method for sequence-similar fold switchers Probing transient excited states of the bacterial cell division regulator mine by relaxation dispersion nmr spectroscopy Evolution of fold switching in a metamorphic protein We thank Loren Looger for critically reading this manuscript. This work utilized resources from the NIH HPS Biowulf cluster (http://hpc.nih.gov), and it was supported by the Intramural Research Program of the National Library of Medicine, National Institutes of Health. Fig. S1 K-means inertia versus number of clusters. The optimal value, determined by finding the minimal number of clusters whose second derivative is less than that of its immediate neighbors, is shown in red.
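The selection rule in the Fig. S1 caption (the smallest number of clusters whose second derivative of inertia is lower than that of both immediate neighbors) can be sketched as follows. This is a minimal sketch under assumptions, not the authors' script: the second derivative is approximated here by a discrete second difference, the function names are hypothetical, and the inertia values are invented for illustration. In practice the inertias could be taken from the `inertia_` attribute of fitted scikit-learn `KMeans` models, one per value of k.

```python
def second_differences(inertias):
    """Discrete second difference of inertia for each interior k.

    inertias[i] is the k-means inertia obtained with k = i + 1 clusters.
    Returns {k: I(k-1) - 2*I(k) + I(k+1)}.
    """
    return {
        k: inertias[k - 2] - 2 * inertias[k - 1] + inertias[k]
        for k in range(2, len(inertias))
    }


def first_curvature_minimum(inertias):
    """Smallest k whose second difference is below both neighbors', or None."""
    curvature = second_differences(inertias)
    for k in sorted(curvature):
        if k - 1 in curvature and k + 1 in curvature:
            if curvature[k] < curvature[k - 1] and curvature[k] < curvature[k + 1]:
                return k
    return None


# Invented inertias for k = 1..7, for illustration only.
example_inertias = [100.0, 50.0, 30.0, 12.0, 10.0, 9.0, 8.5]
example_k = first_curvature_minimum(example_inertias)
```

The rule needs inertias for at least five consecutive values of k, because a local minimum requires a second difference on either side.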