key: cord-0282456-927rze2h authors: Pak, Marina A.; Markhieva, Karina A.; Novikova, Mariia S.; Petrov, Dmitry S.; Vorobyev, Ilya S.; Maksimova, Ekaterina S.; Kondrashov, Fyodor A.; Ivankov, Dmitry N. title: Using AlphaFold to predict the impact of single mutations on protein stability and function date: 2021-09-20 journal: bioRxiv DOI: 10.1101/2021.09.19.460937 sha: d032944bd2dcb33981fa9e072c47870dbd58a813 doc_id: 282456 cord_uid: 927rze2h
AlphaFold changed the field of structural biology by achieving three-dimensional (3D) structure prediction from protein sequence at experimental quality. The astounding success even led to claims that the protein folding problem is “solved”. However, the protein folding problem is more than just structure prediction from sequence. Presently, it is unknown whether the AlphaFold-triggered revolution could help to solve other problems related to protein folding. Here we assay the ability of AlphaFold to predict the impact of single mutations on protein stability (ΔΔG) and function. To study this question, we extracted metrics from AlphaFold predictions before and after a single mutation in a protein and correlated the predicted change with experimentally known ΔΔG values. Additionally, we correlated the AlphaFold predictions of the impact of a single mutation on structure with a large-scale dataset of single mutations in GFP with experimentally assayed levels of fluorescence. We found a very weak or no correlation between AlphaFold output metrics and the change of protein stability or fluorescence. Our results imply that AlphaFold cannot be immediately applied to other problems or applications in protein folding.
AlphaFold is widely claimed to have revolutionized protein 3D structure prediction from protein sequence, a 50-year-old challenge of protein physics and structural bioinformatics [1].
The fourteenth round of CASP, a blind competition on protein 3D structure prediction [2], demonstrated that AlphaFold, a newcomer to the field, significantly outperforms all other methods. Crucially, the accuracy of AlphaFold models was comparable to that of structures solved by experimental methods such as X-ray crystallography, NMR, and cryo-EM [3]. 'It will change everything', said Andrei Lupas in an interview with Nature [3]. One of the primary changes may be that AlphaFold also helps to solve other problems related to protein folding. These problems include the prediction of various protein interactions, such as protein-protein, protein-ligand and protein-DNA/RNA, and the prediction of the impact of mutations on protein stability. AlphaFold proved to be useful for experimental determination of protein structures with molecular replacement phasing [4, 5] and has already facilitated elucidation of SARS-CoV-2 protein structures [6, 7]. Furthermore, the AlphaFold developers, in collaboration with EMBL-EBI, launched a global initiative to construct structure models for the whole protein sequence space [8]. The database of freely available structures of all human proteins and of the proteins of 20 other key organisms, including those yet to be determined experimentally, is expected to "revolutionize the life sciences" [3]. AlphaFold is also expected to bring new insights into our understanding of the structural organization of proteins and to boost the development of new drugs and vaccines [9]. Researchers in the field are already actively testing AlphaFold performance in various bioinformatics tasks, for instance, in peptide-protein docking [10, 11]. Guided by the expected immediate impact of AlphaFold on the solution of a wide range of problems in structural bioinformatics, we explored the capacity of AlphaFold predictions to serve as a proxy for the impact of mutations on protein stability change (∆∆G).
Although AlphaFold provides a disclaimer that it "has not been validated for predicting the effect of mutations" (https://alphafold.ebi.ac.uk/faq), the expectations of AlphaFold are so high that we judged it prudent to check how well AlphaFold predictions could work for estimation of ∆∆G values. We found that the difference between pLDDT scores, the only local AlphaFold prediction metric reported in the output PDB file, had a very weak correlation with experimentally determined ∆∆G values (Pearson correlation coefficient, PCC = 0.17). The difference in the global AlphaFold metric, the pLDDT averaged over all residues, shows no correlation, either on its own or in combination with the mutated residue's pLDDT score. Similarly, the same AlphaFold metrics had a very weak correlation with the impact of single mutations on the function (fluorescence) of GFP. Recent results show that the use of AlphaFold models instead of template structures does not improve ∆∆G prediction by FoldX [12, 13]. Taken together, so far we do not see a use for AlphaFold to help solve the problem of predicting the impact of a mutation on protein stability. The availability of AlphaFold models allows applying more accurate 3D protein structure-based ∆∆G predictors rather than sequence-based ∆∆G predictors; the bottleneck still seems to be the accuracy of current 3D protein structure-based ∆∆G predictors.
We used experimental data on protein stability changes upon single-point variations from ThermoMutDB [14]. After the filtering procedure (see Methods) we randomly chose 976 mutations in 90 proteins for our analysis. For the multiple linear regression analysis, the dataset was split into two sets, a training and a testing set. The split was based on BLAST [15] results, such that the mutations were assigned to the testing set if the corresponding proteins had <50% sequence identity to any other protein in the entire dataset (see Methods).
All of the other mutations were assigned to the training set. Along with the coordinates of all heavy atoms of a protein, an AlphaFold model contains "its confidence in form of a predicted lDDT-Cα score (pLDDT) per residue" [1]. lDDT ranges from 0 to 100 and is a superposition-free metric indicating to what extent the protein model reproduces the reference structure [16]. The pLDDT scores averaged across all residues designate the overall confidence for the whole protein chain (⟨pLDDT⟩). For each mutation in the dataset, we calculated the difference in pLDDT between the wild-type and mutant structures at the mutated position (∆pLDDT), as well as the difference in ⟨pLDDT⟩ between the wild-type and mutant protein structure models (∆⟨pLDDT⟩). By checking ∆pLDDT and ∆⟨pLDDT⟩ values as potential proxies for the change of protein stability, we explored the hypothesis that the change of protein stability due to mutation is somehow reflected in the difference of AlphaFold confidence between wild-type and mutant structures.
First, we studied the relationship between the effect of mutation on protein structure stability and the difference in the accuracy of protein structure prediction by AlphaFold for the wild-type and mutant proteins. We did not observe a pronounced correlation between the mutation effect and the difference in confidence metrics (Figure 1). The correlation coefficient is 0.17 ± 0.03 (p-value < 10⁻⁷) for ∆pLDDT and 0.00 ± 0.03 (p-value = 0.93) for ∆⟨pLDDT⟩. Since the confidence metrics for a given amino acid and for the whole protein are weakly correlated (PCC = 0.20 ± 0.03, p-value < 10⁻⁹), we then explored how their combination correlates with the effect of mutation. A multiple linear regression model resulted in the dependence ∆∆G = -1.08 + 0.11·∆pLDDT + 0.05·∆⟨pLDDT⟩. We did not obtain any pronounced correlation either for the training set (0.12 ± 0.05, p-value = 0.03) or for the test set (0.20 ± 0.04, p-value < 10⁻⁵). We explored the outliers with high absolute values of ∆pLDDT in Figure 1A.
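The two confidence differences and their correlation with ∆∆G, as described above, can be sketched as follows. This is a minimal illustration rather than the code used in the study. The function names and toy numbers are hypothetical, and the sign convention (mutant minus wild type) is an assumption. The text does not state how the ± values were obtained; the standard-error formula sqrt((1 - r²)/(n - 2)) used here is also an assumption, although it gives ≈0.03 for r = 0.17 and n = 976, in line with the reported figures.

```python
import math
from typing import Sequence


def delta_plddt(wt: Sequence[float], mut: Sequence[float], position: int) -> float:
    """Change in pLDDT at the mutated residue (1-based position).

    Assumption: the difference is taken as mutant minus wild type.
    """
    if len(wt) != len(mut):
        raise ValueError("a single substitution must preserve chain length")
    return mut[position - 1] - wt[position - 1]


def delta_mean_plddt(wt: Sequence[float], mut: Sequence[float]) -> float:
    """Change in the pLDDT averaged over all residues (mutant minus wild type)."""
    return sum(mut) / len(mut) - sum(wt) / len(wt)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_standard_error(r: float, n: int) -> float:
    """Standard error of r, assuming the textbook formula sqrt((1 - r^2)/(n - 2))."""
    return math.sqrt(max(0.0, 1.0 - r * r) / (n - 2))


if __name__ == "__main__":
    # Toy per-residue pLDDT profiles (hypothetical values, not data from the study).
    wild_type = [90.0, 80.0, 70.0]
    mutant = [90.0, 60.0, 70.0]
    print(delta_plddt(wild_type, mutant, position=2))
    print(delta_mean_plddt(wild_type, mutant))
    # Standard error implied by the reported correlation and sample size.
    print(round(pearson_standard_error(0.17, 976), 2))
```

In a real analysis each mutation would contribute one (∆pLDDT, ∆∆G) pair, and `pearson` would be applied to the two resulting lists.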
We noticed an association between the ∆pLDDT score and the change of side chain size upon mutation; to check the association, we conducted the Kruskal-Wallis H-test. The ∆pLDDT scores differed significantly between groups of mutations with different changes in amino acid volume (p-value = 0.02). We observed that most of the outlier mutations had a destabilizing effect. We therefore assessed the correlation between the destabilizing effect of a mutation and the decrease in the confidence score. The PCC was greater than for the whole set of mutations, but still not strong (PCC = 0.38 ± 0.08, p-value < 10⁻⁵). However, the size-changing mutations also populate the region of near-zero ∆pLDDT values, so for these mutations ∆∆G is also poorly predicted from ∆pLDDT (PCC = 0.17 ± 0.07, p-value = 0.01).
Protein stability is intimately coupled with protein functionality. Thus, a reasonable hypothesis holds that the loss of protein functionality due to mutations in most cases results from reduced stability [17]. Therefore, along with testing the correlation of AlphaFold metrics with ∆∆G, it is reasonable to test the correlation of AlphaFold metrics with protein function. Furthermore, the change of pLDDT scores may contribute directly to protein functionality without contributing to protein stability. We checked the correlation between ∆pLDDT values and the fluorescence level of 447 randomly chosen single GFP mutants from [18]. The correlation coefficient is 0.14 ± 0.05 (p-value = 0.003) for ∆pLDDT and 0.16 ± 0.05 (p-value < 10⁻³) for ∆⟨pLDDT⟩, the change in pLDDT averaged over all residues (Figure 2).
The extraordinary success of AlphaFold in predicting protein 3D structure from protein sequence may lead to the temptation to apply this tool to other questions in structural bioinformatics. Here we checked the potential of AlphaFold metrics to serve as a predictor of the impact of mutation on protein stability and function. We found a weak correlation of 0.17 ± 0.03 between ∆pLDDT and ∆∆G associated with specific mutations.
Although the correlation was statistically significant (p-value < 10⁻⁷), it is so weak that it cannot be used for accurate ∆∆G predictions (Figure 1), and it is unclear how such predictions can be used in practical applications. Clearly, ∆pLDDT would show a better correlation with ∆∆G if it were measured across bins of averaged ∆∆G. Alternatively, ∆pLDDT could be a separate term in a multiple linear regression model. The averaged metric ∆⟨pLDDT⟩ shows a correlation with ∆∆G that is statistically indistinguishable from zero. However, a linear combination of the two metrics, ∆pLDDT and ∆⟨pLDDT⟩, does not greatly improve the correlation. As for the loss-of-function prediction, the correlation with the impact of mutation on GFP fluorescence showed similar results: PCC was 0.14 ± 0.05 and 0.16 ± 0.05 for ∆pLDDT and ∆⟨pLDDT⟩, respectively (Figure 2). Taken together, our data indicate that AlphaFold predictions cannot be used directly to reliably estimate the impact of mutation on protein stability or function.
Figure 2: Correlation between the GFP fluorescence and the change of the confidence score of structure prediction for the mutated amino acid (A) and for the whole structure (B).
But why should we have expected such a correlation in the first place? Indeed, AlphaFold was not designed to predict the change of protein stability or function due to mutation. In the words of the authors, "AlphaFold is not expected to produce an unfolded protein structure given a sequence containing a destabilising point mutation" (https://alphafold.ebi.ac.uk/faq). However, the only reason for a protein to fold into a distinct native structure is the stability of this structure, so the protein 3D structure and its stability are closely connected. Logically, an algorithm predicting protein 3D structure from sequence should search for the most stable 3D state under the native (or standard) conditions.
If a compact structure becomes unstable (for example, due to mutation), then we might expect the algorithm to shift its predictions toward an unfolded state. Evidence in favor of this point of view is the successful prediction of natively disordered protein regions by AlphaFold and the correlation between the decrease of pLDDT and the propensity to be in a disordered region [19]. Thus, it is not unreasonable to expect a decrease in the confidence score of the mutated residue or of the whole native structure. Our results show that repurposing AlphaFold for ∆∆G prediction did not work for the proteins we studied. However, AlphaFold may still help to predict the impact of a mutation on protein stability or function if AlphaFold 3D models are used as input for 3D-structure-based ∆∆G predictors. Indeed, it was reported many times that 3D-based predictors perform better than 1D-based ones [20, 21, 22], so the availability of a pool of high-quality predicted 3D structures could be a plus. However, the resulting predictions are going to be far from perfect: 3D-structure-based ∆∆G predictors are imperfect even when using 3D structures from the PDB [23], showing a correlation of 0.59 or less in independent tests [24]. Moreover, using AlphaFold models instead of PDB structures does not make ∆∆G predictions more accurate [13], so predictions based on AlphaFold models are expected to show a correlation of approximately 0.59 with experimental ∆∆G, which may be too low for many applications. Will the newly AlphaFold-generated 3D structures for proteins with unknown 3D structures be useful for the training of new ∆∆G predictors? It seems unlikely, because the vast majority of the current experimental ∆∆G values relate to proteins with known 3D structures. Thus, the newly generated structures will be useful to the same extent as the entire PDB has been across the last few decades.
The deep learning approach demonstrated by AlphaFold may be an inspiring example for the development of a deep learning ∆∆G predictor. However, we see a dramatic difference between the situations with 3D structure prediction and ∆∆G prediction that may impede this development. The difference is in the amount of available data. For protein structure prediction AlphaFold used the PDB with >180,000 files, each containing a wealth of information. In contrast, the number of experimentally measured ∆∆G values is of the order of 10,000, and these are just numbers without accompanying extra data. To make a rough comparison of information in bits, PDB structures occupy 100 Gb, while all the known experimentally measured ∆∆G values occupy 10 kb. Neural networks are very sensitive to the amount of information in the training set, so the ability of deep learning to tackle the ∆∆G prediction task at present looks hindered mostly by the lack of experimental data.
Overall, we explored the capacity of direct prediction of ∆∆G by all of the reported AlphaFold metrics: (i) the difference in the pLDDT score before and after mutation in the mutated position, (ii) the difference in the averaged pLDDT score across all positions before and after mutation. We found that the correlation was weak or absent, and, therefore, AlphaFold predictions are unlikely to be useful for ∆∆G predictions. Taken together with our recent result that AlphaFold models are not better for ∆∆G predictions by FoldX than the best templates [13], we see no straightforward way to use AlphaFold advances for solving the task of predicting ∆∆G upon mutation. The task of ∆∆G prediction should be solved separately, and it will face the problem of the limited amount of data for training neural networks.
The data on experimentally measured effects of mutations on protein stability were taken from ThermoMutDB [14] (version 1.3). From 13,337 mutations in the database we extracted single-point mutations with data on ∆∆G.
The filtered dataset contained 5712 mutations in 286 proteins. We performed the analysis for 976 randomly chosen mutations in 90 proteins. We took data on fluorescence levels of GFP mutants from [18]. From the original dataset we randomly extracted 447 single mutants for our analysis. The wild-type protein structures were retrieved from the AlphaFold Protein Structure Database (AlphaFold DB) [8] by their UniProt accession code. The structures of original proteins that were absent from the AlphaFold DB, as well as the structures of mutant proteins, were modeled by the standalone version of AlphaFold [1] using the FASTA file with the UniProt sequence of a protein as the only input in the '--fasta_paths' flag. The per-residue predicted local distance difference test (pLDDT) confidence scores for the protein structure models downloaded from the AlphaFold DB were retrieved from the B-factor field of the coordinate section of the PDB file. The pLDDT confidence scores for the protein structure models that we predicted by standalone AlphaFold were extracted from the "plddt" array of the pickle file for the best-ranked model. To identify the sequence identities of the proteins in the dataset of mutations, we performed a protein BLAST [15] search of the protein sequences against themselves. We divided the dataset into training and test sets for the linear regression model based on an arbitrary sequence identity threshold of 50%. Mutations in proteins above the threshold comprised the training set, and the rest of the mutations were used as the test set. The training and test sets comprised 364 mutations in 60 proteins and 612 mutations in 30 proteins, respectively. The multiple linear regression fit with two parameters was performed using the linear_model module of the scikit-learn library with default parameters.
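As an illustration of reading pLDDT from the B-factor field, the sketch below parses the fixed-width ATOM records of a PDB file and keeps one value per residue, taken from the Cα atom. It is a minimal sketch under stated assumptions, not the script used in the study: the function names are hypothetical, insertion codes and multi-model files are ignored, and `atom_line` exists only to build well-formed demo records.

```python
from typing import Dict, Tuple


def plddt_from_pdb(pdb_text: str) -> Dict[Tuple[str, int], float]:
    """Map (chain ID, residue number) to the pLDDT stored in the B-factor field.

    PDB ATOM records are fixed-width: atom name in columns 13-16, chain ID in
    column 22, residue number in columns 23-26, B-factor in columns 61-66.
    Only C-alpha atoms are read, giving one score per residue.
    """
    scores: Dict[Tuple[str, int], float] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  "):
            continue  # skip HETATM, TER, REMARK and other records
        if line[12:16].strip() != "CA":
            continue  # keep a single atom per residue
        chain_id = line[21]
        residue_number = int(line[22:26])
        scores[(chain_id, residue_number)] = float(line[60:66])
    return scores


def atom_line(serial: int, name: str, res_name: str, chain_id: str,
              res_seq: int, b_factor: float) -> str:
    """Hypothetical demo helper: build a fixed-width ATOM record.

    `name` must be the 4-character atom name field, e.g. " CA ".
    Coordinates are set to zero and occupancy to 1.00.
    """
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain_id}{res_seq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b_factor:6.2f}"
    )


if __name__ == "__main__":
    # Two-residue toy model (hypothetical values, not data from the study).
    demo = "\n".join([
        atom_line(1, " N  ", "MET", "A", 1, 45.12),
        atom_line(2, " CA ", "MET", "A", 1, 45.12),
        atom_line(3, " CA ", "LYS", "A", 2, 91.50),
    ])
    print(plddt_from_pdb(demo))
```

For models produced by standalone AlphaFold, the text above states that the scores were instead read from the "plddt" array of the result pickle, so no coordinate parsing is needed in that case.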
[1] Highly accurate protein structure prediction with AlphaFold
[2] Critical assessment of methods of protein structure prediction (CASP)-Round XIII
[3] "It will change everything": DeepMind's AI makes gigantic leap in solving protein structures
[4] Assessing the utility of CASP14 models for molecular replacement
[5] AlphaFold2 transmembrane protein structure prediction shines
[6] CryoEM and AI reveal a structure of SARS-CoV-2 Nsp2, a multifunctional protein involved in key host processes
[7] Crystallographic molecular replacement using an in silico-generated search model of SARS-CoV-2 ORF8
[8] Highly accurate protein structure prediction for the human proteome
[9] Can We AlphaFold Our Way Out of the Next Pandemic
[10] Can AlphaFold2 predict protein-peptide complex structures accurately?
[11] Harnessing protein folding neural networks for peptide-protein docking
[12] Predicting Changes in the Stability of Proteins and Protein Complexes: A Study of More Than 1000 Mutations
[13] Best templates outperform homology models in predicting the impact of mutations on protein stability
[14] ThermoMutDB: a thermodynamic database for missense mutations
[15] Basic local alignment search tool
[16] lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests
[17] Robustness-epistasis link shapes the fitness landscape of a randomly drifting protein
[18] Local fitness landscape of the green fluorescent protein
[19] AlphaFold and Implications for Intrinsically Disordered Proteins
[20] DDGun: an untrained method for the prediction of protein stability changes upon single and multiple point variations
[21] INPS-MD: a web server to predict stability of protein variants from sequence and structure
[22] Accurately Predicting Mutation-Caused Stability Changes from Protein Sequences Using Extreme Gradient Boosting
[23] The Protein Data Bank
[24] Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details
[25] "Zhores" - Petaflops supercomputer for data-driven modeling, machine learning and artificial intelligence installed in Skolkovo Institute of
The authors thank Zimin Foundation and Petrovax for support of the presented study at the School of Molecular and Theoretical Biology 2021. The authors acknowledge the use of Zhores supercomputer [25] for obtaining the results presented in this paper.