key: cord-0899910-cm22vurn
authors: Pucci, Fabrizio; Rooman, Marianne
title: Prediction and evolution of the molecular fitness of SARS-CoV-2 variants: Introducing SpikePro
date: 2021-04-12
journal: bioRxiv
DOI: 10.1101/2021.04.11.439322
sha: 06161f878de6b0539b2cd22e3c4ab2365a9f3f33
doc_id: 899910
cord_uid: cm22vurn

The understanding of the molecular mechanisms driving the fitness of the SARS-CoV-2 virus and its mutational evolution is still a critical issue. We built a simplified computational model, called SpikePro, to predict the SARS-CoV-2 fitness from the amino acid sequence and structure of the spike protein. It contains three contributions: the viral transmissibility predicted from the stability of the spike protein, the infectivity computed in terms of the affinity of the spike protein for the ACE2 receptor, and the ability of the virus to escape from the human immune response based on the binding affinity of the spike protein for a set of neutralizing antibodies. Our model reproduces well the available experimental, epidemiological and clinical data on the impact of variants on the biophysical characteristics of the virus. For example, it is able to identify circulating viral strains that, by increasing their fitness, recently became dominant at the population level. SpikePro is a useful instrument for the genomic surveillance of the SARS-CoV-2 virus, since it predicts in a fast and accurate way the emergence of new viral strains and their dangerousness. It is freely available in the GitHub repository github.com/3BioCompBio/SpikeProSARS-CoV-2.

Despite mitigation measures put in place around the world to slow down the fast spreading of the SARS-CoV-2 [...] use. Thanks to these developments, large-scale vaccine administration is now ongoing throughout the world. Moreover, while the pathogenic mechanisms of the viral infection are still unclear, effective therapeutic agents have been developed.
For example, neutralizing antibodies (nAbs) targeting the viral spike protein or human convalescent plasma have been employed in clinical practice by passively transferring them to patients [10] [11] [12] [13]. This therapy generally leads to an improvement of the disease conditions and to a reduction of viral load.

The increase in viral immunity at the population level due to infection, vaccination or passive immunization [...]

However, the prediction of how SARS-CoV-2 evolves under this selective pressure is far from obvious. Indeed, even though SARS-CoV-2 has a moderate mutation rate compared to other RNA viruses due to its more accurate replication [27], tracking viral dynamics in the huge space of possible variant combinations (including also deletions and insertions) under the influence of human immunity makes predictions highly challenging. Extensive large-scale monitoring of SARS-CoV-2 evolution and host immunity will help to better understand these issues [27].

In this paper, we performed an extensive computational analysis of the mutational mechanisms that lead to the emergence of SARS-CoV-2 strains with increased fitness, with the aim of better understanding the molecular mechanisms that drive viral adaptation and escape from the human immune system. We performed in silico [...]

[...] mutation i on the binding affinity $\Delta\Delta G^{\mathrm{nAb}}_i(p)$ of each nAb/spike protein complex p, and computed their mean value over the 31 complexes from $D_{\mathrm{nAb}}$:

$$\Delta\Delta G^{\mathrm{nAb}}_i = \frac{1}{n_i} \sum_{p} \Delta\Delta G^{\mathrm{nAb}}_i(p)$$

where $n_i$ is the number of structures that include the mutation i, and the sum runs over those structures only. Indeed, the structures of the nAb/spike protein complexes do not cover exactly the same region of the spike protein.

[...] product of three fitness contributions:

$$F_i = f^{S}_i \, f^{\mathrm{ACE2}}_i \, f^{\mathrm{nAb}}_i$$

where $f^{S}$, $f^{\mathrm{ACE2}}$ and $f^{\mathrm{nAb}}$ represent the relative propensities of the mutant virus to be transmitted, to infect the host, and to escape the host's immune system. These propensities are assumed to be higher for spike protein variants [...].
More precisely: [...] where $\mu^{S}$, $\mu^{\mathrm{ACE2}}$, $\mu^{\mathrm{nAb}}$, $b^{S}$, $b^{\mathrm{ACE2}}$ and $b^{\mathrm{nAb}}$ are parameters.

We set by definition the fitness value of the wild type equal to one: $F = 1$.

The global viral fitness, which takes into account multiple mutations in the spike protein, is defined as the product of the fitness values of the individual mutations:

$$F = \prod_{i=1}^{m} F_i$$

where m corresponds to the total number of mutations in the spike protein relative to the wild-type strain. Note that, in doing so, we considered the mutations as independent and discarded possible epistatic effects.

In the course of viral evolution, SARS-CoV-2 and our immune system are constantly engaged in what is known as a cat-and-mouse game, where SARS-CoV-2 attempts to increase its fitness by increasing its transmissibility and infectivity and/or by escaping from the human immune response.

In order to identify mutations in the spike protein that increase or decrease the SARS-CoV-2 transmissibility or infectivity, or that facilitate or block the escape from the protective immunity elicited by the infection, we constructed a [...]

3.2. SARS-CoV-2 spike protein mutagenesis

We first performed a large in silico mutagenesis experiment to study how mutations impact on [...] as defined in Eqs (2-3). We made here the hypothesis that stability is the major ingredient of protein fitness; more precisely, the change in folding free energy caused by mutation of the spike protein ($\Delta\Delta G^{S}$), of the spike protein/ACE2 complex ($\Delta\Delta G^{\mathrm{ACE2}}$), and of the spike protein/nAbs complexes ($\Delta\Delta G^{\mathrm{nAb}}$).

Here we estimated the fitness f of the SARS-CoV-2 virus on the basis of a simplified model which takes into account only the spike protein.
More realistic models consider the whole viral genome with the set of 29 [...]; we leave this point for a future investigation.

We defined the molecular fitness F of a point mutation i as:

$$F_i = f^{S}_i \, f^{\mathrm{ACE2}}_i \, f^{\mathrm{nAb}}_i$$

where $f^{S}$, $f^{\mathrm{ACE2}}$ and $f^{\mathrm{nAb}}$ represent the relative probability of the mutant virus to be transmitted, to infect the host and to escape the host's immune system. The virus transmission probability is considered to be augmented by mutations when the spike protein is more stable ($\Delta\Delta G^{S} < 0$), the infectivity when the spike protein/ACE2 complex is more stable ($\Delta\Delta G^{\mathrm{ACE2}} < 0$), and the viral escape from the immune system when the spike protein/nAbs complex is less stable ($\Delta\Delta G^{\mathrm{nAb}} > 0$). More precisely, we define the f functions of each mutation i as: [...] The parameters have been chosen as: [...]

The choice of these f-functions and parameters is justified as follows: Mutations that strongly destabilize the spike protein ($\Delta\Delta G^{S} \gg 0$ kcal/mol in our conventions) or its binding to ACE2, or that stabilize its binding with neutralizing antibodies, have a fitness close to zero. Stabilizing mutations ($\Delta\Delta G < 0$ kcal/mol) have a fitness higher than one. To avoid excessively high fitness values, we cut the f-functions at $\Delta\Delta G = b$ kcal/mol, with b chosen to be 1; for lower $\Delta\Delta G$'s, f is constant and equal to $\exp[\mu + 1] \approx 3.86$. The values of the PoPMuSiC scores have been shown to be biased towards destabilizing mutations [...]

Our prediction pipeline, called SpikePro, is freely available as an easy-to-use C++ program, which needs a variant spike protein sequence in FASTA format as input. It outputs the sequence alignment with the reference spike protein (UniProt code P0DTC2), the list of all point mutations introduced and the predicted overall viral fitness F. It can be downloaded from github.com/3BioCompBio/SpikeProSARS-CoV-2.

Moreover, we found a very good agreement between the predicted fitness $f^{S}_i$ and the $R_i$ rate, as seen in Fig. 3.b.
Indeed, variants that are predicted to be fitter than the wild type protein, and especially the variants i with $f^{S}_i > 2$, have a high $R_i$ rate, which means that they circulate widely and became fixed during viral evolution. We will examine this point further in Sections 3.6-3.7.

It is important to underline that we did not fit any parameters of our model on the SARS-CoV-2 data. Thus, this prediction as well as all the predictions presented in the following sections are truly blind predictions.

[...] Table 1 define the fitness contribution $f^{\mathrm{nAb}}$ and thus the overall immune escape ability. [...] Table 2. [...] latter study, a stabilization of the spike protein was measured upon D614G substitution via a strengthening of the S1-S2 subunit interactions, where S1 is the receptor binding subunit containing the RBD and S2 is the membrane fusion subunit. In contrast, this variant was shown to alter neither the binding of the spike protein to ACE2 nor the antibody neutralization, as it is situated outside the RBD [51]. We also correctly reproduced this result, with fitness values of $f^{\mathrm{ACE2}}_{\mathrm{D614G}} = 1.0 = f^{\mathrm{nAb}}_{\mathrm{D614G}}$ (Table 2). The overall predicted fitness is thus $F_{\mathrm{D614G}} = 3.7$.

Two other variants, A222V and P681H, show similar albeit less pronounced trends.

Our results predict [...] this variant to be more transmissible and infectious than the wild type but to have no impact on the response of the human immune system. More precisely, we predicted N501Y as improving the stability of the spike protein RBD and its binding affinity for ACE2; the latter property is also suggested by another computational study [54]. No clinical data suggest that N501Y is able to escape from the immune post-vaccination response [55], which tends to support our prediction results.
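As a rough illustration of how the three contributions combine in the model described above, the following Python sketch averages per-complex binding free energy changes over the structures that cover a residue, multiplies the three per-mutation contributions, and combines per-mutation values into a global fitness. It is only a sketch under stated assumptions: the function names are hypothetical and are not part of SpikePro (which is distributed as a C++ program), the multiplicative combination over mutations is assumed from the stated independence of mutations, and the value 3.7 used for the stability contribution of D614G is inferred from the reported overall fitness of 3.7 together with the two unit contributions, not quoted directly.

```python
from math import prod
from statistics import mean
from typing import Iterable, Optional, Sequence


def mean_ddg_nab(ddg_per_complex: Sequence[Optional[float]]) -> float:
    """Average a mutation's ddG over the nAb/spike complexes that contain it.

    Entries are None for complexes whose structure does not cover the
    mutated residue, so the divisor is n_i rather than the total number
    of complexes.
    """
    covered = [value for value in ddg_per_complex if value is not None]
    if not covered:
        raise ValueError("mutation is not covered by any complex")
    return mean(covered)


def point_fitness(f_s: float, f_ace2: float, f_nab: float) -> float:
    """Fitness of one point mutation: product of the three contributions."""
    return f_s * f_ace2 * f_nab


def global_fitness(point_fitnesses: Iterable[float]) -> float:
    """Fitness of a multi-mutation variant, assuming independent mutations.

    An empty input (no mutation, i.e. the wild type) gives 1.0.
    """
    return prod(point_fitnesses, start=1.0)


# D614G: f_ACE2 = f_nAb = 1.0 are quoted in the text; f_S = 3.7 is an
# inferred value (assumption), chosen so that the product matches F = 3.7.
F_D614G = point_fitness(3.7, 1.0, 1.0)
```

With these definitions, `global_fitness([])` returns 1.0, which matches the convention that the wild type has unit fitness, and a mutation with unit contributions for ACE2 binding and immune escape inherits its overall fitness entirely from the stability term.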
[...] to the fitness contributions of the variants i related to the stability of the spike protein, its binding affinity for ACE2 and its escape propensity from the host's immune system, respectively, and $F_i$ to the total fitness.

[...] December 2019 till March 2021, which amounts to about $7.8 \times 10^5$ strains. We subdivided the strains according [...]

Note that to predict the future evolution of the fitness F, it is necessary to take into account different parameters such as the varying repertoire of human nAbs and the effect of vaccination. While the fitness contributions $f^{S}$ and $f^{\mathrm{ACE2}}$ are expected to reach a plateau when the spike protein sequence becomes optimal for stability and for binding to ACE2, the cat-and-mouse game played by the virus and its host leads the host to continuously adapt its B-cell repertoire to the new variants of the virus, so that $f^{\mathrm{nAb}}$ certainly increases with respect to the old nAbs, but not with respect to the new nAbs. In total, the overall fitness F is expected to plateau after some time, or at least to increase more slowly.

We analyzed in more detail the evolution of the partial distribution function of the per-month averaged fitness in Fig. 6.b. In January 2020, the population was dominated by the wild type strain whose fitness F is by definition [...]

The first column lists the variants, the second their occurrences in the GISAID database, the third, fourth and fifth columns the fitness contributions related to stability, binding and escape from the host immune system, and the last column the total fitness values of the variants.
We thoroughly analyzed and validated SpikePro on a wide series of available experimental, epidemiological and clinical data. Despite the simplicity of the model, the approximations made, and the absence of parameters that were fitted to optimize the accuracy of the predictions, the SpikePro pipeline reproduces well the collected data. Whether the validation is performed on large-scale mutagenesis data, nAb cocktails or polyclonal human sera, whether the comparison involves the fitness of the spike protein, of the spike protein/ACE2 complex, or of a series of spike protein/nAb complexes, the results are very good with correlation coefficients in the 0.3 to 0.5 range.

In addition, SpikePro predicts a high overall fitness value for the frequently occurring variants such as the UK, Brazilian or South-African variants and correctly identifies the main fitness contributions. It also reproduces quite well the overall fitness evolution of the SARS-CoV-2 virus over the past pandemic year.

It has to be emphasized that the SpikePro model, besides being able to reproduce known results, has a true prediction potential in describing and interpreting the effect of new spike protein variants that could be fixed in the near future and the future SARS-CoV-2 evolution, owing to the physical description of the fitness in terms of free energy contributions, which are estimated using the well-known structure-based PoPMuSiC and BeAtMuSiC predictors [33, 34].
Despite the progress we made towards a better understanding of the molecular mechanisms underlying the SARS-CoV-2 fitness, we made some approximations in the construction of our model which we will try to relax in future studies. For example, we did not take into account possible amino acid deletions or insertions in the spike protein, although they certainly influence the viral fitness. It would also be interesting to take into [...]

Neutralizing antibodies for the treatment of COVID-19
Convalescent plasma antibody levels and the risk of death from covid-19
Viral mechanisms of immune evasion
SARS-CoV-2 evolution and vaccines: cause for concern?
Molecular determinants and mechanism for antibody cocktail preventing SARS-CoV-2 escape
Escape from neutralizing antibodies by SARS-CoV-2 spike protein variants
Complete mapping of mutations to the SARS-CoV-2 spike receptor-binding domain that escape antibody recognition
Prospective mapping of viral mutations that escape antibodies used to treat COVID-19
Recurrent deletions in the SARS-CoV-2 spike glycoprotein drive antibody escape
Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa
SARS-CoV-2 escape in vitro from a highly neutralizing COVID-19 convalescent plasma
SARS-CoV-2 501Y.V2 escapes neutralization by South African COVID-19 donor plasma
mRNA vaccine-elicited antibodies to SARS-CoV-2 and circulating variants
SARS-CoV-2 B.1.1.7 escape from mRNA vaccine-elicited neutralizing antibodies. medRxiv 2021
Circulating SARS-CoV-2 spike N439K variants maintain fitness while evading antibody-mediated immunity
Genetic Variants of SARS-CoV-2-What Do They Mean?
The Protein Data Bank
Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein
SWISS-MODEL: homology modelling of protein structures and complexes
[...] CoV-2 spike receptor-binding domain bound to the ACE2 receptor
CoV-AbDab: the coronavirus antibody database
Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0
BeAtMuSiC: prediction of changes in protein-protein binding affinity on mutations
Viral fitness: definitions, measurement, and current insights. Current opinion in virology
Emergence of a Highly Fit SARS-CoV-2 Variant
Structural basis of receptor recognition by SARS-CoV-2
Relationship between protein thermodynamic constraints and variation of evolutionary rates among sites
Symmetry principles in optimization problems: an application to protein stability prediction
Quantification of biases in predictions of protein stability changes upon mutations
Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding
Data, disease and diplomacy: GISAID's innovative contribution to global health
Measuring the activity of protein variants on a large scale using deep mutational scanning
Innate immune evasion strategies of DNA and RNA viruses
Complete mapping of viral escape from neutralizing antibodies
Viral evasion and subversion of pattern-recognition receptor signalling
Studies in humanized mice and convalescent humans yield a SARS-CoV-2 antibody cocktail
Structural basis for translational shutdown and immune evasion by the Nsp1 protein of SARS-CoV-2