key: cord-0922539-akpzqctu authors: Alenquer, Marta; Ferreira, Filipe; Lousa, Diana; Valério, Mariana; Medina-Lopes, Mónica; Bergman, Marie-Louise; Gonçalves, Juliana; Demengeot, Jocelyne; Leite, Ricardo B.; Lilue, Jingtao; Ning, Zemin; Penha-Gonçalves, Carlos; Soares, Helena; Soares, Cláudio M.; Amorim, Maria João title: Amino acids 484 and 494 of SARS-CoV-2 spike are hotspots of immune evasion affecting antibody but not ACE2 binding date: 2021-05-14 journal: bioRxiv DOI: 10.1101/2021.04.22.441007 sha: f646095b06de09a7789cfe7629dc8ae59ea54ba8 doc_id: 922539 cord_uid: akpzqctu Understanding SARS-CoV-2 evolution and host immunity is critical to control COVID-19 pandemics. At the core is an arms-race between SARS-CoV-2 antibody and angiotensin-converting enzyme 2 (ACE2) recognition, a function of the viral protein spike. Mutations in spike impacting antibody and/or ACE2 binding are appearing worldwide, with the effect of mutation synergy still incompletely understood. We engineered 25 spike-pseudotyped lentiviruses containing individual and combined mutations, and confirmed that E484K evades antibody neutralization elicited by infection or vaccination, a capacity augmented when complemented by K417N and N501Y mutations. In silico analysis provided an explanation for E484K immune evasion. E484 frequently engages in interactions with antibodies but not with ACE2. Importantly, we identified a novel amino acid of concern, S494, which shares a similar pattern. Using the already circulating mutation S494P, we found that it reduces antibody neutralization of convalescent and post-immunization sera, particularly when combined with E484K and N501Y. Our analysis of synergic mutations provides a landscape for hotspots for immune evasion and for targets for therapies, vaccines and diagnostics. 
One-Sentence Summary: Amino acids in SARS-CoV-2 spike protein implicated in immune evasion are biased for binding to neutralizing antibodies but dispensable for binding the host receptor angiotensin-converting enzyme 2.
Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) is the virus responsible for the pandemic of coronavirus disease 2019 (COVID-19) 1 that has caused more than 140 million infections and provoked the death of over 3 million people (as of April 18, 2021).
A pertinent question is whether viral evolution will permit escaping immunity developed by natural infection 5 or by vaccination. The answer to this multi-layered and complex question depends on the type, duration and heterogeneity of the selective pressures imposed by the host and by the environment to the virus, but also on the rate and phenotypic impact of the mutations the virus acquires. The central question is whether the virus is able to select mutations that escape host immunity while remaining efficient to replicate in the host 2-4 . Identifying hotspots of viral immune evasion is 10 critical for preparedness of future interventions to control COVID-19. SARS-CoV-2 is an enveloped virus, characterized by displaying spike proteins at the surface 5 . Spike is critical for viral entry 6 and is the primary target of vaccines and therapeutic strategies, as this protein is the immunodominant target for antibodies [7] [8] [9] [10] [11] . Spike is composed of S1 and S2 subdomains. S1 contains the N-terminal (NTD) and receptor-binding (RBD) domains, and the S2 15 contains the fusion peptide (FP), heptad repeat 1 (HR1) and HR2, the transmembrane (TM) and cytoplasmic domains (CD) 12 . S1 leads to the recognition of the angiotensin-converting enzyme 2 (ACE2) receptor and S2 is involved in membrane fusion 6, 13, 14 . Spike binds to ACE2 displayed at the host cell surface, followed by proteolytic cleavage to yield S1 and S2 fragments 6 . Interestingly, the spike protein oscillates between distinct conformations to recognize and bind different host 20 factors 15, 16 . In the prefusion state, the RBD domain alternates between open ('up') and closed ('down') conformations 5, 17 . For binding to the ACE2 receptor, the RBD is transiently exposed in the 'up' conformation. However, most potent neutralizing antibodies elicited by natural infection 9,11,18 and by vaccination 7,10 bind spike in the closed ('down') conformation. 
Understanding how each amino acid of spike dynamically interacts with either antibodies or ACE2 receptor may reveal the amino acid residues that are more prone to suffer mutations driving immunological escape without affecting viral entry. Viral mutations may affect host-pathogen interactions in many ways: affect viral spread, impact 5 virulence, escape natural or vaccine-induced immunity, evade therapies or detection by diagnostic tests, and change host species range 19, 20 . Therefore, it is critical to survey circulating variants and assess their impact in the progression of SARS-CoV-2 dynamics in the population, in real time. Being a novel virus circulating in the human population, SARS-CoV-2 evolution displayed a mutational pattern of mostly neutral random genetic drift until December 2020. In fact, the D614G 10 mutation in the spike protein was amongst the few epidemiologically significant variants, resulting in increased transmissibility without affecting the severity of the disease 21,22 . However, from the end of 2020, three divergent SARS-CoV-2 lineages evolved into fast-spreading variants that became known as variants of concern B.1.1.7 [United Kingdom (UK)] 23 RBD of spike may severely affect viral replication and host immune response since, as explained above, this region is responsible for binding to ACE2 and is immunodominant. Three mutations D614G, N501Y and L452R are associated with an increased ACE2 binding affinity in humans and increased viral transmission 21, 22, 29, 39 . Y453F was associated with a mink-to-human adaptation in cluster 5 40 . L452R and N439K are associated with a modest reduction in antibody-dependent neutralization by immune sera, whilst variants containing E484K display a reduction that is moderate to substantial 41 . Variants, however, contain several other mutations. 
Interestingly, some mutations are convergent whilst others are unique in lineages 42 These raise concerns about whether a reduction in vaccine efficacy could result in re-infections and delay the reduction in mortality caused by circulating SARS-CoV-2. There is, therefore, an urgent need to develop mutation-tolerant vaccines (and biopharmaceuticals) targeting the spike protein. In this sense, it is critical to determine what makes mutations well tolerated for the viral lifecycle whilst efficiently escaping immunity. In this work, we engineered spike-pseudotyped viruses and analyzed individual and combined mutations that convergently appeared in different lineages, over time, and across several geographic locations, to determine their synergetic effects on neutralizing-antibody responses. We then used available structures of the spike-antibody and spike-ACE2 complexes to determine the distance between each amino acid residue in the complex and the frequency of interactions of each amino acid residue of the RBD with either ACE2 or antibodies. We found a moderate reduction in neutralizing potency of sera against SARS-CoV-2 spike-pseudotyped lentivirus containing single mutations at position 484 (E484K) and 494 (S494P). Interestingly, the reduction became substantial with the addition of synergetic mutations K417N/N501Y or E484K/N501Y to E484K or S494P, respectively. In addition, we show that the amino acid residues at positions 484 and 494 frequently engage in binding to antibodies but not in binding to the receptor ACE2. Our work suggests that the amino acid residues at the RBD that are more dispensable for binding to ACE2 can more promptly evolve immune escape mutants if the amino acid residue substitution severely alters the binding specificity of the antibody.
Geographical distribution of mutations and associated prevalence is important, not only to track viral evolution and dynamics of lineages circulating worldwide, but also to identify recurrent mutations and ultimately tailor preventive measures to contain the virus. We used the pipeline https://github.com/wtsi-hpag/covidPileup to trace single nucleotide polymorphisms (SNPs) in a given country or region. We explored a comprehensive data set containing 416,893 sequences of SARS-CoV-2 downloaded from GISAID to unravel the dispersal history and dynamics of spike mutations observed in SARS-CoV-2 viral lineages in geographic locations on different continents, divided as Australia (AUS), UK, European Union (EU), South Africa (SA) and USA, and integrated the data with viral circulation in these areas, measured by the number of new cases per week ( fig. S1 ). Most regions were selected based on the appearance of specific lineages and variants of concern. Australia was included because of the restrictive measures on entering the country, closeness to Asia, geographical dispersion and for constituting an almost independent evolutionary landscape. These geographical locations were selected based on availability of sequenced samples during the period analyzed (from the 29th of December 2019 until week 5 of 2021). Based on the reference genome NC_045512 (Wuhan-Hu-1, 29903 bases), the pipeline detected 3,766,497 SNPs and 2209 indels covering 25443 genome locations. For this study, we selected only mutations in spike because this protein is responsible for viral recognition of the ACE2 receptor 6,12,14,52 and for inducing neutralizing antibodies in the host 10 . Of note, antibodies with neutralizing capacity were shown to bind to the NTD, but mostly to the RBD of spike 2,11,50,53,54 .
Many reports showed how single mutations change host neutralizing capacity and demonstrated that variants of concern B.1.351, P.1., or others including the mutation E484K, are able to escape immunity 31,35,41,55 . Whilst mutations on the RBD may sterically block binding of spike to ACE2, other mutations in spike may affect the conformation of the protein thereby impacting antibody recognition. Given this, we broaden our selection of mutations to include variants of concern and of interest (up to all mutations in lineages B.1.1.7, B.1.351, P.1., B.1.427/ B.1.429, table S1), high prevalence ( fig. S2 ), or convergent appearance in different lineages over time. In terms of 5 prevalence, the S477N mutation was accompanied by a peak of incidence in Australia from week 23-36 of 2020 and reached a prevalence of 20% worldwide (figs. S1 and S2). The L18F alone or combined with A222V was also accompanied by a peak in incidence in the UK and EU in weeks 38-45 of 2020, and in South Africa from week 42-to present and reached 20% incidence worldwide. Other single mutations include L452R and E484K, with a global prevalence of up to 10 6% worldwide; D839Y with peaks up to 2.5% during weeks 7-19 of 2020; Q675H, from week 23present; deletion of amino acid residues at positions 69/70 combined with Y543H from week 39-51 of 2020; and D936Y from week 9-37 of 2020 (figs. S1 and S2). The phenotypic evaluation of SARS-CoV-2 mutations comprises several layers and includes the interaction of the virus with the host, disease severity and epidemiology. In this study, we evaluated evasion of host neutralizing 15 antibodies. All mutations engineered in this study, and the domain in spike in which they occur, are highlighted in Fig. 1B and C, detailed in table S1 and shown at the structural level in figs. S3-S5. 
In addition, to understand how interactions in spike collectively change the RBD, we evaluated individual or a combination of mutations on spike associated with variants of concern (up to all defining 20 mutations, shown in table S1 and Fig. 2A -D) and have overall produced 25 different spikepseudotyped viruses (listed in table S1). The mutations in the RBD region used in this study are highlighted also in the open ('up') conformation (Fig. 1C) , showing the different conformational changes that each amino acid residue undergoes in this region. The different spike versions served to engineer lentiviral spike-pseudotyped particles through a three-plasmid approach: a plasmid harboring the lentiviral genome, lacking Gag-Pol and envelope proteins, and encoding a GFP reporter; a Gag-Pol expression plasmid; and a plasmid expressing 5 SARS-CoV-2 spike protein (or VSV-G protein, as control). Cells infected by these lentiviruses express GFP and can be easily quantified with high-throughput analytical methods. For the neutralization assay, we developed a 293T cell line expressing the human ACE2 receptor ( fig. S6 ). The specificity of the assay was assessed using an anti-spike polyclonal antibody and an 10 RBD peptide that competes with the spike protein for viral entry in SARS-CoV-2 spikepseudotyped but not in VSV G-pseudotyped lentivirus that was used in parallel ( fig. S7 ). We characterized the serum antibodies from 12-16 health care workers, infected with SARS-CoV-2 during spring/summer 2020, for their ability to neutralize our spike-pseudotyped mutant lentiviruses. These sera were tested by ELISA for their anti-spike content and classified as low (up 15 to endpoint titer 1:150), medium (endpoint titer 1:450) and high (endpoint titer from 1:1350) (table S2) . We also used 4 negative sera as control (one pre-pandemic and 3 contemporary). 
Representative neutralization curves of the WT (614G) and original (614D) strains, as well as variants of concern B.1.1.7, B.1.351 and P.1 (containing all defining mutations, listed in table S1) are shown in Fig. 2E and the complete set of neutralization curves in fig. S8 . For each 20 neutralization curve, we calculated the half-maximal neutralization titer (NT50), defined as the reciprocal of the dilution at which infection was decreased by 50% ( Fig. 2F and fig. S9 ). Consistent with the literature, the ELISA titer was associated with the neutralizing titer of sera (table S2) observed when Δ69-70 was added (Fig. 2H) , and also against L452R mutant (0.48±0.23) that is associated with the most recent variants of concern B.1.427/9, as reported in 60 . Importantly, we tested BNT162b2-elicited plasmas (collected after the first and second dosage of the Pfizer-BioNTech vaccine) for their capacity to neutralize the spike-pseudotyped particles expressing WT spike protein or mutants bearing all defining mutations of the variants of concern B.1.1.7, B.1.351 and P.1. Amongst 10 individuals, plasma collected 12 days after the administration of the first vaccine dose displayed anti-spike IgG titers from 1:450 to 1:12150 (table 5 S3), and only one of them exhibited neutralizing capacity (table S3 and fig. S10 ). BNT162b2elicited plasma collected 12 days after the administration of the 2 nd dose had very high levels of anti-spike IgG (titers 1:36450 -1:109350), higher than any sera collected upon natural infection, and all exhibited neutralizing activity against the variants of concern B. 1.351 and P.1, although with significantly lower efficiency than against the wild type or the B.1.1.7 variant. 
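The NT50 definition above (the reciprocal of the dilution at which infection is decreased by 50%) can be illustrated with a minimal sketch. The text does not state how the curves were fitted, so this example assumes simple log-linear interpolation between the two dilutions that bracket 50% infection; the function name `nt50` and the sample values are hypothetical and are not data from the study.

```python
import math

def nt50(dilutions, infection_fraction):
    """Estimate NT50 by log-linear interpolation (an assumed method).

    dilutions: reciprocal serum dilutions in ascending order (50 means 1:50).
    infection_fraction: infection relative to the no-serum control (0..1)
    measured at each dilution.
    Returns the reciprocal dilution at 50% infection, or None if the
    curve never crosses 50% within the tested range.
    """
    pairs = list(zip(dilutions, infection_fraction))
    for (d_lo, f_lo), (d_hi, f_hi) in zip(pairs, pairs[1:]):
        if f_lo <= 0.5 <= f_hi and f_lo != f_hi:
            # Position of the 50% point between the two bracketing dilutions
            t = (0.5 - f_lo) / (f_hi - f_lo)
            log_d = math.log10(d_lo) + t * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_d
    return None

# Hypothetical 3-fold dilution series and infection readout
example_dilutions = [50, 150, 450, 1350, 4050]
example_infection = [0.05, 0.20, 0.50, 0.80, 0.95]
example_nt50 = nt50(example_dilutions, example_infection)
```

A serum that never reduces infection to 50% within the tested range returns `None`, which corresponds to a non-neutralizing sample in this simplified picture.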
Collectively, these data show that the spike-pseudotyped lentiviral particles used in this study, in which GFP is the output and is measured by high-throughput and cheap methodologies, are suitable to quantitatively assess how single and synergetic mutations in full-length spike affect neutralizing-antibody responses.
To determine which spike protein residues are more frequently targeted by neutralizing antibodies, we analyzed 57 structures retrieved from the Protein Data Bank (PDB) containing the spike protein (or only the RBD region) bound to antibodies (table S4). Our results revealed that the receptor binding motif (RBM), in the RBD, is the region with the highest frequency of contacts ( Fig. 4A and B), consistent with what has been documented 11,61-64 . This region is important for SARS-CoV-2 [...] antibodies to this region would hinder ACE2 binding and, consequently, the entry of the virus into the cell. In addition to contacts with the RBD, we also identified other regions of the spike protein with antibody binding sites. The antibodies FC05 and DH1050.1 bind to segments in the N-terminal region (between residues 143-152 and 246-257), a region identified as an antigenic site 54 , and 2G12 binds to residues 936-941 in the stalk ( Fig. 4B and C) . To choose the cut-off above which to select the most important residues for antibody binding, we built a histogram (fig. S11). Three distinct groups appear: the first includes residues that have an interaction probability below 20%, the second includes residues whose interaction probability is between 20% and 45%, and the last corresponds to frequent binders (interaction probability over 45%). Within this last group, we find residues Y449, L455, F456, E484, F486, N487, Y489, Q493, S494, and Y505 ( Fig. 4A and B ). More than half of these residues have hydrophobic side chains, showing that, within the batch of antibodies we studied, the most common binding mode is through hydrophobic contacts.
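The three groups read from the histogram can be written as a small classification rule. The sketch below assumes the 20% and 45% boundaries stated above; how values falling exactly on a boundary are assigned is an assumption, and the function name and group labels are hypothetical.

```python
def classify_residue(contact_frequency):
    """Assign a residue to one of the three groups described in the text.

    contact_frequency: fraction (0..1) of analyzed antibody-bound
    structures in which the residue contacts an antibody.
    Boundary handling (exactly 0.20 or 0.45) is an assumption.
    """
    if contact_frequency > 0.45:
        return "frequent"      # interaction probability over 45%
    if contact_frequency >= 0.20:
        return "intermediate"  # between 20% and 45%
    return "rare"              # below 20%

# Illustrative values only; E484 is reported at roughly 50% in the text,
# the other two entries are made-up placeholders.
example_groups = {
    residue: classify_residue(freq)
    for residue, freq in {"E484": 0.50, "placeholder_A": 0.30, "placeholder_B": 0.05}.items()
}
```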
These residues are important for antibody binding, which means that mutations within this group may enable the virus to escape antibodies from patients previously 15 exposed to the WT or another variant, or people vaccinated using WT sequences of the S protein. In fact, prior studies have shown that mutations on most of these residues influence antibody escape 2,67 . For residues L455, F456 and F486, mutations for less hydrophobic residues have been shown to reduce binding by polyclonal antibodies 67 and mutants at site 487 escape human monoclonal antibodies COV2-2165 and COV2-2832(2, 63). Interestingly, our results also show 20 that there is a high prevalence of antibodies binding to E484 (~50%, Fig. 4B ). To be able to escape antibodies while maintaining the ability to efficiently infect cells, the mutations introduced must destabilize the interaction with antibodies while keeping a high binding affinity for ACE2. Thus, we reason that mutations occurring in residues that are relevant for antibody binding but not very relevant for the interaction with ACE2 can be advantageous for the virus. To gain further insights into this subject, we performed molecular dynamics (MD) simulations of the RBD bound to ACE2. These allowed us to determine which RBD residues are relevant for binding to ACE2, and the persistence of these interactions (Fig. 4D ). Combining the 5 information on antibody and ACE2 binding allows us to predict which are the most relevant amino acid residues (high affinity for antibodies and low affinity for ACE2). Using these criteria, two residues stand out: E484 and S494 (Fig. 4E , IV quadrant circled in red). These results are consistent with the evidence that E484 is an important mutation site and bring to light a new relevant site: S494. Mutations in S494 have been found in circulating strains, and the S494P mutation was 10 shown to reduce the binding by polyclonal plasma antibodies 67 whilst having no 4 or modest effect 68 in RBD-ACE2 binding. 
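The selection criterion described above (frequent contact with antibodies, infrequent contact with ACE2) amounts to picking one quadrant of a two-axis plot. A minimal sketch follows, assuming the 45% antibody cut-off from the histogram and a hypothetical 50% cut-off for ACE2 contact persistence, which the text does not specify. All numeric values are illustrative placeholders, not results from the study.

```python
def escape_candidates(antibody_freq, ace2_persistence,
                      antibody_cutoff=0.45, ace2_cutoff=0.50):
    """Return residues that contact antibodies often but ACE2 rarely.

    antibody_freq: residue -> fraction of structures with antibody contact.
    ace2_persistence: residue -> fraction of simulation time in contact
    with ACE2 (residues missing from this map are treated as no contact).
    ace2_cutoff is an assumed threshold for illustration.
    """
    return sorted(
        residue
        for residue, freq in antibody_freq.items()
        if freq > antibody_cutoff
        and ace2_persistence.get(residue, 0.0) < ace2_cutoff
    )

# Placeholder numbers chosen only to reproduce the qualitative pattern
# described in the text (F456: high/high, E484 and S494: high/low).
example_antibody = {"F456": 0.60, "E484": 0.50, "S494": 0.48, "K417": 0.30}
example_ace2 = {"F456": 1.00, "E484": 0.10, "S494": 0.20, "K417": 0.90}
example_hits = escape_candidates(example_antibody, example_ace2)
```

With these placeholder inputs, F456 is excluded because it stays in contact with ACE2, which mirrors the reasoning given in the text for why such sites are less likely to tolerate escape mutations.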
To elucidate the role of this mutation in antibody neutralization, we used our neutralization assay and found that mutation S494P alone leads to a reduction in neutralization by convalescent sera that is significant ( Some residues that are important for antibody binding are also involved in frequent interactions with ACE2. Seven of these residues remained bound to ACE2 throughout the whole simulation in all the replicates: L455, F456, F486, N487, Y489, Q493 and Y505 (Fig. 4D , top residues in quadrant II). This group includes residue F456, which was found to be a hotspot for escape mutations 67 . Although this may seem contradictory, the same study shows that mutations in this residue were very rarely found in nature and, because of lack of prevalence, were not analyzed 67 . We hypothesize that the low mutation rate for this site may be due to the high relevance of this residue for RBD-ACE2 binding. In fact, deep mutational analysis shows that when mutating F456 to any amino acid residue besides its SARS-CoV-1 counterpart (a leucine), the ACE2 binding affinity is reduced 4 . Even though a viral strain with a single mutation at site 456 could escape antibodies, it would be unfit to subsist in nature, due to a reduced ability to infect host cells. Together, the analysis presented here is supported by experimental evidence on mutation-driven binding to ACE2 and escape from antibodies 2-4 , as well as by the emergence of natural variants 31,35,41,55,70-72 , and may be used to predict mutation hotspots.
It is critical to understand how SARS-CoV-2 will evolve and if it will escape host immunity, which could pose challenges to vaccination and render therapies ineffective. We developed spike-pseudotyped lentiviral particles for high-throughput quantitation that express GFP upon entering cells, contributing to the toolkit of neutralizing assay methodologies 73-75 .
Our method is suitable to assess the neutralization activity of sera/plasma from individuals infected naturally or vaccinated, and to screen for antiviral drugs that block viral entry, such as therapeutic antibodies, in biosafety level 2 settings, which greatly facilitates the procedure and broadens its usage. It can be easily adapted to include single and multiple mutations, as observed in the present work for sera neutralizing activity (Figs. 2). Our results with BNT162b2-elicited plasma (Fig. 3) agree with previous publications showing resistance to neutralization by B.1.351 and P.1 lineages 5 31,32,36,50,76 and validate our neutralization assay. The fact that only mutations containing the E484K substitution promote immune escape, although this effect increases with synergic mutations that improve viral binding to ACE2 (K417N and N501Y), may be a consequence of lack in immunological selective pressure, a recognized driver of evolution 58,77-79 , as a large proportion of the population remains susceptible to SARS-CoV-2 infection. However, the emergence of this 10 escape mutant suggests that the continued circulation of the virus may, in the future, impose further immunological constraints and result in viral evolution, as seen for influenza A virus 58,80 . How changes in SARS-CoV-2 will impact the circulation of the virus is not known, and the future will also elucidate whether vaccination will shape evolution of SARS-CoV-2 and permit reinfections. At the moment, reinfections are considered rare events 81,82 , but reports that SARS-CoV-2 escape 15 mutants were shown to drive resurgence of cases upon natural infection 26,83 are indicative that viral dynamics in the population may change 29,84 . In agreement, in this study and in reports by others, a reduction in neutralization activity by vaccination elicited plasma against variants of concern B.1.351 and P.1 was observed 31,32,36,50,76 . 
Therefore, the development of methods able to predict hotspots of immune evasion are in demand. The surveillance of circulating strains across 20 time, the evaluation of the type and duration of host immune responses upon natural infection and vaccination, and the identification of antibodies that efficiently control SARS-CoV-2 are essential measures to understand and control SARS-CoV-2 infection and host response 85-87 . We argue that the structural information on complexes between antibodies, or ACE2, and spike variants may be integrated with mutational maps and their phenotypic characterization 2-4 to predict hotspots of immune evasion. In this work, we analyzed by neutralization assays how a full-length spike carrying single or multiple mutations dispersed throughout the protein affected the conformation of the RBD. With this assay, we probe how antibodies from sera of infected patients or plasma collected from 5 vaccinated people after administration of the 1 st and 2 nd dose of the vaccine block viral entry. We used structural information on the protein complexes spike-antibodies and spike-ACE2 to define the non-overlapping frequency of interactions using a cut-off value of 4.5 Å distance to define a contact. Given the idea that only a few amino acid residues are involved in protein-protein interfaces 88 , including in antibody-antigen recognition, we aimed at identifying which amino acid 10 residues in the RBD may evolve antibody escape mutants without affecting viral entry. With this approach, we identified two amino acid residues at the RBD -positions 484 and 494 (Fig. 4E )that frequently engage in interactions with antibodies but not with ACE2. We observed that the E484 is relevant for antibody binding (Fig. 
2G ), in agreement with our and previous experimental results, and with the occurrence of this mutation in two variants that are becoming highly prevalent and were shown to drive resurgence of infection in sites with high levels of seroconversion 2,36,57,71,72,83 . Additionally, we found that mutations in residue 494 may be problematic, either alone or combined with synergetic mutations, as they reduce the neutralization competency of convalescent sera, and thus facilitate antibody escape without substantially altering the affinity for ACE2 binding ( Fig. 4E-G and fig. S12 ). The mutation S494P has been found in nature with a prevalence of 0.81% of all sequenced genomes (week 11 of 2021, fig. S2 ) and has already been observed in combination with mutation E484K (reported on the 22nd of October 2020). Of note, the current prevalence of B.1.351 and of P.1 is 1.9% and 0.47%, respectively, being in the same range as S494P. We posit that the prevalence of mutations in amino acids 484, 494 and 501 will increase with viral circulation and/or spike seroconversion and should be surveilled worldwide. We observed a similar pattern relative to the substitution L454R/Q in India, in which the E484K mutation was acquired subsequently. At the moment, however, with the majority of the population still susceptible to viral infection, mutations that increase viral transmission, such as N501Y 84 , have a selective advantage.
Our approach has some caveats. Regarding the neutralization assays, and despite spike being the immunodominant protein targeted by antibodies 7-11 , other viral proteins may contribute, even if in a small proportion, to neutralization activity in vivo, which we are unable to detect using spike-pseudotyped particles. In addition, the sera/plasma we used was obtained at a fixed time interval.
Future experiments should repeat the analysis using sera/plasma of people infected with known combination with data from the molecular simulation and from experiments helps to predict and explain the emergence of new mutations. In conclusion, our analysis of the spike protein-antibody contacts revealed that (within the available data set) the receptor binding motif (RBM), in the receptor binding domain (RBD), is an important region for antibody binding, which makes sense, since antibody binding to this region We have used a pipeline for COVID-19 variation analysis using whole genome sequences. CovidPileup can be downloaded from https://github.com/wtsi-hpag/covidPileup. The pipeline includes SNP and indel calling and tracing specific SNPs in a given country or region. By 5 incorporating metadata, it is also possible to assign tags such as collecting time, age and sex to each identified SNP or indel. To allow quick search and data processing, a multi-layered indexed data structure has been designed. We assume the size of the reference is G and the number of (primers in table S7 ). Lentiviral reporter plasmid pLEX-GFP was produced by PCR amplifying GFP from pEGFP-N1 (primers in Table 1) and cloning the insert into BamHI-XhoI restriction sites in the multi-cloning site of pLEX.MCS 5 vector. For lentiviral plasmid pLEX-ACE2 production, human ACE2 coding sequence was amplified from Huh7 cDNA and cloned into pLEX.MCS, using XhoI and MluI restriction sites (primers in table S7). Production of 293T cells stably expressing human ACE2 receptor 10 To produce VSV-G pseudotyped lentiviruses encoding the human ACE2, 293ET cells were transfected with pVSV-G, psPAX2 and pLEX-ACE2 using jetPRIME (Polyplus), according to manufacturer's instructions. Lentiviral particles in the supernatant were collected after 3 days and were used to transduce 293T cells. 
Three days after transduction, puromycin (Merck, 540411) was added to the medium, to a final concentration of 2.5 μg/ml, to select for infected cells. Puromycin 15 selection was maintained until all cells in the control plate died and then reduced to half. The 293T-Ace2 cell line was passaged six times before use and kept in culture medium supplemented with 1.25 μg/ml puromycin. ACE2 expression was evaluated by flow cytometry (fig. S6 ). Production and titration of spike pseudotyped lentiviral particles 20 To generate spike pseudotyped lentiviral particles, 3x10 6 293ET cells were co-transfected with 8.89ug pLex-GFP reporter, 6.67ug psPAX2, and 4.44ug pCAGGS-SARS-CoV-2-S WT or mutants (or pVSV.G, as a control), using jetPRIME according to manufacturer's instructions. The virus-containing supernatant was collected after 3 days, concentrated 10 to 20-fold using Lenti- Flow cytometry was performed as in 89 . In brief, HEK 293T and 293T-ACE2 cells were prepared for flow cytometry analysis by detaching from the wells with trypsin, followed by fixation with 4% paraformaldehyde). For analysis of ACE2 expression, cells were stained with a primary antibody against ACE2 (4µg/ml, R&D Systems, catalog no. AF933) followed by a secondary Human convalescent sera and post-vaccination plasma/serum 15 Venous blood was collected by standard phlebotomy from health care providers who contracted COVID-19 during spring/summer 2020, as tested by RT-PCR on nasopharyngeal swabs. All participants provided informed consent to take part in the study. This study "Fatores de susceptibilidade genética e proteçao imunológica à COVID-19" was approved on the 25 th of May 2020 by the Ethics committee of the Centro Hospitalar Lisboa Ocidental, in compliance with the 20 Declaration of Helsinki, and follows international and national guidelines for health data protection. Serum was prepared using standard methodology and stored at -20ºC. 
Peripheral blood from health care workers was collected by venipuncture into EDTA tubes at day Plasma was collected into cryotubes and stored in a -80°C ultra-low freezer until subsequent analysis.
Direct ELISA was used to quantify IgG anti-full-length Spike in convalescent sera. The antigen was produced as described in 90 . The assay was adapted from 91 and semi-automated to measure IgG in a 384-well format, according to a protocol to be detailed elsewhere. For titer estimation, sera were serially diluted 3-fold starting at a 1:50 dilution, and the cut-off was defined by pre-pandemic sera (mean + 2 standard deviations). The ELISA assay on post-vaccination plasma and serum was performed based on the protocol 92 and modified as described in Gonçalves et al. 93 . Briefly, 96-well plates (Nunc) were coated with 50 µl of trimeric spike protein at 0.5 µg/mL and incubated overnight at 4°C. On the following day, plates were washed three times with 0.1% PBS/Tween20 (PBST) using an automatic plate washer (ThermoScientific). Plates were blocked with 3% bovine serum albumin (BSA) diluted in 0.05% PBST and incubated for 1 h at room temperature. Samples were diluted using a 3-fold dilution series starting at 1:50 and ending at 1:109,350 in 1% BSA-PBST and incubated for 1 h at room temperature. Plates were washed three times as before, and goat anti-human IgG-HRP secondary antibodies (Abcam, ab97215) were added at 1:25,000 and incubated for 30 min at room temperature. Plates were washed three times and incubated for ~7 min with 50 µl of TMB substrate (BioLegend). The reaction was stopped with 25 µl of 1 M phosphoric acid (Sigma) and read at 450 nm on a plate reader (BioTek).
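The titer estimation described for the ELISA (3-fold serial dilutions from 1:50, cut-off set at the mean plus two standard deviations of pre-pandemic sera) can be sketched as follows. This is an illustration under stated assumptions, not the protocol's actual software: the use of the sample standard deviation, the function names, and all optical density values are hypothetical.

```python
import statistics

def cutoff_from_negatives(negative_ods, n_sd=2):
    """Cut-off = mean + n_sd * SD of negative-control readings.

    Uses the sample standard deviation (an assumption; the text does
    not say which estimator was used).
    """
    return statistics.mean(negative_ods) + n_sd * statistics.stdev(negative_ods)

def endpoint_titer(dilutions, sample_ods, cutoff):
    """Return the last reciprocal dilution whose reading exceeds the cut-off.

    Returns None when no dilution is above the cut-off.
    """
    titer = None
    for dilution, od in zip(dilutions, sample_ods):
        if od > cutoff:
            titer = dilution
    return titer

# 3-fold series starting at 1:50: 50, 150, 450, ..., 109350
dilution_series = [50 * 3 ** i for i in range(8)]

# Hypothetical optical densities at 450 nm
negatives = [0.10, 0.12, 0.08, 0.10]
sample = [2.0, 1.5, 0.9, 0.4, 0.2, 0.1, 0.09, 0.08]

example_cutoff = cutoff_from_negatives(negatives)
example_titer = endpoint_titer(dilution_series, sample, example_cutoff)
```

The eight-step series reproduces the 1:50 to 1:109,350 range quoted in the text, which is a useful sanity check on the dilution factor.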
Each plate contained 6 calibrator samples (two high-, two medium-, and two low-antibody producers) from adult individuals, collected at Hospital Fernando Fonseca, who were confirmed positive for SARS-CoV-2 by RT-PCR on nasopharyngeal and/or oropharyngeal swabs in a laboratory certified by the Portuguese National Health Authorities 93.

To determine which residues of the SARS-CoV-2 spike protein contribute the most to antibody binding, we studied 57 PDB structures containing the S protein (or only the RBD region) bound to antibodies (Table S1). These complexes were chosen based on their availability in the PDB repository 94, and all contain neutralizing antibodies. We first used the MDAnalysis library 95, 96 to identify the residues of the antibody and the spike protein located at the interface between the two proteins. This was done for all PDB structures using in-house Python scripts. To determine the relevance of each spike protein/RBD residue for binding, a distance cut-off of 4.5 Å was applied as the criterion (i.e., a contact is counted when two residues have a minimum distance below 4.5 Å). Finally, the numerical Python library NumPy 97 was used to calculate the frequency of contact of each spike protein residue with an antibody residue across the PDB structures analyzed.

MD simulation of the RBD-ACE2 complex

Molecular dynamics (MD) simulations of the RBD bound to the ACE2 protein were performed with the GROMACS 2020.3 package 98, using the Amber14sb 99 force field, starting from the 6m0j structure 12, in a truncated dodecahedron box filled with water molecules (minimum of 1.2 nm between protein and box walls). The TIP3P water model 100 was used, and the total charge of the system (-23, including the constitutive Zn²⁺ and Cl⁻ ions bound to ACE2) was neutralized with 23 Na⁺ ions. Additional Na⁺ and Cl⁻ ions were added to the solution to reach an ionic strength of 0.1 M.
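The ion bookkeeping in this setup can be illustrated with a short calculation. The sketch below is not the authors' procedure. It assumes a 1:1 salt (for which the ionic strength equals the molar concentration), ignores the neutralizing counterions when computing the salt concentration, and uses a hypothetical solvent volume, since the box volume is not stated here.

```python
AVOGADRO = 6.02214076e23   # particles per mole
LITRES_PER_NM3 = 1e-24     # 1 nm^3 = 1e-27 m^3 = 1e-24 L


def counterions_to_neutralize(net_charge):
    """Number and type of monovalent counterions that cancel a net charge."""
    species = "Na+" if net_charge < 0 else "Cl-"
    return abs(net_charge), species


def salt_pairs(concentration_molar, solvent_volume_nm3):
    """Ion pairs of a 1:1 salt giving the target concentration in the volume."""
    moles = concentration_molar * solvent_volume_nm3 * LITRES_PER_NM3
    return round(moles * AVOGADRO)


# Net charge of -23 as stated in the text; the volume is a hypothetical value.
n_counter, species = counterions_to_neutralize(-23)
n_pairs = salt_pairs(0.1, 1000.0)
print(n_counter, species, n_pairs)
```

For a 0.1 M 1:1 salt this works out to roughly 0.06 ion pairs per nm³ of solvent, so the number of added Na⁺/Cl⁻ pairs scales directly with the solvated box volume.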
The system was energy-minimized using the steepest descent method for a maximum of 50000 steps, with the heteroatoms position-restrained to their crystallographic coordinates using a force constant of 1000 kJ/mol/nm² in the X, Y and Z directions. Before the production runs, an initialization process was carried out in 5 stages of 100 ps each. Initially, all heavy atoms were restrained using a force constant of 1000 kJ/mol/nm², and in the final stage only the C-α atoms were position-restrained, using the same force constant. In the first stage, the Berendsen temperature algorithm 101 was used to initialize and maintain the simulation at 300 K, using a temperature coupling constant of 0.01 ps, without pressure control.

method, using a grid spacing of 0.12 nm, with cubic interpolation. The neighbor list was updated every twenty steps and the cut-off scheme used was Verlet, with 0.8 nm as the real-space cut-off radius. All bonds were constrained using the LINCS algorithm 106. The system was simulated for 8 µs in 5 replicates (40 µs in total).

To determine the residues that contribute the most to the interaction between RBD and ACE2, and the persistence of these interactions, we performed a contact analysis throughout the simulation. We started by discarding the first µs of each replicate as equilibration. The MDAnalysis library 95, 96 was then used to pinpoint the residues of the RBD that are in contact with the ACE2 protein. A distance cut-off of 4.5 Å was applied as the criterion (i.e., a contact is counted when two residues have a minimum distance below 4.5 Å). Finally, we determined the percentage of time during which a given RBD residue is within 4.5 Å of ACE2.

Homology-based models of all the variants analyzed in this study were generated using the software Modeller 107, version 9.23, using the structure of the wild-type protein (PDB code: 6XR8) 15 as a template.
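The contact criterion used in the analyses above (a minimum inter-residue distance below 4.5 Å, summarized as the percentage of frames in contact) can be sketched with NumPy alone. This is a minimal illustration and not the in-house scripts, which rely on MDAnalysis and are not shown in the text. The residue labels and coordinates are synthetic, periodic boundary conditions are ignored, and the removal of the equilibration period is assumed to have been done before the frames are passed in.

```python
import numpy as np

CUTOFF_ANGSTROM = 4.5


def in_contact(residue_xyz, partner_xyz, cutoff=CUTOFF_ANGSTROM):
    """True if the minimum atom-atom distance is below the cut-off.

    residue_xyz: (n_atoms, 3) coordinates of one residue.
    partner_xyz: (m_atoms, 3) coordinates of the binding partner.
    """
    diff = residue_xyz[:, None, :] - partner_xyz[None, :, :]
    min_dist = np.sqrt((diff ** 2).sum(axis=-1)).min()
    return bool(min_dist < cutoff)


def contact_percentage(frames, cutoff=CUTOFF_ANGSTROM):
    """Percentage of frames in which each residue contacts the partner.

    frames: iterable of (residues, partner_xyz) pairs, where residues maps
    a residue label to its (n_atoms, 3) coordinate array in that frame.
    """
    counts = {}
    n_frames = 0
    for residues, partner_xyz in frames:
        n_frames += 1
        for res_id, xyz in residues.items():
            counts[res_id] = counts.get(res_id, 0) + in_contact(xyz, partner_xyz, cutoff)
    if n_frames == 0:
        return {}
    return {res_id: 100.0 * c / n_frames for res_id, c in counts.items()}


# Synthetic example: one partner atom at the origin and two one-atom residues.
partner = np.array([[0.0, 0.0, 0.0]])
b_positions = [4.0, 5.0, 4.4, 6.0]  # residue "B" drifts in and out of range
frames = [
    ({"A": np.array([[3.0, 0.0, 0.0]]), "B": np.array([[x, 0.0, 0.0]])}, partner)
    for x in b_positions
]
percentages = contact_percentage(frames)
print(percentages)
```

Applying the same counting over a set of static structures instead of trajectory frames gives a contact frequency across structures, which is the quantity described for the antibody complexes.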
The protocol used optimizes only the atoms belonging to the mutated residues and to the residues located within a 5 Å radius of them, keeping the remaining atoms fixed at the coordinates found in the template structure. The optimization parameters were

Competing interests: Authors declare that they have no competing interests.

Data and materials availability: All data are available in the main text or the supplementary materials.

Figs. S1 to S14
Tables S1 to S7

Residues 18 and 20 are in an exterior loop. The L18F mutation enables a pi-stacking interaction with F127, which in the WT also interacts with L18 through hydrophobic contacts. The T20N mutation is not predicted to have a strong impact, since T20 does not form specific interactions. Residue D138 sits in a semi-exposed loop and forms a salt bridge with R21, which is abolished when this residue is mutated to Y. (D) Residue 26 is in an exterior loop. Changing P26 to S might make the end of this loop more flexible, since prolines tend to confer rigidity to proteins. R190 is in a beta-sheet and can interact with H207 through pi-stacking; the mutation to S may therefore destabilize this interaction and the secondary structure. (E) H49 is in an interior beta-sheet and interacts with R44 through pi-stacking, which is maintained upon the mutation to Y. (F) H69 and V70 are in a loop that has the appropriate features to be targeted by antibodies, since it is exposed and moderately hydrophobic. If this is the case, the deletion of these residues may impact the interaction with antibodies. (G) D80 is located at the tip of a semi-exposed beta-strand and forms

(C) N439 sits in an exposed helix turn of the RBD and forms a hydrogen bond with S45, which is predicted to be lost with the mutation to K. This residue is not involved in RBD binding to ACE2. (D) L452 is at the tip of a short, exposed beta-strand of the RBD and does not interact with ACE2.
Its mutation from the hydrophobic L to the positively charged R may lead to changes in antibody interaction, since it is exposed. (E) Y453 is in the same beta-sheet as L452 and interacts with H34 from ACE2 through pi-stacking, which can be maintained in the mutant Y453F. (F) S477 is in an exposed loop of the RBD, which often interacts with antibodies. S477 does not interact with ACE2. The mutation S477N is not predicted to have a strong effect, since both residues are polar. (G) E484 is in the same RBD loop as S477 and is important for antibody interaction, but not for binding to ACE2. The E484K mutation is expected to have a strong effect, since it converts a negatively charged residue into a positively charged one, which can significantly alter the RBD interaction with antibodies. (H) N501 is in the receptor binding motif and interacts with ACE2. The mutation to Y decreases the hydrophilicity and will enable the formation of pi-stacking interactions with ACE2 residues. (I) S494 sits at the tip of a beta-strand in the receptor binding motif. Its mutation to P will likely destabilize the secondary structure.

is in an internal alpha-helix and its mutation to I is not predicted to have a significant impact. (E) Q675 is present in a beta-sheet and the mutation to H is not predicted to have a significant effect. (F) Residue P681 sits in an exposed loop, near the S1 cleavage site. The mutation to H may make this loop more flexible and affect cleavage by proteases and potentially also antibody binding. (G) A701 is in an exposed beta-sheet and the mutation to V may destabilize the secondary structure. (H) T716 is in an exposed loop and forms a hydrogen bond with the main chain of Q1045, which is lost upon mutation to I. D1118 is found in an interior loop, with the three copies of this residue (one in each monomer) facing one another, and this motif can be maintained with the mutation to H.
(I) Residue D839 sits in an exposed helix turn, very near the fusion peptide, in a region that may be a good target for antibodies. Its mutation to Y is not expected to change the protein structure but may impact the fusion peptide interaction with the host membrane and/or antibody neutralization. (J) D936 is in an exposed alpha-helix. The mutation to Y is not expected to change the protein structure but may impact the interaction with antibodies that may bind here.

(1:450) and High titer (≥1:1350). Triplicates were performed for each tested serum dilution. Error bars represent standard deviation.

Table S3. Plasma was collected from 10 individuals 12 days after the first and the second rounds of vaccination and was tested for neutralization of WT virus and variants of concern. Triplicates were performed for each tested plasma dilution. Error bars represent standard deviation.

shown in a cyan cartoon representation with key antibody interacting residues depicted as sticks. (B) Zoom in to the RBM region of the RBD. Residues relevant for antibody binding (>35% frequency of contact) are depicted as sticks. Of these, the ones with an antibody binding probability higher than 45% have a green label, and those that also have a low frequency of binding to ACE2 (<50%) are labelled in pink.

There is no structural information for any of these amino acids because they localize to unstructured regions at the NTD, TD and CD, and therefore it is not possible to map them in figs. S3-5. FREQUENCY 0.0%

L18F/A222V L18F - Present in lineage B.1.1.28.1/P.1. Has a mild impact on the structure of spike and also protects from some neutralizing monoclonal antibodies 54. A222V - Mutation that expanded in Europe in lineage B.1.177 and is not associated with antibody escape 54, 113. These mutations are frequently found in association. FREQUENCY 0.9%

Mutation identified in China 114 and not reported to affect the function of spike 109.
FREQUENCY 0.3%

This mutation is present in lineage B.1.351 and is one of many mutations in the NTD of spike reported to contribute to resistance to neutralizing antibodies 54, 115. This mutation is currently found exclusively associated with B.1.351. However, its prevalence in the past has displayed two peaks unrelated to E484K (see prevalence in fig. S2). FREQUENCY 2.2%

Found in lineages B.1.141 (common in the UK at the beginning of the pandemic, until June) and B.1.258 (appeared in April, and its presence has been slowly increasing). The N439K mutation sits in the RBD, and results show that the virus retains fitness but becomes resistant to some neutralizing antibodies 41, 116. N439 sits at the extremity of the RBD, but not directly in the contact zone with ACE2; it lies beneath the residues of the contact zone. It is exposed in the "up" conformation of the RBD and semi-exposed in the "down" conformation. A likely zone for antibody binding. FREQUENCY 2.0%

Part of lineage B.1.427/9 / 20C/S (USA CAL). It is located in the RBD and has been shown to increase infectivity 57 and to escape antibody binding in a yeast screen using the RBD rather than full-length spike 2 and in 60.

17 Δ69H-70V/Y453F This set of mutations was found together in association with mink-related infections. It was found initially in Denmark and the Netherlands and resulted in the culling of minks 40. Δ69-70 was explained above. Y453 is localized in the RBD and was shown to affect neutralization by SARS-CoV-2-specific antibodies 116. It requires better understanding. FREQUENCY 0.0%

Localized at the RBD, was reported to cause a modest increase in the affinity of the RBD for ACE2 4. FREQUENCY 2.4%

19 Q675H Q675H leads to a putative change in glycosylation and was reported to reduce infectivity 57. FREQUENCY 0.20%

Prevalent in Portugal by 30th April 117. The aspartic acid 839 is exposed and does not make relevant interactions. It may be a zone for antibody targeting.
This zone may be important for fusion and/or in the induction of host inflammatory responses. FREQUENCY 0.01%

Aspartic acid 936 is located in the fusion core of heptad repeat 1. The D936Y mutation was detected in Sweden and England and was reported to destabilize the post-fusion conformation of spike, while minimally impacting the stability of the pre-fusion conformation 118. A distinct paper reported that this mutation increased infectivity but did not affect neutralization by antibodies 119. The aspartic acid 936 sits in an exposed helix. Its substitution by a tyrosine appears harmless from the perspective of structural stability. FREQUENCY 0.7%

The acquisition of the spike mutation S494P was observed in B.1.1.7 at the end of February 2021; it has been acquired multiple times, and its frequency is increasing. S494P allows evasion of binding or neutralization by several monoclonal antibodies 120 but has not been shown to impact neutralization by convalescent or vaccine-induced polyclonal antisera. S494P confers increased binding to hACE2 4. FREQUENCY 0.81%

The acquisition of the spike mutation S494P was observed in B.1.1.7 at the end of February 2021; it has been acquired multiple times, and its frequency is increasing. FREQUENCY 0.44%

Although still at a very low frequency, this double mutation is starting to appear. FREQUENCY 0.01%

25 E484K/S494P/N501Y This triple mutation has not been detected yet. FREQUENCY 0.00%

Table S2. IgG antibody titers against SARS-CoV-2 spike protein and neutralizing titers (NT50) of convalescent sera against WT and mutant spike pseudoviruses.
<50 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30
2 <50 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30
3 <50 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30
4 <50 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30

<50 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30
2 <50 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30
3 <50 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30
4 <50 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30 <30

Table S3. IgG antibody titers against SARS-CoV-2 spike protein and neutralizing titers (NT50) against WT and variant pseudoviruses of plasma from vaccinated individuals, collected 12 days after the first and after the second doses of the vaccine.

Supplementary Table S4. Summary of antibodies studied in the S protein-antibody complexes. These antibodies were chosen due to the availability of their structure resolved together with the S protein (or just the RBD region) in the PDB repository 94.

Impact of B.1.1.7 variant mutations on antibody recognition of linear SARS-CoV-2 epitopes. medRxiv
SARS-CoV-2 immune evasion by variant B.1.427/B.1.429
Analysis of a SARS-CoV-2-Infected Individual Reveals Development of Potent Neutralizing Antibodies with Limited Somatic Mutation
Broad neutralization of SARS-related viruses by human monoclonal antibodies
Potently neutralizing and protective human antibodies against SARS-CoV-2
Rapid isolation and profiling of a diverse panel of human monoclonal antibodies targeting the SARS-CoV-2 spike protein
Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus
Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2
Comprehensive mapping of mutations in the SARS-CoV-2 receptor-binding domain that affect recognition by polyclonal human plasma antibodies
The emerging plasticity of SARS-CoV-2
Anatomy of hot spots in protein interfaces
KIF13A mediates trafficking of influenza A virus ribonucleoproteins
Production of high-quality SARS-CoV-2 antigens: Impact of bioprocess and storage on glycosylation, biophysical attributes, and ELISA serologic tests performance
A serological assay to detect SARS-CoV-2 seroconversion in humans
SARS-CoV-2 Seroconversion in Humans: A Detailed Protocol for a Serological Assay, Antigen Production, and Test Setup
Evaluating SARS-CoV-2 Seroconversion Following Relieve of
The Protein Data Bank
MDAnalysis: A Python Package for the Rapid Analysis of Molecular Los Alamos National Lab. (LANL)
MDAnalysis: a toolkit for the analysis of molecular dynamics simulations
Array programming with NumPy
GROMACS 2020.3 Source Code
ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB
Structure and Dynamics of the TIP3P, SPC, and SPC/E Water Models at 298 K
Molecular Dynamics with Coupling to an External Bath
Canonical sampling through velocity rescaling
Polymorphic transitions in single crystals: A new molecular dynamics method
Particle mesh Ewald: An N⋅log(N) method for Ewald sums in large systems
A smooth particle mesh Ewald method
LINCS: A linear constraint solver for molecular simulations
Comparative protein modelling by satisfaction of spatial restraints
VMD: visual molecular dynamics
SARS-CoV-2 D614G spike mutation increases entry efficiency with enhanced ACE2-binding affinity
Enhanced receptor binding of SARS-CoV-2 through networks of hydrogen-bonding and hydrophobic interactions
PCR assay to enhance global surveillance for SARS-CoV-2 variants of concern
Emergence and spread of a SARS-CoV-2 variant through Europe in the summer of 2020. medRxiv
Evolutionary perspectives on novel coronaviruses identified in pneumonia cases in China

204-?) (?-1075)
n pre-pandemic pool
? - could not be calculated
n.d. - not done
* pre-pandemic pool
n.d. - not done
? - could not be calculated
<30

CAGCCAGTGTGTGAACTTCACCACAAGAACCCAGC-3'
Spike_L18F_Rv 5'-GCTGGGTTCTTGTGGTGAAGTTCACACACTGGCTG-3'
Spike_T20N_Fw 5'-CAGTGTGTGAACCTGACCAATAGAACCCAGCTGCCTC-3'
Spike_T20N_Rv 5'-GAGGCAGCTGGGTTCTATTGGTCAGGTTCACACACTG-3'
Spike_P26S_Fw 5'-GAACCCAGCTGCCTTCAGCCTACACCAAC-3'
Spike_P26S_Rv 5'-GTTGGTGTAGGCTGAAGGCAGCTGGGTTC-3'
Spike_H49Y_Fw 5'-CAGATCCAGCGTGCTGTATTCTACCCAGGACCTGT-3'
Spike_H49Y_Rv 5'-ACAGGTCCTGGGTAGAATACAGCACGCTGGATCTG-3'
Spike_Δ69-70_Fw 5'-GGTTCCACGCCATCTCCGGCACCAATGG-3'
Spike_Δ69-70_Rv 5'-CCATTGGTGCCGGAGATGGCGTGGAACC-3'
Spike_D80A_Fw 5'-CACCAAGAGATTCGCCAACCCCGTGCTGC-3'
Spike_D80A_Rv 5'-GCAGCACGGGGTTGGCGAATCTCTTGGTG-3'
Spike_D138Y_Fw 5'-GTTCCAGTTCTGCAACTATCCCTTCCTGGGCGTCT-3'
Spike_D138Y_Rv 5'-AGACGCCCAGGAAGGGATAGTTGCAGAACTGGAAC-3'
Spike_Δ144_Fw 5'-CCCCTTCCTGGGCGTCTATCACAAGAACAACAA-3'
Spike_Δ144_Rv 5'-TTGTTGTTCTTGTGATAGACGCCCAGGAAGGGG-3'
Spike_R190S_Fw 5'-GCAACTTCAAGAACCTGAGCGAGTTCGTGTTCAAG-3'
Spike_R190S_Rv 5'-CTTGAACACGAACTCGCTCAGGTTCTTGAAGTTGC-3'
Spike_D215G_Fw 5'-ACCTCGTGCGGGGTCTGCCTCAGGG-3'
Spike_D215G_Rv 5'-CCCTGAGGCAGACCCCGCACGAGGT-3'
Spike_A222V_Fw 5'-TCAGGGCTTCTCTGTTCTGGAACCCCTGG-3'
Spike_A222V_Rv 5'-CCAGGGGTTCCAGAACAGAGAAGCCCTGA-3'
Spike_K417N_Fw 5'-CCCTGGACAGACAGGCAATATCGCCGACT-3'
Spike_P681H_Fw 5'-CACAGACAAACAGCCACAGACGGGCCAGATC-3'
Spike_P681H_Rv 5'-GATCTGGCCCGTCTGTGGCTGTTTGTCTGTG-3'
Spike_A701V_Fw 5'-AATGTCTCTGGGCGTCGAGAACAGCGTGG-3'
Spike_A701V_Rv 5'-CCACGCTGTTCTCGACGCCCAGAGACATT-3'
Spike_T716I_Fw 5'-CTCTATCGCTATCCCCATCAACTTCACCATCAGCG-3'
Spike_T716I_Rv 5'-CGCTGATGGTGAAGTTGATGGGGATAGCGATAGAG-3'
Spike_D839Y_Fw 5'-TCATCAAGCAGTATGGCTATTGTCTGGGCGACATT-3'
Spike_D839Y_Rv 5'-AATGTCGCCCAGACAATAGCCATACTGCTTGATGA-3'
Spike_D936Y_Fw 5'-CATCGGCAAGATCCAGTATAGCCTGAGCAGCACAG-3'
Spike_D936Y_Rv 5'-CTGTGCTGCTCAGGCTATACTGGATCTTGCCGATG-3'
GFP_Rv 5'-TCAGCTCGAGTTACTTGTACAGCTCGTCCATGC-3'
ACE2_Fw 5'-GAGCTCGAGATGTCAAGCTCTTCCTGG-3'
ACE2_Rv