key: cord-0987717-wqpr8v3p
authors: Yuan, Xianlin; Li, Liangping
title: The influence of major S protein mutations of SARS-CoV-2 on the potential B cell epitopes
date: 2020-08-24
journal: bioRxiv
DOI: 10.1101/2020.08.24.264895
sha: 2671fc4099d4405f84f5d30907d4f3f11c76a9e0
doc_id: 987717
cord_uid: wqpr8v3p

SARS-CoV-2 has spread rapidly worldwide, resulting in the COVID-19 pandemic. The surface spike (S) glycoprotein is a key factor in viral transmission, and gene mutations have produced many S protein variants, which may influence viral antigenicity and vaccine efficacy. Here, we used bioinformatic tools to analyze the B-cell epitopes of the prototype S protein and 9 of its common variants. Twelve potential linear and 53 discontinuous B-cell epitopes were predicted for the prototype S protein. Importantly, by comparing epitope alterations between the prototype and the variants, we show that the B-cell epitopes and antigenicity of the 9 variants are altered to markedly different degrees. The dominant D614G variant has the least impact on the potential epitopes, with only moderately elevated antigenicity, whereas the epitopes and antigenicity of some low-incidence mutants (V483A, V367F, etc.) change greatly. These results suggest that the vaccines currently under development should remain effective for the majority of SARS-CoV-2 infections. This study provides a scientific basis for the large-scale application of SARS-CoV-2 vaccines and for taking precautions against possible antigenic escape caused by genetic variation after vaccination.

Author Summary
The global pandemic of SARS-CoV-2 has lasted for more than half a year and has not yet been contained. To date, there is no effective treatment for the disease caused by SARS-CoV-2 (COVID-19). Successful vaccine development seems to be the only hope.
However, this novel coronavirus is an RNA virus with a high mutation rate in its genome, and many of these mutations are located in the spike (S) protein, which acts as the gripper the virus uses to enter cells. Vaccination induces antibodies that block the S protein. However, S protein variants may alter antibody recognition and binding and render a vaccine ineffective. In this study, we used bioinformatics tools to predict the neutralizing antibody recognition sites (B-cell epitopes) of the prototype S protein of SARS-CoV-2 and of several common variants. We found variability in antigenicity among the mutants: for instance, the epitopes of the widespread D614G variant were the least affected, with only a slight increase in antigenicity, whereas the antigenic epitopes of some other mutants changed greatly. These results may be important for future vaccine design and application against SARS-CoV-2 variants.

On January 30, 2020, the WHO declared the COVID-19 outbreak a public health emergency of international concern. As of July 20, 14,348,858 cases of COVID-19 and 603,691 deaths had been reported globally, according to COVID-19 Situation Report-182 (WHO website at https://www.who.int/emergencies/diseases/novel-coronavirus-2019). The SARS-CoV-2 virion is spherical and enveloped, 60-140 nm in diameter, with surface spikes of about 9-12 nm.
The coronaviral genome encodes 10 proteins, four of which are major structural proteins: the spike (S), membrane (M), envelope (E) and nucleocapsid (N) proteins [7]. Each of these proteins is responsible for different in the SARS-CoV-2 genome were discovered from 10022 public genome assemblies as of May 1, 2020 [9], among which 394 missense mutations in the S protein were detected. Among these spike mutations, D614G, in which aspartic acid (D) at amino acid position 614 is replaced by glycine (G), is a major mutation of great concern [10, 11]. SARS-CoV-2 carrying the D614G mutation may have caused fatal infections in many European countries, such as Spain, Italy and France [11].
These mutations will undoubtedly cause changes in the structure of the S protein. However, whether they affect the antigenicity of the S protein and its binding to neutralizing antibodies is a matter of great concern. If the B-cell epitopes on the S protein changed so that they could no longer bind neutralizing antibodies, vaccines developed on the basis of the prototype S protein would lose efficacy.
Many immuno-bioinformatic tools have been developed for comprehensive, in-depth analysis of viral antigens, including prediction of linear and discontinuous B-cell epitopes and their immunogenicity. To explore these questions, we used immuno-bioinformatic tools from the Immune Epitope Database (IEDB) and related resources to predict the B-cell epitopes of the S protein from the prototype and mutant strains of SARS-CoV-2, and compared the changes in likely epitope sites caused by dominant and rare S protein mutations. We found that different S protein mutations could affect the potential effective epitopes to different degrees.
The extracellular domain is divided into the S1 and S2 subunits; S1 contains the N-terminal domain (NTD) and the receptor-binding domain (RBD) [13].
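Substitution names such as D614G pack three pieces of information: the wild-type residue (D, aspartic acid), its 1-based position in the S protein (614), and the replacement residue (G, glycine). The variant sequences in this study were obtained from GISAID and GenBank rather than constructed, so the following is only a minimal Python sketch, assuming one wants to derive a variant sequence from the prototype by applying such a substitution; the function name `apply_substitution` and the toy sequence are hypothetical.

```python
import re

# The 20 standard one-letter amino acid codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Substitution notation: wild-type residue, 1-based position, mutant residue (e.g. D614G).
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitution(sequence: str, mutation: str) -> str:
    """Return a copy of `sequence` carrying one amino acid substitution (hypothetical helper)."""
    match = SUBSTITUTION.fullmatch(mutation)
    if match is None:
        raise ValueError(f"unrecognised substitution: {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if wild_type not in AMINO_ACIDS or mutant not in AMINO_ACIDS:
        raise ValueError(f"non-standard residue in {mutation!r}")
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside a sequence of length {len(sequence)}")
    index = position - 1  # convert the 1-based position to a 0-based index
    if sequence[index] != wild_type:
        # A mismatch usually means the numbering does not fit this sequence.
        raise ValueError(f"expected {wild_type} at {position}, found {sequence[index]}")
    return sequence[:index] + mutant + sequence[index + 1:]


toy = "MFVDLPAGSE"  # hypothetical 10-residue toy sequence, not the real S protein
print(apply_substitution(toy, "D4G"))  # MFVGLPAGSE
```

Checking the wild-type residue before substituting is a cheap guard against numbering mismatches, for example when positions taken from full-length S protein numbering are applied to a sequence from which the signal peptide has been trimmed.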
Based on the start and end positions of the different domains, 10 major mutations are shown in the schematic diagram of the S protein (Figure 1A). Most of the S protein mutations (70%) are located in or near the S1 subunit. The most frequent mutation, D614G, is near the RBD.
To predict the potential linear B-cell epitopes, we first used the BepiPred-2.0 prediction tool on the IEDB server to screen the prototype S protein sequence and identified a total of 30 linear B-cell epitopes (Table S2), whose distribution is shown in Table 2. Among them, 9 epitopes are in the S1 subunit (4 in the NTD region, 5 in the RBD domain) and 3 in the S2 subunit of the S protein. Based on this analysis, we found that three epitopes in the RBD domain (384PTKLNDL390, 405DEVRQIAPGQTGKI418 and 487NCYFPL492) have higher antigenicity and accessibility.
We further predicted the discontinuous epitopes with the DiscoTope 2.0 online server, using the 3D structure of the S protein (PDB ID: 6vyb, chain A). The default threshold was −3.7, with 47% sensitivity and 75% specificity. A total of 53 discontinuous epitopes were predicted, mainly located across the RBD region (400~600 aa) of the S protein (Figure 3A). All predicted epitopes on the surface of the S protein are shown on the 3D structure in Figure 3B, rendered with the JSmol viewer. According to their distribution across domains, these epitopes (Table S3) can be divided into four groups (Table 3), and the epitopes with the highest propensity score (P-Score) and DiscoTope score (D-Score) are concentrated at 498~500 aa of the RBD, indicated by arrows in Figure 3B.
Finally, these epitopes were validated with the Pepitope tool (http://pepitope.tau.ac.il/); the three major antigenic clusters were consistent with the linear B-cell epitopes described above (Table 4).
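BepiPred-2.0 assigns a score to every residue, and residues scoring above the threshold (0.55 in the Methods) are predicted to be part of an epitope; candidate linear epitopes such as 384PTKLNDL390 are contiguous runs of such residues. A minimal Python sketch of that grouping step, assuming the per-residue scores have already been exported from the IEDB tool; the function `threshold_segments`, its `min_length` parameter and the example scores are hypothetical and not part of BepiPred-2.0 itself.

```python
from typing import List, Sequence, Tuple


def threshold_segments(
    sequence: str,
    scores: Sequence[float],
    threshold: float = 0.55,
    min_length: int = 1,
) -> List[Tuple[int, int, str]]:
    """Group consecutive residues scoring above `threshold` into candidate linear epitopes.

    Returns (start, end, peptide) tuples with 1-based inclusive positions.
    `min_length` is a hypothetical filter, not a documented BepiPred-2.0 setting.
    """
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")
    segments: List[Tuple[int, int, str]] = []
    start = None  # 0-based index where the current above-threshold run began
    for i, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = i  # a new run begins
        elif start is not None:
            # The run covered indices start..i-1, i.e. positions start+1..i (1-based).
            if i - start >= min_length:
                segments.append((start + 1, i, sequence[start:i]))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(scores) - start >= min_length:
        segments.append((start + 1, len(scores), sequence[start:]))
    return segments


# Made-up scores for a toy 10-residue peptide (not real BepiPred-2.0 output).
toy_sequence = "PTKLNDLGAS"
toy_scores = [0.60, 0.70, 0.60, 0.20, 0.10, 0.80, 0.90, 0.30, 0.56, 0.57]
print(threshold_segments(toy_sequence, toy_scores))
# [(1, 3, 'PTK'), (6, 7, 'DL'), (9, 10, 'AS')]
```

In the study, the resulting candidates were further screened for antigenicity and surface accessibility before being counted as effective epitopes; that filtering step is not modeled here.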
By assessing antigenicity and surface accessibility, we found that four epitopes had changed (Table S4). In brief, after the H49Y mutation the S protein had 14 effective epitopes: two had better antigenicity than the original epitopes at sites 405~417 and 697~709, two were newly generated at sites 519~533 and 618~629, and the remaining 10 epitopes were the same as those without the mutation.

Y145H mutation
The Y145H mutation occurred in 8 countries, but its frequency now appears to be decreasing [12]. Using the screening methods above, we found that five altered sites had distinct influences on the likely epitopes (Table S5). With the Y145H mutation, the S protein had 13 effective epitopes: two were newly produced from originally unlikely epitopes at sites of 618~625, three had better antigenicity than the original epitopes at sites 140~153, 459~465 and 657~663, and the remaining 9 epitopes were conserved.
The antigenicity of the epitope at site 486~492 was slightly reduced. Therefore, 9 effective epitopes were predicted after the G476S mutation, which showed the largest number of epitope changes, and the overall antigenicity decreased (Table S9).
The background of the V615F mutation is consistent with that of V615I. Using the prediction tools described above, we found 12 alterations that directly affected B-cell epitopes, which clearly differs from the V615I mutation (Table S11).
To investigate the influence of the above 9 common S protein mutations on B-cell epitopes, we compared the predicted epitopes of the reference and mutant S proteins, analyzed the associations of epitope changes among mutations, and determined the influence of each mutation on B-cell epitopes. Detailed information on the changes caused by each mutation is listed in Table S13.
We found that some mutations did not change, or only slightly changed, the B-cell epitopes, while others strongly affected their number and location. The major changes in B-cell epitopes are summarized in Table 5. The most important finding is that the most common mutation, D614G, changes the B-cell epitopes of the S protein only slightly, moderately increasing the accessibility and antigenicity of epitope 657-663. There are 12 potential epitopes in the D614G mutant, nearly identical to those without the mutation. In the D614G and V615I mutations, the effective epitopes were also 12, in which only 1 epitope at the same site Table 1, and 5 of them are concentrated on and near the RBD domain. In particular, the most frequent mutation, D614G, is located at the S1/S2 junction, near the furin cleavage site at the S1/S2 boundary. Walls reported that deletion of this cleavage region could influence SARS-CoV-2 S-mediated entry into host cells [8]. Korber [12] and Zhang [19] proposed that the D614G mutation promotes the spread of SARS-CoV-2, allowing the G614 strain to swiftly become the dominant variant.
Mutations in the S protein may affect B-cell epitopes and lead to vaccine failure. Therefore, to explore the impact of mutations on the antigenicity of the S protein, in this study we applied immuno-informatics tools to predict potential B-cell epitopes of the prototype and variant S proteins.
(Table 2) and discontinuous epitopes such as No.2 (Table 3) could be candidate vaccine targets.
Importantly, it is worth exploring whether mutations in the S protein lead to epitope changes. Therefore, we used a group of B-cell epitope prediction tools to analyze the prototype and variant S proteins.
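The prototype-versus-variant comparison described above classifies epitopes as unchanged, altered or newly generated by their residue ranges (for example, the new epitopes at 519~533 and 618~629 after H49Y). A minimal Python sketch of a site-based comparison, assuming each epitope is represented as a 1-based inclusive (start, end) range and that overlapping ranges count as the same epitope with altered bounds; the overlap rule, the function names and the variant ranges in the example are hypothetical. It compares positions only and ignores the antigenicity scores the study also took into account.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based inclusive (start, end) epitope site


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive residue ranges share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare_epitopes(prototype: List[Interval], variant: List[Interval]) -> Dict[str, List[Interval]]:
    """Classify epitope sites of a variant against those of the prototype.

    identical: same start and end in both proteins
    altered:   overlaps a prototype epitope but with different bounds
    new:       overlaps no prototype epitope
    lost:      prototype epitope that no variant epitope overlaps
    """
    prototype_sites = set(prototype)
    result: Dict[str, List[Interval]] = {"identical": [], "altered": [], "new": [], "lost": []}
    for site in variant:
        if site in prototype_sites:
            result["identical"].append(site)
        elif any(overlaps(site, p) for p in prototype):
            result["altered"].append(site)
        else:
            result["new"].append(site)
    for p in prototype:
        if not any(overlaps(p, v) for v in variant):
            result["lost"].append(p)
    return result


# Prototype: the three RBD epitopes reported above (a subset of the 12 effective ones).
prototype_rbd = [(384, 390), (405, 418), (487, 492)]
# Variant: made-up ranges for illustration only.
variant_rbd = [(384, 390), (405, 417), (519, 533)]
print(compare_epitopes(prototype_rbd, variant_rbd))
```

With these inputs, (384, 390) is classified as identical, (405, 417) as altered, (519, 533) as new and (487, 492) as lost.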
The primary sequence of the SARS-CoV-2 S protein was retrieved from the NCBI GenBank database (accession number QHO62107.1); this sequence has been used as the prototype or reference sequence for vaccine development in many projects [14]. Its complete genome accession number is NC_045512, which
The major variant sequences were available from the Global Initiative on Sharing All Influenza Data (GISAID) [26] and GenBank
We used the sequence from an early SARS-CoV-2 isolate as the wild-type (prototype) sequence and recent variant viruses as mutant strains to predict the B-cell epitopes of the S protein. The S protein sequence excluded the signal peptide (SP), the transmembrane (TM) region and the cytoplasmic region; only the ectodomain of the S protein was used for analysis. Linear and non-linear (discontinuous) B-cell epitopes were predicted with different tools.
Linear epitopes were predicted with the BepiPred-2.0 server of the IEDB online database [33, 34]. The threshold was set to 0.55, corresponding to a sensitivity of 29% and a specificity of 81%. The results are shown in a figure in which residues with scores above the threshold, predicted to be part of an epitope, are colored yellow. Effective B-cell epitopes rely on stronger antigenicity and accessibility of surface

References
A pneumonia outbreak associated with a new coronavirus of probable bat origin
Identification of a novel coronavirus in patients with severe acute respiratory syndrome
Genome composition and divergence of the novel coronavirus (2019-nCoV) originating in China. Cell Host & Microbe
A novel coronavirus associated with severe acute respiratory syndrome
Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia
Properties of Coronavirus and SARS-CoV-2
Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein
Variant analysis of SARS-CoV-2 genomes. Bulletin of the World Health Organization
SARS-CoV-2 viral spike G614 mutation exhibits higher case fatality rate
spike (S) protein be associated with higher COVID-19 mortality?
Spike mutation pipeline reveals the emergence of a more transmissible form of SARS-CoV-2
Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor
RNA based mNGS approach identifies a novel human coronavirus from two individual pneumonia cases in 2019 Wuhan outbreak. Emerging Microbes & Infections
The early landscape of COVID-19 vaccine development in the UK and rest of the world
Preliminary identification of potential vaccine targets for the COVID-19 coronavirus (SARS-CoV-2) based on SARS-CoV immunological studies
COVID-19, an emerging coronavirus infection: advances and prospects in designing and developing vaccines, immunotherapeutics, and therapeutics
The SARS-CoV-2 Vaccine Pipeline: an Overview. Current Tropical Medicine Reports
The D614G mutation in the
Epitope-based peptide vaccine design and target site depiction against Middle East Respiratory Syndrome Coronavirus: An immune-informatics study
CoV spike protein: a key target for antivirals. Expert Opinion on Therapeutic Targets
SARS-CoV-2 identified by high-throughput single-cell sequencing of convalescent patients' B cells
Potent cross-reactive neutralization of SARS coronavirus isolates by human monoclonal antibodies. Proceedings of the National Academy of Sciences
A 193-amino acid fragment of the SARS coronavirus S protein efficiently binds angiotensin-converting enzyme 2
Spike Mutation Increases SARS CoV-2 Susceptibility to Neutralization
Global initiative on sharing all influenza data-from vision to reality
Clustal Omega for making accurate alignments of many protein sequences
MSAViewer: interactive JavaScript visualization of multiple sequence alignments
CDD/SPARCLE: the conserved domain database in 2020
Identifying candidate subunit vaccines using an alignment-independent method based on principal amino acid properties
Identification and validation of specific B-cell epitopes of hantaviruses associated to hemorrhagic fever and renal syndrome
The immune epitope database and analysis resource: from vision to blueprint
BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes
Influence of protein flexibility and peptide conformation on reactivity of monoclonal anti-peptide antibodies with a protein alpha-helix
Conformational B-cell epitope prediction on antigen protein structures: a review of current algorithms and comparison with common binding site prediction methods
Bioinformatics resources and tools for conformational B-cell epitope prediction. Computational and Mathematical Methods in Medicine
Reliable B cell epitope predictions: impacts of method development and improved benchmarking

We are grateful to Zijun Shu for managing the manuscript references.