key: cord-0928298-cqgopt7u authors: Guo, Elisa; Guo, Hailong title: CD8 T cell epitope generation toward the continually mutating SARS-CoV-2 spike protein in genetically diverse human population: Implications for disease control and prevention date: 2020-09-10 journal: bioRxiv DOI: 10.1101/2020.09.10.290841 sha: 22bad59408b93103fda97f6d77b79f4e159b1c3c doc_id: 928298 cord_uid: cqgopt7u

The ongoing SARS-CoV-2 pandemic has brought a tremendous crisis to global health care systems and industrial operations, dramatically affecting the economic and social life of numerous individuals worldwide. Understanding anti-SARS-CoV-2 immune responses in populations with different genetic backgrounds and tracking viral evolution are crucial for successful vaccine design. In this study, we report the CD8 T cell epitopes predicted for a total of 80 alleles of the three major class I HLAs, using the NetMHC 4.0 algorithm, for the spike (S) protein of SARS-CoV-2, a key antigen that is targeted by both B cells and T cells. We found diverse capacities for S protein specific epitope presentation among HLA alleles, with a very limited number of predicted epitopes for HLA-B*2705, HLA-B*4402 and HLA-B*4403 and as many as 132 epitopes for HLA-A*6601. Our analysis of 1000 S protein sequences from field isolates collected globally over the past few months identified three recurrent point mutations: L5F, D614G and G1124V. These mutations had differential effects on CD8 T cell epitope generation by the corresponding HLA alleles. Finally, our multiple alignment analysis indicated the absence of seasonal CoV induced cross-reactive CD8 T cells that could drive these mutations. Our findings provide molecular explanations for the observation that individuals with certain HLA alleles such as B*44 are more prone to SARS-CoV-2 infection. Studying anti-S protein specific CD8 T cell immunity across diverse genetic backgrounds is critical for better control and prevention of the SARS-CoV-2 pandemic.
The coronavirus (CoV) is an enveloped, positive-stranded RNA virus that can cause respiratory and enteric diseases in a wide range of hosts including humans, numerous animals, birds and fish(1). Four genera (Alpha, Beta, Gamma, and Delta) of CoVs have been classified, with human CoVs designated within the Alpha and Beta groups. Their genome, about 30 kb in length, is the largest found in RNA viruses and encodes more than 20 putative proteins, including four major structural proteins: spike (S), envelope (E),

coronavirus 2 (SARS-CoV-2), was identified in Wuhan City, Hubei Province, China from patients with severe pneumonia(5-7). This novel viral infection subsequently spread rapidly to nearly all countries of the world, leading the World Health Organization (WHO) to declare the first known coronavirus global pandemic on March 11, 2020(8). As of July 31, 2020, the COVID-19 pandemic had resulted in over 17 million confirmed cases and more than 680,000 deaths globally according to the update from the Johns Hopkins Coronavirus Resource Center (https://coronavirus.jhu.edu/map.html). Like SARS-CoV

Table 2). The comparison data in Table 2 showed that the L5F mutation increased the epitope binding affinity for 37 different HLA alleles, whereas only 10 other alleles had decreased binding affinity for the mutated epitope FVFFVLLPL. In addition, the mutated epitope could be presented by 5 more HLA alleles (Table 2).

The amino acid G at position 1124 is within the connector domain of the S protein(25), which is important for S protein trimerization and critical for stabilizing the S protein conformational structure during pre- and post-fusion with the host cell membrane(33).
By inspecting the epitopes in Supplementary Table 2, we found that the same epitope starting at position 1121 (FVSGNCDVV) could be presented by a total of 19 alleles: 12 HLA-A alleles (HLA-A*0201, HLA-A*0202, HLA-A*0203, HLA-A*0206, HLA-A*0207, HLA-A*0211, HLA-A*0212, HLA-A*0216, HLA-A*0219, HLA-A*0250, HLA-A*6802 and HLA-A*6901) and 7 HLA-C alleles (HLA-C*0303, HLA-C*0501, HLA-C*0602, HLA-C*0701, HLA-C*0802, HLA-C*1203 and HLA-C*1502). Only one additional epitope containing G1124, starting at a different position, was identified, and for just one allele, indicating that FVSGNCDVV is the predominantly presentable epitope in the connector domain of the S protein. Re-analyzing the epitopes of the 80 HLA alleles with the mutated reference S protein containing V1124 identified 13 HLA alleles with decreased binding affinity for the mutated epitope FVSVNCDVV (Table 3). We also observed that 6 other HLA alleles lost the ability to present the mutant epitope. No alleles showed increased binding affinity due to this mutation, and no additional alleles became capable of presenting the mutated epitope (Table 3).

The mutation at position 614, a sole D to G switch at a 63% frequency, is especially alarming. This result indicates that SARS-CoV-2 isolates with the G614 mutation have adapted and spread efficiently within the human population. To our knowledge, the D614 amino acid is not within the essential receptor binding domain, nor is it a residue of any validated neutralization epitope for SARS-CoV-2. In addition, whether it is involved in the CD8 T cell response is not known. Through screening the epitopes listed in Supplementary Table 2

(Table 4). Unlike the L5 and G1124 containing epitopes, D614 containing epitopes could start at several different positions including 606, 607, 610, 611, 612 and 614, with the epitope YQDVNCTEV most frequently observed (Table 4).
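The start positions listed above follow from simple window arithmetic: a 9-mer can contain residue 614 only if it starts between positions 606 and 614. The sketch below is a minimal illustration of that arithmetic and of applying a point mutation to an epitope; it is not the NetMHC 4.0 procedure used in this study. The function names window_starts and mutate_epitope are hypothetical, the start position 612 for YQDVNCTEV is inferred from D being its third residue, and windows running past the end of the protein are not handled.

```python
def window_starts(position, length=9):
    """Return the 1-based start positions of every `length`-mer covering `position`.

    Assumption: the protein is long enough that no window runs off its end.
    """
    first = max(1, position - length + 1)
    return list(range(first, position + 1))


def mutate_epitope(epitope, start, position, ref, alt):
    """Apply a point mutation (ref -> alt at 1-based `position`) to an epitope
    whose first residue sits at 1-based protein position `start`."""
    offset = position - start
    if not 0 <= offset < len(epitope):
        raise ValueError("position lies outside this epitope")
    if epitope[offset] != ref:
        raise ValueError("reference residue does not match the epitope")
    return epitope[:offset] + alt + epitope[offset + 1:]


if __name__ == "__main__":
    # All possible 9-mer start positions that include residue 614.
    print(window_starts(614))
    # G1124V applied to the epitope that starts at position 1121.
    print(mutate_epitope("FVSGNCDVV", 1121, 1124, "G", "V"))
    # D614G applied to YQDVNCTEV (start 612 is an inferred assumption).
    print(mutate_epitope("YQDVNCTEV", 612, 614, "D", "G"))
```

The observed start positions (606, 607, 610, 611, 612 and 614) are a subset of the nine possible windows, which is expected because only some windows yield peptides that pass the binding threshold for a given allele.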
Similarly, we reanalyzed the 9-mer CD8 T cell epitopes for the reference S protein with a single D614G mutation. The resulting epitopes of each allele were compared to the corresponding epitopes derived from the original reference S protein (Supplementary Table 2). The comparison data were summarized in Table 5. With a single D to G switch, seven HLA alleles (HLA-A*0203, HLA-A*0205, HLA-A*0206, HLA-A*2403, HLA-A*2501, HLA-A*2601 and HLA-C*1203) gained at least one CD8 epitope containing the replaced amino acid G614. Interestingly, nine HLA alleles including HLA-A*0101, HLA-A*0207, HLA-A*6802, HLA-A*6901, HLA-B*1509, HLA-C*0501, HLA-C*0802 and HLA-C*1203 each lost one CD8 epitope that had been predicted using the unmodified reference S protein sequence (Table 5). In addition to the direct gain or loss of CD8 T cell epitopes, several HLA alleles had their epitope binding affinities changed. These include G614 containing epitopes with increased affinity for HLA-A*0211, HLA-A*0212, HLA-A*0216, HLA-A*0219, HLA-A*0250, HLA-A*6601, HLA-A*6802, HLA-A*6901, HLA-B*2720 and HLA-C*1402 (Table 5). Finally, we also observed other G614 containing epitopes with decreased affinity for alleles including HLA-A*0201, HLA-A*0206, HLA-A*0211, HLA-A*0212, HLA-A*0216, HLA-B*4801 and HLA-C*0602 (Table 5). These results illustrate the differential effects of the D614G mutation on the capability of different HLA alleles to present S protein specific CD8 T cell epitopes.

Lack of seasonal CoVs cross-reactive CD8 T cell immunity to promote L5F, D614G and G1124V mutations

Currently, the detailed mechanisms driving these mutations on the SARS-CoV-2 S protein are not known.
One possibility is that existing cross-reactive CD8 T cell immunity elicited by human seasonal CoVs could drive the virus to mutate and escape immune recognition in the general population, which is frequently infected by seasonal CoVs. To test this, we evaluated whether four representative seasonal CoVs(1,4) share CD8 T cell epitopes containing L5, D614 and G1124. However, our pairwise alignment of the reference S protein with each of the four seasonal CoVs showed very low percent identities (27% for HCoV-NL63, 31% for HCoV-HKU1, 28% for HCoV-229E and 33% for HCoV-OC43). Further, multiple alignment of these S protein sequences failed to identify any CD8 T cell epitope motif identical or similar to those we described above, which include FVFLVLLPL, FVSGNCDVV and several epitopes containing D614 (Table 4 and Fig 2). From these results, we concluded that seasonal CoVs are unlikely to have induced cross-reactive CD8 T cells that could promote the key mutations we observed.

In this study, we intended to provide a broad representation of S protein specific CD8 T cell epitopes of 80 different human HLAs for better understanding of anti-SARS-CoV-2 T cell immunity. Our results indicated a potential differential anti-S protein CD8 T cell response during COVID-19 infection in individuals with unique HLA alleles, as the capacity of these alleles to present potential epitopes varies dramatically (Table 1). Among them, HLA-B*2705 is only capable of presenting 16 epitopes. This allele is highly associated with various forms of arthritis(39). SARS-CoV-2 infection is known to be much more severe and lethal in the aged population and in people with underlying health conditions, including immune disorders(40). Additionally, during acute influenza infection and chronic HIV progression, viral escape mutations could occur on HLA-B*2705 restricted CD8 T cell epitopes(41,42).
Although we didn't find evidence that seasonal CoVs could induce cross-reactive CD8 T cells toward the three key mutations we identified, other mechanisms may promote SARS-CoV-2 to mutate and escape HLA-B*2705 restricted CD8 T cell immunity.

alleles we analyzed showed higher numbers of CD8 T cell epitopes than B*44 (Table 1), indicating that potentially broad anti-S protein CD8 T cell responses may provide better protection against SARS-CoV-2 infection.

Our data lack experimental support; however, some of the S protein specific CD8 T cell epitopes presented in this study have been validated in SARS-CoV infection and utilized for monitoring human anti-SARS specific CD8 T cell responses using immunological techniques such as ELISPOT and tetramer staining(18,34-37). One of these epitopes is FIAGLIAIV, which was identified in an HLA-A*0201 patient infected with

Table 2). The epitope RLNEVAKNL identified on SARS-CoV(37) was predicted for 8 HLA-A, 4 HLA-B and 3 HLA-C alleles on SARS-CoV-2 in our study. Another SARS-CoV S protein specific CD8 T cell epitope, VLNDILSRL(36), was predicted on the SARS-CoV-2 S protein for 9 HLA-A alleles, 1 HLA-B allele and 3 HLA-C alleles. The epitope NLNESLIDL, which we identified for 10 HLA-A alleles, was also characterized as a valid HLA-A*0201 restricted SARS-CoV CD8 T cell epitope in a published study(35).

Recent studies agree that the D614G mutation could enhance SARS-CoV-2 infectivity and promote its transmission(27,43). Regarding its effect on virulence, however, one study reported that G614 virus infection was associated with higher mortality(44), while another concluded that it had no obvious effect on disease severity(27).
One potential explanation for these disparate clinical findings is that the prevalent HLAs of the infected subjects in the two studies differ, leading to divergent anti-S protein CD8 T cell responses, either toward the epitopes containing the G614 mutation alone or in combination with other epitopes we identified, as the D614G mutation can occur simultaneously with other mutations on the S protein (Supplementary Fig 1 and data not shown). Three observations support this explanation. First, our prediction results showed that only about 25% of the HLA alleles we analyzed could mount CD8 T cell responses targeting the epitopes containing D614 (Table 4). Second, among the HLA alleles that could generate D614 containing CD8 T cell epitopes, the epitopes they recognize and present to the TCR differ (Table 5), which may lead to different TCR clonotypes and anti-viral efficacy(45,46). Third, our mutational analysis on a panel of CD8 T cell epitopes that contain G614 also suggested possibly different control outcomes by different HLA alleles, as their binding to the mutated epitope could be altered differently (Table 5). We believe that pairing HLA typing with SARS-CoV-2 sequencing and testing of anti-S protein CD8 T cell responses could allow precise assessment of the clinical outcome of D614/G614 virus infection in individuals with different HLA alleles.

Our data on the mutational effect of L5F suggest that evolution of the SARS-CoV-2 virus at L5 might eventually be unsuccessful, as the mutated epitope could enhance CD8 T cell recognition and killing through enhanced interaction with most of the HLA alleles (Table 2). In contrast, the G1124V mutation appears to help the virus escape immune recognition, as this mutation reduces the epitope binding affinity of HLA alleles to a level at which some are no longer able to bind (Table 3). However, the V1124 variant has not become as dominant as the G614 variant.
This could be due to other viral and host factors, such as viral structure stability, peptide-MHC stability and innate responses, that prevent this variant from standing out further. Since the pandemic SARS-CoV-2 outbreak in several major countries hasn't yet been controlled, continual monitoring of these mutations on the S protein is still necessary.

Based on the data and discussions provided here, we would like to encourage research labs to carry out further validation and characterization of these candidate S protein epitopes and to study their role in protecting against SARS-CoV-2 infection and in vaccine immunity. These epitopes should include not only the ones that had been identified for the S protein of SARS-CoV, but also uncharacterized epitopes such as those containing L/F5, D/G614 and G/V1124. We also recommend that clinical labs organize and utilize resources to combine SARS-CoV-2 sampling and viral genome sequencing with HLA typing and S protein specific CD8 T cell immune testing, to gather more useful data for better identifying risk groups and implementing policies suited to different geographical locations, resulting in more effective transmission control. Ultimately, we believe these efforts will provide more solid data for effective vaccine development and elimination of SARS-CoV-

for prediction. 9-mer peptides (epitopes) with a rank score ≤2.0% were selected as positive HLA binders. As the output format for the epitopes derived from NetMHC4.0 begins with 0, one was added to the resulting positions of all epitopes presented in this study to match the S protein sequence numbering.

To identify mutations on the SARS-CoV-2 S protein, we selected a total of 1000 complete or near-complete S protein sequences from viruses isolated in countries in North America, Europe, Asia, Oceania, Africa and South America.
These sequences were deposited in the NCBI database with collection dates ranging from January to June 2020 and showed at least 95% query coverage for the reference S protein (accession number QHD43416.1). The list of the accession numbers for these sequences was provided in Supplementary

CoVs (NL63, HKU1, 229E and OC43). The alignment was performed using Mega-X as described. The predicted CD8 T cell epitopes were highlighted in yellow and the corresponding mutational targets of L5, D614 and G1124 were bolded and underlined on the SARS-CoV-2 sequence. The accession numbers for the S proteins of SARS-CoV-2, HCoV-NL63, HCoV-HKU1, HCoV-229E and HCoV-OC43 are QHD43416.1, APF29063.1, BBA20983.1, AAG48592.1 and CAA83661.1, respectively.
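For readers who want to see how a percent identity value such as those quoted for the seasonal CoV comparisons can be derived from an alignment, the sketch below shows one common definition: identical columns divided by aligned columns, with columns containing a gap in only one sequence counted as mismatches. This is an illustration under stated assumptions, not the calculation performed in this study; the function name percent_identity is hypothetical, the two aligned strings are made-up toy sequences rather than S protein data, and the denominator used by the alignment tool in this study is not stated in the text.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two sequences taken from the same alignment.

    Assumed definition: identical residue pairs divided by the number of
    columns in which at least one sequence has a residue, times 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # skip columns that are gaps in both sequences
        columns += 1
        if a == b:
            matches += 1  # both are the same residue, neither is a gap
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Toy aligned sequences, invented for illustration only.
    seq_a = "MFVF-LVLL"
    seq_b = "MFLFALV-L"
    print(round(percent_identity(seq_a, seq_b), 1))
```

Other tools may divide by the length of the shorter sequence or exclude gapped columns, so values computed this way are only comparable when the same definition is used throughout.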