key: cord-0287249-z2fwv9i5
authors: Takagi, Akira; Matsui, Masanori
title: Identification of HLA-A*24:02-restricted CTL candidate epitopes derived from the non-structural polyprotein 1a of SARS-CoV-2 and analysis of their conservation using the mutation database of SARS-CoV-2 variants
date: 2021-09-22
journal: bioRxiv
DOI: 10.1101/2021.09.21.461322
sha: 5f23c2f91279c2aac4ab449d5c11e9f34a1c3fb5
doc_id: 287249
cord_uid: z2fwv9i5

COVID-19 vaccines are currently being administered worldwide and are playing a critical role in controlling the pandemic. They have been designed to elicit neutralizing antibodies against the Spike protein of the original SARS-CoV-2, and hence they are less effective against SARS-CoV-2 variants with mutated Spike than against the original virus. It is possible that novel variants with enhanced transmissibility and/or immune evasion will appear in the near future and completely escape vaccine-elicited immunity. Therefore, the current vaccines may need to be improved to keep pace with viral evolution. For this purpose, it may be beneficial to take advantage of CD8+ cytotoxic T lymphocytes (CTLs). Several lines of evidence suggest that CTLs contribute to viral control in COVID-19, and CTLs target a wide range of proteins, including comparatively conserved non-structural proteins. Here, we identified twenty-two HLA-A*24:02-restricted CTL candidate epitopes derived from the non-structural polyprotein 1a (pp1a) of SARS-CoV-2 using computational algorithms, HLA-A*24:02 transgenic mice, and peptide-encapsulated liposomes. We focused on pp1a and HLA-A*24:02 because pp1a is relatively conserved and HLA-A*24:02 is predominant in East Asians such as Japanese. The conservation analysis revealed that the amino acid sequences of 7 out of the 22 epitopes were hardly affected by the mutations recorded in the Sequence Read Archive database of SARS-CoV-2 variants.
The information on such conserved epitopes might be useful for designing a next-generation COVID-19 vaccine that is universally effective against any SARS-CoV-2 variant through the induction of both anti-Spike neutralizing antibodies and CTLs specific for conserved epitopes.

Importance

COVID-19 vaccines have been designed to elicit neutralizing antibodies against the Spike protein of the original SARS-CoV-2, and hence they are less effective against variants. It is possible that novel variants will appear and escape vaccine-elicited immunity. Therefore, the current vaccines may need to be improved to keep pace with viral evolution. For this purpose, it may be beneficial to take advantage of CD8+ cytotoxic T lymphocytes (CTLs). Here, we identified twenty-two HLA-A*24:02-restricted CTL candidate epitopes derived from the non-structural polyprotein 1a (pp1a) of SARS-CoV-2. We focused on pp1a and HLA-A*24:02 because pp1a is conserved and HLA-A*24:02 is predominant in East Asians. The conservation analysis revealed that the amino acid sequences of 7 out of the 22 epitopes were hardly affected by mutations in the database of SARS-CoV-2 variants. The information might be useful for designing a next-generation COVID-19 vaccine that is universally effective against any variant.

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative [...] pneumonia caused by the SARS-CoV-2 infection (26). In a virus-challenge experiment using rhesus macaques, depletion of CD8+ T cells in convalescent macaques that had been infected with SARS-CoV-2 partially abrogated the protective efficacy of natural immunity against rechallenge with SARS-CoV-2 (27), suggesting that CD8+ T cells can contribute to virus control in COVID-19. The current mRNA and adenoviral-vectored vaccines elicit SARS-CoV-2 S protein-specific CD8+ CTLs as well as anti-S neutralizing antibodies (28), which might make these vaccines more efficient than inactivated and subunit vaccines. It is known that BNT162b2 mediates protection from severe disease as early as 10 days after prime vaccination, when neutralizing antibodies are hardly detectable.
Since functional S-specific CD8+ T cells were shown to be already present at this early stage, CD8+ T cells were speculated to be the main mediators of the protection (29). Thus, several lines of evidence suggest that CTLs contribute to viral control in COVID-19, and therefore it may be beneficial to take advantage of CD8+ CTLs for the development of a next-generation vaccine. In addition, CTLs can target a wide range of proteins, including comparatively conserved non-structural proteins. A novel vaccine able to elicit CTLs specific for conserved epitopes may not be affected by the mutations of various SARS-CoV-2 variants.

As shown in Fig. 1, the 5'-terminal two-thirds of the SARS-CoV-2 genome are composed of the open reading frame 1a (ORF1a) and ORF1b. ORF1a encodes the polyprotein 1a (pp1a), a large protein composed of 11 non-structural regulatory proteins (nsp1-11) in SARS-CoV-2. Due to its large size, it seems highly likely that dominant epitopes can be found in pp1a. Saini et al. revealed that most of the immunodominant epitopes they identified belonged to the ORF1 region (30). In addition, it may be possible to identify conserved CTL epitopes in pp1a because the ORF1 region is highly conserved within coronaviruses relative to structural proteins (31). We therefore attempted to identify conserved CTL epitopes derived from pp1a of SARS-CoV-2 using MHC-I transgenic mice. We focused on HLA-A*24:02-restricted CTL epitopes because HLA-A*24:02 is relatively predominant in East Asians such as Japanese (32). This information might be useful for designing a next-generation COVID-19 vaccine that is universally effective against any SARS-CoV-2 variant through the induction of both anti-Spike neutralizing antibodies and CTLs specific for conserved epitopes.

Prediction of HLA-A*24:02-restricted CTL epitopes derived from SARS-CoV-2 pp1a.
To predict HLA-A*24:02-restricted CTL epitopes derived from SARS-CoV-2 pp1a, we used a T-cell epitope database, SYFPEITHI (33). The top 80 epitopes in the database were selected and synthesized as 9-mer peptides (Table 1). These epitopes were also evaluated by three other programs, IEDB (34), ProPred-1 (35), and NetCTL (36) [...] Table 1, the rank of each epitope was not always the same in the four programs, suggesting that multiple programs are needed to successfully predict CTL epitopes.

The eighty peptides were investigated for their binding affinities to HLA-A*24:02 molecules using TAP2-deficient RMA-S-HHD-A24 cells. Since the half-maximal binding level (BL50) value of a positive control peptide, Influenza PA 130-138 (37), was 2.4 μM, we defined an extremely high binder as a peptide with a BL50 value below 1.0 μM, a high binder as one with a BL50 value ranging from 1 to 10 μM, a medium binder as one with a BL50 value ranging from 10 to 80 μM, and a low binder as one with a BL50 value above 80 μM. Among the 80 peptides, 11 peptides and 10 peptides were extremely high binders and high binders, respectively, while 15 peptides were medium binders (Table 2). The remaining 44 peptides demonstrated low binding affinities or no binding to HLA-A*24:02 (Table 2). Comparison of the peptide binding affinity and the peptide rank in the 4 algorithms (Table 3) revealed that A-ranked peptides did not always show a high level of binding affinity to HLA-A*24:02. On the other hand, none of the D-ranked peptides were classified into the extremely high group. When the four programs were compared in the prediction of extremely high binders and high binders, the IEDB program appeared to estimate them most accurately (Fig. 2).

In the following experiments, the 36 peptides comprising extremely high, high, and medium binders were chosen to investigate their ability to induce peptide-specific CTLs.
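The binder categories defined above can be written as a small classification rule. The following is a minimal sketch rather than the authors' analysis code: the function name, the handling of values that fall exactly on a threshold (1, 10, or 80 μM), and the example BL50 values are all assumptions made for illustration and are not taken from Table 2.

```python
def classify_binder(bl50_um):
    """Return the binder class for a BL50 value given in micromolar.

    Thresholds follow the text: below 1 extremely high, 1 to 10 high,
    10 to 80 medium, above 80 low. Which class receives a value lying
    exactly on a threshold is an assumption; the text does not say.
    None stands for a peptide with no measurable binding, which the
    text groups with the low binders.
    """
    if bl50_um is None:
        return "low"
    if bl50_um < 1.0:
        return "extremely high"
    if bl50_um <= 10.0:
        return "high"
    if bl50_um <= 80.0:
        return "medium"
    return "low"


# Hypothetical BL50 values (micromolar), for illustration only.
example_bl50 = {
    "peptide_a": 0.4,
    "peptide_b": 2.4,   # same value as the positive control in the text
    "peptide_c": 35.0,
    "peptide_d": 120.0,
    "peptide_e": None,  # no binding detected
}

classes = {name: classify_binder(value) for name, value in example_bl50.items()}

# Peptides carried forward to the CTL induction experiments:
# every class except "low".
selected = [name for name, cls in classes.items() if cls != "low"]
```

With these thresholds the positive control value of 2.4 μM falls into the high group, which is consistent with the extremely high class being reserved for peptides that bind better than the control.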
Induction of SARS-CoV-2 pp1a-specific CD8+ T cell responses in HLA-A*24:02 transgenic mice immunized with liposomal peptides.

The 36 peptides were randomly divided into 6 groups. The six peptides in each group were mixed and encapsulated into liposomes as described in the Materials and Methods. HLA-A*24:02 transgenic mice were then immunized subcutaneously (s.c.) four times at one-week intervals with peptide-encapsulated liposomes together with CpG adjuvant. One week later, spleen cells of the immunized mice were prepared, stimulated in vitro with the relevant peptide for 5 hours, and stained for cell-surface CD8 and intracellular interferon-gamma (IFN-γ). As shown in Fig. 3, significant numbers of IFN-γ-producing CD8+ T cells were detected in mice immunized with 22 liposomal peptides including pp1a-265, -634, -835, -1182, -1255, -1417, -1845, -1899, -2330, -2338, [...] These data indicated that the 22 peptides were HLA-A*24:02-restricted CTL candidate epitopes derived from SARS-CoV-2 pp1a. However, the induction level of IFN-γ-producing CD8+ T cells varied among the 22 peptides. Five peptides, pp1a-265, -1255, -2330, -3104, and -3792, elicited high percentages of intracellular IFN-γ+ cells among CD8+ T cells, ranging from 1.8% to 7.7%, whereas the other 17 peptides induced medium (0.5-1%) or low (0.1-0.5%) percentages of IFN-γ+ CD8+ T cells (Fig. 3). Comparison of the intracellular cytokine staining (ICS) data with the peptide binding affinity (Table 4) showed that not all of the extremely high binders elicited IFN-γ-producing CD8+ T cells, and that two medium binder peptides activated high percentages of intracellular IFN-γ+ cells among CD8+ T cells.
However, the proportion of extremely high binder peptides that induced IFN-γ-producing CD8+ T cells was higher than that of medium binder peptides (Table 4), supporting the view that the peptide binding affinity to HLA class I molecules is closely associated with the induction of peptide-specific CTLs.

We next investigated whether the 22 candidate epitopes were mutated in various SARS-CoV-2 variants. To do this, we utilized the National Center for Biotechnology Information (NCBI) Virus database (https://www.ncbi.nlm.nih.gov) (38), which provides data sets of mutations in the Sequence Read Archive (SRA) records of SARS-CoV-2 variants. In the database, the nucleotide and amino acid sequences of variants in SRA records are aligned for comparison with those of the original strain, Wuhan-Hu-1 (NCBI Reference Sequence: NC_045512.2). In the SRA mutation data, the most frequent non-synonymous amino acid change was the mutation from D to G at position 614 (D614G) in the S protein, and the total count of D614G across the database was 615,601 in 924,785 SRA runs (frequency per run: 66.6%) available as of 23 August 2021. To investigate the conservation of the 22 epitopes, we counted the total number of non-synonymous amino acid substitutions within the 9-mer amino acid sequence of each epitope that were found in the sequencing data of SARS-CoV-2 variants in the 924,785 SRA runs. All of the epitopes had at least some amino acid substitutions in their sequences (Table 5 & Fig. 4), indicating that none of them were fully conserved throughout all of the available SRA data.
However, there were seven epitopes with low counts of total mutations in their 9-mer amino acid sequences, indicating that the amino acid sequences of these seven epitopes were hardly affected by the mutations in the SRA database (Table 5) [...] Frequency: 2.13%) (Table 5).

We then determined which of the top 4 relatively conserved epitopes, namely pp1a-1417, -3104, -3792, and -4229, was most dominant in the induction of pp1a-specific CTLs. Equal amounts of the 4 peptide solutions at an equal concentration were mixed together and encapsulated into liposomes. Eight mice were immunized with the liposomes containing the peptide mixture. One week later, spleen cells were incubated with each of the 4 peptides for 5 hours, and the ICS assay was performed. pp1a-3104 was far superior to all other peptides in the induction of peptide-specific IFN-γ+ CD8+ T cells (Fig. 5A). We also examined the peptide-specific induction of CD8+ T cells expressing a degranulation marker, CD107a. As shown in Fig. 5B, pp1a-3104 was significantly more effective than pp1a-1417 and -4229 in inducing CD107a-expressing CD8+ T cells. Thus, pp1a-3104 was the most prominent HLA-A*24:02-restricted CTL epitope among the top 4 conserved epitopes.

All of the currently available COVID-19 vaccines have been directed against the S protein of the original SARS-CoV-2, and therefore they are less effective against some variants with mutated S, such as the Beta and Delta strains, than against the original virus. Our concern is that SARS-CoV-2 is still evolving and that variants are appearing one after another. New mutant strains that completely evade the immunity generated by the vaccines may soon emerge.
To develop a next-generation vaccine that keeps pace with viral evolution, it may be beneficial to take advantage of CTLs because they can target a wide range of SARS-CoV-2-derived proteins, including comparatively conserved non-structural proteins.

Here, we have identified twenty-two HLA-A*24:02-restricted CTL candidate epitopes derived from SARS-CoV-2 pp1a using HLA-A*24:02 transgenic mice. The pp1a is a large polyprotein consisting of 4,401 amino acids that may be relatively conserved compared to structural proteins such as the S protein (31). Furthermore, Tarke et al. demonstrated that most of the T cell epitopes they identified were conserved across the Alpha, Beta, Gamma, and Epsilon (CAL.20C) variants, and that the impact of the four variants on the total CD8+ T cell reactivity in vaccinated individuals was negligible (39). Hence, we initially thought it might be possible to find pp1a-derived epitopes that were fully conserved across the existing SARS-CoV-2 variants. Unfortunately, none of the 22 epitopes we identified were found to be completely conserved throughout the vast amount of SRA data in the NCBI Virus database. This is understandable because most (73.3%) of the 4,401 amino acids that make up pp1a have non-synonymous amino acid substitutions in the SARS-CoV-2 SRA data. As shown in Table 5, however, seven epitopes, pp1a-835, -1417, -1899, -2590, -3104, -3792, and -4299, were relatively conserved, with low counts of total mutations and minimum mutation frequencies of less than 0.1% in their amino acid sequences. Of note, pp1a-3104 was indicated to be the most dominant epitope in the induction of activated CD8+ T cells (Fig. 5).

In the current study, we have focused on HLA-A*24:02-restricted CTL epitopes because HLA-A*24:02 is predominant in East Asian people (34, 40) such as Japanese (allele frequency: 32.7%) (41).
On the other hand, HLA-A*02:01 is well known to be highly frequent all over the world (34). We previously identified eighteen HLA-A*02:01-restricted CTL candidate epitopes derived from SARS-CoV-2 pp1a using HLA-A*02:01 transgenic mice (42). Here, we examined how frequently these epitopes were mutated across the vast SRA data. As shown in Table 6, four epitopes, pp1a-2785, -2884, -3403, and -3583, were found to be relatively conserved because of their low mutation frequencies per SRA run. Fig. 6 indicates where the four HLA-A*02:01-restricted (Table 6), and seven HLA-A*24:02-restricted (Table 5) [...]

To identify HLA-A*24:02-restricted CTL epitopes, we utilized highly reactive HLA-A*24:02 transgenic mice (43). One reason for using MHC-I transgenic mice instead of lymphocytes from SARS-CoV-2-infected individuals is that a large number of lymphocytes are required to examine many CTL epitope candidates. Furthermore, patients' lymphocytes only reveal whether the peptide candidates are recognized by memory CTLs. In contrast, naive mice can be used to see whether the epitope candidates are able to prime peptide-specific CTLs. This may be a better criterion for judging them as vaccine antigens. However, we have to take into account that the immunogenic variation in HLA class I transgenic mice may not be identical to that in humans because antigen processing and presentation differ between them. In addition, we did not present data showing that viral infection in a mouse model induces T cells targeting these epitopes, because liposomal peptides were used as the immunogen. Hence, there is no guarantee that the candidate epitopes identified here are real pp1a-derived epitopes that are presented by human cells during live infection with SARS-CoV-2.
Recently, eight epitopes with the same amino acid sequences as pp1a-265, -1182, -1899, -2330, -3104, -3114, -3249, and -3684 [...] HLA-A*24:02-restricted pp1a-specific CTL epitopes. Five of them (pp1a-265, -1182, -1899, -2330, and -3249) were shown to be positive in T cell assays using human lymphocytes, and therefore they are thought to be real epitopes. The other three (pp1a-3104, -3114, and -3684) were positive in the binding assay but negative in T cell assays, suggesting that they are not likely to be real epitopes. The remaining 14 candidate epitopes in Table 5 therefore represent new candidate epitopes that have not been previously identified.

In summary, we have identified 22 HLA-A*24:02-restricted CTL candidate epitopes derived from the pp1a of SARS-CoV-2 using computational algorithms, HLA-A*24:02 transgenic mice, and peptide-encapsulated liposomes. The conservation analysis revealed that the amino acid sequences of 7 out of the 22 epitopes were hardly affected by the mutations in the SRA database of SARS-CoV-2 variants. We also found four relatively conserved epitopes among the 18 HLA-A*02:01-restricted CTL candidate epitopes that we had previously identified. A new mRNA or adenoviral-vectored vaccine containing nucleotide sequences encoding some of these epitopes might have the potential to become a universal vaccine against almost all existing and upcoming SARS-CoV-2 variants.

A T-cell epitope database, SYFPEITHI (33), was used to predict HLA-A*24:02-restricted CTL epitopes derived from pp1a of SARS-CoV-2 (GenBank accession numbers: LC528232.1 & LC528233.1). Eighty 9-mer peptides with superior scores (17 or higher) in the SYFPEITHI database were selected (Table 1) and were synthesized by Eurofins Genomics (Tokyo, Japan). These epitopes were also evaluated by three other algorithms, IEDB (34), ProPred-1 (35), and NetCTL (36) (Table 1).
An HLA-A*24:02-restricted control peptide, Influenza PA 130-138 (sequence: YYLEKANKI) (37), was synthesized as well.

We used HLA-A*24:02 transgenic mice, which were kindly provided by Dr. François A. Lemonnier (Pasteur Institute, Paris, France). The HLA-A*24:02 transgenic mouse expresses an HLA-A*24:02 monochain, designated HHD-A24, in which human β2m is covalently linked to a chimeric heavy chain composed of HLA-A*24:02 (α1 and α2 domains) and H-2Db (α3, transmembrane, and cytoplasmic domains) in an H-2Db, Kb, and mouse β2m triple knockout environment (43). Six- to ten-week-old mice were used for all experiments. [...]

Binding affinity of each peptide to HLA-A*24:02 was measured by the peptide binding assay using RMA-S-HHD-A24 cells, as described before (42) [...]

To examine the conservation of the CTL candidate epitopes, we utilized the SRA data of SARS-CoV-2 variants in the NCBI Virus database. We counted the total number of non-synonymous amino acid changes within the 9-mer amino acid sequence of each epitope that were found in the SRA mutation database, and calculated the mutation frequency per SRA run of each epitope as a percentage.

Count of the total amino acid substitutions within the 9-mer amino acid sequence of each HLA-A*24:02-restricted, pp1a-specific CTL candidate epitope that were found in the SRA database of SARS-CoV-2 variants. Percentage in parentheses indicates the mutation frequency per SRA run.
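The per-epitope counting and the frequency per SRA run described in this section can be illustrated with a short sketch. It is not the authors' pipeline. The function names and the dictionary layout of the substitution counts are hypothetical, and two points are assumptions: that an epitope name such as pp1a-3104 gives the 1-based position of its first residue, and that the frequency is the summed count divided by the number of SRA runs. Only the run total and the D614G count come from the text.

```python
TOTAL_SRA_RUNS = 924_785  # SRA runs available as of 23 August 2021 (from the text)


def frequency_per_run(count, total_runs=TOTAL_SRA_RUNS):
    """Return a mutation count as a percentage of the SRA runs."""
    return 100.0 * count / total_runs


def epitope_mutation_summary(start, substitution_counts, length=9,
                             total_runs=TOTAL_SRA_RUNS):
    """Sum non-synonymous substitution counts over one epitope.

    start: assumed 1-based pp1a position of the first residue.
    substitution_counts: hypothetical mapping from pp1a position to the
        number of non-synonymous changes recorded at that position.
    Returns (total count, frequency per SRA run in percent).
    """
    positions = range(start, start + length)
    total = sum(substitution_counts.get(pos, 0) for pos in positions)
    return total, frequency_per_run(total, total_runs)


# Check against the figure quoted in the text for Spike D614G.
d614g_percent = round(frequency_per_run(615_601), 1)

# Hypothetical counts, for illustration only; not data from Table 5.
toy_counts = {3104: 2, 3106: 3, 3113: 100}
toy_total, toy_percent = epitope_mutation_summary(3104, toy_counts)
```

In the toy example, position 3113 lies outside the nine residues 3104 to 3112 and therefore does not contribute to the total for that epitope.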
Safety and efficacy of the BNT162b2 mRNA Covid-19 vaccine
Efficacy and safety of the mRNA-1273 SARS-CoV-2 vaccine
Accelerated COVID-19 vaccine development: milestones, lessons, and prospects
COVID-19 vaccines: modes of immune activation and future challenges
Coronavirus RNA proofreading: Molecular basis and therapeutic targeting
HLA-A*24:02, HLA-B*08:01, HLA-B*27:05, HLA-B* HLA-C*07:01 monochain transgenic/H-2 class I null mice: Novel versatile preclinical models of human T cell responses
A monoclonal antibody against

[...] (A: Excellent; B: Very good; C: Good; D: Poor) in each of the four algorithms (SYFPEITHI, IEDB, ProPred-I, NetCTL)